How the AI Works
It's just math, not magic. Here's what's actually happening.
The Simple Truth About "AI"
When people hear "AI" and "machine learning," they often imagine something mysterious or impossibly complex. Here's the truth: it's just math. Very straightforward math, applied at scale.
The Core Idea: Linear Regression
Remember "draw a line through the dots" from math class? That's essentially what machine learning does -- just with millions of dots and in many dimensions instead of two.
When you learned to draw a "line of best fit" through scattered points on a graph, you were doing machine learning. The only difference is scale:
- School: Find a line through 10 points in 2 dimensions
- Vault: Find a "hyperplane" (the many-dimensional version of a line) through millions of points in thousands of dimensions
Same concept. Different scale. That's it.
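If you'd like to see the school version as code, here is a minimal sketch of a least-squares "line of best fit" in plain Python. The function name fit_line and the sample points are made up for illustration; this is the classroom technique, not Vault's actual model.

```python
def fit_line(xs, ys):
    """Return (slope, intercept) of the least-squares line through the points."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = how x and y vary together, divided by how much x varies on its own
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    slope = cov_xy / var_x
    # The best-fit line always passes through the point (mean_x, mean_y)
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Made-up points that lie roughly on y = 2x + 1
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.0]
slope, intercept = fit_line(xs, ys)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

The two numbers it returns (slope and intercept) are the "weights" of this tiny model. A real model has millions of them instead of two.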
What Actually Happens When You Scan a Photo
Step 1: Image to Numbers (Vectorization)
Your photo is just a grid of colored pixels. Each pixel has three numbers (red, green, blue values from 0-255). A 640x640 photo becomes:
640 x 640 x 3 = 1,228,800 numbers
That's your photo as a vector -- just a long list of numbers.
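Here is a small sketch of that flattening step, using a made-up 2x2 "photo" so the result is easy to read. The helper name flatten_image is hypothetical, and a real app would use an image library rather than nested lists.

```python
def flatten_image(pixels):
    """Turn a height x width grid of (red, green, blue) tuples into one flat list."""
    return [channel for row in pixels for pixel in row for channel in pixel]


# A made-up 2x2 photo: red and green on top, blue and white below
tiny_photo = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (255, 255, 255)],
]

vector = flatten_image(tiny_photo)
print(len(vector))    # 2 x 2 x 3 = 12 numbers
print(640 * 640 * 3)  # the same arithmetic for a 640x640 photo: 1228800
```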
Step 2: Numbers x Weights = New Numbers (Matrix Multiplication)
The AI model is essentially a giant table of numbers (called "weights") that were learned during training. We multiply your photo's numbers by these weights:
Photo Vector x Model Weights = Result Vector
[1.2M numbers] x [weights matrix] = [new numbers]
This is just multiplication and addition -- the same operations from elementary school, done millions of times very fast.
Think of it like a recipe
A cake recipe that says "2 cups flour + 1 cup sugar + 3 eggs" is a weighted sum: each ingredient is multiplied by an amount, and the results are added together. Neural networks do the same: multiply inputs by learned weights, add them up, repeat.
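As a rough sketch, here is "multiply by weights and add up" written out by hand. The names weighted_sum and matvec and all the numbers are invented for illustration; real models run the same operation on specialized hardware with far larger tables.

```python
def weighted_sum(inputs, weights):
    """Multiply each input by its weight, then add everything up."""
    return sum(x * w for x, w in zip(inputs, weights))


def matvec(weight_rows, inputs):
    """One weighted sum per row of the weights table gives one output number per row."""
    return [weighted_sum(inputs, row) for row in weight_rows]


inputs = [1.0, 2.0, 3.0]   # stand-in for the photo vector
weights = [                # stand-in for learned weights (made-up values)
    [0.5, -1.0, 0.0],
    [2.0, 0.0, 1.0],
]
print(matvec(weights, inputs))  # [-1.5, 5.0]
```

Three input numbers went in and two new numbers came out, one for each row of weights.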
Step 3: Layer After Layer (Deep Learning)
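We repeat this process through multiple "layers," each one taking the previous layer's output as its input. Between layers, networks typically apply a simple non-linear step (for example, replacing negative numbers with zero); without it, the stacked multiplications would collapse into a single one.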
Each layer extracts more abstract features. Early layers detect simple edges and colors. Later layers recognize complex patterns and objects.
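To make the layering concrete, here is a toy two-layer forward pass. It is a sketch under assumptions: the weights are made up, and the step between layers (called ReLU, which replaces negative numbers with zero) is a common choice in neural networks rather than something confirmed about Vault's model.

```python
def relu(values):
    """Replace negative numbers with zero (a common step between layers)."""
    return [max(0.0, v) for v in values]


def apply_layer(weight_rows, inputs):
    """One layer: a weighted sum of the inputs for each row of weights."""
    return [sum(w * x for w, x in zip(row, inputs)) for row in weight_rows]


def forward(layers, inputs):
    """Feed the inputs through every layer in order."""
    values = inputs
    for index, weight_rows in enumerate(layers):
        values = apply_layer(weight_rows, values)
        if index < len(layers) - 1:  # no ReLU after the final layer
            values = relu(values)
    return values


# Made-up weights: layer 1 maps 2 numbers to 2 numbers, layer 2 maps 2 numbers to 1
layers = [
    [[1.0, -1.0], [0.5, 0.5]],
    [[2.0, 1.0]],
]
print(forward(layers, [3.0, 1.0]))  # [6.0]
print(forward(layers, [1.0, 3.0]))  # [2.0]
```

In the second call, layer 1 produces a negative number, ReLU turns it into zero, and layer 2 only sees what is left.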
Step 4: Read the Output (Classification)
The final layer produces numbers that represent confidence scores for each category. Higher numbers = more confident.
Output: {
"sensitive_content": 0.87, // 87% confident
"safe_content": 0.13 // 13% confident
}
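One common way to turn a final layer's raw numbers into scores that add up to 1 is a function called softmax. The sketch below assumes softmax is used and picks made-up raw scores that happen to reproduce the 0.87 / 0.13 example; the page itself doesn't say how Vault's model produces its scores.

```python
import math


def softmax(scores):
    """Turn raw scores into positive numbers that add up to 1."""
    largest = max(scores)  # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


labels = ["sensitive_content", "safe_content"]
raw_scores = [1.9, 0.0]  # made-up output of the final layer

confidences = dict(zip(labels, softmax(raw_scores)))
for label, score in confidences.items():
    print(f"{label}: {score:.2f}")
```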
Why This Works
During training, the model saw millions of labeled photos:
- "This photo contains sensitive content" -- adjust weights so the sensitive-content score comes out high
- "This photo is safe" -- adjust weights so the sensitive-content score comes out low
After seeing enough examples, the weights converge to values that generalize to new photos. It's pattern recognition through statistics.
Key insight: The model doesn't "understand" photos the way humans do. It learned statistical patterns: "when I see these pixel patterns, the answer is usually X." It's sophisticated pattern matching, not comprehension.
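Here is a heavily simplified sketch of what "adjust weights" can look like. It trains a one-layer model (logistic regression) with gradient descent on four made-up examples. The feature values, learning rate, and step count are all assumptions chosen for illustration; real training uses millions of photos and many layers.

```python
import math


def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, features):
    """Weighted sum of the features, squashed into a score between 0 and 1."""
    return sigmoid(sum(w * x for w, x in zip(weights, features)) + bias)


def train(examples, steps=2000, learning_rate=0.5):
    """Nudge the weights a little after every labeled example."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(steps):
        for features, label in examples:
            error = predict(weights, bias, features) - label  # too high or too low?
            weights = [w - learning_rate * error * x for w, x in zip(weights, features)]
            bias -= learning_rate * error
    return weights, bias


# Made-up examples: (features, label), where 1 = sensitive and 0 = safe
examples = [
    ([0.9, 0.8], 1), ([0.8, 0.9], 1),
    ([0.1, 0.2], 0), ([0.2, 0.1], 0),
]
weights, bias = train(examples)

# Inputs the model never saw during training
print(predict(weights, bias, [0.85, 0.75]))  # close to 1
print(predict(weights, bias, [0.15, 0.25]))  # close to 0
```

The model was never told a rule. It only nudged its weights until its scores matched the labels, and those weights then carry over to similar inputs it hasn't seen.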
Why On-Device Matters
All of this math happens on your iPhone's Neural Engine -- specialized hardware designed for exactly these matrix multiplications. This means:
- Fast: Hardware acceleration brings a scan down to roughly 200 ms per photo
- Private: Your photos never leave your device
- Offline: No internet required -- the model is bundled in the app
Privacy by Design
We can't see your photos because they literally never leave your phone. The math happens entirely on your device. We only ship you the weights (the ~40 MB model file) -- your photos are multiplied by those weights locally.
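For a rough sense of scale, you can estimate how many weights fit in a ~40 MB file. This is back-of-the-envelope arithmetic under assumptions: the page doesn't say how many bytes each weight takes, so the sketch tries two common sizes and ignores any other data stored in the file.

```python
MODEL_FILE_BYTES = 40 * 1_000_000  # "~40 MB" from the text, taking 1 MB as 10^6 bytes


def estimated_weight_count(file_bytes, bytes_per_weight):
    """Upper-bound estimate: assumes the file holds nothing but weights."""
    return file_bytes // bytes_per_weight


# Assumption: weights stored as 32-bit floats (4 bytes each)
print(estimated_weight_count(MODEL_FILE_BYTES, 4))  # 10000000
# Assumption: weights stored as 16-bit floats (2 bytes each)
print(estimated_weight_count(MODEL_FILE_BYTES, 2))  # 20000000
```

Either way, that is on the order of ten to twenty million learned numbers sitting on your phone.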
The Bottom Line
Machine learning sounds fancy, but it's fundamentally:
- Convert your photo to numbers
- Multiply by learned weights (lots of them)
- Add up the results
- Repeat through multiple layers
- Read the final confidence scores
That's it. Linear algebra at scale. The "intelligence" comes from the weights, which were learned by seeing millions of examples during training.
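The five steps fit in a few lines of toy Python. Everything here is a sketch under assumptions: the weights are made up, the 0-to-1 pixel scaling, ReLU, and softmax are common choices rather than confirmed details of Vault's model, and the function name scan is hypothetical.

```python
import math


def scan(pixels, layers, labels):
    # 1. Convert the photo to numbers (scaled to 0-1, a common convention)
    values = [channel / 255 for row in pixels for pixel in row for channel in pixel]
    for index, weight_rows in enumerate(layers):
        # 2 and 3. Multiply by learned weights and add up the results
        values = [sum(w * x for w, x in zip(row, values)) for row in weight_rows]
        # 4. Repeat through the layers, zeroing negatives in between (ReLU)
        if index < len(layers) - 1:
            values = [max(0.0, v) for v in values]
    # 5. Turn the final numbers into confidence scores that add up to 1 (softmax)
    largest = max(values)
    exps = [math.exp(v - largest) for v in values]
    return {label: e / sum(exps) for label, e in zip(labels, exps)}


# A made-up 1x2 "photo" of two red pixels, and made-up weights
photo = [[(255, 0, 0), (255, 0, 0)]]
layers = [
    [[1, 0, 0, 1, 0, 0], [0, 1, 1, 0, 1, 1]],  # 6 numbers in, 2 numbers out
    [[1, -1], [-1, 1]],                        # 2 numbers in, 2 numbers out
]
scores = scan(photo, layers, ["sensitive_content", "safe_content"])
print(scores)
```

The toy weights mean nothing (they simply react to red pixels), but the flow of numbers is the same one described above.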
Now that you understand how detection works, learn about what the confidence scores mean.