How the AI Works - Vault Support

The Simple Truth About "AI"

When people hear "AI" and "machine learning," they often imagine something mysterious or impossibly complex. Here's the truth: it's just math. Very straightforward math, applied at scale.

The Core Idea: Linear Regression

Remember "draw a line through the dots" from math class? That's essentially what machine learning does -- just with millions of dots and in many dimensions instead of two.

When you learned to draw a "line of best fit" through scattered points on a graph, you were doing a simple form of machine learning. The only difference is scale:

School: Find a line through 10 points in 2 dimensions
Vault: Find a "hyperplane" through millions of points in thousands of dimensions

Same concept. Different scale. That's it.
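To make the "line of best fit" idea concrete, here is a minimal sketch of the school-sized version using ordinary least squares. The function name fit_line and the five data points are made up for illustration; this is not code from Vault.

```python
def fit_line(points):
    """Least-squares line of best fit. Returns (slope, intercept)."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    # Slope = covariance of x and y, divided by the variance of x
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    var_x = sum((x - mean_x) ** 2 for x, _ in points)
    slope = cov_xy / var_x
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Made-up points that lie roughly on the line y = 2x + 1
points = [(0, 1.1), (1, 2.9), (2, 5.2), (3, 6.8), (4, 9.0)]
slope, intercept = fit_line(points)
print(f"y = {slope:.2f}x + {intercept:.2f}")
```

The slope and intercept found here play the same role as the model's weights: numbers chosen so that the line matches the examples as closely as possible.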

What Actually Happens When You Scan a Photo

Step 1: Image to Numbers (Vectorization)

Your photo is just a grid of colored pixels. Each pixel has three numbers (red, green, blue values from 0-255). A 640x640 photo becomes:

640 x 640 x 3 = 1,228,800 numbers

That's your photo as a vector -- just a long list of numbers.

Your Photo (image) -> As Numbers ([0.2, 0.8, ...])
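As a rough sketch of this step, the snippet below flattens a tiny 2x2 "photo" into one list of numbers. The function name flatten_image is hypothetical, and dividing by 255 to get values between 0 and 1 is an assumption about preprocessing (it is a common convention, and it would match the small decimals shown above).

```python
def flatten_image(pixels):
    """Flatten a grid of (red, green, blue) pixels into one long list of numbers."""
    vector = []
    for row in pixels:
        for r, g, b in row:
            # Assumption: scale each 0-255 channel value into the 0.0-1.0 range
            vector.extend([r / 255, g / 255, b / 255])
    return vector

# A made-up 2x2 "photo": each pixel is (red, green, blue)
tiny_photo = [
    [(255, 0, 0), (0, 255, 0)],
    [(0, 0, 255), (51, 204, 102)],
]
vector = flatten_image(tiny_photo)
print(len(vector))     # 2 x 2 x 3 = 12 numbers
print(640 * 640 * 3)   # 1228800 numbers for a 640x640 photo
```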

Step 2: Numbers x Weights = New Numbers (Matrix Multiplication)

The AI model is essentially a giant table of numbers (called "weights") that were learned during training. We multiply your photo's numbers by these weights:

Photo Vector x Model Weights = Result Vector

[1.2M numbers] x [weights matrix] = [new numbers]

This is just multiplication and addition -- the same operations from elementary school, done millions of times very fast.

Think of it like a recipe

If a cake recipe says "2 cups flour + 1 cup sugar + 3 eggs," you're multiplying quantities by weights and adding them up. Neural networks do the same: multiply inputs by learned weights, add them up, repeat.
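The multiply-and-add step can be sketched in a few lines. The three-number vector and the 2x3 weights table below are invented stand-ins for the 1.2M photo numbers and the real model weights; matvec is a hypothetical helper name.

```python
def matvec(weights, vector):
    """Multiply a weights table (a list of rows) by a vector.

    Each output number is one row of weights multiplied element by element
    with the input, then added up.
    """
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

photo_vector = [0.2, 0.8, 0.5]   # stand-in for the photo's numbers
weights = [
    [1.0, 0.0, 2.0],             # row 1 produces the first output number
    [0.5, -1.0, 0.0],            # row 2 produces the second output number
]
result = matvec(weights, photo_vector)
print(result)   # approximately [1.2, -0.7]
```

Row 1 works out to 1.0 x 0.2 + 0.0 x 0.8 + 2.0 x 0.5 = 1.2, which is the recipe idea written as arithmetic.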

Step 3: Layer After Layer (Deep Learning)

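We repeat this process through multiple "layers," with a simple non-linear step between them (for example, setting negative numbers to zero) so that stacked layers can learn more than a single multiplication could: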

Input (Photo) -> Layer 1 (Edges) -> Layer 2 (Shapes) -> Layer 3+ (Objects) -> Output (Detection)

Each layer extracts more abstract features. Early layers detect simple edges and colors. Later layers recognize complex patterns and objects.
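Here is a minimal sketch of stacking layers, with made-up weights and only two layers. It assumes a ReLU step (replace negative numbers with zero) between layers, which is a common choice in deep networks; the description above does not say which function Vault's model actually uses. The names relu, dense, and forward are hypothetical.

```python
def relu(values):
    """Replace negative numbers with zero."""
    return [max(0.0, v) for v in values]

def dense(weights, vector):
    """One layer: multiply by the weights and add up, row by row."""
    return [sum(w * x for w, x in zip(row, vector)) for row in weights]

def forward(layers, vector):
    """Pass the vector through every layer, with ReLU between layers."""
    for i, weights in enumerate(layers):
        vector = dense(weights, vector)
        if i < len(layers) - 1:   # no ReLU after the final layer
            vector = relu(vector)
    return vector

# Made-up weights: layer 1 maps 2 numbers to 3, layer 2 maps 3 numbers to 2
layers = [
    [[1.0, -1.0], [0.5, 0.5], [-1.0, 1.0]],
    [[1.0, 0.0, 1.0], [0.0, 2.0, 0.0]],
]
output = forward(layers, [0.2, 0.8])
print(output)   # approximately [0.6, 1.0]
```

Without the step between layers, two multiplications in a row would collapse into one bigger multiplication, so extra layers would add nothing.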

Step 4: Read the Output (Classification)

The final layer produces one number per category, each representing the model's confidence in that category. A higher number means more confidence.

Output: {
  "sensitive_content": 0.87,  // 87% confident
  "safe_content": 0.13        // 13% confident
}
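One common way to turn a layer's raw numbers into confidence scores that add up to 1 is the softmax function. The sketch below assumes softmax and uses invented raw scores; the text above does not specify how Vault's model produces its final numbers.

```python
import math

def softmax(scores):
    """Turn raw scores into positive numbers that add up to 1."""
    largest = max(scores)   # subtracting the max keeps exp() from overflowing
    exps = [math.exp(s - largest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

labels = ["sensitive_content", "safe_content"]
raw_scores = [1.9, 0.0]     # made-up output of the final layer
confidences = dict(zip(labels, softmax(raw_scores)))
for label, confidence in confidences.items():
    print(f"{label}: {confidence:.2f}")
# sensitive_content: 0.87
# safe_content: 0.13
```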

Why This Works

During training, the model saw millions of labeled photos:

"This photo contains sensitive content" -- adjust weights so the "sensitive_content" score goes up
"This photo is safe" -- adjust weights so the "sensitive_content" score goes down

After seeing enough examples, the weights converge to values that generalize to new photos. It's pattern recognition through statistics.
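The "adjust weights" loop can be illustrated with a toy model that has a single weight and a single bias, trained by gradient descent. Everything here is an assumption for illustration: the one-number "photos," the labels, the learning rate, and the names sigmoid and train. A real model adjusts millions of weights, but the idea of nudging them after each labeled example is the same.

```python
import math

def sigmoid(z):
    """Squash any number into the range 0 to 1."""
    return 1.0 / (1.0 + math.exp(-z))

def train(examples, steps=2000, learning_rate=0.5):
    """Fit one weight and one bias by nudging them after every example."""
    weight, bias = 0.0, 0.0
    for _ in range(steps):
        for x, label in examples:
            prediction = sigmoid(weight * x + bias)
            error = prediction - label   # positive when the score is too high
            weight -= learning_rate * error * x
            bias -= learning_rate * error
    return weight, bias

# Made-up data: one number per "photo"; label 1 = sensitive, 0 = safe
examples = [(0.9, 1), (0.8, 1), (0.7, 1), (0.3, 0), (0.2, 0), (0.1, 0)]
weight, bias = train(examples)

# Two new "photos" the model never saw during training
high_score = sigmoid(weight * 0.85 + bias)
low_score = sigmoid(weight * 0.15 + bias)
print(high_score)   # well above 0.5
print(low_score)    # well below 0.5
```

The last two lines are the "generalize to new photos" part: the inputs 0.85 and 0.15 were not in the training examples, yet the learned weight and bias still score them on the correct side.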

Key insight: The model doesn't "understand" photos the way humans do. It learned statistical patterns: "when I see these pixel patterns, the answer is usually X." It's sophisticated pattern matching, not comprehension.

Why On-Device Matters

All of this math happens on your iPhone's Neural Engine -- specialized hardware designed for exactly these matrix multiplications. This means:

Fast: Hardware acceleration keeps each scan to about 200 ms per photo
Private: Your photos never leave your device
Offline: No internet required -- the model is bundled in the app

Privacy by Design

We can't see your photos because they never leave your phone. The math happens entirely on your device. We only ship you the weights (the ~40 MB model file) -- your photos are multiplied by those weights locally.

The Bottom Line

Machine learning sounds fancy, but it's fundamentally:

Convert your photo to numbers
Multiply by learned weights (lots of them)
Add up the results
Repeat through multiple layers
Read the final confidence scores

That's it. Linear algebra at scale. The "intelligence" comes from the weights, which were learned by seeing millions of examples during training.

Now that you understand how detection works, learn about what the confidence scores mean.