Privacy-Preserving ML
I avoided the privacy side of machine learning for an embarrassingly long time. Every time I read about differential privacy, my eyes would glaze over at the Greek letters and I'd tell myself "I'll get to it later." Meanwhile, I was shipping models trained on user data with nothing but a vague "we anonymized it" in the docs. Then a colleague showed me a paper where researchers reconstructed patients' faces from a medical imaging model's gradients. Actual faces. From gradients. The discomfort of not understanding what was going on under the hood finally grew too great, so I dove in. Here is that dive.
Privacy-preserving machine learning is a collection of mathematical and systems techniques for training and deploying models without exposing individual data points. The field traces back to Cynthia Dwork's work on differential privacy in 2006 and Google's introduction of federated learning in 2016, but it has accelerated dramatically since regulations like GDPR (2018) and CCPA (2020) gave it legal teeth. It sits at the intersection of cryptography, statistics, and systems engineering.
Before we start, a heads-up. We're going to be talking about probability bounds, noise distributions, cryptographic protocols, and gradient mechanics, but you don't need to know any of that beforehand. We'll add each piece when we need it, with explanation.
This isn't a short journey, but I hope you'll be glad you came.
Contents
Why "anonymized" doesn't mean private
Privacy attacks: what an adversary actually does
What "private" means mathematically
Adding noise the right way
Privacy leaks with every question you ask
Training a model privately: DP-SGD
Rest stop
What if the data can't leave?
Federated learning is not automatically private
Computing on data you can't see
The law catches up: GDPR, CCPA, and machine unlearning
Synthetic data: fake records, real problems
The fundamental tradeoff
Why "Anonymized" Doesn't Mean Private
Let's start with a story that will ground everything we build in this section. Imagine three hospitals — City General, Valley Medical, and Riverside Clinic — each sitting on thousands of patient records. They want to collaborate on a disease prediction model. The obvious approach: pool all the data in one place, strip the names and social security numbers, and train a model on the combined dataset.
This is exactly what Netflix did in 2006 when they released roughly 100 million movie ratings from about half a million subscribers for a recommendation contest. They stripped usernames and figured that was enough. Researchers Arvind Narayanan and Vitaly Shmatikov cross-referenced the "anonymous" Netflix ratings with public IMDb reviews and re-identified specific users within weeks. The trick was surprisingly low-tech: if someone rated six or eight of the same movies on both platforms around the same dates, the overlap was enough to fingerprint them. Suddenly anonymous user #482,931 had a name, and that name had a viewing history they might prefer to keep private.
Our three hospitals face the same problem. Even without names attached, a rare combination of diagnosis, age, and zip code can uniquely identify someone. A 34-year-old in zip code 02139 with a specific rare condition might be the only person matching that description in the entire dataset. "Anonymized" data is often a polite fiction. The fix isn't better anonymization. It's building systems where individual records are mathematically protected regardless of what an attacker knows.
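To make the "rare combination" point concrete, here's a toy illustration with made-up records: count how many people share each (age, zip code, diagnosis) combination and see how many combinations are one of a kind. The records and fields are invented for illustration.

from collections import Counter

# Made-up "anonymized" records: (age, zip code, diagnosis)
records = [
    (34, "02139", "rare_condition_X"),
    (61, "02139", "hypertension"),
    (61, "02139", "hypertension"),
    (61, "02141", "hypertension"),
    (47, "02139", "diabetes"),
]

counts = Counter(records)
unique = [r for r, c in counts.items() if c == 1]
print(unique)
# Three of the five records are one of a kind — each is a fingerprint that can be
# matched against any outside dataset containing the same attributes.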
That's where this story goes. But first, we need to understand what we're protecting against.
Privacy Attacks: What an Adversary Actually Does
I'll be honest — when I first heard the term "membership inference attack," I thought it sounded academic and impractical. Then I saw one work on a model I'd built. It shook me.
A membership inference attack asks a disarmingly pointed question: "Was this specific data point used to train this model?" The attacker feeds a sample to the model and looks at how the model responds. If the model is more confident on a particular input — lower loss, sharper probability distribution — that input was probably in the training set. Models tend to memorize their training data to some degree. They behave differently on data they've seen before versus data they haven't, and that behavioral gap is the crack an attacker exploits.
Here's a concrete example with our hospitals. Suppose the combined disease prediction model is deployed as an API. An attacker has a patient record — say, Bob's medical history — and wants to know whether Bob was a patient at one of those three hospitals. They feed Bob's record into the model and measure the confidence of the prediction. If the model is suspiciously confident about Bob's outcome compared to random inputs, there's a good chance Bob's data was in the training set. The attacker now knows Bob was treated at one of those hospitals, which might reveal something sensitive.
The original attack by Shokri et al. in 2017 trained "shadow models" that mimic the target model's behavior. You build several models on data you control, observe how they treat members versus non-members, then train a classifier to distinguish the two patterns. Recent work has shown these attacks succeed even with minimal information — sometimes even when you only see the model's top-1 prediction, not the full probability distribution.
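To make the mechanics concrete, here's a heavily simplified sketch of the loss-threshold flavor of the attack. The losses and the way the threshold is picked are invented for illustration — Shokri et al.'s actual procedure trains a classifier on shadow-model outputs rather than using a single cutoff.

import numpy as np

def infer_membership(losses, threshold):
    # Predict "member" when the model's loss on a record is below the threshold
    return np.array(losses) < threshold

# Calibrate the threshold on shadow models we control: there, we know which
# records were members, so we can pick a cutoff that separates the two groups.
shadow_member_losses = np.array([0.05, 0.10, 0.08, 0.12])
shadow_nonmember_losses = np.array([0.90, 0.60, 1.20, 0.75])
threshold = (shadow_member_losses.mean() + shadow_nonmember_losses.mean()) / 2

# Attack the target model: a suspiciously low loss on Bob's record says "member"
print(infer_membership([0.07, 0.95], threshold))  # [ True False]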
Model inversion attacks go further. Instead of asking "was this person in the dataset?", they ask "what did this person's data look like?" The attacker uses the model's outputs to work backward toward the inputs. In a famous 2015 demonstration, Fredrikson et al. reconstructed recognizable face images from a facial recognition model given nothing but the model's API and a person's name. The reconstructed faces weren't perfect, but they were close enough to be disturbing.
And then there's training data extraction from large language models. Carlini et al. at Google showed in 2021 that GPT-2, when prompted with the right seed text, would sometimes regurgitate memorized training data verbatim — email addresses, phone numbers, even snippets of copyrighted books. The model had absorbed these rare, unique strings during training and, given the right nudge, would spit them back out. This isn't a theoretical risk. It's a demonstrated capability on production-scale models.
These three attack categories — membership inference, model inversion, and data extraction — form the threat landscape we need to defend against. Stripping names doesn't help against any of them. We need something stronger.
What "Private" Means Mathematically
Here's the core question: how do we define privacy in a way we can actually measure and guarantee? Hand-waving about "anonymization" and "de-identification" clearly isn't enough.
The answer came from Cynthia Dwork in 2006, and it's called differential privacy. The idea is elegant. Imagine our three hospitals pool their data into two versions of the same dataset: one that includes Bob's record, and one that doesn't. A mechanism (an algorithm, a query, a model) is differentially private if its output looks almost the same regardless of whether Bob is in the dataset or not. An adversary staring at the output can't tell, with any meaningful confidence, whether Bob participated.
Let's build this up with a tiny example. Suppose City General has exactly three patients, and we want to publish the average age. The patients are aged 30, 50, and 70. The true average is 50. Now remove the 70-year-old (imagine they requested deletion). The new average is 40. An adversary who knows the other two ages and sees the published average change from 50 to 40 can immediately deduce that the removed patient was 70. With three patients, every individual has massive influence on the output.
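In code, this differencing attack is two lines of arithmetic — the numbers are the ones from the example above:

ages = [30, 50, 70]
avg_before = sum(ages) / len(ages)          # 50.0, published
avg_after = sum(ages[:2]) / len(ages[:2])   # 40.0, published after the deletion
recovered_age = 3 * avg_before - 2 * avg_after
print(recovered_age)                        # 70.0 — the deleted patient's exact age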
Differential privacy formalizes this vulnerability. The mathematical statement looks intimidating at first, but it's pinning down something intuitive:
P[M(D) ∈ S] ≤ e^ε · P[M(D′) ∈ S] + δ
D and D′ are datasets that differ by exactly one person's record. M is the mechanism — whatever computation we're performing. S is any set of possible outputs. That's the whole definition. Let's unpack the two knobs.
Epsilon (ε) controls how much the output can shift when one person's data is added or removed. Think of it as the thickness of a soundproof wall between your data and the outside world. When ε = 0, the wall is infinitely thick — the output is pure random noise, completely useless but perfectly private. When ε = ∞, there's no wall at all — the raw data leaks through. Practical deployments live somewhere in between. Apple uses ε between 2 and 8 for emoji frequency statistics. The 2020 US Census used ε ≈ 19.6, which was controversial — many statisticians argued the wall was too thin.
Delta (δ) is the probability that the whole guarantee fails catastrophically. It's a safety valve for the rare case where the mechanism accidentally leaks everything. You want this to be negligible — typically much smaller than 1/n, where n is the dataset size. A common choice is δ = 1/n². For our three-hospital dataset of, say, 10,000 patients, that would be δ = 10⁻⁸.
Back to our tiny example. With three patients and a true average of 50, any change in the average is highly informative. With 10,000 patients, adding or removing one person barely moves the average. Differential privacy works much better when datasets are large, because each individual's influence is naturally diluted. The noise we add just has to cover whatever residual influence remains.
I'll be using the soundproof wall analogy throughout. ε is the wall's thickness. Every time we make a query or train another epoch, we're drilling a small hole in that wall. The question is always: how many holes can we drill before the wall stops being useful?
Adding Noise the Right Way
Knowing the definition is one thing. Actually making a computation differentially private requires adding carefully calibrated noise. Too little noise and privacy leaks. Too much and the results are garbage. The calibration depends on a property called sensitivity — how much can a single person's data change the output?
Let's go back to our hospital average age example, but now with all 10,000 patients. If ages range from 0 to 120, the sensitivity of the average query is (120 - 0) / 10,000 = 0.012. That's the maximum amount the average can shift when one record changes. A query on a small dataset of 3 patients would have sensitivity (120 - 0) / 3 = 40. The smaller the sensitivity, the less noise we need to add to protect privacy.
The Laplace mechanism is the simplest approach. Take the true answer, add noise drawn from a Laplace distribution with scale equal to sensitivity/ε. It satisfies what's called "pure" (ε, 0)-differential privacy — the δ is exactly zero, meaning no probability of catastrophic failure. For our hospital example with 10,000 patients and ε = 1, we'd add Laplace noise with scale 0.012. The published average would be within a fraction of a year of the truth. That's a small price for mathematical privacy guarantees.
The Gaussian mechanism adds noise from a normal distribution instead. It satisfies the slightly weaker (ε, δ)-DP, meaning there's a tiny δ probability of failure. But here's why people use it anyway: it composes much more tightly across multiple queries. When you're going to run hundreds of computations on the same data — which is exactly what happens during model training — the Gaussian mechanism lets you get away with less total noise.
import numpy as np

def laplace_mechanism(true_answer, sensitivity, epsilon):
    scale = sensitivity / epsilon
    return true_answer + np.random.laplace(0, scale)

def gaussian_mechanism(true_answer, sensitivity, epsilon, delta):
    sigma = sensitivity * np.sqrt(2 * np.log(1.25 / delta)) / epsilon
    return true_answer + np.random.normal(0, sigma)

# Our hospital example: average age of 10,000 patients
true_avg_age = 47.3
sensitivity = 120.0 / 10000  # max age range / number of patients

# With epsilon = 1.0, the noise is tiny relative to the answer
dp_avg_laplace = laplace_mechanism(true_avg_age, sensitivity, epsilon=1.0)
dp_avg_gaussian = gaussian_mechanism(true_avg_age, sensitivity, epsilon=1.0, delta=1e-8)

print(f"True average age: {true_avg_age:.1f}")
print(f"Laplace DP average: {dp_avg_laplace:.1f}")
print(f"Gaussian DP average: {dp_avg_gaussian:.1f}")
# Both will be very close to 47.3 — the large dataset makes this easy
The noise scale tells the whole story. For our 10,000-patient hospital at ε = 1, the Laplace scale is 0.012 years — about 4 days. The published average age would be off by, at most, a few days. Nobody would notice. Now imagine the same query on a 3-patient dataset: the scale becomes 40 years. The noise would completely obliterate the answer. This is the fundamental tension of differential privacy: it works beautifully at scale and struggles terribly with small datasets.
Privacy Leaks with Every Question You Ask
Here's where things get uncomfortable. Every time you query a differentially private dataset, you spend some of your privacy budget. Ask one question at ε = 1, and you've used 1 unit. Ask a hundred questions, and the privacy guarantee degrades. Our soundproof wall gets another hole drilled in it with each query.
The simplest accounting is called basic composition: k queries at ε each cost kε total. A hundred queries at ε = 1 each give you ε = 100 total — which is terrible privacy. Basically no wall left.
Fortunately, basic composition is pessimistic. It assumes the worst case every time. Advanced composition gives a tighter bound: k queries at ε each cost roughly ε√(2k · ln(1/δ)) total, plus an additional kε(e^ε − 1) term, so the square-root savings really pays off when each query's ε is small and k is large. For our 100 queries, the square-root term is roughly ε√(200 · 18) ≈ 60ε instead of 100ε. Better, but the square root is doing a lot of work.
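A rough calculator makes the crossover visible. This is a sketch using the formulas above; the per-query ε = 0.1 and δ′ = 10⁻⁸ are illustrative choices, not recommendations.

import numpy as np

def basic_composition(eps, k):
    return k * eps

def advanced_composition(eps, k, delta_prime=1e-8):
    # sqrt term plus the k*eps*(e^eps - 1) correction mentioned above
    return eps * np.sqrt(2 * k * np.log(1 / delta_prime)) + k * eps * (np.exp(eps) - 1)

for k in (10, 100, 1000):
    print(k, basic_composition(0.1, k), round(advanced_composition(0.1, k), 2))
# Advanced composition only wins once k is large enough: at k = 1000 it reports
# a total of roughly 30 instead of 100.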
The tightest tracking comes from Rényi Differential Privacy and the moments accountant, introduced by Abadi et al. in 2016. Instead of tracking privacy loss as a single number, the moments accountant tracks the entire distribution of privacy loss across training steps. This approach is what makes DP model training practical at all — without it, even a few dozen epochs would blow through any reasonable privacy budget. The Opacus library uses this internally, which is why it can train models for hundreds of epochs and still report meaningful (ε, δ) guarantees at the end.
I'm still developing my intuition for why the moments accountant works so much better than simple composition. The high-level idea is that worst-case privacy losses are unlikely to all stack up simultaneously — the accountant exploits the fact that random noise tends to cancel out over many steps. But I won't pretend I find the proof intuitive.
Training a Model Privately: DP-SGD
Now we reach the payoff. We know how to add noise to a single query. But training a neural network involves thousands of gradient updates, each one computed from training data. How do we make that entire process differentially private?
The answer is DP-SGD (Differentially Private Stochastic Gradient Descent), introduced by Abadi et al. in the same 2016 paper. It modifies standard SGD in two ways, and both are essential.
First, per-example gradient clipping. In standard SGD, you compute the gradient for each example in a batch, average them, and update the model. In DP-SGD, before averaging, you clip each individual gradient to have a maximum norm C. If one patient's record produces a gradient with norm 50 while the clipping threshold is C = 1, that gradient gets scaled down to norm 1. This bounds the sensitivity — no single training example can influence the update by more than C.
Second, Gaussian noise addition. After clipping and averaging the gradients, you add Gaussian noise calibrated to the clipping threshold. The noise scale is σ = noise_multiplier × C / batch_size. This makes each gradient update differentially private.
Let's trace through one step with our hospital model. Suppose we have a batch of 64 patient records. We compute 64 individual gradients. Patient #17 has a rare condition that produces an unusually large gradient (norm = 12.5). We clip it to norm C = 1.0, which divides it by 12.5. All 64 clipped gradients get averaged, then we add Gaussian noise with scale σ × 1.0 / 64. The model updates. Nobody's individual record dominated the step, and the noise ensures the step looks statistically similar whether patient #17 was in the batch or not.
import torch
import torch.nn as nn

def dp_sgd_step(model, loss_fn, batch_x, batch_y,
                max_grad_norm=1.0, noise_multiplier=1.1, lr=0.01):
    per_example_grads = []
    for x_i, y_i in zip(batch_x, batch_y):
        model.zero_grad()
        loss = loss_fn(model(x_i.unsqueeze(0)), y_i.unsqueeze(0))
        loss.backward()
        grads = [p.grad.clone() for p in model.parameters()]
        per_example_grads.append(grads)

    # Clip: bound each example's influence
    clipped_grads = []
    for grads in per_example_grads:
        total_norm = torch.sqrt(sum(g.norm() ** 2 for g in grads))
        clip_factor = min(1.0, max_grad_norm / (total_norm + 1e-8))
        clipped_grads.append([g * clip_factor for g in grads])

    # Noise: make the update indistinguishable
    batch_size = len(batch_x)
    noise_scale = noise_multiplier * max_grad_norm / batch_size
    with torch.no_grad():
        for i, param in enumerate(model.parameters()):
            avg_grad = torch.stack([g[i] for g in clipped_grads]).mean(dim=0)
            noise = torch.normal(0, noise_scale, size=avg_grad.shape)
            param -= lr * (avg_grad + noise)
The clipping does the heavy lifting for sensitivity bounding. The noise does the heavy lifting for indistinguishability. Together, they make each update (ε, δ)-differentially private. The moments accountant tracks the cumulative cost across all training steps.
There's a real cost. DP-SGD typically drops accuracy by 2 to 10 percentage points compared to standard training. On CIFAR-10, state-of-the-art DP models at ε = 8 reach about 96% accuracy — leaning heavily on pre-training with public data — versus 99% without DP. The gap shrinks as datasets grow, because the noise gets averaged over more examples. On small medical datasets — like our three hospitals might have — the gap can be devastating. I've seen cases where DP-SGD reduced a model from "clinically useful" to "no better than guessing." Always benchmark your specific task before committing to a privacy budget.
For production work, use Opacus (PyTorch) or TF Privacy (TensorFlow). They handle per-example gradient computation through clever hooks, clip and noise automatically, and track the privacy budget via the moments accountant so you can report exact (ε, δ) at the end of training.
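Here's roughly what the Opacus workflow looks like on toy data. This is a sketch written from memory of the Opacus 1.x API (PrivacyEngine.make_private and get_epsilon) — check the library's current docs before relying on it, and treat the toy model, data, and hyperparameters as placeholders.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset
from opacus import PrivacyEngine

# Toy stand-in for patient records: 1,000 samples, 20 features, binary label
features = torch.randn(1000, 20)
labels = torch.randint(0, 2, (1000,))
loader = DataLoader(TensorDataset(features, labels), batch_size=64)

model = nn.Sequential(nn.Linear(20, 16), nn.ReLU(), nn.Linear(16, 2))
optimizer = torch.optim.SGD(model.parameters(), lr=0.05)
loss_fn = nn.CrossEntropyLoss()

privacy_engine = PrivacyEngine()
model, optimizer, loader = privacy_engine.make_private(
    module=model, optimizer=optimizer, data_loader=loader,
    noise_multiplier=1.1,   # sigma
    max_grad_norm=1.0,      # clipping threshold C
)

for epoch in range(5):
    for x, y in loader:
        optimizer.zero_grad()
        loss_fn(model(x), y).backward()
        optimizer.step()

print(privacy_engine.get_epsilon(delta=1e-8))  # the epsilon you report, at this delta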
Rest Stop
Congratulations on making it this far. You can stop here if you want.
You now have a working mental model of how privacy attacks work (membership inference, model inversion, data extraction), what differential privacy means (ε controls how much one person's data can influence the output, δ is the probability the guarantee fails), and how to train a model with formal privacy guarantees (DP-SGD: clip gradients, add noise, track the budget).
That's genuinely useful. If someone asks "how do we make this model private?" you can give a real answer, not a hand-wave about anonymization.
But there's a problem we haven't addressed: our three hospitals can't necessarily pool their data at all, even with DP. Maybe regulations forbid patient records from leaving the hospital network. Maybe the data is too large to transfer. Maybe the hospitals don't trust each other. What do we do when the data can't move?
There's also the question of what happens when a patient demands their data be deleted from a trained model, or when we need to compute on encrypted data without ever decrypting it. Those are the problems the rest of this section tackles.
If the discomfort of those open questions is nagging at you, read on.
What if the Data Can't Leave?
Back to our hospitals. City General has 5,000 cardiology records. Valley Medical has 3,000. Riverside Clinic has 2,000. Regulations say patient data cannot leave the hospital's network. A centralized DP-SGD approach is off the table because there's no central dataset to train on.
The insight behind federated learning is to flip the script: instead of sending data to the model, send the model to the data. The algorithm, called FedAvg (Federated Averaging), works like this.
A central server holds a global model — say, an initial random neural network. It sends a copy to each hospital. Each hospital trains the model on its local data for a few epochs, producing a slightly different version. The hospitals send their updated models (not their data) back to the server. The server averages the three updates, weighted by how much data each hospital contributed. This averaged model becomes the new global model, and the cycle repeats.
Let's trace through one round. The server sends model version v0 to all three hospitals. City General trains on its 5,000 records for 5 epochs, producing v1_city. Valley Medical produces v1_valley. Riverside produces v1_riverside. The server computes a weighted average: (5000 × v1_city + 3000 × v1_valley + 2000 × v1_riverside) / 10000. That becomes v1_global. Send it back out. Repeat for 50 or 100 rounds.
import torch
import torch.nn as nn
from copy import deepcopy

class FederatedServer:
    def __init__(self, model):
        self.global_model = model

    def aggregate(self, client_states, client_sizes):
        total = sum(client_sizes)
        new_state = {}
        for key in self.global_model.state_dict():
            new_state[key] = sum(
                client_states[i][key] * (client_sizes[i] / total)
                for i in range(len(client_states))
            )
        self.global_model.load_state_dict(new_state)

def local_train(model_state, model_fn, data, labels, epochs=5, lr=0.01):
    model = model_fn()
    model.load_state_dict(model_state)
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(data), labels)
        loss.backward()
        optimizer.step()
    return deepcopy(model.state_dict())

# One round: server distributes, hospitals train, server aggregates
# No patient record ever leaves its hospital
No patient record crosses the hospital's network boundary. City General's patients never learn that Valley Medical exists, and vice versa. The only thing shared is model parameters — numbers that describe the model's weights, not individual data points.
But there's a statistical wrinkle. Our three hospitals have very different patient populations. City General is in a dense urban area — lots of hypertension, diabetes, gunshot wounds. Riverside Clinic is rural — more agricultural injuries, fewer specialists. The data at each hospital is non-IID (not independently and identically distributed), and FedAvg's simple averaging works best when data is roughly uniform across clients. When data is highly heterogeneous, FedAvg can diverge: the local models pull in such different directions that the average ends up in a no-man's-land that works poorly for everyone.
FedProx is the standard fix. It adds a penalty term to each hospital's local training: μ/2 · ||w − w_global||². Think of it as an elastic band connecting each hospital's model to the global model. The hospital can drift away from the global model to fit its local data, but the elastic band pulls it back, preventing it from wandering too far. Setting μ between 0.01 and 1.0 works for most cases.
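Here's a minimal sketch of what that looks like in code — the same shape as local_train above, with the proximal term added to the loss. The μ = 0.1 default and the full-batch training loop are illustrative choices, not FedProx's reference implementation.

import torch
import torch.nn as nn

def fedprox_local_train(model, global_model, data, labels, mu=0.1, epochs=5, lr=0.01):
    # Snapshot of the global weights the elastic band pulls toward
    global_params = [p.detach().clone() for p in global_model.parameters()]
    optimizer = torch.optim.SGD(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    for _ in range(epochs):
        optimizer.zero_grad()
        loss = loss_fn(model(data), labels)
        # Proximal term: mu/2 * ||w - w_global||^2 penalizes drifting too far
        prox = sum(((p - g) ** 2).sum() for p, g in zip(model.parameters(), global_params))
        (loss + 0.5 * mu * prox).backward()
        optimizer.step()
    return model.state_dict()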
There's also a fundamental communication cost. Every round, each hospital uploads and downloads a full model. For a model with 100 million parameters at 32-bit precision, that's 400 MB in each direction, per round, per hospital. Over 100 rounds, that's 40 GB uploaded (and another 40 GB downloaded) per hospital. Techniques like gradient compression (sending only the top-k largest gradients — sketched below), quantization (reducing precision from 32-bit to 8-bit), and fewer but longer local rounds can help, but they all interact with the non-IID problem in ways that require careful tuning.
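For a sense of how simple the compression side can be, here's a sketch of top-k gradient sparsification. The 1% keep-rate is an illustrative assumption; real systems also track the error left behind and feed it back into later rounds.

import torch

def top_k_sparsify(tensor, keep_ratio=0.01):
    flat = tensor.flatten()
    k = max(1, int(flat.numel() * keep_ratio))
    # Keep only the k largest-magnitude entries; transmit just their values and indices
    _, indices = torch.topk(flat.abs(), k)
    sparse = torch.zeros_like(flat)
    sparse[indices] = flat[indices]
    return sparse.view_as(tensor)

# A 100-million-parameter update compressed to its top 1% shrinks the payload
# by roughly 100x (minus the overhead of sending the indices).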
Federated Learning Is Not Automatically Private
I need to correct an oversimplification. I said the hospitals send "model parameters, not individual data points," and that's true. But model parameters can leak individual data points.
In 2019, Zhu et al. demonstrated a gradient inversion attack that reconstructed individual training images from a single gradient update with startling fidelity. The attacker starts with a random image, computes what gradient that fake image would produce, compares it to the real gradient the hospital shared, and iteratively adjusts the fake image until the gradients match. When they match, the fake image looks remarkably like the real training image. Actual faces. From gradients.
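Here's a minimal sketch of the idea, in the spirit of Zhu et al.'s "Deep Leakage from Gradients." The tiny model, dimensions, and optimizer settings are illustrative, and I'm simplifying by assuming the attacker already knows the label (the full attack recovers labels too).

import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Sequential(nn.Linear(32, 16), nn.ReLU(), nn.Linear(16, 2))
loss_fn = nn.CrossEntropyLoss()
params = tuple(model.parameters())

# The record a hospital used for one update. The attacker never sees it —
# only the gradient the hospital shared.
real_x, real_y = torch.randn(1, 32), torch.tensor([1])
shared_grads = torch.autograd.grad(loss_fn(model(real_x), real_y), params)

# Attacker: start from noise and adjust it until its gradient matches the shared one
dummy_x = torch.randn(1, 32, requires_grad=True)
optimizer = torch.optim.LBFGS([dummy_x])

def closure():
    optimizer.zero_grad()
    dummy_grads = torch.autograd.grad(
        loss_fn(model(dummy_x), real_y), params, create_graph=True)
    mismatch = sum(((dg - sg) ** 2).sum() for dg, sg in zip(dummy_grads, shared_grads))
    mismatch.backward()
    return mismatch

for _ in range(50):
    optimizer.step(closure)

# As the gradients converge, dummy_x drifts toward the real record
print(torch.norm(dummy_x - real_x).item())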
So "data stays on device" does not equal "data is private." Our soundproof wall has a crack: the gradient updates themselves carry information about individual records.
Secure aggregation patches one part of this problem. It ensures the central server sees only the sum of all hospitals' updates, never an individual hospital's update. The technique uses cryptographic masking: City General adds a random mask +r₁ to its update, and Valley Medical adds -r₁. When the server sums the two updates, the masks cancel out, leaving the true aggregate. The server never sees either individual update. Google deployed this for their Gboard keyboard predictions in 2017.
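A toy two-party version of the masking trick looks like this. Real secure aggregation (Bonawitz et al., 2017) derives the pairwise masks from key agreement and handles clients dropping out mid-round; here the shared mask r1 and the three-dimensional updates are just illustrative.

import numpy as np

rng = np.random.default_rng(0)

city_update = np.array([0.12, -0.40, 0.77])    # City General's model update
valley_update = np.array([0.05, 0.31, -0.22])  # Valley Medical's model update

r1 = rng.normal(size=3)             # pairwise mask the two hospitals agree on
masked_city = city_update + r1      # what the server receives from City General
masked_valley = valley_update - r1  # what the server receives from Valley Medical

aggregate = masked_city + masked_valley  # the masks cancel in the sum
assert np.allclose(aggregate, city_update + valley_update)
# The server learns the sum, but each masked update on its own looks like random noise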
But secure aggregation alone isn't enough. An adversary who controls the server and two of the three hospitals can still infer the third hospital's update by subtracting. For real privacy guarantees, you need all three layers: federated learning (keep data local) + secure aggregation (hide individual updates) + differential privacy noise on each update (make individual contributions statistically invisible). It's belts, suspenders, and safety pins.
Computing on Data You Can't See
Secure aggregation is a special case of a more general idea: what if multiple parties could jointly compute a result without any party seeing the others' inputs? This is secure multi-party computation (SMPC).
The core technique is secret sharing. Suppose City General has a patient count it wants to contribute to a joint computation without revealing the exact number to the other hospitals. It splits the number into random "shares" — for instance, the number 5000 becomes three shares: 1847, -923, and 4076 (which sum to 5000). One share goes to each hospital. No single hospital can reconstruct the original number. But when the hospitals combine their shares at the end, the computation comes out correct.
Addition on secret shares is free — each party adds their shares locally, and the results sum correctly. Multiplication is harder — it requires a round of communication between parties, which is where the overhead comes from. For simple operations like aggregation and linear regression, SMPC is practical. For training a neural network with billions of parameters, the communication overhead makes it impractical — every multiplication in every forward and backward pass would need inter-party communication.
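Here's a toy additive secret-sharing sketch for the patient-count example. Working modulo a large prime and splitting into exactly three shares are illustrative choices; real SMPC frameworks add machinery for multiplication, malicious parties, and dropouts.

import random

PRIME = 2**61 - 1  # all arithmetic happens modulo a large prime

def share(secret, n_parties=3):
    shares = [random.randrange(PRIME) for _ in range(n_parties - 1)]
    shares.append((secret - sum(shares)) % PRIME)  # shares sum to the secret
    return shares

def reconstruct(shares):
    return sum(shares) % PRIME

# Each hospital secret-shares its patient count across all three hospitals
counts = {"city": 5000, "valley": 3000, "riverside": 2000}
all_shares = {name: share(c) for name, c in counts.items()}

# Party i adds up the i-th share of every count locally (addition is "free")...
local_sums = [sum(all_shares[name][i] for name in counts) % PRIME for i in range(3)]

# ...and combining the three local sums reveals only the total, never any input
assert reconstruct(local_sums) == sum(counts.values())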
Then there's the dream scenario: homomorphic encryption (HE). With HE, you can perform computations directly on encrypted data. A hospital encrypts its patient records, sends the ciphertext to a cloud server, the server runs the model on the encrypted data, and sends back an encrypted result. The hospital decrypts the result. At no point did the server see the actual patient data.
I'll be honest — the first time I read about fully homomorphic encryption, I didn't believe it could work. Computing on encrypted data sounds like magic. The reality is less magical: fully homomorphic encryption is 1,000 to 1,000,000 times slower than plaintext computation. A neural network inference that takes 10 milliseconds in plaintext might take 10 seconds to 3 hours under FHE. Libraries like Microsoft SEAL and Zama's Concrete-ML have made impressive progress, especially for simple models — logistic regression on encrypted medical records is practical today. But running a large transformer on encrypted data? Not yet.
The practical hierarchy today looks like this: for model training, use federated learning + DP. For secure aggregation and small computations, SMPC works well with 2 to 10 parties. For inference on sensitive data where latency is acceptable, HE is becoming viable. Each technique protects against different threats, and production systems often layer them.
The Law Catches Up: GDPR, CCPA, and Machine Unlearning
Everything we've built so far addresses a technical question: how do we protect privacy mathematically? But there's also a legal question that has real consequences for system design.
The GDPR (General Data Protection Regulation), which took effect in the EU in 2018, established several principles that directly impact ML engineering. Data minimization means you can only collect features you actually need — that 200-column user profile table requires justification for every column. Purpose limitation means data collected for disease prediction can't be repurposed for insurance scoring without new consent. Right to explanation (Article 22) means automated decisions with significant effects require meaningful explanation — which ties directly to interpretability.
But the most technically challenging requirement is Article 17: the right to erasure. A patient at City General can demand that their data be deleted. Deleting the row from a database is easy. But what about the model that was trained on their data? The model's weights were shaped by that patient's record. In some sense, the model "remembers" the patient. Does it need to be retrained from scratch?
For our three-hospital consortium, retraining the federated model from scratch every time a patient exercises their erasure rights could cost tens of thousands of dollars in compute and take days. That's the machine unlearning problem, and it's arguably the hardest open challenge in privacy engineering.
The CCPA (California Consumer Privacy Act) creates similar requirements in the US. While it's less prescriptive than GDPR about what "deletion" means for models, the trajectory is clear: regulators worldwide are moving toward requiring that individuals can be removed from trained models.
The cleverest engineering solution is SISA training (Sharded, Isolated, Sliced, and Aggregated). Instead of training one big model on all the data, you partition the data into, say, 10 shards and train a separate sub-model on each shard. Predictions come from ensembling all 10 sub-models. When a patient requests deletion, you only retrain the single shard that contained their data — one-tenth the cost of a full retrain.
import torch
import torch.nn as nn

class SISATrainer:
    def __init__(self, model_fn, num_shards=10):
        self.model_fn = model_fn
        self.num_shards = num_shards
        self.shard_models = [None] * num_shards  # one sub-model slot per shard
        self.record_to_shard = {}

    def assign_record(self, record_id, shard_idx):
        self.record_to_shard[record_id] = shard_idx

    def train_shard(self, shard_idx, data, labels, epochs=10, lr=0.01):
        model = self.model_fn()
        optimizer = torch.optim.SGD(model.parameters(), lr=lr)
        loss_fn = nn.CrossEntropyLoss()
        for _ in range(epochs):
            optimizer.zero_grad()
            loss = loss_fn(model(data), labels)
            loss.backward()
            optimizer.step()
        self.shard_models[shard_idx] = model

    def predict(self, x):
        logits = []
        for m in self.shard_models:
            m.eval()
            with torch.no_grad():
                logits.append(m(x))
        return torch.stack(logits).mean(dim=0)

    def unlearn(self, record_id, remaining_data, remaining_labels):
        shard_idx = self.record_to_shard.pop(record_id)
        # Retrain only the affected shard — 1/10th the compute
        self.train_shard(shard_idx, remaining_data, remaining_labels)
Exact unlearning means the retrained model is identical to one that never saw the deleted record. SISA achieves this. Approximate unlearning uses gradient-based methods to "nudge" the model away from the deleted data point's influence — much faster, but hard to verify. Regulators haven't yet settled on which standard they'll require, and my favorite thing about this corner of the field is that the legal and technical definitions of "forgotten" are still being actively negotiated.
Synthetic Data: Fake Records, Real Problems
There's an appealing idea floating around: instead of protecting real data, why not generate fake data that has the same statistical properties? Train a generative model — a GAN, a VAE, a diffusion model — on the real hospital records, then publish the synthetic dataset instead. The synthetic records don't correspond to real patients, so there's nothing to re-identify.
In theory, this sounds like a free lunch. In practice, it's more like a discounted lunch with hidden fees.
Without differential privacy in the training process, the generative model itself can memorize individual records. A membership inference attack on the GAN can reveal whether a specific patient's data was used to train it. The synthetic data looks different from the real data, but the model that produced it still carries the information. A DP-GAN — a GAN trained with DP-SGD — provides formal guarantees, but at a cost: synthetic data quality drops significantly, with downstream model accuracy falling 5 to 20 percentage points at ε between 2 and 5.
Synthetic data is genuinely useful for one thing: development and testing. Engineers can build and debug pipelines against synthetic records without ever touching real patient data. But for regulatory compliance or publishable analysis, you need either differential privacy guarantees on the generative model or a formal privacy analysis showing the synthetic data doesn't leak. Think of synthetic data as a development convenience, not a privacy guarantee.
The Fundamental Tradeoff
Everything in this section comes back to one tension: privacy and utility pull in opposite directions. Every technique we've covered — DP noise, gradient clipping, federated training, encryption overhead — trades away some model performance or computational efficiency for privacy protection.
Our three hospitals face this directly. With no privacy protections, they could pool all records and train the best possible model. With DP-SGD at ε = 1 (strong privacy), the model might lose 8 percentage points of accuracy — potentially the difference between clinically useful and clinically useless. With federated learning, the non-IID data distribution adds convergence challenges on top of whatever DP noise they add. With homomorphic encryption, inference takes minutes instead of milliseconds.
There is no magic setting that gives you perfect privacy and perfect accuracy. But there are ways to push the frontier. Larger datasets help enormously — noise gets averaged over more examples. Better privacy accounting (Rényi DP, the moments accountant) lets you squeeze more utility out of the same privacy budget. Pre-training on public data and fine-tuning with DP on private data is emerging as a powerful pattern — the model learns general features from public data and only needs a small DP-protected fine-tuning step on the sensitive data.
The honest answer for where to set ε is: it depends on what you're building, who you're protecting, and what happens if privacy fails. A medical diagnosis model where leakage could reveal HIV status demands a very different ε than an emoji frequency counter. There is no universal right answer, and anyone who tells you otherwise is selling something.
Closing Thoughts
If you're still with me, thank you. I hope it was worth it.
We started with the uncomfortable truth that "anonymized" data isn't anonymous — the Netflix attack proved that in 2006, and the same class of attack still works today. We built up the mathematical language of differential privacy from a tiny three-patient example, added noise mechanisms, and tracked how privacy degrades with use. We learned to train models privately with DP-SGD, discovered that federated learning isn't automatically private, explored cryptographic techniques for computing on data we can't see, wrestled with the legal reality of GDPR's right to erasure, and stared at the fundamental tension between privacy and utility.
My hope is that the next time someone on your team says "we'll anonymize the data," instead of nodding along, you'll ask "what's our epsilon?" — having a pretty solid mental model of what's going on under the hood, and what the tradeoffs actually are.
Resources
Dwork & Roth, "The Algorithmic Foundations of Differential Privacy" (2014) — The definitive textbook on DP. Dense but complete. If you read one thing on the theory, make it this. Available free online.
Abadi et al., "Deep Learning with Differential Privacy" (2016) — The paper that introduced DP-SGD and the moments accountant. The reason private model training is practical at all.
McMahan et al., "Communication-Efficient Learning of Deep Networks from Decentralized Data" (2017) — The FedAvg paper. Still the foundation of every federated learning system deployed at scale.
Carlini et al., "Extracting Training Data from Large Language Models" (2021) — The paper that made everyone take LLM memorization seriously. Unforgettable demonstrations of GPT-2 regurgitating phone numbers and addresses.
Opacus (opacus.ai) — Meta's PyTorch library for DP-SGD. Production-quality code with excellent tutorials. Start here if you want to train a differentially private model this afternoon.
Narayanan & Shmatikov, "Robust De-anonymization of Large Sparse Datasets" (2008) — The Netflix deanonymization paper. A masterclass in why removing names from datasets doesn't protect anyone.