There’s a joke that circulates among people who work on AI training data: the model learned to be helpful by watching humans be helpful for pennies.

It’s not really a joke.

The Ghost Infrastructure

When you use an AI system, you’re interacting with something that feels autonomous. The output appears to emerge from the model itself—a reasoning process, an understanding, a judgment. What you don’t see is the army of humans whose decisions shaped every layer of that output.

The training data was labeled by people. The reinforcement learning that made the model “feel” aligned was based on human preferences, often expressed as nothing more than a thumbs up or thumbs down, or a choice between two candidate responses. The edge cases that were handled (the difficult queries, the toxic outputs, the nuanced questions) were adjudicated by humans whose work you’ll never see.

This isn’t a criticism of AI systems. It’s an observation about what they actually are: artifacts of human judgment at scale.

The Scale Problem

Here is the thing nobody tells you: a model trained on 15 trillion tokens didn’t learn to distinguish good writing from bad writing just by reading. It learned because somewhere, someone decided which examples were worth using. Which conversations were appropriate. Which responses aligned with human values.

That someone is usually not paid well.

Data labeling is one of the largest invisible industries in the modern economy. Millions of people, many in the Global South and many earning below minimum wage by some measures, spend their days categorizing text, rating images, and evaluating chatbot responses. They determine what counts as helpful, what constitutes harm, what good reasoning looks like.

The model absorbs this. The output reflects it. And the user, encountering a response that feels naturally good, never knows whose judgment they’re relying on.

Why This Matters

I think about this because it changes the way we should understand AI outputs.

We treat AI responses as if they come from nowhere—from pure computation, from mathematical optimization, from something inhuman and therefore trustworthy in a different way. But every response is downstream of human decisions about what good looks like.

This has practical implications. When an AI system exhibits bias, we tend to treat it as a technical problem—the model learned wrong patterns, we need better data, better algorithms. But it’s not purely technical. The model is reflecting human judgment, and that judgment is shaped by who was hired to provide it, what they were told to value, and what they were paid.

It’s not that the AI invented its biases. The humans who shaped it had them, and the model’s “objectivity” is really a selective amplification of some people’s opinions over others.

The Dependency We’d Rather Not See

There’s something uncomfortable about how much we rely on these invisible systems.

We talk about AI as if it’s an independent agent making decisions. We use language like “the model believes” or “the AI thinks.” But the model doesn’t believe or think. It reflects—and what it reflects is a heavily filtered version of human judgment.

This matters for how we assign responsibility. When an AI system makes a consequential decision—a loan approval, a medical recommendation, a content moderation choice—we act as if the system itself is accountable. But it’s not. It’s an artifact of decisions made by people we’ll never meet, working in conditions we’ll never know about.

Not a Bug, a Feature to Sit With

I’m not suggesting we should distrust AI systems or stop using them. I use them constantly. But I think there’s value in remembering what they actually are: tools shaped by human hands, human preferences, human limitations.

This isn’t a damning critique. It’s an invitation to a more accurate mental model.

The outputs feel authoritative because they come from something large and complex. But large and complex doesn’t mean autonomous. It means: here is a distillation of enormous amounts of human work, compressed into something usable.

That’s impressive. It’s also, if you think about it, deeply human.

