AI doesn't hallucinate your photos. It edits them.

Two experiments that reveal the editorial decisions built into every AI output


Every AI model that looks at an image makes editorial decisions: which details matter, which can be smoothed over, whether to add atmosphere or remove clutter. Most people using AI tools run into these decisions constantly, usually without noticing. I built Signal Drift to make them visible.

I created it for a university presentation on multi-model AI, using personal photos unlikely to appear in any training set: never shared, several taken the same day. Each model had to work from what it actually saw, not what it had memorized. The same patterns emerged across every subject.

Editorialization, not hallucination

Hallucinations are easier to catch because they look wrong: extra fingers, impossible text, objects that shouldn’t exist. Editorialization is subtler. The model sees your photo, then quietly decides what to focus on, what to emphasize, what to leave out. The generator does the same when it renders the description. Nothing looks obviously wrong at either stage.

Hallucinations are a known problem that models are being trained to reduce. Editorialization is harder to fix, because it’s not a bug. It’s a set of choices baked into how each model sees the world.

In a controlled experiment with the original right there, you can see the drift immediately. In everyday use (summarizing a document, describing data, drafting a report), you can’t. The model’s perspective becomes your perspective, silently, unless you think to check.

The experiments

Both experiments work like a telephone game: one model looks at a photo and describes it in words, then a different model generates a new image from that description alone, without ever seeing the original. Whatever gets lost, changed, or added along the way reveals each model’s editorial instincts.

The first experiment holds the description constant and changes the generator. ChatGPT’s vision model (GPT-5.2) writes one detailed description of each photo, and that same description is fed into three generators: ChatGPT (GPT Image), Copilot Designer, and Gemini 3 Pro. The generators never see the original image. Each one receives the description with the same instruction: “Create a single square (1:1) image that matches the description as literally as possible. Do not add new objects or text. Do not change the setting, time of day, or art medium.” Any difference comes purely from the generator.

The second experiment flips it. Three vision models (GPT-5.2, Copilot, and Gemini 3.1 Pro) each look at the same photo and write their own description. All three descriptions are then fed into the same generator (ChatGPT), which never sees the original photo. Same photo, same generator, three different descriptions. Any difference in the output comes entirely from what each vision model chose to notice.
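
If you’d rather script the setup than paste between chat windows, the pipeline is simple enough to sketch. The version below uses the standard OpenAI Python SDK for the ChatGPT steps; the model names are placeholders, and the Copilot and Gemini functions are hypothetical stubs you would wire to however you access those products. This isn’t the exact tooling behind Signal Drift, just the shape of both experiments.

```python
import base64
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment

# The exact constraint sent alongside every description.
CONSTRAINT = (
    "Create a single square (1:1) image that matches the description as "
    "literally as possible. Do not add new objects or text. Do not change "
    "the setting, time of day, or art medium."
)

def describe_openai(image_url: str) -> str:
    """Vision step: one model writes a detailed description of the photo."""
    response = client.chat.completions.create(
        model="gpt-4o",  # placeholder; substitute the vision model you actually use
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": "Describe this photo in exhaustive detail."},
                {"type": "image_url", "image_url": {"url": image_url}},
            ],
        }],
    )
    return response.choices[0].message.content

def render_openai(description: str) -> bytes:
    """Generation step: render from the description alone; the photo is never sent."""
    result = client.images.generate(
        model="gpt-image-1",  # placeholder; substitute the image model you actually use
        prompt=f"{CONSTRAINT}\n\n{description}",
        size="1024x1024",
    )
    return base64.b64decode(result.data[0].b64_json)

# Hypothetical stubs: wire these to however you access Copilot and Gemini.
def describe_copilot(image_url: str) -> str:
    raise NotImplementedError

def describe_gemini(image_url: str) -> str:
    raise NotImplementedError

def render_copilot(description: str) -> bytes:
    raise NotImplementedError

def render_gemini(description: str) -> bytes:
    raise NotImplementedError

def experiment_1(photo_url: str) -> dict[str, bytes]:
    """One describer, three generators: any difference comes from the generator."""
    description = describe_openai(photo_url)
    return {
        "chatgpt": render_openai(description),
        "copilot": render_copilot(description),
        "gemini": render_gemini(description),
    }

def experiment_2(photo_url: str) -> dict[str, bytes]:
    """Three describers, one generator: any difference comes from the description."""
    describers = {
        "chatgpt": describe_openai,
        "copilot": describe_copilot,
        "gemini": describe_gemini,
    }
    return {name: render_openai(describe(photo_url)) for name, describe in describers.items()}
```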

Here’s what came back. The original is always top-left.

A child’s palm tree drawing: three models described it, one generator (ChatGPT) rendered each description
Original | Described by ChatGPT | Described by Copilot | Described by Gemini

A Montreal street mural: three models described it, one generator (ChatGPT) rendered each description
Original | Described by ChatGPT | Described by Copilot | Described by Gemini

A Montreal street festival: one description (by ChatGPT), three generators rendered it
Original | Generated by ChatGPT | Generated by Copilot | Generated by Gemini

Countertop: one description (by ChatGPT), three generators rendered it
Original | Generated by ChatGPT | Generated by Copilot | Generated by Gemini

Basement table: three models described it, one generator (ChatGPT) rendered each description
Original | Described by ChatGPT | Described by Copilot | Described by Gemini

What the editing looks like

Each model makes consistent editorial decisions that show up across subjects. Three patterns stood out.

Cultural erasure

Copilot consistently replaced culturally specific details with generic defaults.

  • Basement table: Copilot listed “Hedbanz,” “Battleship,” “Operation,” “Jenga,” “Double Ditto,” “Laughcronyms,” and “Deep Freeze.” None of these games are on the table. The actual boxes are French STEM toys (Meccano, Découvre la Lumière, Hydraulique en Solaire). It fabricated seven common English board game names out of thin air
  • Mural: Gemini named Beethoven, Armstrong, and Bach. ChatGPT described them as figures with physical details but no names. Copilot reduced them to “several illustrated individuals” sitting on what it called “a large wooden-plank surface,” the same stylized structures ChatGPT read as “piano-key-like bands.” The generator faithfully rendered Copilot’s description: dark, atmospheric figures on wooden benches. Every layer of cultural context (the music, the musicians, the artistic composition) was gone
  • Countertop: Copilot’s generator staged every object with breathing room and added two red apples that don’t exist in the original or the description. No model mentioned apples. The generator invented them because the scene template said they should be there. A lived-in surface became a real estate listing

Copilot describes the pattern, not the content. The schema, not the data. The basement and mural show it happening at the description stage. The countertop shows it happening at the rendering stage. French STEM toys become English board games. Musicians become anonymous figures. An apple appears from nowhere. The tidying happens at both ends of the pipeline.

Researchers call this cultural erasure, and my experiments surfaced what they’ve been documenting across the industry. A study presented at CHI 2026, the leading academic conference on human-computer interaction, found that 71.5% of culturally specific phrasing is erased during AI text processing. The researchers call this “cultural ghosting” and their conclusion is direct: it’s a design choice, not an inevitability. On the visual side, a study in Patterns found that AI image generators converge toward just 12 dominant motifs of “visual elevator music”: polished, commercially safe compositions regardless of the source material.

Researchers building CroissantLLM, a French-English bilingual model, found that major LLMs are up to 40% less efficient on French text, with cultural knowledge skewed toward American events. The Australian government’s Copilot evaluation (7,600 staff, dozens of agencies) found that Copilot was “prioritising western thinking” and had “challenges with using First Nations words, often leading to misspelt places or names.” Microsoft’s own transparency note acknowledges that “the language, image, and audio models that underlie the Copilot experience may include training data that can reflect societal biases.”

The pattern extends beyond language. Stanford researchers found that when generative AI models create stories about learners, Native and Indigenous students are overwhelmingly erased. When they appear at all, they’re depicted as “objects of study rather than learners.” At Concordia, a student in First Peoples studies reported that DALL-E “repeatedly generated stereotypical and racist representations of Black and Indigenous peoples,” with each image taking “several hours of prompting to get past racist, hypersexualized, problematic imagery.” Concordia researchers are now leading an international program to Indigenize AI, arguing that the current trajectory has “extended a legacy of Indigenous erasure.”

I manage web and mobile technology at Concordia University in Montreal, where French and English regularly coexist in meetings, emails, and documents. Like most organizations that run on Microsoft products, we use Copilot as our default AI tool. In my experiments, Copilot fabricated English products that weren’t there and erased the identities of recognizable figures. In a separate test, it anglicized French names on a gym chalkboard, turning “Pierre” into “Pierce” and “Mickael” into “Michael.” That was photos. The same model reads your documents, summarizes your meetings, drafts your emails, and generates your reports. What is it normalizing there?

Privacy safeguards as editorial decisions

OpenAI’s usage policies prohibit “using biometric systems for identification or assessment, including facial recognition.” Its vision models are trained to refuse identification requests, a restriction documented in the GPT-4V system card. In practice, this is an editorial decision baked into the platform.

  • Mural: Three models, three responses to the same identification problem. ChatGPT wrote 605 words of exhaustive physical detail: “a large portrait of a person with wild, swept-back hair and a stern expression.” It described Beethoven’s hair without knowing it was Beethoven. Copilot reduced the same figures to “several illustrated individuals.” Gemini used 383 words and named Beethoven, Armstrong, and Bach outright

Whether ChatGPT and Copilot recognized Beethoven and were told not to say, or genuinely couldn’t tell, the output is the same: a spectrum from total description to total erasure, with not a single name. The policy is invisible in the output. It’s a platform-level editorial decision. The description is the perspective.

Blind users discovered the same problem with Meta’s AI glasses: ask what colour shirt someone is wearing, and the model refuses. OpenAI’s policies also prohibit “inference regarding an individual’s emotions in the workplace and educational settings.” The safeguard is reasonable. The problem is that no one tells you it’s there.

Creative liberty

Gemini is the narrative director. Its strength is cinematic framing: atmospheric details that set a mood before the content arrives.

  • Street festival: Gemini didn’t document the scene. It directed one. The original photo shows wet ground from earlier rain, but clear skies and no umbrellas. Gemini added open umbrellas, implying it was actively raining. The generation prompt explicitly said “Do not add new objects.” Gemini added them anyway. The image is beautiful. It’s just not what happened

But creative liberty isn’t always a flaw.

  • My kid’s drawing: Where Copilot wrote “pink and purple marker strokes,” Gemini wrote “energetic, overlapping, broadly angled marker strokes in varying shades of bright pink, magenta, and light purple.” It noticed the energy of the drawing, not just the content
  • Basement table: Gemini opens with atmosphere before inventory: “a heavily cluttered wooden desk in an unfinished room, likely a basement, with rough, grey concrete walls.” Setting the scene before naming a single object gave the generator a mood to work with, producing the grittiest, most realistic output

Copilot removes character. Gemini adds narrative. Both are forms of editorializing, and both are invisible unless you have the original to compare against. The difference is that Gemini’s instinct sometimes captures what the others flatten.

The editorial choices go deeper than content. The same drawing became oil pastel, digital clip-art, or coloured pencil depending on the describer. Three different descriptions, one generator, three different art media. The describer’s word choices didn’t just determine what was in the image. They determined what it was made of. Copilot used 196 words and the output was clip-art. ChatGPT used 638 and saw “a narrow orange-brown trunk with pink fill and darker brown crisscross lines making diamond shapes.” One saw a trunk. The other saw what a child actually drew.


In this experiment, I had the originals to compare. Cultural details got replaced, identities got withheld, atmosphere got invented. Most of the time, you don’t have the ability, the time, or even the reason to cross-check the original source. Nothing warns you that something changed. The model summarizes your document, describes your data, drafts your report, and you read the output as if it were neutral. It never is.

Try Signal Drift yourself →