How I built Signal Drift
How I built an app about AI, using AI, to explain AI
Every year, Concordia University’s Library hosts Digital Skill-Share Days, a 2-day collaborative learning event where staff and faculty share digital skills with each other. It started in 2020 during the pandemic to meet the demand for digital upskilling, and it’s been running ever since.
The 2026 edition featured a Lightning Round on Artificial Intelligence and Data on Tuesday, February 24. Four speakers, 5 minutes each to present plus 5 minutes for Q&A, covering everything from the environmental cost of AI to how data work is changing.
I manage web and mobile technology at Concordia, and the title I’d committed to was: “The Multi-Model Approach: Getting Better Answers by Switching Perspectives.” My thesis was straightforward: using multiple AI models intentionally, switching between them based on what each does best, produces better results than sticking with just one. But saying that on a stage doesn’t prove much. I needed to show it with a real project.
I started where most projects start: a messy brain dump. Six rough ideas in a markdown file: a site builder, a Kindle article aggregator, a Montreal events tool, a Kindle highlights re-surfacer, a project management dashboard, and a gym workout analyzer. None of them felt obviously right for a 5-minute talk. So I did the only logical thing for someone preparing a presentation about AI: I asked AI for help. Four AIs, actually.
From the start, I kept notes, partly by hand, partly by asking each AI to document decisions and reasoning in a running markdown file as we went. That log became the backbone of this write-up.
Along the way, each model surprised me in ways I didn’t expect. This is the story of how that process turned into “Signal Drift.”
The prompt is the product
Before I asked any model for project ideas, I tried something: I asked them to help me ask the question better. I fed that brain dump into all four models with one rule: “Don’t answer my question yet. Just make the question better.”
Each model came back with something noticeably different:

- ChatGPT (The Thorough Analyst) gave me a structurally exhaustive project charter. It demanded scoring matrices, deliverables, and assumption challenges.
- Claude Sonnet 4.5 (The Storyteller) cut straight to decisions. It defined crisp success criteria and focused on the narrative arc of the eventual demo.
- Gemini (The Big-Picture Creative) assigned itself a “Creative Director” persona, demanding I focus on the visual “aha!” moment to maximize audience impact.
- Copilot (The Operations Manager) produced a highly structured “Idea Card” format and a weighted decision matrix. Enterprise DNA runs deep.
What stood out wasn’t just the different formats. It was how directly they pushed back on my assumptions. My brain dump was full of confident claims: “Opus is best,” “Gemini is probably best at searching because it’s Google.” I’d even pre-assigned roles before testing anything.
The models didn’t just go along with it. ChatGPT challenged the idea that the audience would care about whatever project I built. They’d care about the insight, not the pipeline. Gemini pointed out that a different colour palette doesn’t prove different intelligence, just different defaults. Sonnet warned that 5 model switches in 5 minutes would feel like a gimmick parade; two or three deep, visible swaps would land harder. And multiple models flagged that watching code scroll in a screen recording is just not interesting; the demo needed to show output transforming, not code being written.
That pushback was more useful than any of the actual ideas. It forced me to stop trying to build something technically impressive and focus on building something clear.
No single model nailed it. But together? They covered every angle. I took the best parts of all four and used Opus 4.6 (The Decisive Architect) to synthesize them into one massive, 1,500-word “Mega-Prompt.”
That hour of prompt refinement ended up saving me a ton of time later. The sharper the question, the better every model performed.
The ideation explosion and the “aha!” pivot
I fed the Mega-Prompt back into the models and let them run. The output was an explosion: over 50 concept cards, critiques, and matrices. The models were brutally honest. Opus, in particular, told me to definitively “kill” several of my original ideas. It correctly pointed out that building a PM dashboard or a workout analyzer might be useful, but they wouldn’t provide clear, visual proof of model switching in a 5-minute window.
To figure out what actually worked, I didn’t just read the ideas. I launched five quick proofs of concept in parallel using GPT Codex. Once I could actually see and touch the ideas, my evaluation completely changed.

The Morning Dashboard was instantly eliminated: not enough visual proof of model switching. The Gym Coach Analyzer was too much work to build into something useful for a 5-minute lightning talk. But the “Translation Telephone Game” had real potential. The idea was to translate a message across different languages and AI models to see what meaning survived the round-trip.

I decided to proceed with it and had Claude Code build the initial dashboard. It worked. It was clever. But as I tested it, a problem became obvious. Text comparisons on a screen just aren’t that engaging, especially for a live audience. Opus pointed this out bluntly during a critique session: the “aha!” moment needed to be visual and immediate. An image registers in 2 seconds; a text block takes 2 minutes to parse.
So, I pivoted. I dropped the translations entirely and focused strictly on images.
The build journey: four models, one spec
Once I had the concept, I didn’t just ask one model to build it. I orchestrated an entire pipeline.
First, I used Opus in Cursor to generate a full, multi-phase architectural plan. It broke down the project into steps and even generated a list of evocative project names (which gave birth to “Signal Drift”).
But before writing code, I consulted Gemini 3 Pro as a sounding board. It loved the plan but immediately caught a flaw: I needed a “Control Group.” If I was going to test how different models mutated content, I needed to show what happens when the same model handles every step as a baseline. That structural insight fundamentally improved the app.
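Gemini's control-group point is easy to state in code: run the same content through a chain of models, and compare a chain where one model handles every step (the baseline) against a chain that switches models. A minimal sketch in Python; `run_chain` and its `step` callback are illustrative stand-ins, not the app's actual code, and `step` would wrap a real API call in practice:

```python
def run_chain(models, content, step):
    """Pass content through each model in sequence, recording every intermediate state."""
    history = [content]
    for model in models:
        content = step(model, content)
        history.append(content)
    return history

# Control: the same model at every step isolates the drift caused by round-tripping itself.
control = run_chain(["gemini"] * 3, "original", step=lambda m, c: f"{m}({c})")

# Experiment: a mixed chain shows what switching models adds on top of that baseline.
mixed = run_chain(["chatgpt", "copilot", "gemini"], "original", step=lambda m, c: f"{m}({c})")
```

Without the control run, any mutation in the mixed chain is ambiguous: it could come from model differences or from the round-trip itself.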
Then came the actual coding. To test the agentic coding landscape, I took the exact same spec and ran four parallel rebuilds from scratch: Claude Code + Opus 4.6, Claude Code + Sonnet 4.5, GPT 5.3 Codex, and Gemini 3 Pro.
The results were stark. Codex was remarkably bad, not even worth tweaking. Gemini stumbled on the very first step, needing manual intervention to fix a breaking error, and ultimately produced a mediocre result. Sonnet 4.5 was the undisputed winner: it built a fantastic, working app incredibly fast. As a bonus, it was far more token-efficient. Opus burns through tokens quickly in Claude Code, whereas Sonnet gave me stellar results while making my tokens last much longer.

After pivoting the app from text translations to image comparisons, the real detailed work began. I spent an evening iterating through UI details, juggling between Opus and Gemini. At one point, Opus completely messed up the JavaScript for the UI accordions, but I handed the broken code to Gemini, which fixed it instantly. That swap ended up being one of the smoothest fixes of the whole project.

Finally, it was time for design polish. The app was looking very generic, the classic “AI slop” aesthetic. I prompted Gemini 3 Pro to completely redesign the frontend, explicitly telling it to avoid overused fonts (like Roboto) and clichéd colour schemes. Gemini delivered a beautiful, dark-themed UI. We then worked together to add a complex, animated particle background. I brought in Kimi K2 for some final artistic touches.
Looking back, the build was a constant baton-pass: Opus for the architectural blueprint, Gemini for structural critique and visual design, Sonnet for writing the actual code. No single model carried the whole thing.
Signal Drift: watching AIs hallucinate (and sanitize)
The final app, Signal Drift, visualizes how AI distorts visual information. Two experiments:
- Same prompt, different generators. One highly detailed description of a photo fed to ChatGPT, Copilot, and Gemini to generate images.
- Different describers, same generator. Three different models describe a photo, with all three text descriptions fed into a single generator (ChatGPT).
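The two experiments are mirror images of the same fan-out pattern: hold one variable fixed, vary the other. A hedged sketch in Python; the function names are mine and `call_model` is a hypothetical stand-in for whatever API clients the app actually uses (here it just returns labelled strings so the shape is easy to follow):

```python
def call_model(model, task, payload):
    # Hypothetical stand-in for a real API client. Returns a labelled string
    # instead of calling a service, so the pipeline shape is testable offline.
    return f"{model}.{task}({payload})"

def same_prompt_different_generators(description, generators):
    """Experiment 1: one detailed description fanned out to several image generators."""
    return {g: call_model(g, "generate", description) for g in generators}

def different_describers_same_generator(photo, describers, generator):
    """Experiment 2: several models describe the photo; one generator renders each description."""
    descriptions = {d: call_model(d, "describe", photo) for d in describers}
    return {d: call_model(generator, "generate", text) for d, text in descriptions.items()}

models = ["chatgpt", "copilot", "gemini"]
exp1 = same_prompt_different_generators("a cluttered basement table", models)
exp2 = different_describers_same_generator("photo.jpg", models, generator="chatgpt")
```

Experiment 1 exposes differences in how models *render* the same instructions; experiment 2 exposes differences in how they *describe* the same reality, with the generator held constant.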
To pick the right image models to test, I didn’t guess. I asked Gemini Research to investigate the current landscape of image recognition and generation LLMs, specifically looking for popular ones that are surprisingly bad at certain tasks.
This is where things got really interesting. The results lined up with the personality traits I’d noticed during planning:
- ChatGPT (The Perfectionist) tried to cram every single detail into the image, resulting in cluttered, forensic reconstructions. When tasked with describing a messy desk, it literally read every label on every box it could see.
- Copilot (The Corporate Editor) sanitized everything. A cluttered basement table with board games piled at odd angles and toys spilling out of bins became a tidy display with boxes squared up in neat stacks. A messy gym chalkboard became a formatted, corporate spreadsheet.
- Gemini (The Storyteller) sacrificed exact details to capture the mood, atmosphere, and cinematic lighting. It invented rain to make a street festival look more dramatic.
What surprised me most wasn’t the hallucinations. It was the sanitization. Copilot didn’t lie about what was on the chalkboard; it just quietly tidied reality into something that felt corporate and clean. In the demo, you can see it instantly because the original is right there. But in everyday AI use (summarizing a document, drafting a report, describing data) you’d never know reality got quietly edited unless you thought to check. That’s what makes silent editorializing dangerous: not that it’s hard to spot, but that nothing prompts you to look.
The ensemble of tools
From brainstorm to final CSS, every stage of Signal Drift ran through a different tool, each one paired with whichever model fit the task.
| Tool | Models | Role |
|---|---|---|
| Cursor | Opus 4.6, Gemini Pro | Architectural planning, UI refinement, and deep prompt-chaining |
| Claude Code | Opus 4.6, Sonnet 4.5 | Rapid, from-scratch app building directly in the terminal. Rebuilt the entire app from a spec in 8 minutes. |
| GPT Codex | GPT 5.3 Codex | Launching quick, parallel prototypes |
| Gemini CLI | Gemini Pro, Gemini 2.5 | Terminal-based task execution and bug fixing |
| Antigravity | Gemini Pro, Gemini Flash | Experimental tasks |
What I’d do differently
The biggest thing I took away from this project: sticking with one AI model would have given me a worse result. Each model had blind spots, and the only reason I caught them was that another model didn’t share them.
I’m treating models less like competing products now and more like colleagues with different strengths. Opus for architecture, Sonnet for fast reliable code, Gemini for design critique and creative direction. The orchestration (knowing when to swap) turned out to be the actual skill I was building the whole time.
5 minutes wasn’t enough to say all that on stage. But it was enough to show it.