Vibe Coding, Tracer Bullets, and the Myth of Effortless AI
The video looked magical. A laser welder skating across metal, a clean seam appearing like someone was drawing with light. My dad, who’s been welding longer than I’ve been alive, watched two seconds and asked how much the clip was sped up. It hadn’t even occurred to me. Of course it was sped up. Reality is slower. Messier. Full of heat and patience and tiny corrections you don’t notice until you know where to look.
That’s how a lot of AI looks right now. Fast, glossy, sped up. You see the finished seam, not the passes. Not the pauses. Not the hands keeping the bead straight. The hype edits out the waiting. The hype edits in inevitability.
What I mean by “vibe coding”
I’ve been living in that gap, the space between the sped-up demo and the real thing, for a while. I didn’t have a name for it then. Later someone called it vibe coding, which is probably the neatest name we’re going to get. You sit with a large language model. You talk through what you want. It drafts. You nudge. It revises. You argue. You glue the parts together and ask again. You keep going until something usable falls out. It’s pair programming with a cheerful amnesiac who wants to please you more than they want to tell you the truth.
I stumbled into all this before the ChatGPT boom. One of my first projects was small, a simple bash script to make posting with Jekyll less annoying. If you’ve never used Jekyll, it’s a static site generator. Markdown in, blog out, tidy. The tedious part is the front matter. Dates and tags and title and whatever else you promised future you you’d maintain.
So I wrote a script. It asked for the title, asked for the keywords, stamped the date, dropped the scaffold where it should go. Nothing fancy. It worked, mostly. I left notes in the README for things I meant to add. Then work got busy. Life got busy. The README turned into a graveyard of ideas.
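If you’ve never seen Jekyll front matter, this is roughly the shape of the thing. Here’s a minimal sketch of the idea in Python (the real script was bash, and the field names are my guesses at what any blog needs, not a record of mine):

```python
#!/usr/bin/env python3
from datetime import date
from pathlib import Path

# Minimal sketch of a front-matter helper. The original was a bash script;
# the _posts/ layout is Jekyll's convention, the fields are assumptions.
title = input("Title: ").strip()
tags = input("Tags (comma-separated): ").strip()

today = date.today().isoformat()
slug = "-".join(title.lower().split())
post = Path("_posts") / f"{today}-{slug}.md"

post.parent.mkdir(exist_ok=True)
post.write_text(
    "---\n"
    f"title: {title}\n"
    f"date: {today}\n"
    f"tags: [{tags}]\n"
    "---\n\n"
)
print(f"Scaffold dropped at {post}")
```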
After a while I tried OpenAI and knocked out those features super quick. It felt like cheating, without the guilt (kinda). I’m not a blank-file type of person. I’m happier staring at something that exists and reshaping it. The model was good at that. I’d describe what I wanted, it would sketch a path, I’d tweak. Suddenly all the tiny frictions that used to make me shelve small improvements were gone.
Montage speed vs. real speed
Right here’s about where the story cuts to a montage. Cool music. Quick cuts. New apps shipping every week. A reel of solved problems.
That’s the sped-up version.
The full-speed version has false starts. Hallucinated imports. Variable names that quietly change between drafts. Side roads that feel promising until you realise the model executed your tone, not your spec. The model will always choose pleasing you over disagreeing with you (unless you prompt it otherwise). If you don’t run your own checks, you’ll ship that smile straight into production. Smiles don’t keep data safe or guarantee uptime.
The part where people call it new is what grates. The tools are new, sure. Machine learning was already around when I wrote my thesis. We used decision trees to analyse malware samples. Go further back and you find Turing wondering if a machine could imitate a person convincingly enough to pass a test. People were thinking about the shape of thought when computers filled rooms and transistors were things you could drop on your foot. It’s not all miracles and unicorn farts.
The way advances get reported has baked-in distortion. Journalists need a story. Companies need a win. Everyone wants the hero shot. You get headlines that make it sound like Moore’s law is broken, or reborn, or whatever the quarter requires. Meanwhile, on the ground, the work is re-imagining what counts as a doubling and where the bottlenecks actually live. We’re accelerating and stuck at the same time. Moving fast in directions that may not even matter. Pausing right where the hard parts are.
Tools and hype
Vibe coding makes that tension visible (for those who can see it). On one hand, I’ve built more than a hundred projects of varying size and complexity in the last year. Many were experiments. A few turned into real things. Plenty never deserved to live outside my local machine. The rate is intoxicating. You get to try ideas you’d never have justified before because you can sketch them into existence in an afternoon.
On the other hand, that speed tempts you to skip everything that makes a product real. You go from zero to a login page in an hour and forget that logins are for users. And you haven’t got any.
There’s a mass of platforms promising the world. Some are fine for mock-ups and demos; fewer hold up in production. Here’s what I keep seeing: a site that looks great until you inspect the code (look at all the pretty console logs), another that’s fine as a click-through, and too many that go live with a price tag and a Stripe button before the makers even know what they built. People still pay. They’re not validating anything; they’re just curious. Novelty, and a low bar for good enough. Remember the Tea app? That’s the gold standard for what not to do. If you don’t understand the thing, you can’t sell the thing.
It’s easy to blame the tools. I don’t buy it. The tools do what they’re built to do. System working as intended. ChatGPT’s here to please you. That’s its job. It’s not the friend who says, mate, you’re skipping steps. It’s not here to connect you to actual users, to say no on your behalf, to make you sit with your own uncertainty. You’ve got to bring that yourself. If anything, these models make a particular flavour of Dunning-Kruger easier to slip into. You get praised into a specific type of delusion.
I’m not lecturing from the outside. I’ve done the same. Shipped on vibes. Overbuilt behind the curtain. Avoided talking to a single person who might say, I don’t want this.
So I built myself a Vibe Code Bootstrap workflow inside Claude Code to slow me down. Think BMAD-METHOD, but simpler. BMAD’s comprehensive, and I respect anyone who can hold it all. I wanted something that forces more time in planning prompts without turning the process into a ceremony. My approach burns a lot of tokens on architecture and assumptions, then trims the code to the essentials.
The funny part: the token-to-code ratio still ends up silly. You pour in a novel’s worth of planning and get a few hundred lines back. That’s probably healthy. The output isn’t the thing. The thinking is.
Even with guardrails, I pushed a recent project further than my own system advised. I wanted to see it in a more complete shape before releasing a crumb to the world. I’m trying to unlearn that instinct. The part of me that loves building fights the part that knows to test and wait. Claude tried to keep me honest, I ignored it, and I’ll see if the market disagrees. Maybe it flops. Maybe it holds. Either way, the result’s on me.
What’s in the box?
Underneath all this, there’s a bigger question that keeps nagging. Large language models are still token predictors. Clever, layered, tuned, routed so that simple tasks touch smaller paths and complex tasks wake heavier ones, still predictors. The stack has so many layers now that even the people who build them admit they’re black boxes. That doesn’t mean there’s magic inside. It means cause and effect is buried under accumulation. If we could trace every activation, every weight that mattered on a given token, a lot of spooky behaviour would make sense, even if the path was long.
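To make “routed” concrete, here’s the idea in miniature: a toy top-1 gate in the spirit of mixture-of-experts routing. Everything in it (sizes, weights, the two stand-in experts) is made up for illustration, not lifted from any real model:

```python
import numpy as np

# Toy gating in the spirit of mixture-of-experts routing: a gate scores
# each expert for an incoming token, and only the top choice runs.
rng = np.random.default_rng(0)

d = 8                                 # token embedding size (made up)
token = rng.standard_normal(d)        # one incoming token representation

W_gate = rng.standard_normal((2, d))  # gate scores two "experts"
experts = [
    lambda x: 0.5 * x,                # stand-in for a small, cheap path
    lambda x: np.tanh(x) * 2.0,       # stand-in for a heavier path
]

logits = W_gate @ token
weights = np.exp(logits) / np.exp(logits).sum()  # softmax over experts
chosen = int(np.argmax(weights))                 # top-1 routing

output = experts[chosen](token)
print(f"expert {chosen} handled the token (gate weights {weights.round(2)})")
```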
Which brings me to a small obsession (not really, but it sounds cool). Tracer bullets. In the real world, a tracer lets you watch a bullet’s path. You squeeze the trigger, you see the bright line, you adjust your aim.
I want the LLM equivalent. Imagine a specially crafted byte that can be tracked from input to output, a marker that never dissolves as it moves through the model. You inspect the path after the fact. You see which layers nudged it, which routes lit up, where the model forked and why.
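You can fake a weak version of this today: hook every layer, push a marked input through, and read the trail afterwards. A sketch, assuming PyTorch and a stand-in stack of linear layers rather than anything like a real LLM:

```python
import torch
import torch.nn as nn

# A stand-in "model": a stack of layers, nothing like a real LLM.
layers = nn.ModuleList([nn.Linear(16, 16) for _ in range(4)])

trail = []  # the "tracer" record: what each layer did to our marked input

def make_hook(idx):
    def hook(module, inputs, output):
        # Log how far this layer moved the representation.
        delta = (output - inputs[0]).norm().item()
        trail.append((idx, round(delta, 3)))
    return hook

for i, layer in enumerate(layers):
    layer.register_forward_hook(make_hook(i))

x = torch.randn(16)  # the "marked" input we want to follow
with torch.no_grad():
    for layer in layers:
        x = layer(x)

# Inspect the path after the fact: which layers nudged the input hardest.
for idx, delta in trail:
    print(f"layer {idx} shifted the representation by {delta}")
```

Hooks only watch, though. The marker idea wants something that survives being mixed into every other token’s representation, and that’s where it stops being a weekend project.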
I’m not claiming to know how to build this. In fact, I can think of a problem right away. Observation changes systems. You add a persistent marker and you risk changing the very behaviour you wanted to study. You end up measuring the model while the model’s bending away from the measurement. Very quantum-y.
Still, the desire’s there because our explanations are thin. We say the model learned a representation that encodes X. That’s a polite way of saying something happened in the middle and we’re comfortable with that. The models pass tests. They ace benchmarks. They get better at specialised tasks when you feed them specialised data. People do the same. A PhD is one long fine-tune.
The difference is energy. And who answers for it.
There’s a lot of debate around AI’s energy and water use. I started thinking about how the energy-intelligence ratio between AI and humans stacked up. The human brain sits around the power draw of a dim light bulb, about twenty watts. Our power budget’s tiny. Training a big model eats megawatt-hours; published estimates for GPT-3-scale runs land around a gigawatt-hour. Inference is cheaper per query. Then scale arrives because you’re serving millions, which muddies the numbers.
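Back-of-the-envelope, using round numbers (the training figure is a ballpark of published estimates for a GPT-3-scale run, not a measurement):

```python
# Round numbers only; every figure here is an assumption for scale.
BRAIN_WATTS = 20                  # roughly a dim light bulb
HOURS_PER_YEAR = 24 * 365

brain_kwh_per_year = BRAIN_WATTS / 1000 * HOURS_PER_YEAR
print(f"one brain-year: about {brain_kwh_per_year:.0f} kWh")        # ~175 kWh

TRAINING_KWH = 1_300_000  # ballpark published estimate, GPT-3-scale run
ratio = TRAINING_KWH / brain_kwh_per_year
print(f"one training run: about {ratio:,.0f} brain-years of power")  # ~7,400
```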
I’m not gonna crown a winner. It’s simpler. What looks smart in a demo can be energy-stupid at scale. What looks slow in a person can be spectacularly efficient in context. If we care about intelligence, we should care about the ratio.
People want AGI, or at least a big red button that does the thinking. I think we’ll build something that deserves a general label. I don’t think it’ll look like what we’re using now. You can’t stack complexity forever and expect to stumble onto a mind. Routing helps. Mixtures of experts help. Small models for small jobs make sense. I’m glad that work is happening.
But the bridge between the appearance of intelligence and the thing itself will ask for more than clever scheduling of tokens. It’ll ask for new ideas, not just bigger piles.
What I’m taking with me
So what do we do right now? Keep it small. Keep it honest. Vibe coding’s a tool, but the toolkit is pretty messy. It’s not a philosophy. It’s not a movement. It’s not a personality type, although there’s a sort of person who finds it addictive (cough not naming any names).
When it works, it clears the brush so you can see what you’re trying to make. When it fails, it hands you something that looks real enough to trick you into skipping the boring questions. Who’s this for? What problem does it solve? How will we even know?
This is why I keep returning to planning prompts with Claude. Not because the model’s a better planner than me. Because the ritual slows the river. If I’ve got to write down assumptions, I don’t get to pretend I don’t have any. If I’ve got to choose the smallest useful version, I’m less likely to bury myself in a clever subsystem nobody asked for.
The tokens aren’t free, which is good. If it was all cheap, discipline wouldn’t grow. There’s a trap here though. You can engage in performative planning and claim you’re being prudent. I’ve done it. I still do it. You feel rigorous. You’re procrastinating respectfully.
The fix is ugly prototypes shown early. A quick conversation with a person who might use the thing. Listen when they say, I don’t get it. Don’t argue them into agreement. Listen.
Ethics is ever-present, often ignored. Who gets to decide which values get tuned in? Which topics get gated? Which harms deserve attention? Companies will publish frameworks. Everyone will feel responsible and keep moving.
Meanwhile, tools that want to please will keep sanding down the sharp bits. The more we outsource judgment to the model, the more blind we get to our own defaults.
We don’t have an internal heat map we can trust yet. So we build our own.
Tests, even the dumb ones. Tiny releases. Prompt logs we actually read instead of hoard. A habit of writing down what failed. A willingness to publish the boring parts. External eyes. Honest notes.
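“Dumb” meaning genuinely this dumb, and still worth running (the function under test and its fields are hypothetical):

```python
# A test this basic still catches the worst failure mode:
# the thing not running at all. Everything here is hypothetical.
def build_front_matter(title: str, tags: list[str]) -> str:
    # Stand-in for whatever function your project actually exposes.
    return f"---\ntitle: {title}\ntags: {tags}\n---\n"

def test_front_matter_has_required_fields():
    fm = build_front_matter("Hello", ["jekyll"])
    assert fm.startswith("---")
    assert "title: Hello" in fm

test_front_matter_has_required_fields()
print("dumb test passed")
```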
None of it’s glamorous. That’s the point. The montage lies. The work is slow.
I don’t know when I started vibe coding, exactly. Somewhere before it had a name. Somewhere between my bash script and a pile of side projects that taught me more about my habits than about code.
What I do know is the feeling of speed is seductive and false. Energy costs hide easily when the meter’s in another room. The line between tool and crutch is thin. None of that means stop. It means get boring in the right places.
So here’s the simple version I’m carrying into whatever I build next. Plan in sentences, not vibes. Keep the smallest thing small. Ask one person who might actually use it to tell you what’s wrong. Let most ideas stay ideas. When you want the model to make you feel smart, go for a walk. Come back when you want it to help you do work.
And when you watch a clean seam appear on a screen, ask whether the clip was sped up. If it seems too good to be true, it probably is.