"Vibe coding": what it actually costs you

I've been building a SaaS product end to end using only Claude and GPT for the last several months. The work has taught me where the AI ends and where the rest of the work starts.

It has also taught me what vibe coding looks like up close.

If you're not familiar with the term: vibe coding is typing a description of what you want, accepting whatever the AI produces, and moving on. Andrej Karpathy popularized the phrase. It's the lightest possible touch on the model.

When it works, it's fast. The cost shows up months later, in places you couldn't see at the time.

I've watched the AI fall into four patterns whenever I stopped steering.

The silent assumption (and the wrong follow-up)

This first one has cost me the most. It disguises itself as forward motion.

I asked the AI to build a pricing estimator for Orlando. It hardcoded Orlando into the logic and moved on. Its follow-up question was: "Do you want to add a terms and conditions checkbox?"

The hardcoding was the load-bearing decision. The T&C checkbox is cosmetic.

The AI made a silent assumption: this app only needs Orlando. That assumption is wrong, and it's the kind of wrong that doesn't surface until next month when you need Tampa, and then a month after that when you need a dozen cities, and then you find out the entire logic layer was written against one city's parameters.

The trap is that the AI's follow-up question is plausible. It sounds like progress. A junior dev would think this is progress. They'd answer the T&C question, ship the feature, and pay the city-expansion tax six months later.
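
To make the difference concrete, here is a minimal sketch of what steering toward a city parameter might have looked like. Everything in it is an assumption for illustration: the `CityPricing` fields, the `estimate_price` function, and all the numbers are invented, not taken from my actual estimator.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CityPricing:
    """Per-city parameters. Field names and values are hypothetical."""
    base_fee: float       # flat fee per job
    rate_per_sqft: float  # variable rate
    tax_rate: float       # local tax as a fraction


# Made-up numbers. A real app would load these from config or a database.
CITY_PRICING = {
    "orlando": CityPricing(base_fee=50.0, rate_per_sqft=0.25, tax_rate=0.065),
    "tampa": CityPricing(base_fee=60.0, rate_per_sqft=0.25, tax_rate=0.075),
}


def estimate_price(city: str, sqft: float) -> float:
    """Estimate a price for `sqft` square feet in `city`.

    The city is an input, not a constant baked into the formula, so an
    unsupported city fails loudly instead of silently using Orlando's numbers.
    """
    try:
        params = CITY_PRICING[city.lower()]
    except KeyError:
        raise ValueError(f"No pricing configured for city: {city!r}") from None
    subtotal = params.base_fee + params.rate_per_sqft * sqft
    return round(subtotal * (1 + params.tax_rate), 2)
```

With this shape, adding Tampa is one more entry in the table. With the hardcoded version, it is a hunt through every place the logic quietly assumed Orlando.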

The "do it in a later phase" architecture

The AI's default architecture is whatever's most direct. It writes business logic into UI components. Validation goes into the route handler. The API layer gets skipped entirely.

When I push back ("shouldn't we separate this?"), the answer is usually some version of "you can do that in a later phase." That answer is technically true. You can refactor later. The unspoken part is that "later" is now expensive because two months of code is already written against the shortcut.

What the AI doesn't know: there is no later phase planned, because there's no future architect deciding when the refactor happens. It's me. And future me will regret it if I take the AI's "later phase" advice without flagging the cost out loud.

The right move at the time of the recommendation is to name the rewrite cost, decide whether the shortcut is acceptable, document the decision, and proceed.
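
For a sense of what "separate this" means in practice, here is a rough, framework-free sketch. The function names, the payload shape, and the pricing formula are all hypothetical placeholders. The point is only that validation and business logic live outside the handler.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass(frozen=True)
class EstimateRequest:
    city: str
    sqft: float


def validate_estimate_request(payload: dict) -> EstimateRequest:
    """Validation layer: turn untrusted input into a typed request or raise ValueError."""
    city = payload.get("city")
    sqft = payload.get("sqft")
    if not isinstance(city, str) or not city.strip():
        raise ValueError("city must be a non-empty string")
    # bool is a subclass of int in Python, so reject it explicitly.
    if isinstance(sqft, bool) or not isinstance(sqft, (int, float)) or sqft <= 0:
        raise ValueError("sqft must be a positive number")
    return EstimateRequest(city=city.strip().lower(), sqft=float(sqft))


def compute_estimate(request: EstimateRequest) -> float:
    """Business logic layer. Placeholder formula, not a real pricing model."""
    return round(50.0 + 0.25 * request.sqft, 2)


def handle_estimate(payload: dict) -> Tuple[int, dict]:
    """Route handler: only translates between request/response shapes and the layers above."""
    try:
        request = validate_estimate_request(payload)
    except ValueError as err:
        return 400, {"error": str(err)}
    return 200, {"city": request.city, "estimate": compute_estimate(request)}
```

The handler stays thin, so the same validation and pricing functions can be reused or tested without going through a route. Whether that is worth doing on day one is exactly the decision to make out loud.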

Phantom prod

The AI will tell me we need to run a backfill in production for a new feature. Sometimes it offers a migration plan for live data, or recommends feature-flagging the rollout.

There is no prod yet. The product hasn't shipped. There are no live users.

The AI is pattern-matching from training data. Real teams DO talk about backfills and migrations and feature flags, because real teams have production. The model has seen so many of those conversations that it produces them even when the situation doesn't call for them.

The code is functional. It just solves a problem that doesn't exist. The cost is wasted cycles thinking about constraints that aren't real.

The apology loop

I correct the AI on something. It apologizes. The next time the same situation comes up, it makes the same mistake.

This pattern has shown up across the projects I've built with the AI. The model has no lasting memory of being corrected. Even within a single session, once the conversation grows long enough, the correction can drop out of the context window. The apology doesn't carry forward into behavior.

Practically: if a correction matters for the project, write it down. Put it in a CLAUDE.md, a system prompt, or a README the AI will read on every interaction. Don't treat the apology as a fix. Bake the rule into the prompt.
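
If you're calling a model through your own scripts rather than a tool that reads CLAUDE.md for you, the same idea can be sketched in a few lines. This is an assumption-heavy illustration: the rules file format (one rule per line, `#` for comments), the file name in the usage note, and both function names are made up for this example.

```python
from pathlib import Path
from typing import List


def load_rules(path: Path) -> List[str]:
    """Read one rule per line, skipping blank lines and '#' comment lines."""
    if not path.exists():
        return []
    lines = path.read_text(encoding="utf-8").splitlines()
    return [s.strip() for s in lines if s.strip() and not s.strip().startswith("#")]


def build_system_prompt(base_prompt: str, rules: List[str]) -> str:
    """Append standing corrections so every request carries them."""
    if not rules:
        return base_prompt
    bullets = "\n".join(f"- {rule}" for rule in rules)
    return f"{base_prompt}\n\nProject rules (always apply):\n{bullets}"
```

A hypothetical `project_rules.txt` might hold lines like "Never hardcode a city" and "There is no production environment yet." Each time a correction matters, it becomes one more line in that file instead of one more apology.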

What separates good from vibe

Across all four patterns, the same thing is missing: a human steering the AI's choices instead of accepting them.

Steering is mostly two skills: prompt writing, which is itself a craft that takes time to hone, and architectural judgment, which means knowing what questions the AI didn't ask because it didn't see them coming.

Steering also means treating the AI's first answer as a draft. When it says "this won't work for normal users," push back. Usually it walks the claim back. Sometimes the exchange surfaces edge cases you hadn't considered.

Vibe coding is what happens when you outsource both prompt writing and architectural judgment. The model will gladly produce code under those conditions. The tax just shows up later, in the rewrites, the painful migrations, and the weeks you spend untangling your own codebase.

The AI is genuinely useful when it's being driven. It's a faster typist than I am. It catches edge cases I would have missed. Tests I'd have skipped get written without much fuss.

But the model doesn't decide what to build. It doesn't flag what's safe to defer. The load-bearing assumptions still come from you. The AI just executes faster once you've decided.