First lessons from vibe coding

There’s plenty of advice about using AI coding tools effectively. You should write detailed requirements, have the model create implementation plans, establish rules for your project, and so on. I’d seen these recommendations, but they were presented alone, without details about the problems they solve.
I’ll share here some details of the problems I ran into while vibe coding, how I addressed them, and the lessons I learned. In the spirit of Andrej Karpathy’s description, I tried to completely “forget that the code even exists” and only ever interacted with it via the agent. I mostly used Cursor Agent Mode and made life slightly harder for myself by using Claude 3.5 Sonnet. This model has more rough edges than 3.7 (and 4, which wasn’t out when I did this), but that makes it easier to discover failure modes.
I’ll give examples that I came across while expanding Link My Lineup, a website that matches users’ Spotify listening history against festival lineups. The existing code was a 2000-line JavaScript file with no structure, making it a fun experiment for AI-assisted refactoring.
Providing context
The first task went surprisingly well. I asked the agent to restructure the code using MVC architecture, and it produced a sensible design. It even one-shot migrated my localStorage caching to IndexedDB, a migration so successful that I haven’t had to touch the storage layer since. I had to iterate on some of the changes a few times, describing visual problems with the site or pasting in stack traces, but nothing too difficult. It seems to me that this all worked because the app was contained in a single file, so the agent had everything in its context window.
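The heart of that migration is wrapping IndexedDB’s event-based requests in Promises so the rest of the app can use async/await. A minimal sketch of that shape (the function and store names here are my illustration, not the actual generated code):

```javascript
// Wrap an IDBRequest, which fires onsuccess/onerror callbacks,
// in a Promise so callers can await it.
function promisify(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// localStorage's setItem/getItem become async operations on an object store.
async function cacheSet(db, storeName, key, value) {
  const tx = db.transaction(storeName, "readwrite");
  return promisify(tx.objectStore(storeName).put(value, key));
}

async function cacheGet(db, storeName, key) {
  const tx = db.transaction(storeName, "readonly");
  return promisify(tx.objectStore(storeName).get(key));
}
```

The notable difference from localStorage is that every read and write becomes asynchronous, which ripples through any code that previously assumed synchronous access.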
The next task was not so successful. I had generated a new UI using Gemini and wanted to integrate it into the refactored codebase. The agent completely ignored the MVC structure it had just created. Instead, it connected the new UI through inline JavaScript, taking the most direct path possible and ignoring structural clues in the codebase.
The problem here was that I was constantly creating new chats, and the agent didn’t have the code it had previously written in its context. In some ways, it would have been nice if the agent had looked through the codebase itself to figure out the existing architecture. In the absence of that, it is our job to manage what is in the context so the agent has what it needs to implement features correctly. The agent implemented things well after I pointed it to the existing files and described the principles I wanted it to follow.
Lesson 1: An agent will implement a feature in the most direct way possible, based on its context.
Failures in providing context
Providing architectural guidance seemed to be a good idea. I tried this with a new task of implementing a display of saved songs in the UI. This task is so basic that any problems would stem from a failure of context management rather than implementation complexity. I included with this request a description of how the information should logically flow from the model layer to the UI.
The feature worked on the first try, which was perhaps down to this additional guidance. This was a shallow success though, as it had coupled the counting logic to a UI element, rather than following MVC and maintaining the count in the controller. This is another example of the agent taking the easiest and most direct route to implementation. I assumed that, having been pointed to the necessary files that currently implemented MVC, it would follow suit, but not in this case.
Lesson 2: The agent might not learn everything from its context that you want it to. This might indicate that you haven’t been specific or broad enough with your instructions.
I found this problem with the count element when I was trying to display the count elsewhere. The agent was having real trouble doing that, which I found surprising given its quick initial implementation. Eventually it managed to display the count in the second location, and I took a look at its solution. I found the offending calculation in the view layer for the first component, and a horrible circuitous flow that gave this value to the controller, then back out to the view layer to populate the second component. Maybe it was trying to stick to MVC by having all data flow through the controller, but it failed to notice the glaring problem with where the value was calculated. I was surprised at this narrow-mindedness. Anyway, after some targeted architectural direction I got the agent to implement a proper MVC solution.
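The shape of the eventual fix can be sketched like this, with illustrative names rather than the actual Link My Lineup code: the controller derives the count from the model and pushes it to every view, so adding a second display is one entry in a list rather than a circuitous flow between views.

```javascript
// Model: owns the data and notifies listeners on change.
class SavedSongsModel {
  constructor() {
    this.songs = [];
    this.listeners = [];
  }
  onChange(fn) { this.listeners.push(fn); }
  add(song) {
    this.songs.push(song);
    this.listeners.forEach(fn => fn(this.songs));
  }
}

// View: renders whatever value it is handed; computes nothing.
class CountView {
  renderCount(n) { this.text = `${n} saved songs`; }
}

// Controller: derives the count from the model and fans it out.
class SavedSongsController {
  constructor(model, views) {
    model.onChange(songs => views.forEach(v => v.renderCount(songs.length)));
  }
}
```

With the calculation living in the controller, neither view knows the other exists, which is exactly the property the agent’s first solution lacked.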
Lesson 3: If the agent struggles with a task simpler than what it has previously accomplished, there might be a problem in the architecture of the code it is building on.
It quickly became tedious to include detailed instructions in every new agent chat. Project rules solve this by automatically including instructions in the context. My only note on these is that I found it useful to keep a close relationship between the rules I was adding and problems I was facing, rather than dumping in loads of rules. This way, I could maintain an understanding of what parts of the model’s behaviour were intrinsic, and what was driven by the rules I had added. Or rather, rules that I had the model write and update itself, based on the architectural direction I provided it in our chats. This was much nicer than writing rules from scratch, and although the generated rules were sometimes a bit generic or too specific, they were an easy base to work from.
Lesson 2 addendum: Investing time in building project-specific instructions and rules is worthwhile.
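For concreteness, a rule grown out of the MVC problems above might look something like the following. The exact file format depends on your editor and version (Cursor, for instance, has used both a single rules file and per-rule files with a short metadata header); the wording here is illustrative, not my actual rules.

```markdown
---
description: Architectural rules for UI features
alwaysApply: true
---

- Follow the existing MVC structure.
- Views render values they are given; they never compute derived state.
- Derived values (e.g. counts) live in the controller and are pushed to views.
- Reuse existing events before defining new ones.
```

Keeping each bullet traceable to a specific problem I had hit made it easy to see which behaviour was the model’s and which was rule-driven.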
Thinking and planning
My next goal was adding a refresh feature to update the local Spotify cache. This required changes across all components - the event system, storage layer, and UI - making it a good test of whether the architectural rules would hold under complexity.
Even with explicit rules in place, the agent’s first attempt was chaotic. It tried to change everything simultaneously and lost track of the component responsibilities it had previously followed. The result was a broken tangle of code.
This suggested that complexity itself was overwhelming the agent’s ability to follow guidelines. I returned to one of the standard recommendations I’d initially dismissed: having the model create implementation plans. I did this via a template “thinking” rule that I included at the beginning of new chats. This rule directed the agent to plan changes step-by-step across each component before writing any code. When the plan violated architectural principles, I could correct it easily before implementation began.
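My thinking rule was along these lines; this is a paraphrase of the idea rather than the exact template I used.

```markdown
Before writing any code:
1. List every component (model, view, controller, events, storage) the
   change will touch.
2. For each component, describe the planned change in one or two sentences.
3. Note which existing events, rules, and files apply.
4. Present the plan and wait for approval before implementing.
```

The approval step is what makes this useful: it turns architectural review into a cheap conversation about a plan instead of an expensive cleanup of code.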
The planning step successfully restored the agent’s adherence to the existing rules. However, it didn’t compensate for gaps in documentation. Since I hadn’t documented the events system, the agent created multiple new events when it should have reused existing ones. This is another example of why lesson 2 is important.
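One way to make an events system self-documenting, so that reuse is the path of least resistance, is a frozen registry of event names that the bus refuses to go beyond. This is a hedged sketch of the idea, not my actual events system; the names are illustrative.

```javascript
// Central registry: the single source of truth for event names.
const Events = Object.freeze({
  CACHE_REFRESHED: "cache:refreshed",
  SONGS_UPDATED: "songs:updated",
});

class EventBus {
  constructor() { this.handlers = new Map(); }
  on(event, fn) {
    // Rejecting unknown names makes an undocumented, freshly-invented
    // event fail loudly instead of silently never firing.
    if (!Object.values(Events).includes(event)) {
      throw new Error(`Unknown event: ${event}`);
    }
    if (!this.handlers.has(event)) this.handlers.set(event, []);
    this.handlers.get(event).push(fn);
  }
  emit(event, payload) {
    (this.handlers.get(event) ?? []).forEach(fn => fn(payload));
  }
}
```

A registry like this also doubles as documentation you can point the agent at, which is cheaper than describing each event in prose.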
Lesson 4: Having the agent create a plan means it can handle larger and more complex tasks on which it would otherwise fail.
Understanding model behaviour
Throughout my experiments, the pattern emerged that the agent defaults to making a minimal change without investing effort to understand the codebase’s architecture. It operates like a highly focused sighthound, moving quickly towards a goal but with a narrow assessment of what the task actually requires. I think about this behaviour in terms of a reasoning budget that models must allocate across different aspects of problem-solving.
The agent’s (really, the underlying model’s) reasoning budget is a finite capacity that gets consumed during problem-solving. This budget is an intrinsic property of the model, and the size of the budget reflects how “good” the model feels. To complete a task successfully, the model must allocate its reasoning budget between two activities: assessing the true scope of what the task requires, and then implementing a solution within that assessed scope. These constraints interact in important ways.
Tasks have inherent scope requirements. Some genuinely demand understanding complex interactions across many parts of the codebase, and others can be solved with minimal context. The model must spend part of its thinking budget on scope assessment to understand these requirements. If the model underestimates the true scope of a task, it may make surface-level changes that ignore architectural requirements and only partially complete the task.
To help models with low reasoning budgets, the developer can support the scope assessment process by “pre-computing” some of what the model would have to figure out. By populating the context with rules and documentation, the developer allows the model to spend more of its budget on implementation details rather than on figuring out how the codebase works.
Project rules work well for many scenarios, but have a natural limitation. At some point, the documentation becomes as complicated as, if not more complicated than, the code it tries to explain. When at the limit of sensible documentation, if a model still cannot complete some task successfully, then its reasoning budget is simply too small for the remaining scope assessment plus implementation cost. In such cases, a different model is required.
Lesson 5: Before making changes, a model must understand the existing architecture and plan the changes it will make in that context. Documentation helps the model understand what is already present so it can spend more of its reasoning budget on making changes.
Closing thoughts
Throughout this project I tried to be as unconcerned as possible with the actual code being produced. Instead, I felt my effort was best spent ensuring that the model stuck to a good architecture. If the code was organised, I was more confident that the model wasn’t making spaghetti foundations for me to build on later. All of my lessons are about how to achieve this.
Having a clean architecture also made it easier for me to dive into the code on the occasions where the agent couldn’t debug something. Debugging someone else’s code is often a pain, and is made even harder if the code hasn’t ever been reviewed and is a tangle of ideas. If the code generally evolves in keeping with an architecture, then you can spot any new additions that break those constraints. In a similar way to lesson 3, this can be an indication that the model struggled with the task and so made a mistake.
Finally, I’ll acknowledge that it’s uncomfortable to work so detached from code. It’s a very different way of making software. It does become easier with practice, and I encourage you to think about it as a transition from working at the line-by-line level to worrying about architecture.