I find LLMs 100x more productive for greenfield work.
If I want to create a React app with X amount of pages, some Redux stores, Auth, etc. then it can smash that out in minutes. I can say "now add X" and it'll do it. Generally with good results.
But when it comes to maintaining existing systems, or adding more complicated features, or needing to know business domain details, an LLM is usually not that great for me. They're still great as a code suggestion tool, finishing lines and functions. But as far as delivering whole features, they're pretty useless once you get past the easy stuff. And you'll spend as much time directing the LLM to do this kind of thing as you would just writing it yourself.
What I tend to do is write stubbed out code in the design I like, then I'll get an LLM to just fill in the gaps.
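To make that concrete, here is a rough sketch of what "stubbed out code" could look like. Everything in it (the `Invoice` type, the function names, the tax example) is hypothetical and only illustrates the idea: the human fixes the types, signatures and contracts, and the bodies are left for the LLM to fill in.

```python
from dataclasses import dataclass


@dataclass
class Invoice:
    """Hypothetical domain type; the fields are chosen only for illustration."""
    customer_id: str
    subtotal_cents: int
    country: str


def tax_rate_for(country: str) -> float:
    """Return the tax rate for a country as a fraction (e.g. 0.2 for 20%).

    Stub: the signature and contract are decided by the human.
    """
    raise NotImplementedError("body left for the LLM to fill in")


def total_cents(invoice: Invoice) -> int:
    """Return subtotal plus tax, rounded to the nearest cent.

    Pseudocode left as a guide for whoever fills in the body:
      1. look up the rate with tax_rate_for(invoice.country)
      2. multiply the subtotal by (1 + rate)
      3. round the result to an int
    """
    raise NotImplementedError("body left for the LLM to fill in")
```

The design stays in the signatures and docstrings, so a filled-in body can be reviewed against a contract you already wrote.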
These people who say LLMs make them 100x more productive probably are only working on greenfield stuff and haven't got to the hard bit yet.
Like everyone says, the first 90% is the easy bit. The last 10% is where you'll spend most of your time, and I don't see LLMs doing the hard bit that well currently.
I couldn’t agree more and I’ve said the same thing many times.
I have yet to see an LLM-generated app not collapse under its own weight after enough iterations/prompts. It gets stuck in loops (removing and adding the same code/concept), it gets hung up on simple errors, etc.
For greenfield it’s amazing, no doubt, but unless you are watching it very closely and approving/reviewing the code along the way it will go off the rails. At a certain point it’s easier to add the new feature or make the modification yourself. Even if the LLM could do it, it would burn tons of money and time.
I expect things to get better, this will not always be the state of things, but for now “vibe coding” (specifically not reviewing/writing code yourself) is not sustainable.
Most people doing it have a github profile that is a mile wide and a meter deep.
LLMs are amazing and useful, but “vibe coding” with them is not sustainable currently.
>I expect things to get better, this will not always be the state of things, but for now “vibe coding” (specifically not reviewing/writing code yourself) is not sustainable.
It will not.
And I say this as someone who's been building internal LLM tools since 2021.
The issue is their context window. If you increase the context window so they can see more code, costs skyrocket as n^2 in the size of the code base. If you don't, then you have all the issues people have in this thread.
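A quick back-of-the-envelope sketch of what n^2 scaling means, taking the quadratic claim at face value. The 8,000-token baseline and the other context sizes are arbitrary assumptions, and real pricing and serving optimizations will differ.

```python
def relative_cost(tokens: int, baseline_tokens: int) -> float:
    """Cost relative to the baseline, assuming cost grows with tokens squared."""
    return (tokens / baseline_tokens) ** 2


# Assumed context sizes, purely for illustration.
for tokens in (8_000, 32_000, 128_000, 1_000_000):
    ratio = relative_cost(tokens, baseline_tokens=8_000)
    print(f"{tokens:>9,} tokens -> {ratio:,.0f}x the cost of 8,000 tokens")
```

Under that assumption, a 4x larger window costs 16x as much, and going from 8,000 to 1,000,000 tokens (125x) costs 15,625x as much.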
The reason why I have a job right now is that you can get around this by building tooling for intelligent search that limits the overfill of each context window. This is neither easy, fast, nor cheap when done at scale. Worse, the problems that you have when doing this are at best very weakly related to the problems the major AI labs are focusing on currently - I've interviewed at two of the top five AI labs and none of the people I talked to cared or really understood what a _real_ agentic system that solves coding should look like.
I can't help but wonder whether the solution here is something like building a multi-resolution understanding of the codebase. All the way from an architectural perspective including business context, down to code structure & layout, all the way down to what's happening in specific files and functions.
As a human, I don't need to remember the content of every file I work on to be effective, but I do need to understand how to navigate my way around, and enough of how the codebase hangs together to be able to make good decisions about where new code belongs, when and how to refactor, etc. I'm pretty sure I don't have the memory or reading comprehension to match a computer, but I do have the ability to form context maps at different scales and switch 'resolution' depending on what I'm hoping to achieve.
> building tooling for intelligent search that limits the overfill of each context window
I'm interested to know what you mean by this, in our system we've been trying to compress the context but this is the first I've seen about filtering it down.
For general text you run some type of vector search against the full-text corpus to see what relevant hits there are and where. Then you feed the first round of results into a ranking/filtering system which does pairwise comparison between each of the chunks that scored well in the vector search. Contract/expand until you've reached the limit of the context window for your model and run against the original query.
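A toy sketch of that search, filter and pack flow, with loud assumptions: `embed` is a bag-of-words stand-in for a real embedding model, the pairwise step is read here as near-duplicate filtering (one possible interpretation), and the context budget is counted in words rather than tokens. All names and thresholds are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(re.findall(r"[a-z0-9_]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def build_context(query: str, chunks: list[str], top_k: int = 5,
                  min_score: float = 0.1, budget_words: int = 40) -> list[str]:
    q = embed(query)
    # Round 1: similarity search over the whole corpus, keep the best hits.
    ranked = sorted(((cosine(q, embed(c)), c) for c in chunks),
                    key=lambda pair: pair[0], reverse=True)
    candidates = [c for score, c in ranked[:top_k] if score >= min_score]
    # Round 2: pairwise comparison; drop near-duplicates of chunks already kept.
    kept: list[str] = []
    for cand in candidates:
        if all(cosine(embed(cand), embed(k)) < 0.9 for k in kept):
            kept.append(cand)
    # Pack in rank order until the (assumed, word-based) budget is exhausted.
    context, used = [], 0
    for chunk in kept:
        size = len(chunk.split())
        if used + size > budget_words:
            break
        context.append(chunk)
        used += size
    return context
```

A real system would swap in actual embeddings and a proper reranker; the point is only the shape of the pipeline.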
For source code, you are even luckier since there are a lot of deterministic tools which provide solid grounding, e.g., etags, and the languages themselves enforce a hierarchical tree-like structure on the source code, viz. block statements. The above means that ranking and chunking strategies are solved already - which is a huge pain for general text.
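As a small illustration of why chunking is easier for code, the sketch below uses Python's standard `ast` module to split a module into one chunk per top-level function or class. This is a stand-in for the kind of deterministic tooling mentioned above (it is not etags), and `chunk_python_source` is a hypothetical helper name.

```python
import ast


def chunk_python_source(source: str) -> list[tuple[str, str]]:
    """Split a module into (name, source_text) chunks, one per top-level def/class."""
    tree = ast.parse(source)
    chunks = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            # get_source_segment returns the exact text the node spans.
            chunks.append((node.name, ast.get_source_segment(source, node)))
    return chunks


example = "import os\n\ndef a():\n    return 1\n\nclass B:\n    def m(self):\n        return 2\n"
for name, text in chunk_python_source(example):
    print(name, "->", len(text.splitlines()), "lines")
```

The chunk boundaries come from the grammar itself, so there is no guessing about where one unit ends and the next begins.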
The vector search is then just an enrichment layer on top which brings in documentation and other soft grounding text that keeps the LLM from going berserk.
Of course, none of the commercial offerings come even close to letting you do this well. Even the dumb version of search needs to be a self-recursive agent which comes with a good set of vector embeddings and the ability to decide if it's searched enough before it starts answering your questions.
If you're interested drop a line on my profile email.
I'll preface this comment with: I am a recent startup owner (so only dev, which is important) and my entire codebase has been generated via Sonnet (mostly 3.7, now using 4.0). If you actually looked at the work I'm (personally) producing, I guess I'm more of a product-owner/project-manager as I'm really just overseeing the development.
> I have yet to see an LLM-generated app not collapse under its own weight after enough iterations/prompts.
There's a few crucial steps to make an LLM-generated app maintainable (by the LLM):
- _have a very, very strong SWE background_; ideally as a "strong" Lead Dev, _this is critical_
- your entire workflow NEEDS to be centered around LLM-development (or even model-specific):
- use MCPs wherever possible and make sure they're specifically configured for your project
- don't write "human" documentation; use rule + reusable prompt files
- you MUST do this in a *very* granular but specialized way; keep rules/prompts very small (like you would when creating tickets)
- make sure rules are conditionally applied (using globs); do not auto include anything except your "system rules"
- use the LLM to generate said prompts and rules; this forces consistency across prompts, very important
- follow a typical agile workflow (creating epics, tickets, backlogs etc)
- TESTS TESTS AND MORE TESTS; add automated tools (like linters) EVERYWHERE you can
- keep your code VERY modular so the LLM can keep a focused context, rules should provide all key context (like the broader architecture); the goal is for your LLM to only need to read or interact with files related to the strict 'current task' scope
- iterating on code is almost always more difficult than writing it from scratch: provided your code is well architected, no single rewrite should be larger than a regular ticket (if the ticket is too large then it needs to be split up)
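To illustrate the "conditionally applied (using globs)" point, here is a minimal sketch of the idea. The rule table, file paths and `rules_for` helper are all hypothetical, and this is not the configuration format of any particular tool.

```python
from fnmatch import fnmatch

# Hypothetical rule table: glob pattern -> small, specialized rule file.
RULES = {
    "src/frontend/*.tsx": "rules/react-components.md",
    "src/backend/*.py": "rules/api-handlers.md",
    "tests/*": "rules/testing.md",
}
# Only the "system rules" are included unconditionally.
ALWAYS_INCLUDED = ["rules/system.md"]


def rules_for(paths: list[str]) -> list[str]:
    """Return the rule files to load for the files touched by the current task."""
    selected = list(ALWAYS_INCLUDED)
    for pattern, rule_file in RULES.items():
        if any(fnmatch(p, pattern) for p in paths) and rule_file not in selected:
            selected.append(rule_file)
    return selected


print(rules_for(["src/frontend/Button.tsx"]))
```

Note that `fnmatch`'s `*` also matches path separators, so the patterns above match nested files too.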
This is off the top of my head so it's pretty broad/messy but I can expand on my points.
LLM-coding requires a complete overhaul of your workflow so it is tailored specifically to an LLM, not a human, but this is also a massive learning curve (that takes a lot of time to figure out and optimize). Would I bother doing this if I were still working on a team? Probably not, I don't think it would've saved me much time in a "regular" codebase. As a single developer at a startup? This is the only way I've been able to get "other startup-y" work done while also progressing the codebase - the value is in being able to do multiple things at a time: let the LLM work and intermittently review the output while you get to work on other things.
The biggest tip I can give: LLMs struggle at "coding like a human" and are much better at "bad-practice" workflows (e.g. throwing away large parts of code in favour of a total rewrite) - let the LLM lead the development process, with the rules/prompts as guardrails, and try to stay out of its way while it works (instead of saying "hey X thing didn't work, go fix that now") - hold its hand but let it experiment before jumping in.
This document outlines the standardized approach to ticket management in the <redacted> project. All team members should follow these guidelines when creating, updating, or completing tickets.
## Ticket Organization
Tickets are organized by status and area in the following structure:
TICKETS/
  COMPLETED/ - Finished tickets
    BACKEND/ - Backend-related tickets
    FRONTEND/ - Frontend-related tickets
  IN_PROGRESS/ - Tickets currently being worked on
    BACKEND/
    FRONTEND/
  BACKLOG/ - Tickets planned but not yet started
    BACKEND/
    FRONTEND/
## Ticket Status Indicators
All tickets must use consistent status indicators:
- *BACKLOG* - Planned but not yet started
- *IN_PROGRESS* - Currently being implemented
- *COMPLETED* - Implementation is finished
- *ABANDONED* - Work was stopped and will not continue
## Required Ticket Files
Each ticket directory must contain these files:
1. *Main Ticket File* (TICKET_.md):
- Problem statement and background
- Detailed analysis
- Implementation plan
- Acceptance criteria
### Creating Tickets
1. Create tickets in the appropriate BACKLOG directory
2. Use standard templates from .templates/ticket_template.md
3. Set status to *Status: BACKLOG*
4. Update the TICKET_INDEX.md file
### Updating Tickets
1. Move tickets to the appropriate status directory when status changes
2. Update the status indicator in the main ticket file
3. Update the "Last Updated" date when making significant changes
4. Document progress in IMPLEMENTATION_PROGRESS.md
5. Check off completed tasks in IMPLEMENTATION_PLAN.md
### Completing Tickets
1. Ensure all acceptance criteria are met
2. Move the ticket to the COMPLETED directory
3. Set status to *Status: COMPLETED*
4. Update the TICKET_INDEX.md file
5. Create a completion summary in the main ticket file
### Abandoning Tickets
1. Document reasons for abandonment
2. Move to COMPLETED/ABANDONED directory
3. Set status to *Status: ABANDONED*
4. Update the TICKET_INDEX.md file
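The "move the ticket to the appropriate status directory" step could be scripted roughly as below. This is only a sketch under assumptions: it handles the three status directories shown in the structure above (not the COMPLETED/ABANDONED case), it does not edit the status line or TICKET_INDEX.md, and `move_ticket` and the example ticket name are hypothetical.

```python
import shutil
from pathlib import Path

# Status directories taken from the ticket structure described above.
STATUS_DIRS = {"BACKLOG", "IN_PROGRESS", "COMPLETED"}


def move_ticket(root: Path, ticket: str, area: str, old: str, new: str) -> Path:
    """Move <root>/<old>/<area>/<ticket> to <root>/<new>/<area>/<ticket>."""
    if old not in STATUS_DIRS or new not in STATUS_DIRS:
        raise ValueError(f"unknown status: {old!r} -> {new!r}")
    src = root / old / area / ticket
    dst = root / new / area / ticket
    if not src.exists():
        raise FileNotFoundError(src)
    if dst.exists():
        raise FileExistsError(dst)
    dst.parent.mkdir(parents=True, exist_ok=True)
    shutil.move(str(src), str(dst))
    return dst


# Usage (hypothetical ticket name):
# move_ticket(Path("TICKETS"), "TICKET_example", "BACKEND", "BACKLOG", "IN_PROGRESS")
```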
## Ticket Linking
When referencing other tickets, use relative links with appropriate paths:
    @TICKET_NAME
Ensure all links are updated when tickets change status.
## Ticket Cleanup and Streamlining
### When to Streamline Tickets
Tickets should be streamlined and cleaned up at major transition points to maintain focus on remaining work:
1. *Major Phase Transitions* - When moving between phases (e.g., from implementation to testing)
2. *Milestone Achievements* - After completing significant portions of work (e.g., 80%+ complete)
3. *Infrastructure Readiness* - When moving from setup/building to operational phases
4. *Team Handoffs* - When different team members will be taking over the work
### What to Streamline
*Replace Historical Implementation Details With:*
- Brief completed tasks checklist (high-level achievements)
- Current status summary
- Forward-focused remaining work
*Remove or Simplify:*
- Detailed session-by-session progress logs
- Extensive implementation decision histories
- Verbose research findings documentation
- Historical status updates and coordination notes
### Why Streamline Tickets
1. *Git History Preservation* - All detailed progress, decisions, and implementation details are preserved in git commits
2. *Clarity for Future Work* - Makes it easier to quickly understand "what needs to be done next"
3. *Team Efficiency* - Anyone picking up the work can immediately see current state and next steps
4. *Maintainability* - Shorter, focused tickets are easier to read, understand, and keep updated
### How to Streamline
1. *Archive Detailed Progress* - Historical implementation details are preserved in git history
2. *Create Completion Summary* - Replace detailed progress with a brief "What's Complete" checklist
3. *Focus on Remaining Work* - Make current and future phases the primary content
4. *Update Status Sections* - Keep status concise and action-oriented
5. *Preserve Essential Context* - Keep architectural decisions, constraints, and key requirements
*Goal*: Transform tickets from "implementation logs" into "actionable work plans" while preserving essential context.
## Maintenance Requirements
1. Keep the TICKET_INDEX.md file up to date
2. Update "Last Updated" dates when making significant changes
3. Ensure all ticket files follow the standardized format
4. Include links between related tickets in both directions
## Complete Documentation
For detailed instructions on working with tickets, refer to:
- @Ticket Workflow Guide
- @Ticket Index
- @Tickets README
I've recently been able to use an LLM on a large-ish internal project to find a bug. The prompt took the form of "here's the symptoms I observe, and some hypotheses, tell me where the code that handles this case is written" (it was a brand new repo that I hadn't looked at before - code written by a different team, who were claiming some weird race condition / were not really willing to look into the bug). Basically I was asking the LLM to tell me where to look, and it actually found the issue itself.
Not 100x more productive, that's an exaggeration... not even 10x. But it helps. It is an extremely competent rubber duck [1].
I too did this (although on a small project), and I was incredibly impressed. My problem with it is that I first did it myself, and it was fairly quick and easy. The hard part was figuring out that there was a bug, and how exactly the bug behaved. The LLM helped with the easy part, but I don't know how to even explain the difficult part to it. There was no way to know which repo the problem is in, or that it wasn't a user error.
> If I want to create a React app with X amount of pages, some Redux stores, Auth, etc. then it can smash that out in minutes. I can say "now add X" and it'll do it. Generally with good results.
Not discounting your experience, but a lot of these examples are about frameworks that never had good bootstrapping, such as Rails does/did. LLMs are really good at boilerplate, but maybe this points to such stacks needing too much fiddling to get going, vs 10x coder AI.
I’m not sure that LLM makes it easier to do. Pain points I’ve seen:
1. You have to remember all the technologies you need included, my company template already has them.
2. LLM doesn’t have a standardized directory structure, so you end up with different projects having different structures and file naming conventions. This makes later refactoring or upgrades across multiple projects less automatable (sometimes this can be solved by having an LLM do those, but they often are unsuccessful in some projects still)
3. LLMs have a knowledge cutoff. If your company has already moved to a version after that knowledge cutoff, you need to upgrade the LLM generated code.
I very much prefer having a company template to asking an LLM to generate the initial project.
I agree, initial comment was basically that, and seeing a lot of folks (especially those with debatable technical skills) being very impressed with LLMs for boilerplate generation, e.g. "I build a WHOLE APP", etc.
Maybe they haven't used the wizards in IDEs like IntelliJ and Visual Studio. You can bootstrap things so quickly that you don't even think about it, just like creating a new file in the editor.
> create a React app with X amount of pages, some Redux stores, Auth, etc.
Unless you're a contractor making basic websites for small businesses, how many of these do you need to make? This is really a small fraction of the job of most developers, except for entry-level devs.
> when it comes to maintaining existing systems, or adding more complicated features, or needing to know business domain details,
This is what experienced developers will spend 90% of their time doing.
So yes, LLMs can replace entry-level devs, but not the more experienced ones.
This begs the question: if companies stop hiring entry-level devs because LLMs can do their job, how will new devs get experience?
Your conclusion is wrong I think. LLMs cannot magically replace entry-level devs. Who’s gonna ask the LLM to create the basic website? The product owner? The accountant? The sales guy? They wouldn’t know how to be precise enough to state what they actually need. An entry-level engineer would make use of the LLM to produce the website and push it to production. Hell, only engineers know that the devil is in the details. Quick example: let’s say a Contact Us page needs to be built. There are tons of details that need to be accounted for and the LLM may skip them if it is not told about them: where does the data of the form go to? A backend endpoint? What about captcha? What about analytics? What about validation of specific fields? What about the friendly URL? And disabling the button after sending to prevent duplicate requests?
An LLM is very capable of implementing all of that… if only someone who knows all of that stuff tells it first.
And most importantly: LLMs don’t challenge the task given. Engineers do. Many times, problems are solved without code.
> An LLM is very capable of implementing all of that… if only someone who knows all of that stuff tells it first.
I agree with you, but I don't think it's the entry-level dev who is going to be interfacing with the client to discuss and resolve all the questions you posed, and/or decide on them. That was part of the OP's point -- that much of their time is spent interfacing with the client to very precisely determine the requirements.
A lot of the value a good engineer provides is saying “you don’t want to do that, it’s a bad idea for these reasons.” Or “that’s actually easier than you think. We could do it this way.”
Knowing what’s possible, difficult, easy, risky, cheap, expensive, etc.
> I find LLMs 100x more productive for greenfield work.
Greenfield != boilerplate and basic CRUD app.
I'm a consultant writing greenfield apps solo, and 90% of my time is spent away from my editor thinking, planning, designing, meeting with stakeholders. I see no benefit in using a low-IQ autocomplete tool to automate a small part of the remaining 10% of the job, the easiest and most enjoyable part in fact.
Personally I find most of coding I do is unsuitable for LLMs anyway, because I don't need them to regurgitate standard logic when libraries are available, so most of that 10% is writing business logic tailored for the program/client.
Call me elitist (I don't care) but LLMs are mostly useful to two kinds of people: inexperienced developers, and those that think that hard problems are solved with more code. After almost two decades writing software, I find I need less and less code to ship a new project; most of my worth is thinking hard away from a keyboard. I really don't see the use of a machine that egregiously and happily writes a ton of code. Less is more, and I appreciate programming-as-an-art rather than being a code monkey paid by the line of code I commit.
Disclaimer: I am anti-LLM by choice so my bias is the opposite of most of HN's.
I completely agree with you that the coders who are "smitten" by LLMs are just inexperienced. I personally find that LLMs get subtle improvements in capability over time and it's usually worthwhile to check in on the progress from time to time. Even if it's just for fun.
I can see why you would chime in to say that in your experience you don't get any value out of it, but to chime in to say that the millions of people who do are "inexperienced" is pretty offensive. In the hands of skilled developers these tools are a complete gamechanger.
> In the hands of skilled developers these tools are a complete gamechanger.
This is where both sides are basically just accusing the other of not getting it
The AI coders are saying "These tools are a gamechanger in the hands of skilled developers" implying if you aren't getting gamechanging results you aren't skilled
The non-AI coders are basically saying the same thing back to them. "You only think this is gamechanging because you aren't skilled enough to realize how bad they are"
Personally, I've tried to use LLMs for coding quite a bit and found them really lacking
If people are finding a lot of success with them, either I'm using them wrong and other people have figured out a better way, or their standards are way, way lower than mine, or maybe they wind up spending just as long fixing the broken code as it would take me to write it
Sorry for the late reply here, but let me develop a bit more what I meant. I have a long and vast programming experience, but am also an entrepreneur with many projects that I switch between. I could have focused full-time on, say, React Native, and then I could churn out my monkey code all day long. But I don't spend full-time on RN and then suddenly 12 months passed where I didn't write any at all so my specific knowledge of that domain is always a bit behind.
But something like o4-mini-high is a domain expert in all versions of React, Redux, RN etc. and knows every internal SDK change over the last 10 years (or goes out and reads the changelogs and code itself). Countless times I've had it port old code to new and it figures it out 100%. It formulates good modern canonical ways to solve stuff. It knows all the stupid tricks you have to do to get RN stuff to run well on Android and iOS that I would never be able to keep in my head unless I worked full-time on that. And it does the eye-wateringly boring styling code that nobody likes; you can even just upload a screenshot of another app or a sketch on paper and it will correctly output code for the style in a matter of seconds.
The end result is that I can, without investing full-time effort in keeping myself current, do a professional RN dual Android/iOS app development cycle because I have the general skill to understand what to ask it and how to merge its output properly. This leaves me time to do other stuff and generally be more productive.
My guess is that many who gave up on the AI coding stuff tried the bad tools like the default chatgpt 4o-mini (or tried the tools available 2 years ago) and got a bad experience. There are light-years of differences between these and something like o4-mini-high.
TL;DR: use the correct model for the job, and it doesn't really need to be an argument - if it makes you more productive it's a good tool, if it doesn't, nobody is forcing you to use it. But I don't think you should imply that everybody who likes these tools is stupid.
Where are these millions, and where is their output? You're in an echo chamber mate, there aren't millions of people using AI to do significant amounts of work.
Indie hackers just did an article on 4 vibe coded Startups and they all seem like a joke.
And they could only find 4!
I didn't look at them all, but the flight sim is spectacularly bad, the revenue numbers obviously unsustainable and it looks like something moderately motivated school children might have made for a school project in a week.
It's sort of a slow-motion avalanche. For example the price of outsourcing app development is really coming down now because it's one of the areas where the AI coding tools really excel. It's a lot of boiler-plate style code, pretty canonical stuff, no rocket science and, to be frank, not really that much that has to be clever. You can give the tools a screenshot of a UI and it gladly outputs correct styling code for it in seconds. It supercharges already skilled developers. And if you needed 6 in-house app developers before, now you only need 2. It isn't an immediate effect, but it is an effect that is slowly going to run through the business.
The question is, are companies going to use fewer people to do the same, or the same amount of people and just create better products?
For prototyping new ideas this is also an invaluable turbocharge for startups who can't really afford to have hordes of developers trying out alternative solutions.
I think this is needlessly snarky and also presupposes something that wasn't said. No one said it can write something that the developer couldn't write (faster) themselves. Tab complete and refactoring tools in your IDE/editor don't do anything you can't write on your own but it's hard to argue that they don't increase productivity.
I have only used cline for about a week, but honestly I find it useful in a (imo badly organized) codebase at work as an auto-grepper. Just asking it "Where does the check for X take place" where there's tons of inheritance and auto-constructor magic in a codebase I rarely touch, it does a pretty good job of showing me the flow of logic.
You shared my sentiments exactly. What none of these "all junior developers will be out of a job by 2026" proclamations ever deal with is the non-coding stuff. Sure it can generate a boilerplate app in 1.5 seconds, but can it communicate with a stakeholder about the requirements and ask the right questions to determine the scope and importance of features? I just imagine my Sales President spending more than 90 seconds trying to write the proper prompt before he gives up and calls and talks to a human. There is just no way a C-suite executive is sitting in front of a computer typing in prompts.
I agree with the larger point you’re making, but an LLM absolutely can ask the right questions to determine scope and features for a basic application. It can even turn those into decent user stories. It’s the edge cases and little details that will be messed up.
Where I've found LLM coding assistants really effective in my client consulting work is around iteration: I can deliver so many more versions of an application UI in a given amount of time that it really changes the entire process of how we dig into a project. Where I might previously have wanted to start with low fidelity wireframes and go through approvals to avoid proto-duction and pain down the line when the client didn't like something, now we can rough out the whole thing in a functional proof of concept and then make sweeping changes live on a call as we test different interaction paradigms.
I don’t find the same, eg, greenfield AI projects.
It can do pieces in isolation, but requires significant handholding on refactors to get things correct (i.e., its initial version has subtle bugs), and sometimes requires me to read the docs to find the right function because it hallucinates that things from other frameworks will work in its code.
> What I tend to do is write stubbed out code in the design I like, then I'll get an LLM to just fill in the gaps.
This seems like an interesting approach, though to me it begs the question: what does "stubbed out code" look like? How much stubbing is done? Have you considered using pseudocode as comments within a larger "stubbed out" portion?
Rules and context have begun to matter more (...that is, if context wasn't always very important), and finding ways to articulate that context seems to be an increasingly important skill...
I mean that in the sense that Excel is the tool that non developers could take the farthest to meet their needs without actually becoming a full time developer. But then to take it past that point a professional developer needs to step in.
I imagine non devs vibe coding their way to solutions far more complex than Excel can handle. But once they get past the greenfield vibe coding stage they will need a pro to maintain it, scale it, secure it, operationalize it, etc.
I have not seen much discussion on how to properly work with legacy code using LLMs.
Michael Feathers' book comes to mind when thinking about the topic. One gets the idea that you have to write a lot of tests. But what happens when there are no tests, comments, documents etc?
Not really my case. I've found that codebases with good code benefit more from LLMs, but it's not a prerequisite.
I just rewrote 300ish advanced PostgreSQL queries to mysql queries. The process is not magical, but it will take me 1 week rather than 3 days. Now I'm in the testing phase, and it seems promising.
The point is, if we can find a way to work along with the agent, we can be very productive.
> I just rewrote 300ish advanced PostgreSQL queries to mysql queries.
Translation from one set of tokens to another is exactly the primary use case of LLMs. This is exactly what it should be good at. Developing new queries, much less so. Translation from one set of database queries to another was already very well defined and well covered before LLMs came about.
Also, both are formal grammars, so if you really wanted to create a 1:1 translator, it's possible to do so (see virtual machines). But it's not as useful per se, as no one really switches databases on a whim. If you really want to do so, you want to do it correctly.
I mean, using a good ORM with DB adapter options could achieve this in minutes. Sure, LLM has utility here for raw queries, hardly "replace SWEs" type of utility though
yeah, this is the issue. I've used Claude Code to great success to start a project. Once the basic framework is in place, it becomes less and less useful. I think it cannot handle the big context of a full project.
It is something that future versions could fix, if the context an LLM can handle grows and also if you could fix it so it could handle debugging itself. Right now it can do it for short bursts and it is not bad at it, but it will get distracted quickly and do other things I did not ask for.
One of these problems has a technical fix that is only limited by money; the other does not