> We had weeks to ship what ended up being a million lines of code... Five months later, the repository contains on the order of a million lines of code across application logic, infrastructure, tooling, documentation, and internal developer utilities. Over that period, roughly 1,500 pull requests have been opened and merged with a small team of just three engineers driving Codex. This translates to an average throughput of 3.5 PRs per engineer per day, and surprisingly the throughput has increased as the team has grown to now seven engineers. Importantly, this wasn’t output for output’s sake: the product has been used by hundreds of users internally, including daily internal power users.
That's an insane level of throughput. What's a good baseline? Prior to agentic coding, whats the typical number of PRs engineers were expected to push? Maybe a 2-10?
Do people feel the software has gotten better in the last 6 months? The number of engs is prob the same so we should expect maybe 5x faster cycle in major software apps, but I don't see it. The AI apps do change very fast but given its a very new field, I'd expect as much. But outside of that, I don't see it.
Here's a fun one: firefox lists its current count at about 2.5M LOC, from roughly 1M commits during the years.
You end up with about 3 lines added per commit, which is not ridiculous when you consider that most would be editions rather than full additions.
Here, we have 1500 PRs and 1M LOC, which is about 650 added LOC per PR. Remember, not 650 lines total in the PR, but +650 balance after additions-removals.
Fun questions for attentive readers:
- What does a project growing at a rate of one full firefox-codebase worth of LOC per year look like, a decade down the line?
- What does the line count say about the verbosity of the tool, and what does it say about outcomes that the purpose of the project isn't clearly disclosed?
- Do we have reasons to care about LOC in a world where we don't write code manually? What happens to token usage numbers when the codebase is significantly larger?
- If it was confirmed that LLM usage blows up your line count, what's the implication for codebases that want to return to manual coding after months of usage? (Say, because the tool gets expensive).
They never specified what exactly the product was, without which it's impossible to judge the post.
For some reason most of the uses of "agents" are to build yet other AI products, it's turtles all the way down. Maybe that says more about the field of harnesses than it does about the power of "agents".
Feels like the active discovery going on is trying to understand what is computer vs what is AI, for every product.
Agents help a ton with the discovery, but the act of building a product needs a deeper level of thought and validation to make it actually better than what came before. So IMO what you see is people still learning what needs to be understood and crafted first hand to make a product better (including economics)
> should expect maybe 5x faster cycle in major software apps
To what end and what would that even look like though? Enshittifying everything at maximum speed? The apps/platforms I use regularly - GitHub, Spotify, Google maps (just to name a few), have gotten noticeably shittier in recent times.
This almost reeks of "I've never cleaned up our code base because there is too much code, and didn't even bother having agents/LLM cleaning them up".
You almost never need a million lines of code - this includes your software, infra, testing and operational tools. You didn't ship the linux kernel in 3 weeks and you know it. The code is already speghetti and it achieve the basic functions OK but it will harder and harder to simplify and untangle and maintain.
Even the linux kernel doesn't need millions of lines of code; most of the actual LOC is device drivers, and you don't need all of them, you just need the ones for the devices you have.
I worry most about blindspots with this kind of approach. Let's say that this repository goes on for years, at which point the docs folder is several MB in size. Would Codex be able to think outside of the box? Or would the aggregate of the Markdown content fundamentally cover enough ground to prevent it from thinking of novel new approaches to existing problems?
Codex updates usually appear every few hours (i am not saying this how often it's published) but that's my perception as a user. Often i update codex just to see new update within an hour so.
Many times those updates are not properly tested, for example in one update the model selector got completely changed.
then next hotfix was pushed which restored original.
But this is the only submission that's had traction at all. Since the content is nearly the same for all submissions, it highlights how getting to the front page can be a bit random. (Though this is the only one that capitalized 'Leveraged' so maybe that's the secret)
The other day I came across to a video showing workers in a e-vape factory. They pick up a bunch of e-vapes from the conveyor belt (each has 6 e-vape think), stick in their mouth and vigorously vape all of them for about 5 seconds, then test the next bunch. Humans reviewing hundreds of lines of change in a PR written by AI is not very different.
I understand that the’ve written zero lines of code for this application, but would it kill them to write a few lines of the blog post by hand?
Forcing readers to wade through an unceasing string of LLM clichés demonstrates the opposite of the point you’re trying to make—that the consumers of your work are worse off because you exercised no human judgment in creating it.
Everyone is criticizing the number of lines of code and the lack of attention that must certainly have been applied to generate that code and push it into production. What is being ignored is this awesome prompt that is almost certainly better than having no agents.md or plans.md or whatever you've come up with, to add validation steps for committed changes. You're still free to look at your code, the changes, and ask the agent to clean up. Try it. It's really nice.
That's an insane level of throughput. What's a good baseline? Prior to agentic coding, whats the typical number of PRs engineers were expected to push? Maybe a 2-10?
Do people feel the software has gotten better in the last 6 months? The number of engs is prob the same so we should expect maybe 5x faster cycle in major software apps, but I don't see it. The AI apps do change very fast but given its a very new field, I'd expect as much. But outside of that, I don't see it.
You end up with about 3 lines added per commit, which is not ridiculous when you consider that most would be editions rather than full additions.
Here, we have 1500 PRs and 1M LOC, which is about 650 added LOC per PR. Remember, not 650 lines total in the PR, but +650 balance after additions-removals.
Fun questions for attentive readers:
- What does a project growing at a rate of one full firefox-codebase worth of LOC per year look like, a decade down the line?
- What does the line count say about the verbosity of the tool, and what does it say about outcomes that the purpose of the project isn't clearly disclosed?
- Do we have reasons to care about LOC in a world where we don't write code manually? What happens to token usage numbers when the codebase is significantly larger?
- If it was confirmed that LLM usage blows up your line count, what's the implication for codebases that want to return to manual coding after months of usage? (Say, because the tool gets expensive).
For some reason most of the uses of "agents" are to build yet other AI products, it's turtles all the way down. Maybe that says more about the field of harnesses than it does about the power of "agents".
Agents help a ton with the discovery, but the act of building a product needs a deeper level of thought and validation to make it actually better than what came before. So IMO what you see is people still learning what needs to be understood and crafted first hand to make a product better (including economics)
We’ll get there if more of us try
To what end and what would that even look like though? Enshittifying everything at maximum speed? The apps/platforms I use regularly - GitHub, Spotify, Google maps (just to name a few), have gotten noticeably shittier in recent times.
This almost reeks of "I've never cleaned up our code base because there is too much code, and didn't even bother having agents/LLM cleaning them up".
You almost never need a million lines of code - this includes your software, infra, testing and operational tools. You didn't ship the linux kernel in 3 weeks and you know it. The code is already speghetti and it achieve the basic functions OK but it will harder and harder to simplify and untangle and maintain.
Many times those updates are not properly tested, for example in one update the model selector got completely changed.
then next hotfix was pushed which restored original.
It's interesting this was submitted to HN over 15 times since it was published in February: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
But this is the only submission that's had traction at all. Since the content is nearly the same for all submissions, it highlights how getting to the front page can be a bit random. (Though this is the only one that capitalized 'Leveraged' so maybe that's the secret)
Forcing readers to wade through an unceasing string of LLM clichés demonstrates the opposite of the point you’re trying to make—that the consumers of your work are worse off because you exercised no human judgment in creating it.