Harness engineering: Leveraging Codex in an agent-first world

(openai.com)

47 points | by pramodbiligiri 1 day ago

13 comments

bko 1 hour ago
> We had weeks to ship what ended up being a million lines of code... Five months later, the repository contains on the order of a million lines of code across application logic, infrastructure, tooling, documentation, and internal developer utilities. Over that period, roughly 1,500 pull requests have been opened and merged with a small team of just three engineers driving Codex. This translates to an average throughput of 3.5 PRs per engineer per day, and surprisingly the throughput has increased as the team has grown to now seven engineers. Importantly, this wasn’t output for output’s sake: the product has been used by hundreds of users internally, including daily internal power users.
That's an insane level of throughput. What's a good baseline? Prior to agentic coding, what was the typical number of PRs engineers were expected to push? Maybe 2-10?
Do people feel the software has gotten better in the last 6 months? The number of engineers is probably the same, so we should expect maybe 5x faster cycle in major software apps, but I don't see it. The AI apps do change very fast, but given it's a very new field, I'd expect as much. Outside of that, I don't see it.
- torben-friis 50 minutes ago
  Here's a fun one: Firefox lists its current count at about 2.5M LOC, from roughly 1M commits over the years.
  You end up with about 3 lines added per commit, which is not ridiculous when you consider that most commits would be edits rather than pure additions.
  Here, we have 1500 PRs and 1M LOC, which is about 650 added LOC per PR. Remember, that's not 650 lines total in the PR, but a net +650 after additions minus removals.
  Fun questions for attentive readers:
  - What does a project growing at a rate of one full Firefox codebase's worth of LOC per year look like a decade down the line?
  - What does the line count say about the verbosity of the tool, and what does it say about outcomes that the purpose of the project isn't clearly disclosed?
  - Do we have reasons to care about LOC in a world where we don't write code manually? What happens to token usage numbers when the codebase is significantly larger?
  - If it was confirmed that LLM usage blows up your line count, what's the implication for codebases that want to return to manual coding after months of usage? (Say, because the tool gets expensive).
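A rough check of the arithmetic above, as a minimal sketch. It uses only the approximate figures quoted in this thread. It assumes the current line count can be treated as the sum of net additions per change, and that growth stays linear when extrapolating to a year. The variable and function names are illustrative only.

```python
# Approximate figures as quoted in the thread; none of these are exact.
FIREFOX_LOC = 2_500_000      # "about 2.5M LOC"
FIREFOX_COMMITS = 1_000_000  # "roughly 1M commits"
PROJECT_LOC = 1_000_000      # "on the order of a million lines"
PROJECT_PRS = 1_500          # "roughly 1,500 pull requests"
PROJECT_MONTHS = 5           # "Five months later"


def net_loc_per_change(total_loc: int, changes: int) -> float:
    """Average net lines (additions minus removals) per commit or PR."""
    return total_loc / changes


firefox_per_commit = net_loc_per_change(FIREFOX_LOC, FIREFOX_COMMITS)
project_per_pr = net_loc_per_change(PROJECT_LOC, PROJECT_PRS)

# Assumption: linear extrapolation of the five-month growth to twelve months.
annual_growth = PROJECT_LOC / PROJECT_MONTHS * 12

print(f"Firefox: {firefox_per_commit:.1f} net LOC per commit")
print(f"Project: {project_per_pr:.0f} net LOC per PR")
print(f"Project: {annual_growth:,.0f} net LOC per year at this rate")
```

Under these assumptions the per-PR figure comes out slightly above the "about 650" quoted, and the yearly growth is close to the Firefox line count cited above.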
- krackers 1 hour ago
  They never specified what exactly the product was, without which it's impossible to judge the post.
  For some reason most of the uses of "agents" are to build yet other AI products, it's turtles all the way down. Maybe that says more about the field of harnesses than it does about the power of "agents".
  - becomevocal 39 minutes ago
    Feels like the active discovery going on is trying to understand what is computer vs what is AI, for every product.
    Agents help a ton with the discovery, but the act of building a product needs a deeper level of thought and validation to make it actually better than what came before. So IMO what you see is people still learning what needs to be understood and crafted firsthand to make a product better (including economics).
    We’ll get there if more of us try
- aleqs 32 minutes ago
  > should expect maybe 5x faster cycle in major software apps
  To what end, and what would that even look like? Enshittifying everything at maximum speed? The apps/platforms I use regularly (GitHub, Spotify, Google Maps, to name a few) have gotten noticeably shittier in recent times.
- Aperocky 57 minutes ago
  > ended up being a million lines of code
  This almost reeks of "I've never cleaned up our code base because there is too much code, and didn't even bother having agents/LLMs clean it up".
  You almost never need a million lines of code, and that includes your software, infra, testing and operational tools. You didn't ship the Linux kernel in 3 weeks and you know it. The code is already spaghetti. It achieves the basic functions OK, but it will get harder and harder to simplify, untangle, and maintain.
  - bombcar 21 minutes ago
    Even the Linux kernel doesn't need millions of lines of code; most of the actual LOC is device drivers, and you don't need all of them, you just need the ones for the devices you have.
    - Chu4eeno 2 minutes ago
      And Linux maintainers are actively pushing to radically cut down on the LOC by eliminating drivers etc.
  - girvo 55 minutes ago
    Yeah I cannot see how "we shipped 1 million lines of code in three weeks" is... something to be proud of haha
zatkin 8 minutes ago
I worry most about blind spots with this kind of approach. Let's say that this repository goes on for years, at which point the docs folder is several MB in size. Would Codex be able to think outside of the box? Or would the aggregate of the Markdown content cover so much ground that it prevents Codex from thinking of novel approaches to existing problems?
faangguyindia 22 minutes ago
Codex updates seem to appear every few hours (I am not saying this is how often they're published, but that's my perception as a user). Often I update Codex just to see a new update within an hour or so.
Many times those updates are not properly tested. For example, in one update the model selector got completely changed.
Then the next hotfix was pushed, which restored the original.
- dawnerd 16 minutes ago
  Who needs a QA team when you can just test on users and iterate instantly /s
varenc 24 minutes ago
digression:
It's interesting this was submitted to HN over 15 times since it was published in February: https://hn.algolia.com/?dateRange=all&page=0&prefix=false&qu...
But this is the only submission that's had traction at all. Since the content is nearly the same for all submissions, it highlights how getting to the front page can be a bit random. (Though this is the only one that capitalized 'Leveraging', so maybe that's the secret.)
murat124 31 minutes ago
The other day I came across a video showing workers in an e-vape factory. They pick up a bunch of e-vapes from the conveyor belt (each bunch has 6 e-vapes, I think), stick them in their mouth and vigorously vape all of them for about 5 seconds, then test the next bunch. Humans reviewing hundreds of lines of change in a PR written by AI is not very different.
bronny1989 4 minutes ago
why do you have “weeks” to ship what would take “months”?
rfw300 52 minutes ago
I understand that they've written zero lines of code for this application, but would it kill them to write a few lines of the blog post by hand?
Forcing readers to wade through an unceasing string of LLM clichés demonstrates the opposite of the point you’re trying to make—that the consumers of your work are worse off because you exercised no human judgment in creating it.
angrydev 40 minutes ago
Published Feb 11, 2026
darepublic 56 minutes ago
Codex pushed an update that made my old threads inaccessible. It takes a million lines to put out a half-baked CRUD app?
drchaim 46 minutes ago
But this is almost what we have been doing for the last 3-5 months, isn't it?
- fbrncci 29 minutes ago
  Well to a lot of people this is still a foreign concept.
- wilsonnb3 45 minutes ago
  Article is from February so that tracks
Sarkie 59 minutes ago
I would never dare put that in production
jlintc 56 minutes ago
[flagged]
knicholes 51 minutes ago
Everyone is criticizing the number of lines of code and the lack of attention that must certainly have been applied to generate that code and push it into production. What is being ignored is this awesome prompt that is almost certainly better than having no agents.md or plans.md or whatever you've come up with, to add validation steps for committed changes. You're still free to look at your code, the changes, and ask the agent to clean up. Try it. It's really nice.