How to train your program verifier

(risemsr.github.io)

58 points | by matt_d 4 days ago

5 comments

  • woodruffw 7 hours ago
    At a very quick look, no evidence is given that the "bugs" found in requests are in fact reachable, i.e. not prevented by construction. And sure enough, the very first one is impossible because of a validating guard[1]: `address_in_network` only gets called after `is_valid_cidr`, which enforces the presence of a slash.

    I think we should hold claims about effective static analysis and/or program verification to a higher standard than this.

    [1]: https://github.com/psf/requests/blob/4bd79e397304d46dfccd76f...

    • JimDabell 5 hours ago
      > the very first one is impossible because of a validating guard[1]: `address_in_network` only gets called after `is_valid_cidr`, which enforces the presence of a slash.

      It’s correct to flag this code. The check is performed manually outside of the function in question. If you call the function directly, the bug surfaces.

      There is no mention in the function documentation of the validation requirement, making it easy to call incorrectly. Also, if it is required to call the validator before calling this function, then the function could just call it itself.

      In short, it’s possible to make this code safe by definition, but instead it relies upon the developer to always make the undocumented right choices every single time it is called. I would expect something more rigorous from verified code.
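      A minimal sketch of that defensive alternative (hypothetical code, not requests’ actual implementation; the stdlib `ipaddress` module stands in for requests’ hand-rolled parsing): the function checks its own precondition instead of trusting every caller to have run the validator first.

```python
import ipaddress

def address_in_network(ip: str, net: str) -> bool:
    """Return True if `ip` falls inside the CIDR block `net`.

    Hypothetical rewrite: validates its own input rather than
    assuming the caller ran a separate is_valid_cidr() check first.
    """
    # The guard the external validator used to provide, now internal:
    if "/" not in net:
        raise ValueError(f"invalid CIDR (missing '/'): {net!r}")
    try:
        network = ipaddress.ip_network(net, strict=False)
        address = ipaddress.ip_address(ip)
    except ValueError as exc:
        raise ValueError(f"invalid address or network: {exc}") from exc
    return address in network
```

      With this shape, `address_in_network("192.168.1.5", "192.168.1.0")` fails loudly with a clear error instead of crashing somewhere deep in string parsing, and no caller-side discipline is required.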

      • sebastianmestre 26 minutes ago
        > I would expect something more rigorous from verified code.

        I think you just want the illusion of safety :p

        A big advantage of verified code is that it enables you to write the sketchy and dangerous-looking code BECAUSE it's proven correct.

        In fact, skipping as many safety checks as possible is highly desirable. For performance, yes, but also because it's less code to maintain.

        Our tools already do this to some extent. E.g. compilers that remove bounds or type checks from the generated code when they can prove they aren't needed.

      • teraflop 4 hours ago
        That doesn't mean there's a problem with the code, only with the documentation. So the article is wrong to call it a "real bug". At most it's poor code style that could theoretically lead to a bug in the future.

        There's nothing inherently wrong with a function throwing an exception when it receives invalid input. The math.sqrt function isn't buggy because it fails if you pass it a negative argument.

        • Someone 3 hours ago
          > That doesn't mean there's a problem with the code, only with the documentation.

          I disagree. If the obvious way to use an API is the incorrect way, there is a problem with the code.

          If you must call A each time before calling B, drop A and have B do both things.

          If you must call A once before calling B, make A return a token that you then must pass to B to show you called A.
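          The token pattern can be sketched in Python (all names hypothetical):

```python
import ipaddress
from dataclasses import dataclass

@dataclass(frozen=True)
class ValidatedCIDR:
    """Proof token: meant to be obtainable only via validate_cidr()."""
    text: str

def validate_cidr(net: str) -> ValidatedCIDR:
    # Step A: minting the token requires passing validation.
    if "/" not in net:
        raise ValueError(f"not a CIDR block: {net!r}")
    ipaddress.ip_network(net, strict=False)  # raises on garbage
    return ValidatedCIDR(net)

def address_in_network(ip: str, net: ValidatedCIDR) -> bool:
    # Step B: the signature demands the token, so "forgot to validate"
    # becomes visible at the call site (and to a type checker).
    return ipaddress.ip_address(ip) in ipaddress.ip_network(net.text, strict=False)
```

          In Python the token is forgeable (nothing stops a caller from constructing `ValidatedCIDR` directly), so this documents the obligation rather than enforcing it; in a language with private constructors the guarantee can be made real.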

          As another example, look at https://blog.trailofbits.com/2026/02/18/carelessness-versus-... (HN discussion: https://news.ycombinator.com/item?id=47060334):

          “Two popular AES libraries, aes-js and pyaes, “helpfully” provide a default IV in their AES-CTR API, leading to a large number of key/IV reuse bugs. These bugs potentially affect thousands of downstream projects.”

          Would you call that “poor code style that could theoretically lead to a bug in the future”, too?
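          The failure mode behind that quote can be shown without real AES: CTR mode turns a block cipher into a keystream XORed with the plaintext, so two messages encrypted under the same key and IV share a keystream, and XORing the ciphertexts cancels it out. A toy sketch (SHA-256 as a stand-in keystream generator; this is NOT real AES-CTR and not secure):

```python
import hashlib

def toy_ctr_encrypt(key: bytes, iv: bytes, plaintext: bytes) -> bytes:
    """Toy CTR mode: keystream = hash(key || iv || counter). NOT real crypto."""
    stream = b""
    counter = 0
    while len(stream) < len(plaintext):
        stream += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(p ^ s for p, s in zip(plaintext, stream))

key = b"secret key"
default_iv = b"\x00" * 16  # a library-provided default IV

c1 = toy_ctr_encrypt(key, default_iv, b"attack at dawn!")
c2 = toy_ctr_encrypt(key, default_iv, b"retreat at dusk")

# Same key + same IV => same keystream; XOR of the two ciphertexts
# leaks the XOR of the two plaintexts, with no key required:
leak = bytes(a ^ b for a, b in zip(c1, c2))
assert leak == bytes(a ^ b for a, b in zip(b"attack at dawn!", b"retreat at dusk"))
```

          A default IV makes this the path of least resistance: every caller who accepts the default silently shares a keystream with every other such caller.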

    • seanmcdirmid 5 hours ago
      Most (all?) static analyzers are conservative, and reducing the false positive rate is a constant struggle. You should never expect a false positive rate of zero (that’s probably impossible), but you shouldn’t present your false positives as successes either.
      • woodruffw 5 hours ago
        Sure, but this one doesn’t pass the sniff test. I’ve written plenty of static analysis tools (including ones that do symbolic execution), and one of the first things you do to ensure that your results are valid is create some model of tainting/reachability. Even an analysis that’s 1-callsite sensitive would have caught this and discarded it as a false positive.

        (In case it isn’t clear, I’m saying this is slop that someone whipped up and didn’t even bother to spot check.)
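        As a rough illustration of what even a crude callsite-sensitive filter could look like (hypothetical names, a toy AST scan, not real taint analysis or symbolic execution): discard a finding when every call to the flagged function sits under an enclosing `if` whose test mentions the validator.

```python
import ast

GUARD = "is_valid_cidr"
TARGET = "address_in_network"

def call_is_guarded(source: str) -> bool:
    """True if every call to TARGET in `source` is under some enclosing
    `if` whose test names GUARD. A crude callsite-sensitive filter."""
    tree = ast.parse(source)

    def guarded(node, guards):
        if isinstance(node, ast.Call) and getattr(node.func, "id", None) == TARGET:
            return any(guards)  # at least one enclosing guard mentions GUARD
        if isinstance(node, ast.If):
            names = {n.id for n in ast.walk(node.test) if isinstance(n, ast.Name)}
            inside = GUARD in names
            return (all(guarded(c, guards + [inside]) for c in node.body)
                    and all(guarded(c, guards) for c in node.orelse))
        return all(guarded(c, guards) for c in ast.iter_child_nodes(node))

    return guarded(tree, [])

caller = """
if is_valid_cidr(net):
    address_in_network(ip, net)
"""
assert call_is_guarded(caller)                               # guarded: drop the finding
assert not call_is_guarded("address_in_network(ip, net)")    # unguarded: keep it
```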

  • saithound 6 hours ago
    What if you asked your favorite AI agent to produce mathematics at the level of Vladimir Voevodsky, Fields Medal-winning, foundation-shaking work but directed toward something the legendary Nikolaj Bjørner (co-creator of Z3) could actually use?

    Well, you'd get this embarrassing mess, apparently.

    • geraneum 1 hour ago
      That’s because they didn’t add “and don’t make mistakes!”.

      And yes, the exclamation mark matters!

      • grey-area 14 minutes ago
        Should have used ultrathink. I'm disappointed this is not called deep thought.
  • grey-area 4 hours ago
    I miss the days when humans submitted things they had done to this site, instead of long slop articles generated in 5 minutes (‘LLM‑based code synthesis—while mind-numbingly effective—’) about slop code they generated in 5 minutes (or, worse, in hours) with foolish prompts: ‘Produce mathematics at the level of Vladimir Voevodsky, Fields Medal-winning, foundation-shaking work’.

    Should we even read this, or should we get an LLM to summarise it into a few bullet points again?

    This bit was interesting in illuminating the human authors’ credulity (assuming they believe in their own article):

    ‘The central move was elegant: stop asking only “is the system safe?”, start asking “how far is it from safety?”’

    This ersatz profundity couched in a false opposition is common in generated text - does it have anything at all to do with the code generated or is it all just convincing bullshit?
