Qwen3.7-Max Ran for 35 Hours on Unknown Hardware and Achieved a 10× Speedup

(firethering.com)

35 points | by steveharing1 2 days ago

9 comments

big-chungus4 39 minutes ago
This article was generated from the original Qwen3.7-Max release blogpost and contains nothing new https://qwen.ai/blog?id=qwen3.7
l23k4 1 hour ago
LLM written.
See the author's Twitter: he speaks English at a rather basic level and certainly did not write this. https://x.com/mohitgeryani/with_replies
- geek_at 1 hour ago
  https://xcancel.com/mohitgeryani/with_replies
- Mashimo 1 hour ago
  Also I'm pretty sure the original source was linked here on HN before.
keyle 1 hour ago
I don't doubt that it did it but I wouldn't want to maintain whatever it ended up spewing after 35 hrs.
In my experience, AI fixes problems by mostly adding more code.
It's a short term gain for a long term hurt.
- userbinator 1 hour ago
  > In my experience, AI fixes problems by mostly adding more code.
  In my experience, humans unfortunately tend to do the same.
  - Iolaum 56 minutes ago
    The LLMs had to learn that from somewhere :p
  - Balinares 34 minutes ago
    Some do, but we should not be level-setting at mediocrity.
- bahmboo 1 hour ago
  How do you know that? What information do you have that would explain your position? We are talking about a specific circumstance and you have brought unsupported generalities to the discussion.
skew-aberration 49 minutes ago
I've had a very similar experience optimising a hidden markov model prediction tool I work on. I wanted to experiment with an alternative architecture and data structures. Opus 4.7 did the refactor, and eventually the only hot spot became the maths kernel. Over the course of an hour or two it iteratively rewrote that code using all the usual optimisations to improve branching, cache usage, vectorisation, etc. It reviewed the disassembly and the hardware counters with perf to verify that the changes were working as intended. It could have taken me several days to cover that much ground doing low level optimisations - and I would have spent most of it grappling with gcc, perf, searching for information about particular SIMD instructions, etc.
trilogic 1 hour ago
Don't give up on native agents, best logic will prevail. The open weights will show the real deal.
teravor 39 minutes ago
so basically just brute force the kernel.
there are more elegant ways to leverage an LLM, see AlphaEvolve: https://arxiv.org/abs/2506.13131
it's difficult to frame most coding tasks in such a way where you can trivially verify correctness.
- m00x 27 minutes ago
  [flagged]
mannyv 1 hour ago
At this point the models should just start improving themselves.
- singingtoday 1 hour ago
  Rumor is that Anthropic writes all their code with Claude. So it kind of is.
mosselman 50 minutes ago
What a nonsense, generated article.
> For context: GLM 5.1 ran the same task and reached 7.3x. Kimi K2.6 reached 5x. DeepSeek V4 Pro reached 3.3x. The models that stopped early did so because they issued no tool calls for five consecutive rounds, they concluded they couldn’t make further progress and stopped. Qwen3.7-Max didn’t stop.
By this reasoning I could release a model that lacks all the basic optimisations. Have it optimise itself for hours to reach 20x the throughput and then claim that the model is superior to the others?
I am not saying that is what happened here, but the reporting is abysmal.
- rurban 34 minutes ago
  It is not the model's job to stop or continue, it's the agent's. Qwen has nothing to do with it.
  Right now I've switched to the latest codewhale agent (in Rust), and it would perform much better according to his qualifications. Much better async IO implementation and orchestration, no more deadlocks as in the typical TypeScript tooling. It just doesn't stop out of the blue, as Claude, Kimi or opencode do.
- big-chungus4 41 minutes ago
  It optimized the Extend Attention operator in Triton. All models were optimizing the same operator.
- hobofan 44 minutes ago
  They didn't optimize their own kernels and optimize their own runtime, which I think is what you are implying.
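
The stopping rule quoted above (a run ends once five consecutive rounds issue no tool calls) is something an agent harness enforces around the model, which is the point rurban makes. A minimal sketch of such a rule, assuming a hypothetical `run_agent` loop and a hypothetical `step` callback that reports how many tool calls a round produced; neither name comes from the Qwen post or from any real agent framework:

```python
from typing import Callable


def run_agent(step: Callable[[int], int], max_rounds: int, idle_limit: int = 5) -> int:
    """Run rounds until `idle_limit` consecutive rounds issue no tool calls.

    `step(round_index)` is a hypothetical callback that returns the number of
    tool calls the model issued in that round. Returns how many rounds ran.
    """
    idle = 0  # consecutive rounds without any tool call
    for round_index in range(max_rounds):
        calls = step(round_index)
        idle = idle + 1 if calls == 0 else 0
        if idle >= idle_limit:
            return round_index + 1  # stopped early: the model went quiet
    return max_rounds  # hit the round budget without going idle
```

Under this sketch, how long a run lasts depends on `idle_limit` and `max_rounds` as much as on the model, so run length alone says little about model quality.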
yjftsjthsd-h 1 hour ago
Obligatory: Either written by AI or by a human who has spent so much time with AI that they adopted its writing style. Anyways.
> Over 35 hours it performed 432 kernel evaluations. Each cycle meant writing code, compiling it, running it, reading the profiling output, deciding what to change, and trying again. The model diagnosed compilation failures it hadn’t seen before, identified performance bottlenecks through runtime feedback rather than prior knowledge, and redesigned the kernel architecture multiple times when incremental improvements stopped working.
Anyone remember genetic algorithms? This might be an improvement, but it still feels a little like deja vu.
- thatoneguy 1 hour ago
  Yeah, I remember. I still have Usenet postings about the genetic algorithms conference back in the '90s and some magazine clippings about researchers from the University of Sussex where I first learned about genetic algorithms back in high school.
- dist-epoch 58 minutes ago
  Genetic algorithm is random. This is intelligent evolution. Big difference.
  - Kim_Bruning 28 minutes ago
    I got nerd-sniped wrt the genetic algorithm.
    Technically birdshot from a shotgun is also randomly distributed (passing through a cone). This actually improves the chance of hitting the clay pigeon, because the birdshot spreads out and each individual ball has a chance to hit.
    Genetic algo is similar. It's an optimizer that - in order to avoid local optima - will 'shotgun' an area around its current best guess.
  - qrobit 26 minutes ago
    Both are non-deterministic, both have some metric to optimise, one is specific and efficient, the other is too broad and very expensive
  - rurban 32 minutes ago
    Temperature is the rephrasing of randomness. No difference, just much better matchers.
- greenavocado 1 hour ago
  Key word is non-differentiable optimization. That's what genetic algorithms were traditionally good at.
- Zardoz84 1 hour ago
  So an LLM wrote 432 kernel variations and found which was the fastest...
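
For readers who have not met the "shotgun" picture of genetic algorithms from the thread above, here is a minimal sketch. It assumes a mutation-only search with no crossover, so it is closer to a simple evolution strategy than a full genetic algorithm, and the names `shotgun_search`, `pellets` and `spread` are made up for illustration. It only needs a score function, not a gradient, which is why this family suits non-differentiable problems:

```python
import random


def shotgun_search(score, start, spread=1.0, pellets=20, generations=200, seed=0):
    """Maximise `score` by scattering random candidates around the best guess.

    Each generation fires `pellets` Gaussian-perturbed copies of the current
    best point and keeps the top one only if it beats the incumbent.
    """
    rng = random.Random(seed)  # seeded so runs are repeatable
    best = list(start)
    best_score = score(best)
    for _ in range(generations):
        # The "shotgun" step: sample a cloud of candidates around the best guess.
        candidates = [
            [x + rng.gauss(0.0, spread) for x in best] for _ in range(pellets)
        ]
        top = max(candidates, key=score)
        top_score = score(top)
        if top_score > best_score:  # elitism: never accept a worse point
            best, best_score = top, top_score
    return best, best_score
```

A usage example on a step-shaped objective that has no useful gradient:

```python
def steps(point):
    # Rounding makes the surface piecewise constant, so gradients are zero almost everywhere.
    return -abs(round(point[0]) - 3) - abs(round(point[1]) + 2)

best, best_score = shotgun_search(steps, [0.0, 0.0])
```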