7 comments

  • alecco 3 minutes ago
    Are Googlers themselves happy using Gemini coding agent instead of Claude Code or Codex? (no snark, I'm really asking)
  • baq 3 minutes ago
    RSI (recursive self-improvement) is here at both the hardware and the software level. Sprinkle in a couple of algorithmic breakthroughs and the results are nigh unimaginable.
  • pingou 5 minutes ago
    AI improving itself (or at least the architecture it runs on): the singularity is near, as they say.

    Do we have other examples of AI being used to improve the LLMs, apart from the creation of synthetic data and the testing of the models?

  • kadam2576 47 minutes ago
    Results are real but the setup is doing a lot of work. Every win here (scheduling, kernels, chip design) is in a domain with well-defined automated metrics and years of prior optimization. That's the ideal case for evolutionary search. The question isn't whether it works at Google, it's how much comes from the agent vs. the evaluation infrastructure wrapped around it.
    • stalfie 35 minutes ago
      Well, if the evaluation infrastructure is something humans already had access to, and the agent's key "skill" is just being a more patient and scalable worker, I would still argue that this "comes from the agent".

      Humans get bored, impatient, or run out of time, and so often settle for what they perceive to be a decent "local minimum". Early verification harnesses using GPT-4 for optimizing robot reward functions succeeded largely because the LLM just kept going (link below). As long as it is too boring for a human to use the same evaluation infrastructure, this is still an agent skill.

      https://arxiv.org/abs/2310.12931
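
      A toy sketch of the dynamic described above: a mutate-and-score loop over a well-defined automated metric, whose main "skill" is that it never gets bored. This is an illustration (in the spirit of the classic "weasel program"), not anyone's actual pipeline; the fitness function and all names are invented for the example.

      ```python
      import random

      TARGET = "EVOLUTIONARY SEARCH"
      ALPHABET = " ABCDEFGHIJKLMNOPQRSTUVWXYZ"

      def fitness(s):
          # The "evaluation infrastructure": a cheap, fully automated metric.
          return sum(a == b for a, b in zip(s, TARGET))

      def mutate(s, rng):
          # Change one character at random.
          chars = list(s)
          chars[rng.randrange(len(chars))] = rng.choice(ALPHABET)
          return "".join(chars)

      def evolve(rounds=20000, rng=None):
          rng = rng or random.Random(0)
          best = "".join(rng.choice(ALPHABET) for _ in TARGET)
          best_f = fitness(best)
          # A human would stop after a few hundred tedious iterations;
          # the loop just keeps scoring candidates against the metric.
          for _ in range(rounds):
              cand = mutate(best, rng)
              f = fitness(cand)
              if f >= best_f:  # keep improvements (and neutral drift)
                  best, best_f = cand, f
          return best
      ```

      The point of the sketch: nothing here is clever, and the metric (like the scheduling or kernel benchmarks in the article) existed before the agent did; the win comes from patiently spending thousands of evaluations a person wouldn't.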