GLM-5V-Turbo: Toward a Native Foundation Model for Multimodal Agents

(arxiv.org)

51 points | by gmays 1 hour ago

3 comments

julius 0 minutes ago
Click coordinates. Agentic GUI is really annoying when the multi-modal agent cannot click on x,y coordinates.
I tested Qwen3.6, Gemma4, Nemotron3-nano. They fully hallucinate x,y coords.
GPT-5.5 can easily do it. But also Vocaela, a tiny 500M model, is quite good at it. Hope they improve the training for x,y clicking soon on the smallish multi-modals.
Recently slopped a http service together just so my local models can click, instead of relying on all the wild ways agents currently hack into the browser (browser-use, browser-harness, agent-browser, dev-browser etc) https://github.com/julius/vocaela-click-coords-http
gertlabs 1 hour ago
GLM-5V-Turbo is a model I wanted to like due to its speed and API reliability, but it didn't perform well in our coding and reasoning testing. More recent open source models have made it obsolete. GLM 5.1 is so many light years ahead of it on everything except speed, that I'm not sure why it's still being served.
Comprehensive evaluation results at https://gertlabs.com/rankings
[-]
- gruez 9 minutes ago
  >but it didn't perform well in our coding and reasoning testing
  >Comprehensive evaluation results at https://gertlabs.com/rankings
  But if you go to the linked site, it seems like the only thing that's part of the evaluation is how well the models play various games? I suppose that counts as "reasoning", but I don't see how coding ability tested?
- BugsJustFindMe 11 minutes ago
  This may be a strange request, but is it at all possible to include Cursor's Composer models in your tests?
- XYen0n 1 hour ago
  GLM-5.1 does not support image input.
- scotty79 46 minutes ago
  I think the point is to use them both with GLM 5.1 delegating vision tasks to GLM-5V-Turbo
muddi900 1 hour ago
z.ai will use quantized models in off hours. Buyer beware
[-]
- _aavaa_ 44 minutes ago
  Do you have proof for this?
- yogthos 33 minutes ago
  I have a subscription and I have not seen any difference in performance during on/off hours. What exactly are you basing this on?