Rio de Janeiro's "homegrown" LLM appears to be a merge of an existing model

(github.com)

73 points | by unrvl22 1 hour ago

13 comments

zinodaur 46 minutes ago
Oh no, someone is profiting off of their work without proper attribution!?!?
- carlosjobim 19 minutes ago
  This is a pure scam on taxpayer money. But what else would be expected?
  - jrm4 4 minutes ago
    Unlike the big companies who do this, which are often merely impure scams on taxpayer money, just a little further downstream.
- internet2000 44 minutes ago
  Attribution isn't the relevant part. Lying about your lab's capabilities is.
  - Planktonne 35 minutes ago
    That's also something all the AI companies have been doing.
    - dofm 18 minutes ago
      Lying about model capability is, right now, almost the lingua franca of the cloud AI business model. They yes-and each other's lies because they need to generate interest, even going as far as trying to trigger regulatory capture.
      (It's not news to anyone who has worked in sales-led businesses that salespeople are prone to believing the claims of other salespeople, I guess.)
  - adrian_b 24 minutes ago
    I do not see anyone lying.
    The model card says:
    > Post-trained from Qwen 3.5 397B
    The model card also says that they use an inference framework based on "SwiReasoning: Switch-Thinking in Latent and Explicit for Pareto-Superior Reasoning LLMs" by Shi et al.:
    https://arxiv.org/abs/2510.05069
    So the sources seem properly attributed.
    They only claim that what they did to "Qwen 3.5 397B" has improved the LLM, including, as expected, with "strong performance in Portuguese".
  - functionmouse 24 minutes ago
    leopards ate my face
- woadwarrior01 20 minutes ago
  Are you new to the latest AI hype cycle? /s
- bachmeier 24 minutes ago
  "Their work"? First you had the original content creators that did 99.99% of the work. Then you had the US companies bundle it up into a frontier LLM. Then "they" did the "work" of using the US model as a foundation for their own. So in the sense of doing 0.00001% of the actual work that went into their product, sure.
  I'd say it's more like someone forking a Linux distro, adding a few themes and fonts, and then complaining when someone else forks their distro and adds another theme.
  - JoshStrobl 14 minutes ago
    That joke really went over your head, huh...
  - bwilliams18 17 minutes ago
    That was the joke of the parent comment.
  - dghlsakjg 18 minutes ago
    That’s the joke.
  - harikb 17 minutes ago
    It is only a problem if you claim it to be an independently developed OS with no attribution to the base.
  - idiotsecant 10 minutes ago
    Oof, this is delete-your-post level, I think. Sorry bud, I've been there.
unrvl22 1 hour ago
The municipality of Rio de Janeiro (via its IT company IplanRIO) released Rio-3.5-Open-397B, presented as a homegrown Qwen3.5 fine-tune that beats comparable open models on benchmarks. The linked issue argues it's actually a weighted merge of ~60% Nex-N2 Pro + ~40% Qwen3.5-397B-A17B - Nex-N2 having been released about a week earlier.
- Lucasoato 11 minutes ago
  So the problem isn’t the missing attribution to Qwen, but the fact that they didn’t mention Nex-N2 Pro, right?
jrm4 2 minutes ago
“Well, Steve (Jobs), I think it’s more like we both had this rich neighbor named Xerox, and I broke into his house to steal the TV set, but I found out that you had already stolen it.”
-- Bill Gates
fkozlowski 29 minutes ago
I'm honestly surprised that they even had the inclination to attempt creating a model. I guess it's bullish that a municipal IT department had the guts to try this?
ekjhgkejhgk 33 minutes ago
One funny thing about incompetent people is that they don't have the competence to know that their incompetence is straightforward for a competent person to verify.
- thimabi 3 minutes ago
  I wouldn’t describe what happened here as incompetence. As a “carioca”, I am pleasantly surprised to know that the government’s IT department is involved in AI work — even without the budget to create its own models from scratch.
- root-parent 28 minutes ago
  You just described every single vibe coder...
- carlosjobim 16 minutes ago
  Why would they care? They get their salaries and pensions and bonuses, and the taxpayer is footing the bill.
AlienRobot 46 minutes ago
The model's webpage at https://huggingface.co/prefeitura-rio/Rio-3.5-Open-397B says it's a merge now. It previously didn't contain this paragraph:
>The model is built via a merge of https://huggingface.co/nex-agi/Nex-N2-Pro and https://huggingface.co/Qwen/Qwen3.5-397B-A17B, proceeded by On-Policy Distillation from a stronger model. We detected an incorrect upload in the previous version, where the base merged version was upload instead of the final distilled model. We are sorry for the confusion and apologize profusely.
Incidentally, are people using GitHub issues as blogs now?
MadrasTh0rn 33 minutes ago
Not surprised
AnotherGoodName 53 minutes ago
It's fascinating that it worked, though. Can we just merge all the open-weight models and get something better?
- wds 42 minutes ago
  I imagine it'd work the same as merging all the good-tasting foods to get an even tastier one
- avereveard 20 minutes ago
  Most merges improve a small subset of "feeling" benchmarks (too small, too specific, or out of distribution) and tend to show degradation on actual benchmarks, with especially punishing results on long-chain benchmarks.
  They also only work on matching architectures (i.e. finetunes/LoRAs of the same model).
- dindunuf 27 minutes ago
  That kinda worked in the Llama 1/2 era, not between different models but between finetunes of the same model. The briefly legendary Mythomax was, IIRC, a merge of 5+ tunes, some of which were merges themselves.
- _3u10 47 minutes ago
  No, they need the same arch, but you can distill them into a single model. And yes, if you use the API directly Claude will often say it’s an open weight model (likely the ones it was distilled from)
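For readers wondering what "merge" means mechanically in the replies above: a minimal sketch of a linear weight merge, assuming both checkpoints expose the same parameter names and shapes. The `Weights` type, the `linear_merge` helper, and the toy tensors are hypothetical stand-ins for illustration, not any real library's API and not the procedure any lab is known to have used.

```python
from typing import Dict, List

# Hypothetical stand-in for a model state dict: parameter name -> flat list of values.
Weights = Dict[str, List[float]]


def linear_merge(a: Weights, b: Weights, alpha: float) -> Weights:
    """Return alpha * a + (1 - alpha) * b, parameter by parameter."""
    # A linear merge is only defined when both models share an architecture,
    # so refuse to continue if the parameter names differ.
    if a.keys() != b.keys():
        raise ValueError("architectures differ: parameter names do not match")
    merged: Weights = {}
    for name in a:
        if len(a[name]) != len(b[name]):
            raise ValueError(f"shape mismatch for {name!r}")
        merged[name] = [
            alpha * x + (1.0 - alpha) * y for x, y in zip(a[name], b[name])
        ]
    return merged


# Toy example: a "fine-tune" and its "base" with identical parameter names.
finetune = {"layer0.weight": [1.0, 2.0], "layer0.bias": [0.5]}
base = {"layer0.weight": [0.0, 1.0], "layer0.bias": [0.0]}
merged = linear_merge(finetune, base, alpha=0.6)
```

This is why the replies say merging works between fine-tunes of the same model: with different architectures there is no parameter-by-parameter correspondence to average over.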
alfiedotwtf 40 minutes ago
Wasn’t it already obvious given the awfully familiar parameter numbers?
yieldcrv 26 minutes ago
Didn’t the last thread about this have someone from the lab or an enthusiast in Rio saying exactly that?
It's a fine-tune of Qwen.
Not a conspiracy.
- daemonologist 7 minutes ago
  The allegation here is that it's not actually a fine-tune of Qwen, but instead an undisclosed mashup (merge) of someone else's fine-tune of Qwen and the original model. Rio subsequently said that the model was in fact a merge, that they did additional fine-tuning after the merge, and that they accidentally uploaded the base merge instead of the version with additional fine-tuning. But this seems like quite an oversight...
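One way an allegation like this could be checked, sketched under assumptions: if a tensor M really is alpha * A + (1 - alpha) * B, then M - B is a scalar multiple of A - B, so a least-squares fit recovers alpha with a near-zero residual. Whether the linked issue used this exact method is not stated in the thread; `estimate_mix` and the toy vectors are hypothetical.

```python
from typing import List, Tuple


def estimate_mix(m: List[float], a: List[float], b: List[float]) -> Tuple[float, float]:
    """Fit m ~= alpha * a + (1 - alpha) * b; return (alpha, residual norm)."""
    d = [x - y for x, y in zip(a, b)]  # direction from b to a
    t = [x - y for x, y in zip(m, b)]  # offset of m from b
    denom = sum(x * x for x in d)
    if denom == 0.0:
        raise ValueError("a and b are identical; alpha is undefined")
    # Least-squares projection of t onto d.
    alpha = sum(x * y for x, y in zip(t, d)) / denom
    residual = sum((ti - alpha * di) ** 2 for ti, di in zip(t, d)) ** 0.5
    return alpha, residual


# Toy tensors: build a 60/40 merge, then try to recover the ratio.
a = [1.0, 2.0, 3.0]
b = [0.0, 1.0, 5.0]
m = [0.6 * x + 0.4 * y for x, y in zip(a, b)]
alpha, residual = estimate_mix(m, a, b)  # alpha near 0.6, residual near 0

# A tensor that is not a mix of a and b leaves a large residual.
_, unrelated_residual = estimate_mix([5.0, 5.0, 5.0], a, b)
```

Any fine-tuning done after the merge would move the weights off that line, so a near-zero residual across many tensors would be consistent with an unmodified merge having been uploaded.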
elzbardico 57 minutes ago
This is so typical of Brazilian academia.
- guiraldelli 48 minutes ago
  Without evidence, your comment is just bad mouthing.
  I have been involved in academia, including in Brazil, and I don't find academia there any more copycat than any other institution, including top tier ones.
- dghlsakjg 16 minutes ago
  This was a municipality working with a government-associated IT company.
  What does it have to do with Brazilian academia?
- _3u10 47 minutes ago
  No, typically Brazilians go to Paraguay for their education, most of their technology comes from Paraguay too.
  - knuppar 11 minutes ago
    that's just a lie lol, stop spreading misinformation
  - cassiogo 30 minutes ago
    What? Never heard of this
    - stymaar 7 minutes ago
      That sounds like nonsense; they don't even speak the same language in Brazil and Paraguay…