10 comments

  • kamranjon 36 minutes ago
    DeepSeek continues to not only push the boundaries but also publish these incredible papers explaining how they achieved their gains - something the American labs no longer do unfortunately. Chinese labs are doing the most interesting work in AI right now.
    • tomalaci 30 minutes ago
      Probably because American AI companies are on the hook for quite a lot of investment money. I think they are trying to find the magical moat to justify their valuation.

      Revealing optimizations similar to these would pretty much reduce their competitive position.

      • lwansbrough 22 minutes ago
        Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.

        I suspect their tune will change if they ever take the lead.

        • oefrha 15 minutes ago
          Which is a good thing. Self-serving motives are more reliable than altruistic ones.
          • nubg 6 minutes ago
            Very interesting take
            • broodbucket 0 minutes ago
              Look at how far OpenAI has drifted from their original mission. Everything comes back to greed, so it's ideal for the world if selfish motives happen to coincide with what's good for the world, like advancements in open models
        • tw1984 19 minutes ago
          > Chinese labs are also still behind, so they’re incentivized to collaborate and have no reason to do it in private.

          US labs at Google, Meta and SpaceX are not leading; none of them has managed to build something on par with GLM 5.2.

          Care to explain to me why they still don't collaborate and still choose to do it in private?

          • vidarh 14 minutes ago
            I'm not sure I'd put Google in that list, but either way: Because they think they have enough capital that they can catch up and don't need the reputational boost of this.
            • CuriouslyC 7 minutes ago
              As good as Gemini's visual intelligence is, it's a terrible agent.
          • budsniffer952 14 minutes ago
            Wait, are you claiming that these companies haven't contributed to the ecosystem via research and open source?
          • lwansbrough 16 minutes ago
            No idea, I don’t work there.
        • colordrops 20 minutes ago
          So the marketplace is working.
          • abc123abc123 11 minutes ago
            This is the way! Open source models will benefit, and once they reach the state of "good enough", the hyped-up US AI companies will have reason to fear, since the availability of free, good-enough AI models will set the ceiling for how much they can charge. Then the bubble will pop.
      • cromka 4 minutes ago
        I'm seriously far from fearmongering or a doomsday mentality, but I just can't see how OpenAI and Anthropic can have successful IPOs if the quality gap between free and paid continues to narrow like that...
      • budsniffer952 16 minutes ago
        Do you think that DeepSeek are building their models for free, or something? They aren't "on the hook" for anything?

        What's with all the China glazing about this stuff? They release some open-source work and people act like they are suddenly the beacon of freedom and transparency.

        • abc123abc123 11 minutes ago
          This is incorrect binary thinking. Them releasing open source can be good, but that does not commit you to thinking that China or Chinese companies are saints. There are many shades of grey here and one does not exclude the other (nor include it).
    • herodoturtle 32 minutes ago
      Publishing by necessity, I wonder? American labs are on the cutting edge, pioneering the way forward, so DeepSeek open-sourcing what they’ve got helps even the playing field.

      Hopefully the experts here can offer insight. The above is just my hunch and I’m not a specialist in this field.

      • jonplackett 30 minutes ago
        Wouldn’t that just help the American labs anyway though? Or do they assume they’ve actually already figured this stuff out and kept it secret?
      • epolanski 2 minutes ago
        Chinese papers and techniques have been very influential and copied by US labs.
      • _0ffh 21 minutes ago
        I'm afraid I even balk at the word "pioneering" in connection with US frontier labs. They are probably doing a few new things, sure, but they are not blazing any trails for others to follow; the Chinese are.
    • epolanski 3 minutes ago
      R1 was very influential on US model development.

      Meanwhile, US labs are pushing nonsense narratives about "distillation", as if having a bunch of post-training QAs to align a model's behaviour made any major difference in the incredible Chinese results.

    • rvz 23 minutes ago
      Exactly. They did not have to open up their research, and this is what happens when smart researchers are forced to squeeze performance gains out of existing hardware.

      They don't have TPUs or access to the latest Vera Rubin GPUs either to get performance gains for free. All of the optimizations DeepSeek have done are in software, and they go down to the PTX assembly level.

      Compared to Anthropic who are celebrating fixing a flickering issue in a terminal app which took months to fix.

      • yorwba 1 minute ago
        Anthropic almost certainly also has optimized software down to the assembly level, considering this take-home interview challenge they published: https://github.com/anthropics/original_performance_takehome/... which is all about instruction-level performance optimizations. That they don't prioritize UI fixes just means they consider other things more important.
      • vidarh 9 minutes ago
        > Compared to Anthropic who are celebrating fixing a flickering issue in a terminal app which took months to fix.

        It's funny, because if you ran Claude Code on a slow terminal, the cause of the flicker was obvious: they kept dumping the entire chat history back into the terminal in a number of situations, and relied on the terminal to then end up in the correct state.

  • Havoc 58 minutes ago
    Nice.

    Guessing the timing isn't accidental. Demonstrated openness vs. harsh regulation.

  • pokot0 15 minutes ago
    I am wondering if this is why they can offer their pro model at ~1/4th of the price compared to the other providers offering the same model, and if other providers will be able to do the same in a short timeframe.
    • vidarh 8 minutes ago
      It'd presumably help a lot, but also when you use their endpoint they get more training data.
      • nicce 5 minutes ago
        This applies to every provider. OpenAI seems to be the worst hoarder.
  • piterrro 49 minutes ago
    I’ve been using DeepSeek v4 pro for a month now in Kilo Code and it's great. Fast, reliable, large context window and cheap as… Did 1.5B tokens this month and it cost me 40 USD (majority cached, but still).
    • spiderfarmer 42 minutes ago
      Is there a way to see how many tokens one uses with Claude Code (Pro)?
  • Jackobrien 56 minutes ago
    I see a world soon where there’s an extremely wide variety of small models for speculative decoding, unique to use cases, companies, and even individuals.
    • nicce 45 minutes ago
      Hopefully that is the case and hardware does not become impossible to get.
    • pydry 33 minutes ago
      yes, heavily constrained by sophisticated guardrails.

      this is definitely where things are going. the enormous "eat the world" models have extreme diminishing returns by comparison.

  • ricardobeat 56 minutes ago
    Presumably this has been in production for a while, and is one of the reasons they were able to dramatically lower prices a month ago?
    • _0ffh 29 minutes ago
      Lookahead Sparse Attention should be playing a big role as well, as it dramatically slashes memory consumption.
  • 2838383838 39 minutes ago
    Must be wonderful to be on the board of OpenAI et al. and their PE investors whilst China keeps blowing up these mines under their feet lmao. Luckily Korean pension funds will buy all the trash as usual, but goddamn, you gotta start moving quick or you are gonna need some serious AGI to show you how to offload those bonds.
    • ozgrakkurt 1 minute ago
      Don’t worry, they will sell all the hardware and data they acquired with their grift.
    • ForHackernews 29 minutes ago
      "We will build the machine-god and pray for it to pay for itself."
      • FridgeSeal 15 minutes ago
        Every day, the rate of “could post a picture of 40k tech priests and have it taken unironically” goes up, and it’s starting to get concerning.
  • rvz 40 minutes ago
    This is just one of many papers DeepSeek have released on how they are able to serve models at extremely cheap prices, unlike the others taking on >$100B of debt to build data centers for the same thing.

    > As with V4-Flash, we treat this point as an indication that DSpark sustains useful throughput under an interactivity target that the baseline cannot efficiently support. At matched system capacities, DSpark delivers 57% to 78% faster per-user generation.

    Reminds me of the flawed 2017-era approach to scaling servers built on memory-intensive technologies: adding even more servers to solve the problem. (It just increases costs.)

    Rather than doing that, think about which critical parts of your app can be written in a more performant technology.

    Fast forward to 2026, and now you can see who is just throwing more money at the problem to create even more problems, whereas DeepSeek is giving us optimized solutions.

    I know exactly who I would pay attention to, and it is absolutely not Anthropic.

  • preetham_rangu 49 minutes ago
    Do they use their own OCR, or someone else's?