14 comments

  • Karrot_Kream 1 hour ago
    According to the OpenASR Leaderboard [1], looks like Parakeet V2/V3 and Canary-Qwen (a Qwen finetune) handily beat Moonshine. All 3 models are open, but Parakeet is the smallest of the 3. I use Parakeet V3 with Handy and it works great locally for me.

    [1]: https://huggingface.co/spaces/hf-audio/open_asr_leaderboard

    • reitzensteinm 5 minutes ago
      Parakeet V3 is over twice the parameter count of Moonshine Medium (600m vs 245m), so it's not an apples to apples comparison.

      I'm actually a little surprised they haven't added model size to that chart.

  • nmstoker 44 minutes ago
    Any plans regarding JavaScript support in the browser?

    There was an issue with a demo, but it's missing now. I can't recall for sure, but I think I got it working locally myself too, then it broke unexpectedly and I never managed to find out why.

  • fareesh 40 minutes ago
    Accuracy is often presumed to mean English, which is fine, but "higher" is vague: higher in English only? Higher in some subset of languages? Which ones?

    The minimum useful data for this stuff is a small table of language | WER | dataset.

  • 999900000999 42 minutes ago
    Very cool. Any way to run this in WebAssembly? I have a project in mind.
  • armcat 1 hour ago
    This is awesome, well done guys. I'm gonna try it as the ASR component in the local voice assistant I've been building: https://github.com/acatovic/ova. The tiny streaming latencies you show look insane.
  • ac29 1 hour ago
    No idea why 'sudo pip install --break-system-packages moonshine-voice' is the recommended way to install on a Raspberry Pi.

    The authors do acknowledge this, though, and give a slightly too complex way to do it with uv in an example project. (FYI, you don't need to source anything if you use uv run.)
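
    For comparison, a minimal sketch of the uv flow (the package name is from the README; the import name moonshine_voice is a guess):

    ```shell
    uv init moonshine-demo && cd moonshine-demo   # create a uv-managed project
    uv add moonshine-voice                        # installs into the project's venv
    uv run python -c "import moonshine_voice"     # no `source .venv/bin/activate` needed
    ```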

  • asqueella 1 hour ago
    For those wondering about language support: currently English, Arabic, Japanese, Korean, Mandarin, Spanish, Ukrainian, and Vietnamese are available (most in Base size = 58M params).
  • pzo 1 hour ago
    Haven't tested it yet, but I'm wondering how it behaves with heavy IT jargon and tech acronyms. For that reason I've mostly had to run an LLM after STT, but that slowed down Parakeet inference. Otherwise it sometimes had trouble transcribing terms like CoreML, int8, fp16, half float, ARKit, AVFoundation, ONNX, etc.
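
    A lighter-weight alternative to a full LLM pass, at least for casing, is a simple correction table over known terms. A hedged sketch (the term list and function name are illustrative, not from any library):

    ```python
    import re

    # Hypothetical post-processing pass: restore canonical casing of known
    # tech terms in an ASR transcript without invoking an LLM.
    JARGON = ["CoreML", "ARKit", "AVFoundation", "ONNX", "int8", "fp16"]
    PATTERN = re.compile(
        r"\b(" + "|".join(re.escape(t) for t in JARGON) + r")\b",
        re.IGNORECASE,
    )
    CANONICAL = {t.lower(): t for t in JARGON}

    def fix_jargon(transcript: str) -> str:
        """Replace case-insensitive matches with their canonical spelling."""
        return PATTERN.sub(lambda m: CANONICAL[m.group(0).lower()], transcript)

    print(fix_jargon("we exported the coreml model to onnx with Fp16 weights"))
    # prints: we exported the CoreML model to ONNX with fp16 weights
    ```

    This only helps when the ASR output is phonetically close to the canonical token; genuinely misheard terms still need a smarter pass.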
  • g-mork 1 hour ago
    How does this compare to Parakeet, which runs wonderfully on CPU?
  • aplomb1026 52 minutes ago
    The streaming latency numbers are what stand out to me here. Accuracy benchmarks get all the attention, but for real-time applications (voice assistants, live captioning, in-call transcription), the tail latency matters more than shaving a few points off WER. A 58M param model that can stream with sub-second latency on a Raspberry Pi opens up a whole class of edge applications that just aren't practical with larger models, even if those larger models score higher on static benchmarks.
  • sroussey 1 hour ago
    onnx models for browser possible?
  • alexnewman 43 minutes ago
    If only it did Doric
  • lostmsu 2 hours ago
    How does it compare to Microsoft VibeVoice ASR https://news.ycombinator.com/item?id=46732776 ?
  • cyanydeez 2 hours ago
    No LICENSE no go
    • bangaladore 2 hours ago
      There is a license blurb in the readme.

      > This code, apart from the source in core/third-party, is licensed under the MIT License, see LICENSE in this repository.

      > The English-language models are also released under the MIT License. Models for other languages are released under the Moonshine Community License, which is a non-commercial license.

      > The code in core/third-party is licensed according to the terms of the open source projects it originates from, with details in a LICENSE file in each subfolder.

    • altruios 2 hours ago
      Reading through the README:

      > This code, apart from the source in core/third-party, is licensed under the MIT License, see LICENSE in this repository.

      > The English-language models are also released under the MIT License. Models for other languages are released under the Moonshine Community License, which is a non-commercial license.

      > The code in core/third-party is licensed according to the terms of the open source projects it originates from, with details in a LICENSE file in each subfolder.