I just learned about Handy in this thread and it looks great!
I think the biggest difference between FreeFlow and Handy is that FreeFlow implements what Monologue calls "deep context", where it post-processes the raw transcription with context from your currently open window.
This fixes misspelled names if you're replying to an email / makes sure technical terms are spelled right / etc.
The original hope for FreeFlow was for it to use all local models like Handy does, but with the post-processing step the pipeline took 5-10 seconds instead of <1 second with Groq.
Yes, I also use Handy. It supports local transcription via Nvidia Parakeet TDT2, which is extremely fast and accurate. I also use gemini 2.5 flash lite for post-processing via the free AI studio API (post-processing is optional and can also use a locally-hosted LM).
Since many are asking about apps with simillar capabilities I’m very happy with MacWhisper. Has Parakeet, near instant transcription of my lengthy monologues. All local.
Edit: Ah but Parakeet I think isn’t available for free. But very worthwhile single purchase app nonetheless!
Sounds like there's plenty of interest in those kind of tools. I'm not a huge fun API transcriptions given great local models.
I build https://github.com/bwarzecha/Axii to keep EVERYTHING locally and be fully open source - can be easily used at any company. No data send anywhere.
Could you make it use Parakeet? That's an offline model that runs very quickly even without a GPU, so you could get much lower latency than using an API.
I love this idea, and originally planned to build it using local models, but to have post-processing (that's where you get correctly spelled names when replying to emails / etc), you need to have a local LLM too.
If you do that, the total pipeline takes too long for the UX to be good (5-10 seconds per transcription instead of <1s). I also had concerns around battery life.
I think the biggest difference between FreeFlow and Handy is that FreeFlow implements what Monologue calls "deep context", where it post-processes the raw transcription with context from your currently open window.
This fixes misspelled names if you're replying to an email / makes sure technical terms are spelled right / etc.
The original hope for FreeFlow was for it to use all local models like Handy does, but with the post-processing step the pipeline took 5-10 seconds instead of <1 second with Groq.
Edit: Ah but Parakeet I think isn’t available for free. But very worthwhile single purchase app nonetheless!
https://github.com/Beingpax/VoiceInk
I build https://github.com/bwarzecha/Axii to keep EVERYTHING locally and be fully open source - can be easily used at any company. No data send anywhere.
If you do that, the total pipeline takes too long for the UX to be good (5-10 seconds per transcription instead of <1s). I also had concerns around battery life.
Some day!
It’s free and offline
[0] https://github.com/EpicenterHQ/epicenter