The key moment flagging is what makes this distinct. Most transcription tools assume you'll review after the call as a cleanup pass, but what you've built is more of an annotation layer you're constructing in real time. Different mental model.
Curious how the live recap handles latency. If it's updating every few seconds you can actually glance at it during a call, which starts to feel like in-meeting assistance rather than post-meeting review.
I've been working on something on that end of the spectrum at livesuggest.ai, real-time suggestions during the call rather than transcript after. Same no-bot, no-cloud constraint, different moment in the workflow.
Which speech-to-text model is used? Is it configurable? This might be crucial for supporting languages other than English - the model built into macOS fails completely for German.
This looks like a good approach, though I would expect this to be a native macOS feature within 12 months -- it seems to fit squarely into Apple's product roadmap.
Agreed with JohnBiz, the moment flagging is interesting and unusual, and a nice contrast to passive transcription. I only recently learned about MacWhisper (I'm Windows primarily) and was floored to learn how expensive the Pro option is. Nowadays it's not so hard to set up some level of DIY transcription, so it's crazy that it's priced at a premium.
What's your diarization pipeline? Pyannote?
I'd taken a different approach that used an LLM clean-up pass to summarize and progressively compress the transcript for ultra-long content, but I like the idea of targeted "pay attention here" flags.
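For anyone wondering what "progressively compress" could look like in practice, here is a minimal sketch, not the actual pipeline. It assumes the transcript arrives as a list of text chunks and keeps a rolling summary under a character budget. The names `progressive_compress`, `budget_chars`, and `truncate_stub` are made up for illustration, and `truncate_stub` is only a placeholder where a real LLM summarization call would go.

```python
from typing import Callable, Iterable


def progressive_compress(
    chunks: Iterable[str],
    summarize: Callable[[str, int], str],
    budget_chars: int = 2000,
) -> str:
    """Fold transcript chunks into a rolling summary.

    Each new chunk is appended to the running summary. Whenever the
    combined text exceeds the budget, it is passed through `summarize`
    so that older material gets compressed again and again while the
    newest material has been compressed the fewest times.
    """
    summary = ""
    for chunk in chunks:
        candidate = (summary + "\n" + chunk).strip()
        if len(candidate) > budget_chars:
            candidate = summarize(candidate, budget_chars)
        summary = candidate
    return summary


def truncate_stub(text: str, limit: int) -> str:
    """Hypothetical stand-in for an LLM call: just keeps the last `limit` characters."""
    return text[-limit:]


if __name__ == "__main__":
    transcript = ["intro and small talk", "pricing discussion", "action items"]
    print(progressive_compress(transcript, truncate_stub, budget_chars=40))
```

Swapping `truncate_stub` for a real summarizer is the whole point; the loop itself only decides when to re-compress.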