For those who haven't tried it, this lets Deepseek understand a picture (instead of just extracting text from it) and describe what's in it. It is not an image generation system, though, so you can't ask it to modify an image.
Personally, I'm a bit surprised the DS chat app still doesn't offer its own text-to-speech and speech-to-text features (I know DS doesn't have an ASR model, for example, but there are quite a few open ones).
Can you explain what the benefits are of actually "talking" with the bot instead of typing and reading?
As someone who would rather send a Slack message to a coworker than walk over and talk to them, the idea of talking to my laptop is not appealing at all, haha.
It turns out that to use the Claude Agents SDK, you need a vision-enabled API. If the Deepseek API could see, it could fully drive Claude Code and the Claude Agents SDK. A project I'm working on relies on a Claude-in-CloudflareWorker setup, and I've been relying on Qwen and Gemini Flash Lite, both of which are more expensive than Deepseek.
Are you running out of context? I've found that tooling problems and gibberish mostly happen when I'm butting up against the high-water mark of my context window. One other thing it could be: I've read that low quants like Q1 and Q2 on smaller models can start leaking Chinese into the output.
If they did one of those little add-ons like Qwen does, so that I could have DS4 Flash with vision, that would be great. Right now I have to run an entirely separate model just to get vision, and I'd prefer to keep it all in one place.
And it's really good and fast. I've tested it with a bunch of odd photos, asking what is happening in them. Overall, the training set seems large enough for it to know what's what and where.
If everything goes to plan, everyone involved with big US models will be a trillionaire and everyone else will be poor and unemployed. If there are open, cheap-to-run Chinese models (and, please god, silicon), the financial house of cards we have built will fall: people involved with big US models will be poor and unemployed, and everyone else will be slightly less poor and unemployed than in the first scenario.
Can't wait to have it available on Deepseek.
Is it a new silent update?
I use the API however, not the chat interface.
It also happened a handful of times with Anthropic models.
What is good for Dario is good for America.
Any ideas, theories where they get their payoff?