Only tangentially related: MiMo-2.5Pro is fast, cheap and very capable, although not quite at GPT5.5-level intelligence (I don't use the Claudes). It works flawlessly in Pi and is an excellent workhorse. I expect big things from their next model.
I always feel GPT5.5 is better than Chinese models at ‘getting the bigger picture’ when I am describing something vaguely. What’s your experience with that?
That's true. Open models still don't match these extreme high-end models at the highest levels of understanding.
But that's also not needed most of the time. There will always be a "better" model... but that doesn't make other models "bad".
For my use-cases, open models are now almost on par with these top models... and it's only extremely rarely that I genuinely "need" the help of top-of-the-line closed models.
Very cool work!
Regarding your finding on "the tool ran successfully and returned data" versus "the tool ran successfully but found nothing": couldn’t this be solved by designing better tool responses instead of adding another layer in between? Just curious and probing my understanding.
100%, a better tool would work, or even remove the problem altogether.
The issue/use-case is more around, say, a database table or legacy system where your tool is just hitting a legacy API that may or may not be good. A surface you don't control.
Honestly, it didn't come up as a use-case in this eval; it's more about the concept of a standard, like 4xx vs 5xx. I just felt it was missing from the ecosystem overall.
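To make the idea concrete, here is a rough sketch of what such a layer could look like. Everything in it is an assumption for illustration: `ToolStatus`, `ToolResult`, `wrap_legacy_tool` and `legacy_lookup` are hypothetical names, not part of any existing standard or of the eval discussed here. It just shows the "ran and returned data" / "ran but found nothing" / "failed" split, loosely in the spirit of distinct HTTP status classes.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Any, Callable, Optional


class ToolStatus(Enum):
    OK = "ok"        # tool ran successfully and returned data
    EMPTY = "empty"  # tool ran successfully but found nothing
    ERROR = "error"  # the tool call itself failed


@dataclass
class ToolResult:
    status: ToolStatus
    data: Any = None
    detail: Optional[str] = None


def wrap_legacy_tool(call: Callable[..., Any]) -> Callable[..., ToolResult]:
    """Wrap a tool we don't control so every outcome gets an explicit status."""
    def wrapped(*args: Any, **kwargs: Any) -> ToolResult:
        try:
            raw = call(*args, **kwargs)
        except Exception as exc:
            # The call failed: report it as an error, not as "no results".
            return ToolResult(ToolStatus.ERROR, detail=f"{type(exc).__name__}: {exc}")
        if raw is None or raw == [] or raw == {} or raw == "":
            # The call worked; there was simply nothing to return.
            return ToolResult(ToolStatus.EMPTY, detail="call succeeded, no matching records")
        return ToolResult(ToolStatus.OK, data=raw)
    return wrapped


# Hypothetical stand-in for a legacy API with ambiguous responses.
def legacy_lookup(customer_id: str) -> list:
    table = {"c1": [{"order": 17}]}
    if customer_id == "boom":
        raise TimeoutError("legacy API timed out")
    return table.get(customer_id, [])


lookup = wrap_legacy_tool(legacy_lookup)
```

With this, `lookup("c1")` comes back as `OK`, `lookup("nobody")` as `EMPTY`, and `lookup("boom")` as `ERROR`, so the model never has to guess whether an empty list means "nothing found" or "something broke".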
Just tested it via OpenRouter in the Pi coding agent and it regularly fails to use the read and write tools correctly. Very disappointing. Anyone know a fix besides prompting "always use the provided tools instead of writing your own call"?
I am curious about this myself, as it's a major company that I would think is worth taking seriously.
But this and the previous release got suspiciously few comments.