Orbstack is essentially a happy-path-only contraption that quickly breaks once you happen to take a less visited corner of the street. For example, if you happen to have multiple users who needs to work with it... good luck trying to clean up your system afterwards. So, it's a yoke as well. Maybe a better one for some people, but still a yoke.
I wanted to do something similar to this, then I started doing some research on birds in general, and those in my locality, then I started learning about Audio and spectograms and Nyquist Theorem and many other interesting audio stuff.
Then I started going through the Intro to Conservation Bioacoustics by Cornell course, and started watching Bioacoustic Talks by the K. Lisa Yang Center cornell center.
And now I am almost at the point where I cant start manually tagging audio sets, for target species so that I can train custom classifiers to identify birds in Rwanda which are poorly detected by birdnet.
TLDR: Being jobless can lead you into interesting ventures.
Thanks for sharing these resources and your story! I followed a very similar path, and ended up doing a biodiversity related MSc, with my dissertation being a custom classifier for poorly detected species in Príncipe. BirdNET and Perch are phenomenal achievements, but struggle in regions where, ironically, most of the world’s biodiversity is. What you’re doing for Rwandan species is so important!!
I have alas not published it yet, but I really should. How about you?
For me, on the research front, I’m very interested in methods that can be sustainably applied in remote, resource constrained locations. Heavy cloud dependent workflows to adapt a huge foundation model just aren’t practical on an island that doesn’t have 24/7 electricity and sporadic connectivity.
I know google has general sound classifiers like Yamnet, trained on youtube data but they are not very good for specific usecases. So you would have to create a custom model for you usecase.
just thinking, I've talked with a mechanic but he told me that now when they connect the car to a computer they almost always find anything wrong with a car, that and the experience they have they almost always know what's wrong.
I think sound + location could be really interesting, because you can filter parts of the car that could be making noises that are similar knowing where the mic is.
They explain some of the the reasons why they have a better solution and why they are very opinionated
>Automatic prefix caching activates only when the exact byte prefix of the previous request matches. Most agent loops reorder, rewrite, or inject fresh timestamps each turn — cache hit rate in practice: <20%.
So they optimize on this plus other techniques to improve cache hits, making it cheaper.
The last time I heard about something like this, it was Claude Code intentionally injecting random strings to break caching when you're not using a Claude model. Aside from that kind of intentional sabotage, I don't think any coding agent would just ignore prefix caching.
I'm not sure what the mechanism is, but I've definitely had Claude refuse to work on sessions that were touched by other models. Some kind of integrity check failure. Resetting the session back to the point before I used the other model fixed the problem.
IIRC Anthropic's API produces cryptographic signatures for thinking blocks. If you try to submit a set of messages that include thinking blocks with missing/invalid signatures, it'll refuse.
They do this to mitigate jailbreak attempts that rely on fabricated message history (e.g. making it look like the model was compliant in previous messages, increasing the likelihood that it'll continue to be compliant in future messages).
>Most agent loops reorder, rewrite, or inject fresh timestamps each turn
That's really surprising, since it'd defeat the whole point of KV caching. I mean I buy it considering how sloppily coded the harnesses seem to be, but this like obvious low hanging fruit.
I've also often wondered why LLMs aren't trained with a format of having a dedicated contextual system-instruction role at the _end_, which you could use to put context like current time or other misc stuff.
There are context pruning strategies that will prune old messages that are no longer relevant, and context compaction from summaries, etc. But to say "most" do this on "every turn" is overstating things. I think it's more correct to say that "many" do this "occasionally."
I'm also not sure what they mean about injecting fresh timestamps. I could see why you'd prepend/append a timestamp to the user's messages to make the model aware of the current time, and the passage of time, but I can't think of any good reason to edit timestamps in prior messages. I'm sure someone can come up with one, but I'd be very surprised if this was a thing that most agent loops do, let along doing it on every turn.
i put together this, for myself so i can try to track what coding agents are doing, I add agents to it or topics (like caching, or sandboxing, file editing methods, etc) just to try and find anything novel or good, since I am/was considering making a new harness but using all the best things from any of those. I still cannot find my perfect coding agent, every one of them has some problem or just not totally what it could be.
What I do is just point agents to a folder, have it loop around a few times on a repo, fact checks at the end, but people sometimes think the software/harness around the AI model doesn't do much which is TOTALLY wrong, its probably AS important or more.. file editing methods available matter a lot, context compaction methods... matter, caching matters. I am still fantasizing about a "best of N" coding agent, that tries to take all the best stuff from all of them.
I have an idea of a coding agent that puts a lot more effort into using more than one model at the same time. Sooo much can be done with that idea.. and no one is apparently doing it yet that I can find. I just am not sure I want to put that much time into a new coding agent project. I wonder how autonomous it could be - have weekly or daily scans of the current coding agent landscape and automatic scanning of coding agent/ai code related subreddits/hacker news, analyze it to figure out what the current problems are, complaints about existing coding agents, desires --> prioritized list of possible features/fixes ---> ai agents code and make releases
> Most agent loops reorder, rewrite, or inject fresh timestamps each turn
I haven't seen that, it'd be crazy slow if they did this. What "agent loops" are they talking about here specifically? The vagueness makes it sound potentially made up.
I've never seen an agent loop "reorder, rewrite, or inject fresh timestamps" each turn other than mostly towards the end of the messages. Messing with a large part of the context every turn would be a fairly crazy thing to do.
It's a really lazy one too - there are so many open source harnesses, including e.g. Codex and Kimi-CLI, and of course the leaked Claude Code source, so it's trivial to verify if someone even just bothered to ask an agent to check actual source code examples.
I am working on a research institute for East Africa, https://maiyoinstitute.org/. I want to tackle the dire lack of environmental data, by using 1. low cost hardware 2. Artificial Intelligence 3. Long term horizon. The problem set is huge, but I am focusing on low cost sensors for Air and Water data collection plus bioacoustics for now.
They mentioned that people like you would show up. "Push back on astroturfers. The "well, actually..." crowd is out in force. Don't let them set the narrative."
"Please don't post insinuations about astroturfing, shilling, bots, brigading, foreign agents and the like. It degrades discussion and is usually mistaken. If you're worried about abuse, email hn@ycombinator.com and we'll look at the data."
Yeah, saw that; rubbed me wrong. "If you disagree you are manufactured, a shill." This kind of condescension has never been very convincing. And I mostly agree with the petition.
What do you currently use for json and batch, I was doing some analysis and my results show that gpt-oss-120b (non batch via openrotuer) is the best for now for my use case, better than gemini-flash models (batch on google). How is your experience?
reply