
So isolation is correct. Forking a sandbox gives you multiple exact duplicates of isolated environments.

When your coding agent has 10 ideas for what to do, to evaluate them correctly it needs to be able to evaluate them in isolation.

If you're building a website-testing agent and it's halfway through a website, with a form half filled out, a session ongoing, etc., and it realizes it wants to test two things in isolation, forking is the only way.

We also envision this powering the next generation of dev cycles: "AI agent, go try these 10 things and tell me which works best." The AI forks the environment 10 times, gets 10 exact copies, does the thing in each of them, evaluates the results, then takes the best option.
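As a rough illustration of the fork, evaluate, pick-the-best loop described above, here is a minimal Python sketch. It is not the product's API: the "environment" is just an in-memory dict, `fork` is a hypothetical helper built on `copy.deepcopy`, and the candidate changes and the scoring function are invented for the example.

```python
import copy
from typing import Callable

# Hypothetical stand-in for a sandbox: a plain dict holding all state.
Env = dict


def fork(env: Env, n: int) -> list[Env]:
    """Return n independent deep copies of env (a stand-in for a real sandbox fork)."""
    return [copy.deepcopy(env) for _ in range(n)]


def best_of(
    env: Env,
    candidates: list[Callable[[Env], None]],
    score: Callable[[Env], float],
) -> tuple[int, Env]:
    """Apply each candidate to its own fork; return (index, fork) with the highest score."""
    forks = fork(env, len(candidates))
    for candidate, forked in zip(candidates, forks):
        candidate(forked)  # mutates only its own copy, never the base or a sibling
    best_index = max(range(len(forks)), key=lambda i: score(forks[i]))
    return best_index, forks[best_index]


def set_cache(size: int) -> Callable[[Env], None]:
    """Build a made-up 'idea' that changes one config value and logs it."""
    def apply(env: Env) -> None:
        env["config"]["cache_size"] = size
        env["log"].append(f"cache_size={size}")
    return apply


def score(env: Env) -> float:
    """Invented metric: pretend a cache size of 64 is the sweet spot."""
    return -abs(env["config"]["cache_size"] - 64)


base = {"config": {"cache_size": 0}, "log": []}
candidates = [set_cache(s) for s in (16, 64, 256)]
winner_index, winner_env = best_of(base, candidates, score)
```

Because every candidate runs against its own copy, `base` is left untouched and the candidates cannot interfere with each other. A real sandbox fork would also copy processes, files, and open sessions, which a dict copy obviously does not capture.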



> and it realizes it wants to test 2 things in isolation, forking is the only way

Why would forking be the only way, when humans don't work like that? You can easily try one thing, undo, try the second thing. Your way is a faster way potentially, but also uses more compute.


This assumes you can retain the same state after an operation.

> "I wonder if this is slow because we have 100k database rows"
> DELETE FROM TABLE;
> "Whoa, it's way faster now"
> But was it the 100k rows, or was it a specific row?

That's a great example of where drilling into bugs and recreating exact issues can be a real problem, and testing the issues themselves can be destructive to the environment, leading to the need for snapshots and forks.
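To make the database example concrete, here is a small Python sketch that uses an in-memory SQLite database as a stand-in for the forked environment. The table, the row counts, and the single oversized row are all invented for illustration, and `fork_db` is a hypothetical helper built on `sqlite3.Connection.backup`; a real sandbox fork would copy far more than one database.

```python
import sqlite3


def fork_db(source: sqlite3.Connection) -> sqlite3.Connection:
    """Copy the whole database into a fresh in-memory connection."""
    clone = sqlite3.connect(":memory:")
    source.backup(clone)
    return clone


def count_rows(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


# Made-up starting state: 1000 ordinary rows plus one suspicious, oversized row.
base = sqlite3.connect(":memory:")
base.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
base.executemany("INSERT INTO items (payload) VALUES (?)", [("ok",)] * 1000)
base.execute("INSERT INTO items (payload) VALUES (?)", ("x" * 100_000,))
base.commit()

# Experiment A: the destructive test from the quote, run on a fork.
fork_a = fork_db(base)
fork_a.execute("DELETE FROM items")
fork_a.commit()

# Experiment B: remove only the suspicious row, on a second fork.
fork_b = fork_db(base)
fork_b.execute("DELETE FROM items WHERE length(payload) > 1000")
fork_b.commit()

rows_base, rows_a, rows_b = count_rows(base), count_rows(fork_a), count_rows(fork_b)
```

Both experiments start from the same state, and the original is still intact afterwards, so you could time the slow query in each fork and tell "too many rows" apart from "one bad row".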


Again, that is a problem of approach, not of compute. Compute just makes it faster; it doesn't make it possible. It's like saying the only way to do something is with threads. Threads are good for some use cases, bad for others, and make most things faster, but they don't unlock much.


You should focus much more on this aspect. It makes so much more sense, but it’s a very specific, narrow use case: multiple solution spaces must be explored in parallel, and then reconciled.

I can also see this being more of a framework / library that integrates into existing LLM frameworks than a SaaS; I wouldn’t switch my whole application to a different framework / runtime just for this.


This is a good note. We've never been great at explaining what we're doing, and we plan to do a lot more work on making it accessible and easier to understand.


Yep, I can see this, especially when the agent is spinning up test servers/smoke tests and you don't want those conflicting. How do we reconcile all the potentially different git hashes, though? Upstream, I guess? (This might have an easy answer; I'm not super proficient with git, so forgive me.)


So we recommend branch per fork, merge what you like.

Currently you have to change the branch on each fork individually, and that's unlikely to change in the short term due to the complexity of git internals, but it's not that hard to do yourself: `git checkout -b fork-{whateverDiscriminator}`
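If you want to script the branch-per-fork step, something like the following Python sketch could work. The helper names and the sanitizing rule are assumptions made for this example (a conservative cleanup, not a full implementation of git's ref-name rules); the only git command it runs is the `git checkout -b` shown above.

```python
import re
import subprocess


def fork_branch_name(discriminator: str) -> str:
    """Build 'fork-<discriminator>', replacing anything outside [A-Za-z0-9_-] with '-'."""
    safe = re.sub(r"[^A-Za-z0-9_-]+", "-", discriminator).strip("-")
    if not safe:
        raise ValueError("discriminator must contain at least one usable character")
    return f"fork-{safe}"


def checkout_fork_branch(repo_dir: str, discriminator: str) -> str:
    """Run `git checkout -b fork-<discriminator>` inside repo_dir and return the branch name."""
    branch = fork_branch_name(discriminator)
    # check=True raises CalledProcessError if git fails (e.g. the branch already exists).
    subprocess.run(["git", "checkout", "-b", branch], cwd=repo_dir, check=True)
    return branch
```

Each fork would call `checkout_fork_branch(".", "idea-3")` (or whatever discriminator you use) once after it is created; the branches you like can then be merged as usual.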


Have you considered git worktree?


Great for simple things, but git worktrees don't work when you have to fork processes like postgres/complex apps.


For postgres there are pg containers; we use them in pytest fixtures for thousands of unit tests running concurrently. I imagine you could run them for integration-test purposes too. What kind of testing would you run with these that can't be run with pg containers or isn't covered by conventional testing?

I'll say this is still quite a useful win for browser-control use cases and also for debugging their crashes.


The other way might be testing VMs vs. agent VMs, but that would be slower, since to "fork" it would need to run the test again up to that point. It wouldn't need agent context, though.

The forking you provided adds a lot more speed.


That, plus it's not always simple to replicate state. A QA agent in the future could run for hours to trigger an edge case that wouldn't happen again even if all the actions that led to it were repeated.

That can happen via race conditions, edge states, external service bugs.



