So isolation is correct. Forking a sandbox gives you multiple exact duplicates o...

vasco · 2026-04-07T05:17:47 1775539067

> and it realizes it wants to test 2 things in isolation, forking is the only way

Why would forking be the only way, when humans don't work like that? You can easily try one thing, undo it, then try the second thing. Your way is potentially faster, but it also uses more compute.
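When all the state lives inside something transactional, the sequential "try, undo, try" approach can be sketched with SQLite rollbacks. This is a minimal illustration using Python's built-in `sqlite3` module. The `items` table and both experiments are hypothetical stand-ins for whatever an agent might be testing.

```python
import sqlite3


def make_db(rows: int) -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `items` table."""
    # isolation_level=None puts sqlite3 in autocommit mode, so transactions
    # are controlled explicitly with BEGIN/ROLLBACK below.
    conn = sqlite3.connect(":memory:", isolation_level=None)
    conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
    conn.executemany(
        "INSERT INTO items (payload) VALUES (?)",
        [(f"row-{i}",) for i in range(rows)],
    )
    return conn


def try_then_undo(conn: sqlite3.Connection, statement: str) -> int:
    """Run a destructive statement, observe its effect, then roll it back."""
    conn.execute("BEGIN")
    try:
        conn.execute(statement)
        # Observe the state while the experiment is in effect.
        (remaining,) = conn.execute("SELECT COUNT(*) FROM items").fetchone()
    finally:
        # Undo: the database returns to its pre-experiment state.
        conn.execute("ROLLBACK")
    return remaining


db = make_db(100)
print(try_then_undo(db, "DELETE FROM items"))                # prints 0
print(try_then_undo(db, "DELETE FROM items WHERE id = 42"))  # prints 99
# Both experiments were rolled back, so every row is still there.
print(db.execute("SELECT COUNT(*) FROM items").fetchone()[0])  # prints 100
```

This only works when every side effect is captured by the transaction. External processes, caches, files, and timing-dependent state are not undone by the `ROLLBACK`.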

benswerd · 2026-04-07T05:22:08 1775539328

This assumes you can retain the same state after an operation.

> "I wonder if this is slow because we have 100k database rows."
> `DELETE FROM TABLE;`
> "Whoa, it's way faster now."
> But was it the 100k rows, or was it a specific row?

That's a great example of where drilling into bugs and recreating exact issues can be a real problem: testing the issue itself can be destructive to the environment, which leads to the need for snapshots and forks.
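The snapshot-and-fork alternative can be sketched by copying the whole database once per hypothesis, so each destructive test runs on its own copy and the original is never touched. This minimal sketch uses SQLite's online backup API (`sqlite3.Connection.backup`) as a stand-in for forking a full sandbox: it duplicates only the database, not running processes or other in-memory state. The table and the suspect row id are hypothetical.

```python
import sqlite3


def fork(conn: sqlite3.Connection) -> sqlite3.Connection:
    """Return an independent in-memory copy ("fork") of conn's database."""
    child = sqlite3.connect(":memory:")
    # Connection.backup copies every page of the source database into child.
    conn.backup(child)
    return child


def count(conn: sqlite3.Connection) -> int:
    return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


base = sqlite3.connect(":memory:")
base.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, payload TEXT)")
base.executemany(
    "INSERT INTO items (payload) VALUES (?)",
    [(f"row-{i}",) for i in range(100_000)],
)
base.commit()

# Hypothesis A: the slowdown comes from the sheer number of rows.
fork_a = fork(base)
fork_a.execute("DELETE FROM items")
fork_a.commit()

# Hypothesis B: the slowdown comes from one specific row (id 42 is hypothetical).
fork_b = fork(base)
fork_b.execute("DELETE FROM items WHERE id = 42")
fork_b.commit()

print(count(base), count(fork_a), count(fork_b))  # prints: 100000 0 99999
```

Because the forks are independent, both hypotheses can be checked against identical starting data without either one destroying the evidence the other needs. The cost is one full copy of the state per fork.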

vasco · 2026-04-07T05:26:54 1775539614

Again, that is a problem of approach, not of compute. Compute just makes it faster; it doesn't make it possible. It's like saying the only way to do something is with threads. Threads are good for some use cases and bad for others, and they make most things faster, but they don't unlock much.

stingraycharles · 2026-04-07T05:51:05 1775541065

You should focus much more on this aspect. It makes so much more sense, but it's a very specific, narrow use case: multiple solution spaces must be explored in parallel and then reconciled.

I can also see this being more of a framework / library that integrates into existing LLM frameworks than a SaaS; I wouldn’t switch my whole application to a different framework / runtime just for this.

benswerd · 2026-04-07T05:57:18 1775541438

This is a good note. We've never been great at explaining what we're doing, and we plan to do a lot more work on making it accessible and easy to understand.

indigodaddy · 2026-04-06T18:14:12 1775499252

Yep, I can see this, especially when the agent is spinning up test servers/smoke tests and you don't want those conflicting. How do we reconcile all the potentially different git hashes, though? Upstream, I guess? (This might have an easy answer, and I'm not super proficient with git, so forgive me.)

benswerd · 2026-04-06T18:16:07 1775499367

So we recommend a branch per fork, then merge what you like.

Currently you have to change the branch on each fork individually, and that's unlikely to change in the short term due to the complexity of git internals, but it's not that hard to do yourself: `git checkout -b fork-{whateverDiscriminator}`
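The branch-per-fork workflow can be sketched as a list of git commands. This assumes every fork's checkout shares a remote named `origin` that the main checkout can fetch from. `fork_commands`, `run_all`, and the discriminator values are hypothetical helpers for illustration; the git subcommands themselves are standard.

```python
import subprocess
from pathlib import Path


def fork_commands(discriminator: str) -> dict[str, list[list[str]]]:
    """Git commands for a branch-per-fork workflow (assumes a shared `origin`)."""
    branch = f"fork-{discriminator}"
    return {
        # Run inside each fork's checkout; the agent commits between these steps.
        "in_fork": [
            ["git", "checkout", "-b", branch],
            ["git", "push", "-u", "origin", branch],
        ],
        # Run in the main checkout for each fork whose work you want to keep.
        "on_main": [
            ["git", "fetch", "origin"],
            ["git", "merge", f"origin/{branch}"],
        ],
    }


def run_all(commands: list[list[str]], repo: Path) -> None:
    """Execute commands in repo, stopping at the first git failure."""
    for argv in commands:
        # check=True raises CalledProcessError if git exits non-zero.
        subprocess.run(argv, cwd=repo, check=True)
```

For example, `run_all(fork_commands("attempt-1")["in_fork"][:1], fork_path)` would create the `fork-attempt-1` branch in the fork checked out at a hypothetical `fork_path`. Merging goes through the shared remote because each fork is a separate copy of the repository.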

chrisweekly · 2026-04-06T21:58:41 1775512721

Have you considered git worktree?

benswerd · 2026-04-06T22:16:41 1775513801

Great for simple things, but git worktrees don't work when you have to fork running processes like Postgres or complex apps.

ghm2199 · 2026-04-07T02:14:00 1775528040

For Postgres there are pg containers; we use them in pytest fixtures for thousands of unit tests running concurrently. I imagine you could use them for integration testing too. What kind of testing would you run with these that can't be done with pg containers or isn't covered by conventional testing?

I'll say this is still quite a useful win for browser-control use cases, and also for debugging browser crashes.
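The per-test isolation described above can be sketched without Docker: each test gets its own throwaway database, so tests can mutate state destructively and still run concurrently. This minimal sketch uses a temporary SQLite file as a stand-in for a per-test Postgres container. `isolated_db` and `run_test` are hypothetical names, not part of pytest or any container library.

```python
import sqlite3
import tempfile
from concurrent.futures import ThreadPoolExecutor
from contextlib import contextmanager
from pathlib import Path


@contextmanager
def isolated_db():
    """Yield a connection to a brand-new database that is deleted afterwards.

    Stand-in for a per-test Postgres container: a SQLite file in a temp dir.
    """
    with tempfile.TemporaryDirectory() as tmp:
        conn = sqlite3.connect(Path(tmp) / "test.db")
        try:
            conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
            yield conn
        finally:
            # Close before the directory is removed (required on Windows).
            conn.close()


def run_test(n: int) -> int:
    """Hypothetical test that mutates its database destructively."""
    with isolated_db() as conn:
        conn.executemany(
            "INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, n + 1)]
        )
        conn.execute("DELETE FROM items WHERE id % 2 = 0")
        return conn.execute("SELECT COUNT(*) FROM items").fetchone()[0]


# Twenty "tests" run concurrently; none can see another's data.
with ThreadPoolExecutor(max_workers=8) as pool:
    results = list(pool.map(run_test, range(1, 21)))
```

In a real suite this would typically be a pytest fixture backed by a container. It isolates database state only; it does not capture running processes or browser sessions, which is the gap the browser-control and crash-debugging cases point at.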

mememememememo · 2026-04-07T04:30:00 1775536200

The other way might be separate testing VMs vs. agent VMs, but that would be slower, since to "fork" it would need to rerun the test up to that point. It wouldn't need the agent context, though.

The forking you provided adds a lot more speed.

benswerd · 2026-04-07T04:41:16 1775536876

That, plus it's not always simple to replicate state. A future QA agent could run for hours to trigger an edge case that wouldn't happen again even if every action taken to get there were theoretically repeated.

That can happen because of race conditions, edge states, or external service bugs.
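The difference between reaching a state by replaying actions and forking a captured state can be shown with a toy model. This minimal sketch assumes the session state is a plain in-memory Python object and simulates timing or external-service effects with a random number generator. `SessionState`, `step`, `replay`, and the 1% edge-case rate are all hypothetical.

```python
import copy
import random
from dataclasses import dataclass, field


@dataclass
class SessionState:
    """Hypothetical QA-agent session state: a log of (action, outcome) pairs."""
    events: list = field(default_factory=list)


def step(state: SessionState, action: str, rng: random.Random) -> None:
    # The outcome depends on an outside source (a stand-in for thread timing
    # or an external service), not only on the action itself.
    outcome = "edge-case" if rng.random() < 0.01 else "ok"
    state.events.append((action, outcome))


def replay(actions: list, rng: random.Random) -> SessionState:
    """'Fork' by re-running the recorded actions from a fresh state."""
    state = SessionState()
    for action in actions:
        step(state, action, rng)
    return state


def fork(state: SessionState) -> SessionState:
    """Fork by copying the captured state itself."""
    return copy.deepcopy(state)


# Run until the edge case shows up, recording every action taken.
live, actions, rng = SessionState(), [], random.Random()
while not live.events or live.events[-1][1] != "edge-case":
    actions.append(f"action-{len(actions)}")
    step(live, actions[-1], rng)

snapshot = fork(live)
fork_a, fork_b = fork(snapshot), fork(snapshot)
print(fork_a == live, fork_b == live)  # prints: True True
replayed = replay(actions, random.Random())  # usually != live
```

Forks copied from the snapshot always match the state where the edge case appeared, while a replay of the same actions usually diverges, because the outcome of each step depended on more than the action itself.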