There is now a "Switch models when a message is flagged" option in /config that can be set to false, but I haven't had a chance to see what happens then. Does it just stop, or what?
Fable 5 has safety measures that flag messages on most cybersecurity or biology topics. They may flag safe, normal content as well. These measures let us bring you Mythos-level capability in other areas sooner, and we're working to refine them. Send feedback with /feedback or learn more
1. Switch to Opus 4.8
2. Edit prompt and retry with Fable 5
"by the way, your previous attempts have these structural problems."
Just to be clear, it did not have access to any previous work that Opus did? Because these models are pretty good at digging out relevant tmp files and making use of whatever is out there.
In my Fable adventures I twice caught it hallucinating something and stating it as fact in the CLI. It was something I had not seen Opus do in that way. Opus has obviously stated things many times that it guessed rather than verified, but Fable said something like "the probe showed that ..." when there was no probe. And it was not about some past event, it was about what it was doing right now. "I overstated"...
But boy does it know Chinese, so much better than any other English model. Gemini used to be the king, but Fable was clearly trained on a decent amount of it. It has a deep cultural understanding.
I maintain a failure registry in the repo. Every failed attempt gets documented with the exact mechanism, the test that regressed, the revert SHA, and an instruction to start from that frontier. Fable read all of it.
But so did Opus.
Each of the 16 Opus failures ran in the same harness with the same accumulating registry. By attempt 15, it had disproofs 1–14 in context. By the end, Opus had basically the same corpus that Fable started with, and it still kept failing, sometimes by re-deriving an already-disproved approach in a slightly different shape.
So “it leveraged the previous work” doesn’t really separate them. Both had the leverage. Only one converted it.
What changed wasn’t more context. It was that Fable rejected a premise inside the context.
The registry’s standing framing was: “this needs whole-program borrow inference, which conflicts with per-module incrementality” (architecturally blocked). Fable ran around 5 fresh attempts in-session, hit the same wall, and then noticed the framing was a red herring: the borrow analysis already runs module-wide, and for a single-module program, the module is the whole program.
Opus read that same framing for months and treated it as a constraint. Fable falsified it.
It's the same repo, same rules, same disproof history, same workflow. The model was the only variable that changed, and the outcome flipped. Is it possible that attempt 17 by Opus could have figured it out? Sure. But there are 16 previous attempts that say otherwise.
As far as anecdotes go, that’s about as controlled as it gets.
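For anyone wondering what a registry entry of the kind described above might look like, here is a minimal sketch. Everything in it is an assumption for illustration: the `FailedAttempt` type, its field names, the `render_registry` helper, and the placeholder values are hypothetical and not taken from the actual repo. It only shows the four things each entry is said to carry (mechanism, regressed test, revert SHA, restart instruction) and how the accumulated history could be handed to each new session.

```python
import json
from dataclasses import dataclass, asdict


@dataclass(frozen=True)
class FailedAttempt:
    """One documented failure (hypothetical schema, not the real registry)."""
    attempt: int               # sequence number of the attempt
    mechanism: str             # exact mechanism that failed
    regressed_test: str        # the test that regressed
    revert_sha: str            # commit the work was reverted to
    restart_instruction: str   # where the next attempt should start


def render_registry(entries):
    """Serialize entries oldest-first, one JSON object per line."""
    ordered = sorted(entries, key=lambda e: e.attempt)
    return "\n".join(json.dumps(asdict(e)) for e in ordered)


# Placeholder data only; none of these values come from the thread.
entries = [
    FailedAttempt(2, "placeholder mechanism B", "tests/placeholder_b",
                  "<sha-2>", "start from <sha-2>"),
    FailedAttempt(1, "placeholder mechanism A", "tests/placeholder_a",
                  "<sha-1>", "start from <sha-1>"),
]
registry_text = render_registry(entries)
print(registry_text)
```

Keeping the log append-only and ordered means attempt N always sees the disproofs from attempts 1 through N-1, which matches the setup described above where both models got the same accumulating corpus.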
Pointing out past suboptimal or failing behaviours to new Opus sessions would almost always create a sort of "anchoring bias" that drove the agents towards exhibiting the failure mode (often while mentioning how they wouldn’t fall for it).
As far as I can recall, Fable has been the first model to discover the documented failure modes, comment on them, and just… keep going, actually avoiding them. Quite a surprise.
I kind of enjoy exploring black boxes, seeing how different inputs map to different outputs. It's kind of like hacking. The problem is, they keep altering the box.
It feels like Greek mythology should have some metaphor for "apparently simple structure that is so complex it leads anybody that studies it into madness". But I can't think of any name to put there.
The third sentence got to what my objection was going to be. It's fun trying to make the thing do what you want it to do! That's why many of us like computers. It's the randomness that sucks and makes the process unsatisfying.
Here's the thing. Building trust and then leaving stuff in has been around forever. The fact that it is becoming cheaper does not matter that much (since protection against it is also getting better), but it required you to have a bunch of extremely talented people who had spent much of their lives diving into a given topic.
Such driven people are usually hard to buy, too. They would usually rather get by on enough income and work on interesting projects with interesting people than take some uninteresting work for tons of money. That still does not stop them from working for Malice. But ethics do. Even if not right away, once people see that what they are doing is not quite OK, the talent starts eroding. People quit, productivity drops. That was a good dynamic. Which now will be gone.
Oh, a nice subthread to vent in. Their CLI is so f tragic that it is ridiculous. It keeps scrambling the terminal, and scrolling and basic shortcuts keep breaking. I've used so many TUIs and terminal apps, many of them one-man side projects, and I have never seen anything so bad.
If I didn't know from experience that Claude can be powerful when directed properly, knowing that they used it to create that CLI would make me run away instantly, based on a very reasonable heuristic: if they are not able to use their own product to create a decent piece of software that is not even sophisticated, then it seems futile for me to try.
I just do not understand. I feel like most of HN could vibe code a better Claude CLI in Claude (and certainly just write one) than what we have to deal with to use the subscription.
I could not agree more that Claude itself is a janky, hacky, crappy piece of software.
When management at $DAYJOB brought the hammer down and said, "Everyone has to use genAI all the time, OR ELSE," I expected to be blown away by the tool I was avoiding due to ethical concerns, aesthetic objections, humanism, and long-term thinking.
I was blown away, but not in a good way.
The CLI is _bad_. I've seen it randomly fail to render anything at all on the terminal multiple times. It has a vim-mode, but it's painfully buggy, and I can literally outrun it: if I try to type too quickly after hitting Esc for normal mode, it just doesn't return to normal mode. I was keeping track of the bugs in the Claude TUI, but gave up because it was taking _too much of my time_ to do so.
If nothing else, I'd say Claude shows convincingly that success is not the default for vibecoding.
Yes, it technically does the job, and no, I don't think I've ever used a worse TUI.
I'm fairly certain they were already doing something similar, possibly with some quantization, and not for the good of humanity but just to handle the increased usage. Not for API requests though, just subscription CLI usage.
You can do that with and without fake discussions.
One of the main things that killed my desire to move to the US was the number of fake questions in discussions that were, on paper, friendly, when the point of those questions was visibility and nothing else. An average American non-corporate discussion is worse than a non-American corporate one. And that seems to be pretty universal to me.
My brother and my sister-in-law watch “Somebody Feed Phil”. We watched the Sydney episode together, and some others after that, because I’d just announced that I’d be moving to Australia soon. That Sydney episode had conversations that felt quite normal to us Europeans. Of course people had agendas, but they still reacted to the responses they got. Even where things were cut, most of the time people seemed to be reacting to something said earlier.
Then the next episode was from Las Vegas. It was full of questions where nobody responded to the answers and nobody cared what the response was. And they kept those in the episode. At one point Phil asked people in a line, one by one, what they did for work. They basically just listed it, and Phil had zero responses to any of the answers. Zero reactions from anybody. The point wasn’t to engage with the answers or the people. In another case, a girl talked about her shop. There wasn’t a single sentence that was organically connected to another. Phil and the girl had different agendas and had to perform according to them, no matter what. And I have been there enough by now to say that this happens far more frequently than elsewhere.
The next one was from Manila, and there were organic discussions again. I had never seen the phenomenon that bugs me so clearly. Of course, the usual scripting that happens with these shows helped make it even more pronounced. The people talking in that episode were probably just far less interesting, but as an illustration of what annoys me it is quite good.
Of course, I had good conversations over there as well, and I have had bad ones elsewhere in this sense. Heck, I have done similar things myself, and maybe that is exactly why I’m so sensitive to it, because it annoyed me greatly when I did it. But on average, it was the worst over the pond. Especially at the extremes, but even in day-to-day discussions. It was annoying that I had to peel away an additional layer with everybody to get real answers, which is not needed basically anywhere else.
For one, they invested in infrastructure. They can build fast and efficiently. They can provide power, they can provide cooling. Even if you just make roads better, you make everything more efficient. Plus the level of standard education. It all compounds.
On HN, China is seen as a cheap-labor copycat. That used to be a fair approximation at some point in the past. In my opinion, China is getting further ahead of everyone else than the US ever was.
SF is a beautiful thing in the US; vast power and wealth come from there. Smart people collaborating, communicating, and building fast and with excitement. China did the SF kind of thing for many different sectors in many different places.