
firstbabylonian · 2026-05-16T13:26:24 1778937984

Self-hatred is not a good look.

DaiPlusPlus · 2026-05-16T13:59:06 1778939946

Now you’re making me self-conscious about getting into a self-hating mood; unfortunately, this gives me yet another reason to hate myself: it’s not a good look.

Muromec · 2026-05-16T21:29:57 1778966997

That's a very privileged Westerner opinion to have.

firstbabylonian · 2026-03-23T15:01:12 1774278072

> SSD streaming to GPU

Is this solution based on what Apple describes in their 2023 paper 'LLM in a flash' [1]?

1: https://arxiv.org/abs/2312.11514

simonw · 2026-03-23T15:10:44 1774278644

Yes. I collected some details here: https://simonwillison.net/2026/Mar/18/llm-in-a-flash/

anemll · 2026-03-23T18:48:55 1774291735

Thanks for posting this; that's how I first found out about Dan's experiment! SSD speed doubled in the M5 Pro/Max generation, which makes it usable! I think one paper flying under the radar is "KV Prediction for Improved Time to First Token" https://arxiv.org/abs/2410.08391, which will hopefully help with prefill for flash streaming.

Yukonv · 2026-03-23T19:12:39 1774293159

That’s exactly what I was thinking about. I'm getting my hands on an M5 Max this week and am going to see how Dan’s experiment performs with faster I/O. I'm also going to experiment with running the active parameters at Q6 or Q8: since output is I/O-bottlenecked, there should be room for higher-accuracy compute.

anemll · 2026-03-23T19:19:23 1774293563

Check my repo; I added some support for GGUF/Unsloth Q3, Q5, and Q8: https://github.com/Anemll/flash-moe/blob/iOS-App/docs/gguf-h...

3abiton · 2026-03-23T21:18:50 1774300730

To be fair, it's "possible" to run such a setup with llama.cpp using SSD offload. The token generation (TG) speeds are just abysmal. But it's possible.

superjan · 2026-03-23T17:23:28 1774286608

That was a very good summary. One detail the post could use is a mention that the 4 or 10 experts invoked were selected from the 512 experts the model has per layer (to give an idea of the savings).
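To put a number on those savings, here is a quick back-of-the-envelope sketch using only the figures in the comment above (4 or 10 active experts out of 512 per layer). It deliberately ignores the shared expert, attention, and embedding weights, which are read for every token regardless; `active_fraction` is a hypothetical helper name.

```python
# Share of routed expert weights touched per layer for one token,
# using the comment's figures: 4 or 10 active experts out of 512.
# Shared experts, attention and embeddings are NOT included here.
TOTAL_EXPERTS_PER_LAYER = 512


def active_fraction(active: int, total: int = TOTAL_EXPERTS_PER_LAYER) -> float:
    """Fraction of a layer's routed experts that are read for one token."""
    return active / total


for k in (4, 10):
    frac = active_fraction(k)
    print(f"{k:>2} of {TOTAL_EXPERTS_PER_LAYER}: {frac:.2%} of routed expert weights per layer")
```

So roughly 0.8% or 2% of each layer's routed expert weights need to be in memory for a given token, which is what makes streaming the rest from flash plausible at all.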

trebligdivad · 2026-03-23T22:03:33 1774303413

I guess this is all set up to show off the new high-bandwidth-flash stuff that's due out soon?

zozbot234 · 2026-03-23T15:33:09 1774279989

A similar approach was recently featured here: https://news.ycombinator.com/item?id=47476422 Though the iPhone Pro has very limited RAM (12GB total), which you still need for the active part of the model. (Unless you used wear-resistant Intel Optane storage, but that was power-hungry and thus unsuitable for a mobile device.)

Aurornis · 2026-03-23T16:06:15 1774281975

> Though iPhone Pro has very limited RAM (12GB total) which you still need for the active part of the model.

This is why mixture of experts (MoE) models are favored for these demos: Only a portion of the weights are active for each token.
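A toy top-k gate makes "only a portion of the weights are active" concrete. This is a sketch under loud assumptions: each "expert" is just a scalar multiplier standing in for an FFN block, the router logits are random numbers rather than the output of a learned projection of the hidden state, and softmax renormalisation over the selected k is one common gating variant, not necessarily what this particular model does. `top_k_route`, `ToyExpert`, and `moe_layer` are hypothetical names.

```python
import math
import random


def top_k_route(router_logits: list[float], k: int) -> list[tuple[int, float]]:
    """Return the k highest-scoring expert indices with softmax weights over just those k."""
    top = sorted(range(len(router_logits)), key=lambda i: router_logits[i], reverse=True)[:k]
    peak = max(router_logits[i] for i in top)  # subtract the max for numerical stability
    exps = {i: math.exp(router_logits[i] - peak) for i in top}
    total = sum(exps.values())
    return [(i, exps[i] / total) for i in top]


class ToyExpert:
    """Stand-in for an expert FFN block; counts how many times it is evaluated."""

    def __init__(self, scale: float):
        self.scale = scale
        self.calls = 0

    def __call__(self, x: list[float]) -> list[float]:
        self.calls += 1
        return [self.scale * v for v in x]


def moe_layer(x, experts, router_logits, k):
    """Weighted sum of the selected experts' outputs; unselected experts are never touched."""
    out = [0.0] * len(x)
    for idx, weight in top_k_route(router_logits, k):
        y = experts[idx](x)  # with flash streaming, only these experts' weights need loading
        out = [o + weight * yi for o, yi in zip(out, y)]
    return out


random.seed(0)
n_experts, k = 512, 10
experts = [ToyExpert(random.uniform(-1.0, 1.0)) for _ in range(n_experts)]
logits = [random.gauss(0.0, 1.0) for _ in range(n_experts)]  # hypothetical router scores
moe_layer([1.0, 2.0, 3.0], experts, logits, k)
print(sum(e.calls for e in experts), "of", n_experts, "experts evaluated")  # 10 of 512
```

The point is the call count: the other 502 experts in that layer are never evaluated for this token, so their weights don't have to be resident.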

zozbot234 · 2026-03-23T16:52:49 1774284769

Yes but most people are still running MoE models with all experts loaded in RAM! This experiment shows quite clearly that some experts are only rarely needed, so you do benefit from not caching every single expert-layer in RAM at all times.

Aurornis · 2026-03-23T18:00:03 1774288803

That's not what this test shows. It's just loading the parts of the model that are used in an on-demand fashion from flash.

The iPhone 17 Pro only has 12GB of RAM. This is a 397B MoE model with 17B active parameters. Even quantized, you can only realistically fit one expert in RAM at a time. Maybe 2 with extreme quantization. It's just swapping them out constantly.

If some of the experts were unused, then you could distill them away. This has been tried! You can find reduced MoE models that strip away some of the experts, though it's only a small number. Their output is not good. You really need all of the experts to get the model's quality.

zozbot234 · 2026-03-23T18:10:51 1774289451

The writeup from the earlier experiment (running on a MacBook Pro) shows quite clearly that expert routing choices are far from uniform, and that some layer-experts are only used rarely. So you can save some RAM footprint even while swapping quite rarely.

Aurornis · 2026-03-23T18:12:09 1774289529

I understand, but this isn't just a matter of not caching some experts. This is a 397B model on a device with 12GB of RAM. It's basically swapping experts out all the time, even if the distribution isn't uniform.

When the individual expert sizes are similar to the entire size of the RAM on the device, that's your only option.
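The raw arithmetic behind "397B on 12GB" is easy to sketch. Assumptions: decimal gigabytes, a uniform average bit width per weight, and no quantization metadata (scales, zero points), KV cache, activations, or OS overhead, all of which make real footprints larger; real "Q1"/"Q2" formats also tend to average somewhat more than 1 or 2 bits per weight. `weight_bytes` is a hypothetical helper.

```python
GB = 1e9  # decimal gigabytes


def weight_bytes(params: float, bits_per_weight: float) -> float:
    """Raw storage for `params` weights at a given average bit width (no metadata)."""
    return params * bits_per_weight / 8


PHONE_RAM_GB = 12  # iPhone 17 Pro total RAM, per the thread
for label, params in (("all 397B weights", 397e9), ("17B active per token", 17e9)):
    for bits in (8, 4, 2, 1):
        size_gb = weight_bytes(params, bits) / GB
        side = "under" if size_gb < PHONE_RAM_GB else "over"
        print(f"{label} at {bits}-bit: {size_gb:.2f} GB ({side} {PHONE_RAM_GB} GB)")
```

Being "under 12 GB" here is only a necessary condition: the OS, the app, and the KV cache share the same RAM.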

zozbot234 · 2026-03-23T18:23:13 1774290193

"Individual experts" is a bit of a red herring; what matters is expert-layers (this is the granularity of routing decisions), and these are small, as mentioned by the original writeup. The filesystem cache does a tolerable job of keeping the "often used" ones around while evicting those that aren't needed (this is what their "Trust the OS" point is about). Of course, they're also reducing the number of active experts and quantizing a lot; AIUI this iPhone experiment uses Q1 and the MacBook was Q2.
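The "Trust the OS" argument can be illustrated with a toy simulation: treat each (layer, expert) pair as a cache entry and see how the hit rate changes with cache size when routing is skewed. Everything here is an assumption for illustration: strict LRU eviction (the real macOS/iOS page cache is not strict LRU), a synthetic Zipf-like expert popularity rather than measured routing data, and routing that is independent from token to token. `ExpertLayerCache` and `simulate` are hypothetical names.

```python
import random
from collections import OrderedDict
from itertools import accumulate


class ExpertLayerCache:
    """LRU cache keyed by (layer, expert): a rough stand-in for the OS page cache."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.entries = OrderedDict()
        self.hits = 0
        self.misses = 0

    def access(self, key: tuple[int, int]) -> None:
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return
        self.misses += 1  # in the real setup this would be a read from flash
        self.entries[key] = True
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the least recently used entry

    @property
    def hit_rate(self) -> float:
        total = self.hits + self.misses
        return self.hits / total if total else 0.0


def simulate(capacity, n_layers=8, n_experts=512, k=10, tokens=500, skew=1.1, seed=0):
    """Replay synthetic top-k routing through the cache and return the hit rate."""
    rng = random.Random(seed)
    # Hypothetical Zipf-like popularity: low-rank experts are picked far more often.
    cum_weights = list(accumulate(1.0 / (rank + 1) ** skew for rank in range(n_experts)))
    cache = ExpertLayerCache(capacity)
    for _ in range(tokens):
        for layer in range(n_layers):
            chosen = set()
            while len(chosen) < k:  # sample k distinct experts for this layer
                chosen.add(rng.choices(range(n_experts), cum_weights=cum_weights)[0])
            for expert in chosen:
                cache.access((layer, expert))
    return cache.hit_rate


for capacity in (80, 400, 1600):
    print(f"cache of {capacity:>4} expert-layers: hit rate {simulate(capacity):.1%}")
```

Because LRU has the inclusion property, a bigger cache never does worse on the same access sequence; how much better it does depends entirely on how skewed the real routing is, which this sketch does not measure.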

QuantumNomad_ · 2026-03-23T20:14:34 1774296874

If I only use an LLM to ask questions about programming in one specific programming language, can I distill away other experts and get all the answers I need from a single expert? Or is it still different experts that end up handling the question depending on what else is in the question? For example, if I say “plan a static web server in Rust” it might use expert A for that, but if I say “implement a guessing game in Rust” it might use expert B, and so on?

Snoozus · 2026-03-24T07:26:37 1774337197

Unfortunately no, experts are typically switched out for every token. The way I understand it the idea was something like having each expert be good at one kind of task, but that's not how it panned out after training.

anemll · 2026-03-24T03:40:33 1774323633

The 17B includes 10 routed experts plus one shared expert, so the actual size of each expert is much smaller.

jnovek · 2026-03-23T17:54:12 1774288452

I’m so confused in these comments right now — I thought you had to load an entire MoE model and sparseness just made it so you can traverse the model more quickly.

MillionOClock · 2026-03-23T19:25:16 1774293916

I hope some company trains its models so that expert switches are needed less often, just for these use cases.

zozbot234 · 2026-03-23T19:33:37 1774294417

A model "where expert switches are less necessary" is hard to tell apart from a model that just has fewer total experts. I'm not sure whether that will be a good approach. "How often to switch" also depends on how much excess RAM is available in the system to keep layers opportunistically cached from the previous token(s). There's no one-size-fits-all decision.

simonw · 2026-03-23T15:42:54 1774280574

Yeah, this new post is a continuation of that work.

foobiekr · 2026-03-23T16:11:34 1774282294

This is not entirely dissimilar to what Cerebras does with their weight streaming.

manmal · 2026-03-23T16:24:27 1774283067

And IIRC the Unreal Engine Matrix demo for PS5 was streaming textures directly from SSD to the engine as well?

WatchDog · 2026-03-24T01:46:38 1774316798

Yeah, also "RTX IO", and Microsoft "DirectStorage".

What was more interesting about the Unreal Engine demo was that they could stream not only textures, but geometry too.

Virtual texturing had been around for a long time, but virtual geometry with Nanite is really interesting.

firstbabylonian · on Sept 21, 2024

Thanks, Firefox support is in the works.

firstbabylonian · on Sept 21, 2024

Thanks — there was a cookie-related bug, which should now be resolved.

firstbabylonian · on May 1, 2024

"Streaming in 0 countries. See where"

criddell · on May 1, 2024

Antarctica maybe?

firstbabylonian · on April 30, 2024

> ability to install an app off an unsigned IPA file for free

I feel like the thinking is that there must be an entity — somebody running an app store — who could be held legally responsible for any damage caused by malware distributed via their channels. Regular non-tech-savvy users cannot be trusted with such delicate software as apps running on their personal phones.

cwales95 · on April 30, 2024

The thing is though, as you said, it's my personal iPhone. If I want to be able to install an unsigned app I should be able to. There should be ways to dissuade the non-technical people but my feeling is it is my iPhone so I should be able to do as I wish.

firstbabylonian · on April 30, 2024

Nothing against you personally, but since you get the same iPhone as the non-technical folks, some compromises have to be made, and they ain’t gonna be in your favour.

robertjpayne · on April 30, 2024

This is the myth that everyone is going to be screwed by. Nobody is going to be legally responsible for malware that ends up on your device.

The only difference is Apple has the $$ and incentives to remove it as soon as it's brought to their attention (assuming it's actual malware that may cause large financial loss not just copyright infringement).

Alt-stores will be ridden with malware and nobody is going to be legally responsible for it. We can just hope the alt-stores that end up existing have incentives to keep them "clean".

firstbabylonian · on April 30, 2024

Correct, which is why allowing no-store app delivery would unleash an even greater chaos. In a world where any random website can trick a user into downloading an app via sideloading, there's no hope to protect people from 'unclean' software.

firstbabylonian · on April 30, 2024

The world is harsh and uncaring. Today's internet reflects that. What used to be a safe space for dorky techies is now a cross-sectional slice of the entire society, a live reel of human nature in motion.

Intralexical · on April 30, 2024

The world may be harsh and uncaring, but that's no excuse for you to be.

firstbabylonian · on April 30, 2024

I am a product of the world, and so are you.

southernplaces7 · on April 30, 2024

You're also a human being with agency and the ability to think and decide how to act towards others, and why. Not behaving like a trolling, raging adolescent has nothing to do with ignoring the harsher edges of the world and society. You can still firmly defend X or Y argument while phrasing it as if you were speaking in person to the people on the other end.

Intralexical · on April 30, 2024

It's wild how low-empathy humans can get simultaneously offended and unironically self-righteous when you basically tell them something as simple as "Be nice" or "Try not to hurt people".

…I see that the top-level comment has been edited to be less aggressive, though. Previously it said something to the effect of "If you can't handle that, you have no business being on today's Internet".

firstbabylonian · on April 30, 2024

Yes, I edited the original comment, because I realised that you took offence at it and thought that I was addressing "you" (or the OP) specifically. It was a hyperbolic statement. Does the edit make it clear?

My argument is simple: the internet today is very broad, and humanity — let's face it — on the whole is not very nice. It's admirable to try to create pockets of positivity in it anyway. I simply want to highlight that it is futile, and maybe a more realist perspective is what we actually need to find a way forward.

Intralexical · on April 30, 2024

> It's admirable to try to create pockets of positivity in it anyway. I simply want to highlight that it is futile, and maybe a more realist perspective is what we actually need to find a way forward.

The issue I take with that is that as the problem is entirely social in nature, the perception of futility is also a self-fulfilling prophecy.

firstbabylonian · on April 30, 2024

It’s also something that has been historically true over and over, sadly. All ‘nice’ pacifist societies ended up getting wiped out or subjugated by their uncaring neighbours. Sustainable niceness does not exist. I wonder if we should accept this fact rather than continue to fight it.

Intralexical · on April 30, 2024

You say, after the last thirty years have probably been the safest, most peaceful, most free, and most human-rights-respecting time period in all of human history for the world at large. And, I assume, you probably say it while living in a liberal democracy that has specifically outlived multiple oppressive totalitarian regimes which kept trying to dominate the world.

I think that's an easy bias to fall for, but also not actually true. If it were true, the world would only ever monotonically get more and more violent over time, until we lived in some kind of exaggerated parody of an apocalypse slasher film, which is not the case.

Sustainable naïve niceness does not exist as the norm. It is still possible to be kind as a default without immediately rolling over for anybody who does not share such values, even if you sometimes have to do so by reciprocating hostility where it is encountered.

Cruelty and conflict are ultimately destructive forces, wasting goodwill, physical resources, and cooperative potential. The social equilibrium may not bend towards kindness as sharply as we would like sometimes, but it certainly isn't a straight drop to apathy and cynicism either. Although perceiving and portraying something as inevitable can go a long way to rationalizing it, or to trying to justify giving up.

firstbabylonian · on May 1, 2024

Thank you for this thoughtful response.

logicprog · on April 30, 2024

Yes, everything about us is ultimately "determined by the world," but that doesn't mean that we have to be as harsh and uncaring as the world at all.

Just as easily as, seeing the harsh and uncaring nature of the world, we could imitate and perpetuate it deterministically, we also could see that harsh and uncaring nature, and choose to be more caring, compassionate, and understanding as an equally inevitable reaction, a rejection or countermeasure to it. It isn't free will, it's just that our individual experiences and personalities as people determine how we process and predicate our actions on what the world is like, and so we can all choose differently.

And in fact, even the exhortations and rationale of strangers are part of the stimuli in the world that may change how you act. Which is why I think it is worth it to say what I'm saying now.

Superficial determinism is the hobgoblin of little minds.

Personally, while I am not always nice on the internet, because I struggle to countenance fools, rationalizations, and people who lie to themselves (which happens surprisingly often), I do really try. More importantly, even at my least nice, I try to always be as genuine and authentic as possible, to always be open to having my mind changed, and to genuinely put my beliefs on the line. And I think that in itself can be surprising and refreshing for the people I interact with.

firstbabylonian · on April 30, 2024

Does it not get exhausting to swim against the current all the time? Don't you ever wish you could let go and let your inner little mind take control, if only for a moment? What you describe as being genuine and authentic and open-minded sounds like a cross you have decided you must bear. It's okay. You are not Jesus. You don't have to do this.

logicprog · on April 30, 2024

I just adjust the amount of time I spend online down in order to compensate for the added emotional strain, and also adjust what sorts of social media I interact with to ensure it is sustainable. I've also been slowly learning how to just disengage after a certain point: if a discussion really seems like it's going down the drain or not going to go anywhere useful then I'm learning to sort of just let someone else have the last word and move on, which has been good for my health. I don't want to engage with someone that isn't going to meet me on some fair level of discourse.

Because you are correct, making top level posts on a twitter-like social media constantly with this ethos was actually so emotionally exhausting for me it physically affected my health, but the solution to that is just to not do that anymore.

Also, I do this not because I've arbitrarily decided that I've got to bear this cross, as you say, but because it is my default mode of interaction, in fact the only one I know, and it's something I very much like about myself that I always interact in this manner, and I think trying to learn how to be less genuine and less invested and less open, even if just for my online interactions, would sincerely leak out into my character in general in a way I don't like. So I'm not really doing it out of a sense of duty, but essentially out of a sense of convenience, because I don't want to have to go through the effort of learning how to context switch between a mode of interaction for being online and a mode of interaction for being offline.

firstbabylonian · on April 30, 2024

Thank you for this honest response, really. I wish you the best of luck out there.

logicprog · on April 30, 2024

Thank you, I'll need it lol <3

Intralexical · on April 30, 2024

Like an animal, then?

Not sure if surrendering your free will to decide how you treat people, and all the moral implications that come from being seen as a person rather than just a creature, is the position you want to be taking.

firstbabylonian · on April 30, 2024

We are all animals though. That's what's freeing about the internet. I'm not sure you want to take it so seriously.

firstbabylonian · on March 4, 2024

> For performance reasons, Puter is built with vanilla JavaScript and jQuery.

jQuery is overdue for a comeback.

noduerme · on March 5, 2024

It never really went away for me. I still reach for it in every web app. Over time, I've stopped using some parts of it that used to be timesavers (e.g. shifting away from $.ajax calls to fetch), but it's a good base for rolling your own responsive frameworks if that's what you want to do. And it's what I want to do, because I dislike the paradigms of React and Vue, and have no interest in relying on those projects.

enriquec · on March 5, 2024

could not disagree more. That statement alone makes me completely lose any consideration for this entire project.

firstbabylonian · on Feb 26, 2024

Clean, sensible, introverted living is overrated.

There's something deeply human in the whole 'awkward dance'. Not everyone is born a great dancer, but this doesn't mean that dancing is a stupid and pointless activity.

People have been trading with one another (and often tricking or getting tricked in the process) for millennia. The 'Grand Bazaar' you evoke stands for something timeless — a promise of wealth earned with wit. In a twisted way, our American love for capitalism and these sleazy dealerships are chasing the same ancient dream.

May as well be a part of it.

RugnirViking · on Feb 26, 2024

> tricking or getting tricked in the process

I live in a world where I do not have to trick or be tricked by other people. I surround myself with people whom I can actually trust to have my interests at heart, and I try to look out for their interests as well. This is not just the case for my friends, but has been in my communities, across multiple countries. I can happily go to the market and buy whatever at a listed price, confident that the market stall owner is pricing it at enough to make a comfortable living and no more. I can be confident that if my purchase is flawed, I can say, and be trusted with a refund easily.

This, to me, is a deeply human system. The idea that deceit and cheating are features of life just doesn't bear out for me. There are people that do that, and when I encounter them, I do whatever I can to make sure I never encounter them again. Unfortunately, the consolidation of large businesses makes it more and more common. I avoid those where I can also.

maxehmookau · on Feb 26, 2024

But dancing is fun. The act of dancing evokes joy, and happiness for people.

For most people, buying a car is a chore. I'm not here to be negotiated with. Give me the price of something that means you make your profit; if it seems like a good deal, I'll buy it.

Fire-Dragon-DoL · on March 3, 2024

I don't like dancing and for some reason society keeps forcing me into that.

anthonypasq · on Feb 27, 2024

You completely missed the point. There are many cultures throughout the world that enjoy haggling. In fact, the West is the weird one in that regard.

firstbabylonian · on Feb 26, 2024

If you think of Huggingface as a source of 'fundamental models', it makes sense.