Interesting to note, as an outside observer who only follows this stuff as a hobby, that most of OpenAI’s efforts to drive down compute costs per token and scale up context windows seem likely to be in service of enabling longer and longer chains of thought and reasoning before the model predicts its final output tokens. The benefits of lower costs and larger contexts to API consumers and applications, which I had assumed to be the primary goal, seem likely to be mostly happy side effects.
This makes obvious sense in retrospect: a few years ago, my own experiments with spinning up a recursive agent on GPT-3 ran into problems with insufficient context length and with losing context as tokens had to be discarded, which made the agent very unreliable. But I had not made the connection until just now. I wonder what else is hiding in plain sight?
I think you can slice it whichever way you prefer, e.g. OpenAI needs more than "we ran it on 10x as much hardware" to end up with a really useful AI model; it needs to get more efficient and smarter in proportion as it gets larger. As a side effect, the hardware sizes (and prices) needed for a model of a given size and intelligence go down too.
In the end, however you slice it, the goal has to be "make it do more with less, because we can't get infinitely more hardware", whichever "why" you give.