This has been a busy week in the AI world, with several outlets near-simultaneously reporting that AI companies are seeing a decisive slowdown in scaling; that is, new AI models are no longer getting commensurately better or delivering new features as data and parameter counts increase.
Until recently, the widespread expectation was that scaling would continue for the foreseeable future. We can see this in Microsoft (MSFT) CTO Kevin Scott's interview last quarter. "Scaling will continue" was the vocal theme until about September, when scaling proponents went silent. That changed during this last week. It started with an article in The Information, "OpenAI Shifts Strategy as Rate of 'GPT' AI Improvements Slows." Around the same time, an A16Z podcast discussed reports of scaling slowing down. A day later, a Reuters story quoted AI heavyweight Ilya Sutskever discussing the scaling slowdown. Bloomberg followed up with an article pointing out that three major players, OpenAI, Google (GOOG) (GOOGL), and Anthropic, are all running into the same problem.
Of these, the Reuters article was slightly more detailed and discussed the current research thrust aimed at overcoming the scaling problem:
To overcome these challenges, researchers are exploring “test-time compute,” a technique that enhances existing AI models during the so-called “inference” phase, or when the model is being used. For example, instead of immediately choosing a single answer, a model could generate and evaluate multiple possibilities in real-time, ultimately choosing the best path forward.
The implications could alter the competitive landscape for AI hardware, thus far dominated by insatiable demand for Nvidia’s AI chips. Prominent venture capital investors, from Sequoia to Andreessen Horowitz, who have poured billions to fund expensive development of AI models at multiple AI labs including OpenAI and xAI, are taking notice of the transition and weighing the impact on their expensive bets.
“This shift will move us from a world of massive pre-training clusters toward inference clouds, which are distributed, cloud-based servers for inference,” Sonya Huang, a partner at Sequoia Capital, told Reuters.
Demand for Nvidia’s AI chips, which are the most cutting edge, has fueled its rise to becoming the world’s most valuable company, surpassing Apple in October. Unlike training chips, where Nvidia dominates, the chip giant could face more competition in the inference market.
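To make the "generate and evaluate multiple possibilities" idea concrete, here is a minimal Python sketch of the simplest form of test-time compute, best-of-N sampling. The generate_candidate and score_candidate functions are toy placeholders standing in for a model and a verifier; they are not any lab's actual API.

```python
import random

# Toy illustration of "test-time compute" via best-of-N sampling.
# The generator and scorer are stand-ins, not a real model or reward model.

def generate_candidate(prompt: str, rng: random.Random) -> str:
    """Stand-in for sampling one candidate answer from a model."""
    return f"answer-{rng.randint(0, 999)} to: {prompt}"

def score_candidate(candidate: str) -> float:
    """Stand-in for a verifier that rates how good an answer is."""
    return hash(candidate) % 1000 / 1000.0

def best_of_n(prompt: str, n: int = 8, seed: int = 0) -> str:
    """Spend more inference compute: sample n candidates, keep the best."""
    rng = random.Random(seed)
    candidates = [generate_candidate(prompt, rng) for _ in range(n)]
    return max(candidates, key=score_candidate)

if __name__ == "__main__":
    # Larger n means more inference compute per query, in exchange for
    # a better chance of landing on a good answer.
    print(best_of_n("What is the best path forward?", n=16))
```

The point of the sketch is the cost structure: the extra quality comes from doing more work per query at serving time rather than from a bigger pre-trained model.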
At this point, just about the only major company left to validate the scaling problem is Meta (META). We may not have to wait too long: based on comments from CEO Zuckerberg, Meta appears to be no more than a couple of months away from completing training for its GPT-5-class model. Meta seems to be applying more GPU scale to the problem than anyone else, and it will be interesting to see whether it observes anything different. While Meta's scale and approach leave room for a surprise, odds favor Meta coming to the same conclusion as the others. Time will tell.
Around the same time the scaling news was making headlines, researchers from Harvard and Stanford, among others, published an interesting paper that questioned the ever-increasing use of low-precision math in AI computation. The paper, Scaling Laws for Precision, noted:
“Low precision training and inference affect both the quality and cost of language models, but current scaling laws do not account for this. In this work, we devise “precision-aware” scaling laws for both training and inference. We propose that training in lower precision reduces the model’s effective parameter count, allowing us to predict the additional loss incurred from training in low precision and post-train quantization. For inference, we find that the degradation introduced by post-training quantization increases as models are trained on more data, eventually making additional pretraining data actively harmful.”
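As a rough way to picture what a "precision-aware" scaling law could look like (a schematic in the spirit of Chinchilla-style laws with placeholder constants, not the paper's fitted formula), one can write a loss in which the training precision P shrinks the model's effective parameter count:

```latex
% Schematic only: Chinchilla-style loss with an "effective" parameter
% count that shrinks as training precision P decreases.
% A, B, E, \alpha, \beta, \gamma are illustrative placeholders.
L(N, D, P) \approx \frac{A}{N_{\mathrm{eff}}(P)^{\alpha}} + \frac{B}{D^{\beta}} + E,
\qquad
N_{\mathrm{eff}}(P) = N\left(1 - e^{-P/\gamma}\right)
```

Whatever the exact functional form, the qualitative message of the abstract is that lower precision behaves like fewer effective parameters, and that post-training quantization hurts more the more data a model has been trained on.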
Taken together, the developments of this last week mean two things:
1. Scaling, which drove easy gains in model performance and ever-increasing training farm sizes, is not likely to be the primary driver of AI model performance improvements anymore. Scaling may still deliver some gains, but they may be too small and too expensive to be worthwhile.
2. Due to tradeoffs between training and inference, we may be near the end of the road with gains from lower precision.
These suggest that incremental model gains will be difficult with current approaches and that the industry needs to move to new ones to get the desired outcomes. However, one should be careful not to assume that scaling is completely dead. Some incremental, still-economical scaling is likely to occur before further scaling becomes uneconomical.
Based on all the publicly available information, the path forward seems to be making intelligent use of inference and agents to increase the efficacy of solutions. But this is not an ideal tradeoff: increasing the cost of training is, roughly speaking, a one-time expense, whereas increasing the cost of inference means the cost goes up every time the AI service is used. For high-volume applications, one would rather have higher-cost training and lower-cost inference than the other way around. Depending on the costs involved, higher-cost inference becomes a barrier to widespread adoption.
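A back-of-the-envelope sketch shows why this asymmetry matters; all figures below are arbitrary placeholders chosen only to illustrate the shape of the math, not actual training or serving costs.

```python
# Back-of-the-envelope sketch of the training-vs-inference cost asymmetry.
# All dollar figures are made-up placeholders, not real pricing.

TRAIN_COST = 100_000_000          # one-time cost to train the model ($)
BASE_COST_PER_QUERY = 0.002       # inference cost per query today ($)
TEST_TIME_MULTIPLIER = 10         # e.g. generating/evaluating 10 candidates

def total_cost(queries: int, per_query: float, train: float = TRAIN_COST) -> float:
    """Training is paid once; inference is paid on every query served."""
    return train + queries * per_query

for queries in (1_000_000, 100_000_000, 10_000_000_000):
    base = total_cost(queries, BASE_COST_PER_QUERY)
    heavy = total_cost(queries, BASE_COST_PER_QUERY * TEST_TIME_MULTIPLIER)
    print(f"{queries:>14,} queries: base ${base:,.0f}  vs  test-time compute ${heavy:,.0f}")
```

At low query volumes the one-time training bill dominates; at high volumes the per-query multiplier takes over.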
Despite this dynamic, all efforts now point to increased use of inference as the path forward.
Impact On Hardware Deployment