
AI: Bubble or bottleneck?

01 December 2025
Artificial Intelligence (AI) itself is not a bubble; rather, the current market exuberance is concentrated around a hardware bottleneck in the AI-compute architecture. Carl Vine, Co-Head of Asia Pacific Equities, takes a closer look at the hardware underpinning AI and argues that the current model is unsustainable due to energy and physical constraints. In his view, the bottleneck will likely be resolved through innovations in hardware, such as photonic computing, or in software efficiency, which could disrupt the AI landscape and shift market valuations.

Introduction: Looking beyond the current compute bottleneck

AI is not a bubble, it is a technology. And it’s just getting started. Rather than arguing about bubbles, it might be more fruitful to think about bottlenecks. That’s arguably a better description of what we are observing. If excessive valuation exists in equity markets today, it most likely lives in the value that has accumulated very specifically around the bottleneck inside the current AI-compute stack.

Markets today are pricing the “GPU+HBM AI-compute modality” as the critical enabler of the AI age. (GPU stands for graphics processing unit and HBM is high bandwidth memory). That has been accurate in the recent past and will likely remain so in the near future. But with a longer-range lens, this very same architecture may well come to be seen as a constraining choke point.

Unless we are prepared to turn the entire world into a power station, the current GPU+HBM model is unlikely to scale as it needs to over the long term, on account of its sheer energy intensity. This is the perverse situation investors face. In the near and maybe intermediate term, the current AI-compute modality is all we have and, let’s face it, compute demand is booming. However, that same modality is likely unsustainable in the medium to long term.

As the old trading line goes, “the best cure for high prices is high prices.” The world will work around this compute bottleneck because it has to. And when it does, the AI cost curve will reset and certain stock market values will be cast in a new light.

There will be winners and losers. On the other side of the bottleneck lie higher token1 throughput and speed. That will mean lower marginal cost and a further explosion in AI-compute demand, taking the world into a new phase of AI capability, adoption and productivity. All of this is likely very bullish for AI itself, but potentially less bullish for the companies currently benefiting from bottleneck unit economics. The critical question, as ever, is timing.

Bubble or technology?

When markets become confused, they reach for historical precedent. Think of it as the astrology of investing. The favourite comparison for what we are observing today is the so-called “internet bubble”. But the internet was never a bubble. Valuations were. The technology not only survived, it went on to rewire the entire global economy. It delivered on its promise.

The same distinction applies here. AI is not a bubble, it is a technology shift. Of course, what investors are prepared to pay for AI “exposure” is where bubble-like behaviour may manifest. 

“…what investors are prepared to pay for AI “exposure” is where bubble-like behaviour may manifest.”
 

This piece is not intended to convince you one way or another about whether the stock market is a bubble. It is merely an attempt to reframe the debate and provide perspective on the questions we should be asking to help navigate a complicated topic.

The AI premium

Bears are clearly not crazy to be concerned. Since ChatGPT launched in late 2022, the S&P 500 has added roughly US$24 trillion in market value, and roughly 70 per cent of that increase is linked to companies involved in AI infrastructure, semiconductors or hyperscaling2.

What's more, the gains have been extremely concentrated. Around 65 per cent of the rise in the S&P 500 in this period has come from just ten stocks, all but one of which (Eli Lilly) have AI-heavy narratives.

International markets show a similar, if less concentrated, pattern: roughly 25 per cent of the gains in Japan’s Topix 100, 35 per cent in Korea’s Kospi 100, 20 per cent in Europe’s STOXX 100 and one third in China’s CSI 300 have come from AI-linked stocks3. The concentration sits squarely in the compute value chain, including of course Nvidia, TSMC and SK Hynix.

It is clear, and interesting, that markets are not yet pricing AI as a broad productivity transformation. They are, however, quite decisively pricing AI as a hardware bottleneck. To understand whether this is sustainable, we need to understand that bottleneck.

AI-related stocks have fuelled stock market gains globally

Source: M&G, Bloomberg, November 2025. *ChatGPT was launched in November 2022.

The architecture behind the excitement

A large language model (LLM) is a stack of mathematical layers trained to predict the next word, or token, in a sequence. Each prediction involves multiplying enormous matrices of numbers, the model’s weights, by the incoming data (the prompt and the tokens generated so far). Billions of small, identical operations define this workload.

This is the kind of work a GPU happens to excel at. Built in the 1990s to render computer graphics, GPUs specialise in performing many simple calculations in parallel, which turned out to align perfectly with the linear algebra at the heart of modern AI.

But GPUs are only half of the picture. To perform those operations, they must continuously pull model weights from the HBM mounted beside the GPU compute die. The GPU does the maths; the HBM feeds it. Every token requires data to be moved across that interface. This GPU+HBM structure has made Nvidia, TSMC, SK Hynix and their vast supply chains the centre of the current AI boom.
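To put rough numbers on that interface, consider a minimal back-of-envelope sketch. The model size, weight precision and memory bandwidth below are illustrative assumptions, not figures from this article or from any particular product; the point is simply that generating a token means streaming the model’s weights across the GPU+HBM boundary.

```python
# Back-of-envelope: how the GPU+HBM interface caps single-user token speed.
# All figures are illustrative assumptions, not data from the article.

params = 70e9            # hypothetical dense model: 70 billion parameters
bytes_per_param = 2      # 16-bit weights
hbm_bandwidth = 3e12     # assume ~3 TB/s of HBM bandwidth on the accelerator

# Producing one token for a single user means streaming essentially every
# weight from HBM to the GPU die at least once (caches and the KV cache
# are ignored here for simplicity).
bytes_per_token = params * bytes_per_param           # ~140 GB moved per token

# Upper bound on one user's token speed, set purely by the memory pipe.
tokens_per_second = hbm_bandwidth / bytes_per_token  # ~21 tokens per second

print(f"Data moved per token: ~{bytes_per_token / 1e9:.0f} GB")
print(f"Bandwidth-bound ceiling: ~{tokens_per_second:.0f} tokens/s for one user")
```

On these assumed numbers, the arithmetic itself is not the binding constraint; the pipe between memory and compute is, which is why the interface, rather than the GPU die, sits at the centre of the bottleneck story.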

The 2025 inflection: speed versus throughput

While GPUs were indeed built for parallel compute, they were not originally designed for today’s leviathan LLMs. The GPU+HBM model is elegant but constrained. The rate at which data moves between compute and memory determines token speed. That link is a fixed-width pipe. Fill it with many user streams and you maximise throughput, meaning many tokens across many users, but each stream slows. Allocate more of the pipe to one user and you get fast tokens, but overall throughput falls. In this architecture, you can have fast tokens or lots of tokens; you cannot have both.
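A toy model makes the trade-off concrete. The sketch below assumes, purely for illustration, that each decode step streams the full set of model weights once (shared across every user in the batch) plus each user’s own context cache; real systems are also bounded by compute, scheduling and interconnect, so treat the figures as indicative only.

```python
# Toy model of the fixed-width pipe: total throughput versus per-user token speed.
# All constants are illustrative assumptions, not measured figures.

hbm_bandwidth = 3e12       # bytes/s the memory pipe can deliver (assumed)
weight_bytes = 140e9       # one full read of the model weights per decode step
kv_bytes_per_user = 2e9    # per-user context (KV cache) read each step (assumed)

def token_rates(batch_size: int):
    """Return (tokens/s per user, total tokens/s) for a given batch size."""
    # Each decode step moves the weights once, shared by the whole batch,
    # plus every user's own context cache.
    bytes_per_step = weight_bytes + batch_size * kv_bytes_per_user
    step_seconds = bytes_per_step / hbm_bandwidth
    per_user = 1.0 / step_seconds        # each user receives one token per step
    total = batch_size / step_seconds    # the batch yields batch_size tokens per step
    return per_user, total

for batch in (1, 8, 64, 256):
    per_user, total = token_rates(batch)
    print(f"batch={batch:>3}: ~{per_user:5.1f} tokens/s per user, ~{total:6.0f} tokens/s in total")
```

In this simplified picture, packing more users into the batch lifts total throughput while dragging each individual user’s token speed down; which side of that trade-off matters most depends on the workload, as the next section describes.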

Until 2025, most AI workloads optimised for throughput; they were tolerant of latency. A chatbot or summariser could take a second to respond and nobody cared. Agentic AI changed that4. Planners, code assistants, search agents and multi-step reasoning systems must operate in real time. Each token triggers the next computation. Latency compounds.

In 2025, then, agentic use cases not only drove up demand for tokens, they simultaneously drove up demand for speed. Token speed became the gating factor for usefulness.

Once enterprises realised that slow tokens meant slow agents, data-centre demand shifted sharply toward hardware that could deliver higher per-stream speed. But the GPU+HBM architecture cannot deliver fast tokens without sacrificing throughput. The only near-term fix was brute force: buy more GPUs.

Oracle’s US$400 billion backlog and the surge in DRAM (dynamic random-access memory) pricing reflect this reality. The stock market was not hallucinating this summer. It was responding to a massive demand surge. Part of that surge, however, is embedded in the structural constraints of the architecture we have.

Physical limits

Commodity traders say the best cure for high prices is high prices. Something similar may apply here. The physical movement of data between the GPU and HBM has become a major cost and energy driver. When workloads demand faster token speeds or longer context windows, the pipe between compute and memory saturates. Nvidia responds with more memory, more silicon and more power. Performance scales, but so does cost and energy demand.

This is the economic inversion at the heart of the AI boom. The cost per unit of “useful compute” (lots of fast tokens) is no longer falling. We are in a compute moment where adding capacity is not obviously making computation cheaper at the unit level.

This inversion has a single root cause. Dennard scaling, the principle set out in 1974 by IBM engineer Robert Dennard that shrinking transistors improves both speed and efficiency, ended in the mid-2000s. Transistors still shrink, but power per transistor is no longer falling. Today, for leading-edge AI compute stacks, moving a bit across a board costs more energy than computing on it. The entire AI compute crunch, from power draw to data-centre thermals to grid stress, is downstream of this collapse.
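To see why data movement rather than arithmetic now dominates the power bill, here is an order-of-magnitude sketch. The per-operation and per-bit energy figures are assumed placeholders in the spirit of commonly cited engineering estimates, not measurements for any real chip, and the single-user case overstates movement because batching amortises weight traffic.

```python
# Order-of-magnitude sketch: energy spent moving bits versus computing on them.
# Every energy figure below is an assumed placeholder, not a measured value.

params = 70e9                        # hypothetical dense model, as before
flops_per_token = 2 * params         # roughly 2 operations per parameter per token
bits_moved_per_token = params * 16   # stream 16-bit weights from HBM once (one user)

energy_per_flop = 1e-12              # assume ~1 pJ per on-die arithmetic operation
energy_per_bit_moved = 5e-12         # assume ~5 pJ to move one bit off-package

compute_joules = flops_per_token * energy_per_flop              # ~0.14 J per token
movement_joules = bits_moved_per_token * energy_per_bit_moved   # ~5.6 J per token

print(f"Arithmetic energy per token   : ~{compute_joules:.2f} J")
print(f"Data-movement energy per token: ~{movement_joules:.2f} J")
print(f"Movement versus compute       : ~{movement_joules / compute_joules:.0f}x")
```

Even allowing for generous batching and caching, the qualitative point survives: shrinking transistors no longer shrinks the energy of shuttling data, so the wires, not the maths, set the power bill.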

The bottleneck is not HBM supply, interposer yield5 or GPU availability. Those are solvable; not easily but solvable. The real bottleneck is power, and that is what is potentially undermining the economics of large-scale AI compute with the current hardware modality.

When physics pushes back, architecture must change. As we argued in our recent piece, “From electrons to photons: the next great compute transition”, when physics breaks the economics of compute, the only durable response is to change the architecture. We have seen this before. The industry shifted from valves to transistors in the 1940s and 1950s for exactly the same reason. The incumbent modality became a victim of its own success and could no longer scale cheaply enough to satisfy demand.

Bottleneck unit economics

Demand for AI compute is soaring. GPU plus HBM is the hardware solution of the moment, but supply of both remains tight. Pricing power for the suppliers is therefore high, which means margins are exceptional. This is what is meant by “bottleneck unit economics”: scarcity pricing in the components at the centre of constrained supply.

Some observers point to circular activity, such as Nvidia investing in its cash-burning customers, as evidence of artificial demand. We see it differently. This is not an attempt to buy demand; it is an attempt at market-share defence in the face of emerging competition. Google’s underappreciated tensor processing unit (TPU) success and the rising use of application-specific integrated circuit (ASIC)-based AI accelerators show that normal market forces are already at work6. These competitors still rely on HBM, so this is not yet a new modality, but it is a reminder that high margins in a constrained market attract innovation and substitution. Again, the best solution to a bottleneck is a bottleneck.

“…high margins in a constrained market attract innovation and substitution.”
 

Demand for AI services is rising far faster than any rethink of compute architecture can be implemented. This creates a window, perhaps three or four years long, in which bottleneck economics can potentially persist and incumbent beneficiaries can continue to milk their position.

What's next?

Let’s be honest, who knows. But we can make some educated guesses. In the short term, there is no choice but to double down on the existing architecture and optimise it relentlessly. That is happening right now. The entire compute ecosystem is currently engaged in an all-out effort to stretch the GPU+HBM model as far as physics will allow. This is visible everywhere in the supply chain. Some of the deepest pockets and smartest minds on the planet are on the case; we shouldn’t be too sceptical.

SK Hynix, already the global leader in HBM3E, is pushing ahead aggressively with next-generation HBM4. Its base-logic die technology remains ahead of peers, enabling higher clock rates, improved thermals and more reliable throughput at extreme bandwidths.

On the networking side, hyperscalers are deploying optical switches and advanced optical connectors at unprecedented scale to reduce electrical I/O bottlenecks7 and energy loss. Co-packaged optics is being accelerated because the traditional copper traces on large interposers cannot handle the required bandwidth per watt.

Even more exotic ideas are being trialled. Superconducting bus bars for power delivery are appearing in next-generation data-centre designs to reduce resistive losses. Liquid-cooled backplanes, cold plates and immersion systems are now standard, not fringe.

In the short term, then, the world likely continues to double down on the GPU+HBM modality. There are, after all, no currently available, mass-producible alternatives. However, the underlying physics of electronic compute will likely prove immovable. There are really only two realistic ways out of this GPU+HBM bottleneck: change the hardware and/or change the software.

The hardware route: from electrons to photons?

One exit route is a hardware shift. If electrons are too power hungry and too expensive to move, one solution is to change the medium of compute. Across academia and industry, there is accelerating development in photonic computing, analogue accelerators, in-memory compute and optical interconnects. These approaches share a principle: move data less and compute where the data lives.

Photonic computing in particular offers a clear path. In the optical domain, matrix multiplications become dramatically more energy-efficient, and data movement becomes less costly. If moving bits dominates cost, the architecture must change to move bits differently. We outlined this trajectory in our article, “From electrons to photons”.

This is the hardware exit route: a new compute modality that restores the falling cost curve and unlocks the next phase of AI. Current estimates of when a photonic compute solution will be ready for commercial-grade mass production range between 3 years and 10 years. In the meantime, the world is very much reliant on what it currently has.

The software/algorithmic route and China’s forced experiment

The other route is to change the software, or in this case the algorithm. China has been running a natural experiment in constrained compute. Restricted from advanced GPU supply, Chinese labs have been forced to optimise for efficiency rather than brute force. Lighter models, compression, sparsity, retrieval-centric architectures and clever reuse of weights are becoming the norm.

This strategy will almost certainly not win the global race to artificial general intelligence8, if there is such a thing, but it may produce strong financial returns. Models that deliver 80 per cent of the capability at a fraction of the compute cost are attractive to enterprises. China’s constraint has forced a kind of capital discipline that may prove valuable if compute scarcity lasts longer than expected. This is one algorithmic exit route: do nearly the same with a lot less. 
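These efficiency levers all attack the same term in the earlier arithmetic: the bytes that must cross the memory pipe for every token. The configurations below are hypothetical and the bandwidth figure is the same assumed value used earlier; the sketch only illustrates the direction of travel, not any specific model.

```python
# Sketch of the efficiency route: quantisation, sparsity and smaller models all
# shrink the bytes moved per token, lifting the bandwidth-bound token rate.
# All configurations and figures are illustrative assumptions.

hbm_bandwidth = 3e12   # bytes/s, same assumed memory pipe as before

configs = {
    "70B parameters, 16-bit weights":        70e9 * 2.0,
    "70B parameters, 4-bit weights":         70e9 * 0.5,
    "70B parameters, 4-bit + 50% sparsity":  70e9 * 0.5 * 0.5,
    "8B parameters, 4-bit weights":          8e9 * 0.5,
}

for name, bytes_per_token in configs.items():
    tokens_per_second = hbm_bandwidth / bytes_per_token
    print(f"{name:40s} -> ~{tokens_per_second:5.0f} tokens/s per stream (bandwidth bound)")
```

The capability given up in exchange is the open question, but the economics of “nearly as good for far fewer joules” are easy to see.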

“Models that deliver 80 per cent of the capability at a fraction of the compute cost are attractive to enterprises.”

The GPU+HBM supernova and market risk

Big bottlenecks often behave like supernovas. They grow bigger and brighter right before they collapse. Arguably, the same is happening in GPU+HBM-centric AI compute. Nvidia, SK Hynix and their huge ecosystems are pushing memory technology as hard as physics permits. The shortage of fast tokens has made HBM the gating resource. Its marginal value is rising.

But this brightness is also the signal. The scarcity premium embedded in GPU+HBM economics is fragile. When faster and cheaper alternatives to matrix multiplication become viable, that premium will evaporate. This is not a bearish call on Nvidia or Hynix or any other company per se. After all, large incumbents, in particular, have the capability and resources to pivot, adapt, innovate and reinvent themselves. What is at risk is the economics of the modality they currently dominate. Scarcity economics do not persist.

Once the bottleneck breaks, capital will likely migrate. It will move toward two destinations: the providers of alternative compute architectures, and the enterprises that turn cheaper compute into real-world productivity. Companies that solve the underlying power problem have the potential to win big. In addition, cheaper AI compute will likely give birth to new businesses and market-cap creation we cannot even yet imagine.

The point is not to be bearish per se about markets, but to recognise that much of the value creation in recent years, especially in the US market, has been around a compute modality that may prove to be the victim of its own success. In the meantime, the “double-down” period could conceivably last several years.

The issue for investors, of course, is that markets are discounting mechanisms. Even if mass production of photonic compute alternatives is, for the sake of argument, 5 years away, once the market confirms line-of-sight on a feasible alternative, the way it values the incumbent modality will likely change drastically.

In the early 2000s, hard disk drive (HDD) stocks traded on mid-single digit price-to-earnings ratios because the world was fixated on the perception that NAND Flash memory had established technical superiority9. Looking back, the market was being harsh. Some 20 years later, HDDs remain the dominant modality for storing data in the world today. The point, however, is that the market punished HDD valuations when it realised that a new technology had emerged.

Bottleneck economics may well be safe for a while yet, but markets might only ever be a headline away from an alternative architecture breakthrough and a wholesale market re-assessment. Many stocks serving the GPU+HBM supply chain have gone up 10x in recent years. One wonders how they might fare in such a moment.

Let’s face it, supernovas are hard.

Conclusion: Winners and losers of AI

AI itself is not a bubble. It is a once-in-a-generation restructuring of how computation and cognition interact. Its deployment is just getting started. On the software side, winners will be those firms that deploy AI to reallocate economic resources more efficiently. Interestingly, the market has thus far shown little love to this area, preferring instead to focus on the hardware.

If there is a bubble anywhere, it is localised in the bottleneck of the current compute modality. From here, the hardware winners will be those companies that solve the power/cost constraint. The losers will be those who assume today’s scarcity will last forever.
 

1 In AI, tokens are the basic units of text that a language model understands and processes.
2 Hyperscalers are the operators of global-scale cloud computing, who provide the infrastructure and services for AI.
3 Source: M&G, Bloomberg, November 2025.
4 Agentic AI refers to AI systems that can act autonomously, operating more like “agents” than simply responding to prompts.
5 An interposer is a thin layer used in semiconductor packaging to connect a chip to a circuit board or another chip.
6 Tensor Processing Units (TPUs) are Google's custom-developed, application-specific integrated circuits (ASICs) used to accelerate machine learning workloads.
7 Electrical I/O bottlenecks refer to limitations in the speed and efficiency of data transfer between chips or components using traditional electrical interconnects.
8 Artificial general intelligence refers to the hypothetical intelligence of a machine that possesses the ability to understand or learn any intellectual task that a human being can.
9 NAND flash memory is a type of non-volatile storage technology that can retain data without a power source. It supports everything from mobile and embedded solutions to data centre storage applications.
By Carl Vine, Co-Head of Asia Pacific Equities

