Cloud analyst Sriram Subramanian predicts mixed inference model for AI workloads

Why does this matter? The recent debate sparked by the Perplexity CEO's warning about AI data‑centre capacity has put the industry on edge. Executives worry that ever‑larger models will swamp existing infrastructure, driving up costs and energy use.

At the same time, startups tout on‑device processing as a way to sidestep those bottlenecks. The tension between centralized horsepower and edge efficiency is now a boardroom staple. While the hype around massive clouds persists, a growing chorus of analysts suggests the answer may lie somewhere in between.

Here's the thing: cloud specialist Sriram Subramanian, who founded the research firm CloudDon, has been weighing in on the issue. The business‑startup community is watching closely, aware that any shift could reshape investment priorities. If the split proves viable, firms might avoid the massive capital outlays that have defined the last two years.

His remarks to AIM hint at a hybrid approach that could balance cloud power with on‑device efficiency.

In a conversation with AIM, Subramanian said he expects a mixed model in which inference is split between the cloud and the device to improve performance. "The other angle is moving to smaller AI models where the requirements aren't much for the user," he said. "GPUs will be the larger pie definitely," he added, arguing that powerful cloud-based compute will remain necessary for accuracy and high-demand workloads. Users who want the most accurate and contextually relevant responses may continue to prefer cloud-based GPUs, which will remain more powerful than on-device systems even as local AI grows increasingly capable.
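The mixed model Subramanian describes can be pictured as a simple router: short, low-stakes queries stay on the device's small model, while demanding or accuracy-critical ones go to cloud GPUs. The sketch below is purely illustrative; the function names, the word-count threshold, and the routing heuristic are assumptions for explanation, not anything described in the article.

```python
# Illustrative sketch of mixed (hybrid) inference routing.
# All names and the threshold heuristic are hypothetical assumptions,
# not an implementation attributed to Subramanian or CloudDon.

def run_on_device(prompt: str) -> str:
    """Stand-in for a small local model (e.g., a quantized LLM on the device)."""
    return f"[on-device answer to: {prompt[:30]}]"

def run_in_cloud(prompt: str) -> str:
    """Stand-in for a large model served from cloud GPUs."""
    return f"[cloud GPU answer to: {prompt[:30]}]"

def route(prompt: str, needs_high_accuracy: bool, max_local_words: int = 50) -> str:
    """Keep short, low-stakes prompts on the device; escalate the rest to the cloud."""
    if needs_high_accuracy or len(prompt.split()) > max_local_words:
        return run_in_cloud(prompt)
    return run_on_device(prompt)

print(route("What's the weather like today?", needs_high_accuracy=False))
print(route("Summarise this long contract in detail", needs_high_accuracy=True))
```

In practice the routing signal would be richer than a word count (latency budgets, connectivity, privacy constraints), but the split itself is the point: the cloud keeps the heavy, accuracy-sensitive workloads while the device absorbs the cheap ones.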

Will the cloud keep its hold on AI? Sriram Subramanian says the future will likely split inference between cloud servers and edge devices, a compromise that could ease latency and cut bandwidth costs. Companies are already pouring billions into GPUs and massive data‑centre expansions, operating on the premise that large models demand centralized power.

Yet Perplexity CEO Aravind Srinivas warns that local processing may become the biggest threat to those investments, suggesting that on‑device inference could erode the need for ever‑larger facilities. Subramanian also notes a trend toward smaller models, which require less compute and could shift the balance further toward the edge. It remains unclear whether this mixed approach will curb the current growth of AI‑focused infrastructure or simply add a new layer of complexity.

What will drive adoption—performance gains, cost pressures, or regulatory factors? The answer, for now, sits in a space of uncertainty, with both cloud and device strategies vying for relevance.

Common Questions Answered

What mixed inference model does Sriram Subramanian predict for AI workloads?

Sriram Subramanian forecasts a hybrid approach where inference is divided between cloud servers and edge devices. This split aims to boost performance, reduce latency, and lower bandwidth costs while still leveraging powerful cloud GPUs for high‑accuracy tasks.

Why does Subramanian believe GPUs will remain a "larger pie" in AI processing?

He argues that GPUs provide the massive compute power required for large, accurate AI models, especially for demanding workloads. Consequently, cloud‑based GPU resources will continue to be essential even as some inference moves to smaller, on‑device models.

How might on‑device processing threaten current data‑centre investments, according to the article?

The article notes that experts like Aravind Srinivas view local inference as a potential threat because it could diminish the need for expansive cloud infrastructure. If more AI tasks run on edge devices, the billions invested in GPU farms and data‑centre expansions may see reduced utilization.

What concerns did the Perplexity CEO raise that relate to the cloud versus edge debate?

The Perplexity CEO warned that AI data‑centre capacity could become a bottleneck, with ever‑larger models straining existing infrastructure, driving up costs and energy consumption. This concern fuels the discussion about shifting some workloads to on‑device processing to alleviate cloud pressure.