The Sohu Bet: A Transformer-Only Future

Etched is making one of the boldest bets in AI hardware: burning the transformer architecture directly into silicon. Founded in 2022 by three Harvard dropouts, the company built Sohu, a transformer-only ASIC that discards all other neural network types to maximize speed and efficiency. The gamble looked reckless before ChatGPT, when CNNs and RNNs still dominated AI, but the rise of transformers across every domain has since validated Etched's thesis.

The result is staggering performance. A handful of Sohu chips can match the throughput of hundreds of NVIDIA GPUs on transformer inference, while consuming far less power. This not only slashes costs for hyperscale data centers but also unlocks real-time applications like conversational agents, generative media, and humanoid robots: use cases where latency and efficiency are critical.

Etched is entering a field crowded with general-purpose GPU incumbents and custom silicon challengers, yet its position is unique. It is the only company to fully commit to transformer-only hardware. The risks are real. If transformers are ever displaced, Sohu's advantage vanishes; if they are not, the upside is extraordinary.

The Gamble That Is Paying Off

Etched was founded by Gavin Uberti, Chris Zhu, and Robert Wachen, three Harvard dropouts who recognized a gap between the promise of transformers and the limits of existing silicon. At the time, their focus looked almost reckless. ChatGPT hadn’t launched, diffusion models relied on U-Nets, self-driving cars ran on CNNs, and transformers were far from the universal standard they are today.

The world has since shifted in their favor. Transformers now dominate every frontier of AI, from language and vision to video, search, and agents, turning Etched’s bet from contrarian to prescient. That momentum has made Sohu, the company’s transformer-only chip, one of the most consequential hardware projects of the decade.

In mid-2024, investors rallied behind Etched, with the company raising $120 million in Series A financing at an undisclosed valuation, widely believed to exceed $1 billion. The round drew support from Peter Thiel, Primary Venture Partners, Positive Sum Ventures, and Replit CEO Amjad Masad. On secondary markets, Etched’s shares have since been trading at levels implying a valuation of roughly $1.5 billion as of early 2025.

Why Transformers Are the Moat

Etched’s strategy is deliberately narrow: by burning the transformer architecture directly into silicon, the Sohu chip cannot run older models like convolutional nets (CNNs), recurrent nets (RNNs, LSTMs), or even specialized systems such as AlphaFold 2 or the DLRMs that power Instagram ads. The tradeoff is absolute focus: Sohu cannot run everything, but on transformers, it runs faster and more efficiently than any alternative.

This focus rests on a simple thesis: transformers are not just another model in the toolkit; they are the defining paradigm of modern AI. In just a few years, they have gone from a research curiosity to the backbone of nearly every breakthrough, driving chatbots like ChatGPT, image generators like Stable Diffusion 3, video platforms like Sora, and even the next wave of AI-powered search and autonomous agents. The current and next-generation state-of-the-art models are transformers.

Today’s software stack itself is optimized for transformers. Frameworks such as TensorRT-LLM, vLLM, and Hugging Face TGI all ship with highly tuned kernels designed specifically for transformer inference on GPUs. Features like speculative decoding or tree search, now central to modern AI systems, are deeply tied to the transformer architecture and far less compatible with alternative approaches. This level of ecosystem lock-in makes transformers not just dominant, but entrenched. The reasons for this dominance are structural. 
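To make the ecosystem lock-in concrete, here is an illustrative sketch of greedy speculative decoding, the technique mentioned above. The `draft` and `target` callables are stand-in next-token predictors, not a real library API; production systems verify drafts with probabilistic acceptance, but the greedy version shows the core idea: a cheap model guesses ahead, and the large model validates several tokens in one pass.

```python
def speculative_decode(draft, target, prompt, k=4, max_new=16):
    """Greedy speculative decoding sketch.

    draft, target: callables mapping a token list to the next token.
    Returns prompt plus max_new generated tokens, identical to what
    greedy decoding with `target` alone would produce.
    """
    out = list(prompt)
    while len(out) - len(prompt) < max_new:
        # 1. The cheap draft model guesses k tokens ahead.
        guess, ctx = [], list(out)
        for _ in range(k):
            t = draft(ctx)
            guess.append(t)
            ctx.append(t)
        # 2. The large target model checks the guesses, keeping the
        #    longest prefix it agrees with (one parallel pass on real HW).
        accepted, ctx = [], list(out)
        for t in guess:
            if target(ctx) == t:
                accepted.append(t)
                ctx.append(t)
            else:
                break
        out.extend(accepted)
        # 3. On a mismatch, emit the target model's own token instead.
        if len(accepted) < k:
            out.append(target(out))
    return out
```

Because every accepted token is verified against the target model, the output matches plain greedy decoding even when the draft model is wrong; a better draft only changes how many tokens are accepted per pass.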

Transformers are generalizable. Their attention mechanism excels at capturing relationships and long-range dependencies, making them well suited to any data that can be represented as a sequence: words in text, patches in an image, or frequency patterns in audio. Unlike earlier architectures that were locked into a single modality, transformers can adapt across domains.
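The modality-agnosticism comes from the fact that attention only ever sees a sequence of vectors. A minimal NumPy sketch of scaled dot-product attention makes this visible: nothing in the computation cares whether the rows are word embeddings, image patches, or audio frames.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Each query attends to all keys and returns a weighted mix of
    values. Works on any sequence of vectors, whatever the modality."""
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                    # pairwise similarity
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights @ v, weights

rng = np.random.default_rng(0)
seq = rng.normal(size=(5, 8))   # 5 "tokens", 8-dim embeddings
out, w = scaled_dot_product_attention(seq, seq, seq)
```

Each output row is a convex combination of all input rows, which is exactly how long-range dependencies enter: token 1 can draw directly on token 5 in a single step.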

They are also scalable. By stacking encoders and decoders or multiplying attention heads, transformers expand almost without limit. This modularity makes them easy to parallelize, which in turn has enabled training at unprecedented scale: billions, and soon trillions, of parameters. The absence of rigid task-specific design is compensated by sheer size, a property that rewards those with the compute to push them further.
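A rough parameter-count estimate illustrates how stacking drives scale. The formula below is a simplification (it ignores biases, layer norms, and positional parameters), but with GPT-3-like hyperparameters it lands close to that model's published 175B size.

```python
def transformer_params(d_model, n_layers, vocab, ffn_mult=4):
    """Rough decoder-only parameter count, ignoring biases and norms."""
    attn = 4 * d_model * d_model               # Q, K, V, output projections
    ffn = 2 * d_model * (ffn_mult * d_model)   # up- and down-projection
    embed = vocab * d_model                    # token embedding table
    return n_layers * (attn + ffn) + embed

# GPT-3-like configuration: ~175B parameters
gpt3_estimate = transformer_params(d_model=12288, n_layers=96, vocab=50257)
```

Note that parameters grow linearly in depth and quadratically in width: doubling `n_layers` roughly doubles the count, which is why scaling up is largely a matter of stacking more identical blocks.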

Most importantly, transformers are trainable at scale. They can learn directly from raw, unlabeled data by predicting the next element in a sequence, sidestepping the bottleneck of human-labeled datasets. This has opened the floodgates to orders of magnitude more training data: every web page, video, and audio file becomes usable fuel. The shift from supervised to self-supervised learning is the single greatest unlock of the transformer era.
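The self-supervision trick is simple enough to show in a few lines: the training labels are just the input shifted by one position, so raw text supplies its own supervision with no human annotation. The toy corpus and whitespace tokenizer below are illustrative stand-ins for web-scale data and a real subword tokenizer.

```python
# Hypothetical corpus; in practice this is raw web text with no labels.
text = "the cat sat on the mat"
tokens = text.split()
vocab = {w: i for i, w in enumerate(sorted(set(tokens)))}
ids = [vocab[w] for w in tokens]

# Self-supervised pairs: input = tokens so far, target = the next token.
# Every position in the corpus becomes a free training example.
pairs = [(ids[:i], ids[i]) for i in range(1, len(ids))]
```

A six-token sentence yields five training examples; scaled to the web, every document becomes a dense stream of supervision, which is the "floodgates" effect described above.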

This trio of generalizability, scalability, and trainability has created an extraordinary moat for transformers that Etched is tapping into. Competing architectures would need to match transformers not in one domain, but across all of them, while also catching up to the massive head start of pre-trained models already in use. 

Rewriting the Compute Playbook With Sohu

Sohu is the physical embodiment of Etched's thesis: strip support for convolutional nets, recurrent nets, and any other non-transformer models, and pour every transistor into the bottlenecks unique to transformers. That laser focus pays off. Etched claims a single server outfitted with eight Sohu chips can outperform 160 NVIDIA H100 GPUs on transformer inference tasks while using significantly less energy. In benchmarks with models like LLaMA‑70B, these 8‑chip Sohu servers surpass 500,000 tokens per second, far beyond the ~23,000 tokens per second typical of an eight‑GPU H100 cluster.

Even compared to NVIDIA's newest Blackwell architecture, Sohu maintains a clear edge in transformer inference. Blackwell is impressive: in MLPerf inference, it delivers up to 4× the performance of H100 on large LLM workloads like LLaMA‑2 70B. But Sohu's single-minded optimization for transformers lets it leap even further ahead in that specific domain, with multi-hundred-thousand token-per-second throughput that Blackwell has not demonstrated.
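A quick back-of-envelope calculation, using only the vendor figures quoted above, shows how the two claims fit together:

```python
# Vendor-claimed figures from the LLaMA-70B comparison above.
sohu_server_tps = 500_000   # 8-chip Sohu server, tokens/sec
h100_server_tps = 23_000    # 8-GPU H100 cluster, tokens/sec

# Per-chip throughput ratio on this workload.
per_chip_ratio = (sohu_server_tps / 8) / (h100_server_tps / 8)

# How many H100s an 8-chip Sohu server would replace at this ratio;
# broadly consistent with the separate "160 H100s" claim.
equivalent_h100 = 8 * per_chip_ratio
```

The two claims are roughly self-consistent: a ~22× per-chip throughput ratio implies one 8-chip server stands in for on the order of 170 H100s, in the same ballpark as the 160-GPU figure.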

This kind of performance not only drives major cost savings in hyperscale data centers but also positions Sohu as a potential cornerstone for humanoid robot cognition, delivering transformer-grade compute with very low power consumption.

The Crowd Around Etched

At first glance, Etched operates in a crowded field. A search for “AI chips” yields dozens of companies claiming to compete, but a closer look shows how different Etched’s bet really is.

Incumbents: NVIDIA dominates with roughly 87% of the discrete GPU market, followed by AMD at 10%. Hyperscalers like Google (TPUs), Amazon (Trainium, Inferentia), and Microsoft (Maia) have invested heavily in custom silicon, but they consistently trail NVIDIA by 6–18 months in raw performance. All of these chips are general-purpose, designed to handle both training and inference, which makes them slower and less efficient than a transformer-only ASIC like Sohu. NVIDIA could in theory build something similar, but they are disincentivized: their CUDA ecosystem depends on GPUs being general-purpose, and abandoning that flexibility would erode their moat.

Startups: Several well-funded challengers have tried to carve out space. Cerebras ($720M raised) built wafer-scale engines with extraordinary engineering but struggled to keep pace with the rapid growth of LLM sizes. SambaNova ($1B+) offers configurable accelerators but has pivoted to model-as-a-service after limited adoption. Graphcore ($760M) launched its Intelligence Processing Unit (IPU) but failed to deliver performance and lost key customers.

Other players focus more narrowly on inference. Groq ($360M) developed its Language Processing Unit (LPU), a chip designed for high-throughput inference that can be scaled in pods of hundreds. d-Matrix ($154M) is pursuing a chiplet-based architecture, while Kneron ($190M) and Sima ($200M) are building chips for edge devices like autonomous cars rather than LLM-scale inference.

Direct rivals: A few companies are betting more explicitly on transformers. MatX is working on LLM inference optimized for low-precision quantization (int4), though it is still unclear if accuracy can be preserved at that level. Positron markets a “Transformer Inference Appliance” by chaining GPUs together, but is not designing its own silicon.

Against this backdrop, Etched's position is unique. It is the only company to ship a transformer-only ASIC designed from the ground up for this architecture. While risky if transformers are ever replaced, the upside is unmatched: if the architecture holds, Sohu is positioned to be the fastest, most efficient transformer chip on the market, with no direct competition.

The Brains of Humanoids

Humanoid robots and Physical AI systems demand real-time cognition: the ability to perceive, reason, and act instantly. These workloads are dominated by transformers, which require enormous inference bandwidth at the edge. By delivering transformer performance at levels general-purpose hardware cannot reach, a specialized ASIC like Sohu is uniquely positioned to become a critical piece of humanoid robot brains.

In future system-on-chip designs, GPUs will remain indispensable for their flexibility across diverse AI and parallel workloads. But wherever transformer inference dominates, those cycles will be offloaded to specialized silicon. For humanoid cognition, that means the most demanding reasoning and perception tasks could run directly on transformer ASICs like Sohu, bridging the gap between research demos and deployable robots.

Etched is the first company to commit to this vision at scale. By focusing exclusively on transformers, it has carved out an uncontested category, positioning Sohu not as just another chip but as a foundational infrastructure layer for the AI economy. If transformers remain the backbone of intelligence, Sohu could be remembered as the technology that enabled humanoid robots to step out of the lab and into the real world.

The risk is real. If a new paradigm displaces transformers, Etched’s chips could become obsolete overnight, a possibility the founders themselves acknowledge. But if the transformer moat holds, the upside is extraordinary. Etched has a credible path to becoming one of the defining hardware companies of the century, powering both the data centers of today and the Physical AI systems of tomorrow.

Bullish on Robotics? So Are We.

XMAQUINA is a decentralized ecosystem giving members direct access to the rise of humanoid robotics and Physical AI—technologies set to reshape the global economy.

Join thousands of futurists building XMAQUINA DAO and follow us on X for the latest updates.
