April 23, 2025
Category: Physical AI · Read time: 7 minutes
Tesla is launching its robo-taxi service in June, and with it, Physical AI will enter mainstream consciousness.
Yes, a few robo-taxi services already operate today: Waymo runs in San Francisco and Austin, while Baidu's Apollo Go runs in China. Tesla's service, however, will stand out for two reasons:
1. The Cybercab is a purpose-built vehicle, designed solely for autonomous driving, with no steering wheel, accelerator, or brake pedal
2. Tesla plans rapid global scaling, starting in the U.S. in 2025 and eyeing Europe, including a potential Netherlands rollout by late 2025.
We believe Tesla is timing this launch deliberately, ahead of its second wave of Physical AI deployment: humanoid robots, in the form of Tesla Optimus. What do Optimus and the Cybercab have in common, and where do they differ?
Robo-Taxis and Humanoids have a lot in common
They both need a different kind of AI
Both are designed to operate in the physical world, often moving alongside humans, navigating dynamic and unpredictable environments. This demands a different form of intelligence than that of a large language or diffusion model.
In his book The Singularity Is Near, Ray Kurzweil postulated that humans are especially skilled at predicting the near future in real time. He noted, “We are constantly predicting the future and hypothesizing what we will experience. This expectation influences what we actually perceive. Predicting the future is actually the primary reason that we have a brain.”
Predicting the next word in a sequence is exactly what a large language model (LLM) does, and this produces remarkable outputs from its training data. Predicting the physical world, on the other hand, involves a different kind of complexity. Unlike language, which draws from a finite vocabulary (everyday speech relies on only a few thousand words), the physical world encompasses effectively infinite possibilities: motion, forces, and interactions. While governed by the laws of physics, these variables combine into countless unique scenarios.
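The contrast can be made concrete with a toy sketch. The vocabulary and the motion numbers below are invented for illustration only; the point is that a language model always chooses from a finite, discrete set, while a physical model must predict values drawn from a continuum.

```python
import random

# A language model's prediction: pick the next token from a FINITE set.
vocabulary = ["the", "car", "stops", "turns", "left", "right"]
next_token = random.choice(vocabulary)  # always one of 6 discrete options

# A physical model's prediction: even this trivial 1-D motion update has
# infinitely many possible outcomes, since position and velocity are
# continuous quantities (values here are arbitrary illustrations).
position, velocity, dt = 0.0, 12.5, 0.05   # metres, m/s, seconds
next_position = position + velocity * dt   # a point on a continuum
```

A real system compounds this further: the prediction must account for forces, contacts, and the reactions of other agents, not just straight-line motion.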
Language models for text and diffusion models for images and videos function as black boxes: you provide an input, the model processes it, and an output is returned. Physical AI—like the systems behind Tesla’s Cybercab or Optimus—faces a different reality. The world is in constant flux, responding dynamically to the robot’s actions. Inputs such as video, audio, and prior instructions are continuously streamed and must be computed in real time to generate safe and effective physical behaviors. This introduces a unique set of challenges.
They have to think locally
Unlike most AI services, which are delivered via the cloud as SaaS, Physical AI such as Tesla's Full Self-Driving (FSD) demands on-site processing to avoid latency and to remain reliable in areas with poor connectivity. All real-time FSD decisions are handled locally by the vehicle's FSD computer, powered by Tesla's custom chips. For instance, the Hardware 3 computer processes 2,300 frames per second and performs 36 trillion operations per second per neural-network array, enabling rapid analysis of camera inputs and neural-network-driven driving decisions without any cloud dependency.
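A back-of-envelope calculation shows why local processing matters. The latency figures below are illustrative assumptions, not Tesla specifications; the question is simply how far a car travels "blind" while waiting for a decision.

```python
# Distance travelled while waiting on a decision, at highway speed.
# Latency values are illustrative assumptions, not measured figures.
speed_kmh = 100
speed_ms = speed_kmh / 3.6             # ~27.8 m/s

local_latency_s = 0.010                # assumed on-board inference: 10 ms
cloud_latency_s = 0.150                # assumed cloud round trip: 150 ms

blind_local = speed_ms * local_latency_s   # ~0.28 m per decision
blind_cloud = speed_ms * cloud_latency_s   # ~4.17 m per decision

print(f"local: {blind_local:.2f} m, cloud: {blind_cloud:.2f} m")
```

Under these assumptions a cloud round trip costs roughly fifteen times more blind travel per decision than local inference, and that is before accounting for dropped connections, which local compute avoids entirely.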
However, while real-time decisions are local, FSD’s neural network training is remote. Tesla collects vehicle data, including camera footage and driving scenarios, and sends it to data centers for offline training. These improvements are then rolled out to vehicles via over-the-air updates. While data transmission isn’t needed for real-time FSD operation, it’s essential for long-term algorithm enhancement.
The cost of training is enormous
Building the computational infrastructure for Physical AI requires immense capital investment. To put things in perspective, Tesla has already invested over $5 billion in its Cortex data center at the Austin, Texas Gigafactory, deploying 50,000 GPUs between 2023 and 2024, with plans to add another 50,000.
Elon Musk has noted that training humanoid robots, due to their complex operating environments, could demand up to 10 times the computational resources of autonomous driving, potentially requiring a $50 billion investment for Optimus’s neural network training.
Tesla is in a unique position because it can leverage its FSD expertise to accelerate the development of its Optimus brain. Both systems rely on Tesla’s vision-based approach, using deep learning and neural networks for visual recognition and scene understanding. FSD’s algorithms, trained on vast datasets and refined through the Dojo supercomputer, can be adapted for Optimus, enhancing training efficiency via simulated virtual models and automated labelling.
Other humanoid robotics companies, lacking Tesla’s infrastructure and expertise, may need to collaborate on open-source platforms or rely on partners like Google or NVIDIA to access the immense computing power required. Decentralized ecosystems like Bittensor, which enable permissionless innovation and shared training data, could reduce costs by allowing companies to pool resources and accelerate the development of “humanoid brains”.
They are both potentially dangerous
Robo-taxis and humanoid robots operate in close proximity to humans, making their presence in the physical world inherently risky. A misjudgment by a robo-taxi could cause a collision, while an error by a humanoid robot—especially during tasks like caregiving—could lead to injury. To ensure safety, both systems require real-time processing and robust fail-safes.
This direct interaction with the physical environment sets Physical AI apart from other AI systems. Unlike large language models, where mistakes may result in misinformation or hallucinations, errors in Physical AI can cause tangible harm, making exceptional reliability and precision absolutely critical.
The success of large-scale robo-taxi services will significantly influence the timeline for deploying humanoid robots in uncontrolled environments, such as homes or private spaces. If Tesla’s autonomous ride-hailing service demonstrates greater safety than human-driven alternatives, it could build public trust in humanoid robots and accelerate their adoption. Conversely, if robo-taxi systems face reliability issues, it may delay the broader rollout of humanoids outside controlled settings like factories and warehouses.
Robo-taxis and humanoids also differ in meaningful ways
Humanoid robots are built for general-purpose tasks
Autonomous vehicles (AVs) focus on transportation, aiming to move passengers or goods safely, as with the Cybercab's robo-taxi service. Humanoid robots, by contrast, are general-purpose machines, designed for diverse tasks like caregiving or factory work that require both gross and fine motor skills. This distinction is crucial for understanding their respective roles in Physical AI's future.
Humanoids have to deal with a more complex environment
Humanoids operate in far more complex environments than robo-taxis. AVs navigate a relatively simple, near-2D road environment with limited degrees of freedom, primarily forward and backward motion, steering, and acceleration. In contrast, humanoids face intricate 3D spaces: a single hand alone has up to 22 degrees of freedom. This lets them perform a potentially unlimited variety of tasks, embodying the essence of general-purpose robotics, but it also introduces a significantly higher level of complexity.
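A quick way to see how degrees of freedom compound: discretize each controllable axis into just a handful of levels and count the distinct combinations. The three-axis car model and the five-level discretization below are simplifying assumptions for illustration; the 22-DoF hand figure comes from the text above.

```python
# Each added degree of freedom multiplies the action space.
levels = 5          # assume each axis is crudely discretized into 5 levels
car_dof = 3         # simplified car control: steering, throttle, brake
hand_dof = 22       # degrees of freedom in a single hand

car_actions = levels ** car_dof     # 125 combinations
hand_actions = levels ** hand_dof   # ~2.4 quadrillion combinations
print(car_actions, hand_actions)
```

Even this crude model makes the point: the growth is exponential in the number of joints, so a full humanoid body, with far more than one hand's worth of joints, faces a vastly larger control problem than a car.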
Humanoids can be deployed privately
Robo-taxis and other autonomous vehicles operate in complex public environments, navigating dense traffic and strict regulations. Testing them in private spaces offers limited value because conditions differ so starkly, with unpredictable pedestrian behavior and varied road infrastructure. This complexity, combined with regulatory hurdles, has slowed robo-taxi deployment, limiting rollouts to select cities with supervised operations.
In contrast, humanoid robots can be deployed in private settings—such as factories, warehouses, or homes—where regulations are either nonexistent or less stringent, yet the applications remain highly valuable. They also don’t require additional infrastructure like charging networks. As a result, once deployment begins, adoption is likely to scale rapidly due to fewer barriers and broad utility.
Robo-taxis are a launchpad for humanoid robots
Tesla’s ambitious robo-taxi service—arguably the most advanced of its kind—positions the company to develop unique expertise in managing fleets of autonomous systems. This experience will directly inform the development and deployment of its Optimus humanoid robots, allowing Tesla to apply lessons from navigating public roads to mastering complex, human-centric tasks in private environments.
This strategic edge is likely to prompt competitors to collaborate in order to keep pace. We expect open-source ecosystems—built on shared innovation and pooled resources—to play a pivotal role in shaping the future of humanoid robotics, enabling broader participation in this rapidly evolving field.
The future could not be more exciting.
Bullish on robotics? So are we.
XMAQUINA is a decentralized ecosystem giving members direct access to the rise of humanoid robotics and Physical AI—technologies poised to reshape the global economy.
Join thousands of futurists building XMAQUINA DAO and follow us on X for the latest updates.