Introduction
We have become accustomed to the “stunning yet distorted” AI experience. Video generation models can make a cup of coffee appear on a table, but the liquid inside flows in ways that defy gravity. Robots perform excellently in laboratories but become disoriented when placed in different environments. This is not due to insufficient computing power or data, but rather a fundamental lack of true understanding of the physical world by current AI.
Overview of Physical AI
In this context, a multi-institutional team led by researcher Wu Enhua from the Institute of Software, Chinese Academy of Sciences, published an important review on Physical AI in the Journal of Computer Science and Technology. This paper systematically outlines the origins, core framework, and future challenges of this emerging direction. The significance of this paper extends beyond academic knowledge; it serves as a collective declaration for the next phase of AI development.
The Three-Dimensional Structure of Physical AI
One of the core contributions of the paper is the proposal of a “three-dimensional structure” framework for Physical AI, breaking down this seemingly vague concept into three clear dimensions.
- Physics-Informed AI: This dimension involves embedding physical laws directly into models. Instead of checking results after training, conservation laws, partial differential equations, and dynamic constraints are integrated from the beginning to constrain each output of the model.
- Generative Physical AI: This dimension focuses on tasks such as video generation, 3D scene reconstruction, and dynamic object modeling. The generated content must not only “look real” but also be physically plausible. This represents a direct upgrade requirement for current diffusion models.
- Embodied AI: This dimension emphasizes how intelligent agents form a complete loop of perception, decision-making, and action in real physical environments. The most challenging aspect here is the transfer from simulation to reality, or how to ensure that models trained in virtual environments remain effective in the real world.
These three dimensions are not isolated but are progressively layered: first understanding physics, then generating physically consistent content, and finally taking real actions in the physical world.
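The physics-informed idea can be made concrete with a toy example. The following is a hedged sketch, not code from the review: instead of fitting observed data, we fit a small polynomial model by requiring it to satisfy a governing equation, here dy/dx + y = 0 with y(0) = 1 (true solution exp(-x)), at a set of collocation points. The physics itself supplies the training signal.

```python
import numpy as np

# Illustrative physics-informed fit by collocation (assumed setup, not from the paper).
# Model: y(x) = sum_k w_k * x^k. Constraint: dy/dx + y = 0 on [0, 1], y(0) = 1.

degree = 6
x = np.linspace(0.0, 1.0, 50)   # collocation points where the physics must hold
K = np.arange(degree + 1)

# Row j of A encodes the physics residual at x_j:
#   dy/dx + y = sum_k w_k * (k * x_j^(k-1) + x_j^k)
deriv = K * x[:, None] ** np.clip(K - 1, 0, None)
value = x[:, None] ** K
A = deriv + value
b = np.zeros(len(x))            # the residual should vanish everywhere

# Append the boundary condition y(0) = w_0 = 1 as an extra, up-weighted row.
bc_row = np.zeros(degree + 1)
bc_row[0] = 1.0
A = np.vstack([A, 10.0 * bc_row])
b = np.append(b, 10.0)

# Solve the linear least-squares problem: physics constraints ARE the loss.
w, *_ = np.linalg.lstsq(A, b, rcond=None)
y = value @ w
err = np.max(np.abs(y - np.exp(-x)))
print(f"max |y - exp(-x)| = {err:.2e}")
```

In a real physics-informed network the polynomial is replaced by a neural model and the residual is minimized by gradient descent, but the structure is the same: the equation constrains every output rather than being checked after the fact.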
Importance of Physical AI
Reflecting on the evolution of artificial intelligence, from symbolic reasoning to statistical learning, from perceptual recognition to generative models, each leap has a common logic: AI is increasingly dealing with information that approaches the complexity of the real world. Physical AI represents the final and most challenging segment of this journey: allowing AI to truly enter the physical world rather than merely spinning in data space.
The practical value of this direction is vast. For robots to enter factories and homes, they must understand forces, friction, gravity, and object deformation. Autonomous driving requires physical-level predictive capabilities in complex traffic scenarios to maintain safety. Digital twins must ensure that virtual models align precisely with physical realities to effectively serve industrial decision-making.
NVIDIA CEO Jensen Huang highlighted Physical AI as one of the most important AI tracks in his 2025 CES keynote, stating plainly that “the next wave of AI will unfold in the physical world.” This aligns closely with the direction charted by Wu Enhua’s team in this review.
The paper candidly points out the biggest shortcoming in the current field: the evaluation system is severely lagging. Existing benchmarks are scattered, standards are not unified, and reproducibility is poor, making it difficult to compare research outcomes across different teams and hindering objective measurement of progress in the entire field.
This is a typical stage where the technical direction is clear, but the infrastructure has not kept pace.
Challenges Ahead
Physical AI is not a breakthrough in a single technology but a system engineering effort requiring the collaboration of multiple technologies.
- Differentiable physical simulation
- Neural-symbolic integration
- Large-scale multimodal physical datasets
- Efficient real-time inference architectures
Any missing component can stall the entire chain. The core challenges the paper identifies, such as the dynamic gap between simulation and reality, the difficulty of physically annotating training data, and the computational cost of multi-physics coupling, are each PhD-level problems in their own right.
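The differentiable-simulation component in the list above can be sketched in miniature. This is a hedged, dependency-free illustration (not from the paper): we roll out a projectile with explicit Euler steps and tune its initial vertical velocity by gradient descent so it reaches a target height after one second. Real differentiable simulators obtain gradients through automatic differentiation (e.g. JAX or a purpose-built engine); here a finite-difference gradient stands in to keep the sketch self-contained.

```python
import numpy as np

# Toy differentiable-simulation loop (illustrative assumptions throughout).
G, DT, STEPS = 9.81, 0.01, 100   # gravity (m/s^2), time step (s), 1 s rollout
TARGET = 3.0                      # desired height at the end of the rollout

def simulate(v0):
    # Explicit Euler integration of dy/dt = v, dv/dt = -G.
    y, v = 0.0, v0
    for _ in range(STEPS):
        y += v * DT
        v -= G * DT
    return y

def loss(v0):
    # Squared distance between the simulated outcome and the target.
    return (simulate(v0) - TARGET) ** 2

# Gradient descent on the initial condition, differentiating "through" the
# simulator (approximated here by a finite difference).
v0, lr, eps = 0.0, 0.5, 1e-6
for _ in range(200):
    grad = (loss(v0 + eps) - loss(v0)) / eps
    v0 -= lr * grad

# Analytically y(1) is roughly v0 - G/2, so the optimum sits near TARGET + G/2.
print(v0, simulate(v0))
```

The same pattern, with the rollout replaced by a full rigid-body or fluid simulator, is what lets control policies and physical parameters be trained end to end.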
However, this is precisely what makes this direction exciting. It is not about competing on existing tracks but attempting to open a new door.
Transitioning from the information world to the physical world sounds simple, but it requires a rethinking of the essence of AI: it should not just be a powerful pattern-matching machine but an intelligent system capable of understanding the laws governing the world and acting robustly within it.
The value of Wu Enhua’s team’s review lies in its clear mapping of this path and its honest acknowledgment of the unfilled gaps along the way. This honesty is, in itself, the attitude that science should embody.