🎙️
AIPodify

Invest Like the Best

World's Top Researcher on AI, LLMs, and Robot Intelligence

Guest: Sergey Levine · March 31, 2026

Episode Summary

AI-generated · Mar 2026

AI-generated summary — may contain inaccuracies. Not a substitute for the full episode or professional advice.

Sergey Levine, a co-founder and researcher at Physical Intelligence, joins the podcast to discuss the critical challenge of giving physical robots intelligence, a problem he terms the "scarecrow problem." His company is dedicated to developing robotic foundation models capable of enabling any physical robot to perform any task in any environment, a generalist approach he argues is ultimately more effective than specialized solutions, drawing parallels to the evolution of large language models (LLMs).

Levine explains that building generalizable robotics doesn't always translate into impressive single-task demonstrations; instead, it means tackling mundane tasks in a myriad of unforeseen situations, like a robot cleaning a kitchen without prior training in that specific home. He envisions a future where successful general-purpose embodied AI sparks a "Cambrian explosion" of robotic applications, much as personal computers and the internet did, by radically lowering the barrier to innovation. He traces the historical arc of robotics, from end-to-end learning in the 1980s to deep reinforcement learning in the 2010s, highlighting the recent transformative role of multimodal LLMs in imbuing robots with "common sense" to handle long-tail scenarios. Physical Intelligence's current approach involves vision-language-action (VLA) models that use "chain of thought" reasoning and reinforcement learning to improve.

A significant insight from Levine's work is the shifting bottleneck in robotic learning: from low-level physical actions to mid-level interpretation of scenes, which allows robots to be "coached" with language to improve generalization. He addresses the "reservoir of data" challenge, suggesting that the key is to build useful systems that can autonomously gather more data in the real world, akin to Tesla's approach. Levine also discusses Moravec's paradox, noting how machine learning is altering which tasks are considered easy or hard for robots, making physically intricate tasks easier when data collection is straightforward. He delves into the ongoing "bitter lesson" controversy in robotics, whether to program in explicit knowledge or let machines learn from data, and shares his optimism about general intelligence despite the field's long history of limited success.

Listeners will gain a clear understanding of the current state, technical challenges, and future trajectory of general robot intelligence. Levine's insights offer a strategic roadmap for how foundation models can transform robotics, from unlocking unprecedented creativity in robot design to reshaping labor dynamics, and shed light on non-technical factors, like human comfort and expectations, that will influence widespread adoption.

👤 Who Should Listen

  • AI researchers and engineers interested in the future of embodied AI and robotics.
  • Entrepreneurs and investors looking to understand the technical challenges and market potential of general-purpose robots.
  • Robotics enthusiasts curious about the contrast between specialized demos and generalizable intelligence.
  • Business leaders contemplating the integration of autonomous systems and the changing nature of labor.
  • Anyone interested in the societal implications and ethical considerations of advanced robotic systems.
  • Students and academics exploring career paths in AI, machine learning, and robotics research.

🔑 Key Takeaways

  1. Robotic foundation models, like those developed at Physical Intelligence, aim to provide a general "brain" for any physical robot to perform any task in any environment, addressing robotics' "scarecrow problem." [00:00, 01:01]
  2. The bet on generality, rather than domain-specific solutions, is crucial for robotics, mirroring LLMs' success by leveraging broader data and fostering foundational world understanding. [01:01, 02:02, 03:03]
  3. Multimodal LLMs are revolutionizing robotics by providing "common sense" knowledge for handling long-tail, unusual scenarios that traditional data collection methods cannot cover cost-effectively. [11:08, 12:08]
  4. Sergey Levine's current research focuses on combining generative AI's vast knowledge with deep reinforcement learning's ability to surpass human performance, aiming to overcome the limitations of prior approaches. [15:11, 16:12]
  5. Vision-language-action (VLA) models that use "chain of thought" reasoning allow robots to interpret scenes and select next steps, moving the bottleneck from low-level actions to mid-level semantic interpretation and enabling "coaching" with language. [17:13, 27:27]
  6. Success in general-purpose embodied AI could trigger a "Cambrian explosion" of robotic applications, akin to personal computers and the internet, by radically lowering the barrier to entry for innovators to create diverse form factors and functions. [05:04, 06:04]
  7. Despite Moravec's paradox, machine learning is making physically intricate tasks easier when data collection is straightforward, but challenges remain for situations requiring common sense and multi-level abstraction. [24:23, 25:23]
  8. The timeline for widespread robot adoption is uncertain due to the "bootstrap challenge" of reaching sufficient usefulness for real-world data collection at scale, and the complex interaction of technology with human comfort and expectations. [28:28, 63:05]

💡 Key Concepts Explained

Scarecrow Problem

This refers to the challenge in robotics where advanced physical devices, regardless of their form or function, lack a central 'intelligence' or 'brain' to make them truly useful. Physical Intelligence aims to solve this by providing foundation models as that missing intelligence. [00:00]

Robotic Foundation Models

These are general-purpose AI models designed to control any embodied system to perform any task, analogous to how large language models handle any language-based task. The episode emphasizes their importance in achieving broad applicability and generalization in robotics. [01:01]

Moravec's Paradox

The observation in AI research that tasks easy for humans (like physical dexterity or common sense) are difficult for machines, while tasks hard for humans (like calculus) are comparatively easy. Levine notes that machine learning is changing this equation by making physically intricate tasks easier when sufficient data is available. [24:23]

Common Sense (in Robotics)

Defined as the ability of a robotic system to apply semantic inferences and knowledge learned from diverse sources (like multimodal LLMs) to a current physical task at hand. It's crucial for robots to navigate and respond reasonably to unusual or unexpected 'long-tail' scenarios. [25:23]

Vision-Language-Action (VLA) Model

An AI model that is essentially an LLM adapted for robotic control. It is trained on text, then adapted with image data, and finally fine-tuned with diverse robot data, enabling it to bridge web knowledge with physical interaction. [17:13]
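The staged recipe described above (text, then images, then robot data) can be sketched as a simple pipeline. This is an illustrative toy, not Physical Intelligence's actual training code; the stage names and the `build_vla` function are assumptions made for the example.

```python
# Illustrative sketch of the staged VLA training recipe:
# text pretraining -> image adaptation -> robot fine-tuning.
# All names here are hypothetical stand-ins for real training stages.

STAGES = [
    ("pretrain", "web text"),           # learn language and world knowledge
    ("adapt", "image-text pairs"),      # ground language in vision
    ("finetune", "robot trajectories"), # map observations to actions
]

def build_vla(stages):
    """Apply each training stage in order and record the capability gained."""
    capabilities = []
    for stage, data in stages:
        capabilities.append(f"{stage} on {data}")
    return capabilities

print(build_vla(STAGES))
```

The ordering matters: each stage builds on the representations learned by the previous one, which is how web knowledge ends up available for physical interaction.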

Chain of Thought (in Robotics)

A reasoning process where a robot, instead of directly executing an action, first 'thinks' about what it was asked to do and what steps it should take. This internal monologue leverages web-scale pre-training to improve common sense and decision-making in complex tasks. [17:13]
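A minimal sketch of this "reason, then act" loop, assuming hypothetical `reason` and `act` functions as stand-ins for the model's plan generation and action decoding:

```python
# Illustrative chain-of-thought sketch: the model first produces an
# intermediate plan in text, and only then emits low-level actions.
# reason() and act() are hypothetical stand-ins for model calls.

def reason(instruction, scene):
    # In a real VLA model this plan would be generated text; here it is canned.
    return [f"locate the {scene['object']}", f"{instruction} the {scene['object']}"]

def act(step):
    # Stand-in for the action decoder that turns a plan step into a command.
    return {"command": step}

def run(instruction, scene):
    plan = reason(instruction, scene)      # the "internal monologue"
    return [act(step) for step in plan]    # execute each planned step

actions = run("pick up", {"object": "cup"})
print(actions)
```

The point of the intermediate plan is that it lives in the same text space as the model's web-scale pre-training, so prior knowledge can shape the decision before any action is taken.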

The Bitter Lesson

This principle states that powerful AI systems are best achieved by scaling up general learning methods with vast amounts of data, rather than programming in explicit knowledge or human-designed inductive biases. Levine notes its ongoing controversy in the robotics community regarding end-to-end learning. [45:44]

Compositional Generalization

The ability of an AI system to combine and mix learned skills or knowledge in novel ways to solve new problems, even if it has never encountered that specific combination before. An example given is an LLM writing paragraphs in the International Phonetic Alphabet despite never seeing that format. [46:45]

⚡ Actionable Takeaways

  • Understand that effective robotic learning prioritizes generalization over exciting, perfectly controlled demos, focusing on mundane tasks in varied environments. [04:04]
  • Recognize the shift in robotic development towards general-purpose foundation models that understand physical interaction, rather than specialist robots for single tasks like dishwashing. [03:03]
  • Leverage multimodal LLMs to imbue robotic systems with common sense, essential for handling unusual "long-tail" scenarios where specific training data is scarce. [11:08, 12:08]
  • Consider how to allow robots to "talk to themselves" using chain of thought reasoning to unlock prior knowledge from web-scale pre-training for better decision-making in novel situations. [17:13]
  • Explore how to provide high-level instructions and semantic commands to robots, as this can now improve their ability to generalize by supervising mid-level scene interpretation. [27:27]
  • Reframe the "robot data reservoir" problem by focusing on creating useful systems that can autonomously gather more data in the world, rather than pre-quantifying total data needs. [21:18]

⏱ Timeline Breakdown

00:00 · Introduction to Sergey Levine and Physical Intelligence's mission to solve the "scarecrow problem" with robotic foundation models.
01:01 · Defining Physical Intelligence's goal to build robotic foundation models for any embodied system and task, analogous to language models.
02:02 · Explanation of why a generalist approach is preferred over specialized robots, drawing parallels to language models leveraging broad data.
04:04 · Discussion on the difficulty of building generalizable robotics, which doesn't make for exciting single-task demos.
05:04 · Sergey's vision for success: unlocking creativity and a "Cambrian explosion" of diverse robot applications, like personal computers.
07:04 · Pros and cons of the humanoid approach versus a general intelligence adaptable to any body or tool.
09:06 · Historical timeline of robotics research, including end-to-end learning (ALVINN, 1980s) and deep reinforcement learning (2010s).
11:08 · The recent shift in robotics leveraging multimodal LLMs to provide "common sense" for handling long-tail, unusual scenarios.
13:09 · Identifying key milestones like deep reinforcement learning and the advent of multimodal LLMs adapted for robotic control.
14:09 · Sergey's personal history: from blank-slate learning to collective learning, now combining practice with prior knowledge.
16:12 · The big challenge of combining generative AI's knowledge with deep reinforcement learning's ability to surpass human performance.
17:13 · Sergey describes the current approach using vision-language-action (VLA) models, chain of thought reasoning, and reinforcement learning for improvement.
21:18 · The "reservoir of data" challenge and the idea that useful robots should collect their own data, like Tesla.
23:21 · Surprising progress in dexterity and generalization across different robot embodiments without major model changes.
24:23 · Explanation of Moravec's paradox and how machine learning is changing what is considered easy or difficult for robots.
25:23 · Defining "common sense" in robotics as applying semantic inferences from external knowledge to physical tasks.
27:27 · The shift in bottleneck to mid-level reasoning and the ability to "coach" robots with language to improve generalization.
28:28 · Potential non-technical explanations for slow robot adoption by 2050, focusing on interaction with people and comfort with imperfection.
30:31 · The biggest technical risk: dealing with the breadth of unexpected situations in open-world environments like homes.
31:33 · The core principle of Physical Intelligence's approach: achieving generality, especially regarding how the system can be improved.
32:33 · Discussion of other interesting approaches and the dichotomy between real data and simulation in robotics.
34:34 · Contrasting "cool" (Boston Dynamics backflips) with "useful" demos, emphasizing usefulness as the primary objective for Physical Intelligence.
35:35 · Description of the "robot Olympics" tasks and how Physical Intelligence's general system solved almost all of them.
37:37 · Examples of surpassing human ability, such as speeding up tasks by removing human-like cognitive pauses.
39:39 · The innovation on form factors, arguing that general AI can lower the barrier to experimentation for robot designers.
40:39 · Analogy of physical intelligence to learning to ride a bike, where tools become an extension of the body.
43:40 · Speculation on how the world might change with general robotics, similar to the impact of LLMs on engineering.
44:42 · Major controversies in robotics, including the historical debate on the place of learning and the current "bitter lesson" of end-to-end learning.
46:45 · Explanation of compositional generalization using the example of a language model writing paragraphs in the International Phonetic Alphabet.
47:45 · The last type of tasks to be achieved by robots: those involving intimate human interaction, like changing a child's diaper, due to Moravec's paradox.
48:47 · The 'dark parts' of the robotics brain: the challenge of using physical analogies for understanding and inference, which humans excel at.
50:48 · Discussion on the role of researchers and what constitutes 'good research,' emphasizing the collective effort and instructive failures.
52:54 · The delicate decision in research: when to try new things versus sticking with current approaches.
54:55 · The importance of manufacturing and scalability, and how general-purpose AI can reduce uncertainty in these areas.
55:57 · Advice for traditional companies on preparing for robotics, acknowledging technological uncertainties like data collection methods.
56:58 · The co-evolution of AI tools and human labor, with robots increasing productivity rather than fully replacing humans, similar to coding tools.
58:38 · Sergey's favorite robot: the Boston Dynamics Atlas for its agility and interesting design decisions.
59:01 · Reflection on Boston Dynamics' 'cool demos' vs. commercial usefulness and the value of illustrative challenges.
60:02 · Thinking about business endpoints and prototyping real-world applications with current models.
61:02 · Complementary technological trends, like the dramatic reduction in robotics hardware costs.
62:04 · Where to find information on major milestones: research papers, noting their inaccessibility and the misleading nature of public-facing demos.
63:05 · The most uncertain aspect: the timeline, due to the bootstrap challenge and varying deployment models.
64:05 · Questions people don't ask: how to specifically prepare for using autonomous robots, beyond just collecting data.
65:06 · The next visible challenge: better understanding mid-level reasoning and how to structure the robot's internal thinking process beyond text-based representations.
66:09 · Sergey's position on the optimism spectrum for robotics: optimistic among established researchers, pessimistic compared to entrepreneurs.
68:09 · Inspirations: Boston Dynamics for showing what's possible, and organizations like OpenAI for fostering experimentation and empowering researchers.
69:11 · Recalling a personal experience at Google ('arm farm') illustrating the power of leverage and agency for researchers.
71:13 · Kindest thing: Jeff Dean and Vincent Vanhoucke taking a bet on him for the 'arm farm' and Pieter Abbeel betting on his potential at Berkeley.

💬 Notable Quotes

"Fundamentally, the goal of physical intelligence is to develop robotic foundation models that can control basically any embodied system to do any task." [01:01]
"Generalization, you can't just show it in like in one spot, right? Like the point of generalization is that it does something relatively mundane that any human could do, but it does it in any situation." [04:04]
"The bitter lesson says that you should not program the machine to think the way you think it should think, but you should let it learn from data." [45:44]
"I think changing a child's diaper will be really really hard." [47:45]

More from this guest

Sergey Levine

