Nature Foundation Models: A Hierarchical Framework for Learning Worlds, Embodiment, and Scientific Intelligence
We propose Nature Foundation Models (NFM), a research program for systems that learn representations, dynamics, causal structure, and mechanisms directly from interaction with the natural world. NFM organizes scientific intelligence as a hierarchy—NFM-Worlds, NFM-Robotics, Atlas, and Atlas-GS—with a shared state–action–dynamics abstraction and a seven-stage developmental pipeline from world representation to adaptive scientific reasoning. The central thesis is that scientific reasoning should emerge from increasingly sophisticated interactions with learned world models rather than from an independent symbolic module.
Abstract
Recent advances in artificial intelligence have demonstrated remarkable capabilities in language understanding, perception, planning, and control. However, most contemporary systems remain specialized architectures optimized for specific tasks, datasets, or environments. We propose Nature Foundation Models (NFM), a research program aimed at developing systems that learn representations, dynamics, causal structure, and mechanisms directly from interaction with the natural world.
NFM introduces a hierarchical framework consisting of three layers: NFM, a general scientific intelligence research program; NFM-Worlds, a foundational world-modeling platform that learns how environments evolve through observation and intervention; and NFM-Robotics, an embodied learning platform that grounds world learning in physical interaction.
Within this hierarchy, Atlas serves as the robotics project family, while Atlas-GS represents the first implementation based on Gaussian Splatting world representations. The framework defines a progression from world representation learning to action semantics acquisition, causal discovery, mechanism inference, hypothesis generation, active experimentation, and adaptive scientific reasoning.
The central thesis of this work is that scientific reasoning should not be treated as an independent symbolic capability but should emerge naturally from increasingly sophisticated interactions with learned world models.
This emergence is realized through a developmental pipeline—world representation, action semantics, causal discovery, mechanism inference, hypothesis generation, active experimentation, and adaptive reasoning—driven by increasingly sophisticated interaction with learned world models.
1. Introduction
Artificial intelligence has progressed through increasingly powerful pattern-learning systems. Deep learning enabled large-scale representation learning. Foundation models generalized these capabilities across broad domains. Reinforcement learning enabled decision-making through interaction.
A fundamental question remains:
How can intelligent systems learn the structure of reality itself?
Humans do not merely predict observations. They construct models of environments, infer causal relationships, propose explanations, perform experiments, and refine theories through interaction. Scientific reasoning emerges from this iterative process.
We argue that future intelligent systems require architectures capable of:
- Learning persistent world representations
- Understanding actions through interaction
- Discovering causal relationships
- Inferring hidden mechanisms
- Generating competing explanations
- Designing experiments to reduce uncertainty
Nature Foundation Models is proposed as a research framework toward this objective.
1.1 Research Questions
- Can world representation, action semantics, and causal reasoning be learned within a single developmental architecture rather than as disconnected modules?
- Does embodiment—physical interaction with environments—provide information that passive observation or simulation cannot?
- Can a unified state–action–dynamics abstraction span physical, biological, chemical, and ecological domains?
- At what stage of world-model sophistication does scientific reasoning become an emergent capability rather than a hand-engineered subsystem?
1.2 Contributions
This document contributes:
- A hierarchical research architecture connecting scientific intelligence, world modeling, embodiment, and concrete implementations.
- A formal core abstraction—state, action, dynamics—shared across domains.
- A seven-stage developmental roadmap from world representation to adaptive scientific reasoning.
- A clear positioning against prior work in world models, causal inference, embodied AI, and 3D scene representations.
- An initial instantiation (Atlas-GS) that grounds the framework in physical robotics.
2. Core Abstraction
The common abstraction across NFM is not the domain (robotics, biology, chemistry). It is the interaction loop:
State_t + Action_t → State_{t+1}
A world is defined as:
World = (State, Dynamics, Actions, Observations)
where:
- State — a representation of the environment at a point in time
- Dynamics — rules governing how states evolve
- Actions — interventions or controls that alter state
- Observations — partial, sensor-derived views of state
The overarching learning objective:
Observation + Action + Feedback + Time → Knowledge
where knowledge includes representations, dynamics, causal relationships, mechanisms, and—eventually—scientific theories.
2.1 Cross-Domain Equivalence
The same abstraction applies across domains:
| Domain | State | Action / Intervention | Observed Transition |
|---|---|---|---|
| Physical (robotics) | 3D scene, robot pose | Move, push, grasp | New viewpoint, object displacement |
| Biological | Gene expression, pathway activity | Drug perturbation, knockout | Expression change, pathway shift |
| Chemical | Molecular configuration | Reaction, catalyst | Product formation |
| Ecological | Population, resource levels | Environmental stress | Adaptation, population change |
Mathematically, these are the same class of problem: learn world structure, dynamics, and the semantics of interventions from experience.
3. The NFM Hierarchy
The framework consists of five layers. Each layer addresses a different level of abstraction.
Nature Foundation Models (NFM)
↓
NFM-Worlds
↓
NFM-Robotics
↓
Atlas
↓
Atlas-GS
Future domain branches (NFM-Biology, NFM-Chemistry, NFM-Ecology) and additional Atlas implementations (Atlas-NeRF, Atlas-Sim) extend from this core chain but follow the same state–action–dynamics abstraction.
NFM is the scientific-intelligence research program. It is not robotics, biology, or simulation individually.
NFM-Worlds is the foundational world-modeling layer. All domain branches share the state–action–dynamics abstraction.
NFM-Robotics provides the embodied learning platform—sensors, manipulators, mobility—that generates structured experiences for world learning. It is the "body"; NFM-Worlds is the "brain."
Atlas is the robotics project family built on NFM-Robotics. It treats navigation, manipulation, and observation as mechanisms for acquiring knowledge, not merely for task completion.
Atlas-GS is the first Atlas implementation, using Gaussian Splatting for persistent 3D world representations.
4. NFM-Worlds
NFM-Worlds learns how environments evolve through observation and intervention:
State_t + Action_t → State_{t+1}
without requiring complete prior knowledge of the environment.
4.1 World Representation
Learning persistent representations of environments.
Examples: 3D scenes, biological states, chemical systems, simulated environments.
4.2 Dynamics Learning
Learning how states evolve over time:
Current State → Future State
4.3 Action Semantics
Understanding what actions mean by observing their consequences. Rather than defining actions symbolically:
MoveForward = predefined meaning
NFM-Worlds learns:
Action → Observed State Difference → Semantic Meaning
directly from experience.
4.4 Causal Discovery
Moving beyond prediction toward explanation:
- What caused this change?
- Which variables were responsible?
- What interventions alter outcomes?
4.5 Mechanism Discovery
Inferring hidden structures—latent variables, unobserved processes—that produce observed transitions. A door opening is not merely a state change; a latch mechanism may explain it.
5. NFM-Robotics and Atlas
5.1 Role of Embodiment
While NFM-Worlds can learn from simulations or datasets, robotics enables direct interaction with physical reality. The role of robotics is not merely control. It is scientific observation.
Robots provide sensors, cameras, manipulators, mobility, and interventions that generate structured experiences for world learning.
NFM-Worlds = Brain
NFM-Robotics = Body
5.2 Atlas
Atlas is the embodied world-learning family built on NFM-Robotics. It provides:
- Navigation
- Manipulation
- Observation
- Experimentation
within learned world models. The objective is understanding, not merely task completion. Interaction is a mechanism for acquiring knowledge about environments.
Robotics provides fast feedback loops for validating NFM ideas—state representation, action semantics, causal intervention, replay and reflection—faster than in biological or chemical domains. Atlas is therefore both a practical engineering target and a scientific testbed for the broader NFM vision.
6. Atlas-GS: Gaussian World Models
Atlas-GS is the first implementation of the Atlas family.
Inputs:
RGB, RGB-D, Stereo, Robot Sensors
Output:
Persistent 3D Gaussian World
This representation supports:
- Localization and mapping
- Scene memory
- Action analysis
- Foundation for causal reasoning (future stages)
Scope note. Atlas-GS today is primarily a world-modeling and embodied-intelligence system—perception, mapping, action learning, prediction, and planning. Scientific reasoning capabilities (hypothesis generation, experiment design, theory refinement) are roadmap targets, not current defining characteristics. Positioning Atlas-GS accurately avoids overclaiming while preserving the long-term developmental arc.
7. Developmental Capability Roadmap
Capabilities emerge sequentially. Each stage builds on the previous; later stages are not independent modules bolted onto earlier ones.
World Representation
↓
Action Semantics Learning
↓
Causal Discovery
↓
Mechanism Discovery
↓
Hypothesis Generation
↓
Active Experiment Design
↓
Adaptive Scientific Reasoning
Stage 1: World Representation
| Learn | Observe → Build World |
| Output | Persistent world model |
| Example (Atlas-GS) | 3D Gaussian scene from RGB-D |
Stage 2: Action Semantics
| Learn | State + Action → State Difference |
| Output | Grounded action meanings |
| Example | "Push" correlates with object displacement |
Stage 3: Causal Learning
| Learn | Cause → Effect |
| Output | Interpretable causal models |
| Example | Identifying which action variable caused an outcome |
Stage 4: Mechanism Discovery
| Learn | Infer hidden structures responsible for observations |
| Output | Mechanistic explanations |
| Example | Latent latch state explains door behavior |
Stage 5: Hypothesis Generation
| Learn | Maintain multiple competing explanations |
| Output | Theory candidates |
| Example | H1: force threshold; H2: latch state change |
Stage 6: Active Experiment Design
| Learn | Select actions that maximize information gain |
| Output | Efficient discovery |
| Example | Probe latch before pushing harder |
Stage 7: Adaptive Scientific Reasoning
| Learn | Integrate all previous capabilities |
| Output | Agents that construct and refine theories through interaction |
8. What Is New?
The novelty is not a new robot, simulator, or world model alone.
The strongest novelty claim is not any single component in isolation:
| Component | Status |
|---|---|
| World models | Established (Dreamer, MuZero, GWM) |
| Robotics | Established |
| Gaussian Splatting | Established |
| Causal reasoning | Established (Pearl, Peters et al.) |
| Active inference | Established (Friston) |
| Scientific reasoning | Established in narrow domains |
The novelty is the hierarchical integration of:
World Learning
+
Action Semantics Learning
+
Causal Discovery
+
Mechanism Discovery
+
Hypothesis Generation
+
Experiment Design
within a single developmental framework.
The novelty is the developmental integration. NFM treats the following as a single continuous learning process—not separate research programs:
World Learning + Action Semantics + Causal Discovery
+ Mechanism Discovery + Hypothesis Generation + Experiment Design
Most existing systems focus on one or two of these components.
NFM proposes that these capabilities emerge sequentially from increasingly sophisticated interactions with learned world models.
Scientific reasoning is therefore not treated as an independent symbolic module but as an emergent consequence of world learning.
NFM proposes that scientific reasoning emerges sequentially from increasingly sophisticated interactions with learned world models, rather than from an independent symbolic module added after the fact.
This is the central research thesis worth defending.
9. Relationship to Prior Work
NFM builds on established directions but differs in architectural intent.
World Models
- Ha and Schmidhuber (2018); Dreamer (Hafner et al., 2019–2023); MuZero (Schrittwieser et al., 2020)
- NFM difference: World models are the substrate for a developmental pipeline toward scientific reasoning, not primarily for reward maximization or game playing.
Causal Reasoning
- Pearl (2009); Peters, Janzing, and Schölkopf (2017)
- NFM difference: Causal structure is discovered through embodied interaction with learned world models, not assumed from a fixed observational dataset.
Active Inference
- Friston (2010); Friston et al. (2017)
- NFM difference: Shares the emphasis on uncertainty reduction and action selection, but targets explicit mechanism and hypothesis stages beyond perceptual inference.
Cognitive Architectures
- Newell (1990); ACT-R (Anderson et al., 2004)
- NFM difference: Capabilities are learned from interaction rather than hand-specified production rules and memory structures.
Embodied AI
- Habitat, AI2-THOR, BEHAVIOR
- NFM difference: Benchmarks emphasize task completion; NFM emphasizes knowledge acquisition and developmental progression toward scientific reasoning.
3D World Representations
- NeRF; 3D Gaussian Splatting
- NFM difference: Representations serve persistent world memory, action-semantic learning, and future causal reasoning—not just novel-view synthesis.
Large Language Models
- GPT, Claude, Gemini, and related foundation models
- NFM difference: LLMs learn statistical structure of text; NFM learns structure of environments through interaction. Complementary, not competing—language models may eventually interface with NFM world models, but NFM grounds knowledge in sensorimotor experience.
Unlike most prior approaches, NFM explicitly places scientific reasoning as the final stage of a developmental world-learning pipeline.
10. Evaluation and Open Challenges
10.1 Evaluation Principles
Progress should be measured at each developmental stage, not only by end-task performance:
| Stage | Example Metrics |
|---|---|
| World representation | Reconstruction quality, map persistence, localization accuracy |
| Action semantics | Predictive accuracy of state differences given actions |
| Causal discovery | Intervention effect identification, counterfactual prediction |
| Mechanism discovery | Latent variable recovery on environments with hidden structure |
| Hypothesis generation | Diversity and calibration of competing explanations |
| Experiment design | Information gain per intervention, sample efficiency |
| Scientific reasoning | Theory revision under new evidence |
10.2 Open Challenges
- Scalability: Persistent world models that grow with exploration without catastrophic drift.
- Compositional generalization: Transferring learned action semantics and causal structure to novel objects and environments.
- Uncertainty: Representing and acting under epistemic uncertainty at every developmental stage.
- Cross-domain transfer: Whether abstractions learned in robotics accelerate learning in biological or chemical domains.
- Theory representation: How competing hypotheses and mechanisms are stored, compared, and revised within world models.
11. Conclusion
Nature Foundation Models proposes a hierarchical framework for learning representations, actions, causality, mechanisms, and scientific theories directly from interaction with environments.
The framework is organized as:
NFM
→ NFM-Worlds
→ NFM-Robotics
→ Atlas
→ Atlas-GS
where Gaussian world models provide the initial substrate for embodied learning.
The central hypothesis is that scientific reasoning emerges naturally from persistent world models that progressively acquire richer representations of actions, causes, mechanisms, and uncertainty.
If successful, this approach could provide a pathway from world modeling to autonomous scientific intelligence.
If successful, NFM could provide a pathway from world modeling to autonomous scientific intelligence, validated first in fast-feedback physical environments and extended to biological, chemical, and ecological domains through the same state–action–dynamics abstraction.
Appendix A: Terminology
| Term | Definition |
|---|---|
| NFM | Nature Foundation Models; overarching scientific-intelligence research program |
| NFM-Worlds | World-modeling platform; learns state, dynamics, actions, causality |
| NFM-Robotics | Embodied learning platform; sensors, actuators, and physical interaction |
| Atlas | Robotics project family built on NFM-Robotics |
| Atlas-GS | Atlas implementation using 3D Gaussian Splatting |
| GWM | Gaussian World Model; established term in splatting-robotics literature |
| Developmental pipeline | Sequential emergence of capabilities from world representation to scientific reasoning |