SciLayer

We present Atlas-GS v1, the first end-to-end implementation of a 3D Gaussian world-modeling pipeline within the Nature Foundation Models (NFM) hierarchy. Atlas-GS ingests RGB-D observations, constructs a persistent Gaussian field, localizes against that map, persists world state across sessions, and logs state–action–state transitions. The system implements Phases 0–6 as a modular Python package with CLI tools and demo video generation—without requiring GPU hardware for v1 validation. We report empirical results on TUM RGB-D (4,018 Gaussians, 0.0102 m localization RMSE) and synthetic sequences.

Abstract

We present Atlas-GS v1, the first end-to-end implementation of a 3D Gaussian world-modeling pipeline within the Nature Foundation Models (NFM) hierarchy. Atlas-GS ingests RGB-D observations from simulation or benchmark datasets, constructs a persistent Gaussian field representation of the environment, localizes subsequent observations against that map, persists world state across sessions, and logs state–action–state transitions for downstream learning. The system implements Phases 0–6 of the Atlas-GS build specification as a modular Python package with CLI tools, reproducible dataset ingestion, and demo video generation—without requiring GPU hardware or robot deployment for v1 validation. We describe the architectural placement of Atlas-GS within NFM → NFM-Worlds → NFM-Robotics → Atlas, the design of a CPU-friendly Gaussian proxy field, key algorithms for mapping and localization, the world bundle persistence format, and empirical results on TUM RGB-D and synthetic sequences. Atlas-GS v1 establishes the substrate on which action semantics (v2), causal world models (v3), and the seven-stage NFM developmental pipeline can be built without architectural rework.

Keywords: 3D Gaussian Splatting, world models, embodied AI, RGB-D mapping, scene memory, Nature Foundation Models

1. Introduction

Persistent world representation is a prerequisite for embodied systems that learn from interaction. Classical SLAM maintains sparse feature maps or dense voxel grids; neural approaches use NeRF-style implicit fields or explicit 3D Gaussian Splatting (3DGS) for photorealistic, editable scene models. Within the Nature Foundation Models (NFM) research program, world representation is not an isolated engineering task—it is Stage 1 of a developmental pipeline that progresses toward action semantics, causal discovery, mechanism inference, hypothesis generation, active experimentation, and adaptive scientific reasoning.

Atlas-GS is the first concrete implementation at the bottom of the NFM stack:

Nature Foundation Models (NFM)
        ↓
NFM-Worlds
        ↓
NFM-Robotics
        ↓
Atlas
        ↓
Atlas-GS

This paper documents what was built, how it works, and how it connects to the broader NFM vision. Atlas-GS v1 is deliberately scoped as a world-modeling and embodied-intelligence platform, not a scientific-reasoning system. Its job is to make the core loop operational:

Observe → Build / Update World → Log State → (Act → Observe consequences)*

with the abstraction:

State_t + Action_t → State_{t+1}

1.1 Contributions

End-to-end pipeline. A working implementation from dataset download through world build, localization, scene memory, transition logging, and demo video export.
Modular architecture. Eight Python modules (sensors, perception, gaussian_world, mapping, localization, scene_memory, transition_log, viz) aligned with the Atlas-GS specification.
CPU-first Gaussian proxy field. RGB-D fusion and voxel aggregation produce a splat-compatible representation runnable without CUDA, with a documented upgrade path to Inria 3DGS / gsplat.
Reproducible evaluation. Benchmark ingest (TUM RGB-D), synthetic data generation, localization metrics, and persisted world bundles.
Repository structure. The NFM hierarchy is reflected in folder layout: NFM-Worlds/, NFM-Robotics/Atlas/Atlas-GS/.

1.2 Scope and non-goals (v1)

In scope	Out of scope (v1)
RGB-D ingest (TUM, synthetic)	Action semantics learning (v2)
Gaussian world build & merge	Causal / mechanism models (v3–v4)
Pose localization vs map	LLM reasoning, symbolic planners
Scene memory save/load	Real-time multi-robot fleets
Transition logging	Full GPU 3DGS training (hook provided)
Novel-view / trajectory rendering	ROS 2 production nodes (planned)

2. Position in Nature Foundation Models

Atlas-GS instantiates layers of the NFM hierarchy as follows:

Layer	Role in Atlas-GS v1
NFM	Long-term vision; seven-stage developmental roadmap
NFM-Worlds	`State_t`, transition tuples, future dynamics APIs
NFM-Robotics	Sensor ingest, calibration hooks, sim-first development
Atlas	Navigation / observation as knowledge acquisition
Atlas-GS	Gaussian map, mapping, localization, scene memory

The repository mirrors this chain:

Nature-Foundation-Models/
├── NFM-Worlds/
├── NFM-Robotics/
│   └── Atlas/
│       └── Atlas-GS/
│           ├── implementation/    ← Python package
│           ├── docs/
│           ├── data/
│           ├── worlds/
│           └── demos/
└── paper/                         ← this document

Atlas-GS v1 implements world-state and transition locally within Atlas-GS/. As the stack matures, shared abstractions will migrate upward into NFM-Worlds/ (state schemas, dynamics) and NFM-Robotics/ (sensor middleware).

3. System Architecture

3.1 Pipeline overview

┌─────────────────────────────────────────────────────────────┐
│  Inputs: TUM RGB-D · Synthetic RGB-D · (future: ROS bags)   │
└────────────────────────────┬────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────┐
│  Phase 0: Environment — configs, dataset download            │
└────────────────────────────┬────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────┐
│  Phases 1–2: Sensor ingest + RGB-D fusion                  │
│  sensors/tum_loader · sensors/synthetic · perception/rgbd    │
└────────────────────────────┬────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────┐
│  Phase 3: Mapping — keyframes + Gaussian world build         │
│  mapping/mapper · gaussian_world/builder                     │
└────────────────────────────┬────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────┐
│  Phase 4: Localization — ICP vs Gaussian centers             │
│  localization/localizer                                      │
└────────────────────────────┬────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────┐
│  Phase 5: Scene memory — world bundle persist/load           │
│  scene_memory/world_bundle                                   │
└────────────────────────────┬────────────────────────────────┘
                             ▼
┌─────────────────────────────────────────────────────────────┐
│  Phase 6: Transition log + demo videos                       │
│  transition_log/logger · viz/video                           │
└─────────────────────────────────────────────────────────────┘

3.2 Module map

Module	Responsibility
`sensors/`	Load TUM sequences; procedural synthetic RGB-D
`perception/`	Depth back-projection to world-frame points
`gaussian_world/`	Gaussian field params, build, CPU splat render
`mapping/`	Keyframe selection by pose delta
`localization/`	ICP pose refinement against map
`scene_memory/`	PLY + binary bundle + metadata
`transition_log/`	JSONL `(state, action, state)` records
`viz/`	Orbit and trajectory MP4 export

3.3 Software stack

Component	Choice (v1)
Language	Python 3.9+
Arrays / geometry	NumPy, OpenCV
Config	YAML
Video	imageio + libx264
Tests	pytest
Package	`atlas_gs` (editable install via `pyproject.toml`)

4. Data Model

4.1 World state

At time $t$:

State_t = {
  gaussian_map_id,
  gaussian_params: GaussianField,
  robot_pose: SE(3),
  timestamp,
  frame_id,
  metadata
}
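The State_t record above can be rendered as a Python dataclass. This is a minimal sketch, not the atlas_gs API. The class name State, the encoding of SE(3) as a 4×4 homogeneous matrix, the SO(3) validation, and the relative_pose helper are all assumptions made for illustration.

```python
from dataclasses import dataclass, field
from typing import Any, Dict

import numpy as np


@dataclass
class State:
    """Hypothetical mirror of State_t; field names follow the record above."""
    gaussian_map_id: str
    gaussian_params: Any        # would be a GaussianField in practice
    robot_pose: np.ndarray      # assumed 4x4 homogeneous SE(3), world <- robot
    timestamp: float
    frame_id: int
    metadata: Dict[str, Any] = field(default_factory=dict)

    def __post_init__(self) -> None:
        T = np.asarray(self.robot_pose, dtype=np.float64)
        if T.shape != (4, 4) or not np.allclose(T[3], [0.0, 0.0, 0.0, 1.0]):
            raise ValueError("robot_pose must be a 4x4 homogeneous matrix")
        R = T[:3, :3]
        # A rigid transform needs an orthonormal rotation block with det(R) = +1.
        if not (np.allclose(R.T @ R, np.eye(3), atol=1e-6)
                and np.isclose(np.linalg.det(R), 1.0, atol=1e-6)):
            raise ValueError("rotation block of robot_pose is not in SO(3)")
        self.robot_pose = T


def relative_pose(before: State, after: State) -> np.ndarray:
    """Motion between two states, expressed in the frame of `before`."""
    return np.linalg.inv(before.robot_pose) @ after.robot_pose
```

Storing the pose as a full matrix keeps composition a single matrix product, which is what a pose_delta in a transition record would need.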

4.2 Gaussian field (v1 proxy)

Full 3DGS stores anisotropic covariances and spherical-harmonic color. Atlas-GS v1 uses a Gaussian proxy field so the entire pipeline runs on CPU:

Field	Type	Description
`position`	$\mathbb{R}^3$	Mean in world frame
`scale`	$\mathbb{R}$	Isotropic radius
`color`	RGB uint8	View-independent color (v1)
`opacity`	$\mathbb{R}$	Alpha
`id`	uint64	Stable identifier
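One way to hold the proxy field is as a structure-of-arrays dataclass, sketched below. The attribute names follow the table. The class body, the float32 dtypes for scale and opacity (chosen to match Appendix B), and the merge semantics are assumptions rather than the atlas_gs implementation. Here merge is plain concatenation, which leaves deduplication to the voxel downsampling that follows it.

```python
from dataclasses import dataclass

import numpy as np


@dataclass
class GaussianField:
    """Hypothetical structure-of-arrays layout for N proxy Gaussians."""
    position: np.ndarray  # (N, 3) float32, mean in world frame
    scale: np.ndarray     # (N,)   float32, isotropic radius
    color: np.ndarray     # (N, 3) uint8, view-independent RGB
    opacity: np.ndarray   # (N,)   float32, alpha
    id: np.ndarray        # (N,)   uint64, stable identifier

    def __len__(self) -> int:
        return int(self.position.shape[0])

    def merge(self, other: "GaussianField") -> "GaussianField":
        """Concatenate two fields; assumes ids were allocated without collisions."""
        return GaussianField(
            position=np.concatenate([self.position, other.position]).astype(np.float32),
            scale=np.concatenate([self.scale, other.scale]).astype(np.float32),
            color=np.concatenate([self.color, other.color]).astype(np.uint8),
            opacity=np.concatenate([self.opacity, other.opacity]).astype(np.float32),
            id=np.concatenate([self.id, other.id]).astype(np.uint64),
        )
```

A structure-of-arrays layout maps one-to-one onto the binary bundle, where each attribute is written as a contiguous block.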

Upgrade path: gaussian_world/trainer_gsplat.py documents COLMAP export → Inria 3DGS / gsplat training → import of SH coefficients and anisotropic scales.

4.3 Observations

Observation_t = {
  rgb: Image[H, W, 3],
  depth: Image[H, W],           # meters
  intrinsics: CameraMatrix,
  extrinsics: SE(3),             # optional ground truth
  timestamp, frame_id
}

4.4 Transitions

Every logged interaction:

Transition = {
  state_before, action, state_after,
  delta_summary: { pose_delta, ... }
}

v1 derives each action as a base_velocity from the translation between consecutive ground-truth poses. These are placeholder semantics for v2 learning.
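A minimal sketch of this logging step follows, under two assumptions. First, base_velocity is taken to be the translation difference divided by the time difference, in m/s. Second, the states are reduced to frame_id and timestamp stubs. The function name and the record's inner keys (type, value) are hypothetical; the top-level keys follow the Transition record above.

```python
import json
from typing import List, Sequence

import numpy as np


def transition_records(poses: Sequence[np.ndarray],
                       timestamps: Sequence[float]) -> List[str]:
    """Build one JSONL line per consecutive pair of 4x4 ground-truth poses."""
    lines = []
    for k in range(len(poses) - 1):
        t0 = np.asarray(poses[k])[:3, 3]
        t1 = np.asarray(poses[k + 1])[:3, 3]
        dt = float(timestamps[k + 1]) - float(timestamps[k])
        delta = t1 - t0
        # Assumed definition: velocity = translation delta / time delta.
        velocity = delta / dt if dt > 0 else np.zeros(3)
        record = {
            "state_before": {"frame_id": k, "timestamp": float(timestamps[k])},
            "action": {"type": "base_velocity", "value": velocity.tolist()},
            "state_after": {"frame_id": k + 1, "timestamp": float(timestamps[k + 1])},
            "delta_summary": {"pose_delta": delta.tolist()},
        }
        lines.append(json.dumps(record))
    return lines
```

N poses yield N − 1 records, which is consistent with the 39 transitions logged for 40 frames in Section 8.1. Joining the lines with newlines gives the contents of transitions.jsonl.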

5. Algorithms

5.1 RGB-D fusion

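Valid depth pixels (within [min_depth, max_depth]) are back-projected. Here $z$ is the metric depth at pixel $(u, v)$ and $K$ is the camera intrinsic matrix:

$$\mathbf{p}_\text{cam} = z \cdot K^{-1} [u, v, 1]^\top$$

The points are then transformed to the world frame when the camera-to-world pose $T_{wc} = [R \mid \mathbf{t}]$ is available:

$$\mathbf{p}_\text{world} = R\,\mathbf{p}_\text{cam} + \mathbf{t}$$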

Stride-2 subsampling balances density and speed.
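The two equations above vectorize directly in NumPy. This sketch assumes a pinhole K, a 4×4 camera-to-world pose T_wc, depth already converted to meters, and a stride applied along both image axes. The function name and the min_depth default of 0.1 m are hypothetical; max_depth = 4.0 m matches Appendix A.

```python
import numpy as np


def backproject(depth: np.ndarray, K: np.ndarray, T_wc: np.ndarray,
                min_depth: float = 0.1, max_depth: float = 4.0,
                stride: int = 2) -> np.ndarray:
    """Return world-frame points (M, 3) for valid, stride-subsampled depth pixels."""
    d = depth[::stride, ::stride].astype(np.float64)
    # Row index is v, column index is u; same subsampling grid as `d`.
    v, u = np.mgrid[0:depth.shape[0]:stride, 0:depth.shape[1]:stride]
    valid = (d >= min_depth) & (d <= max_depth)
    z = d[valid]
    pix = np.stack([u[valid], v[valid], np.ones_like(z)], axis=0)  # (3, M) homogeneous pixels
    p_cam = np.linalg.inv(K) @ pix * z                              # p_cam = z * K^-1 [u, v, 1]^T
    R, t = T_wc[:3, :3], T_wc[:3, 3]
    return (R @ p_cam).T + t                                        # p_world = R p_cam + t
```

With stride 2 on both axes, a 640×480 frame contributes at most 320 × 240 = 76,800 points before depth filtering.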

5.2 Voxel Gaussian merge

Fused points are aggregated into voxels of edge length $v$ (2 cm by default, 4 cm in the demo profile). Each occupied voxel becomes one Gaussian. Its position, color, and opacity are the means over the points it contains, and its scale is $\approx v/2$. Incremental mapping merges new keyframe Gaussians via GaussianField.merge() followed by voxel downsampling.
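Below is a minimal sketch of the voxel merge, assuming points, colors, and opacities arrive as NumPy arrays. Points are binned by $\lfloor \mathbf{p} / v \rfloor$, and each occupied voxel is reduced to per-voxel means. The function name voxelize and its return tuple are hypothetical, and v1 may bin or average differently.

```python
import numpy as np


def voxelize(points: np.ndarray, colors: np.ndarray, opacity: np.ndarray,
             voxel: float = 0.02):
    """Collapse points to one Gaussian per occupied voxel: (means, colors, opacity, scales)."""
    keys = np.floor(points / voxel).astype(np.int64)
    # inverse[i] is the voxel index of point i; counts[j] is the number of points in voxel j.
    _, inverse, counts = np.unique(keys, axis=0, return_inverse=True, return_counts=True)
    inverse = inverse.reshape(-1)  # some NumPy 2.x releases return a 2-D inverse with axis=0
    n = counts.shape[0]

    def voxel_mean(values: np.ndarray) -> np.ndarray:
        sums = np.zeros((n,) + values.shape[1:], dtype=np.float64)
        np.add.at(sums, inverse, values)  # unbuffered scatter-add per voxel
        return sums / counts.reshape((n,) + (1,) * (values.ndim - 1))

    means = voxel_mean(points.astype(np.float64)).astype(np.float32)
    mean_colors = np.round(voxel_mean(colors.astype(np.float64))).astype(np.uint8)
    mean_opacity = voxel_mean(opacity.astype(np.float64)).astype(np.float32)
    scales = np.full(n, voxel / 2.0, dtype=np.float32)  # scale ~ v / 2
    return means, mean_colors, mean_opacity, scales
```

Because merging followed by re-voxelization averages everything inside a cell, running it after GaussianField.merge() keeps the map at one Gaussian per voxel regardless of how many keyframes observed that cell.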

5.3 Keyframe selection

A new keyframe is accepted when translation exceeds 8–12 cm or rotation exceeds 8–10° relative to the previous keyframe (configurable in configs/default.yaml and configs/demo.yaml).
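The keyframe rule can be sketched as follows, using the Appendix A defaults (8 cm, 8°). Two aspects are assumptions about how the rule is applied: rotation is measured as the geodesic angle of the relative rotation, and frames are compared greedily against the most recent keyframe. The function names are hypothetical.

```python
import numpy as np


def rotation_angle_deg(R: np.ndarray) -> float:
    """Geodesic angle of a rotation matrix, from trace(R) = 1 + 2 cos(theta)."""
    c = (np.trace(R) - 1.0) / 2.0
    return float(np.degrees(np.arccos(np.clip(c, -1.0, 1.0))))


def is_new_keyframe(T_prev: np.ndarray, T_cur: np.ndarray,
                    trans_thresh: float = 0.08, rot_thresh_deg: float = 8.0) -> bool:
    """Accept when translation OR rotation relative to the last keyframe exceeds its threshold."""
    T_rel = np.linalg.inv(T_prev) @ T_cur
    moved = np.linalg.norm(T_rel[:3, 3]) > trans_thresh
    turned = rotation_angle_deg(T_rel[:3, :3]) > rot_thresh_deg
    return bool(moved or turned)


def select_keyframes(poses, **thresholds) -> list:
    """Greedy pass over 4x4 poses: frame 0 is always a keyframe; returns accepted indices."""
    keys = [0]
    for i in range(1, len(poses)):
        if is_new_keyframe(poses[keys[-1]], poses[i], **thresholds):
            keys.append(i)
    return keys
```

For a camera moving 3 cm per frame along one axis, this rule accepts every third frame, since 9 cm is the first cumulative offset above the 8 cm threshold.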

5.4 Localization

Given an initial pose (the ground-truth pose on TUM, otherwise the previous frame's pose), subsampled depth points are refined with lightweight ICP against the Gaussian center positions:

Max correspondence distance: 5–8 cm
Iterations: 20–30
Translation update: damped step on mean residual

Pose error is reported against TUM ground truth when available.
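The localization step can be sketched as translation-only ICP. This is one reading of the settings above: the rotation block of the initial pose is held fixed, and only the translation receives a damped step along the mean residual of gated correspondences. The real localizer may also update rotation. The function names, the damping value of 0.5, and the brute-force nearest-neighbour search are assumptions; at realistic map sizes a KD-tree would replace the brute-force search. A translation-RMSE helper, assuming the standard root-mean-square definition, is included for comparison against TUM ground truth.

```python
import numpy as np


def icp_translation(points_cam: np.ndarray, centers: np.ndarray, T_init: np.ndarray,
                    max_corr: float = 0.05, iterations: int = 30,
                    damping: float = 0.5) -> np.ndarray:
    """Refine the translation of T_init (world <- camera) against Gaussian centers."""
    T = T_init.astype(np.float64).copy()
    R = T[:3, :3]
    for _ in range(iterations):
        src = points_cam @ R.T + T[:3, 3]                         # current world-frame points
        d2 = ((src[:, None, :] - centers[None, :, :]) ** 2).sum(-1)
        nn = d2.argmin(axis=1)                                    # brute-force nearest center
        keep = d2[np.arange(len(src)), nn] <= max_corr ** 2       # correspondence gate
        if not keep.any():
            break
        residual = (centers[nn[keep]] - src[keep]).mean(axis=0)
        T[:3, 3] += damping * residual                            # damped step on mean residual
    return T


def translation_rmse(est: np.ndarray, gt: np.ndarray) -> float:
    """RMSE of translation error between (N, 3) estimated and ground-truth positions."""
    return float(np.sqrt(np.mean(np.sum((est - gt) ** 2, axis=1))))
```

With a damping factor $\alpha$ and stable correspondences, the offset shrinks by a factor of $(1 - \alpha)$ per iteration, so 30 iterations at $\alpha = 0.5$ reduce a 2 cm error to well below a micrometre. The gate is why the initial pose must already lie within max_corr of the truth.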

5.5 Rendering

A CPU splat renderer projects Gaussians as alpha-blended disks sorted by depth. Two demo modes:

Orbit — 360° tour around the world centroid
Trajectory — replay dataset camera poses against the map
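A painter's-algorithm version of the CPU splat renderer is sketched below under stated assumptions. Each isotropic Gaussian is drawn as a flat disk whose pixel radius is the perspective projection of its scale, with a minimum of 1 px. Disks are drawn far to near, and each is composited with the standard "over" operator using its opacity. The function name and signature are hypothetical, and a pinhole camera with intrinsics K and camera-to-world pose T_wc is assumed.

```python
import numpy as np


def render_splats(pos_w, colors, scales, opacity, K, T_wc, H, W):
    """Render an (H, W, 3) uint8 image by back-to-front alpha blending of disks."""
    T_cw = np.linalg.inv(T_wc)
    p = pos_w @ T_cw[:3, :3].T + T_cw[:3, 3]                  # world -> camera frame
    front = p[:, 2] > 1e-3                                    # drop Gaussians behind the camera
    p, colors, scales, opacity = p[front], colors[front], scales[front], opacity[front]
    u = K[0, 0] * p[:, 0] / p[:, 2] + K[0, 2]
    v = K[1, 1] * p[:, 1] / p[:, 2] + K[1, 2]
    r = np.maximum(1.0, K[0, 0] * scales / p[:, 2])           # projected radius in pixels
    img = np.zeros((H, W, 3), dtype=np.float64)
    ys, xs = np.mgrid[0:H, 0:W]
    for i in np.argsort(-p[:, 2]):                            # farthest first, nearest last
        x0, x1 = int(max(u[i] - r[i], 0)), int(min(u[i] + r[i] + 1, W))
        y0, y1 = int(max(v[i] - r[i], 0)), int(min(v[i] + r[i] + 1, H))
        if x0 >= x1 or y0 >= y1:
            continue                                          # disk falls outside the image
        mask = (xs[y0:y1, x0:x1] - u[i]) ** 2 + (ys[y0:y1, x0:x1] - v[i]) ** 2 <= r[i] ** 2
        a = opacity[i]
        patch = img[y0:y1, x0:x1]                             # view into img
        patch[mask] = (1 - a) * patch[mask] + a * colors[i] / 255.0   # "over" compositing
    return (np.clip(img, 0.0, 1.0) * 255).astype(np.uint8)
```

Sorting once per frame and compositing back to front avoids a depth buffer entirely. The per-Gaussian Python loop is also why such a renderer is functional rather than real-time.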

6. Implementation Phases

Atlas-GS v1 implements all specification phases:

Phase	Goal	Status	Key artifacts
0	Environment & datasets	Done	`pyproject.toml`, `download_datasets.py`, synthetic generator
1	Offline world builder	Done	`gaussian_world/builder.py`, `atlas-gs build`
2	Sensor ingest	Done	`tum_loader.py`, `rgbd_fusion.py`
3	Online mapping	Done	Keyframes + incremental merge
4	Localization	Done	`localizer.py`, `localization.json`
5	Scene memory	Done	PLY + `ATLASGS1` binary + metadata
6	Transitions & demo	Done	`transitions.jsonl`, MP4 videos, `run_demo.py`

6.1 CLI

atlas-gs build      --input <dataset> --output <world_dir>
atlas-gs localize   --world <world_dir> --input <dataset>
atlas-gs log-transitions --world <world_dir> --input <dataset>
atlas-gs demo-video --world <world_dir> --output <mp4> [--mode orbit|trajectory]

6.2 End-to-end orchestrator

scripts/run_demo.py executes Phases 0–6 in sequence for synthetic or tum-fr1-xyz presets.

7. Datasets

Dataset	Preset	Role
TUM RGB-D `fr1_xyz`	`tum-fr1-xyz`	Mapping, localization, transitions
TUM RGB-D `fr1_desk`	`tum-fr1-desk`	Optional second scene
Synthetic room	`synthetic`	CI / offline demo, known GT poses

TUM ingest: $f_x = f_y = 525$, $c_x = 319.5$, $c_y = 239.5$, 640×480; depth scale 5000 (a 16-bit depth value of 5000 corresponds to 1 m); RGB–depth synchronization by nearest timestamp ($\Delta t \leq 20$ ms).
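The two TUM-specific conversions are sketched below: raw depth to meters, and nearest-timestamp RGB–depth pairing with the 20 ms tolerance. The greedy strategy (accept candidate pairs in order of increasing $|\Delta t|$, using each frame at most once) and the function names are assumptions. The dense difference matrix is adequate for TUM-sized sequences.

```python
import numpy as np

TUM_K = np.array([[525.0, 0.0, 319.5],
                  [0.0, 525.0, 239.5],
                  [0.0, 0.0, 1.0]])
DEPTH_SCALE = 5000.0  # raw 16-bit value 5000 corresponds to 1 m


def depth_to_meters(depth_png: np.ndarray) -> np.ndarray:
    """Convert a raw uint16 TUM depth image to meters; 0 (missing) stays 0."""
    return depth_png.astype(np.float32) / DEPTH_SCALE


def associate(rgb_ts, depth_ts, max_dt: float = 0.02):
    """Greedily pair RGB and depth timestamps (seconds) within max_dt; returns sorted index pairs."""
    rgb_ts = np.asarray(rgb_ts, dtype=np.float64)
    depth_ts = np.asarray(depth_ts, dtype=np.float64)
    diff = np.abs(rgb_ts[:, None] - depth_ts[None, :])
    cand = np.argwhere(diff <= max_dt)                        # all pairs within tolerance
    cand = cand[np.argsort(diff[cand[:, 0], cand[:, 1]], kind="stable")]  # best match first
    used_rgb, used_depth, pairs = set(), set(), []
    for i, j in cand:
        i, j = int(i), int(j)
        if i not in used_rgb and j not in used_depth:         # each frame used at most once
            used_rgb.add(i)
            used_depth.add(j)
            pairs.append((i, j))
    return sorted(pairs)
```

RGB frames with no depth frame inside the tolerance are dropped rather than paired with a stale depth image.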

Future: Replica, ScanNet, Habitat-Sim, AI2-THOR—same ingest path with additional loaders.

8. Evaluation and Results

We report results from the v1 demo configuration (configs/demo.yaml: voxel 4 cm, 40 frames).

8.1 TUM RGB-D `fr1_xyz`

Metric	Value
Frames processed	40
Keyframes selected	4
Gaussians in map	4,018
Localization trans. RMSE	0.0102 m
Transitions logged	39
World bundle	`worlds/tum-fr1-xyz/`

8.2 Synthetic room

Metric	Value
Frames processed	40
Keyframes	40 (full orbit)
Gaussians in map	~29,000 (demo config)
Demo videos	orbit + trajectory MP4

8.3 Artifacts

worlds/<name>/
├── metadata.json
├── gaussians.ply
├── gaussians.bin          # ATLASGS1 magic, fast reload
├── localization.json
└── transitions.jsonl

demos/videos/
├── tum-fr1-xyz_orbit.mp4
├── tum-fr1-xyz_trajectory.mp4
├── synthetic_orbit.mp4
└── synthetic_trajectory.mp4

9. Design Decisions

9.1 Simulation-first, hardware-optional

v1 validates the full loop on benchmark and synthetic data. RealSense / ROS 2 ingest is specified but deferred, which lowers the barrier to reproduction.

9.2 CPU proxy before GPU 3DGS

Training full 3DGS requires CUDA and minutes to hours per scene. The proxy field lets the mapping, memory, localization, and logging semantics be validated immediately, while gaussian_world/trainer_gsplat.py preserves the upgrade path.

9.3 Transition logging for v2

JSONL transition logs with pose deltas are intentionally simple. They seed action semantics learning in v2 without committing to a specific policy or manipulation stack.

9.4 Folder hierarchy = conceptual hierarchy

NFM-Worlds/ and NFM-Robotics/Atlas/ exist as scaffolds even where code is not yet extracted—making the research architecture legible in the repository.

10. Limitations

Isotropic Gaussians, no SH. View-dependent effects and anisotropic geometry are approximated.
ICP localization. Not comparable to learned relocalizers or full Gaussian SLAM systems.
Batch keyframe processing. v1 merges keyframes in batch; true streaming SLAM is future work.
Hardcoded TUM intrinsics. Per-device calibration YAML is specified but not yet wired.
No dynamic objects. Static world assumption; dynamic scenes are out of scope.
CPU renderer. Demo videos are functional, not photorealistic.

11. Roadmap

Atlas-GS follows the NFM seven-stage pipeline:

Version	Capability
v1	World representation (this work)
v2	Action semantics from transition log
v3	Causal world models
v4	Mechanism discovery
v5	Hypothesis generation
v6	Active experiment design
v7	Adaptive scientific reasoning

Near-term engineering: gsplat integration, ROS 2 nodes, Replica/ScanNet loaders, learned localization, incremental online mapping.

12. Reproducibility

cd NFM-Robotics/Atlas/Atlas-GS/implementation
python3 -m venv .venv && source .venv/bin/activate
pip install -e ".[dev]"

python scripts/run_demo.py --dataset synthetic --max-frames 40
python scripts/run_demo.py --dataset tum-fr1-xyz --max-frames 40

Source: NFM-Robotics/Atlas/Atlas-GS/implementation/src/atlas_gs/
Documentation: NFM-Robotics/Atlas/Atlas-GS/docs/
Specification: private/atlas-gs-spec/atlas-gs-spec.md

13. Conclusion

Atlas-GS v1 delivers the first runnable Gaussian world-modeling stack in the Nature Foundation Models hierarchy. It transforms RGB-D observations into persistent Gaussian maps, localizes against them, saves and reloads world state, logs transitions for future action learning, and produces demo videos—all in a modular, CPU-accessible package. The implementation is intentionally narrow: it does not yet reason scientifically, but it establishes the world substrate on which NFM’s developmental pipeline depends. By mirroring the NFM hierarchy in repository structure and keeping the State_t + Action_t → State_{t+1} abstraction central, Atlas-GS provides a concrete foundation for v2–v7 without architectural rework.

References

Kerbl, B., et al. (2023). 3D Gaussian Splatting for Real-Time Radiance Field Rendering. SIGGRAPH.
Sturm, J., et al. (2012). A Benchmark for the Evaluation of RGB-D SLAM Systems. IROS (TUM RGB-D dataset).
Du, Y., et al. (2023). Learning Universal Policies via Text-Guided Video Generation. NeurIPS. (World models context.)
Manoharan, I. (2026). Nature Foundation Models: A Hierarchical Framework for Learning Worlds, Embodiment, and Scientific Intelligence. NFM framework paper.
Nature Foundation Models Project. Atlas-GS Build Specification. private/atlas-gs-spec/atlas-gs-spec.md.

Appendix A: Configuration defaults

mapping:
  voxel_size: 0.02
  max_depth: 4.0
  keyframe_translation: 0.08
  keyframe_rotation_deg: 8.0

localization:
  icp_max_correspondence: 0.05
  icp_iterations: 30

Demo profile uses coarser voxels (0.04 m) for faster build and render.
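The demo profile can be resolved with a recursive merge, assuming it is stored as a partial override of configs/default.yaml (it may instead be a full copy). The dictionary below transcribes the defaults above. In practice it would be loaded with yaml.safe_load. deep_merge is a hypothetical helper, not part of atlas_gs.

```python
from copy import deepcopy

# Transcribed from the defaults above; would normally come from yaml.safe_load.
DEFAULT = {
    "mapping": {"voxel_size": 0.02, "max_depth": 4.0,
                "keyframe_translation": 0.08, "keyframe_rotation_deg": 8.0},
    "localization": {"icp_max_correspondence": 0.05, "icp_iterations": 30},
}


def deep_merge(base: dict, override: dict) -> dict:
    """Recursively overlay `override` on `base` without mutating either input."""
    out = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(out.get(key), dict):
            out[key] = deep_merge(out[key], value)   # merge nested sections key by key
        else:
            out[key] = deepcopy(value)               # scalars and lists replace outright
    return out
```

For example, deep_merge(DEFAULT, {"mapping": {"voxel_size": 0.04}}) yields the coarser demo voxels while inheriting every other default.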

Appendix B: World bundle binary format

Magic:     "ATLASGS1" (8 bytes)
Count:     uint64
Positions: count × 3 × float32
Scales:    count × float32
Colors:    count × 3 × uint8
Opacity:   count × float32
IDs:       count × uint64

Interchange PLY is written in parallel for external tools.

Atlas-GS: An End-to-End Implementation of Gaussian World Modeling for Embodied Robotics