SciLayer

Most machine learning memory optimizes compression of observations—autoencoders, predictive coding, latent state descriptions. Scientific reasoning in unknown interactive worlds requires a different substrate: transition logs, exploration graphs, and experiment records optimized for intervention, not reconstruction. This concept paper contrasts compressive observation memory with transition-centric experiment memory, explains why causal structure emerges through interaction (Pearl), situates ASRA relative to Buchanan–Ma representation learning, and shows how ASRA Phase 3 decomposes directed exploration into novelty and usefulness under a step budget—turning episodic transitions into reusable exploration graphs rather than compressed latents.

Abstract

Most machine learning research on memory optimizes compression of observations: autoencoders compress pixels, predictive coding compresses predictions, representation learning compresses state descriptions. Adaptive scientific reasoning in unknown interactive environments requires a different contract. The agent must remember what changed when it acted, which experiments were informative, and which transitions led to progress—not merely reconstruct what was seen.

This paper presents two coupled ideas from the Adaptive Scientific Reasoning Architecture (ASRA). First, transition-centric memory replaces compressive observation memory with experiment memory: transition logs, exploration graphs, hypothesis histories, and causal traces grounded in (state, action, next_state) tuples. Second, directed exploration memory (ASRA Phase 3) decomposes action selection under a step budget into novelty and usefulness, turning episodic experience into reusable exploration graphs rather than latent compressions. We contrast these designs with representation-learning memory theory (Buchanan–Ma), cite Pearl’s intervention requirement for causal structure, and state evaluation implications for scientific agents.

1. Two memory pipelines

1.1 Compressive observation memory

The dominant pipeline in representation learning:

(Observation) → (Compressed observation)

Examples:

Mechanism	What is stored	Typical objective
Autoencoder latent	`compressed(state)`	Reconstruction error
Predictive coding	compressed prediction residual	Next-frame prediction
RAG chunk embedding	compressed text span	Retrieval similarity
Foundation-model hidden state	high-dimensional summary	Language modeling loss

These systems excel at recognition, denoising, and correlation across snapshots. They do not, by themselves, answer: If I intervene with action a in state s, what tends to happen?

Pearl’s intervention calculus makes the limit explicit: causal structure cannot be identified from passive observations alone. A latent that reconstructs pixels preserves appearance, not controllable dynamics.

1.2 Transition-centric experiment memory

ASRA’s alternative pipeline:

(Action) → (State change) → (Observed effect)

Equivalently:

(state, action, next_state, reward/metadata)

This is closer to an RL replay buffer than to an autoencoder—except ASRA treats the buffer as a scientific record, not only a training shard. Memory objects include:

Transition logs — append-only (s, a, s′) with effect signatures
Exploration graphs — visit counts, frontiers, edge statistics
Hypothesis histories — ranked, refuted, and confirmed semantic guesses
Intervention outcomes — perturbation–response pairs (Decision Biology bridge)
Causal traces — which experiments supported which semantic claims

The goal is not to reconstruct what was seen. The goal is to remember what changed, why it changed, and what happened when the system acted.

Memory for science is not compression of observations.
It is compression of experiments.

Transition-centric memory still compresses— but it compresses experiments and transitions, discarding redundant revisits, merging duplicate trials, and promoting only episodes that updated the agent’s model of mechanism (taboo lists, dead-end clusters, Phase 7 failure analyzers are curation, not bigger buffers).

2. Relation to Buchanan–Ma and Pearl

Buchanan, Pai, Wang, and Ma (2026) ask how machines should learn memory as compact structure in high-dimensional data—PCA through transformers and diffusion—optimizing rate–distortion on observed samples. ASRA asks how machines should reason scientifically when action semantics, goals, and mechanisms are unknown.

The programs are complementary:

Dimension	Buchanan–Ma (compressive)	ASRA (transition-centric)
Primary input	Passive / weakly labeled corpora	Action-conditioned transitions
Core object	Low-dimensional representation of data	Exploration graph, semantics, plans
Learning mode	Compression, denoising, priors	Exploration, intervention, hypothesis rank/refute
Memory artifact	Latent code, generative prior	Visitation keys, edge stats, transition JSONL
Intelligence target	Distributional memory (Levels 1–2)	Semantics, goals, experiments (Levels 3–4)

Buchanan–Ma supplies representational substrate ASRA largely defers (how to compress grids or cell profiles). ASRA supplies the interactive loop their Chapter 9 flags as open: close perception–action, test hypotheses, discriminate goals.

Nature Foundation Models extends the same primitive: State_t + Action_t → State_{t+1}—Atlas-GS logs world bundles and transition logs; Decision Biology treats perturbation outcomes as pathway memory. Cells do not “remember” microscopy images; they remember intervention outcomes.

3. Phase 1 prerequisite: you cannot navigate without logging

Transition-centric memory is not an optional upgrade. ASRA Phase 1 (Experience Engine) establishes hash-stable state IDs, effect logging, and coarse exploration under cell-diff semantics. Without trustworthy transition logs, Phase 3 exploration graphs inherit garbage structure.

Phase 2 adds object-centric observation—connected components, transform events, rule candidates—so that “change” can be described structurally, not only as pixel counts. Phase 3 consumes both streams.

Phase 1   Experience     — transitions, hashes, effect logging
Phase 2   Observation    — objects, scenes, transform events
Phase 3   Navigation     — exploration graph, novelty, usefulness, subgoals

4. Directed exploration memory: novelty ≠ usefulness

Under a fixed step budget, the objective is not maximum coverage or uniform random wandering. It is discovering which transitions are worth revisiting and which frontiers actually lead somewhere.

4.1 The mistake most teams make

Many agent systems optimize coverage or hash novelty alone:

Visit every unseen cell
Maximize entropy of state visitation
Treat “unvisited” as synonymous with “valuable”

That fails in interactive scientific settings because:

Case	Problem
Novel but useless	Dead-end corridor, zero reward, zero structural progress
Useful but familiar	Repeated step in a successful DoorKey or BabyAI chain
Cosmetic novelty	Hash changes while object multiset is unchanged (Phase 2 fingerprints help)

ASRA Phase 3 decomposes exploration into two axes:

Novelty(s)     — expected information gain from visiting s
Usefulness(a|s) — expected progress toward reward, frontiers, subgoals

Baseline state novelty (Phase 3 v1):

novelty(s) = 1 / sqrt(1 + visit_count(s))
           + α · 1[object_fingerprint unseen]
           + β · frontier_bonus(s)

Usefulness aggregates progress signals:

usefulness(a|s) = w_r·Δreward + w_f·frontier_gain + w_g·subgoal_progress
                + w_o·object_delta − w_d·dead_end

ExplorationPolicyV2 ranks actions by blending edge statistics (average novelty + usefulness − repeat penalty) with priors on unexplored edges and optional strategy bias—not by unseen-cell count alone.

4.2 State memory vs exploration memory

Phase 3 separates visitation keys from structural exploration records:

State memory (revisit detection)

Exact hash keys
Object-scene fingerprints (soft revisit when grids differ cosmetically)
Recent episodic window for loop penalty

Exploration memory (directed search)

Exploration graph with frontier scores and dead-end flags
Transition histories with subgoal metadata
Strategy library extracted from progress-making episodes
Replay buffer of high-value transitions for analysis

Memory here is episodic and structural, not a compressive autoencoder latent. The agent builds a graph:

(state) → action → next_state → outcome

Subgoals emerge from frequently reused progress-making transitions (BabyAI mission parsing, DoorKey milestone chains)—similar in spirit to compositional task decomposition, but extracted from logs, not hand-authored for every environment.

4.3 Cross-episode reuse

ExplorationSessionState persists graph, visitation memory, strategy library, and replay across batch episodes. A successful sequence in episode n biases episode n+1 when preconditions match—without oracle maps. This is strategy reuse, not memorization of pixels.

5. Evaluation: measure memory under action

Benchmarks that score reconstruction—pixel MSE, latent similarity, token perplexity—reward the wrong architecture for scientific agents.

Better tests ask:

After N exploratory steps, can the agent predict the effect of an untried intervention?
Can it cite which past experiments support that prediction?
Under a fixed budget, does directed exploration outperform coverage-only baselines on frontier discovery and subgoal completion?
Does memory curate failures (dead ends, wrong goals) rather than grow unbounded?

Memory quality is measured by generalization under action, not fidelity of observation compression.

6. Synthesis

Question	Compressive observation memory	Transition-centric + exploration memory
What is remembered?	What was present	What changed when we acted
Primary operation	Encode / decode	Log / rank / replay transitions
Causal content	Correlational	Intervention-conditional
Phase 3 mistake	Maximize unseen states	Balance novelty and usefulness
Good for	Recognition, generation	Discovery, navigation, scientific reasoning

ASRA’s memory stack is deliberately inspectable: sparse graphs, explicit visit counts, hypothesis counters—compatible with competition latency budgets and human audit. It trades the elegance of a single latent space for experiment records that survive intervention.

The right design question is not how much can we remember? but which experiments are worth remembering—and which should be forgotten so the next one teaches something new?

References

Ilakkuvaselvi Manoharan. Transition-Centric Adaptive Reasoning: ASRA Phase 1. https://sci-layer.vercel.app/articles/transition-centric-adaptive-reasoning-asra-phase-1
Ilakkuvaselvi Manoharan. Directed Exploration and Episodic Memory: ASRA Phase 3. https://sci-layer.vercel.app/articles/directed-exploration-episodic-memory-asra-phase-3
Ilakkuvaselvi Manoharan. ASRA vs Buchanan–Ma: A Mathematical Theory of Memory. https://sci-layer.vercel.app/articles/asra-vs-buchanan-mathematical-theory-of-memory
Ilakkuvaselvi Manoharan. Architectures for Adaptive Scientific Reasoning Under Uncertainty. https://sci-layer.vercel.app/articles/architectures-adaptive-scientific-reasoning-under-uncertainty
Judea Pearl. Causality — intervention vs observation (cited in Buchanan–Ma Ch. 9 and ASRA Phase 4+).
Buchanan, Pai, Wang, Ma (2026). Principles and Practice of Deep Representation Learning: or A Mathematical Theory of Memory. arXiv:2606.06624.

Transition-Centric Memory and Directed Exploration: Beyond Compressive Observation Memory in ASRA