SciLayer

Phase 4 adds the Semantics and Causal Inference Engine: semantic signatures, causal transition models, hypothesis confirm/refute, counterfactual queries, and epistemic uncertainty over (state, action) pairs—building on Phases 1–3. The article presents theory and architecture; a companion Kaggle notebook deploys CausalSemanticsEngine hints for ARC Prize 2026.

Abstract

Phase 1 of the Adaptive State–Reasoning Agent (ASRA) established transition-centric experience: log every (state, action, next_state, reward) tuple and infer coarse action semantics from cell-level diffs. Phase 2 added object-centric observation—transform events and compact scenes. Phase 3 added directed exploration and episodic memory—visit counts, novelty, usefulness, subgoals, and strategy reuse.

None of these layers yet answers the questions that define scientific and strategic reasoning over interventions: If I take ACTION3 here, what will happen? How confident am I? What would have happened if I had taken ACTION1 instead?

We describe ASRA Phase 4 as the Semantics & Causal Inference Engine: a stack that aggregates action–effect observations into semantic signatures, maintains a causal transition model for predicting next-state features, tracks hypotheses about action meaning with confirm/refute updates, supports counterfactual queries over alternate actions, and scores epistemic uncertainty per (state, action) pair. The full engine lives in asra-arc/src/asra/causality/; the Kaggle competition agent embeds a compact CausalSemanticsEngine that preserves Phase 2 object-scene and Phase 3 exploration hints while adding semantic labels, confidence, uncertainty, and transition prediction.

This article presents the theory, architectural decomposition, and design principles. It does not prescribe deployment mechanics; it specifies what Phase 4 adds and why it is the first explicit step toward Phase 5 goal inference and the long-range Decision Biology bridge.

1. The architectural gap Phase 4 closes

ASRA’s roadmap treats intelligence as a cumulative stack:

Phase 1   Experience Engine           — transitions, hashes, cell diffs
Phase 2   Observation Engine          — objects, transforms, rule hypotheses
Phase 3   Navigation & Memory         — exploration graph, visitation, subgoals
Phase 4   Semantics & Causal Inference — action meaning, prediction, counterfactuals
Phase 5   Goal inference              — win-condition hypotheses, experiment design
Phase 6+  Planning, robustness

Phase 1 answers: “What happened when we acted?”
Phase 2 answers: “What structural entities changed?”
Phase 3 answers: “Where should we go next, given what we already know?”
Phase 4 answers: “What does this action do, and how sure are we?”

Without Phase 4, an agent that explores efficiently still acts blind to semantics: ACTION3 and ACTION1 may produce identical cell-diff statistics in one context and divergent transform families in another. Phase 3 novelty rewards untested edges; Phase 4 tells the agent what kind of edge it is testing and whether repeated observations confirm or refute a causal hypothesis.

flowchart LR
  subgraph P1["Phase 1 — Experience"]
    T[Transition τ]
    A[Action reports]
  end
  subgraph P2["Phase 2 — Observation"]
    S[Compact scene Σ]
    X[Transform events]
  end
  subgraph P3["Phase 3 — Memory"]
    G[Exploration graph]
    M[Visitation memory]
  end
  subgraph P4["Phase 4 — Causality"]
    E[Effect signatures]
    P["P(s′|s,a)"]
    H[Hypotheses]
    CF[Counterfactuals]
    U[Uncertainty]
  end
  subgraph Future["Phase 5+"]
    GH[Goal hypotheses]
    PL[Planning]
  end
  T --> E
  S --> E
  X --> E
  G --> P
  E --> P
  P --> H
  H --> CF
  E --> U
  U --> GH
  CF --> PL

2. Theoretical stance: intervention–response without oracle labels

ASRA Phase 4 does not assume the environment publishes action names or manuals. Semantics are induced from observed effects, just as Phase 1’s classify_effect() induced coarse buckets from cell counts, but Phase 4 subsumes and extends that taxonomy using Phase 2 transform histograms.

The epistemic object is an intervention–response tuple:

(s, a) → Δ_cells, Δ_obj, transform_histogram, s′, r

From repeated tuples with the same (game_id, state_hash, action) key, Phase 4 builds an ActionEffectSignature: distributional stats over diffs, aggregated transform classes, terminal and dead-end rates, a semantic label, and a confidence score.

This is already the form of perturbation–response reasoning used in biological settings—cell state, perturbation, next cell state—without yet switching domains. Phase 4 is the conceptual bridge where ASRA begins to resemble Decision Biology in structure, not yet in dataset.

Paradigm	Phase 4 stance
Hand-coded action meanings per game	Rejected — semantics are empirical
Cell-diff-only semantics (Phase 1 stub)	Subsumed — extended with object/transform features
Neural world models	Deferred — v1 uses lookup + smoothing
Full counterfactual imagination	v1 uses observed alternates + model fallback
Goal / win-condition inference	Deferred to Phase 5

3. Action-effect summarization

3.1 From coarse buckets to semantic labels

Phase 1’s action tester emits coarse buckets: no_change, small_change, large_change, dead_end, terminal_transition, repeated_state. The Kaggle Phase 3 stub (ActionSemanticsInferencer) refined these slightly into no-op / blocked, localized cell update, and multi-cell transform, based on the variance of changed-cell counts.

Phase 4’s ActionEffectSummarizer adds:

Feature	Source
`cell_change_mean`, `cell_change_std`	Phase 1 diffs
`object_delta_mean`	Phase 2 `delta_num_objects`
`transform_histogram`	Phase 2 transform classes or embedded scene deltas
`terminal_rate`, `dead_end_rate`	Episode outcomes

Semantic labels (v1) include: no_op, localized_transform, translate, recolor, create_object, delete_object, object_count_change, multi_cell_transform, terminal_transition, dead_end.

Label assignment combines distributional cell stats with the dominant transform class in the histogram—so two actions with similar cell counts but different transform profiles receive different semantics.
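The label assignment described above can be sketched as follows. This is a minimal illustration, not the asra-arc ActionEffectSummarizer: the EffectStats container, the assign_label helper, the rule ordering, and both thresholds are assumptions chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class EffectStats:
    """Aggregated observations for one (game_id, state_hash, action) key (illustrative)."""
    cell_change_mean: float = 0.0
    object_delta_mean: float = 0.0
    terminal_rate: float = 0.0
    dead_end_rate: float = 0.0
    transform_histogram: Counter = field(default_factory=Counter)


# Hypothetical thresholds; the real summarizer may use different values.
LOCAL_CELL_LIMIT = 4.0
RATE_THRESHOLD = 0.5

# Transform classes that map directly onto a v1 semantic label.
TRANSFORM_LABELS = {"translate", "recolor", "create_object", "delete_object"}


def assign_label(stats: EffectStats) -> str:
    """Combine outcome rates, the dominant transform class, and cell statistics."""
    if stats.terminal_rate >= RATE_THRESHOLD:
        return "terminal_transition"
    if stats.dead_end_rate >= RATE_THRESHOLD:
        return "dead_end"
    if stats.cell_change_mean == 0 and not stats.transform_histogram:
        return "no_op"
    if stats.transform_histogram:
        dominant, _ = stats.transform_histogram.most_common(1)[0]
        if dominant in TRANSFORM_LABELS:
            return dominant
    if stats.object_delta_mean != 0:
        return "object_count_change"
    if stats.cell_change_mean <= LOCAL_CELL_LIMIT:
        return "localized_transform"
    return "multi_cell_transform"


# Two actions with the same mean cell change but different transform profiles.
a = EffectStats(cell_change_mean=6.0, transform_histogram=Counter({"translate": 3, "recolor": 1}))
b = EffectStats(cell_change_mean=6.0, transform_histogram=Counter({"recolor": 4}))
print(assign_label(a), assign_label(b))
```

Under these assumed rules the two actions receive different labels even though their cell statistics are identical, which is the behavior the paragraph above describes.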

3.2 Confidence and consistency

Confidence grows with observation count and consistency (low variance in changed-cell counts):

confidence(s,a) = min(1, (n/5)·0.6 + (1/(1+σ_cells))·0.4)

where n is the number of observed (s,a) transitions and σ_cells is the standard deviation of changed-cell counts. This replaces the Phase 3 stub’s consistency_score with an explicit confidence usable in policy weighting and metadata export.
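As a worked illustration, the formula can be transcribed directly into Python. The function name, the use of the population standard deviation, and the choice to return 0 when nothing has been observed are assumptions of this sketch; the formula itself is the one given above.

```python
import statistics


def confidence(changed_cell_counts: list[int]) -> float:
    """confidence(s,a) = min(1, (n/5)*0.6 + (1/(1+sigma_cells))*0.4)."""
    n = len(changed_cell_counts)
    if n == 0:
        # Assumption: no observations means no confidence.
        return 0.0
    # Assumption: population standard deviation of changed-cell counts.
    sigma_cells = statistics.pstdev(changed_cell_counts)
    return min(1.0, (n / 5) * 0.6 + (1.0 / (1.0 + sigma_cells)) * 0.4)


print(round(confidence([4, 4, 4]), 4))   # three consistent observations
print(round(confidence([2, 6]), 4))      # two inconsistent observations
print(round(confidence([4] * 5), 4))     # five consistent observations
```

Three identical observations give 0.36 + 0.40 = 0.76, while two observations with σ_cells = 2 give 0.24 + 0.133 ≈ 0.373. Because the count term is not capped on its own, the expression reaches 1 for any n ≥ 9 regardless of variance.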

4. Causal transition model

Phase 4’s CausalTransitionModel (v1) is deliberately non-neural: a lookup table over (game_id, state_hash, action) → successor hash distribution plus averaged feature vectors (changed cells, object delta, transform list).

Given sufficient coverage from Phase 3 exploration, the model answers:

P(s′ | s, a)  ≈  count(s,a,s′) / Σ_s″ count(s,a,s″)
predicted_Δ_cells(s,a)  ≈  mean observed changed cells for top successor

When no observations exist, prediction returns zero support and the policy falls back to Phase 3 exploration terms—Phase 4 augments rather than replaces directed curiosity.
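A count-based lookup of this kind might look like the sketch below. The class name TabularTransitionModel, its methods, and the shape of the returned dictionary are hypothetical; only the (game_id, state_hash, action) key, the count-ratio estimate, and the zero-support fallback come from the text.

```python
from collections import Counter, defaultdict
from statistics import mean

# (game_id, state_hash, action): the lookup key named in the text.
Key = tuple[str, str, str]


class TabularTransitionModel:
    """Illustrative count-based model; not the asra-arc CausalTransitionModel."""

    def __init__(self) -> None:
        self._successors: dict[Key, Counter] = defaultdict(Counter)
        self._changed: dict[tuple[Key, str], list[int]] = defaultdict(list)

    def observe(self, sa: Key, next_hash: str, changed_cells: int) -> None:
        """Record one observed transition and its changed-cell count."""
        self._successors[sa][next_hash] += 1
        self._changed[(sa, next_hash)].append(changed_cells)

    def successor_distribution(self, sa: Key) -> dict[str, float]:
        """P(s' | s, a) as count(s,a,s') divided by the total count for (s,a)."""
        counts = self._successors.get(sa)
        if not counts:
            return {}
        total = sum(counts.values())
        return {s: c / total for s, c in counts.items()}

    def predict(self, sa: Key) -> dict:
        """Most likely successor plus mean changed cells for that successor."""
        dist = self.successor_distribution(sa)
        if not dist:
            # Zero support: the caller falls back to exploration terms.
            return {"support": 0, "next_hash": None, "prob": 0.0, "changed_cells": None}
        top = max(dist, key=dist.get)
        return {
            "support": sum(self._successors[sa].values()),
            "next_hash": top,
            "prob": dist[top],
            "changed_cells": float(mean(self._changed[(sa, top)])),
        }


model = TabularTransitionModel()
sa = ("game", "h0", "ACTION3")
model.observe(sa, "h1", 4)
model.observe(sa, "h1", 6)
model.observe(sa, "h2", 1)
print(model.predict(sa))
print(model.predict(("game", "h9", "ACTION1")))
```

Reads use dict.get rather than indexing the defaultdict, so querying an unseen pair does not create an empty entry that could later be mistaken for an observation.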

Evaluation: eval_prediction_mae compares predicted and actual changed-cell counts in log order, predicting each row before observing it, so every prediction is made for a transition the model has not yet ingested. The initial success criterion on ARC JSONL logs is a lower MAE than a naive baseline that always predicts the global mean.
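The predict-before-observe protocol can be sketched in a few lines. The function prequential_mae is a hypothetical stand-in, not the eval_prediction_mae implementation, and it assumes that both the per-key model and the global-mean baseline are running estimates built only from earlier rows.

```python
from collections import defaultdict

Key = tuple[str, str, str]  # (game_id, state_hash, action)


def prequential_mae(rows: list[tuple[Key, int]]) -> tuple[float, float]:
    """Return (model_mae, naive_mae) over rows given in log order."""
    key_sum: dict[Key, float] = defaultdict(float)
    key_n: dict[Key, int] = defaultdict(int)
    total, seen = 0.0, 0
    model_err, naive_err = 0.0, 0.0

    for key, actual in rows:
        # Predict first, using only rows that came earlier in the log.
        global_mean = total / seen if seen else 0.0
        model_pred = key_sum[key] / key_n[key] if key_n[key] else global_mean
        model_err += abs(model_pred - actual)
        naive_err += abs(global_mean - actual)

        # Then observe the row and update both estimators.
        key_sum[key] += actual
        key_n[key] += 1
        total += actual
        seen += 1

    if seen == 0:
        return 0.0, 0.0
    return model_err / seen, naive_err / seen


move = ("game", "h0", "ACTION1")
blocked = ("game", "h0", "ACTION2")
log = [(move, 4), (blocked, 0), (move, 4), (blocked, 0), (move, 4), (blocked, 0)]
model_mae, naive_mae = prequential_mae(log)
print(round(model_mae, 3), round(naive_mae, 3))
```

On this toy log the per-key model only errs on the first sighting of each pair, so its MAE falls below the global-mean baseline, which is the success criterion stated above.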

5. Hypothesis testing and counterfactuals

5.1 CausalHypothesis records

The HypothesisTester maintains explicit records:

hypothesis = (game_id, action, predicted_effect, support, refute, status)
status ∈ {active, weak, confirmed, refuted}

New effect signatures upsert hypotheses. Subsequent transitions confirm when observed semantics match prediction within tolerance, or refute when cell-diff divergence exceeds a threshold. Weak hypotheses (support < 3) contribute extra uncertainty in the UncertaintyScorer.

This is lightweight scientific method over transitions: propose an effect class from data, test on new evidence, update status—without a separate symbolic logic engine.
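A minimal sketch of the confirm/refute update follows. The record fields mirror the tuple above and the weak cutoff (support < 3) comes from the text; the update function, the CONFIRM_SUPPORT and REFUTE_LIMIT values, and the simplification of "match within tolerance" to label equality are assumptions.

```python
from dataclasses import dataclass

WEAK_SUPPORT = 3      # from the text: support < 3 counts as weak
CONFIRM_SUPPORT = 5   # hypothetical confirmation threshold
REFUTE_LIMIT = 3      # hypothetical refutation threshold


@dataclass
class CausalHypothesis:
    game_id: str
    action: str
    predicted_effect: str
    support: int = 0
    refute: int = 0
    status: str = "weak"


def update(h: CausalHypothesis, observed_effect: str) -> CausalHypothesis:
    """Count the new evidence, then recompute the status."""
    # Simplification: a match is exact label equality, not a tolerance check.
    if observed_effect == h.predicted_effect:
        h.support += 1
    else:
        h.refute += 1

    if h.refute >= REFUTE_LIMIT and h.refute > h.support:
        h.status = "refuted"
    elif h.support < WEAK_SUPPORT:
        h.status = "weak"
    elif h.support >= CONFIRM_SUPPORT and h.refute == 0:
        h.status = "confirmed"
    else:
        h.status = "active"
    return h


h = CausalHypothesis("game", "ACTION3", "translate")
history = [update(h, "translate").status for _ in range(5)]
print(history)
```

Five matching observations move the record from weak through active to confirmed; three mismatches with no support would mark it refuted instead.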

5.2 Counterfactual simulator

The CounterfactualSimulator answers: “What if action a′ instead of a from state s?”

v1 mechanism:

Look up observed (s, a′) transitions in the transition model.
If unseen, return a low-confidence empty prediction.
Return predicted changed cells, object delta, transform list, and a source flag (observed vs model vs none).

Full imagined grid states are out of scope for v1; counterfactuals operate on effect features, aligning with CLEVRER-style multiple-choice reasoning in later eval tracks without requiring video models in Phase 4.
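The v1 mechanism can be illustrated with a small lookup over effect features. Everything named here is hypothetical: the EffectPrediction container, the record layout, the confidence values, and in particular the "model" fallback, which this sketch assumes to be an average over the same action observed from other states.

```python
from dataclasses import dataclass, field
from statistics import mean

# One observed effect: (changed_cells, object_delta, transform classes).
Record = tuple[int, int, list[str]]


@dataclass
class EffectPrediction:
    changed_cells: float = 0.0
    object_delta: float = 0.0
    transforms: list[str] = field(default_factory=list)
    confidence: float = 0.0
    source: str = "none"   # "observed", "model", or "none"


def _summarize(records: list[Record], confidence: float, source: str) -> EffectPrediction:
    return EffectPrediction(
        changed_cells=float(mean(r[0] for r in records)),
        object_delta=float(mean(r[1] for r in records)),
        transforms=sorted({t for _, _, ts in records for t in ts}),
        confidence=confidence,
        source=source,
    )


def counterfactual(
    observed: dict[tuple[str, str], list[Record]], state_hash: str, alt_action: str
) -> EffectPrediction:
    """Answer: what if alt_action had been taken from state_hash?"""
    exact = observed.get((state_hash, alt_action))
    if exact:
        # Hypothetical confidence: grows with the number of observations.
        return _summarize(exact, min(1.0, len(exact) / 5), "observed")
    # Assumed fallback: pool the same action across other states.
    pooled = [r for (_, a), recs in observed.items() if a == alt_action for r in recs]
    if pooled:
        return _summarize(pooled, 0.2, "model")
    return EffectPrediction()  # unseen: low-confidence empty prediction


observed = {
    ("h0", "ACTION1"): [(4, 0, ["translate"]), (6, 0, ["translate"])],
    ("h1", "ACTION3"): [(1, 1, ["create_object"])],
}
for action in ("ACTION1", "ACTION3", "ACTION7"):
    print(action, counterfactual(observed, "h0", action))
```

The result stays at the level of effect features, with no imagined grid, which matches the v1 scope stated above.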

6. Uncertainty and the unified change analyzer

6.1 Epistemic uncertainty

UncertaintyScorer assigns per-action uncertainty:

uncertainty(s,a) = 1 / sqrt(1 + n_obs)
                 + w_h · 1[hypothesis weak]
                 + w_v · variance_penalty(effect_signature)

High uncertainty aligns with Phase 3 novelty: actions worth probing because their effects are not yet stable. Low uncertainty plus high predicted progress aligns with exploitation—reuse actions whose semantics are confirmed.

Phase 4 therefore unifies two motives for exploration: "explore because unseen" (Phase 3) and "explore because semantically unstable" (Phase 4).
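To make the three terms concrete, here is a small sketch of the uncertainty score. The weights w_h and w_v and the form of variance_penalty are not given in the text, so the values and the squashing function below are assumptions.

```python
import math

W_HYPOTHESIS = 0.2   # hypothetical value for w_h
W_VARIANCE = 0.1     # hypothetical value for w_v


def variance_penalty(cell_change_std: float) -> float:
    """Assumed penalty: squash the changed-cell std into [0, 1)."""
    return cell_change_std / (1.0 + cell_change_std)


def uncertainty(n_obs: int, hypothesis_weak: bool, cell_change_std: float) -> float:
    """1/sqrt(1+n_obs) + w_h * 1[hypothesis weak] + w_v * variance_penalty."""
    return (
        1.0 / math.sqrt(1 + n_obs)
        + W_HYPOTHESIS * float(hypothesis_weak)
        + W_VARIANCE * variance_penalty(cell_change_std)
    )


print(uncertainty(0, True, 0.0))     # never tried, weak hypothesis
print(uncertainty(3, False, 0.0))    # a few consistent observations
print(uncertainty(24, False, 0.0))   # well-observed, stable action
```

The count term alone falls from 1.0 at n_obs = 0 to 0.5 at n_obs = 3 and 0.2 at n_obs = 24, so an action keeps a high score only while its hypothesis is weak or its effects vary.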

6.2 ChangeReport

The ChangeAnalyzer merges Phase 1 cell diffs with Phase 2 TransformationDetector output into a single ChangeReport: changed cells, object scene deltas, transform histogram, graph-edge-created flag, level-changed flag, and a human-readable summary.

This is the attach point for transition metadata and batch semantics mining—one diff object consumed by summarizer, model, and hypothesis tester.
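A single diff object of this kind could be assembled as sketched below. The field names follow the list above; the build_report helper, its arguments, and the summary wording are hypothetical, and the cell diff assumes both grids have the same shape.

```python
from collections import Counter
from dataclasses import dataclass

Grid = list[list[int]]


@dataclass
class ChangeReport:
    changed_cells: int
    object_delta: int
    transform_histogram: Counter
    graph_edge_created: bool
    level_changed: bool
    summary: str


def count_changed_cells(before: Grid, after: Grid) -> int:
    """Count cells whose value differs; assumes both grids share one shape."""
    return sum(
        b != a
        for row_b, row_a in zip(before, after)
        for b, a in zip(row_b, row_a)
    )


def build_report(
    before: Grid,
    after: Grid,
    n_objects_before: int,
    n_objects_after: int,
    transforms: list[str],
    graph_edge_created: bool = False,
    level_changed: bool = False,
) -> ChangeReport:
    """Merge a cell diff with object-level transform events into one record."""
    changed = count_changed_cells(before, after)
    delta = n_objects_after - n_objects_before
    histogram = Counter(transforms)
    dominant = histogram.most_common(1)[0][0] if histogram else "none"
    summary = f"{changed} cells changed, object delta {delta:+d}, dominant transform {dominant}"
    return ChangeReport(changed, delta, histogram, graph_edge_created, level_changed, summary)


report = build_report(
    before=[[0, 0], [1, 0]],
    after=[[0, 0], [0, 1]],
    n_objects_before=1,
    n_objects_after=1,
    transforms=["translate"],
    graph_edge_created=True,
)
print(report.summary)
```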

7. System architecture (library view)

Phase 4 in asra-arc decomposes as:

effect_summarizer.py   →  ActionEffectSummarizer, semantic labels
transition_model.py    →  CausalTransitionModel, TransitionPrediction
hypothesis_tester.py   →  CausalHypothesis confirm/refute
counterfactual.py      →  CounterfactualSimulator
uncertainty.py         →  UncertaintyScorer
change_analyzer.py     →  ChangeReport (cell + object + transform)
semantics_store.py     →  online ingest, persistent per-game JSON
arc_semantics.py       →  batch mine JSONL, eval_prediction_mae
policy_v3.py           →  CausalExplorationPolicyV3 (extends Phase 3 v2)
schemas.py             →  dataclass contracts

SemanticsStore orchestrates online updates: ingest transition → update summarizer and model → upsert hypothesis → attach metadata.causality block.

flowchart TB
  subgraph Inputs
    JSONL[ARC / MiniGrid JSONL]
    LIVE[ARC-AGI-3 runner]
  end
  subgraph P4core["Phase 4 core"]
    CA[ChangeAnalyzer]
    ES[EffectSummarizer]
    TM[TransitionModel]
    HT[HypothesisTester]
    US[UncertaintyScorer]
    CF[CounterfactualSimulator]
    SS[SemanticsStore]
  end
  subgraph P3["Phase 3 (retained)"]
    POL2[ExplorationPolicyV2]
  end
  JSONL --> SS
  LIVE --> SS
  SS --> CA
  CA --> ES
  ES --> TM
  ES --> HT
  TM --> CF
  ES --> US
  POL2 --> POL3[CausalExplorationPolicyV3]
  US --> POL3
  TM --> POL3

Dataset tracks (roadmap):

Dataset	Phase 4 role
ARC-AGI-3 transition logs	Primary — ACTION1–ACTION7 semantics per game
PHYRE	Physical causal reasoning, experiment efficiency (Milestone 4C, pending)
CLEVRER	Counterfactual QA on annotations (optional v1)

8. Closing the loop with Phases 1–3

Phase 4 extends prior layers; it does not replace them.

Layer	Phase 4 consumption
Phase 1 transitions	Canonical τ; `metadata.causality` enrichment
Phase 1 action reports	Subsumed into effect signatures
Phase 2 compact scenes	Object delta + transform histogram inputs
Phase 2 transform events	Dominant class for semantic labeling
Phase 3 exploration graph	Edge observation counts weight model confidence
Phase 3 novelty / usefulness	Retained; uncertainty and prediction add terms
Phase 3 strategy reuse	Unchanged — semantics bias sits alongside

Kaggle competition agent (asra-v0.6-phase4): embeds Phase 2 compact_scene(), Phase 3 CompactExplorationHints, and Phase 4 CausalSemanticsEngine in a single ASRAExplorer.choose_action():

score(action) = Phase1_terms
              + OBJECT_HINT_WEIGHT · object_bonus
              + EXPLORATION_HINT_WEIGHT · exploration_score
              + SEMANTICS_HINT_WEIGHT · confidence
              + PREDICTION_HINT_WEIGHT · predicted_progress
              + UNCERTAINTY_HINT_WEIGHT · uncertainty

Reasoning strings cite semantic label, confidence, and uncertainty:

ASRA Phase4: ACTION3 | objects=5 | visits=2 | sem=translate conf=0.81 u=0.12
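The additive score and the reasoning string can be sketched together. The weight names come from the formula above, but their numeric values, the ActionHints container, and the helper functions are assumptions; this is not the ASRAExplorer.choose_action() source.

```python
from dataclasses import dataclass

# Hypothetical weight values; the names are the ones used in the formula.
OBJECT_HINT_WEIGHT = 0.5
EXPLORATION_HINT_WEIGHT = 1.0
SEMANTICS_HINT_WEIGHT = 0.3
PREDICTION_HINT_WEIGHT = 0.5
UNCERTAINTY_HINT_WEIGHT = 0.2


@dataclass
class ActionHints:
    phase1_terms: float
    object_bonus: float
    exploration_score: float
    confidence: float
    predicted_progress: float
    uncertainty: float
    label: str
    objects: int
    visits: int


def score(h: ActionHints) -> float:
    """Weighted sum of Phase 1 terms and the Phase 2 to 4 hints."""
    return (
        h.phase1_terms
        + OBJECT_HINT_WEIGHT * h.object_bonus
        + EXPLORATION_HINT_WEIGHT * h.exploration_score
        + SEMANTICS_HINT_WEIGHT * h.confidence
        + PREDICTION_HINT_WEIGHT * h.predicted_progress
        + UNCERTAINTY_HINT_WEIGHT * h.uncertainty
    )


def reasoning(action: str, h: ActionHints) -> str:
    """Format the reasoning string in the layout shown above."""
    return (
        f"ASRA Phase4: {action} | objects={h.objects} | visits={h.visits} "
        f"| sem={h.label} conf={h.confidence:.2f} u={h.uncertainty:.2f}"
    )


candidates = {
    "ACTION1": ActionHints(1.0, 0.0, 0.1, 0.30, 0.0, 0.58, "no_op", 5, 6),
    "ACTION3": ActionHints(1.0, 0.5, 0.4, 0.81, 0.0, 0.12, "translate", 5, 2),
}
best = max(candidates, key=lambda a: score(candidates[a]))
print(reasoning(best, candidates[best]))
```

Note that uncertainty enters with a positive weight, so under this scoring an unstable action is rewarded rather than penalized, consistent with treating uncertainty as a reason to probe.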

The notebook (asra-phase-4-arc-prize-2026.ipynb) writes my_agent.py and validates with --self-test (perception, exploration, and causality smoke tests without ARC runtime); Kaggle scoring re-runs the agent in an isolated venv.

9. Empirical landscape

Phase 4 metrics differ from Phase 2 rule coverage and Phase 3 coverage percentages. They measure semantic consistency and effect prediction quality.

9.1 ARC-AGI-3 transition logs

Metric	Intent
Semantics consistency	Same `(s,a)` → stable label across replay
Effect prediction MAE	|predicted − actual| changed cells vs naive global mean
Hypothesis confirm rate	Confirmed / active hypotheses over episodes
Mean confidence / uncertainty	Aggregate signature quality

CLI: python -m asra build-action-semantics, python -m asra eval-phase4-arc.

9.2 PHYRE and CLEVRER (roadmap)

PHYRE targets success prediction and probe efficiency under physical causality. CLEVRER targets counterfactual question accuracy on processed annotations. Both are secondary to ARC log mining in v1; the PHYRE adapter remains pending as Milestone 4C.

9.3 What Phase 4 metrics are not

Original ARC 800-task rule coverage (Phase 2)
MiniGrid coverage % (Phase 3)
Competition win rate or Milestone #2 claims (Phase 6)
Biological perturbation prediction (Phase 8)

10. Position in the ASRA research program

Question	Phase 3	Phase 4
Why try action a?	Novelty, usefulness, strategy	+ Semantics, uncertainty, predicted effect
Unit of causal memory	Edge stats on exploration graph	Effect signatures + transition model
Counterfactuals	None	Alternate-action effect lookup
Bridge to biology	Memory / coverage analogy	Explicit intervention–response structure

Phase 4 is where ASRA begins scientific-style reasoning over interventions: hypotheses, evidence, uncertainty, and counterfactual queries—still grounded in the same transition stream as Phase 1.

From the Decision Biology roadmap:

environment state  →  action  →  next state        (Phase 4, games)
cell state         →  perturbation  →  next cell state   (Phase 8, biology)

The inference loop is shared; only the state encoder and action vocabulary change.

11. Kaggle submission and agent evolution

Version	Tag	Layer added
Phase 1	`asra-v0.1` … v4	Transition logging, coarse semantics
Phase 2	`asra-v0.4-phase2`	Compact object-scene hints
Phase 3	`asra-v0.5-phase3`	Visit memory, novelty, loop penalty
Phase 4	`asra-v0.6-phase4`	Causal semantics, confidence, uncertainty, prediction

Submitted kernel: ilakkmanoharan/asra-phase-4-arc-prize-2026 (v1 ref 53273876; CLI kaggle.json enabled via setup_kaggle_cli.sh)

The notebook pattern matches Phases 2–3: bootstrap a venv at /tmp/asra_venv, avoid mirroring agent trees into /kaggle/working, smoke-test with the venv Python (including causality_self_test), and emit a placeholder submission.parquet for the validation gate.

Full library capabilities (batch semantics JSON, hypothesis store export, CausalExplorationPolicyV3 on MiniGrid) remain in asra-arc for offline research; the competition agent carries the minimal sufficient causal hint stack.

12. Open problems and next theory steps

Goal inference (Phase 5) — rank win-condition hypotheses; semantics labels become operator vocabulary for progress detection.
PHYRE integration (4C) — physical experimentation policy tied to uncertainty reduction.
Conditional semantics — same action token, different effects by object context; precondition fields on signatures.
Neural transition models (v2) — when lookup tables saturate, tabular or small models over Phase 2 scene features.
Planning (Phase 6) — use confirmed semantics and transition predictions as edge costs in BFS/A* / MCTS.
Decision Biology (Phase 8) — swap grid state for cell-state embeddings; reuse SemanticsStore loop on LINCS / scPerturb.

13. Conclusion

ASRA Phase 4 is the project’s shift from remembering territory to understanding interventions: action-effect signatures make implicit button-press semantics explicit; transition models and uncertainty scores turn exploration into targeted experimentation; hypothesis confirm/refute and counterfactual lookup introduce the minimal machinery of scientific reasoning over (state, action, effect) tuples.

The Phase 4 Kaggle extension is not a new agent philosophy; it is Phase 2 plus Phase 3 plus causal memory of what actions do. Object scenes still describe structure; exploration memory still penalizes loops; the semantics layer tells the agent which unknowns are worth resolving next.

Transition-centric adaptive reasoning remains the core; causal semantics is how those transitions become meaningful.

Reference notebook (GitHub)

Interactive companion with Phase 2 object-scene hints, Phase 3 exploration memory, and Phase 4 causal semantics:

References

Chollet, F. On the Measure of Intelligence. arXiv (2019).
Pearl, J. Causality: Models, Reasoning, and Inference. Cambridge University Press (conceptual lineage).
Ilakkuvaselvi Manoharan. Transition-Centric Adaptive Reasoning: ASRA Phase 1 for Interactive Environments. https://sci-layer.vercel.app/articles/transition-centric-adaptive-reasoning-asra-phase-1
Ilakkuvaselvi Manoharan. Object-Centric Adaptive Reasoning: ASRA Phase 2. https://sci-layer.vercel.app/articles/object-centric-adaptive-reasoning-asra-phase-2
Ilakkuvaselvi Manoharan. Directed Exploration and Episodic Memory: ASRA Phase 3. https://sci-layer.vercel.app/articles/directed-exploration-episodic-memory-asra-phase-3
Ilakkuvaselvi Manoharan. ASRA: Adaptive Scientific Reasoning Architecture. https://github.com/ilakkmanoharan/asra
Phase 4 causality implementation — https://github.com/ilakkmanoharan/asra/tree/main/asra-arc/src/asra/causality

Correspondence: ilakkmanoharan@gmail.com

Causal Action Semantics: ASRA Phase 4 — From Observed Effects to Intervention Reasoning