Transition-Centric Experience: ASRA Phase 1 — From Unknown Actions to Empirical Reasoning
Phase 1 establishes the ASRA Experience Engine: transition logging, hash-stable state IDs, effect-based action semantics inference, uncertainty-directed exploration, dead-end taboo, Swarm orchestration, and competition-grade execution fidelity. The article presents full theory and architecture; a companion Kaggle notebook deploys asra-v0.1-phase1 for ARC Prize 2026.
Abstract
Adaptive intelligence in unfamiliar environments cannot begin with predefined policies. The environment does not publish action manuals, win conditions, or transition rules. ASRA Phase 1 establishes the Experience Engine — a minimal agent architecture that treats every intervention as an experiment: observe grid state, apply an action token, measure cell-level diffs and reward proxies, accumulate transition evidence, infer coarse action semantics conditioned on state identity, and explore under uncertainty while tabooing empirically dead ends.
We describe Phase 1 as the foundation layer of the Adaptive Scientific Reasoning Architecture: transition logging (τ = (s, a, s′, r)), hash-stable state IDs, effect-based semantics inference, information-directed exploration, Swarm-scale multi-game orchestration, and competition-grade execution fidelity (isolated venv, validation vs. scoring separation). The research library lives in asra-arc/ (env/, memory/, agent/); the Kaggle agent embeds a compact ASRAAgent with ActionSemanticsInferencer and ASRAExplorer.
This article presents the full theory and architecture. Phase 1 does not solve ARC optimally — it builds the reproducible interaction-to-knowledge pipeline every later phase depends on.
1. The architectural foundation Phase 1 establishes
Phase 1 Experience Engine — transitions, hashes, semantics, exploration
Phase 2 Observation Engine — objects, transforms, rule hypotheses
Phase 3 Navigation & Memory — graphs, novelty, subgoals
Phase 4 Causality — intervention semantics, prediction
Phase 5+ Goals, planning, robustness, biology, integration
Phase 1 answers the first scientific question:
What happened when we acted, and how can we use that evidence to choose the next intervention?
Without Phase 1, later phases have nothing to abstract over. Object detectors need consecutive frames; causal models need logged effects; planners need edge tables. Phase 1 is therefore not a placeholder baseline — it is the empirical substrate of the entire ASRA program.
flowchart LR
subgraph P1["Phase 1 — Experience"]
O[Observe grid]
L[Log transition τ]
I[Infer φ̂(a|s)]
E[Explore]
end
subgraph Later["Phases 2–9"]
Obs[Structure]
Mem[Memory]
Cau[Causality]
Plan[Planning]
end
O --> L --> I --> E
L --> Obs
L --> Mem
I --> Cau
L --> Plan
2. Theoretical stance: experimentation precedes reasoning
Traditional agents assume fixed action semantics: MOVE_LEFT means move left because a programmer defined the mapping. ASRA targets adaptive scientific settings — novel games, instruments, biological perturbations — where the system receives tokens without definitions.
Phase 1 operationalizes one commitment:
Meaning is induced from consequences, not from labels.
The primary unit of intelligence is the transition:
τ = (s, a, s′, r, terminal, metadata)
Memory is not an action transcript; it is growing transition evidence linking states, interventions, and measurable diffs. Reasoning is hypothesis formation over latent operators:
φ̂(a | s) ≈ distribution of Δ(grid) and reward proxy given (s, a)
This framing aligns with ARC-AGI-3 (interactive grid worlds, latent action semantics, sparse feedback) and with Decision Biology (perturbations as interventions, expression readouts as state observations). Phase 1 stays game-bound in implementation but establishes the loop both domains share.
| Paradigm | Phase 1 stance |
|---|---|
| Memorize solutions | Rejected — no task-specific cache |
| Random exploration | Rejected — uncertainty-directed probing |
| Symbolic rules upfront | Deferred to Phase 2+ |
| Transition logging first | Adopted |
| Reproducible pipelines | Adopted — hash-stable exports |
3. State representation
3.1 Canonical grids
Observations are integer grids. ASRA normalizes to canonical 2D integer lists:
def canonical_grid(grid):
return np.array(grid, dtype=int).tolist()
3.2 State hashing
Full grids are not stored in inference tables. Each state receives state_hash = SHA256(JSON(canonical_grid)). Semantics are keyed by (state_hash, action) — the same action may behave differently in different contexts.
Hashing induces an equivalence relation: iff . The agent reasons over equivalence classes — analogous to observation buckets in partially observable control, with the caveat that distinct states may merge.
Phase 2 later adds structural representations (objects, components); Phase 1 accepts hash identity as sufficient for exploratory policy.
4. Action semantics inference
ActionSemanticsInferencer maintains an empirical model per (state_hash, action):
| Recorded | Role |
|---|---|
num_changed_cells |
Effect magnitude from cell diff |
reward |
Progress proxy (levels_completed when available) |
Inference produces human-readable hypothesis labels:
| Signal | Label |
|---|---|
| mean diff = 0 | no-op / blocked |
| mean diff 1.5 | localized cell update |
| higher mean | multi-cell transform |
consistency_score = 1/(1+std) |
reproducibility of effect |
Epistemic uncertainty:
u(s,a) = 1 if no observations
u(s,a) = 1 - min(1, c(s,a)) otherwise
High u means the agent has not stabilized φ̂(a|s). Phase 1 does not separate aleatoric vs. epistemic variance — both inflate the effect table.
5. Uncertainty-aware exploration
ASRAExplorer scores candidate actions:
score(a) = 2.0 / (1 + n(s,a)) # novelty
+ 0.7 * u(s,a) # epistemic uncertainty
+ 0.5 * mean_reward(a) # weak exploitation
+ ε, ε ~ U(0, 0.05) # tie-break
Dead-end tabu: if (s,a) produced zero cell changes and non-positive reward, exclude from future consideration at s.
Action space: simple actions (ACTION1–ACTION5, ACTION7). Complex coordinate actions (ACTION6) use center-grid defaults — a placeholder, not a learned pointer.
This is information-directed probing: prefer interventions that reduce uncertainty about latent semantics, not random walk.
6. Agent integration and game lifecycle
ASRAAgent implements the closed loop:
| Method | Behavior |
|---|---|
is_done |
Stop on WIN or MAX_ACTIONS (default 80) |
choose_action |
RESET when NOT_PLAYED / GAME_OVER; else explorer |
take_action |
Record action name for attribution |
append_frame |
Compute diff; update inferencer and explorer |
| Game state | Agent obligation |
|---|---|
NOT_PLAYED |
Issue RESET |
GAME_OVER |
Issue RESET |
| Active play | Choose among available simple actions |
WIN |
Terminate (is_done) |
7. Swarm orchestration and scorecard aggregation
For competition-scale evaluation, Phase 1 uses official Swarm orchestration:
Arcade.get_environments() → [game_id₁, …, game_idₙ]
↓
Per game: ASRAAgent instance → scorecard rows
↓
submission.parquet (game_id, score, levels_completed, actions, completed)
Design choice: Phase 1 shares GLOBAL_SEMANTICS and GLOBAL_EXPLORER across games — treating effect statistics as domain-general. Phase 2+ may isolate per game_id to reduce cross-game contamination.
8. Validation vs. scoring (two-phase evaluation)
Code competitions separate validation from scoring:
| Phase | What runs | What it proves |
|---|---|---|
| Validation (Run All) | Bootstrap → write my_agent.py → self-test → submission.parquet |
Artifacts exist; imports succeed |
| Scoring (re-run) | Platform executes my_agent.py → run_swarm() → API play |
End-to-end intervention on private games |
Validation success does not imply scoring success. Phase 1 research should report both bootstrap pass rate and scorecard aggregates.
9. Execution fidelity and runtime isolation
The v4-fixed reference pipeline adds epistemic isolation:
| Mechanism | Purpose |
|---|---|
Dedicated asra_venv |
Competition wheels isolated from host kernel |
--system-site-packages |
Host pandas/pyarrow for parquet |
pip install --target venv --no-deps |
No host numpy upgrades |
os.execve(venv_python) on agent entry |
Scoring uses same interpreter as self-test |
Stub agents package |
Load only tracing, recorder, agent, swarm |
Self-test (venv_python my_agent.py --self-test) verifies process isomorphism — same interpreter that scores can import runtime — without full API play during validation.
Environment flags: ASRA_VENV_ACTIVE, PYTHONNOUSERSITE=1, OPERATION_MODE=COMPETITION.
10. Experience Engine library (asra-arc)
Phase 1 research code beyond the Kaggle embed:
env/arc_agi3_runner.py — episode runner
memory/episode_logger.py — JSONL transitions
memory/state_graph.py — observed transition graph
agent/exploration_policy.py — simple exploration
agent/dead_end_detector.py — futile (s,a) detection
transition_schema.py — hash-stable schema + validation
complete-phase1 — reproducible pipeline CLI
Primary dataset output: data/exports/asra_v0_1_transitions.jsonl — self-generated transition corpus for Phases 2–4.
Non-goals (Phase 1): optimal ARC solving, object abstraction, planning, world models, long-horizon strategy.
11. Agent integration
| Version | Tag | Layer |
|---|---|---|
| Early baselines | asra-v0.3, v11.x |
Iterative competition fixes |
| Phase 1 (parity) | asra-v0.1-phase1 |
Experience Engine — canonical kernel |
Kaggle package: kaggle-notebooks/phase1/
cd kaggle-notebooks/phase1
./submit.sh all "asra-v0.1-phase1"
Kernel: ilakkmanoharan/asra-phase-1-arc-prize-2026
12. Position in the ASRA research program
| Question | Phase 1 answer |
|---|---|
| What is memory? | Transition evidence τ |
| What is an action? | Latent token; semantics from effects |
| What is exploration? | Reduce φ̂ uncertainty + avoid dead ends |
| What is success (Phase 1)? | Reproducible pipeline + scorecard participation |
| Bridge to biology | Perturbation → readout = action → state diff |
Phase 1 is the stress test substrate for ARC Prize 2026 and the schema foundation for Decision Biology transitions in Phase 8.
13. Open problems
- Per-game memory isolation — reduce cross-game semantics bleed.
- Complex action policies — learned pointers for
ACTION6. - Neural state encoders — when hash buckets saturate.
- Live ARC-AGI-3 API — credentials for non-sandbox episodes.
- Phase 2 object scenes — structural layer atop τ logs.
14. Conclusion
ASRA Phase 1 proves that adaptive reasoning can begin with nothing but transitions: log what changed, infer what actions tend to do in each state, explore where uncertainty is high, and taboo what empirically fails. No objects, no goals, no plans — yet the loop is already scientific: observe, intervene, compare, update beliefs.
The Phase 1 Kaggle agent (asra-v0.1-phase1) packages this loop for competition judges. The asra-arc library packages it for reproducible research. Every later phase is a refinement of the same question Phase 1 asked first: what happened when we acted, and what should we try next?
Reference notebook (GitHub & Kaggle)
Companion notebook with v4-fixed bootstrap, embedded ASRAAgent, and Swarm orchestration:
- ASRA Phase 1 — ARC Prize 2026 (Kaggle kernel)
- ASRA Phase 1 — ARC Prize 2026 (ASRA repository)
- SciLayer archive copy
References
- Ilakkuvaselvi Manoharan. Object-Centric Adaptive Reasoning: ASRA Phase 2. https://sci-layer.vercel.app/articles/object-centric-adaptive-reasoning-asra-phase-2
- ASRA Phase 1 specification —
documents/ASRA_Phase1_Official_Technical_Specification.md - Phase 1 implementation — https://github.com/ilakkmanoharan/asra/tree/main/kaggle-notebooks/phase1
- Experience Engine CLI —
python -m asra complete-phase1 - ARC Prize 2026 — ARC-AGI-3 competition context
- Decision Biology — perturbation–response analogy (Phase 8 extension)
Related: ASRA Phase 2 · Decision Biology · Nature Foundation Models
Correspondence: ilakkmanoharan@gmail.com