SciLayer

Technical specification for ASRA Phase 3 Milestones 3A–3D: exploration graph, visitation memory, novelty and usefulness scoring, subgoal inference, memory replay, strategy reuse, MiniGrid and BabyAI benchmarks, and competition agent integration. Companion to the Phase 3 conceptual preprint.

Track: Phase 3 (core ASRA roadmap)
Source roadmap: ASRA program documentation in https://github.com/ilakkmanoharan/asra
Timeline: July 2026
Status: COMPLETE — Milestones 3A–3D (see asra-arc/src/asra/exploration/)
Implementation: https://github.com/ilakkmanoharan/asra/tree/main/asra-arc/src/asra/exploration
Conceptual article: https://sci-layer.vercel.app/articles/directed-exploration-episodic-memory-asra-phase-3
Author: Ilakkuvaselvi (Ilak) Manoharan
Last updated: June 2026
Depends on: Phase 1 (Experience Engine) ✅, Phase 2 (Observation Engine) ✅ baseline

1. Mission

Phase 3 makes ASRA efficient in unknown space. Phase 1 proved transition logging and naive exploration; Phase 2 added object-centric structure. Neither phase answers:

Where have I been? What is still unknown? Which action opens new territory? What intermediate goal am I pursuing?

Phase 3 builds the Navigation & Memory Engine — the layer that turns episodic transitions into persistent spatial and strategic knowledge so the agent explores with direction instead of repeating loops.

Phase 1:  τ = (s, a, s′, r)           — log everything
Phase 2:  Σ(s), Δ_obj                  — interpret structure
Phase 3:  G_explore, M_visit, g_sub    — remember, prioritize novelty, infer subgoals
Phase 4+: causal semantics of actions

Primary goal: deliver ASRA exploration engine v1, memory system v1, and subgoal inference module, validated on MiniGrid and BabyAI before heavy integration into ARC-AGI-3 Milestone #2 work (Phase 6).

Non-goals for Phase 3:

Full causal action semantics (roadmap Phase 4)
Goal hypothesis ranking / win-condition inference (Phase 5)
BFS/A* / MCTS planners at competition scale (Phase 6)
Procgen / Crafter generalization (Phase 6–7)
Decision Biology datasets (Phase 8)
Replacing Phase 1 transition schema or Phase 2 perception stack

2. Position in ASRA theory

From ASRA-detailed-roadmap.md, Phases 1–3 form the general adaptive intelligence substrate:

Phase	Cognitive role	ASRA module name
1	Experience	Experience Engine
2	Observation	Observation Engine
3	Exploration & memory	Navigation & Memory Engine
4–5	Causality & hypotheses	Semantics + Hypothesis engines
6–7	Planning & robustness	Strategy / planner stack
8	Domain: Decision Biology	Biology transition graphs

Phase 3 is the last phase before ASRA begins to look like scientific reasoning in the Phase 4–5 sense. Exploration memory and subgoal structure are prerequisites for later perturbation–response modeling: a biologist does not re-test every well from scratch; they remember what was tried and what remained unexplored.

flowchart TB
  subgraph Done["Complete"]
    P1[Phase 1 — transitions, hash graph]
    P2[Phase 2 — object scenes, rules]
  end
  subgraph P3["Phase 3 — this spec"]
    EG[Exploration graph]
    VM[Visitation memory]
    NS[Novelty + usefulness scores]
    SG[Subgoal detector]
    MR[Memory replay]
    SR[Strategy reuse]
  end
  subgraph Later["Phase 4+"]
    SEM[Action semantics]
    PLAN[Planning]
  end
  P1 --> EG
  P2 --> EG
  EG --> VM
  VM --> NS
  NS --> SG
  SG --> MR
  MR --> SR
  SR --> SEM
  SR --> PLAN

3. Why Phase 3 follows Phase 2

After Phase 2 only	Phase 3 adds
Object hints bias single-step scoring	Multi-step coverage and frontier tracking
Hash graph counts visits	Novelty-weighted action selection
No map of “unseen” regions	Explicit exploration graph with frontiers
No task structure	Subgoals from BabyAI-style composition
No cross-episode reuse	Strategy patterns stored and replayed

Roadmap rationale: “After ASRA can parse states, it needs to learn how to explore efficiently instead of randomly trying actions.”

Phase 2 object scenes become node annotations and soft equivalence keys on the exploration graph (two hash-distinct grids with similar object multiset may share strategic context). Phase 3 does not require perfect object segmentation — it requires better exploration than hash-only novelty.

4. Inputs from Phase 1 and Phase 2

4.1 From Phase 1 (existing in `asra-arc/`)

Artifact	Location	Phase 3 use
Transition schema	`memory/transition_schema.py`	Canonical τ records; attach memory metadata
Episode logger	`memory/episode_logger.py`	Episode boundaries for replay
State graph (hash)	`memory/state_graph.py`	Extend → exploration graph
Simple exploration policy	`agent/exploration_policy.py`	Baseline to beat
Dead-end detector	`agent/dead_end_detector.py`	Penalize useless edges in graph
Dataset exporter	`export/dataset_exporter.py`	Export memory-enriched transitions
ARC-AGI-3 runner	`env/arc_agi3_runner.py`	Optional: plug exploration engine v2

4.2 From Phase 2 (existing)

Artifact	Location	Phase 3 use
`compact_scene_dict`	`perception/snapshot.py`	Node features: `num_objects`, shape summaries
Object extractor	`perception/objects.py`	Partial-obs merge / object-stable keys
Transform events	`perception/transforms.py`	Detect “progress” micro-events during navigation
Rule candidates	`perception/rules.py`	Not primary for MiniGrid; useful for ARC-AGI-3 feedback

4.3 Gap (what Phase 3 must add)

New package (proposed): src/asra/exploration/

Module	Responsibility
`exploration_graph.py`	Directed graph with visit counts, frontiers, object-annotated nodes
`visitation_memory.py`	Per-episode and cross-episode visit tables
`novelty.py`	State / edge / object-set novelty scores
`usefulness.py`	Action usefulness from reward + frontier expansion
`subgoals.py`	Parse BabyAI missions; detect milestone states
`replay.py`	Prioritized transition replay buffer
`strategies.py`	Named strategy templates + reuse index
`minigrid_runner.py`	Gymnasium adapter for MiniGrid
`babyai_runner.py`	Instruction-conditioned episodes

5. Datasets

Per roadmap: MiniGrid (primary), BabyAI (compositional / subgoal). Do not mix in PHYRE, Procgen, or biology yet.

5.1 MiniGrid

Role: Controlled partially observable grid navigation with sparse rewards, doors, keys, and layout variation — the standard testbed for exploration and memory in RL, adapted here for explicit graph-based reasoning rather than black-box policy gradients.

Why ASRA needs it:

Capability	MiniGrid teaches
Map building	Agent must infer layout from egocentric views
Partial observability	`MiniGrid--Partial-Obs-` variants
Sparse reward	Reward often only at goal — exploration must be intrinsic
Memory	Remember visited cells / object locations across steps
Navigation planning	Shortest paths, door-key sequences
Transfer	Same policy structure across grid sizes

Recommended environment curriculum (easy → hard):

Stage	Environment(s)	Phase 3 focus
A	`MiniGrid-Empty-8x8-v0`	Coverage, novelty, visit memory
B	`MiniGrid-FourRooms-v0`	Room graph, frontiers
C	`MiniGrid-DoorKey-8x8-v0`	Subgoal: get key → open door → goal
D	`MiniGrid-MultiRoom-N6-v0`	Longer horizons, strategy reuse
E	Partial-obs variants	Belief / aggregated node keys

Acquisition:

pip install minigrid gymnasium

Pin versions in pyproject.toml optional extra [exploration].

Data layout (proposed):

asra-arc/data/minigrid/
  episodes/           # JSONL transitions per env
  graphs/             # exploration graphs per env
  analysis/phase3/    # coverage, steps-to-goal, novelty metrics

ASRA use pattern:

Run N episodes with exploration engine v2 (not pure random).
Log transitions with exploration metadata (novelty, frontier distance, subgoal id).
Build exploration graph; compute coverage % of reachable cells (oracle for Empty/FourRooms).
Compare against Phase 1 SimpleExplorationPolicy on same step budget.

5.2 BabyAI

Role: Compositional language missions over MiniGrid worlds — “go to the red ball”, “pick up the key then go to the door”. Teaches task decomposition and reusable strategy patterns.

Why ASRA needs it:

Capability	BabyAI teaches
Subgoal structure	Missions factor into ordered subtasks
Compositional generalization	New word combinations from known primitives
Strategy reuse	Same “pick up X” across layouts
Instruction grounding	Map text mission → exploration objective (lightweight; no LLM required for Phase 3 baseline)

Recommended progression:

Stage	Setting	Focus
A	`BabyAI-GoToRedBallGrey-v0`	Single subgoal, object-centric target
B	`BabyAI-GoToObjMaze-v0`	Navigation + object identity
C	`BabyAI-PickupLoc-v0`	Pickup subgoal before navigation
D	`BabyAI-UnlockPickup-v0`	Multi-step: unlock → pickup → goto

Acquisition:

pip install babyai
# or minigrid[babyai] depending on release channel — pin in extras

Phase 3 scope on BabyAI:

Parse mission string into subgoal list (rule-based parser for v1; no neural mission encoder required).
Tag transitions with subgoal_index and subgoal_complete events.
Measure subgoal completion rate and steps per subgoal vs Phase 1 baseline.

Explicit non-goal: Natural-language understanding via LLM — Phase 3 uses structured mission parsers aligned with BabyAI’s synthetic grammar.

5.3 ARC-AGI-3 (secondary, integration track)

MiniGrid/BabyAI are the training ground. ARC-AGI-3 remains the north-star interactive benchmark but Phase 3 does not require leaderboard gains yet.

Integration plan (light):

Attach exploration graph builder to logged ARC-AGI-3 episodes.
Use Phase 2 num_objects + hash for dual-key novelty (avoid false novelty from permutation-equivalent grids).
Feed novelty/usefulness into Kaggle agent as Phase 3 hints (parallel to Phase 2 object hints) — target Milestone #1 v5+ / Milestone #2 prep.

Do not block Phase 3 completion on ARC-AGI-3 win rate.

5.4 Dataset ordering (roadmap discipline)

Phase 3 trains on:     MiniGrid → BabyAI
Phase 3 integrates:    ARC-AGI-3 logs (optional parallel)
Phase 3 excludes:      PHYRE, Procgen, Crafter, biology

6. What to build (seven modules + runners)

Roadmap list mapped to concrete ASRA modules.

6.1 Exploration graph

Purpose: Extend Phase 1 StateGraph into an exploration-centric directed graph suitable for coverage analysis and planning prep.

Node schema (proposed):

@dataclass
class ExplorationNode:
    node_id: str                    # primary: state_hash; optional: object_signature_hash
    state_hash: str
    visit_count: int
    first_seen_step: int
    last_seen_step: int
    terminal: bool
    object_summary: dict | None     # compact_scene from Phase 2
    grid_shape: tuple[int, int]
    frontier_score: float           # higher = more unexplored neighbors expected

Edge schema:

@dataclass
class ExplorationEdge:
    from_id: str
    to_id: str
    action: str
    count: int
    avg_reward: float
    avg_novelty_gain: float
    usefulness_score: float
    dead_end: bool

Algorithms:

Ingest transitions incrementally (online) or batch from JSONL.
Mark frontier nodes: visited states with outgoing actions that led to low visit-count successors.
Optional object-signature bucketing: cluster nodes with identical (num_objects, sorted shape_hashes) for ARC-like grids.

Output: exploration_graph.json + optional GraphML.

Extends: asra.memory.state_graph.StateGraph — do not fork unrelated graph code.

6.2 State visitation memory

Purpose: Fast lookup for “have we been here before?” at multiple resolutions.

Layers:

Layer	Key	Use
Exact	`state_hash`	Precise revisit detection
Object	`object_scene_fingerprint`	Soft revisit (Phase 2)
Spatial (MiniGrid)	`(room_id, cell_x, cell_y)` when oracle layout available for eval
Episodic	`(episode_id, step)`	Replay indexing

API (proposed):

class VisitationMemory:
    def observe(self, state_hash: str, step: int, object_scene: dict | None = None) -> None: ...
    def visit_count(self, state_hash: str) -> int: ...
    def is_novel(self, state_hash: str) -> bool: ...
    def recent_window(self, n: int) -> list[str]: ...

Integration: Called from runner after each transition; feeds novelty and policy.

6.3 Novelty score

Purpose: Quantify expected information gain from visiting a state or taking an edge.

Baseline formula (v1):

novelty(s) = 1 / sqrt(1 + visit_count(s))
           + α · 1[object_fingerprint unseen]
           + β · frontier_bonus(s)

Edge novelty:

edge_novelty(s, a) = novelty(s′) · (1 + γ · reward_proxy)
                     − δ · dead_end_penalty(s, a)

Calibration: Tune α, β, γ, δ on MiniGrid-Empty-8x8 to maximize cell coverage in fixed step budget vs Phase 1 policy.

Output: Stored in transition metadata.exploration.novelty for analysis.

6.4 Action usefulness score

Purpose: Separate novelty (information) from utility (progress toward reward/subgoal).

Signals:

Signal	Source
Reward delta	Environment
Frontier expansion	Exploration graph — new node created
Subgoal advance	BabyAI parser
Object delta	Phase 2 `delta_num_objects` (ARC)
Dead-end flag	Phase 1 dead-end detector

Combined score (v1):

usefulness(a | s) = w_r · Δreward + w_f · frontier_gain + w_g · subgoal_progress
                  − w_d · dead_end(s, a)

Used with novelty in Pareto-style action ranking or weighted sum for v1 simplicity.

6.5 Subgoal detector

Purpose: Infer current subgoal and detect subgoal completion without environment oracle (BabyAI) or with oracle for metrics only.

BabyAI (supervised structure):

Parse mission template → ordered subgoals: [GoTo(type=ball, color=red), Done].
Detect completion via environment mission API or observation predicates (agent at target cell, carrying object).

MiniGrid (unsupervised heuristics):

Key acquired → subgoal “has_key”
Door open → subgoal “door_open”
Goal visible → subgoal “see_goal”

ARC-AGI-3 (heuristic v1):

Level counter increase → subgoal “level_progress”
WIN status → terminal subgoal

Output schema:

@dataclass
class SubgoalState:
    subgoal_id: str
    index: int
    description: str
    status: Literal["pending", "active", "completed"]
    entered_at_step: int | None
    completed_at_step: int | None

6.6 Memory replay system v1

Purpose: Reuse high-value transitions for analysis, policy refinement, and debugging — not neural training initially.

Priority queue criteria:

High novelty at first visit
Subgoal boundary crossings
WIN / level-complete transitions
High object-delta (Phase 2)
Low visit count successor

Storage: Ring buffer per environment + spill to JSONL (data/replay/).

API:

class TransitionReplayBuffer:
    def push(self, transition: dict, priority: float) -> None: ...
    def sample(self, k: int) -> list[dict]: ...
    def export(self, path: Path) -> None: ...

Use cases:

Streamlit / notebook replay of “best discoveries”
Batch re-run perception on stored frames
Future: offline RL or imitation (out of scope for v1)

6.7 Strategy reuse mechanism

Purpose: Capture reusable macro-patterns across episodes and environments.

Strategy template (v1):

@dataclass
class StrategyPattern:
    strategy_id: str
    name: str                    # e.g. "door_key_sequence", "frontier_dfs"
    precondition: dict           # object/scene tags or subgoal types
    action_sequence: list[str]   # compressed edge sequence
    success_count: int
    source_env: str

Mechanism:

After successful BabyAI / DoorKey episodes, extract action subgraph from exploration graph.
Index by precondition (e.g. “see closed door + have no key”).
On matching state fingerprint, bias action scores toward first edge of stored sequence (soft reuse, not hard script).

Phase 3 success: Demonstrate one reused strategy across ≥2 episodes in DoorKey-8x8 with reduced steps vs first episode.

7. System architecture

                    ┌─────────────────────────────────────┐
                    │         Environment adapters         │
                    │  MiniGrid │ BabyAI │ ARC-AGI-3      │
                    └─────────────────┬───────────────────┘
                                      │ frames, reward
                                      ▼
                    ┌─────────────────────────────────────┐
                    │   Phase 1 — EpisodeLogger / τ schema   │
                    └─────────────────┬───────────────────┘
                                      │
          ┌───────────────────────────┼───────────────────────────┐
          ▼                           ▼                           ▼
┌─────────────────┐       ┌─────────────────┐       ┌─────────────────┐
│ Phase 2 snapshot│       │ ExplorationGraph │       │ VisitationMemory│
│ (object scene)  │──────▶│ + frontiers      │◀──────│                 │
└─────────────────┘       └────────┬─────────┘       └─────────────────┘
                                   │
                    ┌──────────────┼──────────────┐
                    ▼              ▼              ▼
              NoveltyScore  UsefulnessScore  SubgoalDetector
                    │              │              │
                    └──────────────┼──────────────┘
                                   ▼
                    ┌─────────────────────────────────────┐
                    │   ExplorationPolicyV2 (action select)  │
                    └─────────────────┬───────────────────┘
                                      │
                    ┌─────────────────┴─────────────────┐
                    ▼                                   ▼
           TransitionReplayBuffer              StrategyLibrary

Policy interface (proposed):

class ExplorationPolicyV2:
    name = "exploration_v2"

    def select_action(
        self,
        state_hash: str,
        available_actions: list[str],
        graph: ExplorationGraph,
        memory: VisitationMemory,
        subgoal: SubgoalState | None,
        object_scene: dict | None,
    ) -> dict[str, Any]: ...

8. Transition metadata extension

Phase 3 enriches Phase 1 transitions without breaking schema:

{
  "metadata": {
    "exploration": {
      "novelty": 0.82,
      "usefulness": 0.45,
      "visit_count_before": 2,
      "frontier_node": true,
      "subgoal_id": "pickup_key",
      "subgoal_index": 1,
      "strategy_hint": "door_key_sequence_v1"
    },
    "object_scenes_attached": true
  }
}

Parquet flatten columns (proposed): novelty, usefulness, subgoal_index, visit_count_before.

9. Implementation plan

Milestone 3A — MiniGrid foundation (week 1–2)

Task	Deliverable
`[exploration]` extra in pyproject	minigrid + gymnasium pinned
`MiniGridRunner`	Episodes → JSONL
`ExplorationGraph` + `VisitationMemory`	Unit tests
`NoveltyScore` v1	Coverage eval on Empty-8x8
CLI `run-minigrid`	Batch runner

Exit criteria: Beat Phase 1 random/simple policy on coverage % in 200 steps (Empty-8x8).

Milestone 3B — Useful exploration (week 2–3)

Task	Deliverable
`UsefulnessScore`	Combined ranking
`ExplorationPolicyV2`	Integrated policy
FourRooms + DoorKey eval	`data/analysis/phase3/` reports
Replay buffer	Top-100 transitions export

Exit criteria: DoorKey-8x8 — ≥50% success within step budget (baseline TBD) vs $\leq$ 20% for Phase 1 policy.

Milestone 3C — BabyAI subgoals (week 3–4)

Task	Deliverable
Mission parser	Subgoal list from BabyAI mission string
`SubgoalDetector`	Completion events in logs
`StrategyLibrary` v1	One extracted DoorKey pattern
BabyAI eval harness	Subgoal metrics CSV

Exit criteria: Correct subgoal boundary detection on ≥90% of successful GoToRedBall-class episodes (oracle comparison).

Milestone 3D — ARC-AGI-3 integration (week 4+, optional)

Task	Deliverable
Exploration graph from ARC logs	Per-game graphs
Object-augmented novelty	Dual-key scoring
Kaggle agent v0.5-phase3	Novelty/usefulness hints

Exit criteria: Documented ablation — with vs without Phase 3 hints on fixed seed episodes (not necessarily public leaderboard gain).

10. Success metrics

10.1 MiniGrid (primary)

Metric	Definition	Target (initial)
Coverage	% reachable cells visited	> Phase 1 by ≥20% (Empty-8x8, 200 steps)
Steps to goal	First success episode length	Lower median vs baseline (DoorKey)
Revisit rate	Revisits / total steps	Lower than Phase 1
Frontier efficiency	New nodes per 100 steps	Higher than Phase 1
Graph size	Unique nodes	Correlates with coverage

10.2 BabyAI (secondary)

Metric	Definition	Target
Subgoal detection accuracy	Match oracle milestones	≥90% on success episodes
Steps per subgoal	Mean steps between completions	Decrease vs Phase 1
Strategy reuse gain	Steps saved on 2nd+ success	≥15% reduction

10.3 ARC-AGI-3 (integration)

Metric	Definition	Target
Unique states per episode	Exploration graph nodes	Increase at fixed action budget
Loop count	Repeated hash cycles	Decrease
Levels completed	Competition proxy	Non-regression vs v0.4-phase2

10.4 What Phase 3 metrics are not

ARC Original 800-task rule coverage (Phase 2)
PHYRE physics accuracy (Phase 4)
Milestone #2 win rate (Phase 6)
Biological perturbation prediction (Phase 8)

11. Testing strategy

Layer	Tests
Unit	Novelty monotonicity, visit counts, subgoal parser on fixed strings
Integration	MiniGrid Empty — 50-step episode produces connected graph
Regression	Phase 1 + Phase 2 tests remain green
Eval scripts	`eval_phase3_minigrid.py`, `eval_phase3_babyai.py` (planned)

Fixtures: tiny 4×4 grid worlds in tests/fixtures/minigrid_micro/.

12. Repository layout (proposed)

asra-arc/
  src/asra/exploration/
    __init__.py
    exploration_graph.py
    visitation_memory.py
    novelty.py
    usefulness.py
    subgoals.py
    replay.py
    strategies.py
    policy_v2.py
    minigrid_runner.py
    babyai_runner.py
  scripts/
    run_phase3_minigrid_batch.py
    eval_phase3_minigrid.py
    eval_phase3_babyai.py
  tests/
    test_exploration_graph.py
    test_novelty.py
    test_subgoal_parser.py
  data/
    minigrid/
    analysis/phase3/

kaggle-notebooks/phase3/
  phase3-exploration-memory-navigation.md   # this file
  README.md
  # future: asra-phase-3-minigrid-notebook.ipynb

CLI additions (planned):

python -m asra run-minigrid --env ... --episodes ...
python -m asra run-babyai --env ... --episodes ...
python -m asra build-exploration-graph --input-dir ...

13. Relationship to Kaggle / competition agents

Phase 3 is research infrastructure first, competition second.

Agent version	Phase 3 features
`asra-v0.4-phase2`	Object-scene hints only
`asra-v0.5-phase3`	+ novelty/usefulness, visit memory, loop penalty (`CompactExplorationHints`)

Kaggle notebook: https://github.com/ilakkmanoharan/asra/blob/main/kaggle-notebooks/phase3/asra-phase-3-arc-prize-2026.ipynb — compact subset (visit counts, frontier bonus, Phase 2 object hints) without shipping the full asra-arc library to the competition kernel.

14. Risks and mitigations

Risk	Mitigation
MiniGrid API drift (Farama)	Pin versions; wrapper adapter
Object-hash false merges	Keep hash-primary; object as secondary bonus only
BabyAI mission parser fragility	Start with env API ground truth for eval; parser for logging
Scope creep into Phase 4 semantics	Freeze v1 scores to graph + reward + subgoals only
ARC-AGI-3 gains too small to measure	Accept; MiniGrid is source of truth for Phase 3 done

15. Open questions

Belief states for partial observability — explicit belief vector vs aggregated graph nodes?
Object-graph memory — full Phase 2 object graph per node vs compact fingerprint?
Cross-env strategy transfer — MiniGrid DoorKey → ARC-AGI-3 analog?
Neural components — Phase 3 stays symbolic/heuristic unless metrics plateau.

16. Summary deliverables (Phase 3 complete)

Output	Description
ASRA exploration engine v1	`ExplorationPolicyV2` + graph + scores
Memory system v1	`VisitationMemory` + `TransitionReplayBuffer`
Subgoal inference module	Parser + detector + BabyAI eval
Strategy reuse	`StrategyLibrary` with ≥1 demonstrated reuse
MiniGrid benchmark report	Coverage, steps, revisit metrics
BabyAI subgoal report	Detection accuracy, steps per subgoal
Docs	This spec + updated `asra-arc/README.md`

Reference notebook (GitHub)

ASRA Phase 3 — ARC Prize 2026 (ASRA repository)

17. References

Ilakkuvaselvi Manoharan. Directed Exploration and Episodic Memory: ASRA Phase 3. https://sci-layer.vercel.app/articles/directed-exploration-episodic-memory-asra-phase-3
Ilakkuvaselvi Manoharan. Object-Centric Adaptive Reasoning: ASRA Phase 2. https://sci-layer.vercel.app/articles/object-centric-adaptive-reasoning-asra-phase-2
Ilakkuvaselvi Manoharan. Transition-Centric Adaptive Reasoning: ASRA Phase 1. https://sci-layer.vercel.app/articles/transition-centric-adaptive-reasoning-asra-phase-1
Ilakkuvaselvi Manoharan. ASRA repository. https://github.com/ilakkmanoharan/asra
Phase 3 exploration modules — https://github.com/ilakkmanoharan/asra/tree/main/asra-arc/src/asra/exploration
Farama MiniGrid — https://minigrid.farama.org/
Chevalier-Boisvert et al. BabyAI. https://arxiv.org/abs/1810.08272

Phase 3 specification — exploration, memory, and navigation. Implementation complete (Milestones 3A–3D in asra-arc/src/asra/exploration/).

Correspondence: ilakkmanoharan@gmail.com

ASRA Phase 3 — Exploration, Memory, and Navigation (Technical Specification)