SciLayer

Phase 6 adds the Planning & Strategy Invention layer: BFS/A* and MCTS-lite over observed transitions, a strategy library mapping goal templates to operator sequences, meta explore-exploit control, and plan repair, all building on Phases 1–5. The article presents theory and architecture; a companion Kaggle notebook deploys PlanningEngine hints for ARC Prize 2026.

Abstract

Phases 1–5 of the Adaptive Scientific Reasoning Architecture (ASRA) established transition logging, object-centric observation, directed exploration, action semantics, and goal hypothesis ranking. Phase 5 answers what the environment is trying to achieve; it does not yet answer how to compose multiple actions into a coherent pursuit of that objective.

We describe ASRA Phase 6 as the Planning & Strategy Invention layer: BFS and A* planners over observed transition graphs, MCTS-lite rollouts when graphs are sparse, a reusable strategy library mapping goal templates to operator sequences, a meta-controller for explore-exploit balance, and reset and plan repair mechanisms for recovery. The competition agent embeds a compact PlanningEngine atop Phase 5's GoalHypothesisEngine; the full research stack lives in asra-arc/src/asra/planning/.

This article presents the theory, architectural decomposition, and design principles. Phase 6 is the pivot from scientific inquiry toward Milestone #2 — a competition agent that plans, not only scores single actions.

1. The architectural gap Phase 6 closes

ASRA's cumulative cognitive stack:

Phase 1   Experience Engine           — transitions, hashes, cell diffs
Phase 2   Observation Engine          — objects, transforms, rule hypotheses
Phase 3   Navigation & Memory         — exploration graph, visitation, subgoals
Phase 4   Semantics & Causal Inference — action meaning, prediction, counterfactuals
Phase 5   Goal Inference & Hypotheses   — win conditions, progress, experiment design
Phase 6   Planning & Strategy Invention — multi-step plans, strategies, meta-control
Phase 7+  Robustness, Decision Biology, final submission

Phase 5 asks: What are we trying to accomplish?
Phase 6 asks: Given that belief, what sequence of actions should we execute — and when should we abandon the plan?

Without Phase 6, an agent with a leading goal hypothesis still selects actions myopically: each step is scored independently, with no commitment to multi-step structure. Human problem-solving in unknown environments — and biological experiment design — routinely requires sequencing: reach before collect, unlock before traverse, transform before match.

flowchart LR
  subgraph P5["Phase 5 — Goals"]
    GH[Goal hypotheses]
    HR[Hypothesis ranker]
  end
  subgraph P6["Phase 6 — Planning"]
    SL[Strategy library]
    BFS[BFS / A* planner]
    MCTS[MCTS-lite]
    MC[Meta-controller]
    PR[Plan repair]
  end
  subgraph Out["Output"]
    ACT[Action sequence]
  end
  GH --> SL
  HR --> SL
  SL --> BFS
  SL --> MCTS
  BFS --> MC
  MCTS --> MC
  MC --> ACT
  ACT --> PR
  PR --> BFS

2. Theoretical stance: plans as commitments under uncertainty

Interactive environments provide partial transition models: the agent knows only edges it has observed. ASRA Phase 6 does not assume a full world model. Instead, planning operates on three information sources:

Observed graph — Phase 1 state graph and Phase 3 exploration graph.
Semantic predictions — Phase 4 transition model for unobserved edges (low weight).
Goal-conditioned strategies — Phase 5 leading hypothesis maps to operator preferences.

The epistemic object is a plan:

π = (strategy, steps=[(a₁, ŝ₁), …, (aₖ, ŝₖ)], mode, success)

Plans are conditional commitments: they execute until progress stalls, a step fails, or the meta-controller shifts to explore mode. This mirrors experimental protocols in biology: a perturbation sequence is followed until the response contradicts the pathway hypothesis, then the protocol is repaired.
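
As a rough illustration of the plan tuple, here is a minimal sketch of how it might be represented. The class names Plan and PlanStep follow schemas.py, but every field beyond the four listed in π, in particular cursor and the advance method, is an assumption made for this example rather than the library's actual schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(frozen=True)
class PlanStep:
    action: str            # a_i, e.g. "ACTION3"
    predicted_state: str   # s-hat_i: hash of the state expected after the action


@dataclass
class Plan:
    strategy: str                                  # e.g. "reach_target"
    steps: List[PlanStep] = field(default_factory=list)
    mode: str = "balanced"                         # explore / exploit / balanced
    success: bool = False                          # True if the planner found a path
    cursor: int = 0                                # hypothetical: index of the next step

    def next_step(self) -> Optional[PlanStep]:
        """Return the step to execute next, or None when the plan is finished."""
        if self.cursor < len(self.steps):
            return self.steps[self.cursor]
        return None

    def advance(self, observed_state: str) -> bool:
        """Move forward only if the observed state matches the prediction.

        A False return is the signal that the commitment should be revisited.
        """
        step = self.next_step()
        if step is None or step.predicted_state != observed_state:
            return False
        self.cursor += 1
        return True
```

Under these assumptions, a mismatch between the predicted and observed state hash is what marks a step as failed and hands control to repair.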

Paradigm	Phase 6 stance
Full RL policy over actions	Deferred — v1 uses symbolic plans
Hand-coded level solutions	Rejected — plans built from observed transitions
LLM chain-of-thought planning	Deferred — no instruction channel in ARC-AGI-3
MCTS with learned value net	Deferred — v1 uses semantic rollouts
BFS over logged transitions	Adopted — primary planner

3. Strategy library as reusable scientific protocols

Phase 6 introduces a strategy library — explicit mappings from Phase 5 goal templates to preferred semantic operators:

Strategy	Goal template	Scientific reading
`reach_target`	`move_to_target`	Navigate to target region in state space
`collect`	`collect_tokens`	Aggregate or remove discrete markers
`align`	spatial subgoals	Minimize structural misalignment
`avoid`	`avoid_hazard`	Constrained navigation
`unlock`	`unlock_passage`	Enable latent pathway
`transform`	`match_pattern`, `transform_to_goal`	Mechanism application
`sequence`	multi-step compositions	Protocol with ordered stages
`explore`	weak goals	Discovery mode

Strategies are not hardcoded solutions. They are bias functions over Phase 4 semantics that guide BFS edge preference and MCTS rollouts. The same strategy (reach_target) applies across games with different grid layouts — analogous to the same experimental protocol (e.g., dose escalation) across cell lines with different response curves.
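
The mapping in the table can be sketched as plain dictionaries. The goal templates and strategy names below come from the table; the operator labels other than translate, the bias values, and the min_confidence threshold are hypothetical placeholders, not values from the ASRA library.

```python
from typing import Dict

# Goal template -> strategy name, as listed in the table above.
GOAL_TO_STRATEGY: Dict[str, str] = {
    "move_to_target": "reach_target",
    "collect_tokens": "collect",
    "avoid_hazard": "avoid",
    "unlock_passage": "unlock",
    "match_pattern": "transform",
    "transform_to_goal": "transform",
}

# Strategy -> preference per Phase 4 semantic operator label.
# Labels and weights are illustrative assumptions.
OPERATOR_BIAS: Dict[str, Dict[str, float]] = {
    "reach_target": {"translate": 1.0},
    "collect": {"translate": 0.6, "remove": 1.0},
    "transform": {"recolor": 1.0, "rotate": 0.8},
    "explore": {},
}


def select_strategy(goal_template: str, confidence: float,
                    min_confidence: float = 0.3) -> str:
    """Weak or unknown goals fall back to the explore strategy."""
    if confidence < min_confidence:
        return "explore"
    return GOAL_TO_STRATEGY.get(goal_template, "explore")


def strategy_bias(strategy: str, operator: str) -> float:
    """Bias a planner would add to an edge whose semantics are `operator`."""
    return OPERATOR_BIAS.get(strategy, {}).get(operator, 0.0)
```

Because the library stores preferences over operators rather than concrete action IDs, the same entry can be reused in a game where a different action happens to implement translate.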

4. Planners: BFS, A*, and MCTS-lite

4.1 BFS over observed transitions

When the agent has explored sufficiently, BFSPlanner finds shortest paths from the current state hash to known WIN-adjacent or high-reward states. Depth is capped (v1: 6) for competition latency. On observed edges BFS is complete up to that cap: if a path of at most 6 steps exists in the log, the planner finds it, and the path it returns is a shortest one.
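
A depth-capped BFS over logged transitions can be sketched as follows. This is an illustration, not the BFSPlanner source: the transition table layout (state hash to a dict of action and next state hash) and the function name bfs_plan are assumptions.

```python
from collections import deque
from typing import Dict, List, Optional, Set, Tuple

# Hypothetical layout: state hash -> {action: next state hash}
Transitions = Dict[str, Dict[str, str]]


def bfs_plan(transitions: Transitions, start: str, goals: Set[str],
             max_depth: int = 6) -> Optional[List[Tuple[str, str]]]:
    """Return a shortest list of (action, predicted state) pairs, or None."""
    if start in goals:
        return []
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if len(path) >= max_depth:
            continue  # depth cap: do not expand beyond max_depth steps
        for action, nxt in transitions.get(state, {}).items():
            if nxt in visited:
                continue
            step_path = path + [(action, nxt)]
            if nxt in goals:
                return step_path
            visited.add(nxt)
            queue.append((nxt, step_path))
    return None  # no path within the cap among observed edges
```

Returning None corresponds to the success=False case: the goal may still be reachable, just not through edges the agent has observed within the cap.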

4.2 A* with semantic heuristic

When multiple paths exist, AStarPlanner breaks ties using:

f(s) = g(s) + h(s)
h(s) = α · semantic_alignment(strategy, s) + β · graph_distance(s, goal_region)

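This biases search toward states where Phase 4 semantics match the active strategy. Because A* expands the lowest f first, that bias holds only if the alignment term enters h as a cost (lower for better-aligned states) or α is negative.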
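
The sketch below shows one way such a search could look. It assumes the alignment term is supplied as a misalignment cost so that lower h means a better match, and it takes both heuristic inputs as precomputed lookup tables; the names astar_plan and make_heuristic are hypothetical. Since this h is not guaranteed to be admissible, the returned path is strategy-preferred but not necessarily the shortest.

```python
import heapq
from itertools import count
from typing import Callable, Dict, List, Optional, Set, Tuple

Transitions = Dict[str, Dict[str, str]]  # state hash -> {action: next state hash}


def make_heuristic(misalignment: Dict[str, float], distance: Dict[str, float],
                   alpha: float = 0.5, beta: float = 0.5) -> Callable[[str], float]:
    """h(s) = alpha * misalignment(s) + beta * distance(s); unknown states cost 0."""
    def h(state: str) -> float:
        return alpha * misalignment.get(state, 0.0) + beta * distance.get(state, 0.0)
    return h


def astar_plan(transitions: Transitions, start: str, goals: Set[str],
               heuristic: Callable[[str], float],
               max_depth: int = 6) -> Optional[List[Tuple[str, str]]]:
    tie = count()  # insertion counter so the heap never compares paths
    frontier = [(heuristic(start), next(tie), 0, start, [])]
    best_g = {start: 0}
    while frontier:
        _, _, g, state, path = heapq.heappop(frontier)
        if state in goals:
            return path
        if g >= max_depth or g > best_g.get(state, g):
            continue  # depth cap reached, or a cheaper route was found later
        for action, nxt in transitions.get(state, {}).items():
            ng = g + 1  # every observed edge costs one action
            if ng < best_g.get(nxt, float("inf")):
                best_g[nxt] = ng
                heapq.heappush(
                    frontier,
                    (ng + heuristic(nxt), next(tie), ng, nxt, path + [(action, nxt)]),
                )
    return None
```

With two equally long routes to a goal, the one passing through lower-cost states is expanded first and returned.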

4.3 MCTS-lite for sparse graphs

When BFS fails (success=False), MCTSPlannerLite performs lightweight rollouts:

Match goal template → strategy.
Score each candidate action by strategy–semantic alignment + Phase 4 confidence.
Select best action; replan next step.

MCTS-lite is single-step in v1 — not full tree search — preserving Kaggle runtime bounds.
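
The three steps above reduce to a one-step argmax, which might look like the sketch below. The candidate format (action mapped to a semantic label and a Phase 4 confidence), the bias table, and the weights w_align and w_conf are assumptions for illustration.

```python
from typing import Dict, Tuple


def mcts_lite_step(candidates: Dict[str, Tuple[str, float]],
                   operator_bias: Dict[str, float],
                   w_align: float = 1.0, w_conf: float = 1.0) -> str:
    """Pick the action with the best alignment-plus-confidence score.

    candidates: action -> (semantic label, Phase 4 confidence)
    operator_bias: semantic label -> preference under the active strategy
    """
    def score(item: Tuple[str, Tuple[str, float]]) -> float:
        _, (semantic, confidence) = item
        return w_align * operator_bias.get(semantic, 0.0) + w_conf * confidence

    # Sorting first makes ties resolve deterministically by action name.
    best_action, _ = max(sorted(candidates.items()), key=score)
    return best_action
```

The caller would execute the returned action, observe the new state, and call the function again, which is the "replan next step" loop in its simplest form.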

5. Meta-control: explore, exploit, and plan

Phase 5 experiment planning maximizes hypothesis discrimination. Phase 6 planning maximizes goal pursuit. These objectives conflict early in episodes when goals are uncertain.

The MetaController resolves this with three modes:

Mode	Condition	Effect
`explore`	Low visitation or high uncertainty	Down-weight plan; up-weight novelty
`exploit`	High goal confidence	Up-weight plan and goal alignment
`balanced`	Default	Equal weight blend

w_explore, w_goal, w_plan = blend(mode)
score(a) = w_explore · novelty + w_goal · goal(a) + w_plan · plan(a) + w_sem · sem(a)

This is the game analog of adaptive experimentation: explore broadly until the objective is sufficiently identified, then commit to a perturbation protocol.
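
A minimal sketch of the mode switch and the blended score follows. The mode names and the four score terms come from the text; the thresholds, the weight presets, and the fixed w_sem default are hypothetical values chosen only to make the example concrete.

```python
from typing import Dict

# Hypothetical weight presets per mode: (explore, goal, plan).
WEIGHTS: Dict[str, Dict[str, float]] = {
    "explore": {"explore": 0.6, "goal": 0.2, "plan": 0.2},
    "exploit": {"explore": 0.1, "goal": 0.45, "plan": 0.45},
    "balanced": {"explore": 1 / 3, "goal": 1 / 3, "plan": 1 / 3},
}


def choose_mode(visits: int, uncertainty: float, goal_confidence: float,
                min_visits: int = 3, max_uncertainty: float = 0.6,
                min_goal_conf: float = 0.7) -> str:
    """Low visitation or high uncertainty wins over goal confidence."""
    if visits < min_visits or uncertainty > max_uncertainty:
        return "explore"
    if goal_confidence >= min_goal_conf:
        return "exploit"
    return "balanced"


def score_action(mode: str, novelty: float, goal: float, plan: float,
                 sem: float, w_sem: float = 0.25) -> float:
    """score(a) = w_explore*novelty + w_goal*goal + w_plan*plan + w_sem*sem."""
    w = WEIGHTS[mode]
    return (w["explore"] * novelty + w["goal"] * goal
            + w["plan"] * plan + w_sem * sem)
```

Under these presets an on-plan action outranks a purely novel one in exploit mode, and the ordering flips in explore mode.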

6. Reset and plan repair

Plans fail. ASRA Phase 6 treats failure as first-class:

Plan repair (PlanRepairSystem):

Remove failed action from remaining steps.
Re-run BFS from current state.
Fall back to MCTS-lite if repair fails.
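
The repair steps can be sketched as below. This reads "remove failed action" as banning the failed (state, action) edge before re-running BFS, which is one plausible interpretation and not necessarily how PlanRepairSystem works; the function names and the fallback callback are hypothetical.

```python
from collections import deque
from typing import Callable, Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str]                   # (state hash, action)
Transitions = Dict[str, Dict[str, str]]  # state hash -> {action: next state hash}


def bfs_avoiding(transitions: Transitions, start: str, goals: Set[str],
                 banned: Set[Edge], max_depth: int = 6) -> Optional[List[str]]:
    """Shortest action list to a goal that never uses a banned edge."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, path = queue.popleft()
        if state in goals:
            return path
        if len(path) >= max_depth:
            continue
        for action, nxt in transitions.get(state, {}).items():
            if (state, action) in banned or nxt in visited:
                continue
            visited.add(nxt)
            queue.append((nxt, path + [action]))
    return None


def repair_plan(transitions: Transitions, current: str, goals: Set[str],
                failed_edge: Edge,
                fallback: Callable[[str], str]) -> Tuple[str, List[str]]:
    """Try BFS without the failed edge; otherwise ask the fallback for one action."""
    path = bfs_avoiding(transitions, current, goals, {failed_edge})
    if path is not None:
        return "bfs", path
    return "mcts_lite", [fallback(current)]
```

Banning the edge rather than only dropping the step keeps the re-run BFS from proposing the same failed action again.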

Reset policy (ResetPolicy):

Trigger when stuck counter ≥ 5 or action budget exhausted.
Clears plan cache but preserves goal hypotheses — the objective belief survives; only execution restarts.
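
A reset rule of this shape might be sketched as follows. The threshold of 5 comes from the text; the definition of "stuck" as consecutive steps without a state change, the default action budget, the AgentMemory container, and the decision to zero both counters on reset are assumptions for this example.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ResetPolicy:
    stuck_limit: int = 5       # from the text: stuck counter >= 5
    action_budget: int = 100   # hypothetical budget
    stuck: int = 0
    used: int = 0

    def observe(self, state_changed: bool) -> None:
        """Count one action; the stuck counter resets whenever the state changes."""
        self.used += 1
        self.stuck = 0 if state_changed else self.stuck + 1

    def should_reset(self) -> bool:
        return self.stuck >= self.stuck_limit or self.used >= self.action_budget


@dataclass
class AgentMemory:
    plan_cache: Dict[str, List[str]] = field(default_factory=dict)
    goal_hypotheses: Dict[str, float] = field(default_factory=dict)


def apply_reset(policy: ResetPolicy, memory: AgentMemory) -> None:
    """Drop cached plans and counters; leave goal hypotheses untouched."""
    memory.plan_cache.clear()
    policy.stuck = 0
    policy.used = 0
```

Keeping goal_hypotheses out of the reset path is what lets the objective belief survive while execution restarts.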

Without repair, a planner can oscillate, retrying the same failed edge until the episode times out. Phase 7 extends stuck detection; Phase 6 provides the recovery mechanism.

7. Closing the loop with Phases 1–5

Layer	Phase 6 consumption
Phase 1 transitions	BFS edge table
Phase 2 scenes	Plan preconditions, alignment checks
Phase 3 exploration graph	BFS frontier; novelty for meta-controller
Phase 3 subgoals	Plan milestones
Phase 4 semantics	MCTS scoring; A* heuristic
Phase 4 uncertainty	Meta-controller explore trigger
Phase 5 leading hypothesis	Strategy selection, plan objective
Phase 5 experiment planner	Fallback when goal confidence < threshold

Kaggle agent scoring (embedded):

score(action) = Phase1–5_terms + PLAN_WEIGHT · plan_step_match + STRAT_WEIGHT · strategy_score

Reasoning strings:

ASRA Phase6: ACTION3 | sem=translate conf=0.81 | goal=move_to_target | strat=reach_target | plan=bfs:2/4 | mode=exploit
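
For log analysis, a reasoning string in this layout can be produced and parsed with a few lines of Python. The field order and separators are inferred from the single example above; the function names are hypothetical and the agent's real formatter may differ.

```python
from typing import Dict


def format_reasoning(action: str, sem: str, conf: float, goal: str, strat: str,
                     planner: str, step: int, total: int, mode: str) -> str:
    """Build a reasoning string in the layout of the example above."""
    return (f"ASRA Phase6: {action} | sem={sem} conf={conf:.2f} | goal={goal} | "
            f"strat={strat} | plan={planner}:{step}/{total} | mode={mode}")


def parse_reasoning(line: str) -> Dict[str, str]:
    """Split a reasoning string back into its key=value fields."""
    head, *rest = line.split(" | ")
    fields = {"action": head.split(": ", 1)[1]}
    for chunk in rest:
        for pair in chunk.split():       # "sem=translate conf=0.81" holds two pairs
            key, value = pair.split("=", 1)
            fields[key] = value
    return fields
```

Parsing the plan field (for example bfs:2/4) gives the planner name and the step position, which is enough to compute a plan usage rate from logged episodes.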

8. Empirical landscape

Phase 6 metrics target competition performance (Milestone #2):

Benchmark	Metric
ARC-AGI-3	Win rate; actions to win; plan usage rate
MiniGrid	Path length vs optimal; unlock sequences
Procgen	Plan success on held-out seeds
Crafter	Long-horizon `sequence` strategy survival

Phase 6 targets Milestone #2: it is the first phase where competition win rate is the primary success criterion.

9. Architecture: library and embedded engine

Research library (asra-arc/src/asra/planning/):

schemas.py           — Plan, PlanStep, Strategy
bfs_planner.py       — BFS over transition graph
mcts_planner.py      — MCTSPlannerLite
strategy_library.py  — Goal template → strategy mapping
meta_controller.py   — ExploreExploitMode, ResetPolicy, PlanRepairSystem
policy_v5.py         — PlanningExplorationPolicyV5

Kaggle embedded engine (asra_phase6_my_agent.py):

PlanningEngine — compact BFS + MCTS-lite + meta-controller
No external imports; self-contained for sandbox
Composes atop embedded Phase 5 GoalHypothesisEngine

flowchart TB
  subgraph Embedded["Kaggle agent"]
    GE[GoalHypothesisEngine]
    PE[PlanningEngine]
    POL[Policy v5]
  end
  subgraph Library["asra-arc/planning/"]
    FULL[Full planners + adapters]
  end
  GE --> PE --> POL
  FULL -.->|specification| PE

10. Agent integration

Version	Tag	Layer added
Phase 5	`asra-v0.7-phase5`	Goal hypotheses
Phase 6	`asra-v0.8-phase6`	Planning, strategies, meta-control

Package: kaggle-notebooks/phase6/

Build:

cd kaggle-notebooks/phase6
python3 build_phase6_kaggle_notebook.py
python3 asra_phase6_my_agent.py --self-test

The notebook writes my_agent.py, self-tests perception, exploration, causality, goals, and planning, and emits a validation Parquet file.

11. Position in the ASRA research program

Question	Phase 5	Phase 6
Why try action a?	Goal alignment, discrimination	+ Plan step, strategy
Unit of task memory	Ranked hypotheses	+ Active plan
What is success?	Inferred win condition	+ Action sequence toward win
Bridge to biology	Latent objective	Perturbation protocol sequencing

From the Decision Biology roadmap:

pathway hypothesis  →  perturbation protocol  →  measured response
goal hypothesis     →  action plan            →  observed progress

Phase 6 is where objectives become procedures — the direct precursor to Phase 8 perturbation sequencing on LINCS data.

12. Open problems and next steps

Robustness (Phase 7) — stuck detection, generalization suite, action waste.
Learned value functions — when semantic rollouts saturate.
Hierarchical planning — Crafter-scale goal decomposition.
Cross-game plan transfer — strategy library priors from Original ARC.
Decision Biology (Phase 8) — pathway protocol planner on OmniPath graphs.

13. Conclusion

ASRA Phase 6 transforms goal beliefs into action sequences: the strategy library indexes reusable protocols; BFS and MCTS-lite search over partial transition knowledge; the meta-controller balances exploration, discrimination, and execution; reset and plan repair prevent commitment traps.

The Phase 6 Kaggle extension is not a new agent philosophy — it is Phases 1–5 + commitment. Goals describe why actions matter; plans describe how to chain them.

Transition-centric adaptive reasoning remains the core; planning is how inferred objectives become operational — the prerequisite for robustness tuning, final submission, and biological perturbation protocols in later phases.

Reference notebook (GitHub & Kaggle)

Interactive companion with Phases 2–5 stacks plus Phase 6 planning hints (PlanningEngine, strategy library, meta-controller):

References

Sutton, R. S. & Barto, A. G. Reinforcement Learning: An Introduction. MIT Press (conceptual lineage for explore-exploit).
Pearl, J. Causality: Models, Reasoning, and Inference. Cambridge University Press (intervention sequencing).
Ilakkuvaselvi Manoharan. Transition-Centric Adaptive Reasoning: ASRA Phase 1. https://sci-layer.vercel.app/articles/transition-centric-adaptive-reasoning-asra-phase-1
Ilakkuvaselvi Manoharan. Object-Centric Adaptive Reasoning: ASRA Phase 2. https://sci-layer.vercel.app/articles/object-centric-adaptive-reasoning-asra-phase-2
Ilakkuvaselvi Manoharan. Directed Exploration and Episodic Memory: ASRA Phase 3. https://sci-layer.vercel.app/articles/directed-exploration-episodic-memory-asra-phase-3
Ilakkuvaselvi Manoharan. Causal Action Semantics: ASRA Phase 4. https://sci-layer.vercel.app/articles/causal-action-semantics-asra-phase-4
Ilakkuvaselvi Manoharan. Goal Inference and Hypothesis Ranking: ASRA Phase 5. https://sci-layer.vercel.app/articles/goal-inference-hypothesis-ranking-asra-phase-5
Phase 6 planning implementation — https://github.com/ilakkmanoharan/asra/tree/main/asra-arc/src/asra/planning


Correspondence: ilakkmanoharan@gmail.com

Planning and Strategy Invention: ASRA Phase 6 — From Goals to Action Sequences