SciLayer

Adaptive Scientific Discovery Benchmark (ASDB): A Two-Track Framework for Evaluating Interactive Agents

Most agent benchmarks assume documented tool semantics and static ground-truth answers. Real scientific inquiry instead requires agents to learn what interventions do from observed state transitions, and then to infer hidden mechanisms, design discriminating experiments, and predict held-out observables. ASDB unifies two complementary tracks: Action Semantics Discovery (Track A), in which the agent infers an action map φ̂(a) from unlabeled controls, and Scientific Discovery Evaluation (Track B), in which it recovers a hidden theory class under an intervention budget. Both tracks share one interaction loop but score different constructs. Linked A→B episodes, decoy falsification, tiered difficulty, and decomposable metrics are intended to give the benchmark construct validity as an evaluation of adaptive scientific reasoning.
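To make the two constructs concrete, here is a minimal Python sketch of a shared interaction loop scored two ways. It is an illustration under stated assumptions, not the ASDB implementation. The toy environment, the one-variable additive action effects, the greedy "maximize disagreement" experiment choice, and the constant "decoy" theory are all assumptions, and every name (`ToyEnv`, `infer_action_map`, `action_map_accuracy`, `discover_theory`) is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ToyEnv:
    """Hypothetical environment: opaque action labels with hidden additive effects."""
    hidden_effects: dict  # action label -> (variable, delta); never shown to the agent
    state: dict = field(default_factory=lambda: {"x": 0, "y": 0, "z": 0})

    def step(self, action):
        var, delta = self.hidden_effects[action]
        self.state[var] += delta
        return dict(self.state)  # the observation is a copy of the full state


def infer_action_map(env, actions):
    """Track A sketch: probe each unlabeled action once and read its effect off the state diff."""
    phi_hat = {}
    for a in actions:
        before = dict(env.state)
        after = env.step(a)
        diff = {k: after[k] - before[k] for k in after if after[k] != before[k]}
        # Simplifying assumption: each action changes exactly one variable by a nonzero amount.
        (var, delta), = diff.items()
        phi_hat[a] = (var, delta)
    return phi_hat


def action_map_accuracy(phi_hat, phi_true):
    """Fraction of actions whose inferred effect matches the hidden ground truth."""
    return sum(phi_hat.get(a) == eff for a, eff in phi_true.items()) / len(phi_true)


def discover_theory(candidates, true_theory, interventions, budget):
    """Track B sketch: falsify candidate theories (including decoys) under an intervention budget."""
    alive = dict(candidates)
    used = 0
    for _ in range(budget):
        if len(alive) <= 1:
            break
        # Greedy experiment design: pick the intervention on which surviving theories disagree most.
        best = max(interventions, key=lambda u: len({f(u) for f in alive.values()}))
        observed = true_theory(best)
        # Keep only theories whose prediction matches the observation.
        alive = {name: f for name, f in alive.items() if f(best) == observed}
        used += 1
    return alive, used


if __name__ == "__main__":
    phi_true = {"a0": ("x", 1), "a1": ("y", -2), "a2": ("z", 3)}
    env = ToyEnv(hidden_effects=phi_true)
    print(action_map_accuracy(infer_action_map(env, list(phi_true)), phi_true))  # 1.0

    candidates = {
        "linear": lambda u: 2 * u,
        "quadratic": lambda u: u * u,
        "decoy_const": lambda u: 4,  # hypothetical decoy theory
    }
    survivors, used = discover_theory(candidates, candidates["linear"], [0, 1, 2, 3], budget=3)
    print(set(survivors), used)  # {'linear'} 1
```

In this toy run the intervention `u = 1` separates all three theories at once (predictions 2, 1, and 4), so one experiment suffices. Scoring the interventions consumed separately from the final answer is one way a decomposable metric could distinguish efficient experiment design from lucky guessing.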

Architectures for Adaptive Scientific Reasoning Under Uncertainty

Scientific intelligence increasingly depends on systems that reason from interventions rather than merely fit observations. This review synthesizes conceptual foundations from model-based reinforcement learning, active inference, causal inference, information theory, and perturbation biology into a unified architecture-level view of adaptive scientific reasoning under uncertainty.