Agent policies from higher-order causal functions
Abstract
We establish a correspondence between equivalence classes of agent-state policies for deterministic POMDPs and one-input process functions (the classical-deterministic limit of higher-order quantum operations). We use this correspondence to build a bridge between the agent-environment interaction in artificial intelligence, causal structure in the foundations of physics, and logic in computer science. We construct a *-autonomous category PF of types which supports an interpretation of one-step evaluation of policies, and of multi-agent observation constraints, as cuts and monoidal products. In terms of types, we develop the correspondence further by identifying observation-independent decentralised POMDPs (dec-POMDPs) as the natural domain for the multi-input process functions used to model indefinite causality. We then prove a strict separation between the performance of general multi-input process functions and that of definite-ordered process functions on such dec-POMDPs, by exhibiting an instance in which policies utilising an indefinite causal structure achieve greater finite-horizon rewards than policies restricted to a fixed background causal structure.