TAPE: Tool-Guided Adaptive Planning and Constrained Execution in Language Model Agents
Authors: Jongwon Jeong, Jungtaek Kim, Kangwook Lee
Institution: University of Wisconsin-Madison
Published: February 23, 2026
Upvotes: 4
What This Research Is About
Language model agents can complete complex multi-step tasks, but they have a critical weakness: in environments with strict constraints, even a single mistake can make the whole task fail. Think of it like walking a tightrope: one wrong step and the entire mission fails.
The researchers identified two main culprits: imperfect planning (choosing the wrong sequence of actions) and stochastic execution (random errors while carrying out a plan). TAPE targets both, using multi-plan aggregation with an external solver for planning and constrained decoding for execution.
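To make the tightrope intuition concrete, here is a minimal sketch of how small per-step error rates compound over a long task. It is an illustration only, not a model from the paper: it assumes every step succeeds independently with the same probability and that any single error is fatal, and the function name `task_success_probability` is hypothetical.

```python
def task_success_probability(per_step_accuracy: float, num_steps: int) -> float:
    """Chance that every step succeeds when one error ends the task.

    Assumes independent steps with identical accuracy (a simplification).
    """
    if not 0.0 <= per_step_accuracy <= 1.0:
        raise ValueError("per_step_accuracy must be between 0 and 1")
    return per_step_accuracy ** num_steps


# Even a 98%-accurate agent becomes unreliable as the task gets longer.
for steps in (5, 20, 50):
    probability = task_success_probability(0.98, steps)
    print(f"{steps} steps: {probability:.3f}")
```

Under these assumptions, overall reliability decays geometrically with task length, which is why reducing both planning mistakes and execution noise matters in constrained environments.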
Why This Matters
- Massive Performance Gains: TAPE improves success rates by 21.0 percentage points on challenging tasks and by 20.0 percentage points for weaker models.
- Real-World Applications: This reliability matters when deploying AI agents in high-stakes settings such as robotics, automated customer service, and complex workflow automation, where errors can be costly.
- Smarter Planning: Instead of blindly following a single plan, TAPE generates multiple plans, combines them into a graph, and uses an external solver to find a feasible path.
- Error Recovery: When things don't go as planned, TAPE detects deviations and adaptively re-plans on the fly, making agents much more resilient.
- Reduced Noise: Through constrained decoding during execution, TAPE minimizes random errors that plague traditional approaches.
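The execution-side ideas in the list above (constrained decoding and deviation detection) can be sketched roughly as follows. This is a simplified illustration rather than the authors' implementation: the action names, the score dictionary, and the helpers `constrained_choice` and `needs_replan` are all hypothetical, and a real system would apply the constraint to the model's token-level outputs.

```python
from typing import Dict, Iterable


def constrained_choice(scores: Dict[str, float], allowed: Iterable[str]) -> str:
    """Return the highest-scoring action among those the current plan allows."""
    allowed_set = set(allowed)
    candidates = {a: s for a, s in scores.items() if a in allowed_set}
    if not candidates:
        raise ValueError("no allowed action has a score")
    return max(candidates, key=candidates.get)


def needs_replan(expected_state: str, observed_state: str) -> bool:
    """Flag a deviation when the environment does not match the plan."""
    return expected_state != observed_state


# Hypothetical model scores: the unconstrained favourite is not in the plan.
scores = {"push_left": 0.5, "move_up": 0.3, "move_down": 0.2}
action = constrained_choice(scores, allowed=["move_up", "move_down"])
print(action)
```

Restricting the choice to plan-approved actions removes the chance of sampling an off-plan action, while the state comparison gives a simple trigger for re-planning when feedback diverges from expectations.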
The Technical Innovation
TAPE introduces three key innovations:
- Graph-Based Multi-Plan Aggregation: Combines multiple candidate plans into a unified graph structure
- External Solver Integration: Uses an external solver to identify a feasible execution path through the plan graph
- Adaptive Re-Planning: Monitors environmental feedback and adjusts the plan when reality diverges from expectations
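A minimal sketch of the planning side, under stated assumptions: candidate plans are represented as lists of hypothetical `(state, action, next_state)` steps, merged into one graph, and searched for a path that reaches the goal within a step budget. Breadth-first search stands in for the external solver here; the summary above does not say which solver TAPE uses, and all names in the block are illustrative.

```python
from collections import deque
from typing import Dict, List, Optional, Tuple

Step = Tuple[str, str, str]  # (state, action, next_state)
Graph = Dict[str, Dict[str, str]]


def build_plan_graph(plans: List[List[Step]]) -> Graph:
    """Merge candidate plans into one graph: state -> {action: next_state}."""
    graph: Graph = {}
    for plan in plans:
        for state, action, next_state in plan:
            graph.setdefault(state, {})[action] = next_state
            graph.setdefault(next_state, {})
    return graph


def find_feasible_path(graph: Graph, start: str, goal: str,
                       max_steps: int) -> Optional[List[str]]:
    """Breadth-first search for an action sequence within the step budget."""
    queue = deque([(start, [])])
    visited = {start}
    while queue:
        state, actions = queue.popleft()
        if state == goal:
            return actions
        if len(actions) == max_steps:
            continue  # budget exhausted on this branch
        for action, next_state in graph.get(state, {}).items():
            if next_state not in visited:
                visited.add(next_state)
                queue.append((next_state, actions + [action]))
    return None


# Plan A reaches the goal but takes 4 steps; plan B is shorter but ends in a dead end.
plan_a = [("s0", "a", "s1"), ("s1", "b", "s2"), ("s2", "c", "s3"), ("s3", "d", "goal")]
plan_b = [("s0", "e", "s4"), ("s4", "f", "s3"), ("s3", "g", "dead")]
graph = build_plan_graph([plan_a, plan_b])
path = find_feasible_path(graph, "s0", "goal", max_steps=3)
print(path)
```

In this toy example neither plan satisfies the three-step budget on its own, but the merged graph contains a path that borrows the first two steps of plan B and the last step of plan A. That is the intuition behind aggregating plans before selecting one.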
Proven Results
The framework was tested across multiple challenging benchmarks including Sokoban (puzzle-solving), ALFWorld (household tasks), MuSiQue (multi-hop reasoning), and GSM8K-Hard (mathematical reasoning). TAPE consistently outperformed existing frameworks, especially on the hardest problems where traditional agents struggled most.
Curated from Hugging Face daily papers | Research from University of Wisconsin-Madison