AGENT2WORLD: Enhancing Symbolic World Models via Adaptive Multi-Agent Feedback
Introduction
Agent2World is a framework for generating executable symbolic world models from natural-language task descriptions. It reasons over task-specific details to produce environment code, such as PDDL domains and runnable simulators. By incorporating adaptive multi-agent feedback, Agent2World iteratively refines its generated outputs, improving the accuracy of the generated world models and ensuring behavior-level correctness.
Key Model Improvements
Agent2World achieves notable enhancements in its ability to generate world models through the following capabilities:
Task Understanding and Code Generation: The model is trained to interpret task descriptions and generate the corresponding environment code, such as PDDL domains and interactive simulators. This involves reasoning over task details to produce code that accurately reflects the requirements of the given task.
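To make the task-to-code interface concrete, here is a minimal sketch. The `generate_world_model` function and the PDDL template inside it are illustrative stand-ins, not Agent2World's actual API; a real system would prompt the underlying LLM rather than fill a fixed template.

```python
# Hypothetical sketch of the task -> environment-code interface.
# A real system would prompt an LLM; this stub only shows the
# expected shape of the output (a PDDL domain as a string).

def generate_world_model(task_description: str) -> str:
    """Return environment code (here, a minimal PDDL domain) for a task."""
    domain_name = task_description.lower().replace(" ", "-")
    return (
        f"(define (domain {domain_name})\n"
        "  (:requirements :strips)\n"
        "  (:predicates (at ?x) (done))\n"
        "  (:action finish\n"
        "    :parameters ()\n"
        "    :precondition ()\n"
        "    :effect (done)))\n"
    )
```

The returned string can then be handed to a PDDL planner or validator as the environment definition.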
Multi-Agent Feedback for Code Correction: Agent2World uses multi-agent feedback to improve task execution. Agents provide iterative corrections on the generated code, allowing the model to adjust its outputs in real time and increasing code accuracy and behavior alignment.
Execution-Based Refinement: The model also integrates execution-based feedback, refining the generated code based on how it behaves when run. This continuous improvement ensures the generated models are not only syntactically correct but also functional in practice.
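The two feedback loops above can be sketched as a single refinement routine. The `generate` and `critique` callables below are hypothetical stand-ins for the code-generating model and the feedback agents; only the loop structure (execute, collect errors or agent feedback, regenerate) reflects the description in the text.

```python
# Illustrative sketch of execution-based refinement with multi-agent feedback.
# `generate` and `critique` are hypothetical interfaces, not Agent2World's API.
import traceback

def refine(task: str, generate, critique, max_rounds: int = 3) -> str:
    """Generate code, execute it, and feed errors/agent feedback back in."""
    feedback = ""
    code = ""
    for _ in range(max_rounds):
        code = generate(task, feedback)
        try:
            # Execution-based feedback: actually run the candidate code.
            exec(compile(code, "<world-model>", "exec"), {})
        except Exception:
            feedback = traceback.format_exc()  # pass the traceback back in
            continue
        feedback = critique(code)  # multi-agent feedback on a runnable draft
        if not feedback:           # all agents approve -> done
            return code
    return code
```

For example, a stub generator that first emits broken code and then repairs it once it sees the traceback converges in two rounds:

```python
def generate(task, feedback):
    return "x = 1 / 0" if not feedback else "x = 1"

def critique(code):
    return ""  # toy agents that always approve

refined = refine("demo task", generate, critique)
```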
Supervised Fine-Tuning: Supervised fine-tuning (SFT) on task-specific trajectories and the collected feedback improves the model's code generation, yielding a 30.95% improvement in execution accuracy.
Benchmarks
Agent2World has been evaluated on several benchmarks to showcase its performance improvements:
Training Setup
Hardware Configuration
Training Devices:
- Llama-3.1-8B-Instruct model training was conducted on an 8xA100 GPU server.
- GPT-4.1-mini experiments were run on a CPU server without GPU acceleration.
Hyperparameters
- Learning Rate: 1 × 10⁻⁶
- Epochs: 5
- Optimizer: AdamW
- Batch Size: chosen to fit available GPU memory
Dataset Details
- Training Dataset: The model was trained on a diverse set of world model specifications, including PDDL, MuJoCo-style environments, text games, and MCP-style tool environments.
- Data Generation and Filtering: A total of 2,400 candidate interaction trajectories were generated, filtered down to 1,526 validated trajectories using verifier-guided rejection sampling.
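The filtering step above (2,400 candidates down to 1,526 validated trajectories) is a rejection-sampling pass over a verifier. A minimal sketch, assuming a hypothetical `verifier` callable that accepts or rejects a trajectory:

```python
# Sketch of verifier-guided rejection sampling over candidate trajectories.
# The `verifier` callable is a hypothetical interface; the counts in the
# text (2,400 -> 1,526) come from applying such a filter.
from typing import Callable, Iterable, List

def rejection_filter(candidates: Iterable[str],
                     verifier: Callable[[str], bool]) -> List[str]:
    """Keep only the trajectories the verifier accepts."""
    return [traj for traj in candidates if verifier(traj)]
```

With a toy verifier, only accepted trajectories survive:

```python
candidates = ["ok:pick-place", "bad:invalid-action", "ok:stack"]
kept = rejection_filter(candidates, lambda t: t.startswith("ok"))
```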
License Agreement
Agent2World is licensed under the Apache 2.0 License. Please see the LICENSE file for more details.
Citation
If you use Agent2World in your research, please cite:
Model: agent2world/llama3.1_8b_instruct_full_sft_v1_3_epoch
Base model: meta-llama/Llama-3.1-8B