Updates
2025-05-06
New Features
- We have added support for Bi-level GAE. In Sokoban tasks it improves model performance by roughly 11% in relative terms (21% → 23.4%). We may make it the default configuration after larger-scale verification.
To turn on Bi-level GAE, pass the algorithm.bi_level_gae option when running the code.
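The changelog does not spell out how Bi-level GAE is computed. The sketch below shows one plausible two-level scheme, under our own assumptions: GAE is first run across turns (one step per turn, using the value at each turn's last token), and then standard token-level GAE is run inside each turn, with no reward on intermediate tokens and the last token bootstrapped toward the turn-level return. This is not RAGEN's implementation; `bi_level_gae`, `gamma_turn`, and `gamma_token` are hypothetical names, and one `lam` is shared by both levels for brevity.

```python
def gae(rewards: list[float], values: list[float], gamma: float, lam: float,
        bootstrap: float = 0.0) -> list[float]:
    """Standard GAE over one sequence; `bootstrap` is the value after the last step."""
    advantages = [0.0] * len(rewards)
    next_value, running = bootstrap, 0.0
    for t in reversed(range(len(rewards))):
        delta = rewards[t] + gamma * next_value - values[t]
        running = delta + gamma * lam * running
        advantages[t] = running
        next_value = values[t]
    return advantages


def bi_level_gae(turn_rewards: list[float], token_values: list[list[float]],
                 gamma_turn: float, gamma_token: float, lam: float) -> list[list[float]]:
    """Hypothetical two-level GAE: turn level first, then token level inside each turn.

    token_values[k] holds the critic's value for every response token of turn k;
    the value at a turn's last token is used as that turn's value (an assumption).
    """
    turn_values = [vals[-1] for vals in token_values]
    # Level 1: one GAE step per turn, with each turn's reward attached to it.
    turn_adv = gae(turn_rewards, turn_values, gamma_turn, lam)

    token_adv: list[list[float]] = []
    for k, vals in enumerate(token_values):
        turn_return = turn_adv[k] + turn_values[k]  # turn-level return estimate
        adv = [0.0] * len(vals)
        running = 0.0
        # Level 2: token-level GAE with no reward inside the turn.
        for i in reversed(range(len(vals))):
            if i == len(vals) - 1:
                # Bootstrapped toward the turn return, so this delta equals turn_adv[k].
                delta = turn_return - vals[i]
            else:
                delta = gamma_token * vals[i + 1] - vals[i]
            running = delta + gamma_token * lam * running
            adv[i] = running
        token_adv.append(adv)
    return token_adv
```

As a sanity check, with every discount and lambda set to 1, each token's advantage reduces to the trajectory return minus that token's value: `bi_level_gae([0.0, 1.0], [[0.1, 0.2], [0.3, 0.5]], 1.0, 1.0, 1.0)` gives approximately `[[0.9, 0.8], [0.7, 0.5]]`.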
Evaluation Tools
- We have added instructions for running API-based LLMs for evaluation.
Please run:
Or, optionally, add configs:
You should see results similar to:
rollout time: 89.1015682220459 seconds
rollout rewards: 2.3011717796325684
metrics:
SimpleSokoban/success: 0.27734375
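When comparing several evaluation runs (for example an API-based model against a local one), it can help to pull these numbers out of the printout programmatically. A minimal sketch, assuming one `name: value` pair per line as in the sample output above; `parse_eval_output` is a hypothetical helper, not part of RAGEN.

```python
import re

# One "name: number" pair per line; trailing units such as "seconds" are ignored.
_LINE = re.compile(r"^\s*([^:]+):\s*([-+]?\d+(?:\.\d+)?(?:[eE][-+]?\d+)?)")


def parse_eval_output(text: str) -> dict[str, float]:
    """Collect numeric fields from an evaluation printout.

    Header lines without a number (e.g. "metrics:") are skipped.
    """
    results: dict[str, float] = {}
    for line in text.splitlines():
        match = _LINE.match(line)
        if match:
            results[match.group(1).strip()] = float(match.group(2))
    return results


if __name__ == "__main__":
    sample = """rollout time: 89.1015682220459 seconds
rollout rewards: 2.3011717796325684
metrics:
SimpleSokoban/success: 0.27734375"""
    print(parse_eval_output(sample)["SimpleSokoban/success"])  # 0.27734375
```

Keys keep the metric names exactly as printed, so environment-prefixed metrics such as `SimpleSokoban/success` stay distinguishable across environments.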
Running local LLMs for evaluation is similar:
Or:
2025-05-04
Dependencies
- We have updated verl to the latest version.
  - Commit: 1e47e412a441bae8cd1152888f6822871f95dec5
  - Date: Sun May 4 19:07:22 2025 +0800
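To confirm that a local checkout matches the pinned verl commit, a small check like the following may help. It is a sketch that assumes verl is checked out as a git repository at `./verl` relative to where you run it (for example as a submodule); `matches_pin` and `verl_head` are hypothetical helpers, not part of RAGEN.

```python
import subprocess

# Pinned verl commit from this changelog entry.
PINNED_VERL_COMMIT = "1e47e412a441bae8cd1152888f6822871f95dec5"


def matches_pin(head: str, pinned: str = PINNED_VERL_COMMIT) -> bool:
    """True if `head` (a full or abbreviated SHA, at least 7 chars) is the pinned commit."""
    head = head.strip().lower()
    return len(head) >= 7 and pinned.startswith(head)


def verl_head(repo_path: str = "verl") -> str:
    """Return the commit SHA checked out at `repo_path` (assumed location of verl)."""
    result = subprocess.run(
        ["git", "-C", repo_path, "rev-parse", "HEAD"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout.strip()
```

For example, run from the repository root, `matches_pin(verl_head())` returns `True` when the checkout is at the pinned commit; `verl_head` raises `subprocess.CalledProcessError` if `./verl` is not a git repository.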
Code Improvements
- We have further updated the vLLM LoRA implementation to align with verl PR #1127.
- Added config validation that rejects invalid configurations.
Note: In the current version, LoRA rollout may be slow. This is expected: the LoRA path uses less memory at the cost of speed.
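For intuition on why LoRA is lighter on memory: a rank-r adapter on a d_out × d_in weight matrix learns the update as B·A (B is d_out × r, A is r × d_in), so it adds r·(d_in + d_out) parameters instead of updating all d_in·d_out entries. The arithmetic below uses an illustrative 4096 × 4096 layer with rank 8; these sizes are assumptions, not RAGEN defaults, and the sketch does not model where the rollout slowdown comes from.

```python
def lora_param_counts(d_in: int, d_out: int, rank: int) -> tuple[int, int]:
    """Return (full_update_params, lora_adapter_params) for one linear layer.

    LoRA expresses the weight update as B @ A with
    A of shape (rank, d_in) and B of shape (d_out, rank).
    """
    full = d_in * d_out
    lora = rank * (d_in + d_out)
    return full, lora


if __name__ == "__main__":
    full, lora = lora_param_counts(4096, 4096, rank=8)
    print(f"full: {full:,}  lora: {lora:,}  ratio: {lora / full:.2%}")
    # full: 16,777,216  lora: 65,536  ratio: 0.39%
```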
2025-05-02
Configuration Updates
- In the RAGEN paper, we did not set enable_response_mask. We find that enabling the response mask can improve the stability of rollout/old_log_prob, since $P(s_t \mid s_0, a_0^T, r_0, \ldots)$ is no longer computed there. In the updated version, enable_response_mask defaults to True.
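A minimal sketch of what a response mask does, assuming a multi-turn sequence made of environment-provided tokens (such as states $s_t$) interleaved with model-generated tokens: only tokens with mask 1 contribute to the statistic, so the log-probability of environment tokens never enters it. The function names and the `("env", n)` / `("model", n)` segment format are hypothetical; this is not verl's or RAGEN's actual masking code.

```python
def build_response_mask(segments: list[tuple[str, int]]) -> list[int]:
    """Expand (role, n_tokens) segments into a per-token mask.

    Tokens produced by the model get 1; environment-provided tokens get 0.
    """
    mask: list[int] = []
    for role, n_tokens in segments:
        mask.extend([1 if role == "model" else 0] * n_tokens)
    return mask


def masked_mean_logprob(logprobs: list[float], mask: list[int]) -> float:
    """Average the per-token log-probabilities where mask == 1."""
    if len(logprobs) != len(mask):
        raise ValueError("logprobs and mask must have the same length")
    kept = [lp for lp, m in zip(logprobs, mask) if m]
    if not kept:
        raise ValueError("mask selects no tokens")
    return sum(kept) / len(kept)


if __name__ == "__main__":
    segments = [("env", 2), ("model", 3), ("env", 1)]
    logprobs = [-5.0, -4.0, -0.5, -1.0, -1.5, -6.0]
    mask = build_response_mask(segments)
    print(mask)                                 # [0, 0, 1, 1, 1, 0]
    print(masked_mean_logprob(logprobs, mask))  # -1.0
    print(sum(logprobs) / len(logprobs))        # -3.0 (environment tokens included)
```

In this toy example the low-probability environment tokens would pull the unmasked average down to -3.0; with the mask, only the model's own tokens are scored.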