arXiv 2507.21053
Flow Matching Policy Gradients
By David McAllister, Songwei Ge, et al.
Published 2025-07-28
Flow-based generative models, including diffusion models, excel at modeling continuous distributions in high-dimensional spaces. In this work, we introduce Flow Policy Optimization (FPO), a simple on-policy reinforcement learning algorithm that brings flow matching into the policy gradient framework. FPO casts policy optimization as maximizing an advantage-weighted ratio computed from the conditional flow matching l…
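The idea of an advantage-weighted ratio built from conditional flow matching (CFM) losses can be illustrated with a small NumPy sketch. This is not the authors' implementation. It assumes a straight-line interpolation path with t = 1 at the action and t = 0 at the noise, per-action CFM losses estimated from K noise/time draws, a ratio of the form exp(loss_old - loss_new), and a PPO-style clipped surrogate. The names `cfm_loss`, `fpo_ratio`, and `clipped_objective`, and the toy velocity fields, are hypothetical.

```python
import numpy as np

def cfm_loss(velocity_fn, action, noise, t):
    """Per-sample CFM loss under an ASSUMED straight-line path.

    action: (N, D) actions sampled from the policy
    noise:  (N, K, D) Gaussian noise, K Monte Carlo draws per action
    t:      (N, K) flow times in [0, 1]
    Returns an (N, K) array of squared velocity errors.
    """
    tt = t[..., None]                                    # broadcast time over action dims
    x_t = tt * action[:, None, :] + (1.0 - tt) * noise   # point on the noise-to-action line
    target = action[:, None, :] - noise                  # constant velocity along that line
    pred = velocity_fn(x_t, t)                           # model's predicted velocity
    return np.sum((pred - target) ** 2, axis=-1)

def fpo_ratio(loss_old, loss_new):
    """Assumed ratio surrogate: exp of the average drop in CFM loss over the K draws."""
    return np.exp(np.mean(loss_old - loss_new, axis=-1))

def clipped_objective(ratio, advantage, eps=0.2):
    """PPO-clip surrogate applied to the flow-matching ratio (to be maximized)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - eps, 1.0 + eps) * advantage
    return np.mean(np.minimum(unclipped, clipped))

# Toy usage with hypothetical velocity fields standing in for old/new policy networks.
rng = np.random.default_rng(0)
N, K, D = 4, 8, 2
action = rng.normal(size=(N, D))
noise = rng.normal(size=(N, K, D))
t = rng.uniform(size=(N, K))
advantage = np.array([1.0, -0.5, 0.3, 2.0])

old_v = lambda x, tau: np.zeros_like(x)   # hypothetical "old" velocity field
new_v = lambda x, tau: 0.1 * x            # hypothetical "updated" velocity field

l_old = cfm_loss(old_v, action, noise, t)
l_new = cfm_loss(new_v, action, noise, t)
r = fpo_ratio(l_old, l_new)
objective = clipped_objective(r, advantage)
```

In this sketch a lower CFM loss on an action stands in for a higher likelihood, so the ratio exceeds 1 when the updated model fits that action better than the old one; the clip limits how far a single update can move the ratio, as in PPO.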