arXiv 2507.21053

Flow Matching Policy Gradients

By David McAllister, Songwei Ge, et al.

Published 2025-07-28

Flow-based generative models, including diffusion models, excel at modeling continuous distributions in high-dimensional spaces. In this work, we introduce Flow Policy Optimization (FPO), a simple on-policy reinforcement learning algorithm that brings flow matching into the policy gradient framework. FPO casts policy optimization as maximizing an advantage-weighted ratio computed from the conditional flow matching l…
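As a rough illustration of the idea in the abstract, the sketch below computes a conditional flow matching loss for an old and a new policy and forms an advantage-weighted ratio from the two. Everything specific here is an assumption made for illustration and is not taken from the text above: the linear interpolation path, the ratio form `exp(loss_old - loss_new)`, the toy linear velocity field, and the names `cfm_loss`, `advantage_weighted_ratio`, and `make_velocity_fn`. It should not be read as the paper's implementation.

```python
import numpy as np

def cfm_loss(velocity_fn, obs, action, noise, t):
    """Per-sample conditional flow matching loss.

    Assumes a linear path x_t = (1 - t) * noise + t * action,
    whose target velocity is (action - noise).
    """
    t = t[:, None]                       # broadcast time over action dims
    x_t = (1.0 - t) * noise + t * action
    target = action - noise
    pred = velocity_fn(x_t, t, obs)
    return np.mean((pred - target) ** 2, axis=-1)

def advantage_weighted_ratio(loss_new, loss_old, advantage):
    """Surrogate objective to maximize.

    Assumption: the ratio is exp(loss_old - loss_new), so lowering the
    new policy's loss on a sample raises that sample's ratio.
    """
    ratio = np.exp(loss_old - loss_new)
    return np.mean(ratio * advantage)

def make_velocity_fn(weight):
    """Hypothetical toy velocity field standing in for a learned network."""
    def velocity_fn(x_t, t, obs):
        return weight * (obs - x_t)
    return velocity_fn

rng = np.random.default_rng(0)
obs = rng.normal(size=(4, 2))            # toy observations
action = rng.normal(size=(4, 2))         # actions sampled by the old policy
noise = rng.normal(size=(4, 2))          # noise shared by both loss evaluations
t = rng.uniform(size=4)                  # flow times shared by both evaluations
advantage = np.array([1.0, -0.5, 2.0, 0.0])

loss_old = cfm_loss(make_velocity_fn(0.5), obs, action, noise, t)
loss_new = cfm_loss(make_velocity_fn(0.6), obs, action, noise, t)
objective = advantage_weighted_ratio(loss_new, loss_old, advantage)
```

In this sketch the same `noise` and `t` samples are reused for both loss evaluations so that the difference reflects only the change in the velocity field. When the new and old fields coincide, every ratio is 1 and the objective reduces to the mean advantage.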