SRPO

Developer: Tencent · Open Weights · Released Sep 2025
Parameters: 12B · AA Text→Image rank: #60
SRPO (Semantic-Relative Preference Optimization) is a text-to-image alignment framework and model series developed by the Tencent Hunyuan Team. Its primary release is a refined version of the FLUX.1 [dev] model, designed to enhance visual realism, aesthetic appeal, and prompt adherence. It addresses common challenges in aligning diffusion models with human preferences, most notably "reward hacking," where a model achieves high reward scores by producing over-saturated or distorted images.

The model's core innovation is the Direct-Align technology, which enables the recovery of clean images from any noisy time step using predefined noise priors. This allows the framework to optimize the entire diffusion trajectory rather than focusing solely on the final denoising steps, leading to more stable and consistent training. Additionally, SRPO treats reward signals as text-conditioned indicators, facilitating online adjustment of preferences through positive and negative prompt variations.
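The key property behind Direct-Align can be illustrated with a small sketch. Assuming a rectified-flow interpolation of the form x_t = (1 − t)·x0 + t·ε, as used by flow-based models in the FLUX family, a noisy latent built from a *predefined* noise prior ε can be inverted in closed form at any timestep, not only near the end of denoising. The function name and shapes below are illustrative, not from the SRPO codebase:

```python
import numpy as np

def recover_clean_latent(x_t, noise_prior, t):
    # Invert the rectified-flow interpolation x_t = (1 - t) * x0 + t * eps.
    # Because eps is a predefined prior (injected by us, not predicted by
    # the model), the clean latent x0 is recoverable exactly from any
    # intermediate timestep t in (0, 1).
    return (x_t - t * noise_prior) / (1.0 - t)

# Usage: inject known noise into a clean latent, then invert it exactly.
rng = np.random.default_rng(0)
x0 = rng.standard_normal((16, 64, 64))   # stand-in for a clean latent
eps = rng.standard_normal(x0.shape)      # predefined noise prior
t = 0.7                                  # an intermediate timestep
x_t = (1 - t) * x0 + t * eps             # noisy latent at timestep t
x0_hat = recover_clean_latent(x_t, eps, t)
assert np.allclose(x0_hat, x0)
```

Because recovery is exact at every timestep, reward gradients can be propagated along the whole diffusion trajectory rather than only through the final denoising steps.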

Technically, the model utilizes a 12-billion-parameter flow-based transformer architecture. It is noted for its high training efficiency, reportedly reaching optimal convergence significantly faster than previous reinforcement learning methods. In performance evaluations, SRPO has achieved leading positions on open-source text-to-image leaderboards, demonstrating high structural coherence and superior semantic alignment across a wide range of creative prompts.
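The semantic-relative reward signal described above can be sketched as scoring the same image against positively and negatively augmented prompts and taking the difference, which steers optimization toward the desired attribute while penalizing reward-hacking artifacts. The function, prompt suffixes, and reward-model interface here are hypothetical, not the actual SRPO API:

```python
def semantic_relative_reward(reward_model, image, prompt,
                             positive="photorealistic, natural lighting",
                             negative="oversaturated, distorted"):
    # Score the same image under a positively and a negatively augmented
    # prompt; the difference is the preference signal. Changing the
    # suffix strings adjusts the optimization target online, without
    # retraining the reward model.
    r_pos = reward_model(image, f"{prompt}, {positive}")
    r_neg = reward_model(image, f"{prompt}, {negative}")
    return r_pos - r_neg

# Usage with a toy reward model that keys on the control words.
dummy = lambda image, text: ((1.0 if "photorealistic" in text else 0.0)
                             - (1.0 if "oversaturated" in text else 0.0))
print(semantic_relative_reward(dummy, None, "a cat"))  # → 2.0
```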
