Original title: Using Human Feedback to Fine-tune Diffusion Models without Any Reward Model
Authors: Kai Yang, Jian Tao, Jiafei Lyu, Chunjiang Ge, Jiaxin Chen, Qimai Li, Weihan Shen, Xiaolong Zhu, Xiu Li
In the world of refining diffusion models, fine-tuning is often done with reinforcement learning from human feedback (RLHF). However, existing methods first require training a reward model aligned with human preferences, a costly and time-consuming step.
Original article: https://arxiv.org/abs/2311.13231
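
To make the "no reward model" idea concrete, here is a minimal sketch of a DPO-style pairwise preference loss, which fine-tunes a model directly from human comparisons without an explicit reward model. This is an illustrative assumption, not the paper's exact objective: the function name `dpo_style_loss`, the argument names, and the choice of `beta` are hypothetical, and in the diffusion setting the log-probabilities would have to come from the per-step denoising distributions of the fine-tuned and frozen reference models.

```python
# Illustrative sketch: preference-based fine-tuning without a separate reward model.
import torch
import torch.nn.functional as F


def dpo_style_loss(
    policy_logp_win: torch.Tensor,   # log-prob of the preferred sample under the model being fine-tuned
    policy_logp_lose: torch.Tensor,  # log-prob of the rejected sample under the model being fine-tuned
    ref_logp_win: torch.Tensor,      # log-prob of the preferred sample under the frozen reference model
    ref_logp_lose: torch.Tensor,     # log-prob of the rejected sample under the frozen reference model
    beta: float = 0.1,               # strength of the implicit KL regularization toward the reference model
) -> torch.Tensor:
    """Pairwise preference loss: raise the policy's likelihood of preferred samples
    relative to the reference model, with no explicit reward model in the loop."""
    # The implicit "reward" of each sample is its policy/reference log-likelihood ratio.
    margin_win = policy_logp_win - ref_logp_win
    margin_lose = policy_logp_lose - ref_logp_lose
    # Maximize the log-sigmoid of the scaled gap between preferred and rejected samples.
    return -F.logsigmoid(beta * (margin_win - margin_lose)).mean()


if __name__ == "__main__":
    # Toy usage with random log-probabilities for a batch of 4 preference pairs.
    torch.manual_seed(0)
    lp_w, lp_l = torch.randn(4), torch.randn(4)
    rp_w, rp_l = torch.randn(4), torch.randn(4)
    print(dpo_style_loss(lp_w, lp_l, rp_w, rp_l).item())
```

The design point the sketch highlights is that the human preference signal enters the loss directly through sample pairs, so no separately trained reward network is needed.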