PARM: Multi-Objective Test-Time Alignment via Preference-Aware Autoregressive Reward Model

Jan 1, 2025
Baijiong Lin, Weisen Jiang, Yuancheng Xu, Hao Chen, Ying-Cong Chen
Publication: Proceedings of the International Conference on Machine Learning (ICML)