References

Training language models to follow instructions with human feedback

Long Ouyang, Jeffrey Wu, Xu Jiang, Diogo Almeida, Carroll L. Wainwright, Pamela Mishkin, Chong Zhang, Sandhini Agarwal, Katarina Slama, Alex Ray, John Schulman, Jacob Hilton, Fraser Kelton, Luke Miller, Maddie Simens, Amanda Askell, Peter Welinder, Paul Christiano, Jan Leike, & Ryan Lowe (2022)

Advances in Neural Information Processing Systems, 35, 27730-27744.

URL: https://papers.nips.cc/paper_files/paper/2022/hash/b1efde53be364a73914f58805a001731-Abstract-Conference.html

Abstract. Ouyang et al.'s OpenAI paper on reinforcement learning from human feedback (RLHF), the foundational technical reference for how modern LLMs are aligned to human preferences through reinforcement learning. It is increasingly cited in HCI work addressing what alignment via RLHF means for UX.

Tags: ai-usability rlhf alignment llm foundational
