arxiv.org/pdf/2203….

Training language models to follow instructions with human feedback

Taiju Muto @tai2