
Stories by Research Graph on Medium

Author Xuzeng He

Supervised Fine-tuning, Reinforcement Learning from Human Feedback and the latest SteerLM

Author · Xuzeng He (ORCID: 0009-0005-7317-7426)

Introduction

Large Language Models (LLMs), typically trained on extensive text data, demonstrate remarkable capabilities across a wide range of tasks, often with state-of-the-art performance. However, users increasingly want a model tailored to their own needs rather than a general-purpose solution.