arXiv 2409.05283

On the Relationship between Truth and Political Bias in Language Models

By Suyash Fulay, William Brannon, et al.

Published 2024-09-09

Language model alignment research often attempts to ensure that models are not only helpful and harmless, but also truthful and unbiased. However, optimizing these objectives simultaneously can obscure how improving one aspect might impact the others. In this work, we focus on analyzing the relationship between two concepts essential in both language model alignment and political science: truthfulness and political…
