AILo Talk - Daniel Herrmann, University of Groningen
When: Wednesday 10 April 2024, 16:00-18:00
Where: Bernoulliborg, room 5161.0162
Title: Measuring Beliefs in LLMs
Abstract:
Large language models (LLMs) such as ChatGPT have been doing remarkable things: they can write code, summarize challenging texts, role-play as different characters, and even play games of strategy like chess at a reasonable level. These recent achievements have prompted computer scientists to try to understand how these models accomplish such feats. In particular, they have been developing methods that aim to read attitudes such as belief off both the internal computations and the behavior of an LLM. I will evaluate two popular belief-extraction methods, presenting empirical results that show these methods fail to generalize in very basic and desirable ways, and arguing that we should not expect such methods to succeed. Finally, building on this case study, and informed by a striking resemblance between the project of measuring the beliefs of LLMs and the project of measuring the beliefs of humans as carried out in decision theory and formal epistemology, I will discuss some paths forward for a positive, philosophically informed foundation for machine learning interpretability.
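For readers unfamiliar with this line of work, one common family of such methods trains a simple probe on a model's hidden activations to predict whether a statement is true, and reads the probe's output as the model's "belief." The sketch below is illustrative only and is not the specific method evaluated in the talk: it assumes synthetic activations in place of a real LLM's hidden states and uses a plain logistic-regression probe.

```python
# Illustrative sketch of a linear "belief probe": train a logistic-regression
# classifier on hidden activations to predict the truth labels of statements.
# NOTE: synthetic activations stand in for a real LLM's hidden states here,
# and this is NOT the particular method evaluated in the talk.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)

n_statements, hidden_dim = 1000, 64
# Pretend each row is the hidden state an LLM produced for one statement.
labels = rng.integers(0, 2, size=n_statements)   # 1 = true, 0 = false
direction = rng.normal(size=hidden_dim)          # hypothetical "truth direction"
acts = rng.normal(size=(n_statements, hidden_dim)) + np.outer(labels - 0.5, direction)

X_train, X_test, y_train, y_test = train_test_split(acts, labels, random_state=0)
probe = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# In-distribution accuracy can look impressive; a central worry raised in the
# talk is that such probes often fail to generalize in basic, desirable ways.
print(f"probe accuracy: {probe.score(X_test, y_test):.2f}")
```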