41st Workshop: Replication for Language Models: Problems, Principles, and Best Practices for Political Science
Speaker: Arthur Spirling [Princeton University]
Date and time: Thursday, June 26, 2025, 15:00-16:40
Venue: Conference Room 2 (Room 104), 1F, Institute of Social Science, Hongo Campus, The University of Tokyo
https://www.iss.u-tokyo.ac.jp/guide/
Language: English
Format: In person
Audience: Open to the public
Abstract: Large Language Models (LMs) are exciting tools: they require minimal researcher input but make it possible to annotate and generate large quantities of data. Yet there has been almost no systematic research into the reproducibility of research using LMs. This is a potential problem for scientific integrity. We give a theoretical framework for replication in the discipline and show that LM work is perhaps uniquely problematic. We demonstrate the problem empirically using a rolling iterated replication design in which we compare crowdsourcing and LMs on multiple repeated tasks, over many months. We find that LMs can be accurate, but the observed variance in performance is often unacceptably high. Strict "temperature" control does not resolve these issues. This affects downstream results. In many cases the LM findings cannot be re-run, let alone replicated. We conclude with recommendations for best practice, including the use of locally versioned 'open' LMs.