Hewitt, L., Ashokkumar, A., et al. (2024)
Working Paper
Abstract
To evaluate whether large language models (LLMs) can be leveraged to predict the
results of social science experiments, we built an archive of 70 pre-registered, nationally representative survey experiments conducted in the United States, involving 476 experimental treatment effects and 105,165 participants. We prompted an advanced, publicly available
LLM (GPT-4) to simulate how representative samples of Americans would respond to the
stimuli from these experiments. Predictions derived from simulated responses correlate
strikingly with actual treatment effects (r = 0.85), equaling or surpassing the predictive
accuracy of human forecasters. Accuracy remained high for unpublished studies that could
not appear in the model’s training data (r = 0.90). We further assessed predictive accuracy
across demographic subgroups and various disciplines, and in nine recent megastudies featuring
an additional 346 treatment effects. Together, our results suggest LLMs can augment experimental methods in science and practice, but also highlight important limitations and risks of
misuse.
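To make the method described in the abstract concrete, here is a minimal Python sketch of the simulate-and-correlate pipeline, written against the OpenAI chat API. Everything in it is an illustrative assumption rather than the authors' actual materials: the prompt wording, the persona conditioning, the helper functions `simulate_responses` and `predicted_effect`, and the toy numbers at the end. The paper's actual pipeline is far more involved (representative samples, 476 pre-registered effects), but the core loop looks roughly like this:

```python
# Minimal sketch (not the authors' code) of the simulate-and-correlate idea:
# prompt an LLM to role-play survey respondents, estimate each treatment
# effect from simulated ratings, then correlate predicted vs. actual effects.
from openai import OpenAI
import numpy as np

client = OpenAI()  # assumes OPENAI_API_KEY is set in the environment


def simulate_responses(stimulus: str, persona: str, n: int = 50) -> list[float]:
    """Sample n simulated 1-7 ratings for one experimental condition."""
    ratings = []
    for _ in range(n):
        reply = client.chat.completions.create(
            model="gpt-4",
            messages=[
                {"role": "system",
                 "content": f"You are {persona} taking a survey. "
                            "Answer as that person would, honestly."},
                {"role": "user",
                 "content": f"{stimulus}\n\nAnswer with one number from 1 "
                            "(strongly disagree) to 7 (strongly agree)."},
            ],
            temperature=1.0,  # sampling noise stands in for respondent variation
        )
        try:
            ratings.append(float(reply.choices[0].message.content.strip()))
        except ValueError:
            continue  # skip malformed replies
    return ratings


def predicted_effect(treatment: str, control: str, persona: str) -> float:
    """Predicted treatment effect: mean(treatment) minus mean(control)."""
    return (np.mean(simulate_responses(treatment, persona))
            - np.mean(simulate_responses(control, persona)))


# Toy placeholder numbers, shown only to make the final step concrete:
predicted = [0.12, -0.05, 0.30]  # one simulated effect per experiment
observed = [0.10, -0.02, 0.25]   # pre-registered estimates from the archive
r = np.corrcoef(predicted, observed)[0, 1]
print(f"r between simulated and actual effects: {r:.2f}")
```

One design choice worth noting: with temperature sampling, variability across repeated model calls stands in for variation across human respondents, an assumption any replication of this approach would need to validate.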
Here are some thoughts. The implications of this research are abundant!
Large language models (LLMs) show significant potential for predicting human behavior and decision-making, with far-reaching implications across society. In employment, LLMs could reshape recruitment and hiring by predicting job performance and cultural fit, streamlining the hiring process while raising serious concerns about bias and fairness. They might also be used to forecast employee productivity, retention, and career trajectories, informing decisions about promotions and professional development, and to help organizations anticipate labor market trends, skill demands, and turnover for more strategic workforce planning.
Beyond the workplace, LLMs could shape predictions across many domains of human behavior. In consumer research, they could sharpen predictions of preferences, purchasing decisions, and responses to marketing campaigns, enabling more targeted advertising and product development. In public health, they could help forecast the effectiveness of interventions and population-level responses to public health measures, supporting evidence-based policy-making. They might also be used to anticipate shifts in public opinion, emerging social movements, and evolving cultural trends, with significant consequences for political strategy and media content creation.
While the potential benefits of using LLMs to predict human behavior are substantial, the ethical concerns around their deployment must be addressed. Ensuring transparency in how these models reach their predictions, mitigating algorithmic bias, and validating results across diverse populations are essential to using them responsibly. Going forward, the focus should be on human-AI collaboration, combining the strengths of both to produce predictions of human behavior that are more accurate and ethically sound.