Forecasting LLM-enabled biorisk and the efficacy of safeguards
The capabilities of large language models (LLMs) on several biology benchmarks have prompted excitement about their usefulness for beneficial research, but also concern about potential biosecurity risks. We recruited 46 subject-matter experts in biology and biosecurity, and 22 generalist forecasters, to estimate the risks posed by growing LLM capabilities. The median expert predicted a 0.3% baseline annual risk of a human-caused epidemic causing 100,000 deaths. This estimate rose to 1.5% conditional on several hypothetical LLM capabilities, including matching the performance of a top-performing team of virologists on a virology troubleshooting test. Prompted by this finding, we conducted a baselining study and found that LLMs have already crossed this performance threshold, even though the median respondent did not expect this to happen until after 2030. More encouragingly, experts reduced their risk forecast to near the baseline (0.4%) conditional on the adoption of LLM safeguards and mandatory nucleic acid screening.