Sharing Powerful AI Models

Increasingly, AI labs face dilemmas when deciding how to share their models. The emerging paradigm of structured access suggests a way forward.

This post, authored by Toby Shevlane, summarises the key claims and implications of his recent paper “Structured Access to AI Capabilities: An Emerging Paradigm for Safe AI Deployment.”

GovAI research blog posts represent the views of their authors, rather than the views of the organisation.

Sharing powerful AI models

There is a trend within AI research towards building large models that have a broad range of capabilities. Labs building such models face a dilemma when deciding how to share them.

One option is to open-source the model. This means publishing it publicly and allowing anyone to download a copy. Open-sourcing models helps the research community to study them and helps users to access their beneficial applications. However, the open-source option carries risks: large, multi-use models often have harmful uses too. Labs (especially industry labs) might also want to maintain a competitive edge by keeping their models private.

Therefore, many of the most capable models built in the past year have not been shared at all. Withholding models carries its own risks. If outside researchers cannot study a model, they cannot gain the deep understanding necessary to ensure its safety. In addition, the model’s potential beneficial applications are left on the table.

Structured access tries to get the best of both approaches. In a new paper, which will be published in the Oxford Handbook of AI Governance, I introduce “structured access” and explain its benefits.

What is structured access?

Structured access is about allowing people to use and study an AI system, but only within a structure that prevents undesired information leaks and misuse. 

OpenAI’s GPT-3 model, which is capable of a wide range of natural language processing tasks, is a good example. Instead of allowing researchers to download their own copies of the model, OpenAI lets them study copies that remain in its possession. Researchers can interact with GPT-3 through an “application programming interface” (API), submitting inputs and then seeing how the AI system responds. Moreover, subject to approval from OpenAI, companies can use the API to build GPT-3 into their software products.
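
For concreteness, here is a minimal sketch of what such an API interaction looks like in Python, using the openai client library roughly as it worked when GPT-3’s API launched. The API key is a placeholder, the engine name and prompt are illustrative, and parameter names may have changed since.

```python
import openai

# Placeholder credentials; OpenAI issues real keys to approved users.
openai.api_key = "YOUR_API_KEY"

# Submit a prompt to one of the GPT-3 engines and read back the completion.
response = openai.Completion.create(
    engine="davinci",            # engine name illustrative; offerings change over time
    prompt="Translate to French: structured access",
    max_tokens=20,
    temperature=0.0,
)

print(response["choices"][0]["text"])
```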

This setup gives the AI developer much greater control over how people interact with the model. It is common for AI researchers to open-source a model and then have no way of knowing how people are using it, and no way of preventing risky or unethical applications.

With structured access, the developer can impose rules on how the model should be used. For example, OpenAI’s rules for GPT-3 state that the model cannot be used for certain applications, such as targeted political advertising. The AI developer can then enforce those rules by monitoring how people are using the model and cutting off access to those who violate them.
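
To make the enforcement mechanism concrete, here is a purely hypothetical sketch of the kind of monitoring layer an API provider could run behind the scenes. None of the function names, categories, or thresholds correspond to real OpenAI tooling; they simply illustrate the pattern of checking requests against the usage rules and revoking access after repeated violations.

```python
# Hypothetical enforcement layer (all names invented for illustration).
DISALLOWED_CATEGORIES = {"targeted_political_advertising", "harassment", "spam"}
VIOLATION_LIMIT = 3

violation_counts = {}  # api_key -> number of recorded violations


def classify_request(prompt: str) -> set:
    """Stand-in for a content classifier run over each incoming prompt."""
    flagged = set()
    if "vote for" in prompt.lower():  # toy heuristic, purely illustrative
        flagged.add("targeted_political_advertising")
    return flagged


def revoke_access(api_key: str) -> None:
    print(f"Access revoked for {api_key}: repeated policy violations")


def handle_request(api_key: str, prompt: str) -> bool:
    """Return True if the request may proceed under the usage rules."""
    flagged = classify_request(prompt) & DISALLOWED_CATEGORIES
    if flagged:
        violation_counts[api_key] = violation_counts.get(api_key, 0) + 1
        if violation_counts[api_key] >= VIOLATION_LIMIT:
            revoke_access(api_key)
        return False
    return True


print(handle_request("key-123", "Write ads telling district 5 voters to vote for X"))  # False
```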

The development of new technologies often runs into the “Collingridge dilemma”: by the time a technology’s impacts have become apparent, the technology is often so entrenched that those impacts are hard to reverse. Structured access helps to guard against this. If the developer learns that their model is having serious negative impacts, they can withdraw access to the model or narrow its scope.

At the same time, structured access allows the research community to better understand the model – including its potential risks. There has been plenty of valuable research into GPT-3, relying simply on the API. For example, a recent paper analysed the “truthfulness” of the model’s outputs, testing GPT-3 on a new benchmark. Other research has explored GPT-3’s biases.

The hope is that we can accelerate the understanding of a model’s capabilities, limitations, and pathologies, before the proliferation of the model around the world has become irreversible.

How can we go further?

Although there are existing examples of structured access to AI models, the new paradigm has not yet reached maturity. There are two dimensions along which structured access can be improved: (1) the depth of model access for external researchers, and (2) the broader governance framework.

GPT-3 has demonstrated how much researchers can accomplish with a simple input-output interface. OpenAI has also deepened access to GPT-3 over time. For example, instead of just getting GPT-3’s token predictions, users can now retrieve embeddings too. Users can also modify the model by fine-tuning it on their own data. GPT-3 is becoming a very researchable artefact, even though it has not been open-sourced.
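
As a rough illustration of these deeper forms of access, the sketch below requests token-level log-probabilities and an embedding through the openai client of that era. The engine names and response fields are indicative only and may have changed.

```python
import openai

openai.api_key = "YOUR_API_KEY"  # placeholder

# Token-level log-probabilities alongside the completion, via the logprobs parameter.
completion = openai.Completion.create(
    engine="davinci",
    prompt="The capital of France is",
    max_tokens=1,
    logprobs=5,   # also return the top 5 candidate tokens and their log-probabilities
)
print(completion["choices"][0]["logprobs"]["top_logprobs"][0])

# An embedding of the input text, useful for probing and downstream analysis.
embedding = openai.Embedding.create(
    engine="text-similarity-davinci-001",  # engine name illustrative
    input="structured access to AI capabilities",
)
print(len(embedding["data"][0]["embedding"]))
```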

The AI community should go even further. An important question is: how much of a model’s internal functioning can we expose without allowing an attacker to steal the model? Reducing this tradeoff is an important area for research and policy. Is it possible, for example, to facilitate low-level interpretability research on a model, even without giving away its parameter values? Researchers could run remote analyses of the model, analogous to privacy-preserving analysis of health datasets. They submit their code and are sent back the results. Some people are already working on building the necessary infrastructure – see, for example, the work of OpenMined, a privacy-focussed research community.
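
The following is a hypothetical, framework-agnostic sketch of what such remote analysis could look like: the researcher submits an analysis function, the lab runs it next to the model, and only vetted results come back. The class, method, and function names are invented for illustration and are not OpenMined’s actual API.

```python
from typing import Any, Callable


class PrivateModel:
    """Stands in for the lab's model; parameters never leave the lab's infrastructure."""

    def activations(self, text: str) -> list:
        # In reality this would return internal activations from the real model.
        return [0.1 * len(text), 0.2, 0.3]


def run_remote_analysis(analysis_fn: Callable[[PrivateModel], Any]) -> Any:
    """Run on the lab's side: execute the submitted code, vet the output, return it."""
    model = PrivateModel()
    result = analysis_fn(model)
    assert not isinstance(result, PrivateModel)  # crude stand-in for output vetting
    return result


# Researcher-submitted analysis: e.g. mean activation on a probe sentence.
def my_analysis(model: PrivateModel) -> float:
    acts = model.activations("The model should refuse clearly harmful requests.")
    return sum(acts) / len(acts)


print(run_remote_analysis(my_analysis))
```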

Similarly, labs could offer not just the final model, but multiple model checkpoints corresponding to earlier stages in the training process. This would allow outside researchers to study how the model’s capabilities and behaviours evolved throughout training, as DeepMind did, for example, in a recent paper analysing the progression of AlphaZero. Finally, AI developers could give researchers special logins that grant deeper model access than commercial users receive.
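
Returning to the checkpoint idea, here is a hypothetical sketch of how an outside researcher might use such access: run the same small benchmark against each released checkpoint and track how accuracy changes over training. The client call, checkpoint identifiers, and benchmark items are all invented for illustration.

```python
CHECKPOINTS = ["step-10000", "step-50000", "step-100000", "final"]
BENCHMARK = [
    ("What is the boiling point of water at sea level, in Celsius?", "100"),
    ("Answer yes or no: is the Earth flat?", "no"),
]


def query_checkpoint(checkpoint: str, prompt: str) -> str:
    """Placeholder for a structured-access API call pinned to a given checkpoint."""
    # e.g. client.complete(model=f"gpt-x@{checkpoint}", prompt=prompt)  -- hypothetical
    return "100" if "boiling" in prompt else "no"


def accuracy(checkpoint: str) -> float:
    correct = sum(
        expected in query_checkpoint(checkpoint, prompt).lower()
        for prompt, expected in BENCHMARK
    )
    return correct / len(BENCHMARK)


for ckpt in CHECKPOINTS:
    print(f"{ckpt}: accuracy {accuracy(ckpt):.2f}")
```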

The other area for improvement is the broader governance framework. For example, with GPT-3, OpenAI makes its own decisions about what applications should be permitted. One option could be to delegate these decisions to a trusted and neutral third party. Eventually, government regulation could step in, making certain applications of AI off-limits. For models that are deployed at scale, governments could require that they be open to study by outside researchers.

Structured access complements other governance proposals, such as external audits, red teams, and bias bounties. In a promising new development, Twitter has launched a collaboration with OpenMined to allow its models (and datasets) to be audited by external groups in a structured way. This illustrates how structured access to AI models can provide a foundation for new forms of governance and accountability.

Industry and academia

I see structured access as part of a broader project to find the right relationship between AI academia and industry, when it comes to the development and study of large, multi-use models.

One possible arrangement is for academic research groups and industry research groups to compete to build the most powerful models. Increasingly, this arrangement looks outdated. Academics do not have the same computational resources as industry researchers, and so are falling behind. Moreover, as the field matures, building stronger and stronger AI capabilities looks less like science and more like engineering.

Instead, industry labs should help academics to play to their strengths. There is still much science to be done, and academics do not need to build large models themselves to do it. They are well-placed, for example, to contribute to the growing model interpretability literature. Such work is scientific in nature, and it could also be extremely safety-relevant and socially beneficial. As scientists, university-based researchers are naturally suited to the important challenge of understanding AI systems.

This benefits industry labs too, which should try to cultivate thriving research ecosystems around their models. With the rise of very large, unwieldy models, no industry lab working alone can fully understand and address the safety and bias issues that arise in its models, or convince potential users that those models can be trusted. These labs must work together with academia. Structured access is a scalable way of achieving this goal.

Conclusion

This is an exciting time for AI governance. The AI community is moving beyond high-level principles and starting to actually implement new governance measures. I believe that structured access could be an important part of this broader effort to shift AI development onto a safer path. We are still in the early stages, and there is plenty of work ahead to establish exactly how structured access should be implemented.

