Margaret Roberts & Jeffrey Ding on Censorship’s Implications for Artificial Intelligence

While artificial intelligence provides the backbone for many tools people use around the world, recent work has brought attention to the potential biases that may be baked into these algorithms. While most work in this area has focused on the ways in which these tools can exacerbate existing inequalities and discrimination, we bring to light another way in which algorithmic decision making may be affected by institutional and societal forces. We study how censorship has affected the development of Wikipedia corpuses, which are in turn regularly used as training data that provide inputs to NLP algorithms. We show that word embeddings trained on the regularly censored Baidu Baike have very different associations between adjectives and a range of concepts about democracy, freedom, collective action, equality, and people and historical events in China than its uncensored counterpart Chinese language Wikipedia. We examine the origins of these discrepancies using surveys from mainland China and their implications by examining their use in downstream AI applications.

Molly Roberts is an Associate Professor in the Department of Political Science and the Halıcıoğlu Data Science Institute at the University of California, San Diego. She co-directs the China Data Lab at the 21st Century China Center. She is also part of the Omni-Methods Group. Her research interests lie in the intersection of political methodology and the politics of information, with a specific focus on methods of automated content analysis and the politics of censorship and propaganda in China.

Jeffrey Ding is the China lead for the AI Governance Research Group. Jeff researches China’s development of AI at the Future of Humanity Institute, University of Oxford. His work has been cited in the Washington Post, South China Morning Post, MIT Technology Review, Bloomberg News, Quartz, and other outlets. A fluent Mandarin speaker, he has worked at the U.S. Department of State and the Hong Kong Legislative Council. He is also reading for a D.Phil. in International Relations as a Rhodes Scholar at the University of Oxford.

You can watch a recording of the event here or read the transcript below.

Allan Dafoe  00:00

Welcome, I’m Allan Dafoe, the director of the Center for the Governance of AI, which is organizing this series. We are based at the Future of Humanity Institute at the University of Oxford. For those of you who don’t know about our work, we study the opportunities and challenges brought by advances in AI, so as to advise policy to maximize the benefits and minimize the risks from advanced AI. It’s worth clarifying that governance, this key term, refers descriptively to the ways that decisions are made about the development and deployment of AI, but also to the normative aspiration that those decisions emerge from institutions that are effective, equitable, and legitimate. If you want to learn more about our work, you can go to our website. I’m pleased today to welcome our speaker Molly Roberts, and our discussant Jeffrey Ding. Molly is Associate Professor of Political Science at the University of California, San Diego. She’s a scholar of political methodology, the politics of information, and specifically the politics of censorship and propaganda in China. She has produced a number of fascinating papers, including some employing truly innovative experimental design, probing the logic of Chinese web censorship. Molly will present today some of her work, co-authored with Eddie Yang, on the relationship between AI and Chinese censorship. I was delighted to learn that Molly was turning her research attention to some issues in AI politics. After Molly’s presentation we will be joined by Jeffrey Ding in the role of discussant. Jeff is a researcher at FHI, an Oxford DPhil student, and a pre-doctoral fellow at CISAC, at Stanford. I’ve worked with Jeffrey for the past three years now, and during that time have seen him really flourish into one of the premier scholars on China’s AI ecosystem and politics. So now Molly, the floor is yours.

Molly Roberts  01:57

Thanks, Allan, and thanks so much for having me. I’m really excited to hear Jeffrey’s thoughts on this, since I’m a follower of his newsletter and also his work on AI in China. So this is a new project to try to understand the relationship between censorship and artificial intelligence, and I see this as the beginning of a larger body of work on that relationship. So I’m really looking forward to this discussion. This is joint work with Eddie Yang, who’s also at UC San Diego. So you might have heard, probably on this webinar series, that a lot of people think that data is the new oil: data is the input to a lot of products. It can be used to make financial predictions that can then be used to trade stocks or to predict the future of investments. And at the same time that data might be the new oil, we also worry a little bit about the quality of this data. How good is the data that’s inputted into these products and applications that we’re using a lot now in our AI world? We know there’s a really interesting new literature in AI about politics and bias within artificial intelligence. The idea behind this is that the huge data that powers AI applications is affected by human biases that are then encoded in that training data, which then impacts the algorithms that are used within user-facing interfaces or products that replicate or enhance that bias. So there’s been a lot of great work looking at how racial and gender biases can be encoded within these training datasets that are then put into these algorithms and user-facing platforms. For example (I don’t know why my TeX didn’t work here), Latanya Sweeney has some great work on ad delivery, and there’s also been some great work on speech recognition, word embeddings, and image labeling.

Sweeney, Latanya. “Discrimination in online ad delivery.” Communications of the ACM 56.5 (2013): 44-54.

Koenecke, Allison, Andrew Nam, Emily Lake, Joe Nudell, Minnie Quartey, Zion Mengesha, Connor Toups, John R. Rickford, Dan Jurafsky, and Sharad Goel. “Racial disparities in automated speech recognition.” Proceedings of the National Academy of Sciences 117, no. 14 (2020): 7684-7689.

Davidson, Thomas, Debasmita Bhattacharya, and Ingmar Weber. “Racial Bias in Hate Speech and Abusive Language Detection Datasets.” Proceedings of the Third Workshop on Abusive Language Online. 2019.

Caliskan, Aylin, Joanna J. Bryson, and Arvind Narayanan. “Semantics derived automatically from language corpora contain human-like biases.” Science 356.6334 (2017): 183-186.

Zhao, Jieyu, Tianlu Wang, Mark Yatskar, Vicente Ordonez, and Kai-Wei Chang. “Men Also Like Shopping: Reducing Gender Bias Amplification using Corpus-level Constraints.” In Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing. 2017.

Li, Shen, Zhe Zhao, Renfen Hu, Wensi Li, Tao Liu, and Xiaoyong Du. “Analogical Reasoning on Chinese Morphological and Semantic Relations.” In Proceedings of the 56th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), pp. 138-143. 2018.

So in this talk, we’re going to explore another institution that impacts AI, which is censorship. Censorship impacts the training data, which then impacts the NLP models and applications that are used. So instead of looking at institutional or human biases that might impact training data, here we’re going to look at how censorship policies on behalf of governments impact training data, and how this might have a downstream impact on applications. We know that large user-generated datasets are the building blocks for AI. This could be anything from Wikipedia corpuses to social media datasets to government-curated data; more and more data is being put online, and this is being used in downstream AI applications. But we also know that governments around the world influence these datasets, and have political incentives to influence these datasets, which are then used downstream. They can influence these datasets through fear: threats or laws that create self-censorship, so that people won’t put things on social media or their activities are not reflected in government-curated data. They can influence these datasets through what I call friction, which is deletion or blocking of social media posts, or preventing certain types of posts on Wikipedia, or preventing some sort of data from being uploaded to a government website, for example. And they can also influence these datasets through flooding, or coordinated addition of information. Think of coordinated internet armies or other types of government-organized groups trying to put information on Wikipedia or on social media to influence the information environment.

Molly Roberts  06:05

So this data is then used in other user-facing applications. Increasingly, AI is taking data available on the internet, through Common Crawl, through Wikipedia, through social media, and using it as a base for algorithms in entertainment applications, productivity applications, algorithmic governance, and a lot of different downstream applications. So our question is: how does censorship, this government influence on these datasets, then affect the politics of downstream applications? It could be that even if some of these applications are not in themselves political, because of this political censorship they could have some political implications. Deciding which corpus to use, for example, could have political implications for downstream applications. So this paper looks particularly at censorship of Wikipedia corpuses. We study censorship of Chinese online encyclopedias, and we look at how these different online encyclopedias have different implications for Chinese language NLP (natural language processing). I’m sorry that my citations aren’t working, but we use word embeddings trained on two Chinese online encyclopedia corpuses. These were trained by Li et al. in the same way on the Baidu Baike encyclopedia corpus and on Chinese language Wikipedia. So we look at Chinese language Wikipedia, which is blocked within China but uncensored, and Baidu Baike, which is not blocked within China but has pre-publication censorship restrictions on it. We look at how using each of these different corpuses, which have different censorship controls, can have different implications for downstream applications.
We measure political word associations between these two corpuses, and we find that word embeddings (which I’ll go over in a second) trained on Baidu Baike associate more negative adjectives with democracy in comparison to Chinese language Wikipedia, and have more positive associations with the CCP and other types of social control words. And we find with a survey that Baidu Baike word embeddings are not actually more reflective of the views of people within China. Therefore, we don’t think that this is coming simply from people’s contributions to these encyclopedias, but we think it is coming from censorship of them. And then we also identify a tangible effect of the decision to use word embeddings pre-trained on Baidu Baike versus word embeddings pre-trained on Chinese language Wikipedia in downstream NLP applications. And we’ll talk a little bit at the end about what strategic implications this might have for politics and AI. So, pre-trained word embeddings: some of you may be familiar with what these are, but just by way of introduction in case you’re not. Natural language processing algorithms, which are algorithms that are used on text, rely on numerical representations of text. We have to figure out how to represent text numerically in order to use it in a downstream application. So anything that is doing AI on social media data, Wikipedia data, encyclopedias, or predictive text is relying on a numerical representation of that text. One way to represent text that is common within the social sciences is to simply give each word a number, essentially, and ask: is this word included within this document or not? This is called the bag-of-words representation: a one or a zero for whether or not a particular word is used within a text. But another way to represent text that has become very, very popular in computer science, and increasingly also in the social sciences, is to use word embeddings.
So the idea behind this is that word embeddings estimate a K-dimensional vector, sometimes a vector of length 200 or 300, for each word within a huge dictionary of words. And this vector encodes the similarity between words. So words that are likely to be used as substitutes for each other, words that are often used in the same context, will be in more similar areas of this K-dimensional space than other words. Using pre-trained word embeddings, which already have a K-dimensional vector trained on a large corpus, allows an NLP application to know, at the start, how these words might be similar to each other. So often these word embeddings are pre-trained on very large corpuses, and then they’re used as inputs in smaller NLP tasks. So I already know that two words are more similar to each other than another word, even before starting to train on my data.
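As a minimal sketch of the contrast between the two representations described here, the snippet below compares a bag-of-words encoding with a toy word embedding. The words and the two-dimensional vectors are invented for illustration; they are not taken from any real pre-trained model, which would typically use a few hundred dimensions.

```python
import math

# Hypothetical 2-dimensional word vectors, invented for illustration only.
EMBEDDINGS = {
    "talk": [0.9, 0.1],
    "lecture": [0.8, 0.2],
    "banana": [0.1, 0.9],
}

def cosine_similarity(u, v):
    """Cosine of the angle between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def bag_of_words(tokens, vocabulary):
    """One/zero indicator for whether each vocabulary word appears in the text."""
    present = set(tokens)
    return [1 if word in present else 0 for word in vocabulary]

vocabulary = sorted(EMBEDDINGS)

# Under bag of words, two different words share nothing, so similarity is zero.
bow_sim = cosine_similarity(
    bag_of_words(["talk"], vocabulary),
    bag_of_words(["lecture"], vocabulary),
)

# Under the toy embedding, words used in similar contexts sit close together.
emb_sim_close = cosine_similarity(EMBEDDINGS["talk"], EMBEDDINGS["lecture"])
emb_sim_far = cosine_similarity(EMBEDDINGS["talk"], EMBEDDINGS["banana"])

print(bow_sim, emb_sim_close, emb_sim_far)
```

The point of the sketch is only that an embedding carries similarity information before any task-specific training happens, which is why the choice of corpus used for pre-training can matter downstream.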

Molly Roberts  11:13

So often, pre-trained word embeddings are made available by companies and by academics. This is just one example, a screenshot from fastText: Facebook makes available a lot of different pre-trained word vectors, and these ones are trained on Common Crawl and Wikipedia, so really large corpuses. They provide these in 157 languages, and you can go download them and use them as inputs into your NLP model. So here’s just an example to fix ideas about what word embeddings are doing. Say that I have two documents; this is from an IBM Research blog. Document x, on the left here, says “I gave a research talk in Boston,” and document y, on the right, says “This is a data science lecture in Seattle.” These actually don’t share any words. But if you have word embeddings as a representation of the text (this is a very simple two-dimensional word embedding, but imagine a 300-dimensional space), you would know that these two documents are actually quite similar in content to each other, because Seattle and Boston are often used in the same context as each other, research and science are similar, and talk and lecture are similar. So the place where these words sit within the space would be pre-trained on a large corpus, and then you could use this word embedding as an input, which would give you more information about those documents. So here we come to censorship of training data. Pre-trained word embeddings are often trained on very large datasets like Wikipedia: because they’re user-generated, they cover lots and lots of different topics, so we think that they’re representative of how people talk about many different things. In China, or in the Chinese language, however, this is complicated by the fact that the Chinese government has blocked Chinese language Wikipedia.
And therefore there’s also been the development of another Wikipedia-like corpus, Baidu Baike, which is unblocked within China but is censored. So both of these are online encyclopedias in China, they’re both commonly used as training data for NLP, and if you look at the CS literature, you’ll see both of them used as training data. Chinese language Wikipedia is uncensored, as I said, but it is blocked; Baidu Baike is censored, in that there are a lot of regulations on what can be written on Baidu Baike, but it is unblocked, in that it is available within mainland China. So for example, if you want to create an entry on Baidu Baike about the June 4th movement, it automatically tells you that you cannot create this post. Also, there are a lot of regulations requiring that entries on political topics follow official Chinese government news sources. So there’s a lot of pre-censorship of these entries, unlike Chinese language Wikipedia, where you can contribute without pre-censorship. There’s been some great work by Zhang and Zhu, in the American Economic Review in 2011, showing that censorship of Wikipedia has reduced contributions to it. They show that when Chinese language Wikipedia was censored, there were many, many fewer contributions to Chinese language Wikipedia, because there was a decrease in the audience of the site. And, at least apparently, because of this, Baidu Baike is many, many times larger than Chinese language Wikipedia, with 16 times more pages. And therefore it’s increasingly an attractive source of training data for Chinese language NLP.

Molly Roberts  14:49

So what we do in this paper is we compare word embeddings. We compare, essentially, where word vectors sit in word embeddings pre-trained on Chinese language Wikipedia versus Baidu Baike. So I’ll just give you a really simple example of what this might look like. We have word embedding a, say this is Baidu Baike, and these are a few different word vectors trained on this corpus. And we have word embedding b, say this is Chinese language Wikipedia. What we’re interested in is where some target words, for example democracy or other types of political words, sit in relation to positive and negative adjectives. So in this case, is democracy closer in word embedding space to stability, or is it closer to chaos? And we could compare where democracy sits between these positive and negative adjectives in Chinese language Wikipedia versus Baidu Baike. So what we do is we come up with groups of target words. We have many different categories of target words, and each category has about 100 different words associated with it. For democratic concepts and ideas, we use categories such as democracy, freedom, and election, and each of these categories has about 100 different words that are roughly synonymous with it. And then we also use known targets of propaganda: for example, social control, surveillance, collective action, political figures, the CCP, or other historical events. We also find lots of words associated with these categories. And then we look at how they are related to attribute words, like adjectives or evaluative words, which are these words in blue. So we use a list of propaganda attribute words, words that we know from reading and from studies of propaganda are often associated with these concepts.
And we also use general evaluative words from big evaluative adjective word lists in Chinese that are often used in Chinese language NLP. So what we do is we take each target word vector; this is x_i, where x_i is the vector of the target word from either Baidu or Wikipedia. And then we take the attribute word vectors, where A is the set of positive attribute word vectors in either Baidu or Wikipedia, and B is the set of negative attribute word vectors in either Baidu or Wikipedia. Then, for each embedding, Baidu or Wikipedia, we take the mean cosine similarity between the target word and the positive attribute words, minus the mean cosine similarity between the target word and the negative attribute words. And then we take the difference between these differences across all of the target words within a category, to get how much closer positive words are overall to the target category in Baidu in comparison to Chinese language Wikipedia. So say Baidu is embedding a and Wikipedia is embedding b. If the quantity for an embedding is very negative, it means that the target category is more associated with negative words in that embedding; if it is more positive, the target category is more associated with positive words. And if the difference between the two embeddings is negative, it means that Baidu associates the category more negatively than Chinese language Wikipedia. To assess statistical significance, we do a permutation test, where we permute the assignment of the word vectors to a and b, and then we see how extreme our result is in comparison to those permutations. So the theoretical expectation is that overall, freedom, democracy, election, collective action, and negative figures, all of these categories will be more associated with negative attribute words in Baidu Baike in comparison to Chinese language Wikipedia.
On the other hand, categories like social control, surveillance, CCP, historical events, and positive figures should be more positively associated. And this is exactly what we find. So here is the effect size for propaganda words for each of these categories, and here’s the effect size for evaluative words for each of these categories, along with the p-value for statistical significance. And we find that overall, target words in the categories of freedom, democracy, election, collective action, and negative figures are more associated with negative attribute words in Baidu Baike than they are in Chinese language Wikipedia, and the opposite for categories such as social control, surveillance, CCP, etc.
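The comparison described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the authors’ code: the vectors are invented two-dimensional toys, the same attribute vectors are reused for both embeddings to keep the example short, and the permutation test is implemented as a paired swap of each target word’s scores between the two embeddings, which is one plausible reading of “permuting the assignment of word vectors.”

```python
import math
import random

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def association(target, positive, negative):
    """Mean cosine similarity to positive attributes minus mean similarity to negative ones."""
    return mean(cosine(target, p) for p in positive) - mean(cosine(target, n) for n in negative)

def category_difference(scores_a, scores_b):
    """Average, over target words, of (score in embedding a) minus (score in embedding b)."""
    return mean(sa - sb for sa, sb in zip(scores_a, scores_b))

def permutation_p_value(scores_a, scores_b, n_permutations=2000, seed=0):
    """Two-sided paired permutation test: randomly swap each word's a/b scores."""
    rng = random.Random(seed)
    observed = category_difference(scores_a, scores_b)
    extreme = 0
    for _ in range(n_permutations):
        perm_a, perm_b = [], []
        for sa, sb in zip(scores_a, scores_b):
            if rng.random() < 0.5:
                sa, sb = sb, sa
            perm_a.append(sa)
            perm_b.append(sb)
        if abs(category_difference(perm_a, perm_b)) >= abs(observed):
            extreme += 1
    return extreme / n_permutations

# Invented toy vectors. Dimension 0 leans "positive", dimension 1 leans "negative".
POSITIVE = [[1.0, 0.0], [0.9, 0.2]]
NEGATIVE = [[0.0, 1.0], [0.2, 0.9]]
# Six hypothetical target words from one category, as placed by each embedding.
TARGETS_A = [[0.3, 0.9], [0.2, 0.8], [0.4, 0.9], [0.3, 0.7], [0.1, 0.9], [0.35, 0.85]]
TARGETS_B = [[0.9, 0.3], [0.8, 0.2], [0.9, 0.4], [0.7, 0.3], [0.9, 0.1], [0.85, 0.35]]

scores_a = [association(t, POSITIVE, NEGATIVE) for t in TARGETS_A]
scores_b = [association(t, POSITIVE, NEGATIVE) for t in TARGETS_B]

observed = category_difference(scores_a, scores_b)
p_value = permutation_p_value(scores_a, scores_b)
print(observed, p_value)
```

In this toy setup the observed difference is negative, meaning the category sits closer to the negative attribute words in embedding a than in embedding b. With real embeddings, each corpus would supply its own attribute vectors as well as its own target vectors.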

Molly Roberts  19:31

So one possibility you might think of is that perhaps it’s simply that mainland Chinese internet users view the target categories differently than the overseas internet users contributing to Chinese language Wikipedia, and therefore this difference in word associations between these two sets of internet users is creating the difference between the online encyclopedias. To try to get at this, we did an online survey of about 1,000 respondents in mainland China, and we asked them: between the following options, which do you think better describes a particular target word? We took the closest attribute word from Baidu Baike in the word embedding space, and the closest attribute word from Wikipedia, and we asked people to evaluate them. And what we found is that overall, neither Baidu Baike nor Chinese language Wikipedia seemed to better reflect the associations of our survey respondents. On the x-axis here is the likelihood of choosing the Baidu word. For some categories and some lists of attribute words, Chinese language Wikipedia was preferred, and for some categories and some lists of attribute words, Baidu Baike was preferred. So we didn’t see one of these necessarily dominate the other in terms of users’ evaluations of them. So that rejected the idea that Baidu Baike is just better reflecting people’s word associations. So the third thing that we did was we evaluated the downstream effect of these word embeddings on a machine learning task. The task that we set out to do is to classify news headlines according to sentiment, and we use a big, general Chinese news headlines dataset as our training data.
So say you wanted to create a general news headline sentiment classifier, say to build a recommendation system or to do content moderation on a social media website. If you were creating this algorithm, you might use a general Chinese news headline dataset as training data. And then we’re going to look at how the algorithm that was trained performs on news headlines that contain these target words: words related to democracy, election, freedom, social control, and surveillance, and historical events or figures that might be of interest to the CCP. So we use three different models: Naive Bayes, SVM, and a neural network. And we look at how, using the same training data and the same models, but simply with different pre-trained word embeddings, one that comes from Baidu Baike and one that comes from Chinese language Wikipedia, just using different pre-trained word embeddings can influence the systematic classification error of this downstream task. So overall, do models trained with pre-trained Baidu Baike word embeddings have a slightly more negative classification of headlines that contain democracy than models that were trained using pre-trained Chinese language Wikipedia word embeddings?
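To make the experimental setup concrete, here is a toy sketch of how the same training data and the same classifier can give different labels when only the pre-trained embedding changes. Everything in it is an assumption for illustration: the words, the two-dimensional vectors, and the nearest-centroid classifier, which is a simple stand-in and not one of the three models used in the study.

```python
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))

def headline_vector(tokens, embeddings):
    """Represent a headline as the average of its known word vectors."""
    vectors = [embeddings[t] for t in tokens if t in embeddings]
    return [sum(column) / len(column) for column in zip(*vectors)]

def train_centroids(training_data, embeddings):
    """Average the headline vectors for each sentiment label."""
    by_label = {}
    for tokens, label in training_data:
        by_label.setdefault(label, []).append(headline_vector(tokens, embeddings))
    return {label: [sum(c) / len(c) for c in zip(*vecs)] for label, vecs in by_label.items()}

def classify(tokens, centroids, embeddings):
    """Assign the label whose centroid is most similar to the headline vector."""
    vector = headline_vector(tokens, embeddings)
    return max(centroids, key=lambda label: cosine(vector, centroids[label]))

# Invented vectors shared by both toy embeddings.
SHARED = {
    "growth": [1.0, 0.0], "success": [0.9, 0.1],
    "crisis": [0.0, 1.0], "failure": [0.1, 0.9],
    "hope": [0.5, 0.5],
}
# The two toy embeddings differ only in where they place one target word.
embeddings_a = dict(SHARED, democracy=[0.1, 0.9])  # stand-in for a censored corpus
embeddings_b = dict(SHARED, democracy=[0.9, 0.1])  # stand-in for an uncensored corpus

# Identical training data for both runs; it never mentions the target word.
TRAINING = [(["growth", "success"], 1), (["crisis", "failure"], -1)]
test_headline = ["hope", "democracy"]

label_a = classify(test_headline, train_centroids(TRAINING, embeddings_a), embeddings_a)
label_b = classify(test_headline, train_centroids(TRAINING, embeddings_b), embeddings_b)
print(label_a, label_b)
```

Because the training headlines never contain the target word, the only thing that can move the predicted label here is the position of that word in the pre-trained embedding.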

Molly Roberts  22:57

So this is an example. For the headline “Tsai Ing-wen: Hope Hong Kong can enjoy democracy as Taiwan does,” the Wikipedia label comes out as positive when we train this, but the Baidu Baike label, when we use the same classifier and the same training data, just different word embeddings, comes out as negative. Or for “Who’s shamed by democratization in the kingdom of Bhutan,” the Baidu Baike label comes out as negative and the Wikipedia label comes out as positive, even though the human label here is negative. So what are the systematic mistakes that these classifiers are making? Overall, we see that these classifiers actually have very similar accuracy. So it’s not that models trained on Baidu Baike word embeddings have a higher accuracy than models trained on Chinese language Wikipedia word embeddings; the accuracy is quite similar between these different word embeddings. But we see big effects on the classification error in each of these different categories. So here, L_ij is the human-labeled score for a news headline for target word i in category j. This would be negative one if it was a negative sentiment and positive one if it was a positive sentiment. We get the predicted scores from our models trained on Baidu Baike word embeddings versus Wikipedia word embeddings, and we create a dependent variable that is the difference between two gaps: the gap between the Baidu prediction and the human label, and the gap between the Wikipedia prediction and the human label. Then we can estimate how the difference between the human label and the predicted label changes by category for the Baidu classifier versus the Wikipedia classifier. So our coefficient of interest here is β_j.
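As a rough illustration of the dependent variable and the category coefficient, the sketch below assumes the simplest possible specification: a regression with one indicator per category and no other terms. Under that assumption, the least-squares coefficient for each category equals the per-category mean of the dependent variable, so it can be computed directly. The records are invented, and the actual study may use a richer specification.

```python
from collections import defaultdict

# Invented records: (category, human_label, baidu_prediction, wikipedia_prediction),
# with every label coded as -1 (negative sentiment) or +1 (positive sentiment).
RECORDS = [
    ("democracy", 1, -1, 1),
    ("democracy", -1, -1, 1),
    ("democracy", 1, 1, 1),
    ("surveillance", -1, 1, -1),
    ("surveillance", 1, 1, 1),
]

def category_effects(records):
    """Per-category mean of (Baidu error) minus (Wikipedia error).

    With only category indicators and no intercept, this mean is the
    least-squares estimate of each category's coefficient.
    """
    differences = defaultdict(list)
    for category, human, pred_baidu, pred_wiki in records:
        error_baidu = pred_baidu - human
        error_wiki = pred_wiki - human
        differences[category].append(error_baidu - error_wiki)
    return {category: sum(d) / len(d) for category, d in differences.items()}

effects = category_effects(RECORDS)
print(effects)
```

A negative value for a category means the classifier built on the Baidu-style embedding labels those headlines more negatively, relative to the human label, than the classifier built on the Wikipedia-style embedding; a positive value means the reverse.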
Are there systematic differences in the direction of classification for a certain category between the algorithm trained with Baidu word embeddings and the one trained with Wikipedia word embeddings? And what we find is that there are quite systematic differences across all the different machine learning models, in the direction that we would expect. So the classifiers trained with word embeddings pre-trained on Baidu Baike are overall much more likely to categorize headlines that contain target words in the categories of freedom, democracy, election, collective action, and negative figures as more negative than headlines in the categories of social control, surveillance, CCP, historical events, and positive figures. So just to think a little bit about the potential implications of this: there are strategic incentives. What I hope I’ve convinced you of so far is that censorship of training data can have a downstream impact on NLP applications. And if that’s true, one thing that we might try to think about is: are there strategic incentives to manipulate training data? We do know that there are lots of government-funded AI projects to create and gather more training data that can then be used in AI, in order to push AI along. Might there be strategic incentives to influence the politics of this training data? And how could this play out downstream? So you might think that there would be a strategic benefit for a government, for example, to manipulate the politics of the training data. And we might think that censored training data could in some circumstances reinforce the state, right?
So in applications like predictive text, where we’re creating predictive text algorithms, the state might want associations that are reflective of its own propaganda, and not reflective of things that it would like to censor, to replicate themselves within these predictive text algorithms, right? Or in cases like recommendation systems or search engines, we might think that a state might want these applications trained on data that they themselves curate.

Molly Roberts  27:20

On the other hand, and this I think is maybe less obvious when you first start thinking about this, but became more obvious to us as we thought about it more: censored training data might actually make it more difficult for the state to see society, in ways that might actually undermine some applications. So for example, content moderation: there are a lot of new AI algorithms to moderate content online, whether it’s to censor it or to remove content that violates the terms of service of a website, etc. If content moderation is trained on data that has all of the sensitive or objectionable topics removed, it might in fact be worse at distinguishing between these topics, from the state’s perspective, than if that initial training data were not censored, right? And so we can think about ways in which censorship of training data might actually undermine what the state is trying to achieve. The other way in which it could be problematic for the interests of the state is in public opinion monitoring. So if, for example, a lot of training data were censored, in that it removed opinions or ideas that were in conflict with the state’s, and that was used as training data to then understand what the public thinks, for example by looking at social media data, which we know a lot of states do, this could bias the outcome in ways that would make it harder for the state to see society. So just to give a plug for another paper that’s coming out in the Columbia Journal of Transnational Law: I work with some co-authors on the Chinese legal system, and we show that legal automation, which is one of the objectives of the Supreme People’s Court in China, is undercut by data missingness within this big legal dataset that the Supreme People’s Court has been trying to curate.
So in summary, data reflects, we know, the institutional and political contexts in which it was created. Not only do human biases replicate themselves in AI, but political policies also impact training data, which then has downstream effects on applications. We showed this in word embeddings and in downstream NLP applications, as a result of Baidu Baike and Chinese language Wikipedia word embeddings. But of course, we think that this is a much more general phenomenon that is potentially worthy of future study. And this could have an effect in a wide range of areas, including public opinion monitoring, conversational agents, policing and surveillance, and social media curation. So AI can, in some sense, replicate or enhance an automation of politics. And there have been some discussions about trying to de-bias AI; we think that this might be difficult to do, especially in this context where we’re not really sure what a de-biased political algorithm would look like. And so, thanks to our sponsors, and really looking forward to your questions and comments.

Allan Dafoe  30:41

Thanks, Molly. 66 people are applauding right now. That was fantastic. I know I had troubles processing all of your contributions, and I did a few screenshots, but not enough. So I’m sure we’ll, or I think there’s a good chance we’ll have to have you flick back through some of the slides. A reminder to everyone, there’s a function at the bottom where you can ask questions, and then you can vote up and down on people’s questions. So that’d be great to have people engaging. So now, over to Jeffrey Ding for some reflections.

Jeffrey Ding  31:13

Great. Yeah, this was a really cool presentation. Dr. Roberts sent along an early version of the paper beforehand, so we can unpack the paper a little bit more in the discussion. But I just wanted to say off the bat that it’s a really cool paper, and it’s a good example of kind of flipping the arrow, because in a lot of the related work in this area, most people look at the effect of NLP and language models on censorship. And just flipping the arrow to look at the reverse effects is really cool. And it also speaks to this broader issue in NLP research, where the L matters in NLP. Most of the time, we talk about English language models, but we know there are differences across languages in terms of how NLP algorithms are applied. So we see that with low resource languages, like Welsh and Punjabi, there are still a lot of barriers to developing NLP algorithms. And your presentation shows that even for the two most resourced languages, English and Chinese, there are still significant differences, tied to the censorship. And finally, the thing that really stuck out to me from the paper, and in the presentation, is the understanding and integration of the technical details about how AI actually works, tied to political implications. And one line that really stuck out, and that you emphasized in the presentation, is that the differences you’re seeing and the downstream effects don’t necessarily stem from the training data in the downstream applications, or even the model itself, but from the pre-trained word embeddings that have been trained on another data set. So that’s just a really cool, detailed finding, and a level of nuance that you just don’t really see in the space. So really excited to dig in. I just have kind of three buckets, and then a couple of thoughts to throw at you at the end.
So the first bucket is which target words to choose. And I find it really interesting just thinking through this, because, for example, you pick election and democracy as two of the examples. And democracy actually brings up an interesting question, in that the CCP has kind of co-opted the word democracy, mínzhǔ. It actually ranks second on the party’s list of 12 core values that they published in December 2013. Elizabeth Perry has written on this sort of populist dream of Chinese democracy. So I’d be curious if you thought about that: you know, when you’re in different Chinese cities, you see democracy plastered along all these huge banners. And I wonder, what if you picked a target word like representation, or something that might speak to this more populist dream or populist co-option of what democracy means?

Jeffrey Ding  34:07

And then I think the second point is on the theoretical expectations, kind of tied to this democracy component, of whether we should expect more negative connotations related to democracy in the first place. It’s this idea of the negative historical events and negative historical figures. And the question is, why should we expect a more negative portrayal if these events and figures have been erased from the corpus? Shouldn’t it be basically not positive or not negative, just a neutral take? And I think in the paper you kind of recognize this, and say that there’s very little information about these historical figures, so their word embeddings do not show strong relationships with the attribute words. And I’m just curious if we should expect the same thing with the negative historical events as well; Tiananmen Square is the most obvious example. And then on the results, one quick thing that surprised me a little bit was that you showed that Baidu Baike and Wikipedia perform at the same level of accuracy overall. And the setup of the initial question is that Baidu Baike has become a much better corpus: there’s much more time spent on the corpus, and it’s 16 times larger. So I’m just curious why we didn’t see the Baidu Baike corpus perform better.

Jeffrey Ding  35:37

And then yeah, I had some comments on threats to inference, alternative causes other than censorship that could be producing the results. And actually one of them was just a different population of editors. And it’s cool that you all have already done a survey experiment to combat that kind of alternative cause. I was just thinking, as you were talking about the social media stuff, I wonder if the cleanest way to show censorship as the key driving factor would be to train a language model on a censored sample of Weibo posts versus the population that includes all the Weibo posts from a certain time period. And I know that’s something that other researchers have used to study censorship. And then my last thought, just to open it up to bigger questions that I actually don’t know that much about, but it would be cool to know, and there are a lot of technical people on the webinar who could chime in on this point. The hard part about studying these things is that the field moves so fast. So now people are saying that it’s only a matter of time before pre-trained word embeddings and methods like word2vec are completely replaced by pre-trained language models, like OpenAI’s work, Google’s work, ELMo, GPT-2, GPT-3. And the idea is that pre-trained word embeddings only incorporate previous knowledge into the first layer of the model, and then the rest of the network still needs to be trained from scratch. And recent advances have basically taken what people have done with computer vision, which is to pre-train the entire model with a bunch of hierarchical representations. So I guess word2vec would be just learning the edges, and then these pre-trained language models would be learning the full hierarchy of all of the features, from edges to shapes.
And it’d be interesting to explore whether, to what extent, these new language models would still fall into the same traps, or whether they will provide ways to kind of combat some of the problems that you’ve raised. But yeah, looking forward to the discussion.

Molly Roberts  37:53

Great, fantastic comments, and thank you so much, I really appreciate that. And just to pick up on a few of them. Yeah, we had certain priors about the category of democracy: we thought that overall, it would be more negative. But of course, we did discuss this issue of mínzhǔ and how it’s been used within propaganda within China. The way that we did it was we used both sets of word embeddings to look at all of the closest words to democracy, and get all hundred of those. So it’s not just mínzhǔ, but it’s also all of the other things that are sort of subcategories of democracy. And so it could be that for one of these words, it might be different than for others. Right. And so I think that we’re seeing the overall category, but I think it’s something we should look a little bit more into, because it could piece out some of these mechanisms. Yeah, so one of the things we find with negative historical events and figures is that we get less decisive results in these categories. And we think that this is because Baidu Baike just doesn’t have entries on these negative historical events and figures. I think this is one example of how censorship of training data can sort of undermine the training corpus. Because even from the perspective of the state, if algorithms were using this for, for example, social media or censorship down the road, you would expect the state would want the algorithm to be able to distinguish between these things. But in fact, because of censorship itself, the algorithm is maybe going to do less well. We haven’t shown this yet, but we would expect it to do less well on the censorship task than it would have if the training data weren’t censored in the first place. So that’s an interesting kind of catch-22 from the state’s perspective, right.
And it is interesting that Baidu Baike and Wikipedia, at least in our case, performed with about the same level of accuracy. And there are papers that show that for certain more complicated models, the magnitude of the Baidu Baike corpus is better. But of course, I think it depends on your application. In our case, there wasn’t really a difference in the performance or the level of accuracy.
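[Editor’s sketch] The comparison described above (take the words closest to a target such as democracy in each set of embeddings, then compare how those words associate with evaluative attribute words) can be illustrated roughly as follows. This is a minimal sketch, not the authors’ code: the three-dimensional vectors, the word lists, and the function names `nearest_words` and `association` are all invented for illustration, and the scoring rule (mean cosine similarity to positive attributes minus mean cosine similarity to negative attributes) is an assumption in the style of embedding association tests.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def nearest_words(embeddings, target, k):
    """Return the k words whose vectors are closest to the target word."""
    target_vec = embeddings[target]
    scored = [(cosine(target_vec, vec), word)
              for word, vec in embeddings.items() if word != target]
    scored.sort(reverse=True)
    return [word for _, word in scored[:k]]

def association(embeddings, targets, positive, negative):
    """Mean over targets of (similarity to positive minus negative attributes)."""
    def mean_sim(word, attrs):
        return sum(cosine(embeddings[word], embeddings[a]) for a in attrs) / len(attrs)
    diffs = [mean_sim(t, positive) - mean_sim(t, negative) for t in targets]
    return sum(diffs) / len(diffs)

# Hypothetical toy embeddings standing in for two corpora (values invented).
corpus_a = {
    "democracy": [1.0, 0.6, 0.1],
    "election":  [1.0, 0.5, 0.2],
    "stable":    [0.0, 1.0, 0.0],
    "chaotic":   [0.0, 0.0, 1.0],
}
corpus_b = {
    "democracy": [1.0, 0.1, 0.6],
    "election":  [1.0, 0.2, 0.5],
    "stable":    [0.0, 1.0, 0.0],
    "chaotic":   [0.0, 0.0, 1.0],
}

# Build the target category from the word itself plus its nearest neighbour.
targets_a = ["democracy"] + nearest_words(corpus_a, "democracy", 1)
targets_b = ["democracy"] + nearest_words(corpus_b, "democracy", 1)

score_a = association(corpus_a, targets_a, ["stable"], ["chaotic"])
score_b = association(corpus_b, targets_b, ["stable"], ["chaotic"])
print("corpus A association:", round(score_a, 3))
print("corpus B association:", round(score_b, 3))
```

In this toy setup the same category leans positive in one corpus and negative in the other, which is the kind of contrast the talk describes; with real embeddings one would use many more neighbours and attribute words.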

Molly Roberts  40:21

And I really liked this idea of looking at censored versus uncensored corpuses of Weibo posts to try to understand how that could have a downstream effect on training; I think that’d be a great way to piece that out. And then this point that you have about pre-trained language models superseding pre-trained embeddings in this transfer learning task, I think that’s a really, really interesting development. And I think that this only makes these questions of what the initial training data is in transfer learning become more and more important, right? Because whatever has the most data, that data itself is amplified by the algorithm downstream. And it’s hard to think about how to delete those biases without actually just fixing the training data itself, or making it more representative of, you know, the population or the language, etc. So yes, I’m looking forward to more discussion. And thank you so much for these awesome comments.

Allan Dafoe  41:35

Great, Jeff, do you want to say any more?

Jeffrey Ding  41:38

Yeah, that last point is really interesting, because there are some people saying that, basically, NLP is looking for its ImageNet: you know, this big, really representative, really good data set that you can then train the language models on, and then you do transfer learning to all these downstream tasks. And yeah, I think your paper really points to the question of what happens if Baidu Baike becomes the ImageNet of Chinese NLP. And you know, I don’t know enough of the technical details in terms of whether there are ways in transfer learning to do some of the debiasing from the original training set. But yeah, I think, obviously, the paper will still be super relevant to wherever the NLP models are going.

Allan Dafoe  42:30

Great. Well, I have some thoughts to throw in, and I see people asking questions and voting. So that’s good. We’ll probably start asking those too eventually. And also, you two should continue saying whatever you want to say. But just some brief thoughts from me. So I also really like this idea of doing this kind of analysis on a corpus of uncensored data, where you have an indicator for whether the post was censored. And of course, Molly, I think it was your PhD work in which you did this. Yeah. So Molly’s already been a pioneer in this research design. And I think it would just be a nice complement to this project. Because in this project, you know, you have two nice corpuses, but it’s not obvious what’s causing the differences. It could be censorship, it could be fear, it could be different editors, or just contributors. And whereas that would really, I mean, you’d get the result anyway, so I think that’d be really cool. Okay, so a question I have, and maybe one way to phrase it, is: how deep are these biases in a model trained on these corpuses? And I’d say, currently, we don’t know of an easy solution for how you can remove notions of bias or meaning from a language model. You can often remove the connections that you think of, but then there are still lots of hidden connections that you may not want. Now, here’s maybe an idea for how you can look at how robust these biases are. I think it was your study three. I don’t know if you’re able to flick to that slide. So you have a pre-trained model. And then in study three, you gave it some additional, right, some additional training data. Okay, yeah.

Molly Roberts  44:36

I’ll go to the setup.

Allan Dafoe  44:39

Yeah, exactly. Yeah. So you have your pre-trained model, and then you give it some additional training data, right, these Chinese headlines. And the graph I think I want to see is your outcome, so how biased it is, as a function of how much training it’s done. Initially, it should be extremely biased. And then as you train, well, I think this applies to study three, but if not study three, you could just do it for another corpus where you have the intended associations in that corpus, and see how long it takes for these inherited biases to diminish. You know, maybe they barely diminish at all, maybe they very rapidly go down with not too large of a data set. Maybe you never quite get to no bias, but you get quite low. So, yeah, that might be one interesting way of looking at how deep these biases are, and how hard it is to extract them. And I guess another question for you, and a question for anyone in the audience who is a natural language expert, is: are there techniques today, or likely on the horizon, that could allow for unraveling or flipping the biases, without requiring almost overwhelming the pre-trained data set, having some way of doing surgery to change the biases in a way that’s not just superficial, but fairly deep? And so, for example, if you have this uncensored and censored data set, you could infer what biases are being produced by censorship. And then, even if you have only a small dataset of this uncensored and censored data, the biases that you would learn from that, you could then, like, subtract out of these larger corpuses. And I guess the question is, how effective would that be? And I don’t expect we know the answer, but it might be worth reflecting on.
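[Editor’s sketch] One very simple reading of the “subtract that out” idea is to estimate an average shift between censored and uncensored word vectors from a small paired sample, then remove that shift from vectors trained on a larger censored corpus. The sketch below is a toy illustration only: the words, the two-dimensional vectors, and the function names `mean_offset` and `subtract_offset` are hypothetical, and it assumes that both sets of vectors already live in the same coordinate space and that censorship acts as a single uniform shift. Both are strong assumptions, and as noted in the discussion, it is an open question how well anything like this would work in practice.

```python
def mean_offset(censored, uncensored, words):
    """Average per-dimension difference (censored minus uncensored) over paired words."""
    dim = len(censored[words[0]])
    total = [0.0] * dim
    for word in words:
        for i in range(dim):
            total[i] += censored[word][i] - uncensored[word][i]
    return [t / len(words) for t in total]

def subtract_offset(embeddings, offset):
    """Remove the estimated shift from every vector in a larger embedding set."""
    return {word: [x - o for x, o in zip(vec, offset)]
            for word, vec in embeddings.items()}

# Hypothetical small paired sample: the same words embedded from an
# uncensored corpus and from its censored counterpart (values invented).
small_uncensored = {"protest": [0.5, 0.5], "petition": [0.4, 0.6]}
small_censored = {"protest": [0.5, 0.1], "petition": [0.4, 0.2]}

# Hypothetical larger censored-corpus embeddings to be corrected.
large_censored = {"strike": [0.3, 0.0]}

offset = mean_offset(small_censored, small_uncensored, ["protest", "petition"])
corrected = subtract_offset(large_censored, offset)
print("estimated offset:", offset)
print("corrected vector for 'strike':", corrected["strike"])
```

A correction like this only touches associations that move along the estimated direction; the hidden connections mentioned earlier in the discussion would be left in place.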

Molly Roberts  46:50

Those are really great points and really interesting points, and, you know, we’re really standing on the shoulders of giants here. There’s this whole new literature, and I’m embarrassed that my tech didn’t work, and I’ll post links to these papers later, on bias within AI with respect to race and gender. And certainly one of the things that this literature has started to focus on is what the harms are in the downstream applications. So when you ask how deep these biases are, I think one of the things that we have to quantify downstream is: how is this data being used within applications? And then how does the application affect people’s decision making or people’s opportunities, etc., down the road? And I think that’s a really hard thing to do, but it’s important. And I think that’s one of the ways we want to go, inspired by this literature. I think how bias is a function of the training data is really interesting. And I think we’ve done a few experiments on that, but I think we should include that as a graph within the paper. And certainly, as you get more and more training data, the word embeddings will be less important, right? That would be at least my prior on that. And I think this idea of trying to de-bias, like, how would you subtract out the bias? I like the idea of trying to figure out, if you had an entire uncensored corpus, along with what information was censored, and then trying to reverse engineer what the things are that are missing, right, and then adding that back into the corpus, that would be sort of the way to go about it. It seems hard. And one of the things that it doesn’t overcome is self-censorship.
Because of course, if people didn’t originally add that information to the corpus, you never even see that within the data. And also, the training data itself is affected by algorithms, because of, you know, what people talk about. So for example, on social media, it might be that a lot of people are talking about a lot of different political topics, but certain conversations are amplified, say, by moving them up the newsfeed, and others are moved down the newsfeed. And so then you get this feedback loop on what the training data is like. But then, if you use that again as training data, it amplifies that again. So I think that there are so many complicated AI feedback loops within this space that they’re really difficult to piece out. But that doesn’t mean we shouldn’t try. Yeah, yeah.

Allan Dafoe  49:33

Yeah. A thought that occurred during your talk is that I can imagine the future of censorship being more like light editing. So I submit a post, and then the language model says, let’s use these slight synonyms for the words you’re using, which have a better connotation. And you can imagine the whole social discourse being run through this filter of the right associations. And I guess a question for you then, also on this, is: is there like an arms race with citizens? So if citizens don’t entirely endorse the state-pushed associations, what countermeasures can they take? So, you know, if one word is appropriated, can they deploy other words? And I know there are kind of symbolic games where you can use a symbol as a substitute for a censored term. And so, yeah, is there this kind of arms race dynamic happening, where the state wants to control associations and meanings, and people want to express meanings that are not approved of, and so then they change the meaning of words? And you know, maybe in China we would even see a faster cycling or evolution of the meaning of words, because you have this cat and mouse game?

Molly Roberts  50:55

Yeah, I think that’s absolutely right. And I have definitely talked to people who have created applications for suggesting words that would get around censorship, right. And that, you know, would be an interesting technology cat and mouse game around this, with AI being used to censor and also AI being used to evade censorship. I think one of the interesting implications, if you think about the political structure of AI, is that you have, you know, maybe a set of developers who aren’t necessarily political themselves. They’re creating applications that are, you know, productivity applications, entertainment applications, that are being used by a wide variety of people. And they’re looking for the biggest data, right, the most data, the data that’s going to get them the highest accuracy. And because of that, I think the state has a lot of influence over what types of training data sets are developed, and has a lot of influence on these applications, even if the application developers themselves are not political. And I think that’s an interesting interaction. You know, I’m not sure how much states around the world have thought about the politics within training data, but I think it could be something that they start thinking about, and it might be something to try to understand: you know, as training data becomes more important, how they might try to influence it. Yeah.

Allan Dafoe  52:28

Good. Well, we’re at time. So yeah, the remaining questions, I’m afraid, will go unanswered. There was a request for your attention: what was the paper, the automating fairness paper? And also, I’m sure people are excited for this paper to come out. So yeah, we look forward to seeing this come out, and to continuing to read your really fascinating and creative, and I guess especially empirical, work. Your work is really thoughtful and effortful, in the extent to which you use different quantitative designs and experimental designs to answer these questions, and almost kind of field experimental designs, I guess. You can only deploy these experiments if you know the nature of the political phenomenon well enough, and, I guess, have the resources to devise these experiments that you have been doing. So it’s very exciting work, and thanks for sharing the latest today.

Molly Roberts  53:39

Thanks. Thanks so much for having me. And yeah, thanks, Jeff, also for your fabulous comments.

Molly Roberts  53:47

Thanks, everybody, for coming.

Further reading