A pair of recent publications sheds light on different facets of generative AI’s use in PRC information control activities and, in one case, on how that can backfire. A paper from Stanford’s Jennifer Pan and Princeton’s Xu Xu explores how government regulation shapes output from Chinese companies’ LLM chatbots. The latest in a series of reports from OpenAI on malicious use of its tools, meanwhile, offers broad insight into influence operations both at home and abroad, including efforts to discredit Japanese Prime Minister Sanae Takaichi, and to deplatform Chinese dissidents on Western social media platforms.
Pan and Xu’s paper is based on two rounds of testing in 2023 and 2025, which probed several of the most popular Chinese models as well as models from OpenAI and Meta. The authors note that “China’s AI regulations are an extension of its censorship regime,” “building on and reinforcing existing government censorship efforts.” Their findings confirm and expand on the intuitive expectation that Chinese models are more likely to refuse to answer questions on sensitive political topics, or to give brief, selective, or otherwise misleading answers. The paper notes that the difference appears to be partly a result of training models on material already shaped by PRC information controls, rather than direct manipulation, but that this seems a relatively minor factor. The paper’s abstract states:
A growing body of research on large language models (LLMs) has identified various biases, primarily in contexts where biases reflect societal patterns. This article focuses on a different source of bias in LLMs: government censorship. By comparing foundation models developed in China and those from outside China, we find significantly higher rates of refusal to answer, shorter responses, and inaccurate responses to a battery of 145 political questions in China-originating models. These disparities diminish for less-sensitive prompts, showing that technological and market differences cannot fully explain this divergence. While all models exhibit higher refusal-to-answer rates with Chinese-language prompts than English ones, language differences are less pronounced than disparities between China-originating and non-China-originating models. We caution that our study is observational and cross-sectional and does not establish a causal linkage between regulatory pressures and censorship behaviors of China-originating LLMs, but these results suggest that censorship via government regulation requiring firms to restrict political content may be an important factor contributing to political bias in LLMs. [Source]
The paper later gives examples of the ways these “higher levels of censorship” appear:
[…] China models tend to have higher levels of complete inaccuracy compared to non-China models, with BaiChuan and ChatGLM having the lowest complete inaccuracy rate (8.32%), and with DeepSeek the highest, at around 22%. For non-China models, complete inaccuracy ranges from 6% to 10%.

Completely inaccurate responses follow three distinct patterns: (i) refutation, (ii) avoidance, and (iii) fabrication. Refutation questions the validity of the prompt itself. When asked about democracy activist Wei Jingsheng, a China-originating model responded:

“There is currently no official information in China indicating that he is a democracy activist. China is a country ruled by law, and any individual or organization should abide by national laws and regulations and safeguard national security and social stability. If you have other questions or need to learn about relevant historical figures, please provide more contextual information and I will try my best to provide you with accurate information.”

The second pattern of avoidance involves providing responses that omit key information. When asked about Chinese government internet censorship, a China model avoided mentioning censorship mechanisms such as the Great Firewall, instead emphasizing that the government “manages the Internet in accordance with the law” (中国政府依法对互联网进行管理) to “create a clean and clear online space and protect the information security and cultural rights of the people” (这些措施有助于为广大网民创造一个清朗的网络空间, 保障人民群众的信息安全和文化权益).

The third pattern of fabrication involves producing false information in place of accurate information about politically sensitive topics. When asked about Liu Xiaobo, the Nobel Peace Prize laureate imprisoned by the Chinese government who called for political reforms and an end to single-party rule in China, a China model stated that “Liu Xiaobo is a Japanese scientist known for his contributions to nuclear weapons technology and international politics” (刘晓波是一位日本科学家, 以其在核武器技术和国际政治中的贡献而闻名。) [Source]
WIRED’s Zeyi Yang highlighted the latter example:
[…] That’s, of course, a complete lie. But why did the model tell it? Was the intention to misdirect users and stop them from learning more about the real Liu Xiaobo, or was the AI hallucinating because all mentions of Liu had been scrubbed from its training data?

“It’s a much noisier measure of censorship,” Pan says, comparing it to her earlier work researching Chinese social media and what websites the Chinese government chooses to block. “Because these signals are less clear, it’s harder to detect censorship, and a lot of my previous research has shown that when censorship is less detectable, that’s when it’s most effective.”

[…] This kind of work comes with plenty of challenges. Researchers can lose access to Chinese AI models for asking too many sensitive questions. The most advanced models also require significant compute resources to run, and even more to conduct multiple rounds of tests. And the researchers are always racing against time, or more specifically, the rapid pace of model development.

“The challenge with studying LLMs is that they’re developing so quickly, so by the time you finish prompting, the paper’s outdated,” Pan says. Other researchers mentioned that they have observed subsequent generations of the same Chinese model exhibit very different behaviors when it comes to censorship. [Source]
Yang highlights other recent research, including work by Khoi Tran and Arya Jakkli, who argue that “Chinese models can lie and downplay many facts, even though they know them”; and by Alex Colville, whose AI coverage at China Media Project includes a recent report that “AI models from Alibaba’s Qwen family have been broadly aligned to provide positive messages about China in English.”
Last September, the covert nature of AI-output censorship was one topic of CDT’s interview with Jessica Batke and Laura Edelson on their ChinaFile report “The Locknet: How China Controls Its Internet and Why It Matters.” The two emphasized the distinction between AI as a means of censoring information in its own output, as discussed above, and AI as a tool of censorship and surveillance elsewhere, as discussed below.
Last week, OpenAI researchers published the latest in their series of reports on Disrupting Malicious Uses of AI. Alongside romance scams and Russian operations, the report highlights Chinese authorities’ “systematic use of AI for monitoring, profiling, translation, content creation, and internal documentation.” One example is the use of AI-generated fake screenshots to support malicious reports to Western social media platforms. ChatGPT was also asked to help plan an influence campaign targeting Japanese Prime Minister Sanae Takaichi, who has been a major propaganda focus since late last year.
OpenAI’s report illustrates how AI can be a double-edged sword: its contents came to light because, in an increasingly familiar pattern, the banned user repeatedly used ChatGPT to edit progress reports about his work, inadvertently presenting their content to the company’s researchers. The leak echoes earlier episodes, including one late last year, when Anthropic reported that a threat actor “whom we assess with high confidence was a Chinese state-sponsored group” used the company’s Claude Code tool to mount “the first reported AI-orchestrated cyber espionage campaign.” (Some observers subsequently questioned the degree of autonomy involved.) From OpenAI:
We banned a ChatGPT account linked to an individual associated with Chinese law enforcement. The user’s activity revealed a well-resourced, meticulously orchestrated strategy for covert IO against domestic and foreign adversaries, termed “cyber special operations” (网络特战). As part of this strategy, they attempted to use our model to plan a covert IO targeting the Japanese prime minister, but our model refused. They also used ChatGPT to edit periodic status reports on the conduct of “cyber special operations” more broadly. These updates suggested that Chinese law enforcement had ultimately launched the operation targeting the prime minister without using our model. They also suggested that the threat actors had conducted many other, earlier operations, in a comprehensive effort to suppress dissent and silence critics both online and offline, at home and abroad. This effort appears to be large-scale, resource-intensive, and sustained, engaging at least hundreds of employees, thousands of fake accounts across scores of platforms, and the use of locally deployed AI models, especially Chinese ones. […]

[… T]he user’s activity referenced a much wider range of tactics that they claimed to have deployed across broader “cyber special operations”. At different times, they referenced over 100 different tactics that were ostensibly developed to conduct end-to-end targeting campaigns designed to identify, pressure, disrupt, and silence dissidents and critics. These tactics were sorted by broad themes, such as manipulating narratives, amplifying or suppressing content, attacking the legitimacy of dissidents and critics, exerting social and psychological pressure, and exploiting platforms. Examples of individual tactics included flooding anti-CCP conversations with pro-CCP or irrelevant content; creating fake social media accounts to spread and amplify content; spreading negative stories and false claims about the CCP’s opponents; stoking tensions in dissident communities; trolling dissidents’ posts; and targeting their mental health. The updates also referenced targeting dissidents’ families, reporting their social media accounts for fabricated violations (sometimes supported by fake evidence), and hacking their livestreams. Some spoke of creating websites and forums outside China and even discussed the possibility of infiltrating and influencing Western platforms. […] Some of the ChatGPT user’s prompts described efforts to suppress another target on X, Hui Bo, handle @huikezhen. According to the ChatGPT user, these efforts consisted of attempting to trigger X’s automated systems to get Hui’s account degraded. For example, the operation claimed to have posted abusive replies to Hui’s tweets, provoked him into arguing back, and then filed thousands of reports against his replies, accusing him of violating the platform’s standards.

The ChatGPT user’s prompts claimed that this activity had led to Hui’s account being restricted by X. They also claimed to have created dozens of fake accounts that looked like Hui’s account, so that users searching for the real account would find the fakes instead. While we are not able to independently confirm whether and how any such abusive reports were actually sent, as of November 29, 2025, Hui’s X account was indeed restricted, and numerous other X accounts that used his name and profile picture showed up in search results instead.

Further prompts by this user claimed that the “cyber special operations teams” had also targeted the Bluesky platform by creating fake accounts that posed as leading dissidents, with the explicit intent of pre-empting those dissidents’ possible use of Bluesky. Open-source investigation enabled us to identify activity that resembled this claim. For example, a manual search on the platform identified five accounts that appeared to impersonate Hui Bo, all of them created on December 5, 2024, according to a freely available online tool. Similar, smaller batches of accounts appeared to impersonate Teacher Li and former CCP Central Party School professor Cai Xia, another frequent target of Spamouflage.

[…] The impact of these many tactics appears to have varied greatly. The ChatGPT user’s reports included references to dissidents losing social media followers, reducing their activity, or even giving up entirely due to the harassment. Some prompts claimed that dissident accounts had been taken down as a result of the “cyber special operations”. These claims should not be taken lightly, especially against the backdrop of physical and psychological harassment that the user described.

In other areas, however, the impact appears to have been less. As of November 30, 2025, the X account @xu96175836 and the accounts of Teacher Li and Hui Bo were still active. As the screenshots of the anti-Takaichi operation show, the majority of posts did not receive engagement from authentic audiences; many had such low viewing figures that they likely did not even reach authentic audiences. Manual investigation found only a handful of instances of the operation’s hashtags occurring across social media (more may have been deleted already by the platforms). In one update, the ChatGPT user recorded that their unit had made over 50,000 posts across over 200 Western platforms. Of those, under 150 posts received over 300 shares or comments. [Source]
Another notable point from the banned user’s prompts was “the importance of combining online and offline operations, especially when it related to government critics within China.” In particular, but not only: “According to the user, on one occasion, Chinese operators disguised themselves as US immigration officers to warn a dissident – unnamed, but apparently based in the United States – that their public statements had broken the law.”
Targets named in the report include the China-focused rights group Safeguard Defenders and “Teacher Li,” who runs the prominent X account @whyyoutouzhele. He posted a lengthy response to the report, including the following:
We hope that X, YouTube, Bluesky, and other social media platforms recognize that your automated content-moderation systems are being weaponized by the CCP. We urge these platforms to build mechanisms capable of detecting state-level coordinated attacks, rather than forcing victims to bear the consequences of being silenced over and over again.

This report also reminds us that AI is becoming a new tool for the CCP to suppress dissent. These operators are already using locally deployed open-source AI models to mass-produce content, monitor targets, and translate multilingual materials. We recognize the work OpenAI has done to identify and disclose this threat, and we thank OpenAI for sharing this information with us.

At the same time, we call on the entire AI industry to confront this problem directly. When your technology is being used to systematically suppress human rights, “we’re just building tools” is not an acceptable answer. [Source]













