‘The heart is deceitful above all things and beyond cure. Who can understand it?’ — Jeremiah 17:9
II. Lowering the Floor for Harm
The most significant and immediate risk factor for NSA adoption of increasingly powerful AI is its unprecedented accessibility. Open-source models, commercial off-the-shelf drones and GPUs, and generative AI platforms have dramatically lowered the financial and technical barriers to entry for intelligent warfare, terrorism, and societal manipulation. The ability of these technologies to produce disproportionate effects on states without a large investment in time, expertise, or conventional weaponry is a key motivator for NSAs to adopt them, amplifying the effectiveness and speed of their tactics.
Cyber and information warfare
The information domain is a primary arena for early exploitation by NSAs. Large Language Models (LLMs) and other generative AI tools allow NSAs to overcome resource limitations and language barriers, producing vast amounts of propaganda at minimal cost; indeed, AI-generated propaganda may be more persuasive than anything these groups could have written themselves. AI labs including OpenAI, Anthropic, and Google regularly report disrupting such activities.
A notable example is ISIS’s AI-generated ‘News Harvest’ bulletins, which feature AI-generated anchors made to resemble an Al Jazeera broadcast. This makes the content appear more credible and harder for moderation systems to detect and remove, while also reducing the time, skill, money, and energy required to produce it. This is an early step toward a future where NSAs can generate entire, credible-looking news or social media channels, flooding the information space with propaganda that is difficult to distinguish from legitimate journalism. It also reduces the need for charismatic human ideologues to be the ‘face’ of a movement.
AI-generated news anchor reads a report by ISIS’s Amaq News Agency, presenting the group’s attack on Crocus City Hall, Moscow, as a victory.
AI-generated propaganda related to the Israel-Hamas war has also spread through social channels. These posters, shared on official Al-Qassam Brigades channels, are likely AI-generated.
This tactic of creating seemingly legitimate news outlets is a core component of China-linked disinformation campaigns like Spamouflage (also known as Dragonbridge). For instance, in a campaign dubbed ‘Sneer Review,’ operatives used ChatGPT both to generate a barrage of critical comments against a Taiwanese video game and then to write a long-form article claiming the game had received backlash. China-linked actors have created fake news websites that publish AI-generated summaries of state media articles in multiple languages to facilitate the spread of CCP propaganda through what is perceived as independent journalism. These campaigns also leverage AI to create memes and videos featuring AI-generated news anchors to attack political candidates, as in the lead-up to the 2024 Taiwanese presidential election. While this state-level activity is a threat in itself, it is potentially more catastrophic in that it provides a model for non-state actors: the public reporting on these campaigns unintentionally serves as a form of knowledge transfer, and agile NSAs can observe these tactics and replicate them.
AI also enhances the radicalization and recruitment pipelines of extremist groups. These systems can analyze individuals’ online behavior and digital footprints to identify vulnerabilities and tailor propaganda that resonates with specific psychological profiles, increasing the likelihood of successful recruitment. The Islamic State Khorasan Province (ISKP) is notable for extensive propaganda efforts, managed by its media wing, the Al-Azaim Foundation, which promotes ‘media jihad’ by producing multilingual content tailored for different audiences. After the Kabul Airport bombing in August 2021, for example, ISKP targeted disillusioned Taliban members by branding the peace agreements with the Taliban as a betrayal of Sunni Muslims, aiming to exploit rifts and position itself as the sole purveyor of authentic jihad. While this form of targeted propaganda is a classic insurgent tactic, AI provides the tools to execute it at unprecedented scale and precision. At the surface level, generative AI largely eliminates the classic tells of phishing or misinformation attacks, such as poor grammar and awkward phrasing, which have long served as crucial red flags for potential victims. LLMs can also analyze vast amounts of online data to identify groups and specific individuals expressing disillusionment or ideological questioning, enabling automated micro-targeting in which tailored messages, videos, and texts are delivered directly to the most receptive audiences and individuals, whether as short-form content or personally delivered by a bot. Most importantly, these attacks are increasingly simple and fast to execute. Nothing is stopping a non-state actor from using this playbook today.
AI also lowers the technical barrier to entry for cybercrime, enabling smaller, less advanced groups or individuals with malicious intentions to conduct operations that previously required significant manpower, expertise, or finances. Anthropic reported a case in which a novice actor used a Claude model to rapidly develop malware with sophisticated features like facial recognition, dark web scanning, and a graphical user interface for generating fully undetectable payloads, thereby enhancing their technical abilities. The delineation between a motivated actor and a ‘skilled cyber operator’ is blurring every day as generative AI becomes an enabler for these methods; we may imagine a situation in which a single motivated individual bootstraps these methods for an insurgent group to carry out attacks on infrastructure or social targets. In a more alarming case, which Anthropic labeled ‘Vibe Hacking,’ a sophisticated threat actor employed the Claude Code model as an agentic partner in a large-scale data extortion campaign that targeted at least 17 organizations, including healthcare and government entities. The AI was used to automate the entire attack lifecycle: conducting reconnaissance on thousands of targets, harvesting employee credentials, assisting in network penetration, writing ransom letters, and even being delegated authority to make tactical and strategic decisions through a provided TTP framework. Anthropic concluded, “While we have taken steps to prevent this type of misuse, we expect this model to become increasingly common.” This is about as explicit a warning as we can receive that, sans intervention, frontier models will, themselves, create these exponentially growing risk factors.
The emergence of specialized illicit AI tools such as WormGPT and FraudGPT on dark web forums, explicitly marketed as black-hat alternatives to mainstream LLMs, along with a market for proven jailbreaks of mainstream LLMs, further accelerates this trend by providing ready-made, off-the-shelf capabilities. Based on open-source LLMs (such as GPT-J), these illicit models are trained on malware-related data, collections of phishing templates, and compromised datasets, and are advertised as capable of generating persuasive phishing emails, writing malicious code, finding leaks and vulnerabilities in target systems, and creating undetectable malware. While doubts may exist as to the reliability of illicit models relative to jailbroken mainstream models, it is undeniable that the primary impact of this burgeoning illicit economy is the lowering of the financial, technical, and knowledge barriers to entry for sophisticated cybercrime. This will, in turn, inspire malicious actors who were otherwise discouraged by these barriers. Open-source offensive security frameworks like HexStrike-AI, an orchestration layer that allows LLMs to direct a suite of hacking tools through complete attack chains to identify vulnerabilities and deploy exploits, are also being weaponized by NSAs to these same ends, drastically compressing the attack timeline. This watershed moment was, in many ways, inevitable, as AI-powered security tools proliferate on the market.
Kinetic warfare
Unmanned Aerial Vehicles (UAVs)
Between 2016 and 2022, researchers documented 440 cases of NSAs like ISIS, Ansar Allah/the Houthis, and Hay’at Tahrir al-Sham (HTS) using weaponized UAVs, which have struck everything from armored vehicles to civilian homes to oil facilities. Once these UAVs become AI-enabled, the quantity, impact, and targets of these attacks will be perilously transformed. The Russo-Ukrainian War has become the world’s most significant laboratory for modern warfare: the first major conflict where AI-integrated drone technology has been used at scale. As AI-enabled systems prove their strategic utility, and indeed devastation, in high-intensity environments, these battlefield innovations provide a blueprint for NSAs globally.
NSAs are adept at converting commercially available technology into lethal weapons, and as AI makes drones smarter and more autonomous, especially in the West, we must face the risk of NSAs developing and deploying increasingly sophisticated autonomous weapons systems. Advancements in autonomous drones provide tangible operational advantages, including improved target recognition and navigation in complex environments and, critically, a heightened resilience to electronic warfare that allows a drone to complete its mission even when jammed, or to avoid detection altogether. This furthers an already potent asymmetric advantage, born primarily of the proliferation of cheap commercial drones. Such asymmetry is already palpable now that advanced systems are commercially available and, in some cases, fall into the hands of non-state actors.
Commercial-off-the-shelf (COTS) drones provide a critical lens. Take Dà-Jiang Innovations (DJI) as an example. Their drones are globally available, relatively inexpensive, user-friendly, highly adaptable, and increasingly capable across a range of munitions and purposes. Performance metrics, especially transmission distance, have improved dramatically, with some 2023 models boasting ranges of over 12 miles, a significant increase from the sub-one-mile ranges of a decade ago. A key consequence is that operators can remain at a safer, more concealed distance from their targets. These factors have fueled their proliferation, which has in turn inverted the traditional cost-benefit calculus of air defense. States are being forced into an economically unsustainable position, using multi-million-dollar missile systems to counter drones that may only cost a few hundred or few thousand dollars to produce and deploy. State-of-the-art air defense systems, such as the Patriot or Israel’s Iron Dome, were designed and priced to counter expensive, high-end threats like fighter jets and ballistic missiles. Using a multi-million-dollar interceptor to destroy a thousand-dollar drone is a losing economic proposition, especially when faced with the prospect of swarm attacks that saturate defenses. The implication is that NSAs can achieve significant strategic effects, such as disrupting global shipping, terrorizing civilian populations, or degrading military assets, not by defeating a state’s military outright, but by simply exhausting its economic capacity to defend itself.
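A rough, back-of-the-envelope comparison makes the inverted calculus explicit. The figures below are illustrative assumptions consistent with the ranges cited above (a multi-million-dollar interceptor against a drone costing a few thousand dollars), not sourced procurement costs:

\[
\text{cost-exchange ratio} \approx \frac{\$3{,}000{,}000 \ \text{per interceptor}}{\$2{,}000 \ \text{per drone}} = 1{,}500
\]

In other words, under these assumptions the defender spends on the order of a thousand dollars for every dollar the attacker spends, before even accounting for the finite stock of interceptors that a saturation attack is designed to exhaust.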
ISIS, among other terrorist groups like Hamas, has adopted these drones, releasing a propaganda video featuring the inexpensive DJI Phantom quadcopter, and uses them for surveillance, intelligence collection, and target guidance. A new DJI release, the FlyCart 30, is designed to carry a payload of up to 88 pounds, or 66 pounds over 10 miles with the dual-battery setup, at a speed of 45 miles per hour. Priced at around $20,000, the FlyCart 30 offers NSAs a relatively low-cost platform for bombing, kamikaze, or CBRN attacks, capable of carrying large and destructive explosives.
Turkey’s STM Defence sells the Kargu drone, a system possessing a 1.1 kg warhead, autonomous tracking and strike capabilities, a facial recognition system, and swarm capabilities. These systems were delivered to Libyan militias, which have used them to attack logistics convoys and armed forces affiliated with Khalifa Haftar. Regardless of whether the drones were acquired via an open market or a direct transfer, this portends a world wherein lethal autonomous weapons are a threat wielded not just by states, but by non-state actors as well.
NSAs acquire drones, or their components, in a variety of ways. The Houthi Rased, or “surveyor” drone, is nearly identical to the Skywalker-8 manufactured by Skywalker Technology Limited of China, and the Houthi Qasef-1 is nearly identical to the Iranian Ababil-T. The Skywalker-8 can be purchased online, while the Qasef-1’s components and design are likely supplied directly by the Islamic Regime in Iran, smuggled through an array of land and sea routes, and assembled in Yemen by the Houthis themselves. In one case, servoactuators exported from Japan to Abu Dhabi were intercepted before they reached their intended recipient in Yemen. Qasef-1 UAVs documented by Conflict Armament Research have employed circuit boards with microcontrollers produced by the American company Atmel, general voltage regulators produced by the Swiss company STMicroelectronics, microprocessors produced by the American company Digi International, petrol model engines possibly produced by the Chinese company Mile Hao Xiang Technology Co. Ltd., and servomotors produced by the American company Hitec.
The ability to perform complex AI computations directly on the drone (edge computing) is a critical enabler of true autonomy, especially in environments where communication links may be jammed or unavailable. Edge devices can process captured data using a system of sensors, an edge processor, an inference engine, and communication and navigation modules, without relying on cloud infrastructure. Advances in powerful, low-cost, and low-power processors, such as the NVIDIA Jetson series, are a key factor in allowing real-time processing of sensor data for tasks like object recognition and targeting. This onboard processing is made feasible for small, power-constrained drones through techniques like model quantization, which reduces the precision of an AI model’s parameters, shrinking the model’s size by up to 75% and significantly lowering its computational and energy requirements. This allows complex computer vision models, such as YOLO12 for real-time object detection, to run efficiently on drone hardware. Here, the instructions and expertise provided by LLMs, or the scant resources now required thanks to their proliferation, may come in handy, helping an NSA connect an edge device running these models to a COTS drone.
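The 75% figure cited above follows directly from the storage arithmetic of quantization: casting 32-bit floating-point weights to 8-bit integers cuts per-parameter storage from four bytes to one. The sketch below is a minimal, hypothetical illustration of that arithmetic, using PyTorch’s post-training dynamic quantization on a toy fully-connected network rather than any model or system discussed here.

```python
# Minimal illustrative sketch (toy network, not any operational system): PyTorch
# post-training dynamic quantization stores Linear-layer weights as 8-bit integers
# instead of 32-bit floats, cutting per-parameter storage from 4 bytes to 1 byte.
import io

import torch
import torch.nn as nn


def serialized_size_mb(module: nn.Module) -> float:
    """Approximate stored size of a module by serializing its state dict to memory."""
    buffer = io.BytesIO()
    torch.save(module.state_dict(), buffer)
    return buffer.getbuffer().nbytes / 1e6


# Toy stand-in model; a real vision model is far larger, but the size ratio is similar.
model = nn.Sequential(
    nn.Linear(1024, 2048),
    nn.ReLU(),
    nn.Linear(2048, 1024),
    nn.ReLU(),
    nn.Linear(1024, 10),
)

# Convert Linear layers to int8 weights; activations are quantized on the fly at inference.
quantized = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

print(f"float32 weights: {serialized_size_mb(model):.2f} MB")
print(f"int8 weights:    {serialized_size_mb(quantized):.2f} MB")  # roughly 4x smaller
```

The same arithmetic is what makes it plausible to fit larger models onto low-power edge boards, though quantization can cost some accuracy depending on the model and task.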
There are several barriers to the proliferation of AI within NSA kinetic systems. Systems integration, for one, is key: while open-source computer vision software is readily available, reliably integrating and fusing data from a suite of sensors to create a coherent and accurate understanding of a complex and contested environment is a significant engineering challenge. Onboard autonomy also requires powerful processing hardware that is difficult for NSAs to acquire; further, there is a fundamental trade-off between processing power, energy consumption, and the drone’s size, weight, and power (SWaP) constraints, since more powerful processors consume more energy, which requires larger batteries, increasing weight and reducing flight time and maneuverability. And if a mainstream model cannot be jailbroken, or an open-source model cannot be adapted, then a bespoke one will require vast amounts of relevant training data (for instance, thousands of images of armored vehicles from multiple angles), which is difficult to collect, as well as significant compute, which is difficult to acquire. This is a more realistic scenario for a better-resourced NSA, such as Al-Qaeda, than for a lower-resourced NSA or hacktivist group. In the near term, these capabilities may evolve toward AI-augmented targeting, which would enable NSAs to use AI for terminal guidance. More advanced applications, such as swarms of autonomous drones or targeted assassinations, become far more plausible once that step is mastered.
Chemical, biological, radiological, and nuclear (CBRN) warfare
Developments in AI, from LLMs to biological design tools (BDTs), pose immense risks for biosecurity.
Large Language Models
Actors can use LLMs like ChatGPT, Claude, and Gemini to generate information and instructions about CBRN weapons creation that would require significantly more time, energy, and patience to uncover manually.
OpenAI’s GPT-5 System Card, published August 13, 2025, noted that the model meets the lab’s threshold for “high” capability in the biological and chemical domain, being “on the cusp” of reaching the ability to “meaningfully help a novice to create severe biological harm.” The lab subsequently activated its associated high-capability preparedness safeguards. In practice, this risk threshold means an actor could manipulate GPT-5 into providing key information toward building a CBRN weapon. GPT-5 is the first model OpenAI has designated “high” capability in this domain, despite performing better than o3, an older reasoning model, at withstanding lab-sponsored bioweaponization attacks. In some ways, this is an encouraging sign: the fact that private-sector companies are implementing stronger safeguards than current capabilities may justify, in order to increase readiness for higher thresholds, perhaps reflects more dedication to safety than we might have anticipated. In other ways, it portends a future in which further releases inevitably cross this boundary; in projecting the image of readiness, labs prime society to accept this higher threshold for harm as a symptom of grander innovation.
Anthropic’s Claude 3.7 Sonnet System Card, published in February 2025 (just eight months after 3.5 Sonnet), also noted that the model, compared to previous models, provided better advice on key steps of the weaponization pathway and made solving complex problems in CBRN weapon creation faster. More alarmingly, the succeeding Claude Opus 4 and Sonnet 4 models “showed substantially greater capabilities in CBRN-related evaluations, including stronger performance on virus acquisition tasks, more concerning behavior in expert red-teaming sessions, and enhanced tool use and agentic workflows,” according to their system card. They added, “Several of our external red-teaming partners reported that Claude Opus 4 performed qualitatively differently from any model they had previously tested.” Anthropic activated its AI Safety Level 3 (ASL-3) Deployment and Security Standards as a result.[Though it should be noted that Anthropic does not run evaluations on chemical risks.] Anthropic also recently announced novel research on removing harmful information about CBRN weapons from their models’ pretraining data, which reduced the models’ accuracy on a harmful-capabilities evaluation by 33%. This, however, does not prevent a malicious actor from relying on more conventional jailbreaks, like contextual manipulation or coded instructions, or from attempting to manipulate the models’ learning processes.
Google’s Frontier Safety Framework and 2.5 Deep Think model card show that their models are at risk of reaching the critical capability level for CBRN Uplift Level 1 in the near future, meaning the model could “significantly assist a low-resourced actor with dual-use scientific protocols, resulting in a substantial increase in ability to cause a mass casualty event.” Indeed, an independent evaluation found that both Gemini 2.5 Flash and 2.5 Pro were highly susceptible to attacks, with attempts to glean harmful information on CBRN weapons showing an average success rate of 28.2% across all model-modality combinations. When combining text and image inputs on 2.5 Flash, the researchers were able to find the most vulnerable combination, succeeding in 52% of attempts to elicit harmful CBRN information. They point to a few reasons for this: CBRN-related information appears so rarely in pre-training datasets compared to other common harmful content that the model has encountered insufficient examples to develop consistent refusal; the complex nature of dual-use information (i.e., information being useful for both harmful and non-harmful purposes); the technical nature of CBRN queries, which models may interpret as legitimate requests; as well as the vulnerability of the image modality in 2.5 Flash to vision-based attacks. The researchers were able to obtain detailed CBRN technical procedures, among other harmful information, often solely with simple prompts, rather than advanced jailbreaks.
Chinese lab DeepSeek poses potentially the highest risk of the most readily available LLMs: its R1 model is not only more prone to jailbreaking than ChatGPT, Claude, and Gemini, but also offers minimal guardrails against malicious actors who wish to abuse it. Researchers found it was able to provide users with step-by-step instructions on creating a suicide drone, bombs, explosives, and untraceable toxins. It also remains susceptible to the ‘Evil Jailbreak’ that was patched in GPT-4 and GPT-4o, which enabled researchers to prompt DeepSeek to write infostealer malware to extract credit card data from compromised browsers and transmit it to a remote server. Another evaluation found that DeepSeek-R1 is 3.5x more vulnerable than OpenAI’s o1 and Claude 3 Opus to producing CBRN content, and 2x more vulnerable than GPT-4o. While American labs may lead in AI safety, this is a cause for concern, not celebration. A fragmented safety landscape, inevitable or not, means malicious actors can just as easily rely on jailbreaking and fine-tuning models like DeepSeek to achieve their aims.
Biological design tools
It is unlikely that an actor will be able to use LLMs to build a CBRN weapon end-to-end. A sophisticated actor would likely need to combine the knowledge gleaned from LLMs with the synthesis capabilities offered by biological design tools (BDTs), which are themselves progressing rapidly.
In February 2025, the world’s largest biological AI foundation model, Evo 2, was released by researchers from the Arc Institute and NVIDIA. Trained on over 128,000 human, animal, plant, bacterial, and viral genomes, it is an open-source platform that allows users to generate genomic sequences. Researchers at CSIS posed the example of malicious actors using “a more advanced BDT with generative capabilities to create a new version of bird flu that is both highly lethal and highly contagious. Such a scenario would likely make the Covid-19 pandemic… look like the common cold.” While today’s BDTs might be “incapable of designing such pathogens from scratch” (a safeguard of sorts), the unprecedented speed with which AI is progressing, and Evo’s open-source nature, mean that we could find ourselves in such a reality sooner than we might think; indeed, within weeks of Evo 1’s release, researchers refined its published weights with data on viruses that infect humans, and within a year, Evo 2 was announced with the ability to hold 30 times more data and reason over more than 8 times as many nucleotides at a time as its predecessor.