Detecting and Countering AI-Enabled Intrusions with Deception

TL;DR - Offensive AI agents are detectable. Counter-offensive is possible. The world is about to fill with autonomous systems fighting each other - some for the dark side, some for the light. Choose your instruments.

Same front, new adversary

December 2003. Black Hat Asia, Singapore. We were on stage with a talk on honeypots versus self-propagating worms - Honeyd against MSBlast. The code was free. The idea travelled.

2010, another Singapore stage. Different adversary: organised crime running exploit kits at industrial scale. We showed how those kits could be turned against their operators. Controversial then. It held.

Each era, the same pattern: the industry panics about a new class of attacker, and a handful of practitioners quietly prove that observation, deception, and measured response still hold the line.

April 2026, Singapore again. Black Hat Asia week. Different chapter. The adversary has mutated once more. The industry is waking up to autonomous AI agents used as tradecraft - agentic hacking, vibe hacking, “whatever-AI”. Major labs are revisiting release policies on frontier capabilities for cyber reasons. Flashpoint’s 2026 Global Threat Intelligence Report recorded a 1,500% surge in AI-related criminal activity in a single month. Microsoft Security (April 2026) reports a 450% year-over-year increase in AI-embedded phishing effectiveness. CrowdStrike’s 2026 Global Threat Report documents an 89% year-over-year rise in attacks by AI-enabled adversaries. Everyone is rediscovering, in real time, that language models make very capable attackers.

“Shall we play a game?”

So at OHIIHO, we ran the experiments.

Four controlled wargame labs. Roughly 1,000 runs against a HIIH high-interaction honeypot, with instrumentation built for this exact question. Panels of varying size, up to 150 production LLMs.

The persistence panel - 17 production LLMs across 10 providers:

Provider	Model
Anthropic	Claude Sonnet 4
	Claude Sonnet 4.5
	Claude Opus 4
	Claude Haiku 4.5
	Claude 3.7 Sonnet
OpenAI	GPT-5
	GPT-4o
Google	Gemini 2.5 Pro
xAI	Grok-4
Meta	Llama 3.3 70B Instruct
Mistral	Mistral Large
DeepSeek	DeepSeek V3.2
	DeepSeek R1
Qwen	Qwen3-Max
	QwQ-32B
Z-AI	GLM-4.6
Moonshot	Kimi K2.5

Three findings worth a Monday morning.

1. Persistence is universal

Under Unix, every model in the 17-LLM persistence panel spontaneously wrote its key to ~/.ssh/authorized_keys.

Same mechanism. No variation.

One auditd rule on that path becomes a vendor-agnostic canary for any compromised agent.
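A minimal sketch of what that canary could look like, assuming the standard auditctl watch syntax (-w for the path, -p wa for writes and attribute changes, -k for a search key). The function name, the key name ai_agent_persistence and the list of home directories are illustrative assumptions, not part of the lab setup.

```python
from pathlib import PurePosixPath


def authorized_keys_rules(home_dirs, key="ai_agent_persistence"):
    """Build one audit watch rule per account's authorized_keys file.

    home_dirs: absolute home directory paths (hypothetical input; in practice
               these might come from /etc/passwd or an asset inventory).
    key:       label attached to matching audit events (hypothetical name).
    """
    rules = []
    for home in home_dirs:
        # "~" is a shell convenience, so spell out the absolute path per account.
        path = PurePosixPath(home) / ".ssh" / "authorized_keys"
        # -p wa: fire on writes and attribute changes; -k: tag events for lookup.
        rules.append(f"-w {path} -p wa -k {key}")
    return rules


if __name__ == "__main__":
    for rule in authorized_keys_rules(["/root", "/home/deploy"]):
        print(rule)
```

Each printed line is meant to go into an audit rules file, and matching events can then be pulled with ausearch -k and the chosen key. Whether to watch the file or the whole .ssh directory is a deployment choice this sketch does not settle.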

2. Attackers come in three shapes

Within the panel, three distinct command archetypes emerged - from heavy reasoning models dropping 1,000+ character one-liners, down to small models firing rapid-fire two-word commands that look indistinguishable from a bot scanner at first glance.

They are not.

Many of them leave behavioural markers inside their own commands - narrative openers, phase headers, enumeration patterns - that classical operators, human or bot, simply do not produce in live shell sessions. The signal is there if you know to look for it.
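To make the idea concrete, here is a rough sketch of a marker check over captured shell commands. The three regular expressions are illustrative guesses at what a narrative opener, a phase header and an enumeration pattern might look like; they are assumptions for this example, not the signatures used in the labs, and they would need tuning against real session data before anyone trusted them.

```python
import re

# Hypothetical patterns, one per marker family named in the text.
MARKERS = {
    # Commands or inline comments that open like prose: "Let me...", "Now I'll..."
    "narrative_opener": re.compile(
        r"^\s*(?:#\s*)?(?:let me\b|let's\b|now i(?:'ll| will)\b|first,|next,)",
        re.IGNORECASE,
    ),
    # Banner-style section titles: "=== Phase 1", "# Step 2", "--- Stage 3"
    "phase_header": re.compile(
        r"(?:={3,}|-{3,}|#+)\s*(?:phase|step|stage)\s*\d+",
        re.IGNORECASE,
    ),
    # Numbered items inside a comment or an echo: '# 1. ...', 'echo "2) ..."'
    "enumeration": re.compile(
        r"(?:^|[;&|]\s*)(?:#|echo\s+[\"']?)\s*\d+[.)]\s+\w",
        re.IGNORECASE,
    ),
}


def marker_hits(command):
    """Return the sorted names of the marker families found in one command."""
    return sorted(name for name, rx in MARKERS.items() if rx.search(command))


if __name__ == "__main__":
    samples = [
        'echo "=== Phase 1: Recon ===" && id',
        "# Let me check the network config\nip a",
        "ls -la",
    ]
    for cmd in samples:
        print(marker_hits(cmd), repr(cmd))
```

A single hit proves little on its own; the signal is more likely to come from how often such markers recur across a whole session.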

3. Counter-forensics has arrived. False flag works too.

Without being asked, without being prompted in that direction, one model on its own wrote a decoy operator log to disk - correct format, plausible internal references, tone indistinguishable from a human red teamer’s notes.

An analyst reading that file would believe a human was on the box.

The threat model just gained a new chapter.

And one more thing - counter-intelligence works

Indirect prompt injection is old news as a vulnerability. The surprise is that the same mechanism runs in reverse.

Crafted content inside HIIH high-interaction honeypots extracted substantial portions of attacker system prompts on several models, and led others to disclose their own engine identity. (Read twice.)

Operational context extracted directly from the adversary’s own working memory - mission brief, objectives, engine identity. Live. Not stale feed data.

More is possible - but those findings belong in coordinated disclosure channels, not blog posts. We are working through that process. Vendors and labs interested in joint disclosure can reach us at security@ohiiho.com.

What this means operationally

Defenders can stop treating AI-enabled intrusions as unknowable and start detecting them as a behavioural class.

The instruments and concepts that worked against worms still apply - sharpened, instrumented, re-pointed.

Observation. Deception. Measured response.

Still the line.

Same front. New adversary. Same kind of instruments. Sharpened.