Real-Time Scammer Adaptation in the LLM Era: What Fraud Teams Need to Know

Dr. Sarah Nkemelu · 8 min read

The fraud detection community has spent considerable effort in recent months discussing whether large language models could be used to detect scams. That's a worthwhile conversation. But it's secondary to a more immediate operational question: what does it mean for detection when scammers are using LLMs to run and adapt their conversations in real time?

Our observation, drawn from the conversations our honeybots engage in and from the pattern data we accumulate across those engagements, is that LLM-assisted scam operations have a meaningfully different adaptation signature from human-scripted operations. The difference matters for how fraud teams design detection, and for how interception tools like ours need to evolve.

This piece describes what we see, what we think it means for the detection challenge, and where the limits of our current understanding lie.

How LLM-Assisted Scam Operations Differ Structurally

Traditional authorised push payment (APP) scam operations, particularly investment scams and romance scams, relied on scripts. Human operators worked from a playbook: opening lines, objection handling, crisis narrative variants, close sequences. Scripts provided consistency and could be optimised over time, but they were fundamentally static in the short term. If an operator encountered an unusual response from a target, they either had a trained escalation path or the engagement broke down.

LLM-assisted operations don't work from a fixed script in the same way. The operator — who may be a low-skill individual in a fraud compound or, in increasingly automated cases, a minimal supervisory presence — inputs the target's response and receives a contextually appropriate reply generated in real time. The LLM can handle unusual objections, maintain consistent persona details across a long conversation, switch tone on demand, and generate plausible supporting material (fake investment returns, fake identity documents, fake organisation websites) without the operator needing deep domain knowledge.

The structural effect on the conversation is notable: LLM-assisted scam conversations are more coherent, more contextually consistent, and more adaptive than their human-scripted predecessors. Inconsistencies in a scammer's account — the kind that an alert victim or a fraud ops analyst might previously have flagged — are far less common. The quality floor of the engagement has risen.

The Adaptation Speed Problem

Pattern-based detection for scam conversations has always faced an arms race dynamic: detection tools identify a pattern, scam operators modify their approach to avoid the pattern, detection tools update. The cycle was previously measured in weeks or months, because updating a human-scripted operation required retraining operators, distributing new materials, and having those materials propagate across a distributed workforce.

LLM-assisted operations collapse this cycle dramatically. An operator — or an automated operation — that encounters a detection signal or intervention can modify its approach within a single conversation. If a honeybot or human fraud analyst introduces a probe question designed to surface inconsistencies in a claimed investment return structure, a sophisticated LLM-assisted operation can recognise the probing pattern and pivot the conversation away from the vulnerable territory.

We've observed this in honeybot engagements: conversations where the initial approach matched a known investment scam signature shifted mid-engagement to a different framing — often romance or relationship-building — in a way that looked responsive to the bot's probing rather than coincidental. Whether this reflects deliberate counter-detection design or emergent LLM behaviour in response to unusual conversational inputs is genuinely uncertain. The operational effect is the same: the pattern that triggered the detection is no longer reliably present.
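
One way to surface this kind of mid-engagement pivot in an engagement record is to label each scammer turn with a coarse frame and check whether the dominant frame changes after a probe. The sketch below is illustrative only: the keyword lists, the frame names, and the `label_frame` and `detect_frame_shift` helpers are all hypothetical, and a keyword count stands in for what would realistically be a trained frame classifier.

```python
from typing import List, Optional, Tuple

# Hypothetical keyword lists; a real system would use a trained frame classifier.
FRAME_KEYWORDS = {
    "investment": {"returns", "profit", "portfolio", "trading", "invest", "platform"},
    "romance": {"love", "miss", "lonely", "together", "heart", "trust"},
}


def label_frame(text: str) -> Optional[str]:
    """Return the frame whose keywords appear most often, or None if none appear."""
    words = [w.strip(".,!?").lower() for w in text.split()]
    counts = {
        frame: sum(1 for w in words if w in keywords)
        for frame, keywords in FRAME_KEYWORDS.items()
    }
    best = max(counts, key=counts.get)
    return best if counts[best] > 0 else None


def dominant_frame(turns: List[str]) -> Optional[str]:
    """Most common non-empty frame label across a list of turns."""
    labels = [f for f in map(label_frame, turns) if f is not None]
    return max(set(labels), key=labels.count) if labels else None


def detect_frame_shift(
    scammer_turns: List[str], probe_index: int
) -> Optional[Tuple[str, str]]:
    """Compare the dominant frame before and after a probe.

    probe_index is the number of scammer turns sent before the probe question.
    Returns (frame_before, frame_after) if the dominant frame changed, else None.
    """
    before = dominant_frame(scammer_turns[:probe_index])
    after = dominant_frame(scammer_turns[probe_index:])
    if before and after and before != after:
        return before, after
    return None


turns = [
    "Our platform guarantees returns on every trading account.",
    "You can invest today and see profit.",
    "I just feel lonely, I miss talking with you.",
    "We could be so happy together, trust your heart.",
]
shift = detect_frame_shift(turns, probe_index=2)
```

A shift flagged this way says nothing about why the frame changed. It only marks the engagement for the kind of review described above, where the question of deliberate counter-detection versus emergent behaviour remains open.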

We're not saying this happens in every LLM-assisted engagement, or that it represents a solved capability that fraud operators have universally deployed. But the capability exists and is detectable in the engagement record. Fraud teams that are relying on static pattern matching — even with relatively recent training data — should expect degraded detection rates against this category.

From Pattern Recognition to Behavioural Intent

The detection challenge in the LLM era requires a shift in what fraud detection models are looking for. Pattern recognition — matching a conversation to a known fraud script template — is increasingly insufficient against an adversary that can generate contextually appropriate, pattern-breaking responses in real time.

What persists across LLM-assisted scam conversations, even when surface-level patterns are adapted away, is behavioural intent. The conversation may not match a known investment scam script, but the intent structure — move the target toward a specific action (payment, account detail disclosure, credential sharing), overcome resistance, create urgency — is present regardless of surface variation. Modelling intent rather than pattern requires a different approach to conversation analysis.

Intent-based detection focuses on the dynamic structure of the conversation: who is directing, who is responding; where urgency signals are introduced relative to relationship establishment; how the conversation handles refusal or hesitation; what actions the conversation is converging toward. These structural signals are harder to adapt away than surface patterns because they're inherent to the fraudulent goal, not to the specific script being used.
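
The structural signals listed above can be sketched as features computed over an annotated transcript. This is a minimal illustration, not a description of any production model: the `Turn` fields, the speaker labels, and the feature names are hypothetical, and the per-turn annotations (directive, urgent, refusal) are assumed to come from some upstream classifier.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Turn:
    speaker: str                            # "suspect" or "target" (assumed labels)
    is_directive: bool = False              # asks or instructs the other party to act
    is_urgent: bool = False                 # introduces time pressure
    is_refusal: bool = False                # declines or hesitates
    requested_action: Optional[str] = None  # e.g. "payment", "credentials"


def intent_features(turns: List[Turn]) -> dict:
    directives = [t for t in turns if t.is_directive]
    suspect_directives = [t for t in directives if t.speaker == "suspect"]

    # Who is directing: share of all directives issued by the suspect.
    directive_share = len(suspect_directives) / len(directives) if directives else 0.0

    # Where urgency first appears, as a fraction of the conversation so far.
    first_urgent = next(
        (i for i, t in enumerate(turns) if t.speaker == "suspect" and t.is_urgent),
        None,
    )
    urgency_position = first_urgent / len(turns) if first_urgent is not None else None

    # Refusal handling: how often the suspect's next turn after a refusal
    # is another directive rather than a retreat.
    followups = []
    for i, t in enumerate(turns):
        if t.speaker == "target" and t.is_refusal:
            nxt = next((u for u in turns[i + 1:] if u.speaker == "suspect"), None)
            if nxt is not None:
                followups.append(nxt.is_directive)
    persistence = sum(followups) / len(followups) if followups else 0.0

    # Convergence: the action named in the suspect's most recent request.
    actions = [
        t.requested_action
        for t in turns
        if t.speaker == "suspect" and t.requested_action
    ]
    converging_action = actions[-1] if actions else None

    return {
        "directive_share": directive_share,
        "urgency_position": urgency_position,
        "persistence_after_refusal": persistence,
        "converging_action": converging_action,
    }


conversation = [
    Turn("suspect"),                                        # rapport building
    Turn("target"),
    Turn("suspect", is_directive=True, requested_action="payment"),
    Turn("target", is_refusal=True),
    Turn("suspect", is_directive=True, is_urgent=True, requested_action="payment"),
    Turn("target", is_directive=True),                      # target asks a question
]
features = intent_features(conversation)
```

None of these features refers to the wording of any message, which is the point: the surface text can change completely while the directive share, urgency placement, and persistence after refusal stay recognisable.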

Building detection models that operate at this level of abstraction is harder than pattern matching. It requires training data that captures intent dynamics across diverse surface variations, and it requires models that are evaluated on intent classification rather than token-level pattern similarity. We're working in this direction in our own model development, and we're not the only team in the space doing so. But it's worth being honest: the field is earlier-stage on intent modelling than on pattern recognition, and the evaluation frameworks are less mature.

What This Means for Honeybot Design

The adaptation capability of LLM-assisted operations has direct implications for how a honeybot should be designed and evaluated.

A honeybot that operates by matching the scammer's known script — playing along with the established fraud narrative — risks being detected when the scammer's LLM pivots. If the honeybot's responses are optimised for engagement within a known investment scam context and the scammer shifts to a different frame, the bot's responses may become contextually incongruent in ways that flag it as automated.

The design direction we've moved toward is a honeybot that responds to conversational intent signals rather than surface script matching. Rather than maintaining engagement within the scammer's preferred frame, the bot introduces controlled ambiguity — responses that are plausible within multiple frames simultaneously — and observes how the scammer resolves that ambiguity. Scammers, even LLM-assisted ones, resolve ambiguity in characteristic ways that reveal the underlying intent structure. That resolution behaviour is what the honeybot is extracting.

This is more sophisticated than a first-generation honeybot design, and it's harder to evaluate. The quality metric shifts from "how long did the bot maintain engagement?" — which was a reasonable early proxy — toward "how accurately did the bot extract intent signals, and how quickly?" The fingerprint output of a modern honeybot engagement should include intent-layer features, not just surface behavioural metadata.
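
As a rough illustration of that shift in metric, the sketch below scores a batch of engagements on extraction accuracy and speed rather than duration. The `EngagementResult` fields and the `evaluate` function are hypothetical, and the ground-truth intent labels are assumed to come from analyst review after the fact.

```python
from dataclasses import dataclass
from statistics import median
from typing import List, Optional


@dataclass
class EngagementResult:
    true_intent: str                    # analyst-assigned label (assumed available)
    extracted_intent: Optional[str]     # what the honeybot inferred, None if nothing
    turns_to_extraction: Optional[int]  # turn count at which the inference was made


def evaluate(results: List[EngagementResult]) -> dict:
    extracted = [r for r in results if r.extracted_intent is not None]
    correct = [r for r in extracted if r.extracted_intent == r.true_intent]
    speeds = [
        r.turns_to_extraction for r in correct if r.turns_to_extraction is not None
    ]
    return {
        # How often the bot produced any intent signal at all.
        "extraction_rate": len(extracted) / len(results) if results else 0.0,
        # Of those, how often the signal matched the analyst label.
        "accuracy": len(correct) / len(extracted) if extracted else 0.0,
        # How quickly correct signals were obtained.
        "median_turns_to_correct_extraction": median(speeds) if speeds else None,
    }


batch = [
    EngagementResult("payment", "payment", 6),
    EngagementResult("credentials", "payment", 4),
    EngagementResult("payment", None, None),
    EngagementResult("payment", "payment", 10),
]
report = evaluate(batch)
```

Under a scoring scheme like this, a long engagement that never yields a correct intent signal counts against the bot, whereas a duration-based metric would have rewarded it.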

The Automation Threshold and What It Means for Scale

A question that comes up in fraud ops conversations is whether fully automated LLM-assisted scam operations — where a human operator is not present in the loop at all — are already deployed at scale. Our honest answer is that we don't know with confidence, and anyone claiming certainty on this is overstating what's currently observable.

What we observe is a spectrum. At one end are human operators using LLM tools as writing assistance: they read the target's message, hit a button to generate a suggested response, edit it, and send. At the other end is what appears to be fully automated operation, where response cadence, consistency, and adaptation speed are beyond what human operators could sustain across many simultaneous engagements. We've seen engagement patterns in honeybot sessions that are consistent with the latter: response times that cluster too tightly around infrastructure latency to be human-in-the-loop, and a consistency of persona detail across very long conversations that would require extraordinary operator discipline to maintain manually.
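
The response-time signal can be approximated with a simple dispersion measure. The sketch below uses the coefficient of variation of reply latencies as a stand-in for "clusters too tightly"; the function names and the `cv_threshold` default are illustrative assumptions, not calibrated values, and a real check would also need to account for message length and time of day.

```python
from statistics import mean, pstdev
from typing import List


def latency_dispersion(latencies_s: List[float]) -> float:
    """Coefficient of variation (std / mean) of response latencies in seconds."""
    if len(latencies_s) < 2:
        raise ValueError("need at least two latencies")
    avg = mean(latencies_s)
    if avg <= 0:
        raise ValueError("latencies must have a positive mean")
    return pstdev(latencies_s) / avg


def looks_automated(latencies_s: List[float], cv_threshold: float = 0.15) -> bool:
    """Flag sessions whose latencies vary very little relative to their mean.

    cv_threshold is an illustrative assumption, not a calibrated value.
    """
    return latency_dispersion(latencies_s) < cv_threshold


tight = [2.1, 2.0, 2.2, 2.1, 2.0]     # replies arrive at a near-constant interval
loose = [5.0, 42.0, 12.0, 90.0, 7.0]  # replies vary widely, as a person's might

tight_flag = looks_automated(tight)
loose_flag = looks_automated(loose)
```

Low dispersion on its own is weak evidence, since a disciplined human operator with a fast tool could produce similar timing over a short session. It is more informative when combined with the long-range persona consistency described above.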

The implication is operational: if fully automated scam operations become the norm rather than the exception, the volume of attempted engagements will increase substantially. The cost per attempt drops sharply when you remove the human operator. Detection infrastructure sized for the volumes that human-operated scam activity produces will face a different load profile. Payment service providers (PSPs) that rely primarily on human review of flagged conversations will face a triage bottleneck that automated tooling, including honeybot infrastructure, needs to absorb.

We're not saying this transition has already happened. But we're designing for it, and fraud ops teams that are building capability roadmaps for the next two years should be too. The cost structure of scam operations is changing in a direction that favours volume, and the detection infrastructure needs to scale with it rather than after it.