Honeybot Deployment Patterns for Enterprise PSPs: What We've Learned

Marcus Weld · 24 February 2026 · 10 min read


We've now had honeybots running in production environments long enough to have something useful to say about what deployment looks like in practice — not in a demo environment where we control all the variables, but in live PSP infrastructure where payment rails are real, latency constraints are binding, and false positives have direct cost consequences for fraud ops teams who have to triage them.

This post is about what we've learned on the engineering and operational side. It's not a sales document: there are genuine limitations to acknowledge, and some approaches we expected to work at scale needed significant adjustment. We're writing this for PSP engineers and fraud ops leads who are evaluating conversation-layer fraud tooling and want to understand the implementation realities before committing to a proof of concept.

Placement Decision: Where in the Payment Flow

The first architectural question is where the honeybot sits relative to the payment initiation flow. There are three viable positions, and each involves tradeoffs that aren't always obvious from vendor documentation.

Pre-payment notification injection places the honeybot at the point where the PSP's system sends the customer a payment authorisation notification — typically a push to the PSP's mobile app or a link sent via SMS. The honeybot receives a copy of the conversation context that triggered the notification and can begin engaging the suspected scammer in a parallel channel while the customer's payment is still pending SCA. The advantage is timing: the honeybot has time to work before the customer has confirmed. The disadvantage is that this requires the PSP's notification infrastructure to support a callback or event stream that not all core banking platforms emit cleanly.

Sidecar API pattern positions the honeybot as a separate service that receives flagged conversation contexts forwarded from the PSP's fraud detection layer — either the existing TMS or a separate signal aggregator. The honeybot operates independently and returns a risk signal plus fingerprint data. This is architecturally cleaner and doesn't require changes to the payment initiation path itself, but it introduces a dependency on the upstream flagging quality: if the existing fraud stack isn't surfacing the right conversations to the sidecar, the honeybot never sees them.
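To make the sidecar contract concrete, here is a minimal Python sketch of the webhook handler. The payload fields (`conversation_id`, `upstream_risk_score`, `channel`, `messages`) and the injected `engage` callable are hypothetical stand-ins, not a real TMS or vendor schema. The point it illustrates is that the sidecar only ever sees what the upstream layer chooses to forward.

```python
import json
from dataclasses import asdict, dataclass, field
from typing import Callable, Dict, List, Tuple


@dataclass
class FlaggedContext:
    # Hypothetical payload forwarded by the PSP's TMS or signal aggregator.
    conversation_id: str
    upstream_risk_score: float
    channel: str
    messages: List[str]


@dataclass
class SidecarResponse:
    # What the sidecar hands back: a risk signal plus fingerprint data.
    conversation_id: str
    risk_signal: float
    fingerprint: Dict[str, str] = field(default_factory=dict)


# The engagement engine is injected, so the sidecar has no dependency on
# the payment initiation path and can be tested in isolation.
Engage = Callable[[FlaggedContext], Tuple[float, Dict[str, str]]]


def handle_webhook(body: str, engage: Engage) -> str:
    """Parse a forwarded context, run the honeybot, and serialise the result."""
    ctx = FlaggedContext(**json.loads(body))
    risk_signal, fingerprint = engage(ctx)
    response = SidecarResponse(ctx.conversation_id, risk_signal, fingerprint)
    return json.dumps(asdict(response))


if __name__ == "__main__":
    def stub_engage(ctx: FlaggedContext) -> Tuple[float, Dict[str, str]]:
        # Placeholder for the real conversation engine.
        return 0.9, {"channel": ctx.channel}

    payload = json.dumps({
        "conversation_id": "conv-001",
        "upstream_risk_score": 0.82,
        "channel": "sms",
        "messages": ["Your parcel is held, pay the fee here"],
    })
    print(handle_webhook(payload, stub_engage))
```

Because everything upstream of `handle_webhook` belongs to the PSP, a conversation the existing fraud stack never flags simply never reaches this function.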

Channel-embedded placement applies where the PSP operates its own communication layer — typically a mobile chat feature or in-app messaging between customer service and customers. Here the honeybot can be embedded directly in the conversation handling middleware, with routing rules that divert flagged conversations to the bot. This has the richest context but is only applicable to PSPs with first-party channel infrastructure. Most Faster Payments-focused PSPs we work with don't have this.

In practice, the sidecar API pattern is the most common starting point because it imposes the lightest integration footprint. Most PSPs can add a webhook destination to their existing fraud tooling faster than they can instrument their notification pipeline. The production payoff from notification injection is higher, but the integration timeline is longer.

Latency Budgets: The Constraint That Shapes Everything

The hardest engineering constraint in honeybot deployment isn't the model — it's latency. A honeybot that takes 4 seconds to respond in a conversation that a scammer expects to move at WhatsApp pace will be detected as non-human almost immediately. Scammers operating at scale develop a feel for automated responses very quickly; the latency signature is one of the primary detection signals they use.

Our target latency budget for a honeybot response, measured from receipt of the scammer's message to delivery of the bot's reply, is under 1,200ms at p95. That budget covers model inference, conversation context retrieval, and outbound message delivery through whichever channel integration is in use. In early production we were averaging 2,400ms, well outside the window in which a reply to a short message reads as human.
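A budget defined at p95 has to be checked against the end-to-end figure, not against model inference alone. A minimal sketch, assuming per-reply component timings are already being recorded (the `ReplyTiming` fields are hypothetical) and using the nearest-rank percentile method:

```python
from dataclasses import dataclass
from typing import Sequence

P95_BUDGET_MS = 1200  # target from this post


@dataclass
class ReplyTiming:
    # Component timings for one bot reply, in milliseconds.
    context_retrieval_ms: int
    inference_ms: int
    delivery_ms: int

    @property
    def total_ms(self) -> int:
        # End-to-end: receipt of the scammer's message to delivery of the reply.
        return self.context_retrieval_ms + self.inference_ms + self.delivery_ms


def p95_ms(timings: Sequence[ReplyTiming]) -> int:
    """Nearest-rank 95th percentile of end-to-end reply latency."""
    if not timings:
        raise ValueError("no timings recorded")
    totals = sorted(t.total_ms for t in timings)
    rank = (95 * len(totals) + 99) // 100  # integer ceil(0.95 * n)
    return totals[rank - 1]


def meets_budget(timings: Sequence[ReplyTiming], budget_ms: int = P95_BUDGET_MS) -> bool:
    return p95_ms(timings) <= budget_ms
```

Keeping the components separate also shows where a breach comes from, which matters when choosing between model, caching, and placement fixes.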

The improvement came from three places. First, model size reduction: our initial deployment used a larger model that produced higher-quality responses but couldn't hit the latency target. We moved primary response generation to a smaller model and reserved the larger model for escalation decisions and fingerprint extraction, which run asynchronously and don't affect the response time the scammer sees. Second, context caching: retrieving the full conversation history on every response was expensive, and caching the active context in memory with a short TTL cut retrieval time significantly. Third, geographic placement: co-locating processing infrastructure in the PSP's own data centre region reduced round-trip time on the channel integration leg.
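The context-caching change can be sketched as a small TTL cache in front of the conversation store. This is a minimal sketch under stated assumptions: `loader` is a hypothetical function that fetches the full history, the clock is injectable so expiry can be tested, and the TTL counts from when a context was loaded rather than when it was last read. A production version would likely also need eviction, and a shared cache if replies for one conversation can land on different instances.

```python
import time
from typing import Callable, Dict, List, Tuple

Loader = Callable[[str], List[str]]


class ContextCache:
    """In-memory cache of active conversation contexts with a short TTL."""

    def __init__(self, ttl_s: float, loader: Loader,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl_s = ttl_s
        self._loader = loader  # hypothetical: reads full history from the durable store
        self._clock = clock
        self._entries: Dict[str, Tuple[float, List[str]]] = {}

    def get(self, conversation_id: str) -> List[str]:
        now = self._clock()
        entry = self._entries.get(conversation_id)
        if entry is not None and now - entry[0] < self._ttl_s:
            return entry[1]  # hit: no round trip to the store
        history = self._loader(conversation_id)  # miss or expired: reload
        self._entries[conversation_id] = (now, history)
        return history

    def append(self, conversation_id: str, message: str) -> None:
        # Keep the cached copy in step with new messages; an expired entry
        # is reloaded on the next get() anyway. Persisting is out of scope.
        entry = self._entries.get(conversation_id)
        if entry is not None:
            entry[1].append(message)
```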

We're not saying every deployment hits the 1,200ms target — there are cases where channel architecture or PSP infrastructure constraints push this out. But it's the number we're engineering toward, because anything consistently above 2 seconds materially degrades the bot's effectiveness against experienced operators.

False-Positive Suppression: What Actually Matters to Ops Teams

The false-positive conversation in fraud tooling often focuses on the wrong metric. Vendor materials tend to cite low false-positive rates on detection models. What fraud ops teams care about in practice is something slightly different: how many triage actions does your tooling generate per week that require human review and turn out to be benign?
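The metric ops teams care about is straightforward to compute from a triage log, which makes it a useful cross-check on any quoted false-positive rate. A minimal sketch, assuming each triage item records when it was raised, whether it needed human review, and its final outcome (the `TriageRecord` fields and outcome labels are hypothetical):

```python
from collections import Counter
from dataclasses import dataclass
from datetime import date
from typing import Dict, Iterable, Tuple


@dataclass
class TriageRecord:
    raised_on: date
    needed_human_review: bool
    outcome: str  # hypothetical labels: "fraud", "benign", "inconclusive"


def benign_reviews_per_week(records: Iterable[TriageRecord]) -> Dict[Tuple[int, int], int]:
    """Count human-reviewed items that turned out benign, keyed by (ISO year, ISO week)."""
    counts: Counter = Counter()
    for record in records:
        if record.needed_human_review and record.outcome == "benign":
            iso_year, iso_week, _ = record.raised_on.isocalendar()
            counts[(iso_year, iso_week)] += 1
    return dict(counts)
```

Unlike a model-level rate, this count scales with traffic: the same false-positive rate applied to twice the flagged volume doubles the weekly review load.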

For honeybots specifically, a false positive means a honeybot engaged with a conversation that wasn't a fraud attempt. The consequences depend on context. In the sidecar API pattern where the honeybot operates in a separate channel, the cost of a false positive is relatively low — the customer's payment flow is unaffected, and the "conversation" the honeybot entered was a flagged channel context rather than a direct customer interaction. In channel-embedded deployment where the honeybot takes over what the customer believes is a customer service interaction, a false positive is more disruptive.

Our approach to suppression uses a two-stage gate. The first stage is the upstream fraud signal — the honeybot only activates when the PSP's existing tooling has already elevated a risk score above a threshold, typically because transaction pattern, payee characteristics, and customer communication signals have all moved in the same direction. This isn't honeybot-specific logic; it's asking the PSP's existing fraud stack to do pre-qualification. The second stage is a fast initial assessment from the honeybot itself in the first 90 seconds of engagement, classifying the conversation as likely-fraud or likely-benign based on structural signals in the opening exchange. If the second-stage classification is benign, the engagement ends and a low-confidence flag is passed back to the fraud ops queue rather than a high-priority alert.
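The gate is simple enough to express as a single routing function, evaluated once the assessment window has elapsed. A minimal sketch: the threshold value, the `Route` names, and the `classify` callable (standing in for the honeybot's fast structural assessment) are hypothetical; only the two-stage shape and the 90-second window come from our deployment.

```python
from enum import Enum
from typing import Callable, List, Sequence, Tuple

UPSTREAM_THRESHOLD = 0.8   # hypothetical; tuned per PSP against ops capacity
ASSESSMENT_WINDOW_S = 90   # stage-two assessment window described in the post

Message = Tuple[float, str]  # (seconds since engagement started, text)
Classifier = Callable[[List[str]], str]  # hypothetical: returns "benign" or "likely_fraud"


class Route(Enum):
    NOT_ENGAGED = "not_engaged"
    END_WITH_LOW_CONFIDENCE_FLAG = "end_with_low_confidence_flag"
    CONTINUE_ENGAGEMENT = "continue_engagement"


def gate(upstream_score: float, messages: Sequence[Message], classify: Classifier,
         threshold: float = UPSTREAM_THRESHOLD) -> Route:
    # Stage 1: the PSP's existing fraud stack does pre-qualification.
    if upstream_score < threshold:
        return Route.NOT_ENGAGED
    # Stage 2: classify only the opening exchange, i.e. messages inside
    # the assessment window, on structural signals.
    opening = [text for t, text in messages if t <= ASSESSMENT_WINDOW_S]
    if classify(opening) == "benign":
        # End the engagement; the ops queue gets a low-confidence flag
        # instead of a high-priority alert.
        return Route.END_WITH_LOW_CONFIDENCE_FLAG
    return Route.CONTINUE_ENGAGEMENT
```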

False-positive rates in production have been acceptable, but getting there required careful tuning of the upstream risk threshold. The temptation is to lower the threshold to catch more fraud. That's the right instinct in isolation, but every step down also raises the benign-activation rate, and ops teams quickly find the resulting review volume unsustainable.
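One way to keep that tradeoff explicit is to replay a labelled week of upstream scores against candidate thresholds and take the lowest one whose benign-activation count fits the ops team's review capacity. A sketch under stated assumptions: the labelled history, the candidate list, and the capacity figure are all inputs the PSP would supply, and none of them come from our deployment.

```python
from typing import Iterable, Optional, Sequence, Tuple

Scored = Tuple[float, bool]  # (upstream risk score, confirmed fraud?)


def benign_activations(week: Sequence[Scored], threshold: float) -> int:
    """Benign conversations that would have activated the honeybot."""
    return sum(1 for score, is_fraud in week if score >= threshold and not is_fraud)


def fraud_activations(week: Sequence[Scored], threshold: float) -> int:
    """Confirmed fraud conversations that would have activated the honeybot."""
    return sum(1 for score, is_fraud in week if score >= threshold and is_fraud)


def lowest_sustainable_threshold(week: Sequence[Scored], candidates: Iterable[float],
                                 max_benign_per_week: int) -> Optional[float]:
    # Benign activations can only fall as the threshold rises, so the first
    # candidate (ascending) within capacity is the lowest sustainable one.
    for threshold in sorted(candidates):
        if benign_activations(week, threshold) <= max_benign_per_week:
            return threshold
    return None
```

Reporting `fraud_activations` at the chosen threshold alongside the benign count makes the cost of each step up visible to whoever signs off the setting.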

Fingerprint Handoff and What Happens After Engagement

The honeybot's primary output isn't a block decision — it's a fingerprint. That fingerprint (device metadata, phone number attributes, linguistic patterns, conversation structure) is returned to the fraud ops platform and needs to go somewhere useful. The integration work here is often underestimated.

PSPs typically have a TMS that can receive risk signals but isn't designed to ingest conversational fingerprints. The fingerprint data needs to be mapped to a format the TMS can act on, which usually means translating it into a risk score adjustment or a tagged entity in the PSP's case management system. We've worked with PSPs whose case management platforms required a custom connector before fingerprint data could be reviewed by a fraud analyst in any useful way.
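The two mappings can be sketched side by side. Every schema here is hypothetical: `Fingerprint` mirrors only the field groups listed earlier, and the output dictionaries stand in for whatever a given TMS or case management platform actually accepts, which is where a custom connector usually comes in.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Fingerprint:
    # Field groups named in this post; the keys inside each group are hypothetical.
    conversation_id: str
    device: Dict[str, str] = field(default_factory=dict)
    phone: Dict[str, str] = field(default_factory=dict)
    linguistic: Dict[str, str] = field(default_factory=dict)
    structure: Dict[str, str] = field(default_factory=dict)
    confidence: float = 0.0  # honeybot's confidence the conversation was fraud


def to_risk_adjustment(fp: Fingerprint, max_delta: int = 30) -> Dict[str, object]:
    """Option 1: collapse the fingerprint into a TMS risk-score adjustment."""
    return {
        "reference": fp.conversation_id,
        "risk_score_delta": round(fp.confidence * max_delta),
        "reason_code": "HONEYBOT_FINGERPRINT",  # hypothetical code
    }


def to_case_entity(fp: Fingerprint) -> Dict[str, object]:
    """Option 2: a tagged entity an analyst can review in case management."""
    tags: List[str] = []
    for group, attrs in (("device", fp.device), ("phone", fp.phone),
                         ("linguistic", fp.linguistic), ("structure", fp.structure)):
        tags.extend(f"{group}:{key}={value}" for key, value in sorted(attrs.items()))
    return {
        "entity_type": "suspected_scam_operator",
        "source_conversation": fp.conversation_id,
        "tags": tags,
    }
```

In this sketch the risk-score route is lossy by design, while the entity route keeps the attributes an analyst would need to review.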

The downstream use of fingerprint data also needs a policy decision from the PSP: are fingerprints being used only to block the current payment, or are they being retained and matched against future attempts? Retention and matching against future transactions is significantly more valuable — a scammer fingerprint that persists in the PSP's risk database allows the same operator to be detected on their next approach even if they've changed the conversation script. But retention introduces UK GDPR considerations around data minimisation and purpose limitation that need to be addressed in the PSP's data protection impact assessment. This is operational groundwork that has to happen before deployment, not after.
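Retention and matching can be built with minimisation in mind from the start. A minimal sketch, assuming stable identifiers are extracted as prefixed strings such as `phone:+447700900123` (a hypothetical format) and that the retention period comes from the PSP's DPIA: raw identifiers are replaced with an HMAC-SHA256 under a PSP-held key, and expired records are purged before every lookup. Matching on identifiers rather than script text is what lets a returning operator match after changing scripts. Keyed hashing is pseudonymisation, not anonymisation, so the stored keys remain personal data under UK GDPR.

```python
import hashlib
import hmac
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Dict, Iterable, Optional


@dataclass
class RetainedFingerprint:
    case_ref: str
    last_seen: datetime


def pseudonymise(identifier: str, secret: bytes) -> str:
    # Keyed hash so raw phone numbers and device IDs are never stored.
    return hmac.new(secret, identifier.encode("utf-8"), hashlib.sha256).hexdigest()


class FingerprintStore:
    def __init__(self, secret: bytes, retention: timedelta) -> None:
        self._secret = secret
        self._retention = retention  # hypothetical: set by the PSP's DPIA, not by this code
        self._records: Dict[str, RetainedFingerprint] = {}

    def _purge(self, now: datetime) -> None:
        cutoff = now - self._retention
        self._records = {k: r for k, r in self._records.items() if r.last_seen >= cutoff}

    def retain(self, identifiers: Iterable[str], case_ref: str, now: datetime) -> None:
        # Each sighting refreshes last_seen; whether it should is a policy choice.
        record = RetainedFingerprint(case_ref, now)
        for identifier in identifiers:
            self._records[pseudonymise(identifier, self._secret)] = record

    def match(self, identifiers: Iterable[str], now: datetime) -> Optional[RetainedFingerprint]:
        # Any shared stable identifier counts as a match.
        self._purge(now)
        for identifier in identifiers:
            record = self._records.get(pseudonymise(identifier, self._secret))
            if record is not None:
                return record
        return None
```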

What Enterprise PSPs Get Wrong in Scope-Setting

The most common scoping error we see in enterprise PSP deployments is treating honeybot deployment as a point solution for one fraud category. A honeybot that's been calibrated exclusively for investment scam conversations will perform poorly on romance fraud and vice versa. The conversation structures, the scammer response patterns, and the contextual signals the bot needs to look for are different enough that a single undifferentiated deployment produces mediocre results across all categories.

The correct framing is that a honeybot deployment is a platform decision, not a feature addition. You're building the infrastructure to run conversation-layer tools, and then you're deploying specialised configurations for each fraud category the PSP wants to address. The infrastructure work — integration plumbing, latency engineering, false-positive tuning, fingerprint handoff — is common. The conversation configuration is category-specific.
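In code terms, the platform framing means one shared configuration plus a registry of per-category conversation configurations. A sketch under stated assumptions: the class names, fields, and values are hypothetical (1,200ms is the only figure taken from this post), and a real deployment would hold far more in each layer.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class PlatformConfig:
    # Shared infrastructure settings; values are illustrative placeholders.
    p95_latency_budget_ms: int = 1200
    upstream_risk_threshold: float = 0.8
    fingerprint_sink: str = "tms-connector"


@dataclass(frozen=True)
class CategoryConfig:
    # Category-specific conversation configuration (hypothetical fields).
    category: str
    persona_prompt: str
    opening_signals: Tuple[str, ...]


class HoneybotPlatform:
    def __init__(self, platform: PlatformConfig) -> None:
        self.platform = platform
        self._categories: Dict[str, CategoryConfig] = {}

    def register(self, config: CategoryConfig) -> None:
        if config.category in self._categories:
            raise ValueError(f"category already registered: {config.category!r}")
        self._categories[config.category] = config

    def config_for(self, category: str) -> Tuple[PlatformConfig, CategoryConfig]:
        # Every category runs on the same plumbing; only the conversation layer differs.
        try:
            return self.platform, self._categories[category]
        except KeyError:
            raise LookupError(f"no conversation configuration for {category!r}") from None
```

Adding a second category then means registering a new `CategoryConfig`, not repeating the integration work.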

This doesn't mean a PSP has to solve all categories on day one. Starting with the highest-value category (typically investment or romance fraud based on the PSP's own loss data) is a reasonable approach. But the scope-setting conversation should acknowledge from the start that the goal is a platform capability, not a one-time integration. PSPs that frame it the other way tend to under-invest in the integration infrastructure and then wonder why results aren't scaling as expected when they try to add a second fraud category six months later.