The Challenge
A leading digital bank experienced significant challenges with its existing fraud detection system, specifically regarding the high rate of false positives. While the system was effective at detecting potentially fraudulent activity, it did so at the cost of raising numerous false alerts, which strained customer relations and operational resources. Customers often faced unnecessary account freezes and transaction declines, leading to dissatisfaction and a loss of trust in the bank's services.
The bank's objective was clear: drastically reduce the number of false positives while maintaining the robustness of its fraud detection efforts. This required a system that provided more accurate anomaly detection in real time, ensuring the bank could swiftly respond to genuine threats without compromising the customer experience.
Our Solution
Adyantrix proposed a sophisticated, AI-driven solution leveraging machine learning algorithms designed to fine-tune the fraud detection process. By implementing a tailored machine learning model, we were able to analyse transactional patterns with greater precision.
Drawing on advanced data analytics, the model learned from historical transaction data and current activity patterns to differentiate genuine transactions from fraudulent ones. It adapted over time, evolving alongside emerging fraud tactics. Moreover, our solution integrated with the bank's existing systems, ensuring real-time processing without a significant infrastructure overhaul.
We also incorporated user behaviour analytics, allowing the system to assess anomalies such as sudden location changes or unusual spending patterns in the context of each customer's established habits. By doing so, the solution offered a nuanced view of what could be classified as a red flag.
Key Features
- Real-time Data Processing: Immediate analysis and response to transaction data ensure timely fraud detection.
- AI and Machine Learning Algorithms: Enhance accuracy and reduce false positives by learning from genuine transaction patterns.
- User Behaviour Analysis: Contextual analysis improves decision-making by considering user habits and historical data.
- Seamless Integration: Works with the bank's existing systems without requiring major infrastructure changes.
- Adaptive Learning: The model self-improves by continuously learning from new fraud patterns and legitimate transactions.
Results
Implementing the new fraud detection system significantly improved the bank's operational efficiency. The rate of false positives dropped by 70%, enabling the bank to reduce unnecessary account holds and transaction declines, thereby improving customer satisfaction and trust.
Operationally, the bank reallocated resources previously tied up in managing false alerts, allowing it to focus more on strategic development and less on customer complaints. The effective integration of an AI-driven model not only delivered immediate results but also positioned the bank for ongoing success in fraud prevention.
The improved system fostered customer confidence, ensuring that transactions were safely and accurately vetted without intrusive interruptions. Adyantrix's tailored solution not only met but exceeded the expectations of the digital bank, showcasing the transformative power of AI and machine learning in optimising fintech security.
Technical Approach
The detection system was built as a low-latency scoring pipeline designed to evaluate each transaction within 80 milliseconds — a constraint imposed by the bank's payment processing SLA and regulatory obligations under PSD2's Strong Customer Authentication framework.
The core scoring engine was deployed on AWS, using Apache Kafka for transaction event streaming (processing peaks of approximately 180,000 transactions per minute during evening retail hours), AWS Lambda for stateless feature computation on individual events, and a dedicated Redis cluster for serving precomputed user-level behavioural features at sub-millisecond latency. This combination allowed the system to enrich each transaction with real-time contextual signals before passing it to the scoring model, without introducing latency that would breach the processing SLA.
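The enrichment step described above can be sketched in a few lines. This is an illustrative outline only: an in-memory dictionary stands in for the Redis cluster, and the names `FEATURE_CACHE`, `stateless_features`, `enrich`, and `score_within_budget`, along with the example feature names, are assumptions rather than details of the production system.

```python
import time
from dataclasses import dataclass

# Hypothetical stand-in for the Redis cluster: user_id -> precomputed features.
FEATURE_CACHE = {
    "user-1": {"avg_spend_30d": 42.0, "txn_count_24h": 3},
}

LATENCY_BUDGET_MS = 80  # per-transaction budget stated in the case study


@dataclass
class Transaction:
    txn_id: str
    user_id: str
    amount: float


def stateless_features(txn: Transaction) -> dict:
    """Features derived from the event alone (the Lambda-style step)."""
    # The 500.0 cut-off is an arbitrary illustrative value.
    return {"amount": txn.amount, "is_large": txn.amount > 500.0}


def enrich(txn: Transaction) -> dict:
    """Merge event-level features with cached user-level features."""
    features = stateless_features(txn)
    # Unknown users fall back to an empty profile rather than failing.
    features.update(FEATURE_CACHE.get(txn.user_id, {}))
    return features


def score_within_budget(txn: Transaction, model) -> tuple:
    """Score a transaction and report whether the latency budget was met."""
    start = time.perf_counter()
    score = model(enrich(txn))
    elapsed_ms = (time.perf_counter() - start) * 1000.0
    return score, elapsed_ms <= LATENCY_BUDGET_MS
```

Keeping the user-level lookup to a single key read is what makes this shape compatible with a tight latency budget: the expensive aggregation happens ahead of time, not on the scoring path.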
The machine learning layer used an ensemble approach combining three model types:
- A gradient-boosted classifier (XGBoost): Trained on 28 million labelled historical transactions spanning 24 months, this model evaluated approximately 140 transaction-level and account-level features including merchant category code patterns, transaction velocity, geographical distance from previous transaction, device fingerprint consistency, and time-of-day distributions.
- An autoencoder neural network for anomaly detection: Trained exclusively on confirmed legitimate transactions, this unsupervised model generated a reconstruction error score for each new transaction. High reconstruction error — indicating a transaction pattern dissimilar to the user's established behaviour — contributed to the ensemble fraud probability score even when the supervised model was uncertain.
- A graph neural network (GNN) for network-level fraud signals: This model operated on a bipartite transaction graph linking user accounts to merchant nodes. It detected coordinated fraud rings by identifying unusual clustering patterns in the graph topology that were invisible to single-transaction scoring models.
Model outputs were combined using a calibrated weighted average, with weights tuned on a held-out validation set to optimise the F2 score — a metric that weights recall more heavily than precision, appropriate for fraud detection where missed frauds are more costly than false positives. The final decision threshold was set to achieve the bank's target false positive rate, with a manual review queue handling transactions scoring in the uncertain mid-range zone.
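To make the combination and routing logic concrete, here is a minimal sketch. The weights, the review and decline cut-offs, and the function names are hypothetical; the case study does not publish the tuned values.

```python
def ensemble_score(scores: dict, weights: dict) -> float:
    """Weighted average of per-model fraud probabilities."""
    total = sum(weights.values())
    return sum(scores[name] * w for name, w in weights.items()) / total


def fbeta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score; beta=2 weights recall more heavily than precision."""
    if precision == 0.0 and recall == 0.0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


def route(score: float, review_low: float = 0.4, decline_high: float = 0.8) -> str:
    """Three-way decision; both cut-offs are illustrative assumptions."""
    if score >= decline_high:
        return "decline"
    if score >= review_low:
        return "review"  # uncertain mid-range goes to the manual queue
    return "approve"


# Example: hypothetical weights for the three models.
weights = {"xgboost": 0.5, "autoencoder": 0.3, "gnn": 0.2}
scores = {"xgboost": 0.9, "autoencoder": 0.5, "gnn": 0.1}
decision = route(ensemble_score(scores, weights))
```

With beta set to 2, a model with perfect recall and 50% precision scores higher than one with perfect precision and 50% recall, which is the asymmetry the F2 objective is meant to encode.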
Implementation Highlights
The project ran over 18 weeks and progressed through three distinct phases.
Phase 1 — Data Audit and Feature Engineering (Weeks 1–5): We worked with the bank's data engineering team to audit 24 months of labelled transaction data. A significant early finding was that the fraud labels in the historical dataset were incomplete: approximately 12% of confirmed fraud cases had been confirmed only after the statutory 90-day chargeback window had closed, and their labels were never updated, so they remained in the dataset as "legitimate" transactions. We developed a label reconciliation process that cross-referenced chargeback records with transaction IDs to correct these labels before training, improving training set quality substantially.
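The reconciliation idea can be illustrated with a small sketch. The record layout (`txn_id`, `is_fraud`) and the function name `reconcile_labels` are assumptions made for this example, not the bank's actual schema.

```python
def reconcile_labels(transactions: list, chargeback_txn_ids: set) -> tuple:
    """Correct fraud labels using chargeback records.

    Returns a new list of transaction dicts plus the number of labels
    flipped from legitimate to fraud. The input is not modified.
    """
    corrected = []
    n_fixed = 0
    for txn in transactions:
        row = dict(txn)  # copy so the original record is left untouched
        if row["txn_id"] in chargeback_txn_ids and not row["is_fraud"]:
            row["is_fraud"] = True
            n_fixed += 1
        corrected.append(row)
    return corrected, n_fixed


# Example with hypothetical records: t2 was charged back but never relabelled.
history = [
    {"txn_id": "t1", "is_fraud": False},
    {"txn_id": "t2", "is_fraud": False},
    {"txn_id": "t3", "is_fraud": True},
]
clean, fixed = reconcile_labels(history, {"t2", "t3"})
```

Returning the count of corrected labels alongside the data makes it easy to report how much of the training set was affected.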
Feature engineering was the most labour-intensive component of Phase 1. We generated over 340 candidate features, of which 140 were selected by recursive feature importance analysis for inclusion in the final XGBoost model. Critically, the behavioural features (rolling statistics on spending patterns, merchant diversification indices, and device consistency scores) were pre-computed nightly in a feature store, Amazon SageMaker Feature Store, so that they were available instantly at scoring time without expensive real-time joins.
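As an illustration of the kind of behavioural features that can be computed in a nightly batch, consider the sketch below. The feature definitions are assumptions: in particular, the merchant diversity ratio used here (distinct merchants divided by transaction count) is one plausible formulation, not necessarily the index used in the project.

```python
from statistics import mean, pstdev


def behavioural_features(history: list) -> dict:
    """Summarise one user's recent window of (amount, merchant_id) pairs."""
    if not history:
        # No activity in the window: return neutral defaults.
        return {"mean_amount": 0.0, "std_amount": 0.0, "merchant_diversity": 0.0}
    amounts = [amount for amount, _ in history]
    merchants = {merchant for _, merchant in history}
    return {
        "mean_amount": mean(amounts),
        "std_amount": pstdev(amounts),  # population std over the window
        # Hypothetical diversity index: 1.0 means every purchase was at
        # a different merchant; values near 0 mean heavy repetition.
        "merchant_diversity": len(merchants) / len(history),
    }


# Example window for one user (hypothetical data).
window = [(10.0, "grocer"), (20.0, "grocer"), (30.0, "cafe")]
features = behavioural_features(window)
```

Because the output is a flat dictionary keyed by feature name, it can be written to a key-value store once per night and read back with a single lookup at scoring time.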
Phase 2 — Model Development and Validation (Weeks 6–13): Each of the three model architectures was developed, validated, and calibrated independently before ensemble combination. The GNN was the most novel component and required significant infrastructure work — specifically, building the transaction graph incremental update pipeline that added new nodes and edges without requiring full graph recomputation.
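The incremental update idea can be shown with a plain adjacency structure. This sketch covers only the graph bookkeeping, not the GNN itself, and the class and method names are hypothetical.

```python
from collections import defaultdict


class TransactionGraph:
    """Bipartite user-merchant graph with incremental edge insertion."""

    def __init__(self):
        self.user_to_merchants = defaultdict(set)
        self.merchant_to_users = defaultdict(set)

    def add_transaction(self, user_id: str, merchant_id: str) -> bool:
        """Add one edge, touching only the two affected adjacency sets.

        Returns True if the edge is new, False if it already existed.
        """
        is_new = merchant_id not in self.user_to_merchants[user_id]
        self.user_to_merchants[user_id].add(merchant_id)
        self.merchant_to_users[merchant_id].add(user_id)
        return is_new

    def co_users(self, user_id: str) -> set:
        """Users who share at least one merchant with the given user."""
        shared = set()
        for merchant in self.user_to_merchants.get(user_id, ()):
            shared |= self.merchant_to_users[merchant]
        shared.discard(user_id)
        return shared


# Example: two accounts converge on the same merchant.
graph = TransactionGraph()
graph.add_transaction("alice", "shop-1")
graph.add_transaction("bob", "shop-1")
graph.add_transaction("carol", "shop-2")
```

Each new transaction modifies two sets and nothing else, which is the property that avoids full graph recomputation. A production pipeline would also need edge timestamps and expiry, which are omitted here.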
Shadow deployment was used for the final four weeks of Phase 2, running the new system in parallel with the legacy rule-based system and comparing outputs on every transaction. This generated a rich dataset of cases where the two systems disagreed, which was reviewed by the bank's fraud operations team to identify genuine improvements versus cases where the new system's behaviour was unexpected.
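A shadow comparison of this kind reduces to running both decision functions on every transaction and logging the disagreements. The sketch below assumes each system can be called as a function returning a decision string; the names and the toy rules are illustrative only.

```python
def shadow_compare(transactions, legacy_decide, candidate_decide) -> list:
    """Collect transactions on which the two systems disagree.

    Only the legacy decision would be enforced in a shadow setup; the
    candidate's output is recorded for review and never acted on.
    """
    disagreements = []
    for txn in transactions:
        legacy = legacy_decide(txn)
        candidate = candidate_decide(txn)
        if legacy != candidate:
            disagreements.append(
                {"txn_id": txn["txn_id"], "legacy": legacy, "candidate": candidate}
            )
    return disagreements


# Example with toy rules (hypothetical): the legacy rule declines anything
# over 100, the candidate only declines over 500.
def legacy_rule(txn):
    return "decline" if txn["amount"] > 100 else "approve"


def candidate_rule(txn):
    return "decline" if txn["amount"] > 500 else "approve"


sample = [
    {"txn_id": "t1", "amount": 50},
    {"txn_id": "t2", "amount": 250},
    {"txn_id": "t3", "amount": 900},
]
review_set = shadow_compare(sample, legacy_rule, candidate_rule)
```

The resulting list is the kind of disagreement set a fraud operations team can review case by case.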
Phase 3 — Production Rollout (Weeks 14–18): A canary deployment strategy was used, routing 5% of live transactions to the new system initially, increasing to 25%, 50%, and then 100% over four weeks. At each threshold, key metrics — false positive rate, fraud detection rate, system latency — were monitored for a minimum of 72 hours before the next increment.
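One common way to implement percentage-based routing is a stable hash of an identifier, sketched below. Hashing on the transaction ID is an assumption for this example; a real rollout might key on the account instead so that a customer sees consistent behaviour.

```python
import hashlib


def canary_bucket(txn_id: str) -> int:
    """Map an identifier to a stable bucket in the range 0-99."""
    digest = hashlib.sha256(txn_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def use_new_system(txn_id: str, rollout_percent: int) -> bool:
    """Route to the new system if the bucket falls under the rollout level."""
    return canary_bucket(txn_id) < rollout_percent


# Rollout stages from the case study.
STAGES = [5, 25, 50, 100]
```

Because the bucket depends only on the identifier, raising the rollout from 5% to 25% keeps every transaction that was already routed to the new system on the new system, which keeps the monitored metrics comparable between stages.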
Measurable Outcomes
The 70% reduction in false positives was measured against the 90-day pre-deployment baseline period.
In absolute terms, the bank had been generating approximately 4,200 false positive fraud alerts per day, resulting in roughly 1,800 customers per day experiencing unnecessary transaction declines or account holds. Post-deployment, this fell to approximately 1,260 false alerts per day, eliminating over 2,900 false alerts daily and restoring frictionless transaction experiences for the customers who would otherwise have been affected.
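As a quick check on these figures, assuming the 70% reduction applies directly to the daily alert count:

```latex
\begin{align*}
\text{alerts}_{\text{post}} &= 4200 \times (1 - 0.70) = 1260 \\
\text{alerts}_{\text{avoided}} &= 4200 - 1260 = 2940
\end{align*}
```

The difference of 2,940 is a count of alerts per day, not of customers, since a single customer can trigger more than one alert.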
Customer satisfaction scores (CSAT) related to payment declines improved by 34% within six weeks of full deployment, as measured by the bank's post-transaction customer feedback mechanism. Churn attributable to declined transaction experiences — tracked via customer exit survey data — fell by approximately 18% in the following quarter.
Operationally, the fraud operations team's case review workload fell by 62%, as the volume of suspicious transactions requiring human review was reduced both by fewer false positives and by the system's improved ability to auto-resolve high-confidence legitimate transactions without escalation. This freed the team to focus investigative capacity on the more complex fraud patterns identified by the GNN, where human judgement genuinely added value.
Genuine fraud detection rate (recall) was maintained at 97.3% — a marginal improvement of 0.8 percentage points over the legacy system — confirming that the false positive reduction was achieved through improved precision rather than relaxed detection sensitivity.
Lessons Learned
Label quality in historical training data is non-negotiable. The discovery that 12% of fraud cases in the training set had been incorrectly labelled as legitimate was a critical finding. Training a supervised classifier on mislabelled data produces a model that has learned to classify known fraud as legitimate, which is precisely the opposite of what is needed. The label reconciliation process added two weeks to Phase 1 but improved model recall by an estimated 4 percentage points on the validation set — a substantial gain that would not have been achievable without clean labels.
The false positive problem is a calibration problem first and a modelling problem second. The legacy system's high false positive rate was not caused by a weak underlying model; its fraud detection rate was reasonably good. The problem was that its decision threshold was miscalibrated: it was set to maximise fraud detection without adequately penalising false positives. Recalibrating the decision threshold alone (without any model changes) reduced false positives by 28%. The remaining 42 percentage points of the overall 70% reduction came from the upgraded model architecture. This sequence emphasises that calibration should always be assessed before investing in model replacement.
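Threshold recalibration of this kind can be sketched as choosing the cut-off from the score distribution of known-legitimate transactions on a validation set. The function names and the toy scores are assumptions; the project's actual procedure is not described in detail.

```python
def threshold_for_target_fpr(legit_scores: list, target_fpr: float) -> float:
    """Pick a threshold so that at most target_fpr of legitimate
    transactions score strictly above it."""
    if not 0.0 <= target_fpr < 1.0:
        raise ValueError("target_fpr must be in [0, 1)")
    ranked = sorted(legit_scores, reverse=True)
    # Maximum number of legitimate transactions we are allowed to flag.
    allowed = int(target_fpr * len(ranked))
    # Everything strictly above this score is flagged: at most `allowed` items.
    return ranked[allowed]


def false_positive_rate(legit_scores: list, threshold: float) -> float:
    """Fraction of legitimate transactions flagged at the given threshold."""
    flagged = sum(1 for score in legit_scores if score > threshold)
    return flagged / len(legit_scores)


# Toy validation scores for legitimate transactions (hypothetical).
legit = [i / 10 for i in range(10)]  # 0.0, 0.1, ..., 0.9
legacy_fpr = false_positive_rate(legit, 0.3)
new_threshold = threshold_for_target_fpr(legit, 0.2)
new_fpr = false_positive_rate(legit, new_threshold)
```

A threshold chosen this way should always be checked against fraud recall on the same validation set, since raising it trades false positives for missed fraud.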
Graph-based fraud signals are highly valuable but operationally complex. The GNN contributed meaningfully to detecting coordinated fraud rings that the individual-transaction models missed entirely. However, maintaining the transaction graph infrastructure — ensuring consistent node/edge updates, managing graph database performance under high transaction volumes, and explaining GNN outputs to fraud investigators — added significant operational complexity. In future deployments, we would invest more time upfront in graph database selection (we used Amazon Neptune, which performed well, but required careful query optimisation at peak load).
Why This Approach Worked
The ensemble approach — combining a supervised classifier, an unsupervised anomaly detector, and a graph-based network analyser — succeeded because each component captured a different dimension of fraud signal that the others missed.
Rule-based systems fail because fraudsters adapt to known rules. Single-model ML systems improve on this but remain vulnerable to adversarial drift in the specific feature distributions they rely on. By triangulating across three independent signal types — transaction-level statistics, behavioural deviation from individual user norms, and network-level coordination patterns — the ensemble was substantially harder to game than any single approach.
Equally important was the design decision to optimise explicitly for false positive reduction rather than treating it as a secondary consideration. Most fraud detection systems are evaluated primarily on fraud recall (how many genuine frauds are caught), with false positive rate treated as a constraint rather than an objective. By using the F2 score during model development — which elevates recall but does not ignore precision entirely — and by investing separately in the decision threshold calibration exercise, we aligned model development incentives with the bank's actual business priority: reducing customer friction without compromising security.
Speak with our Data Analytics team at Adyantrix to find out how we can support your next project.

