An unsupervised machine learning pipeline combining K-Means clustering with statistical outlier detection to identify non-compliant sales behaviour across a 300,000-outlet retail network — protecting incentive payouts and strengthening the integrity of trade marketing reporting.
BAT ran a large-scale trade incentive programme that rewarded retail outlets for achieving sales targets and compliance with merchandising standards. With ~300,000 outlets in scope, manual auditing was not feasible — the team relied on self-reported field data to validate claims.
This created an obvious vulnerability: without automated detection, non-compliant outlets could claim incentive payouts they had not legitimately earned. Beyond the direct financial exposure, inaccurate compliance data was feeding into strategic decisions about where to invest field resources — meaning the problem had compounding downstream effects.
I built a two-stage detection pipeline that combined unsupervised clustering with statistical outlier analysis:
Flagged outlets were surfaced in a Power BI compliance dashboard with drill-through capability, enabling the trade marketing team to investigate specific accounts without needing to access the underlying Python pipeline.
The compliance team was able to act on specific flagged accounts rather than conducting broad manual audits. Incentive payout accuracy improved and the resulting compliance data became more reliable as an input to field investment decisions.