Predictive VAT Non-Compliance Benchmarking Across Industries in Rwanda: A Machine Learning Approach

Niyomugabo, Celestin and Idowu, Sunday A. (2025) Predictive VAT Non-Compliance Benchmarking Across Industries in Rwanda: A Machine Learning Approach. International Journal of Innovative Science and Research Technology, 10 (10): 25oct139. pp. 507-519. ISSN 2456-2165

Abstract

Value Added Tax (VAT) non-compliance remains a persistent challenge in Rwanda despite the nationwide rollout of Electronic Billing Machines (EBM) and other digital reforms. While retrospective VAT gap studies have been useful in quantifying the scale of revenue loss, they do not provide the predictive insights needed to prevent non-compliance proactively. To address this gap, this study developed and validated an industry-aware machine learning model capable of predicting VAT non-compliance using integrated administrative microdata. The study also benchmarked VAT non-compliance across taxpayer scales and industries to identify systematic sectoral heterogeneity and generate actionable evidence for risk-based auditing and more targeted policy design. The study integrated VAT declarations, EBM transactions, and customs import records from the Rwanda Revenue Authority (RRA) for the period 2020–2024, linking them at the taxpayer level to build a comprehensive compliance dataset. An Extreme Gradient Boosting (XGBoost) classifier was applied, with class imbalance addressed through weighting so that the minority class of VAT non-compliant returns contributed proportionately to model learning. Hyperparameters were optimized through grid search and validation to ensure robust generalization, while decision thresholds were tuned to prioritize high recall without compromising precision. Model performance was evaluated using accuracy, precision, recall, F1-score, and both ROC-AUC and PR-AUC, with additional out-of-time validation to confirm stability. Feature interpretability was ensured through SHAP-based importance analysis, which highlighted the relative contribution of discrepancies between EBM sales and declared turnover, penalty history, and trade activity in predicting VAT non-compliance.
The model achieved high predictive performance for the non-compliant class (accuracy 98.9%, precision 0.932, recall 0.887, F1-score 0.909) with robust generalization across tax years. The overall VAT non-compliance rate is 6.9%, with statistically significant between-industry dispersion (ANOVA p-value < 0.001). Elevated risk appears in transport and storage; wholesale and retail trade; manufacturing; mining and quarrying; electricity, gas, steam and air conditioning supply; and activities of households as employers. Non-compliance also increases with taxpayer scale (large 11.5%, medium 9.4%, small 6.0%). Feature importance confirms the operational salience of penalty history and of discrepancies between EBM sales and the declared total value of supplies. Conclusion: Industry-aware predictive analytics can materially strengthen risk-based auditing in Rwanda by targeting higher-risk sectors and scales, improving audit efficiency and revenue recovery, and providing replicable benchmarks for sector-specific policy design.
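
The class weighting and threshold tuning mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how the weights were computed or which precision floor was used, so the negative-to-positive ratio (the value conventionally passed to XGBoost's `scale_pos_weight` parameter), the function names, the toy labels and scores, and the 0.6 precision floor are all assumptions made for illustration.

```python
from typing import List, Optional, Tuple


def positive_class_weight(labels: List[int]) -> float:
    """Negative-to-positive ratio, one common weight for the minority class.

    Assumption: the paper only says imbalance was handled "through weighting".
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    if n_pos == 0:
        raise ValueError("no positive (non-compliant) examples")
    return n_neg / n_pos


def precision_recall(labels: List[int], scores: List[float],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall when scores >= threshold are flagged non-compliant."""
    pairs = list(zip(labels, scores))
    tp = sum(1 for y, s in pairs if y == 1 and s >= threshold)
    fp = sum(1 for y, s in pairs if y == 0 and s >= threshold)
    fn = sum(1 for y, s in pairs if y == 1 and s < threshold)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    total = precision + recall
    return 2 * precision * recall / total if total else 0.0


def tune_threshold(labels: List[int], scores: List[float],
                   min_precision: float) -> Optional[Tuple[float, float, float]]:
    """Pick the threshold with the highest recall among those meeting the
    precision floor. Returns (threshold, precision, recall), or None."""
    best = None
    for t in sorted(set(scores)):
        p, r = precision_recall(labels, scores, t)
        if p >= min_precision and (best is None or r > best[2]):
            best = (t, p, r)
    return best


# Hypothetical toy data: 1 = non-compliant return, 0 = compliant return.
labels = [0, 0, 0, 0, 0, 0, 0, 0, 1, 1]
scores = [0.05, 0.10, 0.20, 0.30, 0.35, 0.40, 0.45, 0.70, 0.60, 0.90]

weight = positive_class_weight(labels)          # 8 negatives / 2 positives
chosen = tune_threshold(labels, scores, 0.6)    # hypothetical precision floor
```

As a consistency check on the reported metrics, `f1_score(0.932, 0.887)` rounds to 0.909, which matches the F1-score stated in the abstract.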
