Scientific Literature

An Integrated Machine Learning Framework for Tender Recommendation and Win Prediction in Kazakhstan's Public Procurement

Discovered On May 12, 2026
Public procurement portals publish millions of tender announcements each year, yet suppliers typically rely on manual browsing and experience-based judgment when deciding which lots to pursue. This paper presents a machine learning-based decision support system for suppliers on Kazakhstan's national procurement platform, goszakup.gov.kz, that combines win-probability prediction with semantic lot recommendation in a unified pipeline. A data warehouse was constructed from over 101,000 contracts collected via web scraping, and a CatBoost classifier was trained on 2,081 labelled supplier-lot pairs derived from official tender protocol documents, achieving a ROC-AUC of 0.779 on a held-out test set. Three structural data leakage mechanisms, each arising from the absence of participation records in the source data, were identified and eliminated during development. The recommendation engine uses multilingual sentence embeddings indexed for fast similarity search, followed by a reranking step that combines semantic relevance with predicted win probability. A two-tower collaborative filtering approach was prototyped but was found to degenerate toward popularity-biased recommendations at the low interaction density typical of procurement data, indicating that content-based methods are more appropriate in this setting. The findings have practical implications for ML practitioners working with procurement data and for system designers choosing recommendation architectures under sparse interaction conditions.
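The retrieve-then-rerank step described in the abstract can be illustrated with a minimal sketch. The abstract does not state how semantic relevance and win probability are combined, so the linear blend, the weight `alpha`, the brute-force cosine search, and all names and toy values below are assumptions made for illustration only. In a real pipeline the vectors would come from a multilingual sentence-embedding model and the probabilities from the trained classifier.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve(query_vec, lot_vecs, k):
    """Return the k (lot_id, similarity) pairs most similar to the query.

    Brute-force search stands in for the indexed similarity search
    mentioned in the abstract (assumption).
    """
    sims = [(lot_id, cosine(query_vec, vec)) for lot_id, vec in lot_vecs.items()]
    sims.sort(key=lambda pair: pair[1], reverse=True)
    return sims[:k]


def rerank(candidates, win_prob, alpha=0.5):
    """Reorder candidates by a blended score.

    score = alpha * similarity + (1 - alpha) * predicted win probability.
    The linear form and the value of alpha are hypothetical.
    """
    scored = [
        (lot_id, alpha * sim + (1 - alpha) * win_prob[lot_id])
        for lot_id, sim in candidates
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored


# Toy 2-d "embeddings" and win probabilities (illustrative values only).
lot_vecs = {
    "lot_a": [1.0, 0.0],
    "lot_b": [0.8, 0.6],
    "lot_c": [0.0, 1.0],
    "lot_d": [0.6, 0.8],
}
win_prob = {"lot_a": 0.10, "lot_b": 0.90, "lot_c": 0.50, "lot_d": 0.20}
supplier_vec = [1.0, 0.0]

candidates = retrieve(supplier_vec, lot_vecs, k=3)
ranked = rerank(candidates, win_prob, alpha=0.5)
ranked_ids = [lot_id for lot_id, _ in ranked]
print(ranked_ids)
```

In this toy case the most semantically similar lot (`lot_a`) drops below `lot_b` after reranking because its predicted win probability is much lower, which is the kind of trade-off a blended score is meant to expose.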