Predicting customer repurchase likelihood in e-commerce using machine learning on discrete behavioral data

Authors

  • Thành Hồ Trung, University of Economics and Law, Vietnam National University Ho Chi Minh City, Vietnam
  • Thị Kim Hiền Lê, University of Economics and Law, Vietnam National University Ho Chi Minh City, Vietnam
  • Yến Nhi Lưu, University of Economics and Law, Vietnam National University Ho Chi Minh City, Vietnam

DOI:

https://doi.org/10.24311/jabes/2025.36.11.05

Abstract

Short product life cycles and rapidly changing trends mean that fashion e-commerce depends on accurately predicting when, and with what probability, a customer will repurchase, so that personalized marketing can be targeted effectively. Most prior studies, however, focus on long-term repurchase prediction and overlook short-term purchase likelihood, particularly for discrete behavioral data. This study proposes a model that predicts short-term repurchase likelihood, expressed as the probability of a purchase within a 30-day window, by combining Recency-Frequency-Monetary (RFM) features with product category diversity and session interaction metrics. Evaluated with the LightGBM and XGBoost algorithms on an online fashion dataset, the proposed models achieved strong classification performance, with an ROC-AUC of approximately 0.8830. Recency (R) and Frequency (F) were identified as the most influential predictors of short-term purchasing behavior. These findings extend the literature on purchase timing prediction and help businesses identify high-potential customers for more effective remarketing.
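
As a rough illustration of the setup the abstract describes, the sketch below derives RFM and category-diversity features from orders placed before a cutoff date, labels each customer by whether they buy again within the following 30 days, and computes ROC-AUC by pairwise comparison. It is a minimal sketch under stated assumptions, not the authors' pipeline: the toy order log, the function names `build_examples` and `roc_auc`, and the exact feature definitions are all hypothetical, and session interaction metrics are left out because the toy data has no session records.

```python
from collections import defaultdict
from datetime import date, timedelta

# Hypothetical toy order log: (customer_id, order_date, amount, category).
ORDERS = [
    ("c1", date(2024, 1, 5), 40.0, "tops"),
    ("c1", date(2024, 2, 20), 25.0, "shoes"),
    ("c1", date(2024, 3, 10), 30.0, "tops"),
    ("c2", date(2024, 1, 15), 80.0, "dresses"),
    ("c3", date(2024, 2, 25), 15.0, "accessories"),
    ("c3", date(2024, 4, 20), 22.0, "tops"),
]


def build_examples(orders, cutoff, horizon_days=30):
    """Return {customer: (features, label)}.

    Features use only orders strictly before `cutoff`; the label is 1 if the
    customer orders again in [cutoff, cutoff + horizon_days), else 0.
    """
    history = defaultdict(list)
    future_buyers = set()
    window_end = cutoff + timedelta(days=horizon_days)
    for cust, day, amount, category in orders:
        if day < cutoff:
            history[cust].append((day, amount, category))
        elif day < window_end:
            future_buyers.add(cust)

    examples = {}
    for cust, rows in history.items():
        features = {
            # Recency: days since the most recent order before the cutoff.
            "recency_days": (cutoff - max(d for d, _, _ in rows)).days,
            # Frequency: number of past orders.
            "frequency": len(rows),
            # Monetary: total past spend.
            "monetary": sum(a for _, a, _ in rows),
            # Category diversity: number of distinct categories bought.
            "category_diversity": len({c for _, _, c in rows}),
        }
        examples[cust] = (features, int(cust in future_buyers))
    return examples


def roc_auc(labels, scores):
    """Probability that a random positive outscores a random negative.

    Ties count as half a win. Requires at least one positive and one negative.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


examples = build_examples(ORDERS, cutoff=date(2024, 3, 1))
```

In practice the feature dictionaries would be passed to a gradient-boosted classifier such as LightGBM or XGBoost, and the predicted probabilities scored with `roc_auc` on held-out customers. Splitting features and labels at a single cutoff date keeps information from the prediction window out of the features.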

Published

2026-03-12

Section

Articles

How to Cite

Hồ Trung, T., Lê, T. K. H., & Lưu, Y. N. (2026). Predicting customer repurchase likelihood in e-commerce using machine learning on discrete behavioral data. Journal of Asian Business and Economic Studies, 36(11), 71-88. https://doi.org/10.24311/jabes/2025.36.11.05