A study of a solution for detecting fake news on social networks in the Vietnamese language

Định Trương Quốc; Kiều Phan Thị Thúy

doi:10.24311/jabes/2022.33.05.01

Authors

Trương Quốc Định, Can Tho University
Phan Thị Thúy Kiều, University of Economics Ho Chi Minh City, Vinh Long Campus

Keywords:

Fake news, Social networks, Fake news detection, Summarization, Cosine similarity

Abstract

In this study, the authors use an information retrieval approach to find real news items whose content is similar to that of the news item under examination, then use the cosine measure to score that similarity and assess whether the examined item is fake news. The study uses two datasets, both to determine the parameter values of the proposed model and to evaluate its accuracy experimentally. The real-news dataset is collected from mainstream Vietnamese news sites. The test dataset is collected from social media posts, contains both real and fake news, and is used to check the accuracy of the proposed model. Experimental results on the two datasets show that the proposed model can detect fake news.
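The retrieve-then-compare idea in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the abstract does not specify the retrieval system, the term weighting, the Vietnamese word segmentation, or the decision threshold. The sketch assumes whitespace tokenization, smoothed TF-IDF weights, and a hypothetical threshold of 0.5; the function names `best_match` and `looks_fake` are likewise invented for illustration.

```python
import math
from collections import Counter


def tokenize(text):
    # Assumption: plain whitespace tokenization. Real Vietnamese text
    # would normally need a proper word segmenter.
    return text.lower().split()


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (a dict term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term occurs.
    df = Counter(term for tokens in tokenized for term in set(tokens))
    # Smoothed IDF, so terms found in every document keep a non-zero weight.
    idf = {t: math.log((1 + n) / (1 + c)) + 1.0 for t, c in df.items()}
    return [
        {t: tf * idf[t] for t, tf in Counter(tokens).items()}
        for tokens in tokenized
    ]


def cosine(u, v):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    norm_u = math.sqrt(sum(w * w for w in u.values()))
    norm_v = math.sqrt(sum(w * w for w in v.values()))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def best_match(query, real_news):
    """Return (index, score) of the real-news item most similar to the query."""
    if not real_news:
        return None, 0.0
    vectors = tfidf_vectors(real_news + [query])
    query_vec = vectors[-1]
    scores = [cosine(query_vec, v) for v in vectors[:-1]]
    best = max(range(len(scores)), key=scores.__getitem__)
    return best, scores[best]


def looks_fake(query, real_news, threshold=0.5):
    # Hypothetical decision rule: flag the item when no real-news item
    # is at least `threshold` similar to it.
    _, score = best_match(query, real_news)
    return score < threshold


if __name__ == "__main__":
    corpus = [
        "the city opened a new bridge over the river today",
        "the ministry announced updated exam schedules for students",
    ]
    post = "a new bridge over the river opened in the city today"
    print(best_match(post, corpus), looks_fake(post, corpus))
```

In a setup like the one the abstract describes, the threshold would be one of the parameters tuned on data rather than fixed in advance, and the linear scan over the corpus would be replaced by a real search index.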
