Development of an Indonesian Legal Question Answering System Using an NLP Approach and the BM25 Model

Budi Mukhamad Mulyo, Rangsang Purnama, Rifki Fahrial Zainal, Lutfia Febrianti

Abstract


This study aims to develop a question answering dataset covering Indonesian laws and regulations, specifically the 1945 Constitution (UUD 1945) and the Laws (Undang-Undang) of 2023. The dataset consists of passages (quoted excerpts), questions, and answers, where every answer explicitly refers to its associated passage to keep the data consistent and valid. Building the dataset involves preprocessing steps, namely tokenization, stopword removal, and stemming, using natural language processing tools for Indonesian. For answer retrieval, a BM25-based system scores the relevance between each question and the passages. The system is evaluated with Normalized Discounted Cumulative Gain (NDCG), which measures the quality of the ranked search results. Testing yields an NDCG@5 of 0.906, indicating that the system tends to place relevant answers at the top positions. Nevertheless, ambiguous or broad questions remain challenging. Future work could therefore enlarge the dataset and integrate more recent models such as IndoBERT to improve system performance.
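The retrieval and evaluation steps described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation. The stopword list, the parameter values k1 = 1.5 and b = 0.75, the three toy passages, and the relevance labels are all assumptions chosen for illustration. Stemming is omitted to keep the example dependency-free, and the IDF formula is the non-negative variant commonly used in practice.

```python
import math
import re
from collections import Counter

# Hypothetical, tiny stopword list for illustration only.
STOPWORDS = {"yang", "dan", "di", "adalah", "apa", "itu", "dalam", "oleh"}

def preprocess(text):
    """Lowercase, tokenize on letters/digits, drop stopwords (no stemming here)."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return [t for t in tokens if t not in STOPWORDS]

def bm25_scores(query, docs, k1=1.5, b=0.75):
    """Score every document against the query with Okapi BM25."""
    tokenized = [preprocess(d) for d in docs]
    n_docs = len(tokenized)
    avgdl = sum(len(d) for d in tokenized) / n_docs
    # Document frequency: number of documents containing each term.
    df = Counter(term for d in tokenized for term in set(d))
    query_terms = preprocess(query)
    scores = []
    for doc in tokenized:
        tf = Counter(doc)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            # Non-negative IDF variant (assumed; other variants exist).
            idf = math.log(1 + (n_docs - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(doc) / avgdl)
            score += idf * tf[term] * (k1 + 1) / norm
        scores.append(score)
    return scores

def ndcg_at_k(relevances, k):
    """NDCG@k for relevance grades given in ranked order."""
    def dcg(rels):
        return sum(r / math.log2(i + 2) for i, r in enumerate(rels[:k]))
    ideal = dcg(sorted(relevances, reverse=True))
    return dcg(relevances) / ideal if ideal > 0 else 0.0

# Toy passages and labels (assumed for this example, not taken from the dataset).
passages = [
    "Negara Indonesia adalah negara hukum.",
    "Kedaulatan berada di tangan rakyat dan dilaksanakan menurut Undang-Undang Dasar.",
    "Presiden Republik Indonesia memegang kekuasaan pemerintahan menurut Undang-Undang Dasar.",
]
question = "Siapa yang memegang kekuasaan pemerintahan?"
relevance = [0, 0, 1]  # only the third passage answers the question

scores = bm25_scores(question, passages)
ranking = sorted(range(len(passages)), key=lambda i: scores[i], reverse=True)
ranked_relevance = [relevance[i] for i in ranking]
print(ranking, ndcg_at_k(ranked_relevance, k=5))
```

In this sketch the relevance labels are reordered by the BM25 ranking before NDCG@5 is computed, so the metric rewards rankings that place the relevant passage near the top. A reported score such as 0.906 would be the average of this quantity over all test questions.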






DOI : https://doi.org/10.33005/scan.v20i2.5662
