CH. VENKATA GOPI, M. VENKATA NARENDRA BABU, A. DEVENDRA, BANDLA KARTHIK, Mrs. N. RADHA
Abstract
Legal texts are often written in highly technical and formal language, making them difficult for non-experts to understand and interpret correctly. With the growing adoption of artificial intelligence in the legal domain, large language models (LLMs) have shown promise in supporting natural language interaction with normative and regulatory documents. However, general-purpose LLMs frequently suffer from high perplexity, hallucinations, and limited domain reliability when applied to legal texts, which restricts their practical usability in real-world legal information systems. To address these challenges, this paper proposes a reliable and enhanced legal text interpretation framework that combines fine-tuned large language models with Retrieval-Augmented Generation (RAG). The approach leverages supervised fine-tuning on a synthetic yet human-validated legal question–answer dataset derived from official data protection laws and regulations, enabling clause-level and article-level understanding of legal content. In addition, a RAG architecture is integrated to ground model responses in authoritative legal sources, thereby reducing hallucinations and improving factual consistency while maintaining adaptability to evolving legal documents. Experimental results demonstrate that the proposed framework significantly improves legal question-answering performance compared to baseline models. Fine-tuned models achieve substantial reductions in perplexity, while the RAG-enhanced system attains accuracy levels exceeding 90% across multiple difficulty categories in benchmark evaluations. These findings highlight that combining fine-tuned model adaptation with retrieval-grounded generation provides a robust and scalable solution for reliable legal text interpretation, supporting broader accessibility and trustworthy use of AI-driven legal information systems.
Keywords
Large language models (LLMs), generative AI, fine-tuning, RAG, Ecuadorian law, legal informatics.
Conclusion
This study has underscored the significant potential of open-source large language models, when combined with fine-tuning techniques and Retrieval-Augmented Generation, to enhance access to and comprehension of legal information within the Ecuadorian context. The research focused particularly on the Ley Orgánica de Protección de Datos Personales and its regulation, shedding light on how these technologies can address real-world legal challenges.
The findings revealed that fine-tuned models achieved substantial reductions in perplexity while markedly improving accuracy on specific legal queries, outperforming their baseline counterparts by a considerable margin. Despite these advancements, Retrieval-Augmented Generation systems demonstrated superior overall performance, particularly due to their adaptability to frequent legislative updates.
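For reference, perplexity here is the standard language-modeling metric (this definition is supplied for completeness and is not reproduced from the paper itself): for a token sequence $w_1, \dots, w_N$ and a model with parameters $\theta$,

```latex
\mathrm{PPL}(w_1, \dots, w_N)
  = \exp\!\left( -\frac{1}{N} \sum_{i=1}^{N} \log p_\theta\!\left(w_i \mid w_{<i}\right) \right)
```

Lower values mean the model assigns higher probability to the reference legal text, so the reported reductions indicate that fine-tuning made the models' token predictions better aligned with the legislative language.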
Among the key contributions of this work are the development of the first open-source models and specialized legal benchmarks tailored to the Ley Orgánica de Protección de Datos Personales in Ecuador. These resources provide a robust foundation for evaluating large language models in this specific domain and contribute to the democratization of legal knowledge.
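To illustrate the retrieval-grounding idea that underlies the RAG results above, the following minimal sketch ranks law articles by term overlap with a query and prepends the top matches to the prompt. The corpus entries, article numbers, and bag-of-words scorer are illustrative stand-ins, not the paper's actual retriever, corpus, or embedding model.

```python
# Minimal sketch of the retrieval step in a RAG pipeline, using a tiny
# in-memory corpus of (hypothetical) law articles and simple term overlap.
from collections import Counter

CORPUS = {
    "Art. 1": "This law regulates the processing of personal data of natural persons.",
    "Art. 7": "Consent must be free, specific, informed and unambiguous.",
    "Art. 12": "Data subjects have the right to access their personal data.",
}

def tokenize(text: str) -> Counter:
    """Lowercase bag-of-words representation with basic punctuation stripped."""
    return Counter(w.strip(".,?!").lower() for w in text.split())

def retrieve(query: str, k: int = 2) -> list[str]:
    """Rank articles by term overlap with the query; return the top-k IDs."""
    q = tokenize(query)
    # Counter & Counter keeps the minimum count per shared term.
    scores = {aid: sum((tokenize(t) & q).values()) for aid, t in CORPUS.items()}
    return sorted(scores, key=scores.get, reverse=True)[:k]

def build_prompt(query: str) -> str:
    """Ground generation by prepending the retrieved articles to the prompt."""
    context = "\n".join(f"{aid}: {CORPUS[aid]}" for aid in retrieve(query))
    return f"Answer using only the context below.\n{context}\nQuestion: {query}"
```

Production RAG systems replace the overlap scorer with dense embeddings and a vector index, but the grounding step is the same: the generator only sees authoritative passages selected at query time, which is what makes the approach robust to legislative updates.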
How to Cite This Paper
CH. VENKATA GOPI, M. VENKATA NARENDRA BABU, A. DEVENDRA, BANDLA KARTHIK, Mrs. N. RADHA (2026). Legal AI for All: Reducing Perplexity and Boosting Accuracy in Normative Texts with Fine-Tuned LLMs and RAG. International Journal of Computer Techniques, 13(2). ISSN: 2394-2231.