IRIS-Bot: Institutional Research Information System using Intelligent Automation and Deep Learning | IJCT Volume 12 – Issue 6 | IJCT-V12I6P24

International Journal of Computer Techniques
ISSN 2394-2231
Volume 12, Issue 6  |  Published: November – December 2025

Author

Mr. MADHU CK, GANAVI D GOWDA, LAVANYA NM, NAVEENA K, KUSHAL CS

Abstract

Managing and verifying research publications within academic institutions is a major challenge due to fragmented storage, inconsistent metadata, duplicate submissions, and the lack of automated validation against trusted scholarly sources. Manual compilation of institutional research data (required for accreditation, rankings, audits, and faculty evaluation) is time-consuming, error-prone, and inefficient. To address this, we propose IRIS-Bot: Institutional Research Information System, an intelligent desktop application that automates end-to-end research publication management. The system integrates automated PDF metadata extraction, DOI/CrossRef lookup, citation and indexing verification, journal authenticity checking, duplicate detection, semantic search, and domain classification. Using deep learning-based sentence embeddings and pgvector-powered similarity search, IRIS-Bot retrieves and clusters research documents by meaning rather than by exact keywords. A modular backend architecture with PostgreSQL as the primary datastore supports scalability, while a lightweight alternative supports offline use. The Qt-based graphical interface provides an interactive environment for administrators and faculty. Experimental evaluation indicates that IRIS-Bot improves the accuracy of metadata extraction, reduces duplication errors, and makes search more efficient through semantic embeddings, compared with traditional keyword-based methods. The proposed system offers a unified solution that enables institutions to maintain clean, verified, and searchable research repositories with minimal manual effort. IRIS-Bot can be deployed for academic audits, accreditation processes, and institutional research analytics, contributing to improved data quality and automation in higher education environments.

Keywords

Institutional Research System, Deep Learning, NLP, Metadata Verification, CrossRef, Automation, Academic Data Integration
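The DOI/CrossRef lookup step named in the abstract can be illustrated with a minimal sketch. It assumes a commonly used DOI pattern and the public Crossref REST endpoint `https://api.crossref.org/works/`; the helper names `find_doi` and `crossref_url` are hypothetical and are not taken from IRIS-Bot, and the sample DOI is a placeholder.

```python
import re
from typing import Optional
from urllib.parse import quote

# Assumed pattern for modern DOIs; IRIS-Bot may use a different one.
DOI_PATTERN = re.compile(r"\b10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def find_doi(text: str) -> Optional[str]:
    """Return the first DOI-like string found in the text, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Trailing punctuation usually belongs to the sentence, not the DOI.
    return match.group(0).rstrip(".,;")


def crossref_url(doi: str) -> str:
    """Build the Crossref REST API URL for the metadata record of a DOI."""
    return "https://api.crossref.org/works/" + quote(doi, safe="/")


# Placeholder text, standing in for text extracted from a PDF.
sample = "Available at doi:10.1000/xyz123."
doi = find_doi(sample)
if doi is not None:
    print(crossref_url(doi))
```

Fetching the resulting URL returns a JSON record that could then be compared against the metadata extracted from the PDF; the network call is left out so the sketch stays self-contained.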

Conclusion

The development of the Institutional Research Information System (IRIS-Bot) represents a significant step toward modernizing and streamlining the way academic institutions manage their research outputs. Traditionally, universities rely heavily on manual workflows, decentralized data collection, and inconsistent verification procedures, leading to duplication of effort, inaccurate reporting, and inefficient research tracking. IRIS-Bot addresses these challenges through an integrated, automated, and scalable solution.
The system combines multiple layers of functionality: automated PDF extraction, metadata enrichment, indexing and citation verification, duplicate detection, domain classification, and semantic search. By using PyMuPDF for document parsing, SentenceTransformers for semantic understanding, fuzzy matching for similarity checks, and PostgreSQL with the pgvector extension, IRIS-Bot offers a backend capable of handling large volumes of research data.
One of the most impactful contributions of the system is the reduction in manual workload for faculty coordinators and administrative staff. Tasks such as verifying DOI information, checking indexing claims, confirming journal legitimacy, and reviewing citation counts, all traditionally time-consuming and error-prone, are now performed automatically and consistently. The system's enrichment pipeline validates and completes records before they are stored, allowing institutions to maintain a trustworthy, high-quality research repository.
Furthermore, the semantic search engine greatly enhances discoverability. This feature is particularly beneficial for scholars who want to understand the institution's research strengths and contributions quickly by querying concepts in natural language. The GUI, developed using PySide6, provides a clean, intuitive, and user-friendly way to interact with the system.
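The fuzzy similarity checks used for duplicate detection can be sketched as follows. This is a minimal illustration that assumes title-level comparison with Python's standard `difflib.SequenceMatcher`; the source does not state which fuzzy-matching library or threshold IRIS-Bot uses, so the function names and the 0.9 threshold are hypothetical.

```python
import re
from difflib import SequenceMatcher


def normalize_title(title: str) -> str:
    """Lowercase the title, drop punctuation, and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9\s]", " ", title.lower())
    return " ".join(cleaned.split())


def title_similarity(a: str, b: str) -> float:
    """Similarity ratio in [0, 1] between two normalized titles."""
    return SequenceMatcher(None, normalize_title(a), normalize_title(b)).ratio()


def find_duplicates(titles, threshold=0.9):
    """Return index pairs (i, j), i < j, whose titles meet the threshold."""
    pairs = []
    for i in range(len(titles)):
        for j in range(i + 1, len(titles)):
            if title_similarity(titles[i], titles[j]) >= threshold:
                pairs.append((i, j))
    return pairs


titles = [
    "IRIS-Bot: Institutional Research Information System",
    "IRIS Bot - institutional research information system",
    "Deep Learning for Research Classification",
]
print(find_duplicates(titles))
```

The pairwise loop is quadratic in the number of titles, which is acceptable for a sketch; a production pipeline would more plausibly narrow the candidates first, for example by DOI or by embedding similarity.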
Overall, IRIS-Bot demonstrates that a thoughtful combination of machine learning, natural language processing, database engineering, and user-centric design can transform institutional research management. The project fulfills its aim of creating a unified, reliable, and intelligent system that simplifies workflows, improves data accuracy, and enhances accessibility. With its modular architecture and extensible pipeline, IRIS-Bot lays a strong foundation for future enhancements and large-scale deployment across departments or institutions.
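The embedding-based retrieval summarized above can be illustrated with a small ranking sketch. It assumes cosine similarity as the metric (pgvector exposes cosine distance, which is one minus this value, through its `<=>` operator) and uses toy three-dimensional vectors in place of real sentence embeddings; the source does not state which metric or model IRIS-Bot uses, and the document identifiers are hypothetical.

```python
import math


def cosine_similarity(u, v):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0  # Treat zero vectors as dissimilar to everything.
    return dot / (norm_u * norm_v)


def rank_by_similarity(query_vec, doc_vecs, top_k=3):
    """Return the top_k (doc_id, score) pairs, most similar first."""
    scored = [
        (doc_id, cosine_similarity(query_vec, vec))
        for doc_id, vec in doc_vecs.items()
    ]
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top_k]


# Toy vectors standing in for sentence embeddings (assumption).
docs = {
    "paper_a": [0.9, 0.1, 0.0],
    "paper_b": [0.0, 1.0, 0.0],
    "paper_c": [0.7, 0.7, 0.0],
}
query = [1.0, 0.0, 0.0]
top = rank_by_similarity(query, docs, top_k=2)
print(top)
```

In a deployment of the kind the paper describes, the same ranking would be delegated to the database through an `ORDER BY` on a vector distance operator, so that an index can avoid scanning every stored embedding.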


How to Cite This Paper

Mr. MADHU CK, GANAVI D GOWDA, LAVANYA NM, NAVEENA K, KUSHAL CS (2025). IRIS-Bot: Institutional Research Information System using Intelligent Automation and Deep Learning. International Journal of Computer Techniques, 12(6). ISSN: 2394-2231.

© 2025 International Journal of Computer Techniques (IJCT). All rights reserved.
