
IRIS-Bot: Institutional Research Information System using Intelligent Automation and Deep Learning | IJCT Volume 12 – Issue 6 | IJCT-V12I6P24

International Journal of Computer Techniques
ISSN 2394-2231
Volume 12, Issue 6 | Published: November – December 2025
Author
Mr. MADHU CK, GANAVI D GOWDA, LAVANYA NM, NAVEENA K, KUSHAL CS
Abstract
Managing and verifying research publications within academic institutions is a major challenge due to fragmented storage,
inconsistent metadata, duplicate submissions, and the lack of automated validation against trusted scholarly sources.
Manual compilation of institutional research data—required for accreditation, rankings, audits, and faculty evaluation—
is time-consuming, error-prone, and inefficient. To address this, we propose IRIS-Bot: Institutional Research Information
System, an intelligent desktop application that automates end-to-end research publication management.
The system integrates multiple advanced components, including automated PDF metadata extraction, DOI/CrossRef
lookup, citation and indexing verification, journal authenticity checking, duplicate detection, semantic search, and domain
classification. Using deep learning-based sentence embeddings and pgvector-powered similarity search, IRIS-Bot provides
accurate retrieval and clustering of research documents. A modular backend architecture with PostgreSQL as the primary
datastore ensures scalability, while a lightweight alternative store supports offline use. The Qt-based graphical interface provides an
interactive and user-friendly environment for administrators and faculty.
Experimental evaluation demonstrates that IRIS-Bot significantly improves accuracy in metadata extraction, reduces
duplication errors, and enhances search efficiency using semantic embeddings compared to traditional keyword-based
methods. The proposed system offers a unified solution that enables institutions to maintain clean, verified, and
searchable research repositories with minimal manual effort. IRIS-Bot can be effectively deployed for academic audits,
accreditation processes, and institutional research analytics, contributing to improved data quality and automation in
higher education environments.
Keywords
Institutional Research System, Deep Learning, NLP, Metadata Verification, CrossRef, Automation, Academic Data Integration
Conclusion
The development of the Institutional Research Information
System (IRIS-Bot) represents a significant step toward
modernizing and streamlining the way academic
institutions manage their research outputs. Traditionally,
universities rely heavily on manual workflows,
decentralized data collection, and inconsistent verification
procedures, leading to duplication of effort, inaccurate
reporting, and inefficient research tracking. IRIS-Bot directly
addresses these challenges through an integrated,
automated, and scalable solution.
The system successfully combines multiple layers of
functionality: automated PDF extraction, metadata
enrichment, indexing and citation verification, duplicate
detection, domain classification, and semantic search. By
leveraging tools such as PyMuPDF for document parsing,
SentenceTransformers for sentence embeddings, fuzzy string
matching for similarity checks, and PostgreSQL with the
pgvector extension for vector search, IRIS-Bot offers a
powerful backend capable of handling large volumes of
research data.
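To make the duplicate-detection and fuzzy similarity layer more concrete, the following is a minimal sketch, not the authors' implementation. It assumes that two records are duplicates when their normalized DOIs match or, if either DOI is missing, when their normalized titles are highly similar. Python's standard `difflib.SequenceMatcher` stands in for whichever fuzzy-matching library IRIS-Bot actually uses; the `Record` type, the helper names, and the 0.9 threshold are hypothetical.

```python
import re
import unicodedata
from dataclasses import dataclass
from difflib import SequenceMatcher
from typing import Optional


@dataclass
class Record:
    """Hypothetical minimal publication record used only for this sketch."""
    title: str
    doi: Optional[str] = None


def normalize_title(title: str) -> str:
    # Strip accents, lowercase, and collapse punctuation and whitespace so that
    # purely cosmetic differences do not hide a duplicate.
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    return re.sub(r"[^a-z0-9]+", " ", text.lower()).strip()


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    # DOIs are case-insensitive; drop common "https://doi.org/" and "doi:" prefixes.
    if not doi:
        return None
    doi = re.sub(r"^(https?://(dx\.)?doi\.org/|doi:\s*)", "", doi.strip().lower())
    return doi or None


def is_duplicate(a: Record, b: Record, threshold: float = 0.9) -> bool:
    doi_a, doi_b = normalize_doi(a.doi), normalize_doi(b.doi)
    if doi_a and doi_b:
        # When both records carry a DOI, trust it over the title.
        return doi_a == doi_b
    # Otherwise fall back to a fuzzy title comparison (ratio in [0, 1]).
    score = SequenceMatcher(None, normalize_title(a.title), normalize_title(b.title)).ratio()
    return score >= threshold
```

Checking DOIs first keeps the common case exact and cheap; the fuzzy title fallback only handles records without identifiers, and its threshold would need tuning against an institution's own data.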
One of the most impactful contributions of the system is
the significant reduction in manual workloads for faculty
coordinators and administrative staff. Tasks such as
verifying DOI information, checking indexing claims,
confirming journal legitimacy, or reviewing citation
counts, traditionally time-consuming and error-prone,
are now performed automatically and consistently. The
system’s enrichment pipeline ensures that data stored in
the database is accurate, validated, and complete, allowing
institutions to maintain a trustworthy, high-quality research
repository.
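The DOI and citation checks described above can be sketched against the public Crossref REST API, whose `/works/{doi}` endpoint wraps a work's metadata in a `message` object containing, among other fields, `title` and `container-title` (both lists of strings), `ISSN`, and `is-referenced-by-count` (Crossref's own citation count). The sketch below is an assumption about how such a comparison could look, not IRIS-Bot's actual logic: the function names, the matching rules, and the 0.9 threshold are hypothetical, and the network call is kept separate so the comparison can be exercised offline.

```python
import json
import urllib.parse
import urllib.request
from difflib import SequenceMatcher


def fetch_crossref_work(doi: str, mailto: str) -> dict:
    """Fetch one work from the public Crossref REST API (requires network access)."""
    url = "https://api.crossref.org/works/" + urllib.parse.quote(doi)
    # Crossref asks clients to identify themselves with a contact address.
    headers = {"User-Agent": f"iris-bot-sketch/0.1 (mailto:{mailto})"}
    with urllib.request.urlopen(urllib.request.Request(url, headers=headers), timeout=10) as resp:
        return json.load(resp)["message"]


def verify_against_crossref(claimed: dict, work: dict, threshold: float = 0.9) -> dict:
    """Compare a submitted record with a Crossref `message` dict (hypothetical rules)."""
    # Crossref returns titles and journal names as lists of strings.
    cr_title = (work.get("title") or [""])[0]
    cr_journal = (work.get("container-title") or [""])[0]
    title_score = SequenceMatcher(None, claimed["title"].casefold(), cr_title.casefold()).ratio()
    return {
        "title_match": title_score >= threshold,
        # A missing journal name on the Crossref side never counts as a match.
        "journal_match": bool(cr_journal)
        and claimed.get("journal", "").casefold() == cr_journal.casefold(),
        # Crossref's own count; Scopus or Google Scholar counts will usually differ.
        "crossref_citations": work.get("is-referenced-by-count", 0),
        "issn": work.get("ISSN", []),
    }
```

Because citation counts differ across Crossref, Scopus, and Google Scholar, a system like this would more plausibly flag discrepancies for review than reject a record outright.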
Furthermore, the semantic search engine enhances
discoverability by matching documents on meaning rather
than exact keywords. This feature is
particularly beneficial for scholars seeking to quickly
understand the institution’s research strengths and
contributions by querying concepts in natural language.
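Conceptually, semantic search ranks stored documents by the similarity between a query embedding and each document's embedding. The sketch below shows that ranking step in plain Python, with tiny hand-made vectors standing in for SentenceTransformers output (real models produce vectors with hundreds of dimensions). The SQL string illustrates how the same ranking could be pushed into PostgreSQL: `<=>` is pgvector's cosine-distance operator, while the `publications` table and `embedding` column are hypothetical names.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical pgvector query doing the same ranking inside PostgreSQL.
# `<=>` returns cosine distance (1 - cosine similarity), so ascending order = closest first.
PGVECTOR_QUERY = """
SELECT id, title
FROM publications
ORDER BY embedding <=> %s::vector
LIMIT %s;
"""


def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def semantic_search(query_vec: List[float], corpus: Dict[str, List[float]], k: int = 3) -> List[Tuple[str, float]]:
    # Score every document, then keep the k most similar ones.
    scored = [(doc_id, cosine_similarity(query_vec, vec)) for doc_id, vec in corpus.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    toy_corpus = {
        "nlp-paper": [0.9, 0.1, 0.0],
        "vision-paper": [0.1, 0.9, 0.1],
        "db-paper": [0.0, 0.2, 0.9],
    }
    print([doc for doc, _ in semantic_search([0.8, 0.2, 0.0], toy_corpus, k=2)])
    # prints ['nlp-paper', 'vision-paper']
```

The in-memory version scans every document; delegating the ranking to pgvector lets the database use its own indexes instead, which matters once the repository grows.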
The GUI, developed using PySide6, provides a clean,
intuitive, and user-friendly way to interact with the system.
Overall, IRIS-Bot demonstrates that a thoughtful
combination of machine learning, natural language
processing, database engineering, and user-centric design
can transform institutional research management. The
project successfully fulfills its aim of creating a unified,
reliable, and intelligent system that simplifies workflows,
improves data accuracy, and enhances accessibility. With its
modular architecture and extensible pipeline, IRIS-Bot lays a strong foundation for future enhancements and
large-scale deployment across departments or institutions.
How to Cite This Paper
Mr. MADHU CK, GANAVI D GOWDA, LAVANYA NM, NAVEENA K, KUSHAL CS (2025). IRIS-Bot: Institutional Research Information System using Intelligent Automation and Deep Learning. International Journal of Computer Techniques, 12(6). ISSN: 2394-2231.