A Comparative Evaluation of Vector Databases for Retrieval-Augmented Generation (RAG) Applications | IJCT Volume 13 – Issue 2 | IJCT-V13I2P23

International Journal of Computer Techniques
ISSN 2394-2231
Volume 13, Issue 2  |  Published: March – April 2026

Authors

Aanand G Nair, Anju Shukla

Abstract

This paper evaluates vector databases for use in Retrieval-Augmented Generation (RAG) workflows built on Google Gemini Flash. Three database systems were compared: ChromaDB, FAISS, and Milvus. All three databases achieved 100% precision on the six benchmark queries, with an average semantic similarity of 0.77 and an average keyword relevance of 0.56. ChromaDB is well suited to rapid prototyping, FAISS performs well for lightweight local deployment, and Milvus fits large-scale, long-term deployments. The paper also highlights trade-offs in usability and deployability, and recommends that future benchmarking of these databases be conducted at larger scale with more data.
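The three metrics cited above (precision, semantic similarity, keyword relevance) can be sketched as follows. This is a minimal illustration only, not the authors' actual evaluation code: the cosine-based semantic similarity and token-overlap keyword relevance shown here are assumed definitions, since the paper does not specify its formulas.

```python
import math

def cosine_similarity(a, b):
    """Assumed semantic-similarity metric: cosine between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def keyword_relevance(query, chunk):
    """Assumed keyword-relevance metric: fraction of query tokens found in the chunk."""
    q = set(query.lower().split())
    c = set(chunk.lower().split())
    return len(q & c) / len(q) if q else 0.0

def precision(retrieved, relevant):
    """Share of retrieved chunk IDs that are actually relevant to the query."""
    return sum(1 for r in retrieved if r in relevant) / len(retrieved)

# Toy check with hand-made vectors and text.
sim = cosine_similarity([1.0, 0.0], [1.0, 1.0])            # 1/sqrt(2), about 0.707
rel = keyword_relevance("vector database benchmark",
                        "a benchmark of vector stores")
prec = precision(["c1", "c2"], {"c1", "c2", "c3"})         # 1.0
```

Averaging these per-query scores over the six-question benchmark would yield summary numbers of the kind reported in the abstract.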

Keywords

Retrieval-Augmented Generation, Vector Databases, FAISS, ChromaDB, Milvus, Semantic Search, Information Retrieval, Large Language Models, Embedding Models, Database Performance Evaluation, Open-Source Systems, RAG Architecture

Conclusion

In this study we compared the effectiveness of three vector backends (ChromaDB, FAISS, and Milvus) for a Gemini-based RAG question-answering system, using controlled comparisons over a collection of cybersecurity and data science documents. Holding the embedding model, corpus, and evaluation environment constant, we measured each system on a six-question benchmark and found that all three achieved perfect precision, with an average semantic similarity of approximately 0.77 and an average keyword relevance of approximately 0.56. These results indicate that, for small-to-medium-scale academic projects, developers can choose a vector backend based on ease of use and deployment requirements without compromising answer quality. Qualitatively, ChromaDB offers the most streamlined integration for building rapid RAG prototypes, FAISS provides fine-grained control over indexing and performs well in local experiments, and Milvus provides a scalable, cloud-native platform for production-grade applications [5], [6], [8]. Researchers and practitioners should therefore base their choice on factors beyond retrieval quality, such as operational requirements, how well the system integrates into existing infrastructure, and expected growth in both corpus size and query volume. The central takeaway is that retrieval-backend selection for RAG systems is not a performance bottleneck at small scale; the decision is instead driven primarily by practical considerations of deployment effort, operational maturity, and compatibility with existing IT infrastructure.
As RAG applications move from research prototypes to production environments, database selection accounts for a growing share of the overall engineering effort, and the trade-offs demonstrated in this work can serve as a useful reference point during that transition.
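The interchangeability argument above can be illustrated with a small common retriever interface: if a RAG pipeline is written against such an interface, the backend can be swapped without touching the rest of the system. The class and method names here are hypothetical, not drawn from the paper or from any library's API, and a brute-force in-memory index stands in for ChromaDB, FAISS, or Milvus.

```python
import math
from typing import List, Sequence, Tuple

class VectorBackend:
    """Hypothetical common interface a Chroma/FAISS/Milvus adapter could implement."""
    def add(self, doc_id: str, embedding: Sequence[float]) -> None:
        raise NotImplementedError
    def search(self, query: Sequence[float], k: int) -> List[Tuple[str, float]]:
        raise NotImplementedError

class BruteForceBackend(VectorBackend):
    """In-memory stand-in: exact cosine search, adequate at small corpus sizes."""
    def __init__(self):
        self._store = {}

    def add(self, doc_id, embedding):
        self._store[doc_id] = list(embedding)

    def search(self, query, k):
        def cos(a, b):
            dot = sum(x * y for x, y in zip(a, b))
            na = math.sqrt(sum(x * x for x in a))
            nb = math.sqrt(sum(y * y for y in b))
            return dot / (na * nb) if na and nb else 0.0
        scored = [(doc_id, cos(query, emb)) for doc_id, emb in self._store.items()]
        return sorted(scored, key=lambda t: t[1], reverse=True)[:k]

# A pipeline written against VectorBackend can swap implementations freely.
backend = BruteForceBackend()
backend.add("doc_a", [1.0, 0.0])
backend.add("doc_b", [0.0, 1.0])
top = backend.search([0.9, 0.1], k=1)   # doc_a scores highest
```

Replacing `BruteForceBackend` with an adapter over a production system changes the indexing strategy and operational profile, but, as this study's results suggest, not necessarily the retrieval quality at small scale.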

References

[1] Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., & Wang, H. (2023). “Retrieval-Augmented Generation for Large Language Models: A Survey.” arXiv preprint arXiv:2312.10997.
[2] Patel, Y. V., Sharma, R., & Kumar, A. (2024). “Retrieval-Augmented Generation: From Naive to Adaptive Systems.” Kronika Academic Press, pp. 45–78.
[3] Ganatra, S., Desai, P., Joshi, M., & Bhat, V. (2024). “Retrieval-Augmented Generation: A Comprehensive Overview.” In 2024 Conference on AI, Science, Engineering, and Technology (AIxSET), pp. 234–251.
[4] PromptingGuide.ai. (2022). “Retrieval-Augmented Generation (RAG) for LLMs: Concepts and Applications.” Retrieved from https://www.promptingguide.ai/research/rag
[5] LiquidMetal AI. (2025). “Pinecone vs Weaviate vs Qdrant vs FAISS vs Milvus vs Chroma: Vector Database Comparison.” Retrieved from https://liquidmetal.ai/casesAndBlogs/vector-comparison/
[6] Datacamp. (2025). “The 7 Best Vector Databases in 2026: A Comprehensive Evaluation.” Retrieved from https://www.datacamp.com/blog/the-top-5-vector-databases
[7] GeeksforGeeks. (2024). “Top 15 Vector Databases that You Must Try in 2025.” Retrieved from https://www.geeksforgeeks.org/dbms/top-vector-databases/
[8] Zilliz. (2024). “Milvus vs Chroma: Vector Database Comparison and Selection Guide.” Retrieved from https://zilliz.com/comparison/milvus-vs-chroma
[9] Bargule, S. (2026, January 19). “Vector Database Comparison: FAISS, Milvus, Weaviate & Others.” LinkedIn Professional Post. Retrieved from https://www.linkedin.com/posts/
[10] Data-Intelligence Blog. (2025, August). “Vector Databases for Multi-Agent RAG – A Comparative Analysis.” Data-Intelligence Hashnode. Retrieved from https://data-intelligence.hashnode.dev/
[11] Dvuchbanny, A. (2024, September 18). “The Vector Duel: FAISS vs Chroma vs Milvus—Performance, Scalability, and Use Cases.” LinkedIn Articles. Retrieved from https://www.linkedin.com/pulse/
[12] Youssef, H. (2024, April 20). “A Comprehensive Comparison Between Open-Source Vector Databases: Architecture, Performance, and Trade-offs.” Towards AI, Medium Publication. Retrieved from https://towardsai.net/p/data-science/a-comprehensive-comparison-between-open-source-vector-databases

How to Cite This Paper

Aanand G Nair, Anju Shukla (2026). A Comparative Evaluation of Vector Databases for Retrieval-Augmented Generation (RAG) Applications. International Journal of Computer Techniques, 13(2). ISSN: 2394-2231.

© 2026 International Journal of Computer Techniques (IJCT). All rights reserved.
