
LLMs as Database Administrators: A Survey of AI-Driven Schema Design and Index Recommendation | IJCT Volume 13 – Issue 2 | IJCT-V13I2P77

International Journal of Computer Techniques
ISSN 2394-2231
Volume 13, Issue 2 | Published: March – April 2026
Authors
Mohamed Chetouani, Mahfoudi Souhail
Abstract
Large language models are reshaping database administration by enabling automation across tasks — schema design, index recommendation, configuration tuning, and diagnosis — that classical cost-model-driven tools handle only narrowly. This survey covers ten representative systems published between 2023 and 2026, organized under a four-axis taxonomy of task scope, LLM integration paradigm, deployment model, and autonomy level, with deep-dive comparisons on the two most mature tasks: index recommendation and schema design. For index recommendation, LLM-based advisors such as LLMIA, LLMIdxAdvis, and MAAdvisor match or exceed production baselines like Microsoft’s DTA, though a persistent gap between recommendation quality and validation cost remains unresolved. For schema design, the literature is earlier-stage and lacks shared benchmarks. Across all systems, three findings recur: database feedback loops separate effective advisors from naive prompting baselines, hallucination takes domain-specific forms requiring targeted mitigation, and the tension between frontier-model capability and on-premise deployment constraints is unresolved. Five open challenges — schema scale, cost-model coupling, workload drift, trust and explainability, and standardized benchmarking — define the road ahead for LLM-driven database administration.
Keywords
LLMs, database administration, index recommendation, schema design, in-context learning, retrieval-augmented generation, multi-agent systems, database tuning, DBA automation, large language models
Conclusion
This survey has examined the emerging body of work applying large language models to tasks traditionally owned by database administrators, with schema design and index recommendation as the two anchoring case studies. We organized the literature along four taxonomy axes (task scope, LLM integration paradigm, deployment model, and autonomy level) and found that the field clusters into three groups: holistic copilots that combine RAG and multi-agent orchestration across diagnosis, tuning, and natural-language interfaces (D-Bot, DB-GPT, λ-Tune); narrow single-task advisors that use in-context learning or multi-agent decomposition for index recommendation (LLMIA, LLMIdxAdvis, AMAZe); and schema-oriented systems that apply prompting or RAG to conceptual modeling, metadata enrichment, and schema linking.
For index recommendation, the surveyed systems demonstrate that LLMs can produce competitive or superior recommendations relative to classical cost-based advisors, particularly when grounded through curated demonstrations, iterative database feedback, or multi-agent decomposition. The evidence is qualified, however: head-to-head evaluation against a production-grade baseline (DTA) reveals high variance across LLM invocations, workload-dependent failures, and a validation-cost bottleneck that currently prevents unattended production deployment. For schema design, the literature is earlier-stage, with prototype tools and domain-specific pipelines but without the shared benchmarks or systematic evaluation that would permit confident claims about maturity.
Three cross-cutting observations emerge. First, tool use and database feedback loops are what separate effective LLM-based advisors from naive prompting baselines; the tighter the loop, the more reliable the output. Second, hallucination takes domain-specific forms in the DBA context (non-existent indexes, performance-regressing recommendations, schema violations) and requires domain-specific mitigation strategies rather than generic guardrails. Third, the tension between frontier-model capability and on-premise deployment constraints remains unresolved: the strongest results come from cloud-hosted models, but enterprise data governance often demands private infrastructure.
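As an illustration of the second observation, the sketch below shows one possible domain-specific check: comparing an LLM-proposed index against the database catalog before it is ever costed or built. It is a minimal sketch, not a method taken from any surveyed system; the names `IndexCandidate` and `check_candidate` and the in-memory `catalog` dictionary are assumptions made for this example, and a real deployment would read the catalog from the DBMS itself.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


@dataclass(frozen=True)
class IndexCandidate:
    """Hypothetical container for one index proposed by an LLM."""
    table: str
    columns: Tuple[str, ...]


def check_candidate(candidate: IndexCandidate,
                    catalog: Dict[str, Set[str]]) -> List[str]:
    """Return a list of problems; an empty list means the candidate
    only references tables and columns present in the catalog."""
    known_columns = catalog.get(candidate.table)
    if known_columns is None:
        # Nothing else can be checked if the table itself is hallucinated.
        return [f"unknown table: {candidate.table}"]

    problems: List[str] = []
    if not candidate.columns:
        problems.append("no columns specified")
    for col in candidate.columns:
        if col not in known_columns:
            problems.append(f"unknown column: {candidate.table}.{col}")
    if len(set(candidate.columns)) != len(candidate.columns):
        problems.append("duplicate column in index key")
    return problems


# Illustrative catalog: table name -> set of column names (assumed data).
catalog = {"orders": {"id", "customer_id", "created_at"}}

good = IndexCandidate("orders", ("customer_id", "created_at"))
bad = IndexCandidate("orders", ("customer_uuid",))

print(check_candidate(good, catalog))
print(check_candidate(bad, catalog))
```

A check of this kind only catches references to objects that do not exist; performance-regressing recommendations still need feedback from the optimizer or from actual execution.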
The trajectory of the field points toward hybrid architectures that integrate LLM reasoning with classical cost-model validation, toward shared benchmarks that enable reproducible comparison across systems and DBMS engines, and toward continual-learning mechanisms that adapt to workload drift without full offline reconstruction. Closing these gaps will determine whether LLM-driven database administration moves from a research capability to an operational one.
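The hybrid pattern described above can be sketched as a simple propose-then-validate loop, in which LLM-generated candidates are kept only if a cost model estimates a sufficient gain. This is a minimal sketch under stated assumptions: `select_indexes`, `toy_cost`, the candidate strings, and the 5% threshold are all hypothetical, and `toy_cost` stands in for a real what-if estimator such as an optimizer's hypothetical-index interface.

```python
from typing import Callable, FrozenSet, Iterable, List


def select_indexes(candidates: Iterable[str],
                   estimate_cost: Callable[[FrozenSet[str]], float],
                   min_gain: float = 0.05) -> List[str]:
    """Greedily accept candidates whose estimated relative cost
    reduction over the current configuration is at least min_gain."""
    accepted: List[str] = []
    current = estimate_cost(frozenset())
    for cand in candidates:
        trial = estimate_cost(frozenset(accepted + [cand]))
        if current > 0 and (current - trial) / current >= min_gain:
            accepted.append(cand)
            current = trial
    return accepted


def toy_cost(indexes: FrozenSet[str]) -> float:
    """Stand-in cost model (assumed numbers, not measured results)."""
    cost = 100.0
    if "orders(customer_id)" in indexes:
        cost -= 40.0  # helps the dominant lookup in this toy workload
    if "orders(status)" in indexes:
        cost += 5.0   # maintenance overhead outweighs any benefit
    return cost


# Candidates as an LLM might propose them (illustrative strings only).
llm_candidates = ["orders(status)", "orders(customer_id)", "orders(created_at)"]
chosen = select_indexes(llm_candidates, toy_cost)
print(chosen)
```

Each call to the estimator is where the validation cost discussed earlier accrues, so a practical design would limit or batch these calls rather than costing every candidate independently.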
References
[1] Zhaoyan Sun, Xuanhe Zhou, Jianming Wu, Wei Zhou, and Guoliang Li. 2025. D-Bot: An LLM-Powered DBA Copilot. In Companion of the 2025 International Conference on Management of Data (SIGMOD/PODS ’25). Association for Computing Machinery, New York, NY, USA, 235–238. https://doi.org/10.1145/3722212.3725091
[2] Victor Giannakouris and Immanuel Trummer. 2025. λ-Tune: Harnessing Large Language Models for Automated Database System Tuning. Proc. ACM Manag. Data 3, 1, Article 2 (February 2025), 26 pages. https://doi.org/10.1145/3709652
[3] Xinxin Zhao, Xinmei Huang, Haoyang Li, Jing Zhang, Shuai Wang, Tieying Zhang, Jianjun Chen, Rui Shi, Cuiping Li, and Hong Chen. “LLMIA: An Out-of-the-Box Index Advisor via In-Context Learning with LLMs.” arXiv:2503.07884v2 [cs.DB], March 2025.
[4] Xiaoying Wang, Wentao Wu, Vivek Narasayya, and Surajit Chaudhuri. “Evaluating the Practical Effectiveness of LLM-Driven Index Tuning with Microsoft Database Tuning Advisor.” arXiv:2603.09181 [cs.DB], March 10, 2026. Microsoft Research, Redmond, USA.
[5] Xinxin Zhao, Haoyang Li, Jing Zhang, Xinmei Huang, Tieying Zhang, Jianjun Chen, Rui Shi, Cuiping Li, and Hong Chen. “LLMIdxAdvis: Resource-Efficient Index Advisor Utilizing Large Language Model.” CoRR abs/2503.07884 (2025).
[6] Xinxin Zhao, Haoyang Li, Jing Zhang, Xinmei Huang, Tieying Zhang, Jianjun Chen, Rui Shi, Cuiping Li, and Hong Chen. “LLMIdxAdvis: Resource-Efficient Index Advisor Utilizing Large Language Model.” CoRR abs/2503.07884 (2025).
[7] P. Divljan and D. Brdjanin. “Towards an LLM-based Tool for Automated Database Design.” In Proceedings of the ER 2025 Posters, Demos, and Workshops (ER25_PAD), CEUR Workshop Proceedings, Vol. 4099, 2025. https://ceur-ws.org/Vol-4099/ER25_PAD_Divljan.pdf
[8] Shanshan Jiang et al. “LLMDap: LLM-based Data Profiling and Sharing.” In Proceedings of the VLDB 2025 Workshops, DEC Workshop, 2025. SINTEF AS. https://www.vldb.org/2025/Workshops/VLDB-Workshops-2025/DEC/DEC25_5.pdf
[9] George Katsogiannis-Meimarakis, Katsiaryna Mirylenka, Paolo Scotton, Francesco Fusco, and Abdel Labbi. “In-depth Analysis of LLM-based Schema Linking.” In Proceedings of the 29th International Conference on Extending Database Technology (EDBT 2026), Tampere, Finland, March 24–27, 2026, pp. 117–130. OpenProceedings.org.
[10] Siqiao Xue, Caigao Jiang, Wenhui Shi, Fangyin Cheng, Keting Chen, Hongjun Yang, Zhiping Zhang, Jianshan He, Hongyang Zhang, Ganglin Wei, Wang Zhao, Fan Zhou, Danrui Qi, Hong Yi, Shaodong Liu, and Faqiang Chen. “DB-GPT: Empowering Database Interactions with Private Large Language Models.” arXiv:2312.17449 [cs.DB], December 2023. https://arxiv.org/abs/2312.17449
[11] Surajit Chaudhuri and Vivek Narasayya. 1998. AutoAdmin “what-if” index analysis utility. In Proceedings of the 1998 ACM SIGMOD international conference on Management of data (SIGMOD ’98). Association for Computing Machinery, New York, NY, USA, 367–378. https://doi.org/10.1145/276304.276337
[12] Y. Wu, X. Zhou, Y. Zhang and G. Li, “Automatic Index Tuning: A Survey,” in IEEE Transactions on Knowledge and Data Engineering, vol. 36, no. 12, pp. 7657–7676, Dec. 2024, doi: 10.1109/TKDE.2024.3422006.
[13] Noah Shinn, Federico Cassano, Ashwin Gopinath, Karthik Narasimhan, and Shunyu Yao. “Reflexion: Language Agents with Verbal Reinforcement Learning.” In NeurIPS 2023. arXiv:2303.11366.
[14] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. “Retrieval-Augmented Generation for Knowledge-Intensive NLP Tasks.” In NeurIPS 2020, pp. 9459–9474.
[15] Surajit Chaudhuri and Vivek R. Narasayya. 1997. An Efficient Cost-Driven Index Selection Tool for Microsoft SQL Server. In Proceedings of the 23rd International Conference on Very Large Data Bases (VLDB ’97). Morgan Kaufmann Publishers Inc., San Francisco, CA, USA, 146–155.
[16] Dana Van Aken, Andrew Pavlo, Geoffrey J. Gordon, and Bohan Zhang. 2017. Automatic Database Management System Tuning Through Large-scale Machine Learning. In Proceedings of the 2017 ACM International Conference on Management of Data (SIGMOD ’17). Association for Computing Machinery, New York, NY, USA, 1009–1024. https://doi.org/10.1145/3035918.3064029
[17] Tom B. Brown, Benjamin Mann, Nick Ryder, Melanie Subbiah, Jared D. Kaplan, et al. “Language Models are Few-Shot Learners.” In Advances in Neural Information Processing Systems 33 (NeurIPS 2020), pp. 1877–1901. Curran Associates, 2020. arXiv:2005.14165. (GPT-3)
[18] Immanuel Trummer. 2022. DB-BERT: A Database Tuning Tool that “Reads the Manual”. In Proceedings of the 2022 International Conference on Management of Data (SIGMOD ’22). Association for Computing Machinery, New York, NY, USA, 190–203. https://doi.org/10.1145/3514221.3517843
[19] Xuanhe Zhou, Guoliang Li, and Zhiyuan Liu. “LLM As DBA.” arXiv:2308.05481 [cs.DB], August 2023. (D-Bot vision paper)
How to Cite This Paper
Mohamed Chetouani, Mahfoudi Souhail (2026). LLMs as Database Administrators: A Survey of AI-Driven Schema Design and Index Recommendation. International Journal of Computer Techniques, 13(2). ISSN: 2394-2231.