Conversational Database Management Tool: AI Driven Data Pipeline Orchestrator | IJCT Volume 13 – Issue 1 | IJCT-V13I1P12

International Journal of Computer Techniques
ISSN 2394-2231
Volume 13, Issue 1  |  Published: January – February 2026

Authors

Snehal Meshram, Devanshu Sanjay Markam, Divyanshu Chandrashekhar Tayde

Abstract

Almost every enterprise now depends on both historical and real-time data to support its decision making. However, this kind of analysis often requires expertise in SQL, distributed real-time data management, and statistical modeling, which limits access for users without a technical background. We present the AI Driven Data Pipeline Orchestrator (AIDPO), a conversational analytics platform developed to reduce this dependency on specialized skills. The system lets users extract data and understand analytical workflows through natural language by combining NL2SQL translation with a real-time Kafka and Spark pipeline for ingesting and processing live data sources, together with time series forecasting. An integrated dual assistant separates analytical requests from system-level pipeline operations, allowing users to issue commands that execute both SQL queries and live streaming operations. The tool was developed and evaluated using real and simulated real-world datasets, achieving a forecasting RMSE of 14.23, an anomaly detection precision of 0.80, and 94% accuracy on SQL selection (SELECT) queries. These results indicate that AIDPO can efficiently translate user inputs into understandable data operations.
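For readers unfamiliar with the two error metrics quoted in the abstract, the following is a minimal sketch of their standard definitions. The function names and the toy inputs are illustrative assumptions; they are not the authors' evaluation code or data.

```python
import math

def rmse(actual, predicted):
    """Root mean squared error between two equal-length sequences."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and equal in length")
    squared_errors = [(a - p) ** 2 for a, p in zip(actual, predicted)]
    return math.sqrt(sum(squared_errors) / len(squared_errors))

def precision(true_labels, flagged):
    """Fraction of flagged points that are real anomalies: TP / (TP + FP)."""
    tp = sum(1 for t, f in zip(true_labels, flagged) if f and t)
    fp = sum(1 for t, f in zip(true_labels, flagged) if f and not t)
    # With nothing flagged, precision is undefined; 0.0 is returned by convention here.
    return tp / (tp + fp) if (tp + fp) else 0.0

# Toy data (illustrative only): five points flagged, four of them true anomalies.
toy_truth = [1, 1, 1, 1, 0, 0]
toy_flags = [1, 1, 1, 1, 1, 0]
toy_precision = precision(toy_truth, toy_flags)
```

A lower RMSE means forecasts sit closer to the observed values, in the units of the forecast variable. A precision of 0.80 means that four out of every five points flagged as anomalous were true anomalies.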

Keywords

Natural Language to SQL (NL2SQL), Data Pipeline Orchestration, Real-Time Streaming Analytics, Forecasting, Anomaly Detection, Conversational AI.

Conclusion

The AI-Driven Data Pipeline Orchestrator (AIDPO) is a conversational tool that simplifies interaction with complex data systems and automates end-to-end analytical dataflows. By integrating natural language interpretation and SQL generation, real-time streaming analytics, forecasting models, and automated visualization, the system bridges the gap between technical data sources and non-technical users. The dual assistant interface improves accessibility by guiding users through both query formulation and insight review, enabling analytics without requiring knowledge of SQL, data pipelines, or statistical modeling. Experimental evaluation shows that AIDPO performs reliably across multiple dimensions, including query translation accuracy, live data streaming throughput, forecasting quality, and anomaly detection. The orchestration of Kafka and PySpark provides a scalable foundation for handling dynamic, high-velocity data, while the visualization engine delivers clear and actionable patterns. These results highlight AIDPO's ability to act as an intelligent assistant for data analysis, monitoring, and decision support in environments that depend on continuous, live data flows. Overall, the system is a significant step towards democratizing data analytics by combining conversational AI with automated data engineering processes. With future improvements in model development, multimodal interactivity, security, and flexibility, AIDPO can evolve into a comprehensive platform for real-time, interactive, and intelligent data analysis within modern organizations.
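The dual assistant split between analytical requests and system-level pipeline operations can be pictured with a small routing sketch. This summary does not say how AIDPO classifies a request, so the keyword heuristic, the `PIPELINE_KEYWORDS` set, and the `route_request` function below are hypothetical and serve only to illustrate the idea.

```python
import re

# Hypothetical command vocabulary; the paper does not specify its classifier.
PIPELINE_KEYWORDS = {"start", "stop", "restart", "pause", "resume"}

def route_request(message: str) -> str:
    """Return 'pipeline' for system-level commands, otherwise 'analytics'."""
    tokens = set(re.findall(r"[a-z]+", message.lower()))
    return "pipeline" if tokens & PIPELINE_KEYWORDS else "analytics"

# Example messages (illustrative only).
first = route_request("Restart the Kafka stream")
second = route_request("Show total sales by region for 2024")
```

A keyword match like this is easy to inspect but brittle; for example, "when did sales start to fall" would be misrouted as a pipeline command. A production system would more plausibly use an intent classifier or the language model itself for this step.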

How to Cite This Paper

Snehal Meshram, Devanshu Sanjay Markam, Divyanshu Chandrashekhar Tayde (2026). Conversational Database Management Tool: AI Driven Data Pipeline Orchestrator. International Journal of Computer Techniques, 13(1). ISSN: 2394-2231.

© 2025 International Journal of Computer Techniques (IJCT). All rights reserved.
