Selecting the most appropriate machine learning algorithm for a given dataset is a recurring challenge for data scientists. This project builds an automated system that analyses dataset characteristics (meta-features), evaluates multiple ML algorithms, and recommends the best-performing algorithm. The system supports tabular (CSV), image, and PDF inputs. It integrates meta-feature extraction, model training (Decision Tree, Random Forest, SVM, Logistic Regression, Naive Bayes, XGBoost, LightGBM, and others), model comparison, and a Streamlit web interface. The report details the problem formulation, literature survey, methodology, system design, implementation steps, experimental setup, and usability guidelines, and it describes the results, evaluation metrics, and future directions.
Keywords
AutoML, meta-learning, algorithm selection, Streamlit, XGBoost, LightGBM, image feature extraction, PDF parsing.
Conclusion
This project presents a complete framework for automated machine learning algorithm selection and evaluation. The system accepts diverse types of datasets, including tabular CSV data, images, and PDF files, and automatically converts them into a machine-learning-ready format. Through meta-feature extraction, data preprocessing, and model training, the framework applies a wide range of algorithms, such as Decision Tree, Random Forest, SVM, Logistic Regression, Naive Bayes, XGBoost, and LightGBM.
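The paper does not list the exact meta-features or preprocessing rules it uses, so the following is only a minimal sketch, assuming a pandas DataFrame with a categorical target column. The helper names `extract_meta_features` and `to_ml_ready`, and the particular meta-features chosen (instance and feature counts, missing-value ratio, class entropy, imbalance ratio), are hypothetical illustrations of common choices in meta-learning, not the project's own code.

```python
import numpy as np
import pandas as pd


def extract_meta_features(df: pd.DataFrame, target: str) -> dict:
    """Hypothetical helper: a few common dataset meta-features.

    The feature set is illustrative and is NOT the paper's exact list.
    """
    X = df.drop(columns=[target])
    y = df[target]

    n_rows, n_cols = X.shape
    n_numeric = X.select_dtypes(include="number").shape[1]

    # Class distribution: Shannon entropy (bits) and majority/minority ratio
    counts = y.value_counts()
    probs = (counts / counts.sum()).to_numpy()
    class_entropy = float(-(probs * np.log2(probs)).sum())

    return {
        "n_instances": n_rows,
        "n_features": n_cols,
        "n_numeric_features": n_numeric,
        "n_categorical_features": n_cols - n_numeric,
        "n_classes": int(y.nunique()),
        "missing_ratio": float(X.isna().to_numpy().mean()),
        "class_entropy": class_entropy,
        "imbalance_ratio": float(counts.max() / counts.min()),
        "dimensionality": n_cols / n_rows,
    }


def to_ml_ready(df: pd.DataFrame, target: str):
    """Hypothetical preprocessing: median-impute numerics, one-hot encode the rest."""
    X = df.drop(columns=[target]).copy()
    y = df[target]
    num_cols = X.select_dtypes(include="number").columns
    # Fill missing numeric values with each column's median
    X[num_cols] = X[num_cols].fillna(X[num_cols].median())
    # One-hot encode the remaining (object/string/category) columns
    X = pd.get_dummies(X)
    return X, y


if __name__ == "__main__":
    demo = pd.DataFrame({
        "age": [25, 32, None, 51, 46, 38],
        "city": ["A", "B", "A", "C", "B", "A"],
        "label": ["yes", "no", "yes", "yes", "no", "yes"],
    })
    print(extract_meta_features(demo, "label"))
    X_ready, y = to_ml_ready(demo, "label")
    print(X_ready)
```

Computing the meta-features on the raw frame, before imputation and encoding, keeps quantities such as the missing-value ratio meaningful.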
The results show that the proposed approach can accurately and quickly identify which algorithm performs best on a given dataset by comparing the candidates on multiple criteria: accuracy, F1-score, training time, and memory usage. The Streamlit-based user interface lets even non-technical users upload datasets, view the extracted meta-features, inspect model performance, and receive algorithm recommendations immediately.
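The comparison step can be sketched with scikit-learn as below. This is a hedged illustration, not the project's implementation: the function names `candidate_models` and `evaluate_models` are hypothetical, only the scikit-learn models from the list are included (XGBoost's `XGBClassifier` and LightGBM's `LGBMClassifier` could be added to the same dictionary if those packages are installed), memory is approximated by Python's `tracemalloc` peak during `fit`, and ranking by macro F1, then accuracy, then training time is an assumed rule.

```python
import time
import tracemalloc

from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier


def candidate_models(seed=42):
    """Hypothetical candidate set; scale-sensitive models get a StandardScaler."""
    return {
        "Decision Tree": DecisionTreeClassifier(random_state=seed),
        "Random Forest": RandomForestClassifier(n_estimators=100, random_state=seed),
        "SVM": make_pipeline(StandardScaler(), SVC()),
        "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000)),
        "Naive Bayes": GaussianNB(),
    }


def evaluate_models(X, y, test_size=0.25, seed=42):
    """Train every candidate, record the four criteria, and return a ranked list."""
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=test_size, random_state=seed, stratify=y
    )
    results = []
    for name, model in candidate_models(seed).items():
        # Peak traced allocation during fit approximates memory usage;
        # tracing adds overhead, so times are best compared relatively.
        tracemalloc.start()
        start = time.perf_counter()
        model.fit(X_train, y_train)
        train_time = time.perf_counter() - start
        _, peak_bytes = tracemalloc.get_traced_memory()
        tracemalloc.stop()

        y_pred = model.predict(X_test)
        results.append({
            "model": name,
            "accuracy": accuracy_score(y_test, y_pred),
            "f1_macro": f1_score(y_test, y_pred, average="macro"),
            "train_time_s": train_time,
            "peak_memory_mb": peak_bytes / 1e6,
        })
    # Assumed ranking rule: best macro F1, then accuracy, then fastest training
    results.sort(key=lambda r: (-r["f1_macro"], -r["accuracy"], r["train_time_s"]))
    return results


if __name__ == "__main__":
    X, y = load_iris(return_X_y=True)
    ranking = evaluate_models(X, y)
    for r in ranking:
        print(f"{r['model']:<20} acc={r['accuracy']:.3f} f1={r['f1_macro']:.3f} "
              f"time={r['train_time_s']:.4f}s mem={r['peak_memory_mb']:.2f}MB")
    print("Recommended:", ranking[0]["model"])
```

Sorting on a tuple keeps the rule transparent; a weighted multi-criteria score would be a natural replacement for the key function.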
This work reduces the need for manual trial-and-error in model selection, making the process faster, more objective, and more reproducible. It also serves as a foundation for further enhancements such as:
• adding more advanced algorithms (deep learning with TensorFlow/PyTorch),
• supporting additional evaluation metrics (ROC-AUC, precision, recall),
• incorporating multi-criteria decision analysis,
• and building a larger “case base” to improve recommendations through meta-learning.
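The “case base” idea in the last item can be illustrated with a k-nearest-neighbour lookup over meta-feature vectors: each stored case records a past dataset's meta-features and the algorithm that won on it, and a new dataset receives the majority vote of its closest cases. This is a minimal sketch under stated assumptions; the function `recommend_from_cases`, the three meta-features, z-score scaling, Euclidean distance, and the toy cases in the example are all hypothetical and are not results from this project.

```python
from collections import Counter

import numpy as np


def recommend_from_cases(case_features, case_best, query, k=3):
    """Hypothetical case-based recommender: majority vote of the k nearest cases.

    case_features: one meta-feature vector per past dataset
    case_best:     the best algorithm recorded for each past dataset
    query:         meta-feature vector of the new dataset
    """
    F = np.asarray(case_features, dtype=float)
    q = np.asarray(query, dtype=float)

    # z-score with case-base statistics so no single meta-feature dominates
    mean = F.mean(axis=0)
    std = F.std(axis=0)
    std[std == 0] = 1.0  # avoid division by zero for constant features
    Fz = (F - mean) / std
    qz = (q - mean) / std

    dists = np.linalg.norm(Fz - qz, axis=1)
    nearest = np.argsort(dists)[:k]
    # Counter keeps first-seen order on ties, so the closer case wins a tie
    votes = Counter(case_best[i] for i in nearest)
    return votes.most_common(1)[0][0], nearest.tolist()


if __name__ == "__main__":
    # Toy, hypothetical cases: [log10(n_instances), log10(n_features), class_entropy]
    features = [
        [2.0, 1.0, 0.9],
        [2.2, 1.1, 1.0],
        [4.5, 1.5, 0.5],
        [4.8, 1.7, 0.6],
        [3.0, 2.5, 1.5],
    ]
    best = ["Logistic Regression", "Logistic Regression",
            "LightGBM", "LightGBM", "Random Forest"]
    print(recommend_from_cases(features, best, [4.6, 1.6, 0.55], k=3))
```

Log-transforming counts such as the number of instances before scaling keeps very large datasets from stretching the distance; with a larger case base, increasing k smooths out noisy individual cases.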
Overall, this work shows that automating algorithm selection improves productivity, lowers the entry barrier for beginners, and increases the likelihood of achieving good model performance on new datasets.
References
[1] L. Wegmeth, T. Vente, and J. Beel, “Recommender systems algorithm selection for ranking prediction on implicit feedback datasets,” arXiv preprint arXiv:2409.12345, 2024.
[2] S. Alissa, K. Sim, and E. Hart, “Automated algorithm selection: from feature-based to feature-free approaches,” arXiv preprint arXiv:2207.11111, 2022.
[3] A. N. Author, B. K. Contributor, and C. D. Researcher, “Automated algorithm selection using meta-learning and pre-trained CNNs,” Artificial Intelligence Journal, ScienceDirect, 2023.
[4] X. Zhang and Y. Wang, “Algorithm selection using edge ML and case-based reasoning,” Journal of Cloud Computing, vol. 12, no. 1, pp. 1–15, 2023.
[5] F. Hutter, L. Kotthoff, and J. Vanschoren (Eds.), Automated Machine Learning: Methods, Systems, Challenges, Springer, 2019.
[6] P. Banerjee and S. Mukherjee, “Meta-feature based automatic model recommendation for classification tasks,” International Journal of Data Science and Analytics, 2023.
[7] M. Feurer, A. Klein, K. Eggensperger, J. Springenberg, M. Blum, and F. Hutter, “Efficient and robust automated machine learning,” Advances in Neural Information Processing Systems (NeurIPS), 2015.
[8] The dataset used in this project was obtained from Kaggle (https://www.kaggle.com/), an online platform that hosts publicly available datasets for data analysis and model training.
How to Cite This Paper
Nagaveni Biradar, Abhinandan, Mahantesh B, Vinay Reddy (2025). Identification of Algorithm from the Given Dataset using AI/ML Technique. International Journal of Computer Techniques, 12(6). ISSN: 2394-2231.