A Robust Deep Learning Framework for Detecting Real and AI-Generated Images Using Multi-Generator and Multi-Scale Feature Analysis | IJCT Volume 13 – Issue 1 | IJCT-V13I1P20
The rapid advancement of generative models such as GANs, autoencoders, and diffusion architectures has significantly increased the realism of synthetic images, creating challenges for reliable real-versus-fake image classification. This research proposes a robust deep learning framework capable of generalizing across multiple AI image generators while accurately distinguishing real images from synthetic content. To address three gaps in existing research, namely limited cross-generator generalization, insufficient fine-grained artifact detection, and a lack of real-world distortions in training data, a unified and diverse dataset was constructed by integrating real images, DeepFake Detection Challenge (DFDC) data, StyleGAN-generated images, ProGAN/PGGAN outputs, and Stable Diffusion synthetic images sourced from Kaggle. All images were standardized and augmented with real-world distortions such as compression artifacts, low-light noise, blur, and occlusions to enhance deployment robustness. A hybrid deep learning architecture was developed that combines CNN backbone networks with Vision Transformer (ViT) layers, multi-scale feature pyramid modules, and attention-based fusion blocks to capture both global semantics and subtle generative artifacts. The model was trained with stratified sampling, transfer learning, and controlled augmentation strategies. Comprehensive evaluation using accuracy, precision–recall, F1-score, ROC-AUC, and cross-generator testing demonstrates that the framework generalizes strongly to unseen generative models, including diffusion-based datasets. Results show significant improvements in robustness against real-world distortions and variability, enabling reliable application in digital forensics, content authentication, and AI-generated media regulation. The proposed system provides a promising pathway toward universal detectors capable of adapting to rapidly evolving generative technologies.
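The evaluation metrics named above (accuracy, precision, recall, and F1-score) all reduce to counts over the binary confusion matrix. The following is a minimal, framework-agnostic Python sketch of that computation; the function names are illustrative and not taken from the paper:

```python
def confusion_counts(y_true, y_pred):
    """Count TP/FP/TN/FN for binary labels (1 = synthetic, 0 = real)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return tp, fp, tn, fn

def evaluate(y_true, y_pred):
    """Accuracy, precision, recall, and F1 from predicted binary labels."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred)
    accuracy = (tp + tn) / len(y_true)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

ROC-AUC differs in that it is computed from ranked prediction scores rather than hard labels, which is why it is reported separately in the evaluation.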
This study presents a comprehensive and robust deep learning framework for detecting real and AI-generated images across diverse generative models, including GANs, autoencoders, and diffusion-based architectures. By integrating multi-generator datasets such as the DeepFake Detection Dataset and Stable Diffusion synthetic images, the research addresses one of the major limitations of existing detection systems: poor generalization to unseen generators. The two proposed models, EVAF-Net and XSA-UNet, demonstrate that combining CNN backbones with multi-scale feature extraction, attention mechanisms, and transformer-based global reasoning significantly enhances the ability to capture both subtle textures and high-level structural inconsistencies in synthetic images. Experimental results confirm high performance across all evaluation metrics, with both architectures achieving strong accuracy, a balanced precision–recall trade-off, and near-perfect AUC–ROC values. EVAF-Net shows excellent robustness, achieving 97.8% and 95.3% accuracy on the DeepFake and Stable Diffusion datasets respectively, while XSA-UNet further improves performance with 98.4% and 96.5% accuracy. Cross-dataset external validation reinforces the generalization capability of the proposed models, with only minor performance drops on unseen generative distributions, demonstrating that the integration of multi-scale features, attention refinement, and a hybrid CNN–Transformer design effectively mitigates generator-specific bias. Future work may focus on expanding cross-domain datasets, exploring lightweight architectures for real-time deployment, and improving the interpretability of detection decisions. Further research could also investigate adaptive learning strategies that automatically update the models against emerging generative techniques, and explore multimodal detection approaches combining audio, video, and text for more comprehensive media authentication.
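The attention-based fusion of multi-scale features described above can be illustrated in miniature: each scale's feature vector is weighted by a softmax over per-scale relevance scores and the weighted vectors are summed into one fused representation. The sketch below is a hedged pure-Python illustration; the function names and the scoring inputs are assumptions for exposition, whereas in EVAF-Net and XSA-UNet the scores and fusion are learned network layers:

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of relevance scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def attention_fuse(scale_features, scale_scores):
    """Fuse per-scale feature vectors into one vector.

    scale_features: list of equal-length feature vectors, one per scale.
    scale_scores:   one relevance score per scale (here supplied directly;
                    in a real model they would be produced by a learned layer).
    """
    weights = softmax(scale_scores)
    dim = len(scale_features[0])
    fused = [0.0] * dim
    for w, feat in zip(weights, scale_features):
        for i, v in enumerate(feat):
            fused[i] += w * v
    return fused
```

With equal scores the fusion degenerates to an average of the scales; a higher score for one scale shifts the fused vector toward that scale's features, which is the mechanism that lets the network emphasize fine textures or global structure as needed.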
This research contributes an important step toward universal and future-proof detectors capable of adapting to rapidly evolving AI-generated media.
References
[1] M. Goebel, L. Nataraj, T. Nanjundaswamy, T. M. Mohammed, S. Chandrasekaran, and B. S. Manjunath, “Detection, Attribution and Localization of GAN Generated Images,” arXiv preprint, arXiv:2007.10466, Jul. 2020.
[2] J. Frank, T. Eisenhofer, L. Schönherr, A. Fischer, D. Kolossa, and T. Holz, “Leveraging Frequency Analysis for Deep Fake Image Recognition,” arXiv preprint, arXiv:2003.08685, Mar. 2020.
[3] A. Agarwal, A. Agarwal, S. Sinha, M. Vatsa, and R. Singh, “MD-CSDNetwork: Multi-Domain Cross Stitched Network for Deepfake Detection,” arXiv preprint, arXiv:2109.07311, Sep. 2021.
[4] B. Wang, X. Wu, Y. Tang, Y. Ma, Z. Shan, and F. Wei, “Frequency Domain Filtered Residual Network for Deepfake Detection,” Mathematics, vol. 11, no. 4, Art. no. 816, 2023.
[5] H. Geng, T. Lu, W. Huang, and B. Ding, “Deepfake Detection Technology Integrating Spatial Domain and Frequency Domain,” Frontiers in Computing and Intelligent Systems, vol. 11, no. 3, pp. 54–62, 2025, doi: 10.54097/yrahtw96.
[6] M. Bah and M. Dahmane, “Enhanced Deepfake Detection Using Frequency Domain Upsampling,” in Proc. International Conference on Computer Vision Theory and Applications, SCITEPRESS, 2024, pp. ….
[7] L. Sen and S. Mukherjee, “A Novel Unified Approach to Deepfake Detection of Images,” OpenReview, 2025.
[8] L. Lv, T. Wang, M. Huang, R. Liu, and Y. Wang, “A Spatial-Frequency Aware Multi-Scale Fusion Network for Real-Time Deepfake Detection,” arXiv preprint, arXiv:2508.20449, Aug. 2025.
[9] C. Tan, Y. Zhao, S. Wei, G. Gu, P. Liu, and Y. Wei, “Frequency-Aware Deepfake Detection: Improving Generalizability through Frequency Space Learning,” arXiv preprint, arXiv:2403.07240, Mar. 2024.
[10] J. Wang, Z. Wu, W. Ouyang, X. Han, J. Chen, S.-N. Lim, and Y.-G. Jiang, “M2TR: Multi-Modal Multi-Scale Transformers for Deepfake Detection,” in Proc. ICMR ’22: International Conference on Multimedia Retrieval, 2022.
[11] H. Zhao, C. Xu, Y. Li, and J. Tian, “Multi-Attention-Based Approach for Deepfake Face and Expression Swap Detection and Localization,” EURASIP Journal on Image and Video Processing, vol. 2023, no. 1, 2023.
[12] Y. Qiao, R. Tian, and Y. Wang, “Towards Generalizable Deepfake Detection with Spatial-Frequency Collaborative Learning and Hierarchical Cross-Modal Fusion,” arXiv preprint, arXiv:2504.17223, Apr. 2025.
[13] L. Alam, M. T. Islam, and S. S. Woo, “SpecXNet: A Dual-Domain Convolutional Network for Robust Deepfake Detection,” arXiv preprint, arXiv:2509.22070, Sep. 2025.
[14] H. S. I. A. Sadruddin, A. Sardouie, and M. S. M. Saeed, “A Robust Ensemble Model for Deepfake Detection of GAN-Generated Images on Social Media,” Discover Computing, vol. 28, no. 1, Art. no. 41, 2025, doi: 10.1007/s10791-025-09538-w.
[15] Y. Qiao, R. Tian, and Y. Wang, “Towards Generalizable Deepfake Detection with Spatial-Frequency Collaborative Learning and Hierarchical Cross-Modal Fusion,” arXiv preprint, arXiv:2504.17223, Apr. 2025 (duplicate of [12], retained to preserve numbering).
[16] A. Sardhara, V. Vekariya, and J. Tadhani, “A Hybrid CNN-LSTM Approach for High-Accuracy Image Forgery Detection,” International Journal of Engineering Sciences & Research Technology, vol. 14, no. 2, pp. 720–728, 2025. [Online]. Available: https://theaspd.com/index.php/ijes/article/view/7206
[17] “Precision Deepfake Image Detection via Transfer Learning & CNN-LSTM,” Electronics, vol. 13, no. 9, Art. no. 1662, 2024. [Online]. Available: https://www.mdpi.com/2079-9292/13/9/1662
[18] V. M. Patel and S. Degadwala, “Deepfake Detection Using Convolutional Neural Networks and LSTM Modelling,” International Journal of Scientific Research and Technology, 2024. [Online]. Available: https://ijsrst.com/index.php/home/article/view/IJSRST2512361
[19] D. Karishma, S. Umadevi, S. Teja, M. A. Shine, and N. I. Hasitha, “Deepfake Face Detection Using LSTM and CNN,” International Journal of Innovative Science and Advanced Engineering, 2024. [Online]. Available: https://www.ijisae.org/index.php/IJISAE/article/view/7287
[20] K. Rohith, K. Nagarjuna, B. V. Yadav, and G. R. Chandra Kumar, “Face Morph Attack Detection Using LSTM-CNN Hybrid Model,” International Journal for Research in Applied Science & Engineering Technology, 2024. [Online]. Available: https://ijraset.com/research-paper/face-morph-attack-detection-using-lstm-cnn-hybrid-model
[21] A. G. Singh and P. Sharma, “Hybrid Deep Learning Framework: CNN, LSTM & Vision Transformers for Deepfake Detection,” Journal of Emerging Science and Research, 2025. [Online]. Available: https://journal.esrgroups.org/jes/article/view/9109
[22] Y. Shelar, P. Sharma, and C. S. D. Rawat, “An Improved VGG16 and CNN-LSTM Deep Learning Model for Image Forgery Detection,” International Journal of Research in Innovative Technology and Computer Science, 2024. [Online]. Available: https://ijritcc.org/index.php/ijritcc/article/view/6157
[23] S. K. Sharma, W. A. Khan, and M. Kumar, “Estimation and Concealment Deep Fake Detection in Images using Hybrid LSTM,” International Journal of Innovative Science and Advanced Engineering, 2024. [Online]. Available: https://ijisae.org/index.php/IJISAE/article/view/4295
[24] R. Anand, L. Santhosh, and A. K., “Video Authenticity Detection Using Web-Enabled Techniques (CNN, LSTM, ResNeXt),” International Journal for Research in Applied Science & Engineering Technology, 2024. [Online]. Available: https://www.ijraset.com/research-paper/video-authenticity-detection-using-web-enabled-techniques
[25] P. Saikia and D. Dholaria, “A Hybrid CNN-LSTM Model for Video Deepfake Detection by Leveraging Optical Flow Features,” arXiv preprint, arXiv:2208.00788, 2022. [Online]. Available: https://arxiv.org/abs/2208.00788
How to Cite This Paper
Yashraj Namdeo, Nitesh Gupta (2025). A Robust Deep Learning Framework for Detecting Real and AI-Generated Images Using Multi-Generator and Multi-Scale Feature Analysis. International Journal of Computer Techniques, 12(6). ISSN: 2394-2231.