Generating front-end code from UI screenshots has attracted growing research interest because it could significantly speed up UI development. This paper presents a systematic review of methods and systems for transforming a visual UI design into executable front-end code, with a focus on React components styled with Tailwind CSS. Both academic research models and tools are considered. The review covers deep learning pipelines consisting of visual feature extraction, semantic understanding, and code synthesis, built on convolutional and transformer-based architectures. Commonly used evaluation strategies and common deployment challenges are also analysed. Sustained efforts in recent years have produced significant advances in code accuracy and structural quality, but many issues remain open, such as responsive layout generation, multimodal reasoning, and iterative correction mechanisms. The paper concludes by summarizing promising research directions for increasing robustness and practical usability.
Keywords
UI-to-code, Screenshot conversion, React, Tailwind CSS, Deep learning, Code generation
Conclusion
In conclusion, UI-to-code generation has advanced considerably in recent years, driven by deep learning, computer vision, and large language models. Early work based on CNN–RNN pipelines showed that GUI images can be translated into code, and more recent work has used transformers, multimodal learning, and large-scale pretrained models to improve accuracy and scalability. These systems have been developed and benchmarked with datasets such as RICO and evaluation measures such as CodeBLEU. Despite these developments, current solutions still struggle with complex real-world interfaces, accessibility, full automation, and robustness across a wide variety of UI designs. Further limitations in the amount of available data, in evaluation criteria, and in computational requirements continue to obstruct practical use. Moreover, current systems are neither interactive nor adaptive, which makes them less useful in real-world development environments. Overall, there is a need for more comprehensive and efficient approaches that combine multimodal understanding, accessibility awareness, and scalable architectures. Going forward, building robust, easy-to-use, and fully automated systems that connect design and implementation, and thereby enable more efficient and intelligent UI development workflows, remains an open research problem.
References
[1]T. Beltramelli, “pix2code: Generating Code from a Graphical User Interface Screenshot,” arXiv, 2017, arXiv:1705.07962.
[2]D. Soselia, K. Saifullah, and T. Zhou, “Learning UI-to-Code Reverse Generator Using Visual Critic Without Rendering,” arXiv, 2023, arXiv:2305.14637.
[3]H. H. Zhang, T. Zhang, et al., “Widget2Code: From Visual Widgets to UI Code via Multimodal LLMs,” arXiv, 2025, arXiv:2512.19918.
[4]B. Deka, Z. Huang, C. Franzen, et al., “Rico: A Mobile App Dataset for Building Data-Driven Design Applications,” in Proc. ACM Symp. User Interface Softw. Technol. (UIST), 2017.
[5]S. Ren, D. Guo, S. Lu, et al., “CodeBLEU: a Method for Automatic Evaluation of Code Synthesis,” arXiv, 2020, arXiv:2009.10297.
[6]A. Vaswani, N. Shazeer, N. Parmar, et al., “Attention Is All You Need,” in Advances in Neural Information Processing Systems (NeurIPS), 2017.
[7]A. Dosovitskiy, L. Beyer, A. Kolesnikov, et al., “An Image is Worth 16×16 Words: Transformers for Image Recognition at Scale,” in Proc. Int. Conf. Learn. Representations (ICLR), 2021.
[8]M. Chen, J. Tworek, H. Jun, et al., “Evaluating Large Language Models Trained on Code,” arXiv, 2021, arXiv:2107.03374.
[9]B. Rozière, J. Gehring, F. Gloeckle, et al., “Code Llama: Open Foundation Models for Code,” arXiv, 2023, arXiv:2308.12950.
[10]A. Radford, J. W. Kim, C. Hallacy, et al., “Learning Transferable Visual Models From Natural Language Supervision,” in Proc. Int. Conf. Mach. Learn. (ICML), 2021.
[11]N. Carion, F. Massa, G. Synnaeve, et al., “End-to-End Object Detection with Transformers,” in Proc. Eur. Conf. Comput. Vis. (ECCV), 2020.
[12]J. Redmon, S. Divvala, R. Girshick, and A. Farhadi, “You Only Look Once: Unified, Real-Time Object Detection,” in Proc. IEEE Conf. Comput. Vis. Pattern Recognit. (CVPR), 2016.
[13]Y. Xu, M. Li, L. Cui, et al., “LayoutLM: Pre-training of Text and Layout for Document Image Understanding,” in Proc. ACM SIGKDD Conf. Knowl. Discov. Data Min. (KDD), 2020.
[14]S. Appalaraju, B. Jasani, B. U. Kota, et al., “DocFormer: End-to-End Transformer for Document Understanding,” in Proc. IEEE/CVF Int. Conf. Comput. Vis. (ICCV), 2021.
[15]Y. Wan, X. Li, and H. Zhang, “Divide-and-Conquer: Generating UI Code from Screenshots,” 2024.
[16]J. Deng, K. Yao, and L. Zhang, “VisRefiner: Learning from Visual Differences for Screenshot-to-Code Generation,” 2026.
[17]J. Yoon, S. Kim, and H. Lee, “A11YN: Aligning LLMs for Accessible Web UI Code Generation,” 2025.
[18]H. Suh, J. Park, and K. Choi, “Human or LLM? A Comparative Study on Accessible Code Generation Capability,” 2025.
[19]P. Mowar, R. Singh, and A. Gupta, “CodeA11y: Making AI Coding Assistants Useful for Accessible Web Development,” 2025.
[20]“UI2Code^N: A Visual Language Model for Test-Time Scalable Interactive UI-to-Code Generation,” 2025.
How to Cite This Paper
Harsh Bhanudas Pathare, Pooja R. Tupe (2026). Vision-to-Code Models for Automated UI Generation with React and Tailwind CSS. International Journal of Computer Techniques, 13(3). ISSN: 2394-2231.