Buletin Ilmiah Sarjana Teknik Elektro ISSN: 2685-9572
Vol. 7, No. 4, December 2025, pp. 980-992
Improved DeepFake Image Generation Using StyleGAN2-ADA with Real-Time Personal Image Projection
Ali A. Abed 1, Doaa Alaa Talib 2, Abdel-Nasser Sharkawy 3,4
1 Department of Mechatronics Engineering, University of Basrah, Basrah, Iraq
2 Department of Laser and Optoelectronics Engineering, Shatt Al-Arab University College, Basrah, Iraq
3 Mechanical Engineering Department, Faculty of Engineering, Qena University, Qena 83523, Egypt
4 Mechanical Engineering Department, College of Engineering, Fahad Bin Sultan University, Tabuk 47721, Saudi Arabia
ARTICLE INFORMATION

Article History: Received 06 September 2025; Revised 04 November 2025; Accepted 08 December 2025

ABSTRACT

This paper presents an improved approach for DeepFake image generation using the StyleGAN2-ADA framework. The system is designed to generate high-quality synthetic facial images from a limited dataset of personal photos in real time. By leveraging the Adaptive Discriminator Augmentation (ADA) mechanism, the training process is stabilized without modifying the network architecture, enabling robust image generation even with small-scale datasets. Real-time image capturing and projection techniques are integrated to enhance personalization and identity consistency. The experimental results demonstrate that the proposed method achieves very high generation performance, significantly outperforming the baseline StyleGAN2 model. The proposed system using StyleGAN2-ADA achieves 99.1% identity similarity, a low Fréchet Inception Distance (FID) of 8.4, and less than 40 ms latency per generated frame. This approach provides a practical solution for dataset augmentation and supports ethical applications in animation, digital avatars, and AI-driven simulations.
Keywords: DeepFake Image Generation; StyleGAN2-ADA; Generative Adversarial Networks; Real-Time Projection; Personal Image Dataset; Adaptive Discriminator Augmentation
Corresponding Author: Abdel-Nasser Sharkawy, Mechanical Engineering Department, Faculty of Engineering, Qena University, Qena 83523, Egypt.
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 License.
Document Citation: A. A. Abed, D. A. Talib, and A.-N. Sharkawy, “Improved DeepFake Image Generation Using StyleGAN2-ADA with Real-Time Personal Image Projection,” Buletin Ilmiah Sarjana Teknik Elektro, vol. 7, no. 4, pp. 980-992, 2025, DOI: 10.12928/biste.v7i4.14659.
This section is divided into four subsections. Subsection 1.1 presents background on DeepFake technology, the StyleGAN family, and StyleGAN2-ADA. Subsection 1.2 illustrates the problem statement, and subsection 1.3 presents the main contribution and novelty of this paper. Subsection 1.4 shows the content and organization of the paper.
In recent years, DeepFake technology has emerged as a prominent subfield of artificial intelligence (AI), enabling the creation of synthetic yet highly realistic images and videos [1]-[4]. DeepFake systems are based on deep generative models, particularly Generative Adversarial Networks (GANs), and have found applications in entertainment, video games, virtual avatars, and dataset augmentation. However, this technology also poses serious threats in terms of misinformation, identity fraud, and non-consensual content creation [5]. Among the most notable advancements in GAN-based image generation is the StyleGAN family, developed by NVIDIA researchers [6], [7]. StyleGAN introduced a novel generator architecture that decouples high-level attributes (e.g., identity, pose) from stochastic details (e.g., skin texture, hair) through a style-based synthesis mechanism [7]. StyleGAN2, a refined version of this architecture, demonstrated exceptional results in producing high-fidelity facial images. Nonetheless, one of the major limitations of StyleGAN2 is its reliance on large-scale, high-quality datasets, which makes it impractical for domains with limited data availability or personalization requirements [8], [9].
To address this constraint, Karras et al. proposed StyleGAN2-ADA (Adaptive Discriminator Augmentation), which introduces adaptive augmentation techniques during discriminator training. These augmentations mitigate overfitting and improve the stability of training under low-data regimes [10]. The effectiveness of StyleGAN2-ADA has been validated in multiple benchmark studies, showing superior performance in few-shot and transfer learning scenarios [11], [12]. While StyleGAN2-ADA significantly improves training efficiency, real-time DeepFake image generation based on personal datasets remains a relatively under-explored domain. Most previous works have focused on large public datasets, such as FFHQ or CelebA, and lack mechanisms for real-time personalization or projection [13]-[16]. Talib and Abed demonstrated the feasibility of combining StyleGAN2-ADA with limited personal images and real-time webcam inputs, but their framework requires further performance evaluation in terms of identity preservation, speed, and accuracy [17]. Given these gaps, this paper proposes an enhanced system for real-time DeepFake image generation using StyleGAN2-ADA, tailored for small personal datasets. The system incorporates a projection mechanism that aligns personal images to the latent space and generates realistic outputs on the fly, with a focus on maintaining identity consistency and high perceptual quality.
Current StyleGAN2-ADA-based approaches face the following three limitations:
The goal of this research is to develop a DeepFake generation system that overcomes the previously mentioned limitations and is capable of:
The main contribution of this paper is an improved approach for DeepFake image generation using StyleGAN2-ADA with real-time personal image projection. In detail, the main contributions are listed in three points as follows:
The remainder of the paper is organized as follows: Section 2 presents previous related works on Deepfake image generation. Section 3 explains the proposed framework and the followed methodology. Section 4 presents the experimental results and the evaluation of the proposed method. Section 5 concludes the paper with suggestions for some future works.
The field of DeepFake generation and detection has undergone rapid evolution in the past decade, primarily driven by advancements in Generative Adversarial Networks (GANs). In this section, we present a review of the most relevant works, organized into two main research streams as follows:
The concept of GANs was first introduced by Goodfellow et al. [18], where a generator and a discriminator are trained in an adversarial fashion. StyleGAN, developed by Karras et al. [7], introduced a novel style-based architecture that enables better control over high-level visual attributes. Its successor, StyleGAN2 [19], improved image quality by addressing architectural artifacts and introducing path-length regularization. To mitigate the data-scarcity challenge, StyleGAN2-ADA was proposed, introducing adaptive discriminator augmentation that applies stochastic, differentiable augmentations during training [8]. This method stabilizes the learning process, especially when only a limited dataset is available. Empirical studies have demonstrated that StyleGAN2-ADA achieves comparable or superior performance to StyleGAN2 with an order of magnitude fewer training images [11], [12], [20], [21]. Further improvements in GAN-based synthesis have explored architectural changes using transformers [22]-[25], latent space disentanglement [26]-[28], and hybrid attention modules [29]-[31], all aiming to enhance visual realism and sample diversity.
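As a concrete illustration of this adversarial setup, the following is a minimal PyTorch sketch of the softplus-based, non-saturating logistic losses used by the StyleGAN2 family; `G` and `D` are placeholders for any generator and discriminator, and the snippet simplifies the actual training code.

```python
# Minimal sketch of the non-saturating logistic GAN losses used in the
# StyleGAN2 family. G and D stand in for the generator and discriminator.
import torch
import torch.nn.functional as F

def d_loss(D, real_images, fake_images):
    # The discriminator wants high scores on real images, low scores on fakes.
    loss_real = F.softplus(-D(real_images)).mean()
    loss_fake = F.softplus(D(fake_images)).mean()
    return loss_real + loss_fake

def g_loss(D, fake_images):
    # Non-saturating generator loss: push D's score on fakes upward.
    return F.softplus(-D(fake_images)).mean()
```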
Despite the progress in synthetic image generation, most systems rely on large-scale public datasets and are not optimized for real-time, personalized DeepFake generation. The challenge lies in aligning a user's face, captured from a webcam or a small image set, to the GAN's latent space while maintaining identity consistency and generation speed. Talib and Abed [17] demonstrated that StyleGAN2-ADA can be adapted to work with real-time personal image projection, generating high-fidelity DeepFake images with over 99% identity similarity. Their framework, however, lacked a comprehensive evaluation of latency and scalability. Other recent works have proposed fast projection techniques that optimize latent codes in under 100 ms [32], while few-shot GAN fine-tuning methods have enabled model adaptation with fewer than 10 images [33]. These findings support the feasibility of combining projection with on-the-fly GAN synthesis for real-time avatar generation and video-conferencing applications. The main novelty of the current paper is an improved approach for DeepFake image generation using StyleGAN2-ADA with real-time personal image projection; the detailed contributions are listed in subsection 1.3.
This section presents the complete architecture and workflow of the proposed system for real-time DeepFake image generation using StyleGAN2-ADA. The proposed framework includes four main parts: 1) the architecture, 2) data preparation and augmentation, 3) the training strategy, and 4) real-time projection and generation.
The proposed architecture is based on StyleGAN2-ADA, which builds on StyleGAN2 by introducing adaptive data augmentation techniques that improve training stability on limited datasets. The breakdown of the architecture and its core components, summarized in Figure 1, is as follows:
Figure 1. Architectural flow of the StyleGAN2-ADA-based system
The generator (G) is composed of two sub-networks: a mapping network, which transforms a latent code z into an intermediate latent w, and a synthesis network, which renders the image while w modulates each of its layers. A minimal sketch of the mapping network is given below.
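The following sketch assumes the standard 512-dimensional latents and an 8-layer MLP as in the original StyleGAN papers; it is illustrative, not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class MappingNetwork(nn.Module):
    """Sketch of the StyleGAN-style mapping network: an 8-layer MLP that
    transforms a latent code z into an intermediate latent w, which then
    modulates every layer of the synthesis network."""
    def __init__(self, z_dim: int = 512, w_dim: int = 512, num_layers: int = 8):
        super().__init__()
        layers, dim = [], z_dim
        for _ in range(num_layers):
            layers += [nn.Linear(dim, w_dim), nn.LeakyReLU(0.2)]
            dim = w_dim
        self.net = nn.Sequential(*layers)

    def forward(self, z: torch.Tensor) -> torch.Tensor:
        # Normalizing z before the MLP follows the original implementation.
        z = z / (z.pow(2).mean(dim=1, keepdim=True) + 1e-8).sqrt()
        return self.net(z)
```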
The discriminator (D) is a convolutional network (CNN) that scores images as real or synthetic and provides the adversarial training signal to the generator.
The key innovations in StyleGAN2 include weight demodulation, which replaces instance normalization and removes its characteristic droplet artifacts, and path-length regularization, which smooths the latent-to-image mapping. A sketch of the demodulation step follows.
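For illustration, this sketch shows the weight (de)modulation operation at the heart of StyleGAN2's redesign; shapes and the grouped-convolution application of the per-sample weights are simplified.

```python
import torch

def modulated_conv2d_weights(weight: torch.Tensor, styles: torch.Tensor,
                             demodulate: bool = True) -> torch.Tensor:
    """Sketch of StyleGAN2 weight (de)modulation.
    weight: [out_ch, in_ch, kh, kw], styles: [batch, in_ch]."""
    # Modulate: scale each input channel of the conv kernel by the style.
    w = weight.unsqueeze(0) * styles[:, None, :, None, None]
    if demodulate:
        # Demodulate: rescale so each output feature map has ~unit variance,
        # removing the droplet artifacts caused by instance normalization.
        sigma = (w.pow(2).sum(dim=[2, 3, 4]) + 1e-8).rsqrt()
        w = w * sigma[:, :, None, None, None]
    return w  # per-sample conv weights, applied via grouped convolution
```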
The ADA enhancements apply stochastic, differentiable augmentations to the discriminator's inputs, with an augmentation probability p that is tuned automatically from an overfitting heuristic rather than fixed by hand. A sketch of this feedback loop follows.
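The sketch below uses the default target of 0.6 and adjustment speed from the StyleGAN2-ADA paper; the real implementation updates p every few minibatches, which is omitted here for brevity.

```python
import torch

class AdaController:
    """Sketch of the ADA feedback loop: the augmentation probability p is
    raised when the discriminator overfits (its outputs on real images are
    mostly positive) and lowered otherwise."""
    def __init__(self, target: float = 0.6, speed_imgs: int = 500_000):
        self.p = 0.0            # current augmentation probability
        self.target = target    # target value of the overfitting heuristic
        self.speed_imgs = speed_imgs  # images needed for p to go 0 -> 1

    def update(self, d_real_logits: torch.Tensor, batch_size: int) -> float:
        # r_t estimates the fraction of real images the discriminator
        # classifies as real; r_t > target signals overfitting.
        r_t = d_real_logits.sign().mean().item()
        step = batch_size / self.speed_imgs
        self.p += step if r_t > self.target else -step
        self.p = min(max(self.p, 0.0), 1.0)
        return self.p
```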
The official StyleGAN2-ADA repository, maintained by NVIDIA, is publicly accessible on GitHub and provides source code, documentation, and examples [34]. The developed methodology consists of three primary components as discussed in the following subsections.
A limited dataset of 10–20 personal images per subject is collected using a standard RGB webcam under various lighting conditions and facial expressions. Images are manually cropped to focus on facial regions and resized to 1024×1024 pixels to match the input requirements of StyleGAN2-ADA. Data augmentation techniques such as horizontal flipping, brightness adjustment, and rotation are applied to increase variance and robustness during training (a preprocessing sketch is given after Table 1). More details about the dataset are presented in Table 1.
Table 1. Details about the personal dataset preparation
The parameter | Description |
Number of subjects | 8 subjects |
Images (per subject) | 10–20 raw images; 12–14 retained after filtering |
Used images in training | 96 high-quality face images |
Type of Camera | Standard RGB laptop webcam |
Resolution of Image | Captured at 1280×720; cropped and resized to 1024×1024 |
Included Scenarios | Neutral, smiling, frontal, and slight head rotations |
Conditions of Lighting | Indoor natural lighting; standard room illumination |
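The following is a minimal preprocessing sketch matching the augmentations described above; the directory paths and parameter values are illustrative, not the authors' exact settings.

```python
from pathlib import Path
from PIL import Image
from torchvision import transforms

# Sketch of the offline preparation: face crops are resized to 1024x1024 and
# lightly augmented (flip, brightness jitter, small rotation).
augment = transforms.Compose([
    transforms.Resize((1024, 1024)),
    transforms.RandomHorizontalFlip(p=0.5),
    transforms.ColorJitter(brightness=0.2),
    transforms.RandomRotation(degrees=10),
])

src, dst = Path("dataset/raw"), Path("dataset/train")  # illustrative paths
dst.mkdir(parents=True, exist_ok=True)
for i, path in enumerate(sorted(src.glob("*.jpg"))):
    img = Image.open(path).convert("RGB")
    for k in range(4):  # a few augmented copies per source image
        augment(img).save(dst / f"{i:03d}_{k}.png")
```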
This approach allows stable learning from small-scale datasets by preventing discriminator overfitting [26]. The training pipeline consists of two phases: pre-training on the FFHQ dataset, followed by fine-tuning on the personal dataset.
Training is conducted on an NVIDIA RTX 20786 GPU with a batch size of 4, a learning rate of 0.002, and an R1 regularization weight of 10 (a sketch of this update step is given after Table 2). The fine-tuning process converged after ~5000 iterations per subject, taking 12–15 minutes on average. All parameters used in the training are presented in Table 2.
Table 2. The main parameters used in StyleGAN2-ADA Training
Parameter | Description |
Size of Dataset | 10–20 personal images per subject (8 subjects in total); 12–14 retained after filtering |
Source of Training Data | Personal dataset; FFHQ for pre-training |
Used Hardware | NVIDIA RTX 20786 GPU; batch size = 4; learning rate = 0.002; R1 regularization weight = 10 |
Training Duration | ~5000 iterations (~12–15 minutes per subject) |
Encoder for Projection | ResNet-50 + MLP with ArcFace-based identity loss |
Evaluation Metrics | Identity Similarity (ArcFace); Fréchet Inception Distance (FID); Latency (ms/frame) |
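As a concrete illustration of the training step with the R1 penalty listed in Table 2, the following hedged PyTorch sketch shows a single discriminator update; lazy regularization (applying R1 only every few steps) and the ADA augmentation of the inputs are omitted for brevity.

```python
import torch
import torch.nn.functional as F

def d_step(D, G, real, z, opt_d, r1_gamma: float = 10.0) -> float:
    """One discriminator update with the R1 penalty (weight 10, as in Table 2)."""
    real = real.detach().requires_grad_(True)
    real_logits = D(real)
    fake_logits = D(G(z).detach())
    loss = F.softplus(-real_logits).mean() + F.softplus(fake_logits).mean()

    # R1 penalty: squared gradient of D's output w.r.t. the real images.
    grads = torch.autograd.grad(real_logits.sum(), real, create_graph=True)[0]
    r1 = grads.pow(2).sum(dim=[1, 2, 3]).mean()
    loss = loss + (r1_gamma / 2) * r1

    opt_d.zero_grad()
    loss.backward()
    opt_d.step()
    return loss.item()
```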
To achieve real-time performance, a ResNet-50-based encoder maps live webcam images into the latent space of the trained GAN. The encoder, followed by an MLP, estimates the W+ latent vector, which is then passed to the generator to produce synthetic outputs. Frame-by-frame projection operates at ~25 fps with latency under 40 ms, enabling interactive applications such as avatar animation and live filters. Identity preservation is maintained by minimizing the cosine distance between the facial embeddings of the input and generated images, using a pre-trained ArcFace model [35] (a sketch of the encoder and identity loss follows Figure 2). The flowchart in Figure 2 illustrates the key stages of StyleGAN2-ADA training with real-time generation.
Figure 2. StyleGAN2-ADA real-time training pipeline
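The following is a minimal sketch of the projection components described above, assuming a torchvision ResNet-50 backbone and 18 style vectors for 1024×1024 synthesis; the `arcface` embedding network is assumed to be available as a pretrained module.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F
from torchvision.models import resnet50

class LatentEncoder(nn.Module):
    """Sketch of the ResNet-50 + MLP encoder that maps a webcam frame to a
    W+ latent (num_ws style vectors of width w_dim)."""
    def __init__(self, num_ws: int = 18, w_dim: int = 512):
        super().__init__()
        backbone = resnet50(weights=None)
        backbone.fc = nn.Identity()          # expose the 2048-d features
        self.backbone = backbone
        self.head = nn.Sequential(
            nn.Linear(2048, 1024), nn.LeakyReLU(0.2),
            nn.Linear(1024, num_ws * w_dim),
        )
        self.num_ws, self.w_dim = num_ws, w_dim

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        feats = self.backbone(x)
        return self.head(feats).view(-1, self.num_ws, self.w_dim)

def identity_loss(arcface, input_img, generated_img):
    # Cosine distance between face embeddings of input and output;
    # `arcface` is any pretrained face-embedding network (an assumption here).
    e1 = F.normalize(arcface(input_img), dim=1)
    e2 = F.normalize(arcface(generated_img), dim=1)
    return 1.0 - (e1 * e2).sum(dim=1).mean()
```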
To validate the performance of the proposed StyleGAN2-ADA-based system for real-time DeepFake image generation, a series of experiments is conducted on a limited personal dataset collected from 8 individuals. The evaluation focuses on generation accuracy, identity preservation, realism, and inference speed.
Figure 3 presents a comparative analysis of the discriminator loss for StyleGAN2-ADA and StyleGAN2. The higher curve of StyleGAN2-ADA reflects a more stable and effective training process, in which the generator successfully deceives the discriminator, an essential aspect of GAN convergence. In contrast, StyleGAN2 initially demonstrates strong performance but declines over time, indicating difficulties in generator training and prolonged discriminator adaptation. Overall, the figure highlights the superior and sustained performance of StyleGAN2-ADA.
A distinct divergence in the performance curves of StyleGAN2-ADA and StyleGAN2 is shown in Figure 4. The declining curve of StyleGAN2-ADA, aligned with the rise in Figure 3, reflects effective training, where the generator increasingly fools the discriminator. In contrast, the rising curve of StyleGAN2 indicates suboptimal training, with the generator struggling to produce realistic outputs, allowing the discriminator to more easily distinguish fake images. Together, Figure 3 and Figure 4 highlight the superior training dynamics of StyleGAN2-ADA compared to the less stable performance of StyleGAN2.
Figure 5 highlights the initial weak performance of both StyleGAN2 and StyleGAN2-ADA in generating fake images. However, as training advanced, StyleGAN2-ADA demonstrated notable improvement and sustained high performance, indicating effective adaptation and the ability to generate realistic outputs. In contrast, StyleGAN2 exhibited a decline in performance, reflecting ineffective training and diminished image quality. Overall, the figure emphasizes the superior adaptability and robustness of StyleGAN2-ADA compared to the limitations of StyleGAN2.
Figure 6 illustrates the discriminator’s detection score for both models. StyleGAN2-ADA shows a notably low score, indicating that its generated images are more difficult for the discriminator to distinguish from real ones, reflecting successful generator performance. In contrast, StyleGAN2 yields a higher detection score, suggesting the discriminator more easily identifies fake images, and thus, the generator is less effective. This comparison highlights the discriminator’s relative advantage in StyleGAN2, while indirectly confirming the superior generative quality of StyleGAN2-ADA.
Figure 3. Comparative performance analysis of StyleGAN2-ADA and StyleGAN2 discriminator/loss
Figure 4. Comparative performance analysis of StyleGAN2-ADA and StyleGAN2 generator/loss
Figure 5. Comparative performance analysis of StyleGAN2-ADA and StyleGAN2 scores/fake
Figure 6. Comparative performance analysis of StyleGAN2-ADA and StyleGAN2 scores/real
Three key metrics are used to assess the system's performance: identity similarity computed from ArcFace embeddings, Fréchet Inception Distance (FID), and per-frame latency (ms/frame). A measurement sketch for identity similarity and latency follows.
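Identity similarity and latency can be measured as in the following sketch (FID is typically computed with a separate tool over sets of real and generated images); the `arcface`, `encoder`, and `generator` modules are assumed available and are not part of the original text.

```python
import time
import torch
import torch.nn.functional as F

@torch.no_grad()
def identity_similarity(arcface, real, fake) -> float:
    """Cosine similarity between ArcFace embeddings (higher is better)."""
    e1 = F.normalize(arcface(real), dim=1)
    e2 = F.normalize(arcface(fake), dim=1)
    return (e1 * e2).sum(dim=1).mean().item()

@torch.no_grad()
def mean_latency_ms(generator, encoder, frames) -> float:
    """Average end-to-end time to project a frame and synthesize an image."""
    total = 0.0
    for frame in frames:
        start = time.perf_counter()
        w_plus = encoder(frame.unsqueeze(0))
        _ = generator(w_plus)
        if torch.cuda.is_available():
            torch.cuda.synchronize()  # make GPU timing meaningful
        total += (time.perf_counter() - start) * 1000.0
    return total / len(frames)
```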
For personal images, two real images are inserted to create a projection between them. Any desired number of generated fake images can be chosen, monitored with their labels, and inspected for the investigated attribute “age” (a minimal sketch of this latent interpolation is given after Table 3). In addition, the model is also tested in real time using the laptop's live camera, capturing live images of two individuals and generating fake images by morphing from the first image to the second. The video of real-time fake image generation is available at https://www.youtube.com/watch?v=F0TNIARjwdY. Table 3 reports the evaluation of image quality, identity similarity, and generation speed for StyleGAN2 and StyleGAN2-ADA, together with the improvement (%). These results demonstrate that StyleGAN2-ADA significantly improves both image realism and identity retention while maintaining real-time performance. The system achieves sub-40 ms latency, which is suitable for interactive applications such as virtual avatars or augmented video streams. The video demonstration of StyleGAN2 can be accessed at https://www.youtube.com/watch?v=o2Refiedp5U, and that of StyleGAN2-ADA at https://www.youtube.com/watch?v=_u4XiThcJMw. A comparative analysis of both videos clearly shows that StyleGAN2-ADA produces images with significantly higher resolution and visual fidelity. Furthermore, StyleGAN2 generates a noticeably higher number of distorted or unrealistic images, whereas StyleGAN2-ADA demonstrates improved stability and image quality.
Table 3. Performance Metrics for StyleGAN2 vs. StyleGAN2-ADA
Metric | The Proposed StyleGAN2-ADA | StyleGAN2 | Improvement (%) |
Identity Similarity (↑) | 99.1% | 96.3% | 2.8% |
FID Score (↓) | 8.4 | 18.7 | 55.08% |
Latency (ms/frame) (↓) | 38 | 51 | 25.49% |
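The following is a minimal sketch of the latent interpolation mentioned above, assuming the encoder returns W+ codes and the generator accepts them directly; step count and calling conventions are illustrative.

```python
import torch

@torch.no_grad()
def interpolate_faces(generator, encoder, img_a, img_b, steps: int = 8):
    """Project two real images into W+ and synthesize frames along the
    straight line between the two codes, morphing from face A to face B."""
    w_a = encoder(img_a.unsqueeze(0))
    w_b = encoder(img_b.unsqueeze(0))
    frames = []
    for t in torch.linspace(0.0, 1.0, steps):
        w = (1.0 - t) * w_a + t * w_b   # linear blend in latent space
        frames.append(generator(w))
    return frames
```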
A visual comparison is conducted to evaluate the quality of images generated by the pre-trained StyleGAN2 model versus the fine-tuned StyleGAN2-ADA. As illustrated in Figure 7, three outputs are compared side-by-side: the input image captured via a personal webcam, the generated output from StyleGAN2, and the output from StyleGAN2-ADA. The comparison reveals that StyleGAN2-ADA significantly outperforms StyleGAN2 in terms of visual fidelity and detail clarity. While the StyleGAN2 model produced several distorted or unrealistic facial outputs, the StyleGAN2-ADA model generated images that are noticeably sharper, more coherent, and closer in resemblance to the original inputs. This improvement is attributed to the adaptive augmentation mechanism used during the fine-tuning process, which enhanced the generator's ability to produce high-quality images even from limited input data.
Figure 7. Visual comparison between: a) StyleGan2 and b) StyleGan2-ADA
An ablation study is conducted to evaluate the impact of ADA and real-time projection separately:
These findings confirm the importance of both components in achieving reliable, fast, and identity-faithful image generation.
A user study involving 12 participants is conducted to rate realism and likeness on a scale from 1 to 5. As presented in Table 4, the results show a mean realism rating of 4.6 and a mean identity resemblance rating of 4.8, indicating high perceptual quality and fidelity.
Table 4. The subjective results involving 12 participants to rate the realism and the likeness
The Metric | Average Rating |
Visual Realism | 4.6 out of 5 |
Identity Resemblance | 4.8 out of 5 |
The main findings are summarized in the following points:
These outcomes validate the effectiveness of the proposed system for real-time Deepfake synthesis with ethical potential for avatar animation, digital doubles, and data-efficient simulations.
This paper presented a real-time DeepFake image generation system based on StyleGAN2-ADA, designed to operate efficiently with limited personal datasets. By integrating adaptive discriminator augmentation with a real-time latent-space projection module, the proposed method achieves high-fidelity, identity-preserving image synthesis with minimal computational overhead. Quantitative results demonstrated that the system achieved 99.1% identity similarity, a Fréchet Inception Distance (FID) of 8.4, and less than 40 ms latency per generated frame, making it suitable for real-time applications such as digital avatars and interactive media. The incorporation of ADA significantly improved model generalization in low-data regimes, while the projection encoder ensured fast and accurate mapping from personal images to the generator's latent space. Compared to the baseline StyleGAN2, our system achieved more realistic outputs and required fewer training samples, confirming its practicality in personalized DeepFake generation scenarios. Despite the system's advantages, several limitations remain:
To build upon the foundation established in this research work, future efforts may focus on the following points:
List of Abbreviations:
Abbreviation | Description |
ADA | Adaptive Discriminator Augmentation |
GANs | Generative Adversarial Networks |
FID | Fréchet Inception Distance |
G | Generator |
MLP | Multilayer Perceptron |
D | Discriminator |
CNN | Convolutional Neural Network |
FFHQ | Flickr-Faces-HQ Dataset |
LeakyReLU | Leaky Rectified Linear Unit |
Supplementary Materials
Supplementary information is available at the following links.
Author Contribution
All authors contributed equally to this paper. All authors read and approved the final paper.
Funding
This research received no external funding.
Conflicts of Interest
The authors declare no conflict of interest.
REFERENCES
AUTHOR BIOGRAPHY
Ali A. Abed is a Professor at the Department of Mechatronics Engineering, University of Basrah, Iraq. He received the B.Sc. and M.Sc. degrees in Electrical Engineering in 1996 and 1998, respectively, and the Ph.D. in Computer & Control Engineering in 2012. His fields of interest are Robotics, Computer Vision, and IIoT. He is an IEEE Senior Member, a member of the IEEE Robotics & Automation Society, a member of the IEEE IoT Community, and a member of the ACM. He is currently supervising a group of researchers developing deep learning models for computer vision and cybersecurity applications. He can be contacted at email: ali.abed@uobasrah.edu.iq.
Doaa Alaa Talib is an Assistant Lecturer at the Department of Laser and Optoelectronics Engineering, Shatt Al-Arab University College, Basrah, Iraq. She received the B.Sc. and M.Sc. degrees in Computer Engineering in 2015 and 2023, respectively. Her fields of interest are Computer Vision and DeepFake technology. She is currently part of a research group developing deep learning models and algorithms in Python. She can be contacted at: duaaalaaa@gmail.com.
Abdel-Nasser Sharkawy is an Associate Professor of Mechatronics Engineering at the Mechanical Engineering Department, Faculty of Engineering, South Valley University (SVU), Qena, Egypt. He graduated with a first-class honors B.Sc. degree in May 2013 and received his M.Sc. degree in April 2016 in Mechatronics Engineering from the Mechanical Engineering Department, SVU, Egypt. In March 2020, he received his Ph.D. degree from the Robotics Group, Department of Mechanical Engineering and Aeronautics, University of Patras, Patras, Greece. He has extensive experience teaching undergraduate and postgraduate courses in Mechatronics and Robotics Engineering and has published more than 80 papers in international scientific journals, book chapters, and international scientific conferences. His research interests include robotics, human-robot interaction, mechatronic systems, neural networks, machine learning, and control and automation. He can be contacted at email: abdelnassersharkawy@eng.svu.edu.eg.