ISSN: 2685-9572 Buletin Ilmiah Sarjana Teknik Elektro
Vol. 8, No. 3, June 2026, pp. 811-823
Real-Time BISINDO Alphabet Recognition via Faster R-CNN Incorporating Skin Tone Diversity as a Classification Feature
Lilis Nur Hayati 1,3, Anik Nur Handayani 1, Wahyu Sakti Gunawan Irianto 1, Rosa Andrie Asmara 2,
Dolly Indra 3, Nor Salwa Damanhuri 4
1 Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Malang, Indonesia
2 Information Technology Department, Politeknik Negeri Malang, Malang, Indonesia
3 Department of Computer Science, Universitas Muslim Indonesia, Makassar, Indonesia
4 Electrical Engineering Studies, Universiti Teknologi MARA (UiTM), Cawangan Pulau Pinang, Malaysia
ARTICLE INFORMATION | ABSTRACT | |
Article History: Received 10 December 2025 Revised 28 March 2026 Accepted 15 June 2026 | Indonesian Sign Language (Bahasa Isyarat Indonesia/BISINDO) enables communication for deaf individuals through hand gestures, yet limited public awareness creates significant barriers between deaf and hearing communities. Existing recognition systems often fail to generalize across diverse skin tones, reducing their effectiveness in inclusive real-world deployment. The contribution of this research is a BISINDO alphabet recognition system that integrates skin color features - extracted via HSV-based skin segmentation - as an additional preprocessing layer within the Faster R-CNN framework, explicitly improving detection robustness across varied skin tones. The dataset consists of 8,000 images from ten adult actors representing light, medium-brown, and dark skin tones, augmented through flipping and brightness variation, with a 90:10 training-to-testing ratio. The model was trained over 15,000 steps with a batch size of 24, selected through empirical validation to balance convergence stability and dataset size. Experimental results show that indoor conditions outperform outdoor settings due to controlled lighting. Light-skinned and dark-skinned participants achieved the highest accuracy of 87.5% and F1-score of 85.71%, while medium-brown-skinned participants showed slightly lower performance, likely attributed to greater variability in reflectance under mixed lighting. The system achieves 24 frames per second, demonstrating potential for real-time communication support. These findings confirm that Faster R-CNN with skin color feature integration is effective for BISINDO alphabet recognition, with skin tone diversity being a critical performance factor. Future work will explore larger participant pools and dynamic gesture recognition under varied real-world lighting scenarios. | |
Keywords: BISINDO; Faster R-CNN; Skin Color Features; Hand Gesture Recognition; Assistive Technology
Corresponding Author: Anik Nur Handayani, Department of Electrical Engineering and Informatics, Universitas Negeri Malang, Malang, Indonesia. Email: aniknur.ft@um.ac.id
This work is open access under a Creative Commons Attribution-Share Alike 4.0
Document Citation: L. N. Hayati, A. N. Handayani, W. S. G. Irianto, R. A. Asmara, D. Indra, and N. S. Damanhuri, “Real-Time BISINDO Alphabet Recognition via Faster R-CNN Incorporating Skin Tone Diversity as a Classification Feature,” Buletin Ilmiah Sarjana Teknik Elektro, vol. 8, no. 3, pp. 811-823, 2026, DOI: 10.12928/biste.v8i3.15587.
Effective communication is a fundamental aspect of social interaction, enabling individuals to express ideas, emotions, and intentions [1]. For people with hearing impairments, sign language serves as the primary communication medium, utilizing structured hand gestures, body postures, and facial expressions to convey meaning [2][3]. In Indonesia, the deaf community predominantly uses Bahasa Isyarat Indonesia (BISINDO), a naturally developed sign language widely adopted in daily communication [4][5]. Despite its importance, communication barriers between deaf and hearing communities remain significant due to limited public awareness of BISINDO, restricting social participation for deaf individuals [6]. This condition underscores the urgent need for assistive technologies capable of translating BISINDO gestures into textual information to support more inclusive communication [7].
A critical yet frequently overlooked challenge in vision-based sign language recognition is skin color variation [8]. Differences in melanin levels and illumination conditions significantly affect image appearance and feature extraction, causing performance inconsistencies across users with different skin tones [9][10]. Existing recognition systems are often trained on datasets with homogeneous subjects, leading to dataset bias that reduces model generalization, particularly under outdoor or low-light conditions [11][12]. Furthermore, many current approaches focus exclusively on hand shape or motion features without systematically accounting for skin tone diversity, limiting their fairness and reliability in real-world deployment [13]. These gaps directly motivate the present study.
Recent advances in deep learning have enabled substantial progress in gesture recognition and object detection. Convolutional Neural Networks (CNNs) have been widely applied to hand gesture recognition due to their capacity to automatically learn discriminative visual features from image data [14][15]. Among detection frameworks, region-based architectures that integrate region proposal mechanisms with convolutional backbones have demonstrated strong performance across diverse domains including medical imaging, agriculture, and human activity recognition [16][17]. Several studies have explored sign language recognition using these deep learning approaches under controlled environments, reporting promising accuracy [18][19]. However, most existing works do not address skin tone diversity or real-world environmental variability, leaving a significant robustness gap [20][21].
For the BISINDO recognition task, this study selects Faster R-CNN as the detection framework [20][22][23]. While single-stage detectors such as YOLOv8 and SSD MobileNet offer faster inference, Faster R-CNN's two-stage architecture, comprising a Region Proposal Network (RPN) that generates candidate object regions followed by Region of Interest (RoI) pooling and classification, provides superior localization accuracy for hand gesture regions, which is critical when detecting overlapping or complex hand configurations in BISINDO alphabets [24][25]. Furthermore, Faster R-CNN's modular design allows explicit integration of supplementary features, such as skin color, into the detection pipeline, making it more suitable for this study's objective of analyzing cross-skin-tone performance. The system is designed to achieve real-time processing at 24 frames per second, demonstrating its viability for practical communication support applications.
To address the identified gaps, this study incorporates HSV-based skin color features as an explicit preprocessing component within the Faster R-CNN framework and evaluates the system using a diverse dataset representing light, medium-brown, and dark skin tones under both indoor and outdoor conditions [26]. It is hypothesized that explicitly incorporating skin color features will improve detection consistency across skin tone groups, thereby increasing model robustness and inclusivity. The contribution of this research is the development of a BISINDO alphabet recognition system that integrates skin color features into a Faster R-CNN framework, systematically evaluated across diverse skin tones and environmental conditions to provide empirical insights into the robustness and fairness of deep learning–based sign language recognition systems.
The proposed system architecture is illustrated in Figure 1, which depicts the complete pipeline from raw image input to final alphabet prediction for BISINDO gestures, with skin-tone-aware preprocessing integrated at the front end [18]. Input frames are captured from actors spanning the three skin tone groups during gesture performance. Prior to entering the main detection network, each image undergoes a preprocessing step designed to normalize the effects of lighting variation without modifying the core architecture of Faster R-CNN, given that photometric conditions and pigmentation interact to produce appearance shifts that can compromise detection consistency [27][28]. The normalized frames are subsequently routed through a ResNet-50 convolutional backbone, which produces a hierarchy of feature representations encoding geometric properties such as boundary contours, surface texture patterns, and the spatial arrangement of hand structures [29][30].
Feature maps output by the backbone are simultaneously fed into the RPN, whose anchor-based scanning mechanism proposes candidate bounding boxes—referred to as Regions of Interest (RoIs)—that may enclose gesture-performing hand regions [31][32]. An RoI Align operation then resamples each proposal region onto a fixed spatial grid while preserving subpixel alignment, which is critical for accurate localization [33]. The resampled features advance to a dual-head detection module: one head assigns each proposal to a gesture class (or background), while the other refines the bounding box coordinates through regression. A final Non-Maximum Suppression (NMS) pass eliminates spatially redundant high-confidence predictions, yielding detections labeled with the BISINDO alphabet class, bounding box, and confidence score [34]. The resulting pipeline supports consistent and accurate gesture decoding across the full range of tested skin tones and lighting environments.
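The anchor-based scanning performed by the RPN can be illustrated with a small sketch. It generates the anchor boxes placed at a single feature-map location; the function name `make_anchors` and the scale and aspect-ratio values are assumptions chosen for illustration and are not the settings used in this study.

```python
import math


def make_anchors(cx, cy, scales, ratios):
    """Return anchor boxes (x_min, y_min, x_max, y_max) centred at (cx, cy).

    Each anchor has area scale**2 and aspect ratio height / width = ratio.
    """
    anchors = []
    for scale in scales:
        for ratio in ratios:
            width = scale / math.sqrt(ratio)
            height = scale * math.sqrt(ratio)
            anchors.append((cx - width / 2, cy - height / 2,
                            cx + width / 2, cy + height / 2))
    return anchors


# Hypothetical scales and ratios (illustrative only, not the paper's values).
anchors = make_anchors(320, 320, scales=(64, 128, 256), ratios=(0.5, 1.0, 2.0))
for box in anchors:
    print(tuple(round(v, 1) for v in box))
```

In a full RPN, this set of templates is repeated at every position of the backbone feature map, and each anchor receives an objectness score and a box refinement.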
The design choice to integrate Faster R-CNN with skin-tone-aware class definitions responds to documented challenges in deploying gesture recognition under real-world conditions, including inconsistent illumination, cluttered backgrounds, and heterogeneous user populations. Research in the field of algorithmic fairness has established that recognition systems trained on phenotypically homogeneous data tend to underperform for under-represented groups, particularly when testing environments introduce photometric variability [23]. Evidence from hardware-constrained deployment studies further underscores the importance of building detection models that balance accuracy with computational efficiency [35]. Region proposal frameworks have been repeatedly validated for their superior spatial localization in complex visual scenes, making them well suited for articulated hand gesture tasks [36].
Figure 1. The architecture of Faster R-CNN
The research design comprises two main phases: the training phase and the testing phase. The training phase covers dataset preparation and model development, while the testing phase evaluates the trained model under real-time conditions. The overall research workflow is illustrated in the flowchart in Figure 2.
Figure 2. Research Methodology Flowchart
Image data were collected using a smartphone camera from ten adult actors demonstrating eight BISINDO alphabet classes (A–H). The actors represented three skin color categories: four light-skinned, three medium-brown-skinned, and three dark-skinned individuals. Data acquisition was performed at a distance of approximately 70 cm, with the camera positioned at chest level (120–135 cm from the floor) from a frontal view [37]. Captures were conducted under both indoor and outdoor conditions. Indoor illumination ranged from 161 lux to 254 lux, while outdoor illumination ranged from 197 lux to 6,536 lux.
A total of 8,000 original images were collected, yielding approximately 800 images per actor across eight gesture classes. All images were manually annotated using the LabelImg application [38], and each image was resized to 640×640 pixels to match the Faster R-CNN input resolution. Representative samples are illustrated in Figure 3.
(a) | (b) | (c) | (d) | (e) |
(f) | (g) | (h) | (i) | (j) |
Figure 3. Representative BISINDO Gesture Samples. Examples of BISINDO alphabet gestures (A–H) performed by actors with (a–d) light skin, (e–g) medium-brown skin, and (h–j) dark skin tones under indoor and outdoor conditions with heterogeneous backgrounds
To increase dataset diversity and reduce overfitting, data augmentation was applied using the Roboflow platform. Augmentation techniques included horizontal flipping and image rotation at −12° and +12°, chosen to simulate realistic variations in hand orientation. Through this process (Table 1), the original 8,000 images were expanded to 46,916 images. The augmented dataset was exported in TensorFlow Record (TFRecord) format. The dataset was split into 42,259 training images and 4,657 testing images (90:10 ratio). Examples of augmentation results are shown in Figure 4 and Figure 5.
Table 1. Dataset Composition Before and After Augmentation
Condition | Original Images | Augmented Images | Total |
Indoor | 4,000 | 19,458 | 23,458 |
Outdoor | 4,000 | 19,458 | 23,458 |
Total | 8,000 | 38,916 | 46,916 |
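The dataset figures reported above can be cross-checked with a few lines of arithmetic. The sketch below uses only the counts stated in the text and Table 1; the variable names are illustrative.

```python
# Counts taken from the text and Table 1.
original = {"indoor": 4000, "outdoor": 4000}
augmented = {"indoor": 19458, "outdoor": 19458}
train_images, test_images = 42259, 4657

total = sum(original.values()) + sum(augmented.values())
train_fraction = train_images / total

print(f"total images:   {total}")
print(f"train + test:   {train_images + test_images}")
print(f"train fraction: {train_fraction:.4f}")
```

The training share comes out slightly above 0.90, which is consistent with the stated 90:10 ratio.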
Original | Flip horizontal |
Figure 4. Augmentation Result — Horizontal Flip
Original | +12° | −12° |
Figure 5. Augmentation Result — Rotation
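When flipping and rotation are applied to annotated images, the bounding boxes have to be transformed together with the pixels. The following is a minimal sketch of that geometry, not Roboflow's implementation: the function names are hypothetical, the rotation is taken about the image centre, the sign convention of the angle is an assumption, and no clipping to the image border is performed.

```python
import math


def flip_box_horizontal(box, image_width):
    """Mirror an (x_min, y_min, x_max, y_max) box about the vertical centre line."""
    x_min, y_min, x_max, y_max = box
    return (image_width - x_max, y_min, image_width - x_min, y_max)


def rotate_box(box, angle_deg, width=640, height=640):
    """Rotate the box corners about the image centre and return the
    axis-aligned box that encloses them."""
    x_min, y_min, x_max, y_max = box
    cx, cy = width / 2, height / 2
    theta = math.radians(angle_deg)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    corners = [(x_min, y_min), (x_max, y_min), (x_max, y_max), (x_min, y_max)]
    xs, ys = [], []
    for x, y in corners:
        dx, dy = x - cx, y - cy
        xs.append(cx + dx * cos_t - dy * sin_t)
        ys.append(cy + dx * sin_t + dy * cos_t)
    return (min(xs), min(ys), max(xs), max(ys))


box = (100, 200, 300, 400)            # hypothetical annotation on a 640x640 image
print(flip_box_horizontal(box, 640))
print(rotate_box(box, 12))
print(rotate_box(box, -12))
```

Because the enclosing box grows after rotation, small angles such as ±12° keep the annotation reasonably tight around the hand.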
A key contribution of this study is the explicit integration of skin color as a discriminative factor within the Faster R-CNN detection framework. This integration was implemented through a class-definition strategy: the detection model was configured to recognize 24 classes, representing all combinations of 8 BISINDO alphabet gestures (A–H) and 3 skin color categories (light, medium-brown, dark).
The class-definition strategy differs from explicit skin segmentation approaches (e.g., HSV or YCbCr masking) in two important ways. First, it avoids segmentation failure cases that arise when background colors closely resemble skin tones — such as wooden walls or cream-colored clothing. Second, compared to standard augmentation-only approaches that randomly apply brightness and color jitter without demographic grounding, the class-level skin tone encoding directly supervises the model to learn gesture representations tied to specific skin tone conditions, providing a structured form of domain diversity that augmentation alone cannot guarantee [39][40]. This is evidenced by the consistent 100% recall across all skin tone groups.
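The 24-class definition can be sketched as the Cartesian product of the eight gestures and the three skin tone categories. The naming scheme (`A_light`, and so on), the helper names, and the choice to start class IDs at 1 are assumptions made for illustration; the study does not state its exact label names.

```python
from itertools import product

GESTURES = "ABCDEFGH"
SKIN_TONES = ("light", "medium_brown", "dark")  # hypothetical identifiers


def build_label_map():
    """Map class id -> combined class name for every gesture/skin-tone pair."""
    pairs = product(GESTURES, SKIN_TONES)
    return {i: f"{g}_{t}" for i, (g, t) in enumerate(pairs, start=1)}


def letter_of(class_name):
    """Recover the alphabet letter shown to the user from a combined class name."""
    return class_name.split("_", 1)[0]


label_map = build_label_map()
print(len(label_map))                 # number of detection classes
print(label_map[1], label_map[24])
print(letter_of("C_medium_brown"))
```

At inference time only the letter part needs to be displayed, so the skin tone component acts as supervision during training rather than as part of the user-facing output.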
As shown in Table 2, several pre-trained models from the TensorFlow 2 Object Detection Model Zoo were evaluated based on inference speed and COCO mAP [41]. The Faster R-CNN ResNet-50 V1 640×640 was selected for its favorable balance between inference speed and detection accuracy. Compared to heavier backbones such as VGG16 (approximately 138 million parameters), ResNet-50 offers a substantially more efficient parameter structure [42]. Although single-stage detectors such as YOLOv8 offer faster raw inference, Faster R-CNN's two-stage architecture, comprising an RPN for candidate region generation followed by RoI pooling and classification, provides superior localization accuracy for complex hand gesture configurations [24]. A direct controlled comparison with YOLOv8 and SSD MobileNet on the identical dataset is identified as future work.
Table 2. Pre-Trained Model Comparison from TensorFlow 2 Model Zoo
No | Model Name | Input Size | Speed (ms) | COCO mAP |
1 | Faster R-CNN ResNet50 V1 ★ | 640×640 | 53 | 29.3 |
2 | Faster R-CNN ResNet50 V1 | 1024×1024 | 65 | 31.0 |
3 | Faster R-CNN ResNet50 V1 | 800×1333 | 65 | 31.6 |
4 | SSD ResNet101 V1 FPN | 1024×1024 | 104 | 39.5 |
5 | SSD ResNet152 V1 FPN | 1024×1024 | 111 | 39.6 |
Training (Figure 6) was conducted on a system equipped with an Intel Core i7 processor, 16 GB RAM, and an NVIDIA GPU with CUDA support, using Jupyter Notebook version 7.2.1. As shown in Table 3 and Table 4, the training loss decreased consistently from 0.4249 at step 500 to 0.1920 at step 15,000, indicating progressive learning of discriminative gesture features.
Table 3. Model Training Hyperparameters
Parameter | Value |
Number of detection classes | 24 (8 alphabets × 3 skin tones) |
Batch size | 24 |
Training steps | 15,000 |
Learning rate (initial) | 0.0001 |
Optimizer | Momentum SGD |
Input resolution | 640 × 640 pixels |
Backbone | ResNet-50 V1 |
Table 4. Training Loss Progression
Training Step | Training Loss | Elapsed Time |
500 | 0.4249 | 28 min 9 sec |
2,800 | 0.3389 | 3 hr 28 min |
5,000 | 0.2868 | 5 hr 58 min |
10,000 | 0.2267 | 12 hr 2 min |
15,000 | 0.1920 | 18 hr 5 min |
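The values in Table 3 and Table 4 imply a few derived quantities that help interpret the training budget. The sketch below computes them from the reported numbers only; the training-set size of 42,259 images is taken from the dataset split, and the variable names are illustrative.

```python
# Reported values from Table 3, Table 4, and the dataset split.
batch_size = 24
training_steps = 15_000
train_images = 42_259
elapsed_seconds = 18 * 3600 + 5 * 60      # 18 hr 5 min at step 15,000
loss_start, loss_end = 0.4249, 0.1920     # step 500 and step 15,000

images_seen = batch_size * training_steps
epochs = images_seen / train_images
seconds_per_step = elapsed_seconds / training_steps
relative_loss_drop = (loss_start - loss_end) / loss_start

print(f"images seen:        {images_seen}")
print(f"approx. epochs:     {epochs:.2f}")
print(f"seconds per step:   {seconds_per_step:.2f}")
print(f"relative loss drop: {relative_loss_drop:.1%}")
```

Under these figures the model passes over the training set roughly 8.5 times at about 4.3 seconds per step, and the loss falls by a little over half between step 500 and step 15,000.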
Figure 6. Detection Result Example — BISINDO Alphabet "A"
Upon receiving an input frame, the ResNet-50 backbone maps the image to multi-scale feature representations through its convolutional layers. These spatial feature maps are simultaneously shared with the RPN, which uses predefined anchor templates at multiple scales to scan for candidate object locations and outputs a ranked set of region proposals. Each proposal undergoes RoI Align resampling to produce fixed-dimension pooled features, which are then processed by the classification head assigning a gesture class label and the regression head refining the bounding box geometry. Overlapping detections above the confidence threshold are pruned via NMS, and the surviving predictions constitute the model output: an alphabet label, a bounding box, and an associated confidence score [23],[26],[43].
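The final pruning step described above can be sketched as a greedy, class-agnostic NMS in plain Python. The score threshold of 0.5 follows the confidence threshold reported in the evaluation protocol; the IoU threshold of 0.5, the function names, and the example detections are assumptions for illustration and do not reproduce the TensorFlow implementation used in the study.

```python
def iou(a, b):
    """Intersection over union of two (x_min, y_min, x_max, y_max) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def nms(detections, score_threshold=0.5, iou_threshold=0.5):
    """Greedy NMS over (box, score, label) tuples.

    Detections at or below the score threshold are dropped first; the rest
    are visited from highest to lowest score, and a detection is kept only
    if it does not overlap an already kept box by more than iou_threshold.
    """
    candidates = sorted(
        (d for d in detections if d[1] > score_threshold),
        key=lambda d: d[1],
        reverse=True,
    )
    kept = []
    for det in candidates:
        if all(iou(det[0], k[0]) <= iou_threshold for k in kept):
            kept.append(det)
    return kept


# Hypothetical raw detections for one frame.
detections = [
    ((100, 100, 300, 300), 0.92, "A"),
    ((110, 105, 310, 305), 0.80, "A"),   # near-duplicate of the first box
    ((400, 400, 500, 500), 0.40, "B"),   # below the confidence threshold
]
kept = nms(detections)
print(kept)
```

Only the highest-scoring box survives here: the second is suppressed as a duplicate and the third never passes the confidence filter.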
Performance assessment employed two complementary approaches. The first applied the trained model to 4,657 held-out test images, computing confusion-matrix-derived metrics (accuracy, precision, recall, and F1-score) for each skin tone and environment combination [44]. The second conducted real-time webcam-based evaluation, with five repetitions per testing condition (self-testing, third-party user testing, varied background testing) at distances between 30 cm and 70 cm. Detections were accepted only when confidence scores exceeded an empirically determined threshold of 0.5, selected to balance detection sensitivity against false positive rate.
Model performance was evaluated using accuracy, precision, recall, and F1-score derived from the confusion matrix. Figure 7 illustrates example values for each confusion matrix element.
Figure 7. Illustration of TP, TN, FP, and FN in Gesture Detection Evaluation
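The four metrics follow from the confusion matrix counts in the standard way. The sketch below implements these definitions; the counts in the example are hypothetical, chosen only because they reproduce the peak values reported in this paper, and they are not the study's actual confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score from confusion matrix counts."""
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total if total else 0.0
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical counts: no missed gestures (FN = 0) and one false positive.
m = classification_metrics(tp=3, tn=4, fp=1, fn=0)
for name, value in m.items():
    print(f"{name}: {value:.2%}")
```

With no false negatives, recall is 100% and the F1-score is determined entirely by precision, which is why the two reported F1 values track the two reported precision values.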
Recognition outcomes are examined across the three skin tone categories and both testing environments, with emphasis on how performance responds to differences in illumination and photometric background complexity.
Table 5 presents the comprehensive recognition metrics for all combinations of skin tone category and testing environment. A consistent pattern emerges: indoor evaluations outperform their outdoor counterparts across every skin tone group, attributable to the greater photometric stability of interior environments. The highest recognition scores—accuracy of 87.5% and F1-score of 85.71%—were achieved for both light-tone and dark-tone actors tested under indoor conditions.
Across every tested scenario, recall reached 100%, indicating that the model did not fail to detect any target gesture when it appeared in the frame. Precision values were consistently lower than recall, revealing that false positive detections (instances where the model incorrectly identified background regions or non-target gestures as valid alphabet signs) constitute the dominant error mode [45]. Crucially, the inter-group performance spread across skin tone categories was confined to a maximum of 4.5 percentage points, providing quantitative evidence that the skin-tone-encoding strategy did not produce systematic bias favoring any particular demographic subgroup [46].
Table 5. Recognition Performance Summary Across Skin Tone Categories and Testing Conditions
Testing Condition | Skin Tone | Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
Indoor | Light | 87.5 | 75.0 | 100.0 | 85.71 |
Indoor | Medium-Brown | 83.0 | 66.7 | 100.0 | 80.0 |
Indoor | Dark | 87.5 | 75.0 | 100.0 | 85.71 |
Outdoor | Light | 83.0 | 66.7 | 100.0 | 80.0 |
Outdoor | Medium-Brown | 83.0 | 66.7 | 100.0 | 80.0 |
Outdoor | Dark | 83.0 | 66.7 | 100.0 | 80.0 |
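The inter-group spread and the indoor versus outdoor gap can be read directly from the accuracy column of Table 5. The sketch below recomputes both from the tabulated values; the dictionary layout and variable names are illustrative.

```python
# Accuracy (%) from Table 5, keyed by (condition, skin tone).
accuracy = {
    ("indoor", "light"): 87.5,
    ("indoor", "medium-brown"): 83.0,
    ("indoor", "dark"): 87.5,
    ("outdoor", "light"): 83.0,
    ("outdoor", "medium-brown"): 83.0,
    ("outdoor", "dark"): 83.0,
}

spread = max(accuracy.values()) - min(accuracy.values())


def mean_for(condition):
    """Mean accuracy over the three skin tone groups for one condition."""
    values = [v for (cond, _), v in accuracy.items() if cond == condition]
    return sum(values) / len(values)


indoor_mean = mean_for("indoor")
outdoor_mean = mean_for("outdoor")

print(f"max spread:   {spread:.1f} percentage points")
print(f"indoor mean:  {indoor_mean:.1f}%")
print(f"outdoor mean: {outdoor_mean:.1f}%")
```

The spread appears only in the indoor rows; under outdoor conditions all three groups share the same accuracy.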
Figure 8 captures the confusion matrix for the most challenging evaluation scenario: outdoor testing with medium-brown-tone actors, yielding accuracy 83.0%, precision 66.7%, recall 100%, and F1-score 80.0%. Figure 9 presents the peak-performance matrix for indoor light-tone and dark-tone actors, with accuracy 87.5%, precision 75.0%, recall 100%, and F1-score 85.71%.
Figure 8. Confusion Matrix — Outdoor Testing, Medium-Brown Skin
Figure 9. Confusion Matrix — Indoor Testing, Light Skin and Dark Skin
Table 6 situates the proposed system within the landscape of related sign language recognition studies. With a peak accuracy of 87.5%, the proposed approach matches or surpasses most previous BISINDO-specific implementations. The primary differentiating factor is the deliberate multi-group skin tone evaluation, which is absent from the majority of prior BISINDO studies that restrict participant diversity [3][4].
Table 6. Comparison with Previous Sign Language Recognition Studies
Study | Method | Dataset | Letter Class | Best Accuracy |
Proposed | Faster R-CNN ResNet-50 + Skin Color Class Strategy | BISINDO (8,000 orig / 46,916 aug) | 8 (A-H) | 87.5% |
| CNN-based gesture classifier | BISINDO dataset | 26 | 82.56% |
| Faster R-CNN (baseline) | BISINDO (controlled) | 26 | 85.0% |
| SSD MobileNet | BISINDO (homogeneous) | 8 | 83.7% |
| VGG-16 | SIBI | 2 (M and N) | 87% |
| YOLOv5 | BISINDO (real-time) | 26 | 99.27% |
The experimental results yield four primary findings. First, the system achieves a peak accuracy of 87.5% and F1-score of 85.71% for light-skinned and dark-skinned actors under indoor conditions. Second, indoor testing consistently outperforms outdoor testing across all skin tone groups. Third, the performance gap across skin tone categories is minimal — at most 4.5 percentage points — confirming that the class-definition strategy successfully embeds skin tone diversity without producing severe inter-group bias. Fourth, recall consistently reaches 100% across all conditions, demonstrating reliable detection of target gestures regardless of the user's skin tone [50].
The results are broadly consistent with, and extend, prior findings. Earlier BISINDO-focused studies using Faster R-CNN baselines on homogeneous datasets report accuracy values in the range of 83–85% [3][4], which the proposed system matches outdoors and exceeds under indoor conditions for light and dark skin tones. Compared to CNN-LSTM hybrid approaches [51], the proposed system trades dynamic gesture support for substantially lower deployment complexity, operating at 24 FPS on standard hardware. Studies in computer vision fairness have highlighted that models trained on homogeneous datasets exhibit reduced generalization for underrepresented skin tones [52][53]; the present study empirically demonstrates that this gap can be contained to within 4.5 percentage points through inclusive dataset construction and class-level encoding.
The results demonstrate that region-based detection architectures are viable for real-time BISINDO recognition when configured with skin-tone-aware class structures and trained on demographically diverse datasets. The consistent 100% recall across all groups indicates the system is well-suited for assistive communication contexts where missed detections are more disruptive than occasional false positives. The slight performance reduction observed for medium-brown skin tones under outdoor conditions suggests that future deployment environments requiring outdoor operation should incorporate illumination normalization preprocessing to mitigate reflectance-induced variability.
The primary strength of this study is its systematic evaluation of recognition fairness across skin tone diversity under heterogeneous environmental conditions, largely absent from prior BISINDO literature [3][4]. The primary limitations include: (1) coverage of only 8 of the 26 BISINDO alphabet classes; (2) the absence of a direct controlled comparison with YOLOv8 or SSD MobileNet on the same dataset; (3) the system has not been evaluated on dynamic backgrounds or with moving cameras; and (4) the participant pool of ten actors may not fully represent hand morphology variability in real-world deployment.
This study proposed a Faster R-CNN ResNet-50–based system for BISINDO alphabet recognition that explicitly incorporates skin color diversity through a 24-class definition strategy, representing eight alphabet gestures across three skin tone categories (light, medium-brown, and dark). The system was trained on a diverse dataset of 46,916 augmented images collected from ten actors under both indoor and outdoor conditions.
The best recognition performance was achieved under indoor testing conditions, with a peak accuracy of 87.5%, precision of 75.0%, recall of 100%, and F1-score of 85.71% for light-skinned and dark-skinned participants. Medium-brown-skinned participants achieved a maximum accuracy of 83.0%. The performance gap across skin tone categories was contained to within 4.5 percentage points, confirming that the system reliably detects target gestures regardless of the user's skin tone.
The theoretical contribution of this study lies in demonstrating that skin tone fairness in vision-based sign language recognition can be achieved through dataset diversity and class-level skin tone encoding, without requiring explicit skin segmentation preprocessing. The system operates at 24 frames per second on standard laptop hardware, confirming its viability for real-time assistive communication support.
The primary limitations include coverage of only 8 alphabet classes, moderate performance reduction in outdoor conditions, restriction to static gestures, and the absence of a direct controlled comparison with single-stage detectors. The system has not been evaluated on dynamic backgrounds or with moving cameras.
Future research directions include:
DECLARATION
Supplementary Materials
Representative BISINDO alphabet images from different skin color categories, additional qualitative detection results, and detailed Faster R-CNN training configurations are provided to support reproducibility and further analysis.
Author Contribution
All authors contributed equally as main contributors to this paper. All authors read and approved the final paper.
Sustainable Development Goals
This study supports SDG 4 (Quality Education) by enabling assistive technologies for Indonesian Sign Language learning and communication. It also contributes to SDG 10 (Reduced Inequalities) by addressing skin color diversity to promote fairness in vision-based recognition systems. Furthermore, the proposed Faster R-CNN–based approach aligns with SDG 9 (Industry, Innovation, and Infrastructure) through the development of robust and real-time AI solutions for inclusive human–computer interaction.
Funding
This research received no external funding.
Acknowledgement
The authors would like to thank all participants involved in the data collection process and Universitas Muslim Indonesia for the facilities and support provided during this research.
Conflicts of Interest
The authors declare no conflict of interest.
REFERENCES
AUTHOR BIOGRAPHY
Lilis Nur Hayati obtained her master's degree in Information Technology from Universitas Gadjah Mada in 2005 and is currently pursuing a Doctoral degree in the same field at Universitas Negeri Malang. She is a lecturer in Information Systems with 9 years of teaching experience at Universitas Muslim Indonesia. Her research interests include software design, software requirements analysis, decision support systems, technopreneurship, human-computer interaction, e-business concepts, research methodology, and operating systems. Email: lilis.nurhayati.2205349@students.um.ac.id Google Scholar: https://scholar.google.com/citations?hl=id&user=me_9y28AAAAJ
Anik Nur Handayani received her master's degree in Electrical Engineering in 2008 from Institut Teknologi Sepuluh Nopember (ITS) Surabaya, Indonesia, and earned her Doctoral degree in Science and Advanced Engineering from Saga University, Japan. She is currently a university lecturer at Universitas Negeri Malang, Indonesia. Her research interests include image processing, biomedical signal analysis, artificial intelligence, machine learning, deep learning, computer vision, and assistive technologies. Email: aniknur.ft@um.ac.id Google Scholar: https://scholar.google.com/citations?hl=en&user=nqPHjbMAAAAJ&view_op=list_works&sortby=pubdate
Wahyu Sakti Gunawan Irianto received his master's degree in Computer Science from Universitas Indonesia, Jakarta, in 1997. He earned a Doctoral degree in Computer Science (M.Kom). He is currently a senior lecturer in the Department of Electrical Engineering at Universitas Negeri Malang, Indonesia. His research interests include computer science education, educational technology, intelligent systems, embedded and microcontroller applications, and digital systems. He has contributed to various projects, such as an interactive learning module based on Arduino and multimodal dataset research in the LUMINA project. Email: wahyu.sakti.ft@um.ac.id Google Scholar: https://scholar.google.com/citations?user=DAWTUlAAAAAJ&hl=en
Rosa Andrie Asmara received his Bachelor's degree in Electronics Engineering from Universitas Brawijaya, Malang, in 2004. He obtained his Master's degree in Computer Science from Institut Teknologi Sepuluh Nopember, Surabaya, in 2009, and completed his Doctoral degree in Computer Science at Saga University, Japan, in 2013. He is currently a lecturer at Politeknik Negeri Malang, Indonesia. His research interests include machine learning, image understanding, and computer vision. Google Scholar: https://scholar.google.co.id/citations?user=A1592kEAAAAJ&hl=en
Dolly Indra earned his Doctoral degree in Information Technology from Universitas Gunadarma in 2017. He is currently a lecturer at the Faculty of Computer Science, Universitas Muslim Indonesia. His research interests include image processing, computer vision, microcontroller systems, and information systems. Email: dolly.indra@umi.ac.id Google Scholar: https://scholar.google.co.id/citations?user=94_nu_QAAAAJ&hl=en
Nor Salwa Damanhuri received her Bachelor of Science (Hons.) in Electrical and Electronics Engineering from Universiti Tenaga Nasional (UNITEN), Malaysia, in March 2002. She completed her Master of Science in Control Systems Engineering at The University of Sheffield, United Kingdom, in September 2005, and obtained her Doctor of Philosophy (Ph.D.) in Bioengineering from the University of Canterbury, New Zealand, in April 2015. She is currently an Associate Professor at the Centre for Electrical Engineering Studies, Universiti Teknologi MARA (UiTM), Penang Branch, Malaysia. Her research interests include biomedical engineering, digital signal processing, mathematical modeling, control systems, and solar PV system applications. Email: norsalwa071@uitm.edu.my Google Scholar: https://scholar.google.com/citations?user=O3DojDMAAAAJ&hl=en