Buletin Ilmiah Sarjana Teknik Elektro ISSN: 2685-9572
Tracking Ball Using YOLOv8 Method on Wheeled Soccer Robot with Omnidirectional Camera
Refli Rezka Julianda, Riky Dwi Puriyanto
Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta, Indonesia
ARTICLE INFORMATION | ABSTRACT | |
Article History: Received 15 June 2024 Revised 26 June 2024 Published 07 September 2024 | Object detection is very we often find in everyday life that facilitates every activities in the object recognition process, for example in the military field, intelligent transportation, face detection, robotics, and others. Detection target detection is one of the hotspots of research in the field of computer vision. The location and category of the target can be determined by using target detection. Currently, target detection has been applied in many fields, one of which includes image segmentation.You only look once (YOLO) is an algorithm that can perform object detection in realtime, YOLO itself always gets development and improvement from previous versions. YOLOv8 is a type of YOLO from the latest version. YOLOv8 is a new implementation of Deep Learning that connects the input (original image) with the output. This type of YOLOv8 algorithm uses A deep dive architecture, assisted by CNN and a new backbone which uses convolutional layers for pixels which when described will be shaped like a pyramid. YOLOv8 is a stable object detection processing method with 80% higher than the previous version of YOLO, which makes YOLOv8 a type of YOLO that is better at processing object data faster and more efficiently in Real-Time.The camera with omnidirectional system is able to detect spherical objects and other objects using the YOLOV8 model used. In performance testing with 320×320 and 416×416 frames, because it fits the grid structure of the YOLO architecture. YOLOv8 has a higher mAP value with a value of 95,5% compared to previous versions of YOLO. In the detection test, YOLOv8 has a better average object detection than the previous version of YOLO which is indicated by the number of objects detected more stable. | |
Keywords: Deep Learning; YOLO; YOLOv8; Tracking Ball; Omnidirectional | ||
Corresponding Author: Riky Dwi Puriyanto Department of Electrical Engineering, Universitas Ahmad Dahlan, Yogyakarta, Indonesia. Email: rikydp@ee.uad.ac.id | ||
This work is licensed under a Creative Commons Attribution-Share Alike 4.0 | ||
Document Citation: R. R. Julianda and R. D. Puriyanto, “Tracking Ball Using YOLOv8 Method on Wheeled Soccer Robot with Omnidirectional,” Buletin Ilmiah Sarjana Teknik Elektro, vol. 6, no. 2, pp. 203-213, 2024, DOI: 10.12928/biste.v6i2.10816. |
Object detection is very important nowadays most of people using it everyday for their life, the application of object detection has been along known and is growing rapidly in several developed countries, in order to optimize technology in their countries [1]. Object detection capabilities have an important role in various daily activities, such as in military activites, education, traffic and others, object detection used to overcome problems in maximizing tasks that require optimal data details on activities carried out directly in real time [2]. Wheeled soccer robots are robots that have the ability to dribble, kick and pass the ball [3]. Wheeled soccer robots have various supporting components to perform and maximize their abilities, such as DC motors, arduino, solenoids, and cameras. The use of object detection is one of the things that is very important and necessary in wheeled soccer robots [4].
Wheeled soccer robots need the ability to see objects that will be used to dribble, kick, and pass the ball, therefore, robots need a sense of sight instead of a pair of eyes, to carry out their duties [5]. To overcome this, wheeled soccer robots use cameras as a sense of vision [6]. Omnidirectional cameras is a type of camera that uses a convex reflecting mirror to provide a reflection towards the camera so that it can see its surroundings 360° [7] are used on wheeled soccer robots as a sense of vision [8]. The omnidirectional camera is a tool that helps wheeled soccer robots detect objects [9].
Object detection has an important function in wheeled soccer robots [10] where object detection is the main benchmark in being able to see objects such as balls [11]. Object detection problems [12][13]. on wheeled soccer robots have many problems in object detection, such as errors in detecting objects, which are caused by the type of color, shape, and texture of objects [14], the percentage of object recognition [15], the number of objects detected, and various types of noise that cause errors and decrease fps in the object detection process [16].
The problem of object detection in wheeled soccer robots occurs because it is difficult for robots to recognize objects such as balls, goals, obstacles, and other robots which are in the light contrast description, and the distance is quite far to be captured by ordinary cameras [17]. To overcome this, wheeled soccer robots use omnidirectional cameras to detect objects widely with a 360° viewing distance, which by using an omnidirectional camera will expand the view of the wheeled soccer robot to detect specified objects [18], which can overcome the problem of visibility in object detection [19].
In previous research, there are many things that can be done to detect objects with shape, color, motion, and texture. With the advancement of time and civilization, object detection began to develop towards artificial nerves such as using machine learning. Machine learning has various developmental algorithms in it where many stages are brought by previous research into their research in object detection, namely deep learning.
Deep learning is an artificial nerve that is used in detecting objects and classifying objects in an image, methods that use developments from deep learning such as Convolutional Neural Networks (CNN), Region-Based Convolutional Neural Network (RCNN), fast Region-Based Convolutional Neural Network (fast R-CNN), and also You Only Look Once (YOLO), where these methods can detect objects in real-time [20].
In this research, we will test the wheeled soccer robot whether it is able to detect balls, goals or other objects on the field using the You Only Look Once (YOLO) method. The robot used in this research is the Ahmad Dahlan University wheeled soccer robot using an omnidirectional camera in taking images or images as a dataset, which is a technique that displays images with a range of views based on various directions, be it the front direction, the back direction, the right side, also the left side.
Choosing YOLO (You Only Look Once) as an object detection architecture compared to other architectures such as the conventional CNN (Convolutional Neural Network) or other detection methods is due to some unique advantages possessed by YOLO, such as speed and efficiency, high detection accuracy compared to other algorithms [21].
The design of the ball tracking system using the YOLOv8 method in this study is described in the system block diagram, image training process flow chart, and system flow chart. To get the best results and as expected, the design of this system refers to theories and datasheets that have been reviewed from various sources.
In general, the system consists of 3 component parts: input, process, and output. The input component of the system is an omnidirectional camera. In the process component, the microcontroller used as a data processor and sender is Arduino Due. Furthermore, the output component in this system consists of a laptop. The block diagram of the measuring instrument system in this study is shown in Figure 1.
Figure 1. Block diagram design
In this study, initial testing will be carried out by detecting two different types of objects. Testing in this research this time the object detected is an orange ball and a white goal which will then be carried out the detection process on a green field. To be able to detect an object such as a ball and goal, this research uses a laptop connected to an omnidirectional camera that uses a Universal Serial Bus (USB) 2.0 port, then the results obtained will be processed using the YOLO library. In the process of implementing images in the software, this research must first conduct image data training using the library. The flowchart of the image training process can be seen in Figure 2 and the system flowchart can be seen in Figure 3.
Figure 2. Flowchart of image training process
Figure 3. Flowchart system
YOLOv8 algorithm uses a deep dive architecture, assisted by a CNN and a new backbone which uses convolutional layers for the pixels which when described will form a pyramid, can be seen in Figure 4.
Figure 4. Architecture YOLOv8
Based on Figure 4 each layer has its own function, which is as follows:
In this research, image capture, image labeling, training dataset results and model implementation were obtained.
The images taken are images of orange balls, white goals, and black obstacles or robots with a green field background color that will be used as material for training. Taking pictures is done on the 3rd floor robotics lab room in the Ahmad Dahlan University lab building using an ELP USB-FHD08S camera that has an omnidirectional lens installed on it then the image is stored on a computer. The following are the results of images taken using an omnidirectional camera can be seen in Figure 5.
Figure 5. Image capture of an omnidirectional camera
The data labeling process is a process where when the data that has been collected becomes one for labeling each object that will be learned by the YOLOv8 algorithm where each object at the time of labeling must adjust all possibilities as learning for blind spots or blur of an object. The image labeling process in this research uses the labeling method which is called through the anaconda prompt. The data labeling process can be seen in Figure 6. Based on Figure 6 is the process of labeling data using labelImg called through anaconda prompt, the drawback of using labelImg at anaconda prompt is that the computer/laptop used to do labeling must not run out of power or run out of battery.
Figure 6. Image labelling process
The model training process in YOLOv8 is the same as in the previous YOLO but the procession of index images, data classes, and model calls is more complex than the previous YOLO version. In this study, the training data model for YOLOv8 uses google collab where the number of epochs is 100 with image labeling totaling 2100 data labels and 2100 images, with a total background of 6042 false data and takes about 3 hours, can seen in Table 1.
Table 1. Validation dataset sharing
Object | Number of datasets |
Ball | 300 |
Goal | 300 |
Robot/Obstacle | 300 |
Ball and Goal | 400 |
Ball and Robot/Obstacle | 400 |
Goal and Robot/Obstacle | 400 |
Total | 2100 |
Based on Table 1, the dataset above will be tested using the YOLOv8 model which produces experimental data, which will later be used for training data test, can be seen in Table 2.
Table 2. Test results on the model
Confusion Matrix | YOLOv8 | ||
Ball | Goal | Robot/Obstacle | |
TP | 830 | 708 | 530 |
FP | 29 | 133 | 78 |
TN | 125 | 125 | 125 |
FN | 80 | 82 | 93 |
Based on Table 2, True Positive (TP) value gets the final result YOLOv8 with a value of 830 for the ball, 708 for the goal, 530 for the robot/obstacle. On the False Positive (FP) value gets the final result YOLOv8 with a value of 29 for the ball, 133 for the goal, 78 for the robot/obstacle. While the True Negative (TN) value gets the final result YOLOv8 with a value of 125 for the ball, 125 for the goal, 125 for the robot / obstacle. Furthermore, the False Negative (FN) value gets the final result of YOLOv8 with a value of 80 for the ball, 82 for the goal, 93 for the robot/obstacle. And the final results of the YOLOv8 dataset training process can be seen in Figure 7.
Figure 7. Final result of training dataset
The results of the dataset training process can be seen in the runs folder, in the runs folder there are many arrays of data results that have been obtained after going through the dataset training process on googlecollab, such as the weights folder which contains last.pt and best.pt which are used as libraries to detect objects that have been trained, in other folders there are assessment curves from the final results of training datasets such as confusion_matrix, F1_curve, and Result_curve. The implementation process of modeling results with ball and obstacle objects can be seen in Figure 8, Figure 9, and Figure 10. In Figure 8 the omnidirectional camera successfully detects the ball object with a value of 0.92 to 1, without the interference of other objects in the area with the state of the orange ball used as object detection on a green carpet to avoid color contrast in the background when object detection takes place in real-time.
In Figure 9 the omnidirectional camera successfully detects obstacle objects with a value of 0.84 to 1, without the interference of other objects in the area with black obstacles used as object detection on a green carpet to avoid color contrast in the background when object detection takes place in real-time.
Figure 8. Implementation of modeling results with ball
Figure 9. Implementation of modeling results with obstacle | Figure 10. Implementation of modeling results with two different objects |
In Figure 10 the omnidirectional camera successfully detects 2 different objects, namely the ball and the obstacle at the same time in real-time with a ball value of 0.85 to 1 and an obstacle value of 0.88 to 1, without the interference of other objects in the area with the condition of 2 obstacles in one omnidirectional camera frame, namely the orange ball object and the black obstacle used as object detection, which are both on a green carpet to avoid color contrast in the background when object detection takes place in real-time. In this test there is a graph of the results of the dataset training process which can be seen in Figure 11, Figure 12, Figure 13, and Figure 14.
Based on Figure 11 confusion matrix is a table used to evaluate the performance of classification models in machine learning. It compares the predicted results generated by the model with the actual results from the test data, thus helping in understanding how well the model works. The confusion matrix provides a detailed overview of the errors made by the model as well as the areas where the model needs to be improved. precision, recall and accuracy values on the confusion matrix in this study using YOLOv8 in object detection the average value is close to 1 to 1 for overall objects such as balls, goals, robots / obstacles.
Based on Figure 12 F1 curve is a visualization of the F1 score produced by a classification model at various thresholds or parameters. F1 score is a matrix that combines precision and recall, thus providing a more thorough picture of the model's performance, especially when there is class imbalance.
Figure 11. Confusion_matrix on the final result of the training dataset
Figure 12. F1_curve on the final training dataset results
Figure 13. Result_curve(1) on the final training dataset results
Figure 14. Result_curve(2) on the final training dataset results
Based on Figure 13 and Figure 14 the result curve, or often referred to as the precision-recall curve, is a visualization tool used to evaluate the performance of classification models, especially in situations with class imbalance. The precision-recall curve illustrates the relationship between precision and recall for various decision thresholds.
In result_curve there are:
Tests were conducted using the pre-trained YOLOv8 model to detect 3 classes of objects, namely, balls, goals and robots/obstacles in real-time using an omnidirectional camera. The output of the detection results can be seen in Table 3.
Table 3. Model peformance
Load Model | YOLOv8 |
Accuracy | 85.7% |
Error Classification | 20.3% |
Precision | 88.4% |
Recall | 89.7% |
F1-Score | 88.3% |
mAP@0.5 | 95.5% |
Based on Table 3 the final results of object detection testing using YOLOv8 get model performance values such as accuracy against detected objects such as balls, goals, and robots/obstacles of 85.7% of 100% with an error classification value against detected objects of 20.3% of 100%, a precision value of 88.4%, a recall value against objects of 89.7%, an f1 score of 88.3% with a final result of mAP@0.5 is 95.5%.
Based on the results of research that has been done on tracking the ball using the YOLOv8 method on a wheeled soccer robot with an omnidirectional camera, conclusions can be drawn. The omnidirectional camera is able to detect objects that have been trained by capturing these objects with a predetermined classification. By using YOLOv8 the results_curve obtained is more stable in measuring the mAP value. In performance testing with 320×320 and 416×416 frames YOLOv8 has a higher mAP value with a value of 95.5% and a more stable average fps rate during object detection compared to previous versions of YOLO, this shows a drastic improvement from the use of previous methods.
ACKNOWLEDGEMENT
Thank You Lots I say to the whole party that has supported study this, so study this can be utilized for robot development and so on outlook knowledge of the academic community, especially Ahmad Dahlan University, Don't forget I say to readers, criticism and suggestions for study this will I thanks.
REFERENCES
AUTHOR BIOGRAPHY
Refli Rezka Juliandan Student of the Electrical Engineering Study Program, Ahmad Dahlan University. Email : refli2000022045@webmail.uad.ac.id | |
Riky Dwi Puriyanto, Lecturer of Electrical Engineering Study Program, Ahmad Dahlan University. Email: rikydp@ee.uad.ac.id |
Tracking Ball Using YOLOv8 Method on Wheeled Soccer Robot with Omnidirectional Camera
(Refli Rezka Julianda)