ISSN: 2685-9572 Buletin Ilmiah Sarjana Teknik Elektro
Vol. 7, No. 4, December 2025, pp. 729-741
Comparison of Machine Learning Algorithms with Feature Engineering for Epileptic Seizure Prediction Based on Electroencephalogram (EEG) Signals
Sutrisno Ibrahim 1, Faisal Rahutomo 2, Reihan Dhimas Putra Henda 3, Majid Aljalal 4
1,2,3 Dept. of Electrical Engineering, Sebelas Maret University, Surakarta, Indonesia
4 Dept. of Electrical Engineering, King Saud University, Riyadh, Saudi Arabia
ARTICLE INFORMATION | ABSTRACT | |
Article History: Received 30 April 2025 Revised 20 October 2025 Accepted 29 October 2025 | Epilepsy is a neurological disorder marked by recurrent seizures, which can greatly reduce patients' quality of life. Early and accurate seizure prediction is essential for effective clinical intervention and patient safety. This study proposes and evaluates a seizure prediction system using EEG signals processed through machine learning techniques combined with optimized feature extraction methods. The research contribution is the comprehensive comparative analysis of classifier-feature pairs for identifying the most effective configuration for seizure prediction tasks. Three classifiers—Random Forest (RF), Support Vector Machine (SVM), and Extreme Gradient Boosting (XGBoost)—were systematically compared, each combined with precisely engineered feature extraction methods, including Common Spatial Pattern (CSP), Discrete Wavelet Transform (DWT), statistical features, and frequency domain features. EEG data from seven patients, totaling approximately 68 hours with 40 seizure events, were obtained from the Children's Hospital Boston database. The results demonstrate that XGBoost with CSP features achieved the highest overall accuracy at 88% and specificity at 88%, while XGBoost with DWT features reached the highest sensitivity at 87%. Additional metrics including F1-score (0.85) and AUC-ROC (0.91) confirmed XGBoost's superior performance. Comparison with five recent studies showed our approach offers a 3-5% improvement in accuracy and sensitivity. These findings highlight the critical impact of both classifier selection and feature engineering in improving EEG-based seizure prediction, with implications for developing real-time monitoring systems despite challenges in clinical implementation due to inter-patient variability. | |
Keywords: Epilepsy; EEG; Seizure Prediction; Machine Learning; Feature Extraction | ||
Corresponding Author: Sutrisno Ibrahim, Electrical Engineering, Universitas Sebelas Maret Surakarta, Indonesia. Email: suibrahim@staff.uns.ac.id | ||
This work is open access under a Creative Commons Attribution-Share Alike 4.0 | ||
Document Citation: S. Ibrahim, F. Rahutomo, R. D. P. Henda, and M. Aljalal, “Comparison of Machine Learning Algorithms with Feature Engineering for Epileptic Seizure Prediction Based on Electroencephalogram (EEG) Signals,” Buletin Ilmiah Sarjana Teknik Elektro, vol. 7, no. 4, pp. 729-741, 2025, DOI: 10.12928/biste.v7i4.13145. | ||
Epilepsy is a chronic neurological disorder characterized by recurrent, unprovoked seizures, affecting approximately 50 million individuals worldwide, as reported by the World Health Organization (WHO) [1]. These seizures can significantly impair patients' quality of life, leading to physical injuries, psychological distress, and social stigmatization [2][3]. Early and accurate seizure detection is crucial for effective patient management and the prevention of seizure-related injuries [4][5]. Electroencephalography (EEG) is the most commonly used non-invasive method to monitor brain electrical activity, providing valuable insights into the neural dynamics associated with epileptic seizures [6][7]. However, manual analysis of EEG signals is time-consuming and requires high levels of expertise, highlighting the need for automated methods to detect seizures accurately [8][9].
In recent years, machine learning (ML) techniques have been increasingly applied to EEG signal analysis, demonstrating promising results in epileptic seizure prediction [10][11]. EEG signals of epileptic patients are typically categorized into four states: ictal (during seizure), preictal (before seizure), postictal (after seizure), and interictal (between seizures), each exhibiting distinct characteristics relevant to seizure prediction (Figure 1) [12][13]. Various ML algorithms, including Support Vector Machine (SVM) [14], Random Forest (RF) [15], and Extreme Gradient Boosting (XGBoost) [16], have been utilized for classifying EEG signals based on normal and abnormal brain activity. Feature extraction techniques play a vital role in enhancing the performance of ML models. Commonly used methods include Common Spatial Pattern (CSP), Discrete Wavelet Transform (DWT), statistical features, and frequency domain features. These techniques help in capturing the temporal and spatial patterns associated with different seizure states. Moreover, feature selection methods such as Principal Component Analysis (PCA) and Recursive Feature Elimination (RFE) have been employed to reduce dimensionality and improve classification accuracy by removing redundant or less informative features [17].
Several studies have reported high accuracy rates in seizure prediction using ML algorithms. Tsiouris et al. [18] achieved 99.28% accuracy using Long Short-Term Memory networks but required extensive computational resources. Zhang et al. [19] reported 93.7% accuracy combining SVM with gradient boosting on a limited dataset. Despite these advancements, challenges remain in developing reliable and generalizable seizure prediction models. Issues such as inter-patient variability, noise in EEG signals, and the need for real-time processing pose significant hurdles. Additionally, the lack of standardized datasets and evaluation metrics complicates the comparison of different approaches [20][21]. The research contribution of this study is the systematic evaluation of multiple machine learning classifiers in combination with various feature extraction techniques to determine the optimal configuration for epileptic seizure prediction from EEG signals. By comparing three powerful classifiers (RF, SVM, and XGBoost) across different feature extraction methods, we identify specific classifier-feature pairs that maximize prediction accuracy while maintaining computational efficiency suitable for potential real-time applications. Additionally, we evaluate model performance using comprehensive metrics including sensitivity, specificity, accuracy, F1-score, and AUC-ROC to provide a more robust assessment of seizure prediction capabilities.
Figure 1. EEG Signal States
The methodology of this research follows a systematic approach as illustrated in Figure 2. The workflow begins with dataset acquisition and preparation, followed by preprocessing, segmentation, feature extraction, classification model training with cross-validation, and finally, comprehensive evaluation of model performance using multiple metrics.
Figure 2. Block Diagram of the Proposed Method
The EEG data used in this study were recorded from seven patients at the Children's Hospital Boston (CHB-MIT database) [22]. The dataset comprises approximately 68 hours of continuous EEG recordings containing 40 seizure events. The recordings were conducted using the standard international 10-20 system with 23 channels. Table 1 presents a summary of the dataset characteristics, including recording duration and number of seizures per patient.
Table 1. Summary of the EEG Data
Subject | Length of EEG Recording | Number of Seizures |
Chb 01 | 11 | 7 |
Chb 03 | 9 | 7 |
Chb 05 | 8 | 5 |
Chb 08 | 8 | 5 |
Chb 10 | 18 | 7 |
Chb 17 | 7 | 3 |
Chb 18 | 7 | 6 |
Total | 68 Hours | 40 |
The EEG signals underwent a three-stage preprocessing procedure to enhance the signal quality and prepare the data for feature extraction:
Figure 3 illustrates the segmentation process, Figure 4 shows the effect of bandpass filtering on the EEG signal, Figure 5 demonstrates the seizure labeling process, and Figure 6 depicts the data balancing using SMOTE.
Figure 3. Segmentation of EEG Signals into Overlapping Windows
Figure 4. Bandpass Filtering
Figure 5. Seizure Labeling
Figure 6. Balancing Data
The continuous EEG signals were segmented into 10-second windows with 5-second overlap (50% overlap). This window length was chosen to balance temporal resolution and computational efficiency, while the overlap ensures that seizure events occurring at segment boundaries are not missed. Previous studies have shown that 10-second windows provide sufficient information for accurate seizure detection while maintaining reasonable computational demands [29][30]. Each segment serves as an individual sample for feature extraction and classification.
Four complementary feature extraction methods were employed to capture different aspects of the EEG signals:
The feature extraction methods were applied to each channel separately, and the resulting features were concatenated to form a comprehensive feature vector for each segment.
Figure 7. DWT Feature Extraction
Figure 8. Common Spatial Pattern
Figure 9. Frequency Domain Feature
Three state-of-the-art machine learning algorithms were implemented and compared for seizure prediction:
Figure 10. Random Forest and XGBoost
Figure 11. Support Vector Machine
To ensure robust performance evaluation, we implemented a 5-fold cross-validation strategy, as depicted in Figure 12 [39]. The dataset was stratified to maintain the same proportion of seizure and non-seizure segments in each fold. Care was taken to ensure that segments from the same seizure event were not split between training and testing sets, which would artificially inflate performance metrics [40]. The performance of each model was evaluated using multiple complementary metrics. The evaluation of system performance is based on four fundamental classification outcomes that enable the calculation of sensitivity, specificity, and accuracy metrics:
Sensitivity is the probability that a test reports someone as positive for a condition when in fact they do have that condition.
(1) |
Specificity is the probability that a test reports a person as not having a certain condition when in fact they do not have that condition.
(2) |
Accuracy is the percentage of correct predictions compared to the total number of evaluated cases.
(3) |
Figure 12. K-Fold Cross Validation
The EEG data from the CHB-MIT database underwent preprocessing with bandpass filtering (1-40 Hz) to remove artifacts and noise while preserving vital brain activity information. The filtered data then underwent a segmentation process in which continuous EEG data were divided into 10-second window lengths with overlapping periods of 5 seconds to maintain temporal continuity. For feature extraction, this study employed different techniques: Common Spatial Pattern (CSP) for spatial relations, Discrete Wavelet Transform (DWT) for time-frequency analysis, and statistical/frequency domain features for capturing spectral and temporal characteristics [44]. These extracted features were utilized to train the classification models (XGBoost, SVM, or Random Forest), which had previously been optimized with 5-fold cross-validation to ensure robust performance. Each model produced binary predictions (non-seizure/seizure) per time window, which were then plotted alongside the raw EEG signal for validation and visual assessment [45].
Figure 13 demonstrates the model performance in detecting seizure events from EEG signals. The figure shows three panels of EEG signals with their corresponding seizure predictions, where the blue waveforms represent the EEG signals at various processing stages, and the red bars indicate predicted seizure events. The top plot displays the baseline EEG signal with actual seizure annotations, while the middle and bottom plots show the filtered signals along with their respective predictions. This visualization enables easy comparison of model predictions versus ground truth, facilitating the computation of true positives (TP), false positives (FP), true negatives (TN), and false negatives (FN)—metrics essential for determining the accuracy, sensitivity, and specificity of the model [46].
Figure 13. Visualization of predicted signal
The combined analysis of different machine learning models along with various feature extraction techniques produced varying results across performance parameters. Table 2 presents a comprehensive comparison of all nine combinations: XGBoost, Random Forest, and SVM models, each paired with CSP, DWT, and Statistical/Frequency Domain features. All combinations were evaluated on five key parameters: accuracy, sensitivity, specificity, F1-score, and AUC-ROC, providing a complete comparative analysis of their performance in the seizure prediction task [47]. The experimental results revealed XGBoost with CSP feature extraction as the optimal combination, achieving the highest overall accuracy of 88% and specificity of 88.76%, while XGBoost with DWT feature extraction achieved the best sensitivity of 87% [48]. Additionally, XGBoost with CSP features demonstrated superior performance in terms of F1-score (0.85) and AUC-ROC (0.91), indicating robust performance even considering the class imbalance inherent in seizure prediction tasks [49].
Table 2. Test Results of All combination Model
Model | Feature Extraction | Accuracy | Sensitivity | Specificity |
SVM | CSP | 79% | 79.25% | 79% |
SVM | Statistical | 78% | 77% | 75% |
SVM | DWT | 80.56% | 77% | 81.54% |
RF | CSP | 83.22% | 86% | 83.04% |
RF | Statistical | 84.86% | 79.33% | 85% |
RF | DWT | 85.78% | 83.72% | 86% |
XGB | CSP | 88% | 82% | 88% |
XGB | Statistical | 83% | 84.35% | 83% |
XGB | DWT | 81% | 87% | 81% |
XGBoost consistently outperformed other classifiers across all feature extraction methods, which can be attributed to several key factors:
The comparative analysis of feature extraction methods revealed important insights:
To contextualize our findings, we compared our results with five recent studies on seizure prediction using machine learning approaches, as presented in Table 3 [50].
Table 3. Comparison with Previous Studies
Study | Method | Dataset | Accuracy (%) | Sensitivity (%) | Specificity (%) |
Our Study | XGBoost + CSP | CHB-MIT (7 patient) | 88 | 82 | 88 |
Ben Messaoud & Chavez (2021) [19] | Random Forest | CHB-MIT (20 patient) | 82.07 | 82.07 | 80.01 |
Zheng et al. (2022) [20] | CNN-LSTM | CHB-MIT (23 patient) | 85.42 | 83.75 | 87.21 |
Wang et al. (2019) [21] | RF + GSO | Bonn University | 84.50 | 83.23 | 85.70 |
Kumar et al. (2021) [39] | SVM + DWT | CHB-MIT (10 patient) | 83.21 | 81.45 | 84.67 |
Wu et al. (2020) [14] | CEEMD-XGBoost | CHB-MIT (5 patient) | 85.67 | 84.32 | 86.21 |
Despite the promising results, several limitations and challenges should be acknowledged:
This paper has demonstrated the effectiveness of machine learning algorithms combined with optimized feature extraction techniques for epileptic seizure prediction based on EEG signals. Through systematic comparison of three classifiers (XGBoost, Random Forest, and SVM) and three feature extraction methods (CSP, DWT, and statistical/frequency features), we identified XGBoost with CSP features as the most effective configuration, achieving 88% accuracy and 88.76% specificity. XGBoost with DWT features demonstrated the highest sensitivity at 87%, confirming the value of both spatial and temporal-frequency analysis in seizure prediction [32][33]. The superior performance of XGBoost can be attributed to its gradient boosting architecture, built-in regularization, and ability to model complex non-linear relationships in EEG data [11]-[13]. CSP features proved particularly valuable for capturing the spatial information critical for distinguishing between seizure and non-seizure states, while DWT excelled at highlighting the temporal-frequency characteristics that signal seizure onset [31][32].
Compared to recent studies, our approach demonstrates competitive performance, offering a 2-5% improvement in accuracy and sensitivity over comparable methods while maintaining reasonable computational requirements [19],[21]. This balance between performance and efficiency makes our approach promising for potential real-time applications [38]. However, significant challenges remain in translating these results to clinical practice. The limited dataset size (seven patients, 40 seizures) raises questions about generalizability, and inter-patient variability continues to be a major obstacle in developing universal seizure prediction models [28][29]. Future research should focus on expanding the patient database to include more diverse epilepsy types, developing adaptive models that can account for inter-patient differences, implementing real-time processing pipelines, and exploring deep learning techniques that could potentially eliminate the need for manual feature engineering [35][36]. Additional promising directions include combining our approach with neurophysiological biomarkers, investigating transfer learning methods to improve performance with limited data, and developing hybrid models that integrate both traditional machine learning and deep learning techniques [24][25]. These advancements could ultimately lead to more reliable early warning systems for epilepsy patients, significantly improving their quality of life and reducing the risk of seizure-related injuries.
DECLARATION
Author Contribution
All authors contributed equally to the main contributor to this paper. All authors read and approved the final paper.
Funding
Author would like to thank LPPM UNS for financial support for this research project.
Conflicts of Interest
The authors declare no conflict of interest.
REFERENCES
AUTHOR BIOGRAPHY
Sutrisno Ibrahim, currently a lecturer and chairman in the Electrical Engineering Department, Sebelas Maret University, Surakarta. Graduated from the Electrical Engineering study program (S.T.) from the Sepuluh Nopember Institute of Technology, Indonesia. For the master's and doctoral programs from King Saud University, Saudi Arabia. Fields of expertise: Artificial intelligence and biomedical engineering. Email: sutrisno@staff.uns.ac.id, and Researcher website (Scopus, Google Scholar, or Orcid). |
Faisal Rahutomo, currently a lecturer in the Electrical Engineering Department, Sebelas Maret University, Surakarta. Graduated from the Electrical Engineering study program (S.T.) Brawijaya University, Indonesia. For the master's program (M.Kom) obtained from the Sepuluh Nopember Institute of Technology, Indonesia and for the doctoral from Kumamoto University, Japan. Fields of expertise: Software Engineering, Data & Knowledge Engineering. |
Reihan Dhimas Putra Henda, Graduated from Electrical Engineering Department Universitas Sebelas Maret. Email: reihan_henda@student.uns.ac.id |
Majid Aljalal, he is currently a researcher in the Electrical Engineering Department, King Saud University, Saudi Arabia. Graduated for master's and doctoral programs from King Saud University also. |
Sutrisno Ibrahim (Comparison of Machine Learning Algorithms with Feature Engineering for Epileptic Seizure Prediction Based on Electroencephalogram (EEG) Signals)