2018 A deep learning architecture for temporal sleep stage classification using multivariate and multimodal time series https://ieeexplore.ieee.org/abstract/document/8307462/

2018 A convolutional neural network for sleep stage scoring from raw single-channel EEG https://www.sciencedirect.com/science/article/pii/S1746809417302847

Papers and Key Ideas
[2019] Multilevel weighted feature fusion using convolutional neural networks for EEG motor imagery classification
  • proposed model architecture is similar to AlexNet, with batch norm, dropout, and exponential linear units
  • extract features at every conv layer after pooling
  • raw EEG to 2D array (time and channels)
  • pretrain using the High Gamma dataset, then train on BCI2a (the datasets have different numbers of channels, so it's unclear how they reconcile them)
[2019] On the Vulnerability of CNN Classifiers in EEG-Based BCIs
  • explore adversarial examples (vulnerability) on EEGNet, DeepCNN and ShallowCNN
  • all three CNN classifiers can be easily fooled by tiny adversarial perturbations generated by the proposed UFGSM
[2019] Utilizing Deep Learning Towards Multi-modal Bio-sensing and Vision-based Affective Computing
  • classify emotions. used 4 different datasets, which all differ in various ways
  • 3 EEG band into 3 channels like RGB
  • used imagenet pre-trained VGG-16 to extract features
  • include LSTM for DEAP dataset, improve accuracy
  • combining multiple datasets outperforms training on individual datasets
[2019] Validating deep neural networks for online decoding of motor imagery movements from EEG signals [Code]
  • develop 3 architectures for BCI classification: LSTM, CNN, RCNN
  • LSTM network with one hidden layer containing 128 cell units, followed by a fully-connected layer
  • pragmatic CNN (pCNN), whose complexity is between the sCNN and dCNN
    • convert EEG to image with short-time Fourier transform
    • 3 convolution blocks: conv + batch norm + max-pool + ReLU
    • dropout (0.2)
    • FC
  • RCNN: conv + (recurrent convolutional + max-pool stack) + FC
  • dCNN and pCNN have the best performance
  • dCNN has higher performance but needs more parameters; pCNN has a lower standard deviation
  • LSTM and sCNN perform about the same
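The pCNN's EEG-to-image step (short-time Fourier transform per channel) might be sketched like this; the sampling rate and window length here are assumptions, not values from the paper:

```python
import numpy as np
from scipy.signal import stft

def eeg_to_tf_image(eeg, fs=250, nperseg=64):
    """Convert a multichannel EEG trial (channels x samples) into a
    time-frequency 'image' (channels x freq_bins x time_bins) via STFT.
    Log power is used so the CNN sees a spectrogram-like input."""
    images = []
    for channel in eeg:
        f, t, Z = stft(channel, fs=fs, nperseg=nperseg)
        images.append(np.log(np.abs(Z) ** 2 + 1e-12))  # log power; epsilon avoids log(0)
    return np.stack(images)

# one 4 s trial: 22 channels at an assumed 250 Hz
trial = np.random.randn(22, 1000)
img = eeg_to_tf_image(trial)
```

The resulting stack of per-channel spectrograms is what the 3 convolution blocks then operate on.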
[2019] HS-CNN: A CNN with Hybrid Convolution Scale for EEG Motor Imagery Classification
  • issues:
    • classification accuracy differs significantly from subject to subject, or from time to time for the same subject
    • CNNs require a large amount of training data, which is challenging to acquire as it requires subjects to concentrate
    • best kernel size varies from subject to subject
    • the relevant frequency band (Hz range) differs for different motor tasks
  • hybrid-scale CNN with data augmentation to address issues
    • 3 different kernel size for each frequency band to extract time features
    • data augmentation: segment and crop; for the same person and same class, randomly swap segments to generate new data
  • dropout 0.8 and 0.01 L2 regularisation
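The swap-based augmentation might look like this (the number of segments is an assumption, not taken from the paper):

```python
import numpy as np

def segment_swap(trial_a, trial_b, n_segments=4, rng=None):
    """HS-CNN-style augmentation sketch: split two same-class trials
    (channels x samples) into equal time segments and randomly take each
    segment from one parent or the other to form a new synthetic trial."""
    rng = np.random.default_rng(rng)
    assert trial_a.shape == trial_b.shape
    pieces_a = np.array_split(trial_a, n_segments, axis=1)
    pieces_b = np.array_split(trial_b, n_segments, axis=1)
    choice = rng.integers(0, 2, size=n_segments)  # 0 -> from a, 1 -> from b
    return np.concatenate(
        [pa if c == 0 else pb for pa, pb, c in zip(pieces_a, pieces_b, choice)],
        axis=1)

# toy parents make it easy to see which segment came from where
a = np.zeros((3, 100))
b = np.ones((3, 100))
new = segment_swap(a, b, n_segments=4, rng=0)
```

Because both parents share the subject and class label, the swapped trial keeps that label.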
[2018] Learning Temporal Information for Brain-Computer Interface Using Convolutional Neural Networks
  • preserve temporal representation of EEG
  • explored 3 types of convolution, time, channels, time & channels
  • explored breaking the 2D (time & channels) convolution into 2x 1D convolutions
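The factorisation is exact whenever the 2D kernel is rank-1 (an outer product of a channel kernel and a time kernel), and it shrinks the parameter count from K_c * K_t to K_c + K_t. A quick numpy check:

```python
import numpy as np
from scipy.signal import convolve2d

rng = np.random.default_rng(0)
x = rng.standard_normal((22, 500))   # channels x time
k_chan = rng.standard_normal(5)      # 1D kernel across channels (5 params)
k_time = rng.standard_normal(11)     # 1D kernel across time (11 params)

# full 2D (channels & time) convolution with the rank-1 kernel: 5*11 = 55 params
out_2d = convolve2d(x, np.outer(k_chan, k_time), mode='valid')

# ...equals a 1D convolution over time followed by one over channels: 5+11 = 16 params
step1 = np.apply_along_axis(lambda r: np.convolve(r, k_time, mode='valid'), 1, x)
out_1d = np.apply_along_axis(lambda c: np.convolve(c, k_chan, mode='valid'), 0, step1)
```

For a general (full-rank) 2D kernel the two forms are not equivalent; the 2x 1D variant trades expressiveness for fewer parameters.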
[2018] Fast and Accurate Multiclass Inference for MI-BCIs Using Large Multiscale Temporal and Spectral Features [Code]
  • feature extraction: CSP and Riemannian are enhanced to multiscale spectral and temporal features to capture dynamic nature of EEG
  • 4 stage architecture:
    • temporal division
    • spectral division
    • CSP or Riemannian
    • classification with SVM (4 classes)
[2017] A novel deep learning approach for classification of EEG motor imagery signals
  • combine CNN and stacked autoencoders for classification
  • instead of 2D filtering, used 1D filters (same height as the input) to handle the vertical (channel) dimension
  • autoencoders to learn features in hidden layer
  • combining CNN and SAE makes the model more robust to the low-signal, high-noise nature of EEG and to variation across trials
  • measure accuracy by subject, 10 fold cross validation
  • use Kappa to measure accuracy, which removes the effect of random (chance) classification
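Cohen's kappa is the standard chance-corrected agreement score, kappa = (p_o - p_e) / (1 - p_e); a minimal implementation:

```python
import numpy as np

def cohens_kappa(y_true, y_pred, n_classes):
    """Observed accuracy corrected by the agreement expected from the
    class marginals alone, so a random classifier scores ~0 regardless
    of class imbalance."""
    cm = np.zeros((n_classes, n_classes))
    for t, p in zip(y_true, y_pred):
        cm[t, p] += 1
    cm /= cm.sum()
    p_o = np.trace(cm)                     # observed agreement
    p_e = cm.sum(axis=1) @ cm.sum(axis=0)  # chance agreement from marginals
    return (p_o - p_e) / (1 - p_e)

kappa = cohens_kappa([0, 0, 1, 1], [0, 0, 1, 0], n_classes=2)  # -> 0.5
```

Here accuracy is 0.75 but chance agreement is 0.5, so kappa is only 0.5.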
[2017] Deep learning with convolutional neural networks for brain mapping and decoding of movement-related information from the human EEG
[2017] Deep Learning With Convolutional Neural Networks for EEG Decoding and Visualization
  • end to end convolution architecture (2 parts: feature extraction + classifier):
    • shallow: 2 layers. not as good as FBCSP
    • deeper: 5 to 31 layers. better than FBCSP
    • hybrid: shallow and deep. similar or slightly worse than deep ConvNet
    • ResNet: worse than deep ConvNet
  • effects of design choices are crucial for high decoding accuracies such as:
    • dropout, 0.5 probability, increased accuracy
    • batch norm, standardise outputs to zero mean and unit variance of training examples, increased accuracy
    • regularisation, new objective function which penalise discrepancies between predictions of neighboring crops
    • exp linear units (ELU), ReLU worsened performance, use ELU instead
  • input and target:
    • input: whole trials, target: trial labels
    • input: cropped 2 s sliding time windows, target: trial labels. increased accuracies for the deep ConvNet.
  • has pre-processing to avoid decoding eye-related signals: frequencies below 4 Hz removed with a 3rd-order Butterworth filter (2.7.1)
  • authors provide good visualisations for each classification via gamma-band activity
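The cropped-training input scheme (fixed-length windows slid over each trial, all inheriting the trial label) can be sketched as follows; crop length and stride here are illustrative:

```python
import numpy as np

def crop_trials(trials, labels, crop_len, stride):
    """Cropped training: slide a fixed-length window over each trial
    (trials x channels x samples) and give every crop the trial's label,
    multiplying the number of training examples."""
    crops, crop_labels = [], []
    for trial, label in zip(trials, labels):
        for start in range(0, trial.shape[-1] - crop_len + 1, stride):
            crops.append(trial[:, start:start + crop_len])
            crop_labels.append(label)
    return np.stack(crops), np.array(crop_labels)

trials = np.random.randn(10, 22, 1000)   # 10 trials, 22 channels, 4 s @ 250 Hz
labels = np.arange(10) % 4
crops, y = crop_trials(trials, labels, crop_len=500, stride=125)  # 2 s crops
```

Each 4 s trial yields 5 overlapping 2 s crops, a 5x increase in training examples.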
[2016] Interpretable deep neural networks for single-trial EEG classification
  • produces neurophysiologically highly plausible explanations of how a DNN reaches a decision
  • classification on motor imagery
  • apply Layer-wise Relevance Propagation (LRP) to produce heatmaps that indicate the relevance of each data point of a spatio-temporal EEG epoch for the classifier’s decision
  • downsampled to 100 Hz
  • bandpass filter range 9-13 Hz
[2016] EEGNet: a compact convolutional neural network for EEG-based brain–computer interfaces
  • evaluate on 4 dataset
  • Depthwise Convolution of size (C,1) to learn a spatial filter. provides a direct way to learn spatial filters for each temporal filter, thus enabling the efficient extraction of frequency-specific spatial filters
  • Separable Convolution, reduce number of parameters and decouple relationship across feature maps
  • dropout: 0.5 for within-subject, 0.25 for cross-subject classification
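The depthwise spatial filter idea can be sketched in numpy to show the parameter saving (the dimensions below are illustrative, roughly BCI2a-sized, not EEGNet's exact hyperparameters):

```python
import numpy as np

# EEGNet-style depthwise spatial filtering: each of F1 temporal feature
# maps gets its own D spatial filters of size (C, 1), i.e. a learned
# linear combination over the C electrodes per temporal filter.
C, T, F1, D = 22, 500, 8, 2        # electrodes, samples, temporal filters, depth
x = np.random.randn(F1, C, T)      # output of the temporal convolution stage
w = np.random.randn(F1, D, C)      # depthwise spatial filters

# depthwise conv of size (C,1): mix channels independently per feature map
spatial = np.einsum('fct,fdc->fdt', x, w).reshape(F1 * D, T)

depthwise_params = F1 * D * C              # 8*2*22  = 352
full_conv_params = (F1 * D) * (F1 * C)     # 16*8*22 = 2816 for a full (C,1) conv
```

Because no cross-feature-map mixing happens here, each spatial filter is tied to one temporal filter, which is exactly what makes the learned filters frequency-specific.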
[2016] EEG based eye state classification using deep belief network and stacked autoencoder
  • implement Deep Belief Network and Stacked AutoEncoder to predict eye state using EEG
  • internal and external interference can affect EEG signals; external such as power equipment and the environment, internal such as eye movements, muscle activity and respiration
  • use Deep Belief Network for classification
  • use Stacked AutoEncoder to reconstruct input to extract features
  • used Discrete Wavelet Transform to extract features from EEG signals
[2015] Learning representations from EEG with deep recurrent-convolutional neural networks [Code]
  • issues:
    • inter- and intra-subject differences (representations should be invariant to them)
    • noise in EEG data collection
    • existing work does not preserve the structure of EEG data across space, time and frequency
  • rather than representing EEG features as vectors, transform into multi-dimensional tensor, like a movie
  • CNN applied to each frame, to extract spatially and spectrally invariant representations
  • LSTM to extract temporal patterns in frame sequence
  • a single image has 3 channels for each of the 3 prominent frequency bands
  • image generated from EEG using Fast Fourier Transform
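The bands-as-RGB idea can be sketched as below; the paper additionally projects electrode positions onto a 2D scalp map before building the image, which is omitted here, and the band edges are the usual theta/alpha/beta conventions rather than values quoted from the paper:

```python
import numpy as np

def band_power_image(eeg, fs=250.0):
    """Map each electrode's theta/alpha/beta power to one of 3 'color'
    channels, mimicking the frequency-bands-as-RGB construction
    (without the topographic projection step)."""
    bands = [(4, 8), (8, 13), (13, 30)]               # theta, alpha, beta (Hz)
    spectrum = np.abs(np.fft.rfft(eeg, axis=1)) ** 2  # FFT power per channel
    freqs = np.fft.rfftfreq(eeg.shape[1], d=1.0 / fs)
    return np.stack([spectrum[:, (freqs >= lo) & (freqs < hi)].sum(axis=1)
                     for lo, hi in bands])            # (3 bands, channels)

img = band_power_image(np.random.randn(64, 1000))
```

Computing one such frame per time window produces the "movie" that the CNN + LSTM then consumes.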


A review of classification algorithms for EEG-based brain–computer interfaces


  • curse of dimensionality: the number of training samples is small compared to the number of features
  • bias-variance tradeoff: stable classifiers have high bias and low variance, thus simple classifiers can outperform complex models

classifiers categories:

  • linear classifiers
    • Linear Discriminant Analysis (LDA), success in great number of BCI applications, simple and provides good results, but poor results on complex nonlinear EEG data
    • Support Vector Machine (SVM), good for BCI application, insensitive to overfitting
  • neural networks
    • multilayer perceptrons, most popular, sensitive to overfitting to noise therefore need regularisation
  • nonlinear bayesian classifiers
    • Bayes quadratic, have some success to motor imagery and mental task classification
    • Hidden Markov Model, not promising classifiers for BCI systems
  • nearest neighbor classifiers
    • k-NN, not popular, very sensitive to curse of dimensionality
    • Mahalanobis distance, simple and robust classifier with good performance, but scarcely used in BCI
  • combinations of classifiers
    • boosting with MLP
    • by majority voting
    • by stacking models

different kinds of classifiers:

  • generative:
    • bayes
  • discriminative:
    • e.g. SVM
    • perform better than generative in presence of noise or outliers
    • deal with high dimensionality feature vectors
  • static:
    • multilayer perceptrons
  • dynamic:
    • hidden markov, able to classify raw EEG, exploit temporal information
  • stable:
    • linear discriminant analysis
  • unstable:
    • multilayer perceptrons

performance metrics:

  • kappa coefficient
  • mutual information
  • sensitivity
  • specificity

A review of classification algorithms for EEG-based brain–computer interfaces: a 10 year update

classifiers categories:

  • adaptive classifiers
  • matrix and tensor classifiers
  • transfer learning and deep learning
  • miscellaneous classifiers

feature extraction

  • spatial filtering
    • Principal Component Analysis (PCA)
    • Common Spatial Patterns (CSP)
    • Filter Bank CSP (FBCSP)
  • combine data from various sensors into matrices of two or more dimensions
    • increase classification accuracies
    • but it increases dimensions, so require feature selection
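CSP, the workhorse spatial-filtering method above, reduces to a generalized eigendecomposition of the two class covariance matrices; a minimal sketch (synthetic covariances, not real EEG):

```python
import numpy as np
from scipy.linalg import eigh

def csp_filters(cov_a, cov_b):
    """Common Spatial Patterns for two classes: solve the generalized
    eigenproblem cov_a w = lambda (cov_a + cov_b) w. Filters at the two
    ends of the eigenvalue spectrum maximise variance for one class
    while minimising it for the other."""
    eigvals, W = eigh(cov_a, cov_a + cov_b)  # eigenvalues in ascending order
    return eigvals, W                        # columns of W are the spatial filters

rng = np.random.default_rng(0)
A = rng.standard_normal((8, 8)); cov_a = A @ A.T + 8 * np.eye(8)
B = rng.standard_normal((8, 8)); cov_b = B @ B.T + 8 * np.eye(8)
vals, W = csp_filters(cov_a, cov_b)
```

In practice only the first and last few filters are kept, and the log-variance of the projected signals is the feature vector; FBCSP just repeats this per frequency band.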

feature selection

  • purpose:
    • remove redundant features not related to the task
    • fewer parameters to optimise, also reduce possible overfitting on small data set
    • with fewer features, easier to observe which feature are actually related to tasks
    • fewer parameters leads to faster computation
    • storage space reduced
  • for P300-BCI
    • stepwise Linear Discriminant Analysis
  • for motor imagery
    • frequency bands selection
  • popular methods
    • maximum relevance minimum redundancy
    • r^2
    • correlation-based
    • information gain
    • 1R ranking

evaluation metrics

  • kappa
  • confusion matrix
  • sensitivity-specificity
  • ROC
  • AUC


challenges

  • low signal-to-noise ratio of EEG signals
  • non-stationarity over time, within or between users
  • limited amount of training data that is generally available to calibrate the classifiers
  • overall low reliability and performance of current BCIs

new EEG classifiers from 2007-2017

  • adaptive
    • works well for motor-imagery and some ERP tasks
    • the weights of the linear discriminant hyperplane are incrementally re-estimated and updated over time as new EEG data become available
    • supervised, unsupervised, and semi-supervised adaptation all exist
    • for motor imagery:
      • LDA
      • Quadratic Discriminant Analysis (QDA)
      • Kernel Discriminant Analysis (KDA)
    • for ERP-based:
      • adaptive Support Vector Machine (SVM)
      • adaptive LDA
      • stochastic gradient-based adaptive linear classifier
      • online Passive-Aggressive (PA) algorithms
    • for P300:
      • bayesian LDA
      • standard LDA
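The incremental re-estimation above can be illustrated with a toy update rule; the exponential-forgetting form and the alpha value are assumptions for illustration, not taken from the review:

```python
import numpy as np

def update_class_mean(mu, x, alpha=0.05):
    """Adaptive-classifier sketch: re-estimate a class mean with
    exponential forgetting as new EEG feature vectors arrive, so the
    LDA hyperplane (which depends on the class means) can track
    non-stationarity in the signal."""
    return (1 - alpha) * mu + alpha * x

mu = np.zeros(4)                    # stale calibration-time class centre
for _ in range(200):
    mu = update_class_mean(mu, np.full(4, 1.0))  # class has drifted to 1.0
```

In unsupervised adaptation, the label of each incoming x is the classifier's own prediction; in supervised adaptation, true labels are available.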
  • matrices and tensors
    • map EEG data directly into some form of covariance matrix
    • Riemannian minimum distance to mean (RMDM) classifier is robust to noise and generalise better
    • tensor dimensions: space (channels), time, frequency, subjects, trials, groups, conditions, wavelets, dictionaries
    • the HODA algorithm estimates the 4 most significant features, improving accuracy
  • deep learning
    • convolutional
    • restricted Boltzmann machines: a Markov random field with a bipartite undirected graph
    • CNN + Deep Belief Network (DBN) is effective
    • shallow networks are effective due to the small number of data samples in BCI
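The Riemannian minimum-distance-to-mean idea can be sketched with scipy; the affine-invariant distance is standard, but the log-Euclidean mean below is a cheap stand-in for the true Fréchet mean, which requires an iterative algorithm:

```python
import numpy as np
from scipy.linalg import eigvalsh, logm, expm

def riemann_dist(A, B):
    """Affine-invariant Riemannian distance between SPD covariance
    matrices: sqrt of the sum of squared log generalized eigenvalues."""
    return np.sqrt(np.sum(np.log(eigvalsh(B, A)) ** 2))

def log_euclidean_mean(covs):
    """Approximate mean covariance: average the matrix logarithms
    (a simplification of the Riemannian/Frechet mean)."""
    return expm(sum(logm(C) for C in covs) / len(covs))

def rmdm_predict(cov, class_means):
    """Minimum distance to mean: assign the class whose mean covariance
    is Riemannian-closest to the trial covariance."""
    return int(np.argmin([riemann_dist(cov, M) for M in class_means]))

A = 2 * np.eye(3)                  # two toy SPD "class mean" matrices
B = np.diag([1.0, 2.0, 3.0])
M = log_euclidean_mean([A, B])
pred = rmdm_predict(A, [A, B])     # trial covariance equal to class-0 mean
```

Operating directly on covariance matrices like this is what makes RMDM robust to noise and comparatively good at generalising.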

EEG-Based Brain-Computer Interfaces Using Motor-Imagery: Techniques and Challenges

feature extraction techniques:

  • time-domain approach
  • frequency-domain approach
  • time-frequency domain approach
  • common spatial pattern (CSP)
    • common sparse spatio-spectral patterns (CSSSP)
    • sub-band common spatial pattern (SBCSP)

feature selection techniques:

  • principal component analysis (PCA)
  • filter bank selection
    • Filter bank CSP (FBCSP)
    • Sparse filter bank CSP (SFBCSP)
  • evolutionary algorithms (EAs)
    • particle swarm optimization (PSO)
    • differential evolution (DE) optimization
    • artificial bee colony (ABC) optimization
    • ant colony optimization (ACO)
    • firefly algorithm

classification methods:

  • non-deep learning
    • SVM
    • LDA
    • k-NN
    • naive Bayes
    • regression trees
  • deep Learning
    • CNN
      • convert 2 seconds of EEG data to an image
      • raw data fed into the CNN; the first layer extracts spatial and temporal information
    • RNN
    • stacked auto encoders (SAEs)
    • deep belief networks
    • LSTM-RNN

Deep learning-based electroencephalography analysis: a systematic review


  • research on EEG applications is relatively small compared to other deep learning applications
  • about 41% use CNNs, about 14% use RNNs, 15% use auto-encoders
  • 54% use public datasets, 42% report results from private recordings, 4% use both
  • 19% have code available; 7% have code and use publicly available data
  • half of the studies used datasets containing fewer than 13 subjects
  • the number of electrodes varies across studies
  • sampling rates also varied: 50% use 250 Hz or less, with the highest at 5000 Hz
  • very few papers (3) explored impact of data augmentation
  • DNNs with 7 layers performed better than shallower (2-4 layers) and deeper (>10 layers) ones
  • 47% did not report the optimiser; 30% use Adam, and its usage is increasing

research on EEG

  • classification: sleep staging, seizure detection, brain-computer interfaces
  • improvements on feature extraction or visualising trained models
  • data augmentation, generating images

challenges of EEG

  • low signal to noise ratio, need to filter noise to extract true brain activity
  • non-stationary signals: models generalize poorly, even within the same individual
  • high inter-subject variability due to physiological differences between individuals; 38% vs. 75% accuracy on unseen vs. seen subjects
  • manually annotating windows of few seconds of signals requires a lot of time

DNN can help in these areas:

  • learn and extract features from raw or minimally preprocessed data
  • reduce domain specific processing and feature extraction
  • features might be more expressive than those engineered by humans
  • might be able to do transfer learning, on different analysis tasks

but EEG has issues with DNN:

  • DNN needs large dataset, current lack of data
  • EEG’s low signal to noise ratio make it very difficult for DNN to learn (compared to CV and NLP)
  • no standard performance metrics or reporting methodology
  • lack of baseline performance


preprocessing

  • downsampling
  • band-pass filtering
  • windowing
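The three preprocessing steps chain together naturally; a minimal scipy sketch, where the cut-off frequencies, target rate and window length are illustrative choices, not values prescribed by the review:

```python
import numpy as np
from scipy.signal import butter, sosfiltfilt, decimate

def preprocess(eeg, fs, target_fs=128, band=(4.0, 38.0), win_s=2.0):
    """Minimal EEG preprocessing: band-pass filter, downsample, then
    cut into fixed-length non-overlapping windows.
    Returns (n_windows, channels, window_samples)."""
    q = int(fs // target_fs)                               # decimation factor
    sos = butter(3, band, btype='bandpass', fs=fs, output='sos')
    filtered = sosfiltfilt(sos, eeg, axis=1)               # zero-phase band-pass
    down = decimate(filtered, q, axis=1)                   # anti-aliased downsample
    win = int(win_s * fs / q)                              # samples per window
    n = down.shape[1] // win
    return down[:, :n * win].reshape(down.shape[0], n, win).swapaxes(0, 1)

windows = preprocess(np.random.randn(8, 2560), fs=256)     # 10 s, 8 ch @ 256 Hz
```

Zero-phase filtering (sosfiltfilt) avoids shifting EEG features in time, which matters when windows carry labels.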


findings

  • shallower architectures are preferred when a limited amount of data is available
  • data augmentation is useful when data is limited
  • overlapping windows are useful, but there is no consensus on the best overlap percentage
  • no clear preference between Fourier-/filter-extracted features and raw EEG, but raw EEG is an upward trend, as CNNs are effective at processing time series
  • many different tasks and datasets were used, often private or limited: lack of reproducibility, low accountability

recommendations for future EEG studies

  • describe architecture of model
  • describe data used, number of subjects, number of samples, data augmentations
  • compare performance against public datasets
  • state and improve upon existing state-of-the-art baselines
  • share internal recordings
  • share experiment code, including hyperparameters and model files for re-runs