Abstract:
Different categories of visual stimuli evoke distinct activation patterns in the human brain. These patterns can be captured with EEG for utilization in application such as Brain-Computer Interface (BCI). However, accurate classification of these patterns acquired using single-trial data is challenging due to the low signal-to-noise ratio of EEG. Recently, deep learning-based transformer models with multi-head self-attention have shown great potential for analyzing variety of data. This work introduces an EEG-ConvTranformer network that is based on both multi-headed self-attention and temporal convolution. The novel architecture incorporates self-attention modules to capture inter-region interaction patterns and convolutional filters to learn temporal patterns in a single module. Experimental results demonstrate that EEG-ConvTransformer achieves improved classification accuracy over state-of-the-art techniques across five different visual stimulus classification tasks. Finally, quantitative analysis of inter-head diversity also shows low similarity in representational space, emphasizing the implicit diversity of multi-head attention.