Machine Learning for Approximating Endpoints in Clinical Neuroscience from Heterogenous Input Data
Speaker: Dr. Denis A. Engemann, French National Institute of Computer Science (Inria-Saclay), Parietal Team
Machine learning has pushed the study of brain health beyond comparing group averages. The statistical view of machine learning as function approximation in high dimensions is attractive for clinical neuroscience, as it lends itself to robust modeling of clinical endpoints from rich and heterogenous input data. Fully realizing this research program currently hinges on meeting important challenges posed by the reality of data in clinical neuroscience. In the small-sample regime common to clinical neuroscience studies, pooling various imaging and electrophysiology modalities places a premium on strong domain knowledge, good priors and flexible handling of missing inputs [2-3]. When the actual endpoint of interest, e.g. a neuropsychiatric diagnosis, is expensive to acquire, the total number of labeled samples may not suffice to enable high-fidelity machine learning. In such situations, developing proxy measures derived from widely available labeled targets, such as age, provides a viable workaround [2-4]. Even when data are available in abundance, the study of mental health often poses the additional challenge that, in contrast to somatic medicine, no confirmatory measures exist that are comparable to, for example, the plasma assessment of insulin levels in diabetes. Here, building various proxy measures becomes a way of life and, depending on the target of interest, behavioral and sociodemographic input data can become the go-to modality for studying mental health. This raises the question of whether machine learning itself can be used to automatically extract proxy measures from brain signals without an explicit choice of one specific proxy target. Recent advances in deep learning and non-linear independent component analysis have given rise to the self-supervised learning (SSL) paradigm. Applied to EEG data, SSL readily captures the structure of clinical recordings, reflecting various factors such as the clinical operator, electrode selection, date, age, sleep stages and neuropsychiatric pathology.
While causal mechanisms are still being investigated through experimental approaches, the statistical learning paradigm offers a viable tool for studying mental health using multiple imaging, electrophysiological and behavioral techniques, capitalizing on all available data.
[1] Engemann, Raimondo, King, Rohaut, Louppe, Faugeras, Annen, Cassol, Gosseries, Fernandez-Slezak, Laureys, Naccache, Dehaene and Sitt (2018). Robust EEG-based cross-site and cross-protocol classification of states of consciousness. Brain 141 (11), 3179–3192. https://doi.org/10.1093/brain/awy251
[2] Engemann, DA., Kozynets, O., Sabbagh, D., Lemaitre, G., Varoquaux, G., Liem, F., & Gramfort, A. (2020). Combining magnetoencephalography with MRI enhances learning of surrogate-biomarkers. eLife 9:e54055. https://doi.org/10.7554/eLife.54055
[3] Sabbagh, D., Ablin, P., Varoquaux, G., Gramfort, A., & Engemann, DA. (2020). Predictive regression modeling with MEG/EEG: from source power to signals and cognitive states. NeuroImage. https://doi.org/10.1016/j.neuroimage.2020.116893
[4] Dadi, K., Varoquaux, G., Houenou, J., Bzdok, D., Thirion, B., & Engemann, DA. (2020). Beyond brain age: Empirically-derived proxy measures of mental health. bioRxiv preprint. https://doi.org/10.1101/2020.08.25.266536
[5] Banville, H., Chehab, O., Hyvärinen, A., Engemann, DA., & Gramfort, A. (2020). Uncovering the structure of clinical EEG signals with self-supervised learning. arXiv preprint. https://arxiv.org/abs/2007.16104