TY - JOUR
T1 - Multiple Holdouts With Stability
T2 - Improving the Generalizability of Machine Learning Analyses of Brain–Behavior Relationships
AU - NeuroScience in Psychiatry Network (NSPN) Consortium
AU - Mihalik, Agoston
AU - Ferreira, Fabio S.
AU - Moutoussis, Michael
AU - Ziegler, Gabriel
AU - Adams, Rick A.
AU - Rosa, Maria J.
AU - Prabhu, Gita
AU - de Oliveira, Leticia
AU - Pereira, Mirtes
AU - Bullmore, Edward T.
AU - Fonagy, Peter
AU - Goodyer, Ian M.
AU - Jones, Peter B.
AU - Hauser, Tobias
AU - Neufeld, Sharon
AU - Romero-Garcia, Rafael
AU - St Clair, Michelle
AU - Vértes, Petra E.
AU - Whitaker, Kirstie
AU - Inkster, Becky
AU - Ooi, Cinly
AU - Toseeb, Umar
AU - Widmer, Barry
AU - Bhatti, Junaid
AU - Villis, Laura
AU - Alrumaithi, Ayesha
AU - Birt, Sarah
AU - Bowler, Aislinn
AU - Cleridou, Kalia
AU - Dadabhoy, Hina
AU - Davies, Emma
AU - Firkins, Ashlyn
AU - Granville, Sian
AU - Harding, Elizabeth
AU - Hopkins, Alexandra
AU - Isaacs, Daniel
AU - King, Janchai
AU - Kokorikou, Danae
AU - Maurice, Christina
AU - McIntosh, Cleo
AU - Memarzia, Jessica
AU - Mills, Harriet
AU - O'Donnell, Ciara
AU - Pantaleone, Sara
AU - Fearon, Pasco
AU - Suckling, John
AU - van Harmelen, Anne Laura
AU - Kievit, Rogier
AU - Shawe-Taylor, John
AU - Dolan, Raymond
AU - Mourao-Miranda, Janaina
PY - 2020/2/15
Y1 - 2020/2/15
N2 - Background: In 2009, the National Institute of Mental Health launched the Research Domain Criteria, an attempt to move beyond diagnostic categories and ground psychiatry within neurobiological constructs that combine different levels of measures (e.g., brain imaging and behavior). Statistical methods that can integrate such multimodal data, however, are often vulnerable to overfitting, poor generalization, and difficulties in interpreting the results. Methods: We propose an innovative machine learning framework combining multiple holdouts and a stability criterion with regularized multivariate techniques, such as sparse partial least squares and kernel canonical correlation analysis, for identifying hidden dimensions of cross-modality relationships. To illustrate the approach, we investigated structural brain–behavior associations in an extensively phenotyped developmental sample of 345 participants (312 healthy and 33 with clinical depression). The brain data consisted of whole-brain voxel-based gray matter volumes, and the behavioral data included item-level self-report questionnaires and IQ and demographic measures. Results: Both sparse partial least squares and kernel canonical correlation analysis captured two hidden dimensions of brain–behavior relationships: one related to age and drinking and the other one related to depression. The applied machine learning framework indicates that these results are stable and generalize well to new data. Indeed, the identified brain–behavior associations are in agreement with previous findings in the literature concerning age, alcohol use, and depression-related changes in brain volume. Conclusions: Multivariate techniques (such as sparse partial least squares and kernel canonical correlation analysis) embedded in our novel framework are promising tools to link behavior and/or symptoms to neurobiology and thus have great potential to contribute to a biologically grounded definition of psychiatric disorders.
KW - Adolescence
KW - Brain–behavior relationship
KW - Depression
KW - Framework
KW - RDoC
KW - SPLS
UR - http://www.scopus.com/inward/record.url?scp=85077437819&partnerID=8YFLogxK
U2 - 10.1016/j.biopsych.2019.12.001
DO - 10.1016/j.biopsych.2019.12.001
M3 - Article
AN - SCOPUS:85077437819
SN - 0006-3223
VL - 87
SP - 368
EP - 376
JO - Biological Psychiatry
JF - Biological Psychiatry
IS - 4
ER -