Publications
Manuscripts I have been involved in, from latest to oldest. See also Google Scholar.
2023
-
Calibration
Beyond calibration: estimating the grouping loss of modern neural networks
Alexandre Perez-Lebel, Marine Le Morvan, and Gaël Varoquaux
In The Eleventh International Conference on Learning Representations – ICLR 2023
Good decision making requires machine-learning models to provide trustworthy confidence scores. To this end, recent work has focused on miscalibration, i.e., the over- or under-confidence of model scores. Yet, contrary to widespread belief, calibration is not enough: even a classifier with the best possible accuracy and perfect calibration can have confidence scores far from the true posterior probabilities. This is due to the grouping loss, created by samples with the same confidence scores but different true posterior probabilities. Proper scoring rule theory shows that given the calibration loss, the missing piece to characterize individual errors is the grouping loss. While there are many estimators of the calibration loss, none exists for the grouping loss in standard settings. Here, we propose an estimator to approximate the grouping loss. We use it to study modern neural network architectures in vision and NLP. We find that the grouping loss varies markedly across architectures, and that it is a key model-comparison factor across the most accurate, calibrated models. We also show that distribution shifts lead to high grouping loss.
@inproceedings{Perez-Lebel2023, author = {Perez-Lebel, Alexandre and Le Morvan, Marine and Varoquaux, Gaël}, month = may, title = {Beyond calibration: estimating the grouping loss of modern neural networks}, booktitle = {{The Eleventh International Conference on Learning Representations -- ICLR}}, address = {Kigali, Rwanda}, url = {https://arxiv.org/abs/2210.16315}, pdf = {https://arxiv.org/pdf/2210.16315}, arxivid = {2210.16315}, year = {2023}, bibtex_show = {true}, abbr = {Calibration} }
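As a rough illustration of the claim in the abstract above, the toy population below has two equally likely groups that receive the same confidence score but have different true posteriors. All numbers are hypothetical and not taken from the paper, and the squared-error forms of the calibration and grouping terms are an assumption chosen for simplicity; the paper's exact definitions and constants may differ.

```python
# Toy population (hypothetical numbers): two equally likely groups get the
# same confidence score but have different true posteriors P(Y=1 | X).
groups = [
    {"weight": 0.5, "posterior": 0.6, "score": 0.8},
    {"weight": 0.5, "posterior": 1.0, "score": 0.8},
]


def calibrated_score(score):
    """E[Y | S = score]: weighted mean posterior of samples sharing this score."""
    same = [g for g in groups if g["score"] == score]
    total = sum(g["weight"] for g in same)
    return sum(g["weight"] * g["posterior"] for g in same) / total


def accuracy(key):
    """Expected accuracy when predicting class 1 iff g[key] > 0.5."""
    return sum(
        g["weight"] * (g["posterior"] if g[key] > 0.5 else 1 - g["posterior"])
        for g in groups
    )


# Squared-error calibration term (assumed form): E[(S - E[Y | S])^2].
calibration_loss = sum(
    g["weight"] * (g["score"] - calibrated_score(g["score"])) ** 2 for g in groups
)

# Squared-error grouping term (assumed form): E[(E[Y | S] - P(Y=1 | X))^2].
grouping_loss = sum(
    g["weight"] * (calibrated_score(g["score"]) - g["posterior"]) ** 2 for g in groups
)

print(f"calibration loss: {calibration_loss:.3f}")
print(f"grouping loss:    {grouping_loss:.3f}")
print(f"accuracy from scores:     {accuracy('score'):.3f}")
print(f"accuracy from posteriors: {accuracy('posterior'):.3f}")
```

In this sketch the scores match the accuracy of the true posteriors and the calibration term vanishes (up to floating-point rounding), yet the grouping term is about 0.04, because each score of 0.8 hides posteriors of 0.6 and 1.0.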
2022
-
Missing values
Benchmarking missing-values approaches for predictive models on health databases
Alexandre Perez-Lebel, Gaël Varoquaux, Marine Le Morvan, Julie Josse, and Jean-Baptiste Poline
GigaScience 2022
As databases grow larger, it becomes harder to fully control their collection, and they frequently come with missing values. These large databases are well suited to train machine learning models, e.g., for forecasting or to extract biomarkers in biomedical settings. Such predictive approaches can use discriminative (rather than generative) modeling and thus open the door to new missing-values strategies. Yet existing empirical evaluations of strategies to handle missing values have focused on inferential statistics. Here we conduct a systematic benchmark of missing-values strategies in predictive models with a focus on large health databases: 4 electronic health record datasets, 1 population brain imaging database, 1 health survey, and 2 intensive care surveys. Using gradient-boosted trees, we compare native support for missing values with simple and state-of-the-art imputation prior to learning. We investigate prediction accuracy and computational time. For prediction after imputation, we find that adding an indicator to express which values have been imputed is important, suggesting that the data are missing not at random. Elaborate missing-values imputation can improve prediction compared to simple strategies but requires longer computational time on large data. Learning trees that model missing values (with missing incorporated attribute) leads to robust, fast, and well-performing predictive modeling. Native support for missing values in supervised machine learning predicts better than state-of-the-art imputation with much less computational cost. When using imputation, it is important to add indicator columns expressing which values have been imputed.
@article{Perez-Lebel2022, author = {Perez-Lebel, Alexandre and Varoquaux, Gaël and Le Morvan, Marine and Josse, Julie and Poline, Jean-Baptiste}, title = {{Benchmarking missing-values approaches for predictive models on health databases}}, journal = {GigaScience}, volume = {11}, year = {2022}, month = apr, issn = {2047-217X}, doi = {10.1093/gigascience/giac013}, url = {https://doi.org/10.1093/gigascience/giac013}, note = {giac013}, eprint = {https://academic.oup.com/gigascience/article-pdf/doi/10.1093/gigascience/giac013/43384549/giac013.pdf}, pdf = {https://academic.oup.com/gigascience/article-pdf/doi/10.1093/gigascience/giac013/43384549/giac013.pdf}, bibtex_show = {true}, abbr = {Missing values}, code = {https://github.com/aperezlebel/benchmark_mv_approaches} }
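The recommendation in the abstract above, to add indicator columns marking which values were imputed, can be sketched as follows. This is a minimal pure-Python illustration using mean imputation on a made-up matrix; the function name `impute_mean_with_indicator` is hypothetical, and this is not the benchmark's code (see the repository linked in the entry for that).

```python
import math


def impute_mean_with_indicator(rows):
    """Mean-impute each column and append one 0/1 indicator column per feature.

    `rows` is a list of equal-length lists of floats, with NaN marking a
    missing value. A column with no observed value is filled with 0.0
    (an arbitrary choice made for this sketch).
    """
    n_cols = len(rows[0])
    means = []
    for j in range(n_cols):
        observed = [r[j] for r in rows if not math.isnan(r[j])]
        means.append(sum(observed) / len(observed) if observed else 0.0)

    augmented = []
    for r in rows:
        values = [means[j] if math.isnan(v) else v for j, v in enumerate(r)]
        mask = [1.0 if math.isnan(v) else 0.0 for v in r]  # 1.0 = was imputed
        augmented.append(values + mask)
    return augmented


nan = float("nan")
X = [
    [1.0, nan],
    [3.0, 4.0],
    [nan, 8.0],
]
X_aug = impute_mean_with_indicator(X)
for row in X_aug:
    print(row)
```

The downstream model then sees both the imputed value and the fact that it was imputed, so it can still exploit missingness that is itself informative.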
2020
-
fMRI
Variability in the analysis of a single neuroimaging dataset by many teams
Rotem Botvinik-Nezer, Felix Holzmeister, Colin F Camerer, Anna Dreber, Juergen Huber, Magnus Johannesson, Michael Kirchler, Roni Iwanir, Jeanette A Mumford, R Alison Adcock, Paolo Avesani, Blazej M Baczkowski, Aahana Bajracharya, Leah Bakst, Sheryl Ball, Marco Barilari, Nadège Bault, Derek Beaton, Julia Beitner, Roland G Benoit, Ruud M W J Berkers, Jamil P Bhanji, Bharat B Biswal, Sebastian Bobadilla-Suarez, Tiago Bortolini, Katherine L Bottenhorn, Alexander Bowring, Senne Braem, Hayley R Brooks, Emily G Brudner, Cristian B Calderon, Julia A Camilleri, Jaime J Castrellon, Luca Cecchetti, Edna C Cieslik, Zachary J Cole, Olivier Collignon, Robert W Cox, William A Cunningham, Stefan Czoschke, Kamalaker Dadi, Charles P Davis, Alberto De Luca, Mauricio R Delgado, Lysia Demetriou, Jeffrey B Dennison, Xin Di, Erin W Dickie, Ekaterina Dobryakova, Claire L Donnat, Juergen Dukart, Niall W Duncan, Joke Durnez, Amr Eed, Simon B Eickhoff, Andrew Erhart, Laura Fontanesi, G Matthew Fricke, Shiguang Fu, Adriana Galván, Remi Gau, Sarah Genon, Tristan Glatard, Enrico Glerean, Jelle J Goeman, Sergej A E Golowin, Carlos González-García, Krzysztof J Gorgolewski, Cheryl L Grady, Mikella A Green, João F Guassi Moreira, Olivia Guest, Shabnam Hakimi, J Paul Hamilton, Roeland Hancock, Giacomo Handjaras, Bronson B Harry, Colin Hawco, Peer Herholz, Gabrielle Herman, Stephan Heunis, Felix Hoffstaedter, Jeremy Hogeveen, Susan Holmes, Chuan-Peng Hu, Scott A Huettel, Matthew E Hughes, Vittorio Iacovella, Alexandru D Iordan, Peder M Isager, Ayse I Isik, Andrew Jahn, Matthew R Johnson, Tom Johnstone, Michael J E Joseph, Anthony C Juliano, Joseph W Kable, Michalis Kassinopoulos, Cemal Koba, Xiang-Zhen Kong, Timothy R Koscik, Nuri Erkut Kucukboyaci, Brice A Kuhl, Sebastian Kupek, Angela R Laird, Claus Lamm, Robert Langner, Nina Lauharatanahirun, Hongmi Lee, Sangil Lee, Alexander Leemans, Andrea Leo, Elise Lesage, Flora Li, Monica Y C Li, Phui Cheng Lim, Evan N Lintz, Schuyler W Liphardt, Annabel B 
Losecaat Vermeer, Bradley C Love, Michael L Mack, Norberto Malpica, Theo Marins, Camille Maumet, Kelsey McDonald, Joseph T McGuire, Helena Melero, Adriana S Méndez Leal, Benjamin Meyer, Kristin N Meyer, Glad Mihai, Georgios D Mitsis, Jorge Moll, Dylan M Nielson, Gustav Nilsonne, Michael P Notter, Emanuele Olivetti, Adrian I Onicas, Paolo Papale, Kaustubh R Patil, Jonathan E Peelle, Alexandre Perez, Doris Pischedda, Jean-Baptiste Poline, Yanina Prystauka, Shruti Ray, Patricia A Reuter-Lorenz, Richard C Reynolds, Emiliano Ricciardi, Jenny R Rieck, Anais M Rodriguez-Thompson, Anthony Romyn, Taylor Salo, Gregory R Samanez-Larkin, Emilio Sanz-Morales, Margaret L Schlichting, Douglas H Schultz, Qiang Shen, Margaret A Sheridan, Jennifer A Silvers, Kenny Skagerlund, Alec Smith, David V Smith, Peter Sokol-Hessner, Simon R Steinkamp, Sarah M Tashjian, Bertrand Thirion, John N Thorp, Gustav Tinghög, Loreen Tisdall, Steven H Tompson, Claudio Toro-Serey, Juan Jesus Torre Tresols, Leonardo Tozzi, Vuong Truong, Luca Turella, Anna E van ‘t Veer, Tom Verguts, Jean M Vettel, Sagana Vijayarajah, Khoi Vo, Matthew B Wall, Wouter D Weeda, Susanne Weis, David J White, David Wisniewski, Alba Xifra-Porxas, Emily A Yearling, Sangsuk Yoon, Rui Yuan, Kenneth S L Yuen, Lei Zhang, Xu Zhang, Joshua E Zosky, Thomas E Nichols, Russell A Poldrack, and Tom Schonberg
Nature 2020
Data analysis workflows in many scientific domains have become increasingly complex and flexible. Here we assess the effect of this flexibility on the results of functional magnetic resonance imaging by asking 70 independent teams to analyse the same dataset, testing the same 9 ex-ante hypotheses. The flexibility of analytical approaches is exemplified by the fact that no two teams chose identical workflows to analyse the data. This flexibility resulted in sizeable variation in the results of hypothesis tests, even for teams whose statistical maps were highly correlated at intermediate stages of the analysis pipeline.
Variation in reported results was related to several aspects of analysis methodology. Notably, a meta-analytical approach that aggregated information across teams yielded a significant consensus in activated regions. Furthermore, prediction markets of researchers in the field revealed an overestimation of the likelihood of significant findings, even by researchers with direct knowledge of the dataset. Our findings show that analytical flexibility can have substantial effects on scientific conclusions, and identify factors that may be related to variability in the analysis of functional magnetic resonance imaging. The results emphasize the importance of validating and sharing complex analysis workflows, and demonstrate the need for performing and reporting multiple analyses of the same data. Potential approaches that could be used to mitigate issues related to analytical variability are discussed.
@article{Botvinik-Nezer2020, author = {Botvinik-Nezer, Rotem and Holzmeister, Felix and Camerer, Colin F and Dreber, Anna and Huber, Juergen and Johannesson, Magnus and Kirchler, Michael and Iwanir, Roni and Mumford, Jeanette A and Adcock, R Alison and Avesani, Paolo and Baczkowski, Blazej M and Bajracharya, Aahana and Bakst, Leah and Ball, Sheryl and Barilari, Marco and Bault, Nad{\`{e}}ge and Beaton, Derek and Beitner, Julia and Benoit, Roland G and Berkers, Ruud M W J and Bhanji, Jamil P and Biswal, Bharat B and Bobadilla-Suarez, Sebastian and Bortolini, Tiago and Bottenhorn, Katherine L and Bowring, Alexander and Braem, Senne and Brooks, Hayley R and Brudner, Emily G and Calderon, Cristian B and Camilleri, Julia A and Castrellon, Jaime J and Cecchetti, Luca and Cieslik, Edna C and Cole, Zachary J and Collignon, Olivier and Cox, Robert W and Cunningham, William A and Czoschke, Stefan and Dadi, Kamalaker and Davis, Charles P and Luca, Alberto De and Delgado, Mauricio R and Demetriou, Lysia and Dennison, Jeffrey B and Di, Xin and Dickie, Erin W and Dobryakova, Ekaterina and Donnat, Claire L and Dukart, Juergen and Duncan, Niall W and Durnez, Joke and Eed, Amr and Eickhoff, Simon B and Erhart, Andrew and Fontanesi, Laura and Fricke, G Matthew and Fu, Shiguang and Galv{\'{a}}n, Adriana and Gau, Remi and Genon, Sarah and Glatard, Tristan and Glerean, Enrico and Goeman, Jelle J and Golowin, Sergej A E and Gonz{\'{a}}lez-Garc{\'{i}}a, Carlos and Gorgolewski, Krzysztof J and Grady, Cheryl L and Green, Mikella A and {Guassi Moreira}, Jo{\~{a}}o F and Guest, Olivia and Hakimi, Shabnam and Hamilton, J Paul and Hancock, Roeland and Handjaras, Giacomo and Harry, Bronson B and Hawco, Colin and Herholz, Peer and Herman, Gabrielle and Heunis, Stephan and Hoffstaedter, Felix and Hogeveen, Jeremy and Holmes, Susan and Hu, Chuan-Peng and Huettel, Scott A and Hughes, Matthew E and Iacovella, Vittorio and Iordan, Alexandru D and Isager, Peder M and Isik, Ayse I and Jahn, Andrew and 
Johnson, Matthew R and Johnstone, Tom and Joseph, Michael J E and Juliano, Anthony C and Kable, Joseph W and Kassinopoulos, Michalis and Koba, Cemal and Kong, Xiang-Zhen and Koscik, Timothy R and Kucukboyaci, Nuri Erkut and Kuhl, Brice A and Kupek, Sebastian and Laird, Angela R and Lamm, Claus and Langner, Robert and Lauharatanahirun, Nina and Lee, Hongmi and Lee, Sangil and Leemans, Alexander and Leo, Andrea and Lesage, Elise and Li, Flora and Li, Monica Y C and Lim, Phui Cheng and Lintz, Evan N and Liphardt, Schuyler W and {Losecaat Vermeer}, Annabel B and Love, Bradley C and Mack, Michael L and Malpica, Norberto and Marins, Theo and Maumet, Camille and McDonald, Kelsey and McGuire, Joseph T and Melero, Helena and {M{\'{e}}ndez Leal}, Adriana S and Meyer, Benjamin and Meyer, Kristin N and Mihai, Glad and Mitsis, Georgios D and Moll, Jorge and Nielson, Dylan M and Nilsonne, Gustav and Notter, Michael P and Olivetti, Emanuele and Onicas, Adrian I and Papale, Paolo and Patil, Kaustubh R and Peelle, Jonathan E and Perez, Alexandre and Pischedda, Doris and Poline, Jean-Baptiste and Prystauka, Yanina and Ray, Shruti and Reuter-Lorenz, Patricia A and Reynolds, Richard C and Ricciardi, Emiliano and Rieck, Jenny R and Rodriguez-Thompson, Anais M and Romyn, Anthony and Salo, Taylor and Samanez-Larkin, Gregory R and Sanz-Morales, Emilio and Schlichting, Margaret L and Schultz, Douglas H and Shen, Qiang and Sheridan, Margaret A and Silvers, Jennifer A and Skagerlund, Kenny and Smith, Alec and Smith, David V and Sokol-Hessner, Peter and Steinkamp, Simon R and Tashjian, Sarah M and Thirion, Bertrand and Thorp, John N and Tingh{\"{o}}g, Gustav and Tisdall, Loreen and Tompson, Steven H and Toro-Serey, Claudio and {Torre Tresols}, Juan Jesus and Tozzi, Leonardo and Truong, Vuong and Turella, Luca and {van ‘t Veer}, Anna E and Verguts, Tom and Vettel, Jean M and Vijayarajah, Sagana and Vo, Khoi and Wall, Matthew B and Weeda, Wouter D and Weis, Susanne and White, David J and 
Wisniewski, David and Xifra-Porxas, Alba and Yearling, Emily A and Yoon, Sangsuk and Yuan, Rui and Yuen, Kenneth S L and Zhang, Lei and Zhang, Xu and Zosky, Joshua E and Nichols, Thomas E and Poldrack, Russell A and Schonberg, Tom}, doi = {10.1038/s41586-020-2314-9}, issn = {1476-4687}, journal = {Nature}, number = {7810}, pages = {84--88}, title = {{Variability in the analysis of a single neuroimaging dataset by many teams}}, url = {https://doi.org/10.1038/s41586-020-2314-9}, volume = {582}, year = {2020}, bibtex_show = {true}, abbr = {fMRI}, website = {https://www.narps.info/} }