Reproducibility in data science: statistical methods and applications


The goal of this fast-paced course is to expose PhD-level statistics and machine learning students to current research topics in statistical inference for large-scale data sets, focusing on methods with finite-sample frequentist guarantees.
The course is motivated by the growing reliance of many applied fields on the automatic analysis of large amounts of data to make scientific discoveries and inform high-stakes decisions.
In particular, there is growing awareness of a widespread reproducibility crisis in science, and novel statistical methods are needed to ensure that reported discoveries are reproducible rather than spurious artifacts of the multiple-comparisons problem (“data snooping”).
We will begin by introducing the frequentist multiple hypothesis testing problem and exploring a variety of general methods for addressing it.
Next, we will frame the model selection problem as a multiple hypothesis testing problem and explore some of the inferential challenges and recent solutions in this setting.
We will conclude by exploring how conditional independence testing relates to causality and discussing how to calibrate arbitrary machine learning algorithms to ensure valid predictive inference.


Chiara Sabatti (Department of Statistics, Stanford University, USA)

Stephen Bates (Department of Statistics, Stanford University, USA)
Eugene Katsevich (Carnegie Mellon University, USA)

Morning: 3 hours/day lectures
Afternoon: 2 hours/day supervised tutorials as well as individual and team work.

In addition, there will be a poster session where participants may present their research upon prior request. A welcome cocktail will be offered during the poster session. More details will follow.

Room and board
Accommodation is included in the registration fee.
The students will be hosted at the Guest House of Villa del Grumello and at the Ostello Bello.
The organizing committee will take care of the reservation.
Lunches on working days are included in the registration fee.

Attendance and final certificate
Full attendance at all activities of the summer school is mandatory for participants.
Subject to satisfactory participation in the program, an attendance certificate will be awarded by Università Bocconi, mentioning that the 2020 edition of the Summer School is offered in collaboration with the University of Oxford and Imperial College London.