Algorithms for Data Science
CS 2243, Fall 2024
Course Details
This is a graduate topics class on algorithmic challenges in modern machine learning and data science. We will touch upon a number of domains (generative modeling, deep learning theory, robust statistics, Bayesian inference) and frameworks for algorithm design (spectral/tensor methods, gradient descent, message passing, MCMC, diffusions), focusing on provable guarantees. The theory draws upon a range of techniques from stochastic calculus, harmonic analysis, statistical physics, algebra, and beyond. We will also explore the myriad modeling challenges in building this theory and prominent paradigms (averagecase complexity, smoothed complexity, oracles) for going beyond traditional worstcase analysis.
Time/Location: MW 2:153:30, SEC 1.413
Instructor: Sitan Chen (sitan@seas.harvard.edu)
Office hours: SEC 3.325, Monday 5:306:30
Teaching Fellows:
 Weiyuan Gong (wgong@g.harvard.edu), Office hours: Thursday 1112, SEC 3.317
 Marvin Li (marvinli@college.harvard.edu), Office hours: Friday 45, SEC SEC 3.317
Canvas (for lecture recordings only)
Course Policies: See syllabus for detailed overview.
Announcements
Pset 4 due November 11, 11:59pm.
Pset 3 due October 29, 11:59pm.
Pset 2 due October 9, 11:59pm.
Pset 1 due September 20, 11:59pm.
Notifications of course placement will be sent out by Aug 31 at the latest.
Pset 0 and course application form due September 3, 4:59pm
Miscellaneous Materials
 Helpful references:
 Roman Vershynin. HighDimensional Probability (primarily Chapters 12, though 34 are also helpful)
 Compilation of recurring notation used in the course
 Some relevant past courses:
 Ankur Moitra. Algorithmic Aspects of Machine Learning
 Tselil Schramm. The SumofSquares Algorithmic Paradigm in Statistics
 Sam Hopkins. The Sum of Squares Method
 Prasad Raghavendra. Efficient Algorithms and Computational Complexity in Statistics
 Sanjeev Arora. Theory of Deep Learning
 Song Mei. Mean Field Asymptotics in Statistical Learning
 Lenka Zdeborova & Florent Krzakala. Statistical Physics For Optimization and Learning
 Ahmed ElAlaoui. Topics in HighDimensional Inference
 Subhabrata Sen. STAT 217: Topics in HighDimensional Statistics  Methods from Statistical Physics.
 Kevin Tian. Continuous Algorithms.
 Mark Sellke. Random HighDimensional Optimization: Landscapes and Algorithmic Barriers.
 Sam Hopkins and Costis Daskalakis. Algorithmic Statistics
Lectures
Date  Topic  Lecture Notes  Resources 

Sep 4  Logistics, vignette: diffraction limit and learning theory  instructor notes, slides 

Sep 9  Tensors I: Jennrich's algorithm, applications (superresolution, Gaussian mixtures, independent component analysis)  instructor notes, slides, scribe notes 

Sep 11  Tensors II: Iterative methods (gradient descent, tensor power method, alternating least squares)  instructor notes, slides, scribe notes 

Sep 16  Tensors III: overcomplete tensor decomposition via smoothed analysis  instructor notes, slides, scribe notes 

Sep 18  Sumofsquares I: pseudodistributions, sumofsquares proofs  instructor notes, slides 

Sep 23  Sumofsquares II: robust regression and certifiable hypercontractivity  instructor notes, scribe notes 

Sep 25  Sumofsquares III: mixtures of spherical Gaussians, entropy maximization  instructor notes, slides, scribe notes 

Sep 30  Supervised learning I: PAC learning, Fourier analysis, semirandom noise  slides, scribe notes 

Oct 2  Supervised learning II: noise sensitivity, onehidden layer MLPs, Hermite analysis, CSQ algorithms  instructor notes, slides, scribe notes 

Oct 7  Supervised learning III: filtered PCA  instructor notes, slides, scribe notes 

Oct 9  Supervised learning IV: linearized networks  instructor notes, slides, scribe notes 

Oct 16  Supervised learning V: mean field limit  instructor notes, slides, scribe notes 

Oct 21  Computational complexity I: hardness basics, SQ lower bounds  instructor notes, slides, scribe notes 

Oct 23  Computational complexity II: SQ dimension (application to MLPs), statistical dimension (applications to Gaussian mixtures, generative models)  instructor notes, slides, scribe notes 

Oct 28  Computational complexity III: cryptographic lower bounds, continuous LWE  instructor notes, slides 

Oct 30  Computational complexity IV: continuous LWE (cont'd)  instructor notes (board talk only)  
Nov 4  Statistical physics I: intro to graphical models and variational inference  instructor notes (board talk only), scribe notes 

Nov 6  Statistical physics II: belief propagation, Bethe free energy  instructor notes, slides, scribe notes 

Nov 11  Statistical physics III: derivation of AMP, state evolution, Z2 synchronization  instructor notes, slides, scribe notes 
