Algorithms for Data Science
CS 224, Fall 2023
Course Details
This is a graduatelevel topics class on algorithmic challenges arising in modern machine learning and data science more broadly. The course will touch upon a number of wellstudied problems (generative modeling, deep learning theory, adversarial robustness, inverse problems, inference) and frameworks for algorithm design (gradient descent, spectral methods, tensor/moment methods, message passing, convex programming hierarchies, Markov chains). As we will see, proving rigorous guarantees for these problems often draws upon a wide range of techniques from stochastic calculus, harmonic analysis, random matrix theory, algebra, and statistical physics. We will also explore the myriad modeling challenges that go into building theory for ML and discuss prominent paradigms (semirandom models, smoothed analysis, oracles) for going beyond traditional worstcase analysis.
Time/Location: MW 2:153:30, SEC LL2224
Instructor: Sitan Chen (sitan@seas.harvard.edu)
Office hours: SEC 3.325, Thursday 45
Teaching Fellows:
 David Brewster (dbrewster@g.harvard.edu), Office hours: MaxwellDworkin 123, Tuesday 56
 Depen Morwani (dmorwani@g.harvard.edu), Office hours: SEC 1.404, Wednesday 1112
 Rosie Zhao (rosiezhao@g.harvard.edu), Office hours: SEC 2.330, Monday 1112
Canvas (for lecture recordings and announcements)
Course Policies: See syllabus for detailed overview.
Student projects
Yuyang Zhang: Difficulties and Solutions in Learning Latent Variable Models
Prashanth Amireddy: Unsupervised Learning via Algebraic Circuit Reconstruction
Itai Shapira: Edge of Stability Training Dynamics
Alan Chung, Ben Schiffer, Kevin Luo: NonAsymptotic Convergence of Diffusion Models
Alexander Cai, Oliver Cheng, Tianze Jiang: Meanfield approximation and correlation rounding
Shi Feng, Xiaodong Yang: Posterior Sampling via Diffusion Processes
Arif Kerem Dayi: Leap Complexity: How SGD Exploits Hierarchical Structure in Functions to Learn Efficiently
Announcements
10/24: Pset 4 (final) is out, see below.
10/10: Pset 3 is out, see below.
10/2: See canvas announcement for information about final project.
9/25: Pset 2 is out, see below.
9/11: Pset 1 is out, see below.
See Canvas for link to sign up for pset office hour preferences
Notifications of course placement will be sent out Thursday afternoon, Aug 31.
Assignments
Miscellaneous Materials
 Helpful references:
 Roman Vershynin. HighDimensional Probability (Chapters 12, 34 also helpful)
 Some relevant past courses:
 Ankur Moitra. Algorithmic Aspects of Machine Learning
 Tselil Schramm. The SumofSquares Algorithmic Paradigm in Statistics
 Prasad Raghavendra. Efficient Algorithms and Computational Complexity in Statistics
 Sanjeev Arora. Theory of Deep Learning
 Song Mei. Mean Field Asymptotics in Statistical Learning
 Lenka Zdeborova & Florent Krzakala. Statistical Physics For Optimization and Learning
 Ahmed ElAlaoui. Topics in HighDimensional Inference
 Subhabrata Sen. STAT 217: Topics in HighDimensional Statistics  Methods from Statistical Physics.
Lectures
Date  Topic  Lecture Notes  Resources 

Sep 6  Logistics, vignette: diffraction limit and learning theory  instructor notes, slides 

Sep 11  Tensors I: Jennrich's algorithm, applications (superresolution, Gaussian mixtures, independent component analysis)  instructor notes, slides, scribe notes 

Sep 13  Tensors II: Iterative methods (gradient descent, tensor power method, alternating least squares)  instructor notes, slides, scribe notes 1, scribe notes 2 

Sep 18  Tensors III: overcomplete tensor decomposition via smoothed analysis  instructor notes, slides, scribe notes 1, scribe notes 2 

Sep 20  Sumofsquares I: pseudodistributions, application to robust regression  instructor notes, slides, scribe notes 

Sep 25  Sumofsquares II: mixtures of spherical Gaussians, certifiable hypercontractivity  instructor notes, slides, scribe notes 

Sep 27  Sumofsquares III: highentropy constraints, noisy orthogonal tensor decomposition  instructor notes (board talk only), scribe notes 

Oct 2  Sumofsquares IV: rounding SoS relaxations with Jennrich's, overcomplete tensor decomposition  instructor notes, slides, scribe notes 

Oct 4  Robust statistics I: motivation, iterative filtering  instructor notes, slides, scribe notes 

Oct 11  Robust statistics II: listdecodable learning, subspace isotropic filtering  instructor notes (board talk only), supplemental instructor notes [pdf], scribe notes 

Oct 16  Robust statistics III: learning halfspaces under Massart noise  instructor notes, slides, scribe notes 

Oct 18  Supervised learning I: PAC basics, lowdegree algorithm, L1/L2 polynomial regression, polynomial threshold functions  slides, scribe notes 

Oct 23  Supervised learning II: noise sensitivity, onehidden layer MLPs, Hermite analysis, CSQ algorithms  instructor notes, slides, scribe notes 

Oct 25  Supervised learning III: filtered PCA  instructor notes, slides, scribe notes 

Oct 30  Supervised learning IV: linearized networks  instructor notes, slides, scribe notes 

Nov 1  Supervised learning V: mean field limit  instructor notes, slides, scribe notes 

Nov 6  Computational complexity I: hardness basics, SQ lower bounds, noisy parity  instructor notes, slides, scribe notes 

Nov 8  Computational complexity II: SQ dimension (application to MLPs), statistical dimension (applications to Gaussian mixtures, generative models)  instructor notes, slides, scribe notes 

Nov 13  Computational complexity III: cryptographic lower bounds, DanielyVardi lifting  instructor notes, slides, scribe notes 

Nov 15  Statistical physics I: intro to graphical models and variational inference  instructor notes (board talk only), scribe notes 

Nov 20  Statistical physics II: belief propagation, Bethe free energy  instructor notes, slides, supplemental instructor notes [pdf], scribe notes 

Nov 27  Statistical physics III: derivation of AMP, state evolution, Z2 synchronization  instructor notes, slides, scribe notes 

Nov 29  Statistical physics IV: optimality of AMP for Z2 synchronization  instructor notes, slides, scribe notes 

Nov 29  Diffusion models  instructor notes, slides, scribe notes 