Algorithms for Data Science
CS 224, Fall 2023
Course Details
This is a graduatelevel topics class on algorithmic challenges arising in modern machine learning and data science more broadly. The course will touch upon a number of wellstudied problems (generative modeling, deep learning theory, adversarial robustness, inverse problems, inference) and frameworks for algorithm design (gradient descent, spectral methods, tensor/moment methods, message passing, convex programming hierarchies, Markov chains). As we will see, proving rigorous guarantees for these problems often draws upon a wide range of techniques from stochastic calculus, harmonic analysis, random matrix theory, algebra, and statistical physics. We will also explore the myriad modeling challenges that go into building theory for ML and discuss prominent paradigms (semirandom models, smoothed analysis, oracles) for going beyond traditional worstcase analysis.
Time/Location: MW 2:153:30, SEC LL2224
Instructor: Sitan Chen (sitan@seas.harvard.edu)
Office hours: SEC 3.325, Thursday 45
Teaching Fellows:
 David Brewster (dbrewster@g.harvard.edu), Office hours: MaxwellDworkin 123, Tuesday 56
 Depen Morwani (dmorwani@g.harvard.edu), Office hours: SEC 1.404, Wednesday 1112
 Rosie Zhao (rosiezhao@g.harvard.edu), Office hours: SEC 2.330, Monday 1112
Canvas (for lecture recordings and announcements)
Course Policies: See syllabus for detailed overview.
Announcements
Pset 1 is out, see below.
See Canvas for link to sign up for pset office hour preferences
Notifications of course placement will be sent out Thursday afternoon, Aug 31.
Pset 0 and course application form due August 29, 11:59pm
Miscellaneous Materials
 Helpful references:
 Roman Vershynin. HighDimensional Probability (Chapters 12, 34 also helpful)
 Compilation of recurring notation used in the course
 Some relevant past courses:
 Ankur Moitra. Algorithmic Aspects of Machine Learning
 Tselil Schramm. The SumofSquares Algorithmic Paradigm in Statistics
 Prasad Raghavendra. Efficient Algorithms and Computational Complexity in Statistics
 Sanjeev Arora. Theory of Deep Learning
 Song Mei. Mean Field Asymptotics in Statistical Learning
 Lenka Zdeborova & Florent Krzakala. Statistical Physics For Optimization and Learning
 Ahmed ElAlaoui. Topics in HighDimensional Inference
 Subhabrata Sen. STAT 217: Topics in HighDimensional Statistics  Methods from Statistical Physics.
Lectures
Date  Topic  Lecture Notes  Resources 

Sep 6  Logistics, vignette: diffraction limit and learning theory  instructor notes, slides 

Sep 11  Tensors I: Jennrich's algorithm, applications (superresolution, Gaussian mixtures, independent component analysis)  instructor notes, slides, scribe notes 

Sep 13  Tensors II: Iterative methods (gradient descent, tensor power method, alternating least squares)  instructor notes, slides, scribe notes 

Sep 18  Tensors III: overcomplete tensor decomposition via smoothed analysis  instructor notes, slides 

Sep 20  Sumofsquares I: pseudodistributions, application to robust regression  instructor notes, slides 
