Algorithms for Data Science CS 2243, Fall 2024
Course Details
This is a graduate topics class on algorithmic challenges in modern machine learning and data science. We will touch upon a number of domains (generative modeling, deep learning theory, robust statistics, Bayesian inference) and frameworks for algorithm design (spectral/tensor methods, gradient descent, message passing, MCMC, diffusions), focusing on provable guarantees. The theory draws upon a range of techniques from stochastic calculus, harmonic analysis, statistical physics, algebra, and beyond. We will also explore the myriad modeling challenges in building this theory and prominent paradigms (average-case complexity, smoothed complexity, oracles) for going beyond traditional worst-case analysis.
Time/Location: MW 2:15-3:30, SEC 1.413
Instructor: Sitan Chen (sitan@seas.harvard.edu), Office hours: SEC 3.325, Monday 5-6
Teaching Fellows:
- Weiyuan Gong (wgong@g.harvard.edu), Office hours: TBD
- Marvin Li (marvinli@college.harvard.edu), Office hours: TBD
Canvas (for lecture recordings only)
Course Policies: See syllabus for detailed overview.
Announcements
Notifications of course placement will be sent out by Aug 31 at the latest.
Pset 0 and the course application form are due September 3 at 4:59pm.
Assignments
Assignments will be posted below and in the course Overleaf when they become available.
- Pset 0 (due 09/03 at 4:59pm): PDF
Miscellaneous Materials
- Helpful references:
  - Roman Vershynin. High-Dimensional Probability (primarily Chapters 1-2, though 3-4 are also helpful)
- Compilation of recurring notation used in the course
- Some relevant past courses:
  - Ankur Moitra. Algorithmic Aspects of Machine Learning
  - Tselil Schramm. The Sum-of-Squares Algorithmic Paradigm in Statistics
  - Sam Hopkins. The Sum of Squares Method
  - Prasad Raghavendra. Efficient Algorithms and Computational Complexity in Statistics
  - Sanjeev Arora. Theory of Deep Learning
  - Song Mei. Mean Field Asymptotics in Statistical Learning
  - Lenka Zdeborova & Florent Krzakala. Statistical Physics For Optimization and Learning
  - Ahmed El-Alaoui. Topics in High-Dimensional Inference
  - Subhabrata Sen. STAT 217: Topics in High-Dimensional Statistics - Methods from Statistical Physics
  - Kevin Tian. Continuous Algorithms
  - Mark Sellke. Random High-Dimensional Optimization: Landscapes and Algorithmic Barriers
  - Sam Hopkins and Costis Daskalakis. Algorithmic Statistics
Lectures
Date | Topic | Lecture Notes | Resources |
---|---|---|---|
Sep 4 | Logistics, vignette: diffraction limit and learning theory | instructor notes, slides | |