Algorithms for Data Science
CS 2243, Fall 2024

Course Details

This is a graduate topics class on algorithmic challenges in modern machine learning and data science. We will touch upon a number of domains (generative modeling, deep learning theory, robust statistics, Bayesian inference) and frameworks for algorithm design (spectral/tensor methods, gradient descent, message passing, MCMC, diffusions), focusing on provable guarantees. The theory draws upon a range of techniques from stochastic calculus, harmonic analysis, statistical physics, algebra, and beyond. We will also explore the myriad modeling challenges in building this theory and prominent paradigms (average-case complexity, smoothed complexity, oracles) for going beyond traditional worst-case analysis.

Time/Location: MW 2:15-3:30, SEC 1.413

Instructor: Sitan Chen (sitan@seas.harvard.edu)
Office hours: SEC 3.325, Monday 5:30-6:30

Teaching Fellows:

  • Weiyuan Gong (wgong@g.harvard.edu), Office hours: Thursday 11-12, SEC 3.317
  • Marvin Li (marvinli@college.harvard.edu), Office hours: Friday 4-5, SEC SEC 3.317
Pset office hour: Tuesday 3:15-5:15pm, SEC 3.317

Canvas (for lecture recordings only)

Gradescope

Ed (discussion)

Course Policies: See syllabus for detailed overview.


Announcements

  • Pset 4 due November 11, 11:59pm.

  • Pset 3 due October 29, 11:59pm.

  • Pset 2 due October 9, 11:59pm.

  • Pset 1 due September 20, 11:59pm.

  • Notifications of course placement will be sent out by Aug 31 at the latest.

  • Pset 0 and course application form due September 3, 4:59pm


Assignments

Assignments will be posted below and in the course Overleaf when they become available.

  • Pset 0 (due 09/03 at 4:59pm): PDF
  • Pset 1 (due 09/20 at 11:59pm): PDF
  • Pset 2 (due 10/09 at 11:59pm): PDF
  • Pset 3 (due 10/29 at 11:59pm): PDF
  • Pset 4 (due 11/11 at 11:59pm): PDF


Miscellaneous Materials


Lectures

Date Topic Lecture Notes Resources
Sep 4 Logistics, vignette: diffraction limit and learning theory instructor notes, slides
Sep 9 Tensors I: Jennrich's algorithm, applications (super-resolution, Gaussian mixtures, independent component analysis) instructor notes, slides, scribe notes
Sep 11 Tensors II: Iterative methods (gradient descent, tensor power method, alternating least squares) instructor notes, slides, scribe notes
Sep 16 Tensors III: overcomplete tensor decomposition via smoothed analysis instructor notes, slides, scribe notes
Sep 18 Sum-of-squares I: pseudo-distributions, sum-of-squares proofs instructor notes, slides
Sep 23 Sum-of-squares II: robust regression and certifiable hypercontractivity instructor notes, scribe notes
Sep 25 Sum-of-squares III: mixtures of spherical Gaussians, entropy maximization instructor notes, slides, scribe notes
Sep 30 Supervised learning I: PAC learning, Fourier analysis, semirandom noise slides, scribe notes
Oct 2 Supervised learning II: noise sensitivity, one-hidden layer MLPs, Hermite analysis, CSQ algorithms instructor notes, slides, scribe notes
Oct 7 Supervised learning III: filtered PCA instructor notes, slides, scribe notes
Oct 9 Supervised learning IV: linearized networks instructor notes, slides, scribe notes
Oct 16 Supervised learning V: mean field limit instructor notes, slides, scribe notes
Oct 21 Computational complexity I: hardness basics, SQ lower bounds instructor notes, slides, scribe notes
Oct 23 Computational complexity II: SQ dimension (application to MLPs), statistical dimension (applications to Gaussian mixtures, generative models) instructor notes, slides, scribe notes
Oct 28 Computational complexity III: cryptographic lower bounds, continuous LWE instructor notes, slides
Oct 30 Computational complexity IV: continuous LWE (cont'd) instructor notes (board talk only)
Nov 4 Statistical physics I: intro to graphical models and variational inference instructor notes (board talk only), scribe notes
Nov 6 Statistical physics II: belief propagation, Bethe free energy instructor notes, slides, scribe notes
Nov 11 Statistical physics III: derivation of AMP, state evolution, Z2 synchronization instructor notes, slides, scribe notes
  • See the extensive survey on AMP by [Feng-Venkataramanan-Rush-Samworth '21] for pointers to the large literature on AMP
  • Tutorials on AMP: survey on mean-field spin glass techniques: [Montanari-Sen '22] (see Section 2.4 for derivation of AMP and proof sketch for state evolution), lecture notes on high-dimensional inference [El Alaoui '21] (see Lectures 8 to 12 for details on state evolution), lecture notes on statistical physics / optimization / learning: [Krzakala-Zdeborova '22] (see chapter 8 for cavity method perspective on AMP), short lecture on AMP for Z2 synchronization [Wein '17], video recording of tutorial on statistical physics methods: [El Alaoui '22]
  • Survey on low rank matrix estimation: [Miolane '19]
  • First work rigorously proving state evolution for AMP (for solving TAP equation) [Bolthausen '12], state evolution result from class: [Bayati-Montanari '10]
Nov 13 Statistical physics IV: optimality of AMP for Z2 synchronization instructor notes, slides, scribe notes
Nov 18 Generative modeling I: stochastic calculus, Langevin diffusion instructor notes, slides
Nov 20 Generative modeling II: intro to diffusion models instructor notes, slides, scribe notes
Nov 25 Generative modeling III: Girsanov's theorem and discretization analysis instructor notes, slides
Dec 2 Generative modeling IV: Probability flow ODE, reverse transport inequalities, and shifted composition slides
Dec 4 Generative modeling V: polynomial approximation and diffusions for learning Gaussian mixtures instructor notes, slides