Algorithms for Data Science
CS 224, Fall 2023

Course Details

This is a graduate-level topics class on algorithmic challenges arising in modern machine learning and data science more broadly. The course will touch upon a number of well-studied problems (generative modeling, deep learning theory, adversarial robustness, inverse problems, inference) and frameworks for algorithm design (gradient descent, spectral methods, tensor/moment methods, message passing, convex programming hierarchies, Markov chains). As we will see, proving rigorous guarantees for these problems often draws upon a wide range of techniques from stochastic calculus, harmonic analysis, random matrix theory, algebra, and statistical physics. We will also explore the myriad modeling challenges that go into building theory for ML and discuss prominent paradigms (semi-random models, smoothed analysis, oracles) for going beyond traditional worst-case analysis.

Time/Location: MW 2:15-3:30, SEC LL2-224

Instructor: Sitan Chen (sitan@seas.harvard.edu)
Office hours: SEC 3.325, Thursday 4-5

Teaching Fellows:

  • David Brewster (dbrewster@g.harvard.edu), Office hours: Maxwell-Dworkin 123, Tuesday 5-6
  • Depen Morwani (dmorwani@g.harvard.edu), Office hours: SEC 1.404, Wednesday 11-12
  • Rosie Zhao (rosiezhao@g.harvard.edu), Office hours: SEC 2.330, Monday 11-12
Pset office hour: SEC 3.314, Thursday 4-6 on weeks with a pset deadline

Canvas (for lecture recordings and announcements)

Gradescope

Ed (discussion)

Course Policies: See syllabus for detailed overview.


Student projects


Announcements

  • 10/24: Pset 4 (final) is out, see below.

  • 10/10: Pset 3 is out, see below.

  • 10/2: See canvas announcement for information about final project.

  • 9/25: Pset 2 is out, see below.

  • 9/11: Pset 1 is out, see below.

  • See Canvas for link to sign up for pset office hour preferences

  • Notifications of course placement will be sent out Thursday afternoon, Aug 31.


Assignments

  • Scribing template and instructions: TeX, bib file, PDF
  • Pset 0 (due 08/29 at 11:59pm): PDF
  • Pset 1 (due 09/22 at 3:59pm): PDF
  • Pset 2 (due 10/09 at 11:59pm): PDF
  • Pset 3 (due 10/23 at 11:59pm): PDF
  • Pset 4 (due 11/06 at 11:59pm): PDF


Miscellaneous Materials


Lectures

Date Topic Lecture Notes Resources
Sep 6 Logistics, vignette: diffraction limit and learning theory instructor notes, slides
Sep 11 Tensors I: Jennrich's algorithm, applications (super-resolution, Gaussian mixtures, independent component analysis) instructor notes, slides, scribe notes
Sep 13 Tensors II: Iterative methods (gradient descent, tensor power method, alternating least squares) instructor notes, slides, scribe notes 1, scribe notes 2
Sep 18 Tensors III: overcomplete tensor decomposition via smoothed analysis instructor notes, slides, scribe notes 1, scribe notes 2
Sep 20 Sum-of-squares I: pseudo-distributions, application to robust regression instructor notes, slides, scribe notes
Sep 25 Sum-of-squares II: mixtures of spherical Gaussians, certifiable hypercontractivity instructor notes, slides, scribe notes
Sep 27 Sum-of-squares III: high-entropy constraints, noisy orthogonal tensor decomposition instructor notes (board talk only), scribe notes
Oct 2 Sum-of-squares IV: rounding SoS relaxations with Jennrich's, overcomplete tensor decomposition instructor notes, slides, scribe notes
Oct 4 Robust statistics I: motivation, iterative filtering instructor notes, slides, scribe notes
Oct 11 Robust statistics II: list-decodable learning, subspace isotropic filtering instructor notes (board talk only), supplemental instructor notes [pdf], scribe notes
Oct 16 Robust statistics III: learning halfspaces under Massart noise instructor notes, slides, scribe notes
Oct 18 Supervised learning I: PAC basics, low-degree algorithm, L1/L2 polynomial regression, polynomial threshold functions slides, scribe notes
Oct 23 Supervised learning II: noise sensitivity, one-hidden layer MLPs, Hermite analysis, CSQ algorithms instructor notes, slides, scribe notes
Oct 25 Supervised learning III: filtered PCA instructor notes, slides, scribe notes
Oct 30 Supervised learning IV: linearized networks instructor notes, slides, scribe notes
Nov 1 Supervised learning V: mean field limit instructor notes, slides, scribe notes
Nov 6 Computational complexity I: hardness basics, SQ lower bounds, noisy parity instructor notes, slides, scribe notes
Nov 8 Computational complexity II: SQ dimension (application to MLPs), statistical dimension (applications to Gaussian mixtures, generative models) instructor notes, slides, scribe notes
Nov 13 Computational complexity III: cryptographic lower bounds, Daniely-Vardi lifting instructor notes, slides, scribe notes
Nov 15 Statistical physics I: intro to graphical models and variational inference instructor notes (board talk only), scribe notes
Nov 20 Statistical physics II: belief propagation, Bethe free energy instructor notes, slides, supplemental instructor notes [pdf], scribe notes
Nov 27 Statistical physics III: derivation of AMP, state evolution, Z2 synchronization instructor notes, slides, scribe notes
  • See the extensive survey on AMP by [Feng-Venkataramanan-Rush-Samworth '21] for pointers to the large literature on AMP
  • Tutorials on AMP: survey on mean-field spin glass techniques: [Montanari-Sen '22] (see Section 2.4 for derivation of AMP and proof sketch for state evolution), lecture notes on high-dimensional inference [El Alaoui '21] (see Lectures 8 to 12 for details on state evolution), lecture notes on statistical physics / optimization / learning: [Krzakala-Zdeborova '22] (see chapter 8 for cavity method perspective on AMP), short lecture on AMP for Z2 synchronization [Wein '17], video recording of tutorial on statistical physics methods: [El Alaoui '22]
  • Survey on low rank matrix estimation: [Miolane '19]
  • First work rigorously proving state evolution for AMP (for solving TAP equation) [Bolthausen '12], state evolution result from class: [Bayati-Montanari '10]
Nov 29 Statistical physics IV: optimality of AMP for Z2 synchronization instructor notes, slides, scribe notes
Nov 29 Diffusion models instructor notes, slides, scribe notes