Lecture 1
Trustworthy Machine Learning (Spring 2025)
Course Overview
Theme I
Foundations of Modern Language Models
From classical sequence models to attention, transformers, and pretrained language models.
Lecture 2
N-Grams, RNNs, LSTMs, Seq2Seq
Lecture 3
Attention Mechanism & Transformers
Lecture 4
BERT: Bidirectional Encoder Representations from Transformers (NAACL 2019)
Lecture 5
GPT-2: Language Models are Unsupervised Multitask Learners (OpenAI Technical Report, 2019)
Theme II
Scaling, Adaptation, and Alignment
Scaling laws, few-shot learning, instruction tuning, human feedback, and efficient open foundation models.
Lecture 6
Scaling Laws for Neural Language Models (arXiv 2020)
Lecture 7
GPT-3: Language Models are Few-Shot Learners (NeurIPS 2020)
Lecture 8
FLAN: Finetuned Language Models Are Zero-Shot Learners (ICLR 2022)
Lecture 9
RLHF: Training Language Models to Follow Instructions with Human Feedback (NeurIPS 2022)
Lecture 10
LLaMA: Open and Efficient Foundation Language Models (arXiv 2023)
Theme III
Capabilities, Reasoning, and Knowledge Augmentation
Specialized capabilities, explicit reasoning, retrieval, search over thought processes, and domain knowledge.
Lecture 11
CODEX: Evaluating LLMs Trained on Code (arXiv 2021)
Lecture 12
Chain-of-Thought: Eliciting Reasoning in LLMs (NeurIPS 2022)
Lecture 13
Retrieval-Augmented Generation (RAG) (NeurIPS 2020)
Lecture 14
Tree of Thoughts (NeurIPS 2023)
Lecture 15
DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning (arXiv 2025)
Lecture 16
LLMs Encode Clinical Knowledge (Nature 2023)
Theme IV
LLM Safety: Jailbreaking, Defenses, and Red Teaming
How safety training fails, how adversarial attacks are constructed, and how defenses and evaluation methods respond.
Lecture 17
Jailbroken: How Does LLM Safety Training Fail? (NeurIPS 2023)
Lecture 18
Greedy Coordinate Gradient (GCG) Jailbreak (arXiv 2023)
Lecture 19
Jailbreaking Black Box Large Language Models in Twenty Queries (IEEE SaTML 2025)
Lecture 20
SmoothLLM: Defending Large Language Models Against Jailbreaking Attacks (ICLR 2024)
Lecture 21
Rainbow Teaming (NeurIPS 2024)
Theme V
Privacy, Memorization, and Training-Data Leakage
Training-data extraction, memorization measurement, scalable attacks, and membership inference.
Lecture 22
Extracting Training Data from LLMs (USENIX Security 2021)
Lecture 23
Quantifying Memorization in LLMs (ICLR 2023)
Lecture 24
Scalable Extraction of Training Data from Production LLMs (arXiv 2023)
Lecture 25
Scalable Extraction from Aligned LLMs (ICLR 2025)
Lecture 26