BSDA5004 - Large Language Models
| Field | Value |
|---|---|
| Course Code | BSDA5004 |
| Level | Degree Level Course |
| Credits | 4 |
| Type | Elective |
| Pre-requisites | BSCS3004 - Deep Learning |
Description
- Understand the Transformer architecture
- Understand the concepts of pretraining and fine-tuning language models
- Compare and contrast different types of tokenizers such as BPE, WordPiece, and SentencePiece (see the sketch after this list)
- Understand the different LLM architectures: encoder-decoder, encoder-only, and decoder-only
- Explore common pretraining datasets such as C4, mC4, the Pile, the Stack, and so on
- Address the challenges of applying vanilla attention mechanisms to long-range context windows
- Apply different fine-tuning techniques to large language models
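
The course does not prescribe a particular library, but as a rough illustration of how BPE, WordPiece, and SentencePiece segment the same text differently, here is a minimal sketch assuming the Hugging Face `transformers` library; the model names are our own choice, picked because GPT-2 ships a byte-level BPE tokenizer, BERT a WordPiece tokenizer, and T5 a SentencePiece tokenizer:

```python
# Minimal sketch (not course material): contrast three subword tokenizers
# on the same sentence. Assumes `pip install transformers` and network
# access to download the pretrained tokenizer files.
from transformers import AutoTokenizer

text = "Tokenization splits rare words into subword units."

# GPT-2: byte-level BPE; BERT: WordPiece; T5: SentencePiece (unigram).
for name in ["gpt2", "bert-base-uncased", "t5-small"]:
    tok = AutoTokenizer.from_pretrained(name)
    print(f"{name:20s} {tok.tokenize(text)}")
```

Running this shows the characteristic markers of each scheme: GPT-2's `Ġ` word-boundary bytes, BERT's `##` continuation prefixes, and T5's `▁` whitespace symbols.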
Weekly Syllabus
| Week | Topic |
|---|---|
| Week 1 | Transformers: Introduction to transformers - Self-attention - Cross-attention - Masked attention - Positional encoding (see the attention sketch after this table) |
| Week 2 | A deep dive into the number of parameters, computational complexity, and FLOPs - Introduction to language modeling |
| Week 3 | Causal Language Modeling: What is a language model? - Generative Pre-trained Transformers (GPT) - Training and inference |
| Week 4 | Masked Language Modeling: Bidirectional Encoder Representations from Transformers (BERT) - Fine-tuning - A deep dive into tokenization: BPE, SentencePiece |
| Week 5 | Bigger Picture: T5, a deep dive into text-to-text (the genesis of prompting), taxonomy of models, the road ahead |
| Week 6 | Data: Datasets, pipelines, effectiveness of clean data; Architecture: Types of attention, positional encoding (PE) techniques, scaling techniques |
| Week 7 | Training: Revisiting optimizers, Lion vs. Adam, loss functions, learning schedules, gradient clipping, typical failures during training |
| Week 8 | Fine-Tuning: Prompt tuning, Multi-task fine-tuning, Parameter-Efficient Fine-Tuning (PEFT), Instruction fine-tuning datasets |
| Week 9 | Benchmarks: MMLU, BIG-bench, HELM, OpenLLM, Evaluation frameworks |
| Week 10 | Training Large Models: Mixed-precision training, activation checkpointing, 3D parallelism, ZeRO, BLOOM as a case study |
| Week 11 | Scaling Laws: Chinchilla, Gopher, PaLM 2 |
| Week 12 | Recent advances |
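
Week 1's scaled dot-product self-attention can be summarized in a few lines of code. The following is a minimal NumPy sketch of a single attention head, written purely for illustration; the function name, shapes, and random projections are all our own assumptions, not course code:

```python
# Minimal sketch (not course material): single-head scaled dot-product
# self-attention, Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V.
import numpy as np

def self_attention(X, Wq, Wk, Wv):
    """X: (seq_len, d_model); Wq/Wk/Wv: (d_model, d_k) projections."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv
    scores = Q @ K.T / np.sqrt(Q.shape[-1])         # (seq_len, seq_len)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                              # (seq_len, d_k)

rng = np.random.default_rng(0)
X = rng.normal(size=(4, 8))                         # 4 tokens, d_model = 8
Wq, Wk, Wv = [rng.normal(size=(8, 8)) for _ in range(3)]
print(self_attention(X, Wq, Wk, Wv).shape)          # -> (4, 8)
```

Masked (causal) attention, also covered in Week 1, simply adds a large negative value to the upper-triangular entries of `scores` before the softmax so that each position attends only to earlier positions.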
Books & Resources
Prescribed Books
There is no single prescribed textbook for the course; the suggested readings are research papers and articles.
About the Instructors
Prof. Mitesh M. Khapra
Associate Professor,
Department of Computer Science and Engineering,
IIT Madras
Mitesh M. Khapra is an Associate Professor in the Department of Computer Science and Engineering at IIT Madras and is affiliated with the Robert Bosch Centre for Data Science and AI. He is also a co-founder of One Fourth Labs, a startup whose mission is to design and deliver affordable hands-on courses on AI and related topics, and of AI4Bharat, a voluntary community that aims to provide AI-based solutions to India-specific problems. His research interests span Deep Learning, Multimodal Multilingual Processing, Natural Language Generation, Dialog Systems, Question Answering, and Indic Language Processing. Prior to IIT Madras, he was a Researcher at IBM Research India for four and a half years, where he worked on several problems in Statistical Machine Translation, Cross-Language Learning, Multimodal Learning, Argument Mining, and Deep Learning. Before IBM, he completed his PhD and M.Tech at IIT Bombay, in January 2012 and July 2008 respectively. During his PhD he was a recipient of the IBM PhD Fellowship (2011) and the Microsoft Rising Star Award (2011). He is also a recipient of the Google Faculty Research Award (2018), the IITM Young Faculty Recognition Award (2019), and the Prof. B. Yegnanarayana Award for Excellence in Research and Teaching (2020).
Other courses by the same instructor: BSCS3004 - Deep Learning and BSDA5013 - Deep Learning Practice