BSEE4001 - Speech Technology

Course Code: BSEE4001
Level: Degree Level Course
Credits: 4
Type: Elective
Pre-requisites: None
Videos: YouTube Playlist

📖 Description

The following are the suggested books for the course:

πŸ—“οΈ Weekly Syllabus

Week 1: Review of Signals and Systems, Continuous time signals and transforms; Discrete time signals, Discrete Fourier transform, Autocorrelation and Cross-Cor
Week 2: Acoustic Feature Analysis of Speech Signals I, II; Gaussian mixture models (GMM), universal background model (UBM-GMM), singular value decomposition (S
Week 3: Hidden Markov model (HMM), Examples of HMM based approach for ASR, TTS, speaker diarization; Information bottleneck (IB) based clustering for diarizati
Week 4: Introduction and History of ASR and TTS; Components of ASR: Acoustic Modelling, Pronunciation Model (Lexicon) and language modelling (N-Gram Language mod
Week 5: HMMs for Acoustic Modelling - Monophone, Triphone; Speech Synthesis: unit selection, statistical parametric synthesis (HTS)
Week 6: Neural networks for building speech technologies; NN for Acoustic Modelling - Hybrid modelling - Hybrid-NN: DNN, CNN, TDNN
Week 7: End-to-End Approaches I: CTC, Encoder-decoder Architecture E2E with RNN
Week 8: Applications to ASR and TTS; End-to-End Approaches II
Week 9: Encoder-decoder Architecture E2E with transformers for ASR and TTS; Interesting Problems
Week 10: Speaker recognition/verification with i-vector, x-vector; Speaker diarization using x-vector
Week 11: Speaker adaptation (revisit i- and x-vectors, introduce s-vectors); Code Switched Speech recognition; Speech Translation
Week 12: Singing voice synthesis; voice conversion; generic voice synthesis

📚 Books & Resources

Prescribed Books
The following are the suggested books for the course:
L. R. Rabiner and R. W. Schafer, "Theory and Applications of Digital Speech Processing", Prentice Hall, Pearson, 2011.
L. R. Rabiner, B.-H. Juang and B. Yegnanarayana, "Fundamentals of Speech Recognition", Pearson, 2009 (Indian subcontinent adaptation).
Xuedong Huang, Alex Acero and Hsiao-Wuen Hon, "Spoken Language Processing: A Guide to Theory, Algorithm, and System Development", Prentice Hall PTR, 2001.

References:
Thomas Quatieri, "Discrete-Time Speech Signal Processing: Principles and Practice", Prentice Hall, 2001.
L. R. Rabiner and R. W. Schafer, "Digital Processing of Speech Signals", Pearson Education, 1993.
Recent research papers

πŸ“ About the Instructors

Prof. S. Umesh
Professor,
Department of Electrical Engineering,
IIT Madras
S. Umesh is a Professor of Electrical Engineering at IIT Madras. He received his PhD from the University of Rhode Island, USA, and completed a postdoctoral fellowship at the City University of New York. He has also been a visiting researcher at AT&T Research Laboratories, USA; the Machine Intelligence Laboratory, Cambridge University Engineering Department, UK; and the Department of Computer Science, RWTH Aachen, Germany.
He received the AICTE Career Award for Young Teachers in 1997 and the Alexander von Humboldt Research Fellowship in 2004. During his stint at Cambridge University in 2004, he was part of the U.S. DARPA Effective, Affordable, Reusable Speech-to-Text (EARS) programme. Similarly, in 2005 he was part of the TC-STAR project at RWTH Aachen for transcription of speech from the European Parliament's plenary sessions. From 2010 to 2016, he led a multi-institution consortium, funded by MeitY, to develop ASR systems for Indian languages in the agriculture domain. He is currently leading the ASR efforts for the Natural Language Translation Mission managed by the Office of the Principal Scientific Adviser to the Government of India.
Other courses by the same instructor: BSDA5013 - Deep Learning Practice
Prof. Hema A Murthy
Professor,
Department of Computer Science and Engineering,
IIT Madras
Faculty at the Department of Computer Science and Engineering, Indian Institute of Technology Madras.
