Theory and Applications of Digital Speech Processing, 1st edition
Published by Pearson (March 3, 2010) © 2011
- Lawrence Rabiner, Bell Labs, Murray Hill, NJ
- Ronald Schafer
For graduate students in digital signal processing, and undergraduate students in Electrical and Computer Engineering. Also suitable for practicing engineers in speech processing.
Clear, up-to-date, hands-on coverage of digital speech processing.
This new text presents the basic concepts and theories of speech processing with clarity and currency, while providing hands-on computer-based laboratory experiences for students. The material is organized in a manner that builds a strong foundation of basics first, and then concentrates on a range of signal processing methods for representing and processing the speech signal.
Building from basic concepts to application of the material.
Following the discussion of the basic signal processing methods, the book shows how speech algorithms can be built on top of various speech representations, and ultimately how applications to speech and audio coding, synthesis, and recognition can be realized based entirely on ideas discussed in earlier chapters of the book.
A logical and intuitive order for teaching the concepts, which allows instructors to tailor their lectures as needed. Every chapter provides insights into the theory, concepts, and implementations of a range of short-time signal processing methods, and their use in realizing speech algorithms and speech applications:
- Chapter 2 offers a review of the most important concepts from digital signal processing that form the basis for the signal processing presented in the book.
- Chapters 3-4 discuss the basics of speech production and perception, providing a basis for understanding how and why short-time analysis techniques work in a range of speech algorithms and applications.
- Chapter 5 describes the terminal analog model of speech production and synthesis, which forms the basis for Chapters 6-9. These chapters present the signal processing aspects of processing speech in, respectively, the time domain, the frequency domain, the cepstral domain, and the linear predictive modeling domain.
- Chapter 10 describes a range of speech algorithms, showing how each one exploits the properties of a range of short-time representations of the speech signal.
- Chapters 11-14 discuss a range of applications of short-time speech processing to speech and audio coding, speech synthesis, and speech recognition and understanding.
Homework problems that include a range of practical MATLAB exercises, which reinforce concepts taught in lectures by providing students with first-hand experiences implementing a range of concepts and algorithms in speech processing. See Chapters 2-11.
Textbook references that offer students the ability to further explore any topic of interest. See Chapter 1.
An accompanying Website, which vividly illustrates speech processing algorithms and systems through audio examples accompanied by clear explanations. Also included on the Website is an extensive set of MATLAB programs for illustrating many of the concepts in the text. See Chapters 1, 2, 7, 11 and 13.
CHAPTER 1 Introduction to Digital Speech Processing
1.1 The Speech Signal
1.2 The Speech Stack
1.3 Applications of Digital Speech Processing
1.4 Comment on the References
1.5 Summary
CHAPTER 2 Review of Fundamentals of Digital Signal Processing
2.1 Introduction
2.2 Discrete-Time Signals and Systems
2.3 Transform Representation of Signals and Systems
2.4 Fundamentals of Digital Filters
2.5 Sampling
2.6 Summary
Problems
CHAPTER 3 Fundamentals of Human Speech Production
3.1 Introduction
3.2 The Process of Speech Production
3.3 Short-Time Fourier Representation of Speech
3.4 Acoustic Phonetics
3.5 Distinctive Features of the Phonemes of American English
3.6 Summary
Problems
CHAPTER 4 Hearing, Auditory Models, and Speech Perception
4.1 Introduction
4.2 The Speech Chain
4.3 Anatomy and Function of the Ear
4.4 The Perception of Sound
4.5 Auditory Models
4.6 Human Speech Perception Experiments
4.7 Measurement of Speech Quality and Intelligibility
4.8 Summary
Problems
CHAPTER 5 Sound Propagation in the Human Vocal Tract
5.1 The Acoustic Theory of Speech Production
5.2 Lossless Tube Models
5.3 Digital Models for Sampled Speech Signals
5.4 Summary
Problems
CHAPTER 6 Time-Domain Methods for Speech Processing
6.1 Introduction
6.2 Short-Time Analysis of Speech
6.3 Short-Time Energy and Short-Time Magnitude
6.4 Short-Time Zero-Crossing Rate
6.5 The Short-Time Autocorrelation Function
6.6 The Modified Short-Time Autocorrelation Function
6.7 The Short-Time Average Magnitude Difference Function
6.8 Summary
Problems
CHAPTER 7 Frequency-Domain Representations
7.1 Introduction
7.2 Discrete-Time Fourier Analysis
7.3 Short-Time Fourier Analysis
7.4 Spectrographic Displays
7.5 Overlap Addition Method of Synthesis
7.6 Filter Bank Summation Method of Synthesis
7.7 Time-Decimated Filter Banks
7.8 Two-Channel Filter Banks
7.9 Implementation of the FBS Method Using the FFT
7.10 OLA Revisited
7.11 Modifications of the STFT
7.12 Summary
Problems
CHAPTER 8 The Cepstrum and Homomorphic Speech Processing
8.1 Introduction
8.2 Homomorphic Systems for Convolution
8.3 Homomorphic Analysis of the Speech Model
8.4 Computing the Short-Time Cepstrum and Complex Cepstrum of Speech
8.5 Homomorphic Filtering of Natural Speech
8.6 Cepstrum Analysis of All-Pole Models
8.7 Cepstrum Distance Measures
8.8 Summary
Problems
CHAPTER 9 Linear Predictive Analysis of Speech Signals
9.1 Introduction
9.2 Basic Principles of Linear Predictive Analysis
9.3 Computation of the Gain for the Model
9.4 Frequency Domain Interpretations of Linear Predictive Analysis
9.5 Solution of the LPC Equations
9.6 The Prediction Error Signal
9.7 Some Properties of the LPC Polynomial A(z)
9.8 Relation of Linear Predictive Analysis to Lossless Tube Models
9.9 Alternative Representations of the LP Parameters
9.10 Summary
Problems
CHAPTER 10 Algorithms for Estimating Speech Parameters
10.1 Introduction
10.2 Median Smoothing and Speech Processing
10.3 Speech-Background/Silence Discrimination
10.4 A Bayesian Approach to Voiced/Unvoiced/Silence Detection
10.5 Pitch Period Estimation (Pitch Detection)
10.6 Formant Estimation
10.7 Summary
Problems
CHAPTER 11 Digital Coding of Speech Signals
11.1 Introduction
11.2 Sampling Speech Signals
11.3 A Statistical Model for Speech
11.4 Instantaneous Quantization
11.5 Adaptive Quantization
11.6 Quantizing of Speech Model Parameters
11.7 General Theory of Differential Quantization
11.8 Delta Modulation
11.9 Differential PCM (DPCM)
11.10 Enhancements for ADPCM Coders
11.11 Analysis-by-Synthesis Speech Coders
11.12 Open-Loop Speech Coders
11.13 Applications of Speech Coders
11.14 Summary
Problems
CHAPTER 12 Frequency-Domain Coding of Speech and Audio
12.1 Introduction
12.2 Historical Perspective
12.3 Subband Coding
12.4 Adaptive Transform Coding
12.5 A Perception Model for Audio Coding
12.6 MPEG-1 Audio Coding Standard
12.7 Other Audio Coding Standards
12.8 Summary
Problems
CHAPTER 13 Text-to-Speech Synthesis Methods
13.1 Introduction
13.2 Text Analysis
13.3 Evolution of Speech Synthesis Methods
13.4 Early Speech Synthesis Approaches
13.5 Unit Selection Methods
13.6 TTS Future Needs
13.7 Visual TTS
13.8 Summary
Problems
CHAPTER 14 Automatic Speech Recognition and Natural Language Understanding
14.1 Introduction
14.2 Basic ASR Formulation
14.3 Overall Speech Recognition Process
14.4 Building a Speech Recognition System
14.5 The Decision Processes in ASR
14.6 Step 3: The Search Problem
14.7 Simple ASR System: Isolated Digit Recognition
14.8 Performance Evaluation of Speech Recognizers
14.9 Spoken Language Understanding
14.10 Dialog Management and Spoken Language Generation
14.11 User Interfaces
14.12 Multimodal User Interfaces
14.13 Summary
Problems
Appendices
A Speech and Audio Processing Demonstrations
B Solution of Frequency-Domain Differential Equations
Bibliography
Index
From 1962 through 1964, Dr. Rabiner participated in the cooperative program in Electrical Engineering at AT&T Bell Laboratories, Whippany and Murray Hill, New Jersey. During this period Dr. Rabiner worked on digital circuit design, military communications problems, and problems in binaural hearing. Dr. Rabiner joined AT&T Bell Labs in 1967 as a Member of the Technical Staff. He was promoted to Supervisor in 1972, Department Head in 1985, Director in 1990, and Functional Vice President in 1995. He joined the newly created AT&T Labs in 1996 as Director of the Speech and Image Processing Services Research Lab, and was promoted to Vice President of Research in 1998, where he managed a broad research program in communications, computing, and information sciences technologies. Dr. Rabiner retired from AT&T at the end of March 2002 and is now a Professor of Electrical and Computer Engineering at Rutgers University, and the Associate Director of the Center for Advanced Information Processing (CAIP) at Rutgers.
Dr. Rabiner is co-author of the books “Theory and Application of Digital Signal Processing” (Prentice-Hall, 1975), “Digital Processing of Speech Signals” (Prentice-Hall, 1978), “Multirate Digital Signal Processing” (Prentice-Hall, 1983), and “Fundamentals of Speech Recognition” (Prentice-Hall, 1993).
Dr. Rabiner is a member of Eta Kappa Nu, Sigma Xi, Tau Beta Pi, the National Academy of Engineering, the National Academy of Sciences, and a Fellow of the Acoustical Society of America, the IEEE, Bell Laboratories, and AT&T. He is a former President of the IEEE Acoustics, Speech, and Signal Processing Society, a former Vice-President of the Acoustical Society of America, a former editor of the ASSP Transactions, and a former member of the IEEE Proceedings Editorial Board.
Ronald W. Schafer is an electrical engineer notable for his contributions to digital signal processing.
After receiving his Ph.D. degree at MIT in 1968, he joined the Acoustics Research Department at Bell Laboratories, where he did research on digital signal processing and digital speech coding. He came to the Georgia Institute of Technology in 1974, where he stayed until joining Hewlett Packard in March 2005.
He has served as Associate Editor of IEEE Transactions on Acoustics, Speech, and Signal Processing and as Vice-President and President of the IEEE Signal Processing Society. He is a Life Fellow of the IEEE and a Fellow of the Acoustical Society of America.
He has received the IEEE Region 3 Outstanding Engineer Award, the 1980 IEEE Emanuel R. Piore Award, the Distinguished Professor Award at the Georgia Institute of Technology, the 1992 IEEE Education Medal and the 2010 IEEE Jack S. Kilby Signal Processing Medal.