CHAPTER 1 Introduction to Digital Speech Processing 1
1.1 The Speech Signal 3
1.2 The Speech Stack 8
1.3 Applications of Digital Speech Processing 10
1.4 Comment on the References 15
1.5 Summary 17
CHAPTER 2 Review of Fundamentals of Digital Signal Processing 18
2.1 Introduction 18
2.2 Discrete-Time Signals and Systems 18
2.3 Transform Representation of Signals and Systems 22
2.4 Fundamentals of Digital Filters 33
2.5 Sampling 44
2.6 Summary 56
Problems 56
CHAPTER 3 Fundamentals of Human Speech Production 67
3.1 Introduction 67
3.2 The Process of Speech Production 68
3.3 Short-Time Fourier Representation of Speech 81
3.4 Acoustic Phonetics 86
3.5 Distinctive Features of the Phonemes of American English 108
3.6 Summary 110
Problems 110
CHAPTER 4 Hearing, Auditory Models, and Speech Perception 124
4.1 Introduction 124
4.2 The Speech Chain 125
4.3 Anatomy and Function of the Ear 127
4.4 The Perception of Sound 133
4.5 Auditory Models 150
4.6 Human Speech Perception Experiments 158
4.7 Measurement of Speech Quality and Intelligibility 162
4.8 Summary 166
Problems 167
CHAPTER 5 Sound Propagation in the Human Vocal Tract 170
5.1 The Acoustic Theory of Speech Production 170
5.2 Lossless Tube Models 200
5.3 Digital Models for Sampled Speech Signals 219
5.4 Summary 228
Problems 228
CHAPTER 6 Time-Domain Methods for Speech Processing 239
6.1 Introduction 239
6.2 Short-Time Analysis of Speech 242
6.3 Short-Time Energy and Short-Time Magnitude 248
6.4 Short-Time Zero-Crossing Rate 257
6.5 The Short-Time Autocorrelation Function 265
6.6 The Modified Short-Time Autocorrelation Function 273
6.7 The Short-Time Average Magnitude Difference Function 275
6.8 Summary 277
Problems 278
CHAPTER 7 Frequency-Domain Representations 287
7.1 Introduction 287
7.2 Discrete-Time Fourier Analysis 289
7.3 Short-Time Fourier Analysis 292
7.4 Spectrographic Displays 312
7.5 Overlap Addition Method of Synthesis 319
7.6 Filter Bank Summation Method of Synthesis 331
7.7 Time-Decimated Filter Banks 340
7.8 Two-Channel Filter Banks 348
7.9 Implementation of the FBS Method Using the FFT 358
7.10 OLA Revisited 365
7.11 Modifications of the STFT 367
7.12 Summary 379
Problems 380
CHAPTER 8 The Cepstrum and Homomorphic Speech Processing 399
8.1 Introduction 399
8.2 Homomorphic Systems for Convolution 401
8.3 Homomorphic Analysis of the Speech Model 417
8.4 Computing the Short-Time Cepstrum and Complex Cepstrum of Speech 429
8.5 Homomorphic Filtering of Natural Speech 440
8.6 Cepstrum Analysis of All-Pole Models 456
8.7 Cepstrum Distance Measures 459
8.8 Summary 466
Problems 466
CHAPTER 9 Linear Predictive Analysis of Speech Signals 473
9.1 Introduction 473
9.2 Basic Principles of Linear Predictive Analysis 474
9.3 Computation of the Gain for the Model 486
9.4 Frequency Domain Interpretations of Linear Predictive Analysis 490
9.5 Solution of the LPC Equations 505
9.6 The Prediction Error Signal 527
9.7 Some Properties of the LPC Polynomial A(z) 538
9.8 Relation of Linear Predictive Analysis to Lossless Tube Models 546
9.9 Alternative Representations of the LP Parameters 551
9.10 Summary 560
Problems 560
CHAPTER 10 Algorithms for Estimating Speech Parameters 578
10.1 Introduction 578
10.2 Median Smoothing and Speech Processing 580
10.3 Speech-Background/Silence Discrimination 586
10.4 A Bayesian Approach to Voiced/Unvoiced/Silence Detection 595
10.5 Pitch Period Estimation (Pitch Detection) 603
10.6 Formant Estimation 635
10.7 Summary 645
Problems 645
CHAPTER 11 Digital Coding of Speech Signals 663
11.1 Introduction 663
11.2 Sampling Speech Signals 667
11.3 A Statistical Model for Speech 669
11.4 Instantaneous Quantization 676
11.5 Adaptive Quantization 706
11.6 Quantizing of Speech Model Parameters 718
11.7 General Theory of Differential Quantization 732
11.8 Delta Modulation 743
11.9 Differential PCM (DPCM) 759
11.10 Enhancements for ADPCM Coders 768
11.11 Analysis-by-Synthesis Speech Coders 783
11.12 Open-Loop Speech Coders 806
11.13 Applications of Speech Coders 814
11.14 Summary 819
Problems 820
CHAPTER 12 Frequency-Domain Coding of Speech and Audio 842
12.1 Introduction 842
12.2 Historical Perspective 844
12.3 Subband Coding 850
12.4 Adaptive Transform Coding 861
12.5 A Perception Model for Audio Coding 866
12.6 MPEG-1 Audio Coding Standard 881
12.7 Other Audio Coding Standards 894
12.8 Summary 894
Problems 895
CHAPTER 13 Text-to-Speech Synthesis Methods 907
13.1 Introduction 907
13.2 Text Analysis 908
13.3 Evolution of Speech Synthesis Methods 914
13.4 Early Speech Synthesis Approaches 916
13.5 Unit Selection Methods 926
13.6 TTS Future Needs 942
13.7 Visual TTS 943
13.8 Summary 947
Problems 947
CHAPTER 14 Automatic Speech Recognition and Natural Language Understanding 950
14.1 Introduction 950
14.2 Basic ASR Formulation 952
14.3 Overall Speech Recognition Process 953
14.4 Building a Speech Recognition System 954
14.5 The Decision Processes in ASR 957
14.6 Step 3: The Search Problem 971
14.7 Simple ASR System: Isolated Digit Recognition 972
14.8 Performance Evaluation of Speech Recognizers 974
14.9 Spoken Language Understanding 977
14.10 Dialog Management and Spoken Language Generation 980
14.11 User Interfaces 983
14.12 Multimodal User Interfaces 984
14.13 Summary 984
Problems 985
Appendices
A Speech and Audio Processing Demonstrations 993
B Solution of Frequency-Domain Differential Equations 1005
Bibliography 1009
Index 1033