Foreword xiii
Preface xv
Acknowledgments xvii
About the Authors xix
Chapter 1: Why CUDA? Why Now? 1
1.1 Chapter Objectives 2
1.2 The Age of Parallel Processing 2
1.3 The Rise of GPU Computing 4
1.4 CUDA 6
1.5 Applications of CUDA 8
1.6 Chapter Review 11
Chapter 2: Getting Started 13
2.1 Chapter Objectives 14
2.2 Development Environment 14
2.3 Chapter Review 19
Chapter 3: Introduction to CUDA C 21
3.1 Chapter Objectives 22
3.2 A First Program 22
3.3 Querying Devices 27
3.4 Using Device Properties 33
3.5 Chapter Review 35
Chapter 4: Parallel Programming in CUDA C 37
4.1 Chapter Objectives 38
4.2 CUDA Parallel Programming 38
4.3 Chapter Review 57
Chapter 5: Thread Cooperation 59
5.1 Chapter Objectives 60
5.2 Splitting Parallel Blocks 60
5.3 Shared Memory and Synchronization 75
5.4 Chapter Review 94
Chapter 6: Constant Memory and Events 95
6.1 Chapter Objectives 96
6.2 Constant Memory 96
6.3 Measuring Performance with Events 108
6.4 Chapter Review 114
Chapter 7: Texture Memory 115
7.1 Chapter Objectives 116
7.2 Texture Memory Overview 116
7.3 Simulating Heat Transfer 117
7.4 Chapter Review 137
Chapter 8: Graphics Interoperability 139
8.1 Chapter Objectives 140
8.2 Graphics Interoperation 140
8.3 GPU Ripple with Graphics Interoperability 147
8.4 Heat Transfer with Graphics Interop 154
8.5 DirectX Interoperability 160
8.6 Chapter Review 161
Chapter 9: Atomics 163
9.1 Chapter Objectives 164
9.2 Compute Capability 164
9.3 Atomic Operations Overview 168
9.4 Computing Histograms 170
9.5 Chapter Review 183
Chapter 10: Streams 185
10.1 Chapter Objectives 186
10.2 Page-Locked Host Memory 186
10.3 CUDA Streams 192
10.4 Using a Single CUDA Stream 192
10.5 Using Multiple CUDA Streams 198
10.6 GPU Work Scheduling 205
10.7 Using Multiple CUDA Streams Effectively 208
10.8 Chapter Review 211
Chapter 11: CUDA C on Multiple GPUs 213
11.1 Chapter Objectives 214
11.2 Zero-Copy Host Memory 214
11.3 Using Multiple GPUs 224
11.4 Portable Pinned Memory 230
11.5 Chapter Review 235
Chapter 12: The Final Countdown 237
12.1 Chapter Objectives 238
12.2 CUDA Tools 238
12.3 Written Resources 244
12.4 Code Resources 246
12.5 Chapter Review 248
Appendix A: Advanced Atomics 249
A.1 Dot Product Revisited 250
A.2 Implementing a Hash Table 258
A.3 Appendix Review 277
Index 279