Data Science for All
Brennan Davis and Hunter Glanz, Cal Poly – San Luis Obispo
Crafted for any undergraduate student, Data Science for All equips students with the skills to navigate our data-rich world. The authors integrate engaging, hands-on activities and practical examples, making the book accessible to students with no prior statistics or programming experience. With flexible software options and comprehensive resources, it helps students move smoothly from learning to application. Well suited to introductory courses, it prepares students to navigate and use data in both their academic and professional journeys.


Integrated Review
Additional foundational math & stats support, including auto-graded exercises and videos, is available in the MyLab Statistics course.
Meet the authors

Brennan Davis
Professor and Director of Analytics Programs, California Polytechnic State University

Hunter Glanz
Associate Professor, California Polytechnic State University
Table of contents
Chapter 1: What is Data Science?
Section 1: Introduction to Data Science
Case Study: Netflix Uses Data Science for a Better Customer Experience
Case Study: NASA Uses Cloud Services to Stream Real-Time Mars Footage
Section 2: Data in Tables
Section 3: Data Preparation
Section 4: Data Analysis and Storytelling
Section 5: Data Science in Society and Industry
Case Study: Amazon Uses Data for Customers, Ads, and Fraud Prevention
Putting It Together
Ethics in Practice: Some Risks in Data Science
Exercises
Chapter 2: Data Wrangling: Preprocessing
Section 1: What is Data Wrangling?
Section 2: Cleaning Missing Data
Case Study: Data Wrangling in Criminal Justice Research
Section 3: Cleaning Anomalous Values
Case Study: Dewey Defeats Truman and the Role of Data Wrangling
Section 4: Transforming Quantitative Variables
Case Study: GlobalGiving Teaches Nonprofits About Transforming Variables
Section 5: Transforming Categorical Variables
Section 6: Reshaping a Data Set
Section 7: Combining Data Sets
Putting It Together
Ethics in Practice: Othering
Exercises
Chapter 3: Making Sense of Data Through Visualization
Case Study: The Washington Post Uses a Visualization to Report on U.S. Flooding
Section 1: The Grammar of Graphics
Section 2: Visualizations with One Quantitative Variable
Section 3: Visualizations with One Categorical Variable
Section 4: Visualizations with Two Variables
Section 5: Visualizations with Three or More Variables
Section 6: The Dangers of Visual Misrepresentation
Section 7: Data Visualization Guidelines
Case Study: European Space Agency Offers Interactive Star Mapper
Case Study: ESPN Updates its Visualizations in Real Time
Putting It Together
Ethics in Practice: The Perils of Using Color
Exercises
Chapter 4: Exploratory Data Analysis
Case Study: Shopify Helps Small Businesses with Descriptive Analytics
Section 1: Central Tendency
Section 2: Variability
Case Study: On- and Off-Field Exploratory Data Analysis in Sports
Section 3: Shape
Section 4: Resistant Central Tendency and Variability
Section 5: Data Associations
Case Study: Exploratory Data Analysis of Electronic Medical Records
Section 6: Identifying Outliers
Putting It Together
Ethics in Practice: Simpson’s Paradox
Exercises
Chapter 5: Data Management
Section 1: Asking Questions of Data
Section 2: Selecting Variables
Case Study: Starbucks Queries its Customer Data
Section 3: Filtering and Ordering Observations
Case Study: Zara Filters to Move its Product Faster
Section 4: Summarizing and Structuring Data
Section 5: Merging Tables
Case Study: Merging Data to Combat the Spread of Disease
Putting It Together
Ethics in Practice: Data Privacy Regulation
Exercises
Chapter 6: Understanding Uncertainty, Probability, and Variability
Section 1: Variability and Uncertainty
Section 2: Probability
Case Study: FiveThirtyEight
Section 3: Sampling Methods
Case Study: Sabermetrics and Next-Gen Stats
Section 4: Simulation
Section 5: Working with Probabilities and Common Fallacies
Case Study: The Base Rate Fallacy of COVID-19 Misinformation in Iceland
Putting It Together
Ethics in Practice: Power in Sampling
Exercises
Chapter 7: Inference from Data
Section 1: Introduction to Statistical Inference
Section 2: Data Collection and Study Design
Case Study: Firearm Regulations and Causation Versus Correlation
Section 3: The Language of Statistical Inference
Section 4: Exploratory Data Analysis to Begin Inference
Section 5: Drawing Conclusions in an Observational Study
Section 6: A/B Testing as a Case of Experiments
Case Study: A/B Testing Rating System at Netflix
Putting It Together
Ethics in Practice: P-Hacking and The Reproducibility Crisis
Exercises
Chapter 8: Machine Learning
Section 1: Artificial Intelligence
Section 2: Three Steps in the Machine Learning Process
Case Study: How Tesla Uses Machine Learning
Section 3: Machine Learning Method Characteristics
Section 4: Machine Learning Method Evaluation
Section 5: Deep Learning
Case Study: ChatGPT
Case Study: Improving Safety in the Construction Industry Through Deep Learning
Section 6: Use High-Quality Data in Machine Learning
Putting It Together
Ethics in Practice: Social Justice in Data Science
Exercises
Chapter 9: Supervised Learning
Section 1: Linear Regression with a Categorical Explanatory Variable
Section 2: Linear Regression with a Quantitative Explanatory Variable
Section 3: Multiple Linear Regression
Case Study: Anesthesia and Regression
Section 4: Nonparametric Regression Models
Case Study: Improving Student Success and Satisfaction in Higher Education
Section 5: Classification Models
Putting It Together
Ethics in Practice: Extrapolation
Exercises
Chapter 10: Unsupervised Learning
Section 1: What is Unsupervised Learning?
Case Study: Anomaly Detection at Accenture
Section 2: Getting to Know Cluster Analysis
Section 3: Introduction to K-Means Clustering
Case Study: Spotify Uses Unsupervised Machine Learning for Personalization
Section 4: Introduction to Hierarchical Clustering
Section 5: Assessing the Quality of Clusters
Case Study: Advertising from Target
Putting It Together
Ethics in Practice: Subjectivity in Unsupervised Learning
Exercises
Testimonials
“I appreciate the variety in the visualizations and the visual appeal of them.”
- Devin Brown, student pilot user
“Its presentation of topics goes beyond mere definitions by incorporating real-world examples, enabling students to associate data science techniques with practical applications. This approach significantly enhances the learning process.”
- Professor, California State University - Pomona
“I love the explanations for the concepts. The language is very easy to understand and not overly technical. I love the level of detail on each of the topics presented, as well as including real world applications to motivate why these topics are important.”
- Karle Flanagan, University of Illinois Urbana-Champaign
Brennan Davis
Professor and Director of Analytics Programs, California Polytechnic State University
Brennan Davis is the Richard and Julie Hood Professor and Director of Analytics Programs at the Orfalea College of Business at Cal Poly in San Luis Obispo, CA. He has a PhD from The Paul Merage School of Business at the University of California, Irvine, an MBA from the Wharton School of Business at the University of Pennsylvania, and a BS in mathematics from UCLA. He teaches upper-division undergraduate and graduate analytics courses. He is a member of Cal Poly’s Data Science & Analytics university initiatives committee.
Hunter Glanz
Associate Professor, California Polytechnic State University
Hunter Glanz is an associate professor of statistics and data science at California Polytechnic State University (Cal Poly, San Luis Obispo). He received a BS in mathematics and a BS in statistics from Cal Poly, San Luis Obispo, followed by an MA and PhD in statistics from Boston University. He maintains a passion for machine learning and statistical computing and enjoys advancing education efforts in these areas. Hunter serves on numerous committees and organizations dedicated to delivering cutting-edge statistical and data science content to students and professionals alike. He is a founding board member of the California Alliance for Data Science Education.