Data Science for All

Brennan Davis and Hunter Glanz, Cal Poly – San Luis Obispo

Crafted for any undergraduate student, Data Science for All empowers students with the skills to navigate our data-rich world. The authors help students succeed by integrating engaging, hands-on activities and practical examples, making the material accessible to those without prior statistics or programming experience. Featuring flexible software options and comprehensive resources, the book ensures students can move seamlessly from learning to application. Perfect for introductory courses, it prepares students to navigate and use data in both their academic and professional journeys.

Features

Data Science for All includes key features designed to help students acquire knowledge, understand concepts, and practice what they’ve learned. The book is paired with a full MyLab course that offers ample assessment and practice support.

Try it Yourself & Applying the Concepts

Try it Yourself items give students conceptual practice with guidance. Applying the Concepts activities ask students to apply concepts using the analysis tool of your choice.

Integrated Review

Additional foundational math & stats support, including auto-graded exercises and videos, is available in the MyLab Statistics course.

Video Support

Over 500 videos include lecture support, example problem videos, and software support for R, Python, Excel, StatCrunch and SQL.

Meet the authors

Hear from Brennan and Hunter.

Brief contents

Chapter 1: What is Data Science?

Section 1: Introduction to Data Science
Case Study: Netflix Uses Data Science for a Better Customer Experience
Case Study: NASA Uses Cloud Services to Stream Real-Time Mars Footage
Section 2: Data in Tables
Section 3: Data Preparation
Section 4: Data Analysis and Storytelling
Section 5: Data Science in Society and Industry
Case Study: Amazon Uses Data for Customers, Ads, and Fraud Prevention
Putting It Together
Ethics in Practice: Some Risks in Data Science
Exercises

Chapter 2: Data Wrangling: Preprocessing

Section 1: What is Data Wrangling?
Section 2: Cleaning Missing Data
Case Study: Data Wrangling in Criminal Justice Research
Section 3: Cleaning Anomalous Values
Case Study: Dewey Defeats Truman and the Role of Data Wrangling
Section 4: Transforming Quantitative Variables
Case Study: GlobalGiving Teaches Nonprofits About Transforming Variables
Section 5: Transforming Categorical Variables
Section 6: Reshaping a Data Set
Section 7: Combining Data Sets
Putting It Together
Ethics in Practice: Othering
Exercises

Chapter 3: Making Sense of Data Through Visualization

Case Study: The Washington Post Uses a Visualization to Report on U.S. Flooding
Section 1: The Grammar of Graphics
Section 2: Visualizations with One Quantitative Variable
Section 3: Visualizations with One Categorical Variable
Section 4: Visualizations with Two Variables
Section 5: Visualizations with Three or More Variables
Section 6: The Dangers of Visual Misrepresentation
Section 7: Data Visualization Guidelines
Case Study: European Space Agency Offers Interactive Star Mapper
Case Study: ESPN Updates its Visualizations in Real Time
Putting It Together
Ethics in Practice: The Perils of Using Color
Exercises

Chapter 4: Exploratory Data Analysis

Case Study: Shopify Helps Small Businesses with Descriptive Analytics
Section 1: Central Tendency
Section 2: Variability
Case Study: On- and Off-Field Exploratory Data Analysis in Sports
Section 3: Shape
Section 4: Resistant Central Tendency and Variability
Section 5: Data Associations
Case Study: Exploratory Data Analysis of Electronic Medical Records
Section 6: Identifying Outliers
Putting It Together
Ethics in Practice: Simpson’s Paradox
Exercises

Chapter 5: Data Management

Section 1: Asking Questions of Data
Section 2: Selecting Variables
Case Study: Starbucks Queries its Customer Data
Section 3: Filtering and Ordering Observations
Case Study: Zara Filters to Move its Product Faster
Section 4: Summarizing and Structuring Data
Section 5: Merging Tables
Case Study: Merging Data to Combat the Spread of Disease
Putting It Together
Ethics in Practice: Data Privacy Regulation
Exercises

Chapter 6: Understanding Uncertainty, Probability, and Variability

Section 1: Variability and Uncertainty
Section 2: Probability
Case Study: FiveThirtyEight
Section 3: Sampling Methods
Case Study: Sabermetrics and Next-Gen Stats
Section 4: Simulation
Section 5: Working with Probabilities and Common Fallacies
Case Study: The Base Rate Fallacy of COVID-19 Misinformation in Iceland
Putting It Together
Ethics in Practice: Power in Sampling
Exercises

Chapter 7: Inference from Data

Section 1: Introduction to Statistical Inference
Section 2: Data Collection and Study Design
Case Study: Firearm Regulations and Causation Versus Correlation
Section 3: The Language of Statistical Inference
Section 4: Exploratory Data Analysis to Begin Inference
Section 5: Drawing Conclusions in an Observational Study
Section 6: A/B Testing as a Case of Experiments
Case Study: A/B Testing Rating System at Netflix
Putting It Together
Ethics in Practice: P-Hacking and The Reproducibility Crisis
Exercises

Chapter 8: Machine Learning

Section 1: Artificial Intelligence
Section 2: Three Steps in the Machine Learning Process
Case Study: How Tesla Uses Machine Learning
Section 3: Machine Learning Method Characteristics
Section 4: Machine Learning Method Evaluation
Section 5: Deep Learning
Case Study: ChatGPT
Case Study: Improving Safety in the Construction Industry Through Deep Learning
Section 6: Use High-Quality Data in Machine Learning
Putting It Together
Ethics in Practice: Social Justice in Data Science
Exercises

Chapter 9: Supervised Learning

Section 1: Linear Regression with a Categorical Explanatory Variable
Section 2: Linear Regression with a Quantitative Explanatory Variable
Section 3: Multiple Linear Regression
Case Study: Anesthesia and Regression
Section 4: Nonparametric Regression Models
Case Study: Improving Student Success and Satisfaction in Higher Education
Section 5: Classification Models
Putting It Together
Ethics in Practice: Extrapolation
Exercises

Chapter 10: Unsupervised Learning

Section 1: What is Unsupervised Learning?
Case Study: Anomaly Detection at Accenture
Section 2: Getting to Know Cluster Analysis
Section 3: Introduction to K-Means Clustering
Case Study: Spotify Uses Unsupervised Machine Learning for Personalization
Section 4: Introduction to Hierarchical Clustering
Section 5: Assessing the Quality of Clusters
Case Study: Advertising from Target
Putting It Together
Ethics in Practice: Subjectivity in Unsupervised Learning
Exercises

Testimonials

“I appreciate the variety in the visualizations and the visual appeal of them.”

- Devin Brown, student pilot user

“Its presentation of topics goes beyond mere definitions by incorporating real-world examples, enabling students to associate data science techniques with practical applications. This approach significantly enhances the learning process.”

- Professor, California State University - Pomona

“I love the explanations for the concepts. The language is very easy to understand and not overly technical. I love the level of detail on each of the topics presented, as well as including real world applications to motivate why these topics are important.”

- Karle Flanagan, University of Illinois - Champaign

Author Webinars

Data literacy is becoming a necessary foundational skill for the public. How can we integrate comprehensive introductory data science instruction into the undergraduate curriculum? What might a general education course in data science look like? How can educators meet the growing demand for this skill set and integrate data consumption and analysis into the broad educational goals of each institution? Join Drs. Hunter Glanz and Brennan Davis, professors at California Polytechnic State University, as they explore these questions and more.
