A Practical EDA Checklist

A practical EDA checklist helps you inspect a dataset before modeling: understand its structure, summarize key statistics, explore distributions, detect outliers, study relationships, review categories, and connect findings to modeling decisions.

Figure: EDA checklist. Illustration generated with Nano Banana 2 Pro.

Use this checklist whenever you receive a new dataset.

  1. Dataset overview
    1. How many rows and columns?
    2. What does each row represent?
    3. What does each column mean?
    4. Are the data types correct?
    5. Are there duplicate rows?
  2. Descriptive statistics
    1. What are the mean and median?
    2. Are they close or far apart?
    3. What are the minimum and maximum values?
    4. What is the variance or standard deviation?
    5. Are there impossible values?
  3. Distributions
    1. Is the data symmetric or skewed?
    2. Are there multiple peaks?
    3. Are there long tails?
    4. Should variables be transformed?
  4. Outliers
    1. Which values are extreme?
    2. Are they errors or valid rare cases?
    3. Should they be kept, removed, capped, transformed, or segmented?
  5. Relationships
    1. Which variables correlate with the target?
    2. Which variables correlate with each other?
    3. Are relationships linear or curved?
    4. Do scatter plots reveal clusters or exceptions?
  6. Categorical variables
    1. Which categories are most common?
    2. Are there rare categories?
    3. Do categories have different target distributions?
    4. Are category labels consistent?
  7. Modeling implications
    1. Which features seem promising?
    2. Which features may need cleaning?
    3. Which variables may leak target information?
    4. Which assumptions should be tested later?
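
Items 1 through 6 of the checklist can be sketched in Python with pandas. The following is a minimal illustration under stated assumptions, not a definitive implementation: the toy DataFrame, its column names (age, income, segment, churned), and the helper function names are all hypothetical, and the 1.5 x IQR rule is only one common convention for flagging extreme values.

```python
import pandas as pd

# Hypothetical toy data; column names and values are made up for illustration.
df = pd.DataFrame({
    "age": [25, 32, 47, 51, 38, 29, 44, 36, 41, 230],  # 230 is an impossible age
    "income": [30, 42, 65, 70, 50, 35, 61, 48, 55, 58],  # in thousands
    "segment": ["A", "B", "a", "B", "A", "C", "B", "A", "B", "A"],  # "a" vs "A"
    "churned": [0, 0, 1, 1, 0, 0, 1, 0, 1, 0],  # assumed binary target
})


def overview(df):
    """1. Dataset overview: size, data types, duplicate rows."""
    return {
        "rows": df.shape[0],
        "columns": df.shape[1],
        "dtypes": df.dtypes.astype(str).to_dict(),
        "duplicate_rows": int(df.duplicated().sum()),
    }


def numeric_summary(df):
    """2-3. Descriptive statistics plus skewness for each numeric column."""
    num = df.select_dtypes(include="number")
    summary = num.agg(["mean", "median", "min", "max", "std"]).T
    summary["skew"] = num.skew()
    return summary


def iqr_outliers(series, k=1.5):
    """4. Return values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile(0.25), series.quantile(0.75)
    iqr = q3 - q1
    return series[(series < q1 - k * iqr) | (series > q3 + k * iqr)]


def target_correlations(df, target):
    """5. Pearson correlation of each numeric column with the target."""
    num = df.select_dtypes(include="number")
    return num.corr()[target].drop(target).sort_values(ascending=False)


def category_report(df, column, target):
    """6. Row count and mean target value per category label."""
    return df.groupby(column)[target].agg(count="size", target_rate="mean")


print(overview(df))
print(numeric_summary(df))
print(iqr_outliers(df["age"]))
print(target_correlations(df, "churned"))
print(category_report(df, "segment", "churned"))
```

On this toy data the mean of age sits well above its median because of the single value 230, and the IQR rule flags that same value. The category report lists "a" separately from "A", which is the kind of label inconsistency item 6.4 asks about. Item 7 still requires judgment: the code only surfaces the evidence.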

Series Parts

Managing Data Science – From Concept to Governance

  1. The Analytics Continuum
  2. Exploratory Data Analysis (EDA) & statistics
  3. A Practical EDA Checklist
  4. The How-To Guide: Step-by-Step EDA in Python (next)