Pandas Cheat Sheet

A complete Pandas reference for Python data analysis — DataFrames, indexing, filtering, grouping, merging, time series, and more. Use the search bar to instantly find the operation you need.


import pandas

Setup & Import

Import the pandas library and common companions

Syntax:

import pandas as pd

Examples:

import pandas as pd
Standard pandas import alias
import pandas as pd
import numpy as np
Import pandas with NumPy for numerical operations
pd.__version__
Check the installed pandas version
pd.set_option('display.max_columns', None)
Show all columns when printing a DataFrame

Notes:

The 'pd' alias is the universal convention for pandas in scripts and notebooks.

pd.Series()

Creating Data

Create a one-dimensional labeled array

Syntax:

pd.Series(data, index=None, name=None)

Examples:

s = pd.Series([10, 20, 30, 40])
Create a Series from a list
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
Series with custom index labels
s = pd.Series({'a': 1, 'b': 2, 'c': 3})
Create a Series from a dictionary
s = pd.Series(np.random.randn(5), name='random')
Series with NumPy random values and a name

Notes:

A Series is essentially a single column — the building block of a DataFrame.

pd.DataFrame()

Creating Data

Create a two-dimensional labeled data structure

Syntax:

pd.DataFrame(data, index=None, columns=None)

Examples:

df = pd.DataFrame({'name': ['Ana', 'Bob'], 'age': [25, 30]})
From a dictionary of lists
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])
From a list of lists with column names
df = pd.DataFrame(np.random.rand(4, 3), columns=['x', 'y', 'z'])
From a NumPy array with column labels
df = pd.DataFrame(records, index=['r1', 'r2', 'r3'])
Provide a custom row index

Notes:

DataFrames are the central pandas object — think of them as in-memory spreadsheets.

pd.date_range()

Creating Data

Generate a fixed-frequency DatetimeIndex

Syntax:

pd.date_range(start, end=None, periods=None, freq='D')

Examples:

pd.date_range('2024-01-01', periods=7)
Seven daily timestamps starting Jan 1, 2024
pd.date_range('2024-01-01', '2024-12-31', freq='ME')
Month-end dates across the year
pd.date_range('2024-01-01', periods=24, freq='h')
Hourly timestamps for one day
pd.date_range('2024-01-01', periods=10, freq='B')
Ten business days starting Jan 1, 2024

Notes:

Common freq values: 'D' day, 'B' business day, 'W' week, 'ME' month-end, 'QE' quarter-end, 'YE' year-end, 'h' hour, 'min' minute. Pandas 2.2 renamed the older 'M', 'Q', 'Y', 'H', and 'T' aliases; they still work but emit deprecation warnings.

pd.read_csv()

Reading & Writing

Read a CSV file into a DataFrame

Syntax:

pd.read_csv(filepath, sep=',', header=0, index_col=None, ...)

Examples:

df = pd.read_csv('data.csv')
Read a basic CSV file
df = pd.read_csv('data.csv', sep=';', encoding='utf-8')
Custom separator and encoding
df = pd.read_csv('data.csv', index_col=0, parse_dates=['date'])
Use first column as index, parse dates
df = pd.read_csv('data.csv', nrows=1000, usecols=['id', 'name'])
Read only first 1000 rows of selected columns

Notes:

Use chunksize=N for streaming large files instead of loading everything into memory.
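
The chunked pattern from the note can be sketched like this (a toy in-memory CSV stands in for a large file on disk):

```python
import io
import pandas as pd

# A small inline CSV standing in for a multi-gigabyte file
csv_data = io.StringIO("id,value\n1,10\n2,20\n3,30\n4,40\n")

# chunksize returns an iterator of DataFrames instead of one big frame,
# so only one chunk is in memory at a time
total = 0
for chunk in pd.read_csv(csv_data, chunksize=2):
    total += chunk["value"].sum()

print(total)  # 100
```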

df.to_csv()

Reading & Writing

Write a DataFrame to a CSV file

Syntax:

df.to_csv(path, sep=',', index=True, header=True)

Examples:

df.to_csv('output.csv', index=False)
Save without the row index
df.to_csv('output.csv', sep='\t', encoding='utf-8')
Tab-separated with UTF-8 encoding
df.to_csv('output.csv.gz', compression='gzip')
Compressed CSV output
df.to_csv('output.csv', columns=['a', 'b'], index=False)
Export only selected columns

Notes:

Always pass index=False unless the row index actually carries information.

Other formats

Reading & Writing

Read and write Excel, JSON, Parquet, SQL, and HTML

Syntax:

pd.read_excel / read_json / read_parquet / read_sql / read_html

Examples:

df = pd.read_excel('file.xlsx', sheet_name='Sheet1')
Read a specific Excel sheet
df = pd.read_json('data.json', orient='records')
Read a JSON file (record-oriented)
df = pd.read_parquet('data.parquet')
Read a Parquet columnar file
df = pd.read_sql('SELECT * FROM users', conn)
Run a SQL query and load results
df.to_excel('output.xlsx', sheet_name='Data', index=False)
Write to Excel
df.to_parquet('output.parquet')
Write to Parquet (compact, fast)

Notes:

Parquet is recommended for large datasets — much faster I/O and smaller files than CSV.

head() / tail() / sample()

Inspecting Data

Preview the top, bottom, or random rows of a DataFrame

Syntax:

df.head(n=5) / df.tail(n=5) / df.sample(n=5)

Examples:

df.head()
First 5 rows (default)
df.head(20)
First 20 rows
df.tail(3)
Last 3 rows
df.sample(n=5, random_state=42)
5 random rows (reproducible)

Notes:

Use sample() for visual sanity checks — head() can be misleading on sorted data.

info() / describe() / shape

Inspecting Data

Inspect dtypes, summary statistics, and dimensions

Syntax:

df.info() / df.describe() / df.shape

Examples:

df.info()
Column dtypes, non-null counts, memory usage
df.describe()
Summary statistics for numeric columns
df.describe(include='all')
Stats for all columns (numeric + object)
df.shape
Tuple (rows, columns)
df.dtypes
Data type of each column
df.columns
List of column names

Notes:

df.info() is the fastest way to spot missing data and incorrect dtypes.

value_counts()

Inspecting Data

Count unique values in a Series

Syntax:

s.value_counts(normalize=False, dropna=True)

Examples:

df['city'].value_counts()
Frequency count of each city
df['city'].value_counts(normalize=True)
Return percentages instead of counts
df['city'].value_counts(dropna=False)
Include NaN in the counts
df['city'].nunique()
Number of distinct values

Notes:

Indispensable for exploring categorical columns — sorts results by count by default.

Select columns

Selecting & Indexing

Access one or multiple columns by name

Syntax:

df['col'] / df[['col1', 'col2']]

Examples:

df['name']
Single column as a Series
df[['name', 'age']]
Multiple columns as a DataFrame
df.name
Dot-access (only works for valid identifiers)
df.filter(like='date')
Columns whose names contain 'date'
df.filter(regex='^id_')
Columns matching a regex pattern

Notes:

Always prefer df['col'] over df.col — dot access fails on names with spaces or special characters, and on names that collide with DataFrame methods.

.loc[]

Selecting & Indexing

Label-based row and column selection

Syntax:

df.loc[row_label, col_label]

Examples:

df.loc[3]
Row with label 3
df.loc[0:5, 'name']
Rows 0–5 (inclusive), name column
df.loc[:, ['name', 'age']]
All rows, specific columns
df.loc[df['age'] > 21, 'name']
Boolean filter combined with column selection
df.loc[3, 'age'] = 99
Update a single cell by label

Notes:

.loc is inclusive on both ends of slices, unlike Python list slicing.
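
A quick sketch of the difference, using a tiny throwaway DataFrame:

```python
import pandas as pd

df = pd.DataFrame({"name": ["Ana", "Bob", "Cy", "Dee"]})

# .loc slices by label and INCLUDES the end label -> rows 0, 1 and 2
by_label = df.loc[0:2, "name"]

# .iloc slices by position and EXCLUDES the end position -> rows 0 and 1
by_position = df.iloc[0:2]

print(len(by_label), len(by_position))  # 3 2
```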

.iloc[]

Selecting & Indexing

Integer position-based row and column selection

Syntax:

df.iloc[row_pos, col_pos]

Examples:

df.iloc[0]
First row
df.iloc[-1]
Last row
df.iloc[0:3, 0:2]
First 3 rows, first 2 columns
df.iloc[[0, 2, 4], :]
Rows 0, 2, and 4 with all columns
df.iloc[:, -1]
Last column, all rows

Notes:

.iloc uses standard Python slicing — end index is exclusive.

Boolean indexing

Filtering

Filter rows using a boolean condition

Syntax:

df[condition]

Examples:

df[df['age'] > 21]
Rows where age > 21
df[(df['age'] > 21) & (df['country'] == 'US')]
Combine conditions with & and |
df[~(df['status'] == 'inactive')]
Negate a condition with ~
df[df['name'].str.startswith('A')]
Names starting with 'A'

Notes:

Always wrap each condition in parentheses — & and | have higher precedence than comparison operators.
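
The precedence gotcha in the note is easy to demonstrate with a minimal example:

```python
import pandas as pd

df = pd.DataFrame({"age": [18, 25, 40], "country": ["US", "US", "UK"]})

# Correct: each condition wrapped in its own parentheses
ok = df[(df["age"] > 21) & (df["country"] == "US")]
print(len(ok))  # 1 -- only the 25-year-old in the US

# Without parentheses, & binds tighter than >, so
#   df["age"] > 21 & df["country"] == "US"
# is parsed as df["age"] > (21 & df["country"]) == "US" and errors out.
```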

query()

Filtering

Filter rows using a string expression

Syntax:

df.query(expr)

Examples:

df.query('age > 21')
Rows where age > 21
df.query('age > 21 and country == "US"')
Multiple conditions, readable syntax
df.query('city in ["NYC", "LA", "SF"]')
Filter using membership
min_age = 30
df.query('age >= @min_age')
Reference a Python variable with @

Notes:

Often more readable than boolean indexing — and faster on large DataFrames.

isin() / between()

Filtering

Filter by membership or numeric range

Syntax:

s.isin(values) / s.between(low, high)

Examples:

df[df['city'].isin(['NYC', 'LA', 'SF'])]
Rows whose city is in a list
df[~df['city'].isin(['NYC'])]
Rows whose city is NOT NYC
df[df['age'].between(18, 65)]
Age between 18 and 65 (inclusive)
df[df['date'].between('2024-01-01', '2024-12-31')]
Date range filter

Notes:

between() is inclusive on both ends — pass inclusive='neither' to exclude bounds.

isna() / dropna() / fillna()

Cleaning

Detect, drop, or impute missing values

Syntax:

df.isna() / df.dropna() / df.fillna(value)

Examples:

df.isna().sum()
Count missing values per column
df.dropna()
Drop rows containing any NaN
df.dropna(subset=['email'])
Drop rows with NaN in 'email' only
df.fillna(0)
Replace all NaN with 0
df['age'] = df['age'].fillna(df['age'].mean())
Fill NaN with the column mean
df.ffill()
Forward-fill — propagate the last valid value (fillna(method='ffill') is deprecated)

Notes:

Always investigate WHY data is missing before dropping or imputing.

duplicated() / drop_duplicates()

Cleaning

Identify and remove duplicate rows

Syntax:

df.duplicated() / df.drop_duplicates(subset=None)

Examples:

df.duplicated().sum()
Count duplicate rows
df.drop_duplicates()
Remove all duplicate rows (keep first)
df.drop_duplicates(subset=['email'])
Remove duplicates based on email column
df.drop_duplicates(keep='last')
Keep the last occurrence instead of the first

Notes:

Pass keep=False to remove ALL occurrences of duplicates.

rename()

Cleaning

Rename columns or index labels

Syntax:

df.rename(columns={...}, index={...})

Examples:

df.rename(columns={'old_name': 'new_name'})
Rename a single column
df.rename(columns=str.lower)
Lowercase all column names
df.columns = ['a', 'b', 'c']
Replace all column names at once
df.rename(index={0: 'first', 1: 'second'})
Rename row index labels

Notes:

Pass inplace=True to modify the DataFrame directly, though plain re-assignment (df = df.rename(...)) is the preferred pattern under pandas 2.x copy-on-write.

astype() / replace()

Cleaning

Convert dtypes or substitute specific values

Syntax:

df.astype(dtype) / df.replace(old, new)

Examples:

df['age'] = df['age'].astype(int)
Convert column to integer
df['date'] = pd.to_datetime(df['date'])
Convert string to datetime
df['category'] = df['category'].astype('category')
Convert to memory-efficient categorical
df.replace({'N/A': np.nan, '-': np.nan})
Replace placeholder strings with NaN
df['status'].replace({'A': 'active', 'I': 'inactive'})
Map short codes to full labels

Notes:

Categorical dtype dramatically reduces memory for low-cardinality string columns.
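
The memory saving is easy to verify on a synthetic low-cardinality column (exact sizes vary by platform, so this sketch only checks the direction):

```python
import pandas as pd

# Three distinct strings repeated many times
s = pd.Series(["red", "green", "blue"] * 10_000)

as_object = s.memory_usage(deep=True)
as_category = s.astype("category").memory_usage(deep=True)

# Categorical stores each distinct string once plus small integer codes,
# so it should be a fraction of the object-dtype footprint
print(as_category < as_object)  # True
```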

apply() / map() / applymap()

Modifying

Apply a function across rows, columns, or every cell

Syntax:

df.apply(func) / s.map(func) / df.map(func)

Examples:

df['name'].map(str.upper)
Uppercase a Series with a function
df['age'].apply(lambda x: x * 2)
Apply a lambda to each value
df.apply(lambda col: col.max() - col.min())
Apply a function to each column
df.apply(lambda row: row['a'] + row['b'], axis=1)
Apply a function row-wise (axis=1)
df.map(lambda x: f'${x}')
Apply to every cell of a DataFrame (renamed from applymap() in pandas 2.1)

Notes:

Vectorized operations (df['x'] * 2) are far faster than apply() — only use apply for complex logic.
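
A minimal sanity check that the two forms agree, so you can safely swap apply() for the vectorized version:

```python
import pandas as pd

df = pd.DataFrame({"x": [1, 2, 3]})

# Same result either way, but the vectorized form runs in optimized C
via_apply = df["x"].apply(lambda v: v * 2)
vectorized = df["x"] * 2

print(via_apply.equals(vectorized))  # True
```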

Add / drop columns

Modifying

Create new columns or remove existing ones

Syntax:

df['new_col'] = ... / df.drop(columns=[...])

Examples:

df['total'] = df['price'] * df['quantity']
Computed column from existing ones
df['rank'] = range(1, len(df) + 1)
Add a sequential column
df.drop(columns=['unused'])
Drop a single column
df.drop(['col1', 'col2'], axis=1, inplace=True)
Drop multiple columns in place
df.assign(total=df['price'] * df['qty'])
Method-chain-friendly column add

Notes:

df.assign() returns a new DataFrame — ideal for fluent method chains.

sort_values() / sort_index()

Sorting

Sort rows by column values or by the index

Syntax:

df.sort_values(by, ascending=True) / df.sort_index()

Examples:

df.sort_values('age')
Sort ascending by age
df.sort_values('age', ascending=False)
Sort descending by age
df.sort_values(['country', 'age'], ascending=[True, False])
Sort by multiple columns with mixed order
df.sort_index()
Sort by the row index
df.nlargest(5, 'sales')
5 rows with the largest sales values
df.nsmallest(5, 'price')
5 rows with the smallest price values

Notes:

nlargest / nsmallest are faster than sort_values + head when you only need the top N.

groupby()

Grouping & Aggregation

Group rows by a column and aggregate each group

Syntax:

df.groupby(by).agg(func)

Examples:

df.groupby('country')['sales'].sum()
Total sales per country
df.groupby('country').agg({'sales': 'sum', 'orders': 'count'})
Different aggregations per column
df.groupby(['country', 'year'])['sales'].mean()
Group by two columns
df.groupby('country').agg(['sum', 'mean', 'count'])
Multiple aggregations per group
df.groupby('country').filter(lambda g: len(g) > 10)
Keep only groups with >10 rows
df.groupby('country').transform('mean')
Broadcast group mean back to original shape

Notes:

Pass as_index=False to keep the grouping column as a regular column rather than the index.
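
A small sketch of the as_index difference (toy column names):

```python
import pandas as pd

df = pd.DataFrame({"country": ["US", "US", "UK"], "sales": [10, 20, 5]})

# Default: the grouping key becomes the result's index
by_key = df.groupby("country")["sales"].sum()
print(by_key["US"])  # 30

# as_index=False keeps 'country' as an ordinary column, which is
# handier for subsequent merges or plotting
flat = df.groupby("country", as_index=False)["sales"].sum()
print(list(flat.columns))  # ['country', 'sales']
```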

Aggregation functions

Grouping & Aggregation

Common reduction functions on Series and DataFrames

Syntax:

df.sum() / mean() / median() / min() / max() / std() / var() / count()

Examples:

df['sales'].sum()
Total of a column
df.mean(numeric_only=True)
Mean of every numeric column
df['sales'].agg(['sum', 'mean', 'std'])
Multiple aggregations on one column
df.agg({'sales': 'sum', 'price': 'mean'})
Different aggregations per column
df['sales'].cumsum()
Cumulative sum (running total)
df['sales'].rank(method='dense')
Rank values (dense — no gaps after ties)

Notes:

In pandas 2.x, aggregations over mixed-dtype DataFrames raise a TypeError — pass numeric_only=True to restrict them to numeric columns.

pivot_table() / pivot()

Grouping & Aggregation

Reshape data into a pivot summary table

Syntax:

df.pivot_table(index, columns, values, aggfunc='mean')

Examples:

df.pivot_table(index='country', columns='year', values='sales')
Country × year matrix of mean sales
df.pivot_table(index='country', values='sales', aggfunc='sum')
Total sales per country
df.pivot_table(index='country', columns='year', values='sales', aggfunc='sum', fill_value=0)
Sum aggregation with NaN replaced by 0
df.pivot(index='date', columns='product', values='price')
pivot() — reshape only, no aggregation
pd.crosstab(df['country'], df['status'])
Cross-tabulation (frequency table)

Notes:

Use pivot_table when there are duplicates in the index/columns — pivot will raise an error.
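
To see why this matters, here is a tiny frame with a duplicate (country, year) key; pivot() would raise a ValueError on it, while pivot_table() aggregates:

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "US", "UK"],
    "year": [2024, 2024, 2024],
    "sales": [10, 20, 5],
})

# The two US/2024 rows are summed into a single cell
table = df.pivot_table(index="country", columns="year",
                       values="sales", aggfunc="sum")
print(table.loc["US", 2024])  # 30
```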

merge()

Merging & Joining

SQL-style join of two DataFrames on key columns

Syntax:

pd.merge(left, right, on=, how='inner')

Examples:

pd.merge(df1, df2, on='id')
Inner join on the 'id' column
pd.merge(df1, df2, on='id', how='left')
Left join — keep all rows from df1
pd.merge(df1, df2, on='id', how='outer')
Outer join — keep all rows from both
pd.merge(df1, df2, left_on='user_id', right_on='id')
Different key column names
pd.merge(df1, df2, on='id', suffixes=('_left', '_right'))
Custom suffixes for overlapping columns

Notes:

Always check the resulting row count — duplicates in either side cause unexpected row multiplication.
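
A minimal illustration of the row-multiplication gotcha (hypothetical id/order columns):

```python
import pandas as pd

left = pd.DataFrame({"id": [1, 2], "name": ["Ana", "Bob"]})
# id 1 appears twice on the right, so every left match is duplicated
right = pd.DataFrame({"id": [1, 1, 2], "order": [101, 102, 103]})

merged = pd.merge(left, right, on="id")
print(len(left), len(merged))  # 2 3

# validate= makes the expected relationship explicit and raises a
# MergeError early instead of silently multiplying rows:
# pd.merge(left, right, on="id", validate="one_to_one")
```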

concat()

Merging & Joining

Stack DataFrames vertically or horizontally

Syntax:

pd.concat(objs, axis=0)

Examples:

pd.concat([df1, df2])
Stack vertically (more rows)
pd.concat([df1, df2], ignore_index=True)
Reset the index after concatenation
pd.concat([df1, df2], axis=1)
Stack horizontally (more columns)
pd.concat([df1, df2], keys=['a', 'b'])
Add a hierarchical key per source

Notes:

concat is for stacking; merge is for joining on a key. Don't mix them up.

join()

Merging & Joining

Join DataFrames using their indexes

Syntax:

df1.join(df2, how='left')

Examples:

df1.join(df2)
Left join on the index
df1.join(df2, how='inner')
Inner join on the index
df1.join([df2, df3])
Join multiple DataFrames at once

Notes:

join() is a thin wrapper around merge() that defaults to using the index. Use merge() when joining on column values.

pd.to_datetime()

Time Series

Convert strings or numbers to pandas datetime

Syntax:

pd.to_datetime(arg, format=None, errors='raise')

Examples:

df['date'] = pd.to_datetime(df['date'])
Auto-parse a date column
pd.to_datetime(df['date'], format='%Y-%m-%d')
Explicit format (faster, fails on bad rows)
pd.to_datetime(df['date'], errors='coerce')
Set unparseable values to NaT instead of raising
df['ts'] = pd.to_datetime(df['epoch'], unit='s')
Convert Unix epoch seconds

Notes:

Set the column as the index (df.set_index('date')) to unlock resample() and time-based slicing.

.dt accessor

Time Series

Access datetime components on a Series

Syntax:

s.dt.<component>

Examples:

df['date'].dt.year
Extract the year
df['date'].dt.month_name()
Month name (e.g. 'January')
df['date'].dt.day_name()
Day of the week name
df['date'].dt.dayofweek
Day of the week (0=Mon, 6=Sun)
df['date'].dt.strftime('%Y-%m')
Format as a string

Notes:

The .dt accessor only works on datetime Series — convert with pd.to_datetime first.

resample()

Time Series

Group time-series data by a frequency and aggregate

Syntax:

df.resample(rule).agg(func)

Examples:

df.set_index('date').resample('D').sum()
Daily totals
df.resample('W')['sales'].mean()
Weekly mean of sales
df.resample('ME').agg({'sales': 'sum', 'orders': 'count'})
Monthly aggregation across columns
df.resample('QE').last()
Quarter-end snapshot (last value)
df['sales'].rolling(window=7).mean()
7-day rolling mean (moving average)

Notes:

Resample requires a DatetimeIndex — call set_index() on your date column first.
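
Putting the note into practice with a tiny synthetic series:

```python
import pandas as pd

df = pd.DataFrame({
    "date": pd.date_range("2024-01-01", periods=4, freq="D"),
    "sales": [1, 2, 3, 4],
})

# set_index('date') gives resample() the DatetimeIndex it needs;
# '2D' buckets the rows into Jan 1-2 and Jan 3-4
out = df.set_index("date").resample("2D")["sales"].sum()
print(list(out))  # [3, 7]
```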

.str accessor

String Operations

Vectorized string methods on a Series

Syntax:

s.str.<method>

Examples:

df['name'].str.lower()
Lowercase every string
df['name'].str.strip()
Strip leading and trailing whitespace
df['email'].str.contains('@gmail')
Boolean mask for substring match
df['name'].str.replace('Mr. ', '', regex=False)
Replace a literal substring
df['phone'].str.extract(r'(\d{3})-(\d{4})')
Extract regex capture groups into columns
df['tags'].str.split(',', expand=True)
Split into multiple columns

Notes:

All .str methods skip NaN values silently — no need to dropna first.
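
A quick demonstration of the NaN-skipping behaviour:

```python
import pandas as pd

s = pd.Series(["Ana", None, "bob"])

# .str methods pass missing values through as NaN instead of raising
upper = s.str.upper()
print(upper.isna().sum())  # 1
print(upper[0], upper[2])  # ANA BOB
```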

set_index() / reset_index()

Index Operations

Promote a column to the index or restore default integer index

Syntax:

df.set_index(col) / df.reset_index()

Examples:

df.set_index('id')
Use 'id' column as the row index
df.set_index(['country', 'year'])
Create a MultiIndex from two columns
df.reset_index()
Move the index back to columns and reset to integer
df.reset_index(drop=True)
Reset the index and discard the old one
df.sort_index(level=0)
Sort a MultiIndex by the first level

Notes:

MultiIndex DataFrames enable powerful hierarchical slicing — use df.xs(key, level=) to cross-section.
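
A short sketch of cross-sectioning a MultiIndex with xs() (toy data):

```python
import pandas as pd

df = pd.DataFrame({
    "country": ["US", "US", "UK"],
    "year": [2023, 2024, 2024],
    "sales": [10, 20, 5],
}).set_index(["country", "year"])

# All rows where the 'year' level equals 2024, with that level dropped
snapshot = df.xs(2024, level="year")
print(list(snapshot["sales"]))  # [20, 5]
```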

Statistical methods

Statistics

Built-in statistical and mathematical operations

Syntax:

df.corr() / cov() / quantile() / pct_change() / diff()

Examples:

df.corr(numeric_only=True)
Pearson correlation matrix between numeric columns
df['sales'].quantile([0.25, 0.5, 0.75])
Quartiles of a column
df['price'].pct_change()
Percent change between consecutive rows
df['sales'].diff()
Difference between consecutive rows
df['x'].corr(df['y'])
Correlation between two specific columns

Notes:

df.corr(method='spearman') for rank correlation when relationships aren't linear.

df.plot()

Visualization

Quick built-in matplotlib visualizations

Syntax:

df.plot(kind='line', x=, y=)

Examples:

df.plot(x='date', y='sales')
Line plot (default)
df.plot(kind='bar', x='country', y='sales')
Bar chart
df['age'].plot(kind='hist', bins=20)
Histogram
df.plot(kind='scatter', x='age', y='income')
Scatter plot
df.boxplot(column='sales', by='country')
Box plot grouped by country

Notes:

Run %matplotlib inline in Jupyter, or call plt.show() in scripts to display the plot.

About the Pandas Cheat Sheet

This Pandas cheat sheet is a searchable, copy-ready quick reference for the most-used operations in the pandas Python library — the de-facto standard for tabular data analysis. It covers loading data, exploring it, cleaning it, reshaping it, joining it together, and producing aggregates and time-series summaries.

Every command on this page is grouped into a clear category and includes a real Python example with syntax highlighting. Whether you are wrangling a CSV in Jupyter, building an ETL pipeline, or studying for a data engineering interview, this reference helps you skip the documentation tab and find the right method in seconds.

Pandas is built on top of NumPy and provides two primary data structures — the one-dimensional Series and the two-dimensional DataFrame — alongside hundreds of methods for I/O, selection, group-wise computation, and time-series analysis.

How to Use This Cheat Sheet

  1. Search — type any keyword (e.g. groupby, merge, fillna, resample) into the search bar to instantly filter all matching commands and examples.
  2. Filter by category — click a category pill (Cleaning, Grouping & Aggregation, Time Series, etc.) to narrow the list to a single topic.
  3. Copy and adapt — every example is real, runnable Python. Select the snippet, paste it into your notebook, and tweak the column names to fit your DataFrame.
  4. Read the notes — the small italic note under each command captures the gotcha, performance tip, or "why this matters" that the official docs often bury.

Common Use Cases

Exploratory Data Analysis

Loading a CSV, eyeballing it with head() / info(), computing summary stats, and surfacing missing values before deciding how to clean.

Data Cleaning & Wrangling

Dropping duplicates, filling NaN, converting types, renaming columns, and applying functions to standardise messy real-world data into something analysable.

Aggregation & Reporting

Using groupby(), pivot_table(), and crosstab() to roll up granular data into summary tables for dashboards or stakeholders.

Joining Datasets

SQL-style merge() and concat() for combining customer data with orders, lookup tables with fact tables, or stacking files from many days.

Time-Series Analysis

Resampling to daily/weekly/monthly grain, computing rolling means, and extracting datetime components for seasonality and trend analysis.

ETL & Data Pipelines

Reading from CSV, Excel, Parquet, or SQL; transforming with vectorised operations; and writing to a downstream warehouse or analytics tool.

Frequently Asked Questions

What is Pandas used for?

Pandas is a Python library for analysing tabular data — anything you would put in a spreadsheet or SQL table. It is widely used for data cleaning, exploration, transformation, time-series analysis, and as the input layer for machine learning libraries like scikit-learn. If your data has rows and columns, pandas is probably the fastest way to work with it in Python.

What is the difference between a Series and a DataFrame?

A Series is a one-dimensional labelled array — essentially a single column with an index. A DataFrame is a two-dimensional structure with rows and columns, where every column is a Series. Selecting a single column from a DataFrame returns a Series; selecting multiple columns returns a DataFrame.

Should I use loc or iloc?

Use .loc[] when you want to select by label (column names or index labels). Use .iloc[] when you want to select by integer position. One key gotcha: .loc slices are inclusive on both ends, while .iloc follows standard Python slicing (end is exclusive).

How do I handle missing values in Pandas?

Detect missing data with df.isna().sum(), drop it with df.dropna(), or fill it with df.fillna(value). Common imputation strategies include filling with a constant (0 or "Unknown"), the column mean/median, or a forward-fill (method='ffill') for time-series data. Always investigate why values are missing before deciding which strategy to use.

When should I use apply() vs vectorised operations?

Always prefer vectorised operations like df['x'] * 2 or df['a'] + df['b'] — they run in optimised C code and are typically 10–100× faster than apply(). Reserve apply() for genuinely complex per-row logic that cannot be expressed in vectorised form. The .str and .dt accessors are also vectorised and should be used over apply.

What is the difference between merge, join, and concat?

pd.merge() performs SQL-style joins on key columns and is the most flexible. df.join() is a thin wrapper around merge that defaults to joining on the index. pd.concat() stacks DataFrames end-to-end (vertically by default, horizontally with axis=1) without any key matching — use it for combining files of the same shape.

Why is my Pandas code slow on large datasets?

Common causes: using apply() or iterrows() instead of vectorised ops; not setting the right dtypes (use 'category' for low-cardinality strings); reading CSV when Parquet would be much faster; or chaining many intermediate DataFrames that copy memory. For datasets larger than memory, consider Polars, Dask, or DuckDB as drop-in-ish alternatives.

Is this cheat sheet up to date with Pandas 2.x?

Yes. The examples are written for modern pandas (2.x) — including the numeric_only requirement on aggregations over mixed-dtype DataFrames, copy_on_write-friendly patterns, and the recommendation to use pd.to_datetime() with an explicit format= argument for speed.