Pandas Cheat Sheet
A complete Pandas reference for Python data analysis — DataFrames, indexing, filtering, grouping, merging, time series, and more. Use the search bar to instantly find the operation you need.
import pandas
Setup & Import: Import the pandas library and common companions
Syntax:
import pandas as pd
Examples:
import pandas as pd  # Standard pandas import alias
import numpy as np  # Import pandas with NumPy for numerical operations
pd.__version__  # Check the installed pandas version
pd.set_option('display.max_columns', None)  # Show all columns when printing a DataFrame
Notes:
The 'pd' alias is the universal convention for pandas in scripts and notebooks.
pd.Series()
Creating Data: Create a one-dimensional labeled array
Syntax:
pd.Series(data, index=None, name=None)
Examples:
s = pd.Series([10, 20, 30, 40])  # Create a Series from a list
s = pd.Series([1, 2, 3], index=['a', 'b', 'c'])  # Series with custom index labels
s = pd.Series({'a': 1, 'b': 2, 'c': 3})  # Create a Series from a dictionary
s = pd.Series(np.random.randn(5), name='random')  # Series with NumPy random values and a name
Notes:
A Series is essentially a single column — building block of a DataFrame.
pd.DataFrame()
Creating Data: Create a two-dimensional labeled data structure
Syntax:
pd.DataFrame(data, index=None, columns=None)
Examples:
df = pd.DataFrame({'name': ['Ana', 'Bob'], 'age': [25, 30]})  # From a dictionary of lists
df = pd.DataFrame([[1, 2], [3, 4]], columns=['a', 'b'])  # From a list of lists with column names
df = pd.DataFrame(np.random.rand(4, 3), columns=['x', 'y', 'z'])  # From a NumPy array with column labels
df = pd.DataFrame(records, index=['r1', 'r2', 'r3'])  # Provide a custom row index
Notes:
DataFrames are the central pandas object — think of them as in-memory spreadsheets.
pd.date_range()
Creating Data: Generate a fixed-frequency DatetimeIndex
Syntax:
pd.date_range(start, end=None, periods=None, freq='D')
Examples:
pd.date_range('2024-01-01', periods=7)  # Seven daily timestamps starting Jan 1, 2024
pd.date_range('2024-01-01', '2024-12-31', freq='M')  # Month-end dates across the year
pd.date_range('2024-01-01', periods=24, freq='H')  # Hourly timestamps for one day
pd.date_range('2024-01-01', periods=10, freq='B')  # Ten business days starting Jan 1, 2024
Notes:
Common freq values: 'D' day, 'B' business day, 'W' week, 'M' month-end, 'Q' quarter, 'Y' year, 'H' hour, 'T' minute. Note that pandas 2.2+ deprecates 'M', 'Q', 'Y', 'H', and 'T' in favor of 'ME', 'QE', 'YE', 'h', and 'min'.
pd.read_csv()
Reading & Writing: Read a CSV file into a DataFrame
Syntax:
pd.read_csv(filepath, sep=',', header=0, index_col=None, ...)
Examples:
df = pd.read_csv('data.csv')  # Read a basic CSV file
df = pd.read_csv('data.csv', sep=';', encoding='utf-8')  # Custom separator and encoding
df = pd.read_csv('data.csv', index_col=0, parse_dates=['date'])  # Use the first column as index, parse dates
df = pd.read_csv('data.csv', nrows=1000, usecols=['id', 'name'])  # Read only the first 1000 rows of selected columns
Notes:
Use chunksize=N for streaming large files instead of loading everything into memory.
df.to_csv()
Reading & Writing: Write a DataFrame to a CSV file
Syntax:
df.to_csv(path, sep=',', index=True, header=True)
Examples:
df.to_csv('output.csv', index=False)  # Save without the row index
df.to_csv('output.csv', sep='\t', encoding='utf-8')  # Tab-separated with UTF-8 encoding
df.to_csv('output.csv.gz', compression='gzip')  # Compressed CSV output
df.to_csv('output.csv', columns=['a', 'b'], index=False)  # Export only selected columns
Notes:
Always pass index=False unless the row index actually carries information.
Other formats
Reading & Writing: Read and write Excel, JSON, Parquet, SQL, and HTML
Syntax:
pd.read_excel / read_json / read_parquet / read_sql / read_html
Examples:
df = pd.read_excel('file.xlsx', sheet_name='Sheet1')  # Read a specific Excel sheet
df = pd.read_json('data.json', orient='records')  # Read a JSON file (record-oriented)
df = pd.read_parquet('data.parquet')  # Read a Parquet columnar file
df = pd.read_sql('SELECT * FROM users', conn)  # Run a SQL query and load the results
df.to_excel('output.xlsx', sheet_name='Data', index=False)  # Write to Excel
df.to_parquet('output.parquet')  # Write to Parquet (compact, fast)
Notes:
Parquet is recommended for large datasets — much faster I/O and smaller files than CSV.
head() / tail() / sample()
Inspecting Data: Preview the top, bottom, or random rows of a DataFrame
Syntax:
df.head(n=5) / df.tail(n=5) / df.sample(n=5)
Examples:
df.head()  # First 5 rows (default)
df.head(20)  # First 20 rows
df.tail(3)  # Last 3 rows
df.sample(n=5, random_state=42)  # 5 random rows (reproducible)
Notes:
Use sample() for visual sanity checks — head() can be misleading on sorted data.
info() / describe() / shape
Inspecting Data: Inspect dtypes, summary statistics, and dimensions
Syntax:
df.info() / df.describe() / df.shape
Examples:
df.info()  # Column dtypes, non-null counts, memory usage
df.describe()  # Summary statistics for numeric columns
df.describe(include='all')  # Stats for all columns (numeric + object)
df.shape  # Tuple (rows, columns)
df.dtypes  # Data type of each column
df.columns  # List of column names
Notes:
df.info() is the fastest way to spot missing data and incorrect dtypes.
value_counts()
Inspecting Data: Count unique values in a Series
Syntax:
s.value_counts(normalize=False, dropna=True)
Examples:
df['city'].value_counts()  # Frequency count of each city
df['city'].value_counts(normalize=True)  # Return proportions (0–1) instead of counts
df['city'].value_counts(dropna=False)  # Include NaN in the counts
df['city'].nunique()  # Number of distinct values
Notes:
Indispensable for exploring categorical columns — sorts results by count by default.
Select columns
Selecting & Indexing: Access one or multiple columns by name
Syntax:
df['col'] / df[['col1', 'col2']]
Examples:
df['name']  # Single column as a Series
df[['name', 'age']]  # Multiple columns as a DataFrame
df.name  # Dot access (only works for valid identifiers)
df.filter(like='date')  # Columns whose names contain 'date'
df.filter(regex='^id_')  # Columns matching a regex pattern
Notes:
Always prefer df['col'] over df.col — dot access fails on names with spaces or special chars.
.loc[]
Selecting & Indexing: Label-based row and column selection
Syntax:
df.loc[row_label, col_label]
Examples:
df.loc[3]  # Row with label 3
df.loc[0:5, 'name']  # Rows labeled 0–5 (inclusive), name column
df.loc[:, ['name', 'age']]  # All rows, specific columns
df.loc[df['age'] > 21, 'name']  # Boolean filter combined with column selection
df.loc[3, 'age'] = 99  # Update a single cell by label
Notes:
.loc is inclusive on both ends of slices, unlike Python list slicing.
.iloc[]
Selecting & Indexing: Integer position-based row and column selection
Syntax:
df.iloc[row_pos, col_pos]
Examples:
df.iloc[0]  # First row
df.iloc[-1]  # Last row
df.iloc[0:3, 0:2]  # First 3 rows, first 2 columns
df.iloc[[0, 2, 4], :]  # Rows 0, 2, and 4 with all columns
df.iloc[:, -1]  # Last column, all rows
Notes:
.iloc uses standard Python slicing — end index is exclusive.
Boolean indexing
Filtering: Filter rows using a boolean condition
Syntax:
df[condition]
Examples:
df[df['age'] > 21]  # Rows where age > 21
df[(df['age'] > 21) & (df['country'] == 'US')]  # Combine conditions with & and |
df[~(df['status'] == 'inactive')]  # Negate a condition with ~
df[df['name'].str.startswith('A')]  # Names starting with 'A'
Notes:
Always wrap each condition in parentheses — & and | have higher precedence than comparison operators.
query()
Filtering: Filter rows using a string expression
Syntax:
df.query(expr)
Examples:
df.query('age > 21')  # Rows where age > 21
df.query('age > 21 and country == "US"')  # Multiple conditions, readable syntax
df.query('city in ["NYC", "LA", "SF"]')  # Filter using membership
min_age = 30
df.query('age >= @min_age')  # Reference a Python variable with @
Notes:
Often more readable than boolean indexing, and can be faster on large DataFrames (query uses the numexpr engine when it is installed).
isin() / between()
Filtering: Filter by membership or numeric range
Syntax:
s.isin(values) / s.between(low, high)
Examples:
df[df['city'].isin(['NYC', 'LA', 'SF'])]  # Rows whose city is in a list
df[~df['city'].isin(['NYC'])]  # Rows whose city is NOT NYC
df[df['age'].between(18, 65)]  # Age between 18 and 65 (inclusive)
df[df['date'].between('2024-01-01', '2024-12-31')]  # Date range filter
Notes:
between() is inclusive on both ends — pass inclusive='neither' to exclude bounds.
isna() / dropna() / fillna()
Cleaning: Detect, drop, or impute missing values
Syntax:
df.isna() / df.dropna() / df.fillna(value)
Examples:
df.isna().sum()  # Count missing values per column
df.dropna()  # Drop rows containing any NaN
df.dropna(subset=['email'])  # Drop rows with NaN in 'email' only
df.fillna(0)  # Replace all NaN with 0
df['age'] = df['age'].fillna(df['age'].mean())  # Fill NaN with the column mean
df.ffill()  # Forward-fill — propagate the last valid value (fillna(method='ffill') is deprecated)
Notes:
Always investigate WHY data is missing before dropping or imputing.
duplicated() / drop_duplicates()
Cleaning: Identify and remove duplicate rows
Syntax:
df.duplicated() / df.drop_duplicates(subset=None)
Examples:
df.duplicated().sum()  # Count duplicate rows
df.drop_duplicates()  # Remove duplicate rows, keeping the first occurrence
df.drop_duplicates(subset=['email'])  # Remove duplicates based on the email column
df.drop_duplicates(keep='last')  # Keep the last occurrence instead of the first
Notes:
Pass keep=False to remove ALL occurrences of duplicates.
rename()
Cleaning: Rename columns or index labels
Syntax:
df.rename(columns={...}, index={...})
Examples:
df.rename(columns={'old_name': 'new_name'})  # Rename a single column
df.rename(columns=str.lower)  # Lowercase all column names
df.columns = ['a', 'b', 'c']  # Replace all column names at once
df.rename(index={0: 'first', 1: 'second'})  # Rename row index labels
Notes:
Pass inplace=True to modify the DataFrame directly without re-assignment.
astype() / replace()
Cleaning: Convert dtypes or substitute specific values
Syntax:
df.astype(dtype) / df.replace(old, new)
Examples:
df['age'] = df['age'].astype(int)  # Convert column to integer
df['date'] = pd.to_datetime(df['date'])  # Convert string to datetime
df['category'] = df['category'].astype('category')  # Convert to memory-efficient categorical
df.replace({'N/A': np.nan, '-': np.nan})  # Replace placeholder strings with NaN
df['status'].replace({'A': 'active', 'I': 'inactive'})  # Map short codes to full labels
Notes:
Categorical dtype dramatically reduces memory for low-cardinality string columns.
apply() / map()
Modifying: Apply a function across rows, columns, or every cell
Syntax:
df.apply(func) / s.map(func) / df.map(func)
Examples:
df['name'].map(str.upper)  # Uppercase a Series with a function
df['age'].apply(lambda x: x * 2)  # Apply a lambda to each value
df.apply(lambda col: col.max() - col.min())  # Apply a function to each column
df.apply(lambda row: row['a'] + row['b'], axis=1)  # Apply a function row-wise (axis=1)
df.map(lambda x: f'${x}')  # Apply to every cell (pandas 2.1+; applymap is deprecated)
Notes:
Vectorized operations (df['x'] * 2) are far faster than apply() — only use apply for complex logic.
Add / drop columns
Modifying: Create new columns or remove existing ones
Syntax:
df['new_col'] = ... / df.drop(columns=[...])
Examples:
df['total'] = df['price'] * df['quantity']  # Computed column from existing ones
df['rank'] = range(1, len(df) + 1)  # Add a sequential column
df.drop(columns=['unused'])  # Drop a single column
df.drop(columns=['col1', 'col2'], inplace=True)  # Drop multiple columns in place
df.assign(total=df['price'] * df['qty'])  # Method-chain-friendly column add
Notes:
df.assign() returns a new DataFrame — ideal for fluent method chains.
sort_values() / sort_index()
Sorting: Sort rows by column values or by the index
Syntax:
df.sort_values(by, ascending=True) / df.sort_index()
Examples:
df.sort_values('age')  # Sort ascending by age
df.sort_values('age', ascending=False)  # Sort descending by age
df.sort_values(['country', 'age'], ascending=[True, False])  # Multiple columns with mixed order
df.sort_index()  # Sort by the row index
df.nlargest(5, 'sales')  # 5 rows with the largest sales values
df.nsmallest(5, 'price')  # 5 rows with the smallest price values
Notes:
nlargest / nsmallest are faster than sort_values + head when you only need the top N.
groupby()
Grouping & Aggregation: Group rows by a column and aggregate each group
Syntax:
df.groupby(by).agg(func)
Examples:
df.groupby('country')['sales'].sum()  # Total sales per country
df.groupby('country').agg({'sales': 'sum', 'orders': 'count'})  # Different aggregations per column
df.groupby(['country', 'year'])['sales'].mean()  # Group by two columns
df.groupby('country')['sales'].agg(['sum', 'mean', 'count'])  # Multiple aggregations per group
df.groupby('country').filter(lambda g: len(g) > 10)  # Keep only groups with more than 10 rows
df.groupby('country')['sales'].transform('mean')  # Broadcast group mean back to original shape
Notes:
Pass as_index=False to keep the grouping column as a regular column rather than the index.
Aggregation functions
Grouping & Aggregation: Common reduction functions on Series and DataFrames
Syntax:
df.sum() / mean() / median() / min() / max() / std() / var() / count()
Examples:
df['sales'].sum()  # Total of a column
df.mean(numeric_only=True)  # Mean of every numeric column
df['sales'].agg(['sum', 'mean', 'std'])  # Multiple aggregations on one column
df.agg({'sales': 'sum', 'price': 'mean'})  # Different aggregations per column
df['sales'].cumsum()  # Cumulative sum (running total)
df['sales'].rank(method='dense')  # Rank values (dense — no gaps after ties)
Notes:
Pass numeric_only=True to silence warnings on DataFrames with mixed dtypes (pandas 2.x).
pivot_table() / pivot()
Grouping & Aggregation: Reshape data into a pivot summary table
Syntax:
df.pivot_table(index, columns, values, aggfunc='mean')
Examples:
df.pivot_table(index='country', columns='year', values='sales')  # Country × year matrix of mean sales
df.pivot_table(index='country', values='sales', aggfunc='sum')  # Total sales per country
df.pivot_table(index='country', columns='year', values='sales', aggfunc='sum', fill_value=0)  # Sum with NaN replaced by 0
df.pivot(index='date', columns='product', values='price')  # pivot() — reshape only, no aggregation
pd.crosstab(df['country'], df['status'])  # Cross-tabulation (frequency table)
Notes:
Use pivot_table when there are duplicates in the index/columns — pivot will raise an error.
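The note above is easy to see with a toy frame (column names here are made up for illustration): when two rows share the same index/columns pair, pivot_table aggregates them, while pivot has no aggfunc to fall back on and raises.

```python
import pandas as pd

# Two rows share the same (key, col) pair, so an aggregation decision is needed
df = pd.DataFrame({'key': ['a', 'a'], 'col': ['x', 'x'], 'val': [1, 3]})

wide = df.pivot_table(index='key', columns='col', values='val', aggfunc='mean')
print(wide.loc['a', 'x'])  # 2.0: pivot_table averaged the duplicates

pivot_failed = False
try:
    df.pivot(index='key', columns='col', values='val')
except ValueError:  # "Index contains duplicate entries, cannot reshape"
    pivot_failed = True
```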
merge()
Merging & Joining: SQL-style join of two DataFrames on key columns
Syntax:
pd.merge(left, right, on=, how='inner')
Examples:
pd.merge(df1, df2, on='id')  # Inner join on the 'id' column
pd.merge(df1, df2, on='id', how='left')  # Left join — keep all rows from df1
pd.merge(df1, df2, on='id', how='outer')  # Outer join — keep all rows from both
pd.merge(df1, df2, left_on='user_id', right_on='id')  # Different key column names
pd.merge(df1, df2, on='id', suffixes=('_left', '_right'))  # Custom suffixes for overlapping columns
Notes:
Always check the resulting row count — duplicates in either side cause unexpected row multiplication.
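That row-count check can be automated. A small sketch with toy data (the frames and column names are invented for illustration): indicator=True records each row's origin, and validate= raises instead of silently multiplying rows.

```python
import pandas as pd

left = pd.DataFrame({'id': [1, 2, 3], 'name': ['Ana', 'Bob', 'Cal']})
right = pd.DataFrame({'id': [2, 2, 3], 'order': ['x', 'y', 'z']})  # id=2 appears twice

# indicator=True adds a '_merge' column showing where each row came from
merged = pd.merge(left, right, on='id', how='left', indicator=True)
print(len(merged))  # 4 rows from a 3-row left frame: id=2 matched twice

# validate='one_to_one' raises MergeError when either side has duplicate keys
dup_detected = False
try:
    pd.merge(left, right, on='id', validate='one_to_one')
except pd.errors.MergeError:
    dup_detected = True
```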
concat()
Merging & Joining: Stack DataFrames vertically or horizontally
Syntax:
pd.concat(objs, axis=0)
Examples:
pd.concat([df1, df2])  # Stack vertically (more rows)
pd.concat([df1, df2], ignore_index=True)  # Reset the index after concatenation
pd.concat([df1, df2], axis=1)  # Stack horizontally (more columns)
pd.concat([df1, df2], keys=['a', 'b'])  # Add a hierarchical key per source
Notes:
concat is for stacking; merge is for joining on a key. Don't mix them up.
join()
Merging & Joining: Join DataFrames using their indexes
Syntax:
df1.join(df2, how='left')
Examples:
df1.join(df2)  # Left join on the index
df1.join(df2, how='inner')  # Inner join on the index
df1.join([df2, df3])  # Join multiple DataFrames at once
Notes:
join() is a thin wrapper around merge() that defaults to using the index. Use merge() when joining on column values.
pd.to_datetime()
Time Series: Convert strings or numbers to pandas datetime
Syntax:
pd.to_datetime(arg, format=None, errors='raise')
Examples:
df['date'] = pd.to_datetime(df['date'])  # Auto-parse a date column
pd.to_datetime(df['date'], format='%Y-%m-%d')  # Explicit format (faster; raises on bad rows)
pd.to_datetime(df['date'], errors='coerce')  # Set unparseable values to NaT instead of raising
df['ts'] = pd.to_datetime(df['epoch'], unit='s')  # Convert Unix epoch seconds
Notes:
Set the column as the index (df.set_index('date')) to unlock resample() and time-based slicing.
.dt accessor
Time Series: Access datetime components on a Series
Syntax:
s.dt.<component>
Examples:
df['date'].dt.year  # Extract the year
df['date'].dt.month_name()  # Month name (e.g. 'January')
df['date'].dt.day_name()  # Day of the week name
df['date'].dt.dayofweek  # Day of the week (0=Mon, 6=Sun)
df['date'].dt.strftime('%Y-%m')  # Format as a string
Notes:
The .dt accessor only works on datetime Series — convert with pd.to_datetime first.
resample()
Time Series: Group time-series data by a frequency and aggregate
Syntax:
df.resample(rule).agg(func)
Examples:
df.set_index('date').resample('D').sum()  # Daily totals
df.resample('W')['sales'].mean()  # Weekly mean of sales
df.resample('M').agg({'sales': 'sum', 'orders': 'count'})  # Monthly aggregation across columns
df.resample('Q').last()  # Quarter-end snapshot (last value)
df['sales'].rolling(window=7).mean()  # 7-day rolling mean (moving average)
Notes:
Resample requires a DatetimeIndex — call set_index() on your date column first.
.str accessor
String Operations: Vectorized string methods on a Series
Syntax:
s.str.<method>
Examples:
df['name'].str.lower()  # Lowercase every string
df['name'].str.strip()  # Strip leading and trailing whitespace
df['email'].str.contains('@gmail')  # Boolean mask for a substring match
df['name'].str.replace('Mr. ', '', regex=False)  # Replace a literal substring
df['phone'].str.extract(r'(\d{3})-(\d{4})')  # Extract regex capture groups into columns
df['tags'].str.split(',', expand=True)  # Split into multiple columns
Notes:
All .str methods propagate NaN untouched — no need to dropna first.
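A minimal illustration of that NaN behavior, with toy values:

```python
import pandas as pd
import numpy as np

s = pd.Series(['Alice', np.nan, 'bob'])
out = s.str.upper()        # the NaN entry passes through unchanged
print(out.tolist())
```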
set_index() / reset_index()
Index Operations: Promote a column to the index or restore the default integer index
Syntax:
df.set_index(col) / df.reset_index()
Examples:
df.set_index('id')  # Use the 'id' column as the row index
df.set_index(['country', 'year'])  # Create a MultiIndex from two columns
df.reset_index()  # Move the index back to columns and reset to integers
df.reset_index(drop=True)  # Reset the index and discard the old one
df.sort_index(level=0)  # Sort a MultiIndex by the first level
Notes:
MultiIndex DataFrames enable powerful hierarchical slicing — use df.xs(key, level=) to cross-section.
Statistical methods
Statistics: Built-in statistical and mathematical operations
Syntax:
df.corr() / cov() / quantile() / pct_change() / diff()
Examples:
df.corr()  # Pearson correlation matrix between numeric columns
df['sales'].quantile([0.25, 0.5, 0.75])  # Quartiles of a column
df['price'].pct_change()  # Percent change between consecutive rows
df['sales'].diff()  # Difference between consecutive rows
df['x'].corr(df['y'])  # Correlation between two specific columns
Notes:
df.corr(method='spearman') for rank correlation when relationships aren't linear.
df.plot()
Visualization: Quick built-in matplotlib visualizations
Syntax:
df.plot(kind='line', x=, y=)
Examples:
df.plot(x='date', y='sales')  # Line plot (default)
df.plot(kind='bar', x='country', y='sales')  # Bar chart
df['age'].plot(kind='hist', bins=20)  # Histogram
df.plot(kind='scatter', x='age', y='income')  # Scatter plot
df.boxplot(column='sales', by='country')  # Box plot grouped by country
Notes:
Run %matplotlib inline in Jupyter, or call plt.show() in scripts to display the plot.
About the Pandas Cheat Sheet
This Pandas cheat sheet is a searchable, copy-ready quick reference for the most-used operations in the pandas Python library — the de facto standard for tabular data analysis. It covers loading data, exploring it, cleaning it, reshaping it, joining it together, and producing aggregates and time-series summaries.
Every command on this page is grouped into a clear category and includes a real Python example with syntax highlighting. Whether you are wrangling a CSV in Jupyter, building an ETL pipeline, or studying for a data engineering interview, this reference helps you skip the documentation tab and find the right method in seconds.
Pandas is built on top of NumPy and provides two primary data structures — the one-dimensional Series and the two-dimensional DataFrame — alongside hundreds of methods for I/O,
selection, group-wise computation, and time-series analysis.
How to Use This Cheat Sheet
1. Search — type any keyword (e.g. groupby, merge, fillna, resample) into the search bar to instantly filter all matching commands and examples.
2. Filter by category — click a category pill (Cleaning, Grouping & Aggregation, Time Series, etc.) to narrow the list to a single topic.
3. Copy and adapt — every example is real, runnable Python. Select the snippet, paste it into your notebook, and tweak the column names to fit your DataFrame.
4. Read the notes — the note under each command captures the gotcha, performance tip, or "why this matters" that the official docs often bury.
Common Use Cases
Exploratory Data Analysis
Loading a CSV, eyeballing it with head() / info(), computing summary stats, and surfacing missing values before deciding how to clean.
Data Cleaning & Wrangling
Dropping duplicates, filling NaN, converting types, renaming columns, and applying functions to standardise messy real-world data into something analysable.
Aggregation & Reporting
Using groupby(), pivot_table(), and crosstab() to roll up granular data into summary tables for dashboards or stakeholders.
Joining Datasets
SQL-style merge() and concat() for combining customer data with orders, lookup tables with fact tables, or stacking files from many days.
Time-Series Analysis
Resampling to daily/weekly/monthly grain, computing rolling means, and extracting datetime components for seasonality and trend analysis.
ETL & Data Pipelines
Reading from CSV, Excel, Parquet, or SQL; transforming with vectorised operations; and writing to a downstream warehouse or analytics tool.
Frequently Asked Questions
What is Pandas used for?
Pandas is a Python library for analysing tabular data — anything you would put in a spreadsheet or SQL table. It is widely used for data cleaning, exploration, transformation, time-series analysis, and as the input layer for machine learning libraries like scikit-learn. If your data has rows and columns, pandas is probably the fastest way to work with it in Python.
What is the difference between a Series and a DataFrame?
A Series is a one-dimensional labelled array — essentially a single column with an index.
A DataFrame is a two-dimensional structure with rows and columns,
where every column is a Series. Selecting a single column from a DataFrame returns a Series; selecting multiple columns returns a DataFrame.
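A two-line sketch of that distinction, using an invented toy frame: single brackets return a Series, double brackets return a one-column DataFrame.

```python
import pandas as pd

df = pd.DataFrame({'name': ['Ana', 'Bob'], 'age': [25, 30]})  # toy data

col = df['name']     # single brackets  -> Series
sub = df[['name']]   # double brackets  -> one-column DataFrame
print(type(col).__name__, type(sub).__name__)
```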
Should I use loc or iloc?
Use .loc[] when you want to select by label
(column names or index labels). Use .iloc[] when you want to select by integer position.
One key gotcha: .loc slices are inclusive on both ends, while .iloc follows standard Python slicing (end is exclusive).
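The inclusive/exclusive gotcha in one runnable sketch (toy data; any frame with a default integer index behaves the same):

```python
import pandas as pd

df = pd.DataFrame({'x': [10, 20, 30, 40, 50]})  # default integer index 0..4

print(len(df.loc[0:2]))   # 3 rows: label slice includes the end label
print(len(df.iloc[0:2]))  # 2 rows: position slice excludes the end position
```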
How do I handle missing values in Pandas?
Detect missing data with df.isna().sum(), drop it with df.dropna(), or fill it with df.fillna(value). Common imputation strategies include filling with a constant (0 or "Unknown"), the column mean/median, or a forward fill (df.ffill()) for time-series data. Always investigate why values are missing before deciding which strategy to use.
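The detect / impute / drop workflow can be sketched on invented toy data (the column names are illustrative only):

```python
import pandas as pd
import numpy as np

df = pd.DataFrame({'age': [25, np.nan, 30, np.nan],
                   'city': ['NYC', 'LA', None, 'SF']})  # toy data

print(df.isna().sum().to_dict())                # missing count per column
df['age'] = df['age'].fillna(df['age'].mean())  # impute with the column mean
df = df.dropna(subset=['city'])                 # drop rows missing city only
print(len(df))                                  # 3 rows remain
```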
When should I use apply() vs vectorised operations?
Always prefer vectorised operations like df['x'] * 2 or df['a'] + df['b'] — they run in optimised C code and are typically
10–100× faster than apply(). Reserve apply() for genuinely complex per-row logic that cannot be expressed in vectorised form. The .str and .dt accessors are also vectorised and should be used over apply.
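A small sketch (toy data) showing that the two forms produce identical results, which is exactly why the vectorized one should win by default:

```python
import pandas as pd

df = pd.DataFrame({'a': [0, 1, 2, 3, 4], 'b': [5, 6, 7, 8, 9]})  # toy data

looped = df.apply(lambda row: row['a'] + row['b'], axis=1)  # Python-level loop
vector = df['a'] + df['b']                                  # runs in optimized C

print(looped.equals(vector))  # identical results, very different speed
```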
What is the difference between merge, join, and concat?
pd.merge() performs SQL-style joins on key columns and is the most flexible. df.join() is a thin wrapper around merge that defaults to joining on the index. pd.concat() stacks DataFrames end-to-end (vertically by default, horizontally with axis=1)
without any key matching — use it for combining files of the same shape.
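The key-matching vs stacking distinction in a toy example (the users/orders frames are invented for illustration):

```python
import pandas as pd

users = pd.DataFrame({'id': [1, 2], 'name': ['Ana', 'Bob']})    # toy data
orders = pd.DataFrame({'id': [1, 1, 2], 'total': [10, 20, 30]})

joined = pd.merge(users, orders, on='id')               # key matching: 3 rows
stacked = pd.concat([users, users], ignore_index=True)  # pure stacking: 4 rows
print(len(joined), len(stacked))
```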
Why is my Pandas code slow on large datasets?
Common causes: using apply() or iterrows() instead of vectorised ops; not setting the right dtypes (use 'category' for low-cardinality strings);
reading CSV when Parquet would be much faster; or chaining many intermediate DataFrames that copy memory.
For datasets larger than memory, consider Polars, Dask, or DuckDB as drop-in-ish alternatives.
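The dtype point above is easy to measure. A sketch with synthetic data: a low-cardinality string column stored as object vs as category.

```python
import pandas as pd
import numpy as np

# 100k rows but only 3 distinct values: a classic low-cardinality column
s = pd.Series(np.random.choice(['NYC', 'LA', 'SF'], size=100_000))

obj_bytes = s.memory_usage(deep=True)
cat_bytes = s.astype('category').memory_usage(deep=True)
print(obj_bytes, cat_bytes)  # the categorical copy is far smaller
```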
Is this cheat sheet up to date with Pandas 2.x?
Yes. The examples are written for modern pandas (2.x) — including the numeric_only requirement on aggregations over mixed-dtype DataFrames, copy_on_write-friendly patterns,
and the recommendation to use pd.to_datetime() with an explicit format= argument for speed.