POST POST

OCT
22
2021

Pandas Cheat Sheet

ORIGINALLY POSTED TO: https://blog.simontimms.com/2021/10/22/pandas-cheat-sheet

Part of an evolving cheat sheet

Read a file

For CSV use

1
data = pd.read_csv('data.csv')

For excel use

1
df = pd.read_excel (r'./Detected as deleted from API data.xlsx')

You might need to install a library for xlsx files

1
pip install openpyxl

Get a summary of the data

Numerical summaries

1
data.describe()

Types of columns

1
data.dtypes

Create a new column from combining with another

This adds up values in two columns

1
data["totalqueuelength"] = data["Svc"] + data["Svc Que"]

This converts a couple of columns that have the data and time to a single field and turns it into a date

1
data["datetime"] = pd.to_datetime(data["Date"] + " " + data["Time"], format='%m/%d/%Y %I:%M %p')

Check date formats at https://docs.python.org/3/library/datetime.html#strftime-and-strptime-behavior

Create a new column from a function

1
df = df.assign(percent=lambda x: 100* x['deleted on date'] / x['total'])

Binning

You can bin the data by adding a new bin column to the dataframe

1
df['binning'] = pd.cut(df['percent'], 5) # 5 bins based on the percent column

Average a column

1
df['column'].mean()

Find row at an index

1
df.iloc[[int(totalRecords * .95)]] # Find the row at the 95th percentile

Filter data rows

1
data.loc[(data['datetime'] > '2021-10-16')]

Sort

1
df.sort_values(by=['percent'], ascending=False)

Simon Timms

Email Email
Web Web
Twitter Twitter
GitHub GitHub
RSS

Looking for someone else?

You can find the rest of the Western Devs Crew here.

© 2015 Western Devs. All Rights Reserved. Design by Karen Chudobiak, Graphic Designer