Why Learn Pandas?


Software development training - Geekuni blog: Why learn Pandas?

Thursday, 4 June 2026

Why learn Pandas?

By Andrew Solomon - Geekuni mentor, software engineer and aspiring stall holder.

In this article, we'll look at real-world problems and solve them both with and without Pandas. Along the way, you'll see how Pandas can help you solve problems with less code that's easier to read.

The goal isn't to say you always need Pandas. It's to show when it can help you get the answers you need from your data - sometimes even more easily than using a spreadsheet!

Introduction

For a long time I thought of Pandas as a Swiss Army knife just for data scientists. However, when I started playing with it, I realised it was going to make a lot of everyday tasks much more straightforward. Sharing Pandas with other developers was part of my motivation for putting together the Python Essentials course.

This article is a very quick Pandas taster: we'll walk you through a single use case that should give you the motivation to add it to your toolkit too.

Scenario

I’m running a market stall at Bondi Beach (I wish!) and it’s time to look over how my purchases compare with my sales to see what I need to fine-tune.

I have all purchase and sales data from 2025 in a spreadsheet, and here are my questions:

What was my profit over the year?

What was my profit per month? Per produce?

What did I buy too much of?

The data I have to work from is this CSV file, ledger.csv - my ledger.

Preparation

First, clone the repository containing the data and the examples from this post, rather than copying and pasting them:

git clone https://github.com/andrewsolomon/play_with_pandas.git

Because you’ll be installing two Python modules - Pandas and Babel - create a virtual environment first so that package dependencies don’t affect anything else you’re working on:

cd play_with_pandas
python3 -m venv .env
source .env/bin/activate

Finally, install Pandas and Babel:

python -m pip install -r requirements.txt

Example 1: What was my profit over the year?

Let's start with the easiest approach, a spreadsheet:

Open Google Sheets and import ledger.csv

In G1 enter profit

In G2 enter =ARRAYFORMULA(E2:E * F2:F - C2:C * D2:D)

In I1 enter Total Profit

In J1 enter =ARRAYFORMULA(SUM(G2:G))
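Written as a formula, and assuming the columns import in the same order as the df.dtypes listing further down (so C is wholesale_price, D is num_purchased, E is num_sold and F is retail_price), column G and cell J1 compute the following for each ledger row i:

```latex
\text{profit}_i = \text{num\_sold}_i \cdot \text{retail\_price}_i
                - \text{num\_purchased}_i \cdot \text{wholesale\_price}_i,
\qquad
\text{Total Profit} = \sum_i \text{profit}_i
```

Every version of the calculation in this article, spreadsheet or Python, is an implementation of these two lines.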

The non-Pandas Python approach loops over each row, type-casting the various fields from strings to integers or floats and adding up each row's profit.

#!/usr/bin/env python

import csv

# Each row arrives as a dict of strings, so the numeric fields are cast by hand
with open('ledger.csv', 'r') as file:
    rows = csv.DictReader(file)
    profit = sum(
        int(row['num_sold']) * float(row['retail_price'])
        - int(row['num_purchased']) * float(row['wholesale_price'])
        for row in rows
    )

print(f'Profit: ${profit:,.2f}')
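If you'd like to try the loop before cloning the repository, here is a minimal sketch that feeds csv.DictReader a small in-memory ledger through io.StringIO. The three rows and the names SAMPLE_LEDGER and year_profit are made up for illustration; the sketch assumes the real ledger.csv has the same six columns shown in the df.dtypes listing below.

```python
import csv
import io

# Hypothetical three-row ledger (made-up data) with the same columns as ledger.csv
SAMPLE_LEDGER = """date,produce,wholesale_price,num_purchased,num_sold,retail_price
2025-01-04,apples,0.50,100,90,1.50
2025-01-18,bananas,0.25,200,150,0.75
2025-02-01,apples,0.50,80,80,1.50
"""


def year_profit(file) -> float:
    """Sum revenue minus cost over every row, casting each string field by hand."""
    rows = csv.DictReader(file)
    return sum(
        int(row['num_sold']) * float(row['retail_price'])
        - int(row['num_purchased']) * float(row['wholesale_price'])
        for row in rows
    )


profit = year_profit(io.StringIO(SAMPLE_LEDGER))
print(f'Profit: ${profit:,.2f}')  # Profit: $227.50
```

Because year_profit takes any file-like object, the same function works unchanged on open('ledger.csv').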

Here’s the Pandas approach:

#!/usr/bin/env python

import pandas as pd

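# Pandas infers each column's type, so no casting is needed
df = pd.read_csv('ledger.csv')
# Compute the profit for every row at once using whole columns
df['profit'] = (
    df['num_sold'] * df['retail_price']
    - df['num_purchased'] * df['wholesale_price']
)
print(f'Profit: ${df["profit"].sum():,.2f}')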
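To run the Pandas version without the repository, here is a sketch that reads a small made-up ledger from io.StringIO instead of ledger.csv; pd.read_csv accepts any file-like object. The rows and the SAMPLE_LEDGER name are hypothetical.

```python
import io

import pandas as pd

# Hypothetical three-row ledger (made-up data) with the same columns as ledger.csv
SAMPLE_LEDGER = """date,produce,wholesale_price,num_purchased,num_sold,retail_price
2025-01-04,apples,0.50,100,90,1.50
2025-01-18,bananas,0.25,200,150,0.75
2025-02-01,apples,0.50,80,80,1.50
"""

df = pd.read_csv(io.StringIO(SAMPLE_LEDGER))

# Each arithmetic operation applies element-wise to whole columns, no loop needed
df['profit'] = (
    df['num_sold'] * df['retail_price']
    - df['num_purchased'] * df['wholesale_price']
)

print(df[['produce', 'profit']])  # one profit value per row
print(f'Profit: ${df["profit"].sum():,.2f}')  # Profit: $227.50
```

Printing the per-row profit column alongside the total is a quick way to check the formula before trusting the sum.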

Both approaches give the same result:

$ ./ex01_year_profit.py
Profit: $113,624.06

Reflections

Think of df (short for DataFrame) as a spreadsheet, where we’re adding the column profit using a formula involving columns num_sold, retail_price, num_purchased and wholesale_price. As with a spreadsheet, we didn’t need to loop over the rows - we just did the calculation using columns.

Amazingly, we didn’t need to do any type casting (e.g. float(row['retail_price'])) - Pandas just guessed the column types for us!

Here are Pandas’ inferred column types:

>>> df.dtypes
date                object
produce             object
wholesale_price    float64
num_purchased        int64
num_sold             int64
retail_price       float64
profit             float64
dtype: object

It's got the prices and numbers right, but reading the date as an object means it's treated just like a Python string, so Pandas can't do date-aware things such as picking out the month. We'll address this in the next example.
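Besides parsing dates while reading the file, which the next example does, you can convert a column that was already read as text with pd.to_datetime. Here is a minimal sketch on a made-up two-row ledger, assuming dates are written year-month-day like 2025-01-04.

```python
import io

import pandas as pd

# Hypothetical two-row ledger (made-up data)
SAMPLE_LEDGER = """date,produce,wholesale_price,num_purchased,num_sold,retail_price
2025-01-04,apples,0.50,100,90,1.50
2025-02-01,bananas,0.25,200,150,0.75
"""

df = pd.read_csv(io.StringIO(SAMPLE_LEDGER))
print(df.dtypes)  # the numeric columns are inferred; 'date' is still plain text

# Convert the text dates into real datetimes (format assumed to be YYYY-MM-DD)
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Date-aware operations now work through the .dt accessor
print(df['date'].dt.month.tolist())  # [1, 2]
```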

Example 2: What was my profit per month?

Without Pandas, the Python code starts with a defaultdict keyed by month, so that a month we haven't seen yet starts at 0.0:

>>> from collections import defaultdict
>>> monthly_profit = defaultdict(float)
>>> monthly_profit['2025-01']
0.0

We extract the month 2025-01 as the first 7 characters of the date string 2025-01-04, like this:

month = row['date'][:7]

The full implementation is, once again, a for-loop:

#!/usr/bin/env python

import csv
from collections import defaultdict

monthly_profit = defaultdict(float)

with open('ledger.csv', 'r') as file:
    rows = csv.DictReader(file)

    # Add each row's profit to the running total for its month, e.g. '2025-01'
    for row in rows:
        month = row['date'][:7]
        monthly_profit[month] += (
            int(row['num_sold']) * float(row['retail_price'])
            - int(row['num_purchased']) * float(row['wholesale_price'])
        )

# Zero-padded 'YYYY-MM' keys sort alphabetically in date order
print('Profit per month: ')
for month in sorted(monthly_profit):
    print(f'{month} ${monthly_profit[month]:,.2f}')
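Here is a self-contained sketch of the same loop on a small made-up ledger read from io.StringIO. The rows are deliberately out of date order to show that sorting the 'YYYY-MM' keys still prints the months chronologically; the data and the SAMPLE_LEDGER name are hypothetical.

```python
import csv
import io
from collections import defaultdict

# Hypothetical ledger rows (made-up data), not in date order
SAMPLE_LEDGER = """date,produce,wholesale_price,num_purchased,num_sold,retail_price
2025-02-01,apples,0.50,80,80,1.50
2025-01-04,apples,0.50,100,90,1.50
2025-01-18,bananas,0.25,200,150,0.75
"""

monthly_profit = defaultdict(float)

for row in csv.DictReader(io.StringIO(SAMPLE_LEDGER)):
    month = row['date'][:7]  # '2025-01-04' -> '2025-01'
    monthly_profit[month] += (
        int(row['num_sold']) * float(row['retail_price'])
        - int(row['num_purchased']) * float(row['wholesale_price'])
    )

print('Profit per month: ')
for month in sorted(monthly_profit):
    print(f'{month} ${monthly_profit[month]:,.2f}')
# prints 2025-01 $147.50, then 2025-02 $80.00
```

Slicing with [:7] only works because every date has the same zero-padded width; a ledger with dates like 2025-1-4 would need real date parsing.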

The Pandas approach starts by making sure the date column is read as a real date rather than a string, using the parse_dates and date_format parameters of read_csv:

>>> import pandas as pd

>>> df = pd.read_csv(
...     'ledger.csv',
...     parse_dates=['date'],
...     date_format={'date':...
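As a hedged sketch of one way to get monthly profit once the dates are parsed, not necessarily the approach the repository's example takes: read with parse_dates, then group by calendar month with Series.dt.to_period('M'). It assumes dates written as year-month-day (e.g. 2025-01-04), uses a made-up in-memory ledger instead of ledger.csv, and passes date_format as a plain string, which requires pandas 2.0 or later.

```python
import io

import pandas as pd

# Hypothetical ledger rows (made-up data), not in date order
SAMPLE_LEDGER = """date,produce,wholesale_price,num_purchased,num_sold,retail_price
2025-02-01,apples,0.50,80,80,1.50
2025-01-04,apples,0.50,100,90,1.50
2025-01-18,bananas,0.25,200,150,0.75
"""

df = pd.read_csv(
    io.StringIO(SAMPLE_LEDGER),
    parse_dates=['date'],
    date_format='%Y-%m-%d',  # assumed format, matching dates like 2025-01-04
)
df['profit'] = (
    df['num_sold'] * df['retail_price']
    - df['num_purchased'] * df['wholesale_price']
)

# Group rows by calendar month; groupby sorts the month groups chronologically
monthly_profit = df.groupby(df['date'].dt.to_period('M'))['profit'].sum()

print('Profit per month: ')
for month, profit in monthly_profit.items():
    print(f'{month.strftime("%Y-%m")} ${profit:,.2f}')
# prints 2025-01 $147.50, then 2025-02 $80.00
```

Compared with the loop, the month extraction, the accumulation and the sorting are each a single column operation here.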

