Why Learn Pandas?

Software development training - Geekuni blog: Why learn Pandas?

Thursday, 4 June 2026


By Andrew Solomon - Geekuni mentor, software engineer and aspiring stall holder.

In this article, we’ll look at real-world problems and solve them both with and without Pandas. Along the way, you’ll see how Pandas can help you solve problems with less code that’s easier to read. The goal isn’t to say you always need Pandas. It’s to show when it can help you get the answers you need from your data - sometimes even more easily than using a spreadsheet!

Introduction

For a long time I thought of Pandas as the Swiss Army knife for data scientists, and not something I needed as a developer. However, when I started playing with it I realised that it would make a lot of things much more straightforward for me too. Sharing Pandas with other developers was part of my motivation for putting together the Python Essentials course.

This article is a very quick Pandas taster: we walk you through a use case that should motivate you to add it to your toolkit too.

Scenario

I’m running a market stall at Bondi Beach (I wish!) and it’s time to look over how my purchases compare with my sales to see what I need to fine-tune.

I have all purchase and sales data from 2025 in a spreadsheet, and here are my questions:

What was my profit over the year?

What was my profit per month? Per produce?

What did I buy too much of?

The data I have to work off is this CSV file - my ledger.

Preparation

First, clone the data and examples for this post rather than copying and pasting them:

git clone https://github.com/andrewsolomon/play_with_pandas.git

Because you’ll be installing two Python modules - Pandas and Babel - create a virtual environment first so that package dependencies don’t affect anything else you’re working on:

cd play_with_pandas
python3 -m venv .env
source .env/bin/activate

Finally, install Pandas and Babel:

python -m pip install -r requirements.txt

Example 1: What was my profit over the year?

Let’s start with the easiest option, a spreadsheet:

Open Google Sheets and import ledger.csv

In G1 enter profit

In G2 enter =ARRAYFORMULA(E2:E * F2:F - C2:C * D2:D)

In I1 enter Total Profit

In J1 enter =ARRAYFORMULA(SUM(G2:G))

The non-Pandas Python approach involves looping over each row, type-casting the various fields from strings to integers or floats, and calculating the profit of each row in a for-loop.

#!/usr/bin/env python

import csv

with open('ledger.csv', 'r') as file:
    rows = csv.DictReader(file)
    # Every CSV field arrives as a string, so cast before doing arithmetic.
    profit = sum(
        int(row['num_sold']) * float(row['retail_price'])
        - int(row['num_purchased']) * float(row['wholesale_price'])
        for row in rows
    )

print(f'Profit: ${profit:,.2f}')

Here’s the Pandas approach:

#!/usr/bin/env python

import pandas as pd

df = pd.read_csv('ledger.csv')
# Column arithmetic: revenue minus cost, computed for all rows at once.
df['profit'] = (
    df['num_sold'] * df['retail_price']
    - df['num_purchased'] * df['wholesale_price']
)
print(f'Profit: ${df["profit"].sum():,.2f}')

Both approaches give the same result:

$ ./ex01_year_profit.py
Profit: $113,624.06

Reflections

Think of df (short for DataFrame) as a spreadsheet, where we’re adding the column profit using a formula involving columns num_sold, retail_price, num_purchased and wholesale_price. As with a spreadsheet, we didn’t need to loop over the rows - we just did the calculation using columns.
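To see the column-formula idea in isolation, here is a minimal sketch that builds a tiny DataFrame in memory instead of reading ledger.csv. The column names match the article, but the three rows are made-up values chosen so the arithmetic is easy to check by hand.

```python
import pandas as pd

# Made-up rows for illustration only; these are not values from ledger.csv.
df = pd.DataFrame({
    'produce': ['apples', 'bananas', 'cherries'],
    'wholesale_price': [0.50, 0.25, 2.00],
    'num_purchased': [100, 200, 50],
    'num_sold': [90, 180, 50],
    'retail_price': [1.00, 0.50, 4.00],
})

# One expression fills the whole column, like a spreadsheet formula
# dragged down every row. No explicit loop is needed.
df['profit'] = (
    df['num_sold'] * df['retail_price']
    - df['num_purchased'] * df['wholesale_price']
)

total_profit = df['profit'].sum()
print(df[['produce', 'profit']])
print(f'Total profit: ${total_profit:,.2f}')
```

With these sample rows the per-row profits are 40, 40 and 100, so the total is $180.00.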

Amazingly, we didn’t need to do any type casting (e.g. float(row['retail_price'])) - Pandas just guessed the column types for us!

Here are Pandas’ inferred column types:

>>> df.dtypes
date                object
produce             object
wholesale_price    float64
num_purchased        int64
num_sold             int64
retail_price       float64
profit             float64
dtype: object

It’s got the prices and numbers right, but inferring the date as an object means it will be treated just like a Python string, which isn’t ideal. We’ll address this in the next example.
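If you want to check the inferred types in code rather than by eye, the sketch below may help. It reads a made-up two-row CSV from memory (not the real ledger.csv), tests the column types with helpers from pd.api.types, and then converts the date column after loading with pd.to_datetime. That conversion is one possible alternative and is not necessarily how the next example handles dates.

```python
import io

import pandas as pd

# Made-up CSV text standing in for ledger.csv (same column names).
sample_csv = io.StringIO(
    'date,produce,wholesale_price,num_purchased,num_sold,retail_price\n'
    '2025-01-04,apples,0.50,100,90,1.00\n'
    '2025-02-01,bananas,0.25,200,180,0.50\n'
)

df = pd.read_csv(sample_csv)

# Numeric columns are inferred; the date column is still plain text here.
prices_are_float = pd.api.types.is_float_dtype(df['retail_price'])
counts_are_int = pd.api.types.is_integer_dtype(df['num_sold'])
date_parsed_before = pd.api.types.is_datetime64_any_dtype(df['date'])

# Convert the text dates after loading, using the YYYY-MM-DD layout.
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')
date_parsed_after = pd.api.types.is_datetime64_any_dtype(df['date'])

print(prices_are_float, counts_are_int, date_parsed_before, date_parsed_after)
```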

Example 2: What was my profit per month?

Without Pandas, the Python code involves starting with a default dictionary using months for keys, like this:

>>> from collections import defaultdict
>>> monthly_profit = defaultdict(float)
>>> monthly_profit['2025-01']
0.0

We’re extracting the month 2025-01 as the first 7 characters of the date string 2025-01-04 like this:

month = row['date'][:7]

The full implementation is, once again, a for-loop:

#!/usr/bin/env python

import csv
from collections import defaultdict

monthly_profit = defaultdict(float)

with open('ledger.csv', 'r') as file:
    rows = csv.DictReader(file)

    # Read the rows while the file is still open.
    for row in rows:
        month = row['date'][:7]
        monthly_profit[month] += (
            int(row['num_sold']) * float(row['retail_price'])
            - int(row['num_purchased']) * float(row['wholesale_price'])
        )

print('Profit per month: ')
for month in sorted(monthly_profit):
    print(f'{month} ${monthly_profit[month]:,.2f}')
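To try this loop without the repository checked out, the following sketch runs the same logic over a few made-up rows held in memory. The values are invented for illustration and chosen so the monthly totals are easy to verify.

```python
import csv
import io
from collections import defaultdict

# Made-up rows standing in for ledger.csv (same column names).
sample_csv = io.StringIO(
    'date,produce,wholesale_price,num_purchased,num_sold,retail_price\n'
    '2025-01-04,apples,0.50,100,90,1.00\n'
    '2025-01-18,bananas,0.25,200,180,0.50\n'
    '2025-02-01,cherries,2.00,50,50,4.00\n'
)

# Missing months start at 0.0, so += works on the first row of each month.
monthly_profit = defaultdict(float)

for row in csv.DictReader(sample_csv):
    month = row['date'][:7]  # '2025-01-04' -> '2025-01'
    monthly_profit[month] += (
        int(row['num_sold']) * float(row['retail_price'])
        - int(row['num_purchased']) * float(row['wholesale_price'])
    )

for month in sorted(monthly_profit):
    print(f'{month} ${monthly_profit[month]:,.2f}')
```

The two January rows contribute 40 each and the February row contributes 100, giving 80 for 2025-01 and 100 for 2025-02.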

The Pandas approach is to make sure the date type is exactly what you expect using the parse_dates and date_format parameters:

>>> import pandas as pd

>>> df = pd.read_csv(
...     'ledger.csv',
...     parse_dates=['date'],
...     date_format={'date':...
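As a separate, self-contained sketch (not the article's own listing), here is one way monthly totals could be computed once the date column holds real dates. It assumes a DataFrame that already has a profit column, uses made-up rows, and converts the dates with pd.to_datetime rather than at read time. Grouping on dt.to_period('M') is one option among several.

```python
import pandas as pd

# Made-up rows; the profit values are invented for illustration.
df = pd.DataFrame({
    'date': ['2025-01-04', '2025-01-18', '2025-02-01'],
    'profit': [40.0, 40.0, 100.0],
})

# Turn the text dates into datetimes so calendar operations are available.
df['date'] = pd.to_datetime(df['date'], format='%Y-%m-%d')

# Group rows by calendar month and add up the profit within each month.
monthly_profit = df.groupby(df['date'].dt.to_period('M'))['profit'].sum()

print('Profit per month: ')
for month, profit in monthly_profit.items():
    print(f'{str(month)} ${profit:,.2f}')
```

Compared with the defaultdict loop, the grouping key comes from the parsed date rather than from string slicing, so it does not depend on the first seven characters happening to be the month.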
