The Largest Vocabulary in Hip Hop


The Pudding

BY MATT DANIELS


Rappers, ranked by the number of unique words used in their lyrics

Literary elites love to rep Shakespeare’s vocabulary: across his entire corpus, he uses 28,829 unique words, suggesting he knew over 100,000 words and arguably had the largest vocabulary ever.

I decided to compare this data point against the most famous artists in hip hop. I used the first 35,000 words of each artist’s lyrics. That way, prolific artists, such as Jay-Z, could be compared to newer artists, such as Drake.

[Chart: # of unique words used within each artist’s first 35,000 lyrics. Horizontal axis runs from 2,900 to 6,400 words.]

Notes/sources:

(1)(2) I used the first 5,000 words for 7 of Shakespeare's works: Hamlet, Romeo and Juliet, Othello, Macbeth, As You Like It, Winter's Tale, and Troilus and Cressida. For Melville, I used the first 35,000 words of Moby Dick.

All lyrics are provided by Rap Genius, but are only current to 2012. My lack of recent data prevented me from using quite a few current artists.

This data viz uses code by Amelia Bellamy-Royds in this jsfiddle.

[Chart legend, by region: Southern, Midwest, West Coast, East Coast. Example data point: Puff Daddy, 4,429 unique words used.]

[Chart annotations: Shakespeare (1) would be here (5,170). Moby Dick (2) would be here (6,022).]

(ps. Get this project as a poster on Pop Chart Lab! It includes 40 more rappers in the analysis, including Childish Gambino, 2 Chainz, Immortal Technique, and Kendrick Lamar.)

35,000 words covers 3-5 studio albums and EPs. I included mixtapes if the artist was just short of the 35,000 words. Quite a few rappers don’t have enough official material to be included (e.g., Biggie, Kendrick Lamar). As a benchmark, I included data points for Shakespeare and Herman Melville, using the same approach (35,000 words across several plays for Shakespeare, first 35,000 of Moby Dick).

I used a research methodology called token analysis to determine each artist’s vocabulary. Each word is counted once, so pimps, pimp, pimping, and pimpin are four unique words. To avoid issues with apostrophes (e.g., pimpin’ vs. pimpin), they’re removed from the dataset. It still isn’t perfect. Hip hop is full of slang that is hard to transcribe (e.g., shorty vs. shawty), compound words (e.g., king shit), featured vocalists, and repetitive choruses.
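The counting rule above can be sketched in a few lines of Python. This is a minimal illustration, not the code behind the chart: the function name `unique_word_count`, the lowercasing step, and the letters-and-digits tokenizer are all assumptions. Only the apostrophe removal, the lack of stemming, and the 35,000-word cutoff come from the description.

```python
import re


def unique_word_count(lyrics: str, limit: int = 35_000) -> int:
    """Count distinct words within the first `limit` words of `lyrics`."""
    # Remove straight and curly apostrophes so "pimpin'" and "pimpin" match.
    # Lowercasing is an assumption; the article does not say how case is handled.
    cleaned = re.sub(r"['’‘]", "", lyrics.lower())
    # Assumed tokenizer: runs of letters or digits. No stemming is applied,
    # so "pimp", "pimps", and "pimping" stay separate words.
    tokens = re.findall(r"[a-z0-9]+", cleaned)
    # Only the first `limit` words count, which puts prolific and newer
    # artists on the same footing.
    return len(set(tokens[:limit]))


print(unique_word_count("pimps pimp pimping pimpin pimpin'"))  # 4
```

Under these assumptions a hyphenated word splits into two tokens, and slang spelled two ways (shorty vs. shawty) still counts twice, which matches the caveats above.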

It’s still directionally interesting. Of the 85 artists in the dataset, let’s take a look at who is on top.

#1 - Aesop Rock

When I first published this analysis, I excluded Aesop Rock, figuring he was too obscure. The Reddit hip hop community was in an uproar, claiming Aesop would absolutely be #1. Sure enough, Aesop Rock is well above every artist in my dataset and I was obliged to add him to the chart. In fact, his data point is so far to the right that he should be off the chart (I'm lazy and didn't adjust the scale).

#2, #6, #7, #9, #20, and #23 - wu-tang clan aint nothin ta fuck wit

Wu-Tang Clan at #6 is fucking impressive given that 10 members, with vastly different styles, contribute lyrics equally. Add the fact that GZA, Ghostface, Raekwon, and Method Man's solo works are also in the top 20 – notably, GZA at #2. Perhaps their countless hours of studio time together (and RZA’s mentorship) exposed the rappers to one another’s vocabulary.

Let’s take a deeper look at Wu-Tang’s five studio albums to better understand each member’s contribution. Here's a breakdown of the number and percent of words used by each member.


To understand each rapper's vocabulary (# of unique words) in Wu-Tang's first five albums, I chose a 3,500 word threshold so that each person was on an equal footing. That way, we could include GZA, but unfortunately had to exclude Ol' Dirty Bastard, Cappadonna, and Masta Killa, who have too few verses across Wu-Tang's corpus.
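The per-member comparison can be sketched the same way. This is a rough illustration under stated assumptions, not the code behind the breakdown: `member_breakdown`, the `(member, verse_text)` input shape, and the tokenizer are hypothetical. Only the 3,500-word threshold and the exclusion of members with too few words come from the text.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Assumed tokenizer: lowercase, strip apostrophes, split into words."""
    return re.findall(r"[a-z0-9]+", re.sub(r"['’‘]", "", text.lower()))


def member_breakdown(verses, threshold=3_500):
    """Summarize each member's word count, share, and vocabulary.

    `verses` is an iterable of (member, verse_text) pairs in album order
    (a hypothetical input format).
    """
    words = defaultdict(list)
    for member, text in verses:
        words[member].extend(tokenize(text))

    total = sum(len(tokens) for tokens in words.values())
    report = {}
    for member, tokens in words.items():
        report[member] = {
            "words": len(tokens),
            "share": len(tokens) / total if total else 0.0,
            # Members below the threshold are excluded from the vocabulary
            # comparison, so their unique count is left as None.
            "unique": len(set(tokens[:threshold]))
            if len(tokens) >= threshold
            else None,
        }
    return report
```

Calling `member_breakdown(verses)` returns one entry per member. Anyone with fewer than 3,500 words keeps a word count and share but gets no vocabulary figure, mirroring the exclusions described above.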


U-God and GZA clearly bolster the group’s average. Raekwon and Method Man’s contributions have a lower average compared to other members, but their data points would still exceed those of most artists in hip hop.

#3 - 5 - Kool Keith, Canibus, Cunninlynguists

Moving past Wu-Tang’s dominance, the next three artists are less well known. Of the three, Kool Keith has the most diverse vocabulary. For a taste of his work, check out his album with the largest vocab: Dr. Octagonecologyst. The other two are relatively underground (yet accomplished) acts: Jamaican-born rapper Canibus and southern-based group CunninLynguists.

#14 - 15 - Outkast and E-40

Of course E-40 is in the top 20; he’s credited with inventing a lot of slang. Just a few terms that he’s been responsible for: all good, pop ya collar, shizzle, and you feel me.

At #15, Outkast’s deep vocabulary is definitely a function of their style: frequent use of portmanteau (e.g., ATLiens, Stankonia), southern drawl (e.g., nahmsayin, ery’day), and made-up slang (e.g., flawsky-wawsky).

As expected, other...
