Scraping Rap Genius Lyrics

In this scrape function, I add headers to the GET request so that the genius.com servers treat it as a genuine browser request rather than a scraper. Once I have the response, I use Beautiful Soup to parse the HTML.
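For readers on Python 3, where urllib2 was folded into urllib.request, the header trick might look like the sketch below. The build_request helper is a hypothetical name used only for illustration, and the block only constructs the request without sending it.

```python
from urllib.request import Request

def build_request(url, user_agent='Mozilla/5.0'):
    """Return a Request that carries a browser-like User-Agent header."""
    # hypothetical helper: builds the request object, does not fetch anything
    return Request(url, headers={'User-Agent': user_agent})

req = build_request('http://genius.com/Lil-wayne-go-dj-lyrics')
# urllib stores header names capitalized, so the lookup key is 'User-agent'
print(req.get_header('User-agent'))
```

Passing req to urllib.request.urlopen would then perform the actual fetch.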

While parsing with Beautiful Soup, I noticed that I could only reliably extract lyrics that were annotated, because those lines are wrapped in <a> element tags.
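To make the limitation concrete, here is a small sketch that pulls text out of <a class="referent"> elements. It is an assumption-laden illustration: it swaps in the standard library's html.parser for Beautiful Soup so it runs without extra installs, the ReferentParser class is a hypothetical name, and the SAMPLE markup is made up rather than copied from genius.com.

```python
from html.parser import HTMLParser

class ReferentParser(HTMLParser):
    """Collect the text inside every <a class="referent"> element."""

    def __init__(self):
        super().__init__()
        self.in_referent = False  # True while inside an annotated link
        self.current = []         # text fragments of the open link
        self.lines = []           # finished lines, one per link

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get('class') or '').split()
        if tag == 'a' and 'referent' in classes:
            self.in_referent = True

    def handle_endtag(self, tag):
        if tag == 'a' and self.in_referent:
            self.lines.append(''.join(self.current).strip())
            self.current = []
            self.in_referent = False

    def handle_data(self, data):
        # text outside annotated links is ignored, as in the notebook
        if self.in_referent:
            self.current.append(data)

# made-up markup: two annotated lines and one plain line
SAMPLE = '''
<p>
  <a class="referent" href="#1">First annotated line</a>
  A line without an annotation
  <a class="referent" href="#2">Second <i>annotated</i> line</a>
</p>
'''

parser = ReferentParser()
parser.feed(SAMPLE)
print(parser.lines)  # the unannotated line is missing from the result
```

The plain line never appears in parser.lines, which mirrors why unannotated lyrics are skipped by the scrape function.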

In [25]:
%matplotlib inline
import urllib2
from BeautifulSoup import BeautifulSoup
In [7]:
def scrape(url):
    # add headers since genius.com blocks requests with no headers
    header = {'User-Agent':'Mozilla/5.0'}
    req = urllib2.Request(url=url, headers = header)
    html = urllib2.urlopen(req)
    soup = BeautifulSoup(html)

    lyrics = ''
    # lyrics are all in <a> tags with class = 'referent'
    # note that this only works with lyrics that are annotated.
    for row in soup('a',{'class':'referent'}):
        text = ''.join(row.findAll(text=True))
        # end each extracted line with a period and a newline
        data = text.strip() + '.\n'
        lyrics += data

    return lyrics
In [12]:
url = 'http://genius.com/Lil-wayne-go-dj-lyrics'

text = scrape(url)

print text
Go DJ, cause that's my DJ
Say go DJ, cause that's my DJ.
Murder 101,.
The hottest nigga under the sun.
I came from under the tummy busting a tommy.
Or come from under your garments, your chest and your arm hit.
Pow! One to the head: now you know he dead.
Now you know I play it like a pro in the game
Naw, better yet a veteran in'th hall of fame.
I got that medicine; I'm better than all the names.
Ay, it's Cash Money Records, man: a lawless gang.
Put some water on the track, Fresh, for all his flame.
Wear a helmet when you bang it, man, and guard your brain.
Cause the flow is spasmatic: what they call insane.
That ain't even a motherfuckin' aim. I get dough boy
And you already know that pimpin'.
18, how I'm living? Young'un, show that Bentley.
Stunna my Pa so you know that's in me.
Gotti my mentor so don't go there with me.
And I move like the Coupe through traffic
Rush hour GT Bent': roof is absent.
Your bitch present with the music blastin'.
And she keep asking, "how it shoot if it's plastic?".
I tell her, "you see if your boy run up".
She sat back and cut the Carter back up, oh fa sho.
Ay Big Mike: they better step they authority up.
Before they step to a sergeant's son. I got army guns.
You niggas never harmin' Young
Fly Wizzy, my opponent's done, I'm done talkin'.
And I ain't just begun,.
I been running my city like Diddy, you chump.
I fly by you in a foreign whip, on the throttle with a model; bony bitch.
Pair of phony tits,.
her hair is long and shit, to her thong and shit.
Birdman, put them niggas in a trash can
Leave em outside of your door: I'm your trash man.
I'm steady lighting up the hash, man
And riding in my Jag, you will need a gas mask, man.
You snakes: stop hiding in the grass
Sooner or later I'll cut it, knock the blades in your ass.
You homo niggas getting AIDS in the ass.
While the homie here trying to get paid in advance.
I'm staying on my grizzy I'mma bona fide hustler.
Play me or play with me, then I'm gonna find your mother.
Niggas wanna eat cause they ain't ate nothing
But niggas wanna leave when you say you out of mustard!.
So I'mma walk into the restaurant alone, leaving out
Leaving behind just residue and bones.
In your residence with Rugers to your dome.
Like, "where the fuck you holding the coke?" Holding your throat, choke!.

Analyzing Rap Genius Lyrics

Once I have the lyrics for the song, I can run a simple sentence-by-sentence sentiment analysis on them using TextBlob. I can also use pytagcloud to create a word map of the most frequently used words in the song.

The results are pretty interesting, especially the word map.
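A word map of this kind is driven by how often each word appears. As a rough illustration of that counting step, the sketch below uses only the standard library. The word_counts helper, its regex tokenizer and the tiny stopword set are assumptions made for illustration, not pytagcloud's own implementation.

```python
import re
from collections import Counter

# illustrative stopword set; a real list would be much longer
STOPWORDS = {'a', 'the', 'and', 'my'}

def word_counts(text, top=3):
    """Return the `top` most frequent non-stopword words with their counts."""
    # lowercase the text and keep runs of letters and apostrophes as words
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(w for w in words if w not in STOPWORDS)
    return counts.most_common(top)

sample = "Go DJ, cause that's my DJ\nSay go DJ, cause that's my DJ."
print(word_counts(sample))
```

In the sample, "dj" occurs four times and comes out on top, which is the kind of word a tag cloud would draw largest.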

In [39]:
import pandas
from textblob import TextBlob
import matplotlib.pyplot as plt
from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts
from IPython.display import Image
In [52]:
# function analyzes the polarity (positive / negative) of each sentence in the passed verse and plots it
def get_sentiment_analysis(text):
    blob = TextBlob(text)
    polarity = []
    sentences = []
    sentiment = pandas.DataFrame(columns=['SENTENCE', 'POLARITY'])
    # go through each sentence and get its polarity
    for sentence in blob.sentences:
        polarity.append(sentence.sentiment.polarity)
        sentences.append(str(sentence.raw))
    # add to data frames
    sentiment['SENTENCE'] = sentences
    sentiment['POLARITY'] = polarity
    # create plot and add dataframe to plot
    plt.figure().set_size_inches(18.5, 10.5)
    plt.plot(sentiment['POLARITY'])
    plt.xlabel('Sentence')
    plt.ylabel('Polarity')
    return plt.show()
In [53]:
get_sentiment_analysis(text)
In [48]:
# function will create a word map for a blob of text
def get_wordmap(text):
    tags = make_tags(get_tag_counts(text), maxsize=120)
    create_tag_image(tags, 'lilwayne.png', size=(1300, 1100), fontname='Lobster')
    return Image('lilwayne.png')
In [49]:
get_wordmap(text)
Out[49]: