In this scrape function, I add headers to the GET request so that the genius.com servers treat it as a genuine browser request rather than a scraper. Once I get the response, I use Beautiful Soup to parse the HTML.
While parsing with Beautiful Soup, I noticed that I could only reliably get lyrics that were annotated, since those are the ones wrapped in <a> tags.
%matplotlib inline
import urllib2
from BeautifulSoup import BeautifulSoup

def scrape(url):
    # add headers since genius.com blocks requests with no headers
    header = {'User-Agent': 'Mozilla/5.0'}
    req = urllib2.Request(url=url, headers=header)
    html = urllib2.urlopen(req)
    soup = BeautifulSoup(html)
    lyrics = ''
    # lyrics are all in <a> tags with class = 'referent'
    # note that this only works with lyrics that are annotated.
    for row in soup('a', {'class': 'referent'}):
        text = ''.join(row.findAll(text=True))
        data = text.strip() + '.\n'
        lyrics += data
    return lyrics

url = 'http://genius.com/Lil-wayne-go-dj-lyrics'
text = scrape(url)
print text
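
To see why only annotated lyrics come back, here is a minimal offline sketch of the same idea: keep the text inside `<a class="referent">` elements and ignore everything else. It is not the code used above. It assumes Python 3 and swaps Beautiful Soup for the standard library's `html.parser` so it runs without extra packages. The names `ReferentTextParser` and `extract_annotated_lines`, and the sample HTML, are made up for illustration and are not taken from genius.com.

```python
from html.parser import HTMLParser


class ReferentTextParser(HTMLParser):
    """Collect the text found inside <a class="referent"> elements."""

    def __init__(self):
        super().__init__()
        self.depth = 0      # > 0 while we are inside a referent link
        self.current = []   # text pieces of the link being read
        self.lines = []     # one entry per referent link

    def handle_starttag(self, tag, attrs):
        if tag != 'a':
            return
        classes = (dict(attrs).get('class') or '').split()
        # Track nested <a> tags too, so the matching </a> is counted correctly.
        if self.depth or 'referent' in classes:
            self.depth += 1

    def handle_endtag(self, tag):
        if tag == 'a' and self.depth:
            self.depth -= 1
            if self.depth == 0:
                self.lines.append(''.join(self.current).strip())
                self.current = []

    def handle_data(self, data):
        # Text outside a referent link (un-annotated lyrics) is dropped here.
        if self.depth:
            self.current.append(data)


def extract_annotated_lines(html):
    """Return the text of every referent link in the given HTML string."""
    parser = ReferentTextParser()
    parser.feed(html)
    parser.close()
    return parser.lines


# Hypothetical markup: two annotated lines and one plain line between them.
sample = (
    '<div class="lyrics">'
    '<a class="referent" href="/1">First annotated line</a><br>'
    'A line nobody annotated<br>'
    '<a class="referent" href="/2">Second <i>annotated</i> line</a>'
    '</div>'
)
print(extract_annotated_lines(sample))
```

The middle line never appears in the result because it is not inside a referent link, which is the same limitation noted in the comments of `scrape`.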
Once I have the lyrics for the song, I can run a simple sentiment analysis on them, sentence by sentence, using TextBlob. I can also use pytagcloud to create a word map of the most frequently used words in the song.
The results are pretty interesting, especially the word map.
import pandas
from textblob import TextBlob
import matplotlib.pyplot as plt
from pytagcloud import create_tag_image, make_tags
from pytagcloud.lang.counter import get_tag_counts
from IPython.display import Image

# function will analyze the polarity (positive / negative) of each sentence in the passed text and plot it
def get_sentiment_analysis(text):
    blob = TextBlob(text)
    polarity = []
    sentences = []
    sentiment = pandas.DataFrame(columns=['SENTENCE', 'POLARITY'])
    # go through each sentence and get the polarity of the sentence.
    for sentence in blob.sentences:
        polarity.append(sentence.sentiment.polarity)
        sentences.append(str(sentence.raw))
    # add to the data frame
    sentiment['SENTENCE'] = sentences
    sentiment['POLARITY'] = polarity
    # create the figure and plot the polarity column
    plt.figure().set_size_inches(18.5, 10.5)
    plt.plot(sentiment['POLARITY'])
    plt.xlabel('Sentence')
    plt.ylabel('Polarity')
    return plt.show()

get_sentiment_analysis(text)
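
The plot shows the shape of the polarity over the song, but it does not say which sentences sit at the extremes. A small sketch of one way to summarize the two lists built inside `get_sentiment_analysis` follows. TextBlob reports polarity in the range -1.0 to 1.0. The helper `summarize_polarity` is hypothetical and not part of TextBlob, the code assumes Python 3, and the example values are made up rather than real output for this song.

```python
def summarize_polarity(sentences, polarity):
    """Return the mean polarity plus the most negative and most positive sentences."""
    if not sentences or len(sentences) != len(polarity):
        raise ValueError('need two non-empty lists of equal length')
    pairs = list(zip(polarity, sentences))
    lowest = min(pairs, key=lambda pair: pair[0])
    highest = max(pairs, key=lambda pair: pair[0])
    return {
        'mean': sum(polarity) / len(polarity),
        'most_negative': lowest[1],
        'most_positive': highest[1],
    }


# Made-up values for illustration only, not real TextBlob output.
example_sentences = ['line one.', 'line two.', 'line three.']
example_polarity = [0.5, -0.25, 0.0]
summary = summarize_polarity(example_sentences, example_polarity)
print(summary['most_negative'], summary['most_positive'])
```

In the notebook, the same helper could be fed the `sentences` and `polarity` lists if `get_sentiment_analysis` were changed to return them.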

# function will create a word map for a blob of text
def get_wordmap(text):
    tags = make_tags(get_tag_counts(text), maxsize=120)
    create_tag_image(tags, 'lilwayne.png', size=(1300, 1100), fontname='Lobster')
    return Image('lilwayne.png')

get_wordmap(text)
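
The word map is driven by word counts: the more often a word appears, the larger it is drawn. The sketch below approximates that counting step with the standard library so the numbers behind the image can be inspected. It is not pytagcloud's implementation, which does its own tokenizing and stop-word filtering. It assumes Python 3, and `count_words`, the tiny `STOP_WORDS` set and the sample sentence are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stop-word list for the example.
STOP_WORDS = {'a', 'the', 'and', 'i', 'to', 'it', 'in', 'of'}


def count_words(text, min_length=2):
    """Return (word, count) pairs, most frequent first, skipping stop words."""
    words = re.findall(r"[a-z']+", text.lower())
    kept = [w for w in words if len(w) >= min_length and w not in STOP_WORDS]
    return Counter(kept).most_common()


# Made-up sentence, not actual lyrics.
sample_text = 'Go DJ, go DJ. The DJ plays and the crowd plays along.'
print(count_words(sample_text))
```

Printing the top few pairs for the scraped `text` is a quick sanity check that the largest words in the image really are the most frequent ones.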