Twitter Sentiment Analysis

Following my last post, Extracting Data from Twitter, we are now ready to play with the tweets we grabbed.

The first thing we will do is authenticate our connection with Twitter.

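# packages used throughout this post
library(twitteR)  # registerTwitterOAuth(), searchTwitter(), twListToDF()
library(plyr)     # laply(), used in score.sentiment
library(stringr)  # str_split(), used in score.sentiment

# load the saved OAuth credentials and register them with twitteR
load("twitCred.RData")
registerTwitterOAuth(twitCred)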

Now we need a way to score how ‘good’ or ‘bad’ each tweet is. This is actually an interesting problem. I will be using a solution adapted from one developed by Jeffrey Breen.

# load files containing positive & negative words
positive.words = scan("positive-words.txt",
					  what='character',
					  comment.char=';')
negative.words = scan("negative-words.txt",
					  what='character',
					  comment.char=';')

# helper function: lower-case x, returning NA instead of failing if tolower() errors
tryTolower = function(x){

	# default return value, used if tolower() fails
	y = NA

	# attempt the conversion, capturing any error instead of stopping
	try_error = tryCatch(tolower(x), error=function(e) e)

	# if not an error, keep the converted text
	if (!inherits(try_error, "error"))
		y = try_error

	return(y)
}

# sentiment score function
score.sentiment = function(sentences, pos.words, neg.words, .progress='none'){

    # We got a vector of sentences.
	# plyr will handle a list or a vector as an "l" for us.
    # We want a simple array ("a") of scores back, so we use
    # "l" + "a" + "ply" = "laply":
    scores = laply(sentences, function(sentence, pos.words, neg.words){

        # Clean Up Sentences With R's Regex-driven Global Substitute, gsub():
        sentence = gsub('[[:punct:]]', '', sentence)
        sentence = gsub('[[:cntrl:]]', '', sentence)
        sentence = gsub('\\d+', '', sentence)

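        # Drop Non-ASCII Characters (sub='' removes them rather than
        # turning the whole sentence into NA), Then Convert to Lower Case.
        # The result must be assigned back, or the case change is lost:
        sentence = iconv(sentence, 'UTF-8', 'ASCII', sub='')
        sentence = tryTolower(sentence)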

        # Split into Words. str_split is in the stringr package
        word.list = str_split(sentence, '\\s+')

        # sometimes a list() is one level of hierarchy too much
        words = unlist(word.list)

        # Compare Our Words to the Dictionaries of Positive & Negative Terms
        pos.matches = match(words, pos.words)
        neg.matches = match(words, neg.words)

        # match() returns the position of the matched term or NA
        # we just want a TRUE/FALSE:
        pos.matches = !is.na(pos.matches)
        neg.matches = !is.na(neg.matches)

        # and conveniently enough, TRUE/FALSE will be treated as 1/0 by sum():
        score = sum(pos.matches) - sum(neg.matches)

        return(score)
    }, pos.words, neg.words, .progress=.progress )

	scores.df = data.frame(score=scores, text=sentences)
	return(scores.df)
}

# function to grab tweets
grab.tweet <- function(search="#rstats", n=10) {

	# check search parameter
	if (!is.character(search)) {
		warning(sprintf("Search argument %s is not character.",search))
	}

	# check n parameter
	if (!is.numeric(n)) {
		warning(sprintf("Number %s is not numeric.",n))
	}

	# get tweets
	tweets <- searchTwitter(search, n=n, cainfo = system.file("CurlSSL", "cacert.pem", package = "RCurl"))

	# put in data frame
	tweets <- twListToDF(tweets)

	# output to temporary file and read back in to weed out erroneous characters
	write.csv(tweets, file="tmp_tweet.csv", row.names=FALSE)
	tweets <- read.csv(file="tmp_tweet.csv")

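	# score tweets
	tweet.scores <- score.sentiment(tweets$text, positive.words, negative.words)

	# scores come back in the same order as the tweets, so attach them directly;
	# merging on text would duplicate rows whenever several tweets share the same text
	tweets$score <- tweet.scores$score
	return(tweets)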
}
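
To make the scoring step concrete, here is a minimal sketch in base R that walks one made-up sentence through the same cleaning, splitting, and matching steps as score.sentiment. The word lists and the sentence are hypothetical stand-ins, not the real dictionary files, and strsplit() is used in place of str_split() so the snippet needs no extra packages.

```r
# hypothetical toy dictionaries, for illustration only
toy.pos = c("great", "love", "win")
toy.neg = c("bad", "lose", "terrible")

# made-up example sentence
sentence = "Great win tonight!!! Love this team, terrible refs though"

# same cleaning steps as in score.sentiment
sentence = gsub('[[:punct:]]', '', sentence)
sentence = gsub('[[:cntrl:]]', '', sentence)
sentence = gsub('\\d+', '', sentence)
sentence = tolower(sentence)

# split on whitespace (base R counterpart of str_split)
words = unlist(strsplit(sentence, '\\s+'))

# match() gives a position or NA; !is.na() turns that into TRUE/FALSE
pos.count = sum(!is.na(match(words, toy.pos)))  # great, win, love
neg.count = sum(!is.na(match(words, toy.neg)))  # terrible

# score is positive matches minus negative matches
toy.score = pos.count - neg.count
print(toy.score)
```

Here three words match the positive list and one matches the negative list, so the sentence scores 2. Note that each word counts equally and negation ("not great") is ignored, which is a known limit of this kind of word-counting approach.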

Now we can load tweets and analyze the sentiment score.

# get 500 latest tweets containing "#heat"
tweets <- grab.tweet(search="#heat", n=500)

# examine data frame with tweets and score
head(tweets)

# tabulate scores
table(tweets$score)

# plot scores and output as png file
require(lattice)
png(file="heat_bar.png", width=1000, height=750)
barchart(as.factor(tweets$score),
		 horizontal=FALSE,
		 col="red",
		 scales=list(x=list(cex=1.5), y=list(cex=1.5)),
		 xlab=list("Score",cex=2),
		 ylab=list("Frequency",cex=2),
		 main=list("Miami Heat Tweet Sentiment",cex=4))
dev.off()

This was run right after the Miami Heat won Game 5 of the Eastern Conference Finals against the Indiana Pacers in the 2013 NBA Playoffs.

[Figure: heat_bar.png, bar chart of tweet score frequencies titled "Miami Heat Tweet Sentiment"]
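
Beyond the raw tabulation, it can help to collapse the scores into negative, neutral, and positive groups. The sketch below does that with a made-up score vector standing in for tweets$score (the values are illustrative, not results from the actual run); the names scores, sentiment, counts, and shares are my own.

```r
# made-up scores standing in for tweets$score
scores = c(2, 0, -1, 1, 0, 3, -2, 0, 1, 1)

# sign() maps negative / zero / positive scores to -1 / 0 / 1
sentiment = factor(sign(scores),
                   levels=c(-1, 0, 1),
                   labels=c("negative", "neutral", "positive"))

# counts per group and their share of all tweets
counts = table(sentiment)
shares = prop.table(counts)

print(counts)
print(round(shares, 2))

# average score across all tweets
print(mean(scores))
```

With real data, replacing the first line with scores = tweets$score should give the same kind of summary for the tweets scored above.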
