[SentiSteem #1] Rise and fall of Conor McGregor. Sentiment analysis of tweets from 2013 to 2017

sentiment.png

Hello world! Welcome to a report in which I use machine learning to analyze tweets about a specified topic and present the results as a set of easy-to-understand charts. The sentiment analysis algorithm was developed as part of my Master's thesis in 2017/2018.

This report is currently being published exclusively here on Steemit.

text10.png

Parameters

Today's analysis was run on tweets that contain the word "McGregor" and were published between 2013-01-01 and 2017-12-31. The full specification of the data is listed below:

  • Keyword: McGregor
  • From: 2013-01-01
  • To: 2017-12-31
  • Number of analyzed tweets: 60000
  • Language: en
  • Geographical location: Not specified

text01.png

Results

Sentiment

After downloading 60000 tweets published between the specified dates, I ran sentiment analysis on every one of them. The sentiment scores were then aggregated per week and per month to make the time axis less granular, and plotted as the line chart below.
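The aggregation step can be sketched roughly as follows. This is a minimal illustration and not the code used for the report: the sample tweets, the function names and the choice of a plain mean per bucket are my assumptions (the heatmap captions further down do speak of average sentiment, which is why a mean is used here).

```python
from collections import defaultdict
from datetime import date
from statistics import mean

# Hypothetical input: (publication date, sentiment score between 0 and 1).
# These values are made up for illustration, they are not the report's data.
scored_tweets = [
    (date(2016, 10, 3), 0.80),
    (date(2016, 10, 5), 0.70),
    (date(2016, 10, 20), 0.75),
    (date(2017, 8, 26), 0.50),
]


def aggregate(tweets, key_fn):
    """Group scores by key_fn(date) and return the mean score per group."""
    buckets = defaultdict(list)
    for day, score in tweets:
        buckets[key_fn(day)].append(score)
    return {key: mean(scores) for key, scores in sorted(buckets.items())}


def month_key(day):
    return (day.year, day.month)


def week_key(day):
    iso = day.isocalendar()
    return (iso[0], iso[1])  # (ISO year, ISO week number)


monthly = aggregate(scored_tweets, month_key)
weekly = aggregate(scored_tweets, week_key)
```

Each dictionary maps a time bucket to its average score, so either one can be handed to a plotting library to draw the line chart.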

sentiment.png
Sentiment of tweets for keyword "McGregor"

My subjective comment on the chart: I think there's no arguing that the sentiment of the tweets is declining steadily. I personally believe it's half envy, as Conor got rich and successful, and half his trash talk, which got really offensive and over the line in the last couple of years.

Aggregation using heatmaps


A line chart works great for showing the general trend or pattern in the sentiment. We can see the bigger timeframe and estimate the long-term direction. But if you're interested in a particular month or week, it's hard, and in the case of weeks actually impossible, to see the change. Has an athlete put in a great performance in a particular match? Has the brand or company released a new product line? To see such low-level changes, the following 2 heatmaps can be used.
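For a heatmap, the monthly averages have to be rearranged into a grid with one row per year and one column per month. A minimal sketch of that reshaping, assuming the averages are already available as a dictionary keyed by (year, month); the numbers and the name to_grid are hypothetical:

```python
# Hypothetical monthly averages keyed by (year, month), not the real results.
monthly = {
    (2016, 10): 0.78,
    (2016, 11): 0.70,
    (2017, 7): 0.55,
    (2017, 8): 0.50,
}


def to_grid(monthly_scores):
    """Return (years, grid), where grid[i][m - 1] is the score for years[i], month m.

    Months without any tweets are left as None so they can be drawn as empty cells.
    """
    years = sorted({year for year, _ in monthly_scores})
    grid = [
        [monthly_scores.get((year, month)) for month in range(1, 13)]
        for year in years
    ]
    return years, grid


years, grid = to_grid(monthly)

# The extreme cells are the ones the chart captions refer to.
worst = min(monthly, key=monthly.get)
best = max(monthly, key=monthly.get)
```

The same idea works for the weekly heatmap, with week numbers as columns instead of months.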

heatMap.png
Chart shows average sentiment per month where 0.50 is the worst and 0.78 the best achieved score

My subjective comment on the chart: Oh wow, I looove this chart. Look where the lowest and highest sentiment occurred: exactly around his 2 legendary fights. The most positive score is in October 2016, when a huuuge bout between him and Eddie Alvarez was anticipated. It was the first time ever that a UFC fighter could potentially hold 2 belts simultaneously. The worst sentiment can be seen in August 2017. Why? Well, he fought Floyd Mayweather and there were probably many, many tweets saying he will "lose", "get his ass kicked" etc. This chart doesn't lie ;) Another interesting pattern I've noticed is that the sentiment always gets higher one month before his fight, e.g. in 2016 he fought in March, August and November, and the sentiment peaked 3 times: in February, July and October :) It's also worth noticing that with his boxing bout the opposite occurred. Sentiment was bad in the months leading up to the fight, but once it happened, sentiment rose.

heatMapWeekly.png
Chart shows average sentiment per week where 0.50 is the worst and 0.79 the best achieved score

My subjective comment on the chart: This chart basically "zooms" in. The granularity might be a bit too fine. We can clearly see an increase when the May-Mac fight was announced, and the week of his fight against Aldo for the featherweight belt also has a very high sentiment.

Most frequently used words


Another very interesting aspect to look into is which words are used repeatedly, shown here as wordclouds. Even more interesting is comparing two wordclouds generated from different time periods, usually before and after some event or change. If you give this a second thought, the problem is that many short words (like "and", "or", "with" and so on) are used in almost every sentence and would also show up in the wordclouds. To mitigate this, I've removed a list of 153 so-called stopwords. Additionally, I've also removed words typical for this area, listed at the end of the report*.
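The word counting behind such wordclouds can be sketched like this. It is an illustration under assumptions, not the report's code: the stopword set is a tiny stand-in for the real list of 153 words, the domain words are my guess at what "words typical for this area" might include, the sample tweets are invented, and I assume a tweet published on the split date itself counts as "after".

```python
import re
from collections import Counter
from datetime import date

# Tiny stand-ins for the real lists (assumptions, see the text above).
STOPWORDS = {"and", "or", "with", "the", "a", "is", "to"}
DOMAIN_WORDS = {"mcgregor", "conor", "ufc"}
IGNORED = STOPWORDS | DOMAIN_WORDS

SPLIT_DATE = date(2015, 12, 12)

# Invented sample tweets: (publication date, text).
tweets = [
    (date(2015, 7, 11), "Conor McGregor will beat Aldo"),
    (date(2015, 11, 30), "Aldo is the king and McGregor will lose"),
    (date(2017, 8, 20), "McGregor boxing with Mayweather"),
]


def word_counts(texts):
    """Count lowercase words across texts, skipping stopwords and domain words."""
    counts = Counter()
    for text in texts:
        words = re.findall(r"[a-z0-9']+", text.lower())
        counts.update(word for word in words if word not in IGNORED)
    return counts


before = word_counts(text for day, text in tweets if day < SPLIT_DATE)
after = word_counts(text for day, text in tweets if day >= SPLIT_DATE)
```

The two counters hold the word frequencies that determine how big each word appears in the "before" and "after" wordcloud.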

CommonWords.png

Most often used words in tweets containing word "McGregor" before and after 2015-12-12.

My subjective comment on the chart: <3 I totally love data science :D We can very nicely see that before December 2015, two names were often used alongside Conor's: Aldo and Frankie (two featherweight kings). After 2015, when Conor left the division, Nate Diaz and Floyd Mayweather are the big names. It's also interesting to see the word "will" in the "before" wordcloud. I think it comes from tweets talking about Conor as a future champion.

Most frequently used UNIQUE words

As we can see in the previous wordclouds, many words are actually shared by both. That makes sense, as there are many topics which will be forever connected with Conor. But I went one step further and decided to create wordclouds which contain only unique words, i.e. words which don't appear in the opposite wordcloud.
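One way to get such unique words is to compare the most frequent words of both periods and keep only those missing from the other side. A minimal sketch with invented counts; the function name and the top_n cut-off are my assumptions, since the report doesn't say how many words each wordcloud contains:

```python
from collections import Counter

# Invented word counts for the two periods, for illustration only.
before = Counter({"aldo": 5, "will": 4, "fight": 3, "love": 2})
after = Counter({"mayweather": 6, "boxing": 4, "fight": 3, "bottle": 1})


def unique_words(own, other, top_n=100):
    """Keep words from own's top_n that do not appear in other's top_n."""
    other_top = {word for word, _ in other.most_common(top_n)}
    return {
        word: count
        for word, count in own.most_common(top_n)
        if word not in other_top
    }


unique_before = unique_words(before, after)
unique_after = unique_words(after, before)
```

In this toy example "fight" appears in both periods, so it drops out of both results, while the names and topics specific to one period remain.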

UniqueWords.png
Most often used UNIQUE words in tweets containing word "McGregor" before and after 2015-12-12.

My subjective comment on the chart: This chart shows similar results to the previous one, only a bit more amplified. I can clearly see the following points:

  • The biggest change is obviously two words: "boxing" and "suck". There's no doubt these refer to Conor's bout with Mayweather and his skills in it.
  • Positive words such as "truly", "love", "star" and "genius" are very popular during Conor's rise to stardom.
  • When he really became a household name because of the Mayweather boxing match, words like "ESPN" and "payout" appeared, as they're closely connected to boxing and his big payday for the fight.
  • The word "bottle" shows how huge a topic his incident with Nate Diaz was. Conor threw a bottle and a Monster can at Nate.
  • Also huge is the word "refuse", which definitely comes from tweets about Conor being pulled from UFC 200 after he refused to fly over to the US for a press conference.

text10.png

Get your report - Christmas present!


Twitter sentiment analysis reports are sold for quite a few dollars in the world outside of Steemit. I'll be selling these reports here for a much more reasonable price in the near future; I just want to fine-tune the algorithm a bit more. You'll get the report, numbers and charts, and can post it all on your account! But because it's Christmas time, I've decided to gift one report! To qualify, meet these 2 conditions:

  • Comment down the parameters of report you're interested in
  • Resteem this post
  • No upvote required :)
I'll choose the winning entry based on how doable the report is, as tweets can get pretty tricky and don't always make sense :D

Thanks for reading! Matko.

text10.png

