Crawling Twitter Data using Rapid Miner for Analytics.

I have been using Rapid Miner for a few university projects and thought I would share some things I learned along the way. This post is just about getting data from specific Twitter accounts.

Installation:

Download Rapid Miner and install it. You can probably do it by yourself; it's a very simple installation.

The Interface:

When you are done setting up Rapid Miner, you will be greeted with this start page.

As you can see, there are a bunch of different options here. For the sake of this tutorial we will use the Blank option.

Our goal is to get Twitter data from the Twitter accounts of specific news publications.

As you can see from the screenshot, there are three Twitter-related operators available out of the box in Rapid Miner. Since we want to crawl user tweets, we will select Get Twitter User Statuses.

We will add three of them, because we want to crawl three accounts, and combine their outputs using the Append operator.
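If you think in code, the Append step is roughly a row-wise concatenation of tables that share the same columns. Here is a minimal Python sketch of that idea. It assumes each operator's output is a list of rows with From-User and Text fields; the function name append_tables and the sample rows are made up for illustration and are not Rapid Miner's own API or real tweets.

```python
# Each list stands in for the output of one Get Twitter User Statuses operator.
# The rows are placeholder data, not real tweets.
nytimes_rows = [{"From-User": "nytimes", "Text": "placeholder status 1"}]
dailystar_rows = [{"From-User": "dailystar", "Text": "placeholder status 2"}]
staronline_rows = [{"From-User": "staronline", "Text": "placeholder status 3"}]


def append_tables(*tables):
    """Stack tables on top of each other, checking they share the same columns."""
    combined = []
    columns = None
    for table in tables:
        for row in table:
            if columns is None:
                columns = set(row)
            elif set(row) != columns:
                raise ValueError("all tables must have the same columns")
            combined.append(row)
    return combined


all_rows = append_tables(nytimes_rows, dailystar_rows, staronline_rows)
print(len(all_rows))  # 3
```

The column check mirrors the reason Append works here: all three operators produce the same kind of table.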

On the right-hand side you will see a connection option. Click the Twitter logo beside it to set up a Twitter account to log in with. That is not the account we will take tweets from, but you need to be logged into an account in order to crawl data.

Now we need to set up the Get Twitter User Statuses operators. Let's start by picking the news outlets we want to work with: for this we will use The New York Times, The Daily Star and The Star.

We will use the username of each of these Twitter accounts. For The New York Times it is nytimes, for The Star it is staronline, and for The Daily Star it is dailystar.

If we select one of the operators, its parameters will show up on the right side of the Rapid Miner dashboard. For the connection, we select the one we added before. For the query type we select name, because we will crawl using the username of the account. We then set the user to our desired account (nytimes, dailystar and staronline, one per operator) and set the query limit to 1000, so each operator will work with 1000 tweets.

Now we will use the filter operator to choose what type of news we want to find. For now we are trying to find the negative news posted recently by these accounts (out of 1000 tweets for each account).
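To make the filtering idea concrete, here is a rough Python sketch of a "text contains any of these words" filter. The word list is hypothetical (this post does not list the exact words used), and filter_negative is just an illustrative name, not a Rapid Miner function.

```python
import re

# Hypothetical word list; swap in whatever negative words you filter on.
NEGATIVE_WORDS = ["dead", "killed", "crash", "attack"]

# One case-insensitive pattern that matches any of the words as a whole word.
NEGATIVE_PATTERN = re.compile(
    r"\b(" + "|".join(map(re.escape, NEGATIVE_WORDS)) + r")\b",
    re.IGNORECASE,
)


def filter_negative(rows):
    """Keep only rows whose Text contains at least one negative word."""
    return [row for row in rows if NEGATIVE_PATTERN.search(row["Text"])]


# Placeholder rows, not real tweets.
sample = [
    {"From-User": "nytimes", "Text": "Two killed in highway crash"},
    {"From-User": "dailystar", "Text": "Local team wins the cup"},
]
negative = filter_negative(sample)
print([row["From-User"] for row in negative])  # ['nytimes']
```

Keyword matching like this is crude: it will miss negative news that uses other words and can flag tweets that only mention a word in passing.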

We are almost done and close to getting our results. If we want to save the result in an Excel file, we can use the Write Excel operator, which will write the result to an Excel file.
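Outside Rapid Miner, a similar export can be sketched in plain Python. Note the swap: this writes a CSV file (which Excel can open) rather than a real .xlsx file, because the Python standard library has no Excel writer. The file name and the function name write_rows_to_csv are assumptions for illustration.

```python
import csv
import os
import tempfile


def write_rows_to_csv(path, rows, columns=("From-User", "Text")):
    """Write the rows to a CSV file with a header line."""
    with open(path, "w", newline="", encoding="utf-8") as handle:
        writer = csv.DictWriter(handle, fieldnames=list(columns))
        writer.writeheader()
        writer.writerows(rows)


# Placeholder row, not a real tweet.
rows = [{"From-User": "nytimes", "Text": "placeholder status"}]

# Hypothetical output location in the system's temp directory.
out_path = os.path.join(tempfile.gettempdir(), "negative_tweets.csv")
write_rows_to_csv(out_path, rows)
```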

In the end the process will look something like this. Now we run it with the blue play button, and it should produce something like this.

The From-User field holds the user accounts and the Text field contains the status. From the number of rows in the result we can now find out how many of the 1000 tweets per account contain the negative words we used as our filter.

Easy, right?

Exercise: Using the Twitter accounts of four news outlets in the United States of America, count the bad news and the good news each one posts and compare the results.

Visualization using Rapid Miner:

Now let's say we want to visualize the result as a pie chart that shows how many of these tweets came from each account.

Let's select our plot type: choose pie chart from the list, then tick the Aggregate data option.

Since we will plot the chart using the number of tweets from the three accounts, we group by From-User and set the Aggregation Function to count, as we are counting the number of tweets.
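What the chart settings compute can be sketched in a few lines of Python: group the rows by From-User, count them, and turn the counts into the percentage each pie slice would cover. The rows below are placeholders, not real results.

```python
from collections import Counter

# Placeholder rows standing in for the filtered tweets.
rows = [
    {"From-User": "nytimes", "Text": "placeholder 1"},
    {"From-User": "nytimes", "Text": "placeholder 2"},
    {"From-User": "dailystar", "Text": "placeholder 3"},
    {"From-User": "staronline", "Text": "placeholder 4"},
]

# Group by From-User with "count" as the aggregation.
counts = Counter(row["From-User"] for row in rows)

# Each slice of the pie is that account's share of all rows, in percent.
total = sum(counts.values())
shares = {user: 100 * n / total for user, n in counts.items()}

print(counts["nytimes"], shares["nytimes"])  # 2 50.0
```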

And we have our pie chart.

Exercise: Using the Twitter pages of KFC and Pizza Hut, plot a pie chart based on the retweets for each account.

Written by Masud Imran

