Ukraine: Twitter analysis (Twint/Gephi tutorial)

  • Warning: this article is the sole responsibility of its author. OSINT-FR cannot be held responsible for any changes to the tools described in this guide.

In this tutorial, I will show you how to build a relationship graph from extracted tweets.

The data are available on the OSINT-FR GitHub.

Prerequisite installation:

I recommend installing the twint module with this command:



Step 1: Collecting tweets

For example, run the following command:

Here is the command manual: https://github.com/twintproject/twint/wiki/Basic-usage

This produces a CSV file with the following columns:

id, conversation_id, created_at, date, time, timezone, user_id, username, name, 
place, tweet, language, mentions, urls, photos, replies_count, retweets_count, 
likes_count, hashtags, cashtag, link, retweet, quote_url, video, thumbnail,
near, geo, source, user_rt_id, user_rt, retweet_id, reply_to, retweet_date, 
translate, trans_src, trans_dest

There is often a point where scraping stops and twint cannot go any further back in time.

Here we get a 1.3 MB file containing 2,195 tweets. When you open the CSV file in Excel or LibreOffice, select only the tab separator. The easiest way I found to get Python to read this CSV was to save it in XLSX format.

Python can then read it with this code:

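As a minimal sketch, assuming pandas is installed: since this export uses tab separators, `pd.read_csv` with `sep="\t"` can also read it directly, without the XLSX round trip. The helper name `load_tweets` is hypothetical, and the `.xlsx` branch relies on `pd.read_excel`, which needs the optional openpyxl package.

```python
import pandas as pd


def load_tweets(path):
    """Load a twint export into a DataFrame (hypothetical helper).

    Files ending in .xlsx go through pd.read_excel (needs openpyxl);
    anything else is read as tab-separated text, as in this export.
    """
    if path.lower().endswith(".xlsx"):
        return pd.read_excel(path)
    # dtype=str keeps ids as text instead of converting them to numbers
    return pd.read_csv(path, sep="\t", dtype=str)
```

For example, `tweets = load_tweets("tweets.csv")` (a hypothetical file name) loads the export, and `len(tweets)` gives the number of tweets collected.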

Step 2: Clean up the file

For our relationship graph, the nodes are the users (we will use the username column) and the links are the mentions, so we do not need the whole file. With this Python script, we keep only the tweets that contain a mention and extract the usernames, so that the result is easy to read in the next step.

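A minimal sketch of this filtering step, assuming pandas and assuming that the `mentions` column holds a Python-style list literal of either plain usernames or dicts with a `screen_name` key (an assumption about the export format, so check a few rows of your own file). It produces one row per (author, mentioned user) pair, so each cell holds a single name; usernames are lowercased because Twitter handles are case-insensitive. `parse_mentions` and `mention_edges` are hypothetical names.

```python
import ast

import pandas as pd


def parse_mentions(cell):
    """Return the lowercased usernames found in one 'mentions' cell.

    Assumption: the cell is a Python-style list literal of usernames or of
    dicts with a 'screen_name' key; empty or unparseable cells give [].
    """
    if not isinstance(cell, str) or not cell.strip():
        return []
    try:
        items = ast.literal_eval(cell)
    except (ValueError, SyntaxError, TypeError):
        return []
    if not isinstance(items, (list, tuple)):
        return []
    names = []
    for item in items:
        if isinstance(item, dict):
            item = item.get("screen_name")
        if isinstance(item, str) and item:
            names.append(item.lstrip("@").lower())
    return names


def mention_edges(tweets):
    """Keep tweets with at least one mention: one row per (author, mentioned user)."""
    rows = [
        (str(author).lower(), mentioned)
        for author, cell in zip(tweets["username"], tweets["mentions"])
        for mentioned in parse_mentions(cell)
    ]
    return pd.DataFrame(rows, columns=["username", "mentions"])
```

With `tweets` loaded as in Step 1, `mention_edges(tweets).to_csv("mentions.csv", index=False)` writes a two-column file (hypothetical name) ready for the next step.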

Important note: in this example I make a significant approximation, namely that every mentioned username also belongs to one of the accounts that tweeted. This is not necessarily the case.
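To measure how much this approximation matters on your own data, a small check can split the mention links into those pointing at accounts that tweeted and those pointing elsewhere. This is a sketch assuming a pandas `tweets` DataFrame with a `username` column and an `edges` DataFrame with one mention per row in `username`/`mentions` columns; `check_mention_coverage` is a hypothetical name.

```python
def check_mention_coverage(tweets, edges):
    """Split mention links by whether the mentioned account is among the authors.

    tweets: DataFrame with a 'username' column (all collected tweets)
    edges: DataFrame with 'username' and 'mentions' columns, one mention per row
    Returns (inside, outside) DataFrames.
    """
    # compare in lowercase, since Twitter handles are case-insensitive
    authors = set(tweets["username"].astype(str).str.lower())
    inside = edges["mentions"].isin(authors)
    return edges[inside], edges[~inside]
```

`len(outside) / len(edges)` then gives the share of links that break the approximation; keeping only `inside` is one way to enforce it.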

Step 3: Create the Gephi file

Go to https://medialab.github.io/table2net/ and load the CSV file from the previous step.

Check visually that the CSV file has been imported correctly.

Settings:

  • Type of network: Normal (one type of node)
  • Nodes: username (some usernames should appear below)
  • Links: mentions (the mentioned usernames should appear below)
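If you prefer to skip the web tool, here is a sketch of an alternative (not what table2net does internally): aggregate the (author, mentioned user) pairs into a weighted edge table with the `Source`, `Target` and `Weight` headers that Gephi's spreadsheet import recognises, where `Weight` counts how often one account mentioned another. It uses only the standard library; `write_gephi_edge_list` is a hypothetical name, and dropping self-mentions is an assumption of this sketch.

```python
import csv
from collections import Counter


def write_gephi_edge_list(pairs, path):
    """Write (author, mentioned) pairs as a weighted Source/Target/Weight CSV.

    Weight counts how many times Source mentioned Target. Self-mentions are
    dropped (assumption). Returns the Counter so totals can be inspected.
    """
    counts = Counter((src, dst) for src, dst in pairs if src != dst)
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["Source", "Target", "Weight"])
        for (src, dst), weight in sorted(counts.items()):
            writer.writerow([src, dst, weight])
    return counts
```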

Step 4: Setting up Gephi

Open the previous file in Gephi.

Tab: Overview

As there are not many nodes, there is no need to filter out those with few links. We go straight to checking whether there are any communities.

Click on Modularity to compute the statistic, then click Close.
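Gephi's Modularity statistic searches for a good partition with the Louvain method (Blondel et al.); the score it maximises is Newman's modularity, Q = sum over communities c of [L_c / m - (d_c / 2m)^2], where m is the number of edges, L_c the number of edges inside community c and d_c the total degree of its nodes. The sketch below only scores a given partition of an undirected, unweighted graph: it does not perform the Louvain search and ignores Gephi's edge weights and resolution setting. `modularity` is a hypothetical helper name.

```python
from collections import defaultdict


def modularity(edges, community):
    """Newman modularity Q of a partition of an undirected, unweighted graph.

    edges: (u, v) pairs, each edge listed once, no self-loops (assumption)
    community: dict mapping every node to a community label
    Q = sum over communities c of  L_c / m - (d_c / (2 m)) ** 2
    """
    edges = list(edges)
    m = len(edges)
    if m == 0:
        return 0.0
    internal = defaultdict(int)    # L_c: edges with both ends in c
    degree_sum = defaultdict(int)  # d_c: sum of the degrees of c's nodes
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree_sum[cu] += 1
        degree_sum[cv] += 1
        if cu == cv:
            internal[cu] += 1
    return sum(internal[c] / m - (degree_sum[c] / (2 * m)) ** 2 for c in degree_sum)
```

For instance, two triangles joined by a single edge, split into their two natural communities, score 5/14 (about 0.36), whereas putting every node into one community scores 0.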

In the left-hand tab, we use this statistic via Nodes / Partition / Modularity Class, which colours each community differently. Then click Apply.

Finally, set the size of the nodes with the Size button on the right, then Nodes, then Ranking, then Apply.
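A ranking of this kind maps a numeric node attribute onto a size range; a linear mapping is assumed here. The original does not say which attribute was ranked, so this sketch uses in-degree (how many times an account is mentioned) as an assumption, with an illustrative range of 10 to 50. `ranking_sizes` is a hypothetical name, and this is not Gephi's own code.

```python
from collections import Counter


def ranking_sizes(edges, min_size=10.0, max_size=50.0):
    """Map each node's in-degree linearly onto [min_size, max_size].

    edges: (source, target) pairs, e.g. (author, mentioned account).
    If every node has the same in-degree, all nodes get min_size.
    """
    edges = list(edges)
    nodes = {node for edge in edges for node in edge}
    if not nodes:
        return {}
    in_degree = Counter(target for _, target in edges)
    values = {node: in_degree[node] for node in nodes}
    low, high = min(values.values()), max(values.values())
    span = high - low
    return {
        node: min_size if span == 0 else min_size + (v - low) / span * (max_size - min_size)
        for node, v in values.items()
    }
```

For example, with links a→b, c→b and a→c, account b (mentioned twice) gets size 50, c gets 30 and a gets 10.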

Now let Gephi compute a layout that arranges the nodes legibly.

In the left-hand tab, choose a spatialization: ForceAtlas 2. If you let it run without changing any parameters, you get this:

The nodes must be brought closer together and prevented from overlapping:

There are not many links in this example, and several users seem to be important as well.

Go to the Preview tab to see the labels.

If we check Show Labels and set a size of 5, we get this graph:

I reduced the maximum node size to 50 and refreshed the preview:

You can save the image as PNG, SVG or PDF. The last two have the advantage of keeping the usernames as text.

BONUS:

I collected the last 75,000 tweets returned by the command (which gives tweets from 5 March 19:54 to 8 March 00:00):


By extracting the interesting tweets (i.e. those with mentions), we get about 3,000. We see distinct communities, no influencers that overwhelm the rest, and some accounts that link the communities together.
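To list the accounts that make the links, one option is to take the community assigned to each account (for example the Modularity Class column, which can be exported from Gephi's Data Laboratory; treat this as an assumption about your workflow) and flag accounts whose mention links touch two or more communities. This is a stdlib-only sketch, and `bridging_accounts` is a hypothetical name.

```python
from collections import defaultdict


def bridging_accounts(edges, community, min_communities=2):
    """Return accounts whose mention links touch at least `min_communities` communities.

    edges: (author, mentioned account) pairs
    community: dict mapping each account to its community label
    Pairs with an endpoint missing from `community` are ignored; an
    account's own community counts as one of those it touches.
    """
    touched = defaultdict(set)
    for src, dst in edges:
        if src in community and dst in community:
            touched[src].add(community[dst])
            touched[dst].add(community[src])
    for node, labels in touched.items():
        labels.add(community[node])
    return sorted(n for n, labels in touched.items() if len(labels) >= min_communities)
```

On a chain a→b→c→d where a and b sit in one community and c and d in another, only b and c are returned, since they carry the link between the two groups.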