Overview
I don’t use Twitter, but a lot of people do: globally, there are about 6,000 tweets per second. So I decided to make a visualization of tweets over time, using their geolocation tags to plot them on a map.
First things first: how do I scrape data from Twitter? It took a little digging, but I found the excellent streamR package, which interfaces with the Twitter API. Twitter’s API lets you sample the stream of public tweets in real time, and streamR provides some very useful functions that let you filter that stream by different variables (like location or username), as well as parse the tweets you get into a tidy data frame.
Setup
I sampled the Twitter stream for 7 hours, from about 6 am Eastern to 10 am Pacific, and recorded only those tweets that were both public and located within the US mainland (more or less). The goal was to see how the concentration of tweets in the western part of the US increases as people start to wake up. (Thank you to my good friend, Alex, for giving me that idea!)
The way Twitter filters for location is by using latitude and longitude coordinates to draw a rectangle around a given area on a map. Because the US isn’t perfectly rectangular, it did pull some data from Canada and Mexico. It is possible to filter these data out after the fact, but for my purposes I decided to leave them in.
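To see why Canadian and Mexican tweets leak in, here is a minimal base R sketch of a point-in-rectangle test against the same bounding box used in the Methods section. It only models the simple case of a tweet with exact coordinates; as I understand Twitter’s streaming documentation, a tweet with only a place attached matched if that place’s own bounding box overlapped the filter box, which lets in even more edge cases. The helper `in_bbox` is hypothetical, and the city coordinates are approximate.

```r
# Bounding box from the Methods section: SW lon, SW lat, NE lon, NE lat
usa <- c(-124.848974, 24.396308, -66.53076, 49.23037)

# Hypothetical helper: TRUE when a point falls inside the rectangle
in_bbox <- function(lon, lat, box) {
  lon >= box[1] & lon <= box[3] & lat >= box[2] & lat <= box[4]
}

# Approximate city coordinates (longitude, latitude)
cities <- data.frame(
  name = c("Chicago", "Toronto", "Monterrey", "Mexico City"),
  lon  = c(-87.63, -79.38, -100.32, -99.13),
  lat  = c(41.88, 43.65, 25.69, 19.43)
)
cities$in_box <- in_bbox(cities$lon, cities$lat, usa)
cities
```

Toronto and Monterrey pass the rectangle test even though they are outside the US, while Mexico City falls below the box’s southern edge. Removing points like these would take a polygon test against real borders (for example with the maps package), which is the after-the-fact filtering I chose not to do.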
In the end, I captured 365,530 tweets in the seven-hour window (7 hours and 5 minutes, or 425 minutes, of streaming), with an average of 860 tweets per minute and a range of 207 to 2,087 tweets per minute. Again, this is only a subset of a random sample of all the public tweets sent from the US mainland.
Data Visualization
Here’s what the map looks like!
The red lines indicate the median longitude and latitude values for each timepoint. Overall, there appears to be more Twitter activity in the eastern part of the US than in the western part, but this could be due to both population and time of day.
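The red crosshair in each frame is just the per-coordinate median of that minute’s tweets. As a base R sketch with made-up toy coordinates (not data from this post), all the per-minute medians can be computed in a single `aggregate()` call rather than one frame at a time:

```r
# Toy tweet coordinates (made up), tagged with the minute they were sent
tweets <- data.frame(
  minute = c("06:01", "06:01", "06:01", "06:02", "06:02"),
  lon    = c(-74.0, -87.6, -118.2, -80.2, -122.4),
  lat    = c(40.7, 41.9, 34.1, 25.8, 37.8)
)

# Median longitude and latitude for each minute (the red crosshair)
medians <- aggregate(cbind(lon, lat) ~ minute, data = tweets, FUN = median)
medians
```

Because longitude and latitude get separate medians, the crosshair’s intersection is not necessarily the location of any actual tweet; it is a rough “center of activity” that shifts west as the western states wake up.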
Methods
Before being able to pull data from Twitter, I first had to create an app at apps.twitter.com. Then I used the ROAuth package and provided my app’s consumer key and consumer secret to establish the connection between R and my app. The code for how to do this is located in the documentation for the streamR package.
Sampling the Twitter Stream
I used the filterStream function in streamR and gave the locations parameter the coordinates of the bounding box for the US mainland.
library(streamR)
library(ROAuth)
# AUTHENTICATE FIRST; CODE NOT SHOWN HERE
# Bounding box for the US mainland, given as
# SW longitude, SW latitude, NE longitude, NE latitude
usa <- c(-124.848974, 24.396308, -66.53076, 49.23037)
# 7 hours + 5 minutes = 25,500 seconds
filterStream(file.name = "sunriseTrack-7h.json", oauth = my_oauth,
             locations = usa, timeout = 25500)
Data Processing
I used the parseTweets function to create a tidy data frame of the tweets, and then did some manipulation to get the information I needed to make the plots.
library(dplyr)
library(lubridate)
parsed <- parseTweets("sunriseTrack-7h.json")
parsed <- as_tibble(parsed)  # tbl_df() is deprecated in current dplyr
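# Grab the columns we want, drop rows with no timestamp, and convert types
locations <- select(parsed, created_at, place_lat, place_lon, lat, lon) %>%
  filter(!is.na(created_at)) %>%
  # created_at looks like "Wed Aug 27 13:08:45 +0000 2008"; without the
  # trailing %Y, the year would silently default to the current year
  # (%a and %b also assume an English LC_TIME locale)
  mutate(created_at = as.POSIXct(created_at,
                                 format = "%a %b %d %H:%M:%S %z %Y",
                                 tz = "UTC")) %>%
  rename(time = created_at) %>%
  # One label per minute, in Eastern time
  mutate(time.by.min = format(time,
                              format = "%Y-%m-%d %H:%M %Z",
                              tz = "America/New_York"))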
# Some of the place_lat and place_lon values are NA, but there are entries
# for lat and lon. So, copy the lat/lon values to place_lat and place_lon
naInds <- which(is.na(locations$place_lat) & is.na(locations$place_lon))
locations$place_lat[naInds] <- locations$lat[naInds]
locations$place_lon[naInds] <- locations$lon[naInds]
# Count the tweets per minute and exclude the first minute, since it's not a full 60 seconds
tweets.by.minute <- group_by(locations, time.by.min) %>%
  summarize(entries = n()) %>%
  arrange(time.by.min) %>%
  slice(2:n())
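For readers without the original JSON file, here is a self-contained base R sketch of the same parse, bucket, and count steps on made-up toy timestamps in Twitter’s `created_at` format. It also shows how summary figures like the average and range of tweets per minute could be computed from the counts; it is an illustration, not the exact code behind the numbers in the Setup section.

```r
# Toy created_at strings in Twitter's format (made up for illustration)
created_at <- c("Sat Jan 01 11:00:30 +0000 2000",
                "Sat Jan 01 11:01:05 +0000 2000",
                "Sat Jan 01 11:01:30 +0000 2000",
                "Sat Jan 01 11:01:55 +0000 2000",
                "Sat Jan 01 11:02:10 +0000 2000",
                "Sat Jan 01 11:03:10 +0000 2000",
                "Sat Jan 01 11:03:20 +0000 2000")

# %a and %b match English day/month names, so parse in the "C" locale
old.locale <- Sys.getlocale("LC_TIME")
invisible(Sys.setlocale("LC_TIME", "C"))
# The trailing %Y matters: without it, the year defaults to the current one
time <- as.POSIXct(created_at, format = "%a %b %d %H:%M:%S %z %Y", tz = "UTC")
invisible(Sys.setlocale("LC_TIME", old.locale))

# One key per minute in Eastern time, like time.by.min above
time.by.min <- format(time, format = "%Y-%m-%d %H:%M %Z", tz = "America/New_York")

# Count tweets per minute; table() sorts these keys chronologically
per.minute <- table(time.by.min)
per.minute <- per.minute[-1]      # drop the first, partial minute
counts <- as.integer(per.minute)

mean(counts)    # average tweets per minute
range(counts)   # fewest and most tweets in any one minute
```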
Creating the Animation
I used the animation package to create the GIF. For each frame of the GIF, I plotted the coordinates of each tweet from a particular minute. I also added a “cool down” period to make the animation a little cleaner: in every frame after the first, the points from the previous minute stay on the map but fade (alpha 0.2), while the current minute’s points are drawn at alpha 0.7.
I set the interval to 0.125 seconds, giving a frame rate of 8 fps. I uploaded the GIF to Gfycat, so you can change the speed as needed.
library(ggplot2)
library(animation)  # saveGIF() also needs ImageMagick installed
library(maps)
statemap <- map_data("state")
saveGIF({
  for (i in seq_len(nrow(tweets.by.minute))) {
    cur.min <- tweets.by.minute$time.by.min[i]
    if (i > 1) {
      # "Cool down": redraw the previous minute's points, but faded
      prev.min <- tweets.by.minute$time.by.min[i - 1]
      df <- filter(locations, time.by.min %in% c(prev.min, cur.min)) %>%
        arrange(time.by.min) %>%
        mutate(alphaVal = ifelse(time.by.min == cur.min, 0.7, 0.2))
    } else {
      df <- filter(locations, time.by.min == cur.min) %>%
        mutate(alphaVal = 1)
    }
    # Median position of the current minute's tweets only
    cur.rows <- df$time.by.min == cur.min
    medPosn <- data.frame(lat = median(df$place_lat[cur.rows], na.rm = TRUE),
                          lon = median(df$place_lon[cur.rows], na.rm = TRUE))
    p1 <- ggplot(data = statemap, aes(x = long, y = lat, group = group)) +
      geom_polygon(fill = "#55acee", color = "white") +
      coord_fixed(1.3, xlim = c(-130, -60), ylim = c(20, 50)) +
      geom_point(data = df, aes(x = place_lon, y = place_lat),
                 alpha = df$alphaVal, size = 3, inherit.aes = FALSE,
                 color = "#292f33") +
      geom_point(data = medPosn, aes(x = lon, y = lat),
                 color = "red", size = 3, inherit.aes = FALSE) +
      # ggplot2 3.4.0 uses linewidth (formerly size) for line thickness
      geom_vline(xintercept = medPosn$lon, color = "red", linewidth = 0.5) +
      geom_hline(yintercept = medPosn$lat, color = "red", linewidth = 0.5) +
      annotate("text", label = cur.min, x = -95, y = 20, size = 8) +
      theme_void() +
      theme(legend.position = "none")
    print(p1)
  }
}, interval = 0.125, ani.width = 1028, ani.height = 514,
   movie.name = "sunrise-tweets-7h-8fps-2.gif")
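Stripped of the plotting, the cool-down rule in the loop above is a small mapping from each point’s minute to an alpha value. Here is a standalone base R sketch of that rule; `cooldown_alpha` is a hypothetical helper (not part of the animation package), and `NA` stands for points from older minutes, which the loop simply leaves out of the frame. The very first frame of the GIF uses alpha 1 instead, which this helper does not model.

```r
# Hypothetical helper: alpha for each point in a frame, mirroring the loop above
cooldown_alpha <- function(minute.key, current, previous = NA) {
  alpha <- rep(NA_real_, length(minute.key))  # NA: point is not drawn
  alpha[minute.key %in% previous] <- 0.2      # previous minute: faded
  alpha[minute.key == current] <- 0.7         # current minute: prominent
  alpha
}

keys <- c("06:00", "06:01", "06:01", "06:02")
cooldown_alpha(keys, current = "06:01", previous = "06:00")
# returns c(0.2, 0.7, 0.7, NA)
```

Keeping only one previous minute on screen is what makes the animation read as a moving wave rather than a steadily accumulating blob.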