Signs of Triviality

Opinions, mostly my own, on the importance of being and other things.

Twitter Stats

Updated 2012-06-11: timestamps are UTC, not Pacific

I spend a silly amount of time on Twitter. I also happen to archive my tweets, which isn't as braindead as you might initially think. I've come to realize that being able to grep(1) through my tweets is rather useful as I frequently find myself looking for information that I've shared in this way.

Since Twitter only allows you to retrieve the last 3200 messages, I found it useful to fetch all my messages on a nightly basis and store them locally using my twistory tool:

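#!/bin/sh
# Append everything newer than the last archived tweet to the history file.
FILE="${HOME}/.twitter/history"
# The first field of the last line marks where the previous run stopped.
LAST=$(tail -n 1 "${FILE}" | cut -f1 -d' ')
twistory -l -a "${LAST}" -u jschauma | sort -n >> "${FILE}"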

With this, I now have my last 4305 messages on file (missing about 1.1K messages, since I did not realize the Twitter 3.2K limit until it was too late). And once you have such data on file, what do you naturally do? You start to ask yourself stupid questions. Like "What's my tweeting frequency?". Since twistory(1) stores the date of the tweets, we can easily answer that with a few simple commands.

And so, even though the findings themselves aren't particularly exciting or surprising, they're there, and so they're here, too. Behold.

Tweets by Month

At this point, graphing by year doesn't make a whole lot of sense. I've been on Twitter only since 2010, and naturally the frequency has increased. Let's skip "by year" and look at "by month" instead.

We start by extracting the month from the date on each line of the history file, then sort(1) and uniq(1) the output, and finally use Perl to print a "graph". Each 'x' stands for up to 5 tweets.

<history sed -e 's/.* (20[0-9][0-9]-\([0-9]*\)-.*/\1/' |
        sort | uniq -c |
        perl -le 'while (<>) {
                           m/(.*) (.*)/;
                           printf("$2 %s\n", "x" x (($1 > 5) ? $1/5 : 1)); }'

The output looks like this:

01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
02 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
03 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
04 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
05 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
06 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
07 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
08 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
09 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
11 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
12 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

Looks like I tweet more in May than in other months. Useful.
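Since the same plotting step is reused for the other graphs, it could be wrapped in a small shell function. What follows is only a sketch under a few assumptions: the function name bars is made up, awk stands in for Perl, and the sample counts are invented rather than taken from my history file. It keeps the same scaling rule, one 'x' per 5 tweets with a minimum of one.

```shell
#!/bin/sh
# bars (hypothetical helper): read "uniq -c" style lines ("  COUNT LABEL")
# on stdin and print "LABEL xxxx", one "x" per 5 tweets (rounded down),
# but never fewer than one.
bars() {
        awk '{
                n = int($1 / 5)
                if (n < 1) n = 1
                bar = ""
                for (i = 0; i < n; i++) bar = bar "x"
                print $2, bar
        }'
}

# Invented sample counts for months 01 through 03.
printf '%s\n' '     12 01' '      3 02' '     25 03' | bars
```

With that in place, each pipeline would presumably end in "sort | uniq -c | bars" instead of repeating the Perl one-liner.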

Tweets by Hour of Day

I skipped the results for tweets by day of month (quite an even distribution, not surprisingly), as well as tweets by minute and by second. But let's look at tweets by hour of the day:

<history sed -e 's/.* //' -e 's/:.*//' |
        sort | uniq -c |
        perl -le 'while (<>) {
                           m/(.*) (.*)/;
                           printf("$2 %s\n", "x" x (($1 > 5) ? $1/5 : 1)); }'

The output looks like this:

00 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
01 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
02 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
03 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
04 xxxxxxxxxxxxx
05 x
06 x
07 x
08 x
09 x
10 x
11 x
12 xxxxxxxxx
13 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
14 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
15 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
16 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
17 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
18 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
19 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
20 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
21 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
22 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
23 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

At first glance this isn't very shocking, but I was surprised to find such a large number of tweets after midnight. A fair number of them I'm going to chalk up to having travelled to CA, which would put those tweets three hours earlier in local time (the timestamps here are whatever Twitter hands back as created_at, without looking at any other time-related data).

As I write this, it occurs to me that all the timestamps are probably in Pacific time, meaning they're shifted by three hours. I thought this made sense, but it was then pointed out to me that the timestamps are actually in UTC, which... makes actual sense. So after accounting for the UTC offset, the distribution looks like this:

00 xxxxxxxxxxxxx
01 x
02 x
03 x
04 x
05 x
06 x
07 x
08 xxxxxxxxx
09 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
10 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
11 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
12 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
13 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
14 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
15 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
16 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
17 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
18 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
19 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
20 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
21 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
22 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
23 xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

...which looks a lot more reasonable. Yay, I learned something: all my tweets in my history are in UTC. Exciting.
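The adjustment itself is just modular arithmetic on the hour field. Here's a rough sketch, assuming a fixed offset of -4 hours, which is what the two charts above differ by (the bar at 04 ends up at 00). The name shift_hours is made up, and a fixed offset ignores daylight saving time as well as any travel.

```shell
#!/bin/sh
# shift_hours OFFSET (hypothetical helper): read two-digit UTC hours
# (00-23), one per line, and print each hour moved by OFFSET hours,
# wrapping around midnight.
shift_hours() {
        awk -v off="$1" '{ printf("%02d\n", (($1 + off) % 24 + 24) % 24) }'
}

# Assumed offset: -4 hours from UTC.
# Prints 00, 08, 20 and 19, one per line.
printf '%s\n' 04 12 00 23 | shift_hours -4
```

Dropped into the hour-of-day pipeline between sed and sort, something like "shift_hours -4 | sort | uniq -c" should then produce the shifted chart.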

Tweets by Weekday

Finally, let's see on which weekdays I tweet the most:

<history sed -e 's/.* (\([0-9-]*\) [0-9:]*)/\1/' |
        perl -le 'use Date::Calc qw(Day_of_Week);
                   @days = ( "Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday");
                   while (<>) {
                           m/([0-9]*)-([0-9]*)-([0-9]*)/;
                           print $days[Day_of_Week($1, $2, $3) -1];}' |
        sort | uniq -c |
        perl -le 'while(<>){ m/(.*) (.*)/;  printf("$2 %s\n", "x" x (($1 > 5) ? $1/5 : 1))};'

This gives us:

Monday     xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Tuesday    xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Wednesday  xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Thursday   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Friday     xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Saturday   xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
Sunday     xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx

This isn't really useful, though.
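Date::Calc is not part of the Perl core, so it may need to be installed from CPAN first. As an alternative, here's a sketch that computes the weekday with Sakamoto's algorithm in awk. The function name weekdays is made up, and it assumes input in the YYYY-MM-DD form that the sed expression above extracts.

```shell
#!/bin/sh
# weekdays (hypothetical helper): read YYYY-MM-DD dates on stdin and print
# the name of the weekday for each one.
weekdays() {
        awk -F- '
        BEGIN {
                # Per-month offsets used by the Sakamoto algorithm.
                split("0 3 2 5 0 3 5 1 4 6 2 4", t, " ")
                split("Sunday Monday Tuesday Wednesday Thursday Friday Saturday", name, " ")
        }
        {
                y = $1 + 0; m = $2 + 0; d = $3 + 0
                # January and February count as part of the previous year.
                if (m < 3) y--
                # dow: 0 = Sunday ... 6 = Saturday
                dow = (y + int(y / 4) - int(y / 100) + int(y / 400) + t[m] + d) % 7
                print name[dow + 1]
        }'
}

# Prints Sunday, Monday and Wednesday, one per line.
printf '%s\n' 2012-06-10 2012-06-11 2012-02-29 | weekdays
```

The output can be fed to "sort | uniq -c" just like the Date::Calc version; note that sort(1) orders the names alphabetically, not by day of the week.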


Anyway, so those are my Twitter stats. It's just the kind of thing we geeks do with data when we have it.

June 10th, 2012

