Dollar Value of Personal Data

Personal Data – Median Fair Price

How much is your personal data worth?  Worth, as in – how much should you sell it for?  In dollar terms.

I went looking for attempts to answer this question and didn’t find much.  So I took a shortcut and asked a bunch of people.  Here’s how I set up the survey:

Imagine your friend has just told you about his new job.  He now works for a company that pays people for their personal data.  For example, you would tell the company your name and the name of your favorite TV show, and you would receive a certain amount of money in return.

Your friend is in charge of setting fair prices for each piece of personal data.  He needs advice from you about how much to pay.  For each of the items below, please provide the price you believe would be fair to pay someone to provide that information.

More methodology details are below, but let’s get straight to the results. The table below lists each item of personal data requested and the survey respondents’ median fair price in dollars (see note 1).

Personal Data Median Fair Price
Home or Cell Phone Number $6.25
Home Address (Street Address) $5.00
Name of Employer $5.00
Previous Employers $5.00
Brand Name of Bank Used Most Often $3.00
Brand Name of Credit Card Used Most Often $2.00
Link to (Public) Facebook Profile $2.00
Twitter Username $2.00
Age(s) of Children Living at Home $1.75
Yearly Income $1.50
Make and Model of Car Driven Most Often $1.00
Home Address (City) $1.00
Brand Name of Computer Used Most Often $1.00
Date of Birth (Month, Day and Year) $1.00
Highest Level of Education $1.00
College Major $1.00
Marital Status $1.00
Political Party Affiliation $1.00
Religious Affiliation $1.00
Home Address (State) $1.00
Home Address (Zip Code) $1.00
Gender $0.50
Favorite Book $0.50
Favorite Movie $0.50
Favorite Restaurant $0.50
Favorite Song $0.50
Favorite TV Show $0.50


  • People just don’t want to be bothered.  Phone number and street address are the pieces of personal information held most dear.
  • Employer and previous employer data are curiously highly valued.
  • Twitter and Facebook identities are valued more highly than a number of demographic variables, including annual income.

What Personal Data is it Inappropriate Even to Ask About?

I wanted to give respondents a way to indicate that no price could ever persuade them to part with some piece of data. The option I settled on was a checkbox labeled “None / Inappropriate / Should Not Ask” that could be selected instead of entering a dollar value.

In the table below, I list the personal data labels and the number of respondents who chose “None / Inappropriate / Should Not Ask” instead of entering a dollar value for the item.

Personal Data Marked Inappropriate
Previous Employers 37
Home or Cell Phone Number 36
Home Address (Street Address) 34
Name of Employer 34
Brand Name of Bank Used Most Often 34
Brand Name of Credit Card Used Most Often 30
Link to (Public) Facebook Profile 29
Twitter Username 28
Age(s) of Children Living at Home 22
Date of Birth (Month, Day and Year) 22
Make and Model of Car Driven Most Often 20
Home Address (State) 17
Religious Affiliation 15
Home Address (City) 15
Political Party Affiliation 14
Home Address (Zip Code) 14
Brand Name of Computer Used Most Often 13
Highest Level of Education 7
College Major 7
Marital Status 7
Gender 7
Favorite Book 6
Favorite Movie 6
Favorite Song 6
Favorite TV Show 6
Favorite Restaurant 5
Yearly Income 1

The ordering should look rather familiar: the median-price ordering and the inappropriate-to-ask ordering are almost identical. Spearman’s rank correlation between the two is 0.96. In other words, the items that respondents willing to name a price valued most highly were the same items that other respondents thought should not be for sale at any price.
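For readers who want to run this kind of check themselves, here is a minimal sketch of Spearman’s rank correlation: rank each list (giving tied values the average of their ranks), then take the Pearson correlation of the ranks. The post does not say which tool or tie-handling rule was used, and the example numbers below are hypothetical, not the survey data.

```python
from statistics import mean

def average_ranks(values):
    """Rank values from smallest (1) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # sorted positions i..j hold ranks i+1..j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def pearson(x, y):
    """Pearson correlation (undefined if either list is constant)."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

def spearman(x, y):
    """Spearman's rho: the Pearson correlation of the tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))

# Hypothetical median prices and refusal counts for four items.
prices = [4.0, 2.5, 2.5, 1.0]
refusals = [30, 20, 25, 5]
print(round(spearman(prices, refusals), 3))
```

Averaging ranks for ties matters here because many items share the same median price.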

Survey Methodology

I used Mechanical Turk to field a survey to 104 people, limited to users in the United States. The partial screenshot below shows exactly what survey-takers saw when they began the survey.

Survey Instructions and Example Questions

On average, survey-takers spent just over four minutes answering questions. For each of the 27 items, they either entered a fair-price dollar value or marked the item Inappropriate.

If you are interested in working with the raw fair price response data, please contact me and I will provide it.


This survey should not be considered “scientific.” I did not attempt to obtain a random sample of the human population, or even of the United States population. The sample represents people who use Amazon Mechanical Turk and were willing to take a survey about the dollar value of personal data. How serious a limitation that is, I leave for you to judge.

I specifically asked users to provide “the price you believe would be fair to pay someone” for each item.  I did not ask them what the price would have to be for them to sell their own data.  I purposefully did this to reduce noise due to uniquely personal preferences in the data, but I recognize some might feel it better to ask the price question more directly.


  1. Because answers to these questions are inevitably positively skewed, the median is both nicer to look at and more representative of the actual distribution than the mean. One way to think about the median in this context: at least half of the people surveyed named a price at or below it, so for them it meets or exceeds their asking price.
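A quick illustration of the skew point, using made-up answers rather than the survey’s raw data: one very large answer drags the mean far upward while the median barely moves.

```python
from statistics import mean, median

# Hypothetical fair-price answers (USD) for one item; one respondent names $500.
responses = [0.50, 1.00, 1.00, 2.00, 5.00, 500.00]

m = median(responses)
print(f"mean   = {mean(responses):.2f}")  # pulled up by the single outlier
print(f"median = {m:.2f}")

# Share of respondents whose own answer is at or below the median,
# i.e. for whom the median meets or exceeds their asking price.
share_satisfied = sum(r <= m for r in responses) / len(responses)
print(f"share with asking price <= median: {share_satisfied:.0%}")
```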

Twitterjjj – 180 Million Tweets at a Glance

Twitterjjj is a set of scripts I use to take a first look at what people say when they tweet about a particular topic, brand or person.

I wrote it to a small set of (self-imposed) specifications.

  • Create a 1-page report describing how people discuss a keyword or phrase in their tweets.
  • Ensure the report is easy to read on the web, and that results can be easily downloaded for further analysis.
  • Respond immediately with preliminary results and continue to update as the processor churns through the terabytes.

My corpus of tweets starts in April 2012 and continues to the present. In round numbers, it holds about 180 million tweets, with about 3 million more added every day.

I make many of the reports I’ve run publicly available. For instance, you might discover that people express more negative sentiment than positive when they discuss Republicans on Twitter. Don’t worry, the same is true for Democrats. You might have guessed that, but did you know that tweeters are 3 times more likely to mention “GOP” than “republicans”? Brevity is the soul of Twitter.

Beyond politicians, who occupies a lot of Twitter mindshare? Let’s look at the first three celebrity names that popped into my head: Justin Bieber, Lady Gaga and Kim Kardashian. We’ll measure mindshare in tweets per million: in every one million tweets, how many times does the keyword (here, the celebrity’s name) appear?

Celebrity Tweets per Million
Bieber 908
Gaga 541
Kardashian 116

It looks like Bieber fever is more contagious than the Kardashian cough.
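Tweets per million is simple arithmetic: keyword hits divided by tweets scanned, times 1,000,000. The sketch below keeps running totals, so a preliminary figure is available at any point while the stream is still being processed, in the spirit of the “respond immediately” specification above. It is not Twitterjjj’s code: the `MindshareCounter` class is hypothetical, and counting each tweet at most once per keyword with a case-insensitive substring match is an assumption (a substring match would, for example, also count “gaga” inside a longer word).

```python
class MindshareCounter:
    """Running tweets-per-million figures for a set of keywords (hypothetical helper)."""

    def __init__(self, keywords):
        self.keywords = [k.lower() for k in keywords]
        self.tweets_seen = 0
        self.hits = {k: 0 for k in self.keywords}

    def add(self, tweet_text):
        # Assumption: each tweet counts at most once per keyword (presence, not frequency).
        self.tweets_seen += 1
        text = tweet_text.lower()
        for k in self.keywords:
            if k in text:
                self.hits[k] += 1

    def per_million(self):
        # Preliminary figures: valid at any point while tweets are still arriving.
        if self.tweets_seen == 0:
            return {k: 0.0 for k in self.keywords}
        return {k: n / self.tweets_seen * 1_000_000 for k, n in self.hits.items()}


counter = MindshareCounter(["bieber", "gaga"])
for t in ["Bieber fever!", "lady gaga tonight", "nothing here", "BIEBER and Gaga"]:
    counter.add(t)
print(counter.per_million())  # {'bieber': 500000.0, 'gaga': 500000.0}
```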

Find me on Twitter @jasonjones_jjj.  I’m happy to hear feedback and suggestions for new keywords to explore.

London 2012 Twitter Olympics

The London 2012 Olympics are upon us.  Lots of athletes will be judged, timed and measured for the athletic things they do.  But aren’t the non-athletic things they do much more interesting?  Like tweeting?

No, of course not.  But that’s not going to stop me from holding my very own Twitter Olympics and handing out (virtual) medals for exceptional Twitter performances.

Below is the list of events and the results so far.  I’ll be rolling out more results as the real Olympics go on.

Preview Event: Games-Dropping

  • Who’s most pumped up for the games? In this event, Olympians score one point for every time they have used the words Olympics, Games, or London in their recent tweets.
Olympian Games, Olympics or London in Tweets
@SwissDom 41
@lolojones 37
@NickSymmonds 28

Full Results:  london2012_gamesd (.xlsx)
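A minimal sketch of the Games-Dropping score. The post does not specify the matching rules, so this assumes a case-insensitive whole-word match (so “Games” counts but “videogames” does not) and one point per occurrence rather than one per tweet.

```python
import re

# Whole-word, case-insensitive match for the three event words.
GAMES_WORDS = re.compile(r"\b(olympics|games|london)\b", re.IGNORECASE)

def games_dropping_score(tweets):
    """One point per occurrence of 'Olympics', 'Games' or 'London' across an athlete's tweets."""
    return sum(len(GAMES_WORDS.findall(t)) for t in tweets)

# Hypothetical tweets for one athlete.
tweets = ["Off to London for the Games!", "Olympics training today", "videogames all night"]
print(games_dropping_score(tweets))  # 3
```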

Sexy At-Mention

  • In this event, a sampling of thousands of tweets mentioning Olympic athletes was scored. The winning Olympian was the one with the most co-occurrences of their Twitter handle and the words “sexy,” “hottest,” “beautiful,” “cute,” “handsome,” “pretty,” or “babe.”
Olympian Sexy At-Mentions
@matthew_mitcham 25
@Joeingles7 8
@hopesolo 7

Full Results:  london2012_sexy (.xlsx)
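A sketch of the at-mention scoring, using the word list given above. The matching details are assumptions: handles and words are compared case-insensitively, a tweet scores at most one point per handle it mentions, and handles are stripped out before looking for the words so that a handle containing one of them does not trigger a match.

```python
import re
from collections import Counter

COMPLIMENTS = {"sexy", "hottest", "beautiful", "cute", "handsome", "pretty", "babe"}
HANDLE = re.compile(r"@(\w+)")
WORD = re.compile(r"[a-z']+")

def score_mentions(tweets, olympian_handles):
    """Count tweets in which an Olympian's handle co-occurs with a compliment word."""
    olympians = {h.lower() for h in olympian_handles}
    scores = Counter()
    for tweet in tweets:
        text = tweet.lower()
        words = set(WORD.findall(HANDLE.sub(" ", text)))  # words outside the handles
        if not words & COMPLIMENTS:
            continue
        for handle in set(HANDLE.findall(text)):
            if handle in olympians:
                scores[handle] += 1
    return scores

# Hypothetical tweets.
tweets = [
    "@hopesolo you are so pretty",
    "@hopesolo great save",
    "Cute! @Joeingles7 @hopesolo",
    "@matthew_mitcham handsome dive",
]
scores = score_mentions(tweets, ["hopesolo", "Joeingles7", "matthew_mitcham"])
print(scores.most_common())
```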

The Sesquipedaliathon

  • In this event, scores are awarded based on the average number of syllables per word in the Olympian’s tweets.
Olympian Syllables Per Word Longest Word
@juanmata10 1.76 visitaremos
@Njr92 1.66 spideranderson
@TipsarevicJanko 1.62 pantomime

Full Results:  london2012_sesq
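The post does not say how syllables were counted. A common rough heuristic counts groups of consecutive vowels and drops a silent final “e”; the sketch below uses that heuristic (an assumption, and it miscounts some English words) to compute mean syllables per word. Treating “longest word” as the word with the most characters is also an assumption.

```python
import re

def count_syllables(word):
    """Rough English syllable count: vowel groups, minus a silent final 'e' (heuristic)."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a final 'e' as silent ("pantomime"), but not in "-le" endings ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)

def sesquipedalian_stats(tweets):
    """Mean syllables per word and the longest word (by characters) across tweets."""
    words = [w for t in tweets for w in re.findall(r"[a-z]+", t.lower())]
    if not words:
        return 0.0, ""
    mean_syllables = sum(count_syllables(w) for w in words) / len(words)
    return mean_syllables, max(words, key=len)

print(sesquipedalian_stats(["Sustainability matters", "Go team"]))  # (2.5, 'sustainability')
```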

Most Followed

  • In this event, one point is scored for each Twitter follower.
Olympian Followers
@Njr92 4,953,514
@juanmata10 1,164,329
@DjokerNole 1,111,326

Full Results:  london2012_user_info

Most Followed (by other Olympians)

  • In this event, one point is scored for each Olympian follower.
Olympian Fellow Olympian Followers
@MichaelPhelps 12
@usainbolt 11
@lolojones 9

Full Results:  london2012_degrees

London 2012 Olympians Twitter Follow Network. Arrows point from follower to followee.

Most Follows

  • In this event, one point is scored for each Twitter user the Olympian follows.
Olympian Followees
@officialasafa 3,794
@Njr92 630
@TomDaley1994 542

Full Results: london2012_user_info

Most Follows (of other Olympians)

  • In this event, one point is scored for each fellow Olympian the Olympian follows.
Olympian Fellow Olympians Followed
@ItsStephRice 8
@OscarPistorius 7
@RickyBerens 7
@drewsullivan8 6
@matthew_mitcham 6
@MichaelPhelps 6
@PopsMBonsu 6

Full Results: london2012_degrees

Special Event: Non-Olympian Most Followed by Olympians

  • The only event (so far) in which non-Olympians compete. Medals to those non-Olympians who are followed by the most Olympians.
Non-Olympian Name Olympian Followers
@OMGFacts OMG Facts 10
@SportsCenter SportsCenter 9
@espn ESPN 8
@Sports_Greats Sports Quotes 8

Full Results:  london2012_nonlist_followees

Olympic Followback

  • Most athletes have many more followERS than followEES. In this event, Olympians are scored according to the proportion of their followers whom they follow back.
Olympian Followback Percentage
@drewsullivan8 16.7%
@SmoothKJ88 14.2%
@EricBoateng 10.1%

Full Results:  london2012_followback
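Followback percentage is the share of an athlete’s followers whom the athlete also follows. A minimal sketch using sets of account IDs; the IDs below are hypothetical, and in practice the two sets would be fetched from the Twitter API.

```python
def followback_percentage(followers, following):
    """Percent of an account's followers that the account follows back."""
    if not followers:
        return 0.0
    followed_back = followers & following  # followers who are also followees
    return 100.0 * len(followed_back) / len(followers)

# Hypothetical account IDs.
followers = {"a", "b", "c", "d", "e", "f"}
following = {"a", "x"}
print(f"{followback_percentage(followers, following):.1f}%")  # 16.7%
```

Intersecting sets keeps the calculation linear in the number of accounts, which matters for athletes with millions of followers.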


Olympians on Twitter Olympics

The Olympics bring together the world’s most talented and dedicated athletes. And so does Twitter. As part of my continuing effort to do interesting things with the Twitter API, I decided to create my own Olympics for Olympians on Twitter. Er, yeah, I think that’s right.

To begin with, I created the sociomatrix of Olympian tweeters. A sociomatrix is a table in which every person in a group gets a row and a column. Each cell indicates whether a relationship exists between two people (the row person and the column person): it holds a 1 if the relationship exists and a 0 if it does not.

       Jack  Rose  Cal
Jack   -     1     0
Rose   1     -     0
Cal    0     1     -

Example sociomatrix. The relationship is “row is in love with column,” as in James Cameron’s Titanic.

I created a sociomatrix of Olympians on Twitter in which the relationship was “follows.” Given a sociomatrix, row sums and column sums are usually interesting, quick summaries of the data. In our case, a row sum is the number of Olympians one particular account follows, and a column sum is the number of Olympians following a given account. So, without further ado, let’s get to our first event: Olympian most followed by other Olympians.
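Row and column sums take only a few lines. This sketch builds the Titanic example as a list of lists (a 1 means the row person loves the column person; the diagonal is simply 0) and sums it. With a follow sociomatrix in place of the love one, the same code would give each account’s number of Olympians followed (row sum) and Olympian followers (column sum). The `sociomatrix` helper is a hypothetical name for illustration.

```python
def sociomatrix(people, ties):
    """Build a 0/1 matrix: cell [i][j] is 1 if (people[i], people[j]) is in ties."""
    index = {p: i for i, p in enumerate(people)}
    m = [[0] * len(people) for _ in people]
    for src, dst in ties:
        m[index[src]][index[dst]] = 1
    return m

people = ["Jack", "Rose", "Cal"]
loves = [("Jack", "Rose"), ("Rose", "Jack"), ("Cal", "Rose")]
m = sociomatrix(people, loves)

row_sums = [sum(row) for row in m]        # how many people each person loves
col_sums = [sum(col) for col in zip(*m)]  # how many people love each person
print(dict(zip(people, row_sums)))  # {'Jack': 1, 'Rose': 1, 'Cal': 1}
print(dict(zip(people, col_sums)))  # {'Jack': 1, 'Rose': 2, 'Cal': 0}
```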

Most Followed (by other Olympians)

Medal Olympian Followed By
Gold @BillyDemong 30
Gold @Shaun_White 30
Gold @ApoloOhno 30
Silver @lindseyvonn 28
Silver @emilycook 28
Bronze @GretchenBleiler 25

Do they allow ties in the real Olympics?  Probably not, but since these are virtual gold medals I’m handing out, why not?

You can probably guess the next event, and it would probably be the easiest event to win if you knew it was coming. We know who is followed by the most Olympians, but who does the most following?

Most Follows (of Olympians)

Medal Olympian Follows
Gold @emilycook 73
Silver @StevenHolcomb 34
Bronze @TFletchernordic 32

The Sesquipedaliathon

Medal Olympian Syllables per word Longest word
Gold @LMCHOLEWINSKI 1.88   obesity
Silver @AngelaRuggiero 1.58   sustainability
Bronze @Pchiddy 1.57   anniversary

In the sesquipedaliathon, Olympians compete on their vocabularies.  Tweeters are ranked by the mean number of syllables in the words in their tweets.  Polysyllabic expressions win out over short words.

Sesquipedalian tweets may be the mark of a skilled wordsmith discussing a complex topic, or they may be the result of needless pretentiousness.  Syllables per word is one component of the Flesch-Kincaid readability scale.  According to the Flesch-Kincaid scale, the more syllables-per-word one uses, the more sophisticated the writing (or the less readable the text, depending on how you want to look at it).
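For reference, here is the standard Flesch-Kincaid grade-level formula, which shows how the syllables-per-word term enters. Holding sentence length fixed, each additional 0.1 syllables per word raises the grade level by 1.18, so the gap between the gold (1.88) and bronze (1.57) medalists below works out to roughly 3.7 grade levels.

```latex
\mathrm{FKGL} = 0.39\left(\frac{\text{words}}{\text{sentences}}\right) + 11.8\left(\frac{\text{syllables}}{\text{words}}\right) - 15.59

\Delta\mathrm{FKGL} = 11.8 \times (1.88 - 1.57) \approx 3.7 \quad \text{(words per sentence held fixed)}
```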

The gold medalist, @LMCHOLEWINSKI, tweets at about a 10th-grade level, roughly the same level as the discourse in the United States Congress, according to recent analyses.

(For fun, I checked the syllables per word my dissertation tweetbot outputs.  At 1.61, my doctoral dissertation would take home a silver.)


Medal Olympian Tweets about “Games” or “Olympics”
Gold @ShaniDavis 19
Silver @AngelaRuggiero 17
Bronze @GretchenBleiler 16

For this event, Olympians score every time they use the word “games” or “Olympics.”  So the medal winners are (presumably) those who are talking about the Olympics most often.

Sexy At-Mentions

Medal Olympian Sexy At-Mentions
Gold @vitya_zvesda 16
Silver @lindseyvonn 15
Bronze @louievito 11

Yes, it has come to this.  I needed to find something to do with at-mentions, right?  So why not count for each Olympian how many times someone calls them sexy in a tweet?  And why stop with sexy?

One point for each tweet that mentions the athlete by their twitter handle and also contains one of the following words: hot, sexy, babe, handsome, pretty, beautiful or cute.

Non-Olympian Most Followed by Olympians


Medal Tweeter Olympians Following
Gold @lancearmstrong 30
Silver @ConanOBrien 24
Bronze @BarackObama 20
Bronze @TheEllenShow 20
Bronze @StephenAtHome 20
Bronze @universalsports 20
Bronze @shitmydadsays 20

This event was the toughest in terms of programming time. First, I collected every account followed by an Olympian on my list. Then I aggregated to find exactly how many Olympians followed each account. Finally, I filtered out the Olympians themselves to get this list of non-Olympians most followed by Olympians.
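The collect, count and filter steps map naturally onto a `Counter`. A minimal sketch, assuming the followee lists have already been fetched into a dict that maps each Olympian’s handle to the handles they follow; all handles below are hypothetical.

```python
from collections import Counter

def most_followed_outsiders(followees_by_olympian, top=3):
    """Rank non-Olympian accounts by how many Olympians follow them."""
    olympians = set(followees_by_olympian)
    counts = Counter()
    for followees in followees_by_olympian.values():
        counts.update(set(followees))  # each Olympian counts once per account
    for handle in olympians:           # drop the Olympians themselves
        counts.pop(handle, None)
    return counts.most_common(top)

# Hypothetical followee lists.
follows = {
    "olympian_a": {"celeb_1", "celeb_2", "olympian_b"},
    "olympian_b": {"celeb_1", "celeb_3"},
    "olympian_c": {"celeb_1", "celeb_2", "olympian_a"},
}
print(most_followed_outsiders(follows))  # [('celeb_1', 3), ('celeb_2', 2), ('celeb_3', 1)]
```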

That’s the last of the events for now.  Please check below for updates, and leave ideas for new Twitter Olympics events in the comments!


UPDATE: The list of Olympians used here came straight from Twitter’s verified accounts page.  However, it’s rather wonky.  I have a new, better list of London 2012 Olympians on Twitter and I’ll be re-running all of these analyses on this list.  Check for a link to the London 2012 version of these events on Friday the 27th.

UPDATE: London 2012 Twitter Olympics now available.

Twitter Follow Network for Political Networks Conference

I am currently attending the 5th Annual Political Networks Conference in beautiful Boulder, CO. On Twitter, the conference is served by the account @PolNetworks and the hashtag #PolNet2012. Just for fun, below is a depiction of the follow network for the @PolNetworks account and all the Twitter users who follow @PolNetworks.


Figure: Best described as the first-degree egocentric follow network of @PolNetworks.

This is a directed graph; arrows point from follower to followee. @PolNetworks sits at the center of the graph because every other user follows it.

Graph Density:  0.15

Graph Transitivity:  0.56

Graph Connectedness:  1.00

Graph Efficiency:  0.87
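Several of these graph-level measures can be computed directly from an edge list. The sketch below covers three of them under common definitions, which may differ from those of the software used for the figures above: density (edges divided by the n(n−1) possible ordered pairs), directed transitivity (the share of two-step paths i→j→k that are closed by a direct i→k edge), and Krackhardt-style connectedness (the share of pairs joined by some path when arrow direction is ignored). Efficiency is omitted, and the toy network is hypothetical.

```python
from itertools import combinations

def density(nodes, edges):
    """Directed density: observed edges / n(n-1) possible ordered pairs."""
    n = len(nodes)
    return len(edges) / (n * (n - 1))

def transitivity(edges):
    """Share of two-step paths i->j->k (i != k) that are closed by an edge i->k."""
    succ = {}
    for a, b in edges:
        succ.setdefault(a, set()).add(b)
    two_paths = closed = 0
    for i, followees in succ.items():
        for j in followees:
            for k in succ.get(j, ()):
                if k == i:
                    continue
                two_paths += 1
                if k in followees:
                    closed += 1
    return closed / two_paths if two_paths else 0.0

def connectedness(nodes, edges):
    """Share of unordered pairs joined by some path when arrow direction is ignored."""
    neighbors = {v: set() for v in nodes}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)
    component = {}
    for label, start in enumerate(nodes):
        if start in component:
            continue
        component[start] = label
        stack = [start]
        while stack:  # depth-first search over the undirected version of the graph
            u = stack.pop()
            for w in neighbors[u]:
                if w not in component:
                    component[w] = label
                    stack.append(w)
    n = len(nodes)
    joined = sum(component[a] == component[b] for a, b in combinations(nodes, 2))
    return joined / (n * (n - 1) / 2)

# Hypothetical toy network of (follower, followee) pairs; "d" is an isolate.
nodes = ["hub", "a", "b", "c", "d"]
edges = [("a", "hub"), ("b", "hub"), ("c", "hub"), ("a", "b"), ("b", "a"), ("c", "a")]
print(density(nodes, edges), transitivity(edges), connectedness(nodes, edges))  # 0.3 0.75 0.6
```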

Some Node-Level Measures:

Account inDegree outDegree Eigen. Centrality
JaciKettler 8 14 0.36
smotus 23 12 0.32
kwcollins 14 12 0.31
jlove1982 6 9 0.28
JohnCluverius 7 9 0.28
JeffGulati 6 9 0.25
RebeccaHannagan 2 8 0.24
therriaultphd 12 8 0.23
BrendanNyhan 21 9 0.21
davekarpf 7 7 0.21
richardmskinner 6 8 0.21
ianpcook 3 7 0.2
hsquared47 3 7 0.19
jon_m_rob 0 6 0.16
sissenberg 7 6 0.16
First_Street 1 6 0.15
FHQ 12 4 0.1
heathbrown 2 5 0.1
DocPolitics 1 4 0.09
ajungherr 0 3 0.07
archimedino 0 3 0.07
GeoffLorenz 0 3 0.07
JoeLenski 2 4 0.07
slimbock 0 2 0.05
James_H_Fowler 2 3 0.04
PolNetworks 34 1 0.04
jasonjones_jjj 1 3 0.03
krmckelv 1 2 0.03
dogaker 0 1 0.01
DominikBatorski 0 1 0.01
janschulz 0 1 0.01
jboxstef 0 1 0.01
matthewhitt 0 1 0.01
ophastings 0 1 0.01
stefanjwojcik 0 1 0.01

Edge list in xlsx format:  polnetworks_edge_list

Data collected 6/12/2012

Does Barack Obama follow Queen Noor?

Twitter maintains a few lists of verified accounts. One of these lists includes 38 world leaders. Using Twitter’s fantastic API, I did some detective work to see which world leaders “follow” which others.

Follow network of verified world leaders’ Twitter accounts.

The graph is messy, but it displays some order. Barack Obama (@BarackObama) and David Cameron (@Number10gov) tie for the most followers within the network, at 17 each, and appear toward its center. The Prime Minister’s attention is more reciprocal: he makes 13 outgoing follows to the President’s mere 4.

What does it mean for one world leader to follow another on Twitter? Probably not much. Perhaps there will come a day when it is a diplomatic faux pas to meet with a head of state and then neglect to follow his Twitter account.

As for whether Barack Obama follows Queen Noor? He does not. @QueenNoor’s follow of @BarackObama is unrequited.
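Checking whether a follow is returned is a pair of set lookups on the directed edge list. A minimal sketch: apart from the one follow stated above, the edge list is hypothetical rather than the data collected for the figure.

```python
def follow_status(edges, a, b):
    """Describe the follow relationship between accounts a and b, given (follower, followee) pairs."""
    follows = set(edges)
    a_to_b, b_to_a = (a, b) in follows, (b, a) in follows
    if a_to_b and b_to_a:
        return "mutual"
    if a_to_b:
        return f"{a} follows {b}, unrequited"
    if b_to_a:
        return f"{b} follows {a}, unrequited"
    return "no follow either way"

edges = [("QueenNoor", "BarackObama"), ("leader_x", "leader_y"), ("leader_y", "leader_x")]
print(follow_status(edges, "BarackObama", "QueenNoor"))  # QueenNoor follows BarackObama, unrequited
print(follow_status(edges, "leader_x", "leader_y"))      # mutual
```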

Politics or Sports

When tweeting, what words do people use when they are talking about politics? I did a fast analysis of the last 1000 tweets from the 16 most popular political bloggers and the last 1000 tweets from the 16 most popular sports bloggers.

Here are the overall word counts, the counts for political and sports tweets separately, and a measure of the politics/sports diagnosticity.

Word Total Political Sports Chi-Square
obama 1137 1132 5 558.54
gop 659 659 0 329.50
game 758 28 730 325.07
house 551 540 11 253.94
yankees 500 2 498 246.02
senate 452 452 0 226.00
party 432 413 19 179.67
vs 598 69 529 176.92
democrats 319 319 0 159.50
health 346 336 10 153.58
tea 324 319 5 152.15
president 370 349 21 145.38
ufc 293 1 292 144.51
angels 282 0 282 141.00
election 271 271 0 135.50
political 265 262 3 126.57
vote 345 319 26 124.42
rangers 256 3 253 122.07
reform 242 242 0 121.00
update 568 100 468 119.21
#yankees 232 0 232 116.00
giants 248 5 243 114.20
#ufc 225 0 225 112.50
lakers 225 2 223 108.54
#mma 216 0 216 108.00
on 4265 2605 1660 104.69
watch 446 374 72 102.25
government 204 204 0 102.00
campaign 217 213 4 100.65
law 230 222 8 99.56
obama’s 199 199 0 99.50
us 406 343 63 96.55
dodgers 197 1 196 96.51
bowl 200 2 198 96.04
(video) 211 206 5 95.74
rally 258 240 18 95.51
football 205 6 199 90.85
republicans 181 181 0 90.50
tax 194 189 5 87.26
today 516 407 109 86.05
polls 185 181 4 84.67
obamacare 169 169 0 84.50
republican 169 169 0 84.50
palin 168 168 0 84.00
coach 175 2 173 83.55
bush 184 179 5 82.27
dems 163 163 0 81.50
nfl 189 8 181 79.18
voters 180 174 6 78.40
basketball 160 1 159 78.01
o’donnell 153 153 0 76.50
fans 178 7 171 75.55
players 162 3 159 75.11
news 389 315 74 74.65
race 214 196 18 74.03
[delicious] 147 0 147 73.50
obamas 143 143 0 71.50
kings 158 4 154 71.20
team 307 49 258 71.14
#rangers 139 0 139 69.50
care 228 202 26 67.93
play 230 27 203 67.34
congress 139 137 2 65.56
season 199 19 180 65.13
kobe 130 0 130 65.00
bill 253 217 36 64.75
america 170 159 11 64.42
politics 124 124 0 62.00
in 5766 3304 2462 61.48
et 149 142 7 61.16
sox 122 0 122 61.00
jobs 137 133 4 60.73
economy 125 124 1 60.52
notes 178 16 162 59.88
debate 163 151 12 59.27
cam 116 0 116 58.00
democratic 116 116 0 58.00
125 115 0 115 57.50
brandon 115 0 115 57.50
cbs 125 122 3 56.64
christine 112 112 0 56.00
player 123 3 120 55.65
preview 169 16 153 55.53
elections 115 114 1 55.52
democrat 111 111 0 55.50
sen 110 110 0 55.00
americans 121 118 3 54.65
supreme 113 112 1 54.52
dem 111 110 1 53.52
rep 122 118 4 53.26
speech 110 109 1 53.02
pelosi 106 106 0 53.00
fox 143 133 10 52.90
its 300 239 61 52.81
#playoffs 105 0 105 52.50
security 116 113 3 52.16
espn 113 3 110 50.66
usc 108 2 106 50.07
deal 199 29 170 49.95
mosque 99 99 0 49.50
american 194 166 28 49.08

Chi-square (the last column) was calculated as (O − E)² / E, where O is the word’s count in political tweets and E, the expected frequency, is half the total number of times the word appeared. In other words, we assume each word is equally likely to appear in a sports tweet or a political tweet, and then measure how much that assumption was violated. (Because the sports count deviates from E by the same amount, the conventional two-cell chi-square statistic is exactly twice the listed value; the ranking is unchanged.)
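The score is easy to recompute from the political and sports counts. This sketch reproduces the last column for a few rows of the table and also defines the conventional two-cell statistic, which sums the political and sports terms.

```python
def chi_square_score(political, sports):
    """Score from the table: (O - E)^2 / E with O = political count, E = total / 2."""
    expected = (political + sports) / 2
    return (political - expected) ** 2 / expected

def chi_square_two_cell(political, sports):
    """Conventional goodness-of-fit statistic over both cells (twice the score above)."""
    expected = (political + sports) / 2
    return sum((o - expected) ** 2 / expected for o in (political, sports))

for word, pol, spo in [("obama", 1132, 5), ("gop", 659, 0), ("game", 28, 730)]:
    print(word, round(chi_square_score(pol, spo), 2))
# obama 558.54
# gop 329.5
# game 325.07
```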

This table lists only the most diagnostic words; there were tens of thousands more. Still, if you ever need to build a quick-and-dirty classifier or settle a bet on which words separate the pols from the jocks, here’s your answer. 🙂
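As a sketch of that quick-and-dirty classifier: count the words in a tweet that the table marks as political versus sports, and pick the side with more hits. The word lists are a small hand-picked subset of the table, and treating hashtags like plain words and labeling ties “unknown” are assumptions; a sturdier classifier would weight words by their scores and cover far more vocabulary.

```python
import re

# A few strongly diagnostic words from the table.
POLITICAL = {"obama", "gop", "senate", "democrats", "election", "reform", "congress"}
SPORTS = {"game", "yankees", "ufc", "angels", "lakers", "coach", "nfl", "season"}

def classify(tweet):
    """Label a tweet 'politics', 'sports', or 'unknown' by counting diagnostic words."""
    words = [w.lstrip("#") for w in re.findall(r"[#\w']+", tweet.lower())]
    pol = sum(w in POLITICAL for w in words)
    spo = sum(w in SPORTS for w in words)
    if pol > spo:
        return "politics"
    if spo > pol:
        return "sports"
    return "unknown"

print(classify("Senate GOP blocks Obama reform bill"))  # politics
print(classify("Yankees win the game! #NFL next"))      # sports
print(classify("Watch the news today"))                 # unknown
```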