Tag Archives: social networks

Learning How Things Go Together

[This is my attempt at converting my dissertation abstract to “Up-Goer Five speak” (i.e. using only the 1000 most-frequently used English words).  For context, here’s the xkcd comic that started the trend.  Search the #upgoer5 hashtag on Twitter for more.  Try it yourself on the Up-Goer Five text editor.]

Big things are just many small things put together. It would be good to know which small things go together. You could learn how a brain works by thinking this way. Or you could learn which people like which other people. Thinking about how small things are put together to make big things is a good idea. It would be good to know how we learn, and how we should learn which things go together.

To this end, I did five studies in which people learned which things in a set were joined together. To show you what I mean, some people learned “who is friends with who” in a friend group. But other people learned about other things that were joined together – like which cities have roads that go between them. By doing these studies, I found out a few things. One thing I learned was that it matters how the things are joined up. To show you what I mean, think about the friend group again. It is easier to learn who is friends with who in a group where few people have many friends and many people have few friends. If things are more even, and all people have about the same number of friends, it is hard to learn exactly who is friends with who.

It doesn’t matter if the joined things are people or cities or computers. It is all the same. Also, it doesn’t seem to matter much why it is you are learning what things go together.

I also show that people learn better by seeing a picture of joined-together things rather than reading about joined-together things. This is the case even more when the things that are joined are made to be close together in the picture.

Finally, I talk about an all-around idea for how people learn about groups of joined together things. I say people start out by quickly sorting things into much-joined and few-joined types. Then they more slowly learn which one thing is joined to which one other thing a little at a time.

London 2012 Twitter Olympics

The London 2012 Olympics are upon us.  Lots of athletes will be judged, timed and measured for the athletic things they do.  But aren’t the non-athletic things they do much more interesting?  Like tweeting?

No, of course not.  But that’s not going to stop me from holding my very own Twitter Olympics and handing out (virtual) medals for exceptional Twitter performances.

Below is the list of events and the results so far.  I’ll be rolling out more results as the real Olympics go on.

Preview Event: Games-Dropping

  • Who’s most pumped up for the games? In this event, Olympians score one point for every time they have used the words Olympics, Games, or London in their recent tweets.
Olympian Games, Olympics or London in Tweets
@SwissDom 41
@lolojones 37
@NickSymmonds 28

Full Results:  london2012_gamesd (.xlsx)

Sexy At-Mention

  • In this event, a sampling of thousands of tweets mentioning Olympic athletes was scored. The winning Olympian was the one with the most co-occurrences of their Twitter handle and the words “sexy,” “hottest,” “beautiful,” “cute,” “handsome,” “pretty,” or “babe.”
Olympian Sexy At-Mentions
@matthew_mitcham 25
@Joeingles7 8
@hopesolo 7

Full Results:  london2012_sexy (.xlsx)

The Sesquipedaliathon

  • In this event, scores are awarded based on the average number of syllables per word in the Olympian’s tweets.
Olympian Syllables Per Word Longest Word
@juanmata10 1.76 visitaremos
@Njr92 1.66 spideranderson
@TipsarevicJanko 1.62 pantomime

Full Results:  london2012_sesq

Most Followed

  • In this event, one point is scored for each Twitter follower.
Olympian Followers
@Njr92 4,953,514
@juanmata10 1,164,329
@DjokerNole 1,111,326

Full Results:  london2012_user_info

Most Followed (by other Olympians)

  • In this event, one point is scored for each Olympian follower.
Olympian Fellow Olympian Followers
@MichaelPhelps 12
@usainbolt 11
@lolojones 9

Full Results:  london2012_degrees

London 2012 Olympians Twitter Follow Network.  Arrows point from follower to followee.

Most Follows

  • In this event, one point is scored for each Twitter user the Olympian follows.
Olympian Followees
@officialasafa 3,794
@Njr92 630
@TomDaley1994 542

Full Results: london2012_user_info

Most Follows (of other Olympians)

  • In this event, one point is scored for each fellow Olympian the Olympian follows.
Olympian Follows X Fellow Olympians
@ItsStephRice 8
@OscarPistorius 7
@RickyBerens 7
@drewsullivan8 6
@matthew_mitcham 6
@MichaelPhelps 6
@PopsMBonsu 6

Full Results: london2012_degrees

Special Event: Non-Olympian Most Followed by Olympians

  • The only event (so far) in which non-Olympians compete. Medals go to the non-Olympians who are followed by the most Olympians.
Non-Olympian Name Olympian Followers
@OMGFacts OMG Facts 10
@SportsCenter SportsCenter 9
@espn ESPN 8
@Sports_Greats Sports Quotes 8

Full Results:  london2012_nonlist_followees

Olympic Followback

  • Most athletes have many more followERS than followEES. In this event, Olympians are scored according to the proportion of their followers that they follow back.
Olympian Followback Percentage
@drewsullivan8 16.7%
@SmoothKJ88 14.2%
@EricBoateng 10.1%

Full Results:  london2012_followback
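For the curious, here is a minimal sketch of how a followback score could be computed. The function name and the sample handles are made up for illustration. It assumes you already have the list of accounts that follow an Olympian and the list of accounts that Olympian follows. It is not the script behind the results above.

```python
def followback_percentage(followers, followees):
    """Percent of an account's followers that the account follows back."""
    followers = set(followers)
    if not followers:
        return 0.0  # avoid dividing by zero for an account with no followers
    mutual = followers & set(followees)
    return 100.0 * len(mutual) / len(followers)


# Made-up account: four followers, one of whom is followed back.
followers = ["@fan_a", "@fan_b", "@fan_c", "@fan_d"]
followees = ["@fan_b", "@coach_x"]
print(f"{followback_percentage(followers, followees):.1f}%")  # 25.0%
```

Using sets means an account that shows up twice in either list is only counted once.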


Olympians on Twitter Olympics

The Olympics bring together the world’s most talented and dedicated athletes.  And so does Twitter.  As a part of my continuing effort to try to do interesting things with the Twitter API, I decided to create my own Olympics for Olympians on Twitter. Er, yeah I think that’s right.

To begin with, I created the sociomatrix of Olympian Tweeters.  A sociomatrix is a table where every person in a group gets a row and a column.  Each cell in the table indicates whether a relationship exists between two people (the row person and the column person).  To indicate this, one just places a zero in the cell if the relationship does not exist and a one if it does.

      Jack  Rose  Cal
Jack  -     1     0
Rose  1     -     0
Cal   0     1     -

Example Sociomatrix.  The relationship is row in love with column as per James Cameron’s Titanic.
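The Titanic example can be written out in a few lines of code. This is a minimal sketch, not the code I ran on the Twitter data: `build_sociomatrix` is a made-up helper name, and the diagonal (a person's relationship with themselves) is simply filled with zeros. Summing across a row or down a column gives a quick count for each person.

```python
names = ["Jack", "Rose", "Cal"]
# (row, column) pairs meaning "row is in love with column"
pairs = [("Jack", "Rose"), ("Rose", "Jack"), ("Cal", "Rose")]


def build_sociomatrix(names, pairs):
    """Return a square 0/1 table with one row and one column per name."""
    index = {name: i for i, name in enumerate(names)}
    matrix = [[0] * len(names) for _ in names]
    for source, target in pairs:
        matrix[index[source]][index[target]] = 1
    return matrix


matrix = build_sociomatrix(names, pairs)
row_sums = [sum(row) for row in matrix]        # how many people each person loves
col_sums = [sum(col) for col in zip(*matrix)]  # how many people love each person

for name, out_count, in_count in zip(names, row_sums, col_sums):
    print(f"{name}: loves {out_count}, loved by {in_count}")
```

Here Rose has a column sum of 2 (both Jack and Cal love her), while poor Cal has a column sum of 0.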

I created a sociomatrix of Olympians on Twitter where the relationship was follows.  Given a sociomatrix, row sums and column sums are usually interesting, quick summaries of the data.  In our case, a row sum is the number of Olympians one particular account follows.  A column sum is the number of Olympians following a given account.  So, without further ado, let’s get to our first event:  Olympian most followed by other Olympians.

Most Followed (by other Olympians)

Medal Olympian Followed By
Gold @BillyDemong 30
Gold @Shaun_White 30
Gold @ApoloOhno 30
Silver @lindseyvonn 28
Silver @emilycook 28
Bronze @GretchenBleiler 25

Do they allow ties in the real Olympics?  Probably not, but since these are virtual gold medals I’m handing out, why not?

You can probably guess the next event.  And this would probably be the easiest event to win if you knew it was coming.  We know who has the most followers, but who does the most following?

Most Follows (of Olympians)

Medal Olympian Follows
Gold @emilycook 73
Silver @StevenHolcomb 34
Bronze @TFletchernordic 32

The Sesquipedaliathon

Medal Olympian Syllables per word Longest word
Gold @LMCHOLEWINSKI 1.88   obesity
Silver @AngelaRuggiero 1.58   sustainability
Bronze @Pchiddy 1.57   anniversary

In the sesquipedaliathon, Olympians compete on their vocabularies.  Tweeters are ranked by the mean number of syllables in the words in their tweets.  Polysyllabic expressions win out over short words.

Sesquipedalian tweets may be the mark of a skilled wordsmith discussing a complex topic, or they may be the result of needless pretentiousness.  Syllables per word is one component of the Flesch-Kincaid readability scale.  According to the Flesch-Kincaid scale, the more syllables-per-word one uses, the more sophisticated the writing (or the less readable the text, depending on how you want to look at it).
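Here is a rough sketch of how syllables per word could be scored. Counting syllables in English is messy, so this uses a simple heuristic that I am assuming for illustration (count runs of vowels, and drop a silent trailing "e"). It will miscount some words, and it is not necessarily how the results above were produced. The grade function uses the standard Flesch-Kincaid grade level formula: 0.39 × (words per sentence) + 11.8 × (syllables per word) − 15.59. All function names here are made up.

```python
import re


def count_syllables(word):
    """Heuristic syllable count: runs of vowels, minus a silent final 'e'."""
    word = word.lower()
    count = len(re.findall(r"[aeiouy]+", word))
    # Treat a trailing "e" as silent unless it follows an "l" (as in "table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def words_in(text):
    """Split text into words, keeping contractions like "don't" together."""
    return re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", text)


def syllables_per_word(text):
    words = words_in(text)
    if not words:
        return 0.0
    return sum(count_syllables(w) for w in words) / len(words)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid grade level, with a crude sentence splitter."""
    words = words_in(text)
    if not words:
        return 0.0
    sentences = max(len(re.findall(r"[.!?]+", text)), 1)
    return 0.39 * len(words) / sentences + 11.8 * syllables_per_word(text) - 15.59


for word in ["obesity", "sustainability", "anniversary", "pantomime"]:
    print(word, count_syllables(word))
```

Because tweets are short and full of hashtags, links and handles, a real run would need some cleanup before scoring; that step is left out here.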

The gold winner @LMCHOLEWINSKI is tweeting at about a 10th grade level.  @LMCHOLEWINSKI’s tweets clock in at about the same level as the discourse in the United States Congress,  according to recent analyses.

(For fun, I checked the syllables per word my dissertation tweetbot outputs.  At 1.61, my doctoral dissertation would take home a silver.)


Medal Olympian Tweets about “Games” or “Olympics”
Gold @ShaniDavis 19
Silver @AngelaRuggiero 17
Bronze @GretchenBleiler 16

For this event, Olympians score every time they use the word “games” or “Olympics.”  So the medal winners are (presumably) those who are talking about the Olympics most often.
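A minimal sketch of the scoring, assuming each Olympian's recent tweets are already in a list of strings. The function name and the sample tweets are made up, and I am assuming matches are whole-word and case-insensitive.

```python
import re

KEYWORDS = ("games", "olympics")


def games_dropping_score(tweets, keywords=KEYWORDS):
    """Count whole-word, case-insensitive keyword hits across all tweets."""
    pattern = re.compile(
        r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b",
        re.IGNORECASE,
    )
    return sum(len(pattern.findall(tweet)) for tweet in tweets)


sample_tweets = [
    "Can't wait for the Games!",
    "Olympics, here I come. #Olympics",
    "Training day",
]
print(games_dropping_score(sample_tweets))  # 3
```

The word boundaries keep "gamesmanship" from scoring, but a hashtag like #Olympics still counts.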

Sexy At-Mentions

Medal Olympian Sexy At-Mentions
Gold @vitya_zvesda 16
Silver @lindseyvonn 15
Bronze @louievito 11

Yes, it has come to this.  I needed to find something to do with at-mentions, right?  So why not count for each Olympian how many times someone calls them sexy in a tweet?  And why stop with sexy?

An Olympian earns one point for each tweet that mentions them by their Twitter handle and also contains one of the following words: hot, sexy, babe, handsome, pretty, beautiful or cute.
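The scoring rule can be sketched like this. The handles and tweets below are invented, the function name is made up, and I am assuming a tweet scores at most one point per athlete no matter how many of the words it contains.

```python
import re
from collections import Counter

SEXY_WORDS = {"hot", "sexy", "babe", "handsome", "pretty", "beautiful", "cute"}


def sexy_at_mentions(tweets, handles):
    """One point per tweet that names a handle and uses a listed word."""
    wanted = {h.lower() for h in handles}
    scores = Counter()
    for tweet in tweets:
        text = tweet.lower()
        tokens = set(re.findall(r"[a-z]+", text))
        if not tokens & SEXY_WORDS:
            continue  # none of the listed words appear in this tweet
        mentioned = set(re.findall(r"@\w+", text))
        for handle in mentioned & wanted:
            scores[handle] += 1
    return scores


tweets = [
    "@athlete_one is so cute",
    "Good luck @athlete_one!",
    "@athlete_two looking handsome, @athlete_one too",
    "pretty good race",
]
print(sexy_at_mentions(tweets, ["@athlete_one", "@athlete_two"]))
```

Note that a tweet like "@athlete_one had a pretty good race" would also score a point, so this kind of count is only as good as its word list.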

Non-Olympian Most Followed by Olympians


Medal Tweeter Olympians Following
Gold @lancearmstrong 30
Silver @ConanOBrien 24
Bronze @BarackObama 20
Bronze @TheEllenShow 20
Bronze @StephenAtHome 20
Bronze @universalsports 20
Bronze @shitmydadsays 20

This event was the toughest – as far as programming time goes.  First, I grabbed every account that the Olympians on my list follow.  Then I aggregated to find out exactly how many Olympians followed each account.  Then I filtered out Olympians to get this list of non-Olympians most followed by Olympians.
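The three steps (grab, aggregate, filter) can be sketched as follows. This assumes the follow lists have already been downloaded into a dictionary keyed by Olympian. The handles are invented, the function name is made up, and none of the Twitter API calls are shown.

```python
from collections import Counter


def most_followed_outsiders(followees_by_olympian):
    """Rank non-Olympian accounts by how many Olympians follow them."""
    olympians = set(followees_by_olympian)
    counts = Counter()
    # Aggregate: each Olympian contributes at most one point per account.
    for followees in followees_by_olympian.values():
        counts.update(set(followees))
    # Filter: drop accounts that belong to Olympians on the list.
    for handle in olympians:
        counts.pop(handle, None)
    return counts.most_common()


follow_lists = {
    "@oly_a": ["@oly_b", "@talk_show", "@news"],
    "@oly_b": ["@talk_show", "@oly_a"],
    "@oly_c": ["@talk_show", "@news"],
}
print(most_followed_outsiders(follow_lists))
# [('@talk_show', 3), ('@news', 2)]
```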

That’s the last of the events for now.  Please check below for updates, and leave ideas for new Twitter Olympics events in the comments!


UPDATE: The list of Olympians used here came straight from Twitter’s verified accounts page.  However, it’s rather wonky.  I have a new, better list of London 2012 Olympians on Twitter and I’ll be re-running all of these analyses on this list.  Check for a link to the London 2012 version of these events on Friday the 27th.

UPDATE: London 2012 Twitter Olympics now available.

Learning a Social Graph Does NOT Depend on Method of Training

Learning a social graph does NOT depend on method of training. So far, at least.

One of my initial hypotheses when I began to study the acquisition of social network structure was this:

Some forms of representation of the network will lead to faster acquisition than others.

What I meant was that there are many ways you can represent a social network. You can put people’s names in circles and draw lines between the circles to represent relationships. You could simply list all the dyads – pairs of people who are friends, for example. Formally, you might create an adjacency matrix. Some of these should make learning who is connected to whom easy and some should make it less easy, right?

Well, not so far. However, I’m just beginning to explore the space, and the manipulations I’ve implemented so far may be too subtly different. For instance, I first trained subjects by either Random Edge or Network Walk training.

In Random Edge training, subjects were told they would be shown one dyad at a time – two names representing a pair of people who were friends. The friendships shown to the subjects were randomly drawn from the set of existing friendships. (An “edge” in a graph is a connection between two nodes.)

In Network Walk training, subjects similarly saw pairs of friends’ names. However, one name in each pair was always the same as a name in the previous pair. For example, a subject would see the sequence Frank-Bob, Bob-Alice, Alice-Cindy. Unlike Random Edge training, this type of training emphasizes the larger structure of the graph by taking you on a “walk” through the “network.”
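To make the difference concrete, here is a minimal sketch of how the two kinds of training sequence could be generated from a list of friendships. It is not the experiment code. The function names are made up, and I am assuming edges are drawn with replacement and that the walk picks its next step uniformly at random among the current person's friends.

```python
import random


def random_edge_sequence(edges, n_trials, rng):
    """Each trial shows a friendship drawn at random from the full set."""
    return [rng.choice(edges) for _ in range(n_trials)]


def network_walk_sequence(edges, n_trials, rng):
    """Each trial starts from the person the previous trial ended on."""
    neighbors = {}
    for a, b in edges:
        neighbors.setdefault(a, []).append(b)
        neighbors.setdefault(b, []).append(a)
    current = rng.choice(sorted(neighbors))
    sequence = []
    for _ in range(n_trials):
        nxt = rng.choice(neighbors[current])
        sequence.append((current, nxt))
        current = nxt  # the next pair shares this name
    return sequence


friendships = [("Frank", "Bob"), ("Bob", "Alice"), ("Alice", "Cindy")]
rng = random.Random(0)  # fixed seed so the example is repeatable
print(random_edge_sequence(friendships, 4, rng))
print(network_walk_sequence(friendships, 4, rng))
```

Both functions show only real friendships; the only thing that differs is the order in which a subject meets them.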

This does not seem to make much difference to learners, however. In the graph below, you will notice that the type of training (Edge = Random Edge and Walk = Network Walk) makes no difference in how quickly subjects acquire information.

Random Edge vs. Network Walk Training

The type of graph affects speed of acquisition as expected. Random graphs take longer to learn than scale-free graphs.

The similarity in the pace of acquisition can also be conveyed in the form of learning curves. Here we see the number of errors decline as subjects are given more and more of each type of training.

Learning Curves - Random Edge vs. Network Walk

The difference between the two curves is not statistically reliable. Most of the confidence intervals around the data points overlap.

What should we make of the very similar performance for these two types of training? Being presented the connections in a graph in a systematic way (walking from edge to edge) seems to present no advantage over being given the list of edges in a random order. Of course, there is the usual caution about not accepting the null hypothesis. It may be that the effect is just very small, and the experiment lacked power. However, the same experiment was powerful enough to detect the difference between random graph and scale-free graph acquisition rates, so if nothing else we can assume any possible effect is smaller than the graph structure effect.

I am still convinced there must be better and worse ways to learn the structure of a graph, and I’ll be experimenting with various training methods over the coming months. Check the blog for new results!

Scale-Free Graphs are Easier to Learn than Random Graphs

The first of my three hypotheses about social network acquisition concerns the structure of the social network graph. The claim is that human subjects will acquire a network’s structure more quickly if it resembles a true human social network rather than an arbitrary network. To translate this into an experiment, I compared the learning rate for subjects learning a random graph to the learning rate for those learning a scale-free graph.

What is a random social network graph? A random social network graph is a graph in which people are nodes, and the friendship ties between them (edges in the graph) are placed at random. In other words, there is nothing special about the node that determines what edges it participates in. All the edges (friendships) are sprinkled at random within the graph. Below is an illustration of a random graph.

Random Social Network Graph. Produced using the Erdős-Rényi method.
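A random graph of this kind takes only a few lines to generate. This sketch uses the G(n, p) flavor of the Erdős-Rényi model, in which every possible friendship is included independently with the same probability. Which flavor and which parameters were used for the figure is not stated here, so treat the function name and the numbers below as placeholders.

```python
import itertools
import random


def erdos_renyi_graph(n_nodes, p_edge, rng):
    """Return a list of edges; each possible pair is kept with probability p_edge."""
    return [
        pair
        for pair in itertools.combinations(range(n_nodes), 2)
        if rng.random() < p_edge
    ]


rng = random.Random(42)  # fixed seed so the example is repeatable
edges = erdos_renyi_graph(n_nodes=10, p_edge=0.2, rng=rng)
print(len(edges), "friendships among 10 people")
```

Nothing about a person affects which friendships they end up in, so everyone tends to have a similar number of friends.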

What is a scale-free social network graph? A scale-free social network graph is a graph in which the more edges (friendships) a node (person) participates in, the more likely that node will be to form new edges. In other words, the rich get richer, or the more popular one is the easier it is to make new friends. Below is an illustration of a scale-free graph.

Scale-Free Social Network Graph. Produced using the Barabási–Albert method.
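The rich-get-richer idea can be sketched with a simple preferential attachment loop in the spirit of the Barabási–Albert model. This is an illustration under my own assumptions (the function name, the seed nodes, and the parameter values are all made up), not the code that produced the figure.

```python
import random


def barabasi_albert_graph(n_nodes, m_edges, rng):
    """Grow a graph in which each new node links to m_edges existing nodes,
    chosen with probability proportional to their current degree."""
    if not 1 <= m_edges < n_nodes:
        raise ValueError("need 1 <= m_edges < n_nodes")
    edges = []
    # Every node appears in this list once per edge it takes part in,
    # so a uniform draw from it is a degree-weighted draw.
    endpoints = []
    targets = list(range(m_edges))  # the first new node links to the seed nodes
    for new_node in range(m_edges, n_nodes):
        edges.extend((new_node, target) for target in targets)
        endpoints.extend(targets)
        endpoints.extend([new_node] * m_edges)
        chosen = set()
        while len(chosen) < m_edges:
            chosen.add(rng.choice(endpoints))
        targets = sorted(chosen)
    return edges


rng = random.Random(1)  # fixed seed so the example is repeatable
edges = barabasi_albert_graph(n_nodes=50, m_edges=2, rng=rng)
degree = {}
for a, b in edges:
    degree[a] = degree.get(a, 0) + 1
    degree[b] = degree.get(b, 0) + 1
print("most friends:", max(degree.values()), "fewest friends:", min(degree.values()))
```

Graphs grown this way tend to end up with a few very popular nodes and many nodes with only a handful of ties, which is the pattern described above.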

So which of these two types of social network graph is easier to learn? The scale-free graph. I’ll post the figures for a couple of experiments below. This is a clear and reliable result that replicates across all of my studies so far.

scale free vs random graph results

Scale-free graphs are acquired more quickly than random graphs. The number of trials needed to reach criterion performance is lower.

scale free vs random graph learning curves

Scale-free graphs are acquired more quickly than random graphs. The number of errors made during training decreases more rapidly for the scale-free social network graph.

UPDATE 8/21/2012: I’ve replicated this result several times now. More details and a formal description of the experiment and the results are available in pre-prints of two papers on my SSRN author page:

Acquiring Social Network Knowledge

The shotgun approach isn’t just the name of the blog. I live it. I have so many projects going that I buy file folders by the pallet.

The one I’m really excited about at the moment is an attempt to bring together the two worlds I’ve been living in for the past year – cognitive psychology and social networks – and keep myself working on my dissertation at the same time.

For this project, I’ll be running several online experiments. The main goal is to characterize exactly how humans acquire and retain social network information.

I’ll be posting links to experiments and results here as time allows. For now, I’ll list the first few hypotheses I’ll be testing:

  • Human subjects will acquire a network’s structure more quickly if it resembles a true human social network rather than an arbitrary network. To operationalize this, I will measure learning curves as subjects learn the structure of random or scale-free graphs.
  • Human subjects will acquire a network’s structure more quickly if it is framed as a social network as opposed to the same network framed in some other manner (e.g. a computer or transport network).
  • Some forms of representation of the network will lead to faster acquisition than others. For example, you might represent a network as a series of edges between vertices (e.g. friendships between people) or you might represent a network as a traversal of the links within it (think of following links in the Kevin Bacon Game). Some forms will lead to faster acquisition than others, and this will allow us to draw conclusions about how graph information is represented in the brain.

Check back for updates on this project, and please leave feedback and questions.