Comments on Lorge & Kruglov, 1950

Summary and commentary for “The Relationship Between the Readability of Pupils’ Compositions and Their Measured Intelligence”

Irving Lorge and Lorraine Kruglov

The Journal of Educational Research, Vol. 43, No. 6 (Feb., 1950), pp. 467-474

I have to admit it was a little bit dispiriting to read this article.  First, it describes a project very similar to the one I am about to undertake.  Second, this project beat me to the punch by more than fifty years.  Third, the findings were negative, while I’m expecting my findings to be positive.  And finally, in the 62 years this article has existed, it has garnered exactly 7 citations, so I have to wonder how interested the academy will be in the project I am just starting.  Anyway, back to the article at hand.

In this paper, Lorge and Kruglov use the high-school entrance exams of 50 eighth- and ninth-graders to correlate the “readability” of the students’ writing with the same students’ scores on the intelligence-testing portion of the same exam.  They find positive correlations, but the values are low (about .10) and not significantly different from zero.  They conclude that for people matched on education and age level, the complexity of their writing is not a good predictor, substitute, or correlate of general intelligence.

The main reason they do not find a significant correlation is likely to be the restricted range of the data.  In the article, the authors mention two successful demonstrations of correlation between readability measures and education levels.  It seems Lorge and Kruglov were too ambitious in thinking that readability would be successful in predicting intelligence in a small sample of relatively similar students: all were eighth- and ninth-graders in New York schools applying for a selective science high school.

One could rightly argue that the data are nearly useless in answering the question of whether there exists a relationship between writing complexity and intelligence in general.  The lack of a significant correlation in this narrow range of measured data points does not disprove an overall relationship that may still exist.
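
To make the restricted-range argument concrete, here is a minimal simulation sketch.  It does not use Lorge and Kruglov's data: the true correlation (0.6), the selection cutoff (1.5 standard deviations above the mean), and the sample size are all made-up assumptions, and `pearson` and `simulate` are hypothetical helper names.  It draws pairs of correlated scores, then keeps only the pairs whose first score clears a high cutoff, roughly the way an applicant pool for a selective school might be formed.

```python
import math
import random


def pearson(xs, ys):
    """Plain Pearson correlation coefficient for two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def simulate(n=20000, rho=0.6, cutoff=1.5, seed=0):
    """Compare the correlation in a full population with a selected subgroup.

    All parameters are illustrative assumptions, not values from the paper.
    x stands in for an ability score, y for a writing-complexity score.
    """
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(0, 1)
        # Build y so that corr(x, y) is rho in the full population.
        y = rho * x + math.sqrt(1 - rho ** 2) * rng.gauss(0, 1)
        pairs.append((x, y))

    full_r = pearson(*zip(*pairs))

    # Keep only the high scorers on x: this is the range restriction.
    selected = [(x, y) for x, y in pairs if x > cutoff]
    restricted_r = pearson(*zip(*selected))
    return full_r, restricted_r, len(selected)


if __name__ == "__main__":
    full_r, restricted_r, kept = simulate()
    print(f"full population r = {full_r:.2f}")
    print(f"selected group r  = {restricted_r:.2f} (n = {kept})")
```

Under these assumptions, the correlation inside the selected group should come out well below the correlation in the full population, even though the underlying relationship is the same for everyone.  A sample of only 50 from such a group would make that weaker correlation even harder to distinguish from zero.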

The paper is important in practical terms.  Suppose the test evaluators had intended to use Lorge Readability as the sole measure of subjects’ ability.  The fact that it does not correlate with intelligence in this sample shows this would be a grave mistake.

I still hypothesize that – in general – writing complexity and intelligence will be correlated, but this article gave me some pause.  If evaluation in a narrow range is the goal, I will need to be extremely careful as to whether my methods are rigorous and precise enough to meet that goal.  And I will need to be clear in explaining that they do not, if that is the case.

Quick hits:

  • It sounds like the authors had thousands of exam results to choose from and chose 50 at random for this study.  Times change, I guess.  Although I might have done the same if I were computing all the scores and correlations by hand.
  • On average, students write two grade levels below their current level.  The authors claim this is because students’ comprehension runs ahead of their ability to compose.
  • The intelligence measure was the total score on 30 arithmetic problems, 60 multiple-choice vocabulary questions, and 15 “proverb-matching” items.  Compositions were of ~100 words.  I wonder how much longer compositions or multiple compositions per student would have increased the precision of the readability measure.

Learning How Things Go Together

[This is my attempt at converting my dissertation abstract to “Up-Goer Five speak” (i.e. using only the 1000 most-frequently used English words).  For context, here’s the xkcd comic that started the trend.  Search the #upgoer5 hashtag on Twitter for more.  Try it yourself on the Up-Goer Five text editor.]

Big things are just many small things put together. It would be good to know which small things go together. You could learn how a brain works by thinking this way. Or you could learn which people like which other people. Thinking about how small things are put together to make big things is a good idea. It would be good to know how we learn, and how we should learn which things go together.

To this end, I did five studies in which people learned which things in a set were joined together. To show you what I mean, some people learned “who is friends with who” in a friend group. But other people learned about other things that were joined together – like which cities have roads that go between them. By doing these studies, I found out a few things. One thing I learned was that it matters how the things are joined up. To show you what I mean, think about the friend group again. It is easier to learn who is friends with who in a group where few people have many friends and many people have few friends. If things are more even, and all people have about the same number of friends, it is hard to learn exactly who is friends with who.

It doesn’t matter if the joined things are people or cities or computers. It is all the same. Also, it doesn’t seem to matter much why it is you are learning what things go together.

I also show that people learn better by seeing a picture of joined-together things rather than reading about joined-together things. This is the case even more when the things that are joined are made to be close together in the picture.

Finally, I talk about an all-around idea for how people learn about groups of joined together things. I say people start out by quickly sorting things into much-joined and few-joined types. Then they more slowly learn which one thing is joined to which one other thing a little at a time.

Dollar Value of Personal Data

Personal Data – Median Fair Price

How much is your personal data worth?  Worth, as in – how much should you sell it for?  In dollar terms.

I went looking for attempts to answer this question and didn’t find much.  So I took a shortcut and asked a bunch of people.  Here’s how I set up the survey:

Imagine your friend has just told you about his new job.  He now works for a company that pays people for their personal data.  For example, you would tell the company your name and the name of your favorite TV show, and you would receive a certain amount of money in return.

Your friend is in charge of setting fair prices for each piece of personal data.  He needs advice from you about how much to pay.  For each of the items below, please provide the price you believe would be fair to pay someone to provide that information.

More methodology details are below, but let’s get straight to the results.  The table below lists each item of personal data requested, along with the survey respondents’ median[1] fair price in dollars.

Personal Data Median Fair Price
Home or Cell Phone Number $6.25
Home Address (Street Address) $5.00
Name of Employer $5.00
Previous Employers $5.00
Brand Name of Bank Used Most Often $3.00
Brand Name of Credit Card Used Most Often $2.00
Link to (Public) Facebook Profile $2.00
Twitter Username $2.00
Age(s) of Children Living at Home $1.75
Yearly Income $1.50
Make and Model of Car Driven Most Often $1.00
Home Address (City) $1.00
Brand Name of Computer Used Most Often $1.00
Date of Birth (Month, Day and Year) $1.00
Highest Level of Education $1.00
College Major $1.00
Marital Status $1.00
Political Party Affiliation $1.00
Religious Affiliation $1.00
Home Address (State) $1.00
Home Address (Zip Code) $1.00
Gender $0.50
Favorite Book $0.50
Favorite Movie $0.50
Favorite Restaurant $0.50
Favorite Song $0.50
Favorite TV Show $0.50


  • People just don’t want to be bothered.  Phone number and street address are the pieces of personal information held most dear.
  • Employer and previous employer data are valued surprisingly highly.
  • Twitter and Facebook identities are valued more highly than a number of demographic variables, including annual income.

What Personal Data is it Inappropriate Even to Ask About?

I wanted to make an option available to users to indicate that no price could ever persuade them to part with some data.  The option I settled upon was a checkbox labeled “None / Inappropriate / Should Not Ask” that could be selected instead of entering a dollar value.

In the table below, I list the personal data labels and the number of respondents who chose “None / Inappropriate / Should Not Ask” instead of entering a dollar value for the item.

Personal Data Marked Inappropriate
Previous Employers 37
Home or Cell Phone Number 36
Home Address (Street Address) 34
Name of Employer 34
Brand Name of Bank Used Most Often 34
Brand Name of Credit Card Used Most Often 30
Link to (Public) Facebook Profile 29
Twitter Username 28
Age(s) of Children Living at Home 22
Date of Birth (Month, Day and Year) 22
Make and Model of Car Driven Most Often 20
Home Address (State) 17
Religious Affiliation 15
Home Address (City) 15
Political Party Affiliation 14
Home Address (Zip Code) 14
Brand Name of Computer Used Most Often 13
Highest Level of Education 7
College Major 7
Marital Status 7
Gender 7
Favorite Book 6
Favorite Movie 6
Favorite Song 6
Favorite TV Show 6
Favorite Restaurant 5
Yearly Income 1

The ordering should look rather familiar.  The median-price ordering and the inappropriate-to-ask ordering are almost identical: the Spearman rank correlation between them is 0.96.  This suggests that the personal data items given high dollar values (by respondents willing to name a price) were the same items that other respondents thought should not be for sale at any price.
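
For anyone who wants to try the rank comparison themselves, here is a minimal sketch of a Spearman correlation computed from the two tables above.  The numbers are copied from those tables; the helper names (`average_ranks`, `pearson`, `spearman`) are my own, and tied values get the average of the ranks they span, which is one common convention but an assumption here.  Because of tie handling and because the published tables may not match the exact data behind the original calculation, the result need not reproduce 0.96 exactly.

```python
import math

# item: (median fair price in dollars, "None / Inappropriate / Should Not Ask" count)
SURVEY = {
    "Home or Cell Phone Number": (6.25, 36),
    "Home Address (Street Address)": (5.00, 34),
    "Name of Employer": (5.00, 34),
    "Previous Employers": (5.00, 37),
    "Brand Name of Bank Used Most Often": (3.00, 34),
    "Brand Name of Credit Card Used Most Often": (2.00, 30),
    "Link to (Public) Facebook Profile": (2.00, 29),
    "Twitter Username": (2.00, 28),
    "Age(s) of Children Living at Home": (1.75, 22),
    "Yearly Income": (1.50, 1),
    "Make and Model of Car Driven Most Often": (1.00, 20),
    "Home Address (City)": (1.00, 15),
    "Brand Name of Computer Used Most Often": (1.00, 13),
    "Date of Birth (Month, Day and Year)": (1.00, 22),
    "Highest Level of Education": (1.00, 7),
    "College Major": (1.00, 7),
    "Marital Status": (1.00, 7),
    "Political Party Affiliation": (1.00, 14),
    "Religious Affiliation": (1.00, 15),
    "Home Address (State)": (1.00, 17),
    "Home Address (Zip Code)": (1.00, 14),
    "Gender": (0.50, 7),
    "Favorite Book": (0.50, 6),
    "Favorite Movie": (0.50, 6),
    "Favorite Restaurant": (0.50, 5),
    "Favorite Song": (0.50, 6),
    "Favorite TV Show": (0.50, 6),
}


def average_ranks(values):
    """Rank values from 1 (smallest) upward; ties share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the whole run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Plain Pearson correlation coefficient."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def spearman(xs, ys):
    """Spearman correlation: Pearson correlation of the rank-transformed data."""
    return pearson(average_ranks(xs), average_ranks(ys))


prices = [price for price, _ in SURVEY.values()]
counts = [count for _, count in SURVEY.values()]
rho = spearman(prices, counts)
print(f"Spearman rho = {rho:.2f}")
```

Ranking first and correlating second is what makes this measure suit the data: it only asks whether the two orderings agree, so it is not thrown off by the fact that one column is in dollars and the other is a head count.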

Survey Methodology

I used Mechanical Turk to field a survey to 104 people.  I limited the survey to users in the United States.  Below is a partial screenshot showing exactly what the survey-takers saw when they began the survey.

Survey Instructions and Example Questions

The average time each survey-taker spent answering questions was just over four minutes.  For each of 27 items, they either gave a dollar value for the fair price or marked the item Inappropriate.

If you are interested in working with the raw fair price response data, please contact me and I will provide it.


This survey should not be considered “scientific.”  I did not attempt to obtain a random sample of the human population nor even the United States population.  The sample is representative of those people using Amazon Mechanical Turk and willing to take a survey about the dollar value of personal data.  How serious a limitation that is, I leave up to you.

I specifically asked users to provide “the price you believe would be fair to pay someone” for each item.  I did not ask them what the price would have to be for them to sell their own data.  I purposefully did this to reduce noise due to uniquely personal preferences in the data, but I recognize some might feel it better to ask the price question more directly.


  1. Because of the inevitable positive skew for these questions, the median is both nicer to look at and more representative of the actual distribution than the mean.  One way to think about the median value in this context: at least 50% of the people who named a price would accept this value as fair, because it meets or exceeds their asking price.
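
A small sketch of the point in this footnote.  The responses below are invented for illustration and are not taken from the survey; they simply mimic a positively skewed set of fair-price answers in which one respondent names a very high figure.

```python
from statistics import mean, median

# Hypothetical fair-price responses in dollars (NOT real survey data).
responses = [0.50, 1.00, 1.00, 1.00, 2.00, 2.00, 5.00, 10.00, 100.00]

mid = median(responses)
avg = mean(responses)

# Share of respondents whose named price is met or exceeded by the median.
satisfied = sum(1 for r in responses if r <= mid) / len(responses)

print(f"median = {mid:.2f}, mean = {avg:.2f}")
print(f"share who would accept the median as fair = {satisfied:.0%}")
```

In a list like this the single large answer drags the mean far above what most people asked for, while the median stays put, and by construction at least half of the respondents see their asking price met.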

Employment Progression

In my new dataset, each row is a series of jobs that one person has had.

Most of them are quotidian:
Junior Tax Analyst –> Senior Tax Analyst
Investment Banking –> Investment Banking –> Investment Banking

Some of them are funny:
Corn Detassler –> Flight Delivery Center Technician
Quabbity Assuance –> Electronics Sales Associate

Some baffling:
Gymnast –> Air Traffic Controller
bust boy –> bust boy –> bust boy?

And some inspiring:
Dishwasher –> Dishwasher –> Model