When people tweet about politics, what words do they use? I ran a quick analysis of the last 1000 tweets from the 16 most popular political bloggers and the last 1000 tweets from the 16 most popular sports bloggers.
Here are the overall word counts, the counts for political and sports tweets separately, and a chi-square measure of how diagnostic each word is of politics versus sports.
| Word | Count | Political Count | Sports Count | Chi Square |
| --- | --- | --- | --- | --- |
| obama | 1137 | 1132 | 5 | 558.54 |
| gop | 659 | 659 | 0 | 329.50 |
| game | 758 | 28 | 730 | 325.07 |
| house | 551 | 540 | 11 | 253.94 |
| yankees | 500 | 2 | 498 | 246.02 |
| senate | 452 | 452 | 0 | 226.00 |
| party | 432 | 413 | 19 | 179.67 |
| vs | 598 | 69 | 529 | 176.92 |
| democrats | 319 | 319 | 0 | 159.50 |
| health | 346 | 336 | 10 | 153.58 |
| tea | 324 | 319 | 5 | 152.15 |
| president | 370 | 349 | 21 | 145.38 |
| ufc | 293 | 1 | 292 | 144.51 |
| angels | 282 | 0 | 282 | 141.00 |
| election | 271 | 271 | 0 | 135.50 |
| political | 265 | 262 | 3 | 126.57 |
| vote | 345 | 319 | 26 | 124.42 |
| rangers | 256 | 3 | 253 | 122.07 |
| reform | 242 | 242 | 0 | 121.00 |
| update | 568 | 100 | 468 | 119.21 |
| #yankees | 232 | 0 | 232 | 116.00 |
| giants | 248 | 5 | 243 | 114.20 |
| #ufc | 225 | 0 | 225 | 112.50 |
| lakers | 225 | 2 | 223 | 108.54 |
| #mma | 216 | 0 | 216 | 108.00 |
| on | 4265 | 2605 | 1660 | 104.69 |
| watch | 446 | 374 | 72 | 102.25 |
| government | 204 | 204 | 0 | 102.00 |
| campaign | 217 | 213 | 4 | 100.65 |
| law | 230 | 222 | 8 | 99.56 |
| obama’s | 199 | 199 | 0 | 99.50 |
| us | 406 | 343 | 63 | 96.55 |
| dodgers | 197 | 1 | 196 | 96.51 |
| bowl | 200 | 2 | 198 | 96.04 |
| (video) | 211 | 206 | 5 | 95.74 |
| rally | 258 | 240 | 18 | 95.51 |
| football | 205 | 6 | 199 | 90.85 |
| republicans | 181 | 181 | 0 | 90.50 |
| tax | 194 | 189 | 5 | 87.26 |
| today | 516 | 407 | 109 | 86.05 |
| polls | 185 | 181 | 4 | 84.67 |
| obamacare | 169 | 169 | 0 | 84.50 |
| republican | 169 | 169 | 0 | 84.50 |
| palin | 168 | 168 | 0 | 84.00 |
| coach | 175 | 2 | 173 | 83.55 |
| bush | 184 | 179 | 5 | 82.27 |
| dems | 163 | 163 | 0 | 81.50 |
| nfl | 189 | 8 | 181 | 79.18 |
| voters | 180 | 174 | 6 | 78.40 |
| basketball | 160 | 1 | 159 | 78.01 |
| o’donnell | 153 | 153 | 0 | 76.50 |
| fans | 178 | 7 | 171 | 75.55 |
| players | 162 | 3 | 159 | 75.11 |
| news | 389 | 315 | 74 | 74.65 |
| race | 214 | 196 | 18 | 74.03 |
| [delicious] | 147 | 0 | 147 | 73.50 |
| obamas | 143 | 143 | 0 | 71.50 |
| kings | 158 | 4 | 154 | 71.20 |
| team | 307 | 49 | 258 | 71.14 |
| #rangers | 139 | 0 | 139 | 69.50 |
| care | 228 | 202 | 26 | 67.93 |
| play | 230 | 27 | 203 | 67.34 |
| congress | 139 | 137 | 2 | 65.56 |
| season | 199 | 19 | 180 | 65.13 |
| kobe | 130 | 0 | 130 | 65.00 |
| bill | 253 | 217 | 36 | 64.75 |
| america | 170 | 159 | 11 | 64.42 |
| politics | 124 | 124 | 0 | 62.00 |
| in | 5766 | 3304 | 2462 | 61.48 |
| et | 149 | 142 | 7 | 61.16 |
| sox | 122 | 0 | 122 | 61.00 |
| jobs | 137 | 133 | 4 | 60.73 |
| economy | 125 | 124 | 1 | 60.52 |
| notes | 178 | 16 | 162 | 59.88 |
| debate | 163 | 151 | 12 | 59.27 |
| cam | 116 | 0 | 116 | 58.00 |
| democratic | 116 | 116 | 0 | 58.00 |
| 125 | 115 | 0 | 115 | 57.50 |
| brandon | 115 | 0 | 115 | 57.50 |
| cbs | 125 | 122 | 3 | 56.64 |
| christine | 112 | 112 | 0 | 56.00 |
| player | 123 | 3 | 120 | 55.65 |
| preview | 169 | 16 | 153 | 55.53 |
| elections | 115 | 114 | 1 | 55.52 |
| democrat | 111 | 111 | 0 | 55.50 |
| sen | 110 | 110 | 0 | 55.00 |
| americans | 121 | 118 | 3 | 54.65 |
| supreme | 113 | 112 | 1 | 54.52 |
| dem | 111 | 110 | 1 | 53.52 |
| rep | 122 | 118 | 4 | 53.26 |
| speech | 110 | 109 | 1 | 53.02 |
| pelosi | 106 | 106 | 0 | 53.00 |
| fox | 143 | 133 | 10 | 52.90 |
| its | 300 | 239 | 61 | 52.81 |
| #playoffs | 105 | 0 | 105 | 52.50 |
| security | 116 | 113 | 3 | 52.16 |
| espn | 113 | 3 | 110 | 50.66 |
| usc | 108 | 2 | 106 | 50.07 |
| deal | 199 | 29 | 170 | 49.95 |
| mosque | 99 | 99 | 0 | 49.50 |
| american | 194 | 166 | 28 | 49.08 |
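The chi-square score (the last column) was calculated as (Observed Frequency - Expected Frequency)^2 / Expected Frequency, where the observed frequency is the word's count in one category and the expected frequency is half the total number of times the word appeared. Because the two categories deviate from that midpoint by the same amount, either one gives the same value; the full two-cell chi-square statistic would be exactly double the number shown. In other words, we assume each word has an equal chance of appearing in a sports tweet or a political tweet, and then measure how much that assumption is violated.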
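For anyone who wants to check the numbers, here is a minimal Python sketch of that calculation. The function name `diagnosticity` is just an illustrative choice, not something from the original analysis, and the counts are copied from three rows of the table.

```python
def diagnosticity(political_count: int, sports_count: int) -> float:
    """Single-category chi-square term, assuming a 50/50 expected split."""
    total = political_count + sports_count
    if total == 0:
        raise ValueError("the word must appear at least once")
    expected = total / 2  # equal chance of a political or a sports tweet
    # Both categories sit the same distance from the midpoint, so using
    # the political count or the sports count gives the same result.
    return (political_count - expected) ** 2 / expected


# (political, sports) counts copied from the table.
rows = {"obama": (1132, 5), "gop": (659, 0), "game": (28, 730)}
for word, (political, sports) in rows.items():
    print(f"{word}: {diagnosticity(political, sports):.2f}")
# obama: 558.54
# gop: 329.50
# game: 325.07
```

Note that the score is unsigned: "obama" and "game" both score high, and only the two count columns tell you which side each word leans toward.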
This table lists only the most diagnostic words; there were tens of thousands more. Still, if you ever need to build a quick-and-dirty classifier or settle a bet on which words separate the pols from the jocks, here’s your answer. 🙂
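As a rough illustration of the quick-and-dirty classifier idea, here is one way it might look in Python. This is only a sketch under my own assumptions: the signed weighting, the whitespace tokenizer, and the names `signed_score`, `WEIGHTS`, and `classify` are hypothetical choices, and only six rows of the table are used to keep it short.

```python
# (political, sports) counts copied from six rows of the table.
# A real classifier would load the full vocabulary.
COUNTS = {
    "obama": (1132, 5),
    "senate": (452, 0),
    "vote": (319, 26),
    "game": (28, 730),
    "yankees": (2, 498),
    "coach": (2, 173),
}


def signed_score(political: int, sports: int) -> float:
    """Chi-square term from the table, positive for political-leaning
    words and negative for sports-leaning ones (an assumed convention)."""
    expected = (political + sports) / 2
    score = (political - expected) ** 2 / expected
    return score if political >= sports else -score


WEIGHTS = {word: signed_score(p, s) for word, (p, s) in COUNTS.items()}


def classify(tweet: str) -> str:
    """Sum the signed scores of known words; the sign picks the label."""
    # Naive tokenizer: lowercase and split on whitespace, so words with
    # punctuation attached ("game!") will not match.
    total = sum(WEIGHTS.get(token, 0.0) for token in tweet.lower().split())
    if total > 0:
        return "politics"
    if total < 0:
        return "sports"
    return "unknown"


print(classify("Obama urges Senate vote"))          # politics
print(classify("Yankees coach on tonight's game"))  # sports
print(classify("hello world"))                      # unknown
```

Summing raw scores lets very frequent words dominate, which is probably fine for settling a bet but is worth revisiting before trusting it on real data.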