This is confirmed by the Pearson’s r correlation coefficient of 0.08. Basically, this is a very weak correlation, if it indicates any correlation at all. Correlation coefficients measure the degree of linear association between two variables, ranging from +1 for a perfect positive correlation to -1 for a perfect negative correlation.
Though interpretations vary somewhat by author, under a conservative approach a coefficient below about 0.19 is considered a very low to non-existent correlation. A more intuitive interpretation is to square the value, which produces the coefficient of determination, or r-squared. This is called a proportional reduction of error statistic and basically indicates what percent of variation in Y (% TP members) is explained by the variation in X (% unemployment). This isn’t really explanation in a causal sense; rather, it asks: if I know X, how much does that reduce my error in predicting Y? In this case, squaring 0.083 gives 0.0069. That is, if we know the percent unemployment in a city, our error in predicting the percent of Tea Party members in the city population is reduced by 0.0069, or about 0.7%. This is pretty much nothing. So, from the magnitude of the relationship it is safe to say that there is very little if any relationship between unemployment and online Tea Party membership, at least as evidenced in this data.
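The arithmetic behind the coefficient of determination can be checked in a couple of lines of R. This is only a minimal sketch that starts from the rounded r value reported above; the variable names are illustrative and are not taken from the original analysis.

```r
# Pearson's r as reported in the text (rounded to three decimals)
r <- 0.083

# Coefficient of determination: the proportion of variation in Y
# accounted for by a linear relationship with X
r_squared <- r^2

round(r_squared, 4)        # 0.0069
round(100 * r_squared, 2)  # 0.69, i.e. well under 1 percent
```

With the data frame used for the summaries at the end of this post, the same quantity would presumably be obtained from cor(tp$perunemp, tp$permemb)^2.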
This conclusion is strengthened by the fact that Pearson’s r is easily skewed by outliers. The high values, for instance, pull the line upward and make the correlation coefficient greater than it would be otherwise. Just removing Nampa, FL (the highest outlier) reduces the correlation coefficient to 0.072, and that is from removing a single data point from a pretty sizeable sample. Removing the next 4 would make it go down even further because they are all above the line.
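The outlier effect described above is easy to reproduce on made-up numbers. The sketch below uses simulated data, not the Tea Party data, and every variable name in it is hypothetical. It plants a single extreme point in two otherwise unrelated variables and compares Pearson's r with and without that point.

```r
# Simulated data for illustration only; these are NOT the Tea Party data
set.seed(42)
n <- 100
x <- rnorm(n, mean = 10, sd = 3)       # stand-in for % unemployment
y <- rnorm(n, mean = 0.08, sd = 0.05)  # stand-in for % members, unrelated to x

# Plant one extreme point that is high on both variables
x[n] <- 22
y[n] <- 0.9

r_all     <- cor(x, y)          # Pearson's r with the outlier included
r_trimmed <- cor(x[-n], y[-n])  # Pearson's r with the outlier removed

c(with_outlier = r_all, without_outlier = r_trimmed)
```

In this setup the single planted point should push the coefficient noticeably upward, which is the same direction of bias the text attributes to the highest cities.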
We also tested whether this correlation coefficient was statistically significant. In short, because the data was not randomly selected or normally distributed, and the variances of the two variables were not equal, Pearson’s r can give a biased measure of statistical significance. We tried some data transformations to make the data fit these requirements, but these did not produce normal distributions. As a proxy for statistical significance, we used a randomization method that (1) mixes up the original data so that it is randomly distributed; (2) does this 10,000 times and calculates a new correlation coefficient each time; and (3) creates a population of correlation coefficients from this randomly distributed data, i.e., data in which there is no relationship between unemployment and Tea Party online membership.
This population of coefficients can then be used to answer the following question: how likely are you to see a correlation as strong as the value found in the original data (0.08) if in fact there is no relationship between unemployment and Tea Party membership? This is the p-value. In this case, the p-value was 0.0741. This indicates that 7.4% of the values in this distribution are larger than the value we got (see the figure below for where our value landed in the distribution).
Generally, anything above 0.05 is rejected as “statistically non-significant.” Basically, this alpha level (the 0.05) is a measure of how willing you are to risk concluding that there is a relationship between the variables when in fact there is none. Generally, social scientists will accept this risk in 1 out of 20 cases. In this instance (p = 0.0741), we would get a coefficient at least as large as ours (r = 0.083) about 1 in 13.5 times in a population with no relationship between these two variables. In social science lingo, you would conclude that you cannot reject the null hypothesis of no relationship between the variables at the 0.05 level.
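The randomization procedure described above can be sketched in a few lines of R. This is an illustration under stated assumptions, not the code used for the analysis: the data are simulated, the variable names are hypothetical, "mixing up" the data is taken to mean shuffling one variable relative to the other, and the p-value is taken to be one-sided (the share of shuffled coefficients at least as large as the observed one).

```r
# Simulated data for illustration only; these are NOT the Tea Party data
set.seed(1)
n <- 372                                    # number of cities, as in the text
unemp <- rnorm(n, mean = 10.65, sd = 2.5)   # hypothetical % unemployment
memb  <- rexp(n, rate = 1 / 0.076)          # hypothetical % members, right-skewed

r_obs <- cor(unemp, memb)                   # observed Pearson's r

# Shuffle one variable to break its pairing with the other, recompute r,
# and repeat to build a distribution of coefficients under "no relationship"
n_perm <- 10000
r_null <- replicate(n_perm, cor(unemp, sample(memb)))

# Assumed one-sided p-value: share of shuffled coefficients >= the observed one
p_value <- mean(r_null >= r_obs)

r_obs
p_value
```

Because the shuffling makes no assumption about normality or equal variances, this approach sidesteps the requirements that the raw data failed to meet. Calling hist(r_null) and marking r_obs on it would give a picture of where the observed value falls in the shuffled distribution.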
However, this is one of those cases where statistical significance does not really tell us that much about the data. Even if the p-value were 0.0001, I would still be confident concluding that there is little if any relationship between unemployment and Tea Party membership, because the magnitude of the relationship (r = 0.083) is so small as to be sociologically insignificant.
We also tested the impact of removing further outliers. Dropping the highest 7 outliers reduces the correlation coefficient to 0.014 and raises the p-value to 0.3982. This means that the correlation coefficient in the first test I ran is inflated by the impact of a few outliers, and that there is an even greater chance that you could get this coefficient when there is no relationship between unemployment and Tea Party membership. The resultant graph shows that the line is even flatter than in the first analysis.
One caveat regarding this statistical outcome: the conclusions that can be drawn from it are strongest for the data set on which it was based. We can confidently say that there is no meaningful relationship between these variables across these 372 cities. It is much less clear how well this result can be extended to inferences about the entire population of the 8000+ cities in the Tea Party membership database for the period, or the entire population of cities across the United States. Since this dataset is based on cities for which the Bureau of Labor Statistics had unemployment data, it is likely that these cities are larger in population than is typical of cities in the whole population.
Summaries of the data:
> summary(tp$perunemp) # summary information for unemployment across these cities
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
   5.00    9.00   10.00   10.65   12.00   22.00
> summary(tp$permemb) # summary information for percent Tea Party members across the cities
    Min.  1st Qu.   Median     Mean  3rd Qu.     Max.
0.001893 0.039350 0.059780 0.076340 0.092840 0.924200