
Lying Statistics and the Lying Politicians that Use Them

OK... first, a disclaimer. I don't often reveal this, because, for some reason, guitar players, and musicians in general, are not supposed to have brains. Though my own brain is probably not even close to the most educated in the musician sphere*, I do have a degree in Math/Computer Science, with a minor in Philosophy and an emphasis in statistics. Advanced statistics is VERY difficult. The whizzes in stats become actuaries after passing a series of exams notorious for their insane difficulty. I'm NOT a whiz... in fact, I fell asleep in statistics class and fell off my chair one day.

This blog entry doesn't touch on advanced statistics and you don't need more than a 6th grade education to understand this. In other words: Donald Trump -- just stop here. You'll be lost after this point.

Today, I'm writing about how politicians -- or their corporate sponsors -- like to quote statistics to support their spin of the day. However, in many instances, the exact same data used to tout their viewpoint can also be used to support the exact opposite opinion.

How does this work? As an example, I'm going to cite some hypothetical data on household income and poverty levels.

Example 1: Using "Average" and "Median" to determine income distribution

Suppose your friendly neighborhood pundit enthusiastically announces that under the current administration, AVERAGE income has increased by 15%.

Well -- this sounds like the administration's policies must be really wise and effective. But let's deconstruct this to see why it is not at ALL a meaningful indicator of income distribution.

Suppose we're working with this data set (reduced for simplification):

Incomes of the population of Dullsville:

SAMPLE 1: {$15,000, $16,000, $17,000, $17,500, $16,500}

Obviously, there are very few people in Dullsville, which flies in the face of experience. But it's pretty easy to calculate the Average -- you just add all the incomes together and then divide by the number of data items, in this case, 5.

I've done the math for you -- sum of incomes: $82,000, 5 persons, $82,000 / 5 = $16,400.

OK... this would appear to be a pretty income-challenged town... an average income of $16,400, about $10,000 short of the poverty level for a family of 4. These people are most likely not eating very well. (Sad to say, the yearly income of an earner working a minimum wage job @ $7.25/hr, 40 hr/week, 52 weeks/year is just $15,080. In order to reach the federal poverty level, a family with a single earner must make at least $12.50/hr.)
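If you want to check the minimum-wage arithmetic yourself, here's a tiny Python sketch. It assumes, as I did, a 40-hour week and 52 paid weeks a year with no overtime; the function name `yearly_income` is just my own label.

```python
# Assumptions (same as in the text): full-time, 40 hours/week, 52 weeks/year
HOURS_PER_WEEK = 40
WEEKS_PER_YEAR = 52


def yearly_income(hourly_wage: float) -> float:
    """Gross yearly pay for a full-time worker at the given hourly wage."""
    return hourly_wage * HOURS_PER_WEEK * WEEKS_PER_YEAR


minimum_wage = yearly_income(7.25)   # federal minimum wage
poverty_wage = yearly_income(12.50)  # the rough poverty-level wage quoted above

print(f"$7.25/hr job:  ${minimum_wage:,.0f}/year")   # $15,080/year
print(f"$12.50/hr job: ${poverty_wage:,.0f}/year")   # $26,000/year
```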

But let's make this addition to the data sample. The owner of the sweatshop where the other five persons in Dullsville toil in desperation decides to move his home to Dullsville. His annual income is $14,000,000,000. Let's add this into our existing data set. Now, the "average" income is calculated to be $2,333,347,000 -- over $2 billion per family! Wow... now the town is one of the richest in the world! That tycoon can brag that during his reign, the average income of Dullsville increased by over $2.3 billion! (Lying bastard!)
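Here's the same before-and-after in Python, a sketch using the standard library's `statistics.mean` (my choice of tool; any calculator does the job):

```python
from statistics import mean

# SAMPLE 1: the five households of Dullsville
sample1 = [15_000, 16_000, 17_000, 17_500, 16_500]
tycoon = 14_000_000_000  # the sweatshop owner's annual income

before = mean(sample1)             # average of the five households
after = mean(sample1 + [tycoon])   # average once the tycoon moves in

print(f"Average before the tycoon: ${before:,.0f}")          # $16,400
print(f"Average after the tycoon:  ${after:,.0f}")           # $2,333,347,000
print(f"'Increase' in the average: ${after - before:,.0f}")  # $2,333,330,600
```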

Clearly, the statistic of Average Income is monumentally misleading -- in fact so bad that it's useless.

There's a second measure called The Median. This value is determined by ordering the data in increasing value -- smallest to largest. The median is the value in the center. If the number of data points is even, the two middle values are averaged to determine the Median. In the above example, before the Dullsville, Inc. magnate descends upon the town, the median income would be $16,500. The Median shows that there are as many data points below this value as are above it. With this simple data set, and before the addition of Mr. Dullsville, both the Average and the Median are really fairly accurate descriptions of the economic reality of Dullsville.
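A quick Python sketch with `statistics.median` shows what happens to the Median when the tycoon moves in. With six households (an even count), the two middle values, $16,500 and $17,000, get averaged:

```python
from statistics import median

sample1 = [15_000, 16_000, 17_000, 17_500, 16_500]
tycoon = 14_000_000_000

# median() sorts a copy of the data and takes the middle value,
# or the average of the two middle values when the count is even.
print(median(sample1))             # 16500
print(median(sample1 + [tycoon]))  # 16750.0
```

The Average jumps by billions, but the Median only moves from $16,500 to $16,750.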

The problem with the Median as a measure is that, like the Average, it can be seriously skewed by wildly scattered data. Let's say that four VERY impoverished families move to Dullsville... so the data set becomes:

SAMPLE 2: {$0, $2, $3, $4, $10,000, $15,000, $16,000, $17,000, $18,000}. 

Now, the Average income is calculated as $8,445.44. The Median is $10,000. Obviously neither of these tells us much about the income of Dullsville. Even if the tycoon's income is substituted for that of the previously richest family, the Average rises to around $1.56 billion, but the Median remains $10,000. Neither statistic is at all descriptive of the actual financial state of Dullsville.
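These SAMPLE 2 figures can be double-checked with a short Python sketch (again just the standard `statistics` module). The list is already sorted, so dropping its last element removes the richest family:

```python
from statistics import mean, median

# SAMPLE 2: four very poor households plus five others (already sorted)
sample2 = [0, 2, 3, 4, 10_000, 15_000, 16_000, 17_000, 18_000]

print(f"Average: ${mean(sample2):,.2f}")    # $8,445.44
print(f"Median:  ${median(sample2):,.0f}")  # $10,000

# Replace the richest household ($18,000) with the tycoon's $14 billion
with_tycoon = sample2[:-1] + [14_000_000_000]
print(f"Average: ${mean(with_tycoon):,.2f}")    # $1,555,562,001.00
print(f"Median:  ${median(with_tycoon):,.0f}")  # still $10,000
```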

How to more clearly interpret data

Accurately interpreting data is VERY complex. This is why actuaries are the elites of statisticians. I think casinos also employ actuaries to keep their business in the black. It's also very difficult to condense data such as income into a single value that is at all significant, and few people have the patience to digest anything at all complicated -- just ask Dr. Fauci. (If you're still reading at this point, CONGRATULATIONS... you're an unusual American!)

But if we don't understand at least how statistics can be manipulated to deceive, then how are we to make informed decisions with respect to our votes?

To restate, there IS no one number that can express a reliable measure of a data set, but there are other figures that can be calculated and that can help to understand data.

The first of these is The Range of data. This is simply the difference between the highest value and the lowest value in a dataset. Clearly, knowing the range gives you a clue that can help you understand the spread of the data. In SAMPLE 2 above, for example, the range is $18,000 (excluding the tycoon's income). In SAMPLE 1, the range is a mere $2,500. From this, we can tell that the data in SAMPLE 1 are more consistent. So, possibly, the average and median values are more indicative of a real measure.
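The Range is a one-liner in Python. This sketch names the function `data_range` (a made-up name) to avoid clobbering Python's built-in `range`:

```python
sample1 = [15_000, 16_000, 17_000, 17_500, 16_500]
sample2 = [0, 2, 3, 4, 10_000, 15_000, 16_000, 17_000, 18_000]


def data_range(values):
    """Spread between the largest and the smallest value."""
    return max(values) - min(values)


print(data_range(sample1))  # 2500
print(data_range(sample2))  # 18000
```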

A second such figure is The Variance. The variance is generally an interim step toward calculating The Standard Deviation. To calculate the variance, you first calculate the Average, then square the difference between each data point and the mean (average), and finally average those squared differences. Here are the steps for SAMPLE 1 as I calculated the Variance:

VARIANCE OF SAMPLE 1:

1. Average =     (15,000+16,000+17,000+17,500+16,500) / 5 = (82,000 / 5) = 16,400

2. Differences between each data point and the Average, squared:

(16,400 - 15,000) ^ 2 = 1,400 ^ 2 = 1,960,000
(16,400 - 16,000) ^ 2 = 400 ^ 2 = 160,000
(16,400 - 17,000) ^ 2 = 600 ^ 2 = 360,000
(16,400 - 17,500) ^ 2 = 1,100 ^ 2 = 1,210,000
(16,400 - 16,500) ^ 2 = 100 ^ 2 = 10,000

(Note that for the difference, I'm using "absolute values", that is, the minus sign is ignored. It doesn't matter, because a negative number squared is positive anyway.)

3. Average of differences squared from Step 2:

(1,960,000 + 160,000 + 360,000 + 1,210,000 + 10,000) = 3,700,000
3,700,000 / 5 = 740,000

This result, the variance, is denoted σ² (sigma squared).

4. The standard deviation, σ (sigma), is the square root of the variance: √740,000 ≈ 860.
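Here are the same four steps as a Python sketch, with a cross-check against the standard library's `statistics.pvariance`, which computes the population variance:

```python
import math
from statistics import pvariance

sample1 = [15_000, 16_000, 17_000, 17_500, 16_500]

# Step 1: the average
avg = sum(sample1) / len(sample1)                   # 16400.0

# Step 2: squared difference between each data point and the average
squared_diffs = [(x - avg) ** 2 for x in sample1]   # 1,960,000, 160,000, ...

# Step 3: the average of those squared differences is the variance
variance = sum(squared_diffs) / len(squared_diffs)  # 740000.0

# Step 4: the standard deviation is the square root of the variance
sigma = math.sqrt(variance)                         # about 860.23

# Cross-check against the standard library's population variance
assert math.isclose(variance, pvariance(sample1))
print(f"variance = {variance:,.0f}, sigma = {sigma:,.2f}")
```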

If this seems like a lot of ballyhoo just to calculate a number that seems no more informative than just average and median -- well, it's not for naught. Once you know the standard deviation, you can determine which values lie more than one standard deviation away from the average.

Let's take the example of the incomes of five families in Dullsville -- not including the tycoon. I'm going to do the math off blog -- you don't want to deal with this stuff, I know, which is how we got into trouble in the first place. But here's the upshot of the Standard Deviation calculation for SAMPLE 1:


OK... according to my potentially flawed calculation, the standard deviation is roughly $860. (I used the formula for calculating the standard deviation in a "Population"... this is when the data are the only values we're interested in. There's a slightly different formula for calculating Sigma for a sample from a larger population.)
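The "slightly different formula" for a sample is the usual one that divides the sum of squared differences by n - 1 instead of n. Python's `statistics` module has both versions, `pstdev` for a population and `stdev` for a sample; a sketch:

```python
from statistics import pstdev, stdev

sample1 = [15_000, 16_000, 17_000, 17_500, 16_500]

# Population standard deviation: divide the squared differences by n.
# Use this when these five households ARE the whole population.
print(f"population sigma: {pstdev(sample1):,.2f}")  # about 860.23

# Sample standard deviation: divide by n - 1 instead. Use this when the
# five households are a sample drawn from some larger population.
print(f"sample s:         {stdev(sample1):,.2f}")   # about 961.77
```

With only five data points, the difference is noticeable: about $860 versus about $962.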

Once we know the standard deviation, we can inspect the data and see which data points are within ONE STANDARD DEVIATION of the average. In my original sample, these would be the data points between $15,540 and $17,260 ($16,400 ± $860) -- three of the data points, $16,000, $16,500, and $17,000, lie within one standard deviation of the average. In a normally distributed (bell-curve) population, a statistician would expect about 68% of data points to lie within one standard deviation. In my tiny population, it's 3/5 or 60% -- not too bad for totally trumped up (oops, excuse me) data.
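Counting the data points within one standard deviation takes only a few lines. This Python sketch uses the population standard deviation, as in my calculation:

```python
from statistics import mean, pstdev

sample1 = [15_000, 16_000, 17_000, 17_500, 16_500]

avg = mean(sample1)      # 16400
sigma = pstdev(sample1)  # about 860.23
low, high = avg - sigma, avg + sigma

# Keep only the incomes inside the one-standard-deviation band
within = [x for x in sample1 if low <= x <= high]
share = len(within) / len(sample1)

print(f"one-sigma band: ${low:,.0f} to ${high:,.0f}")     # $15,540 to $17,260
print(f"inside the band: {sorted(within)} ({share:.0%})")  # [16000, 16500, 17000] (60%)
```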

If I add back the tycoon's exorbitant income, it lies more than two standard deviations above the average, even with the standard deviation recalculated to include it -- so we spot it as an anomaly.
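One common way to make "how far out" concrete is the z-score: how many standard deviations a value sits from the average. A Python sketch that recomputes the average and population standard deviation with the tycoon included, and flags anything more than one standard deviation out (the one-sigma cutoff is the rule of thumb used in this post, not a universal standard):

```python
from statistics import mean, pstdev

sample1 = [15_000, 16_000, 17_000, 17_500, 16_500]
tycoon = 14_000_000_000
everyone = sample1 + [tycoon]

# Recalculate the average and population standard deviation with the tycoon included
avg = mean(everyone)
sigma = pstdev(everyone)

# z-score: how many standard deviations each income lies from the average
for income in everyone:
    z = (income - avg) / sigma
    flag = "  <-- outside one sigma" if abs(z) > 1 else ""
    print(f"${income:>14,}  z = {z:+.2f}{flag}")
```

Only the tycoon gets flagged. Notice, though, that he lands only about 2.24 standard deviations above the average: with so few households, his income inflates the standard deviation itself.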

Likewise, for the data sample that includes extremely low values, the average and median values are equally meaningless.  

What does this mean in practical terms?

Well, for starters, DJT repeatedly bragged that:

"... median household income is up $5,000 since I took office"

We can see that this is a meaningless claim. We know that the divide between high-echelon earners and the rest of us increased significantly, because of tax cuts that heavily benefited higher incomes. The number of people in this group has increased -- the 1% is now the 1.25% or something like that. With a population of 300,000,000-plus people, this means that the number of gazillionaires has increased from 3 million to 3.75 million. This moves the median significantly higher, meaning the already wealthy have become even more wealthy. It does NOT mean that all Americans' incomes are $5,000 higher. In fact, it says absolutely NOTHING about the lower half of the population. To draw ANY conclusion about the state of affluence in the general population, we would have to calculate the Standard Deviation and eliminate all those values that lie outside ONE STANDARD DEVIATION as anomalies.

Equally important for understanding the economic status of the general population, we would also have to discount the low values that lie beyond one standard deviation.

The overall point is that, clearly, the ex-president -- and probably every president before and since -- has used, and will use, statistics to support whatever point they're trying to make, fully knowing that mean and median are completely meaningless on their own.

Just a note -- all the increased income brags ignore one extremely significant point: inflation. Average income in 1970 was $52,000 (again, with the AVERAGE). Something that cost $52,000 at that time (say, a house) would now cost $350,000+ (unless you're in Austin, in which case that house would cost about $1.25 million, using the proportional increase in the estimated value of my own house over 25 years as a guide).

------------------------------------------------------------------------------------------

*The Big Brain Musician award most likely belongs to one of these:

Brian May (lead guitar of Queen), PhD in astrophysics;
Phil Alvin (lead vocal of The Blasters), advanced degree in math;
Dexter Holland (lead vocal, The Offspring), PhD in Molecular Biology;
Art Garfunkel, master's in Math;
Sterling Morrison (Velvet Underground), PhD in Medieval Literature from UT, no less!!!;
Milo Aukerman (vocal, The Descendents), PhD in Molecular Biology from USC.

And my all-time favorite (besides Leonardo da Vinci):
Charles Ives, who founded and ran a successful insurance company, in the process advancing many innovations in financial services. This allowed him the freedom to write the music that he WANTED to, instead of the music he NEEDED to. He's one of my life models.

From the Wikipedia article on Charles Ives:
"Igor Stravinsky praised Ives. In 1966 he said: [Ives] was exploring the 1960s during the heyday of Strauss and Debussy. Polytonality; atonality; tone clusters; perspectivistic effects; chance; statistical composition; permutation; add-a-part, practical-joke, and improvisatory music: these were Ives’s discoveries a half-century ago as he quietly set about devouring the contemporary cake before the rest of us even found a seat at the same table."
