### Question Underlying Data & Its Assumptions

Tuesday, December 7th, 2010

When I wrote Never Quote Percentages Without Context there was a little voice in my head saying “Now you’re going to have to talk a little about statistics needing context, too”. It appears that little voice in my head was on the money.

### Don’t be Afraid of Statistical Representation

Statistics have a bad name, and I’m personally someone who struggles with them. Out of all the units in the MBA program I put the most work into statistics and received my second-lowest result – 68% credit. However, understanding at least the basic premise of statistical representation will go a long way towards helping you look at data and information as a manager, a journalist or a researcher.

Don’t worry, this article isn’t a deep and technical one… it’s about understanding when data does and does not have value as information.

### Be Cautious of all Underlying Data – Question It Closely

Programmers will probably get this faster than anybody else… the quality and cleanliness of your data defines the value you can derive from the information it purports to provide.

That means when you go back to the data, you look critically at the who, what, when, where, how and why to determine its value. How was it collected? By whom? What size were the sample groups the data were taken from? What do the values actually mean?

If a column is headed Education and ranges from 1 through 14… what does that mean? Does it measure years in school, achievement in school or meaningful study towards a specific career? Where would a high school graduate sit? And where would an undergraduate degree sit?

Similarly, a column headed Marriage could be binary – but what about same-sex partners, de facto couples and people unwilling to be represented by the religious constraint of the formal marriage relationship?

Or, Urban versus Rural. Unless this column is identified by postcode you might discover self-identification warps the underlying data – what if a person lives 100 metres from the town limits, or in a leafy suburb right on the city’s fringe? Urban or rural?
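For the programmers in the audience, the questions above can be turned into a rough first-pass audit. This is only a sketch under stated assumptions: the `CODEBOOK` structure, the `audit` function, the column names and the sample rows are all made up for illustration and do not come from any real dataset or library. The idea is simply to flag columns whose meaning nobody has written down, and values that fall outside whatever codes have been documented.

```python
# Hypothetical codebook: every name, meaning and code here is an assumption
# made up for illustration, not taken from a real dataset.
CODEBOOK = {
    "education": {
        # Years in school? Achievement? Nobody has written it down.
        "meaning": "UNDOCUMENTED",
        "valid": set(range(1, 15)),  # the 1 through 14 range
    },
    "marriage": {
        "meaning": "self-reported marital status (hypothetical definition)",
        "valid": {0, 1},
    },
}


def audit(rows, codebook):
    """Return a list of human-readable problems found in the rows."""
    problems = []
    columns = {name for row in rows for name in row}
    for name in sorted(columns):
        entry = codebook.get(name)
        if entry is None:
            # The column exists in the data but nobody documented it.
            problems.append(f"{name}: no codebook entry")
            continue
        if entry["meaning"] == "UNDOCUMENTED":
            problems.append(f"{name}: meaning of codes is undocumented")
        # Collect distinct values that fall outside the documented codes.
        bad = sorted(
            {row[name] for row in rows
             if name in row and row[name] not in entry["valid"]}
        )
        if bad:
            problems.append(f"{name}: values outside documented codes: {bad}")
    return problems


# Made-up sample rows, purely to exercise the audit.
rows = [
    {"education": 12, "marriage": 1, "urban": "yes"},
    {"education": 15, "marriage": 2, "urban": "no"},
]
problems = audit(rows, CODEBOOK)
for problem in problems:
    print(problem)
```

A check like this cannot tell you whether the data is any good. It only surfaces the places where you need to go back and ask how, by whom and why the data was collected.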

