Types of Data
In this chapter I'm going to discuss data theory. There won't be any D3 exercises, but I hope you take the time to ponder it. Understanding the different types of data helps inform how to visualize them. Also, future sections will make use of the vocabulary defined here. Finally, visualization is about visually communicating relationships. Taking the time to consider the patterns of these relationships will ultimately lead to sharper thinking and better communication.
Quantitative Data vs. Categorical Data
Fundamentally, data is made of things measured (i.e., quantities) or the names we give to things (i.e., categories). In the visualization world the former is called quantitative data and the latter is called categorical data. It is rare to have one without the other. Below I've listed examples of quantities and of categories. Taken on their own, I think you'll find that not much meaning is conveyed, although it may be easy to make up a story about them in your mind.
Examples of Categories:
- Numbers > 30
- New York
Examples of Quantities:
A string of numbers on its own often has no meaning. Does [453,4328,5908,1230] mean anything to you? Categories on their own can have meaning, provided there is a relationship between the categories. While ["Bob", "Angela", "Maria"] doesn't mean much, the fact that Maria is the child of Bob and Angela starts to have meaning.
Relationships are what give meaning to data.
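As a minimal sketch of this idea, the names from the example above can be stored first as a bare list and then alongside an explicit parent-child relationship. The `parentsOf` object and the `childrenOf` helper are illustrative names made up for this example, not part of any library.

```javascript
// A bare list of categories: no relationship, little meaning.
const people = ["Bob", "Angela", "Maria"];

// Adding a relationship (hypothetical structure): each key is a child,
// each value lists that child's parents.
const parentsOf = { Maria: ["Bob", "Angela"] };

// Find the children of a given person by scanning the relationship.
function childrenOf(name) {
  return Object.keys(parentsOf).filter((child) => parentsOf[child].includes(name));
}

console.log(childrenOf("Bob")); // [ 'Maria' ]
```

The list alone can only be displayed as three labels. Once the relationship is recorded, there is something to draw, such as a small family tree.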
Here are examples of common relationships in data.
| Relationship | Example | What it enables |
|---|---|---|
| Difference of quantity | 2 > 1 | Allows for sorting and ranking. |
| Ratio | Jupiter's radius is 11.2 times bigger than Earth's radius. | Allows understanding of relative size. |
| Correlation | People who smoke are 15-30 times more likely to get lung cancer. | Change in one quantity is associated with change in another. Suggests possible causation between quantities. |
| Distribution | [1,1,0,1,0] is similar to [1,1,0,0,0] but both are not similar to [0,0,0,0,0]. | Similar to difference of quantity, except for samples/collections of quantities. |
| Ordering | Alphabetical ordering. Another is that Monday comes before Tuesday. Of course it also comes after Tuesday, but we rarely think about it that way. | Facilitates finding familiar values. |
| Hierarchy | We've broken our sales out by Division, then by Country. | Understanding composition. |
| Grouping | We consider the hours of 6am-6pm "day" and the rest of the hours "night". The "Southwest" states include: Arizona, California, Nevada and Utah. | Aggregation/abstraction. Similar to hierarchy. |
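A few of these relationships can be sketched in plain JavaScript. This is only an illustration under stated assumptions: the planetary radii are approximate equatorial values in kilometers, comparing samples by their mean is one simple stand-in for comparing distributions, and the function names `mean` and `partOfDay` are made up for this example.

```javascript
// Difference of quantity: numbers can be sorted and ranked.
const values = [453, 4328, 5908, 1230];
const ranked = [...values].sort((a, b) => b - a); // largest first

// Ratio: relative size of two quantities
// (approximate equatorial radii in km; assumed values).
const jupiterRadiusKm = 71492;
const earthRadiusKm = 6378;
const ratio = jupiterRadiusKm / earthRadiusKm;

// Distribution: summarize each sample by its mean so samples can be compared.
function mean(sample) {
  return sample.reduce((sum, v) => sum + v, 0) / sample.length;
}

// Grouping: collapse the 24 hours into "day" (6am up to 6pm) and "night".
function partOfDay(hour) {
  return hour >= 6 && hour < 18 ? "day" : "night";
}

console.log(ranked);           // [ 5908, 4328, 1230, 453 ]
console.log(ratio.toFixed(1)); // 11.2
console.log(mean([1, 1, 0, 1, 0]), mean([1, 1, 0, 0, 0]), mean([0, 0, 0, 0, 0])); // 0.6 0.4 0
console.log(partOfDay(9), partOfDay(22)); // day night
```

The first two samples have means of 0.6 and 0.4, which sit closer to each other than either does to 0. That matches the intuition in the distribution row.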
Specialized Types of Data
Some types of quantitative data are very important to us as humans, so we tend to deal with them in specific ways. Two types of data that require special treatment are location and time.
If you think about it, most of our maps are two-dimensional scatter plots with boundaries drawn. However, we tend to think of maps as being separate and distinct from scatter plots. We naturally want maps to have up mean "north" (or "south" if in the southern hemisphere).
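To make the "map as scatter plot" idea concrete, here is a minimal sketch that places a longitude/latitude pair on a pixel canvas. It assumes a plain equirectangular layout, where longitude maps straight to x and latitude straight to y. The function name `toScreen` and the canvas size are made up for this example, and the New York coordinates are approximate.

```javascript
// Map a longitude/latitude pair (in degrees) onto a width x height canvas.
// Screen coordinates grow downward, so latitude is flipped to keep north at the top.
function toScreen(lon, lat, width, height) {
  const x = ((lon + 180) / 360) * width;  // -180..180 degrees -> 0..width
  const y = ((90 - lat) / 180) * height;  // 90..-90 degrees  -> 0..height
  return { x, y };
}

// Approximate coordinates for New York City on a 720 x 360 canvas.
const newYork = toScreen(-74, 40.7, 720, 360);
console.log(newYork);
```

The only thing separating this from an ordinary scatter plot is the flipped vertical axis, which is exactly the "up means north" convention.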
For time, we insist that time moves from left to right in diagrams, and always on the horizontal axis. Failing that, we at least expect time to move continuously and in order.
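The convention for time can be sketched the same way: sort the records chronologically, then place the earliest at the left edge and the latest at the right. The `readings` data and the `timeToX` helper below are hypothetical, invented only to illustrate the idea.

```javascript
// Hypothetical readings that arrive out of order.
const readings = [
  { date: "2024-03-01", value: 12 },
  { date: "2024-01-01", value: 7 },
  { date: "2024-02-01", value: 9 },
];

// Sort a copy by timestamp so the earliest reading comes first.
const byTime = [...readings].sort((a, b) => new Date(a.date) - new Date(b.date));

// Place each reading on a horizontal axis of the given width:
// earliest at x = 0, latest at x = width. Expects records sorted by date.
function timeToX(records, width) {
  const start = new Date(records[0].date).getTime();
  const end = new Date(records[records.length - 1].date).getTime();
  return records.map((r) => ((new Date(r.date).getTime() - start) / (end - start)) * width);
}

console.log(byTime.map((r) => r.date)); // [ '2024-01-01', '2024-02-01', '2024-03-01' ]
console.log(timeToX(byTime, 600));
```

Because the positions are computed from the timestamps rather than from each record's index, unevenly spaced readings stay unevenly spaced on the axis. This keeps time continuous as well as ordered.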
Select the categorical relationships:
- Ratio
- Ordering
- Distribution
- Difference of quantity
Select the quantitative relationships:
- Grouping
- Correlation
- Hierarchy