Filtering and Transforming

Tutorial

Inevitably, data is not in the perfect format for visualization. Data types may be wrong, dates may be in odd formats, you may need to aggregate, or you may only want to visualize some of the data.

D3 data is just a JavaScript array, and JavaScript itself provides some great built-in array functionality. In this lab I'll show you how to use the filtering and transforming functions built into JavaScript arrays. In the upcoming labs we'll see how D3 augments that built-in functionality.

Filtering

Often you have more data than you want, or just more than you want to look at at one time. JavaScript arrays have the filter() function, which creates a new array containing only the members of the current array that are approved by a predicate function. filter() takes one argument, the predicate function. In its simplest form the predicate takes one parameter, the current item, and returns true (keep) or false (don't keep). filter() does not modify the array; it creates a new one, so you can use the same source array to create several views of the data. Below are some examples of using filter():

var even = [1,2,3,4,5,6].filter(function(d) { return d%2 === 0; });
// even = [2,4,6]

var over5 = [1,2,3,4,5,6].filter(function(d) { return d>5; });
// over5 = [6]

var populations = [
  {state: "CA", city: "Riverside", count: 314000},
  {state: "WA", city: "Seattle", count: 600000},
  {state: "NV", city: "Las Vegas", count: 600000},
  {state: "MI", city: "Detroit", count: 700000},
  {state: "CA", city: "Isla Vista", count: 23000},
  {state: "CA", city: "Sacramento", count: 500000}
];
var justCA = populations.filter(function(d) { return d.state === "CA"; });
// justCA contains entries for "Riverside", "Isla Vista", "Sacramento"

var justBigCA = populations.filter(function(d) { return d.state === "CA" && d.count > 100000; });
// justBigCA contains entries for "Riverside", "Sacramento"
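
Because filter() returns a new array, the same source can feed several views. The sketch below illustrates this with a small hypothetical array, readings, that is not part of the lab data; the names low and high are likewise only for illustration.

```javascript
// Hypothetical sample data, used only for this illustration.
var readings = [3, 8, 12, 5, 20];

// Two independent views built from the same source array.
var low = readings.filter(function(d) { return d < 10; });
var high = readings.filter(function(d) { return d >= 10; });

// low is [3, 8, 5] and high is [12, 20].
// readings is unchanged and still holds all five elements.
```

Each call walks the source array once, so building a handful of views this way is cheap for lab-sized data.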

Transforming

Another problem that often comes up is that the fields within datapoints aren't what we want. Suppose a field is in metric units and you want English (imperial) units. In this case you want to transform each datapoint to convert that field from metric to English units.

JavaScript arrays have the map() function, which returns a new array that is the result of applying the specified function to each element of the array. In other words, you specify how to transform one element, and map() applies that to the entire array and returns a new array. For transformation this is exactly what we want. We can add or remove fields and change data types. Some examples are below:

var doubled = [1,2,3,4].map(function(d) { return d*2; });
// doubled = [2,4,6,8]

In this example I create a new array, doubled, such that each element in it is two times the corresponding element in the source array. More complicated mappings are possible. In the following example, all of the original fields are discarded and only calculated fields are kept. The source data is daily temperature readings from Death Valley, California. I want to plot the temperature readings over time in Fahrenheit. Unfortunately, the source data temperatures are in tenths of a degree Celsius and the dates are strings instead of Date objects, so transformation is necessary.

var tempData = [ 
  { DATE: "20140215", TMIN: 67, TMAX: 161 },
  { DATE: "20140216", TMIN: 39, TMAX: 172 }
];

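function transform(d) {
  // DATE is a "YYYYMMDD" string; parse each part as a base-10 integer.
  var year = parseInt(d.DATE.substring(0,4), 10);
  var month = parseInt(d.DATE.substring(4,6), 10);
  var day = parseInt(d.DATE.substring(6,8), 10);

  return {
           date: new Date(year, month-1, day),  // Date months are 0-based
           maxTempFaren: d.TMAX*(9/50)+32,      // tenths of a degree C to F
           minTempFaren: d.TMIN*(9/50)+32
         };
}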

var transformed = tempData.map(transform);

// transformed = [
//  { date: "Sat Feb 15 2014 00:00:00 GMT-0800 (PST)", maxTempFaren: 60.98, minTempFaren: 44.06 },
//  { date: "Sun Feb 16 2014 00:00:00 GMT-0800 (PST)", maxTempFaren: 62.96, minTempFaren: 39.02 }
// ]
// (temperatures shown rounded to two decimals)

The temperatures TMIN and TMAX are in units of tenths of a degree Celsius and represent the minimum and maximum temperature of the day. I want to plot them in degrees Fahrenheit, so I have to convert them from tenths of a degree Celsius. The date is expressed as a string. I chose to break that string into its component values: year, month and day. With these it is easy to construct a JavaScript Date object. Be aware that for JavaScript's Date object months go from 0 to 11, whereas the day of the month starts from 1 and the year is used as written, hence the month-1.
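
The conversion can be checked by hand: a reading of 161 tenths of a degree is 16.1 degrees Celsius, and 16.1 * 9/5 + 32 = 60.98 degrees Fahrenheit. Dividing by 10 and then multiplying by 9/5 is the same as multiplying by 9/50, which is the factor used in transform(). The sketch below isolates that step in a helper. The name tenthsCelsiusToFahrenheit and the rounding to two decimals are my own additions for illustration and are not part of the lab code.

```javascript
// Hypothetical helper: converts tenths of a degree Celsius to Fahrenheit.
// Rounds to two decimals to hide floating-point noise.
function tenthsCelsiusToFahrenheit(tenths) {
  var fahrenheit = tenths * (9 / 50) + 32;
  return Math.round(fahrenheit * 100) / 100;
}

var maxF = tenthsCelsiusToFahrenheit(161); // 60.98
var minF = tenthsCelsiusToFahrenheit(67);  // 44.06

// Months are 0-based in the Date constructor, so 1 means February.
var date = new Date(2014, 1, 15);
// date.getFullYear() is 2014, date.getMonth() is 1, date.getDate() is 15
```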

From the date object I can filter on specific values or ranges such as:

// Filters for February 2014. getMonth() returns 1 for February because months are 0-based.
transformed.filter(function(d) { return d.date.getMonth() === 1
                                     && d.date.getFullYear() === 2014; });
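
A range works the same way, since Date objects can be compared with < and >=. The following is a minimal sketch using a small hypothetical array, sample, in the same shape as transformed; the names start, end, and inRange, and the sample values, are assumptions made for illustration.

```javascript
// Hypothetical data in the same shape as the transformed array.
var sample = [
  { date: new Date(2014, 1, 14), maxTempFaren: 59 },
  { date: new Date(2014, 1, 15), maxTempFaren: 60.98 },
  { date: new Date(2014, 1, 16), maxTempFaren: 62.96 },
  { date: new Date(2014, 2, 1), maxTempFaren: 70 }
];

// Keep dates from Feb 15, 2014 (inclusive) up to Mar 1, 2014 (exclusive).
var start = new Date(2014, 1, 15);
var end = new Date(2014, 2, 1);

var inRange = sample.filter(function(d) {
  return d.date >= start && d.date < end;
});
// inRange contains the entries for Feb 15 and Feb 16.
```

Using an inclusive start and an exclusive end avoids having to know how many days the month has.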

Data Citation

Menne, M.J., I. Durre, B. Korzeniewski, S. McNeal, K. Thomas, X. Yin, S. Anthony, R. Ray, R.S. Vose, B.E. Gleason, and T.G. Houston, 2012: Global Historical Climatology Network - Daily (GHCN-Daily), Version 3. NOAA National Climatic Data Center. http://doi.org/10.7289/V5D21VHZ 2015-01-01.

Quiz

Match each method with its description:
filter - returns a new array that contains only the elements that meet a given criterion
map    - returns a new array that is the result of applying a function to each element

Things to do

  1. Filter the data to July 2011.
  2. Filter the data to the station "GHCND:USW00053139". Use the "STATION" field in the original data. (i.e. before transforming the data)
  3. Convert the temperature to Kelvin. (Kelvin = Celsius + 273.15). Remember the data is in tenths of a degree Celsius, so you will need to divide by 10 to get Celsius.

Extra Credit

Change the code on the left. Once you've made a change, the page will render on the right.