Scaling Data

Tutorial

A common problem that I've skirted around so far is translating between the range of values being visualized and the size of the visualization area. The data rarely matches the size of the visual area exactly. For example, say we want to visualize data on a 100 by 100 SVG area, but our data ranges from 200 to 10,000. To make this work we have to translate between our data's range and our visualization's display area.
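To make the arithmetic concrete, here is a minimal sketch of that translation done by hand in plain JavaScript, without D3. The toPixels helper and its parameter names are my own invention for illustration; they are not part of D3.

```javascript
// toPixels is a hypothetical helper for illustration, not a D3 function.
// It maps a value from the data interval onto the pixel interval.
function toPixels(value, dataMin, dataMax, pixelMin, pixelMax) {
  // Offset from the start of the data interval, rescaled to the pixel interval.
  return pixelMin + (value - dataMin) * (pixelMax - pixelMin) / (dataMax - dataMin);
}

var low = toPixels(200, 200, 10000, 0, 100);    // 0   (smallest data value)
var mid = toPixels(5100, 200, 10000, 0, 100);   // 50  (halfway through the data)
var high = toPixels(10000, 200, 10000, 0, 100); // 100 (largest data value)
```

This is the bookkeeping that a scale takes over for you.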

D3 provides scales that make this translation easy to perform and, more importantly, easy to modify as you change the visualization to meet your needs. The two things you configure on a scale are the domain (the range of values in the data) and the range (typically the range of pixel values in the visualization). Below I'll walk you through some examples of the various types of scales.

Linear

The simplest type of scale translation is a linear translation, which means that the data is uniformly scaled to fit the visualization. d3.scale.linear() creates a linear scale and is probably the most frequently used D3 scale. You call the domain() and range() functions to configure it. A scale object is a function that when called with a domain value returns a range value. Below is a quick example of how to create a scale.

var scale = 
    d3.scale.linear()
      .domain([0,10])
      .range([0,100]);

The domain corresponds to the range of values in your input data (zero to ten), which will be transformed into pixel units for the visualization (zero to one hundred). With this scale, scale(5) returns 50 and scale(2) returns 20, because 2 out of 10 scales to 20 out of 100. To use the scale, pass a data value (from the domain) as the input parameter, and the scale function returns the corresponding value from the range.
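If it helps to see what is going on underneath, a linear scale can be imitated with a small closure. This is only a sketch: linearScale is a hypothetical stand-in I made up for illustration, not D3's implementation, and it leaves out everything else a real D3 scale provides.

```javascript
// linearScale is a hypothetical stand-in for d3.scale.linear(),
// written only to show the idea. It is not D3's implementation.
function linearScale(domain, range) {
  // The returned function maps a domain value to a range value.
  return function (x) {
    return range[0] + (x - domain[0]) * (range[1] - range[0]) / (domain[1] - domain[0]);
  };
}

var scale = linearScale([0, 10], [0, 100]);
var five = scale(5); // 50
var two = scale(2);  // 20
```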

Scales in use

Before covering some other types of scales that D3 provides, I'm going to walk through an example of how scales are useful in practice.

In this example we'll create a simple bar chart using scales. Our data will be the number of cars on the road at various times during the day for a small city. We'll use vertical bars so that the bar height depicts the data value and the horizontal placement just indicates the order of the data values. We'll use an SVG area of 100x100.

var data = [200,5000,10000,3000];

// Layout touch-ups are easier later if you use variables for width/height
var svgWidth = 100;
var svgHeight = 100;
var barWidth = 15;

var vertScale = d3.scale.linear()
                  .domain([0,d3.max(data)])      // Bar charts typically start from 0
                  .range([svgHeight,0]);         // Bottom of svg area (100) to top (0)

var horizScale = d3.scale.linear()
                   .domain([0,data.length-1])    // 0,1,2,3 data points (i.e. 0 to 4-1)
                   .range([0,svgWidth-barWidth]);// subtract barWidth so the final bar is still visible

d3.select("#example1 svg")
  .selectAll("rect")
  .data(data)
  .enter()
  .append("rect")
  .attr("width", barWidth)
  .attr("height", function(d) { return svgHeight-vertScale(d); }) // height isn't the same as vertical placement so we have to adjust
  .attr("x", function(d,i) { return horizScale(i); }) // the data item position (i) is what we're using for placement
  .attr("y", vertScale);  // vertScale is already of the form function(d)

For the vertical scale's domain we started from 0 and went to the maximum value of our data set, as bar charts typically start from 0. The range is simply the whole vertical SVG range, but starting from 100 and going to 0 to account for how SVG addresses pixels (y grows downward from the top). For the horizontal scale we used the indexes of the items in the data set as the input domain. The output range is the whole SVG width minus the width of one bar, because a bar that starts at x=100 won't be visible. For the "height" attribute we had to do the typical math of converting between the y placement and the height. Note how little math was involved to get a working layout.
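To check the numbers, the sketch below computes the x, y, and height of each bar without touching the DOM. It assumes a hand-rolled linear helper (hypothetical, standing in for d3.scale.linear()) so it can run on its own.

```javascript
// Hypothetical stand-in for d3.scale.linear(); not D3's implementation.
function linear(domain, range) {
  return function (x) {
    return range[0] + (x - domain[0]) * (range[1] - range[0]) / (domain[1] - domain[0]);
  };
}

var data = [200, 5000, 10000, 3000];
var svgWidth = 100;
var svgHeight = 100;
var barWidth = 15;

var vertScale = linear([0, Math.max.apply(null, data)], [svgHeight, 0]);
var horizScale = linear([0, data.length - 1], [0, svgWidth - barWidth]);

var bars = data.map(function (d, i) {
  return {
    x: horizScale(i),                 // left edge, spread by index
    y: vertScale(d),                  // top edge of the bar
    height: svgHeight - vertScale(d)  // from the top edge down to the bottom of the SVG
  };
});

// bars[0] has y 98 and height 2 (the value 200 barely registers).
// bars[2] has y 0 and height 100 (the maximum fills the whole height).
```

For every bar, y plus height equals svgHeight, which is what keeps the bars anchored to the bottom of the chart.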

Next, we'll add y-axis labels and grid lines. To do this we'll make use of the ticks() function, which returns an ordered array of uniformly spaced values from the domain. This array makes a good data set to use for axis labels. The ticks() function takes an optional count parameter, which is roughly how many ticks are desired. It doesn't guarantee how many will be returned but uses the count as a hint.
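To see why the count can only be a hint, consider one plausible way of picking tick values: round the step to a "nice" 1, 2, or 5 times a power of ten, then list the multiples of that step inside the domain. The approxTicks function below is a simplified sketch under that assumption; it is a hypothetical helper, not D3's actual code, and D3 may return different values in some cases.

```javascript
// approxTicks is a hypothetical helper for illustration, not D3's code.
function approxTicks(min, max, count) {
  var span = max - min;
  var ideal = span / count; // the step we would use if any number were acceptable

  // Largest power of ten that does not exceed the ideal step.
  var step = Math.pow(10, Math.floor(Math.log10(ideal)));

  // If that power of ten is much smaller than the ideal step,
  // grow it by 10, 5, or 2 so the step stays a round number.
  var err = step / ideal;
  if (err <= 0.15) step *= 10;
  else if (err <= 0.35) step *= 5;
  else if (err <= 0.75) step *= 2;

  // Collect every multiple of the step that falls inside [min, max].
  var first = Math.ceil(min / step);
  var last = Math.floor(max / step);
  var ticks = [];
  for (var i = first; i <= last; i++) {
    ticks.push(i * step);
  }
  return ticks;
}

var five = approxTicks(0, 10000, 5); // [0, 2000, 4000, 6000, 8000, 10000]
var two = approxTicks(0, 10000, 2);  // [0, 5000, 10000]
```

Asking for 5 ticks yields 6 values and asking for 2 yields 3, because round step sizes take priority over hitting the requested count exactly.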

// only including new code in this listing

d3.select("#example2 svg")
  .selectAll("line")
  .data(vertScale.ticks(5))
  .enter()
  .append("line")
  .attr("x1", horizScale(0))
  .attr("x2", horizScale(data.length))
  .attr("y1", vertScale)
  .attr("y2", vertScale);

d3.select("#example2 svg")
  .selectAll("text")
  .data(vertScale.ticks(5))
  .enter()
  .append("text")
  .attr("y", vertScale)
  .text(function(d) { return d; });

It was pretty straightforward to add axis labels and grid lines. However, the axis labels overlap the chart and the grid lines overlap the labels, and it would be nice to avoid that. We can simply change the starting point of the horizontal scale's range to allow space for the labels. Because we also used horizScale for the x1 and x2 attributes of our grid lines, these will be resized as well.

// problem solved by a small adjustment to the scale
var horizScale = d3.scale.linear()
                   .domain([0,data.length-1])
                   .range([35,svgWidth-barWidth]);
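A quick way to convince yourself of the effect is to compare the bar positions before and after the change. Here linear is a hypothetical stand-in for d3.scale.linear() so the snippet runs without D3.

```javascript
// Hypothetical stand-in for d3.scale.linear(); not D3's implementation.
function linear(domain, range) {
  return function (x) {
    return range[0] + (x - domain[0]) * (range[1] - range[0]) / (domain[1] - domain[0]);
  };
}

var data = [200, 5000, 10000, 3000];
var svgWidth = 100;
var barWidth = 15;

var before = linear([0, data.length - 1], [0, svgWidth - barWidth]);
var after = linear([0, data.length - 1], [35, svgWidth - barWidth]);

// x position of each bar, by index
var xBefore = data.map(function (d, i) { return before(i); }); // starts at 0, ends at 85
var xAfter = data.map(function (d, i) { return after(i); });   // starts at 35, ends at 85
```

The last bar stays where it was, while the first bar moves 35 pixels to the right and the bars in between are squeezed proportionally. Nothing else in the chart code had to change.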

Quiz


Match the d3.scale.linear function to its description

domain() - the range of input values
range() - the range of output values
ticks() - returns an array of evenly-spaced sample values from the input domain


The optional count parameter to ticks() specifies the exact number of ticks to be returned. True/False.

Things to do

The chart below has been coded by hand without scales. Tuning everything by hand is getting unwieldy. Switch to using scales and feel the difference.

Create a "price" scale. Have its domain go from 0 to the maximum car price. Have its range go from 0 to 400 - textPadding. We've left 140 pixels of textPadding for the descriptive text. The output of this scale is the width of the bars. This scale will be used for bar widths, vertical tick marks, and the placement of the dollar amount labels. Be sure to update all 3. Also, when using it for placement you'll need to add the textPadding back, because the scale returns the bar width, not the position of the end of the bar.
Create a vertical spacing scale. In my sample output I had my range go from 130 to 0 to leave room for price labels at the bottom.

Extra Credit

Experiment with the hint passed to ticks(). Try to find what you consider the ideal number of ticks.

Change the code on the left. Once you've made a change, the page will render on the right.

<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <style>
      line { 
        stroke-width: 0.5px;
        stroke: grey;
      }
    </style>
    <script src="d3.min.js"></script>
  </head>
  <body>
    <h4>Your code will display here</h4>
    <svg height="200" width="400">
    </svg>
    <script>
      var priceData = [
        { name: "1999 Lexus ES-300", price: 4200 },
        { name: "2006 BMW 330 XI", price: 9900 },
        { name: "2006 Lexus ES-330", price: 7600 },
        { name: "2003 Cadillac CTS", price: 5500 }
      ];

      var textPadding = 140;

      // Add price scale here

      var vertScale = d3.scale.linear()
        .domain([0, priceData.length])
        .range([130, 0]);

      d3.select("svg")
        .selectAll("rect")
        .data(priceData)
        .enter()
        .append("rect")
        .attr("width", function (d) { return d.price; }) // use price scale instead
        .attr("height", 20)
        .attr("x", textPadding)
        .attr("y", function (d, i) { return vertScale(i); });

      d3.select("svg")
        .selectAll("text")
        .data(priceData)
        .enter()
        .append("text")
        .attr("x", 0)
        .attr("y", function (d, i) { return vertScale(i) + 15; })
        .text(function (d) { return d.name; });

      // TODO, use ticks() from a scale instead of predefined data points
      var customTicks = [0, 5000, 10000];
      d3.select("svg")
        .selectAll("line")
        .data(customTicks)
        .enter()
        .append("line")
        .attr("x1", function (d) { return d + textPadding; }) // use price scale instead
        .attr("x2", function (d) { return d + textPadding; }) // use price scale instead
        .attr("y1", 160)
        .attr("y2", 20);

      d3.select("svg")
        .selectAll(".tickLabels")
        .data(customTicks)
        .enter()
        .append("text")
        .classed("tickLabels", true)
        .attr("x", function (d) { return d + textPadding; }) // use price scale instead
        .attr("y", 200)
        .attr("transform",
          function (d) {
            var vertPos = vertScale(4);
            var barEnd = d + textPadding; // use price scale instead
            return "translate(-20,0) rotate(315," + barEnd + "," + vertPos + ")";
          })
        .text(function (d) { return "$" + d; });
    </script>
  </body>
</html>