Other Scales

Tutorial

Linear scales are good most of the time. However, sometimes the data is easier to see and explain with a non-linear scale. In this lab I'll explain several other scales that D3 provides.

Power Scale

d3.scale.pow is used for relationships that involve an exponent. For example, the surface area of a sphere is 4πr^2. If we plot this relationship for spheres of various radii, the values grow quite quickly, and it becomes difficult to see the vertical difference between values at the left end of the chart.

// d3.range(1,15) creates the array [1,2,3,4,5,6,7,8,9,10,11,12,13,14]
var data = d3.range(1,15).map(function(d) { return 4*d*d*Math.PI; });

var yscale = 
  d3.scale.linear()
    .domain(d3.extent(data))
    .range([95,5]);

var xscale = 
  d3.scale.linear()
    .domain([0, data.length])
    .range([5,95]);

d3.select("#example1")
  .select("svg")
  .selectAll("circle")
  .data(data)
  .enter()
  .append("circle")
  .attr("r", 2)
  .attr("cy", yscale)
  .attr("cx", function(d,i) { return xscale(i); });

Instead, we can use a power scale to adjust the y values so that they have a linear relationship. This is done with the exponent() function. In this case we'll set the exponent to .5, meaning that instead of plotting the value itself, the scale first takes its square root and plots that. Below we can see this gives us a straight line. The formula for the line is y = 2r*sqrt(π). Of course, it is important to label the axis properly to make it clear that the scale is NOT linear! Otherwise observers may miss the point, as they would with the example below.

// only displaying the relevant code change
// exponent .5 = square root
var yscale =
  d3.scale.pow()
    .exponent(.5)
    .domain(d3.extent(data))
    .range([95,5]);
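
To see why the square root straightens the curve, here is a minimal sketch in plain JavaScript that mimics the mapping a power scale performs. The helper powScale is a hypothetical stand-in written for illustration, not D3's own code, and it assumes a non-negative domain.

```javascript
// Hypothetical stand-in for a power scale: raise the input to the exponent,
// then interpolate linearly between the two ends of the range.
function powScale(exponent, domain, range) {
  var d0 = Math.pow(domain[0], exponent);
  var d1 = Math.pow(domain[1], exponent);
  return function (x) {
    var t = (Math.pow(x, exponent) - d0) / (d1 - d0); // 0..1 within the domain
    return range[0] + t * (range[1] - range[0]);
  };
}

// Surface areas of spheres with radius 1..14, the same data as the tutorial.
var areas = [];
for (var r = 1; r <= 14; r++) {
  areas.push(4 * Math.PI * r * r);
}

var areaExtent = [areas[0], areas[areas.length - 1]];
var linearY = powScale(1, areaExtent, [95, 5]);  // exponent 1 acts like a linear scale
var sqrtY = powScale(0.5, areaExtent, [95, 5]);  // exponent .5 plots the square root

// Vertical gap between each pair of neighboring points under a given scale.
function gaps(scale) {
  var out = [];
  for (var i = 1; i < areas.length; i++) {
    out.push(scale(areas[i - 1]) - scale(areas[i]));
  }
  return out;
}

var linearGaps = gaps(linearY); // small gaps on the left, large gaps on the right
var sqrtGaps = gaps(sqrtY);     // every gap is the same size: a straight line

console.log(linearGaps.map(function (g) { return g.toFixed(2); }).join(" "));
console.log(sqrtGaps.map(function (g) { return g.toFixed(2); }).join(" "));
```

With the linear mapping the gaps keep growing as the radius increases, while with the exponent of .5 every gap is identical, which is what a straight line looks like.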

Log Scale

Log scales are similar to power scales. They suit exponential relationships of the form x^y, where x is a constant value and y varies. For example, the number of values that a binary number of n bits can represent is 2^n.

// only displaying relevant code changes
var data = d3.range(1,15).map(function(d) { return Math.pow(2,d); });

// also added y-axis labels and ticks
d3.select("#example3")
  .select("svg")
  .selectAll("text")
  .data(yscale.ticks(5))
  .enter()
  .append("text")
  .text(function(d) { return d; })
  .attr("x", 0)
  .attr("y", function(d) { return yscale(d) + 5; });

// remember that ticks(5) is just a suggestion; we only end up with 3 in this case
d3.select("#example3")
  .select("svg")
  .selectAll("line")
  .data(yscale.ticks(5))
  .enter()
  .append("line")
  .attr("x1", 45)
  .attr("x2", 50)
  .attr("y1", yscale)
  .attr("y2", yscale);

Here we see the problem again, perhaps even more clearly. With a linear scale it is difficult to see the difference between all but the last few data points. Using d3.scale.log we can once again turn the curve into a straight line. Since we plotted 2^x, we also set the base to 2 with the base() function, so that the ticks fall on powers of 2.

// only displaying relevant code changes
var yscale =
  d3.scale.log()
    .base(2)
    .domain(d3.extent(data))
    .range([95,5]);

// In this case I didn't get what we wanted from ticks()
// Instead of changing the parameter to data(), I moved
// the clutter outside of the diagram's visible area.
d3.select("#example4")
  .select("svg")
  .selectAll("text")
  .data(yscale.ticks(1))
  .enter()
  .append("text")
  .text(function(d) { return d; })
  .attr("x", function(d,i) { return i%4==0?0:200; })
  .attr("y", function(d) { return yscale(d) + 5; });

d3.select("#example4")
  .select("svg")
  .selectAll("line")
  .data(yscale.ticks(1))
  .enter()
  .append("line")
  .attr("x1", 40)
  .attr("x2", 45)
  .attr("y1", function(d,i) { return i%4==0?yscale(d):300; })
  .attr("y2", function(d,i) { return i%4==0?yscale(d):300; });

This time a labeled axis improves the straight-line presentation: it makes clear to the viewer that the scale is not linear, even though the points fall on a line.
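
The same idea can be sketched in plain JavaScript. The helper logScale below is a hypothetical stand-in written for illustration, not D3's own code, and it assumes a strictly positive domain. It takes the logarithm of the input and then interpolates linearly, and it also shows that in this sketch the choice of base does not move the points.

```javascript
// Hypothetical stand-in for a log scale: take the logarithm of the input,
// then interpolate linearly between the two ends of the range.
function logScale(base, domain, range) {
  function log(x) { return Math.log(x) / Math.log(base); }
  var d0 = log(domain[0]);
  var d1 = log(domain[1]);
  return function (x) {
    var t = (log(x) - d0) / (d1 - d0); // 0..1 within the domain
    return range[0] + t * (range[1] - range[0]);
  };
}

// 2^1 .. 2^14, the same data as the tutorial.
var powersOfTwo = [];
for (var n = 1; n <= 14; n++) {
  powersOfTwo.push(Math.pow(2, n));
}

var logExtent = [powersOfTwo[0], powersOfTwo[powersOfTwo.length - 1]];
var yBase2 = logScale(2, logExtent, [95, 5]);
var yBase10 = logScale(10, logExtent, [95, 5]);

// Each doubling of the value moves the point up by the same distance.
var logPositions = powersOfTwo.map(function (v) { return yBase2(v); });
console.log(logPositions.map(function (p) { return p.toFixed(2); }).join(" "));

// The base cancels out of the mapping, so base 10 gives the same positions.
console.log(Math.abs(yBase2(1024) - yBase10(1024)) < 1e-9); // true
```

In this sketch the base only changes which values are "round", which is why setting it to 2 is mostly about getting sensible ticks and labels.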

Quantizing

Sometimes you want to map numbers to a discrete set of outputs. For example, your porridge could be too cold, just right, or too hot, with temperatures in the range 90 to 120: too cold corresponds to 90 to 99, just right to 100 to 109, and too hot to 110 to 120. d3.scale.quantize lets us take numbers and transform them into categories using a linear transformation that splits the domain into equal-width segments. The domain must be an array of two values, and the range is a list of discrete values. Below we use a domain of 90 to 120 and the discrete values "lightblue", "green", and "darkred" as the range. These values correspond to "too cold", "just right", and "too hot".

// creates an array starting at 90 and incrementing by 2 up to and including 120
// (the stop value passed to d3.range is exclusive, hence 121)
var data = d3.range(90,121,2);
var justRightScale = d3.scale.quantize()
                             .domain(d3.extent(data))
                             .range(["lightblue", "green", "darkred"]);
var xScale = d3.scale.linear()
                     .domain([0, data.length])
                     .range([10,495]);

d3.select("#example5 svg")
  .selectAll("circle")
  .data(data)
  .enter()
  .append("circle")
  .attr("r", 5)
  .attr("cx", function(d,i) { return xScale(i); })
  .attr("cy", 5)
  .style("fill", justRightScale);

d3.select("#example5 svg")
  .selectAll("text")
  .data(data)
  .enter()
  .append("text")
  .style("text-anchor", "middle")
  .attr("y", 30)
  .attr("x", function(d,i) { return xScale(i); })
  .text(function (d) { return d; });
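
To make the bucketing concrete, here is a minimal sketch in plain JavaScript of the arithmetic behind a quantize scale. The helper quantizeScale is a hypothetical stand-in written for illustration, not D3's own code, and it uses the descriptions rather than the colors as the range.

```javascript
// Hypothetical stand-in for a quantize scale: split the domain into
// range.length equal-width segments and return the value for the segment
// that x falls into. Inputs outside the domain are clamped to the end buckets.
function quantizeScale(domain, range) {
  var x0 = domain[0];
  var x1 = domain[1];
  var n = range.length;
  return function (x) {
    var i = Math.floor((x - x0) * n / (x1 - x0));
    return range[Math.max(0, Math.min(n - 1, i))];
  };
}

var porridge = quantizeScale([90, 120], ["too cold", "just right", "too hot"]);

console.log(porridge(99));  // "too cold"
console.log(porridge(100)); // "just right"
console.log(porridge(109)); // "just right"
console.log(porridge(110)); // "too hot"
console.log(porridge(120)); // "too hot" (the top of the domain is clamped into the last bucket)
```

Each bucket here is exactly 10 degrees wide, regardless of how many data points happen to fall into it.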

Quantiles

Quantiles split a distribution of values into buckets such that each bucket holds about the same number of values. You may have heard of percentiles, which are the case of 100 buckets. Each quantile is the range of values that falls into one bucket. For example, the 50th percentile for income in the United States could be $40,000 to $45,000. d3.scale.quantile helps you use quantiles as scales. The input domain is the set of values; to continue with the income example, it would be all incomes in the United States. The number of values in the range determines how many buckets there are. Since the range values can be arbitrary, you can map each quantile to, say, a color or a description such as "1st percentile".

var data = [1,1,  5,7,  9,11,  50,60];
// Quartile is the name for a quantile where there are 4 output bins.
var quartile = d3.scale.quantile()
                       .domain(data)
                       .range(["Bottom 25%", "25%-50%", "50%-75%", "Top 25%"]);

d3.select("#quantileEx")
  .append("p")
  .text("Input Data: " + data);
d3.select("#quantileEx")
  .selectAll("p")
  .data([1,6,9,55,12,49])
  .enter()
  .append("p")
  .text(function(d) { return  "Which quartile is " + d + " in? " + quartile(d); });

At this point you may be asking yourself what the difference is between this and the quantize scale. The main difference is that the quantile mapping does not have to be linear. Notice that in the example the smallest bucket contains only the value 1, whereas the top bucket spans 50 to 60. (If you pass a value that is not in the input, such as 49, D3 still assigns it to a bucket, using cut points that it computes from the input data.)
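
Here is a minimal sketch in plain JavaScript of how such buckets can be derived from the data. The helpers quantileSorted and quantileScale are hypothetical stand-ins written for illustration, not D3's own code. The sketch assumes cut points are found by linear interpolation between neighboring sorted values, and that a value equal to a cut point goes into the higher bucket.

```javascript
// Quantile of an already sorted array, interpolating linearly between
// the two neighboring values (p is between 0 and 1).
function quantileSorted(sorted, p) {
  var h = (sorted.length - 1) * p;
  var lo = Math.floor(h);
  var hi = Math.min(lo + 1, sorted.length - 1);
  return sorted[lo] + (h - lo) * (sorted[hi] - sorted[lo]);
}

// Hypothetical stand-in for a quantile scale: compute one cut point between
// each pair of adjacent buckets, then count how many cut points x has passed.
function quantileScale(data, range) {
  var sorted = data.slice().sort(function (a, b) { return a - b; });
  var thresholds = [];
  for (var k = 1; k < range.length; k++) {
    thresholds.push(quantileSorted(sorted, k / range.length));
  }
  function scale(x) {
    var i = 0;
    while (i < thresholds.length && x >= thresholds[i]) {
      i++;
    }
    return range[i];
  }
  scale.thresholds = thresholds;
  return scale;
}

var quartileSketch = quantileScale(
  [1, 1, 5, 7, 9, 11, 50, 60],
  ["Bottom 25%", "25%-50%", "50%-75%", "Top 25%"]
);

console.log(quartileSketch.thresholds); // [ 4, 8, 20.75 ]
console.log(quartileSketch(6));         // "25%-50%"
console.log(quartileSketch(12));        // "50%-75%"
console.log(quartileSketch(49));        // "Top 25%"
```

The cut points are unevenly spaced because they follow the data, which is the non-linear behavior that distinguishes this scale from the quantize scale.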

Scales are simply transformations of your data. The various types of scales provided by D3 offer the convenience of not having to create these common transformations yourself.

Quiz

Match the scale to its description

d3.scale.pow - scales data using formula x^n
d3.scale.log - scales data using formula log_n(x)
d3.scale.quantize - transforms the input into a finite number of evenly-spaced output bins
d3.scale.quantile - transforms the input into evenly sized output bins based on the data distribution


d3.scale.pow and d3.scale.log are useful because they can make it easier to compare small values when there are also large values. True/False.

Things to do

You have some website log data that tracks hits to the homepage and to your 3 product pages. The traffic to each page is very different. With a linear scale it is hard to see what the traffic pattern looks like for product 2 and product 3.

Change the vertical scale to a log scale with a base of 10.
Now there are too many y-axis labels. Filter the data returned by ticks() so that only every tenth tick is included.

Extra Credit

Transform the data so that it displays percent of total hit traffic for each page in a given hour. Pick an appropriate scale for this.

Change the code on the left. Once you've made a change, the page will render on the right.

<!doctype html>
<html>
  <head>
    <meta charset="utf-8">
    <style>
    .hourLabel {
      text-anchor: middle;
    }
    
    .chartTitle {
      font-family: sans-serif;
      font-weight: bold;
    }
    
    .axisLabel { 
      font-weight: bold;
    }
    </style>
    <script src="d3.min.js"></script>
  </head>
  <body>
    <h1>Your code will display here</h1>
    <svg width="700" height="400">
    </svg>
    <script>
      d3.json("webtraffic.json", function (error, data) {
        var svg = d3.select("svg");
        var svgWidth = 700;
        var svgHeight = 400;
        var leftMargin = 50;
        var rightMargin = 100;
        var bottomMargin = 50;
        var topMargin = 20;

        var timeScale = d3.scale.linear()
          .domain(d3.extent(data, function (d) { return d.hour; }))
          .range([leftMargin, svgWidth - rightMargin]);
        // TODO: change this to make data visible
        var hitsScale = d3.scale.linear()
          .domain([1, d3.max(data, function (d) { return d.hp; })])
          .range([svgHeight - bottomMargin, 5 + topMargin]);

        var legend = [];

        // make it easy to draw a given line
        function drawPageLine(pageName, description, color) {
          svg.selectAll("." + pageName)
            .data(data)
            .enter()
            .append("circle")
            .attr("cx", function (d) { return timeScale(d.hour); })
            .attr("cy", function (d) { return hitsScale(d[pageName]); })
            .attr("r", 3)
            .style("fill", color)
            .classed(pageName, true);

          var lastPointY = hitsScale(data[data.length - 1][pageName]);
          var lastPointX = timeScale(data[data.length - 1].hour);
          legend.push({ description: description, color: color, x: lastPointX, y: lastPointY });
        }

        drawPageLine("hp", "Home Page", "#1f77b4");
        drawPageLine("p1", "Product 1", "#ff7f0e");
        drawPageLine("p2", "Product 2", "#2ca02c");
        drawPageLine("p3", "Product 3", "#d62728");

        svg.selectAll(".hourLabel")
          .data(data)
          .enter()
          .append("text")
          .classed("hourLabel", true)
          .text(function (d) { return d.hour; })
          .attr("x", function (d) { return timeScale(d.hour); })
          .attr("y", svgHeight - 20);

        svg.selectAll(".hitTick")
          .data(hitsScale.ticks())
          .enter()
          .append("text")
          .classed("hitTick", true)
          .text(function (d) { return d; })
          .attr("x", 0)
          .attr("y", function (d) { return hitsScale(d); });

        svg.selectAll(".legendText")
          .data(legend)
          .enter()
          .append("text")
          .attr("x", function (d) { return d.x + 5; })
          .attr("y", function (d) { return d.y; })
          .style("fill", function (d) { return d.color; })
          .text(function (d) { return d.description; });

        svg.append("text")
          .text("Page hits of various pages")
          .classed("chartTitle", true)
          .attr("x", 200)
          .attr("y", 15);

        svg.append("text")
          .text("Hour of Day")
          .classed("axisLabel", true)
          .attr("x", svgWidth / 2)
          .attr("y", svgHeight);
      });
    </script>
  </body>
</html>