Chart dos and don’ts
Above all else show the data
Do use the full axis and avoid distortion
For bar charts, the numerical axis (often the y axis) must start at zero.
Another bad example, shown on the BBC show “Breakfast” in the UK. Did men's height really double from 1871 to 1971?
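To see how much a truncated axis exaggerates a difference, you can compare the ratio of the data values with the ratio of the drawn bar heights. The sketch below is a minimal illustration in Python; the function name `bar_height_ratio` and the two heights are hypothetical and are not taken from the BBC chart.

```python
def bar_height_ratio(a, b, axis_start=0.0):
    """Ratio of the drawn heights of two bars when the value axis starts at axis_start."""
    if axis_start >= min(a, b):
        raise ValueError("axis_start must be below both values")
    return (b - axis_start) / (a - axis_start)


# Hypothetical values (cm), chosen only to illustrate the effect.
earlier, later = 167.0, 177.0

true_ratio = bar_height_ratio(earlier, later)                     # axis starts at zero
drawn_ratio = bar_height_ratio(earlier, later, axis_start=157.0)  # truncated axis

print(f"data ratio: {true_ratio:.2f}, drawn ratio: {drawn_ratio:.2f}")
```

With these illustrative numbers the second bar is drawn twice as tall as the first, although the value is only about 6% larger.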
If you need to show data details that are not visible when using the full axis, then the original chart with the full axis must be accompanied by a “zoomed-in chart”, a so-called “panel chart”. See the example below.
If you have only one category to show, then you can show a portion of the chart by using a line chart in a specific range.
Another suggestion is to “break” the axis, so that part of the axis shows the small values, then another part of the axis shows the large values, with a section of the axis scale removed. Sounds good, but you’ve lost any correlation between the large and small values.
Making these charts interactive solves many of the issues stated above. For example, the user would be able to mouse over a column and get the exact value, filter out some categories, or sort the columns by value for easier comparison.
Use consistent intervals on axis (be transparent on data gaps)
The x-axis in the "wrong example" below has a time-series with inconsistent intervals (missing years 2003 and 2004) giving a distorted view of data over time.
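One way to stay transparent about gaps is to give every year a slot on the axis and mark the missing ones explicitly instead of dropping them. A minimal sketch, assuming yearly data held in a dictionary; the function name `fill_year_gaps` and the values are hypothetical.

```python
def fill_year_gaps(values_by_year):
    """Return (year, value) pairs for every year in the range; None marks a missing year."""
    if not values_by_year:
        return []
    years = sorted(values_by_year)
    return [(year, values_by_year.get(year)) for year in range(years[0], years[-1] + 1)]


# Hypothetical series in which 2003 and 2004 are missing.
series = {2000: 12, 2001: 14, 2002: 13, 2005: 19, 2006: 21}

filled = fill_year_gaps(series)
missing = [year for year, value in filled if value is None]
print(missing)
```

The `None` entries can then be drawn as empty positions, so the distance between 2002 and 2005 on the axis stays proportional to the time that passed.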
Remove any visual clutter (increase data-ink ratio, Tufte’s principle)
As shown in the example above, it is important to remove any visual clutter like the dark background and the dark grid (non-data-ink) and instead enhance the visibility of the data information part (data-ink), in this case the bars. The grid can be removed or made in a much more subtle style, since it is a supporting tool rather than the data itself.
Use clear language and avoid acronyms
Tell the "why" and "how"
Most people simply identify what is being measured in the title line or other descriptive information, leaving the reader with no help or clue on how to read or why the chart was made.
Original title: Cadmium emissions
Improved title with note: Change in cadmium emissions. Note: A reduction of emission is an indication of improved air quality in major European cities.
Charts are mostly communication tools. We have already done some reasoning about the "why and how" when choosing the chart type (bar, line, scatter plot, etc.). Specific chart types are best at showing specific aspects of the data.
You can skip this rule if you are building a raw "statistical exploratory charting tool" where users can slice the data and create any chart they want.
For end-products ready to be consumed by the target audience, you should always explain how to read the chart and the reasoning behind it. Try to be objective and leave out any subjective interpretations.
Highlight what’s important, tell one story
Highlight just one or two important lines in the chart, and keep the others as context in the background.
Another bad example, with no highlighted story
The chart above is remade below in a much better version, which highlights the rise and fall of Microsoft. Do you see what has made the difference?
Sort your data for easier comparisons
The bar chart below is a good example, where the x-axis is sorted by the y-values, not by the alphabetical order of the country names.
Otherwise it would be very difficult, if not impossible, for users to compare the many bars properly. In any case, it is still easy to find your own country in the list with a quick scan.
The pie chart below (even though pie charts should be avoided) also works better when presented with sorted data values. It starts at 12 o’clock with the largest slice. It is much easier to understand the relations between the parts, what is bigger and what is smaller, even when the values are not readable or the areas are very similar.
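Sorting is usually a one-line step before the data reaches the chart. A minimal sketch with made-up country values (the names and numbers are illustrative only):

```python
# Hypothetical values per country.
values = {"Austria": 42, "Belgium": 17, "Denmark": 63, "Finland": 29}

# Sort by value, largest first, instead of relying on alphabetical order.
ordered = sorted(values.items(), key=lambda item: item[1], reverse=True)

labels = [country for country, _ in ordered]
heights = [value for _, value in ordered]
print(labels)
```

The same ordered lists can feed a bar chart or, if you must use one, a pie chart that starts with the largest slice.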
Use direct labeling wherever possible, avoiding indirect look-up
Rotate bar chart when category names are too long
Do not use legend when you have only one data category
Chart with a legend that is not needed (before)
The legend displays only one category, which is already in the title. There is no need to add it to the axis either.
Chart after we removed the unnecessary legend information
Do use a proper aspect ratio to minimize dramatic slope effects
Robert Kosara has a great summary of the "banking to 45 degrees" practice first proposed by Bill Cleveland.
Here are the examples given by Kosara:
The same data is presented three ways. The slope is a reflection of the scales used on the two axes.
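One common formulation of banking to 45 degrees picks the aspect ratio at which the median absolute slope of the line segments is drawn at 45 degrees. The sketch below assumes that variant, which is not necessarily the procedure used for the examples referred to here; the function name `bank_to_45` and the sample points are hypothetical. The aspect ratio is expressed as width divided by height.

```python
from statistics import median


def bank_to_45(xs, ys):
    """Width/height aspect ratio at which the median absolute segment slope appears at 45 degrees."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("xs and ys must have equal length and at least two points")
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    if x_range == 0 or y_range == 0:
        raise ValueError("data must vary along both axes")
    # Absolute slope of each consecutive segment, in data units.
    slopes = [
        abs((y1 - y0) / (x1 - x0))
        for x0, x1, y0, y1 in zip(xs, xs[1:], ys, ys[1:])
        if x1 != x0
    ]
    # A segment with data slope s is drawn with slope s * (x_range / y_range) / aspect,
    # so setting the median drawn slope to 1 gives the aspect ratio below.
    return median(slopes) * x_range / y_range


# Hypothetical zig-zag series.
xs = [0, 1, 2, 3, 4]
ys = [0, 2, 1, 3, 2]
aspect = bank_to_45(xs, ys)
print(f"suggested width/height: {aspect:.2f}")
```

Treat the result as a starting point rather than a rule: it only balances the typical slope and says nothing about which pattern in the data you want the reader to notice.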
However, in some cases there can be legitimate reasons not to stick strictly to "banking to 45 degrees", for example when you want to analyze the data and reveal patterns that would not be visible at the 45 degree slope. See the example below.
Two plots of monthly atmospheric carbon dioxide measurements, taken from 1959 to 1990. The first plot, with an aspect ratio of 1.17, reveals an accelerating increase in CO2 levels. The second plot, with an aspect ratio of 7.87, facilitates closer inspection of seasonal fluctuations, revealing a gradual attack followed by a steeper decay. Source: Computer Science Division, University of California, Berkeley (http://vis.berkeley.edu/papers/banking/)
Do adjust for inflation in long time series
This is done by using the CPI (consumer price index). A euro in 2010 simply does not have the same purchasing power as a euro in 1961.
The purchasing power of €100 in 1961 is equivalent to that of €1948 in 2010.
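The adjustment itself is a simple rescaling by the ratio of two index values. A minimal sketch follows; the function name `adjust_for_inflation` is hypothetical, and the index values are assumptions chosen only so that their ratio reproduces the €100 to €1948 figure quoted above. They are not official CPI statistics.

```python
def adjust_for_inflation(amount, cpi_from, cpi_to):
    """Express an amount from one year in the prices of another year using CPI values."""
    if cpi_from <= 0:
        raise ValueError("cpi_from must be positive")
    return amount * cpi_to / cpi_from


# Assumed index values (1961 = 100), not official figures.
cpi = {1961: 100.0, 2010: 1948.0}

value_in_2010_prices = adjust_for_inflation(100.0, cpi[1961], cpi[2010])
print(value_in_2010_prices)
```

In a long time series you would apply the same function to every data point, converting each year's amount to the prices of one common reference year before plotting.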
Do ask others for opinions
Don't use 3D or blow apart effects
Bad chart examples
Below is a very creative 3D pie chart, and a very incomprehensible one as well.
Below is another (in)famous piece of “chartjunk”. Compare the 21.2% slice with the 19.5% slice in the pie. Which one looks bigger?
(presented by Steve Jobs at Macworld 2008, as covered by Engadget: http://www.engadget.com/2008/01/15/live-from-macworld-2008-steve-jobs-keynote/)
Avoid pie charts and donuts
The human mind thinks linearly: we can compare lengths of line segments but when it comes to angles most of us can't judge them well. Therefore try to avoid the use of pie charts when comparing a large number of items. Simple pie charts displaying 2-3 categories may work just fine, but when displaying more data it is strongly advised to choose another chart type.
The donut chart is just another pie chart with a hole punched in the middle. The donut chart is a useless chart made worse. Never ever use a donut chart either.
Avoid stacked charts, difficult for comparing data
To solve this issue, some chart tools allow the user to interactively filter out the stacked categories so that a single category is displayed.
The same issue applies to stacked area charts. It is difficult to compare the areas of the different regions when they are stacked (figure above) and much easier to read them as lines (figure below), with a separate line for the total.
Another example of how bad stacked bar charts can be in certain cases
Let’s see what the chart above looks like as a line chart
Now we can clearly see the decline of the household category “Married Couples with Children”. Moreover, we can see the trends in the other categories more clearly as well.
Further reading: http://junkcharts.typepad.com/junk_charts/2013/05/more-power-brings-more-responsibility.html
Don't confuse correlation with causation
For example, if you plot two different data series (A and B) on a common time axis, you may notice that both follow a similar pattern over time. That alone does not show that A causes B or vice versa. Many third factors that are not plotted on the chart can influence both A and B and make them change in the same way over time. Only a large, thorough statistical study covering all relevant factors can give some indication of causation, if any exists.
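The effect is easy to reproduce with made-up numbers. In the sketch below both A and B are computed from a hidden third factor C and never from each other, yet their correlation coefficient is close to 1. All series and the helper `pearson` are hypothetical illustrations.

```python
from math import sqrt


def pearson(a, b):
    """Pearson correlation coefficient of two equal-length, non-constant series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)


# Hidden factor C, which would not appear on the chart.
c = [1, 2, 3, 4, 5, 6, 7, 8]
# Small fixed disturbances so that B is not a perfect function of C.
noise = [0.3, -0.2, 0.1, -0.4, 0.2, -0.1, 0.4, -0.3]

a = [2 * v + 1 for v in c]                         # A depends only on C
b = [0.5 * v + 10 + e for v, e in zip(c, noise)]   # B depends only on C (plus noise)

r = pearson(a, b)
print(f"correlation between A and B: {r:.2f}")
```

A chart of A and B alone would suggest a tight link between them, while by construction neither series has any effect on the other.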
Don’t use maps for everything that has a spatial dimension
In fact, most data has a geographical dimension if we think about it, but displaying it on a map does not always convey new insight.
A very bad map example above, where a huge amount of data is displayed just because it has a location attached to it. However the user does not get any insight from this map. There is no correlation or pattern in this map which we could further investigate.
A good example: John Snow's famous map from 1854, one of the first map-based data visualizations. It is an excellent example of when it is appropriate to use a map to gain insight into data. In this case a strong correlation was found between the cholera outbreaks and the positions of water pumps. Further investigation confirmed that the water was contaminated and that this was the cause of the cholera outbreaks concentrated around those areas.
Another example of where a map feels “in the way”, making it more difficult to understand the data displayed on it.
The map above displays where different sectors of high-tech manufacturing and R&D are located in the Nordic countries, together with their sizes (number of people employed in each sector). The map does a pretty good job of displaying where these jobs are located, but it is useless for comparing the sizes of the circles. Moreover, it is hardly surprising that most jobs are located near large cities like Copenhagen, Malmo, Gothenburg, Oslo, Stockholm and Helsinki. We learn nothing new here.
It would be much more interesting to see the sector data plotted on a bar chart, optionally grouped by country or other regions. That would make it easier to see which region has the most jobs in which sector and to compare the sizes of the different sectors with each other, if that is the story we want to tell.
Don't use more than (about) six colors.
- Different colors should be used for different categories (e.g., male/female, types of fruit), not different values in a range (e.g., age, temperature).
- Do not use rainbows for range values
- If you want color to show a numerical value, use a range that goes from white to a highly saturated color in one of the universal color categories. No rainbows.
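A white-to-saturated range like the one described in the last point can be built by linear interpolation between white and a single end color. A minimal sketch, assuming plain RGB interpolation; the function name `sequential_color` and the default blue end color are hypothetical choices.

```python
def sequential_color(value, vmin, vmax, end_rgb=(0, 90, 180)):
    """Map a value to a '#rrggbb' color between white (at vmin) and end_rgb (at vmax)."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    # Position of the value within the range, clamped to [0, 1].
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)
    # Move each channel from 255 (white) towards the end color.
    channels = [round(255 + t * (channel - 255)) for channel in end_rgb]
    return "#{:02x}{:02x}{:02x}".format(*channels)


# Hypothetical temperature readings mapped onto the range 0 to 40.
readings = [0, 10, 20, 30, 40]
palette = [sequential_color(value, 0, 40) for value in readings]
print(palette)
```

Because only one hue is involved, a darker color always means a larger value, which a rainbow scale cannot guarantee.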
Example of a bad chart, where we use different colors for the same measurement
Now redone with a gradient color:
Don’t forget 7%-10% of your male audience (color deficiency)
As an example consider the following chart.
Below you have the same chart displayed as a color-blind person would see it.
Use Vischeck to test your images.
If the chart is readable in black and white, then it is even better!
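A rough black-and-white check can be automated by converting each chart color to a gray level and making sure the levels are far enough apart. The sketch below uses the Rec. 601 luma weights for the conversion; the function names and the `min_gap` threshold of 40 are assumptions for illustration. It does not simulate color blindness, so it complements a tool like Vischeck rather than replacing it.

```python
def gray_level(hex_color):
    """Approximate gray level (0 to 255) of a '#rrggbb' color using Rec. 601 luma weights."""
    r, g, b = (int(hex_color[i:i + 2], 16) for i in (1, 3, 5))
    return 0.299 * r + 0.587 * g + 0.114 * b


def distinguishable_in_gray(colors, min_gap=40):
    """True if every pair of neighboring gray levels differs by at least min_gap."""
    levels = sorted(gray_level(color) for color in colors)
    return all(high - low >= min_gap for low, high in zip(levels, levels[1:]))


# Hypothetical palettes.
red_green = ["#ff0000", "#00a000"]              # different hues, similar gray levels
blue_steps = ["#08306b", "#6baed6", "#deebf7"]  # one hue, clearly different lightness

print(distinguishable_in_gray(red_green))
print(distinguishable_in_gray(blue_steps))
```

The red and green pair fails the check because the two colors turn into nearly the same gray, while the three blues stay well separated.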
References and further reading