Chart dos and don’ts
Above all else show the data
Do use the full axis and avoid distortion
For bar charts, the numerical axis (often the y axis) must start at zero.
Another bad example, shown on the BBC UK programme “Breakfast”: did men's height really double from 1871 to 1971?
If you need to show data details that are not visible with the full axis, then the original chart with the full axis must be accompanied by a “zoomed-in” chart, a so-called “panel chart”. See the example below.
If you have only one category to show, then you can use a line chart and show only the relevant range of the axis.
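The distortion can be quantified with Edward Tufte's "lie factor": the size of an effect as shown in the graphic divided by its size in the data. Below is a minimal Python sketch, assuming bars are drawn from a baseline that may be truncated. The function name and the heights in the example are hypothetical, not the BBC figures.

```python
def lie_factor(v1: float, v2: float, baseline: float = 0.0) -> float:
    """Change shown by the bar lengths divided by the true change in the data.

    Bars are drawn from `baseline`, so their visible lengths are v - baseline.
    A result of 1.0 means no distortion, which requires baseline = 0.
    """
    if min(v1, v2) <= baseline:
        raise ValueError("both values must lie above the axis baseline")
    if v1 == v2:
        raise ValueError("values are equal, so there is no change to distort")
    shown_change = (v2 - baseline) / (v1 - baseline) - 1  # relative change in bar length
    true_change = v2 / v1 - 1                             # relative change in the data
    return shown_change / true_change


# Hypothetical heights (cm): with the axis starting at 160, a rise from 165 to 175
# makes the second bar three times as long as the first.
print(lie_factor(165, 175, baseline=160))  # about 33: the chart exaggerates 33-fold
print(lie_factor(165, 175))                # 1.0 with a zero-based axis
```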
Another suggestion is to “break” the axis, so that one part of the axis shows the small values and another part shows the large values, with a section of the scale removed. This sounds good, but you lose any visual sense of proportion between the large and small values.
Making these charts interactive can solve many of the issues above. For example, users could hover over a column to get its exact value, filter out some categories, or sort the columns by value for easier comparison.
Use consistent intervals on axis (be transparent on data gaps)
The x-axis in the "wrong example" below shows a time series with inconsistent intervals (the years 2003 and 2004 are missing), giving a distorted view of the data over time.
Note: Data has not been reported for 2003 and 2004.
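One way to keep the intervals consistent is to give every year its own slot and mark unreported years explicitly, rather than silently dropping them. A minimal sketch, assuming yearly data held in a dict; the function name and sample values are hypothetical. Most plotting libraries can render such a placeholder as a gap (matplotlib, for instance, breaks a line at NaN values, so convert `None` to `float("nan")` if your tool needs it).

```python
from __future__ import annotations


def fill_year_gaps(series: dict[int, float]) -> list[tuple[int, float | None]]:
    """Return one (year, value) pair per calendar year, using None for gaps.

    Keeping the missing years on the axis preserves equal intervals and
    makes the gap in reporting visible instead of hiding it.
    """
    if not series:
        return []
    first, last = min(series), max(series)
    return [(year, series.get(year)) for year in range(first, last + 1)]


# Hypothetical series with 2003 and 2004 not reported
print(fill_year_gaps({2001: 1.0, 2002: 2.0, 2005: 3.0}))
```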
Remove any visual clutter (increase data-ink ratio, Tufte’s principle)
As shown in the example above, it is important to remove visual clutter such as the dark background and the dark grid (non-data-ink) and instead enhance the visibility of the part that carries the data (data-ink), in this case the bars. The grid can be removed or drawn in a much more subtle style, since it is a supporting tool rather than the data itself.
Use a clear language and avoid acronyms
Tell the "why" and "how"
Most chart authors simply state what is being measured in the title or other descriptive text, leaving readers without any clue about how to read the chart or why it was made.
Original title: Cadmium emissions
Improved title with note: Change in cadmium emissions. Note: A reduction in emissions is an indication of improved air quality in major European cities.
Charts are primarily communication tools. We have already done some reasoning about the "why and how" when choosing the chart type (bar, line, scatter plot etc.), since specific chart types are best at showing specific aspects of the data.
You can skip this rule if you are building a raw "statistical exploratory charting tool" where users can slice the data and create any chart they want.
For end-products ready to be consumed by the target audience, you should always explain how to read the chart and the reasoning behind it. Try to be objective and leave out any subjective interpretations.
Highlight what’s important, tell one story
A chart in which every line gets the same emphasis tells no story. Therefore highlight just one or two important lines in the chart, and keep the others in the background as context.
Another bad example, with no highlighted story
Below, the chart above is remade in a much better version that highlights the rise and fall of Microsoft. Do you see what made the difference?
Sort your data for easier comparisons
The bar chart below is a good example: the x-axis is sorted by the y-values, not by the alphabetical order of the country names.
Otherwise it would be very difficult, if not impossible, for users to compare the many bars properly. With sorted bars it is still easy to find your own country in the list with a quick scan.
The pie chart below (even though pie charts should be avoided) also works better when the data values are sorted. It starts at 12 o’clock with the largest slice. This makes it much easier to understand the relations between the parts, what is bigger and what is smaller, even when the values are not readable or the areas are very similar.
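Both orderings described above can be sketched in a few lines of Python: sort the categories by value, largest first, and, for a pie, lay the slices out clockwise starting at 12 o'clock. The function names and sample data are hypothetical. In matplotlib, a similar layout can be requested from `pyplot.pie` with `startangle=90` and `counterclock=False` after sorting the data yourself.

```python
def sort_for_comparison(data: dict[str, float]) -> list[tuple[str, float]]:
    """Sort categories by value, largest first (ties broken alphabetically)."""
    return sorted(data.items(), key=lambda item: (-item[1], item[0]))


def pie_slices(data: dict[str, float]) -> list[tuple[str, float, float]]:
    """Return (label, start, end) angles in degrees, measured clockwise from 12 o'clock.

    Slices follow the sorted order, so the largest slice starts at 12 o'clock.
    """
    total = sum(data.values())
    angle = 0.0
    slices = []
    for label, value in sort_for_comparison(data):
        sweep = 360.0 * value / total
        slices.append((label, angle, angle + sweep))
        angle += sweep
    return slices


# Hypothetical shares
print(sort_for_comparison({"B": 20, "A": 50, "C": 30}))
print(pie_slices({"B": 20, "A": 50, "C": 30}))
```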
Use direct labeling wherever possible, avoiding indirect look-up
Rotate bar chart when category names are too long
Do not use a legend when you have only one data category
Chart with a legend that is not needed (before)
The legend displays only one category, which is already named in the title, so there is no need to repeat it on the axis either.
Chart after we removed the unnecessary legend information
Do use a proper aspect ratio to minimize dramatic slope effects
Robert Kosara has a great summary of the "banking to 45 degrees" practice first proposed by Bill Cleveland.
Here are the examples given by Kosara:
The same data is presented three ways. The slope is a reflection of the scales used on the two axes.
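Banking can be computed rather than guessed. The sketch below implements median-absolute-slope banking, one of Cleveland's methods: choose the plot's shape so that the median absolute slope of the line segments, as drawn, is 1 (that is, 45 degrees). The function name is hypothetical, skipping flat segments is our own choice (it avoids a zero median), and the result is expressed as width/height, which is an assumed convention.

```python
import statistics


def bank_to_45(xs: list[float], ys: list[float]) -> float:
    """Width/height aspect ratio that banks the median segment slope to 45 degrees."""
    slopes = [
        abs((y2 - y1) / (x2 - x1))
        for (x1, y1), (x2, y2) in zip(zip(xs, ys), zip(xs[1:], ys[1:]))
        if x2 != x1 and y2 != y1  # assumption: ignore vertical and flat segments
    ]
    if not slopes:
        raise ValueError("need at least one sloped segment")
    x_range = max(xs) - min(xs)
    y_range = max(ys) - min(ys)
    # A segment with data slope s is drawn with slope s * (x_range / y_range) * (height / width).
    # Setting that to 1 for the median slope gives the height/width ratio below.
    height_over_width = y_range / (x_range * statistics.median(slopes))
    return 1.0 / height_over_width


# A zigzag whose segments rise or fall 10 units per step needs a plot 4 times wider than tall.
print(bank_to_45([0, 1, 2, 3, 4], [0, 10, 0, 10, 0]))  # 4.0
```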
However, in some cases there can be legitimate reasons not to stick strictly to "banking to 45 degrees", for example to analyze the data and reveal patterns that would not be visible at a 45-degree slope. See the example below.
Two plots of monthly atmospheric carbon dioxide measurements, taken from 1959 to 1990. The first plot, with an aspect ratio of 1.17, reveals an accelerating increase in CO2 levels. The second plot, with an aspect ratio of 7.87, facilitates closer inspection of seasonal fluctuations, revealing a gradual attack followed by a steeper decay. Source: Computer Science Division, University of California, Berkeley (http://vis.berkeley.edu/papers/banking/)
Do adjust for inflation in long time series
This is done by using the CPI (consumer price index). A Euro in 2010 just does not have the same spending power as a Euro in 1961.
The purchasing power of €100 in 1961 is equivalent to €1948 in 2010.
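The adjustment itself is a ratio of index values: an amount in year A is expressed in year-B money by multiplying it by CPI(B) / CPI(A). A minimal sketch; the function name is hypothetical, and the index values 10.0 and 194.8 are illustrative placeholders chosen only to reproduce the factor of 19.48 implied by the example above, not official CPI figures.

```python
def adjust_for_inflation(amount: float, cpi_from: float, cpi_to: float) -> float:
    """Express `amount` from the base year in target-year money: amount * CPI_to / CPI_from."""
    if cpi_from <= 0 or cpi_to <= 0:
        raise ValueError("CPI values must be positive")
    return amount * cpi_to / cpi_from


# Illustrative index values only (ratio 19.48), not real CPI data
print(adjust_for_inflation(100, cpi_from=10.0, cpi_to=194.8))  # roughly 1948
```

Apply the same index series to every value in the time series so that all points are expressed in the money of one reference year.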
Do ask others for opinions
Don't use 3D or blow-apart effects
Bad chart examples
Below is a very creative 3D pie chart, and a very incomprehensible one as well.
Below is another (in)famous piece of “chartjunk”. Compare the 21,2% slice with the 19,5% slice in the pie. Which one looks bigger?
(presented by Steve Jobs in his Macworld 2008 keynote, as reported by Engadget: http://www.engadget.com/2008/01/15/live-from-macworld-2008-steve-jobs-keynote/)
Avoid pie charts and donuts
Most of us can easily compare the lengths or heights of line segments, but we judge angles and areas poorly. Therefore avoid pie charts when comparing a large number of items. Simple pie charts displaying 2-3 categories may work just fine, but when displaying more data it is better to choose another chart type.
The donut chart is just another pie chart with a hole punched in the middle. The donut chart is a useless chart made worse. Avoid donut charts for the same reasons.
- Countdown of Top 10 Reasons to Never Ever Use a Pie Chart
- Storytelling with data: alternatives to pies
Avoid stacked charts; they make data difficult to compare
To solve this issue, some chart tools let the user interactively filter out stacked categories so that a single category is displayed.
The same issue applies to stacked area charts. It is difficult to compare the areas of the different regions when they are stacked (figure above) and much easier when they are drawn as lines (figure below), with a separate line for the total.
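Why stacking hides trends can be shown with a little arithmetic: only the bottom category sits on a flat baseline, while every other band floats on the running sum of the categories below it, so its visible edges move even when its own value does not. A minimal sketch with hypothetical function names and data; the second function gives the separate total line suggested above.

```python
from itertools import accumulate


def stacked_bands(series: dict[str, list[float]]) -> dict[str, list[tuple[float, float]]]:
    """(bottom, top) of each category's band at every time step of a stacked chart."""
    names = list(series)
    bands: dict[str, list[tuple[float, float]]] = {name: [] for name in names}
    for values in zip(*(series[name] for name in names)):
        tops = list(accumulate(values))   # cumulative sums give each band's upper edge
        bottoms = [0.0] + tops[:-1]       # each band starts where the previous one ends
        for name, bottom, top in zip(names, bottoms, tops):
            bands[name].append((bottom, top))
    return bands


def total_line(series: dict[str, list[float]]) -> list[float]:
    """Sum of all categories at each time step, to plot as its own line."""
    return [sum(step) for step in zip(*series.values())]


# Hypothetical data: B is constant, yet its band rises because A grows underneath it.
data = {"A": [1, 2], "B": [3, 3]}
print(stacked_bands(data)["B"])  # [(1, 4), (2, 5)]
print(total_line(data))          # [4, 5]
```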
Another example of how bad stacked bar charts can be in certain cases
Let’s see what the chart above looks like as a line chart
Now we can clearly see the decline of the household category “Married Couples with Children”. We can also see the trends in the other categories more clearly.
Further reading: http://junkcharts.typepad.com/junk_charts/2013/05/more-power-brings-more-responsibility.html
Don't confuse correlation with causation
For example, if you plot two different data series (A and B) on a common time axis, you may notice that both follow a similar pattern over time. It is very hard, if not impossible, to prove from this that A causes B or vice versa. Many third factors that are not plotted on the chart can influence both A and B and make them change the same way over time. Only a large, rigorous statistical study that accounts for all relevant factors can give some indication of causation, if any exists.
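A shared upward trend alone is enough to produce a strong correlation. In the hypothetical sketch below, two series share nothing except growth over time, yet their Pearson correlation is about 0.97; after taking year-to-year differences (one simple way to remove a common trend) the correlation drops to zero. The function names and data are made up for illustration, and neither number proves or disproves causation.

```python
import math


def pearson(a: list[float], b: list[float]) -> float:
    """Pearson correlation coefficient of two equal-length series."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / math.sqrt(var_a * var_b)


def differences(series: list[float]) -> list[float]:
    """Change from one period to the next (first differences)."""
    return [later - earlier for earlier, later in zip(series, series[1:])]


# Hypothetical series: the same upward trend plus unrelated short-term wiggles
years = range(10)
a = [t + t % 2 for t in years]         # trend + alternating wiggle
b = [t + (t // 2) % 2 for t in years]  # trend + a slower, unrelated wiggle

print(round(pearson(a, b), 2))                            # strong, driven by the shared trend
print(round(pearson(differences(a), differences(b)), 2))  # about zero once the trend is removed
```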
Don’t use maps for everything that has spatial dimension
In fact, most data has a geographical dimension if we think about it, but it does not always convey new insight when displayed on a map. Below is a very bad map example, where a huge amount of data is displayed just because it has a location attached to it. The user gets no insight from this map: there is no correlation or pattern in it that we could investigate further.
Bad map example
Good map example
History gives us some good examples
Video - When to *not* use maps
Another bad example, where a map feels “in the way” and makes the data displayed on it more difficult to understand.
The map above shows where different sectors of high-tech manufacturing and R&D are located in the Nordic countries, together with their sizes (number of people employed in each sector). The map does a fairly good job of showing where these jobs are located, but it is useless for comparing the sizes of the circles. Moreover, it comes as no surprise that most jobs are located near large cities such as Copenhagen, Malmö, Gothenburg, Oslo, Stockholm and Helsinki. We learn nothing new here.
It would be much more interesting to see the sector data plotted on a bar chart (see figure below), optionally grouped by country or other regions. This would make it easier to see which region has the most jobs in which sector and to compare the sizes of the different sectors with each other, if that is the story we want to tell.
Avoid animated charts and maps, use small multiples
A series of small charts / maps, so called small multiples, may convey the message much better than an animation.
Below is an excellent example of small multiples that effectively show the trend over time in liquor consumption per person by county. An animated chart or map would not have achieved such scientific elegance in representing the data.
See other examples where small multiples are the best alternative to a map
Below is an animated map showing water stress in several river basin districts over four seasons during 2002-2012. Although the animation may be appealing to the eye, it is difficult to use for comparing different years or seasons.
Experimenting with the animation speed may help you see patterns that are otherwise hidden when the speed is too slow or too fast.
Below, the same data are shown as small multiples. Since the maps are arranged by year and by season, it is easy to compare any year with any other year, or any season with any other season. We can clearly see that summers have the highest water exploitation index and that southern Europe, especially Spain, is the most affected. In northern Europe, England and the Copenhagen and Stockholm areas also stand out. We can also see that there is no upward or downward trend over time in any season. A small multiple of line charts would probably work even better than the maps.
Be very careful about how you treat "no data / missing data"
Take the following chart as an example. It shows the results of street observations: people passing by car, bike or any other means of transport are counted to see how many men and women pass in a specific time frame. When we cannot identify the gender, we mark the observation as "unknown". After 1000 observations we stop collecting data.
The left chart says that 33,5% males and 28,6% females passed by and 37,9% were unknown (the missing data). However, we know that over a long period of time there should be around 50% males and 50% females (unless we are in a very gender-specific area of the city). The problem with this chart is that the unknown is treated as a third category distinct from the other two. The unknown group actually contains both males and females, most probably in the same proportion as the identified observations. Therefore the missing data must be removed from the chart and reported separately. This is standard practice in statistical surveys. The corrected chart, without the unknown, is shown on the right. In this case an indication of the margin of error would also help.
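The correction amounts to recomputing the shares over the identified observations only and, as suggested above, attaching a margin of error. A minimal sketch: the counts are the example's percentages applied to its 1000 observations, the function names are hypothetical, and the margin uses the common normal approximation for a proportion, which assumes the identified observations behave like a simple random sample (street counts may not).

```python
import math


def exclude_unknown(counts: dict[str, int], unknown: str = "unknown") -> dict[str, float]:
    """Shares of the known categories only; the unknown count is reported separately."""
    known = {label: n for label, n in counts.items() if label != unknown}
    total = sum(known.values())
    return {label: n / total for label, n in known.items()}


def margin_of_error(share: float, n: int, z: float = 1.96) -> float:
    """Approximate 95% margin of error for a proportion (normal approximation)."""
    return z * math.sqrt(share * (1 - share) / n)


counts = {"male": 335, "female": 286, "unknown": 379}  # 1000 observations from the example
n_known = sum(n for label, n in counts.items() if label != "unknown")  # 621

for label, share in exclude_unknown(counts).items():
    print(f"{label}: {share:.1%} ± {margin_of_error(share, n_known):.1%}")
# male: 53.9% ± 3.9%
# female: 46.1% ± 3.9%
print(f"unknown (reported separately): {counts['unknown']} of {sum(counts.values())}")
```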
Don't compare apples with oranges
This rule sounds trivial, but it can be quite difficult to respect when things look identical to us. For example, imagine the following trend analysis of CO2 emissions over time in Europe. From a simplistic point of view we are looking at the trend for the EU from 1995 to 2014, and all looks fine. However, the EU did not consist of the same countries throughout this period, so we cannot compare EU12 with EU25 or EU28. The countries that formed EU12 are not statistically representative of all the countries in EU28. We are indeed comparing apples with oranges.
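One way to keep such a comparison like-for-like is to aggregate only over the countries that report in every year, so that the group has the same composition throughout. A minimal sketch; the function names, country codes and values are hypothetical. Where the data exist, reporting today's membership back through all years is another option.

```python
def common_members(data: dict[int, dict[str, float]]) -> set[str]:
    """Countries that report a value in every year of the series."""
    if not data:
        return set()
    years = iter(data.values())
    members = set(next(years))
    for year_values in years:
        members &= set(year_values)  # keep only countries present in this year too
    return members


def constant_composition_totals(data: dict[int, dict[str, float]]) -> dict[int, float]:
    """Yearly totals computed over the same set of countries in every year."""
    members = common_members(data)
    return {
        year: sum(value for country, value in values.items() if country in members)
        for year, values in sorted(data.items())
    }


# Hypothetical emissions: country "CC" joins the group later.
data = {1995: {"AA": 10.0, "BB": 5.0}, 2005: {"AA": 9.0, "BB": 5.0, "CC": 4.0}}
print(constant_composition_totals(data))  # {1995: 15.0, 2005: 14.0}
```

A naive sum would show a rise from 15 to 18 that comes only from the new member, while the like-for-like totals show a decline.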
Show the level of confidence
Include error bars any time you use data to make an argument
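For a mean, a common error bar is the approximate 95% confidence interval: the mean plus or minus 1.96 standard errors. A minimal sketch with a hypothetical function name and sample; for small samples a somewhat larger multiplier from the t-distribution is more appropriate. The resulting half-width can be passed, for example, as `yerr` to matplotlib's `errorbar`.

```python
import math
import statistics


def error_bar(sample: list[float], z: float = 1.96) -> tuple[float, float]:
    """Mean and approximate 95% half-width (z times the standard error) for an error bar."""
    if len(sample) < 2:
        raise ValueError("need at least two observations")
    mean = statistics.mean(sample)
    standard_error = statistics.stdev(sample) / math.sqrt(len(sample))
    return mean, z * standard_error


# Hypothetical measurements
mean, half_width = error_bar([2, 4, 4, 4, 5, 5, 7, 9])
print(f"{mean} ± {half_width:.2f}")
```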
Don't use more than (about) six colors
- Different colors should be used for different categories (e.g., male/female, types of fruit), not different values in a range (e.g., age, temperature).
- Do not use rainbows for range values
- If you want color to show a numerical value, use a range that goes from white to a highly saturated color in one of the universal color categories.
Example of a bad chart that uses different colors for the same measurement
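A single-hue scale from white to one saturated color can be sketched by interpolating between the two colors. In the minimal example below, the function name and the default end color (a dark blue, `#08519c`) are arbitrary choices, and plain RGB interpolation is not perceptually uniform; ready-made sequential palettes such as ColorBrewer's or matplotlib's "Blues" colormap are usually a better choice in practice.

```python
def sequential_color(
    value: float, vmin: float, vmax: float, end: tuple[int, int, int] = (8, 81, 156)
) -> str:
    """Map a value onto a white-to-`end` single-hue scale and return a hex color."""
    if vmax <= vmin:
        raise ValueError("vmax must be greater than vmin")
    t = min(max((value - vmin) / (vmax - vmin), 0.0), 1.0)  # clamp position to [0, 1]
    r, g, b = (round(255 + t * (channel - 255)) for channel in end)  # blend from white
    return f"#{r:02x}{g:02x}{b:02x}"


# Low values are nearly white, high values fully saturated.
print([sequential_color(v, 0, 10) for v in (0, 5, 10)])
```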
Now redone with a gradient color:
Don’t forget the 7%-10% of your male audience with color vision deficiency
As an example consider the following chart.
Below you have the same chart displayed as a color-blind person would see it.
Use Vischeck to test your images.
If the chart is readable in black and white, then it is even better!
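A quick black-and-white check can be automated by converting each palette color to a gray level and flagging pairs that come out too similar. A minimal sketch using the ITU-R BT.601 luma weights; the function names and the threshold of 40 gray levels are hypothetical choices, and this is not a simulation of color vision deficiency (tools such as Vischeck do that).

```python
from itertools import combinations


def luma(hex_color: str) -> float:
    """Gray level (0-255) of a '#rrggbb' color using ITU-R BT.601 luma weights."""
    h = hex_color.lstrip("#")
    r, g, b = (int(h[i:i + 2], 16) for i in (0, 2, 4))
    return 0.299 * r + 0.587 * g + 0.114 * b


def too_similar_in_grayscale(
    palette: list[str], min_difference: float = 40.0
) -> list[tuple[str, str]]:
    """Pairs of colors whose gray levels differ by less than `min_difference`."""
    return [
        (a, b)
        for a, b in combinations(palette, 2)
        if abs(luma(a) - luma(b)) < min_difference
    ]


# Pure red and medium green are clearly different hues but almost the same gray.
print(too_similar_in_grayscale(["#ff0000", "#008000"]))
```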
Choose the chart type wisely
Before you start charting, take a step back and ask yourself what main questions you want to answer. Choose the chart type best suited to revealing the specific patterns you are looking for and to gaining new insights into your data. Online tools such as the Data Visualization Catalogue or a decision diagram [2006, A. Abela] can help you find the right chart for your data.
References and further reading
For references, please go to http://www.eea.europa.eu/data-and-maps/daviz/learn-more/chart-dos-and-donts.
PDF generated on 26 Mar 2017, 08:55 PM