Dangers of Data Visualization Analysis

As a follow-up to my ‘How to Avoid the Pitfalls of Data Analysis’ article, I’ll examine the dangers of data visualization analysis in this post.

As companies strive to gain greater insights across every aspect of their business, we’re seeing a growing demand for data visualization tools – particularly among business and non-technical users – for accessing and analyzing data. Today, various data visualization tools in the form of dashboards and scorecards are used on top of existing business intelligence (BI) applications or other operational systems, such as enterprise resource planning (ERP).

While representing data in a graphical or visual format is valuable for conveying a message in an easy-to-read format, it can be misleading if not used appropriately.

Pie charts are an excellent example of this: they have a place when representing data that add up to 100% but are often used outside of this scenario. In Figure 1, the slices add up to 193%: the poll likely allowed more than one response, and a pie chart is not the right graphic for that kind of data. A viewer who looks only at the slice sizes without reading the numbers gets the impression that each candidate has about a third of the support. Today we refer to this as ‘fake news.’

Figure 1 – Source: FlowingData
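
The sanity check described above can be automated before a pie chart is ever drawn. The following is a minimal sketch, not a prescribed method: the function name, the one-point rounding tolerance, and the sample values are all assumptions made for illustration (the three percentages were chosen only so that they sum to 193%, as in Figure 1).

```python
def pie_chart_is_appropriate(percentages, tolerance=1.0):
    """Return True if the values can honestly be drawn as slices of one pie.

    Assumption: values are percentages, and a total within `tolerance`
    points of 100 is treated as rounding error.
    """
    no_negative_slices = all(p >= 0 for p in percentages)
    total = sum(percentages)
    return no_negative_slices and abs(total - 100.0) <= tolerance


# Hypothetical multi-response poll: the shares sum to 193%, so each slice
# would be drawn at roughly a third of the pie despite labels near 60-70%.
multi_response_poll = [70, 63, 60]
single_response_poll = [50, 30, 20]

print(pie_chart_is_appropriate(multi_response_poll))   # False
print(pie_chart_is_appropriate(single_response_poll))  # True
```

When the check fails, a bar chart with one bar per response is usually the safer choice, because each bar is read against the axis rather than against the other slices.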

In Figure 2 below, the creator of the chart ignores the normal conventions that up and right are positive, while down and left are negative. Putting creativity ahead of convention makes the data harder to interpret and easily leads to misinterpretation.

The example in Figure 3 breaks several rules. At first glance, it looks like a treemap, where the size of each colored area represents the data, but that’s actually not the case. It’s misleading on several fronts: the color does not correlate to the type of staff, and the percentage labels are either filled or outlined in a way that bears no correlation to the data and serves only to draw the eye to particular parts of the chart.

Figure 2 – Source: http://viz.wtf/

Figure 3 – Source: http://viz.wtf/

Both of the above charts are prime examples of chartjunk, a term coined by the renowned statistician Edward Tufte for graphical elements that intrude unnecessarily on the representation of data.

Choose the right chart type to ensure accuracy of data analysis

To ensure the accuracy of your data visualization projects, begin with ‘clean data’ and be sure to pick the right chart type.

To choose the right type of chart, just ask yourself if you want to:

  • Compare values:
    • Bar chart
    • Line chart
  • Show the individual parts that make up a whole:
    • Pie chart
    • Stacked bar
    • Stacked column
  • Understand how the data is distributed:
    • Scatter plot
    • Line chart
    • Bar chart
  • Analyze trends:
    • Line chart
    • Bar chart
  • Comprehend the relationship between data sets:
    • Line chart
    • Scatter plot
    • Bubble chart

It’s also important to consider statistical significance: you need a large enough pool of representative data to support a trend, and the values you compare must differ by enough to suggest a real effect. For example, a range of 150 to 350 data points or completed surveys is the benchmark in marketing research. A trend drawn from a data set that is too small, or a comparison between results that are too close together, will likely have no statistical significance.
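
To make the sample-size point concrete, here is a minimal sketch of one common way to test whether two survey percentages differ significantly: a pooled two-proportion z-test. This is one possible technique, not one the article prescribes; the function name, the survey counts, and the conventional 0.05 cutoff are all assumptions for illustration.

```python
import math


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Pooled two-proportion z-test; returns (z, two-sided p-value)."""
    p_a = successes_a / n_a
    p_b = successes_b / n_b
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_err = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if std_err == 0:
        raise ValueError("proportions are all 0% or all 100%; test undefined")
    z = (p_a - p_b) / std_err
    # Two-sided p-value under the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value


# Hypothetical surveys: the same 55% vs. 45% split at two sample sizes.
_, p_small = two_proportion_z_test(22, 40, 18, 40)
_, p_large = two_proportion_z_test(220, 400, 180, 400)

print(p_small < 0.05)  # False: 40 responses per group is too few
print(p_large < 0.05)  # True: 400 responses per group supports the gap
```

The same ten-point gap that looks decisive on a chart is indistinguishable from noise with 40 responses per group, which is why the sample size belongs next to any comparison you plot.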

In summary, be wary of how data is presented whether in the form of a chart or infographic. Are you using the right type of chart or displaying data in an infographic in a representative manner?

Learn more about how our ODBC and JDBC drivers for data connectivity give analytical clients access to a multitude of data sources.