Ethics of Graphing
Principles of Graphing
Dr. Michael E. Olson
MATH 3080 - Foundations of Data Science
What is wrong with this graph?
<img src="./images/12_EthicsOfGraphing/bad_scatter.png" width=100% alt="Poorly formatted scatter plot">
* Numbers are not on a scale
* (Numbers are likely strings)
* No labels
* No title
* No legend
* Font size (Unreadable)
Corrected graph
<img src="./images/12_EthicsOfGraphing/good_scatter.png" width=100% alt="Corrected scatter plot">
Issues corrected:
* Numbers are not on a scale
* (Numbers are likely strings)
* No labels
* No title
* No legend
* Font size (Unreadable)
What is wrong with this graph?
<img src="https://ticktockmaths.co.uk/wp-content/uploads/2024/09/10.png" width=80%>
(from https://ticktockmaths.co.uk/badgraphs/)
What is wrong with this graph?
<img src="./images/12_EthicsOfGraphing/no_labels.jpeg" height=40%>
(from https://ticktockmaths.co.uk/badgraphs/)
Mistake or Misleading?
Some graphs give readers incorrect impressions.
* Sometimes, people make honest mistakes.
* Sometimes, people purposefully use bad statistics to mislead readers.
Example of Misleading Statistics
In the 2020 election season, a news article said:
"Older, white voters are significantly more likely to vote by mail and have those ballots counted, studies show, while voters of color and younger voters are significantly more likely to have their ballots rejected."
NBC News, Aug 9, 2020
Could this be deceptive?
Example of Misleading Statistics
It likely isn't purposefully deceptive, just a wording issue. But here is a hypothetical scenario in which the statement is true, yet completely misleading.
| | White | Non-White |
| :------ | :---: | :---: |
| Older | 45% | 25% |
| Younger | 20% | 10% |
Voters who are younger, non-white, or both would account for 20% + 25% + 10% = 55%,
a majority, even though the single largest share belongs to older white voters (45%).
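The arithmetic behind this scenario can be checked with a short sketch. The percentages are the hypothetical values from the table above; the dictionary layout and variable names are illustrative assumptions.

```python
# Hypothetical shares from the table above, keyed by (age group, race group).
shares = {
    ("Older", "White"): 45,
    ("Older", "Non-White"): 25,
    ("Younger", "White"): 20,
    ("Younger", "Non-White"): 10,
}

# Combined share of every group that is younger, non-white, or both.
younger_or_nonwhite = sum(
    pct for (age, race), pct in shares.items()
    if age == "Younger" or race == "Non-White"
)

# The single largest group on its own.
largest_group, largest_share = max(shares.items(), key=lambda item: item[1])

print(f"Younger or non-white combined: {younger_or_nonwhite}%")
print(f"Largest single group: {largest_group} at {largest_share}%")
```

Both statements hold at once: the combined group is a majority, yet no single group is larger than older white voters.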
Example of Misleading Statistics
<img src="./images/12_EthicsOfGraphing/Trump_employment.png" width=85% alt="">
From the 2026 State of the Union address
Example of Misleading Statistics
<img src="./images/12_EthicsOfGraphing/Trump_employment_full.png" width=85% alt="">
Mistake or Misleading?
In the following slides, we will look at a few ways a graph can convey information incorrectly and the principles that help avoid them.
Note that these figures come from our textbook:
Visual Cues
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/piechart-1.png" width=100% alt="Lack of Visual Cues">
* Which browser had the largest share?
* Did Firefox's share of users increase or decrease?
Visual Cues
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/piechart-1.png" width=100% alt="Lack of Visual Cues">
Issues:
* No clear way to compare the size of each slice between the two figures
* Slice size is encoded by both angle and area (two different visual cues)
Visual Cues
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/donutchart-1.png" width=100% alt="Results with only area">
Solutions:
* Add labels to the graph
* Use only angle or only area, not both
Here is an example using a donut graph with the same data, which relies on area alone.
Without labels, it is still not clear.
Visual Cues
| Browser | 2000 | 2015 |
| :------ | :---: | :---: |
| Opera | 3 | 2 |
| Safari | 21 | 22 |
| Firefox | 23 | 21 |
| Chrome | 26 | 29 |
| IE | 28 | 27 |
Sometimes, just giving the data in a table is clearer.
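With the numbers in hand, the two questions that the pie charts made hard to answer become simple lookups. A minimal sketch, using the values from the table above; the variable names are illustrative.

```python
# Browser usage shares (percent) copied from the table above.
usage = {
    "Opera":   {2000: 3,  2015: 2},
    "Safari":  {2000: 21, 2015: 22},
    "Firefox": {2000: 23, 2015: 21},
    "Chrome":  {2000: 26, 2015: 29},
    "IE":      {2000: 28, 2015: 27},
}

# Which browser had the largest share in each year?
largest = {
    year: max(usage, key=lambda browser: usage[browser][year])
    for year in (2000, 2015)
}

# Change in percentage points from 2000 to 2015 for each browser.
change = {browser: pct[2015] - pct[2000] for browser, pct in usage.items()}

print(largest)
print(change["Firefox"])
```

The largest share belongs to IE in 2000 and Chrome in 2015, and Firefox's share drops by 2 percentage points.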
Visual Cues
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/two-barplots-1.png" width=100% alt="Pie chart compared to a bar graph">
... or use a bar graph.
When to include 0
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/img/class2_8.jpg" width=100% alt="Bar graph without a 0 base">
Issues:
* Bar heights are not proportional to the values
It looks as if the 2013 value is triple the 2011 value, but the real increase is only 16%.
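A small sketch of why a truncated axis exaggerates differences. The values 100 and 116 and the axis baseline of 92 are hypothetical, chosen only to reproduce a 16% increase that is drawn as a tripling; they are not the data in the figure.

```python
def drawn_height_ratio(value_a, value_b, axis_start):
    """Ratio of the two bar heights as drawn when the axis starts at axis_start."""
    return (value_b - axis_start) / (value_a - axis_start)

# Hypothetical values: the second bar is 16% larger than the first.
first, second = 100, 116

# Bars drawn from 0 keep the true proportion between the values.
true_ratio = drawn_height_ratio(first, second, axis_start=0)

# Bars drawn from a baseline of 92 show only the part above 92.
truncated_ratio = drawn_height_ratio(first, second, axis_start=92)

print(true_ratio)
print(truncated_ratio)
```

With the axis starting at 0 the second bar is 1.16 times as tall as the first; with the axis starting at 92 it is drawn 3 times as tall.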
When to include 0
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/barplot-from-zero-1-1.png" width=100% alt="Bar graph with a 0 base">
Solution:
* When the distance from 0 matters, make sure 0 is displayed in the graph
When to include 0
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/img/venezuela-election.png" width=100% alt="Election results without a 0 base">
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/barplot-from-zero-3-1.png" width=100% alt="Election results with a 0 base">
When including 0 isn’t needed
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/points-plot-not-from-zero-1.png" width=70% alt="Life Expectancy by continent - Clustered data">
When we are comparing distributions of values rather than distances from 0, the axis does not need to include 0.
Distorting Quantities
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/img/state-of-the-union.png" width=100% alt="World economy shown by radius, but area stands out">
Distorting Quantities
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/area-not-radius-1.png" width=100% alt="World economy shown by radius and again by area">
Issues:
* The most obvious visual measure is the area (the US looks roughly 5x as large as China)
* The chart actually scaled the radius/diameter to the data (the US is really about 3x as large as China)
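The distortion comes from geometry: if the radius of a circle is made proportional to a value, the area grows with the square of that value. A minimal sketch, using a hypothetical pair of values where one is twice the other (not the figures from the chart):

```python
import math

def circle_area(radius):
    """Area of a circle with the given radius."""
    return math.pi * radius ** 2

# Hypothetical values for illustration: large is 2x small.
small, large = 1.0, 2.0

# Misleading: radius proportional to the value, so areas differ by the square.
area_ratio_radius_scaled = circle_area(large) / circle_area(small)

# Faithful: area proportional to the value, so radius = sqrt(value).
area_ratio_area_scaled = (
    circle_area(math.sqrt(large)) / circle_area(math.sqrt(small))
)

print(area_ratio_radius_scaled)
print(area_ratio_area_scaled)
```

When the radius carries the value, a 2x difference is drawn as a 4x difference in area; when the area carries the value, the drawn ratio stays at 2 (up to floating-point rounding).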
Distorting Quantities
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/barplot-better-than-area-1.png" width=55% alt="World economy shown as a bar graph">
A bar graph is easier to read.
Distorting Quantities
<img src="https://i0.wp.com/ticktockmaths.co.uk/wp-content/uploads/2024/09/15-2-1024x530.png?ssl=1" width=55% alt="World economy shown as a bar graph">
(from https://ticktockmaths.co.uk/badgraphs/)
Distorting Quantities
<img src="./images/12_EthicsOfGraphing/GunDeaths.jpg" height=67%>
Issues:
* The y-axis is inverted, giving the impression that fewer deaths occurred after Florida enacted its "Stand Your Ground" law
Meaningful Order
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/do-not-order-alphabetically-1.png" width=55% alt="Murder rates by state alphabetically and again by rate order">
Meaningful Order
<img src="https://rafalab.dfci.harvard.edu/dsbook-part-1/dataviz/dataviz-principles_files/figure-html/reorder-boxplot-example-1.png" width=65% alt="Income distributions by region alphabetically and again by income median">
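Ordering categories by the quantity being shown, rather than alphabetically, can be sketched as a simple sort. The category names and rates below are hypothetical placeholders, not values from the figures.

```python
# Hypothetical rates for illustration only.
rates = {"Alpha": 3.2, "Bravo": 1.1, "Charlie": 5.4, "Delta": 2.7}

# Default ordering: alphabetical by category name.
alphabetical = sorted(rates)

# Meaningful ordering: by the value being plotted, largest first.
by_value = sorted(rates, key=rates.get, reverse=True)

print(alphabetical)
print(by_value)
```

Plotting the categories in the second order lets a reader see the ranking at a glance instead of hunting for the largest and smallest values.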