Assignment 3: White Hat/Black Hat Visualization

It is tempting to think of data and data visualization as neutral actors. An emphasis on a minimalist aesthetic, particularly through the use of clean, precise geometric lines, lends an air of objective, transparent reporting that masks visualization’s persuasive power. Given the growing ubiquity of visualization as a medium for recording, analyzing, and communicating data, we have a responsibility to examine how our design choices can influence the way a visualization is read, and what insights a reader walks away with.

In this assignment, we will grapple with these ethical concerns by visualizing a single dataset from two different perspectives: the “white hat” and the “black hat.” These terms originated in the symbolism of early Western films, in which the heroes wore white hats and the villains wore black hats. The trope is still used in visual media today. The terms have also been adopted in computer security to describe two kinds of hackers. A white hat hacker uses their skills for good (e.g., uncovering vulnerabilities in software so they can be publicized and fixed). A black hat hacker violates computer security for malicious ends (e.g., personal gain).

For this assignment, we will consider a white hat visualization to be one where:

  • The visualization is clear and easy to interpret for the intended audience (often the general population)
  • Any data transformations (e.g., filters, additional computations, etc.) are clearly and transparently communicated
  • The sources of the data, including potential bias, are communicated

A black hat visualization, on the other hand, exhibits one or several of the following characteristics:

  • The visual representation is intentionally inappropriate, overly complex and/or too cluttered for the audience
  • Labels, axes, and legends are misleading
  • Titles are skewed to intentionally influence the viewer’s perception
  • The data has been transformed, filtered, or processed in an intentionally misleading way
  • The source and provenance of the data are not clear to the viewer

Although we might never imagine ourselves to be (nor aspire to be) black hat hackers, we are going to temporarily don this hat to better appreciate the rhetorical force of visualization and to build our critical reading skills.

Datasets

You will be working with a single dataset: choose one from the five listed below. These datasets were intentionally chosen to cover politically charged topics, since this is the kind of data where ethical visualization matters most. You do not have to visualize the entire dataset (i.e., you may choose a subset of the data to visualize), and your two visualizations can focus on different aspects of the data.

The five datasets are the following:

  • Greenhouse Gas Emissions 1990–2018. The Organization for Economic Co-operation and Development (OECD) has compiled data for the emissions of all participating countries broken out by the pollutant (e.g., carbon monoxide, methane, etc.) and by different sources (e.g., energy, agriculture, etc.). The linked interface can be a little difficult to use, but you can access various slices of the data by either choosing alternate themes in the left-hand side menu, or by customizing the pollutants and variables in the dropdown menus in the main view.

  • Gender Equality Indicators 1960–2017. The World Bank tracks a number of different measures, including fertility rate, literacy, employment and business ownership, and wages, to study the extent of gender equality around the world. The linked dataset is a curated subset of the World Bank’s full set of gender indicators; you are welcome to use the full set as well.

  • Civilian Complaints Against New York City Police Officers. This is a dataset compiled by ProPublica, an independent, nonprofit investigative journalism newsroom. It contains more than 12,000 civilian complaints filed against the NYPD, with demographic information about the complainant and officer, the category of the alleged misconduct, and the result of the complaint.

  • Gentrification and Demographic Analysis. This is a dataset compiled by BuzzFeed News to understand gentrification, or how the character and demographics of neighborhoods change as more affluent people and businesses move in and potentially displace existing residents. The BuzzFeed News team has documented its process of data collection, cleaning, and analysis well. Be sure to read the accompanying article, which contains important context and details.

  • Gun Deaths in America. This repository contains the R scripts and CSV datasets associated with FiveThirtyEight’s Gun Deaths in America project. We recommend working with full.csv and reading the project page to understand the methodology used to compile this dataset.

Deliverables

You will be visualizing your dataset from two perspectives: the white hat and the black hat. As a result, you will be generating two static visualizations, one for each hat. We construe “visualization” broadly (e.g., a single visualization may comprise several small multiple views). You are free to use any visualization technique and any visualization tool, and you do not need to use the same tools or techniques for both visualizations. As with prior assignments, you should carefully consider not only visual encoding decisions but also how you might transform your data (e.g., calculating new fields; grouping, binning, or aggregating data; log transforms; etc.). You should also consider which annotations and labels would best help convey the message from a particular perspective.

For each visualization, document your decisions and describe your rationale in a short write-up (no more than 4 paragraphs per visualization). Note that subtlety is part of the rubric for the black hat visualization, which means we will likely rely heavily on your write-up for grading this visualization in particular.

Grading Criteria

The assignment score is out of a maximum of 10 points, evenly divided between the white hat and black hat visualizations. Submissions that squarely meet the requirements (i.e., the “Satisfactory” column in the rubric below) will receive a score of 8. We will determine scores by judging the clarity of your white hat visualization, the subtle deceptiveness of your black hat visualization, and the quality of the associated write-ups.

We will use the following rubric to grade your assignment. Note that rubric cells may not map exactly to specific point scores.

White hat

  • Marks & Encodings
      Excellent: All design choices are effective. The visualization can be read and understood effortlessly.
      Satisfactory: Design choices are largely effective, but minor errors hinder comprehension.
      Poor: Ineffective mark or encoding choices are distracting or potentially misleading.
  • Data Transformation
      Excellent: More advanced transformations (e.g., additional calculations, aggregations) were used to extend the dataset in interesting or useful ways.
      Satisfactory: Simple transforms (e.g., sorting, filtering) were primarily used.
      Poor: The raw dataset was used directly, with little to no additional transformation.
  • Titles & Labels
      Excellent: Titles and labels helpfully describe and contextualize the visualization.
      Satisfactory: Most necessary titles and labels are present, but they could provide more context.
      Poor: Many titles or labels are missing, or do not provide human-legible information.
  • Write-Up
      Excellent: Your write-up is well crafted and provides reasoned justification for all design choices.
      Satisfactory: Most design decisions are described, but the rationale could be explained in greater detail.
      Poor: The write-up is missing or incomplete, or several design choices are left unexplained.

Black hat

  • Deceptiveness
      Excellent: The visualization is misleading in at least 2 of these 3 categories: marks/encodings, data transformation, titles/labels.
      Satisfactory: The visualization is misleading in only 1 of these 3 categories: marks/encodings, data transformation, titles/labels.
      Poor: No black hat techniques were used.
  • Subtlety
      Excellent: The black hat techniques are very subtle and require close study to be identified, even by seasoned visualization readers.
      Satisfactory: The black hat techniques cannot be detected at first glance but are still somewhat easy to identify.
      Poor: The black hat techniques can be identified immediately.
  • Visualization Design (marks, encodings, data transformations, titles & labels)
      Excellent: Aspects of the visualization design make it appear interesting and possibly trustworthy.
      Satisfactory: The visualization design is of acceptable quality, but some aspects do not help convince the reader of its trustworthiness.
      Poor: The poor quality of the visualization design does not convince the reader that it is trustworthy (e.g., elements such as titles or legends are missing altogether).
  • Write-Up
      Excellent: Your write-up is well crafted and provides reasoned justification for all design choices, especially the black hat techniques you used.
      Satisfactory: Most design decisions are described, but the rationale could be explained in greater detail.
      Poor: The write-up is missing or incomplete, or several design choices are left unexplained.

Either / both hats

  • Creativity & Originality
      Excellent: You exceeded the parameters of the assignment, with original insights or a particularly engaging design.
      Satisfactory: You met all the parameters of the assignment.
      Poor: You met most of the parameters of the assignment.

Submission Details

You must work individually for this assignment. It is due Wednesday 3/24, 11:59 pm EST. Submit your assignment using this Google Form. The form expects your visualization to be a single image (either a .png or .jpg). Please make sure your image is sized for a reasonable viewing experience — readers should not have to zoom or scroll in order to view your submission.

Tips and Examples

In past years, we’ve had many questions about what constitutes a “black hat” technique and how obvious it needs to be. Indeed, it’s a tricky balancing act. For instance, “black hattedness” is not a strictly monotonic function. You can raise a reader’s suspicion by using too many black hat techniques at once, by making outright mistakes, or by making the misleading intent too obvious. Similarly, having an unclear (or no) data question, or omitting titles, axes, or legends, is more likely to hinder a reader’s overall ability to read or make sense of a visualization than to mislead them or slant its message.

So how should you navigate this design tension?

Acknowledgements

This assignment is heavily inspired by a similar effort from Niklas Elmqvist at the University of Maryland, College Park.