# Accessible Visualization via Natural Language Descriptions: A Four-Level Model of Semantic Content

IEEE Transactions on Visualization & Computer Graphics (Proc. IEEE VIS), 2022

## 1 Introduction

The proliferation of visualizations during the COVID-19 pandemic has underscored their double-edged potential: efficiently communicating critical public health information — as with the immediately-canonical “Flatten the Curve” chart (Fig. Teaser) — while simultaneously excluding people with disabilities. “For many people with various types of disabilities, graphics and the information conveyed in them is hard to read and understand,” says software engineer Tyler Littlefield [28], who built a popular text-based COVID-19 statistics tracker after being deluged with inaccessible infographics [94, 65]. While natural language descriptions sometimes accompany visualizations in the form of chart captions or alt text (short for “alternative text”), these practices remain rare. Technology educator and researcher Chancey Fleet notes that infographics and charts usually lack meaningful and detailed descriptions, leaving disabled people with “a feeling of uncertainty” about the pandemic [28]. For readers with visual disabilities (approximately 8.1 million in the United States and 253 million worldwide [1]), inaccessible visualizations are, at best, demeaning and, at worst, damaging to health, if not accompanied by meaningful and up-to-date alternatives.

Predating the pandemic, publishers and education specialists have long suggested best practices for accessible visual media, including guidelines for tactile graphics [41] and for describing “complex images” in natural language [99, 39]. While these practices are valuable, visualization authors have yet to broadly adopt them, for lack of experience with accessible media, if not a lack of attention and resources. Contemporary visualization research has primarily attended to color vision deficiency [21, 77, 79], and has only recently begun to engage with non-visual alternatives [25, 67] and with accessibility broadly [53, 105]. Parallel to these efforts, computer science researchers have been grappling with the engineering problem of automatically generating chart captions [27, 78, 84]. While well-intentioned, these methods usually neither consult existing accessibility guidelines, nor do they evaluate their results empirically with their intended readership. As a result, it is difficult to know how useful (or not) the resultant captions are, or how effectively they improve access to meaningful information.

In this paper, we make a two-fold contribution. First, we extend existing accessibility guidelines by introducing a conceptual model for categorizing and comparing the semantic content conveyed by natural language descriptions of visualizations. Developed through a grounded theory analysis of 2,147 natural language sentences, authored by over 120 participants in an online study (§ 3), our model spans four levels of semantic content: enumerating visualization construction properties (e.g., marks and encodings); reporting statistical concepts and relations (e.g., extrema and correlations); identifying perceptual and cognitive phenomena (e.g., complex trends and patterns); and elucidating domain-specific insights (e.g., social and political context) (§ 4). Second, we demonstrate how this model can be applied to evaluate the effectiveness of visualization descriptions, by comparing different semantic content levels and reader groups. We conduct a mixed-methods evaluation in which a group of 30 blind and 90 sighted readers rank the usefulness of descriptions authored at varying content levels (§ 5). Analyzing the resultant 3,600 ranked descriptions, we find significant differences in the content favored by these reader groups: while both groups generally prefer mid-level semantic content, they sharply diverge in their rankings of both the lowest and highest levels of our model.

These findings, contextualized by readers’ open-ended feedback, suggest that access to meaningful information is strongly reader-specific, and that captions for blind readers should aim to convey a chart’s trends and statistics, rather than solely detailing its low-level design elements or high-level insights. Our model of semantic content is not only descriptive (categorizing what is conveyed by visualizations) and evaluative (helping us to study what should be conveyed to whom) but also generative [7, 8], pointing toward novel multimodal and accessible data representations (§ 6.1). Our work further opens a space of research on natural language as a data interface coequal with the language of graphics [12], calling back to the original linguistic and semiotic motivations at the heart of visualization theory and design (§ 6.2).

## 2 Related Work

Multiple visualization-adjacent literatures have studied methods for describing charts and graphics through natural language — including accessible media research, Human-Computer Interaction (HCI), Computer Vision (CV), and Natural Language Processing (NLP). But, these various efforts have been largely siloed from one another, adopting divergent methods and terminologies (e.g., the terms “caption” and “description” are used inconsistently). Here, we survey the diverse terrain of literatures intersecting visualization and natural language.

### 2.1 Automatic Methods for Visualization Captioning

Automatic methods for generating visualization captions broadly fall into two categories: those using CV and NLP methods when the chart is a rasterized image (e.g., JPEGs or PNGs); and those using structured specifications of the chart’s construction (e.g., grammars of graphics).

#### 2.1.1 Computer Vision and Natural Language Processing

Analogous to the long-standing CV and NLP problem of automatically captioning photographic images [64, 58, 48], recent work on visualization captioning has aimed to automatically generate accurate and descriptive natural language sentences for charts [22, 24, 23, 6, 78, 59, 83]. Following the encoder-decoder framework of statistical machine translation [98, 107], these approaches usually take rasterized images of visualizations as input to a CV model (the encoder), which learns the visually salient features for outputting a relevant caption via a language model (the decoder). Training data consists of ⟨chart, caption⟩  pairs, collected via web-scraping and crowdsourcing [84], or created synthetically from pre-defined sentence templates [47]. While these approaches are well-intentioned, in aiming to address the engineering problem of how to automatically generate natural language captions for charts, they have largely sidestepped the complementary (and prior) question: which semantic content should be generated to begin with? Some captions may be more or less descriptive than others, and different readers may receive different semantic content as more or less useful, depending on their levels of data literacy, domain-expertise, and/or visual perceptual ability [69, 72, 71]. To help orient work on automatic visualization captioning, our four-level model of semantic content offers a means of asking and answering these more human-centric questions.

#### 2.1.2 Structured Visualization Specifications

In contrast to rasterized images of visualizations, chart templates [96], component-based architectures [38], and grammars of graphics [87] provide not only a structured representation of the visualization’s construction, but typically render the visualization in a structured manner as well. For instance, most of these approaches either render the output visualization as Scalable Vector Graphics (SVG) or provide a scenegraph API. Unfortunately, these output representations lose many of the semantics of the structured input (e.g., which elements correspond to axes and legends, or how nesting corresponds to visual perception). As a result, most present-day visualizations are inaccessible to people who navigate the web using screen readers. For example, using Apple’s VoiceOver to read D3 charts rendered as SVG usually outputs an inscrutable mess of screen coordinates and shape rendering properties. Visualization toolkits can ameliorate this by leveraging their structured input to automatically add Accessible Rich Internet Applications (ARIA) attributes to appropriate output elements, in compliance with the World Wide Web Consortium (W3C)’s Web Accessibility Initiative (WAI) guidelines [99]. Moreover, this structured input representation can also simplify automatically generating natural language captions through template-based mechanisms, as we discuss in § 4.1.
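As a rough illustration of how a toolkit's structured input could drive both steps, the sketch below fills a fixed sentence template from a chart specification and attaches the result to the root SVG element as its accessible name. The specification shape (loosely modeled on Vega-Lite), the function names, and the sentence template are all assumptions made for illustration, not the method of any particular toolkit; only `role="img"` and `aria-label` are standard attributes.

```python
from xml.sax.saxutils import quoteattr

# Hypothetical, Vega-Lite-like specification; the field names are illustrative only.
EXAMPLE_SPEC = {
    "title": "Daily new cases",
    "mark": "line",
    "encoding": {
        "x": {"field": "date"},
        "y": {"field": "new cases"},
    },
}


def describe_construction(spec: dict) -> str:
    """Fill a fixed template with the chart's mark type, title, and axis fields."""
    enc = spec["encoding"]
    return (
        f"A {spec['mark']} chart titled {spec['title']}. "
        f"The x-axis shows {enc['x']['field']} and "
        f"the y-axis shows {enc['y']['field']}."
    )


def svg_open_tag(spec: dict) -> str:
    """Build an opening <svg> tag whose accessible name is the templated sentence."""
    # quoteattr escapes &, <, > and wraps the value in quotes.
    return f'<svg role="img" aria-label={quoteattr(describe_construction(spec))}>'


if __name__ == "__main__":
    print(svg_open_tag(EXAMPLE_SPEC))
```

A template like this can only surface what the specification already states (title, mark, encodings); content about trends or context would have to come from the data or from a human author.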

### 2.2 Accessible Media and Human-Computer Interaction

While automatic methods researchers often note accessibility as a worthy motivation [27, 78, 84, 83, 30, 31], evidently few have collaborated directly with disabled people [25, 71] or consulted existing accessibility guidelines [67]. Doing so is more common to HCI and accessible media literatures [73, 91], which broadly separate into two categories corresponding to the relative expertise of the description authors: those authored by experts (e.g., publishers of accessible media) and those authored by non-experts (e.g., via crowdsourcing or online platforms).

#### 2.2.1 Descriptions Authored by Experts

Publishers have developed guidelines for describing graphics appearing in science, technology, engineering, and math (STEM) materials [39, 9]. Developed by and for authors with some expert accessibility knowledge, these guidelines provide best practices for conveying visualized content in traditional media (e.g., printed textbooks, audio books, and tactile graphics). But, many of their prescriptions (particularly those relating to the content conveyed by a chart, rather than the modality through which the chart is rendered) are also applicable to web-based visualizations. Additionally, web accessibility guidelines from W3C provide best practices for writing descriptions of “complex images” (including canonical chart types), either in a short description alt text attribute, or as a long textual description displayed alongside the visual image [99]. While some of these guidelines have been adopted by visualization practitioners [19, 88, 29, 32, 102, 101, 34], we here bring special attention to the empirically-grounded and well-documented guidelines created by the WGBH National Center for Accessible Media [39] and by the Benetech Diagram Center [9].

#### 2.2.2 Descriptions Authored by Non-Experts

Frequently employed in HCI and visualization research, crowdsourcing is a technique whereby remote non-experts complete tasks currently infeasible for automatic methods, with applications to online accessibility [13], as well as remote description services like Be My Eyes. For example, Morash et al. explored the efficacy of two types of non-expert tasks for authoring descriptions of visualizations: non-experts authoring free-form descriptions without expert guidance, versus those filling-in sentence templates pre-authored by experts [72]. While these approaches can yield more richly detailed and “natural”-sounding descriptions (as we discuss in § 5), and also provide training data for auto-generated captions and annotations [84, 56], it is important to be attentive to potential biases in human-authored descriptions [10].

### 2.3 Natural Language Hierarchies and Interfaces

Apart from the above methods for generating descriptions, prior work has adopted linguistics-inspired framings to elucidate how natural language is used to describe — as well as interact with — visualizations.

#### 2.3.1 Using Natural Language to Describe Visualizations

Demir et al. have proposed a hierarchy of six syntactic complexity levels corresponding to a set of propositions that might be conveyed by bar charts [27]. Our model differs in that it orders semantic content (i.e., what meaning the natural language sentence conveys) rather than how it does so syntactically. Thus, our model is agnostic to a sentence’s length and to whether it contains multiple clauses or conjunctions, properties that have also been a focus of prior work in automatic captioning [84]. Moreover, whereas Demir et al. speculatively “envision” their set of propositions to construct their hierarchy, we arrive at our model empirically through a multi-stage grounded theory process (§ 3). Perhaps closest to our contribution are a pair of papers by Kosslyn [57] and Livingston & Brock [66]. Kosslyn draws on canonical linguistic theory to introduce three levels for analyzing charts: the syntactic relationship between a visualization’s elements; the semantic meaning of these elements in what they depict or convey; and the pragmatic aspects of what these elements convey in the broader context of their reading [57]. We seeded our model construction with a similar linguistics-inspired framing, but also evaluated it empirically, to further decompose the semantic levels (§ 3.1). Livingston & Brock adapt Kosslyn’s ideas to generate what they call “visual sentences”: natural language sentences that are the result of executing a single, specific analytic task against a visualization [66]. Inspired by the Sentence Verification Technique (SVT) [85, 86], this work considers visual sentences for assessing graph comprehension, hoping to offer a more “objective” and automated alternative to existing visualization literacy assessments [63, 35]. While we adopt a more qualitative process for constructing our model, Livingston & Brock’s approach suggests opportunities for future work: might our model map to similarly-hierarchical models of analytic tasks [17, 5]?

#### 2.3.2 Using Natural Language to Interact with Visualizations

Adjacently, there is a breadth of work on Natural Language Interfaces (NLIs) for constructing and exploring visualizations [75, 42, 90, 50]. While our model primarily considers the natural language sentences that are conveyed by visualizations (cf., natural language as input for chart specification and exploration) [93], our work may yet have implications for NLIs. For example, Hearst et al. have found that many users of chatbots prefer not to see charts and graphics alongside text in the conversational dialogue interface [43]. By helping to decouple visual-versus-linguistic data representations, our model might be applied to offer these users a textual alternative to inline charts. Thus, we view our work as complementary to NLIs, facilitating multimodal and more accessible data representations [51], while helping to clarify the theoretical relationship between charts and captions [52, 80], and other accompanying text [106, 54, 55, 2].

## 3 Constructing the Model: Employing the Grounded Theory Methodology

To construct our model of semantic content we conducted a multi-stage process, following the grounded theory methodology. Often employed in HCI and the social sciences, grounded theory offers a rigorous method for making sense of a domain that lacks a dominant theory, and for constructing a new theory that accounts for diverse phenomena within that domain [74]. The methodology approaches theory construction inductively — through multiple stages of inquiry, data collection, “coding” (i.e., labeling and categorizing), and refinement — as well as empirically, remaining strongly based (i.e., “grounded”) in the data [74]. To construct our model of semantic content, we proceeded in two stages. First, we conducted small-scale data collection and initial open coding to establish preliminary categories of semantic content. Second, we gathered a larger-scale corpus to iteratively refine those categories, and to verify their coverage over the space of natural language descriptions.

### 3.1 Initial Open Coding

We began gathering preliminary data by searching for descriptions accompanying visualizations in journalistic publications (including the websites of FiveThirtyEight, the New York Times and the Financial Times), but found that these professional sites usually provided no textual descriptions — neither as a caption alongside the chart, nor as alt text for screen readers. Indeed, often these sites were engineered so that screen readers would pass over the visualizations entirely, as if they did not appear on the page at all. Thus, to proceed with the grounded theory method, we conducted initial open coding (i.e., making initial, qualitative observations about our data, in an “open-minded” fashion) by studying preliminary data from two sources. We collected 330 natural language descriptions from over 100 students enrolled in a graduate-level data visualization class. As a survey-design pilot to inform future rounds of data collection (§ 3.2.1), these initial descriptions were collected with minimal prompting: students were instructed to simply “describe the visualization” without specifying what kinds of semantic content that might include. The described visualizations covered a variety of chart types (e.g., bar charts, line charts, scatter plots) as well as dataset domains (e.g., public health, climate change, and gender equality). To complement the student-authored descriptions, from this same set of visualizations, we curated a set of 20 and wrote our (the authors’) own descriptions, attempting to be as richly descriptive as possible. Throughout, we adhered to a linguistics-inspired framing by attending to the semantic and pragmatic aspects of our writing: which content could be conveyed through the graphical sign-system alone, and which required drawing upon our individual background knowledge, experiences, and contexts.

Analyzing these preliminary data, we proceeded to the next stage in the grounded theory method: forming axial codes (i.e., open codes organized into broader abstractions, with more generalized meaning [74]) corresponding to different content. We began to distinguish between content about a visualization’s construction (e.g., its title, encodings, legends), content about trends appearing in the visualized data (e.g., correlations, clusters, extrema), and content relevant to the visualized data but not represented in the visualization itself (e.g., explanations based on current events and domain-specific knowledge). From these axial codes, different categories (i.e., groupings delineated by shared characteristics of the content) began to emerge [74], corresponding to a chart’s encoded elements, latent statistical relations, perceptual trends, and context. We refined these content categories iteratively by first writing down descriptions of new visualizations (again, as richly as possible), and then attempting to categorize each sentence appearing in that description. If we encountered a sentence that didn’t fit within any category, we either refined the specific characteristics belonging to an existing category, or we created a new category, where appropriate.

### 3.2 Gathering A Corpus

The prior inductive and empirical process resulted in a set of preliminary content categories. To test their robustness, and to further refine them, we conducted an online survey to gather a larger-scale corpus of 582 visualization descriptions comprising 2,147 sentences.

#### 3.2.1 Survey Design

We first curated a set of 50 visualizations drawn from the MassVis dataset [16, 15], Quartz’s Atlas visualization platform [81], examples from the Vega-Lite gallery [87], and the aforementioned journalistic publications. We organized these visualizations along three dimensions: the visualization type (bar charts, line charts, and scatter plots); the topic of the dataset domain (academic studies, business-related, or non-business data journalism); and their difficulty based on an assessment of their visual and conceptual complexity. We labeled visualizations as “easy” if they were basic instances of their canonical type (e.g., single-line or un-grouped bar charts), as “medium” if they were more moderate variations on canon (e.g., contained bar groupings, overlapping scatterplot clusters, visual embellishments, or simple transforms), and as “hard” if they strongly diverged from canon (e.g., contained chartjunk or complex transforms such as log scales). To ensure robustness, two authors labeled the visualizations independently, and then resolved any disagreement through discussion. Table 1 summarizes the breakdown of the 50 visualizations across these three dimensions.

In the survey interface, participants were shown a single, randomly-selected visualization at a time, and prompted to describe it in complete English sentences. In our preliminary data collection (§ 3.1), we found that without explicit prompting participants were likely to provide only brief and minimally informative descriptions (e.g., sometimes simply repeating the chart title and axis labels). Thus, to mitigate this outcome, and to elicit richer semantic content, we explicitly instructed participants to author descriptions that did not only refer to the chart’s basic elements and encodings (e.g., its title, axes, colors) but also referred to other content, trends, and insights that might be conveyed by the visualization. To make these instructions intelligible, we provided participants with a few pre-generated sentences enumerating the visualization’s basic elements and encodings (e.g., the color coded sentences in Table 3 A.1, B.1, C.1), and prompted them to author semantic content apart from what was already conveyed by those sentences. To avoid biasing their responses, participants were not told that their descriptions would be read by people with visual disabilities. This prompting ensured that the survey captured a breadth of semantic content, and not only the most readily-apparent aspects of the visualization’s construction.

#### 5.2.2 Participant Demographics

Among the 30 blind participants, 53% (n=16) reported their gender as male, 36% (n=11) as female, and 3 participants “preferred not to say.” The most common highest level of education attained was a Bachelor’s degree (60%, n=18), and most readers were between 20 – 40 years old (66%, n=20). The screen reader technology readers used to complete the study was evenly balanced: VoiceOver (n=10), JAWS (n=10), NVDA (n=9), and “other” (n=1). Among the 90 sighted participants, 69% reported their gender as male (n=62) and 31% as female (n=28). The most common highest level of education attained was a high school diploma (42%, n=38) followed by a Bachelor’s degree (40%, n=36), and most sighted readers were between 20 – 30 years old (64%, n=58).

On a 7-point Likert scale [1=strongly disagree, 7=strongly agree], blind participants reported having “a good understanding of data visualization concepts” ($\mu=6.3$, $\sigma=1.03$) as well as “a good understanding of statistical concepts and terminology” ($\mu=5.90$, $\sigma=1.01$). Sighted participants reported similar levels of understanding: ($\mu=6.7$, $\sigma=0.73$) and ($\mu=5.67$, $\sigma=1.06$), respectively. Sighted participants also considered themselves to be “proficient at reading data visualizations” ($\mu=5.97$, $\sigma=0.89$) and were able to “read and understand all of the visualizations presented in this study” ($\mu=6.44$, $\sigma=0.71$).

### 5.3 Quantitative Results

Quantitative results for the individual rankings (1,800 for each of the blind and sighted reader groups) are summarized by the heatmaps in Table 4 (Upper Subtable), which aggregate the number of times a given content level was assigned a certain rank. Dotted lines in both blind and sighted heatmaps delineate regions exceeding a threshold, calculated by taking the mean plus half a standard deviation ($\mu+\frac{\sigma}{2}$) and resulting in values of 139 and 136, respectively; these regions are labeled with capital letters A – F.
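The threshold rule can be sketched as follows. The counts below are invented purely for illustration (they are not the values in Table 4), and the use of the population standard deviation is an assumption, since the text does not say which estimator was used; the function names are likewise hypothetical.

```python
from statistics import mean, pstdev

# Invented counts for illustration only (rows: content Levels 1-4, columns: ranks 1-4).
# Each row and column sums to 450, as it would for 1,800 rankings over four levels.
HYPOTHETICAL_COUNTS = [
    [100, 60, 90, 200],
    [120, 150, 130, 50],
    [130, 160, 120, 40],
    [100, 80, 110, 160],
]


def highlight_threshold(counts):
    """Return mean + sigma/2 over all heatmap cells (population sigma assumed)."""
    cells = [c for row in counts for c in row]
    return mean(cells) + pstdev(cells) / 2


def cells_above_threshold(counts):
    """List (level, rank) pairs, 1-indexed, whose count exceeds the threshold."""
    t = highlight_threshold(counts)
    return [
        (i + 1, j + 1)
        for i, row in enumerate(counts)
        for j, c in enumerate(row)
        if c > t
    ]


if __name__ == "__main__":
    print(highlight_threshold(HYPOTHETICAL_COUNTS))
    print(cells_above_threshold(HYPOTHETICAL_COUNTS))
```

Because every group contributes 1,800 rankings to 16 cells, the mean cell count is fixed at 112.5; only the spread of the counts moves the threshold.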

These results exhibit significant differences between reader groups. For both reader groups, using Friedman’s test (a non-parametric multi-comparison test for rank-order data) the p-value is $p<0.001$, so we reject the null hypothesis that the mean rank is the same for all four semantic content levels [37]. Additionally, in Table 4 (Lower Subtable), we find significant ranking differences when making pair-wise comparisons between levels, via Nemenyi’s test (a post-hoc test commonly coupled with Friedman’s to make pair-wise comparisons). There appears to be strong agreement among sighted readers that higher levels of semantic content are more useful: Levels 3 and 4 are found to be most useful (Region 4.F), while Levels 1 and 2 are least useful (Regions 4.D and 4.E). Blind readers agree with each other to a lesser extent, but strong trends are nevertheless apparent. In particular, blind readers rank content at Levels 2 and 3 as most useful (Region 4.C), and semantic content at Levels 1 and 4 as least useful (Regions 4.A and 4.B).
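For readers unfamiliar with Friedman’s test, the following is a minimal sketch of the textbook statistic for complete, tie-free rankings, $Q = \frac{12}{nk(k+1)}\sum_j R_j^2 - 3n(k+1)$, where $n$ is the number of ranking trials, $k=4$ the number of content levels, and $R_j$ the rank sum of level $j$. It is not the analysis code used in the study (the text does not name the software), the function names are hypothetical, and the p-value relies on the usual large-sample chi-square approximation with $k-1=3$ degrees of freedom.

```python
import math


def friedman_statistic(rankings):
    """Friedman chi-square statistic for complete, tie-free rankings.

    `rankings` holds one list per trial; entry j is the rank (1..k) given to level j+1.
    """
    n, k = len(rankings), len(rankings[0])
    rank_sums = [sum(trial[j] for trial in rankings) for j in range(k)]
    return 12.0 / (n * k * (k + 1)) * sum(r * r for r in rank_sums) - 3.0 * n * (k + 1)


def chi2_sf_3df(x):
    """Upper-tail probability of a chi-square variable with 3 degrees of freedom."""
    if x <= 0:
        return 1.0
    return math.erfc(math.sqrt(x / 2)) + math.sqrt(2 * x / math.pi) * math.exp(-x / 2)


if __name__ == "__main__":
    # Invented trials: every reader gives the four levels the same ranks.
    unanimous = [[1, 2, 3, 4]] * 10
    q = friedman_statistic(unanimous)
    print(q, chi2_sf_3df(q))
```

A significant result only says that the levels are not ranked alike overall; identifying which pairs of levels differ requires a separate post-hoc comparison such as Nemenyi’s test, which is not implemented here.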

When faceting these rankings by visualization type, topic, or difficulty we did not observe any significant differences, suggesting that both reader groups rank semantic content levels consistently, regardless of how the chart itself may vary. Noteworthy for both reader groups, the distribution of rankings for Level 1 is bimodal, the only level to exhibit this property. While a vast majority of both blind and sighted readers rank Level 1 content as least useful, this level is ranked “most useful” in 101 and 87 instances by blind and sighted readers, respectively. This suggests that both reader groups have a more complicated perspective toward descriptions of a chart’s elemental and encoded properties, a finding we explore further by analyzing qualitative data.

### 5.4 Qualitative Results

In a questionnaire, we asked readers to use a 7-point Likert scale [1=strongly disagree, 7=strongly agree] to rate their agreement with a set of statements about their experience with visualizations. We also asked them to offer open-ended feedback about which semantic content they found to be most useful and why. Here, we summarize the key trends that emerged from these two different forms of feedback, from both blind readers (BR) and sighted readers (SR).

#### 5.4.1 Descriptions Are Important to Both Reader Groups

All blind readers reported encountering inaccessible visualizations: either multiple times a week (43%, n=13), every day (20%, n=6), once or twice a month (20%, n=6), or at most once a week (17%, n=5). These readers reported primarily encountering these barriers on social media (30%, n=9), on newspaper websites (13%, n=4), and in educational materials (3%, n=1); most often, however, barriers were encountered in all of the above contexts (53%, n=16). Blind readers overwhelmingly agreed with the statements “I often feel that important public information is inaccessible to me, because it is only available in a visual format” ($\mu=6.1$, $\sigma=1.49$), and “Providing textual descriptions of data visualizations is important to me” ($\mu=6.83$, $\sigma=0.38$).

“I am totally blind, and virtually all data visualizations I encounter are undescribed, and as such are unavailable. This has been acutely made clear on Twitter and in newspapers around the COVID-19 pandemic and the recent U.S. election. Often, visualizations are presented with very little introduction or coinciding text. I feel very left out of the world and left out of the ability to confidently traverse that world. The more data I am unable to access, the more vulnerable and devalued I feel.” (BR5)

By contrast, sighted readers neither agreed nor disagreed regarding the inaccessibility of information conveyed visually ($\mu=4$, $\sigma=1.57$). Similarly, they were split on whether they ever experienced barriers to reading visualizations, with 52% (n=47) reporting that they sometimes do (especially when engaging with a new topic) and 48% (n=43) reporting that they usually do not. Nevertheless, sighted readers expressed support for natural language descriptions of visualizations ($\mu=5.60$, $\sigma=1.27$). A possible explanation for this support is that — regardless of whether the visualization is difficult to read — descriptions can still facilitate comprehension. For instance, SR64 noted that “textual description requires far less brainpower and can break down a seemingly complex visualization into an easy to grasp overview.”

A majority of blind readers (63%, n=19) were emphatic that descriptions should not contain an author’s subjective interpretations, contextual information, or editorializing about the visualized data (i.e., Level 4 content). Consistent with blind readers ranking this content as among the least useful (Region 4.B), BR20 succinctly articulated a common sentiment: “I want the information to be simply laid out, not peppered with subjective commentary… I just prefer it to be straight facts, not presumptions or guesstimates.” BR4 also noted that an author’s “opinions” about the data “should absolutely be avoided,” and BR14 emphasized agency when interpreting data: “I want to have the time and space to interpret the numbers for myself before I read the analysis.” By contrast, many sighted readers (41%, n=37) expressed the opposite sentiment (Region 4.F), noting that, for them, the most useful descriptions often “told a story,” communicated an important conclusion, or provided deeper insights into the visualized data. As SR64 noted: “A description that simply describes the visualization and its details is hardly useful, but a description that tells a story using the data and derives a solution from it is extremely useful.” Only 4% (n=4) of sighted readers explicitly stated that a description should exclude Level 4 semantic content.

#### 5.4.3 Some Readers Prefer Non-Statistical Content

Overall, blind readers consistently ranked both Levels 2 and 3 as the most useful (Region 4.C). But, some readers explicitly expressed preference for the latter over the former, highlighting two distinguishing characteristics of Level 3 content: that it conveys not only descriptive statistics but overall perceptible trends, and that it is articulated in commonplace or “natural”-sounding language. For instance, BR26 remarked that a visualization description is “more useful if it contains the summary of the overall trends and distributions of the data rather than just mentioning some of the extreme values or means.” Similarly, BR21 noted that “not everyone who encounters a data visualization needs it for statistical purposes,” and further exclaimed “I want to know how a layperson sees it, not a statistician; I identify more with simpler terminology.” These preferences help to further delineate Level 3 from Levels 2 and 4. Content at Level 3 is “non-statistical” in the sense that it does not only report statistical concepts and relations (as in Level 2), but neither does it do away with statistical “objectivity” entirely, so as to include subjective interpretation or speculation (as content in Level 4 might). In short, Level 3 content conveys statistically-grounded concepts in not-purely-statistical terms, a challenge that is core to visualization, and science communication more broadly.

#### 5.4.4 Combinations of Content Levels Are Likely Most Useful

While roughly 12% of readers from both blind and sighted groups indicated that a description should be as concise as possible, among blind readers, 40% (n=12) noted that the most useful descriptions would combine content from multiple levels. This finding helps to explain the bimodality in Level 1 rankings we identified in the previous section. According to BR9, Level 1 content is only useful if other information is also conveyed: “All of the descriptions provided in this survey which *only* elaborated on x/y and color-coding are almost useless.” This sentiment was echoed by BR5, who added that if Level 1 content were “combined with the [Level 2 or Level 3], that’d make for a great description.” This finding has implications for research on automatic visualization captioning: these methods should aim to generate not only the lower levels of semantic content, but to more richly communicate a chart’s overall trends and statistics, sensitive to reader preferences.

#### 5.4.5 Some Automatic Methods Raise Ethical Concerns

Research on automatically generating visualization captions is often motivated by the goal of improving information access for people with visual disabilities [83, 27, 78, 84]. However, when deployed in real-world contexts, these methods may not confer their intended benefits, as one blind reader in our evaluation commented.

“A.I. attempting to convert these images is still in its infancy. Facebook and Apple auto-descriptions of general images are more of a timewaster than useful. As a practical matter, if I find an inaccessible chart or graph, I just move on.” (BR22)

Similarly, another participant (BR26) noted that if a description were to only describe a visualization’s encodings then “the reader wouldn’t get any insight from these texts, which not only increases the readers’ reading burden but also conveys no effective information about the data.” These sentiments reflect some of the ethical concerns surrounding the deployment of nascent CV and NLP models, which can output accurate but minimally informative content — or worse, can output erroneous content to a trusting audience [69, 78]. Facebook’s automatic image descriptions, for example, have been characterized by technology educator Chancey Fleet as “famously useless in the Blind community” while “garner[ing] a ton of glowing reviews from mainstream outlets without being of much use to disabled people” [33, 40]. Such concerns might be mitigated by developing and evaluating automatic methods with disabled readers, through participatory design processes [67].

## 6 Discussion and Future Work

Our four-level model of semantic content — and its application to evaluating the usefulness of descriptions — has practical implications for the design of accessible data representations, and theoretical implications for the relationship between visualization and natural language.

### 6.1 Natural Language As An Interface Into Visualization

Divergent reader preferences for semantic content suggest that it is helpful to think of natural language not only as an interface for constructing and exploring visualizations [93, 36, 89], but also as an interface into visualization, for understanding the semantic content they convey. Under this framing, we can apply Beaudouin-Lafon’s framework for evaluating interface models in terms of their descriptive, evaluative, and generative powers [7, 8], to bring further clarity to the practical design implications of our model. First, our grounded theory process yielded a model with descriptive power: it categorizes the semantic content conveyed by visualizations. Second, our study with blind and sighted readers demonstrated our model’s evaluative power: it offered a means of comparing different levels of semantic content, thus revealing divergent preferences between these different reader groups. Third, future work can now begin to study our model’s generative power: its implications for novel multimodal interfaces and accessible data representations. For instance, our evaluation suggested that descriptions primarily intending to benefit sighted readers might aim to generate higher-level semantic content (§ 5.4.2), while those intending to benefit blind readers might instead focus on affording readers the option to customize and combine different content levels (§ 5.4.4), depending on their individual preferences (§ 5.4.3). This latter path might involve automatically ARIA tagging web-based charts to surface semantic content at Levels 1 & 2, with human authors conveying Level 3 content. Or, it might involve applying our model to develop and evaluate the outputs of automatic captioning systems (to probe their technological capabilities and ethical implications) in collaboration with the relevant communities (§ 5.4.5). To facilitate this work, we have released our corpus of visualizations and labeled sentences under an open source license: vis.csail.mit.edu/pubs/vis-text-model/data/.
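As one illustration of the ARIA-tagging path described above, the following minimal sketch composes a description from content levels that a reader selects. It is not an implementation from our study: the spec dictionary, the function names (`level1`, `level2`, `describe`, `svg_open_tag`), and the sentence templates are all hypothetical, and the Level 3 sentence is assumed to be supplied by a human author rather than generated.

```python
from html import escape
from statistics import mean


def level1(spec):
    """Level 1: chart type and encodings, read from a (hypothetical) chart spec."""
    return (f"A {spec['mark']} chart titled '{spec['title']}'. "
            f"The x-axis shows {spec['x']} and the y-axis shows {spec['y']}.")


def level2(spec, rows):
    """Level 2: descriptive statistics (extrema and mean) computed from the data."""
    values = [value for _, value in rows]
    lo = min(rows, key=lambda r: r[1])
    hi = max(rows, key=lambda r: r[1])
    return (f"Values of {spec['y']} range from {lo[1]} ({lo[0]}) "
            f"to {hi[1]} ({hi[0]}), with a mean of {mean(values):.1f}.")


def describe(spec, rows, levels=(1, 2, 3), level3_text=None):
    """Combine only the content levels the reader asked for, in level order."""
    parts = []
    if 1 in levels:
        parts.append(level1(spec))
    if 2 in levels:
        parts.append(level2(spec, rows))
    if 3 in levels and level3_text:
        # Level 3 (trends, patterns) is assumed to be human-authored here.
        parts.append(level3_text)
    return " ".join(parts)


def svg_open_tag(description):
    """Attach the description to an SVG chart as an escaped aria-label."""
    return f'<svg role="img" aria-label="{escape(description, quote=True)}">'


# Hypothetical example data: (label, value) pairs for a small line chart.
spec = {"mark": "line", "title": "Weekly cases", "x": "week", "y": "cases"}
rows = [("W1", 10), ("W2", 30), ("W3", 20)]
print(svg_open_tag(describe(spec, rows, levels=(2, 3),
                            level3_text="Cases rise sharply, then decline.")))
```

Keeping each level in its own function means a reader-facing setting could toggle levels independently, in line with the preference for combinable content reported in § 5.4.4.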

### 6.2 Natural Language As Coequal With Visualization

In closing, we turn to a discussion of our model’s implications for visualization theory. Not only can we think of natural language as an interface into visualization (as above), but also as an interface into data itself; coequal with and complementary to visualization. For example, some semantic content (e.g., Level 2 statistics or Level 4 explanations) may be best conveyed via language, without any reference to visual modalities [82, 43], while other content (e.g., Level 3 clusters) may be uniquely suited to visual representation. This coequal framing is not a departure from orthodox visualization theory, but rather a return to its linguistic and semiotic origins. Indeed, at the start of his foundational Semiology of Graphics, Jacques Bertin introduces a similar framing to formalize an idea at the heart of visualization theory: content can be conveyed not only through speaking or writing but also through the “language” of graphics [12]. While Bertin took natural language as a point of departure for formalizing a language of graphics, we have here pursued the inverse: taking visualization as occasioning a return to language. This theoretical inversion opens avenues for future work, for which linguistic theory and semiotics are instructive [103, 68, 97].

Within the contemporary linguistic tradition, subfields like syntax, semantics, and pragmatics suggest opportunities for further analysis at each level of our model. And, since our model focuses on English sentences and canonical chart types, extensions to other languages and bespoke charts may be warranted. Within the semiotic tradition, Christian Metz (a contemporary of Bertin’s) emphasized the pluralistic quality of graphics [18]: the semantic content conveyed by visualizations depends not only on their graphical sign-system, but also on various “social codes” such as education, class, expertise, and — we hasten to include — ability. Our evaluation with blind and sighted readers (as well as work studying how charts are deployed in particular discourse contexts [44, 46, 62, 3]) lends credence to Metz’s conception of graphics as pluralistic: different readers will have different ideas about what makes visualizations meaningful (Fig. Teaser). As a means of revealing these differences, we have here introduced a four-level model of semantic content. We leave further elucidation of the relationship between visualization and natural language to future work.

## Acknowledgements

For their valuable feedback, we thank Emilie Gossiaux, Chancey Fleet, Michael Correll, Frank Elavsky, Beth Semel, Stephanie Tuerk, Crystal Lee, and the MIT Visualization Group. This work was supported by National Science Foundation GRFP-1122374 and III-1900991.

## References

• [1] P. Ackland, S. Resnikoff, and R. Bourne (2017) World Blindness and Visual Impairment. Community Eye Health. ISSN 0953-6833. Cited by: §1.
• [2] E. Adar and E. Lee (2020) Communicative Visualizations as a Learning Problem. In TVCG, (en). Cited by: §2.3.2.
• [3] G. Aiello (2020) Inventorizing, Situating, Transforming: Social Semiotics And Data Visualization. In Data Visualization in Society, M. Engebretsen and H. Kennedy (Eds.), Cited by: §6.2.
• [4] K. M. Ali (2016) Mind-Dependent Kinds. In Journal of Social Ontology, Cited by: §4.3.
• [5] R. Amar, J. Eagan, and J. Stasko (2005) Low-level Components Of Analytic Activity In Information Visualization. In INFOVIS, Cited by: §2.3.1.
• [6] A. Balaji, T. Ramanathan, and V. Sonathi (2018) Chart-Text: A Fully Automated Chart Image Descriptor. arXiv. Cited by: §2.1.1.
• [7] M. Beaudouin-Lafon (2000) Instrumental Interaction: An Interaction Model For Designing Post-WIMP User Interfaces. In CHI. Cited by: §1, §6.1.
• [8] M. Beaudouin-Lafon (2004) Designing Interaction, Not Interfaces. In AVI. Cited by: §1, §6.1.
• [9] Benetech Making Images Accessible. (en-US). Note: http://diagramcenter.org/making-images-accessible.html/ Cited by: §2.2.1.
• [10] C. L. Bennett, C. Gleason, M. K. Scheuerman, J. P. Bigham, A. Guo, and A. To (2021) “It’s Complicated”: Negotiating Accessibility and (Mis)Representation in Image Descriptions of Race, Gender, and Disability. In CHI, Cited by: §2.2.2.
• [11] C. T. Bergstrom (2020) SARS-CoV-2 Coronavirus. Note: http://ctbergstrom.com/covid19.html Cited by: Teaser.
• [12] J. Bertin (1983) Semiology of Graphics. University of Wisconsin Press. Cited by: §1, §4.1, §6.2.
• [13] J. P. Bigham, C. Jayant, H. Ji, G. Little, A. Miller, R. C. Miller, R. Miller, A. Tatarowicz, B. White, S. White, and T. Yeh (2010) VizWiz: Nearly Real-time Answers To Visual Questions. In UIST, Cited by: §2.2.2.
• [14] J. P. Bigham, I. Lin, and S. Savage (2017) The Effects of “Not Knowing What You Don’t Know” on Web Accessibility for Blind Web Users. In ASSETS, Cited by: §5.2.
• [15] M. A. Borkin, Z. Bylinskii, N. W. Kim, C. M. Bainbridge, C. S. Yeh, D. Borkin, H. Pfister, and A. Oliva (2016) Beyond Memorability: Visualization Recognition and Recall. In TVCG, Cited by: §3.2.1.
• [16] M. A. Borkin, A. A. Vo, Z. Bylinskii, P. Isola, S. Sunkavalli, A. Oliva, and H. Pfister (2013) What Makes a Visualization Memorable?. In TVCG, Cited by: §3.2.1.
• [17] M. Brehmer and T. Munzner (2013) A Multi-Level Typology of Abstract Visualization Tasks. In TVCG, (en). Cited by: §2.3.1.
• [18] A. Campolo (2020) Signs and Sight: Jacques Bertin and the Visual Language of Structuralism. Grey Room. Cited by: §6.2.
• [19] A. Cesal (2020-08) Writing Alt Text for Data Visualization. (en). Cited by: §2.2.1.
• [20] R. Chang, C. Ziemkiewicz, T. M. Green, and W. Ribarsky (2009) Defining Insight for Visual Analytics. In CG&A, Cited by: §4.4.
• [21] A. Chaparro and M. Chaparro (2017) Applications of Color in Design for Color-Deficient Users. Ergonomics in Design (en). ISSN 1064-8046. Cited by: §1.
• [22] C. Chen, R. Zhang, S. Kim, S. Cohen, T. Yu, R. Rossi, and R. Bunescu (2019) Neural Caption Generation Over Figures. In UbiComp/ISWC ’19 Adjunct, Cited by: §2.1.1.
• [23] C. Chen, R. Zhang, E. Koh, S. Kim, S. Cohen, and R. Rossi (2020) Figure Captioning with Relation Maps for Reasoning. In WACV, (en). Cited by: §2.1.1.
• [24] C. Chen, R. Zhang, E. Koh, S. Kim, S. Cohen, T. Yu, R. Rossi, and R. Bunescu (2019) Figure Captioning with Reasoning and Sequence-Level Training. arXiv. Cited by: §2.1.1.
• [25] J. Choi, S. Jung, D. G. Park, J. Choo, and N. Elmqvist (2019) Visualizing for the Non-Visual: Enabling the Visually Impaired to Use Visualization. In CGF, (en). Cited by: §1, §2.2.
• [26] S. Demir, S. Carberry, and K. F. McCoy (2008) Generating Textual Summaries Of Bar Charts. In INLG, (en). Cited by: §4.2.
• [27] S. Demir, S. Carberry, and K. F. McCoy (2012) Summarizing Information Graphics Textually. In Computational Linguistics, Cited by: §1, §2.2, §2.3.1, §5.4.5.
• [28] M. Ehrenkranz (2020) Vital Coronavirus Information Is Failing the Blind and Visually Impaired. Vice. Cited by: §1, Teaser.
• [29] F. Elavsky (2021) Chartability. Note: https://chartability.fizz.studio/ Cited by: §2.2.1.
• [30] S. Elzer, S. Carberry, D. Chester, S. Demir, N. Green, I. Zukerman, and K. Trnka (2005) Exploring And Exploiting The Limited Utility Of Captions In Recognizing Intention In Information Graphics. In ACL, Cited by: §2.2.
• [31] S. Elzer, E. Schwartz, S. Carberry, D. Chester, S. Demir, and P. Wu (2007) A Browser Extension For Providing Visually Impaired Users Access To The Content Of Bar Charts On The Web. In WEBIST, Cited by: §2.2.
• [32] C. Fisher (2019) Creating Accessible SVGs. (en-US). Cited by: §2.2.1.
• [33] C. Fleet (2021) Things which garner a ton of glowing reviews from mainstream outlets without being of much use to disabled people. For instance, Facebook’s auto image descriptions, much loved by sighted journos but famously useless in the Blind community. Twitter. Note: https://twitter.com/ChanceyFleet/status/1349211417744961536 Cited by: §5.4.5.
• [34] S. L. Fossheim (2020) An Introduction To Accessible Data Visualizations With D3.js. (en). Cited by: §2.2.1.
• [35] M. Galesic and R. Garcia-Retamero (2011) Graph Literacy: A Cross-cultural Comparison. In Medical Decision Making, Cited by: §2.3.1.
• [36] T. Gao, M. Dontcheva, E. Adar, Z. Liu, and K. G. Karahalios (2015) DataTone: Managing Ambiguity in Natural Language Interfaces for Data Visualization. In UIST, Cited by: §6.1.
• [37] S. García, A. Fernández, J. Luengo, and F. Herrera (2010) Advanced Nonparametric Tests For Multiple Comparisons In The Design Of Experiments In Computational Intelligence And Data Mining: Experimental Analysis Of Power. Information Sciences (en). Cited by: §5.3.
• [38] B. Geveci, W. Schroeder, A. Brown, and G. Wilson (2012) VTK. The Architecture of Open Source Applications. Cited by: §2.1.2.
• [39] B. Gould, T. O’Connell, and G. Freed (2008) Effective Practices for Description of Science Content within Digital Talking Books. Technical report The WGBH National Center for Accessible Media (en). Note: https://www.wgbh.org/foundation/ncam/guidelines/effective-practices-for-description-of-science-content-within-digital-talking-books Cited by: §1, §2.2.1, §5.1.
• [40] M. Hanley, S. Barocas, K. Levy, S. Azenkot, and H. Nissenbaum (2021) Computer Vision and Conflicting Values: Describing People with Automated Alt Text. arXiv. Cited by: §5.4.5.
• [41] L. Hasty, J. Milbury, I. Miller, A. O’Day, P. Acquinas, and D. Spence (2011) Guidelines and Standards for Tactile Graphics. Technical report Braille Authority of North America. Note: http://www.brailleauthority.org/tg/ Cited by: §1.
• [42] M. Hearst, M. Tory, and V. Setlur (2019) Toward Interface Defaults for Vague Modifiers in Natural Language Interfaces for Visual Analysis. In VIS, Cited by: §2.3.2.
• [43] M. Hearst and M. Tory (2019) Would You Like A Chart With That? Incorporating Visualizations into Conversational Interfaces. In VIS, Cited by: §2.3.2, §6.2.
• [44] J. Hullman and N. Diakopoulos (2011) Visualization Rhetoric: Framing Effects in Narrative Visualization. In TVCG, (en). Cited by: §6.2.
• [45] J. Hullman, N. Diakopoulos, and E. Adar (2013) Contextifier: Automatic Generation of Annotated Stock Visualizations. In CHI, Cited by: §4.4.
• [46] J. Hullman, N. Diakopoulos, E. Momeni, and E. Adar (2015) Content, Context, and Critique: Commenting on a Data Visualization Blog. In CSCW, Cited by: §6.2.
• [47] S. E. Kahou, V. Michalski, A. Atkinson, A. Kadar, A. Trischler, and Y. Bengio (2018) FigureQA: An Annotated Figure Dataset for Visual Reasoning. arXiv. Cited by: §2.1.1, §4.1.
• [48] A. Karpathy and L. Fei-Fei (2017-04) Deep Visual-Semantic Alignments for Generating Image Descriptions. In TPAMI, Cited by: §2.1.1.
• [49] D. A. Keim and D. Oelke (2007) Literature Fingerprinting: A New Method for Visual Literary Analysis. In VAST, Cited by: Figure 1, §3.2.2.
• [50] D. H. Kim, E. Hoque, and M. Agrawala (2020-04) Answering Questions about Charts and Generating Visual Explanations. In CHI, (en). Cited by: §2.3.2.
• [51] D. H. Kim, E. Hoque, J. Kim, and M. Agrawala (2018) Facilitating Document Reading by Linking Text and Tables. In UIST, Cited by: §2.3.2.
• [52] D. H. Kim, V. Setlur, and M. Agrawala (2021) Towards Understanding How Readers Integrate Charts and Captions: A Case Study with Line Charts. In CHI, (en). Cited by: §2.3.2.
• [53] N. W. Kim, S. C. Joyner, A. Riegelhuth, and Y. Kim (2021) Accessible Visualization: Design Space, Opportunities, and Challenges. In CGF, Cited by: §1.
• [54] H. Kong, Z. Liu, and K. Karahalios (2018-04) Frames and Slants in Titles of Visualizations on Controversial Topics. In CHI, (en). Cited by: §2.3.2.
• [55] H. Kong, Z. Liu, and K. Karahalios (2019-05) Trust and Recall of Information across Varying Degrees of Title-Visualization Misalignment. In CHI, (en). Cited by: §2.3.2.
• [56] N. Kong, M. A. Hearst, and M. Agrawala (2014) Extracting References Between Text And Charts Via Crowdsourcing. In CHI, Cited by: §2.2.2.
• [57] S. M. Kosslyn (1989) Understanding Charts and Graphs. Applied Cognitive Psychology (en). Cited by: §2.3.1.
• [58] R. Krishna, Y. Zhu, O. Groth, J. Johnson, K. Hata, J. Kravitz, S. Chen, Y. Kalantidis, L. Li, D. A. Shamma, M. S. Bernstein, and L. Fei-Fei (2017) Visual Genome: Connecting Language and Vision Using Crowdsourced Dense Image Annotations. In IJCV, Cited by: §2.1.1.
• [59] C. Lai, Z. Lin, R. Jiang, Y. Han, C. Liu, and X. Yuan (2020) Automatic Annotation Synchronizing with Textual Description for Visualization. In CHI, (en). Cited by: §2.1.1.
• [60] P. Law, A. Endert, and J. Stasko (2020-08) What are Data Insights to Professional Visualization Users?. arXiv. Cited by: §4.4.
• [61] P. Law, A. Endert, and J. Stasko (2020) Characterizing Automated Data Insights. arXiv. Cited by: §4.4.
• [62] C. Lee, T. Yang, G. Inchoco, G. M. Jones, and A. Satyanarayan (2021) Viral Visualizations: How Coronavirus Skeptics Use Orthodox Data Practices to Promote Unorthodox Science Online. In CHI, Cited by: §6.2.
• [63] S. Lee, S. Kim, and B. C. Kwon (2016) VLAT: Development Of A Visualization Literacy Assessment Test. In TVCG, Cited by: §2.3.1.
• [64] T. Lin, M. Maire, S. Belongie, J. Hays, P. Perona, D. Ramanan, P. Dollár, and C. L. Zitnick (2014) Microsoft COCO: Common Objects in Context. In ECCV, (en). Cited by: §2.1.1.
• [65] T. Littlefield (2020) COVID-19 Statistics Tracker. (en). Note: https://cvstats.net Cited by: §1.
• [66] M. A. Livingston and D. Brock (2020) Position: Visual Sentences: Definitions and Applications. In VIS, Cited by: §2.3.1.
• [67] A. Lundgard, C. Lee, and A. Satyanarayan (2019-10) Sociotechnical Considerations for Accessible Visualization Design. In VIS, Cited by: §1, §2.2, §5.2.1, §5.4.5.
• [68] A. M. MacEachren, R. E. Roth, J. O’Brien, B. Li, D. Swingley, and M. Gahegan (2012) Visual Semiotics & Uncertainty Visualization: An Empirical Study. In TVCG, Cited by: §6.2.
• [69] H. MacLeod, C. L. Bennett, M. R. Morris, and E. Cutrell (2017) Understanding Blind People’s Experiences with Computer-Generated Captions of Social Media Images. In CHI, Cited by: §2.1.1, §5.4.5.
• [70] J. Matejka and G. Fitzmaurice (2017) Same Stats, Different Graphs: Generating Datasets with Varied Appearance and Identical Statistics through Simulated Annealing. In CHI, (en). Cited by: §4.3.
• [71] P. Moraes, G. Sina, K. McCoy, and S. Carberry (2014) Evaluating The Accessibility Of Line Graphs Through Textual Summaries For Visually Impaired Users. In ASSETS, Cited by: §2.1.1, §2.2.
• [72] V. S. Morash, Y. Siu, J. A. Miele, L. Hasty, and S. Landau (2015) Guiding Novice Web Workers in Making Image Descriptions Using Templates. In TACCESS, Cited by: §2.1.1, §2.2.2, §4.3.
• [73] M. R. Morris, J. Johnson, C. L. Bennett, and E. Cutrell (2018) Rich Representations of Visual Content for Screen Reader Users. In CHI, Cited by: §2.2.
• [74] M. Muller (2014) Curiosity, Creativity, and Surprise as Analytic Tools: Grounded Theory Method. In Ways of Knowing in HCI, J. S. Olson and W. A. Kellogg (Eds.), Cited by: §3.1, §3.
• [75] A. Narechania, A. Srinivasan, and J. Stasko (2021) NL4DV: A Toolkit for Generating Analytic Specifications for Data Visualization from Natural Language Queries. In TVCG, (en). Cited by: §2.3.2.
• [76] C. North (2006) Toward Measuring Visualization Insight. In CG&A, Cited by: §4.4.
• [77] J. R. Nuñez, C. R. Anderton, and R. S. Renslow (2018) Optimizing Colormaps With Consideration For Color Vision Deficiency To Enable Accurate Interpretation Of Scientific Data. PLOS ONE (en). Cited by: §1.
• [78] J. Obeid and E. Hoque (2020) Chart-to-Text: Generating Natural Language Descriptions for Charts by Adapting the Transformer Model. arXiv. Cited by: §1, §2.1.1, §2.2, §4.2, §4.3, §5.1, §5.4.5.
• [79] M. M. Oliveira (2013) Towards More Accessible Visualizations for Color-Vision-Deficient Individuals. In CiSE, Cited by: §1.
• [80] A. Ottley, A. Kaszowska, R. J. Crouser, and E. M. Peck (2019) The Curious Case of Combining Text and Visualization. In EuroVis, (en). Cited by: §2.3.2.
• [81] J. Poco and J. Heer (2017) Reverse-Engineering Visualizations: Recovering Visual Encodings from Chart Images. In CGF, Cited by: §3.2.1, §4.1.
• [82] V. Potluri, T. E. Grindeland, J. E. Froehlich, and J. Mankoff (2021) Examining Visual Semantic Understanding in Blind and Low-Vision Technology Users. In CHI, Cited by: §6.2.
• [83] X. Qian, E. Koh, F. Du, S. Kim, J. Chan, R. A. Rossi, S. Malik, and T. Y. Lee (2021) Generating Accurate Caption Units for Figure Captioning. In WWW, (en). Cited by: §2.1.1, §2.2, §5.4.5.
• [84] X. Qian, E. Koh, F. Du, S. Kim, and J. Chan (2020) A Formative Study on Designing Accurate and Natural Figure Captioning Systems. In CHI EA, Cited by: §1, §2.1.1, §2.2.2, §2.2, §2.3.1, §5.4.5.
• [85] J. M. Royer, C. N. Hastings, and C. Hook (1979) A Sentence Verification Technique For Measuring Reading Comprehension. Journal of Reading Behavior. Cited by: §2.3.1.
• [86] J. M. Royer (2001) Developing Reading And Listening Comprehension Tests Based On The Sentence Verification Technique (SVT). In Journal of Adolescent & Adult Literacy, Cited by: §2.3.1.
• [87] A. Satyanarayan, D. Moritz, K. Wongsuphasawat, and J. Heer (2017) Vega-Lite: A Grammar of Interactive Graphics. In TVCG, Cited by: §2.1.2, §3.2.1, §4.1.
• [88] D. Schepers (2020) Why Accessibility Is at the Heart of Data Visualization. (en). Cited by: §2.2.1.
• [89] V. Setlur, S. E. Battersby, M. Tory, R. Gossweiler, and A. X. Chang (2016) Eviza: A Natural Language Interface for Visual Analysis. In UIST, Cited by: §6.1.
• [90] V. Setlur, M. Tory, and A. Djalali (2019) Inferencing Underspecified Natural Language Utterances In Visual Analysis. In IUI, Cited by: §2.3.2.
• [91] A. Sharif, S. S. Chintalapati, J. O. Wobbrock, and K. Reinecke (2021) Understanding Screen-Reader Users’ Experiences with Online Data Visualizations. In ASSETS, Cited by: §2.2.
• [92] A. Srinivasan, S. M. Drucker, A. Endert, and J. Stasko (2019) Augmenting Visualizations with Interactive Data Facts to Facilitate Interpretation and Communication. In TVCG, (en). Cited by: §3.2.2, §4.2, §4.2.
• [93] A. Srinivasan, N. Nyapathy, B. Lee, S. M. Drucker, and J. Stasko (2021) Collecting and Characterizing Natural Language Utterances for Specifying Data Visualizations. In CHI, (en). Cited by: §2.3.2, §6.1.
• [94] H. Sutton (2020) Accessible Covid-19 Tracker Enables A Way For Visually Impaired To Stay Up To Date. Disability Compliance for Higher Education (en). Cited by: §1.
• [95] B. Tang, S. Han, M. L. Yiu, R. Ding, and D. Zhang (2017) Extracting Top-K Insights from Multi-dimensional Data. In SIGMOD, (en). Cited by: §4.4.
• [96] B. D. Team (2014) Bokeh: Python Library For Interactive Visualization. Bokeh Development Team. Cited by: §2.1.2.
• [97] P. Vickers, J. Faith, and N. Rossiter (2013) Understanding Visualization: A Formal Approach Using Category Theory and Semiotics. In TVCG, Cited by: §6.2.
• [98] O. Vinyals, A. Toshev, S. Bengio, and D. Erhan (2015) Show and Tell: A Neural Image Caption Generator. In CVPR, (en). Cited by: §2.1.1.
• [99] W3C (2019) WAI Web Accessibility Tutorials: Complex Images. Note: https://www.w3.org/WAI/tutorials/images/complex/ Cited by: §1, §2.1.2, §2.2.1.
• [100] Y. Wang, Z. Sun, H. Zhang, W. Cui, K. Xu, X. Ma, and D. Zhang (2020) DataShot: Automatic Generation of Fact Sheets from Tabular Data. In TVCG, Cited by: §3.2.2, §4.2.
• [101] L. Watson (2017) Accessible SVG Line Graphs. (en). Note: https://tink.uk/accessible-svg-line-graphs/ Cited by: §2.2.1.
• [102] L. Watson (2018) Accessible SVG Flowcharts. (en). Cited by: §2.2.1.
• [103] W. Weber (2019) Towards a Semiotics of Data Visualization – an Inventory of Graphic Resources. In IV, Cited by: §6.2.
• [104] L. Wilkinson (2005) The Grammar of Graphics. Statistics and Computing, Springer-Verlag (en). Cited by: §4.1.
• [105] K. Wu, E. Petersen, T. Ahmad, D. Burlinson, S. Tanis, and D. A. Szafir (2021) Understanding Data Accessibility for People with Intellectual and Developmental Disabilities. In CHI 2021, (en). Cited by: §1.
• [106] C. Xiong, L. V. Weelden, and S. Franconeri (2020) The Curse of Knowledge in Visual Data Communication. In TVCG, Cited by: §2.3.2.
• [107] K. Xu, J. Ba, R. Kiros, K. Cho, A. Courville, R. Salakhutdinov, R. Zemel, and Y. Bengio (2016) Show, Attend and Tell: Neural Image Caption Generation with Visual Attention. arXiv. Cited by: §2.1.1.
• [108] J. S. Yi, Y. Kang, J. T. Stasko, and J. A. Jacko (2008) Understanding and Characterizing Insights: How Do People Gain Insights Using Information Visualization?. In BELIV, (en). Cited by: §4.4.