Summary of "The Surveillance AI Pipeline: Analyzing Research and Patents" (arxiv.org)
13,440 words - PDF document
One Line
The study documents extensive human data extraction by elite universities and big tech companies in research that feeds surveillance technologies, emphasizing the need for regulation and public involvement.
Key Points
- Computer vision research in the field of artificial intelligence (AI) is contributing to the expansion of mass surveillance.
- An analysis of three decades of computer vision research papers and patents reveals that human data extraction is prevalent in computer vision technology.
- Elite universities and "big tech" corporations are involved in surveillance patents, challenging the perception that only a few entities contribute to surveillance.
- The number of computer vision papers used in surveillance patents has significantly increased over the years.
- Language in computer vision papers and patents often downplays or hides the extent of surveillance, using terms like "objects" to refer to humans.
- Computer vision research is foundational to the paradigm of surveillance and has social and ethical implications.
- The study provides insights into the institutions, nations, and subfields involved in surveillance patents and exposes obfuscating language used in computer vision documents.
- The methodology involved analyzing papers and patents, identifying key dimensions of surveillance AI, and conducting a large-scale computational analysis.
Summaries
30 word summary
The study examines over 40,000 research papers and patents, revealing widespread human data extraction feeding surveillance technologies. It implicates elite universities and big tech companies, calling for regulation and public influence.
61 word summary
"The Surveillance AI Pipeline" analyzes over 40,000 computer vision research papers and patents to expose the prevalence of human data extraction in surveillance technologies. Elite universities and "big tech" companies are implicated in thousands of surveillance patents, challenging the notion that only a few rogue entities are responsible. The study calls for critical examination, regulation, and public influence over surveillance technologies.
159 word summary
"The Surveillance AI Pipeline" is a research paper that examines the relationship between computer vision research and the development of surveillance technologies. The study analyzes over 40,000 computer vision research papers and patents to uncover the prevalence of human data extraction in computer vision. Elite universities and "big tech" corporations are implicated in thousands of surveillance patents, challenging the perception that only a few rogue entities are responsible for surveillance. The research reveals a significant increase in the use of computer vision research in surveillance patents over time. The authors also highlight the use of obfuscating language in computer vision papers and patents to downplay the extent of surveillance. The findings emphasize the need for critical examination of the field and recognition of the social and ethical implications of computer vision technologies. The study aims to empower communities to organize against surveillance, guide policymakers in regulation, shape the research agenda, and enable public knowledge and influence over surveillance technologies.
580 word summary
"The Surveillance AI Pipeline" is a research paper that examines the relationship between computer vision research and the development of surveillance technologies. The authors argue that computer vision, specifically in the field of artificial intelligence (AI), is contributing to the expansion of mass surveillance. They aim to uncover the pathway from computer vision research to surveillance applications and shed light on the normalization of Surveillance AI.
The study analyzes over 40,000 computer vision research papers and downstream patents spanning three decades. The authors find that the majority of these papers and patents focus on extracting data about humans, particularly human bodies and body parts. They present quantitative and qualitative analysis to demonstrate the prevalence of human data extraction in computer vision.
The research also investigates the institutions involved in computer vision research and their connection to surveillance patents. Elite universities and "big tech" corporations, which contribute significantly to computer vision research, are cited in thousands of surveillance patents. This challenges the notion that only a few rogue entities are responsible for surveillance, as many institutions, nations, and subfields involved in computer vision research are implicated.
The study reveals a significant increase in the use of computer vision research in surveillance patents over time, with more than a five-fold increase between the 1990s and 2010s. The linguistic analysis of paper titles also shows a shift towards a greater focus on analyzing humans and semantic categories in computer vision.
The authors highlight the use of obfuscating language in computer vision papers and patents, which downplay or hide the extent of surveillance. Terms like "objects" are used to refer to humans, minimizing the acknowledgment of human data extraction. Images of humans may be included in figures and datasets without explicit mention in the text, further obscuring the connection to surveillance.
The findings challenge the perception of computer vision research as a neutral pursuit and emphasize its foundational role in surveillance. The authors argue for a critical examination of the field and recognition of the social and ethical implications of computer vision technologies.
The research contributes to understanding the widespread extraction of human data in computer vision and the normalization of Surveillance AI. It provides insights into the institutions, nations, and subfields involved in surveillance patents and exposes the obfuscating language used in computer vision papers and patents. The authors hope that their findings will empower communities to organize against surveillance, help policymakers identify regulatory targets, guide researchers in shaping the research agenda, and enable the public to have knowledge and influence over surveillance technologies.
The study employed a methodology that involved analyzing papers and patents from the Conference on Computer Vision and Pattern Recognition (CVPR) from 1990-2020. Various databases were used to gather the data, and content analysis was conducted by a team of experts. The researchers identified key dimensions of surveillance AI and found that the majority of papers and patents focused on extracting data about humans, with an emphasis on human bodies and body parts.
Large-scale computational analysis of over 40,000 papers and patents was conducted to study the breadth and variation of surveillance across years, institutions, nations, and subfields. Surveillance indicator words were identified, and the corpus was scanned for patents containing these words. The majority of papers with downstream patents were found to be used in surveillance patents.
The study also provided background information on surveillance and computer vision, highlighting the impact of surveillance on marginalized communities and ethical concerns surrounding dataset collection and curation practices in computer vision.
696 word summary
"The Surveillance AI Pipeline" is a research paper that explores the connection between computer vision research and the development of surveillance technologies. The authors argue that computer vision, particularly in the field of artificial intelligence (AI), is contributing to the expansion of mass surveillance. They aim to uncover the pathway from computer vision research to surveillance applications and shed light on the normalization of Surveillance AI.
The study analyzes three decades of computer vision research papers and downstream patents, totaling over 40,000 documents. The authors find that the majority of annotated computer vision papers and patents report that their technology enables the extraction of data about humans, specifically human bodies and body parts. They present both quantitative and qualitative analysis to demonstrate the prevalence of human data extraction in computer vision.
The research also examines the institutions that produce computer vision research and their involvement in surveillance patents. Elite universities and "big tech" corporations, which are prolific in computer vision research, are cited in thousands of surveillance patents. This challenges the narrative that only a few rogue entities contribute to surveillance, as the majority of institutions, nations, and subfields that author computer vision papers with downstream patents are implicated in surveillance.
The study reveals a significant increase in the use of computer vision research in surveillance patents over the years. Between the 1990s and 2010s, there has been a more than five-fold increase. The linguistic analysis of paper titles also shows a shift towards an increased focus on analyzing humans and semantic categories in the field of computer vision.
The authors highlight the obfuscation of language in computer vision papers and patents, which often downplay or hide the extent of surveillance. Terms like "objects" are used to refer to humans, minimizing the acknowledgment of human data extraction. Figures and datasets may contain images of humans without explicit mention or discussion in the text, further obscuring the connection to surveillance.
The findings challenge the perception of computer vision research as a neutral pursuit and reveal its foundational role in the paradigm of surveillance. The authors argue that progress in computer vision is closely tied to the expansion of Surveillance AI. They emphasize the need for a critical examination of the field and a recognition of the social and ethical implications of computer vision technologies.
The research contributes to the understanding of the pervasive extraction of human data in computer vision and the normalization of Surveillance AI. It provides insights into the institutions, nations, and subfields involved in surveillance patents and exposes the obfuscating language used in computer vision papers and patents. The authors hope that their findings will serve as a tool for communities to organize against surveillance, for policymakers to identify regulatory targets, for researchers to shape the research agenda, and for the public to exercise knowledge and power over surveillance technologies.
The methodology involved analyzing papers and patents from the Conference on Computer Vision and Pattern Recognition (CVPR) from 1990-2020. The researchers used various databases to gather the data and employed a team of experts to conduct content analysis. They identified key dimensions of surveillance AI, including data type, transferal, and use. They found that the majority of papers and patents extracted data relating to humans, with a focus on human bodies and body parts.
The study also conducted a large-scale computational analysis of over 40,000 papers and patents to study the breadth and variation of surveillance across years, institutions, nations, and subfields. The researchers identified surveillance indicator words and scanned the corpus for patents containing these words. They found that the majority of papers with downstream patents were used in surveillance patents.
The analysis of the evolution across years showed that the number of computer vision papers with downstream patents stabilized in the early 2000s and remained above 200 every year until 2018, when it dropped by nearly half. The linguistic analysis demonstrated changes in the focus of papers and patents, with certain words becoming more polarized in their association with surveillance.
The study also provided additional background on surveillance and computer vision, highlighting the impact of surveillance on marginalized communities and the ethical concerns surrounding dataset collection and curation practices in computer vision.
969 word summary
The Surveillance AI Pipeline is a research paper that analyzes the connection between computer vision research and the development of surveillance technologies. The authors argue that computer vision, particularly in the field of artificial intelligence (AI), is contributing to the expansion of mass surveillance. They aim to uncover the pathway from computer vision research to surveillance applications and shed light on the normalization of Surveillance AI.
The study analyzes three decades of computer vision research papers and downstream patents, totaling over 40,000 documents. The authors find that the large majority of annotated computer vision papers and patents self-report that their technology enables the extraction of data about humans, specifically human bodies and body parts. They present both quantitative and qualitative analysis to demonstrate the prevalence of human data extraction in computer vision.
The research also examines the institutions that produce computer vision research and their involvement in surveillance patents. Elite universities and "big tech" corporations, which are prolific in computer vision research, are cited in thousands of surveillance patents. This challenges the narrative that only a few rogue entities contribute to surveillance, as the majority of institutions, nations, and subfields that author computer vision papers with downstream patents are implicated in surveillance.
The study reveals a significant increase in the number of computer vision papers used in surveillance patents over the years. Between the 1990s and 2010s, there has been a more than five-fold increase in the use of computer vision research in surveillance patents. The linguistic analysis of paper titles also shows a shift towards an increased focus on analyzing humans and semantic categories in the field of computer vision.
The authors highlight the obfuscation of language in computer vision papers and patents, which often downplay or hide the extent of surveillance. Terms like "objects" are used to refer to humans, minimizing the acknowledgment of human data extraction. Figures and datasets may contain images of humans without explicit mention or discussion in the text, further obscuring the connection to surveillance.
The findings of this study challenge the perception of computer vision research as a neutral pursuit and reveal its foundational role in the paradigm of surveillance. The authors argue that progress in computer vision is closely tied to the expansion of Surveillance AI. They emphasize the need for a critical examination of the field and a recognition of the social and ethical implications of computer vision technologies.
The research contributes to the understanding of the pervasive extraction of human data in computer vision and the normalization of Surveillance AI. It provides insights into the institutions, nations, and subfields involved in surveillance patents and exposes the obfuscating language used in computer vision papers and patents. The authors hope that their findings will serve as a tool for communities to organize against surveillance, for policymakers to identify regulatory targets, for researchers to shape the research agenda, and for the public to exercise knowledge and power over surveillance technologies.
The methodology of the study involved analyzing papers and patents from the Conference on Computer Vision and Pattern Recognition (CVPR) from 1990-2020. The researchers used the Microsoft Academic Graph, paper-patent citation linkages, and the Google Patents database to gather the data. The content analysis was conducted by a team of six experts using an inductive-deductive methodology to analyze the documents. They identified key dimensions of surveillance AI, including data type, data transferal, and use of data. They found that the majority of papers and patents extracted data relating to humans, with a focus on human bodies and body parts. They also analyzed the transferal of human data, finding that it is often transferred to other institutions and not under the control of the datafied person. The researchers also examined the institutional use of data and identified three categories: modeling or categorizing humans, soft influence, and hard control. They found that many subfields of computer vision contribute to Surveillance AI, even ones not explicitly connected to human data.
The study also conducted a large-scale computational analysis of over 40,000 papers and patents to study the breadth and variation of surveillance across years, institutions, nations, and subfields. The researchers identified surveillance indicator words and scanned the corpus for patents containing these words. They validated each word through manual inspection and created a list of approved surveillance indicator words. They then scanned each paper's downstream patents to identify patents containing these words. They found that the majority of papers with downstream patents were used in surveillance patents.
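The indicator-word scan described above can be sketched roughly as follows. This is a minimal illustration only: the word list and the example patent texts here are invented placeholders, not the study's manually validated list or data.

```python
# Hypothetical indicator words -- the study built its actual list
# through manual inspection and validation of candidate words.
SURVEILLANCE_WORDS = {"surveillance", "identify", "track", "monitor"}

def is_surveillance_patent(patent_text):
    """Flag a patent if any indicator word appears among its tokens."""
    tokens = set(patent_text.lower().split())
    return bool(SURVEILLANCE_WORDS & tokens)

def papers_used_in_surveillance(paper_to_patents):
    """Map each paper ID to whether any of its downstream patents is flagged."""
    return {
        paper: any(is_surveillance_patent(text) for text in patents)
        for paper, patents in paper_to_patents.items()
    }

# Toy example with two papers and their (invented) downstream patent texts
flags = papers_used_in_surveillance({
    "paper-1": ["a system to track and monitor pedestrians"],
    "paper-2": ["a method for compressing video streams"],
})
```

Aggregating the per-paper flags in this way is what lets the study report what fraction of papers with downstream patents feed into surveillance patents.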
The analysis of the evolution across years showed that the number of computer vision papers with downstream patents stabilized in the early 2000s and remained above 200 every year until 2018, when it dropped by nearly half. To study the linguistic evolution, the researchers computed the log-odds ratio of words appearing in 1990s paper titles versus 2010s paper titles. They found that there were changes in the focus of papers and patents, with certain words becoming more polarized in their association with surveillance.
The study also provided additional background on surveillance and computer vision. Surveillance is a technology of social control that is intrinsically tied to the production of power relations. It perpetuates inequalities and disproportionately impacts marginalized communities. Computer vision has rapidly risen in the past decade due to the availability of image and video data. However, dataset collection and curation practices often lack considerations of informed consent, privacy, and the mitigation of negative social stereotypes. The field has been criticized for its focus on efficiency, universality, and impartiality, which can lead to the erosion of privacy and the reproduction of social stereotypes.
Overall, the study provides a comprehensive analysis of the Surveillance AI pipeline, examining the data extraction, transferal, and use of human data in computer vision research and applications. It highlights the pervasive nature of surveillance in the field and raises important ethical considerations.