
AI Ethics: AI Data & Bias

Introduction

Generative AIs like ChatGPT are essentially very powerful predictive-text programs. They are not intelligent; rather, they are very, very good at predicting which word should come next in a sequence of text. They "learn" this by being trained on a huge body of information (also called a "corpus"): they process literally trillions of words so that, when given a prompt on a certain topic, they can predict which words should come next, and in what order.
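
To make the idea concrete, here is a minimal sketch in Python of next-word prediction: a toy model that simply counts, in a tiny made-up corpus, which word most often follows each word. Real systems like ChatGPT use vast neural networks rather than simple counts, so this is only an illustration of the underlying idea, not how ChatGPT actually works:

    # Toy next-word predictor: count which word most often follows each word.
    from collections import Counter, defaultdict

    corpus = "the cat sat on the mat the cat ate the fish".split()

    follows = defaultdict(Counter)  # word -> counts of the words that follow it
    for current_word, next_word in zip(corpus, corpus[1:]):
        follows[current_word][next_word] += 1

    def predict_next(word):
        """Return the word that most often followed `word` in the corpus."""
        candidates = follows.get(word)
        return candidates.most_common(1)[0][0] if candidates else None

    print(predict_next("the"))  # prints "cat": it follows "the" twice, more than any other word

Scaled up from a dozen words to trillions, and from simple counts to billions of learned parameters, this is the basic pattern: the model's predictions are entirely a product of the text it was trained on.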

In the case of ChatGPT, this corpus consists of most of the freely available parts of the internet. OpenAI, the company behind ChatGPT, periodically scrapes the web, including the entirety of Wikipedia, Reddit, free news sites, and much more, and ChatGPT learns from this massive amount of information.

Unfortunately, as we all know, the internet is hardly a lovely place. It is rife with racism, sexism, homophobia, transphobia, and many other sorts of bias, stereotypes, and intolerance. This means that AIs can become biased, too. OpenAI and other tech companies have made efforts to remove some of this bias from their apps, putting up guardrails so that users cannot easily ask ChatGPT to create, say, a racist diatribe. Even so, given the problematic body of information they have been trained on, it is essentially impossible to completely remove bias from these generative AIs.

These biases can range from something subtle, such as ChatGPT perpetuating stereotypical gender roles in a fictional story, to something far more overt, such as an AI art program turning an Asian woman's selfies into hypersexualized avatars, or an AI used for predictive policing demonstrating extreme racial bias against African Americans.

AIs can be extremely useful tools, but we should always remain aware that the products they generate may include biases and stereotypes because the information they were trained on includes those biases and stereotypes.

Case Study: Microsoft's Racist Twitter Bot

In 2016, Microsoft proudly launched Tay, an AI chatbot designed to interact with Twitter's users. Fed the virulently racist and sexist content that those users sent its way, however, the bot began producing racist and sexist posts of its own within hours, and Microsoft was forced to shut it down less than a day after its launch.

This is an important reminder that AIs are not naturally intelligent, but merely "learn" from the data on which they are trained. Frankly, Microsoft should have easily predicted this turn of events, given the repellent nature of the discourse prevalent on the platform.

Case Study: Amazon's Anti-Woman Hiring AI

In 2018, Amazon was forced to abandon an AI used to screen prospective employees because it demonstrated a bias against female applicants. The algorithm was not designed to be prejudiced against women, but it was trained on ten years of job applications, and during those years male applicants vastly outnumbered female ones. Thus, the AI "learned" to favor men and penalize women when rating their applications.
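
A deliberately simplified sketch (with hypothetical data, not Amazon's actual system or features) shows how this kind of bias arises even when no rule mentions gender: if a screening model scores applications by how much they resemble past hires, and past hires were overwhelmingly men, then features associated with women are penalized simply because the model rarely or never saw them:

    # Hypothetical screening model: score an application by how often its
    # features appeared among past hires. No rule mentions gender, but the
    # skew in the historical data produces a skew in the scores.
    from collections import Counter

    past_hires = [  # made-up historical data, overwhelmingly male
        ["men's chess club", "engineering"],
        ["men's soccer team", "engineering"],
        ["men's chess club", "math"],
        ["engineering", "math"],
    ]

    feature_scores = Counter()
    for application in past_hires:
        feature_scores.update(application)

    def score(application):
        """Higher score = looks more like a past hire."""
        return sum(feature_scores[feature] for feature in application)

    print(score(["men's chess club", "engineering"]))    # 5
    print(score(["women's chess club", "engineering"]))  # 3: the women's club was never seen, so it adds nothing

Reportedly, Amazon tried editing its model to be neutral toward terms like "women's," but could not guarantee that it would not find other, subtler proxies for gender in the data, which is part of why the project was abandoned.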

 
