Why does a widely used internet search engine deliver results that can be blatantly racist and sexist? Two leading information researchers investigate their discoveries of hidden biases in the search technology we rely on every day, involving pornographic images and ads implying criminal behavior triggered by simple search queries. Both researchers share common concerns about how everyday online searches can reinforce damaging stereotypes, and explore how technology can be made more equitable. (Premiered April 14, 2021)
Search Engine Breakdown
PBS Airdate: April 14, 2021
REPORTER: As misinformation and so-called fake news continue to be rapidly distributed on the internet, our reality has become increasingly shaped by false information. Many people don’t know the difference between something real and something created to deceive them.
DR. SAFIYA NOBLE, PH.D. (Co-Director, UCLA Center for Critical Internet Inquiry): I spent about 15 years in advertising and marketing, and while I was there, Google arrived on the scene. I understood the transformative effect that this search engine was having, in helping us curate through all kinds of information, but I was surprised, having just left advertising, that everybody was thinking about Google as this new public trusted resource, because I thought of it as an advertising platform.
Most people who use search engines believe that search engine results are fair and unbiased.
Caption: Google handles 90% of all information searches online globally.
SAFIYA NOBLE: The public, and especially kids and young people, use search engines to tell them the facts about the world.
One weekend, my nieces were coming over to hang out and I was thinking, “Oh, let me pull my laptop out and see if I can find some cool things for us to do this weekend.” I just thought to type in, “Black girls,” and the whole first page of search results was almost exclusively pornography or hyper-sexualized content. In 2012, I started to see some of the results changing. Google had started to suppress the pornography around Black girls.
Unfortunately, still today, we see pornography and a kind of hyper-sexualized content as the primary way in which Latina and Asian girls are represented. “What makes Asian girls so attractive,” “Asian fetish,” “Hot ladies from Asia: See who we rank number one in 2020,” “Tender Asian girls,” “Meet world beauties.”
This is the study that was done by The Markup that replicated my study from 10 years ago. They found that the phrases "Black girls," "Latina girls" and "Asian girls" were so profoundly linked with adult content; zero for white girls, zero for white boys.
There are so many racial stereotypes and gender stereotypes that show up in search results. What about actual girls and children who go and look for themselves in these spaces? It’s very disheartening. When women become sex objects in a space like this, it’s really profound, because the public generally relates to search engines as, kind of, fact checkers.
Before we were so heavily reliant upon databases, we used something like a card catalog. We didn't rank content; it was alphabetical, or organized by subject.
LIBRARIAN (Reading a poster called “How to Find a Book”/Archival Footage): It’s a summary of the organization system we call the Dewey Decimal System.
SAFIYA NOBLE: Now, when we’re in a subject, we know there is a lot in relationship to that one item that we might be looking for. We might go look for a book in the stacks, for example, and find that there’s hundreds of books around that one that tell us something about that book. And we might serendipitously find all kinds of other bits of information that are amazing. But we can see a little bit more about the logics of that.
We don’t understand the logics of how certain things make it to the first page in a search. Google has a very complicated and nuanced algorithm for search. Over 200 different factors go into how they decide what we see. Of course, they’re indexing about half of all of the information that is on the web, and even that is trillions of pages.
GOOGLE VIDEO: Billions of times a day, Google software locates all the potentially relevant results on the web, removes all the spam, and ranks them, based on hundreds of factors like keywords, links, locations and freshness, all in, oh, .81 seconds.
SAFIYA NOBLE: The whole premise of a search engine is to categorize and classify information. A lot of the content that comes back to us on the internet is in a cultural context of ranking. We know very early what it means to be number one, so ranking logic signals to us that the classification is accurate, from one being the best to whatever is on page 48 of search, which nobody ever looks at. Part of what it’s doing is picking up signals from things that we’ve clicked on in the past, that a lot of other people have clicked on, things that are popular. So, an algorithm is, in essence, a decision tree: if these conditions are present, then this decision should be made. And the decision tree gets automated, so that it becomes like a sorting mechanism.
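The "decision tree" and "sorting mechanism" Noble describes can be sketched in a few lines. Everything here is invented for illustration: the signal names, the weights, and the example pages are hypothetical, and Google's actual ranking, which weighs over 200 undisclosed factors, is nothing this simple.

```python
# Toy illustration of ranking as an automated decision tree.
# Signals and weights are hypothetical, not Google's real factors.

def score(page):
    """Combine a few hypothetical signals into one ranking score."""
    s = 0.0
    if page["keyword_match"]:
        s += 1.0                      # "if this condition is present..."
    s += 0.5 * page["past_clicks"]    # popularity: what people clicked before
    s += 0.3 * page["inbound_links"]  # hyperlinking signal
    return s

def rank(pages):
    # The "sorting mechanism": order results by descending score.
    return sorted(pages, key=score, reverse=True)

pages = [
    {"url": "a.example", "keyword_match": True,  "past_clicks": 2, "inbound_links": 1},
    {"url": "b.example", "keyword_match": False, "past_clicks": 9, "inbound_links": 0},
    {"url": "c.example", "keyword_match": True,  "past_clicks": 0, "inbound_links": 5},
]
print([p["url"] for p in rank(pages)])  # → ['b.example', 'c.example', 'a.example']
```

Note that in this sketch the heavily clicked page outranks the pages that actually match the keyword, which is the point Noble makes next: popularity and past clicks, not relevance alone, drive what surfaces first.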
Google’s very reliable for certain types of information. If you’re using it in this kind of phone book fashion, it’s fairly reliable. But when you start asking a search engine more complex questions or you start looking for knowledge, the evidence isn’t there that it’s capable of doing that. It’s this combination of hyperlinking, it’s a combination of advertising and capital and also what people click on that really drives what we find on the web. And this is where we start falling into trickier situations, because those who have the most money are really able to optimize their content better than anyone else.
There have been great studies about the disparate impact of what a profile online says about who you are.
DR. LATANYA SWEENEY, Ph.D., (Professor, Harvard University, John F. Kennedy School of Government): I was the first African American woman to get a Ph.D. in computer science at M.I.T. So, I visit Harvard; I’m being interviewed there by a reporter, and he wants to see a particular paper that I had done before. So, I go over to my computer, I type in my name into a Google search bar and up pops this ad, implying I had an arrest record. He says, “Ah, forget that article. Tell me about the time you were arrested.” I say, “Well I have never been arrested.” And he says, “Then why does your computer say you’ve been arrested?” So, I click on the ad, I go to the company to show him, not only did I not have an arrest record, but nobody with the name Latanya Sweeney had an arrest record. And he says, “Yeah, but why did it say that?”
If you type in the name “Latanya” in the Google Image search, you can see a lot of Black faces staring back. Whereas, if I type “Tanya,” I see a lot of white faces staring back. So, we get the idea that there are some first names given more often to Black babies than white babies. So, I then took a month, and I researched almost 150,000 ad deliveries, around the country, and I found that if your name was given more often to white babies than Black babies, the ad would be neutral, and if your first name was given more often to Black babies than white babies you were 80 percent likely to get an ad implying you had an arrest record, even if no one with your name had any arrest record in their database.
SAFIYA NOBLE: One specific way that algorithms discriminate is that they just are too crude. The idea of “if X then Y,” “if you have this type of name it means you’re automatically associated with criminality.” That blunt, crude kind of association, that is the staple logic of how algorithms work. The types of bias that we find on the internet are often blunt. We are being profiled into similar groups of people, who do the kinds of things that we might be doing, and we’re clustered and sold as a cluster to advertisers. And so, there’s certainly a commercial bias, but we also have the bias of the people who designed the technologies. To think that technologies will be neutral or never have bias is really an improper framing. Of course, there will always be a point of view in our technologies, the question is, is the point of view in service of oppression? Is it sexist? Is it racist?
LATANYA SWEENEY: Here I was, a passionate believer in the future of equitable technology, and if the people, when they were hiring me at Harvard, had typed my name into the Google search bar and paid attention to this ad, it put me at a disadvantage. And not just me, but a whole group of Black people would be placed at a disadvantage. How could these biases of society be invading the technology that I really had grown to love? And now civil rights was up for grabs, by what technology design allowed or didn’t allow.
Google’s ad delivery system is really quite amazing. You click on a web page, and that web page has a slot that an ad is going to be delivered. And in that fraction of a second, while the page is being delivered, Google runs a fast digital auction. And in that digital auction, they decide which of competing ads are going to be the ad they’re going to place right there. At first, the Google algorithm will choose one of them randomly, but if somebody clicks on one, then that one becomes weighted more often to be delivered.
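The click-weighted delivery Sweeney describes can be sketched as a simple feedback loop. This is a hypothetical simplification: the class name, the uniform starting weights, and the example ad labels are invented, and the real auction also weighs advertiser bids and many other factors.

```python
import random

# Toy sketch of click-weighted ad selection: competing ads start with
# equal weight; each click on an ad increases its weight, so it becomes
# more likely to be delivered in future auctions. (Hypothetical model,
# not Google's actual auction, which also factors in bids.)

class AdSlot:
    def __init__(self, ads):
        self.weights = {ad: 1.0 for ad in ads}  # at first, effectively random

    def pick(self, rng=random):
        # Weighted random draw: heavier ads win the slot more often.
        ads = list(self.weights)
        return rng.choices(ads, weights=[self.weights[a] for a in ads])[0]

    def record_click(self, ad):
        self.weights[ad] += 1.0  # clicked ads get delivered more often

slot = AdSlot(["neutral ad", "arrest-record ad"])
for _ in range(10):
    slot.record_click("arrest-record ad")  # society's click bias feeds back
print(slot.weights)  # the biased ad is now far likelier to be shown
```

The loop is the mechanism Sweeney points to next: if users click one kind of ad more often for certain names, the system amplifies that preference, so the bias of society becomes the bias of the delivery algorithm.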
So, one way the discrimination in online ads could happen, would’ve been that society would have been biased on which ads they clicked most often and that this would’ve represented the bias of society itself. Our technology and our data sharing are so powerful, that they are, kind of, like, the new policy maker. We don’t have oversight over these designs, and yet, how the technology is designed dictates the rules we live by. And this meant that we were moving from a democracy to a new kind of technocracy.
I became the Chief Technology Officer at the Federal Trade Commission. They’re sort of the de facto police department of the internet. One of the experiments that I had done while I was at the F.T.C., showed that everyone’s online experience is not the same.
CAPTION: (From MIT TECHNOLOGY REVIEW) Racism is Poisoning Online Ad Delivery, Says Harvard Professor.
GOOGLE STATEMENT 1: “The findings from Dr. Sweeney’s academic study, conducted in 2013, cite examples that are prohibited today under our policy on Misusing Personal Information.”
GOOGLE STATEMENT 2: “We fundamentally design our systems to return not only results that are relevant, but from sources that people tend to find reliable and authoritative.”
GOOGLE STATEMENT 3: “We’re constantly retraining our systems to be able to identify topics…where authoritative information is particularly important. Our systems are not 100% perfect, and we’re always making improvements …”
GOOGLE STATEMENT 4: “Third Party researchers have evaluated our results and shown that this approach has been effective, identifying that misinformation is significantly more prominent in other search engines when compared to Google.” -Excerpts from Google response to NOVA questions, February 2021
SAFIYA NOBLE: What we lose, with our hyper-reliance upon search technologies and social media, is that the criteria for surfacing what’s most important can be deeply, highly manipulated.
One of the hardest case studies to write in my book was about Dylann Roof. He went online, and he was trying to make sense of the trial of George Zimmerman.
DYLANN ROOF (Interrogation Footage): And the person that, I guess I can say... I would say woke me up, and it would be the Trayvon Martin case.
TERRY MORAN (Reporter/News Footage): Trayvon Martin, an unarmed Black teenager, was shot down by a white neighborhood watchman, who claimed self-defense.
DYLANN ROOF (Interrogation Footage): Eventually I decided to, you know, look his name up, just typed him into Google. You know what I’m saying? For some reason, it made me type in the words, “Black on white crime.”
SAFIYA NOBLE: We know from Dylann Roof’s own words that the first site that he comes to is the Council of Conservative Citizens. The C.C.C. is an organization that the Southern Poverty Law Center calls vehemently racist.
DYLANN ROOF (Interrogation Footage): and that’s... that was it and ever since then.
SAFIYA NOBLE: Let’s say he had been my student; I could’ve just immediately said, “Did you know that, that phrase is kind of a racist red herring? The F.B.I. statistics show us that the majority of white people are actually killed by other white people.” But instead, he goes to the internet and he finds the C.C.C. and he goes down a rabbit hole of white supremacist websites.
DETECTIVE (Interrogation Footage): Did you read a lot? Did you read books or watch videos or watch movies or YouTube or anything like that, specifically about that subject matter?
DYLANN ROOF (Interrogation Footage): You know, it was pretty much just reading articles.
DETECTIVE (Interrogation Footage): Reading articles?
DYLANN ROOF (Interrogation Footage): Yeah.
SAFIYA NOBLE: And we know that, shortly thereafter, he goes into a church, murders nine African Americans and says his intent is to start a race war. This is not an atypical possibility. When you don’t get a counter point to the query, you don’t get Black studies scholarship or F.B.I. statistics or anything that would reframe the very question that you’re asking. This is an extreme case of acting upon white power radicalization, but this is not unlike things that are happening right now, every day, in search engines, on Facebook, on Twitter, in Gab. People are being targeted and radicalized in very dangerous ways. This is what is at stake when people are so susceptible to disinformation, hate speech, hate propaganda in our society.
LATANYA SWEENEY: Racism itself can’t be solved by technology. The question is, to what extent can we make sure technology doesn’t perpetuate it? Doesn’t allow harms to be made because of it? We need a diverse and inclusive community in the design stage and the marketing and business stage, in the regulatory and general stages, as well.
SAFIYA NOBLE: I am really interested in solutions. It’s easy to talk about the problems, and it’s painful, also, to talk about the problems. But that pain and that struggle should lead us to thinking about alternatives. Those are the kind of things that I like to talk to other information professionals and researchers and librarians about.
As a person who has a name that doesn’t sound like “Jennifer,” right? Or “Sarah,” or something, that paper made the difference for me, because I was just this grad student, and you were this esteemed Harvard professor and you were having these experiences, too. When I think about the, like the, the foundations of something like ethical A.I., I go back to you in that early paper. I think what I feel most hopeful about is that there’s this new cottage industry called “ethical A.I.,” and I know that our work is profoundly tied to that. But on another level, I feel like these predictive technologies are so much more ubiquitous than they were 10 years ago.
LATANYA SWEENEY: You know, what I find really painful is that, as we move forward, it’s harder to track. One thing that comes clear is, we can use a heck of a lot more transparency. As a computer scientist, my vision is I want society to enjoy the benefits of all these new technologies, without these problems. Technology doesn’t have to be made this way.
SAFIYA NOBLE: That’s right. That’s right. I see so many more women and girls of color, interested in these conversations. And one of the things that I also see is how we see things because we ask different questions based on our lived experiences.
LATANYA SWEENEY: Just the fact that the questions are being raised means that the space is less hostile, means there’s an opportunity for your voice. And, and the other thing that’s really important about this work, it means that it’s a new, kind of, way of thinking about computer science. It’s, it’s in this conversation with you that I see a future. I’m hopeful because it’s not one isolated paper, but, in fact, it’s a movement of people who are asking the right questions, exposing the right unforeseen consequences and pushing this forward towards a solution.
SAFIYA NOBLE: Some questions cannot be answered instantly. Some issues we’re dealing with in society, we need time and we need discussion. How can we look for new logics and new metaphors and new ways to get a bigger picture? Maybe we can see, when we do that query, that, “that’s just nothing but propaganda,” and we can even see the sources of the disinformation farms, maybe we can see the financial backers. There’s a lot of ways that we can reimagine our information landscape. So, I do feel like there is some hope.
POST PRODUCTION SERVICES
Heart Punch Studio
FILM Archives, Inc.
University of California, Los Angeles (UCLA)
A NOVA Production by 7th Empire Media for WGBH Boston.
© 2021 WGBH Educational Foundation
All rights reserved
This program was produced by WGBH, which is solely responsible for its content. Some funders of NOVA also fund basic science research. Experts featured in this film may have received support from funders of this program.
Image credit: (hands on keyboard)
© Andrey_Popov/Shutterstock, graphics by Zachary Ludescher and Pablo Londero
- Safiya Noble, Latanya Sweeney