We've had all of our data on Facebook for years. And advertisers have used Facebook's self-service ad-buying platform, similar to Graph Search, to segment and target ads at us for years. But now, the rollout of Graph Search allows every Facebook user to sort their friends, friends of friends, and public profiles in great detail and by detailed refinements. (You can see some fun and some creepy examples at the Actual Facebook Graph Searches on Tumblr.)
Here are some existing critiques of Graph Search and its privacy implications:
- Search Option From Facebook Is a Privacy Test
- Why Graph Search Could Be Facebook's Largest Privacy Invasion Ever
- Purpose of Facebook's Graph Search: To help you buy stuff
Part of the reason we trust so much of our information with Facebook is security through ubiquity: No one's really going to take the time to look through all of our data. But Graph Search threatens to make our information immediately more interesting and relevant to anyone in our network who's looking for it.
Of course, the tool is only as good as the data it searches. Many Facebook users have let their profile information grow stale over time. Other profile information is encoded in social and site-wide norms. Many users who are, in fact, single, do not list this status; they simply leave their information blank. The number of Graph Search examples targeting single users leads me to believe we're about to see a wave of relationship status update changes.
Facebook's Search team consists of more than 50 people. It's led by Lars Rasmussen, who started what became Google Maps.
How It WorksThe nodes (including users, pages, networks and groups) and edges (their relationships) can be directional and non-directional; there can also be more than one edge connecting two nodes. Other nodes are entered by the user but can't be edited (e.g., the university you attended, the course you took). There are hundreds of types of nodes and thousands of types of edge. There are over a billion user nodes in the graph, 240 billion photo nodes, and 1 trillion edges (the lines that connect nodes). It's an enormous amount of information to query.
From an ontological perspective this is interesting. Facebook has created hundreds of types of objects, or created the possibilities for hundreds of types of objects, which are all related to each other through different kinds of edges.
When you search on the web, you enter text, and get a list of links; you go through that list of links until you find the answer. So you as a user define the query and search for the result. Graph search is different because the dropdown gives you a list of possible queries (TypeAhead) and before you press search, you have already defined and structured the search query. Facebook says this is more streamlined and efficient than Google's "suggested search," because it defines the type of information you're looking for before you search.
Potential queries include: "Bars visited by my friends who like beer" to "Videos of TV shows liked by people who like things I like" would work, but "My friends who are tall" or "Weather in Boston" would not work, because the information isn't in the graph.
For example, "Bars visited by my friends who like beer" tells Facebook to search for bars (a given object in a type) that have been visited (checked-in to) by my friends (a relationship indicated by type of edge) who like (a relationship indicated by a type of edge) beer (an object which can be liked)." But "My friends who are tall" doesn't, because while "my friends" is a valid type of relationship, "tall" is neither a type of object nor a type of relationship defined, collected, or stored by Facebook.
As you can see, the engine attempts to define queries in semantic sentences that make sense to human beings. Facebook uses a context-free grammar (CFG) to define what users can search for, and help the system understand what they mean.
In order to work out what the user is searching for, the Graph Search needs a sophisticated parser. A parser matches natural language tokens from the grammar, and words that may match names of users, places or things. The parser understands distance between objects, and returns results in this order.
Graph Search also includes a simple web search, powered by Bing, but this does not use the same process as the rest of Graph Search. Real, live linguists helped with this grammar.
The URL of the page shows the query that the system is making, e.g., facebook.com/search/me/pages-liked/likers.
Facebook is trying to make the experience not feel like web search -- instead of getting a list of possible answers, you're getting more structured results. Visually, it moves Facebook in a similar direction as Twitter and Google's cards metaphor. In a page of search results of profile pages, the first attribute listed under the person's name is the search query you made (such as Matt Stempeck, "likes pages that I like", followed by the rest of his information).
The key conceptual difference between web search and Graph Search, implies the Facebook representative who spoke at the event, is that while web search is informed by both text (natural language) and relationships between pages (links), Graph Search is informed by relationships between types of objects. Because so many things in Facebook are conceptualized as types of objects, what Graph Search can do is query the relationships which exist between those objects, human or non-human. This is distinguished from, say, Google Search, which queries across the content associated through relationships (links).
For example, a query in Facebook Graph works something like:
-- friends(me) -> friends(711030) (where 711030 is my user ID)
-- photos-tagged(me) -> photos-tagged(711030)
-- photos-by(me)∪photos-tagged(me)-> the union of photos-by(711030) and photos-tagged(711030)
Is there a limit to how many layers of the inception social graph a search can reach? Friends of friends of friends? Yes, according to Facebook: eight (to which someone in the audience wisecracked: "Why do you need eight if we're all connected by six degrees?")
The results are privacy-aware. You won't see any Graph Search results you couldn't see otherwise. But you'll likely see information you wouldn't have otherwise found. This has resulted in the uneasy feeling some have felt following the announcement of Graph Search.
There is a "diversification" module. Facebook ran early tests and realized they did not have enough result diversity. If you searched for friends with pictures in bars, for example, it would return all the pictures of one friend in a bar, then another friend in a bar, not something that would return different types of related results. Facebook wanted search results that you're encouraged to browse through, in which search results aren't too similar to each other.
A student at the event asked why the privacy check is the final link in the chain -- wouldn't it be more efficient to do the privacy check before diversifying the results? The answer is yes. The privacy checks are done in PHP, in the site's front end. It's the most expensive part of the system.
Another student asked how many of the results processed through the system are "caught" in the final privacy checker step. The Facebook representative said that it depends on the query. If you ask "my friends," 100% of the queries will come through, since you can always see your friends. But if you ask the system "places where my friends of friends live," where location is something which frequently has privacy controls, the privacy filter may weed out well more than half of the results. It's by far the most computationally expensive part of the system -- it would make their hardware and software design and maintenance a lot easier to do if they didn't do it, but they do.
Facebook has not yet internationalized Graph Search. The rendering and parsing of the grammar currently only works in English. Negation is not supported, either. Facebook simply hides things from you if you cannot see it due to privacy settings, so double negatives could allow users to infer otherwise private information.
privacy uproar?One student pointed out that, when Facebook rolled out the News Feed product, a user uproar followed, even though it wasn't technically exposing any information that wasn't already viewable. It just made viewing this information far more convenient. Could Graph Search deliver the same scare, he asks? By way of comparison, the student pointed out that pre-Graph Search it would be very difficult to find pictures taken by friends in which (Facebook CEO) Mark Zuckerberg was tagged without being friends with Zuckerberg himself, because it would require clicking through all photos taken by all friends looking for Zuckerberg. Graph Search, on the other hand, eliminates the intermediary step, by allowing the user to directly query the relationships between objects.
The Facebook representative acknowledged this, to which another student in the audience snarkily rejoined that "yes, the point of Graph Search is to make it easier to find things."
Couldn't lazy cops just search for users in the Cambridge area who love illegal drugs? Answer: Be careful what you post on the internet.
One of the Graph Search team's next tasks is developing the mobile-specific search, which would open highly location-sensitive queries like "Friends who like baseball who are near me right now."
The most common Graph Search query so far is "Graph Search." That was unexpected, according to Facebook.
The potential of Graph Search is limited not just by potential privacy uproars, but also by computational expense. Potentially interesting queries like "Places my friends have taken the most photos" take too much server time, because counting is computationally expensive.
Matt Stempeck is a Research Assistant at the Center for Civic Media at the MIT Media Lab. He has spent his career at the intersection of technology and social change, mostly in Washington, D.C. He has advised numerous non-profits, startups, and socially responsible businesses on online strategy. Matt's interested in location, games, online tools, and other fun things. He's on Twitter @mstem.
This post originally appeared on the MIT Center for Civic Media blog.