What the NSA leak means for Google’s Big Data programs







Over the last few weeks, we have witnessed how the border between online paranoia and a realistic panopticon has largely dissolved. While the documents revealed by Edward Snowden suggest that the NSA has had direct access to the user data of leading internet companies such as Apple, Google, Facebook and Microsoft, all of them deny any knowing involvement in the Prism program, which has recently become the subject of scandal. There are several obvious reasons not to admit to such a strategy, which is seen as a betrayal of users’ trust and a severe infringement of their privacy. Search engines such as Google in particular build their entire business model not only on a sufficiently large number of people using the search service; they also rely on their users’ ‘natural’ search behaviour, which should not be influenced by any suspicion of being monitored. Losing users’ trust does not only result in declining user numbers: it may also alter user behaviour to the point where it can no longer be reliably used to generate representative and predictive data of the kind provided by Google Trends or used in programs such as Google Flu. Once users hesitate to enter particular sensitive or personal search requests, the resulting data can serve only with strong reservations as a reliable, significant (marketing) database.

Among recent developments at Google Inc., the Trends service has gained significant attention: it is no longer of interest merely to marketers, but has come to appeal to ordinary, private users as well. The service offers nicely laid out diagrams showing how the number of search queries for given phrases has developed over time. It also reveals geographic concentrations as well as common co-occurrences of terms. However, Google Trends does not only make it possible to observe trending topics and experiment with semantic correlations; it also serves as an indicator of the access and power Google derives from the hidden knowledge in its search logs. As I have mentioned in an earlier post, Google Flu Trends gave us a foretaste of the analytic potential which emerges from multiplicities of search requests by people all around the globe. The vague notion of ‘multiplicities’, however, points to a first irritating aspect of the data provided by Google Trends: the user does not actually learn what kind of numbers form the basis of Google’s Trends visualisations. The quantity of search requests is only indicated on a relative scale from 0 to 100, and the values necessarily change once you enter several terms and compare them with each other. Seeing that such a database has already been used for scientific purposes (this does not apply to Google Flu Trends, which is based on actual figures that have not been made public), one has to ask to what extent such studies are actually working blind. The process by which the data are collected remains a black box to them, and any correlation, such as the one stated by Preis, Moat and Stanley for stock exchange developments in 2012, can merely be treated as a provisional confirmation.
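To make the point about the relative scale concrete, here is a minimal sketch in Python. It assumes that the scale simply sets the largest value among all compared terms to 100; Google's actual procedure is not disclosed in detail and may involve further steps. The function name `scale_to_100` and all query counts are invented for illustration.

```python
def scale_to_100(series_by_term):
    """Rescale raw counts so the largest value across all terms becomes 100.

    Assumption: a plain peak-based rescaling, not Google's documented method.
    """
    peak = max(max(values) for values in series_by_term.values())
    return {
        term: [round(100 * v / peak) for v in values]
        for term, values in series_by_term.items()
    }


# Hypothetical raw weekly query counts (made up for this example).
raw_flu = [200, 400, 800, 600]
raw_weather = [4000, 3600, 3800, 4000]

# The same term, once on its own and once compared with a more popular term.
alone = scale_to_100({"flu": raw_flu})
compared = scale_to_100({"flu": raw_flu, "weather": raw_weather})

print(alone["flu"])     # [25, 50, 100, 75]
print(compared["flu"])  # [5, 10, 20, 15]
```

The raw counts for the first term are identical in both calls, yet the displayed values differ by a factor of five. Someone who only sees the scaled curves cannot recover the absolute numbers behind them, which is exactly the blind spot described above.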

Data scandals such as Snowden’s revelation of NSA documents are crucial moments for observing how heavily the big data produced by search engines such as Google depend on users’ behaviour. As early as July 2012, Moritz Tremmel diagnosed a shift in users’ choices of search engines. Google’s retargeting strategy and earlier (as well as less far-reaching) data scandals were considered reasons for users’ hesitation to use market-leading search engines and for their drift towards alternatives. More recently, too, DuckDuckGo, Blekko and Ixquick have registered significantly higher numbers of search queries and unique visitors. DuckDuckGo in particular, which has since been called the “anti-google”, attracts users with its promise to “Search anonymously. Find instantly”.


In this sense, losing users’ trust also affects their behaviour and their choice of search engine, and that behaviour has in turn become a main basis for the services and income streams of companies such as Google. The leading search engine provider relies on its users’ uninhibited search behaviour, which produces data that do not deviate from their actual interests or needs. Once user trust is damaged to the extent that users, for example, turn to alternative search engines for sensitive or political topics, the data produced by the search engine they have turned away from are necessarily distorted.

While keeping the workings of its algorithms secret used to be one of Google’s main strategies for ensuring that its data could not be manipulated (as was temporarily the case in examples such as those provided by Brent Payne), it is now users’ (dis)trust which might develop into the next crucial source of interference. Developments such as the Tor network, but also the most recent Microsoft campaign, which assures users of control over their data and implies a promise to bring back their privacy, indicate users’ increasing awareness that their usage immediately produces data which they may not want to reveal. In this context, the most recent NSA scandal acts as a final confirmation that you are not necessarily wearing a tin foil hat when you assume that your online life is not as private as you would like it to be.
