By default, users of the Venmo payment service allow Venmo to mine their transaction data and share everything except the payment amounts. Venmo has chosen to exercise this liberty by providing a Web interface through which anyone with Internet access can download the transaction data — no authentication necessary!
It turns out that some people use the text fields in which one can document the reason for the payment and send a comment to the recipient as opportunities for other modes of discourse.
“A Privacy Researcher Uncovered a Year's Worth of Breakups and Drug Deals Using Venmo's Public Data”
Samantha Cole, Motherboard, July 17, 2018
Payment exchanges accumulate in a public feed, where people thought it was hysterical to write things like “money for drugs” or “sexual favors” for otherwise-innocuous payments. …
It's not so much the exposure of the intimate details of your life, … but that each transaction is just one data point in a massive web of knowledge companies like Venmo are building about us. And once they know who we're closely connected to, what we buy, and when, that's an immensely valuable dataset for companies to use in targeting your future decisions.
Practically all video games, apps, consoles, and platforms now collect location data, contact lists, and biometric data on players and sell it to advertisers.
“Privacy in Gaming”
N. Cameron Russell, Joel R. Reidenberg, and Sumyung Moon, Center on Law and Information Policy, Fordham Law School, March 19, 2018
There are currently many different ways that game companies collect data from users, including through hardware (cameras, sensors, and microphones), platform features (social media aspects and abilities for other user-generated content), and tracking technologies (cookies and beacons). Location data and biometric data — like facial, voice, heart rate, weight, skin response, brain activity, and eye-tracking data — is now routinely collected while gaming. In mobile gaming, requests for access to a user's contacts or address book are common. …
There may also be an interrelationship between data collection, game functionalities, and external hardware items like the Apple Watch or the smartphone device. Moreover, gaming companies have business relationships with each other. Data flows extend beyond the game and game console, and game data is often aggregated with external partners and sources. Every game and platform … examined states that game data may be shared with advertising platforms or used for advertising purposes. Although there are some avenues for opt-outs and user choice, users may have difficulty discerning the identities of third party affiliates with whom gaming companies share data even after reading the relevant privacy policies.
We have now reached the point at which it is foolish to register at most corporate Web sites, not just because they will send spam to the e-mail address you provide, but also because registration implies acceptance of the site's terms of service.
“Registering for Things on the Internet Is Dangerous These Days”
Chris Siebenmann, Chris's Wiki, May 24, 2018
In the old days, terms of service were not all that dangerous and often existed only to cover the legal rears of the service you were registering with. Today, this is very much not the case … Most ToSes will have you agreeing that the service can mine as much data from you as possible and sell it to whoever it wants. Beyond that, many ToSes contain additional nasty provisions like forced arbitration, perpetual broad copyright licensing for whatever you let them get their hands on (including eg your profile picture), and so on. …
The corollary to this is that you should assume that anyone who requires registration before giving you access to things when this is not actively required by how their service works is trying to exploit you. For example, “register to see this report” should be at least a yellow and perhaps a red warning sign. My reaction is generally that I probably don't really need to read it after all.
A corporate portrait of Palantir, which collects and mines data about people.
“Palantir Knows Everything about You”
Peter Waldman, Lizette Chapman, and Jordan Robertson, Bloomberg, April 19, 2018
If you use the Internet or interact regularly with people who do, companies like Google and Facebook have compiled dossiers on you, regardless of whether you have ever set up accounts with them or used their services.
“Facebook Is Tracking Me Even Though I'm Not on Facebook”
Daniel Kahn Gillmor, Free Future, American Civil Liberties Union, April 5, 2018
Nearly every Website you visit that has a “Like” button is actually encouraging your browser to tell Facebook about your browsing habits. Even if you don't click on the “Like” button, displaying it requires your browser to send a request to Facebook's servers for the “Like” button itself. That request includes information mentioning the name of the page you are visiting and any Facebook-specific cookies your browser might have collected. (See Facebook's own description of this process.) …
This makes it possible for Facebook to create a detailed picture of your browsing history — even if you've never even visited Facebook directly, let alone signed up for a Facebook account.
Think about most of the web pages you've visited — how many of them don't have a “Like” button? If you administer a website and you include a “Like” button on every page, you're helping Facebook to build profiles of your visitors, even those who have opted out of the social network. …
The profiles that Facebook builds on non-users don't necessarily include so-called “personally identifiable information” (PII) like names or email addresses. But they do include fairly unique patterns. Using Chromium's NetLog dumping, I performed a simple five-minute browsing test last week that included visits to various sites — but not Facebook. In that test, the PII-free data that was sent to Facebook included information about which news articles I was reading, my dietary preferences, and my hobbies.
Given the precision of this kind of mapping and targeting, “PII” isn't necessary to reveal my identity. How many vegans examine specifications for computer hardware from the ACLU's offices while reading about Cambridge Analytica? Anyway, if Facebook combined that information with the “web bug” from the email mentioned above — which is clearly linked to my name and e-mail address — no guesswork would be required.
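The mechanics Gillmor describes are simple enough to model. Here is a minimal sketch (all names, URLs, and cookie values invented) of the headers a browser effectively attaches when it fetches an embedded third-party widget such as a “Like” button:

```python
def third_party_request(page_url, stored_cookies):
    """Model the headers a browser sends when fetching an embedded
    widget (e.g., a "Like" button) from a third-party server while
    the user is viewing page_url."""
    return {
        # The page being read is disclosed via the Referer header.
        "Referer": page_url,
        # Cookies previously set for the widget's domain are replayed,
        # letting the server link this visit to earlier ones.
        "Cookie": "; ".join(f"{k}={v}" for k, v in stored_cookies.items()),
    }

# Two visits to unrelated pages carry the same tracking cookie, so the
# widget's operator can join them into a single browsing history.
visit1 = third_party_request("https://news.example/vegan-recipes",
                             {"datr": "abc123"})
visit2 = third_party_request("https://shop.example/gpu-benchmarks",
                             {"datr": "abc123"})
```

No click is required; merely rendering the page triggers the request, which is why every embedded button becomes a tracking beacon.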
If you want to target advertising effectively or persecute people for their political views or social status, building the dossiers at the vertices of the social graph is only the beginning. You can make many more reliable inferences if you identify and label the edges of the graph and study not only your target nodes' neighbors, but also their neighbors' neighbors.
“Stanford Researchers Find That Friends of Friends Reveal Hidden Online Traits”
Tom Abate, Stanford News, April 5, 2018
Researchers who have studied social media relationships have found that we tend to friend people of roughly our own age, race and political belief. … These traits are easily and accurately inferred from friendship studies. …
But not all unknown traits are easy to predict using friend studies. Gender, for instance, exhibits what researchers call weak homophily in online contexts. …
The group's new research shows that it's possible to infer certain concealed traits — gender being the first — by studying the friends of our friends.
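The friends-of-friends idea can be sketched in a few lines. This is a toy illustration — an invented graph with made-up trait labels, not the researchers' actual method — using a simple majority vote over the target's second-degree neighbors:

```python
from collections import Counter

def infer_trait(target, friends_of, known_trait):
    """Guess target's hidden trait by majority vote over the traits
    of the target's friends-of-friends."""
    votes = Counter()
    for friend in friends_of[target]:
        for fof in friends_of[friend]:
            if fof != target and fof in known_trait:
                votes[known_trait[fof]] += 1
    return votes.most_common(1)[0][0] if votes else None

# Toy graph: 'a' conceals the trait, but most of a's
# friends-of-friends share the same label.
friends_of = {
    "a": ["b", "c"],
    "b": ["a", "d", "e"],
    "c": ["a", "e", "f"],
}
known_trait = {"d": "X", "e": "X", "f": "Y"}
guess = infer_trait("a", friends_of, known_trait)  # → "X"
```

Even when a trait shows weak homophily among direct friends, signals like this can accumulate one hop further out.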
“Your Own Devices Will Give the Next Cambridge Analytica Far More Power to Influence Your Vote”
Justin Hendrix and David Carroll, MIT Technology Review, April 2, 2018
Though it's not clear if Cambridge Analytica's behavioral profiling and microtargeting had any measurable effect on the 2016 US election, these technologies are advancing quickly — faster than academics can study their effects and certainly faster than policymakers can respond. The next generation of such firms will almost certainly deliver on the promise. …
In the next few years, … we'll see the convergence of multiple disciplines, including data mining, artificial intelligence, psychology, marketing, economics, and experiential design theory. These methods will combine with an exponential increase in the number of surveillance sensors we introduce into our homes and communities, from voice assistants to internet-of-things devices that track people as they move through the day. Our devices will get better at detecting facial expressions, interpreting speech, and analyzing psychological signals.
In other words, the machines will know us better tomorrow than they do today. They will certainly have the data. While a General Data Protection Regulation is about to take effect in the European Union, the US is headed in the opposite direction. Facebook may have clamped down on access to its data, but there is more information about citizens on the market than ever before … not to mention all the data sloshing around thanks to hacks and misuse.
The “exponential increase in the number of surveillance sensors we introduce into our homes and communities” is already well along. We're past the knee of the curve and climbing the shaft of the hockey stick. Soon the only constraints will be bandwidth and network congestion, as a trillion cameras, microphones, and sensors all try to deliver their data in real time to the marketers, propagandists, spies, and law-enforcement teams poised in eager expectation.
Surveillance is essential to Facebook's business model. It collects and compiles enormous amounts of personal data on its users (and non-users), and it sells to its customers — advertisers, academics, political operatives, and others — the privilege of creating applications that collect and compile still more personal data.
In theory, Facebook doesn't actually sell its dossiers to its customers. It only licenses the data, or the right to collect data, retaining control over any further dissemination so as to maintain its ownership of its most valuable intellectual property. In practice, Facebook has no effective means of preventing its customers from copying and distributing any data they have legitimately obtained. The licenses that it relies on turn out to be quite difficult to enforce.
In 2014, a senior research associate at Cambridge University, Aleksandr Kogan, wrote a Facebook app called “thisisyourdigitallife.” Superficially, it was a personality quiz, but the people who signed up to take it gave Kogan permission to access their Facebook profiles and the Facebook profiles of the people they had friended. Facebook approved this arrangement but stipulated that the data that Kogan collected be used solely for the purpose of academic research.
Kogan agreed to this stipulation and proceeded to collect millions of Facebook profiles through the app. Instead of mining the data at Cambridge, however, he set up a company called Global Science Research and carried out his supposedly academic research there. Global Science Research had a million-dollar contract with another company, SCL Group. One of SCL's subsidiaries, SCL Elections, had recently secured funding to set up a new corporation, Cambridge Analytica, to explore the use of data-mining techniques to find reliable correlations between the personalities and “likes” of individual Facebook users on one hand and their political views and behaviors on the other. Because Kogan's research was funded, at least in part, by Cambridge Analytica, he apparently saw nothing wrong with sharing with his employers the data on which his research was based.
It's quite possible that sharing this data with a commercial enterprise violated Kogan's understanding with Facebook. It may also be a violation of UK data-protection laws, because Kogan asked the people who used his app only for their permission to collect and study their personal data, not for permission to share it with (or sell it to) third parties.
However, the only thing that prevented Cambridge Analytica from obtaining the same data directly from Facebook is that the license would probably have cost them much more money. Nothing in Facebook's notoriously lax, mutable, and labyrinthine privacy policies would have obstructed such a transaction if the price was right. Facebook's dossiers are their principal product, and selling access to them is their principal source of revenue.
Facebook now claims that Kogan and Cambridge Analytica have violated its terms of service and has closed their Facebook accounts. Lawsuits and threats of lawsuits are now flying in all directions, and some members of Congress are threatening to launch terrifying inquisitions into the monstrous abuse of the American electoral process that Cambridge Analytica supposedly perpetrated with the assistance of Kogan's data. However, there are now so many unlicensed copies of the data that there is no way to ensure that all of them will ever be erased, or even located. Now that arbitrarily large amounts of data can be copied quickly and inexpensively, and now that multiple backups of valuable data are the norm, the idea of restricting the distribution of data through licensing is a non-starter. It can't possibly work.
There's another reason why the lawsuits and the fulminations of members of Congress are idle, from the point of view of ordinary Facebook users (and non-users): Surveillance is essential to Facebook's business model. If Facebook stopped collecting and compiling personal data and erased its current stores, it would quickly go bankrupt. But once the dossiers exist, it is inevitable that they will be copied and disseminated, and once they are copied and disseminated, it is impossible ever to recover and destroy all of the copies, data-protection and privacy laws notwithstanding.
Instead (as Mark Zuckerberg's Facebook post on this subject makes clear), Facebook will continue to build up massive dossiers as fast as it can and will continue to use the information in those dossiers as it sees fit. The steps that Zuckerberg describes as “protecting users' data” are all designed to protect Facebook's proprietary interest in everyone's personal data, to prevent or at least obstruct the propagation of the dossiers to unworthy outsiders.
“Suspending Cambridge Analytica and SCL Group from Facebook”
Paul Grewal, Facebook Newsroom, March 16, 2018
“How Trump Consultants Exploited the Facebook Data of Millions”
Matthew Rosenberg, Nicholas Confessore, and Carole Cadwalladr, The New York Times, March 17, 2018
“Cambridge Analytica Responds to Facebook Announcement”
Cambridge Analytica, March 17, 2018
“‘I Made Steve Bannon's Psychological Warfare Tool’: Meet the Data War Whistleblower”
Carole Cadwalladr, The Guardian, March 18, 2018
“Cambridge Analytica's Ad Targeting Is the Reason Facebook Exists”
Jason Koebler, Motherboard, March 19, 2018
Though Cambridge Analytica's specific use of user data to help a political campaign is something we haven't publicly seen on this scale before, it is exactly the type of use that Facebook's platform is designed for, has facilitated for years, and continues to facilitate every day. At its core, Facebook is an advertising platform that makes almost all of its money because it and the companies that use its platform know so much about you.
Facebook continues to be a financially successful company precisely because its platform has enabled the types of person-specific targeting that Cambridge Analytica did. …
“The incentive is to extract every iota of value out of users,” Hartzog [Woodrow Hartzog, Professor of Law and Computer Science at Northeastern University] said. “The service is built around those incentives. You have to convince people to share as much information as possible so you click on as many ads as possible and then feel good about doing it. This is the operating ethos for the entire social internet.”
“Facebook's Surveillance Machine”
Zeynep Tufekci, The New York Times, March 19, 2018
Billions of dollars are being made at the expense of our public sphere and our politics, and crucial decisions are being made unilaterally, and without recourse or accountability.
“Then Why Is Anyone Still on Facebook?”
Wolf Richter, Wolf Street, March 20, 2018
So now there's a hue and cry in the media about Facebook, put together by reporters who are still active on Facebook and who have no intention of quitting Facebook. There has been no panicked rush to “delete” accounts. There has been no massive movement to quit Facebook forever. Facebook does what it does because it does it, and because it's so powerful that it can do it. A whole ecosystem around it depends on the consumer data it collects. …
Yes, there will be the usual ceremonies … CEO Zuckerberg may get to address the Judiciary Committee in Congress. The questions thrown at him for public consumption will be pointed. But behind the scenes, away from the cameras, there will be the usual backslapping between lawmakers and corporations. Publicly, there will be some wrist-slapping and some lawsuits, and all this will be settled and squared away in due time. Life will go on. Facebook will continue to collect the data because consumers continue to surrender their data to Facebook voluntarily. And third parties will continue to have access to this data. …
People who are still active on Facebook cannot be helped. They should just enjoy the benefits of having their lives exposed to the world and serving as a worthy tool and resource for corporate interests, political shenanigans, election manipulators, jealous exes, and other facts of life.
“Facebook Sued by Investors over Voter-Profile Harvesting”
Christie Smythe and Kartikay Mehrotra, Bloomberg Technology, March 20, 2018
“The Researcher Who Gave Cambridge Analytica Facebook Data on 50 Million Americans Thought It Was ‘Totally Normal’”
Kaleigh Rogers, Motherboard, March 21, 2018
Kogan said he was under the impression that what he was doing was completely normal.
“What was communicated to me strongly was that thousands and maybe tens of thousands of apps were doing the exact same thing and that this was a pretty normal use case and a normal situation for usage of Facebook data,” Kogan said.
“Facebook's Mark Zuckerberg Vows to Bolster Privacy amid Cambridge Analytica Crisis”
Sheera Frenkel and Kevin Roose, The New York Times, March 21, 2018
“It's Too Late”
Jason Koebler, Motherboard, March 21, 2018
I must have missed this article when it first appeared.
“They're Watching You at Work”
Don Peck, The Atlantic, December 2013
An even more thought-provoking passage deals with data mining as a method of distinguishing candidates for software-development positions:
Torrents of data are routinely collected by American companies and now sit on corporate servers, or in the cloud, awaiting analysis. Bloomberg reportedly logs every keystroke of every employee, along with their comings and goings in the office. The Las Vegas casino Harrah's tracks the smiles of the card dealers and waitstaff on the floor (its analytics team has quantified the impact of smiling on customer satisfaction). E-mail, of course, presents an especially rich vein to be mined for insights about our productivity, our treatment of co-workers, our willingness to collaborate or lend a hand, our patterns of written language, and what those patterns reveal about our intelligence, social skills, and behavior. As technologies that analyze language become better and cheaper, companies will be able to run programs that automatically trawl through the e-mail traffic of their workforce, looking for phrases or communication patterns that can be statistically associated with various measures of success or failure in particular roles.
This past summer, I sat in on a sales presentation by Gild, a company that uses people analytics to help other companies find software engineers. I didn't have to travel far: Atlantic Media, the parent company of The Atlantic, was considering using Gild to find coders. …
The company's algorithms begin by scouring the Web for any and all open-source code, and for the coders who wrote it. They evaluate the code for its simplicity, elegance, documentation, and several other factors, including the frequency with which it's been adopted by other programmers. For code that was written for paid projects, they look at completion times and other measures of productivity. Then they look at questions and answers on social forums such as Stack Overflow, a popular destination for programmers seeking advice on challenging projects. They consider how popular a given coder's advice is, and how widely that advice ranges.
The algorithms go farther still. They assess the way coders use language on social networks from LinkedIn to Twitter; the company has determined that certain phrases and words used in association with each other can distinguish expert programmers from less skilled ones. Gild knows these phrases and words are associated with good coding because it can correlate them with its evaluation of open-source code, and with the language and online behavior of programmers in good positions at prestigious companies.
Here's the part that's most interesting: having made those correlations, Gild can then score programmers who haven't written open-source code at all, by analyzing the host of clues embedded in their online histories. They're not all obvious, or easy to explain. Vivienne Ming, Gild's chief scientist, told me that one solid predictor of strong coding is an affinity for a particular Japanese manga site.
Why would good coders (but not bad ones) be drawn to a particular manga site? By some mysterious alchemy, does reading a certain comic-book series improve one's programming skills? “Obviously, it's not a causal relationship,” Ming told me. But Gild does have 6 million programmers in its database, she said, and the correlation, even if inexplicable, is quite clear. …
Gild's CEO, Sheeroy Desai, told me that he believes his company's approach can be applied to any occupation characterized by large, active online communities, where people post and cite individual work, ask and answer professional questions, and get feedback on projects.
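What Gild is described as doing amounts to ordinary correlation hunting over behavioral features. Here is a toy sketch (all data invented) of the core computation: correlating a binary signal, such as “frequents a particular site,” against a continuous skill score:

```python
def pearson(xs, ys):
    """Pearson correlation coefficient; with one binary variable this
    is the point-biserial correlation."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sum((x - mx) ** 2 for x in xs) ** 0.5
    sy = sum((y - my) ** 2 for y in ys) ** 0.5
    return cov / (sx * sy)

# visits_site: 1 if the programmer frequents the site, else 0.
# code_quality: some score derived from evaluating open-source code.
visits_site  = [1, 1, 1, 0, 0, 0]
code_quality = [9.1, 8.7, 8.9, 5.2, 6.0, 5.5]
r = pearson(visits_site, code_quality)
# A strong positive r is exactly the "inexplicable but clear"
# correlation Ming describes: predictive, yet not causal.
```

With six million programmers in the database, even weak correlations of this kind become statistically unmistakable — which is precisely why they can be mined without ever being explained.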
It cheers me somewhat to report that Gild appears to have gone out of business in 2016.