Applications of natural-language processing typically rely on libraries that implement standard tasks such as tokenization and part-of-speech tagging. This article describes the functions of such components, focusing on the ones that aren't completely terrible at the jobs they claim to do.
“Natural Language Processing Is Fun!”
Adam Geitgey, Medium, July 18, 2018
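For readers unfamiliar with the two tasks mentioned above, here is a minimal, hand-rolled sketch of tokenization and rule-based part-of-speech tagging. This is a toy, not how libraries such as spaCy or NLTK actually work; the regular expression and the tag rules are illustrative assumptions only, and real taggers are statistical.

```python
import re

def tokenize(text):
    """Split text into word and punctuation tokens (toy rule:
    a token is a run of word characters or a single punctuation mark)."""
    return re.findall(r"\w+|[^\w\s]", text)

# A tiny rule-based tagger: look up a few common words, then fall back
# on crude suffix and capitalization heuristics.
CLOSED_CLASS = {"the": "DET", "a": "DET", "is": "VERB", "fun": "ADJ"}

def tag(tokens):
    tagged = []
    for tok in tokens:
        low = tok.lower()
        if low in CLOSED_CLASS:
            tagged.append((tok, CLOSED_CLASS[low]))
        elif not tok[0].isalnum():
            tagged.append((tok, "PUNCT"))
        elif low.endswith("ing"):
            tagged.append((tok, "VERB"))
        elif tok[0].isupper():
            tagged.append((tok, "PROPN"))
        else:
            tagged.append((tok, "NOUN"))
    return tagged

print(tag(tokenize("Natural language processing is fun!")))
```

Even this toy shows why the components matter: every downstream task (parsing, entity recognition, sentiment) consumes the tokenizer's and tagger's output, so their errors propagate.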
But of course. It's not a bug — it's a feature.
“Top Voting Machine Vendor Admits It Installed Remote-Access Software on Systems Sold to States”
Kim Zetter, Motherboard, July 17, 2018
The nation's top voting machine maker has admitted in a letter to a federal lawmaker that the company installed remote-access software on election-management systems it sold over a period of six years, raising questions about the security of those systems and the integrity of elections that were conducted with them.
In a letter sent to Sen. Ron Wyden (D-OR) in April and obtained recently by Motherboard, Election Systems and Software acknowledged that it had “provided pcAnywhere remote connection software … to a small number of customers between 2000 and 2006,” which was installed on the election-management system ES&S sold them.
The statement contradicts what the company told me and fact checkers for a story I wrote for the New York Times in February. At that time, a spokesperson said ES&S had never installed pcAnywhere on any election system it sold. …
ES&S customers who had pcAnywhere installed also had modems on their election-management systems so ES&S technicians could dial into the systems and use the software to troubleshoot, thereby creating a potential point of entry for hackers as well.
In May 2006 in Allegheny County, Pennsylvania, ES&S technicians used the pcAnywhere software installed on that county's election-management system for hours trying to reconcile vote discrepancies in a local election, according to a report filed at the time. And in a contract with Michigan, which covered 2006 to 2009, ES&S discussed its use of pcAnywhere and modems for this purpose. …
In 2006, the same period when ES&S says it was still installing pcAnywhere on election systems, hackers stole the source code for the pcAnywhere software …
Security researchers discovered a critical vulnerability in pcAnywhere that would allow an attacker to seize control of a system that had the software installed on it, without needing to authenticate themselves to the system with a password. And other researchers with the security firm Rapid7 scanned the internet for any computers that were online and had pcAnywhere installed on them and found nearly 150,000 were configured in a way that would allow direct access to them. …
In its letter to Wyden, ES&S defended its installation of pcAnywhere, saying that during the time it installed the software on customer machines prior to 2006, this was “considered an accepted practice by numerous technology companies, including other voting system manufacturers.”
That's the problem, all right. My guess is that installing remote-access backdoors is still a universal practice among makers of proprietary election-management devices, though perhaps “accepted” is no longer the right word for it. There's an obvious need for remote access in this day and age: Without it, how would the managers of elections be able to determine their outcomes?
By default, users of the Venmo payment service allow Venmo to mine their transaction data and share everything except the payment amounts. Venmo has chosen to exercise this liberty by providing a Web interface through which anyone with Internet access can download the transaction data — no authentication necessary!
It turns out that some people use the text fields in which one can document the reason for the payment and send a comment to the recipient as opportunities for other modes of discourse.
“A Privacy Researcher Uncovered a Year's Worth of Breakups and Drug Deals Using Venmo's Public Data”
Samantha Cole, Motherboard, July 17, 2018
Payment exchanges accumulate in a public feed, where people thought it was hysterical to write things like “money for drugs” or “sexual favors” for otherwise-innocuous payments. …
It's not so much the exposure of the intimate details of your life, … but that each transaction is just one data point in a massive web of knowledge companies like Venmo are building about us. And once they know who we're closely connected to, what we buy, and when, that's an immensely valuable dataset for companies to use in targeting your future decisions.
The Institute of Electrical and Electronics Engineers has issued a straightforward statement endorsing the use of strong encryption both by governments and by individuals and opposing requirements to insert backdoors into software packages that implement strong encryption.
“In Support of Strong Encryption”
IEEE Board of Directors, IEEE, June 24, 2018
Exceptional access mechanisms would create risks by allowing malicious actors to exploit weakened systems or embedded vulnerabilities for nefarious purposes. Knowing that exceptional access mechanisms exist would allow malicious actors to focus on finding and exploiting them. Centralized key escrow schemes would create the risk that an adversary would have an opportunity to compromise security of all participants, including those who were not specifically targeted. …
Efforts to constrain strong encryption or introduce key escrow schemes into consumer products can have long-term negative effects on the privacy, security, and civil liberties of the citizens so regulated. Encryption is used worldwide, and not all countries or institutions would honour the policy-based protections that exceptional access mechanisms would require. A purpose that one country might consider lawful and in its national interest could be considered by other countries to be illegal or in conflict with their standards and interests.
Some researchers at Google Brain have discovered a technique by which a black-box decider, successfully trained for one task, can be made to perform an unrelated computation: the inputs for that computation are embedded in the input to the black-box decider, and the result is extracted from its output.
One of the proof-of-concept experiments that the paper describes uses ImageNet for recognition of handwritten numerals. The inputs for the numeral-recognition problem are small images (twenty-eight pixels high and twenty-eight pixels wide), and the task is to determine which of the ten decimal numerals each input represents. Normally ImageNet takes much larger, full-color images as inputs and outputs a tag identifying what's in the picture, chosen from a list of a thousand fixed tags. Numerals aren't included in that list, so ImageNet never outputs a numeral. It's not designed to be a recognizer for handwritten numerals.
But ImageNet can be coopted. The researchers took the first ten tags from the ImageNet tag list and associated them with numerals (tench ↦ 0, goldfish ↦ 1, etc.). Then they set up an optimization problem: Find the pattern of pixels making up a large image so as to maximize ImageNet's success in “interpreting” the images that result when each small image from the training set for the numeral-recognition task is embedded at the center of the large image. An interpretation counts as correct, for this purpose, if ImageNet returns the tag that is mapped to the correct numeral.
The pixel pattern that emerges from this optimization problem looks like video snow; it doesn't have any human-recognizable elements. When one of the small handwritten numerals is embedded at the center, the image looks to a human being like a white handwritten numeral in a small black square surrounded by this random-looking video snow. But if the numeral is a 9, ImageNet thinks that it looks very much like an ostrich, whereas if it's a 3, then ImageNet thinks that it depicts a tiger shark.
Note that ImageNet is not being retrained here and isn't doing anything that it wouldn't do right out of the box. The “training” step here is just finding the solution to the optimization problem: What pattern of pixels will most effectively trick ImageNet into doing the computation we want it to do when the input data for our problem is embedded into that pattern of pixels?
The researchers call the optimized pixel patterns “adversarial programs.”
Besides the numeral-recognition task, the researchers were also able to trick ImageNet — six different variants of ImageNet, in fact — into doing two other standard classification tasks, just by finding optimal pixel patterns — adversarial programs — in which to embed the input data.
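The mechanics described above can be sketched in a few lines of NumPy. This is a simplified illustration, not the paper's implementation: the input size, the stand-in classifier, and the function names are assumptions, and the adversarial program here is just random noise, whereas the real one is found by gradient descent through the frozen network. (The first ten ImageNet tags used in the remapping are real.)

```python
import numpy as np

H, W = 224, 224   # input size the "black-box" classifier expects (assumed)
h, w = 28, 28     # size of the small handwritten-numeral images

def embed(small, program):
    """Place a small grayscale image at the center of the adversarial
    program's pixel pattern; only the surrounding pixels belong to
    the program."""
    big = program.copy()
    top, left = (H - h) // 2, (W - w) // 2
    big[top:top + h, left:left + w] = small
    return big

# Remap the classifier's first ten tags to numerals (tench -> 0, etc.).
TAGS = ["tench", "goldfish", "great white shark", "tiger shark",
        "hammerhead", "electric ray", "stingray", "cock", "hen", "ostrich"]
tag_to_digit = {t: d for d, t in enumerate(TAGS)}

def reprogrammed_classify(small, program, black_box):
    """Run the frozen black box on the composite image and reinterpret
    its output tag as a digit. `black_box` stands in for a trained
    network that is never retrained."""
    tag = black_box(embed(small, program))
    return tag_to_digit.get(tag)

# In the paper, `program` is optimized: maximize the probability of the
# remapped tag over the numeral training set, holding the network's
# weights fixed. Here it is merely random "video snow."
program = np.random.rand(H, W)
```

The key design point is that all of the adaptation lives in the border pixels and the output remapping; the network itself is untouched, which is what makes the attack work against deployed black boxes.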
“Adversarial Reprogramming of Neural Networks”
Gamaleldin F. Elsayed, Ian Goodfellow, and Jascha Sohl-Dickstein, arXiv, June 28, 2018
“Tech's ‘Dirty Secret’: App Developers Sift Through Your Gmail”
Douglas MacMillan, The Wall Street Journal, July 2, 2018
But the Internet giant continues to let hundreds of outside software developers scan the inboxes of millions of Gmail users who signed up for email-based services offering shopping price comparisons, automated travel-itinerary planners or other tools. Google does little to police those developers, who train the computers — and, in some cases, employees — to read their users' emails …
Letting employees read user emails has become “common practice” for companies that collect this type of data, says Thede Loder, the former chief technology officer at eDataSource Inc. … He says engineers at eDataSource occasionally reviewed emails when building and improving software algorithms.
“Some people might consider that to be a dirty secret,” says Mr. Loder. “It's kind of reality.”
“A Chinese-Style Digital Dystopia Isn't As Far Away As We Think”
Matt Stoller, Buzzfeed, June 27, 2018
We accept price discrimination all the time; going to the movies and getting a senior discount is price discrimination. But in that case, the decision of how to discriminate is done by class; it is publicly posted; and everyone accepts that, in this case, seniors get a discount. It is a public decision to discriminate.
Discriminating on an individual level is different and allows for powerful exploitation and manipulation of the citizen. In areas with first-degree price discrimination, like car insurance or credit cards, there are often gender- or race-based pricing choices. With increasing datafication of society, we can see this increasingly organized to the level of the individual.
An airline could, for instance, analyze your email for the words “death in the family” and “travel,” look at your credit limit, and then offer you a price based on this information. Or imagine a group of companies putting together a common list of troublemakers, perhaps negative online reviewers or commenters or consumers who frequently return items. All of a sudden, for no obvious reason, someone who returns an item to one store might find that prices on a host of socially [essential] goods have [gone] up.
Corporations generally deny they do anything like this or even that they can. But …
We are now in a totally unregulated world of lawless web giants who operate as the core infrastructure for our society. They can use their data and power to discriminate and exploit, and the strategy now for companies like AT&T is to emulate them, or die. And the deep links that intelligence agencies have with these giants suggest that this power can, with a flip of a few switches, be easily weaponized by the state.
In its surveillance of American citizens, the National Security Agency is supposed to be constrained by the Foreign Intelligence Surveillance Act, which specifies exactly which violations of the Fourth Amendment are notionally permitted and which ones are doubly and explicitly prohibited by Congress.
The NSA, being above the law, ignores all such constraints whenever it is convenient for it to do so. But the Foreign Intelligence Surveillance Act stipulates that the NSA is subject to a feeble kind of judicial oversight and review, by a body called the Foreign Intelligence Surveillance Court, which has managed to detect a few of the NSA's numerous modes of violation and to issue carefully phrased reprimands.
This article attempts to enumerate the known violations and points out that, taken together, they demonstrate that the NSA operated illegally from 2004 through 2018, without interruption.
“NSA — Continually Violating FISA Since 2004”
Marcy Wheeler, emptywheel, June 28, 2018
“Deceived by Design: How Tech Companies Use Dark Patterns to Discourage Us from Exercising Our Rights to Privacy”
Forbrukerrådet, June 27, 2018
“Thermostats, Locks and Lights: Tools of Domestic Abuse”
Nellie Bowles, The New York Times, June 23, 2018
There are also great possibilities here for landlords and managers of residential-care facilities to drive out tenants/residents who complain too much or fall behind in the rent.
Abusers — using apps on their smartphones, which are connected to the internet-enabled devices — would remotely control everyday objects in the home, sometimes to watch and listen, other times to scare or show power. Even after a partner had left the home, the devices often stayed and continued to be used to intimidate and confuse.
For victims and emergency responders, the experiences were often aggravated by a lack of knowledge about how smart technology works, how much power the other person has over the devices, how to legally deal with the behavior and how to make it stop. …
Those at help lines said more people were calling in the last 12 months about losing control of Wi-Fi-enabled doors, speakers, thermostats, lights and cameras. Lawyers also said they were wrangling with how to add language to restraining orders to cover smart home technology. …
Legal recourse may be limited. Abusers have learned to use smart home technology to further their power and control in ways that often fall outside existing criminal laws.