Data Digest № 015

Data Digest ¦ August 11th, 2019, 11:00 pm

Hey there, and welcome to the 15th edition of the Data Digest, where I offer a weekly summary of the most important happenings in the data industry. This week in review: HYP3R gets hyped on your personal data, security experts expose flaws in GDPR, Twitter admits to a data mishap, data silos halt AI progress, and more. Enjoy!

Instagram’s “Trusted Partner” HYP3R Has Been Caught Scraping Millions of Profiles

San Francisco-based startup HYP3R, until recently a “preferred marketing partner” of Facebook-owned Instagram, has been caught scraping huge amounts of public user data. The firm has publicly admitted that it has been hoarding “a unique dataset of hundreds of millions of the highest value consumers in the world”, revealing that “90% of its data came from Instagram.” Business Insider wrote that HYP3R took “advantage of an Instagram security lapse” that allowed users who weren’t logged in to view posts from public location pages. Using that access, the company created geofenced locations, harvested “every public post tagged with that location on Instagram,” and stored them indefinitely. It even built a tool to download Instagram Stories, so that posts meant to auto-delete after 24 hours were essentially made permanent. With this data, the firm stitched together detailed records of millions of users’ locations, personal bios, and photographs posted to Stories, which let it infer accurate interest and behavioral patterns and target those users effectively with ads.

Facebook’s lax privacy policies allowed this data harvesting to continue under its nose for over a year; only after the activity was reported did it issue HYP3R a cease and desist and kick the company off the platform. More than a year after the Cambridge Analytica scandal, it comes as no surprise that Facebook is still uncovering important privacy lapses, which highlights the widespread and urgent need for open platforms to perform due diligence when it comes to users’ personal information. The total volume of user data HYP3R scraped from Instagram remains unclear. A former HYP3R employee disclosed, “It takes very little effort for Instagram to protect the location accessed by HYP3R, why they haven’t done it remains a mystery.” Which raises the question: how many others are still getting away with the same thing?

GDPR Privacy Law Exploited To Reveal Personal Data

James Pavur, a University of Oxford-based researcher and security expert, presented worrying findings from a curious experiment at the Black Hat conference in Las Vegas. The experiment was intended to replicate an attack that could be carried out by someone starting with the details found on a basic LinkedIn page or similar public profile. He contacted over 80 firms of different sizes based in the UK and US to see how they would handle a “right of access” request made in someone else’s name. In each case, he asked for all the details they held on his fiancée. During the experiment, he managed to expose a total of 60 distinct pieces of personal information about his partner. His findings revealed that a staggering 24% of the 83 firms supplied personal information without even verifying the requester’s identity, 16% requested an easily forged type of ID that he did not provide, and only 5% said they had no data to share. The study is the first of its kind to reveal how negligent firms’ security can be when a request cites an EU privacy law, and it highlights the lack of clearly defined best practices for keeping up with stricter privacy regulations. The implications of this ground-breaking research are yet to be seen. Industry-wide best practices should be established to guide companies’ compliance efforts, and stricter security measures clearly need to be implemented within firms from the top down.

Black Hat: GDPR privacy law exploited to reveal personal data

One in four firms holding a test subject's data released it to her partner without her permission.

Twitter Sharing User Data With Advertisers, Even After Users Explicitly Tell Them Not To

Twitter has admitted to sharing users’ personal data with advertisers regardless of whether the users gave permission. Whether this will attract regulatory attention remains to be seen. The GDPR requires companies to report data breaches to regulators within 72 hours of becoming aware of them, so the case will depend on how long ago Twitter found the bugs. The GDPR also provides for fines for confirmed data protection violations. Twitter had already revealed bugs affecting the way it shares personal data back in May 2019, when it disclosed that it had been accidentally sharing users’ location data during Real Time Bidding (RTB) auctions. Twitter now states that it “may have shared certain data (e.g., country code; if you engaged with the ad and when; information about the ad, etc)” with ad measurement and advertising partners. If social media companies are able to get away with these security ‘mishaps’ without so much as a slap on the wrist, what’s the point of putting the regulations there in the first place?

Twitter ‘fesses up to more adtech leaks – TechCrunch

Twitter has disclosed more bugs related to how it uses personal data for ad targeting that means it may have shared users data with advertising partners even when a user had expressly told it not to. Back in May the social network disclosed a bug that in certain conditions resulted in an account’s …

Data Needs To Be Controlled By Users, But Remain Usable To Others

Data silos in the field of medicine are impeding scientific innovation by preventing data from being shared across institutions. This creates huge problems for biomedical scientists like Robert Chang, a Stanford ophthalmologist, and James Zou, a professor at Stanford, who aptly states that “there is a gap between the policy community and the technical community on what exactly it means to value data.” Patients and doctors alike would almost certainly be more willing to share data knowing it won’t be visible to anyone but themselves. Embracing a privacy-conscious design that enables data to be used for machine learning models, whether in medicine, technology, or elsewhere, would significantly help medical innovation. For data ownership to become a reality, data needs to be controlled by users but still usable by others in a privacy-preserving manner. This is possible through blockchain-based platforms that encrypt and anonymize data, like Datawallet.

AI Needs Your Data—and You Should Get Paid for It

A new approach to training artificial intelligence algorithms involves paying people to submit medical data, and storing it in a blockchain-protected system.

FBI Surveillance Proposal Interferes With FTC Scrutiny of Facebook’s Privacy Policies

Last week’s tragic mass murders in Dayton, Ohio, and El Paso, Texas, have prompted politicians and law enforcement officials to call on social media companies like Facebook and Twitter to help prevent terrorism. However, Facebook’s past privacy struggles, such as the Cambridge Analytica scandal, have set up a conflict of interest. The FBI and other law enforcement agencies are seeking to gather publicly available data which, combined with other data sources, would be used to build detailed profiles of users and track their social lives. While Facebook is undergoing FTC scrutiny of its privacy policy, the Federal Bureau of Investigation is soliciting proposals from outside vendors for a contract to pull vast quantities of public data, yet there appears to be scant evidence that social media platforms are able to predict sudden outbursts of violence. Given the sensitivity of Facebook’s recent privacy settlement with the Federal Trade Commission, which includes “preventing the misuse of even publicly viewable data of the sort that the FBI wants to capture from Facebook, Instagram and Twitter”, the two sides are going to have a tough time coming up with a compromise.

FBI Surveillance Proposal Sets Up Clash With Facebook

An effort by the FBI to more aggressively monitor social media for threats sets up a clash with Facebook’s privacy policies and its attempts to comply with its recent FTC settlement.

Campaign Group Exposed 6.2 Million Americans’ Email Addresses

The email addresses of 6.2 million Americans were left on an exposed server by an organization seeking to help elect Democratic candidates to the US Senate. According to security firm UpGuard, the data came from those “who had opted out or should otherwise be excluded” from the organization’s marketing. The spreadsheet, titled “EmailExcludeClinton.csv”, was found in a similarly named, unprotected collection of data in the cloud that had no password. The file was uploaded in 2010, a year after former Democratic senator and presidential candidate Hillary Clinton, after whom the file is believed to be named, became secretary of state. Many questions arise when political data is found exposed almost 10 years after it was created.

Democratic Senate campaign group exposed 6.2 million Americans’ emails – TechCrunch

A political campaign group working to elect Democratic senators left on an exposed server a spreadsheet containing the email addresses of 6.2 million Americans. Data breach researchers at security firm UpGuard found the data in late July, and traced the storage bucket back to a former staffer at th…

Amazon Is Looking to Put Advertising Data on a Blockchain

The tech industry is realizing that the convoluted state of adtech needs a serious overhaul. Amazon is joining a number of companies looking to use blockchain technology to remodel how advertising dollars are distributed and recorded, and to identify the middlemen taking a cut. Large sums of money currently flow through the real-time bidding (RTB) system: about $20 billion in the US alone. The general consensus is that more transparency in how data is transacted in the online advertising industry is imperative.

Amazon Is Looking to Put Advertising Data on a Blockchain - CoinDesk

Amazon is looking to hire a software engineer to integrate parts of its advertising business with a blockchain.

What I'm Reading:

Exclusive: Critical U.S. Election Systems Have Been Left Exposed Online Despite Official Denials

The top voting machine company in the country insists that its election systems are never connected to the internet. But researchers found 35 of the systems have been connected to the internet for months and possibly years, including in some swing states.

Trump campaign, GOP committees halt Twitter spending after McConnell account locked

McConnell’s campaign account was locked for posting a video of protesters outside his home that included violent threats.

Amazon lets Alexa users disable human voice recording review

Amazon's Alexa voice assistant users can now disable human review of their voice recordings by changing settings in the Alexa smartphone app.

Cultivated data is the next Gold Rush – TechCrunch

Data and analytics is finally being valued and becoming mission critical: It is no longer “just another tool” to have in the toolbox, but is key to a company’s success.

Why Tomorrow's Political Leaders Must Engage With Technologists

What technological changes might nudge us towards empowered and enlightened societies that bridge our differences rather than deepen our divides? How can we reorient the digital sphere towards creating the more informed societies that will strengthen and enrich our democracies into the 21st century?

Democrats have told Google to make its contractors permanent employees

The news: Ten Democratic senators—including presidential hopefuls Kamala Harris, Bernie Sanders, and Elizabeth Warren—have asked Google to turn its army of temporary workers into full-time employees. The senators wrote in a letter to Google CEO Sundar Pichai that “the differences between the categories of workers appears to be in name only,” and so the company…

See you next week!


