SAN FRANCISCO — Personal information of almost 200 million registered U.S. voters was accidentally exposed online due to an improperly configured security setting, security firm UpGuard revealed on Monday.
The leaked information, compiled by Republican data firm Deep Root Analytics and two other Republican contractors, included names, birth dates, addresses, voter registration details and other data, including a cache of posts scraped from Reddit.
UpGuard cyber risk analyst Chris Vickery discovered the open database of 198 million voters on June 12, and it was secured on June 14. Putting that number into context, Politico reported last October that the United States had a little more than 200 million voters.
About 1.1 terabytes of data was available to download without a password.
“This is the DNA of voter analysis,” Vickery told CNNTech. “This is exactly what they use to determine how someone is likely to vote on a specific issue.”
Deep Root said in a statement the data was exposed on June 1 when the firm updated security settings. The firm said it “builds voter models to help enhance advertiser understanding of TV viewership.”
The data on the open Amazon S3 storage server highlights the years-long effort to stockpile data about American voters, and includes data from 2008, 2012 and 2016 presidential campaigns and other campaign efforts. The 2016 files were not as comprehensive as preceding years.
While some data obviously points to predicting voter decision-making, the use of other information remains a mystery. For instance, the text of Reddit posts could be used for training computers to recognize language sentiment, or to watch specific subreddits to see how people were feeling about political topics.
Some of the exposed information, like voter registration, is public record, but states have different ways of letting people access it and rules on how it can be used.
This is the third time Vickery has found a huge portion of the national voter registration database leaked online. Amazon S3 buckets, the containers where the data is stored, are private by default. Someone would have to configure a bucket to be public for its contents to be exposed.
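To illustrate what "configured to be public" can mean in practice, here is a minimal sketch in Python that inspects a bucket policy document and flags statements that allow access to anyone. The example policies, the bucket name `example-bucket` and the function names are hypothetical and are not taken from the exposed server. The check is deliberately simplified: it ignores `Condition` blocks, access control lists and account-level settings, all of which can also affect whether a bucket is reachable.

```python
import json

# Hypothetical policy that lets anyone read objects in a bucket.
PUBLIC_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": "*",
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})

# Hypothetical policy that grants read access to a single account only.
RESTRICTED_POLICY = json.dumps({
    "Version": "2012-10-17",
    "Statement": [{
        "Effect": "Allow",
        "Principal": {"AWS": "arn:aws:iam::111122223333:root"},
        "Action": "s3:GetObject",
        "Resource": "arn:aws:s3:::example-bucket/*",
    }],
})


def statement_is_public(statement):
    """Return True if an Allow statement names a wildcard principal."""
    if statement.get("Effect") != "Allow":
        return False
    principal = statement.get("Principal")
    if principal == "*":
        return True
    if isinstance(principal, dict):
        aws = principal.get("AWS")
        if aws == "*":
            return True
        if isinstance(aws, list) and "*" in aws:
            return True
    return False


def policy_is_public(policy_json):
    """Return True if any statement in the policy allows access to anyone."""
    policy = json.loads(policy_json)
    statements = policy.get("Statement", [])
    # A policy may hold a single statement object instead of a list.
    if isinstance(statements, dict):
        statements = [statements]
    return any(statement_is_public(s) for s in statements)
```

Calling `policy_is_public(PUBLIC_POLICY)` flags the first policy, while the restricted one passes. A check like this only covers one of several ways a bucket can end up exposed.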
What’s also unknown is whether Vickery was the first person to notice the exposed information.
If a bad actor accessed that data, it could be used to steal someone’s identity, to stalk individuals or as leverage for social engineering (that is, tricking something like a phone company into giving someone else your data).
“There’s a lot of people hunting for publicly exposed buckets for nefarious purposes,” Vickery said.