Library of Congress will stop saving every single public tweet

Posted at 12:47 PM, Dec 27, 2017
and last updated 2017-12-27 12:47:30-05

What’s the Library of Congress’ New Year’s resolution? Trim down on its Twitter time.

The library, which has collected every single public tweet sent out since Twitter’s inception in 2006, will only acquire and preserve tweets “on a selective basis” starting January 1. That means tweets from President Trump and other newsmakers will continue to be archived, but your (allegedly) witty rants on “The Last Jedi” may not be.

“Generally, the tweets collected and archived will be thematic and event-based, including events such as elections, or themes of ongoing national interest, e.g. public policy,” the library said in a statement.

Part of what drove this decision is that there’s just a whole lot more Twitter to keep up with now. People are tweeting now more than ever, tweets have grown in length (Twitter recently doubled its character limit from 140 characters to 280) and tweets aren’t just text-based anymore, which creates difficulties for the library.

“The library only receives text. It does not receive images, videos or linked content. Tweets now are often more visual than textual, limiting the value of text-only collecting,” it said.

The Library of Congress also said it generally doesn’t collect comprehensively. It made an exception for social media in its infancy, but now that Twitter, Facebook and other platforms are more established, the library will bring its social media collecting practices more in line with its normal collection policies.

“The Library regularly reviews its collections practices to account for environmental shifts, diversity of collections and topics, cost effectiveness, use of collections and other factors. This change results from such a review,” Gayle Osterberg, the library’s director of communications, wrote in a blog post.

In April 2010 Twitter gave the library its archive of tweets dating back to the first ones sent in 2006. The Library of Congress has been saving them — hundreds of billions of them — ever since. Until now.

But that ginormous Twitter archive — a virtual treasure trove of the social media site’s first decade and an extraordinary window into how we communicate with each other — will remain part of the library’s collection.

The Library of Congress said, however, that public access to the archive will be blocked until it can figure out “a cost-effective and sustainable” way to let people view and use it.

Examples of notable tweets contained in the archive include the first-ever tweet from Twitter co-founder Jack Dorsey and the tweet sent from Barack Obama’s account after he won the 2008 election.

Created in 1800, the Library of Congress serves as the unofficial national library of the United States, as well as being Congress’ official research library.

Since 2000, the library has been collecting pages from websites that document government information and activity. Today, that archive is more than 300 terabytes in size and represents tens of thousands of different sites.

The library’s entire collection of printed books has been estimated to total about 10 terabytes of data (although staff at the library suspect it’s probably more).