Do you want nearly every Reddit comment ever written at your fingertips, even offline? The answer is likely no, but if you said yes, you’ll be pleased to know that’s now possible.
Someone has compiled all the comments posted on the website from October 2007 until May 2015 into a single data set. According to this person’s post on Archive.org, it contains about 1.65 billion comments (although approximately 350,000 were unavailable “due to Reddit API issues”).
The files are huge, coming in at about 5GB for the compressed version alone.
The project was first announced on Reddit earlier this month, but the author has only now made it available for download via a torrent file.
People are already excited about what can be done with this data. One user had the idea of training a ‘neural network’ to “generate typical reddit comments from different subreddits.”
So perhaps we’ll be seeing some cool data projects in the next few months digging deep into the annals of Reddit.
If you’d like to download your own offline vault of Reddit’s comments, you can access the database here.