Yahoo Releases Its Biggest-Ever Machine Learning Dataset To The Research Community


Yahoo announced this morning that it’s making the largest-ever machine learning dataset available to the academic research community through its ongoing program, Yahoo Labs Webscope. The new dataset measures a whopping 13.5 TB (uncompressed) and consists of anonymized user interaction data. Specifically, it contains interactions by about 20 million users between February 2015 and May 2015, including those that took place on the Yahoo homepage, Yahoo News, Yahoo Sports, Yahoo Finance, and Yahoo Real Estate.

In addition to the user interaction data, the dataset includes demographic information such as age range, gender, and generalized geographic data. Items in the dataset include the title, summary, and key phrases of the news article in question. Local timestamps and partial device information are included as well.

Read article at TechCrunch