Mapillary open sources 25k street-level images to train automotive AI


As more companies wade into the business of building artificial intelligence systems to help you drive (or do the driving for you), a startup founded by an ex-Apple computer vision specialist is open sourcing a huge dataset that can help them on their road to autonomy.

Mapillary, a Swedish startup backed by Sequoia, Atomico and others, has built a database of 130 million images through crowdsourcing (think open-source Street View). It is now releasing a free dataset of 25,000 street-level images from 190 countries, with pixel-level annotations that can be used to train automotive AI systems.

Mapillary bills the Mapillary Vistas Dataset as “the world’s largest, most diverse dataset for object recognition on street-level imagery.” As with the rest of its photos, the startup builds the image database on top of Mapbox and OpenStreetMap maps.

The dataset is free for both academic and commercial researchers, but anyone who wants to build the results into commercial products must pay for a commercial license.

As Jan-Erik Solem, the CEO and co-founder, explained, while there are other datasets that companies are using to train the machine learning algorithms for their in-car systems, these fall short because they “do not have enough variability and coverage to be useful in real-world scenarios.”

The Vistas dataset is built on top of regular Mapillary images, most of which come from crowdsourcing. “What we have done here is that we manually selected 25,000 images with the variability we wanted from the 130+ million available on Mapillary,” Solem explained. “Then we manually annotated them to label all the pixels in the images. This is a tedious and expensive manual labor process.”

Expensive, and yet now free to use, because of the companies that are “sponsoring” the work, Solem said.

Read the source article at TechCrunch.