Data Science on a Budget: Audubon’s Advanced Analytics


On Memorial Day weekend 2038, when your grandchildren visit the California coast, will they be able to spot a black bird with a long orange beak called the Black Oystercatcher? Or will that bird be long gone? Will your grandchildren only be able to see that bird in a picture in a book or on a website?

Two scientists at the National Audubon Society have been examining how climate change will affect where birds live in the future, and the Black Oystercatcher has been identified as a “priority” bird: one whose range is likely to be affected by climate change.

How did Audubon determine this? It’s a classic data science problem.

First, consider birdwatching itself, which is pretty much good old-fashioned data collection. Hobbyists go out into the field, identify birds by species, sex, and sometimes age, and record their observations on bird lists or in bird books and, more recently, in smartphone apps.

For more than a century, Audubon itself has sponsored an annual crowdsourced data collection event, the Audubon Christmas Bird Count, which has given the organization an enormous dataset of bird species and their populations at locations across the country at specific points in time. At 118 years old, the count has produced one of the longest-running bird data sets in the world.

That’s one of the data sets Audubon used in its project examining the impact of climate change on bird species’ geographic ranges, according to Chad Wilsey, director of conservation science at Audubon, who spoke with InformationWeek in an interview. Wilsey is an ecologist, not a trained data scientist, but like many scientists he uses data science in his work. In this project, working on a team of two ecologists, he applied statistical modeling in tools such as R to multiple data sets to build predictive models of future geographic ranges for specific bird species. The results were published in the 2014 report Audubon’s Birds and Climate Change, and Audubon also published interactive ArcGIS maps of species and their ranges on its website.
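Once a model can say whether a location's climate suits a species, projecting a future range comes down to comparing its predictions under current and projected climate. The Python sketch below is illustrative only: the suitability rule, the five grid cells, and the uniform scenario (+2 °C, +10% precipitation) are hypothetical stand-ins rather than values from Audubon's report, and `is_suitable`, `project`, and `range_shift` are names invented for this example.

```python
def is_suitable(temp_c, precip_mm):
    """Hypothetical stand-in for a fitted model's presence prediction."""
    return temp_c < 15.0 and precip_mm > 400.0


# Hypothetical grid cells: cell id -> (mean temperature in °C, annual precipitation in mm).
current_climate = {
    "A": (11.0, 650.0),
    "B": (13.5, 500.0),
    "C": (14.5, 420.0),
    "D": (16.0, 700.0),
    "E": (9.0, 380.0),
}


def project(climate, warming_c, precip_change):
    """Apply one uniform, hypothetical scenario shift to every cell."""
    return {cell: (t + warming_c, p * (1.0 + precip_change)) for cell, (t, p) in climate.items()}


def range_shift(current, future):
    """Compare the set of suitable cells now with the set under the projected climate."""
    now = {c for c, (t, p) in current.items() if is_suitable(t, p)}
    later = {c for c, (t, p) in future.items() if is_suitable(t, p)}
    return {
        "lost": sorted(now - later),
        "gained": sorted(later - now),
        "stable": sorted(now & later),
        "pct_lost": 100.0 * len(now - later) / len(now) if now else 0.0,
    }


future_climate = project(current_climate, warming_c=2.0, precip_change=0.10)
result = range_shift(current_climate, future_climate)
print(result)
```

With these made-up inputs, cells B and C warm past the threshold and drop out of the range, cell E becomes wet enough to be newly suitable, and cell A stays suitable, so two of the three currently suitable cells are lost. A real projection would repeat this comparison for each species, season, and climate scenario.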

The initial report used Audubon’s Christmas Bird Count data set and the North American Breeding Bird Survey from the US government. It assessed geographic range shifts through the end of the century for 588 North American bird species, in both summer and winter, under a range of future climate change scenarios. Wilsey’s team built the models from climatic variables such as historical monthly temperature and precipitation averages and totals, using boosted regression trees, a machine learning technique. The models were trained on bird observations and climate data from 2000 to 2009 and then evaluated against data from 1980 to 1999.
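The modeling approach described above can be illustrated with a small, self-contained sketch. It is not Audubon's code (the team worked in R); it is a Python illustration under several assumptions: the survey records are synthetic, the presence rule (cooler than 15 °C and wetter than 400 mm) is invented, and `make_records`, `fit_stump`, `fit_boosted_stumps`, and `predict` are hypothetical names. For simplicity it boosts depth-one regression trees (stumps) with a squared-error loss on 0/1 presence labels; boosted regression tree implementations used in species distribution modeling typically use a logistic (Bernoulli) loss and deeper trees. It does follow the article's temporal split: fit on 2000 to 2009 records, evaluate on 1980 to 1999.

```python
import random


def make_records(n=1500, seed=42):
    """Synthetic (year, temp_c, precip_mm, present) records.

    Hypothetical rule: the species is present where the site is cooler than
    15 °C and receives more than 400 mm of annual precipitation.
    """
    rng = random.Random(seed)
    records = []
    for _ in range(n):
        year = rng.randint(1980, 2009)
        temp = rng.uniform(0.0, 25.0)
        precip = rng.uniform(100.0, 1000.0)
        present = 1.0 if (temp < 15.0 and precip > 400.0) else 0.0
        records.append((year, temp, precip, present))
    return records


def fit_stump(X, residuals, order):
    """Return the least-squares best single split as
    (feature index, threshold, left value, right value)."""
    n = len(residuals)
    total = sum(residuals)
    best = None
    for j, idx in enumerate(order):
        left_sum = 0.0
        for k in range(1, n):
            left_sum += residuals[idx[k - 1]]
            lo, hi = X[idx[k - 1]][j], X[idx[k]][j]
            if lo == hi:
                continue  # no threshold separates identical values
            right_sum = total - left_sum
            # Maximizing this score minimizes the split's squared error.
            score = left_sum ** 2 / k + right_sum ** 2 / (n - k)
            if best is None or score > best[0]:
                best = (score, j, (lo + hi) / 2, left_sum / k, right_sum / (n - k))
    return best[1:]


def fit_boosted_stumps(X, y, n_rounds=200, learning_rate=0.1):
    """Least-squares gradient boosting with depth-one regression trees."""
    base = sum(y) / len(y)
    pred = [base] * len(y)
    # Sort sample indices by each feature once; only the residuals change.
    order = [sorted(range(len(X)), key=lambda i, j=j: X[i][j]) for j in range(len(X[0]))]
    stumps = []
    for _ in range(n_rounds):
        residuals = [yi - pi for yi, pi in zip(y, pred)]
        j, thr, left, right = fit_stump(X, residuals, order)
        # Shrink each tree's contribution by the learning rate.
        left, right = learning_rate * left, learning_rate * right
        stumps.append((j, thr, left, right))
        for i, x in enumerate(X):
            pred[i] += left if x[j] < thr else right
    return base, stumps


def predict(model, x):
    base, stumps = model
    return base + sum(left if x[j] < thr else right for j, thr, left, right in stumps)


records = make_records()
# Temporal split as described in the article: fit on 2000-2009, evaluate on 1980-1999.
train = [r for r in records if 2000 <= r[0] <= 2009]
holdout = [r for r in records if 1980 <= r[0] <= 1999]
model = fit_boosted_stumps([(t, p) for _, t, p, _ in train], [pres for _, _, _, pres in train])

correct = sum((predict(model, (t, p)) > 0.5) == (pres == 1.0) for _, t, p, pres in holdout)
accuracy = correct / len(holdout)
print(f"fit on {len(train)} records, evaluated on {len(holdout)}, accuracy {accuracy:.3f}")
```

Fitting each new tree to the current residuals and shrinking its contribution by `learning_rate` is the core boosting loop; switching to a logistic loss would change how the residuals and leaf values are computed but leave that loop intact.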

“We write all our own scripts,” Wilsey told me. “We work in R. It is all machine learning algorithms to build these statistical models. We were using very traditional data science models.”

Audubon did all this work on an on-premises server with 16 CPUs and 128 gigabytes of RAM.
