The Data Science Project Playbook


By Matthew Coffman, High Alpha

Machine learning and AI come up constantly as emerging ways for startups to differentiate themselves, so I’ve been looking for baby steps that would let startups identify and build valuable data science projects on their own. I recently attended MLconf 2016, an event that brought together a nice mix of academics, product leaders, and practicing data scientists.

I found it to be an inspiring and humbling experience in terms of seeing how bigger and more established companies are attacking these challenges. The talks were a mix of recipes, reflections, and advice. Since I have more of an engineering mind, some of the content was over my head. Overall, though, I learned so much from the presenters and other attendees. I walked away with some thoughts on how we as startup product leaders might best tackle data science challenges. I wanted to try to organize these thoughts as a playbook others could use for getting started.

Step One: Understand the Data Science Landscape

Certainly, data science/machine learning/AI has achieved critical mass as a standalone industry. There is no shortage of platforms, tools, and algorithms available from a variety of vendors to tackle just about any application. On the other hand, finding experts with availability to tackle your challenges is a different matter. The big companies are all at war to lure away each other’s data scientists. That doesn’t leave much opportunity for the rest of us who are looking to build the next great chatbot or insight-driven application.

If you’re lucky enough to already have a data scientist on your payroll, then make this person your partner in planning and executing your projects. Meanwhile, understand that data scientists often don’t have the same expertise and experience as other engineers in building and scaling the other complex parts of an application. Be sure to involve both data scientists and engineers in project planning to give each project the best chance of success.

In the absence of a relationship with a subject matter expert, how should product leaders still pursue meaningful data science-driven features for their applications? I advocate for an extremely practical approach — as with most other product planning processes, get ready for a number of trade-offs. Luckily, the highly competitive environment of tools and platforms means that almost any dreamy feature can be built. For a product leader, then, the focus needs to be on finding the right feature and balancing the implications.

Step Two: Identify the MVDP

Josh Wills, the head of data engineering at Slack, gave a really interesting talk at the conference. Slack looks at product in some interesting ways; in particular, they focus on selling not a product but a solution to a problem. Every ounce of effort is directed to solving discrete business problems for their customers.

As product leaders, we often use the concept of minimum viable products as a way to identify the least amount of work required to establish that we have solved a customer’s problem. Josh advocates for Minimum Viable Data Products as a way to balance “the vision of what’s possible with what is necessary.” Slack chose a small set of features — channel recommendations, as an example — then strove to identify the lowest effort, most-easily-measurable way to validate that it made the customer’s experience better.
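To make the idea concrete, here is a minimal sketch of what a first-pass channel recommender could look like. This is not Slack’s method, which the talk summary above does not describe; it is a plain co-membership heuristic swapped in for illustration, and the function name, data layout, and sample users are all hypothetical.

```python
from collections import Counter


def recommend_channels(memberships, user, top_n=3):
    """Suggest channels joined by people who share channels with `user`.

    `memberships` maps a user name to the set of channels they belong to
    (a hypothetical layout chosen for this sketch).
    """
    own = memberships.get(user, set())
    scores = Counter()
    for other, channels in memberships.items():
        if other == user:
            continue
        overlap = len(own & channels)
        if overlap == 0:
            continue  # no shared channels, so no signal from this person
        # Each channel the user has not joined gets credit equal to the overlap.
        for channel in channels - own:
            scores[channel] += overlap
    # Highest score first; ties broken alphabetically so results are stable.
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return [channel for channel, _ in ranked[:top_n]]


# Made-up sample data, for illustration only.
memberships = {
    "ana": {"general", "design"},
    "ben": {"general", "design", "ux-research"},
    "cy": {"general", "sales"},
    "dee": {"design", "ux-research", "random"},
}

print(recommend_channels(memberships, "ana"))
```

Something this small is easy to measure: show the suggestions to a subset of users and track how often they join a suggested channel. Only if that number moves is a more sophisticated model worth building.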

Minimum viable data products need the following to be successful:

  1. Real value to customers — something that enhances or deepens their relationship with the product
  2. Available and sufficient data — even the best algorithm can’t perform without data
  3. Delivery practicality — in other words, can the team deliver the capability with available resources and off-the-shelf solutions?

Product leaders can start with brainstorming features, prioritizing those with most value to customers. Working with engineering leaders (and potentially data science professionals), discuss the availability of data and resources to deliver a given feature.
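One lightweight way to run that conversation is to score each brainstormed feature against the three criteria and sort the list. The sketch below is only an illustration under stated assumptions: the 1 to 5 scale, the rule that the weakest criterion caps the score, and every score shown are hypothetical choices, not something prescribed in the talk. Apart from channel recommendations, the feature names are invented too.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    customer_value: int   # 1 (low) to 5 (high), judged by product
    data_readiness: int   # 1 to 5: is sufficient data available today?
    deliverability: int   # 1 to 5: can the team ship it with current resources?


def mvdp_score(candidate):
    # Assumption: the weakest criterion caps the score, because a feature
    # with no usable data is not viable however valuable it sounds.
    return min(
        candidate.customer_value,
        candidate.data_readiness,
        candidate.deliverability,
    )


def prioritize(candidates):
    # Best score first; customer value breaks ties.
    return sorted(
        candidates,
        key=lambda c: (mvdp_score(c), c.customer_value),
        reverse=True,
    )


# Illustrative scores only; real ones come out of the team discussion.
candidates = [
    Candidate("support chatbot", customer_value=5, data_readiness=2, deliverability=1),
    Candidate("channel recommendations", customer_value=5, data_readiness=4, deliverability=4),
    Candidate("weekly digest", customer_value=3, data_readiness=5, deliverability=5),
]

for c in prioritize(candidates):
    print(f"{mvdp_score(c)}  {c.name}")
```

Taking the minimum rather than the sum keeps a very exciting feature from rising to the top when the data or the team’s capacity to build it is missing.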

Don’t be afraid to reduce scope. The goal here is to build something that can quickly prove that a feature is valuable to customers. Once that value is established, additional layers of complexity can be added on top. With data science projects in particular, it’s important to guard against too much complexity up front, since that raises the chance that a project never gets off the ground.

Step Three: Find Engineer-Friendly Solutions

Our engineering and product teams are excellent at building and delivering features, but they may not yet have the experience and expertise to deliver data science features entirely on their own. A data scientist provides a higher-level understanding of the possibilities in a given dataset, knows the right tool or technique to build a feature, and (equally important) knows how to put it into production. Fortunately, the internet is full of courses, learning materials, applications, and APIs that can help companies launch data science features even if they don’t have their own data scientist.
