Where's data?
You've decided data science is the way to go.
You've determined specific business problems or questions data science can help you address and/or answer.
You're ready to hire data scientists.
Hold on a minute.
Before you hire, it's time to look inward and reflect for a moment on what's necessary for data science to be effective.
Data. Where's your data?
As part of the bigger picture of integrating data science, an early step in making it effective is having the relevant data to address and/or answer the critical business questions.
When I say data, what am I talking about? Big data? Distributed data? Data in the cloud? Some other buzz phrase or hot topic in technology blogs and mags? Summary charts and tables? Presentations with beautiful dashboard graphics and visualizations?
None of these, necessarily. Most of the above terminology refers to how much data there is, where it's stored, or how it's presented.
Like the hunt for the hidden character in the Where's Waldo series, data science is a search: data scientists need access to the relevant data to begin finding the hidden patterns and stories within it, so they can solve problems and answer questions. Data that's scattered all over the organization, buried on individual desktops, stuck in emails, or stored in siloed departmental databases makes answering questions and solving problems far more difficult. Data scientists also work at a raw-data level that organizations often find confusing. So let's clear that up.
Data scientists live in a world of probabilities, and sometimes individual probabilities, at that. What does "individual probabilities" mean? It means a probability estimated for each individual client or consumer rather than for a group as a whole. Predictive modeling and machine learning support the kind of individualization and customization that consumers and clients want these days. But that individualization and customization come at a price, and that price is the data required to build the models that produce probabilistic output at an individual level.
Let's try all that again: without the requisite, relevant, available data to build models that output the probability that targets will do what you want, data science can't be valuable in creating optimized and efficient inputs for business success.
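To make "individual probabilities" a little more concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the feature names, the coefficients, and the two people are hypothetical, and a real model would learn its coefficients from your data rather than have them typed in by hand.

```python
import math

# Hypothetical coefficients, standing in for what a fitted model would learn.
INTERCEPT = -2.0
WEIGHTS = {"opened_last_email": 1.5, "purchases_last_year": 0.4}


def response_probability(person: dict) -> float:
    """Logistic score: estimated probability that this one individual responds."""
    z = INTERCEPT + sum(
        weight * person.get(name, 0.0) for name, weight in WEIGHTS.items()
    )
    return 1.0 / (1.0 + math.exp(-z))


# Hypothetical individuals; each gets their own probability.
people = {
    "A": {"opened_last_email": 1, "purchases_last_year": 3},
    "B": {"opened_last_email": 0, "purchases_last_year": 0},
}

scores = {pid: response_probability(feats) for pid, feats in people.items()}
for pid, prob in scores.items():
    print(f"{pid}: {prob:.2f}")
```

The point is less the math than the inputs: if those per-person fields were never collected, there is nothing to score.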
Gathering data, which could be called deciding what to collect, is the first step in making data science work.
Data collection based on the important questions to be asked and answered is necessary as the foundation of the data science hierarchy:
- What data do you need and for what purpose?
- If the data doesn't exist, are you willing to step back and take the time to gather or purchase it, potentially sacrificing some tactical business traction now for the sake of the future?
- Do you understand the volume, velocity, and variety of data you have or need? Put another way, do you understand how much data, how quickly it's being gathered or can be purchased, and how many different types of data are available or relevant?
- Have profiles of clients or consumers been built that classify them by channel interest (if this is important from a marketing perspective)? For example, is the recipient more disposed to receiving email than banner ads? Beyond this, are digital assets tagged correctly to identify those recipients?
- Have the appropriate third-party agreements been put in place if data is being collected from another entity you don’t control, such as partners, vendors, or data marketplaces?
- Do you need or have access to third-party consumer profile data from vendors such as Acxiom, Experian, or ESRI?
- Is there a plan for how to bring together the data being gathered directly and from outside sources? Basically, is there a common "link" that will identify individuals to build a profile?
- Has a measurement plan to track and report the most important and relevant key performance indicators been developed in conjunction with preparing for data collection?
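The question about a common "link" is the one that most often trips teams up, so here is a minimal sketch of what it can look like in Python. It assumes, purely for illustration, that email address is the shared identifier and that both data sources carry it; the records, field names, and the `link_key` and `build_profiles` helpers are all hypothetical. Your own link might be a customer ID, a loyalty number, or something a vendor supplies.

```python
import hashlib


def link_key(email: str) -> str:
    """Normalize an email and hash it so every source shares one identifier."""
    normalized = email.strip().lower()
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


# Hypothetical data gathered directly (e.g., a CRM export).
first_party = [
    {"email": "Pat@Example.com ", "preferred_channel": "email"},
    {"email": "lee@example.com", "preferred_channel": "banner"},
]

# Hypothetical data from an outside source (e.g., purchased profile data).
third_party = [
    {"email": "pat@example.com", "household_size": 3},
    {"email": "kim@example.com", "household_size": 1},
]


def build_profiles(*sources):
    """Merge records from every source into one profile per link key."""
    profiles = {}
    for source in sources:
        for record in source:
            key = link_key(record["email"])
            attributes = {k: v for k, v in record.items() if k != "email"}
            profiles.setdefault(key, {}).update(attributes)
    return profiles


profiles = build_profiles(first_party, third_party)
pat = profiles[link_key("pat@example.com")]
print(pat)
```

Note that "Pat@Example.com " and "pat@example.com" only end up in the same profile because the key is normalized first. Without an agreed-upon link and cleanup rule, the same person shows up as two strangers.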
If you've committed to data science and want to talk more about the first step of organizing, coordinating, and collecting the relevant and valuable data your organization will need for data scientists to be successful, reach out to me at sam.johnson@bluejacketsol.com.
And if you've been successful, it's always great to hear how others have made it work. Each opportunity is unique; like the regional BBQ battles across the country over whose is best, data science comes in many flavors, none of them the only one or the best. Having baseline processes to define the output, though, is part of the required methodology for a modicum of success, just like knowing you'll need a smoker for brisket, whether that's a Big Green Egg, a Traeger pellet grill, or a barrel smoker.