What is Data Strategy?
This blog is intended to be a quick-ish guide to what a solid Data Strategy should outline for an organisation. As you can imagine, a lot of detail is skipped over here to avoid writing a book on the subject.
Today, medium-to-large businesses need a Data Strategy in place to guide their Digital journey. In my experience, a meaningful Data Strategy is a three-pronged one:
A: Data Curation
First and foremost, an organisation's Data Strategy needs to define the Enterprise Data Platform: the place where enterprise-wide data from hundreds of sources scattered around the organisation (on-prem and cloud) is stored reliably and at scale. This "platform" tends to be a Data Lake of sorts, powered by Apache Hadoop and Spark. Hadoop is just one of the frameworks, albeit a popular one; specialised NoSQL stores (e.g., MongoDB) and messaging systems (e.g., Kafka) often complement Hadoop and Spark in the overall Enterprise Data Platform.
In today's enterprise landscape, where most applications are hosted in the cloud by 3rd-party vendors (think Salesforce, Xero, etc.), data curation becomes essential. It is how an organisation brings together all of its data about its operations, its products and services, and its customers into a central repository, such as a Hadoop Data Lake, in at least near-real time.
This section of the strategy needs to contain most, if not all, of the following:
Data architecture
Details must include cradle-to-grave Data pipelines and the technical architecture (e.g., RDBMSs, Spark and Kafka clusters, master and name nodes, VPCs). It must cover Data both at rest (e.g., HDFS, S3 or Azure Blob storage) and in motion (e.g., Kinesis, Spark Streaming). More on this in another post.
Target Operating Model
Details must include Data governance, the people and agility model, best practices, patterns and anti-patterns, and the on-boarding and off-boarding of projects onto the Enterprise Data Platform. More on this in another post.
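To make the "cradle-to-grave pipeline" idea a little more concrete, here is a minimal sketch, assuming two hypothetical source systems (a CRM and an accounting package) whose extracts use different field names. It shows the basic shape only: each source is mapped onto a common schema, then merged into one curated record per customer. The source data, field names and functions are all illustrative; the in-memory dict stands in for a curated table at rest (e.g. on HDFS or S3), where a real platform would use Spark, Kafka and similar tooling.

```python
# Hypothetical raw extracts from two source systems. Field names differ
# between sources, as they usually do in practice.
CRM_ROWS = [
    {"CustomerId": "C1", "Email": "ann@example.com ", "Name": "Ann"},
    {"CustomerId": "C2", "Email": "BOB@example.com", "Name": "Bob"},
]
ACCOUNTING_ROWS = [
    {"cust_ref": "C1", "invoice_total": "120.50"},
    {"cust_ref": "C1", "invoice_total": "79.50"},
    {"cust_ref": "C3", "invoice_total": "10.00"},
]


def ingest_crm(rows):
    """Map CRM fields onto the common schema and normalise e-mail addresses."""
    for r in rows:
        yield {
            "customer_id": r["CustomerId"],
            "email": r["Email"].strip().lower(),
            "name": r["Name"],
        }


def ingest_accounting(rows):
    """Map accounting fields onto the common schema and parse amounts."""
    for r in rows:
        yield {"customer_id": r["cust_ref"], "revenue": float(r["invoice_total"])}


def curate(profiles, invoices):
    """Merge both feeds into one 'platinum' record per customer.

    The returned dict is a stand-in for a curated table at rest.
    """
    store = {}
    for p in profiles:
        rec = store.setdefault(p["customer_id"], {"customer_id": p["customer_id"], "revenue": 0.0})
        rec.update(p)
    for inv in invoices:
        rec = store.setdefault(inv["customer_id"], {"customer_id": inv["customer_id"], "revenue": 0.0})
        rec["revenue"] += inv["revenue"]
    return store


if __name__ == "__main__":
    platinum = curate(ingest_crm(CRM_ROWS), ingest_accounting(ACCOUNTING_ROWS))
    for record in platinum.values():
        print(record)
```

Note that customer C3 ends up with revenue but no profile. Surfacing gaps like this is one of the reasons to curate centrally rather than leave each department to reconcile sources on its own.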
The primary outcome of executing this part of the Data Strategy must be a "platinum" copy of data available to the entire organisation, as a platform.
This must be achieved without imposing prohibitive start-up costs on the rest of the organisation for using this data. It needs to be funded from the IT budget, and it will in essence be the most valuable "service" that the IT organisation of the future provides.
B: Financial benefit
Secondly, the Data Strategy needs to identify and vet one top-line-impacting and one bottom-line-impacting business initiative per year, over a 3-year roadmap. The organisation must accomplish these initiatives to reap ROI from the data curated under Part A.
These initiatives are not easy to define, because many stakeholders have varying and conflicting priorities. A bottom-up approach therefore sometimes works better: focused teams gather as many validated, data-related business use cases as they can from the various departments within the organisation, then consolidate them into overarching initiatives that have full business buy-in.
For example, "Acquisition-cost reduction by 15%" could be a strong candidate as it has a tangible dollar value, whereas "Data Quality improvement" can't easily be tied to a top-line or bottom-line number. The high-level initiative will typically comprise an array of Data & Analytics sub-initiatives, such as Customer 360, Data Quality (DQ), Master Data Management (MDM) and so on, which the Data Strategy stitches together to deliver the real financial impact.
This part of the strategy is usually funded by the business, as the ROI is closely tied to business outcomes.
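Part of what makes "Acquisition-cost reduction by 15%" a strong candidate is that its value can be worked out with simple arithmetic. The sketch below is a hypothetical back-of-the-envelope calculation: the marketing spend, customer volume and initiative cost are made-up figures, and it assumes the number of new customers stays constant while the cost per acquisition drops.

```python
def acquisition_cost_savings(marketing_spend, new_customers, reduction_pct):
    """Annual saving from cutting customer acquisition cost (CAC) by reduction_pct %.

    Assumes the number of new customers acquired stays the same.
    """
    cac = marketing_spend / new_customers           # cost to acquire one customer
    saving_per_customer = cac * reduction_pct / 100
    return saving_per_customer * new_customers


def roi(annual_benefit, initiative_cost):
    """Simple return on investment: net benefit divided by cost."""
    return (annual_benefit - initiative_cost) / initiative_cost


if __name__ == "__main__":
    # Hypothetical figures: $2M spend, 10,000 new customers, CAC of $200.
    savings = acquisition_cost_savings(2_000_000, 10_000, 15)
    print(f"Annual saving: ${savings:,.0f}")                    # Annual saving: $300,000
    print(f"ROI on a $120k initiative: {roi(savings, 120_000):.0%}")  # 150%
```

A "Data Quality improvement" initiative has no equivalent formula until it is tied to an outcome like this one, which is exactly why it works better as a sub-initiative.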
C: Innovation enabler
Finally, the Data Strategy must outline a framework for systematically enabling data-driven innovation, supported by the platform described in Part A.
Most companies already have innovation incubators, but a strong link between the Data Strategy and Innovation is key to keeping innovation at the centre of how an organisation uses its data.
A Data Strategy needs to lay out the technical and functional processes for an "Innovation sandbox", for on-boarding and off-boarding, and for the journey of an individual innovation project from POC to MVP to Production.
An Innovation sandbox can simply be a PC with Python and RStudio, connected to the Data Lake, where an analyst can build exploratory models.
A structured approach to innovation usually comes down to how funding is allocated to it within a company. The old way was centralised R&D departments; today innovation is generally nurtured by entrepreneurial digital and innovation capital boards that allocate small pots of money to the most worthy ideas and projects from anywhere inside or outside the organisation. In-house startup incubators are also a good example of this model.
Summary
I hope this guide helps you structure a Data Strategy that gives dedicated focus to Data Curation, Financial Benefit and Innovation.