Exploring the mutually inclusive modern data architecture of machine learning and serving infrastructure
Machine learning (ML) supercharges the data architecture and irrefutably glues functional experts, data engineers (DE) & data scientists (DS) together to work as one team.
Challenges:
- Fragmented ML development duplicates the learning process and induces compatibility issues & ambiguity (inconsistent metadata) w.r.t. the data platform.
- Traditional 'data warehouse' practices are carried over into ML solution development, even though DS and DE viewpoints & needs differ (e.g., do we need more data, or how much data is the ‘right’ data?).
- Lack of focus on operational ML components in the overall organizational data strategy.
- Conflict & chaos in the data architecture due to the collision of rule-based & ML development activities.
Overview:
The need for ML solutions, and their usage, is ever increasing across products and enterprises. The pioneers in this space are evidently already reaping the rewards and creating a ‘competitive’ landscape.
“By 2020, 75% of large and midsize organizations globally will compete using proprietary algorithms.” (Gartner, 2015)
An ‘operational’ machine learning architecture goes beyond the commonly seen ‘traditional’ ML block diagram (see Figure 1).
Here is the ‘evolved’ representation of a modern distributed data architecture (Figure 2).
Given below are a few 'recommendations' for product/enterprise data leaders who are focused on building & managing modern data architectures with ‘ML’:
- Standardize enterprise ML implementations & tools by leveraging 'data science workbench'-like frameworks; provide a scalable end-to-end ML workflow with deeper integration into the data platform.
- Domain experts who thoroughly understand the data characteristics should perform the ‘data transformation’ & ‘feature engineering’ activities. This is a fundamental shift from the 'traditional' enterprise data management philosophy.
- Expand the ability of functional experts to ensure data ‘correctness’ and ‘coverage’ by providing the right tools to perform ‘model evaluation’ & ‘monitoring’. The payoff is ML operational efficiency.
- ML and rule-based systems each have their respective strengths & leverage points. Migrating from rule-based to ML, or building a new ML solution, requires the end users' ‘trust’. 'Augmented analytics' is an evolving component that is trying to bridge the two.
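To make the ‘correctness’ and ‘coverage’ recommendation a little more concrete, here is a minimal Python sketch of the kind of check a functional expert could run against incoming data. It is an illustration only, not a reference to any particular tool: the function names (`coverage`, `correctness`, `monitor`), the thresholds, and the sample features (`age`, `income`) are all assumptions made for this example.

```python
from typing import Callable, Dict, List, Optional

# A record maps a feature name to a value; None means the value is missing.
Record = Dict[str, Optional[float]]


def coverage(records: List[Record], feature: str) -> float:
    """Fraction of records in which `feature` is present (not None)."""
    if not records:
        return 0.0
    present = sum(1 for r in records if r.get(feature) is not None)
    return present / len(records)


def correctness(
    records: List[Record], feature: str, is_valid: Callable[[float], bool]
) -> float:
    """Fraction of the present values of `feature` that pass `is_valid`."""
    values = [r[feature] for r in records if r.get(feature) is not None]
    if not values:
        return 0.0
    return sum(1 for v in values if is_valid(v)) / len(values)


def monitor(
    records: List[Record],
    checks: Dict[str, Callable[[float], bool]],
    min_coverage: float = 0.9,      # hypothetical threshold
    min_correctness: float = 0.95,  # hypothetical threshold
) -> Dict[str, Dict[str, object]]:
    """Build a per-feature report that flags features below either threshold."""
    report: Dict[str, Dict[str, object]] = {}
    for feature, is_valid in checks.items():
        cov = coverage(records, feature)
        cor = correctness(records, feature, is_valid)
        report[feature] = {
            "coverage": cov,
            "correctness": cor,
            "ok": cov >= min_coverage and cor >= min_correctness,
        }
    return report


if __name__ == "__main__":
    # Made-up sample data; the validity rules come from domain knowledge.
    sample = [
        {"age": 34, "income": 52000},
        {"age": -1, "income": None},
        {"age": 51, "income": 61000},
        {"age": 29, "income": 48000},
    ]
    rules = {
        "age": lambda v: 0 <= v <= 120,
        "income": lambda v: v >= 0,
    }
    for name, stats in monitor(sample, rules).items():
        print(name, stats)
```

In this made-up sample, `age` is fully covered but one value fails its validity rule, while `income` is always valid when present but is missing from one record, so both features are flagged. The point of the sketch is that the validity rules are plain, readable predicates that a domain expert can own, while the platform runs them continuously.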
A paradigm shift in the product/enterprise data strategy prompts us to ‘unlearn to re-learn’ the agreements reinforced over the last two decades.
Finally, the deep-rooting of ML in modern data architectures is definite & profound. ML workflows, tools, implementation frameworks, and the capability to auto-scale are all evolving pieces. Based on its needs and abilities, an organization should either develop in-house solutions or leverage off-the-shelf (cloud/platform) products to succeed in its ML journey!