The ten steps of information integration
Created with Craiyon AI. Unfortunately, it could not count to ten.

by Friedhelm Reydt.

Information integration usually consists of ten (abstracted) steps:

1. Job specification

For which object or process within an organisation is data completeness required? 

Example: In order to give customers a price indication with the help of a product calculator, all the necessary data must be available. The job specification describes the information needs of the addressees as well as the target state to be achieved.

2. Data identification

If the data for the calculator is incomplete, the product will be priced either too high or too low, and both can harm the company's market positioning. Without going into the method of recursive data identification here, this phase covers the complete localisation of all data sources that exist across the different areas of activity and are needed to meet the information requirements of the addressed target group. Both formal and informal, technical and non-technical data sources come into question; technical data can be structured or unstructured. The goal of data identification is information completeness. All identified sources are recorded in an information map.
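The outcome of this phase can be pictured as a simple structure. A minimal Python sketch of an information map, where every source name and classification is a hypothetical example:

```python
# Hypothetical information map produced by data identification:
# each entry records a source and whether it is technical/structured.
information_map = [
    {"source": "ERP price table",      "kind": "technical",     "structured": True},
    {"source": "CRM customer base",    "kind": "technical",     "structured": True},
    {"source": "Sales Word documents", "kind": "technical",     "structured": False},
    {"source": "Expert interviews",    "kind": "non-technical", "structured": False},
]

# Unstructured sources will need extra handling in the extraction step.
unstructured = [s["source"] for s in information_map if not s["structured"]]
```

Such a map makes it easy to check information completeness against the job specification before extraction begins.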

3. Data extraction

To ensure information completeness, data from the identified technical source systems is passed on continuously, and ideally automatically, for data transformation. Non-technical and unstructured data (e.g. Word files from a document management system) is converted into structured technical data.
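Turning unstructured text into structured records can be as simple as pattern matching. A minimal sketch, assuming a free-text document with product/price statements (the wording and values are invented for illustration):

```python
import re

def extract_prices(document: str) -> list[dict]:
    """Extract (product, price) pairs from free text, e.g. a Word export."""
    pattern = re.compile(r"(?P<product>[A-Z][A-Za-z ]*?) costs (?P<price>\d+(?:\.\d+)?) EUR")
    return [{"product": m["product"], "price": float(m["price"])}
            for m in pattern.finditer(document)]

doc = "Basic plan costs 9.99 EUR. Premium plan costs 24.50 EUR."
records = extract_prices(doc)
```

In practice this step would run on a schedule against every source in the information map; the principle, unstructured in, structured records out, stays the same.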

4. Data transformation

All data is converted into a common format and structure and collected in a temporary database.
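A minimal sketch of this step, assuming two source systems (hypothetical CRM and ERP shapes) that deliver the same fact in different formats and are mapped onto one canonical record before landing in the temporary database:

```python
# Each source gets its own mapping into the common (canonical) format.
def from_crm(row: dict) -> dict:
    return {"customer": row["CustomerName"], "net_price": float(row["Price"])}

def from_erp(row: dict) -> dict:
    return {"customer": row["cust"], "net_price": row["amount_cents"] / 100}

# The temporary database collects records that now share one schema.
staging = [
    from_crm({"CustomerName": "ACME", "Price": "100.00"}),
    from_erp({"cust": "ACME", "amount_cents": 10000}),
]
```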

5. Data cleansing

Inconsistent, duplicate and missing data are eliminated from the temporary database. A set of rules is created in advance for this purpose. Data cleansing can be automated and/or manual. 
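A rule set like this can be sketched directly in code. The three rules below (no missing values, no negative prices, no duplicates) are assumed examples, not a fixed standard:

```python
def cleanse(records: list[dict]) -> list[dict]:
    """Apply a predefined rule set to the temporary database."""
    seen, clean = set(), []
    for r in records:
        if not r.get("customer") or r.get("net_price") is None:
            continue                      # rule: no missing values
        if r["net_price"] < 0:
            continue                      # rule: no inconsistent prices
        key = (r["customer"], r["net_price"])
        if key in seen:
            continue                      # rule: no duplicates
        seen.add(key)
        clean.append(r)
    return clean

raw = [
    {"customer": "ACME", "net_price": 100.0},
    {"customer": "ACME", "net_price": 100.0},   # duplicate
    {"customer": "", "net_price": 50.0},        # missing customer
    {"customer": "Globex", "net_price": -1.0},  # inconsistent price
]
clean_records = cleanse(raw)
```

Records that fail a rule can of course also be routed to a manual review queue instead of being discarded.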

6. Data reconciliation

With regard to the single version of the truth principle, semantic differences between data sources must be identified and eliminated. This is also done with the help of a set of rules.
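One simple form such a rule set can take is a synonym mapping. A sketch, assuming system A uses German field names and system B English ones (both invented for illustration):

```python
# Semantic rule set: different source terms that mean the same thing
# are mapped onto one canonical vocabulary.
SYNONYMS = {"Kunde": "customer", "client": "customer", "Preis": "net_price"}

def reconcile(record: dict) -> dict:
    return {SYNONYMS.get(field, field): value for field, value in record.items()}

a = reconcile({"Kunde": "ACME", "Preis": 100.0})     # from system A
b = reconcile({"client": "ACME", "net_price": 100.0})  # from system B
```

After reconciliation both systems describe the same fact identically, which is exactly what the single version of the truth requires.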

7. Data enrichment

To improve their quality and completeness, the data in our temporary database can be enriched or supplemented with additional information (metadata).
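A minimal sketch of enrichment, attaching assumed metadata fields (provenance and load date) without touching the payload:

```python
def enrich(record: dict, source: str, loaded: str) -> dict:
    """Supplement a staged record with metadata for traceability."""
    return {**record, "_source": source, "_loaded": loaded}

enriched = enrich({"customer": "ACME"}, source="crm", loaded="2024-01-01")
```

Which metadata is worth carrying (provenance, timestamps, quality scores) depends on the information needs defined in the job specification.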

8. Data storage

Once the data in our temporary database has been completely processed, it is formally added to the central database (repository). With this step, the extraction, transformation and loading (ETL) is considered complete.
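The load step can be sketched with an in-memory SQLite database standing in for the central repository (table and column names are illustrative):

```python
import sqlite3

# The central repository; in production this would be a proper database.
repo = sqlite3.connect(":memory:")
repo.execute("CREATE TABLE offers (customer TEXT, net_price REAL)")

# Fully processed staging records are formally added to the repository.
staging = [("ACME", 100.0), ("Globex", 75.5)]
repo.executemany("INSERT INTO offers VALUES (?, ?)", staging)
repo.commit()

count = repo.execute("SELECT COUNT(*) FROM offers").fetchone()[0]
```

Once the load commits, the ETL part of the pipeline (steps 3 through 8) is complete.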

9. Data linking

The data in the central database can be used to meet the information needs of different target groups inside and outside the organisation. The data required for this purpose can be virtually linked both logically and mathematically and combined to form target group-specific information sets. In this way, new information is created from previously independent data that would not have been available without information integration.
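Returning to the product-calculator example: a sketch of how previously independent data (a customer record and a price list, both assumed) can be linked logically (by lookup) and mathematically (by calculation) into a new information set:

```python
customers = {"ACME": {"discount": 0.10}}   # e.g. from the CRM
price_list = {"Basic plan": 9.99}          # e.g. from product management

def price_indication(customer: str, product: str) -> dict:
    """Logical link (lookups) plus mathematical link (discount calculation)."""
    net = price_list[product] * (1 - customers[customer]["discount"])
    return {"customer": customer, "product": product, "indication": round(net, 2)}

offer = price_indication("ACME", "Basic plan")
```

Neither source system alone could answer "what does this customer pay?"; only the linked information set can.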

10. Data dissemination

Target group-specific information sets are made available to downstream systems via standard interfaces: they are published at a gateway, and the target system retrieves them from there. Examples of such systems are a mobile app, a web frontend acting as a configurator that needs daily updated data to calculate offers, or a simple database. Importantly, data dissemination does not preclude writing data back to the original source systems.
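As a minimal sketch, JSON can serve as the standard format at such a gateway; the two functions below (both hypothetical) show dissemination and the write-back direction side by side:

```python
import json

def publish(info_set: dict) -> str:
    """Serialize an information set for the gateway (JSON as standard format)."""
    return json.dumps(info_set, sort_keys=True)

def accept_writeback(payload: str) -> dict:
    """Dissemination does not preclude write-back: parse an incoming update."""
    return json.loads(payload)

out = publish({"customer": "ACME", "indication": 8.99})
```

A real gateway would sit behind HTTP or a message bus, but the contract, one standard format in both directions, is the point here.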

To be continued.


Excursus

Data models vs. information sets

The information systems of the departments are based on specific data models, which for an invoice, for example, may consist of components such as customer name, service item and the corresponding price. Such information systems may or may not communicate with each other along the value chain.

If process-relevant systems do not communicate with each other, even though the data generated in system A is needed in the operational context of system B, a media break occurs within the digital value chain. In the worst case it remains undetected, or it must be remedied with a manual process step. A media break always marks a possible source of error that can lead to falsified data.

To resolve the dilemma of insufficient data reconciliation, tools such as information integration platforms are used to map higher-level information sets and to clean up missing, erroneous or redundant data. An information set can therefore be composed of different attributes from different data models.
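The distinction can be made concrete in a few lines. A sketch, assuming an invoicing data model and a product data model (both invented), from which one information set draws its attributes:

```python
# Two department-specific data models (hypothetical examples).
invoice_model = {"customer_name": "ACME", "service_item": "Basic plan"}
product_model = {"Basic plan": {"price": 9.99}}

# The information set combines attributes from both models.
information_set = {
    "customer": invoice_model["customer_name"],
    "item": invoice_model["service_item"],
    "price": product_model[invoice_model["service_item"]]["price"],
}
```

No single data model contains all three attributes; the information set exists only at the integration layer above them.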
