Data mapping is the process of creating data element mappings between two distinct data models. It is an essential part of data management that ensures data quality in integrations, migrations, and other data management tasks. Data mapping provides a visual representation of data movement and transformation, and it is often the first step in the process of executing end-to-end data integration.
Data mapping is used for a wide variety of data integration tasks, including data transformation or data mediation between a data source and a destination, identification of data relationships as part of data lineage analysis, and discovery of hidden sensitive data, such as the last four digits of a Social Security number embedded in a user ID, as part of a data masking or de-identification project.
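The sensitive-data case can be illustrated with a minimal sketch. It assumes each record carries both a user ID and a Social Security number as strings; the function name, the field names `user_id` and `ssn`, and the sample values are hypothetical and chosen only for illustration.

```python
def find_embedded_ssn_suffix(records, id_field="user_id", ssn_field="ssn"):
    """Return the IDs that contain the last four digits of the same record's SSN.

    Field names are assumptions for this sketch, not a standard schema.
    """
    flagged = []
    for record in records:
        # Keep only the digits so "123-45-6789" and "123456789" behave the same.
        digits = "".join(ch for ch in record[ssn_field] if ch.isdigit())
        last_four = digits[-4:]
        if len(last_four) == 4 and last_four in record[id_field]:
            flagged.append(record[id_field])
    return flagged


# Made-up sample records.
records = [
    {"user_id": "jsmith6789", "ssn": "123-45-6789"},
    {"user_id": "mlee2024", "ssn": "987-65-4321"},
]

flagged = find_embedded_ssn_suffix(records)
print(flagged)
```

A plain substring check like this can also flag coincidental matches, so in practice the flagged IDs would be a starting point for review rather than proof of a leak.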
Data mappings can be created in a variety of ways: by writing procedural code, by creating XSLT transforms, or by using graphical mapping tools that automatically generate executable transformation programs. Some graphical data mapping tools allow users to "auto-connect" a source and a destination, a feature that works only when the source and destination data elements have the same name.
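The name-based "auto-connect" behaviour, combined with a procedural mapping step, might look roughly like the following sketch. It is not taken from any particular tool; the helper names `auto_connect` and `apply_mapping` and the field names are assumptions made for illustration.

```python
def auto_connect(source_fields, destination_fields):
    """Pair fields whose names are identical; report the rest as unmapped."""
    destination_set = set(destination_fields)
    mapping = {name: name for name in source_fields if name in destination_set}
    unmapped_source = [n for n in source_fields if n not in mapping]
    unmapped_destination = [n for n in destination_fields if n not in mapping]
    return mapping, unmapped_source, unmapped_destination


def apply_mapping(record, mapping):
    """Copy mapped fields from a source record into a new destination record."""
    return {dest: record[src] for src, dest in mapping.items() if src in record}


# Hypothetical schemas: two names match, one pair does not.
source_fields = ["customer_id", "email", "full_name"]
destination_fields = ["customer_id", "email", "name"]

mapping, unmapped_source, unmapped_destination = auto_connect(
    source_fields, destination_fields
)

# "full_name" and "name" differ, so this pair has to be mapped by hand.
mapping["full_name"] = "name"

row = {"customer_id": 7, "email": "a@example.com", "full_name": "Ada Lovelace"}
converted = apply_mapping(row, mapping)
print(converted)
```

The unmapped lists show the limitation described above: exact name matching connects `customer_id` and `email` automatically but leaves `full_name` and `name` for a person to resolve.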
Data mapping techniques fall into three types: manual, semi-automated, and fully automated. Manual data mapping requires connecting data sources and documenting the process using code, while semi-automated data mapping is a hybrid of the manual and fully automated approaches. Fully automated data mapping is the newest approach and involves simultaneously evaluating actual data values in two data sources, using heuristics and statistics, to automatically discover complex mappings between two data sets.
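One simple way to evaluate actual data values is to compare how much two columns overlap. The sketch below uses Jaccard similarity of distinct values as a stand-in heuristic; this is an assumption chosen for brevity, not the method of any specific product, and it only finds one-to-one column matches rather than the complex mappings mentioned above. The column names, sample values, and the `threshold` parameter are hypothetical.

```python
def jaccard(a, b):
    """Share of distinct values that two columns have in common (0.0 to 1.0)."""
    a, b = set(a), set(b)
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def discover_mappings(source_columns, destination_columns, threshold=0.5):
    """For each source column, pick the destination column with the most value overlap."""
    mappings = {}
    for src_name, src_values in source_columns.items():
        best_name, best_score = None, 0.0
        for dest_name, dest_values in destination_columns.items():
            score = jaccard(src_values, dest_values)
            if score > best_score:
                best_name, best_score = dest_name, score
        # Keep only matches that clear the (arbitrary) confidence threshold.
        if best_name is not None and best_score >= threshold:
            mappings[src_name] = (best_name, best_score)
    return mappings


# Made-up columns whose names differ but whose values overlap.
source_columns = {
    "cust_no": [101, 102, 103, 104],
    "ctry": ["DE", "FR", "US", "US"],
}
destination_columns = {
    "customer_id": [102, 103, 104, 105],
    "country_code": ["US", "FR", "DE", "JP"],
}

found = discover_mappings(source_columns, destination_columns)
print(found)
```

Because the comparison ignores column names entirely, it can connect `cust_no` to `customer_id` where name matching would fail. A semi-automated workflow could present such candidates to a person for confirmation instead of applying them directly.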
Data mapping is crucial to the success of many data processes: a single misstep in a mapping can be replicated across every record it touches and ultimately degrade the quality of the data analyzed for insights. Data mapping bridges the differences between two systems or data models so that when data is moved from a source, it is accurate and usable at the destination. Data mapping has become more complex as the volume of data and the number of sources increase, and automated tools are needed to make it feasible for large data sets.