Data virtualization is an approach to data management that allows an application to retrieve and manipulate data without requiring technical details about the data, such as how it is formatted at source, or where it is physically located. The defining feature of data virtualization is that the data used remains in its original locations and real-time access is established to allow analytics across multiple sources. Data virtualization software aggregates structured and unstructured data sources for virtual viewing through a dashboard or visualization tool.
Key features of data virtualization include:
- Centralized data-access layer: Data virtualization establishes a single data-access layer for finding and using all enterprise data. The layer is composed of logical (virtual) representations of physical data sources such as data warehouses, data lakes, transactional and analytical databases, cloud and enterprise applications, data services and APIs, and data files.
- Real-time access: Data virtualization enables real-time access to data stored across multiple heterogeneous data sources.
- Metadata engine: A metadata engine collects, stores, and manages metadata about the data sources, including data definitions, data lineage, and data quality metrics.
- Security and compliance: Data virtualization can help resolve privacy-related problems and support regulatory compliance by restricting which of the collected variables are visible in a given view.
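The features above can be illustrated with a minimal sketch. It is not how any particular data virtualization product works: the `VirtualSource` and `VirtualLayer` classes, the sample tables, and the `ssn` column are all hypothetical names chosen for illustration. The sketch assumes two sources, an in-memory SQLite database and a CSV text, and shows a single access layer that reads both at query time, keeps a small metadata catalog, and hides a masked column from consumers.

```python
import csv
import io
import sqlite3
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Set

Row = Dict[str, object]


@dataclass
class VirtualSource:
    """Logical representation of one physical source (hypothetical)."""
    name: str
    fetch: Callable[[], Iterable[Row]]  # reads the source at query time
    columns: List[str]                  # metadata: column definitions
    masked: Set[str] = field(default_factory=set)  # hidden from consumers


class VirtualLayer:
    """Single access layer over several sources; no data is copied into it."""

    def __init__(self) -> None:
        self.catalog: Dict[str, VirtualSource] = {}

    def register(self, source: VirtualSource) -> None:
        self.catalog[source.name] = source

    def describe(self, name: str) -> List[str]:
        # Metadata lookup: only the columns a consumer is allowed to see.
        src = self.catalog[name]
        return [c for c in src.columns if c not in src.masked]

    def query(self, name: str) -> List[Row]:
        # Fetch from the underlying source now, then drop masked columns.
        src = self.catalog[name]
        return [{k: v for k, v in row.items() if k not in src.masked}
                for row in src.fetch()]

    def join(self, left: str, right: str, key: str) -> List[Row]:
        # Simple hash join across two sources on a shared key.
        index: Dict[object, List[Row]] = {}
        for row in self.query(right):
            index.setdefault(row[key], []).append(row)
        return [{**l, **r}
                for l in self.query(left)
                for r in index.get(l[key], [])]


# Source 1: a relational database (in-memory SQLite stands in for it).
db = sqlite3.connect(":memory:")
db.row_factory = sqlite3.Row
db.execute("CREATE TABLE customers (customer_id INTEGER, name TEXT, ssn TEXT)")
db.executemany("INSERT INTO customers VALUES (?, ?, ?)",
               [(1, "Ada", "111-11-1111"), (2, "Grace", "222-22-2222")])

# Source 2: a flat file (a CSV string stands in for it).
ORDERS_CSV = "customer_id,amount\n1,30.0\n1,12.5\n2,99.0\n"


def fetch_customers() -> List[Row]:
    cur = db.execute(
        "SELECT customer_id, name, ssn FROM customers ORDER BY customer_id")
    return [dict(r) for r in cur]


def fetch_orders() -> List[Row]:
    reader = csv.DictReader(io.StringIO(ORDERS_CSV))
    return [{"customer_id": int(r["customer_id"]),
             "amount": float(r["amount"])} for r in reader]


layer = VirtualLayer()
layer.register(VirtualSource("customers", fetch_customers,
                             ["customer_id", "name", "ssn"], masked={"ssn"}))
layer.register(VirtualSource("orders", fetch_orders,
                             ["customer_id", "amount"]))

# Combine both sources through the layer; the masked column never appears.
combined = layer.join("orders", "customers", "customer_id")
for row in combined:
    print(row)
```

Because `fetch` runs on every query, a change made in the SQLite table is visible in the next call to `layer.query("customers")` without any reload step. A real system would add query pushdown, caching, and access control per user, none of which this sketch attempts.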
Benefits of data virtualization include:
- Faster access to data: Data virtualization provides a data layer through which users can access, combine, transform, and deliver datasets quickly and cost-effectively.
- Reduced complexity: Data virtualization simplifies data integration by providing a unified, virtual data-access layer built on top of many data sources.
- Lower risk of error: Because queries read from the sources at request time, consumers work with the newest data, which lowers the risk of errors caused by stale or faulty copies.
- Increased availability and usage of enterprise data: Existing data infrastructure can continue performing its core functions while the data virtualization layer draws on the data from those sources, making it complementary to all existing data sources.
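The "newest data" benefit can be shown with a small analogy, assuming a single SQLite database stands in for a source system. A SQL view plays the role of the virtual representation and a `CREATE TABLE ... AS SELECT` snapshot plays the role of a copy-based integration. The table and column names are hypothetical, and a real virtualization layer spans many systems rather than one database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('A-1', 10)")

# Copy-based integration: a snapshot taken once and stored separately.
conn.execute("CREATE TABLE inventory_copy AS SELECT * FROM inventory")

# Virtual representation: a view that stores no rows of its own.
conn.execute("CREATE VIEW inventory_view AS SELECT sku, quantity FROM inventory")

# The source system changes after the snapshot was taken.
conn.execute("UPDATE inventory SET quantity = 7 WHERE sku = 'A-1'")


def quantity(relation: str) -> int:
    # The relation name comes from the fixed strings below, not from user input.
    row = conn.execute(
        f"SELECT quantity FROM {relation} WHERE sku = 'A-1'").fetchone()
    return row[0]


copied = quantity("inventory_copy")   # value frozen at snapshot time
virtual = quantity("inventory_view")  # value read from the source now
print("copy:", copied, "view:", virtual)
```

The snapshot keeps the old quantity until it is refreshed, while the view returns the updated value immediately, which is the behavior the benefit describes.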
Data virtualization is an excellent foundation for modern distributed data architecture and use cases. It adds data integration flexibility so data architects can successfully evolve their data strategies and architectures to take full advantage of the latest data technologies and innovations.