Active Metadata: Your 101 Guide From People Pioneering the Concept & Its Understanding
Active metadata is a way of managing metadata. It leverages open APIs to connect all the tools in your data stack and ferry metadata back and forth in a two-way stream. It’s about utilizing metadata in the best way possible and not having it sit in a silo.
Active metadata helps you continuously access and process all kinds of metadata to understand your data better, regardless of the tools.
With active metadata management, you get an always-on, intelligent, API-driven, and action-oriented system that powers use cases from cost optimization and quality control to data security.
Gartner predicts that through 2024, organizations that adopt active metadata capabilities can decrease the time-to-delivery of new data assets to users by as much as 70%. So, let’s understand active metadata, its characteristics, use cases, and management.
Table of Contents #
- What is active metadata?
- The 4 characteristics of active metadata
- Active metadata example
- Active vs. passive metadata: What’s the difference?
- 14 active metadata use cases
- What is active metadata management?
- What does an active metadata management platform look like?
- Active Metadata: Related Resources
What is active metadata? #
Active metadata is a way of managing metadata. It leverages open APIs to connect all the tools in your data stack and ferry metadata back and forth in a two-way stream.
This is what allows active metadata to bring context, say, from Snowflake into Looker, Looker into Slack, Slack into Jira, and Jira back into Snowflake.
The 4 characteristics of active metadata #
There are four fundamental characteristics of active metadata:
- Active metadata is always on
- Active metadata is intelligent
- Active metadata is action-oriented
- Active metadata is open by default
#1- Active metadata is always on #
Active metadata is always on. This means having the ability to automatically and continually collect metadata from various sources and steps of data flow — logs, query history, usage statistics, and more.
#2- Active metadata is intelligent #
Active metadata isn’t just about collecting metadata. It’s about constantly processing metadata to connect the dots and create intelligence from it. This means that with active metadata, the system will only get smarter over time as people use it more and it observes more metadata.
So, you can auto-classify sensitive data, use automatic suggestions to document a data asset’s description, send alerts about critical issues, and more.
#3- Active metadata is action-oriented #
Active metadata doesn’t just stop at intelligence. It should drive action by:
- Curating recommendations
- Generating alerts
- Making it easier for people to make decisions
- Automatically making decisions without human intervention, like stopping downstream pipelines when data quality issues are detected
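To make the last point concrete, here is a minimal sketch of an automated "halt downstream" decision. It assumes quality check results arrive as simple records; the `QualityCheck` shape, the check names, and the `decide_action` helper are hypothetical and not taken from any specific platform.

```python
from dataclasses import dataclass


@dataclass
class QualityCheck:
    """Result of one data quality check on an asset (hypothetical shape)."""
    asset: str
    check: str
    passed: bool


def decide_action(checks: list[QualityCheck]) -> dict[str, str]:
    """Map each asset to 'run' or 'halt' based on its check results."""
    decisions: dict[str, str] = {}
    for c in checks:
        if not c.passed:
            # One failed check is enough to halt pipelines downstream of the asset.
            decisions[c.asset] = "halt"
        else:
            # Only mark 'run' if no earlier check already halted this asset.
            decisions.setdefault(c.asset, "run")
    return decisions


checks = [
    QualityCheck("orders", "row_count_nonzero", True),
    QualityCheck("orders", "no_null_order_id", False),
    QualityCheck("customers", "no_duplicate_ids", True),
]
decisions = decide_action(checks)
print(decisions)
```

In a real setup, the "halt" decision would be handed to an orchestrator, which is outside the scope of this sketch.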
#4- Active metadata is open by default #
Active metadata platforms use APIs to hook into every piece of the modern data stack. This makes magical user experiences possible by saving data practitioners from the endless tool- and context-switching.
This is called embedded collaboration, which is when work happens where you are with the least amount of effort.
An example to explain active metadata #
Let’s take Spotify as an example to understand how active metadata works.
When you open Spotify, the platform’s algorithm analyzes various types of metadata associated with each song, such as the genre, mood, and tempo to automatically suggest similar songs or artists that you might enjoy. This happens as soon as you listen to a song or play one of your playlists.
Spotify also analyzes the metadata associated with each song or album, such as the artist, release date, and popularity, to auto-classify music into playlists such as “Discover Weekly,” “Release Radar,” and “Daily Mix” that are tailored to each user’s taste. These playlists are constantly updated depending on your listening history and tastes.
So, Spotify can create personalized playlists, categorize its music library, and provide intelligent recommendations, all thanks to active metadata.
Active vs. passive metadata: What’s the difference? #
Both active and passive metadata refer to how we aggregate, store, and use metadata.
The main difference between active and passive metadata is that passive metadata is the standard way of collecting technical metadata — schemas, data types, models, etc. Meanwhile, active metadata is a way of making metadata flow dynamically across the entire data stack.
This enables bidirectional data flow, embedding enriched context and information in every tool in the data stack. So, active metadata goes beyond technical metadata to include operational, business, and social metadata.
Here’s how Prukalpa Sankar, co-founder at Atlan, highlights the difference between active vs. passive metadata:
“Think of passive metadata as putting out information on a personal blog. Every so often, it could get picked up and go viral, but most of the time, it’s just going to sit unseen and unused. Think of active metadata as a viral story. It shows up everywhere you already live in what seems like seconds. It’s immediately cross-checked against and combined with other information, bringing together a network of related context into a larger trend or story. And it sparks conversations, making everyone more knowledgable and informed in the end.”
Now let’s look at the possibilities that active metadata can offer.
14 active metadata use cases #
While there are numerous active metadata use cases, here’s a list of the top 14 enterprise use cases to get you started:
- Optimize data stack spending with dynamic pipeline optimization
- Purge stale or unused assets
- Reduce the time spent on the root cause and impact analysis
- Instill more trust in your data
- Manage security classifications
- Raise security alerts programmatically
- Archive data programmatically
- Generate periodic data security and compliance reports
- Set up and regulate data access
- Streamline analyst service requests
- Streamline data/analytics engineer service requests
- Speed up the onboarding of your data team
- Write better SQL queries
- Enrich user experience with BI tools
Let’s explore each active metadata use case further.
1. Optimize data stack spending with dynamic pipeline optimization #
With active metadata, you can find answers to questions about the queries taking the longest time to run, most/least queried tables, jobs taking too long to be executed, and more. Each data processing workflow consumes resources, and running thousands of such processes each day can add up to a substantial cost.
With active metadata, you can collect runtime metrics from data processing engines and usage metrics from BI tools. This helps you monitor peak access times, identify the most clunky processes, and track assets that get updated the most/least.
You can then set up automated workflows to:
- Scale your resources up or down to accommodate peak hours
- Reduce the frequency of reprocessing for the least used data assets
- Set up a processing schedule that uses resources better
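As a rough illustration of the second workflow, the sketch below picks a refresh interval for each asset from its query volume. The thresholds, the asset names, and the `refresh_interval_hours` helper are assumptions made for this example; a real platform would read usage metrics from query logs or BI tools.

```python
def refresh_interval_hours(queries_last_30_days: int) -> int:
    """Pick a refresh interval from query volume (thresholds are illustrative)."""
    if queries_last_30_days >= 1000:
        return 1      # heavily used: refresh hourly
    if queries_last_30_days >= 50:
        return 24     # moderately used: refresh daily
    return 168        # rarely used: refresh weekly


# Hypothetical usage metadata: asset name -> queries in the last 30 days.
usage = {"revenue_daily": 4200, "churn_scores": 75, "legacy_export": 3}
schedule = {asset: refresh_interval_hours(n) for asset, n in usage.items()}
print(schedule)
```

The point of the design is that the schedule is derived from observed usage rather than set once by hand, so it can be recomputed as usage changes.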
2. Purge stale or unused assets #
You can track the popularity of each data asset with usage metadata to know when it was last used or how many people used it. If the asset hasn’t been updated or used in months, it’s likely stale or redundant and a candidate for purging.
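A minimal sketch of this check, assuming usage metadata is available as a mapping from asset name to last-used date. The 90-day idle threshold, the asset names, and the `find_stale_assets` helper are illustrative assumptions.

```python
from datetime import date, timedelta


def find_stale_assets(
    last_used: dict[str, date], today: date, max_idle_days: int = 90
) -> list[str]:
    """Return assets whose last use is older than the idle threshold."""
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(name for name, used in last_used.items() if used < cutoff)


# Hypothetical usage metadata: asset name -> date it was last queried.
last_used = {
    "sales_2019_backup": date(2023, 1, 5),
    "revenue_daily": date(2023, 9, 28),
    "tmp_experiment": date(2023, 3, 12),
}
stale = find_stale_assets(last_used, today=date(2023, 10, 1))
print(stale)
```

In practice, the output would more likely feed a review or deprecation workflow than trigger deletion directly.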
3. Reduce the time spent on the root cause and impact analysis #
A single root cause analysis can take anywhere from 2 to 6 hours, as engineers have to examine each workflow associated with a reported issue.
Active metadata can automate lineage — tracking data flow across the data universe — and reduce the analysis time to mere minutes.
So, you can see everything that happens to data that was extracted from Salesforce and is now being used to set up some Tableau dashboards. You can also see which downstream assets will get affected whenever you make some changes to that Salesforce data.
As a result, in addition to speeding up root cause analysis, you can also preview the impact of any changes you make and prevent workflows from breaking in the first place.
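Impact analysis over lineage can be sketched as a graph traversal. The example below assumes lineage is stored as a mapping from each asset to its direct downstream assets; the asset names and the `downstream_assets` helper are hypothetical.

```python
from collections import deque


def downstream_assets(lineage: dict[str, list[str]], start: str) -> list[str]:
    """Breadth-first walk of the lineage graph, returning every asset
    that directly or indirectly depends on `start`."""
    seen = {start}
    order: list[str] = []
    queue = deque(lineage.get(start, []))
    while queue:
        asset = queue.popleft()
        if asset in seen:
            continue  # already visited via another path
        seen.add(asset)
        order.append(asset)
        queue.extend(lineage.get(asset, []))
    return order


# Hypothetical lineage: asset -> assets built directly from it.
lineage = {
    "salesforce.accounts": ["staging.accounts"],
    "staging.accounts": ["marts.customer_360", "marts.pipeline"],
    "marts.customer_360": ["tableau.exec_dashboard"],
    "marts.pipeline": ["tableau.exec_dashboard", "tableau.sales_dashboard"],
}
impacted = downstream_assets(lineage, "salesforce.accounts")
print(impacted)
```

Running the same traversal on a reversed graph (downstream to upstream) gives the candidate sources for root cause analysis.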
4. Instill more trust in your data #
Active metadata can help you send real-time alerts and announcements about the status of each data asset so that data users are always in the loop.
For instance, whenever you detect an anomaly, you automatically send alerts to all downstream BI asset users flagging that anomaly along with the associated data assets that also might have been affected.
Another example is when you’re about to make some changes to your Fivetran data and aren’t sure of its impact. With active metadata, you can send an upcoming change alert informing data users and asking them to monitor data quality issues (if any) over the next few days.
Similarly, when you’re migrating data assets, you can handle everything from assessing the complexity of the migration to sending change announcements and deprecation notices.
5. Manage security classifications #
Data within organizations is classified according to levels of sensitivity to maintain data security and privacy. This is essential to comply with regulations like GDPR or CCPA.
Active metadata gives you the ability to propagate CIA (confidentiality, integrity, availability) ratings automatically via column-level lineage in real time.
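One way to picture this propagation is a worklist pass over column-level lineage, where a downstream column inherits the strictest label of anything upstream. The three-level scale, the column names, and the `propagate_classification` helper are assumptions for this sketch; a full CIA rating would carry three such values per column.

```python
# Illustrative sensitivity scale; higher number means stricter.
LEVELS = {"public": 0, "internal": 1, "confidential": 2}


def propagate_classification(
    lineage: dict[str, list[str]], labels: dict[str, str]
) -> dict[str, str]:
    """Push each column's label to its downstream columns, keeping the
    strictest label when several upstream columns feed the same one."""
    result = dict(labels)
    pending = list(labels)  # columns whose children still need checking
    while pending:
        col = pending.pop()
        for child in lineage.get(col, []):
            current = result.get(child, "public")
            if LEVELS[result[col]] > LEVELS[current]:
                result[child] = result[col]
                pending.append(child)  # its own children may need raising too
    return result


# Hypothetical column-level lineage: upstream column -> downstream columns.
lineage = {
    "crm.users.email": ["staging.users.email"],
    "staging.users.email": ["marts.customers.contact"],
    "crm.users.country": ["marts.customers.region"],
}
labels = {"crm.users.email": "confidential", "crm.users.country": "internal"}
propagated = propagate_classification(lineage, labels)
print(propagated)
```

Because labels only ever get stricter, the loop terminates even if the lineage graph contains cycles.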
6. Raise security alerts programmatically #
Hundreds or thousands of changes happen to data within organizations every day: adding or updating columns, changing classification tags, purging assets, and so on.
Whenever these changes happen, it’s important to gauge the impact of such changes on data security. That means addressing questions, such as “what is the new data and where did it come from?” or “does the column contain sensitive data?”, proactively rather than reactively.
Active metadata can help you make the data security team proactive as they’ll receive real-time alerts and announcements about change events automatically.
For instance, any changes to a sensitive asset will immediately send a Slack notification to the security team and automatically raise a Jira ticket.
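The routing logic behind such an alert can be sketched as follows. This example only builds the notification payloads; actually delivering them would go through the Slack and Jira APIs, which are not shown. The event shape, the channel name `#data-security`, the project key `SEC`, and the `route_change_event` helper are all hypothetical.

```python
def route_change_event(event: dict, sensitive_assets: set[str]) -> list[dict]:
    """Return the notifications to send for a change event.
    Changes to non-sensitive assets produce no notifications."""
    if event["asset"] not in sensitive_assets:
        return []
    summary = f"{event['change']} on sensitive asset {event['asset']}"
    return [
        # Placeholder targets; real delivery is left to the integration layer.
        {"channel": "slack", "target": "#data-security", "message": summary},
        {"channel": "jira", "target": "SEC", "message": summary},
    ]


sensitive_assets = {"crm.users", "billing.cards"}
event = {"asset": "crm.users", "change": "column added: phone_number"}
notifications = route_change_event(event, sensitive_assets)
for n in notifications:
    print(n["channel"], "->", n["target"], ":", n["message"])
```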
7. Archive data programmatically #
Tracking and deleting data is often manual as it requires you to check for potential legal risks or contractual breaches.
For instance, let’s assume that business managers must delete some data periodically to avoid legal and compliance risks. They’ll raise a request with a data analyst or an engineer.
But without end-to-end visibility of the data assets to be deleted and by relying on manual processes, the data team might miss deleting some of it.
With active metadata, you can set up automatic workflows to crawl data, notify the right stakeholders as soon as that data is available, keep track of its storage period, and archive it at the right time to avoid any compliance breaches.
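The retention tracking in such a workflow can be as simple as comparing each asset's age against its storage period. The sketch below assumes each asset record carries a creation date and a retention period in days; the field names, asset names, and the `due_for_archive` helper are illustrative.

```python
from datetime import date, timedelta


def due_for_archive(assets: list[dict], today: date) -> list[str]:
    """Return names of assets whose retention period has ended."""
    due = []
    for a in assets:
        expires = a["created"] + timedelta(days=a["retention_days"])
        if expires <= today:
            due.append(a["name"])
    return due


# Hypothetical asset metadata with per-asset retention periods.
assets = [
    {"name": "support_tickets_2022", "created": date(2022, 1, 1), "retention_days": 365},
    {"name": "campaign_leads_q1", "created": date(2023, 3, 1), "retention_days": 180},
]
to_archive = due_for_archive(assets, today=date(2023, 6, 1))
print(to_archive)
```

A scheduled job could run this check daily and notify the relevant stakeholders before anything is archived.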
8. Generate periodic data security and compliance reports #
Active metadata helps you understand each data asset fully with 360-degree profiles and explore its relationships with other assets across the data universe via lineage.
This helps you consolidate and analyze the resulting set of metadata. With the bidirectional flow of metadata, you can export your findings in the form of a BI dashboard and use it for security and compliance reporting.
9. Set up and regulate data access #
You can define access control policies using contextual metadata (classifications, business glossary terms, and so on) and link them to the relevant data assets and their fields.
You can then set up tag-based or attribute-based access control and propagate it automatically across assets via column-level lineage. This enables access control at scale, making it easier to monitor who is requesting access to what data, why, and how they are accessing that data.
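A minimal sketch of a tag-based check, assuming each column carries a set of tags and a policy maps each tag to the roles allowed to read it. The tag names, role names, and the `can_access` helper are hypothetical; tags without a policy entry are denied by default.

```python
def can_access(
    user_roles: set[str], column_tags: set[str], policy: dict[str, set[str]]
) -> bool:
    """Allow access only if, for every tag on the column, the user holds
    at least one role that the policy permits for that tag."""
    return all(user_roles & policy.get(tag, set()) for tag in column_tags)


# Hypothetical policy: tag -> roles allowed to read columns with that tag.
policy = {
    "pii": {"privacy_officer", "support_lead"},
    "finance": {"finance_analyst"},
}

print(can_access({"finance_analyst"}, {"finance"}, policy))
print(can_access({"finance_analyst"}, {"finance", "pii"}, policy))
```

Because the decision depends only on tags, any column that receives a tag through lineage propagation is covered by the same policy without extra configuration.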
10. Streamline analyst service requests #
Business users constantly ask analysts questions such as “what does this metric mean?” or “can I see it by geography instead of a segment?”
Addressing each of these service requests could take analysts several hours.
The entire process could be reusable and reproducible when each data asset — data sets, metrics, queries — is viewed as a product. With active metadata, you can build GitHub-like repositories for each of these products and then share those profiles with just a link.
11. Streamline data/analytics engineer service requests #
Data users also ask engineers questions about data SLAs getting updated, the origins of certain tables, the availability of certain data, and so on.
Just like analysts in the previous use case, data engineers end up spending hours reviewing and responding to such service requests. Extending the data-as-a-product analogy, you can build engineering context by leveraging active metadata so that all users can look up information such as last run status in Airflow or freshness of dbt tags whenever they want.
12. Speed up the onboarding of your data team #
New data team members spend most of their time trying to locate data in their organizations. In such cases, onboarding could drag on for weeks if not months.
When each data asset comes with a 360-degree asset profile, it offers context about the asset’s origins, ownership, upstream and downstream workflows, quality, freshness, and more. You can trace its full lineage all the way to its source.
13. Write better SQL queries #
Active metadata also helps you keep track of SQL queries run for each data asset.
From the asset’s profile, you can easily see definitions, recent joins, metrics, and any issues/warnings involved. This helps you write better SQL queries with all the necessary context.
Moreover, the entire process is self-service and doesn’t require you to reach out to others within your organization for context.
14. Enrich user experience with BI tools #
Instead of switching between a BI tool and a data catalog, you can use active metadata to bring context into dashboards. Relevant metadata (business terms, descriptions, owners, and lineage) can be pushed into the BI tool.
So, when someone is looking at each table, they can understand who owns it, where the data came from, how it’s getting used, and more. This information could even be used as labels in auto-generated reports.
What is active metadata management? #
An active metadata management platform enables the two-way movement of metadata by analyzing all types of metadata from various data sources and then sending enriched metadata back into different tools in the tech stack.
According to Gartner, active metadata management is the “capability of operationalizing analytic outputs in the form of operational alerts and generating recommendations. It identifies the nature and extent of patterns in data operations, ultimately resulting in AI-assisted reconfiguration of data itself and operations that use that data in active metadata utilization.”
Here’s how Prukalpa Sankar, co-founder at Atlan, envisions a world with active metadata management:
“Imagine a world where data catalogs aren’t standalone tools. Instead, a user can get all the context where they need it — either in the BI tool of their choice or whatever tool they’re already in, whether that’s Slack, Jira, the query editor, or the data warehouse. It’s like reverse ETL, but for metadata.”
Active metadata management and the modern data stack #
Gartner predicts that “the stand-alone metadata management platform will be refocused from augmented data catalogs to a metadata ‘anywhere’ orchestration platform”. They also scrapped the famous Magic Quadrant for Metadata Management Solutions and replaced it with a Market Guide for Active Metadata.
This is being seen as a transformational leap towards a new approach to metadata — with active metadata right in the driver’s seat.
“The major problem that this new approach is trying to solve is ensuring that metadata management platforms catch up with the speed and types of metadata that are being generated and captured. It also particularly recognizes the increase in the value of metadata, once it stops living “passively” across platforms and changes to “actively” moving between platforms.”
What does that mean for the modern data stack? With an always-on, active, and intelligent way of handling metadata, the entire modern data stack will support a bidirectional flow of metadata. This will require an active metadata management platform that:
- Imports and exports metadata, workflows, and other optimization strategies
- Uses machine learning to recommend job flows, resource allocation, etc.
- Analyzes metadata across platforms
Let’s further explore the anatomy of such a platform.
What does an active metadata management platform look like? #
An active metadata management platform is always on, intelligent, API-driven, and action-oriented, enabling a bidirectional flow of metadata to ensure embedded collaboration. Such platforms are no longer passive solutions that try to solve the “too many tools” problem by adding yet another tool that ends up as expensive shelfware.
“To my many friends/followers doing metadata/catalog startups, I have a request: please integrate the metadata info with my BI tool so that I can see it _while I am doing queries._ I have no desire to _ever_ visit a third website to just "browse the metadata."”
@jwills@data-folks.masto.host (@josh_wills), April 29, 2022
An active metadata management platform sends metadata back into every tool in the data stack. The five core components of such a platform are:
- The metadata lake: A unified repository to store all kinds of metadata, in raw and processed forms, built on open APIs and powered by a knowledge graph
- Programmable-intelligence bots: A framework that allows teams to create customizable ML or data science algorithms to drive intelligence
- Embedded collaboration plugins: A set of integrations, unified by the common metadata layer, that seamlessly integrates data tools with each data team’s daily workflow
- Data process automation: An easy way to build, deploy, and manage workflow automation bots that will emulate human decision-making processes to manage a data ecosystem
- Reverse metadata: Orchestration to make relevant metadata available to the end-user, wherever and whenever they need it, rather than in a standalone catalog
So, what’s next? Here’s the first step to getting started with active metadata management #
Getting started with active metadata means starting your journey toward building a more forward-looking stack. The first step is to identify your use cases, and then pick the right tools.
Whether you are trying to compose a data fabric or data mesh, or to democratize data across all teams in the organization, the first thing you need to do is choose metadata management tools that can use and exchange active metadata.
The right active metadata management platform will help you, as Forrester puts it, address the diversity, granularity, and dynamic nature of data and metadata and weave end-to-end visibility into your modern data stack.
With bidirectional communication, collaboration, and data flows, you can set up a living, intelligent, action-oriented data ecosystem that helps you optimize costs, enhance data security, ensure regulatory compliance, and improve data team productivity.