Metadata Debt

Charles Landau
Mar 1, 2021

Can we call a spade an entity:tool:spade?

Originally published on The Slip Box.

Imagine that you are the first-time founder of an innovative logistics company, Scones Unlimited: “The Lyft of Scones”. Over time your business grows into enterprise software management problems, and your technical executive leadership comes to you with various initiatives to promote code standards, security standards, and generally stamp out technical debt.

You are a very enlightened leader, and you know that every business is a technology business now. You decide to invest in these initiatives — to invest in technical leverage. Code standards become socialized. Some of the engineering staff complain about “onerous bureaucracy”, and you collaborate with your technical leaders to develop a solution to that problem too.

Over time, you build up automated pipelines that check each engineer’s work for code problems before it can be accepted. For security, the pipelines also check for outdated libraries with vulnerabilities, or for complicated sequences of text (because those might be passwords).
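One common way to flag "complicated sequences of text" is to measure the Shannon entropy of each token and report long, high-entropy strings. The sketch below is only an illustration of that idea: the function names, the token pattern, and both thresholds are assumptions made for this example, not the settings of any particular scanning tool.

```python
import math
import re
from collections import Counter

# Hypothetical thresholds, chosen for illustration only.
MIN_LENGTH = 16
MIN_ENTROPY = 3.5

# Candidate tokens: runs of characters that commonly appear in keys and passwords.
TOKEN_PATTERN = re.compile(r"[A-Za-z0-9+/=_\-]+")


def shannon_entropy(text: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not text:
        return 0.0
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def find_suspicious_tokens(line: str) -> list[str]:
    """Return tokens that are both long and high-entropy."""
    return [
        token
        for token in TOKEN_PATTERN.findall(line)
        if len(token) >= MIN_LENGTH and shannon_entropy(token) >= MIN_ENTROPY
    ]
```

A check like this trades false positives (hashes, encoded blobs) against false negatives (short or repetitive passwords), so real pipelines usually pair it with pattern-based rules.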

All is right with the world, and the engineers are happy… until one day, the metadata debt comes due.

Something Amiss In Scone Town

As your business (and staff) grow, you start to become frustrated with some of the systems in your company. The problems you hear about rhyme in a way that you can’t quite put your finger on.

  • Your data science team is spending a lot of its time trying to acquire data. They don’t have an easy way to discover data, and when they do, getting access can be very difficult.
  • Your CTO confides in you that they want to nudge the CISO to implement less restrictive policies, but they can’t because the CISO doesn’t have an easy way to distinguish sensitive systems from the rest.
  • Your CFO calls you, haggard, asking you to rein in spending on cloud computing. Her team is spending all their time tracking down the charges. A charge will have a project code that says “Unicorn” or “Manhattan”, but her staff have no way of relating that to real teams with real billing codes.
  • Your Director of Scone Excellence complains to you that her logistics partner tracking systems don’t give her enough data to evaluate anything. She needs fine-grained reports that are aware of what rating a partner had at the time of transit.
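The last complaint, knowing what rating a partner had at the time of transit, is a point-in-time lookup: the system has to keep the history of rating changes instead of only the current value. A minimal sketch, assuming a hypothetical per-partner history of (effective date, rating) pairs sorted by date:

```python
from bisect import bisect_right
from datetime import date

# Hypothetical rating history for one logistics partner, sorted by effective date.
RATING_HISTORY = [
    (date(2020, 1, 1), "B"),
    (date(2020, 6, 15), "A"),
    (date(2020, 11, 1), "C"),
]


def rating_at(history, when):
    """Return the rating in effect on `when`, or None if it predates all records."""
    effective_dates = [effective for effective, _ in history]
    # Index of the first record that takes effect strictly after `when`.
    idx = bisect_right(effective_dates, when)
    return history[idx - 1][1] if idx else None


# A shipment in transit on 4 July 2020 is reported against the rating from 15 June.
summer_rating = rating_at(RATING_HISTORY, date(2020, 7, 4))
```

If the tracking system stores only the latest rating, this question cannot be answered after the fact, which is one way metadata debt accumulates.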

It all comes to a head with Project Crust.

“Project Crust” is an enterprise initiative proposed by your CTO. They will unify data management across the disparate systems, resolve the access patterns, and at long last unite the teams.

One interface will let anyone run a Google-like search for the data they need. It will restrict results based on the requester’s role in the company, or even on fine-grained attributes. You are very excited to hear about this ambitious project. You’re also a little concerned about a project of this scope.
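To make the idea of role- and attribute-restricted search concrete, here is a small sketch. Everything in it is hypothetical: the class names, the fields, and the sample catalog are invented for illustration, and plain substring matching stands in for a real search engine.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CatalogEntry:
    name: str
    description: str
    allowed_roles: frozenset
    required_attributes: frozenset = frozenset()


@dataclass(frozen=True)
class Requester:
    role: str
    attributes: frozenset = frozenset()


def can_see(entry: CatalogEntry, requester: Requester) -> bool:
    """Role must be allowed, and the requester must hold every required attribute."""
    return (
        requester.role in entry.allowed_roles
        and entry.required_attributes <= requester.attributes
    )


def search(catalog, query: str, requester: Requester) -> list:
    """Return names of matching entries the requester is permitted to see."""
    q = query.lower()
    return [
        entry.name
        for entry in catalog
        if (q in entry.name.lower() or q in entry.description.lower())
        and can_see(entry, requester)
    ]


# Hypothetical sample data.
CATALOG = [
    CatalogEntry("delivery_times", "Scone delivery durations", frozenset({"analyst", "engineer"})),
    CatalogEntry("driver_payroll", "Scone driver pay records", frozenset({"analyst"}),
                 frozenset({"hr-cleared"})),
]
analyst = Requester("analyst")
cleared_analyst = Requester("analyst", frozenset({"hr-cleared"}))
```

Both requesters share a role, but only the one holding the `hr-cleared` attribute would see `driver_payroll` in results for "scone". Note that the filter depends entirely on the entries carrying accurate `allowed_roles` and `required_attributes` in the first place.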

You were right to be concerned. The project runs into all sorts of problems:

  1. There’s no common language across teams. Setting up a top-level “Project” or “Team” data type in the application has been incredibly challenging.
  2. Many datasets have no clear way to get access. Project Crust’s “Click to Request Access” feature is going to be unavailable for the vast majority of company data.
  3. Attributes are used inconsistently throughout the company, e.g. tagging something as “private” means different things for the HR, infrastructure, and legal teams.
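One way to approach the third problem is to map each team's local tag onto an agreed, namespaced term, and to reject tags that have no agreed meaning yet. The sketch below assumes a hypothetical shared vocabulary; the three meanings given for "private" are invented examples, not definitions from any real policy.

```python
# Hypothetical shared vocabulary: (team, local tag) -> agreed namespaced term.
SHARED_VOCABULARY = {
    ("hr", "private"): "sensitivity:personal-data",
    ("infrastructure", "private"): "network:internal-only",
    ("legal", "private"): "sensitivity:privileged",
}


def normalize_tag(team: str, tag: str) -> str:
    """Translate a team-local tag into the shared term, or fail loudly."""
    key = (team.strip().lower(), tag.strip().lower())
    try:
        return SHARED_VOCABULARY[key]
    except KeyError:
        raise ValueError(
            f"no shared term for tag {tag!r} from team {team!r}"
        ) from None
```

Failing on unknown tags is a deliberate choice in this sketch: it surfaces vocabulary gaps at tagging time instead of letting ambiguous labels spread.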

This all comes down to metadata, your CDO explains. Metadata is the data about your data: sources, access levels, quality, subject matter, “freshness”, size, storage type, and much more. You find out that everything in your company has metadata: documents, cloud resources, applications, staff profiles, marketing campaigns, machine learning models, and of course, datasets. You meet with the team and end up approving a six-month extension on the roadmap for the project, as well as additional staff. The extra resources will be used to work with teams across the firm and realign all this “metadata”.
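As a rough illustration of what one such record might look like, here is a sketch of a single metadata entry covering several of the properties listed above. The class name, field names, and sample values are assumptions made for this example; a real catalog would define its own schema.

```python
from dataclasses import dataclass
from datetime import date


@dataclass
class AssetMetadata:
    asset_type: str      # e.g. "dataset", "document", "ml_model"
    name: str
    source: str          # where the asset comes from
    access_level: str    # who may see it
    subject: str         # what it is about
    size_bytes: int
    storage_type: str
    last_updated: date

    def age_in_days(self, today: date) -> int:
        """A simple stand-in for "freshness": days since the last update."""
        return (today - self.last_updated).days


# Hypothetical example record.
orders = AssetMetadata(
    asset_type="dataset",
    name="scone_orders",
    source="orders-service",
    access_level="internal",
    subject="sales",
    size_bytes=52_428_800,
    storage_type="object-store",
    last_updated=date(2021, 2, 1),
)
```

The same record shape can describe a document or a machine learning model by changing `asset_type`, which is what lets one catalog cover many kinds of assets.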

Near Miss

Still, it isn’t all joy. A few months later you get a panicked call from your VP of HR. There has been a marked uptick in turnover, particularly among data professionals. HR investigated and traced the change back to an article circulating on the company chat boards.

Your direct competitor, Popovers Incorporated (“the Twitch for quickbreads”), has published a product profile on its engineering blog about its data discovery platform, “Project Gluten”. Your data scientists and operations teams alike are drooling over the self-service data access features and the natural language search for new datasets. It runs on open source software, and at its core is a metadata database that the company uses to consolidate all of its enterprise data management. Your infrastructure team admires the simple and scalable architecture of the solution.

You call a quick huddle with your engineering leaders and decide to announce Project Crust internally within a week, and externally within one month. Team Crust will also investigate whether it can set an earlier date for internal test users to begin trying out the site.

At a subsequent check-in, you hear about the excitement across the company for Project Crust, and you allow yourself to relax. A year later still, you look back on the project as one of your best investments to date. Across your company, teams find it easier than ever to do data discovery. The Crust database is widely considered one of your most successful internal tools. Your staff all agree: they have never worked at a company that makes it easier to get the data they need.

Charles Landau

Always learning, usually building. Solutions Architect at Guidehouse