By: Ryan Malone
Published: 04/17/2021
Interesting article we enjoyed at PDHNow. See the original article here: https://medium.com/be-greator/becoming-a-data-centric-organization-f9c12b3f9a63
Businesses that are not very mature in data governance often have their different business units operating and using data in silos. The IT department manages the systems and each department manages its own data without any collaboration. This is a waste of a brilliant opportunity. With today’s technology, we can create a single platform for the entire organization to collaborate around its data. This is what we call a data-centric organization.
Now, the thing about good data governance is that it is a mindset, and for companies that aren’t quite there yet it’s going to be a journey. Becoming a data-centric organization isn’t a one-off project; in most cases it requires a change of culture. If you think it’s going to be done and dusted in six months, then you’ve already failed.
But hey, you need to start somewhere.
In a Mini Masterclass, Laurent Dresse, a data governance expert with over 15 years of experience in the field, gave us the lowdown on what it means to be data-centric and what the journey towards becoming a data-centric organization looks like.
Ultimately, we want to have a platform in place that enables people to collaborate around their data and enables the organization to crowd-source knowledge. Everyone in the company needs to understand that they all have a role to play in the organization of the data. In simple terms, a data-centric organization is one in which:
-people can share their own knowledge,
-that knowledge goes into a platform where it is accessible to everyone in the organization, and
-users can enrich the data through their everyday use of it.
To know where you are in the journey towards becoming data-centric, you must consider maturity along two axes as follows.
1. Data landscape awareness: This axis covers all of your technical assets, that is, an inventory of your IT resources, systems, and databases. This is the data catalog. The levels in increasing order of maturity are:
i. No central data knowledge available
ii. Data is localized and identified
iii. Data is defined and classified
iv. Data lifecycle is known, stewardship organization is clear and linked to data assets
v. Data governance rules are established and monitored
2. Data culture awareness: This axis covers the maturity of the data management culture in your organization. The levels in increasing order of maturity are:
i. Siloed approach; no collaboration
ii. Limited collaboration; Subject Matter Expert Syndrome
iii. Group of people; ad-hoc collaboration
iv. Operational teams; daily collaboration
v. Collective intelligence; operating with a purpose
Most organizations are at a stage where the data is defined and classified, and collaboration is ad-hoc. They may have an inventory of data processes in the organization, but they don’t yet have a full understanding of the data flow and data construction in their organization.
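To make the two axes concrete, here is a minimal Python sketch that records an organization's position as a pair of 1-to-5 levels. The level descriptions are taken from the lists above; the names `ASSET_LEVELS`, `COLLABORATION_LEVELS`, `MaturityPosition`, and `lagging_axis` are purely illustrative and were not part of the masterclass.

```python
from dataclasses import dataclass

# Level descriptions from the two lists above (index 0 is level i).
ASSET_LEVELS = [
    "No central data knowledge available",
    "Data is localized and identified",
    "Data is defined and classified",
    "Data lifecycle is known, stewardship organization is clear and linked to data assets",
    "Data governance rules are established and monitored",
]
COLLABORATION_LEVELS = [
    "Siloed approach; no collaboration",
    "Limited collaboration; Subject Matter Expert Syndrome",
    "Group of people; ad-hoc collaboration",
    "Operational teams; daily collaboration",
    "Collective intelligence; operating with a purpose",
]


@dataclass(frozen=True)
class MaturityPosition:
    """Hypothetical record of where an organization sits on each axis (1 to 5)."""
    asset_level: int
    collaboration_level: int

    def __post_init__(self):
        for level in (self.asset_level, self.collaboration_level):
            if not 1 <= level <= 5:
                raise ValueError("maturity levels run from 1 to 5")

    def describe(self) -> str:
        return (
            f"Assets: {ASSET_LEVELS[self.asset_level - 1]} | "
            f"Collaboration: {COLLABORATION_LEVELS[self.collaboration_level - 1]}"
        )

    def lagging_axis(self) -> str:
        """Name the axis that is further behind, or 'balanced' if they match."""
        if self.asset_level < self.collaboration_level:
            return "assets"
        if self.asset_level > self.collaboration_level:
            return "collaboration"
        return "balanced"


# The stage described above as typical: level iii on both axes.
typical = MaturityPosition(asset_level=3, collaboration_level=3)
print(typical.describe())
```

Scoring both axes side by side makes it easy to see when the catalog work has run ahead of the collaboration work, or the other way around.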
You wouldn’t want a medical student to jump into surgery before they know where and what everything underneath the skin is, right? Of course not! Well, your organization is a living, breathing entity that circulates data, so before you do anything you need to do your homework.
So, as you embark on your journey to becoming data-centric, you need to capture the following 4 pieces which will help define what the data in your organization is, where it is, and how it moves about. These pieces are:
Business Glossary: A business view of the data assets in your organization with definitions and attributes that make sense to the business users.
Data Dictionary: An inventory of your data; an exhaustive list of all the data sources in your organization and the hierarchies and structures of those data sources.
Processing Catalog: An inventory of all the data processing and transportation taking place in your organization. How does your data flow from System A to System B? What transformation logic is applied between them?
Uses Catalog: How is data consumed in your organization? How are all of your data sets made available to data scientists, BI platforms, or other applications downstream that consume the data?
Once you have all of these elements down, you will have a pretty exhaustive view of your data as it exists in silos. That’s when the real fun begins.
The unifying element is how you will build the connections between these parts. Obviously there is no point in having a business glossary if you can’t link it to a data source or a column in a table somewhere.
Having a complete view of the data/IT landscape of your organization will help you determine the impact across the organization of adding or changing something.
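As a rough illustration of why those connections matter, the sketch below treats the four pieces as nodes in a small graph and walks it to find everything downstream of a change. All entry names (`crm.customers`, `sales_dashboard`, and so on) and the `downstream_impact` helper are made up for the example; a real catalog tool will have its own model for these links.

```python
from collections import deque

# Hypothetical links between catalog entries. The prefix shows which of the
# four pieces an entry belongs to; each key points to the entries linked
# downstream of it.
LINKS = {
    "glossary:Customer": ["dictionary:crm.customers"],
    "dictionary:crm.customers": ["processing:crm_to_warehouse"],
    "processing:crm_to_warehouse": ["dictionary:warehouse.dim_customer"],
    "dictionary:warehouse.dim_customer": ["uses:sales_dashboard", "uses:churn_model"],
}


def downstream_impact(start, links):
    """Return every entry reachable from `start`, in breadth-first order."""
    seen = {start}
    queue = deque([start])
    impacted = []
    while queue:
        node = queue.popleft()
        for linked in links.get(node, []):
            if linked not in seen:
                seen.add(linked)
                impacted.append(linked)
                queue.append(linked)
    return impacted


# What is affected if the CRM customer table changes?
for entry in downstream_impact("dictionary:crm.customers", LINKS):
    print(entry)
```

Without the links, each piece is just another silo; with them, a question like "what breaks if we change this table?" becomes a simple traversal.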
Generally speaking, all organizations have the following 4 categories of relationships to the data organization (followed by some examples of typical roles in these categories).
Data Governors: Those who are establishing the strategy around data, such as senior executives. (Chief Data Officer, Domain Data Officer, Data Protection Officer)
Data Managers: They are from the business, they are the SMEs, and they understand how the data is operationally used in day-to-day processes. (Data Owner, Data Steward, Analytics Manager, Data Analyst)
Data Craftsmen: The ones from the IT side who are modeling the data. (Data Architect, Data Modeler, Data Engineer, Data Scientist)
Data Prosumers: Anyone in the organization who is not in the above 3 categories. They want to use the data for business purposes. (Data Manager, Marketing, Sales, Finance, Business Analyst)
Now, the first three categories make up about 10% of the organization, and largely must lead the journey to becoming data-centric. The other 90% is probably not yet data-centric, but is eager to be able to understand the data and make use of it. They will also be the ones to define the real use cases of data that add true business value.
Make no mistake about it, everyone in the organization is essential if it is to become data-centric as a whole. There is no point undertaking the whole process if it does not deliver what the end business users really expect out of the system.
The journey requires a top-down and bottom-up approach simultaneously. Obviously the business leaders of the organization need to put the initiative forward or, if it is not coming directly from the top, you at least need their sponsorship. The leaders of the initiative need to understand the value of becoming data-centric, even if few others in the organization do. At the same time, those leading the charge need to get input from those on the frontlines, the data prosumers, on the use cases required at their level, or you’ll be building a system that will not be adopted. And everyone together needs to be coached on the various sources and connections of data in the data organization, and their responsibility in maintaining its sanity.
Data-centricity will ultimately be implemented via a tool of some kind. While what we have covered here is about the elements of a data-centric organization to give an idea of what that looks like, data governance is much broader than just this. It consists of a lot of policies and procedures, and all of this needs to be put in place before the tool.
A tool won’t be able to do anything by itself. You need to identify the benefit for day-to-day business and only then go about addressing the two axes of data governance.
The crowd-sourcing of data really needs to be the goal. The really rich data that is valuable to the business is coming from those in the field. Everyone needs to be empowered to enter the data system from an angle. They should be able to enter information and see the rest of the information in the system, whether they are a reporting analyst, business user, IT architect, C-suite executive, and so on.
After this scintillating masterclass (delivered in under 20 minutes, mind you), we got some great questions from the audience which led to more great insights into data governance. We’re putting the best ones here for you.
Q: For a company that’s completely new with data, should the management be first focused on people or processes?
A: First and foremost, they should focus on their strategy. They need to define why they want to be data-driven. Do they want to increase revenue by selling the data? Do they want more insight on their marketing or financial decisions?
Once they know the WHY, they then need to define the roles they will need in the organization in order to be successful, and who will play those roles. Many orgs miss this step. It’s not just about naming a data steward who will have extra responsibilities on top of their existing work.
Then they need to define the processes they want to apply.
Q: How often should we update our data catalog?
A: It’s a day-to-day routine that should be embedded in your people’s work.
Of course, you won’t be adding new tables and columns every day; you’ll have a refresh schedule for that.
But the most critical part of implementing your data catalog is actually not the first initiative to load data, as many people might think it is, but rather ensuring well-planned maintenance of your content. For example, if IT is migrating to a new system, then updating your data catalog should be a part of that process. Or if the business team makes a new screen in their ERP, they need to update the data catalog. If this discipline is not instilled in your organization, then six months down the line you’ll have an outdated data catalog and you’ll have to start all over. So maintaining your data catalog really needs to be a part of your day-to-day processes.
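One way to picture that maintenance discipline is a simple drift check run as part of any migration or release. This is only a sketch under assumed inputs: both dictionaries map a table name to its set of column names, and the `catalog_drift` helper and the sample tables are hypothetical, not something described in the masterclass.

```python
def catalog_drift(cataloged, actual):
    """Compare documented columns with live columns, table by table.

    Both arguments are assumed to map a table name to a set of column names.
    Returns only the tables where the catalog and the live system disagree.
    """
    drift = {}
    for table in sorted(cataloged.keys() | actual.keys()):
        documented = cataloged.get(table, set())
        live = actual.get(table, set())
        undocumented = live - documented  # exists in the system, missing from the catalog
        stale = documented - live         # still in the catalog, gone from the system
        if undocumented or stale:
            drift[table] = {"undocumented": undocumented, "stale": stale}
    return drift


# Hypothetical snapshot taken after a migration added a column and a table.
cataloged = {"orders": {"id", "amount"}, "customers": {"id", "name"}}
actual = {
    "orders": {"id", "amount", "currency"},
    "customers": {"id", "name"},
    "invoices": {"id"},
}

for table, gaps in catalog_drift(cataloged, actual).items():
    print(table, gaps)
```

A check like this does not replace the habit of updating the catalog, but it can flag the gaps before they pile up for six months.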
Q: Is it possible to design a data catalog which looks at only one aspect of data governance, such as security, quality, etc.?
A: Yes, you can certainly focus on just one aspect. But to get the full benefit of a data catalog, you need to document every aspect of data and establish the links between them.
Q: How do you convince people who have never paid attention to data to start paying attention?
A: If you have to ask “How can you ensure the tool will be used?” then you are not at the right moment as an organization to take this journey. If the organization doesn’t understand the benefit of governing their data, then they will never be able to get sponsorship or adoption.
Data governance is highly conceptual, and many times people see it as simply another policy being forced on them.
When an organization is ready to take charge of its own data, I like to start with a concrete example and hard facts. For example, if they have a BI report, I ask them if they can tell me the meaning of every single column in it and where the information for these columns is coming from. This initiates the process of understanding where the data is in their systems. Inevitably it will involve identifying a couple of departments and lead back to the root of the data in IT. So this process of reflection starts breaking silos, helping define how the departments interact with each other, and what templates and processes need to be put in place. This is really a great way to get everyone’s buy-in.