What is Data Architecture?
Data or information architecture is the second phase of activities in most Enterprise Architecture frameworks and the basis of many of the technical activities that happen in the EA lifecycle. This post covers the primary activities and tasks that must be undertaken to provide a world-class EA plan for a solution or strategy. Data or information architecture captures where your data assets are, what you or your systems are going to do with them, how they will be secured, integrated, governed, classified and reported on, and what their lifespan will be, along with several other factors. This set of deterministic activities happens only AFTER the business architecture is clearly defined. Data architecture should be an established practice in a large organization and, at a minimum, a standards-guided set of activities in a smaller organization.
Why should I care?
Robust, intelligent and planful data architecture is the key to a stable, secure and value-driven solution in today's enterprise. Think back to all of the systems, solutions and products you have worked on that didn't quite have all of the data that was needed. Remember those systems where you would be midway through a process and would have to stop and import a bunch of .csv files or data from somewhere else? Remember how challenging it was to integrate between systems? Think of the myriad issues surrounding data governance. Who owns which data elements and governs who can see them, use them and modify them? Who really owns the security and risk mitigation of the data elements? How can the information and data in your enterprise be collated with other industry data on a large scale? Never mind all of the hype and misnomers in the industry right now around big data and data science; data architecture plans for those disciplines and sets them up to add value with minimal barriers to entry. In the modern enterprise application ecosystem, more and more enterprises are dealing with hybrid-cloud environments where not all of your data assets are on premises. How do you control, appropriately secure and report on them?
What are the main components of a data/information architecture plan?
Data Governance - Data / information governance is the practice of understanding and classifying data elements, enforcing who can see them, logging their usage and knowing who the steward of each one is. If a proper data governance practice is in place, business stakeholders feel "in control" of their data assets, are empowered to use them for competitive advantage and have confidence in their value and meaning. It is critical that all data assets have clear ownership and a clear process for granting other systems, people, processes and reports access to them. Lack of data governance typically results in data integrity, validity, quality and other problems. Even a simple process is better than no process.
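To make "even a simple process" concrete, here is a minimal sketch of a governance registry that records a steward and a classification for each data element, checks who may read it and logs every access attempt. The class names, roles and the "customer.email" element are all hypothetical, made up for illustration; a real governance practice would sit on top of your identity and catalog tooling.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class DataElementPolicy:
    """Governance record for one data element (all names are illustrative)."""
    element: str
    steward: str                                 # team accountable for the element
    classification: str                          # e.g. "public", "internal", "restricted"
    readers: set = field(default_factory=set)    # roles allowed to read the element


class GovernanceRegistry:
    def __init__(self):
        self._policies = {}
        self.audit_log = []                      # every access attempt is recorded here

    def register(self, policy):
        self._policies[policy.element] = policy

    def steward_of(self, element):
        return self._policies[element].steward

    def can_read(self, role, element):
        # Unknown elements are denied by default rather than silently allowed.
        policy = self._policies.get(element)
        allowed = policy is not None and role in policy.readers
        self.audit_log.append({
            "at": datetime.now(timezone.utc).isoformat(),
            "role": role,
            "element": element,
            "allowed": allowed,
        })
        return allowed


registry = GovernanceRegistry()
registry.register(DataElementPolicy(
    element="customer.email",
    steward="Customer Operations",
    classification="restricted",
    readers={"support_agent", "billing"},
))

print(registry.can_read("support_agent", "customer.email"))     # True
print(registry.can_read("marketing_intern", "customer.email"))  # False
print(len(registry.audit_log))                                  # 2
```

The point of the sketch is the shape of the record, not the mechanism: every element has exactly one steward, access is denied unless explicitly granted, and usage is logged whether or not it was allowed.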
Integration - A plan describing which data elements can be integrated with and by what means they will be consumed and published. A robust integration strategy defines how and where integration will be made available. In a "cloud to ground" and "ground to cloud" data integration scenario, the old integration tools, methodologies and established practices may not apply. This is perhaps one of the most critical elements of a data architecture. Publicly available APIs that allow "the public" to integrate with your digital assets are becoming more and more important for modern companies, because they enable 3rd party vendors, suppliers and eventually other businesses and aggregators to leverage your assets and products. Integration is typically where the majority of pain occurs over the lifespan of a system or data asset. Make sure you have a clear, concise plan and architecture that securely enables easy integration.
Data Models - The artifacts that describe how your data is related, where it is sourced and how it is stored across schemas, integrations and endpoints. Some detailed data models describe the canonical relationships between data elements. For example, which data elements comprise an address? Of course the easy response is street, city, state, zip code, etc. Well, what happens when you have a customer or supplier address in a country that uses only PO boxes and does not have a zip or postal code? Your data model must accommodate this, or at least not preclude you from easily accommodating this scenario. A data model must also have a process for keeping it up-to-date. One of the worst practices seen in data architecture is not keeping these artifacts versioned and updated. Put some scheme in place for regularly reviewing these artifacts and keeping them fresh as part of your change and/or release processes.
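The address example can be sketched as a canonical model in which the street line, region and postal code are optional, so a PO-box-only address is valid rather than an exception. The field names and the "XX" country code below are assumptions chosen for illustration, not a standard.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Address:
    """Canonical address; only the fields every address has are required."""
    country_code: str
    locality: str                        # city, town or village
    street_line: Optional[str] = None
    po_box: Optional[str] = None
    region: Optional[str] = None         # state, province, etc.
    postal_code: Optional[str] = None    # not every country uses one

    def validate(self):
        """Return a list of problems; an empty list means the address is usable."""
        errors = []
        if not self.country_code:
            errors.append("country_code is required")
        if not self.locality:
            errors.append("locality is required")
        if not (self.street_line or self.po_box):
            errors.append("either street_line or po_box is required")
        return errors


street_address = Address(
    country_code="US", locality="Austin", street_line="100 Main St",
    region="TX", postal_code="78701",
)
# Hypothetical country "XX" that uses PO boxes and has no postal code.
po_box_only = Address(country_code="XX", locality="Harbor Town", po_box="PO Box 42")

print(street_address.validate())  # []
print(po_box_only.validate())     # []
```

Making the postal code optional in the model, and enforcing country-specific rules in validation instead, is the design choice that keeps the model from precluding the PO-box scenario.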
Data Quality - What is your plan and methodology for ensuring that your data is accurate and validated, and stays that way in perpetuity? What are your points of data validation? Is your data validated as it is entered, whether by an end user, employee or customer? Do you regularly check where and how you validate your data? What are your best practices for data quality and do you need any tools to ensure your data quality? How mature is your practice? Are you scoring the quality and accuracy of your data? Ensuring that your data / information is accurate, in context and free of anomalies is critical to the integrity of reporting, analytics and essentially every transaction. In my experience, I have seen executives lose trust in reporting because of "glitches" or other problems, to the point where they missed huge opportunities because of their lack of confidence in the reports. Data quality is a must.
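As a rough illustration of validating data at the point of entry and scoring its quality, the sketch below runs a small set of per-field rules against each record and reports the fraction of checks that pass. The field names, the rules and the deliberately loose email pattern are all assumptions for the example; real rules come from your own data model.

```python
import re

# Deliberately loose pattern: something@something.something
EMAIL_PATTERN = re.compile(r"^[^@\s]+@[^@\s]+\.[^@\s]+$")

# One rule per field; each rule returns True when the value is acceptable.
RULES = {
    "customer_id": lambda v: isinstance(v, int) and v > 0,
    "email": lambda v: isinstance(v, str) and EMAIL_PATTERN.match(v) is not None,
    "country_code": lambda v: isinstance(v, str) and len(v) == 2 and v.isalpha(),
}


def validate_record(record):
    """Return the names of the fields that fail their rule (missing fields fail)."""
    return [name for name, rule in RULES.items() if not rule(record.get(name))]


def quality_score(records):
    """Fraction of field checks that pass across all records, from 0.0 to 1.0."""
    total_checks = len(records) * len(RULES)
    if total_checks == 0:
        return 1.0
    failures = sum(len(validate_record(r)) for r in records)
    return (total_checks - failures) / total_checks


records = [
    {"customer_id": 1, "email": "ana@example.com", "country_code": "US"},
    {"customer_id": 2, "email": "not-an-email", "country_code": "USA"},
]

print(validate_record(records[1]))  # ['email', 'country_code']
print(quality_score(records))
```

The same `validate_record` function can be called when a record is entered and again in a scheduled check, which gives you both a validation point and a score you can track over time.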
Data Retention - How long does your data stay with your organization? Some kinds of data are subject to lifespan requirements and must be deleted after a certain period of time, or have portions removed or permanently obfuscated. Data elements and data sets that need to be redacted, deleted or scrubbed after certain periods of time must be called out and defined, and the plan must show how each of them will be dealt with.
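A retention plan of this kind can be written down as data and applied mechanically. The sketch below is one possible shape, with made-up categories, retention periods and field names; actual periods are dictated by your regulatory and business requirements, not by this example.

```python
from datetime import date, timedelta

# Hypothetical retention rules: how long to keep each category and what to do after.
RETENTION_RULES = {
    "web_session": {"keep_days": 90, "action": "delete"},
    "customer_profile": {
        "keep_days": 365 * 7,
        "action": "obfuscate",
        "fields": ["name", "email"],
    },
}


def apply_retention(records, today):
    """Return the records that survive retention, with expired fields scrubbed."""
    kept = []
    for record in records:
        rule = RETENTION_RULES.get(record["category"])
        if rule is None:
            kept.append(record)          # no rule defined: keep unchanged
            continue
        expired = today - record["created"] > timedelta(days=rule["keep_days"])
        if not expired:
            kept.append(record)
        elif rule["action"] == "obfuscate":
            scrubbed = dict(record)      # copy so the input is not mutated
            for field_name in rule["fields"]:
                if field_name in scrubbed:
                    scrubbed[field_name] = "REDACTED"
            kept.append(scrubbed)
        # action == "delete": the record is dropped by not appending it
    return kept


records = [
    {"category": "web_session", "created": date(2023, 6, 1), "token": "abc"},
    {"category": "customer_profile", "created": date(2010, 1, 1),
     "name": "Ana", "email": "ana@example.com"},
    {"category": "customer_profile", "created": date(2023, 1, 1),
     "name": "Ben", "email": "ben@example.com"},
]

result = apply_retention(records, today=date(2024, 1, 1))
print(len(result))         # 2
print(result[0]["email"])  # REDACTED
```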
Data Security & Compliance - This may be the most important and scrutinized section of your data architecture. How will your data assets, integration assets and reporting assets be secured at rest, in transit and in memory? Much of the data architecture security depends on platforms and solutions that involve other phases of enterprise architecture [business, solution and technical]. Your data architecture should include your plan and strategy for encryption, key management and other mechanisms that enable securing your data and integration assets. This component of data architecture deserves its own detailed section.
Data Warehousing - This is the plan for correlating and storing data from various systems in one central warehouse or cube for reporting and analytics purposes. This strategy will describe the tools, processes, patterns and methodologies used to make sure transactional data and information make it into an analytical data store where they can be analyzed and refined for seamless reporting.
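As a toy illustration of moving transactional data into an analytical store, the sketch below loads rows from two hypothetical source systems into a single fact table and runs a reporting query against it. It uses Python's built-in sqlite3 module with an in-memory database purely so the example is self-contained; the table, column and channel names are assumptions, and a real warehouse would use whatever platform your requirements call for.

```python
import sqlite3

# Hypothetical rows extracted from two transactional systems:
# (sale_date, product, quantity, unit_price)
web_orders = [("2024-01-05", "widget", 2, 9.99), ("2024-01-05", "gadget", 1, 24.50)]
store_orders = [("2024-01-05", "widget", 3, 9.99), ("2024-01-06", "widget", 1, 9.99)]

warehouse = sqlite3.connect(":memory:")
warehouse.execute("""
    CREATE TABLE fact_sales (
        sale_date  TEXT,
        product    TEXT,
        channel    TEXT,
        quantity   INTEGER,
        unit_price REAL
    )
""")


def load(rows, channel):
    """Tag each row with its source channel and load it into the fact table."""
    warehouse.executemany(
        "INSERT INTO fact_sales VALUES (?, ?, ?, ?, ?)",
        [(sale_date, product, channel, quantity, unit_price)
         for sale_date, product, quantity, unit_price in rows],
    )


load(web_orders, "web")
load(store_orders, "store")
warehouse.commit()

# Reporting query: units sold per day and product across all channels.
daily_units = warehouse.execute("""
    SELECT sale_date, product, SUM(quantity)
    FROM fact_sales
    GROUP BY sale_date, product
    ORDER BY sale_date, product
""").fetchall()

for row in daily_units:
    print(row)
```

Once both systems land in the same table with a shared shape, a question that spans them (units per product per day, regardless of channel) becomes a single query instead of a manual merge.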
Reporting - Your data architecture must describe what tools, methodologies and processes will be used to report on the business activities, outcomes and key performance indicators that matter to the business.
Metadata - In the simplest of terms, metadata is "data about data". In other words, it is the descriptive elements, or "more context", about your data. For example, if an image is your data, the metadata would be the subject of the image, the date it was taken, where it was taken and who holds the copyright on it. Your data architecture should describe how you plan to capture, store and correlate metadata.
Data Dictionaries - Data dictionaries are closely related to metadata, data warehousing and reporting. Data dictionaries hold the metadata and semantic information about the different data elements, reporting and analytic capabilities and the methods used to manipulate the data and information. Your data architecture plan must account for a data dictionary capability.
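A data dictionary capability does not have to start as a product purchase. The sketch below shows one minimal shape for it: an entry per data element that holds its description, type, source, steward and derivation, plus lookup and search. Every name in it (the entry fields, the "CRM" and "Warehouse" systems, the example elements) is hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictionaryEntry:
    """Metadata and semantics for one data element (fields are illustrative)."""
    name: str
    description: str
    data_type: str
    source_system: str
    steward: str
    derivation: str = ""    # how the value is calculated, if it is derived


class DataDictionary:
    def __init__(self):
        self._entries = {}

    def add(self, entry):
        if entry.name in self._entries:
            raise ValueError(f"duplicate entry: {entry.name}")
        self._entries[entry.name] = entry

    def lookup(self, name):
        return self._entries[name]

    def search(self, term):
        """Return names of entries whose name or description mentions the term."""
        term = term.lower()
        return sorted(
            entry.name for entry in self._entries.values()
            if term in entry.name.lower() or term in entry.description.lower()
        )


dictionary = DataDictionary()
dictionary.add(DictionaryEntry(
    name="customer.email",
    description="Primary contact email for a customer",
    data_type="string",
    source_system="CRM",
    steward="Customer Operations",
))
dictionary.add(DictionaryEntry(
    name="sales.monthly_revenue",
    description="Total invoiced revenue per calendar month",
    data_type="decimal",
    source_system="Warehouse",
    steward="Finance",
    derivation="SUM(quantity * unit_price) grouped by month",
))

print(dictionary.search("customer"))                        # ['customer.email']
print(dictionary.lookup("sales.monthly_revenue").steward)   # Finance
```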
Data Platforms - This portion of your data architecture plan should describe the requirements of your data platforms, not the actual platforms, technologies, tools or vendors. It should also include an inventory of existing data platforms and their capabilities, as well as any business drivers that may call for additional or different platform capabilities.