Can your data model support your Next Generation Data Analytics?
Part 1: Modern Data Platform
As I started writing this piece, it grew longer than planned because the question really involves two parts, neither of which can be adequately addressed in a short-form narrative. As a result, it became two related blog posts. This post addresses the modern data platform. The second, which is to come shortly, addresses the characteristics of the data itself.
One may ask why we chose to address the data platform ahead of the data characteristics. This order appears to run contrary to what project management and business analysis methodologies guide the industry to do. We did this for one simple reason. When a hardware or software vendor comes to do a demo or a presentation, the conversation naturally starts from the tool and expands to the business case. Starting from a hardware or software demo almost automatically precludes an investigation of the data held within an organization and instead frames the conversation around what data the tool can manage. This becomes especially important when an organization is trying to leverage an investment in a current toolset before purchasing another piece of software, hardware, or cloud subscription. With that caveat, Part 1 starts with the Modern Data Platform.
The Modern Data Platform
As we create data models for our clients, the conversation inevitably turns to technology, and quickly comes to involve the toolsets hosting and managing the data. Often the technology question arrives as a statement like “How would this work with X database?” At Xtensible, we address the data platform question through an overarching semantic approach and methodology. Our semantic data model feeds into multiple conceptual and physical model representations, so our data modeling strategy supports many different technologies. With this philosophy in mind, Xtensible views data management tools as part of a complete data platform. Our data platform view includes the software and hardware managing data in motion as well as data at rest.
What should a data platform do?
In a post from 2017, DBTA asks the question “What is a Modern Data Platform?” The caution from this article was that many “data platforms” are rebranded point solutions that manage only one piece of the data puzzle: they can manage a subset of the data or fulfill a specific function such as replication, while leaving out other functions such as data governance. According to the DBTA article, a data platform has six key capabilities:
- Open data access
- Virtual data consolidation
- Deep adaptable data indexing through metadata
- Comprehensive data security
- Lifecycle data services
- Data value delivery
It is worth taking the time to review this article to get a better understanding of this perspective.
Xtensible client engagements have consistently shown that it is virtually impossible for one technology to fulfill the full role of the data platform. At a minimum, the data platform needs to govern and manage data in motion as well as data at rest. One could argue that a data appliance running a database could use SQL statements and triggers to manage data movement, but this limited argument loses the bigger picture of anything outside that appliance. The other argument is that a “cloud platform” can support the full role of the data platform, which is absolutely correct. However, a cloud offering is not one technology. It consists of multiple technology offerings bundled onto one platform as a SaaS or PaaS collection, with IaaS available for custom software that is not offered on that specific cloud. That raises the next question.
What is a data platform?
Quite simply, Xtensible has found that anything which participates in the operational process of managing and consuming data is part of the data platform. This can include the applications or services that act as sources of information and data transport mechanisms such as extract, transform, and load (ETL) tools or an enterprise service bus (ESB). It also includes data-at-rest storage technologies such as a relational database or a NoSQL data store. The most visible parts of the platform are the data warehouse, the external data services, and the reporting dashboards. The platform also includes the technologies behind the other functions it must support, such as semantic reconciliation, data governance, metadata management, and source-of-data responsibilities. The data platform must be able to take data from the identified sources, manage the data, apply the necessary rules, and serve the information securely to authorized consumers as a completely reconciled information object. While a technology may participate in the creation of a data platform, it can be easily demonstrated that no single technology can fully take on the role of a data platform. The demands placed on a data platform are too great. Just as DBTA pointed out, some market offerings which claim to offer a full platform solution may just be rebranded partial solutions.
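The flow described above (take data from sources, reconcile it semantically, apply rules, and serve it only to authorized consumers) can be illustrated with a small sketch. This is a toy model under stated assumptions, not an Xtensible implementation: the class names, the field names, and the source systems "billing" and "scada" are all hypothetical, and a single in-memory class stands in for what would in practice be several cooperating technologies.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Set, Tuple

# All names here are hypothetical and exist only for illustration.


@dataclass
class Record:
    source: str    # system that produced the record
    payload: dict  # field name -> value, in the source's own vocabulary


@dataclass
class DataPlatform:
    # Maps (source, source field name) to the shared semantic field name.
    semantic_map: Dict[Tuple[str, str], str]
    # Rules applied to each reconciled record; each returns True on pass.
    rules: List[Callable[[dict], bool]] = field(default_factory=list)
    # Consumers that are allowed to read reconciled data.
    authorized: Set[str] = field(default_factory=set)
    # Stand-in for data at rest.
    store: List[dict] = field(default_factory=list)

    def ingest(self, record: Record) -> bool:
        """Reconcile field names, apply the rules, keep the record if it passes."""
        reconciled = {
            self.semantic_map.get((record.source, name), name): value
            for name, value in record.payload.items()
        }
        reconciled["_source"] = record.source  # keep lineage for governance
        if all(rule(reconciled) for rule in self.rules):
            self.store.append(reconciled)
            return True
        return False

    def serve(self, consumer: str) -> List[dict]:
        """Return copies of the reconciled records to an authorized consumer."""
        if consumer not in self.authorized:
            raise PermissionError(f"{consumer} is not an authorized consumer")
        return [dict(r) for r in self.store]


# Two sources name the same concept differently; the semantic map reconciles them.
platform = DataPlatform(
    semantic_map={
        ("billing", "mtr_id"): "meter_id",
        ("scada", "MeterID"): "meter_id",
    },
    rules=[lambda r: r.get("meter_id") is not None],
    authorized={"reporting_dashboard"},
)
platform.ingest(Record("billing", {"mtr_id": "M-1", "kwh": 12.5}))
platform.ingest(Record("scada", {"MeterID": "M-1", "voltage": 240}))
rows = platform.serve("reporting_dashboard")
```

Even in this toy form, ingestion, reconciliation, rule enforcement, storage, and access control are distinct responsibilities, which is why they tend to be spread across more than one technology in a real platform.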
What does this have to do with data modeling?
An enterprise data model, such as one based on the IEC CIM, will greatly help the data platform fulfill its tasks of semantic reconciliation, sourcing, and data governance, as well as the application of data transformation rules and metadata enrichment. This is how the data model will support your data platform and help guide your technology choices. For this exercise we will use the classic three V dimensions of big data (volume, velocity, and variety) along with a fourth V dimension of value. While these V dimensions typically describe the characteristics of Big Data technologies, in this scenario they will be used as the basis for technology use in the data platform.
In Part 2 of this blog series, the focus will be on considerations for processing data in terms of volume, velocity, variety, and value, and on the core elements one should consider. One of Xtensible's clients generates nearly 10 terabytes of data a week through its analytics generation; using that client as an example, Part 2 will offer insight into what one needs to consider when moving into the “Big Data” space.