Saturday, March 22, 2008

Canonical Data Model is the incarnation of Loose Coupling

Quote from a reader of my blog with regard to the Canonical Data Model:

The main issue I have is that someone has to come up with a data model that includes the information required by everyone - a superset - rather than a subset of what a point-to-point connection requires. It seems to me that this is very difficult to achieve, from a design point of view - capture everything - to a governance point of view - who is going to own this and define what an object is - to a technical point of view - very complex objects, different versions etc.

The "superset" he is talking about is merely a metamodel of the data that point-to-point connections would require. The canonical data model is a federated collection of local metamodels including the definition of the common semantics and the format transformation rules. It need not be "more" than you need and it does not contain any stored application data.

To enable loose coupling, a layer of indirection is defined in terms of a global data space, a canonical data model and canonical messages. This enables the mapping of semantics and the transformation of formats between mutually unknown (decoupled) endpoints.

A good way to understand the mechanism is to view the canonical messages as the formally defined carriers of specific information throughout the enterprise. Data providers (sending endpoints) fill the appropriate canonical message using the metadata defined in the canonical data model. Data consumers (receiving endpoints) consume the data from this canonical message, also using the metadata defined in the canonical data model. In this way the endpoints don't need any knowledge of each other.
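
A minimal sketch of both sides, continuing the hypothetical CustomerMoved example above: each rule refers only to its own native layout and the canonical field names, never to the other endpoint.

    # Provider-side rule: load the CRM's native record into the canonical carrier.
    def crm_to_canonical(crm_record: dict) -> dict:
        return {
            "customer_id": crm_record["cust_no"],
            "street": crm_record["addr1"],
            "city": crm_record["town"],
            "postal_code": crm_record["zip"],
        }

    # Consumer-side rule: unload the canonical carrier into billing's own format.
    def canonical_to_billing(msg: dict) -> dict:
        return {
            "accountId": msg["customer_id"],
            "address": f'{msg["street"]}, {msg["postal_code"]} {msg["city"]}',
        }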

The endpoints don't even need to know the canonical data model. Services delivered by the infrastructure (the global data space), which has knowledge of the canonical data model, take care of loading the data delivered by an endpoint into the appropriate canonical message (carrier) and of unloading the data from that canonical message to be consumed by the receiving endpoint. The endpoints only use their own formats and are totally decoupled.
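
A sketch of such an infrastructure service, again with hypothetical names: it owns the registered load and unload rules and performs both transformations, so an endpoint only ever hands over, or receives, its own format.

    def deliver(endpoint: str, payload: dict) -> None:
        print(f"-> {endpoint}: {payload}")  # stand-in for the real transport

    class GlobalDataSpace:
        """Mediation layer: the only party that knows the canonical model."""

        def __init__(self) -> None:
            self.loaders = {}      # (endpoint, msg_type) -> native-to-canonical rule
            self.unloaders = {}    # (endpoint, msg_type) -> canonical-to-native rule
            self.subscribers = {}  # msg_type -> list of consuming endpoints

        def publish(self, endpoint: str, msg_type: str, native_data: dict) -> None:
            # Load the provider's native data into the canonical carrier...
            canonical = self.loaders[(endpoint, msg_type)](native_data)
            # ...and unload it into each consumer's own format.
            for consumer in self.subscribers.get(msg_type, []):
                deliver(consumer, self.unloaders[(consumer, msg_type)](canonical))

    gds = GlobalDataSpace()
    gds.loaders[("crm", "CustomerMoved")] = crm_to_canonical
    gds.unloaders[("billing", "CustomerMoved")] = canonical_to_billing
    gds.subscribers["CustomerMoved"] = ["billing"]
    gds.publish("crm", "CustomerMoved",
                {"cust_no": "C42", "addr1": "Main St 1", "town": "Delft", "zip": "2611 AA"})

The CRM publishes its native record unchanged and billing receives only its own layout; neither endpoint ever touches the canonical message itself.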

You might recognize that in fact - from a software architecture perspective - the canonical data model is the incarnation of loose coupling.


Indeed it is true that this addresses a governance aspect that is not strongly represented in most IT organizations today. If you want to reach the next level of IT maturity based on the ideas of SOA and EDA, it is a prerequisite to extend your governance to formal semantics and format definitions as well.

Conclusion


The idea of the canonical data model is to define the semantics and formats from the local endpoint perspectives. To be able to map the endpoint interfaces in a loosely coupled context (where endpoints do not know each other), an intermediate mediation layer needs to be in place. The canonical data model is the underpinning facility that allows for the mapping of the distinct local semantics and the transformation of the distinct local formats between decoupled and independent endpoints.

So yes, it is true that maturing your software architecture requires maturing your governance: loose coupling comes at the price of tighter governance. On the other hand, emerging SOA governance tools are coming to help.

2 comments:

Anonymous said...

Hi Jack

Happy Easter!

I hope I understand you correctly: a data provider sends its data in its own format. A data consumer receives this message, which is converted to the canonical data model, possibly based on the message type, and then transformed into the consumer's own format. All of this happens within the "global data space" layer. (You would probably merge the transformation rules instead of performing two transformations.)
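
(A sketch of what I mean by merging, reusing the hypothetical rules from your example: compose the two steps once and deploy a single direct transformation.)

    # Compose provider-to-canonical and canonical-to-consumer into one rule,
    # so only a single transformation runs per message at runtime.
    def merge(load, unload):
        return lambda native: unload(load(native))

    # e.g. crm_to_billing = merge(crm_to_canonical, canonical_to_billing)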

If I am correct so far - please interrupt at any time ;-) - then both endpoints are completely decoupled.

Let's assume that a new data consumer needs an additional piece of information, a piece of data which the data provider can supply. Wouldn't that mean that I have to change the transformation rules for both endpoints, because the canonical data model gets an additional field?

From a deployment view, the whole "global data space" layer would become an atomic unit: a piece that can only be deployed as a whole. Is that a good idea when we are talking about a major backbone of the corporate IT environment?

Cheers
Jan

Jack van Hoof said...

@Jan,

See my answer

-Jack