A reader commented on my posting "Canonical Data Model is the incarnation of Loose Coupling". Let me walk through the comment:
"I hope I understand you: A data provider sends its data in its own format." Yes, that is correct.
"A data consumer receives this message, converts it to a canonical data model, possibly based on the message type, and then transforms it to its own format." No, that is not correct. The message is converted to a canonical format by a generic transformation service. This service queries the canonical data model to get the transformation rules. The canonical message is published for consumption by any interested endpoint. Before consumption by an endpoint, another generic service converts the message from the canonical format to the endpoint's format. So the endpoint consumes the message in its own format.
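The two generic services can be sketched as follows. This is a minimal illustration, assuming dict-based messages; all names (message type, endpoint names, field names) are hypothetical, and the canonical data model is reduced to a lookup table of field mappings:

```python
# Canonical data model (sketch): per message type, the field mappings
# between an endpoint's native format and the canonical format.
CANONICAL_MODEL = {
    "OrderCreated": {
        "providerA": {"ord_no": "orderId", "cust": "customerId"},
        "consumerX": {"orderId": "order_number", "customerId": "customer"},
    }
}

def to_canonical(message, msg_type, source):
    """Generic service near the provider: native format -> canonical format."""
    rules = CANONICAL_MODEL[msg_type][source]
    return {canonical: message[native] for native, canonical in rules.items()}

def from_canonical(message, msg_type, target):
    """Generic service near the consumer: canonical format -> native format."""
    rules = CANONICAL_MODEL[msg_type][target]
    return {native: message[canonical] for canonical, native in rules.items()}

provider_msg = {"ord_no": 42, "cust": "C-7"}
canonical = to_canonical(provider_msg, "OrderCreated", "providerA")
consumer_msg = from_canonical(canonical, "OrderCreated", "consumerX")
# canonical:    {"orderId": 42, "customerId": "C-7"}
# consumer_msg: {"order_number": 42, "customer": "C-7"}
```

Note that neither endpoint knows the other's format; each transformation references only the canonical data model.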
"All of this is happening within the 'global data space' layer." Yes.
"(You probably would merge the transformation rules, instead of performing two transformations.)" No. The messages are converted near the endpoints; there will always be an intermediate canonical instance of the message traveling across the global data space. This keeps the mechanism simple. With multiple data providers and/or multiple data consumers, merged transformation rules would lead to a multiplicatively increasing number of transformations (one per provider/consumer pair), and multiple instances of the message, in different formats, would travel across the global data space. See the picture below.
The picture shows one message type that is provided by two different sources and consumed by four targets. The left-hand side shows direct transformations, whereas the right-hand side shows an intermediate canonical message instance.
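The arithmetic behind the picture can be stated explicitly. With S sources and T targets of one message type, direct (merged) transformations need one mapping per source/target pair, while the canonical approach needs one mapping per endpoint:

```python
def direct_transformations(sources, targets):
    # Every source format must be mapped directly to every target format.
    return sources * targets

def canonical_transformations(sources, targets):
    # One mapping into the canonical format per source,
    # one mapping out of it per target.
    return sources + targets

# The scenario in the picture: 2 sources, 4 targets.
assert direct_transformations(2, 4) == 8
assert canonical_transformations(2, 4) == 6
```

The gap widens quickly as endpoints are added: with 10 sources and 10 targets, the direct approach needs 100 transformations versus 20 for the canonical one.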
"If I am correct so far - please interrupt at any time ;-) - then both endpoints are completely decoupled."
"Let's assume that a new data consumer needs an additional piece of information, a piece of data which can be provided by the data provider. Wouldn't that mean that I have to change the transformation rules for both endpoints, because the canonical data model gets an additional field?" Yes, if the new data was not foreseen at design time of the canonical message, you will have to extend the transformation rules in the canonical data model AND have the provider deliver the new data. But if the data had been available, it would have been wise to model it into the canonical message, even if it was not required at that moment.
If the data is not available, you might add a new service that enriches the original message. This is the Enrich step of the pattern known as VETO (Validate, Enrich, Transform, Operate).
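A minimal sketch of such an enrichment service, assuming the canonical message is a dict; the field name, customer id, and the lookup function standing in for a reference-data service are all hypothetical:

```python
def lookup_credit_limit(customer_id):
    # Stand-in for a call to a reference-data service that holds
    # the data the provider does not deliver.
    return {"C-7": 10_000}.get(customer_id)

def enrich(canonical_message):
    """Return a copy of the canonical message with the extra field added,
    so existing consumers remain untouched."""
    enriched = dict(canonical_message)
    enriched["creditLimit"] = lookup_credit_limit(enriched["customerId"])
    return enriched

msg = {"orderId": 42, "customerId": "C-7"}
# enrich(msg) -> {"orderId": 42, "customerId": "C-7", "creditLimit": 10000}
```

The enricher sits in the global data space between the provider-side and consumer-side transformations, so neither the provider nor the existing consumers need to change.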
By modeling the canonical messages from an event-driven perspective - messages representing relevant business events - rather than from a "currently required data" perspective, you might decrease the need for change.
"From a deployment view the whole 'global data space' layer would become an atomic unit: a piece that can only be deployed in one piece. Is that a good idea when talking about a major backbone in a corporation's IT environment?" No, not quite. You should think of federated infrastructures for the global data space as well as for the canonical data model.
Domains need only know their own formats and semantics plus the canonical formats and semantics, not those of other domains. Relevant canonical formats and semantic definitions could be pushed to the domains in a federated model.
If you don't have a federated bus infrastructure, messages can still be propagated across multiple bus implementations, as depicted below.
A service subscribes to a published message in Bus 1 and calls (synchronously) a service in Bus 2 to pass the message reliably. The called service republishes the message in Bus 2. This is a simple method to pass published messages across multiple independent service bus infrastructures that are unaware of each other, yet are part of one global data space.
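The bridge described above can be sketched in a few lines. This is a toy model, not a real ESB API: the `Bus` class with `subscribe`/`publish` is a hypothetical stand-in for two independent publish/subscribe infrastructures:

```python
class Bus:
    """Toy publish/subscribe bus: delivers each published message
    to every registered subscriber."""
    def __init__(self):
        self.subscribers = []

    def subscribe(self, handler):
        self.subscribers.append(handler)

    def publish(self, message):
        for handler in self.subscribers:
            handler(message)

bus1, bus2 = Bus(), Bus()

# The bridge: a Bus 1 subscriber that synchronously calls into Bus 2,
# so the hand-over succeeds or fails within the same interaction.
bus1.subscribe(lambda message: bus2.publish(message))

received = []
bus2.subscribe(received.append)

bus1.publish({"type": "OrderCreated", "orderId": 42})
# received: [{"type": "OrderCreated", "orderId": 42}]
```

Because the bridge is just another subscriber on Bus 1 and another publisher on Bus 2, neither bus needs to know the other exists.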
See also the nice article about a distributed implementation of the global data space that I referred to earlier in this blog.