The main issue I have is that someone has to come up with a data model that includes the information required by everyone - a superset - rather than a subset what a point-to-point connection requires. It seems to me that this is very difficult to achieve, from a design point of view - capture everything - to a governance point of view - who is going to own this and define what an object is - to a technical point of view - very complex objects, different versions etc.
The "superset" he is talking about is merely a metamodel of the data that point-to-point connections would require. The canonical data model is a federated collection of local metamodels including the definition of the common semantics and the format transformation rules. It need not be "more" than you need and it does not contain any stored application data.
To enable loose coupling a layer of indirection is defined in terms of a global data space, a canonical data model and canonical messages. This enables the mapping of semantics and transformation of formats between mutually unknown (decoupled) endpoints.
A good way to understand the mechanism is to view the canonical messages as the formally defined carriers of specific information throughout the enterprise. Data providers (sending endpoints) fill the appropriate canonical message using the metadata defined in the canonical data model. Data consumers (receiving endpoints) consume the data from this canonical message, also using the metadata defined in the canonical data model. In this way the endpoints don't need to have any knowledge of eachother.
The endpoints don't even need to know the canonical data model. Services delivered by the infrastructure (global data space), which has knowledge of the canonical data model, will take care of loading the data delivered by an endpoint into the appropriate canonical message (carrier) and unload the data from the canonical message to be consumed by the receiving endpoint. The endpoints only use their own formats and are totally decoupled.
You might recognize that in fact - from a software architecture perspective - the canonical data model is the incarnation of loose coupling.
Indeed it is true that this addresses a governance-aspect that nowadays in most IT organizations is not represented very strongly. If you want to reach the next level of IT maturity based on the ideas of SOA and EDA, it is a prerequisite to extent your governance with regard to formal semantics and format definitions as well.
The idea of the canonical data model is to define the semantics and formats from the local endpoint perspectives. To be able to map the endpoint interfaces in a loose coupling context (endpoints do not know each other), an intermediate mediation layer needs to be in place. The canonical data model is the underpinning facility that allows for the mapping of the distinct local semantics and the transformation of the distinct local formats between decoupled and independent endpoints.
So yes, it is right that maturing your software architectures requires maturing the required governance: loose coupling comes at the price of a tighter governance. On the other hand: evolving SOA governance tools are coming to help.