Single source of truth

In information science and information technology, single source of truth (SSOT) architecture, or single point of truth (SPOT) architecture, for information systems is the practice of structuring information models and associated data schemas such that every data element is mastered (or edited) in only one place, providing data normalization to a canonical form (for example, in database normalization or content transclusion).

There are several scenarios with respect to copies and updates:

The master data is never copied and instead only references to it are made; this means that all reads and updates go directly to the SSOT.
The master data is copied but the copies are only read and only the master data is updated; if requests to read data are only made on copies, this is an instance of CQRS.
The master data is copied and the copies are updated; this needs a reconciliation mechanism when there are concurrent updates.
- Updates on copies can be discarded whenever a concurrent update is made on the master, so they are not considered fully committed until they have been propagated to the master. (Many blockchains work this way.)
- Concurrent updates are merged. (If an automatic merge fails, the system can fall back on another strategy, such as the previous one or manual intervention; most version control systems do the latter.)
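
The copy-and-update scenarios above can be illustrated with a minimal Python sketch. The class names `Master` and `Copy`, the version counter, and the sample values are assumptions made for illustration; the sketch shows only the first reconciliation strategy (discarding an update made on a copy when the master has changed in the meantime), not merging.

```python
from dataclasses import dataclass


@dataclass
class Copy:
    """Hypothetical copy of the master data, tagged with the version it was taken at."""
    value: str
    base_version: int


@dataclass
class Master:
    """Hypothetical master record: the only place where values are committed."""
    value: str
    version: int = 0

    def read(self) -> str:
        # First scenario: callers hold a reference and read the master directly.
        return self.value

    def snapshot(self) -> Copy:
        # Second and third scenarios: hand out a copy of the current state.
        return Copy(value=self.value, base_version=self.version)

    def commit(self, copy: Copy) -> bool:
        # Third scenario, first strategy: an update made on a copy is thrown
        # out if the master changed after the copy was taken.
        if copy.base_version != self.version:
            return False
        self.value = copy.value
        self.version += 1
        return True


master = Master("address: 1 Main St")
a = master.snapshot()
b = master.snapshot()
a.value = "address: 2 Oak Ave"
b.value = "address: 3 Elm Rd"

first = master.commit(a)   # accepted: master unchanged since the snapshot
second = master.commit(b)  # rejected: master changed when a was committed
```

In this sketch the update on `b` is not considered committed until `commit` returns `True`; a caller would have to take a fresh snapshot and retry, or fall back on a merge.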

The advantages of SSOT architectures include easier prevention of mistaken inconsistencies (such as a duplicate value or copy somewhere being forgotten) and greatly simplified version control. Without an SSOT, dealing with inconsistencies implies either complex and error-prone consensus algorithms or a simpler architecture that is liable to lose data in the face of inconsistency. The latter may seem unacceptable, but it is sometimes a very good choice; it is how most blockchains operate: a transaction is final only once it has been included in the next block that is mined.

Ideally, SSOT systems provide data that are authentic (and authenticatable), relevant, and referable.^[1]

Deployment of an SSOT architecture is becoming increasingly important in enterprise settings where incorrectly linked duplicate or denormalized data elements (a direct consequence of intentional or unintentional denormalization of any explicit data model) pose a risk of retrieving outdated, and therefore incorrect, information. Common classes of implementation include the following:

In electronic health records (EHRs), it is imperative to accurately validate patient identity against a single referential repository, which serves as the SSOT. Duplicate representations of data within the enterprise would be implemented with pointers rather than duplicate database tables, rows, or cells. This ensures that updates to data elements in the authoritative location are comprehensively distributed to all federated database constituencies in the larger overall enterprise architecture. EHRs are an excellent class for exemplifying how SSOT architecture is both necessary and challenging to achieve. It is challenging because inter-organization health information exchange inherently demands a high level of cybersecurity competence. It is nonetheless necessary in order to prevent medical errors, to prevent the wasted costs of inefficiency (such as duplicated work or rework), and to make the primary care and medical home concepts feasible (to achieve competent care transitions).
Single-source publishing as a general principle or ideal in content management relies on having SSOTs, via transclusion or (otherwise, at least) substitution. Substitution happens via libraries of objects that can be propagated as static copies which are later refreshed when necessary (that is, when refreshing of the copy-paste or import is triggered by a larger updating event). Component content management systems are a class of content management systems that aim to provide competence on this level.
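
The difference between transclusion and substitution can be sketched in Python as follows. The `{{name}}` placeholder syntax, the `components` library, and the `render` function are hypothetical and are not taken from any particular content management system.

```python
import re

# Hypothetical component library: each fragment is mastered here and nowhere else.
components = {"warning": "Disconnect power before servicing."}

# Assumed placeholder syntax: {{name}} refers to a fragment in the library.
PLACEHOLDER = re.compile(r"\{\{(\w+)\}\}")


def render(template: str, library: dict) -> str:
    """Resolve every placeholder against the library at the time of the call."""
    return PLACEHOLDER.sub(lambda match: library[match.group(1)], template)


manual = "Step 1. {{warning}}"

# Substitution: a static copy is produced once and goes stale until refreshed.
static_copy = render(manual, components)

# The fragment is edited in its single source.
components["warning"] = "Disconnect power and wait 30 seconds before servicing."

# Transclusion: rendering at read time always reflects the current source.
live = render(manual, components)

# Refreshing the static copy brings it back in line with the source.
refreshed_copy = render(manual, components)
```

Here `static_copy` keeps the old wording until a refresh is triggered, whereas `live` picks up the change immediately because it holds only a reference to the fragment.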

Implementation

Ontologic interactions

The notion that any given single source of truth can exist depends on an acknowledged prerequisite: the ontologic condition that no more than a single truth exists about any particular fact or idea. This assertion is ontologic in both the IT sense and the general sense of the word. In many instances, this presents no problem (for example, within particular namespaces, or even across them, as long as naming collisions or broader name conflicts are adequately handled). The broadest contexts, which are also the thorniest with regard to ontologic discrepancies, require adequate comparison and reconciliation of epistemic regimes (or at least negotiation or transactional exchanges). An archetypal example of this class of reconciliation is that two theological seminary libraries, from two different religions (X and Y), could exchange information with an SSOT architecture, but the unification of truth would reside on the level of the statement that "religion X asserts that God is purple whereas religion Y asserts that God is green", rather than on the level of "God is purple" or "God is green".

Architectures or architectural features

An ideal implementation of SSOT is rarely possible in most enterprises. This is because many organisations have multiple information systems, each of which needs access to data relating to the same entities (e.g., customer). Often these systems are purchased as commercial off-the-shelf products from vendors and cannot easily be modified. Each of these systems therefore has to store its own copy of common data or entities, which immediately violates the SSOT approach defined above. For example, an enterprise resource planning (ERP) system (such as SAP or Oracle e-Business Suite) may store a customer record; the customer relationship management (CRM) system also needs a copy of the customer record (or part of it), and the warehouse dispatch system might also need a copy of some or all of the customer data (e.g., shipping address). In cases where vendors do not support such modifications, it is not always possible to replace these records with pointers to the SSOT.

For organisations with more than one information system that wish to implement a single source of truth without modifying every system but one master system so that it stores pointers to other systems for all entities, some supporting architectures are:

Master data management (MDM)
Event store and event sourcing (ES)
Data warehouse (DW)

Master data management (MDM)

An MDM system can act as the source of truth for any given entity that might not necessarily have an alternative "source of truth" in another system. Typically the MDM system acts as a hub for multiple systems, many of which could allow updates to (that is, be the source of truth for) different aspects of information on a given entity. For example, the CRM system may be the "source of truth" for most aspects of the customer, and is updated by a call centre operator. However, a customer may also update their address via a customer service web site that has a different back-end database from the CRM system. The MDM application receives updates from multiple sources, acts as a broker to determine which updates are to be regarded as authoritative (the golden record), and then syndicates this updated data to all subscribing systems. The MDM application normally requires an enterprise service bus (ESB) to syndicate its data to multiple subscribing systems.^[2]
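
The hub behaviour described above (receive, broker, syndicate) can be sketched in Python. This is a minimal illustration under stated assumptions: the `MdmHub` class, the per-attribute `authority` rule, and the in-process callbacks that stand in for an ESB are all hypothetical, and real MDM products use richer survivorship rules.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class MdmHub:
    # Assumed brokering rule: attribute name -> the one source system whose
    # updates are treated as authoritative for that attribute.
    authority: Dict[str, str]
    golden: Dict[str, str] = field(default_factory=dict)
    subscribers: List[Callable[[Dict[str, str]], None]] = field(default_factory=list)

    def subscribe(self, callback: Callable[[Dict[str, str]], None]) -> None:
        self.subscribers.append(callback)

    def receive(self, source: str, attribute: str, value: str) -> bool:
        # Broker step: ignore updates from sources that are not authoritative.
        if self.authority.get(attribute) != source:
            return False
        self.golden[attribute] = value
        # Syndication step: push a copy of the golden record to every
        # subscriber (an ESB would normally carry these messages).
        for notify in self.subscribers:
            notify(dict(self.golden))
        return True


hub = MdmHub(authority={"name": "crm", "address": "web"})

warehouse_view: Dict[str, str] = {}      # hypothetical subscribing system
hub.subscribe(warehouse_view.update)

hub.receive("crm", "name", "Ada Lovelace")
hub.receive("web", "address", "12 Analytical Way")
rejected = hub.receive("crm", "address", "stale address")
```

In this sketch the CRM system is authoritative for the customer name and the web site for the address, so the final update is rejected and the subscriber's view keeps the address supplied by the web site.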

Event store and event sourcing (ES)

In event-oriented architectures, it has become increasingly common to find an implementation of the event sourcing pattern, which stores the system state as an ordered sequence of state changes.^[3] This requires an event store, a type of database designed to hold all the events that change the state of the system. In an architecture that combines event sourcing, command query responsibility segregation (CQRS), domain-driven design and messaging, the event store is in fact a "single source of truth". It has the additional advantage that it can also act as an enterprise service bus, because other components can listen directly to the event store for state changes as each event passes through. In addition, because it saves all events, it can also play the role of a data warehouse. A final advantage is that the event store makes it possible to implement the shared database pattern, another technique for obtaining a single source of truth.
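
A minimal in-memory sketch of the pattern in Python follows. The `Event` and `EventStore` classes, the event kinds, and the account-balance example are assumptions chosen for illustration; a production event store would be a durable database.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Event:
    kind: str    # hypothetical event kinds: "deposited" or "withdrawn"
    amount: int


class EventStore:
    """Append-only, ordered log of state changes (in memory for this sketch)."""

    def __init__(self) -> None:
        self._events: List[Event] = []
        self._listeners: List[Callable[[Event], None]] = []

    def subscribe(self, listener: Callable[[Event], None]) -> None:
        # Other components can listen for events as they are stored.
        self._listeners.append(listener)

    def append(self, event: Event) -> None:
        self._events.append(event)
        for listener in self._listeners:
            listener(event)

    def replay(self) -> List[Event]:
        # Return a copy so that callers cannot rewrite history.
        return list(self._events)


def balance(events: Iterable[Event]) -> int:
    """Derive the current state by folding over the ordered events."""
    total = 0
    for event in events:
        if event.kind == "deposited":
            total += event.amount
        elif event.kind == "withdrawn":
            total -= event.amount
    return total


store = EventStore()
seen: List[Event] = []
store.subscribe(seen.append)

store.append(Event("deposited", 100))
store.append(Event("withdrawn", 30))

current = balance(store.replay())            # state after all events
earlier = balance(store.replay()[:1])        # state after the first event only
```

The state is never stored directly: `current` and `earlier` are both derived from the same log, which is what lets the event store serve as the single source of truth and also answer historical queries.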

Data warehouse (DW)

While the primary purpose of a data warehouse is to support reporting and analysis of data that have been combined from multiple sources, the fact that such data have been combined (according to business logic embedded in the data transformation and integration processes) means that the data warehouse is often used as a de facto SSOT. Generally, however, the data available from the data warehouse are not used to update other systems; rather, the DW becomes the "single source of truth" for reporting to multiple stakeholders. In this context, the data warehouse is more correctly referred to as a "single version of the truth", since other versions of the truth exist in its operational data sources (no data originates in the DW; it is simply a reporting mechanism for data loaded from operational systems).^[4]
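
The following Python sketch illustrates the one-way flow described above. The source rows, the `load_warehouse` function, and the summation rule are hypothetical; the read-only mapping simply stands in for the fact that no data originates in the warehouse.

```python
from types import MappingProxyType

# Hypothetical operational sources; this is where the data originates.
crm_rows = [{"customer_id": 1, "name": "Ada Lovelace"}]
erp_rows = [
    {"customer_id": 1, "orders_total": 250},
    {"customer_id": 1, "orders_total": 100},
]


def load_warehouse(crm, erp):
    """Combine operational data into a read-only reporting view."""
    combined = {}
    for row in crm:
        combined[row["customer_id"]] = {"name": row["name"], "orders_total": 0}
    for row in erp:
        # Business logic embedded in the integration step (an assumption
        # for this sketch): order values are summed per customer.
        if row["customer_id"] in combined:
            combined[row["customer_id"]]["orders_total"] += row["orders_total"]
    # Freeze the result: reports read from it, nothing writes back through it.
    return MappingProxyType(
        {key: MappingProxyType(value) for key, value in combined.items()}
    )


warehouse = load_warehouse(crm_rows, erp_rows)
report_line = f"{warehouse[1]['name']}: {warehouse[1]['orders_total']}"
```

To change what the report shows, the operational sources are updated and the warehouse is reloaded; an attempt to assign to `warehouse[1]["name"]` raises `TypeError`.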

References

[1] "IBM Smarter Planet - Operational risk management for financial services". IBM. Archived from the original on 2015-09-24.
[2] BAYT Job Site - June 2014
[3] "Event Sourcing". martinfowler.com. Retrieved 2021-12-06.
[4] "What Is a Data Warehouse?". Oracle database. Retrieved 2023-08-10. Data warehouses are solely intended to perform queries and analysis and often contain large amounts of historical data. The data within a data warehouse is usually derived from a wide range of sources such as application log files and transaction applications.
[5] Why Google stores billions of lines of code in a single repository
