Data valuation

Data valuation is a discipline in the fields of accounting and information economics.^[1] It is concerned with methods to calculate the value of data collected, stored, analyzed and traded by organizations. This valuation depends on the type, reliability and field of data.

History

In the 21st century, exponential increases in computing power and data storage capabilities (in line with Moore's law) have led to a proliferation of big data, machine learning and other data analysis techniques. Businesses increasingly adopt these techniques and technologies to pursue data-driven strategies and create new business models.^[citation needed] Traditional accounting techniques used to value organizations were developed before high-volume data capture and analysis became widespread, and focus on tangible assets (machinery, equipment, capital, property, materials, etc.). As a result, accounting calculations often ignore data and leave its value off organizations' balance sheets.^[2] Notably, in the wake of the 9/11 attacks on the World Trade Center in 2001, a number of businesses lost significant amounts of data. They filed claims with their insurance companies for the value of the information that was destroyed, but the insurers denied the claims, arguing that information did not count as property and was therefore not covered by their policies.^[3]

A number of organizations and individuals took note of this gap and began publishing on the topic of data valuation. Doug Laney, vice president and analyst at Gartner, conducted research on Wall Street-valued companies, which found that companies that had become information-centric, treating data as an asset, often had market-to-book values two to three times higher than the norm.^[3]^[4] On the topic, Laney commented: "Even as we are in the midst of the Information Age, information simply is not valued by those in the valuation business. However, we believe that, over the next several years, those in the business of valuing corporate investments, including equity analysts, will be compelled to consider a company's wealth of information in properly valuing the company itself."^[2] In the latter part of the 2010s, the list of the most valuable firms in the world (traditionally dominated by oil and energy companies) came to be dominated by data firms – Microsoft, Alphabet, Apple, Amazon and Facebook.^[5]^[6]

Characteristics of data as an asset

A 2020 study by the Bennett Institute at Cambridge University, UK divided the characteristics of data into two categories: economic characteristics and informational characteristics.^[7]

Economic characteristics

Data is non-rival. Multiple people can use data without it being depleted or used up.
Data varies in whether it is excludable. Data can be a public good or a club good, depending on what type of information it contains. Some data can reasonably be shared with anyone who desires to access it (e.g., weather data). Other data is limited to particular users and contexts (e.g., administrative data).
Data involves externalities. In economics, an externality is a cost or benefit that affects a third party who did not choose to incur it. Data can create positive externalities: when new data is produced, it combines with existing data to produce new insights, increasing the value of both. It can also create negative externalities when it is leaked, breached or otherwise misused.
Data may have increasing or decreasing returns. Sometimes collecting more data increases insight or value, though at other times it can simply lead to hoarding.
Data has a large option value. Due to the perpetual development of new technologies and datasets, it is hard to predict how the value of a particular data asset might change. Organizations may store data, anticipating possible future value, rather than actual present value.
Data collection often has high up-front cost and low marginal cost. Collecting data often requires significant investment in technologies and digitization. Once these are established, further data collection may cost much less. High entry barriers may prevent smaller organizations from collecting data.
Data use requires complementary investment. Organizations may need to invest in software, hardware and personnel to realize value from data.

Informational characteristics

Subject matter. Encompasses what the data describes and what it can help with.
Generality. Some data is useful across a range of analyses; other data is useful only in particular cases.
Temporal coverage. Data can be forecast, real-time, historic or back-cast. These are used differently, for planning, operational and historical analyses.
Quality. Higher quality data is generally more valuable as it reduces uncertainty and risk, though the required quality varies from use to use. Greater automation in data collection tends to lead to higher quality.
Sensitivity. Sensitive data is data that could be used in damaging ways (e.g., personal data, commercial data, national security data). Costs and risks are incurred keeping sensitive data safe.
Interoperability and linkability. Interoperability relates to the use of data standards when representing data, which means that data relating to the same things can be easily brought together. Linkability relates to the use of standard identifiers within the data set that enables a record in one data set to be connected to additional data in another data set.

Data value drivers

A number of drivers affect the extent to which future economic benefits can be derived from data. Some drivers relate to data quality, while others may either render the data valueless or create unique and valuable competitive advantages for data owners.^[8]

Exclusivity. Having exclusive access to a data asset makes it more valuable than if it is accessible to multiple license holders.
Timeliness. For much data, the more closely it reflects the present, the more reliable the conclusions that can be drawn from it. Recently captured data is more valuable than historic data.
Accuracy. The more closely data describes the truth, the more valuable it is.
Completeness. The more variables about a particular event or object described by data, the more valuable the data is.
Consistency. The more a data asset is consistent with other similar data assets, the more valuable it is (e.g., there are no inconsistencies as to where a customer resides).
Usage Restrictions. Data collected without necessary approvals for usage (e.g., personal data for marketing purposes) is less valuable as it cannot be used legally.
Interoperability/Accessibility. The more easily and effectively data can be combined with other organizational data to produce insights, the more valuable it is.
Liabilities and Risk. Reputational consequences and financial penalties for breaching data regulations such as GDPR can be severe. The greater the risk associated with data use, the lower its value.

The process of realizing value from data can be subdivided into a number of key stages: data assessment, where the current states and uses of data are mapped; data valuation, where data value is measured; data investment, where capital is spent to improve processes, governance and technologies underlying data; data utilization, where data is used in business initiatives; and data reflection, where the previous stages are reviewed and new ideas and improvements are suggested.^[9]

Methods for valuing data

Due to the wide range of potential datasets and use cases, as well as the relative infancy of data valuation, there are no simple or universally agreed upon methods. High option value and externalities mean data value may fluctuate unpredictably, and seemingly worthless data may suddenly become extremely valuable at an unspecified future date.^[7] Nonetheless, a number of methods have been proposed for calculating or estimating data value.

Information-theoretic characterization

Information theory provides quantitative mechanisms for data valuation. For instance, secure data sharing requires careful protection of individual privacy or organization intellectual property. Information-theoretic approaches and data obfuscation can be applied to sanitize data prior to its dissemination.^[10]^[11]

Information-theoretic measures, such as entropy, information gain, and information cost, are useful for anomaly and outlier detection.^[12] In data-driven analytics, a common problem is quantifying whether larger data sizes and/or more complex data elements actually enhance, degrade, or alter the data information content and utility. The data value metric (DVM) quantifies the useful information content of large and heterogeneous datasets in terms of the tradeoffs between the size, utility, value, and energy of the data.^[13] Such methods can be used to determine if appending, expanding, or augmenting an existent dataset may improve the modeling or understanding of the underlying phenomenon.
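As a rough illustration of the entropy and information-gain measures mentioned above, the following Python sketch computes both for small categorical columns. It is not the data value metric (DVM) of the cited work; the function names and the toy records are hypothetical and serve only to show how an added column can be checked for useful information content.

```python
import math
from collections import Counter


def shannon_entropy(values):
    """Shannon entropy, in bits, of the empirical distribution of `values`."""
    counts = Counter(values)
    total = len(values)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(labels, feature):
    """Reduction in label entropy after grouping the records by `feature`."""
    total = len(labels)
    groups = {}
    for label, key in zip(labels, feature):
        groups.setdefault(key, []).append(label)
    # Weighted average entropy of the labels within each feature group.
    conditional = sum(len(g) / total * shannon_entropy(g) for g in groups.values())
    return shannon_entropy(labels) - conditional


# Hypothetical toy records: one label column and two candidate feature columns.
labels = ["fraud", "ok", "ok", "fraud"]
country = ["A", "B", "B", "A"]          # separates the labels perfectly
weekday = ["Mon", "Mon", "Tue", "Tue"]  # tells us nothing about the labels

gain_country = information_gain(labels, country)
gain_weekday = information_gain(labels, weekday)
```

In this toy case the `country` column removes all uncertainty about the label, while `weekday` removes none, which suggests that appending `weekday` would add size without adding information for this particular question.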

Infonomics valuation models

Doug Laney identifies six approaches for valuing data, dividing these into two categories: foundational models and financial models. Foundational models assign a relative, informational value to data, whereas financial models assign an absolute, economic value.^[14]

Foundational models

Intrinsic Value of Information (IVI) measures data value drivers including correctness, completeness and exclusivity of data and assigns a value accordingly.
Business Value of Information (BVI) measures how fit the data is for specific business purposes (e.g., initiative X requires 80% accurate data that is updated weekly – how closely does the data match this requirement?).
Performance Value of Information (PVI) measures how the usage of the data affects key business drivers and KPIs, often using a control group study.
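The BVI example above (an initiative that needs 80% accurate data updated weekly) can be turned into a simple fitness score. The Python sketch below is only one possible scoring scheme, not Laney's published formula: the function names, the capping at 1.0 and the unweighted average are all assumptions made for illustration.

```python
def requirement_fit(actual, required, higher_is_better=True):
    """Score from 0 to 1 for how well one measured property meets its requirement."""
    if higher_is_better:
        return min(actual / required, 1.0)
    # For properties such as update interval, smaller values are better.
    return min(required / actual, 1.0)


def business_fit(scores):
    """Unweighted average of the individual requirement scores (an assumption)."""
    return sum(scores) / len(scores)


# Hypothetical dataset: 72% accurate, refreshed every 14 days.
# Hypothetical requirement: 80% accurate, refreshed every 7 days.
accuracy_fit = requirement_fit(actual=0.72, required=0.80)
freshness_fit = requirement_fit(actual=14, required=7, higher_is_better=False)
fit_score = business_fit([accuracy_fit, freshness_fit])
```

Here the dataset reaches about 90% of the accuracy requirement and 50% of the freshness requirement, giving an overall fit of roughly 0.7. A real assessment would likely weight the requirements by their importance to the initiative.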

Financial models

Cost Value of Information (CVI) measures the cost to produce and store the data, the cost to replace it, or the impact on cash flows if it were lost.
Market Value of Information (MVI) measures the actual or estimated value the data would be traded for in the data marketplace.
Economic Value of Information (EVI) measures the expected cash flows, returns or savings from the usage of the data.
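One common way to express expected future cash flows as a single present-day figure is discounting. The Python sketch below applies a standard present-value calculation to cash flows attributed to a data asset; the source does not prescribe discounting for EVI, so the discount rate, the yearly figures and the function name are assumptions for illustration.

```python
def present_value(cash_flows, discount_rate):
    """Discount a list of yearly cash flows (year 1, year 2, ...) to today's value."""
    return sum(
        cash_flow / (1 + discount_rate) ** year
        for year, cash_flow in enumerate(cash_flows, start=1)
    )


# Hypothetical incremental cash flows attributed to using the data.
expected_cash_flows = [110.0, 121.0]
data_value = present_value(expected_cash_flows, discount_rate=0.10)
```

With a 10% discount rate, each of the two hypothetical cash flows is worth about 100 today, so the estimate is about 200. In practice the hard part is attributing cash flows to the data in the first place, not the arithmetic.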

Bennett Institute valuations

Research by the Bennett Institute divides approaches for estimating the value of data into market-based valuations and non-market-based valuations.^[7]

Market-based valuations

Stock market valuations measure the advantage gained by organizations that invest in data and data capability.
Income-based valuations seek to measure the current and future income derived from data. This approach is limited because it cannot measure value realized in a wider business or societal ecosystem, or beyond financial transactions involving data. Where income from data is realized through trading data in a marketplace, there are further limitations: markets fail to capture the full option value of data, and usually lack enough buyers and sellers to settle on a price that truly reflects the economic value of the data.
Cost-based valuations measure the cost to create and maintain data. This can look at the actual cost incurred, or the projected cost if the data needed to be replaced.

Non-market-based valuations

Economic value of open data examines who open or free data creates value for: organizations that host or steward the data; intermediary organizations or individuals that reuse the data to create products and services; organizations and individuals that use these products and services.
Value of personal data can be estimated by asking consumers questions such as how much they would be willing to pay to access a data-privacy service, or how much they would charge for access to their personal data. Values can also be estimated by examining the profits of companies that rely on personal data (in 2018, Facebook generated $10 for every active user), and by examining fines handed out to organizations that breach data privacy or other regulations.

Other approaches

A modified cost value approach suggests refinements to a cost-based valuation approach. It proposes the following modifications: data collected redundantly should be considered to have zero value to avoid double counting; unused data should be considered to have zero value (this can be identified via data usage statistics); the number of users and number of accesses to the data should be used to multiply the value of the data, allowing the historical cost of the information to be modified in the light of its use in practice; the value should be depreciated based on a calculated "shelf life" of the information; the value should be modified by its accuracy relative to what is considered an acceptable degree of accuracy.^[15]
A consumption-based approach builds on the principles in the modified cost value approach by assigning data users different weightings based on the relative value they contribute to the organization. These weightings are included in the modelling of data usage statistics and further modify the measured value of data.^[16]
Data hub valuation uses a cost-based approach that measures the cost of data hubs where large repositories of data are stored, rather than measuring the cost of separate datasets. The data hub cost can then be modified, as in the consumption-based and modified cost value approaches.^[17] Another hub valuation approach uses a modified market value approach, by measuring savings to users from accessing data via hubs versus individually accessing data from producers, and user willingness-to-pay for access to data hubs.^[18]
A stakeholder approach engages key stakeholders to value data, examining how data supports activities which external stakeholders identify as creating value for them. It uses a model that combines the total value created by the organization, a weighted list of value creating initiatives (as defined by external stakeholders) and an inventory of data assets. This approach was developed in a collaboration between Anmut, a consultancy firm, and Highways England, a public sector agency for which data valuations based on market value, income gains or economic performance are less meaningful. The approach can also be applied in the private sector.^[19]^[20]
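The adjustments listed under the modified cost value approach above can be sketched in Python as follows. The source names the adjustments but not their exact form, so the straight-line depreciation, the capped accuracy ratio, the `reference_usage` normalizing constant and all field names are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class DataAsset:
    historical_cost: float      # cost originally incurred to collect the data
    is_redundant: bool          # True if the same data is already counted elsewhere
    users: int                  # distinct users, taken from usage statistics
    accesses: int               # average accesses per user (hypothetical unit)
    age_years: float
    shelf_life_years: float
    accuracy: float             # measured accuracy, 0..1
    acceptable_accuracy: float  # accuracy considered acceptable, 0..1


def modified_cost_value(asset, reference_usage=100.0):
    """Adjust historical cost for redundancy, usage, shelf life and accuracy."""
    # Redundant or unused data is valued at zero to avoid double counting.
    if asset.is_redundant or asset.users == 0 or asset.accesses == 0:
        return 0.0
    # Usage multiplier relative to a hypothetical reference level of usage.
    usage_multiplier = (asset.users * asset.accesses) / reference_usage
    # Straight-line depreciation over the shelf life (an assumption).
    remaining_life = max(0.0, 1.0 - asset.age_years / asset.shelf_life_years)
    # Accuracy relative to the acceptable level, capped at 1.0 (an assumption).
    accuracy_factor = min(asset.accuracy / asset.acceptable_accuracy, 1.0)
    return asset.historical_cost * usage_multiplier * remaining_life * accuracy_factor


# Hypothetical assets.
survey = DataAsset(1000.0, False, 5, 40, 1.0, 4.0, 0.9, 0.9)
duplicate = DataAsset(1000.0, True, 5, 40, 1.0, 4.0, 0.9, 0.9)

survey_value = modified_cost_value(survey)
duplicate_value = modified_cost_value(duplicate)
```

A consumption-based variant could replace the single `users` count with a weighted sum over user groups, reflecting the relative value each group contributes to the organization.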

Companies performing data valuations

Oyster Venture Partners performs Data Valuation as a Service for companies. It describes the service as a proven and defensible way to determine a monetary value for an organization's data assets, so that the organization can manage its data as a monetary intangible asset. The company reports having realized over $1.5 billion in data asset value.

Data Valuation as a Service provides:

A data valuation report drawing on 21 different data valuation methodologies and calculations to produce a defensible valuation specific to the company and its data.
An interrogation of data via data due diligence, covering strategy, security, governance, monetization, substantiation, privacy and people.
A review of data monetization strategies against each use case, to capture as much current and future value from the data as possible.
Analytic evidence of data value, as well as model forecasts of how data drivers, use cases and monetization affect the valuation.

References

[1] Allen, Beth (1990). "Information as an Economic Commodity". The American Economic Review. 80 (2): 268–273. JSTOR 2006582.
[2] "Gartner Says Within Five Years, Organizations Will Be Valued on Their Information Portfolios".
[3] "How Do You Value Information?". 15 September 2016.
[4] "Applied Infonomics: Why and How to Measure the Value of Your Information Assets".
[5] "The Value of Data". 22 September 2017.
[6] "Most Valuable Companies in the World – 2020".
[7] "The Value of Data Summary Report" (PDF).
[8] "Putting a value on data" (PDF).
[9] "Data Valuation – What is Your Data Worth and How do You Value it?". 13 September 2019.
[10] Askari, M; Safavi-Naini, R; Barker, K (2012). "An information theoretic privacy and utility measure for data sanitization mechanisms". Proceedings of the second ACM conference on Data and Application Security and Privacy. Association for Computing Machinery. pp. 283–294. doi:10.1145/2133601.2133637. ISBN 9781450310918. S2CID 18338542.
[11] Zhou, N; Wu, Q; Wu, Z; Marino, S; Dinov, ID (2022). "DataSifterText: Partially Synthetic Text Generation for Sensitive Clinical Notes". Journal of Medical Systems. 46 (96): 96. doi:10.1007/s10916-022-01880-6. PMC 10111580. PMID 36380246.
[12] Lee, W; Xiang, D (2001). "Information-theoretic measures for anomaly detection". Proceedings 2001 IEEE Symposium on Security and Privacy. S&P 2001. IEEE. pp. 130–143. doi:10.1109/SECPRI.2001.924294. ISBN 0-7695-1046-9. S2CID 6014214.
[13] Noshad, M; Choi, J; Sun, Y; Hero, A; Dinov, ID (2021). "A data value metric for quantifying information content and utility". J Big Data. 8 (82). Springer: 82. doi:10.1186/s40537-021-00446-6. PMC 8550565. PMID 34777945.
[14] "Why and How to Measure the Value of your Information Assets".
[15] "Measuring the Value of Information: An Asset Valuation Approach" (PDF).
[16] "The Valuation of Data as an Asset" (PDF).
[17] "Consumption-Based Method". 4 December 2018.
[18] "Keeping Research Data Safe Method". 4 December 2018.
[19] "Why you should be treating data as an asset". 2 March 2020.
[20] "Data Valuation – Valuing the World's Greatest Asset".
