Wikidata:Property proposal/earliest known OpenTimestamps

earliest known OpenTimestamps

Originally proposed at Wikidata:Property proposal/Commons

   Not done
Description: Earliest known OpenTimestamps proof of an image, video or audio file
Data type: String
Domain: OpenTimestamps proof (Q105855601)
Allowed values: base64. Regex: [^-A-Za-z0-9 /=]
Example 1: File:Monumento_en_recuerdo_de_las_víctimas_de_la_pandemia_del_Covid-19_(Madrid),_18_May_2020.jpg

AE9wZW5UaW1lc3RhbXBzAABQcm9vZgC/ieLohOiSlAEIqmhCuD/Hf4ZtJzsAHTuQh9y0nPkClhDz 1fdeopuI7h7wENd17BixRGTBvCRgkX/EAkcI8CBa w41aQZeT67MxV8a6yzLwM5 CGD1Ulj/ab4u 4EkpCAjxIB NHI1sm5WX8 US70Xgx4X8thEd5QPvpmRH4pwLn qECPEgwZerXDX3dIzm2TFEDW45 Q3AtCOochAehg/J5ogyYtdgI8CA8SlvQWaw7khoCbhoQtd0 PFDMpT8t/fGo5tMn6Ris4wjwIHmO KaNIQi3LwWhGADlVG/sN7sD6GNXtPN6jI5aqbOFMCPAg9QE6QOmANEFop4w5S6YGF8pBqOcbRxit gwwxawlIm9AI8BCKpKkqvs82FFUY pbH c/kCPEEXsLzg/AI5DlgRjbBYDcI8SCcKPfc5sG8TJjQ Xp kcNV/yIbcJ1TaPKuJltI1e1EdewjxIIywgDtDEexxANU6/hRjxfLVPyWpBRp9JKYBzypMvmtA CPEgh1kQ6Urwy6GpyR2Wa 53Emr8AvbXwmotNVh74fEhgYEI8CA1uwbzIusvFmTWqlf/9tg0ih3n p589opkQ/mz6Fmr0hAjxIFEwStiOuUVBzNWZXsaYM7h8rcJ56reJ9V06Ku9Wq38PCPAg5OygZlVq 00XtW3f3VyJhj7Qh7E8 sNrzshYgNw7KgMI8SDGNzqPE1 icnk1NRdBzNytsMguZAFgvogKDu5P 7CIP1AjxIN4HfpyjWeEWG4pnwXtxsEDa RAx4Vi/mNaU8u0Hts bCPEg6e5CofTGfhHfpM7RGXPE QjNY5Fw0CFTuCT3S5MN/WmcI8SB9I6WjLXjPYThqGqC/elvtbJu3t3gRw35gWp8jQRGNiwjxII9e CBzL40YnNfzyn4ORhBCFjpAwgtymAbK kVsup6jSCPEgcD7U8WAEmftpJcH4Yc/5csto8TCtILdv 65lWDjwqoCAI8CAL8o45rswne7cZbmrrXg3MQP565Ryn/KVoDUhzqduBkAjxICsjI 9/MlqYf2D7 2NMhUS9GIT0H5lB9J2voTDva7ZsJCPAgYHvLDgmDgzmITM8SapnzS1oGOwpZidYbHrkdy ElXoUI 8XEBAAAAAc QsL9Yqo4aUWGF5T9lojorCl42C7EaS9XwbWQ9XJ0IAAAAABcWABQ/ZZ3/jRuZ0NSz EGQUN0dZQ89VRv3///8CAUYEAAAAAAAXqRQhsSNjOkrjMU/sC TiStQnBwbTHocAAAAAAAAAACJq IPAEbKAJAAgI8CCvRyFW9QMGdd7kHReE0xQgG4/78U6FStJ51o25YB/1IwgI8SAQx2gEme6PKezd cePDHRV4oLSSgOjyG8jZDw/0S6nnZwgI8SB5Swqf8QwWAJfuX2Z3XrTQKAfnFFKTwyny51VYIxQy bQgI8CC/oEBbCu2Qe76XpB Wg4NL/0n21GxA4HHRCQe10PhhBwgI8SCLhMxRar7DRnPmaBz/GSxP g5pppSXWVPzmbj46Jk7 QAgI8CC1MpoACmclmjs4F NHHA1MGme/Io8mPYw 2yDlOhobkggI8SA3 edryoUc9SQw62Z7 u53TtK5ztEnGInUjWqCa/IZwLggI8SCt3iW9UHON4S11 6FdPMl5bIByFfoA OCfA/UYooNz37wgI8SDR EV5h9mQHeZJXDGy0MwB0v9TMUtozb4N7NnCcLTUAQgI8SAG3Tj7sWdc Og8BR3TeOFdWggcP7H3wrDRtVm5wEXhAUQgI8SDit/ppsbaYVBNs1c1OlOJHVK NrjktRaReuWDr QGvlnQgI8CB2PhvEnsOMgwvG7bzA1rR4q9 UFxzaIs8BXViUdPFeBQgIAAWIlg1z1xkBA 3AJg==

Example 2: MISSING
Example 3: MISSING

Motivation

The timestamp in the Exif metadata of an image file is only as accurate as the clock in the camera, and it may be completely wrong. Moreover, it can easily be faked with software. An OpenTimestamps proof cannot be faked: it guarantees that a file existed at a specific time or earlier. This may be critical for historically relevant images and for new original discoveries.

It can also be applied to any other file format, including video and audio files.

OpenTimestamps is open source, scalable, permissionless and free to timestamp and to verify. FrankAndProust (talk) 19:06, 9 June 2021 (UTC)

Discussion

The size of a typical pruned OpenTimestamps proof is around 1500 bytes in base64 format (String). This applies to any file, no matter how big.

The image in Example 1 was uploaded to Wikimedia Commons on 8th June 2021. The uploader might find it difficult to convince the community that the image dates from much earlier, specifically 18th May 2020, because Exif data are easy to fake. However, the base64 representation of the OpenTimestamps timestamp irrefutably proves that the image dates from 19th May 2020 or earlier. --FrankAndProust (talk) 19:06, 9 June 2021 (UTC)

 Comment @FrankAndProust: Can you check whether these strings fit within the limits of Wikidata string type? It looks to me like it will be too long. You can test this on test.wikidata.org - create a new property there for this and try it out. ArthurPSmith (talk) 17:34, 10 June 2021 (UTC)
 Comment @ArthurPSmith: You are right. This OpenTimestamps (OTS) proof is slightly longer than the maximum length constraint for String on Commons (1500 characters) when encoded in base64.
It would be possible to bypass the maximum length constraint by using the "Tabular data" datatype, but that would require a cluttered hack, so I do not consider it appropriate.
Do you feel there is a suitable way of embedding the OTS proof in a property? If not, I will withdraw this property proposal and open a new one with URL as the suggested data type. --FrankAndProust (talk) 09:05, 11 June 2021 (UTC)
I suppose the timestamp is itself a file, but it's not one of the formats that Commons accepts - I guess it could be done with a "cluttered hack" approach as you suggest! Other than that, do you think the more compact en:Ascii85 format rather than base64 would work here? ArthurPSmith (talk) 12:22, 11 June 2021 (UTC)
Ascii85 provides some savings but it still falls short of the needs.
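The size trade-off between the two encodings can be illustrated with a minimal sketch. The helper names `base64_length` and `ascii85_length` and the 1200-byte sample are assumptions made only for illustration; they are not the size of the actual proof. The 1500-character limit is the one mentioned earlier in this discussion.

```python
import base64
import math

STRING_LIMIT = 1500  # character limit mentioned in this discussion


def base64_length(n_bytes: int) -> int:
    """Characters needed by padded base64: 4 per (possibly partial) 3-byte group."""
    return 4 * math.ceil(n_bytes / 3)


def ascii85_length(n_bytes: int) -> int:
    """Characters needed by Ascii85 when no all-zero group triggers the 'z'
    shortcut: 5 per full 4-byte group, plus (remainder + 1) for a partial one."""
    full, rest = divmod(n_bytes, 4)
    return 5 * full + (rest + 1 if rest else 0)


# Hypothetical payload, NOT the real proof: 1200 identical non-zero bytes.
sample = bytes([0xAB]) * 1200

# Compare the library output with the formulas above.
print(len(base64.b64encode(sample)), base64_length(len(sample)))
print(len(base64.a85encode(sample)), ascii85_length(len(sample)))

# Largest raw payload that fits in STRING_LIMIT characters for each encoding.
print(STRING_LIMIT * 3 // 4, STRING_LIMIT * 4 // 5)
```

Under these assumptions, a 1500-character string holds at most 1125 raw bytes in base64 and 1200 raw bytes in Ascii85, so the gain is under 7%.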
A pruned OTS proof can be represented faithfully with the tabular data datatype. A three-column table is needed: the first column is the execution order, the second is the command (mandatory), and the third is the argument (optional). Commands are executed sequentially from first to last to generate the proof. An exact replica of the previously stated binary proof (encoded above as base64), expressed as tabular data, would be:

JPEG file sha256 hash (initial value): aa6842b83fc77f866d273b001d3b9087dcb49cf9029610f3d5f75ea29b88ee1e

Execution order Command (mandatory) Argument (non-mandatory)
1 append d775ec18b14464c1bc2460917fc40247
2 sha256
3 prepend 1f8d1c8d6c9b9597f3e512ef45e0c785fcb6111de503efa66447e29c0b9fea84
4 sha256
5 prepend c197ab5c35f7748ce6d931440d6e3943702d08ea1c8407a183f279a20c98b5d8
6 sha256
7 append 3c4a5bd059ac3b921a026e1a10b5dd3e3c50cca53f2dfdf1a8e6d327e918ace3
8 sha256
9 append 798e29a348422dcbc168460039551bfb0deec0fa18d5ed3cdea32396aa6ce14c
10 sha256
11 append f5013a40e980344168a78c394ba60617ca41a8e71b4718ad830c316b09489bd0
12 sha256
13 append 8aa4a92abecf36145518fa96c7f9cfe4
14 sha256
15 prepend 5ec2f383
16 append e439604636c16037
17 sha256
18 prepend 9c28f7dce6c1bc4c98d05e9fa470d57fc886dc2754da3cab8996d2357b511d7b
19 sha256
20 prepend 8cb0803b4311ec7100d53afe1463c5f2d53f25a9051a7d24a601cf2a4cbe6b40
21 sha256
22 prepend 875910e94af0cba1a9c91d966bee77126afc02f6d7c26a2d35587be1f1218181
23 sha256
24 append 35bb06f322eb2f1664d6aa57fff6d8348a1de7a79f3da29910fe6cfa166af484
25 sha256
26 prepend 51304ad88eb94541ccd5995ec69833b87cadc279eab789f55d3a2aef56ab7f0f
27 sha256
28 append e4eca066556afb4d17b56ddfdd5c89863ed087b13cfac36bcec85880dc3b2a03
29 sha256
30 prepend c6373a8f135fa2727935351741ccdcadb0c82e640160be880a0eee4fec220fd4
31 sha256
32 prepend de077e9ca359e1161b8a67c17b71b040daf91031e158bf98d694f2ed07b6cf9b
33 sha256
34 prepend e9ee42a1f4c67e11dfa4ced11973c4423358e45c340854ee093dd2e4c37f5a67
35 sha256
36 prepend 7d23a5a32d78cf61386a1aa0bf7a5bed6c9bb7b77811c37e605a9f2341118d8b
37 sha256
38 prepend 8f5e081ccbe3462735fcf29f83918410858e903082dca601b2be915b2ea7a8d2
39 sha256
40 prepend 703ed4f1600499fb6925c1f861cff972cb68f130ad20b76feb99560e3c2aa020
41 sha256
42 append 0bf28e39aecc277bb7196e6aeb5e0dcc40fe7ae51ca7fca5680d4873a9db8190
43 sha256
44 prepend 2b2323ef7f325a987f60fbd8d321512f46213d07e6507d276be84c3bdaed9b09
45 sha256
46 append 607bcb0e09838339884ccf126a99f34b5a063b0a5989d61b1eb91dcbe1255e85
47 sha256
48 prepend 0100000001cf90b0bf58aa8e1a516185e53f65a23a2b0a5e360bb11a4bd5f06d643d5c9d0800000000171600143f659dff8d1b99d0d4b310641437478633cf5546fdffffff02014604000000000017a91421b123633a4ae3314fec0be4e24ad4270706d31e870000000000000000226a20
49 append 6ca00900
50 sha256
51 sha256
52 append af472156f5030675dee41d1784d314201b8ffbf14e854ad279d68db9601ff523
53 sha256
54 sha256
55 prepend 10c7680499ee8f29ecdd71e3c31d1578a0b49280e8f21bc8d90f0ff44ba9e767
56 sha256
57 sha256
58 prepend 794b0a9ff10c160097ee5f66775eb4d02807e7145293c329f2e755582314326d
59 sha256
60 sha256
61 append bfa0405b0aed907bbe97a41f9683834bff49f6d46c40e071d10907b5d0f86107
62 sha256
63 sha256
64 prepend 8b84cc516abec34673e6681cff192c4f839a69a525d654fce66e3e3a264efe40
65 sha256
66 sha256
67 append b5329a000a67259a3b3817e3471c0d4c1a67bf228f263d8c3edb20e53a1a1b92
68 sha256
69 sha256
70 prepend 3779daf2a1473d490c3ad99efebb9dd3b4ae73b449c62275235aa09afc86702e
71 sha256
72 sha256
73 prepend adde25bd50738de12d75fba15d3cc9796c807215fa003827c0fd4628a0dcf7ef
74 sha256
75 sha256
76 prepend d1f8457987d9901de6495c31b2d0cc01d2ff53314b68cdbe0decd9c270b4d401
77 sha256
78 sha256
79 prepend 06dd38fbb1675c3a0f014774de38575682070fec7df0ac346d566e7011784051
80 sha256
81 sha256
82 prepend e2b7fa69b1b69854136cd5cd4e94e24754af8dae392d45a45eb960eb406be59d
83 sha256
84 sha256
85 append 763e1bc49ec38c830bc6edbcc0d6b478abdf94171cda22cf015d589474f15e05
86 sha256
87 sha256
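The way such a table would be executed can be sketched as follows. This is only an illustration, not the OpenTimestamps client: the function name `replay_proof` and the tuple layout are hypothetical, and only the first four rows of the table are reproduced.

```python
import hashlib


def replay_proof(initial_hex, operations):
    """Apply (command, argument) rows in order to the initial value.

    Hypothetical helper: 'append' and 'prepend' take a hex argument,
    'sha256' takes none and replaces the value with its SHA-256 digest.
    """
    message = bytes.fromhex(initial_hex)
    for command, argument in operations:
        if command == "append":
            message = message + bytes.fromhex(argument)
        elif command == "prepend":
            message = bytes.fromhex(argument) + message
        elif command == "sha256":
            message = hashlib.sha256(message).digest()
        else:
            raise ValueError(f"unknown command: {command!r}")
    return message.hex()


# Initial value: the JPEG file's SHA-256 hash quoted above the table.
FILE_HASH = "aa6842b83fc77f866d273b001d3b9087dcb49cf9029610f3d5f75ea29b88ee1e"

# Rows 1 to 4 of the table; the remaining rows follow the same pattern.
FIRST_ROWS = [
    ("append", "d775ec18b14464c1bc2460917fc40247"),
    ("sha256", None),
    ("prepend", "1f8d1c8d6c9b9597f3e512ef45e0c785fcb6111de503efa66447e29c0b9fea84"),
    ("sha256", None),
]

print(replay_proof(FILE_HASH, FIRST_ROWS))
```

If the table is an exact replica of the proof, feeding all 87 rows to such a function should yield a value that corresponds to the Merkle root of the attested Bitcoin block. Block explorers typically display that value byte-reversed, so the comparison may need a byte-order swap.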
The great advantage of this format is that the proof is embedded on Wikimedia Commons: it is always available to everybody at any time.
The great disadvantage is that the OpenTimestamps client does not understand this format, so currently it cannot be used directly as input to the OTS validator.
This is why I feel the correct solution, at least as a first step, would be to use a URL data type pointing to an OTS timestamp file. OTS timestamps would consequently be hosted outside the Wikimedia projects. The main disadvantage is that the host can withdraw those files at any time and without prior notice, making them unavailable to Wikidata users. However, I don't feel there is a better solution.
So, as I said, if no better suggestion is provided, I will create a new property proposal for "earliest known OpenTimestamps URL". --FrankAndProust (talk) 19:04, 11 June 2021 (UTC)
  •  Comment I have no idea of the usefulness of the above data, but if Commons contributors want it stored as structured data, this is feasible.
  1. In Wikibase, the length of string property is configurable. So it's technically possible that the SDC team increases it for the above.
  2. Another aspect is the GUI. The value may not look that great in the GUI and isn't really useful for human readers. It may be worth defining a new sub-datatype of string values that shows only a summary as the displayed value, with the full string value visible only when querying or editing. There are already a few string subtypes with special display. This would need to be requested from the SDC team as well.
  3. At least in Wikidata, it's possible to convert between string datatypes, so it may be possible to start out with a string property while the new datatype is being developed. --- Jura 13:03, 13 June 2021 (UTC)
 Comment @Jura1: The solution you propose is the one I like the most: "a new sub-datatype of string values that just has some summary as displayed value and the full string value is only visible when querying or editing".
And, like you said, "it may be possible to start out with a string property while the new datatype is being developed". As we have already tested out, the standard constraint of 1500 characters is not enough if we use base64 encoding. Maybe we could start out first with a string property with a max length constraint of 3000 characters?
This is my implementation proposal. It can be "tuned" to better adapt to the current Wikimedia technical infrastructure:
  1. The content of the OpenTimestamps file is pasted into a text box, encoded as base64, and stored in a sub-datatype of string.
  2. Wikidata calculates the SHA256 hash of the highest resolution image (or video, or audio).
  3. Wikidata verifies that the SHA256 hash from step 2 chains all the way up to the Merkle root in the OpenTimestamps file. This is done with the "verify" command. There are implementations of this command in Python, Java, Rust and JavaScript on the OpenTimestamps GitHub.
  4. Wikidata looks up the Bitcoin block that contains the Merkle root and reads its date and time. It is not necessary to enforce trustless programming here, as that would make the process more cumbersome; it would be easier not to use a Bitcoin node in this step and to rely on conventional, publicly available APIs instead. For the image in this example, we would need these two API calls:
$ curl https://blockstream.info/api/block-height/630893
0000000000000000000964ee0c5d67c74783626a058a76105f77ce0469ef379c
$ curl https://blockstream.info/api/block/0000000000000000000964ee0c5d67c74783626a058a76105f77ce0469ef379c
{"id":"0000000000000000000964ee0c5d67c74783626a058a76105f77ce0469ef379c",
 "height":630893,
 "version":536870912,
 "timestamp":1589857180,
 "tx_count":2065,
 "size":1396834,
 "weight":3993379,
 "merkle_root":"71cb3d8de9d964953d5f1a0ee00deb650a16cc055bed2366881a4574617af828",
 "previousblockhash":"000000000000000000068e9a1a96347f7815cd6f0a4d8c9e1b45305a9afd8364",
 "mediantime":1589851896,
 "nonce":3231938334,
 "bits":387021369,
 "difficulty":16104807485529}
We take the "mediantime" value from the last JSON, which in this case is 1589851896. Treated as Unix time, it gives us: "Tue May 19 2020 01:31:36 GMT+0000". This shows that the example image existed on that date or earlier.
  5. The date obtained in the previous step is shown to the user. If the user presses a button somewhere, the content of the OTS file is made visible in base64 encoding.
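Step 4 above can be sketched in Python as follows. This is a minimal illustration, not a proposed implementation: the function names `fetch_block` and `median_time_utc` are hypothetical, the endpoints are the two quoted in this discussion, and the sample data is a subset of the JSON shown above. The network helper is defined but not called.

```python
import json
import urllib.request
from datetime import datetime, timezone

API = "https://blockstream.info/api"  # endpoints as quoted in this discussion


def fetch_block(height):
    """Resolve a block height to its hash, then fetch the block's JSON.

    Requires network access; defined for illustration and not called below.
    """
    with urllib.request.urlopen(f"{API}/block-height/{height}") as response:
        block_hash = response.read().decode().strip()
    with urllib.request.urlopen(f"{API}/block/{block_hash}") as response:
        return json.load(response)


def median_time_utc(block):
    """Interpret the block's 'mediantime' field as Unix time in UTC."""
    return datetime.fromtimestamp(block["mediantime"], tz=timezone.utc)


# Subset of the JSON response shown above for block 630893.
SAMPLE_BLOCK = json.loads(
    '{"height": 630893, "timestamp": 1589857180, "mediantime": 1589851896}'
)

print(median_time_utc(SAMPLE_BLOCK).isoformat())  # 2020-05-19T01:31:36+00:00
```

With network access, `median_time_utc(fetch_block(630893))` would be expected to give the same result, assuming the API keeps returning the fields shown above.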
n.b.: I am not affiliated with OpenTimestamps or blockstream.info.
I am a developer and would be happy to help in my spare time if necessary.
What are your thoughts? --FrankAndProust (talk) 19:26, 14 June 2021 (UTC)
More than two weeks have passed with no responses. Would it be more practical to simply display the URL pointing to the external OpenTimestamps file? --FrankAndProust (talk) 09:40, 29 June 2021 (UTC)
 Oppose you should just build this as something separate from Commons. On Commons we have the upload history in which we can verify that something was uploaded at some point of time. That is enough for us. Bloating the Commons database for this is a bit too much. Multichill (talk) 16:58, 13 September 2021 (UTC)
I'm marking this as not done because the data it's intended to store is too long. If that changes, it could be revisited, but I would recommend only doing that if there is evidence that other people would use this because, right now, there are no files on Commons which even mention this, not even the example given here. - Nikki (talk) 00:24, 24 December 2021 (UTC)