
Migrate SHA-1 hashes to SHA-256 (tracking)
Open, HighPublic

Description

SHA-1 collisions can now be manufactured semi-practically so we should start planning to migrate off SHA-1 to SHA-256.

The attack requires creating a common prefix and suffix that can be slipped into pairs of files, so cannot generally be used to create 'duplicates' of existing files, but can be used to create pairs or sets of files that can confound and confuse expected behavior once put into a common system.

Priority of adding better hashes is highest where the SHA-1 hash is used for unique addressing of user-submitted data, lowest where used to validate server-generated downloads.

Event Timeline

Deskana triaged this task as High priority. Edited Feb 24 2017, 7:00 PM
Deskana subscribed.

Reading the relevant publication, the priority seems high but not at the "drop everything" stage; although the new technique does make the attack several orders of magnitude more efficient, it is still fairly impractical in terms of the CPU/GPU time required.

I'm not sure it really needs to be a "High" priority task. SHA-1 has been known to be insecure since 2005; what is new is only that someone has actually demonstrated a break. However, as brion already wrote, the announced attack requires an identical prefix and suffix in both files, plus a block of (non-visible) data in between, to work. This is probably
(a) too cost-intensive to break functions inside MediaWiki
(b) not even possible that easily

So, from my point of view, we should evaluate where we use SHA-1 hashes and decide for each usage what priority applies to that specific use case (including how effectively it can be broken).

Please don't misunderstand this comment: I think the mid-term goal should be to migrate to a safer algorithm, but I don't think it _needs_ to be a high priority :)

Thoughts on whether it's worth banning the prefix of the SHAttered files on upload? The attack scenario seems very minor at this point, but it might help put users' minds at ease.
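For illustration only, a minimal sketch of such a check at upload time; `SHATTERED_PREFIX` is a placeholder (a generic PDF header), not the actual identical prefix from the published PDFs, which would have to be pasted in from the shattered.io files:

```
# Placeholder constant: the real identical prefix bytes from the published
# SHAttered PDFs would need to be substituted here.
SHATTERED_PREFIX = b"%PDF-"  # placeholder only, NOT the real prefix

def has_shattered_prefix(path: str) -> bool:
    """True if the uploaded file starts with the known colliding prefix."""
    with open(path, "rb") as f:
        return f.read(len(SHATTERED_PREFIX)) == SHATTERED_PREFIX
```

A more general option might be counter-cryptanalysis, e.g. the sha1collisiondetection library that Git adopted, which detects files produced with this class of attack rather than only the two published PDFs.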

Have we considered adding SHA-256 in addition to (not in place of) SHA-1? We were informally discussing some mitigation strategies on the infrastructure side, and checking several hashes may be more secure than replacing one with another: SHA-256 will presumably be broken at some point too, but finding an input that collides under both SHA-1 and SHA-256 will probably take an order of magnitude more effort (it may even be practically impossible). I was told this is not really a hack, but a technique used, for example, for checking Debian packages.
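As a rough illustration of the "check both, with AND logic" idea (hypothetical helper names, not actual MediaWiki code):

```
import hashlib

def file_digests(path: str, chunk_size: int = 1 << 20) -> tuple[str, str]:
    """Compute SHA-1 and SHA-256 of a file in a single streaming pass."""
    sha1, sha256 = hashlib.sha1(), hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            sha1.update(chunk)
            sha256.update(chunk)
    return sha1.hexdigest(), sha256.hexdigest()

def matches_stored(path: str, stored_sha1: str, stored_sha256: str) -> bool:
    """AND logic: the file is only treated as identical if both digests match."""
    sha1, sha256 = file_digests(path)
    return sha1 == stored_sha1 and sha256 == stored_sha256
```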

As a bonus, old code that is not critical can stay the same, and I would guess that adding code and a column to the schema is easier than replacing them. Insecure API calls could continue being used while a full transition is done, and in the future the old column can simply be left unused.

As a downside, non-MediaWiki applications like Labs tools could be tempted not to update their code, keep using SHA-1, and remain vulnerable, although I am not sure how big an impact that would be.

I'm not an expert on crypto, so it's possible I'm misinterpreting the paper, but I believe that https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf suggests such constructions aren't really much more secure than just using SHA-256.

If I understand the suggestion of @jcrespo correctly, it isn't meant to add more security on its own, but to make the transition to a new hash algorithm easier (as he wrote: replacing a hash algorithm is probably more complex than adding a new one). If I understood him correctly, he proposed something like the following (a rough sketch in code follows the list):

  • If a new file is uploaded, generate both a SHA-1 and a SHA-256 checksum, instead of just a SHA-1 or only a SHA-256 checksum

Let's assume someone uploads another (or a "bad") file:

  • The SHA-1 checksum is generated first and checked against all known SHA-1 checksums in the database

For matching files that also have a SHA-256 checksum:

  • Generate the SHA-256 checksum of the "new" file and check that, too

For "old" files, which only have a SHA-1 checksum:

  • Check whether the file still exists (it should) and generate its SHA-256 checksum on the fly (and, of course, save it in the database), then check it against the new file
  • If the file, for whatever reason, no longer exists, use the SHA-1 checksum alone to compare with the new file, as that is the only thing we can do in this case.
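A very rough sketch of that decision flow, reusing the file_digests helper from the earlier sketch; the row dicts and the load_original / save_sha256 callbacks are hypothetical stand-ins for MediaWiki's actual image table and file backend:

```
def find_duplicate(upload_path, candidate_rows, load_original, save_sha256):
    """candidate_rows: DB rows (dicts) whose stored SHA-1 equals the upload's SHA-1.
    load_original(row): local path of the stored original file, or None if it is gone.
    save_sha256(row, digest): persist a newly computed SHA-256 for an old row."""
    _, new_sha256 = file_digests(upload_path)

    for row in candidate_rows:
        if row.get("sha256"):
            # Already migrated: both checksums must agree.
            if row["sha256"] == new_sha256:
                return row
        else:
            original = load_original(row)
            if original is not None:
                # Old row: compute SHA-256 on the fly, backfill it, then compare.
                _, old_sha256 = file_digests(original)
                save_sha256(row, old_sha256)
                if old_sha256 == new_sha256:
                    return row
            else:
                # Original file is gone: the SHA-1 match is all we have.
                return row
    return None
```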

I'm not sure whether this can scale to other usages of SHA-1 checksums, and, of course, I'm also far from being an expert in this area, so I may be mistaken in thinking that this is at least possible to do and no less secure than having only a SHA-1 checksum.

Also: probably for some (or most?) cases where we use SHA-1 checksums, we still have the original content (file, text, whatever), so wouldn't it be possible to migrate to another hash algorithm given some amount of time and CPU power? If we do that, we should probably also implement another way of storing hashes: instead of having a column tied to one algorithm (like img_sha1), we should probably have one that can store any hash (like we already do for passwords), where the stored value itself indicates which algorithm was used to calculate it. This would at least minimize the database schema changes whenever we want to change the algorithm.
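For illustration, a self-describing storage format along those lines could look roughly like this (the ":<algorithm>:<hex digest>" layout is just an assumption for the example, not an existing MediaWiki convention):

```
import hashlib

# Hypothetical self-describing format: ":<algorithm>:<hex digest>"
SUPPORTED = {"sha1": hashlib.sha1, "sha256": hashlib.sha256}

def encode_digest(data: bytes, algorithm: str = "sha256") -> str:
    """Store the algorithm name alongside the digest in a single column."""
    return f":{algorithm}:{SUPPORTED[algorithm](data).hexdigest()}"

def verify_digest(data: bytes, stored: str) -> bool:
    """Pick the algorithm from the stored value itself, then recompute and compare."""
    _, algorithm, digest = stored.split(":", 2)
    return SUPPORTED[algorithm](data).hexdigest() == digest
```

A later switch to yet another algorithm would then only need a new entry in the lookup table rather than another schema change.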

I'm not sure it really needs to be a "High" priority task

"High" was an educated guess based on the discussion. Feel free to change it if that's more accurate. :-)

I'm not an expert on crypto, so it's possible I'm misinterpreting the paper, but I believe that https://www.iacr.org/archive/crypto2004/31520306/multicollisions.pdf suggests such constructions aren't really much more secure than just using SHA-256.

That paper talks about iterating hashes, e.g. "hashing twice", something that, to me, quite logically doesn't help. I am not suggesting that; I am suggesting keeping the current hash and adding another one, checking both with AND logic, not chaining them.

I also mentioned that there are some downsides, like adding code or being tempted to continue using the old hash, so I will let the right people decide :-).

The paper is talking about concatenating (which is equivalent to using AND logic) different iterative hashes. An iterative hash function is a common type of hash function; MD5, SHA-1, and SHA-256 are all examples of iterative hash functions. It is not talking about applying the same hash function multiple times.

They are basically saying that the big-O time to find an input that collides in two different iterative hash functions is roughly the same as the time to find a collision in just the stronger hash function. (They do say that if both hash functions have shortcut attacks, whether "both" shortcut attacks can be used depends on the details of those attacks, but I think we should operate under the assumption that they can.)
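As a back-of-the-envelope illustration of that result (my own rough numbers, taking the generic multicollision cost from the Joux paper and ignoring SHA-1's known shortcut attacks), colliding SHA-1(x) || SHA-256(x) costs roughly:

```
% Joux-style attack on the concatenation: build a 2^128-way multicollision in
% SHA-1, then birthday-search that set for a SHA-256 collision.
\underbrace{\tfrac{256}{2}\cdot 2^{160/2}}_{\text{multicollision in SHA-1}}
\;+\;
\underbrace{2^{256/2}}_{\text{birthday search in SHA-256}}
\;\approx\; 2^{87} + 2^{128} \;\approx\; 2^{128}
```

which is about the same as the generic 2^128 bound for SHA-256 alone, rather than the 2^208 one might naively hope for from the concatenation.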

So we may want to store both hashes to ease the transition period, but we should not assume that it provides any more security than just using the new hash.

Right now, my plan is to hash all wiki media files with SHA-256 in order to track their backups, and to store the hashes in a database: T262668

These are operations outside of MediaWiki, but ping me in case my findings/experience would be helpful to anyone here.

As promised, I have downloaded and hashed 99.94% of Commons files with both SHA-1 and SHA-256, and stored them in a separate metadata database (for backups). Sadly, it is difficult to contribute that back, and even to keep it up to date, because of the lack of proper unique identifiers for blobs in MediaWiki, but I will soon document all my learnings about the operational challenges of the MediaWiki workflow, including hashing collisions, hoping that will be useful for someone else. Feel free to ping me in private if you want to know more.