Recording file uploads is a complex process involving updates to different systems (the database and the file backend) that need to be kept consistent. The current transaction logic is flawed and can lead to data loss and corruption under certain circumstances; see T263301#6487019.
This could be prevented by restructuring the upload process as follows:
Stage one:
- start db transaction (do not use a deferred update)
- determine the archive name of the file and insert a row into oldimage, based on the data in the image table
- copy the current version to the archive name. Do not use "quick" operations.
- commit db transaction (really flush! we must know this is permanent before overwriting the current version of the file!); see the sketch after this list
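For illustration, a minimal sketch of stage one using MediaWiki's IDatabase and FileBackend interfaces. The function name, the variables ($dbw, $backend, $currentPath, $archivePath) and the abridged column mapping are assumptions made for this sketch, not the actual LocalFile code:

```
<?php
use Wikimedia\Rdbms\IDatabase;

/**
 * Stage one (sketch): record the current version in oldimage and copy the
 * file to its archive name, committing before anything overwrites the
 * current version.
 */
function archiveCurrentVersion(
	IDatabase $dbw,
	FileBackend $backend,
	string $name,
	string $currentPath,
	string $archivePath
): void {
	// Explicit transaction, not a deferred update.
	$dbw->begin( __METHOD__ );

	// Build the oldimage row from the current image row.
	$img = $dbw->selectRow( 'image', '*', [ 'img_name' => $name ], __METHOD__ );
	$dbw->insert( 'oldimage', [
		'oi_name' => $img->img_name,
		'oi_archive_name' => basename( $archivePath ),
		'oi_timestamp' => $img->img_timestamp,
		'oi_sha1' => $img->img_sha1,
		// ... the remaining oi_* fields copied from the matching img_* fields
	], __METHOD__ );

	// Copy the current file to the archive name; a regular operation,
	// not doQuickOperation().
	$status = $backend->doOperation(
		[ 'op' => 'copy', 'src' => $currentPath, 'dst' => $archivePath ]
	);
	if ( !$status->isOK() ) {
		$dbw->rollback( __METHOD__ );
		throw new RuntimeException( 'Failed to archive the current file version' );
	}

	// Really flush: the archived copy and its oldimage row must be known to be
	// permanent before stage two overwrites the current version.
	$dbw->commit( __METHOD__ );
}
```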
Stage two:
- start db transaction (do not use a deferred update)
- determine the metadata for the new version and insert or update the row in the image table
- copy the new version into the primary location. Do not use "quick" operations.
- commit db transaction (really flush!); see the sketch after this list
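A matching sketch for stage two, with the same caveats; $newProps stands in for whatever the metadata extraction produces:

```
<?php
use Wikimedia\Rdbms\IDatabase;

/**
 * Stage two (sketch): write the new version's metadata to the image row and
 * store the new file at the primary location, committing only after the
 * store succeeded.
 */
function publishNewVersion(
	IDatabase $dbw,
	FileBackend $backend,
	string $name,
	array $newProps,    // metadata of the new version (size, sha1, ...)
	string $tempPath,   // where the uploaded file currently sits
	string $currentPath // the primary storage path
): void {
	$dbw->begin( __METHOD__ );

	// Update the image row with the new version's metadata.
	$dbw->update( 'image', [
		'img_size' => $newProps['size'],
		'img_sha1' => $newProps['sha1'],
		'img_timestamp' => $dbw->timestamp(),
		// ... the remaining img_* fields from $newProps
	], [ 'img_name' => $name ], __METHOD__ );

	// Overwrite the primary location with the new file; again, no "quick" ops.
	$status = $backend->doOperation( [
		'op' => 'store',
		'src' => $tempPath,
		'dst' => $currentPath,
		'overwrite' => true,
	] );
	if ( !$status->isOK() ) {
		$dbw->rollback( __METHOD__ );
		throw new RuntimeException( 'Failed to store the new file version' );
	}

	// Really flush.
	$dbw->commit( __METHOD__ );
}
```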
Stage three:
- schedule jobs for thumbnail generation (in a deferred update?); see the sketch below
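Stage three could look roughly like this; the job type and its parameters are placeholders, the point is only that the push happens from a deferred update after the data is committed:

```
<?php
// Stage three (sketch): queue thumbnail regeneration once the new version is
// committed. $title is assumed to be the Title of the file page; the job type
// and parameters are placeholders, not necessarily the real thumbnail job.
DeferredUpdates::addCallableUpdate( static function () use ( $title ) {
	JobQueueGroup::singleton()->push(
		new JobSpecification(
			'ThumbnailRender',
			[ 'transformParams' => [ 'width' => 120 ] ],
			[],
			$title
		)
	);
} );
```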
This should prevent any data loss. However, if we fail before stage two is committed, we end up with an extra row in oldimage, which is visible to users. It would point to a copy of the current version and have the same metadata. We could try to detect this during the next upload and remove such a row (see the sketch below). This would be even easier with a unique index over oi_name and oi_timestamp (plus perhaps oi_sha1).
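The cleanup during the next upload could be sketched like this (same assumed $dbw and $name as above); the condition is exactly the signature described: an oldimage row that merely mirrors the current image row:

```
<?php
// Sketch: before starting stage one of a new upload, drop any oldimage row
// left behind by an earlier run that failed before stage two committed. Such
// a row has the same name, timestamp and sha1 as the current image row.
$img = $dbw->selectRow(
	'image',
	[ 'img_timestamp', 'img_sha1' ],
	[ 'img_name' => $name ],
	__METHOD__
);
if ( $img ) {
	$dbw->delete( 'oldimage', [
		'oi_name' => $name,
		'oi_timestamp' => $img->img_timestamp,
		'oi_sha1' => $img->img_sha1,
	], __METHOD__ );
	// The orphaned archive file copy could be cleaned up here as well.
}
```

With a unique index over oi_name and oi_timestamp, the stage-one insert of a retried upload would presumably conflict with such a leftover row, so the situation could not go unnoticed.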