Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add a imghdr_strict option to .what() to deliver the same false positives #81

Merged
merged 1 commit into from
May 28, 2024

Conversation

cclauss
Copy link
Contributor

@cclauss cclauss commented May 23, 2024

- def what(file: os.PathLike | str | None, h: bytes | None) -> str | None:
  def what(file: os.PathLike | str | None, h: bytes | None, imghdr_strict: bool = True) -> str | None:

imghdr_strict enables bug-for-bug compatibility between imghdr.what() and puremagic.what() when the imghdr returns a match but puremagic returns None. We believe that imghdr is delivering a "false positive" in each of these scenarios but we want puremagic.what()'s default behavior to match imghdr.what()'s false positives so we do not break existing applications.

If imghdr_strict is True (the default) then a lookup will be done to deliver a matching result on all known false positives. If imghdr_strict is False then puremagic's algorithms will determine the image type. True is more compatible while False is more correct.

NOTE: This compatibility effort only deals with imghdr's false positives and we are not interested in tracking the opposite situation where puremagic's delivers a match while imghdr would have returned None. Also, puremagic.what() can recognize many more file types than the twelve image file types that imghdr focused on.

@NebularNerd Your review, please.

@cclauss cclauss changed the title Add a imghdr_strict option to what() to deliver the same false positives Add a imghdr_strict option to .what() to deliver the same false positives May 23, 2024
@cclauss cclauss marked this pull request as ready for review May 23, 2024 19:39
@NebularNerd
Copy link
Contributor

Sounds good, at least folks using imghdr will continue to get the results they expect. 🙂

@cclauss
Copy link
Contributor Author

cclauss commented May 24, 2024

When you think a pull request is useful and is ready to be merged, please consider giving it a positive review.

Every check mark ✔️ at the top right of this page gives project maintainers confidence that the proposed changes have been read through and deemed both useful and safe to merge into the codebase.

Anyone can review a pull request on GitHub. To do so here:

  1. Scroll to the top of this page.
  2. Click the Files changed tab and read through each file carefully looking for potential issues.
  3. Click the Review changes button.
  4. Click Approve (or one of the other options) and add comments only if they have not already been stated in the PR.
  5. Click Submit review so that your ✔️ will be added to the list.

Copy link
Contributor

@NebularNerd NebularNerd left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, the notes regarding matching imghdr will be helpful to newcomers.

Comment on lines 431 to 432
if h and imghdr_strict and (ext := imghdr_bug_for_bug.get(h)):
return ext
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does this bug only exist when sending in those exact bytes, or also when those bytes are in part of a file?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It should be that the non-passing tests are down to PureMagic's mostly superior matching rules. This should be in relation to my comment where @cclauss was testing dummy strings against the database and getting fails or no returned data.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My tests come from reading the test_xxx() functions in https://github.com/python/cpython/blob/3.12/Lib/imghdr.py ? They are very byte-by-byte tests so are quite specific. If you think there are other positives to add to our tests, they are easy to add.

@cclauss cclauss requested a review from cdgriffith May 25, 2024 16:34
@cdgriffith cdgriffith merged commit 73e94c5 into cdgriffith:develop May 28, 2024
8 checks passed
@cclauss cclauss deleted the bug-for-bug branch May 28, 2024 01:54
@cdgriffith cdgriffith mentioned this pull request Jun 16, 2024
cdgriffith added a commit that referenced this pull request Jun 16, 2024
- Adding #72 #75 #76 #81 `.what()` to be a drop in replacement for `imghdr.what()` (thanks to Christian Clauss and Andy - NebularNerd)
- Adding #67 Test on Python 3.13 beta (thanks to Christian Clauss)
- Adding #77 from __future__ import annotations (thanks to Christian Clauss
- Fixing #66 Confidence sorting (thanks to Andy - NebularNerd)

---------

Co-authored-by: Andy <[email protected]>
Co-authored-by: Christian Clauss <[email protected]>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants