Text in matplotlib svg #1097

Merged: 10 commits, Oct 7, 2024

jeromedockes
Member

Fixes #1095

previously, the matplotlib plots used the default behavior for text: matplotlib renders the text itself, so the svg it outputs contains svg paths rather than text elements.

this is simple because matplotlib then knows how much space the text takes and can set the correct viewbox, so we don't need to post-process the svg.

but the drawbacks are that:

  • it makes the plots bigger (by around 40%, I would say)
  • matplotlib is not as good as a browser at finding the right font for each glyph and rendering the text. at the moment we don't specify a font stack for matplotlib, so some glyphs are missing and replaced by empty boxes; even if we did specify one, the result might be suboptimal. this is reported for emojis in #1095 (Table Report unable to read emojis), but it also occurs for many characters that the default font lacks on the system where the report is generated. e.g. on my machine the default matplotlib config uses DejaVu, which lacks Chinese and Japanese characters.

the alternative, chosen in this PR, is to ask matplotlib to put the labels in the svg as text, in which case the application that displays it (in our case a web browser) is responsible for rendering it. the main drawback is that matplotlib does not know how much space the text will take, so we need to adjust the plot's viewbox, width and height with javascript after the page loads.
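
A minimal sketch of the difference between the two modes, assuming the switch is made through matplotlib's `svg.fonttype` setting (the description above does not say which mechanism the PR uses). `render_svg` is a hypothetical helper written for this illustration, not skrub code:

```python
import io

import matplotlib

matplotlib.use("Agg")  # headless backend, no display needed
import matplotlib.pyplot as plt


def render_svg(label, fonttype):
    """Render a tiny figure to an SVG string with the given svg.fonttype."""
    with matplotlib.rc_context({"svg.fonttype": fonttype}):
        fig, ax = plt.subplots(figsize=(3, 2))
        ax.set_xlabel(label)
        buffer = io.BytesIO()
        fig.savefig(buffer, format="svg")
        plt.close(fig)
    return buffer.getvalue().decode("utf-8")


# "path" is matplotlib's default: glyphs are drawn as vector outlines.
as_paths = render_svg("hello", "path")
# "none" keeps the label as an SVG <text> element for the viewer to render.
as_text = render_svg("hello", "none")

print("<text" in as_paths, "<text" in as_text)
```

With `"none"`, font selection and layout are left to whatever displays the svg, which is why the figure dimensions can no longer be computed on the matplotlib side.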

not directly related, but this PR also slightly improves how text is truncated for right-to-left scripts, so that the ellipsis ("...") is displayed on the correct side of the text in more cases. it also normalizes whitespace so that matplotlib labels don't contain line breaks.
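
As a rough illustration of these two points (a sketch under assumptions, not necessarily what this PR implements): whitespace can be collapsed with `str.split`, and one way to keep a trailing ellipsis on the correct side of right-to-left text is to follow it with a RIGHT-TO-LEFT MARK, so that the ellipsis sits between two strong right-to-left characters. The helper names and the single-character ellipsis are choices made for this example:

```python
import unicodedata

ELLIPSIS = "\u2026"  # single-character ellipsis (an assumption; the text above writes "...")
RLM = "\u200f"  # RIGHT-TO-LEFT MARK, an invisible strong right-to-left character


def normalize_whitespace(text):
    """Collapse every run of whitespace (including line breaks) into one space."""
    return " ".join(text.split())


def last_strong_direction(text):
    """Return "rtl" or "ltr" based on the last strongly directional character."""
    for char in reversed(text):
        bidi_class = unicodedata.bidirectional(char)
        if bidi_class in ("R", "AL"):
            return "rtl"
        if bidi_class == "L":
            return "ltr"
    return "ltr"  # no strong character found: assume left-to-right


def truncate_label(text, max_len=30):
    """Normalize whitespace, then shorten the label and append an ellipsis."""
    text = normalize_whitespace(text)
    if len(text) <= max_len:
        return text
    kept = text[:max_len]
    if last_strong_direction(kept) == "rtl":
        # The mark makes the neutral ellipsis resolve as right-to-left.
        return kept + ELLIPSIS + RLM
    return kept + ELLIPSIS
```

Without the mark, an ellipsis at the end of a right-to-left run takes the direction of the surrounding paragraph, so in a left-to-right context it would be drawn on the right, next to the beginning of the text rather than its end.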

main: [screenshot]

this PR: [screenshot]

wider screenshots

main: [screenshot]

this PR: [screenshot]

@jeromedockes jeromedockes marked this pull request as ready for review September 30, 2024 12:56
@jeromedockes jeromedockes marked this pull request as draft September 30, 2024 13:07
@jeromedockes jeromedockes marked this pull request as ready for review September 30, 2024 13:50
@jeromedockes
Member Author

Ok this one is ready for review. @jovan-stojanovic could you check that it fixes #1095 for you?

@jeromedockes jeromedockes added this to the 0.4.0 milestone Oct 7, 2024
@jovan-stojanovic
Member

Thanks @jeromedockes, works perfectly well (verified on more than 4 million tweets)! Merging when the conflicts are solved

Member

@GaelVaroquaux GaelVaroquaux left a comment

Looks great!

One small comment: can you add a comment about the fact that there is a direction-changing character, which could be picked up as a security risk?

@jeromedockes
Member Author

> One small comment: can you add a comment about the fact that there is a direction-changing character, which could be picked up as a security risk?

thanks @GaelVaroquaux! in the tests I replaced the literal bidirectional characters with unicode escape sequences so that the issue doesn't come up, and added a comment explaining why the literal characters make the source code display look messed up

I also added an example in the script that generates example reports for testing that we can visually check
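
To illustrate the escape-sequence approach mentioned above, here is a small sketch of a check for literal bidirectional control characters in source text. `find_bidi_controls` and the list of characters are assumptions made for this example, not taken from the skrub test suite:

```python
import unicodedata

# Explicit bidirectional formatting characters (marks, embeddings, overrides,
# isolates), written as escapes so this file contains none of them literally.
BIDI_CONTROLS = frozenset(
    "\u061c\u200e\u200f\u202a\u202b\u202c\u202d\u202e\u2066\u2067\u2068\u2069"
)


def find_bidi_controls(source):
    """Return (index, code point, Unicode name) for each literal bidi control."""
    return [
        (index, f"U+{ord(char):04X}", unicodedata.name(char))
        for index, char in enumerate(source)
        if char in BIDI_CONTROLS
    ]


# The same line of test code in two spellings. The literal variant is built
# with chr() so that this example itself stays free of control characters.
with_literal = 'label = "abc' + chr(0x200F) + '"'
with_escape = 'label = "abc\\u200f"'

print(find_bidi_controls(with_literal))
print(find_bidi_controls(with_escape))
```

Both spellings produce the same string at runtime, but only the escaped one keeps the source file free of invisible characters that change how editors and review tools display the line.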

@jeromedockes
Member Author

> Thanks @jeromedockes, works perfectly well (verified on more than 4 million tweets)! Merging when the conflicts are solved

thanks a lot for checking @jovan-stojanovic !

@GaelVaroquaux GaelVaroquaux merged commit 41863c8 into skrub-data:main Oct 7, 2024
22 checks passed
@GaelVaroquaux
Member

Great! Merged!

@jeromedockes jeromedockes deleted the text-in-matplotlib-svg branch October 7, 2024 14:24
Linked issue: Table Report unable to read emojis #1095
3 participants