[BUG] Inconsistent Handling of Unicode Line Breaks (U+2028, U+2029) #924

AnonymouX47 · 2024-08-31T01:54:06Z

Description:

The Unicode codepoints U+2028 (Line Separator) and U+2029 (Paragraph Separator) are kind of treated as line breaks by Text.pack() but as normal zero-width symbols by Text.render(), which keeps the codepoints in its output instead of omitting them, breaking the text there and padding the line as necessary.

I said "kind of" because pack() returns the width as though the text is broken by the codepoint(s) but the height is always 1, regardless of how many of these codepoints there are in the text.

By the way, urwid.calc_width() treats them as zero-width symbols.

Affected versions (if applicable)

master branch (specify commit)
Latest stable version from pypi
Other (specify source)

Steps to reproduce (if applicable)

Text.pack():

>>> urwid.Text("123\u202812345").pack()
(5, 1)
>>> urwid.Text("123\u20281234").pack()
(4, 1)
>>> urwid.Text("1234\u20281234").pack()
(4, 1)
>>> urwid.Text("12345\u20281234").pack()
(5, 1)
>>> urwid.Text("123\u20281234\u202912").pack()
(4, 1)
>>> urwid.Text("123\u202812\u20291234").pack()
(4, 1)
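
For illustration, the tuples above are what you would get if the width were computed over `str.splitlines()` (which treats U+2028 and U+2029 as line boundaries) while the height only counted "\n". This is a guess at the mechanism, not something confirmed in urwid's source here; `pack_like` is a hypothetical stand-in, and `len()` is used as the column width, which only holds for the ASCII text in these examples.

```python
from __future__ import annotations


def pack_like(text: str) -> tuple[int, int]:
    """Hypothetical model of the reported Text.pack() results (not urwid code)."""
    # str.splitlines() splits on U+2028 and U+2029 as well as "\n" ...
    lines = text.splitlines() or [""]
    # Assumption: every character is one column wide (true for ASCII only).
    width = max(len(line) for line in lines)
    # ... but counting only "\n" ignores the Unicode separators entirely.
    height = text.count("\n") + 1
    return width, height


print(pack_like("123\u202812345"))        # (5, 1)
print(pack_like("123\u20281234\u202912"))  # (4, 1)
```

Under that assumption, the width shrinks as if the text were broken at the separators, while the height stays at 1.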

Text.render():

>>> urwid.Text("123\u202812345").render(()).text
[b'123\xe2\x80\xa812', b'345  ']
>>> urwid.Text("123\u202812\u20291234").render(()).text
[b'123\xe2\x80\xa81', b'2\xe2\x80\xa9123', b'4   ']
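
The render() output above is consistent with wrapping at the width reported by pack() while counting the separators as zero columns. A minimal sketch of that model follows; `wrap_zero_width` and `char_width` are hypothetical names, the one-column-per-character rule is an assumption that only holds for ASCII, and none of this is urwid's actual layout code.

```python
ZERO_WIDTH = {"\u2028", "\u2029"}


def char_width(ch: str) -> int:
    # Assumption: the separators take no columns, everything else takes one.
    return 0 if ch in ZERO_WIDTH else 1


def wrap_zero_width(text: str, cols: int) -> list:
    """Break text every `cols` columns and pad each line with spaces."""
    lines, current, used = [], "", 0
    for ch in text:
        w = char_width(ch)
        if used + w > cols:
            # Line is full: pad it and start a new one.
            lines.append(current + " " * (cols - used))
            current, used = "", 0
        current += ch
        used += w
    lines.append(current + " " * (cols - used))
    return [line.encode("utf-8") for line in lines]


print(wrap_zero_width("123\u202812345", 5))
# [b'123\xe2\x80\xa812', b'345  ']
```

With `cols` set to 5 and 4 (the widths pack() reported for these two strings), the sketch reproduces both outputs shown above, including the separators left inside the rendered lines.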

Expected/actual outcome

I honestly don't mind whether they're treated as actual line breaks or as plain zero-width symbols that get escaped (replaced with "?") like other whitespace codepoints such as U+0009 (Horizontal Tab, \t). All I care about is that they're handled consistently.
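
Until the handling is made consistent, a caller could pick one of the two behaviours before the text reaches the widget. The helpers below are a caller-side sketch with hypothetical names, not part of urwid and not a proposed fix for it.

```python
SEPARATORS = ("\u2028", "\u2029")


def separators_to_newlines(text: str) -> str:
    """Option 1: treat U+2028/U+2029 as real line breaks."""
    for sep in SEPARATORS:
        text = text.replace(sep, "\n")
    return text


def separators_to_placeholder(text: str, placeholder: str = "?") -> str:
    """Option 2: escape them like other unprintable whitespace."""
    for sep in SEPARATORS:
        text = text.replace(sep, placeholder)
    return text


sample = "123\u202812\u20291234"
print(repr(separators_to_newlines(sample)))     # '123\n12\n1234'
print(repr(separators_to_placeholder(sample)))  # '123?12?1234'
```

Either result contains only characters that pack(), render() and calc_width() already agree on.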

Thank you 😃


- Fix: Use `urwid.calc_width()` instead of `urwid.Text.pack()[0]` to compute the screen column width of text.
- Additionally results in a performance improvement. Avoids buggy behaviour of `Text.pack()` (see urwid/urwid#924) and fixes ihabunek/toot#499.

penguinolog · 2024-09-02T06:58:36Z

Looks like calc_string_text_pos should re-implement the full wcwidth.wcswidth internals (https://github.com/jquast/wcwidth/blob/a20c9441aaa42f3ac88be573cf8027229f1e3520/wcwidth/wcwidth.py#L160).
At the moment it does not provide any public methods for position calculation.

AnonymouX47 added the bug label Aug 31, 2024

AnonymouX47 mentioned this issue Aug 31, 2024

TUI: Crash when loading a specific account ihabunek/toot#499

Open

penguinolog added the Unicode Issues related to Unicode <-> bytes conversion label Sep 2, 2024

penguinolog self-assigned this Oct 15, 2024
