make sure we work with unicode #6

Open
nrc opened this issue Mar 9, 2015 · 10 comments

@nrc
Member

nrc commented Mar 9, 2015

In particular, we use byte positions where we should use char positions in many places. Furthermore, when we do use char positions, we don't check the 'physical' width of the character.
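To make the byte/char distinction concrete, here is a minimal sketch in plain Rust. It is not taken from rustfmt; the helper names `byte_column` and `char_column` are hypothetical and exist only for illustration. It shows how the same position in a line gets two different column numbers as soon as the line contains non-ASCII characters.

```rust
/// Byte offset of the first occurrence of `needle` in `line`.
/// (Hypothetical helper, not rustfmt code.)
fn byte_column(line: &str, needle: &str) -> Option<usize> {
    line.find(needle)
}

/// The same position, counted in chars (Unicode scalar values).
/// `find` always returns an offset on a char boundary, so the slice is safe.
fn char_column(line: &str, needle: &str) -> Option<usize> {
    line.find(needle)
        .map(|byte_idx| line[..byte_idx].chars().count())
}
```

For the line `let μ = σ; // note`, `byte_column(line, "//")` is `Some(13)` while `char_column(line, "//")` is `Some(11)`, because μ and σ each take two bytes in UTF-8. Neither number is the 'physical' width: wide or zero-width characters would need a separate display-width measure, which the standard library does not provide.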

@tbu-
Contributor

tbu- commented Apr 30, 2015

Byte positions should be fine, and they're also more performant than character indices.

@vessd

vessd commented Jan 19, 2016

I think that rustfmt should use char positions at least for the width of the line. Otherwise, writing error messages in one's native language makes it necessary to set max_width = 180; without that, the line is simply not processed.

@bradjc

bradjc commented Jan 6, 2017

When using many unicode characters in a line I get a series of:

Rustfmt failed at process.rs:894: line exceeded maximum length (sorry)

Even though my line is only 71 characters long.
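A minimal sketch of why a 71-character line can trip a length limit, assuming the check measures bytes (this illustrates the suspected cause, it is not rustfmt's actual implementation). The function names are hypothetical, and the limit of 100 mirrors rustfmt's default `max_width`.

```rust
/// rustfmt's default `max_width`; used here only as an illustrative limit.
const MAX_WIDTH: usize = 100;

/// Length check that measures the line in UTF-8 bytes.
fn fits_by_bytes(line: &str, max_width: usize) -> bool {
    line.len() <= max_width
}

/// Length check that measures the line in chars (Unicode scalar values).
fn fits_by_chars(line: &str, max_width: usize) -> bool {
    line.chars().count() <= max_width
}
```

A line made of 71 Cyrillic letters, such as `"я".repeat(71)`, is 71 chars but 142 bytes. `fits_by_chars(&line, MAX_WIDTH)` is true and `fits_by_bytes(&line, MAX_WIDTH)` is false, while the byte check passes again with a limit of 180.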

iliekturtles added a commit to iliekturtles/rustfmt that referenced this issue Apr 2, 2017
Resolves rust-lang#1335. Does not attempt to handle a `\r` not followed by a `\n` nor
attempt to handle Unicode intricacies (rust-lang#6) including zero-width or multi-byte
characters.
@czipperz

czipperz commented May 31, 2019

It appears that rustfmt replaces many non-ASCII chars (such as σ or μ) with a space (test case: 'μ' will be converted to ' ').

@scampi
Contributor

scampi commented Jun 10, 2019

@czipperz Can you share an example where this happens?

@czipperz

Yes. https://github.com/czipperz/rust-comp/blob/b1e3df1f7a04f99e0c0a7bac7f97d715c43ab187/rust-comp-front/src/pos.rs#L78 . It's possible it's a problem with emacs. When I run M-x rust-fmt-buffer it replaces unicode characters with spaces. But cargo fmt --all works fine.

@scampi
Contributor

scampi commented Jun 11, 2019

@czipperz It does look Emacs-related. If you can find out how Emacs calls rustfmt, we can try to reproduce the bug in case it really is on the rustfmt side.

@hcsch

hcsch commented Oct 1, 2020

The code in this gist was formatted with rustfmt in the Rust playground. As can be seen, the comments behind the string containing (non-ASCII) unicode characters seem rather haphazardly "aligned". I believe this to be related to this issue, and likely the cause is use of byte lengths instead of unicode string lengths, but do correct me if I'm wrong. (Yes, this particular example might be a bit of a niche case, but I can imagine more legitimate situations where similar issues would arise)
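The suspected mechanism can be sketched as follows. This is an illustration under the assumption that the padding before a trailing comment is computed from a byte length; it is not rustfmt's alignment code, and `align_comment`, `byte_len` and `char_len` are hypothetical names.

```rust
/// Pads `code` with spaces so that `comment` starts at column `comment_col`,
/// where the width already used by `code` is determined by `measure`.
fn align_comment(
    code: &str,
    comment: &str,
    comment_col: usize,
    measure: fn(&str) -> usize,
) -> String {
    let used = measure(code);
    // saturating_sub avoids underflow when the code is already past the column.
    let padding = comment_col.saturating_sub(used);
    format!("{}{}{}", code, " ".repeat(padding), comment)
}

/// Width in UTF-8 bytes.
fn byte_len(s: &str) -> usize {
    s.len()
}

/// Width in chars (Unicode scalar values).
fn char_len(s: &str) -> usize {
    s.chars().count()
}
```

With a target column of 10, `"abc",` gets four spaces of padding under either measure. `"äöü",` is 6 chars but 9 bytes, so `byte_len` yields only one space and the comment lands three columns too far left, whereas `char_len` keeps it aligned. Counting chars is still only an approximation of display width for wide or combining characters.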

@Cldfire

Cldfire commented Feb 23, 2021

Running into the same issue as @hcsch with a codebase at work:

image

The unicode strings are causing the comment alignment to go haywire.

