Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

urlencoded: Support iso-8859-1, utf8 sentinel, and numeric entities #326

Merged
merged 27 commits into from
Jul 30, 2024

Conversation

papandreou
Copy link

@papandreou papandreou commented Jul 31, 2018

Fixes #194

Further elaborations here: ljharb/qs#268

Supports both the simple and extended parsers as outlined in ljharb/qs#268 (comment) -- and doesn't use the equivalent capabilities of the qs module.

Note: This will fail in node.js 0.8 and 0.10 as the capability to specify a custom decodeURIComponent implementation to use wasn't added until 0.12.

@dougwilson
Copy link
Contributor

Note: This will fail in node.js 0.8 and 0.10 as the capability to specify a custom decodeURIComponent implementation to use wasn't added until 0.12.

We can hold the PR for the 2.0 release, which will drop support for those versions 👍

@papandreou
Copy link
Author

@dougwilson, that sounds fair! Does it look good otherwise?

@Lurow
Copy link

Lurow commented Sep 10, 2018

Hi there,

Any news on v2.0 release date with the pr?

Cheers

@thopaw
Copy link

thopaw commented Jan 9, 2019

Any news to iso-8859-1 support?

@papandreou
Copy link
Author

Fixed conflict with master.

@dougwilson, this has been sitting for some time now 😅

I'm not really in a hurry, but how about getting this released soon? I can remove those ancient node versions from .travis.yml if that would be a help? (I guess it might be a nice occasion to drop more old versions)

@dougwilson
Copy link
Contributor

The reason it has been sitting is just because it cannot be merged into the 1.x line due to the incompatibility. Because this module is a part of express, it effectively inherits it's support policy. The 1.x series of this module won't end up out of support for some time, and the count down starts with the next express release.

In order to reduce the support burden by maintaining multiple major versions of this module, I am just waiting to make the 2.0 in coordination with express 5.

If this could be made to work in those node.js versions I would of course be happy to land right away.

@papandreou
Copy link
Author

papandreou commented Aug 15, 2019

Okay, thanks for the update, that makes sense!

If this could be made to work in those node.js versions I would of course be happy to land right away.

Unfortunately I think this would involve us to stop relying on the built-in querystring module in simple mode. Either by copying in a newer implementation than the one in 0.10, or by figuring out a way to use the qs module in simple mode as well, disabling the advanced behaviors. That seems a bit far-reaching, unless you were already planning to go in that direction?

@ljharb
Copy link
Contributor

ljharb commented Aug 15, 2019

@papandreou if there's manageable changes i could make in qs that would allow body-parser to switch to it for everything, i'm willing to make those.

@dougwilson
Copy link
Contributor

And I would be happy to use qs for everything here as well. The downside to using querystring for the simple parser is that since it ships as part of Node.js core, the results of this module will vary by both the version of this module and the version of Node.js (vs only varying by the version of this module).

@ljharb
Copy link
Contributor

ljharb commented Aug 16, 2019

@dougwilson if you can save me a few minutes and link me to a branch that uses qs for the simple parser, but is failing tests, i can use npm link and see what options and changes it would take to get it working :-D 🙏

@dougwilson
Copy link
Contributor

@ljharb this is likely not even correct for params to qs (I passed depth: -1 since 0 is not the right behavior, but idk), but it's at least something that should help :)

Branch: https://github.com/expressjs/body-parser/tree/urlencoded-qs-simple
Test: npm i && npm test -- --grep=bodyParser.urlencoded

@ljharb
Copy link
Contributor

ljharb commented Aug 16, 2019

give me 48 hours, i'll see what i can do

@ljharb
Copy link
Contributor

ljharb commented Aug 16, 2019

body-parser tests on that branch now pass with ljharb/qs#326 (comment), and I can merge and release that in qs as soon as I've convinced myself what semver it is.

@dougwilson
Copy link
Contributor

Ah, yes, so depth: 0 was supposed to make sense after all :D . Yes, that is a hard decision. But if it helps at all, this module and Express do not directly expose the qs arguments, making whichever semver it is a non issue to us :)

@ljharb
Copy link
Contributor

ljharb commented Aug 16, 2019

What I'm probably leaning towards is releasing it as a minor, and if users report breakage, reverting the depth 0 part, but keeping the depth false part - so express thus should use depth false to ensure continuity.

@ljharb
Copy link
Contributor

ljharb commented Aug 17, 2019

v6.8.0 of qs is published.

@DarkBitz
Copy link

Is there a specific reason why the pull request is not being merged?

papandreou added a commit to papandreou/body-parser that referenced this pull request Jul 23, 2024
papandreou added a commit to papandreou/body-parser that referenced this pull request Jul 23, 2024
wesleytodd pushed a commit that referenced this pull request Jul 24, 2024
* Always use the qs module, even in simple mode

#326 (comment)

* Create the simple and extended parser with the same function, reducing duplication

* Don't mention the querystring module in the README

* Fix lint
@wesleytodd wesleytodd changed the base branch from master to 2.x July 26, 2024 21:44
@wesleytodd
Copy link
Member

I merged #387 yesterday, I am trying to catch up on what was going on here but was thinking it might be a better idea to see if you (@papandreou) wanted to get this PR updated with that in mind? Or are we calling this not necessary anymore?

@ljharb
Copy link
Contributor

ljharb commented Jul 26, 2024

Note: This will fail in node.js 0.8 and 0.10 as the capability to specify a custom decodeURIComponent implementation to use wasn't added until 0.12.

means it should probably wait for a major?

@wesleytodd
Copy link
Member

wesleytodd commented Jul 26, 2024

A major is what I am doing! So that is why I am circling back on all these PRs.

EDIT: I should have pointed out the "changed the base branch from master to 2.x"

@ljharb
Copy link
Contributor

ljharb commented Jul 26, 2024

in that case it seems like a great idea

@wesleytodd
Copy link
Member

With that in mind, I am not up on the specifications involved here and am just doing the mechanical turk job of preparing all these stale branches for prime time, can you give a review and TLDR of what we should be aware of with this and #387 going out along with express@5?

Copy link
Contributor

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed some of it, but i'm not familiar enough with body-parser to review the rest.

README.md Outdated
Comment on lines 278 to 281
##### defaultCharset

The default charset to parse as, if not specified in content-type. Must be
either `utf-8` or `iso-8859-1`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this allows changing the charset, which is a nice semver-minor; the default should probably be utf-8

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is. Documented that in 44a6aa9 now.

Comment on lines 285 to 287
Whether to let the value of the `utf8` parameter take precedence as the charset
selector. It requires the form to contain a parameter named `utf8` with a value
of `✓`. Defaults to `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this goes along with defaultCharset

Comment on lines 289 to 292
##### interpretNumericEntities

Whether to decode numeric entities such as `☺` when parsing an iso-8859-1
form. Defaults to `false`.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

and this goes along with the other defaultCharset value

Comment on lines 30 to 41
var charsetBySentinel = {
// This is what browsers will submit when the ✓ character occurs in an
// application/x-www-form-urlencoded body and the encoding of the page containing
// the form is iso-8859-1, or when the submitted form has an accept-charset
// attribute of iso-8859-1. Presumably also with other charsets that do not contain
// the ✓ character, such as us-ascii.
'✓': 'iso-8859-1', // encodeURIComponent('✓')
// These are the percent-encoded utf-8 octets representing a checkmark, indicating
// that the request actually is utf-8 encoded.
'✓': 'utf-8' // encodeURIComponent('✓')
}

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this does confuse me a bit tho; it kind of seems like code that already exists in qs? meaning that the options can just be passed to qs and it'll handle the logic.

altho if someone's using a different querystring parser then the logic might need replicating

Copy link
Author

@papandreou papandreou Jul 27, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

You're correct that these magic values are also found here. I spent some time trying to remember why also having the decoding logic here was warranted. It seems like we can get actually get rid of it and still have the test suite pass!

I wonder if I did it like this because it seemed like a long shot to get changes into the qs module, so I went for body-parser first, because that's what I needed myself. It has been 6 years, though, so I'm not really sure 🤔

* 2.x: (98 commits)
  fix(deps): raw-body@^3.0.0 (expressjs#529)
  Also use the qs module for the simple parser (expressjs#387)
  feat!: remove node less than 18 from ci
  2.0.0-beta.2
  docs: add missing history entry
  tests: enable strict mode
  Remove deprecated bodyParser() combination middleware
  build: remove conditional code coverage
  deps: [email protected]
  deps: [email protected]
  deps: [email protected]
  1.20.2
  Fix strict json error message on Node.js 19 
  deps: [email protected]
  build: [email protected]
  build: [email protected]
  build: [email protected]
  deps: content-type@~1.0.5
  build: [email protected]
  build: [email protected]
  ...
@papandreou
Copy link
Author

I merged #387 yesterday, I am trying to catch up on what was going on here but was thinking it might be a better idea to see if you (@papandreou) wanted to get this PR updated with that in mind? Or are we calling this not necessary anymore?

I think it's still a nice-to-have! I've merged the 2.x branch in and resolved the conflicts now.

Copy link
Contributor

@ljharb ljharb left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - altho i don’t actually see how this is breaking? where is a custom decodeURIComponent implementation used?

@papandreou
Copy link
Author

altho i don’t actually see how this is breaking? where is a custom decodeURIComponent implementation used?

Ah, that referred to the querystring module built into node.js. In node.js 0.12 and above, it supports specifying a custom decodeURIComponent implementation (docs), whereas 0.10 and below didn't (docs).

Before #387, body-parser would use the querystring module when not configured to use the extended parsing mode. I think that meant that without #387, the new features introduced in this PR would only have worked in extended mode in node.js 0.10 and below. Even 6 years ago, node.js 0.12 had been out for 6 years, and it seemed fine to just live with it and do a new major release. And here we (almost) are 😆

@wesleytodd
Copy link
Member

Well either way I don't think we are doing any extra work to land these things in the current majors. We have needed to clean house on much of this for a while, so as long as this is good to land we will be doing it along with a major no matter what. And with that in mind, here I am clicking the merge button. We will be preparing the v2 release soon.

@wesleytodd wesleytodd merged commit 6cea6bd into expressjs:2.x Jul 30, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

iso-8859-1/windows-1252 support?
8 participants