Package com.google.api.client.util.escape (1.45.0)

Character escaping utilities.

Classes

CharEscapers

Utility functions for encoding and decoding URIs.

Escaper

An object that converts literal text into a format safe for inclusion in a particular context (such as an XML document). Typically (but not always), the inverse process of "unescaping" the text is performed automatically by the relevant parser.

For example, an XML escaper would convert the literal string "Foo<Bar>" into "Foo&lt;Bar&gt;" to prevent "<Bar>" from being confused with an XML tag. When the resulting XML document is parsed, the parser API will return this text as the original literal string "Foo<Bar>".
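The conversion described above can be sketched in plain Java. The class name XmlEscapeSketch and its escape method are hypothetical stand-ins written for illustration; they are not part of this package and do not reflect how any escaper in CharEscapers is implemented.

```java
// Hypothetical illustration only; not a class from com.google.api.client.util.escape.
final class XmlEscapeSketch {
  private XmlEscapeSketch() {}

  /** Replaces the five characters that are significant in XML markup with entity references. */
  static String escape(String text) {
    StringBuilder out = new StringBuilder(text.length() + 16);
    for (int i = 0; i < text.length(); i++) {
      char c = text.charAt(i);
      switch (c) {
        case '<':
          out.append("&lt;");
          break;
        case '>':
          out.append("&gt;");
          break;
        case '&':
          out.append("&amp;");
          break;
        case '"':
          out.append("&quot;");
          break;
        case '\'':
          out.append("&apos;");
          break;
        default:
          // Every other character is copied through unchanged.
          out.append(c);
      }
    }
    return out.toString();
  }
}
```

With this sketch, XmlEscapeSketch.escape("Foo<Bar>") returns "Foo&lt;Bar&gt;". The method holds no mutable state, which is the property that makes an escaper safe to share between threads.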

An Escaper instance is required to be stateless, and safe when used concurrently by multiple threads.

Several popular escapers are defined as constants in the class CharEscapers.

PercentEscaper

A UnicodeEscaper that escapes some set of Java characters using the URI percent encoding scheme. The set of safe characters (those which remain unescaped) is specified on construction.

For details on escaping URIs for use in web pages, see RFC 3986, section 2.4 and appendix A.

When encoding a String, the following rules apply:

  • The alphanumeric characters "a" through "z", "A" through "Z" and "0" through "9" remain the same.
  • Any additionally specified safe characters remain the same.
  • If plusForSpace is true, the space character " " is converted into a plus sign "+".
  • All other characters are converted into one or more bytes using UTF-8 encoding. Each byte is then represented by the 3-character string "%XY", where "XY" is the two-digit, uppercase, hexadecimal representation of the byte value.
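The encoding rules listed above can be sketched as a small self-contained method. PercentEncodingSketch, its escape method, and the extraSafeChars parameter name are assumptions made for illustration; only plusForSpace is a name taken from this page, and the sketch is not the PercentEscaper implementation. Unlike a production escaper, it does not validate unpaired surrogates.

```java
import java.nio.charset.StandardCharsets;

// Hypothetical illustration of the rules; not the library's PercentEscaper.
final class PercentEncodingSketch {
  private static final char[] HEX = "0123456789ABCDEF".toCharArray();

  private PercentEncodingSketch() {}

  static String escape(String input, String extraSafeChars, boolean plusForSpace) {
    StringBuilder out = new StringBuilder(input.length());
    int i = 0;
    while (i < input.length()) {
      // Walk by code point so that surrogate pairs are encoded as one character.
      int cp = input.codePointAt(i);
      i += Character.charCount(cp);

      if (isAsciiAlphanumeric(cp) || extraSafeChars.indexOf(cp) >= 0) {
        // Rules 1 and 2: alphanumerics and caller-specified safe characters stay as-is.
        out.appendCodePoint(cp);
      } else if (cp == ' ' && plusForSpace) {
        // Rule 3: space becomes a plus sign when plusForSpace is set.
        out.append('+');
      } else {
        // Rule 4: encode as UTF-8, then write each byte as %XY in uppercase hex.
        byte[] utf8 = new String(Character.toChars(cp)).getBytes(StandardCharsets.UTF_8);
        for (byte b : utf8) {
          out.append('%').append(HEX[(b >> 4) & 0xF]).append(HEX[b & 0xF]);
        }
      }
    }
    return out.toString();
  }

  private static boolean isAsciiAlphanumeric(int cp) {
    return (cp >= 'a' && cp <= 'z') || (cp >= 'A' && cp <= 'Z') || (cp >= '0' && cp <= '9');
  }
}
```

For example, escape("a b/\u00FC", "-_.~", true) yields "a+b%2F%C3%BC": the space becomes "+", the slash becomes "%2F", and U+00FC becomes its two UTF-8 bytes. With plusForSpace set to false, the same space would be written as "%20".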

RFC 3986 defines the set of unreserved characters as the alphanumerics plus "-", "_", "~", and ".". It goes on to state:

URIs that differ in the replacement of an unreserved character with its corresponding percent-encoded US-ASCII octet are equivalent: they identify the same resource. However, URI comparison implementations do not always perform normalization prior to comparison (see Section 6). For consistency, percent-encoded octets in the ranges of ALPHA (A-Z and a-z), DIGIT (0-9), hyphen (-), period (.), underscore (_), or tilde (~) should not be created by URI producers and, when found in a URI, should be decoded to their corresponding unreserved characters by URI normalizers.
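The normalization step the RFC describes, decoding percent-encoded unreserved characters, can be sketched as follows. UnreservedNormalizerSketch is a hypothetical helper written for this illustration and is not part of this package; it also uppercases the hex digits of the escapes it leaves in place, and passes malformed escapes through untouched.

```java
// Hypothetical illustration of RFC 3986 normalization; not a class from this package.
final class UnreservedNormalizerSketch {
  private UnreservedNormalizerSketch() {}

  static String normalize(String uri) {
    StringBuilder out = new StringBuilder(uri.length());
    int i = 0;
    while (i < uri.length()) {
      char c = uri.charAt(i);
      if (c == '%' && i + 2 < uri.length()) {
        int hi = hexValue(uri.charAt(i + 1));
        int lo = hexValue(uri.charAt(i + 2));
        if (hi >= 0 && lo >= 0) {
          int octet = hi * 16 + lo;
          if (isUnreserved(octet)) {
            // Unreserved characters should not stay percent-encoded.
            out.append((char) octet);
          } else {
            // Keep the escape, but use uppercase hexadecimal digits.
            out.append('%')
                .append(Character.toUpperCase(uri.charAt(i + 1)))
                .append(Character.toUpperCase(uri.charAt(i + 2)));
          }
          i += 3;
          continue;
        }
      }
      // Not a well-formed escape: copy the character through unchanged.
      out.append(c);
      i++;
    }
    return out.toString();
  }

  private static int hexValue(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    return -1;
  }

  private static boolean isUnreserved(int b) {
    return (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') || (b >= '0' && b <= '9')
        || b == '-' || b == '.' || b == '_' || b == '~';
  }
}
```

Here normalize("%7Euser/%41%2fb") returns "~user/A%2Fb": the tilde and the letter A are decoded because they are unreserved, while the encoded slash is kept and only has its hex digits uppercased.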

Note: This escaper produces uppercase hexadecimal sequences. From RFC 3986:
"URI producers and normalizers should use uppercase hexadecimal digits for all percent-encodings."

UnicodeEscaper

An Escaper that converts literal text into a format safe for inclusion in a particular context (such as an XML document). Typically (but not always), the inverse process of "unescaping" the text is performed automatically by the relevant parser.

For example, an XML escaper would convert the literal string "Foo<Bar>" into "Foo&lt;Bar&gt;" to prevent "<Bar>" from being confused with an XML tag. When the resulting XML document is parsed, the parser API will return this text as the original literal string "Foo<Bar>".

Handling Unicode correctly matters for several reasons, including potential security issues, so if you are considering implementing a new escaper, you should favor using UnicodeEscaper wherever possible.

A UnicodeEscaper instance is required to be stateless, and safe when used concurrently by multiple threads.

Several popular escapers are defined as constants in the class CharEscapers. To create your own escapers, extend this class and implement the escape(int) method.
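The pattern of escaping one code point at a time can be sketched without the library. CodePointEscaperSketch below is a hypothetical stand-in for UnicodeEscaper, and NonAsciiToNumericEntity is an invented subclass. The char[] return type and the "null means leave unchanged" convention are assumptions of this sketch, so check the UnicodeEscaper Javadoc for the exact signature and for any further abstract methods before extending the real class.

```java
import java.util.Locale;

// Hypothetical stand-in for UnicodeEscaper; not the library class.
abstract class CodePointEscaperSketch {
  /** Returns the replacement for cp, or null if cp needs no escaping (assumed convention). */
  protected abstract char[] escape(int cp);

  /** Escapes a whole string by visiting code points, never splitting a surrogate pair. */
  public final String escape(String text) {
    StringBuilder out = new StringBuilder(text.length());
    int i = 0;
    while (i < text.length()) {
      int cp = text.codePointAt(i);
      char[] replacement = escape(cp);
      if (replacement == null) {
        out.appendCodePoint(cp);
      } else {
        out.append(replacement);
      }
      i += Character.charCount(cp);
    }
    return out.toString();
  }
}

// Invented example subclass: writes every non-ASCII code point as a numeric character reference.
final class NonAsciiToNumericEntity extends CodePointEscaperSketch {
  @Override
  protected char[] escape(int cp) {
    if (cp < 0x80) {
      return null; // ASCII passes through unchanged.
    }
    String hex = Integer.toHexString(cp).toUpperCase(Locale.ROOT);
    return ("&#x" + hex + ";").toCharArray();
  }
}
```

With this sketch, new NonAsciiToNumericEntity().escape("a\uD83D\uDE00") returns "a&#x1F600;". An escaper that worked on individual char values would instead see the two surrogates separately, which is the kind of mistake that code-point-based escaping avoids.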