tokenize-rt

The stdlib `tokenize` module does not roundtrip source exactly: whitespace and backslash-escaped newlines are not represented as tokens. This wrapper around the stdlib provides two additional tokens, `ESCAPED_NL` and `UNIMPORTANT_WS`, and a `Token` data type. Use `src_to_tokens` and `tokens_to_src` to roundtrip.

This library is useful if you're writing a refactoring tool based on Python's tokenization.

Installation

pip install tokenize-rt

Usage

datastructures

`tokenize_rt.Offset(line=None, utf8_byte_offset=None)`

A token offset, useful as a key when cross-referencing the `ast` and the tokenized source.

`tokenize_rt.Token(name, src, line=None, utf8_byte_offset=None)`

Construct a token.

- `name`: one of the token names listed in `token.tok_name`, or `ESCAPED_NL`, or `UNIMPORTANT_WS`
- `src`: the token's source as text
- `line`: the line number that this token appears on
- `utf8_byte_offset`: the UTF-8 byte offset at which this token starts within its line

`tokenize_rt.Token.offset`

Retrieves an Offset for this token.

converting to and from `Token` representations

`tokenize_rt.src_to_tokens(text: str) -> List[Token]`

`tokenize_rt.tokens_to_src(Iterable[Token]) -> str`

additional tokens added by `tokenize-rt`

`tokenize_rt.ESCAPED_NL`

`tokenize_rt.UNIMPORTANT_WS`

helpers

`tokenize_rt.NON_CODING_TOKENS`

A frozenset containing tokens which may appear between others while not affecting control flow or code:

- `COMMENT`
- `ESCAPED_NL`
- `NL`
- `UNIMPORTANT_WS`

`tokenize_rt.parse_string_literal(text: str) -> Tuple[str, str]`

Parse a string literal into its prefix and string content.

>>> parse_string_literal('f"foo"')
('f', '"foo"')

`tokenize_rt.reversed_enumerate(Sequence[Token]) -> Iterator[Tuple[int, Token]]`

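Yields `(index, token)` pairs in reverse order. Iterating from the end keeps earlier indices valid while tokens are inserted or removed, which is useful for rewriting source.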

`tokenize_rt.rfind_string_parts(Sequence[Token], i) -> Tuple[int, ...]`

Find the indices of the string parts of a (joined) string literal.

- `i` should start at the end of the string literal
- returns `()` (an empty tuple) for things which are not string literals

>>> tokens = src_to_tokens('"foo" "bar".capitalize()')
>>> rfind_string_parts(tokens, 2)
(0, 2)
>>> tokens = src_to_tokens('("foo" "bar").capitalize()')
>>> rfind_string_parts(tokens, 4)
(1, 3)

Differences from `tokenize`

- tokenize-rt adds `ESCAPED_NL` for a backslash-escaped newline "token"
- tokenize-rt adds `UNIMPORTANT_WS` for whitespace (discarded in `tokenize`)
- tokenize-rt normalizes string prefixes, even if they are not parsed: for instance, you'll see `Token('STRING', "f'foo'", ...)` even in Python 2.
- tokenize-rt normalizes Python 2 long literals (`4l` / `4L`) and octal literals (`0755`) in Python 3 (for easier rewriting of Python 2 code while running Python 3).
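The whitespace difference can be observed with the stdlib alone. The sketch below joins the token strings produced by `tokenize` and shows that the spacing between tokens is not recoverable from them; the sample source is an assumption for illustration.

```python
import io
import tokenize

src = 'x  =  1\n'

# The stdlib tokenizer records positions, but emits no token for
# the whitespace between 'x', '=' and '1'.
stdlib_tokens = tokenize.generate_tokens(io.StringIO(src).readline)
joined = ''.join(tok.string for tok in stdlib_tokens)

print(repr(joined))  # 'x=1\n'
```

With tokenize-rt, the same join over `Token.src` reproduces the original text because the whitespace is carried by `UNIMPORTANT_WS` tokens.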

