Convert text to JSON using only regular expressions

Read-only mirror of https://gitlab.com/tozd/regex2json

The main motivation for this tool is to convert traditional text-based (and line-based) logs to JSON for programs that do not support JSON logs themselves. It can be used online (piping the output of the program into regex2json, e.g., as a log processor in the runit and dinit init systems) or offline (to process logs stored in files). The tool is more general, though, and can enable any workflow where you prefer operating on JSON instead of text. It works especially well when combined with jq.

Features:

  • Reads stdin line by line, converting each line to JSON written to stdout.
  • Supports transformations of matched capture groups by specifying the transformation as the capture group's name.
  • A transformation consists of a series of operators (e.g., parsing numbers, timestamps, creating arrays and objects).
  • Supports a regexp matching a line multiple times, combining all matches into one JSON output.

Installation

The releases page contains a list of stable versions. Each includes:

  • Statically compiled binaries.
  • Docker images.

You should just download/use the latest one.

The tool is implemented in Go. You can also use go install to install the latest stable (released) version:

go install gitlab.com/tozd/regex2json/cmd/regex2json@latest

To install the latest development version (main branch):

go install gitlab.com/tozd/regex2json/cmd/regex2json@main

Usage

regex2json reads lines from stdin, matching every line against the provided regexp. If a line matches, values from named capture groups are mapped into the output JSON, which is then written to stdout. If a line does not match, it is written to stderr.

Capture groups' names are compiled into Expressions and describe how matched values are mapped and transformed into the output JSON. See Expression for details on the syntax and Library for available operators.

Any error (e.g., a failed expression) is logged to stderr while the rest of the output JSON is still written out.

If the regexp matches multiple times in a line, all matches are combined into a single JSON output for that line.

Usage:

regex2json <regexp>

Example:

$ while true; do LC_ALL=C date; sleep 1; done | regex2json "(?P<date___time__UnixDate__RFC3339>.+)"
{"date":"2023-06-13T11:26:45Z"}
{"date":"2023-06-13T11:26:46Z"}
{"date":"2023-06-13T11:26:47Z"}

Example:

$ echo '192.168.0.100 - - [13/Jun/2023:13:15:13 +0000] "GET /index.html HTTP/1.1" 200 1234 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36"' | \
  regex2json '^(?P<address>\S+) - (?P<user>\S+) \[(?P<time___time__Nginx__RFC3339>[\w:/]+\s[+\-]\d{4})\] "(?P<method>\S+)\s?(?P<url>\S+)?\s?(?P<http>\S+)?" (?P<status___int>\d{3}) (?:(?P<size___int>\d+)|-) "(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
{"address":"192.168.0.100","agent":"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.124 Safari/537.36","http":"HTTP/1.1","method":"GET","referrer":"-","size":1234,"status":200,"time":"2023-06-13T13:15:13Z","url":"/index.html","user":"-"}

As a package

This is also a Go package. You can add it to your project using go get:

go get gitlab.com/tozd/regex2json

It requires Go 1.20 or newer.

See the full package documentation on pkg.go.dev for details on using regex2json as a Go package.

Contributing

Feel free to make a merge request to add more time layouts and/or operators.

Which regular expression syntax is supported?

regex2json is implemented in Go and uses its standard regexp package for parsing and compiling regular expressions.
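
Because the standard regexp package does the compiling, a pattern can be checked with `regexp.Compile` before it is handed to the tool. The sketch below uses a hypothetical helper, `compiles`, and shows two constructs common in other regex flavors that Go's package does not accept.

```go
package main

import (
	"fmt"
	"regexp"
)

// compiles reports whether Go's regexp package accepts the pattern.
func compiles(pattern string) bool {
	_, err := regexp.Compile(pattern)
	return err == nil
}

func main() {
	for _, p := range []string{
		`(?P<status>\d{3})`, // named capture group: accepted
		`foo(?=bar)`,        // lookahead: rejected by Go's regexp
		`(\w+) \1`,          // backreference: rejected by Go's regexp
	} {
		fmt.Printf("%-20s %v\n", p, compiles(p))
	}
}
```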

Why is the syntax of transformations so awkward?

This is a consequence of the limitation on which characters can be in a capture group name in Go ([A-Za-z0-9_]+). See this issue for more details.
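
The restriction is easy to observe directly. In this small sketch the helper `nameAccepted` is hypothetical, and the `:` and `.` separators are merely examples of characters one might have preferred over underscores.

```go
package main

import (
	"fmt"
	"regexp"
)

// nameAccepted reports whether Go's regexp package accepts name as the
// name of a capture group.
func nameAccepted(name string) bool {
	_, err := regexp.Compile(`(?P<` + name + `>x)`)
	return err == nil
}

func main() {
	for _, name := range []string{
		"status___int", // letters, digits and underscores only: accepted
		"status:int",   // ":" is not allowed in a group name
		"status.int",   // "." is not allowed in a group name
	} {
		fmt.Printf("%-14s %v\n", name, nameAccepted(name))
	}
}
```

With only word characters available, runs of underscores are the remaining way to encode separators inside a name.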

Related projects

  • jc – jc enables the same idea of converting text-based output of programs into JSON, but its focus is on supporting popular programs out of the box. regex2json enables quick transformations by providing a regexp with expressions describing how captured groups are transformed into JSON.

GitHub mirror

There is also a read-only GitHub mirror available, if you need to fork the project there.