Skip to content

Latest commit

 

History

History
276 lines (178 loc) · 13.9 KB

README.md

File metadata and controls

276 lines (178 loc) · 13.9 KB

ndjson-cli

Unix-y tools for operating on newline-delimited JSON streams.

npm install ndjson-cli

Command Line Reference

Options

All ndjson-cli commands support --help and --version. Commands that take an expression also support --require.

# ndjson-command -h
# ndjson-command --help

Output usage information.

# ndjson-command -V
# ndjson-command --version

Output the version number.

# ndjson-command -r [name=]module
# ndjson-command --require [name=]module

Requires the specified module, making it available for use in any expressions used by this command. The loaded module is available as the symbol name. If name is not specified, it defaults to module. (If module is not a valid identifier, you must specify a name.) For example, to sort using d3.ascending from d3-array:

ndjson-sort -r d3=d3-array 'd3.ascending(a, b)' < values.ndjson

Or to require all of d3:

ndjson-sort -r d3 'd3.ascending(a, b)' < values.ndjson

The required module is resolved relative to the current working directory. If the module is not found during normal resolution, the global root is also searched, allowing you to require global modules from the command line.

cat

# ndjson-cat [files…] <>

Sequentially concatenates one or more input files containing JSON into a single newline-delimited JSON on stdout. If files is not specified, it defaults to “-”, indicating stdin. This command is especially useful for converting pretty-printed JSON (that contains newlines) into newline-delimited JSON. For example, to print the binaries exported by this repository’s package.json:

ndjson-cat package.json | ndjson-split 'Object.keys(d.bin)'

filter

# ndjson-filter [expression] <>

Filters the newline-delimited JSON stream on stdin according to the specified expression: if the expression evaluates truthily for the given value d at the given zero-based index i in the stream, the resulting value is output to stdout; otherwise, it is ignored. If expression is not specified, it defaults to true. This program is much like array.filter.

For example, given a stream of GeoJSON features from shp2json, you can filter the stream to include only the multi-polygon features like so:

shp2json -n example.shp | ndjson-filter 'd.geometry.type === "MultiPolygon"'

Or, to skip every other feature:

shp2json -n example.shp | ndjson-filter 'i & 1'

Or to take a random 10% sample:

shp2json -n example.shp | ndjson-filter 'Math.random() < 0.1'

Side-effects during filter are allowed. For example, to delete a property:

shp2json -n example.shp | ndjson-filter 'delete d.properties.FID, true'

map

# ndjson-map [expression] <>

Maps the newline-delimited JSON stream on stdin according to the specified expression: outputs the result of evaluating the expression for the given value d at the given zero-based index i in the stream. If expression is not specified, it defaults to d. This program is much like array.map.

For example, given a stream of GeoJSON features from shp2json, you can convert the stream to geometries like so:

shp2json -n example.shp | ndjson-map 'd.geometry'

Or you can extract the properties, and then convert to tab-separated values:

shp2json -n example.shp | ndjson-map 'd.properties' | json2tsv -n > example.tsv

You can also add new properties to each object by assigning them and then returning the original object:

shp2json -n example.shp | ndjson-map 'd.properties.version = 2, d' | json2tsv -n > example.v2.tsv

reduce

# ndjson-reduce [expression [initial]] <>

Reduces the newline-delimited JSON stream on stdin according to the specified expression. For each value in the input stream, evaluates the expression for the given value d at the given zero-based index i in the stream and the previous value p, which is initialized to initial. If expression and initial are not specified, they default to p.push(d), p and [], respectively, merging all input values into a single array (the inverse of ndjson-split). Otherwise, if initial is not specified, the first time the expression is evaluated p will be equal to the first value in the stream (i = 0) and d will be equal to the second (i = 1). Outputs the last result when the stream ends. This program is much like array.reduce.

For example, to count the number of values in a stream of GeoJSON features from shp2json, like wc -l:

shp2json -n example.shp | ndjson-reduce 'p   1' '0'

To merge a stream into a feature collection:

shp2json -n example.shp | ndjson-reduce 'p.features.push(d), p' '{type: "FeatureCollection", features: []}'

Or equivalently, in two steps:

shp2json -n example.shp | ndjson-reduce | ndjson-map '{type: "FeatureCollection", features: d}'

To convert a newline-delimited JSON stream of values to a JSON array, the inverse of ndjson-split:

ndjson-reduce < values.ndjson > array.json

split

# ndjson-split [expression] <>

Expands the newline-delimited JSON stream on stdin according to the specified expression: outputs the results of evaluating the expression for the given value d at the given zero-based index i in the stream. The result of evaluating the expression must be an array (though it may be the empty array if no values should be output for the given input). If expression is not specified, it defaults to d, which assumes that the input values are arrays.

For example, given a single GeoJSON feature collection from shp2json, you can convert a stream of features like so:

shp2json example.shp | ndjson-split 'd.features'

To convert a JSON array to a newline-delimited JSON stream of values, the inverse of ndjson-reduce:

ndjson-split < array.json > values.ndjson

join

# ndjson-join [expression₀ [expression₁]] file₀ file₁ <>

Joins the two newline-delimited JSON streams in file₀ and file₁ according to the specified key expressions expression₀ and expression₁. For each value d₀ at the zero-based index i₀ in the stream file₀, the corresponding key is the result of evaluating the expression₀. Similarly, for each value d₁ at the zero-based index i₁ in the stream file₁, the corresponding key is the result of evaluating the expression₁. When both input streams end, for each distinct key, the cartesian product of corresponding values d₀ and d₁ are output as an array [d₀, d₁].

If expression₀ is not specified, it defaults to i, joining the two streams by line number; in this case, the length of the output stream is the shorter of the two input streams. If expression₁ is not specified, it defaults to expression₀.

For example, consider the CSV file a.csv:

name,color
Fred,red
Alice,green
Bob,blue

And b.csv:

name,number
Fred,21
Alice,42
Bob,102

To merge these into a single stream by name using csv2json:

ndjson-join 'd.name' <(csv2json -n a.csv) <(csv2json -n b.csv)

The resulting output is:

[{"name":"Fred","color":"red"},{"name":"Fred","number":"21"}]
[{"name":"Alice","color":"green"},{"name":"Alice","number":"42"}]
[{"name":"Bob","color":"blue"},{"name":"Bob","number":"102"}]

To consolidate the results into a single object, use ndjson-map and Object.assign:

ndjson-join 'd.name' <(csv2json -n a.csv) <(csv2json -n b.csv) | ndjson-map 'Object.assign(d[0], d[1])'

# ndjson-join --left
# ndjson-join --right
# ndjson-join --outer

Specify the type of join: left, right, or outer. Empty values are output as null. If none of these arguments are specified, defaults to inner.

sort

# ndjson-sort [expression] <>

Sorts the newline-delimited JSON stream on stdin according to the specified comparator expression. After reading the entire input stream, sorts the array of values with a comparator that evaluates the expression for two given values a and b from the input stream. If the resulting value is less than 0, then a appears before b in the output stream; if the value is greater than 0, then a appears after b in the output stream; any other value means that the partial order of a and b is undefined. If expression is not specified, it defaults to ascending natural order.

For example, to sort a stream of GeoJSON features by their name property:

shp2json -n example.shp | ndjson-sort 'a.properties.name.localeCompare(b.properties.name)'

top

# ndjson-top [expression] <>

Selects the top n values (see -n) from the newline-delimited JSON stream on stdin according to the specified comparator expression. (This selection algorithm is implemented using partial heap sort.) After reading the entire input stream, outputs the top n values in ascending order. As with ndjson-sort, the input values are compared by evaluating the expression for two given values a and b from the input stream. If the resulting value is less than 0, then a appears before b in the output stream; if the value is greater than 0, then a appears after b in the output stream; any other value means that the partial order of a and b is undefined. If expression is not specified, it defaults to ascending natural order. If the input stream has fewer than n values, this program is equivalent to ndjson-sort.

For example, to output the GeoJSON feature with the largest size property:

shp2json -n example.shp | ndjson-top 'a.properties.size - b.properties.size'

This program is equivalent to ndjson-sort expression | tail -n count, except it is much more efficient if n is smaller than the size of the input stream.

# ndjson-top -n count

Specify the maximum number of output values. Defaults to 1.

Recipes

To count the number of values in a stream:

shp2json -n example.shp | wc -l

To reverse a stream:

shp2json -n example.shp | tail -r

To take the first 3 values in a stream:

shp2json -n example.shp | head -n 3

To take the last 3 values in a stream:

shp2json -n example.shp | tail -n 3

To take all but the first 3 values in a stream:

shp2json -n example.shp | tail -n  4