data notation for the conveyance of values
na is a simple yet flexible data notation format.
Its value types represent a minimal set of common data types.
Notation matters.
The syntax should be simple, ergonomic, flexible, reliable and secure.
It should be a solid foundation for a wide range of use cases.
- #truth – Boolean truth values
- #number – arbitrary precision numbers
- #text – a sequence of Unicode scalar values
- #block – a sequence of linear/associative values
na meets UAX31-R1-2 by using a profile of UAX31-R1-1, adding the optional start character _ (low line) and the optional medial character - (hyphen-minus). In the syntax of UAX31-D1:
<Identifier> := <Start> <Continue>* (<Medial> <Continue>+)*
<Start> := XID_Start + U+005F
<Continue> := XID_Continue
<Medial> := U+002D
That is, a name must conform with UAX31-R1-2 and:
- may start with, contain and end with _ (low line)
- may contain but cannot start or end with - (hyphen-minus)
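The naming rules can be checked with a short Python sketch. It relies on Python's own identifier rules, which use XID_Start plus the low line for the first character and XID_Continue for the rest, so `str.isidentifier()` can stand in for the start/continue part of the grammar. The function name `is_valid_name` is hypothetical and not part of na.

```python
def is_valid_name(name: str) -> bool:
    """Check a name: identifier segments joined by single hyphen-minus characters."""
    first, *rest = name.split("-")  # U+002D is the only medial character
    # First segment: one start character followed by continue characters.
    if not first.isidentifier():
        return False
    # Later segments: one or more continue characters. Prefixing "_" supplies
    # a valid start so isidentifier() only has to judge continue characters.
    return all(segment and ("_" + segment).isidentifier() for segment in rest)
```

With this sketch, `frame-rate`, `_private` and `trailing_` are accepted, while `-x`, `x-` and `a--b` are rejected.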
Names are case insensitive, meeting UAX31-R4 with normalization form KC and UAX31-R5 with full case folding. Implementations should ignore default ignorable code points in comparison.
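As a rough illustration of case-insensitive comparison, the Python sketch below builds a comparison key from NFKC normalization and full case folding. It is an approximation: it does not remove default ignorable code points, since the standard library does not expose that property. The names `name_key` and `same_name` are hypothetical.

```python
import unicodedata


def name_key(name: str) -> str:
    """Approximate comparison key: NFKC, full case folding, then NFKC again."""
    folded = unicodedata.normalize("NFKC", name).casefold()
    # Case folding can denormalize some strings, so normalize once more.
    return unicodedata.normalize("NFKC", folded)


def same_name(a: str, b: str) -> bool:
    """Two names are treated as equal when their comparison keys match."""
    return name_key(a) == name_key(b)
```

Under this sketch, `Foo` and `foo` compare equal, as do `Straße` and `STRASSE`.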
When compiling to a target language that does not support kebab-case, names may be transliterated to a compatible case style that maintains the separation of words within a name.
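One possible transliteration, sketched in Python: hyphen-separated words become snake_case or camelCase. Both helper names are hypothetical, and the sketch ignores collisions (for example `foo-bar` and `foo_bar` map to the same snake_case name), which a real compiler would need to handle.

```python
def to_snake_case(name: str) -> str:
    """Replace each hyphen-minus with a low line, keeping word boundaries."""
    return name.replace("-", "_")


def to_camel_case(name: str) -> str:
    """Drop each hyphen-minus and capitalize the first character of the following word."""
    first, *rest = name.split("-")
    return first + "".join(word[:1].upper() + word[1:] for word in rest)
```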
-- this is a comment
Boolean truth values, represented with Unicode symbols ⊤ and ⊥.
For the convenience of end users, implementations should allow the words true and false as aliases.
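A minimal Python sketch of how a reader might accept both the symbols and the word aliases. The table and function names are hypothetical, and the sketch matches the words exactly as written, since the text does not say whether the aliases are case insensitive.

```python
TRUTH_LITERALS = {"⊤": True, "⊥": False, "true": True, "false": False}


def parse_truth(token: str) -> bool:
    """Map a truth literal (symbol or word alias) to a Python bool."""
    try:
        return TRUTH_LITERALS[token]
    except KeyError:
        raise ValueError(f"not a truth value: {token!r}") from None
```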
Arbitrary precision signed numbers.
42 -- integer
6.28 -- decimal fraction
1/3 -- rational fraction (ratio)
1.6e-35 -- scientific/exponential notation
1_771_561 -- digit grouping
007 -- leading zeros
48fps -- suffix
99% -- percentage (ratio to 100)
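The decimal forms above can be read without losing precision by using exact rationals. The Python sketch below is one way to do it with `fractions.Fraction`. The regular expression is an assumption about the grammar (for instance, it limits suffixes to ASCII letters), and `parse_number` is a hypothetical name.

```python
import re
from fractions import Fraction

# Assumed grammar: optional sign, digits with "_" grouping, then either
# "/denominator" or an optional fraction part and exponent, then a suffix.
_NUMBER = re.compile(
    r"(?P<value>[+-]?[0-9][0-9_]*"
    r"(?:/[0-9][0-9_]*|(?:\.[0-9][0-9_]*)?(?:[eE][+-]?[0-9]+)?))"
    r"(?P<suffix>%|[A-Za-z]+)?"
)


def parse_number(text: str):
    """Return (exact value, suffix or None) for a decimal number literal."""
    match = _NUMBER.fullmatch(text)
    if match is None:
        raise ValueError(f"not a number: {text!r}")
    # Digit grouping carries no meaning, so drop it before conversion.
    value = Fraction(match["value"].replace("_", ""))
    suffix = match["suffix"]
    if suffix == "%":
        return value / 100, None  # a percentage is a ratio to 100
    return value, suffix
```

For example, `parse_number("99%")` yields the exact ratio 99/100, and `parse_number("48fps")` yields 48 together with the suffix `fps`.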
Radixes from 2 to 36 are supported, using 0…9 and A…Z/a…z as numerals.
2\101010 -- binary
8\755 -- octal
16\decaf -- hexadecimal
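Radix literals map directly onto Python's built-in `int(text, base)`, which accepts the same 2 to 36 range and the same numerals. A minimal sketch, with `parse_radix` as a hypothetical name. Note that `int` is slightly more lenient than a strict parser would be (it tolerates a sign and surrounding whitespace).

```python
def parse_radix(text: str) -> int:
    """Parse a literal of the form radix\\numerals, e.g. 16\\decaf."""
    radix_text, separator, numerals = text.partition("\\")
    if not separator:
        raise ValueError(f"missing radix separator: {text!r}")
    radix = int(radix_text)  # the radix itself is written in decimal
    if not 2 <= radix <= 36:
        raise ValueError(f"radix out of range: {radix}")
    return int(numerals, radix)  # numerals are case insensitive
```

When testing from Python source, use raw strings such as `r"8\755"` so the reverse solidus is not treated as a Python escape.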
A sequence of zero or more Unicode scalar values in UTF-8 encoding.
Single-quoted text is verbatim.
'"verbatim" text'
Double-quoted text supports escape sequences.
"\"escaped\" text"
The following escape sequences are supported:
- \" – quotation mark U+0022
- \\ – reverse solidus U+005C
- \ – line continuation
- \xxxxxx – Unicode code point (6 hexadecimal numerals padded with leading zeros)
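The escape sequences can be decoded in a single pass. The Python sketch below works on the body of a double-quoted text, without the surrounding quotes. It assumes a line continuation is a reverse solidus immediately followed by a newline, and it leaves unrecognized sequences untouched, which a strict implementation might reject instead. The name `unescape` is hypothetical.

```python
import re

# A reverse solidus followed by: a quote or reverse solidus, a newline,
# or exactly six hexadecimal numerals.
_ESCAPE = re.compile(r'\\(?:(["\\])|(\n)|([0-9A-Fa-f]{6}))')


def unescape(body: str) -> str:
    """Decode the escape sequences in the body of a double-quoted text."""

    def replace(match: re.Match) -> str:
        if match.group(1):
            return match.group(1)  # \" or \\ stands for the character itself
        if match.group(2):
            return ""  # line continuation: drop the newline
        code_point = int(match.group(3), 16)
        if code_point > 0x10FFFF or 0xD800 <= code_point <= 0xDFFF:
            raise ValueError(f"not a Unicode scalar value: {match.group(0)}")
        return chr(code_point)

    return _ESCAPE.sub(replace, body)
```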
Multiline texts follow the same rules as Julia's triple-quoted string literals.
'''
this is a "verbatim" text
that's multiline
'''
"""
this is an "escaped" text
that's multiline \01F632
"""
A versatile data structure able to represent both linear and associative collections.
Blocks are enclosed by square brackets ([]). Inline items are separated by commas (,).
Keys are optional and can be either non-negative integers or names.
Associative items are explicitly defined with a colon (:).
Linear items are implicitly given 0-indexed integer keys.
[] -- empty
[ 1, 2, 3 ] -- implicit integer keys (list/array/sequence/stack/queue)
[ 7: true, 42: true ] -- explicit integer keys (sparse array)
[ foo: 42, bar: true ] -- explicit names (record/object/map/structure/dictionary/hash)
Similar to Lua tables, JavaScript objects and Dart records, a block may contain both linear and associative values.
[ 1, 2, 3, length: 3 ] -- a mix of implicitly indexed and explicitly named values
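To make the key rules concrete, here is a Python sketch of one possible in-memory model: a single insertion-ordered mapping in which linear items receive 0-indexed integer keys and associative items keep their explicit keys. The `Keyed` class and `build_block` function are hypothetical. Counting only linear items for the implicit index and rejecting duplicate keys are assumptions, since the text does not specify either.

```python
from dataclasses import dataclass
from typing import Any, Union


@dataclass
class Keyed:
    """An associative item: an explicit key (name or integer) and its value."""
    key: Union[int, str]
    value: Any


def build_block(items: list) -> dict:
    """Build a block from a mix of linear items and Keyed associative items."""
    block: dict = {}
    next_index = 0  # assumed: only linear items advance the implicit index
    for item in items:
        if isinstance(item, Keyed):
            key, value = item.key, item.value
        else:
            key, value = next_index, item
            next_index += 1
        if key in block:
            raise ValueError(f"duplicate key: {key!r}")  # assumed behavior
        block[key] = value
    return block
```

With this model, `[ 1, 2, 3, length: 3 ]` corresponds to `build_block([1, 2, 3, Keyed("length", 3)])`, which gives `{0: 1, 1: 2, 2: 3, "length": 3}`.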
More specific data types may be enforced with extensions.
linear: [
    [1, 2, 3] -- inline items separated by comma
    [4, 5, 6] -- multiline items separated by newline
    [7, 8, 9]
]
associative: [
    foo: [ -- multiline block
        bar: [ baz: true ] -- inline block
    ]
]
Brackets and commas are required for inline blocks and optional for multiline blocks.
Indentation is significant for multiline blocks.
Multiline items may be prefixed with a bullet point (•) for readability.
person:
    name: 'Alan'
    age: 38
    friends:
        • 'Ada'
        • 'Charles'
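A rough Python sketch of how indentation can drive nesting for bracketless multiline blocks like the one above. It is deliberately minimal and not a full parser: it assumes space-only indentation, keeps every value as raw source text, stores all explicit keys as strings, and ignores comments, brackets and inline blocks. The name `parse_multiline` is hypothetical.

```python
import re

# An associative item: a key, a colon, and an optional inline value.
_KEYED = re.compile(r"([\w-]+):\s*(.*)")


def parse_multiline(source: str) -> dict:
    """Parse bracketless multiline blocks into nested dicts of raw values."""
    root: dict = {}
    stack = [(-1, root)]  # (indentation of the opening line, block)
    for raw in source.splitlines():
        if not raw.strip():
            continue  # skip blank lines
        indent = len(raw) - len(raw.lstrip(" "))
        line = raw.strip()
        # Close every block opened at this indentation level or deeper.
        while stack[-1][0] >= indent:
            stack.pop()
        block = stack[-1][1]
        if line.startswith("•"):
            line = line[1:].strip()  # the bullet is purely cosmetic
        match = _KEYED.fullmatch(line)
        if match is None:
            # Linear item: give it the next 0-indexed integer key.
            block[sum(isinstance(key, int) for key in block)] = line
        elif match.group(2) == "":
            # A key with no inline value opens a nested block.
            child: dict = {}
            block[match.group(1)] = child
            stack.append((indent, child))
        else:
            block[match.group(1)] = match.group(2)
    return root
```

Applied to the person example, the sketch returns a `person` block holding `name`, `age` and a nested `friends` block whose two items have the keys 0 and 1.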
Either UTF-8 or a compatible binary format, for example CBOR or a derivative of Nota.
- Lightweight
- Human-friendly
- Line-oriented (newline is significant)
- Indentation-based (indentation is significant)
- Extensible
na is the kesh word for river.