“It is important to view knowledge as a sort of semantic tree. Make sure you understand the fundamental principles, i.e., the trunk and big branches before you get into the leaves/details or there is nothing for them to hang on to.”
semtree is a utility to construct a semantic tree from word lists/indexes which may span multiple objects -- the most likely setup being multiple filenames which map to their file content.
semtree itself is essentially a collection of functions to facilitate the cultivation of this tree, namely lint(), create(), update(), and print(), and it handles the build process via a state machine.
This package can be used in conjunction with treehouze and is compatible with [[wikirefs]], caml and yaml syntaxes.
See context for more about why and how this package is useful.
🌳 Cultivate a "semantic tree" or "knowledge bonsai" in your 🎋 WikiBonsai digital garden.
Install with npm:
npm install semtree
Say we have the following two markdown files:
// file: fname-a
- [[node-1]]
  - [[node-1a]]
  - [[node-2]]
    - [[node-2a]]
    - [[node-2b]]
  - [[fname-b]]
// file: fname-b
- [[node-3]]
- [[node-4]]
If we wanted to create a single tree from both of these files, we can use semtree like so:
import * as semtree from 'semtree';

const opts = {
  wikiLink: true, // defaults to true
};
const rootName: string | undefined = 'fname-a';
// read in files and create a record where
// keys are filenames and values are the file's content
const semTreeText: Record<string, string> = {
  // key: filename; value: file content
  'fname-a':
`- [[node-1]]
  - [[node-1a]]
  - [[node-2]]
    - [[node-2a]]
    - [[node-2b]]
  - [[fname-b]]
`,
  'fname-b':
`- [[node-3]]
- [[node-4]]
`,
};
const tree = semtree.create(semTreeText, rootName, opts);
Which will create a tree that looks like:
graph TD;
node-1-->node-1a;
node-1-->node-2;
node-1-->fname-b;
node-2-->node-2a;
node-2-->node-2b;
fname-b-->node-3;
fname-b-->node-4;
Tree requirements are sparse because the idea is to allow the end-user to determine the shape and structure of their tree in their markdown files. This package merely creates a single, virtual tree so as to better present that unified structure to the end-user.
Parsing:
- Semtree will check for delimiters delineating where the index content is:
    : title : index file
    This is some markdown text.
    <!--semtree-->
    - [[node-a]]
    - [[node-b]]
    - [[node-c]]
    <!--/semtree-->
- This content would return the following as the file content relevant for tree-building:
    - [[node-a]]
    - [[node-b]]
    - [[node-c]]
- If delimiters do not exist, any attribute metadata in caml or yaml format will be stripped:
    ---
    subject: semantic tree
    ---
    : title : index file
    - [[node-a]]
    - [[node-b]]
    - [[node-c]]
- Here, the same content would be identified as the tree content since caml and yaml attrs would be stripped:
    - [[node-a]]
    - [[node-b]]
    - [[node-c]]
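The delimiter lookup described above can be sketched roughly as follows. This is only an illustration: `extractIndexContent` is a hypothetical helper and not part of the semtree API, and it covers only the delimiter case (it does not strip caml or yaml attributes).

```typescript
// Hypothetical helper, not semtree's own parser: returns the text found
// between <!--NAME--> and <!--/NAME-->, or undefined if either is missing.
function extractIndexContent(text: string, delimiter: string = 'semtree'): string | undefined {
  const open = `<!--${delimiter}-->`;
  const close = `<!--/${delimiter}-->`;
  const start = text.indexOf(open);
  if (start === -1) { return undefined; }
  const end = text.indexOf(close, start + open.length);
  if (end === -1) { return undefined; }
  // trim() drops the whitespace surrounding the delimited content
  return text.slice(start + open.length, end).trim();
}

const fileText: string = [
  ': title : index file',
  'This is some markdown text.',
  '<!--semtree-->',
  '- [[node-a]]',
  '- [[node-b]]',
  '- [[node-c]]',
  '<!--/semtree-->',
].join('\n');

const indexContent = extractIndexContent(fileText);
console.log(indexContent);
```

Returning `undefined` when no delimiters are present leaves room for a caller to fall back to the attribute-stripping path.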
Syntax:
- Indentation size defaults to 2 'space's (see options indentKind and indentSize).
- Markdown bullets (-, *) are optional (see option mkdnBullet).
- [[wikilink]] syntax is optional (see option wikiLink).
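To make these syntax rules concrete, here is a small sketch of how a single line could be turned into a level and a node text. `parseLine` and `ParsedLine` are hypothetical names, and passing the indent character directly is a simplification of the indentKind option; semtree's actual parser may work differently.

```typescript
interface ParsedLine {
  level: number; // depth in the tree, 0 for unindented lines
  text: string;  // node text with bullet and brackets removed
}

// Hypothetical helper, not semtree's parser.
function parseLine(line: string, indentSize: number = 2, indentChar: string = ' '): ParsedLine {
  // count leading indentation characters
  let i = 0;
  while (i < line.length && line[i] === indentChar) { i++; }
  const level = Math.floor(i / indentSize);
  let text = line.slice(i).trim();
  // strip an optional markdown bullet
  text = text.replace(/^[-*]\s+/, '');
  // strip optional [[wikilink]] brackets
  const match = /^\[\[(.+)\]\]$/.exec(text);
  if (match !== null) { text = match[1] ?? text; }
  return { level, text };
}

console.log(parseLine('    - [[node-2a]]'));
console.log(parseLine('node-1'));
```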
Validity:
- Every node in the tree should be unique; e.g. each list-item's text should be unique.
- Must be a directed-acyclic-graph (DAG).
- Each level can have any number of nodes.
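The uniqueness requirement can be checked with something like the sketch below. `findDuplicates` is a hypothetical helper for illustration; it is not the check semtree performs internally.

```typescript
// Hypothetical helper: returns every node text that appears more than once.
function findDuplicates(texts: string[]): string[] {
  const seen = new Set<string>();
  const duplicates = new Set<string>();
  for (const text of texts) {
    if (seen.has(text)) {
      duplicates.add(text);
    } else {
      seen.add(text);
    }
  }
  return Array.from(duplicates);
}

console.log(findDuplicates(['node-1', 'node-2', 'node-1']));
```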
Each node in the tree contains:
export interface TreeNode {
  text: string;
  ancestors: string[];
  children: string[];
  // custom data
  [key: string]: any;
}
ancestors: An array of strings that are the text of other nodes in the tree. Represents ancestors of the current node from the root node following the ancestral path to the current node.
children: An array of strings that are the text of other nodes in the tree. Represents children of the current node.
text: Contains the node text, which should be unique across all nodes in the tree and is used as an identifier in each node's other properties, ancestors and children.
Finally, custom data is supported.
The full SemTree looks like this:
interface SemTree {
  root: string;
  nodes: TreeNode[];
  trunk: string[];
  petioleMap: Record<string, string>;
  orphan: string[];
}
root: The text of the root node.
nodes: Contains a flat array of all the TreeNodes in the tree.
trunk: An array of text names of all the index/branch nodes (which typically correspond to the keys of the content hash).
petioleMap: A hash whose keys are the text names of all the nodes in the tree and whose values are the text names of the index/branch node those keys appeared in (e.g. key node-1 yields value fname-a from the example above because node-1 appears in fname-a).
('petiole': "A leaf petiole is a thin stalk that connects a leaf blade to a stem")
orphan: An array of text names of any index/branch nodes from the content hash keys that were left unprocessed after calling create() or update().
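Because nodes is a flat array keyed by text, looking things up is a matter of simple searches. The sketch below restates the two interfaces so it is self-contained; `getNode`, `pathTo`, `sourceOf`, and the sample data are all hypothetical and hand-written for illustration, so real output from create() may differ in detail.

```typescript
interface TreeNode {
  text: string;
  ancestors: string[];
  children: string[];
  [key: string]: any;
}

interface SemTree {
  root: string;
  nodes: TreeNode[];
  trunk: string[];
  petioleMap: Record<string, string>;
  orphan: string[];
}

// Hypothetical helpers; semtree itself may expose different utilities.
function getNode(tree: SemTree, text: string): TreeNode | undefined {
  return tree.nodes.find((node) => node.text === text);
}

// Full path from the root down to the given node, inclusive.
function pathTo(tree: SemTree, text: string): string[] | undefined {
  const node = getNode(tree, text);
  return node ? [...node.ancestors, node.text] : undefined;
}

// Name of the index/branch node in which the given node appeared.
function sourceOf(tree: SemTree, text: string): string | undefined {
  return tree.petioleMap[text];
}

// Hand-written sample data (an assumption, not generated by create()).
const sampleTree: SemTree = {
  root: 'i.main',
  nodes: [
    { text: 'i.main', ancestors: [], children: ['plants'] },
    { text: 'plants', ancestors: ['i.main'], children: ['trees', 'ferns'] },
    { text: 'trees', ancestors: ['i.main', 'plants'], children: [] },
    { text: 'ferns', ancestors: ['i.main', 'plants'], children: [] },
  ],
  trunk: ['i.main'],
  petioleMap: { plants: 'i.main', trees: 'i.main', ferns: 'i.main' },
  orphan: [],
};

console.log(pathTo(sampleTree, 'trees'));
console.log(sourceOf(sampleTree, 'ferns'));
```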
Create a tree from a given Record, where keys represent index nodes in the tree (such as filenames) and values hold the content that lists the nodes beneath them (such as file content). Returns a tree instance upon successful creation. Otherwise returns an error string, for example if duplicates are found in the tree.
A Record whose keys are entities (such as files) and values are content strings of those entities.
Name of the root node of the tree.
Options object -- see options below.
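Since the result is either a tree or an error string, callers will usually want to narrow it before use. A minimal sketch, assuming only that a successful result carries a root property as in the SemTree interface; `TreeLike` and `describeResult` are hypothetical names used for illustration.

```typescript
// Minimal stand-in for the SemTree shape (assumption: only `root` is needed here).
interface TreeLike {
  root: string;
}

// Hypothetical helper: narrows the "tree or error string" result.
function describeResult(result: TreeLike | string): string {
  if (typeof result === 'string') {
    // an error string was returned, e.g. because of duplicates
    return `error: ${result}`;
  }
  return `tree rooted at ${result.root}`;
}

console.log(describeResult({ root: 'fname-a' }));
console.log(describeResult('duplicate nodes found'));
```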
Lint a file's content or a record of multiple files' file content.
Checks for:
- Duplicates / cycles
- Spaces / tabs
- Inconsistent indentation
- Over-indentation
- Markdown bullets
- WikiLink
- Lists files that weren't linked in the tree
(Note: Lint line numbers returned will be offset by wherever the target semtree content started within the file. If the content starts at line 5 and the linter says an error occurred on line 1, then the error probably occurs on line 6 of the file.)
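The offset arithmetic in the note can be written down as a one-liner. `toFileLine` is a hypothetical helper that simply mirrors the note's own example (content starting at line 5, lint line 1, file line 6); treat the result as an estimate, as the note itself does.

```typescript
// Hypothetical helper: maps a lint line number back to an approximate file line.
function toFileLine(contentStartLine: number, lintLine: number): number {
  return contentStartLine + lintLine;
}

console.log(toFileLine(5, 1)); // 6, matching the example in the note
```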
A content string or a Record whose keys are entities (such as files) and values are content strings of those entities.
Lint options:
Kind of indentation -- either 'space's or 'tab's.
Number of indentations (spaces or tabs) which represent each level in the tree.
Whether the linter should check for markdown bullets (-, *) and print a warning if any nodes are missing them.
Whether the linter should check for [[wikilink]] brackets and print a warning if any nodes are missing them.
The root filename is needed to print the names of any orphan (unprocessed / unlinked) index / trunk files.
Print the contents of a tree to console logs and return the string if there was a valid tree to print. Returns undefined if the tree is invalid.
Example output:
bk.how-to-read-a-book
├── demanding-reader
|   └── active-reading
|       ├── reading-comprehension
|       └── the-art-of-reading
└── 4-levels-of-reading
    ├── elementary-reading
    ├── inspectional-reading
    ├── analytical-reading
    └── syntopical-reading
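For readers curious how such a layout can be produced from the flat nodes array, here is a rough sketch. `renderTree` is a hypothetical function, not semtree's print() implementation, and the sample nodes are hand-written to match the example above.

```typescript
interface TreeNode {
  text: string;
  ancestors: string[];
  children: string[];
  [key: string]: any;
}

// Hypothetical renderer: walks children depth-first and draws branch prefixes.
function renderTree(nodes: TreeNode[], rootText: string): string {
  const byText = new Map<string, TreeNode>();
  for (const node of nodes) { byText.set(node.text, node); }
  const lines: string[] = [rootText];
  const walk = (text: string, prefix: string): void => {
    const node = byText.get(text);
    const children = node ? node.children : [];
    children.forEach((child, i) => {
      const isLast = i === children.length - 1;
      lines.push(prefix + (isLast ? '└── ' : '├── ') + child);
      // deeper levels keep a '|' under branches that still have siblings below
      walk(child, prefix + (isLast ? '    ' : '|   '));
    });
  };
  walk(rootText, '');
  return lines.join('\n');
}

// small constructor to keep the sample data compact
const n = (text: string, ancestors: string[], children: string[]): TreeNode =>
  ({ text, ancestors, children });

const book = 'bk.how-to-read-a-book';
const sampleNodes: TreeNode[] = [
  n(book, [], ['demanding-reader', '4-levels-of-reading']),
  n('demanding-reader', [book], ['active-reading']),
  n('active-reading', [book, 'demanding-reader'], ['reading-comprehension', 'the-art-of-reading']),
  n('reading-comprehension', [book, 'demanding-reader', 'active-reading'], []),
  n('the-art-of-reading', [book, 'demanding-reader', 'active-reading'], []),
  n('4-levels-of-reading', [book], ['elementary-reading', 'inspectional-reading', 'analytical-reading', 'syntopical-reading']),
  n('elementary-reading', [book, '4-levels-of-reading'], []),
  n('inspectional-reading', [book, '4-levels-of-reading'], []),
  n('analytical-reading', [book, '4-levels-of-reading'], []),
  n('syntopical-reading', [book, '4-levels-of-reading'], []),
];

const rendered = renderTree(sampleNodes, book);
console.log(rendered);
```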
An instance of a SemTree.
Setting this to false will suppress printing the tree to the console log and just return the string representation.
update(tree: SemTree, subroot: string, content: Record<string, string>, opts?: SemTreeOpts): SemTree | string;
A method to update a subtree within the semantic tree. (Best used to update individual index documents.) The given tree will be directly updated and the updated subtree nodes will be returned separately by update().
A SemTree object.
A Record whose keys are entities (such as filenames) and values are content strings of those entities (such as file content).
Name of the subroot node of the subtree to be replaced.
Options object -- see options below.
Whether or not to include the semtree/index files themselves as nodes in the tree. This option is a useful toggle between 'tree-building' (non-virtual to allow for index/trunk file traversal) and 'tree-viewing' (virtual to eliminate unnecessary index/trunk files) states. Default is false. Best used for things like static site generation where updates are not a usual occurrence.
Note: If virtualTrunk is set to true, the resulting tree will not be updatable via the update function.
The delimiter string to look for when identifying semtree indexes within a markdown file. Defaults to 'semtree'.
The kind of whitespace expected for indentation of each level of the tree. The default is 'space'.
The size of each indentation level in the tree -- corresponds to number of spaces or tabs. The default is 2.
Whether or not to expect markdown bullets (-, *).
Whether or not to expect [[wikilink square brackets]]. Default is true.
Option functions are useful when keeping the state of the tree in-sync with some other source like an index or database.
A function to execute when each node is added to the tree.
A function to execute when each node is removed from the tree.
A function that can return/operate on the text of the root of the tree when it is being set.
A semantic tree wends through concepts in semantic space, like a melody winds through harmonies in music.
In personal knowledge management (pkm) systems, there are sometimes mechanisms to facilitate the creation and management of hierarchical structures: Tag hierarchies, dynamic tables of contents, note metadata, namespacing, even using the directory system itself, adding a folgezettel to a zettelkasten, are all attempts to create one unified hierarchy from one's atomic notes.
But none of these solutions accommodate the specific aim of trying to build a single "semantic tree" very well: Tag hierarchies and namespacing both suffer from branch length problems -- namespaces generally require the entire branch to be spelled out to represent a node accurately, which restricts branch size and thus the size of the whole tree. Metadata pointers are better, but because relationships are built one by one between notes, making large changes to the tree itself is burdensome and visualizing the entire tree at once requires imagination. Using the file directory itself runs into The Duplicate Folder Problem, where using paths to represent branches would create needless duplicate folders, each corresponding to a file of the same name at the same level.
This implementation attempts to ameliorate these issues with the primary focus on facilitating semantic tree cultivation.
Side-Note: If you already have a collection of markdown notes, good candidates for index/tree(trunk) files might be "zettelkasten hubs" or "maps of content" (will likely require some tweaking to fit the model required by this package).