This is mainly based on the Wikipedia article.
Note that a radix tree is a space-optimized trie.
According to the article cited above, tries were first suggested in 1959 by a Frenchman named René de la Briandais (*tri* is French for sort). According to Donald Knuth's research in *The Art of Computer Programming*:
> Trie memory for computer searching was first recommended by René de la Briandais. He pointed out that we can save memory space at the expense of running time if we use a linked list for each node vector, since most of the entries in the vectors tend to be empty.
[Note that this is not the approach being followed here.]
More particularly, this seems to be a Patricia trie, which I believe is the binary form.
UPDATE: I found this example (also from Wikipedia) that shows an interesting edge case: it shows that *slow* can be both a node and a leaf at the same time. This is something which I had not considered (another example might be *real* and *realistic*). Interestingly, my understanding was that the root node of the trie (it may - or may not - be true that *trie* comes from *retrieval*) is always a single character, whereas this diagram shows the root node as the entire word *slow*.
For retrieval purposes I am inclined to persist with using a single character as a root. The only uses for a trie that I have been able to find are spell-checking and autocomplete for search bars and the like. So being able to respond quickly to that first typed character sounds like what I am after. In the case of bytes this would make it possible to use the initial byte as an offset index, although I doubt this would be practical for runes.
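As a sketch of what this might look like in Go, here is a minimal rune-based radix trie with `Insert` and `Find`. All names here (`node`, `commonPrefix`, and so on) are my own illustrative choices rather than the actual implementation in this repo, and for simplicity the root's children hold full shared prefixes rather than single characters:

```go
package main

import "fmt"

// node is one entry in a rune-based radix trie. label holds the
// (possibly multi-rune) edge text; leaf marks a complete word ending
// here, so "slow" can be both an internal node and a leaf.
type node struct {
	label    []rune
	leaf     bool
	children []*node
}

// commonPrefix returns the length of the shared prefix of a and b.
func commonPrefix(a, b []rune) int {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return i
}

// Insert adds word to the trie rooted at n.
func (n *node) Insert(word []rune) {
	for _, c := range n.children {
		p := commonPrefix(c.label, word)
		if p == 0 {
			continue // no shared prefix; try the next child
		}
		if p == len(c.label) {
			if p == len(word) {
				c.leaf = true // word ends exactly at this node
				return
			}
			c.Insert(word[p:]) // descend with the remainder
			return
		}
		// Split the child: "romane" + "romanus" -> "roman" with "e"/"us".
		split := &node{label: c.label[:p], children: []*node{c}}
		c.label = c.label[p:]
		if p == len(word) {
			split.leaf = true // word ends at the split point
		} else {
			split.children = append(split.children, &node{label: word[p:], leaf: true})
		}
		for i, cc := range n.children {
			if cc == c {
				n.children[i] = split
				break
			}
		}
		return
	}
	// No child shares a prefix; add a fresh leaf.
	n.children = append(n.children, &node{label: word, leaf: true})
}

// Find reports whether word was previously inserted.
func (n *node) Find(word []rune) bool {
	if len(word) == 0 {
		return n.leaf
	}
	for _, c := range n.children {
		p := commonPrefix(c.label, word)
		if p == len(c.label) {
			return c.Find(word[p:])
		}
	}
	return false
}

func main() {
	root := &node{}
	for _, w := range []string{"romane", "romanus", "romulus", "slow", "slower"} {
		root.Insert([]rune(w))
	}
	fmt.Println(root.Find([]rune("slow")))   // true: both node and leaf
	fmt.Println(root.Find([]rune("slo")))    // false: a prefix only
	fmt.Println(root.Find([]rune("romane"))) // true
}
```

Note that it is the `leaf` flag that allows an entry such as *slow* to be both an internal node and a complete word at the same time.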
I've been doing a lot of high-level programming lately (RESTful APIs, Scala/Akka) so something a little more low-level sounded like a nice change of pace. And it has also allowed me to investigate parts of Go that I do not normally run into.
More importantly, it's a nice opportunity to use Table-driven testing.
- A recent version of Go
- `make` installed

[Or you can simply type `go test -v`.]
First seven inserts:
```
1 (romane) 2 (romanus) 3 (romulus) 4 (rubens) 5 (ruber) 6 (rubicon) 7 (rubicundus)
| | | | | | |
r r r r r r r
| | | / \ / \ / \ / \
omane oman om om ubens om \ om ub om ub
/ \ / \ / \ / \ \ / \ \ / \ \
e us an ulus an ulus an ulus \ an ulus * an ulus *
/ \ / \ / \ ube / \ / \ / \ / \
e us e us e us / \ e us e icon e us e \
ns r / \ / \ \
ns r ns r ic
/ \
on undus
```
Type the following command:
```
$ make
```
The results should look as follows:
```
GOPATH=/go GOOS=linux GOARCH=amd64 gofmt -d -e -s -w *.go
GOPATH=/go GOOS=linux GOARCH=amd64 golint -set_exit_status *.go
GOPATH=/go GOOS=linux GOARCH=amd64 go tool vet *.go
GOPATH=/go GOOS=linux GOARCH=amd64 go test -v
=== RUN   TestInsert
--- PASS: TestInsert (0.00s)
=== RUN   TestFind
--- PASS: TestFind (0.00s)
PASS
ok
```
In concept, these sounded like a great idea. In practice, I'd have to say my feelings are mixed.
Like all tests, they get unwieldy very quickly.
[For an example of how tests can become opaque very quickly, check out the following example. While I made every effort to code these tests to be as transparent as possible, I do not think the results - in terms of being able to *grok* what I am actually testing for - are all that easy to sort out. Doing so requires scrolling through pages of tests in order to form an overview.]
On the other hand, the table-driven test framework allowed me to code up the Find tests - and mock up some data to test them with - while I was still working on the Insert tests, whereas previously I would have had to complete the Insert tests (and code) in order to test the matching Find tests (and code). Or perhaps develop both in lock-step.
As it is, being able to do them both at the same time has been a great help, at least conceptually.
Looking at examples of this testing style, it seemed like it would be possible to quickly grasp - at a high level - what the tests were. But my experience - in practice - is that this framework still gets very cluttered quite quickly, so that this high-level overview is not really possible.
I still think this approach is great - but I'd probably reserve it for lower-level components, rather than for system testing.
UPDATE: One nice feature of table-driven tests is that they are very easily extended. For instance, adding a new edge case (such as the empty string, "") can be as simple as a few new lines of code. It may not always be this easy, but it is simple to think of cases where extending a table-driven test for a new edge case is far less work than writing an entirely new test. More tests are almost always better than fewer, so being able to add edge cases this cheaply seems like a good way to encourage more and better testing.
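To make this concrete, here is a minimal sketch of the table-driven style. The `longestCommonPrefix` helper is a hypothetical stand-in for the functions under test; the point is the table of cases, and how the empty-string edge case is just one more row:

```go
package main

import "fmt"

// longestCommonPrefix is a hypothetical helper standing in for the
// functions under test; the interesting part is the table below.
func longestCommonPrefix(a, b string) string {
	i := 0
	for i < len(a) && i < len(b) && a[i] == b[i] {
		i++
	}
	return a[:i]
}

func main() {
	// Each row is one test case; adding an edge case is one more row.
	cases := []struct {
		name     string
		a, b     string
		expected string
	}{
		{"shared stem", "romane", "romanus", "roman"},
		{"minimal overlap", "romane", "rubens", "r"},
		{"word is a prefix", "slow", "slower", "slow"},
		{"empty string", "", "slow", ""},
	}
	for _, tc := range cases {
		got := longestCommonPrefix(tc.a, tc.b)
		if got != tc.expected {
			fmt.Printf("FAIL %s: got %q, want %q\n", tc.name, got, tc.expected)
		} else {
			fmt.Printf("PASS %s\n", tc.name)
		}
	}
}
```

In a real test file the loop body would live inside `func TestLongestCommonPrefix(t *testing.T)` and report failures via `t.Errorf` (or use `t.Run` for named subtests) rather than printing, but the table itself is identical.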
- Investigate applications of Patricia tries
- Refactor tests to avoid some of the duplicated code
- Add code and tests to allow for entries such as "slow", "slower", "slowly"
- Find out the idiom for stacking Insert and Find tests (avoiding mocks)
- Investigate whether byte-based and rune-based options are viable
- Find more examples of tries in use - specifically Rune-based CJKV (Chinese, Japanese, Korean, Vietnamese)
- Add example of Chinese Rune-based trie
- Find out whether the usual practice is to sort trie entries (the Wikipedia example is sorted)
- Tests and code for 'retrieve all entries' functionality
- Upgrade to latest release of Golang (1.14 as of the time of writing)
- Upgrade `release` badge to conform to new Shields.io standards
This follows on from work four of us did as a group in a couple of hours at the local Golang Meetup.
In particular, Dan coded up the original Table-driven Tests, which proved very instructive.
Blake coded up the original ASCII art, which has been fun to elaborate on.
My original fork of this project may be seen here:
https://github.com/mramshaw/golangvan/tree/master/radix-trie
As I have departed heavily from the original framework (both design and implementation), I created this repo.
In particular, I have followed a Rune-based approach, rather than a Byte-based approach. I have opted for lazy structures, which are cheaper in terms of memory requirements but possibly costly in performance terms. Also, the original goal was to be as efficient as possible, whereas I am not greatly concerned with efficiency here. My goal is an MVP (minimum viable product; meaning a proof-of-concept, demo or spike) for learning purposes.