Abacus lets you count item frequencies in big datasets with a fixed amount of memory. Unlike a regular counter, it trades accuracy for memory. This is useful whenever approximate counts are good enough, for example in NLP/ML tasks where you might need to count millions of items.
Example:

```go
counter := abacus.New(10) // Abacus will use at most 10MB to store your counts
counter.Update([]string{"item1", "item2", "item2"})
counter.Counts("item1")  // 1, the count for "item1"
counter.Total()          // 3, total number of counts (sum of counts of all items)
counter.Cardinality()    // 2, how many distinct items there are
```
Abacus lets you define how much memory you want to use, and it counts items within that budget. There are limitations, of course: if you set the memory threshold too low, you might get inaccurate counts.
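To make that trade-off concrete, here is a minimal sketch that counts word bigrams under a deliberately tight memory budget. It assumes only the API shown above; the import path is hypothetical.

```go
package main

import (
	"fmt"
	"strings"

	"github.com/example/abacus" // hypothetical import path
)

func main() {
	// A deliberately small budget: with enough distinct items,
	// counts become approximate rather than exact.
	counter := abacus.New(1)

	text := "the quick brown fox jumps over the lazy dog the quick brown fox"
	words := strings.Fields(text)

	// Build word bigrams and feed them to the counter in one batch.
	bigrams := make([]string, 0, len(words)-1)
	for i := 0; i+1 < len(words); i++ {
		bigrams = append(bigrams, words[i]+" "+words[i+1])
	}
	counter.Update(bigrams)

	fmt.Println(counter.Counts("the quick")) // 2 (approximate if memory runs short)
	fmt.Println(counter.Total())             // 12, one per bigram fed in
	fmt.Println(counter.Cardinality())       // ~9 distinct bigrams
}
```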
Benchmarks:
- Counting bigrams (words) from the Wiki corpus.
- Comparing memory usage and accuracy of Abacus vs. a plain `map[string]int` (a sketch of this comparison follows below).
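The exact accuracy metric is not spelled out here; one plausible reading, sketched below under that assumption, is the fraction of distinct items whose approximate count matches the exact count from the map (`bigrams` is the `[]string` from the sketch above):

```go
// Count every bigram with both structures, then measure how often
// Abacus agrees with the exact map counts.
exact := make(map[string]int)
for _, bg := range bigrams {
	exact[bg]++
}

approx := abacus.New(200) // e.g. a 200MB budget, as in one of the runs below
approx.Update(bigrams)

matches := 0
for bg, n := range exact {
	if int(approx.Counts(bg)) == n {
		matches++
	}
}
fmt.Printf("accuracy: %.0f%%\n", 100*float64(matches)/float64(len(exact)))
```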
Log8 in the tables denotes a log-scale variant that stores approximate counts in 8-bit cells (as in Bounter), trading some accuracy on large counts for a smaller memory footprint.

| Corpus | Data Structure | Used Memory | Accuracy |
|---|---|---|---|
| Half of Wiki corpus (English) | Abacus (1000MB) | 1.75GB | 96% |
| Half of Wiki corpus (English) | Abacus (Log8) (200MB) | 369MB | 70% |
| Half of Wiki corpus (English) | Abacus (Log8) (400MB) | 407MB | 98% |
| Half of Wiki corpus (English) | Map | 3.3GB | 100% |
| Corpus | Data Structure | Used Memory | Accuracy |
|---|---|---|---|
| Complete Wiki corpus (English) | Abacus (2200MB) | 3.63GB | 98% |
| Complete Wiki corpus (English) | Abacus (500MB) | 741MB | 15% |
| Complete Wiki corpus (English) | Abacus (Log8) (500MB) | 760MB | 90% |
| Complete Wiki corpus (English) | Abacus (Log8) (700MB) | 889MB | 97% |
| Complete Wiki corpus (English) | Map | 10.46GB | 100% |
Note: This is me playing with Golang again; it is heavily based on Bounter.
Internally, two approximate structures do the work:
- one used to count item frequencies
- one used to calculate the cardinality
Icon made by free-icon