This branch is the start of a near-total refactoring of the DFI framework, intended to make modules easier to write and to facilitate the use of Redis for IPC (and also to fix some weirdness left over from coding in the summer heat).
Stay tuned! :)
You need to install the dfitools package to use this application.
Diverse Folio Isle (DFI) is a framework for text mining.
There is a manual available here: Read the docs
The program has a simple CLI, accessed through Diverse_Folio_Isle.py. This script directs a three-step process in which the user specifies a sourcing script, a preprocessing script and a classification script. These scripts are as orthogonally modular as possible; some are even usable as standalone applications (like the UFT PDF scraper).
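To illustrate the three-step idea, here is a minimal sketch of how a sourcing, a preprocessing and a classification step could be chained when each one is an interchangeable function. It is not the actual DFI code: `run_pipeline` and the three `toy_*` functions are hypothetical names invented for this example, and the real modules may use different interfaces.

```python
from typing import Callable, Iterable, List, Tuple

# Hypothetical stage signatures, assumed for illustration only;
# the real DFI modules may look different.
Source = Callable[[], Iterable[str]]
Preprocess = Callable[[str], List[str]]
Classify = Callable[[List[str]], str]


def run_pipeline(source: Source, preprocess: Preprocess,
                 classify: Classify) -> List[Tuple[str, str]]:
    """Run the three stages in order and pair each document with its label."""
    results = []
    for document in source():
        tokens = preprocess(document)
        results.append((document, classify(tokens)))
    return results


# Toy stand-ins for the three user-chosen scripts.
def toy_source() -> Iterable[str]:
    return ["Redis makes IPC simple", "the summer heat was intense"]


def toy_preprocess(text: str) -> List[str]:
    # Lowercase and split on whitespace.
    return text.lower().split()


def toy_classify(tokens: List[str]) -> str:
    # Label a document "technical" if it mentions redis or ipc.
    return "technical" if {"redis", "ipc"} & set(tokens) else "other"


if __name__ == "__main__":
    for document, label in run_pipeline(toy_source, toy_preprocess, toy_classify):
        print(f"{label}: {document}")
```

Because each stage only depends on the shape of its input and output, any one of them can be swapped for another implementation without touching the other two, which is what makes side-by-side comparison of techniques straightforward.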
Modularity is meant to facilitate scientific comparison of text-mining techniques. For more on this, see my thesis.