I tested this up and down with a Phi-1.5 model, and it works very nicely. There are definitely optimizations to be made, but it does exactly what I wanted it to do. Fine-tuning is dead. This algorithm gives an LLM a 'long-term memory' instead. You can adjust the memory's size depending on your needs and your hardware: more memory = more GPU (VRAM). Simple trade-off. In theory, though, you can scale it indefinitely. Think of it like hooking a database up to the LLM, except you can make the database VERY flexible. You can even have it add to itself from your conversations.
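
The post doesn't include code, so here's a minimal sketch of one common way to wire that kind of memory up: an embedding-based store that retrieves relevant past entries and feeds them back into the prompt. Everything here is an assumption on my part, not the author's exact method: `sentence-transformers` for embeddings, a plain in-memory list as the "database", cosine similarity for retrieval, and illustrative names like `MemoryStore`.

```python
# Minimal sketch of an embedding-based "long-term memory" for an LLM.
# Assumptions (not from the original post): sentence-transformers for
# embeddings, an in-memory list as the store, cosine-similarity lookup.
import numpy as np
from sentence_transformers import SentenceTransformer

class MemoryStore:
    def __init__(self, max_entries=10_000):
        # Small embedding model; a bigger store costs more RAM/VRAM.
        self.embedder = SentenceTransformer("all-MiniLM-L6-v2")
        self.max_entries = max_entries
        self.texts = []      # the stored memories
        self.vectors = []    # their embeddings

    def add(self, text):
        """Store a memory; call this after each conversational turn."""
        if len(self.texts) >= self.max_entries:
            self.texts.pop(0)    # naive eviction: drop the oldest entry
            self.vectors.pop(0)
        self.texts.append(text)
        self.vectors.append(self.embedder.encode(text))

    def retrieve(self, query, k=3):
        """Return the k stored memories most similar to the query."""
        if not self.texts:
            return []
        q = self.embedder.encode(query)
        mat = np.stack(self.vectors)
        sims = mat @ q / (np.linalg.norm(mat, axis=1) * np.linalg.norm(q) + 1e-9)
        top = np.argsort(sims)[::-1][:k]
        return [self.texts[i] for i in top]

# Usage: prepend retrieved memories to the prompt before generation,
# then write the new exchange back so the store grows on its own.
memory = MemoryStore()
memory.add("User's dog is named Biscuit.")
context = "\n".join(memory.retrieve("What is my dog called?"))
prompt = f"Context:\n{context}\n\nQuestion: What is my dog called?"
# ...feed `prompt` to Phi-1.5 (or any LLM), then store the exchange:
memory.add("Assistant recalled the dog's name, Biscuit.")
```

The "scale to infinity" part maps onto swapping the in-memory list for a proper vector database, which is where the adjustable memory size vs. hardware trade-off comes in.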