This repository contains notebook demos on encoding strings as images using the Word2Vec model and performing predictions on those images using ML techniques. I recently came across a reasearch paper which takes a similar approach for detecting malware. We can probably apply some of the concepts they worked on to this problem space. Check out the notebooks/POC folder to see how this would work.
- Clone this repository
pip install git https://github.com/tteigman/string2image
- Data cleansing by detecting strings in a data field that are out of place (and putting them in place)
- Domain specific fuzzy string matching