String comparison and classification with Word2Vec and CNNs

tteigman/string2image

String comparison and classification with character embedding and deep learning

This repository contains notebook demos that encode strings as images using a Word2Vec model and then run predictions on those images with ML techniques. I recently came across a research paper (linked below) that takes a similar approach to detecting malware, and some of its concepts can probably be applied to this problem space. Check out the notebooks/POC folder to see how this would work.

https://www.intel.com/content/www/us/en/artificial-intelligence/documents/stamina-deep-learning-for-malware-protection-whitepaper.html
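
As a rough illustration of the string-to-image idea, here is a minimal sketch in plain Python. It is not the code from the notebooks: the names `char_vector`, `string_to_image`, `cosine_similarity`, `EMBED_DIM` and `MAX_LEN` are all hypothetical, and the per-character vectors come from a hash-based stand-in rather than a trained Word2Vec model, so they carry no learned similarity between characters. The only point is to show how per-character vectors can be stacked into a fixed-size 2D grid that an image model could consume.

```python
import hashlib
import math

EMBED_DIM = 8   # assumed embedding width (image columns)
MAX_LEN = 16    # assumed fixed string length (image rows)


def char_vector(ch, dim=EMBED_DIM):
    """Stand-in for a Word2Vec lookup: a deterministic pseudo-embedding
    built from a hash of the character (values in [0, 1])."""
    digest = hashlib.sha256(ch.encode("utf-8")).digest()
    return [digest[i] / 255.0 for i in range(dim)]


def string_to_image(text, max_len=MAX_LEN, dim=EMBED_DIM):
    """Stack one vector per character into a max_len x dim grid.
    Longer strings are truncated; shorter ones are zero-padded."""
    rows = [char_vector(ch, dim) for ch in text[:max_len]]
    rows += [[0.0] * dim for _ in range(max_len - len(rows))]
    return rows


def cosine_similarity(img_a, img_b):
    """Compare two images by the cosine of their flattened pixel vectors."""
    flat_a = [v for row in img_a for v in row]
    flat_b = [v for row in img_b for v in row]
    dot = sum(a * b for a, b in zip(flat_a, flat_b))
    norm_a = math.sqrt(sum(a * a for a in flat_a))
    norm_b = math.sqrt(sum(b * b for b in flat_b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an all-padding image has no direction to compare
    return dot / (norm_a * norm_b)


image = string_to_image("123 Main St")
print(len(image), len(image[0]))  # 16 8
```

With a trained model, `char_vector` would be replaced by a lookup into the learned character embeddings, and the resulting grids would be fed to a classifier such as a CNN instead of being compared directly.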

To try it yourself

  1. Clone this repository
  2. pip install git+https://github.com/tteigman/string2image

Use cases

  • Data cleansing by detecting strings that are out of place in a data field (and moving them to where they belong)
  • Domain-specific fuzzy string matching
