A specialized NLP library that provides tools for basic NLP tasks on Hindi datasets. Currently, the library supports:
- Named Entity Recognition: The library provides NER support for tagging Hindi sentences. You can also apply NER to every sentence in a text file with a single line of code. If you wish to train your own NER model, you can do so in one line of code, without writing a training script.
- AutoClassifier: Train your models on classification datasets in just one line of code, and fine-tune them using custom parameters.
Install the library directly using pip:
pip install HindiNLP
The NER tagger identifies parts of a sentence and tags each one with the type of entity it could represent. The tags currently supported by our model are:
Tag | NEP | NED | NEO | NEA | NEB | NETP | NETO | NEL | NETI | NEN | NEM | NETE |
---|---|---|---|---|---|---|---|---|---|---|---|---|
Tag Type | Person | Designation | Object | Abbreviation | Brand | Title-Person | Title-Object | Location | Time | Number | Measure | Terms |
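When post-processing tagger output, it can help to keep the table above as a plain lookup. The following is only a sketch and is not part of HindiNLP: `NER_TAGS` and `describe_tag` are hypothetical names, and nothing is assumed here about the format in which `Predict` returns its tags.

```python
# Hypothetical lookup built from the tag table above; not part of HindiNLP.
NER_TAGS = {
    "NEP": "Person",
    "NED": "Designation",
    "NEO": "Object",
    "NEA": "Abbreviation",
    "NEB": "Brand",
    "NETP": "Title-Person",
    "NETO": "Title-Object",
    "NEL": "Location",
    "NETI": "Time",
    "NEN": "Number",
    "NEM": "Measure",
    "NETE": "Terms",
}


def describe_tag(tag: str) -> str:
    """Return the entity type for a tag code, or 'Unknown' if it is not in the table."""
    return NER_TAGS.get(tag.strip().upper(), "Unknown")


print(describe_tag("NEL"))
```

Codes missing from the table map to "Unknown" rather than raising, so a custom-trained model with different tags would not break the lookup.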
To use the tagger on a single sentence:
from HindiNLP.HindiNer import NER
detect_ner = NER()
sentence = detect_ner.Predict("अविनाश आगरा में रहता है")
print(sentence)
Print the result to see what the tagger found. Entire text files can also be processed, with NER tags identified for every sentence:
from HindiNLP.HindiNer import NER
detect_ner = NER()
detect_ner.Predict_textfile("/path/to/textfile.txt")
The input text file must contain one sentence per line. The annotated tags can be found in the file named 'textfile__NER.txt'.
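If your sentences live in a Python list rather than a file, a small helper can produce the one-sentence-per-line layout described above. This is a minimal sketch using only the standard library. `write_sentences` is a hypothetical helper, not a HindiNLP function, and the second example sentence is made up for illustration.

```python
from pathlib import Path
import tempfile


def write_sentences(sentences, path):
    """Write the non-empty sentences to `path`, one per line, UTF-8 encoded."""
    cleaned = [s.strip() for s in sentences if s.strip()]
    Path(path).write_text("".join(s + "\n" for s in cleaned), encoding="utf-8")
    return len(cleaned)


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "textfile.txt"
    count = write_sentences(["अविनाश आगरा में रहता है", "   ", "वह दिल्ली गया"], target)
    lines = target.read_text(encoding="utf-8").splitlines()
    print(count, lines)
```

The resulting path is what you would hand to `Predict_textfile`. Blank entries are dropped so that every line in the file is a real sentence.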
If you wish to train your own NER model with specific NER tags, you can do so in one line. First, create a dictionary of training hyperparameters in the following format:
train_dict = {
    "lr": 0.1,           # learning rate
    "batch_size": 32,    # batch size
    "epochs": 150,       # number of epochs
    "hidden_size": 256,  # hidden state size of the LSTMs
}
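A typo in a key name or a non-positive value is easy to miss before a long training run. The sketch below checks a dictionary against the four keys shown above. `validate_train_dict` is a hypothetical helper written for this README. HindiNLP may validate its input differently, or not at all.

```python
# Hypothetical pre-flight check; not part of HindiNLP.
INT_KEYS = ("batch_size", "epochs", "hidden_size")


def validate_train_dict(train_dict):
    """Raise ValueError if a key is missing, has the wrong type, or is not positive."""
    for key in ("lr",) + INT_KEYS:
        if key not in train_dict:
            raise ValueError(f"missing hyperparameter: {key}")
        value = train_dict[key]
        # bool is a subclass of int in Python, so reject it explicitly.
        allowed = (int,) if key in INT_KEYS else (int, float)
        if isinstance(value, bool) or not isinstance(value, allowed):
            raise ValueError(f"{key} has an unexpected type: {value!r}")
        if value <= 0:
            raise ValueError(f"{key} must be positive, got {value!r}")
    return train_dict


checked = validate_train_dict(
    {"lr": 0.1, "batch_size": 32, "epochs": 150, "hidden_size": 256}
)
print(checked)
```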
The input text files should be of the format. Finally, pass the dictionary to the training function:
detect_ner.train_NER(data_folder="/path/to/folder", # folder path containing all text files
train_file="train.txt",
dev_file="dev.txt",
test_file="test.txt",
train_dict=train_dict)
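Since the call above names three files inside one folder, it may be worth confirming that they exist before training starts. This is a minimal sketch with a hypothetical helper, `missing_files`, which is not part of HindiNLP. The demo creates only two of the three files, so the third is reported as missing.

```python
from pathlib import Path
import tempfile


def missing_files(data_folder, file_names):
    """Return the names in `file_names` that do not exist inside `data_folder`."""
    folder = Path(data_folder)
    return [name for name in file_names if not (folder / name).is_file()]


# Demo: create train.txt and dev.txt only, then look for all three files.
with tempfile.TemporaryDirectory() as tmp:
    for name in ("train.txt", "dev.txt"):
        (Path(tmp) / name).write_text("", encoding="utf-8")
    missing = missing_files(tmp, ["train.txt", "dev.txt", "test.txt"])
    print(missing)
```

An empty list means all three files are in place and the folder can be passed as `data_folder`.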
Then use the trained model to tag sentences:
from HindiNLP.HindiNer import NER
detect_ner = NER()
sentence = detect_ner.Predict("अविनाश आगरा में रहता है", is_path=True, path="/path/to/trained/model")
print(sentence)
The library also supports training your own classifier with just one line of code.
from HindiNLP.AutoClassifier import classifier
SVC = classifier("/path/to/dir") # path to directory containing text files
train_dict = {
    "hidden_size": 512,  # hidden size of the LSTMs
    "output_size": 256,  # output size of the LSTMs
    "lr": 0.1,           # initial learning rate
    "batch_size": 256,   # mini-batch size
    "n_epochs": 150,     # number of epochs
}
SVC.train(train_dict)
SVC.predict("I am a good man")