HindiNLP

A specialized NLP library that provides tools for basic NLP tasks on Hindi datasets. Currently, the library supports:

Named Entity Recognition: The library provides NER support for tagging Hindi sentences. You can also apply NER to every sentence in a text file with a single line of code. To train your own NER model, a single call is enough; no separate training script is needed.
AutoClassifier: Train models on classification datasets with a single line of code, and fine-tune them using custom parameters.

Installation

Install the library directly using pip:

pip install HindiNLP

NER Tagger

The NER Tagger identifies entities in a sentence and tags each one with the type of entity it could represent. The model currently supports the following tags:

Tag	NEP	NED	NEO	NEA	NEB	NETP	NETO	NEL	NETI	NEN	NEM	NETE
Entity type	Person	Designation	Object	Abbreviation	Brand	Title-Person	Title-Object	Location	Time	Number	Measure	Terms
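
For post-processing, the table above can be kept as a plain lookup. The following is a small sketch: `NER_TAGS` and `describe_tag` are illustrative names that are not part of HindiNLP, and it assumes the tagger's output uses exactly these tag strings.

```python
# Illustrative lookup built from the tag table above; not part of HindiNLP.
NER_TAGS = {
    "NEP": "Person",
    "NED": "Designation",
    "NEO": "Object",
    "NEA": "Abbreviation",
    "NEB": "Brand",
    "NETP": "Title-Person",
    "NETO": "Title-Object",
    "NEL": "Location",
    "NETI": "Time",
    "NEN": "Number",
    "NEM": "Measure",
    "NETE": "Terms",
}


def describe_tag(tag: str) -> str:
    """Return the entity type for a tag, or "Unknown" if it is not in the table."""
    return NER_TAGS.get(tag.strip().upper(), "Unknown")
```

For example, `describe_tag("NEL")` looks up the Location entry, while any tag outside the table falls back to "Unknown".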

To tag a single sentence:

from HindiNLP.HindiNer import NER
detect_ner = NER()
sentence = detect_ner.Predict("अविनाश आगरा में रहता है")
print(sentence)

Print the result to see what the tagger found. Entire text files can also be processed, with NER tags identified for every sentence:

from HindiNLP.HindiNer import NER
detect_ner = NER()
detect_ner.Predict_textfile("/path/to/textfile.txt")

The input text file must contain one sentence per line. The annotated tags are written to a file named 'textfile__NER.txt'.
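
A short sketch of preparing such an input file follows. `write_sentences` and `expected_output_name` are hypothetical helpers, not HindiNLP functions. The output name simply follows the 'textfile__NER.txt' pattern mentioned above; the directory in which the library writes that file is not stated here, so only the file name is derived.

```python
from pathlib import Path


def write_sentences(sentences, path):
    """Write one non-empty sentence per line, UTF-8 encoded; return the line count."""
    lines = [s.strip() for s in sentences if s.strip()]
    Path(path).write_text("\n".join(lines) + "\n", encoding="utf-8")
    return len(lines)


def expected_output_name(path):
    """Name of the annotated file, assuming the '<name>__NER.txt' pattern."""
    return Path(path).stem + "__NER.txt"
```

Writing the file as UTF-8 explicitly avoids platform-dependent default encodings, which matters for Devanagari text.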

Train your own NER model

If you wish to train your own NER model with specific NER tags, you can do so in one line. First, create a dictionary of training hyperparameters in the following format:

train_dict = {
    "lr": 0.1,            # learning rate
    "batch_size": 32,     # batch size
    "epochs": 150,        # number of epochs
    "hidden_size": 256,   # hidden state size of the LSTMs
}

The input text files should follow the expected format. Finally, pass the dictionary to the training method together with the names of the train, dev, and test files.

from HindiNLP.HindiNer import NER
detect_ner = NER()
detect_ner.train_NER(
    data_folder="/path/to/folder",  # folder containing all the text files
    train_file="train.txt",
    dev_file="dev.txt",
    test_file="test.txt",
    train_dict=train_dict,
)
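
Before starting a long training run, it can help to confirm that the data folder really contains the three files. The helper below is only a sketch: `missing_files` is a hypothetical name and not part of HindiNLP, and its default file names are the ones used in the call above.

```python
from pathlib import Path


def missing_files(data_folder, names=("train.txt", "dev.txt", "test.txt")):
    """Return the expected file names that are absent from data_folder."""
    folder = Path(data_folder)
    return [name for name in names if not (folder / name).is_file()]
```

An empty list means all three files are present; otherwise the list names what to fix before calling `train_NER`.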

You can then use the trained model to tag sentences:

from HindiNLP.HindiNer import NER
detect_ner = NER()
sentence = detect_ner.Predict("अविनाश आगरा में रहता है",is_path=True,path="/path/to/trained/model")
print(sentence)

Auto Classifier

The library also supports training your own classifier with just one line of code.

from HindiNLP.AutoClassifier import classifier

SVC = classifier("/path/to/dir")  # path to the directory containing the text files
train_dict = {
    "hidden_size": 512,  # hidden size of the LSTMs
    "output_size": 256,  # output size of the LSTMs
    "lr": 0.1,           # initial learning rate
    "batch_size": 256,   # mini-batch size
    "n_epochs": 150,     # number of epochs
}
SVC.train(train_dict)
SVC.predict("I am a good man")
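
Note that the two training dictionaries in this README use different keys: the NER example uses `epochs`, while the classifier example uses `n_epochs` and adds `output_size`. A small check like the following can catch a mix-up early. This is a sketch, not library code: `check_train_dict` is a hypothetical helper, and the key sets are copied from the examples in this README rather than from the library's source, so the library itself may accept other keys.

```python
# Key sets copied from the README examples; the library may accept others.
NER_KEYS = {"lr", "batch_size", "epochs", "hidden_size"}
CLASSIFIER_KEYS = {"hidden_size", "output_size", "lr", "batch_size", "n_epochs"}


def check_train_dict(train_dict, expected_keys):
    """Return (missing, unexpected) key lists, each sorted alphabetically."""
    keys = set(train_dict)
    return sorted(expected_keys - keys), sorted(keys - expected_keys)
```

Both lists are empty when the dictionary matches the expected keys exactly; passing a NER-style dictionary with `CLASSIFIER_KEYS` would report `n_epochs` and `output_size` as missing and `epochs` as unexpected.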
