Can we expect docs and examples update for v4? #596

Closed
labs20 opened this issue Sep 18, 2020 · 8 comments

labs20 commented Sep 18, 2020

Hi. Before anything, thanks for your hard and awesome work on this package, and most of all, really thanks for sharing your hard work with us. Thanks!

Now, the lack of docs or working examples for v4 is making it hard to use this package. I mean, NLP is already hard on its own, at least for a newcomer like me, and I'm at that hard spot where I must choose and settle which weapons I'll use for the entire project.

I'd very much like to stay on node, so I'm trying to find packages that I could use for my main goal, which is text classification. Right now it looks like I should go to the other side of the fence and take Python's spaCy or NLTK, with word2vec and such, but mostly because there are plenty of docs and examples to help me sort things out.

So, with all due respect for your time and work, is there a plan to produce new docs and examples, or is there a community forum where I could find more information?

Thanks!

Contributor

jesus-seijas-sp commented Sep 18, 2020

Hi,

I will put here a FAQ with links to the relevant parts of the documentation and examples. If there is something you want covered that is not here, just ask, and I will update the documentation with it.
Also, there are 1800 unit tests that can help you understand the classes and functions that are not intended to be the developer-facing API (in node-nlp, the developer-facing API is intended to be the NlpManager class).

- Where do I find an example of use of v4?
https://github.com/axa-group/nlp.js#example-of-use

- But this example seems to be for v3...
Version 4 is split into several smaller packages, but the https://www.npmjs.com/package/node-nlp package uses those smaller packages to build a version that is as backward compatible with v3 as we can make it.

- But I don't find the NluManager in this package...
You don't need it. You have the NlpManager, which internally handles the NLU, NLG, language guesser...

- OK, but I feel that I want to go with pure v4. How do I start?
There is a quickstart here: https://github.com/axa-group/nlp.js/blob/master/docs/v4/quickstart.md

- But this is for backend... I want this bundle to work in my browser or mobile using react native
You have the quickstart for browser and react native here: https://github.com/axa-group/nlp.js/blob/master/docs/v4/webandreact.md

- This thing of intents is hard to understand, I just want to do a Questions and Answers bot
You have a quickstart for simple QnA here: https://github.com/axa-group/nlp.js/blob/master/docs/v4/qna.md

- How do I test my chatbot in the console?
https://github.com/axa-group/nlp.js/blob/master/docs/v4/quickstart.md#adding-your-first-connector

- How do I build a multi-language chatbot?
Install the package for the language and add it as a plugin:
https://github.com/axa-group/nlp.js/blob/master/docs/v4/quickstart.md#adding-multilanguage

- Do you have an example of a chatbot running in several languages?
You can remix this project on Glitch. You'll see that it has only one line of source code, but that one line creates a backend with an API and exposes a React frontend with the bot, and it's multi-language. To see the frontend, click the "show" button and then "next to the code".
https://glitch.com/edit/?utm_content=project_nlpjs-multi&utm_source=remix_this&utm_medium=button&utm_campaign=glitchButton#!/remix/nlpjs-multi

- Using the node-nlp package, do I need to install the languages separately?
No. It uses the package @nlpjs/lang-all, which bundles all the languages.

- Where can I see the languages and their locales to find the correct package to install?
Here, the ones marked with Native Support:
https://github.com/axa-group/nlp.js/blob/master/docs/v4/language-support.md

- But I want to have a web for my chatbot!
Here you can see how to easily expose your chatbot through the Direct Line API, and how to expose a WebChat: https://github.com/axa-group/nlp.js/blob/master/docs/v4/quickstart.md#adding-api-and-webchat

- But this does not help on how to orchestrate a chatbot
Well, NLP.js is a set of NLP tools, not chatbot tools. For orchestrating a chatbot I recommend using Microsoft Bot Framework.

- When an intent is triggered I want to get the answer from an API call and I'm not using any chatbot orchestrating SDK
You can, with pipelines that react to your intent. There is an example here: https://github.com/axa-group/nlp.js/blob/master/docs/v4/quickstart.md#adding-logic-to-an-intent

- How do I guess the language of an utterance when I have a multi-language bot?
The language is guessed automatically using the most common 3-grams of each language, plus the 3-grams from the corpus used for training. That way you can even use languages that do not exist, or get a better guess based on your corpus.
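
To make the 3-gram idea more concrete, here is a minimal, self-contained sketch of trigram-based language guessing. It is not the NLP.js implementation: the function names (`trigramProfile`, `guessLanguage`), the tiny sample texts, and the simple overlap score are all assumptions chosen for illustration.

```javascript
// Illustrative only: a toy trigram language guesser, not the NLP.js code.

// Count the character 3-grams of a lowercased text, with non-letters collapsed to spaces.
function trigramProfile(text) {
  const clean = ` ${text.toLowerCase().replace(/[^\p{L}]+/gu, ' ').trim()} `;
  const profile = new Map();
  for (let i = 0; i + 3 <= clean.length; i += 1) {
    const gram = clean.slice(i, i + 3);
    profile.set(gram, (profile.get(gram) || 0) + 1);
  }
  return profile;
}

// Score each language by how many trigram occurrences of the input
// also appear in that language's profile, and return the best locale.
function guessLanguage(text, profiles) {
  const input = trigramProfile(text);
  let best = { locale: undefined, score: -1 };
  for (const [locale, profile] of Object.entries(profiles)) {
    let score = 0;
    for (const [gram, count] of input) {
      if (profile.has(gram)) score += count;
    }
    if (score > best.score) best = { locale, score };
  }
  return best.locale;
}

// Hypothetical training texts. A real guesser would use much larger samples,
// and could add the utterances of your own corpus to each profile.
const profiles = {
  en: trigramProfile('the quick brown fox jumps over the lazy dog and then the night has come'),
  es: trigramProfile('el rápido zorro marrón salta sobre el perro perezoso cuando llega la noche'),
};

console.log(guessLanguage('the dog is over there', profiles)); // en
console.log(guessLanguage('el perro está sobre la mesa', profiles)); // es
```

Because the profiles are just trigram counts, adding your training utterances to a profile is what lets a guesser of this kind handle made-up languages or domain-specific vocabulary.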

- And how do I guess the language of a sentence without the NLP integration?
You have the example here: https://github.com/axa-group/nlp.js/blob/master/docs/v3/language-guesser.md
But if you want a smaller footprint in your node_modules, use the @nlpjs/language library instead of node-nlp.

- Ok, what about the NER?
You can use the NER directly from the NlpManager:

```javascript
const { NlpManager } = require('node-nlp');

async function main() {
  // forceNER activates the NER even when no entities are associated with intents.
  const manager = new NlpManager({ languages: ['en'], forceNER: true });
  // Each call registers one option of an enum entity together with its surface forms.
  manager.addNamedEntityText(
    'hero',
    'spiderman',
    ['en'],
    ['Spiderman', 'Spider-man'],
  );
  manager.addNamedEntityText(
    'hero',
    'iron man',
    ['en'],
    ['iron man', 'iron-man'],
  );
  manager.addNamedEntityText('hero', 'thor', ['en'], ['Thor']);
  manager.addNamedEntityText(
    'food',
    'burguer',
    ['en'],
    ['Burguer', 'Hamburguer'],
  );
  manager.addNamedEntityText('food', 'pizza', ['en'], ['pizza']);
  manager.addNamedEntityText('food', 'pasta', ['en'], ['Pasta', 'spaghetti']);
  // The utterance contains misspellings; the default NER threshold (0.8) tolerates such mistakes.
  const result = await manager.process('I saw spederman eating speghetti in the city');
  console.log(result);
}

main();
```

- This is not extracting the entities...
When you create the NlpManager, be sure to set forceNER to true. This activates the NER even if you don't have entities associated with intents.

```javascript
const manager = new NlpManager({ languages: ['en'], forceNER: true });
```

- The enum entity extraction is slow
By default the NER threshold is 0.8, which tolerates user "mistakes" when they write, but also makes identifying the entities more expensive. For now, until the performance of this process is improved, the workaround is to set the threshold to 1:

```javascript
const manager = new NlpManager({ languages: ['en'], forceNER: true, ner: { threshold: 1 } });
```

With the threshold set to 1, entities are matched exactly by looking words up in a dictionary, so the process can search over millions of possible values in milliseconds.
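
The difference in cost can be sketched in plain JavaScript. This is only an illustration of the idea, under the assumption that fuzzy matching has to compare the input against every dictionary entry while exact matching is a single hash lookup. The names `findExact` and `findFuzzy` and the accuracy formula are hypothetical, not part of NLP.js.

```javascript
// Illustrative only: exact lookup versus fuzzy scan. None of these names come from NLP.js.

// Classic dynamic-programming Levenshtein distance.
function levenshtein(a, b) {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i += 1) {
    const curr = [i];
    for (let j = 1; j <= b.length; j += 1) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

const heroes = ['spiderman', 'iron man', 'thor'];

// Exact matching: one hash lookup, independent of the dictionary size.
const dictionary = new Set(heroes);
function findExact(word) {
  const lower = word.toLowerCase();
  return dictionary.has(lower) ? lower : undefined;
}

// Fuzzy matching: every entry is compared with the word, so the cost
// grows with the dictionary size and the length of the strings.
function findFuzzy(word, threshold) {
  const lower = word.toLowerCase();
  let best;
  for (const entry of heroes) {
    const maxLen = Math.max(entry.length, lower.length);
    const accuracy = 1 - levenshtein(entry, lower) / maxLen;
    if (accuracy >= threshold && (!best || accuracy > best.accuracy)) {
      best = { entry, accuracy };
    }
  }
  return best;
}

console.log(findExact('spederman')); // undefined: the typo is not in the dictionary
console.log(findFuzzy('spederman', 0.8)); // finds 'spiderman' (one edit in nine characters)
console.log(findFuzzy('spederman', 1)); // undefined: nothing matches exactly
```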

- The builtin entity extraction is slow or crashes the process
By default this extraction is done with Microsoft Recognizers: https://github.com/microsoft/Recognizers-Text
It searches using complex regular expressions that are computationally expensive.
On Windows we found that some sentences can cause it to crash, mostly when using French.
One option is to use Duckling instead, but that requires a Duckling instance up and running, which you connect to through its API: https://github.com/axa-group/nlp.js/blob/master/docs/v3/builtin-duckling.md

- How do I use enum entities?
https://github.com/axa-group/nlp.js/blob/master/docs/v3/ner-manager.md#enum-named-entities

- How do I search for entities with regular expressions?
https://github.com/axa-group/nlp.js/blob/master/docs/v3/ner-manager.md#regular-expression-named-entities

- Which builtin (golden) entities can I extract?
https://github.com/axa-group/nlp.js/blob/master/docs/v3/builtin-entity-extraction.md

- I want to extract builtin (golden) entities but it only works in a few languages
You can use Duckling instead, but that requires a Duckling instance up and running, which you connect to through its API: https://github.com/axa-group/nlp.js/blob/master/docs/v3/builtin-duckling.md

- I want to go "lowlevel" to use only the Neural Network for classifying
Here you'll find the example code if you want only to tokenize: https://github.com/jesus-seijas-sp/nlp-course/blob/master/02-classifiers/05-nlpjs-classifier.js
Here you'll find the example code if you also want stemming:
https://github.com/jesus-seijas-sp/nlp-course/blob/master/02-classifiers/14-nlpjs-stemmer-classifier.js

- I want to use NGrams

Here you have an example of how to use ngrams by char and by word:

```javascript
const { NGrams } = require('@nlpjs/utils');
const fs = require('fs');

const gramsByChar = new NGrams();
const gramsByWord = new NGrams({ byWord: true, startToken: '[START]', endToken: '[END]' });

const input = 'one ring to rule them all';
const outputByChar = gramsByChar.getNGrams(input, 3);
const outputByWord = gramsByWord.getNGrams(input, 3);
console.log(outputByChar);
console.log(outputByWord);

const freqByChar = gramsByChar.getNGramsFreqs(input, 2, true);
console.log(freqByChar);

const lines = fs.readFileSync('./data/wikipedia_es.txt', 'utf-8').split(/\r?\n/);
console.log(lines);
const freqs = gramsByChar.getNGramsFreqs(lines, 3);
console.log(freqs);
```
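
If you just want to see what n-grams by char and by word are, the following plain JavaScript sketch computes them without any dependency. It is independent of @nlpjs/utils: the helper names (`windows`, `charNGrams`, `wordNGrams`, `nGramFreqs`) are hypothetical, and the way start and end tokens are padded here is an assumption, so the library output may differ in its details.

```javascript
// Illustrative only: plain n-gram extraction, independent of @nlpjs/utils.

// Slide a window of size n over a sequence of items.
function windows(items, n) {
  const result = [];
  for (let i = 0; i + n <= items.length; i += 1) {
    result.push(items.slice(i, i + n));
  }
  return result;
}

// Character n-grams: windows over the characters of the string.
function charNGrams(text, n) {
  return windows(Array.from(text), n).map((gram) => gram.join(''));
}

// Word n-grams: windows over whitespace-separated words, optionally padded
// with one start and one end marker (this padding scheme is an assumption).
function wordNGrams(text, n, { startToken, endToken } = {}) {
  const words = text.split(/\s+/).filter(Boolean);
  if (startToken) words.unshift(startToken);
  if (endToken) words.push(endToken);
  return windows(words, n);
}

// Frequency table for a list of character n-grams.
function nGramFreqs(grams) {
  const freqs = {};
  for (const gram of grams) {
    freqs[gram] = (freqs[gram] || 0) + 1;
  }
  return freqs;
}

console.log(charNGrams('one ring', 3));
// [ 'one', 'ne ', 'e r', ' ri', 'rin', 'ing' ]
console.log(wordNGrams('one ring to rule them all', 3));
// four windows, from [ 'one', 'ring', 'to' ] to [ 'rule', 'them', 'all' ]
console.log(nGramFreqs(charNGrams('banana', 2)));
// { ba: 1, an: 2, na: 2 }
```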

- I want a pattern corpus, I mean, to generate a full cartesian product corpus from sentences with different options
That is, from a sentence like "I [am having|have] a [problem|question|issue]" you want to generate all the possibilities: I am having a problem, I am having a question, I am having a issue, I have a problem, I have a question, I have a issue.
Here is some example code:

```javascript
const { composeFromPattern, composeCorpus } = require('@nlpjs/utils');
const corpusPattern = require('./data/corpus-en-pattern.json');

const input = 'I [am having|have] a [problem|question|issue] that I have to [solve|investigate]';
const result = composeFromPattern(input);
console.log(result);

const corpus = composeCorpus(corpusPattern);
console.log(JSON.stringify(corpus, null, 2));
```

To use with this example corpus:
```json
{
  "name": "Corpus Pattern",
  "locale": "en-US",
  "data": [
    {
      "intent": "eat",
      "utterances": [
        "I [usually|always] [like|love] to eat [pizza|spaghetti|burguer]"
      ],
      "tests": [
        "I [like|love] [pizza|burguer]"
      ]
    },
    {
      "intent": "investigate",
      "utterances": [
        "I [am having|have] a [problem|question] that I have to [solve|investigate]"
      ],
      "tests": [
        "I should [solve|investigate] that [problem|question]"
      ]
    }
  ]
}
```
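
For reference, the cartesian expansion of a single pattern can be sketched in a few lines of plain JavaScript. This is not the @nlpjs/utils implementation, and `expandPattern` is a hypothetical name. It only handles flat `[a|b]` groups, with no nesting or escaping.

```javascript
// Illustrative only: expands "[a|b]" option groups into every combination.
// This is not the @nlpjs/utils implementation.
function expandPattern(pattern) {
  // Split into literal text and bracketed groups, keeping the groups.
  const parts = pattern.split(/(\[[^\]]*\])/).filter((part) => part !== '');
  let results = [''];
  for (const part of parts) {
    const options = part.startsWith('[') && part.endsWith(']')
      ? part.slice(1, -1).split('|')
      : [part];
    // Cartesian product of the sentences built so far with the new options.
    const next = [];
    for (const prefix of results) {
      for (const option of options) {
        next.push(prefix + option);
      }
    }
    results = next;
  }
  return results;
}

console.log(expandPattern('I [am having|have] a [problem|question|issue]'));
// [ 'I am having a problem', 'I am having a question', 'I am having a issue',
//   'I have a problem', 'I have a question', 'I have a issue' ]
```

The number of generated sentences is the product of the group sizes, so a corpus with many groups per utterance grows quickly.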

- I want to calculate the levenshtein distance of two strings

Use the similarity function. The third parameter defaults to "false"; set it to "true" if you want both strings to be normalized.

```javascript
const { similarity } = require('@nlpjs/similarity');

console.log(similarity('potatoe', 'potatoe'));
console.log(similarity('potatoe', 'potatoes'));
console.log(similarity('potatoe', 'potsatoe'));
console.log(similarity('potatoe', 'poattoe'));
console.log(similarity('potatoe', 'postatoé', true));
console.log(similarity('potatoe', 'Postatoé', true));
```
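
To show what the distance and the normalization flag mean, here is a dependency-free sketch. The function names `normalize` and `editDistance` are hypothetical, and the normalization used here (lowercasing and stripping accents) is an assumption about what "normalized" means, not a description of the @nlpjs/similarity internals.

```javascript
// Illustrative only: Levenshtein distance with optional normalization.
// The normalization step (lowercase + strip accents) is an assumption.
function normalize(text) {
  return text.normalize('NFD').replace(/[\u0300-\u036f]/g, '').toLowerCase();
}

function editDistance(a, b, normalized = false) {
  const left = normalized ? normalize(a) : a;
  const right = normalized ? normalize(b) : b;
  // prev[j] is the distance between the processed prefix of left and the first j chars of right.
  let prev = Array.from({ length: right.length + 1 }, (_, j) => j);
  for (let i = 1; i <= left.length; i += 1) {
    const curr = [i];
    for (let j = 1; j <= right.length; j += 1) {
      const cost = left[i - 1] === right[j - 1] ? 0 : 1;
      curr[j] = Math.min(
        prev[j] + 1, // deletion
        curr[j - 1] + 1, // insertion
        prev[j - 1] + cost, // substitution (free when the characters match)
      );
    }
    prev = curr;
  }
  return prev[right.length];
}

console.log(editDistance('potatoe', 'potatoe')); // 0
console.log(editDistance('potatoe', 'potatoes')); // 1
console.log(editDistance('potatoe', 'poattoe')); // 2 (a swap counts as two substitutions)
console.log(editDistance('potatoe', 'Postatoé')); // 3
console.log(editDistance('potatoe', 'Postatoé', true)); // 1
```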

- Given a text, I want to find the substring that best matches a string

Use getBestSubstring from the ExtractorEnum class of @nlpjs/ner:

```javascript
const { ExtractorEnum } = require('@nlpjs/ner');

const text = 'Morbi ainterd multricies neque varius condimentum. Donec volutpat turpis interdum metus ultricies vulputate. Duis ultricies rhoncus sapien, sit amet fermentum risus imperdiet vitae. Ut et lectus';
const str = 'interdum ultricies';

const extractor = new ExtractorEnum();
const result = extractor.getBestSubstring(text, str);
console.log(result);
```

Author

labs20 commented Sep 22, 2020

=]

Thanks!!

@carlocadiz

How do I get Trim Named Entities working? When I try using const fromEntity = manager.addNamedEntity('fromEntity', 'trim'); I get TypeError: manager.addNamedEntity is not a function

Author

labs20 commented Aug 11, 2021

> How do I get Trim Named Entities working? When I try using const fromEntity = manager.addNamedEntity('fromEntity', 'trim'); I get TypeError: manager.addNamedEntity is not a function

I'm struggling with that too. Is it possible to find one full working example of how to use NER and all of its functions?

I can't find help for many of the functions, and all the examples are either outdated or incomplete.

Some functions have "options" as the last parameter, and even going through the source code I couldn't figure that out.

It looks like "addTrimEntity" no longer exists, and it appears that direct functions took its place, but I'm just guessing here.

Is there a working example on how to use "addBetweenCondition"?

Thanks


y-nk commented Feb 12, 2022

The first answer is amazing 👀 I've learnt a lot 🌶️. I was really looking to compute a corpus from code as well, thanks a lot.

The only missing piece is how to use Ner in v4, bc the example is given with v3 code sample. There's a nlp.ner instance attached with a bunch of methods, but tbh without types it gets pretty hard to know what to pass without docs :)

PS: this issue should be pinned.

PS2: const { composeFromPattern, composeCorpus } = require('@nlpjs/utils'); is not supported in browser. something with process.binding not supported

@Apollon77
Contributor

@carlocadiz @labs20 The methods are called addNER* in v4 ... just check the nlp class

@Apollon77
Contributor

I included the FAQ into my PR to add to the main Readme

Apollon77 added a commit to Apollon77/nlp.js that referenced this issue Aug 8, 2022
Contributor

aigloss commented Nov 25, 2022

Closing due to inactivity. Please, re-open if you think the topic is still alive.

@aigloss aigloss closed this as completed Nov 25, 2022
@aigloss aigloss closed this as not planned Nov 25, 2022