With a custom dictionary, when the entire sentence is itself a dictionary word, the model outputs an unexpected segmentation. #466
Comments
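After applying the change above, the same code outputs the following: [['跟谁学']], {'word_cls': tensor([[[-4.0777e-01, 4.0268e-01, 1.2457e-01, -1.5838e- ... This feels more intuitive.
That line was probably written incorrectly, but it is not where the problem lies; the issue is caused by the word segmentation TAG.
You can try version 4.1.3.post1 shortly; I tested it and this issue has been fixed.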
ltp version: 4.1.1
Output:
([['跟', '谁', '学']], {'word_cls': tensor([[[-4.0777e-01, 4.0268e-01, 1.2457e-01, -1.5838e-01, 6.1400e-03 ...
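The expected result is that the whole sentence is treated as a single word. In the source, in the `maximum_forward_matching` method of `ltp.algorithms.Trie`, changing `while end <= text_len and curr_len < max_len:` to `while end <= text_len and curr_len <= max_len` gives the correct segmentation. In other words, change `<` to `<=`.
I am not sure whether the author designed it this way on purpose or whether it is a small bug.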
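To make the boundary condition concrete, here is a minimal sketch of greedy forward maximum matching over a plain set of words. It is not LTP's actual `Trie` code: the function name `forward_max_match`, the `inclusive` flag, and the assumption that `curr_len` starts at 1 and counts the characters of the current candidate are all hypothetical, chosen only to mirror the loop condition quoted above.

```python
def forward_max_match(text, words, inclusive=True):
    """Return (start, end) spans of dictionary words found greedily, left to right.

    Hypothetical illustration only; `inclusive` switches the loop bound
    between `curr_len <= max_len` and `curr_len < max_len`.
    """
    max_len = max((len(w) for w in words), default=0)
    text_len = len(text)
    spans = []
    start = 0
    while start < text_len:
        end = start + 1
        curr_len = 1  # assumed: length of the candidate text[start:end]
        longest = None
        while end <= text_len and (
            curr_len <= max_len if inclusive else curr_len < max_len
        ):
            if text[start:end] in words:
                longest = end  # remember the longest match seen so far
            end += 1
            curr_len += 1
        if longest is None:
            start += 1  # no dictionary word starts here
        else:
            spans.append((start, longest))
            start = longest
    return spans


custom_words = {"跟谁学"}
print(forward_max_match("跟谁学", custom_words, inclusive=True))
print(forward_max_match("跟谁学", custom_words, inclusive=False))
```

Under these assumptions, the strict `<` bound never tests a candidate whose length equals `max_len`, so the longest dictionary entry cannot be matched at all, while `<=` lets it through. Whether the real method initializes its counters the same way is not confirmed here.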