Any interest in loading emoji data from unicode data files? #145

yob · 2020-12-29T14:23:41Z

I was interested in using emojis released over 2019/2020, but I wasn't sure of the correct way to edit the data files, so I created a fork that loads emoji data from Unicode Consortium data files.

A nice side-effect of this approach is updating to support newly released emoji looks like this:

$ npm update emoji-datasource
$ npm run gen-emojis
$ git commit [email protected]/data/emoji.js -m "updated emoji data"

A downside is customising keywords and names gets harder (and in my fork at least, I've skipped any customisations to keep things simple).

It works pretty well, however it's a hacky solution that works for me and I didn't put much consideration into making it suitable for merging upstream. I'd be happy to polish it up and help resolve issue #28 if you're interested? If so I'd love some guidance on your preferred approach. It's also completely fine if this approach doesn't work for you, I'm happy to run a customised fork for now.

Thanks for a great extension 🫀🪅🪠🪃

xurizaemon · 2021-11-19T19:02:35Z

It would be a shame to lose the customised naming. It would be great to make aliasing / customised naming easier and simplify updating with new emoji too!

Can we lay our own data over the top, starting with "official" emoji & names from the built data and overlaying useful extras like categories and aliases already in this package? Then the autopopulated data / default emoji names can be safely updated beneath the Emoji Selector additions, and we get best of both worlds.

{
  "😀": { "categories": ["people"], "aliases": ["grinning face", "grin"] },
  "😁": { "categories": ["people"], "aliases": ["grinning face with smiling eyes", "grin", "smile"] },
  "🤡": { "categories": ["people", "git"], "aliases": ["clown", "mocking"] },
  "🚨": { "categories": ["whatever", "git"], "aliases": ["police car light", "police", "revolving light", "rotating light", "linter", "tests"] }
}

(NB: Quick examples above only - I do recall from #80 that we don't want a category just for emoji commit things and am not proposing a category change here)
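To make the overlay idea concrete, here is a minimal sketch in Python, assuming both layers are plain dictionaries keyed by the emoji character. The function name `overlay_emoji_data`, the `name` field and the sample "official" entries are hypothetical illustrations, not the extension's actual data format or the Unicode files' layout.

```python
from copy import deepcopy

def overlay_emoji_data(official, custom):
    """Return a copy of the official data with custom categories and
    aliases appended on top. Hypothetical helper, not part of the extension."""
    merged = deepcopy(official)  # leave the generated base data untouched
    for emoji, extras in custom.items():
        # Emoji missing from the base data still get an entry, so custom
        # additions are never silently dropped.
        entry = merged.setdefault(emoji, {"name": None, "categories": [], "aliases": []})
        for field in ("categories", "aliases"):
            existing = entry.setdefault(field, [])
            for value in extras.get(field, []):
                if value not in existing:  # avoid duplicates across layers
                    existing.append(value)
    return merged

# Illustrative sample data only (assumed shape).
official = {
    "🤡": {"name": "clown face", "categories": ["people"], "aliases": ["clown face"]},
    "🚨": {"name": "police car light", "categories": ["travel"], "aliases": ["police car light"]},
}
custom = {
    "🤡": {"categories": ["people"], "aliases": ["clown", "mocking"]},
    "🚨": {"aliases": ["linter", "tests"]},
}

merged = overlay_emoji_data(official, custom)
print(merged["🚨"]["aliases"])
```

Because the base layer is only ever read, it could be regenerated whenever new emoji ship without touching the custom file.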

maoschanz · 2021-11-20T07:15:29Z

Oh sorry it looks like i didn't see this issue

That's a great idea and i had plans to do something similar, however:

as you say there is the customization problem, and i should change the code handling it before merging your approach
this data source has numerous languages, which is a great opportunity to finally translate the keywords
...so it would be great if the extension uses a default "english" file, but is able to generate the data for emojis in any languages
to do this on the end-user machine, i think it shouldn't use insane bloatware like npm

xurizaemon · 2021-11-20T08:04:18Z

> to do this on the end-user machine, i think it shouldn't use insane bloatware like npm

Would there be any reason to run those commands on the end user machine? GitHub Actions or a developer task could do that occasional work when updates ship, and trigger a release, i believe.

maoschanz · 2021-11-20T11:04:57Z

a single one of these files is already quite big, so what would the size of the entire extension be if i ship all the possible translations? An extension shouldn't be dozens of megabytes big.

Also, a big potential pro of relying on an external data source would be that users don't have to wait for updates from me when new emojis are released by unicode

zelch · 2022-12-05T08:25:36Z

I note that this discussion has been idle for the past year, but I'm pretty interested in it.

From a 'how' perspective, I suggest that we start with GitHub Actions, and then figure out how to optimize the process and experience once we know what the download sizes, processed file sizes, processing time, and the like work out to be.

On the source side, hashes that start with the unicode character for the emoji, and then contain things like what language the entry is in, categories, aliases, etc, would make it pretty easy to take the current list of stuff, and future customizations, and merge them with the upstream unicode data.

What that turns into after processing could easily be the same structures that we have today, or something else. What makes sense for the compilation of the data, and what makes sense for using the data, are almost exact opposites. After all, the aliases are what we want to search by, not the raw unicode of the emoji.
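As a rough illustration of that split between the compile-time shape and the lookup shape, the sketch below (Python, assumed data layout) inverts emoji-keyed entries into a word-to-emoji search index. `build_search_index` and the sample entries are hypothetical and do not reflect the structures the extension uses today.

```python
from collections import defaultdict

def build_search_index(compiled):
    """Invert emoji-keyed entries into a lowercase word -> [emoji] index.
    Hypothetical helper for illustration only."""
    index = defaultdict(list)
    for emoji, entry in compiled.items():
        for alias in entry.get("aliases", []):
            # Index every word of a multi-word alias so partial queries match.
            for word in alias.lower().split():
                if emoji not in index[word]:
                    index[word].append(emoji)
    return dict(index)

# Emoji-keyed source data, in the shape used for merging (assumed).
compiled = {
    "😀": {"aliases": ["grinning face", "grin"]},
    "😁": {"aliases": ["grinning face with smiling eyes", "grin", "smile"]},
}

index = build_search_index(compiled)
print(index["grin"])
```

The emoji-keyed form stays convenient for merging upstream data with customizations, while the inverted form is what a search box would query.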

Thoughts?

maoschanz · 2022-12-08T10:33:29Z

sorry, as you point out i didn't have any thought about any of this for the past year, and i will need to go back to it before saying anything

i need the silly 🥸 emoji so bad so i think i'll do it this winter
