Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Reorganizing name files #426

Open
wants to merge 19 commits into
base: master
Choose a base branch
from
Open

Reorganizing name files #426

wants to merge 19 commits into from

Conversation

tukkek
Copy link
Contributor

@tukkek tukkek commented Jun 30, 2023

After #425, there are 9 files in humans/ relating to names, which is a substantial chunk of the folder. I've taken the liberty of reorganizing them in this PR rather than creating an issue first to discuss the changes, with a names/ sub-folder.

Pros:

  1. Proper structure (such as with data/games/rpg/ and data/words/literature/).
  2. Better discoverability (may be easy for users to not realize there are more name-related files when browsing humans/).
  3. More elegant file-names (see below).

Cons:

  1. Will break backwards compatibility (if someone is using the whole project as git-submodule, for example).

As per 3, I've taken the liberty to rename the files for a more elegant and cohesive naming scheme. for example:

  • firstNanes.json and lastNames.json to usa.json and usaSurnames.json
  • norwayFirstNamesBoys.json to norwayMale.json

I hope the benefits out-weight the compatibility issue, especially as it's safe to expect more name-related files to be added again in the future but I understand this might be a concern for some projects.

Here's a simpler view of the folder changes: https://github.com/tukkek/corpora/tree/names-folder/data/humans/names

@tukkek
Copy link
Contributor Author

tukkek commented Jun 30, 2023

Here's a simpler view of the folder changes: https://github.com/tukkek/corpora/tree/names-folder/data/humans/names

@hugovk
Copy link
Collaborator

hugovk commented Jun 30, 2023

Corpora doesn't make any compatibility promises, so I think this is okay?

Let's ask @dariusk :)

@tukkek
Copy link
Contributor Author

tukkek commented Jun 30, 2023

Corpora doesn't make any compatibility promises

A few more thoughts on this matter:

  1. This isn't an API change which requires logic updates and re-testing. Updating a JSON file's address should be trivial.
  2. It''s unlikely that anyone is doing git-submodule for a single file or two, which should be most use-cases for corpora. Even if they are, git sub-modules don't update automatically and end-users should be aware of issues when doing so.
  3. People hot-linking (such as this) could and should just as easily be doing so in a stable manner instead (this).

This is probably overkill argumentation just to get this issue merged but I'm hoping it may inform the project as a whole (as a data-focused project in comparison to a programming library API or similar). I do appreciate the discussion though, as it may ultimately affect end-users! Cheers to all.

@tukkek
Copy link
Contributor Author

tukkek commented Jul 6, 2023

Corpora doesn't make any compatibility promises

Since it's been a week with no opinions against (and considering previous discussion), do you think it'd be alright to merge?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants