Deprecate crate categories and allow metadata on keywords #3488

Open: wants to merge 10 commits into master.

@clarfonthey (Contributor) commented Sep 10, 2023

Categories for crates are now deprecated and implicitly added as keywords instead. A new set of policies is added to allow the crates.io team to curate the way keywords are presented, replacing features such as the "Popular Categories" list on crates.io with a "Popular Keywords" instead.

Rendered

Special thanks to @Turbo87 from the crates.io team for cleaning up this RFC and adding extra context.

@ehuss added the labels T-cargo (relevant to the Cargo team, which will review and decide on the RFC) and T-crates-io (relevant to the crates.io team, which will review and decide on the RFC) on Sep 10, 2023
@epage (Contributor) commented Sep 11, 2023

imo there is value in having both closed and open vocabularies for finding content. I'd hate to deal with proliferation of terms to find cargo plugins.

@clarfonthey (Contributor Author) commented Sep 12, 2023

@epage I feel like this example is actually one in favour of removing the existing closed vocabularies.

For example, right now there's one category for cargo plugins, which contains over 350 crates. This suggests that cargo plugins should be split into multiple categories rather than a single one. But with the existing vocabulary, this isn't an option, since we can't retroactively remove categories. The other option is to allow arbitrarily nested categories, which can get quite messy.

So, instead, crates can keep the existing cargo plugins subtag (development-tools::cargo-plugins) but also add a new cargo plugins tag (cargo-plugins), which can be split into subtags (like cargo-plugins::testing) to make it easier to identify cargo plugins by purpose. Crates that adopt this new convention can identify themselves as cargo plugins while also differentiating themselves by purpose, something that couldn't be done under the old model.
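A minimal sketch of the subtag convention described above, assuming tags are expanded on the registry side by a hypothetical `expand_tag` helper (not part of cargo or crates.io): a tag with one `::` lists the crate under both the parent tag and the full tag, so `cargo-plugins::testing` also counts as `cargo-plugins`.

```rust
/// Hypothetical helper: expand a possibly nested tag into every tag the
/// crate should be listed under. `parent::child` yields both the parent
/// and the full tag; a flat tag yields only itself.
fn expand_tag(tag: &str) -> Vec<String> {
    match tag.split_once("::") {
        Some((parent, _child)) => vec![parent.to_string(), tag.to_string()],
        None => vec![tag.to_string()],
    }
}

/// Expand all tags of one crate, dropping duplicates while keeping order.
fn expand_all(tags: &[&str]) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for tag in tags {
        for t in expand_tag(tag) {
            if !out.contains(&t) {
                out.push(t);
            }
        }
    }
    out
}
```

With this reading, a crate tagged both `development-tools::cargo-plugins` and `cargo-plugins::testing` shows up under the old parent, the new parent, and both subtags.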

Sure, we could just spend hours fiddling with the categories whenever examples like this show up. But the point is that they will always show up, and I think it'd be better to decouple this fiddling from the actual technical standard rather than make it an inherent part of it.

@epage (Contributor) commented Sep 12, 2023

While the existing development-tools::cargo-plugins can be used as a tag in the future, we have no scaffolding in place to help these kinds of canonical tags emerge. Categories are already not that easy to discover and define in manifests, and keywords are even worse off. While it's not perfect, the closed vocabulary fills an important role, and I don't think we should remove it until we've filled that gap with a viable alternative.

@clarfonthey (Contributor Author)

Right. The point is that this RFC proposes that alternative.

Changing vocabulary is something that cannot and should not happen by decree. It has to happen organically in stages. Namely:

  1. People identify that an existing category is not sufficient.
  2. A small subset of people suggest a new category and start using it.
  3. A larger group of people start adopting that category until it reaches critical mass.
  4. The old category is officially replaced.

Since a crate can have multiple categories, this is something that's entirely doable without losing the old categorization. The difference here is that by making the canonicity of a category an emergent property (modified on the side of crates.io), and not one explicitly labelled (in Cargo.toml), we have the option of trying things out. Under the current system, we have either this bad option:

  1. People identify that an existing category is not sufficient.
  2. They make a PR adding a new one.
  3. People slowly start using the new one.
  4. The old category sits around forever, confusing everyone.

Or this bad option:

  1. People identify that an existing category is not sufficient.
  2. They add a keyword instead.
  3. People start using the keyword, until everyone is using it.
  4. A PR is made to instate the keyword as a category.
  5. Now the community has to keep using both the keyword and the category until the category is used by everyone.
  6. The old category still sits around forever.

This feels obvious to me, based upon the text of the RFC. Is it not? Have you read it?

@epage (Contributor) commented Sep 12, 2023

This is not suggesting what I feel is needed: guard rails to help in working towards de facto standardized tags. The problem is discoverability for crate authors. The RFC mentions noteworthy tags, which helps with some of the problem but not all. We're asking crate authors to do SEO, giving them a limited tool that is managed by a small group of people (crate authors with sway, and crates.io), and putting it outside of their workflow. I think it'd be good to analyze how we are doing on categories and keywords and find ways to improve them, as a testing bed for whether we can handle an exclusively open vocabulary.

@kornelski (Contributor) commented Sep 15, 2023

The current state of categories and keywords in crates is poor, but OTOH I don't think the root cause is in what they're called, or in which toml field they're specified.

When you have a mix of blessed and arbitrary tags, you still have categories and keywords, only merged into a single namespace, and you lose control over the category namespace.

The current categories are an odd bunch, with some large gaps, and easily confused siblings.

  • there's neuroscience, but no biology.
  • there's a bunch of categories for aerospace. There's nothing for IoT or home automation.
  • there's both localization and internationalization, which is a slim difference.
  • there's no category for fonts, chat clients or chatbots, antivirus software, pentesting software.
  • nearly everyone uses "parsing" when they should have used "parser-implementations".
  • there's "compilers", which is too narrow to cover non-Rust programming languages in general, interpreters, or shell scripting.
  • "algorithms" is an incredibly broad category. Lacks subcategories for hashing, random, search.
  • development-tools and command-line-utilities are also pretty broad (everything from cat to TUI Spotify clients is a CLI)
  • there's no place for applications with a GUI. "gui" is for toolkits, and categorizing by functionality often lacks specific categories.
  • there's computer vision, but not machine learning.
  • text-processing spans everything from markdown to nlp.
  • there's no specific place for clients for specific web APIs/services (a webby equivalent of external-ffi-bindings). the web-programming category is quite broad.

Few crates actually select categories, so browsing by category on crates.io gives an incomplete picture. However, keywords are also a mess! They're also pretty sparse.

Dealing with synonyms and normalization is laborious. There's data-structure, data-structures, datastructures, ["data", "structure"]. Normalization of these is not as easy as it seems, e.g. kebab-case makes iOS into i-os and SSDs into ssd-s. Some names and acronyms appear plural, but aren't: bio != bios, hdf != hdfs. On lib.rs I have 4300 synonyms, 7000 categorization exceptions, 2000 associations between keywords and categories.
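To make the normalization problem concrete, here is a small sketch (not lib.rs's actual code) of a naive kebab-case rule together with the two kinds of hand-maintained tables mentioned above: exceptions that bypass the rule, and synonyms that fold spelling variants into one canonical keyword. The function names and table contents are hypothetical.

```rust
use std::collections::HashMap;

/// Naive kebab-casing: lowercase everything and insert '-' wherever a
/// lowercase letter is followed by an uppercase one.
/// This is exactly the kind of rule that turns "iOS" into "i-os".
fn naive_kebab(raw: &str) -> String {
    let mut out = String::with_capacity(raw.len() + 4);
    let mut prev_lower = false;
    for c in raw.chars() {
        if c.is_ascii_uppercase() && prev_lower {
            out.push('-');
        }
        prev_lower = c.is_ascii_lowercase();
        out.push(c.to_ascii_lowercase());
    }
    out
}

/// Normalize one keyword: hand-curated exceptions win over the naive rule,
/// then a synonym table maps variants onto a canonical keyword.
fn normalize(
    raw: &str,
    exceptions: &HashMap<&str, &str>,
    synonyms: &HashMap<&str, &str>,
) -> String {
    if let Some(fixed) = exceptions.get(raw) {
        return fixed.to_string();
    }
    let kebab = naive_kebab(raw);
    match synonyms.get(kebab.as_str()) {
        Some(canonical) => canonical.to_string(),
        None => kebab,
    }
}
```

Without an `iOS` entry in the exceptions table, the naive rule produces `i-os`; every such exception has to be discovered and added by hand, which is where the thousands of entries come from.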

I've tried using StackOverflow's tag data, but they actually have very little data for tag synonyms, and their perspective is weirdly specific to their site, focused on SEO, .Net and webdev product names (webform -> winforms, jws -> java-web-start, ctrl-c -> copy-paste, span -> html). They maintain descriptions/wikis for their tags, so they can afford having ambiguous tag names.

Making good categories based on keywords is surprisingly hard. I thought it'd be easy to have everything categorized on lib.rs, but it's been an enormous time sink and endless whack-a-mole. "geo" is for geometry or geography, sometimes both. A crate tagged ["layout" "algorithm"] can be for GUI (widgets or window manager), or struct layout in rust-patterns or FFI, or be an app for visualising charts, or get lost in the bottomless pit of the generic "algorithm" tag. A crate tagged "database" can be a database engine, or a client for a general-purpose database, or a client for a dataset (postcode database), or a dataset itself (unicode property database). If you pick one broad tag, you have a mishmash. If you pick one specific tag, you have sparse data. If you try to pick combinations of tags, you have an endless list of missed cases, and exceptions to exceptions.


So overall I think this change is just rearranging fields, and improvement of categorization needs other work, that doesn't really require tags.

  • Many crates have zero categories and zero keywords. They won't have any tags either. There needs to be some mechanism to nudge authors to add metadata to their crates.

  • Authors seem to have trouble finding the right categories, and don't always pick the most appropriate ones. They seem to have difficulty understanding the difference between similar categories, or between the technical and general meanings of category names. accessibility is the worst example: it seems to be used for "I can access something with this crate". IMHO they'd benefit from more guidance, e.g. an interactive category picker that explains the terms or asks clarifying questions. StackOverflow has an interactive picker for their tags with descriptions, plus crowdsourced moderation. Unprivileged SO users can't pick arbitrary tags, so they're really categories in disguise.

  • crates.io would benefit from reorganizing and growing the categories in a systematic way. This is equally needed whether they're category slugs or blessed tags. Picking tags to bless is not that different from analyzing keywords to pick new categories.

@fbstj (Contributor) commented Sep 15, 2023

I don't know how relevant it is, but the tag curation effort that immediately came to mind whilst reading this is the AO3 tag wranglers, as they seem to have a "moderation team" style group dedicated to the curation of aliases and hierarchy.

I realise now that there's probably something more equivalent in the Wikipedia categorisation moderation space, which might be more applicable due to the public nature of Wikipedia governance?

@Turbo87 (Member) left a comment

I have a couple of suggestions but overall I'm very much in favor of this RFC.

I think combining a list of blessed (aka. "noteworthy") keywords with a system that allows users to create/use new keywords without interaction and discussion with the crates.io team will be very beneficial for everyone.

Having both concepts in the same package ecosystem has always confused me, and I'm very happy that we finally have a proposal to get rid of the confusion! :)

Comment on lines 38 to 39
* It's now possible to add a single double-colon inside a tag to indicate a parent tag. This has the effect of adding two tags to a crate: for example, adding the `development-tools::testing` tag adds the crate to both the `development-tools` tag and the `development-tools::testing` tag. The part after the double-colon is called a subtag.
* On each side of the double-colon, the length of text may be up to 25 characters. That means that, including the double-colon, tags can be up to 52 characters long. (This is to accommodate the longest category before the unification, `development-tools::procedural-macro-helpers`, although rounding up to a nice number for the actual limit.)
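The length rule quoted above fits in a few lines. `validate_tag` is a hypothetical function written only to illustrate the proposed limits: at most one `::`, and 1 to 25 characters per side, so at most 25 + 2 + 25 = 52 characters in total.

```rust
/// Limit from the quoted RFC text: up to 25 characters on each side of `::`.
const MAX_SIDE_LEN: usize = 25;

/// Hypothetical check of a tag under the proposed rules.
fn validate_tag(tag: &str) -> Result<(), String> {
    let sides: Vec<&str> = tag.split("::").collect();
    if sides.len() > 2 {
        return Err(format!("`{tag}` contains more than one `::`"));
    }
    for side in &sides {
        let len = side.chars().count();
        if len == 0 || len > MAX_SIDE_LEN {
            return Err(format!("`{side}` must be 1 to {MAX_SIDE_LEN} characters, got {len}"));
        }
    }
    Ok(())
}
```

The longest existing category, `development-tools::procedural-macro-helpers`, has sides of 17 and 24 characters and therefore passes.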
Member

I'm not convinced that we necessarily need the colon support in tags. If we attach that kind of meaning to it, it would probably have to be supported on the crates.io backend, and I see that as potentially challenging if we allow users to invent new tags.

I also don't really see the need for it since #development-tools::testing could be replaced by #development-tools #testing and most likely still work equally well.

Contributor

I feel like nesting is helpful for knowing which tags are intentional refinements of a parent tag when browsing blessed tags, like browsing the development-tools category.


crates.io can now add metadata to tags according to this policy. Immediately after the adoption of this RFC, this would likely start by converting the `categories.toml` configuration into a `tags.toml` configuration, with PRs affecting this file subject to the policy.

Tags with the same name as a crate may be automatically marked as "crate tags" if the majority of the crates with the tag have the tagged crate as a non-development dependency, optional or otherwise. At the discretion of the crates.io team, tags may be explicitly marked as non-crate tags, to account for cases of popular crates with generic names. This can help avoid marking a particular crate as "canonical" just because it's popular and has a generic name.
Member

I'm not sure I completely understand this part. Could you add an example for this?
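One possible reading of the crate-tag rule quoted above, sketched as an example. `CrateInfo`, `is_crate_tag`, and the opt-out list are hypothetical, and "majority" is taken to mean strictly more than half of the crates carrying the tag.

```rust
/// Hypothetical, simplified view of a published crate.
struct CrateInfo<'a> {
    name: &'a str,
    keywords: Vec<&'a str>,
    /// Normal and build dependencies, optional or not; dev-dependencies excluded.
    non_dev_deps: Vec<&'a str>,
}

/// `tag` is a "crate tag" if a crate with that name exists, the team has not
/// opted the tag out, and a strict majority of the crates carrying the tag
/// depend on that crate outside of `[dev-dependencies]`.
fn is_crate_tag(tag: &str, crates: &[CrateInfo<'_>], opted_out: &[&str]) -> bool {
    if opted_out.contains(&tag) || !crates.iter().any(|c| c.name == tag) {
        return false;
    }
    let tagged: Vec<_> = crates.iter().filter(|c| c.keywords.contains(&tag)).collect();
    let depending = tagged.iter().filter(|c| c.non_dev_deps.contains(&tag)).count();
    !tagged.is_empty() && depending * 2 > tagged.len()
}
```

For instance, with made-up data where three crates carry the keyword `serde` and two of them depend on the `serde` crate, the keyword counts as a crate tag unless the team has explicitly opted it out.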

@kornelski (Contributor)

because categories cannot be removed, they also cannot be renamed or otherwise curated by the community.

Categories can be removed and renamed. crates.io could ignore existence of slugs it doesn't want, and create aliases for category slugs it wants to rename. On lib.rs I've completely deleted external-ffi-bindings, and renamed math to science::math.
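A sketch of the normalization approach described here, assuming the registry keeps accepting old slugs but maps them when reading: removed slugs are dropped and renamed slugs are aliased to their new name. The function and both tables are hypothetical, mirroring the lib.rs examples in the comment (with `mathematics` standing in for the old math slug).

```rust
use std::collections::HashMap;

/// Drop removed slugs, map renamed slugs to their new name, pass everything
/// else through unchanged, and keep only the first occurrence of each result.
fn normalize_categories(
    raw: &[&str],
    removed: &[&str],
    renamed: &HashMap<&str, &str>,
) -> Vec<String> {
    let mut out: Vec<String> = Vec::new();
    for &slug in raw {
        if removed.contains(&slug) {
            continue;
        }
        let slug = renamed.get(slug).copied().unwrap_or(slug);
        if !out.iter().any(|s| s == slug) {
            out.push(slug.to_string());
        }
    }
    out
}
```

Old manifests keep publishing unchanged; only the displayed result differs, which is the trade-off debated further down in this thread.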

Tags that are deemed "noteworthy" can have metadata added to them. A tag is noteworthy if it:
@kornelski (Contributor) commented Sep 15, 2023

crates.io choosing which tags are "noteworthy" doesn't seem all that different from crates.io choosing which categories can be created.

To me this seems to be mainly a matter of creating a policy and a process, whether you call that categories or noteworthy tags.

New categories could be picked based on popularity of keywords.
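Picking candidates by popularity could start from simple usage counts. This hypothetical `noteworthy_candidates` sketch, with an arbitrary threshold, counts how many crates use each keyword and returns those at or above a minimum, most used first; the actual choice would remain a policy decision by people.

```rust
use std::collections::HashMap;

/// Count keyword usage across crates (assumes each crate lists a keyword at
/// most once) and return keywords used by at least `min_crates` crates,
/// sorted by count descending, then alphabetically for deterministic output.
fn noteworthy_candidates<'a>(crate_keywords: &[Vec<&'a str>], min_crates: usize) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for keywords in crate_keywords {
        for &kw in keywords {
            *counts.entry(kw).or_insert(0) += 1;
        }
    }
    let mut picked: Vec<(&'a str, usize)> =
        counts.into_iter().filter(|&(_, n)| n >= min_crates).collect();
    picked.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    picked
}
```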

Contributor Author

Right, the main issue with that model is that if new categories are picked based upon keywords, users now have to upload new versions of their crates that use those categories. If keywords themselves can be treated as categories, those changes are applied retroactively.

Contributor

Only so far as they picked the "winners". With all of the different cases, etc., I still see all but a select few influential packages as having to make a change and do a new release.

bors added a commit to rust-lang/cargo that referenced this pull request Sep 19, 2023
doc/reference/manifest: Adjust `keywords` description

This adjusts the naming rules for keywords to match the implemented reality:

https://github.com/rust-lang/crates.io/blob/aab95692baa0dd2374a2ab5cb2cb2d89d7b2a2eb/src/models/keyword.rs#L56-L64

see also:

- rust-lang/rfcs#3488 (comment)
- rust-lang/rfcs#3488 (comment)
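The naming rules this commit documents can be expressed as a small predicate. This is a sketch based on the rules as Cargo's manifest reference states them (ASCII, at most 20 characters, first character alphanumeric, otherwise only letters, numbers, `_`, `-` or `+`); the linked `keyword.rs` remains authoritative, and `is_valid_keyword` is a hypothetical name.

```rust
/// Assumed maximum keyword length, per the Cargo manifest reference.
const MAX_KEYWORD_LEN: usize = 20;

fn is_valid_keyword(kw: &str) -> bool {
    let mut chars = kw.chars();
    // The first character must exist and be an ASCII letter or digit.
    let first_ok = match chars.next() {
        Some(c) => c.is_ascii_alphanumeric(),
        None => false,
    };
    first_ok
        && kw.len() <= MAX_KEYWORD_LEN
        && chars.all(|c| c.is_ascii_alphanumeric() || matches!(c, '_' | '-' | '+'))
}
```

A nested category slug such as `development-tools::testing` fails this check because `:` is not an allowed character, which matters for any plan to turn categories into keywords.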
@Turbo87 (Member) commented Sep 29, 2023

we talked through this RFC in the crates.io team meeting today. I'll try to summarize:

  • the team is in agreement that we should keep keywords as they are instead of introducing a new "tags" concept.
    • introducing a whole new concept means a lot of additional work for the crates.io team, but also on the cargo side. since the benefits from this new concept are relatively slim, we would prefer to avoid that.
    • it would be great if the RFC could be amended accordingly, so that we have a place to discuss the finer implementation details.
  • the team is generally in favor of dropping categories, though some of the migration details will have to be discussed before we can move this RFC forward.
    • when people publish their crates, do we want to automatically convert the categories values to keywords for them? do we just ignore the categories?
    • how do we want to handle existing crates that only have categories, but no keywords? how many such crates are there?
    • do we need to increase the limit on the number of keywords per crate?
  • one internal implementation detail that was not mentioned in the RFC: the crates.io codebase is currently using the ltree PostgreSQL extension to implement nested categories. when moving away from nested categories towards flat keywords our code to support this would become quite a bit more manageable :)
  • regarding the work involved in curating the list of noteworthy tags vs. categories and any potential bias there: the major difference is that people can start to use keywords before they become noteworthy, while with categories they only become usable once the crates.io team has added them to the database. this makes it possible for us to retroactively promote keywords to be noteworthy after we have seen them actually used a lot by crate authors. this can avoid situations where e.g. https://crates.io/categories/science::neuroscience only has a single crate.
  • with the crates.io team in favor of the change, it is up to the cargo team to raise any objections from their side. with the change of keeping keywords, and not introducing "tags", the work on the cargo side should be relatively small though.
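To illustrate the migration questions above, here is one possible conversion, sketched under explicit assumptions: existing keywords take priority, each category is split on `::` into separate keywords, duplicates are dropped, and a limit of 5 keywords per crate (assumed to match the current crates.io limit) is applied, with any overflow reported rather than silently discarded. `merge_categories` is a hypothetical name, not a cargo or crates.io function.

```rust
/// Assumed per-crate keyword limit.
const MAX_KEYWORDS: usize = 5;

/// Returns (keywords kept, keywords that did not fit under the limit).
fn merge_categories<'a>(keywords: &[&'a str], categories: &[&'a str]) -> (Vec<String>, Vec<String>) {
    let mut merged: Vec<String> = Vec::new();
    let mut dropped: Vec<String> = Vec::new();
    // Existing keywords first, then every `::` component of each category.
    let candidates = keywords
        .iter()
        .copied()
        .chain(categories.iter().flat_map(|c| c.split("::")));
    for kw in candidates {
        let seen = merged.iter().chain(dropped.iter()).any(|k| k == kw);
        if seen {
            continue;
        }
        if merged.len() < MAX_KEYWORDS {
            merged.push(kw.to_string());
        } else {
            dropped.push(kw.to_string());
        }
    }
    (merged, dropped)
}
```

This sketch does not re-check the keyword character or length rules; a category such as `command-line-utilities` (22 characters) would exceed a 20-character keyword limit, another detail a real migration would have to settle.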

@rust-lang/crates-io I hope I summarized this correctly. If not, please let me know :)

@clarfonthey (Contributor Author)

Thank you for the detailed feedback! I'll try to address these over the weekend and remove the new "tags" in favour of just keeping keywords.

In terms of migrating categories over, my thought process was that categories would be implicitly treated as keywords, but we can work out the details on that after the revisions.

@clarfonthey clarfonthey changed the title Unify crate categories and keywords as tags Deprecate crate categories and allow metadata on keywords Sep 30, 2023
@clarfonthey (Contributor Author)

Okay, the big revision is done, and tags are no more, replaced with keywords. Please let me know if anything seems off; I may have overzealously search-replaced "tags" with "keywords" in ways that might not make sense.

Subtags are now "commonly paired" keywords, which allow the crates.io team to create some kind of keyword structure without there actually having to be one. There are probably still details to flesh out, but that's what the RFC process is for.
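One way "commonly paired" keywords could be derived is from co-occurrence counts. In this hypothetical sketch, `commonly_paired` counts how often each other keyword appears on the same crate as a given keyword and keeps the partners that co-occur at least `min_pairs` times; the threshold and any manual curation on top are assumptions.

```rust
use std::collections::HashMap;

/// For crates that carry `keyword`, count every other keyword on the same
/// crate; keep partners seen at least `min_pairs` times, most frequent first.
fn commonly_paired<'a>(crate_keywords: &[Vec<&'a str>], keyword: &str, min_pairs: usize) -> Vec<(&'a str, usize)> {
    let mut counts: HashMap<&'a str, usize> = HashMap::new();
    for keywords in crate_keywords.iter().filter(|kws| kws.contains(&keyword)) {
        for &other in keywords {
            if other != keyword {
                *counts.entry(other).or_insert(0) += 1;
            }
        }
    }
    let mut paired: Vec<(&'a str, usize)> =
        counts.into_iter().filter(|&(_, n)| n >= min_pairs).collect();
    // Ties are broken alphabetically so the output is deterministic.
    paired.sort_by(|a, b| b.1.cmp(&a.1).then(a.0.cmp(b.0)));
    paired
}
```

Because no hierarchy is stored, the pairing can change as usage changes, which is the point of making structure an emergent property.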


Recently, [a discussion](https://github.com/rust-lang/crates.io/discussions/6762) was opened in the crates.io repository on whether the cryptocurrencies category should be removed from crates.io, due to the plethora of issues surrounding them. This should not be treated as a reason for adopting the RFC (although it was a motivation to write it), but instead as something that brought up the fact that categories cannot be removed by policy.

Because categories cannot be removed, they also cannot be renamed or otherwise curated by the community. However, by switching to keywords, we can effectively solve this problem; community members can simply start publishing their crates under different keywords and older, unsupported crates wouldn't have any issue remaining published under their older versions.
@epage (Contributor) commented Oct 3, 2023

@kornelski commented earlier

Categories can be removed and renamed. crates.io could ignore existence of slugs it doesn't want, and create aliases for category slugs it wants to rename. On lib.rs I've completely deleted external-ffi-bindings, and renamed math to science::math.

(moved here to avoid it getting lost in the noise)

Contributor Author

Right, although crates.io still ultimately has to accept categories that were valid in the past as valid in the future, meaning they're not really removed. Like, I don't think it's very good UX to just passively ignore a category when uploading to crates.io, since the user added that for a reason and we shouldn't just get rid of it. And if we actually stopped builds because of it, that'd be breaking backwards compatibility.

Contributor

More so my point is that this motivation doesn't seem valid. The RFC doesn't address the alternative that @kornelski mentions: of having category merges, renames, and removals through a normalization process.

Like, I don't think it's very good UX to just passively ignore a category when uploading to crates.io, since the user added that for a reason and we shouldn't just get rid of it

That doesn't mean it can't be done but might be a downside to this alternative. That is something for the approving team to weigh in on with seeing your RFC and not for you to evaluate and leave out. Personally, I would be in favor of such an approach though it isn't my call.

@Turbo87 (Member) commented Oct 5, 2023

I tend to agree with @kornelski and @epage. While the crates.io team has never deleted a category so far, it is technically possible with some effort to do so.

Suggested change
Because categories cannot be removed, they also cannot be renamed or otherwise curated by the community. However, by switching to keywords, we can effectively solve this problem; community members can simply start publishing their crates under different keywords and older, unsupported crates wouldn't have any issue remaining published under their older versions.

Probably easiest to just remove this part :)

Edit: looking at the surrounding diff that suggestion might not be the best..... 😅

# Guide-level explanation
[guide-level-explanation]: #guide-level-explanation

Categories in `Cargo.toml` are now deprecated. Setting `categories` will now trigger a warning and suggest to use `keywords` instead. To ensure that crates still build correctly in these cases, the provided `categories` are implicitly converted into `keywords`.
Contributor

To ensure that crates still build correctly in these cases, the provided categories are implicitly converted into keywords.

This makes it sound like by adding a deprecation warning, crates using categories will no longer be able to build, so cargo must move categories to keywords.

However, a warning should not affect building. I can see crates.io doing the conversion on their side, since they have to deal with a variety of cargo versions. I can also see us removing support for package.categories in an edition, with cargo fix automatically migrating people. Those are both a little different from how I read this.

@clarfonthey (Contributor Author) commented Oct 4, 2023

Right, so, I definitely over-simplified in the guide-level explanation, and will try and take a look at it later to see how I can reword it. It should hopefully be clear from the reference explanation.

I don't really think we should ever remove the categories from the manifest, since the edition system would force us to parse it anyway, and it's not like we really gain anything by forcing crates past a certain edition to not use it. We could make the theoretical lint that triggers deny-by-default, but we also don't need an edition bump to do that either.

I do think that the potential to cargo fix is on the table, though, assuming we go through with this.

Ultimately, we don't want to break old builds, so, the best we can do is implicitly convert to keywords and tell people to not use categories. Breaking builds would be breaking backwards compatibility, after all, and that's not allowed.

Contributor

It's fine for the guide to simplify things; it's just that the guide made it sound like this was the only place explaining a part of what is happening.

I don't really think we should ever remove the categories from the manifest, since the edition system would force us to parse it anyway, and it's not like we really gain anything by forcing crates past a certain edition to not use it. We could make the theoretical lint that triggers deny-by-default, but we also don't need an edition bump to do that either.

If we are completely changing the interpretation of categories and deprecating it (warning maybe cargo fix), then I think it is worth considering migrating people to avoid confusion ("I have the categories field, where does it show up?", "I'm looking at categories, but what does it mean?").

Ultimately, we don't want to break old builds, so, the best we can do is implicitly convert to keywords and tell people to not use categories. Breaking builds would be breaking backwards compatibility, after all, and that's not allowed.

Except, what build breakage are you avoiding? That was mentioned in the RFC and in the reply, but there isn't any explanation as to what you are referring to. Are you saying we can't remove the field altogether? Or something else? What made this more confusing is that the concern about "build breaking" is tied to the registries implicitly converting.

# Summary
[summary]: #summary

Categories for crates are now deprecated and implicitly added as keywords instead. A new set of policies is added to allow the crates.io team to curate the way keywords are presented, replacing features such as the "Popular Categories" list on crates.io with a "Popular Keywords" instead.
Member

I'm wondering if we should split this RFC up into two RFCs... 🫣

The current one is conflating 1) crates.io no longer showing any categories on the webpage and ignoring them for published crates, and 2) deprecating the categories field in Cargo.toml for all registries.

The first one is more specific to crates.io, while the second has larger consequences also for third-party registries and cargo.

While there might be some overlap I'm wondering if the first part would be easier to get consensus on, and then the second part may need more discussion, also with e.g. some third party registry maintainers.

Disclaimer: I'm not saying that we absolutely have to split the RFC up, but I'm wondering what others think about that. It might also resolve the "what team is responsible for this RFC" question, with crates.io being responsible for the first part and the cargo team for the second part?

Contributor

For me, if we split it, it would be for procedural reasons but that they should be approved hand-in-hand.

I feel that approving one ahead of the other puts an innate pressure on the other team to approve something.

Using an extreme to illustrate my point: Say the cargo team put up and approved a RFC for removing categories from manifests all by itself. In a way that would compel the crates.io team to do something.

Comment on lines 29 to 31
1. Instead of being gatekept by a PR to the crates.io repository, categories can organically be adopted by community members in the form of keywords. Which keywords are most popular and useful can be decided organically.
2. Keywords can still be given descriptions and other metadata on crates.io, although no distinction between these "special" keywords and other keywords is made in cargo itself. This allows making changes to the way crates are presented without having to worry about backwards compatibility.
3. Adding, removing, and modifying the curated set of keywords is no longer a technical choice, but a cultural one.
Member

we might want to include some of the points I mentioned in #3488 (comment) as motivations too

@clarfonthey (Contributor Author)

Thanks to @Turbo87 we now have a completely revised version of this RFC which adds historical context and some more analysis. It's still probably going to be changed before the final version, but hopefully things are a lot clearer now!

to the one the user is currently looking at can help with this though.

- **Clarity:** In a flat namespace the chance of a single keyword being used for
multiple purposes is slightly increased. With nested hierarchies, the parent
Contributor

Personally, this reads as having a lot of what look like opinions that bias the reader to the opinion of the author.

Contributor Author

I'm not quite sure what you mean here. It is difficult to be truly neutral in the analysis (especially since the RFC is in favour of a particular solution) but I do think that if we can change the wording to be more neutral, that's a good idea.

Comment on lines 264 to 265
anymore. Eventually the crates.io server will also start to ignore the `categories`
field for new uploads and remove the existing categories data from the database.
Contributor

Ignoring categories is a silent breaking behavior change.

Contributor Author

This is mostly why I originally proposed treating categories as keywords, but after @Turbo87 did some research, it seems this would most likely just lead to a lot of duplicates, since users who set categories often set keywords too, and they don't necessarily use the same terms for each.

That is strictly a change in behavior, though I'm not sure it counts as a breaking change, since we haven't really defined what a breaking change means in these cases. Do you simply classify it that way, or are there specific cases where this can cause crate builds to break?

Comment on lines 358 to 361
- Should the `categories` field be deprecated in cargo too?
- Are third-party registries using categories?
- How much usage of cargo is with crates.io vs. third party registries?
- Would users be confused if it was only deprecated on crates.io but not in cargo?
Contributor

imo, if we move forward with this RFC, we should deprecate the field in cargo too, and that deprecation should be in an adjoining RFC that gets approved together with this one. We can transition out categories on an Edition boundary, having `cargo fix --edition` do the merge in `Cargo.toml`.

Contributor Author

@clarfonthey Oct 28, 2023

I think that moving the true "removal" of categories (in the sense that Cargo no longer sends them) to an edition boundary would be okay, although I'm not quite sure how editions would work here, since registries, including crates.io, aren't really controlled by editions. Since crates.io would effectively be ignoring categories down the line, this would mostly just require users to stay on an older edition to use categories on their registries, which feels like a weird thing to support.

In terms of splitting the RFC: which parts exactly do you think would be better suited to an adjacent RFC? The main reason for combining the two is that, if the plan is to coordinate merging both at the same time, there doesn't seem to be much benefit to splitting them, but I'm also not 100% sure what split you're thinking of.

Regardless, I think the deprecation of categories should at least be accompanied by an increase in the number of allowed keywords, to ensure that users can still adequately tag their crates. We could potentially discuss some of the other features separately, though.

@epage
Contributor

epage commented Oct 27, 2023

Personally, I think this is still premature, and we should try out some of the ideas mentioned here to vet them before we commit to relying on them to make up for the loss of a closed vocabulary.

@clarfonthey
Contributor Author

I definitely think there's room for a "soft" rollout of these features after the RFC is merged, since by their nature they depend on the community using them. Even if we accepted this RFC as-is, I don't think we'd be locked into the approach, although I'm not sure what the cargo and crates.io teams' general policies are for experimenting before an RFC is accepted.

I think experimentation is especially important here, since we don't know exactly how the community will respond to the changes, but the primary idea (replacing a closed vocabulary with a curated open one) is still there. What kind of vetting do you think should be required to make a change like this?

Labels
T-cargo Relevant to the Cargo team, which will review and decide on the RFC. T-crates-io Relevant to the crates.io team, which will review and decide on the RFC.
Projects
Status: RFC needs review
Archived in project
Status: Unreviewed

7 participants