Wikidata tool request: Users can enter an ISBN to see if a Wikidata item exists; if not, the tool creates a new one from the available data (works like the 'Cite' tool on Wikipedia)
Open, Needs Triage, Public

Description

I would really love a tool that works a bit like 'Cite' on Wikipedia: it can tell you whether there is already an item for a publication with a given ISBN and, if not, it fetches the data to make a new item for that edition. The process of using it would be something like:

  1. The user enters an ISBN on a page, and the tool says whether an item on Wikidata has this ISBN.
  2. If an item already uses this ISBN, the tool suggests extra statements, which the user adds with a click (using the FRBR model). The tool gets the data from the same place the Cite tool on Wikipedia does, and users can also add further statements by hand.
  3. If no item uses this ISBN yet, the tool prefills a form for the new item, which the user clicks to create (using the FRBR model structure). It then looks for any item on Wikidata with the same name and author to find other editions, and connects them using whichever statement is needed.
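
Step 1 of the process above could be sketched roughly as follows. This is a minimal Python sketch, not an existing tool: `normalize_isbn` and `build_lookup_query` are hypothetical helper names. It assumes the lookup would run against the Wikidata Query Service and that editions carry their ISBN in P212 (ISBN-13), often stored with hyphens, which is why the query strips hyphens before comparing. Items that only have P957 (ISBN-10) would be missed, and a filter like this may well be too slow for the public endpoint.

```python
import re


def _isbn13_check_digit(first12: str) -> str:
    """Compute the ISBN-13 check digit (weights alternate 1, 3, 1, 3, ...)."""
    total = sum(int(ch) * (1 if i % 2 == 0 else 3) for i, ch in enumerate(first12))
    return str((10 - total % 10) % 10)


def normalize_isbn(raw: str) -> str:
    """Strip hyphens and spaces, validate the checksum, and return a bare ISBN-13."""
    digits = re.sub(r"[\s-]", "", raw).upper()
    if re.fullmatch(r"[0-9]{9}[0-9X]", digits):
        # ISBN-10 checksum: weights 10 down to 1, 'X' stands for 10, sum divisible by 11.
        total = sum((10 - i) * (10 if ch == "X" else int(ch)) for i, ch in enumerate(digits))
        if total % 11 != 0:
            raise ValueError(f"bad ISBN-10 checksum: {raw!r}")
        core = "978" + digits[:9]
        return core + _isbn13_check_digit(core)
    if re.fullmatch(r"[0-9]{13}", digits):
        if _isbn13_check_digit(digits[:12]) != digits[12]:
            raise ValueError(f"bad ISBN-13 checksum: {raw!r}")
        return digits
    raise ValueError(f"not an ISBN-10 or ISBN-13: {raw!r}")


def build_lookup_query(isbn13: str) -> str:
    """Build a SPARQL query for items whose P212 value matches, ignoring hyphens."""
    return f"""
SELECT ?item ?isbn WHERE {{
  ?item wdt:P212 ?isbn .
  FILTER(REPLACE(STR(?isbn), "-", "") = "{isbn13}")
}}
LIMIT 10
""".strip()


print(build_lookup_query(normalize_isbn("0-306-40615-2")))
```

Rejecting bad checksums before any lookup keeps mistyped ISBNs from ever reaching the "create a new item" branch.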

Questions

  1. Is this possible/realistic? Would be a really nice thing to have.
  2. Would this rely on Citoid being installed on Wikidata or could it get the info from somewhere else? (T199197)
  3. I'm sure there could also be an option to enter multiple ISBNs but I don't know what's the best layout for that.
  4. Also, how could I propose this as a project for the Wikimedia Hackathon? I don't know if there's a category for ideas from non-programmers.

Similar tools

  1. Magnus Manske built ISBN2wiki https://isbn2wiki.toolforge.org/

Tasks

  • Find which databases exist for ISBN lookup
  • Identify licenses for databases
  • Get clear community consensus on how to model something with an ISBN
  • Lots more steps (to be defined)

Thanks

Event Timeline

If someone ever develops this, it needs to be done with the greatest of care. If not, such a tool could produce excessive duplicate and wrongly modeled items, producing a huge amount of work for the community to clean up.

See https://www.wikidata.org/wiki/Wikidata:WikiProject_Books for some input on modeling. This community needs to be consulted.

  • Books need to conform to the FRBR description model, to match the level of data quality that libraries apply. This tool needs to make sure all necessary items at the work and edition/translation levels are checked and created if necessary.
  • Make sure end users, even beginners, correctly match existing authors, publishers, and locations of publishing. I'm actually not sure what would need to be done if new items need to be created at this level... do we even want that?
  • Avoid creation of duplicates (duplicate works, editions, authors, publishers, locations of publishing).
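
To make the work/edition split concrete, here is a rough Python sketch of how one book maps to two items. The class and function names (`Work`, `Edition`, `plan_items`) and the `NEW_WORK` placeholder are hypothetical. The property and class IDs reflect my understanding of current usage (P31 instance of, P629 edition or translation of, P212 ISBN-13, P50 author, P123 publisher, P577 publication date, P1476 title, Q47461344 written work, Q3331189 version, edition or translation) and should be checked against WikiProject Books before relying on them.

```python
from dataclasses import dataclass
from typing import Optional

NEW_WORK = "<new work item>"  # placeholder until the work item has a QID


@dataclass
class Work:
    title: str
    author_qid: str
    qid: Optional[str] = None  # None means no existing work item was found


@dataclass
class Edition:
    title: str
    isbn13: str
    publisher_qid: str
    publication_year: int


def plan_items(work: Work, edition: Edition) -> dict:
    """Return the statements needed at each level; nothing is written to Wikidata."""
    plan = {}
    if work.qid is None:
        # Work level: the abstract creation, which carries the author.
        plan["work"] = {"P31": "Q47461344", "P1476": work.title, "P50": work.author_qid}
    # Edition level: the concrete publication, which carries ISBN, publisher, date.
    plan["edition"] = {
        "P31": "Q3331189",
        "P629": work.qid or NEW_WORK,
        "P1476": edition.title,
        "P212": edition.isbn13,
        "P123": edition.publisher_qid,
        "P577": str(edition.publication_year),
    }
    return plan
```

The point of the sketch is only that the ISBN belongs on the edition item, never on the work, and that an existing work item must be reused rather than recreated. The reverse link from the work to its editions (P747) is left out for brevity.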

Other input very welcome. I'd rather see no tool than one that introduces large amounts of erroneous data.

Hi Sandra

I've updated the description to include the FRBR model as the basis for creating and matching items. Is it logical to say that new items based on an ISBN (or other similar ID) should have Instance='edition' in Wikidata?

One special case of this is publications of proceedings of scientific conferences. I intend to have a Wikidata backend for my Proceedings Title Parser at ptp.bitplan.com anyway, so I could add an "ISBN" input mode.

42nd European Conference on IR Research, ECIR 2020, Lisbon, Portugal, April 14–17, 2020, Proceedings, Part I

This special use case might be a good starting point for testing the feasibility of the more general approach of creating entries for any ISBN.

I have added a Wikidata ticket to the Proceedings Title Parser GitHub repository that is cross-linked to this one.

Hi Sandra

I've updated the description to include the FRBR model as the basis for creating and matching items. Is it logical to say that new items based on an ISBN (or other similar ID) should have Instance='edition' in Wikidata?

No, you're trying to simplify something that is not that simple.

On Wikipedia, which only uses text strings, I guess it's fine to take an ISBN number and then use a database to look it up and retrieve strings of text to build up a (single) reference. On Wikidata, where we are dealing with multilingual structured data, a 'book' is a cluster of sometimes dozens of multilingual layered entities.

To make it very clear, as I think I didn't succeed in doing that with my earlier comment: I think it is NOT a good idea to ask for a tool for this. It's not a simple matter that should be automated. This is stuff that needs knowledgeable human review.

If someone has an ISBN and wants to 'have that ISBN on Wikidata', they need to do at least the following checks (this list is non-exhaustive; I'm not a librarian, so I will probably miss important things and get some things wrong):

  • Is there already any work-level and/or edition-level item that corresponds with the book? If so, any new items need to correspond correctly with the existing ones. If not, new items need to be created for both levels.
  • A check needs to happen whether the ISBN is of a translation. Maybe we already have Wikidata items for the (original language) work, and/or for an edition of a translation in another language. If that is the case, correct connections need to be made.
  • The book needs *both* a work-level item and an edition-level item. (This is exactly what FRBR is about.)
  • Both the work item and the edition item need correct statements in the right place: author, publisher, date of publication and other metadata. I don't know of any centralized database that gets this right and from which we could retrieve it; this step probably needs deeper human research than just a check against any one database. WorldCat is (in my experience) pretty messy on this point, so please do NOT use WorldCat as a sole source.
  • If lookups on Wikidata need to happen for the right author and publisher, that needs to be done _with care_: if the author is, say, named John Johnson and we have dozens of people with a similar name, a thorough check is needed to make sure the right one is chosen. In my experience, it is always best to actually go on Wikidata and do a bit of (non-tool) research at this point, because there may be items for humans with just initials or with slightly different spellings that a tool can't catch.
  • If an item for the author/publisher is not found, should a new one be created? I'm not sure - there's so much opportunity for introduction of errors and duplicates here.
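
The author-matching concern in the checklist above can be illustrated with a small Python sketch. It is an assumption-laden illustration, not a proposal: `name_key` and `candidates_for_review` are hypothetical names, the item IDs are made-up placeholders, and the matching rule (same surname, same first initial) is deliberately crude. It catches "J. Johnson" next to "John Johnson", but it cannot catch different spellings or transliterations, which is why the output is a list for a human to check rather than a single answer.

```python
def name_key(name: str) -> tuple[str, str]:
    """Reduce a personal name to (surname, first initial), ignoring case and dots."""
    parts = name.replace(".", " ").split()
    if not parts:
        raise ValueError("empty name")
    return (parts[-1].casefold(), parts[0][0].casefold())


def candidates_for_review(target: str, labels: dict[str, str]) -> list[str]:
    """Return every item whose label could be the target author; never pick one."""
    key = name_key(target)
    return [item_id for item_id, label in labels.items() if name_key(label) == key]


# Placeholder IDs, not real Wikidata items.
labels = {
    "Q_A": "John Johnson",
    "Q_B": "J. Johnson",
    "Q_C": "Mary Johnson",
    "Q_D": "John Smith",
}
matches = candidates_for_review("John Johnson", labels)
if len(matches) != 1:
    print("Ambiguous or missing author; needs human review:", matches)
```

Even a single match would still deserve a manual look, since an item with a differently spelled label would not show up in the list at all.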

Let me state it very clearly: I'm not a fan of this task at all. I think describing books on Wikidata is simply too complex for a single-purpose tool. If describing books correctly is an intricate piece of reality that needs a few years of education in library science, then it's not something that we can magically simplify on Wikidata by 'toolifying' it.

Although I have never been able to use it due to bugs, SourceMD has a button to add books by ISBN. I do not know the details or the community discussions; I am leaving it here in case it is useful to anyone (screenshot: image.png).