Commons talk:Machine-readable data

From Wikimedia Commons, the free media repository

Very good

Very good addition. Microformats and other tags are sometimes added by people, but as the template changes and pieces are cut and pasted, the machine-readable data can get out of whack. We could use some documentation on how to test the correctness of those tags, and regular checks on a list of templates that are supposed to have them. We could also use some more information on why it matters and how people can benefit from it. See Template_talk:Artwork/Archiv/2011#Problem_with_microformats_in_artwork_template for related discussion of tags in other templates. --Jarekt (talk) 13:58, 6 March 2012 (UTC)

See Template_talk:Book#Machine-readable_data Jean-Fred (talk) 17:54, 8 March 2012 (UTC)

Commons namespace?

Should this perhaps be moved to Commons:Machine-readable data, since the "Help:" namespace is usually (but not always) reserved for more general MediaWiki-related help? - dcljr (talk) 15:33, 9 March 2012 (UTC)

✓ Done, indeed, thanks. Jean-Fred (talk) 22:04, 14 March 2012 (UTC)

Other similar page

I stumbled upon Commons:Machine readability as I was looking for this page. That page is from 2008 and, as far as I can see, contains incorrect, out-of-date information. Should we just replace it with a redirect to here, or is there something to be kept? /skagedaltalk 21:54, 27 March 2012 (UTC)

Agree, those should be merged. --Jarekt (talk) 13:04, 6 September 2012 (UTC)
✓ Done redirect. I updated this page as well, since that one referenced some fields this page didn't have; if anything else looks useful to merge, feel free. Rd232 (talk) 13:26, 6 September 2012 (UTC)
I was thinking about adding information about the {{Book}} template markup and the parallel markup scheme using en:microformats. There is also a discussion about adding <td> id tags to {{Creator}}. Once that is done we should document it too. --Jarekt (talk) 14:20, 6 September 2012 (UTC)

Readability from other projects

Just to underline that any "machine-readable data" is readily exported to any wiki project and can be read from the HTML of the File: page just like other local data (i.e. without any trick to work around the AJAX limitations of the "same origin policy"). I'm going to parse this data from it.wikisource, both to load it when creating new pages and to use it to align the data content of old pages. I opened two threads on wikitech-l and wikisource-l about this. --Alex_brollo Talk|Contrib 15:36, 30 August 2012 (UTC)

Indeed. I added that to the page. Jean-Fred (talk) 19:34, 30 August 2012 (UTC)

Template:Information#Microformat section contains information about the en:Microformat markup used by {{Information}}, {{Creator}} and possibly other templates. I think this information should also be included here. --Jarekt (talk) 13:04, 6 September 2012 (UTC)

Yes. Rd232 (talk) 16:56, 6 September 2012 (UTC)

<td> id attributes

A strange thing about the id attributes added to <td> HTML elements is that they are added to the <td> cell with the field name, not the one with the field content. This creates a problem for the {{Creator}} and {{Institution}} templates, which have some cells with field values but no field names (the "name" parameter) and some cells sharing a single name field (the "Date of birth/death" fields). What should be done with those? If I add the IDs to the value fields, then some templates will have them one way and some the other, which can be quite confusing. --Jarekt (talk) 02:50, 14 September 2012 (UTC)

I noticed that recently. It's very strange, and fixing it is a headache: changing it may break things that expect the ID to be attached to the field name. Really we need a whole new set of ID attributes, e.g. of the form FIELDNAME_value. Rd232 (talk) 05:46, 14 September 2012 (UTC)
That is a good idea. I will propose adding IDs with FIELDNAME_value names to the Creator and Institution templates and then start discussions about adding those to {{Information}}, {{Artwork}} and {{Book}}. That is going to be a lot of discussions. --Jarekt (talk) 12:37, 17 September 2012 (UTC)
Just to say that I wrote a jQuery "parser" to get data back from the Information and Book templates, even if they contain one or more Creator templates. It returns a JS object: the main key is the Book/Information fileinfotpl_key, and the contents are either a string or a nested object whose keys are fileinfotpl_creator_key. This structure is needed since multiple Creator templates produce homonymous IDs, i.e. the same IDs are produced for author and illustrator. I also added one more key, name, to the Creator data, since, strange to say, this datum has no ID of its own. So, if r is the name of the JS object, r["aut"]["name"] gives the name of the author, while r["book-illustrator"]["name"] gives the name of the illustrator. For now the HTML is stored as-is; some more parsing is needed to get "clean" data. You can find the test scripts in it:s:Utente:Alex brollo/ParsingHproduct.js. --193.43.176.15 15:01, 24 September 2012 (UTC)
193.43.176.15 (user:Alex brollo, I presume), you mentioned that the Creator name does not have its own ID. I can add that gender, nationality, occupations, sort key, and probably other data do not have them either. That is because that data is not stored in a separate <td> cell. Is there some other way to ID them? By the way, the Creator's name has a vCard.fn tag; is that useful to you? --Jarekt (talk) 15:27, 24 September 2012 (UTC)
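
Editorial illustration of the point raised in this thread: because the id sits on the label cell, a client has to step from that cell to the next id-less cell in the same row to reach the value. The following is only a minimal sketch using Python's standard html.parser; the sample table, the FieldParser class and the extract_fields helper are hypothetical, and real template output is more elaborate (nested tables, several Creator blocks with repeated ids).

```python
from html.parser import HTMLParser


class FieldParser(HTMLParser):
    """Collect {id: value text} where the id sits on the label cell of a row."""

    def __init__(self):
        super().__init__()
        self.fields = {}
        self._pending = None  # id seen on a label cell, waiting for its value cell
        self._current = None  # id whose value cell is currently being read

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._pending = None  # never carry an id across rows
        elif tag == "td":
            cell_id = dict(attrs).get("id") or ""
            if cell_id.startswith("fileinfotpl_"):
                self._pending = cell_id
            elif self._pending:
                # First id-less cell after the label cell: treat it as the value.
                self._current, self._pending = self._pending, None
                self.fields[self._current] = ""

    def handle_endtag(self, tag):
        if tag == "td":
            self._current = None

    def handle_data(self, data):
        if self._current:
            self.fields[self._current] += data


def extract_fields(html):
    parser = FieldParser()
    parser.feed(html)
    parser.close()
    return {key: value.strip() for key, value in parser.fields.items()}


# Hypothetical sample markup, not copied from any real template.
SAMPLE = """
<table>
<tr><td id="fileinfotpl_aut">Author</td><td>Jane Example</td></tr>
<tr><td id="fileinfotpl_art_inscriptions">Inscriptions</td><td>monum. Latine</td></tr>
<tr><td>Value without a label cell</td></tr>
</table>
"""

print(extract_fields(SAMPLE))
```

The third row shows the Creator/Institution problem described above: a value cell with no label cell carries no id, so this approach cannot reach it. An id of the form FIELDNAME_value on the value cell would remove the need for the sibling lookup.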

Relevant discussion

Please see Commons_talk:EXIF#Commons:Metadata_redirects_here. --Piotr Konieczny aka Prokonsul Piotrus Talk 16:55, 16 September 2012 (UTC)

Classes in description

Hi

Some input on Template_talk:Description#Fetching_description_in_a_given_language would be welcome.

On a related note: it would be nice if the code generated by using {{Fr}}/{{En}} and {{Mld|fr|en}} would be the same.

Jean-Fred (talk) 16:40, 17 October 2012 (UTC)

What about creating a template for machine readable data of license templates

…so it looks like this:

{{Machine-readable-data/license
  | template_name= STRING
  | short        = STRING
  | long         = STRING
  | attr_req     = BOOL
  | attr         = STRING
  | link_req     = BOOL
  | link         = STRING
}}

instead of

<span style="display:none" class="licensetpl_STRING">
<span class="licensetpl_short">STRING</span>
<span class="licensetpl_long">STRING</span>
<span class="licensetpl_attr_req">BOOL</span>
<span class="licensetpl_attr">STRING</span>
<span class="licensetpl_link_req">BOOL</span>
<span class="licensetpl_link">STRING</span>
</span>

This would deobfuscate the whole situation, I believe. Thoughts? -- Rillke(q?) 23:15, 11 November 2012 (UTC)
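
Editorial illustration: the mapping between the proposed parameters and the hidden markup can be sketched as a small function. This is only a sketch in Python, not the template itself. The render_license_markup function is hypothetical, the "true"/"false" spelling of BOOL values is an assumption, and so is the reading that the suffix of the outer class corresponds to template_name (the post above only shows STRING there).

```python
from html import escape


def render_license_markup(template_name, short, long, attr_req, attr, link_req, link):
    """Expand the seven proposed parameters into the hidden <span> block."""

    def field(key, value):
        if isinstance(value, bool):
            value = "true" if value else "false"  # assumed BOOL spelling
        return f'<span class="licensetpl_{key}">{escape(value)}</span>'

    lines = [
        # Assumption: the outer class suffix is the template name.
        f'<span style="display:none" class="licensetpl_{escape(template_name)}">',
        field("short", short),
        field("long", long),
        field("attr_req", attr_req),
        field("attr", attr),
        field("link_req", link_req),
        field("link", link),
        "</span>",
    ]
    return "\n".join(lines)


# Hypothetical values, for illustration only.
markup = render_license_markup(
    template_name="Example",
    short="CC BY-SA 3.0",
    long="Creative Commons Attribution-ShareAlike 3.0",
    attr_req=True,
    attr="Jane Example",
    link_req=True,
    link="https://example.org/license",
)
print(markup)
```

Keeping the expansion in one place is the point of the proposal: license templates would pass seven values instead of each repeating the span structure by hand.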

List of templates that use machine readable markup

Is there a way of automatically constructing a list of templates that use, for example, fileinfotpl_aut? I'm thinking of user created templates such as http://commons.wikimedia.org/wiki/User:Biopics/infong

HYanWong (talk) 12:12, 28 February 2013 (UTC)

I do not know. The lists on this page were created by reading the template source code. --Jarekt (talk) 14:27, 28 February 2013 (UTC)

Extension development

Things are moving: mw:Requests for comment/Image information. Jean-Fred (talk) 14:24, 11 March 2013 (UTC)

Images with text, but no Artwork

{{Artwork}} has an inscriptions parameter; its table row is marked with the id fileinfotpl_art_inscriptions. We have many images with text outside of artworks, such as monuments and plaques, whose text is not searchable. So I have created an extension for Template:Information that adds a new row labeled "Inscription" (singular) and then does the same as {{Inscription}}. Is this a good idea? I will later move the template from my userspace to Template:Inscription field.

Is it possible that inscriptions entered with this template could in the future be easily transferred to Wikidata d:Property:P438? Or should we mark them some other way, e.g. by marking the inscription text directly with <span itemprop="inscription">...</span>? --MatthiasDD (talk) 11:59, 10 November 2013 (UTC)

You can accomplish the same with:
{{Information
| Description  = Description
|other_fields_1=
{{Information_field |name={{i18n/inscription}} |value=
 {{inscription |1=monum. Latine |full form=monumenti Latine |position=bottom
   |transliteration=Transliteration |language=la 
   |de=Deutsche Übersetzung |en=English translation |fr=traduction française }}
}}
| Date         = 2013-10-22
| Source       = {{own}}
| Author       = Author
}}
Description Description
Inscription
InfoField
bottom:
monum. Latine
[monumenti Latine] -Transliteration- [English translation]
Date
Source Own work
Author Author
So I do not think there is a need for a new template. Also, I do not like templates that you add to the end of the description field and that add a row. We have had complaints that they produce invalid HTML, which some browsers handle and some do not. That is why we have the other_fields_1 and other_fields parameters, so we do not have to do it. Finally, I do not like copying the content of the {{Inscription}} template. You should not cut and paste the content of other templates when a call to the original template would be equally easy. It just complicates maintenance. --Jarekt (talk) 04:28, 11 November 2013 (UTC)

Yes, I knew this version, but I think it is not as easy to use as a wiki should be; this code of nested templates is more like a programming language. My version has the advantage that we can change the template, and the inscription is written in the description field, if we need this in the future. I have an idea to change the new template so that the user can write: |other_fields_1={{Inscription field |1=monum. Latine |...}} This template can call {{Information_field |class="inscription" |...}} and {{inscription |...}}. But this can mark the new row only with <td class="fileinfo-paramfield {{{class|}}}"> and not with <td id="fileinfotpl_art_inscriptions"> or other markers for machine-readable data. --MatthiasDD (talk) 22:51, 14 November 2013 (UTC)

I would be fine with writing the template you described, and you can make it add a <td class="fileinfo-paramfield inscriptions"> field. --Jarekt (talk) 04:08, 15 November 2013 (UTC)

Now I have changed the template (see my userspace). The class is <td class="fileinfo-paramfield inscription"> (singular). Is that right? --MatthiasDD (talk) 21:28, 22 November 2013 (UTC)

{{Copyright information}} is used to add licenses that refer to some underlying work the image was derived from (e.g. copyright of a statue which is visible on the photograph). This messes up the machine-readable data since there is no algorithmic way to tell that the license is not about the file but some parent work. This results in Bugzilla57465 in the extmetadata API.

Internally, {{Copyright information/row}} is used with the underlying=yes parameter; the solution would be to add some machine-readable markup when that parameter is present, so that the license can be ignored or interpreted in a more nuanced way.

As an example, something like this could work:

<div data-mrd-scope="restoration">[...license HTML...]</div>

--Tgr (WMF) (talk) 19:07, 22 July 2014 (UTC)
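
Editorial illustration: a client could use such a wrapper to separate file-level licenses from licenses of an underlying work. The sketch below uses Python's standard html.parser and assumes that the attribute is named data-mrd-scope as in the example above and that license names are exposed in licensetpl_short spans. The ScopedLicenseParser class and the sample markup are hypothetical, and nothing here reflects what CommonsMetadata actually implements.

```python
from html.parser import HTMLParser


class ScopedLicenseParser(HTMLParser):
    """Pair each licensetpl_short value with the nearest data-mrd-scope, if any."""

    def __init__(self):
        super().__init__()
        self.licenses = []     # list of (scope or None, short name)
        self._div_scopes = []  # one entry per open <div>; None when unscoped
        self._in_short = False
        self._buf = ""

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        if tag == "div":
            self._div_scopes.append(attributes.get("data-mrd-scope"))
        elif tag == "span" and "licensetpl_short" in (attributes.get("class") or "").split():
            self._in_short = True
            self._buf = ""

    def handle_endtag(self, tag):
        if tag == "div" and self._div_scopes:
            self._div_scopes.pop()
        elif tag == "span" and self._in_short:
            # The innermost enclosing scoped <div> wins.
            scope = next((s for s in reversed(self._div_scopes) if s), None)
            self.licenses.append((scope, self._buf.strip()))
            self._in_short = False

    def handle_data(self, data):
        if self._in_short:
            self._buf += data


# Hypothetical page fragment: one scoped license, one file-level license.
PAGE = (
    '<div data-mrd-scope="restoration">'
    '<span class="licensetpl_short">PD-old</span></div>'
    '<div><span class="licensetpl_short">CC BY-SA 3.0</span></div>'
)

parser = ScopedLicenseParser()
parser.feed(PAGE)
parser.close()
file_level = [name for scope, name in parser.licenses if scope is None]
print(parser.licenses, file_level)
```

A client that does not understand a given scope value can ignore everything that carries one and still report the file's own license correctly.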

Microformat is dead

This system should be changed to utilize microdata and/or RDFa. Microformat has never been a proper standard and is now more or less dead. It is sort of usable as an internal solution, but if we want this to have real importance it should be done in a proper way, and that is most likely microdata or RDFa. The first one is the one most similar to microformat, while RDFa is probably more flexible. Jeblad (talk) 21:34, 21 August 2014 (UTC)

99% of users (me included) only look at the visible parts of the templates or pages. I am not aware of any tools or processes relying on machine-readable data, so it is hard to tell who (if anybody) might be impacted by such changes. Machine-readable data should be designed by the people who might use it or by other stakeholders, while I and other users maintaining templates will be happy to add any tags or microformats that have the consensus of the stakeholders. --Jarekt (talk) 22:20, 21 August 2014 (UTC)
Microformat was an attempt to make pages machine readable. Pages will be read by machines; that is part of indexing the pages. Use microdata and RDFa; that is the proper way to do it if you want to be part of the semantic web. Jeblad (talk) 05:45, 22 August 2014 (UTC)
Everything on that entire page is dead. Our entire methodology is completely braindead and any improvement should be with the sole purpose of improving it to a degree that we can write software to migrate it to a more sane system. That was always the plan, even when we were working on stock photo a couple of years ago. --TheDJ (Not WMF) (talkcontribs) 12:46, 26 August 2014 (UTC)
Commons:Structured data? --El Grafo (talk) 13:07, 26 August 2014 (UTC)

Template:Art Photo and MediaViewer

Hello, {{Artwork}} suggests using {{Art photo}} for works where different licenses are required for the depicted artwork and the photograph of it. However, the current implementation of MediaViewer produces nonsensical output under some conditions (see more detailed report here), not mentioning the photographer as the copyright holder of the image. Not sure if this will be fixed with the upcoming version of MV, but it doesn't work in the recent design prototype either, so it seems like a more complex problem. --El Grafo (talk) 09:51, 15 September 2014 (UTC)

Many works require you to specify different licenses for different aspects of the artwork. All sculptures require copyright tags for both the sculptor and the photographer. Derivative works could require specifying licenses for both the original and the copy. One can envision scenarios where we are dealing with several authors, each from a different country or century; see Commons:Multi-license copyright tags for some usual approaches to dealing with them. When we add to this that each author might require specifying licenses for the country of origin and the US, or that some recent photographs might be multi-licensed (for example CC and GFDL), we might end up with a lot of different license templates on an image, while the current MediaViewer can handle only one. I am not sure what to do about it. Bug report? --Jarekt (talk) 16:26, 15 September 2014 (UTC)

Machine-readable data for non-free images

The machine-readable format described here is used on several other wikis (such as en.wikipedia) which allow non-free content; this usually means that any template informing about legal status will be marked up as a license template, and we end up with "licenses" such as fair use. To an extent this is OK (the confusion between license and legal status already exists on Commons, thanks to the PD templates, and legal statuses are usually displayed the same way as licenses), but it can be misleading when using machine-readable data to inform potential reusers.

To avoid this, the markup standard on COM:MRD should be extended to inform clients whether the image can be freely reused or only with limitations. A simplistic method for that could be to add a licensetpl_free field with the same syntax as licensetpl_attr_req. What do you think?

(Pinging @Guillaume as this could be a candidate for inclusion in the m:File metadata cleanup drive.)

--Tgr (WMF) (talk) 11:57, 15 October 2014 (UTC)

Sounds good to me. FYI, I've started m:File metadata cleanup drive/How to fix metadata, which notably includes a section about non-free media. Feedback is welcome (before I reach out to wikis with local uploads). Guillaume (WMF) (talk) 15:00, 15 October 2014 (UTC)

On second thought, maybe call it licensetpl_nonfree? That would make it clearer that the default is free (I imagine the majority of license templates / copyright tags are about free images). --Tgr (WMF) (talk) 20:39, 20 October 2014 (UTC)

✓ Done :) https://meta.wikimedia.org/w/index.php?diff=10281719 . Guillaume (WMF) (talk) 08:56, 22 October 2014 (UTC)

Identifying information-like templates

Currently {{Information}}, {{Artwork}}, {{Photograph}} and (to some extent) {{Book}} all emit the same machine-readable markup, so a client of the COM:MRD standard cannot easily tell them apart. This is problematic because e.g. the author of a photograph and the author of the statue that's visible on the photograph cannot be used interchangeably in most cases. This leads to outright copyright violations by clients in some cases (see Template talk:Art Photo#Issue with MediaViewer for lots of details). So there is a need to label these templates in a machine-readable way.

I propose putting the classes fileinfotpl-type-information, fileinfotpl-type-artwork, fileinfotpl-type-photograph and fileinfotpl-type-book on the top-level <table> elements of these templates. --Tgr (WMF) (talk) 11:19, 21 October 2014 (UTC)

Sounds good to me. Anything we can do to help resolve this particular issue is certainly welcome! Guillaume (WMF) (talk) 08:44, 22 October 2014 (UTC)
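
Editorial illustration: with the proposed classes in place, a client could tell the templates apart by reading the class list of each <table>. This is a minimal sketch using Python's standard html.parser. The TemplateTypeParser class, the template_types helper and the sample markup (including the extra class some-other-class) are hypothetical.

```python
from html.parser import HTMLParser


class TemplateTypeParser(HTMLParser):
    """Collect the suffixes of fileinfotpl-type-* classes found on <table> tags."""

    PREFIX = "fileinfotpl-type-"

    def __init__(self):
        super().__init__()
        self.types = []

    def handle_starttag(self, tag, attrs):
        if tag != "table":
            return
        for token in (dict(attrs).get("class") or "").split():
            if token.startswith(self.PREFIX):
                self.types.append(token[len(self.PREFIX):])


def template_types(html):
    parser = TemplateTypeParser()
    parser.feed(html)
    parser.close()
    return parser.types


# Hypothetical page with an Artwork table followed by a Photograph table.
PAGE = (
    '<table class="fileinfotpl-type-artwork some-other-class"><tr><td>x</td></tr></table>'
    '<table class="fileinfotpl-type-photograph"><tr><td>y</td></tr></table>'
)

print(template_types(PAGE))
```

A client that knows the type can then decide, for example, that the author field of an artwork table names the sculptor rather than the photographer.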

This properties page now needs xwiki pointers as it has become a Wikimedia default help page

@Guillaume (WMF), Bawolff, and TheDJ: With the global change to categorisation of files, and the adoption of the classes as defined and utilised (here) now being applicable to all WMF wikis, we need to do some more about promoting the classes used and assisting communities to update. While I have seen information about the categorisation changes, I have not seen obvious helpful information about how to fix things.

It is unusual for Commons: to be the place for WMF-wide documentation, and/or the configuration of a WMF-wide standard, but now it is, presumably by weight of being the "file place". Note that usually Meta hosts such information, or sometimes we find it at [[mw:|MediaWiki]] if it is broader again.

So we do actually have a Meta page with information, at m:File metadata cleanup drive; however, by its name that is not a page that entices people to visit it for standards, as it is its own project. And we do have mw:Extension:CommonsMetadata, the extension that implements these changes, but it is pretty generic and doesn't assist with compliance.

We need to look at how we express a (new/now) universal Wikimedia standard and make it widely available, easily readable, and easily findable. I also think that the data as expressed on the general page needs to be known to every wiki, and we should be looking at how that can be done better now that we have introduced a new baseline. I am wondering what people see as the alternatives to make this happen. --billinghurst sDrewth 01:56, 22 October 2014 (UTC)

billinghurst: I wrote a page about how to fix the metadata at m:File metadata cleanup drive/How to fix metadata; I haven't advertised it very widely yet because it's being translated. That page could easily be renamed when the cleanup drive is over and made into a more reference-like documentation page. This Commons page contains a lot of information that is Commons-specific and that most wikis don't need (e.g. Artwork-related classes), and the how-to-fix page on Meta focuses on the most common IDs and classes. Does that address your concern? Guillaume (WMF) (talk) 08:39, 22 October 2014 (UTC)

Machine readable data on MIT license template

Hi,

I added machine-readable data to {{MIT}}. Could someone please check whether I did it appropriately, both on the tech side (though I am not really worried, the data is now correctly displayed by TheDJ’s tool) and on the license side? Thanks! Jean-Fred (talk) 14:23, 2 November 2014 (UTC)

Machine-readable markup for languages/language names

There are several ways of internationalizing content on Commons (and even more on other wikis) which output multiple languages and/or automatically put a language name before the text. A machine reading the page has to be able to 1) realize that the given machine-readable field contains the same information in multiple languages, 2) identify which piece of text belongs to which language, 3) identify which piece of text is a language name (which makes sense on the wiki page but should be hidden in some other contexts). Currently there is no standard for this; e.g. {{Description}} and {{Ls}} produce identical looks but wildly different markup:

{{description|en|foo}}
English: foo
<div class="description mw-content-ltr en" dir="ltr" lang="en" style="" xml:lang="en"><span class="language en" title=""><b>English:</b></span> foo</div>
{{ls|en|foo}}
English: foo
<div class="en lang-en" lang="en" style="margin:0.3em 0;line-height:1.2;direction:ltr;" xml:lang="en"><span class="langlabel-en" lang="en" style="font-weight:bold;" xml:lang="en">English:</span> foo</div>

CommonsMetadata currently understands {{Description}} but not {{Ls}}. Before fixing that, I would really like to see a standard way of marking up languages so that COM:MRD can be used as a reference when creating such templates/modules and clients don't have to identify and support a dozen competing and potentially unstable alternatives. --Tgr (WMF) (talk) 11:12, 11 November 2014 (UTC)

There are more templates that identify the language (or should identify the language): {{LangSwitch}}, {{Multilingual description}}, {{Translation table}}, etc. I have never heard of {{Ls}}, but I think it is related to {{Multilingual description}}, as they both rely on m:Meta:Language select. They all should mark the language in a similar way, except that {{Multilingual description}} and {{Ls}} use <div class="multilingual"> marking, which is designed to hide descriptions in languages you do not know (I never liked that approach, since pages using it did not work correctly in the past and hid the wrong parts of the description; I do not know the current status). Another difference is that {{LangSwitch}} and {{Multilingual description}} do not visibly identify the language the way {{Description}} does. But I agree the underlying machine-readable data should be the same. --Jarekt (talk) 13:36, 11 November 2014 (UTC)
@Jarekt: {{Ls}} is used by {{Multilingual description}}, yes. It does visibly identify the language though.
How about the following standard (made by merging some non-visual elements of {{Description}} and {{Ls}}):
  • the language-specific blocks should have a lang attribute (the plain HTML one, not xml:lang)
  • the language names should have class="language"
  • the whole multilingual block can be wrapped in a tag with class="multilingual" if it would not be otherwise clear where it starts/ends
None of those classes are used for styling (on Commons at least); {{Description}} already conforms to this, and {{Ls}} can be made to conform with some trivial changes. --Tgr (WMF) (talk) 02:23, 25 November 2014 (UTC)
I know very little about machine-readable tags, so hopefully some more knowledgeable colleagues will also take part in this discussion. I am fine with whatever changes seem appropriate, as long as they do not change the appearance and do not break existing tools and processes (do we even know who uses machine-readable tags?). About class="multilingual": it looks very much like the <div class="multilingual"> used by m:Meta:Language select to make some parts of the description magically disappear (I do not understand the process). I do not think we want text marked with {{Description}} to disappear. You are also saying that the whole multilingual block can be wrapped in a tag with class="multilingual". This might be advantageous, but with several {{Description}} blocks on a page I do not see a way to do it by changing existing templates. We have 17 M pages using {{Description}} blocks without any start/end marking, so it would be hard to add it. Do you have any thoughts about {{LangSwitch}}? It shows only one language but it could have machine-readable tags in many. --Jarekt (talk) 04:15, 25 November 2014 (UTC)
The two tools using COM:MRD that I am aware of are the CommonsMetadata extension (and through that MediaViewer, the mobile media viewer and the OCG service) and the StockPhoto gadget.
You are right about multilingual: something that has a visible function should not be used for metadata. Maybe something like language-list then? Anyway, this would be optional; for fields of the Information template there are other ways to figure out where the language list starts/ends, so I would be fine with just the first two items from the list.
For LangSwitch there is no way to get the full list of the languages (for a machine using the HTML output of the page, anyway).
--Tgr (WMF) (talk) 20:51, 1 December 2014 (UTC)
Looks like we are stuck :) Jarekt, do you have any idea who could be asked to comment on this? --Tgr (WMF) (talk) 21:50, 13 January 2015 (UTC)
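
Editorial illustration: the first two items of the proposed convention (a plain lang attribute on each language block, class="language" on the language name) are enough for a client to recover text per language. The sketch below uses Python's standard html.parser and feeds it the two markup samples quoted earlier in this section. The LanguageBlockParser class and the texts_by_language helper are hypothetical, and this is not how CommonsMetadata is implemented.

```python
from html.parser import HTMLParser


class LanguageBlockParser(HTMLParser):
    """Group text by the nearest enclosing lang attribute, skipping language labels."""

    def __init__(self):
        super().__init__()
        self.texts = {}
        self._stack = []  # one (tag, lang or None, is_label) entry per open element

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        classes = (attributes.get("class") or "").split()
        self._stack.append((tag, attributes.get("lang"), "language" in classes))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this also discards unclosed void tags.
        for index in range(len(self._stack) - 1, -1, -1):
            if self._stack[index][0] == tag:
                del self._stack[index:]
                break

    def handle_data(self, data):
        if any(is_label for _, _, is_label in self._stack):
            return  # text inside class="language" is a language name, not content
        lang = next((code for _, code, _ in reversed(self._stack) if code), None)
        if lang:
            self.texts[lang] = self.texts.get(lang, "") + data


def texts_by_language(html):
    parser = LanguageBlockParser()
    parser.feed(html)
    parser.close()
    return {lang: text.strip() for lang, text in parser.texts.items()}


# The two samples quoted above in this section.
DESCRIPTION_HTML = (
    '<div class="description mw-content-ltr en" dir="ltr" lang="en" style="" xml:lang="en">'
    '<span class="language en" title=""><b>English:</b></span> foo</div>'
)
LS_HTML = (
    '<div class="en lang-en" lang="en" style="margin:0.3em 0;line-height:1.2;direction:ltr;" xml:lang="en">'
    '<span class="langlabel-en" lang="en" style="font-weight:bold;" xml:lang="en">English:</span> foo</div>'
)

print(texts_by_language(DESCRIPTION_HTML))
print(texts_by_language(LS_HTML))
```

With the {{Description}} markup the label is dropped and only the content remains. With the {{Ls}} markup as quoted, the label uses langlabel-en rather than language, so it stays in the text, which is the kind of divergence the proposal is meant to remove.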

Queries

We were asked to identify patterns, perhaps making some queries and lists helps? See Commons:Machine-readable data/Queries for a simple example.

When categorising stuff, it's often useful to go through uncategorised media by day categories, because all files belonging to the same group upload tend to be together; we can probably identify similar useful divide et impera procedures. --Nemo 22:40, 11 December 2014 (UTC)

For example, some surgical editing of Template:Blason-fr-en can probably fix source information for almost 10k files. --Nemo 22:51, 11 December 2014 (UTC)

I suggest adding class="otrs-permission-ticket-link" to make OTRS permission information machine-readable in Template:PermissionOTRS. Please give any input on making this a new standard. – Kwj2772 (talk) 12:22, 12 December 2014 (UTC)

It is always easy to add machine-readable tags, but harder to change them later, since someone might be relying on them. So adding a tag should be fine, but others should say if the format is OK. --Jarekt (talk) 03:07, 13 December 2014 (UTC)

Template:Spoken article entry is used on quite a few files as the main template and might benefit from machine-readable data. Another thing is that images with Template:Book end up in the no-machine-readable author and source categories en masse, while they actually have tags (wrong formatting?). See this question. Fixing this could resolve a few hundred thousand of those cases and make the no-machine-readable author and source categories more workable. Mvg, Basvb (talk) 21:28, 26 December 2014 (UTC)

CommonsMetadata checks for restriction-* classes now (as a part of phab:T77717, as {{Trademarked}} used that classname). Any objections to applying them to the other restriction templates? ({{Insignia}}, {{Nazi symbol}}, {{Copydesign}}, {{IHL Symbol}}, {{Personality rights}}, {{Currency}}, {{Fan art}}, {{Costume}}, {{2257}}, {{Romania personality rights}}, {{Australian Commonwealth reserve}}, {{Soprintendenza}}, {{Italy-MiBAC-disclaimer}} are the ones I could find). --Tgr (WMF) (talk) 20:38, 16 June 2015 (UTC)

And {{Communist symbol}} too. Sn1per (talk) 20:44, 16 June 2015 (UTC)
In general that would be fine, although I am not sure how much review some of these tags got. --Jarekt (talk) 01:27, 17 June 2015 (UTC)

Derivative works

So, I think derivative works need some love from this standard, and I've drafted a change to the {{Derived from}} template that should work fairly well. Head over to Template talk:Derived from/Machine-readable to see the results of my experiment, check out {{Derived from/Machine-readable}} and Module:Derived from for my work, let me know what you think and whether I need to explain it more.

Basically the structured data looks exactly like what I suggested on Template talk:Derived from#Machine-readable format, with class="fileinfo-sourcefile" denoting a file from which the file was derived, and id="fileinfo-sourcefiles" denoting a list of all source files (omitted if there is only one source). --MarkTraceur (WMF) (talk) 18:22, 1 December 2015 (UTC)

Template replacement text

Various tools (MediaViewer, VE, OCG...) use the contents of the description, author, source and permission fields of the {{Information}} template to display information about the file. It is a widespread practice to use large HTML templates in these fields; when these are displayed in places that cannot accept arbitrary HTML, the result is garbled, unreadable text. (See e.g. T119686#2147466, T64255/T123428/T68606, T59383.)

It would be nice if template authors could provide a machine-readable alternative text for their templates... maybe something like

<div class="filemetadata-hasmachinereadableversion">
    {...template contents...}
    <div style="display:none" class="filemetadata-machinereadableversion">{machine-readable text}</div>
</div>

and the CommonsMetadata API which most tools use could substitute the machine readable text for the template HTML. As much as it sucks to add another hack on the pile of hacks that's the current machine-readable data definition, this seems to me the least painful way to handle the millions of templates that are in the author/source/description fields. What do you think? --Tgr (WMF) (talk) 14:10, 24 March 2016 (UTC)
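
Editorial illustration: the substitution described in this proposal can be sketched as a text extractor that, inside a wrapper, keeps only the hidden alternative text. The sketch uses Python's standard html.parser and the two class names from the proposal. The ReplacementTextParser class, the plain_text helper and the sample field (including the wording "Library of Congress") are hypothetical, and this is not the CommonsMetadata API.

```python
from html.parser import HTMLParser


class ReplacementTextParser(HTMLParser):
    """Extract text, replacing wrapped template HTML with its machine-readable text."""

    WRAPPER = "filemetadata-hasmachinereadableversion"
    ALTERNATIVE = "filemetadata-machinereadableversion"

    def __init__(self):
        super().__init__()
        self.parts = []
        self._stack = []  # one (tag, role) entry per open element

    def handle_starttag(self, tag, attrs):
        classes = (dict(attrs).get("class") or "").split()
        if self.ALTERNATIVE in classes:
            role = "alt"
        elif self.WRAPPER in classes:
            role = "wrapper"
        else:
            role = None
        self._stack.append((tag, role))

    def handle_endtag(self, tag):
        # Pop back to the matching open tag; this also discards unclosed void tags.
        for index in range(len(self._stack) - 1, -1, -1):
            if self._stack[index][0] == tag:
                del self._stack[index:]
                break

    def handle_data(self, data):
        roles = [role for _, role in self._stack]
        if "wrapper" in roles and "alt" not in roles:
            return  # visible template HTML inside a wrapper: drop it
        self.parts.append(data)


def plain_text(html):
    parser = ReplacementTextParser()
    parser.feed(html)
    parser.close()
    return " ".join("".join(parser.parts).split())


# Hypothetical Source field containing a large template with an alternative text.
FIELD_HTML = (
    'Source: <div class="filemetadata-hasmachinereadableversion">'
    "<table><tr><td>Big <b>fancy</b> box</td></tr></table>"
    '<div style="display:none" class="filemetadata-machinereadableversion">'
    "Library of Congress, ds.07135</div></div>"
)

print(plain_text(FIELD_HTML))
```

Fields without a wrapper pass through unchanged, so templates could adopt the markup one at a time.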

I'd love it if an interface (checking whether the text meets MediaWiki's needs), the necessary storage fields and an API allowing authors to add this kind of text were added to MediaWiki. -- Rillke(q?) 17:11, 24 March 2016 (UTC)
I believe adding the burden of maintaining the suggested feature to Commons (i.e. validation using Lua or similar, layout, support) isn't a good idea. The data would be used only by tools that we can't do anything about; consequently it should be fully maintained by the consumers (VE, MediaViewer, Offline content generator). -- Rillke(q?) 17:19, 24 March 2016 (UTC)
Expecting templates to be maintained and updated on Commons but template <-> machine readable data equivalency to be maintained somewhere else is completely unrealistic. The whole point of COM:MRD is to avoid that situation. --Tgr (WMF) (talk) 23:34, 24 March 2016 (UTC)
No, no more templates or template parameters, please. Please do not add more clutter just because it is easy. display:none might be prone to non-obvious vandalism. Our license templates already expose short versions of the license name. The issue is with MediaViewer. Part of the provided example: https://commons.wikimedia.org/wiki/File:Iglesia_de_San_Pedro,_Teruel,_España,_2014-01-10,_DD_11-12_HDR.JPG#/media/File:Iglesia_de_San_Pedro,_Teruel,_España,_2014-01-10,_DD_11-12_HDR.JPG First, why does it append the hash link to itself? Fix it. Next point: it might use a short URL that is guaranteed to redirect, even after the file has been deleted. I think there is a new extension deployment pending. Third point: do we really need the link back? -- Rillke(q?) 01:22, 25 March 2016 (UTC)
Please re-read the proposal, you seem to be talking about something entirely different. The machine-readable metadata in the image you mention is fine. An example of problematic metadata is this one (see the "this tag..." part). --Tgr (WMF) (talk) 05:55, 25 March 2016 (UTC)
The example was taken from one of the Phab tickets listed. Nice to learn that this has been fixed.
I understood that the Multimedia team would like Commons users to add another template parameter to templates, e.g. a source_machine_readable= as the end result, or to allow authors of source and author templates to specify an alternative, machine-readable text. I do not really like the latter idea because of its complexity, but if this is the only viable way currently, and you'd like to continue supporting templates in credit fields, you'll have to do so. -- Rillke(q?) 11:21, 25 March 2016 (UTC)

Rillke: I don't think it would require any new parameters. The goal is for templates to have a machine-readable version, and they should already have all the required information via their existing parameters. It just needs to be done at the right level. E.g. the image I linked has {{Information|...|Source={{LOC-image|id=ds.07135}}...}}; there is no way to change the Information template to make that machine-readable (without introducing a new parameter to be filled out by hand, which is, as you say, unmanageable), but changing the LOC-image template is simple (like this: diff, diff; or even simpler: diff). I hope that makes more sense than my initial explanation. --Tgr (WMF) (talk) 10:32, 28 March 2016 (UTC)[reply]

Tgr (WMF), I am reading and re-reading and I am still confused about the proposal. I think it is no problem to add or modify templates in any way that would make your job easier, as long as we do not break it for other possible users of machine-readable data (MRD) (whoever they might be) and it does not require (many) changes to individual files. I am wary of any new parameters added to infoboxes. They have two problems: they can be misused and cause more trouble than they are worth, as the "other_fields" parameter sometimes is, and I doubt you will find volunteers excited enough about MRD to add and maintain those fields. I agree with your assessment that "there is no way to change the Information template to make" it more machine-readable. However, if there is some MRD you would like to add to source templates like {{LOC-image}} or author templates like Creator templates, I think that is fine. We added MRD to all primary license tags and we can do something similar for other classes of templates. Maybe the easiest way would be to pick one of the Commons partnerships and see what it would take to fix it without altering individual files, as they often use a uniform way of formatting their metadata. We can also create some "MRD" template which can be added to other templates to help. --Jarekt (talk) 13:07, 28 March 2016 (UTC)[reply]

User-space templates

There should be a mention in the policies of user-space templates, like User:Andrew-k/Templates/Coat (I saw some more in the past). -- User: Perhelion 14:49, 14 January 2018 (UTC)[reply]

Created Commons:Deletion requests/User:Andrew-k/Templates/Coat. --Steinsplitter (talk) 14:57, 14 January 2018 (UTC)[reply]

Multi-licensed exposed in the API?

(Raising this here before opening a Phab ticket).

I thought that CommonsMetadata was correctly exposing multi-licensed files to the API; however, looking at a random dual-licensed file, only CC-BY-SA 3.0 is returned, although as far as I can see, the file Papenburg - Meyer Parkplatz Tor 3 Sielkanal Anleger Meyer (Parkplatz Tor 3) 01 ies.jpg has a classic {{self|GFDL|cc-by-sa-3.0}} with the correct attributes.

Was I just wrong to think that the API is supposed to return both licenses? cc @Jarekt, TheDJ, and Tgr (WMF):

Jean-Fred (talk) 09:15, 15 March 2018 (UTC)[reply]
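
For readers who want to reproduce the check described above, here is a minimal sketch in Python, not an official client. It assumes the standard MediaWiki `action=query` request with `prop=imageinfo` and `iiprop=extmetadata`, and reads the `License` and `LicenseShortName` fields that CommonsMetadata exposes. The helper names and `SAMPLE_RESPONSE` are hypothetical; the sample is hand-written to show the response shape and is not the result of a live query.

```python
from urllib.parse import urlencode

API = "https://commons.wikimedia.org/w/api.php"


def extmetadata_url(title):
    """Build the query URL that asks for a file's extended metadata."""
    params = {
        "action": "query",
        "titles": title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": "json",
    }
    return API + "?" + urlencode(params)


def license_fields(response):
    """Collect the license-related extmetadata values for each page in a response."""
    wanted = ("License", "LicenseShortName")
    results = []
    for page in response.get("query", {}).get("pages", {}).values():
        for info in page.get("imageinfo", []):
            meta = info.get("extmetadata", {})
            results.append({k: meta[k]["value"] for k in wanted if k in meta})
    return results


# Hypothetical, hand-written response used only to illustrate the shape.
SAMPLE_RESPONSE = {
    "query": {
        "pages": {
            "12345": {
                "title": "File:Example.jpg",
                "imageinfo": [
                    {
                        "extmetadata": {
                            "License": {"value": "cc-by-sa-3.0"},
                            "LicenseShortName": {"value": "CC BY-SA 3.0"},
                        }
                    }
                ],
            }
        }
    }
}

print(extmetadata_url("File:Example.jpg"))
print(license_fields(SAMPLE_RESPONSE))
```

Each field holds a single value, so a dual-licensed file yields one license entry here, which matches what is reported above.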

@Jean-Frédéric: It will only return one license, and it tends to bias (when it can) to the more free license. If a page contains a non-free license statement, then that takes precedence and marks the whole file as non-free. While I think it might be a good idea to return multiple licenses (as an API), most consumers of the API probably don't really care. —TheDJ (talk • contribs) 09:33, 15 March 2018 (UTC)[reply]
Interesting observation, however. Notice how the MMV displays a "terms" button, because the Permissions field of the Information template was filled with some text. So basically the software counts 3 license statements because of that, one of which it can't do anything at all with, because it has no machine-readable data, so it just shows it completely upon demand. Bit of a mess. —TheDJ (talk • contribs) 09:45, 15 March 2018 (UTC)[reply]
Thanks for the answer DJ − not sure why I thought all licenses were returned (although that could easily be achieved by returning an array in the JSON, with licenses ordered by priority).
Interesting also that it privileges the newer licenses, I did not know that. (I wonder then why it returns 2.0 for this file… Must be something wrong with the template then…)
Jean-Fred (talk) 12:02, 15 March 2018 (UTC)[reply]
@Jean-Frédéric: likely because the license's 'short name' doesn't follow the pattern that the parser expects.. Something like that.. Not sure. —TheDJ (talkcontribs) 16:58, 15 March 2018 (UTC)[reply]
Well, he made his own license template, without understanding how they work. The parser only looks for the first license inside a license template. If you want to multi-license, you can put multiple license templates inside another licensetpl, but you can't put multiple licenses into a single license template. This is why I'm so against users' personal license templates. —TheDJ (talk • contribs) 17:02, 15 March 2018 (UTC)[reply]
K, fixed. But now he has three licenses (one main one, and two wrapped inside it). Why can't people just use {{Self}}? —TheDJ (talk • contribs) 17:18, 15 March 2018 (UTC)[reply]
Yikes, that's broken too, as it just concatenates everything for CC into the CC template and hopes for the best: check the example at the bottom of Self. Commons Data can't come soon enough. —TheDJ (talk • contribs) 17:21, 15 March 2018 (UTC)[reply]
Thanks for having a look!
If I understood correctly (there is a thread on the Village Pump and the German-language Forum), the user wanted to dual-license 2.0/4.0 (I did not really get why but why not), and thought stacking both templates was taking too much space (…) Jean-Fred (talk) 20:21, 15 March 2018 (UTC)[reply]
See T59259. I doubt anyone wants to touch that before CommonsData is here (and forces a full rewrite anyway). --Tgr (WMF) (talk) 20:29, 15 March 2018 (UTC)[reply]
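
The two behaviours discussed in this thread (one license returned, versus an array ordered by priority) can be contrasted with a small Python sketch. This is an illustration only and not the CommonsMetadata implementation: the `nonfree` flag, the `rank` table and both function names are hypothetical, and the only rules taken from the thread are that a non-free statement takes precedence and that the result is otherwise biased toward a preferred license.

```python
def order_licenses(licenses, rank):
    """Return licenses ordered by priority.

    Non-free statements come first because they take precedence; the rest
    follow in ascending `rank` (lower means preferred). Names missing from
    `rank` sort last. `rank` is a hypothetical, caller-supplied preference table.
    """
    return sorted(
        licenses,
        key=lambda lic: (
            not lic.get("nonfree", False),
            rank.get(lic["name"], len(rank)),
        ),
    )


def single_license(licenses, rank):
    """Mimic a one-license answer: the first entry of the ordered list, or None."""
    ordered = order_licenses(licenses, rank)
    return ordered[0] if ordered else None


# Hypothetical preference table and inputs, for illustration only.
RANK = {"cc-by-sa-3.0": 0, "GFDL": 1}
dual = [{"name": "GFDL"}, {"name": "cc-by-sa-3.0"}]

print(single_license(dual, RANK))
print(order_licenses(dual + [{"name": "fair-use", "nonfree": True}], RANK))
```

Returning the whole ordered list instead of only its first element is the array variant suggested earlier in the thread; consumers that want a single license could still take the first entry.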

Need help fixing this on some Wikipedias

Hi! I need help fixing the metadata on several Wikipedias. I asked at Meta but have had no luck so far, so I am trying here, where I hope more users stop by.

I like to visit (often smaller) Wikipedias and make sure that they clean up files if they allow local uploads. I hope that if they clean up then we will have fewer bad files transferred to Commons and perhaps the wiki will even decide to send uploaders of free files directly to Commons.

I noticed that Category:Files with no machine-readable license (Q18218525) has 75 translations, but there are many more wikis with files. As I understand it, the feature is active on all wikis by default, the category names are controlled by editing translatewiki, and if no translation is set the files will end up in the standard (English) category. The same goes for the other automatic categories: Category:Files with no machine-readable description (Q18218524), Category:Files with no machine-readable author (Q18218522) and Category:Files with no machine-readable source (Q18218520). Is that correct?

Even though I read m:File metadata cleanup drive/How to fix metadata, I could not fix the templates. It of course does not help that many wikis have their own way of creating license templates. But I hope that, with a little help to get started, I can learn :-)

Anyone wanna give it a try? --MGA73 (talk) 19:00, 1 June 2020 (UTC)[reply]