Commons talk:Structured data

SpBot archives all sections tagged with {{Section resolved|1=~~~~}} after 7 days.

It seems that there's no consensus yet on using OpenStreetMap way ID (P10689) (or OpenStreetMap relation ID (P402) and OpenStreetMap node ID (P11693), respectively) in SDC. Using those properties would allow SPARQL queries based on OSM IDs, like https://w.wiki/B3sq or https://w.wiki/B3su, with the advantage that such a query could return multiple views of an OSM map feature (similar to Google Maps Images for a map object). What's your opinion on this proposal? Fl.schmitt (talk) 06:19, 30 August 2024 (UTC)

Better example: https://w.wiki/B3t6 ("Big Ben", London) Fl.schmitt (talk) 06:33, 30 August 2024 (UTC)
Query for current usage of P10689, grouped by OSM ID: https://w.wiki/B3tC Fl.schmitt (talk) 06:37, 30 August 2024 (UTC)
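For readers who cannot follow the short link, a grouping query of this kind could be sketched roughly as follows. This is not necessarily what the linked query does; it assumes the Wikimedia Commons Query Service and that files carry P10689 as a plain statement, and the variable names and the LIMIT are illustrative.

```sparql
# Sketch, assuming the Wikimedia Commons Query Service.
# Counts files carrying an OpenStreetMap way ID (P10689) statement, per ID.
SELECT ?way_id (COUNT(?file) AS ?files) WHERE {
  ?file wdt:P10689 ?way_id .
}
GROUP BY ?way_id
ORDER BY DESC(?files)
LIMIT 100
```

Way IDs that appear with more than one file are the cases where a single OSM feature would have several views on Commons.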
If I remember correctly, the opposition to using OSM identifiers in Wikidata was that the OSM identifiers weren't stable. The proposed method then was to add Wikidata items to OSM and link from OSM to Wikidata. This would also create a permanent identifier to OSM for an entity. However, the problem with this approach was that it is impossible to know in the wiki Lua/template code if there was anything on the OSM side, which is a problem when the template creates links to OSM or uses OSM location in maps. Solving this for in-wiki use would require adding it to software (Lua/Wikidata), and afaik the only workaround is to add these as properties. In SPARQL, however, images can be queried using federated queries (https://w.wiki/B3tZ or https://w.wiki/B3tt), but with a performance penalty. --Zache (talk) 07:05, 30 August 2024 (UTC)
Photos don’t have OSM way/relation/node IDs, only the depicted places have them. Therefore I don’t think these properties should be used directly in SDC: add them to the appropriate Wikidata item and link to that using depicts (P180). Those can also be queried, using federation:
#defaultView:ImageGrid
select ?place ?placeLabel ?thumb with {
  # Subquery: look up the Wikidata item for the OSM way ID via federation
  select * {
    service <https://query.wikidata.org/sparql> {
      bind('54486345' as ?way_id).
      ?place wdt:P10689 ?way_id;
             rdfs:label ?placeLabel.
      filter (lang(?placeLabel) = "en").
    }
  }
} as %places where {
  include %places.
  # Commons files whose depicts (P180) statement points at that item
  ?image wdt:P180 ?place;
         schema:url ?thumb.
}
(By the way, as you can see from these results, your example of finding the Big Ben / Elizabeth Tower by the ID https://www.openstreetmap.org/way/54486345 is wrong: it’s the nearby St Margaret’s Church. This doesn’t invalidate your example, but it does mean that the files your query finds have incorrect SDC and should be fixed.) —Tacsipacsi (talk) 13:35, 1 September 2024 (UTC)
@Tacsipacsi: Thanks a lot - that's in fact interesting, I didn't check the OSM ID for the Big Ben example. Good hint, I will look at it. Regarding "Photos don’t have OSM way/relation/node IDs, only the depicted places have them": Here I disagree - that's exactly the point of my question. Of course, there's no such thing as a 1:1 relation between a photo and its object (to that extent I agree). But there are 429,892 OSM tags which have Commons Files or Categories "attached". So, since OSM entities have Commons content (and Categories/Files can reference OSM objects using {{On OSM}} or {{OSMLink}}), there seems to be a practical need for such relations. Usage Bot has collected more than 200,000 Commons files used on OSM. I admit that OSM identifiers may not be stable, but is this a practical problem? This looks like a standard maintenance task to be done by a bot - periodically check such ID references for validity.
Regarding depicts (P180): Using that property in SDC works only if there's a Wikidata item as target. But on OSM, the wikimedia_commons or image attributes are used on objects like hiking sign posts, wayside shrines or other "non-notable" things. Especially for hiking sign posts (but also for fountains, sculptures and other 3D objects), it would be very useful to have multiple images available, showing different perspectives of that object. On OSM, there's currently no "recommended" way to reference multiple images, leading to an incoherent use of the respective tags. Assigning an OSM node ID to a Commons file would allow a 1:n relation between OSM objects and Commons files. Fl.schmitt (talk) 17:34, 1 September 2024 (UTC)
To add to previous comments: OpenStreetMap way ID (P10689), OpenStreetMap relation ID (P402) & OpenStreetMap node ID (P11693) shouldn't be used here at all. It's just data pollution. Multichill (talk) 20:05, 30 September 2024 (UTC)

sdc experts? how to query based on camera data?

@XRay asked in Commons:Categories_for_discussion/2024/09/Photographs_by_technical_parameters#c-XRay-20240925100400-Voting, how to search for files based on iso values, etc.

is it currently possible? any sdc experts? RZuo (talk) 16:43, 30 September 2024 (UTC)

Sorry, it looks like I have no idea. It's a bit out of context. For example, you can query with haswbstatement or with SPARQL. The first way is rather inflexible, the second is not quite as simple. IMO the ISO value is not searchable with haswbstatement. --XRay 💬 17:48, 30 September 2024 (UTC)
By the way, you can find out what you can query with ?action=cirrusdump. However, this also shows how little can actually be queried. --XRay 💬 17:50, 30 September 2024 (UTC)
Here is a little bit of documentation: [1] --XRay 💬 17:52, 30 September 2024 (UTC)
i saw your question over there and thought that it should be raised to more users' attention. if no one has any solutions then a feature request should be filed on phab. RZuo (talk) 18:38, 30 September 2024 (UTC)
You're right about that. The query options are relatively inflexible. SPARQL requires a lot of knowledge. I mentioned things because I don't think the extensive removal of categories makes sense. I query the source code a lot with regular expressions, so deleting would be counterproductive. I think the SDC options for flexible searches with the normal search function are important, but unfortunately there is far too little available. --XRay 💬 04:44, 1 October 2024 (UTC)
@RZuo: Basically, the query can be done like this, but there are a couple of reasons why SDC is not currently a substitute for categories. One is that, to be useful, the values are used in combination with other categories. The easiest way afaik to do this is using Petscan, which can filter articles based on multiple categories. This is accessible for normal users. In theory, some of the use cases can currently also be done using SPARQL (example below), but there are performance limitations, and not everything can be done using SPARQL because the information needed for combined queries is not accessible from SPARQL.
# P6789 = ISO speed
# P2151 = Focal length
# P6790 = f-number
# P6757 = exposure time
SELECT * WHERE {
  ?file wdt:P6790 ?aperture .
  ?file schema:url ?image .
  FILTER (?aperture > 14)
}
LIMIT 10

--Zache (talk) 08:36, 1 October 2024 (UTC)
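A combined filter over two of the properties listed above might look like the following sketch. The thresholds (ISO 800, f/2.8) are arbitrary illustration values, and the query assumes the Commons Query Service and files that actually carry both statements. Given the performance limits mentioned, it may well time out without the LIMIT.

```sparql
# Sketch only: files with both an ISO speed (P6789) and an f-number (P6790)
# statement, filtered on both values. Thresholds are illustrative assumptions.
SELECT ?file ?image ?iso ?aperture WHERE {
  ?file wdt:P6789 ?iso ;
        wdt:P6790 ?aperture ;
        schema:url ?image .
  FILTER (?iso >= 800 && ?aperture <= 2.8)
}
LIMIT 10
```

This only combines SDC values with each other. Combining them with categories is still the part that, as noted, is not available from SPARQL.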

Emails

Why am I now receiving unsolicited emails about this? I complained back in 2021 about the intrusive notifications, and now someone has the lack of awareness to start emailing me to do an interview. Who in Wikimedia authorised this? STOP. Cnbrb (talk) 09:19, 19 October 2024 (UTC)

Why would a person doing research interviews have any idea that you asked someone else not to contact you? - Jmabel ! talk 14:01, 19 October 2024 (UTC)
If you read what I wrote, you might get an inkling that being pestered about this project is not entirely welcome. And that's not how privacy works. Why would a person doing research interviews assume that I want to receive emails about something I have already expressed disinterest in? Sounds like lax use of personal data to me. Cnbrb (talk) 15:48, 19 October 2024 (UTC)
On your user page is the invitation "Email this user". If you do not wish to receive unsolicited emails you can turn that off in your account preferences. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 12:29, 20 October 2024 (UTC)

"Publish changes" does nothing

I have changed the geocoordinates of File:Kreuzung L489 - panoramio.jpg in wikitext, because I know better than Panoramio where this picture was taken. However, now I'm getting the Commons:Structured_data/Reconciliation warning. Switching to the SDC tab, I can add the new coordinates and remove the old ones, but the "Publish changes" button does nothing: no HTTP request is sent. There is an error in the developer console saying Error: View mediainfoview does not exist, but this error already appears when loading the page. "Remove all" removes the coordinates from the list, but no corresponding HTTP request happens, and when reloading the page, the coordinates are back to the way they were, including the discrepancy warning. Scytale (talk) 22:54, 28 October 2024 (UTC)

Modeling video / audio / pdf / djvu files

language of the work, etc. what properties should i be using? also, a bot should take care of some of these. RoyZuo (talk) 14:27, 19 November 2024 (UTC)

@RoyZuo: I just want to make sure before I give any longer answer: are you familiar with Commons:Structured data/Properties table? Seems to me that most (though not all) of what is relevant to a photograph would also be relevant to a video. And the answers for a video are going to be different from those for a PDF (and those for a PDF of a written document are going to be different from those for a PDF that is just a collection of images), so you are really asking several separate questions here. - Jmabel ! talk 22:16, 19 November 2024 (UTC)

Derivatives

I have been wondering for some time how the information on derivatives is correctly provided. The starting point is the crop tool, which basically provokes violations of the license conditions. The information is usually not customized. In addition, a bot like User:BotMultichillT comes very quickly and transfers the inadequate information directly into the structured data. Sources, information on the persons who created the derivation and other information relating to the license are missing. I have tried to correct the information for one derivation (File:2019 BMW i3 (Giga Turbine style 429 wheel).jpg) that I know of. But I don't know whether this is correct. I even doubt it, because I'm not really familiar with the structures from Wikidata. Can someone perhaps give me a hint? --XRay 💬 11:38, 30 November 2024 (UTC)

Important to distinguish crops from derivatives in general. Both may affect "depicts," but crops never change where and when the image was created, or with what equipment, and rarely (if ever) create new intellectual property rights. They may, however, require indicating who did the crop where the original license says that derivatives must not be made to appear to be entirely the work of the original author. But the latter does not always apply: for example CC-0, PD, or just cropping out a border. - Jmabel ! talk 17:50, 30 November 2024 (UTC)
Basically, I think it is important that all information is as correct as possible. As a photographer, I decide what a photo should look like. Derivations of any kind are permitted and possible under the license. In my opinion, however, it is not acceptable if the person who created the derivative is not named - in other words, if the terms of the license are not adhered to. The person is responsible for the derivative. This is important to me and can also have legal consequences, because not every excerpt is legally permitted. (See for example File:Louvre at night centered.jpg.) However, the question remains as to what the structured data should look like. (The background: I want to use the structured data of the derivatives of my photos in a meaningful way.) --XRay 💬 18:11, 30 November 2024 (UTC)

To be a little more specific: I am concerned with the following properties:

But: How to set these properties? How to fix the crop tool? How to fix the bots like User:BotMultichillT or User:SchlurcherBot?

--XRay 💬 06:59, 1 December 2024 (UTC)

@XRay: I have similar considerations. On my user page, I keep a log of categories where I feel the SDCs are not sufficient as of now. These are: Other versions (like SVG), retouched versions and other derivatives like crops. I'm happy to update my bot as soon as we have something like an aligned modeling specification for this. For me, the problem already starts here: why can extracted from (P7009) not be added to files manually? Why is it greyed out if added by bots? Can we fix this first and then use it? --Schlurcher (talk) 22:44, 1 December 2024 (UTC)
There seems to be a special Wikimedia data type, commonsMedia: https://www.wikidata.org/w/index.php?title=Special:ListProperties/commonsMedia&limit=500&offset=50. Properties of this type link directly to Commons files (but cannot, however, be added via the structured data UI). extracted from (P7009) is of type commonsMedia, but based on (P144) is not. Should we instead use based on media (P12346) for the latter, which is again of type commonsMedia? --Schlurcher (talk) 08:03, 9 December 2024 (UTC)
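To check which properties have that data type without paging through the special page, a query along these lines could be run on the Wikidata Query Service (not the Commons one). It is a sketch that assumes the standard wikibase:propertyType predicate and the wikibase:CommonsMedia type IRI; the variable names are illustrative.

```sparql
# Sketch for query.wikidata.org: list all properties of data type commonsMedia.
SELECT ?property ?propertyLabel WHERE {
  ?property wikibase:propertyType wikibase:CommonsMedia .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }
}
ORDER BY ?propertyLabel
```

If the data types are as described above, extracted from (P7009) and based on media (P12346) should appear in the result and based on (P144) should not.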

OpenRefine - Commons upload validations

As you may know, OpenRefine lets users upload media files to Commons in batch. Because some of the uploads done in this way add too little metadata to the uploaded files, we are considering introducing more pre-upload checks to prevent that. We need your help to determine which metadata fields should be required for any file uploaded via OpenRefine. Are these guidelines still up to date and accurate? Based on this information, we would require the users to provide:

We would not require copyright license (P275) as this statement is not required for works in the public domain, and we don't anticipate being able to express this conditional dependency.

We also looked into adding constraints on the wikitext associated with the media files, but this is likely too complicated to implement reliably, as some required parts could be added via different sorts of templates, which OpenRefine isn't able to expand before upload.

What do you think of this plan? Can you think of any case where it would be fine to upload a file without one of the 5 fields mentioned above? Do you think OpenRefine should only warn the user about those missing fields, or even prevent the upload entirely if those fields are not provided? SunilNOpenRefine (talk) 04:41, 11 December 2024 (UTC)
