Shortcuts: WD:RAQ, w.wiki/LX

Wikidata:Request a query

From Wikidata
Jump to navigation Jump to search

Request a query
Fishing in the Wikidata river requires both an idea where to look for fish and a suitable fishing method. If you have the former, this page can help you find the latter.

This is a page where SPARQL 1.1 Query Language (Q32146616) queries can be requested. Please provide feedback if a query is written for you.

For sample queries, see Examples and Help:Dataset sizing. Property talk pages include also summary queries for these.

For help writing your own queries, or other questions about queries, see Wikidata talk:SPARQL query service/queries and Wikidata:SPARQL query service/query optimization.

Help resources about Wikidata Query Service (Q20950365) and SPARQL: Wikidata:SPARQL query service/Wikidata Query Help and Category:SPARQL.

To report an issue about the Query Service (interface, results views, export...) please see Wikidata:Contact the development team/Query Service and search.
On this page, old discussions are archived. An overview of all archives can be found at this page's archive index. The current archive is located at 2024/04.

Optimizing a SPARQL query -- times out when I add Labels

I am attempting to write a query to return all QIDs with occupation (P106) of visual artist (and all subclasses) with en wiki articles, with optional inclusion of sex or gender (P21) and date of birth (P569). Without labels, it runs without timeout (~90,000 results). Adding labels causes the time out. The query is here: https://w.wiki/pYR and below

SELECT DISTINCT ?item ?itemLabel ?itemDescription ?dob ?gender WHERE {
  {
  SELECT ?item  WHERE {
  ?item wdt:P106/wdt:P279* wd:Q3391743.   #art Q483501  visart Q3391743 /wdt:P279*
  ?item wdt:P31 wd:Q5.
  ?article schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> ;  schema:name ?page_titleEN .
  OPTIONAL {?item wdt:P21 ?gender.}
  OPTIONAL {?item wdt:P569 ?dob.}
  }  
  ORDER BY DESC(?item)
#  LIMIT 50
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

}
LIMIT 10000
Try it!

My research led me to this Phabricator page, which shows these timeouts are an ongoing issue, as the database has grown so much. Per the suggestions on that page, I tried to nest the queries, putting the Label service outside the main SELECT, though this didn't have a huge impact. The Phab page also mentions query hints, but I couldn't get that to work, or find documentation on it.

Also, I seem to get duplicate entries, when the subject has two dates of birth list (e.g. there is historical uncertainty as to which is correct). I can clean this up afterwards, but it might be simple to do now.

Can someone offer me guidance on how to optimize this to fit the timeout? Thanks. --Theredproject (talk) 13:37, 7 December 2020 (UTC)[reply]

@Theredproject: This is your big problem: 400k distinct items. As a rule of thumb, getting labels for more than ~100k items is an ask.
SELECT DISTINCT ?item WHERE {
  ?item wdt:P106/wdt:P279* wd:Q3391743.   hint:Prior hint:gearing "forward".
}
Try it!
I'd suggest getting the results in three bites, using a named subquery; here's the second bite. Remove OFFSET 30000 to get the first bite; use OFFSET 60000 LIMIT 40000 for the final bite. If you have not come across it, Wikidata:SPARQL query service/query optimization may be your friend. Finally I removed schema:name ?page_titleEN since it was not doing anything useful.
SELECT DISTINCT ?item ?itemLabel ?itemDescription ?dob ?gender WITH { 
SELECT ?item WHERE {
  ?item wdt:P106/wdt:P279* wd:Q3391743.   hint:Prior hint:gearing "forward".
  ?article schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> .
} OFFSET 30000 LIMIT 30000}  as %i
WHERE
{
  INCLUDE %i
  ?item wdt:P31 wd:Q5.
  OPTIONAL {?item wdt:P21 ?gender.}
  OPTIONAL {?item wdt:P569 ?dob.}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en". }

}
Try it!
--Tagishsimon (talk) 17:25, 7 December 2020 (UTC)[reply]
Nice .. tried it, but hadn't thought of OFFSET and LIMIT --- Jura 20:54, 7 December 2020 (UTC)[reply]
@Tagishsimon: ThAnKyOu! I was able to get through the query. I'm still getting duplicates. Is that because of problems endemic to Wikidata, or because according to W3, "Using LIMIT and OFFSET to select different subsets of the query solutions will not be useful unless the order is made predictable by using ORDER BY." Does the query need an ORDER BY? If so, where?
I had thought about segmenting it, but couldn't think of a way to do that with subclasses. Didn't think about OFFSET and LIMIT. Also, didn't know that named subquery is faster -- I see that now in the Query Optimization documentation. I read through that earlier, but I saw it earlier in my research and I couldn't parse it. I did catch the small to big Order Tweaking. Having seen your edits here, I think I have a better angle into the Optimization docs. TX --Theredproject (talk) 12:12, 8 December 2020 (UTC)[reply]
@Theredproject: Yes, you need to use ORDER BY to be sure to get all results when slicing up with LIMIT and OFFSET. That is because without ORDER BY the order of the results is undefined and may change each time you run the query. The only thing here you can order by is ?item which is fine to use because all values are unique. Insert ORDER BY ?item just before OFFSET 30000 LIMIT 30000. --Dipsacus fullonum (talk) 14:04, 8 December 2020 (UTC)[reply]
PS: Well, you could also order by ?article with the same effect. However, I wouldn't recommend that because that would remove an opportunity to eliminate that variable by the optimizer as it is not otherwise used. I don't know if the opportunity is used, but why not make it possible? --Dipsacus fullonum (talk) 14:15, 8 December 2020 (UTC)[reply]

──────────────────────────────────────────────────────────────────────────────────────────────────── @Tagishsimon, Dipsacus fullonum: This time the query returned results in order, but I noticed that the results seemed to have gaps at the start/end of each chunk. I counted and total number of lines is ~13,000 less. The previous query that returned without labels had 94963 results. This trio had 81755 total. The new queries still have duplicate QIDs, where the item has two dates of birth. Here is the tail of the first query (sorry, I don't know how to format a CSV for a talk page (without going through the creation of a wiki table, which is unnecessary and also would make me cry;-):

And the head of the second query:

So it seems like it is skipping some chunks. Any idea why? Also, do you know how to suppress the second result per QID when there is a second birthday value? Thanks. --Theredproject (talk) 20:48, 8 December 2020 (UTC)[reply]

@Theredproject: I have four comment/answers:
  1. I don't understand what you mean by "this trio". What trio of what?
  2. The total number of results are unchanged as you can see by either removing the label service and the use of LIMIT/OFFSET to get all results at once, or by adding the number of results in all the slices.
  3. No, the results don't come in order now. The order is still undefined because there is no ordering on the outer query which may reorder the results from the ordered subquery in any way. Therefore it makes no good sense to talk about gaps. Did you sort the results before making the lists of head and tail of query results you quote above?
  4. The standard way to eliminate multiple values for variables such as ?dob here is to group by all other variables and use some kind of aggregation of the variable with several values. Do that by adding GROUP BY ?item ?itemLabel ?itemDescription ?gender at the very end of the query, and replace ?dob in the SELECT clause by an aggregation. You can use (GROUP_CONCAT(?dob) AS ?dobs) to list all values or e.g. (SAMPLE(?dob) AS ?sample_dob) to get one sample. --Dipsacus fullonum (talk) 09:17, 9 December 2020 (UTC)[reply]
@Dipsacus fullonum: thanks for this final bit of guidance. I got it to work. Posting it here, in case others stumble upon this thread trying to figure out something similar. https://w.wiki/qF4 --Theredproject (talk) 22:11, 11 December 2020 (UTC)[reply]

Query about books written by women

Hi all. I am trying to do a query in which I get a list of books written by women that exist in en.wikipedia but not in pt.wikipedia, and in the query list I wanted to see the url to the en.wiki article... kinda like this, but with columns saying "article in portuguese (book), article in portuguese (writer), article in english (book), article in english (writer), item wikidata, site links.

Can someone help me with this? Tuga1143 (talk) 17:31, 9 December 2020 (UTC)[reply]

@Tuga1143: The basic problem here, I think, it determining what items represent "books" in Wikidata's what-passes-for-an-ontology. AFAIK, we start from {{|Q47461344}} ... ideally we'd also look at its subclass tree, but that is 70k items; and whether we look for wdt:P31/wdt:P279+ or wdt:P31/wdt:P279* (both interrogating the subclass tree) we quickly run into 10s of millions of academic papers and, in short, the query times out.
A structure for the query might be as follows. Others might have additional views on how to recognise books:
SELECT ?item ?itemLabel ?author ?authorLabel ?sitelink ?article WITH { SELECT ?item ?sitelink ?article
WHERE 
{
  ?item wdt:P31 wd:Q47461344.            #written work
  ?article schema:about ?item ;          #on en wiki
          schema:isPartOf <https://en.wikipedia.org/> ;
          schema:name ?sitelink .
} } as %i
WHERE
{
  INCLUDE %i
  ?item wdt:P50 ?author . 
  ?author wdt:P21 wd:Q6581072 .          #female author
  filter not exists {                    #not on pt wiki
  ?article2 schema:about ?item ;
           schema:isPartOf <https://pt.wikipedia.org/> .}
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
--Tagishsimon (talk) 19:37, 9 December 2020 (UTC)[reply]

Hi @Tagishsimon:, thank you so much for your help. I was trying to generate some lists about written works by women, sculptures made by women, paintings, etc, for the portuguese wikiproject about women. Right now I'm trying to make a table like this one using the query you gave me, so that users in pt.wiki may see a simple table with the red link of the written work for pt.wiki, the link to the written work in en.wiki and the wikidata item. I'll try to do it. If i have any problem, could I continue to make questions and ask for help? Tuga1143 (talk) 19:58, 9 December 2020 (UTC)[reply]

@Tuga1143: Thanks. Yes of course, we'll do all we can to help; all questions welcome. --Tagishsimon (talk) 20:09, 9 December 2020 (UTC)[reply]
I'm sorry for being terrible at this... jesus christ.
I'm trying to do a table for the query, having something like:
LIMIT 100
|section=
|sort=
|columns= label:article in portuguese,article in english wiki,item:Item Wikidata
|links= red_only
|row_template= subst:Usuário:Tuga1143/Listas template
|thumb=
|freq=
|min_section=
But, in the line about columns, what can I write for it to be possible to generate a table having three columns saying "article in portuguese (redlink), article in english wiki (with link), item:Item Wikidata" and show red links only? Tuga1143 (talk) 20:15, 9 December 2020 (UTC)[reply]

Hi @Tagishsimon:, how are you? I hope you're ok. I've managed to do what I was thinking about, as you can see here. The second and third columns are perfectly fine... but im trying to make the first column show only the red articles and, instead, it is showing me blue articles + wikidata links... do you have any idea? IF it helps, the template im using is this one. Thank you so much. Tuga1143 (talk) 09:22, 10 December 2020 (UTC)[reply]

@Tuga1143: In the query at pt:Usuário:Tuga1143/Testes15 you have the filter
filter not exists {                    #not on pt wiki
 ?article schema:about ?item ;
          schema:isPartOf <https://pt.wikipedia.org/> .}
But it doesn't filter out anything because the variable ?article is already bound to a sitelink to the book on English Wikipedia, and the English sitelink never exists on Portugese Wikipedia. If you look above you will see that Tagishsimon used a new variable named ?article2 in the filter. When you do that it will filter out the result if it is possible to find a value for ?article2 on Portuguese Wikipedia. --Dipsacus fullonum (talk) 11:22, 10 December 2020 (UTC)[reply]
@Dipsacus fullonum: I see... so it's impossible to make query showing the red links like in the example I gave but, instead of biographies, showing books written by women still missing in pt.wiki, sculptures made by an artist women and still missing in pt.wiki, art work made by women still missing in pt.wiki, etc? Tuga1143 (talk) 11:41, 10 December 2020 (UTC)[reply]
@Tuga1143: No, I didn't say that it is not possible. There is one problem though: You cannot know the article name on Portuguese Wikipedia which may be different from the Portuguese label at Wikidata due to disambiguation or other reasons. Otherwise you can change the name for article variable in the filter to an unused name to get the filter to work as intended. --Dipsacus fullonum (talk) 11:55, 10 December 2020 (UTC)[reply]
Sorry, I didnt want to put words on your mouth. Could you help me and tell me how to change the article variable in the filter to an unsused name (equal to the one in en.wiki) to get the filter to work as intended? Tuga1143 (talk) 11:58, 10 December 2020 (UTC)[reply]

Duplicate BHL items

The items:

represent the same publication, for which separate DOIs have been issued in error.

Our friends at the Biodiversity Heritage Library have asked for our our help in identifying similar cases, in order that they may manually check and resolve them in their data (and thus in ours).

We could use a query to find pairs of items with different DOIs, but where titles (case insensitive?) and perhaps other properties, such as author (item or string), date, etc., match closely if not exactly (note the minor difference in labels, above).

To reduce processing, checks can be limited to items with a BHL bibliography ID (P4327) property.

The above example should be found by such a query. Accordingly, I will not merge the items yet.

Can anyone assist, please? Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 09:04, 10 December 2020 (UTC)[reply]

@Pigsonthewing: calculate proximity between strings is not easy (to other people with SPARQL-fu: is there even a way ta do a Levenshtein distance in SPARQL?).
That said, here is already a query for pair of publication with the exact same author (item), exact same date and exact same title (limited to 1000 first results):
SELECT * WHERE {
  ?q1 wdt:P356 ?doi1 ; wdt:P4327 ?id1 ; wdt:P50 ?aut ; wdt:P577 ?date ; wdt:P1476 ?title .
  FILTER regex (?doi1, "^10.5962/BHL.").
  ?q2 wdt:P356 ?doi2 ; wdt:P4327 ?id2 ; wdt:P50 ?aut ; wdt:P577 ?date ; wdt:P1476 ?title  .
  FILTER regex (?doi2, "^10.5962/BHL.").
  FILTER ( str(?q1) < str(?q2) )
}
LIMIT 1000
Try it!
There is probably some false-positive and a lot to check. I'll let you check and tell me if it helps, how it can be more helpful, and so on.
Cheers, VIGNERON (talk) 13:51, 10 December 2020 (UTC)[reply]
@VIGNERON: That's very helpful, even with strict matching. Thank you. Andy Mabbett (Pigsonthewing); Talk to Andy; Andy's edits 14:17, 10 December 2020 (UTC)[reply]

Getting Wikidata QIds for ~1000 Wikipedia articles (various languages)?

I'm trying to bring three lists together: people's names, wikipedia article links and wikidata ids. I've obtained ~7k wikidata ids for my 40k names, and have worked out how to ask for the wikipedia article name associated with a wikidata id in each given language in turn. So far, so good, but I don't know how to approach the wikipedia->wikidata pairing. I have ~1000 names+wplink pairs, and I'd like to get the missing wikidata id. How can I get hold of the QId from a lang+wplink? eg: en.wikipedia.org/wiki/Fernando_Botero Thanks! Scarabocchio (talk) 20:24, 10 December 2020 (UTC)[reply]

@Scarabocchio: With the schema:about predicate. E.g.
SELECT ?item ?itemLabel ?itemDescription
WHERE
{
  <https://en.wikipedia.org/wiki/Fernando_Botero> schema:about ?item .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
--Dipsacus fullonum (talk) 02:22, 11 December 2020 (UTC)[reply]


SELECT ?item ?itemLabel ?itemDescription
WHERE
{
  [] schema:about ?item ; schema:isPartOf <https://en.wikipedia.org/> ; schema:name "Fernando Botero"@en
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!

@Scarabocchio: Another way: slightly different, but essentially the same. --- Jura 14:18, 11 December 2020 (UTC)[reply]

Prefix should be s: but it return wds:

Hi, I expect this to return s: prefix but how come it return with prefix wds: instead ? wds:Q36949-91bc1581-43b0-78c1-4970-c2480d22c56c

Because according to this entity ttl https://www.wikidata.org/wiki/Special:EntityData/Q36949.ttl

The value prefix is s: not wds: , you can search Q36949-91bc1581-43b0-78c1-4970-c2480d22c56c at that ttl.

select * 
WHERE {
  wd:Q36949 p:P2218 ?vv.
}
Try it!
It's not really my area, but AFAICS the turtle file uses @prefix s: <http://www.wikidata.org/entity/statement/> . where the RDF representation is PREFIX wds: <http://www.wikidata.org/entity/statement/> [1]. Why the powers that be in WMDE chose to be inconsistent in this matter I cannot say. You could ask at Wikidata:Contact the development team/Query Service and search, but tbh I doubt anyone will know. --Tagishsimon (talk) 11:43, 11 December 2020 (UTC)[reply]

Altering this query to include all subclasses of meteorite (Q60186)

I'd like this query to expand coverage and include all subclasses of meteorite (Q60186). Can someone edit the query below?

SELECT ?item ?itemLabel ?Meteoritical_Bulletin_Database_ID WHERE {
  
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
  OPTIONAL {
    ?item wdt:P824 ?Meteoritical_Bulletin_Database_ID.
    BIND(REPLACE(?Meteoritical_Bulletin_Database_ID, "^([0-9]+).*$", "$1") AS ?number)
    BIND(STRAFTER(?Meteoritical_Bulletin_Database_ID, ?number) AS ?after)
  }
}
ORDER BY (xsd:integer(?number)) (?after)
LIMIT 1000
Try it!

Thank you. Trilotat (talk) 21:16, 11 December 2020 (UTC)[reply]

SELECT DISTINCT ?item ?itemLabel ?Meteoritical_Bulletin_Database_ID ?instanceofLabel
WHERE 
{
  { ?item wdt:P279*/wdt:P31 wd:Q60186 } UNION { ?item wdt:P824 [] }   #both
  # ?item wdt:P279*/wdt:P31 wd:Q60186                                 #only Q60186 and subclasses
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en". }
  OPTIONAL {
    ?item wdt:P824 ?Meteoritical_Bulletin_Database_ID.
    BIND(REPLACE(?Meteoritical_Bulletin_Database_ID, "^([0-9]+).*$", "$1") AS ?number)
    BIND(STRAFTER(?Meteoritical_Bulletin_Database_ID, ?number) AS ?after)
  }
  OPTIONAL { ?item wdt:P31 ?instanceof }
}
ORDER BY (xsd:integer(?number)) (?after)
LIMIT 1000

Try it!

@Trilotat: As you want it to be expanded, I take it you'd want both to be included. --- Jura 00:05, 12 December 2020 (UTC)[reply]

List of authors with links to Wikisource but not any other project

I'd like a list of authors with links to Wikisource but not any other project. The catch is that there must also be an item which is a subclass of written work (Q47461344) linking to that author. NMaia (talk) 06:32, 13 December 2020 (UTC)[reply]

@NMaia:
SELECT ?p ?pLabel (SAMPLE(?ẁork) as ?work)
{
    ?work wdt:P50 ?p .
    ?p wikibase:sitelinks 1 . 
    [] schema:about ?p ; schema:isPartOf / wikibase:wikiGroup "wikisource" .
    SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}  
GROUP BY ?p ?pLabel
LIMIT 10

Try it!. It's not limited to written works, but P50 should get you mostly that. --- Jura 12:19, 13 December 2020 (UTC)[reply]

That query only finds authors with link to one Wikisource. The request said "links to Wikisource" (plural). NMaia, did you mean authors with links to more than one Wikisource? --Dipsacus fullonum (talk) 20:12, 13 December 2020 (UTC)[reply]
@Dipsacus fullonum: It can link to more than one Wikisource. Only one would be unnecessarily limiting. NMaia (talk) 01:31, 14 December 2020 (UTC)[reply]
@NMaia: I made a query which is almost what you requested. It excludes the authors from other projects than Wikisource by checking that the number of counted Wikisources equals the total number of sitelinks. The difference from the request is that the query only includes authors which are instance of (P31) human (Q5). I had to include that limitation to avoid timeout.
SELECT ?author ?authorLabel
WITH
{
  SELECT ?author
  {
    [] schema:about ?author ; schema:isPartOf / wikibase:wikiGroup "wikisource" .
    ?author wikibase:sitelinks ?sitelinks . 
    ?author wdt:P31 wd:Q5 .
  }
  GROUP BY ?author
  HAVING (COUNT(?author) = SAMPLE(?sitelinks))
} AS %1
WITH
{
  SELECT ?author
  WHERE
  {
    INCLUDE %1
    FILTER EXISTS
    {
      ?work wdt:P50 ?author .
      ?work wdt:P31/wdt:P279* wd:Q47461344 . # Written work exists
      hint:Prior hint:gearing "forward" .
    }
  }
} AS %2
WHERE
{
  INCLUDE %2
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
Try it!
--Dipsacus fullonum (talk) 03:11, 14 December 2020 (UTC)[reply]
@Dipsacus fullonum: Thank you! I think this should realistically cover most cases. NMaia (talk) 03:15, 14 December 2020 (UTC)[reply]
@NMaia: The previous query did miss 1869 authors who either aren't one human or aren't marked as such. Here is a query for these cases:
SELECT ?author ?authorLabel
WITH
{
  SELECT DISTINCT ?author
  WHERE
  {
    ?work wdt:P50 ?author .
    [] schema:about ?author ; schema:isPartOf / wikibase:wikiGroup "wikisource" .
    MINUS { ?author wdt:P31 wd:Q5 . }
  }
} AS %0
WITH
{
  SELECT ?author
  {
    INCLUDE %0
    [] schema:about ?author ; schema:isPartOf / wikibase:wikiGroup "wikisource" .
    ?author wikibase:sitelinks ?sitelinks . 
  }
  GROUP BY ?author
  HAVING (COUNT(?author) = SAMPLE(?sitelinks))
} AS %1
WITH
{
  SELECT ?author
  WHERE
  {
    INCLUDE %1
    FILTER EXISTS
    {
      ?work wdt:P50 ?author .
      ?work wdt:P31/wdt:P279* wd:Q47461344 . # Written work exists
      hint:Prior hint:gearing "forward" .
    }
  }
} AS %2
WHERE
{
  INCLUDE %2
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
Try it!
--Dipsacus fullonum (talk) 06:28, 14 December 2020 (UTC)[reply]

@NMaia: I found a way to get all results at once without timeout. This query finds approx. 4232 authors where the previous two queries each found about 2366 humans and 1869 other authors.

SELECT ?author ?authorLabel
WITH
{
  SELECT ?author
  {
    ?wikisource schema:about ?author ; schema:isPartOf / wikibase:wikiGroup "wikisource" .
    ?author wikibase:sitelinks ?sitelinks . 
    ?work wdt:P50 ?author .
  }
  GROUP BY ?author
  HAVING (COUNT(DISTINCT ?wikisource) = SAMPLE(?sitelinks))
} AS %1
WITH
{
  SELECT ?author
  WHERE
  {
    INCLUDE %1
    FILTER EXISTS
    {
      ?work wdt:P50 ?author .
      ?work wdt:P31/wdt:P279* wd:Q47461344 . # Written work exists
      hint:Prior hint:gearing "forward" .
    }
  }
} AS %2
WHERE
{
  INCLUDE %2
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
Try it!

--Dipsacus fullonum (talk) 07:08, 14 December 2020 (UTC)[reply]

Exclude members of duo

I would like to revise this query w:en:Wikipedia:WikiProject Women in Red/Missing articles by dictionary/Encyclopædia Britannica Online to exclude rows that are members of a duo/married couple with their own encyclopedia article. E.g., exclude Jeanne-Claude (Q3351266), who is part of (P361) Christo and Jeanne-Claude (Q158477), because the latter QID already has its own enwp article. czar 18:34, 14 December 2020 (UTC)[reply]

@Czar: Done in the page's SPARQL (diff). --Tagishsimon (talk) 18:42, 14 December 2020 (UTC)[reply]
@Czar: In addition, it is possible to remove individual entries, by adding SPARQL along the lines of FILTER ( ?item NOT IN ( wd:Q12345, wd:Q67890 ) ) - which might be necessary to properly weed out rows which are never going to yield an article. --Tagishsimon (talk) 19:09, 14 December 2020 (UTC)[reply]

How to get ask a query to find all the given names and their meanings

Hi

I'm struggeling with the following: I need a list of given names, by country , and their meaning, origin, related names.

Can you please help?

Thx in advance!

Once working, no longer working query

Hello, what's wrong with that query ? It is supposed to show a count of countries per year (not sure if it's the list or the count). Many thanks,

select ?country (group_concat(?NatureLabelFr;separator=" ,") as ?NatureLabelFr) ?countryLabel  ?creationDate ?dissolutionDate
with {
select ?country (coalesce(?countryLabelFr, ?countryLabelEn,  ?country) as ?countryLabel) ?creationDate ?dissolutionDate{
?country wdt:P31/wdt:P279* wd:Q3624078 # sovereign states, I don’t know if it’s the right item
MINUS { ?country (wdt:P31/wdt:P279*) wd:Q1790360. }#empires coloniaux
MINUS { ?country (wdt:P31/wdt:P279*) wd:Q1371288. } #états vassals
optional { ?country rdfs:label ?countryLabelFr filter(lang(?countryLabelFr)= "fr")} .
optional { ?country rdfs:label ?countryLabelEn filter(lang(?countryLabelEn)= "en")}
optional { ?country wdt:P571 ?creationDate }
optional { ?country wdt:P576 ?dissolutionDate }
} order by ?countryLabel
} as %datas
where {
include %datas .
optional{
?country wdt:P31/rdfs:label ?NatureLabelFr filter(lang(?NatureLabelFr)= "fr") .
}
} group by ?country ?countryLabel  ?creationDate ?dissolutionDate
Try it!
@Bouzinac : The query fails because it is invalid SPARQL code. Variable names after "AS" may not already be in scope so the construct group_concat(?NatureLabelFr; separator=" ,") as ?NatureLabelFr) is not allowed. You have to use a new variable name. At one point the engine was more forgiving for that error but now if fails without a proper error message. It is tracked in phab:T235540. --Dipsacus fullonum (talk) 00:56, 16 December 2020 (UTC)[reply]

Getting the point-in-time of a property?

I think I'm confused about property hierarchy. (And maybe using the wrong terminology). For this query:

SELECT ?item ?pl ?signatory ?date2 ?short ?itemLabel (CONCAT("Public Law ",?pl, ". ", STRBEFORE(?stat,"-"), " Stat ", STRAFTER(?stat,"-"),".") as ?text) 
WHERE {
  ?item wdt:P3825 ?stat ;
        wdt:P3837 ?pl .
  OPTIONAL { ?item wdt:P585 ?date . }
  OPTIONAL { ?item wdt:P1813 ?short . }
  OPTIONAL { ?item p:P1891 ?signatory .
             #?signatory wdt:P585 ?date2 .
           }
  OPTIONAL { 
}
  FILTER (CONTAINS(?pl, '116-')) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en"  }
}
LIMIT 10
Try it!

I would like ?date2 to be the point in time (P585) property of the signatory signatory (P1891) property e.g. as shown in https://www.wikidata.org/wiki/Q62774908#Q62774908$d9cef975-4de4-7ad7-08b8-1a686db021a0 but…it doesn't seem to work. Either in its own OPTIONAL stanza or together with the ?signatory definition. How do I do I get that? (Also, for bonus points, format the date as "January 16, 2019" rather than "16 January 2019"). Thanks! This is for w:Talk:List_of_acts_of_the_116th_United_States_Congress#Wikidata_table_experiment Jhawkinson (talk) 01:58, 18 December 2020 (UTC)[reply]

@Jhawkinson: This. Tutorial on qualifiers. You'd have to do some string slicing to form the date in your preferred format; there's no native control of the date format.
SELECT ?item ?pl ?signatory ?date2 ?short ?itemLabel (CONCAT("Public Law ",?pl, ". ", STRBEFORE(?stat,"-"), " Stat ", STRAFTER(?stat,"-"),".") as ?text) 
WHERE {
  ?item wdt:P3825 ?stat ;
        wdt:P3837 ?pl .
  OPTIONAL { ?item wdt:P585 ?date . }
  OPTIONAL { ?item wdt:P1813 ?short . }
  OPTIONAL { ?item p:P1891 ?stat2 .
            ?stat2 ps:P1891 ?signatory; .
            OPTIONAL {?stat2 pq:P585 ?date2 .}
           }
  OPTIONAL { 
}
  FILTER (CONTAINS(?pl, '116-')) .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "en,en"  }
}
Try it!
--Tagishsimon (talk) 02:17, 18 December 2020 (UTC)[reply]
Thank you, @Tagishsimon:. I have to say, that syntax and model seem particularly…arcane? I'm not sure how anyone could figure out that's how it worked from either looking at existing examples at wd:SPARQL_query_service/queries/examples or even from a casual glance at the SPARQL spec (maybe I need to be looking at the RDF spec? I dunno. I looked in a bunch of places). Of course the specification resources are pretty formal and set-in-stone, but it does seem like there should be some examples that make use of the
p -> OPTIONAL { pq }
pattern to help people out. Although I now see there are some examples that do use
p -> pq
they are not well-commented or documented as to what is going on there. (Or at least, they did not seem to be the first time I read them, trying to figure this out). But the tutorial you linked to is particularly helpful, yes! It's also striking to me that that's an off-site resource and there doesn't seem to be an analogous resource in the Wikidata documentation, although maybe there is and I just didn't find it. Also: I was confused about the ps: part, which I guess is not necessary here.
I suspect it's possible to avoid the use of ?stat2…ah yes, the "blank node" syntax. That is, we can turn
OPTIONAL { ?item p:P1891 ?stat2 . ?stat2 ps:P1891 ?signatory; OPTIONAL {?stat2 pq:P585 ?date2 .} }
into<
OPTIONAL { ?item p:P1891 [ pq:P585 ?date2 ] }
which seems to be a win for clarity. Anyhow, thank you again. I don't think I'm experienced enough to tackle trying to improve the examples and help here (at a minimum I would have to read more of them and think about them), but I did want to flag the issue with the wide-open eyes of a newcomer. Jhawkinson (talk) 13:56, 18 December 2020 (UTC)[reply]
@Jhawkinson: OPTIONAL { ?item p:P1891 [ pq:P585 ?date2 ] } is fine if you only want to know the date but not who the signatory is. But it is not good if you want to know both because if you extract the signatory in another independent part of the query, you will not be able to pair the signatories and dates to each other in case that there are more than one statement with signatory (P1891). Even if that is never the case, it would be an unlogical data model and also less efficient. --Dipsacus fullonum (talk) 15:59, 18 December 2020 (UTC)[reply]

Add references to a query

Hello, given that existing query which functions perfectly, I'd like to improve it with the (max 1) reference reference URL (P854) / Wikimedia import URL (P4656) of their each annualised statement. Thanks ! Bouzinac💬✒️💛 10:47, 21 December 2020 (UTC)[reply]

@Bouzinac: That is done here. I also reformatted the query to make it more readable (otherwise I was not able to edit it) and removed the defintion of the unused variable ?where:
SELECT ?year ?item ?itemLabel (MAX(?number) AS ?passengers)
  (SAMPLE(COALESCE(?reference_URL, ?monthly_reference_URL)) AS ?sample_reference_URL)
WITH
{
  SELECT ?item ?statement ?year ?timevalue ?numberperperiod ?reference_URL
  WHERE
  {
    ?item wdt:P238 ?IATA.
    VALUES ?IATA 
    {
      "SOF" "VAR" "BOJ" "EPU"
    }
    ?item p:P3872 ?statement.
    ?statement pqv:P585 ?timevalue;
               ps:P3872 ?numberperperiod.
    ?timevalue wikibase:timeValue ?date.
    OPTIONAL { ?statement pq:P518 ?applies. }
    OPTIONAL { ?statement prov:wasDerivedFrom / (pr:P854|pr:P4656) ?reference_URL. }
    FILTER (BOUND(?applies)=false || ?applies = wd:Q2165236 )
    MINUS { ?statement wikibase:rank wikibase:DeprecatedRank }
    BIND (YEAR(?date) AS ?year)
    FILTER (?year > 2010)
    FILTER (?year < YEAR(NOW()))
  }
} AS %airport
WHERE
{
  {
    SELECT ?item ?year (SUM(?numberperperiod) AS ?number) (SAMPLE(?reference_URL) AS ?monthly_reference_URL)
    WHERE
    {
      INCLUDE %airport
      ?timevalue wikibase:timePrecision ?prec.
      FILTER (?prec > 9)
    }
    GROUP BY ?item ?year
  }
  UNION
  {
    ?timevalue wikibase:timePrecision 9 .
    BIND (?numberperperiod AS ?number)
    BIND (?reference_URL AS ?sample_reference_URL)
    INCLUDE %airport
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". }
}
GROUP BY ?item ?itemLabel ?year
ORDER BY ?item DESC (?year)
Try it!
--Dipsacus fullonum (talk) 12:51, 21 December 2020 (UTC)[reply]
Thank you, works perfectly like there. Thanks again ! Bouzinac💬✒️💛 13:36, 21 December 2020 (UTC)[reply]
Hello again [[User:|]] ([[User talk:|talk]] • contribslogs), there seems to be a pb if there is 2 references for same month (looks it sums double) eg : SVO
SELECT ?year ?item ?itemLabel (MAX(?number) AS ?passengers)
  (SAMPLE(COALESCE(?reference_URL, ?monthly_reference_URL)) AS ?sample_reference_URL)
WITH
{
  SELECT ?item ?statement ?year ?timevalue ?numberperperiod ?reference_URL
  WHERE
  {
    ?item wdt:P238 ?IATA
    VALUES ?IATA 
    {
 "SVO"
    }
    ?item p:P3872 ?statement.
    ?statement pqv:P585 ?timevalue;
               ps:P3872 ?numberperperiod.
    ?timevalue wikibase:timeValue ?date.
    OPTIONAL { ?statement pq:P518 ?applies. }
    OPTIONAL { ?statement prov:wasDerivedFrom / (pr:P854|pr:P4656) ?reference_URL. }
    FILTER (BOUND(?applies)=false || ?applies = wd:Q2165236 )
    MINUS { ?statement wikibase:rank wikibase:DeprecatedRank }
    BIND (YEAR(?date) AS ?year)
    FILTER (?year >2016).
    FILTER (?year < YEAR(NOW()))
  }
} AS %airport
WHERE
{
  {
    SELECT ?item ?year (SUM(?numberperperiod) AS ?number) (SAMPLE(?reference_URL) AS ?monthly_reference_URL)
    WHERE
    {
      INCLUDE %airport
      ?timevalue wikibase:timePrecision ?prec.
      FILTER (?prec > 9)
    }
    GROUP BY ?item ?year
  }
  UNION
  {
    ?timevalue wikibase:timePrecision 9 .
    BIND (?numberperperiod AS ?number)
    BIND (?reference_URL AS ?sample_reference_URL)
    INCLUDE %airport
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". }
}
GROUP BY ?item ?itemLabel ?year
ORDER BY ?item DESC (?year)
Try it!
Thank you Bouzinac💬✒️💛 15:12, 21 December 2020 (UTC)[reply]
@Bouzinac: Yes, you are right. I think it is necessary with another inner subquery to select a single sample reference URL inside the subquery to add monthly values. So I hope this is better:
SELECT ?year ?item ?itemLabel (MAX(?number) AS ?passengers)
  (SAMPLE(COALESCE(?reference_URL, ?monthly_reference_URL2)) AS ?sample_reference_URL)
WITH
{
  SELECT ?item ?statement ?year ?timevalue ?numberperperiod ?reference_URL
  WHERE
  {
    ?item wdt:P238 ?IATA.
    VALUES ?IATA 
    {
      "SOF" "VAR" "BOJ" "EPU"
    }
    ?item p:P3872 ?statement.
    ?statement pqv:P585 ?timevalue;
               ps:P3872 ?numberperperiod.
    ?timevalue wikibase:timeValue ?date.
    OPTIONAL { ?statement pq:P518 ?applies. }
    OPTIONAL { ?statement prov:wasDerivedFrom / (pr:P854|pr:P4656) ?reference_URL. }
    FILTER (BOUND(?applies)=false || ?applies = wd:Q2165236 )
    MINUS { ?statement wikibase:rank wikibase:DeprecatedRank }
    BIND (YEAR(?date) AS ?year)
    FILTER (?year > 2010)
    FILTER (?year < YEAR(NOW()))
  }
} AS %airport
WHERE
{
  {
    # Get the sum of monthly values within a year
    SELECT ?item ?year (SUM(?numberperperiod) AS ?number) (SAMPLE(?monthly_reference_URL) AS ?monthly_reference_URL2)
    WHERE
    {
      # Get a sample reference URL for each monthly value
      {
        SELECT ?item ?year ?numberperperiod (SAMPLE(?reference_URL) AS ?monthly_reference_URL)
        WHERE
        {
          INCLUDE %airport
          ?timevalue wikibase:timePrecision ?prec.
          FILTER (?prec > 9)
        }
        GROUP BY ?item ?statement ?year ?numberperperiod
        # Include ?statement in the GROUP BY because ?numberperperiod may not be unique
      }
    }
    GROUP BY ?item ?year
  }
  UNION
  {
    ?timevalue wikibase:timePrecision 9 .
    BIND (?numberperperiod AS ?number)
    BIND (?reference_URL AS ?sample_reference_URL)
    INCLUDE %airport
  }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "fr". }
}
GROUP BY ?item ?itemLabel ?year
ORDER BY ?item DESC (?year)
Try it!
--Dipsacus fullonum (talk) 00:45, 22 December 2020 (UTC)[reply]
Wonderful! Thank you again dude Bouzinac💬✒️💛 10:51, 22 December 2020 (UTC)[reply]

What are the most frequently used external identifiers?

and how to find out? --Loominade (talk) 14:40, 21 December 2020 (UTC)[reply]

Top 5 is:
  1. PubMed ID (P698) (60 million)
  2. DOI (P356) (27 million)
  3. PMCID (P932) (11 million)
  4. SIMBAD ID (P3083) (8 million)
  5. VIAF ID (P214) (6 million)
You find out by looking at Wikidata:Database reports/List of properties/all. There are too many external identifiers and statements to make the list with SPARQL query without timeout. --Dipsacus fullonum (talk) 01:58, 22 December 2020 (UTC)[reply]

I'd like a list of winners of Wollaston Medal (Q832424) that includes and is sorted by point in time (P585). I've searched the examples for one that I could, but they are all too complex and fail when I edit them. Thanks for any help. Trilotat (talk) 17:51, 21 December 2020 (UTC)[reply]

@Trilotat: Hi, try this:
SELECT ?recipient ?recipientLabel (YEAR(?time) AS ?year)
WHERE
{
  ?recipient p:P166 ?statement .
  ?statement ps:P166 wd:Q832424 . # Wollaston Medal
  OPTIONAL { ?statement pq:P585 ?time . }
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en" . }
}
ORDER BY ?time
Try it!
--Dipsacus fullonum (talk) 00:59, 22 December 2020 (UTC)[reply]

Steer on combining subquery results

The aim of this query is to find candidate town/parish items that match town council items based on a string match. First subquery finds the councils and extracts the town name from the label as ?town_name. Second subquery finds town and parish items, extracting their labels into ?town_name. The grand idea is that the query delivers rows linking council items to town/parish items based on a ?town_name match, but instead, horrow. I've tried a variety of other approaches, but timeout. Clearly I don't know how to combine two disparate subquery sets in a useful fashion.

SELECT ?item ?itemLabel ?town ?town_name 
WITH { 
  SELECT ?item ?itemLabel ?town ?town_name WHERE {
    ?item wdt:P31 wd:Q3507268.
    ?item rdfs:label ?itemLabel . filter(lang(?itemLabel)="en")
    ?item wdt:P17 wd:Q145  .
    FILTER NOT EXISTS { ?item wdt:P1001 []. }
    BIND(REPLACE( ?itemLabel, " Town Council","") as ?town_name)
  } } AS %councils
WITH {
  SELECT DISTINCT ?town ?town_name WHERE {
    ?town wdt:P17 wd:Q145 .
    {?town wdt:P31 wd:Q1115575.}
    UNION
    {?town wdt:P31 wd:Q3957 .}
    ?town rdfs:label ?town_name . filter(lang(?town_name)="en")
  } } AS %towns
WHERE {
   {INCLUDE %councils} 
   UNION 
   {INCLUDE %towns}
}
Try it!
Hi. The idea is good and it almost works. When combining patterns or subqueries based on matching variables you don't use the UNION keyword, as that means that you want the results from each part without matching the variables (as you already do in the %towns subquery). Just remove the UNION keyword in the main query, and it will work. --Dipsacus fullonum (talk) 09:30, 22 December 2020 (UTC)[reply]
lol. There's no hope for me :) ... thank you, Dipsacus fullonum --Tagishsimon (talk) 09:50, 22 December 2020 (UTC)[reply]

version, edition, or translation

Hello!
I would like a list/query of all instance of (P31)=version, edition or translation (Q3331189) that have the statement: edition or translation of (P629)=X. And X has the statement instance of (P31)=Christian hymn (Q856713). --Geogast (talk) 14:09, 22 December 2020 (UTC)[reply]

@Geogast: These, concisely:
SELECT ?item ?itemLabel 
WHERE 
{
  ?item wdt:P31 wd:Q3331189;
        wdt:P629/wdt:P31 wd:Q856713 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
More sensibly:
SELECT ?item ?itemLabel ?x ?xLabel
WHERE 
{
  ?item wdt:P31 wd:Q3331189;
        wdt:P629 ?x . 
  ?x wdt:P31 wd:Q856713 .
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
}
Try it!
--Tagishsimon (talk) 14:18, 22 December 2020 (UTC)[reply]
Now, that was quick! Exactly what I needed. Thanks a lot!--Geogast (talk) 14:26, 22 December 2020 (UTC)[reply]

List of countries

Sorry for posting such a basic question, but my attempt at getting all countries of the world miserably fails:

SELECT ?item
WHERE 
{
  ?item wdt:P31 wd:Q6256. # item is a country
}
Try it!

The result strangely does not contain Albania (Q222), which is an instance of (P31) of country (Q6256). What am I doing wrong? Thanks! Syced (talk) 15:36, 22 December 2020 (UTC)[reply]

@Syced: wdt: will only find truthy values; for Albania, Sovereign State is marked as the preferred value, so it truthy. Country is marked as normal preference and so is not truthy. Use p:/ps: to see all values. Arcane documentation here. And although the tutorial doesn't explicitly deal with truthiness, this section - statements-with-qualifiers - describes p:/ps: business well enough.
SELECT ?item ?itemLabel
WHERE 
{
  ?item p:P31 [ps:P31 wd:Q6256] . # item is a country
  SERVICE wikibase:label { bd:serviceParam wikibase:language "[AUTO_LANGUAGE],en". }
} order by ?itemLabel
Try it!
--Tagishsimon (talk) 15:44, 22 December 2020 (UTC)[reply]

Ah, I had not noticed that! Understood, thanks! :-) Syced (talk) 23:20, 22 December 2020 (UTC)[reply]

Timeout in a query about Wikidata pages

I am trying to write a query to get the list of all Wikipedias that have an article about a given entity, with the respective titles.

This seems to work:

SELECT DISTINCT ?lang ?name WHERE {
  ?article schema:about wd:QXXXXXX;
              schema:inLanguage ?lang ;
              schema:name ?name ;
              schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
  FILTER (!CONTAINS(?name, ':')) .
}
Try it!

The problem is that it times out for some entities, for example Q15 and Q16. For most entities, it runs in less than a second, even those with relatively long lists, for example Q2, and I don't know how I could improve it. Tcp-ip (talk) 21:17, 22 December 2020 (UTC)[reply]

@Tcp-ip: Near as I can work out, the optimiser is doing ill-advised stuff with the schema:isPartOf element. This solves it, at least for Q15. Come back if you find more timeouts. See also Wikidata:SPARQL query service/query optimization fwiw.
SELECT DISTINCT ?lang ?name WHERE {
  hint:Query hint:optimizer "None".
  ?article schema:about wd:Q15 .
  ?article schema:name ?name .
  ?article schema:inLanguage ?lang .
  ?article schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
  FILTER (!CONTAINS(?name, ':')) .
}
Try it!
--Tagishsimon (talk) 21:43, 22 December 2020 (UTC)[reply]
@Tcp-ip: Yes, when _:blanknode wikibase:wikiGroup "wikipedia" has fewer results than ?article schema:about wd:QXXX it will try to resolve the former pattern first. You can force another order of execution in several ways. My solution which was made before I saw Tagishsimon's is:
SELECT DISTINCT ?lang ?name
WITH
{
  SELECT ?article WHERE { ?article schema:about wd:Q15 . }
} AS %1
WHERE
{
  INCLUDE %1
  ?article schema:inLanguage ?lang ;
              schema:name ?name ;
              schema:isPartOf [ wikibase:wikiGroup "wikipedia" ] .
  FILTER (!CONTAINS(?name, ':')) .
}
Try it!
--Dipsacus fullonum (talk) 22:51, 22 December 2020 (UTC)[reply]
And for those following along at home, that's a named subquery approach, in which WDQS is compelled to execute the first (named AS %1) subquery first, before delivering its results (INCLUDE %1) to constrain the action of the second query. --Tagishsimon (talk) 23:00, 22 December 2020 (UTC)[reply]