This repository was archived by the owner on Jan 12, 2026. It is now read-only.

Commit 2180af3: v1.0.0 release

Parent: 5051819

15 files changed: 285 additions & 206 deletions

.lintr

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,5 +1,5 @@
 linters: with_defaults(line_length_linter(120), object_usage_linter = NULL, closed_curly_linter = NULL, open_curly_linter = NULL, spaces_left_parentheses_linter = NULL, camel_case_linter = NULL)
-exclusions: list()
+exclusions: list("R/zzz.R", "R/WikidataQueryServiceR-package.R")
 exclude: "# Exclude Linting"
 exclude_start: "# Begin Exclude Linting"
 exclude_end: "# End Exclude Linting"
```

DESCRIPTION

Lines changed: 10 additions & 6 deletions
```diff
@@ -1,7 +1,7 @@
 Package: WikidataQueryServiceR
 Title: API Client Library for 'Wikidata Query Service'
 Version: 1.0.0
-Date: 2017-08-05
+Date: 2020-06-16
 Authors@R: c(
     person("Mikhail", "Popov", email = "mikhail@wikimedia.org",
            role = c("aut", "cre"), comment = "@bearloga on Twitter"),
@@ -13,16 +13,20 @@ Depends:
     R (>= 3.1.2)
 Imports:
     httr (>= 1.2.1),
-    dplyr (>= 0.5.0),
+    dplyr (>= 1.0.0),
     jsonlite (>= 1.2),
-    WikipediR (>= 1.5.0)
+    WikipediR (>= 1.5.0),
+    ratelimitr (>= 0.4.1),
+    purrr (>= 0.3.4),
+    readr (>= 1.3.1),
+    rex (>= 1.2.0)
 Suggests:
-    testthat,
-    lintr
+    testthat (>= 2.3.0),
+    lintr (>= 2.0.1)
 URL: https://github.com/bearloga/WikidataQueryServiceR
 BugReports: https://github.com/bearloga/WikidataQueryServiceR/issues
 License: MIT + file LICENSE
 Encoding: UTF-8
 LazyData: true
 Roxygen: list(markdown = TRUE)
-RoxygenNote: 6.0.1
+RoxygenNote: 7.1.0
```

NAMESPACE

Lines changed: 1 addition & 0 deletions
```diff
@@ -3,3 +3,4 @@
 export(get_example)
 export(query_wikidata)
 export(scrape_example)
+import(ratelimitr)
```

NEWS.md

Lines changed: 9 additions & 4 deletions
```diff
@@ -1,5 +1,11 @@
-WikidataQueryServiceR 0.1.1
----------------------------
+# WikidataQueryServiceR 1.0.0
+
+* Fixed example retrieval (was broken due to translation wikitext markers)
+* Rate-limiting ([#11](https://github.com/bearloga/WikidataQueryServiceR/issues/11))
+* Using tidyverse family of packages (tibble, dplyr, purrr, readr)
+* Various improvements and modernizations
+
+# WikidataQueryServiceR 0.1.1
 
 ## Changes
 
@@ -11,8 +17,7 @@ WikidataQueryServiceR 0.1.1
 
 * Fixed a bug with JSON-formatted results ([#3](https://github.com/bearloga/WikidataQueryServiceR/issues/3))
 
-WikidataQueryServiceR 0.1.0
----------------------------
+# WikidataQueryServiceR 0.1.0
 
 * Initial CRAN release:
   - Support for multiple SPARQL queries
```
R/WikidataQueryServiceR-package.R

Lines changed: 10 additions & 10 deletions

```diff
@@ -1,11 +1,8 @@
-#' @title WikidataQueryServiceR: An R Wrapper For Wikidata Query Service API
-#' @description This is an R wrapper for the
-#' [Wikidata Query Service](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service)
-#' (WDQS) which provides a way for tools to query Wikidata via
-#' [SPARQL](https://en.wikipedia.org/wiki/SPARQL).
+#' @keywords internal
+#' @aliases WDQS
 #' @details [Wikidata Query Service](https://www.mediawiki.org/wiki/Wikidata_query_service)
-#' is maintained by [Wikimedia Foundation](https://wikimediafoundation.org/).
-#' @references
+#' is maintained by the [Wikimedia Foundation](https://wikimediafoundation.org/).
+#' @section Resources:
 #' - [A beginner-friendly course for SPARQL](https://www.wikidata.org/wiki/Wikidata:A_beginner-friendly_course_for_SPARQL)
 #' - Building a SPARQL query: [Museums on Instagram](https://www.wikidata.org/wiki/Help:SPARQL/Building_a_query/Museums_on_Instagram)
 #' - [SPARQL Query Examples](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples) for WDQS
@@ -20,7 +17,10 @@
 #' - [WDQS User Manual](https://www.mediawiki.org/wiki/Wikidata_query_service/User_Manual)
 #' - [Quick intro to WDQS & SPARQL](https://github.com/bearloga/wmf/blob/master/presentations/talks/Cascadia\%20R\%20Conference\%202017/presentation.md#wikidata-query-service-wdqs)
 #' from [my Cascadia R Conference 2017 talk](https://github.com/bearloga/wmf/tree/master/presentations/talks/Cascadia\%20R\%20Conference\%202017)
-#' @aliases WDQS
-#' @docType package
-#' @name WDQSR-package
+"_PACKAGE"
+
+# The following block is used by usethis to automatically manage
+# roxygen namespace tags. Modify with care!
+## usethis namespace: start
+## usethis namespace: end
 NULL
```

R/http.R

Lines changed: 12 additions & 0 deletions
```diff
@@ -0,0 +1,12 @@
+#' @import ratelimitr
+wdqs_requester <- function() {
+  req <- function(query, ...) {
+    httr::POST(
+      url = "https://query.wikidata.org/sparql",
+      query = list(query = query),
+      httr::user_agent("https://github.com/bearloga/WikidataQueryServiceR"),
+      ...
+    )
+  }
+  return(limit_rate(req, rate(n = 30, period = 60)))
+}
```

R/query.R

Lines changed: 43 additions & 46 deletions
```diff
@@ -4,85 +4,82 @@
 #' @param format "simple" uses CSV and returns pure character data frame, while
 #'   "smart" fetches JSON-formatted data and returns a data frame with datetime
 #'   columns converted to `POSIXct`
-#' @param ... Additional parameters to supply to [httr::POST]
-#' @return A `data.frame`
+#' @return A tibble data frame
 #' @examples
-#' # R's versions and release dates:
-#' sparql_query <- 'SELECT DISTINCT
+#' sparql_query <- "SELECT
 #'   ?softwareVersion ?publicationDate
 #' WHERE {
 #'   BIND(wd:Q206904 AS ?R)
 #'   ?R p:P348 [
 #'     ps:P348 ?softwareVersion;
 #'     pq:P577 ?publicationDate
 #'   ] .
-#' }'
+#' }"
 #' query_wikidata(sparql_query)
 #'
 #' \dontrun{
-#' # "smart" format converts all datetime columns to POSIXct
 #' query_wikidata(sparql_query, format = "smart")
 #' }
+#' @section Query limits:
+#' There is a hard query deadline configured which is set to 60 seconds. There
+#' are also the following limits:
+#' - One client (user agent + IP) is allowed 60 seconds of processing time each
+#'   60 seconds
+#' - One client is allowed 30 error queries per minute
+#' See the [query limits section](https://www.mediawiki.org/wiki/Wikidata_Query_Service/User_Manual#Query_limits)
+#' of the WDQS user manual for more information.
 #' @seealso [get_example]
 #' @export
-query_wikidata <- function(sparql_query, format = c("simple", "smart"), ...) {
-  if (!format[1] %in% c("simple", "smart")) {
+query_wikidata <- function(sparql_query, format = c("simple", "smart")) {
+  format <- format[1]
+  if (!format %in% c("simple", "smart")) {
     stop("`format` must be either \"simple\" or \"smart\"")
   }
   output <- lapply(sparql_query, function(sparql_query) {
-    if (format[1] == "simple") {
-      response <- httr::POST(
-        url = "https://query.wikidata.org/sparql",
-        query = list(query = sparql_query),
-        httr::add_headers(Accept = "text/csv"),
-        httr::user_agent("https://github.com/bearloga/WikidataQueryServiceR"),
-        ...
-      )
+    rate_limited_query <- wdqs_requester()
+    if (format == "simple") {
+      response <- rate_limited_query(sparql_query, httr::add_headers(Accept = "text/csv"))
       httr::stop_for_status(response)
       if (httr::http_type(response) == "text/csv") {
-        con <- textConnection(httr::content(response, as = "text", encoding = "UTF-8"))
-        df <- utils::read.csv(con, header = TRUE, stringsAsFactors = FALSE)
-        message(nrow(df), " rows were returned by WDQS")
-        return(df)
+        content <- httr::content(response, as = "text", encoding = "UTF-8")
+        return(readr::read_csv(content))
       } else {
         stop("returned response is not formatted as a CSV")
       }
     } else {
-      response <- httr::GET(
-        url = "https://query.wikidata.org/sparql",
-        query = list(query = sparql_query),
-        format = "json",
-        httr::user_agent("https://github.com/bearloga/WikidataQueryServiceR"),
-        ...
-      )
+      response <- rate_limited_query(sparql_query, httr::add_headers(Accept = "application/sparql-results+json"))
       httr::stop_for_status(response)
       if (httr::http_type(response) == "application/sparql-results+json") {
-        temp <- jsonlite::fromJSON(httr::content(response, as = "text", encoding = "UTF-8"), simplifyVector = FALSE)
+        content <- httr::content(response, as = "text", encoding = "UTF-8")
+        temp <- jsonlite::fromJSON(content, simplifyVector = FALSE)
       }
       if (length(temp$results$bindings) > 0) {
-        df <- as.data.frame(dplyr::bind_rows(lapply(temp$results$bindings, function(x) {
-          return(lapply(x, function(y) { return(y$value) }))
-        })))
-        datetime_cols <- vapply(temp$results$bindings[[1]], function(x) {
-          if ("datatype" %in% names(x)) {
-            return(x$datatype == "http://www.w3.org/2001/XMLSchema#dateTime")
+        data_frame <- purrr::map_dfr(temp$results$bindings, function(binding) {
+          return(purrr::map_chr(binding, ~ .x$value))
+        })
+        datetime_columns <- purrr::map_lgl(temp$results$bindings[[1]], function(binding) {
+          if ("datatype" %in% names(binding)) {
+            return(binding[["datatype"]] == "http://www.w3.org/2001/XMLSchema#dateTime")
           } else {
             return(FALSE)
           }
-        }, FALSE)
-        if (any(datetime_cols)) {
-          for (datetime_col in which(datetime_cols)) {
-            df[[datetime_col]] <- as.POSIXct(df[[datetime_col]], format = "%Y-%m-%dT%H:%M:%SZ", tz = "GMT")
-          }
-        }
-        message(nrow(df), " rows were returned by WDQS")
-        return(df)
+        })
+        data_frame <- dplyr::mutate_if(
+          .tbl = data_frame,
+          .predicate = datetime_columns,
+          .funs = as.POSIXct,
+          format = "%Y-%m-%dT%H:%M:%SZ", tz = "GMT"
+        )
       } else {
-        message("0 rows were returned by WDQS")
-        return(data.frame(matrix(character(), nrow = 0, ncol = length(temp$head$vars),
-                                 dimnames = list(c(), unlist(temp$head$vars))),
-                          stringsAsFactors = FALSE))
+        data_frame <- dplyr::as_tibble(
+          matrix(
+            character(),
+            nrow = 0, ncol = length(temp$head$vars),
+            dimnames = list(c(), unlist(temp$head$vars))
+          )
+        )
       }
+      return(data_frame)
     }
   })
   if (length(output) == 1) {
```
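The "smart" path maps each JSON binding to a row of `value` strings and then converts any column whose first binding carries the `xsd:dateTime` datatype to `POSIXct`. The same transformation can be sketched in base R without purrr or dplyr; the mock `bindings` list below imitates the shape of `temp$results$bindings` from a SPARQL JSON response and is hypothetical data, not output from WDQS:

```r
# Mock of a parsed SPARQL JSON result's bindings (hypothetical data):
bindings <- list(
  list(item = list(type = "uri", value = "http://www.wikidata.org/entity/Q2"),
       date = list(datatype = "http://www.w3.org/2001/XMLSchema#dateTime",
                   type = "literal", value = "2001-01-15T00:00:00Z")),
  list(item = list(type = "uri", value = "http://www.wikidata.org/entity/Q5"),
       date = list(datatype = "http://www.w3.org/2001/XMLSchema#dateTime",
                   type = "literal", value = "2002-02-20T00:00:00Z"))
)

# Row-bind the "value" fields (base-R analogue of map_dfr + map_chr):
rows <- lapply(bindings, function(b) vapply(b, function(x) x$value, ""))
df <- as.data.frame(do.call(rbind, rows), stringsAsFactors = FALSE)

# Detect datetime columns from the first binding's datatype
# (base-R analogue of map_lgl):
is_datetime <- vapply(bindings[[1]], function(x) {
  identical(x[["datatype"]], "http://www.w3.org/2001/XMLSchema#dateTime")
}, logical(1))

# Convert those columns, as mutate_if does in the package:
df[is_datetime] <- lapply(df[is_datetime], as.POSIXct,
                          format = "%Y-%m-%dT%H:%M:%SZ", tz = "GMT")
```

Note the package inspects only the first binding's datatypes, which works because every row of a SPARQL result column shares one datatype.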

R/utils.R

Lines changed: 15 additions & 8 deletions
```diff
@@ -26,13 +26,20 @@ get_example <- function(example_name) {
     page_name = "Wikidata:SPARQL query service/queries/examples",
     as_wikitext = TRUE
   )
-  wiki <- strsplit(content$parse$wikitext$`*`, "\n")[[1]]
-  wiki <- wiki[wiki != ""]
-  return(vapply(example_name, function(example_name) {
-    heading_line <- which(grepl(paste0("^===\\s?", example_name, "\\s?===$"), wiki, fixed = FALSE))
-    start_line <- which(grepl("{{SPARQL", wiki[(heading_line + 1):length(wiki)], fixed = TRUE))[1]
-    end_line <- which(grepl("}}", wiki[(heading_line + start_line + 1):length(wiki)], fixed = TRUE))[1]
-    query <- paste0(wiki[(heading_line + start_line):(heading_line + start_line + end_line - 1)], collapse = "\n")
+  wikitext <- strsplit(content$parse$wikitext$`*`, "\n")[[1]]
+  wikitext <- wikitext[wikitext != ""]
+  examples <- purrr::map(example_name, function(example_name) {
+    regex <- paste0(
+      "^={2,}\\s?(<translate><!--T:[0-9]+-->)?\\s?",
+      rex::escape(example_name),
+      "\\s?(</translate>)?\\s?={2,}$"
+    )
+    heading_line <- which(grepl(regex, wikitext, fixed = FALSE))
+    start_line <- which(grepl("{{SPARQL", wikitext[(heading_line + 1):length(wikitext)], fixed = TRUE))[1]
+    end_line <- which(grepl("}}", wikitext[(heading_line + start_line + 1):length(wikitext)], fixed = TRUE))[1]
+    query <- paste0(wikitext[(heading_line + start_line):(heading_line + start_line + end_line - 1)], collapse = "\n")
     return(sub("^\\s*\\{\\{SPARQL2?\\n?\\|query\\=", "", query))
-  }, ""))
+  })
+  names(examples) <- example_name
+  return(examples)
 }
```
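The widened heading regex is the heart of the fix announced in NEWS.md: the examples page now wraps section titles in `<translate><!--T:NNN-->` markers, which the old `^===\s?name\s?===$` pattern could not match. A quick standalone check of the new pattern against both heading forms (using a literal example name here in place of `rex::escape`, and a made-up translation marker number):

```r
# Heading regex from get_example(), with "Cats" substituted for
# rex::escape(example_name):
regex <- paste0(
  "^={2,}\\s?(<translate><!--T:[0-9]+-->)?\\s?",
  "Cats",
  "\\s?(</translate>)?\\s?={2,}$"
)

# Matches a plain wikitext heading:
grepl(regex, "== Cats ==")                                        # TRUE
# Also matches a heading wrapped in translation markers:
grepl(regex, "== <translate><!--T:123--> Cats </translate> ==")   # TRUE
# But not some other heading:
grepl(regex, "== Dogs ==")                                        # FALSE
```

Making the `<translate>` wrapper optional lets one pattern serve both translated and untranslated revisions of the examples page.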

R/zzz.R

Lines changed: 3 additions & 0 deletions
```diff
@@ -0,0 +1,3 @@
+.onAttach <- function(libname, pkgname) {
+  packageStartupMessage("See ?WDQS for resources on Wikidata Query Service and SPARQL")
+}
```

README.Rmd

Lines changed: 9 additions & 8 deletions
````diff
@@ -17,7 +17,7 @@ library(printr)
 [![CRAN Total Downloads](https://cranlogs.r-pkg.org/badges/grand-total/WikidataQueryServiceR)](https://cran.r-project.org/package=WikidataQueryServiceR)
 [![License: MIT](https://img.shields.io/badge/license-MIT-blue.svg)](https://opensource.org/licenses/MIT)
 
-This is an R wrapper for the [Wikidata Query Service (WDQS)](https://www.mediawiki.org/wiki/Wikidata_query_service) which provides a way for tools to query [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) via [SPARQL](https://en.wikipedia.org/wiki/SPARQL) (see the beta at https://query.wikidata.org/). It is written in and for R, and was inspired by Oliver Keyes' [WikipediR](https://github.com/Ironholds/WikipediR) and [WikidataR](https://github.com/Ironholds/WikidataR) packages.
+This is an R wrapper for the [Wikidata Query Service (WDQS)](https://www.mediawiki.org/wiki/Wikidata_query_service) which provides a way for tools to query [Wikidata](https://www.wikidata.org/wiki/Wikidata:Main_Page) via [SPARQL](https://en.wikipedia.org/wiki/SPARQL) (see the beta at https://query.wikidata.org/). It is written in and for R, and was inspired by Os Keyes' [WikipediR](https://github.com/Ironholds/WikipediR) and [WikidataR](https://github.com/Ironholds/WikidataR) packages.
 
 __Author:__ Mikhail Popov (Wikimedia Foundation)<br/>
 __License:__ [MIT](http://opensource.org/licenses/MIT)<br/>
@@ -32,8 +32,8 @@ install.packages("WikidataQueryServiceR")
 To install the development version:
 
 ```R
-# install.packages(c("devtools", "httr", "dplyr", "jsonlite"))
-devtools::install_github("bearloga/WikidataQueryServiceR")
+# install.packages("remotes")
+remotes::install_github("bearloga/WikidataQueryServiceR")
 ```
 
 ## Usage
@@ -68,20 +68,21 @@ For more example SPARQL queries, see [this page](https://www.wikidata.org/wiki/W
 The package provides a [WikipediR](https://github.com/Ironholds/WikipediR/)-based function for getting SPARQL queries from the [WDQS examples page](https://www.wikidata.org/wiki/Wikidata:SPARQL_query_service/queries/examples).
 
 ```{r get_examples, cache=TRUE}
-sparql_query <- get_example(c("Cats", "Horses", "Largest cities with female mayor"))
+sparql_query <- get_example(c("Cats", "How many states this US state borders"))
 ```
 ```{r, eval=FALSE}
-sparql_query[["Largest cities with female mayor"]]
+sparql_query[["How many states this US state borders"]]
 ```
 ```{r, echo=FALSE, results='asis'}
-cat("```SPARQL\n", sparql_query[["Largest cities with female mayor"]], "\n```")
+cat("```SPARQL\n", sparql_query[["How many states this US state borders"]], "\n```")
 ```
 
-Now we can run all three extracted SPARQL queries and get back three data.frames:
+Now we can run all extracted SPARQL queries:
 
 ```{r run_examples, cache=TRUE, dependson='get_examples'}
 results <- query_wikidata(sparql_query)
-results$`Largest cities with female mayor`[, c("cityLabel", "mayorLabel")]
+lapply(results, dim)
+head(results$`How many states this US state borders`)
 ```
 
 ## Links for learning SPARQL
````
