Installation
rdb is not on CRAN yet, but you can install the package directly
from the development source code repository’s master
branch, which we try to keep in a working state at all times.
To install the latest development version of rdb, run the following in R:
if (!("remotes" %in% rownames(installed.packages()))) {
  install.packages(pkgs = "remotes",
                   repos = "https://cloud.r-project.org/")
}
remotes::install_gitlab(repo = "zdaarau/rpkgs/rdb")
Usage
Get data
You can download RDB referendum data via the two functions rdb::rfrnd() and rdb::rfrnds(). The former fetches the data of a single referendum only, whose unique RDB id you must already know. The latter allows you to retrieve data for an arbitrary number of referendums, depending on the conditions you specify via the function’s various arguments.
To simply retrieve all referendums in the database (excluding draft entries), run
rdb::rfrnds()
which should output a tibble like this one:
#> # A tibble: 17,766 × 69
#> id id_official id_sudd country_code country_name subnational_entity_n…¹ municipality level date is_former_country title_en title_de title_fr
#> <chr> <chr> <chr> <fct> <fct> <chr> <chr> <ord> <date> <lgl> <chr> <chr> <chr>
#> 1 65096e84481d… NA pl0120… PL Poland NA NA nati… 2023-10-15 NA Privati… Privati… NA
#> 2 65096dc1481d… NA pl0220… PL Poland NA NA nati… 2023-10-15 NA raising… Anhebun… NA
#> 3 65096535481d… NA pl0320… PL Poland NA NA nati… 2023-10-15 NA Removal… Abbau d… NA
#> 4 6509625d481d… NA pl0420… PL Poland NA NA nati… 2023-10-15 NA Europea… Beschlu… NA
#> 5 650034e4481d… NA au0120… AU Australia NA NA nati… 2023-10-14 NA Establi… Stimme … NA
#> 6 64e46f7a481d… NA ec0920… EC Ecuador NA NA nati… 2023-08-20 NA No crud… Keine R… NA
#> 7 64e46c8f481d… NA fm0820… FM Micronesia NA NA nati… 2023-07-04 NA Indepen… Unabhän… NA
#> 8 64c8b3ca0b8b… NA fm0720… FM Micronesia NA NA nati… 2023-07-04 NA Providi… NA NA
#> 9 64c8b19c0b8b… NA fm0620… FM Micronesia NA NA nati… 2023-07-04 NA Creatin… Angleic… NA
#> 10 64c8a28d0b8b… NA fm0520… FM Micronesia NA NA nati… 2023-07-04 NA Alterin… Verände… NA
#> # ℹ 17,756 more rows
#> # ℹ abbreviated name: ¹subnational_entity_name
#> # ℹ 56 more variables: question <chr>, question_en <chr>, committee_name <chr>, result <fct>, subterritories_yes <dbl>, subterritories_no <dbl>,
#> # electorate_total <int>, electorate_abroad <int>, votes_yes <int>, votes_no <int>, votes_empty <int>, votes_invalid <int>, votes_per_subterritory <list>,
#> # lower_house_yes <int>, lower_house_no <int>, lower_house_abstentions <int>, upper_house_yes <int>, upper_house_no <int>, upper_house_abstentions <int>,
#> # position_government <fct>, topics_tier_1 <list>, topics_tier_2 <list>, topics_tier_3 <list>, remarks <chr>, files <list>, url_sudd <chr>,
#> # url_swissvotes <chr>, sources <chr>, is_draft <lgl>, date_time_created <dttm>, date_time_last_edited <dttm>, type <fct>, inst_legal_basis_type <ord>, …
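Besides the conditions you can pass to rdb::rfrnds() itself, the returned tibble can be narrowed down further with ordinary dplyr verbs. The following is a minimal sketch that runs without a database connection: rfrnds_sample is a hand-built stand-in holding three rows and four of the columns from the output above, and the ordering of the level factor (local < subnational < national) is an assumption made for illustration.

```r
# Stand-in for the result of rdb::rfrnds(); with real data, start from
# rdb::rfrnds() instead. Values are copied from the example output above.
rfrnds_sample <- tibble::tibble(
  id = c("65096e84481d20233932cc70",
         "650034e4481d20233932cc39",
         "64e46f7a481d20233932cc0b"),
  country_code = factor(c("PL", "AU", "EC")),
  # assumed level ordering, for illustration only
  level = factor(c("national", "national", "national"),
                 levels = c("local", "subnational", "national"),
                 ordered = TRUE),
  date = as.Date(c("2023-10-15", "2023-10-14", "2023-08-20"))
)

# Keep only national referendums held on or after 1 October 2023
recent_national <- rfrnds_sample |>
  dplyr::filter(level == "national" & date >= as.Date("2023-10-01"))

recent_national
```

Filtering after the download is convenient for exploration. For large or repeated queries, restricting the request via the arguments of rdb::rfrnds() is likely the better choice, since less data has to be transferred.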
The RDB referendum data’s individual variables (columns) are documented in the codebook. It is also available as a dataset via rdb::data_codebook.
Results of rdb::rfrnds() and some other functions in this package are by default cached on disk using pkgpins. You can define the maximum age of cached results you’re willing to tolerate via the argument max_cache_age (defaults to one week). It accepts anything that can be successfully converted to a lubridate duration, e.g. a string like "3 hours", "2 days" or "1 week", or a number, which is interpreted as a number of seconds.
For example, to re-download RDB data at most once every 4 hours and 48 minutes, use
rdb::rfrnds(max_cache_age = "4 hours 48 minutes")
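To check beforehand how a given max_cache_age value would be understood, you can convert it to a lubridate duration yourself. This is only a sketch: it assumes that the package's internal conversion behaves like lubridate::duration(), which is not confirmed here.

```r
# Strings are parsed into durations
lubridate::duration("3 hours")
lubridate::duration("4 hours 48 minutes")

# A bare number is taken as a number of seconds
lubridate::duration(3600)

# Express a maximum cache age in seconds to compare different settings
max_age_secs <- as.numeric(lubridate::duration("4 hours 48 minutes"), "seconds")
max_age_secs
```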
Although we usually advise against it, you can also opt out of caching completely by specifying use_cache = FALSE. However, please don’t run such code excessively, as it creates additional (and most likely unnecessary) load on our servers.
Augment data
rdb includes various functions to augment the referendum data with additional information that wouldn’t make sense to store in the RDB itself.
For example, you can add the period (week, month, quarter, year, decade or century) in which a referendum took place using rdb::add_period(). By default, the recurring numeric week number of the year is added (i.e. period = "week"):
rdb::rfrnds() |>
  rdb::add_period() |>
  dplyr::select(id, date, week)
#> # A tibble: 17,766 × 3
#> id date week
#> <chr> <date> <int>
#> 1 65096e84481d20233932cc70 2023-10-15 41
#> 2 65096dc1481d20233932cc6e 2023-10-15 41
#> 3 65096535481d20233932cc66 2023-10-15 41
#> 4 6509625d481d20233932cc63 2023-10-15 41
#> 5 650034e4481d20233932cc39 2023-10-14 41
#> 6 64e46f7a481d20233932cc0b 2023-08-20 33
#> 7 64e46c8f481d20233932cc09 2023-07-04 27
#> 8 64c8b3ca0b8bae0c78c7ec8a 2023-07-04 27
#> 9 64c8b19c0b8bae0c78c7ec86 2023-07-04 27
#> 10 64c8a28d0b8bae0c78c7ec82 2023-07-04 27
#> # ℹ 17,756 more rows
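The week numbers shown above are consistent with ISO 8601 week numbering. As an illustration, the sketch below derives several periods for four of the dates from the output using lubridate. Whether rdb::add_period() relies on these particular functions internally is an assumption, and the column names as well as the representation of decades are chosen for this example only.

```r
# Four ballot dates taken from the output above
ballot_dates <- as.Date(c("2023-10-15", "2023-10-14", "2023-08-20", "2023-07-04"))

periods <- tibble::tibble(
  date = ballot_dates,
  week = lubridate::isoweek(ballot_dates),    # ISO 8601 week of the year
  month = lubridate::month(ballot_dates),
  quarter = lubridate::quarter(ballot_dates),
  year = lubridate::year(ballot_dates),
  # first year of the decade, e.g. 2020 for 2023 (illustrative representation)
  decade = (lubridate::year(ballot_dates) %/% 10) * 10
)

periods
```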
Another frequently required augmentation is rdb::add_country_code_long(), which adds an additional column country_code_long containing the ISO 3166-1 alpha-3 code. These three-letter codes are often required to join RDB referendum data with data from other sources.
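Such a join could look roughly as follows. This is a sketch on stand-in data: rfrnds_sample mimics an augmented result reduced to two columns, while external, its key column iso3 and its column external_value are hypothetical placeholders for whatever other source you want to merge.

```r
# Stand-in for rdb::rfrnds() |> rdb::add_country_code_long(), reduced to two columns
rfrnds_sample <- tibble::tibble(
  id = c("65096e84481d20233932cc70",
         "650034e4481d20233932cc39",
         "64e46f7a481d20233932cc0b"),
  country_code_long = c("POL", "AUS", "ECU")
)

# Hypothetical external dataset keyed by ISO 3166-1 alpha-3 code
external <- tibble::tibble(
  iso3 = c("AUS", "POL"),
  external_value = c("value for AUS", "value for POL")
)

# A left join keeps every referendum; countries missing from `external` get NA
joined <- dplyr::left_join(rfrnds_sample,
                           external,
                           by = c("country_code_long" = "iso3"))

joined
```

If the key column is a factor in one table and a character vector in the other, converting both to character beforehand (as.character()) avoids type mismatches in the join.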
See the package reference for all available data augmentation functions.
Transform data
For certain analyses, it might come in handy to transform the referendum data to a different shape beforehand. For a few such transformations, rdb provides ready-made functions.
rdb::as_ballot_dates(), for example, transforms the default referendum-level observations to ones on the level of ballot date and jurisdiction:
rdb::rfrnds() |> nrow()
#> [1] 17766
rdb::rfrnds() |> rdb::as_ballot_dates() |> nrow()
#> [1] 5884
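The row count drops because several referendums held on the same day in the same jurisdiction collapse into a single observation. The sketch below imitates that idea with dplyr::count() on the ten most recent national referendums from the example output further above. It is a simplified illustration and not the actual implementation of rdb::as_ballot_dates(), which may take further jurisdiction columns into account. The column name n_rfrnds is made up for this example.

```r
# Country code and date of ten national referendums from the example output
rfrnds_sample <- tibble::tibble(
  country_code = c("PL", "PL", "PL", "PL", "AU", "EC", "FM", "FM", "FM", "FM"),
  level = "national",
  date = as.Date(c(rep("2023-10-15", 4),
                   "2023-10-14",
                   "2023-08-20",
                   rep("2023-07-04", 4)))
)

# One row per combination of jurisdiction and ballot date
ballot_dates <- rfrnds_sample |>
  dplyr::count(country_code, level, date, name = "n_rfrnds")

nrow(rfrnds_sample)
nrow(ballot_dates)
```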
Also noteworthy is rdb::unnest_var(), which provides a convenient and standardized way to unnest a multi-value variable of type list, like the topics_tier_* variables, to long format:
rdb::rfrnds() |> nrow()
#> [1] 17766
rdb::rfrnds() |> rdb::unnest_var(topics_tier_1) |> nrow()
#> [1] 21513
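The row count grows because every referendum contributes one row per value stored in its list column. The generic mechanism can be sketched with tidyr::unnest_longer() on toy data. Note that this illustrates unnesting in general and is not the implementation of rdb::unnest_var(); the toy ids and topic labels are placeholders.

```r
# Toy data: three observations with one to three topics each
toy <- tibble::tibble(
  id = c("a", "b", "c"),
  topics = list(c("topic 1", "topic 2"),
                "topic 1",
                c("topic 2", "topic 3", "topic 4"))
)

# Long format: one row per id-topic combination
long <- tidyr::unnest_longer(toy, topics)

nrow(toy)
nrow(long)

# Once in long format, counting topic occurrences is straightforward
dplyr::count(long, topics, sort = TRUE)
```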
See the package reference for all available data transformation functions.
Tabulate and visualize data
rdb also includes some ready-made convenience functions to create tables and (interactive) plots.
If you’d like a tabular overview of the top-ten countries by number of ballot dates per political level for example, you could simply run
rdb::rfrnds() |>
  rdb::as_ballot_dates() |>
  rdb::tbl_n_rfrnds(by = c(country_name, level),
                    n_rows = 10L,
                    order = "descending")
and you’d get the following nicely formatted gt table:
Country / Political level | local | subnational | national | Total |
---|---|---|---|---|
Switzerland | 1 | 2666 | 325 | 2992 |
United States | 0 | 1252 | 0 | 1252 |
Liechtenstein | 0 | 0 | 88 | 88 |
Germany | 0 | 59 | 9 | 68 |
Italy | 5 | 20 | 33 | 58 |
New Zealand | 0 | 2 | 46 | 48 |
Australia | 0 | 20 | 25 | 45 |
France | 2 | 16 | 27 | 45 |
Canada | 0 | 33 | 3 | 36 |
Ireland | 0 | 0 | 31 | 31 |
… | … | … | … | … |
Total | 22 | 4243 | 1619 | 5884 |
A table of the number of referendums in the UN subregion Polynesia since 2010 per period, say per year, can be generated via
rdb::rfrnds() |>
  rdb::add_world_regions() |>
  dplyr::filter(un_subregion == "Polynesia" & date > "2009-12-31") |>
  rdb::tbl_n_rfrnds_per_period(period = "year")
Year | n |
---|---|
2022 | 12 |
2019–2021 | 0 |
2018 | 1 |
2015–2017 | 0 |
2014 | 1 |
2013 | 0 |
2012 | 2 |
2011 | 0 |
2010 | 2 |
Total | 18 |
Or a stacked area chart visualizing the worldwide share of referendums per year since 1950, grouped by political level:
rdb::rfrnds() |>
  dplyr::filter(date >= "1950-01-01") |>
  rdb::tbl_n_rfrnds_per_period(period = "year",
                               by = "level")
Or, as a final example, the overall (hierarchical) segmentation of the political topics covered by all referendums in the RDB:
rdb::rfrnds() |> rdb::plot_topic_segmentation(method = "per_topic_lineage")
Again, see the package reference for all available data visualization and tabulation functions. More will likely be added in the future.