--- title: "1. Introduction to Dataverse" date: "`r Sys.Date()`" output: html_vignette: fig_caption: false vignette: > %\VignetteIndexEntry{1. Introduction to Dataverse} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- The **dataverse** package is the official R client for [Dataverse 4](https://dataverse.org/) data repositories. The package enables data search, retrieval, and deposit with any Dataverse installation, thus allowing R users to integrate public data sharing into the reproducible research workflow. In addition to this introduction, the package contains additional vignettes covering: * ["Data Search and Discovery"](B-search.html) * ["Data Retrieval"](C-download.html) They can be accessed from [CRAN](https://cran.r-project.org/package=dataverse) or from within R using `vignettes(package = "dataverse")`. ## Installation The dataverse client package can be installed from [CRAN](https://cran.r-project.org/package=dataverse), and you can find the latest development version and report any issues on GitHub: ```R remotes::install_github("iqss/dataverse-client-r") library("dataverse") ``` ## Terminology Dataverse has some terminology that is worth quickly reviewing before showing how to work with Dataverse in R. Dataverse is an application that can be installed in many places. As a result, **dataverse** can work with any installation but you need to specify which installation you want to work with. This can be set by default with an environment variable, `DATAVERSE_SERVER`: ```{r} library("dataverse") Sys.setenv("DATAVERSE_SERVER" = "dataverse.harvard.edu") ``` This should be the Dataverse server, without the "https" prefix or the "/api" URL path, etc. The package attempts to compensate for any malformed values, though. Within a given Dataverse installation, organizations or individuals can create objects that are also called "Dataverses". These Dataverses can then contain other *dataverses*, which can contain other *dataverses*, and so on. They can also contain *datasets* which in turn contain files. You can think of Harvard's Dataverse as a top-level installation, where an institution might have a *dataverse* that contains a subsidiary *dataverse* for each researcher at the organization, who in turn publishes all files relevant to a given study as a *dataset*. ## Search You can search for and retrieve data without a Dataverse account for that a specific Dataverse installation. For example, to search for data files or datasets that mention "ecological inference", we can just do: ```R dataverse_search("ecological inference")[c("name", "type", "description")] ``` The [search vignette](B-search.html) describes this functionality in more detail. ## Get To retrieve a data file, we need to investigate the dataset being returned and look at what files it contains using a variety of functions: ```R get_dataset() dataset_files() get_file_metadata() get_file() get_dataframe_by_name() ``` The most practical of these is likely `get_dataframe_by_name()` which imports the object directly as a dataframe. `get_file()` is more primitive, and calls a raw vector. Recall that, because _datasets_ in Dataverse are a collection of files rather than a single csv file, for example, the `get_dataset()` function does not return data but rather information about a Dataverse dataset. The [download vignette](C-download.html) describes this functionality in more detail. ## Upload and Maintain For "native" Dataverse features (such as user account controls) or to create and publish a dataset, you will need an API key linked to a Dataverse installation account. Instructions for obtaining an account and setting up an API key are available in the [Dataverse User Guide](https://guides.dataverse.org/en/latest/user/account.html). (Note: if your key is compromised, it can be regenerated to preserve security.) Once you have an API key, this should be stored as an environment variable called `DATAVERSE_KEY`. It can be set within R using: ```R Sys.setenv("DATAVERSE_KEY" = "examplekey12345") ``` where `examplekey12345` should be replaced with your own key. With that set, you can easily create a new dataverse, create a dataset within that dataverse, push files to the dataset, and release it, using functions such as ```R initiate_sword_dataset() add_dataset_file() publish_dataset() ``` As of `dataverse` version 0.3.0, we recommended the Python client (`https://github.com/gdcc/pyDataverse`) for these upload and maintenance functions.