% Generated by roxygen2: do not edit by hand
% Please edit documentation in R/get_eurostat_data.R
\name{get_eurostat_data}
\alias{get_eurostat_data}
\title{Download/extract Eurostat Data}
\usage{
get_eurostat_data(id, filters = NULL, ignore.case = FALSE,
  date_filter = NULL, label = FALSE, select_freq = NULL,
  cache = TRUE, update_cache = FALSE, cache_dir = NULL,
  compress_file = TRUE, stringsAsFactors = default.stringsAsFactors(),
  keep_flags = FALSE, verbose = FALSE, ...)
}
\arguments{
\item{id}{A code name for the dataset of interest.
See \code{\link{search_eurostat_toc}} for details on how to get an id.}

\item{filters}{a string or a character vector containing words to filter by the different concepts or geographical locations.
If a filter is applied, only part of the dataset is downloaded through the API. The words can be
any words, Eurostat variable codes, and values available in the DSD (see \code{\link{search_eurostat_dsd}}).
The default is \code{NULL}, in which case the whole dataset is returned via the bulk download. To filter by time see \code{date_filter} below.
If the dataset still has more observations after filtering than the API limit per query, the bulk download is used to retrieve the data.}

\item{ignore.case}{a boolean with the default value \code{FALSE}. If \code{FALSE}, the strings provided in \code{filters} are matched as is; if \code{TRUE}, the case is ignored.}

\item{date_filter}{a vector, which can be numeric or character, containing dates to filter the dataset.
If a date filter is applied, only part of the dataset is downloaded through the API.
The default is \code{NULL}, in which case the whole dataset is returned via the bulk download.
If the dataset still has more observations after filtering than the API limit per query, the bulk download is used to retrieve the data.}

\item{label}{a boolean with the default \code{FALSE}. If it is \code{TRUE}, the code values are replaced by the names from the Data Structure Definition (DSD), see \code{\link{get_eurostat_dsd}}.
For example, instead of "D1110A", "Raw cows' milk from farmtype" is used, or "HU32" is replaced by "Észak-Alföld".}

\item{select_freq}{a character symbol for a time frequency when a dataset has multiple time
frequencies. Possible values are:
  A = annual, S = semi-annual, H = half-year, Q = quarterly, M = monthly, W = weekly, D = daily.
  The default is \code{NULL}, as most datasets have just one time
frequency. With the default, if there are multiple frequencies, only the most common frequency is kept.
If all the frequencies are needed, \code{\link{get_eurostat_raw}} can be used.}

\item{cache}{a logical indicating whether to do caching. Default is \code{TRUE}. Affects
only queries without filtering. If \code{filters} or \code{date_filter} is used, there is no caching.}

\item{update_cache}{a logical with a default value \code{FALSE}, indicating whether to update the data in the cache. Can also be set with
\code{options(restatapi_update=TRUE)}.}

\item{cache_dir}{a path to a cache directory. The default \code{NULL} uses memory as the cache.
If the \code{cache_dir} directory does not exist, the data is saved in the 'restatapi' directory
under the temporary directory from \code{tempdir()}. The directory can also be set with
\code{options(restatapi_cache_dir=...)}.}

\item{compress_file}{a logical indicating whether to compress the
RDS file when caching. Default is \code{TRUE}.}

\item{stringsAsFactors}{if \code{TRUE}, the non-numeric columns are
converted to factors. If \code{FALSE},
they are returned as characters. The default is taken from \code{default.stringsAsFactors()}.}

\item{keep_flags}{a logical indicating whether the observation status (flags), e.g. "confidential",
"provisional", etc., should be kept in a separate column or
removed. Default is \code{FALSE}. For flag values see:
\url{http://ec.europa.eu/eurostat/data/database/information}.}

\item{verbose}{a boolean with default \code{FALSE}, so detailed messages (for debugging) will not be printed.
Can also be set with \code{options(restatapi_verbose=TRUE)}.}

\item{...}{further arguments for the \code{\link{load_cfg}} function.}
}
\value{
a data.table with one column for each dimension in the data,
        the time column for the time dimension,
        the values column for numerical values, and the flags column if \code{keep_flags=TRUE}.
        Eurostat data does not include all missing values. The missing values are dropped if all dimensions are missing
        at a particular time.
}
\description{
Download a full or partial dataset from the \href{https://ec.europa.eu/eurostat/}{Eurostat} database.
}
\details{
Datasets are downloaded from the Eurostat Web Services
\href{https://ec.europa.eu/eurostat/web/sdmx-web-services}{SDMX API} if there is a filter; otherwise
\href{http://ec.europa.eu/eurostat/estat-navtree-portlet-prod/BulkDownloadListing}{the Eurostat bulk download facility} is used.
If only the table \code{id} is given, the whole table is downloaded from the
bulk download facility. If \code{filters} or \code{date_filter} is also defined, the SDMX API is
used. If the dataset has more rows after filtering than the SDMX API limit (1 million values at one time), the bulk download is used to retrieve the whole dataset.

By default all datasets are cached, as they are often rather large.
The datasets are cached in memory (default) or can be stored in a temporary directory if \code{cache_dir} or \code{options(restatapi_cache_dir=...)} is defined.
The cache can be emptied with \code{\link{clean_restatapi_cache}}.

The \code{id} is a value from the \code{code} column of the table of contents (\code{\link{get_eurostat_toc}}) and can be searched for with the \code{\link{search_eurostat_toc}} function. The id value can be retrieved from the \href{http://ec.europa.eu/eurostat/data/database}{Eurostat database}
 as well. The Eurostat
database gives codes in the Data Navigation Tree after every dataset
in parentheses.

Filtering can be done by the codes as described in the API documentation, provided in the correct order and connected with "." and "+".
If we do not know the codes, we can filter based on words or on a mix of the two, put in a vector like \code{c("AT$","Belgium","persons","Total")}. Note that the filter is case sensitive; if you do not know the exact case, you can use the option \code{ignore.case=TRUE}, but it can include unwanted elements as well.
With word filters we do not have to worry about the correct order: each element is put in the correct place based on the DSD. Regular expressions can be used in the \code{filters} parameter as well.

The \code{date_filter} shall be a string in the format yyyy[-mm][-dd]. The month and the day parts are optional; if only the year is given and the dataset has monthly frequency, all the data for the given year is retrieved.
The string can be extended by adding "<" or ">" to the beginning or to the end of the string. In this case the date filter is treated as a range, and the date is used as a start or end date. The data will include the observation of the start/end date.
A single date range can be defined as well by concatenating two dates with ":", e.g. "2016-08:2017-03-15". The dates can have different lengths. In this case the "<" or ">" characters cannot be used.
If there are multiple dates which do not form a continuous range, they can be put in a vector in any order, like \code{c("2016-08",2013:2015,"2017-07-01")}. In this case, as well, it is not possible to use the "<" or ">" characters.
}
\examples{
\dontshow{
if ((parallel::detectCores()<2)|(Sys.info()[['sysname']]=='Windows')){
   options(restatapi_cores=1)
}else{
   options(restatapi_cores=2)
}    
}
\donttest{
dt<-get_eurostat_data("NAMA_10_GDP")
dt<-get_eurostat_data("nama_10_gdp",update_cache=TRUE)
dt<-get_eurostat_data("nama_10_gdp",cache_dir="/tmp")
options(restatapi_update=FALSE)
options(restatapi_cache_dir=file.path(tempdir(),"restatapi"))
dt<-get_eurostat_data("avia_gonc",select_freq="A",cache=FALSE)
dt<-get_eurostat_data("agr_r_milkpr",date_filter=2008,keep_flags=TRUE)
dt<-get_eurostat_data("avia_par_me",
                      filters="BE$",
                      date_filter=c(2016,"2017-03","2017-07-01"),
                      select_freq="Q",
                      label=TRUE)
dt<-get_eurostat_data("agr_r_milkpr",
                      filters=c("BE$","Hungary"),
                      date_filter="2007-06<",
                      keep_flags=TRUE)
dt<-get_eurostat_data("nama_10_a10_e",
                      filters=c("Annual","EU28","Belgium","AT","Total","EMP_DC","person"),
                      date_filter=c("2008",2002,2013:2018))
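# Sketch only: word filters written in lower case and matched with
# ignore.case=TRUE. This assumes the lower-case words match the same items
# as the case-sensitive filters used above; as noted in the details,
# ignoring the case can also pick up unwanted elements.
dt<-get_eurostat_data("agr_r_milkpr",
                      filters=c("be$","hungary"),
                      ignore.case=TRUE,
                      date_filter="2007-06<")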
dt<-get_eurostat_data("avia_par_me",
                      filters="Q...ME_LYPG_HU_LHBP+ME_LYTV_UA_UKKK",
                      date_filter=c("2016-08","2017-07-01"),
                      select_freq="M")
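# Sketch only: the single date range form described in the details, with two
# dates of different length joined by a colon. It assumes monthly data exist
# for this period; the less-than and greater-than signs cannot be combined
# with this form.
dt<-get_eurostat_data("avia_par_me",
                      filters="BE$",
                      date_filter="2016-08:2017-03-15",
                      select_freq="M")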
}
}
\seealso{
\code{\link{search_eurostat_toc}},\code{\link{search_eurostat_dsd}}
}
