---
title: "Executing SQL"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{Executing SQL}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

```{r, include = FALSE}
knitr::opts_chunk$set(
  collapse = TRUE,
  comment = "#>"
)
```

`sqlhelper`'s main purpose is the execution of _files_ of SQL, with options for controlling the execution of individual SQL queries within each file. The function `run_files()` executes .SQL files. Internally, `run_files()` calls `read_sql()`, `prepare_sql()` and `run_queries()`; these functions can also be used to read and prepare SQL files without executing them, and to execute SQL query strings.

## Executing SQL files

Executing SQL code requires a connection to a database, and `sqlhelper` provides ways to automate creating and managing connections. These are described in `vignette("connections")`. Once connections are configured, the `run_files()` command can be used to execute files of SQL code.

```{r execute_file}
library(sqlhelper)

connect("examples/sqlhelper_db_conf.yml", exclusive = TRUE)

DBI::dbWriteTable(
  default_conn(),
  name = "IRIS",
  iris
)

readLines("examples/example.sql") |>
  writeLines()

petal_length <- 1.3

results <- run_files("examples/example.sql")

results
```

As well as individual file names, `run_files()` accepts a vector of file names.

### Accessing results of queries

`run_files()` returns a list of results with one element per query. In the example above, names are assigned to queries with _interpreted comments_ of the form `-- qname = my_query_name`. If queries are named, individual results can be accessed by the same name:

```{r named_access}
results$short_petal_setosa
```

Results returned as a list may also be accessed by index, of course. However, if a file contains a single query, the result of that query will be returned as is (i.e. an object, not a single-element list). Beware of the usual gotchas around list names.
`sqlhelper` will not complain if you give two queries the same name, but if you then try to access the results by name, you will only get the result of the first query with that name. This is particularly relevant if your project executes queries from multiple files and those files are developed by different people.

Similarly, be careful not to use anything in query names that R will interpret as an operator or special character. In the example above, naming the query `short-petal-setosa` would cause an error when accessing `results$short-petal-setosa`, because R will interpret `-` as a subtraction.

### Controlling execution of individual queries

As well as naming queries, interpreted comments can be used to control aspects of execution on a per-query basis. For example, queries are executed with `DBI::dbGetQuery()` by default, but `sqlite` will complain if you use this to send a statement to the database. You can control the execution function with the `execmethod` keyword:

```sql
-- qname = bobby_tables
-- execmethod = execute
DROP TABLE Students;
```

All interpreted comments follow the form `-- keyword = value`. Interpretable keywords are:

* __qname__ A name for this query
* __interpolate__ "yes" or "no" - should this query be parameterized?
* __quotesql__ "yes" or "no" - should interpolated characters be quoted?
* __execmethod__ One of "get", "execute", "sendq", "sends" or "spatial" - which method should be used to execute the query? "get" means `DBI::dbGetQuery()`; "execute" means `DBI::dbExecute()`; "sendq" means `DBI::dbSendQuery()`; "sends" means `DBI::dbSendStatement()`; "spatial" means `sf::st_read()`.
* __geometry__ The name of a spatial column. Ignored if `execmethod` is not 'spatial'
* __conn_name__ The name of a connection to execute this query against

```{r interpreted_comments}
## add combined standard/spatial example
```

#### Cascaded execution parameters {#cascade}

All interpreted comments except `qname` are cascaded _within their file_, meaning that if you want to use the same values throughout, you need only set them for the first query. See also `read_sql()` for details. If you want to change the execution parameters for the first query only and retain the defaults for the remainder, you will need to either:

* use interpreted comments to explicitly reset the defaults for the second query; or
* put the second and subsequent queries in a different file.

You can prevent cascading by passing `cascade = FALSE`.

```{r cascade}
## cascaded comments example
```

### Interpolation

By default, `sqlhelper` will attempt to parameterize SQL queries with `glue::glue_sql()`, using values from the current environment. This means that values from R can be easily inserted into your SQL code, or calculated in situ:

```{r}
readLines("examples/petal_length_params.sql") |>
  writeLines()

petal_length <- 1.2

run_files("examples/petal_length_params.sql")
```

Interpolation behaviour can be controlled using the keywords __interpolate__ and __quotesql__, and the `values` parameter of `prepare_sql()` (which can also be passed to `run_files()` and `run_queries()`).

The default behaviour is to quote SQL strings (i.e. interpolate with `glue::glue_sql()`); if this is not desired it can be avoided with `-- quotesql = no` (for an individual query in a file; but see [the 'cascaded comments' example, above](#cascade)) or by passing `quotesql = "no"` as a parameter to `run_files()` (for all queries). If strings are not quoted they will be inserted bare; whilst this is occasionally useful, great care should be taken to sanitize any interpolated values.
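As an illustration, unquoted interpolation is useful when the interpolated value is an SQL fragment rather than a literal. The following sketch is hypothetical (it is not one of the package's example files): it reuses the `IRIS` table and `petal_length` variable from earlier, and assumes an R variable `sort_clause` that holds a fragment such as `ORDER BY "Petal.Length"`. With `-- quotesql = no` the fragment is inserted bare; the default behaviour would insert it as a quoted string, producing invalid SQL.

```sql
-- qname = short_petals_sorted
-- quotesql = no
-- Hypothetical sketch: `sort_clause` is assumed to be an R variable
-- containing an SQL fragment, e.g. sort_clause <- 'ORDER BY "Petal.Length"'
SELECT * FROM IRIS WHERE "Petal.Length" < {petal_length} {sort_clause}
```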
If you want to skip interpolation for an individual query, precede it with `-- interpolate = no`. If you want to skip interpolation altogether, pass `interpolate = "no"` as a parameter, and see also [the 'cascaded comments' example, above](#cascade).

#### Passing parameter values {#values}

Sometimes you may need to parameterize your SQL with values that are not in the calling environment. This is particularly important if you are executing SQL code from within a package: you cannot rely on, and should not risk writing to, your users' `globalenv()`. To supply interpolation values to `run_files()` and `run_queries()`, pass a populated environment as the `values` parameter.

```{r values}
# reusing the petal length parameter example

# A user may have a petal_length parameter in the globalenv already
print(petal_length)

result_from_globalenv <- run_files("examples/petal_length_params.sql")

result_from_globalenv$short_petal_setosa

# a bespoke environment can provide a specific set of values for interpolation
my_values <- new.env()
my_values$petal_length <- 1.4

result_from_my_values <- run_files("examples/petal_length_params.sql",
                                   values = my_values)

result_from_my_values$short_petal_setosa
```

### Binding

Binding can be performed alongside interpolation. Queries and statements are first interpolated, then executed with `DBI::dbSendQuery()` or `DBI::dbSendStatement()`. They may then be bound and the result fetched.

```{r binding}
readLines("examples/binding.SQL") |>
  writeLines()

petal_width <- 0.2

result <- run_files("examples/binding.SQL")

DBI::dbBind(result$binding_example, list("setosa"))

DBI::dbFetch(result$binding_example)

DBI::dbClearResult(result$binding_example)
```

## Reading and preparing SQL files

SQL files and strings can be read and prepared without being executed by the `read_sql()` and `prepare_sql()` functions. These functions return tibbles containing the prepared SQL, associated metadata (e.g. filename), and execution parameters.
These functions enable both inspection of prepared SQL and parameters for debugging, and further manipulation of SQL queries prior to execution.

## Executing SQL strings

One of the main objectives of `sqlhelper` is to reduce the incidence of SQL written as strings in R code. However, it is occasionally convenient for interactive exploratory work to type a query as a string. For this you may use `run_queries()`. This function can also be used to execute queries that have been read from files (e.g. with `read_sql()`) and then manipulated programmatically before execution.

## Passing ad-hoc connections to functions

It may not always be possible or desirable to have `sqlhelper` manage your database connections. For example, the use of secrets is not yet supported in `sqlhelper` connections. In these cases, connections created outside `sqlhelper` may be passed to `run_files()` or `run_queries()`.

```{r conns}
con <- DBI::dbConnect(RSQLite::SQLite(), ":memory:")

cars <- mtcars
cars$model <- row.names(mtcars)

DBI::dbWriteTable(con, "cars", cars)

minmpg <- 30

run_queries("SELECT model, mpg, cyl FROM CARS WHERE mpg >= {minmpg}",
            default.conn = con)
```
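Because a connection created outside `sqlhelper` is not managed by it, closing it remains your responsibility. For example:

```{r disconnect}
# connections created outside sqlhelper are not managed by it,
# so close them yourself when you are finished
DBI::dbDisconnect(con)
```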