Commit 84f1139

Merge pull request #27 from The-Strategy-Unit/francisbarton/issue18
Add usage instructions to README
2 parents 38a5b45 + c489804 commit 84f1139

4 files changed: 82 additions and 7 deletions

R/get_container.R

Lines changed: 2 additions & 2 deletions
@@ -5,12 +5,12 @@
 #'
 #' @param container_name Name of the container as a string. NULL by default,
 #' which means the function will look instead for a container name stored in
-#' the environment variable "AZ_STORAGE_CONTAINER"
+#' the environment variable "AZ_CONTAINER"
 #' @param ... arguments to be passed through to `get_auth_token()`
 #' @returns An Azure blob container (list object of class "blob_container")
 #' @export
 get_container <- function(container_name = NULL, ...) {
-  container_envvar_name <- "AZ_STORAGE_CONTAINER"
+  container_envvar_name <- "AZ_CONTAINER"
   cst_msg1 <- cst_error_msg("{.var container_name} must be a string")
   cst_msg2 <- cst_error_msg("{.envvar {container_envvar_name}} is not set")
   c_name <- (container_name %||% Sys.getenv(container_envvar_name, NA)) |>
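
The fallback touched by this hunk can be sketched in isolation. The snippet below is an illustration only, not the package's code: `resolve_container_name()` is a hypothetical helper, the validation steps and error wording are assumptions, and `%||%` is defined locally because it is only part of base R from version 4.4.0.

```r
# Null-coalescing operator, defined here so the sketch also runs on R < 4.4.0
`%||%` <- function(x, y) if (is.null(x)) y else x

# Hypothetical helper: prefer an explicit name, otherwise fall back to the
# AZ_CONTAINER environment variable. The checks below are assumptions about
# what happens after the lookup; they are not copied from the package.
resolve_container_name <- function(container_name = NULL,
                                   envvar = "AZ_CONTAINER") {
  c_name <- container_name %||% Sys.getenv(envvar, NA)
  if (!is.character(c_name) || length(c_name) != 1) {
    stop("`container_name` must be a string")
  }
  if (is.na(c_name) || !nzchar(c_name)) {
    stop("Environment variable ", envvar, " is not set")
  }
  c_name
}

Sys.setenv(AZ_CONTAINER = "default-container")
resolve_container_name()          # falls back to the environment variable
resolve_container_name("custom")  # an explicit name takes precedence
```

Passing `NA` as the `unset` value lets the sketch tell a missing variable apart from any real name.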

R/read_azure_files.R

Lines changed: 1 addition & 0 deletions
@@ -63,6 +63,7 @@ read_azure_json <- function(container, file, path = "/", info = NULL, ...) {
 
 
 #' Common routine for all `read_azure_*()` functions
+#'
 #' Downloads the blob with `dest = NULL`, which keeps the data in memory
 #'
 #' @inheritParams read_azure_parquet
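
Keeping the download in memory means the reader functions parse bytes rather than a file on disk. The sketch below illustrates that idea with base R only. The raw vector is built locally as a stand-in for what a download with `dest = NULL` is assumed to return, and `parse_csv_bytes()` is a hypothetical name, not a function in this package.

```r
# Stand-in for the bytes a blob download would hand back when dest = NULL
# (assumption: the download yields a raw vector of the file contents).
csv_bytes <- charToRaw("id,name\n1,alpha\n2,beta\n")

# Hypothetical parser: convert the raw vector to text and read it as CSV,
# without writing a temporary file.
parse_csv_bytes <- function(bytes) {
  stopifnot(is.raw(bytes))
  utils::read.csv(text = rawToChar(bytes), stringsAsFactors = FALSE)
}

dat <- parse_csv_bytes(csv_bytes)
str(dat)
```

The same pattern would apply to other formats, with a format-specific parser in place of `read.csv()`.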

README.md

Lines changed: 78 additions & 4 deletions
@@ -38,12 +38,74 @@ pak::pak("The-Strategy-Unit/azkit")
 
 ## Usage
 
-_To be added._
+A primary function in `{azkit}` enables access to an Azure blob container:
+
+```r
+data_container <- azkit::get_container()
+
+```
+Authentication is handled "under the hood" by the `get_container()` function,
+but if you need to, you can explicitly return an authentication token for
+inspection or testing:
+
+```r
+my_token <- azkit::get_auth_token()
+
+```
+
+The container returned will be set by the name stored in the `AZ_CONTAINER`
+environment variable, if any, by default, but you can override this by supplying
+a container name to the function:
+
+```r
+custom_container <- azkit::get_container("custom")
+```
+
+Return a list of all available containers in your default Azure storage with:
+
+```r
+list_container_names()
+```
+
+Once you have access to a container, you can use one of a set of data reading
+functions to bring data into R from `.parquet`, `.rds`, `.json` or `.csv` files:
+
+```r
+pqt_data <- azkit::read_azure_parquet(data_container, "v_important_data")
+
+```
+
+The functions will try to match a file of the required type using the `file`
+name supplied. In the case above, "v_important_data" would match a file named
+"v_important_data.parquet", no need to supply the file extension.
+
+By default the `read_*` functions will look in the root folder of the container.
+To specify a subfolder, supply this to the `path` argument.
+The functions will _not_ search recursively into further subfolders, so the path
+needs to be full and accurate.
+
+If there is more than 1 file matching the string supplied to `file` argument,
+the functions will throw an error.
+Specifying the exact filename will avoid this of course - but shorter `file`
+arguments may be convenient in some situations.
+
+Currently these functions only read in a single file at a time.
+
+Setting the `info` argument to `TRUE` will enable the functions to give some
+confirmatory feedback on what file is being read in.
+You can also pass through arguments to for example `readr::read_csv()`:
+
+```r
+csv_data <- data_container |>
+  azkit::read_azure_csv("vital_data.csv", path = "data", col_types = "ccci")
+
+```
 
 ## Environment variables
 
-To access Azure Storage you need to add some variables to a
-[`.Renviron` file][posit_env] in your project.
+To access Azure Storage you will want to set some environment variables.
+The neatest way to do this is to include a [`.Renviron` file][posit_env] in
+your project folder.
 
 ⚠️These values are sensitive and should not be exposed to anyone outside The
 Strategy Unit.
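
The matching rules described in the new Usage text (a `file` string matched against files of the required type in a single folder, no recursion, and an error when more than one file matches) might look something like the following. This is a guess at the behaviour based on the README wording alone: `match_one_file()` and the example paths are hypothetical, substring matching is an assumption, and the real functions query a container rather than a character vector.

```r
# Hypothetical matcher operating on a plain vector of blob paths.
match_one_file <- function(blob_paths, file, ext, path = "/") {
  # Normalise the folder: "" for the root, otherwise no leading/trailing slash
  folder <- gsub("^/+|/+$", "", path)
  # Keep only blobs directly inside `folder` (no recursion into subfolders)
  target_dir <- if (nzchar(folder)) folder else "."
  candidates <- blob_paths[dirname(blob_paths) == target_dir]
  # Keep blobs of the required type whose name contains the `file` string
  # (assumption: a plain substring match)
  has_ext <- tolower(tools::file_ext(candidates)) == tolower(ext)
  has_name <- grepl(file, basename(candidates), fixed = TRUE)
  hits <- candidates[has_ext & has_name]
  if (length(hits) == 0) stop("No ", ext, " file matching '", file, "' found")
  if (length(hits) > 1) stop("More than one file matches '", file, "'")
  hits
}

blobs <- c(
  "v_important_data.parquet",
  "v_important_data.csv",
  "data/vital_data.csv",
  "data/archive/vital_data.csv"
)
match_one_file(blobs, "v_important_data", ext = "parquet")
match_one_file(blobs, "vital_data.csv", ext = "csv", path = "data")
```

In the second call the copy under `data/archive` is ignored because only the folder named in `path` is searched.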
@@ -54,12 +116,24 @@ Your `.Renviron` file should contain the variables below.
 Ask a member of [the Data Science team][suds] for the necessary values.
 
 ```
+# essential
 AZ_STORAGE_EP=
-AZ_STORAGE_CONTAINER=
+# useful but not absolutely essential:
+AZ_CONTAINER=
+
+# optional, for certain authentication scenarios:
+AZ_TENANT_ID=
+AZ_CLIENT_ID=
+AZ_APP_SECRET=
 ```
 
 These may vary depending on the specific container you’re connecting to.
 
+For one project you might want to set the default container (`AZ_CONTAINER`) to
+one value, but for a different project you might be mainly working with a
+different container so it would make sense to set the values within the
+`.Renviron` file for each project, rather than globally for your account.
+
 ## Getting help
 
 Please use the [Issues][issues] feature on GitHub to report any bugs, ideas
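
After editing a project's `.Renviron`, a quick check from the R console can confirm which of the variables listed in this hunk are visible to the session. The helper name `check_az_envvars()` is made up for this sketch; only the variable names come from the README.

```r
# Hypothetical helper: report which Azure-related variables are set.
check_az_envvars <- function(
  vars = c("AZ_STORAGE_EP", "AZ_CONTAINER",
           "AZ_TENANT_ID", "AZ_CLIENT_ID", "AZ_APP_SECRET")
) {
  values <- Sys.getenv(vars, unset = NA)
  # TRUE where a variable exists and is not an empty string
  is_set <- !is.na(values) & nzchar(values)
  stats::setNames(is_set, vars)
}

# Re-read the project-level file after editing it (if one exists), then show
# the status of each variable without printing any secret values.
if (file.exists(".Renviron")) readRenviron(".Renviron")
check_az_envvars()
```

Reporting only `TRUE` or `FALSE` keeps the sensitive values out of the console and any saved logs.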

man/get_container.Rd

Lines changed: 1 addition & 1 deletion
Generated file; the diff is not rendered by default.