
This function reads in the metadata file for a chosen dataset, loops through all the data elements, and asks the user to map (categorise) each data element to one or more domains. The domains appear in the Plots tab for the user's reference.

These categorisations are saved to a csv file, alongside a log file which summarises the session details. To speed up this process, the function auto-categorises commonly occurring data elements, and categorisations for the same data element can be copied from one table to another.

Example inputs are provided within the package data, for the user to run this function in a demo mode.

Usage

metadata_map(
  metadata_file = NULL,
  domain_file = NULL,
  look_up_file = NULL,
  output_dir = getwd(),
  table_copy = TRUE,
  long_output = TRUE
)

Arguments

metadata_file

This should be a csv download from HDRUK gateway (0_Dataset_Structural_Metadata.csv). Default is 'data/metadata.rda' - run '?metadata' to see how it was created.

domain_file

This should be a csv file created by the user, with each domain on a separate line, no header. Default is 'data/domain_list.rda' - run '?domain_list' to see how it was created. Note that 4 domains will be added automatically (NO MATCH/UNSURE, METADATA, ID, DEMOGRAPHICS) and therefore should not be included in the domain_file.
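As a minimal sketch of the layout described above, the following writes a domain file with one domain per line and no header. The domain names and the file name are hypothetical, chosen only for illustration; the four automatically added domains are filtered out before writing.

```r
# Hypothetical domain names, for illustration only
domains <- c("Socioeconomic factors", "Lifestyle", "Health outcomes")

# These four domains are added automatically, so they should not be listed
reserved <- c("NO MATCH/UNSURE", "METADATA", "ID", "DEMOGRAPHICS")
domains <- setdiff(domains, reserved)

# Write one domain per line, with no header row (file name is an assumption)
domain_path <- file.path(tempdir(), "my_domains.csv")
writeLines(domains, domain_path)

# Read the file back to confirm it has a single column and no header
domain_check <- read.csv(domain_path, header = FALSE)
print(domain_check)
```

The resulting path could then be passed as domain_file.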

look_up_file

The lookup file makes auto-categorisations intended for variables that appear regularly in health datasets. It currently supports only 1:1 mappings, i.e. each DataElement should be listed only once in the file. Default is 'data/look-up.rda' - run '?look_up' to see how it was created.
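Because each DataElement should appear only once, it may be worth checking a custom look-up table for duplicates before using it. The sketch below assumes a data frame with a DataElement column; the DomainCode column name and all values are hypothetical, not taken from the package.

```r
# Hypothetical look-up table; DomainCode and all values are assumptions
look_up <- data.frame(
  DataElement = c("NHS_NUMBER", "DOB", "SEX", "DOB"),
  DomainCode = c(2, 3, 3, 3),
  stringsAsFactors = FALSE
)

# Find data elements that are listed more than once
dup_elements <- unique(look_up$DataElement[duplicated(look_up$DataElement)])

if (length(dup_elements) > 0) {
  message("DataElement listed more than once: ",
          paste(dup_elements, collapse = ", "))
}
```

Any element reported here would need to be reduced to a single row before the file is used as look_up_file.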

output_dir

The path to the directory where the two csv output files will be saved. Default is the current working directory.

table_copy

Turn on copying between tables (default TRUE). If TRUE, categorisations already made for other tables in this dataset will be copied over (if 'OUTPUT_' files are found in output_dir). This is useful when the same data elements (variables) appear across multiple tables within one dataset: copying from one table to the next saves the user time and ensures consistent categorisations across tables.
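Since copying depends on 'OUTPUT_' files being present in output_dir, a quick check of that directory can show whether anything is available to copy from. This is a rough sketch only: the file names below are hypothetical placeholders, and the pattern used inside the function may differ.

```r
# Set up a temporary directory with placeholder files (names are assumptions)
output_dir <- file.path(tempdir(), "mapping_demo")
dir.create(output_dir, showWarnings = FALSE)
file.create(file.path(
  output_dir,
  c("OUTPUT_demo_table_A.csv", "LOG_demo_table_A.csv", "notes.txt")
))

# List csv files whose names start with 'OUTPUT_'
previous_outputs <- list.files(output_dir, pattern = "^OUTPUT_.*\\.csv$")
print(previous_outputs)
```

If this returns no files, there are no earlier categorisations to copy, whatever the value of table_copy.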

long_output

If TRUE, map_convert.R is run to create an additional, longer output file 'L-OUTPUT_', which gives each categorisation its own row. Default is TRUE.
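To illustrate what "each categorisation on its own row" means, the sketch below reshapes a small table in which one data element carries several comma-separated domain codes. It is not the implementation of map_convert.R: the column names, the comma separator, and the values are all assumptions made for this example.

```r
# Hypothetical wide-style output: several domain codes can share one cell
wide <- data.frame(
  DataElement = c("AGE", "SMOKING_STATUS"),
  Domain_code = c("3", "5,7"),
  stringsAsFactors = FALSE
)

# Split the codes, then repeat each data element once per code
codes <- strsplit(wide$Domain_code, ",")
long <- data.frame(
  DataElement = rep(wide$DataElement, lengths(codes)),
  Domain_code = trimws(unlist(codes)),
  stringsAsFactors = FALSE
)
print(long)
```

A long layout like this is usually easier to filter or join on a single domain.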

Value

The function saves two csv files to output_dir: 'OUTPUT_' which contains the mappings and 'LOG_' which contains details about the dataset and session. If long_output is TRUE, an 'L-OUTPUT_' file is also created.

Examples

if (FALSE) { # \dontrun{
# Demo run requires no function inputs but requires user interaction
metadata_map()
} # }