This function will read in the metadata file for a chosen dataset, loop through all the data elements, and ask the user to categorise (label) each data element as belonging to one or more domains. The domains will appear in the Plots tab for the user's reference.

These categorisations will be saved to a csv file, alongside a log file which summarises the session details. To speed up this process, some auto-categorisations will be made by the function for commonly occurring data elements, and categorisations for the same data element can be copied from one table to another.

Example inputs are provided within the package data, for the user to run this function in a demo mode.

Usage

mapMetadata(
  json_file = NULL,
  domain_file = NULL,
  look_up_file = NULL,
  output_dir = NULL,
  table_copy = TRUE
)

Arguments

json_file

The metadata file. This should be a json download from the metadata catalogue. By default, 'data/json_metadata.rda' is used - run '?json_metadata' to see how it was created.

domain_file

The domain list file. This should be a csv file created by the user, with each domain listed on a separate line, no header. By default, 'data/domain_list.rda' is used - run '?domain_list' to see how it was created. Note that 4 domains will be added automatically (NO MATCH/UNSURE, METADATA, ID, DEMOGRAPHICS) and therefore should not be included in the domain_file.
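
A domain file in the format described above could be created as in the following minimal sketch. The domain names and the file name `my_domain_list.csv` are placeholders invented for illustration; they are not the package's demo domains.

```r
# Hypothetical research domains (placeholders, not the package demo list).
# The four automatically added domains (NO MATCH/UNSURE, METADATA, ID,
# DEMOGRAPHICS) are deliberately left out.
domains <- c("Socioeconomic factors", "Physical health", "Mental health")

# Write one domain per line with no header row.
domain_path <- file.path(tempdir(), "my_domain_list.csv")
writeLines(domains, domain_path)

# Read the file back to confirm the layout: one column, one row per domain.
domain_check <- read.csv(domain_path, header = FALSE, stringsAsFactors = FALSE)
print(domain_check)
```

The resulting path could then be supplied as the `domain_file` argument.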

look_up_file

The look-up table file. By default, 'data/look-up.rda' is used - run '?look_up' to see how it was created. The look-up file makes auto-categorisations intended for variables that appear regularly in health datasets. It currently only works for 1:1 mappings, i.e. each DataElement should be listed only once in the file.
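
Before supplying a custom look-up file, it may help to check that the 1:1 constraint holds. The sketch below uses a toy table: `DataElement` is the column named above, while the `DomainCode` column name and all row values are assumptions made for illustration and may not match the real look-up file.

```r
# Toy look-up table. DataElement is the column named in the documentation;
# DomainCode and all row values are made up for this example.
look_up <- data.frame(
  DataElement = c("NHS_NUMBER", "SEX", "AGE", "SEX"),
  DomainCode  = c("ID", "DEMOGRAPHICS", "DEMOGRAPHICS", "ID"),
  stringsAsFactors = FALSE
)

# Find data elements listed more than once, which would break the 1:1 rule.
dup_elements <- unique(look_up$DataElement[duplicated(look_up$DataElement)])
print(dup_elements)

# One possible fix: keep only the first row for each data element.
look_up_clean <- look_up[!duplicated(look_up$DataElement), ]
```

Keeping the first occurrence is only one way to resolve duplicates; which row is correct is a judgement for the user.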

output_dir

The path to the directory where the two csv output files will be saved. By default, the current working directory is used.

table_copy

Turn on copying between tables (TRUE or FALSE, default TRUE). If TRUE, categorisations you made for all other tables in this dataset will be copied over (if 'OUTPUT_' files are found in output_dir). This can be useful when the same data elements (variables) appear across multiple tables within one dataset; copying from one table to the next will save the user time, and ensure consistency of categorisations across tables.
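
To see whether earlier sessions left anything to copy from, the output directory can be inspected for 'OUTPUT_' files. This is a rough sketch using base R only: the file names below are invented placeholders, and the function's own check may match files differently.

```r
# Use a temporary directory as a stand-in for output_dir.
output_dir <- file.path(tempdir(), "demo_output")
dir.create(output_dir, showWarnings = FALSE)

# Create empty placeholder files; these names are invented for illustration.
file.create(file.path(output_dir, c("OUTPUT_example_table.csv",
                                    "LOG_example_table.csv",
                                    "notes.txt")))

# List csv files whose names start with 'OUTPUT_'.
previous_outputs <- list.files(output_dir, pattern = "^OUTPUT_.*\\.csv$")
print(previous_outputs)
```

If this returns no files, there would be nothing for `table_copy = TRUE` to copy from.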

Value

The function will save two csv files to output_dir: an 'OUTPUT_' file, which contains the mappings, and a 'LOG_' file, which contains details about the dataset and session.