This function reads in the metadata file for a chosen dataset, loops
through all the data elements, and asks the user to categorise (label) each
data element as belonging to one or more domains. The list of domains is
displayed in the Plots tab for the user's reference.
These categorisations are saved to a csv file, alongside a log file that
summarises the session details. To speed up the process, the function
auto-categorises some commonly occurring data elements, and categorisations
for the same data element can be copied from one table to another.
Example inputs are provided in the package data, so the user can run this
function in demo mode.
Usage
map_metadata(
json_file = NULL,
domain_file = NULL,
look_up_file = NULL,
output_dir = NULL,
table_copy = TRUE
)
Arguments
- json_file
The metadata file. This should be a JSON file downloaded from the metadata catalogue. By default, 'data/json_metadata.rda' is used; run '?json_metadata' to see how it was created.
- domain_file
The domain list file. This should be a csv file created by the user, with each domain on a separate line and no header. By default, 'data/domain_list.rda' is used; run '?domain_list' to see how it was created. Note that 4 domains are added automatically (NO MATCH/UNSURE, METADATA, ID, DEMOGRAPHICS), so they should not be included in the domain_file.
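As an illustration of the expected layout, the sketch below writes a domain list file with one domain per line and no header, then checks that none of the four automatically added domains has been listed. The domain names and the file path are hypothetical examples, not part of the package data.

```r
# Hypothetical domain names, for illustration only
domains <- c("Socioeconomic", "Education", "Health")
domain_file <- file.path(tempdir(), "my_domains.csv")

# One domain per line: no header, no row names, no quotes
write.table(data.frame(domains), domain_file,
            row.names = FALSE, col.names = FALSE, quote = FALSE)

# These four domains are added automatically and should not be in the file
reserved <- c("NO MATCH/UNSURE", "METADATA", "ID", "DEMOGRAPHICS")
stopifnot(!any(toupper(domains) %in% reserved))

readLines(domain_file)
```

The resulting path could then be passed as `domain_file` to `map_metadata()`.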
- look_up_file
The look-up table file. By default, 'data/look-up.rda' is used; run '?look_up' to see how it was created. The look-up file is used to auto-categorise variables that appear regularly in health datasets. It currently supports only 1:1 mappings, i.e. each DataElement should be listed only once in the file.
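Because only 1:1 mappings are supported, it can help to check a custom look-up table for repeated data elements before using it. The sketch below assumes the table is a data frame with a `DataElement` column (as named in the description above); the `DomainCode` column name and the row values are hypothetical.

```r
# Hypothetical look-up table; 'DomainCode' and the values are assumptions
look_up <- data.frame(
  DataElement = c("PERSON_ID", "SEX", "WEEK_OF_BIRTH"),
  DomainCode  = c("ID", "DEMOGRAPHICS", "DEMOGRAPHICS")
)

# TRUE if every DataElement appears at most once (a 1:1 mapping)
is_one_to_one <- function(tbl) {
  anyDuplicated(tbl$DataElement) == 0
}

is_one_to_one(look_up)  # TRUE

# Adding a second row for "SEX" breaks the 1:1 requirement
bad <- rbind(look_up, data.frame(DataElement = "SEX", DomainCode = "ID"))
is_one_to_one(bad)      # FALSE
```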
- output_dir
The path to the directory where the two csv output files will be saved. By default, the current working directory is used.
- table_copy
Enable copying of categorisations between tables (TRUE or FALSE, default TRUE). If TRUE, categorisations you have already made for other tables in this dataset are copied over, provided 'OUTPUT_' files for those tables are found in output_dir. This is useful when the same data elements (variables) appear across multiple tables within one dataset: copying from one table to the next saves the user time and keeps categorisations consistent across tables.
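Copying only works if earlier output files are present in output_dir. The sketch below shows one way to list files from previous sessions whose names begin with 'OUTPUT_', assuming they are csv files; the directory and file names are hypothetical, and this is not necessarily how `map_metadata()` detects them internally.

```r
# Hypothetical output directory with files left by an earlier session
output_dir <- file.path(tempdir(), "map_demo")
dir.create(output_dir, showWarnings = FALSE)
file.create(file.path(output_dir, c("OUTPUT_Table1.csv", "LOG_Table1.csv")))

# Only files starting with 'OUTPUT_' are candidates for copying
previous <- list.files(output_dir, pattern = "^OUTPUT_.*\\.csv$")
previous

# With such files present, a later session could reuse them, e.g.:
# map_metadata(output_dir = output_dir, table_copy = TRUE)
```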