This function reads the metadata file for a chosen dataset, loops through all
of its data elements, and asks the user to map (categorise) each data element
to one or more domains. The list of domains is displayed in the Plots tab for
the user's reference.
These categorisations are saved to a csv file, alongside a log file that
summarises the session details. To speed up the process, the function
auto-categorises some commonly occurring data elements, and categorisations
for the same data element can be copied from one table to another.
Example inputs are provided in the package data so that the user can run this
function in demo mode.
Usage
metadata_map(
metadata_file = NULL,
domain_file = NULL,
look_up_file = NULL,
output_dir = getwd(),
table_copy = TRUE,
long_output = TRUE
)
Arguments
- metadata_file
A csv file downloaded from the HDRUK Gateway (0_Dataset_Structural_Metadata.csv). If NULL (the default), the package data 'data/metadata.rda' is used; run '?metadata' to see how it was created.
- domain_file
A csv file created by the user, with each domain on a separate line and no header. If NULL (the default), the package data 'data/domain_list.rda' is used; run '?domain_list' to see how it was created. Note that four domains (NO MATCH/UNSURE, METADATA, ID, DEMOGRAPHICS) are added automatically and therefore should not be included in the domain_file.
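As a minimal sketch of preparing a domain_file in the layout described above (one domain per line, no header), base R's `writeLines()` is enough. The domain names and the file path below are hypothetical, chosen only for illustration; the guard simply checks that none of the four automatically added domains has been included by mistake.

```r
# Hypothetical domain names, for illustration only
domains <- c("Socioeconomic", "Health behaviours", "Clinical measurements")

# The four domains added automatically; they must not appear in the file
reserved <- c("NO MATCH/UNSURE", "METADATA", "ID", "DEMOGRAPHICS")
stopifnot(!any(toupper(domains) %in% reserved))

# One domain per line, no header (hypothetical path in the session temp dir)
domain_path <- file.path(tempdir(), "my_domains.csv")
writeLines(domains, domain_path)

# Read the file back as a headerless csv to confirm the layout
domain_check <- read.csv(domain_path, header = FALSE)
```

The resulting file could then be passed as `metadata_map(domain_file = domain_path)`.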
- look_up_file
The lookup file provides auto-categorisations for variables that appear regularly in health datasets. It currently supports only 1:1 mappings, i.e. each DataElement should be listed only once in the file. If NULL (the default), the package data 'data/look-up.rda' is used; run '?look_up' to see how it was created.
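The 1:1 restriction matters because a lookup on a duplicated key is ambiguous. The sketch below illustrates the idea with base R's `match()`, which silently takes the first hit, so the hypothetical helper `auto_categorise()` refuses a lookup table with repeated DataElement entries. The `Domain` column, the example rows, and the helper itself are assumptions for illustration, not the package's internal implementation.

```r
# Hypothetical lookup table; the Domain column and its values are assumptions
look_up <- data.frame(
  DataElement = c("NHS_NUMBER", "SEX", "DATE_OF_BIRTH"),
  Domain      = c("ID", "DEMOGRAPHICS", "DEMOGRAPHICS"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: pre-fill a domain for each data element found in the
# lookup; elements not found get NA and are left for the user to categorise
auto_categorise <- function(elements, look_up) {
  # match() would silently use the first of several rows, so insist on 1:1
  if (anyDuplicated(look_up$DataElement)) {
    stop("look-up file must list each DataElement only once (1:1 mapping)")
  }
  look_up$Domain[match(elements, look_up$DataElement)]
}

auto <- auto_categorise(c("SEX", "POSTCODE", "NHS_NUMBER"), look_up)
```

Here `auto` is `c("DEMOGRAPHICS", NA, "ID")`: POSTCODE is absent from the lookup, so it would still need a manual categorisation.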
- output_dir
The path to the directory where the two csv output files will be saved. Default is the current working directory.
- table_copy
Whether to copy categorisations between tables (default TRUE). If TRUE, categorisations you have already made for other tables in this dataset are copied over, provided their 'OUTPUT_' files are found in output_dir. This is useful when the same data elements (variables) appear across multiple tables within one dataset: copying from one table to the next saves the user time and keeps categorisations consistent across tables.
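The copying step can be pictured as: find earlier 'OUTPUT_' files in output_dir, stack them, and reuse the stored domain for any data element seen before. The sketch below shows that idea with base R only. The helper names, the `DataElement`/`Domain` column layout, and the `.csv` suffix in the file pattern are assumptions; the real OUTPUT_ files may be structured differently.

```r
# Hypothetical helper: gather earlier categorisations from OUTPUT_ files.
# The "^OUTPUT_" anchor skips the long-format 'L-OUTPUT_' files.
previous_categorisations <- function(output_dir) {
  files <- list.files(output_dir, pattern = "^OUTPUT_.*\\.csv$",
                      full.names = TRUE)
  if (length(files) == 0) {
    return(data.frame(DataElement = character(), Domain = character()))
  }
  all_prev <- do.call(rbind, lapply(files, read.csv,
                                    colClasses = "character"))
  # Keep the first categorisation seen for each data element
  all_prev[!duplicated(all_prev$DataElement), c("DataElement", "Domain")]
}

# Hypothetical helper: copied domain for each element, NA if never categorised
copy_categorisations <- function(elements, output_dir) {
  prev <- previous_categorisations(output_dir)
  prev$Domain[match(elements, prev$DataElement)]
}

# Demo: one earlier table's output in a scratch directory
out_dir <- file.path(tempdir(), "map_demo")
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(DataElement = c("SEX", "AGE"),
                     Domain = c("DEMOGRAPHICS", "5")),
          file.path(out_dir, "OUTPUT_table1.csv"), row.names = FALSE)
copied <- copy_categorisations(c("AGE", "POSTCODE"), out_dir)
```

AGE picks up its earlier domain, while POSTCODE comes back as NA and would be offered to the user as usual.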
- long_output
If TRUE (the default), run map_convert.R to create an additional long-format output file, 'L-OUTPUT_', in which each categorisation has its own row.
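To illustrate what "each categorisation gets its own row" means, here is a small wide-to-long conversion in base R. It assumes, purely for illustration, that a data element mapped to several domains stores them as a comma-separated string in a hypothetical `Domain` column; the actual layout handled by map_convert.R may differ.

```r
# Hypothetical wide output: one row per data element, domains comma-separated
wide <- data.frame(
  DataElement = c("AGE", "POSTCODE", "NHS_NUMBER"),
  Domain      = c("DEMOGRAPHICS", "DEMOGRAPHICS,5", "ID"),
  stringsAsFactors = FALSE
)

# Split each Domain string and repeat the data element once per domain
to_long <- function(wide) {
  parts <- strsplit(as.character(wide$Domain), ",", fixed = TRUE)
  data.frame(
    DataElement = rep(wide$DataElement, lengths(parts)),
    Domain      = trimws(unlist(parts)),
    stringsAsFactors = FALSE
  )
}

long <- to_long(wide)
```

POSTCODE, mapped to two domains, becomes two rows, giving four rows in total. A long layout like this is convenient for counting or plotting categorisations per domain.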