Browse and categorise health metadata • browseMetadata

What is the browseMetadata package?
- Browse metadata
- Map metadata
Getting started with browseMetadata
- Installation and set-up
- Demo (using the R Studio IDE)
  - browseMetadata()
  - mapMetadata()
Using a custom metadata input
Using a custom domain list input
Using a custom lookup table input
Tips and future steps
License
Citation
Contributing
Acknowledgements

What is the `browseMetadata` package?

The browseMetadata package allows researchers to explore publicly available metadata from the Health Data Research Gateway and the connected Metadata Catalogue. This tool helps researchers plan projects by interacting with metadata prior to gaining full access to health datasets. Learn more about health metadata here.

At the early stages of a project, researchers can use this tool to browse datasets and categorise variables.

Browse metadata

What datasets are available? Which datasets fit my research?

The tool summarises datasets and their tables, and displays how many variables within each table have descriptions.

example bar plot showing number of variables for each table alongside counts of whether variables have missing descriptions

Map metadata

Which variables align with my research domains?
(e.g. socioeconomic, childhood adverse events, diagnoses, culture and community)

After browsing, users can categorise each variable into predefined research domains. To speed up this manual process, the function automatically categorises frequently used variables (e.g. ID, Sex, Age). The function also accounts for variables that appear across multiple tables and allows users to copy their categorisations to ensure consistency. The output files can be used in later analyses to filter and visualise variables by category.

Getting started with `browseMetadata`

Installation and set-up

Run in the R console:

install.packages("devtools")
devtools::install_github("aim-rsf/browseMetadata")

Load the library:

library(browseMetadata)

Set your working directory to an empty folder:

setwd("/Users/your-username/test-browseMetadata")

Demo (using the `R Studio` IDE)

Fo a longer more detailed demo, see the Getting Started page on the package website.

There are four main functions you can interact with: browse_metadata(), map_metadata(), map_metadata_compare(), and map_metadata_convert(). For more information on any function, type ?function_name. For example: ?browse_metadata.

`browse_metadata()`

This function is easy to run and doesn’t require user interaction. Run it in demo mode using the demo JSON file located in the inst/inputs directory:

browse_metadata()

Upon success, you should see:

ℹ Three outputs have been saved to your output directory.
ℹ Open the two HTML files in your browser for full-screen viewing.

The output files are saved to your working directory. You can change the save location by adjusting the output_dir argument. Examples of outputs are available in inst/outputs.

`map_metadata()`

Use the outputs from browse_metadata() as a reference when running map_metadata().

To run the mapping function in demo mode, use:

map_metadata()

In demo mode, the function processes only the first 20 variables from selected tables. Follow the on-screen instructions, and categorise variables into research domains, using the Plot tab as your reference. The demo will simplify domains for ease of use; in a real scenario, you can define more specific domains.

Upon completion, your categorisations, session log, and a summary plot will be saved in your output directory.

Using a custom metadata input (recommended)

You can run map_metadata() and browse_metadata() using a custom JSON file instead of the demo input:

new_json_file <- "path/your_new_json.json"
demo_domains_file <- system.file("inputs/domain_list_demo.csv", package = "browseMetadata")

browse_metadata(json_file = new_json_file)
map_metadata(json_file = new_json_file, domain_file = demo_domains_file)

Currently, the recommended way of retrieving these metadata JSON files is to download them from Metadata Catalogue. Navigate to the Data Model page of interest and use the drop down button to select the JSON format to download.

Using a custom domain list input (recommended)

You can replace the default demo domains with research-specific domains. Remember any domain file input will have Codes 0,1,2 and 3 automatically appended to the start of the domain list, so do not include these in your domain list.

Using a custom lookup table input (advanced)

The lookup table governs the automatic categorisations. If you modify the default lookup file, ensure that all domain codes in the lookup file are also included in your domain file for valid outputs.

Tips and future steps

You can process a subset of variables in one session and complete the rest later.
If you’re processing multiple tables, save all outputs in the same directory to enable table copying. This feature will speed up categorisation and ensure consistency.
You can compare categorisations across researchers using the map_metadata_compare() function.
Use the output file from the map_metadata() function as input for subsequent analysis to filter and visualise variables by research domain.

License

This project is licensed under the GNU General Public License v3.0 - see the LICENSE file for details.
For more information, refer to GNU General Public License.

Citation

To cite browseMetadata in publications:

Stickland R (2024). browseMetadata: browse and categorise health metadata. R package version 2.0.1.

A BibTeX entry for LaTeX users:

  @Manual{,
    title = {browseMetadata: browse and categorise health metadata},
    author = {Rachael Stickland},
    year = {2024},
    note = {R package version 2.0.1},
    doi = {https://doi.org/10.5281/zenodo.10581499}, 
  }

Contributing

We welcome contributions to browseMetadata. Please read our Contribution Guidelines for details on how to contribute.

Report Issues: Found a bug? Have a feature request? Report it on GitHub Issues.
Submit Pull Requests: Follow our Contribution Guidelines for pull requests.
Feedback: Share your thoughts by opening an issue.

Contributors ✨

Thanks go to these wonderful people (emoji key):

_{Rachael Stickland}
🖋 📖 🚧 🤔 📆 👀

_{Batool Almarzouq}
📓 👀 🤔 📆 📖

_{Mahwish Mohammad}
📓 👀 🤔

_{Daniel Delbarre}
🤔 📓

_NidaZiaS
🤔

This project follows the all-contributors specification. Contributions of any kind are welcome!

Acknowledgements ✨

Thanks to the MELD-B research project, the SAIL Databank team, and the Health Data Research Innovation Gateway for ideas, feedback, and hosting open metadata.

This project is funded by the NIHR [Artificial Intelligence for Multiple Long-Term Conditions (AIM) programme (NIHR202647). The views expressed are those of the author(s) and not necessarily those of the NIHR or the Department of Health and Social Care.

browseMetadata

Table of Contents

What is the browseMetadata package?