Details of the purpose and any published outputs from this project can be found at the link above.
The contents of this repository MUST NOT be considered an accurate or valid representation of the study or its purpose. This repository may reflect an incomplete or incorrect analysis with no further ongoing work. The content has ONLY been made public to support the OpenSAFELY open science and transparency principles and to support the sharing of re-usable code for other subsequent users. No clinical, policy or safety conclusions must be drawn from the contents of this repository.
Details of the purpose of each file in this repository are outlined below.
- The most recent version of the protocol can be found in the `protocols` folder.
- The `codelists` folder contains `codelists/codelists.txt`, a full list of all codelists used in this project. The .csv file for each of those codelists can also be found in this folder.
- The following scripts are in the `analysis` directory:
  - The `dataset_definition` directory contains the scripts for generating the initial raw dataset for the project.
    - `codelists.py` loads each codelist that is needed.
    - `create_variables.py` contains functions for creating variables for the inclusion/exclusion criteria, covariates and outcome variables.
    - The `dataset_definition_*.py` files define the study population and analysis variables for different stages of the pipeline. Each script generates a specific dataset using ehrQL, in some cases building on outputs from earlier data generation or cleaning steps.
    - `variable_helper_functions.py` contains helper functions used by the other scripts in this directory.
    - `study_dates.py` defines the study dates that are loaded and used at various points in the repository.
  - The `dataset_clean` directory contains the scripts for cleaning the raw dataset into an analysis-ready dataset.
    - The `dataset_clean_*.R` scripts clean and process the datasets generated by the `dataset_definition_*.py` files at different stages of the pipeline. These scripts apply consistent data cleaning, derive additional variables, and produce analysis-ready datasets and flow outputs, often passing cleaned data forward to later steps (e.g. matching or final analysis).
    - The function in `fn-modify_dummy.R` is called first and alters the dummy data generated by OpenSAFELY to something close to what we expect the population dataset to look like.
    - `fn-preprocess.R` is the function that carries out initial preprocessing, formatting columns correctly.
    - `fn-qa.R` is the quality assurance function.
    - The function in `fn-inex.R` applies the inclusion and exclusion criteria for the study.
    - `fn-ref.R` is the function that sets the reference levels for factors.
- The
- The
- `utility.R` contains miscellaneous functions used in the analysis.
- The `project.yaml` file lists all actions to be run in OpenSAFELY and their run order.
- `create_table1.R` generates a CSV file for table one for the study. This uses the output of `dataset_clean.R` to describe the patient characteristics, displaying the proportion of study population in each variable category.
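
To illustrate the kind of summary that table one holds, here is a minimal Python sketch that computes the count and proportion of patients in each category of each variable. It is not the code in `create_table1.R`: the function name `category_proportions`, the column names (`sex`, `age_band`) and the example rows are hypothetical placeholders, and the real script may round or redact counts differently.

```python
from collections import Counter


def category_proportions(rows, variables):
    """Return one summary row per (variable, category) with count and proportion.

    `rows` is a list of dicts (one per patient) and `variables` names the
    columns to summarise. Both are hypothetical stand-ins for the cleaned dataset.
    """
    total = len(rows)
    summary = []
    for variable in variables:
        counts = Counter(row[variable] for row in rows)
        for category in sorted(counts):
            summary.append(
                {
                    "variable": variable,
                    "category": category,
                    "n": counts[category],
                    # Proportion of the whole study population in this category.
                    "proportion": counts[category] / total if total else 0.0,
                }
            )
    return summary


# Hypothetical example data, not taken from the study.
patients = [
    {"sex": "female", "age_band": "40-59"},
    {"sex": "male", "age_band": "40-59"},
    {"sex": "female", "age_band": "60-79"},
    {"sex": "female", "age_band": "40-59"},
]

table1 = category_proportions(patients, ["sex", "age_band"])
for line in table1:
    print(line)
```

Each dictionary in `table1` corresponds to one row of the output file; writing the rows out with `csv.DictWriter` would give a CSV of the same shape.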
Suffixes used in dataset scripts
The `dataset_definition_*.py` and `dataset_clean_*.R` files use consistent suffixes to indicate the dataset or stage they relate to:
- `_inex*` - dataset containing inclusion/exclusion and QA variables.
- `_hist` - dataset containing variables for generating medication gap histograms.
- `_match` - dataset prepared for matching.
- `_matched*` - dataset after matching.
These suffixes help track the flow of data through the pipeline, showing which datasets are inputs to later cleaning, matching, or analysis steps.
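
As a rough illustration of the naming convention, the Python sketch below maps a script filename to the stage its suffix indicates. The helper `stage_for` and the example filenames are hypothetical and not part of this repository. The sketch assumes that suffixes written with a trailing `*` (`_inex*`, `_matched*`) may be followed by further text, while `_hist` and `_match` must match exactly.

```python
from pathlib import Path

# (suffix, allows trailing text, description). The boolean marks suffixes
# that the README writes with a trailing "*"; these are matched as prefixes.
STAGES = [
    ("_inex", True, "inclusion/exclusion and QA variables"),
    ("_hist", False, "variables for medication gap histograms"),
    ("_match", False, "dataset prepared for matching"),
    ("_matched", True, "dataset after matching"),
]

BASE_NAMES = ("dataset_definition", "dataset_clean")


def stage_for(filename):
    """Return the stage description for a dataset script, or None if unrecognised."""
    stem = Path(filename).stem  # drop the .py or .R extension
    for base in BASE_NAMES:
        if stem.startswith(base):
            suffix = stem[len(base):]
            break
    else:
        return None  # not a dataset_definition_* or dataset_clean_* script
    for prefix, wildcard, description in STAGES:
        if suffix == prefix or (wildcard and suffix.startswith(prefix)):
            return description
    return None


# Hypothetical filenames used only to exercise the helper.
for name in ["dataset_definition_inex.py", "dataset_clean_match.R", "dataset_clean_matched_final.R"]:
    print(name, "->", stage_for(name))
```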
The OpenSAFELY framework is a Trusted Research Environment (TRE) for electronic health records research in the NHS, with a focus on public accountability and research quality.
Read more at OpenSAFELY.org.
As standard, research projects have an MIT license.