
This function annotates a results table from a test cohort with information from the PGRM.

Usage

annotate_results(
  results,
  use_allele_dir = TRUE,
  ancestry = "all",
  build = "hg19",
  phecode_version = "V1.2",
  calculate_power = TRUE,
  annotate_CI_overlap = TRUE,
  LOUD = TRUE
)

Arguments

results

A data frame (or data.table) with the results from a test cohort. It must contain columns for SNP, phecode, cases, controls, odds_ratio, and P. See the demo data sets included in the package (e.g. results_MGI) for an example.
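As a rough illustration of the expected shape, the sketch below builds a small results table by hand in base R. All values are invented, and the SNP identifier format is an assumption; check results_MGI for the format the package actually expects. The L95 and U95 columns are only needed when annotate_CI_overlap is TRUE.

```r
# Hypothetical toy input; every value here is made up for illustration.
# The SNP string format is an assumption, not taken from the package.
toy_results <- data.frame(
  SNP        = c("1:1000:A:G", "2:2000:C:T"),
  phecode    = c("250.2", "411.4"),
  cases      = c(1200L, 800L),
  controls   = c(6000L, 4000L),
  odds_ratio = c(1.15, 0.92),
  P          = c(0.003, 0.21),
  L95        = c(1.05, 0.81),   # lower bound of the 95% CI
  U95        = c(1.26, 1.05),   # upper bound of the 95% CI
  stringsAsFactors = FALSE
)

# Confirm that the columns named in the argument description are present
required <- c("SNP", "phecode", "cases", "controls", "odds_ratio", "P")
stopifnot(all(required %in% names(toy_results)))
```

A check like the last two lines can be run on a real results table before passing it to annotate_results().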

use_allele_dir

If TRUE, the direction of effect is used when assessing whether an association is replicated. To use this argument, odds ratios must be reported for the alternative allele.

ancestry

A string specifying the ancestry of the PGRM used to annotate the results. Options: EAS, EUR, AFR, SAS, AMR, ALL. Default is ALL (written as "all" in Usage).

build

A string indicating the genome reference build used in the results table. Options hg19, hg38. Default is hg19.

phecode_version

A string indicating the phecode version used in the results table. (Currently only V1.2 is supported)

calculate_power

If TRUE, power calculations are conducted using the case and control counts from the results file. Necessary for get_AER(). Default is TRUE.

annotate_CI_overlap

If TRUE, a column called CI_overlap is added to the table, with one of the following values:

  • overlap: the 95% CIs of the PGRM and the test cohort overlap

  • test_cohort_greater: the 95% CI of the test cohort is greater than that of the PGRM

  • PGRM_greater: the 95% CI of the PGRM is greater than that of the test cohort

If annotate_CI_overlap is TRUE, the results must include 95% CIs.
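The three values can be read as an ordinary interval comparison. The sketch below is one way to express that comparison in base R; classify_ci_overlap() is a hypothetical helper written for illustration, not a function from the package, and it assumes both intervals are oriented to the same allele.

```r
# Hypothetical helper (not part of pgrm): compare two 95% CIs.
# Assumes the catalog and test-cohort intervals refer to the same allele.
classify_ci_overlap <- function(cat_L95, cat_U95, test_L95, test_U95) {
  ifelse(
    test_L95 > cat_U95, "test_cohort_greater",      # test CI lies entirely above
    ifelse(cat_L95 > test_U95, "PGRM_greater",      # catalog CI lies entirely above
           "overlap")                               # otherwise the intervals overlap
  )
}

# Three made-up associations, one for each outcome
ci_labels <- classify_ci_overlap(
  cat_L95  = c(1.10, 1.10, 1.50),
  cat_U95  = c(1.30, 1.30, 1.80),
  test_L95 = c(1.20, 1.40, 1.05),
  test_U95 = c(1.45, 1.70, 1.30)
)
print(ci_labels)
#> [1] "overlap"             "test_cohort_greater" "PGRM_greater"
```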

LOUD

If TRUE, progress information is printed to the terminal. Default is TRUE.

Value

A data.table of the results annotated with columns from the PGRM.

Details

This function takes a data frame with summary statistics from a test cohort. For an example of how to format the results data frame, see one of the result sets included in the package (e.g. results_MGI). (NOTE: If the direction of effect is used to determine whether an association is replicated, then the odds ratios of the result set must be oriented to the alternative allele.)

The function returns a data.table with the following annotations:

  • Phecode information, including phecode_string and phecode_category

  • Risk allele frequency from gnomAD (column RAF), for the ancestry specified by the ancestry argument

  • The rsID

  • The direction of effect (ref or alt) and risk allele of the original association

  • Summary statistics from the GWAS catalog association, including the -log10(P), odds ratio, and 95% confidence intervals (cat_LOG10_P, cat_OR, cat_L95, cat_U95)

  • The study accession ID from the GWAS catalog

  • A column called powered, which is 1 or 0, indicating whether the test association has power > 80%. If calculate_power==TRUE, the power is determined by the case/control counts specified in the results data.table. Otherwise, it is derived from a pre-computed estimate of the cases needed, assuming a 1:5 case:control ratio. All power calculations use alpha=0.0

  • A column called rep that indicates whether the association is replicated (i.e. p<0.05 in the test cohort; if use_allele_dir==TRUE, the direction of effect from the test cohort must also be consistent with what is reported in the catalog)

  • If annotate_CI_overlap is TRUE, information about the relationship between the 95% CIs from the catalog and the test set is included in column CI_overlap, and new columns for odds_ratio, L95, and U95 are created (rOR, rL95, rU95) that are oriented to the risk allele. (This option requires that confidence intervals are reported in the test cohort summary statistics.)
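To make the rep rule and the risk-allele orientation concrete, here is a minimal base R sketch. Both helpers are hypothetical and are not the package's implementation. The sketch assumes that re-expressing an odds ratio for the opposite allele means taking its reciprocal (which also swaps the CI bounds), and that a direction "consistent with the catalog" means the risk-allele-oriented odds ratio is above 1.

```r
# Hypothetical helper: re-express an odds ratio and its CI for the other allele.
# When flip is TRUE the OR is inverted and the CI bounds swap places.
flip_to_risk_allele <- function(or, l95, u95, flip) {
  data.frame(
    rOR  = ifelse(flip, 1 / or,  or),
    rL95 = ifelse(flip, 1 / u95, l95),
    rU95 = ifelse(flip, 1 / l95, u95)
  )
}

# Hypothetical helper: replication rule as described in the list above.
# Assumption: "consistent direction" is taken to mean rOR > 1.
is_replicated <- function(p, r_or, use_allele_dir = TRUE) {
  significant <- p < 0.05
  if (use_allele_dir) significant & (r_or > 1) else significant
}

# Two made-up associations; the first needs flipping, the second does not
oriented <- flip_to_risk_allele(
  or   = c(0.80, 1.20),
  l95  = c(0.70, 1.05),
  u95  = c(0.95, 1.40),
  flip = c(TRUE, FALSE)
)
rep_flag <- is_replicated(p = c(0.01, 0.20), r_or = oriented$rOR)
print(rep_flag)
#> [1]  TRUE FALSE
```

After flipping, the lower bound is still below the point estimate and the upper bound above it, which is a quick sanity check that the bounds were swapped correctly.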

Examples

library(pgrm)

## annotate the BioVU African ancestry result set
anno = annotate_results(results_BioVU_AFR, ancestry = 'AFR', build = 'hg19', calculate_power = TRUE)
#> [1] "Doing power calculations"

## Get the replication rate of associations powered at >80%
get_RR(anno)
#> Replicated 11 of 14 for RR=78.6%
#> [1] 0.7857143

## Get the replication rate of all associations
get_RR(anno, include = 'all')
#> Replicated 19 of 31 for RR=61.3%
#> [1] 0.6129032

## Get the actual:expected ratio
get_AER(anno)
#> Expected 20.2, replicated 19 for AE=0.94 (31 associations for 14 uniq phecodes)
#> [1] 0.9402914
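The figures printed above can be checked by hand. The sketch below uses only the counts shown in the output and does not call the package; it assumes the actual:expected ratio is the replicated count divided by the expected count, as the printed message suggests.

```r
# Counts copied from the printed output above
rr_powered <- 11 / 14   # replicated / powered associations
rr_all     <- 19 / 31   # replicated / all associations

sprintf("%.1f%%", 100 * rr_powered)
#> [1] "78.6%"
sprintf("%.1f%%", 100 * rr_all)
#> [1] "61.3%"

# Assumption: AER = replicated / expected. The expected count implied by
# the returned ratio rounds to the 20.2 shown in the message.
implied_expected <- 19 / 0.9402914
round(implied_expected, 1)
#> [1] 20.2
```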