Abstract
A semiquantitative risk factor has 2 components: any exposure (yes/no) and the quantitative amount of exposure (if exposed). We describe the statistical properties of alternative analyses with such a risk factor using linear, logistic, or Cox proportional hazards models. Often analyses employ the amount exposed as a single quantitative covariate, including the nonexposed with value zero. However, this analysis provides a biased estimate of the exposure coefficient (slope) and we describe the magnitude of the bias. This bias can be eliminated by adding a binary covariate for exposed versus not to the model. This 2-factor analysis captures the full risk-factor effect on the outcome. However, the coefficient for any exposure versus not does not have a meaningful interpretation. Alternatively, when exposure values among those exposed are centered (by subtracting the mean), the estimate of this coefficient represents the difference in the outcome between those exposed versus not in aggregate. We also show that the biased model provides biased estimates of the coefficients for other covariates added to the model. Proper analysis of a semiquantitative risk factor should start with a 2-factor model, with centering, to assess the joint contributions of the 2 components of the risk-factor exposure. Properties of models were illustrated using data from a multisite study in North America (1983–2019).
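Illustrative sketch (not from the article): the linear-model case described in the abstract can be checked with a small simulation. The data-generating values below (50% exposed, amounts uniform on 1 to 10, a jump of 2.0 for any exposure, a slope of 0.5 per unit of amount) and the helper names `fit_ols` and `simulate` are assumptions chosen for illustration only; they are not the study data or the authors' code. The logistic and Cox cases are not covered here.

```python
import numpy as np


def fit_ols(columns, y):
    """Ordinary least squares with an intercept; returns [intercept, coef_1, ...]."""
    design = np.column_stack([np.ones(len(y))] + list(columns))
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def simulate(n=5000, seed=0):
    """Hypothetical semiquantitative exposure: a yes/no part and an amount if exposed."""
    rng = np.random.default_rng(seed)
    exposed = (rng.random(n) < 0.5).astype(float)
    amount = np.where(exposed == 1, rng.uniform(1.0, 10.0, n), 0.0)
    # Assumed truth: a jump of 2.0 for any exposure plus a slope of 0.5 per unit.
    y = 1.0 + 2.0 * exposed + 0.5 * amount + rng.normal(0.0, 1.0, n)
    return exposed, amount, y


exposed, amount, y = simulate()

# 1-factor model: amount only, with the nonexposed coded as zero.
one_factor = fit_ols([amount], y)

# 2-factor model: binary exposure indicator plus amount.
two_factor = fit_ols([exposed, amount], y)

# 2-factor model with the amount centered among the exposed only.
mean_amount_exposed = amount[exposed == 1].mean()
centered_amount = np.where(exposed == 1, amount - mean_amount_exposed, 0.0)
centered = fit_ols([exposed, centered_amount], y)

# Aggregate difference in mean outcome, exposed versus not exposed.
diff_means = y[exposed == 1].mean() - y[exposed == 0].mean()

print("slope, 1-factor model:        ", one_factor[1])
print("slope, 2-factor model:        ", two_factor[2])
print("exposure coefficient, centered:", centered[1])
print("difference in group means:    ", diff_means)
```

Under these assumptions the 1-factor slope absorbs part of the jump between the nonexposed and the exposed and so departs from the simulated slope, while the 2-factor slope recovers it. With centering, the coefficient on the exposure indicator coincides with the difference in mean outcome between the exposed and the nonexposed, which matches the interpretation given in the abstract.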
Reference Type
Journal Article
Periodical Full
American Journal of Epidemiology
Publication Year
2020
Publication Date
Dec 1
Volume
189
Issue
12
Start Page
1573
Other Pages
1582
Publisher
Oxford University Press
Place of Publication
United States
ISSN/ISBN
0002-9262
Digital Object Identifier
10.1093/aje/kwaa071
URL
https://www.ncbi.nlm.nih.gov/pubmed/32556076
PMID
32556076