Mack, et al. (Breslow-Day)
BresMack.text : is the documentation for Appendix III of Breslow and Day (1980), which presents data from the matched case-control study of endometrial cancer described in Mack et al. (1976). This file was downloaded in the fall of 1999 from Norman Breslow's web site at the Department of Biostatistics of the University of Washington. It describes the variables and lists corrections to the descriptions in Breslow and Day.
BresMack.txt : is the data set downloaded from Norman Breslow's web site. The programs described in Chapters 5 and 7 use this data set.
Hosmer-Lemeshow Low Birthweight Data
HosLem.sas : generates the data set used as the basis for the analyses in Tables 7.8 and 7.9 of the text. The original source of the data was a table in the SAS Technical Report P-229, pp. 465-6. The data were scanned and used in this program to generate the data set HLData.dat that is used in the program HLBwt.sas for the analyses shown in Chapter 7. Because the data were scanned from a secondary source, the data and analyses may differ from those shown by Hosmer and Lemeshow.
DCCT Hypoglycemia
dccthypo.txt : is a flat (text) file used in Examples 8.1-8.3. It is the basis for the data presented in Table 8.1; see the program Table81.sas in Chapter 8.
Fleming-Harrington CGD Data
FH-CGD.txt : Fleming and Harrington (1991) present the data from a clinical trial of gamma interferon versus placebo in the treatment of children with chronic granulomatous disease (CGD) to reduce the incidence of recurrent pyogenic infections. The data set includes multiple records for each subject to record the time of each successive infection or the date of right censoring. The variables include
id | the patient ID
RDT | the date of randomization into the study, in mmddyy format
IDT | either the date of onset of a serious infection, or the date follow-up ended
Z1 | treatment group: interferon (1) versus placebo (2)
Z2 | Inheritance pattern: X-linked (1) versus autosomal recessive (2)
Z3 | Age (years)
Z4 | Height (cm)
Z5 | Weight (kg)
Z6 | Corticosteroid use on entry: yes (1) versus no (2)
Z7 | Antibiotic use on entry: yes (1) versus no (2)
Z8 | Gender: male (1) versus female (2)
Z9 | Type of hospital: NIH (1), other US (2), Amsterdam (3), other European (4)
T1 | elapsed time from randomization to the value of IDT in the current observation, i.e. the time to an infection or the number of days of follow-up (IDT-RDT)
T2 | the start time for the current interval of risk, either 0 for the first record of each subject, or the time IDT+1 from the previous infection time for that subject
d | indicator (1) for an infection at the date IDT or (2) for censoring at that time (end of follow-up)
S | the sequence number for the current infection (if any) for this subject. This is used for analyses of recurrent events in Chapter 9.
Note: FHcnt.txt is a data set created by FHcnt.sas that has one record per subject containing the additional variables nevents, the number of severe infections experienced, and futime, the number of days of follow-up. This is used for analyses using Poisson regression in Chapter 8.
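The step from the multiple-record file to a one-record-per-subject file can be sketched as follows. This is a minimal Python illustration, not a translation of FHcnt.sas: the function name and the sample records are hypothetical, and it assumes that follow-up time can be taken as the largest T1 value recorded for a subject.

```python
from collections import defaultdict


def collapse_to_counts(records):
    """Collapse (id, T1, d) records to one summary per subject.

    d == 1 marks an infection and d == 2 marks censoring, following the
    coding described for FH-CGD.txt.
    """
    summary = defaultdict(lambda: {"nevents": 0, "futime": 0})
    for subject_id, t1, d in records:
        row = summary[subject_id]
        if d == 1:
            row["nevents"] += 1
        # Assumption: follow-up ends at the largest elapsed time seen.
        row["futime"] = max(row["futime"], t1)
    return dict(summary)


# Hypothetical records, not taken from FH-CGD.txt.
records = [
    (1, 120, 1),  # first infection for subject 1
    (1, 250, 1),  # second infection
    (1, 400, 2),  # censored at end of follow-up
    (2, 365, 2),  # subject 2 had no infection
]
counts = collapse_to_counts(records)
```

Each value in `counts` then holds the two quantities a Poisson regression needs: the event count and the exposure time.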
Lagakos Squamous Cell Carcinoma
Lagakos.sas reads the data from Lagakos (1978) and creates a SAS data set that is used for the analyses in Chapter 9. This job should be run on your platform to create the SAS data set. The data set was originally used by Lagakos to describe an approach to the analysis of competing risks, there being two modes or causes of failure (spread of disease) - metastatic versus not. For the analyses herein, however, a single outcome is employed - spread of disease of any cause.
DCCT Nephropathy (Microalbuminuria) Data
nephdata.txt contains data related to the onset of microalbuminuria in the DCCT. These data are used for simple survival analyses as presented in Example 9.2. The data set, however, contains additional variables that could be used for supplemental exercises. See DCCTneph.sas. The variables in the data set are
Patient | ID number (a dummy number to mask the patient's identity)
int | for intensive (1) versus conventional (0) treatment
primary | for primary prevention cohort (1) versus secondary intervention cohort (0)
etdpatb | the baseline ETDRS grade of retinopathy severity (see DCCT, 1995)
neur | for neuropathy present on entry (1) versus not (0)
aer0 | the entry level of albumin excretion rate (mg/24 h)
neph2flg | the indicator for the development of microalbuminuria during the study (1) versus censored
neph2vis | the quarterly visit number at which microalbuminuria was first observed or the last observation visit
duration | the duration of diabetes on entry, in months
female | (1) versus male (0)
age | in years
adult | (1) (>17 years of age) versus adolescent (0)
bcval5 | the entry level of stimulated C-peptide, a measure of residual endogenous insulin secretory function
hbael | the baseline level of HbA1c
bmi | a measure of obesity calculated as weight/(height**2)
mhba1-mhba9 | an array of variables that represent the current mean HbA1c over the period since randomization up to the current annual visit (1-9)
DCCT Hypoglycemia Recurrent Event Data
Due to their size, the four data sets in this section are provided as a single SAS export file. You can download this file as an uncompressed file (17.6 Mbytes), as a gzip-compressed file (952 Kbytes), or as a zip-compressed file (938 Kbytes). Please run the program impthypo.sas to generate the following SAS data sets on your platform.
Dataset hyevents
Contains one record per hypoglycemia event for each subject. The variables are
ETIME | the day number since randomization when an event occurred (missing if no event in this observation)
EVENT | an indicator for whether an event occurred at this time (1=yes, .=no)
EVENTDAY | the calendar date of the event in MMDDYY8. format
EVNUM | the cumulative event number since randomization
FTIME | the total follow-up time of the subject
INTGROUP | an indicator for intensive (1) versus conventional (2) treatment group
NEVENTS | the total number of events for this subject
PATIENT | the patient ID number (masked)
RANDSAS | the calendar date of randomization into the study in MMDDYY6. format
Dataset hytimes
This data set contains a single observation with six sets of array variables:
MAXJ | is the number of elements in the array that equals the total number of distinct event times in the data set (1565 in this case)
T1-T1565 | are the times at which events occurred
XE1-XE1565 | are the numbers of events in the intensive (experimental) group at each time
XC1-XC1565 | are the numbers of events in the conventional group at each time
YE1-YE1565 | are the numbers of subjects at risk in the intensive (experimental) group at each time
YC1-YC1565 | are the numbers of subjects at risk in the conventional group at each time
Y1-Y1565 | are the total numbers of subjects at risk in both groups at each time
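One way such event and at-risk arrays can be used is to accumulate the ratio of events to subjects at risk over the ordered event times, a Nelson-Aalen-type estimate of the cumulative intensity in each group. The Python sketch below illustrates that idea only; it is not taken from the programs for the text. The function name is hypothetical, and the three-element lists are made-up stand-ins for T1-T1565, XE1-XE1565, YE1-YE1565, XC1-XC1565 and YC1-YC1565.

```python
def cumulative_intensity(events, at_risk):
    """Return running sums of events[j] / at_risk[j] over the event times."""
    total = 0.0
    running = []
    for x, y in zip(events, at_risk):
        if y > 0:  # skip times with nobody at risk to avoid dividing by zero
            total += x / y
        running.append(total)
    return running


# Hypothetical values, not taken from hytimes.
times = [5, 9, 14]
xe, ye = [1, 0, 2], [10, 10, 8]   # intensive group: events, at risk
xc, yc = [0, 1, 1], [10, 10, 10]  # conventional group: events, at risk

intensive = cumulative_intensity(xe, ye)
conventional = cumulative_intensity(xc, yc)

# The Y array described above is the total at risk in both groups.
y_total = [a + b for a, b in zip(ye, yc)]
```

Storing the counts as parallel arrays indexed by event time means each group's estimate needs only a single pass over the MAXJ elements.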
Dataset hypomimi
Contains DCCT intensive group recurrent hypoglycemia event observations with time dependent covariate data as described in Example 9.12. Each observation is defined in terms of start and stop times, the associated time dependent covariate (mhba) and the number of events at the stop time, if any. The covariates in the data set are
ADULT | Adult >=18 (0=no/1=yes)
AER00 | Albumin Excretion Rate (mg/24hr) at Baseline
AGE | Age at entry
BMI | Body Mass Index (kg/m**2)
CALORIES | Calories (kcal) per day
CPEPTIDE | Stimulated C-Peptide(pmol/ml)
DURATION | Duration of IDDM (months) at Baseline
EDUCAT | Mean Education (Years) - Form 013
FAMIDDM | Family History of IDDM (0=no/1=yes)
FEMALE | Female (0=no/1=yes)
FULLIQ | Full Scale IQ
GROUP | treatment group 'EXPERIMENTAL' for intensive or 'STANDARD' for conventional
HBAEL | HbA1c at Eligibility
HDL | HDL Cholesterol (serum,mg/dl)
HYPOFLG | 1 if had a hypoglycemia event at this time
INSULIN | Total Insulin Dosage Units/Weight (kg)
LAER00 | Log of Baseline AER
LDL | LDL Cholesterol (serum,mg/dl)
LHBA1C | time dependent Log of the current mean HbA1c since randomization
LHBAEL | Log of HbA1c at Eligibility
MARRIED | Marital Status (0=NOT Married,1=Married)
MBP | Mean Arterial Pressure
NEUR0FLG | Clinical Neuropathy at Baseline (0,1)
NPRIOR | time dependent cumulative number of hypoglycemia events since randomization prior to the current interval
PATIENT | Patient ID number (masked)
PHASE2 | Randomization in phase 2 (1) versus phase 3 (0)
PRIMARY | Base retinopathy strata (0=Scnd, 1=Prim) for primary versus secondary cohort
PROTEIN | Dietary Protein (gm)
RET20FLG | Baseline ETDRS 20/20 (0=No,1=Yes)
RET35FLG | Baseline ETDRS 35/<=35 (0=No,1=Yes)
RET43FLG | Baseline ETDRS 43/<43 + (0=No,1=Yes)
RETBASE | Baseline Retinopathy Strata 'PRIM' for primary or 'SCND' for secondary
SMOKER | Smoking Status at Baseline (0=No,1=Yes)
STARTS | Start of Interval (in study time)
STOPS | End of Interval (in study time)
TRG | Triglycerides (serum,mg/dl)
WPMEAN | Within-Profile Mean Blood Glucose(mg/dl)
All covariates are DCCT baseline values except lhba1c and nprior, which are time dependent covariates.
Dataset hypomimc
Contains DCCT conventional group recurrent hypoglycemia event observations with time dependent covariate data as described in Example 9.12. Each observation is defined in terms of start and stop times, the associated time dependent covariate (mhba) and the number of events at the stop time, if any. The variables in the data set are the same as those described above.
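The start and stop layout used in hypomimi and hypomimc can be illustrated with a short Python sketch. It is a simplified assumption of how such records might be built for one subject, not the procedure used to create these data sets. It splits follow-up only at event times, whereas the actual files also carry an updated time dependent covariate. The function name and the example values are hypothetical, and distinct event days are assumed.

```python
def start_stop_records(event_days, followup_days):
    """Build counting-process intervals for one subject.

    Each record covers (STARTS, STOPS], flags whether an event occurred
    at STOPS, and carries the number of events before the interval.
    """
    records = []
    start = 0
    nprior = 0
    for day in sorted(event_days):
        records.append(
            {"STARTS": start, "STOPS": day, "HYPOFLG": 1, "NPRIOR": nprior}
        )
        start = day
        nprior += 1
    if start < followup_days:
        # Final interval with no event, ending at the close of follow-up.
        records.append(
            {"STARTS": start, "STOPS": followup_days, "HYPOFLG": 0, "NPRIOR": nprior}
        )
    return records


# Hypothetical subject: events on days 30 and 75, followed for 100 days.
intervals = start_stop_records([30, 75], 100)
```

For this hypothetical subject the function yields three intervals, with NPRIOR rising from 0 to 2 as events accumulate.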
Veterans Administration Cooperative Urological Research Group
VACURG85.txt presents the data from the VACURG study of prostate cancer described by Byar in the book edited by Andrews and Herzberg (1985) which gives the variable descriptions. These data have been used by many, including Thall and Lachin (1986). The data are also available from StatLib in a slightly different format as Table46.dat of the file Andrews. The variables included are
patid | patient number
rx | treatment group 1=placebo, 2=0.2 mg. estrogen, 3=1 mg., 4=5 mg.
mosfu | months of follow-up
age | in years, 89 = >88
pf | performance status 0=normal, 1=in bed <50% of the time, 2=in bed 50-<100% of the time, 3=confined to bed
sbp | systolic blood pressure/10 mm Hg (e.g. 118 recorded as 12)
ekg | EKG 0=normal, 1=benign, 2=rhythmic disturbances and electrolyte changes, 3=heart blocks or conduction defects, 4=heart strain, 5=old myocardial infarct (MI), 6=recent MI
sz | tumor size in cm**2 (0=none palpable)
stage | tumor stage
startm startd starty | randomization date (month, day, year)
ap | alkaline phosphatase in King-Armstrong units *10, a measure of liver function
status | survival status 0=alive, 1=dead from prostate cancer, 2=dead from heart/vascular disease, 3=dead from cerebrovascular disease, 4=dead from pulmonary embolus, 5=dead from other cancer, 6=dead from respiratory disease, 7=dead from other specific non-cancer cause, 8=dead from unspecified non-cancer cause, 9=dead from unknown cause
wt | weight index = weight (kg) - height (cm) + 200
hx | history of cardiovascular disease 0=no, 1=yes
dbp | diastolic blood pressure/10 mm Hg
hg | serum hemoglobin in g/100 ml *10
sg | combined index of tumor stage and histology grade
bm | bone metastases 0=no, 1=yes
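Several VACURG variables are stored in scaled or coded form, so a small decoding step is often useful before analysis. The Python sketch below is illustrative only: the function name, the output field names and the example record are hypothetical, and the rescaling simply follows the descriptions above (blood pressures stored divided by 10, hemoglobin and phosphatase stored multiplied by 10).

```python
def decode_vacurg(row):
    """Decode the coded and scaled fields of one VACURG record."""
    return {
        "died": row["status"] != 0,            # any cause of death
        "prostate_death": row["status"] == 1,  # death from prostate cancer
        "sbp_mmhg": row["sbp"] * 10,           # stored as mm Hg / 10
        "dbp_mmhg": row["dbp"] * 10,           # stored as mm Hg / 10
        "hg_g_per_100ml": row["hg"] / 10,      # stored as g/100 ml * 10
        "ap_ka_units": row["ap"] / 10,         # stored as King-Armstrong units * 10
    }


# Hypothetical record, not taken from VACURG85.txt.
row = {"status": 1, "sbp": 12, "dbp": 8, "hg": 134, "ap": 7}
decoded = decode_vacurg(row)
```

Because blood pressure was recorded after dividing by 10, the decoded values recover the original reading only to the nearest 10 mm Hg.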