# Biostatistical Methods Preface

**Biostatistical Methods: The Assessment of Relative Risks**

**NO LONGER IN PRINT**

John M. Lachin

John Wiley and Sons, 2000

ISBN: 0-471-36996-9

**PREFACE**

In 1993 to 1994 I led the effort to establish a graduate program in biostatistics at the George Washington University. The program, which I now direct, was launched in 1995 and is a joint initiative of the Department of Statistics, the Biostatistics Center (which I have directed since 1988) and the School of Public Health and Health Services. Biostatistics has long been a specialty of the statistics faculty, starting with Samuel Greenhouse, who joined the faculty in 1946. When Jerome Cornfield joined the faculty in 1972, he established a two-semester sequence in biostatistics (Statistics 225-6) as an elective for the graduate program in statistics (our 200 level being equivalent to the 600 level in other schools). Over the years these courses were taught by many faculty as a lecture course on current topics. With the establishment of the graduate program in biostatistics, however, these became pivotal courses in the graduate program and it was necessary that Statistics 225 be structured so as to provide students with a review of the foundations of biostatistics.

Thus I was faced with the question "what are the foundations of biostatistics?" In my opinion, biostatistics is set apart from other statistics specialties by its focus on the assessment of risks and relative risks through clinical research. Thus biostatistical methods are grounded in the analysis of binary and count data such as in 2x2 tables. For example, the Mantel-Haenszel procedure for stratified 2x2 tables forms the basis for many families of statistical procedures such as the G-rho family of modern statistical tests in the analysis of survival data. Further, all common medical study designs, such as the randomized clinical trial and the retrospective case-control study, are rooted in the desire to assess relative risks. Thus I developed Statistics 225, and later this text, around the principle of the assessment of relative risks in clinical investigations.

In doing so, I felt that it was important first to develop basic concepts and derive core biostatistical methods through the application of classical mathematical statistical tools, and then to show that these and comparable methods may also be developed through the application of more modern, likelihood-based theories. For example, the large sample distribution of the Mantel-Haenszel test can be derived using the large sample approximation to the hypergeometric and the Central Limit Theorem, and also as an efficient score test based on a hypergeometric likelihood.
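As a rough illustration of the first derivation mentioned above, the following Python sketch computes a Mantel-Haenszel chi-square statistic for stratified 2x2 tables from the hypergeometric mean and variance of one cell in each stratum, and refers it to a chi-square distribution on one degree of freedom. The function name, the `(a, b, c, d)` cell layout, and the example counts are assumptions made for illustration only; they are not taken from the text, and no continuity correction is applied.

```python
from math import erfc, sqrt


def mantel_haenszel_test(tables):
    """Mantel-Haenszel chi-square test for a list of 2x2 tables.

    Each table is a tuple (a, b, c, d) laid out as
        [[a, b],
         [c, d]]
    (a hypothetical layout chosen for this sketch).
    Returns (chi_square, p_value); no continuity correction is applied.
    """
    observed = 0.0
    expected = 0.0
    variance = 0.0
    for a, b, c, d in tables:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than 2 subjects carries no information
        row1, row2 = a + b, c + d
        col1, col2 = a + c, b + d
        observed += a
        # Hypergeometric mean and variance of cell a, given fixed margins
        expected += row1 * col1 / n
        variance += row1 * row2 * col1 * col2 / (n * n * (n - 1))
    chi_square = (observed - expected) ** 2 / variance
    # Upper tail of the chi-square distribution with 1 degree of freedom
    p_value = erfc(sqrt(chi_square / 2))
    return chi_square, p_value


# Hypothetical counts for two strata (illustrative only)
strata = [(15, 5, 5, 15), (15, 5, 5, 15)]
chi_square, p_value = mantel_haenszel_test(strata)
print(f"chi-square = {chi_square:.3f}, p = {p_value:.6f}")
```

Summing the observed counts, expectations, and variances across strata before forming the statistic is what lets the Central Limit Theorem argument apply to the pooled difference rather than to each table separately.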

Thus the first five chapters present methods for the analysis of single and multiple 2x2 tables for cross-sectional, prospective and retrospective (case-control) sampling, with and without matching. Both fixed and random effects (two-stage) models are employed. Then, starting in Chapter 6 and proceeding through Chapter 9, a more modern likelihood or model-based treatment is presented. These chapters broaden the scope of the book to include the unconditional and conditional logistic regression models in Chapter 7, the analysis of count data and the Poisson regression model in Chapter 8, and the analysis of event time data including the proportional hazards and multiplicative intensity models in Chapter 9. Core mathematical statistical tools employed in the text are presented in the Appendix. Each chapter is followed by problems that are intended to expose the student to the key mathematical statistical derivations of the methods presented in that chapter, and to illustrate their application and interpretation.

Although the text provides a valuable reference to the principal literature, it is not intended to be exhaustive. For this purpose, readers are referred to any of the excellent existing texts on the analysis of categorical data, generalized linear models and survival analysis. Rather, this manuscript was prepared as a textbook for advanced courses in biostatistics. Thus the course (and book) material was selected on the basis of its current importance in biostatistical practice and its relevance to current methodological research and more advanced methods. For example, Cornfield's approximate procedure for confidence limits on the odds ratio, though brilliant, is no longer employed because we now have the ability to readily perform exact computations. Also, I felt it was more important that students be exposed to over-dispersion and the use of the information sandwich in model-based inference than to residual analysis in regression models. Thus each chapter must be viewed as one professor's selection of relevant and insightful topics.

In my Statistics 225 course, I cover perhaps two-thirds of the material in this text. Chapter 9, on survival analysis, has been added for completeness, as has the section in the Appendix on quasi-likelihood and the family of generalized linear models. These topics are covered in detail in other courses. My detailed syllabus for Statistics 225, listing the specific sections covered and exercises assigned, is available at the Biostatistics Center web site (www.bsc.gwu.edu/jml/biostatmethods). Also, the data sets employed in the text and problems are available at this site or the web site of John Wiley and Sons, Inc. (www.wiley.com).

Although I was not trained as a mathematical statistician, during my career I have learned much from those with whom I have been blessed with the opportunity to collaborate (chronologically): Jerry Cornfield, Sam Greenhouse, Nathan Mantel, and Max Halperin, among the founding giants in biostatistics; and also Robert Smythe, L.J. Wei, Peter Thall, K.K. Gordon Lan and Zhaohai Li, among others, who are among the best of their generation. I have also learned much from my students, who have always sought to better understand the rationale for biostatistical methods and their application.

I especially acknowledge the collaboration of Zhaohai Li, who graciously agreed to teach Statistics 225 during the fall of 1998, while I was on sabbatical leave. His detailed reading of the draft of this text identified many areas of ambiguity and greatly improved the mathematical treatment. I also thank Costas Cristophi for typing my lecture notes, and Yvonne Sparling for a careful review of the final text and programming assistance. I also wish to thank my present and former statistical collaborators at the Biostatistics Center, who together have shared a common devotion to the pursuit of good science: Raymond Bain, Oliver Bautista, Patricia Cleary, Mary Foulkes, Sarah Fowler, Tavia Gordon, Shuping Lan, James Rochon, William Rosenberger, Larry Shaw, Elizabeth Thom, Desmond Thompson, Dante Verme, Joel Verter, Elizabeth Wright, and Naji Younes, among many.

Finally, I especially wish to thank the many scientists with whom I have had the opportunity to collaborate in the conduct of medical research over the past 30 years: Dr. Joseph Schachter, who directed the Research Center in Child Psychiatry where I worked during graduate training; Dr. Leslie Schoenfield, who directed the National Cooperative Gallstone Study; Dr. Edmund Lewis, who directed the Collaborative Study Group in the conduct of the Study of Plasmapheresis in Lupus Nephritis and the Study of Captopril in Diabetic Nephropathy; Dr. Thomas Garvey, who directed the preparation of the New Drug Application for treatment of gallstones with ursodiol; Dr. Peter Stacpoole, who directed the Study of Dichloroacetate in the Treatment of Lactic Acidosis; and especially Drs. Oscar Crofford, Saul Genuth and David Nathan, among many others, with whom I have collaborated since 1982 in the conduct of the Diabetes Control and Complications Trial, the study of the Epidemiology of Diabetes Interventions and Complications, and the Diabetes Prevention Program. The statistical responsibility for studies of such great import has provided the dominant motivation for me to continually improve my skills as a biostatistician.

John M. Lachin

Rockville, Maryland