September 2002-May 2003 Meetings


 

Thursday, September 12, 2002

Speaker: Rebecca W. Doerge, Ph.D. - Purdue University
Title: Determining Statistically Significant Changes in Gene Expression

Microarrays allow the simultaneous assessment of expression levels for thousands of genes across various treatment conditions and over time. Currently, statistical analyses of these data take one of two forms: finding similarity among gene expression profiles (e.g., clustering), or calculating per-gene fold changes (ratios) between control and treatment (e.g., differential expression). An alternative approach to the latter analysis uses linear models to detect genes undergoing statistically significant differential expression. The main advantage of this approach is that it allows the variability inherent in microarray experiments to be estimated, which in turn allows a level of confidence, or probability, to be attached to statements about changes in expression levels across conditions. A linear model approach to analyzing microarray data will be introduced to demonstrate its benefits and to introduce its use with quantitative trait loci (QTL) analysis for the identification of expression level polymorphisms (ELPs).
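To make the linear-model idea concrete, here is a minimal sketch, assuming log-transformed expression values and a single two-level treatment factor for one gene at a time. A full microarray ANOVA would typically add array, dye, and gene-by-condition terms; those are omitted here, the function name is hypothetical, and the data are invented for illustration. With one two-level factor, the least-squares slope is simply the difference in group means, and its standard error comes from the pooled residual variance, which is what lets a confidence statement be attached to the estimated change.

```python
import math
from statistics import mean

def treatment_effect_t(control, treatment):
    """Least-squares fit of y = b0 + b1 * x for one gene, with x = 0 for
    control arrays and x = 1 for treatment arrays.

    Returns (b1, t, df): the estimated treatment effect, its t statistic for
    H0: b1 = 0, and the residual degrees of freedom.
    """
    n0, n1 = len(control), len(treatment)
    if n0 < 2 or n1 < 2:
        raise ValueError("need at least two replicates per group")
    m0, m1 = mean(control), mean(treatment)
    b1 = m1 - m0  # with one two-level factor, the OLS slope is the difference in means
    # Residual sum of squares: each observation minus its fitted group mean.
    rss = sum((y - m0) ** 2 for y in control) + sum((y - m1) ** 2 for y in treatment)
    df = n0 + n1 - 2
    se = math.sqrt(rss / df * (1 / n0 + 1 / n1))
    if se == 0:
        raise ValueError("zero residual variance; t statistic undefined")
    return b1, b1 / se, df

# Hypothetical log2 expression values for two genes (illustrative only).
genes = {
    "geneA": ([7.1, 7.3, 6.9], [8.2, 8.4, 8.1]),
    "geneB": ([5.0, 5.4, 5.2], [5.1, 5.3, 5.2]),
}
for name, (ctrl, trt) in genes.items():
    b1, t, df = treatment_effect_t(ctrl, trt)
    print(f"{name}: effect={b1:.2f}, t={t:.2f}, df={df}")
```

The t statistic would then be compared with a t distribution on `df` degrees of freedom; with thousands of genes, some multiplicity adjustment is also needed.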

Tuesday, October 15, 2002

9:00 AM – 10:00 AM, W. Michael O'Fallon, Ph.D., Professor of Biostatistics, Mayo Clinic (retired).
Attributable Risk Estimation: A Tale of Mathematical/Statistical Modeling

Attributable risk, a simple but underused concept, is defined as the percentage of cases of a disease that can be "attributed" to a risk factor. The deceptive simplicity of the concept provides fertile ground for a discussion of its mathematical and statistical underpinnings. Not surprisingly, in the complex human world of multiple diseases with multiple interrelated causes, implementing the concept proves to be anything but simple. This presentation traces the development of the concept from its simple phase to a complex generalized model of attributable risk which, by permitting the simultaneous consideration of multiple causation and confounding factors, may come acceptably close to the "Real World". A software package that employs computer-intensive resampling methods (bootstrap, jackknife) to estimate the distribution of the generalized estimate will be discussed briefly. The utility of the concept will be illustrated using data from a large study designed to explicate risk factors for ischemic stroke. This presentation will combine issues of medicine, public health, computer science, mathematics and statistics in a meaningful and easily understood fashion.
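The simple single-factor version of the concept, together with a resampling estimate of its distribution, can be sketched as follows. This is a minimal illustration, assuming cohort data with one binary exposure; it is not the generalized multi-factor model or the software package described in the talk, and the function names and data are hypothetical. The percentile bootstrap shown here is one of several possible resampling schemes.

```python
import random

def attributable_risk(records):
    """Population attributable risk from cohort records.

    records: list of (exposed, diseased) boolean pairs.
    AR = (P(D) - P(D | unexposed)) / P(D), i.e. the share of cases in excess
    of what the unexposed risk alone would produce.
    """
    cases = sum(1 for _, diseased in records if diseased)
    unexposed = [diseased for exposed, diseased in records if not exposed]
    if cases == 0 or not unexposed:
        raise ValueError("need at least one case and one unexposed subject")
    p_d = cases / len(records)
    p_d_unexposed = sum(unexposed) / len(unexposed)
    return (p_d - p_d_unexposed) / p_d

def bootstrap_percentile_ci(records, statistic, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval: resample subjects with replacement."""
    rng = random.Random(seed)
    estimates = []
    for _ in range(n_boot):
        resample = rng.choices(records, k=len(records))
        try:
            estimates.append(statistic(resample))
        except ValueError:
            continue  # skip degenerate resamples (no cases or no unexposed)
    estimates.sort()
    lo = estimates[int(alpha / 2 * len(estimates))]
    hi = estimates[int((1 - alpha / 2) * len(estimates)) - 1]
    return lo, hi

# Hypothetical cohort: 100 exposed (20 cases) and 300 unexposed (15 cases).
records = ([(True, True)] * 20 + [(True, False)] * 80
           + [(False, True)] * 15 + [(False, False)] * 285)
ar = attributable_risk(records)
lo, hi = bootstrap_percentile_ci(records, attributable_risk)
```

For these hypothetical counts the exposure prevalence is 0.25 and the relative risk is 0.20 / 0.05 = 4, so Levin's formula p(RR − 1) / (1 + p(RR − 1)) = 0.75 / 1.75 = 3/7 ≈ 0.43 agrees with the direct calculation.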

10:10 AM – 11:00 AM, Sujuan Gao, IUMC Division of Biostatistics and Harvard School of Public Health.
A Multi-State Stochastic Model for the Analysis of Longitudinal Dementia Data

Longitudinal dementia studies face many challenging issues, including multiple outcomes, panel data, nonignorable missing information due to the death of elderly participants, and complex sampling designs for ascertaining clinical outcomes. We propose a multi-state transitional stochastic model for the analysis of dementia data. The model includes three states for clinical outcomes (normal, cognitively impaired but not demented, and demented) plus a fourth, absorbing state for death. A maximum likelihood approach is used for parameter estimation, which allows incidence rates of cognitive impairment, dementia and mortality to be estimated simultaneously. Estimates of risk factor effects are also obtained by assuming a proportional hazards model for transitions between states. The methodology can also be extended, using an EM algorithm, to handle data collected under complex sampling plans. We illustrate our approach using data from the Indianapolis Dementia Study.
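The structure of such a model, and why panel data still yield a likelihood, can be illustrated with a discrete-time simplification. This sketch assumes a hypothetical one-year transition matrix (the numbers are invented) and integer visit years; the model in the talk uses transition hazards with covariate effects, which this sketch omits. Multi-year transition probabilities follow from matrix powers, so each pair of consecutive visits contributes an entry of P raised to the gap length, even when the intermediate states were never observed.

```python
import math

STATES = ("normal", "CIND", "demented", "dead")

# Hypothetical one-year transition probabilities (rows: from, columns: to).
# These numbers are illustrative only, not estimates from any study.
P = [
    [0.90, 0.06, 0.01, 0.03],
    [0.10, 0.70, 0.12, 0.08],
    [0.00, 0.00, 0.85, 0.15],
    [0.00, 0.00, 0.00, 1.00],  # death is absorbing
]

def matmul(a, b):
    """Multiply two square matrices stored as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def k_step(p, k):
    """Transition probabilities over k one-year steps, i.e. the matrix power P^k."""
    n = len(p)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(k):
        result = matmul(result, p)
    return result

def panel_log_likelihood(p, histories):
    """Log-likelihood of panel data under the hypothetical matrix p.

    Each history is a list of (year, state_index) visits in time order; each
    pair of consecutive visits contributes log P^gap[from][to].
    """
    total = 0.0
    for visits in histories:
        for (t0, s0), (t1, s1) in zip(visits, visits[1:]):
            total += math.log(k_step(p, t1 - t0)[s0][s1])
    return total
```

Maximum likelihood estimation would maximize `panel_log_likelihood` over the free transition parameters; a transition with probability zero (for example, out of death) would raise a `ValueError` from `math.log`, which signals an impossible history under the assumed model.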

Follow-up information from Mike O'Fallon:

"I had a very good visit in Indianapolis and enjoyed meeting old friends and making new ones. Thanks for the invitation and the hospitality.

Michael Kahn, Ph.D., recently put links to the Attributable Risk (arhat) report and the software on his web page. It can be accessed at http://acunix.wheatonma.edu/~mkahn

The link to the Shell archive (shar) for the attributable risk software (gzip) will automatically ask the user if he/she wants to download the archive."

Tuesday, November 19, 2002

Speaker: Jingjin Li, Ph.D. - IU Biostatistics
Title: Combining Model Selection and False Detection Rates (FDRs) to Detect Differentially Expressed Genes in Microarray Experiments

Oligonucleotide microarrays are among a set of technologies that allow high-throughput assessment of the expression of vast numbers of genes. Finding significant changes in gene expression is a complex statistical problem. Simultaneous inference for detecting differentially expressed genes has given rise to a methodology based on estimating false detection rates (FDRs) (Efron et al., 2001). FDRs lead to realistic assessments of true expression changes after adjusting for the multiplicity of genes being screened. On the other hand, model-based approaches have been adopted to control the variation in microarray experiments and to achieve less biased and more accurate estimates.

In this talk, we discuss combining model selection and FDRs to better detect genes differentially expressed under two experimental conditions and at two RNA preparation time points. Analyses using various bootstrap approaches, with and without model selection, are compared using data on 5,032 genes.
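The multiplicity adjustment behind FDR methods can be illustrated briefly. Efron et al. estimate FDRs with an empirical Bayes approach; the sketch below instead uses the simpler Benjamini-Hochberg step-up procedure, a different but related way of controlling the false discovery rate, and is offered only as an illustration of the idea. The function name and p-values are hypothetical.

```python
def benjamini_hochberg(pvalues, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure.

    Sort the m p-values, find the largest rank k with p_(k) <= k * q / m, and
    reject the k hypotheses with the smallest p-values. This controls the
    false discovery rate at level q when the tests are independent (or
    positively dependent).
    """
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    k_max = 0
    for rank, i in enumerate(order, start=1):
        if pvalues[i] <= rank * q / m:
            k_max = rank  # step-up: keep the largest rank that passes
    return sorted(order[:k_max])
```

For example, `benjamini_hochberg([0.02, 0.03, 0.04])` rejects all three hypotheses even though 0.02 exceeds the Bonferroni threshold 0.05 / 3, because the largest p-value meets its own threshold of 3 × 0.05 / 3.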

Tuesday, January 21, 2003

Speaker: Wanzhu Tu, Ph.D. - IU Biostatistics
Title: A model for repeatedly measured multivariate data

In clinical investigations, repeatedly collected multiple outcomes are fairly common. Traditionally, univariate mixed-effects models are used to describe such outcomes one at a time. For example, separate models are used for systolic and diastolic blood pressure, even though the two are usually measured in pairs. In this presentation, I will discuss a multivariate mixed-effects model for repeated measurements and its potential applications in various clinical settings. The model is designed to accommodate data from the exponential family of distributions, and it introduces a flexible correlation structure among the repeatedly collected outcomes. In contrast to the univariate modeling approach, the new structure allows assessment of the global effect of a covariate on all outcomes; it also affords comparisons of the impact of an exposure across outcomes. Two data examples are used to illustrate the method.

February 2003: Speaker caught in a snowstorm; talk postponed.
See the May 6, 2003 meeting (below).

Tuesday, March 18, 2003

Speaker: Chong Gu, Ph.D. – Purdue University, West Lafayette
Title: Model Diagnostics for Smoothing Spline ANOVA Models

Functional ANOVA decompositions can be incorporated into multivariate function estimation through the penalized likelihood method. In this talk, we propose some simple diagnostics for "testing" selected model terms in the decomposition. Eliminating practically insignificant terms generally enhances the interpretability of the estimates and may sometimes also have inferential implications. The aim is to accomplish the same tasks as traditional likelihood ratio tests, but without sampling distributions, which are unavailable because null hypotheses in nonparametric settings are typically infinite-dimensional. The model diagnostics are illustrated in the settings of regression, probability density estimation, and hazard rate estimation.

Tuesday, April 15, 2003

Speaker: Chandan Saha, Ph.D. – Biostatistics, IU School of Medicine
Title: Large Sample Bias in the Last Observation Carried Forward Approach Under Informative Dropout

In clinical trials, subjects are usually followed over a fixed period of time to analyze their responses to randomly assigned treatments. Missing data are an inescapable problem in such trials. When the main interest is the outcome at the study endpoint, last observation carried forward (LOCF) is the most frequently used approach for handling missing values of continuous variables. However, this approach has been criticized because it can bias the resulting estimates of treatment effects. Some researchers have emphasized the value of a theoretical quantification of the magnitude of bias that dropout causes in various analyses. This talk describes theoretical results that quantify the large-sample bias of the LOCF approach under informative dropout.

We consider a longitudinal study with two groups (treatment and control) and model the dropout mechanism as a function of a subject-specific random intercept, random slope, and group membership. Several case studies are included to show the magnitude of bias in the estimators of the treatment effects and variance.
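The mechanism behind LOCF bias can be seen in a small Monte Carlo sketch. This is not the analytic large-sample result described in the talk: all parameter values, function names, and the dropout rule are hypothetical. Each subject has a random intercept and a random slope, the treatment arm has a steeper mean slope, and the per-visit chance of dropping out rises with the subject's own slope, so dropout is informative. By construction, the true endpoint treatment effect is (1.0 − 0.5) × 4 = 2.0.

```python
import random
from statistics import mean

def locf(visits):
    """Fill each missing visit (None) with the last observed value."""
    filled, last = [], None
    for value in visits:
        if value is not None:
            last = value
        filled.append(last)
    return filled

def simulate_arm(rng, n, slope_mean, times):
    """Return (complete-data endpoints, LOCF endpoints) for one hypothetical arm."""
    complete, carried = [], []
    for _ in range(n):
        intercept = rng.gauss(10.0, 2.0)    # subject-specific random intercept
        slope = rng.gauss(slope_mean, 0.5)  # subject-specific random slope
        ys = [intercept + slope * t + rng.gauss(0.0, 1.0) for t in times]
        # Hypothetical informative dropout: after baseline, the chance of
        # leaving at each visit increases with the subject's own slope.
        p_drop = min(0.9, max(0.0, 0.1 + 0.1 * slope))
        observed, dropped = [ys[0]], False
        for y in ys[1:]:
            if not dropped and rng.random() < p_drop:
                dropped = True
            observed.append(None if dropped else y)
        complete.append(ys[-1])
        carried.append(locf(observed)[-1])
    return complete, carried

def locf_bias_demo(n=5000, seed=1):
    """Endpoint treatment effect with complete data versus with LOCF."""
    times = (0, 1, 2, 3, 4)
    rng = random.Random(seed)
    c_full, c_locf = simulate_arm(rng, n, 0.5, times)  # control arm
    t_full, t_locf = simulate_arm(rng, n, 1.0, times)  # treatment arm
    return mean(t_full) - mean(c_full), mean(t_locf) - mean(c_locf)
```

Under these assumptions, subjects who improve fastest tend to leave earliest and have lower values carried forward, so the LOCF estimate of the treatment effect falls noticeably below the complete-data estimate; the size and direction of the bias depend entirely on the assumed dropout mechanism.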

This is joint work with Michael P. Jones, Ph.D., Professor, University of Iowa.

Tuesday, May 6, 2003

Speaker: Donglin Zeng, Ph.D., Department of Biostatistics, University of North Carolina at Chapel Hill
Title: Joint Modeling for Longitudinal Data with Outcome-related Observation Times

In observational longitudinal studies, subjects' visit times can be informative about their longitudinal outcome. For example, when visit times potentially depend on the current outcome (possibly through a latent process such as health status), simply fitting a mixed model without accounting for informative observation times may result in biased estimates and invalid inferences. In this talk, we propose jointly modeling both the longitudinal outcome and the visit times. We consider two classes of models, one semiparametric and the other nonparametric. In both cases, observation times are modeled as a counting process with subject-specific random effects. These random effects are assumed to be shared with the longitudinal outcome, which induces dependence between the visit times and the longitudinal outcomes. In the semiparametric approach, the parameter estimates are derived from an EM algorithm, and their asymptotic standard errors are obtained from numerical derivatives of the profile likelihood function. In the nonparametric approach, the mean function of the longitudinal outcome is assumed to be smooth but otherwise completely unknown. We first derive the theoretical bias of the usual kernel estimate; then, for a special case, we propose intuitive methods to eliminate the bias and ensure valid inferences. Both approaches are illustrated with simulation studies and actual applications.

This talk is based upon joint work with Dr. Daohai Yu of Duke University.

Tuesday, May 20, 2003

Speaker: Hua Yun Chen, Ph.D., Division of Epidemiology/Biostatistics, University of Illinois at Chicago
Title: Nonparametric and semiparametric models for missing covariates in parametric regression

The robustness of covariate modeling for the missing covariate problem in parametric regression is studied under the missing at random (MAR) assumption. For a simple missing covariate pattern, a nonparametric covariate model is proposed and shown to yield a consistent and semiparametrically efficient estimator of the regression parameter. Total robustness is achieved in this situation.

For more general missing covariate patterns, a novel semiparametric modeling approach is proposed for the covariates. In this approach, the covariate distribution is first decomposed into a product of conditional distributions according to the overall missing data patterns, and the conditional distributions are then represented in general odds ratio form. The general odds ratios are modeled parametrically, and the other components of the covariate distribution are modeled nonparametrically. Maximum semiparametric likelihood is proposed for obtaining the parameter estimates.

The proposed method yields a consistent estimator of the regression parameter when the odds ratios are modeled correctly. In general, the semiparametric covariate modeling strategy increases robustness against covariate model misspecification compared with the parametric modeling strategy proposed by Lipsitz and Ibrahim. The new covariate modeling approach can also be incorporated into the doubly robust procedure of Robins et al. to increase protection against misspecification of the missing data mechanism. In addition, the proposed modeling strategy avoids the often intractable integrations involved in maximizing the incomplete data likelihood with parametric covariate models. The proposed method can be applied to many regression models to handle incomplete covariates.

Monday-Wednesday, May 19-21, 2003 - Midwestern Biopharmaceutical Statistics Workshop

Co-sponsored by Ball State University & the Biopharmaceutical Section of the American Statistical Association

WORKSHOP REGISTRATION: $140 until May 1 ($45 for students), $160 after May 1

Monday, 9:00 am – 1:00 pm

SHORT COURSE (Separate Registration Fee: $55)
Topic: Analysis of Categorical Data with Overdispersion (Extravariation)
JORGE MOREL, Procter & Gamble, and NAGARAJ NEERCHAL, University of Maryland Baltimore County

Monday, 2:30 pm – 4:30 pm
PLENARY SESSION: MURRAY CLAYTON, University of Wisconsin, Madison
Topic: The Art of Statistical Consulting

TUESDAY MORNING, MAY 20
A. Risk Management: When the Label is Not Enough
B. Analysis of Large Datasets in Preclinical and Early Development
C. Statistical Issues Related to Development, Manufacture, and Control of Biotech Products

TUESDAY AFTERNOON, MAY 20
A. Novel Approaches for Analyzing Clinical Safety/Adverse Event Data
B. Tools and Techniques for Analyzing Large Data Sets
C. Statistical Training of Research Scientists and Engineers

TUESDAY EVENING BANQUET
Speaker: CHRISTY CHUANG-STEIN, Pharmacia
Topic: My Yellow Brick Roads – the Journeys of a Statistician

WEDNESDAY MORNING, MAY 21
A. Analysis of QT/QTc Interval Data
B. Analysis of Large Data Sets in Clinical, Regulatory and Marketing
C. Current Issues in Stability
