September 2004-May 2005 Meetings
Tuesday, September 21, 2004
Speaker: Chiang-Ching Huang, Ph.D., Department of Preventive Medicine, Northwestern University
Topic: Markov Model for Defining Genomic Changes Using Gene Expression Profiling
With the advent and recent proliferation of genomic technologies such as gene expression arrays, researchers are now able to explore gene expression patterns for the majority of the genes in the genome. One active research area using gene expression profiles is the study of the transcriptome map. Several studies from various genomes have revealed unexpected gene clusters within which the gene expression levels are highly correlated. To facilitate the search for such clusters in gene expression data, we propose a type of hidden Markov model (HMM) that will infer expression status for genes along the chromosomes. We define a Markov chain for the genomic sequence from which gene expression can be categorized into five states: region of overexpression, singleton of overexpression, normal expression, region of underexpression, and singleton of underexpression. We develop a continuous-time HMM (CTHMM) to account for the variation in base-pair distances between genes. The statistical properties of the CTHMM are studied through a Monte Carlo simulation. We also compare the performance of the CTHMM to moving median techniques to see how well they can recover abnormal expression regions. This model is applied to a lung cancer gene expression data set to search for abnormal expression regions. We also assess the global impact of those regions on the patients' clinical variables, such as tumor stage, tumor differentiation, and patient survival.
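As a point of reference for the comparison mentioned above, a moving median baseline might look like the following minimal sketch. The window size, the threshold, the three-way labeling, and the example values are all illustrative assumptions, not details from the talk.

```python
from statistics import median


def moving_median(values, window=5):
    """Centered moving median; the window shrinks at the ends of the sequence."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd integer")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        lo = max(0, i - half)
        hi = min(len(values), i + half + 1)
        smoothed.append(median(values[lo:hi]))
    return smoothed


def flag_regions(values, window=5, threshold=1.0):
    """Label each gene +1 (over), -1 (under), or 0 (normal) from its smoothed value."""
    labels = []
    for m in moving_median(values, window):
        if m > threshold:
            labels.append(1)
        elif m < -threshold:
            labels.append(-1)
        else:
            labels.append(0)
    return labels


# Hypothetical standardized expression values, ordered by chromosomal position.
expr = [0.1, -0.2, 2.1, 2.4, 1.9, 2.2, 0.0, 0.1, 3.0, -0.1, 0.2]
labels = flag_regions(expr, window=3, threshold=1.0)
print(labels)  # [0, 0, 1, 1, 1, 1, 0, 0, 0, 0, 0]
```

In this toy example the run of four elevated genes is flagged, while the isolated spike at position 8 is smoothed away. A moving median of this kind also uses only gene order and ignores the base-pair distance between genes, which is the variation the CTHMM is designed to account for.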
Tuesday, October 19, 2004
Speaker: Menggang Yu, Ph.D., Department of Biostatistics, Indiana University School of Medicine
Topic: Bivariate Designs in phase II trials
In this talk, we consider study designs in phase II clinical trials where both efficacy and toxicity are primary outcomes. One-stage and two-stage designs will be discussed, and a real example will be given. A new design that allows for trade-offs between efficacy and toxicity is proposed. Related computational aspects of the design will also be presented.
This is joint work with Dr. Constantin Yiannoutsos.
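To make the idea of a bivariate decision rule concrete, here is a rough sketch of the acceptance probability for a simple one-stage rule: declare the treatment promising if there are enough responses and not too many toxicities. The rule itself, the independence of efficacy and toxicity, and all numbers are assumptions made for illustration. This is not the trade-off design proposed in the talk.

```python
from math import comb


def binom_pmf(k, n, p):
    """Binomial probability of exactly k events in n patients."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def prob_accept(n, min_responses, max_toxicities, p_eff, p_tox):
    """P(responses >= min_responses and toxicities <= max_toxicities).

    Assumes efficacy and toxicity are independent within patients,
    which is an illustrative simplification.
    """
    p_enough_eff = sum(binom_pmf(k, n, p_eff) for k in range(min_responses, n + 1))
    p_safe = sum(binom_pmf(k, n, p_tox) for k in range(0, max_toxicities + 1))
    return p_enough_eff * p_safe


# Hypothetical design: 30 patients, need at least 12 responses and at most 6 toxicities.
good_drug = prob_accept(30, 12, 6, p_eff=0.4, p_tox=0.2)
bad_drug = prob_accept(30, 12, 6, p_eff=0.2, p_tox=0.4)
print(f"accept | good drug: {good_drug:.3f}, accept | bad drug: {bad_drug:.3f}")
```

Evaluating the rule under a favorable and an unfavorable scenario gives the kind of operating characteristics (power and type I error) that design cut-offs are usually tuned against.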
Tuesday, November 16, 2004
Speaker: Jarek Harezlak, Ph.D. candidate, Department of Biostatistics, Harvard University School of Public Health
Topic: Individual and population penalized regression splines for accelerated longitudinal designs
Accelerated longitudinal design (ALD) sampling schemes consist of a few observations per sampling unit over a short time span. ALD data are combined across independent units to provide an estimate of the overall population curve and predictions of individual patterns of change. Extending the work of Ruppert, Wand and Carroll (2003), we develop computationally efficient procedures for longitudinal penalized regression spline (P-spline) methods under ALD sampling schemes. A major advantage of P-spline methodology is that the models can be fit using standard mixed models software (e.g. PROC MIXED in SAS).
Extensive simulation studies indicate good performance of our method in the settings considered. We then compare balanced and complete longitudinal designs to ALDs using the Berkeley Growth study data, and we apply our method to longitudinal brain volume measurements from an ongoing pediatric magnetic resonance imaging (MRI) developmental study.
This talk is based on joint work with Louise Ryan, Nicholas Lange and Jay Giedd.
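The link between penalized splines and mixed models can be sketched in a few lines. The example below fits a population curve only, using a truncated linear basis with a ridge penalty on the knot coefficients. The basis, the knot placement, and the fixed smoothing parameter `lam` are assumptions for illustration; the individual-level curves and ALD-specific procedures described in the talk are not reproduced here.

```python
def truncated_linear_basis(x, knots):
    """Design-matrix row: [1, x, (x - k1)+, ..., (x - kK)+]."""
    return [1.0, x] + [max(x - k, 0.0) for k in knots]


def solve(a, b):
    """Solve a @ z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    z = [0.0] * n
    for i in reversed(range(n)):
        s = sum(m[i][j] * z[j] for j in range(i + 1, n))
        z[i] = (m[i][n] - s) / m[i][i]
    return z


def fit_pspline(xs, ys, knots, lam):
    """Penalized least squares: only the knot coefficients are shrunk."""
    rows = [truncated_linear_basis(x, knots) for x in xs]
    p = len(rows[0])
    # Normal equations C'C + lam * D, where D = diag(0, 0, 1, ..., 1).
    lhs = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    for j in range(2, p):
        lhs[j][j] += lam
    rhs = [sum(r[i] * y for r, y in zip(rows, ys)) for i in range(p)]
    coef = solve(lhs, rhs)
    fitted = [sum(c * v for c, v in zip(coef, r)) for r in rows]
    return coef, fitted


# Hypothetical data: a straight line, which the unpenalized part fits exactly.
xs = [i / 49 for i in range(50)]
ys = [2.0 + 3.0 * x for x in xs]
knots = [k / 10 for k in range(1, 10)]
coef, fitted = fit_pspline(xs, ys, knots, lam=1.0)
```

Treating the knot coefficients as random effects turns this ridge penalty into a variance component, with `lam` playing the role of the ratio of the error variance to the random-effect variance. That equivalence is what allows standard mixed-model software to fit the curve and estimate the amount of smoothing.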
Tuesday, January 18, 2005
Speaker: Alex Dmitrienko, Ph.D., Eli Lilly
Topic: Analysis of Clinical Trials Using SAS: A Practical Guide
This talk will introduce a recently published biostatistical book written by A. Dmitrienko (Lilly), G. Molenberghs (Limburgs Universitair Centrum, Belgium), C. Chuang-Stein (Pfizer) and W. Offen (Lilly). This book focuses on key problems arising in the context of clinical trials and medical research in general, including
- analysis of stratified data,
- multiple comparisons,
- interim safety and efficacy monitoring,
- analysis of incomplete data,
- analysis of safety and diagnostic measurements.
The book discusses solutions to these problems based on modern statistical methods and reviews SAS techniques that help clinical researchers efficiently and rapidly implement these methods. To illustrate the methods discussed in the book, today's talk will discuss quantile regression modeling and apply it to the analysis of large data sets of diagnostic measurements from a Lilly clinical trial.
Tuesday, March 15, 2005
Speaker: Fang Li, Ph.D., Division of Mathematics, Indiana University-Purdue University Indianapolis
Topic: Testing for superiority among two time series
This talk discusses the problem of testing the equality of two autoregressive functions against one-sided alternatives in the presence of conditional heteroscedasticity in each of the autoregressive time series. The two time series are assumed to be strictly stationary and strongly mixing, and are allowed to have possibly different heteroscedastic error and stationary densities. The proposed class of tests avoids the estimation of the common non-parametric autoregressive function and is based on the time series differences after matching the lagged variables. The asymptotic normality under general one-sided local nonparametric alternatives is derived. The talk also discusses asymptotically optimal tests against these alternatives within the proposed class of tests.
Tuesday, April 19, 2005
Speaker: Steve Qin, Ph.D., Department of Biostatistics, School of Public Health, University of Michigan
Topic: Operon prediction in newly sequenced genomes using HMM
With the rapid improvement of high-throughput sequencing technologies, the number of completely sequenced bacterial genomes has soared in recent years. This massive amount of data offers unprecedented opportunities to answer critical biological questions. An important step in elucidating bacterial transcription regulation is the identification of operon structure. Various computational approaches have been proposed for operon prediction. However, for newly sequenced genomes, very limited experimental information or functional annotation is available. In this study, we explored the possibility of using phylogenetic information contained in multiple sequenced genomes to aid in operon prediction. Our study indicated that hidden Markov models that combine the degree of phylogenetic conservation with traditional predictors, such as intergenic distances, perform very well in newly sequenced bacterial genomes, achieving better than 85% sensitivity and specificity at the optimal probability cut-off.
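The way an HMM can combine intergenic distance with a conservation signal is easiest to see in a toy decoder. The sketch below runs the standard Viterbi algorithm over adjacent gene pairs with two states (0 = same operon, 1 = operon boundary). The emission model, every probability, and the example data are hypothetical choices for illustration and are not the model or parameters from the study.

```python
import math


def pair_loglik(distance_bp, conserved):
    """Hypothetical emission log-likelihoods for one adjacent gene pair.

    Distances are modeled as exponential (mean 50 bp within operons, 300 bp at
    boundaries) and assumed non-negative; conservation is a yes/no indicator.
    """
    means = (50.0, 300.0)
    p_conserved = (0.8, 0.3)
    out = []
    for s in range(2):
        log_dist = -math.log(means[s]) - distance_bp / means[s]
        p = p_conserved[s] if conserved else 1.0 - p_conserved[s]
        out.append(log_dist + math.log(p))
    return out


def viterbi(obs_loglik, log_init, log_trans):
    """Most likely state path given per-position emission log-likelihoods."""
    n_states = len(log_init)
    score = [log_init[s] + obs_loglik[0][s] for s in range(n_states)]
    back = []
    for t in range(1, len(obs_loglik)):
        new_score, pointers = [], []
        for s in range(n_states):
            best_prev = max(range(n_states), key=lambda r: score[r] + log_trans[r][s])
            pointers.append(best_prev)
            new_score.append(score[best_prev] + log_trans[best_prev][s] + obs_loglik[t][s])
        score = new_score
        back.append(pointers)
    state = max(range(n_states), key=lambda s: score[s])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]


# Hypothetical gene pairs: (intergenic distance in bp, pair conserved across genomes?)
pairs = [(10, True), (25, True), (400, False), (15, True)]
obs = [pair_loglik(d, c) for d, c in pairs]
log_init = [math.log(0.5), math.log(0.5)]
log_trans = [[math.log(0.7), math.log(0.3)], [math.log(0.5), math.log(0.5)]]
path = viterbi(obs, log_init, log_trans)
print(path)  # [0, 0, 1, 0]
```

Here the long, unconserved gap is decoded as an operon boundary, while the short, conserved gaps are placed within operons. Thresholding posterior state probabilities instead of taking the single best path would be one way to obtain a tunable probability cut-off.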
Tuesday, May 17, 2005
Speaker: Bhramar Mukherjee, Department of Statistics, University of Florida
Topic: Bayesian Analysis of Case-Control Studies
The case-control study design is a popular tool for studying the etiology of rare diseases such as cancer. Bayesian methods offer possibilities for flexible, hierarchical modeling of case-control data. In this talk, I will first provide an overview of the Bayesian literature available in this domain. I will then propose a unified Bayesian semi-parametric approach for analyzing matched case-control data with possible missingness in the exposure variables. The method allows for the possible presence of stratum effects on the distribution of the exposure variables. Extension of the method to situations where a finer categorization within the cases is available will be indicated. The Bayesian estimation procedure is implemented via a Markov chain Monte Carlo numerical integration technique. Examples of real matched case-control studies are used to illustrate the methodology.
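For readers unfamiliar with Markov chain Monte Carlo, the sketch below shows the general mechanics on a much simpler problem than the one in the talk: a random-walk Metropolis sampler for the log odds ratio in an unmatched 2x2 case-control table with complete exposure data. The fully parametric model, the normal priors, the step size, and the counts are all assumptions for illustration. It is not the semi-parametric matched-data method described above.

```python
import math
import random


def log1pexp(z):
    """Numerically stable log(1 + exp(z))."""
    return z + math.log1p(math.exp(-z)) if z > 0 else math.log1p(math.exp(z))


def log_posterior(alpha, beta, a, n1, b, n0, prior_sd=10.0):
    """alpha: log-odds of exposure among controls; beta: log odds ratio.

    a of n1 cases and b of n0 controls are exposed; priors are N(0, prior_sd^2).
    """
    loglik = (a * (alpha + beta) - n1 * log1pexp(alpha + beta)
              + b * alpha - n0 * log1pexp(alpha))
    logprior = -(alpha**2 + beta**2) / (2 * prior_sd**2)
    return loglik + logprior


def metropolis(a, n1, b, n0, n_iter=20000, burn_in=2000, step=0.3, seed=1):
    """Random-walk Metropolis; returns post-burn-in draws of the log odds ratio."""
    rng = random.Random(seed)
    alpha, beta = 0.0, 0.0
    current = log_posterior(alpha, beta, a, n1, b, n0)
    draws = []
    for i in range(n_iter):
        cand_alpha = alpha + rng.gauss(0.0, step)
        cand_beta = beta + rng.gauss(0.0, step)
        cand = log_posterior(cand_alpha, cand_beta, a, n1, b, n0)
        # Accept with probability min(1, posterior ratio).
        if rng.random() < math.exp(min(0.0, cand - current)):
            alpha, beta, current = cand_alpha, cand_beta, cand
        if i >= burn_in:
            draws.append(beta)
    return draws


# Hypothetical table: 40 of 100 cases exposed, 20 of 100 controls exposed.
draws = metropolis(a=40, n1=100, b=20, n0=100)
posterior_mean = sum(draws) / len(draws)
print(f"posterior mean log OR: {posterior_mean:.2f}")
```

With diffuse priors the posterior mean of the log odds ratio should land close to the sample value, log((40/60)/(20/80)). Richer models, such as those with stratum effects or missing exposures, typically add more parameters and update them in blocks, but the accept/reject step has the same form.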