Statistics Seminar Spring 2012
  Department of Math Sciences, IUPUI
  Wednesday 12:15-1:30PM, LD 265

  (*Abstracts for talks marked with an asterisk can be found below.)

01/11. Hanxiang Peng, Department of Math Sciences, IUPUI
Organizing Meeting
Central Limit Theorems for Quadratic Forms of Random Vectors of Increasing Dimensions.*

01/18. Qun Lin, Department of Math Sciences, IUPUI
On Robustness of Empirical Likelihood Ratio Confidence Intervals

01/27. Cheng Ouyang, Department of Math Sciences, University of Illinois at Chicago
Some functional inequalities for stochastic differential equations driven by fractional Brownian motions.*

02/01. Shan Wang, Department of Math Sciences, IUPUI
Estimating Linear Functionals With Responses MAR.

02/08. Lingnan Li, Department of Math Sciences, IUPUI
Empirical Likelihood for Estimating Equations with Missing Values.

02/15. Frank Zou, Department of Math Sciences, IUPUI
A Spatio-Temporal Bayesian Model for Syndromic Surveillance.*

02/22. Fang Li, Department of Math Sciences, IUPUI
Testing for the equality of two autoregressive functions using quasi-residuals.*

03/02. Jie Yang, Department of Math Sciences, University of Illinois at Chicago
Classification Based on a Permanental Process*

03/07. Ziyi Yang, Department of Math Sciences, IUPUI
Analysis of Repeated Events.*

03/14. Spring Break No Seminar

03/21. Xiaofeng Shao, Department of Statistics, University of Illinois at Urbana-Champaign
Self-Normalization for Time Series*

03/30. Xin Dang, Department of Mathematics, University of Mississippi
Data Mining Methods based on Kernelized Spatial Depth*

04/04. Yiwen Xu, Department of Math Sciences, IUPUI
Introducing estimation methods for missing data*

04/11. Giri Karishma, Department of Math Sciences, IUPUI
Methods for coping with missing data: A comparison of Complete-case analysis and Imputation Methods*

04/20. Bo Li, Department of Statistics, Purdue University
An approach to modeling asymmetric multivariate spatial covariance structures*

04/27. Yichao Wu, Department of Statistics, North Carolina State University
Continuously Additive Models for Functional Regression*

05/02.

                                            Abstracts

Title: Central limit theorems for quadratic forms of random vectors of growing dimensions. (Hanxiang Peng)
Abstract: This paper provides sufficient conditions for the asymptotic normality of quadratic forms of averages of random vectors of increasing dimension and improves on conditions found in the literature. Such results are needed in applications of Owen's empirical likelihood when the number of constraints is allowed to grow with the sample size. In this connection we fix a gap in the proof of Theorem 4.1 of Hjort, McKeague and Van Keilegom (2009). We also show how our results can be used to obtain the asymptotic distribution of the empirical likelihood under contiguous alternatives. Joint with Anton Schick.
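A minimal simulation sketch of the kind of statement involved, not taken from the paper: for i.i.d. p-dimensional vectors with mean zero and identity covariance (an assumption made here for simplicity; the data are also taken to be Gaussian), the quadratic form n times the squared norm of the average has mean p and variance 2p, so the standardized form (n||Xbar||^2 - p)/sqrt(2p) should look approximately standard normal when p is large. The function names and the choices of n, p and the number of replications are illustrative, and the paper's sufficient conditions are not checked here.

```python
import math
import random
import statistics


def standardized_quadratic_form(n, p, rng):
    """Return (n * ||Xbar||^2 - p) / sqrt(2p) for one sample of n i.i.d.
    p-dimensional standard Gaussian vectors (mean 0, identity covariance)."""
    sums = [0.0] * p
    for _ in range(n):
        for j in range(p):
            sums[j] += rng.gauss(0.0, 1.0)
    # n * ||Xbar||^2 equals ||sum of vectors||^2 / n
    q = sum(s * s for s in sums) / n
    return (q - p) / math.sqrt(2 * p)


def simulate(n=100, p=20, reps=1000, seed=1):
    """Monte Carlo mean and variance of the standardized quadratic form."""
    rng = random.Random(seed)
    draws = [standardized_quadratic_form(n, p, rng) for _ in range(reps)]
    return statistics.mean(draws), statistics.variance(draws)


if __name__ == "__main__":
    mean, var = simulate()
    print(f"mean ~ {mean:.3f}, variance ~ {var:.3f}")  # should be near 0 and 1
```

In the empirical likelihood setting the averages are of estimating functions and the covariance must be estimated, which is where conditions of the kind studied in the talk come in.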

Title: Some functional inequalities for stochastic differential equations driven by fractional Brownian motions (Cheng Ouyang)
Abstract: The concentration of measure phenomenon and logarithmic Sobolev inequalities are closely related. In this talk, I will present some recent results in this direction for stochastic differential equations (SDEs) driven by fractional Brownian motions. In particular, as a consequence of the concentration property, we obtain a Gaussian upper bound for the density of the solution to such SDEs. The presentation is based on joint work with F. Baudoin and S. Tindel.

Title: A Spatio-Temporal Bayesian Model for Syndromic Surveillance (Jian Zhou)
Abstract: Syndromic surveillance uses syndrome data (a syndrome being a specific collection of clinical symptoms) for early detection of infectious disease outbreaks and bioterrorist attacks. In this talk, we propose an inference model for determining the location of epidemic outbreaks in a network of nodes. The model is epidemiological by design: it processes daily counts from counties in order to infer when an outbreak that is distinguishable from background counts is present. The methodology incorporates Gaussian Markov random field (GMRF) and spatio-temporal conditional autoregressive (CAR) modeling. Its features include timely detection of outbreaks, inference that is robust to model misspecification, reasonable predictive performance, and attractive analytical and visualization tools to assist public health authorities in risk assessment. Based on extensive simulation studies and synthetic data generated from a dynamic SIR model, we demonstrate that the model captures outbreaks rapidly while still limiting false positives.

Title: Testing for the equality of two autoregressive functions using quasi-residuals (Fang Li)
Abstract: This talk discusses the problem of testing the equality of two nonparametric autoregressive functions against one-sided alternatives. The heteroscedastic errors and the stationary densities of the two independent, strong mixing, strictly stationary time series may differ. The paper adapts the idea of using a sum of quasi-residuals to construct the test and derives its asymptotic null distribution. It also shows that the test is consistent against general alternatives and obtains its limiting distributions under a sequence of local alternatives. A Monte Carlo simulation is then conducted to study the finite-sample level and power of the tests at several alternatives. We also compare the test with an existing lag-matched test, both theoretically and through Monte Carlo experiments.

Title: Classification Based on a Permanental Process (Jie Yang)
Abstract: In this talk we introduce a doubly stochastic marked point process model for supervised classification problems. Regardless of the number of classes or the dimension of the feature space, the model requires only two or three parameters for the covariance function. The model is effective even if the feature region occupied by one class is a patchwork interlaced with regions occupied by other classes. The classification criterion involves a permanental ratio, for which an approximation using a polynomial-time cyclic expansion is proposed. Applications to DNA microarray analysis and protein classification indicate that the cyclic approximation is effective even for high-dimensional data. The method can use feature variables efficiently to reduce the prediction error substantially. This is critical when the true classification relies on non-reducible high-dimensional features.
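As background only: the permanent of a square matrix has the same expansion as the determinant but with every sign positive, and no polynomial-time algorithm for computing it exactly is known, which is why a polynomial-time approximation is attractive. The sketch below computes an exact permanent with Ryser's inclusion-exclusion formula, whose cost grows like 2^n; it does not reproduce the cyclic expansion proposed in the talk, and the function name is illustrative.

```python
from itertools import combinations
from math import prod


def permanent_ryser(a):
    """Exact permanent of a square matrix via Ryser's formula.

    perm(A) = (-1)^n * sum over nonempty column subsets S of
              (-1)^|S| * prod_i (sum_{j in S} a[i][j])
    This loop costs O(2^n * n^2), so it is only practical for small n.
    """
    n = len(a)
    if any(len(row) != n for row in a):
        raise ValueError("matrix must be square")
    if n == 0:
        return 1  # permanent of the empty matrix
    total = 0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            row_sums = (sum(row[j] for j in cols) for row in a)
            total += (-1) ** k * prod(row_sums)
    return (-1) ** n * total
```

For example, `permanent_ryser([[1, 2], [3, 4]])` returns 10, that is 1*4 + 2*3, whereas the determinant of the same matrix is 1*4 - 2*3 = -2.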

Title: Analysis of Repeated Events (Ziyi Yang)
Abstract: Events that may occur repeatedly for individual subjects are of interest in many medical studies. We review methods of analysis for repeated events, emphasizing that the approach taken in a given study should allow clinical questions to be addressed as directly as possible. Methods based on full models for event processes as well as on simpler 'marginal' assumptions are considered. We apply various methods of analysis to studies involving pulmonary exacerbations in persons with cystic fibrosis and the occurrence of bone metastases and skeletal events in cancer patients. Most of the methodology considered can be implemented with existing software. This talk is based on R. J. Cook and J. F. Lawless, Analysis of repeated events, Statistical Methods in Medical Research, 2002; 11: 141-166.
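A common starting point for marginal analyses of repeated events is the mean cumulative function, the expected number of events per subject by time t. A minimal sketch of the Nelson-Aalen-type estimator mu_hat(t) = sum over event times s <= t of d(s)/Y(s), where d(s) is the number of events at time s and Y(s) the number of subjects still under observation at s, assuming independent censoring and a data layout chosen here for illustration (each subject as a follow-up end time plus a list of event times). This is not necessarily the exact estimator used in the talk.

```python
from collections import Counter


def mean_cumulative_function(subjects):
    """Nelson-Aalen-type estimate of the mean cumulative number of events.

    subjects: list of (followup_end, [event_times]) pairs; a subject is
    at risk at time s if s <= followup_end, and event times are assumed
    to lie within follow-up. Returns (time, estimate) pairs at each
    distinct event time.
    """
    events = Counter(t for _, times in subjects for t in times)
    estimate = 0.0
    curve = []
    for s in sorted(events):
        at_risk = sum(1 for end, _ in subjects if end >= s)
        estimate += events[s] / at_risk
        curve.append((s, estimate))
    return curve
```

With `subjects = [(10, [2, 5]), (4, [2]), (8, [])]`, two events occur at time 2 among three subjects at risk and one at time 5 among two subjects at risk, so the estimate is 2/3 at time 2 and 2/3 + 1/2 at time 5.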

Title: Self-Normalization for Time Series (Xiaofeng Shao)
Abstract: In time series inference (e.g., hypothesis testing and confidence interval construction), one often needs a consistent estimate of the asymptotic covariance matrix of a statistic. Alternatively, inference can be conducted using resampling (e.g., the moving block bootstrap) and subsampling techniques. Almost all existing methods share the need to select a smoothing parameter. Some rules have been proposed for choosing the smoothing parameter, but they may involve another user-chosen number or assume a parametric model. In this talk, we introduce the so-called self-normalized (SN) approach in the context of confidence interval construction and change point detection. The self-normalized statistic does not involve any smoothing parameter, and its limiting distribution is free of nuisance parameters. The finite-sample performance of the SN approach is evaluated on simulated and real data examples.
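For the simplest case, inference on the mean of a stationary series, a self-normalized statistic of the kind described above can be written T_n(mu) = n(Xbar - mu)^2 / V_n with V_n = n^(-2) * sum_t (S_t - t*Xbar)^2, where S_t is the t-th partial sum; under standard conditions its null limit is W(1)^2 / integral_0^1 (W(r) - r W(1))^2 dr for a standard Brownian motion W. The sketch below is an illustration under those assumptions, not the speaker's code: instead of quoting a tabulated critical value it approximates one by Monte Carlo, and all function names and default settings are hypothetical.

```python
import math
import random


def _mean_and_sn_scale(x):
    """Return (xbar, V_n) with V_n = n^{-2} * sum_t (S_t - t * xbar)^2."""
    n = len(x)
    xbar = sum(x) / n
    partial, v = 0.0, 0.0
    for t, xt in enumerate(x, start=1):
        partial += xt
        v += (partial - t * xbar) ** 2
    return xbar, v / n ** 2


def sn_statistic(x, mu):
    """Self-normalized statistic for H0: E[X_t] = mu; no bandwidth or block length needed."""
    n = len(x)
    xbar, v = _mean_and_sn_scale(x)
    return n * (xbar - mu) ** 2 / v


def sn_confidence_interval(x, crit):
    """Invert T_n(mu) <= crit; V_n does not depend on mu, so the interval is explicit."""
    n = len(x)
    xbar, v = _mean_and_sn_scale(x)
    half = math.sqrt(crit * v / n)
    return xbar - half, xbar + half


def sn_critical_value(level=0.95, steps=200, reps=4000, seed=0):
    """Monte Carlo approximation of the `level` quantile of the null limit,
    obtained by evaluating sn_statistic on i.i.d. N(0, 1) series of length `steps`."""
    rng = random.Random(seed)
    draws = sorted(
        sn_statistic([rng.gauss(0.0, 1.0) for _ in range(steps)], 0.0)
        for _ in range(reps)
    )
    return draws[int(level * reps) - 1]
```

A typical use is `sn_confidence_interval(series, sn_critical_value())`. The design point the abstract stresses is visible in the code: V_n is built from partial sums alone, so no smoothing parameter has to be chosen.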

Title: Data Mining Methods based on Kernelized Spatial Depth (Xin Dang)
Abstract: Statistical depth functions provide a center-outward ordering of points with respect to a distribution or a data set in high dimensions. Of the various depth notions, the spatial depth is appealing because of its computational efficiency. However, it tends to produce circular contours and fails to capture the underlying probabilistic geometry well outside the family of spherically symmetric distributions. We propose a novel depth, the kernelized spatial depth (KSD), which generalizes the spatial depth via positive definite kernels. With a proper choice of kernel, the KSD captures the local structure of the data where the spatial depth fails. Based on KSD, we propose a simple outlier detector that declares an observation an outlier if its depth value is below a threshold. Upper bounds on the false alarm probability are derived and used to determine the threshold. KSD is extended to graph data, where pairwise relationships between objects are given and represented by edges. Several graph kernels, including a newly proposed one, the complement Laplacian kernel, are considered for ranking the "centrality" of graph vertices. An application of graph KSD to gene data will be briefly discussed. A clustering algorithm based on KSD is also proposed, and preliminary results are promising. Given these empirical successes, a theoretical development of KSD is needed. The talk will end with the following questions:
1. What properties does the KSD possess?
2. What is the role of the kernel parameter, and how can it be chosen optimally?
3. What is the relationship between KSD and kernel density estimation?
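The sample spatial depth of a point x is 1 - ||(1/n) sum_i (x - x_i)/||x - x_i|| ||, the complement of the length of the average unit vector pointing from the data to x. Replacing x - x_i by phi(x) - phi(x_i) for a kernel feature map phi gives a kernelized version, and every inner product it needs can be written with kernel evaluations. A minimal sketch under these assumptions: the Gaussian kernel and its bandwidth are illustrative choices, and the outlier threshold is a free parameter here rather than the false-alarm bound derived in the talk.

```python
import math


def gaussian_kernel(x, y, sigma=1.0):
    """Gaussian (RBF) kernel exp(-||x - y||^2 / (2 sigma^2)); an illustrative choice."""
    d2 = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-d2 / (2.0 * sigma ** 2))


def kernelized_spatial_depth(x, data, kernel=gaussian_kernel):
    """Sample depth 1 - ||(1/n) sum_i u_i||, where u_i is the unit vector from
    phi(x_i) to phi(x) in the kernel feature space. With the linear kernel
    k(x, y) = <x, y> this reduces to the ordinary sample spatial depth."""
    kxx = kernel(x, x)
    # Feature-space distances ||phi(x) - phi(x_i)||; points coinciding with x are skipped
    terms = []
    for xi in data:
        d2 = kxx + kernel(xi, xi) - 2.0 * kernel(x, xi)
        if d2 > 1e-12:
            terms.append((xi, math.sqrt(d2)))
    total = 0.0
    for xi, di in terms:
        for xj, dj in terms:
            # <phi(x) - phi(x_i), phi(x) - phi(x_j)> expressed with kernel values
            inner = kxx - kernel(x, xj) - kernel(xi, x) + kernel(xi, xj)
            total += inner / (di * dj)
    return 1.0 - math.sqrt(max(total, 0.0)) / len(data)


def is_outlier(x, data, threshold, kernel=gaussian_kernel):
    """Flag x when its depth falls below a user-supplied (hypothetical) threshold."""
    return kernelized_spatial_depth(x, data, kernel) < threshold
```

With the linear kernel `lambda a, b: sum(p * q for p, q in zip(a, b))` and data `[(-1.0,), (1.0,)]`, the depth is 1.0 at `(0.0,)` (the two unit vectors cancel) and 0.0 at `(5.0,)` (both point the same way).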

Title: Introducing estimation methods for missing data (Yiwen Xu)
Abstract: In this presentation, I will introduce least squares analysis for complete data, followed by least squares analysis for missing data, including estimation methods (in particular, Yates' method) and methods for finding missing values. I will also introduce Bartlett's ANCOVA and its properties, the method of estimating missing values, residual sums of squares, and covariance. Finally, I will talk about least squares estimation using ANCOVA and the correct least squares estimates of standard errors.
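Yates' method chooses the missing value that minimizes the residual sum of squares. In the simplest textbook case, a randomized complete block design with t treatments, b blocks and a single missing observation, setting the derivative of the residual sum of squares to zero gives y_hat = (t*T + b*B - G) / ((t - 1)(b - 1)), where T, B and G are the observed totals for the affected treatment, the affected block and the whole experiment. A minimal sketch under that assumption only (one missing cell, rows as treatments and columns as blocks); the talk may treat more general designs, and the function name is illustrative.

```python
def yates_missing_value(table):
    """Yates' estimate of a single missing value in a randomized complete block design.

    table[i][j] is the response for treatment i in block j; exactly one entry is None.
    Returns (i, j, estimate) with estimate = (t*T + b*B - G) / ((t - 1) * (b - 1)),
    where T, B and G are the observed totals for treatment i, block j and the table.
    """
    t, b = len(table), len(table[0])
    missing = [(i, j) for i in range(t) for j in range(b) if table[i][j] is None]
    if len(missing) != 1:
        raise ValueError("this sketch handles exactly one missing value")
    i, j = missing[0]
    treatment_total = sum(v for v in table[i] if v is not None)
    block_total = sum(row[j] for row in table if row[j] is not None)
    grand_total = sum(v for row in table for v in row if v is not None)
    estimate = (t * treatment_total + b * block_total - grand_total) / ((t - 1) * (b - 1))
    return i, j, estimate
```

On exactly additive data the formula recovers the deleted value: `yates_missing_value([[0, 10], [1, None], [2, 12]])` returns `(1, 1, 11.0)`. When the filled-in table is analyzed, the error degrees of freedom should be reduced by one for the estimated cell.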

Title: Methods for coping with missing data: A comparison of Complete-case analysis and Imputation Methods (Giri Karishma)
Abstract: Missing data are a potential source of bias and a persistent problem in health care investigations. The standard approach to this problem is complete-case analysis, together with its modifications and extensions, which confines the analysis to the cases with no missing values. Imputation methods, in which the missing values are filled in, can provide better estimates. This talk will discuss some background on complete-case analysis and on imputation methods, the latter being a more flexible approach to missing data problems, and will illustrate the inefficiency of complete-case analysis compared with imputation methods.
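A small simulation, constructed here for illustration and not taken from the talk, showing one way complete-case analysis can go wrong: the response Y is missing at random given a fully observed covariate X, so the complete-case mean of Y is biased, while a single regression imputation (fit Y on X among the complete cases, then fill in fitted values) recovers the mean. The data-generating model, the missingness mechanism and the function name are all assumptions. Single imputation also understates uncertainty, which is what multiple imputation is designed to address.

```python
import random
import statistics


def simulate(n=20000, seed=2012):
    """Compare complete-case and regression-imputation estimates of E[Y] = 2.

    Hypothetical model: X ~ N(0, 1), Y = 2 + X + e with e ~ N(0, 1).
    Y is always observed when X < 0 and observed with probability 0.3 otherwise,
    so missingness depends only on the observed X (missing at random).
    """
    rng = random.Random(seed)
    xs, ys, observed = [], [], []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        xs.append(x)
        ys.append(2.0 + x + rng.gauss(0.0, 1.0))
        observed.append(x < 0 or rng.random() < 0.3)

    cc_x = [x for x, o in zip(xs, observed) if o]
    cc_y = [y for y, o in zip(ys, observed) if o]
    complete_case_mean = statistics.mean(cc_y)

    # Least-squares fit of Y on X using the complete cases only
    mx, my = statistics.mean(cc_x), statistics.mean(cc_y)
    sxy = sum((x - mx) * (y - my) for x, y in zip(cc_x, cc_y))
    sxx = sum((x - mx) ** 2 for x in cc_x)
    slope = sxy / sxx
    intercept = my - slope * mx

    # Keep observed Y values and fill in the missing ones with fitted values
    filled = [y if o else intercept + slope * x for x, y, o in zip(xs, ys, observed)]
    return complete_case_mean, statistics.mean(filled)
```

Under this mechanism the complete cases over-represent small X, so the complete-case mean falls well below 2, while the imputed mean stays close to 2.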

Title: An approach to modeling asymmetric multivariate spatial covariance structures (Bo Li)
Abstract: Motivated by the delay effect, we propose a framework for modeling the asymmetry of multivariate covariance functions that is often exhibited in real data. This general approach can endow any valid symmetric multivariate covariance function with the ability to model asymmetry, and it is very easy to implement. Our simulations and real data examples show that asymmetric multivariate covariance functions based on our approach can achieve remarkable improvements in prediction over symmetric models.

Title: Continuously Additive Models for Functional Regression (Yichao Wu)
Abstract: We propose Continuously Additive Models (CAM), an extension of additive regression models to the case of infinite-dimensional predictors, corresponding to smooth random trajectories, coupled with scalar responses. As the number of predictor times and thus the dimension of predictor vectors grows larger, properly scaled additive models for these high-dimensional vectors are shown to converge to a limit model, in which the additivity is conveyed through an integral. This defines a new type of functional regression model. In these Continuously Additive Models, the path integrals over paths defined by the graphs of the functional predictors with respect to a smooth additive surface relate the predictor functions to the responses. This is an extension of the situation for traditional additive models, where the values of the additive functions, evaluated at the predictor levels, determine the predicted response. We study prediction in this model, using tensor product basis expansions to estimate the smooth additive surface that characterizes the model. In a theoretical investigation, we show that the predictions obtained from fitting continuously additive estimators are asymptotically consistent. We also consider extensions to generalized responses. The proposed estimators are found to outperform existing functional regression approaches in simulations and in applications to human growth and yeast cell cycle data.
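The limit described above replaces a scaled additive sum (1/p) * sum_j g(t_j, X(t_j)) over predictor times t_j by the integral of g(t, X(t)) over t. A minimal sketch of the prediction step, assuming the additive surface g is already known (in the talk it is estimated with tensor product basis expansions) and that the trajectory is observed on an increasing grid in [0, 1]; the integral is approximated by the trapezoidal rule, and the function name and intercept argument are hypothetical.

```python
import math


def cam_predictor(g, times, values, intercept=0.0):
    """Approximate intercept + integral_0^1 g(t, X(t)) dt by the trapezoidal rule.

    g: the smooth additive surface, a function of (t, x); assumed known here.
    times, values: a trajectory X observed on an increasing grid in [0, 1].
    """
    if len(times) != len(values) or len(times) < 2:
        raise ValueError("need at least two matching (t, X(t)) points")
    heights = [g(t, x) for t, x in zip(times, values)]
    area = sum(
        0.5 * (heights[k] + heights[k + 1]) * (times[k + 1] - times[k])
        for k in range(len(times) - 1)
    )
    return intercept + area


if __name__ == "__main__":
    grid = [k / 1000 for k in range(1001)]
    # Surface g(t, x) = sin(pi t) * x with constant trajectory X(t) = 1:
    # the exact integral is 2 / pi.
    print(cam_predictor(lambda t, x: math.sin(math.pi * t) * x, grid, [1.0] * len(grid)))
```

Because g depends on both t and X(t), the same predictor value can enter differently at different times, which is what distinguishes this model from a linear functional regression model with integrand beta(t) * X(t).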