**Statistics Seminar, Spring 2012**

Department of Math Sciences, IUPUI

Wednesday 12:15-1:30PM, LD 265

(* indicates that an abstract can be found below)

01/11. Hanxiang Peng, Department of Math Sciences, IUPUI

Organizing Meeting

Central Limit Theorems for Quadratic Forms of Random Vectors of Increasing Dimensions.*

01/18. Qun Lin, Department of Math Sciences, IUPUI

On Robustness of Empirical Likelihood Ratio Confidence Intervals

01/27. Cheng Ouyang, Department of Math Sciences, University of Illinois at Chicago

Some functional inequalities for stochastic differential equations driven by fractional Brownian motions.*

02/01. Shan Wang, Department of Math Sciences, IUPUI

Estimating Linear Functionals With Responses MAR.

02/08. Lingnan Li, Department of Math Sciences, IUPUI

Empirical Likelihood for Estimating Equations with Missing Values.

02/15. Frank Zou, Department of Math Sciences, IUPUI

A Spatio-Temporal Bayesian Model for Syndromic Surveillance.*

02/22. Fang Li, Department of Math Sciences, IUPUI

Testing for the equality of two autoregressive functions using quasi-residuals.*

03/02. Jie Yang, Department of Math Sciences, University of Illinois at Chicago

Classification Based on a Permanental Process*

03/07. Ziyi Yang, Department of Math Sciences, IUPUI

Analysis of Repeated Events.*

03/14. Spring Break No Seminar

03/21. Xiaofeng Shao, Department of Statistics, University of Illinois at Urbana-Champaign

Self-Normalization for Time Series*

03/30. Xin Dang, Department of Mathematics, University of Mississippi

Data Mining Methods based on Kernelized Spatial Depth*

04/04. Yiwen Xu, Department of Math Sciences, IUPUI

Introducing estimation methods for missing data*

04/11. Giri Karishma, Department of Math Sciences, IUPUI

Methods for coping with missing data: A comparison of Complete-case analysis and Imputation Methods*

04/20. Bo Li, Department of Statistics, Purdue University

An approach to modeling asymmetric multivariate spatial covariance structures*

04/27. Yichao Wu, Department of Statistics, North Carolina State University

Continuously Additive Models for Functional Regression*

05/02.

**Abstracts**

**Title:** *Central limit theorems for quadratic forms of random vectors of growing dimensions (Hanxiang Peng)*

**Abstract:** This paper provides sufficient conditions for the asymptotic normality of quadratic forms of averages of random vectors of increasing dimension
and improves on conditions found in the literature. Such results are needed in
applications of Owen's empirical likelihood when the number of constraints is
allowed to grow with the sample size. In this connection we fix a gap in the
proof of Theorem 4.1 of Hjort, McKeague and Van Keilegom (2009). We also
show how our results can be used to obtain the asymptotic distribution of the
empirical likelihood under contiguous alternatives. Joint with Anton Schick.
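For orientation, the statistics in question can be written schematically as follows (our notation, not necessarily the paper's):

```latex
% Quadratic form of an average of R^{p_n}-valued random vectors,
% with the dimension p_n allowed to grow with the sample size n.
Q_n \;=\; n\,\bar X_n^{\top} A_n \bar X_n,
\qquad
\bar X_n \;=\; \frac{1}{n}\sum_{i=1}^{n} X_{ni},
\quad X_{ni}\in\mathbb{R}^{p_n},\ p_n\to\infty .
```

The relevance to empirical likelihood is that, with a growing number of constraints, minus twice the log of Owen's empirical likelihood ratio behaves asymptotically like such a quadratic form, so its limiting behavior rests on a central limit theorem of exactly this kind.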

**Title:** *Some functional inequalities for stochastic differential equations driven by fractional Brownian motions (Cheng Ouyang)*

**Abstract:** The concentration of measure phenomenon and logarithmic Sobolev inequalities are closely related. In this talk, I will present some recent results in this direction for stochastic differential equations
(SDEs) driven by fractional Brownian motions. In particular, as a consequence of
the concentration property, we obtain a Gaussian upper bound for the density of the solution to such SDEs. The presentation is based on joint work with F. Baudoin and S. Tindel.
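As generic background on the connection between the two topics (a textbook-style statement, not the talk's specific result), a logarithmic Sobolev inequality yields Gaussian concentration via Herbst's argument:

```latex
% Log-Sobolev inequality for a measure mu with constant C:
\operatorname{Ent}_{\mu}(f^{2}) \;\le\; C \int |\nabla f|^{2}\, d\mu .
% Herbst's argument then gives, for any Lipschitz functional F
% (up to constants depending on the normalization of the LSI),
\mu\bigl(|F - \mathbb{E}_{\mu}F| \ge r\bigr)
\;\le\; 2\exp\!\Bigl(-\tfrac{r^{2}}{2C\,\|F\|_{\mathrm{Lip}}^{2}}\Bigr),
```

and concentration estimates of this Gaussian type are the mechanism behind Gaussian upper bounds on densities such as the one mentioned in the abstract.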

**Title:** *A Spatio-Temporal Bayesian Model for Syndromic Surveillance (Jian Zhou)*

**Abstract:** Syndromic surveillance uses syndrome data (a specific collection of
clinical symptoms) for early detection of infectious disease outbreaks and
bioterrorist attacks. In this talk, we propose an inference model for determining
the location of outbreaks of epidemics in a network of nodes. The model is
deliberately epidemiological: it processes daily counts from the counties in
order to infer when an outbreak distinguishable from background counts is
present. The methodology incorporates Gaussian Markov random field (GMRF) and
spatio-temporal conditional autoregressive (CAR) modeling. It has some nice
features, including timely detection of outbreaks, inference that is robust to
model misspecification, reasonable prediction performance, and attractive
analytical and visualization tools to assist public health authorities in risk
assessment. Based on extensive simulation studies and synthetic data generated
from a dynamic SIR model, we demonstrate that the model captures outbreaks
rapidly while still limiting false positives.
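Since the validation relies on synthetic counts from a dynamic SIR model, here is a minimal illustrative simulator (not the speaker's code; all parameter values are invented) producing the kind of daily count series such a surveillance model would monitor:

```python
# Minimal sketch: synthetic daily new-infection counts from a discrete-time
# stochastic (chain-binomial) SIR model. Population size, transmission rate
# beta, recovery rate gamma, and horizon are illustrative assumptions.
import random

def simulate_sir(n=10000, i0=5, beta=0.4, gamma=0.2, days=60, seed=42):
    """Return a list of daily new-infection counts from a chain-binomial SIR."""
    rng = random.Random(seed)
    s, i = n - i0, i0
    daily_new = []
    for _ in range(days):
        p_inf = 1.0 - (1.0 - beta / n) ** i   # per-susceptible infection prob.
        new_inf = sum(rng.random() < p_inf for _ in range(s))
        recov = sum(rng.random() < gamma for _ in range(i))
        s -= new_inf
        i += new_inf - recov
        daily_new.append(new_inf)
    return daily_new

counts = simulate_sir()
```

Feeding such county-level series (plus background noise) into a detector is the usual way to measure detection delay and false-positive rates.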

**Title:** *Testing for the equality of two autoregressive functions using quasi-residuals (Fang Li)*

**Abstract:** This talk discusses the problem of testing the equality of two
nonparametric autoregressive functions against one-sided alternatives. The
heteroscedastic errors and stationary densities of the two independent
strong-mixing, strictly stationary time series may differ. The paper adapts the
idea of using a sum of quasi-residuals to construct the test and derives its
asymptotic null distribution. The paper also shows that the test is consistent
for general alternatives and obtains its limiting distributions under a sequence
of local alternatives. A Monte Carlo simulation is then conducted to study the
finite-sample level and power behavior of the test at some alternatives. We also
compare the test to an existing lag-matched test, both theoretically and by
Monte Carlo experiments.
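To fix ideas, here is a rough sketch of the quasi-residual device (our simplified version; the paper's exact estimator, standardization, and bandwidth rules differ): estimate the autoregressive function from one series, form residuals of the second series against that estimate, and reject for large standardized sums.

```python
# Hedged sketch, not the paper's construction: quasi-residuals for comparing
# two autoregressive functions m1 and m2 in Y_t = m(Y_{t-1}) + error.
# The Gaussian kernel and the bandwidth h are illustrative assumptions.
import math

def nw_estimate(x, xs, ys, h):
    """Nadaraya-Watson estimate of m(x) from lagged pairs (xs, ys)."""
    w = [math.exp(-0.5 * ((x - xi) / h) ** 2) for xi in xs]
    sw = sum(w)
    return sum(wi * yi for wi, yi in zip(w, ys)) / sw if sw > 0 else 0.0

def quasi_residual_stat(series1, series2, h=0.5):
    """Standardized sum of quasi-residuals e_t = Y2_t - m1_hat(Y2_{t-1})."""
    xs, ys = series1[:-1], series1[1:]          # lagged pairs from series 1
    resid = [y - nw_estimate(x, xs, ys, h)
             for x, y in zip(series2[:-1], series2[1:])]
    n = len(resid)
    mean = sum(resid) / n
    var = sum((r - mean) ** 2 for r in resid) / n
    return math.sqrt(n) * mean / math.sqrt(var)
```

Under the null m1 = m2 the quasi-residuals are centered, so large positive values of the statistic indicate the one-sided alternative m2 > m1.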

**Title:** *Classification Based on a Permanental Process (Jie Yang)*

**Abstract:** In this talk we introduce a doubly
stochastic marked point process model for supervised classification problems.
Regardless of the number of classes or the dimension of the feature space, the
model requires only two or three parameters for the covariance function. The model is
effective even if the feature region occupied by one class is a patchwork
interlaced with regions occupied by other classes. The classification criterion
involves a permanental ratio for which an approximation using a polynomial-time
cyclic expansion is proposed. Applications to DNA microarray analysis and
protein classifications indicate that the cyclic approximation is effective even
for high-dimensional data. It can employ feature variables in an efficient way
to reduce the prediction error significantly. This is critical when the true
classification relies on non-reducible high-dimensional features.
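For context, the matrix permanent is #P-hard to compute exactly, which is why the polynomial-time cyclic approximation proposed in the talk matters. For small matrices it can still be evaluated exactly, for instance by Ryser-style inclusion-exclusion (illustrative code, not the talk's method):

```python
# Exact permanent of a small square matrix via Ryser's inclusion-exclusion
# formula. Exponential in n, so feasible only for small matrices; this is
# the cost the cyclic expansion in the talk is designed to avoid.
from itertools import combinations

def permanent(a):
    """per(A) = (-1)^n * sum over column subsets S of (-1)^|S| prod_i sum_{j in S} a_ij."""
    n = len(a)
    total = 0.0
    for k in range(1, n + 1):
        for cols in combinations(range(n), k):
            prod = 1.0
            for row in a:
                prod *= sum(row[j] for j in cols)
            total += (-1) ** k * prod
    return (-1) ** n * total
```

For example, `permanent([[1, 2], [3, 4]])` is 1*4 + 2*3 = 10, matching the definition as the "determinant without signs".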

**Title:** *Analysis of Repeated Events (Ziyi Yang)*

**Abstract:** Events that may occur repeatedly for individual subjects are of
interest in many medical studies. We review methods of analysis for repeated
events, emphasizing that the approach taken in a given study should allow
clinical questions to be addressed as directly as possible. Methods based on
full models for event processes as well as on simpler 'marginal' assumptions are
considered. We apply various methods of analysis to studies involving pulmonary
exacerbations in persons with cystic fibrosis, and the occurrence of bone
metastases and skeletal events in cancer patients, respectively. Most of the
methodology considered can be implemented with existing software.
This talk is based on R.J. Cook and J.F. Lawless. Analysis of Repeated Events.
*Statistical Methods in Medical Research*. 2002; 11: 141-166.

**Title:** *Self-Normalization for Time Series (Xiaofeng Shao)*

**Abstract:** In the inference of time series (e.g. hypothesis testing
and confidence interval construction), one often needs to obtain a consistent estimate for the asymptotic covariance
matrix of a statistic. Or the inference can be conducted by using resampling (e.g. moving block bootstrap) and subsampling
techniques. What is common for almost all the existing methods is that they involve the selection of a smoothing
parameter. Some rules have been proposed to choose the smoothing parameter, but they may involve another user-chosen number,
or assume a parametric model. In this talk, we introduce the so-called self-normalized (SN) approach in the context of confidence
interval construction and change-point detection. The self-normalized statistic does not involve any smoothing parameter, and its limiting
distribution is free of nuisance parameters. The finite-sample performance of the SN approach is evaluated in simulated and real data examples.
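As a concrete illustration of the idea (our notation, a simplified version of the statistic rather than the speaker's code), the SN statistic for the mean of a time series normalizes by recursive partial-sample means instead of a bandwidth-dependent long-run variance estimate:

```python
# Sketch of a self-normalized statistic for H0: mean = mu0, in the spirit
# of the SN approach: no bandwidth or block length is chosen; the limiting
# null distribution is nonstandard but free of nuisance parameters.

def sn_statistic(x, mu0):
    """T_n = n*(xbar - mu0)^2 / W_n^2 with W_n^2 built from partial means."""
    n = len(x)
    xbar = sum(x) / n
    # W_n^2 = n^{-2} * sum_t t^2 * (xbar_t - xbar)^2, xbar_t = mean of x[:t]
    s, w2 = 0.0, 0.0
    for t, xt in enumerate(x, start=1):
        s += xt
        w2 += (t * (s / t - xbar)) ** 2
    w2 /= n ** 2
    return n * (xbar - mu0) ** 2 / w2
```

The normalizer W_n inherits the same unknown scale as the numerator, so the scale cancels in the ratio; that cancellation is what removes the need for consistent long-run variance estimation.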

**Title:** *Data Mining Methods based on Kernelized Spatial Depth (Xin Dang)*

**Abstract:** Statistical depth functions provide a center-outward ordering of
points with respect to a distribution or a data set in high dimensions. Of the
various depth notions, the spatial depth is appealing because of its
computational efficiency. However, it tends to produce circular contours and
fails to capture the underlying probabilistic geometry well outside the family
of spherically symmetric distributions. We propose a novel depth, the
kernelized spatial depth (KSD), which generalizes the spatial depth via positive
definite kernels. By choosing a proper kernel, the KSD captures the local
structure of data where the spatial depth fails. Based on KSD,
a simple outlier detector is proposed, by which an observation with a depth
value less than a threshold is declared an outlier. Upper bounds on the false
alarm probability are derived and used to determine the threshold. KSD is
extended to graph data, where pairwise relationships of objects are given and
represented by edges. Several graph kernels, including a newly proposed one, the
complement Laplacian kernel, are considered for ranking the "centrality" of
graph vertices. An application of graph KSD to gene data will be briefly
discussed. A clustering algorithm based on KSD is also proposed; preliminary
results show it to be promising. Given these successes in applications,
theoretical development of KSD is called for. The talk will end with some open
questions:

1. What properties does the KSD possess?

2. What is the role of the parameter in the kernel? How can it be chosen optimally?

3. What is the relationship between KSD and kernel density estimation?
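For reference, the plain (non-kernelized) sample spatial depth that KSD generalizes can be sketched as follows; KSD replaces the Euclidean geometry below with the distance induced by a positive definite kernel (illustrative code, not the authors'):

```python
# Sample spatial depth in R^d: 1 minus the norm of the average of the unit
# vectors pointing from the data points toward x. Values near 1 mean x is
# central; values near 0 mean x is outlying.
import math

def spatial_depth(x, data):
    """Sample spatial depth of point x with respect to a list of points."""
    d = len(x)
    avg = [0.0] * d
    m = 0
    for xi in data:
        diff = [a - b for a, b in zip(x, xi)]
        norm = math.sqrt(sum(c * c for c in diff))
        if norm == 0:
            continue                     # skip data points identical to x
        m += 1
        for j in range(d):
            avg[j] += diff[j] / norm
    avg = [a / m for a in avg]
    return 1.0 - math.sqrt(sum(a * a for a in avg))
```

At the center of a symmetric cloud the unit vectors cancel and the depth is 1; for an outlier they all point the same way and the depth drops toward 0, which is exactly the property the KSD-based outlier detector thresholds.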

**Title:** *Introducing estimation methods for missing data (Yiwen Xu)*

**Abstract:** In this presentation, I will introduce the least squares analysis
for complete data, followed by the least squares analysis for missing data,
including estimation methods (in particular, Yates's method) and methods for
finding missing values. I will also introduce Bartlett's ANCOVA and its
properties, the method of estimating missing values, residual sums of squares,
and covariance. Finally, I will talk about least squares estimation using ANCOVA
and the correct least squares estimates of standard errors.
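As a toy illustration of the iterative idea behind Yates's method (invented data, simplified additive two-way layout): fill the missing cell with a guess, fit the additive model by least squares, replace the cell by its fitted value, and repeat until the fill stabilizes.

```python
# Toy sketch of iterative missing-value estimation in a two-way
# (block x treatment) table under an additive model. The table and the
# fixed iteration count are illustrative assumptions.

def yates_fill(table, miss, iters=50):
    """table: list of rows; miss: (i, j) index of the single missing cell."""
    i0, j0 = miss
    r, c = len(table), len(table[0])
    x = [row[:] for row in table]
    x[i0][j0] = 0.0                       # initial guess for the missing cell
    for _ in range(iters):
        grand = sum(map(sum, x)) / (r * c)
        row_eff = [sum(x[i]) / c - grand for i in range(r)]
        col_eff = [sum(x[i][j] for i in range(r)) / r - grand for j in range(c)]
        x[i0][j0] = grand + row_eff[i0] + col_eff[j0]   # refit, refill
    return x[i0][j0]
```

On an exactly additive table such as [[1, 2], [3, 4]] with the (1, 1) cell treated as missing, the iteration converges to 4, the value consistent with the additive fit, matching what Yates's closed-form expression gives for a single missing plot.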

**Title:** *Continuously Additive Models for Functional Regression (Yichao Wu)*

**Abstract:** We propose Continuously Additive Models (CAM), an extension of
additive regression models to the case of infinite-dimensional predictors,
corresponding to smooth random trajectories, coupled with scalar responses. As
the number of predictor times and thus the dimension of predictor vectors grows
larger, properly scaled additive models for these high-dimensional vectors are
shown to converge to a limit model, in which the additivity is conveyed through
an integral. This defines a new type of functional regression model. In these
Continuously Additive Models, the path integrals over paths defined by the
graphs of the functional predictors with respect to a smooth additive surface
relate the predictor functions to the responses. This is an extension of the
situation for traditional additive models, where the values of the additive
functions, evaluated at the predictor levels, determine the predicted response.
We study prediction in this model, using tensor product basis expansions to
estimate the smooth additive surface that characterizes the model. In a
theoretical investigation, we show that the predictions obtained from fitting
continuously additive estimators are asymptotically consistent. We also consider
extensions to generalized responses. The proposed estimators are found to
outperform existing functional regression approaches in simulations and in
applications to human growth and yeast cell cycle data.
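Schematically (our notation), the limit model described above replaces the finite sum of a traditional additive model with an integral:

```latex
% Traditional additive model for a p-vector predictor Z:
\mathbb{E}[Y \mid Z] \;=\; \sum_{j=1}^{p} f_j(Z_j).
% Continuously additive model for a functional predictor X(t), t \in T:
\mathbb{E}[Y \mid X] \;=\; \int_{T} g\bigl(t, X(t)\bigr)\, dt ,
% with the smooth additive surface g estimated via a tensor product
% basis expansion  g(t, x) \approx \sum_{j,k} c_{jk}\, B_j(t)\, B_k(x).
```

Discretizing the predictor trajectory at p time points recovers a (scaled) additive model in p variables, which is the sense in which CAM is the continuum limit of additive regression.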