# Nonparametric Models

What is a nonparametric model? A linear regression model has parameters: the coefficients of the covariates and possibly an intercept. A nonparametric model looks like

$Y=g(T)+\epsilon, \qquad (*)$

where $T$ is a univariate covariate, $Y$ is the response, and $\epsilon$ is the error. Model $(*)$ is said to be infinite-dimensional because the possible solutions for the unknown function $g(\cdot)$ lie in an infinite-dimensional space. There are several methods for fitting $g(\cdot)$; we will introduce two of them, kernel estimation and local polynomial estimation.

First, suppose we have a sample $\{(T_i,Y_i),\ i=1,\cdots,n\}$ from model $(*)$.

## 1) Kernel estimation

For any $t$, we estimate $g(t)$ by a weighted average of the $Y_i$ whose corresponding $T_i$ are near $t$, assigning larger weights to nearer $T_i$'s than to farther ones. Let $K(\cdot)$ be a probability density function (pdf) with bounded support, say $[a,b]$. In practice, we often choose a symmetric $K(\cdot)$ with $a=-1$, $b=1$. Let $h>0$ be a bandwidth and $K_h(\cdot)=K(\cdot/h)/h$. With bandwidth $h$, we approximate $g(t)$ by

$\hat{g}(t)=\frac{\sum\limits_{i=1}^nK_h(T_i-t)Y_i}{\sum\limits_{i=1}^nK_h(T_i-t)}.$

The bandwidth $h$ controls the size of the "neighborhood" of $t$ that enters the weighted average. It can be shown that $\hat{g}(t)$ is a good estimator if the bandwidth $h$ is properly selected. How do we select a proper bandwidth?
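As a rough sketch, the estimator above might be implemented as follows; the Epanechnikov kernel, the simulated data, and the function names are my own assumptions for illustration, not part of the text:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], 0 elsewhere."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def kernel_estimate(t, T, Y, h, K=epanechnikov):
    """Kernel estimate of g(t): weighted average of Y_i with
    weights K_h(T_i - t) = K((T_i - t)/h) / h."""
    w = K((T - t) / h) / h
    return np.sum(w * Y) / np.sum(w)

# simulated sample from model (*) with g(t) = sin(2*pi*t)
rng = np.random.default_rng(0)
T = rng.uniform(0, 1, 500)
Y = np.sin(2 * np.pi * T) + rng.normal(0, 0.1, 500)

g_hat = kernel_estimate(0.25, T, Y, h=0.1)  # true g(0.25) = 1
```

With a reasonable bandwidth the estimate tracks the true curve closely away from the boundary of the design.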

Define $MSE(h)=\sum\limits_{i=1}^n(Y_i-\hat{g}(T_i))^2$; we then choose $h$ by

$\hat{h}=\mathop{\mathrm{argmin}}_h MSE(h).$

Minimizing $MSE(h)$ with respect to $h$ analytically is challenging because the kernel function may be very complicated, so instead we select an empirically optimal bandwidth. This criterion of minimizing $MSE(h)$ is the Cross-Validation rule, but it has a problem: if $h$ is so small that for each $T_i$ there is only one sample point in the "neighborhood", namely $T_i$ itself, then $MSE(h)$ is 0, which is not what we want. So in practice, for bandwidth selection we use the delete-one (leave-one-out) version of Cross-Validation: when estimating $g(T_i)$, we do not use the observation at $T_i$. By results in the literature on kernel estimation, the optimal bandwidth $h$ is of order $n^{-1/5}$, so we can search over a grid of bandwidths of order $n^{-1/5}$ to select an empirically optimal bandwidth $\hat{h}$. Another criterion for bandwidth selection is to minimize $MISE(h)= E[\int{(\hat{g}(t)-g(t))^2}dt\,|\,T_1,\cdots,T_n]$. Interested readers can refer to papers such as "Optimal bandwidth selection for local linear regression" by Li-Shan Huang and Jianqing Fan.
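The delete-one Cross-Validation search over a grid of order $n^{-1/5}$ can be sketched as below; the grid endpoints, kernel, and simulated data are my own assumptions:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], 0 elsewhere."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def loo_cv_score(h, T, Y, K=epanechnikov):
    """Delete-one CV: predict each Y_i from all points except (T_i, Y_i)."""
    sse = 0.0
    for i in range(len(T)):
        w = K((np.delete(T, i) - T[i]) / h) / h
        denom = w.sum()
        if denom == 0:  # no other point within the bandwidth: reject this h
            return np.inf
        sse += (Y[i] - (w * np.delete(Y, i)).sum() / denom) ** 2
    return sse

# simulated sample from model (*) with g(t) = sin(2*pi*t)
rng = np.random.default_rng(1)
n = 200
T = rng.uniform(0, 1, n)
Y = np.sin(2 * np.pi * T) + rng.normal(0, 0.2, n)

# grid of candidate bandwidths around the n^(-1/5) rate
grid = np.linspace(0.2, 3.0, 15) * n ** (-0.2)
h_hat = grid[np.argmin([loo_cv_score(h, T, Y) for h in grid])]
```

The selected $\hat{h}$ is the grid point with the smallest leave-one-out prediction error.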

## 2) Local polynomial estimation

Suppose $t_0$ is in a neighborhood of $t$; then we can approximate $g(t_0)$ by the Taylor expansion of $g(\cdot)$ at $t$:

$g(t_0)\approx g(t)+\frac{g'(t)}{1!}(t_0-t)+\frac{g''(t)}{2!}(t_0-t)^2+\cdots+\frac{g^{(p)}(t)}{p!}(t_0-t)^p.$

Set $\beta(t)=(\beta_0(t),\cdots,\beta_p(t))^T=(g(t),g'(t),\cdots,g^{(p)}(t))^T$. To estimate $g(t)$, we minimize the weighted least-squares criterion

$\sum\limits_{i=1}^n(Y_i-V_i^T\beta)^2K_h(T_i-t),$

where $V_i=(1,T_i-t,\cdots,(T_i-t)^p/p!)^T$ and $K_h(\cdot)$ is a kernel function with bandwidth $h$. Using matrix notation, we get the estimator

$\hat{\beta}(t)=\mathop{\mathrm{argmin}}_{\beta}(Y-V\beta)^TW(Y-V\beta),$

where $Y=(Y_1,\cdots,Y_n)^T$, $V=(V_1,\cdots,V_n)^T$, and $W=\mathrm{diag}(K_h(T_1-t),\cdots,K_h(T_n-t))$. So

$\hat{\beta}(t)=(V^TWV)^{-1}V^TWY.$

We can now estimate $g(t)$ and its derivatives at any given $t$ by local polynomial estimation. In practice, local linear estimation ($p=1$) is the most widely used.
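For $p=1$ the closed form $(V^TWV)^{-1}V^TWY$ reduces to a 2-by-2 weighted least-squares problem, with $V_i=(1,\,T_i-t)^T$ and $\hat{\beta}=(\hat{g}(t),\hat{g}'(t))^T$. A minimal sketch, where the kernel choice, simulated data, and function names are my own assumptions:

```python
import numpy as np

def epanechnikov(u):
    """Epanechnikov kernel: 3/4 * (1 - u^2) on [-1, 1], 0 elsewhere."""
    return np.where(np.abs(u) <= 1, 0.75 * (1 - u**2), 0.0)

def local_linear(t, T, Y, h, K=epanechnikov):
    """Local linear fit (p=1): solve (V^T W V) beta = V^T W Y
    with V_i = (1, T_i - t). Returns (g(t), g'(t)) estimates."""
    V = np.column_stack([np.ones_like(T), T - t])
    w = K((T - t) / h) / h
    VtW = V.T * w  # V^T W without forming the diagonal matrix W
    return np.linalg.solve(VtW @ V, VtW @ Y)

# simulated sample from model (*) with g(t) = sin(2*pi*t)
rng = np.random.default_rng(2)
T = rng.uniform(0, 1, 500)
Y = np.sin(2 * np.pi * T) + rng.normal(0, 0.1, 500)

# true values at t = 0.25: g = 1, g' = 0
g_hat, g_prime_hat = local_linear(0.25, T, Y, h=0.1)
```

Note that the same call returns a derivative estimate for free, which is one reason local linear fitting is preferred over the plain kernel average.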

As in kernel estimation, bandwidth selection is a practical problem, and the delete-one version of Cross-Validation can be used. Notice that if one's main interest in a semiparametric model is to estimate the parametric component, then bandwidth selection is not very important; a reasonable bandwidth of order $n^{-1/5}$ may be good enough.

Note: 1. Kernel estimation is actually local polynomial estimation with $p=0$.

2. Some commonly used kernels:

1) the boxcar kernel: $K(x)=\frac{1}{2}I(x)$,

2) the Gaussian kernel: $K(x)=\frac{1}{\sqrt{2\pi}}e^{-x^2/2}$,

3) the Epanechnikov kernel: $K(x)=\frac{3}{4}(1-x^2)I(x)$,

4) the tricube kernel: $K(x)=\frac{70}{81}(1-|x|^3)^3I(x)$,

where

$I(x)=\begin{cases}1,&\text{if } |x|\le 1,\\0,&\text{if } |x|>1.\end{cases}$
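A quick sketch of these four kernels in Python, numerically checking that each is a pdf (nonnegative and integrating to 1); the grid and dictionary layout are my own:

```python
import numpy as np

def indicator(x):
    """I(x) = 1 if |x| <= 1, else 0."""
    return (np.abs(x) <= 1).astype(float)

# the four kernels listed above
kernels = {
    "boxcar":       lambda x: 0.5 * indicator(x),
    "gaussian":     lambda x: np.exp(-x**2 / 2) / np.sqrt(2 * np.pi),
    "epanechnikov": lambda x: 0.75 * (1 - x**2) * indicator(x),
    "tricube":      lambda x: (70 / 81) * (1 - np.abs(x)**3) ** 3 * indicator(x),
}

# Riemann-sum check that each kernel integrates to 1
x = np.linspace(-5, 5, 200001)
dx = x[1] - x[0]
integrals = {name: K(x).sum() * dx for name, K in kernels.items()}
```

The $\frac{70}{81}$ factor in the tricube kernel is exactly the normalizing constant that makes $\int_{-1}^{1}(1-|x|^3)^3\,dx$ integrate to 1.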