# Kernel density estimation

KDE (*kernel density estimation*) is a method for estimating an unknown probability density function from observed data. Among its applications is the "heat map" visualization of the whereabouts of team players during a soccer game.

It is a non-parametric estimation method, proposed by *Murray Rosenblatt* (1956) and *Emanuel Parzen* (1962), and is also known as the *Parzen window* method.

Let us now build an understanding of *kernel density estimation*.

## Principles

### General Algorithm Introduction

*Kernel density estimation* (KDE) is a non-parametric method for estimating a probability density function. It fits a smooth, peaked function, called a kernel, to each observed data point and sums the contributions, thereby approximating the true probability distribution curve. \[ \widehat{f}_h\left( x \right) = \frac{1}{n}\sum_{i=1}^{n} K_h \left( x-x_i \right) = \frac{1}{nh}\sum_{i=1}^{n} K \left( \frac{x-x_i}{h} \right) \] Here \(K\) is the kernel function and \(h > 0\) is a smoothing parameter, called the bandwidth or window. \(K_h\left(x\right) = \frac{1}{h} K\left(\frac{x}{h}\right)\) is the scaled kernel function.
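As a concrete illustration, the estimator above can be sketched in a few lines of Python (a minimal example, not part of the interactive module; the Gaussian kernel and the sample values are arbitrary choices):

```python
import math

def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2*pi)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2 * math.pi)

def kde(x, data, h):
    """Evaluate the kernel density estimate at a single point x.

    Implements f_h(x) = 1/(n*h) * sum_i K((x - x_i) / h).
    """
    n = len(data)
    return sum(gaussian_kernel((x - xi) / h) for xi in data) / (n * h)

# Example: estimate the density of a small (made-up) sample at a few points.
sample = [-1.2, -0.5, 0.0, 0.3, 1.1]
for x in (-1.0, 0.0, 1.0):
    print(f"f_h({x:+.1f}) = {kde(x, sample, h=0.5):.4f}")
```

Because each kernel integrates to 1 and the estimate is their average, the resulting curve \(\widehat{f}_h\) also integrates to 1, as a density must.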

**1. Kernel function**

In theory, any smooth, peaked function can be used as the kernel of KDE, as long as it is normalized, i.e. the area under its curve equals 1: \[ \int_{\mathbb{R}} K\left(x\right) dx = 1 \] The sigmoid, Gaussian and rectangle kernels are shown below: \[ K_{\text{Sigmoid}} \left( x \right) = \frac{2}{\pi}\,\frac{1}{e^{x}+e^{-x}} \] \[ K_{\text{Gaussian}} \left( x \right) = \frac{1}{\sqrt{2\pi}}\, e^{ -\frac{1}{2} x^2 } \] \[ K_{\text{Rectangle}} \left( x \right)= \begin{cases} \frac{1}{2}, & \mbox{if } \left|x\right|\le 1 \\ 0, & \mbox{otherwise } \end{cases} \]
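We can check the normalization condition numerically. The sketch below integrates each of the three kernels with a crude trapezoidal rule (the integration range and step count are arbitrary choices; each result should come out close to 1):

```python
import math

def sigmoid_kernel(x):
    # Sigmoid kernel: K(x) = (2 / pi) * 1 / (e^x + e^{-x})
    return (2 / math.pi) / (math.exp(x) + math.exp(-x))

def gaussian_kernel(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def rectangle_kernel(x):
    return 0.5 if abs(x) <= 1 else 0.0

def integrate(f, a=-20.0, b=20.0, n=100_000):
    """Crude trapezoidal integration of f over [a, b]."""
    h = (b - a) / n
    total = 0.5 * (f(a) + f(b))
    for i in range(1, n):
        total += f(a + i * h)
    return total * h

for name, k in [("sigmoid", sigmoid_kernel),
                ("gaussian", gaussian_kernel),
                ("rectangle", rectangle_kernel)]:
    print(f"{name:9s}: integral = {integrate(k):.4f}")
```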

**2. Bandwidth**

The bandwidth controls the overall smoothness of the KDE curve, that is, how strongly each observed data point shapes the resulting curve. The larger the bandwidth, the flatter and smoother the curve, and the less each individual data point stands out; the smaller the bandwidth, the steeper the curve. It turns out that choosing the bandwidth is the most difficult step in creating a good kernel density estimate that captures the underlying distribution of the variable. Bandwidth selection is a complicated topic that is better addressed in a more advanced book or paper, but here are some useful guidelines:

\((1)\) A small \(h\) results in a small standard deviation, and the kernel places most of the probability on the datum. Use this when the sample size is large and the data are tightly packed.

\((2)\) A large \(h\) results in a large standard deviation, and the kernel spreads more of the probability from the datum to its neighbouring values. Use this when the sample size is small and the data are sparse.

The performance of \(h\) is generally measured by the \(MISE\) (Mean Integrated Squared Error), the expected integrated squared distance between the estimate and the true density: \[ MISE\left( h \right)=\mathbb{E}\left[ \int \left( \widehat{f}_h\left( x \right) - f\left( x \right) \right)^2 dx \right] \]
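In practice the true density \(f\) is unknown, so the MISE cannot be computed directly. One common practical default, due to Silverman (1986, cited below) but not prescribed by the text itself, is the rule-of-thumb bandwidth \(h = 1.06\,\hat\sigma\,n^{-1/5}\). A minimal sketch:

```python
import math
import random

def silverman_bandwidth(data):
    """Silverman's rule-of-thumb bandwidth h = 1.06 * sigma * n^(-1/5).

    A heuristic that is optimal (in MISE) when the data are roughly
    Gaussian; it is only one possible default, not a universal choice.
    """
    n = len(data)
    mean = sum(data) / n
    sigma = math.sqrt(sum((x - mean) ** 2 for x in data) / (n - 1))
    return 1.06 * sigma * n ** (-1 / 5)

# Example on a synthetic standard-normal sample.
random.seed(0)
data = [random.gauss(0, 1) for _ in range(500)]
print(f"rule-of-thumb h = {silverman_bandwidth(data):.3f}")
```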

**3. 2-D KDE**

The two-dimensional kernel density estimation uses the principle of kernel density estimation to plot the density of points on the two-dimensional plane. It is a method to express probability density distribution through pixels.

In principle, 2-D kernel density estimation is the extension of the 1-D case. The dataset moves from an axis to a plane, so each data point carries two-dimensional information, and we need another way to present the density: the value of each pixel. In 2-D KDE, the distance \(dist_i\) from the evaluation point to the \(i\)-th data point replaces the single offset used in the 1-D case. \[ \widehat{f}_h\left( x, y \right) = \frac{1}{nh^2}\sum_{i=1}^{n} K_0 \left( \frac{dist_i}{h} \right) \] The principles and evaluation methods for the kernel function and the bandwidth are the same as in the 1-D case. We provide an interactive module for this part.
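The 2-D estimator can be sketched along the same lines as the 1-D one. Here \(K_0\) is taken to be the radially symmetric 2-D Gaussian kernel (an assumption for illustration; the interactive module may use a different kernel), and the estimate is printed on a small pixel grid, as in a heat-map visualization:

```python
import math

def kde_2d(x, y, points, h):
    """2-D KDE at (x, y): f_h(x, y) = 1/(n*h^2) * sum_i K0(dist_i / h).

    K0(u) = exp(-u^2 / 2) / (2*pi) is the radial 2-D Gaussian kernel;
    `points` is a list of (xi, yi) observations and dist_i the
    Euclidean distance from (x, y) to the i-th observation.
    """
    n = len(points)
    total = 0.0
    for xi, yi in points:
        dist = math.hypot(x - xi, y - yi)
        total += math.exp(-0.5 * (dist / h) ** 2) / (2 * math.pi)
    return total / (n * h * h)

# Evaluate on a coarse 3x3 "pixel" grid around three made-up points.
pts = [(0.0, 0.0), (1.0, 0.5), (-0.5, 1.0)]
for gy in (1.0, 0.0, -1.0):
    print(" ".join(f"{kde_2d(gx, gy, pts, h=0.8):.3f}"
                   for gx in (-1.0, 0.0, 1.0)))
```

Pixels near the data points receive larger values, which is exactly what the heat-map coloring encodes.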

## Hands-On

In the *Data* tab, you can draw data points, and we also provide three data sets: 3 cluster, Smiley and Grid 15 clusters.

Random noise and random clusters dataset are also available.

In the *Kernel Density Estimation* tab, you can choose from four kernel functions, which are: *Epanechnikov*, *Gaussian*, *Silverman K2* and *Silverman K3*. The graph below shows the shape of each function.

Search radius can be set manually, but the default is 30.

## References

- Murray Rosenblatt: *Remarks on Some Nonparametric Estimates of a Density Function*, 1956
- Emanuel Parzen: *On Estimation of a Probability Density Function and Mode*, 1962
- Bernard Silverman: *Density Estimation for Statistics and Data Analysis*, 1986
- Wikipedia: *Kernel density estimation*, July 2020
- Wikipedia: *Multivariate kernel density estimation*, July 2020