Non-Parametric Density Estimation Methods in Machine Learning
Non-parametric methods assume that similar inputs have similar outputs; they are also called instance-based or memory-based learning algorithms. There are four non-parametric density estimation methods:
- Histogram Estimator
- Naive Estimator
- Kernel Density Estimator (KDE)
- K-Nearest Neighbor (KNN) Estimator
Histogram Estimator
It is the oldest and most popular method for estimating a density: the input space is divided into equal-sized intervals called bins. Given the training set $X = \{x^t\}_{t=1}^{N}$, an origin $x_0$, and the bin width $h$, the histogram density estimator is:

$$\hat{p}(x) = \frac{\#\{x^t \text{ in the same bin as } x\}}{Nh}$$
The estimated density at a point depends on the number of training samples that fall into its bin. When constructing the histogram we must choose both the origin and the bin width; the position of the origin affects the estimate near the bin boundaries.
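For intuition, with purely illustrative numbers: if $N = 500$, $h = 2$, and 40 training samples fall in the same bin as $x$, then $\hat{p}(x) = 40 / (500 \cdot 2) = 0.04$.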
Python3
import numpy as np

def hist_pdf(x, data, n_bins=2, minv=None, maxv=None):
    if minv is None:
        minv = np.min(data)
    if maxv is None:
        maxv = np.max(data)
    # Bin width h
    h = (maxv - minv) / n_bins
    # Index of the bin containing x (clamped so x == maxv stays in range)
    bin_id = min(int((x - minv) / h), n_bins - 1)
    bin_minv = minv + h * bin_id
    bin_maxv = minv + h * (bin_id + 1)
    n_data = len(data)
    # Count samples in the same bin as x; the last bin includes its upper edge
    in_upper = (data <= bin_maxv) if bin_id == n_bins - 1 else (data < bin_maxv)
    y = np.sum((data >= bin_minv) & in_upper)
    # Density = relative frequency divided by bin width
    return (1.0 / h) * (y / n_data)
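As a quick cross-check (a minimal sketch with synthetic demo data), the function should agree with NumPy's built-in np.histogram when density=True:
Python3
import numpy as np

rng = np.random.default_rng(0)                  # hypothetical demo data
demo = rng.normal(loc=20, scale=5, size=500)

n_bins = 15
counts, edges = np.histogram(demo, bins=n_bins, density=True)

# hist_pdf evaluated at the bin centres should reproduce the densities
centres = (edges[:-1] + edges[1:]) / 2
est = [hist_pdf(x, demo, n_bins=n_bins) for x in centres]
print(np.allclose(counts, est))  # expected: True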
Now, let's plot the histogram density estimate.
Python3
from sklearn.datasets import fetch_california_housing
import matplotlib.pyplot as plt

# load_boston was removed in scikit-learn 1.2; the California housing
# target (median house value) serves as a comparable 1-D sample here
ds = fetch_california_housing()
data = ds['target']
# Demo histogram: evaluate the estimator on a grid over the data range
xvals = np.linspace(min(data), max(data), 100)
n_bins = 15
pdf = [hist_pdf(x, data, n_bins=n_bins) for x in xvals]
plt.xlabel('Data')
plt.ylabel('Density')
plt.title('Histogram Density Estimation')
plt.plot(xvals, pdf)
plt.show()
Output:
[Histogram Density Estimation plot]
Naive Estimator
Unlike the histogram estimator, the naive estimator needs no origin: the density at a point depends only on the neighboring training samples. Given the training set $X = \{x^t\}_{t=1}^{N}$ and the bin width $h$, the naive density estimator is:

$$\hat{p}(x) = \frac{\#\{x - h/2 < x^t \le x + h/2\}}{Nh}$$

Samples within $h/2$ to the left and right of $x$ contribute to its density.
Python3
import numpy as np

def naive_pdf(x, data, n_bins=2, minv=None, maxv=None):
    if minv is None:
        minv = np.min(data)
    if maxv is None:
        maxv = np.max(data)
    # Bin width h, derived from the data range and the number of bins
    h = (maxv - minv) / n_bins
    n_data = len(data)
    # Count samples within h/2 on either side of x (no fixed origin)
    y = np.sum((data > x - h / 2) & (data <= x + h / 2))
    # Density = relative frequency divided by window width
    return (1.0 / h) * (y / n_data)
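As an aside, the naive estimator is exactly a kernel density estimate with a rectangular ("tophat") kernel. A hedged sketch with synthetic demo data, using scikit-learn's KernelDensity: a bandwidth of h/2 spans the window [x - h/2, x + h/2], so the two printed values should match up to boundary ties.
Python3
import numpy as np
from sklearn.neighbors import KernelDensity

rng = np.random.default_rng(0)                  # hypothetical demo data
demo = rng.normal(loc=20, scale=5, size=500)

n_bins = 15
h = (demo.max() - demo.min()) / n_bins          # same h as naive_pdf derives

kde = KernelDensity(kernel='tophat', bandwidth=h / 2).fit(demo[:, None])

x = 20.0
print(naive_pdf(x, demo, n_bins=n_bins))        # naive estimate at x
print(np.exp(kde.score_samples([[x]]))[0])      # tophat KDE at x (should match)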
Now we will use the above function to plot the naive density estimate.
Python3
from sklearn.datasets import fetch_california_housing
import matplotlib.pyplot as plt

# load_boston was removed in scikit-learn 1.2; the California housing
# target serves as a comparable 1-D sample here
ds = fetch_california_housing()
data = ds['target']
# Demo naive estimator: evaluate on a grid over the data range
xvals = np.linspace(min(data), max(data), 100)
n_bins = 15
pdf = [naive_pdf(x, data, n_bins=n_bins) for x in xvals]
plt.xlabel('Data')
plt.ylabel('Density')
plt.title('Naive Density Estimation')
plt.plot(xvals, pdf)
plt.show()
Output:
[Naive Density Estimation plot]
Kernel Density Estimator (KDE)
The kernel estimator smooths the estimated probability density function (PDF) and cumulative distribution function (CDF). A kernel is simply a weighting function; the Gaussian kernel is the most popular:

$$K(u) = \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{u^2}{2}\right)$$

The kernel estimator, also called the Parzen window estimator, is:

$$\hat{p}(x) = \frac{1}{Nh} \sum_{t=1}^{N} K\!\left(\frac{x - x^t}{h}\right)$$
As $|x - x^t|$ grows, that is, as a training sample moves farther from the query point, the kernel value decreases. A distant sample therefore contributes less to the estimate than a nearby one. There are many more kernels: Gaussian, rectangular, triangular, biweight, uniform, cosine, etc.
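To make the formula concrete, here is a minimal from-scratch sketch of the Parzen window estimator with a Gaussian kernel (parzen_pdf is an illustrative name, not a library function):
Python3
import numpy as np

def parzen_pdf(x, data, h):
    # Gaussian kernel K(u) = exp(-u^2 / 2) / sqrt(2 * pi)
    u = (x - data) / h
    k = np.exp(-u ** 2 / 2) / np.sqrt(2 * np.pi)
    # Parzen window: average the kernel contributions and scale by 1/h
    return k.sum() / (len(data) * h)

scikit-learn provides the same estimator through its KernelDensity class: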
Python3
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm  # used again in the KNN example below
from sklearn.neighbors import KernelDensity

# Kernel density estimator using a Gaussian kernel
X = np.random.randn(100)
model = KernelDensity(kernel='gaussian', bandwidth=0.2)
model.fit(X[:, None])

# Evaluate the log-density on a grid and exponentiate
new_data = np.linspace(-5, 5, 1000)
density = np.exp(model.score_samples(new_data[:, None]))

# Plot the density
plt.plot(new_data, density, '-', color='red')
plt.xlabel('Data')
plt.ylabel('Density')
plt.title('Kernel Density Estimator using Gaussian kernel')
plt.show()
Output:
[KDE plot using Gaussian kernel]
K-Nearest Neighbor Estimator (KNN Estimator)
Unlike the previous methods, which fix the bin width $h$, here we fix the number of nearest neighbors $k$. The density at a sample depends on the value of $k$ and the distance to the $k$th nearest neighbor, which makes the method closely related to kernel estimation. With $d_k(x)$ the Euclidean distance from $x$ to its $k$th nearest training sample, the KNN density estimate is:

$$\hat{p}(x) = \frac{k}{2 N d_k(x)}$$
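For intuition, with purely illustrative numbers: if $N = 500$, $k = 10$, and $d_k(x) = 0.05$, then $\hat{p}(x) = 10 / (2 \cdot 500 \cdot 0.05) = 0.2$.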
Let us take an example data sample and estimate the density at a point using this method.
Note: Points marked with 'x' in the figure are the given data samples. Unlike the above estimators, we do not fix the bin size/width; instead, the estimate is driven by the value of k. We observe high density values for small k, and lower density as k increases.
[Figure: KNN density estimate at a point for different values of k]
Python3
import numpy as np
import matplotlib.pyplot as plt
from scipy.stats import norm

# KNN density estimator
gaussian = norm(loc=0.5, scale=0.2)
X = gaussian.rvs(500)
grid = np.linspace(-0.1, 1.1, 1000)
k_set = [5, 10, 20]

fig, axes = plt.subplots(3, 1, figsize=(10, 10))
for ax, K in zip(axes.flat, k_set):
    p = np.zeros_like(grid)
    n = X.shape[0]
    for i, x in enumerate(grid):
        # Distance from the grid point to every sample
        dists = np.abs(X - x)
        # d_K: distance to the K-th nearest neighbour (index K-1 after sorting)
        d_K = np.sort(dists)[K - 1]
        p[i] = (K / n) * 1 / (2 * d_K)
    ax.plot(grid, p, color='orange')
    ax.set_title(f'$k={K}$')
plt.show()
Output:
[KNN Density plots]
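For larger datasets the explicit Python loop can be avoided; here is a hedged sketch of the same computation using scikit-learn's NearestNeighbors (knn_pdf is an illustrative helper name, applied to the grid and sample X from the snippet above):
Python3
import numpy as np
from sklearn.neighbors import NearestNeighbors

def knn_pdf(x_grid, data, k):
    # d_k(x): distance from each grid point to its k-th nearest sample
    nn = NearestNeighbors(n_neighbors=k).fit(data[:, None])
    d_k, _ = nn.kneighbors(x_grid[:, None])
    # KNN density estimate: k / (2 * N * d_k(x))
    return k / (2 * len(data) * d_k[:, -1])

# e.g. p = knn_pdf(grid, X, k=10)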