python - Relation between sigma and bandwidth in gaussian_filter and gaussian_kde
Applying the functions scipy.ndimage.filters.gaussian_filter and scipy.stats.gaussian_kde to a given set of data can give very similar results if the sigma and bw_method parameters of each function, respectively, are chosen adequately.
For example, for a random 2D distribution of points I can obtain the following plots by setting sigma=2. in gaussian_filter (left plot) and bw_method=sigma/30. in gaussian_kde (right plot):

(The MWE is at the bottom of the question.)
There is obviously a relation between these parameters, since one applies a Gaussian filter and the other a Gaussian kernel density estimator to the data.
The definition of each parameter is:
- scipy.ndimage.filters.gaussian_filter, sigma:
sigma : scalar or sequence of scalars. Standard deviation for the Gaussian kernel. The standard deviations of the Gaussian filter are given for each axis as a sequence, or as a single number, in which case it is equal for all axes.
This one I can understand, given the definition of the Gaussian operator:
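For reference, the usual isotropic two-dimensional Gaussian with standard deviation sigma is commonly written as follows (this is the textbook form, assumed here, with x and y measured from the kernel centre):

```latex
G(x, y) = \frac{1}{2\pi\sigma^{2}} \exp\!\left(-\frac{x^{2} + y^{2}}{2\sigma^{2}}\right)
```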

- scipy.stats.gaussian_kde, bw_method:
bw_method : str, scalar or callable, optional. The method used to calculate the estimator bandwidth. This can be ‘scott’, ‘silverman’, a scalar constant or a callable. If a scalar, this will be used directly as kde.factor. If a callable, it should take a gaussian_kde instance as its only parameter and return a scalar. If None (default), ‘scott’ is used. See Notes for more details.
In this case, let's assume the input for bw_method is a scalar (float), so that it is comparable with sigma. Here is where I get lost, since I can find no information about this kde.factor parameter anywhere.
What I'd like to know is the precise mathematical equation that connects both of these parameters (i.e. sigma and bw_method when a float is used), if possible.
MWE:

    import numpy as np
    from scipy.stats import gaussian_kde
    # gaussian_filter is importable from scipy.ndimage directly; the
    # scipy.ndimage.filters namespace is deprecated in recent SciPy releases.
    from scipy.ndimage import gaussian_filter
    import matplotlib.pyplot as plt


    def rand_data():
        return np.random.uniform(low=1., high=200., size=(1000,))

    # Generate 2D data.
    x_data, y_data = rand_data(), rand_data()
    xmin, xmax = min(x_data), max(x_data)
    ymin, ymax = min(y_data), max(y_data)

    # Define grid density.
    gd = 100
    # Define bandwidth.
    bw = 2.

    # Using gaussian_filter
    # Obtain 2D histogram.
    rang = [[xmin, xmax], [ymin, ymax]]
    binsxy = [gd, gd]
    hist1, xedges, yedges = np.histogram2d(x_data, y_data, range=rang, bins=binsxy)
    # Gaussian filtered histogram.
    h_g = gaussian_filter(hist1, bw)

    # Using gaussian_kde
    values = np.vstack([x_data, y_data])
    # Data 2D kernel density estimate.
    kernel = gaussian_kde(values, bw_method=bw / 30.)
    # Define x,y grid.
    gd_c = complex(0, gd)
    x, y = np.mgrid[xmin:xmax:gd_c, ymin:ymax:gd_c]
    positions = np.vstack([x.ravel(), y.ravel()])
    # Evaluate KDE.
    z = kernel(positions)
    # Re-shape for plotting.
    z = z.reshape(gd, gd)

    # Make plots.
    fig, (ax1, ax2) = plt.subplots(1, 2)
    # Gaussian filtered 2D histogram (left) and KDE (right).
    ax1.imshow(h_g.transpose(), origin='lower')
    ax2.imshow(z.transpose(), origin='lower')
    plt.show()
There is no relationship, because you are doing two different things.
With scipy.ndimage.filters.gaussian_filter, you are filtering a 2D variable (an image) with a kernel, and that kernel happens to be a Gaussian. It is essentially smoothing the image.
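To make the "smoothing an image" reading concrete, here is a minimal sketch that filters a single bright pixel and measures how far it gets spread. The image size and the moment-based width estimate are choices made for this illustration, not something taken from the question.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

# A single bright pixel in the middle of an otherwise empty image.
image = np.zeros((101, 101))
image[50, 50] = 1.0

sigma = 2.0  # standard deviation of the kernel, in pixels
smoothed = gaussian_filter(image, sigma)

# Smoothing only redistributes the pixel's value over its neighbours.
total = smoothed.sum()

# Estimate the spread of the smoothed blob along the first axis
# from its second moment.
profile = smoothed.sum(axis=1)
idx = np.arange(profile.size)
mean = (idx * profile).sum() / profile.sum()
measured_sigma = np.sqrt(((idx - mean) ** 2 * profile).sum() / profile.sum())
```

Here measured_sigma should come out close to sigma and total should stay at 1, which is the sense in which sigma is a width expressed in pixels (histogram bins, in the question's case).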
With scipy.stats.gaussian_kde you try to estimate the probability density function of your 2D variable. The bandwidth (or smoothing parameter) is your integration step, and should be as small as the data allows.
The two images look the same because the uniform distribution from which you drew the samples is not so different from a normal distribution. You would simply get a better estimate with a normal kernel function.
You can read about kernel density estimation.
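As a rough illustration of the "density estimate" reading, the sketch below evaluates a KDE on a grid and checks that it integrates to about 1, whatever the number of samples. The seed, the grid limits and the grid size are assumptions chosen for this example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical sample: 1000 points, uniform on [1, 200] in both x and y.
values = rng.uniform(1.0, 200.0, size=(2, 1000))

kde = gaussian_kde(values, bw_method=2.0 / 30.0)

# Evaluate the estimate on a grid that comfortably contains the data.
gd = 150
x, y = np.mgrid[-30.0:230.0:complex(0, gd), -30.0:230.0:complex(0, gd)]
density = kde(np.vstack([x.ravel(), y.ravel()])).reshape(gd, gd)

# Riemann sum of the density over the grid: the total probability mass.
cell_area = (x[1, 0] - x[0, 0]) * (y[0, 1] - y[0, 0])
mass = density.sum() * cell_area
```

The mass should be close to 1, whereas a Gaussian-filtered histogram keeps the total count of the histogram (1000 in the question's MWE). The two outputs can therefore look alike in an image plot while living on different scales.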
Edit: In kernel density estimation (KDE), the kernels are scaled such that the bandwidth is the standard deviation of the smoothing kernel. Which bandwidth to use is not obvious, as it depends on the data. There exists an optimal choice for univariate data, called Silverman's rule of thumb.
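For SciPy specifically, the built-in rules can be inspected through kde.factor. The sketch below compares the factors with the closed forms given in the gaussian_kde notes; the normal sample is a made-up example.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
sample = rng.normal(size=500)  # hypothetical univariate sample

kde_scott = gaussian_kde(sample)  # default rule: 'scott'
kde_silverman = gaussian_kde(sample, bw_method="silverman")

# n is the number of data points, d the number of dimensions.
n, d = kde_silverman.n, kde_silverman.d

# Closed forms for the two rules as given in the gaussian_kde notes.
scott_expected = n ** (-1.0 / (d + 4))
silverman_expected = (n * (d + 2) / 4.0) ** (-1.0 / (d + 4))

print(kde_scott.factor, scott_expected)
print(kde_silverman.factor, silverman_expected)
```

Note that these factors are dimensionless: they multiply the spread of the data rather than being a width in data units themselves.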
To summarize, there is no relationship between the standard deviation of a Gaussian filter and the bandwidth of a KDE, because we are talking about oranges and apples. However, talking about KDE only, there is a relationship between the KDE bandwidth and the standard deviation of the same KDE kernel: they are equal! In truth, implementation details differ, and there may be a scaling that depends on the size of the kernel. You could read the gaussian_kde.py of your specific package.
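As a sketch of what that scaling looks like in SciPy's gaussian_kde, the snippet below inspects kde.factor and kde.covariance for a scalar bw_method. The data, bw and gd mirror the question's MWE; the seed and the conversion to histogram bins are additions for illustration only.

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Hypothetical sample resembling the question's data.
values = rng.uniform(1.0, 200.0, size=(2, 1000))

bw = 2.0
kde = gaussian_kde(values, bw_method=bw / 30.0)

# A scalar bw_method is stored unchanged as kde.factor.
factor = kde.factor

# The kernel covariance is the data covariance scaled by factor**2,
# so the kernel standard deviation is factor * std(data) per axis.
data_cov = np.cov(values)
kernel_cov = kde.covariance
kernel_std = np.sqrt(np.diag(kernel_cov))

# The same spread expressed in bins of a gd x gd histogram over the data range.
gd = 100
bin_width = (values.max(axis=1) - values.min(axis=1)) / gd
kernel_std_in_bins = kernel_std / bin_width
print(factor, kernel_std, kernel_std_in_bins)
```

Under these assumptions the kernel spread works out to roughly two bins per axis, which may be why sigma=2. and bw_method=sigma/30. give similar-looking plots in the question. This is an observation about units for this particular data set, not a general equivalence between the two operations.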