Python - for every point in a list, compute the mean distance to all other points
I have a numpy array points of shape [N,2] which contains the (x,y) coordinates of N points. I'd like to compute the mean distance from every point to all other points using an existing function (which we'll call cmp_dist and simply use as a black box).
First, a verbose solution in "normal" Python to illustrate what I want to do (written from the top of my head):

    import numpy as np

    mean_dist = []
    for i, (x0, y0) in enumerate(points):
        dist = []
        for j, (x1, y1) in enumerate(points):
            if i == j:
                continue  # skip the point itself
            dist.append(cmp_dist(x0, y0, x1, y1))
        mean_dist.append(np.array(dist).mean())

I found a "better" solution using list comprehensions (assuming list comprehensions are usually better) which seems to work just fine:

    mean_dist = [np.array([cmp_dist(x0, y0, x1, y1)
                           for j, (x1, y1) in enumerate(points) if i != j]).mean()
                 for i, (x0, y0) in enumerate(points)]

However, I'm sure there's a much better solution in pure numpy, hopefully some function that performs an operation for every element using all the other elements.
How can I write this code in pure numpy/scipy?
I tried to find something myself, but it's quite hard to google without knowing what such operations are called (my respective math classes were quite a while back).
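The usual name for this kind of operation is broadcasting, and the intermediate result is a pairwise distance matrix. A minimal sketch, assuming the black-box cmp_dist can be replaced by plain Euclidean distance (an assumption for illustration only; it will not match a great-circle distance). The helper name mean_dist_to_others is hypothetical:

```python
import numpy as np

def mean_dist_to_others(points):
    """Mean Euclidean distance from each point to all other points.

    Assumption: Euclidean distance stands in for the black-box cmp_dist.
    """
    points = np.asarray(points, dtype=float)
    n = len(points)
    # diff[i, j] = points[i] - points[j]; broadcasting (n,1,2) - (1,n,2) -> (n,n,2)
    diff = points[:, None, :] - points[None, :, :]
    # d[i, j] = distance between point i and point j, shape (n, n)
    d = np.sqrt((diff ** 2).sum(axis=-1))
    # The diagonal d[i, i] is 0, so the row sum divided by n - 1
    # is the mean over the other points only.
    return d.sum(axis=1) / (n - 1)

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(mean_dist_to_others(points))  # 7.5, 5.0, 7.5
```

The (n, n, 2) intermediate grows quadratically; for Euclidean distance, scipy.spatial.distance.cdist(points, points) builds the same (n, n) matrix without it.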
Edit: this is not a duplicate of "Fastest pairwise distance metric in python".
The author of that question has a 1D array r and is satisfied with what scipy.spatial.distance.pdist(r, 'cityblock') returns (an array containing the distances between all points). However, pdist returns a flat array, that is, it is not clear which of the distances belong to which point (see my answer).
(Although, as explained in my answer, pdist is what I was ultimately looking for, it doesn't solve the problem as I've specified it in the question.)
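The flat array from pdist is not in arbitrary order: it is the "condensed" upper triangle of the distance matrix, listing the pairs (i, j) with i < j in row-major order, which is the same order itertools.combinations(range(n), 2) produces. A small sketch (Euclidean distance assumed for illustration) that labels each entry with its pair and uses scipy.spatial.distance.squareform to rebuild the full n-by-n matrix:

```python
from itertools import combinations

import numpy as np
from scipy.spatial.distance import pdist, squareform

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])
n = len(points)

flat = pdist(points)                     # n * (n - 1) // 2 values, Euclidean by default
pairs = list(combinations(range(n), 2))  # (0,1), (0,2), (0,3), (1,2), ... same order as pdist

for (i, j), d in zip(pairs, flat):
    print(f"d({i}, {j}) = {d:.3f}")

# squareform expands the condensed vector into a symmetric (n, n) matrix
# with zeros on the diagonal, so row i again belongs to point i.
full = squareform(flat)
print(full.shape)
```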
Based on @ali_m's comment on my question ("Take a look at scipy.spatial.distance.pdist"), I found a "pure" numpy/scipy solution:

    from scipy.spatial.distance import cdist
    ...
    fct = lambda p0, p1: great_circle_distance(p0[0], p0[1], p1[0], p1[1])
    mean_dist = np.sort(cdist(points, points, fct))[:, 1:].mean(1)

That's definitely an improvement on my list comprehension "solution".
What I don't like about this, though, is that I have to sort and slice the array to remove the 0.0 values that result from computing the distance between identical points (so basically that's my way of removing the diagonal entries of the matrix I get back from cdist).
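One way to drop the diagonal without sorting is a boolean mask that is False only where i == j. A minimal sketch, where cmp_dist is a hypothetical Euclidean stand-in for the black-box distance and the cdist call mirrors the one above:

```python
import numpy as np
from scipy.spatial.distance import cdist

def cmp_dist(x0, y0, x1, y1):
    # Hypothetical stand-in for the black-box distance function.
    return np.hypot(x1 - x0, y1 - y0)

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0], [0.0, 1.0]])
n = len(points)

# cdist calls the callable once per pair of rows: d[i, j] = cmp_dist(p_i, p_j)
d = cdist(points, points, lambda p0, p1: cmp_dist(p0[0], p0[1], p1[0], p1[1]))

# Mask that is True everywhere except on the diagonal.
off_diag = ~np.eye(n, dtype=bool)

# Boolean indexing returns the selected entries row by row; every row keeps
# exactly n - 1 of them, so reshape to (n, n - 1) and average each row.
mean_dist = d[off_diag].reshape(n, n - 1).mean(axis=1)
print(mean_dist)
```

If the self-distance is guaranteed to be 0, d.sum(axis=1) / (n - 1) gives the same result without building the mask.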
Note two things about the above solution:
- I'm using cdist, not pdist as suggested by @ali_m.
- I'm getting back an array of the same size as points, which contains the mean distance from every point to all other points, just as specified in the original question.
pdist unfortunately just returns all the pairwise distances in a flat array, that is, the values are no longer linked to the points they refer to, and that link is necessary for the problem as I've described it in the original question.
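That link can be recovered, though: squareform expands pdist's condensed output back into the symmetric matrix, whose row i holds the distances from point i. A minimal sketch, again with a hypothetical Euclidean stand-in for the black box and assuming the distance is symmetric (true for great-circle distances). Compared with cdist, pdist calls the Python function only once per unordered pair, n(n - 1)/2 times instead of n², which may matter when each call is expensive:

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform

def cmp_dist(x0, y0, x1, y1):
    # Hypothetical stand-in for the black-box distance function.
    return np.hypot(x1 - x0, y1 - y0)

def mean_dist_via_pdist(points, dist_fn):
    n = len(points)
    # One call per unordered pair i < j, returned as a flat condensed array.
    flat = pdist(points, lambda p0, p1: dist_fn(p0[0], p0[1], p1[0], p1[1]))
    # Back to an (n, n) symmetric matrix; squareform puts zeros on the diagonal.
    full = squareform(flat)
    # The zero diagonal adds nothing, so divide by n - 1 to average over the others.
    return full.sum(axis=1) / (n - 1)

points = np.array([[0.0, 0.0], [3.0, 4.0], [6.0, 8.0]])
print(mean_dist_via_pdist(points, cmp_dist))  # 7.5, 5.0, 7.5
```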
However, since in my actual problem at hand I need the mean over the means of all points (which I did not mention in my question), pdist serves me just fine:

    from scipy.spatial.distance import pdist
    ...
    fct = lambda p0, p1: great_circle_distance(p0[0], p0[1], p1[0], p1[1])
    mean_dist_overall = pdist(points, fct).mean()

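Why pdist(...).mean() equals the mean of the per-point means: every per-point mean divides by the same n - 1, so averaging them gives (1/n) Σ_i (1/(n-1)) Σ_{j≠i} d(i, j), the sum over all ordered pairs divided by n(n - 1). For a symmetric distance each unordered pair appears twice in that sum, which makes it exactly the mean of the n(n - 1)/2 values pdist returns. A quick numerical check with random points and Euclidean distance (an assumption; any symmetric distance behaves the same way):

```python
import numpy as np
from scipy.spatial.distance import cdist, pdist

rng = np.random.default_rng(0)
points = rng.random((50, 2))
n = len(points)

# Per-point means over the other n - 1 points (Euclidean diagonal is 0).
per_point = cdist(points, points).sum(axis=1) / (n - 1)

mean_of_means = per_point.mean()
overall = pdist(points).mean()
print(mean_of_means, overall)  # equal up to floating-point rounding
```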
Though this would be the definitive answer if I had asked for the mean of means, I purposely asked for the array of means for all points. Because I think there's still room for improvement in the above cdist solution, I won't accept this answer.
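One possible direction for that improvement: cdist with a Python callable invokes the function once per pair, n² times. If great_circle_distance happens to be the haversine formula (an assumption; the post treats it as a black box), the whole matrix can be computed with broadcasting in a handful of array operations. A sketch assuming the columns of points are (latitude, longitude) in degrees and a spherical Earth of radius 6371 km; haversine_matrix and mean_great_circle_dist are hypothetical helper names:

```python
import numpy as np

EARTH_RADIUS_KM = 6371.0  # assumed mean Earth radius

def haversine_matrix(points_deg, radius=EARTH_RADIUS_KM):
    """Pairwise great-circle distances for an (n, 2) array of (lat, lon) in degrees.

    Assumption: great_circle_distance is the haversine formula on a sphere.
    """
    lat, lon = np.radians(points_deg).T
    # Broadcast (n, 1) against (1, n) to get all pairwise differences at once.
    dlat = lat[:, None] - lat[None, :]
    dlon = lon[:, None] - lon[None, :]
    a = (np.sin(dlat / 2) ** 2
         + np.cos(lat)[:, None] * np.cos(lat)[None, :] * np.sin(dlon / 2) ** 2)
    # clip guards against tiny floating-point overshoot outside [0, 1]
    return 2 * radius * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0)))

def mean_great_circle_dist(points_deg):
    d = haversine_matrix(points_deg)
    n = len(points_deg)
    # The diagonal is exactly 0 (dlat = dlon = 0), so divide row sums by n - 1.
    return d.sum(axis=1) / (n - 1)

# Three points on the equator at longitudes 0, 90 and 180 degrees.
pts = np.array([[0.0, 0.0], [0.0, 90.0], [0.0, 180.0]])
print(mean_great_circle_dist(pts))
```

Whether this matches the real great_circle_distance depends on its formula, units and argument order, so it is worth checking a few pairs against the original function before swapping it in.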