Clustering and periodic boundaries

Clustering in Python can be nicely done using the statistical tools provided by the sklearn library.

For example, the DBSCAN method easily implements a clustering algorithm that detects connected regions, given a maximum distance between two elements of a cluster.

However, natively the library does not support periodic boundaries, which can be sometimes annoying. But an easy workaround can be found precisely exploiting the power of the library: methods like DBSCAN can be given in input distance matrices directly, and then the clustering is computed on these.

The workaround is to compute the distance matrix with the periodic boundaries in it. The easiest way that I have found is to use the scipy function pdist on each coordinate, correct for the periodic boundaries, then combine the result in order to obtain a distance matrix (in square form) that can be digested by DBSCAN.

The following example may give you a better feeling of how it works.

import pylab as pl
from sklearn.cluster import DBSCAN
from scipy.spatial.distance import pdist,squareform

# box size
L=5.
threshold=0.3
# create data
X=pl.uniform(-1,1, size=(500,2))
# create for corners
X[XL*0.5]-=L

# finding clusters, no periodic boundaries
db=DBSCAN(eps=threshold).fit(X)

pl.scatter(X[:,0], X[:,1],c=db.labels_, s=3,edgecolors='None')
pl.figure()

# 1) find the correct distance matrix
for d in xrange(X.shape[1]):
    # find all 1-d distances
    pd=pdist(X[:,d].reshape(X.shape[0],1))
    # apply boundary conditions
    pd[pd>L*0.5]-=L
    
    try:
        # sum
        total+=pd**2
    except Exception, e:
        # or define the sum if not previously defined
        total=pd**2
# transform the condensed distance matrix...
total=pl.sqrt(total)
# ...into a square distance matrix
square=squareform(total)
db=DBSCAN(eps=threshold, metric='precomputed').fit(square)
pl.scatter(X[:,0], X[:,1],c=db.labels_,s=3, edgecolors='None')
pl.show()

Before the periodic boundaries (Lx=Ly=5):
fig1

… and after (Lx=Ly=5):

fig2

Published by

francescoturci

Physicist and oboe player.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s