## Box Counting in Numpy

The fractal dimension of an object is a single scalar number that allows us to quantify how compact an object is , i.e. how wiggly a curve is, how wrinkled a surface is, how porous a complex volume is.

However, estimating the fractal dimension of an object is not an easy task, and many methods exist. The simplest method is box counting: the idea is to fully cover the object with many boxes of a given size, count how many boxes are needed to cover the object and repeat the process for many box sizes. The scaling of the number of boxes covering the object with the size of the boxes gives an estimate for the fractal dimension of the object.

The algorithm has many limitations, but in its simplest form it can be easily implemented in Python. The idea is to simply bin the object in a histogram of variable bin sizes. This can be easily generalised to any dimensions, thanks to Numpy’s histrogramdd  function.

In the following code, I test the idea with a known fractal, Sierpinski’s triangle, which has an exact (Hausdorff) fractal dimension of log(3)/log(2).

import numpy as np
import pylab as pl

def rgb2gray(rgb):
r, g, b = rgb[:,:,0], rgb[:,:,1], rgb[:,:,2]
gray = 0.2989 * r + 0.5870 * g + 0.1140 * b
return gray

# finding all the non-zero pixels
pixels=[]
for i in range(image.shape[0]):
for j in range(image.shape[1]):
if image[i,j]>0:
pixels.append((i,j))

Lx=image.shape[1]
Ly=image.shape[0]
print (Lx, Ly)
pixels=pl.array(pixels)
print (pixels.shape)

# computing the fractal dimension
#considering only scales in a logarithmic list
scales=np.logspace(0.01, 1, num=10, endpoint=False, base=2)
Ns=[]
# looping over several scales
for scale in scales:
print ("======= Scale :",scale)
# computing the histogram
H, edges=np.histogramdd(pixels, bins=(np.arange(0,Lx,scale),np.arange(0,Ly,scale)))
Ns.append(np.sum(H>0))

# linear fit, polynomial of degree 1
coeffs=np.polyfit(np.log(scales), np.log(Ns), 1)

pl.plot(np.log(scales),np.log(Ns), 'o', mfc='none')
pl.plot(np.log(scales), np.polyval(coeffs,np.log(scales)))
pl.xlabel('log $\epsilon$')
pl.ylabel('log N')
pl.savefig('sierpinski_dimension.pdf')

print ("The Hausdorff dimension is", -coeffs[0]) #the fractal dimension is the OPPOSITE of the fitting coefficient
np.savetxt("scaling.txt", list(zip(scales,Ns)))


## Clustering and periodic boundaries

Clustering in Python can be nicely done using the statistical tools provided by the sklearn library.

For example, the DBSCAN method easily implements a clustering algorithm that detects connected regions, given a maximum distance between two elements of a cluster.

However, natively the library does not support periodic boundaries, which can be sometimes annoying. But an easy workaround can be found precisely exploiting the power of the library: methods like DBSCAN can be given in input distance matrices directly, and then the clustering is computed on these.

The workaround is to compute the distance matrix with the periodic boundaries in it. The easiest way that I have found is to use the scipy function pdist on each coordinate, correct for the periodic boundaries, then combine the result in order to obtain a distance matrix (in square form) that can be digested by DBSCAN.

The following example may give you a better feeling of how it works.

import pylab as pl
from sklearn.cluster import DBSCAN
from scipy.spatial.distance import pdist,squareform

# box size
L=5.
threshold=0.3
# create data
X=pl.uniform(-1,1, size=(500,2))
# create for corners
X[XL*0.5]-=L

# finding clusters, no periodic boundaries
db=DBSCAN(eps=threshold).fit(X)

pl.scatter(X[:,0], X[:,1],c=db.labels_, s=3,edgecolors='None')
pl.figure()

# 1) find the correct distance matrix
for d in xrange(X.shape[1]):
# find all 1-d distances
pd=pdist(X[:,d].reshape(X.shape[0],1))
# apply boundary conditions
pd[pd>L*0.5]-=L

try:
# sum
total+=pd**2
except Exception, e:
# or define the sum if not previously defined
total=pd**2
# transform the condensed distance matrix...
total=pl.sqrt(total)
# ...into a square distance matrix
square=squareform(total)
db=DBSCAN(eps=threshold, metric='precomputed').fit(square)
pl.scatter(X[:,0], X[:,1],c=db.labels_,s=3, edgecolors='None')
pl.show()


Before the periodic boundaries (Lx=Ly=5):

… and after (Lx=Ly=5):

## Concatenate pdfs from the Terminal

Oftentimes it can be convenient to merge different PDF documents in order to get a single, continuous document that can be easily sent via mail for review or correction.

If one has just a few documents, this can be done directly through the Preview.app application on the Mac, but for more documents (or when we want to repeat the merge many times) a command-line application can be very convenient.

On Linux, or on the Mac, poppler is the kind of set of tools that makes the trick (you can install it with  Homebrew on the Mac).

In particular, you will find that the package includes a program called
pdfunite
. Its usage is straightforward:

pdfunite file_in_a.pdf file_in_b.pdf file_in_c.pdf fileout.pdf

## brew update ––force

Homebrew is a very convenient package manager for Mac OS X. It makes the installation of numerous utilities and programs incredibly easy. It is based on a databases of instructions (Ruby formulas) that are kept up to date using Git.

Keeping the database up-to-date is normally done with

brew update

Sometimes, however, it can fail.  It occurred to me already a few times that I was unable to retrieve the latest version of the database, and installing new software becomes impossible.

If the internal diagnostic tool

brew doctor

is not sufficient for identifying and solving the issue, there is a way to force the update. As indicated on these pages, one can use Git directly and recover the database:

cd brew --prefix