Augment Coordinates for Implementation of Periodic Boundary Conditions
Periodic Boundary conditions are one of the essential requirements of Molecular Dynamics Simulations. Majority of the MD simulations are executed with PBC in all three directions. Therefore, all the distance computations in MDAnalysis should also support PBC in triclinic simulation boxes. In simpler terms, a particle exiting from one of the face of a box will also enter simulataneously from the corresponding opposite face. In three dimensions, this creates 26 exact replicas of any particle in the neighbourhood of the box under consideration. Inorder to calculate the distance between two particles under the influence of periodic boundary conditions, a minimum image convention is followed. Minimum image distance is the smallest possible distance between two particles in the box as well as between the images in the surrounding duplicate boxes. This naive implementation of PBC for distance calculation increases the computation cost by 27 times.
Present implementation to handle PBC using KDTree in MDAnalysis is a more advanced approach. Rather than evaluating all the images, only the relevant images are generated and checked based on the relative position of both the particles. For a given box dimensions, its reciprocal vectors are evaluated. If [[a1, a2, a3], [b1, b2, b3], [c1, c2, c3]]
are the dimensions of the box such that [a, b, c]
are the three vectors of the box, the reciprocal vectors [ra, rb, rc]
are evaluated as:
ra = b X c
rb = c X a
rc = a X b
The reciprocal vectors represent the normal vectors to the plane formed by the its member vectors i.e. ra
is the plane normal vector of a plane formed by b
and c
vectors. Since the problem is inherently three dimensional, both the particles in consideration have a location defined by their respective three dimensional vectors. A dot product with the reciprocal vectors yield the distance of the coordinate from the corresponding face along the direction of normal (reciprocal) vector. This concept can be exploited to limit the generation of images which are within a certain distance from the box faces. The cutoff distance of [a/2, b/2, c/2]
would be the condition of equivalency of this approach with the naive PBC implementation described above.
However, typical problems in MDAnalysis doesnot require to go to the extreme of choosing half the box distance as the cutoff distance. A conditional evaluation of image distance only of the coordinate lie inside the cutoff distance saves a significant amount of time. The current algorithm in MDAnalysis is pure python implementation and can be described as follows:
- Evaluate the reciprocal vectors.
- Find the distance of the coordinates from each of the 6 faces.
- If any of the distance is less than the specified distance, generate images and check the distance based on the minimum image convention.
- Return the minimum distance.
This has two drawbacks though :
- Pure python implementation is slower as the evaluation of minimum distance requires multiple loops.
- Other algorithms for near neighbour searching which donot have specific implementation of PBC cannot be handled direclty.
To make the algorithm more flexible, an approach of augment coordinates is adopted. In this approach, all the particles within the cutoff distance are duplicated as per the Periodic Boundary Conditions. Another mapping of these particles with the original particles is also constructed to undo the augmentation. These duplicated particles along with the original particles can now be operated with any algorithm to evaluate the distances without any special consideratio to PBC. All the indices/coordinates can be mapped back to the original set of particles using the mapping described above. A cython implementation is implemented for triclinic boxes for fast looping based on this notebook. Furthermore, memoryviews are used here based on this observation.
Similar algorithm described above is used here for cython implementation. To evaluate the distance of particle from each of the faces, dot product operation of two vectors i.e. [x, y, z]
(particle coordinate) and [x, y, z] - ([a1, a2, a3] + [b1, b2, b3] + [c1, c2, c3])
i.e distance of coordinate from the opposite diagonal box is performed. Furthermore, duplicates in the diagonal images of box are generated based on the condition if the particle distance evaluated from more than one faces in less than the specified distance. For example, if the particle lies close to the xy plane but far from the yz plane, then corresponding duplicate will be generated in xy plane near the opposite plane. The function to augment the coordinates and undo the arguments can be written as
%%cython --annotate
cimport cython
import cython
cimport numpy as np
import numpy as np
from libc.math cimport sqrt
@cython.boundscheck(False)
@cython.wraparound(False)
def make_halo(float[:, ::1] coordinates, float[:,::1] box, float[:,::1] reciprocal, float r):
"""Calculate augmented coordinate set
Parameters
----------
coordinates : np.ndarray
coordinates to augment
box : np.ndarray
size of box
rm : np.ndarray
reciprocal matrix
r : float
thickness of halo region to buffer by
Returns
-------
augmented : np.ndarray
coordinates of the new augmented coordinates
indices : np.ndarray
original indices of the augmented coordinates
"""
cdef bint lo_x, hi_x, lo_y, hi_y, lo_z, hi_z
cdef int i, j, p, N
cdef float shiftX[3]
cdef float shiftY[3]
cdef float shiftZ[3]
cdef float coord[3]
cdef float end[3]
cdef float other[3]
cdef int dim
dim = coordinates.shape[1]
# room for adding triclinic support by using (3,) vectors
for i in range(dim):
shiftX[i] = box[0, i]
shiftY[i] = box[1, i]
shiftZ[i] = box[2, i]
end[i] = box[0, i] + box[1, i] + box[2,i]
N = coordinates.shape[0]
p = 0 # output counter
# allocate output arrays
# could be more conservative with this
# or use C++ vectors + push etc
cdef float[:, :] output = np.zeros((N, 3), dtype=np.float32)
cdef int[:] indices = np.zeros(N, dtype=np.int32)
for i in range(0,N):
for j in range(3):
coord[j] = coordinates[i, j]
other[j] = end[j] - coordinates[i,j]
# identify the condition
lo_x = _dot(&coord[0], &reciprocal[0,0]) <= r
hi_x = _dot(&other[0], &reciprocal[0,0]) <= r
lo_y = _dot(&coord[0], &reciprocal[1,0]) <= r
hi_y = _dot(&other[0], &reciprocal[1,0]) <= r
lo_z = _dot(&coord[0], &reciprocal[2,0]) <= r
hi_z = _dot(&other[0], &reciprocal[2,0]) <= r
if lo_x:
# if X, face piece
for j in range(3):
# add to output
output[p, j] = coord[j] + shiftX[j]
# keep record of which index this augmented position was created from
indices[p] = i
p += 1
if lo_y:
# if X&Y, edge piece
for j in range(3):
output[p, j] = coord[j] + shiftX[j] + shiftY[j]
indices[p] = i
p += 1
if lo_z:
# if X&Y&Z, corner piece
for j in range(3):
output[p, j] = coord[j] + shiftX[j] + shiftY[j] + shiftZ[j]
indices[p] = i
p += 1
elif hi_z:
for j in range(3):
output[p, j] = coord[j] + shiftX[j] + shiftY[j] - shiftZ[j]
indices[p] = i
p += 1
elif hi_y:
for j in range(3):
output[p, j] = coord[j] + shiftX[j] - shiftY[j]
indices[p] = i
p += 1
if lo_z:
for j in range(3):
output[p, j] = coord[j] + shiftX[j] - shiftY[j] + shiftZ[j]
indices[p] = i
p += 1
elif hi_z:
for j in range(3):
output[p, j] = coord[j] + shiftX[j] - shiftY[j] - shiftZ[j]
indices[p] = i
p += 1
if lo_z:
for j in range(3):
output[p, j] = coord[j] + shiftX[j] + shiftZ[j]
indices[p] = i
p += 1
elif hi_z:
for j in range(3):
output[p, j] = coord[j] + shiftX[j] - shiftZ[j]
indices[p] = i
p += 1
elif hi_x:
for j in range(3):
output[p, j] = coord[j] - shiftX[j]
indices[p] = i
p += 1
if lo_y:
for j in range(3):
output[p, j] = coord[j] - shiftX[j] + shiftY[j]
indices[p] = i
p += 1
if lo_z:
for j in range(3):
output[p, j] = coord[j] - shiftX[j] + shiftY[j] + shiftZ[j]
indices[p] = i
p += 1
elif hi_z:
for j in range(3):
output[p, j] = coord[j] - shiftX[j] + shiftY[j] - shiftZ[j]
indices[p] = i
p += 1
elif hi_y:
for j in range(3):
output[p, j] = coord[j] - shiftX[j] - shiftY[j]
indices[p] = i
p += 1
if lo_z:
for j in range(3):
output[p, j] = coord[j] - shiftX[j] - shiftY[j] + shiftZ[j]
indices[p] = i
p += 1
elif hi_z:
for j in range(3):
output[p, j] = coord[j] - shiftX[j] - shiftY[j] - shiftZ[j]
indices[p] = i
p += 1
if lo_z:
for j in range(3):
output[p, j] = coord[j] - shiftX[j] + shiftZ[j]
indices[p] = i
p += 1
elif hi_z:
for j in range(3):
output[p, j] = coord[j] - shiftX[j] - shiftZ[j]
indices[p] = i
p += 1
if lo_y:
for j in range(3):
output[p, j] = coord[j] + shiftY[j]
indices[p] = i
p += 1
if lo_z:
for j in range(3):
output[p, j] = coord[j] + shiftY[j] + shiftZ[j]
indices[p] = i
p += 1
elif hi_z:
for j in range(3):
output[p, j] = coord[j] + shiftY[j] - shiftZ[j]
indices[p] = i
p += 1
elif hi_y:
for j in range(3):
output[p, j] = coord[j] - shiftY[j]
indices[p] = i
p += 1
if lo_z:
for j in range(3):
output[p, j] = coord[j] - shiftY[j] + shiftZ[j]
indices[p] = i
p += 1
elif hi_z:
for j in range(3):
output[p, j] = coord[j] - shiftY[j] - shiftZ[j]
indices[p] = i
p += 1
if lo_z:
for j in range(3):
output[p, j] = coord[j] + shiftZ[j]
indices[p] = i
p += 1
elif hi_z:
for j in range(3):
output[p, j] = coord[j] - shiftZ[j]
indices[p] = i
p += 1
return np.array(output[:p]), np.array(indices[:p])
@cython.boundscheck(False)
@cython.wraparound(False)
cdef float _dot(float *a, float *b):
"""Return dot product of two sequences in range."""
cdef ssize_t n
cdef float sum1
cdef ssize_t dim
dim=3
sum1 = 0.0
for n in range(dim):
sum1 += a[n] * b[n]
return sum1
@cython.boundscheck(False)
@cython.wraparound(False)
def undo_augment(int[:] results, int[:] translation, int nreal):
"""Translate augmented indices back to originals
Note: modifies results in place!
Parameters
----------
results : ndarray of ints
indices of coordinates, including "augmented" indices
translation : ndarray of ints
original indices of augmented coordinates
nreal : int
number of real coordinates, ie values in results equal or larger than this
need to be translated to their real counterpart
Returns
-------
results : ndarray of ints
"""
cdef int N
N = results.shape[0]
for i in range(N):
if results[i] >= nreal:
results[i] = translation[results[i] - nreal]
return results
Followed by the definition of this function, the API for distance evaluations which returns all the pairs within a specified distance using KDTree can be written as:
def augment_kdtree(coords, cutoff, box):
"""
box of type [A,B,C,a1,a2,a3]
"""
box = triclinic_vectors(box)
# Define reciprocal matrix
rm = np.zeros(9, dtype=np.float32).reshape(3, 3)
rm[0] = np.cross(box[1], box[2])
rm[1] = np.cross(box[2], box[0])
rm[2] = np.cross(box[0], box[1])
for i in range(3):
rm[i] /= norm(rm[i]) # normalize
aug, idx = make_halo(coords, box, rm, cutoff)
aug_coord = np.concatenate([coords, aug])
kdtree = spatial.cKDTree(aug_coord)
pairs = np.array(list(kdtree.query_pairs(cutoff)), dtype=np.int32)
if len(pairs) > 1:
undo_augment(pairs[:, 0], idx, len(coords))
undo_augment(pairs[:, 1], idx, len(coords))
pairs = np.unique(np.sort(pairs), axis=0)
return pairs
Similarly, other distance evaluations such as atom selections can be performed using similar methods. To benchmark the cython implementation of KDtree and already implemented PKDtree in MDAnalysis, a use case to identify all the bonds i.e. distance between two particles less than the sum of radius of each particle is chosen. Two different real case are also chosen with ~12k and ~1000k particles to check the extremeties of performance. While more detailed analysis is provided here and here, major highlights of the benchmark is provided below.
For the two cases described above the guess_bonds function in MDAnalysis.topology.guessers.py
took 2.86 sec
for smaller system, while the PKDTree
in MDAnalysis.lib.pkdtree
took 2.69 sec
, whereas the function defined here took 920 ms
i.e increase in performance by 3 times. For the larger system, the computation time for ` guess_bonds was more than
30 min whereas the current implementation of PKDTree in MDAnalysis took around
7 mins. For this case, the method described above took
2 mins` i.e. the performance boost of 3 times than already implemented PKDtree (but not in guess bonds), whereas the performance boost is significantly high compared to brute force approach currently implemented in MDAnalysis for guessing the bonds.
The next step is to include this functionality in MDAnalysis and replacement of Bio.KDtree, which is obsolete in BioPython, with scipy.spatial.cKDTree.
See you next time!!!