Realising a modular data interface to couple quantum mechanical calculators with external data-driven workflows

ARCHER2-eCSE03-10


image

Electronic structure calculations are the workhorse of modern computational chemistry, allowing for the prediction of information for a range of vital societal applications, such as drug efficacy, the composition of new solar cells, or the environmental impact of pollutants. With these predictions, computation has been transformed from an approach that supports experiment into a research leading activity that provides more efficient pathways from the lab to commercial applications.

Whilst electronic structure, and in particular density functional theory (DFT), are widespread in their usage on modern highly parallelised computing (HPC) infrastructures to tackle problems like those given above, there remains a significant overhead to their application that prevents the speed of scientific breakthroughs to which we aspire. In part, this is due to the intrinsically complicated nature of the electron interactions, which must be solved iteratively through computation of “most probable” electron distributions. New paradigms are arising to tackle these problems, such as multiscale modelling and the incorporation of data-driven approaches, such as machine learning, in the software workflows; however, to date this incorporation has been challenging as most electronic structure packages are designed as monolithic, linear software programmes that do not efficiently interact with external packages that can deliver these benefits.

In this work, we challenged this existing paradigm by developing methods to remove and insert large data objects directly into electronic structure calculations. Previously, this has typically been handled using data storage on disk, which has led to human efforts and computing resources being wasted on data conversion, text formatting and parsing, and data pipelining. We developed a new software module, known as the “Atomic Simulation Interface (ASI)”, which is a unified and efficient way to expose and import large data structures related to atomistic and electronic structure models. The work involved both the conceptualisation and realisation of the software module (i.e., design from scratch), and then subsequent interfacing with established electronic structure software packages (FHI-aims and DFTB+) to demonstrate the immediate capabilities and potential. The interfaces allow the preparation of memory space for large data objects, and the transfer in/out of the software packages without the need for writing to disk (which has a large time overhead), or complete destruction at termination, as is the standard in their workflows normally. With the outcome of this work, one can start to use data-driven processes to learn from the data, and hierarchical coupling of software packages so that accurate multiscale embedding models can be developed. The ASI package will support ongoing efforts of the computational chemistry and materials science community to build simulation workflows that include several codes interacting on the level of electronic structure; our implementation can be envisaged to support permutations of quantum (QM) and molecular mechanics (MM), coupled with machine learning (ML), in forms such as QM/MM, QM/QM, QM/ML.

Information about the code

FHI-aims: Most recent release available through centrally installed modules, with proven licence permission. Access to recent developments requires recompilation of development code, with formal release due in the near future.

DFTB+: DFTB+ is an open-source software. The most stable release is available at the central GitHub repository. ASI bindings are included since the 22.2 release. Access to the most recent developments related to ASI API are available from the branch.

ASI API wrapper for DFTB+: It is also open-source software and is available at GitLab along with compilation instructions.

Webinar

Webinar presenting this work

Technical Report

Download as PDF