This article is licensed under CC-BY 4.0.

pubs.acs.org/jcim Article

VIAMD: a Software for Visual Interactive Analysis of Molecular Dynamics
Robin Skånberg,* Ingrid Hotz, Anders Ynnerman, and Mathieu Linares*
Cite This: https://doi.org/10.1021/acs.jcim.3c01033


ABSTRACT: The typical workflow in molecular dynamics (MD) analysis requires several separate tools, often resulting in a lack of synergy and interaction between the individual analysis steps. This article presents VIAMD, an application designed to address this issue by integrating a 3D visualization of molecular trajectories with flexible analysis components. VIAMD uses an interactive scripting interface, allowing for property definition and evaluation. The application provides context-aware suggestions and expression feedback through information and visualizations. The user-defined properties can be explored and analyzed through the various components. This enables correlation with spatial conformations, statistical analysis of distributions, and powerful aggregation of multidimensional properties such as spatial distribution functions. VIAMD has the potential to advance research in many scientific disciplines and is a promising solution for improving the workflow of MD visualization and analysis.

Received: July 8, 2023
Revised: October 13, 2023
Accepted: November 6, 2023

© XXXX The Authors. Published by American Chemical Society
Journal of Chemical Information and Modeling (J. Chem. Inf. Model.) XXXX, XXX, XXX−XXX

■ INTRODUCTION
Molecular dynamics (MD) analysis can be summarized as the process of studying properties from simulated molecular systems. During the simulation, a base set of properties, such as coordinates and energies, is periodically sampled and written as output. These base properties are then examined directly or used indirectly as inputs to derive new properties. Because MD is used in various scientific disciplines to study processes at the molecular level, there is natural variation in the properties studied. This property variation has led to the development of MD analysis tools that allow researchers to construct new properties from the basis set, often through a scripting interface using expressions and operations. The usual MD analysis workflow is an iterative process in which the MD systems are visualized using a 3D visualization tool to gain a spatial and temporal understanding of the system and its structures. Properties are then defined and computed using a scripting tool, and their values are plotted and studied. The plotted values can then be correlated with spatial and temporal geometric configurations displayed in the visualization tool, inspiring ideas for new hypotheses to test. While most molecular visualization tools facilitate some form of analysis, they are often task-specific and implemented as a separate plug-in. Task-specific means that the plug-in is designed to compute and plot a specific property, but its data are unavailable to other components in the application. When resorting to isolated tools in the analysis process, the potential synergy and interaction between the tools are left on the table. Tapping into this can potentially increase the efficiency and ease of use.

Molecular visualization tools, such as Chimera,1 VMD,2 Pymol,3 Caver,4 Jmol/Jsmol,5 Samson,6 Ovito,7 MegaMol,8 and Mol*,9 provide means of configuring visual representations, often with a focus on generating images for scientific publications. Many of the listed tools also provide some means of analysis. Kozlikova et al.10 and Kut’ák et al.11 provide a comprehensive overview of the molecular visualization techniques used in the software mentioned above.

Property computation in the form of user-defined expressions through scripting is exposed through tools such as Collective Variables (Colvars)12 and is available in software such as VMD,2 LAMMPS,13 NAMD,14 and GROMACS.15 MDAnalysis16 provides similar functionality but is implemented as a Python library. The analyzed properties are often time-varying scalar values derived from the atom coordinates of the trajectory, either as an aggregated measure of the complete trajectory or, more frequently, varying over trajectory frames.

More recently, Ulbrich et al.17 demonstrated the computation of user-defined properties through a visual node-based graph network instead of the typical scripting interface. A benefit of using visual nodes to construct expressions is the ability to visualize each node in the expression tree. sMolBoxes exploits this to plot intermediate results from the operations. The downside of visual node-based graph networks is the occupation of screen space, a limited resource. The graph can grow substantially for complex expressions, and the layout of nodes then has to be carefully considered not to end up with a spaghetti graph.

In this Article, we present VIAMD, a software for Visual Interactive Analysis of Molecular Dynamics that builds upon an initial prototype18 created in Inviwo.19 The software has since then evolved and been redesigned to be applicable in a broader setting. The software is designed to optimize the workflow common in MD analysis by tightly linking the three main analysis tools: molecular visualization, property calculation, and property plotting. By placing the tools in the same application, we can leverage emergent synergies formed by tight coupling. Examples of such synergies include the ability to click and inspect points in time displayed by plots, enabling correlation of the value and conformation of structures. Another example is providing suggestions for properties from the user’s current selection and supporting the user in declaring properties by providing direct visual feedback asserting the operation and structures involved. The article introduces the software VIAMD by first providing a brief overview of the application, followed by individual sections that describe the components in more detail.

■ OVERVIEW
The application is built around a 3D visualization of the molecular data set referred to as the spatial view, Figure 1a. In conjunction with the spatial view, several components are exposed, most of which are dedicated to analysis (Figure 1e,f,g,h,i). The central component orchestrating the analysis is the script editor, Figure 1d and Figure 3, where the user can write expressions and assign them to properties. The script is then evaluated over the trajectory frames, similar to Colvars. The user-defined properties form the core analysis entities of the application, and upon evaluation, they can be inspected in the various components of the application.

Figure 1. Overview of the application running with its components: spatial view (a), representation window (b), animation window (c), script editor (d), Ramachandran plot (e), temporal window (f), density volume window (g), distribution window (h), and shape space window (i).

■ COMPONENTS OF VIAMD
Spatial View (a). The spatial view, Figure 1a, provides a 3D view of the molecular system together with graphical primitives, which enhance and emphasize structures and operations within the system. The camera can be manipulated by clicking and dragging the mouse, and upon double-clicking on any geometry in the scene, the camera will use that point in space as its pivot point. The user can select atoms with the mouse by holding the shift key, where the left mouse button appends and the right mouse button removes items from the current selection. The user defines the system’s visual representations in the representations window, Figure 1b. Each representation is defined by a type, color map, and textual filter that defines the visible set of atoms.

Selection and Interaction. The application exposes different ways of selecting and interacting with the data through different views, but all operate on a shared set of atoms representing the active selection. If atoms are present in the active selection, the colors of the shown representations are desaturated and the selected atoms are shown in blue (Figure 2a). Similar to the active selection, a highlighted set of atoms (shown in yellow) is used to communicate structures involved in operations. When the user right-clicks in the spatial view, a spatial context menu (Figure 2b) is accessed, which offers the user a set of operations based on the active selection.

Figure 2. Selection vs highlight (a): Atoms part of the active selection are shown in blue, and the hovered atom is highlighted in yellow with an information window. The Spatial Context menu (b) exposes operations through submenus that can be applied to the data set. Script (c) supplies script suggestions based on active selection. Remap Element (d) provides a mechanism for remapping the assigned atomic element for specific labels. Recenter Trajectory (e) provides the option of recentering the trajectory, and Selection (f) exposes tools for manipulating the active selection.

Script Suggestions. With an active selection present, the user is provided with suggestions of script snippets based on the active selection. There are two categories of suggestions given: operations and selections. The operations result in basic function calls that match the number of selected entities, given some permutations of the context in which they are applied. Selection suggestions are the textual equivalent of the active selection expressed in the scripting language under different permutations. The active selection contains three atoms in Figure 2c. Therefore, the generated suggestions assume that the user intends to compute an angle. In general, there are many ways of expressing the same operation. In this case, the selected atoms belong to the same residue; therefore, the user is given options to perform this operation within the context of the residue. Note that the indexing of atoms changes when applied within a context and is, in such cases, local to the context.

Element Remapping. Some topology file formats lack atom element data, and in such cases, the application attempts to infer it from other data available, mainly the atom label. This approach is based on heuristics and will fail on certain occasions. In such a case, the user can modify the mapped element of the atoms through the Remap Element menu shown in Figure 2d. This operation can be applied to a single atom or to all atoms with the same label.

Recentering. The application supports dynamic recentering of trajectories: by selecting an atom, residue, chain, or arbitrary set of atoms, the context menu option seen in Figure 2e becomes available. The operation uses the center of mass of the supplied set to recenter each trajectory frame.

Selection Growth. When an active selection is present, growing that selection becomes available. The operation is applied either as a flood fill across covalent bonds or as a radial growth based on the distance to any atom in the active set.20

Periodic Boundary Conditions. When the application loads a trajectory frame into memory, atoms within structures are optionally translated to prevent them from being split across the periodic boundaries by applying Deperiodization. Structures in this context are implicitly defined by the covalent bonds formed between atoms. Any atoms connected by covalent bonds will belong to the same structure. Deperiodization of a structure is achieved by computing its center of mass, then applying Periodic Boundary Conditions to the center of mass to ensure that it resides within the viewed period of the system. Then, each atom of the structure is translated to the same period as the center of mass.

Animation Window (c). Trajectories can be animated under different interpolation methods: nearest-, linear-, and cubic-spline. The interpolation takes into account any periodic boundary conditions present in the system. However, any form of interpolation in a periodic system will result in coordinates that can end up outside the system’s boundary. This occurs whenever the control points (coordinates) reside in different periods. To prevent this, the user can Apply PBC to the interpolated frame, which enforces the Periodic Boundary Conditions as a postinterpolation operation.

Script Language and Editor (d). The scripting language used in the script editor (Figure 3) is a central part of VIAMD and has been designed from the ground up, focusing on MD analysis. It is used for both defining selections and expressing computational properties in the script editor. The motivation for designing a new language instead of choosing some existing high-level scripting language, e.g., Python, is two-fold. First and foremost, the language’s syntax can be simplified and task-focused, to support MD analysis. Second, it allows us to inspect and evaluate any part of an expression by traversing the abstract syntax tree. This grants the ability to provide feedback in the form of type information and to invoke visualizations of any part of any expression. The language is designed with declarative syntax, meaning the user declares the desired results rather than the more common imperative counterpart, where the user expresses the explicit algorithmic steps involved. This simplifies the syntax of the language and reduces its overall complexity. We refer the reader to the Supporting Information and online resources for listings of the available operations and more advanced examples.

Figure 3. Script editor window: A text-based editor for defining properties and expressions evaluated for the loaded data set.

Listing 1: Example of the syntax of the VIAMD scripting language.

Variables and Properties. Variables are expressions assigned to identifiers that can then be referenced in other expressions. Variables of specific types (float[1..N], Distribution, and Volume) are promoted into Properties, which can be visualized in the various components of VIAMD. Float properties with a length greater than 1 do not hold a single value but a population of values. In the components of the application, they have the option of being shown as aggregates of the population. As examples, see Figure 1f,h and Figure 5 where user-defined properties d1 and a1 are shown as population aggregates, respectively. In the temporal window (f), population properties can be shown individually or as population min, max, mean, and variance. In the distribution window (h), the populations can be shown individually or as an aggregate.

Selections. Selections are the results of filtering operations and are represented by sets of atoms. An example of a filtering operation resulting in a selection can be seen in line 1 in listing 1. The expression contains an and operation representing the intersection of subsets: element('H') and protein, where element('H') is the set of atoms with the element hydrogen and protein represents the set of atoms belonging to residues identified as proteins.

Contexts. Filtering operations are not limited to producing a single set of atoms but can produce multiple sets. For example, line 3 in listing 1, resname("GLY"), results in multiple sets of atoms, each set representing a residue. The motivation is to allow the sets to serve as contexts for operations to be applied within them. This is referred to as contextual operations. Contextual operations are achieved by using the keyword in, where the left-hand side of the keyword expresses the operation to be performed and the right-hand side provides the contexts for the operation. Again, consider line 3 in listing 1, where the property agly is declared as the result of a contextual operation where the operation is to compute the angle between atoms 1, 2, and 3, and the contexts are each residue named GLY. In isolation, the angle operation yields a single scalar value, but since this operation is evaluated within multiple contexts, the resulting type will have a length matching the number of evaluated contexts.

Listing 2: Example of and vs in keywords and how they differ.

At first glance, the keywords seem to offer similar functionality. However, the difference is that in preserves contexts while and implicitly flattens the result into a single set. Consider listing 2, where two selections are created, s1 and s2, each representing the same set of atoms. The key difference is that s2 maintains the contexts of the proteins, and subsequent operations that use the selections may differ in their results.

Script Editor. The scripting editor, Figure 1d and Figure 3, is a text editor in which the user can write scripts containing identifiers and expressions. Input parameters of functions are validated against the loaded topology of the system, enabling errors and feedback, e.g., when referring to residue names or atomic elements that are not present. Another type of feedback occurs when the user hovers over an expression or subexpression with the mouse cursor. A visualization operation is then performed to provide visual feedback on the operation and what parts of the system are involved. This is achieved by embedding geometrical primitives, such as points, lines, and triangles, and highlighting structures in the spatial view. Examples of this can be seen in Figure 1 and Figure 4. This geometrical embedding effectively ties the expressions to the spatial interpretation of the data, providing a feedback channel for asserting the user’s intent. When the user evaluates the script, the expressions are evaluated over the trajectory frames.

Property Import/Export. The scripting language is designed to be lightweight and expressive rather than exhaustive and complex. In cases where the set of exposed operations becomes the limiting factor, the property data can be exported to a tabular format. This functionality can be found in the script editor window under File → Export. Temporal and 1D distributions can be exported to .csv or .xvg files. 3D distributions can be exported as .cube files, which can also encode molecule structures. This is utilized in the case of spatial distribution functions, where the reference structure is exported as the molecular structure.

Figure 4. (a) Visualization of the angle formed by local atom indices within all residues named ALA. The involved atoms are highlighted; the spatial
positions are marked by white points connected by lines, and a wedge is shown to emphasize the angle. The resulting type has a size of 15 and
corresponds to the number of contexts (residues named ALA) in which the operation was performed. (b) Visualization of the property d1, the
pairwise distance between the center of mass of the ligand (residue named AIN) and the user-defined pocket defined as a list of residue indices. The
resulting type has a length of 6, corresponding to the individual distances from the ligand to each of the pocket’s residues.
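
The contextual evaluation shown in Figure 4a can be mimicked in a few lines of Python. This is only a sketch of the idea, not VIAMD's implementation: each context is modeled as a plain list of atom positions, the helper names `angle` and `angle_in_contexts` are hypothetical, and the local indices are assumed to be 1-based, following the "atoms 1, 2, and 3" wording of the text.

```python
import math


def angle(p1, p2, p3):
    """Angle in radians at p2 formed by the points p1-p2-p3."""
    v1 = [a - b for a, b in zip(p1, p2)]
    v2 = [a - b for a, b in zip(p3, p2)]
    dot = sum(a * b for a, b in zip(v1, v2))
    n1 = math.sqrt(sum(a * a for a in v1))
    n2 = math.sqrt(sum(a * a for a in v2))
    # Clamp to guard against rounding slightly outside [-1, 1].
    cos_theta = max(-1.0, min(1.0, dot / (n1 * n2)))
    return math.acos(cos_theta)


def angle_in_contexts(contexts, i, j, k):
    """Evaluate angle(i, j, k) once per context; indices are local and 1-based (assumed)."""
    return [angle(c[i - 1], c[j - 1], c[k - 1]) for c in contexts]


# Two toy "residues", each a list of atom positions local to that context.
residues = [
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (0.0, 1.0, 0.0)],   # right angle
    [(1.0, 0.0, 0.0), (0.0, 0.0, 0.0), (-1.0, 0.0, 0.0)],  # straight line
]

angles = angle_in_contexts(residues, 1, 2, 3)
print([math.degrees(a) for a in angles])
```

The result has one entry per context, which mirrors why the property in Figure 4a has a length equal to the number of ALA residues.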

Figure 5. Temporal window (left) with two subplots showing two distinct properties evaluated through the script. The top subplot shows the
individual lines within the population of dist as blue lines, with the population mean in white and population min/max as a transparent area. The
bottom subplot shows a single line depicting property rmsd_asp. The currently viewed animation frame is shown as a vertical yellow line. The user
hovers over a line corresponding to the first entry within the population of dist; therefore, its value is shown together with the time in a tooltip
window. The context menu (right) is shown upon right-clicking a property’s legend entry, in which the visual representation of the property can be
configured.
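
The population aggregates drawn in Figure 5 (min/max band and mean line) amount to a per-frame reduction over the population. A minimal Python sketch follows; the function name `aggregate_population` is hypothetical, and the use of the population variance (rather than the sample variance) is an assumption, since the text does not state which one the application uses.

```python
import statistics


def aggregate_population(frames):
    """Reduce each frame's population of values to min, max, mean, and variance.

    frames[t] holds the values of one population property at frame t.
    """
    return [
        {
            "min": min(values),
            "max": max(values),
            "mean": statistics.fmean(values),
            "variance": statistics.pvariance(values),  # population variance (assumed)
        }
        for values in frames
    ]


# Toy data: a property with three entries evaluated over two frames.
frames = [
    [1.0, 2.0, 3.0],
    [2.0, 2.0, 2.0],
]

aggregates = aggregate_population(frames)
for t, agg in enumerate(aggregates):
    print(t, agg)
```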

Data can also be imported into the script using the command between vertical white lines. The temporal filter results in an
import. It currently supports temporal data in tabular format in extra evaluation pass for distribution and volume properties
.csv, .xvg, and .edr (Gromacs energy file). The import command where only the frames in the range are considered. This enables
takes a path to a file and an optional filter argument to specify the inspection of localized temporal trends within the data.
the fields or columns to import. If the path to the file is relative, Distribution View (h). The distribution views, Figure 1h
it is assumed to be relative to the workspace if present, and Figure 6, show properties that evaluate to 1D distributions.
otherwise to the loaded trajectory. The distributions are evaluated and stored in memory as high-
Temporal View (f). The temporal views, Figures 1f and 5, resolution histograms, and when shown in the distribution
provide an overview of the timeline of the trajectory. The window, a down-sampled version is used, configurable by the
timeline shows the temporal evolution of properties as line user (in powers of 2). The down-sampling scheme is
plots. If the property consists of a population of values, the conservative, ensuring an accurate representation of the
population min/max, mean, and variance can optionally be underlying distribution without the need to re-evaluate the
shown. Properties are placed within the subplot by dragging data when changing the visual number of bins. Hovering on
and dropping legend entries between the subplots or dragging properties (label or line) shown in the distribution window will
properties from the Properties menu, which lists all available provide visual feedback in the spatial view, similar to hovering
properties that can be shown in the temporal view. Hovering on the source expression in the script editor. If the property is
properties (label or line) shown in the timeline window will currently shown in the temporal window, it will also be
provide visual feedback in the spatial view, similar to hovering highlighted there.
the source expression in the script editor. If the property is The distribution view is divided into a number of subplots
currently shown in the distribution window, it will also be configurable by the user, and their contents are controlled by
highlighted there. dragging and dropping legend entries listed under Properties in
Temporal Filtering. The temporal view also exposes an the menu or from other subplots. The visual representation of
optional temporal filter, represented by a contiguous range each property can be configured by right-clicking the property’s
(start and end time) shown as a shaded gray region in Figure 5 legend, which opens a context menu (Figure 6, right). If the
E https://doi.org/10.1021/acs.jcim.3c01033
J. Chem. Inf. Model. XXXX, XXX, XXX−XXX
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

Figure 6. Distribution window (left) with three subplots showing three distinct properties: D1, D2, and D3. The top subplot shows the individual
distributions within the population of D1 as light blue lines and their aggregate, D1(agg) in white. The middle subplot shows the distributions of D2
as bars, and the bottom subplot shows the population of D3 as shaded areas and its aggregate D3(agg) as a white line. The mouse hovers over one of
the lines in the plot, and its label is shown together with its value. The context menu (right) is shown upon right-clicking a property’s legend entry, in
which the visual representation of the property can be configured.

Figure 7. Volume view: the spatial density function of ligands labeled PFT in relation to stacked chains that form amyloid fibrils. Each chain serves as
a reference frame in which the spatial occurrence of the ligands is aggregated and contributes to the resulting total density. Clip planes have been
applied to provide a tight cross-section of the density close to the reference structures. The reference structures (chains) are shown in conjunction
with the density volume in order to serve as a spatial reference.

property consists of a population, then the visibility of the reference structures can optionally be shown as representations
individual components within the population can also be in conjunction with the density volume, as shown in Figure 7.
controlled. Shape Space View (i). The shape space view, Figure 1i and
Volume View (g). The volume view, Figure 1g and Figure Figure 8, serves as a complement to studying the geometrical
7, shows properties that evaluate into volumes, e.g., Spatial deformation of structures over time. Temporally stable
Distribution Function, exposed in the script editor as sdf. structures are required to derive stable reference frames. They
Volumes can be rendered using direct volume ray-casting, are crucial for the temporal superpositioning of structures and a
where the density values are mapped to color and opacity using prerequisite for producing accurate results in operations such as
a predefined set of transfer functions or as user-defined iso- spatial distribution function (SDF). The shape space is a
surfaces. The applied transfer function is optionally shown as a temporal scatter plot where each point represents a structure
legend in the window for clarity. configuration projected into a linear geometric anisotropic
Volume properties resulting from sdf also contain metadata space spanned by three extremes, linear (line), bottom-left;
that can optionally be shown within the volume window. In the planar (disc), bottom-right; and isotropic (spherical), top. By
case of sdf, the metadata are the reference structures used studying the spread of points within this space, it is possible to
during the evaluation supplied as the first parameter. The determine if a structure or set of structures undergoes
F https://doi.org/10.1021/acs.jcim.3c01033
J. Chem. Inf. Model. XXXX, XXX, XXX−XXX
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

Figure 8. Shape space plot: a scatter plot of the temporal geometric anisotropy for a set of residues was accessed with the identifier pocket. Each color
in the plot corresponds to one residue, and each point corresponds to a frame within the simulation. The three corners of the space correspond to the
three anisotropic extremes: linear (lower-left), planar (bottom-right), and isotropic (top).

Figure 9. Ramachandran plot: the four views show the occurrence of backbone angles ϕ and ψ for residue classes: general (top left), glycine (top
right), proline (bottom-left), and pre-proline (bottom-right). The background consists of configurable layers showing from bottom to top: reference
distribution (derived from PDB top-500 proteins) was indicated as colored contour layers corresponding to fixed percentiles. Trajectory distribution
(over full trajectory) as white isolines with matching percentiles. The top layer shows interactive points corresponding to the angles of the current
configuration. The blue points show which residues are currently selected, and the yellow points show the point hovering with the mouse.

substantial geometrical deformation over time and is thus unfit of configurable layers to provide context: The bottom-most
to serve as reference frames. In such cases, the user can further layer represents a reference distribution derived from the PDB
refine the structure selection by narrowing the selection to top 500 proteins as was originally suggested by Lovell et al.23
stable parts of the structures and excluding weakly connected and is today considered common practice. The percentiles used
extremities that degrade the stability of the reference frame. For for contour levels are the same as suggested by Lovell et al.,
details regarding the spatial distribution function and the shape 99.95% and 98% in the general case and 99.5% and 98% in the
space, see Skanberg et al.21 other cases. The next two layers represent distributions from
Ramachandran Plot (e). The Ramachandran plot,22 Figure the loaded data set, where one layer is the full trajectory and the
1e and Figure 9, is a scatter plot where the coordinate axes other is the selected frame interval. Both distributions are scaled
correspond to dihedral angles ϕ and ψ from the backbone of to match the contour levels of the reference distribution to
protein structures. The plot is divided into four views enable A-to-B comparisons.
corresponding to the peptide residue types: general, glycine, Each layer can be shown as filled contour levels or contour
proline, and pre-proline. The coordinate axes of each plot are lines or mapped with a transfer function.
linked. The points represent proteins in their current Lowell et al.23 employ a two-pass approach of normalized
configuration and are interactive, meaning the user can hover cosine smoothing in order to filter out points in specific
over and select them. The fill color of the points shows the percentiles and categorize the resulting regions as allowed and
current selection, where white means not selected, blue is favored, where favored is more strict and contains fewer points
selected, and yellow is highlighted (hovered). The views consist than allowed. Partly, this technique was used to filter out
G https://doi.org/10.1021/acs.jcim.3c01033
J. Chem. Inf. Model. XXXX, XXX, XXX−XXX
Journal of Chemical Information and Modeling pubs.acs.org/jcim Article

outliers from the input data derived from Crystallography,


which has an inherent uncertainty.
The goal is not to replicate the process as the points
(backbone angles) now stem from simulated trajectories and do
not suffer from the inherent uncertainty present in Crystallog-
raphy and Cryo-EM. Thus, all points should be considered
equally. Instead, we employ a normalized Gaussian kernel with a
configurable standard deviation to spread the density. A
Gaussian kernel was chosen since it is separable and well- Listing 4: Properties p and v which represent the planarity and
suited for performance-critical applications. the spatial distribution of “PFT” molecules with respect to the
chains in the system.

■ WORKSPACE
The application’s state can be saved and loaded from workspace files with the default file extension .via. A workspace file contains representations, the script, stored selections, the current camera state, and general settings. The format is a text-based ASCII format and can be modified with any text editor. The only exception is stored selections, which are compressed and stored in Base64. In the Supporting Information, the reader can find an example workspace file in the online tutorials.
Requirements, Performance, and Scalability. The software is designed to leverage thread-level parallelism by using an internal task pool, to which computation-intensive tasks are submitted. Examples of such tasks include script evaluations and Ramachandran and shape-space distributions. Each task may entail processing a range of elements (i.e., trajectory frames). In such cases, the range is split into subranges that are processed in parallel. The number of available worker threads in the thread pool is exposed as a user-configurable compile-time parameter in the CMake script. This ensures that the task system can scale to the number of hardware threads available. Note that one of the threads is always reserved as the application’s main thread for handling logic and rendering to maintain interactivity. The software is also written to utilize data-level parallelism through SIMD instructions to accelerate the processing time of computation-intensive operations.
The application also employs a cache system for trajectory frames, which streams in and decompresses trajectory frames on demand. It has a user-configurable compile-time parameter in the CMake script that controls the available memory for the cache system.
System Requirements. The minimum requirement is a dual-core system with 4 GB of available system memory and a graphics accelerator supporting OpenGL 3.30. The recommendation is to have a multicore system (4+) and at least 8 GB of available system memory as the evaluation performance scales with the number of threads available.
Performance Measurements. Tables 1 and 2 contain measurements of evaluations of specific properties (Listings 3 and 4) performed for two different data sets. The evaluated properties (with the exception of p) have been chosen as they represent both computation- and memory-intensive operations that access a substantial portion of each evaluated frame. The measurements have been performed on a system comprised of an AMD Ryzen 9 7950X3D 4.2 GHz CPU with 128 GB of DDR5 memory. The number of threads was varied to measure the scalability of the evaluation. Tables 1 and 2 list the evaluation times in seconds next to the scaling factor shown in parentheses. The scaling with respect to the number of threads seems to follow a near-linear trend, with the exception of p, whose speedup factor is linear only up until four threads. Since p is computationally inexpensive compared to the other properties, we hypothesize that a substantial portion of the measured time stems from the overhead of initializing and synchronizing evaluation tasks. Detailed instrumentation is required for further analysis.

Table 1. Evaluation Times in Seconds and Speedup Factor for Properties r1, r2, v1, and v2 in Listing 3 (a)
# Threads    1               2               4               8               16
r1           9.4 (1.0×)      4.8 (1.96×)     2.4 (3.91×)     1.3 (7.23×)     0.8 (11.75×)
r2           38.4 (1.0×)     19.3 (1.99×)    10.0 (3.84×)    5.1 (7.52×)     2.7 (14.22×)
v1           33.3 (1.0×)     16.8 (1.98×)    8.5 (3.92×)     4.7 (7.09×)     2.8 (11.89×)
v2           137.4 (1.0×)    69.6 (1.97×)    35.1 (3.91×)    19.2 (7.16×)    10.9 (12.60×)
(a) Evaluated for data set 2, comprised of 50 512 atoms spanning 600 frames with a total in-memory size of 348 MB. There are 16 433 water molecules present in the system.

Table 2. Evaluation Times in Seconds and Speedup Factor for Properties p and v in Listing 4 (a)
# Threads              1               2               4               8               16
p seconds (speedup)    0.4 (1.0×)      0.2 (2.0×)      0.1 (4.0×)      0.1 (4.0×)      0.2 (2.0×)
v seconds (speedup)    28.8 (1.0×)     15.8 (1.82×)    8.1 (3.55×)     4.2 (6.85×)     2.5 (11.52×)
(a) Evaluated for data set 3, comprised of 161 742 atoms spanning 2345 frames with a total in-memory size of 4.24 GB. There are 253 chains and 61 “PFT” molecules present in the system.
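The speedup factors reported in parentheses in Tables 1 and 2 can be reproduced as the single-thread time divided by the multi-thread time. The short Python sketch below does this for property r1 and additionally derives the parallel efficiency (speedup divided by thread count), which is not reported in the article. The function name is hypothetical, and because the tabulated times are rounded, the recomputed factors may differ from the published ones in the last digit.

```python
def speedup_table(times_by_threads):
    """Map thread count -> (speedup, efficiency) relative to the 1-thread time."""
    baseline = times_by_threads[1]
    return {
        n: (baseline / t, baseline / t / n)
        for n, t in sorted(times_by_threads.items())
    }

# Evaluation times in seconds for property r1, copied from Table 1.
r1_times = {1: 9.4, 2: 4.8, 4: 2.4, 8: 1.3, 16: 0.8}
r1_scaling = speedup_table(r1_times)

for threads, (speedup, efficiency) in r1_scaling.items():
    print(f"{threads:>2} threads: {speedup:5.2f}x speedup, efficiency {efficiency:.2f}")
```

An efficiency close to 1.0 corresponds to the near-linear trend described above; values that drop with increasing thread count indicate growing overhead relative to useful work.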
Listing 3: Properties r1 and r2 and v1 and v2 evaluate the radial distribution function of oxygen in water with respect to water and the spatial distribution of oxygen in water with respect to water in data set 2.

■ CONCLUSIONS
To summarize, the VIAMD application presented in this article allows us to considerably improve the workflow of MD analysis by tightly coupling visualization, property computation, and
plotting within the same application, allowing for the leveraging of synergies formed by this tight coupling. This optimization of the common MD-analysis workflow offers various benefits, including interactivity, the ability to declare derived properties from spatial selections directly, and the visualization of properties to aid the user in asserting the operation and structures involved. The VIAMD application is a promising solution for improving the efficiency and accuracy of MD analysis and has the potential to advance research in many scientific disciplines. We hope that VIAMD can be a tool for dissemination since more and more MD trajectories are available online in repositories, and there is an initiative for a search engine prototype to explore collected MD data.24 To reach a larger audience of computational chemists, we plan to extend the type of data that can be analyzed with VIAMD and develop new functionalities in the future.

■ AVAILABILITY AND DOCUMENTATION
The source code is freely available on https://github.com/scanberg/viamd under the MIT license. VIAMD is under active development; therefore, design and implementation aspects may be subject to change. Documentation is available on the wiki page of VIAMD https://github.com/scanberg/viamd/wiki with pages dedicated to the software’s visual, analysis, and language components. A series of tutorials is also provided to encourage users to use VIAMD. The data sets used in this article to illustrate the functionalities of VIAMD are also available on the wiki.

■ DATA SETS
Data Set 1: Alanine Chain. The default data set is supplied with the application and automatically loaded upon start. It is a small data set of 500 frames with a single chain of 15 alanine residues.
Data Set 2: Aspirin and Protein. This data set is available to download from the provided link. It consists of an aspirin ligand being pulled from its specific binding to phospholipase A2. The simulation was performed starting from the complex’s crystal structure (1OXR).25
Data Set 3: Amyloid Fibril and PFTAA. This nonbiased molecular dynamics simulation illustrates the binding of a pentameric oligothiophene used to detect amyloid-β(1−42), responsible for Alzheimer’s disease.26

■ ASSOCIATED CONTENT
Data Availability Statement
There are also online resources providing documentation, tutorials, and the data sets used in the article at https://github.com/scanberg/viamd/wiki
Supporting Information
The Supporting Information is available free of charge at https://pubs.acs.org/doi/10.1021/acs.jcim.3c01033.
Representation types, script types, script function glossary, limitations, example workspace (PDF)

■ AUTHOR INFORMATION
Corresponding Authors
Robin Skånberg − Linköping University, SE-581 83 Linköping, Sweden; orcid.org/0000-0001-7447-483X; Email: [email protected]
Mathieu Linares − Linköping University, SE-581 83 Linköping, Sweden; orcid.org/0000-0002-9720-5429; Email: [email protected]
Authors
Ingrid Hotz − Linköping University, SE-581 83 Linköping, Sweden
Anders Ynnerman − Linköping University, SE-581 83 Linköping, Sweden
Complete contact information is available at: https://pubs.acs.org/10.1021/acs.jcim.3c01033
Notes
The authors declare no competing financial interest.

■ ACKNOWLEDGMENTS
The authors thank Patrick Norman for his support and involvement in the initial phases of the software. The authors thank the Swedish e-Science Research Center (SeRC) and the Wallenberg Foundation for funding the ongoing research.

■ REFERENCES
(1) Pettersen, E. F.; Goddard, T. D.; Huang, C. C.; Couch, G. S.; Greenblatt, D. M.; Meng, E. C.; Ferrin, T. E. UCSF Chimera - a visualization system for exploratory research and analysis. Journal of computational chemistry 2004, 25, 1605−1612.
(2) Humphrey, W.; Dalke, A.; Schulten, K. VMD: visual molecular dynamics. J. Mol. Graphics 1996, 14, 33−38.
(3) Schrödinger, L.; DeLano, W. PyMOL. http://www.pymol.org/pymol.
(4) Jurcik, A.; Bednar, D.; Byska, J.; Marques, S. M.; Furmanova, K.; Daniel, L.; Kokkonen, P.; Brezovsky, J.; Strnad, O.; Stourac, J.; et al. CAVER Analyst 2.0: analysis and visualization of channels and tunnels in protein structures and molecular dynamics trajectories. Bioinformatics 2018, 34, 3586−3588.
(5) Hanson, R. M.; Prilusky, J.; Renjian, Z.; Nakane, T.; Sussman, J. L. JSmol and the next-generation web-based representation of 3D molecular structure as applied to proteopedia. Isr. J. Chem. 2013, 53, 207−216.
(6) OneAngstrom. SAMSON. https://www.samson-connect.net/.
(7) Stukowski, A. Visualization and analysis of atomistic simulation data with OVITO-the Open Visualization Tool. Modelling Simul. Mater. Sci. Eng. 2010, 18, 015012.
(8) Gralka, P.; Becher, M.; Braun, M.; Frieß, F.; Müller, C.; Rau, T.; Schatz, K.; Schulz, C.; Krone, M.; Reina, G.; Ertl, T. MegaMol − a comprehensive prototyping framework for visualizations. European Physical Journal Special Topics 2019, 227, 1817−1829.
(9) Sehnal, D.; Bittrich, S.; Deshpande, M.; Svobodová, R.; Berka, K.; Bazgier, V.; Velankar, S.; Burley, S. K.; Koča, J.; Rose, A. S. Mol* Viewer: modern web app for 3D visualization and analysis of large biomolecular structures. Nucleic Acids Res. 2021, 49, W431−W437.
(10) Kozlíková, B.; Krone, M.; Falk, M.; Lindow, N.; Baaden, M.; Baum, D.; Viola, I.; Parulek, J.; Hege, H.-C. Visualization of biomolecular structures: State of the art revisited. Computer Graphics Forum 2017, 36, 178−204.
(11) Kut’ák, D.; Vázquez, P.-P.; Isenberg, T.; Krone, M.; Baaden, M.; Byška, J.; Kozlíková, B.; Miao, H. State of the Art of Molecular Visualization in Immersive Virtual Environments. Computer Graphics Forum 2023, 42, e14738.
(12) Fiorin, G.; Klein, M. L.; Hénin, J. Using collective variables to drive molecular dynamics simulations. Mol. Phys. 2013, 111, 3345−3362.
(13) Thompson, A. P.; Aktulga, H. M.; Berger, R.; Bolintineanu, D. S.; Brown, W. M.; Crozier, P. S.; in 't Veld, P. J.; Kohlmeyer, A.; Moore, S. G.; Nguyen, T. D.; Shan, R.; Stevens, M. J.; Tranchida, J.; Trott, C.; Plimpton, S. J. LAMMPS - a flexible simulation tool for particle-based


materials modeling at the atomic, meso, and continuum scales.
Comput. Phys. Commun. 2022, 271, 108171.
(14) Phillips, J. C.; Braun, R.; Wang, W.; Gumbart, J.; Tajkhorshid,
E.; Villa, E.; Chipot, C.; Skeel, R. D.; Kale, L.; Schulten, K. Scalable
molecular dynamics with NAMD. Journal of computational chemistry
2005, 26, 1781−1802.
(15) Van Der Spoel, D.; Lindahl, E.; Hess, B.; Groenhof, G.; Mark, A.
E.; Berendsen, H. J. GROMACS: fast, flexible, and free. Journal of
computational chemistry 2005, 26, 1701−1718.
(16) Michaud-Agrawal, N.; Denning, E. J.; Woolf, T. B.; Beckstein, O.
MDAnalysis: a toolkit for the analysis of molecular dynamics
simulations. Journal of computational chemistry 2011, 32, 2319−2327.
(17) Ulbrich, P.; Waldner, M.; Furmanova, K.; Marques, S. M.;
Bednar, D.; Kozlikova, B.; Byska, J. sMolBoxes: Dataflow Model for
Molecular Dynamics Exploration. IEEE Transactions on Visualization
and Computer Graphics 2022, 1−10.
(18) Skånberg, R.; Linares, M.; König, C.; Norman, P.; Jönsson, D.;
Hotz, I.; Ynnerman, A. VIA-MD: Visual Interactive Analysis of Molecular
Dynamics; MolVa@ EuroVis, 2018; pp 19−27.
(19) Jönsson, D.; Steneteg, P.; Sundén, E.; Englund, R.; Kottravel, S.;
Falk, M.; Ynnerman, A.; Hotz, I.; Ropinski, T. Inviwo - a visualization
system with usage abstraction levels. IEEE transactions on visualization
and computer graphics 2020, 26, 3241−3254.
(20) Skånberg, R.; Linares, M.; Falk, M.; Hotz, I.; Ynnerman, A.
MolFind - Integrated Multi-Selection Schemes for Complex Molecular
Structures, Workshop on Molecular Graphics and Visual Analysis of
Molecular Data; Byska, J., Krone, M., Sommer, B. The Eurographics
Association: The Netherlands, 2019; https://doi.org/10.2312/molva.20191096.
(21) Skanberg, R.; Falk, M.; Linares, M.; Ynnerman, A.; Hotz, I.
Tracking Internal Frames of Reference for Consistent Molecular
Distribution Functions. IEEE Transactions on Visualization and
Computer Graphics 2022, 28, 3126.
(22) Ramachandran, G.; Ramakrishnan, C.; Sasisekharan, V.
Stereochemistry of polypeptide chain configurations. J. Mol. Biol.
1963, 7, 95−99.
(23) Lovell, S. C.; Davis, I. W.; Arendall, W. B., III; De Bakker, P. I.;
Word, J. M.; Prisant, M. G.; Richardson, J. S.; Richardson, D. C.
Structure validation by Cα geometry: ϕ, ψ and Cβ deviation. Proteins:
Struct., Funct., Bioinf. 2003, 50, 437−450.
(24) Tiemann, J. K. S.; Szczuka, M.; Bouarroudj, L.; Oussaren, M.;
Garcia, S.; Howard, R. J.; Delemotte, L.; Lindahl, E.; Baaden, M.;
Lindorff-Larsen, K.; Chavent, M.; Poulain, P. MDverse: Shedding
Light on the Dark Matter of Molecular Dynamics Simulations. bioRxiv
2023, DOI: 10.1101/2023.05.02.538537.
(25) Singh, R. K.; Ethayathulla, A.; Jabeen, T.; Sharma, S.; Kaur, P.;
Singh, T. P. Aspirin induces its anti-inflammatory effects through its
specific binding to phospholipase A2: Crystal structure of the complex
formed between phospholipase A2 and aspirin at 1.9 Å resolution. J.
Drug Targeting 2005, 13, 113−119.
(26) König, C.; Skånberg, R.; Hotz, I.; Ynnerman, A.; Norman, P.;
Linares, M. Binding sites for luminescent amyloid biomarkers from
non-biased molecular dynamics simulations. Chem. Commun. 2018, 54,
3030−3033.
