2. Loading your data
2.1. Basic Usage
MDAnalysis aims to read any and all molecular simulation data, from whatever files you provide. This is done via the Universe object, which accepts file paths as its arguments. Usually two files are required to create a Universe: a topology file, which provides the names and types of atoms, and a trajectory file, which gives the positions of atoms over time.
In practice, the file used to set up the simulation acts as the topology, and the results file of the simulation provides the trajectory. For example, to load results from a CHARMM simulation, we pass the path to the PSF file as the topology and the path to the DCD results as the trajectory:
import MDAnalysis as mda
u = mda.Universe('adk.psf', 'adk_dims.dcd')
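A quick check after loading is to look at how many atoms and frames the Universe holds. The sketch below assumes the `adk.psf` and `adk_dims.dcd` files from the example exist in the working directory; `summarize` and `load_and_summarize` are hypothetical helpers for illustration, not part of MDAnalysis.

```python
def summarize(universe):
    """Return a one-line summary of a Universe-like object.

    Hypothetical helper: it relies only on `universe.atoms.n_atoms`
    and `len(universe.trajectory)`.
    """
    n_atoms = universe.atoms.n_atoms
    n_frames = len(universe.trajectory)
    return f"{n_atoms} atoms, {n_frames} frames"


def load_and_summarize(topology="adk.psf", trajectory="adk_dims.dcd"):
    # Imported lazily so that `summarize` can be used without MDAnalysis installed.
    import MDAnalysis as mda

    u = mda.Universe(topology, trajectory)
    return summarize(u)
```

Calling `load_and_summarize()` builds the Universe from the two files and reports its size, which is a cheap way to confirm that the topology and trajectory describe the same system.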
Note
If a file that also provides coordinates is used as the topology, no trajectory information is read from it; i.e. the first frame will come from the trajectory unless the all_coordinates keyword is set to True.
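As a minimal sketch of the keyword mentioned in the note, the wrapper below forwards `all_coordinates` to Universe. `load_universe` is a hypothetical helper, and `adk.pdb` in the usage comment is an assumed name for a topology file that also carries coordinates.

```python
def load_universe(topology, trajectory, all_coordinates=False):
    """Hypothetical wrapper around mda.Universe, for illustration only.

    With all_coordinates=False (the default here), coordinates come from
    `trajectory` alone. With all_coordinates=True, the coordinates stored
    in `topology` are read as well.
    """
    # Imported lazily so that defining the helper does not require MDAnalysis.
    import MDAnalysis as mda

    return mda.Universe(topology, trajectory, all_coordinates=all_coordinates)


# Assumed usage, given suitable files on disk:
# u = load_universe('adk.pdb', 'adk_dims.dcd', all_coordinates=True)
```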
2.1.1. Single file Universes
Occasionally a single file contains both topology and trajectory information. In such cases it is sufficient to provide only that filename to Universe:
import MDAnalysis as mda
u = mda.Universe('myfile.pdb')
2.1.2. Concatenating trajectories
It is also possible to read multiple consecutive trajectories (for example, if a simulation was restarted) by passing a list of trajectory filenames. In this example, frames from traj1.trr and traj2.trr are concatenated when iterating through the trajectory:
import MDAnalysis as mda
u = mda.Universe('topol.tpr', ['traj1.trr', 'traj2.trr'])
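When a restarted simulation leaves many numbered parts, the list of filenames can be built programmatically. Order matters, since the parts are read one after another, and plain string sorting places `traj10.trr` before `traj2.trr`. The sketch below orders the parts numerically; `natural_sorted` and `load_parts` are hypothetical helpers, and the file names are assumptions.

```python
import re


def natural_sorted(filenames):
    """Sort names so that embedded integers compare numerically."""

    def key(name):
        # Split into digit and non-digit runs: 'traj10.trr' -> ['traj', 10, '.trr']
        return [int(part) if part.isdigit() else part
                for part in re.split(r"(\d+)", name)]

    return sorted(filenames, key=key)


def load_parts(topology, parts):
    # Imported lazily so that `natural_sorted` works without MDAnalysis installed.
    import MDAnalysis as mda

    return mda.Universe(topology, natural_sorted(parts))
```

For example, `load_parts('topol.tpr', ['traj10.trr', 'traj2.trr', 'traj1.trr'])` would pass the parts to Universe in the order traj1, traj2, traj10.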
2.2. Supported formats and further details
The table below lists all file formats currently supported by MDAnalysis, whether each can act as a topology or trajectory file, and links to the relevant documentation pages. In general, MDAnalysis will try to extract all available data from a given file; for full details of what is extracted from each file format, consult the relevant documentation page.
The format of a file is generally detected automatically from its extension; for example, a file called system.xyz is recognised as an XYZ file. This can be overridden by supplying the topology_format and format keyword arguments to Universe. A full list of valid values for these keywords is given in the table below.
Note
It is possible to pass tarballed/zipped versions of files. The format detection will work around this.
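Extension-based detection can be pictured with the following sketch. It illustrates the idea only and is not MDAnalysis's actual implementation; `guess_format`, `load_with_explicit_formats`, and the set of compression suffixes are assumptions made for the example.

```python
import os

# Assumed compression suffixes, for illustration only.
COMPRESSED_SUFFIXES = {".gz", ".bz2"}


def guess_format(filename):
    """Return an upper-case format name derived from the file extension."""
    root, ext = os.path.splitext(filename)
    if ext.lower() in COMPRESSED_SUFFIXES:
        # Look past the compression suffix: 'traj.xtc.bz2' -> '.xtc'
        root, ext = os.path.splitext(root)
    if not ext:
        raise ValueError(f"cannot guess format of {filename!r}")
    return ext[1:].upper()


def load_with_explicit_formats(topology, trajectory):
    # When an extension is missing or misleading, name the formats explicitly
    # through the topology_format and format keywords.
    import MDAnalysis as mda  # lazy import; only needed when actually loading

    return mda.Universe(topology, trajectory,
                        topology_format="PSF", format="DCD")
```

Here `guess_format('system.xyz')` gives `'XYZ'`. A file without a usable extension raises an error in this sketch, which is the situation where passing the keywords explicitly is useful.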
Source | Name | Format | Topology | Trajectory | I/O |
---|---|---|---|---|---|
Amber | PARM parameter/topology | TOP, PRMTOP, PARM7 | Yes | Yes | r |
Amber | Ascii trajectory | TRJ, MDCRD | No | Yes | r |
Amber | Ascii restart | INPCRD, RESTRT | No | Yes | r |
Amber | NetCDF trajectory | NCDF, NC | Minimal | Yes | r/w |
Poisson Boltzmann | PQR files | PQR | Yes | Yes | r |
Autodock | Autodock PDBQT files | PDBQT | Yes | Yes | r |
Charmm | PSF files | PSF | Yes | No | r |
Charmm | Binary DCD files | DCD | Minimal | Yes | r/w |
Charmm | Ascii trajectory | CRD | Minimal | Yes | r |
Desmond MD | DMS trajectory | DMS | Yes | Yes | r |
DL Poly | Ascii History | HISTORY | Yes | Yes | r |
DL Poly | Ascii config | CONFIG | Yes | Yes | r |
GAMESS | GAMESS | GMS, LOG, OUT | Yes | Yes | r |
Gromacs | Gromos | GRO | Yes | Yes | r/w |
Gromacs | TPR file | TPR | Yes | No | r |
Gromacs | TRR trajectory | TRR | Minimal | Yes | r/w |
Gromacs | XTC trajectory | XTC | Minimal | Yes | r/w |
Hoomd | XML Topology | XML | Yes | Yes | r |
Hoomd | Global simulation data? | GSD | No | Yes | r |
IBIsCO and YASP trajectories | Binary trajectory | TRZ | Minimal | Yes | r/w |
Lammps | Data file | DATA | Yes | Yes | r |
Lammps | Binary DCD | DCD | Minimal | Yes | r/w |
Protein Databank | PDB | PDB, ENT, XPDB | Yes | Yes | r/w |
Protein Databank | Macromolecular transmission format | MMTF | Yes | Yes | r |
Tinker | Extended XYZ | TXYZ | Yes | Yes | r |
Tripos | MOL2 | MOL2 | Yes | Yes | r/w |
XYZ files | Ascii XYZ files | XYZ | Yes | Yes | r/w |