2. Loading your data

2.1. Basic Usage

MDAnalysis aims to read any and all molecular simulation data, from whatever files you provide to it. This is done via the Universe object, which accepts paths to files as its arguments. Usually two files are required to create a Universe object: a topology file, which provides information about the names and types of atoms, and a trajectory file, which gives the positions of atoms over time.

Typically, the file used to set up the simulation acts as the topology, while the output file produced by the simulation provides the trajectory. For example, to load results from a CHARMM simulation, we provide the path to the PSF file as the topology and the path to the DCD output as the trajectory:

import MDAnalysis as mda

# topology (PSF) first, then trajectory (DCD)
u = mda.Universe('adk.psf', 'adk_dims.dcd')

Note

If a file that also provides coordinates is used as the topology, its coordinates are not read as trajectory data, i.e. the first frame comes from the trajectory file unless the all_coordinates keyword is set to True.

2.1.1. Single file Universes

Occasionally a file contains both topology and trajectory information; in such cases it is sufficient to pass a single filename to Universe:

import MDAnalysis as mda

u = mda.Universe('myfile.pdb')

2.1.2. Concatenating trajectories

It is also possible to read multiple consecutive trajectories (for example, if a simulation was restarted) by passing a list of trajectory filenames. In this example, frames from traj1.trr and traj2.trr are concatenated when iterating through the trajectory:

import MDAnalysis as mda

u = mda.Universe('topol.tpr', ['traj1.trr', 'traj2.trr'])

2.2. Supported formats and further details

The table below lists all currently supported file formats in MDAnalysis, whether each can act as a topology file, a trajectory file, or both, and whether it can be read (r) and/or written (w). In general MDAnalysis will try to extract all available data from a given file; for full details of what is extracted from each file format, consult the documentation page for that format.

Generally the format of a file is detected automatically from its extension; for example, a file called system.xyz is recognised as an XYZ file. This can be overridden by supplying the topology_format and format keyword arguments to Universe. The valid values for these keywords are listed in the Format column of the table below.

Note

It is possible to pass tarballed or zipped versions of files; format detection takes the compression into account.

Table of supported formats

Source            | Name                               | Format             | Topology | Trajectory | I/O
------------------|------------------------------------|--------------------|----------|------------|----
Amber             | PARM parameter/topology            | TOP, PRMTOP, PARM7 | Yes      | Yes        | r
Amber             | ASCII trajectory                   | TRJ, MDCRD         | No       | Yes        | r
Amber             | ASCII restart                      | INPCRD, RESTRT     | No       | Yes        | r
Amber             | NetCDF trajectory                  | NCDF, NC           | Minimal  | Yes        | r/w
Poisson Boltzmann | PQR files                          | PQR                | Yes      | Yes        | r
AutoDock          | AutoDock PDBQT files               | PDBQT              | Yes      | Yes        | r
CHARMM            | PSF files                          | PSF                | Yes      | No         | r
CHARMM            | Binary DCD files                   | DCD                | Minimal  | Yes        | r/w
CHARMM            | ASCII trajectory                   | CRD                | Minimal  | Yes        | r
Desmond MD        | DMS trajectory                     | DMS                | Yes      | Yes        | r
DL_POLY           | ASCII history                      | HISTORY            | Yes      | Yes        | r
DL_POLY           | ASCII config                       | CONFIG             | Yes      | Yes        | r
GAMESS            | GAMESS                             | GMS, LOG, OUT      | Yes      | Yes        | r
GROMACS           | GROMOS                             | GRO                | Yes      | Yes        | r/w
GROMACS           | TPR file                           | TPR                | Yes      | No         | r
GROMACS           | TRR trajectory                     | TRR                | Minimal  | Yes        | r/w
GROMACS           | XTC trajectory                     | XTC                | Minimal  | Yes        | r/w
HOOMD             | XML topology                       | XML                | Yes      | Yes        | r
HOOMD             | General Simulation Data            | GSD                | No       | Yes        | r
IBIsCO and YASP   | Binary trajectory                  | TRZ                | Minimal  | Yes        | r/w
LAMMPS            | Data file                          | DATA               | Yes      | Yes        | r
LAMMPS            | Binary DCD                         | DCD                | Minimal  | Yes        | r/w
Protein Databank  | PDB                                | PDB, ENT, XPDB     | Yes      | Yes        | r/w
Protein Databank  | Macromolecular Transmission Format | MMTF               | Yes      | Yes        | r
Tinker            | Extended XYZ                       | TXYZ               | Yes      | Yes        | r
Tripos            | MOL2                               | MOL2               | Yes      | Yes        | r/w
XYZ files         | ASCII XYZ files                    | XYZ                | Yes      | Yes        | r/w