Converting particles from STAR file formats

This tutorial shows how to import cryo-EM image data into CryoLike from MRC files described by STAR-formatted files.

When CryoLike loads STAR file data, we expect the actual image values to be stored in MRC, MRCS, or MAP files, with the descriptive values stored in one or more STAR files.

For more background on STAR files in CryoLike, see the file formats documentation

STAR file data representations

CryoLike can import STAR file data using two possible modalities: indexed files and parallel file lists.

In both cases, the STAR file describes how to interpret the images, and the settings/parameters of the image capture apparatus, while the image data itself is in accompanying MRC files.

For more complete background on file conversion, see the image conversion documentation.

Parallel lists

We use the term “parallel lists” to refer to the cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_paired_star_and_mrc_files() conversion function, whose inputs include two separate lists: one list of STAR files, and one list of MRC files.

This function assumes that the two lists match: that is, that every STAR file describes exactly the images in the corresponding MRC file, and that the STAR file describes those images in the same order as they appear in the MRC file.

Because the user passes the paths to all the STAR and MRC files individually, this function makes no assumptions about directory structure. It ignores any MRC file indexing that is present in the STAR files.

Indexed files

We use the term “indexed file” to refer to a collection of images described by a single STAR file (even though they may reside in many MRC files). We assume that indexed STAR files will follow the ReLION naming convention for the descriptive fields.

In particular, we assume that images will be described in an ImageName section, where each row has an initial field following the pattern of 12345@path/to/file.mrc–i.e., a numeric index, followed by an @ sign, followed by a valid path to the MRC file. The other data block contains descriptive information in the same order as the files appear in the ImageName block.

Example:

So, a file whose ImageName block contained the following:

  • 0004@my/file.mrc

  • 0002@my/otherfile.mrc

would be describing two images. The first image described would be the 4th image stored in my/file.mrc, and the second image described would be the 2nd image stored in my/otherfile.mrc.

CryoLike allows the user to indicate that the MRC files are actually located in a different directory, in which case the path information (but not the filename or index number) is ignored.

Common parameters

For a description of the parameters common to all image conversion functions, see the image conversion documentation.

Examples

Parallel lists example

Suppose:

  • MyFile.star is a STAR file which describes 2 images

  • File2.star is a STAR file which describes 3 images

  • images.mrc is an MRC file containing 2 images

  • images2.mrc is an MRC file containing 3 images

The following function call would create a single CryoLike image stack with 5 images in the OUTDIR directory:

convert_particle_stacks_from_paired_star_and_mrc_files(
    params_input="my_params_file.npz",
    particle_file_list=["images.mrc", "images2.mrc"],
    star_file_list=["MyFile.star", "File2.star"],
    folder_output='OUTDIR'
)

Note that this function assumes that the first row of data in MyFile.star describes the first image in images.mrc, the second row of data describes the second image, and so on: any indexing information in the STAR file will be ignored.

By default this would assume that the defocus angle and phase shift information was given in degrees, not radians.

By default, existing output files in the OUTDIR directory will be left in place and file conversion will stop if there would be a naming conflict. Setting the overwrite parameter to True will suppress this behavior.

Indexed files examples

Suppose:

  • MyFile.star is a STAR file which describes 5 images

  • The file’s ImageName section contains:

    • 0003@somedir/file1.mrc

    • 0001@somedir/file1.mrc

    • 0010@somedir/file2.mrc

    • 10@otherdir/file3.mrc

    • 0002@somedir/file1.mrc

MyFile.star reflects the following intention:

  • The first entry in its data block describes the third image in somedir/file1.mrc

  • The second entry describes the first image in somedir/file1.mrc

  • The third entry describes the 10th image in somedir/file2.mrc

  • The fourth entry describes the 10th image in otherdir/file3.mrc

  • The fifth entry describes the second image in somedir/file1.mrc

The following function call would create a single CryoLike image stack with 5 images in the OUTDIR directory:

convert_particle_stacks_from_indexed_star_files(
    params_input="my_params_file.npz",
    star_file='MyFile.star',
    folder_output="OUTDIR",
)

assuming that the somedir and otherdir directories exist in the directory where CryoLike is being run.

If, instead, the user has moved file1.mrc, file2.mrc, and file3.mrc into the ~/my_research/my_mrc_files/ directory, then this call would achieve the same result:

convert_particle_stacks_from_indexed_star_files(
    params_input="my_params_file.npz",
    star_file='MyFile.star',
    folder_mrc='~/my_research/my_mrc_files/',
    folder_output="OUTDIR",
)

In this case, CryoLike will honor the indexing information but ignore the path information (other than the MRC files’ basenames).

The following call would result in two output stacks (one with the first 3 images and one with the remaining 2 images). It would also downsample each image by a factor of 2 during the image conversion, taking the mean of the downsampled pixel ranges:

convert_particle_stacks_from_indexed_star_files(
    params_input="my_params_file.npz",
    star_file='MyFile.star',
    folder_output="OUTDIR",
    batch_size=3,
    downsample_factor=2,
    downsample_type='mean'
)