Converting particles from CryoSPARC file formats
This tutorial shows how to import cryo-EM data into CryoLike from CryoSPARC files.
When CryoLike loads CryoSPARC data, we expect to load one CryoSPARC file that will describe image data held in one or more MRC files.
CryoSPARC data representations
CryoLike can import CryoSPARC data from two possible data storage paradigms: indexed files and job directories.
In both cases, the CryoSPARC file describes how to interpret the images, and the settings/parameters of the image capture apparatus.
See the CryoSPARC documentation for information about the fields expected in CryoSPARC files.
Indexed files
We use the term “indexed file” to refer to a collection of images
described by a single CryoSPARC file. When image data is stored
in this format, we expect that the CryoSPARC file will have a
blob section with path and idx members. These should
hold two lists: a list of paths on the file system, and
a list of index numbers. Each path specifies where to find
an MRC file holding the actual image data. The index number
tells us which one of the images in the MRC file is being
described by the corresponding CryoSPARC data.
The user can specify the parent directory of the MRC files, in which case we assume that all MRC files are located immediately under that directory. If the parent directory is not specified, we assume that the CryoSPARC file contains a valid path (either absolute, or relative to the directory where CryoLike is being run) to the location of each MRC file.
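The path handling described above can be sketched as follows. This is only an illustration of the rule as stated in the text, not CryoLike's actual code: the helper name resolve_mrc_path and its arguments are hypothetical, and the use of the file's base name when a parent directory is given is an assumption drawn from the phrase "immediately under that directory".

```python
import os


def resolve_mrc_path(path_from_cs, parent_dir=None):
    """Hypothetical helper: decide where to look for one MRC file."""
    if parent_dir is None:
        # No parent directory given: trust the path recorded in the
        # CryoSPARC file (absolute, or relative to the working directory).
        return path_from_cs
    # Parent directory given: keep only the file name and look for it
    # directly under that directory (assumed behavior).
    return os.path.join(os.path.expanduser(parent_dir),
                        os.path.basename(path_from_cs))


print(resolve_mrc_path("somedir/file1.mrc"))
print(resolve_mrc_path("somedir/file1.mrc", "/data/my_mrc_files"))
```

Under this reading, the subdirectory recorded in the CryoSPARC file is ignored whenever a parent directory is supplied.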
The function that processes images indexed by a CryoSPARC
file is
cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_indexed_cryosparc_file().
Job directories
We use the term “job directory” to refer to a collection of images stored in a CryoSPARC job directory. In this setting, we do not use any filesystem or index data from the CryoSPARC file. Instead, we expect that the rows of the CryoSPARC file describe the images in the MRC files in the job directory, in sequence.
This version makes particularly strong assumptions about the layout of the job directory. The expected layout is discussed in detail in the image conversion documentation.
The function that processes images from a CryoSPARC job directory is
cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_cryosparc_job_directory().
Common parameters
For a description of the parameters common to all image conversion functions, see the image conversion documentation.
Examples
Indexed files examples
Suppose:
- MyFile.cs is a CryoSPARC file which describes 5 images
- The file’s blob/path member contains:
    - somedir/file1.mrc
    - somedir/file1.mrc
    - somedir/file2.mrc
    - otherdir/file3.mrc
    - somedir/file1.mrc
- The file’s blob/idx member contains:
    - 3
    - 1
    - 10
    - 10
    - 2
MyFile.cs thus reflects the following intention:
- Its first entry describes the third image in somedir/file1.mrc
- Its second entry describes the first image in somedir/file1.mrc
- Its third entry describes the 10th image in somedir/file2.mrc
- Its fourth entry describes the 10th image in otherdir/file3.mrc
- Its fifth entry describes the second image in somedir/file1.mrc
In this case, the following function call would put all 5 images into
a single output file in OUTDIR:
convert_particle_stacks_from_indexed_cryosparc_file(
params_input="my_params_file.npz",
file_cs="MyFile.cs",
folder_output='OUTDIR',
)
assuming that it is run from a directory where somedir and
otherdir exist.
If, however, you had moved file1.mrc, file2.mrc,
and file3.mrc into the ~/my_research/my_mrc_files/
directory, then this call would achieve the same result:
convert_particle_stacks_from_indexed_cryosparc_file(
params_input="my_params_file.npz",
file_cs="MyFile.cs",
folder_cryosparc='~/my_research/my_mrc_files/',
folder_output='OUTDIR',
)
The following call would create 2 image stacks in the
current directory. The first stack would have the first 3
images from MyFile.cs and the second stack would hold
the remaining 2 images:
convert_particle_stacks_from_indexed_cryosparc_file(
params_input="my_params_file.npz",
file_cs="MyFile.cs",
batch_size=3
)
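The batching arithmetic in the last example can be checked with a short sketch. The function split_into_stacks is hypothetical and only models how many images land in each stack for a given batch_size; it does not read or write any files, and it is not CryoLike's implementation.

```python
def split_into_stacks(n_images, batch_size):
    """Return the number of images placed in each output stack."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    sizes = []
    remaining = n_images
    while remaining > 0:
        # Each stack takes up to batch_size images; the last may be smaller.
        sizes.append(min(batch_size, remaining))
        remaining -= sizes[-1]
    return sizes


print(split_into_stacks(5, 3))  # [3, 2]
```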
Job directory examples
Suppose:
- The CryoSPARC job folder is located at ./cryosparc/J4
- ./cryosparc/J4/J4_passthrough_particles.cs exists, and has data describing at least 12 images
- ./cryosparc/J4/restack/ exists and contains:
    - batch_000000_restacked.mrc with 4 images
    - batch_000001_restacked.mrc with 4 images
    - batch_000002_restacked.mrc with 4 images
- ./cryosparc/J4/downsample/ exists and contains:
    - batch_000000_downsample.mrc with 4 images
    - batch_000001_downsample.mrc with 4 images
    - batch_000002_downsample.mrc with 4 images
    - batch_000004_downsample.mrc with 4 images (note that ...000003... has been deliberately skipped)
The following call would convert all 12 images from the restack
directory into a single image stack placed in the OUTDIR
directory:
convert_particle_stacks_from_cryosparc_job_directory(
params_input="my_params.npz",
folder_cryosparc='cryosparc',
job_number=4,
folder_output='OUTDIR'
)
If the cryosparc/J4/restack/ directory did not exist, then
the MRC files from the downsample/ directory would be used.
The file batch_000004_downsample.mrc would never be read,
because image conversion would stop when the program looked for
batch_000003_downsample.mrc and could not find it.
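The stop-at-first-gap behavior described above can be sketched as follows. The helper find_sequential_batches is hypothetical and only mirrors the file naming pattern from this example. It builds a temporary directory with empty placeholder files, so no real MRC data is involved.

```python
import os
import tempfile


def find_sequential_batches(folder, suffix):
    """Hypothetical helper: collect batch_NNNNNN_<suffix>.mrc files in
    numeric order, stopping at the first missing index."""
    found = []
    index = 0
    while True:
        candidate = os.path.join(folder, f"batch_{index:06d}_{suffix}.mrc")
        if not os.path.exists(candidate):
            break  # first gap ends the search; later files are ignored
        found.append(candidate)
        index += 1
    return found


with tempfile.TemporaryDirectory() as folder:
    # Recreate the downsample/ layout from the example: index 3 is missing.
    for i in (0, 1, 2, 4):
        open(os.path.join(folder, f"batch_{i:06d}_downsample.mrc"), "wb").close()
    names = [os.path.basename(p)
             for p in find_sequential_batches(folder, "downsample")]

print(names)
```

Only the files numbered 000000 through 000002 are returned; batch_000004_downsample.mrc is never reached.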
The following call would stop processing after emitting 2 stacks of 4 images each:
convert_particle_stacks_from_cryosparc_job_directory(
params_input="my_params.npz",
folder_cryosparc='cryosparc',
job_number=4,
folder_output='OUTDIR',
batch_size=4,
n_stacks_max=2
)
The following call would downsample the imported images by a factor of 2 using the mean value over the affected pixel range:
convert_particle_stacks_from_cryosparc_job_directory(
params_input="my_params.npz",
folder_cryosparc='cryosparc',
job_number=4,
folder_output='OUTDIR',
downsample_factor=2,
downsample_type='mean'
)
Note that this would be independent of any downsampling already done to the image files.
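To make "mean value over the affected pixel range" concrete, here is a minimal pure-Python sketch of mean downsampling on a small 2D image. The function downsample_mean is hypothetical and is not CryoLike's implementation. It assumes the image dimensions are divisible by the factor, which the text above does not state.

```python
def downsample_mean(image, factor):
    """Average each factor-by-factor block of a 2D image (list of rows)."""
    n_rows, n_cols = len(image), len(image[0])
    if n_rows % factor or n_cols % factor:
        # Assumption for this sketch: no partial blocks at the edges.
        raise ValueError("image dimensions must be divisible by factor")
    out = []
    for r in range(0, n_rows, factor):
        row = []
        for c in range(0, n_cols, factor):
            block = [image[r + i][c + j]
                     for i in range(factor) for j in range(factor)]
            row.append(sum(block) / len(block))
        out.append(row)
    return out


image = [
    [1, 3, 5, 7],
    [1, 3, 5, 7],
    [2, 2, 8, 8],
    [4, 4, 0, 0],
]
print(downsample_mean(image, 2))  # [[2.0, 6.0], [3.0, 4.0]]
```

A 4x4 image becomes 2x2, with each output pixel equal to the average of the four input pixels it covers.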