File formats for image conversion
Image conversion (from MRC or cryoSPARC particle files) requires both an image descriptor and capture apparatus metadata. The image descriptor defines the grids and scales used to interpret the image and is usually constructed from base values.
Capture apparatus metadata, however, is expected to be delivered in either Starfile or CryoSparc format. Because these file formats can be quite loosely defined, this section describes the exact variants currently supported.
Note that, regardless of format, we always expect there to be one CTF per image in the stack. Therefore, defocus information (and phase shift information, if present) must be available for each captured image.
Starfile
We expect that the Starfile can be read by the starfile library
and will return either a Pandas dataframe or a dictionary of string
keys pointing to the Pandas dataframe.
If the dictionary is returned, we expect it to contain the keys optics
and particles. These two dataframes will be joined together on
the rlnOpticsGroup field.
Any valid Starfile must contain at least the Voltage and
SphericalAberration fields.
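Below is a minimal sketch of the join described above. It assumes pandas is installed and that the Starfile has already been read, for example with `starfile.read(path)`. Either return shape is reduced to a single table. The helper name `merge_star_data` is hypothetical and is not part of CryoLike.

```python
import pandas as pd


def merge_star_data(star_data):
    """Reduce the result of reading a Starfile to a single DataFrame.

    Hypothetical helper (not CryoLike's API). `star_data` is either a
    DataFrame or a dict holding "optics" and "particles" DataFrames.
    """
    if isinstance(star_data, pd.DataFrame):
        # Single-block Starfile: nothing to join.
        return star_data
    if isinstance(star_data, dict):
        missing = {"optics", "particles"} - set(star_data)
        if missing:
            raise ValueError(f"Starfile is missing block(s): {sorted(missing)}")
        # A left join keeps one row per particle, in the original particle
        # order, and copies in the optics parameters of that particle's group.
        return star_data["particles"].merge(
            star_data["optics"], on="rlnOpticsGroup", how="left"
        )
    raise TypeError("expected a DataFrame or a dict of DataFrames")
```

A left join is used so that every particle row survives even if its optics group were somehow missing. In that case the optics columns would come back as NaN rather than silently dropping the particle.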
ReLion-formatted
For an example of a supported ReLion-formatted file, see the
relion_style_particles.star file in the test/data directory
of the CryoLike github repository.
This format is used specifically for “indexed” metadata files, in which a single Starfile describes a large number of particle images that may be stored in other files across the filesystem.
We expect ReLion-formatted Starfiles to define optics and particles
sections that can be joined on the rlnOpticsGroup column. Once this join
is done, we read all fields from the result. (Note that any leading
rln or _rln is stripped from the field names.)
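The prefix stripping mentioned above can be sketched with a single regular expression. The function name is illustrative only. With pandas, it could be applied to every column at once via `df.rename(columns=strip_rln_prefix)`.

```python
import re

# Optional leading underscore followed by "rln", anchored at the start of the name.
_RLN_PREFIX = re.compile(r"^_?rln")


def strip_rln_prefix(name: str) -> str:
    """Return `name` without a leading 'rln' or '_rln' (hypothetical helper)."""
    return _RLN_PREFIX.sub("", name, count=1)
```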
The following fields MUST be defined:
DefocusU
DefocusV
DefocusAngle
SphericalAberration
Voltage
The following fields will be given default values or ignored, if missing:
PhaseShift – defaults to 0 if missing
AmplitudeContrast – defaults to 0.1 if missing
AngleRot – ignored if not present
AngleTilt – ignored if not present
AnglePsi – ignored if not present
CtfBfactor – defaults to 0 if missing
CtfScalefactor – defaults to 1 if missing
ImagePixelSize – size of pixels in the image
Any other field will be ignored.
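The required, defaulted and ignored fields listed above can be summarized in a short sketch. It operates on a plain mapping from (prefix-stripped) column names to per-row values. The helper and constant names are hypothetical. The choice to omit absent angle and pixel-size fields, rather than fill them, is an assumption of this sketch.

```python
# Field rules taken from the lists above; the helper itself is a hypothetical sketch.
REQUIRED_FIELDS = ("DefocusU", "DefocusV", "DefocusAngle",
                   "SphericalAberration", "Voltage")
DEFAULTS = {"PhaseShift": 0.0, "AmplitudeContrast": 0.1,
            "CtfBfactor": 0.0, "CtfScalefactor": 1.0}
# Kept when present; simply absent from the result otherwise (assumption).
OPTIONAL_FIELDS = ("AngleRot", "AngleTilt", "AnglePsi", "ImagePixelSize")


def normalize_fields(columns: dict) -> dict:
    """Apply the required/default/ignored rules to a name -> values mapping."""
    missing = [f for f in REQUIRED_FIELDS if f not in columns]
    if missing:
        raise ValueError(f"missing required field(s): {missing}")
    n_rows = len(columns["DefocusU"])
    result = {f: list(columns[f]) for f in REQUIRED_FIELDS}
    for field, default in DEFAULTS.items():
        # Use the file's values when present, otherwise one default per row.
        result[field] = list(columns[field]) if field in columns else [default] * n_rows
    for field in OPTIONAL_FIELDS:
        if field in columns:
            result[field] = list(columns[field])
    # Every other column is dropped, matching "Any other field will be ignored."
    return result
```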
We expect every row to consist of entries of the form:
000001@filename.mrcs [fields...]
where 000001 (the part before the @ sign) is the 1-based
index of the image within the MRC file, and filename.mrcs (the part after
the @ sign) is the source MRC file. The parameters passed to
the image conversion function determine how the MRC filename is
interpreted: if a folder parameter is passed, we take only the
filename (ignoring any path) and look for the file within the specified
directory; otherwise we keep the path from the Starfile and assume
it is a valid path relative to the working directory
(the directory from which the script was called).
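In standard RELION files this `index@path` string lives in the rlnImageName column. Its parsing and the two path-resolution modes described above can be sketched as follows. `parse_image_name` and its `folder` argument are hypothetical stand-ins for whatever the conversion function actually does. The returned index is converted to 0-based, as most Python array APIs expect.

```python
import os
from typing import Optional, Tuple


def parse_image_name(entry: str, folder: Optional[str] = None) -> Tuple[str, int]:
    """Split an entry like '000001@stack.mrcs' into (path, 0-based index).

    Hypothetical helper: if `folder` is given, only the file's base name is
    kept and looked up inside `folder`; otherwise the path is returned as
    written, so a relative path is resolved against the current working
    directory when the file is later opened.
    """
    index_part, sep, path = entry.strip().partition("@")
    if not sep or not path:
        raise ValueError(f"expected 'index@path', got {entry!r}")
    index = int(index_part) - 1  # Starfile indices are 1-based
    if index < 0:
        raise ValueError(f"image index must be >= 1, got {index_part!r}")
    if folder is not None:
        path = os.path.join(folder, os.path.basename(path))
    return path, index
```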
Non-ReLion-formatted
We also support a more generic Starfile format. The same basic conditions
(must be readable as a Pandas dataframe or dictionary of dataframes with
optics and particles keys) apply. However, instead of parsing each row for
an image location, we simply treat the rows as an ordered list of metadata
values to be applied in sequence to the images from an MRC file.
This is the format expected in the
convert_particle_stacks_from_paired_star_and_mrc_files()
wrapper
(see the image conversion documentation).
The expected fields are the same as for the ReLion case, above, except
that we will ignore the AngleRot, AngleTilt, AnglePsi, and
ImagePixelSize fields in this case, even if they are present.
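The in-sequence pairing for this generic format can be sketched as below. Row i is assigned to image i of the paired MRC stack, and the fields listed above are discarded. The helper name is hypothetical. Rejecting a mismatch between row count and image count is an assumption based on the one-CTF-per-image rule stated at the top of this page.

```python
# Fields this path ignores even when present (from the text above).
IGNORED_IN_PAIRED_MODE = {"AngleRot", "AngleTilt", "AnglePsi", "ImagePixelSize"}


def pair_rows_with_images(rows, n_images):
    """Assign metadata rows to the images of one MRC stack, in order.

    `rows` is a list of dicts (one per Starfile row, prefixes already
    stripped). Hypothetical helper; returns (image_index, metadata) pairs.
    """
    if len(rows) != n_images:
        # One CTF per image: the counts must match for the pairing to be valid.
        raise ValueError(f"{len(rows)} metadata rows for {n_images} images")
    return [
        (i, {k: v for k, v in row.items() if k not in IGNORED_IN_PAIRED_MODE})
        for i, row in enumerate(rows)
    ]
```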
CryoSPARC
CryoSparc files do not require a special library to read; they are assumed to be stored as NumPy array files. We expect the following fields to be defined:
ctf/df1_A as the “DefocusU”
ctf/df2_A as the “DefocusV”
ctf/df_angle_rad as the defocus angle
ctf/cs_mm as the spherical aberration value (note that this is assumed to be the same for all described images)
ctf/accel_kv as the voltage value (assumed consistent for all images)
ctf/amp_contrast as the amplitude contrast (assumed consistent for all images)
ctf/phase_shift_rad as the phase shift value
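Because these files are NumPy arrays, a file such as `particles.cs` can typically be loaded with `np.load(path)`, which yields a structured array whose field names are the strings listed above. The sketch below maps those fields to the names used elsewhere on this page. The helper name is hypothetical. Taking the first record's value for the fields assumed constant is a simplification of this sketch, not confirmed library behavior.

```python
import numpy as np

# Page-level names mapped to the cryoSPARC field names listed above.
CRYOSPARC_CTF_FIELDS = {
    "DefocusU": "ctf/df1_A",
    "DefocusV": "ctf/df2_A",
    "DefocusAngle": "ctf/df_angle_rad",
    "SphericalAberration": "ctf/cs_mm",
    "Voltage": "ctf/accel_kv",
    "AmplitudeContrast": "ctf/amp_contrast",
    "PhaseShift": "ctf/phase_shift_rad",
}
# Assumed identical for every image, so only the first record's value is kept.
SHARED_FIELDS = ("SphericalAberration", "Voltage", "AmplitudeContrast")


def read_cryosparc_ctf(records: np.ndarray) -> dict:
    """Extract CTF parameters from a cryoSPARC structured array (hypothetical helper)."""
    names = records.dtype.names or ()
    missing = [f for f in CRYOSPARC_CTF_FIELDS.values() if f not in names]
    if missing:
        raise ValueError(f"missing cryoSPARC field(s): {missing}")
    ctf = {key: np.asarray(records[field]) for key, field in CRYOSPARC_CTF_FIELDS.items()}
    for key in SHARED_FIELDS:
        ctf[key] = float(ctf[key][0])
    return ctf
```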
With internal index
If the CryoSparc file defines an internal index of particle files, we also look for the following fields:
blob/path: defines the path to the MRC file containing each particle image
blob/idx: states the index, within the MRC file, of the image being described
blob/psize_A: optional. If defined, states the pixel size of the image
Note that all of these fields are read using the same record index. So for an indexed
CryoSparc file, looking at index i, we would expect:
ctf/df1_A[i] to give the defocus U value for that image
ctf/phase_shift_rad[i] to give the phase shift value for that image
blob/path[i] to be the path to the MRC file storing that particle image
blob/idx[i] to be the index within blob/path[i] of that image
etc.
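Since every field shares the record index i, one practical consequence is that records can be grouped by source file, so that each MRC file is opened only once. The following is a hypothetical sketch of that grouping. Decoding `bytes` paths is a defensive assumption, since path fields in these arrays may be stored as bytes rather than `str`.

```python
from collections import defaultdict

import numpy as np


def group_by_source_file(records: np.ndarray) -> dict:
    """Map each MRC path to its (index within file, record index) pairs.

    Hypothetical helper: record i is described by blob/path[i], blob/idx[i],
    ctf/df1_A[i], etc., so record i's CTF values can be looked up later.
    """
    groups = defaultdict(list)
    for i, (path, idx) in enumerate(zip(records["blob/path"], records["blob/idx"])):
        if isinstance(path, bytes):
            path = path.decode()  # paths may be stored as bytes
        groups[path].append((int(idx), i))
    return dict(groups)
```

The optional pixel size could be handled the same way, after first checking `"blob/psize_A" in records.dtype.names`.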
Without internal index
If the internal index fields are not present, we assume that the
records are correctly-ordered descriptors of the images in the
MRC files in the job directory. See the
image conversion documentation
for more details (convert_particle_stacks_from_cryosparc_job_directory()).
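Without an index, record i has to be matched to an image purely by position. A minimal sketch is below. It assumes the MRC files are visited in a known order, given here as a list of (path, image count) pairs, and that records fill each file completely before moving to the next. Both the helper and that ordering are hypothetical. The actual file ordering is determined by `convert_particle_stacks_from_cryosparc_job_directory()`.

```python
from bisect import bisect_right
from itertools import accumulate


def locate_record(i: int, stack_sizes):
    """Map record i to (mrc_path, 0-based index within that file).

    `stack_sizes` is a list of (mrc_path, n_images) pairs in the order the
    files are assumed to be processed (hypothetical ordering).
    """
    ends = list(accumulate(n for _, n in stack_sizes))  # cumulative image counts
    total = ends[-1] if ends else 0
    if not 0 <= i < total:
        raise IndexError(f"record {i} is outside the {total} available images")
    k = bisect_right(ends, i)  # first file whose cumulative count exceeds i
    start = ends[k - 1] if k > 0 else 0
    return stack_sizes[k][0], i - start
```

For example, with stacks of 3 and 2 images, records 0 to 2 land in the first file and records 3 and 4 in the second.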