cryolike.file_conversions

cryolike.file_conversions.make_templates_from_inputs_api

cryolike.file_conversions.make_templates_from_inputs_api.make_templates_from_inputs(list_of_inputs: Sequence[str | ndarray | Tensor], image_parameters_file: str, output_plots: bool = True, folder_output: str = './templates/', verbose: bool = False)

Parse a series of inputs into the internal PyTorch tensor representation, then save them to an output directory.

Parameters:
  • list_of_inputs (list) – List of inputs. Can be paths to pdb files, paths to mrc/mrcs/map files, or numpy arrays or torch tensors.

  • image_parameters_file (str) – Path to a saved image parameters file (ImageDescriptor)

  • output_plots (bool, optional) – Whether to output plots of the parsed Templates. Defaults to True.

  • folder_output (str, optional) – Directory in which to write the generated Template data. Defaults to “./templates/”.

  • verbose (bool, optional) – Whether to provide verbose output. Defaults to False.

Raises:

ValueError – If any inputs have an unrecognized file extension or are neither string nor array type.

Returns:

None. By side effect, all parsed templates will be written to an output directory as PyTorch tensors.
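The accepted input kinds can be illustrated with a small dispatch sketch. This is not the package's implementation: `classify_input`, the extension sets, the case-insensitive matching, and the `shape`-based check for array-like inputs are assumptions made for illustration, based only on the input types and the ValueError condition described above.

```python
from pathlib import Path

# Extensions taken from the parameter description above.
STRUCTURE_EXTENSIONS = {".pdb"}
VOLUME_EXTENSIONS = {".mrc", ".mrcs", ".map"}


def classify_input(item):
    """Return 'pdb', 'volume' or 'array' for one entry of list_of_inputs.

    Hypothetical helper: it only mirrors the documented rules.
    """
    if isinstance(item, str):
        # Assumption: extensions are matched case-insensitively.
        suffix = Path(item).suffix.lower()
        if suffix in STRUCTURE_EXTENSIONS:
            return "pdb"
        if suffix in VOLUME_EXTENSIONS:
            return "volume"
        raise ValueError(f"Unrecognized file extension: {item!r}")
    # Assumption: numpy arrays and torch tensors both expose a `shape` attribute.
    if hasattr(item, "shape"):
        return "array"
    raise ValueError(f"Input is neither a path nor an array: {type(item).__name__}")


kinds = [classify_input(p) for p in ["model.pdb", "density.mrc", "density.map"]]
print(kinds)
```

Checking inputs this way before a long run may surface a mistyped path early, instead of partway through template generation.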

cryolike.file_conversions.particle_stacks_wrappers

cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_cryosparc_job_directory(params_input: str | ImageDescriptor, folder_cryosparc: str = '', job_number: int = 0, folder_output: str = '', batch_size: int = 1024, n_stacks_max: int = -1, pixel_size: float = -1.0, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', skip_exist: bool = False, flag_plots: bool = True)

Transcodes a set of (previously restacked) MRC files into internal representation, with optional downsampling.

Each source image file will be centered and normalized, and persisted in both physical/Cartesian and Fourier representations.

This function assumes metadata is stored in a cryosparc format, and the previously restacked images are stored in a single job directory as sequentially numbered files whose names match the pattern ‘batch_000000_restacked.mrc’ (for non-downsampled files) or ‘batch_000000_downsample.mrc’ (for previously downsampled files).

Parameters:
  • params_input (str | ImageDescriptor) – Parameters of the intended output. Mainly used to describe the polar grid for the Fourier representation and define output precision.

  • folder_cryosparc (str, optional) – Parent folder of the cryosparc output jobs. Assumes directory structure is ‘folder_cryosparc/Jx/’ where x is the job number, with no padding. Defaults to the current directory.

  • job_number (int, optional) – Job number of the cryosparc output. Used to form the job output directory. The cryosparc metadata file describing the job is assumed to follow the format ‘Jx_passthrough_particles.cs’, where x is the job number (unpadded).

  • folder_output (str, optional) – Directory to use for outputting transcoded image files and metadata. Defaults to the current directory.

  • batch_size (int, optional) – Maximum number of images to include in each output file. Defaults to 1024.

  • n_stacks_max (int, optional) – Maximum number of stacks to emit before aborting the transcoding process. If -1 (the default), the entire source data will be processed.

  • pixel_size (float, optional) – Side length of each pixel, in Angstroms. Pixels are assumed square. Defaults to -1.0.

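  • downsample_factor (int, optional) – Downsampling factor to apply. Downsampling is skipped if the factor is 1 or below. Defaults to 1 (no downsampling).

  • downsample_type (Literal['mean'] | Literal['max'], optional) – The type of downsampling to use in physical space. Defaults to 'mean'.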

  • skip_exist (bool, optional) – If True, we will skip processing of files that appear to have already been processed. This is currently not implemented.

  • flag_plots (bool, optional) – If True (the default), the function will output images and power spectrum along with the transcoding results.
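To make the downsample_factor and downsample_type options concrete, the following pure-Python sketch pools non-overlapping factor x factor blocks by mean or by max. It reflects an assumption about what 'mean' and 'max' downsampling mean in physical space, not the package's actual code; `downsample` is a hypothetical helper, and dropping incomplete trailing blocks is likewise an assumption.

```python
from statistics import fmean


def downsample(image, factor=1, kind="mean"):
    """Pool non-overlapping factor x factor blocks of a 2D list of floats."""
    if factor <= 1:
        # Mirrors the documented rule: a factor of 1 or below skips downsampling.
        return [list(row) for row in image]
    if kind not in ("mean", "max"):
        raise ValueError(f"Unknown downsample type: {kind!r}")
    reduce_block = fmean if kind == "mean" else max
    # Assumption: rows/columns that do not fill a whole block are dropped.
    n_rows = len(image) // factor
    n_cols = len(image[0]) // factor
    out = []
    for r in range(n_rows):
        row = []
        for c in range(n_cols):
            block = [
                image[r * factor + i][c * factor + j]
                for i in range(factor)
                for j in range(factor)
            ]
            row.append(reduce_block(block))
        out.append(row)
    return out


image = [
    [1.0, 2.0, 3.0, 4.0],
    [5.0, 6.0, 7.0, 8.0],
    [9.0, 10.0, 11.0, 12.0],
    [13.0, 14.0, 15.0, 16.0],
]
mean_result = downsample(image, factor=2, kind="mean")
max_result = downsample(image, factor=2, kind="max")
print(mean_result)
print(max_result)
```

Note that pooling by a factor of f makes each output pixel cover f times the original side length, so the effective pixel size grows accordingly.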

cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_indexed_cryosparc_file(params_input: str | ImageDescriptor, file_cs: str, folder_cryosparc: str = '', folder_output: str = '', batch_size: int = 1024, n_stacks_max: int = -1, pixel_size: float | ndarray[tuple[int, ...], dtype[floating]] | None = None, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', skip_exist: bool = False, flag_plots: bool = True)

Transcodes a set of MRC files, with a cryosparc metadata file, into internal representation, with optional downsampling.

Each source image file will be centered and normalized, and persisted in both physical/Cartesian and Fourier representations.

The cryosparc metadata file is expected to contain fields identifying the MRC files to be processed (blob/path and blob/idx). Transcoded data will be batched into outputs of a specified maximum image count.

Parameters:
  • params_input (str | ImageDescriptor) – Parameters of the intended output. Mainly used to describe the polar grid for the Fourier representation and define output precision.

  • file_cs (str) – Path to cryosparc metadata file.

  • folder_cryosparc (str, optional) – Parent folder of the cryosparc output jobs. Assumes directory structure is ‘folder_cryosparc/Jx/’ where x is the job number, with no padding. Defaults to the current directory.

  • folder_output (str, optional) – Directory to use for outputting transcoded image files and metadata. Defaults to the current directory.

  • batch_size (int, optional) – Maximum number of images to include in each output file. Defaults to 1024.

  • n_stacks_max (int, optional) – Maximum number of stacks to emit before aborting the transcoding process. If -1 (the default), the entire source data will be processed.

  • pixel_size (float | FloatArrayType | None) – Side length of each pixel, in Angstroms. If a scalar, this will be treated as the side length of square pixels. If a Numpy array, the first two elements will be taken as the x and y dimensions. If None (the default), we will attempt to read the value from the cryosparc file (blob/psize_A field).

  • downsample_factor (int, optional) – Downsampling factor to apply. Downsampling is skipped if the factor is 1 or below. Defaults to 1 (no downsampling).

  • downsample_type (Literal['mean'] | Literal['max']) – The type of downsampling to use in physical space

  • skip_exist (bool, optional) – If True, we will skip processing of files that appear to have already been processed. This is currently not implemented.

  • flag_plots (bool, optional) – If True (the default), the function will output images and power spectrum along with the transcoding results.
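The three accepted forms of pixel_size (scalar, array, or None) can be summarized in a short sketch. `resolve_pixel_size` and its `metadata_value` argument are hypothetical names used for illustration; the real function reads the fallback value from the blob/psize_A field of the cryosparc file, and how it handles arrays with fewer than two elements is not documented here, so raising in that case is an assumption.

```python
def resolve_pixel_size(pixel_size, metadata_value=None):
    """Return the pixel size as an (x, y) tuple in Angstroms.

    Hypothetical helper mirroring the documented rules only.
    """
    if pixel_size is None:
        # Fall back to the value stored in the metadata file, if any.
        if metadata_value is None:
            raise ValueError("No pixel size given and none found in the metadata")
        pixel_size = metadata_value
    if isinstance(pixel_size, (int, float)):
        # A scalar is the side length of a square pixel.
        return (float(pixel_size), float(pixel_size))
    if len(pixel_size) < 2:
        # Assumption: a one-element array is treated as an error.
        raise ValueError("Array pixel sizes need at least two elements")
    # Only the first two elements are used, as x and y.
    return (float(pixel_size[0]), float(pixel_size[1]))


print(resolve_pixel_size(1.5))
print(resolve_pixel_size([1.2, 1.3, 9.9]))
print(resolve_pixel_size(None, metadata_value=0.8))
```

Because the helper only relies on `len` and indexing, it accepts lists, tuples, and Numpy arrays alike.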

cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_indexed_star_file(params_input: str | ImageDescriptor, star_file: str, folder_mrc: str = '', folder_output: str = '', batch_size: int = 1024, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', pixel_size: float | ndarray[tuple[int, ...], dtype[floating]] | None = None, flag_plots: bool = True)

Transcode a set of particle files, with metadata described in starfile format, to consistent batches in a specified output folder.

Each source image file will be centered and normalized, and persisted in both physical/Cartesian and Fourier representations.

Parameters:
  • params_input (str | ImageDescriptor) – Parameters of the intended output. Mainly used to describe the polar grid for the Fourier-space representation and define the precision of our output.

  • star_file (str) – Path to the star file to process.

  • folder_mrc (str) – Folder containing the MRC files. If set to ‘’, use relative path stated in the star file.

  • folder_output (str, optional) – Folder in which to output transcoding results. Defaults to ‘’, i.e. outputting in the current working directory.

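  • batch_size (int, optional) – Maximum number of images per output file. Defaults to 1024.

  • downsample_factor (int, optional) – Downsampling factor to apply. Downsampling is skipped if the factor is 1 or below. Defaults to 1 (no downsampling).

  • downsample_type (Literal['mean'] | Literal['max'], optional) – The type of downsampling to use in physical space. Defaults to 'mean'.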

  • pixel_size (float | FloatArrayType | None, optional) – Size of each pixel, in Angstroms, as a square side-length or Numpy array of (x, y). If unset (the default), it will be read from the MRC particle files.

  • flag_plots (bool, optional) – Whether to plot images and power spectrum along with the transcoding results. Defaults to True.

cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_paired_star_and_mrc_files(params_input: str | ImageDescriptor, particle_file_list: list[str], star_file_list: list[str], folder_output: str = '', batch_size: int = 1024, pixel_size: float | ndarray[tuple[int, ...], dtype[floating]] | None = None, defocus_angle_is_degree: bool = True, phase_shift_is_degree: bool = True, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', skip_exist: bool = False, flag_plots: bool = True, overwrite: bool = False)

Transcode a set of particle files, with metadata described in starfile format, to consistent batches in a specified output folder.

Each source image file will be centered and normalized, and persisted in both physical/Cartesian and Fourier representations.

Parameters:
  • params_input (str | ImageDescriptor) – Parameters of the intended output. Mainly used to describe the polar grid for the Fourier-space representation and define the precision of our output.

  • particle_file_list (list[str]) – List of particle files to process.

  • star_file_list (list[str]) – List of star files to process. Should be of the same length and in the same order as the particle files.

  • folder_output (str, optional) – Folder in which to output transcoding results. Defaults to ‘’, i.e. outputting in the current working directory.

  • batch_size (int, optional) – Maximum number of images per output file. Defaults to 1024.

  • pixel_size (float | FloatArrayType | None, optional) – Size of each pixel, in Angstroms, as a square side-length or Numpy array of (x, y). If unset (the default), it will be read from the MRC particle files.

  • defocus_angle_is_degree (bool, optional) – Whether the defocus angle in the metadata is in degrees (as opposed to radians). Defaults to True (for star files).

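  • phase_shift_is_degree (bool, optional) – Whether the phase shift angle in the metadata is in degrees (as opposed to radians). Defaults to True (for star files).

  • downsample_factor (int, optional) – Downsampling factor to apply. Downsampling is skipped if the factor is 1 or below. Defaults to 1 (no downsampling).

  • downsample_type (Literal['mean'] | Literal['max'], optional) – The type of downsampling to use in physical space. Defaults to 'mean'.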

  • skip_exist (bool, optional) – If True, we will skip processing on files that appear to have already been processed. This is currently not implemented, to avoid inadvertently dropping data. Defaults to False.

  • flag_plots (bool, optional) – Whether to plot images and power spectrum along with the transcoding results. Defaults to True.

  • overwrite (bool, optional) – Whether to overwrite existing stacks. Defaults to False.
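A call to this wrapper might look like the sketch below. The keyword arguments come from the signature above; `check_pairs` and `convert_paired` are hypothetical helpers, and the length check is a precaution added for illustration (whether the library itself validates list lengths is not stated here). The import sits inside the function so that the helper can be defined without cryolike installed.

```python
def check_pairs(particle_file_list, star_file_list):
    """Pair each particle file with its star file, in order."""
    if len(particle_file_list) != len(star_file_list):
        raise ValueError(
            f"Got {len(particle_file_list)} particle files "
            f"but {len(star_file_list)} star files"
        )
    return list(zip(particle_file_list, star_file_list))


def convert_paired(params_input, particle_file_list, star_file_list, folder_output):
    """Validate the inputs, then hand them to the cryolike wrapper."""
    # Lazy import: only needed when the conversion is actually run.
    from cryolike.file_conversions.particle_stacks_wrappers import (
        convert_particle_stacks_from_paired_star_and_mrc_files,
    )

    check_pairs(particle_file_list, star_file_list)
    convert_particle_stacks_from_paired_star_and_mrc_files(
        params_input=params_input,
        particle_file_list=particle_file_list,
        star_file_list=star_file_list,
        folder_output=folder_output,
        batch_size=1024,
        defocus_angle_is_degree=True,  # star files usually store degrees
        phase_shift_is_degree=True,
        flag_plots=False,
    )


pairs = check_pairs(["a.mrcs", "b.mrcs"], ["a.star", "b.star"])
print(pairs)
```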

cryolike.file_conversions.particle_stacks_converter

class cryolike.file_conversions.particle_stacks_converter.ParticleStackConverter(image_descriptor: str | ImageDescriptor, folder_output: str = '', n_stacks_max: int = -1, pixel_size: float | ndarray[tuple[int, ...], dtype[floating]] | None = None, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', skip_exist: bool = False, overwrite: bool = False, flag_plots: bool = True, device: str | device = 'cpu')

Bases: object

Object that manages converting images in Starfile or Cryosparc format to the internal format used by this package.

inputs_buffer

Internal deque storing partially processed data sources

Type:

deque[DataSource]

img_desc

ImageDescriptor expected to validly describe all images to be converted

Type:

ImageDescriptor

lens_desc

LensDescriptor that describes the experimental apparatus for all images to be converted. For Starfile sources, this will be reset with every new source file.

Type:

LensDescriptor

images_buffer

Buffer of images processed from mrc files. Used in restacking.

Type:

ImgBuffer

lens_desc_buffer

Buffer of per-image lens descriptor properties (defocus and phase shift). Used in restacking.

Type:

LensDescriptorBuffer

_must_flush_buffer

Internal tracking. If set, will completely empty the image buffer before processing a new file (the default behavior for starfile sources).

Type:

bool

_stack_start_file

Internal tracking. This represents the overall image number at which we began outputting the current batch. Used for sequential cryosparc outputs.

Type:

int

i_stacks

Total number of stacks output by the converter

Type:

int

_stack_absolute_index

Internal tracking. Used for Starfile data sources, which may be split into multiple files, to record the range of images in the source file which are output in the current stack. So if we have 150 images in the input Starfile and are outputting batch sizes of 100, we will emit two files; the first will have an absolute index of 0 and the second will have an absolute index of 100.

Type:

int

device

Device to use for converting MRC image files. Defaults to CPU.

Type:

torch.device

max_stacks

If set to a value greater than 0, the converter will stop processing once this many stacks have been emitted

Type:

int

pixel_size

Pixel size describing the physical images being processed. For indexed Cryosparc files, this data may be present in the source file. If so, the configured value must match the one in the source file, or an error will be generated.

Type:

FloatArrayType | None

downsample_factor

If set, downsample by this factor

Type:

int

downsample_type

The type of downsampling to use in physical space

Type:

Literal[‘mean’] | Literal[‘max’]

skip_exist

Not implemented. Once implemented, if set, this will cause the converter to attempt to skip files that appear to have already been processed.

Type:

bool

output_plots

If True, we will emit plots of the processed images

Type:

bool

max_imgs_to_plot

Sets the maximum number of images to plot; has no effect if output_plots is False.

Type:

int

convert_stacks(batch_size: int = 1024, never_combine_input_files: bool = False)

After preprocessing is complete, this function actually does the image conversion and outputs regular-sized batches. For Starfile inputs, each input file will result in one or more output stacks; for Cryosparc files, image inputs will be buffered and restacked, with only the final stack being smaller than the requested batch size.

If desired, the Starfile behavior (ensure each source file gets its own stack or stacks) can be emulated for Cryosparc files.

Parameters:
  • batch_size (int, optional) – Target stack size. Defaults to 1024.

  • never_combine_input_files (bool, optional) – If set, Cryosparc source files will be restacked in the same way as Starfile sources, i.e. one source file will generate one or more output stacks, but no output stack will contain images from multiple source files. Defaults to False.
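The difference between the two restacking modes can be seen in a small planning sketch. It only computes stack sizes from image counts and does not touch any image data; `plan_stacks` and `absolute_indices` are hypothetical helpers written for illustration, not part of the converter.

```python
def plan_stacks(images_per_file, batch_size=1024, never_combine_input_files=False):
    """Return the number of images in each output stack."""
    if batch_size < 1:
        raise ValueError("batch_size must be positive")
    if never_combine_input_files:
        # Each source file is chunked on its own (the Starfile behavior).
        groups = list(images_per_file)
    else:
        # Images from all files are buffered together before chunking.
        groups = [sum(images_per_file)]
    stacks = []
    for n_images in groups:
        for start in range(0, n_images, batch_size):
            stacks.append(min(batch_size, n_images - start))
    return stacks


def absolute_indices(n_images, batch_size):
    """Index in the source file of the first image of each emitted stack."""
    return list(range(0, n_images, batch_size))


combined = plan_stacks([150, 150], batch_size=100)
separate = plan_stacks([150, 150], batch_size=100, never_combine_input_files=True)
print(combined)
print(separate)
print(absolute_indices(150, 100))
```

With two 150-image sources and a batch size of 100, combining yields three full stacks, while keeping files separate yields a full and a partial stack per file.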

prepare_indexed_file(src_file: str, filetype: Literal['cryosparc'] | Literal['starfile'], mrc_folder: str = '', ignore_manual_pixel_size: bool = False)

Preprocesses an indexed Starfile or Cryosparc file for conversion.

Parameters:
  • src_file (str) – Path to index to process

  • filetype (Literal['cryosparc'] | Literal['starfile']) – Type of input file

  • mrc_folder (str, optional) – Folder where MRC files can be found. If left as an empty string (‘’), assume that the source index file contains a correct relative path to the MRC files. Defaults to ‘’.

  • ignore_manual_pixel_size (bool, optional) – If True, will attempt to resolve conflicts between the pixel size in the metadata file and one that’s manually input by the caller. This may not be a good idea. Defaults to False.

prepare_sequential_cryosparc(folder_cryosparc: str, job_number: int = 0)

Preprocesses a set of sequential files with a Cryosparc descriptor. The expected directory structure is as follows. Assume:

  • the parent folder is FOLDER

  • the job number is 15

Then we expect the following paths to exist:

  • FOLDER/J15/J15_passthrough_particles.cs

  • An MRC file folder. This should be one of:

    • FOLDER/J15/restack

    • FOLDER/J15/downsample

    If both exist, the restack folder will be used.

  • MRC files in this folder should follow the naming convention:

    • batch_0_restacked.mrc, batch_1_restacked.mrc, … for restack or

    • batch_000000_downsample.mrc, batch_000001_downsample.mrc, … for downsample

MRC files will be processed sequentially until the first missing number in the sequence.

Parameters:
  • folder_cryosparc (str) – Path to directory on the filesystem where files are located

  • job_number (int, optional) – The number of the job, used to build out the expected directory structure for the source MRC files. Defaults to 0.
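The directory layout and the stop-at-first-gap rule described for this method can be sketched with pathlib. `locate_sequential_inputs` is a hypothetical helper, not the converter's own code; the file-name patterns are copied from the list above as written, and the demo builds a throwaway directory tree purely for illustration.

```python
import tempfile
from pathlib import Path


def locate_sequential_inputs(folder_cryosparc, job_number=0):
    """Return (metadata_path, mrc_paths) for a job laid out as described above."""
    job_dir = Path(folder_cryosparc) / f"J{job_number}"
    metadata = job_dir / f"J{job_number}_passthrough_particles.cs"

    # The restack folder wins when both folders exist.
    if (job_dir / "restack").is_dir():
        mrc_dir, pattern = job_dir / "restack", "batch_{}_restacked.mrc"
    elif (job_dir / "downsample").is_dir():
        mrc_dir, pattern = job_dir / "downsample", "batch_{:06d}_downsample.mrc"
    else:
        raise FileNotFoundError(f"No restack or downsample folder in {job_dir}")

    # Walk the sequence until the first missing number.
    files = []
    index = 0
    while (mrc_dir / pattern.format(index)).is_file():
        files.append(mrc_dir / pattern.format(index))
        index += 1
    return metadata, files


with tempfile.TemporaryDirectory() as tmp:
    job = Path(tmp) / "J15"
    (job / "restack").mkdir(parents=True)
    (job / "downsample").mkdir()
    for i in (0, 1, 3):  # index 2 is missing, so the scan stops after index 1
        (job / "restack" / f"batch_{i}_restacked.mrc").touch()
    metadata, files = locate_sequential_inputs(tmp, job_number=15)
    found_names = [f.name for f in files]
    metadata_name = metadata.name

print(metadata_name)
print(found_names)
```

A gap in the numbering silently ends the scan, so a check like this can help confirm that all expected batches will be picked up.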

prepare_star_files(particle_file_list: list[str], star_file_list: list[str], defocus_angle_is_degree: bool = True, phase_shift_is_degree: bool = True)

Preprocesses image and starfiles so they are ready for conversion.

Parameters:
  • particle_file_list (list[str]) – List of filesystem paths pointing to MRC files containing image records

  • star_file_list (list[str]) – List of filesystem paths pointing to Starfile descriptors for the image records

  • defocus_angle_is_degree (bool, optional) – If True, the defocus angle values in the starfiles are presumed to be in degrees and will be converted to radians. Defaults to True.

  • phase_shift_is_degree (bool, optional) – If True, the phase shift values in the starfiles are presumed to be in degrees and will be converted to radians. Defaults to True.
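Putting the class together, a conversion from paired star and MRC files would plausibly follow a prepare-then-convert sequence like the sketch below. The constructor and method arguments are taken from the signatures documented above; `convert_star_inputs` itself and all paths passed to it are hypothetical, and the import is placed inside the function so the sketch can be defined without cryolike installed.

```python
def convert_star_inputs(image_params_path, particle_files, star_files, folder_output):
    """Run a Starfile conversion and return the number of stacks written."""
    # Lazy import: cryolike is only required when the function is called.
    from cryolike.file_conversions.particle_stacks_converter import (
        ParticleStackConverter,
    )

    converter = ParticleStackConverter(
        image_descriptor=image_params_path,
        folder_output=folder_output,
        downsample_factor=1,  # 1 means no downsampling
        flag_plots=False,
        device="cpu",
    )
    # Step 1: register the sources; angles in star files are given in degrees.
    converter.prepare_star_files(
        particle_file_list=particle_files,
        star_file_list=star_files,
        defocus_angle_is_degree=True,
        phase_shift_is_degree=True,
    )
    # Step 2: convert and write the output stacks.
    converter.convert_stacks(batch_size=1024)
    return converter.i_stacks
```

The wrapper functions in particle_stacks_wrappers expose similar parameters, so using the class directly is mainly of interest when finer control over the preparation and conversion steps is needed.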