cryolike.file_conversions
cryolike.file_conversions.make_templates_from_inputs_api
- cryolike.file_conversions.make_templates_from_inputs_api.make_templates_from_inputs(list_of_inputs: Sequence[str | ndarray | Tensor], image_parameters_file: str, output_plots: bool = True, folder_output: str = './templates/', verbose: bool = False)
Parse a series of inputs into an internal PyTorch tensor representation, then save the results to an output directory.
- Parameters:
list_of_inputs (list) – List of inputs. Can be paths to pdb files, paths to mrc/mrcs/map files, or numpy arrays or torch tensors.
image_parameters_file (str) – Path to a saved image parameters file (ImageDescriptor)
output_plots (bool, optional) – Whether to output plots of the parsed Templates. Defaults to True.
folder_output (str, optional) – Directory in which to write the generated Template data. Defaults to “./templates/”.
verbose (bool, optional) – Whether to provide verbose output. Defaults to False.
- Raises:
ValueError – If any inputs have an unrecognized file extension or are neither string nor array type.
- Returns:
None. As a side effect, all parsed templates are written to the output directory as PyTorch tensors.
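The input handling described above (paths dispatched by extension, arrays accepted by type, everything else rejected with a ValueError) can be sketched as follows. This is a minimal illustration and not the library's implementation: `classify_input` and its category names are hypothetical, matching extensions case-insensitively is an assumption, and array-likes are detected through a `shape` attribute so the sketch runs without NumPy or PyTorch installed.

```python
from pathlib import Path

# Extension groups taken from the parameter description above.
ATOMIC_MODEL_EXTENSIONS = {".pdb"}
DENSITY_MAP_EXTENSIONS = {".mrc", ".mrcs", ".map"}


def classify_input(item) -> str:
    """Return a coarse category for one template input, or raise ValueError."""
    if isinstance(item, str):
        # Assumption: extensions are compared case-insensitively.
        ext = Path(item).suffix.lower()
        if ext in ATOMIC_MODEL_EXTENSIONS:
            return "atomic_model"
        if ext in DENSITY_MAP_EXTENSIONS:
            return "density_map"
        raise ValueError(f"Unrecognized file extension: {item!r}")
    # Stand-in for an ndarray/Tensor check: both expose a `shape` attribute.
    if hasattr(item, "shape"):
        return "array"
    raise ValueError(f"Input is neither a path nor an array: {type(item).__name__}")
```

Validating every entry this way before any parsing starts means a bad input is reported before any output is written.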
cryolike.file_conversions.particle_stacks_wrappers
- cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_cryosparc_job_directory(params_input: str | ImageDescriptor, folder_cryosparc: str = '', job_number: int = 0, folder_output: str = '', batch_size: int = 1024, n_stacks_max: int = -1, pixel_size: float = -1.0, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', skip_exist: bool = False, flag_plots: bool = True)
Transcodes a set of (previously restacked) MRC files into internal representation, with optional downsampling.
Each source image file will be centered and normalized, and persisted in both physical/Cartesian and Fourier representations.
This function assumes metadata is stored in a cryosparc format, and the previously restacked images are stored in a single job directory as sequentially numbered files whose names match the pattern ‘batch_000000_restacked.mrc’ (for non-downsampled files) or ‘batch_000000_downsample.mrc’ (for previously downsampled files).
- Parameters:
params_input (str | ImageDescriptor) – Parameters of the intended output. Mainly used to describe the polar grid for the Fourier representation and define output precision.
folder_cryosparc (str, optional) – Parent folder of the cryosparc output jobs. Assumes directory structure is ‘folder_cryosparc/Jx/’ where x is the job number, with no padding. Defaults to the current directory.
job_number (int, optional) – Job number of the cryosparc output. Used to form the job output directory. The cryosparc metadata file describing the job is assumed to follow the format ‘Jx_passthrough_particles.cs’, where x is the job number (unpadded).
folder_output (str, optional) – Directory to use for outputting transcoded image files and metadata. Defaults to the current directory.
batch_size (int, optional) – Maximum number of images to include in each output file. Defaults to 1024.
n_stacks_max (int, optional) – Maximum number of stacks to emit before aborting the transcoding process. If -1 (the default), the entire source data will be processed.
pixel_size (float) – Side length of each pixel, in Angstroms. Pixels are assumed square.
downsample_factor (int, optional) – Downsampling factor to apply. Downsampling is skipped if the factor is 1 or below. Defaults to 1 (no downsampling).
skip_exist (bool, optional) – If True, we will skip processing of files that appear to have already been processed. This is currently not implemented.
flag_plots (bool, optional) – If True (the default), the function will output images and power spectrum along with the transcoding results.
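The path conventions stated for folder_cryosparc and job_number can be made concrete with a short sketch. `cryosparc_job_paths` is a hypothetical helper written for illustration; only the ‘Jx’ directory name and the ‘Jx_passthrough_particles.cs’ file name come from the descriptions above.

```python
from pathlib import Path


def cryosparc_job_paths(folder_cryosparc: str, job_number: int) -> tuple[Path, Path]:
    """Return (job_directory, metadata_file) for an unpadded job number."""
    job_name = f"J{job_number}"  # unpadded: job 15 -> "J15"
    job_dir = Path(folder_cryosparc) / job_name
    metadata_file = job_dir / f"{job_name}_passthrough_particles.cs"
    return job_dir, metadata_file


job_dir, metadata_file = cryosparc_job_paths("FOLDER", 15)
print(job_dir.as_posix())        # FOLDER/J15
print(metadata_file.as_posix())  # FOLDER/J15/J15_passthrough_particles.cs
```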
- cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_indexed_cryosparc_file(params_input: str | ImageDescriptor, file_cs: str, folder_cryosparc: str = '', folder_output: str = '', batch_size: int = 1024, n_stacks_max: int = -1, pixel_size: float | ndarray[tuple[int, ...], dtype[floating]] | None = None, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', skip_exist: bool = False, flag_plots: bool = True)
Transcodes a set of MRC files, with a cryosparc metadata file, into internal representation, with optional downsampling.
Each source image file will be centered and normalized, and persisted in both physical/Cartesian and Fourier representations.
The cryosparc metadata file is expected to contain fields identifying the MRC files to be processed (blob/path and blob/idx). Transcoded data will be batched into outputs of a specified maximum image count.
- Parameters:
params_input (str | ImageDescriptor) – Parameters of the intended output. Mainly used to describe the polar grid for the Fourier representation and define output precision.
file_cs (str) – Path to the cryosparc metadata file.
folder_cryosparc (str, optional) – Parent folder of the cryosparc output jobs. Assumes directory structure is ‘folder_cryosparc/Jx/’ where x is the job number, with no padding. Defaults to the current directory.
folder_output (str, optional) – Directory to use for outputting transcoded image files and metadata. Defaults to the current directory.
batch_size (int, optional) – Maximum number of images to include in each output file. Defaults to 1024.
n_stacks_max (int, optional) – Maximum number of stacks to emit before aborting the transcoding process. If -1 (the default), the entire source data will be processed.
pixel_size (float | FloatArrayType | None) – Side length of each pixel, in Angstroms. If a scalar, this will be treated as the side length of square pixels. If a Numpy array, the first two elements will be taken as the x and y dimensions. If None (the default), we will attempt to read the value from the cryosparc file (blob/psize_A field).
downsample_factor (int, optional) – Downsampling factor to apply. Downsampling is skipped if the factor is 1 or below. Defaults to 1 (no downsampling).
downsample_type (Literal['mean'] | Literal['max']) – The type of downsampling to use in physical space
skip_exist (bool, optional) – If True, we will skip processing of files that appear to have already been processed. This is currently not implemented.
flag_plots (bool, optional) – If True (the default), the function will output images and power spectrum along with the transcoding results.
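To illustrate the difference between the two downsample_type values, the sketch below pools non-overlapping blocks of a small 2D image by mean or by max. It is one plausible reading of physical-space downsampling, not the library's code: the `downsample` helper is hypothetical, it works on plain nested lists, and it assumes trailing rows or columns that do not fill a complete block are dropped.

```python
from statistics import fmean


def downsample(image: list[list[float]], factor: int, kind: str = "mean") -> list[list[float]]:
    """Pool non-overlapping factor x factor blocks of a 2D image."""
    if factor <= 1:
        return [row[:] for row in image]  # factor of 1 or below: no downsampling
    if kind not in ("mean", "max"):
        raise ValueError(f"Unknown downsample type: {kind!r}")
    reduce_block = fmean if kind == "mean" else max
    n_rows = len(image) // factor  # assumption: incomplete blocks are dropped
    n_cols = len(image[0]) // factor
    out = []
    for r in range(n_rows):
        out_row = []
        for c in range(n_cols):
            block = [
                image[r * factor + dr][c * factor + dc]
                for dr in range(factor)
                for dc in range(factor)
            ]
            out_row.append(reduce_block(block))
        out.append(out_row)
    return out


image = [
    [1.0, 2.0, 5.0, 6.0],
    [3.0, 4.0, 7.0, 8.0],
]
print(downsample(image, 2, "mean"))  # [[2.5, 6.5]]
print(downsample(image, 2, "max"))   # [[4.0, 8.0]]
```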
- cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_indexed_star_file(params_input: str | ImageDescriptor, star_file: str, folder_mrc: str = '', folder_output: str = '', batch_size: int = 1024, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', pixel_size: float | ndarray[tuple[int, ...], dtype[floating]] | None = None, flag_plots: bool = True)
Transcode a set of particle files, with metadata described in starfile format, to consistent batches in a specified output folder.
Each source image file will be centered and normalized, and persisted in both physical/Cartesian and Fourier representations.
- Parameters:
params_input (str | ImageDescriptor) – Parameters of the intended output. Mainly used to describe the polar grid for the Fourier-space representation and define the precision of our output.
star_file (str) – Path to the star file to process.
folder_mrc (str) – Folder containing the MRC files. If set to ‘’, use relative path stated in the star file.
folder_output (str, optional) – Folder in which to output transcoding results. Defaults to ‘’, i.e. outputting in the current working directory.
batch_size (int, optional) – Maximum number of images per output file. Defaults to 1024.
pixel_size (float | FloatArrayType | None, optional) – Size of each pixel, in Angstroms, as a square side-length or Numpy array of (x, y). If unset (the default), it will be read from the MRC particle files.
flag_plots (bool, optional) – Whether to plot images and power spectrum along with the transcoding results. Defaults to True.
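The accepted forms of pixel_size (a scalar side length, an array whose first two elements are x and y, or None to fall back to a value read from the source files) can be sketched as a small normalization step. `normalize_pixel_size` and its `from_file` argument are hypothetical names used only for this illustration.

```python
from collections.abc import Sequence
from typing import Optional, Union

PixelSize = Union[float, Sequence[float], None]


def normalize_pixel_size(
    pixel_size: PixelSize,
    from_file: Optional[tuple[float, float]] = None,
) -> tuple[float, float]:
    """Return the pixel size as an (x, y) pair in Angstroms."""
    if pixel_size is None:
        # Fall back to whatever was read from the source file, if anything.
        if from_file is None:
            raise ValueError("No pixel size given and none available from the source file")
        return from_file
    if isinstance(pixel_size, (int, float)):
        # A scalar is the side length of a square pixel.
        return (float(pixel_size), float(pixel_size))
    values = list(pixel_size)
    if len(values) < 2:
        raise ValueError("Array-like pixel size needs at least two elements")
    # Only the first two elements are used, as x and y.
    return (float(values[0]), float(values[1]))
```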
- cryolike.file_conversions.particle_stacks_wrappers.convert_particle_stacks_from_paired_star_and_mrc_files(params_input: str | ImageDescriptor, particle_file_list: list[str], star_file_list: list[str], folder_output: str = '', batch_size: int = 1024, pixel_size: float | ndarray[tuple[int, ...], dtype[floating]] | None = None, defocus_angle_is_degree: bool = True, phase_shift_is_degree: bool = True, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', skip_exist: bool = False, flag_plots: bool = True, overwrite: bool = False)
Transcode a set of particle files, with metadata described in starfile format, to consistent batches in a specified output folder.
Each source image file will be centered and normalized, and persisted in both physical/Cartesian and Fourier representations.
- Parameters:
params_input (str | ImageDescriptor) – Parameters of the intended output. Mainly used to describe the polar grid for the Fourier-space representation and define the precision of our output.
particle_file_list (list[str]) – List of particle files to process.
star_file_list (list[str]) – List of star files to process. Should be of the same length and in the same order as the particle files.
folder_output (str, optional) – Folder in which to output transcoding results. Defaults to ‘’, i.e. outputting in the current working directory.
batch_size (int, optional) – Maximum number of images per output file. Defaults to 1024.
pixel_size (float | FloatArrayType | None, optional) – Size of each pixel, in Angstroms, as a square side-length or Numpy array of (x, y). If unset (the default), it will be read from the MRC particle files.
defocus_angle_is_degree (bool, optional) – Whether the defocus angle in the metadata is in degrees (as opposed to radians). Defaults to True (for star files).
phase_shift_is_degree (bool, optional) – Whether the phase shift angle in the metadata is in degrees (as opposed to radians). Defaults to True (for star files).
skip_exist (bool, optional) – If True, we will skip processing on files that appear to have already been processed. This is currently not implemented, to avoid inadvertently dropping data. Defaults to False.
flag_plots (bool, optional) – Whether to plot images and power spectrum along with the transcoding results. Defaults to True.
overwrite (bool, optional) – Whether to overwrite existing stacks. Defaults to False.
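Because particle_file_list and star_file_list are matched by position, a caller may want to check the pairing before starting a long conversion. The sketch below shows one way to do that; `pair_sources` is a hypothetical helper, and whether the library performs an equivalent check internally is not stated here.

```python
def pair_sources(particle_file_list: list[str], star_file_list: list[str]) -> list[tuple[str, str]]:
    """Pair each particle file with the star file at the same position."""
    if len(particle_file_list) != len(star_file_list):
        raise ValueError(
            f"Got {len(particle_file_list)} particle files "
            f"but {len(star_file_list)} star files"
        )
    return list(zip(particle_file_list, star_file_list))


pairs = pair_sources(["a.mrcs", "b.mrcs"], ["a.star", "b.star"])
print(pairs)  # [('a.mrcs', 'a.star'), ('b.mrcs', 'b.star')]
```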
cryolike.file_conversions.particle_stacks_converter
- class cryolike.file_conversions.particle_stacks_converter.ParticleStackConverter(image_descriptor: str | ImageDescriptor, folder_output: str = '', n_stacks_max: int = -1, pixel_size: float | ndarray[tuple[int, ...], dtype[floating]] | None = None, downsample_factor: int = 1, downsample_type: Literal['mean'] | Literal['max'] = 'mean', skip_exist: bool = False, overwrite: bool = False, flag_plots: bool = True, device: str | device = 'cpu')
Bases: object
Object that manages converting images in Starfile or Cryosparc format to the internal format used by this package.
- inputs_buffer
Internal deque storing partially processed data sources
- Type:
deque[DataSource]
- img_desc
ImageDescriptor expected to validly describe all images to be converted
- Type:
- lens_desc
LensDescriptor that describes the experimental apparatus for all images to be converted. For Starfile sources, this will be reset with every new source file.
- Type:
- images_buffer
Buffer of images processed from mrc files. Used in restacking.
- Type:
ImgBuffer
- lens_desc_buffer
Buffer of per-image lens descriptor properties (defocus and phase shift). Used in restacking.
- Type:
LensDescriptorBuffer
- _must_flush_buffer
Internal tracking. If set, will completely empty the image buffer before processing a new file (the default behavior for starfile sources).
- Type:
- _stack_start_file
Internal tracking. This represents the overall image number at which we began outputting the current batch. Used for sequential cryosparc outputs.
- Type:
- _stack_absolute_index
Internal tracking. Used for Starfile data sources, which may be split into multiple files, to record the range of images in the source file which are output in the current stack. So if we have 150 images in the input Starfile and are outputting batch sizes of 100, we will emit two files; the first will have an absolute index of 0 and the second will have an absolute index of 100.
- Type:
- device
Device to use for converting MRC image files. Defaults to CPU.
- Type:
torch.device
- max_stacks
If set to a value greater than 0, the converter will stop processing once this many stacks have been emitted
- Type:
- pixel_size
Pixel size describing the physical images being processed. For indexed Cryosparc files, this data may be present in the source file. If so, the configured value must match the one in the source file, or an error will be generated.
- Type:
FloatArrayType | None
- downsample_type
The type of downsampling to use in physical space
- Type:
Literal[‘mean’] | Literal[‘max’]
- skip_exist
Not implemented. Once implemented, if set, this will cause the converter to attempt to skip files that appear to have already been processed.
- Type:
- max_imgs_to_plot
Sets the maximum number of images to plot; has no effect if output_plots is False.
- Type:
- convert_stacks(batch_size: int = 1024, never_combine_input_files: bool = False)
After preprocessing is complete, this function actually does the image conversion and outputs regular-sized batches. For Starfile inputs, each input file will result in one or more output stacks; for Cryosparc files, image inputs will be buffered and restacked, with only the final stack being smaller than the requested batch size.
If desired, the Starfile behavior (ensure each source file gets its own stack or stacks) can be emulated for Cryosparc files.
- Parameters:
batch_size (int, optional) – Target stack size. Defaults to 1024.
never_combine_input_files (bool, optional) – If set, Cryosparc source files will be restacked in the same way as Starfile sources, i.e. one source file will generate one or more output stacks, but no output stack will contain images from multiple source files. Defaults to False.
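The two restacking behaviors can be compared with a small planning sketch that computes only the sizes of the output stacks. `plan_stacks` is a hypothetical function written for illustration; it models the buffering described above and does not touch any image data.

```python
def plan_stacks(
    images_per_file: list[int],
    batch_size: int = 1024,
    never_combine_input_files: bool = False,
) -> list[int]:
    """Return the number of images in each output stack, in emission order."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    stacks = []
    buffered = 0
    for n_images in images_per_file:
        buffered += n_images
        # Emit full stacks as soon as enough images are buffered.
        while buffered >= batch_size:
            stacks.append(batch_size)
            buffered -= batch_size
        # Per-file mode: flush the remainder so no stack mixes source files.
        if never_combine_input_files and buffered > 0:
            stacks.append(buffered)
            buffered = 0
    if buffered > 0:
        stacks.append(buffered)  # only the final stack may be short
    return stacks


print(plan_stacks([150, 150], batch_size=100))                                  # [100, 100, 100]
print(plan_stacks([150, 150], batch_size=100, never_combine_input_files=True))  # [100, 50, 100, 50]
```

In per-file mode, a single 150-image source with a batch size of 100 yields stacks that begin at images 0 and 100 of that source, which matches the example given for _stack_absolute_index.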
- prepare_indexed_file(src_file: str, filetype: Literal['cryosparc'] | Literal['starfile'], mrc_folder: str = '', ignore_manual_pixel_size: bool = False)
Preprocesses an indexed Starfile or Cryosparc file for conversion.
- Parameters:
src_file (str) – Path to the index file to process
filetype (Literal['cryosparc'] | Literal['starfile']) – Type of input file
mrc_folder (str, optional) – Folder where MRC files can be found. If left as an empty string (‘’), assume that the source index file contains a correct relative path to the MRC files. Defaults to ‘’.
ignore_manual_pixel_size (bool, optional) – If True, will attempt to resolve conflicts between the pixel size in the metadata file and one that’s manually input by the caller. This may not be a good idea. Defaults to False.
- prepare_sequential_cryosparc(folder_cryosparc: str, job_number: int = 0)
Preprocesses a set of sequential files with a Cryosparc descriptor. The expected directory structure is as follows. Assume:
  - the parent folder is FOLDER
  - the job number is 15
Then we expect the following paths to exist:
  - FOLDER/J15/J15_passthrough_particles.cs
  - An MRC file folder. This should be one of:
      - FOLDER/J15/restack
      - FOLDER/J15/downsample
    If both exist, the restack folder will be used.
MRC files in this folder should follow the naming convention:
  - batch_0_restacked.mrc, batch_1_restacked.mrc, … for restack or
  - batch_000000_downsample.mrc, batch_000001_downsample.mrc, … for downsample
MRC files will be processed sequentially until the first missing number in the sequence.
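The discovery rule described above (prefer the restack folder, then collect files in index order until the first gap) can be sketched as follows. `find_sequential_mrc_files` and `DEFAULT_TEMPLATES` are hypothetical. The six-digit zero padding used for the restack pattern is an assumption, since this reference shows both padded and unpadded restack file names; pass different templates if your files are named otherwise.

```python
from pathlib import Path

# Assumed filename templates per subfolder; the padding for "restack" is a guess.
DEFAULT_TEMPLATES = {
    "restack": "batch_{index:06d}_restacked.mrc",
    "downsample": "batch_{index:06d}_downsample.mrc",
}


def find_sequential_mrc_files(job_dir, templates=None) -> list[Path]:
    """List MRC files in index order, stopping at the first missing index."""
    templates = templates or DEFAULT_TEMPLATES
    job_dir = Path(job_dir)
    # Dict order encodes the preference: "restack" is checked before "downsample".
    for subfolder, template in templates.items():
        mrc_dir = job_dir / subfolder
        if mrc_dir.is_dir():
            break
    else:
        raise FileNotFoundError(f"No MRC folder found under {job_dir}")
    files = []
    index = 0
    while (candidate := mrc_dir / template.format(index=index)).is_file():
        files.append(candidate)
        index += 1
    return files
```

With files numbered 0, 1, and 3 present, this returns only the first two, because index 2 is the first gap.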
- prepare_star_files(particle_file_list: list[str], star_file_list: list[str], defocus_angle_is_degree: bool = True, phase_shift_is_degree: bool = True)
Preprocesses image and starfiles so they are ready for conversion.
- Parameters:
particle_file_list (list[str]) – List of filesystem paths pointing to MRC files containing image records
star_file_list (list[str]) – List of filesystem paths pointing to Starfile descriptors for the image records
defocus_angle_is_degree (bool, optional) – If True, the defocus angle values in the starfiles are presumed to be in degrees and will be converted to radians. Defaults to True.
phase_shift_is_degree (bool, optional) – If True, the phase shift values in the starfiles are presumed to be in degrees and will be converted to radians. Defaults to True.
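The effect of the two *_is_degree flags can be illustrated with a minimal conversion sketch. `to_radians` is a hypothetical helper; it only shows the unit handling the flags describe and says nothing about how the library stores the values.

```python
import math


def to_radians(values: list[float], is_degree: bool = True) -> list[float]:
    """Convert angles to radians when they are given in degrees."""
    if is_degree:
        return [math.radians(v) for v in values]
    # Already in radians: pass through unchanged (as floats).
    return [float(v) for v in values]


defocus_angles = to_radians([0.0, 90.0, 180.0], is_degree=True)
phase_shifts = to_radians([0.5, 1.0], is_degree=False)
```

Setting a flag incorrectly would change every angle by a factor of roughly 57.3, so it is worth checking the units used in the metadata.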