Image-to-Structure Likelihood Computation
The main output of CryoLike is the likelihood between each input image
and the Templates created from 3D structures.
Overview
At its heart, CryoLike offers a way to compute the likelihood of an observed 2D image given a particular 3D structure. As described in the Mathematical Framework, likelihood computations are based on comparing a stack of images with a template set using the cross-correlation.
Template sets are projections of a single 3D structure into image space from multiple viewing angles. Images are compared against these templates at a number of different rotations and displacements, and the results can be returned with several different means of aggregation.
Note:
The Templates and Images stacks are unlikely to fit fully in GPU memory all at once, so CryoLike splits the comparison into batches. To reduce memory transfer overhead, we use Templates as the outer set of objects to loop over and Images as the inner set. We may provide more customization options for this in the future.
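The loop order described in this note can be sketched as follows. This is a minimal illustration and not CryoLike's actual iterator: the helper names batch_ranges and iterate_batches are hypothetical, and only the parameter names n_templates_per_batch and n_images_per_batch come from the interface described on this page.

```python
from typing import Iterator, Tuple

Range = Tuple[int, int]


def batch_ranges(n_items: int, batch_size: int) -> Iterator[Range]:
    """Yield half-open (start, end) index ranges that cover n_items."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    for start in range(0, n_items, batch_size):
        yield start, min(start + batch_size, n_items)


def iterate_batches(
    n_templates: int,
    n_images: int,
    n_templates_per_batch: int,
    n_images_per_batch: int,
) -> Iterator[Tuple[Range, Range]]:
    """Pair every template batch with every image batch, templates outermost."""
    for template_range in batch_ranges(n_templates, n_templates_per_batch):
        # Assumption: a template batch would be transferred to the GPU once
        # here and reused for every image batch in the inner loop.
        for image_range in batch_ranges(n_images, n_images_per_batch):
            yield template_range, image_range
```

With 5 templates in batches of 2 and 4 images in batches of 3, each of the three template batches is visited once and paired with both image batches before the next template batch is loaded.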
Main outputs
The primary outputs of CryoLike are the best cross-correlation for each image across every template set (each corresponding to a 3D structure), and the integrated likelihood for each image with respect to each 3D structure.
Interface
The run_likelihood module provides two wrapper functions that
serve as a convenient interface to
the underlying iterator and aggregator functions found in
cryolike.likelihoods. One wrapper returns the optimal pose
for each image
(cryolike.run_likelihood.run_likelihood_optimal_pose()),
and the other returns the full unaggregated cross-correlation likelihood,
indexed by image, template, displacement, and inplane rotation
(cryolike.run_likelihood.run_likelihood_full_cross_correlation()).
For a worked example of these wrapper functions in action, see the run likelihood example.
Both wrapper functions take the following parameters:
A configured file manager that handles fetching input files and writing output files to standard locations on the file system
A set of image descriptor parameters, in on-disk or in-memory form (params_input)
A callback function that applies the appropriate displacement-search grid to every batch of templates
The index of the template file to process (i_template)
The number of image stacks to process (n_stacks)
Whether to skip processing when the output files appear to exist already (skip_exist)
Number of templates and images to use per batch, and whether to attempt to determine those values automatically (n_templates_per_batch, n_images_per_batch, discover_batch_size)
The file manager is provided by the
cryolike.run_likelihood.configure_likelihood_files() function, and
the displacer is provided by the
cryolike.run_likelihood.configure_displacement() function. See the
run likelihood example for example uses, and
the File and Directory Structure documentation for more details about
expected file locations.
Input system
We compute likelihood by matching images against templates.
We expect the templates to be located under the directory
specified by folder_templates and the images to be located
under the directory specified by folder_particles as passed to the
configure_likelihood_files() function. Specifically:
There must be a “template file list”, folder_templates/template_file_list.npy, in the folder_templates directory, which lists the available template stacks
The i_template parameter determines which of the template files in the template file list will be used
Templates themselves can be placed anywhere, provided the template file list has paths to them
Image stacks should be in folder_particles/fft/particles_fourier_stack_NUMBER.pt
NUMBER here is a six-digit 0-padded increment starting from 0
Every image file should have a correspondingly-named metadata file with an .npz extension
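The image-stack naming rule above can be sketched as a small helper that lists the files expected for a given number of stacks. The function name expected_stack_files is hypothetical, and the sketch assumes that each metadata file sits next to its image stack and differs only in the extension.

```python
from pathlib import PurePosixPath
from typing import List, Tuple


def expected_stack_files(folder_particles: str, n_stacks: int) -> List[Tuple[str, str]]:
    """Return (image stack, metadata) path pairs for stacks 0 .. n_stacks - 1."""
    fft_dir = PurePosixPath(folder_particles) / "fft"
    pairs = []
    for i_stack in range(n_stacks):
        # NUMBER is a six-digit, zero-padded counter starting from 0.
        stack = fft_dir / f"particles_fourier_stack_{i_stack:06d}.pt"
        # Assumption: the metadata file shares the stem and directory.
        metadata = stack.with_suffix(".npz")
        pairs.append((str(stack), str(metadata)))
    return pairs
```

A list like this can be checked against the file system before starting a long run, so that a missing stack or metadata file is caught early.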
It is anticipated that users may wish to run these comparisons in parallel,
especially when a cluster environment is available; hence the need for
the i_template parameter.
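One way to realize this on a cluster is to let each job derive its own template index. The sketch below assumes a Slurm job array, which sets the SLURM_ARRAY_TASK_ID environment variable for each task; the helper name template_index_from_env is hypothetical, and other schedulers expose different variables.

```python
import os
from typing import Mapping


def template_index_from_env(
    environ: Mapping[str, str] = os.environ,
    variable: str = "SLURM_ARRAY_TASK_ID",
    default: int = 0,
) -> int:
    """Read the template index for this job from an environment variable."""
    raw = environ.get(variable)
    if raw is None:
        # Not running inside a job array: fall back to a single template.
        return default
    index = int(raw)
    if index < 0:
        raise ValueError(f"{variable} must be non-negative, got {index}")
    return index
```

The returned value would then be passed as i_template to one of the wrapper functions, so that each array task processes a different entry of the template file list.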
Displacement handling
The user specifies the displacement values to check using the
n_displacements_x, n_displacements_y, and
max_displacement_pixels parameters to the
cryolike.run_likelihood.configure_displacement() function,
which provides a callback that should be passed to the run_likelihood
wrapper.
To compute the available displacements, the
max_displacement_pixels is first
converted to Angstrom using the pixel size associated with
the image/template grids. The
resulting max_displacement is treated as a potential
displacement in either direction,
creating a total displacement length of 2 * max_displacement in
both dimensions.
This distance is then
divided linearly into n_displacements_x and n_displacements_y
steps, resulting in
a grid of displacement positions to test during cross-correlation
computation.
The set of displacements tested will be preserved in
folder_output/displacements_set.pt.
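The grid construction described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the code behind configure_displacement(): the function name displacement_grid and the pixel_size argument are hypothetical, and the sketch assumes that both endpoints (-max_displacement and +max_displacement) are included in the grid, which this page does not specify.

```python
from itertools import product
from typing import List, Tuple


def displacement_grid(
    max_displacement_pixels: float,
    pixel_size: float,
    n_displacements_x: int,
    n_displacements_y: int,
) -> List[Tuple[float, float]]:
    """Return (x, y) displacements in Angstrom on a regular grid."""
    # Convert the maximum displacement from pixels to Angstrom.
    max_displacement = max_displacement_pixels * pixel_size

    def axis(n_steps: int) -> List[float]:
        if n_steps < 1:
            raise ValueError("need at least one displacement per axis")
        if n_steps == 1:
            return [0.0]
        # Assumption: the span of 2 * max_displacement is divided evenly,
        # with both endpoints included.
        step = 2.0 * max_displacement / (n_steps - 1)
        return [-max_displacement + i * step for i in range(n_steps)]

    return list(product(axis(n_displacements_x), axis(n_displacements_y)))
```

For example, max_displacement_pixels=4 with a pixel size of 1.5 Angstrom gives a maximum displacement of 6 Angstrom; with 5 steps in x and 3 in y, the sketch produces 15 positions spanning -6 to 6 Angstrom on both axes.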
Possible outputs
CryoLike can return the computed values at the following levels of
aggregation. Note that the run_likelihood wrappers currently
only support computing optimal pose or providing the fully
unaggregated data, but other aggregation types are available in the
cryolike.likelihoods.interface module (just swap out the
compute_optimal_pose call for one of the other functions).
Output paths
The wrapper functions write computed likelihoods to disk for later review. The exact files written depend on which wrapper function is called.
The root output directory is specified by the folder_output parameter.
Within that directory, the following paths will be used. Note that the
directories will be created if they do not exist.
In the case of a name collision between an output file and an existing
file, the existing file will be overwritten unless the skip_exist
parameter is set and the complete set of output files is present.
For the following examples, assume folder_output is set to
OUT. N is the template number (the
value of i_template), NOT zero-padded.
STACK is the 6-digit 0-padded number, starting from 0, of the stack being
processed.
In all cases:
The actual set of displacement values used will be written to
OUT/displacements_set.pt
run_likelihood_optimal_pose() will write the 5 Tensors described under Optimal pose outputs to individual files:
OUT/templateN/cross_correlation/cross_correlation_stack_STACK.pt
OUT/templateN/optimal_pose/optimal_template_stack_STACK.pt
OUT/templateN/optimal_pose/optimal_displacement_x_stack_STACK.pt
OUT/templateN/optimal_pose/optimal_displacement_y_stack_STACK.pt
OUT/templateN/optimal_pose/optimal_inplane_rotation_stack_STACK.pt
run_likelihood_full_cross_correlation() will, by contrast, write only a single file per image stack:
OUT/templateN/cross_correlation/cross_correlation_pose_msdw_stack_STACK.pt
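The naming scheme above can be reproduced with a short helper, which is handy when locating results after a run. The function names below are hypothetical and are not part of CryoLike; should_skip mirrors the skip_exist rule stated earlier, with the existence check passed in as a callable so the sketch stays independent of the file system.

```python
from pathlib import PurePosixPath
from typing import Callable, List


def optimal_pose_output_paths(folder_output: str, i_template: int, i_stack: int) -> List[str]:
    """Paths of the five files written for one template and one image stack."""
    # The template number is NOT zero-padded; the stack number is.
    template_dir = PurePosixPath(folder_output) / f"template{i_template}"
    stack = f"{i_stack:06d}"
    pose_dir = template_dir / "optimal_pose"
    return [
        str(template_dir / "cross_correlation" / f"cross_correlation_stack_{stack}.pt"),
        str(pose_dir / f"optimal_template_stack_{stack}.pt"),
        str(pose_dir / f"optimal_displacement_x_stack_{stack}.pt"),
        str(pose_dir / f"optimal_displacement_y_stack_{stack}.pt"),
        str(pose_dir / f"optimal_inplane_rotation_stack_{stack}.pt"),
    ]


def full_cross_correlation_output_path(folder_output: str, i_template: int, i_stack: int) -> str:
    """Path of the single file written by the full cross-correlation wrapper."""
    template_dir = PurePosixPath(folder_output) / f"template{i_template}"
    name = f"cross_correlation_pose_msdw_stack_{i_stack:06d}.pt"
    return str(template_dir / "cross_correlation" / name)


def should_skip(paths: List[str], skip_exist: bool, exists: Callable[[str], bool]) -> bool:
    """Skip only if skip_exist is set and every expected output is present."""
    return skip_exist and all(exists(path) for path in paths)
```

In practice, os.path.exists can be passed as the exists argument.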
Integrated Log-Likelihood
The integrated likelihood is calculated by comparing each image to each template in the Fourier-Bessel representation using the cross-correlation as described in the Mathematical Framework.
Cross-correlation
Optimal pose outputs
This will return 5 1-dimensional Tensors, indexed by the image sequence index:
Best cross-correlation value for each image (cross_correlation_M). As described in the Mathematical Framework, CryoLike calculates the cross-correlation between each image and each template. This tensor reports the numeric value of the best match achieved.
The template (by sequence number) of the best match (optimal_template_M), i.e. the template that produced the number in the corresponding index of cross_correlation_M
The optimal x-displacement matching this image with the best-fitting template (optimal_displacement_x_M)
The optimal y-displacement matching this image with the best-fitting template (optimal_displacement_y_M)
The optimal inplane rotation matching this image with the best-fitting template (optimal_inplane_rotation_M)
Example:
Consider the values at index i, which correspond to the image at index i in the
input Images stack. Then:
cross_correlation_M[i] is the best alignment score
optimal_template_M[i] is the index of the template that produced that score
optimal_displacement_x_M[i] and optimal_displacement_y_M[i] are the displacements resulting in that alignment score
optimal_inplane_rotation_M[i] is the rotation resulting in that alignment score
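To make the relationship between the five outputs concrete, the following sketch derives them from a full cross-correlation array indexed by image, template, displacement, and inplane rotation. It is an illustration only and not CryoLike's compute_optimal_pose: nested Python lists stand in for Tensors so the example has no dependencies, and the function name optimal_pose and its arguments are hypothetical.

```python
from typing import Dict, List, Sequence, Tuple


def optimal_pose(
    cross_correlation: Sequence,                  # indexed [image][template][displacement][rotation]
    displacements: Sequence[Tuple[float, float]],  # (x, y) for each displacement index
    rotations: Sequence[float],                    # inplane rotation for each rotation index
) -> Dict[str, List]:
    """Reduce a full cross-correlation array to the best pose per image."""
    out: Dict[str, List] = {
        "cross_correlation_M": [],
        "optimal_template_M": [],
        "optimal_displacement_x_M": [],
        "optimal_displacement_y_M": [],
        "optimal_inplane_rotation_M": [],
    }
    for per_image in cross_correlation:
        # Search every (template, displacement, rotation) combination
        # for the highest cross-correlation value of this image.
        best_value, best_s, best_d, best_w = max(
            (value, s, d, w)
            for s, per_template in enumerate(per_image)
            for d, per_displacement in enumerate(per_template)
            for w, value in enumerate(per_displacement)
        )
        x, y = displacements[best_d]
        out["cross_correlation_M"].append(best_value)
        out["optimal_template_M"].append(best_s)
        out["optimal_displacement_x_M"].append(x)
        out["optimal_displacement_y_M"].append(y)
        out["optimal_inplane_rotation_M"].append(rotations[best_w])
    return out
```

Every list in the result has one entry per image, so entry i of each list describes the same best match for image i. In this sketch, exact ties are broken in favor of the larger index tuple, which is an arbitrary choice.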
Base Comparator
The underlying code that computes likelihood is found in the
compute_cross_correlation function. For further information, see
cryolike.likelihoods.kernels.