crantpy.utils.cave.segmentation module#
This module provides utility functions for CAVE-related segmentation operations.
Ported and adapted from fafbseg-py (navis-org/fafbseg-py) for use with CRANT datasets.
Available Functions#
- Utilities:
_check_bounds_coverage: Check if bounds are within CloudVolume coverage (internal)
- Core Segmentation:
roots_to_supervoxels: Get supervoxels making up given neurons
supervoxels_to_roots: Get root IDs for given supervoxels
update_ids: Update root IDs to their latest versions
- Location-based Queries:
locs_to_supervoxels: Get supervoxel IDs at given locations
locs_to_segments: Get root IDs at given locations
snap_to_id: Snap locations to the correct segmentation ID
- Neuron Analysis:
neuron_to_segments: Get root IDs overlapping with a neuron
get_lineage_graph: Get lineage graph showing edit history
- Voxel Operations:
get_voxels: Fetch voxels making up a given root ID
get_segmentation_cutout: Fetch cutout of segmentation
- Temporal Analysis:
find_common_time: Find time when root IDs co-existed
Note: is_valid_root, is_valid_supervoxel, and is_latest_roots are in helpers.py
Coordinate System Notes#
Understanding coordinate systems is critical when working with segmentation data:
- CAVE API (ChunkedGraph/Materialization):
Uses nanometers for all spatial coordinates
Base resolution: [8, 8, 42] nm/voxel (X, Y, Z)
Full dataset coverage
- CloudVolume API:
Works in voxel space at the current MIP (scale) level
IMPORTANT: vol.bounds returns voxel coordinates, NOT nanometers
- Resolution varies by MIP level:
MIP 0: [16, 16, 42] nm/voxel (missing CAVE’s 8nm base layer!)
MIP 1: [32, 32, 42] nm/voxel
Limited spatial coverage: Only ~360 x 344 x 257 µm region
Does NOT cover the full CAVE dataset
- Coordinate Conversions:
nm → voxels: divide by vol.scale["resolution"]
voxels → nm: multiply by vol.scale["resolution"]
Always use CAVE base resolution for voxel coordinate inputs
CloudVolume resolution changes with MIP level
- Best Practices:
Always provide explicit bounds for voxel operations
Use small regions (< 10 µm) for get_voxels queries
For full neurons, use get_l2_skeleton or get_mesh instead
Test bounds with get_segmentation_cutout before large queries
When in doubt, use nanometers (the CAVE standard)
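The conversions above can be sketched with plain NumPy. This is a minimal illustration, assuming the fixed resolutions quoted in these notes; in real code, read vol.scale["resolution"] from your CloudVolume at the current MIP level instead of hard-coding it:

```python
import numpy as np

# Resolutions quoted in the notes above (assumed fixed here for illustration):
CAVE_RES = np.array([8, 8, 42])       # CAVE base resolution, nm/voxel
CV_MIP0_RES = np.array([16, 16, 42])  # CloudVolume MIP 0, nm/voxel

def nm_to_voxels(coords_nm, resolution):
    """Convert nanometer coordinates to voxel coordinates at `resolution`."""
    return np.asarray(coords_nm) / np.asarray(resolution)

def voxels_to_nm(coords_vx, resolution):
    """Convert voxel coordinates back to nanometers."""
    return np.asarray(coords_vx) * np.asarray(resolution)

# The same physical point lands on different voxel coordinates depending on
# which resolution you divide by -- always check which API you are targeting:
point_nm = np.array([1065048, 444920, 138138])
print(nm_to_voxels(point_nm, CAVE_RES))     # CAVE voxel space
print(nm_to_voxels(point_nm, CV_MIP0_RES))  # CloudVolume MIP 0 voxel space
```

Round-tripping through voxels_to_nm recovers the original nanometer coordinates, which makes these helpers a cheap sanity check before issuing large queries.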
- crantpy.utils.cave.segmentation.find_common_time(root_ids, progress=True, *, dataset=None)[source]#
Find a time at which given root IDs co-existed.
- Parameters:
root_ids (list of int) – Root IDs for which to find a common timestamp.
progress (bool) – If True, show progress bar.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
A timestamp when all root IDs existed simultaneously.
- Return type:
datetime.datetime
Examples
>>> from crantpy.utils.cave.segmentation import find_common_time
>>> common_time = find_common_time([123456789, 987654321])
- crantpy.utils.cave.segmentation.get_lineage_graph(x, progress=True, *, dataset=None)[source]#
Get lineage graph for given neuron.
- Parameters:
x (int) – A single root ID.
progress (bool) – If True, show progress bar.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
The lineage graph showing the history of edits for this root ID.
- Return type:
networkx.DiGraph
Examples
>>> from crantpy.utils.cave.segmentation import get_lineage_graph
>>> G = get_lineage_graph(720575940621039145)
>>> len(G.nodes())
150
- crantpy.utils.cave.segmentation.get_segmentation_cutout(bbox, root_ids=True, mip=0, coordinates='nm', *, dataset=None)[source]#
Fetch cutout of segmentation.
- Parameters:
bbox (array-like) – Bounding box for the cutout: [[xmin, xmax], [ymin, ymax], [zmin, zmax]]
root_ids (bool) – If True, will return root IDs. If False, will return supervoxel IDs.
mip (int) – Scale at which to fetch the cutout.
coordinates ("voxel" | "nm") – Units your coordinates are in.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
cutout (np.ndarray) – (N, M, P) array of segmentation (root or supervoxel) IDs.
resolution ((3, ) numpy array) – [x, y, z] resolution of voxel in cutout.
nm_offset ((3, ) numpy array) – [x, y, z] offset in nanometers of the cutout with respect to the absolute coordinates.
Examples
>>> from crantpy.utils.cave.segmentation import get_segmentation_cutout
>>> bbox = [[100000, 100100], [50000, 50100], [3000, 3010]]
>>> cutout, resolution, offset = get_segmentation_cutout(bbox)
- crantpy.utils.cave.segmentation.get_voxels(x, mip=0, bounds=None, sv_map=False, thin=False, use_l2_chunks=True, threads=1, progress=True, *, dataset=None)[source]#
Fetch voxels making up a given root ID.
This function has two modes:
1. L2 chunk-based (default): Fetches voxels chunk by chunk using L2 IDs
2. Cutout-based: Downloads entire bounding box and extracts voxels
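The cutout-based mode boils down to a mask-and-collect step. A conceptual sketch (not the library's actual implementation) on a toy array:

```python
import numpy as np

def voxels_from_cutout(cutout: np.ndarray, root_id: int) -> np.ndarray:
    """Return (N, 3) voxel coordinates where `cutout` equals `root_id`."""
    return np.argwhere(cutout == root_id)

# Toy stand-in for a real segmentation cutout:
cutout = np.zeros((4, 4, 2), dtype=np.uint64)
cutout[1:3, 1:3, 0] = 42  # pretend root ID 42 occupies a 2x2x1 patch
voxels = voxels_from_cutout(cutout, 42)
print(voxels.shape)  # (4, 3): four voxels, each as [x, y, z] indices
```

The L2 chunk-based mode applies the same idea per chunk, which avoids holding the whole bounding box in memory at once.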
IMPORTANT - CloudVolume Limitations:
CloudVolume has limited spatial coverage (~360 x 344 x 257 µm) and is missing CAVE's highest resolution layer. For most use cases, prefer:
- get_l2_skeleton() for full neuron morphology
- get_mesh() for 3D visualization
- This function only for specific voxel-level analysis in small regions
- Parameters:
x (int) – A single root ID.
mip (int) – Scale at which to fetch voxels. For example, mip=0 is at highest resolution (16x16x42 nm/voxel for CloudVolume). Every subsequent mip halves the resolution. Use higher mip for faster queries: mip=1 is often sufficient.
bounds ((3, 2) or (2, 3) array, optional) – Bounding box to return voxels in (in nanometers). Format: [[xmin, xmax], [ymin, ymax], [zmin, zmax]] REQUIRED in practice - without bounds, will attempt to fetch entire neuron which may exceed CloudVolume coverage. Use small regions (< 10 µm per dimension) for best results.
sv_map (bool) – If True, additionally return a map with the supervoxel ID for each voxel. Useful for detailed connectivity analysis.
thin (bool) – If True, will remove voxels at the interface of adjacent supervoxels that are not supposed to be connected according to the L2 graph. Useful for neurons that self-touch. WARNING: This is computationally expensive!
use_l2_chunks (bool) – If True (default), fetch voxels chunk by chunk using L2 IDs. Faster and more memory efficient for neurons with L2 metadata. If False, download entire bounding box as single cutout.
threads (int) – Number of parallel threads for CloudVolume operations. More threads = faster but more memory usage.
progress (bool) – Whether to show a progress bar or not.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
voxels ((N, 3) np.ndarray) – Voxel coordinates in voxel space according to mip. Each row is [x, y, z] in voxel coordinates.
sv_map ((N, ) np.ndarray) – Supervoxel ID for each voxel. Only if sv_map=True.
- Raises:
ValueError – If bounds exceed CloudVolume coverage or if root ID is invalid.
Examples
>>> from crantpy.utils.cave.segmentation import get_voxels
>>> # RECOMMENDED: Always use explicit bounds within CloudVolume coverage
>>> bounds = [[100000, 105000], [50000, 55000], [3000, 3100]]
>>> voxels = get_voxels(720575940621039145, bounds=bounds, mip=1)
>>> print(f"Retrieved {len(voxels)} voxels")
>>> # Get voxels with supervoxel mapping for detailed analysis
>>> voxels, svids = get_voxels(
...     720575940621039145,
...     bounds=bounds,
...     mip=1,
...     sv_map=True
... )
>>> print(f"Voxels from {len(np.unique(svids))} supervoxels")
>>> # Use cutout method for small regions without L2 metadata
>>> voxels = get_voxels(
...     720575940621039145,
...     bounds=bounds,
...     mip=1,
...     use_l2_chunks=False
... )
Notes
Coordinate System:
- Input bounds are in nanometers (CAVE standard)
- Output voxels are in voxel space at the specified MIP level
- CloudVolume resolution at MIP 0: [16, 16, 42] nm/voxel
- To convert voxels to nm: voxels * resolution
Performance Tips:
- Use mip=1 (32x32x42 nm/voxel) for faster queries when precise resolution isn't critical
- Keep bounds small (< 10 µm per dimension) to avoid timeouts
- Set use_l2_chunks=True (default) for neurons with L2 metadata
- Disable sv_map if you don't need supervoxel IDs (faster)
- Only use thin=True when absolutely necessary (very slow)
Common Issues:
- "Bounds exceed CloudVolume coverage": Your bounds are outside the ~360 µm cube that CloudVolume contains. Try smaller bounds or check if your neuron is within CloudVolume's spatial coverage.
- Slow performance: Reduce bounds size, increase mip level, or use get_l2_skeleton() instead for full neuron morphology.
- Empty result: The neuron may not have voxels in the specified bounds, or bounds may need adjustment.
See also
get_l2_skeleton
Get skeleton representation (better for full neurons)
get_mesh
Get 3D mesh (better for visualization)
get_segmentation_cutout
Get all segmentation in a region
- crantpy.utils.cave.segmentation.locs_to_segments(locs, timestamp=None, coordinates='nm', progress=True, *, dataset=None)[source]#
Retrieve segment (i.e. root) IDs at given location(s).
- Parameters:
locs (array-like | pandas.DataFrame) – Array of x/y/z coordinates. If DataFrame must contain ‘x’, ‘y’, ‘z’ columns.
timestamp (Timestamp, optional) – Get roots at given date (and time). Int must be unix timestamp. String must be ISO 8601 - e.g. ‘2021-11-15’. “mat” will use the timestamp of the most recent materialization.
coordinates ("nm" | "voxel") – Units your coordinates are in.
progress (bool) – If True, show progress bar.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
List of root IDs in the same order as locs.
- Return type:
numpy.array
Examples
>>> from crantpy.utils.cave.segmentation import locs_to_segments
>>> locs = [[133131, 55615, 3289], [132802, 55661, 3289]]
>>> locs_to_segments(locs, dataset='latest')
array([720575940631693610, 720575940631693610])
- crantpy.utils.cave.segmentation.locs_to_supervoxels(locs, mip=0, coordinates='nm', progress=True, *, dataset=None)[source]#
Retrieve supervoxel IDs at given location(s).
- Parameters:
locs (array-like | pandas.DataFrame) – Array of x/y/z coordinates. If DataFrame must contain ‘x’, ‘y’, ‘z’ columns.
mip (int) – Scale to query. Lower mip = more precise but slower; higher mip = faster but less precise. The default is 0 which is the highest resolution.
coordinates ("nm" | "voxel") – Units your coordinates are in: "nm" for nanometers, "voxel" for voxel coordinates.
progress (bool) – If True, show progress bar.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
List of supervoxel IDs in the same order as locs. Invalid locations will be returned with ID 0.
- Return type:
numpy.array
Examples
>>> from crantpy.utils.cave.segmentation import locs_to_supervoxels
>>> locs = [[133131, 55615, 3289], [132802, 55661, 3289]]
>>> locs_to_supervoxels(locs, dataset='latest')
array([79801454835332154, 79731086091150780], dtype=uint64)
- crantpy.utils.cave.segmentation.neuron_to_segments(x, short=False, coordinates='nm', *, dataset=None)[source]#
Get root IDs overlapping with a given neuron.
- Parameters:
x (Neuron/List) – Neurons for which to return root IDs. Neurons must be in the correct coordinate space for the dataset.
short (bool) – If True will only return the top hit for each neuron (including a confidence score).
coordinates ("voxel" | "nm") – Units the neuron(s) are in.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
overlap_matrix (pandas.DataFrame) – DataFrame of root IDs (rows) and neuron IDs (columns) with overlap in nodes as values.
summary (pandas.DataFrame) – If short=True: DataFrame of top hits only.
Examples
>>> from crantpy.utils.cave.segmentation import neuron_to_segments
>>> import navis
>>> # Assuming you have a neuron in the correct space
>>> neuron = navis.TreeNeuron(...)
>>> summary = neuron_to_segments(neuron, short=True)
- crantpy.utils.cave.segmentation.roots_to_supervoxels(neurons, clear_cache=False, progress=True, *, dataset=None)[source]#
Get supervoxels making up given neurons.
- Parameters:
neurons (Neurons = str | int | np.int64 | navis.BaseNeuron | Iterables of previous types | navis.NeuronList | NeuronCriteria) – Neurons or root IDs for which to fetch supervoxels.
clear_cache (bool) – If True, bypasses the cache and fetches a new volume.
progress (bool) – If True, show progress bar.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
A dictionary mapping neuron IDs to lists of supervoxel IDs.
- Return type:
dict
Examples
>>> from crantpy.utils.cave.segmentation import roots_to_supervoxels
>>> roots_to_supervoxels([123456, 789012], dataset='latest')
{123456: [1, 2, 3], 789012: [4, 5, 6]}
- crantpy.utils.cave.segmentation.snap_to_id(locs, id, snap_zero=False, search_radius=160, coordinates='nm', verbose=True, *, dataset=None)[source]#
Snap locations to the correct segmentation ID.
This function is useful for correcting imprecise coordinate annotations (e.g., from manual annotation, image registration, or synapse detection) to ensure they map to the expected neuron/segment.
- How it works:
1. Check the segmentation ID at each location
2. For locations with the wrong ID: search within the radius for the correct ID
3. Snap to the closest voxel with the correct ID
IMPORTANT - CloudVolume Coverage: This function requires CloudVolume segmentation data at the target locations. Locations outside CloudVolume’s spatial coverage (~360 µm cube) cannot be snapped and will be returned as [0, 0, 0].
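The per-location snapping step can be sketched as a nearest-match search inside a local cutout. This is a conceptual illustration on a toy array, not the library's implementation:

```python
import numpy as np

def snap_in_cutout(cutout, loc_vx, target_id):
    """Return the voxel in `cutout` closest to `loc_vx` that carries
    `target_id`, or None if the ID is absent from the cutout."""
    candidates = np.argwhere(cutout == target_id)  # all voxels with the ID
    if len(candidates) == 0:
        return None  # analogous to returning [0, 0, 0] for unsnappable locs
    dists = np.linalg.norm(candidates - np.asarray(loc_vx), axis=1)
    return candidates[np.argmin(dists)]

# Toy cutout: the target ID sits in one corner
cutout = np.zeros((5, 5, 5), dtype=np.uint64)
cutout[4, 4, 4] = 7
print(snap_in_cutout(cutout, [0, 0, 0], 7))  # [4 4 4]
```

In the real function, the cutout is fetched around each mismatching location via the segmentation API, and distances are presumably measured in nanometers (matching search_radius) rather than raw voxel indices.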
- Parameters:
locs ((N, 3) array) – Array of x/y/z coordinates to snap.
id (int) – Expected/target segmentation ID at each location. Typically a root ID of the neuron of interest.
snap_zero (bool) – If False (default), we will not snap locations that map to segment ID 0 (i.e., no segmentation / background). Set to True to attempt snapping even for background locations.
search_radius (int) – Radius [nm] around a location to search for a voxel with the correct ID. Larger radius = more likely to find match but slower. Default 160 nm is usually sufficient for small annotation errors. Increase to 500-1000 nm for larger errors.
coordinates ("voxel" | "nm") – Coordinate system of locs. Default “nm” (nanometers).
verbose (bool) – If True, will print summary of snapping results and any errors encountered.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
Snapped x/y/z locations guaranteed to map to the correct ID (or [0, 0, 0] for locations that couldn’t be snapped).
- Return type:
(N, 3) array
- Raises:
ValueError – If search region exceeds CloudVolume coverage for any location.
Examples
>>> from crantpy.utils.cave.segmentation import snap_to_id
>>> import numpy as np
>>>
>>> # Example: Fix slightly misaligned synapse annotations
>>> synapse_locs = np.array([
...     [100050, 50025, 3005],  # Slightly off target
...     [100150, 50125, 3015],
... ])
>>> target_neuron_id = 720575940621039145
>>>
>>> # Snap to nearest voxel on target neuron
>>> corrected_locs = snap_to_id(
...     synapse_locs,
...     id=target_neuron_id,
...     search_radius=200,  # Search within 200nm
...     verbose=True
... )
>>> # Output: 2 of 2 locations needed to be snapped.
>>> # Of these 0 locations could not be snapped...
>>> # Example: Quality control for traced neuron nodes
>>> import navis
>>> neuron = navis.TreeNeuron(...)  # Your neuron reconstruction
>>> expected_root_id = 720575940621039145
>>>
>>> # Snap all nodes to ensure they're on the correct segment
>>> corrected_nodes = snap_to_id(
...     neuron.nodes[['x', 'y', 'z']].values,
...     id=expected_root_id,
...     search_radius=500,
...     coordinates="nm"
... )
>>>
>>> # Update neuron with corrected coordinates
>>> neuron.nodes[['x', 'y', 'z']] = corrected_nodes
>>> # Example: Handle locations in background (ID 0)
>>> locs_with_background = np.array([
...     [100000, 50000, 3000],   # On neuron
...     [999999, 999999, 9999],  # In background (ID 0)
... ])
>>>
>>> # By default, won't try to snap background locations
>>> snapped = snap_to_id(locs_with_background, id=target_neuron_id)
>>> # Background location will remain unchanged
>>>
>>> # Force snapping even for background (use with caution!)
>>> snapped = snap_to_id(
...     locs_with_background,
...     id=target_neuron_id,
...     snap_zero=True,      # Try to snap background too
...     search_radius=1000   # Larger search needed
... )
Notes
When to use this function:
- Synapse annotation QC: Ensure synapses are on the correct pre-/postsynaptic neurons
- Image registration errors: Fix coordinate misalignment after registration
- Manual annotation cleanup: Correct imprecise manual annotations
- Traced neuron validation: Ensure skeleton nodes are on the correct segment
Performance considerations:
- Each location requiring snapping fetches a small segmentation cutout
- Larger search_radius = slower (more data to fetch and search)
- Locations already on the correct ID are very fast (no cutout needed)
- For many locations, consider parallelizing or batching
Common issues:
- "No voxels found in search region": The target ID doesn't exist within search_radius. Try increasing search_radius or verify the expected ID is correct.
- "Bounds exceed CloudVolume coverage": Location is outside the ~360 µm region covered by CloudVolume. These locations cannot be snapped.
- Many failures: Check if your locations and target ID are in the same coordinate space and if the neuron actually exists at those locations.
See also
locs_to_segments
Check which segment IDs are at given locations
get_segmentation_cutout
Get segmentation in a region
- crantpy.utils.cave.segmentation.supervoxels_to_roots(ids, timestamp='mat', clear_cache=False, batch_size=10000, stop_layer=8, retry=True, progress=True, *, dataset=None)[source]#
Get root(s) for given supervoxel(s).
- Parameters:
ids (IDs = str | int | np.int64 | Iterables of previous types) – Supervoxel ID(s) to find the root(s) for. Also works for e.g. L2 IDs.
timestamp (Timestamp = str | int | np.int64 | datetime | np.datetime64 | pd.Timestamp or str starting with "mat") – Get roots at given date (and time). Int must be unix timestamp. String must be ISO 8601 - e.g. ‘2021-11-15’. “mat” will use the timestamp of the most recent materialization. You can also use e.g. “mat_<version>” to get the root ID at a specific materialization.
clear_cache (bool) – If True, bypasses the cache and fetches a new volume.
batch_size (int) – Max number of supervoxel IDs per query. Reduce batch size if you experience timeouts.
stop_layer (int) – Set e.g. to 2 to get L2 IDs instead of root IDs.
retry (bool) – Whether to retry if a batched query fails.
progress (bool) – If True, show progress bar.
dataset (str) – The dataset to use. If not provided, uses the default dataset.
- Returns:
roots – Roots corresponding to the given supervoxel IDs.
- Return type:
numpy array
Examples
>>> from crantpy.utils.cave.segmentation import supervoxels_to_roots
>>> supervoxels_to_roots([123456, 789012], dataset='latest')
[1, 2]
- crantpy.utils.cave.segmentation.update_ids(x, supervoxels=None, timestamp=None, stop_layer=2, progress=True, dataset=None, use_annotations=True, clear_cache=False)[source]#
Update root IDs to their latest versions.
This function prioritizes using supervoxel IDs from annotations when available, falling back to chunkedgraph methods only when necessary.
- Parameters:
x (Neurons or pd.DataFrame) – Root IDs to update. If DataFrame, must contain ‘root_id’ column and optionally ‘supervoxel_id’ column.
supervoxels (IDs, optional) – Supervoxel IDs corresponding to the root IDs. If provided, these will be used instead of looking up annotations.
timestamp (Timestamp, optional) – Target timestamp. Can be “mat” for latest materialization.
stop_layer (int, default 2) – Stop layer for chunkedgraph operations when supervoxels unavailable.
progress (bool, default True) – Whether to show progress bar.
dataset (str, optional) – Dataset to use.
use_annotations (bool, default True) – Whether to look up supervoxels from annotations when not provided.
clear_cache (bool, default False) – Whether to clear annotation cache.
- Returns:
DataFrame with columns: old_id, new_id, confidence, changed
- Return type:
pd.DataFrame
Examples
>>> from crantpy.utils.cave.segmentation import update_ids
>>> update_ids([123456789, 987654321])
      old_id     new_id  confidence  changed
0  123456789  123456789         1.0    False
1  987654321  999999999        0.85     True
>>> # With supervoxels
>>> update_ids([123456789], supervoxels=[111222333])
      old_id     new_id  confidence  changed
0  123456789  123456789         1.0    False