The baseline (bl) accessor

The baseline accessor provides basic quality control metrics such as the number of cells, the number of genes, and the number of detected genes per cell.

segtraq.bl.baseline.genes_per_cell(sdata, transcript_key='transcripts', cell_key='cell_id', gene_key='feature_name')

Calculates the number of unique genes detected per cell.

Parameters:
  • sdata (object) – An object containing spatial transcriptomics data with a points attribute.

  • transcript_key (str, optional) – The key to access the transcript data within sdata.points (default is “transcripts”).

  • cell_key (str, optional) – The column name in the transcript data representing cell identifiers (default is “cell_id”).

  • gene_key (str, optional) – The column name in the transcript data representing gene names (default is “feature_name”).

Returns:

A DataFrame with one row per cell, containing the cell identifier and the count of unique genes detected in that cell.

Return type:

pandas.DataFrame
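The per-cell count described above amounts to a group-by over the transcript table followed by a distinct count of gene labels. The sketch below shows that aggregation on a toy pandas table that stands in for sdata.points["transcripts"]. The helper genes_per_cell_sketch and the output column name "gene_count" are hypothetical, because the real column name is not documented here. SpatialData stores points as Dask DataFrames, so real data would normally be computed to pandas first or aggregated lazily.

```python
import pandas as pd

# Toy stand-in for sdata.points["transcripts"]; column names follow the
# documented defaults (cell_key="cell_id", gene_key="feature_name").
transcripts = pd.DataFrame(
    {
        "cell_id": [1, 1, 1, 2, 2, 2],
        "feature_name": ["GeneA", "GeneA", "GeneB", "GeneB", "GeneC", "GeneD"],
    }
)


def genes_per_cell_sketch(
    df: pd.DataFrame, cell_key: str = "cell_id", gene_key: str = "feature_name"
) -> pd.DataFrame:
    """Hypothetical helper: count distinct genes per cell (not segtraq's code)."""
    return (
        df.groupby(cell_key)[gene_key]
        .nunique()  # repeated transcripts of the same gene count once
        .reset_index(name="gene_count")  # hypothetical output column name
    )


print(genes_per_cell_sketch(transcripts))
```

Cell 1 has three transcripts but only two distinct genes. This is why the distinct-gene count can be lower than the transcript count.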

segtraq.bl.baseline.morphological_features(sdata, shape_key: str = 'cell_boundaries', id_key: str = 'cell_id', features_to_compute: list = None, n_jobs: int = -1)

Compute morphological features for cell shapes in a spatial transcriptomics dataset.

Parameters:
  • sdata (object) – Spatial data object containing cell shape information. Must have a .shapes attribute with geometries.

  • shape_key (str, optional) – Key in sdata.shapes selecting the shapes element that holds the cell geometries (default is “cell_boundaries”).

  • id_key (str, optional) – Name of the column in the selected shapes element that holds the unique cell identifiers (default is “cell_id”).

  • features_to_compute (list of str, optional) – List of morphological features to compute. If None, all available features are computed. Available features: “cell_area”, “perimeter”, “circularity”, “bbox_width”, “bbox_height”, “extent”, “solidity”, “convexity”, “elongation”, “eccentricity”, “compactness”, “sphericity”.

  • n_jobs (int, optional) – Number of parallel jobs to use for computation. -1 uses all available CPUs (default is -1).

Returns:

features – DataFrame containing the computed morphological features for each cell, indexed by id_key.

Return type:

pandas.DataFrame

Raises:

ValueError – If any requested feature in features_to_compute is not recognized.

Notes

  • Requires geopandas, shapely, numpy, pandas, and joblib.

  • Some features are proxies or approximations (e.g., “sphericity” uses “circularity”).

  • Invalid or null geometries are filtered out before computation.
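Several of the listed features are simple ratios of area, perimeter and bounding-box size. The sketch below computes a subset of them for a single polygon, using a pure-Python shoelace formula in place of shapely. It assumes common textbook definitions: circularity = 4πA/P², and extent = area / bounding-box area. These formulas are assumptions, and segtraq's exact definitions may differ. The helper names are hypothetical.

```python
import math


def shoelace_area(coords):
    """Polygon area via the shoelace formula (ring may be open or closed)."""
    n = len(coords)
    s = 0.0
    for i in range(n):
        x1, y1 = coords[i]
        x2, y2 = coords[(i + 1) % n]
        s += x1 * y2 - x2 * y1
    return abs(s) / 2.0


def ring_perimeter(coords):
    """Sum of edge lengths, including the closing edge back to the start."""
    n = len(coords)
    return sum(math.dist(coords[i], coords[(i + 1) % n]) for i in range(n))


def basic_shape_features(coords):
    """Hypothetical helper: a few morphology features for a non-degenerate polygon."""
    area = shoelace_area(coords)
    perim = ring_perimeter(coords)
    xs = [x for x, _ in coords]
    ys = [y for _, y in coords]
    bbox_w = max(xs) - min(xs)
    bbox_h = max(ys) - min(ys)
    return {
        "cell_area": area,
        "perimeter": perim,
        # Assumed definition: 1.0 for a perfect circle, smaller otherwise.
        "circularity": 4 * math.pi * area / perim**2,
        "bbox_width": bbox_w,
        "bbox_height": bbox_h,
        # Assumed definition: fraction of the bounding box the shape fills.
        "extent": area / (bbox_w * bbox_h),
    }


print(basic_shape_features([(0, 0), (4, 0), (0, 3)]))
```

A 3-4-5 right triangle fills exactly half of its 4 × 3 bounding box, so its extent is 0.5. A square has an extent of 1.0.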

segtraq.bl.baseline.num_cells(sdata: SpatialData, shape_key: str = 'cell_boundaries') → int

Counts the number of cells in the given SpatialData object based on the specified shape key.

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing spatial information and cell boundaries.

  • shape_key (str, optional) – The key in the shapes attribute of sdata that corresponds to cell boundaries. Default is “cell_boundaries”.

Returns:

The number of cells found under the specified shape key.

Return type:

int

segtraq.bl.baseline.num_genes(sdata: SpatialData, transcript_key: str = 'transcripts', gene_key: str = 'feature_name') → int

Counts the number of unique genes in the given SpatialData object.

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing gene information.

  • transcript_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.

  • gene_key (str, optional) – The key to access gene names within the transcript data. Default is “feature_name”.

Returns:

The number of unique genes found in the specified SpatialData object.

Return type:

int

segtraq.bl.baseline.num_transcripts(sdata: SpatialData, transcript_key: str = 'transcripts')

Counts the total number of transcripts in the given SpatialData object.

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing transcript information.

  • transcript_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.

Returns:

The total number of transcripts in the specified SpatialData object.

Return type:

int
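num_genes and num_transcripts both reduce to simple aggregations over the transcript table. One counts distinct gene labels, and the other counts rows, since each row is one detected transcript. The sketch below uses a toy pandas table in place of sdata.points["transcripts"], and the helper names are hypothetical. On the Dask DataFrame that SpatialData uses for points, the distinct count is lazy and would need a .compute() call.

```python
import pandas as pd

# Toy stand-in for the transcript table in sdata.points["transcripts"].
transcripts = pd.DataFrame(
    {"feature_name": ["GeneA", "GeneB", "GeneA", "GeneC", "GeneB"]}
)


def num_genes_sketch(df: pd.DataFrame, gene_key: str = "feature_name") -> int:
    """Hypothetical helper: number of distinct gene labels (NaN is skipped)."""
    return int(df[gene_key].nunique())


def num_transcripts_sketch(df: pd.DataFrame) -> int:
    """Hypothetical helper: one row per detected transcript, so count rows."""
    return len(df)


print(num_genes_sketch(transcripts), num_transcripts_sketch(transcripts))  # prints: 3 5
```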

segtraq.bl.baseline.perc_unassigned_transcripts(sdata: SpatialData, transcript_key: str = 'transcripts', cell_key: str = 'cell_id', unassigned_key: int = -1) → float

Calculates the proportion of unassigned transcripts in a SpatialData object.

Parameters:
  • sdata (sd.SpatialData) – The spatial data object containing transcript information.

  • transcript_key (str, optional) – The key to access transcript data within the spatial data object. Default is “transcripts”.

  • cell_key (str, optional) – The key to access cell assignment information within the transcript data. Default is “cell_id”.

  • unassigned_key (int, optional) – The value indicating an unassigned transcript. Default is -1.

Returns:

The fraction of transcripts that are unassigned.

Return type:

float
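The documented return value is a fraction of all transcripts. It is the share of rows whose cell_key value equals the unassigned_key sentinel. The sketch below computes that share on a toy pandas table, using the documented defaults ("cell_id", -1); the helper name is hypothetical. Because the result is a fraction, multiplying by 100 turns it into a percentage.

```python
import pandas as pd


def unassigned_fraction_sketch(
    df: pd.DataFrame, cell_key: str = "cell_id", unassigned_key: int = -1
) -> float:
    """Hypothetical helper: share of transcripts carrying the unassigned label."""
    # The mean of a boolean mask is the fraction of True entries.
    return float((df[cell_key] == unassigned_key).mean())


# Toy transcript table: 2 of 8 transcripts carry the sentinel value -1.
transcripts = pd.DataFrame({"cell_id": [1, 1, -1, 2, -1, 3, 3, 3]})
frac = unassigned_fraction_sketch(transcripts)
print(frac)        # prints 0.25
print(frac * 100)  # prints 25.0 (as a percentage)
```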

segtraq.bl.baseline.transcript_density(sdata: SpatialData, table_key: str = 'table', transcript_key: str = 'transcripts', cell_key: str = 'cell_id') → DataFrame

Calculates the transcript density for each cell in a SpatialData object. Transcript density is defined as the number of transcripts per unit area for each cell.

Parameters:
  • sdata (sd.SpatialData) – The SpatialData object containing spatial transcriptomics data.

  • table_key (str, optional) – The key to access the AnnData table from sdata.tables. Default is “table”.

  • transcript_key (str, optional) – The key in sdata.points for the transcript data. Default is “transcripts”.

  • cell_key (str, optional) – The key in the table indicating cell identifiers. Default is “cell_id”.

Returns:

A DataFrame with columns [cell_key, “transcript_density”], where “transcript_density” is the number of transcripts per unit area for each cell. Rows with missing values are dropped.

Return type:

pd.DataFrame

Notes

Requires that the input AnnData table contains a “cell_area” column in .obs.
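Transcript density combines two sources: per-cell transcript counts from the points data, and the "cell_area" column from the AnnData .obs table. The sketch below joins the two with pandas and drops rows with missing values, as the Returns section describes. The helper name and the choice of an outer join are assumptions, because how segtraq aligns the two tables is not documented here. Area units follow whatever coordinate system the "cell_area" values use.

```python
import pandas as pd


def transcript_density_sketch(
    transcripts: pd.DataFrame, obs: pd.DataFrame, cell_key: str = "cell_id"
) -> pd.DataFrame:
    """Hypothetical helper: transcripts per unit cell area (not segtraq's code)."""
    counts = (
        transcripts.groupby(cell_key).size().rename("transcript_count").reset_index()
    )
    # Outer join keeps cells missing from either side so they surface as NaN.
    merged = counts.merge(obs[[cell_key, "cell_area"]], on=cell_key, how="outer")
    merged["transcript_density"] = merged["transcript_count"] / merged["cell_area"]
    # Mirror the documented behaviour: rows with missing values are dropped.
    return merged[[cell_key, "transcript_density"]].dropna().reset_index(drop=True)


# Cell 3 has no area in obs and cell 4 has no transcripts; both are dropped.
transcripts = pd.DataFrame({"cell_id": [1, 1, 1, 2, 2, 3]})
obs = pd.DataFrame({"cell_id": [1, 2, 4], "cell_area": [10.0, 4.0, 8.0]})
print(transcript_density_sketch(transcripts, obs))
```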

segtraq.bl.baseline.transcripts_per_cell(sdata: SpatialData, transcript_key: str = 'transcripts', cell_key: str = 'cell_id') → DataFrame

Counts the number of transcripts assigned to each cell.

Parameters:
  • sdata (sd.SpatialData) – A SpatialData object containing transcript and cell assignment information.

  • transcript_key (str, optional) – The key in sdata.points corresponding to transcript data. Default is “transcripts”.

  • cell_key (str, optional) – The column name in the transcript data that contains cell assignment information. Default is “cell_id”.

Returns:

A DataFrame with two columns: the cell identifier (cell_key) and the corresponding transcript count (“transcript_count”).

Return type:

pd.DataFrame
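Counting transcripts per cell is a group-by on cell_key followed by a row count, producing the two documented columns (cell_key and "transcript_count"). The sketch below does this on a toy pandas table in place of sdata.points["transcripts"]; the helper name is hypothetical. The toy table contains no unassigned transcripts, and this page does not state whether segtraq drops the unassigned label before counting.

```python
import pandas as pd


def transcripts_per_cell_sketch(
    df: pd.DataFrame, cell_key: str = "cell_id"
) -> pd.DataFrame:
    """Hypothetical helper: transcripts per cell, using the documented column name."""
    # size() counts the rows (transcripts) in each cell group.
    return df.groupby(cell_key).size().reset_index(name="transcript_count")


transcripts = pd.DataFrame({"cell_id": [1, 1, 2, 3, 3, 3]})
print(transcripts_per_cell_sketch(transcripts))
```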