Skip to content

cellects.utils.utilitarian

cellects.utils.utilitarian

Utility module with array operations, path manipulation, and progress tracking.

This module provides performance-optimized utilities for numerical comparisons using Numba, path string truncation, dictionary filtering, and iteration progress monitoring. It is designed for applications requiring efficient data processing pipelines with both low-level optimization and human-readable output formatting.

Classes:

Name Description
PercentAndTimeTracker : Track iteration progress with time estimates and percentage completion

Functions:

Name Description
greater_along_first_axis : Compare arrays element-wise along first axis
less_along_first_axis : Compare arrays element-wise along first axis
translate_dict : Convert standard dict to typed dict, filtering non-string values
reduce_path_len : Truncate long path strings with ellipsis insertion
find_nearest : Find array element closest to target value
Notes

Numba-optimized functions (greater_along_first_axis and less_along_first_axis) require input arrays of identical shape. String manipulation utilities include automatic type conversion. The progress tracker records initialization time for potential performance analysis.

PercentAndTimeTracker

Initialize a progress bar object to track and display the progress of an iteration.

Parameters:

Name Type Description Default
total int

The total number of iterations.

required
compute_with_elements_number bool

If True, create an element vector. Default is False.

False
core_number int

The number of cores to use. Default is 1.

1

Attributes:

Name Type Description
starting_time float

The time when the ProgressBar object is initialized.

total int

The total number of iterations.

current_step int

The current iteration step (initialized to 0).

element_vector (ndarray, optional)

A vector of zeros with the same length as total, created if compute_with_elements_number is True.

core_number int

The number of cores.

Examples:

>>> p = PercentAndTimeTracker(10)
>>> print(p.total)  # prints: 10
>>> p = PercentAndTimeTracker(10, compute_with_elements_number=True)
>>> print(p.element_vector)  # prints: [0 0 0 0 0 0 0 0 0 0]
Notes

Starting time is recorded for potential performance tracking.

Source code in src/cellects/utils/utilitarian.py
class PercentAndTimeTracker:
    """
    Initialize a progress bar object to track and display the progress of an iteration.

    Parameters
    ----------
    total : int
        The total number of iterations.
    compute_with_elements_number : bool, optional
        If True, create an element vector. Default is False.
    core_number : int, optional
        The number of cores to use. Default is 1.

    Attributes
    ----------
    starting_time : float
        The time when the ProgressBar object is initialized.
    total : int
        The total number of iterations.
    current_step : int
        The current iteration step (initialized to 0).
    element_vector : numpy.ndarray, optional
        A vector of zeros with the same length as `total`, created if
        `compute_with_elements_number` is True.
    core_number : int
        The number of cores.

    Examples
    --------
    >>> p = PercentAndTimeTracker(10)
    >>> print(p.total)  # prints: 10
    >>> p = PercentAndTimeTracker(10, compute_with_elements_number=True)
    >>> print(p.element_vector)  # prints: [0 0 0 0 0 0 0 0 0 0]

    Notes
    -----
    Starting time is recorded for potential performance tracking.

    """
    def __init__(self, total: int, compute_with_elements_number: bool=False, core_number:int =1):
        """Initialize an instance of the class.

        This constructor sets up the initial attributes including
        a starting time, total value, current step, and an optional
        element vector if ``compute_with_elements_number`` is set to True.
        The core number can be specified, defaulting to 1.

        Parameters
        ----------
        total : int
            The total number of elements or steps.
        compute_with_elements_number : bool, optional
            If True, initialize an element vector of zeros. Defaults to False.
        core_number : int, optional
            The number of cores to use. Defaults to 1.

        Attributes
        ----------
        starting_time : float
            The time of instantiation.
        total : int
            The total number of elements or steps.
        current_step : int
            The current step in the process.
        element_vector : ndarray of int64, optional
            A vector initialized with zeros. Exists if ``compute_with_elements_number`` is True.
        core_number : int
            The number of cores to use.
        """
        self.starting_time = default_timer()
        self.total = total
        self.current_step = 0
        if compute_with_elements_number:
            self.element_vector = np.zeros(total, dtype=np.int64)
        self.core_number = core_number

    def get_progress(self, step=None, element_number=None):
        """
        Calculate and update the current progress, including elapsed time and estimated remaining time.

        This function updates the internal state of the object to reflect progress
        based on the current step and element number. It calculates elapsed time,
        estimates total time, and computes the estimated time of arrival (ETA).

        Parameters
        ----------
        step : int or None, optional
            The current step of the process. If ``None``, the internal counter is incremented.
        element_number : int or None, optional
            The current element number. If ``None``, no update is made to the element vector.

        Returns
        -------
        tuple
            A tuple containing:
            - `int`: The current progress percentage.
            - `str`: A string with the ETA and remaining time.

        Raises
        ------
        ValueError
            If ``step`` or ``element_number`` are invalid.

        Notes
        -----
        The function uses linear regression to estimate future progress values when the current step is sufficiently large.

        Examples
        --------
        >>> PercentAndTimeTracker(10, compute_with_elements_number=True).get_progress(9, 5)
        (0, ', wait to get a more accurate ETA...')
        """
        if step is not None:
            self.current_step = step
        if element_number is not None:
            self.element_vector[self.current_step] = element_number

        if self.current_step > 0:
            elapsed_time = default_timer() - self.starting_time
            if element_number is None or element_number == 0 or self.current_step < 15:
                if self.current_step < self.core_number:
                    current_prop = self.core_number / self.total
                else:
                    current_prop = (self.current_step + 1) / self.total
            else:
                x_mat = np.array([np.ones(self.current_step - 4), np.arange(5, self.current_step + 1)]).T
                coefs = np.linalg.lstsq(x_mat, self.element_vector[5:self.current_step + 1], rcond=-1)[0]
                self.element_vector = coefs[0] + (np.arange(self.total) * coefs[1])
                self.element_vector[self.element_vector < 0] = 0
                current_prop = self.element_vector[:self.current_step + 1].sum() / self.element_vector.sum()

            total_time = elapsed_time / current_prop
            current_prop = int(np.round(current_prop * 100))
            remaining_time_s = total_time - elapsed_time

            local_time = time.localtime()
            local_m = int(time.strftime("%M", local_time))
            local_h = int(time.strftime("%H", local_time))
            remaining_time_h = remaining_time_s // 3600
            reste_s = remaining_time_s % 3600
            reste_m = reste_s // 60
            # + str(int(np.floor(reste_s % 60))) + "S"
            hours = int(np.floor(remaining_time_h))
            minutes = int(np.floor(reste_m))

            if (local_m + minutes) < 60:
                eta_m = local_m + minutes
            else:
                eta_m = (local_m + minutes) % 60
                local_h += 1

            if (local_h + hours) < 24:
                output = current_prop, f", ETA {local_h + hours}:{eta_m} ({hours}h{minutes}m left)"
            else:
                days = (local_h + hours) // 24
                eta_h = (local_h + hours) % 24
                eta_d = time.strftime("%m", local_time) + "/" + str(int(time.strftime("%d", local_time)) + days)
                output = current_prop, f", ETA {eta_d}d {eta_h}:{eta_m} ({hours}h{minutes}m left)"
            # return current_prop, str(local_h + hours) + ":" + str(local_m + minutes) + "(" + str()
        else:
            output = int(np.round(100 / self.total)), ", wait..."
        if step is None:
            self.current_step += 1
        if element_number is not None:
            if self.current_step < 50:
                output = int(0), ", wait to get a more accurate ETA..."
        return output

__init__(total, compute_with_elements_number=False, core_number=1)

Initialize an instance of the class.

This constructor sets up the initial attributes including a starting time, total value, current step, and an optional element vector if compute_with_elements_number is set to True. The core number can be specified, defaulting to 1.

Parameters:

Name Type Description Default
total int

The total number of elements or steps.

required
compute_with_elements_number bool

If True, initialize an element vector of zeros. Defaults to False.

False
core_number int

The number of cores to use. Defaults to 1.

1

Attributes:

Name Type Description
starting_time float

The time of instantiation.

total int

The total number of elements or steps.

current_step int

The current step in the process.

element_vector ndarray of int64, optional

A vector initialized with zeros. Exists if compute_with_elements_number is True.

core_number int

The number of cores to use.

Source code in src/cellects/utils/utilitarian.py
def __init__(self, total: int, compute_with_elements_number: bool=False, core_number:int =1):
    """Initialize an instance of the class.

    This constructor sets up the initial attributes including
    a starting time, total value, current step, and an optional
    element vector if ``compute_with_elements_number`` is set to True.
    The core number can be specified, defaulting to 1.

    Parameters
    ----------
    total : int
        The total number of elements or steps.
    compute_with_elements_number : bool, optional
        If True, initialize an element vector of zeros. Defaults to False.
    core_number : int, optional
        The number of cores to use. Defaults to 1.

    Attributes
    ----------
    starting_time : float
        The time of instantiation.
    total : int
        The total number of elements or steps.
    current_step : int
        The current step in the process.
    element_vector : ndarray of int64, optional
        A vector initialized with zeros. Exists if ``compute_with_elements_number`` is True.
    core_number : int
        The number of cores to use.
    """
    self.starting_time = default_timer()
    self.total = total
    self.current_step = 0
    if compute_with_elements_number:
        self.element_vector = np.zeros(total, dtype=np.int64)
    self.core_number = core_number

get_progress(step=None, element_number=None)

Calculate and update the current progress, including elapsed time and estimated remaining time.

This function updates the internal state of the object to reflect progress based on the current step and element number. It calculates elapsed time, estimates total time, and computes the estimated time of arrival (ETA).

Parameters:

Name Type Description Default
step int or None

The current step of the process. If None, the internal counter is incremented.

None
element_number int or None

The current element number. If None, no update is made to the element vector.

None

Returns:

Type Description
tuple

A tuple containing: - int: The current progress percentage. - str: A string with the ETA and remaining time.

Raises:

Type Description
ValueError

If step or element_number are invalid.

Notes

The function uses linear regression to estimate future progress values when the current step is sufficiently large.

Examples:

>>> PercentAndTimeTracker(10, compute_with_elements_number=True).get_progress(9, 5)
(0, ', wait to get a more accurate ETA...')
Source code in src/cellects/utils/utilitarian.py
def get_progress(self, step=None, element_number=None):
    """
    Calculate and update the current progress, including elapsed time and estimated remaining time.

    This function updates the internal state of the object to reflect progress
    based on the current step and element number. It calculates elapsed time,
    estimates total time, and computes the estimated time of arrival (ETA).

    Parameters
    ----------
    step : int or None, optional
        The current step of the process. If ``None``, the internal counter is incremented.
    element_number : int or None, optional
        The current element number. If ``None``, no update is made to the element vector.

    Returns
    -------
    tuple
        A tuple containing:
        - `int`: The current progress percentage.
        - `str`: A string with the ETA and remaining time.

    Raises
    ------
    ValueError
        If ``step`` or ``element_number`` are invalid.

    Notes
    -----
    The function uses linear regression to estimate future progress values when the current step is sufficiently large.

    Examples
    --------
    >>> PercentAndTimeTracker(10, compute_with_elements_number=True).get_progress(9, 5)
    (0, ', wait to get a more accurate ETA...')
    """
    if step is not None:
        self.current_step = step
    if element_number is not None:
        self.element_vector[self.current_step] = element_number

    if self.current_step > 0:
        elapsed_time = default_timer() - self.starting_time
        if element_number is None or element_number == 0 or self.current_step < 15:
            if self.current_step < self.core_number:
                current_prop = self.core_number / self.total
            else:
                current_prop = (self.current_step + 1) / self.total
        else:
            x_mat = np.array([np.ones(self.current_step - 4), np.arange(5, self.current_step + 1)]).T
            coefs = np.linalg.lstsq(x_mat, self.element_vector[5:self.current_step + 1], rcond=-1)[0]
            self.element_vector = coefs[0] + (np.arange(self.total) * coefs[1])
            self.element_vector[self.element_vector < 0] = 0
            current_prop = self.element_vector[:self.current_step + 1].sum() / self.element_vector.sum()

        total_time = elapsed_time / current_prop
        current_prop = int(np.round(current_prop * 100))
        remaining_time_s = total_time - elapsed_time

        local_time = time.localtime()
        local_m = int(time.strftime("%M", local_time))
        local_h = int(time.strftime("%H", local_time))
        remaining_time_h = remaining_time_s // 3600
        reste_s = remaining_time_s % 3600
        reste_m = reste_s // 60
        # + str(int(np.floor(reste_s % 60))) + "S"
        hours = int(np.floor(remaining_time_h))
        minutes = int(np.floor(reste_m))

        if (local_m + minutes) < 60:
            eta_m = local_m + minutes
        else:
            eta_m = (local_m + minutes) % 60
            local_h += 1

        if (local_h + hours) < 24:
            output = current_prop, f", ETA {local_h + hours}:{eta_m} ({hours}h{minutes}m left)"
        else:
            days = (local_h + hours) // 24
            eta_h = (local_h + hours) % 24
            eta_d = time.strftime("%m", local_time) + "/" + str(int(time.strftime("%d", local_time)) + days)
            output = current_prop, f", ETA {eta_d}d {eta_h}:{eta_m} ({hours}h{minutes}m left)"
        # return current_prop, str(local_h + hours) + ":" + str(local_m + minutes) + "(" + str()
    else:
        output = int(np.round(100 / self.total)), ", wait..."
    if step is None:
        self.current_step += 1
    if element_number is not None:
        if self.current_step < 50:
            output = int(0), ", wait to get a more accurate ETA..."
    return output

find_nearest(array, value)

Find the element in an array that is closest to a given value.

Parameters:

Name Type Description Default
array array_like

Input array. Can be any array-like data structure.

required
value int or float

The value to find the closest element to.

required

Returns:

Type Description
obj:`array` type

The element in array that is closest to value.

Examples:

>>> find_nearest([1, 2, 3, 4], 2.5)
2
Source code in src/cellects/utils/utilitarian.py
def find_nearest(array: NDArray, value):
    """
    Find the element in an array that is closest to a given value.

    Parameters
    ----------
    array : array_like
        Input array. Can be any array-like data structure.
    value : int or float
        The value to find the closest element to.

    Returns
    -------
    :obj:`array` type
        The element in `array` that is closest to `value`.

    Examples
    --------
    >>> find_nearest([1, 2, 3, 4], 2.5)
    2
    """
    array = np.asarray(array)
    idx = (np.abs(array - value)).argmin()
    return array[idx]

greater_along_first_axis(array_in_1, array_in_2)

Compare two arrays along the first axis and store the result in a third array.

This function performs a comparison between two input arrays along their first axis and stores the result in a third array. The comparison is made to determine which elements of each row of the first array are greater than the elements(s) corresponding to that row in the second array.

Parameters:

Name Type Description Default
array_in_1 ndarray

First input array.

required
array_in_2 ndarray

Second input array.

required

Returns:

Name Type Description
out ndarray of uint8

Boolean ndarray with same shape as input arrays, containing the result of element-wise comparison.

Examples:

>>> array_in_1 = np.array([[2, 4], [5, 8]])
>>> array_in_2 = np.array([3, 6])
>>> array_out = greater_along_first_axis(array_in_1, array_in_2)
>>> print(array_out)
[[0 1]
 [0 1]]
Source code in src/cellects/utils/utilitarian.py
@njit()
def greater_along_first_axis(array_in_1: NDArray, array_in_2: NDArray) -> NDArray[np.uint8]:
    """
    Compare two arrays along the first axis and store the result in a third array.

    This function performs a comparison between two input arrays
    along their first axis and stores the result in a third array. The comparison is
    made to determine which elements of each row of the first array are greater than
    the elements(s) corresponding to that row in the second array.

    Parameters
    ----------
    array_in_1 : ndarray
        First input array.
    array_in_2 : ndarray
        Second input array.

    Returns
    -------
    out : ndarray of uint8
        Boolean ndarray with same shape as input arrays,
        containing the result of element-wise comparison.

    Examples
    --------
    >>> array_in_1 = np.array([[2, 4], [5, 8]])
    >>> array_in_2 = np.array([3, 6])
    >>> array_out = greater_along_first_axis(array_in_1, array_in_2)
    >>> print(array_out)
    [[0 1]
     [0 1]]
    """
    array_out = np.zeros(array_in_1.shape, dtype=np.uint8)
    for i, value in enumerate(array_in_2):
        array_out[i, ...] = array_in_1[i, ...] > value
    return array_out

insensitive_glob(pattern)

Generates a glob pattern that matches both lowercase and uppercase letters.

Parameters:

Name Type Description Default
pattern str

The glob pattern to be made case-insensitive.

required

Returns:

Type Description
str

A new glob pattern that will match both lowercase and uppercase letters.

Examples:

>>> insensitive_glob('*.TXT')
Source code in src/cellects/utils/utilitarian.py
def insensitive_glob(pattern: str):
    """
    Generates a glob pattern that matches both lowercase and uppercase letters.

    Parameters
    ----------
    pattern : str
        The glob pattern to be made case-insensitive.

    Returns
    -------
    str
        A new glob pattern that will match both lowercase and uppercase letters.

    Examples
    --------
    >>> insensitive_glob('*.TXT')
    """
    def either(c):
        return '[%s%s]' % (c.lower(), c.upper()) if c.isalpha() else c
    return glob(''.join(map(either, pattern)))

less_along_first_axis(array_in_1, array_in_2)

Compare two arrays along the first axis and store the result in a third array.

This function performs a comparison between two input arrays along their first axis and stores the result in a third array. The comparison is made to determine which elements of each row of the first array are lesser than the elements(s) corresponding to that row in the second array.

Parameters:

Name Type Description Default
array_in_1 ndarray

The first input array.

required
array_in_2 ndarray

The second input array.

required

Returns:

Type Description
ndarray of uint8

A boolean array where each element is True if the corresponding element in array_in_1 is lesser than the corresponding element in array_in_2, and False otherwise.

Examples:

>>> array_in_1 = np.array([[2, 4], [5, 8]])
>>> array_in_2 = np.array([3, 6])
>>> array_out = less_along_first_axis(array_in_1, array_in_2)
>>> print(array_out)
[[1 0]
 [1 0]]
Source code in src/cellects/utils/utilitarian.py
@njit()
def less_along_first_axis(array_in_1: NDArray, array_in_2: NDArray) -> NDArray[np.uint8]:
    """
    Compare two arrays along the first axis and store the result in a third array.

    This function performs a comparison between two input arrays
    along their first axis and stores the result in a third array. The comparison is
    made to determine which elements of each row of the first array are lesser than
    the elements(s) corresponding to that row in the second array.

    Parameters
    ----------
    array_in_1 : ndarray
        The first input array.
    array_in_2 : ndarray
        The second input array.

    Returns
    -------
    ndarray of uint8
        A boolean array where each element is `True` if the corresponding
        element in `array_in_1` is lesser than the corresponding element
        in `array_in_2`, and `False` otherwise.

    Examples
    --------
    >>> array_in_1 = np.array([[2, 4], [5, 8]])
    >>> array_in_2 = np.array([3, 6])
    >>> array_out = less_along_first_axis(array_in_1, array_in_2)
    >>> print(array_out)
    [[1 0]
     [1 0]]
    """
    array_out = np.zeros(array_in_1.shape, dtype=np.uint8)
    for i, value in enumerate(array_in_2):
        array_out[i, ...] = array_in_1[i, ...] < value
    return array_out

reduce_path_len(pathway, to_start, from_end)

Reduce the length of a given pathway string by truncating it from both ends.

The function is used to shorten the pathway string if its length exceeds a calculated maximum size. If it does, the function truncates it from both ends, inserting an ellipsis ("...") in between.

Parameters:

Name Type Description Default
pathway str

The pathway string to be reduced. If an integer is provided, it will be converted into a string.

required
to_start int

Number of characters from the start to keep in the pathway string.

required
from_end int

Number of characters from the end to keep in the pathway string.

required

Returns:

Type Description
str

The reduced version of the pathway string. If truncation is not necessary, returns the original pathway string.

Examples:

>>> reduce_path_len("example/complicated/path/to/resource", 8, 12)
'example/.../to/resource'
Source code in src/cellects/utils/utilitarian.py
def reduce_path_len(pathway: str, to_start: int, from_end: int) -> str:
    """
    Reduce the length of a given pathway string by truncating it from both ends.

    The function is used to shorten the `pathway` string if its length exceeds
    a calculated maximum size. If it does, the function truncates it from both ends,
    inserting an ellipsis ("...") in between.

    Parameters
    ----------
    pathway : str
        The pathway string to be reduced. If an integer is provided,
        it will be converted into a string.
    to_start : int
        Number of characters from the start to keep in the pathway string.
    from_end : int
        Number of characters from the end to keep in the pathway string.

    Returns
    -------
    str
        The reduced version of the `pathway` string. If truncation is not necessary,
        returns the original pathway string.

    Examples
    --------
    >>> reduce_path_len("example/complicated/path/to/resource", 8, 12)
    'example/.../to/resource'
    """
    max_size = to_start + from_end + 3
    if len(pathway) > max_size:
        pathway = pathway[:to_start] + "..." + pathway[-from_end:]
    return pathway

remove_coordinates(arr1, arr2)

Remove coordinates from arr1 that are present in arr2.

Given two arrays of coordinates, remove rows from the first array that match any row in the second array.

Parameters:

Name Type Description Default
arr1 ndarray of shape (n, 2)

Array containing coordinates to filter.

required
arr2 ndarray of shape (m, 2)

Array containing coordinates to match for removal.

required

Returns:

Type Description
ndarray of shape (k, 2)

Array with coordinates from arr1 that are not in arr2.

Examples:

>>> arr1 = np.array([[1, 2], [3, 4]])
>>> arr2 = np.array([[3, 4]])
>>> remove_coordinates(arr1, arr2)
array([[1, 2],
       [3, 4]])
>>> arr1 = np.array([[1, 2], [3, 4]])
>>> arr2 = np.array([[3, 2], [1, 4]])
>>> remove_coordinates(arr1, arr2)
array([[1, 2],
       [3, 4]])
>>> arr1 = np.array([[1, 2], [3, 4]])
>>> arr2 = np.array([[3, 2], [1, 2]])
>>> remove_coordinates(arr1, arr2)
array([[3, 4]])
>>> arr1 = np.arange(200).reshape(100, 2)
>>> arr2 = np.array([[196, 197], [198, 199]])
>>> new_arr1 = remove_coordinates(arr1, arr2)
>>> new_arr1.shape
(98, 2)
Source code in src/cellects/utils/utilitarian.py
def remove_coordinates(arr1: NDArray, arr2: NDArray) -> NDArray:
    """
    Remove coordinates from `arr1` that are present in `arr2`.

    Given two arrays of coordinates, remove rows from the first array
    that match any row in the second array.

    Parameters
    ----------
    arr1 : ndarray of shape (n, 2)
        Array containing coordinates to filter.
    arr2 : ndarray of shape (m, 2)
        Array containing coordinates to match for removal.

    Returns
    -------
    ndarray of shape (k, 2)
        Array with coordinates from `arr1` that are not in `arr2`.

    Examples
    --------
    >>> arr1 = np.array([[1, 2], [3, 4]])
    >>> arr2 = np.array([[3, 4]])
    >>> remove_coordinates(arr1, arr2)
    array([[1, 2],
           [3, 4]])

    >>> arr1 = np.array([[1, 2], [3, 4]])
    >>> arr2 = np.array([[3, 2], [1, 4]])
    >>> remove_coordinates(arr1, arr2)
    array([[1, 2],
           [3, 4]])

    >>> arr1 = np.array([[1, 2], [3, 4]])
    >>> arr2 = np.array([[3, 2], [1, 2]])
    >>> remove_coordinates(arr1, arr2)
    array([[3, 4]])

    >>> arr1 = np.arange(200).reshape(100, 2)
    >>> arr2 = np.array([[196, 197], [198, 199]])
    >>> new_arr1 = remove_coordinates(arr1, arr2)
    >>> new_arr1.shape
    (98, 2)
    """
    if arr2.shape[0] == 0:
        return arr1
    else:
        if arr1.shape[1] != 2 or arr2.shape[1] != 2:
            raise ValueError("Both arrays must have shape (n, 2)")
        c_to_keep = ~np.all(arr1 == arr2[0], axis=1)
        for row in arr2[1:]:
            c_to_keep *= ~np.all(arr1 == row, axis=1)
        return arr1[c_to_keep]

smallest_memory_array(array_object, array_type='uint')

Convert input data to the smallest possible NumPy array type that can hold it.

Parameters:

Name Type Description Default
array_object ndarray or list of lists

The input data to be converted.

required
array_type str

The type of NumPy data type to use ('uint').

is 'uint'

Returns:

Type Description
ndarray

A NumPy array of the smallest data type that can hold all values in array_object.

Examples:

>>> import numpy as np
>>> array = [[1, 2], [3, 4]]
>>> smallest_memory_array(array)
array([[1, 2],
       [3, 4]], dtype=np.uint8)
>>> array = [[1000, 2000], [3000, 4000]]
>>> smallest_memory_array(array)
array([[1000, 2000],
       [3000, 4000]], dtype=uint16)
>>> array = [[2**31, 2**32], [2**33, 2**34]]
>>> smallest_memory_array(array)
array([[         2147483648,          4294967296],
       [         8589934592,        17179869184]], dtype=uint64)
Source code in src/cellects/utils/utilitarian.py
def smallest_memory_array(array_object, array_type='uint') -> NDArray:
    """
    Convert input data to the smallest possible NumPy array type that can hold it.

    Parameters
    ----------
    array_object : ndarray or list of lists
        The input data to be converted.
    array_type : str, optional, default is 'uint'
        The type of NumPy data type to use ('uint').

    Returns
    -------
    ndarray
        A NumPy array of the smallest data type that can hold all values in `array_object`.

    Examples
    --------
    >>> import numpy as np
    >>> array = [[1, 2], [3, 4]]
    >>> smallest_memory_array(array)
    array([[1, 2],
           [3, 4]], dtype=np.uint8)

    >>> array = [[1000, 2000], [3000, 4000]]
    >>> smallest_memory_array(array)
    array([[1000, 2000],
           [3000, 4000]], dtype=uint16)

    >>> array = [[2**31, 2**32], [2**33, 2**34]]
    >>> smallest_memory_array(array)
    array([[         2147483648,          4294967296],
           [         8589934592,        17179869184]], dtype=uint64)
    """
    if isinstance(array_object, list):
        array_object = np.array(array_object)
    if isinstance(array_object, np.ndarray):
        value_max = array_object.max()
    else:
        if len(array_object[0]) > 0:
            value_max = np.max((array_object[0].max(), array_object[1].max()))
        else:
            value_max = 0

    if array_type == 'uint':
        if value_max <= np.iinfo(np.uint8).max:
            array_object = np.array(array_object, dtype=np.uint8)
        elif value_max <= np.iinfo(np.uint16).max:
            array_object = np.array(array_object, dtype=np.uint16)
        elif value_max <= np.iinfo(np.uint32).max:
            array_object = np.array(array_object, dtype=np.uint32)
        else:
            array_object = np.array(array_object, dtype=np.uint64)
    return array_object

split_dict(c_space_dict)

Split a dictionary into two dictionaries based on specific criteria and return their keys.

Split the input dictionary c_space_dict into two dictionaries: one for items not ending with '2' and another where the key is truncated by removing its last character if it does end with '2'. Additionally, return the keys that have been processed.

Parameters:

Name Type Description Default
c_space_dict dict

The dictionary to be split. Expected keys are strings and values can be any type.

required

Returns:

Name Type Description
first_dict dict

Dictionary containing items from c_space_dict whose keys do not end with '2'.

second_dict dict

Dictionary containing items from c_space_dict whose keys end with '2', where the key is truncated by removing its last character.

c_spaces list

List of keys from c_space_dict that have been processed.

Raises:

Type Description
None
Notes

No critical information to share.

Examples:

>>> c_space_dict = {'key1': 10, 'key2': 20, 'logical': 30}
>>> first_dict, second_dict, c_spaces = split_dict(c_space_dict)
>>> print(first_dict)
{'key1': 10}
>>> print(second_dict)
{'key': 20}
>>> print(c_spaces)
['key1', 'key']
Source code in src/cellects/utils/utilitarian.py
def split_dict(c_space_dict: dict) -> Tuple[Dict, Dict, list]:
    """

    Split a dictionary into two dictionaries based on specific criteria and return their keys.

    Split the input dictionary `c_space_dict` into two dictionaries: one for items not
    ending with '2' and another where the key is truncated by removing its last
    character if it does end with '2'. Additionally, return the keys that have been
    processed.

    Parameters
    ----------
    c_space_dict : dict
        The dictionary to be split. Expected keys are strings and values can be any type.

    Returns
    -------
    first_dict : dict
        Dictionary containing items from `c_space_dict` whose keys do not end with '2'.
    second_dict : dict
        Dictionary containing items from `c_space_dict` whose keys end with '2',
        where the key is truncated by removing its last character.
    c_spaces : list
        List of keys from `c_space_dict` that have been processed.

    Raises
    ------
    None

    Notes
    -----
    No critical information to share.

    Examples
    --------
    >>> c_space_dict = {'key1': 10, 'key2': 20, 'logical': 30}
    >>> first_dict, second_dict, c_spaces = split_dict(c_space_dict)
    >>> print(first_dict)
    {'key1': 10}
    >>> print(second_dict)
    {'key': 20}
    >>> print(c_spaces)
    ['key1', 'key']

    """
    first_dict = Dict()
    second_dict = Dict()
    c_spaces = []
    for k, v in c_space_dict.items():
        if k == 'PCA' or k != 'logical' and np.absolute(v).sum() > 0:
            if k[-1] != '2':
                first_dict[k] = List(v)
                c_spaces.append(k)
            else:
                second_dict[k[:-1]] = List(v)
                c_spaces.append(k[:-1])
    return first_dict, second_dict, c_spaces

translate_dict(old_dict)

Translate a dictionary to a typed dictionary and filter out non-string values.

Parameters:

Name Type Description Default
old_dict dict

The input dictionary that may contain non-string values

required

Returns:

Name Type Description
numba_dict Dict

A typed dictionary containing only the items from old_dict where the value is not a string

Examples:

>>> result = translate_dict({'a': 1., 'b': 'string', 'c': 2.0})
>>> print(result)
{a: 1.0, c: 2.0}
Source code in src/cellects/utils/utilitarian.py
def translate_dict(old_dict: dict) -> Dict:
    """
    Translate a dictionary to a typed dictionary and filter out non-string values.

    Parameters
    ----------
    old_dict : dict
        The input dictionary that may contain non-string values

    Returns
    -------
    numba_dict : Dict
        A typed dictionary containing only the items from `old_dict` where the value is not a string

    Examples
    --------
    >>> result = translate_dict({'a': 1., 'b': 'string', 'c': 2.0})
    >>> print(result)
    {a: 1.0, c: 2.0}
    """
    numba_dict = Dict()
    for k, v in old_dict.items():
        if not isinstance(v, str):
            if isinstance(v, list):
                v = List(v)
            numba_dict[k] = v
    return numba_dict