cellects.utils.utilitarian
Utility module with array operations, path manipulation, and progress tracking.
This module provides performance-optimized utilities for numerical comparisons using Numba, path string truncation, dictionary filtering, and iteration progress monitoring. It is designed for applications requiring efficient data processing pipelines with both low-level optimization and human-readable output formatting.
Classes:
| Name | Description |
|---|---|
| PercentAndTimeTracker | Track iteration progress with time estimates and percentage completion |
Functions:
| Name | Description |
|---|---|
| greater_along_first_axis | Compare arrays element-wise along first axis (greater-than) |
| less_along_first_axis | Compare arrays element-wise along first axis (less-than) |
| translate_dict | Convert standard dict to typed dict, filtering non-string values |
| reduce_path_len | Truncate long path strings with ellipsis insertion |
| find_nearest | Find array element closest to target value |
Notes
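The Numba-optimized functions (greater_along_first_axis and less_along_first_axis) expect the second array to provide one value per row of the first, so the two inputs must agree in length along the first axis (the examples compare a 2×2 array against a length-2 array). The string manipulation utilities convert their input to a string automatically (reduce_path_len accepts an integer, for example). The progress tracker records its initialization time for potential performance analysis.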
PercentAndTimeTracker
Initialize a progress bar object to track and display the progress of an iteration.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| total | int | The total number of iterations. | required |
| compute_with_elements_number | bool | If True, create an element vector. Default is False. | False |
| core_number | int | The number of cores to use. Default is 1. | 1 |
Attributes:
| Name | Type | Description |
|---|---|---|
| starting_time | float | The time when the PercentAndTimeTracker object is initialized. |
| total | int | The total number of iterations. |
| current_step | int | The current iteration step (initialized to 0). |
| element_vector | ndarray, optional | A vector of zeros with the same length as |
| core_number | int | The number of cores. |
Examples:
>>> p = PercentAndTimeTracker(10)
>>> print(p.total) # prints: 10
>>> p = PercentAndTimeTracker(10, compute_with_elements_number=True)
>>> print(p.element_vector) # prints: [0 0 0 0 0 0 0 0 0 0]
Notes
Starting time is recorded for potential performance tracking.
Source code in src/cellects/utils/utilitarian.py
__init__(total, compute_with_elements_number=False, core_number=1)
Initialize an instance of the class.
This constructor sets up the initial attributes including
a starting time, total value, current step, and an optional
element vector if compute_with_elements_number is set to True.
The core number can be specified, defaulting to 1.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| total | int | The total number of elements or steps. | required |
| compute_with_elements_number | bool | If True, initialize an element vector of zeros. Defaults to False. | False |
| core_number | int | The number of cores to use. Defaults to 1. | 1 |
Attributes:
| Name | Type | Description |
|---|---|---|
| starting_time | float | The time of instantiation. |
| total | int | The total number of elements or steps. |
| current_step | int | The current step in the process. |
| element_vector | ndarray of int64, optional | A vector initialized with zeros. Exists if |
| core_number | int | The number of cores to use. |
get_progress(step=None, element_number=None)
Calculate and update the current progress, including elapsed time and estimated remaining time.
This function updates the internal state of the object to reflect progress based on the current step and element number. It calculates elapsed time, estimates total time, and computes the estimated time of arrival (ETA).
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| step | int or None | The current step of the process. If | None |
| element_number | int or None | The current element number. If | None |
Returns:
| Type | Description |
|---|---|
| tuple | A tuple containing: - |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
Notes
The function uses linear regression to estimate future progress values when the current step is sufficiently large.
Examples:
>>> PercentAndTimeTracker(10, compute_with_elements_number=True).get_progress(9, 5)
(0, ', wait to get a more accurate ETA...')
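The note above mentions a linear-regression estimate. As a rough illustration only, the sketch below fits a straight line through recorded (time, step) pairs with NumPy and extrapolates it to the final step. The function name `estimate_remaining_time` and its arguments are hypothetical and are not part of cellects; the actual implementation may differ.

```python
import numpy as np


def estimate_remaining_time(times, steps, total):
    """Extrapolate the remaining time from past (time, step) observations.

    Hypothetical helper for illustration; not the cellects implementation.
    """
    times = np.asarray(times, dtype=float)
    steps = np.asarray(steps, dtype=float)
    # Least-squares fit of: steps = slope * time + intercept
    slope, intercept = np.polyfit(times, steps, 1)
    if slope <= 0:
        raise ValueError("progress must increase over time to estimate an ETA")
    # Time at which the fitted line reaches `total`
    finish_time = (total - intercept) / slope
    # Never report a negative remaining time
    return max(float(finish_time - times[-1]), 0.0)


remaining = estimate_remaining_time([0, 1, 2, 3], [0, 2, 4, 6], total=10)
print(f"about {remaining:.1f} s remaining")
```

A fit like this needs several observations to be meaningful, which is consistent with the tracker asking the caller to wait for a more accurate ETA during the first steps.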
find_nearest(array, value)
Find the element in an array that is closest to a given value.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| array | array_like | Input array. Can be any array-like data structure. | required |
| value | int or float | The value to find the closest element to. | required |
Returns:
| Type | Description |
|---|---|
| `array` type | The element in |
Examples:
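No example survives here, so the following is only a minimal sketch of how a nearest-element lookup can be written with NumPy. The name `find_nearest_sketch` is a hypothetical stand-in; it is not claimed to match the body of `find_nearest`.

```python
import numpy as np


def find_nearest_sketch(array, value):
    """Return the element of `array` closest to `value` (illustrative only)."""
    arr = np.asarray(array)
    # Flat index of the smallest absolute difference
    idx = np.abs(arr - value).argmin()
    # `.flat` lets the flat index address arrays of any dimensionality
    return arr.flat[idx]


print(find_nearest_sketch([1, 5, 10], 6))
```

When two elements are equally close, `argmin` returns the first one encountered, so ties resolve to the earliest element in memory order.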
greater_along_first_axis(array_in_1, array_in_2)
Compare two arrays along the first axis and store the result in a third array.
This function compares two input arrays along their first axis and stores the result in a third array. For each row of the first array, it determines which elements are greater than the element(s) corresponding to that row in the second array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| array_in_1 | ndarray | First input array. | required |
| array_in_2 | ndarray | Second input array. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| out | ndarray of uint8 | Boolean-valued (0 or 1) ndarray with the same shape as array_in_1, containing the result of the element-wise comparison. |
Examples:
>>> array_in_1 = np.array([[2, 4], [5, 8]])
>>> array_in_2 = np.array([3, 6])
>>> array_out = greater_along_first_axis(array_in_1, array_in_2)
>>> print(array_out)
[[0 1]
[0 1]]
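For readers without Numba at hand, the same row-wise comparison can be reproduced with plain NumPy broadcasting. This is a sketch of an equivalent computation, not the library's Numba kernel; `greater_rows_sketch` is a hypothetical name, and the less-than variant only swaps the operator.

```python
import numpy as np


def greater_rows_sketch(array_in_1, array_in_2):
    """Compare each row i of array_in_1 against array_in_2[i] (illustrative)."""
    # Reshape array_in_2 from (n,) to (n, 1) so it broadcasts across each row
    return (array_in_1 > array_in_2[:, None]).astype(np.uint8)


array_in_1 = np.array([[2, 4], [5, 8]])
array_in_2 = np.array([3, 6])
print(greater_rows_sketch(array_in_1, array_in_2))
```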
insensitive_glob(pattern)
Generates a glob pattern that matches both lowercase and uppercase letters.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pattern | str | The glob pattern to be made case-insensitive. | required |
Returns:
| Type | Description |
|---|---|
| str | A new glob pattern that will match both lowercase and uppercase letters. |
Examples:
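A common way to build such a pattern is to replace every letter with a two-character bracket class. The sketch below shows that technique under the assumption that the input contains no bracket expressions of its own; `insensitive_pattern` is a hypothetical name and may not match how `insensitive_glob` is implemented.

```python
def insensitive_pattern(pattern):
    """Turn each letter into a [lower/upper] character class (illustrative)."""

    def either(char):
        # Letters become e.g. "[jJ]"; wildcards, digits and punctuation are kept
        if char.isalpha():
            return f"[{char.lower()}{char.upper()}]"
        return char

    return "".join(either(char) for char in pattern)


print(insensitive_pattern("*.jpg"))
```

The resulting string can be passed to `glob.glob` from the standard library to match, for instance, both `photo.jpg` and `PHOTO.JPG`.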
less_along_first_axis(array_in_1, array_in_2)
Compare two arrays along the first axis and store the result in a third array.
This function compares two input arrays along their first axis and stores the result in a third array. For each row of the first array, it determines which elements are less than the element(s) corresponding to that row in the second array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| array_in_1 | ndarray | The first input array. | required |
| array_in_2 | ndarray | The second input array. | required |
Returns:
| Type | Description |
|---|---|
| ndarray of uint8 | A boolean array where each element is |
Examples:
>>> array_in_1 = np.array([[2, 4], [5, 8]])
>>> array_in_2 = np.array([3, 6])
>>> array_out = less_along_first_axis(array_in_1, array_in_2)
>>> print(array_out)
[[1 0]
[1 0]]
reduce_path_len(pathway, to_start, from_end)
Reduce the length of a given pathway string by truncating it from both ends.
The function is used to shorten the pathway string if its length exceeds
a calculated maximum size. If it does, the function truncates it from both ends,
inserting an ellipsis ("...") in between.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| pathway | str | The pathway string to be reduced. If an integer is provided, it will be converted into a string. | required |
| to_start | int | Number of characters from the start to keep in the pathway string. | required |
| from_end | int | Number of characters from the end to keep in the pathway string. | required |
Returns:
| Type | Description |
|---|---|
| str | The reduced version of the |
Examples:
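The description does not state how the maximum size is calculated. The sketch below assumes it is `to_start + from_end + 3` (the kept characters plus the ellipsis); that threshold and the name `reduce_path_len_sketch` are assumptions made for illustration, not details confirmed by the source.

```python
def reduce_path_len_sketch(pathway, to_start, from_end):
    """Shorten a path by keeping its start and end (illustrative only)."""
    pathway = str(pathway)  # integers and other inputs are converted first
    # Assumed threshold: kept characters plus the three-character ellipsis
    max_size = to_start + from_end + 3
    if len(pathway) <= max_size:
        return pathway
    # Slice from an explicit index so that from_end == 0 keeps no tail
    tail = pathway[len(pathway) - from_end:]
    return pathway[:to_start] + "..." + tail


print(reduce_path_len_sketch("C:/data/experiments/run_01/images", 7, 6))
```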
remove_coordinates(arr1, arr2)
Remove coordinates from arr1 that are present in arr2.
Given two arrays of coordinates, remove rows from the first array that match any row in the second array.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| arr1 | ndarray of shape (n, 2) | Array containing coordinates to filter. | required |
| arr2 | ndarray of shape (m, 2) | Array containing coordinates to match for removal. | required |
Returns:
| Type | Description |
|---|---|
| ndarray of shape (k, 2) | Array with coordinates from |
Examples:
>>> arr1 = np.array([[1, 2], [3, 4]])
>>> arr2 = np.array([[3, 4]])
>>> remove_coordinates(arr1, arr2)
array([[1, 2]])
>>> arr1 = np.array([[1, 2], [3, 4]])
>>> arr2 = np.array([[3, 2], [1, 4]])
>>> remove_coordinates(arr1, arr2)
array([[1, 2],
       [3, 4]])
>>> arr1 = np.array([[1, 2], [3, 4]])
>>> arr2 = np.array([[3, 2], [1, 2]])
>>> remove_coordinates(arr1, arr2)
array([[3, 4]])
>>> arr1 = np.arange(200).reshape(100, 2)
>>> arr2 = np.array([[196, 197], [198, 199]])
>>> new_arr1 = remove_coordinates(arr1, arr2)
>>> new_arr1.shape
(98, 2)
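The row-matching rule illustrated above (a row is removed only when both of its coordinates equal those of some row in arr2) can be sketched with NumPy broadcasting. This is one possible formulation under that reading, with the hypothetical name `remove_coordinates_sketch`; the library's own code may use a different strategy.

```python
import numpy as np


def remove_coordinates_sketch(arr1, arr2):
    """Drop rows of arr1 that equal any row of arr2 (illustrative only)."""
    arr1 = np.asarray(arr1)
    arr2 = np.asarray(arr2)
    # Compare every row of arr1 with every row of arr2: shape (n, m, 2),
    # then require both coordinates to match: shape (n, m)
    row_matches = (arr1[:, None, :] == arr2[None, :, :]).all(axis=2)
    # Keep the rows of arr1 that match no row of arr2
    return arr1[~row_matches.any(axis=1)]


arr1 = np.array([[1, 2], [3, 4]])
arr2 = np.array([[3, 2], [1, 2]])
print(remove_coordinates_sketch(arr1, arr2))
```

The intermediate comparison holds n × m × 2 booleans, so for very large inputs a set-based or sorted lookup would use less memory.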
smallest_memory_array(array_object, array_type='uint')
Convert input data to the smallest possible NumPy array type that can hold it.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| array_object | ndarray or list of lists | The input data to be converted. | required |
| array_type | str | The family of NumPy data types to use ('uint'). | 'uint' |
Returns:
| Type | Description |
|---|---|
| ndarray | A NumPy array of the smallest data type that can hold all values in |
Examples:
>>> import numpy as np
>>> array = [[1, 2], [3, 4]]
>>> smallest_memory_array(array)
array([[1, 2],
       [3, 4]], dtype=uint8)
>>> array = [[1000, 2000], [3000, 4000]]
>>> smallest_memory_array(array)
array([[1000, 2000],
       [3000, 4000]], dtype=uint16)
>>> array = [[2**31, 2**32], [2**33, 2**34]]
>>> smallest_memory_array(array)
array([[ 2147483648,  4294967296],
       [ 8589934592, 17179869184]], dtype=uint64)
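One way to obtain this behaviour is to test the largest value against the limits reported by `np.iinfo` for each unsigned type in increasing order. The sketch below does that for the 'uint' case only; `smallest_uint_array` is a hypothetical name, and rejecting negative values is an assumption made here rather than documented behaviour.

```python
import numpy as np


def smallest_uint_array(array_object):
    """Cast to the narrowest unsigned integer dtype that fits (illustrative)."""
    arr = np.asarray(array_object)
    if arr.min() < 0:
        # Assumption: unsigned types cannot represent negative input
        raise ValueError("negative values do not fit an unsigned dtype")
    max_value = int(arr.max())
    # Try the candidate dtypes from narrowest to widest
    for dtype in (np.uint8, np.uint16, np.uint32, np.uint64):
        if max_value <= np.iinfo(dtype).max:
            return arr.astype(dtype)
    raise ValueError("values exceed the uint64 range")


print(smallest_uint_array([[1000, 2000], [3000, 4000]]).dtype)
```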
split_dict(c_space_dict)
Split a dictionary into two dictionaries based on specific criteria and return their keys.
Split the input dictionary c_space_dict into two dictionaries: one holding the items whose keys do not end with '2', and another holding the items whose keys do end with '2', stored under the key with its last character removed. The processed keys are returned as well. In the example below, the 'logical' key appears in neither output.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| c_space_dict | dict | The dictionary to be split. Expected keys are strings and values can be any type. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| first_dict | dict | Dictionary containing items from |
| second_dict | dict | Dictionary containing items from |
| c_spaces | list | List of keys from |
Examples:
>>> c_space_dict = {'key1': 10, 'key2': 20, 'logical': 30}
>>> first_dict, second_dict, c_spaces = split_dict(c_space_dict)
>>> print(first_dict)
{'key1': 10}
>>> print(second_dict)
{'key': 20}
>>> print(c_spaces)
['key1', 'key']
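The splitting rule can be sketched in a few lines of plain Python. The version below reproduces the example output above, including the assumption (inferred from that example only) that the 'logical' key is skipped; `split_dict_sketch` is a hypothetical name, not the library function.

```python
def split_dict_sketch(c_space_dict):
    """Split items by whether their key ends with '2' (illustrative only)."""
    first_dict, second_dict, c_spaces = {}, {}, []
    for key, value in c_space_dict.items():
        if key == "logical":
            # Assumption drawn from the documented example: skip this key
            continue
        if key.endswith("2"):
            # Store under the key without its trailing '2'
            second_dict[key[:-1]] = value
            c_spaces.append(key[:-1])
        else:
            first_dict[key] = value
            c_spaces.append(key)
    return first_dict, second_dict, c_spaces


print(split_dict_sketch({"key1": 10, "key2": 20, "logical": 30}))
```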
translate_dict(old_dict)
Translate a dictionary to a typed dictionary and filter out non-string values.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| old_dict | dict | The input dictionary that may contain non-string values | required |
Returns:
| Name | Type | Description |
|---|---|---|
| numba_dict | Dict | A typed dictionary containing only the items from |
Examples: