sortednp

Sortednp (sorted numpy) is a python package which provides methods to perform efficient set operations on sorted numpy arrays. This includes intersecting and merging sorted numpy arrays. The returned intersections and unions are also sorted.

Classes

Algorithms

Bases: IntEnum

Intersections are calculated by iterating both arrays. For a given element in one array, the method needs to search the other and check if the element is contained. In order to make this more efficient, we can use the fact that the arrays are sorted. There are three search methods, which can be selected via the optional keyword argument algorithm.

Attributes

BINARY_SEARCH `class-attribute` `instance-attribute`

BINARY_SEARCH = 2

Slice the remainder of the array in halves and repeat the procedure on the slice which contains the searched element. More Information.

GALLOPING_SEARCH `class-attribute` `instance-attribute`

GALLOPING_SEARCH = 3

First, search for an element linearly, doubling the step size after each step. If a step goes beyond the search element, perform a binary search between the last two positions. More Information.

SIMPLE_SEARCH `class-attribute` `instance-attribute`

SIMPLE_SEARCH = 1

Search for an element by linearly iterating over the array element-by-element. More Information.

IntersectDuplicates

Bases: IntEnum

Specify how to handle duplicated items in the input arrays when computing their intersection.

Attributes

DROP `class-attribute` `instance-attribute`

DROP = 4

With DROP, the intersection ignore any duplicated entries. The output will contain only unique values.

KEEP_MAX_N `class-attribute` `instance-attribute`

KEEP_MAX_N = 7

With KEEP_MAX_N, the intersection an item occurs n > 0 times in one input array and m > 0 times in the other array, the output will contain the item min(n, m) times.

KEEP_MIN_N `class-attribute` `instance-attribute`

KEEP_MIN_N = 5

With KEEP_MIN_N, if an item occurs n > 0 times in one input array and m > 0 times in the other array, the output will contain the item min(n, m) times.

MergeDuplicates

Bases: IntEnum

Specify how to handle duplicated items in the input arrays during merging.

Attributes

DROP `class-attribute` `instance-attribute`

DROP = 4

Ignore any duplicated elements. The output contains only unique values.

DROP_IN_INPUT `class-attribute` `instance-attribute`

DROP_IN_INPUT = 6

Ignores duplicated elements in the input arrays separately. This is the same as ensuring that each input array unique values. The output contains every value at most twice.

KEEP `class-attribute` `instance-attribute`

KEEP = 8

Keep all duplicated entries. If an item occurs n times in one input array and m times in the other input array, the output contains the item n + m times.

SubsetDuplicates

Bases: IntEnum

Procedure to handle duplicates for issubset() function.

Attributes

IGNORE `class-attribute` `instance-attribute`

IGNORE = 9

Ignore duplicates in the potential subset and the superset arrays. The issubset() function returns true if every unique values in the first array exists at least once in the superset array.

import numpy as np
import sortednp as snp

a = np.array([2, 2])
b = np.array([2])

assert snp.issubset(a, b, duplicates=snp.SubsetDuplicates.IGNORE)
assert snp.issubset(b, a, duplicates=snp.SubsetDuplicates.IGNORE)

REPEAT `class-attribute` `instance-attribute`

REPEAT = 10

Account for the number of occurrences of each item in the potential subset. The issubset() function returns true if and only if every item in the first array appears at least as often in the superset array as it is contained in the subset array.

import numpy as np
import sortednp as snp

a = np.array([2])
b = np.array([2, 2])

assert snp.issubset(a, b, duplicates=snp.SubsetDuplicates.REPEAT)
assert not snp.issubset(b, a, duplicates=snp.SubsetDuplicates.REPEAT)

Functions

intersect

intersect(
    a: np.ndarray,
    b: np.ndarray,
    algorithm=Algorithms.GALLOPING_SEARCH,
    duplicates=IntersectDuplicates.KEEP_MIN_N,
    indices: bool = False,
) -> Union[np.ndarray, Tuple[np.ndarray, Tuple[np.ndarray, np.ndarray]]]

Return a sorted array with all elements that appear in both a and b, i.e., it computes the intersection of a and b.

import numpy as np
import sortednp as snp

a = np.array([1, 3, 5])
b = np.array([1, 2, 3])
assert snp.intersect(a, b).tolist() == [1, 3]

The algorithm parameter determines the search algorithm to use. See Algorithms for possible values. The default is GALLOPING_SEARCH.

If duplicates is set to DROP, the return value contains only unique values that appear in both input arrays. This behavior represents mathematical set intersection and ignores any duplication in the input.

a = np.array([2, 2])
b = np.array([2, 2, 2])

intersection = snp.intersect(
    a, b,
    duplicates=snp.IntersectDuplicates.DROP
)
assert intersection.tolist() == [2]

If duplicates is set to KEEP_MIN_N, the return value contains each item min(n, m) times, where n and m are the number of occurrences in array a and b respectively. The rational is that each duplicated item in the input arrays matches exactly once with an item in the other array. For example, consider:

a = np.array([2, 2])
b = np.array([2, 2, 2])

intersection = snp.intersect(
    a, b,
    # That's the default behavior
    duplicates=snp.IntersectDuplicates.KEEP_MIN_N
)
assert intersection.tolist() == [2, 2]

The first 2 in array a matches with the first 2 in array b. The second 2 in array a matches with the second 2 in array b. However, the third 2 in array b has no match in array a. The previous duplications are already taken.

If duplicates is set to KEEP_MAX_N, the return value contains each item max(n, m) times, where n and m are the number of occurrences in array a and b respectively. The rational is that duplicates items in one input array can all match with a single item in the other array. For example, consider:

a = np.array([2, 2])
b = np.array([2, 2, 2])

intersection = snp.intersect(
    a, b,
    duplicates=snp.IntersectDuplicates.KEEP_MAX_N
)
assert intersection.tolist() == [2, 2, 2]

For every occurrence of 2 in array b, we can verify that the element, i.e., 2, is contained in array a.

If the optional parameter indices is set to True, the function returns the intersection array and a tuple of arrays with integer indices. For each element in the intersection, the corresponding indices point to the position where the element is contained in the input arrays. Assuming intersection, (indices_a, indices_b) = snp.intersect(a, b, indices=True), the following conditions hold for all i in the valid range.

intersection[i] == a[indices_a[i]]
intersection[i] == b[indices_b[i]]

Parameters:

Name	Type	Description	Default
`a`	`ndarray`	First array to intersect	required
`b`	`ndarray`	Second array to intersect	required
`algorithm`	`Algorithms`	Search algorithm to search common items in the arrays	`GALLOPING_SEARCH`
`duplicates`	`IntersectDuplicates`	Specifies how to handle duplicated items in the input arrays	`KEEP_MIN_N`
`indices`	`bool`	If True, return the indices of the intersection items in the input arrays.	`False`

Returns:

Type	Description
`Union[ndarray, tuple[ndarray, tuple[ndarray, ndarray]]]`	The intersection of the input arrays. If `indices` is set to `True`, the function returns a tuple containing the intersection array and a tuple of arrays with integer indices.

Source code in sortednp/__init__.py

def intersect(
    a: np.ndarray,
    b: np.ndarray,
    algorithm=Algorithms.GALLOPING_SEARCH,
    duplicates=IntersectDuplicates.KEEP_MIN_N,
    indices: bool = False,
) -> Union[np.ndarray, Tuple[np.ndarray, Tuple[np.ndarray, np.ndarray]]]:
    """
    Return a sorted array with all elements that appear in both a and b, i.e.,
    it computes the intersection of a and b.

    ```python
    import numpy as np
    import sortednp as snp

    a = np.array([1, 3, 5])
    b = np.array([1, 2, 3])
    assert snp.intersect(a, b).tolist() == [1, 3]
    ```

    The `algorithm` parameter determines the search algorithm to use. See
    [Algorithms][sortednp.Algorithms] for possible values. The default is
    [GALLOPING_SEARCH][sortednp.Algorithms.GALLOPING_SEARCH].

    If `duplicates` is set to [DROP][sortednp.IntersectDuplicates.DROP], the
    return value contains only unique values that appear in both input arrays.
    This behavior represents mathematical set intersection and ignores any
    duplication in the input.

    ```python
    a = np.array([2, 2])
    b = np.array([2, 2, 2])

    intersection = snp.intersect(
        a, b,
        duplicates=snp.IntersectDuplicates.DROP
    )
    assert intersection.tolist() == [2]
    ```

    If `duplicates` is set to
    [KEEP_MIN_N][sortednp.IntersectDuplicates.KEEP_MIN_N], the return value
    contains each item `min(n, m)` times, where `n` and `m` are the number of
    occurrences in array `a` and `b` respectively. The rational is that
    each duplicated item in the input arrays matches exactly once with an item
    in the other array. For example, consider:

    ```python
    a = np.array([2, 2])
    b = np.array([2, 2, 2])

    intersection = snp.intersect(
        a, b,
        # That's the default behavior
        duplicates=snp.IntersectDuplicates.KEEP_MIN_N
    )
    assert intersection.tolist() == [2, 2]
    ```

    The first `2` in array `a` matches with the first `2` in array `b`. The
    second `2` in array `a` matches with the second `2` in array `b`. However,
    the third `2` in array `b` has no match in array `a`. The previous
    duplications are already *taken*.

    If `duplicates` is set to
    [KEEP_MAX_N][sortednp.IntersectDuplicates.KEEP_MAX_N], the return value
    contains each item `max(n, m)` times, where `n` and `m` are the number of
    occurrences in array `a` and `b` respectively. The rational is that
    duplicates items in one input array can all match with a single item in the
    other array. For example, consider:

    ```python
    a = np.array([2, 2])
    b = np.array([2, 2, 2])

    intersection = snp.intersect(
        a, b,
        duplicates=snp.IntersectDuplicates.KEEP_MAX_N
    )
    assert intersection.tolist() == [2, 2, 2]
    ```

    For every occurrence of `2` in array `b`, we can verify that the element,
    i.e., `2`, is contained in array `a`.

    If the optional parameter `indices` is set to `True`, the function returns
    the intersection array and a tuple of arrays with integer indices. For
    each element in the intersection, the corresponding indices point to the
    position where the element is contained in the input arrays. Assuming
    `intersection, (indices_a, indices_b) = snp.intersect(a, b, indices=True)`,
    the following conditions hold for all `i` in the valid range.

    ```python
    intersection[i] == a[indices_a[i]]
    intersection[i] == b[indices_b[i]]
    ```

    Parameters
    ----------
    a : np.ndarray
        First array to intersect
    b : np.ndarray
        Second array to intersect
    algorithm : Algorithms
        Search algorithm to search common items in the arrays
    duplicates : IntersectDuplicates
        Specifies how to handle duplicated items in the input arrays
    indices : bool
        If True, return the indices of the intersection items in the input
        arrays.

    Returns
    -------
    Union[np.ndarray, tuple[np.ndarray, tuple[np.ndarray, np.ndarray]]]
        The intersection of the input arrays. If `indices` is set to `True`,
        the function returns a tuple containing the intersection array and a
        tuple of arrays with integer indices.
    """

    return _internal.intersect(
        a, b, algorithm=algorithm, duplicates=duplicates, indices=indices
    )

isitem

isitem(element: Any, array: np.ndarray, algorithm=Algorithms.GALLOPING_SEARCH)

Returns True if element is contained in the array, otherwise False.

import numpy as np
import sortednp as snp

a = np.array([1, 2, 3])
assert snp.isitem(1, a)
assert not snp.isitem(4, a)

The algorithm parameter determines the search algorithm to use. See Algorithms for possible values. The default is GALLOPING_SEARCH.

Warning

Please note that the search element is casted to the numpy type of the given array. This might lead to unexpected results if the element is outside the domain of the date type.

a = np.array([1, 2, 255], dtype=np.unit8)
assert snp.isitem(-1, a)  # -1 turns into 255

Future numpy versions will raise an Exception.

Parameters:

Name	Type	Description	Default
`element`	`Any`	The element to search in array	required
`array`	`ndarray`	The sorted array to search the element. Adjacent items must be in ascending order. The behavior of the function is undefined if the array is not sorted.	required
`algorithm`	`int`	The algorithm to use for the search. See Algorithms for possible values.	`GALLOPING_SEARCH`

Returns:

Type	Description
`bool`	True if subset is a subset of array, otherwise False.

Source code in sortednp/__init__.py

def isitem(
    element: Any, array: np.ndarray, algorithm=Algorithms.GALLOPING_SEARCH
):
    """
    Returns True if element is contained in the array, otherwise False.

    ```python
    import numpy as np
    import sortednp as snp

    a = np.array([1, 2, 3])
    assert snp.isitem(1, a)
    assert not snp.isitem(4, a)
    ```

    The algorithm parameter determines the search algorithm to use. See
    [Algorithms][sortednp.Algorithms] for possible values. The default is
    [GALLOPING_SEARCH][sortednp.Algorithms.GALLOPING_SEARCH].

    !!! warning
        Please note that the search element is casted to the numpy type of the
        given array. This might lead to unexpected results if the element is
        outside the domain of the date type.

        ```python
        a = np.array([1, 2, 255], dtype=np.unit8)
        assert snp.isitem(-1, a)  # -1 turns into 255
        ```

        [Future numpy versions](https://numpy.org/devdocs/release/1.24.0-notes.
        html#conversion-of-out-of-bound-python-integers)
        will raise an Exception.

    Parameters
    ----------
    element : Any
        The element to search in array
    array : np.ndarray
        The sorted array to search the element. Adjacent items must be in
        ascending order. The behavior of the function is undefined if the array
        is not sorted.
    algorithm : int
        The algorithm to use for the search. See
        [Algorithms][sortednp.Algorithms] for possible values.

    Returns
    -------
    bool
        True if subset is a subset of array, otherwise False.
    """
    return issubset(
        np.array([element], dtype=array.dtype), array, algorithm=algorithm
    )

issubset

issubset(
    subset: np.ndarray,
    array: np.ndarray,
    algorithm: Algorithms = Algorithms.GALLOPING_SEARCH,
    duplicates: SubsetDuplicates = SubsetDuplicates.IGNORE,
)

Returns True if all values in subset appear in array, i.e., return True if subset is a subset of array, otherwise False.

import numpy as np
import sortednp as snp

a = np.array([1, 3])
b = np.array([1, 2, 3])
assert snp.issubset(a, b)

The algorithm parameter determines the search algorithm to use. See Algorithms for possible values. The default is GALLOPING_SEARCH.

If duplicates is SubsetDuplicates.IGNORE, returns True if and only if each unique element in subset appears at least once in array.

a = np.array([2])
b = np.array([2, 2])
assert snp.issubset(a, b, duplicates=snp.SubsetDuplicates.IGNORE)
assert snp.issubset(b, a, duplicates=snp.SubsetDuplicates.IGNORE)

If duplicates is SubsetDuplicates.REPEAT, returns True if and only if each unique element in subset appears at least as often in array as it appears in subset.

a = np.array([2])
b = np.array([2, 2])
assert snp.issubset(a, b, duplicates=snp.SubsetDuplicates.REPEAT)
assert not snp.issubset(b, a, duplicates=snp.SubsetDuplicates.REPEAT)

Parameters:

Name	Type	Description	Default
`subset`	`ndarray`	The potential subset in question.	required
`array`	`ndarray`	The sorted array to check if subset is a true subset. Adjacent items must be in ascending order. The behavior of the function is undefined if the array is not sorted.	required
`algorithm`	`int`	The algorithm to use for the search. See Algorithms for possible values.	`GALLOPING_SEARCH`
`duplicates`	`int`	How to handle duplicates. See Duplicates for possible values.	`IGNORE`

Returns:

Type	Description
`bool`	True if subset is a subset of array, otherwise False.

Source code in sortednp/__init__.py

def issubset(
    subset: np.ndarray,
    array: np.ndarray,
    algorithm: Algorithms = Algorithms.GALLOPING_SEARCH,
    duplicates: SubsetDuplicates = SubsetDuplicates.IGNORE,
):
    """
    Returns True if all values in `subset` appear in `array`,
    i.e., return True if `subset` is a subset of `array`, otherwise False.

    ```python
    import numpy as np
    import sortednp as snp

    a = np.array([1, 3])
    b = np.array([1, 2, 3])
    assert snp.issubset(a, b)
    ```

    The `algorithm` parameter determines the search algorithm to use. See
    [Algorithms][sortednp.Algorithms] for possible values. The default is
    [GALLOPING_SEARCH][sortednp.Algorithms.GALLOPING_SEARCH].

    If `duplicates` is
    [SubsetDuplicates.IGNORE][sortednp.SubsetDuplicates.IGNORE], returns True
    if and only if each unique element in `subset` appears at least once in
    `array`.

    ```python
    a = np.array([2])
    b = np.array([2, 2])
    assert snp.issubset(a, b, duplicates=snp.SubsetDuplicates.IGNORE)
    assert snp.issubset(b, a, duplicates=snp.SubsetDuplicates.IGNORE)
    ```

    If `duplicates` is
    [SubsetDuplicates.REPEAT][sortednp.SubsetDuplicates.REPEAT], returns True
    if and only if each unique element in `subset` appears at least as often
    in `array` as it appears in `subset`.

    ```python
    a = np.array([2])
    b = np.array([2, 2])
    assert snp.issubset(a, b, duplicates=snp.SubsetDuplicates.REPEAT)
    assert not snp.issubset(b, a, duplicates=snp.SubsetDuplicates.REPEAT)
    ```

    Parameters
    ----------
    subset : np.ndarray
        The potential subset in question.
    array : np.ndarray
        The sorted array to check if subset is a true subset. Adjacent items
        must be in ascending order. The behavior of the function is undefined
        if the array is not sorted.
    algorithm : int
        The algorithm to use for the search. See Algorithms for possible
        values.
    duplicates : int
        How to handle duplicates. See Duplicates for possible values.

    Returns
    -------
    bool
        True if subset is a subset of array, otherwise False.
    """
    return _internal.issubset(
        subset, array, algorithm=algorithm, duplicates=duplicates
    )

kway_intersect

kway_intersect(
    *arrays: Sequence[DeferredArray], assume_sorted: bool = True, **kwds
)

Return a sorted array containing the intersection of all input arrays.

If the optional flag assume_sorted is set to False, the function sorts the arrays prior to intersecting. The arrays are order by increasing size before starting to intersection the arrays one-by-one.

The method raises a TypeError, if the array count is zero.

The arguments can load the arrays on the fly. If an argument is callable, its return value is used as an array instead of the argument itself. This makes it possible to load one array after another to avoid having all arrays in memory at the same time. Arrays deferred in this way, are loaded only after all in-memory arrays are intersected. The computed intersection is empty at any point, the function stops and returns an empty array.

Note

Note on the performance: The function intersects the arrays one-by-one. This is not the most performant implementation.

Parameters:

Name	Type	Description	Default
`*arrays`	`ndarray or callable`	The arrays to intersect as numpy arrays or deferred arrays.	`()`
`assume_sorted`	`bool`	If True, the function assumes that the arrays are sorted. If False, the function sorts the arrays before intersecting. The default is True.	`True`
`**kwds`	`dict`	Any other keyword argument is passed to the native intersect function.	`{}`

Returns:

Type	Description
`ndarray`	The union array.

Source code in sortednp/kway.py

def intersect(
    *arrays: Sequence[DeferredArray], assume_sorted: bool = True, **kwds
):
    """
    Return a sorted array containing the intersection of all input arrays.

    If the optional flag assume_sorted is set to False, the function sorts the
    arrays prior to intersecting. The arrays are order by increasing size
    before starting to intersection the arrays one-by-one.

    The method raises a TypeError, if the array count is zero.

    The arguments can load the arrays on the fly. If an argument is callable,
    its return value is used as an array instead of the argument itself. This
    makes it possible to load one array after another to avoid having all
    arrays in memory at the same time. Arrays deferred in this way, are loaded
    only after all in-memory arrays are intersected. The computed intersection
    is empty at any point, the function stops and returns an empty array.

    !!! note
        Note on the performance: The function intersects the arrays one-by-one.
        This is not the most performant implementation.

    Parameters
    ----------
    *arrays : np.ndarray or callable
        The arrays to intersect as numpy arrays or deferred arrays.
    assume_sorted : bool, optional
        If True, the function assumes that the arrays are sorted. If False, the
        function sorts the arrays before intersecting. The default is True.
    **kwds : dict
        Any other keyword argument is passed to the native
        [intersect][sortednp.intersect] function.

    Returns
    -------
    np.ndarray
        The union array.
    """

    if not arrays:
        raise TypeError("Merge expects at least one array.")

    # start with smallest non-callable
    inf = float("inf")
    len_array = [(inf if callable(a) else len(a), a) for a in arrays]
    len_array = sorted(len_array, key=lambda x: x[0])
    arrays = [a for _, a in len_array]

    intersect_result = arrays.pop()
    intersect_result = resolve(intersect_result)
    if not assume_sorted:
        intersect_result.sort()
    for array in arrays:
        if len(intersect_result) == 0:
            break
        array = resolve(array)
        if not assume_sorted:
            array.sort()
        intersect_result = _internal.intersect(intersect_result, array, **kwds)
    return intersect_result

kway_merge

kway_merge(
    *arrays: Sequence[DeferredArray], assume_sorted: bool = True, **kwds
)

Return a sorted array containing the union of all input arrays.

If the optional flag assume_sorted is set to False, the function sorts the arrays before merging.

The method raises a TypeError, if the array count is zero.

The arguments can load the arrays on the fly. If an argument is callable, its return value is used as an array instead of the argument itself. This makes it possible to load one array after another to avoid having all arrays in memory at the same time.

Note

Note on the performance: The function merges the arrays one-by-one. This is not the most performant implementation.

Parameters:

Name	Type	Description	Default
`*arrays`	`ndarray or callable`	The arrays to merge as numpy arrays or deferred arrays.	`()`
`assume_sorted`	`bool`	If True, the function assumes that the arrays are sorted. If False, the function sorts the arrays before merging. The default is True.	`True`
`**kwds`	`dict`	Any other keyword argument is passed to the native merge function.	`{}`

Returns:

Type	Description
`ndarray`	The merged array.

Source code in sortednp/kway.py

def merge(
    *arrays: Sequence[DeferredArray], assume_sorted: bool = True, **kwds
):
    """
    Return a sorted array containing the union of all input arrays.

    If the optional flag assume_sorted is set to False, the function sorts the
    arrays before merging.

    The method raises a TypeError, if the array count is zero.

    The arguments can load the arrays on the fly. If an argument is callable,
    its return value is used as an array instead of the argument itself. This
    makes it possible to load one array after another to avoid having all
    arrays in memory at the same time.

    !!! note
        Note on the performance: The function merges the arrays one-by-one.
        This is not the most performant implementation.

    Parameters
    ----------
    *arrays : np.ndarray or callable
        The arrays to merge as numpy arrays or deferred arrays.
    assume_sorted : bool, optional
        If True, the function assumes that the arrays are sorted. If False, the
        function sorts the arrays before merging. The default is True.
    **kwds : dict
        Any other keyword argument is passed to the native
        [merge][sortednp.merge] function.

    Returns
    -------
    np.ndarray
        The merged array.
    """
    if not arrays:
        raise TypeError("Merge expects at least one array.")

    arrays: list = list(arrays)
    merge_result = arrays.pop()
    merge_result = resolve(merge_result)
    if not assume_sorted:
        merge_result.sort()
    for array in arrays:
        array = resolve(array)
        if not assume_sorted:
            array.sort()
        merge_result = _internal.merge(merge_result, array, **kwds)
    return merge_result

merge

merge(
    a: np.ndarray,
    b: np.ndarray,
    duplicates=MergeDuplicates.KEEP,
    indices: bool = False,
)

Return a sorted array containing all elements from both arrays.

import numpy as np
import sortednp as snp

a = np.array([2, 4])
b = np.array([1, 2, 3])
assert snp.merge(a, b).tolist() == [1, 2, 2, 3, 4]

The algorithm parameter determines the search algorithm to use. See Algorithms for possible values. The default is GALLOPING_SEARCH.

If duplicates is set to DROP, the any duplicates in the merged output are omitted. The return values only contains unique values.

a = np.array([2, 2, 2])
b = np.array([2, 2, 2, 2])

merged = snp.merge(
    a, b,
    sortednp.MergeDuplicates.DROP
)

assert merged.tolist() == [2]

If duplicates is set to DROP_IN_INPUT, the output is obtained by merging the unique values from the input arrays. Values occur at most twice in the output. A value appearing twice in the output implies it exists in both input arrays.

a = np.array([2, 2, 2, 3, 3])
b = np.array([2, 2, 2, 2])

merged = snp.merge(
    a, b,
    sortednp.MergeDuplicates.KEEP_IN_INPUT
)

assert merged.tolist() == [2, 2, 3]

If duplicates is set to DROP, the retains any duplication in the input arrays. If an item occurs n times in one input array and m times in the other input array, the output contains the item n + m times.

a = np.array([2, 2, 3, 3])
b = np.array([2, 2, 2])

merged = snp.merge(
    a, b,
    sortednp.MergeDuplicates.KEEP_IN_INPUT
)

assert merged.tolist() == [2, 2, 2, 2, 2, 3, 3]

If the optional parameter indices is set to True, the function returns the union array and a tuple of arrays with integer indices. For each element in the union, the corresponding indices point to the position where the element is contained in the input arrays. Assuming union, (indices_a, indices_b) = snp.merge(a, b, indices=True), the following conditions hold for all i in the valid range.

union[i] == a[indices_a[i]]
union[i] == b[indices_b[i]]

Parameters:

Name	Type	Description	Default
`a`	`ndarray`	First array to merge	required
`b`	`ndarray`	Second array to merge	required
`duplicates`	`MergeDuplicates`	Specifies how to handle duplicated items in the input arrays	`KEEP`
`indices`	`bool`	If True, return the indices of the union items in the input arrays.	`False`

Returns:

Type	Description
`Union[ndarray, tuple[ndarray, tuple[ndarray, ndarray]]]`	The union of the input arrays. If `indices` is set to `True`, the function returns a tuple containing the union array and a tuple of arrays with integer indices.

Source code in sortednp/__init__.py

def merge(
    a: np.ndarray,
    b: np.ndarray,
    duplicates=MergeDuplicates.KEEP,
    indices: bool = False,
):
    """
    Return a sorted array containing all elements from both arrays.

    ```python
    import numpy as np
    import sortednp as snp

    a = np.array([2, 4])
    b = np.array([1, 2, 3])
    assert snp.merge(a, b).tolist() == [1, 2, 2, 3, 4]
    ```

    The `algorithm` parameter determines the search algorithm to use. See
    [Algorithms][sortednp.Algorithms] for possible values. The default is
    [GALLOPING_SEARCH][sortednp.Algorithms.GALLOPING_SEARCH].

    If `duplicates` is set to [DROP][sortednp.MergeDuplicates.DROP], the
    any duplicates in the merged output are omitted. The return values
    only contains unique values.

    ```python
    a = np.array([2, 2, 2])
    b = np.array([2, 2, 2, 2])

    merged = snp.merge(
        a, b,
        sortednp.MergeDuplicates.DROP
    )

    assert merged.tolist() == [2]
    ```

    If `duplicates` is set to
    [DROP_IN_INPUT][sortednp.MergeDuplicates.DROP_IN_INPUT], the output is
    obtained by merging the unique values from the input arrays. Values occur
    at most twice in the output. A value appearing twice in the output implies
    it exists in both input arrays.

    ```python
    a = np.array([2, 2, 2, 3, 3])
    b = np.array([2, 2, 2, 2])

    merged = snp.merge(
        a, b,
        sortednp.MergeDuplicates.KEEP_IN_INPUT
    )

    assert merged.tolist() == [2, 2, 3]
    ```

    If `duplicates` is set to [DROP][sortednp.MergeDuplicates.KEEP], the
    retains any duplication in the input arrays. If an item occurs `n` times
    in one input array and `m` times in the other input array, the output
    contains the item `n + m` times.

    ```python
    a = np.array([2, 2, 3, 3])
    b = np.array([2, 2, 2])

    merged = snp.merge(
        a, b,
        sortednp.MergeDuplicates.KEEP_IN_INPUT
    )

    assert merged.tolist() == [2, 2, 2, 2, 2, 3, 3]
    ```

    If the optional parameter `indices` is set to `True`, the function returns
    the union array and a tuple of arrays with integer indices. For
    each element in the union, the corresponding indices point to the
    position where the element is contained in the input arrays. Assuming
    `union, (indices_a, indices_b) = snp.merge(a, b, indices=True)`,
    the following conditions hold for all `i` in the valid range.

    ```python
    union[i] == a[indices_a[i]]
    union[i] == b[indices_b[i]]
    ```

    Parameters
    ----------
    a : np.ndarray
        First array to merge
    b : np.ndarray
        Second array to merge
    duplicates : MergeDuplicates
        Specifies how to handle duplicated items in the input arrays
    indices : bool
        If True, return the indices of the union items in the input
        arrays.

    Returns
    -------
    Union[np.ndarray, tuple[np.ndarray, tuple[np.ndarray, np.ndarray]]]
        The union of the input arrays. If `indices` is set to `True`,
        the function returns a tuple containing the union array and a
        tuple of arrays with integer indices.
    """
    return _internal.merge(a, b, duplicates=duplicates, indices=indices)

resolve

resolve(obj: DeferredArray) -> np.ndarray

Helper function to check whether the given object is callable. If yes, return its return value, otherwise return the object itself.

This function is used by the package to load large datasets on demand to avoid out-of-memory scenarios.

Source code in sortednp/kway.py

def resolve(obj: DeferredArray) -> np.ndarray:
    """
    Helper function to check whether the given object is callable. If yes,
    return its return value, otherwise return the object itself.

    This function is used by the package to load large datasets on demand to
    avoid out-of-memory scenarios.
    """
    return obj() if callable(obj) else obj

sortednp

Classes

Algorithms

Attributes

BINARY_SEARCH class-attribute instance-attribute

GALLOPING_SEARCH class-attribute instance-attribute

SIMPLE_SEARCH class-attribute instance-attribute

IntersectDuplicates

Attributes

DROP class-attribute instance-attribute

KEEP_MAX_N class-attribute instance-attribute

KEEP_MIN_N class-attribute instance-attribute

MergeDuplicates

Attributes

DROP class-attribute instance-attribute

DROP_IN_INPUT class-attribute instance-attribute

KEEP class-attribute instance-attribute

SubsetDuplicates

Attributes

IGNORE class-attribute instance-attribute

REPEAT class-attribute instance-attribute

Functions

intersect

isitem

issubset

kway_intersect

kway_merge

merge

resolve

BINARY_SEARCH `class-attribute` `instance-attribute`

GALLOPING_SEARCH `class-attribute` `instance-attribute`

SIMPLE_SEARCH `class-attribute` `instance-attribute`

DROP `class-attribute` `instance-attribute`

KEEP_MAX_N `class-attribute` `instance-attribute`

KEEP_MIN_N `class-attribute` `instance-attribute`

DROP `class-attribute` `instance-attribute`

DROP_IN_INPUT `class-attribute` `instance-attribute`

KEEP `class-attribute` `instance-attribute`

IGNORE `class-attribute` `instance-attribute`

REPEAT `class-attribute` `instance-attribute`