WIP: Allow CuPy #212

jakirkham wants to merge 3 commits into zarr-developers:master from jakirkham:allow_cupy
Conversation
As CuPy arrays (currently) do not have a `tobytes` method, work around this in `ensure_bytes` by coercing CuPy arrays back to NumPy arrays before calling `tobytes`. This way it can just use NumPy's `tobytes` method. Should be unneeded in a future version of CuPy.
As the decoding algorithm requires something that supports the buffer protocol and assumes it is working on host memory, this won't work on a CuPy array. So convert the CuPy array to a NumPy array first so that it can proceed through the usual decoding path.
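As a rough sketch of the coercion described above (the helper name mirrors the `ensure_bytes` function under discussion, but this body is illustrative; the `__cuda_array_interface__` check stands in for an explicit `cupy` import):

```python
import numpy as np

def ensure_bytes(buf):
    # Illustrative stand-in for the helper discussed above.
    # Device arrays (e.g. CuPy) expose `__cuda_array_interface__` and a
    # `.get()` method that copies them back to host memory.
    if hasattr(buf, "__cuda_array_interface__"):
        buf = buf.get()
    # NumPy's `tobytes` then works on the host copy.
    return np.asarray(buf).tobytes()
```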
jakirkham
left a comment
Have written a few things below to save some thoughts here and hopefully guide the discussion should others be interested.
```python
# already a numpy array
if isinstance(buf, (np.ndarray, cp.ndarray)):
    # already an array
    arr = buf
```
This is easy enough to do, but it would be nice if we didn't need to learn about CuPy here.
We will probably need to find another solution for Numcodecs in the near term, but I wanted to mention some related things. Namely, there is some discussion about generically checking array types in issue ( numpy/numpy#14891 ).
Also mentioned by Matt in the aforementioned issue, Dask has `is_arraylike`, which presents one way of solving this problem. I'm not sure it totally helps us here, as we really need to check that there is a data buffer we can grab and use, but it is at least a good starting point.
Another thought I had is we could register ndarray types and their conversion mechanisms with Numcodecs. This would allow users to add their own array types (like CuPy) without complicating things too much here.
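A sketch of what such a registry could look like (all names here are hypothetical, not an existing Numcodecs API):

```python
import numpy as np

# Hypothetical registry: (predicate, converter) pairs consulted in order.
_array_converters = []

def register_ndarray_type(predicate, converter):
    """Register a type check and a to-NumPy conversion for an array type."""
    _array_converters.append((predicate, converter))

def to_numpy(buf):
    """Convert a registered array type to NumPy, else fall back to asarray."""
    for predicate, converter in _array_converters:
        if predicate(buf):
            return converter(buf)
    return np.asarray(buf)

# A CuPy user could then opt in without Numcodecs importing cupy:
# register_ndarray_type(
#     lambda a: hasattr(a, "__cuda_array_interface__"),
#     lambda a: a.get(),
# )
```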
Other thoughts on this would generally be welcome. 🙂
> as we really need to check that there is a data buffer we can grab and use, but it is at least a good starting point.

Perhaps one could check for the presence of an `__array__` or `__cuda_array_interface__` protocol?
That's a good point. Thanks! Will give that a look.
Yep, if the goal here is to ultimately enable codecs with GPU implementations, then using `__cuda_array_interface__` sounds good. E.g., when using Numba CUDA functions you get back an instance of some other class (`DeviceArray` or something like that IIRC), but that exports `__cuda_array_interface__` too.
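A sketch of protocol-based detection along these lines (the helper name is made up for illustration):

```python
def describe_buffer(obj):
    # Classify an array-like by the protocols it exports.
    if hasattr(obj, "__cuda_array_interface__"):
        return "cuda"   # device memory: copy to host before decoding
    if hasattr(obj, "__array_interface__") or hasattr(obj, "__array__"):
        return "host"   # host memory reachable via NumPy
    return "unknown"
```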
```python
# because they don't have a `tobytes` method (yet)
# xref: https://github.com/cupy/cupy/pull/2617
if isinstance(arr, cp.ndarray):
    arr = arr.get()
```
As noted, this is a workaround. The real solution here is PR ( cupy/cupy#2617 ), which should land in CuPy 7.0.0. IOW this can probably be ignored for the sake of this discussion.
```python
# Force CuPy arrays to NumPy arrays
# as they support the buffer protocol
if isinstance(s, cp.ndarray):
    s = s.get()
```
One way to solve this might be to call `ensure_bytes` here. That would guarantee we have something we can perform text decoding on. It would also generalize for our use case here.
On the flip side that does incur a copy. Though this may be pretty inexpensive in the grand scheme of things.
I think in Dask we do a check like `if hasattr(s, "get"): s = s.get()`. This is a little non-specific, but it works well in practice and doesn't require the `cupy` import.
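A sketch of that duck-typed check (hypothetical helper; the extra `__cuda_array_interface__` test guards against unrelated objects that happen to have a `get` method, e.g. dicts):

```python
def ensure_host(s):
    # Dask-style duck typing: device arrays expose `.get()` to copy to host.
    # `get` alone is non-specific (dicts have one too), so also require the
    # CUDA array interface before calling it.
    if hasattr(s, "get") and hasattr(s, "__cuda_array_interface__"):
        s = s.get()
    return s
```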
Just for context, this is what the `ensure_bytes` change would look like ( #213 ).
It's also possible we can get away with dropping this part entirely. This function is mainly used in conjunction with reading things (like JSON). So ensuring it can work with CuPy is probably unnecessary.
Yes, not sure you need to worry about CuPy arrays here, as this is mainly for JSON reading IIRC.
mrocklin
left a comment
Some thoughts:
- I'm shocked at how small this is
- I agree that it would be good to avoid explicitly importing cupy
- My guess is that we can get around this with mild creativity
Yeah, the short nature of this change is more a consequence of work Alistair and I did last year (interestingly also in November 🙂) ( #128 ). Thanks for the tips. I'm inclined to agree some small tweaks probably make this easier to integrate.
This adds a small change to handle CuPy `ndarray`s in the `compat` functions. Admittedly we may come up with a better way to address the underlying concern here, but I wanted to have this here for tracking. Also it would be useful to have somewhere to discuss this.

TODO:

- `tox -e py38` passes locally
- `tox -e py27` passes locally
- `tox -e docs` passes locally