Einsums in Python#

As mentioned in the introduction, Einsums is compatible with Python. Here, we describe the basics of setting up a computation in Python, as well as some useful functions.

Running Einsums#

In order to do a calculation using Einsums, we need to import it.

>>> import einsums as ein

From here, we can define some data. The core Einsums functions work with our own tensor classes, their children, and anything that implements the Python buffer protocol, including numpy.ndarray. To start, we need to compile a plan. This is where Einsums decides on any optimizations that can be made, such as restructuring a tensor so that the operation can be dispatched to BLAS. Here is an example of a matrix multiplication.

>>> import einsums as ein
>>> import numpy as np
>>> plan = ein.core.compile_plan("ij", "ik", "kj")
>>> A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
>>> B = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
>>> C = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0]], dtype=float)
>>> plan.execute(0, C, 1, A, B) # Compute C = 0 * C + 1 * A @ B
>>> print(C)
[[ 30.  36.  42.]
 [ 66.  81.  96.]
 [102. 126. 150.]]

And that’s it. Now you know the basics of running Einsums.
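
Plans are not limited to matrix multiplication. As a hedged sketch, assuming the output-indices-first convention shown above generalizes to other ranks (the names plan_mv, A2, v, and w are purely illustrative), a matrix-vector product could be compiled the same way:

>>> plan_mv = ein.core.compile_plan("i", "ij", "j")
>>> A2 = np.array([[1.0, 2.0], [3.0, 4.0]])
>>> v = np.array([1.0, 1.0])
>>> w = np.zeros(2)
>>> plan_mv.execute(0, w, 1, A2, v) # Compute w = 0 * w + 1 * A2 @ v
>>> print(w)
[3. 7.]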

GPU Acceleration#

One of the big features of Einsums is its GPU acceleration using AMD’s HIP language. To make this feature accessible to even the most novice of users, we provide an interface that should be very easy to use: the einsums.core.GPUView class. Objects of this class wrap a buffer object and handle data transfers from the CPU to the GPU and back. Starting from the definitions in the previous example, we wrap each buffer in a GPUView before execution.

>>> A_view = ein.core.GPUView(A, ein.core.COPY) # Copy the data into the GPU.
>>> B_view = ein.core.GPUView(B, ein.core.COPY)
>>> C_view = ein.core.GPUView(C, ein.core.COPY)
>>> plan.execute(0, C_view, 1, A_view, B_view)
>>> C_view.update_D2H() # Copy the data from the GPU back to the host.

As we can see, there is very little difference between the CPU and GPU code. One very important thing to note: to reduce the number of memory operations, memory is NOT AUTOMATICALLY SYNCHRONIZED after calls to execute when the buffers are wrapped using the einsums.core.COPY mode. Thus, to maintain data validity, call einsums.core.GPUView.update_H2D() whenever you modify a buffer on the host before the next call to execute, and call einsums.core.GPUView.update_D2H() whenever you modify a buffer on the GPU before you access it on the host. If we extend the example above with print statements, we can see this in more detail.

>>> import einsums as ein
>>> import numpy as np
>>> plan = ein.core.compile_plan("ij", "ik", "kj")
>>> A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
>>> B = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
>>> C = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0]], dtype=float)
>>> A_view = ein.core.GPUView(A, ein.core.COPY) # Copy the data into the GPU.
>>> B_view = ein.core.GPUView(B, ein.core.COPY)
>>> C_view = ein.core.GPUView(C, ein.core.COPY)
>>> # At this point, the data is all synchronized, since view creation performs synchronization.
>>> plan.execute(0, C_view, 1, A_view, B_view)
>>> # After this call to execute, C has become desynchronized.
>>> print(C)
[[0. 0. 0.]
 [0. 0. 0.]
 [0. 0. 0.]]
>>> C_view.update_D2H() # Bring C back into synchronization.
>>> print(C)
[[ 30.  36.  42.]
 [ 66.  81.  96.]
 [102. 126. 150.]]
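
The opposite direction works the same way: if a buffer is modified on the host, update_H2D() pushes the change to the GPU before the next execute. Here is a minimal sketch continuing the session above:

>>> A[0, 0] = 10.0 # Modify A on the host...
>>> A_view.update_H2D() # ...so push the change to the GPU before executing again.
>>> plan.execute(0, C_view, 1, A_view, B_view)
>>> C_view.update_D2H() # Bring the new result back to the host.
>>> print(C[0])
[39. 54. 69.]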

These update calls are not needed if the data has been wrapped using the einsums.core.MAP mode. However, this mode tends to be very slow, since data is synchronized whenever the GPU encounters a cache miss, which happens very often for large tensors.

>>> import einsums as ein
>>> import numpy as np
>>> plan = ein.core.compile_plan("ij", "ik", "kj")
>>> A = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
>>> B = np.array([[1, 2, 3], [4, 5, 6], [7, 8, 9]], dtype=float)
>>> C = np.array([[0, 0, 0], [0, 0, 0], [0, 0, 0]], dtype=float)
>>> A_view = ein.core.GPUView(A, ein.core.MAP) # Map the data into the GPU's virtual memory
>>> B_view = ein.core.GPUView(B, ein.core.MAP)
>>> C_view = ein.core.GPUView(C, ein.core.MAP)
>>> # At this point, the data is all synchronized, since view creation performs synchronization.
>>> plan.execute(0, C_view, 1, A_view, B_view)
>>> # Since C is mapped into virtual memory, C will already be synchronized.
>>> print(C)
[[ 30.  36.  42.]
 [ 66.  81.  96.]
 [102. 126. 150.]]
>>> C_view.update_D2H() # Does nothing when wrapped with ein.core.MAP.
>>> print(C)
[[ 30.  36.  42.]
 [ 66.  81.  96.]
 [102. 126. 150.]]
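
Because the buffer is mapped rather than copied, host-side writes are likewise visible to the GPU without an explicit update_H2D() call. A minimal sketch continuing the session above:

>>> A[0, 0] = 10.0 # Modify A on the host; no update_H2D() is needed in MAP mode.
>>> plan.execute(0, C_view, 1, A_view, B_view)
>>> print(C[0]) # The result reflects the updated A.
[39. 54. 69.]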

Creating Tensors#

As we have seen, Einsums is compatible with any buffer object, including NumPy arrays. However, the C++ side of Einsums is not compatible with these Python objects. To aid in this transition, two sets of tensors are made available: einsums.core.RuntimeTensorX and einsums.core.RuntimeTensorViewX, where the suffix X is one of the following:

- F for 32-bit single-precision floating-point values, such as numpy.single.
- D for 64-bit double-precision floating-point values, such as Python’s float or numpy.double.
- C for 64-bit single-precision complex values, such as numpy.complex64.
- Z for 128-bit double-precision complex values, such as Python’s complex or numpy.complex128.

Extended precision is not available, since it is not supported on Windows or on AMD graphics cards. Half precision is also not available, due to lack of support in the C++ standard. For more documentation on the methods defined on these tensors, see the relevant documents.

There are also the types einsums.core.RuntimeTensor and einsums.core.RuntimeTensorView. These are the base classes of all the typed tensors above, but they have no code of their own. They are provided for checks such as isinstance(A, einsums.core.RuntimeTensor), which tests whether an object is a runtime tensor without specifying its stored type. Note that einsums.core.RuntimeTensorView is a child of einsums.core.RuntimeTensor, so all tensor views are also instances of einsums.core.RuntimeTensor. However, tensor views are not instances of the typed runtime tensor classes associated with their element type. The following example shows all of this behavior.

>>> import einsums as ein
>>> plan = ein.core.compile_plan("ij", "ik", "kj")
>>> A = ein.utils.create_random_tensor("A", [3, 3])
>>> B = ein.utils.create_random_tensor("B", [3, 3])
>>> C = ein.utils.create_tensor("C", [3, 3], dtype=float)
>>> plan.execute(0, C, 1, A, B)
>>> print(C) # Since A and B are random, this is just an example.
Name: C
    Type: In Core Runtime Tensor
    Data Type: double
    Dims{3 3 }
    Strides{3 1 }

    (0,  0-2):        0.52218486     0.20413352     0.18708155

    (1,  0-2):        0.97491459     0.48250664     0.56360688

    (2,  0-2):        0.66677923     0.38629482     0.38812904
>>> # Checking instances.
>>> A_view = A[0:2, 0:2]
>>> print(type(A))
<class 'einsums.core.RuntimeTensorD'>
>>> print(type(A_view))
<class 'einsums.core.RuntimeTensorViewD'>
>>> print(isinstance(A, ein.core.RuntimeTensorD)) # A is a RuntimeTensorD.
True
>>> print(isinstance(A, ein.core.RuntimeTensor)) # A is a RuntimeTensorD, so also a RuntimeTensor.
True
>>> print(isinstance(A, ein.core.RuntimeTensorF)) # A is a RuntimeTensorD, not a RuntimeTensorF.
False
>>> print(isinstance(A, ein.core.RuntimeTensorView)) # A is not a view.
False
>>> print(isinstance(A_view, ein.core.RuntimeTensorView)) # A_view is a RuntimeTensorViewD, so also a RuntimeTensorView.
True
>>> print(isinstance(A_view, ein.core.RuntimeTensorViewD)) # A_view is a RuntimeTensorViewD.
True
>>> print(isinstance(A_view, ein.core.RuntimeTensor)) # RuntimeTensorView is a subclass of RuntimeTensor
True
>>> print(isinstance(A_view, ein.core.RuntimeTensorD)) # A_view is a view, not an instance of RuntimeTensorD.
False
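
The typed suffix follows the element type of the tensor. As a hedged sketch, assuming einsums.utils.create_tensor accepts NumPy dtypes in the same way it accepts Python’s float above:

>>> import numpy as np
>>> S = ein.utils.create_tensor("S", [3, 3], dtype=np.single)
>>> print(isinstance(S, ein.core.RuntimeTensorF)) # Single precision maps to the F suffix...
True
>>> print(isinstance(S, ein.core.RuntimeTensorD)) # ...and not to the D suffix.
False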