torch.backends¶
torch.backends controls the behavior of various backends that PyTorch supports.
These backends include:
torch.backends.cuda
torch.backends.cudnn
torch.backends.mps
torch.backends.mkl
torch.backends.mkldnn
torch.backends.openmp
torch.backends.opt_einsum
torch.backends.xeon
torch.backends.cuda¶
- torch.backends.cuda.is_built()[source]¶
Returns whether PyTorch is built with CUDA support. Note that this doesn’t necessarily mean CUDA is available; just that if this PyTorch binary were run on a machine with working CUDA drivers and devices, we would be able to use it.
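For example, a build-time check can be combined with a runtime availability check (a minimal sketch; the device selection logic is illustrative, not prescribed by this API):
import torch

# Distinguish "built with CUDA" from "CUDA usable right now".
if torch.backends.cuda.is_built() and torch.cuda.is_available():
    device = torch.device("cuda")
else:
    device = torch.device("cpu")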
- torch.backends.cuda.matmul.allow_tf32¶
A bool that controls whether TensorFloat-32 tensor cores may be used in matrix multiplications on Ampere or newer GPUs. See TensorFloat-32 (TF32) on Ampere devices.
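A minimal sketch of toggling this flag, assuming a CUDA device is available:
import torch

# Allow TF32 tensor cores for float32 matmuls on Ampere or newer GPUs.
torch.backends.cuda.matmul.allow_tf32 = True
a = torch.randn(1024, 1024, device="cuda")
b = torch.randn(1024, 1024, device="cuda")
c = a @ b  # may use TF32 tensor cores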
- torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction¶
A bool that controls whether reduced precision reductions (e.g., with fp16 accumulation type) are allowed with fp16 GEMMs.
- torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction¶
A bool that controls whether reduced precision reductions are allowed with bf16 GEMMs.
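Both reduced-precision reduction flags can be assigned directly; a minimal sketch of trading some speed for accuracy:
import torch

# Disallow reduced-precision reductions inside fp16/bf16 GEMMs.
torch.backends.cuda.matmul.allow_fp16_reduced_precision_reduction = False
torch.backends.cuda.matmul.allow_bf16_reduced_precision_reduction = False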
- torch.backends.cuda.cufft_plan_cache¶
cufft_plan_cache contains the cuFFT plan caches for each CUDA device. Query a specific device i’s cache via torch.backends.cuda.cufft_plan_cache[i].
- torch.backends.cuda.clear()¶
Clears a cuFFT plan cache.
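As a sketch, a device’s plan cache can be inspected and cleared like this; the size and max_size attributes are assumed from the cuFFT plan cache description in the CUDA semantics notes and are not defined in this section:
import torch

cache = torch.backends.cuda.cufft_plan_cache[0]  # cache for device 0
cache.max_size = 32   # assumed attribute: cap the number of cached plans
print(cache.size)     # assumed attribute: number of plans currently cached
cache.clear()         # drop all cached plans for this device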
- torch.backends.cuda.preferred_linalg_library(backend=None)[source]¶
Warning
This flag is experimental and subject to change.
When PyTorch runs a CUDA linear algebra operation it often uses the cuSOLVER or MAGMA libraries, and if both are available it decides which to use with a heuristic. This flag (a str) allows overriding those heuristics.
If “cusolver” is set then cuSOLVER will be used wherever possible.
If “magma” is set then MAGMA will be used wherever possible.
If “default” (the default) is set then heuristics will be used to pick between cuSOLVER and MAGMA if both are available.
When no input is given, this function returns the currently preferred library.
Note: When a library is preferred, other libraries may still be used if the preferred library doesn’t implement the operation(s) called. This flag may achieve better performance if PyTorch’s heuristic library selection is incorrect for your application’s inputs.
Currently supported linalg operators:
torch.linalg.eigvalsh()
- Return type:
_LinalgBackend
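A minimal usage sketch:
import torch

# Query the current preference, then force cuSOLVER where possible.
current = torch.backends.cuda.preferred_linalg_library()
torch.backends.cuda.preferred_linalg_library("cusolver")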
- class torch.backends.cuda.SDPBackend(value)[source]¶
Enum class for the scaled dot product attention backends.
Warning
This class is in beta and subject to change.
This class needs to stay aligned with the enum defined in: pytorch/aten/src/ATen/native/transformers/sdp_utils_cpp.h
- torch.backends.cuda.flash_sdp_enabled()[source]¶
Warning
This flag is beta and subject to change.
Returns whether flash scaled dot product attention is enabled or not.
- torch.backends.cuda.enable_mem_efficient_sdp(enabled)[source]¶
Warning
This flag is beta and subject to change.
Enables or disables memory efficient scaled dot product attention.
- torch.backends.cuda.mem_efficient_sdp_enabled()[source]¶
Warning
This flag is beta and subject to change.
Returns whether memory efficient scaled dot product attention is enabled or not.
- torch.backends.cuda.enable_flash_sdp(enabled)[source]¶
Warning
This flag is beta and subject to change.
Enables or disables flash scaled dot product attention.
- torch.backends.cuda.math_sdp_enabled()[source]¶
Warning
This flag is beta and subject to change.
Returns whether math scaled dot product attention is enabled or not.
- torch.backends.cuda.enable_math_sdp(enabled)[source]¶
Warning
This flag is beta and subject to change.
Enables or disables math scaled dot product attention.
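These global toggles can be set and queried directly; a minimal sketch:
import torch

torch.backends.cuda.enable_flash_sdp(True)
torch.backends.cuda.enable_mem_efficient_sdp(True)
torch.backends.cuda.enable_math_sdp(False)
print(torch.backends.cuda.flash_sdp_enabled())          # True
print(torch.backends.cuda.mem_efficient_sdp_enabled())  # True
print(torch.backends.cuda.math_sdp_enabled())           # False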
- torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=True, enable_mem_efficient=True)[source]¶
Warning
This flag is beta and subject to change.
This context manager can be used to temporarily enable or disable any of the three backends for scaled dot product attention. Upon exiting the context manager, the previous state of the flags will be restored.
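For example, the context manager can restrict scaled dot product attention to a single backend for the duration of a block (a minimal sketch, assuming a CUDA device; the tensor shapes are illustrative):
import torch
import torch.nn.functional as F

q, k, v = (torch.randn(8, 16, 64, device="cuda", dtype=torch.float16) for _ in range(3))

# Temporarily allow only the flash backend; the previous flag state is restored on exit.
with torch.backends.cuda.sdp_kernel(enable_flash=True, enable_math=False, enable_mem_efficient=False):
    out = F.scaled_dot_product_attention(q, k, v)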
torch.backends.cudnn¶
- torch.backends.cudnn.is_available()[source]¶
Returns a bool indicating if cuDNN is currently available.
- torch.backends.cudnn.allow_tf32¶
A bool that controls whether TensorFloat-32 tensor cores may be used in cuDNN convolutions on Ampere or newer GPUs. See TensorFloat-32 (TF32) on Ampere devices.
- torch.backends.cudnn.deterministic¶
A bool that, if True, causes cuDNN to only use deterministic convolution algorithms. See also torch.are_deterministic_algorithms_enabled() and torch.use_deterministic_algorithms().
- torch.backends.cudnn.benchmark¶
A bool that, if True, causes cuDNN to benchmark multiple convolution algorithms and select the fastest.
- torch.backends.cudnn.benchmark_limit¶
An int that specifies the maximum number of cuDNN convolution algorithms to try when torch.backends.cudnn.benchmark is True. Set benchmark_limit to zero to try every available algorithm. Note that this setting only affects convolutions dispatched via the cuDNN v8 API.
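Taken together, the cuDNN flags are typically set once at startup; a minimal sketch:
import torch

torch.backends.cudnn.benchmark = True        # autotune convolution algorithms
torch.backends.cudnn.benchmark_limit = 10    # try at most 10 algorithms (cuDNN v8 API only)
torch.backends.cudnn.deterministic = False   # allow non-deterministic algorithms
torch.backends.cudnn.allow_tf32 = True       # allow TF32 in cuDNN convolutions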
torch.backends.mps¶
torch.backends.mkl¶
- class torch.backends.mkl.verbose(enable)[source]¶
On-demand oneMKL verbosing functionality. To make it easier to debug performance issues, oneMKL can dump verbose messages containing execution information such as duration while executing a kernel. The verbosing functionality can also be invoked via an environment variable named MKL_VERBOSE; however, that approach dumps messages for every step, producing a large amount of output. Moreover, for investigating performance issues, verbose messages from a single iteration are generally enough. This on-demand verbosing functionality makes it possible to control the scope of verbose message dumping. In the following example, verbose messages will be dumped out for the second inference only.
import torch
model(data)
with torch.backends.mkl.verbose(torch.backends.mkl.VERBOSE_ON):
    model(data)
- Parameters:
level – Verbose level
- VERBOSE_OFF: Disable verbosing
- VERBOSE_ON: Enable verbosing
torch.backends.mkldnn¶
- torch.backends.mkldnn.is_available()[source]¶
Returns whether PyTorch is built with MKL-DNN support.
- class torch.backends.mkldnn.verbose(level)[source]¶
On-demand oneDNN (formerly MKL-DNN) verbosing functionality. To make it easier to debug performance issues, oneDNN can dump verbose messages containing information such as kernel size, input data size, and execution duration while executing a kernel. The verbosing functionality can also be invoked via an environment variable named DNNL_VERBOSE; however, that approach dumps messages for every step, producing a large amount of output. Moreover, for investigating performance issues, verbose messages from a single iteration are generally enough. This on-demand verbosing functionality makes it possible to control the scope of verbose message dumping. In the following example, verbose messages will be dumped out for the second inference only.
import torch
model(data)
with torch.backends.mkldnn.verbose(torch.backends.mkldnn.VERBOSE_ON):
    model(data)
- Parameters:
level – Verbose level
- VERBOSE_OFF: Disable verbosing
- VERBOSE_ON: Enable verbosing
- VERBOSE_ON_CREATION: Enable verbosing, including oneDNN kernel creation
torch.backends.openmp¶
torch.backends.opt_einsum¶
- torch.backends.opt_einsum.is_available()[source]¶
Returns a bool indicating if opt_einsum is currently available.
- Return type:
bool
- torch.backends.opt_einsum.get_opt_einsum()[source]¶
Returns the opt_einsum package if opt_einsum is currently available, else None.
- Return type:
- torch.backends.opt_einsum.enabled¶
A bool that controls whether opt_einsum is enabled (True by default). If so, torch.einsum will use opt_einsum (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html), if available, to calculate an optimal path of contraction for faster performance. If opt_einsum is not available, torch.einsum will fall back to the default contraction path of left to right.
- torch.backends.opt_einsum.strategy¶
A str that specifies which strategies to try when torch.backends.opt_einsum.enabled is True. By default, torch.einsum will try the “auto” strategy, but the “greedy” and “optimal” strategies are also supported. Note that the “optimal” strategy is factorial in the number of inputs as it tries all possible paths. See more details in opt_einsum’s docs (https://optimized-einsum.readthedocs.io/en/stable/path_finding.html).