PyTorch documentation
PyTorch is an optimized tensor library for deep learning using GPUs and CPUs.
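As a quick illustration of the tensor library described above (a minimal sketch, not part of the reference pages listed below, and assuming a standard `torch` installation), the snippet creates a tensor on the CPU and moves the result to a CUDA GPU when one is available:

```python
import torch

# Create a small tensor on the CPU and run a basic operation.
x = torch.tensor([[1.0, 2.0], [3.0, 4.0]])
y = x @ x.T  # matrix multiply

# Move the result to a CUDA GPU if one is available; otherwise stay on the CPU.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
y = y.to(device)
print(y.device, y)
```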
Features described in this documentation are classified by release status:
Stable: These features will be maintained long-term and there should generally be no major performance limitations or gaps in documentation. We also expect to maintain backwards compatibility (although breaking changes can happen and notice will be given one release ahead of time).
Beta: These features are tagged as Beta because the API may change based on user feedback, because the performance needs to improve, or because coverage across operators is not yet complete. For Beta features, we are committing to seeing the feature through to the Stable classification. We are not, however, committing to backwards compatibility.
Prototype: These features are typically not available as part of binary distributions like PyPI or Conda, except sometimes behind run-time flags, and are at an early stage for feedback and testing.
- Automatic Mixed Precision examples
- Autograd mechanics
- Broadcasting semantics
- CPU threading and TorchScript inference
- CUDA semantics
- PyTorch Custom Operators Landing Page
- Distributed Data Parallel
- Extending PyTorch
- Extending torch.func with autograd.Function
- Frequently Asked Questions
- FSDP Notes
- PyTorch 2.4: Getting Started on Intel GPU
- Gradcheck mechanics
- HIP (ROCm) semantics
- Features for large-scale deployments
- Modules
- MPS backend
- Multiprocessing best practices
- Numerical accuracy
- Reproducibility
- Serialization semantics
- Windows FAQ
- torch
- torch.nn
- Parameter
- UninitializedParameter
- UninitializedBuffer
- Containers
- Convolution Layers
- Pooling layers
- Padding Layers
- Non-linear Activations (weighted sum, nonlinearity)
- Non-linear Activations (other)
- Normalization Layers
- Recurrent Layers
- Transformer Layers
- Linear Layers
- Dropout Layers
- Sparse Layers
- Distance Functions
- Loss Functions
- Vision Layers
- Shuffle Layers
- DataParallel Layers (multi-GPU, distributed)
- Utilities
- Quantized Functions
- Lazy Modules Initialization
- torch.nn.functional
- torch.Tensor
- Tensor Attributes
- Tensor Views
- torch.amp
- torch.autograd
- torch.autograd.backward
- torch.autograd.grad
- Forward-mode Automatic Differentiation
- Functional higher level API
- Locally disabling gradient computation
- Default gradient layouts
- In-place operations on Tensors
- Variable (deprecated)
- Tensor autograd functions
- Function
- Context method mixins
- Custom Function utilities
- Numerical gradient checking
- Profiler
- Debugging and anomaly detection
- Autograd graph
- torch.library
- torch.cpu
- torch.cuda
- StreamContext
- torch.cuda.can_device_access_peer
- torch.cuda.current_blas_handle
- torch.cuda.current_device
- torch.cuda.current_stream
- torch.cuda.cudart
- torch.cuda.default_stream
- device
- torch.cuda.device_count
- device_of
- torch.cuda.get_arch_list
- torch.cuda.get_device_capability
- torch.cuda.get_device_name
- torch.cuda.get_device_properties
- torch.cuda.get_gencode_flags
- torch.cuda.get_sync_debug_mode
- torch.cuda.init
- torch.cuda.ipc_collect
- torch.cuda.is_available
- torch.cuda.is_initialized
- torch.cuda.memory_usage
- torch.cuda.set_device
- torch.cuda.set_stream
- torch.cuda.set_sync_debug_mode
- torch.cuda.stream
- torch.cuda.synchronize
- torch.cuda.utilization
- torch.cuda.temperature
- torch.cuda.power_draw
- torch.cuda.clock_rate
- torch.cuda.OutOfMemoryError
- Random Number Generator
- Communication collectives
- Streams and events
- Graphs (beta)
- Memory management
- NVIDIA Tools Extension (NVTX)
- Jiterator (beta)
- TunableOp
- Stream Sanitizer (prototype)
- Understanding CUDA Memory Usage
- Generating a Snapshot
- Using the visualizer
- Snapshot API Reference
- torch.mps
- torch.mps.device_count
- torch.mps.synchronize
- torch.mps.get_rng_state
- torch.mps.set_rng_state
- torch.mps.manual_seed
- torch.mps.seed
- torch.mps.empty_cache
- torch.mps.set_per_process_memory_fraction
- torch.mps.current_allocated_memory
- torch.mps.driver_allocated_memory
- torch.mps.recommended_max_memory
- MPS Profiler
- MPS Event
- torch.xpu
- StreamContext
- torch.xpu.current_device
- torch.xpu.current_stream
- device
- torch.xpu.device_count
- device_of
- torch.xpu.empty_cache
- torch.xpu.get_device_capability
- torch.xpu.get_device_name
- torch.xpu.get_device_properties
- torch.xpu.init
- torch.xpu.is_available
- torch.xpu.is_initialized
- torch.xpu.set_device
- torch.xpu.set_stream
- torch.xpu.stream
- torch.xpu.synchronize
- Random Number Generator
- Streams and events
- torch.mtia
- StreamContext
- torch.mtia.current_device
- torch.mtia.current_stream
- torch.mtia.default_stream
- torch.mtia.device_count
- torch.mtia.init
- torch.mtia.is_available
- torch.mtia.is_initialized
- torch.mtia.set_device
- torch.mtia.set_stream
- torch.mtia.stream
- torch.mtia.synchronize
- device
- torch.mtia.DeferredMtiaCallError
- Streams and events
- Meta device
- torch.backends
- torch.export
- torch.distributed
- Backends
- Basics
- Initialization
- Post-Initialization
- Shutdown
- Distributed Key-Value Store
- Groups
- DeviceMesh
- Point-to-point communication
- Synchronous and asynchronous collective operations
- Collective functions
- Profiling Collective Communication
- Multi-GPU collective functions
- Third-party backends
- Launch utility
- Spawn utility
- Debugging torch.distributed applications
- Logging
- torch.distributed.algorithms.join
- torch.distributed.elastic
- torch.distributed.fsdp
- torch.distributed.optim
- torch.distributed.pipelining
- torch.distributed.tensor.parallel
- torch.distributed.checkpoint
- save()
- async_save()
- save_state_dict()
- load()
- load_state_dict()
- AsyncStager
- BlockingAsyncStager
- Stateful
- StorageReader
- StorageWriter
- LoadPlanner
- LoadPlan
- ReadItem
- SavePlanner
- SavePlan
- WriteItem
- FileSystemReader
- FileSystemWriter
- DefaultSavePlanner
- DefaultLoadPlanner
- get_state_dict()
- get_model_state_dict()
- get_optimizer_state_dict()
- set_state_dict()
- set_model_state_dict()
- set_optimizer_state_dict()
- StateDictOptions
- dcp_to_torch_save()
- torch_save_to_dcp()
- BroadcastingTorchSaveReader
- DynamicMetaLoadPlanner
- torch.distributions
- Score function
- Pathwise derivative
- Distribution
- ExponentialFamily
- Bernoulli
- Beta
- Binomial
- Categorical
- Cauchy
- Chi2
- ContinuousBernoulli
- Dirichlet
- Exponential
- FisherSnedecor
- Gamma
- Geometric
- Gumbel
- HalfCauchy
- HalfNormal
- Independent
- InverseGamma
- Kumaraswamy
- LKJCholesky
- Laplace
- LogNormal
- LowRankMultivariateNormal
- MixtureSameFamily
- Multinomial
- MultivariateNormal
- NegativeBinomial
- Normal
- OneHotCategorical
- Pareto
- Poisson
- RelaxedBernoulli
- LogitRelaxedBernoulli
- RelaxedOneHotCategorical
- StudentT
- TransformedDistribution
- Uniform
- VonMises
- Weibull
- Wishart
- KL Divergence
- Transforms
- Constraints
- Constraint Registry
- torch.compiler
- torch.fft
- torch.func
- torch.futures
- torch.fx
- torch.fx.experimental
- torch.hub
- torch.jit
- torch.linalg
- torch.monitor
- torch.signal
- torch.special
- torch.overrides
- torch.package
- torch.profiler
- torch.nn.init
- torch.nn.attention
- torch.onnx
- torch.optim
- Complex Numbers
- DDP Communication Hooks
- Quantization
- Distributed RPC Framework
- torch.random
- torch.masked
- torch.nested
- torch.Size
- torch.sparse
- torch.Storage
- torch.testing
- torch.utils
- torch.utils.benchmark
- torch.utils.bottleneck
- torch.utils.checkpoint
- torch.utils.cpp_extension
- torch.utils.data
- torch.utils.deterministic
- torch.utils.jit
- torch.utils.dlpack
- torch.utils.mobile_optimizer
- torch.utils.model_zoo
- torch.utils.tensorboard
- torch.utils.module_tracker
- Type Info
- Named Tensors
- Named Tensors operator coverage
- torch.__config__
- torch.__future__
- torch._logging
- Torch Environment Variables