torch.compile¶
- torch.compile(model=None, *, fullgraph=False, dynamic=False, backend='inductor', mode=None, options=None, disable=False)[source]¶
Optimizes given model/function using TorchDynamo and specified backend.
- Parameters:
model (Callable) – Module/function to optimize
fullgraph (bool) – If True, require the entire function to be captured into a single graph; any graph break raises an error. If False (the default), graph breaks are allowed and compileable subgraphs are optimized separately
dynamic (bool) – Use dynamic shape tracing
backend (str or Callable) – Backend to be used
- “inductor” is the default backend, which is a good balance between performance and overhead
- Non experimental in-tree backends can be seen with torch._dynamo.list_backends()
- Experimental or debug in-tree backends can be seen with torch._dynamo.list_backends(None)
- To register an out-of-tree custom backend: https://pytorch.org/docs/master/dynamo/custom-backends.html
mode (str) – Can be either “default”, “reduce-overhead” or “max-autotune”
- “default” is the default mode, which is a good balance between performance and overhead
- “reduce-overhead” is a mode that reduces the overhead of Python with CUDA graphs, useful for small batches
- “max-autotune” is a mode that leverages Triton based matrix multiplications and convolutions
- To see the exact configs that each mode sets, call torch._inductor.list_mode_options()
options (dict) – A dictionary of options to pass to the backend. Some notable ones to try out are:
- epilogue_fusion, which fuses pointwise ops into templates; requires max_autotune to also be set
- max_autotune, which will profile to pick the best matmul configuration
- fallback_random, which is useful when debugging accuracy issues
- shape_padding, which pads matrix shapes to better align loads on GPUs, especially for tensor cores
- triton.cudagraphs, which will reduce the overhead of Python with CUDA graphs
- trace.enabled, which is the most useful debugging flag to turn on
- trace.graph_diagram, which will show you a picture of your graph after fusion
- For inductor you can see the full list of configs that it supports by calling torch._inductor.list_options()
disable (bool) – Turn torch.compile() into a no-op for testing
- Return type:
Callable
Example:
@torch.compile(options={"triton.cudagraphs": True}, fullgraph=True)
def foo(x):
    return torch.sin(x) + torch.cos(x)
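Besides the decorator form above, torch.compile can also be called directly on a function or an nn.Module. The sketch below (assuming PyTorch 2.x) uses the in-tree "eager" backend, which runs the captured graph with ordinary eager kernels — useful for verifying that tracing succeeds without paying inductor's compilation cost — and shows compiling a module with mode="reduce-overhead":

```python
import torch

def f(x):
    return torch.sin(x) + torch.cos(x)

# Call form rather than decorator form; backend="eager" executes the
# captured graph with standard eager kernels, so results should match
# the uncompiled function exactly.
compiled_f = torch.compile(f, backend="eager")

x = torch.randn(8)
assert torch.allclose(compiled_f(x), f(x))

# nn.Module instances can be compiled the same way; "reduce-overhead"
# uses CUDA graphs to cut per-call Python overhead on small batches.
model = torch.nn.Linear(8, 2)
compiled_model = torch.compile(model, mode="reduce-overhead")
```

Compilation is lazy: the backend is only invoked on the first call with a given set of input shapes, so constructing the compiled object itself is cheap.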