Torch-TensorRT Explained¶
Torch-TensorRT is a compiler for PyTorch models targeting NVIDIA GPUs via the NVIDIA TensorRT SDK. It aims to provide better inference performance for PyTorch models while maintaining the ergonomics of PyTorch.
Dynamo Frontend¶
The Dynamo frontend is the default frontend for Torch-TensorRT. It utilizes the dynamo compiler stack from PyTorch.
torch.compile (Just-in-time)¶
torch.compile is a JIT compiler stack; as such, compilation is deferred until first use. This means that as conditions change in the graph, the graph will automatically recompile. This provides users the most runtime flexibility, but limits options regarding serialization.
Under the hood, torch.compile delegates subgraphs it believes can be lowered to Torch-TensorRT. Torch-TensorRT further lowers these graphs into ops consisting solely of Core ATen Operators or select “High-level Ops” amenable to TensorRT acceleration. Subgraphs are partitioned into components that will run in PyTorch and ones to be compiled to TensorRT, based on operator support. TensorRT engines then replace supported blocks, and a hybrid subgraph is returned to torch.compile to be run on call.
Accepted Formats¶
- torch.fx GraphModule (torch.fx.GraphModule)
- PyTorch Module (torch.nn.Module)
Returns¶
- Boxed function that triggers compilation on first call
torch_tensorrt.dynamo.compile (Ahead-of-time)¶
torch_tensorrt.dynamo.compile is an AOT compiler; models are compiled in an explicit compilation phase. These compilation artifacts can then be serialized and reloaded at a later date.
Graphs go through the torch.export.export system to be lowered into a graph consisting of Core ATen Operators or select “High-level Ops” amenable to TensorRT acceleration. Subgraphs are partitioned into components that will run in PyTorch and ones to be compiled to TensorRT, based on operator support. TensorRT engines then replace supported blocks, and the hybrid subgraph is packed into an ExportedProgram which can be serialized and reloaded.
Accepted Formats¶
- torch.export ExportedProgram (torch.export.ExportedProgram)
- torch.fx GraphModule (torch.fx.GraphModule) (via torch.export.export)
- PyTorch Module (torch.nn.Module) (via torch.export.export)
Returns¶
- torch.fx.GraphModule (serializable with torch.export.ExportedProgram)
Legacy Frontends¶
As there have been a number of compiler technologies in the PyTorch ecosystem over the years, Torch-TensorRT includes some legacy features targeting them.
TorchScript (torch_tensorrt.ts.compile)¶
The TorchScript frontend was the original default frontend for Torch-TensorRT and targets models in the TorchScript format. The graph provided is partitioned into supported and unsupported blocks. Supported blocks are lowered to TensorRT, and unsupported blocks remain to run with LibTorch. The resultant graph is returned to the user as a ScriptModule that can be saved and loaded with the Torch-TensorRT PyTorch runtime extension.
Accepted Formats¶
- TorchScript Module (torch.jit.ScriptModule)
- PyTorch Module (torch.nn.Module) (via torch.jit.script or torch.jit.trace)
Returns¶
- TorchScript Module (torch.jit.ScriptModule)
FX Graph Modules (torch_tensorrt.fx.compile)¶
This frontend has been almost entirely replaced by the Dynamo frontend, which is a superset of the features available through the FX frontend. The original FX frontend remains in the codebase for backwards-compatibility reasons.
Accepted Formats¶
- torch.fx GraphModule (torch.fx.GraphModule)
- PyTorch Module (torch.nn.Module) (via torch.fx.symbolic_trace)
Returns¶
- torch.fx GraphModule (torch.fx.GraphModule)
torch_tensorrt.compile¶
As there are many different frontends and supported formats, we provide a convenience layer called torch_tensorrt.compile which gives users access to all the different compiler paths. You can tell torch_tensorrt.compile which compiler path to use by setting the ir option, directing Torch-TensorRT to lower the provided model through a specific intermediate representation.
ir Options¶
- torch_compile: Use the torch.compile system. Immediately returns a boxed function that will compile on first call.
- dynamo: Run the graph through the torch.export / torchdynamo stack. If the input module is a torch.nn.Module, it must be “export-traceable”, as the module will be traced with torch.export.export. Returns a torch.fx.GraphModule which can be run immediately or saved via torch.export.export or torch_tensorrt.save.
- torchscript or ts: Run the graph through the TorchScript stack. If the input module is a torch.nn.Module, it must be “scriptable”, as the module will be compiled with torch.jit.script. Returns a torch.jit.ScriptModule which can be run immediately or saved via torch.jit.save or torch_tensorrt.save.
- fx: Run the graph through the torch.fx stack. If the input module is a torch.nn.Module, it will be traced with torch.fx.symbolic_trace and is subject to its limitations.