torch_tensorrt.ts
Functions
- torch_tensorrt.ts.compile(module: ScriptModule, inputs: Optional[Sequence[Input | torch.Tensor]] = None, input_signature: Optional[Tuple[Union[Input, Tensor, Sequence[Any]]]] = None, device: Device = Device(type=DeviceType.GPU, gpu_id=0), disable_tf32: bool = False, sparse_weights: bool = False, enabled_precisions: Optional[Set[torch.dtype | dtype]] = None, refit: bool = False, debug: bool = False, capability: EngineCapability = EngineCapability.STANDARD, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, calibrator: object = None, truncate_long_and_double: bool = False, require_full_compilation: bool = False, min_block_size: int = 3, torch_executed_ops: Optional[List[str]] = None, torch_executed_modules: Optional[List[str]] = None, allow_shape_tensors: bool = False) → ScriptModule [source]
Compile a TorchScript module for NVIDIA GPUs using TensorRT
Takes an existing TorchScript module and a set of settings to configure the compiler, and converts methods to JIT graphs that call equivalent TensorRT engines
Specifically converts the forward method of a TorchScript module
- Parameters
module (torch.jit.ScriptModule) – Source module, a result of tracing or scripting a PyTorch torch.nn.Module
- Keyword Arguments
inputs (List[Union(Input, torch.Tensor)]) –
Required. List of specifications of input shape, dtype and memory layout for inputs to the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.
inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)),  # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ),  # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 224)),  # Use an example tensor and let torch_tensorrt infer settings
]
input_signature (Tuple[Union(Input, torch.Tensor, Sequence)]) –
A formatted collection of input specifications for the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type. This API should be considered beta-level stable and may change in the future.
input_signature=(
    [
        torch_tensorrt.Input((1, 3, 224, 224)),  # Static NCHW input shape for input #1
        torch_tensorrt.Input(
            min_shape=(1, 224, 224, 3),
            opt_shape=(1, 512, 512, 3),
            max_shape=(1, 1024, 1024, 3),
            dtype=torch.int32,
            format=torch.channels_last,
        ),  # Dynamic input shape for input #2
    ],
    torch.randn((1, 3, 224, 224)),  # Use an example tensor and let torch_tensorrt infer settings for input #3
)
device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
disable_tf32 (bool) – Force FP32 layers to use traditional FP32 format instead of the default TF32 behavior, which rounds inputs to 10-bit mantissas before multiplying but accumulates the sum using 23-bit mantissas
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers.
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of datatypes that TensorRT can use when selecting kernels
refit (bool) – Enable refitting
debug (bool) – Enable debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of workspace given to TensorRT
dla_sram_size (int) – Fast software-managed RAM used by DLA to communicate within a layer
dla_local_dram_size (int) – Host RAM used by DLA to share intermediate tensor data across operations
dla_global_dram_size (int) – Host RAM used by DLA to store weights and metadata for execution
truncate_long_and_double (bool) – Truncate weights provided in int64 or double (float64) to int32 and float32
calibrator (Union(torch_tensorrt._C.IInt8Calibrator, tensorrt.IInt8Calibrator)) – Calibrator object which will provide data to the PTQ system for INT8 Calibration
require_full_compilation (bool) – Require modules to be compiled end to end or return an error as opposed to returning a hybrid graph where operations that cannot be run in TensorRT are run in PyTorch
min_block_size (python:int) – The minimum number of contiguous TensorRT convertible operations in order to run a set of operations in TensorRT
torch_executed_ops (List[str]) – List of aten operators that must be run in PyTorch. An error will be thrown if this list is not empty but require_full_compilation is True
torch_executed_modules (List[str]) – List of modules that must be run in PyTorch. An error will be thrown if this list is not empty but require_full_compilation is True
allow_shape_tensors (bool) – (Experimental) Allow aten::size to output shape tensors using IShapeLayer in TensorRT
- Returns
Compiled TorchScript module; when run, it will execute via TensorRT
- Return type
torch.jit.ScriptModule
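A minimal end-to-end sketch of compiling and running a module (the model class and shapes here are illustrative assumptions, not part of the API):

import torch
import torch_tensorrt

model = MyModel().eval().cuda()     # hypothetical scriptable torch.nn.Module
scripted = torch.jit.script(model)  # or torch.jit.trace(model, example_input)

trt_module = torch_tensorrt.ts.compile(
    scripted,
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],  # static NCHW shape
    enabled_precisions={torch.float, torch.half},     # allow FP32 and FP16 kernels
)

result = trt_module(torch.randn((1, 3, 224, 224)).cuda())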
- torch_tensorrt.ts.convert_method_to_trt_engine(module: ScriptModule, method_name: str = 'forward', inputs: Optional[Sequence[Input | torch.Tensor]] = None, device: Device = Device(type=DeviceType.GPU, gpu_id=0), disable_tf32: bool = False, sparse_weights: bool = False, enabled_precisions: Optional[Set[torch.dtype | dtype]] = None, refit: bool = False, debug: bool = False, capability: EngineCapability = EngineCapability.STANDARD, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_long_and_double: bool = False, calibrator: object = None, allow_shape_tensors: bool = False) → bytes [source]
Convert a TorchScript module method to a serialized TensorRT engine
Converts a specified method of a module to a serialized TensorRT engine given a dictionary of conversion settings
- Parameters
module (torch.jit.ScriptModule) – Source module, a result of tracing or scripting a PyTorch torch.nn.Module
- Keyword Arguments
inputs (List[Union(Input, torch.Tensor)]) –
Required. List of specifications of input shape, dtype and memory layout for inputs to the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.
inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)),  # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ),  # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 224)),  # Use an example tensor and let torch_tensorrt infer settings
]
method_name (str) – Name of method to convert
input_signature (Tuple[Union(Input, torch.Tensor, Sequence)]) –
A formatted collection of input specifications for the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type. This API should be considered beta-level stable and may change in the future.
input_signature=(
    [
        torch_tensorrt.Input((1, 3, 224, 224)),  # Static NCHW input shape for input #1
        torch_tensorrt.Input(
            min_shape=(1, 224, 224, 3),
            opt_shape=(1, 512, 512, 3),
            max_shape=(1, 1024, 1024, 3),
            dtype=torch.int32,
            format=torch.channels_last,
        ),  # Dynamic input shape for input #2
    ],
    torch.randn((1, 3, 224, 224)),  # Use an example tensor and let torch_tensorrt infer settings for input #3
)
device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
disable_tf32 (bool) – Force FP32 layers to use traditional FP32 format instead of the default TF32 behavior, which rounds inputs to 10-bit mantissas before multiplying but accumulates the sum using 23-bit mantissas
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers.
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of datatypes that TensorRT can use when selecting kernels
refit (bool) – Enable refitting
debug (bool) – Enable debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of workspace given to TensorRT
dla_sram_size (int) – Fast software-managed RAM used by DLA to communicate within a layer
dla_local_dram_size (int) – Host RAM used by DLA to share intermediate tensor data across operations
dla_global_dram_size (int) – Host RAM used by DLA to store weights and metadata for execution
truncate_long_and_double (bool) – Truncate weights provided in int64 or double (float64) to int32 and float32
calibrator (Union(torch_tensorrt._C.IInt8Calibrator, tensorrt.IInt8Calibrator)) – Calibrator object which will provide data to the PTQ system for INT8 Calibration
allow_shape_tensors (bool) – (Experimental) Allow aten::size to output shape tensors using IShapeLayer in TensorRT
- Returns
Serialized TensorRT engine; it can be saved to a file or deserialized via TensorRT APIs
- Return type
bytes
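A hedged sketch of converting a method and persisting the engine (assuming scripted is a torch.jit.ScriptModule as in the compile example; the file name is illustrative):

import torch
import torch_tensorrt

serialized_engine = torch_tensorrt.ts.convert_method_to_trt_engine(
    scripted,
    method_name="forward",
    inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
)

# The returned bytes can be written to disk or handed to TensorRT's runtime APIs
with open("forward.engine", "wb") as f:
    f.write(serialized_engine)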
- torch_tensorrt.ts.check_method_op_support(module: ScriptModule, method_name: str = 'forward') → bool [source]
Checks to see if a method is fully supported by torch_tensorrt
Checks whether a method of a TorchScript module can be compiled by torch_tensorrt. If not, a list of the unsupported operators is printed and the function returns False; otherwise it returns True.
- Parameters
module (torch.jit.ScriptModule) – Source module, a result of tracing or scripting a PyTorch torch.nn.Module
method_name (str) – Name of method to check
- Returns
True if the method is supported
- Return type
bool
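For example, the check can gate whether full compilation is requested (a sketch; scripted is assumed as in the compile example):

if torch_tensorrt.ts.check_method_op_support(scripted, method_name="forward"):
    trt_module = torch_tensorrt.ts.compile(
        scripted,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        require_full_compilation=True,  # safe: every op is convertible
    )
else:
    # Unsupported ops were printed; fall back to a hybrid TensorRT/PyTorch graph
    trt_module = torch_tensorrt.ts.compile(
        scripted,
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
    )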
- torch_tensorrt.ts.embed_engine_in_new_module(serialized_engine: bytes, input_binding_names: Optional[List[str]] = None, output_binding_names: Optional[List[str]] = None, device: Device = Device(type=DeviceType.GPU, gpu_id=0)) → ScriptModule [source]
Takes a pre-built serialized TensorRT engine and embeds it within a TorchScript module
Takes a pre-built serialized TensorRT engine (as bytes) and embeds it within a TorchScript module. Registers the forward method to execute the TensorRT engine with the function signature:
forward(Tensor[]) -> Tensor[]
- TensorRT bindings must either be explicitly specified using input_binding_names and output_binding_names, or have names with the following format: [symbol].[index in input / output array], e.g. [x.0, x.1, x.2] -> [y.0]
The module, with the engine embedded, can be saved with torch.jit.save and moved / loaded according to torch_tensorrt portability rules
- Parameters
serialized_engine (bytes) – Serialized TensorRT engine from either torch_tensorrt or TensorRT APIs
- Keyword Arguments
input_binding_names (List[str]) – List of names of TensorRT bindings in order to be passed to the encompassing PyTorch module
output_binding_names (List[str]) – List of names of TensorRT bindings in order that should be returned from the encompassing PyTorch module
device (Union(Device, torch.device, dict)) – Target device to run engine on. Must be compatible with engine provided. Default: Current active device
- Returns
New TorchScript module with engine embedded
- Return type
torch.jit.ScriptModule
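A sketch of re-wrapping a saved engine (assuming serialized_engine came from convert_method_to_trt_engine above; the binding names are illustrative and must match the engine's actual bindings):

import torch
import torch_tensorrt

trt_module = torch_tensorrt.ts.embed_engine_in_new_module(
    serialized_engine,
    input_binding_names=["x.0"],   # assumed binding name
    output_binding_names=["y.0"],  # assumed binding name
    device=torch_tensorrt.Device("gpu:0"),
)

torch.jit.save(trt_module, "trt_module.ts")  # portable per torch_tensorrt rules
reloaded = torch.jit.load("trt_module.ts")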
- torch_tensorrt.ts.TensorRTCompileSpec(inputs: Optional[List[torch.Tensor | Input]] = None, input_signature: Optional[Any] = None, device: torch.device | Device = Device(type=DeviceType.GPU, gpu_id=0), disable_tf32: bool = False, sparse_weights: bool = False, enabled_precisions: Optional[Set[torch.dtype | dtype]] = None, refit: bool = False, debug: bool = False, capability: EngineCapability = EngineCapability.STANDARD, num_avg_timing_iters: int = 1, workspace_size: int = 0, dla_sram_size: int = 1048576, dla_local_dram_size: int = 1073741824, dla_global_dram_size: int = 536870912, truncate_long_and_double: bool = False, calibrator: object = None, allow_shape_tensors: bool = False) → torch.ScriptClass [source]
Utility to create a formatted spec dictionary for using the PyTorch TensorRT backend
- Keyword Arguments
inputs (List[Union(Input, torch.Tensor)]) –
Required. List of specifications of input shape, dtype and memory layout for inputs to the module. Input sizes can be specified as torch sizes, tuples or lists. dtypes can be specified using torch datatypes or torch_tensorrt datatypes, and you can use either torch devices or the torch_tensorrt device type enum to select the device type.
inputs=[
    torch_tensorrt.Input((1, 3, 224, 224)),  # Static NCHW input shape for input #1
    torch_tensorrt.Input(
        min_shape=(1, 224, 224, 3),
        opt_shape=(1, 512, 512, 3),
        max_shape=(1, 1024, 1024, 3),
        dtype=torch.int32,
        format=torch.channels_last,
    ),  # Dynamic input shape for input #2
    torch.randn((1, 3, 224, 224)),  # Use an example tensor and let torch_tensorrt infer settings
]
device (Union(Device, torch.device, dict)) –
Target device for TensorRT engines to run on
device=torch_tensorrt.Device("dla:1", allow_gpu_fallback=True)
disable_tf32 (bool) – Force FP32 layers to use traditional FP32 format instead of the default TF32 behavior, which rounds inputs to 10-bit mantissas before multiplying but accumulates the sum using 23-bit mantissas
sparse_weights (bool) – Enable sparsity for convolution and fully connected layers.
enabled_precisions (Set(Union(torch.dtype, torch_tensorrt.dtype))) – The set of datatypes that TensorRT can use when selecting kernels
refit (bool) – Enable refitting
debug (bool) – Enable debuggable engine
capability (EngineCapability) – Restrict kernel selection to safe GPU kernels or safe DLA kernels
num_avg_timing_iters (int) – Number of averaging timing iterations used to select kernels
workspace_size (int) – Maximum size of workspace given to TensorRT
truncate_long_and_double (bool) – Truncate weights provided in int64 or double (float64) to int32 and float32
calibrator (Union(torch_tensorrt._C.IInt8Calibrator, tensorrt.IInt8Calibrator)) – Calibrator object which will provide data to the PTQ system for INT8 Calibration
allow_shape_tensors (bool) – (Experimental) Allow aten::size to output shape tensors using IShapeLayer in TensorRT
- Returns
List of methods and formatted spec objects to be provided to torch._C._jit_to_tensorrt
- Return type
torch.classes.tensorrt.CompileSpec
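A hedged sketch of how such a spec is typically handed to the PyTorch TensorRT backend via the internal torch._C._jit_to_backend API (internal and subject to change; scripted is assumed as in the compile example):

import torch
import torch_tensorrt

spec = {
    "forward": torch_tensorrt.ts.TensorRTCompileSpec(
        inputs=[torch_tensorrt.Input((1, 3, 224, 224))],
        enabled_precisions={torch.float},
    )
}

# Lower the scripted module to the "tensorrt" backend using the spec
trt_model = torch._C._jit_to_backend("tensorrt", scripted, spec)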