torchrec.inference¶
torchrec.inference.model_packager¶
- class torchrec.inference.model_packager.PredictFactoryPackager¶
Bases: object
- classmethod save_predict_factory(predict_factory: Type[PredictFactory], configs: Dict[str, Any], output: Union[str, Path, BinaryIO], extra_files: Dict[str, Union[str, bytes]], loader_code: str = '\nimport %PACKAGE%\n\nMODULE_FACTORY=%PACKAGE%.%CLASS%\n', package_importer: Union[Importer, List[Importer]] = <torch.package.importer._SysImporter object>) None ¶
- abstract classmethod set_extern_modules()¶
Abstract classmethod; subclasses must override it to declare the modules treated as extern during packaging.
- abstract classmethod set_mocked_modules()¶
Abstract classmethod; subclasses must override it to declare the modules mocked out during packaging.
- torchrec.inference.model_packager.load_config_text(name: str) str ¶
- torchrec.inference.model_packager.load_pickle_config(name: str, clazz: Type[T]) T ¶
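A minimal sketch of how PredictFactoryPackager might be subclassed and used. The hook contract (returning lists of module names for extern/mock handling), the factory class MyPredictFactory, and all paths and configs below are assumptions for illustration, not the documented behavior.
from typing import List

from torchrec.inference.model_packager import PredictFactoryPackager

# `MyPredictFactory` stands in for a user-defined PredictFactory subclass
# (see torchrec.inference.modules.PredictFactory below); it is not defined here.


class MyPackager(PredictFactoryPackager):
    @classmethod
    def set_extern_modules(cls) -> List[str]:
        # Assumed contract: modules torch.package should treat as extern.
        return ["numpy"]

    @classmethod
    def set_mocked_modules(cls) -> List[str]:
        # Assumed contract: modules to mock out inside the package.
        return ["fbgemm_gpu"]


# Packaging call (illustrative; paths and config contents are placeholders):
# MyPackager.save_predict_factory(
#     predict_factory=MyPredictFactory,
#     configs={"batching_metadata": "..."},
#     output="/tmp/predict_factory.zip",
#     extra_files={},
# )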
torchrec.inference.modules¶
- class torchrec.inference.modules.BatchingMetadata(type: str, device: str, pinned: List[str])¶
Bases: object
Metadata class for batching; this should be kept in sync with the C++ definition.
- device: str¶
- pinned: List[str]¶
- type: str¶
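A small construction sketch. The field values are assumptions for illustration: the type and device strings must match what the C++ batching implementation expects, which is not documented here.
from torchrec.inference.modules import BatchingMetadata

sparse_metadata = BatchingMetadata(
    type="kjt",      # assumed identifier for a KeyedJaggedTensor-style input
    device="cuda",   # device the batched input should land on
    pinned=[],       # tensor names to place in pinned host memory, if any
)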
- class torchrec.inference.modules.PredictFactory¶
Bases: ABC
Creates a model (with already-learned weights) to be used at inference time.
- abstract batching_metadata() Dict[str, BatchingMetadata] ¶
Returns a dict from input name to BatchingMetadata. This information is used to batch input requests.
- batching_metadata_json() str ¶
Serialize the batching metadata to JSON, for ease of parsing with torch::deploy environments.
- abstract create_predict_module() Module ¶
Returns an already-sharded model with allocated weights. state_dict() must match TransformModule.transform_state_dict(). It assumes that torch.distributed.init_process_group was already called and shards the model according to torch.distributed.get_world_size().
- model_inputs_data() Dict[str, Any] ¶
Returns a dict of various data for benchmarking input generation.
- qualname_metadata() Dict[str, QualNameMetadata] ¶
Returns a dict from qualname (method name) to QualNameMetadata. This provides additional information for executing specific methods of the model.
- qualname_metadata_json() str ¶
Serialize the qualname metadata to JSON, for ease of parsing with torch::deploy environments.
- abstract result_metadata() str ¶
Returns a string representing the result type. This information is used for result splitting.
- abstract run_weights_dependent_transformations(predict_module: Module) Module ¶
Run transformations that depend on the weights of the predict module, e.g. lowering to a backend.
- abstract run_weights_independent_tranformations(predict_module: Module) Module ¶
Run transformations that do not rely on the weights of the predict module, e.g. fx tracing, model splitting, etc.
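A minimal sketch of a PredictFactory implementation covering the abstract methods above. The model, the metadata strings, and the pass-through transformations are placeholders, not a definitive implementation; the method name run_weights_independent_tranformations follows the documented signature.
from typing import Dict

import torch
from torchrec.inference.modules import BatchingMetadata, PredictFactory


class MyPredictFactory(PredictFactory):
    """Hypothetical factory; the model and metadata values are placeholders."""

    def create_predict_module(self) -> torch.nn.Module:
        # Build the trained, sharded model here. Per the interface contract,
        # torch.distributed.init_process_group is assumed to have been called.
        return torch.nn.Linear(16, 1)  # stand-in for a real recommendation model

    def batching_metadata(self) -> Dict[str, BatchingMetadata]:
        # One entry per request input name; the strings below are assumptions.
        return {
            "float_features": BatchingMetadata(type="dense", device="cuda", pinned=[]),
        }

    def result_metadata(self) -> str:
        return "dict_of_tensor"  # assumed result-type identifier

    # Name matches the documented API signature above.
    def run_weights_independent_tranformations(
        self, predict_module: torch.nn.Module
    ) -> torch.nn.Module:
        return predict_module  # e.g. fx tracing / model splitting would go here

    def run_weights_dependent_transformations(
        self, predict_module: torch.nn.Module
    ) -> torch.nn.Module:
        return predict_module  # e.g. lowering to an inference backend would go here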
- class torchrec.inference.modules.PredictModule(module: Module)¶
Bases: Module
Interface for modules to work in a torch.deploy-based backend. Users should override predict_forward to convert the batch input format to the module input format.
- Call Args:
batch: a dict of input tensors
- Returns:
a dict of output tensors
- Return type:
output
- Parameters:
module – the actual predict module
device – the primary device for this module that will be used in forward calls.
Example:
module = PredictModule(torch.device("cuda", torch.cuda.current_device()))
- forward(batch: Dict[str, Tensor]) Any ¶
Define the computation performed at every call.
Should be overridden by all subclasses.
Note
Although the recipe for the forward pass needs to be defined within this function, one should call the Module instance afterwards instead of this, since the former takes care of running the registered hooks while the latter silently ignores them.
- abstract predict_forward(batch: Dict[str, Tensor]) Any ¶
- property predict_module: Module¶
- state_dict(destination: Optional[Dict[str, Any]] = None, prefix: str = '', keep_vars: bool = False) Dict[str, Any] ¶
Return a dictionary containing references to the whole state of the module.
Both parameters and persistent buffers (e.g. running averages) are included. Keys are the corresponding parameter and buffer names. Parameters and buffers set to None are not included.
Note
The returned object is a shallow copy. It contains references to the module's parameters and buffers.
Warning
Currently state_dict() also accepts positional arguments for destination, prefix and keep_vars in order. However, this is being deprecated and keyword arguments will be enforced in future releases.
Warning
Please avoid the use of the argument destination as it is not designed for end-users.
- Parameters:
destination (dict, optional) – If provided, the state of the module will be updated into the dict and the same object is returned. Otherwise, an OrderedDict will be created and returned. Default: None.
prefix (str, optional) – a prefix added to parameter and buffer names to compose the keys in state_dict. Default: ''.
keep_vars (bool, optional) – by default the Tensors returned in the state dict are detached from autograd. If set to True, detaching will not be performed. Default: False.
- Returns:
a dictionary containing a whole state of the module
- Return type:
dict
Example:
>>> # xdoctest: +SKIP("undefined vars")
>>> module.state_dict().keys()
['bias', 'weight']
- training: bool¶
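A minimal sketch of a PredictModule subclass overriding predict_forward. The input and output key names are assumptions; only the pattern of converting a request batch into the wrapped module's inputs is illustrated.
from typing import Any, Dict

import torch
from torchrec.inference.modules import PredictModule


class MyPredictModule(PredictModule):
    """Hypothetical wrapper; the key names used here are assumptions."""

    def predict_forward(self, batch: Dict[str, torch.Tensor]) -> Any:
        # Convert the request batch layout into the wrapped module's inputs
        # and package the outputs as a dict of tensors.
        return {"scores": self.predict_module(batch["float_features"])}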
- class torchrec.inference.modules.QualNameMetadata(need_preproc: bool)¶
Bases: object
- need_preproc: bool¶
- torchrec.inference.modules.quantize_dense(predict_module: PredictModule, dtype: dtype, additional_embedding_module_type: List[Type[Module]] = []) Module ¶
- torchrec.inference.modules.quantize_embeddings(module: Module, dtype: dtype, inplace: bool, additional_qconfig_spec_keys: Optional[List[Type[Module]]] = None, additional_mapping: Optional[Dict[Type[Module], Type[Module]]] = None, output_dtype: dtype = torch.float32, per_table_weight_dtype: Optional[Dict[str, dtype]] = None) Module ¶
- torchrec.inference.modules.quantize_feature(module: Module, inputs: Tuple[Tensor, ...]) Tuple[Tensor, ...] ¶
- torchrec.inference.modules.quantize_inference_model(model: Module, quantization_mapping: Optional[Dict[str, Type[Module]]] = None, per_table_weight_dtype: Optional[Dict[str, dtype]] = None, fp_weight_dtype: dtype = torch.int8) Module ¶
Quantize the model for inference, mapping supported modules to their quantized counterparts (optionally per quantization_mapping) using the given weight dtypes.
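A sketch of typical use. Here `model` is assumed to be an eager TorchRec model containing embedding modules; the table name and dtype below are illustrative, not prescribed values.
import torch
from torchrec.inference.modules import quantize_inference_model

# `model`: assumed pre-existing trained TorchRec model (not defined here).
quant_model = quantize_inference_model(
    model,
    per_table_weight_dtype={"t_example": torch.qint8},  # hypothetical table name
)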
- torchrec.inference.modules.shard_quant_model(model: Module, world_size: int = 1, compute_device: str = 'cuda', sharders: Optional[List[ModuleSharder[Module]]] = None, fused_params: Optional[Dict[str, Any]] = None, device_memory_size: Optional[int] = None, constraints: Optional[Dict[str, ParameterConstraints]] = None) Tuple[Module, ShardingPlan] ¶
Shard the (typically quantized) model across world_size devices, returning the sharded model and its ShardingPlan.
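Continuing the quantization sketch above, a hedged example of sharding the quantized model on a single CUDA device using the documented defaults for sharders, fused_params, and constraints.
from torchrec.inference.modules import shard_quant_model

# `quant_model` comes from the quantize_inference_model sketch above.
sharded_model, sharding_plan = shard_quant_model(
    quant_model,
    world_size=1,
    compute_device="cuda",
)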
- torchrec.inference.modules.trim_torch_package_prefix_from_typename(typename: str) str ¶