TorchServe Use Cases¶
TorchServe can be used for a variety of use cases. To make it convenient for users, some of them have been documented here.
These use cases assume you have pre-trained model(s) and that torchserve and torch-model-archiver are installed on your target system.
This should help you move a model from your development environment to a production/serving environment.
NOTES
If you have not installed the latest torchserve and torch-model-archiver, follow the installation instructions and complete the installation.
If you plan to use Docker, make sure the following prerequisites are in place:
Make sure you have the latest Docker engine installed on your target node. If not, use this link to install it.
Follow the instructions in install using docker to share the model-store directory and start torchserve.
The following use-case steps use curl to execute torchserve REST API calls. However, you can also use the Chrome plugin Postman for this.
Please refer to default_handler to understand default handlers.
Please refer to custom handlers to understand custom handlers.
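If you prefer a script over curl or Postman, the snippet below is a minimal sketch of calling the inference API with Python's requests library. The model name "my_model", the kitten.jpg input and the default inference port 8080 are assumptions; adjust them to your setup.

```python
# Minimal sketch: call the TorchServe inference API from Python instead of curl/Postman.
# Assumes torchserve runs locally on the default inference port 8080 and that a model
# named "my_model" (hypothetical) is already registered.
import requests

with open("kitten.jpg", "rb") as f:   # any input file your handler understands
    response = requests.post(
        "http://localhost:8080/predictions/my_model",
        data=f.read(),                # equivalent to: curl -T kitten.jpg <url>
    )

print(response.status_code)
print(response.text)                  # default handlers return JSON predictions
```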
Use Cases¶
Serve pytorch eager mode model
Serve pytorch scripted mode model
Serve ready-made models from the torchserve model zoo
Secure model serving
Serve models on GPUs
Serve custom models with no third party dependency
Serve custom models with third party dependency
Serve models for AB testing
Deploy pytorch eager mode model¶
Steps to deploy your model(s)
Create MAR file for torch eager model (a sketch of producing the eager-mode .pth checkpoint follows these steps):
torch-model-archiver --model-name <your_model_name> --version 1.0 --model-file <your_model_file>.py --serialized-file <your_model_name>.pth --handler <default_handler> --extra-files ./index_to_name.json
mkdir model_store
mv <your_model_name>.mar model_store/
Docker - It is possible to build the MAR file directly on docker; refer to this for details.
Place the MAR file in a new directory and name it model-store (this can be any name).
Docker - Make sure that the MAR file is copied into the volume/directory shared while starting the torchserve docker image.
Start torchserve with the following command:
torchserve --start --ncs --model-store <model_store or your_model_store_dir>
Docker - This is not applicable.
Register the model, i.e. the MAR file created in step 1 above:
curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=<your_model_name>.mar"
Check if the model has been registered successfully:
curl http://localhost:8081/models/<your_model_name>
Scale up workers based on the kind of load you are expecting. We have kept min-worker as 1 in the registration request above.
curl -v -X PUT "http://localhost:8081/models/<your_model_name>?min_worker=1&synchronous=true"
Do inference using the following curl API call:
curl http://localhost:8080/predictions/<your_model_name> -T <your_input_file>
You can also use the Postman GUI tool for the HTTP request and response.
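As referenced in step 1, here is a minimal sketch of producing the eager-mode .pth checkpoint consumed by --serialized-file. ResNet-18 from torchvision is used purely as a stand-in for your own model; the file name is a placeholder.

```python
# Minimal sketch: save the .pth checkpoint used by --serialized-file for an eager model.
# ResNet-18 is a stand-in; substitute your own model class (the same class that your
# --model-file <your_model_file>.py must define).
import torch
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")   # or load/train your own model
model.eval()

# With --model-file provided, the default handlers instantiate the class from that file
# and then load this state_dict, so save the state_dict rather than the pickled module.
torch.save(model.state_dict(), "resnet18.pth")
```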
Expected outcome
Able to deploy any eager mode model
Able to do inference using deployed model
Deploy pytorch scripted mode model¶
Prerequisites
We assume you have a torchscripted model; if not, follow the instructions in this example to save your eager mode model as a scripted model.
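If you only have an eager mode model, the sketch below shows one common way to produce a TorchScript file with torch.jit. ResNet-18 and the 224x224 example input are placeholders; the linked example covers this in more detail.

```python
# Minimal sketch: convert an eager mode model to TorchScript and save it as the .pt file
# used by --serialized-file in the steps below. ResNet-18 and the example input shape are
# placeholders; use your own model and a representative input.
import torch
from torchvision.models import resnet18

model = resnet18(weights="IMAGENET1K_V1")
model.eval()

example_input = torch.rand(1, 3, 224, 224)
scripted = torch.jit.trace(model, example_input)   # torch.jit.script(model) also works
torch.jit.save(scripted, "resnet18_scripted.pt")
```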
Steps to deploy your model(s)
Create MAR file for torch scripted model:
torch-model-archiver --model-name <your_model_name> --version 1.0 --serialized-file <your_model_name>.pt --extra-files ./index_to_name.json --handler <default_handler>
mkdir model-store
mv <your_model_name>.mar model-store/
Docker - It is possible to build the MAR file directly on docker; refer to this for details.
Place the MAR file in a new directory and name it model-store (this can be any name).
Docker - Make sure that the MAR file is copied into the volume/directory shared while starting the torchserve docker image.
Start torchserve with the following command:
torchserve --start --ncs --model-store <model_store or your_model_store_dir>
Docker - This is not applicable.
Register the model, i.e. the MAR file created in step 1 above:
curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=<your_model_name>.mar"
Check if the model has been registered successfully:
curl http://localhost:8081/models/<your_model_name>
Scale up workers based on the kind of load you are expecting. We have kept min-worker as 1 in the registration request above.
curl -v -X PUT "http://localhost:8081/models/<your_model_name>?min_worker=1&synchronous=true"
Do inference using the following curl API call:
curl http://localhost:8080/predictions/<your_model_name> -T <your_input_file>
You can also use the Postman GUI tool for the HTTP request and response.
Expected outcome
Able to deploy any scripted model
Able to do inference using deployed model
Examples
../examples/image_classifier
Serve ready-made models from the torchserve model zoo¶
This use case demonstrates deployment of torch hub based vision models (classifier, object detector, segmenter) available in the torchserve model zoo.
Use these steps to deploy publicly hosted models as well.
Steps to deploy your model(s)
Start torchserve with the following command:
torchserve --start --ncs --model-store <model_store or your_model_store_dir>
Docker - This is not applicable.
Register the model, i.e. a MAR file hosted in the model zoo or at any public URL (a Python sketch follows these steps):
curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=https://<public_url>/<your_model_name>.mar"
Check if the model has been registered successfully:
curl http://localhost:8081/models/<your_model_name>
Scale up workers based on the kind of load you are expecting. We have kept min-worker as 1 in the registration request above.
curl -v -X PUT "http://localhost:8081/models/<your_model_name>?min_worker=1&synchronous=true"
Do inference using the following curl API call:
curl http://localhost:8080/predictions/<your_model_name> -T <your_input_file>
You can also use the Postman GUI tool for the HTTP request and response.
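As a scripted alternative to the curl calls above, here is a hedged sketch that registers a publicly hosted MAR file and scales workers through the management API. The densenet161 URL is the one listed in the model zoo at the time of writing, and the default management port 8081 is assumed; swap in any public MAR URL and model name.

```python
# Hedged sketch: register a publicly hosted MAR file and scale workers via the TorchServe
# management API (default port 8081). The URL and model name "densenet161" are taken from
# the model zoo listing and may change; adjust to the archive you actually use.
import requests

MANAGEMENT = "http://localhost:8081"
mar_url = "https://torchserve.pytorch.org/mar_files/densenet161.mar"

# Register the model with one initial worker and wait until it is ready.
r = requests.post(
    f"{MANAGEMENT}/models",
    params={"url": mar_url, "initial_workers": 1, "synchronous": "true"},
)
print(r.status_code, r.text)

# Scale to more workers once you know the expected load.
r = requests.put(
    f"{MANAGEMENT}/models/densenet161",
    params={"min_worker": 2, "synchronous": "true"},
)
print(r.status_code, r.text)

# Describe the model to confirm registration and worker status.
print(requests.get(f"{MANAGEMENT}/models/densenet161").json())
```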
Expected outcome
Able to deploy any model available in model zoo
Able to do inference using deployed model
Examples
Secure model serving¶
This use case demonstrates torchserve deployment for secure model serving. The example here uses an eager mode model; however, you can also deploy scripted models.
Steps to deploy your model(s)
Create MAR file for torch eager model:
torch-model-archiver --model-name <your_model_name> --version 1.0 --model-file <your_model_file>.py --serialized-file <your_model_name>.pth --handler <default_handler> --extra-files ./index_to_name.json
mkdir model_store
mv <your_model_name>.mar model_store/
Docker - It is possible to build the MAR file directly on docker; refer to this for details.
Place the MAR file in a new directory and name it model-store (this can be any name).
Docker - Make sure that the MAR file is copied into the volume/directory shared while starting the torchserve docker image.
Create a config.properties file with the parameters from option 1 or 2 given in enable SSL (a sample sketch follows these steps).
Start torchserve using the properties file created above:
torchserve --start --ncs --model-store <model_store or your_model_store_dir> --ts-config <your_path>/config.properties
Docker -
docker run --rm -p 127.0.0.1:8443:8443 -p 127.0.0.1:8444:8444 -p 127.0.0.1:8445:8445 -v <local_dir>/model-store:/home/model-server/model-store <your_docker_image> torchserve --model-store=/home/model-server/model-store --ts-config <your_path>/config.properties
Register the model, i.e. the MAR file created in step 1 above:
curl -k -v -X POST "https://localhost:8081/models?initial_workers=1&synchronous=true&url=https://<s3_path>/<your_model_name>.mar"
Check if the model has been registered successfully:
curl -k https://localhost:8081/models/<your_model_name>
Scale up workers based on the kind of load you are expecting. We have kept min-worker as 1 in the registration request above.
curl -k -v -X PUT "https://localhost:8081/models/<your_model_name>?min_worker=1&synchronous=true"
Do inference using the following curl API call:
curl -k https://localhost:8080/predictions/<your_model_name> -T <your_input_file>
You can also use the Postman GUI tool for the HTTP request and response.
NOTICE the use of https and the -k option in the curl commands. In place of -k, you can use other options such as --key etc. if you have the required key.
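For reference, here is a config.properties sketch for the plain certificate/key option described in enable SSL. The bind addresses, ports and file paths are placeholders, and the keystore-based option from the same doc works just as well.

```
# Sketch of a config.properties for SSL (certificate/key option from the SSL docs).
# Addresses, ports and paths are placeholders; adjust to your certificate and setup.
inference_address=https://127.0.0.1:8443
management_address=https://127.0.0.1:8444
metrics_address=https://127.0.0.1:8445
private_key_file=/path/to/mykey.key
certificate_file=/path/to/mycert.pem
```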
Expected outcome
Able to deploy torchserve and access APIs over HTTPS
Examples/Reference
https://github.com/pytorch/serve/blob/master/docs/configuration.md#enable-ssl
Serve models on GPUs¶
This use case demonstrates torchserve deployment on GPU. The example here uses a scripted mode model; however, you can also deploy eager models.
Prerequisites
We assume you have a torchscripted model; if not, follow the instructions in this example to save your eager mode model as a scripted model.
Steps to deploy your model(s)
Create MAR file for torch scripted model:
torch-model-archiver --model-name <your_model_name> --version 1.0 --serialized-file <your_model_name>.pt --extra-files ./index_to_name.json --handler <default_handler>
mkdir model-store
mv <your_model_name>.mar model-store/
Docker - It is possible to build the MAR file directly on docker; refer to this for details.
Move the MAR file into a new directory and name it model-store.
Docker - Make sure that the MAR file is copied into the volume/directory shared while starting the torchserve docker image.
The torchserve start command in the following instruction will automatically detect GPUs and use them for loading/serving models. If you want to limit GPU usage, use nvidia-smi to determine the number of GPUs and their corresponding ids. Once you have the GPU details, you can add the number_of_gpu param to config.properties and use the second command given in the next instruction, e.g. number_of_gpu=2 (see the sketch after these steps).
Start torchserve with all GPUs:
torchserve --start --ncs --model-store <model_store or your_model_store_dir>
With restricted GPUs:
torchserve --start --ncs --model-store <model_store or your_model_store_dir> --ts-config <your_path>/config.properties
Docker - For all GPUs:
docker run --rm -it --gpus all -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 torchserve:gpu-latest
For GPUs 1 and 2:
docker run --rm -it --gpus '"device=1,2"' -p 8080:8080 -p 8081:8081 pytorch/torchserve:latest-gpu
Docker - For details, refer to start gpu container.
Register the model, i.e. the MAR file created in step 1 above:
curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=<your_model_name>.mar"
Check if the model has been registered successfully:
curl http://localhost:8081/models/<your_model_name>
The response includes a flag indicating that the model has been loaded on GPU.
Scale up workers based on the kind of load you are expecting. We have kept min-worker as 1 in the registration request above.
curl -v -X PUT "http://localhost:8081/models/<your_model_name>?min_worker=1&synchronous=true"
Do inference using the following curl API call:
curl http://localhost:8080/predictions/<your_model_name> -T <your_input_file>
You can also use the Postman GUI tool for the HTTP request and response.
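As referenced in the start-up step above, here is a small sketch for checking how many GPUs PyTorch can see before deciding on a number_of_gpu value; nvidia-smi on the shell reports the same information along with the device ids.

```python
# Quick, purely illustrative check of visible GPUs before setting number_of_gpu in
# config.properties. nvidia-smi gives the same information plus utilization details.
import torch

if torch.cuda.is_available():
    count = torch.cuda.device_count()
    print(f"{count} GPU(s) visible")
    for i in range(count):
        print(f"  id {i}: {torch.cuda.get_device_name(i)}")
else:
    print("No GPU visible; torchserve will fall back to CPU workers")
```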
Expected outcome
Able to deploy any model to GPU
Able to do inference using deployed model
Serve custom models with no third party dependency¶
This use case demonstrates torchserve deployment for custom models with no python dependencies apart from pytorch and related libs. The example here uses a scripted mode model; however, you can also deploy eager models.
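Before the steps, here is a hedged sketch of what a minimal custom handler file can look like when it relies only on pytorch and related libs. It subclasses the stock BaseHandler and overrides pre/post-processing; everything in it other than BaseHandler is a placeholder for your own logic.

```python
# Minimal custom handler sketch (no third party dependency beyond pytorch/related libs).
# BaseHandler ships with torchserve; the preprocess/postprocess bodies are placeholders
# that you replace with logic matching your model's inputs and outputs.
import torch
from ts.torch_handler.base_handler import BaseHandler


class MyCustomHandler(BaseHandler):
    def preprocess(self, data):
        # "data" is a list of requests; each body arrives under "data" or "body".
        # Assumption: each request sends a JSON list of numbers; adapt for images/text.
        inputs = []
        for row in data:
            payload = row.get("data") or row.get("body")
            inputs.append(torch.as_tensor(payload, dtype=torch.float32))
        return torch.stack(inputs)

    def postprocess(self, inference_output):
        # Must return one entry per request in the batch.
        return inference_output.tolist()
```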
Prerequisites
We assume you have a torchscripted model; if not, follow the instructions in this example to save your eager mode model as a scripted model.
Steps to deploy your model(s)
Create MAR file for torch scripted model:
torch-model-archiver --model-name <your_model_name> --version 1.0 --serialized-file <your_model_name>.pt --extra-files ./index_to_name.json --handler <path/to/your_custom_handler_py_file>
mkdir model-store
mv <your_model_name>.mar model-store/
Docker - It is possible to build the MAR file directly on docker; refer to this for details.
Place the MAR file in a new directory and name it model-store (this can be any name).
Docker - Make sure that the MAR file is copied into the volume/directory shared while starting the torchserve docker image.
Start torchserve with the following command:
torchserve --start --ncs --model-store <model_store or your_model_store_dir>
Docker - This is not applicable.
Register the model, i.e. the MAR file created in step 1 above:
curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=<your_model_name>.mar"
Check if the model has been registered successfully:
curl http://localhost:8081/models/<your_model_name>
Scale up workers based on the kind of load you are expecting. We have kept min-worker as 1 in the registration request above.
curl -v -X PUT "http://localhost:8081/models/<your_model_name>?min_worker=1&synchronous=true"
Do inference using the following curl API call:
curl http://localhost:8080/predictions/<your_model_name> -T <your_input_file>
You can also use the Postman GUI tool for the HTTP request and response.
Expected outcome
Able to deploy any model with custom handler
Examples
Serve custom models with third party dependency¶
This use case demonstrates torchserve deployment for custom models with python dependencies beyond pytorch and related libs. The example here uses a scripted mode model; however, you can also deploy eager models.
Prerequisites
We assume you have a torchscripted model; if not, follow the instructions in this example to save your eager mode model as a scripted model.
Steps to deploy your model(s)
Create <your_custom_handler_py_file> which uses a third party python package such as fairseq for pretrained NMT models (see the sketch after these steps).
Create a requirements.txt file with an entry for the fairseq python package name in it.
Create MAR file for torch scripted model with requirements.txt:
torch-model-archiver --model-name <your_model_name> --version 1.0 --serialized-file <your_model_name>.pt --extra-files ./index_to_name.json --handler <path/to/your_custom_handler_py_file> --requirements-file <your_requirements_txt>
mkdir model-store
mv <your_model_name>.mar model-store/
Docker - It is possible to build the MAR file directly on docker; refer to this for details.
Place the MAR file in a new directory and name it model-store (this can be any name).
Docker - Make sure that the MAR file is copied into the volume/directory shared while starting the torchserve docker image.
Add the following parameter to the config.properties file: install_py_dep_per_model=true. For details, refer to Allow model specific custom python packages.
Start torchserve with the following command and the config.properties file:
torchserve --start --ncs --model-store <model_store or your_model_store_dir> --ts-config <your_path>/config.properties
Docker -
docker run --rm -p 127.0.0.1:8080:8080 -p 127.0.0.1:8081:8081 -v <local_dir>/model-store:/home/model-server/model-store <your_docker_image> torchserve --model-store=/home/model-server/model-store --ts-config <your_path>/config.properties
Register the model, i.e. the MAR file created in step 1 above:
curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=<your_model_name>.mar"
Check if the model has been registered successfully:
curl http://localhost:8081/models/<your_model_name>
Scale up workers based on the kind of load you are expecting. We have kept min-worker as 1 in the registration request above.
curl -v -X PUT "http://localhost:8081/models/<your_model_name>?min_worker=1&synchronous=true"
Do inference using the following curl API call:
curl http://localhost:8080/predictions/<your_model_name> -T <your_input_file>
You can also use the Postman GUI tool for the HTTP request and response.
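To illustrate the handler referred to in step 1 (see the forward reference there), here is a hedged sketch of a custom handler whose initialize step pulls in a third party package installed from requirements.txt. The torch.hub repo and model names and the translate() call are indicative only; take the exact fairseq interface from its own documentation.

```python
# Hedged sketch of a custom handler that depends on a third party package declared in
# requirements.txt. The fairseq hub model name and translate() call are indicative only;
# check the fairseq docs for the exact interface before relying on them.
import torch
from ts.torch_handler.base_handler import BaseHandler


class NMTHandler(BaseHandler):
    def initialize(self, context):
        # The third party model is loaded inside the worker, where
        # install_py_dep_per_model=true made the requirements.txt packages available.
        self.model = torch.hub.load(
            "pytorch/fairseq",                       # indicative repo / model names
            "transformer.wmt19.en-de.single_model",
            tokenizer="moses",
            bpe="fastbpe",
        )
        self.model.eval()
        self.initialized = True

    def preprocess(self, data):
        # Assumption: each request sends plain UTF-8 text to translate.
        texts = []
        for row in data:
            payload = row.get("data") or row.get("body")
            if isinstance(payload, (bytes, bytearray)):
                payload = payload.decode("utf-8")
            texts.append(payload)
        return texts

    def inference(self, inputs):
        return [self.model.translate(sentence) for sentence in inputs]

    def postprocess(self, outputs):
        # One entry per request in the batch.
        return outputs
```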
Expected outcome
Able to deploy any model with custom handler having third party python dependency
Examples and References
Serve models for AB testing¶
This use case demonstrates serving two or more versions of the same model using the version API. It is an extension of any of the above use cases.
Prerequisites
You have followed any of the above procedures and have a working torchserve setup with torch-model-archiver installed.
Steps to deploy your model(s)
Create a model [i.e. mar file] with version 1.0 or as per requirement. Follow the steps given above to create the model file, e.g.
torch-model-archiver --model-name <your-model-name-X> --version 1.0 --model-file model.py --serialized-file <your_model>.pth --extra-files index_to_name.json --handler <your_handler>.py
Create another model [i.e. mar file] with version 2.0 or as per requirement, e.g.
torch-model-archiver --model-name <your-model-name-X> --version 2.0 --model-file model.py --serialized-file <your_model>.pth --extra-files index_to_name.json --handler <your_handler>.py
Register both these models with an initial worker. If you want, you can increase workers by using the update api.
curl -v -X POST "http://localhost:8081/models?initial_workers=1&synchronous=true&url=<your-model-name-X>.mar"
Now you will be able to invoke these models as
Model version 1.0
curl http://localhost:8081/models/<your-model-name-X>/1.0
OR
curl http://localhost:8080/predictions/<your-model-name-X>/1.0 -F "data=@kitten.jpg"
Model version 2.0
curl http://localhost:8081/models/<your-model-name-X>/2.0
OR
curl http://localhost:8080/predictions/<your-model-name-X>/2.0 -F "data=@kitten.jpg"
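A small sketch of a client that splits traffic between the two registered versions, using the versioned prediction endpoints shown above; the 80/20 split, the model name and the kitten.jpg input are all placeholders.

```python
# Hedged sketch: send a fraction of traffic to each registered version of the same model
# via the versioned prediction endpoint. Split ratio, model name and input are placeholders.
import random
import requests

MODEL = "your-model-name-X"              # hypothetical; match the name you registered
VERSIONS = [("1.0", 0.8), ("2.0", 0.2)]  # 80% of requests to v1.0, 20% to v2.0


def predict(image_path: str):
    version = random.choices(
        [v for v, _ in VERSIONS], weights=[w for _, w in VERSIONS], k=1
    )[0]
    with open(image_path, "rb") as f:
        resp = requests.post(
            f"http://localhost:8080/predictions/{MODEL}/{version}",
            files={"data": f},           # same form field as curl -F "data=@kitten.jpg"
        )
    return version, resp.json()


if __name__ == "__main__":
    print(predict("kitten.jpg"))
```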
Expected outcome
Able to deploy multiple versions of same model
Examples and References