Changelog#
0.14.0#
new: TensorRT INT8 and FP8 quantization through ModelOpt (ONNX path)
new: TensorRT NVFP4 quantization through ModelOpt (Torch path)
new: Improved TorchCompile performance for repeated compilations using the TORCHINDUCTOR_CACHE_DIR environment variable (see the sketch below)
new: Global context with scoped variables - temporary context variables
new: Added new context variables INPLACE_OPTIMIZE_WORKSPACE_CONTEXT_KEY and INPLACE_OPTIMIZE_MODULE_GRAPH_ID_CONTEXT_KEY
new: nav.bundle.save now accepts include and exclude patterns for fine-grained file selection (see the sketch below)
new: GPU and Host memory usage logging
change: Install the TensorRT package for architectures other than x86_64
change: Disable conversion fallback for TensorRT paths and expose control option in custom config
change: Use torch.export.save for Torch-TRT model serialization
change: Added export_engine to OnnxConfig for improved export control
fix: Correctness command relative tolerance formula
fix: Memory management during export and conversion process for Torch
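A minimal sketch of the two knobs above, assuming glob-style patterns and a bundle path argument (neither is spelled out in these entries):

```python
import os

# Reuse TorchInductor artifacts across repeated torch.compile runs.
os.environ["TORCHINDUCTOR_CACHE_DIR"] = "/tmp/torchinductor_cache"

import model_navigator as nav

# Fine-grained file selection for bundles; destination path and
# glob-style pattern semantics are illustrative assumptions.
nav.bundle.save(
    "optimized_bundle.nav",
    include=["**/*.plan", "**/status.yaml"],  # keep engines and statuses
    exclude=["**/*.log"],                     # drop logs
)
```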
Version of external components used during testing:
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.13.1#
fix: Add AutocastType to public API
Version of external components used during testing:
Polygraphy: 0.49.13
GraphSurgeon: 0.5.2
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.13.0#
new: Introduced custom_args in TensorConfig for custom runners, which allows dynamic shapes setup for TorchTensorRT compilation
new: autocast_dtype added to Torch runner configuration to set the dtype for autocast (see the sketch below)
new: Updated ONNX Runtime to 1.20 for Python >= 3.10
new: Use torch.compile path in heuristic search for max batch size
change: Removed TensorFlow dependencies for nav.jax.optimize
change: Removed PyTorch dependencies from nav.profile
change: Collect all Python packages in status instead of filtered list
change: Use default throughput cutoff threshold for max batch size heuristic when None provided in configuration
change: Updated default ONNX opset to 20 for Torch >= 2.5
fix: Exception is raised with Python >=3.11 due to wrong dataclass initialization
fix: Removed an option from ExportOptions that was dropped in Torch 2.5
fix: Improved preprocessing stage in Torch based runners
fix: Warn when using autocast with bfloat16 in Torch
fix: Pass runner configuration to runners in nav.profile
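The autocast_dtype option maps onto PyTorch's autocast context; a plain torch-level sketch of the behaviour being configured (where exactly Model Navigator takes the option is not shown here):

```python
import torch

model = torch.nn.Linear(16, 8).cuda()
x = torch.randn(4, 16, device="cuda")

# Equivalent of configuring a Torch runner with autocast_dtype=torch.bfloat16:
# run the forward pass under an autocast context with the requested dtype.
with torch.autocast(device_type="cuda", dtype=torch.bfloat16):
    y = model(x)

print(y.dtype)  # torch.bfloat16 for autocast-eligible ops such as matmul
```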
Version of external components used during testing:
Polygraphy: 0.49.13
GraphSurgeon: 0.5.2
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.12.0#
new: simple and detailed reporting of the optimization process
new: adjusted exporting TensorFlow SavedModel for Keras 3.x
new: inform the user when a wrapped module is not called during optimize
new: inform user when module uses a custom forward function
new: support for dynamic shapes in Torch ExportedProgram (see the sketch below)
new: use ExportedProgram for Torch-TensorRT conversion
new: support back-off policy during profiling to avoid reporting local minimum
new: automatically scale conversion batch size when modules have different batch sizes within a single pipeline
change: TensorRT conversion max batch size search relies on saturating throughput for base formats
change: adjusted profiling configuration for throughput cutoff search
change: include optimized pipeline in the list of examined variants during nav.profile
change: performance testing is not executed when correctness failed for a format and runtime
change: verify command is not executed when verify function is not provided
change: do not create a model copy before executing torch.compile
fix: pipelines sometimes obtain model and tensors on different devices during nav.profile
fix: extract graph from ExportedProgram for running inference
fix: runner configuration not propagated to pre-processing steps
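For reference, dynamic shapes in a Torch ExportedProgram look roughly like this at the torch.export level; this is a plain PyTorch sketch, not Model Navigator-specific code:

```python
import torch
from torch.export import Dim, export

class MatMul(torch.nn.Module):
    def forward(self, x):
        return x @ x.transpose(-1, -2)

# Mark dim 0 of input "x" as a symbolic batch dimension.
batch = Dim("batch", min=2, max=64)
exported = export(
    MatMul(),
    (torch.randn(8, 16, 16),),
    dynamic_shapes={"x": {0: batch}},
)
print(exported)  # graph with a symbolic batch size
```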
Version of external components used during testing:
Polygraphy: 0.49.12
GraphSurgeon: 0.5.2
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.11.0#
new: Python 3.12 support
new: Improved logging
new: optimized in-place module can be stored to Triton model repository
new: multi-profile support for TensorRT model build and runtime
new: measure duration of each command executed in optimization pipeline
new: TensorRT-LLM model store generation for deployment on Triton Inference Server
change: filter unsupported runners instead of raising an error when running optimize
change: moved JAX support to the experimental module and limited its scope
change: use autocast=True for Torch based runners
change: use torch.inference_mode or torch.no_grad context in nav.profile measurements (see the sketch below)
change: use multiple strategies to select the optimized runtime, defaults to [MaxThroughputAndMinLatencyStrategy, MinLatencyStrategy]
change: trt_profiles are not set automatically for a module when using nav.optimize
fix: properly revert log level after torch onnx dynamo export
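The two measurement contexts named above are plain PyTorch; a minimal sketch of the difference, shown only to illustrate what nav.profile wraps its measurements in:

```python
import torch

model = torch.nn.Linear(16, 8)
x = torch.randn(4, 16)

# torch.no_grad: disables gradient tracking inside the block.
with torch.no_grad():
    y1 = model(x)

# torch.inference_mode: stricter and slightly faster; tensors created
# inside can never be used in autograd afterwards.
with torch.inference_mode():
    y2 = model(x)
```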
Version of external components used during testing:
Polygraphy: 0.49.10
GraphSurgeon: 0.5.2
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.10.1#
fix: Check if torch 2 is available before doing dynamo cleanup
Version of external components used during testing:
Polygraphy: 0.49.10
GraphSurgeon: 0.5.2
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.10.0#
new: inplace nav.Module accepts a batching flag, which overrides a config setting, and precision, which allows setting appropriate configuration for TensorRT (see the sketch below)
new: Allow setting the device when loading optimized modules using nav.load_optimized()
new: Add support for custom i/o names and dynamic shapes in Torch ONNX Dynamo path
new: Added nav.bundle.save and nav.bundle.load to save and load optimized models from cache
change: Improved optimize and profile status in inplace mode
change: Improved handling of defaults for ONNX Dynamo when executing nav.package.optimize
fix: Maintaining modules device in nav.profile()
fix: Add support for all precisions for TensorRT in nav.profile()
fix: Forward method not passed to other inplace modules.
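A hedged sketch of the per-module overrides and device-aware loading; batching, precision, and the device selection are named in the entries above, while the exact keyword shapes are assumptions:

```python
import model_navigator as nav
import torch

model = torch.nn.Linear(16, 8)

# Per-module overrides of the global config; keyword names follow the
# flags named in the changelog entries, values are illustrative.
wrapped = nav.Module(model, name="linear", batching=True, precision="fp16")

# ... after optimization has produced optimized variants ...

# Load optimized modules onto an explicit device (argument name assumed).
nav.load_optimized(device="cuda:0")
```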
Version of external components used during testing:
Polygraphy: 0.49.10
GraphSurgeon: 0.5.2
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.9.0#
new: TensorRT Timing Tactics Cache Management - using timing tactics cache files for optimization performance improvements
new: Added throughput saturation verification in nav.profile() (enabled by default)
new: Allow overriding the Inplace cache dir through the MODEL_NAVIGATOR_DEFAULT_CACHE_DIR env variable (see the sketch below)
new: inplace nav.Module can now receive a function name to be used instead of __call__ in modules/submodules, which allows customizing modules with non-standard calls
fix: torch dynamo export and torch dynamo onnx export
fix: measurement stabilization in nav.profile()
fix: inplace inference through Torch
fix: trt_profiles argument handling in ONNX to TRT conversion
fix: optimal shape configuration for batch size in Inplace API
change: Disable TensorRT profile builder
change: nav.optimize() does not override module configuration
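A minimal sketch of the cache-dir override and the custom call name; the environment variable is named above, while the keyword used to hand the function name to nav.Module is hypothetical:

```python
import os

# Override the Inplace cache dir; set before Model Navigator reads it.
os.environ["MODEL_NAVIGATOR_DEFAULT_CACHE_DIR"] = "/data/nav_cache"

import model_navigator as nav
import torch

class Generator(torch.nn.Module):
    def generate(self, x):  # non-standard entry point instead of __call__
        return x * 2

# The keyword below is hypothetical; the entry only states that nav.Module
# "can receive a function name to be used instead of __call__".
wrapped = nav.Module(Generator(), name="generator", forward_func="generate")
```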
Version of external components used during testing:
Polygraphy: 0.49.4
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.8.1#
fix: Inference with TensorRT when model has input with empty shape
fix: Using stabilized runners when model has no batching
fix: Invalid dependencies for cuDNN - review known issues
fix: Make ONNX Graph Surgeon produce artifacts within the protobuf limit (2G)
change: Remove TensorRTCUDAGraph from default runners
change: updated ONNX package to 1.16
Version of external components used during testing:
Polygraphy: 0.49.4
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.8.0#
new: Allow selecting device for TensorRT runner
new: Add device output buffers to TensorRT runner
new: nav.profile added for profiling any Python function (see the sketch below)
change: API for Inplace optimization (breaking change)
fix: Passing inputs for Torch to ONNX export
fix: Parse args to kwargs in torchscript-trace export
fix: Lower peak memory usage when loading Torch inplace optimized model
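A hedged sketch of profiling an arbitrary Python function; the callable-plus-dataloader form is inferred from the entry above rather than from a documented signature:

```python
import model_navigator as nav
import torch

model = torch.nn.Linear(16, 8).eval()

def infer(batch):
    # Any Python callable can be profiled, not only wrapped modules.
    with torch.inference_mode():
        return model(batch)

# Dataloader as a plain list of sample batches (format assumed).
dataloader = [torch.randn(4, 16) for _ in range(10)]

nav.profile(infer, dataloader)
```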
Version of external components used during testing:
Polygraphy: 0.49.4
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.7.7#
change: Add input and output specs for Triton model repositories generated from packages
Version of external components used during testing:
Polygraphy: 0.49.0
GraphSurgeon: 0.3.27
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.7.6#
fix: Passing inputs for Torch to ONNX export
fix: Passing input data to OnnxCUDA runner
Version of external components used during testing:
Polygraphy: 0.49.0
GraphSurgeon: 0.3.27
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.7.5#
new: FP8 precision support for TensorRT
new: Support for autocast and inference mode configuration for Torch runners
new: Allow selecting device for Torch and ONNX runners
new: Add support for default_model_filename in Triton model configuration
new: Detailed profiling of inference steps (pre- and postprocessing, memcpy and compute)
fix: JAX export and TensorRT conversion fail when a custom workspace is used
fix: Missing max workspace size passed to TensorRT conversion
fix: Execution of TensorRT optimize raised an error while handling output metadata
fix: Limited Polygraphy version to work correctly with onnxruntime-gpu package
Version of external components used during testing:
Polygraphy: 0.49.0
GraphSurgeon: 0.3.27
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.7.4#
new: decoupled mode configuration in Triton Model Config
new: support for PyTorch ExportedProgram and ONNX dynamo export
new: added GraphSurgeon ONNX optimization
fix: compatibility of generating PyTriton model config through adapter
fix: installation of packages that are platform dependent
fix: update package config with model loaded from source
change: in TensorRT runner, when TensorType.TORCH is the return type, lazily convert the tensor to Torch
change: move from Polygraphy CLI to Polygraphy Python API
change: removed Windows from support list
Version of external components used during testing:
Polygraphy: 0.49.0
GraphSurgeon: 0.3.27
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.7.3#
new: Data dependent dynamic control flow support in nav.Module (multiple computation graphs per module; see the sketch below)
new: Added find max batch size utility
new: Added utilities API documentation
new: Add Timer class for measuring execution time of models and Inplace modules.
fix: Use wide range of shapes for TensorRT conversion
fix: Sorting of samples loaded from workspace
change: in Inplace, store one sample by default per module and store shape info for all samples
change: always execute export for all supported formats
Known issues and limitations:
nav.Module moves the original torch.nn.Module to the CPU; in case of weight sharing, that might result in unexpected behaviour
For data dependent dynamic control flow (multiple computation graphs) nav.Module might copy the weights for each separate graph
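A toy illustration of the data-dependent control flow referred to above; the module is plain PyTorch, and wrapping it in nav.Module follows the entry, with one computation graph expected per taken branch:

```python
import model_navigator as nav
import torch

class Gated(torch.nn.Module):
    def __init__(self):
        super().__init__()
        self.small = torch.nn.Linear(16, 8)
        self.large = torch.nn.Linear(16, 8)

    def forward(self, x):
        # Data-dependent branch: the taken path depends on input values,
        # so each branch produces a separate computation graph.
        if x.abs().mean() > 1.0:
            return self.large(x)
        return self.small(x)

wrapped = nav.Module(Gated(), name="gated")
```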
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.27
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.7.2#
fix: Obtaining inputs names from ONNX file for TensorRT conversion
change: Raise an exception instead of exiting with an error code when a required command has failed
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.27
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.7.1#
fix: gather onnx input names based on model’s forward signature
fix: do not run TensorRT max batch size search when max batch size is None
fix: use pytree metadata to flatten torch complex outputs
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.27
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.7.0#
new: Inplace Optimize feature - optimize models directly in the Python code (see the sketch below)
new: Non-tensor inputs and outputs support
new: Model warmup support in Triton model configuration
new: nav.tensorrt.optimize API added for testing and measuring performance of TensorRT models
new: Extended custom configs to pass arguments directly to export and conversion operations like torch.onnx.export or polygraphy convert
new: Collect GPU clock during model profiling
new: Add option to configure minimal trials and stabilization windows for performance verification and profiling
change: Navigator package version changed to 0.2.3. Custom configurations now use a trt_profiles list instead of a single value
change: Store separate reproduction scripts for runners used during correctness and profiling
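A hedged sketch of the Inplace Optimize flow introduced here: wrap a module in place, run the pipeline on a few samples, then optimize. The names follow the entries in this changelog; the exact call shapes are assumptions, not a documented recipe:

```python
import model_navigator as nav
import torch

pipeline = torch.nn.Sequential(torch.nn.Linear(16, 32), torch.nn.ReLU())

# Wrap a submodule in place; surrounding code keeps calling it as before.
pipeline[0] = nav.Module(pipeline[0], name="linear")

def call(batch):
    return pipeline(batch)

samples = [torch.randn(4, 16) for _ in range(5)]

# Record samples and optimize the wrapped modules in place.
nav.optimize(call, samples)
```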
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.27
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.6.3#
fix: Conditional imports of supported frameworks in export commands
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.26
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.6.2#
new: Collect information about TensorRT shapes used during conversion
fix: Invalid link in documentation
change: Improved documentation rendering
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.26
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.6.1#
fix: Add model from package to Triton model store with custom configs
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.26
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.6.0#
new: Zero-copy runners for Torch, ONNX and TensorRT - omit H2D and D2H memory copies between runner executions
new: nav.package.profile API method to profile generated models on a provided dataloader (see the sketch below)
change: ProfilerConfig replaced with OptimizationProfile:
new: OptimizationProfile impacts the conversion for TensorRT
new: batch_sizes and max_batch_size limit the max profile in TensorRT conversion
new: Allow providing a separate dataloader for profiling - first sample used only
new: allow running nav.package.optimize on an empty package - status generation only
new: use torch.inference_mode for inference runner when PyTorch 2.x is available
fix: Missing model in config when passing package generated during nav.{framework}.optimize directly to nav.package.optimize command
Other minor fixes and improvements
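A hedged sketch of profiling an existing package; nav.package.profile is named above, while nav.package.load and the argument shapes are assumptions:

```python
import model_navigator as nav
import torch

# Load a previously generated package (path illustrative; loading API assumed).
package = nav.package.load("linear.nav")

# Profile generated models on a provided dataloader; per the entry above,
# only the first sample of a separate profiling dataloader is used.
dataloader = [torch.randn(1, 16)]
nav.package.profile(package, dataloader=dataloader)
```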
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.26
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.5.6#
fix: Load samples as sorted to keep valid order
fix: Execute conversion when model already exists in path
Other minor fixes and improvements
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.26
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.5.5#
new: Public nav.utilities module with UnpackedDataloader wrapper
new: Added support for strict flag in Torch custom config
new: Extended TensorRT custom config to support builder optimization level and hardware compatibility flags
fix: Invalid optimal shape calculation for odd values in max batch size
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.26
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.5.4#
new: Custom implementation for ONNX and TensorRT runners
new: Use CUDA 12 for JAX in unit tests and functional tests
new: Step-by-step examples
new: Updated documentation
new: TensorRTCUDAGraph runner introduced with support for CUDA graphs
fix: Optimal shape not set correctly during adaptive conversion
fix: Find max batch size command for JAX
fix: Save stdout to logfiles in debug mode
Version of external components used during testing:
Polygraphy: 0.47.1
GraphSurgeon: 0.3.26
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.5.3#
fix: filter outputs using output_metadata in ONNX runners
Version of external components used during testing:
Polygraphy: 0.44.2
GraphSurgeon: 0.3.26
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.5.2#
new: Added Contributor License Agreement (CLA)
fix: Added missing --extra-index-url to installation instruction for pypi
fix: Updated wheel readme
fix: Do not run TorchScript export when only ONNX in target formats and ONNX extended export is disabled
fix: Log full traceback for ModelNavigatorUserInputError
Version of external components used during testing:
Polygraphy: 0.44.2
GraphSurgeon: 0.3.26
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.5.1#
fix: Using a relative workspace caused an error during ONNX to TensorRT conversion
fix: Added external weight in package for ONNX format
fix: bugfixes for functional tests
Version of external components used during testing:
Polygraphy: 0.44.2
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.5.0#
new: Support for PyTriton deployment
new: Support for Python models with python.optimize API
new: PyTorch 2 compile CPU and CUDA runners
new: Collect conversion max batch size in status
new: PyTorch runners with compile support (see the sketch below)
change: Improved handling of CUDA and CPU runners
change: Reduced the time to find device max batch size by running it once as a separate pipeline
change: Stored find max batch size result in a separate field in status
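The compile runners wrap PyTorch 2's torch.compile; a plain PyTorch sketch of the underlying call, not Model Navigator code:

```python
import torch

model = torch.nn.Linear(16, 8)

# PyTorch 2.x: JIT-compile the module; the first call triggers compilation.
compiled = torch.compile(model)
y = compiled(torch.randn(4, 16))
```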
Version of external components used during testing:
Polygraphy: 0.44.2
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.4.4#
fix: when exporting a single-input model to SavedModel, unwrap the one-element list of inputs
Version of external components used during testing:
Polygraphy: 0.44.2
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.4.3#
fix: in Keras inference use model.predict(tensor) for single input models
Version of external components used during testing:
Polygraphy: 0.44.2
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.4.2#
fix: loading configuration for trt_profile from package
fix: missing reproduction scripts and logs inside package
fix: invalid model path in reproduction script for ONNX to TRT conversion
fix: collecting metadata from ONNX model in main thread during ONNX to TRT conversion
Version of external components used during testing:
Polygraphy: 0.44.2
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.4.1#
fix: use dynamic axes from custom OnnxConfig when specified
Version of external components used during testing:
Polygraphy: 0.43.1
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.4.0#
new: optimize method that replaces export and performs max batch size search and improved profiling during the process
new: Introduced custom configs in optimize for better parametrization of export/conversion commands
new: Support for adding user runners for model correctness and profiling
new: Search for max possible batch size per format during conversion and profiling
new: API for creating Triton model store from Navigator Package and user provided models
change: Improved status structure for Navigator Package
deprecated: Optimize for Triton Inference Server support
deprecated: HuggingFace contrib module
Bug fixes and other improvements
Version of external components used during testing:
Polygraphy: 0.43.1
GraphSurgeon: 0.4.6
Other component versions depend on the used framework containers versions. See its support matrix for a detailed summary.
0.3.8#
Updated NVIDIA containers defaults to 22.11
Version of external components used during testing:
Polygraphy: 0.42.2
GraphSurgeon: 0.3.19
tf2onnx: v1.12.1
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.7#
Updated NVIDIA containers defaults to 22.10
Version of external components used during testing:
Polygraphy: 0.42.2
GraphSurgeon: 0.3.19
tf2onnx: v1.12.1
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.6#
Updated NVIDIA containers defaults to 22.09
Model Navigator Export API:
new: cast int64 input data to int32 in runner for Torch-TensorRT
new: cast 64-bit data samples to 32-bit values for TensorRT
new: verbose flag for logging export and conversion commands to console
new: debug flag to enable debug mode for export and conversion commands
change: logs from commands are streamed to console during command run
change: package load omits the log files and autogenerated scripts
Version of external components used during testing:
Polygraphy: 0.42.2
GraphSurgeon: 0.3.19
tf2onnx: v1.12.1
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.5#
Updated NVIDIA containers defaults to 22.08
Model Navigator Export API:
new: TRTExec runner uses use_cuda_graph=True by default
new: log a warning instead of raising an error when the dataloader dumps inputs with nan or inf values
new: enabled logging for command input parameters
fix: invalid use of Polygraphy TRT profile when trt_dynamic_axes is passed to export function
Version of external components used during testing:
Polygraphy: 0.38.0
GraphSurgeon: 0.3.19
tf2onnx: v1.12.1
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.4#
Updated NVIDIA containers defaults to 22.07
Model Navigator OTIS:
deprecated: TF32 precision for TensorRT from CLI options - will be removed in future versions
fix: TensorFlow module was imported when obtaining model signature during conversion
Model Navigator Export API:
new: Support for building framework containers with Model Navigator installed
new: Example for loading Navigator Package for reproducing the results
new: Create reproducing script for correctness and performance steps
new: TrtexecRunner for correctness and performance tests with trtexec tool
new: Use TF32 support by default for models with FP32 precision
new: Reset conversion parameters to defaults when using load for a package
new: Testing all options for JAX export enable_xla and jit_compile parameters
change: Profiling stability improvements
change: Renamed the onnx_runtimes export function parameter to runtimes
deprecated: TF32 precision for TensorRT from available options - will be removed in future versions
fix: Do not save TF-TRT models to the .nav package
fix: Do not load TF-TRT models from the .nav package
fix: Correctly load .nav packages when _input_names or _output_names specified
fix: Adjust TF and TF-TRT model signatures to match input_names
fix: Save ONNX opset for CLI configuration inside package
fix: Reproduction scripts were missing for failing paths
Version of external components used during testing:
Polygraphy: 0.38.0
GraphSurgeon: 0.3.19
tf2onnx: v1.11.1
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.3#
Model Navigator Export API:
new: Improved handling of inputs and outputs metadata
new: Navigator Package version updated to 0.1.3
new: Backward compatibility with previous versions of Navigator Package
fix: Dynamic shapes for output shapes were read incorrectly
Version of external components used during testing:
Polygraphy: 0.36.2
GraphSurgeon: 0.3.19
tf2onnx: v1.11.1
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.2#
Updated NVIDIA containers defaults to 22.06
Model Navigator OTIS:
new: Perf Analyzer profiling data uses base64 format for content
fix: Signature for TensorRT model when it has uint64 or int64 inputs and/or outputs defined
Model Navigator Export API:
new: Updated navigator package format to 0.1.1
new: Added Model Navigator version to status file
new: Add atol and rtol configuration to CLI config for model
new: Added experimental support for JAX models
new: In case of export or conversion failures, prepare minimal scripts to reproduce errors
fix: Conversion parameters are not stored in Navigator Package for CLI execution
Version of external components used during testing:
Polygraphy: 0.36.2
GraphSurgeon: 0.3.19
tf2onnx: v1.11.1
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.1#
Updated NVIDIA containers defaults to 22.05
Model Navigator OTIS:
fix: Saving paths inside the Triton package status file
fix: Empty list of gpus causes the process to run on CPU only
fix: Reading content from zipped Navigator Package
fix: When no GPU is available or the target device is set to CPU, optimize avoids running unsupported conversions in CLI
new: Converter accepts passing target device kind to select CPU- or GPU-supported conversions
new: Added support for OpenVINO accelerator for ONNXRuntime
new: Added option --config-search-early-exit-enable for Model Analyzer early exit support in manual profiling mode
new: Added option --model-config-name to the select command. It allows picking a particular model configuration for deployment from the set of all configurations generated by Triton Model Analyzer, even if it's not the best performing one.
removed: The --tensorrt-strict-types option has been removed due to deprecation of the functionality in upstream libraries.
Model Navigator Export API:
new: Added dynamic shapes support and trt dynamic shapes support for TensorFlow2 export
new: Improved per format logging
new: PyTorch to Torch-TRT precision selection added
new: Advanced profiling (measurement windows, configurable batch sizes)
Version of external components used during testing:
Polygraphy: 0.36.2
GraphSurgeon: 0.3.19
tf2onnx: v1.10.1
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.3.0#
Updated NVIDIA containers defaults to 22.04
Model Navigator Export API
Support for exporting models from TensorFlow2 and PyTorch source code to supported target formats
Support for conversion from ONNX to supported target formats
Support for exporting HuggingFace models
Conversion, Correctness and performance tests for exported models
Definition of package structure for storing all exported models and additional metadata
Model Navigator OTIS:
change: run command has been deprecated and may be removed in a future release
new: optimize command replaces run and produces an output *.triton.nav package
new: select selects the best-performing configuration from the *.triton.nav package and creates a Triton Inference Server model repository
new: Added support for using shared memory option for Perf Analyzer
Remove wkhtmltopdf package dependency
Version of external components used during testing:
Polygraphy: 0.35.1
GraphSurgeon: 0.3.14
tf2onnx: v1.9.3
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.2.7#
Updated NVIDIA containers defaults to 22.02
Removed support for Python 3.7
Triton Model configuration related:
Support dynamic batching without setting preferred batch size value
Profiling related:
Deprecated --config-search-max-preferred-batch-size flag as it is no longer supported in Triton Model Analyzer
Version of external components used during testing:
Polygraphy: 0.35.1
GraphSurgeon: 0.3.14
tf2onnx: v1.9.3
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
0.2.6#
Updated NVIDIA containers defaults to 22.01
Removed support for Python 3.6 due to EOL
Conversion related:
Added support for Torch-TensorRT conversion
Fixes and improvements
Processes inside containers started by Model Navigator now run without root privileges
Fix for volume mounts while running Triton Inference Server in container from other container
Fix for conversion of models without file extension on input and output paths
Fix using --model-format argument when input and output files have no extension
Version of external components used during testing:
Polygraphy: 0.35.1
GraphSurgeon: 0.3.14
tf2onnx: v1.9.3
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations
missing support for stateful models (e.g., time-series models)
no verification of conversion results for conversions: TF -> ONNX, TF -> TF-TRT, TorchScript -> ONNX
only a single profile can be defined for TensorRT
no custom ops support
Triton Inference Server stays in the background when the profile process is interrupted by the user
TF-TRT conversion loses output shapes info
0.2.5#
Updated NVIDIA containers defaults to 21.12
Conversion related:
[Experimental] TF-TRT - fixed default dataset profile generation
Configuration Model on Triton related
Fixed name for onnxruntime backend in Triton model deployment configuration
Version of external components used during testing:
Polygraphy: 0.33.1
GraphSurgeon: 0.3.14
tf2onnx: v1.9.3
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations
missing support for stateful models (e.g., time-series models)
no verification of conversion results for conversions: TF -> ONNX, TF -> TF-TRT, TorchScript -> ONNX
only a single profile can be defined for TensorRT
no custom ops support
Triton Inference Server stays in the background when the profile process is interrupted by the user
TF-TRT conversion loses output shapes info
0.2.4#
Updated NVIDIA containers defaults to 21.10
Fixed generating profiling data when dtypes are not passed
Conversion related:
[Experimental] Added support for TF-TRT conversion
Configuration Model on Triton related
Added possibility to select batching mode - default, dynamic and disabled options supported
Install dependencies from pip packages instead of wheels for Polygraphy and Triton Model Analyzer
fixes and improvements
Version of external components used during testing:
Polygraphy: 0.33.1
GraphSurgeon: 0.3.14
tf2onnx: v1.9.3
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations
missing support for stateful models (e.g., time-series models)
no verification of conversion results for conversions: TF -> ONNX, TF -> TF-TRT, TorchScript -> ONNX
only a single profile can be defined for TensorRT
no custom ops support
Triton Inference Server stays in the background when the profile process is interrupted by the user
TF-TRT conversion loses output shapes info
0.2.3#
Updated NVIDIA containers defaults to 21.09
Improved naming of arguments specific to TensorRT conversion and acceleration, with backward compatibility
Use pip package for Triton Model Analyzer installation with minimal version 1.8.0
Fixed model_repository path so it is not relative to the <navigator_workspace> dir
Handle exit codes correctly from CLI commands
Support for using device ids for the --gpus argument
Conversion related
Added support for precision modes to support multiple precisions during conversion to TensorRT
Added --tensorrt-sparse-weights flag for sparse weight optimization for TensorRT
Added --tensorrt-strict-types flag forcing it to choose tactics based on the layer precision for TensorRT
Added --tensorrt-explicit-precision flag enabling explicit precision mode
Fixed nan values appearing in relative tolerance during conversion to TensorRT
Configuration Model on Triton related
Removed default value for engine_count_per_device
Added possibility to define Triton Custom Backend parameters with triton_backend_parameters command
Added possibility to define max workspace size for TensorRT backend accelerator using argument tensorrt_max_workspace_size
Profiling related
Added config_search prefix to all profiling parameters (BREAKING CHANGE)
Added config_search_max_preferred_batch_size parameter
Added config_search_backend_parameters parameter
fixes and improvements
Versions of used external components:
Polygraphy: 0.32.0
GraphSurgeon: 0.3.13
tf2onnx: v1.9.2 (support for ONNX opset 14, tf 1.15 and 2.6)
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations
missing support for stateful models (e.g., time-series models)
missing support for models without batching support
no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
only a single profile can be defined for TensorRT
0.2.2#
Updated NVIDIA containers defaults to 21.08
Versions of used external components:
Triton Model Analyzer: 1.7.0
Triton Inference Server Client: 2.13.0
Polygraphy: 0.31.1
GraphSurgeon: 0.3.11
tf2onnx: v1.9.1 (support for ONNX opset 14, tf 1.15 and 2.5)
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations
missing support for stateful models (e.g., time-series models)
missing support for models without batching support
no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
only a single profile can be defined for TensorRT
0.2.1#
Fixed triton-model-config error when tensorrt_capture_cuda_graph flag is not passed
Dump Conversion Comparator inputs and outputs into JSON files
Added information in logs about the tolerance parameter values needed to pass conversion verification
Use count_windows mode as default option for Perf Analyzer
Added possibility to define custom docker images
Bugfixes
Versions of used external components:
Triton Model Analyzer: 1.6.0
Triton Inference Server Client: 2.12.0
Polygraphy: 0.31.1
GraphSurgeon: 0.3.11
tf2onnx: v1.9.1 (support for ONNX opset 14, tf 1.15 and 2.5)
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations
missing support for stateful models (e.g., time-series models)
missing support for models without batching support
no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
only a single profile can be defined for TensorRT
TensorRT backend acceleration not supported for ONNX Runtime in Triton Inference Server ver. 21.07
0.2.0#
comprehensive refactor of the command-line API in order to provide more gradual execution of pipeline steps
Versions of used external components:
Triton Model Analyzer: 21.05
tf2onnx: v1.8.5 (support for ONNX opset 13, tf 1.15 and 2.5)
Other component versions depend on the used framework and Triton Inference Server containers versions. See its support matrix for a detailed summary.
Known issues and limitations
missing support for stateful models (e.g., time-series models)
missing support for models without batching support
no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
affected NVIDIA PyTorch containers: 20.12, 21.02, 21.03
workaround: use PyTorch containers newer than 21.03
only a single profile can be defined for TensorRT
0.1.1#
documentation update
0.1.0#
Release of main components:
Model Converter - converts the model to a set of variants optimized for inference or to be later optimized by a Triton Inference Server backend.
Model Repo Builder - sets up the Triton Inference Server Model Repository, including its configuration.
Model Analyzer - selects the optimal Triton Inference Server configuration based on the model's compute and memory requirements, available computation infrastructure, and model application constraints.
Helm Chart Generator - deploys Triton Inference Server and the model with optimal configuration to the cloud.
Versions of used external components:
Triton Model Analyzer: 21.03+616e8a30
tf2onnx: v1.8.4 (support for ONNX opset 13, tf 1.15 and 2.4)
Other component versions depend on the used framework and Triton Inference Server containers versions. Refer to its support matrix for a detailed summary.
Known issues
missing support for stateful models (e.g., time-series models)
missing support for models without batching support
no verification of conversion results for conversions: TF -> ONNX, TorchScript -> ONNX
issues with TorchScript -> ONNX conversion due to an issue in PyTorch 1.8
affected NVIDIA PyTorch containers: 20.12, 21.03
workaround: use containers other than those listed above
Triton Inference Server stays in the background when the profile process is interrupted by the user