
Quick Start Guide: NVIDIA Jetson with Ultralytics YOLO11

This guide walks through deploying Ultralytics YOLO11 on NVIDIA Jetson devices. It also presents performance benchmarks that show what YOLO11 can do on these small yet powerful devices.

New product support

We have updated this guide with the latest NVIDIA Jetson Orin Nano Super Developer Kit, which delivers up to 67 TOPS of AI performance (a 1.7X improvement over its predecessor) to seamlessly run the most popular AI models.



Watch: How to use Ultralytics YOLO11 on NVIDIA JETSON Devices

NVIDIA Jetson Ecosystem

Note

This guide has been tested with the NVIDIA Jetson AGX Orin Developer Kit (64GB) running the latest stable JetPack release, JP6.2; the NVIDIA Jetson Orin Nano Super Developer Kit running JP6.1; the Seeed Studio reComputer J4012, which is based on the NVIDIA Jetson Orin NX 16GB, running JP6.0 and JP5.1.3; and the Seeed Studio reComputer J1020 v2, which is based on the NVIDIA Jetson Nano 4GB, running JP4.6.1. It is expected to work across the entire NVIDIA Jetson hardware lineup, including the latest and legacy devices.

What is NVIDIA Jetson?

NVIDIA Jetson is a series of embedded computing boards designed to bring accelerated AI (artificial intelligence) computing to edge devices. These compact and powerful devices are built around NVIDIA's GPU architecture and can run complex AI algorithms and deep learning models directly on the device, without relying on cloud computing resources. Jetson boards are often used in robotics, autonomous vehicles, industrial automation, and other applications where AI inference needs to be performed locally with low latency and high efficiency. These boards are based on the ARM64 architecture and draw less power than traditional GPU computing devices.

NVIDIA Jetson Series Comparison

Jetson Orin is the latest iteration of the NVIDIA Jetson family. It is based on the NVIDIA Ampere architecture, which brings drastically improved AI performance compared to previous generations. The table below compares a few of the Jetson devices in the ecosystem.

| | Jetson AGX Orin 64GB | Jetson Orin NX 16GB | Jetson Orin Nano Super | Jetson AGX Xavier | Jetson Xavier NX | Jetson Nano |
|---|---|---|---|---|---|---|
| AI Performance | 275 TOPS | 100 TOPS | 67 TOPS | 32 TOPS | 21 TOPS | 472 GFLOPS |
| GPU | 2048-core NVIDIA Ampere architecture GPU with 64 Tensor Cores | 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores | 1024-core NVIDIA Ampere architecture GPU with 32 Tensor Cores | 512-core NVIDIA Volta architecture GPU with 64 Tensor Cores | 384-core NVIDIA Volta™ architecture GPU with 48 Tensor Cores | 128-core NVIDIA Maxwell™ architecture GPU |
| GPU Max Frequency | 1.3 GHz | 918 MHz | 1020 MHz | 1377 MHz | 1100 MHz | 921 MHz |
| CPU | 12-core NVIDIA Arm® Cortex A78AE v8.2 64-bit CPU 3MB L2 + 6MB L3 | 8-core NVIDIA Arm® Cortex A78AE v8.2 64-bit CPU 2MB L2 + 4MB L3 | 6-core Arm® Cortex®-A78AE v8.2 64-bit CPU 1.5MB L2 + 4MB L3 | 8-core NVIDIA Carmel Arm®v8.2 64-bit CPU 8MB L2 + 4MB L3 | 6-core NVIDIA Carmel Arm®v8.2 64-bit CPU 6MB L2 + 4MB L3 | Quad-Core Arm® Cortex®-A57 MPCore processor |
| CPU Max Frequency | 2.2 GHz | 2.0 GHz | 1.7 GHz | 2.2 GHz | 1.9 GHz | 1.43 GHz |
| Memory | 64GB 256-bit LPDDR5 204.8GB/s | 16GB 128-bit LPDDR5 102.4GB/s | 8GB 128-bit LPDDR5 102 GB/s | 32GB 256-bit LPDDR4x 136.5GB/s | 8GB 128-bit LPDDR4x 59.7GB/s | 4GB 64-bit LPDDR4 25.6GB/s |

For a more detailed comparison table, please visit the Technical Specifications section of the official NVIDIA Jetson page.

What is NVIDIA JetPack?

The NVIDIA JetPack SDK, which powers the Jetson modules, provides a full development environment for building end-to-end accelerated AI applications and shortens time to market. JetPack includes Jetson Linux with a bootloader, the Linux kernel, an Ubuntu desktop environment, and a complete set of libraries for accelerating GPU computing, multimedia, graphics, and computer vision. It also includes samples, documentation, and developer tools for both the host computer and the developer kit, and it supports higher-level SDKs such as DeepStream for streaming video analytics, Isaac for robotics, and Riva for conversational AI.

Flash JetPack to NVIDIA Jetson

The first step after getting your hands on an NVIDIA Jetson device is to flash NVIDIA JetPack to the device. There are several ways to flash NVIDIA Jetson devices.

  1. If you own an official NVIDIA Development Kit such as the Jetson Orin Nano Developer Kit, you can download an image and prepare an SD card with JetPack for booting the device.
  2. If you own any other NVIDIA Development Kit, you can flash JetPack to the device using SDK Manager.
  3. If you own a Seeed Studio reComputer J4012 device, you can flash JetPack to the included SSD, and if you own a Seeed Studio reComputer J1020 v2 device, you can flash JetPack to the eMMC/SSD.
  4. If you own any other third-party device powered by an NVIDIA Jetson module, it is recommended to use command-line flashing.

Note

For methods 3 and 4 above, after flashing the system and booting the device, please enter "sudo apt update && sudo apt install nvidia-jetpack -y" on the device terminal to install all the remaining JetPack components needed.
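After flashing, it can be useful to confirm which release actually ended up on the device. The sketch below is not part of the original guide: it assumes the usual Jetson Linux convention that /etc/nv_tegra_release begins with a line such as "# R36 (release), REVISION: 4.0, ...", and it assumes that L4T major releases 32, 34/35, and 36 correspond to JetPack 4, 5, and 6. The function names and the sample line used for testing are illustrative.

```python
import re
from pathlib import Path
from typing import Optional, Tuple

# Assumed mapping from L4T (Jetson Linux) major release to JetPack major version.
L4T_TO_JETPACK = {32: 4, 34: 5, 35: 5, 36: 6}

# Matches e.g. "# R36 (release), REVISION: 4.0, GCID: ..."
_RELEASE_RE = re.compile(r"R(\d+)\s+\(release\),\s+REVISION:\s+(\d+)\.(\d+)")


def parse_l4t_release(text: str) -> Optional[Tuple[int, ...]]:
    """Return (major, minor, patch) parsed from nv_tegra_release text, or None."""
    match = _RELEASE_RE.search(text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups())


def jetpack_major(text: str) -> Optional[int]:
    """Return the JetPack major version implied by the L4T release, or None."""
    release = parse_l4t_release(text)
    if release is None:
        return None
    return L4T_TO_JETPACK.get(release[0])


release_file = Path("/etc/nv_tegra_release")
if release_file.exists():
    print("JetPack major version:", jetpack_major(release_file.read_text()))
else:
    print("No /etc/nv_tegra_release found; this does not look like a Jetson.")
```

On a machine that is not a Jetson the script simply reports that the file is missing, so it is safe to run anywhere.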

JetPack Support Based on Jetson Device

The table below shows which NVIDIA JetPack versions are supported by each NVIDIA Jetson device.

| | JetPack 4 | JetPack 5 | JetPack 6 |
|---|---|---|---|
| Jetson Nano | ✅ | ❌ | ❌ |
| Jetson TX2 | ✅ | ❌ | ❌ |
| Jetson Xavier NX | ✅ | ✅ | ❌ |
| Jetson AGX Xavier | ✅ | ✅ | ❌ |
| Jetson AGX Orin | ❌ | ✅ | ✅ |
| Jetson Orin NX | ❌ | ✅ | ✅ |
| Jetson Orin Nano | ❌ | ✅ | ✅ |
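For scripted setups, the support table can be encoded as a small lookup. This is only a sketch: the dictionary mirrors the table above, while the helper names (newest_jetpack, supports) are made up for illustration.

```python
# JetPack major versions supported by each device, copied from the table above.
JETPACK_SUPPORT = {
    "Jetson Nano": (4,),
    "Jetson TX2": (4,),
    "Jetson Xavier NX": (4, 5),
    "Jetson AGX Xavier": (4, 5),
    "Jetson AGX Orin": (5, 6),
    "Jetson Orin NX": (5, 6),
    "Jetson Orin Nano": (5, 6),
}


def supports(device: str, jetpack: int) -> bool:
    """Return True if the table lists this JetPack major version for the device."""
    return jetpack in JETPACK_SUPPORT.get(device, ())


def newest_jetpack(device: str) -> int:
    """Return the newest JetPack major version listed for the device."""
    if device not in JETPACK_SUPPORT:
        raise ValueError(f"Unknown device: {device!r}")
    return max(JETPACK_SUPPORT[device])


print(newest_jetpack("Jetson Orin Nano"))
print(supports("Jetson Nano", 6))
```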

Quick Start with Docker

The fastest way to get started with Ultralytics YOLO11 on NVIDIA Jetson is to run one of the pre-built Docker images for Jetson. Refer to the table above and choose the JetPack version that matches the Jetson device you own.

# JetPack 4
t=ultralytics/ultralytics:latest-jetson-jetpack4
sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t

# JetPack 5
t=ultralytics/ultralytics:latest-jetson-jetpack5
sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t

# JetPack 6
t=ultralytics/ultralytics:latest-jetson-jetpack6
sudo docker pull $t && sudo docker run -it --ipc=host --runtime=nvidia $t

After this is done, skip to the Use TensorRT on NVIDIA Jetson section.
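If the Docker commands need to be generated from a script (for example in a provisioning tool), a minimal sketch could look like the following. The image tags and flags are the ones shown above; the function names are hypothetical, and the code only builds and prints the commands rather than running them.

```python
import shlex
from typing import List

IMAGE_REPO = "ultralytics/ultralytics"


def jetson_image_tag(jetpack_major: int) -> str:
    """Return the image tag used in this guide for a JetPack major version."""
    if jetpack_major not in (4, 5, 6):
        raise ValueError(f"No Jetson image listed for JetPack {jetpack_major}")
    return f"{IMAGE_REPO}:latest-jetson-jetpack{jetpack_major}"


def docker_commands(jetpack_major: int) -> List[List[str]]:
    """Build the pull and run commands as argument lists."""
    tag = jetson_image_tag(jetpack_major)
    return [
        ["sudo", "docker", "pull", tag],
        ["sudo", "docker", "run", "-it", "--ipc=host", "--runtime=nvidia", tag],
    ]


for command in docker_commands(6):
    print(shlex.join(command))
```

Keeping the commands as argument lists makes them easy to hand to subprocess.run later without shell quoting problems.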

Start with Native Installation

For a native installation without Docker, please refer to the steps below.

Run on JetPack 6.1

Install Ultralytics Package

Here we will install the Ultralytics package on the Jetson with optional dependencies so that we can export PyTorch models to other formats. We will mainly focus on NVIDIA TensorRT exports because TensorRT gets the most performance out of Jetson devices.

  1. Update the package list, install pip, and upgrade it to the latest version

    sudo apt update
    sudo apt install python3-pip -y
    pip install -U pip
    
  2. Install the ultralytics pip package with optional dependencies

    pip install ultralytics[export]
    
  3. Reboot the device

    sudo reboot
    

Install PyTorch and Torchvision

The ultralytics installation above also installs Torch and Torchvision. However, these two packages, as installed via pip, are not compatible with the Jetson platform, which is based on the ARM64 architecture. Therefore, we need to manually install a pre-built PyTorch pip wheel and either compile Torchvision from source or install a pre-built Torchvision wheel, as done below.

Install torch 2.5.0 and torchvision 0.20, the versions that match JP6.1

pip install https://github.com/ultralytics/assets/releases/download/v0.0.0/torch-2.5.0a0+872d972e41.nv24.08-cp310-cp310-linux_aarch64.whl
pip install https://github.com/ultralytics/assets/releases/download/v0.0.0/torchvision-0.20.0a0+afc54f7-cp310-cp310-linux_aarch64.whl

Note

Visit the PyTorch for Jetson page to find PyTorch builds for other JetPack versions. For a detailed list of compatible PyTorch and Torchvision versions, visit the PyTorch and Torchvision compatibility page.

Install cuSPARSELt to fix a dependency issue with torch 2.5.0

wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2204/arm64/cuda-keyring_1.1-1_all.deb
sudo dpkg -i cuda-keyring_1.1-1_all.deb
sudo apt-get update
sudo apt-get -y install libcusparselt0 libcusparselt-dev
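Once the wheels and cuSPARSELt are installed, a quick sanity check is to ask PyTorch whether it can see the GPU. The sketch below uses only standard PyTorch calls (torch.cuda.is_available and torch.cuda.get_device_name); the function name is illustrative, and the import is wrapped so the script still runs, and reports the problem, if PyTorch or one of its shared libraries is missing.

```python
def torch_cuda_report() -> str:
    """Describe whether the installed PyTorch build can use the GPU."""
    try:
        import torch  # imported lazily so the script runs even without PyTorch
    except (ImportError, OSError) as err:
        # A missing shared library (for example cuSPARSELt) also surfaces here.
        return f"PyTorch could not be imported: {err}"
    if not torch.cuda.is_available():
        return f"PyTorch {torch.__version__} is installed, but CUDA is not available"
    return f"PyTorch {torch.__version__} sees GPU: {torch.cuda.get_device_name(0)}"


print(torch_cuda_report())
```

If the report says CUDA is not available, the most likely cause in this setup is that a generic pip wheel is installed instead of the Jetson build.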

Install onnxruntime-gpu

The onnxruntime-gpu package hosted in PyPI does not have aarch64 binaries for the Jetson. So we need to manually install this package. This package is needed for some of the exports.

You can find all available onnxruntime-gpu packages, organized by JetPack version, Python version, and other compatibility details, in the Jetson Zoo ONNX Runtime compatibility matrix. Here we will download and install onnxruntime-gpu 1.20.0 with Python 3.10 support.

pip install https://github.com/ultralytics/assets/releases/download/v0.0.0/onnxruntime_gpu-1.20.0-cp310-cp310-linux_aarch64.whl

Note

Installing onnxruntime-gpu automatically changes the numpy version to the latest release. To fix the resulting issue, reinstall numpy 1.23.5 by executing:

pip install numpy==1.23.5
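Because later installs can silently change earlier ones (as with numpy above), it may help to verify the final package versions in one pass. The following is a minimal sketch using importlib.metadata from the standard library; the expected version prefixes are the ones this JetPack 6.1 section installs, and the helper names are illustrative.

```python
from importlib import metadata
from typing import Callable, Dict

# Version prefixes installed in this JetPack 6.1 section.
EXPECTED = {
    "torch": "2.5.0",
    "torchvision": "0.20",
    "onnxruntime-gpu": "1.20.0",
    "numpy": "1.23.5",
}


def installed_version(name: str) -> str:
    """Return the installed version of a distribution, or '' if it is missing."""
    try:
        return metadata.version(name)
    except metadata.PackageNotFoundError:
        return ""


def find_mismatches(
    expected: Dict[str, str],
    lookup: Callable[[str], str] = installed_version,
) -> Dict[str, str]:
    """Map each mismatching package to the version found ('' means not installed)."""
    mismatches = {}
    for name, prefix in expected.items():
        found = lookup(name)
        if not found.startswith(prefix):
            mismatches[name] = found
    return mismatches


for name, found in find_mismatches(EXPECTED).items():
    print(f"{name}: expected {EXPECTED[name]}*, found {found or 'nothing'}")
```

Matching on a prefix keeps local build suffixes such as 2.5.0a0+872d972e41.nv24.08 from being reported as mismatches.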

Run on JetPack 5.1.2

Install Ultralytics Package

Here we will install the Ultralytics package on the Jetson with optional dependencies so that we can export PyTorch models to other formats. We will mainly focus on NVIDIA TensorRT exports because TensorRT gets the most performance out of Jetson devices.

  1. Update the package list, install pip, and upgrade it to the latest version

    sudo apt update
    sudo apt install python3-pip -y
    pip install -U pip
    
  2. Install the ultralytics pip package with optional dependencies

    pip install ultralytics[export]
    
  3. Reboot the device

    sudo reboot
    

Install PyTorch and Torchvision

The ultralytics installation above also installs Torch and Torchvision. However, these two packages, as installed via pip, are not compatible with the Jetson platform, which is based on the ARM64 architecture. Therefore, we need to manually install a pre-built PyTorch pip wheel and either compile Torchvision from source or install a pre-built Torchvision wheel, as done below.

  1. Uninstall currently installed PyTorch and Torchvision

    pip uninstall torch torchvision
    
  2. Install torch 2.2.0 and torchvision 0.17.2, the versions that match JP5.1.2

    pip install https://github.com/ultralytics/assets/releases/download/v0.0.0/torch-2.2.0-cp38-cp38-linux_aarch64.whl
    pip install https://github.com/ultralytics/assets/releases/download/v0.0.0/torchvision-0.17.2+c1d70fe-cp38-cp38-linux_aarch64.whl
    

Note

Visit the PyTorch for Jetson page to find PyTorch builds for other JetPack versions. For a detailed list of compatible PyTorch and Torchvision versions, visit the PyTorch and Torchvision compatibility page.
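The wheel filenames in this guide encode the Python version and platform they were built for (cp38 for Python 3.8 on JetPack 5, cp310 for Python 3.10 on JetPack 6, and linux_aarch64 for the platform). A small check like the one below can catch a mismatched download before pip rejects it. It is a sketch that relies only on the standard wheel naming scheme and handles only CPython tags of the form cpXY; the function names are illustrative.

```python
import platform
import sys
from typing import Optional, Tuple


def wheel_tags(filename: str) -> Tuple[str, str, str]:
    """Split a wheel filename into its (python, abi, platform) tags."""
    if not filename.endswith(".whl"):
        raise ValueError(f"Not a wheel file: {filename!r}")
    python_tag, abi_tag, platform_tag = filename[: -len(".whl")].split("-")[-3:]
    return python_tag, abi_tag, platform_tag


def wheel_fits(
    filename: str,
    py_version: Optional[Tuple[int, int]] = None,
    machine: Optional[str] = None,
) -> bool:
    """Return True if the wheel targets the given (or current) Python and CPU."""
    major, minor = py_version or sys.version_info[:2]
    machine = machine or platform.machine()
    python_tag, _, platform_tag = wheel_tags(filename)
    return python_tag == f"cp{major}{minor}" and platform_tag.endswith(machine)


wheel = "torch-2.2.0-cp38-cp38-linux_aarch64.whl"
print(wheel, "fits this interpreter:", wheel_fits(wheel))
```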

Install onnxruntime-gpu

The onnxruntime-gpu package hosted in PyPI does not have aarch64 binaries for the Jetson. So we need to manually install this package. This package is needed for some of the exports.

You can find all available onnxruntime-gpu packages, organized by JetPack version, Python version, and other compatibility details, in the Jetson Zoo ONNX Runtime compatibility matrix. Here we will download and install onnxruntime-gpu 1.17.0 with Python 3.8 support.

wget https://nvidia.box.com/shared/static/zostg6agm00fb6t5uisw51qi6kpcuwzd.whl -O onnxruntime_gpu-1.17.0-cp38-cp38-linux_aarch64.whl
pip install onnxruntime_gpu-1.17.0-cp38-cp38-linux_aarch64.whl

Note

Installing onnxruntime-gpu automatically changes the numpy version to the latest release. To fix the resulting issue, reinstall numpy 1.23.5 by executing:

pip install numpy==1.23.5

Use TensorRT on NVIDIA Jetson

Among all the model export formats supported by Ultralytics, TensorRT offers the highest inference performance on NVIDIA Jetson devices, making it our top recommendation for Jetson deployments. For setup instructions and advanced usage, see our dedicated TensorRT integration guide.

Convert Model to TensorRT and Run Inference

The YOLO11n model in PyTorch format is converted to TensorRT to run inference with the exported model.

Example

Python

from ultralytics import YOLO

# Load a YOLO11n PyTorch model
model = YOLO("yolo11n.pt")

# Export the model to TensorRT
model.export(format="engine")  # creates 'yolo11n.engine'

# Load the exported TensorRT model
trt_model = YOLO("yolo11n.engine")

# Run inference
results = trt_model("https://ultralytics.com/images/bus.jpg")

CLI

# Export a YOLO11n PyTorch model to TensorRT format
yolo export model=yolo11n.pt format=engine # creates 'yolo11n.engine'

# Run inference with the exported model
yolo predict model=yolo11n.engine source='https://ultralytics.com/images/bus.jpg'

Note

Visit the Export page for additional arguments available when exporting models to different formats.
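Since the export step only needs to run once per model, a script can skip it when the engine file already exists. The helper below is a hypothetical sketch, not an Ultralytics API: it assumes, as in the example above, that exporting yolo11n.pt creates yolo11n.engine next to the weights, and it takes the export step as a callable so that it stays independent of the library.

```python
from pathlib import Path
from typing import Callable


def ensure_engine(weights: str, export: Callable[[str], object]) -> Path:
    """Return the .engine path next to `weights`, exporting only if it is missing."""
    engine = Path(weights).with_suffix(".engine")
    if not engine.exists():
        export(weights)  # expected to create `engine` as a side effect
    if not engine.exists():
        raise FileNotFoundError(f"Export did not produce {engine}")
    return engine


# Usage on a Jetson with ultralytics installed (not executed here):
#   from ultralytics import YOLO
#   engine = ensure_engine("yolo11n.pt", lambda w: YOLO(w).export(format="engine"))
#   results = YOLO(str(engine))("https://ultralytics.com/images/bus.jpg")
```

A cached engine should be rebuilt after changing the device or the TensorRT version, since engines are generally tied to the environment they were built in.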

Use NVIDIA Deep Learning Accelerator (DLA)

NVIDIA Deep Learning Accelerator (DLA) is a specialized hardware component built into NVIDIA Jetson devices that optimizes deep learning inference for energy efficiency and performance. By offloading tasks from the GPU (freeing it up for more intensive processes), DLA enables models to run with lower power consumption while maintaining high throughput, ideal for embedded systems and real-time AI applications.

The following Jetson devices are equipped with DLA hardware:

| Jetson Device | DLA Cores | DLA Max Frequency |
|---|---|---|
| Jetson AGX Orin Series | 2 | 1.6 GHz |
| Jetson Orin NX 16GB | 2 | 614 MHz |
| Jetson Orin NX 8GB | 1 | 614 MHz |
| Jetson AGX Xavier Series | 2 | 1.4 GHz |
| Jetson Xavier NX Series | 2 | 1.1 GHz |

Example

Python

from ultralytics import YOLO

# Load a YOLO11n PyTorch model
model = YOLO("yolo11n.pt")

# Export the model to TensorRT with DLA enabled (only works with FP16 or INT8)
model.export(format="engine", device="dla:0", half=True)  # dla:0 or dla:1 corresponds to the DLA cores

# Load the exported TensorRT model
trt_model = YOLO("yolo11n.engine")

# Run inference
results = trt_model("https://ultralytics.com/images/bus.jpg")

CLI

# Export a YOLO11n PyTorch model to TensorRT format with DLA enabled (only works with FP16 or INT8)
# Once DLA core number is specified at export, it will use the same core at inference
yolo export model=yolo11n.pt format=engine device="dla:0" half=True # dla:0 or dla:1 corresponds to the DLA cores

# Run inference with the exported model on the DLA
yolo predict model=yolo11n.engine source='https://ultralytics.com/images/bus.jpg'

Note

When using DLA exports, some layers may not be supported to run on DLA and will fall back to the GPU for execution. This fallback can introduce additional latency and impact the overall inference performance. Therefore, DLA is not primarily designed to reduce inference latency compared to TensorRT running entirely on the GPU. Instead, its primary purpose is to increase throughput and improve energy efficiency.
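The two DLA constraints mentioned above (the core index must exist on the device, and the export must use FP16 or INT8) can be checked before starting a long export. The sketch below encodes the DLA table as a dictionary and builds the keyword arguments for model.export; the helper name is hypothetical, and it assumes that export accepts int8=True in the same way it accepts half=True.

```python
from typing import Dict, Union

# DLA core counts copied from the table above.
DLA_CORES = {
    "Jetson AGX Orin Series": 2,
    "Jetson Orin NX 16GB": 2,
    "Jetson Orin NX 8GB": 1,
    "Jetson AGX Xavier Series": 2,
    "Jetson Xavier NX Series": 2,
}


def dla_export_kwargs(
    device_name: str, core: int = 0, int8: bool = False
) -> Dict[str, Union[str, bool]]:
    """Build export arguments for a DLA core, validating the core index first."""
    cores = DLA_CORES.get(device_name, 0)
    if cores == 0:
        raise ValueError(f"{device_name!r} is not listed as having DLA hardware")
    if not 0 <= core < cores:
        raise ValueError(f"{device_name} has {cores} DLA core(s); got core={core}")
    # DLA only works with reduced precision, so exactly one of these is set.
    precision = {"int8": True} if int8 else {"half": True}
    return {"format": "engine", "device": f"dla:{core}", **precision}


print(dla_export_kwargs("Jetson Orin NX 16GB", core=1))

# Usage with ultralytics installed (not executed here):
#   model.export(**dla_export_kwargs("Jetson Orin NX 16GB", core=1))
```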

NVIDIA Jetson Orin YOLO11 Benchmarks

YOLO11 benchmarks were run by the Ultralytics team on 10 different model formats measuring speed and accuracy: PyTorch, TorchScript, ONNX, OpenVINO, TensorRT, TF SavedModel, TF GraphDef, TF Lite, MNN, NCNN. Benchmarks were run on NVIDIA Jetson AGX Orin Developer Kit (64GB), NVIDIA Jetson Orin Nano Super Developer Kit and Seeed Studio reComputer J4012 powered by Jetson Orin NX 16GB device at FP32 precision with default input image size of 640.

Comparison Charts

Although all model exports work on NVIDIA Jetson, the comparison charts below include only PyTorch, TorchScript, and TensorRT because these formats make use of the Jetson's GPU and produce the best results. The other exports use only the CPU, and their performance is not as good as these three. Benchmarks for all exports are in the section after the charts.

NVIDIA Jetson AGX Orin Developer Kit (64GB)

Jetson AGX Orin Benchmarks
Benchmarked with Ultralytics 8.3.157

NVIDIA Jetson Orin Nano Super Developer Kit

Jetson Orin Nano Super Benchmarks
Benchmarked with Ultralytics 8.3.157

NVIDIA Jetson Orin NX 16GB

Jetson Orin NX 16GB Benchmarks
Benchmarked with Ultralytics 8.3.157

Detailed Comparison Tables

The tables below present the benchmark results for five different models (YOLO11n, YOLO11s, YOLO11m, YOLO11l, YOLO11x) across ten different formats (PyTorch, TorchScript, ONNX, OpenVINO, TensorRT, TF SavedModel, TF GraphDef, TF Lite, MNN, NCNN), giving the status, size, mAP50-95(B) metric, and inference time for each combination.

NVIDIA Jetson AGX Orin Developer Kit (64GB)

Performance

YOLO11n

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 5.4 | 0.5101 | 9.40 |
| TorchScript | ✅ | 10.5 | 0.5083 | 11.00 |
| ONNX | ✅ | 10.2 | 0.5077 | 48.32 |
| OpenVINO | ✅ | 10.4 | 0.5058 | 27.24 |
| TensorRT (FP32) | ✅ | 12.1 | 0.5085 | 3.93 |
| TensorRT (FP16) | ✅ | 8.3 | 0.5063 | 2.55 |
| TensorRT (INT8) | ✅ | 5.4 | 0.4719 | 2.18 |
| TF SavedModel | ✅ | 25.9 | 0.5077 | 66.87 |
| TF GraphDef | ✅ | 10.3 | 0.5077 | 65.68 |
| TF Lite | ✅ | 10.3 | 0.5077 | 272.92 |
| MNN | ✅ | 10.1 | 0.5059 | 36.33 |
| NCNN | ✅ | 10.2 | 0.5031 | 28.51 |

YOLO11s

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 18.4 | 0.5783 | 12.10 |
| TorchScript | ✅ | 36.5 | 0.5782 | 11.01 |
| ONNX | ✅ | 36.3 | 0.5782 | 107.54 |
| OpenVINO | ✅ | 36.4 | 0.5810 | 55.03 |
| TensorRT (FP32) | ✅ | 38.1 | 0.5781 | 6.52 |
| TensorRT (FP16) | ✅ | 21.4 | 0.5803 | 3.65 |
| TensorRT (INT8) | ✅ | 12.1 | 0.5735 | 2.81 |
| TF SavedModel | ✅ | 91.0 | 0.5782 | 132.73 |
| TF GraphDef | ✅ | 36.4 | 0.5782 | 134.96 |
| TF Lite | ✅ | 36.3 | 0.5782 | 798.21 |
| MNN | ✅ | 36.2 | 0.5777 | 82.35 |
| NCNN | ✅ | 36.2 | 0.5784 | 56.07 |

YOLO11m

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 38.8 | 0.6265 | 22.20 |
| TorchScript | ✅ | 77.3 | 0.6307 | 21.47 |
| ONNX | ✅ | 76.9 | 0.6307 | 270.89 |
| OpenVINO | ✅ | 77.1 | 0.6284 | 129.10 |
| TensorRT (FP32) | ✅ | 78.8 | 0.6306 | 12.53 |
| TensorRT (FP16) | ✅ | 41.9 | 0.6305 | 6.25 |
| TensorRT (INT8) | ✅ | 23.2 | 0.6291 | 4.69 |
| TF SavedModel | ✅ | 192.7 | 0.6307 | 299.95 |
| TF GraphDef | ✅ | 77.1 | 0.6307 | 310.58 |
| TF Lite | ✅ | 77.0 | 0.6307 | 2400.54 |
| MNN | ✅ | 76.8 | 0.6308 | 213.56 |
| NCNN | ✅ | 76.8 | 0.6284 | 141.18 |

YOLO11l

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 49.0 | 0.6364 | 27.70 |
| TorchScript | ✅ | 97.6 | 0.6399 | 27.94 |
| ONNX | ✅ | 97.0 | 0.6409 | 345.47 |
| OpenVINO | ✅ | 97.3 | 0.6378 | 161.93 |
| TensorRT (FP32) | ✅ | 99.1 | 0.6406 | 16.11 |
| TensorRT (FP16) | ✅ | 52.6 | 0.6376 | 8.08 |
| TensorRT (INT8) | ✅ | 30.8 | 0.6208 | 6.12 |
| TF SavedModel | ✅ | 243.1 | 0.6409 | 390.78 |
| TF GraphDef | ✅ | 97.2 | 0.6409 | 398.76 |
| TF Lite | ✅ | 97.1 | 0.6409 | 3037.05 |
| MNN | ✅ | 96.9 | 0.6372 | 265.46 |
| NCNN | ✅ | 96.9 | 0.6364 | 179.68 |

YOLO11x

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 109.3 | 0.7005 | 44.40 |
| TorchScript | ✅ | 218.1 | 0.6898 | 47.49 |
| ONNX | ✅ | 217.5 | 0.6900 | 682.98 |
| OpenVINO | ✅ | 217.8 | 0.6876 | 298.15 |
| TensorRT (FP32) | ✅ | 219.6 | 0.6904 | 28.50 |
| TensorRT (FP16) | ✅ | 112.2 | 0.6887 | 13.55 |
| TensorRT (INT8) | ✅ | 60.0 | 0.6574 | 9.40 |
| TF SavedModel | ✅ | 544.3 | 0.6900 | 749.85 |
| TF GraphDef | ✅ | 217.7 | 0.6900 | 753.86 |
| TF Lite | ✅ | 217.6 | 0.6900 | 6603.27 |
| MNN | ✅ | 217.3 | 0.6868 | 519.77 |
| NCNN | ✅ | 217.3 | 0.6849 | 298.58 |

Benchmarked with Ultralytics 8.3.157

Note

Inference time does not include pre/post-processing.
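To make the numbers easier to compare, inference time can be converted to frames per second (1000 / ms per image) and to a speedup relative to PyTorch. The sketch below uses four rows from the first Jetson AGX Orin table above (YOLO11n); the function names are illustrative. Because pre- and post-processing are excluded from the timings, the resulting FPS values are an upper bound on end-to-end throughput.

```python
from typing import Dict, Tuple

# YOLO11n on Jetson AGX Orin (64GB), copied from the first table above:
# format -> (mAP50-95, inference time in ms per image)
YOLO11N_AGX_ORIN = {
    "PyTorch": (0.5101, 9.40),
    "TensorRT (FP32)": (0.5085, 3.93),
    "TensorRT (FP16)": (0.5063, 2.55),
    "TensorRT (INT8)": (0.4719, 2.18),
}


def fps(ms_per_image: float) -> float:
    """Convert per-image latency in milliseconds to images per second."""
    return 1000.0 / ms_per_image


def compare(
    results: Dict[str, Tuple[float, float]], baseline: str = "PyTorch"
) -> Dict[str, Dict[str, float]]:
    """Compute FPS, speedup, and mAP change of each format against the baseline."""
    base_map, base_ms = results[baseline]
    return {
        name: {
            "fps": fps(ms),
            "speedup": base_ms / ms,
            "map_delta": map_score - base_map,
        }
        for name, (map_score, ms) in results.items()
    }


for name, row in compare(YOLO11N_AGX_ORIN).items():
    print(f"{name:16s} {row['fps']:7.1f} FPS  {row['speedup']:.2f}x  mAP {row['map_delta']:+.4f}")
```

Read this way, INT8 is the fastest of the four formats but also the one with the largest drop in mAP50-95, which is the trade-off to weigh when choosing a precision.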

NVIDIA Jetson Orin Nano Super Developer Kit

Performance

YOLO11n

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 5.4 | 0.5101 | 13.70 |
| TorchScript | ✅ | 10.5 | 0.5082 | 13.69 |
| ONNX | ✅ | 10.2 | 0.5081 | 14.47 |
| OpenVINO | ✅ | 10.4 | 0.5058 | 56.66 |
| TensorRT (FP32) | ✅ | 12.0 | 0.5081 | 7.44 |
| TensorRT (FP16) | ✅ | 8.2 | 0.5061 | 4.53 |
| TensorRT (INT8) | ✅ | 5.4 | 0.4825 | 3.70 |
| TF SavedModel | ✅ | 25.9 | 0.5077 | 116.23 |
| TF GraphDef | ✅ | 10.3 | 0.5077 | 114.92 |
| TF Lite | ✅ | 10.3 | 0.5077 | 340.75 |
| MNN | ✅ | 10.1 | 0.5059 | 76.26 |
| NCNN | ✅ | 10.2 | 0.5031 | 45.03 |

YOLO11s

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 18.4 | 0.5790 | 20.90 |
| TorchScript | ✅ | 36.5 | 0.5781 | 21.22 |
| ONNX | ✅ | 36.3 | 0.5781 | 25.07 |
| OpenVINO | ✅ | 36.4 | 0.5810 | 122.98 |
| TensorRT (FP32) | ✅ | 37.9 | 0.5783 | 13.02 |
| TensorRT (FP16) | ✅ | 21.8 | 0.5779 | 6.93 |
| TensorRT (INT8) | ✅ | 12.2 | 0.5735 | 5.08 |
| TF SavedModel | ✅ | 91.0 | 0.5782 | 250.65 |
| TF GraphDef | ✅ | 36.4 | 0.5782 | 252.69 |
| TF Lite | ✅ | 36.3 | 0.5782 | 998.68 |
| MNN | ✅ | 36.2 | 0.5781 | 188.01 |
| NCNN | ✅ | 36.2 | 0.5784 | 101.37 |

YOLO11m

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 38.8 | 0.6266 | 46.50 |
| TorchScript | ✅ | 77.3 | 0.6307 | 47.95 |
| ONNX | ✅ | 76.9 | 0.6307 | 53.06 |
| OpenVINO | ✅ | 77.1 | 0.6284 | 301.63 |
| TensorRT (FP32) | ✅ | 78.8 | 0.6305 | 27.86 |
| TensorRT (FP16) | ✅ | 41.7 | 0.6309 | 13.50 |
| TensorRT (INT8) | ✅ | 23.2 | 0.6291 | 9.12 |
| TF SavedModel | ✅ | 192.7 | 0.6307 | 622.24 |
| TF GraphDef | ✅ | 77.1 | 0.6307 | 628.74 |
| TF Lite | ✅ | 77.0 | 0.6307 | 2997.93 |
| MNN | ✅ | 76.8 | 0.6299 | 509.96 |
| NCNN | ✅ | 76.8 | 0.6284 | 292.99 |

YOLO11l

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 49.0 | 0.6364 | 56.50 |
| TorchScript | ✅ | 97.6 | 0.6409 | 62.51 |
| ONNX | ✅ | 97.0 | 0.6399 | 68.35 |
| OpenVINO | ✅ | 97.3 | 0.6378 | 376.03 |
| TensorRT (FP32) | ✅ | 99.2 | 0.6396 | 35.59 |
| TensorRT (FP16) | ✅ | 52.1 | 0.6361 | 17.48 |
| TensorRT (INT8) | ✅ | 30.9 | 0.6207 | 11.87 |
| TF SavedModel | ✅ | 243.1 | 0.6409 | 807.47 |
| TF GraphDef | ✅ | 97.2 | 0.6409 | 822.88 |
| TF Lite | ✅ | 97.1 | 0.6409 | 3792.23 |
| MNN | ✅ | 96.9 | 0.6372 | 631.16 |
| NCNN | ✅ | 96.9 | 0.6364 | 350.46 |

YOLO11x

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 109.3 | 0.7005 | 90.00 |
| TorchScript | ✅ | 218.1 | 0.6901 | 113.40 |
| ONNX | ✅ | 217.5 | 0.6901 | 122.94 |
| OpenVINO | ✅ | 217.8 | 0.6876 | 713.1 |
| TensorRT (FP32) | ✅ | 219.5 | 0.6904 | 66.93 |
| TensorRT (FP16) | ✅ | 112.2 | 0.6892 | 32.58 |
| TensorRT (INT8) | ✅ | 61.5 | 0.6612 | 19.90 |
| TF SavedModel | ✅ | 544.3 | 0.6900 | 1605.4 |
| TF GraphDef | ✅ | 217.8 | 0.6900 | 2961.8 |
| TF Lite | ✅ | 217.6 | 0.6900 | 8234.86 |
| MNN | ✅ | 217.3 | 0.6893 | 1254.18 |
| NCNN | ✅ | 217.3 | 0.6849 | 725.50 |

Benchmarked with Ultralytics 8.3.157

Note

Inference time does not include pre/post-processing.

NVIDIA Jetson Orin NX 16GB

Performance

YOLO11n

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 5.4 | 0.5101 | 12.90 |
| TorchScript | ✅ | 10.5 | 0.5082 | 13.17 |
| ONNX | ✅ | 10.2 | 0.5081 | 15.43 |
| OpenVINO | ✅ | 10.4 | 0.5058 | 39.80 |
| TensorRT (FP32) | ✅ | 11.8 | 0.5081 | 7.94 |
| TensorRT (FP16) | ✅ | 8.1 | 0.5085 | 4.73 |
| TensorRT (INT8) | ✅ | 5.4 | 0.4786 | 3.90 |
| TF SavedModel | ✅ | 25.9 | 0.5077 | 88.48 |
| TF GraphDef | ✅ | 10.3 | 0.5077 | 86.67 |
| TF Lite | ✅ | 10.3 | 0.5077 | 302.55 |
| MNN | ✅ | 10.1 | 0.5059 | 52.73 |
| NCNN | ✅ | 10.2 | 0.5031 | 32.04 |

YOLO11s

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 18.4 | 0.5790 | 21.70 |
| TorchScript | ✅ | 36.5 | 0.5781 | 22.71 |
| ONNX | ✅ | 36.3 | 0.5781 | 26.49 |
| OpenVINO | ✅ | 36.4 | 0.5810 | 84.73 |
| TensorRT (FP32) | ✅ | 37.8 | 0.5783 | 13.77 |
| TensorRT (FP16) | ✅ | 21.2 | 0.5796 | 7.31 |
| TensorRT (INT8) | ✅ | 12.0 | 0.5735 | 5.33 |
| TF SavedModel | ✅ | 91.0 | 0.5782 | 185.06 |
| TF GraphDef | ✅ | 36.4 | 0.5782 | 186.45 |
| TF Lite | ✅ | 36.3 | 0.5782 | 882.58 |
| MNN | ✅ | 36.2 | 0.5775 | 126.36 |
| NCNN | ✅ | 36.2 | 0.5784 | 66.73 |

YOLO11m

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 38.8 | 0.6266 | 45.00 |
| TorchScript | ✅ | 77.3 | 0.6307 | 51.87 |
| ONNX | ✅ | 76.9 | 0.6307 | 56.00 |
| OpenVINO | ✅ | 77.1 | 0.6284 | 202.69 |
| TensorRT (FP32) | ✅ | 78.7 | 0.6305 | 30.38 |
| TensorRT (FP16) | ✅ | 41.8 | 0.6302 | 14.48 |
| TensorRT (INT8) | ✅ | 23.2 | 0.6291 | 9.74 |
| TF SavedModel | ✅ | 192.7 | 0.6307 | 445.58 |
| TF GraphDef | ✅ | 77.1 | 0.6307 | 460.94 |
| TF Lite | ✅ | 77.0 | 0.6307 | 2653.65 |
| MNN | ✅ | 76.8 | 0.6308 | 339.38 |
| NCNN | ✅ | 76.8 | 0.6284 | 187.64 |

YOLO11l

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 49.0 | 0.6364 | 56.60 |
| TorchScript | ✅ | 97.6 | 0.6409 | 66.72 |
| ONNX | ✅ | 97.0 | 0.6399 | 71.92 |
| OpenVINO | ✅ | 97.3 | 0.6378 | 254.17 |
| TensorRT (FP32) | ✅ | 99.2 | 0.6406 | 38.89 |
| TensorRT (FP16) | ✅ | 51.9 | 0.6363 | 18.59 |
| TensorRT (INT8) | ✅ | 30.9 | 0.6207 | 12.60 |
| TF SavedModel | ✅ | 243.1 | 0.6409 | 575.98 |
| TF GraphDef | ✅ | 97.2 | 0.6409 | 583.79 |
| TF Lite | ✅ | 97.1 | 0.6409 | 3353.41 |
| MNN | ✅ | 96.9 | 0.6367 | 421.33 |
| NCNN | ✅ | 96.9 | 0.6364 | 228.26 |

YOLO11x

| Format | Status | Size on disk (MB) | mAP50-95(B) | Inference time (ms/im) |
|---|---|---|---|---|
| PyTorch | ✅ | 109.3 | 0.7005 | 98.50 |
| TorchScript | ✅ | 218.1 | 0.6901 | 123.03 |
| ONNX | ✅ | 217.5 | 0.6901 | 129.55 |
| OpenVINO | ✅ | 217.8 | 0.6876 | 483.44 |
| TensorRT (FP32) | ✅ | 219.6 | 0.6904 | 75.92 |
| TensorRT (FP16) | ✅ | 112.1 | 0.6885 | 35.78 |
| TensorRT (INT8) | ✅ | 61.6 | 0.6592 | 21.60 |
| TF SavedModel | ✅ | 544.3 | 0.6900 | 1120.43 |
| TF GraphDef | ✅ | 217.7 | 0.6900 | 1172.35 |
| TF Lite | ✅ | 217.6 | 0.6900 | 7283.63 |
| MNN | ✅ | 217.3 | 0.6877 | 840.16 |
| NCNN | ✅ | 217.3 | 0.6849 | 474.41 |

Benchmarked with Ultralytics 8.3.157

Note

Inference time does not include pre/post-processing.

Explore more benchmarking efforts by Seeed Studio running on different versions of NVIDIA Jetson hardware.

Reproduce Our Results

To reproduce the above Ultralytics benchmarks on all export formats, run this code:

Example

Python

from ultralytics import YOLO

# Load a YOLO11n PyTorch model
model = YOLO("yolo11n.pt")

# Benchmark YOLO11n speed and accuracy on the COCO128 dataset for all export formats
results = model.benchmark(data="coco128.yaml", imgsz=640)

CLI

# Benchmark YOLO11n speed and accuracy on the COCO128 dataset for all export formats
yolo benchmark model=yolo11n.pt data=coco128.yaml imgsz=640

Note that benchmarking results may vary with the exact hardware and software configuration of a system, as well as with its workload at the time the benchmarks are run. For the most reliable results, use a dataset with a large number of images, e.g. data='coco.yaml' (5000 val images).
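When timing your own pipeline by hand, run-to-run variation can be reduced with a few warm-up calls and by reporting the median over many runs. The helper below is a generic sketch built on the standard library; it is not how the Ultralytics benchmark measures time, and the function name and defaults are assumptions. For GPU work, make sure the callable waits for its result before returning.

```python
import statistics
import time
from typing import Callable, Dict


def time_ms(fn: Callable[[], object], warmup: int = 3, runs: int = 20) -> Dict[str, float]:
    """Call fn repeatedly and return median, min, and max latency in milliseconds."""
    for _ in range(warmup):
        fn()  # warm-up calls are not timed
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "min_ms": min(samples),
        "max_ms": max(samples),
    }


# Stand-in workload; on a Jetson this could be e.g. lambda: trt_model(image)
stats = time_ms(lambda: sum(range(10_000)))
print(stats)
```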

Best Practices when using NVIDIA Jetson

When using NVIDIA Jetson, follow these best practices to get maximum performance when running YOLO11.

  1. Enable MAX Power Mode

    Enabling MAX Power Mode on the Jetson makes sure all CPU and GPU cores are turned on.

    sudo nvpmodel -m 0
    
  2. Enable Jetson Clocks

    Enabling Jetson Clocks makes sure all CPU and GPU cores are clocked at their maximum frequency.

    sudo jetson_clocks
    
  3. Install Jetson Stats Application

    We can use the jetson-stats application to monitor the temperatures of system components and to check other system details: view CPU, GPU, and RAM utilization, change power modes, set clocks to maximum, and check JetPack information.

    sudo apt update
    sudo pip install jetson-stats
    sudo reboot
    jtop
    

Jetson Stats
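The first two steps can also be applied from a script before a benchmark or deployment starts. The sketch below wraps the same two commands shown above; the function name and the dry-run behavior are illustrative choices. By default it only prints what it would run, and it skips any tool that is not found on the PATH.

```python
import shlex
import shutil
import subprocess
from typing import List

# Commands copied from steps 1 and 2 above.
MAX_PERFORMANCE_COMMANDS = [
    ["sudo", "nvpmodel", "-m", "0"],
    ["sudo", "jetson_clocks"],
]


def apply_max_performance(dry_run: bool = True) -> List[str]:
    """Print (and optionally run) the commands; return them as shell strings."""
    issued = []
    for command in MAX_PERFORMANCE_COMMANDS:
        line = shlex.join(command)
        issued.append(line)
        tool = command[1]
        if dry_run or shutil.which(tool) is None:
            print(f"[skipped] {line}")
            continue
        subprocess.run(command, check=True)  # raises if the command fails
    return issued


apply_max_performance(dry_run=True)
```

Call apply_max_performance(dry_run=False) on the Jetson itself to actually apply the settings.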

Next Steps

Congratulations on successfully setting up YOLO11 on your NVIDIA Jetson! For further learning and support, see more guides in the Ultralytics YOLO11 Docs.

FAQ

How do I deploy Ultralytics YOLO11 on NVIDIA Jetson devices?

Deploying Ultralytics YOLO11 on NVIDIA Jetson devices is a straightforward process. First, flash your Jetson device with the NVIDIA JetPack SDK. Then, either use a pre-built Docker image for quick setup or manually install the required packages. Detailed steps for each approach can be found in the Quick Start with Docker and Start with Native Installation sections.

What performance benchmarks can I expect from YOLO11 models on NVIDIA Jetson devices?

YOLO11 models have been benchmarked on the NVIDIA Jetson AGX Orin, Orin Nano Super, and Orin NX 16GB. The TensorRT format delivers the best inference performance: for example, YOLO11n on the Jetson AGX Orin Developer Kit (64GB) takes 3.93 ms per image with TensorRT (FP32) versus 9.40 ms with PyTorch. The tables in the Detailed Comparison Tables section provide a comprehensive view of performance metrics such as mAP50-95 and inference time across different model formats.

Why should I use TensorRT for deploying YOLO11 on NVIDIA Jetson?

TensorRT is highly recommended for deploying YOLO11 models on NVIDIA Jetson due to its optimal performance. It accelerates inference by leveraging the Jetson's GPU capabilities, ensuring maximum efficiency and speed. Learn more about how to convert to TensorRT and run inference in the Use TensorRT on NVIDIA Jetson section.

How can I install PyTorch and Torchvision on NVIDIA Jetson?

To install PyTorch and Torchvision on NVIDIA Jetson, first uninstall any existing versions that may have been installed via pip. Then, manually install the compatible PyTorch and Torchvision versions for the Jetson's ARM64 architecture. Detailed instructions for this process are provided in the Install PyTorch and Torchvision section.

What are the best practices for maximizing performance on NVIDIA Jetson when using YOLO11?

To maximize performance on NVIDIA Jetson with YOLO11, follow these best practices:

  1. Enable MAX Power Mode to utilize all CPU and GPU cores.
  2. Enable Jetson Clocks to run all cores at their maximum frequency.
  3. Install the Jetson Stats application for monitoring system metrics.

For commands and additional details, refer to the Best Practices when using NVIDIA Jetson section.



Created 1 year ago. Updated 23 days ago.
