Platform Overview#

NVIDIA AI Enterprise is a cloud-native suite of software tools, libraries, and frameworks designed to deliver optimized performance, robust security, and stability for production AI deployments. Easy-to-use microservices deliver optimized model performance with enterprise-grade security, support, and stability, streamlining the transition from prototype to production for enterprises that run their businesses on AI. The platform consists of two primary layers: the infrastructure layer and the application layer.

[Figure: NVIDIA AI Enterprise platform overview]

Application Layer#

The application layer provides specialized SDKs, frameworks, and state-of-the-art AI models for developing AI applications. It includes:

  1. Optimized microservices that enhance AI model performance and shorten time to deployment for a wide range of AI workflows.

  2. Development and deployment tools, with support for popular AI frameworks and tools such as Triton, TensorFlow, and PyTorch, as well as NVIDIA’s own SDKs.

  3. Optimized libraries for deep learning, data science, and machine learning.

  4. Access to a repository of pretrained models for various AI tasks.

By separating the infrastructure layer (which is versioned) from the application layer, NVIDIA AI Enterprise ensures that foundational updates and improvements do not disrupt the development and deployment of AI applications. This modular approach allows for flexibility and scalability in AI projects.

Application Layer Software#

NVIDIA AI application frameworks, NVIDIA pretrained models, and all other NVIDIA AI software available on NGC are supported with an NVIDIA AI Enterprise license. With more than 100 AI frameworks and pretrained models, including NeMo, Maxine, and cuOpt, look for the NVIDIA AI Enterprise Supported label on NGC.

Organizations start their AI journey by using the open, freely available NGC libraries and frameworks to experiment and pilot. When they’re ready to move from pilot to production, enterprises can transition to a fully managed and secure AI platform with an NVIDIA AI Enterprise subscription. This gives enterprises deploying business-critical AI the assurance of business continuity, with NVIDIA Enterprise Support and access to NVIDIA AI experts.
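Because both stages draw from the same NGC container registry (nvcr.io), moving from pilot to production does not change the basic workflow. A minimal sketch of authenticating and pulling an optimized framework container, assuming Docker and an NGC API key; the image repository and tag are illustrative, so check the NGC catalog for current releases:

```shell
# Authenticate to the NGC registry. The username is the literal string
# $oauthtoken; the password is your NGC API key:
#   docker login nvcr.io --username '$oauthtoken' --password "$NGC_API_KEY"

# Illustrative image reference; the exact repository and tag vary by framework.
IMAGE="nvcr.io/nvidia/pytorch:24.05-py3"

# Pull and run the optimized framework container with GPU access:
#   docker pull "$IMAGE"
#   docker run --rm --gpus all "$IMAGE" nvidia-smi
echo "$IMAGE"
```

The same pull works for freely available and enterprise-supported containers; the subscription changes the support entitlement, not the tooling.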

Table 1 Application Layer Software#

| Component | Description | Branch Type | NGC Catalog | Documentation |
|---|---|---|---|---|
| NVIDIA NIM | NVIDIA NIM provides microservices for accelerated AI model deployment. | Feature Branch (FB) | NIM Feature Branch (FB) on NGC Catalog | NVIDIA NIM Documentation |
| | | Production Branch (PB) | NIM Production Branch (PB) on NGC Catalog | Production Branch (PB) Release Notes |
| Application Frameworks, AI Toolkits, SDKs, and more | Building blocks and software tools to build AI workflows. Includes core AI and data science frameworks. Documentation varies by framework/toolkit; refer to the documentation links on the product pages on NGC Catalog. | Feature Branch (FB) | Feature Branch (FB) on NGC Catalog | Feature Branch (FB) Release Notes |
| | | Production Branch (PB) | Production Branch (PB) on NGC Catalog | Production Branch (PB) Release Notes |
| | | Long Term Supported Branch (LTSB) | Long Term Supported Branch (LTSB) on NGC Catalog | Long Term Supported Branch (LTSB) Release Notes |
| Production-Ready Pretrained Models | Pretrained AI models simplify and speed up development by eliminating the need to build from scratch. | Models available with NVIDIA AI Enterprise support | Pretrained Models on NGC Catalog | Varies by model; refer to the documentation links on the product pages on NGC Catalog |

Infrastructure Layer#

The infrastructure layer includes various components that ensure efficient deployment, management, and scaling of AI applications. Key features include:

  1. Versioning to maintain compatibility and stability across different deployments. Each version provides feature updates, security patches, and performance improvements.

  2. Drivers to optimize utilization of NVIDIA GPUs and Networking in bare metal and virtualized environments.

  3. Kubernetes operators for managing GPU and networking in containers and the lifecycle of microservices and AI pipelines.

  4. Cluster Management software to provision and monitor servers at scale.
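As an illustration of the operator model above, once the NVIDIA GPU Operator is installed, workloads request GPUs through the standard Kubernetes resource interface rather than managing drivers themselves. A minimal sketch of such a pod spec; the pod name and image tag are illustrative:

```yaml
# Example pod requesting one GPU via the device plugin that the GPU Operator
# deploys; image and names are illustrative.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-smoke-test
spec:
  restartPolicy: Never
  containers:
    - name: cuda
      image: nvcr.io/nvidia/cuda:12.4.1-base-ubuntu22.04
      command: ["nvidia-smi"]
      resources:
        limits:
          nvidia.com/gpu: 1   # scheduled onto a GPU node by the operator's stack
```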

Infrastructure Layer Software#

The NVIDIA AI Enterprise Infrastructure Release packages all software for managing and optimizing infrastructure and workloads.
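The Kubernetes components of the infrastructure release are typically installed from NVIDIA's public Helm repository. A minimal sketch, assuming a working Kubernetes cluster and Helm; the commands are shown as comments, namespaces follow NVIDIA's documented defaults, and chart versions are omitted deliberately (pin them in production):

```shell
# Hypothetical sketch of installing the operators from NVIDIA's Helm repo.
GPU_CHART="nvidia/gpu-operator"
NET_CHART="nvidia/network-operator"

# One-time repository setup:
#   helm repo add nvidia https://helm.ngc.nvidia.com/nvidia && helm repo update
#
# Install each operator into its own namespace; --wait blocks until the
# operator's components report ready:
#   helm install --wait gpu-operator "$GPU_CHART" \
#     --namespace gpu-operator --create-namespace
#   helm install --wait network-operator "$NET_CHART" \
#     --namespace nvidia-network-operator --create-namespace
echo "$GPU_CHART $NET_CHART"
```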

Table 2 Infrastructure Layer Software#

| Component | Description | NGC Link | Documentation |
|---|---|---|---|
| NVIDIA Data Center Driver | Provides hardware support for NVIDIA GPUs. Consult the appropriate NVIDIA AI Enterprise Release Notes to see which GPUs and operating systems each driver version supports. | GPU Driver on NGC | NVIDIA Data Center Driver Documentation |
| NVIDIA vGPU (C-Series) Host Driver | The NVIDIA driver deployed in the hypervisor in virtualized environments. | NVIDIA vGPU (C-Series) Host Driver on NGC | NVIDIA vGPU C-Series Documentation |
| NVIDIA vGPU (C-Series) Guest Driver | The NVIDIA virtual GPU (vGPU) software driver deployed in a VM or on a bare-metal operating system; it enables multiple virtual machines (VMs) to have simultaneous, direct access to a single physical GPU. | NVIDIA vGPU (C-Series) Guest Driver on NGC | NVIDIA vGPU C-Series Driver Documentation |
| NVIDIA DOCA Driver for Networking | Enables rapid creation and management of applications and services on the BlueField networking platform, leveraging industry-standard APIs. | DOCA Driver on NGC | NVIDIA DOCA Drivers Documentation |
| GPU Operator | NVIDIA GPU Operator simplifies the deployment of NVIDIA AI Enterprise by automating the management of all NVIDIA software components needed to provision GPUs in Kubernetes. | GPU Operator on NGC | NVIDIA GPU Operator Documentation |
| Network Operator | NVIDIA Network Operator simplifies the provisioning and management of NVIDIA networking resources in a Kubernetes cluster. | Network Operator on NGC | NVIDIA Network Operator Documentation |
| NVIDIA NIM Operator | NVIDIA NIM Operator enables cluster administrators to operate the software components and services required to run LLM, embedding, and other models with NVIDIA NIM microservices in Kubernetes. | NIM Operator on NGC | NVIDIA NIM Operator Documentation |
| Base Command Manager | NVIDIA Base Command Manager streamlines cluster provisioning, workload management, and infrastructure monitoring across data centers and edge locations. The NVIDIA AI Enterprise release comprises the Base Command Manager features certified for use with NVIDIA AI Enterprise. | Base Command Manager on NGC | NVIDIA Base Command Manager Documentation |

Note

NVIDIA virtual GPU (vGPU) C-Series drivers allow virtual machines to utilize the full performance of GPUs and access advanced features such as sharing, live migration, and monitoring. The host driver is installed in the hypervisor of each physical host, while the guest driver is installed on each virtual machine.

NVIDIA AI Enterprise is certified to run everywhere from public clouds, data centers, workstations, and the DGX platform to the edge. A complete list of supported configurations is provided in the NVIDIA AI Enterprise Infra Support Matrix.

Infra Software Branch Support Matrix#

Infra Software Branch Release Notes#