Product quantization for vector search in Azure Cosmos DB for MongoDB vCore

2025-05-19
Applies to: ✅ MongoDB vCore

Product quantization (PQ) is a powerful technique in Azure Cosmos DB for MongoDB vCore that significantly compresses high-dimensional vector embeddings used in vector search. This compression reduces memory use and speeds up nearest-neighbor searches, improving efficiency for large vector datasets. While PQ offers benefits for speed and scale, it may come at the expense of accuracy.

Benefits

Reduced Storage: PQ greatly lowers the storage needed for vector indexes compared to full-precision (float32) vectors, leading to substantial cost savings for large datasets.
Faster Search: Working with compressed vectors allows the system to calculate distances and find potential nearest neighbors much quicker than with full-precision vectors.
Improved Scalability: Lower memory overhead enables scaling vector search to handle larger and higher-dimensional embeddings within your cluster.

How it works

Product quantization divides the high-dimensional vector space into several lower-dimensional subspaces. Each subspace is then quantized independently using a clustering algorithm (typically k-means). The center of each cluster represents all vectors within it. Each original vector is then represented by a short code of the cluster IDs it belongs to in each subspace.

Using Product quantization

To create a vector index with Product quantization, use the createIndexes command with cosmosSearchOptions specifying "compression": "pq" and "kind" : "vector-diskann":

{
    "createIndexes": "<collection_name>",
    "indexes": [
        {
            "name": "<index_name>",
            "key": {
                "<path_to_property>": "cosmosSearch"
            },
            "cosmosSearchOptions": {
                "kind": "vector-diskann",
                "similarity": "<string_value>", // "COS", "L2"
                "dimensions": <integer_value>, // Max 16,000
                "compression": "pq",
                "pqCompressedDims": <integer_value>, // Dimensions after compression (< original)
                "pqSampleSize": <integer_value>    // Samples for centroid generation
            }
        }
    ]
}

Field	Type	Description
`compression`	string	Set to `"pq"` to enable Product quantization.
`pqCompressedDims`	integer	Dimensions after PQ compression (must be less than original dimensions). Automatically calculated if omitted. Range: 1-8000.
`pqSampleSize`	integer	Number of sample vectors for PQ centroid training. Higher value means better quality but longer build time. Default: 1000. Range: 1000-100000.

Note

Product quantization is currently supported only with the vector-diskann index type.

Note

For best results, create a PQ index after your collection has data. If the collection is empty, the system uses random vectors for initial centroids. If the number of documents is less than pqSampleSize, the training data is padded with random data within the range of your existing vector data.

How compressed dimensions are set

If you don't specify pqCompressedDims, it automatically determines based on the original vector dimensions:

Original Dimension Range	`pqCompressedDims`
[0 - 32)	dimensions / 2
[32 - 64)	16
[64 - 128)	32
[128 - 512)	64
[512 - 1536)	96
above 1536	128

Create a PQ index

db.runCommand(
{
    "createIndexes": "your_vector_collection",
    "indexes": [
        {
            "key": { "v": "cosmosSearch" },
            "name": "diskann_pq_index",
            "cosmosSearchOptions": {
                "kind": "vector-diskann",
                "similarity": "COS",
                "dimensions": 1536,
                "compression": "pq",
                "pqCompressedDims": 96,
                "pqSampleSize": 2000
            }
        }
    ]
} )

Improving search with Oversampling

PQ compression can lead to precision loss in distance calculations. To reduce this, Azure Cosmos DB for MongoDB (vCore) offers the oversampling parameter in the $search operator.

The oversampling factor (a float with a minimum of 1) specifies how many more candidate vectors to retrieve from the compressed index than k (the number of desired results). These extra candidates are used to refine the search using the original, full-precision vectors, improving the final top k accuracy. For instance, to get the top 10 (k=10) most similar vectors, a good best practice might be to set oversampling to a value like 1.5 or 2.0. With "oversampling": 1.5, the system would first get 15 candidates from the index and then refine the top 10 using the full-precision data.

{
    "$search": {
        "cosmosSearch": {
            "vector": <vector_to_search>,
            "path": "<path_to_property>",
            "k": <num_results_to_return>,
            "oversampling": <float_value> 
        },
    }
}

This code snippet demonstrates a vector search using the $search operator with Product quantization. It takes a queryVector as input and searches the v field. The query requests the top 10 most similar documents (k: 10), using an oversampling factor of 2.0, which retrieves 20 candidates improving the accuracy of the search over the compressed index.

db.your_vector_collection.aggregate([
    {
        $search: {
            "cosmosSearch": {
                "vector": [0.1, 0.5, 0.9, ...],
                "path": "v",
                "k": 10,
                "oversampling": 2.0 // Retrieve 2 * 10 = 20 candidates for reranking
            },
            "returnStoredSource": true
        }
    }
])

Half-Precision vs. Product quantization

Both Half-Precision and Product quantization (PQ) compress vector indexes in Azure Cosmos DB for MongoDB (vCore), but they differ in how they achieve compression and affect search:

Feature	Half-Precision	Product quantization (PQ)
Compression Method	Reduces each vector dimension to 16 bits.	Divides vector space into subspaces and quantizes each.
Max Dimensions	Up to 4,000	Up to 16,000
Precision Change	Slight loss due to lower bit depth.	Potentially larger loss, configurable via `pqCompressedDims`.
Search Speed	Moderate speed increase due to smaller index.	Significant speed increase due to highly compressed vectors.
Index Build Time	Relatively fast.	Can be longer due to centroid training (`pqSampleSize`).
Index Support	HNSW, IVF.	DiskANN.
Configuration	Simple, enable `compression: "half"`.	More parameters: `pqCompressedDims`, `pqSampleSize`.
Oversampling Use	Helps with minor precision loss.	Essential for recovering accuracy from larger compression.
Ideal Use Cases	Moderate memory reduction, increased dimensions, acceptable precision trade-off.	Large datasets, high dimensions, fast search prioritized, precision managed with oversampling.

Considerations for Product quantization

Precision vs. Compression: Higher PQ compression leads to smaller indexes and faster search but greater precision loss. Experiment with pqCompressedDims and oversampling to find the right balance.
Index Build Time: PQ index creation can take longer due to the centroid training process, influenced by pqSampleSize.
Data Distribution: PQ works best when vector data has a clear cluster structure.

Next steps

Create a lifetime free-tier vCore cluster for Azure Cosmos DB for MongoDB

Share via