Use LangChain with Azure Database for PostgreSQL

2025-06-23

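Azure Database for PostgreSQL integrates seamlessly with leading large language model (LLM) orchestration packages such as LangChain, so developers can use advanced AI capabilities within their applications. LangChain streamlines the management and use of LLMs, embedding models, and databases, which makes generative AI applications easier to develop.

This tutorial shows how to use the integrated vector database in Azure Database for PostgreSQL to store and manage documents in collections with LangChain. It also shows how to create indexes and run vector search queries that use approximate nearest neighbor search with distance metrics such as cosine distance, L2 (Euclidean distance), and IP (inner product) to find documents close to the query vectors.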
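
To make the three distance metrics named above concrete, here is a minimal pure-Python sketch. The function names and the two-dimensional toy vectors are illustrative assumptions, not part of pgvector or LangChain; real embeddings have hundreds or thousands of dimensions and are compared inside the database.

```python
import math


def inner_product(a, b):
    """Inner (dot) product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_distance(a, b):
    """1 minus the cosine similarity; 0 means the vectors point the same way."""
    norm_a = math.sqrt(inner_product(a, a))
    norm_b = math.sqrt(inner_product(b, b))
    return 1.0 - inner_product(a, b) / (norm_a * norm_b)


# Toy data (hypothetical): rank documents by cosine distance to the query.
query = [1.0, 0.0]
docs = {"doc_a": [0.9, 0.1], "doc_b": [0.0, 1.0]}
ranked = sorted(docs, key=lambda name: cosine_distance(query, docs[name]))
print(ranked)
```

A smaller distance means a closer match for cosine and L2 distance, whereas a larger inner product means a closer match.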

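Vector support

Azure Database for PostgreSQL - Flexible Server lets you efficiently store and query millions of vector embeddings in PostgreSQL and scale your AI use cases from proof of concept (POC) to production. The service:

Provides a familiar SQL interface for querying vector embeddings and relational data.
Boosts pgvector with faster and more accurate similarity search across more than 100 million vectors by using the DiskANN indexing algorithm.
Simplifies operations by integrating relational metadata, vector embeddings, and time-series data into a single database.
Uses the robust PostgreSQL ecosystem and Azure cloud capabilities to deliver enterprise-grade features such as replication and high availability.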

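Authentication

Azure Database for PostgreSQL - Flexible Server supports both password-based authentication and Microsoft Entra (formerly Azure Active Directory) authentication. With Entra authentication, you use an Entra identity to authenticate to your PostgreSQL server. Entra ID removes the need to manage separate usernames and passwords for database users and lets you use the same security mechanisms that you use for other Azure services.

This notebook is set up to use either authentication method. You can configure whether to use Entra authentication later in the notebook.

Setup

This tutorial uses the open-source LangChain Postgres integration to connect to Azure Database for PostgreSQL. First, install the partner packages.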

%pip install -qU langchain_postgres
%pip install -qU langchain-openai
%pip install -qU azure-identity

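Enable pgvector on Azure Database for PostgreSQL - Flexible Server

See the enablement instructions for Azure Database for PostgreSQL.

Credentials

To run this notebook, you need your Azure Database for PostgreSQL connection details, which you add as environment variables.

To use Microsoft Entra authentication, set the USE_ENTRA_AUTH flag to True. With Entra authentication, you only need to supply the host and database name. With password authentication, you also need to set the username and password.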

import getpass
import os

USE_ENTRA_AUTH = True

# Supply the connection details for the database
os.environ["DBHOST"] = "<server-name>"
os.environ["DBNAME"] = "<database-name>"
os.environ["SSLMODE"] = "require"

if not USE_ENTRA_AUTH:
    # If using a username and password, supply them here
    os.environ["DBUSER"] = "<username>"
    os.environ["DBPASSWORD"] = getpass.getpass("Database Password:")
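
Before going further, it can help to confirm that the settings each authentication mode needs are actually present. The following is a minimal sketch; the helper name missing_connection_settings and the example values are hypothetical and not part of the notebook or any library.

```python
import os


def missing_connection_settings(use_entra_auth, env=None):
    """Return the names of required settings that are unset or empty.

    Hypothetical helper: Entra authentication needs only the host and
    database name, while password authentication also needs the
    username and password.
    """
    env = os.environ if env is None else env
    required = ["DBHOST", "DBNAME"]
    if not use_entra_auth:
        required += ["DBUSER", "DBPASSWORD"]
    return [name for name in required if not env.get(name)]


# Example values are placeholders, not real connection details.
example_env = {"DBHOST": "example-host", "DBNAME": "example-db"}
print(missing_connection_settings(True, example_env))
print(missing_connection_settings(False, example_env))
```

Calling the helper without the env argument checks os.environ, so it can run right after the cell above.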

Set up Azure OpenAI Embeddings

os.environ["AZURE_OPENAI_ENDPOINT"] = "<azure-openai-endpoint>"
os.environ["AZURE_OPENAI_API_KEY"] = getpass.getpass("Azure OpenAI API Key:")

AZURE_OPENAI_ENDPOINT = os.environ["AZURE_OPENAI_ENDPOINT"]
AZURE_OPENAI_API_KEY = os.environ["AZURE_OPENAI_API_KEY"]

from langchain_openai import AzureOpenAIEmbeddings

embeddings = AzureOpenAIEmbeddings(
    model="text-embedding-3-small",
    api_key=AZURE_OPENAI_API_KEY,
    azure_endpoint=AZURE_OPENAI_ENDPOINT,
    azure_deployment="text-embedding-3-small",
)

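Initialization

Microsoft Entra authentication

The following cell contains functions that set up LangChain to use Entra authentication. The get_token_and_username function uses DefaultAzureCredential from the azure.identity library to retrieve a token for the Azure Database for PostgreSQL service, which ensures that the sqlalchemy engine has a valid token for creating new connections. It also parses the token, which is a JSON Web Token (JWT), to extract the username used to connect to the database.

The create_postgres_engine function creates a sqlalchemy Engine that dynamically sets the username and password based on the token that get_token_and_username fetches. You can pass this Engine to the connection parameter of the PGVector LangChain VectorStore.

Sign in to Azure

To sign in to Azure, make sure that the Azure CLI is installed. Then run the following command in your terminal:

az login

After you sign in, the following code fetches the token.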

import base64
import json
from functools import lru_cache

from azure.identity import DefaultAzureCredential
from sqlalchemy import create_engine, event
from sqlalchemy.engine.url import URL


@lru_cache(maxsize=1)
def get_credential():
    """Memoized function to create the Azure credential, which caches tokens."""
    return DefaultAzureCredential()


def decode_jwt(token):
    """Decode the JWT payload to extract claims."""
    payload = token.split(".")[1]
    padding = "=" * (-len(payload) % 4)  # pad to a multiple of 4; adds nothing if already aligned
    decoded_payload = base64.urlsafe_b64decode(payload + padding)
    return json.loads(decoded_payload)


def get_token_and_username():
    """Fetch a token and return the username and the token."""
    # Fetch a new token and extract the username
    token = get_credential().get_token(
        "https://ossrdbms-aad.database.windows.net/.default"
    )
    claims = decode_jwt(token.token)
    username = claims.get("upn")
    if not username:
        raise ValueError("Could not extract username from token. Have you logged in?")

    return username, token.token


def create_postgres_engine():
    db_url = URL.create(
        drivername="postgresql+psycopg",
        username="",  # This will be replaced dynamically
        password="",  # This will be replaced dynamically
        host=os.environ["DBHOST"],
        port=int(os.environ.get("DBPORT", 5432)),  # environment values are strings
        database=os.environ["DBNAME"],
    )

    # Create a sqlalchemy engine
    engine = create_engine(db_url, echo=True)

    # Listen for the connection event to inject dynamic credentials
    @event.listens_for(engine, "do_connect")
    def provide_dynamic_credentials(dialect, conn_rec, cargs, cparams):
        # Fetch the dynamic username and token
        username, token = get_token_and_username()

        # Override the connection parameters
        cparams["user"] = username
        cparams["password"] = token

    return engine
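
To see the JWT payload decoding in isolation, the sketch below builds an unsigned, fake token locally and decodes it. The token contents, the b64url helper, and the decode_jwt_payload name are assumptions made for illustration; a real Entra token is signed, and this sketch does not verify signatures.

```python
import base64
import json


def b64url(data: bytes) -> str:
    """Base64url-encode without padding, the way JWT segments are encoded."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def decode_jwt_payload(token: str) -> dict:
    """Decode the middle (payload) segment of a JWT without verifying it."""
    payload = token.split(".")[1]
    # Restore the padding that JWT encoding strips.
    payload += "=" * (-len(payload) % 4)
    return json.loads(base64.urlsafe_b64decode(payload))


# Build a fake token for demonstration only (not a real credential).
header = b64url(json.dumps({"alg": "none"}).encode())
body = b64url(json.dumps({"upn": "user@example.com"}).encode())
fake_token = f"{header}.{body}.signature-placeholder"

print(decode_jwt_payload(fake_token)["upn"])
```

Reading the upn claim this way is only suitable for extracting a username from a token you obtained yourself; it is not a substitute for token validation.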

Password authentication

If you're not using Entra authentication, get_connection_uri provides a connection URI that pulls the username and password from environment variables.

import urllib.parse


def get_connection_uri():
    # Read URI parameters from the environment
    dbhost = os.environ["DBHOST"]
    dbname = os.environ["DBNAME"]
    dbuser = urllib.parse.quote(os.environ["DBUSER"])
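    # URL-encode the password so characters such as "@" or ":" don't break the URI
    password = urllib.parse.quote(os.environ["DBPASSWORD"], safe="")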
    sslmode = os.environ["SSLMODE"]

    # Construct connection URI
    # Use psycopg 3!
    db_uri = (
        f"postgresql+psycopg://{dbuser}:{password}@{dbhost}/{dbname}?sslmode={sslmode}"
    )
    return db_uri
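
Credentials placed in a connection URI need to be URL-encoded, because characters such as "@", ":" and "/" otherwise change how the URI is parsed. The following sketch shows the effect with the standard library only; the build_uri helper and all values are hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit


def build_uri(user, password, host, dbname, sslmode="require"):
    """Build a psycopg 3 style connection URI with URL-encoded credentials."""
    user_enc = quote(user, safe="")
    password_enc = quote(password, safe="")
    return (
        f"postgresql+psycopg://{user_enc}:{password_enc}"
        f"@{host}/{dbname}?sslmode={sslmode}"
    )


# Placeholder credentials containing characters that must be escaped.
uri = build_uri("app_user", "p@ss:w/rd", "example-host", "example-db")
parts = urlsplit(uri)

print(parts.hostname)
print(unquote(parts.password))
```

Without the encoding step, the "@" inside the password would be mistaken for the separator between the credentials and the host.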

Create the vector store

from langchain_core.documents import Document
from langchain_postgres import PGVector

collection_name = "my_docs"

# The connection is either a sqlalchemy engine or a connection URI
connection = create_postgres_engine() if USE_ENTRA_AUTH else get_connection_uri()

vector_store = PGVector(
    embeddings=embeddings,
    collection_name=collection_name,
    connection=connection,
    use_jsonb=True,
)

Manage the vector store

Add items to the vector store

Adding documents by ID overwrites any existing documents that match that ID.

docs = [
    Document(
        page_content="there are cats in the pond",
        metadata={"id": 1, "___location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="ducks are also found in the pond",
        metadata={"id": 2, "___location": "pond", "topic": "animals"},
    ),
    Document(
        page_content="fresh apples are available at the market",
        metadata={"id": 3, "___location": "market", "topic": "food"},
    ),
    Document(
        page_content="the market also sells fresh oranges",
        metadata={"id": 4, "___location": "market", "topic": "food"},
    ),
    Document(
        page_content="the new art exhibit is fascinating",
        metadata={"id": 5, "___location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a sculpture exhibit is also at the museum",
        metadata={"id": 6, "___location": "museum", "topic": "art"},
    ),
    Document(
        page_content="a new coffee shop opened on Main Street",
        metadata={"id": 7, "___location": "Main Street", "topic": "food"},
    ),
    Document(
        page_content="the book club meets at the library",
        metadata={"id": 8, "___location": "library", "topic": "reading"},
    ),
    Document(
        page_content="the library hosts a weekly story time for kids",
        metadata={"id": 9, "___location": "library", "topic": "reading"},
    ),
    Document(
        page_content="a cooking class for beginners is offered at the community center",
        metadata={"id": 10, "___location": "community center", "topic": "classes"},
    ),
]

vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])

Update items in the vector store

docs = [
    Document(
        page_content="Updated - cooking class for beginners is offered at the community center",
        metadata={"id": 10, "___location": "community center", "topic": "classes"},
    )
]
vector_store.add_documents(docs, ids=[doc.metadata["id"] for doc in docs])

Delete items from the vector store

vector_store.delete(ids=["3"])
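
The add, update, and delete behavior above can be pictured with a small in-memory model. This is only a mental model under stated assumptions, not how PGVector stores data: the InMemoryStore class is hypothetical, and it normalizes IDs to strings because the query outputs in this tutorial show string IDs.

```python
class InMemoryStore:
    """Toy stand-in for a vector store; keeps documents in a dict keyed by ID."""

    def __init__(self):
        self._docs = {}

    def add_documents(self, docs, ids):
        # Writing to an existing key replaces the earlier document (upsert).
        for doc_id, doc in zip(ids, docs):
            self._docs[str(doc_id)] = doc
        return [str(doc_id) for doc_id in ids]

    def delete(self, ids):
        # Unknown IDs are silently ignored in this sketch.
        for doc_id in ids:
            self._docs.pop(str(doc_id), None)

    def get(self, doc_id):
        return self._docs.get(str(doc_id))

    def __len__(self):
        return len(self._docs)


store = InMemoryStore()
store.add_documents(["fresh apples", "cooking class"], ids=[3, 10])
store.add_documents(["Updated - cooking class"], ids=[10])  # same ID: overwrite
store.delete(ids=["3"])
print(len(store), store.get(10))
```

The second add_documents call does not create a duplicate; it replaces the document stored under ID 10.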

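Query the vector store

After you create the vector store and add the relevant documents, you can query the vector store in your chain or agent.

Filtering support

The vector store supports a set of filters that can be applied to the metadata fields of the documents.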

Operator	Meaning/Category
$eq	Equality (==)
$ne	Inequality (!=)
$lt	Less than (<)
$lte	Less than or equal (<=)
$gt	Greater than (>)
$gte	Greater than or equal (>=)
$in	Special cased (in)
$nin	Special cased (not in)
$between	Special cased (between)
$like	Text (like)
$ilike	Text (case-insensitive like)
$and	Logical (and)
$or	Logical (or)
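
To illustrate what the operators in the table mean, here is a pure-Python sketch that evaluates a filter dict against a single metadata dict. It is an assumption-laden illustration, not the PGVector implementation, which translates filters to SQL. In particular, the sketch assumes that $between is inclusive, that a bare value means equality, and that a missing metadata key never matches.

```python
import re


def _like(value, pattern, ignore_case=False):
    """SQL LIKE semantics: % matches any run of characters, _ matches one."""
    regex = "".join(
        ".*" if ch == "%" else "." if ch == "_" else re.escape(ch)
        for ch in pattern
    )
    flags = re.DOTALL | (re.IGNORECASE if ignore_case else 0)
    return re.fullmatch(regex, str(value), flags) is not None


COMPARATORS = {
    "$eq": lambda v, arg: v == arg,
    "$ne": lambda v, arg: v != arg,
    "$lt": lambda v, arg: v < arg,
    "$lte": lambda v, arg: v <= arg,
    "$gt": lambda v, arg: v > arg,
    "$gte": lambda v, arg: v >= arg,
    "$in": lambda v, arg: v in arg,
    "$nin": lambda v, arg: v not in arg,
    "$between": lambda v, arg: arg[0] <= v <= arg[1],  # assumed inclusive
    "$like": lambda v, arg: _like(v, arg),
    "$ilike": lambda v, arg: _like(v, arg, ignore_case=True),
}


def matches(metadata, flt):
    """Return True if the metadata dict satisfies the filter dict."""
    for key, condition in flt.items():
        if key == "$and":
            if not all(matches(metadata, sub) for sub in condition):
                return False
        elif key == "$or":
            if not any(matches(metadata, sub) for sub in condition):
                return False
        else:
            if key not in metadata:
                return False  # simplification: missing fields never match
            if not isinstance(condition, dict):
                condition = {"$eq": condition}  # bare value treated as equality
            value = metadata[key]
            if not all(COMPARATORS[op](value, arg) for op, arg in condition.items()):
                return False
    # Several top-level fields are combined with AND.
    return True


meta = {"id": 2, "topic": "animals"}
print(matches(meta, {"id": {"$in": [1, 5, 2, 9]}, "topic": {"$like": "ani%"}}))
```

Because every top-level key must pass, a dict with several fields and no explicit logical operator behaves like a $and over those fields in this sketch.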

Query directly

To perform a simple similarity search:

results = vector_store.similarity_search(
    "kitty", k=10, filter={"id": {"$in": [1, 5, 2, 9]}}
)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

    * there are cats in the pond [{'id': 1, 'topic': 'animals', '___location': 'pond'}]
    * ducks are also found in the pond [{'id': 2, 'topic': 'animals', '___location': 'pond'}]
    * the new art exhibit is fascinating [{'id': 5, 'topic': 'art', '___location': 'museum'}]
    * the library hosts a weekly story time for kids [{'id': 9, 'topic': 'reading', '___location': 'library'}]

If you provide a dict with multiple fields but no operators, the top level is interpreted as a logical AND filter.

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={"id": {"$in": [1, 5, 2, 9]}, "___location": {"$in": ["pond", "market"]}},
)

[Document(id='2', metadata={'id': 2, 'topic': 'animals', '___location': 'pond'}, page_content='ducks are also found in the pond'),
 Document(id='1', metadata={'id': 1, 'topic': 'animals', '___location': 'pond'}, page_content='there are cats in the pond')]

vector_store.similarity_search(
    "ducks",
    k=10,
    filter={
        "$and": [
            {"id": {"$in": [1, 5, 2, 9]}},
            {"___location": {"$in": ["pond", "market"]}},
        ]
    },
)

[Document(id='2', metadata={'id': 2, 'topic': 'animals', '___location': 'pond'}, page_content='ducks are also found in the pond'),
 Document(id='1', metadata={'id': 1, 'topic': 'animals', '___location': 'pond'}, page_content='there are cats in the pond')]

To run a similarity search and receive the corresponding scores:

results = vector_store.similarity_search_with_score(query="cats", k=1)
for doc, score in results:
    print(f"* [SIM={score:3f}] {doc.page_content} [{doc.metadata}]")

* [SIM=0.528338] there are cats in the pond [{'id': 1, 'topic': 'animals', '___location': 'pond'}]
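
Although the output labels the score as SIM, the value is likely a distance rather than a similarity. Assuming the store uses the cosine distance strategy (believed to be the PGVector default, but worth confirming in the API reference), lower scores mean closer matches, and a similarity can be derived as sketched below. The helper name is hypothetical, and the sample score is copied from the output above.

```python
def cosine_distance_to_similarity(distance: float) -> float:
    """Convert a cosine distance (0 = identical direction) to a cosine similarity."""
    return 1.0 - distance


# (text, score) pair copied from the sample output above.
results = [("there are cats in the pond", 0.528338)]

for text, distance in results:
    similarity = cosine_distance_to_similarity(distance)
    print(f"{text}: distance={distance:.3f}, similarity={similarity:.3f}")
```

If the store is configured with a different distance strategy, this conversion does not apply.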

For a full list of the searches that you can run on a PGVector vector store, see the API reference.

Query by turning into a retriever

You can also transform the vector store into a retriever for easier use in your chains.

retriever = vector_store.as_retriever(search_type="mmr", search_kwargs={"k": 1})
retriever.invoke("kitty")

[Document(id='1', metadata={'id': 1, 'topic': 'animals', '___location': 'pond'}, page_content='there are cats in the pond')]
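
The retriever above uses search_type="mmr", which stands for maximal marginal relevance: results are chosen to be relevant to the query while avoiding near-duplicates of results already selected. The following is a minimal pure-Python sketch of the idea on toy two-dimensional vectors. The function names, the lambda_mult parameter name, and the data are assumptions for illustration, not the LangChain implementation.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length, non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def mmr(query_vec, candidates, k, lambda_mult=0.5):
    """Greedily pick k candidate indices, trading relevance against redundancy.

    lambda_mult=1.0 ranks purely by relevance; lower values favor diversity.
    """
    selected = []
    remaining = list(range(len(candidates)))
    while remaining and len(selected) < k:

        def mmr_score(i):
            relevance = cosine_similarity(query_vec, candidates[i])
            # Similarity to the closest already-selected item (0 if none yet).
            redundancy = max(
                (cosine_similarity(candidates[i], candidates[j]) for j in selected),
                default=0.0,
            )
            return lambda_mult * relevance - (1 - lambda_mult) * redundancy

        best = max(remaining, key=mmr_score)
        selected.append(best)
        remaining.remove(best)
    return selected


# Toy data: candidates 0 and 1 are near-duplicates, candidate 2 is different.
query = [1.0, 0.0]
candidates = [[1.0, 0.0], [0.99, 0.14], [0.6, 0.8]]
print(mmr(query, candidates, k=2, lambda_mult=0.3))
print(mmr(query, candidates, k=2, lambda_mult=1.0))
```

With a low lambda_mult, the second pick skips the near-duplicate in favor of the more distinct candidate; with lambda_mult=1.0, the ranking falls back to plain relevance.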

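Current limitations

langchain_postgres works only with psycopg3. Update your connection strings from postgresql+psycopg2://... to postgresql+psycopg://langchain:langchain@...
The schema of the embedding store and collection has changed so that add_documents works correctly with user-specified IDs.
You now have to pass an explicit connection object.

Currently, there is no mechanism that supports easy data migration on schema changes. Any schema change in the vector store therefore requires you to recreate the tables and re-add the documents.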
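
The recreate-and-re-add step might look like the following sketch. It assumes that the vector store object exposes drop_tables, create_tables_if_not_exists, create_collection, and add_documents, which PGVector in langchain_postgres is believed to provide; check the API reference for your installed version. The rebuild_vector_store function and the RecordingStore stand-in are hypothetical and exist only to show the call order without a database.

```python
def rebuild_vector_store(vector_store, documents, ids):
    """Drop and recreate the store's tables, then re-add the documents.

    Warning: dropping the tables removes all stored data, so keep the
    source documents available before calling this.
    """
    vector_store.drop_tables()
    vector_store.create_tables_if_not_exists()
    vector_store.create_collection()
    return vector_store.add_documents(documents, ids=ids)


class RecordingStore:
    """Stand-in that records calls instead of touching a database."""

    def __init__(self):
        self.calls = []

    def drop_tables(self):
        self.calls.append("drop_tables")

    def create_tables_if_not_exists(self):
        self.calls.append("create_tables_if_not_exists")

    def create_collection(self):
        self.calls.append("create_collection")

    def add_documents(self, documents, ids):
        self.calls.append("add_documents")
        return [str(doc_id) for doc_id in ids]


demo = RecordingStore()
added = rebuild_vector_store(demo, ["the book club meets at the library"], ids=[8])
print(added, demo.calls)
```

With a real store, you would pass the vector_store and docs objects from this tutorial instead of the stand-in.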
