Azure AI Document Intelligence client library for JavaScript - version 5.1.0

2025-05-13

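Azure AI Document Intelligence is a cloud service that uses machine learning to analyze text and structured data from your documents. It includes the following main features:

Layout - Extract text, table structures, and selection marks, along with their bounding region coordinates, from documents.
Document - Analyze entities, key-value pairs, tables, and selection marks from documents using the general prebuilt document model.
Read - Read information about textual elements, such as page words and lines, in addition to text language information.
Prebuilt - Analyze data from certain types of common documents (such as receipts, invoices, business cards, or identity documents) using prebuilt models.
Custom - Build custom models to extract text, field values, selection marks, and table data from documents. Custom models are built with your own data, so they are tailored to your documents.
Classifiers - Build custom classifiers to categorize documents into predefined classes.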

Source code | Package (NPM) | API reference documentation | Product documentation | Samples

Note

The Document Intelligence service was formerly known as "Azure Form Recognizer". These services are one and the same, and the @azure/ai-form-recognizer package for JavaScript is the Azure SDK package for the Azure AI Document Intelligence service. At the time of writing, the renaming of Azure Form Recognizer to Azure AI Document Intelligence is underway, so "Form Recognizer" and "Document Intelligence" may be used interchangeably in some cases.

Install the `@azure/ai-form-recognizer` package

Install the Azure Document Intelligence client library for JavaScript with npm:

npm install @azure/ai-form-recognizer

Getting started

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentAnalysisClient } from "@azure/ai-form-recognizer";
import { createReadStream } from "node:fs";

const credential = new DefaultAzureCredential();
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

// Document Intelligence supports many different types of files.
const file = createReadStream("path/to/file.jpg");
const poller = await client.beginAnalyzeDocument("<model ID>", file);

const { pages, tables, styles, keyValuePairs, documents } = await poller.pollUntilDone();

Currently supported environments

LTS versions of Node.js
Latest versions of Safari, Chrome, Edge, and Firefox.

See our support policy for more details.

Prerequisites

An Azure subscription
A Cognitive Services or Form Recognizer resource. If you need to create the resource, you can use the Azure Portal or the Azure CLI.

Create a Form Recognizer resource

Note: At the time of writing, the Azure portal still refers to the resource as a "Form Recognizer" resource. In the future, this may be updated to a "Document Intelligence" resource. For now, the following documentation uses the "Form Recognizer" name.

Document Intelligence supports both multi-service and single-service access. Create a Cognitive Services resource if you plan to access multiple cognitive services under a single endpoint/key. For Form Recognizer access only, create a Form Recognizer resource.

You can create the resource using either of the following:

Option 1: Azure Portal

Option 2: Azure CLI

Below is an example of how you can create a Form Recognizer resource using the CLI:

# Create a new resource group to hold the Form Recognizer resource -
# if using an existing resource group, skip this step
az group create --name my-resource-group --location westus2

If you use the Azure CLI, replace <your-resource-group-name> and <your-resource-name> with your own unique names:

az cognitiveservices account create --kind FormRecognizer --resource-group <your-resource-group-name> --name <your-resource-name> --sku <your-sku-name> --location <your-location>

Create and authenticate a client

To interact with the Document Intelligence service, you need to select either a DocumentAnalysisClient or a DocumentModelAdministrationClient and create an instance of that type. The following examples use DocumentAnalysisClient. To create a client instance to access the Document Intelligence API, you need the endpoint of your Form Recognizer resource and a credential. The clients can use an AzureKeyCredential with an API key of your resource, or a TokenCredential that uses Azure Active Directory RBAC to authorize the client.

You can find the endpoint for your Form Recognizer resource either in the Azure Portal or by using the Azure CLI snippet below:

az cognitiveservices account show --name <your-resource-name> --resource-group <your-resource-group-name> --query "properties.endpoint"

Use an API key

Use the Azure Portal to browse to your Form Recognizer resource and retrieve an API key, or use the Azure CLI snippet below:

Note: The API key is sometimes referred to as a "subscription key" or "subscription API key".

az cognitiveservices account keys list --resource-group <your-resource-group-name> --name <your-resource-name>

Once you have an API key and endpoint, you can use them as follows:

import { AzureKeyCredential, DocumentAnalysisClient } from "@azure/ai-form-recognizer";

const credential = new AzureKeyCredential("<API key>");
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

Use Azure Active Directory

API key authorization is used in most of the examples, but you can also authenticate the client with Azure Active Directory using the Azure Identity library. To use the DefaultAzureCredential provider shown below, or other credential providers provided with the Azure SDK, install the @azure/identity package:

npm install @azure/identity

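To authenticate using a service principal, you also need to register an AAD application and grant access to the service by assigning the "Cognitive Services User" role to your service principal (note: other roles such as "Owner" do not grant the necessary permissions; only "Cognitive Services User" is sufficient to run the examples and the sample code).

Set the values of the client ID, tenant ID, and client secret of the AAD application as environment variables: AZURE_CLIENT_ID, AZURE_TENANT_ID, AZURE_CLIENT_SECRET.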

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentAnalysisClient } from "@azure/ai-form-recognizer";

const credential = new DefaultAzureCredential();
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

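Sovereign clouds

Connect to alternative Azure cloud environments (such as Azure China or Azure Government) by specifying the audience option when creating your client. Use the KnownFormRecognizerAudience enum to select the correct value for your environment.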

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentAnalysisClient, KnownFormRecognizerAudience } from "@azure/ai-form-recognizer";

const credential = new DefaultAzureCredential();
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com", // endpoint
  credential,
  {
    audience: KnownFormRecognizerAudience.AzureGovernment,
  }
);

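If you do not specify the audience option, the default is suitable for the Azure Public Cloud (https://cognitiveservices.azure.com).

Key concepts

`DocumentAnalysisClient`

DocumentAnalysisClient provides operations for analyzing input documents using custom and prebuilt models. Its main analysis methods are:

beginAnalyzeDocument extracts data from an input document file stream using a custom or prebuilt model given by its model ID. For information about the prebuilt models supported in all resources and their model IDs/outputs, see the service's documentation of the models.
beginAnalyzeDocumentFromUrl performs the same function as beginAnalyzeDocument, but submits a publicly accessible URL of a file instead of a file stream.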

`DocumentModelAdministrationClient`

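DocumentModelAdministrationClient provides operations for managing (creating, reading, listing, and deleting) models in the resource:

beginBuildDocumentModel starts an operation to create a new document model from your own training data set. The created model can extract fields according to a custom schema. The training data are expected to be located in an Azure Storage container and organized according to a particular convention. See the service's documentation on creating a training data set for a more detailed explanation of applying labels to a training data set.
beginComposeDocumentModel starts an operation to compose multiple models into a single model. When used for custom form recognition, the new composed model first classifies the input documents to determine which of its submodels is most appropriate.
beginCopyModelTo starts an operation to copy a custom model from one resource to another (or even to the same resource). It requires a CopyAuthorization from the target resource, which can be generated using the getCopyAuthorization method.
getResourceDetails retrieves information about the resource's limits, such as the number of custom models and the maximum number of models the resource can support.
getDocumentModel, listDocumentModels, and deleteDocumentModel enable managing models in the resource.
getOperation and listOperations enable viewing the status of model creation operations, including operations that are ongoing or that have failed. Operations are retained for 24 hours.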
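
The copy flow described for beginCopyModelTo can be sketched as follows. This is a minimal sketch, assuming two already-constructed DocumentModelAdministrationClient instances passed in as the hypothetical parameters sourceClient and targetClient. The helper name copyModel is illustrative, and the argument order shown for getCopyAuthorization and beginCopyModelTo is an assumption that should be checked against the API reference.

```javascript
// Sketch: copy a custom model from one resource to another.
// `sourceClient` and `targetClient` are assumed to be DocumentModelAdministrationClient
// instances for the source and target resources (hypothetical parameter names).
async function copyModel(sourceClient, targetClient, sourceModelId, destinationModelId) {
  // The target resource authorizes the copy for the destination model ID.
  const authorization = await targetClient.getCopyAuthorization(destinationModelId);

  // The source resource starts the long-running copy operation.
  const poller = await sourceClient.beginCopyModelTo(sourceModelId, authorization);

  // Resolves with the poller's final result once the operation completes.
  return poller.pollUntilDone();
}
```

Because the helper only relies on the two method names, it can be exercised with stub clients before pointing it at real resources.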

Models can also be created using Document Intelligence Studio, the graphical user interface of the Document Intelligence service.

Sample code snippets that illustrate using DocumentModelAdministrationClient to build a model can be found below, in the "Build a model" example section.

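Long-running operations

Long-running operations (LROs) are operations that consist of an initial request sent to the service to start an operation, followed by polling for a result at a certain interval to determine whether the operation has completed and whether it failed or succeeded. Ultimately, the LRO either fails with an error or produces a result.

In Azure AI Document Intelligence, operations that create models (including copying and composing models) as well as analysis/data extraction operations are LROs. The SDK clients provide asynchronous begin<operation-name> methods that return Promise<PollerLike> objects. The PollerLike object represents the operation, which runs asynchronously on the service's infrastructure, and a program can wait for the operation to complete by calling and awaiting the pollUntilDone method on the poller returned from the begin<operation-name> method. Sample code snippets that illustrate using long-running operations are provided in the next section.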
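
To make the polling pattern concrete, here is a toy poller. It is not the SDK's implementation, only a sketch of the idea: createToyPoller, checkStatus, and intervalMs are hypothetical names, and the status strings are assumptions chosen for the illustration.

```javascript
// Toy illustration of the long-running operation pattern (NOT the SDK's poller).
// `checkStatus` is a hypothetical function that asks the service for the current state
// and resolves to { status: "running" | "succeeded" | "failed", result?, error? }.
function createToyPoller(checkStatus, intervalMs = 10) {
  return {
    async pollUntilDone() {
      for (;;) {
        const state = await checkStatus();
        // The operation produced a result.
        if (state.status === "succeeded") return state.result;
        // The operation ended with an error.
        if (state.status === "failed") throw new Error(state.error ?? "operation failed");
        // Still running: wait for the polling interval, then ask again.
        await new Promise((resolve) => setTimeout(resolve, intervalMs));
      }
    },
  };
}
```

A caller awaits pollUntilDone() exactly as with the pollers returned by the begin<operation-name> methods: the promise resolves with the result or rejects with the error.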

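Examples

The following section provides several JavaScript code snippets that illustrate common patterns used in the Document Intelligence client libraries.

Analyze a document with a model ID
Use prebuilt document models
Use the "layout" prebuilt
Use the "document" prebuilt
Use the "read" prebuilt
Build a model
Manage models

Analyze a document with a model ID

The beginAnalyzeDocument method can extract fields and table data from documents. Analysis may use either a custom model, trained with your own data, or a prebuilt model provided by the service (see Use prebuilt document models below). A custom model is tailored to your own documents, so it should only be used with documents of the same structure as one of the document types in the model (there may be several, as in a composed model).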

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentAnalysisClient } from "@azure/ai-form-recognizer";
import { createReadStream } from "node:fs";

const credential = new DefaultAzureCredential();
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

const modelId = "<model id>";
const path = "<path to a document>";
const readStream = createReadStream(path);

const poller = await client.beginAnalyzeDocument(modelId, readStream, {
  onProgress: ({ status }) => {
    console.log(`status: ${status}`);
  },
});

// There are more fields than just these three
const { documents, pages, tables } = await poller.pollUntilDone();

console.log("Documents:");
for (const document of documents || []) {
  console.log(`Type: ${document.docType}`);
  console.log("Fields:");
  for (const [name, field] of Object.entries(document.fields)) {
    console.log(
      `Field ${name} has content '${field.content}' with a confidence score of ${field.confidence}`,
    );
  }
}

console.log("Pages:");
for (const page of pages || []) {
  console.log(`Page number: ${page.pageNumber} (${page.width}x${page.height} ${page.unit})`);
}

console.log("Tables:");
for (const table of tables || []) {
  console.log(`- Table (${table.columnCount}x${table.rowCount})`);
  for (const cell of table.cells) {
    console.log(`  - cell (${cell.rowIndex},${cell.columnIndex}) "${cell.content}"`);
  }
}

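Analyze a document from a URL

Instead of providing a readable stream, you can provide a publicly accessible URL using the beginAnalyzeDocumentFromUrl method. "Publicly accessible" means that URL sources must be accessible from the service's infrastructure (in other words, a private intranet URL, or a URL that uses header-based or certificate-based secrets, will not work, because the Document Intelligence service must be able to access the URL). However, the URL itself can encode a secret, such as an Azure Storage blob URL that contains a SAS token in its query parameters.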
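
A call to beginAnalyzeDocumentFromUrl might be wrapped as follows. This is a minimal sketch, assuming client is a DocumentAnalysisClient created as in the earlier examples; the helper name analyzeFromUrl and the up-front scheme check are illustrative additions, not part of the library.

```javascript
// Sketch: analyze a document from a URL that the service itself can reach.
// `client` is assumed to be a DocumentAnalysisClient; `analyzeFromUrl` is a hypothetical helper name.
async function analyzeFromUrl(client, modelId, documentUrl) {
  // Fail early on strings that are not absolute http(s) URLs; the service could not fetch them anyway.
  const { protocol } = new URL(documentUrl);
  if (protocol !== "https:" && protocol !== "http:") {
    throw new Error(`Unsupported URL scheme: ${protocol}`);
  }

  // Same long-running pattern as beginAnalyzeDocument, but the service downloads the file.
  const poller = await client.beginAnalyzeDocumentFromUrl(modelId, documentUrl);
  return poller.pollUntilDone();
}
```

The scheme check cannot tell whether a URL is reachable from the service; it only rejects inputs such as local file paths before a request is made.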

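Use prebuilt document models

The beginAnalyzeDocument method also supports extracting fields from certain types of common documents, such as receipts, invoices, business cards, and identity documents, using prebuilt models provided by the Document Intelligence service. The prebuilt models may be provided either as model ID strings (the same as custom document models; see the Other prebuilt models section below) or using a DocumentModel object. When using a DocumentModel, the Document Intelligence SDK for JavaScript provides a much stronger TypeScript type for the resulting extracted documents based on the model's schema, and the result is converted to use JavaScript naming conventions.

Example DocumentModel objects for the current service API version (2022-08-31) can be found in the prebuilt samples directory. The following example uses the PrebuiltReceiptModel from the prebuilt-receipt.ts file in that directory.

Since the main benefit of DocumentModel-based analysis is stronger TypeScript type constraints, the following sample is written in TypeScript using ECMAScript module syntax: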

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentAnalysisClient } from "@azure/ai-form-recognizer";
import { createReadStream } from "node:fs";
import { PrebuiltReceiptModel } from "../samples-dev/prebuilt/prebuilt-receipt.js";

const credential = new DefaultAzureCredential();
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

const path = "<path to a document>";
const readStream = createReadStream(path);

// The PrebuiltReceiptModel `DocumentModel` instance encodes both the model ID and a stronger return type for the operation
const poller = await client.beginAnalyzeDocument(PrebuiltReceiptModel, readStream, {
  onProgress: ({ status }) => {
    console.log(`status: ${status}`);
  },
});

const {
  documents: [receiptDocument] = [],
} = await poller.pollUntilDone();

// Check for a result before reading its fields; accessing `fields` on a missing document would throw.
if (receiptDocument === undefined) {
  throw new Error("Expected at least one receipt in analysis result.");
}

// The fields of the document constitute the extracted receipt data.
const receipt = receiptDocument.fields;

console.log(`Receipt data (${receiptDocument.docType})`);
console.log("  Merchant Name:", receipt.merchantName?.value);

// The items of the receipt are an example of a `DocumentArrayValue`
if (receipt.items !== undefined) {
  console.log("Items:");
  for (const { properties: item } of receipt.items.values) {
    console.log("- Description:", item.description?.value);
    console.log("  Total Price:", item.totalPrice?.value);
  }
}

console.log("  Total:", receipt.total?.value);

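Alternatively, as mentioned above, instead of using PrebuiltReceiptModel, which produces the stronger return type, the prebuilt receipt's model ID ("prebuilt-receipt") can be used. In that case the document fields are not strongly typed in TypeScript, and the field names are generally in "PascalCase" instead of "camelCase".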
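
Reading fields from a result obtained with the string model ID might look like the sketch below. The PascalCase names "MerchantName" and "Total" are assumptions based on the receipt schema and the naming note above; summarizeReceipt is a hypothetical helper name, and it only uses the docType, fields, and content properties shown in the earlier examples.

```javascript
// Sketch: read receipt fields from a result produced with the string model ID "prebuilt-receipt".
// The PascalCase names "MerchantName" and "Total" are assumed from the receipt schema;
// `summarizeReceipt` is a hypothetical helper name.
function summarizeReceipt(receiptDocument) {
  const fields = receiptDocument?.fields ?? {};
  return {
    docType: receiptDocument?.docType,
    // Without the typed DocumentModel, fields are looked up by string key and may be missing.
    merchantName: fields["MerchantName"]?.content,
    total: fields["Total"]?.content,
  };
}
```

After `const { documents } = await poller.pollUntilDone();`, the helper could be called as `summarizeReceipt(documents?.[0])`; missing documents or fields simply yield undefined values.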

Other prebuilt models

You are not limited to receipts! There are a few prebuilt models to choose from, with more on the way. Each prebuilt model has its own set of supported fields:

Receipts, using PrebuiltReceiptModel (as above) or the prebuilt receipt model ID "prebuilt-receipt".
Business cards, using PrebuiltBusinessCardModel or its model ID "prebuilt-businessCard".
Invoices, using PrebuiltInvoiceModel or its model ID "prebuilt-invoice".
Identity documents (such as driver licenses and passports), using PrebuiltIdDocumentModel or its model ID "prebuilt-idDocument".
W2 tax forms (United States), using PrebuiltTaxUsW2Model or its model ID "prebuilt-tax.us.w2".
Health insurance cards (United States), using PrebuiltHealthInsuranceCardUsModel or its model ID "prebuilt-healthInsuranceCard.us".

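Each of the above prebuilt models produces documents (extracted instances of the model's field schema). There are also three prebuilt models that do not have field schemas and therefore do not produce documents. They are:

The prebuilt layout model (see Use the "layout" prebuilt below), which extracts information about basic layout (OCR) elements such as pages and tables.
The prebuilt general document model (see Use the "document" prebuilt below), which adds key-value pairs (directed associations between page elements, such as labeled elements) to the information produced by the layout model.
The prebuilt read model (see Use the "read" prebuilt below), which extracts only textual elements, such as page words and lines, along with information about the language of the document.

For information about the fields of all of these models, consult the service's documentation of the available prebuilt models.

The fields of all prebuilt models may also be accessed programmatically by calling the getDocumentModel method of DocumentModelAdministrationClient (with the model ID) and inspecting the docTypes field in the result.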
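
Inspecting docTypes programmatically could be sketched as below. This assumes client is a DocumentModelAdministrationClient and that each document type exposes a fieldSchema map whose entries have a type property, as in the model-building example later in this document; describeModelFields is a hypothetical helper name.

```javascript
// Sketch: list the top-level field names and types for each document type of a model.
// `client` is assumed to be a DocumentModelAdministrationClient;
// `describeModelFields` is a hypothetical helper name.
async function describeModelFields(client, modelId) {
  const model = await client.getDocumentModel(modelId);
  const summary = {};
  for (const [docType, { fieldSchema }] of Object.entries(model.docTypes ?? {})) {
    // Map each top-level field name to its declared type.
    summary[docType] = Object.fromEntries(
      Object.entries(fieldSchema ?? {}).map(([name, schema]) => [name, schema.type]),
    );
  }
  return summary;
}
```

For a model without field schemas, the returned object is empty, which matches the description of the layout, general document, and read models above.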

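Use the "layout" prebuilt

The "prebuilt-layout" model extracts only the basic elements of the document, such as pages (which consist of text words/lines and selection marks), tables, and visual text styles, along with their bounding regions and spans within the text content of the input documents. We provide a strongly-typed DocumentModel instance named PrebuiltLayoutModel that invokes this model, or, as always, its model ID "prebuilt-layout" may be used directly.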

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentAnalysisClient } from "@azure/ai-form-recognizer";
import { createReadStream } from "node:fs";
import { PrebuiltLayoutModel } from "../samples-dev/prebuilt/prebuilt-layout.js";

const credential = new DefaultAzureCredential();
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

const path = "<path to a document>";
const readStream = createReadStream(path);

const poller = await client.beginAnalyzeDocument(PrebuiltLayoutModel, readStream);
const { pages, tables } = await poller.pollUntilDone();

for (const page of pages || []) {
  console.log(`- Page ${page.pageNumber}: (${page.width}x${page.height} ${page.unit})`);
}

for (const table of tables || []) {
  console.log(`- Table (${table.columnCount}x${table.rowCount})`);
  for (const cell of table.cells) {
    console.log(`  cell [${cell.rowIndex},${cell.columnIndex}] "${cell.content}"`);
  }
}

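Use the "document" prebuilt

The "prebuilt-document" model extracts information about key-value pairs (directed associations between page elements, such as labeled fields) in addition to the properties produced by the layout extraction method. This prebuilt (general) document model provides functionality similar to the custom models trained without label information in previous iterations of the Document Intelligence service, but it is now provided as a prebuilt model that works with a wide variety of documents. We provide a strongly-typed DocumentModel instance named PrebuiltDocumentModel that invokes this model, or, as always, its model ID "prebuilt-document" may be used directly.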

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentAnalysisClient } from "@azure/ai-form-recognizer";
import { createReadStream } from "node:fs";
import { PrebuiltDocumentModel } from "../samples-dev/prebuilt/prebuilt-document.js";

const credential = new DefaultAzureCredential();
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

const path = "<path to a document>";
const readStream = createReadStream(path);

const poller = await client.beginAnalyzeDocument(PrebuiltDocumentModel, readStream);

// `pages`, `tables` and `styles` are also available as in the "layout" example above, but for the sake of this
// example we won't show them here.
const { keyValuePairs } = await poller.pollUntilDone();

if (!keyValuePairs || keyValuePairs.length <= 0) {
  console.log("No key-value pairs were extracted from the document.");
} else {
  console.log("Key-Value Pairs:");
  for (const { key, value, confidence } of keyValuePairs) {
    console.log("- Key  :", `"${key.content}"`);
    console.log("  Value:", `"${value?.content ?? "<undefined>"}" (${confidence})`);
  }
}

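Use the "read" prebuilt

The "prebuilt-read" model extracts textual information in a document, such as words and paragraphs, and analyzes the language and writing style (for example, handwritten versus typeset) of that text. We provide a strongly-typed DocumentModel instance named PrebuiltReadModel that invokes this model, or, as always, its model ID "prebuilt-read" may be used directly.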

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentAnalysisClient } from "@azure/ai-form-recognizer";
import { createReadStream } from "node:fs";
import { PrebuiltReadModel } from "../samples-dev/prebuilt/prebuilt-read.js";

const credential = new DefaultAzureCredential();
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

const path = "<path to a document>";
const readStream = createReadStream(path);

const poller = await client.beginAnalyzeDocument(PrebuiltReadModel, readStream);

// The "prebuilt-read" model only extracts information about the textual content of the
// document, such as page text elements, text styles, and information about the language of the text.
const { content, pages, languages } = await poller.pollUntilDone();

if (!pages || pages.length <= 0) {
  console.log("No pages were extracted from the document.");
} else {
  console.log("Pages:");
  for (const page of pages) {
    console.log("- Page", page.pageNumber, `(unit: ${page.unit})`);
    console.log(`  ${page.width}x${page.height}, angle: ${page.angle}`);
    console.log(
      `  ${page.lines && page.lines.length} lines, ${page.words && page.words.length} words`,
    );

    if (page.lines && page.lines.length > 0) {
      console.log("  Lines:");

      for (const line of page.lines) {
        console.log(`  - "${line.content}"`);
      }
    }
  }
}

if (!languages || languages.length <= 0) {
  console.log("No language spans were extracted from the document.");
} else {
  console.log("Languages:");
  for (const languageEntry of languages) {
    console.log(
      `- Found language: ${languageEntry.locale} (confidence: ${languageEntry.confidence})`,
    );

    for (const text of getTextOfSpans(content, languageEntry.spans)) {
      const escapedText = text.replace(/\r?\n/g, "\\n").replace(/"/g, '\\"');
      console.log(`  - "${escapedText}"`);
    }
  }
}

function* getTextOfSpans(content, spans) {
  for (const span of spans) {
    yield content.slice(span.offset, span.offset + span.length);
  }
}

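Classify a document

The Document Intelligence service supports custom document classifiers that can classify documents into a set of predefined categories based on a training data set. Documents can be classified with a custom classifier using the beginClassifyDocument method of DocumentAnalysisClient. Like beginAnalyzeDocument above, this method accepts a file or stream containing the document to be classified, and it has a beginClassifyDocumentFromUrl counterpart that accepts a publicly accessible URL to a document instead.

The following sample shows how to classify a document using a custom classifier: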

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentAnalysisClient } from "@azure/ai-form-recognizer";

const credential = new DefaultAzureCredential();
const client = new DocumentAnalysisClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

const documentUrl =
  "https://raw.githubusercontent.com/Azure/azure-sdk-for-js/main/sdk/formrecognizer/ai-form-recognizer/assets/invoice/Invoice_1.pdf";

const poller = await client.beginClassifyDocumentFromUrl("<classifier id>", documentUrl);

const result = await poller.pollUntilDone();

if (!result.documents || result.documents.length === 0) {
  throw new Error("Failed to extract any documents.");
}

for (const document of result.documents) {
  console.log(
    `Extracted a document with type '${document.docType}' on page ${document.boundingRegions?.[0].pageNumber} (confidence: ${document.confidence})`,
  );
}

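For information on training a custom classifier, see the section on classifier training at the end of the next section.

Build a model

The SDK also supports creating models using the DocumentModelAdministrationClient class. Building a model from labeled training data creates a new model that is trained on your own documents, and the resulting model can recognize values from the structures of those documents. The model building operation accepts a SAS-encoded URL to an Azure Storage Blob container that holds the training documents. The Document Intelligence service's infrastructure reads the files in the container and creates a model based on their contents. For more details on how to create and structure a training data container, see the Document Intelligence service's documentation for building a model.

While we provide these methods for programmatic model creation, the Document Intelligence service team has also created an interactive web application, Document Intelligence Studio, that enables creating and managing models on the web.

For example, the following program builds a custom document model using a SAS-encoded URL to a pre-existing Azure Storage container: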

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentModelAdministrationClient } from "@azure/ai-form-recognizer";

const credential = new DefaultAzureCredential();
const client = new DocumentModelAdministrationClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

const containerSasUrl = "<SAS url to the blob container storing training documents>";

// You must provide the model ID. It can be any text that does not start with "prebuilt-".
// For example, you could provide a randomly generated GUID using the "uuid" package.
// The second parameter is the SAS-encoded URL to an Azure Storage container with the training documents.
// The third parameter is the build mode: one of "template" (the only mode prior to 4.0.0-beta.3) or "neural".
// See https://aka.ms/azsdk/formrecognizer/buildmode for more information about build modes.
const poller = await client.beginBuildDocumentModel("<model ID>", containerSasUrl, "template", {
  // The model description is optional and can be any text.
  description: "This is my new model!",
  onProgress: ({ status }) => {
    console.log(`operation status: ${status}`);
  },
});
const model = await poller.pollUntilDone();

console.log(`Model ID: ${model.modelId}`);
console.log(`Description: ${model.description}`);
console.log(`Created: ${model.createdOn}`);

// A model may contain several document types, which describe the possible object structures of fields extracted using
// this model

console.log("Document Types:");
for (const [docType, { description, fieldSchema: schema }] of Object.entries(
  model.docTypes ?? {},
)) {
  console.log(`- Name: "${docType}"`);
  console.log(`  Description: "${description}"`);

  // For simplicity, this example will only show top-level field names
  console.log("  Fields:");

  for (const [fieldName, fieldSchema] of Object.entries(schema)) {
    console.log(`  - "${fieldName}" (${fieldSchema.type})`);
    console.log(`    ${fieldSchema.description ?? "<no description>"}`);
  }
}

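Custom classifiers are built in a similar way, using the beginBuildDocumentClassifier method rather than beginBuildDocumentModel. Because the input training data are provided in a slightly different format, see the build classifier sample for more information on building a custom classifier. For information on creating a training data set for a custom classifier, see the Document Intelligence service documentation.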
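
As a rough illustration of how the training input differs, a classifier build might be sketched as below. Everything about the call shape here is an assumption to verify against the build classifier sample: the argument order (classifier ID, then a map of document types to sources) and the azureBlobSource.containerUrl structure. The names buildClassifier and containersByDocType are hypothetical.

```javascript
// Sketch: build a custom classifier from one blob container per document type.
// ASSUMPTIONS: the argument order (classifierId, docTypeSources) and the
// `azureBlobSource.containerUrl` shape; `buildClassifier` and `containersByDocType`
// are hypothetical names. `client` is assumed to be a DocumentModelAdministrationClient.
async function buildClassifier(client, classifierId, containersByDocType) {
  // Turn { invoice: "<SAS url>", receipt: "<SAS url>" } into a per-type source map.
  const docTypeSources = Object.fromEntries(
    Object.entries(containersByDocType).map(([docType, containerUrl]) => [
      docType,
      { azureBlobSource: { containerUrl } },
    ]),
  );

  const poller = await client.beginBuildDocumentClassifier(classifierId, docTypeSources);
  return poller.pollUntilDone();
}
```

The point of the sketch is the contrast with beginBuildDocumentModel: instead of one container URL, the classifier receives one training source per document type it should recognize.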

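Manage models

DocumentModelAdministrationClient also provides several methods for accessing and listing models. The following example shows how to iterate through the models in a resource (this includes both custom models in the resource and prebuilt models that are common to all resources), get a model by ID, and delete a model.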

import { DefaultAzureCredential } from "@azure/identity";
import { DocumentModelAdministrationClient } from "@azure/ai-form-recognizer";

const credential = new DefaultAzureCredential();
const client = new DocumentModelAdministrationClient(
  "https://<resource name>.cognitiveservices.azure.com",
  credential,
);

// Produces an async iterable that supports paging (`PagedAsyncIterableIterator`). The `listDocumentModels` method will only
// iterate over model summaries, which do not include detailed schema information. Schema information is only returned
// from `getDocumentModel` as part of the full model information.
const models = client.listDocumentModels();
let i = 1;
for await (const summary of models) {
  console.log(`Model ${i++}:`, summary);
}

// The iterable is paged, and the application can control the flow of paging if needed
i = 1;
for await (const page of client.listDocumentModels().byPage()) {
  for (const summary of page) {
    console.log(`Model ${i++}`, summary);
  }
}

// We can also get a full ModelInfo by ID. Here we only show the basic information. See the documentation and the
// `getDocumentModel` sample program for information about the `docTypes` field, which contains the model's document type
// schemas.
const model = await client.getDocumentModel("<model ID>");
console.log(`ID ${model.modelId}`);
console.log(`Created: ${model.createdOn}`);
console.log(`Description: ${model.description ?? "<none>"}`);

// A model can also be deleted by its model ID. Once it is deleted, it CANNOT be recovered.
const modelIdToDelete = "<model ID that should be deleted forever>";
await client.deleteDocumentModel(modelIdToDelete);

Similar methods, listDocumentClassifiers and getDocumentClassifier, are available for listing and getting information about custom classifiers, in addition to deleteDocumentClassifier for deleting custom classifiers.
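
The classifier management methods named above might be combined as in the following sketch. It assumes client is a DocumentModelAdministrationClient, that listDocumentClassifiers returns an async iterable in the same way listDocumentModels does, and that each entry carries a classifierId property; the helper names are hypothetical.

```javascript
// Sketch: list classifier IDs, then delete one only if it exists.
// `client` is assumed to be a DocumentModelAdministrationClient; the helper names are hypothetical.
async function listClassifierIds(client) {
  const ids = [];
  // Assumed to be an async iterable, like listDocumentModels in the example above.
  for await (const details of client.listDocumentClassifiers()) {
    ids.push(details.classifierId);
  }
  return ids;
}

async function deleteClassifierIfPresent(client, classifierId) {
  const ids = await listClassifierIds(client);
  if (!ids.includes(classifierId)) return false;
  await client.deleteDocumentClassifier(classifierId);
  return true;
}
```

Checking the list first avoids issuing a delete request for an ID that is not in the resource; whether that extra round trip is worthwhile depends on how the service reports a missing classifier.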

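Troubleshooting

For assistance with troubleshooting, see the troubleshooting guide.

Logging

Enabling logging may help uncover useful information about failures. To see a log of HTTP requests and responses, set the AZURE_LOG_LEVEL environment variable to info. Alternatively, logging can be enabled at runtime by calling setLogLevel from the @azure/logger package: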

import { setLogLevel } from "@azure/logger";

setLogLevel("info");

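For more detailed instructions on how to enable logs, see the @azure/logger package documentation.

Next steps

See the samples directory for detailed code samples that show how to use this library, including several features and methods that are not shown in the "Examples" section above, such as copying and composing models, listing model management operations, and deleting models.

Contributing

If you would like to contribute to this library, see the contributing guide to learn more about how to build and test the code.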