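Applies to: SQL Server 2025 (17.x) Preview
AI_GENERATE_CHUNKS is a table-valued function that creates "chunks", or fragments of text, based on a type, a size, and a source expression.
Compatibility level 170
AI_GENERATE_CHUNKS requires a compatibility level of at least 170. When the level is less than 170, the Database Engine can't find the AI_GENERATE_CHUNKS function.
To change the compatibility level of a database, see "View or change the compatibility level of a database".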
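As a minimal sketch of the compatibility check described above, the following statements read the current database's level from sys.databases and then raise it to 170. It assumes you are connected to the target database and have permission to alter it; review the effects of a compatibility level change before running this on a production database.

```sql
-- Check the compatibility level of the current database.
SELECT name, compatibility_level
FROM sys.databases
WHERE name = DB_NAME();

-- Raise the level to 170 so that AI_GENERATE_CHUNKS can be resolved.
-- CURRENT targets the database of the active connection.
ALTER DATABASE CURRENT SET COMPATIBILITY_LEVEL = 170;
```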
Syntax
AI_GENERATE_CHUNKS (source = text_expression
, chunk_type = 'FIXED'
[ [ , ] chunk_size = numeric_expression ]
[ [ , ] overlap = numeric_expression ]
)
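Arguments
source
An expression of any character type (for example, nvarchar, varchar, nchar, or char).
chunk_type
A string literal naming the type or method used to chunk the text/document. It can't be NULL or a value from a column.
Accepted values for this release:
FIXED
chunk_size
When chunk_type is FIXED, this parameter sets the character/word count size of each chunk. Specify it as a variable, a literal, or a scalar expression of type smallint, int, or bigint. chunk_size can't be NULL, negative, or zero (0).
overlap
The overlap parameter determines the percentage of the preceding text that should be included in the current chunk. This percentage is applied to the chunk_size parameter to calculate the overlap size in characters.
Specify the overlap value as a variable, a literal, or a scalar expression of type tinyint, smallint, int, or bigint. It must be a whole number from zero (0) through 50, inclusive, and can't be NULL or negative. The default value is zero (0).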
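To make the overlap arithmetic concrete, here is a small sketch that passes chunk_size and overlap as variables and computes how many characters the percentage corresponds to. The table docs_table and the column text_column are placeholder names borrowed from the examples later in this article. How the function rounds a percentage that doesn't yield a whole number of characters isn't stated here, so the integer division below is an assumption for illustration only.

```sql
DECLARE @chunk_size int = 100;  -- characters per chunk
DECLARE @overlap    int = 10;   -- percentage, allowed range 0 through 50

-- Overlap expressed in characters: 100 * 10 / 100 = 10.
-- Integer division is an assumption; the function's own rounding may differ.
SELECT @chunk_size * @overlap / 100 AS overlap_characters;

-- Pass both values as variables instead of literals.
SELECT c.chunk_order, c.chunk_offset, c.chunk_length, c.chunk
FROM docs_table AS t
CROSS APPLY AI_GENERATE_CHUNKS(
    source     = t.text_column,
    chunk_type = N'FIXED',
    chunk_size = @chunk_size,
    overlap    = @overlap) AS c;
```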
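Return types
AI_GENERATE_CHUNKS returns a table with the following columns:
Column name | Data type | Description |
---|---|---|
chunk | Same as the source expression data type | The text chunked from the source expression. |
chunk_set_id | int | An ID that groups all chunks of a document or row. If multiple documents or rows are chunked in a single transaction, each document or row gets a different chunk_set_id. |
chunk_order | int | A sequence number that gives the order of each chunk, starting at 1 and increasing by 1. |
chunk_offset | int | The position of the chunk in the source data/document, relative to the start of the chunking process. |
chunk_length | int | The character length of the returned text chunk. |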
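Return example
The following is an example of the results that AI_GENERATE_CHUNKS returns with these parameters:
A chunk type of FIXED.
A chunk size of 50 characters.
The text to chunk: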
All day long we seemed to dawdle through a country which was full of beauty of every kind. Sometimes we saw little towns or castles on the top of steep hills such as we see in old missals; sometimes we ran by rivers and streams which seemed from the wide stony margin on each side of them to be subject to great floods.
chunk | chunk_set_id | chunk_order | chunk_offset | chunk_length |
---|---|---|---|---|
All day long we seemed to dawdle through a country | 1 | 1 | 1 | 50 |
which was full of beauty of every kind. Sometimes | 1 | 2 | 51 | 50 |
we saw little towns or castles on the top of stee | 1 | 3 | 101 | 50 |
p hills such as we see in old missals; sometimes w | 1 | 4 | 151 | 50 |
e ran by rivers and streams which seemed from the | 1 | 5 | 201 | 50 |
wide stony margin on each side of them to be subje | 1 | 6 | 251 | 50 |
ct to great floods. | 1 | 7 | 301 | 19 |
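Remarks
AI_GENERATE_CHUNKS can be used on a table that contains multiple rows. Depending on the chunk size and the amount of text being chunked, the result set uses the chunk_set_id column to indicate where a new row or document starts. In the following example, chunk_set_id changes when the function finishes chunking the text of the first row and moves on to the second row. The values for chunk_order and chunk_offset also reset to indicate a new starting point.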
CREATE TABLE textchunk (text_id INT IDENTITY(1,1) PRIMARY KEY, text_to_chunk nvarchar(max));
GO
INSERT INTO textchunk (text_to_chunk)
VALUES
('All day long we seemed to dawdle through a country which was full of beauty of every kind. Sometimes we saw little towns or castles on the top of steep hills such as we see in old missals; sometimes we ran by rivers and streams which seemed from the wide stony margin on each side of them to be subject to great floods.'),
('My Friend, Welcome to the Carpathians. I am anxiously expecting you. Sleep well to-night. At three to-morrow the diligence will start for Bukovina; a place on it is kept for you. At the Borgo Pass my carriage will await you and will bring you to me. I trust that your journey from London has been a happy one, and that you will enjoy your stay in my beautiful land. Your friend, DRACULA')
GO
SELECT c.*
FROM textchunk t
CROSS APPLY
AI_GENERATE_CHUNKS(source = text_to_chunk, chunk_type = N'FIXED', chunk_size = 50) c
chunk | chunk_set_id | chunk_order | chunk_offset | chunk_length |
---|---|---|---|---|
All day long we seemed to dawdle through a country | 1 | 1 | 1 | 50 |
which was full of beauty of every kind. Sometimes | 1 | 2 | 51 | 50 |
we saw little towns or castles on the top of stee | 1 | 3 | 101 | 50 |
p hills such as we see in old missals; sometimes w | 1 | 4 | 151 | 50 |
e ran by rivers and streams which seemed from the | 1 | 5 | 201 | 50 |
wide stony margin on each side of them to be subje | 1 | 6 | 251 | 50 |
ct to great floods. | 1 | 7 | 301 | 19 |
My Friend, Welcome to the Carpathians. I am anxi | 2 | 1 | 1 | 50 |
ously expecting you. Sleep well to-night. At three | 2 | 2 | 51 | 50 |
to-morrow the diligence will start for Bukovina; | 2 | 3 | 101 | 50 |
a place on it is kept for you. At the Borgo Pass m | 2 | 4 | 151 | 50 |
y carriage will await you and will bring you to me | 2 | 5 | 201 | 50 |
. I trust that your journey from London has been a | 2 | 6 | 251 | 50 |
happy one, and that you will enjoy your stay in m | 2 | 7 | 301 | 50 |
y beautiful land. Your friend, DRACULA | 2 | 8 | 351 | 38 |
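One way to see the per-row grouping is to aggregate the chunks by chunk_set_id. The sketch below reuses the textchunk table from this section and reports, for each source row, the number of chunks and the total of chunk_length next to the length of the source text. It assumes no overlap, in which case the chunk lengths would be expected to add up to the source length; that expectation is an inference from the result table, not documented behavior.

```sql
-- Summarize the chunks produced for each source row.
SELECT
    t.text_id,
    c.chunk_set_id,
    COUNT(*)              AS chunk_count,
    SUM(c.chunk_length)   AS total_chunk_characters,
    -- LEN ignores trailing spaces, so this can be lower than the
    -- chunk total when the source text ends in spaces.
    LEN(t.text_to_chunk)  AS source_length
FROM textchunk AS t
CROSS APPLY AI_GENERATE_CHUNKS(
    source     = t.text_to_chunk,
    chunk_type = N'FIXED',
    chunk_size = 50) AS c
GROUP BY t.text_id, c.chunk_set_id, LEN(t.text_to_chunk)
ORDER BY c.chunk_set_id;
```

For the first sample row, the result table above lists seven chunks whose lengths add up to 319 characters (six chunks of 50 plus one of 19).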
Examples
A. Chunk a text column with type FIXED and a size of 100 characters
The following example uses AI_GENERATE_CHUNKS to chunk a text column. It uses a chunk_type of FIXED and a chunk_size of 100 characters.
SELECT
c.chunk
FROM
docs_table t
CROSS APPLY
AI_GENERATE_CHUNKS(source = text_column, chunk_type = N'FIXED', chunk_size = 100) c
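B. Chunk a text column with overlap
The following example uses AI_GENERATE_CHUNKS to chunk a text column with overlap. It uses a chunk_type of FIXED, a chunk_size of 100 characters, and an overlap of 10 percent.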
SELECT
c.chunk
FROM
docs_table t
CROSS APPLY
AI_GENERATE_CHUNKS(source = text_column, chunk_type = N'FIXED', chunk_size = 100, overlap = 10) c
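C. Use AI_GENERATE_EMBEDDINGS with AI_GENERATE_CHUNKS
This example uses AI_GENERATE_EMBEDDINGS with AI_GENERATE_CHUNKS to create embeddings from text chunks, and then inserts the vector arrays returned by the AI model inference endpoint into a table.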
INSERT INTO
my_embeddings (chunked_text, vector_embeddings)
SELECT
c.chunk,
AI_GENERATE_EMBEDDINGS(c.chunk USE MODEL MyAzureOpenAiModel)
FROM
table_with_text t
CROSS APPLY
AI_GENERATE_CHUNKS(source = t.text_to_chunk, chunk_type = N'FIXED', chunk_size = 100) c
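The INSERT above needs a target table that this article doesn't define. A possible shape is sketched below; the column names come from the example, while the embedding_id column and the vector dimension of 1536 are assumptions. The dimension has to match whatever the external model registered as MyAzureOpenAiModel actually returns.

```sql
-- Hypothetical target table for example C.
CREATE TABLE my_embeddings
(
    embedding_id      int IDENTITY(1,1) PRIMARY KEY,  -- assumed surrogate key
    chunked_text      nvarchar(max),                  -- the chunk text
    vector_embeddings vector(1536)                    -- assumed dimension; match your model
);
```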