什么是模糊字符串匹配?

适用于:SQL Server 2025 (17.x) 在Microsoft Fabric 中预览 Azure SQL 数据库Azure SQL 托管实例 SQL 数据库

使用模糊或近似字符串匹配来检查两个字符串是否相似,并计算两个字符串之间的差异。 使用此功能可识别因字符损坏而可能不同的字符串。 损坏原因可能包括拼写错误、转置字符、缺少字符或缩写。 模糊字符串匹配使用算法来检测声音相似的字符串。

注释

  • 模糊字符串匹配目前处于试用阶段。
  • SQL Server 在 SQL Server 2025(17.x)预览版中引入了对模糊字符串匹配的支持。

模糊函数

功能 DESCRIPTION
EDIT_DISTANCE 计算将一个字符串转换为另一个字符串所需的插入、删除、替换和换行数。
编辑距离相似度 计算从 0(表示不匹配)到 100(表示完全匹配)的相似性值。
JARO_WINKLER_DISTANCE(贾罗-温克勒距离) 计算两个字符串之间的编辑距离,该距离优先于从头开始匹配的字符串,以设置前缀长度。
Jaro-Winkler 相似度 计算一个相似性值,范围为 0(表示不匹配)到 1(表示完全匹配)。

注释

目前,这些函数不遵循排序规则设置定义的比较语义,例如不区分大小写和其他特定于排序规则的规则。 实现对排序规则的支持后,函数的输出将反映这些语义,并可能会相应地更改。

例子

以下示例演示模糊字符串匹配函数。

示例表

在运行示例查询之前,请创建并填充示例表。

若要创建并填充示例表,请连接到非生产用户数据库,并运行以下脚本:

-- Step 1: Create the table
CREATE TABLE WordPairs (
    WordID INT IDENTITY(1,1) PRIMARY KEY, -- Auto-incrementing ID
    WordUK NVARCHAR(50), -- UK English word
    WordUS NVARCHAR(50)  -- US English word
);

-- Step 2: Insert the data
INSERT INTO WordPairs (WordUK, WordUS) VALUES
('Colour', 'Color'),
('Flavour', 'Flavor'),
('Centre', 'Center'),
('Theatre', 'Theater'),
('Organise', 'Organize'),
('Analyse', 'Analyze'),
('Catalogue', 'Catalog'),
('Programme', 'Program'),
('Metre', 'Meter'),
('Honour', 'Honor'),
('Neighbour', 'Neighbor'),
('Travelling', 'Traveling'),
('Grey', 'Gray'),
('Defence', 'Defense'),
('Practise', 'Practice'), -- Verb form in UK
('Practice', 'Practice'), -- Noun form in both
('Aluminium', 'Aluminum'),
('Cheque', 'Check'); -- Bank cheque vs. check

示例 EDIT_DISTANCE

SELECT WordUK, WordUS, EDIT_DISTANCE(WordUK, WordUS) AS Distance
FROM WordPairs
WHERE EDIT_DISTANCE(WordUK, WordUS) <= 2
ORDER BY Distance ASC;

返回:

WordUK                         WordUS                         Distance
------------------------------ ------------------------------ -----------
Practice                       Practice                       0
Aluminium                      Aluminum                       1
Honour                         Honor                          1
Neighbour                      Neighbor                       1
Travelling                     Traveling                      1
Grey                           Gray                           1
Defence                        Defense                        1
Practise                       Practice                       1
Colour                         Color                          1
Flavour                        Flavor                         1
Organise                       Organize                       1
Analyse                        Analyze                        1
Catalogue                      Catalog                        2
Programme                      Program                        2
Metre                          Meter                          2
Centre                         Center                         2
Theatre                        Theater                        2

示例 EDIT_DISTANCE_SIMILARITY

SELECT WordUK, WordUS, EDIT_DISTANCE_SIMILARITY(WordUK, WordUS) AS Similarity
FROM WordPairs
WHERE EDIT_DISTANCE_SIMILARITY(WordUK, WordUS) >=75
ORDER BY Similarity DESC;

返回:

WordUK                         WordUS                         Similarity
------------------------------ ------------------------------ -----------
Practice                       Practice                       100
Travelling                     Traveling                      90
Aluminium                      Aluminum                       89
Neighbour                      Neighbor                       89
Organise                       Organize                       88
Practise                       Practice                       88
Defence                        Defense                        86
Analyse                        Analyze                        86
Flavour                        Flavor                         86
Colour                         Color                          83
Honour                         Honor                          83
Catalogue                      Catalog                        78
Programme                      Program                        78
Grey                           Gray                           75

示例 JARO_WINKLER_DISTANCE

SELECT WordUK, WordUS, JARO_WINKLER_DISTANCE(WordUK, WordUS) AS Distance
FROM WordPairs
WHERE JARO_WINKLER_DISTANCE(WordUK, WordUS) <= .05
ORDER BY Distance ASC;

返回:

WordUK                         WordUS                         Distance
------------------------------ ------------------------------ -----------
Practice                       Practice                       0
Travelling                     Traveling                      0.01999998
Neighbour                      Neighbor                       0.02222222
Aluminium                      Aluminum                       0.02222222
Theatre                        Theater                        0.02857143
Flavour                        Flavor                         0.02857143
Centre                         Center                         0.03333336
Colour                         Color                          0.03333336
Honour                         Honor                          0.03333336
Catalogue                      Catalog                        0.04444444
Programme                      Program                        0.04444444
Metre                          Meter                          0.04666668

示例 JARO_WINKLER_SIMILARITY

SELECT WordUK, WordUS, JARO_WINKLER_SIMILARITY(WordUK, WordUS) AS Similarity
FROM WordPairs
WHERE JARO_WINKLER_SIMILARITY(WordUK, WordUS) > 0.9
ORDER BY  Similarity DESC;

返回:

WordUK                         WordUS                         Similarity
------------------------------ ------------------------------ -----------
Practice                       Practice                       1
Travelling                     Traveling                      0.98
Neighbour                      Neighbor                       0.9777778
Aluminium                      Aluminum                       0.9777778
Flavour                        Flavor                         0.9714286
Theatre                        Theater                        0.9714286
Centre                         Center                         0.9666666
Colour                         Color                          0.9666666
Honour                         Honor                          0.9666666
Catalogue                      Catalog                        0.9555556
Programme                      Program                        0.9555556
Metre                          Meter                          0.9533333
Organise                       Organize                       0.95
Practise                       Practice                       0.95
Defence                        Defense                        0.9428572
Analyse                        Analyze                        0.9428572

包含所有函数的示例查询

以下查询演示了当前可用的所有正则表达式函数。

SELECT	T.source_string,
		T.target_string,
		EDIT_DISTANCE(T.source_string, T.target_string) as ED_Distance,
		JARO_WINKLER_DISTANCE(T.source_string, T.target_string) as JW_Distance,

		EDIT_DISTANCE_SIMILARITY(T.source_string, T.target_string) as ED_Similarity,
		CAST(JARO_WINKLER_SIMILARITY(T.source_string, T.target_string)*100 as int) as JW_Similarity
FROM (VALUES('Black', 'Red'),
			('Colour', 'Yellow'),
			('Colour', 'Color'),
			('Microsoft', 'Msft'),
			('Regex', 'Regex')) as T(source_string, target_string);

返回:

source_string  target_string  ED_Distance    JW_Distance    ED_Similarity  JW_Similarity
-------------- -------------- -------------- -------------- -------------- -------------- 
Black	        Red	            5	           1	            0	        0
Colour	        Yellow	        5	           0.4444444	    17	        55
Colour	        Color	        1	           0.03333336	    83	        96
Microsoft	    Msft	        5	           0.4916667	    44	        50
Regex	        Regex	        0	           0	            100	        100

清除

使用示例数据后,请删除示例表。

IF OBJECT_ID('dbo.WordPairs', 'U') IS NOT NULL
BEGIN
    DROP TABLE dbo.WordPairs;
END