Set Domain Properties

This topic describes how to set domain properties in Data Quality Services (DQS).

In This Topic

  • Before you begin:

    Prerequisites

    Security

  • Set Domain Properties

  • Follow Up: After Setting Domain Properties

  • Domain Properties

    Domain Name and Description

    Data Type

    Use Leading Values

    Normalize String

    Format Output to

    Language

    Enable Speller

    Disable Syntax Error Algorithms

Before You Begin

Prerequisites

To set properties for a domain, you must have created a knowledge base and a domain.

Security

Permissions

You must have the dqs_kb_editor or the dqs_administrator role on the DQS_MAIN database to set properties on a domain.

Set Domain Properties

  1. Set properties on an existing domain by opening a knowledge base in the Domain Management activity (see Open a Knowledge Base), and then selecting the appropriate domain in the Domain list. The Domain Properties page is displayed by default.

  2. Set properties on a new domain after creating it as described in Create a Domain.

  3. Click Finish to complete the domain management activity, as described in End the Domain Management Activity.

Follow Up: After Setting Domain Properties

After you set domain properties, you can perform other domain management tasks on the domain, perform knowledge discovery to add knowledge to the domain, or add a matching policy to the domain. For more information, see Perform Knowledge Discovery, Managing a Domain, or Create a Matching Policy.

Domain Properties

Domain Name and Description

After a domain has been created, you can change its name or description. The domain name must be unique within the knowledge base. The description can be up to 256 characters.

Data Type

When you create the domain, select one of the following data types for the values in the domain: String (the default), Date, Integer, or Decimal. After you have created the domain, you can view the data type, but you cannot change it. The data type selected for a domain defines the type of source data that can be mapped to the domain. For information about supported data types for each of the four domain data types in DQS, see Supported SQL Server and SSIS Data Types for DQS Domains.

Use Leading Values

Select this checkbox to specify that the leading value in a group of synonyms is output in place of any value that is a synonym of it. Deselect Use Leading Values to specify that each synonym value is output in its correct or corrected form, and is not replaced by the leading value for its group.
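The effect of this setting can be illustrated with a minimal sketch. This is not how DQS is implemented; the synonym group, the lookup table, and the function names below are hypothetical and only show the difference between outputting the leading value and outputting the synonym itself.

```python
# Hypothetical synonym groups: leading value -> list of its synonyms.
# These names and values are illustrative, not part of DQS.
SYNONYM_GROUPS = {
    "New York": ["NYC", "New York City"],
}


def build_lookup(groups):
    """Map every value in a synonym group to that group's leading value."""
    lookup = {}
    for leading, synonyms in groups.items():
        lookup[leading] = leading
        for synonym in synonyms:
            lookup[synonym] = leading
    return lookup


def output_value(value, lookup, use_leading_values=True):
    """Return the leading value when the setting is on, else the value itself."""
    if use_leading_values:
        return lookup.get(value, value)
    return value


lookup = build_lookup(SYNONYM_GROUPS)
with_leading = output_value("NYC", lookup, use_leading_values=True)
without_leading = output_value("NYC", lookup, use_leading_values=False)
```

With the setting on, the synonym is replaced by the leading value of its group; with it off, the synonym is output as it is.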

Normalize String

If the data type is String, select this checkbox to ignore the special characters in the source data. DQS internally replaces each special character with a null or a space when the data is loaded into the domain. A colon, hyphen, period, double quote, or semicolon is replaced by a space. A single quote is replaced by a null, which brings the two parts of the string together.

Ignoring special characters in a string value can increase matching accuracy. The similarity score between two strings can be increased by replacing special characters with a null or a space. Punctuation marks or other symbols can easily be different in different strings. Replacing special characters internally can enable the score to surpass the minimum matching threshold in DQS, causing two strings to be deemed matches when they would not have been so otherwise. However, whether you choose to ignore special characters may depend upon the type of data that you are performing matching on. For example, when you are working with data in the English System of measurement, ignoring double quotes and single quotes in product data may result in false positives if a double quote stands for an inch and a single quote stands for a foot.
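The replacement rules and their effect on a similarity score can be sketched as follows. This is an illustration under stated assumptions, not the DQS implementation: the function names are hypothetical, and Python's difflib.SequenceMatcher stands in for the DQS similarity algorithm, which this topic does not describe.

```python
from difflib import SequenceMatcher

# Characters this topic says are replaced by a space.
SPACE_CHARS = ':-.";'

# Translation table: the listed characters become a space, and a single
# quote is removed (the "null" replacement described above).
_TABLE = {ord(ch): " " for ch in SPACE_CHARS}
_TABLE[ord("'")] = None


def normalize(value: str) -> str:
    """Apply the special-character replacements to a string value."""
    return value.translate(_TABLE)


def similarity(a: str, b: str) -> float:
    """Stand-in similarity score between 0 and 1 (an assumption, not DQS)."""
    return SequenceMatcher(None, a, b).ratio()


raw_a, raw_b = "A.B.C. Corp", "A-B-C Corp"
raw_score = similarity(raw_a, raw_b)
normalized_score = similarity(normalize(raw_a), normalize(raw_b))
```

In this sketch, normalize("O'Brien-Smith") yields "OBrien Smith", and the score for the two company names is higher after normalization than before, because the differing punctuation no longer counts against the match.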

Normalization is performed when data is loaded and indexed in the data processing stages of discovery, matching policy, matching project, and cleansing project activities. If enabled, normalization and term-based relations transformation are both done in a pre-processing stage before analysis. They are executed on each domain before any algorithms are applied that compute similarity between strings. If composite domain parsing is requested, it will be performed before normalization and term-based relations transformation, because delimiter parsing requires symbols. Other operations, such as domain rules and domain value changes, will be performed after the transformations. The resultant data is not changed by the internal replacement of special characters in DQS.

Format Output to

Select the formatting that will be applied when the data values in the domain are output. The formatting is specific to the data type selected, as shown in the following list. Selecting None means none of the formats in the list will be applied.

  • For a string value, you can specify that the string be output as upper case, lower case, or capitalized.

  • For a date value, you can specify the format of the day, month, and year.

  • For an integer value, you can specify the type of format mask to be applied.

  • For a decimal value, you can specify the accuracy and the type of format mask to be applied.
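As a rough illustration of the string case only, the following sketch applies an output format to a value. The option names and the function are hypothetical, and "capitalized" is interpreted here as Python's str.capitalize (first character upper case, the rest lower case); DQS may define it differently.

```python
def format_output(value: str, fmt: str = "none") -> str:
    """Apply a hypothetical string output format to a value."""
    if fmt == "none":
        # None: no output format is applied.
        return value
    if fmt == "upper":
        return value.upper()
    if fmt == "lower":
        return value.lower()
    if fmt == "capitalize":
        # Assumption: first character upper case, remaining characters lower case.
        return value.capitalize()
    raise ValueError(f"unknown format: {fmt}")


formatted = format_output("data QUALITY", "upper")
```

The format affects only how the value is output in this sketch; the input string itself is left unchanged.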

Language

If the data type is String, select the language that you want to associate with the domain for operation of the speller. This selection applies only to the speller, because speller results depend upon the language in use. The selection applies only to a single domain with a data type of String. The language property is not relevant for composite domains. The language for each part of a composite domain is determined by the relevant single domain.

English is the default language. Setting the Language property to Other disables the Speller for the domain.

Tip

If your language is not listed in the Language drop-down list, you must select Other. This ensures that DQS cleanses and eliminates duplicates for the non-listed language data based on the available knowledge (domain rules, domain values, term-based relations, and matching rules) in the domain. For more information about cleansing and matching non-listed languages, see this blog post: Languages Supported by DQS for Cleansing and Matching.

Enable Speller

If the data type is String, select this check box to enable the DQS Speller for the domain. The Speller works only on domains with a data type of String. The Enable Speller check box enables the speller only for the single domain associated with the check box. The check box does not apply to a composite domain.

The Speller proposes syntax and validation corrections to values in the domain. For more information, see Use the DQS Speller.

Disable Syntax Error Algorithms

If the data type is String, select this checkbox to specify that DQS will not identify syntax errors in the domain during cleansing. Select it when identifying syntax errors for that domain is irrelevant. For example, identifying syntax errors may not matter for a serial number. This control is available only for the String data type. DQS does not check non-string data types for syntax errors.
