Unstructured data as a knowledge source

Copilot Studio allows you to enhance your agents with domain-specific knowledge powered by the same trusted, familiar data sources you've been building through Power Platform connectors.

By uploading external content from your device, OneDrive, or SharePoint, you can enrich your agents with contextual knowledge tailored to your business. These files are securely stored in Microsoft Dataverse and automatically processed into semantic indexes and vector embeddings. This configuration enables your agents to generate more accurate, grounded responses based on the information you provide.

Files uploaded in Copilot Studio use Microsoft Dataverse to ingest raw files to create indexes and vector embeddings which help provide quality responses for your agents. These files can be uploaded from your computer, or by connecting to OneDrive or SharePoint.

Uploading files as knowledge sources helps makers enrich their agents with extra data, augmenting the language model's knowledge and grounding the agent in specific information provided by the maker. Makers can upload various files, which are semantically indexed as vector embeddings and then used as knowledge for agents. Agents can then share this knowledge with both authenticated and unauthenticated users.

Graphic depicting the interactions between the makers of agents and the users of agents, and how knowledge sources retrieve information to be provided to the user.

To improve an agent's responses, uploaded files are chunked into pieces for faster processing and vector-indexed to provide semantic matches with the user's query. The files are stored securely in Dataverse. When a user queries through an agent, Copilot Studio finds the most relevant chunks that match the intent of the user query and returns the results to the user.
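The chunk-and-match flow described above can be illustrated with a minimal sketch. It is not how Copilot Studio or Dataverse implement indexing: the function names are hypothetical, chunks are fixed-size word windows, and a bag-of-words count vector stands in for a real embedding model.

```python
import math
import re
from collections import Counter


def chunk_text(text: str, max_words: int = 40) -> list[str]:
    """Split text into consecutive chunks of at most max_words words."""
    words = text.split()
    return [" ".join(words[i:i + max_words]) for i in range(0, len(words), max_words)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding: a sparse vector of lowercase term counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def top_chunks(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


# Hypothetical uploaded content: three six-word sentences, so each chunk is one sentence.
document = (
    "Employees accrue vacation days every month. "
    "File expense reports within thirty days. "
    "Laptops are refreshed every three years."
)
chunks = chunk_text(document, max_words=6)
best = top_chunks("When must I file expense reports?", chunks, k=1)
print(best)
```

A production system would use learned embeddings, so that a query can match a chunk on meaning even when they share no words. The toy vectors here only match on shared terms.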

Similarly, Dataverse ingests OneDrive files, SharePoint files (using the options under file upload), and unstructured content like knowledge base articles from other enterprise systems such as Salesforce, ServiceNow, Confluence, and Zendesk to provide better semantic results for the agent.

Power Platform connectors for unstructured data

The following Power Platform connectors are configured to work with unstructured data sources:

OneDrive

OneDrive allows makers to use a file selector interface to choose the files and folders they want to include. Once selected, the items are retrieved into Dataverse and indexed for use. Folders added include all of the supported files and subfolders within that folder, up to the total file limit.

SharePoint

SharePoint Documents allow makers to use a file selector interface to choose the files and folders they wish to include. Once selected, the items are retrieved into Dataverse and indexed for use. Folders added include all of the supported files and subfolders within that folder up to the total file limit. Currently there's no support for Pages.

Salesforce

The Salesforce connector for unstructured data supports the ability to retrieve Knowledge Bases containing knowledge articles. Makers select a Knowledge Base and all articles within that Knowledge Base are indexed for use. Individual articles or topics can't be selected. When querying data there's no ability to specify a specific article or knowledge base. The Knowledge list shows a single object for all knowledge objects you select when you create the source.

ServiceNow

The ServiceNow connector for unstructured data supports the ability to retrieve Knowledge Bases containing knowledge articles. Makers select a Knowledge Base and all articles within that Knowledge Base are indexed for use. Individual articles can't be selected. When querying data there's no ability to specify a knowledge base, folder, or individual article. The Knowledge list shows a single object for all knowledge objects you select when you create the source.

Confluence

The Confluence connector for unstructured data supports the ability to retrieve spaces containing pages. Subfolders are also supported. Individual pages can't be selected. When querying data there's no ability to specify a page. The Knowledge list shows a single object for all pages within the space.

Zendesk

The Zendesk connector for unstructured data supports the ability to retrieve the knowledge base containing knowledge articles. Individual articles, categories, or sections can't be selected. When querying data there's no ability to specify an article, category, or section. The Knowledge list shows a single object for all articles within the knowledge base.

Security

When a user queries an agent that uses a Power Platform connector source, several authorization checks are performed.

Connector access

When a maker first uses a connector-based source, they're asked to either select an existing Power Platform connector or to add one. This process ensures that data is only shared with makers who have the appropriate permissions, and provides access to the data source itself.

Content access

When a query is made, the user's connection information is used to check the data source to make sure they have permission to see the content. Even though the chunks and indexes are stored locally in Dataverse, a live check is done on each query to make sure that the current user has access to the data before a summary or response is provided.

Note

If a user doesn't have permissions for a specific set of files or knowledge base articles, a result isn't returned to them and they receive a standard message of "no results could be found." If users feel there should be results for that source, they need to work with their administrators to ensure they have permissions to the data they're trying to reach. Content permission information isn't stored locally. All permission checks are done live with the source to ensure they're the most up-to-date.
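The live permission check can be pictured as a filter applied to retrieved chunks at query time. The sketch below is an illustration under assumptions, not the product's implementation: `Chunk`, `answer_for_user`, `live_check`, and the `acl` mapping are hypothetical names, and the in-memory `acl` stands in for a call to the source system.

```python
from dataclasses import dataclass
from typing import Callable

NO_RESULTS = "no results could be found."


@dataclass(frozen=True)
class Chunk:
    source_id: str  # hypothetical identifier of the file or article in the source system
    text: str


def answer_for_user(
    user: str,
    candidates: list[Chunk],
    can_read: Callable[[str, str], bool],
) -> str:
    """Keep only chunks the user may read right now; otherwise return the standard message."""
    # The check runs against the source on every query; no permission data is cached here.
    allowed = [c for c in candidates if can_read(user, c.source_id)]
    if not allowed:
        return NO_RESULTS
    return " ".join(c.text for c in allowed)


# Stand-in for the source system's access control (hypothetical data).
acl = {"hr-policy.docx": {"alice"}, "roadmap.docx": {"alice", "bob"}}


def live_check(user: str, source_id: str) -> bool:
    """Ask the 'source' whether the user can read the item at this moment."""
    return user in acl.get(source_id, set())


chunks = [
    Chunk("hr-policy.docx", "Vacation accrues monthly."),
    Chunk("roadmap.docx", "Launch is planned for Q3."),
]
print(answer_for_user("bob", chunks, live_check))
```

Because the check is evaluated per query, revoking a user's access in the source takes effect on that user's next question, without waiting for reindexing.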

Synchronization and file refresh frequency

Connected files from OneDrive and SharePoint, as well as unstructured knowledge articles, are kept fresh using a scheduled synchronization job. This job runs automatically in the background, refreshing the contents of the files and reindexing the changes to provide accurate results for queries. Refreshes not only manage changes to content, but also ensure that any content deleted from the source no longer appears in query responses. Currently, there isn't a way to manually trigger a refresh.
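Conceptually, a refresh reconciles the index with the current state of the source: new items are added, changed items are reindexed, and deleted items are removed. The following sketch shows that reconciliation under assumptions. The function names are hypothetical, and the version markers (for example, a hash or a modified timestamp) are an assumption of this example rather than a documented mechanism.

```python
def plan_sync(source: dict[str, str], index: dict[str, str]) -> dict[str, list[str]]:
    """List the work a refresh would do, given path -> version marker mappings."""
    return {
        "add": sorted(p for p in source if p not in index),
        "reindex": sorted(p for p in source if p in index and source[p] != index[p]),
        "remove": sorted(p for p in index if p not in source),
    }


def apply_sync(source: dict[str, str], index: dict[str, str]) -> dict[str, str]:
    """Return the index state after a refresh; it mirrors the source."""
    plan = plan_sync(source, index)
    # Drop items deleted at the source, then record new and changed versions.
    refreshed = {p: v for p, v in index.items() if p not in plan["remove"]}
    for path in plan["add"] + plan["reindex"]:
        refreshed[path] = source[path]
    return refreshed


# Hypothetical state: one changed file, one new file, one file deleted at the source.
source = {"faq.docx": "v2", "pricing.pdf": "v1", "new-guide.docx": "v1"}
index = {"faq.docx": "v1", "pricing.pdf": "v1", "retired.docx": "v3"}
plan = plan_sync(source, index)
print(plan)
```

After `apply_sync`, the retired file is no longer in the index, which corresponds to deleted content no longer appearing in query responses.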

Licensing

All requests that involve knowledge are charged at the Microsoft Copilot generative answers messaging rates. For more information, go to Billing rates and management.

If knowledge sources require data to be ingested, then the storage of the data and of the corresponding indexes used to retrieve it is subject to the customer's storage entitlements. For more information on Dataverse natural language search, go to Enhance AI-powered experiences with Dataverse search.

Limits and limitations

When you first enable unstructured data support, there might be a delay of 5 to 30 minutes for Dataverse configuration and indexing before the added files are processed. The length of the delay depends on the size of the current Dataverse environment.

Each agent can have a maximum of 500 knowledge objects. These objects could be files, folders, knowledge articles, websites, or other sources.

Currently, an agent can use only five different sources at a time, for example, SharePoint, Dataverse, OneDrive, or other sources.
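The two limits above can be expressed as a simple pre-check. This is a sketch only: `can_add` is a hypothetical helper, and it interprets "five different sources" as five distinct source types, which is an assumption about the wording rather than a confirmed rule.

```python
MAX_KNOWLEDGE_OBJECTS = 500  # per agent, as stated in the limits above
MAX_SOURCE_TYPES = 5         # distinct sources per agent (interpretation assumed)


def can_add(existing: list[tuple[str, str]], new: tuple[str, str]) -> tuple[bool, str]:
    """Check whether a (source_type, object_name) pair fits within the agent's limits."""
    if len(existing) >= MAX_KNOWLEDGE_OBJECTS:
        return False, "knowledge object limit reached; delete an object first"
    source_types = {source for source, _ in existing}
    if new[0] not in source_types and len(source_types) >= MAX_SOURCE_TYPES:
        return False, "source limit reached"
    return True, "ok"


# Hypothetical agent that already uses five source types.
current = [
    ("SharePoint", "policies"),
    ("OneDrive", "notes.docx"),
    ("Dataverse", "accounts"),
    ("Salesforce", "support-kb"),
    ("Zendesk", "help-center"),
]
print(can_add(current, ("Confluence", "engineering-space")))
print(can_add(current, ("SharePoint", "handbook.pdf")))
```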

For more information about specific limits and limitations for the supported unstructured data sources, go to Copilot Studio unstructured data knowledge source limits.

FAQ

Why isn't the SharePoint icon displayed in the Upload files section of the Add knowledge dialog?

There's a slight delay between installing a solution and it being displayed in all existing organizations. To initiate a manual update, follow these steps:

  1. Sign in to the Power Platform Admin Center, using administrator credentials.
  2. Select Manage.
  3. Select Dynamics 365 Apps.
  4. Type "PowerAIExtensions" in the search bar.
  5. Select the More icon for Microsoft Dynamics 365 - PowerAIExtensions, and then select Install.
  6. From the drop-down menu, select your environment and then select Install.
  7. After the installation completes, open Power Apps in a new window.
  8. Select Solutions.
  9. Select See History.
  10. Search for "PowerAIExtensions_Anchor" and ensure it's set to 1.01.688 or higher.
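If you check the version from a script rather than by eye, compare the components numerically, because a plain string comparison would rank "1.01.1000" below "1.01.688". The helper below is a hypothetical sketch that assumes the version is a dot-separated list of integers.

```python
def parse_version(version: str) -> tuple[int, ...]:
    """Turn a dotted version string into a tuple of integers for comparison."""
    return tuple(int(part) for part in version.split("."))


def meets_minimum(installed: str, minimum: str = "1.01.688") -> bool:
    """Compare versions numerically, component by component."""
    return parse_version(installed) >= parse_version(minimum)


print(meets_minimum("1.01.1000"))
print(meets_minimum("1.01.99"))
```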

In the Add knowledge dialog, what is the difference between the two SharePoint options?

In the Add knowledge dialog, there are two SharePoint options. The SharePoint option in the file upload section is used to upload individual SharePoint files or folders, and enables file synchronization capabilities. The other SharePoint option provides the full support of SharePoint in Copilot Studio.

What happens when I add more than 500 knowledge objects to my agent?

You're prevented from adding any further objects unless you first delete previous objects.

Does each agent have its own index of the knowledge source?

Knowledge sources are stored in Dataverse for use in the environment they were created in. If the same SharePoint folder is used in multiple agents, a single instance of the folder is used for all the agents.

What happens if I select a folder that has more than the maximum number of files, folders, and subfolders when adding a SharePoint or OneDrive source?

Copilot Studio retrieves and indexes files, folders, and subfolders up to the maximum number. The remaining items aren't processed. Currently, there isn't any messaging to indicate what was or wasn't processed.

One of the files I added (or that was part of a folder I added) is displayed as part of the knowledge source, but I can't get answers from it. Why?

This issue can have several causes. Check the following:

  • Ensure that the file or folder is set to "Ready" on the Knowledge page.
  • Ensure that the file name doesn't include an unsupported character (specifically for SharePoint files).
  • Ensure that the file doesn't have a sensitivity setting of Confidential or Highly Confidential, or have password protection.
  • Ensure that it's a supported file type.
  • If the file or folder is from a different user's OneDrive or SharePoint site, verify that it's shared with the maker.
  • If the file is a knowledge base file, ensure that your account has permissions to view the content in the source system.