Why is Azure AI Document Intelligence Custom Classifier labeling a blank page with some numbers on the top as a "Prescription" with high confidence?

Estacio, Pedro Vasconcelos 25 Reputation points
2025-06-18T15:04:41.71+00:00

[Image: Outros-Prescrição (Others-Prescription)]

This image is classified by the model as a "Prescription". I reviewed all my training data and can't find a reason why this image would be classified as such. I even have a category called "Others" that contains images similar to this one: blank pages with some numbers in the top right.

Is there something I can do to prevent this classification error or this high confidence value?

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

1 answer

  1. Manas Mohanty 5,275 Reputation points Microsoft External Staff Moderator
    2025-06-18T17:10:20.1+00:00

    Hey Estacio, Pedro Vasconcelos

    It sounds like you're facing some challenges with Custom Classifier misclassifying a blank page as a "Prescription" with high confidence.

    Here are a few things you can do to troubleshoot this problem:

    1. Review Training Data: Double-check your training data to ensure that the model has not been exposed to similar patterns that could cause it to misclassify. Sometimes, including more explicit examples of "blank pages" or low-content documents might help the model learn to categorize them correctly as "Others."
    2. Adjust Classification Fields: Make sure that the classification fields are clearly defined, especially for ambiguous cases. It can be beneficial to enhance the descriptions for your categories to minimize misclassification.
    3. Set Confidence Thresholds: Consider routing documents to manual review when their confidence score falls below a certain threshold. Because the misclassification here has a high confidence score, a low threshold will not catch it; try a stricter (higher) review threshold for the "Prescription" class so that more of its predictions are checked.
    4. Use Human Review: Keeping a human in the loop for document types that are prone to high-confidence misclassification helps ensure that errors are corrected before the output is used. If you keep seeing misclassifications, you can retrain the classifier incrementally.
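
As a rough illustration of points 3 and 4, here is a minimal sketch of per-class review routing. It assumes you already have the predicted document type and confidence from the classifier result, and optionally an OCR word count for the page. The names `route`, `Decision`, `REVIEW_THRESHOLDS`, `DEFAULT_THRESHOLD`, and `MIN_WORDS`, along with all the numeric values, are hypothetical and not part of the service. The low-content check is an added assumption for pages like the one in the question, where the confidence alone would not trigger a review.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-class thresholds; tune them on a labeled validation set.
REVIEW_THRESHOLDS = {"Prescription": 0.95, "Others": 0.50}
DEFAULT_THRESHOLD = 0.80  # used for classes without an explicit entry
MIN_WORDS = 5  # assumption: fewer OCR words than this means a low-content page


@dataclass
class Decision:
    doc_type: str
    confidence: float
    needs_review: bool
    reason: str


def route(doc_type: str, confidence: float,
          word_count: Optional[int] = None) -> Decision:
    """Decide whether a classified page should go to manual review."""
    threshold = REVIEW_THRESHOLDS.get(doc_type, DEFAULT_THRESHOLD)
    if confidence < threshold:
        return Decision(doc_type, confidence, True, "below threshold")
    # A nearly blank page labeled as anything other than "Others" is suspicious,
    # even when the classifier is confident.
    if word_count is not None and word_count < MIN_WORDS and doc_type != "Others":
        return Decision(doc_type, confidence, True, "low-content page")
    return Decision(doc_type, confidence, False, "accepted")


if __name__ == "__main__":
    # A confident "Prescription" on a page with only three words is flagged.
    print(route("Prescription", 0.99, word_count=3))
```

Reviewed pages that turn out to be misclassified can then be added to the correct class in the training set before retraining.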

    Reference: https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/train/custom-classifier?view=doc-intel-4.0.0

    If you're still having issues after trying these steps, it would be helpful to gather some additional information. Here are a few follow-up questions you might want to answer:

    • What confidence score is being assigned to the classification of the blank page?
    • Can you share more details on how your training data is structured and the types of documents included?
    • Have you defined clear and specific categories for all potential classifications?

    Thank you
