Hey Estacio, Pedro Vasconcelos,
It sounds like the Custom Classifier is misclassifying a blank page as "Prescription" with high confidence.
Here are a few things you can do to troubleshoot this problem:
- Review Training Data: Check whether any "Prescription" training samples are blank or nearly blank (for example, empty reverse sides of scanned pages), since that would teach the model to associate empty pages with that class. Adding explicit examples of blank pages and other low-content documents to the "Others" class may help the model learn to categorize them correctly.
- Adjust Classification Fields: Make sure each category is clearly defined, especially for ambiguous cases, and that "Others" explicitly covers blank and low-content pages. Clearer category descriptions can reduce misclassification.
- Set Confidence Thresholds: Route documents whose confidence score falls below a chosen threshold to manual review. Keep in mind that a threshold alone will not catch this particular error if the blank page receives a high score, so it may need to be combined with a simple check that flags pages with little or no text.
- Use Human Review: Keep a human in the loop for documents flagged by the threshold or by a blank-page check, and periodically review a sample of high-confidence predictions, so that errors are corrected before the output is used. If misclassifications keep appearing, add the corrected examples to the training data and retrain incrementally.
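The threshold and human-review steps can be combined into a simple routing rule. The following is a minimal Python sketch, not a feature of the Custom Classifier itself: the `Prediction` structure, the `route` function, the 0.90 threshold, and the five-word near-blank cutoff are all hypothetical and would need to be adapted to whatever your classifier returns and tuned on your own data. It also assumes the OCR text of each page is available alongside the predicted label and confidence.

```python
from dataclasses import dataclass

# Hypothetical shape of one classifier result; adapt the field names to
# whatever your Custom Classifier actually returns.
@dataclass
class Prediction:
    label: str
    confidence: float  # assumed to be in [0, 1]
    text: str          # OCR text extracted from the page (assumed available)

CONFIDENCE_THRESHOLD = 0.90  # illustrative value; tune on your own validation data
MIN_WORDS = 5                # fewer words than this counts as near-blank (assumption)

def route(pred: Prediction) -> str:
    """Return the accepted label, or 'manual_review' if the result should not be trusted."""
    word_count = len(pred.text.split())
    # A near-blank page labelled as a content class goes to review no matter
    # how confident the model is, because a threshold cannot catch this case.
    if word_count < MIN_WORDS and pred.label != "Others":
        return "manual_review"
    # Low-confidence predictions also go to a human reviewer.
    if pred.confidence < CONFIDENCE_THRESHOLD:
        return "manual_review"
    return pred.label

# Example: a blank page scored 0.98 as "Prescription" is sent to review.
print(route(Prediction("Prescription", 0.98, "")))
```

The near-blank check runs before the confidence check on purpose: a blank page scored at 0.98 would pass any reasonable threshold, so only a content-based check can send it to review.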
If the issue persists after trying these steps, some additional information would help. Could you share the following:
- What confidence score is being assigned to the classification of the blank page?
- Can you share more details on how your training data is structured and the types of documents included?
- Have you defined clear and specific categories for all potential classifications?
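For the training-data question, a quick per-class count of samples and of near-blank samples can show whether blank pages ended up under "Prescription". The sketch below is hypothetical: it assumes you can list your training documents as (label, extracted text) pairs, and the `summarize` function and five-word cutoff are illustrative assumptions, not part of the classifier.

```python
from collections import Counter

MIN_WORDS = 5  # illustrative near-blank cutoff; an assumption, not a product setting

def summarize(samples):
    """samples: iterable of (label, extracted_text) pairs from the training set.

    Returns {label: (total_samples, near_blank_samples)}.
    """
    totals = Counter()
    near_blank = Counter()
    for label, text in samples:
        totals[label] += 1
        if len(text.split()) < MIN_WORDS:
            near_blank[label] += 1
    return {label: (totals[label], near_blank[label]) for label in totals}

training = [
    ("Prescription", "Rx amoxicillin 500 mg twice daily"),
    ("Prescription", ""),  # a blank page filed under Prescription
    ("Others", "invoice total due 42 EUR today"),
]
print(summarize(training))
```

If "Prescription" shows near-blank samples while "Others" shows none, that would be a likely explanation for the behavior you are seeing.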
Thank you