InvalidContentLength on PDF without any reason

M. Kerr 0 Reputation points
2025-06-08T12:19:02.2466667+00:00

I'm trying to set up a basic pipeline to take a PDF, OCR it, and return a searchable PDF with Azure prebuilt-read under the v4.0 API. I'm sure ten thousand people have done this before, right?!

I can't get Azure to process anything more than the smallest PDFs. For example, I have a 12.1 MiB PDF of business documents (US Letter size; about 100 pages; no more than 5000x5000 pixels per page) that Azure DI refuses to read. "InvalidContentLength - The input image is too large. Refer to documentation for the maximum file size." I am on S0; my documents are far under the maximum size.

First, I tried submitting files directly over REST. Once a file got above roughly 10 MB, the service complained that it was too big. The Internet told me that I had to set up an otherwise unneeded Blob Storage instance and pass a URL from there. OK, fine, so I did that and started submitting a urlSource instead. Same error.
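
For reference, a v4.0 submission that passes a Blob Storage URL might be sketched as below using only the Python standard library. It assumes the GA v4.0 REST route (documentintelligence/documentModels/prebuilt-read:analyze with api-version=2024-11-30), a JSON body with urlSource, the optional pages and output=pdf query parameters, and that the searchable PDF is served at the Operation-Location URL with /pdf appended. Verify all of these against the current REST reference. The helper names are hypothetical.

```python
import json
import time
import urllib.error
import urllib.parse
import urllib.request

# Assumption: GA v4.0 REST API version; check the current reference.
API_VERSION = "2024-11-30"


def build_analyze_request(endpoint, key, blob_url, pages=None):
    """Build (but do not send) a prebuilt-read analyze request that uses urlSource.

    Hypothetical helper. `pages` is an optional page selection such as "1" or "1-3".
    """
    params = {"api-version": API_VERSION, "output": "pdf"}  # output=pdf asks for a searchable PDF
    if pages:
        params["pages"] = pages
    url = (
        endpoint.rstrip("/")
        + "/documentintelligence/documentModels/prebuilt-read:analyze?"
        + urllib.parse.urlencode(params)
    )
    body = json.dumps({"urlSource": blob_url}).encode("utf-8")
    headers = {"Ocp-Apim-Subscription-Key": key, "Content-Type": "application/json"}
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


def pdf_url_from_operation(operation_location):
    """Assumed mapping: .../analyzeResults/{id}?api-version=... -> .../analyzeResults/{id}/pdf?..."""
    base, sep, query = operation_location.partition("?")
    return base + "/pdf" + sep + query


def analyze_to_searchable_pdf(endpoint, key, blob_url, pages=None, poll_seconds=2.0):
    """Submit, poll until the operation finishes, then download the searchable PDF bytes."""
    try:
        with urllib.request.urlopen(build_analyze_request(endpoint, key, blob_url, pages)) as resp:
            operation_location = resp.headers["Operation-Location"]  # returned with 202 Accepted
    except urllib.error.HTTPError as err:
        # A 4xx at submission time carries the error JSON (e.g. an InvalidContentLength code) in its body.
        raise RuntimeError(err.read().decode("utf-8", "replace")) from err

    auth = {"Ocp-Apim-Subscription-Key": key}
    while True:
        with urllib.request.urlopen(urllib.request.Request(operation_location, headers=auth)) as resp:
            result = json.load(resp)
        status = result.get("status")
        if status == "succeeded":
            break
        if status == "failed":
            raise RuntimeError(json.dumps(result.get("error"), indent=2))
        time.sleep(poll_seconds)  # still "notStarted" or "running"

    pdf_request = urllib.request.Request(pdf_url_from_operation(operation_location), headers=auth)
    with urllib.request.urlopen(pdf_request) as resp:
        return resp.read()
```

Calling analyze_to_searchable_pdf(endpoint, key, sas_url, pages="1") would submit only the first page, which makes it easy to check whether a single page already triggers the error. Printing the full error body, rather than only the top-level message, shows the most specific code the service returns.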

Even in the online Document Intelligence Studio (OCR/Read), my document uploads fine (or the Studio accepts my Blob Storage URL) and even shows a correct preview window where I can scroll through the document, but I get the exact same !!@#$!@ error when I click "Run analysis." Even if I ask it to analyze only one page: "InvalidContentLength".

I am on the paid tier (S0), which is supposed to support PDFs up to 500 MB.

I am so, so incredibly frustrated. This error message is not helpful! The documentation says it should work fine!

I would expect there to be an easy template or a complete how-to document for something this simple.

Azure AI Document Intelligence
An Azure service that turns documents into usable data. Previously known as Azure Form Recognizer.

2 answers

  1. Obinna Ejidike 1,440 Reputation points
    2025-06-09T08:37:31.4966667+00:00

    Hi M. Kerr

    Thanks for using the Q&A platform.

    The InvalidContentLength error indicates that the input image is too large, according to the error reference at https://learn.microsoft.com/en-us/azure/ai-services/document-intelligence/how-to-guides/resolve-errors?view=doc-intel-4.0.0

    However, users commonly hit this error even with small files when the filename contains special characters, the PDF structure is corrupt, or the file embeds unsupported features.

    Please validate your PDF with PyPDF2 or PyMuPDF to make sure it is not corrupt, has pages, and isn't encrypted. Also verify the page image dimensions by extracting each page image with pdfplumber, PyMuPDF, or Poppler, and check that none exceeds 10,000 pixels in width or height.
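
    The checks above could be scripted roughly as follows. This is a sketch, not an official validator: it assumes PyMuPDF (imported as fitz) is installed, takes the 10,000-pixel limit from this answer, and adds a 17 x 17 inch page-size limit that the v4.0 service-limits page appears to list; confirm both against the current documentation. scan_pdf and check_page are hypothetical helper names.

```python
# Limits assumed from this answer and the v4.0 service-limits page; verify before relying on them.
MAX_IMAGE_PX = 10_000
MAX_PAGE_INCHES = 17.0
POINTS_PER_INCH = 72.0  # PDF page sizes are expressed in points


def check_page(page_number, width_pt, height_pt, image_sizes):
    """Return human-readable problems for one page (an empty list if none).

    width_pt/height_pt: page size in PDF points.
    image_sizes: list of (width_px, height_px) for images embedded on the page.
    """
    problems = []
    w_in, h_in = width_pt / POINTS_PER_INCH, height_pt / POINTS_PER_INCH
    if w_in > MAX_PAGE_INCHES or h_in > MAX_PAGE_INCHES:
        problems.append(f"page {page_number}: {w_in:.1f} x {h_in:.1f} in exceeds {MAX_PAGE_INCHES} in")
    for w_px, h_px in image_sizes:
        if max(w_px, h_px) > MAX_IMAGE_PX:
            problems.append(f"page {page_number}: embedded image {w_px} x {h_px} px exceeds {MAX_IMAGE_PX} px")
    return problems


def scan_pdf(path):
    """Open a PDF with PyMuPDF and collect structural problems."""
    import fitz  # PyMuPDF; imported here so check_page works without it installed

    problems = []
    # fitz.open raises an exception on files it cannot parse at all.
    with fitz.open(path) as doc:
        if doc.is_encrypted:
            return ["document is encrypted"]
        if doc.page_count == 0:
            return ["document has no pages"]
        for page in doc:
            # get_images() tuples start with (xref, smask, width, height, ...)
            sizes = [(img[2], img[3]) for img in page.get_images(full=True)]
            problems += check_page(page.number + 1, page.rect.width, page.rect.height, sizes)
    return problems
```

    For example, print(scan_pdf("business-docs.pdf")) lists any page that breaks these limits (the filename is a placeholder). PyMuPDF is imported inside scan_pdf so that check_page can also be fed page data gathered with pdfplumber or Poppler.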

    A similar issue is discussed on Stack Overflow: https://stackoverflow.com/questions/79272102/azure-document-intelligence-formrecognizer-invalidcontent

    If the response was helpful, please feel free to mark it as “Accepted Answer” and consider giving it an upvote. This helps others in the community as well.

    Regards,

    Obinna.


  2. Manas Mohanty 5,350 Reputation points Microsoft External Staff Moderator
    2025-06-11T20:31:44.6433333+00:00

    Hi M. Kerr

    I tested in the Central US region with the 3-page PDF you shared.

    The issue seems to be specific to your resource.


    Could you create a new Document Intelligence resource (with no underscores or special characters in its name) in another supported region and let us know whether the issue persists?
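
    Before creating the new resource, a quick local check of the proposed name might look like this. It is a hypothetical helper that encodes the advice above (no underscores or special characters) together with an assumed Azure subdomain shape: 2 to 64 letters, digits, or hyphens, starting and ending with a letter or digit. Check the Azure naming rules for the exact constraints.

```python
import re

# Assumption: letters, digits and hyphens only; must start and end with a letter or digit;
# 2-64 characters in total. This mirrors the moderator's advice, not an official rule set.
_NAME_RE = re.compile(r"[A-Za-z0-9](?:[A-Za-z0-9-]{0,62}[A-Za-z0-9])")


def is_safe_resource_name(name: str) -> bool:
    """Return True if `name` avoids underscores and other special characters."""
    return _NAME_RE.fullmatch(name) is not None
```

    With this sketch, is_safe_resource_name("doc_intel") returns False and is_safe_resource_name("docintel-eastus2") returns True.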

    Status on the customer side: resolved by deploying a new resource in another region.

    Thank you.

