I'm trying to set up a basic pipeline to take a PDF, OCR it, and return a searchable PDF with Azure prebuilt-read under the v4.0 API. I'm sure ten thousand people have done this before, right?!
I can't get Azure to process anything more than the smallest PDFs. For example, I have a 12.1 MiB PDF of business documents (US Letter size; about 100 pages; no more than 5000x5000 pixels per page) that Azure DI refuses to read. The error is: "InvalidContentLength - The input image is too large. Refer to documentation for the maximum file size." I am on the S0 tier, and my documents are far under the documented maximum size.
First, I tried submitting files with REST. Once a file got above 10 MB or so, the service complained that it was too big. The Internet told me that I had to set up an otherwise unneeded Blob Storage instance and pass a URL from there. OK, fine, so I did that and started submitting a urlSource instead. Same error.
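For anyone trying to reproduce this, here is a minimal sketch of what a urlSource submission to prebuilt-read can look like. It is not my exact code. The endpoint, key, and blob URL are placeholders, the helper name build_analyze_request is made up for illustration, and the API version string "2024-11-30" and the output=pdf query parameter are my understanding of the v4.0 REST API, so check them against the current reference. The function only builds the request; nothing is sent until you pass it to urlopen.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

# Assumption: "2024-11-30" is the v4.0 GA api-version string.
API_VERSION = "2024-11-30"


def build_analyze_request(endpoint, key, blob_url, pages=None):
    """Build (but do not send) a prebuilt-read analyze request using urlSource.

    endpoint, key, and blob_url are placeholders supplied by the caller.
    pages is an optional page selector such as "1" or "1-3".
    """
    # output=pdf asks the service to also produce a searchable PDF.
    query = {"api-version": API_VERSION, "output": "pdf"}
    if pages:
        query["pages"] = pages

    url = (
        endpoint.rstrip("/")
        + "/documentintelligence/documentModels/prebuilt-read:analyze?"
        + urlencode(query)
    )
    # The JSON body carries only the URL of the document (e.g. a blob SAS URL).
    body = json.dumps({"urlSource": blob_url}).encode("utf-8")
    headers = {
        "Content-Type": "application/json",
        "Ocp-Apim-Subscription-Key": key,
    }
    return Request(url, data=body, headers=headers, method="POST")
```

As far as I understand the docs, a successful submission returns 202 with an Operation-Location header to poll, and the searchable PDF is then fetched from the matching analyzeResults/{resultId}/pdf URL. Passing pages="1" is the REST equivalent of asking for a single page.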
Even in the online Document Intelligence Studio (OCR/Read), my document uploads fine (or my Blob Storage URL is accepted), and it even shows a correct preview window where I can scroll through the document. But I get the exact same !!@#$!@ error, "InvalidContentLength", when I click "Run analysis", even if I ask it to analyze only one page.
I am on the paid version, which is supposed to support 500 MB PDFs.
I am so, so incredibly frustrated. This error message is not helpful! The documentation says it should work fine!
I would think that there would be an easy template or complete how-to document for something so simple.