The workaround works and I can extract data from invoices with it, but Mistral reports 65.0000 input tokens consumed for a PDF file that Claude counts as 4,000 tokens.
Maybe the problem is in the code I used, which might not follow the official Mistral specifications.
Can you please add official support for PDF files in Mistral?
I couldn't find documentation on token usage from Mistral for PDF files submitted this way.
We can add this method in the meantime and investigate better approaches for a future version.
With AI Studio 1.7.1.0, submitting a PDF file to Mistral generates a 400 error: "Prompt contains 378782 tokens and 0 draft tokens, too large for model with 262144 maximum context length".
I found this doc, maybe it could be useful to you:
Yes, context size limitations prevent submitting very large PDF files.
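To illustrate that limit, here is a minimal sketch of a pre-check that could be run before submitting a document. The 262144 limit and the 378782 prompt size are taken from the 400 error quoted earlier in this thread; the function names are hypothetical, and the 4-characters-per-token ratio is only a rough assumption, not how Mistral actually counts tokens for PDF input.

```python
import math

# Context limit copied from the 400 error message quoted in this thread.
MAX_CONTEXT_TOKENS = 262144


def estimate_tokens(text: str, chars_per_token: float = 4.0) -> int:
    """Very rough token estimate from character count.

    ASSUMPTION: about 4 characters per token. Real tokenizers differ,
    and PDF input sent as file data may be counted quite differently.
    """
    return math.ceil(len(text) / chars_per_token)


def fits_context(prompt_tokens: int, max_context: int = MAX_CONTEXT_TOKENS) -> bool:
    """Return True when the prompt fits within the model's context length."""
    return prompt_tokens <= max_context


def tokens_over_limit(prompt_tokens: int, max_context: int = MAX_CONTEXT_TOKENS) -> int:
    """Return how many tokens would have to be trimmed (0 if the prompt fits)."""
    return max(0, prompt_tokens - max_context)


if __name__ == "__main__":
    reported = 378782  # prompt size reported in the 400 error
    print(fits_context(reported))
    print(tokens_over_limit(reported))
```

With a check like this, an oversized document could be rejected or split locally instead of failing with a 400 error from the service.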
This looks like an interesting but separate API from Mistral, though, so it requires a separate implementation that, at first sight, cannot be unified with the other cloud LLM providers at this point.
Have you looked at the TMS Software blog post that demonstrates offline PDF plain-text extraction?