I use Google Cloud Vision to annotate documents with DOCUMENT_TEXT_DETECTION, and I only use the word-level data.
The annotate command returns a lot of information for each letter/symbol (languages, vertices, breaks, text, confidence, ...), which adds up to a lot of memory. Running annotate on a 4-page document¹ returns over 100MB of data, which is past my PHP memory limit and causes the script to crash. The word-level data alone would probably be about 5 times smaller.
To be clear, I load the VisionClient, set up the image, run the annotate() command, and it returns a 100MB variable directly, crashing at this point, before I get the chance to do any cleaning.
use Google\Cloud\Vision\VisionClient;

$vision = new VisionClient([/* key & id here */]);
$image = $vision->image(file_get_contents($imagepath), ['DOCUMENT_TEXT_DETECTION']);
$annotation = $vision->annotate($image); // Crashes here while trying to allocate too much memory.
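For reference, this is roughly the cleanup I would run if the call got that far. It is only a sketch: it assumes the result is available as a plain array following the REST API's fullTextAnnotation layout (pages > blocks > paragraphs > words > symbols), and extractWords() is my own helper name, not something the library provides.

```php
<?php
/**
 * Reduce a fullTextAnnotation-shaped array to a flat list of words,
 * dropping all per-symbol details except the characters themselves.
 * (Hypothetical helper; the array layout is an assumption.)
 */
function extractWords(array $fullText): array
{
    $words = [];
    foreach ($fullText['pages'] ?? [] as $page) {
        foreach ($page['blocks'] ?? [] as $block) {
            foreach ($block['paragraphs'] ?? [] as $paragraph) {
                foreach ($paragraph['words'] ?? [] as $word) {
                    // Join the symbols into the word's text.
                    $text = '';
                    foreach ($word['symbols'] ?? [] as $symbol) {
                        $text .= $symbol['text'] ?? '';
                    }
                    $words[] = [
                        'text' => $text,
                        'boundingBox' => $word['boundingBox'] ?? null,
                    ];
                }
            }
        }
    }
    return $words;
}

// Tiny hand-written sample in the assumed layout.
$sample = [
    'pages' => [[
        'blocks' => [[
            'paragraphs' => [[
                'words' => [[
                    'boundingBox' => ['vertices' => [['x' => 0, 'y' => 0]]],
                    'symbols' => [['text' => 'H'], ['text' => 'i']],
                ]],
            ]],
        ]],
    ]],
];

$words = extractWords($sample);
```

The problem is that this can only run after annotate() has already built the full result in memory, which is exactly the step that fails.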
Is there a way to not request the entirety of the data? The documentation on annotate seems to indicate that it's possible to annotate only part of the picture, but not to toss the symbols data.
At a more fundamental level, am I doing something wrong here regarding memory management in general?
Thanks
Edit: I just realized that I also need to store the data in a file, which I do using serialize(). That doubles the memory usage while it runs, even if I do $annotation = serialize($annotation) to avoid keeping two variables. So I'd actually need 200MB per user.
¹ Though this is related to the amount of text rather than the number of pages.
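To show what I mean about serialize(), here is a small sketch with a stand-in array instead of the real annotation. serializeInPlace() is a hypothetical helper of mine, and the exact numbers will depend on the PHP version and the data.

```php
<?php
// Hypothetical helper: serializes $data in place and reports how far the
// process-wide peak memory rose while doing so.
function serializeInPlace(&$data): int
{
    $peakBefore = memory_get_peak_usage();
    // The array is only released after the serialized string exists,
    // so both are held in memory at the same time for a moment.
    $data = serialize($data);
    return memory_get_peak_usage() - $peakBefore;
}

// Stand-in for the annotation: many small per-symbol records.
$annotation = [];
for ($i = 0; $i < 20000; $i++) {
    $annotation[] = [
        'text' => 'x',
        'confidence' => 0.99,
        'vertices' => [[0, 0], [1, 0], [1, 1], [0, 1]],
    ];
}

$peakIncrease = serializeInPlace($annotation);
printf(
    "serialized size: %d bytes, peak rose by: %d bytes\n",
    strlen($annotation),
    $peakIncrease
);
```

Reusing the same variable name does not help, because the right-hand side is fully evaluated before the old value is freed.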