dss524049 2018-04-27 02:24

PHP Google Cloud Vision API: annotate instantly floods memory

I use Cloud Vision to annotate documents with DOCUMENT_TEXT_DETECTION, and I only need the word-level data.

The annotate command returns a lot of information for each letter/symbol (languages, vertices, breaks, text, confidence, ...), which adds up to a lot of memory. Running annotate on a 4-page document¹ returns over 100 MB of data, which exceeds my PHP memory limit and crashes the script. The word-level data alone would probably be about 5 times smaller.

To be clear, I load the VisionClient, set up the image, run the annotate() command, and it returns a 100MB variable directly, crashing at this point, before I get the chance to do any cleaning.

$vision = new VisionClient([/* key & id here */]);
$image = $vision->image(file_get_contents($imagepath), ['DOCUMENT_TEXT_DETECTION']);
$annotation = $vision->annotate($image); // Crash at that point trying to allocate too much memory.

Is there a way to not request the entirety of the data? The documentation on annotate seems to indicate that it's possible to annotate only part of the picture, but not to toss the symbols data.

At a more fundamental level, am I doing something wrong here regarding memory management in general?

Thanks

Edit: I just realized that I also need to store the data in a file, which I do using serialize(). That doubles the memory usage while it runs, even if I write $annotation = serialize($annotation) to avoid keeping two variables. So I'd actually need 200 MB per user.

¹ Though this is related to the amount of text rather than the amount of pages.

1 answer

  • doulao2029 2018-05-02 19:19

    Dino,

    When dealing with large images, I would highly recommend uploading your image to Cloud Storage and then running the annotation request against the image in a bucket. This way you'll be able to take advantage of the resumable or streaming protocols available in the Storage library to upload your object with more reliability and with less memory consumption. Here's a quick snippet of what this could look like using the resumable uploader:

    use Google\Cloud\Core\Exception\GoogleException;
    use Google\Cloud\Storage\StorageClient;
    use Google\Cloud\Vision\VisionClient;
    
    $storage = new StorageClient();
    $bucket = $storage->bucket('my-bucket');
    $imageName = 'my-image.png';
    
    $uploader = $bucket->getResumableUploader(
        fopen('/path/to/local/image.png', 'r'),
        [
            'name' => $imageName,
            'chunkSize' => 262144 // This will read data in smaller chunks, freeing up memory
        ]
    );
    
    try {
        $uploader->upload();
    } catch (GoogleException $ex) {
        $resumeUri = $uploader->getResumeUri();
        $uploader->resume($resumeUri);
    }
    
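    $vision = new VisionClient();
    // Reference the uploaded object instead of passing the file contents inline.
    $image = $vision->image($bucket->object($imageName), [
        'DOCUMENT_TEXT_DETECTION' // the feature used in the question
    ]);
    
    $annotation = $vision->annotate($image);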
    

    https://googlecloudplatform.github.io/google-cloud-php/#/docs/google-cloud/v0.63.0/storage/bucket?method=getResumableUploader
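
The snippet above lowers memory use on the upload side. For the storage step mentioned in the question's edit, where serialize() doubles the footprint, one possible approach is to keep only the word text and bounding boxes and write them to disk one record at a time. The following is a minimal sketch, not the library's own method. It assumes the annotation is available as a plain PHP array shaped like the REST fullTextAnnotation (pages > blocks > paragraphs > words > symbols). The function names iterateWords and writeWordsAsJsonLines and the JSON Lines output format are hypothetical choices made for illustration. Note that it does not reduce the peak reached inside annotate() itself.

```php
<?php
// Sketch only: assumes $fullText mirrors the REST fullTextAnnotation structure.
// iterateWords() and writeWordsAsJsonLines() are illustrative names, not library APIs.

/**
 * Yield one compact record per word, dropping per-symbol metadata
 * (languages, breaks, confidence, symbol vertices).
 */
function iterateWords(array $fullText): Generator
{
    foreach ($fullText['pages'] ?? [] as $page) {
        foreach ($page['blocks'] ?? [] as $block) {
            foreach ($block['paragraphs'] ?? [] as $paragraph) {
                foreach ($paragraph['words'] ?? [] as $word) {
                    // Rebuild the word text from its symbols.
                    $text = '';
                    foreach ($word['symbols'] ?? [] as $symbol) {
                        $text .= $symbol['text'] ?? '';
                    }
                    yield [
                        'text' => $text,
                        'vertices' => $word['boundingBox']['vertices'] ?? [],
                    ];
                }
            }
        }
    }
}

/**
 * Write the words as JSON Lines (one JSON object per line) and return the
 * number of words written. Only one word is encoded at a time, so no second
 * full-size copy of the data is built in memory.
 */
function writeWordsAsJsonLines(array $fullText, string $path): int
{
    $handle = fopen($path, 'w');
    if ($handle === false) {
        throw new RuntimeException("Cannot open $path for writing");
    }
    $count = 0;
    try {
        foreach (iterateWords($fullText) as $word) {
            fwrite($handle, json_encode($word) . "\n");
            $count++;
        }
    } finally {
        fclose($handle);
    }
    return $count;
}
```

With the VisionClient used in this thread, the input array would presumably come from something like $annotation->info()['fullTextAnnotation']; treat that accessor as an assumption and check it against your library version. Reading the file back line by line with fgets() and json_decode() keeps the loading side equally small.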
