douyan4958 2013-01-25 13:04

How to extract images from a PDF in their original format

I'm using pdfimages -j bar.pdf /tmp/image to extract images from a PDF. My objective is to get them in their raw state, as they were when added: if the source was a .tif I'd like to get a .tif, and if it was a .jpg I'd like to get a .jpg. Instead, I keep getting .ppm for everything I extract.

Is it possible to get the images in their original format, or is .ppm my only option?

Update: My primary reason for wanting this is to check the DPI of every image included in the document, or to see whether they are vector graphics.
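
For the DPI check mentioned in the update, it may help to note that an image inside a PDF has no DPI of its own: the resolution follows from its pixel dimensions and the size at which it is placed on the page. The sketch below shows only that arithmetic, assuming the default PDF user space of 72 points per inch. The function name `effective_dpi` and the sample numbers are hypothetical, and obtaining the placed size from a real file is left to whatever PDF tool you use.

```python
POINTS_PER_INCH = 72.0  # default PDF user space: 1 point = 1/72 inch


def effective_dpi(pixel_width, pixel_height, placed_width_pt, placed_height_pt):
    """Return (x_dpi, y_dpi) for an image drawn on a PDF page.

    pixel_*      : dimensions of the stored image, in pixels
    placed_*_pt  : size of the image as drawn on the page, in points
    """
    x_dpi = pixel_width / (placed_width_pt / POINTS_PER_INCH)
    y_dpi = pixel_height / (placed_height_pt / POINTS_PER_INCH)
    return x_dpi, y_dpi


# Hypothetical example: a 1200 x 900 pixel image drawn at 288 x 216 pt (4 x 3 inches).
print(effective_dpi(1200, 900, 288, 216))  # (300.0, 300.0)
```

Vector artwork is drawn with page operators rather than stored as an image, so it would not appear among the extracted images at all.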

5 answers

  • donglianer5064 2013-01-25 13:56

    You can't (reliably) know the source image file format by looking at an image in a PDF. For example, TIFF images can be compressed with (off the top of my head) none, RLE, CCITT (a couple of variations), LZW, Flate, or JPEG. If an image in a PDF is compressed with DCT (JPEG), how do you decide whether the source was TIFF or JPEG? If it is compressed with Flate, how do you distinguish between TIFF and PNG? Further, it is the software generating the PDF that decides the compression, so I can take a Flate-compressed TIFF image and encode it into a PDF using JPEG2000, or take a CCITT-compressed image and compress it with JBIG2, or take a JPEG image, reduce it to an 8-bit paletted image and compress it with Flate.

    TL;DR you can't know.
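
What can be known is the filter the PDF itself uses for each image stream, and that is likely why the asker sees .ppm: with -j, pdfimages writes DCT-encoded images as .jpg and everything else as PBM/PGM/PPM. The following is a minimal sketch of that decision, not the tool's actual code. The filter names come from the PDF specification; the function name `extraction_format` and the simplified two-way split are assumptions made for illustration.

```python
# Filters whose stream bytes are already a complete image codestream
# and can usually be saved to disk unchanged.
PASSTHROUGH = {
    "DCTDecode": "jpg",   # JPEG data
    "JPXDecode": "jp2",   # JPEG 2000 data
}


def extraction_format(pdf_filter):
    """Suggest an output extension for an image stream with the given /Filter.

    Anything not in PASSTHROUGH (FlateDecode, LZWDecode, RunLengthDecode,
    no filter at all, and, as a simplification, CCITTFaxDecode and
    JBIG2Decode) is treated as "decode the samples and re-save them",
    represented here by "ppm".
    """
    return PASSTHROUGH.get(pdf_filter, "ppm")


for name in ("DCTDecode", "JPXDecode", "FlateDecode", None):
    print(name, "->", extraction_format(name))
```

Note that the result describes the encoding stored in the PDF, which, as explained above, says nothing certain about the file the author originally imported.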

    Accepted by the asker as the best answer.
