How to decompress/deflate a PDF stream

I'm working with the 2016-W4 PDF, which has 2 large streams (pages 1 and 2), along with a number of other objects and smaller streams. I'm trying to decompress the stream(s) so I can work with the source data, but I'm struggling: all I get are "corrupt input" and "invalid checksum" errors.

I've written a test script to help debug, and have pulled out smaller streams from the file to test with.

Here are 2 streams from the original pdf, along with their length objects:

stream 1:

149 0 obj
<< /Length 150 0 R /Filter /FlateDecode /Type /XObject /Subtype /Form /FormType
1 /BBox [0 0 8 8] /Resources 151 0 R >>
stream
x+TT(T0B ,JUWÈS0Ð37±402V(NFJSþ¶
«
endstream
endobj
150 0 obj
42
endobj

stream 2:

142 0 obj
<< /Length 143 0 R /Filter /FlateDecode /Type /XObject /Subtype /Form /FormType
1 /BBox [0 0 0 0] /Resources 144 0 R >>
stream
x+Tçã
endstream
endobj
143 0 obj
11
endobj

I copied just the stream contents into new files within Vim (excluding the carriage returns after stream and before endstream).

I've tried both:

compress/flate (rfc-1951) – (removing the first 2 bytes (CMF, FLG))
compress/zlib (rfc-1950)

I've converted the streams to []byte for the below:

package main

import (
    "bytes"
    "compress/flate"
    "compress/zlib"
    "fmt"
    "io"
    "os"
)

var (
    flateReaderFn = func(r io.Reader) (io.ReadCloser, error) { return flate.NewReader(r), nil }
    zlibReaderFn  = func(r io.Reader) (io.ReadCloser, error) { return zlib.NewReader(r) }
)

func deflate(b []byte, skip, length int, newReader func(io.Reader) (io.ReadCloser, error)) {
    // rfc-1950
    // --------
    //   First 2 bytes
    //   [120, 1] - CMF, FLG
    //
    //   CMF: 120
    //     0111 1000
    //     ↑    ↑
    //     |    CM(8) = deflate compression method
    //     CINFO(7)   = 32k LZ77 window size
    //
    //   FLG: 1
    //     0001 ← FCHECK
    //            (CMF*256 + FLG) % 31 == 0
    //             120 * 256 + 1 = 30721
    //                             30721 % 31 == 0

    stream := bytes.NewReader(b[skip:length])
    r, err := newReader(stream)
    if err != nil {
        fmt.Println("\nfailed to create reader,", err)
        return
    }

    n, err := io.Copy(os.Stdout, r)
    if err != nil {
        if n > 0 {
            fmt.Print("\n")
        }
        fmt.Println("\nfailed to write contents from reader,", err)
        return
    }
    fmt.Printf("%d bytes written\n", n)
    r.Close()
}

func main() {
    //readerFn, skip := flateReaderFn, 2 // compress/flate RFC-1951, ignore first 2 bytes
    readerFn, skip := zlibReaderFn, 0 // compress/zlib RFC-1950, ignore nothing

    //                                                                                                ⤹ This is where the error occurs: `flate: corrupt input before offset 19`.
    stream1 := []byte{120, 1, 43, 84, 8, 84, 40, 84, 48, 0, 66, 11, 32, 44, 74, 85, 8, 87, 195, 136, 83, 48, 195, 144, 51, 55, 194, 177, 52, 48, 50, 86, 40, 78, 70, 194, 150, 74, 83, 8, 4, 0, 195, 190, 194, 182, 10, 194, 171, 10}
    stream2 := []byte{120, 1, 43, 84, 8, 4, 0, 1, 195, 167, 0, 195, 163, 10}

    fmt.Println("----------------------------------------\nStream 1:")
    deflate(stream1, skip, 42, readerFn) // flate: corrupt input before offset 19

    fmt.Println("----------------------------------------\nStream 2:")
    deflate(stream2, skip, 11, readerFn) // invalid checksum
}

I'm sure I'm doing something wrong somewhere, I just can't quite see it.

(The pdf does open in a viewer)

dongwen5351 2017-02-21 12:23
Binary data should never be copied out of, or saved from, a text editor. It may occasionally work, but that only makes the problem harder to recognize when it doesn't.

Your data that you eventually "mined out" from the PDF is most likely not identical to the actual data that is in the PDF. You should take the data from a hex editor (e.g. try hecate for something new), or write a simple app that saves it (which strictly handles the file as binary).

Hint #1:

The binary data is displayed spread across multiple lines. Binary data does not contain carriage returns; those are a textual control. If line breaks appear, the editor interpreted the data as text, and some bytes were "consumed" to start a new line. Multiple sequences may be interpreted as the same newline (e.g. \r\n, \n). By excluding them you have already lost data; by including them you may end up with a different sequence. And if the data was interpreted and displayed as text, more problems may arise, since there are other control characters, and some characters may not appear at all when displayed.

Hint #2:

When flateReaderFn is used, decoding the 2nd example succeeds (completes without an error). This means you were "barking up the right tree", but success depends on what the actual data is and to what extent it was "distorted" by the text editor.
