weixin_39801613
2020-11-29 12:15

Database Compression

Files Added

Other Existing Files Modified

Some additional functions have been added to support database compression. Here are the following files:

Strategy for Compressing

  • Once a tile becomes full (every slot is occupied by a tuple), a CompressTile is created for it.
  • We scan each column and sort its values.
  • From the sorted column we compute the median, min, and max, which give the min offset (min - median) and the max offset (max - median).
  • If both offsets fit in a smaller data type than the column's original type, we compress the entire column.
  • The median serves as the base, so the column stores only each value's offset from it.
  • Currently SMALLINT, INTEGER, and BIGINT columns can be compressed. TINYINT is already 1 byte, so it is left uncompressed.
  • Decimal values are not yet supported.
  • The median is also stored as metadata so that the original values can be retrieved later.
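
The steps above can be sketched roughly as follows. This is a minimal Python illustration of the idea, not the code from this pull request: the names `compress_column`, `decompress_column`, `fits`, and `INT_WIDTHS` are hypothetical, the column is assumed to be non-empty, and the upper median is picked for even-length columns as an arbitrary choice.

```python
# Byte widths of the signed integer types, smallest first (assumed layout).
INT_WIDTHS = {"TINYINT": 1, "SMALLINT": 2, "INTEGER": 4, "BIGINT": 8}


def fits(value, width):
    """Return True if value fits in a signed integer of `width` bytes."""
    bits = 8 * width
    return -(1 << (bits - 1)) <= value <= (1 << (bits - 1)) - 1


def compress_column(values, original_type):
    """Return base + offsets in a smaller type, or None if nothing is gained."""
    original_width = INT_WIDTHS[original_type]
    ordered = sorted(values)
    base = ordered[len(ordered) // 2]   # median (upper median for even lengths)
    min_offset = ordered[0] - base      # min - median
    max_offset = ordered[-1] - base     # max - median
    for name, width in INT_WIDTHS.items():
        if width >= original_width:
            break                       # only strictly smaller types are useful
        if fits(min_offset, width) and fits(max_offset, width):
            offsets = [v - base for v in values]  # keep the original row order
            return {"base": base, "type": name, "offsets": offsets}
    return None


def decompress_column(compressed):
    """Rebuild the original values from the stored base (metadata) and offsets."""
    return [compressed["base"] + offset for offset in compressed["offsets"]]
```

For example, an INTEGER column holding 0, 100, ..., 900 has a median of 500 and offsets between -500 and 400, so in this sketch it would be stored as SMALLINT offsets. A TINYINT column is never compressed because no smaller type exists.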

Testing

Compression Correctness Test:

  • This test inserts 25 tuples.
  • Each tuple has the form (i, i*100), where i ranges from 0 to 24.
  • Since each tile group contains 1 tile with 10 tuples per tile, 3 tile groups are formed.
  • The first 2 tile groups are full and get compressed.
  • The third tile group still has 5 vacant slots, so it is not full and stays uncompressed.
  • Thus the table now holds both compressed and uncompressed data.
  • We then run SELECT * on the table and expect to receive the true values for both the compressed and the uncompressed data.
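
The shape of this test can be mimicked in a few lines of Python. This is only a sketch of the scenario under stated assumptions: `build_tiles`, `compress_tile`, `store`, and `scan` are hypothetical helpers, tiles are modeled as plain lists, and the real test runs against Peloton's storage layer rather than anything like this.

```python
TUPLES_PER_TILE = 10  # tile capacity used in the test description


def build_tiles(rows, capacity=TUPLES_PER_TILE):
    """Split rows into tiles of at most `capacity` tuples each."""
    return [rows[i:i + capacity] for i in range(0, len(rows), capacity)]


def compress_tile(tile):
    """Store every column as (median base, offsets from the base)."""
    compressed = []
    for column in zip(*tile):
        base = sorted(column)[len(column) // 2]
        compressed.append((base, [value - base for value in column]))
    return compressed


def store(rows, capacity=TUPLES_PER_TILE):
    """Compress full tiles only; a partially filled tile stays as-is."""
    stored = []
    for tile in build_tiles(rows, capacity):
        if len(tile) == capacity:
            stored.append(("compressed", compress_tile(tile)))
        else:
            stored.append(("raw", tile))
    return stored


def scan(stored):
    """Emulate SELECT *: decode compressed tiles, pass raw tiles through."""
    rows = []
    for kind, payload in stored:
        if kind == "compressed":
            columns = [[base + off for off in offsets] for base, offsets in payload]
            rows.extend(zip(*columns))
        else:
            rows.extend(payload)
    return rows


rows = [(i, i * 100) for i in range(25)]
stored = store(rows)
assert [kind for kind, _ in stored] == ["compressed", "compressed", "raw"]
assert scan(stored) == rows  # compressed and uncompressed data both round-trip
```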

Compression Size Test:

  • This test inserts 100 tuples.
  • Each tuple has the form [Integer, Integer, Decimal, Varchar].
  • Thus the tuple length is (4 + 4 + 8 + 8) = 24 bytes.
  • Only the integer columns are compressed at present.
  • The integers get compressed to TINYINT, making the tuple length (1 + 1 + 8 + 8) = 18 bytes.
  • The test checks for this decrease in size.
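
The size arithmetic can be checked with a short Python snippet. The byte widths are taken directly from the figures above; `TYPE_SIZES` and `tuple_length` are hypothetical names used only for this illustration.

```python
# Per-column byte widths, as used in the arithmetic above.
TYPE_SIZES = {"TINYINT": 1, "INTEGER": 4, "DECIMAL": 8, "VARCHAR": 8}


def tuple_length(schema):
    """Sum the per-column byte widths of one tuple."""
    return sum(TYPE_SIZES[column_type] for column_type in schema)


original = ["INTEGER", "INTEGER", "DECIMAL", "VARCHAR"]
compressed = ["TINYINT", "TINYINT", "DECIMAL", "VARCHAR"]

print(tuple_length(original))    # 24
print(tuple_length(compressed))  # 18
```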

This question originates from the open-source project: cmu-db/peloton

10 answers

  • weixin_39629969 2020-11-29 12:16

    Coverage Status

    Coverage decreased (-1.9%) to 65.961% when pulling e223394260afbbdf328d867bce84e73676369d48 on rohit-cmu:master into b191a335c0f02798ccfa324731546eb90277c6a0 on cmu-db:master.

  • weixin_39885803 2020-11-29 12:16

    We recommend not formatting other files that you didn't modify.

  • weixin_39629969 2020-11-29 12:16

    Coverage Status

    Coverage decreased (-1.8%) to 65.961% when pulling 35947d59e7164605eb8df45a9d9c22e887d964ba on rohit-cmu:master into e1d16feaacab85cd09eb0e46c131d99c399748b8 on cmu-db:master.

  • weixin_39629969 2020-11-29 12:16

    Coverage Status

    Coverage decreased (-67.2%) to 0.0% when pulling 848bd842ec1520dbf8ddc6f08abe2e89df650732 on rohit-cmu:master into a05b5c134802a522ff00e135350cb44838546dff on cmu-db:master.

  • weixin_39629969 2020-11-29 12:16

    Coverage Status

    Coverage decreased (-1.9%) to 65.311% when pulling f98294d7c1625727e7d0861be49da1531a9932f1 on rohit-cmu:master into 2b56468c5482034462d3f8b8d2115769416298c3 on cmu-db:master.

  • weixin_39629969 2020-11-29 12:16

    Coverage Status

    Coverage decreased (-1.5%) to 65.669% when pulling f98294d7c1625727e7d0861be49da1531a9932f1 on rohit-cmu:master into 2b56468c5482034462d3f8b8d2115769416298c3 on cmu-db:master.

  • weixin_39629969 2020-11-29 12:16

    Coverage Status

    Coverage decreased (-1.4%) to 65.746% when pulling a53be27c6fa4ff45e73072f0a636c76c867db9fc on rohit-cmu:master into 2b56468c5482034462d3f8b8d2115769416298c3 on cmu-db:master.

  • weixin_39629969 2020-11-29 12:16

    Coverage Status

    Coverage decreased (-2.9%) to 64.27% when pulling 0e184251fcfffcda3844955248768a50677b8e35 on rohit-cmu:master into 2b56468c5482034462d3f8b8d2115769416298c3 on cmu-db:master.

  • weixin_39629969 2020-11-29 12:16

    Coverage Status

    Coverage decreased (-2.8%) to 64.368% when pulling 2d1c5f3c3d448d9f33475640fa63691451e8559f on rohit-cmu:master into 2b56468c5482034462d3f8b8d2115769416298c3 on cmu-db:master.

  • weixin_39629969 2020-11-29 12:16

    Coverage Status

    Coverage decreased (-2.8%) to 64.366% when pulling fed118a399171bb366058ec9711fcb9d5900c1c5 on rohit-cmu:master into 2b56468c5482034462d3f8b8d2115769416298c3 on cmu-db:master.
