Entropy

Description

An entropy encoder is a data encoding method that achieves lossless data compression by encoding a message with "wasted" or "extra" information removed. In other words, entropy encoding removes information that was not necessary in the first place to accurately encode the message. A high degree of entropy implies a message with a great deal of wasted information; English text encoded in ASCII is an example of a message type that has very high entropy. Already compressed messages, such as JPEG graphics or ZIP archives, have very little entropy and do not benefit from further attempts at entropy encoding.

English text encoded in ASCII has a high degree of entropy because all characters are encoded using the same number of bits, eight. It is a known fact that the letters E, L, N, R, S and T occur at a considerably higher frequency than do most other letters in English text. If a way could be found to encode just these letters with four bits, then the new encoding would be smaller, would contain all the original information, and would have less entropy. ASCII uses a fixed number of bits for a reason, however: it’s easy, since one is always dealing with a fixed number of bits to represent each possible glyph or character. How would an encoding scheme that used four bits for the above letters be able to distinguish between the four-bit codes and eight-bit codes? This seemingly difficult problem is solved using what is known as a "prefix-free variable-length" encoding.

In such an encoding, any number of bits can be used to represent any glyph, and glyphs not present in the message are simply not encoded. However, in order to be able to recover the information, no bit pattern that encodes a glyph is allowed to be the prefix of any other encoding bit pattern. This allows the encoded bitstream to be read bit by bit, and whenever a set of bits is encountered that represents a glyph, that glyph can be decoded. If the prefix-free constraint were not enforced, then such a decoding would be impossible.
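The bit-by-bit reading described above can be sketched in a few lines of Python. This is only an illustration: the function name `decode` and the small code table are assumptions chosen for the example, not part of the problem statement.

```python
def decode(bits, codes):
    """Decode a bit string using a prefix-free table (symbol -> bit pattern)."""
    # Invert the table so that a bit pattern looks up its symbol.
    by_pattern = {pattern: symbol for symbol, pattern in codes.items()}
    decoded = []
    current = ""
    for bit in bits:
        current += bit
        # No pattern is a prefix of another, so the first match is the only one.
        if current in by_pattern:
            decoded.append(by_pattern[current])
            current = ""
    if current:
        raise ValueError("bit string ends in the middle of a code")
    return "".join(decoded)


# Illustrative prefix-free table (an assumption for this sketch).
codes = {"A": "0", "B": "10", "C": "110", "D": "111"}
print(decode("0000010110111", codes))  # AAAAABCD
```

If the table were not prefix-free (say "A" as "0" and "B" as "01"), the loop would emit "A" as soon as it saw the leading "0" and could never recognise "B".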

Consider the text "AAAAABCD". Using ASCII, encoding this would require 64 bits. If, instead, we encode "A" with the bit pattern "00", "B" with "01", "C" with "10", and "D" with "11" then we can encode this text in only 16 bits; the resulting bit pattern would be "0000000000011011". This is still a fixed-length encoding, however; we’re using two bits per glyph instead of eight. Since the glyph "A" occurs with greater frequency, could we do better by encoding it with fewer bits? In fact we can, but in order to maintain a prefix-free encoding, some of the other bit patterns will become longer than two bits. An optimal encoding is to encode "A" with "0", "B" with "10", "C" with "110", and "D" with "111". (This is clearly not the only optimal encoding, as it is obvious that the encodings for B, C and D could be interchanged freely for any given encoding without increasing the size of the final encoded message.) Using this encoding, the message encodes in only 13 bits to "0000010110111", a compression ratio of 4.9 to 1 (that is, each bit in the final encoded message represents as much information as did 4.9 bits in the original encoding). Read through this bit pattern from left to right and you’ll see that the prefix-free encoding makes it simple to decode this into the original text even though the codes have varying bit lengths.
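The arithmetic in this example can be checked with a short Python sketch. The helper name `encode` and the table names `fixed` and `variable` are made up for illustration; the bit patterns are the ones given in the paragraph above.

```python
def encode(text, codes):
    """Concatenate the bit pattern of every glyph in text."""
    return "".join(codes[ch] for ch in text)


text = "AAAAABCD"
fixed = {"A": "00", "B": "01", "C": "10", "D": "11"}        # two bits per glyph
variable = {"A": "0", "B": "10", "C": "110", "D": "111"}    # prefix-free, variable length

ascii_bits = 8 * len(text)  # 64 bits in 8-bit ASCII
for name, table in (("fixed", fixed), ("variable", variable)):
    bits = encode(text, table)
    # Compression ratio relative to the 8-bit ASCII encoding.
    print(name, bits, len(bits), round(ascii_bits / len(bits), 1))
# fixed 0000000000011011 16 4.0
# variable 0000010110111 13 4.9
```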

As a second example, consider the text "THE CAT IN THE HAT". In this text, the letter "T" and the space character both occur with the highest frequency, so they will clearly have the shortest encoding bit patterns in an optimal encoding. The letters "C", "I" and "N" only occur once, however, so they will have the longest codes.

There are many possible sets of prefix-free variable-length bit patterns that would yield the optimal encoding, that is, that would allow the text to be encoded in the fewest bits. One such optimal encoding is to encode spaces with "00", "A" with "100", "C" with "1110", "E" with "1111", "H" with "110", "I" with "1010", "N" with "1011" and "T" with "01". The optimal encoding therefore requires only 51 bits compared to the 144 that would be necessary to encode the message with 8-bit ASCII encoding, a compression ratio of 2.8 to 1.
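The text does not say how to find an optimal code. One standard way is Huffman's algorithm, which repeatedly merges the two least frequent groups of glyphs. The sketch below is a minimal version under that assumption; the function name `huffman_codes` is hypothetical, and the table it produces may differ from the one listed above while giving the same total length.

```python
import heapq
from collections import Counter


def huffman_codes(text):
    """Return one optimal prefix-free table (symbol -> bit pattern) for text."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        # Assumption: a lone symbol still needs one bit per occurrence.
        return {symbol: "0" for symbol in counts}
    # Heap entries: (weight, tie_breaker, {symbol: partial pattern}).
    # The unique tie_breaker keeps the dicts from ever being compared.
    heap = [(weight, i, {symbol: ""})
            for i, (symbol, weight) in enumerate(sorted(counts.items()))]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        w1, _, left = heapq.heappop(heap)
        w2, _, right = heapq.heappop(heap)
        # Merging two groups prepends one more bit to every pattern in them.
        merged = {s: "0" + p for s, p in left.items()}
        merged.update({s: "1" + p for s, p in right.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]


text = "THE CAT IN THE HAT"
codes = huffman_codes(text)
total = sum(len(codes[ch]) for ch in text)
print(total, 8 * len(text))  # 51 144
```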
Input

The input file will contain a list of text strings, one per line. The text strings will consist only of uppercase alphanumeric characters and underscores (which are used in place of spaces). The end of the input will be signalled by a line containing only the word “END” as the text string. This line should not be processed.
Output

For each text string in the input, output the length in bits of the 8-bit ASCII encoding, the length in bits of an optimal prefix-free variable-length encoding, and the compression ratio accurate to one decimal place.
Sample Input

AAAAABCD
THE_CAT_IN_THE_HAT
END
Sample Output

64 13 4.9
144 51 2.8
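A possible way to produce this output is sketched below in Python. It is not an official solution: it assumes Huffman-style merging to obtain the optimal length (only the bit count is needed, so no code table is built), and it assumes that a string made of a single repeated character costs one bit per character. The names `optimal_bits`, `report` and `solve` are hypothetical.

```python
import heapq
from collections import Counter


def optimal_bits(text):
    """Length in bits of an optimal prefix-free encoding of text."""
    weights = list(Counter(text).values())
    if len(weights) == 1:
        return weights[0]  # assumption: one bit per character
    heapq.heapify(weights)
    total = 0
    while len(weights) > 1:
        # Each merge adds one bit to every character in the two merged groups.
        merged = heapq.heappop(weights) + heapq.heappop(weights)
        total += merged
        heapq.heappush(weights, merged)
    return total


def report(text):
    ascii_bits = 8 * len(text)
    best = optimal_bits(text)
    return f"{ascii_bits} {best} {ascii_bits / best:.1f}"


def solve(lines):
    out = []
    for line in lines:
        text = line.strip()
        if text == "END":
            break  # the END line itself is not processed
        if text:
            out.append(report(text))
    return out


print("\n".join(solve(["AAAAABCD", "THE_CAT_IN_THE_HAT", "END"])))
# 64 13 4.9
# 144 51 2.8
```

To read from standard input instead of a list, `solve(sys.stdin)` should work after `import sys`.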


1 answer

devmiao 2017-10-28 01:09
http://blog.csdn.net/weyuli/article/details/9361165

This answer was chosen by the asker as the best answer.
