Computing the Entropy Code

Problem Description
An entropy encoder is a data encoding method that achieves lossless data compression by encoding a message with “wasted” or “extra” information removed. In other words, entropy encoding removes information that was not necessary in the first place to accurately encode the message. A high degree of entropy implies a message with a great deal of wasted information; English text encoded in ASCII is an example of a message type that has very high entropy. Already compressed messages, such as JPEG graphics or ZIP archives, have very little entropy and do not benefit from further attempts at entropy encoding.

English text encoded in ASCII has a high degree of entropy because all characters are encoded using the same number of bits, eight. It is a known fact that the letters E, L, N, R, S and T occur at a considerably higher frequency than do most other letters in English text. If a way could be found to encode just these letters with four bits, then the new encoding would be smaller, would contain all the original information, and would have less entropy. ASCII uses a fixed number of bits for a reason, however: it’s easy, since one is always dealing with a fixed number of bits to represent each possible glyph or character. How would an encoding scheme that used four bits for the above letters be able to distinguish between the four-bit codes and eight-bit codes? This seemingly difficult problem is solved using what is known as a “prefix-free variable-length” encoding.

In such an encoding, any number of bits can be used to represent any glyph, and glyphs not present in the message are simply not encoded. However, in order to be able to recover the information, no bit pattern that encodes a glyph is allowed to be the prefix of any other encoding bit pattern. This allows the encoded bitstream to be read bit by bit, and whenever a set of bits is encountered that represents a glyph, that glyph can be decoded. If the prefix-free constraint were not enforced, then such a decoding would be impossible.
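The prefix-free constraint can be checked mechanically. The following is a minimal sketch, not part of the original problem; the function name `is_prefix_free` and the sample pattern sets are illustrative assumptions. It relies on the fact that, after lexicographic sorting, a pattern that is a prefix of any other pattern is immediately followed by a pattern that starts with it.

```python
def is_prefix_free(patterns):
    """Return True if no bit pattern is a prefix of another one.

    `patterns` is any iterable of strings made of '0' and '1'.
    The name and interface are illustrative assumptions.
    """
    ordered = sorted(patterns)
    # In sorted order, a prefix is always directly followed by a
    # pattern that extends it, so adjacent pairs are enough to check.
    return not any(b.startswith(a) for a, b in zip(ordered, ordered[1:]))


print(is_prefix_free(["0", "10", "110", "111"]))  # True
print(is_prefix_free(["0", "01", "10"]))          # False: "0" is a prefix of "01"
```

A set that fails this check cannot be decoded unambiguously bit by bit, because the decoder could not tell whether a shorter pattern has ended or a longer one is still in progress.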

Consider the text “AAAAABCD”. Using ASCII, encoding this would require 64 bits. If, instead, we encode “A” with the bit pattern “00”, “B” with “01”, “C” with “10”, and “D” with “11” then we can encode this text in only 16 bits; the resulting bit pattern would be “0000000000011011”. This is still a fixed-length encoding, however; we’re using two bits per glyph instead of eight. Since the glyph “A” occurs with greater frequency, could we do better by encoding it with fewer bits? In fact we can, but in order to maintain a prefix-free encoding, some of the other bit patterns will become longer than two bits. An optimal encoding is to encode “A” with “0”, “B” with “10”, “C” with “110”, and “D” with “111”. (This is clearly not the only optimal encoding, as it is obvious that the encodings for B, C and D could be interchanged freely for any given encoding without increasing the size of the final encoded message.) Using this encoding, the message encodes in only 13 bits to “0000010110111”, a compression ratio of 4.9 to 1 (that is, each bit in the final encoded message represents as much information as did 4.9 bits in the original encoding). Read through this bit pattern from left to right and you’ll see that the prefix-free encoding makes it simple to decode this into the original text even though the codes have varying bit lengths.
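The arithmetic in this example can be reproduced with a short sketch. The code table is the one given in the paragraph above; the helper names `encode` and `decode` are assumptions made for illustration only.

```python
# Code table taken from the example above.
CODE = {"A": "0", "B": "10", "C": "110", "D": "111"}


def encode(text, code):
    """Concatenate the bit pattern of every glyph in `text`."""
    return "".join(code[ch] for ch in text)


def decode(bits, code):
    """Read `bits` left to right, emitting a glyph whenever the bits
    collected so far match a pattern (valid only for prefix-free codes)."""
    by_pattern = {pattern: glyph for glyph, pattern in code.items()}
    out, current = [], ""
    for bit in bits:
        current += bit
        if current in by_pattern:
            out.append(by_pattern[current])
            current = ""
    if current:
        raise ValueError("trailing bits do not form a complete pattern")
    return "".join(out)


text = "AAAAABCD"
bits = encode(text, CODE)
print(bits, len(bits))                     # 0000010110111 13
print(decode(bits, CODE))                  # AAAAABCD
print(f"{8 * len(text) / len(bits):.1f}")  # 4.9
```

The ratio is simply the 8-bit ASCII length (64) divided by the encoded length (13).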

As a second example, consider the text “THE CAT IN THE HAT”. In this text, the letter “T” and the space character both occur with the highest frequency, so they will clearly have the shortest encoding bit patterns in an optimal encoding. The letters “C”, “I” and “N” only occur once, however, so they will have the longest codes.

There are many possible sets of prefix-free variable-length bit patterns that would yield the optimal encoding, that is, that would allow the text to be encoded in the fewest number of bits. One such optimal encoding is to encode spaces with “00”, “A” with “100”, “C” with “1110”, “E” with “1111”, “H” with “110”, “I” with “1010”, “N” with “1011” and “T” with “01”. The optimal encoding therefore requires only 51 bits compared to the 144 that would be necessary to encode the message with 8-bit ASCII encoding, a compression ratio of 2.8 to 1.
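The problem statement does not name an algorithm for finding such an encoding. One standard technique is Huffman coding, which repeatedly merges the two least frequent subtrees. The sketch below is an assumption about how one might build a table this way; the function name `huffman_code` is hypothetical, and giving a single-glyph text a one-bit code is also an assumption. The table it produces may differ from the one listed above, since ties can be broken in several ways, but the total length is the same.

```python
import heapq
from collections import Counter
from itertools import count


def huffman_code(text):
    """Build one optimal prefix-free code table {glyph: bit pattern}."""
    freq = Counter(text)
    if not freq:
        return {}
    if len(freq) == 1:
        # Assumption: a lone glyph still needs one bit per occurrence.
        return {next(iter(freq)): "0"}
    tiebreak = count()  # keeps heap entries comparable when counts tie
    heap = [(n, next(tiebreak), {glyph: ""}) for glyph, n in freq.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        n1, _, left = heapq.heappop(heap)
        n2, _, right = heapq.heappop(heap)
        # Merging two subtrees prepends one more bit to every glyph in them.
        merged = {g: "0" + p for g, p in left.items()}
        merged.update({g: "1" + p for g, p in right.items()})
        heapq.heappush(heap, (n1 + n2, next(tiebreak), merged))
    return heap[0][2]


text = "THE CAT IN THE HAT"
table = huffman_code(text)
total_bits = sum(len(table[ch]) for ch in text)
print(total_bits)  # 51
```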

Input
The input file will contain a list of text strings, one per line. The text strings will consist only of uppercase alphanumeric characters and underscores (which are used in place of spaces). The end of the input will be signalled by a line containing only the word “END” as the text string. This line should not be processed.

Output
For each text string in the input, output the length in bits of the 8-bit ASCII encoding, the length in bits of an optimal prefix-free variable-length encoding, and the compression ratio accurate to one decimal place.

Sample Input
AAAAABCD
THE_CAT_IN_THE_HAT
END

Sample Output
64 13 4.9
144 51 2.8
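Only the encoded length is required, not the code table itself, so a solution can sum the weights of the merged nodes instead of building patterns. The following is one possible sketch under that assumption; `optimal_bits` and `solve` are hypothetical names, and treating a string with a single distinct glyph as needing one bit per character is an assumption the statement does not spell out.

```python
import heapq
from collections import Counter


def optimal_bits(text):
    """Length in bits of an optimal prefix-free encoding of `text`."""
    counts = list(Counter(text).values())
    if len(counts) == 1:
        return counts[0]  # assumption: one bit per glyph
    heapq.heapify(counts)
    total = 0
    while len(counts) > 1:
        # Each merge adds one bit to every glyph below the new node,
        # so the merged weight is exactly the number of bits added.
        merged = heapq.heappop(counts) + heapq.heappop(counts)
        total += merged
        heapq.heappush(counts, merged)
    return total


def solve(lines):
    """Produce one output line per input string, stopping at END."""
    results = []
    for line in lines:
        text = line.strip()
        if text == "END":
            break
        if not text:
            continue
        fixed = 8 * len(text)
        best = optimal_bits(text)
        results.append(f"{fixed} {best} {fixed / best:.1f}")
    return results


for row in solve(["AAAAABCD", "THE_CAT_IN_THE_HAT", "END"]):
    print(row)
# 64 13 4.9
# 144 51 2.8
```

For real input, `solve` can be given any iterable of lines, such as `sys.stdin`.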
