不支持的字符的Groovy字符编码结果不匹配

This is so far my analysis and obstacle, suppose for the character below, which is basically supported by “UTF-8” character set and has not supported to “EUC-JP”. “―” For php, there is a method “var_dump(input_string)” to convert any string into byte array with encoding “EUC-JP”, in this case, it returns,

[161, 189, 10] //Note: [3]=>int(10) for Line Feed.

similarly, when I produce the byte array with encoding “UTF-8”, in this case it returns,

[226, 128, 141,10] //Note: [4]=>int(10) for Line Feed.

But, when I tried the same thing in Groovy, It behaves totally differently, For EUC-JP the byte arrangement as follows,

[-95, -67, 10] //Note: [3]=>int(10) for Line Feed.

For UTF-8,

[-30, -128, -107, 10] //Note: [4]=>int(10) for Line Feed.

N.B., I fetched the data directly from 2 different text file encoded with EUC-JP and UTF-8 respectively. The last byte of all the above arrays are for LF (Line Feed). As the byte arrangement is different for same character encoding for those two language, it’s not possible to match produced hash in between.

Here is code sample so far, Start with php,

<?php
$myfile = fopen("euc_jp.txt", "r") or die("Unable to open file!");
$str1 = fread($myfile,filesize("euc_jp.txt"));

echo "Read From File EUC-JP:<br/>";
echo $str1;
$byte_array1 = unpack('C*', $str1);
echo "<br/>Byte Dump of EUC-JP File Content:<br/>";
var_dump($byte_array1);

echo "<br/><br/>";
$myfile2 = fopen("utf_8.txt", "r") or die("Unable to open file!");
$str2 = fread($myfile2,filesize("utf_8.txt"));

echo "<br/><br/>";
echo "Read From File UTF-8:<br/>";
echo $str2;
$byte_array2 = unpack('C*', $str2);
echo "<br/>Byte Dump of EUC-JP File Content:<br/>";
var_dump($byte_array2);


$encodedToEucJp = mb_convert_encoding($str2, "EUC_JP");
echo "<br/><br/>After conversion (UTF-8) to (EUC-JP): <br/>";
echo $encodedToEucJp;

echo "<br/><br/>";

echo "Hash Generation Directly From EUC-JP:<br/>";
print_r(md5($str1));

echo "<br/><br/>";
echo "Hash Generation From UTF-8 File Content After Encoded to EUC-JP:<br/>";
print_r(md5($encodedToEucJp));

fclose($myfile);
fclose($myfile2);
?>

For Groovy,

println(new File('/var/www/html/euc_jp.txt').getText('EUC-JP').getBytes("EUC-JP"))
println(new File('/var/www/html/utf_8.txt').getText('UTF-8').getBytes("UTF-8"))

This is so far my obstacle, first of all the byte representation for those two language is different, if it’s not a limitation of Groovy as well as Java8, how do I produce the same byte arrangement that produced by php, secondly, what is the equivalent code for the native php function, b_convert_encoding(). So, that I able to convert any string encoding, where there might have some character that doesn’t support by both the encoding mechanism.

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

1条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
douzhangkui2467 2018-03-29 06:05
关注
Java and Groovy bytes are two's complement signed values, i.e. the value of one byte is between -128 and +127.

To calculate the corresponding unsigned value for the same one byte, add 256 to a negative value, e.g. -95 + 256 = 161

So, what you see are the same bytes. It's just that PHP prints the values as unsigned, and Groovy prints the values as signed. They are still the same 8 bits in a byte.

Unsigned Signed Hex 161, 189, 10 == -95, -67, 10 == A1, BD, 0A 226, 128, 141, 10 == -30, -128, -115, 10 == E2, 80, 8D, 0A 226, 128, 149, 10 == -30, -128, -107, 10 == E2, 80, 95, 0A

E2 80 8D is UTF-8 for Unicode Character 'ZERO WIDTH JOINER' (U+200D).

E2 80 95 is UTF-8 for Unicode Character 'HORIZONTAL BAR' (U+2015).

If you want to print the values of a byte array as readable text, I suggest you print the bytes using 2-digit HEX. That way every byte is consistently 2 hex-digits long.
本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

给定相同的输入字符串，为什么这些base64编码输出不同？ gnu javascript
2015-05-07 13:26

回答 2 已采纳 echo 'Laurence Tureaud is Mr. T' Echo adds a newline after the string. Try the following to
[急]groovy-1.8.3.jar下载不下来
2014-12-19 02:37

回答 4 已采纳已发，注意查收；邮箱：jerry8059@163.com
如果将一段字符串转换成为一段可执行的代码？ java
2012-06-01 10:04

回答 1 已采纳简单，使用evaluate即可，能动态把String转成Groovy code。类似JS的eval
groovy if 判断字符串_Groovy正则表达式小结
2021-03-07 22:24

weixin_39952031的博客如果你在Java编码时使用过正则表达式，那么会发现Groovy对正则表达式的用法做了很大的简化，从而让更多的人不再惧怕正则表达式。Ted Naleid的博文中，详细介绍了Groovy中正则表达的用法。用/../定义正则表达式Groovy...
Jenkins里定时器构建任务使用groovy模板发邮件报错 jenkins
2023-03-14 17:37

回答 7 已采纳基于Monster 组和GPT的调写：根据错误信息，Groovy脚本中存在一个空对象调用了方法。具体地说，Groovy脚本中的 getUserName() 方法在一个空对象上被调用，这导致了Null
用于SoapUI中的测试步骤的Groovy脚本 php
2017-04-07 16:23

回答 1 已采纳 You could use the following in the groovy [step 2] def totalPostSteps = 8 def totalFromResponse =
Groovy和Go gvm的共存
2014-10-12 12:10

回答 1 已采纳 Issue 82 and issue 103 seem to show there is no immediate plan to resolve that collision. You fin
java 字符串包_包java字符串
2021-02-12 20:54

weixin_39614831的博客 Java核心技术卷I基础知识3.6.3　不可变字符串3.6.3　不可变字符串String类没有提供用于修改字符串的方法。如果希望将greeting的内容修改为“Help!”，不能直接地将greeting的最后两个位置的字符修改为‘p’和‘!’。...
有关jasperreports报错的问题java.lang.NoClassDefFoundError: net/sf/jasperreports/engine/JRException eclipse java
2019-02-19 16:38

回答 3 已采纳首先先感谢前面几位大佬提供的思路，没有他们，我还得耽误很久才能找到问题所在虽然没能直接帮我解决问题，但是还是提供了正确的方向，感谢矛十七，zoyation，祗是辉哥哥的帮助下面是我解决
groovy中，从数据库读取的数据是列表形式[1,2,3,4],应该怎么写入本地文件中？ java
2019-01-25 13:52

回答 2 已采纳数据写入的格式不对，转换下就可以了
Android编译gradle提示A problem occurred evaluating project ':example'. android android-studio java
2022-06-15 18:34

回答 2 已采纳这个方法废弃了现在用implementation 或者api
JAVA启动报错未结束的字符串文字
2024-08-11 03:54

远坂宗敬的博客这一错误通常是因为代码中存在不完整的字符串声明造成的。在本篇文章中，我们将深入分析这一错误的成因，并通过代码示例和流程图帮助大家更好地理解和解决这个问题。什么是未结束的字符串文字？...
groovy 规则引擎 java_Drools, IKExpression, Aviator和Groovy字符串表达式求值比较
2021-03-22 09:53

此用户已诈尸的博客 Gradle脚本是基于一种JVM语言 -- Groovy，再加上DSL(领域特定语言)语言组成的。因为eSOC项目的一个重要功能就是规则引擎，规则引擎的主要功能就是关联分析。规则引擎的最基本的功能就是计算表达的值(表达式是规则中...
java groovy 表达式_Groovy的基础语法
2021-03-20 09:00

weixin_39799290的博客对于Java 开发人员，Groovy 提供了更简单的替代语言，且几乎不需要学习时间。语句Groovy的语句和Java类似，但是有一些特殊的地方。例如语句的分号是可选的。如果每行一个语句，就可以省略分号；如果一行上有多个语句...
Java可移动性不强_java地位无可撼动的原因
2021-03-17 23:53

May Wei的博客而Java作为现代程序员的中坚力量在这点上会不会成为下一个COBOL？有关JAVA的技术卖出多少本书已经是一个很久远的记忆了。现处中年时期的Java语言的用途已经不再出现在各种杂志的封面上了。JAVA从出生到现在已经26年...
没有解决我的问题, 去提问

悬赏问题

¥15 做个有关计算的小程序
¥15 MPI读取tif文件无法正常给各进程分配路径
¥15 如何用MATLAB实现以下三个公式（有相互嵌套）
¥30 关于#算法#的问题：运用EViews第九版本进行一系列计量经济学的时间数列数据回归分析预测问题求各位帮我解答一下
¥15 setInterval 页面闪烁，怎么解决
¥15 如何让企业微信机器人实现消息汇总整合
¥50 关于#ui#的问题：做yolov8的ui界面出现的问题
¥15 如何用Python爬取各高校教师公开的教育和工作经历
¥15 TLE9879QXA40 电机驱动
¥20 对于工程问题的非线性数学模型进行线性化

不支持的字符的Groovy字符编码结果不匹配

1条回答 默认 最新

悬赏问题

1条回答默认最新