How do you count the occurrences of a Unicode character in a string with PHP?

Maybe this is a simple question, but I am a beginner in PHP. I want to count how many times the Unicode character U+06CC occurs in a string.

The character 'yeh' in Farsi corresponds to two code points:
ی = U+06CC
ي = U+064A
U+064A is used as a substitute in Farsi.
The popular Arabic charset CP-1256 has no character mapped to U+06CC.
I want to count how many times U+06CC occurs in a string in order to detect whether the string is Arabic or Farsi.
When I use
$count = substr_count($str, "ى");
or
$count = substr_count($str, "\xDB\x8c");
it counts both "ی" and "ي".
Any ideas?

2 answers

duangonglian6028 2014-01-18 13:51
I suppose you have a UTF-8 string, since UTF-8 is the most reasonable Unicode encoding.

$count = substr_count($str, "\xDB\x8C");

is what you want. You simply treat the string as a sequence of bytes. In UTF-8 the first byte of a multibyte character and its continuation bytes can never be mixed up (the first byte is always 11...... in binary, while continuation bytes are always 10......). This ensures you cannot find something different from what you are looking for.
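As a minimal sketch of this byte-level approach (the sample string and variable names are made up for illustration, and the input is assumed to be valid UTF-8):

```php
<?php
// Hypothetical sample: two Farsi yeh (U+06CC) and one Arabic yeh (U+064A),
// written as byte escapes so the source file encoding does not matter.
$farsiYeh  = "\xDB\x8C"; // UTF-8 bytes of U+06CC
$arabicYeh = "\xD9\x8A"; // UTF-8 bytes of U+064A
$str = $farsiYeh . 'a' . $arabicYeh . $farsiYeh;

// substr_count compares raw bytes, so only the exact sequence DB 8C matches.
$farsiCount  = substr_count($str, $farsiYeh);
$arabicCount = substr_count($str, $arabicYeh);

echo $farsiCount, "\n";  // 2
echo $arabicCount, "\n"; // 1
```

If a count like this comes out higher than expected, the string probably contains different code points than the ones it appears to contain on screen.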

To find the UTF-8 encoding of U+06CC I used the fileformat.info website, which I think is the best for this purpose.
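The byte sequence can also be derived in PHP itself. The sketch below uses a hypothetical helper, utf8Bytes, that encodes a code point by hand so that it does not depend on any extension. It does no range validation. On PHP 7.0 or later the escape "\u{06CC}" should produce the same bytes.

```php
<?php
// Hypothetical helper: encode one Unicode code point as a UTF-8 byte string.
// Sketch only: it does not reject surrogates or values above U+10FFFF.
function utf8Bytes(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp); // 1 byte: 0xxxxxxx
    }
    if ($cp < 0x800) {
        // 2 bytes: 110xxxxx 10xxxxxx
        return chr(0xC0 | ($cp >> 6)) . chr(0x80 | ($cp & 0x3F));
    }
    if ($cp < 0x10000) {
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        return chr(0xE0 | ($cp >> 12))
            . chr(0x80 | (($cp >> 6) & 0x3F))
            . chr(0x80 | ($cp & 0x3F));
    }
    // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
    return chr(0xF0 | ($cp >> 18))
        . chr(0x80 | (($cp >> 12) & 0x3F))
        . chr(0x80 | (($cp >> 6) & 0x3F))
        . chr(0x80 | ($cp & 0x3F));
}

echo strtoupper(bin2hex(utf8Bytes(0x06CC))), "\n"; // DB8C
echo strtoupper(bin2hex(utf8Bytes(0x064A))), "\n"; // D98A
```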

If you use UTF-8 in your IDE too, you can simply write "ى" instead of "\xDB\x8C" (internally they are exactly the same string in PHP), but that will make the readability of what you have written dependent on the IDE (often not good if you need to share your code).

Now that you have clarified your question, my answer above is no longer appropriate. I leave it here as a reference for other passers-by.
Your problem could stem from the fact that, from what I have read, "ي" can lose its dots below when it is modified by the Unicode character U+0654 (the non-spacing mark ARABIC HAMZA ABOVE). Since my browser does not remove the dots, and adds the hamza, I don't know whether the hamza is supposed to disappear too when the dots disappear. In any case, it could be that "\xDB\x8C" has the same appearance as "\xD9\x8A\xD9\x94". I have not been able to find the reverse, i.e. a double dot below as a non-spacing combining character, which would explain why substr_count($str, "\xDB\x8c") finds the Arabic yeh too, but maybe it exists.
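One way to check a hypothesis like this is to list the code points a string really contains, instead of trusting how it is rendered. A rough sketch, assuming valid UTF-8 input; codePoints is a hypothetical helper name and the sample string is invented:

```php
<?php
// Hypothetical helper: list the code points of a UTF-8 string as "U+XXXX".
function codePoints(string $utf8): array
{
    // With the /u modifier an empty pattern splits between code points.
    $chars = preg_split('//u', $utf8, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = [];
    foreach ($chars as $ch) {
        $bytes = array_values(unpack('C*', $ch));
        $n = count($bytes);
        // The lead byte carries 7 payload bits (1-byte form) or 5/4/3 bits.
        $cp = $n === 1 ? $bytes[0] : $bytes[0] & (0xFF >> ($n + 1));
        // Each continuation byte contributes its low 6 bits.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        $out[] = sprintf('U+%04X', $cp);
    }
    return $out;
}

// Farsi yeh followed by Arabic yeh plus the combining hamza above.
$sample = "\xDB\x8C" . "\xD9\x8A\xD9\x94";
echo implode(' ', codePoints($sample)), "\n"; // U+06CC U+064A U+0654
```

Running this on the actual input shows whether the characters that look like Farsi yeh are stored as U+06CC or as some other sequence.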