How to count the occurrences of a Unicode character in a string?

How do you count the occurrences of a Unicode character in a string with PHP?
Maybe this is a simple question, but I am a beginner in PHP. I want to count how many U+06CC characters are in a string.

The Farsi character 'yeh' corresponds to two code points:
ی = U+06CC
ي = U+064A
U+064A is used as a substitute in Farsi.
The popular Arabic character set CP-1256 has no character mapped to U+06CC.
Now I want to count how many U+06CC characters are in a string to detect whether the string is Arabic or Farsi.
When I use
$count = substr_count($str, "ى");
or when I use
$count = substr_count($str, "\xDB\x8c");
it counts both "ی" and "ي".
Any idea?
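A minimal sketch of the detection described above, assuming `$str` is valid UTF-8. The function name `classifyYeh` and the "whichever yeh occurs more often wins" rule are hypothetical choices for illustration, not an established method.

```php
<?php
// Hypothetical helper: counts Farsi yeh (U+06CC) and Arabic yeh (U+064A)
// in a UTF-8 string and makes a rough guess about the language.
// Assumption: whichever yeh occurs more often decides the guess.
function classifyYeh(string $str): array
{
    $farsiYeh  = "\xDB\x8C"; // U+06CC ARABIC LETTER FARSI YEH in UTF-8
    $arabicYeh = "\xD9\x8A"; // U+064A ARABIC LETTER YEH in UTF-8

    $farsi  = substr_count($str, $farsiYeh);
    $arabic = substr_count($str, $arabicYeh);

    if ($farsi > $arabic) {
        $guess = 'farsi';
    } elseif ($arabic > $farsi) {
        $guess = 'arabic';
    } else {
        $guess = 'unknown'; // tie, or no yeh at all
    }

    return ['farsi' => $farsi, 'arabic' => $arabic, 'guess' => $guess];
}

// Usage: two Farsi yeh and one Arabic yeh.
print_r(classifyYeh("\xDB\x8C\xDB\x8C\xD9\x8A"));
```

A count-based heuristic like this is only a rough signal; short strings with no yeh at all fall into 'unknown'.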

duangonglian6028 2014-01-18 13:51
I suppose you have a UTF-8 string, since UTF-8 is the most reasonable Unicode encoding.

$count = substr_count($str, "\xDB\x8C");

is what you want. You simply treat the string as a sequence of bytes. In UTF-8, the lead byte of a multibyte character and its continuation bytes can never be confused: in binary, a lead byte always has the form 11xxxxxx, while a continuation byte always has the form 10xxxxxx. This ensures you cannot find something different from what you are looking for.
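The lead-byte/continuation-byte argument can be checked directly. This sketch (the `byteKind` helper is hypothetical) classifies each byte of U+06CC and shows that a byte-level `substr_count` for U+06CC does not match the Arabic yeh U+064A, whose UTF-8 bytes are `D9 8A`.

```php
<?php
// Hypothetical helper: classify a byte by its high bits.
function byteKind(int $byte): string
{
    if (($byte & 0x80) === 0x00) {
        return 'ascii';        // 0xxxxxxx
    }
    if (($byte & 0xC0) === 0x80) {
        return 'continuation'; // 10xxxxxx
    }
    return 'lead';             // 11xxxxxx
}

$farsiYeh  = "\xDB\x8C"; // U+06CC
$arabicYeh = "\xD9\x8A"; // U+064A
$str = $arabicYeh . 'abc' . $farsiYeh . $farsiYeh;

// Each yeh starts with a lead byte, so a match can only begin at a character boundary.
foreach (str_split($farsiYeh) as $b) {
    printf("%02X %s\n", ord($b), byteKind(ord($b)));
}
// DB lead
// 8C continuation

echo substr_count($str, $farsiYeh), "\n";  // 2
echo substr_count($str, $arabicYeh), "\n"; // 1
```

If the mbstring extension is available, `mb_substr_count($str, $farsiYeh, 'UTF-8')` should give the same count, since the needle is a complete UTF-8 character.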

To find the UTF-8 encoding of U+06CC I used the fileformat.info website, which I think is the best for this purpose.
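As an alternative to a lookup website, the UTF-8 bytes can be computed from the code point. The hypothetical `utf8Encode` helper below follows the standard UTF-8 bit layout; it does not reject surrogates or values above U+10FFFF, so treat it as a sketch. With the mbstring extension, `mb_chr(0x06CC, 'UTF-8')` should produce the same bytes.

```php
<?php
// Hypothetical helper: encode one code point as UTF-8 bytes.
// Sketch only: no validation of surrogates or out-of-range values.
function utf8Encode(int $cp): string
{
    if ($cp < 0x80) {
        return chr($cp);                                   // 0xxxxxxx
    }
    if ($cp < 0x800) {
        return chr(0xC0 | ($cp >> 6))                      // 110xxxxx
             . chr(0x80 | ($cp & 0x3F));                   // 10xxxxxx
    }
    if ($cp < 0x10000) {
        return chr(0xE0 | ($cp >> 12))                     // 1110xxxx
             . chr(0x80 | (($cp >> 6) & 0x3F))
             . chr(0x80 | ($cp & 0x3F));
    }
    return chr(0xF0 | ($cp >> 18))                         // 11110xxx
         . chr(0x80 | (($cp >> 12) & 0x3F))
         . chr(0x80 | (($cp >> 6) & 0x3F))
         . chr(0x80 | ($cp & 0x3F));
}

echo bin2hex(utf8Encode(0x06CC)), "\n"; // db8c (Farsi yeh)
echo bin2hex(utf8Encode(0x064A)), "\n"; // d98a (Arabic yeh)
```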

If you use UTF-8 in your IDE too, you can simply write "ى" instead of "\xDB\x8C" (internally they are exactly the same string in PHP), but that will make the readability of what you have written dependent on the IDE (often not good if you need to share your code).
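Since PHP 7.0, double-quoted strings also accept the `\u{...}` code point escape, which keeps the code readable without depending on the editor's encoding. A short sketch, assuming PHP 7.0 or later:

```php
<?php
// "\u{06CC}" is converted to the UTF-8 bytes of U+06CC when the string is parsed.
$viaEscape = "\u{06CC}"; // ARABIC LETTER FARSI YEH
$viaBytes  = "\xDB\x8C";

var_dump($viaEscape === $viaBytes); // bool(true)

// Counting with the escape form: two U+06CC and one U+064A.
$str = "\u{06CC}\u{064A}\u{06CC}";
echo substr_count($str, "\u{06CC}"), "\n"; // 2
```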

Now that you have clarified your question, my answer above is no longer appropriate. I leave it there just as a reference for other passers-by.
Your problem could stem from the fact that, from what I read here, "ي" can lose its dots below when modified by the Unicode character U+0654 (the non-spacing mark "Arabic hamza above"). Since my browser does not remove the dots and adds the hamza, I don't know whether the hamza is supposed to disappear too when the dots disappear. Anyway, it COULD be that "\xDB\x8C" has the same appearance as "\xD9\x8A\xD9\x94". I have not been able to find the reverse, i.e., the double dot below as a non-spacing modifying character, which would explain why substr_count($str, "\xDB\x8c") finds the Arabic yeh too, but maybe it exists.
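Rather than judging from how glyphs render, one can list the code points actually present in the string. A sketch assuming valid UTF-8 input: `codePoints` is a hypothetical helper that splits with PCRE's `u` modifier and decodes each character by hand, so it needs neither mbstring nor intl. It also counts the two-code-point sequence U+064A U+0654 separately from U+06CC.

```php
<?php
// Hypothetical diagnostic: return the code points of a UTF-8 string as "U+XXXX".
function codePoints(string $str): array
{
    // With the /u modifier, the empty pattern splits between characters.
    $chars = preg_split('//u', $str, -1, PREG_SPLIT_NO_EMPTY);
    if ($chars === false) {
        throw new InvalidArgumentException('Input is not valid UTF-8');
    }

    $out = [];
    foreach ($chars as $ch) {
        $bytes = array_values(unpack('C*', $ch)); // byte values, 0-indexed
        $n = count($bytes);
        // Keep the payload bits of the lead byte...
        if ($n === 1) {
            $cp = $bytes[0];
        } elseif ($n === 2) {
            $cp = $bytes[0] & 0x1F;
        } elseif ($n === 3) {
            $cp = $bytes[0] & 0x0F;
        } else {
            $cp = $bytes[0] & 0x07;
        }
        // ...then append 6 bits from each continuation byte.
        for ($i = 1; $i < $n; $i++) {
            $cp = ($cp << 6) | ($bytes[$i] & 0x3F);
        }
        $out[] = sprintf('U+%04X', $cp);
    }
    return $out;
}

$str = "\xDB\x8C" . "\xD9\x8A\xD9\x94"; // U+06CC, then U+064A followed by U+0654
print_r(codePoints($str));
echo substr_count($str, "\xDB\x8C"), "\n";         // 1
echo substr_count($str, "\xD9\x8A\xD9\x94"), "\n"; // 1
```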
This answer was selected by the asker as the best answer.

