PHP utf-8分布式Web应用程序的最佳实践和风险

I have read several things about this topic but still I have doubts I want to share with the community.

I want to add a complete utf-8 support to the application I developed, DaDaBIK; the application can be used with different DBMSs (such as MySQL, PostgreSQL, SQLite). The charset used in the databases can be ANY. I cant' set or assume the charset.

My approach would be convert, using iconv functions, everything i read from the db in utf-8 and then convert it back in the original charset when I have to write to the DB. This would allow me to assume I'm working with utf-8.

The problem, as you probably know, is that PHP doesn't support utf-8 natively and, even assuming to use mbstring, there are (according to http://www.phpwact.org/php/i18n/utf-8) several PHP functions which can create problems with utf-8 and DON't have an mbstring correspondance, for example the PREG extension, strcspn, trim, ucfirst, ucwords....

Since I'm using some external libraries such as adodb and htmLawed I can't control all the source code...in those libraries there are several cases of usage of those functions....do you have any advice about? And above all, how very popular applications like wordpress and so on are handling this (IMHO big) problem? I doubt they don't have any "trim" in the code....they just take the risk (data corruption for example) or there is something I can't see?

Thanks a lot.

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

1条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
dongmale0656 2012-09-05 11:47
关注
First of all: PHP supports UTF-8 just fine natively. Only a few of the core functions dealing with strings should not be used on multi-byte strings.

It entirely depends on the functions you are talking about and what you're using them for. PHP strings are encoding-less byte arrays. Most standard functions therefore just work on raw bytes. trim just looks for certain bytes at the start and end of the string and trims them off, which works perfectly fine with UTF-8 encoded strings, because UTF-8 is entirely ASCII compatible. The same goes for str_replace and similar functions that look for characters (bytes) inside strings and replace or remove them.

The only real issue is functions that work with an offset, like substr. The default functions work with byte offsets, whereas you really want a more intelligent character offset, which does not necessarily correspond to bytes. For those functions an mb_ equivalent typically exists.

preg_ supports UTF-8 just fine using the /u modifier.

If you have a library which uses, for instance, substr on a potential multi-byte string, use a different library because it's a bad library.

See What Every Programmer Absolutely, Positively Needs To Know About Encodings And Character Sets To Work With Text for some more in-depth discussion and demystification about PHP and character sets.

Further, it does not matter what the strings are encoded as in the database. You can set the connection encoding for the database, which will cause it to convert everything for you and always return you data in the desired client encoding. No need for iconverting everything in PHP.

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

nacos-seata 分布式项目回滚无效 java 分布式
2022-12-26 11:08

回答 1 已采纳文章：Nacos集成分布式事务Seata文档中也许有你想要的答案，请看下吧
Redis分布式锁的应用 java redis 分布式
2022-07-28 16:12

回答 2 已采纳单机情况下使用redis分布式锁有什么缺陷或优势缺陷：都单机了，还用什么分布式锁哇，缺陷就是浪费资源，大材小用。还有就是单机本地java有很多锁可以用哇，虽然redis快，但是多出来一个服务就是不一
如何在分布式应用程序上处理文件上载？ php
2013-06-10 10:45

回答 1 已采纳 You could use an already existing platform to perform syncing. Most modern OSes support syncing, a
iWeb开源三剑客之iWebIM v1.0 Beta2 UTF-8
2019-10-26 02:05

iWebIM为站点用户提供一个良好的Web模式即时聊天扩展应用。iWebIM不仅可以集成到Jooyea团队开发的产品（iWebSNS、iWebMall & iWebShop）中，还可以集成到到任何以用户交流为中心的网站系统上（论坛、博客、新闻...
PHP单例类的最佳实践[重复] php
2012-01-08 10:28

回答 3 已采纳 An example singleton classes in php: Creating the Singleton design pattern in PHP5 : Ans 1 : Creat
分布式系统小程序的时延问题分布式网络
2017-10-21 03:10

回答 2 已采纳如果这样，那就只能求大概的时间了，客户端可以获取发送命令的时间，可以获取收到时间的时间，减去服务器执行命令的生成时间，就剩下客户端发命令至开始生成的时间和服务端发送到客户端收的时间，假设完成整个步骤的
关于tomcat集群配置和分布式部署 tomcat 分布式负载均衡
2016-04-20 09:38

回答 2 已采纳可以部署在一台机器上，可以办到分流，但是因为你是在一台服务器上，所以jvm承载内存就有限。在一个服务器上，连结过多效率也不高，虽然分流，但是处理还是在一台机器上，cpu和内存都负载很重，只是减
xxl-Job分布式任务调度
2021-02-17 12:19

赵广陆的博客概述1.1 什么是任务调度1.2 cron表达式1.3 什么是分布式任务调度1.4 xxl-Job简介2.XXL-Job快速入门2.1 环境搭建2.1.1 调度中心环境要求2.1.2 源码仓库地址2.1.3 初始化“调度数据库”2.1.4 编译源码2.1.5 配置部署...
分布式工作同步和fork-join并行编程方法有什么区别
2016-03-12 03:24

回答 1 已采纳 I would be very careful taking the statement you've quoted from https://en.wikipedia.org/wiki/Go_(
为什么用分布式锁和分布式事务 java
2022-04-14 16:24

回答 2 已采纳 分布式系统中，一个业务的逻辑可能会放在几个不同的java服务中。数据库自带的锁和事务是与代码中获取的connection连接绑定的，而这个连接也只存在于一个java服务中，自然也就导致了事务只与一个j
求大神解释一下集合式和分布式 分布式
2017-06-29 06:37

回答 1 已采纳可以参考： http://blog.csdn.net/dan15188387481/article/details/49402419 **如果对您有帮助，请采纳答案好吗，谢谢！**
xrkmonitor字符云监控系统-PHP
2021-06-25 05:17

安装脚本使用中文 utf8 编码，安装过程请将您的终端设置为 utf8，以免出现乱码安装脚本同时支持 root 账号和普通账号操作，使用普通账号执行安装部署要求如下: 在线部署使用动态链接库，需要在指定目录下执行...
关于#分布式账本#的问题：如何进入领域内容榜 TV 分布式账本
2022-12-23 21:49

回答 1 已采纳要输入分布式分类帐的字段内容列表，您需要遵循您正在使用的特定分类帐系统建立的程序和协议。向分布式账本添加内容的一些常见步骤可能包括：确定要添加到分类帐的内容类型。不同的分布式账本系统可能支持不同类型
Centos7 部署 XXL-JOB 分布式任务调度平台
2022-11-28 14:51

BasicLab基础架构实验室的博客 Centos7 部署 XXL-JOB 分布式任务调度平台，需在源代码编写自定义任务，然后编译打包发送到服务器并启动执行器web端添加执行器web端添加任务。
LAMP-web平台搭建（Linux+apache+mysql+php）
2022-07-14 22:32

梦游的老李的博客 util 复制整个目录并取消版本号到httpd的/srclib下面，./configure回检测 apache默认需要依赖pcre软件，但由于apache软件版本较高，系统预安装的pcre无法使用，需认为手动安装合适版本 cd /pcre-8.x.x ./configure ...
没有解决我的问题, 去提问

悬赏问题

¥15 对于这个复杂问题的解释说明
¥50 三种调度算法报错有实例
¥15 关于#python#的问题，请各位专家解答！
¥200 询问：python实现大地主题正反算的程序设计，有偿
¥15 smptlib使用465端口发送邮件失败
¥200 总是报错，能帮助用python实现程序实现高斯正反算吗？有偿
¥15 对于squad数据集的基于bert模型的微调
¥15 为什么我运行这个网络会出现以下报错？CRNN神经网络
¥20 steam下载游戏占用内存
¥15 CST保存项目时失败

PHP utf-8分布式Web应用程序的最佳实践和风险

1条回答 默认 最新

悬赏问题

1条回答默认最新