Why does the utf8.ValidString function fail to detect invalid Unicode characters?

From https://en.wikipedia.org/wiki/UTF-8#Invalid_code_points I learned that U+D800 through U+DFFF are invalid code points. In decimal, that is 55296 through 57343.

The maximum valid Unicode code point is '\U0010FFFF', which is 1114111 in decimal.
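
To make the two ranges above concrete, here is a minimal sketch that checks them at the rune level with utf8.ValidRune from the standard library. The helper name isValidCodePoint is only illustrative and not part of the original question.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// isValidCodePoint (hypothetical helper) reports whether r can be legally
// encoded as UTF-8: it must lie in [0, utf8.MaxRune] and must not be a
// surrogate half (U+D800..U+DFFF).
func isValidCodePoint(r rune) bool {
	return utf8.ValidRune(r)
}

func main() {
	// Boundary values around the surrogate range and the maximum code point.
	candidates := []rune{0xD7FF, 0xD800, 0xDFFF, 0xE000, utf8.MaxRune, utf8.MaxRune + 1}
	for _, r := range candidates {
		fmt.Printf("U+%04X valid=%v\n", r, isValidCodePoint(r))
	}
}
```

Checking the rune itself, before it is turned into a string, sidesteps any substitution that may happen during encoding.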

My code:

package main

import (
    "fmt"
    "unicode/utf8"
)

func main() {
    // Case 1: a code point inside the surrogate range U+D800..U+DFFF.
    fmt.Println("Case 1 (invalid range)")
    str := fmt.Sprintf("%c", rune(55296+1))
    if !utf8.ValidString(str) {
        fmt.Println(str, "is not valid Unicode")
    } else {
        fmt.Println(str, "is a valid Unicode character")
    }

    // Case 2: a code point one past the maximum, U+10FFFF.
    fmt.Println("Case 2 (above the maximum valid code point)")
    str = fmt.Sprintf("%c", rune(1114111+1))
    if !utf8.ValidString(str) {
        fmt.Println(str, "is not valid Unicode")
    } else {
        fmt.Println(str, "is a valid Unicode character")
    }
}

Why doesn't ValidString return false for these invalid Unicode characters? I'm sure my understanding is wrong somewhere; could someone explain?

douxi2011 2016-04-05 13:17
Your problem happens in Sprintf. Since you give it an invalid character, Sprintf replaces it with rune(65533), the Unicode replacement character (U+FFFD) that is used in place of invalid characters. So your string is valid UTF-8.

The same thing happens if you do something like str := string([]rune{55297}), so the substitution appears to take place whenever a rune is converted to a string, not only in Sprintf. This is not immediately obvious from https://blog.golang.org/strings

If you want to force your string to contain invalid UTF-8, you can build the first string from raw bytes like this:

str := string([]byte{237, 159, 193})
This answer was selected as the best answer by the asker.
