douzhou7124 2008-12-15 16:02

Character encoding problem - PHP output, read by .NET via HttpWebRequest

I have a PHP script (running on a Linux server) that outputs the names of some files on the server. It outputs these file names in a simple text-only format.

This output is read from a VB.NET program by using HttpWebRequest, HttpWebResponse, and a StreamReader.

The problem is that some of the file names being output contain... unusual characters. Specifically, the "section" symbol (§).

If I view the output of the PHP script in a web browser, the symbol appears fine.

But when I read the output of the PHP script into my .NET program, the symbol doesn't appear correctly (it appears as a generic "block" symbol).

I've tried all the different character encoding options that you can use when reading the response stream (from the HttpWebResponse). I've tried outputting the stream directly to a text file (no good), displaying it in a TextBox (no good), and even when viewing the results directly in the Visual Studio debugger, the character appears as a block instead of as the "section" symbol.

I've examined the output in a hex editor (as suggested by a related question, "How do you troubleshoot character encoding problems?").

When I write out the section symbol (§) from .NET itself, the hex bytes I see representing it are "c2 a7" (makes sense if it's unicode, right? requires two bytes?). When I write out the output from the PHP script directly to a file and examine that with a hex editor, the symbol shows up as "ef bf bd" - three bytes instead of two?
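For reference, the two byte sequences can be reproduced outside of .NET. The sketch below uses Python purely as a neutral way to inspect bytes (it is not part of the original setup). c2 a7 is the UTF-8 encoding of U+00A7, while ef bf bd is the UTF-8 encoding of U+FFFD, the replacement character that a decoder substitutes for bytes it cannot interpret. Seeing ef bf bd in the file therefore suggests the text was already damaged while being decoded, before it was written out.

```python
# U+00A7 (section sign) encoded as UTF-8 gives the two bytes c2 a7.
section_utf8 = "\u00a7".encode("utf-8")

# The same character in ISO-8859-1 is the single byte a7.
section_latin1 = "\u00a7".encode("iso-8859-1")

# A lone a7 byte is not valid UTF-8; a lenient decoder replaces it with U+FFFD.
decoded = section_latin1.decode("utf-8", errors="replace")

# Writing U+FFFD back out as UTF-8 yields the three bytes ef bf bd.
replacement_utf8 = decoded.encode("utf-8")

print(section_utf8.hex(" "))      # c2 a7
print(section_latin1.hex(" "))    # a7
print(replacement_utf8.hex(" "))  # ef bf bd
```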

I'm at a loss as to what to do - if I need to specify some other character encoding, or if I'm missing something obvious about this.

Here's the code that's used to get the output of the PHP script:

' requires Imports System.IO and Imports System.Net
Dim myRequest As HttpWebRequest = WebRequest.Create("http://www.example.com/sample.php")
Dim myResponse As HttpWebResponse = myRequest.GetResponse()

' read the response stream
Dim myReader As New StreamReader(myResponse.GetResponseStream())

' read the entire output in one block (just as an example)
Dim theOutput As String = myReader.ReadToEnd()

Any ideas?

  • Am I using the wrong kind of StreamReader? (I've tried passing the character encoding in the call to create the new StreamReader - I've tried all the ones that are in System.Text.Encoding - UTF-8, UTF-7, ASCII, UTF-32, Unicode, etc.)
  • Should I be using a different method for reading the output of the PHP script?
  • Is there something I should be doing differently on the PHP side when outputting the text?
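One way to frame the first two questions: the reader should decode the raw bytes with whatever charset the server declares, rather than trying encodings until one looks right. The sketch below shows that idea in Python on raw bytes only. The helper names charset_from_content_type and decode_body are made up for illustration, and the ISO-8859-1 fallback is an assumption based on the historical HTTP default for text types. In .NET the rough equivalent would be to read HttpWebResponse.CharacterSet and pass Encoding.GetEncoding(...) to the StreamReader constructor.

```python
from email.message import Message


def charset_from_content_type(content_type, default="iso-8859-1"):
    """Return the charset parameter of a Content-Type header value.

    The default is an assumption for headers that declare no charset.
    """
    msg = Message()
    msg["Content-Type"] = content_type
    return msg.get_content_charset(default)


def decode_body(raw, content_type):
    """Decode raw response bytes using the charset the server declared."""
    return raw.decode(charset_from_content_type(content_type))


# The same character arrives as different bytes depending on the charset.
print(decode_body(b"\xc2\xa7 1.txt", "text/html; charset=utf-8"))
print(decode_body(b"\xa7 1.txt", "text/html"))
```

This only helps when the declared charset matches the bytes actually sent; a mismatch between the two still produces garbled or replaced characters.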

UPDATED INFO:

  • The output from PHP is specifically encoded UTF-8 by calling: utf8_encode($file);
  • When I wrote out the symbol from .NET, I copied and pasted the symbol from the Character Map app in Windows. I also copied & pasted it directly from the file's name (in Windows) and from this web page itself - all gave the same hex value when written out (c2 a7).
  • Yes, the "section symbol" I'm talking about is U+00A7 (ALT+0167 on Windows, according to Character Map).
  • The content-type is set explicitly via header('Content-Type: text/html; charset=utf-8'); right at the beginning of the PHP script.
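As background on the first bullet: PHP's utf8_encode() converts a string from ISO-8859-1 to UTF-8, so it only produces the intended result when its input really is ISO-8859-1. The Python sketch below approximates that behavior (utf8_encode_like is a hypothetical stand-in, not a PHP API) and shows what happens when the input is already UTF-8.

```python
def utf8_encode_like(data: bytes) -> bytes:
    """Approximate PHP's utf8_encode(): treat input as ISO-8859-1, emit UTF-8."""
    return data.decode("iso-8859-1").encode("utf-8")


# ISO-8859-1 input (a7) becomes the expected UTF-8 bytes (c2 a7).
once = utf8_encode_like(b"\xa7")

# Feeding UTF-8 back in double-encodes it: c2 a7 becomes c3 82 c2 a7.
twice = utf8_encode_like(once)

print(once.hex(" "))
print(twice.hex(" "))
```

Whether the file names on the server are stored as ISO-8859-1 or as UTF-8 is not stated in the question, so which case applies there is left open.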

UPDATE:

Figured it out myself, but I couldn't have done it without the help from the people who answered. Thank you!

4 answers

  • douren2395 2008-12-15 17:30

    Figured it out!!

    Like so many things, it's simple in retrospect!

    Jon Skeet was correct - it was meant to be UTF-8, but definitely wasn't.

    Turns out, in the original script I was using (before I stripped it down to make it simpler to debug), there was some additional text output by the script which was not wrapped in a utf8_encode() call. This caused the entire page to be output in ISO-8859-1 instead of UTF-8.

    I noticed this when I checked my testing script's "encoding" property (in Firefox, "View Page Info"). It was UTF-8 for the testing script, but ISO-8859-1 for the production script. The production script also printed the date of the file; this was not wrapped in a call to utf8_encode - and that caused the entire output to change to ISO-8859-1.
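A quick way to spot this kind of mixed output is to check whether the whole response is valid UTF-8 and, if not, where it first breaks. The Python sketch below is only an illustration: first_invalid_utf8_offset is a made-up helper, and the sample bytes are hypothetical, since the actual bytes of the production output are not shown here.

```python
def first_invalid_utf8_offset(data: bytes):
    """Return the offset of the first byte that breaks strict UTF-8, or None."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError as err:
        return err.start
    return None


# Hypothetical page: a file name that went through utf8_encode(), followed by
# extra text that was printed as raw ISO-8859-1 bytes.
encoded_name = "\u00a7 notes.txt\n".encode("utf-8")
raw_extra = "modified \u00a7\n".encode("iso-8859-1")
page = encoded_name + raw_extra

offset = first_invalid_utf8_offset(page)
print(offset, hex(page[offset]))
```

An offset of None means the whole response decodes cleanly as UTF-8; any other value points at the first piece of output that skipped the encoding step.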

    [Insert sound of me slapping my forehead here]

    Thanks to everyone who answered! You were very helpful!

    This answer was selected by the asker as the best answer.