dpsr1670 2016-04-05 16:17

file_get_contents() breaks ISO-8859-1 encoding

I am trying to read a page using file_get_contents() but I cannot get the character encoding to work.

This is my code:

    $username = "masked";
    $password = "maskedPass";
    $remote_url = 'https://utfws.utfpr.edu.br/aluno01/sistema/mplistahorario.inicio?p_curscodnr=212';

    // Create a stream
    $opts = array(
        'http'=>array(
            'method'=>"GET",
            'header' => array(
                "Authorization: Basic " . base64_encode("$username:$password"),
                'Accept-Charset: iso-8859-1'
            )

        )
    );

    $context = stream_context_create($opts);

    // Open the file using the HTTP headers set above
    $file = file_get_contents($remote_url, false, $context);

    echo $file;

I tried to change the character encoding to utf-8 but I always get a page with question marks instead of áéíóúãõç.

When I open the page directly in my browser it works just fine. Why is this happening?

1 answer

  • doulipi3742 2016-04-05 16:48

    It sounds to me like this might just be a problem of lost encoding details.

    What you're describing is:

    1. Request a document from the web server, asking for ISO-8859-1 encoding.
    2. The server responds with the document in the requested encoding, along with a header stating that the encoding is ISO-8859-1. This looks correct in a browser.
    3. PHP outputs the document (but not the header data!); where that output goes isn't specified.
    4. The data is opened in some sort of viewer.

    See where the encoding specification was lost, there in step 3?

    The data can be decoded correctly as ISO-8859-1, but it will only be decoded that way if the viewer is configured to use that encoding by default. Some apps may default to ISO-8859-1, but UTF-8 is a lot more common these days.
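
    One way to carry the encoding through step 3 is to pass the server's declared charset on to whoever receives the output. The sketch below assumes the remote server sends a `Content-Type` header with a `charset` parameter; `charset_from_headers` is a hypothetical helper name, and the ISO-8859-1 fallback is an assumption taken from the question, not something PHP provides.

```php
<?php
// Hypothetical helper: find the charset declared in raw HTTP response header
// lines, such as the ones PHP places in $http_response_header after an
// http(s) file_get_contents() call.
function charset_from_headers(array $headers, string $default = 'ISO-8859-1'): string
{
    $charset = $default;
    foreach ($headers as $line) {
        // Keep the last match, so that after redirects the final response wins.
        if (preg_match('/^Content-Type:.*?charset\s*=\s*"?([\w.:-]+)/i', $line, $m)) {
            $charset = strtoupper($m[1]);
        }
    }
    return $charset;
}

// Possible usage with the code from the question (not executed here):
//   $file = file_get_contents($remote_url, false, $context);
//   header('Content-Type: text/html; charset=' . charset_from_headers($http_response_header));
//   echo $file;
```

    With the header forwarded like this, a browser that loads the PHP page gets the same encoding hint it would get from the original server.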

    If you load the data into a different storage engine, say MySQL, the problem may compound. MySQL associates a charset with text data. If your database defaults to UTF-8 and you don't tell it the data is actually in ISO-8859-1, you're feeding it bytes that are assumed to be UTF-8, and the data will be treated as such in the database going forward. If you later ask the database for ISO-8859-1, it will try to re-encode the stored data from UTF-8 to ISO-8859-1, but the stored bytes were never valid UTF-8, so the result is yet another incorrect set of bytes.
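
    To make the mismatch concrete, here is a minimal sketch that checks whether a string is well-formed UTF-8 and converts it before it is stored. It assumes the `iconv` extension is available; `is_valid_utf8` and `to_utf8` are hypothetical helper names, and the sample word is only an illustration.

```php
<?php
// True when $bytes is well-formed UTF-8. With the /u modifier PCRE validates
// the subject string, so preg_match() fails on invalid input.
function is_valid_utf8(string $bytes): bool
{
    return preg_match('//u', $bytes) === 1;
}

// Convert text from an assumed source charset to UTF-8 before storing it.
function to_utf8(string $bytes, string $sourceCharset = 'ISO-8859-1'): string
{
    $utf8 = iconv($sourceCharset, 'UTF-8', $bytes);
    if ($utf8 === false) {
        throw new RuntimeException("Cannot convert from $sourceCharset to UTF-8");
    }
    return $utf8;
}

$latin1 = "hor\xE1rio";            // "horário" encoded as ISO-8859-1
var_dump(is_valid_utf8($latin1));  // bool(false): a UTF-8 column would misread this
$utf8 = to_utf8($latin1);
var_dump(is_valid_utf8($utf8));    // bool(true): safe to store as UTF-8
```

    An alternative, if the bytes should stay in ISO-8859-1, is to declare the connection charset instead of converting, for example with `mysqli::set_charset()`, so that the database knows what it is receiving.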

    To address this problem, specify the encoding when you view the data, or when you save it to a database.

    This answer was accepted by the asker as the best answer.