dpsr1670 2016-04-05 16:17

file_get_contents() breaks ISO-8859-1 encoding

I am trying to read a page using file_get_contents() but I cannot get the character encoding to work.

This is my code:

    $username = "masked";
    $password = "maskedPass";
    $remote_url = 'https://utfws.utfpr.edu.br/aluno01/sistema/mplistahorario.inicio?p_curscodnr=212';

    // Create a stream
    $opts = array(
        'http'=>array(
            'method'=>"GET",
            'header' => array(
                "Authorization: Basic " . base64_encode("$username:$password"),
                'Accept-Charset: iso-8859-1'
            )

        )
    );

    $context = stream_context_create($opts);

    // Open the file using the HTTP headers set above
    $file = file_get_contents($remote_url, false, $context);

    echo $file;

I tried changing the character encoding to UTF-8, but I always get a page with question marks instead of áéíóúãõç.

When I open the page directly in my browser it works just fine. Why is this happening?


  • doulipi3742 2016-04-05 16:48

    It sounds to me like this might just be a problem of lost encoding details.

    What you're describing is:

    1. request the document from the web server, specifying the encoding ISO-8859-1
    2. the server responds with the document in the requested encoding, including a header stating that the encoding is ISO-8859-1. This looks correct in a browser.
    3. output the document (but not the header data!) from PHP (where this output goes isn't specified)
    4. open the data in some sort of viewer.
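    Step 3 is where PHP drops the server's Content-Type header. After file_get_contents() makes an HTTP request, PHP stores the raw response header lines in the local variable $http_response_header, so one way to keep the encoding detail is to read the charset from there and declare it again when echoing. A minimal sketch, assuming a simple charset=... parameter; charset_from_headers() is a hypothetical helper name, and the ISO-8859-1 fallback is an assumption based on the question's Accept-Charset header:

```php
<?php
/**
 * Hypothetical helper: return the charset declared in the last
 * Content-Type header line, or null if that line declares none.
 * $headers has the shape of PHP's $http_response_header array.
 */
function charset_from_headers(array $headers): ?string
{
    $charset = null;
    foreach ($headers as $line) {
        // Header names are case-insensitive; skip everything but Content-Type.
        if (stripos($line, 'content-type:') !== 0) {
            continue;
        }
        // A later Content-Type line (e.g. after a redirect) overrides earlier ones.
        $charset = preg_match('/charset\s*=\s*"?([A-Za-z0-9._:-]+)"?/i', $line, $m)
            ? $m[1]
            : null;
    }
    return $charset;
}

// Usage after the question's request:
// $file = file_get_contents($remote_url, false, $context);
// // Assumed fallback when the server declares no charset:
// $charset = charset_from_headers($http_response_header) ?? 'ISO-8859-1';
// header('Content-Type: text/html; charset=' . $charset);
// echo $file;
```

    Forwarding the server's own declaration keeps the bytes and their label together, so whatever displays PHP's output no longer has to guess.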

    See where the encoding specification was lost, there in step 3?

    The data can be decoded correctly as ISO-8859-1, but it will only be decoded that way if the viewer is configured to use that encoding by default. Some applications may default to ISO-8859-1, but UTF-8 is far more common these days.
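    To see why the viewer's default matters: in ISO-8859-1 each accented letter is a single byte (á is 0xE1), while in UTF-8 the same letter takes two bytes (0xC3 0xA1). A lone 0xE1 byte is not valid UTF-8, so a UTF-8 viewer shows a replacement character or a question mark instead. The sketch below checks validity with PCRE's u modifier and converts by hand, relying on the fact that ISO-8859-1 maps byte values 0x00 to 0xFF directly to code points U+0000 to U+00FF. The names is_valid_utf8() and latin1_to_utf8() are hypothetical; where the mbstring extension is installed, mb_convert_encoding($s, 'UTF-8', 'ISO-8859-1') performs the same conversion.

```php
<?php
// True if $s is well-formed UTF-8: with the /u modifier, preg_match()
// returns false instead of 1 when the subject is not valid UTF-8.
function is_valid_utf8(string $s): bool
{
    return preg_match('//u', $s) === 1;
}

// Convert ISO-8859-1 bytes to UTF-8. Each Latin-1 byte is the code point
// with the same value, so bytes >= 0x80 become a two-byte UTF-8 sequence.
function latin1_to_utf8(string $s): string
{
    $out = '';
    $len = strlen($s);
    for ($i = 0; $i < $len; $i++) {
        $b = ord($s[$i]);
        if ($b < 0x80) {
            $out .= $s[$i];                   // ASCII bytes are unchanged
        } else {
            $out .= chr(0xC0 | ($b >> 6))     // lead byte 110xxxxx
                  . chr(0x80 | ($b & 0x3F));  // continuation byte 10xxxxxx
        }
    }
    return $out;
}

$latin1 = "a\xE1\xE7";             // "aáç" as ISO-8859-1 bytes
var_dump(is_valid_utf8($latin1));  // bool(false)
$utf8 = latin1_to_utf8($latin1);
var_dump(bin2hex($utf8));          // string(10) "61c3a1c3a7"
var_dump(is_valid_utf8($utf8));    // bool(true)
```

    The bytes the server sent are fine; they are just labelled differently from what the viewer assumes, which is exactly the question-mark symptom in the question.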

    If you load the data into a different storage engine, say MySQL, the problem may compound. MySQL associates a character set with text data. If your database defaults to UTF-8 and you don't tell it that the data is actually in ISO-8859-1, you're feeding it data that is assumed to be UTF-8, and the database will treat it that way from then on. Even if you later ask the database for ISO-8859-1, it will re-encode the data from UTF-8 to ISO-8859-1; but the data was never valid UTF-8, so the result is yet another incorrect set of bytes.

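    To address this problem, specify the encoding explicitly: when you output or view the data (for example, by sending a Content-Type: text/html; charset=ISO-8859-1 header from PHP), or when you save it to a database.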
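    For the database side, a minimal sketch with PDO, assuming a MySQL table whose text columns use utf8mb4. The charset parameter in the DSN tells the PDO MySQL driver which encoding the client sends; the point is that this declaration must match the bytes you actually send, so the ISO-8859-1 page is converted first. The host, database name, credentials, and the pages table are hypothetical placeholders, and the conversion call requires the mbstring extension.

```php
<?php
// Build a PDO MySQL DSN that declares the connection charset explicitly,
// so the server does not fall back to its default. Names are hypothetical.
function mysql_utf8_dsn(string $host, string $dbname): string
{
    return "mysql:host={$host};dbname={$dbname};charset=utf8mb4";
}

// Store fetched ISO-8859-1 HTML over a connection declared as utf8mb4.
// The bytes are converted first so that they really are UTF-8.
function store_page(PDO $pdo, string $latin1Html): void
{
    $utf8 = mb_convert_encoding($latin1Html, 'UTF-8', 'ISO-8859-1');
    $stmt = $pdo->prepare('INSERT INTO pages (html) VALUES (?)');
    $stmt->execute([$utf8]);
}

// Usage (needs a reachable MySQL server):
// $pdo = new PDO(mysql_utf8_dsn('localhost', 'app'), 'user', 'secret',
//                [PDO::ATTR_ERRMODE => PDO::ERRMODE_EXCEPTION]);
// store_page($pdo, $file);
```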

    This answer was selected as the best answer by the asker.