duanmen1887 2012-12-14 22:28

PHP RSS feed character encoding with multilingual feed items

I've created an RSS feed for a website that has two languages, Greek and English. Everything works fine except the RSS feed when an item has a title written in Greek.

So I thought I had to change the encoding before building the string. That failed completely.

I have tried every encoding function PHP provides: iconv, utf8_encode, mb_convert_encoding, and also mb_detect_encoding, both strict and non-strict. I also tried HTML entities, but nothing seems to work.

The source code generating the rss is this:

function construct_rss($results, $cat = null)
{

    if($results == false)
    {
        exit;
    }

    header('Content-Type: application/rss+xml charset=UTF-8');


    $rssfeed = '<?xml version="1.0" encoding="utf-8" ?>';
    $rssfeed .= '<rss version="2.0">';
    $rssfeed .= '<channel>';
    $rssfeed .= '<title>domain.com RSS feed</title>';
    $rssfeed .= '<link>http://www.domain.com</link>';
    if($cat == null)
    {
        $rssfeed .= '<description>Upcoming events</description>';
    }
    else
    {
        $rssfeed .= '<description>Upcoming events - ' . $cat . '</description>';
    }
    $rssfeed .= '<language>en-us</language>';
    $rssfeed .= '<copyright>Copyright (C) 2012 domain.com</copyright>';


    foreach ($results as $key => $event) 
    {
        $exp = explode(',',$event['vName']);
        $vName = $exp[0]; 

        $rssfeed .= '<item>';
        $rssfeed .= '<title>' . $event['eTitle'] . ' @ ' . $vName . '</title>';
        $rssfeed .= '<description>' . htmlentities('<a href="http://www.domain.com/event.php?eid=' . $event['id'] .'"><img WIDTH="150" HEIGHT="220" style="width:150px;height:220px;padding-bottom:10px;padding-right:10px;" src="http://'.$_SERVER['SERVER_NAME'].'/image.php?source='.urlencode('events/'.$event['folder'].'/images/default/' . $event['file_1']).'&w=150&h=220&out=raw"></a>' . '<p>' . $event['eDescr'] . '</div>') . '</description>';
        $rssfeed .= '<link>http://www.'.$_SERVER['SERVER_NAME'].'/events/' . urlencode($event['eCategory']) . '/' .urlencode($event['url']). '</link>';
        $rssfeed .= '<pubDate>' . date("D, d M Y H:i:s O", strtotime($event['dStart'] . ' ' . $event['tStart'])) . '</pubDate>';
        $rssfeed .= '</item>';
    }

    $rssfeed .= '</channel>';
    $rssfeed .= '</rss>';

    echo $rssfeed;

}

And here is a raw output:

<?xml version="1.0" encoding="utf-8" ?><rss version="2.0">
<channel>
<title>domain.com RSS feed</title>
<link>http://www.domai.com</link>
<description>Upcoming events</description>
<language>en-us</language><copyright>Copyright (C) 2012 domain.com</copyright>
<item>
<title>ΕΙΣΒΟΛΕΑΣ & EVERSOR - O μÏθος καταÏÏέει @ Gagarin 205 Live Music Space</title>
<description>&lt;a href=&quot;http://www.domain.com/event.php?eid=209&quot;&gt;&lt;img WIDTH=&quot;150&quot; HEIGHT=&quot;220&quot; style=&quot;width:150px;height:220px;padding-bottom:10px;padding-right:10px;&quot; src=&quot;http://www.comain.com/image.php?source=events%2F985d6bfa8e35df69471b1ecdb9ed187e%2Fimages%2Fdefault%2Feisvo.jpg&amp;w=150&amp;h=220&amp;out=raw&quot;&gt;&lt;/a&gt;&lt;p&gt;&lt;p&gt;&lt;span style=&quot;color: #333333; font-family: lucida grande, tahoma, verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot;&gt;&amp;Epsilon;&amp;Iota;&amp;Sigma;&amp;Beta;&amp;Omicron;&amp;Lambda;&amp;Epsilon;&amp;Alpha;&amp;Sigma; &amp;amp; EVERSOR - &quot;&amp;Omicron; &amp;Mu;&amp;Upsilon;&amp;Theta;&amp;Omicron;&amp;Sigma; &amp;Kappa;&amp;Alpha;&amp;Tau;&amp;Alpha;&amp;Rho;&amp;Rho;&amp;Epsilon;&amp;Epsilon;&amp;Iota;&quot; TOUR LIVE @ &amp;Alpha;&amp;Theta;&amp;Eta;&amp;Nu;&amp;Alpha; (GAGARIN205), &amp;Sigma;&amp;Alpha;&amp;Beta; 22/12&lt;/span&gt;&lt;br style=&quot;color: #333333; font-family: lucida grande, tahoma, verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot; /&gt;&lt;br style=&quot;color: #333333; font-family: lucida grande, tahoma, verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot; /&gt;&lt;span style=&quot;color: #333333; font-family: lucida grande, tahoma, verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot;&gt;doors open: 20.00&lt;/span&gt;&lt;br style=&quot;color: #333333; font-family: lucida grande, tahoma, verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot; /&gt;&lt;span style=&quot;color: #333333; font-family: lucida grande, tahoma, verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot;&gt;ticket price: 10e&lt;/span&gt;&lt;br style=&quot;color: #333333; font-family: lucida grande, tahoma, verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot; /&gt;&lt;span style=&quot;color: #333333; font-family: lucida grande, tahoma, 
verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot;&gt;guests: 12os Pithikos &amp;amp; Hatemost&lt;/span&gt;&lt;br style=&quot;color: #333333; font-family: lucida grande, tahoma, verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot; /&gt;&lt;span style=&quot;color: #333333; font-family: lucida grande, tahoma, verdana, arial, sans-serif; font-size: 13px; line-height: 16px;&quot;&gt;opening: Gelws&lt;/span&gt;&lt;/p&gt;&lt;/div&gt;</description><link>http://www.www.domain.com/events/Hip-Hop+Rap/%CE%95%CE%99%CE%A3%CE%92%CE%9F%CE%9B%CE%95%CE%91%CE%A3-EVERSOR-%CE%9F-%CE%9C%CE%A5%CE%98%CE%9F%CE%A3-%CE%9A%CE%91%CE%A4%CE%91%CE%A1%CE%A1%CE%95%CE%95%CE%99-0</link>
<pubDate>Sat, 22 Dec 2012 20:00:00 +0200</pubDate>
</item>
</channel>
</rss>

As you can see, the problem is in the item's title.

Could anyone point me in the right direction? I can't figure this one out. I thought converting the encoding of $event['eTitle'] would work, but no luck.

EDIT: stored in db as TEXT utf8_general_ci

EDIT 2: this seems to work ->

utf8_encode(htmlentities($event['eTitle'],ENT_COMPAT,'utf-8'))

but the W3C validator reports this error: column 268: XML parsing error: :1:268: undefined entity

and here is the highlighted section:

EVERSOR - O &mu;Ã\x8fÂ\x8d&theta;

\x8f and \x8d cause this error. But why?

1 answer

  • dongshi1869 2012-12-15 09:48

    The header is supposed to be header('Content-Type: application/rss+xml; charset=UTF-8'); you are missing the semicolon between the media type and the charset parameter. Your data is already UTF-8, as evidenced by htmlentities producing &theta; when UTF-8 is specified. Because the data is already UTF-8, utf8_encode only makes it worse: it treats its input as ISO-8859-1 and encodes it a second time.
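
The double encoding described above can be reproduced in isolation. This is a minimal sketch, assuming the mbstring extension is available; it uses mb_convert_encoding from ISO-8859-1 to UTF-8, which performs the same conversion as utf8_encode. The sample byte string is chosen for illustration and is not taken from the asker's database.

```php
<?php
// Sketch: re-encoding text that is already UTF-8 corrupts it.
// Converting from ISO-8859-1 to UTF-8 is the conversion utf8_encode() performs.
$original = "\xCF\x8D"; // the UTF-8 bytes for the Greek letter U+03CD

// Each byte is mistaken for a Latin-1 character and encoded again.
$doubleEncoded = mb_convert_encoding($original, 'UTF-8', 'ISO-8859-1');

echo bin2hex($original), "\n";      // cf8d
echo bin2hex($doubleEncoded), "\n"; // c38fc28d

// Reversing the mistaken conversion restores the original bytes.
$restored = mb_convert_encoding($doubleEncoded, 'ISO-8859-1', 'UTF-8');
echo bin2hex($restored), "\n";      // cf8d
```

The bytes c3 8f c2 8d are what the validator excerpt in the question displays as Ã\x8fÂ\x8d.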

    There is no need for any conversion; first check whether the corrected header changes anything. Your raw output is correct. It is just being interpreted as Windows-1252 instead of UTF-8.
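
As a rough sketch of what "no conversion" could look like in the feed code: send the corrected header and escape only the characters XML reserves, leaving the Greek text as raw UTF-8. The helper name xml_text and the sample row are hypothetical and not part of the original code; ENT_XML1 requires PHP 5.4 or later, and the type declarations require PHP 7.

```php
<?php
// Hypothetical helper (not in the original code): escape text for use inside XML.
function xml_text(string $text): string
{
    // ENT_XML1 escapes only XML's reserved characters (&, <, > and quotes),
    // so Greek letters pass through unchanged as UTF-8.
    return htmlspecialchars($text, ENT_XML1 | ENT_QUOTES, 'UTF-8');
}

// Sample data standing in for one database row (assumed to be valid UTF-8).
$event = ['eTitle' => 'ΕΙΣΒΟΛΕΑΣ & EVERSOR - O μύθος καταρρέει'];
$vName = 'Gagarin 205 Live Music Space';

if (!headers_sent()) {
    // Note the semicolon between the media type and the charset parameter.
    header('Content-Type: application/rss+xml; charset=UTF-8');
}

$item = '<title>' . xml_text($event['eTitle'] . ' @ ' . $vName) . '</title>';
echo $item, "\n";
```

Escaping the title also takes care of the bare & in the raw output, which is not well-formed XML on its own.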


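    By the way, in XML &mu; and &theta; are undefined entities by default; only &amp;, &lt;, &gt;, &quot; and &apos; are predefined. The following example shows how entities can be defined, although the document itself is not really valid: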
    <?xml version="1.0" encoding="utf-8" ?>
    <!DOCTYPE channel
    [
        <!ENTITY mu   "&#956;">
        <!ENTITY theta   "&#952;">
    ]>
    <channel>
    EVERSOR - O &mu;Ã\x8fÂ\x8d&theta;
    </channel>
    

    Nevertheless, it is displayed correctly in Chrome and Firefox, without undefined entity errors.

    This is just supplemental information. Your raw data is correct in the first place, so no conversion needs to be done.
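
If stored text already contains HTML named entities such as &mu;, one possible way to avoid the undefined entity error is to decode them to UTF-8 characters and then escape only what XML requires. This is a sketch under that assumption, with a made-up input string; ENT_HTML5 and ENT_XML1 require PHP 5.4 or later.

```php
<?php
// Sketch: replace HTML named entities with real UTF-8 characters, then escape for XML.
$html = '&mu;&theta; &amp; more';

// html_entity_decode() resolves names such as &mu; that an XML parser does not know.
$decoded = html_entity_decode($html, ENT_QUOTES | ENT_HTML5, 'UTF-8');

// Re-escape only the characters XML reserves.
$xmlSafe = htmlspecialchars($decoded, ENT_XML1 | ENT_QUOTES, 'UTF-8');

echo $xmlSafe, "\n"; // μθ &amp; more
```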
