dsjj15012 2017-02-20 04:38

Searching for a string in a file written in a different language - PHP - UTF-8

I have read through many posts and tried many things.

I have some monster files on a game server I am working on. The game is Korean, so many of the keywords in the files are in Korean.

I am trying to match a line that starts with *아이템 followed by the string I want. I set default_encoding to UTF-8. I can find the line based on other parts of it, but I want to exclude *아이템 from my output.

A sample of the code:

ini_set("max_execution_time", 0);
$monsdbconn = sqlsrv_connect("INSTANCE\SQLEXPRESS", array("Database" => "MonsDB", "UID" => "BLAH", "PWD"=> "BLAH"));
$monsDir = realpath('C:/PT-Server/GameServer/Monster/');
$monsters = new RecursiveDirectoryIterator($monsDir);

if (@$monsdbconn) {
    $clearit = "DELETE FROM monsdrops";
    if (sqlsrv_query($monsdbconn,$clearit)) {
        foreach($monsters as $name => $object){
            $monstername = "";  
            if (stripos($name, '.inf')){
                $monsterfile = file($name);
                $items = array("WA*", "WP*", "DA*", "WC*");
                foreach ($monsterfile as $monster) {
                    if (strstr($monster, "Name")) {
                        //things to remove from the string.
                        $monstrip = array("*Name",'"'); 

                        //Remove "" and *Name from the string
                        $monstername = str_replace($monstrip, "", $monster); 

                        // Strip spaces, tabs and line endings from both ends to prevent
                        // duplicate entries; spaces between words are kept.
                        $monstername = trim($monstername, " \t\r\n");
                    }
                    // This is where I currently search for items, but I need it to match the Korean keyword instead
                    if (preg_match("/\D{2}\d{3}/", $monster)) { 

                        $string = preg_split("/(\s)/", $monster);
                        foreach ($string as $line) {
                            if ((preg_match("/\D{2}\d{3}/", $line)) && ((stripos($line, "name\\") === false) || stripos($line, ".zhoon") === false)) {
                                // Bind the values as parameters so quotes in a name cannot break the statement
                                $sqlinsert = "INSERT INTO monsdrops ([monstername],[monsterdrops]) VALUES (?, ?)";
                                $insert = sqlsrv_query($monsdbconn, $sqlinsert, array($monstername, $line));
                                if ($insert) {
                                    echo "Insert $monstername, $line Successful! <br />";       
                                } else {
                                    echo "<br />Insert Failed! <br />";
                                    print_r(sqlsrv_errors());
                                }
                            }
                        }
                    }       
                }

            }
        }
    } else {
        echo "Unable To Clear DB";
    }
} else {
    echo "Unable to connect to DB";
}
@sqlsrv_close($monsdbconn);

However, it cannot find the characters. If I match on another part of the line and echo it, the characters display correctly (since I set default_encoding), but searching for them fails. This is painful because many of the trigger words I want to find in these files are in Korean.

Thanks in advance.

An example line from the file:

*아이템 5000 ec101 db120 da120 dg120 

The ec101 etc. are what I am trying to extract.

I have tried mb_stripos unsuccessfully, and tried again with the code supplied below, to no avail. It just doesn't find the text. If I set it to find ec101 it will, but I can't guarantee that will be in the line, so I used preg_match. That only works for the drops, though, and won't work for the other information I am trying to get from the files.

1 answer

  • douyu0725 2017-02-20 04:59

    stripos() is not multibyte-aware. Use mb_stripos() instead, which should work better for you. Also check the result explicitly against false with !==: a match at the very start of the string returns 0, which a loose comparison treats as false.
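
The zero-versus-false point can be seen in isolation. This is a minimal sketch, assuming the mbstring extension is loaded and the script itself is saved as UTF-8; the sample line is taken from the question.

```php
<?php
// Sample line from the question; the keyword sits at offset 0.
$line = "*아이템 5000 ec101";

$pos = mb_stripos($line, "*아이템", 0, "UTF-8");

var_dump($pos);            // the match position (0 here, not false)
var_dump($pos == false);   // loose comparison: 0 is treated as "not found"
var_dump($pos !== false);  // strict comparison: correctly reports a match
```

Because the lines of interest start with the keyword, a test such as `if (mb_stripos(...))` would skip exactly the lines being searched for.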

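    // Single quotes keep the backslashes in the Windows path literal
    $file = 'c:\server\monster.inf';
    $lines = file($file);
    foreach ($lines as $line) {
        // Convert from EUC-KR to UTF-8 so the line can match the UTF-8 literal below
        // (this script must itself be saved as UTF-8)
        $line = mb_convert_encoding($line, "UTF-8", "EUC-KR");
        if (mb_stripos($line, "*아이템") !== false) {
            echo "$line\n";
        }
    }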
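
To drop the *아이템 keyword from the output and keep only the item codes, one possible approach is a UTF-8 aware regular expression. The following is a sketch, not a definitive implementation: `tokensAfterKeyword` is a hypothetical helper name, EUC-KR as the source encoding is an assumption carried over from the snippet above, and the "two letters plus three digits" shape of an item code is inferred from the sample line in the question.

```php
<?php
// Hypothetical helper: returns the whitespace-separated tokens that follow
// $keyword on a line, or null when the line does not start with $keyword.
function tokensAfterKeyword(string $line, string $keyword, string $sourceEncoding = 'EUC-KR'): ?array
{
    // Assumption: anything that is not valid UTF-8 is in $sourceEncoding.
    if (!mb_check_encoding($line, 'UTF-8')) {
        $line = mb_convert_encoding($line, 'UTF-8', $sourceEncoding);
    }

    // The /u modifier makes PCRE treat both pattern and subject as UTF-8.
    // preg_quote() escapes the leading "*" of the keyword.
    $pattern = '/^\s*' . preg_quote($keyword, '/') . '\s+(.*)$/u';
    if (preg_match($pattern, $line, $m) !== 1) {
        return null;
    }

    // Split the remainder on whitespace and drop empty pieces.
    return preg_split('/\s+/u', trim($m[1]), -1, PREG_SPLIT_NO_EMPTY);
}

// Sample line from the question, including a Windows line ending.
$tokens = tokensAfterKeyword("*아이템 5000 ec101 db120 da120 dg120 \r\n", '*아이템');

// Keep only tokens shaped like item codes (two letters, three digits).
$codes = array_values(preg_grep('/^[a-z]{2}\d{3}$/i', $tokens));
print_r($codes);
```

For the sample line this leaves ec101, db120, da120 and dg120, with both the keyword and the leading 5000 removed. The same helper could be called with other Korean keywords, since the keyword is passed in rather than hard-coded.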
    
    This answer was accepted by the asker as the best answer.
