duanping3587 2012-09-30 05:19

Reading lines and changing those that don't contain certain words and don't end with a dot

I want to read the text files in a folder line by line. For example, one of the .txt files contains:

Fast and Effective Text Mining Using Linear-time Document Clustering
Bjornar Larsen WORD2 Chinatsu Aone
SRA International AK, Inc.
4300 Fair Lakes Cow-l Fairfax, VA 22033

{bjornar-larsen, WORD1

I want to remove every line that contains none of the words word1, word2, word3 and also does not end with a dot (.).

So, for the example above, the result should be:

Bjornar Larsen WORD2 Chinatsu Aone
SRA International, Inc.
{bjornar-larsen, WORD1
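The rule can be stated as a single predicate: keep a line if it contains at least one of the words (matched case-insensitively with stripos, as the code further down does) or if it ends with a dot. This is a minimal sketch, assuming the three words are literally word1, word2 and word3; the function name shouldKeep is hypothetical, and the arrow function needs PHP 7.4 or later.

```php
<?php
// Hypothetical helper: true if $line contains any of $words (case-insensitive)
// or ends with a dot. Lines for which it returns false are the ones to remove.
function shouldKeep(string $line, array $words): bool
{
    foreach ($words as $word) {
        if (stripos($line, $word) !== false) {
            return true; // found one of the words (its position may be 0)
        }
    }
    return substr($line, -1) === '.'; // otherwise keep only if it ends with '.'
}

$words = ['word1', 'word2', 'word3']; // assumed word list
$sample = [
    'Fast and Effective Text Mining Using Linear-time Document Clustering',
    'Bjornar Larsen WORD2 Chinatsu Aone',
    'SRA International AK, Inc.',
    '4300 Fair Lakes Cow-l Fairfax, VA 22033',
    '',
    '{bjornar-larsen, WORD1',
];

// array_values re-indexes the filtered array as 0, 1, 2, ...
$kept = array_values(array_filter($sample, fn($l) => shouldKeep($l, $words)));
print_r($kept);
```

It keeps the same three lines as the expected result above; the second one keeps its original text, SRA International AK, Inc., since the filter only selects lines and never edits them.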

I'm confused about how to remove such a line. Is that possible? Or can we replace those lines with a space instead?
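Both are possible once the text has been split into an array of lines. In this minimal sketch, $isUnwanted is a hypothetical stand-in for the real word/dot test (it only checks word1 here): array_filter drops the unwanted lines, while array_map keeps the line count and replaces each unwanted line with a single space.

```php
<?php
$lines = ['keep word1 here', 'drop this', 'ends with a dot.'];

// Hypothetical stand-in for the real test: a line is unwanted when it
// contains none of the words and does not end with '.'.
$isUnwanted = function (string $line): bool {
    return stripos($line, 'word1') === false && substr($line, -1) !== '.';
};

// Option 1: remove the unwanted lines entirely (array_values re-indexes).
$removed = array_values(array_filter($lines, fn($l) => !$isUnwanted($l)));

// Option 2: keep every position but replace unwanted lines with a space.
$blanked = array_map(fn($l) => $isUnwanted($l) ? ' ' : $l, $lines);

echo implode("\n", $removed), "\n---\n", implode("\n", $blanked), "\n";
```

Removing is usually what you want when writing the result back to a file; blanking is only useful if something downstream relies on the original line numbers.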

Here's my code:

$url = glob($savePath.'*.txt');
foreach ($url as $file => $files) {
    $handle = fopen($files, "r") or die ('can not open file');
    $ori_content= file_get_contents($files);
    foreach(preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer){
        $pos1 = stripos($buffer, $word1);
        $pos2 = stripos($buffer, $word2);
        $pos3 = stripos($buffer, $word3);
        $last = $str[strlen($buffer)-1];//read the las character
        if (true !== $pos1 OR true !== $pos2 OR true !==$pos3 && $last != '.'){
        //how to remove
        }
    }
}

Please help me, thank you so much :)


5 answers

  • douhao3562 2012-09-30 05:28

    You're using a !== true comparison to test the return value of stripos. !== true means "is not strictly identical to the boolean value true". stripos returns an integer position (which can be 0 when the word is at the start of the line), or false if the word doesn't occur at all; it never returns true. In other words, each true !== $posN test is always true, so your condition always passes no matter what the line contains.
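A quick check of those return values, using made-up strings for illustration. Note the case where the word is at the very start of the line: stripos returns 0, which is loosely equal to false, so only a strict === false / !== false comparison tells "not found" apart from "found at position 0".

```php
<?php
$atStart  = stripos('WORD2 at the start', 'word2'); // int(0): found at offset 0
$inMiddle = stripos('Larsen WORD2 Aone', 'word2');  // int(7)
$missing  = stripos('no match here', 'word2');      // bool(false)

var_dump($atStart, $inMiddle, $missing);

// stripos never returns true, so `true !== $pos` holds for every result.
var_dump(true !== $atStart, true !== $inMiddle, true !== $missing);

// Loose comparison cannot tell "found at 0" from "not found"; strict can.
var_dump($atStart == false, $atStart === false);
```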

    Try comparing with === false instead. Also, you're using OR between the tests, but your example shows that a line only needs to contain one of the words. So if you're checking that none of them was found, you need to use && for everything:

    if (($pos1 === false) && ($pos2 === false) && ($pos3 === false) && ($last != '.'))
    

    As for how to remove the line: you'll need to build a list of all the lines you want to keep. That means flipping the condition above to use !== false (and == '.' for the last character) with || between everything, because we want to keep every line that matches any rule.
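Flipping the condition this way is De Morgan's law: !(A && B && C && D) is equivalent to !A || !B || !C || !D. This small sketch checks that exhaustively for the four rules, using plain booleans in place of the real $posN and $last comparisons (the names $found1, $found2, $found3 and $endsWithDot are illustrative only).

```php
<?php
// Each flag means "this keep-rule matched" ($posN !== false, or $last == '.').
$mismatches = 0;
foreach ([false, true] as $found1) {
    foreach ([false, true] as $found2) {
        foreach ([false, true] as $found3) {
            foreach ([false, true] as $endsWithDot) {
                // Condition for removing a line: none of the rules matched
                $remove = !$found1 && !$found2 && !$found3 && !$endsWithDot;
                // Flipped condition for keeping a line: any rule matched
                $keep = $found1 || $found2 || $found3 || $endsWithDot;
                if ($keep === $remove) {
                    $mismatches++; // keep and remove must always disagree
                }
            }
        }
    }
}
echo $mismatches === 0 ? "keep is exactly !remove\n" : "mismatch\n";
```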

    Try something like this:

    $url = glob($savePath.'*.txt');
    foreach ($url as $file => $files) {
        $ori_content = file_get_contents($files);
        if ($ori_content === false) {
            die('can not open file');
        }
        $linesToKeep = array(); // list of all lines that match our rules
        // split on any line ending: \n, \r\n or \r
        foreach (preg_split("/((\r?\n)|(\r\n?))/", $ori_content) as $buffer) {
            $pos1 = stripos($buffer, $word1);
            $pos2 = stripos($buffer, $word2);
            $pos3 = stripos($buffer, $word3);
            $last = substr($buffer, -1); // last character; safe on empty lines

            if (($pos1 !== false) || ($pos2 !== false) || ($pos3 !== false) || ($last == '.')) {
                $linesToKeep[] = $buffer; // save this line
            }
        }
        // process list of lines for this file;
        // file_put_contents($files, join("\n", $linesToKeep)); // write back to file
        // $lines = join("\n", $linesToKeep); // convert to string to manipulate
    }
    

    Now every line that matches your rules is in the $linesToKeep array. You can convert it back to a string with $lines = join("\n", $linesToKeep);, or iterate through it and process it however you'd like.
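To try the whole loop without touching real data, here is a self-contained sketch that writes the sample file (with Windows \r\n line endings) into a temporary directory, filters it with the same rules, and writes the kept lines back with file_put_contents. The word values and the directory and file names are assumptions for illustration; $savePath must end with a directory separator because the loop builds its pattern as $savePath.'*.txt'.

```php
<?php
// Assumed inputs, standing in for the asker's real values.
$word1 = 'word1';
$word2 = 'word2';
$word3 = 'word3';
$savePath = sys_get_temp_dir() . '/line-filter-demo-' . getmypid() . '/';
if (!is_dir($savePath)) {
    mkdir($savePath);
}
file_put_contents($savePath . 'paper.txt', implode("\r\n", [
    'Fast and Effective Text Mining Using Linear-time Document Clustering',
    'Bjornar Larsen WORD2 Chinatsu Aone',
    'SRA International AK, Inc.',
    '4300 Fair Lakes Cow-l Fairfax, VA 22033',
    '',
    '{bjornar-larsen, WORD1',
]));

foreach (glob($savePath . '*.txt') as $files) {
    $linesToKeep = array();
    // split on \n, \r\n or \r so Windows files work too
    foreach (preg_split("/((\r?\n)|(\r\n?))/", file_get_contents($files)) as $buffer) {
        if (stripos($buffer, $word1) !== false || stripos($buffer, $word2) !== false
            || stripos($buffer, $word3) !== false || substr($buffer, -1) === '.') {
            $linesToKeep[] = $buffer;
        }
    }
    file_put_contents($files, join("\n", $linesToKeep)); // overwrite with kept lines
}

$result = file_get_contents($savePath . 'paper.txt');
echo $result, "\n";

unlink($savePath . 'paper.txt'); // clean up the demo directory
rmdir($savePath);
```

Writing back overwrites each original file, so it's worth running this on copies of the real folder first.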

    This answer was accepted by the asker as the best answer.