doumei1926 2012-08-09 06:11
Views: 35
Accepted

Counting word frequency across multiple files

<?php

$filename = "largefile.txt";

/* get content of $filename in $content */
$content = strtolower(file_get_contents($filename));

/* split $content into array of substrings of $content, i.e. word by word */
$wordArray = preg_split('/[^a-z]/', $content, -1, PREG_SPLIT_NO_EMPTY);

/* "stop words", filter them */
$filteredArray = array_filter($wordArray, function($x){
    return !preg_match("/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/",$x);
});

/* get associative array of values from $filteredArray as keys and their frequency count as value */
$wordFrequencyArray = array_count_values($filteredArray);

/* Sort array from higher to lower, keeping keys */
arsort($wordFrequencyArray);

This is the code I wrote to find the frequency of each distinct word in a file. It works.

Now suppose there are 10 text files. I want to count how often a word occurs across all 10 files; for example, how many times the word "stack" appears in all of the files combined. I then want to do the same for every distinct word.

I have done this for a single file but cannot think of how to extend it to multiple files. Thanks for your help, and sorry for my bad English.

1 answer

  • duansengcha9114 2012-08-09 06:18

    Put what you've got into a function & call it for each filename in an array using a foreach loop:

    <?php

    $wordFrequencyArray = array();

    /* $wordFrequencyArray is passed by reference so the counts accumulate across calls
       (the `use` keyword is only valid on anonymous functions, not named ones) */
    function countWords($file, array &$wordFrequencyArray) {
        /* get content of $file in $content */
        $content = strtolower(file_get_contents($file));

        /* split $content into array of substrings of $content, i.e. word by word */
        $wordArray = preg_split('/[^a-z]/', $content, -1, PREG_SPLIT_NO_EMPTY);

        /* "stop words", filter them */
        $filteredArray = array_filter($wordArray, function($x){
            return !preg_match("/^(.|a|an|and|the|this|at|in|or|of|is|for|to)$/",$x);
        });

        /* add this file's count for each word to the running total */
        foreach (array_count_values($filteredArray) as $word => $count) {
            if (!isset($wordFrequencyArray[$word])) $wordFrequencyArray[$word] = 0;
            $wordFrequencyArray[$word] += $count;
        }
    }
    $filenames = array('file1.txt', 'file2.txt', 'file3.txt', 'file4.txt' /* ... */);
    foreach ($filenames as $file) {
        countWords($file, $wordFrequencyArray);
    }

    print_r($wordFrequencyArray);
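
A variation on the same idea, in case it helps: the sketch below returns the counts from each call and merges them, rather than accumulating into a shared array. The helper names `countWordsInText`, `mergeCounts` and `countWordsInFiles` are made up for illustration and are not part of the question or of PHP. Skipping unreadable files and sorting the totals with `arsort` are assumptions about what is wanted, and the stop-word list is copied from the question.

```php
<?php

/* Hypothetical helper: count the non-stop words in one string of text. */
function countWordsInText($text) {
    $stopWords = array('a', 'an', 'and', 'the', 'this', 'at', 'in', 'or', 'of', 'is', 'for', 'to');

    /* split on anything that is not a lowercase letter, dropping empty pieces */
    $words = preg_split('/[^a-z]+/', strtolower($text), -1, PREG_SPLIT_NO_EMPTY);

    /* drop single letters and stop words, mirroring the filter in the question */
    $words = array_filter($words, function ($w) use ($stopWords) {
        return strlen($w) > 1 && !in_array($w, $stopWords, true);
    });

    return array_count_values($words);
}

/* Hypothetical helper: add one set of counts to the running totals. */
function mergeCounts(array $totals, array $counts) {
    foreach ($counts as $word => $count) {
        $totals[$word] = (isset($totals[$word]) ? $totals[$word] : 0) + $count;
    }
    return $totals;
}

/* Hypothetical helper: combined counts for a list of file names.
   Assumption: files that are missing or unreadable are silently skipped. */
function countWordsInFiles(array $filenames) {
    $totals = array();
    foreach ($filenames as $file) {
        if (!is_file($file) || !is_readable($file)) {
            continue;
        }
        $totals = mergeCounts($totals, countWordsInText(file_get_contents($file)));
    }

    /* sort from higher to lower count, keeping the words as keys */
    arsort($totals);
    return $totals;
}
```

With this, `$totals = countWordsInFiles(array('file1.txt', 'file2.txt'));` gives the combined counts, and `isset($totals['stack']) ? $totals['stack'] : 0` is the number of times "stack" appears across all the files. Because nothing is shared between calls, the per-file counting can also be tested on plain strings.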
    