查找文本文件中发生的最大字符串

So I've seen questions asked before that are along the lines of finding the maximum occurence of a string within a file but all of those rely on knowing what to look for.

I have what you might almost call a flat file database that grabs a bunch of input data and basically wraps different parts of it in html span tags with referencing ids.

Each line comes out in this kind of fashion:

<p>
<span class="ip">58.106.**.***</span> 
Wrote <span class='text'>some text</span>
<span class='effect1'> and caused seizures </span>
<span class='time'>23:47</span> 
</p>

How would I then go about finding the #test contents that occurs the most times.

i.e if I had

<p>
    <span class="ip">58.106.**.***</span> 
    Wrote <span id='text'>woof</span>
    <span class='effect1'> and caused seizures </span>
    <span class='time'>23:47</span> 
    </p>

<p>
    <span class="ip">58.106.**.***</span> 
    Wrote <span class='text'>meow</span>
    <span class='effect1'> and caused mind-splosion </span>
    <span class='time'>23:47</span> 
    </p>

<p>
    <span class="ip">58.106.**.***</span> 
    Wrote <span class='text'>meow</span>
    <span class='effect1'> and used no effect </span>
    <span class='time'>23:47</span> 
    </p>

<p>
    <span class="ip">58.106.**.***</span> 
    Wrote <span class='text'>meow</span>
    <span class='effect1'> and used no effect </span>
    <span class='time'>23:47</span> 
    </p>

the output would be 'meow'.

How would I accomplish this in php?

写回答
好问题 0 提建议
追加酬金
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

2条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
duanqu9292 2012-04-21 14:25
关注
First off: Your format is not conducive to this type of data manipulation; you might want to consider changing it.

That said, based on this structure the logical solution would be to leverage DOMXPath as Dani says. This could have been problematic because of all the duplicate ids in there, but in practice it works (after emitting a boatload of warnings, which is one more reason that the data structure affords revision).

Here's some code to go with the idea:

$input = '<body>'.get_input().'</body>'; $doc = new DOMDocument; $doc->loadHTML($input); // lots of warnings, duplicate ids! $xpath = new DOMXPath($doc); $result = $xpath->query("//*[@id='text']/text()"); $occurrences = array(); foreach ($result as $item) { if (!isset($occurrences[$item->wholeText])) { $occurrences[$item->wholeText] = 0; } $occurrences[$item->wholeText]++; } // Sort the results and produce final answer arsort($occurrences); reset($occurrences); echo "The most common text is '".key($occurrences). "', which occurs ".current($occurrences)." times.";

See it in action.

Update (seeing as you fixed the duplicate id issue): You would simply change the xpath query to "//*[@class='text']/text()" so that it continues to match. However this way of doing things remains inefficient, so if one or more of these apply:

you are going to do this all the time

you have lots of data

you need it to be really fast

then changing the data format is a good idea.
本回答被题主选为最佳回答 , 对您是否有帮助呢?

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

查看更多回答(1条)

报告相同问题？

关注问题

jupyter中用find方法查找文件中字符串 python
2021-04-16 22:50

回答 2 已采纳 find方法查找指定子字符串的首次出现，前面要加上字符串。直接用find(...)会报没有定义错误。用in方法查找和判断更容易些。denglu()函数可以这样改： def denglu():
c++从文件中查找特定的字符串 c++
2017-03-28 10:35

回答 6 已采纳 powershell -c "systeminfo | Out-File -Encoding unicode 1.txt"
在字符串中查找某个字符 c语言
2021-12-04 17:37

回答 1 已采纳 #include "stdio.h" #define MAXLEN 80 int main( ) { int count,i,k,flag,sub; char cc,ch,oldch,st
搜索html文档中的字符串,jQuery如何通过文本查找元素？
2021-06-11 06:45

深度碎片的博客 jQuery如何通过文本查找元素？下面本篇文章给大家介绍一下使用jQuery通过文本查找元素的方法。有一定的参考价值，有需要的朋友可以参考一下，...该字符串可以是直接包含在元素中的文本，或者被包含于子元素中。最...
js查找空格在字符串中出现的位置，以及次数 javascript
2022-04-20 10:15

回答 4 已采纳 var a = "s f g gghdf fhs" undefined let indexs = [] undefined let count = 0 undefined for (let i =
查找dataframe指定位置是否包含特定字符串 python
2022-07-19 22:01

回答 2 已采纳 import pandas as pd data={'one':['xvks','vdvs, bmw','jydv, audi','rexe '], 'two':[325,854,276
请问NSIS怎样在一段信息中查找需要的字符串 javascript
2019-05-21 15:36

回答 2 已采纳是做安装程序的那个软件，后来找到include "textfunc.nsh"文件，可以使用
前端面试题18（js字符串特定内容查找方法）
2024-07-06 11:18

GIS-CL的博客在JavaScript中，有多种方法可以用来查找字符串中的特定内容。
python中文件——查找复制特定字符串 python
2021-11-10 14:23

回答 2 已采纳 n = eval(input()) with open('text.txt', 'r', encoding='utf8') as f: mark=True for i in range
vue项目中使用模版字符串不生效怎么回事 javascript vue.js 前端
2021-10-21 10:30

回答 4 已采纳哈喽，应该用v-html，有用请点采纳哦
查找两个字符串a,b中的最长公共子串 c语言
2021-11-11 20:33

回答 1 已采纳 //思路：动态规划经典问题，加一个start标记即可,注意将较短子串最先出现的那个输出 #include<iostream> #include<vector> #includ
matlab 数组中查找字符串长度,Matlab 之 字符串数组查找
2021-03-17 02:16

幸运的小金Angel的博客下面就介绍一下字符串数组查找的小技巧。字符串数组我通常会选择应用cell格式保存，下面的分析也是建立在这个前提下。【1】strcmp() 函数strcmp() 函数的基本功能是比较两个字符串是否相等，其基本用法是：TF = ...
delphi 字符串查找或者匹配的问题？
2018-08-05 14:07

回答 1 已采纳 ``` Arr : array[0..4] of WideString =( WideString('中国'), WideString('乌拉圭'), WideString('日本'),
jQuery xml字符串的解析、读取及查找方法
2020-10-22 19:13

总结来说，jQuery提供了一套简洁而强大的API来处理XML字符串，使得开发者能够更加方便地解析、读取和查找XML文档中的数据。通过对这些方法的深入理解和应用，可以显著提升Web开发的效率和功能实现的丰富性。
linux字符查找命令,linux查找文件或字符串的命令
2021-05-16 17:31

葡萄的眼泪的博客 1. linux下面用于查到的命令有哪些？是不是有很多呀，这个我还没...(1)如果在给定的文件中搜索某个字符串，直接grep “main” ./main.c即可；(2)如果你要搜索某个特定的字符串，而不确定这个字符串可能会在哪个文件...
没有解决我的问题, 去提问

悬赏问题

¥20 西南科技大学数字信号处理
¥15 有两个非常“自以为是”烦人的问题急期待大家解决！
¥30 STM32 INMP441无法读取数据
¥15 R语言绘制密度图，一个密度曲线内fill不同颜色如何实现
¥100 求汇川机器人IRCB300控制器和示教器同版本升级固件文件升级包
¥15 用visualstudio2022创建vue项目后无法启动
¥15 x趋于0时tanx-sinx极限可以拆开算吗
¥15 pyqt信号槽连接写法
¥500 把面具戴到人脸上，请大家贡献智慧，别用大模型回答，大模型的答案没啥用
¥15 任意一个散点图自己下载其js脚本文件并做成独立的案例页面，不要作在线的，要离线状态。

查找文本文件中发生的最大字符串

2条回答 默认 最新

悬赏问题

2条回答默认最新