dongzhi2332 2014-12-20 19:20
30 views
Accepted

Find and save all words between two specific phrases in a large text file [closed]

I'm not a programmer; I just found this website suitable for asking my question, so please help me as you would help a beginner. (I do know a little bit of C, PHP and HTML, though.)

Here is my problem:

I have saved the source of a web page to a file, e.g. "source.txt". Now I want to find all of the words in the text that are placed between <h4> and </h4>. I need a command that opens "source.txt", looks for the words between those two tags, puts each match on its own line, and finally saves them to a file, e.g. "result.txt".

For example, I have:

<h4>Barton Fink</h4></a>what is your name<br /><h4>Flyer123</h4></a>my name is pimp<br /><h4>mr.jaghi</h4></a>LoL<br />

And I want my output to be:

Barton Fink
Flyer123
mr.jaghi

Sure, it's easy to do this manually for a short snippet, but in my case it's a long page, and there are thousands of these words to extract.

BTW, I'm using Windows. Please show me a way to do it with cmd if possible; if not, tell me the easiest way.


1 answer

  • duanbei7005 2014-12-20 20:02

    It can be done as follows, using a regular expression in PowerShell:

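    # Write the text between each <h4> and </h4> in source.txt to result.txt, one match per line
    [regex]::Matches((Get-Content source.txt), "<h4>(.+?)</h4>") | ForEach-Object { $_.Groups[1].Value } | Out-File -FilePath "result.txt"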
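As a quick check of the pattern, the sketch below runs the same regular expression on the sample line from the question instead of on a file. The variable names `$html`, `$names` and `$greedy` are illustrative only. The `?` in `(.+?)` makes the group lazy, so each match ends at the nearest `</h4>`; a greedy `(.+)` would run on to the last `</h4>` in the line and return one long match instead of three.

```powershell
# Sample input copied from the question (illustrative variable name)
$html = '<h4>Barton Fink</h4></a>what is your name<br /><h4>Flyer123</h4></a>my name is pimp<br /><h4>mr.jaghi</h4></a>LoL<br />'

# Lazy group: each match stops at the nearest </h4>, so every heading is captured separately
$names = @([regex]::Matches($html, '<h4>(.+?)</h4>') | ForEach-Object { $_.Groups[1].Value })

# Greedy group, for comparison: a single match from the first <h4> to the last </h4>
$greedy = @([regex]::Matches($html, '<h4>(.+)</h4>') | ForEach-Object { $_.Groups[1].Value })

$names          # Barton Fink, Flyer123 and mr.jaghi, one per line
$greedy.Count   # 1
```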
    
    This answer was selected by the asker as the best answer.
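For a long saved page, the one-liner from the answer can also be wrapped in a small reusable function. This is a sketch under a few assumptions: it needs PowerShell 3.0 or later (for `Get-Content -Raw`), the function name `Export-H4Text` is hypothetical, and the `(?s)` flag plus the whitespace clean-up are additions beyond the accepted answer, meant for headings that wrap across lines in the saved source. It passes `-Encoding UTF8` because `Out-File` in Windows PowerShell 5.1 writes UTF-16 by default.

```powershell
# Hypothetical helper: writes the text of every <h4>...</h4> in $InputPath to $OutputPath, one per line
function Export-H4Text {
    param(
        [Parameter(Mandatory)] [string] $InputPath,
        [Parameter(Mandatory)] [string] $OutputPath
    )

    # -Raw returns the whole file as one string; the [string] cast turns $null (empty file) into ''
    $text = [string](Get-Content -Path $InputPath -Raw)

    # (?s) lets '.' also match line breaks, in case a heading wraps across lines
    $found = [regex]::Matches($text, '(?s)<h4>(.+?)</h4>') | ForEach-Object {
        # Collapse internal whitespace (including line breaks) to single spaces and trim the ends
        ($_.Groups[1].Value -replace '\s+', ' ').Trim()
    }

    # UTF-8 instead of the UTF-16 default of Out-File in Windows PowerShell 5.1
    $found | Out-File -FilePath $OutputPath -Encoding UTF8
}
```

Usage, from a PowerShell prompt in the folder that holds the file: `Export-H4Text -InputPath .\source.txt -OutputPath .\result.txt`.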
