dongzhi2332 2014-12-20 19:20

Find and save all the words between two specific phrases in a large text file [closed]

I'm not a programmer; I just found this website suitable for asking my question, so please answer as you would for a beginner. (I do know a little C, PHP, and HTML, though.)

Here is my problem:

I have saved the source of a web page in a file, e.g. "source.txt". Now I want to find all of the text placed between <h4> and </h4>. I need a command that opens "source.txt", looks for the text between those two tags, puts each result on its own line, and finally saves everything to a file, e.g. "result.txt".

For example, I have:

<h4>Barton Fink</h4></a>what is your name<br /><h4>Flyer123</h4></a>my name is pimp<br /><h4>mr.jaghi</h4></a>LoL<br />

And I want my output to be:

Barton Fink

Flyer123

mr.jaghi

Sure, this is easy to do by hand for a short page, but in my case the page is long and there are thousands of these words to extract.

By the way, I'm on Windows. Please show me a way using cmd if possible, or otherwise tell me the easiest way.

1 answer

  • duanbei7005 2014-12-20 20:02

    This can be done with a regular expression in PowerShell:

    [regex]::Matches((Get-Content source.txt), "<h4>(.+?)</h4>") | ForEach-Object { $_.Groups[1].Value } | Out-File -FilePath "result.txt"
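The one-liner above can also be written out as a small script, which may be easier to follow for a beginner. This is only a sketch under a few assumptions: the function name `Get-H4Text` is made up for illustration, the `(?s)` flag is added so that a heading split across several lines is still matched, and `-Raw` requires PowerShell 3.0 or later. The sample string is the one from the question.

```powershell
# Hypothetical helper (the name is an assumption, not part of the original answer).
function Get-H4Text {
    param([string]$Html)

    # (?s) lets "." match newlines, so a heading split across lines is still found.
    # (.+?) is lazy, so each match stops at the nearest closing </h4>.
    [regex]::Matches($Html, '(?s)<h4>(.+?)</h4>') |
        ForEach-Object { $_.Groups[1].Value.Trim() }
}

# Try it on the sample from the question.
$sample = '<h4>Barton Fink</h4></a>what is your name<br /><h4>Flyer123</h4></a>my name is pimp<br /><h4>mr.jaghi</h4></a>LoL<br />'
Get-H4Text -Html $sample
# Prints:
# Barton Fink
# Flyer123
# mr.jaghi

# Apply it to the saved page, if the file is present in the current directory.
if (Test-Path -Path 'source.txt') {
    # -Raw reads the whole file as a single string instead of an array of lines.
    $html = Get-Content -Path 'source.txt' -Raw
    Get-H4Text -Html $html | Out-File -FilePath 'result.txt'
}
```

To run it from cmd, one option is to save the script under a name such as `extract.ps1` (a placeholder) and call `powershell -ExecutionPolicy Bypass -File extract.ps1`. In Windows PowerShell, `Out-File` writes UTF-16 by default; adding `-Encoding utf8` is worth considering if another tool has to read `result.txt`.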
    
    This answer was accepted by the asker as the best answer.
