XML parsing in Linux: printing multiple elements

I found a script online for XML parsing in Linux that I want to use, and I was hoping to get some help understanding how the script works and how to edit it for my own use.

Here is the script (credit)

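#!/bin/bash

# Print the text of every <title>...</title> element in the file given as $1.
awk '
{
   # Set up the scan for each input line. This must happen per record, not in
   # a BEGIN block, because $0 is still empty before any input is read.
   xml = $0; len = length(xml); pos = 1
   while (pos <= len) {
      if (substr(xml, pos, 7) == "<title>") {
         pos = pos + 7                  # skip past "<title>"
         endp = pos
         # move endp forward until it points at "</title>" (or the end of the line)
         while ((substr(xml, endp, 8) != "</title>") && (endp < len)) {
            endp++
         }
         print "   ", substr(xml, pos, endp - pos), " * "
         pos = endp + 7                 # the pos++ below steps past "</title>"
      }
      pos++
   }
}' "$1"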

Here is a simplified sample of the xml data I will be using

I have already removed the extra characters around the values (the <![CDATA[ and ]]> wrappers) and made a few other adjustments by changing the script to this:

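#!/bin/bash

# Print the text of every <sport><![CDATA[...]]></sport> element in the file given as $1.
awk '
{
   # Set up the scan for each input line (per record, not in BEGIN).
   xml = $0; len = length(xml); pos = 1
   while (pos <= len) {
      if (substr(xml, pos, 16) == "<sport><![CDATA[") {
         pos = pos + 16                 # skip past "<sport><![CDATA["
         endp = pos
         # move endp forward until it points at "]]></sport>" (or the end of the line)
         while ((substr(xml, endp, 11) != "]]></sport>") && (endp < len)) {
            endp++
         }
         print substr(xml, pos, endp - pos)
         pos = endp + 10                # the pos++ below steps past "]]></sport>"
      }
      pos++
   }
}' "$1"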
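Instead of comparing one character at a time, awk's built-in index() can jump straight to each tag. Below is a minimal sketch that takes the tag name as a parameter (so one script covers <title>, <sport> and the rest) and strips an optional CDATA wrapper itself. extract_tag is a hypothetical name, and like the scripts above it assumes each element starts and ends on the same line.

```shell
#!/bin/bash
# extract_tag TAG [FILE...]   (hypothetical helper name)
# Print the text of every <TAG>...</TAG> element, one per line, removing an
# optional <![CDATA[ ... ]]> wrapper. Reads standard input when no FILE is given.
# ASSUMPTION: each element opens and closes on the same line.
extract_tag() {
    local tag=$1
    shift
    awk -v tag="$tag" '
    {
        line = $0
        otag = "<" tag ">"
        ctag = "</" tag ">"
        # index() returns the 1-based position of the first match, or 0 if none
        while ((s = index(line, otag)) > 0) {
            line = substr(line, s + length(otag))   # drop text up to and including <TAG>
            e = index(line, ctag)
            if (e == 0) break                       # no closing tag on this line
            value = substr(line, 1, e - 1)
            line = substr(line, e + length(ctag))   # keep scanning after </TAG>
            # strip a CDATA wrapper: 9 chars in front, 3 chars at the end
            if (substr(value, 1, 9) == "<![CDATA[" && substr(value, length(value) - 2) == "]]>")
                value = substr(value, 10, length(value) - 12)
            print value
        }
    }' "$@"
}

# Example: extract_tag sport rss.xml
```

Because the tag arrives through awk's -v option, the awk program itself never changes; only the argument does.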

Running this script gives me a plain text file with this result:

Women's Soccer
Men's Soccer
Women's Soccer

Ultimately I'd like to have a script output the following

Women's Soccer Away @ South Carolina (Exhibition) at 7:00 PM
Men's Soccer Home vs. Ohio State at 7:00 PM
Women's Soccer Away @ William and Mary at 7:00 PM

For those wondering, this is the shell script that calls the parse script (ignore the file names and locations):

wget -O rss.xml http://en-us.fxfeeds.mozilla.com/en-US/firefox/headlines.xml
~dsl/bin/rssparse! rss.xml > headlines_$$.tmp
cd /tmp/ldmtrx
split --lines=30 /tmp/headlines_$$.tmp ldmtrxnews
cd /tmp
rm headlines_$$.tmp rss.xml

While it would be greatly appreciated, I don't expect anyone to complete this task for me; I'd just like some tips and help getting started. I'm not sure how to run this script on a different element and then print both elements (for example <sport> and <homeaway>) on the same line. I could run the script again, but then the elements would be printed on different lines.
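One way to get several elements onto one line is to work per <item> instead of per line: cut out each <item>...</item> block, pull the wanted children out of it, and print them with a single print statement. The sketch below assumes each event is an <item> (with no attributes) containing <sport>, <homeaway>, <opponent> and <time> children, and that <homeaway> holds exactly Home or Away. The element names come from the question and from the XPath in the answer below; the exact structure is a guess, since the sample data is not shown. list_events and get are hypothetical names.

```shell
#!/bin/bash
# list_events [FILE...]   (hypothetical helper name)
# Print one line per <item>, e.g. "Men's Soccer Home vs. Ohio State at 7:00 PM".
# Reads standard input when no FILE is given.
# ASSUMPTION: <item><sport>..</sport><homeaway>Home|Away</homeaway>
#             <opponent>..</opponent><time>..</time></item>, values optionally in CDATA.
list_events() {
    awk '
    # get(s, tag): text of the first <tag>...</tag> inside s, without a CDATA wrapper
    function get(s, tag,    o, c, i, j, v) {
        o = "<" tag ">"
        c = "</" tag ">"
        if ((i = index(s, o)) == 0) return ""
        s = substr(s, i + length(o))
        if ((j = index(s, c)) == 0) return ""
        v = substr(s, 1, j - 1)
        if (substr(v, 1, 9) == "<![CDATA[" && substr(v, length(v) - 2) == "]]>")
            v = substr(v, 10, length(v) - 12)
        return v
    }
    { doc = doc $0 }    # join all lines so that an <item> may span several lines
    END {
        while ((i = index(doc, "<item>")) > 0) {
            doc = substr(doc, i + 6)            # 6 = length("<item>")
            if ((j = index(doc, "</item>")) == 0) break
            item = substr(doc, 1, j - 1)
            doc = substr(doc, j + 7)            # 7 = length("</item>")
            where = get(item, "homeaway")
            sep = (where == "Home") ? " vs. " : " @ "
            print get(item, "sport"), where sep get(item, "opponent"), "at", get(item, "time")
        }
    }' "$@"
}

# Example: list_events rss.xml
```

The comma-separated arguments to print are joined with a single space (awk's default output field separator), which is what produces the spacing in the desired lines.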

Lastly, I'd like to know how to exclude every item whose <date> does not match today's date. Thanks for your help.
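Filtering by date is also easiest per <item> rather than per line: keep an <item> only if its <date> equals a given day. This sketch prints each matching <item>...</item> block on a single line, so line-based scripts like the ones above can still run on its output. It assumes one <date> per <item>; the date format in the example call (date +%m/%d/%Y, e.g. 08/07/2012) is a guess, so change the format string to whatever the feed actually uses. filter_by_date is a hypothetical name.

```shell
#!/bin/bash
# filter_by_date DAY [FILE...]   (hypothetical helper name)
# Print, one per line, every <item>...</item> block whose <date> equals DAY.
# A <![CDATA[ ... ]]> wrapper around the date is removed before comparing.
# Reads standard input when no FILE is given.
filter_by_date() {
    local day=$1
    shift
    awk -v day="$day" '
    # get(s, tag): text of the first <tag>...</tag> inside s, without a CDATA wrapper
    function get(s, tag,    o, c, i, j, v) {
        o = "<" tag ">"
        c = "</" tag ">"
        if ((i = index(s, o)) == 0) return ""
        s = substr(s, i + length(o))
        if ((j = index(s, c)) == 0) return ""
        v = substr(s, 1, j - 1)
        if (substr(v, 1, 9) == "<![CDATA[" && substr(v, length(v) - 2) == "]]>")
            v = substr(v, 10, length(v) - 12)
        return v
    }
    { doc = doc $0 }    # join all lines so that an <item> may span several lines
    END {
        while ((i = index(doc, "<item>")) > 0) {
            doc = substr(doc, i)                # keep "<item>" itself in the output
            if ((j = index(doc, "</item>")) == 0) break
            item = substr(doc, 1, j + 6)        # up to and including "</item>"
            doc = substr(doc, j + 7)
            if (get(item, "date") == day) print item
        }
    }' "$@"
}

# Example (date format is an assumption): filter_by_date "$(date +%m/%d/%Y)" rss.xml
```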

Answer by dongne1560, 2012-08-07 22:03
Note that your sample XML is not valid; it needs a few tweaks.

Check this pastie instead of that pastie.

Then, using xmlstarlet, you can replace everything this script does.

$ wget --output-document - http://pastie.org/pastes/4408130/download | xmlstarlet sel -t -m rss/channel/item -v sport -o ' Away @ ' -v opponent -o ' at ' -v time -n

That outputs:

Women's Soccer Away @ South Carolina (Exhibition) at 7:00 PM
Men's Soccer Away @ Ohio State (Exhibition) at 7:00 PM
Women's Soccer Away @ William and Mary at 7:00 PM

Once the output is what you need, you can add -C to the xmlstarlet command to print the XSLT stylesheet it generates, which you can then reuse from any language that needs this particular parsing.
This answer was accepted by the asker as the best answer.
