dssnh86244 2017-06-15 11:25 Acceptance rate: 100%

encoding/xml Unmarshal on elements with a dynamic structure

I'm working with EPUBs in Go, and I need to fetch the cover image from the cover.xhtml file (or whichever file is named in the .opf file).

My problem is the dynamic structure of the elements in the cover.xhtml files.

Each EPUB has a different structure in its cover.xhtml file. For example:

<body>
    <figure id="cover-image">
        <img src="covers/9781449328030_lrg.jpg" alt="First Edition" />
    </figure>
</body>

Another EPUB's cover.xhtml file:

<body>
    <div>
        <img src="@public@vhost@g@gutenberg@html@files@54869@54869-h@images@cover.jpg" alt="Cover" />
    </div>
</body>

I need to fetch the img tag's src attribute from this file, but I haven't been able to.

Here is the part of my code that deals with unmarshalling the cover.xhtml file:

type CPSRCS struct {
    Src string `xml:"src,attr"`
}

type CPIMGS struct {
    Image CPSRCS `xml:"img"`
}

XMLContent, err = ioutil.ReadFile("./uploads/moby-dick/OPS/cover.xhtml")
CheckError(err)

coverFile := CPIMGS{}
err = xml.Unmarshal(XMLContent, &coverFile)
CheckError(err)
fmt.Println(coverFile)

The output is:

{{}}

The output I'm expecting is:

{{covers/9781449328030_lrg.jpg}}

Thanks in advance!

1 answer

  • doubianxian6557 2017-06-15 12:35

    This extracts the first img element from the file contents and then unmarshals only that element to read its src attribute. It assumes that you only ever need the first img element in the file and that the element is written as a self-closing tag ending in "/>".

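    XMLContent, err = ioutil.ReadFile("./uploads/moby-dick/OPS/cover.xhtml")
    CheckError(err)

    // Locate the first img element in the raw file contents.
    strContent := string(XMLContent)
    imgLoc := strings.Index(strContent, "<img")
    if imgLoc < 0 {
        // Without this check, the slice below would panic.
        log.Fatal("no img element found in cover file")
    }
    prefixRem := strContent[imgLoc:]
    // Find the end of the self-closing img tag.
    endImgLoc := strings.Index(prefixRem, "/>")
    if endImgLoc < 0 {
        log.Fatal("img element is not self-closing")
    }
    // Add 2 so the slice includes the closing "/>".
    trimmed := prefixRem[:endImgLoc+2]

    // Unmarshal only the img element; CPSRCS maps its src attribute.
    var coverFile CPSRCS
    err = xml.Unmarshal([]byte(trimmed), &coverFile)
    CheckError(err)
    fmt.Println(coverFile)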
    

    This prints {covers/9781449328030_lrg.jpg} for the first input file and {@public@vhost@g@gutenberg@html@files@54869@54869-h@images@cover.jpg} for the second input file you provided.

    This answer was selected by the asker as the best answer.