dt3999 2018-08-03 06:56
浏览 68
已采纳

如何解析任意长度的文件?

I have a text file that I'd like to parse with records like this:

===================
name: John Doe
Education: High School Diploma
Education: Bachelor's Degree
Education: Sun Java Certified Programmer
Age: 29
===================
name: Bob Bear
Education: High School Diploma
Age: 18
===================
name: Jane Doe
Education: High School Diploma
Education: Bachelor's Degree
Education: Master's Degree
Education: AWS Certified Solution Architect Professional
Age: 25

As you can see, the fields in such a text file are fixed, but some of them repeat an arbitrary number of times. The records are separated by a fixed length ==== delimiter.

How would I write parsing logic this this sort of problem? I am think of using switch as it reads the start of the line, but the logic to handle multiple repeating fields baffles me.

  • 写回答

2条回答 默认 最新

  • du229908 2018-08-03 18:52
    关注

    A good way to approach this sort of problem is to "divide and conquer". That is, divide the overall problem into smaller sub-problems which are easier to manage and then solve each them individually. If you've planned properly then when you've finished each of the sub-problems you should have solved the whole problem.

    Start by thinking about modeling. The document appears to contain a list of records, what should those records be called? What named fields should the records contain and what types should they have? How would you represent them idiomatically in go? For example, you might decide to call each record a Person with fields as such:

    type Person struct {
        Name        string
        Credentials []string
        Age         int
    }
    

    Next, think about what the interface (signature) of your parse function should look like. Should it emit an array of people? Should it use a visitor pattern and emit a person as soon as it's parsed? What constraints should drive the answer? Are memory or compute time constraints important? Does the user of the parser want any control over the parsing work such as canceling? Do they need metadata such as the total number of records contained in the document? Will the input always be from a file or a string, maybe from an HTTP request or a network socket? How will these choices drive your design?

    func ParsePeople(string) ([]Person, error) // ?
    func ParsePeople(io.Reader) ([]Person, error) // ?
    func ParsePeople(io.Reader, func visitor(Person) bool) error // ?
    

    Finally you can implement your parser to fulfill the interface that you've decided on. A straightforward approach here would be to read the input file line-by-line and take an action according to the contents of the line. For example (in pseudocode):

    forEach line = inputFile.line
      if line is a separator
        emit or store the last parsed person, if present
        create a new person to store parsed fields
      else if line is a data field
        parse the data
        update the person with the parsed data
      end
    end
    return the parsed records or final record, if emitting
    

    Each line of pseudocode above represents a sub-problem that should be easier to solve than the whole.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥20 matlab计算中误差
  • ¥15 对于相关问题的求解与代码
  • ¥15 ubuntu子系统密码忘记
  • ¥15 信号傅里叶变换在matlab上遇到的小问题请求帮助
  • ¥15 保护模式-系统加载-段寄存器
  • ¥15 电脑桌面设定一个区域禁止鼠标操作
  • ¥15 求NPF226060磁芯的详细资料
  • ¥15 使用R语言marginaleffects包进行边际效应图绘制
  • ¥20 usb设备兼容性问题
  • ¥15 错误(10048): “调用exui内部功能”库命令的参数“参数4”不能接受空数据。怎么解决啊