doushaju4901 2013-02-26 07:02

Filtering a table with regular expressions

OK, so I have a table that is output by some open-source software, but it isn't output in an actual HTML table format, e.g.:

<table>
  <thead>
    <tr>
      <td>Heading</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Content</td>
    </tr>
  </tbody>
</table>

Instead, the developers of the software decided it would be a good idea to output the table like this:

+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1  | HEADING 2   | ETC   | ANOTHER     | HEADING3   | HEADING4     | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS        AGENTS:21  |  total|        total|       total|         total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+

So I can't build a web scraper to get the data, or at least I'm not sure I could, since it's all wrapped inside a single <pre></pre> tag. Instead I have been trying to use Ruby and regex to get the job done. So far I have managed to strip out all the leading |'s and to match the +-------+----- border, but only that far: it seems I have to repeat the pattern by hand for every column, because it won't repeat itself. Here is the code I have used so far:

text.lines.each do |line|
  line.sub(/^\| |^\+*-*\+*\-*/) do |match|
    puts "Regexp Match: " << match
  end
  STDIN.getc
  puts "New Line " << line
end

For example, the match for the first line is only +-----------------+----------. The final result has to be in CSV format, so I'll use gsub to replace the remaining |'s with commas.
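A minimal sketch of the regex-plus-CSV approach described above, assuming the contents of the <pre> block are already in a Ruby String and that no cell contains a literal |. The key change from the code above is the quantified group (?:-+\+)+, which repeats once per column, so the border pattern no longer has to be written out column by column. The method name ascii_table_to_csv is hypothetical, and the sketch swaps gsub for Ruby's standard csv library so that cells containing commas are quoted correctly.

```ruby
require 'csv'

# Matches a whole border line such as "+-----+---+": the group (?:-+\+)
# is quantified with +, so it repeats once per column instead of being
# written out by hand for each column.
BORDER = /\A\+(?:-+\+)+\z/

# Hypothetical helper: converts the text of the <pre> block (assumed to be
# a String, with no literal | inside any cell) into CSV text.
def ascii_table_to_csv(text)
  CSV.generate do |csv|
    text.each_line do |line|
      line = line.strip
      next if line.empty? || line.match?(BORDER) # skip blank and border lines

      # Drop the outer pipes (the trailing one may be missing), then split
      # the rest into trimmed cells; the -1 limit keeps empty cells.
      cells = line.sub(/\A\|/, '').sub(/\|\z/, '').split('|', -1).map(&:strip)
      csv << cells
    end
  end
end
```

With the sample table, `puts ascii_table_to_csv(text)` starts with the line HEADING 1,HEADING 2,ETC,ANOTHER,HEADING3,HEADING4,SML. The TOTALS row yields only six fields, because its first cell spans two columns in the original output.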

I can use either PHP or Ruby, so any answer is welcome.


4 answers

  • douqun1977 2013-02-26 09:41

    Here's a complete solution in Ruby. You need to manually add a | to the end of the last line, though.

    require 'builder' # the Builder gem (gem install builder)
    
    table = '+------------+-------------+-------+-------------+------------+---------------+----------+
    | HEADING 1  | HEADING 2   | ETC   | ANOTHER     | HEADING3   | HEADING4     | SML |
    +------------+-------------+-------+-------------+------------+---------------+----------+
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    +------------+-------------+-------+-------------+------------+--------------+----------+
    | TOTALS        AGENTS:21  |  total|        total|       total|         total| total|
    +------------+-------------+-------+-------------+------------+--------------+----------+';
    
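    # Turns the ASCII table into an array of rows, each an array of cell strings.
    def parse_table(table)
      rows = []
      table.each_line do |line|
        next if line.start_with?('+') # skip +----+----+ border lines
        # Split on | plus surrounding whitespace (including the trailing
        # newline) and drop the empty strings left by the outer pipes.
        rows << line.split(/\s*\|\s*/).reject(&:empty?)
      end
      rows
    end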
    
    def html_row(xml, columns)
      xml.tr do
        columns.each do |column|
          xml.td column
        end
      end
    end
    
    def html_table(rows)
      head_row = rows.first
      body_rows = rows[1..-1]
    
      xml = Builder::XmlMarkup.new :indent => 2
      xml.table do
        xml.thead do
          html_row xml, head_row
        end
        xml.tbody do
          body_rows.each do |body_row|
            html_row xml, body_row
          end
        end
      end.to_s
    end
    
    
    rows = parse_table(table)
    html = html_table(rows)
    puts html
    

    Output:

    <table>
      <thead>
        <tr>
          <td>HEADING 1</td>
          <td>HEADING 2</td>
          <td>ETC</td>
          <td>ANOTHER</td>
          <td>HEADING3</td>
          <td>HEADING4</td>
          <td>SML</td>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>TOTALS        AGENTS:21</td>
          <td>total</td>
          <td>total</td>
          <td>total</td>
          <td>total</td>
          <td>total</td>
        </tr>
      </tbody>
    </table>
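If a line is missing its trailing |, the split in parse_table above leaves the newline glued to the last cell, which is why the answer says to add the pipe by hand; reject(&:empty?) also drops genuinely empty cells, which shifts the columns. A possible drop-in variant (a sketch, not the answerer's code) strips each line first and splits on | with a negative limit, so the trailing pipe becomes optional and empty cells stay in place. It assumes Ruby 2.7+ for filter_map and that no cell contains a literal |.

```ruby
# Hypothetical drop-in replacement for parse_table: same input (the ASCII
# table as a String), same output (an Array of rows, each an Array of cells).
def parse_table(table)
  table.each_line.filter_map do |line|
    line = line.strip                             # also removes the trailing "\n"
    next if line.empty? || line.start_with?('+')  # skip blank and border lines

    # Remove the outer pipes (the trailing one may be absent), then split.
    # The -1 limit keeps empty cells so the columns stay aligned.
    line.delete_prefix('|').delete_suffix('|').split('|', -1).map(&:strip)
  end
end
```

Because html_table only needs an array of string arrays, this version can replace the original parse_table without other changes.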
    
    The asker selected this answer as the best answer.