doushaju4901 2013-02-26 07:02

Using regular expressions to filter a table

OK, so I have a table that gets output by some open-source software, but it is not output as an actual HTML table, e.g.:

<table>
  <thead>
    <tr>
      <td>Heading</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>Content</td>
    </tr>
  </tbody>
</table>

Instead, the developers decided it would be a good idea to output the table like this:

+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1  | HEADING 2   | ETC   | ANOTHER     | HEADING3   | HEADING4     | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS        AGENTS:21  |  total|        total|       total|         total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+

So I can't build a web scraper to get the data, or at least I'm not sure I could, since it's all wrapped inside a single <pre></pre> tag. Instead, I've been trying to use Ruby and regex to get the job done. So far I've managed to strip out all the leading | characters and to match the start of the heading border (+-------+-----), but only that far: it seems I have to repeat the pattern for every column, and it won't repeat itself. Enough talking for now; here is the code I have used so far:

text.lines.to_a.each do |line|
  line.sub(/^\| |^\+*-*\+*\-*/) do |match|
    puts "Regexp Match: " << match
  end
  STDIN.getc
  puts "New Line " << line
end

For example, the output for the first line is only +-----------------+----------. The final result has to be in CSV format, so I'll use gsub to replace the remaining | characters with commas.
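A likely reason the match stops early: in `^\+*-*\+*\-*` each `\+*` and `-*` piece is written out only once, so on a border line it can cover at most the first two runs of dashes. Grouping one column's border unit and quantifying the group lets the regex repeat for any number of columns. The sketch below compares the two; the constant names `ORIGINAL` and `BORDER` are illustrative only. Separately, `String#sub` returns a new string and leaves `line` unchanged, so the result has to be assigned (or `sub!` used) for a replacement to stick.

```ruby
# The question's pattern: each "\+*" / "-*" piece appears only once,
# so on a border line it stops after the second run of dashes.
ORIGINAL = /^\| |^\+*-*\+*\-*/

# Illustrative fix: "-+\+" is one column's dashes plus the "+" that
# closes it; "(?:...)+" repeats that unit for every column.
BORDER = /\A\+(?:-+\+)+\s*\z/

border_line = "+------+-----+----+\n"
data_line   = "| a    | b   | c  |\n"

p border_line[ORIGINAL]       # => "+------+-----"
p BORDER.match?(border_line)  # => true
p BORDER.match?(data_line)    # => false
```

With a check like `next if BORDER.match?(line)`, whole border rows can be skipped before the | cells are processed.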

I can use PHP or Ruby, so any answer is more than welcome.
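Replacing the remaining | characters with commas via `gsub` works until a cell itself contains a comma or a double quote. One alternative is to split each row into cells and let Ruby's standard `CSV` library handle the quoting. A minimal sketch, assuming border lines start with `+` and data rows start with `|`; `ascii_table_to_csv` is a hypothetical name, not part of any library.

```ruby
require 'csv'

# Hypothetical helper: converts a +---+ / | ... | text table to CSV.
# Assumes border lines start with "+" and data rows start with "|".
def ascii_table_to_csv(text)
  CSV.generate do |csv|
    text.each_line do |line|
      line = line.strip
      # Skip border lines and blank lines.
      next unless line.start_with?('|')
      # Drop the outer pipes, split on the inner ones, trim each cell.
      inner = line.delete_prefix('|').delete_suffix('|')
      csv << inner.split('|', -1).map(&:strip)
    end
  end
end

sample = <<~TABLE
  +------+---------+
  | NAME | NOTE    |
  +------+---------+
  | a    | x, y    |
  +------+---------+
TABLE

puts ascii_table_to_csv(sample)
# prints:
# NAME,NOTE
# a,"x, y"
```

Because `CSV` quotes the cell containing a comma, the row still parses back into two fields, which a plain `gsub('|', ',')` would not guarantee.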


  • douqun1977 2013-02-26 09:41

    Here's a complete solution in Ruby. You need to manually add a | to the end of the last line, though.

    require 'builder'  # from the builder gem: gem install builder
    
    table = '+------------+-------------+-------+-------------+------------+---------------+----------+
    | HEADING 1  | HEADING 2   | ETC   | ANOTHER     | HEADING3   | HEADING4     | SML |
    +------------+-------------+-------+-------------+------------+---------------+----------+
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    | content   | more content | cont  | More more   | content    | content 2.0  | litl |
    +------------+-------------+-------+-------------+------------+--------------+----------+
    | TOTALS        AGENTS:21  |  total|        total|       total|         total| total|
    +------------+-------------+-------+-------------+------------+--------------+----------+'
    
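    # Split the text table into an array of rows, each an array of cell strings.
    def parse_table(table)
      rows = []
      table.each_line do |line|
        next if line.start_with?('+')  # skip the +-----+ border lines
        # Split on "|" plus surrounding whitespace; drop the empty leading field.
        rows << line.split(/\s*\|\s*/).reject(&:empty?)
      end
      rows
    end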
    
    # Emit one <tr> containing a <td> for each column value.
    def html_row(xml, columns)
      xml.tr do
        columns.each do |column|
          xml.td column
        end
      end
    end
    
    # Render the first row inside <thead> and the remaining rows inside <tbody>.
    def html_table(rows)
      head_row = rows.first
      body_rows = rows[1..-1]
    
      xml = Builder::XmlMarkup.new :indent => 2
      xml.table do
        xml.thead do
          html_row xml, head_row
        end
        xml.tbody do
          body_rows.each do |body_row|
            html_row xml, body_row
          end
        end
      end.to_s
    end
    
    
    rows = parse_table(table)
    html = html_table(rows)
    puts html
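The manual | is needed because `parse_table` splits the raw line, so a row without a closing | keeps its trailing newline in the last cell. Stripping each line before splitting avoids that. A sketch of such a variant; `parse_table_lenient` is a hypothetical name, and like the original it also drops genuinely empty cells.

```ruby
# Hypothetical variant of parse_table: strips each line before
# splitting, so a last row without a closing "|" still parses cleanly.
def parse_table_lenient(table)
  lines = table.each_line.map(&:strip)
  # Drop blank lines and the +----+ border lines.
  rows = lines.reject { |line| line.empty? || line.start_with?('+') }
  # Split each row on "|" plus surrounding spaces; drop empty fields.
  rows.map { |line| line.split(/\s*\|\s*/).reject(&:empty?) }
end

p parse_table_lenient("+---+---+\n| a | b |\n| c | d\n+---+---+")
# => [["a", "b"], ["c", "d"]]
```

Swapping it in for `parse_table` in `rows = parse_table(table)` should leave the rest of the answer unchanged.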
    

    Output:

    <table>
      <thead>
        <tr>
          <td>HEADING 1</td>
          <td>HEADING 2</td>
          <td>ETC</td>
          <td>ANOTHER</td>
          <td>HEADING3</td>
          <td>HEADING4</td>
          <td>SML</td>
        </tr>
      </thead>
      <tbody>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>content</td>
          <td>more content</td>
          <td>cont</td>
          <td>More more</td>
          <td>content</td>
          <td>content 2.0</td>
          <td>litl</td>
        </tr>
        <tr>
          <td>TOTALS        AGENTS:21</td>
          <td>total</td>
          <td>total</td>
          <td>total</td>
          <td>total</td>
          <td>total</td>
        </tr>
      </tbody>
    </table>
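The TOTALS row produces six cells instead of seven because `TOTALS        AGENTS:21` spans the first two columns of the text table. One option is to give the first cell a `colspan` covering the missing columns. A sketch without Builder, using the standard `ERB::Util.html_escape` for escaping; `html_row_with_span` is a hypothetical helper, and it assumes the merged columns always sit at the start of the row, which holds for this TOTALS row but may not in general.

```ruby
require 'erb'

# Hypothetical helper: renders one <tr>. If the row has fewer cells than
# the header, the first cell gets a colspan covering the missing columns
# (assumes merged columns sit at the start of the row, as TOTALS does).
def html_row_with_span(cells, column_count)
  span = column_count - cells.size + 1
  tds = cells.each_with_index.map do |cell, i|
    attr = (i.zero? && span > 1) ? %( colspan="#{span}") : ''
    "<td#{attr}>#{ERB::Util.html_escape(cell)}</td>"
  end
  "<tr>#{tds.join}</tr>"
end

puts html_row_with_span(['TOTALS AGENTS:21', 'total', 'total'], 4)
# prints: <tr><td colspan="2">TOTALS AGENTS:21</td><td>total</td><td>total</td></tr>
```

In the answer's Builder code the same attribute can be passed as a hash, e.g. `xml.td column, :colspan => span`.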
    
    The asker selected this answer as the best answer.