Filtering a table with a regular expression

I have a table that is output by some open source software, but it is not output in an actual table format, e.g.

<table>
  <thead>
     <td>Heading</td>
  </thead>
  <tbody>
    <tr>
       <td>Content</td>
    </tr>
  </tbody>
</table>

Instead, the people who developed the software decided it would be a good idea to output the table like this:

+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1  | HEADING 2   | ETC   | ANOTHER     | HEADING3   | HEADING4     | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS        AGENTS:21  |  total|        total|       total|         total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+

So I can't build a web scraper to get the data, or rather, I'm not sure whether I could scrape it, since it is all wrapped inside a single <pre> </pre> tag.

Instead I have been trying to use Ruby and regex to get the job done. So far I have managed to strip the leading |'s, and I can match the start of the border line (+-------+-----), but only that far: it seems I have to repeat the pattern by hand for every column, because it does not repeat itself. Here is the code I have used so far:

text.lines.to_a.each do |line|
  line.sub(/^\| |^\+*-*\+*\-*/) do |match|
    puts "Regexp Match: " << match
  end
  STDIN.getc
  puts "New Line " << line
end

For example, the match for the first line would only be +-----------------+----------. The result has to be in CSV format, so I'll use gsub to replace the remaining |'s with ,'s.

I can use PHP or Ruby, so any answer is more than welcome.


4 answers

douqun1977 2013-02-26 09:41

Here's a complete solution in Ruby. You need to manually add a | to the last line, though.

require 'builder'

table = '+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1  | HEADING 2   | ETC   | ANOTHER     | HEADING3   | HEADING4     | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
| content   | more content | cont  | More more   | content    | content 2.0  | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS        AGENTS:21  |  total|        total|       total|         total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+';

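def parse_table(table)
  rows = []
  table.each_line do |line|
    # Skip the +-----+ border lines.
    next if line.match(/^\+/)
    # Split on pipes (trimming surrounding whitespace) and drop the empty
    # leading field produced by the first pipe.
    rows << line.split(/\s*\|\s*/).reject(&:empty?)
  end
  rows
end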

def html_row(xml, columns)
  xml.tr do
    columns.each do |column|
      xml.td column
    end
  end
end

def html_table(rows)
  head_row = rows.first
  body_rows = rows[1..-1]

  xml = Builder::XmlMarkup.new :indent => 2
  xml.table do
    xml.thead do
      html_row xml, head_row
    end
    xml.tbody do
      body_rows.each do |body_row|
        html_row xml, body_row
      end
    end
  end.to_s
end


rows = parse_table(table)
html = html_table(rows)
puts html

Output:

<table>
  <thead>
    <tr>
      <td>HEADING 1</td>
      <td>HEADING 2</td>
      <td>ETC</td>
      <td>ANOTHER</td>
      <td>HEADING3</td>
      <td>HEADING4</td>
      <td>SML</td>
    </tr>
  </thead>
  <tbody>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>content</td>
      <td>more content</td>
      <td>cont</td>
      <td>More more</td>
      <td>content</td>
      <td>content 2.0</td>
      <td>litl</td>
    </tr>
    <tr>
      <td>TOTALS        AGENTS:21</td>
      <td>total</td>
      <td>total</td>
      <td>total</td>
      <td>total</td>
      <td>total</td>
    </tr>
  </tbody>
</table>
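
Since the question asks for CSV rather than HTML, the same line-by-line idea can be pointed at Ruby's standard csv library instead of builder. The following is only a minimal sketch under some assumptions: the helper names ascii_table_rows and ascii_table_to_csv and the two-column sample are made up for illustration, and every non-border line is assumed to start and end with a |. Using CSV.generate rather than a plain gsub of | to , means that a cell which itself contains a comma gets quoted.

```ruby
require 'csv'

# Hypothetical helper (not part of the answer above): returns the table as an
# array of rows, each row being an array of trimmed cell strings.
def ascii_table_rows(text)
  lines = text.each_line.map(&:strip)
  # Ignore blank lines and the +-----+ border lines.
  data_lines = lines.reject { |line| line.empty? || line.start_with?('+') }
  data_lines.map do |line|
    # Remove the outer pipes, then split on the inner ones.
    inner = line.sub(/\A\|/, '').sub(/\|\z/, '')
    inner.split('|', -1).map(&:strip)
  end
end

# Hypothetical helper: serialises the rows with the standard CSV library.
def ascii_table_to_csv(text)
  CSV.generate do |csv|
    ascii_table_rows(text).each { |row| csv << row }
  end
end

# Made-up sample in the same layout as the table in the question.
sample = <<~TABLE
  +-----------+-----------+
  | HEADING 1 | HEADING 2 |
  +-----------+-----------+
  | content   | a, b      |
  +-----------+-----------+
TABLE

puts ascii_table_to_csv(sample)
```

For the sample this should print the heading row followed by a row in which "a, b" is quoted. A merged cell such as the TOTALS row in the question would still come out as a single field, so the totals row would have fewer columns than the others.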

The asker selected this answer as the best answer.
