I have a table that is output by some open source software, but it is not output in an actual HTML table format, e.g.
<table>
<thead>
<tr>
<td>Heading</td>
</tr>
</thead>
<tbody>
<tr>
<td>Content</td>
</tr>
</tbody>
</table>
Instead, the people who developed the software decided it would be a good idea to output the table like so:
+------------+-------------+-------+-------------+------------+---------------+----------+
| HEADING 1 | HEADING 2 | ETC | ANOTHER | HEADING3 | HEADING4 | SML |
+------------+-------------+-------+-------------+------------+---------------+----------+
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
| content | more content | cont | More more | content | content 2.0 | litl |
+------------+-------------+-------+-------------+------------+--------------+----------+
| TOTALS AGENTS:21 | total| total| total| total| total|
+------------+-------------+-------+-------------+------------+--------------+----------+
So I can't build a normal web scraper to get the data, or at least I'm not sure I could, since the whole table is wrapped inside a single <pre> </pre> tag. Instead I have been trying to use Ruby and regex to get the job done. So far I have managed to strip the leading |'s out, and I can also match the start of the border line, +-------+-----, but only that far: the pattern matches once and does not repeat along the rest of the line, so it seems I would have to write the pattern out again for every column. Here is the code I have used so far:
text.lines.to_a.each do |line|
  line.sub(/^\| |^\+*-*\+*\-*/) do |match|
    puts "Regexp Match: " << match
  end
  STDIN.getc
  puts "New Line " << line
end
For example, the match for the first line is only +-----------------+----------
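As far as I can tell, part of the reason is that String#sub only ever handles the first match, and the ^ anchor pins that match to the start of the line. Here is a minimal sketch of the difference, assuming an unanchored pattern and a shortened border line (the `border` string and variable names are made up for illustration):

```ruby
# Hypothetical sample: a shortened border row in the same style as the table above.
border = "+------------+-------------+-------+"

# String#sub stops after the first match, so this block runs exactly once.
first_only = []
border.sub(/\+-+/) do |match|
  first_only << match
  match # return the match unchanged so the line itself is not altered
end

# String#scan walks the whole string and returns every non-overlapping match.
all_matches = border.scan(/\+-+/)

puts "sub saw:  #{first_only.inspect}"
puts "scan saw: #{all_matches.inspect}"
```

With an unanchored pattern, gsub would also visit every segment, so writing the pattern out once per column should not be necessary.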
The result has to be in CSV format, so I'll use gsub to replace the remaining |'s with ,'s.
I can use PHP or Ruby, so any answer is more than welcome.
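To make the goal concrete, here is a rough sketch of the conversion I am after. It avoids one big regex: it skips the border rows, splits each remaining row on |, and lets Ruby's standard CSV library handle quoting. The sample `text` is a shortened copy of the table, and `table_rows` is a hypothetical helper name, not something from the software:

```ruby
require "csv"

# Hypothetical sample: a shortened copy of the table from the <pre> block.
text = <<~TABLE
  +-----------+--------------+------+
  | HEADING 1 | HEADING 2    | ETC  |
  +-----------+--------------+------+
  | content   | more content | cont |
  | content   | more content | cont |
  +-----------+--------------+------+
  | TOTALS AGENTS:21 | total| total|
  +-----------+--------------+------+
TABLE

# Turn an ASCII-art table into an array of rows, each row an array of cells.
def table_rows(text)
  lines = text.each_line.map(&:strip)
  # Border rows start with "+"; skip them and any blank lines.
  data = lines.reject { |line| line.empty? || line.start_with?("+") }
  data.map do |line|
    # Drop the outer pipes, split on the inner ones, and trim the padding.
    line.sub(/\A\|/, "").sub(/\|\z/, "").split("|", -1).map(&:strip)
  end
end

rows = table_rows(text)

# CSV.generate quotes any cell that happens to contain a comma or a quote.
csv = CSV.generate { |out| rows.each { |row| out << row } }
puts csv
```

One caveat with this approach: in the full table the totals row has fewer cells than the data rows, so it ends up as a shorter CSV line unless it is padded or handled separately.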