csdnceshi62
2011-02-22 16:34

How to extract a string following a pattern with grep, regex or Perl


I have a file that looks something like this:

<table name="content_analyzer" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer2" primary-key="id">
  <type="global" />
</table>
<table name="content_analyzer_items" primary-key="id">
  <type="global" />
</table>

I need to extract anything within the quotes that follow "name=", i.e., content_analyzer , content_analyzer2 and content_analyzer_items.

I am doing this on a Linux box, so a solution using sed, perl, grep or bash is fine.

Source: https://stackoverflow.com/questions/5080988/how-to-extract-string-following-a-pattern-with-grep-regex-or-perl


8 answers

  • csdnceshi66 必承其重 | 欲带皇冠 10 years ago

    Because name=" must be matched but is not part of the desired result, some form of zero-width matching or group capturing is required. This can be done easily with the following tools:

    Perl

    With Perl you could use the -n option to loop over the input line by line and print the content of a capturing group whenever the pattern matches:

    perl -ne 'print "$1\n" if /name="(.*?)"/' filename
    

    GNU grep

    If you have an improved version of grep, such as GNU grep, you may have the -P option available. This option enables Perl-compatible regular expressions, which let you use \K as a shorthand for a lookbehind. It resets the start of the reported match, so anything matched before it is left out of the output.

    grep -Po 'name="\K.*?(?=")' filename
    

    The -o option makes grep print only the matched text, instead of the whole line.
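
Since -P is not available in every grep build, a script may want to probe for it first. The sketch below is one possible way to do that: it uses an explicit lookbehind, (?<=name="), in place of \K, and falls back to sed when the probe fails. The function name extract_names and the fallback choice are assumptions made for illustration.

```shell
# Use grep -P when this grep supports it, otherwise fall back to sed.
extract_names() {
    if printf 'x\n' | grep -qP 'x' 2>/dev/null; then
        # (?<=name=") is a zero-width lookbehind, so only the value is printed.
        grep -Po '(?<=name=")[^"]*' "$@"
    else
        # Fallback: prints one value per line (the last one, if a line has several).
        sed -n 's/.*name="\([^"]*\)".*/\1/p' "$@"
    fi
}
```

Called as extract_names filename, both branches give the same result for input like the sample in the question, where each line carries at most one name attribute.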

    Vim - Text Editor

    Another way is to use a text editor directly. With Vim, one of the various ways of accomplishing this would be to delete lines without name= and then extract the content from the resulting lines:

    :v/name=/d
    :%s/\v.*name\="([^"]+)".*/\1
    

    Standard grep

    If you don't have access to these tools for some reason, something similar can be achieved with standard grep. However, without lookaround each result still includes the surrounding name="..." text, so it requires some cleanup afterwards:

    grep -o 'name="[^"]*"' filename
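
The cleanup step mentioned above could be done in several ways. One minimal sketch pipes the grep output through sed to strip the leading name=" and the trailing quote. The function name strip_name_wrapper is hypothetical.

```shell
# grep -o emits name="value"; sed then removes the wrapper around the value.
strip_name_wrapper() {
    grep -o 'name="[^"]*"' "$@" | sed -e 's/^name="//' -e 's/"$//'
}
```

Usage would be strip_name_wrapper filename, or the function can read from standard input.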
    

    A note about saving results

    All of the commands above send their results to stdout. Remember that you can always save the results by redirecting them to a file, appending:

    > result
    

    to the end of the command.
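
As a small illustration of the redirection, the sketch below wraps the grep and cut pipeline from a later answer in two hypothetical helpers, save_names and append_names. The only difference between them is > versus >>.

```shell
# save_names INPUT OUTPUT: '>' truncates OUTPUT before writing.
save_names() {
    grep 'name=' "$1" | cut -d '"' -f 2 > "$2"
}

# append_names INPUT OUTPUT: '>>' keeps existing content and adds to it.
append_names() {
    grep 'name=' "$1" | cut -d '"' -f 2 >> "$2"
}
```

Running save_names twice leaves a single copy of the names in the output file, while append_names adds another copy on every run.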

    27 likes
  • csdnceshi71 Memor.の 10 years ago

    Here's a solution using HTML tidy & xmlstarlet:

    htmlstr='
    <table name="content_analyzer" primary-key="id">
    <type="global" />
    </table>
    <table name="content_analyzer2" primary-key="id">
    <type="global" />
    </table>
    <table name="content_analyzer_items" primary-key="id">
    <type="global" />
    </table>
    '
    
    echo "$htmlstr" | tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
    sed '/type="global"/d' |
    xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:table" -v '@name' -n
    
    8 likes
  • csdnceshi62 csdnceshi62 10 years ago

    The regular expression would be:

    .+name="([^"]+)"
    

    The captured name is then available as \1.
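
This answer does not say which tool the expression is meant for. As one possible reading, it can be dropped into a sed substitution with extended regular expressions (-E, supported by GNU and BSD sed). The function name names_via_capture is hypothetical, and the trailing .* is added here so that the rest of the line is discarded.

```shell
# -n suppresses default output; the p flag prints only lines where the
# substitution succeeded, replaced by the captured group \1.
names_via_capture() {
    sed -nE 's/.+name="([^"]+)".*/\1/p' "$@"
}
```

Note that the leading .+ needs at least one character before name=, which holds for the sample input because every match is preceded by <table and a space.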

    7 likes
  • weixin_41568110 七度&光 10 years ago

    Oops, the sed command has to precede the tidy command of course:

    echo "$htmlstr" | 
    sed '/type="global"/d' |
    tidy -q -c -wrap 0 -numeric -asxml -utf8 --merge-divs yes --merge-spans yes 2>/dev/null |
    xmlstarlet sel -N x="http://www.w3.org/1999/xhtml" -T -t -m "//x:table" -v '@name' -n
    
    5 likes
  • csdnceshi68 local-host 10 years ago

    This could do it:

    perl -ne 'if(m/name="(.*?)"/){ print $1 . "\n"; }'
    
    5 likes
  • csdnceshi78 程序go 10 years ago

    An HTML parser should be used for this purpose rather than regular expressions. A Perl program that makes use of HTML::TreeBuilder:

    Program

    #!/usr/bin/env perl
    
    use strict;
    use warnings;
    
    use HTML::TreeBuilder;
    
    my $tree = HTML::TreeBuilder->new_from_file( \*DATA );
    my @elements = $tree->look_down(
        sub { defined $_[0]->attr('name') }
    );
    
    for (@elements) {
        print $_->attr('name'), "\n";
    }
    
    __DATA__
    <table name="content_analyzer" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer2" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer_items" primary-key="id">
      <type="global" />
    </table>
    

    Output

    content_analyzer
    content_analyzer2
    content_analyzer_items
    
    3 likes
  • csdnceshi78 程序go 10 years ago

    If you're using Perl, download a module to parse the XML: XML::Simple, XML::Twig, or XML::LibXML. Don't re-invent the wheel.

    3 likes
  • csdnceshi56 lrony* 4 years ago

    If the structure of your XML (or text in general) is fixed, the easiest way is to use cut. For your specific case:

    echo '<table name="content_analyzer" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer2" primary-key="id">
      <type="global" />
    </table>
    <table name="content_analyzer_items" primary-key="id">
      <type="global" />
    </table>' | grep name= | cut -f2 -d '"'
    