weixin_41568183
零零乙
2009-06-05 00:04
Acceptance rate: 50%
572 views

In Perl, how can I read an entire file into a string?

I'm trying to open an .html file as one big long string. This is what I've got:

open(FILE, 'index.html') or die "Can't read file 'filename' [$!]\n";  
$document = <FILE>; 
close (FILE);  
print $document;

which results in:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN

However, I want the result to look like:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"
"http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">

This way I can search the entire document more easily.

Reposted from: https://stackoverflow.com/questions/953707/in-perl-how-can-i-read-an-entire-file-into-a-string


17 answers (sorted by: default | newest)

  • csdnceshi71
    Memor.の 2009-06-05 00:18
    Accepted

    Add:

     local $/;
    

    before reading from the file handle. See How can I read in an entire file all at once?, or

    $ perldoc -q "entire file"

    See Variables related to filehandles in perldoc perlvar and perldoc -f local.
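
    Put together with the question's own code, the accepted fix can be sketched as follows (the sample file is created inline so the snippet is self-contained; the question reads an existing index.html instead):

    ```perl
    use strict;
    use warnings;

    # Create a small sample file so the sketch runs on its own
    # (the question reads an existing index.html instead).
    open(my $out, '>', 'sample.html') or die "Can't write sample.html [$!]";
    print $out "<html>\n<head>\n</head>\n</html>\n";
    close($out);

    open(my $fh, '<', 'sample.html') or die "Can't read sample.html [$!]";
    my $document;
    {
        local $/;           # undef the input record separator in this block only
        $document = <$fh>;  # <$fh> now returns the whole file, not just one line
    }
    close($fh);

    print length($document), "\n";  # prints 30: the whole file was read
    ```

    Note the bare `local $/;` inside a block: $/ stays undefined only until the block ends, so code elsewhere still sees the normal line-by-line behaviour.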

    Incidentally, if you can put your script on the server, you can have all the modules you want. See How do I keep my own module/library directory?.

    In addition, Path::Class::File allows you to slurp and spew.

    Path::Tiny gives even more convenience methods such as slurp, slurp_raw, slurp_utf8 as well as their spew counterparts.
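
    If Path::Tiny is installed from CPAN (it is not a core module), the spew/slurp pair looks like this; the filename is illustrative:

    ```perl
    use Path::Tiny;  # CPAN module, not in core: install with `cpan Path::Tiny`

    my $file = path('demo.txt');
    $file->spew_utf8("first line\nsecond line\n");  # write the whole file at once
    my $text = $file->slurp_utf8;                   # read the whole file back as one string
    ```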

  • weixin_41568110
    七度&光 2009-06-05 00:12

    A simple way is:

    while (<FILE>) { $document .= $_ }
    

    Another way is to change the input record separator "$/". You can do it locally in a bare block to avoid changing the global record separator.

    {
        open(F, "filename");
        local $/ = undef;
        $d = <F>;
    }
    
  • csdnceshi67
    bug^君 2009-06-05 01:28

    I would do it like this:

    my $file = "index.html";
    my $document = do {
        local $/ = undef;
        open my $fh, "<", $file
            or die "could not open $file: $!";
        <$fh>;
    };
    

    Note the use of the three-argument version of open. It is much safer than the old two- (or one-) argument versions. Also note the use of a lexical filehandle. Lexical filehandles are nicer than the old bareword variants, for many reasons. We are taking advantage of one of them here: they close when they go out of scope.

  • csdnceshi65
    larry*wei 2009-06-05 03:20

    All the posts are slightly non-idiomatic. The idiom is:

    open my $fh, '<', $filename or die "error opening $filename: $!";
    my $data = do { local $/; <$fh> };
    

    Mostly, there is no need to set $/ to undef explicitly: a bare local $/; already leaves it undefined for the enclosing scope.

  • csdnceshi75
    衫裤跑路 2009-06-05 07:24

    Either set $/ to undef (see jrockway's answer) or just concatenate all the file's lines:

    $content = join('', <$fh>);
    

    It's recommended to use scalars for filehandles on any Perl version that supports it.

  • weixin_41568134
    MAO-EYE 2009-06-05 08:55

    With File::Slurp:

    use File::Slurp;
    my $text = read_file('index.html');
    

    Yes, even you can use CPAN.

  • csdnceshi78
    程序go 2009-06-05 17:06

    From perlfaq5: How can I read in an entire file all at once?:


    You can use the File::Slurp module to do it in one step.

    use File::Slurp;
    
    $all_of_it = read_file($filename); # entire file in scalar
    @all_lines = read_file($filename); # one line per element
    

    The customary Perl approach for processing all the lines in a file is to do so one line at a time:

    open (INPUT, $file)     || die "can't open $file: $!";
    while (<INPUT>) {
        chomp;
        # do something with $_
    }
    close(INPUT)            || die "can't close $file: $!";
    

    This is tremendously more efficient than reading the entire file into memory as an array of lines and then processing it one element at a time, which is often--if not almost always--the wrong approach. Whenever you see someone do this:

    @lines = <INPUT>;
    

    you should think long and hard about why you need everything loaded at once. It's just not a scalable solution. You might also find it more fun to use the standard Tie::File module, or the DB_File module's $DB_RECNO bindings, which allow you to tie an array to a file so that accessing an element of the array actually accesses the corresponding line in the file.

    You can read the entire filehandle contents into a scalar.

    {
        local(*INPUT, $/);
        open (INPUT, $file) || die "can't open $file: $!";
        $var = <INPUT>;
    }
    

    That temporarily undefs your record separator, and will automatically close the file at block exit. If the file is already open, just use this:

    $var = do { local $/; <INPUT> };
    

    For ordinary files you can also use the read function.

    read( INPUT, $var, -s INPUT );
    

    The third argument tests the byte size of the data on the INPUT filehandle and reads that many bytes into the buffer $var.
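
    A complete, error-checked version of that read/-s approach might look like the following sketch (the data file is created inline so the snippet is self-contained):

    ```perl
    use strict;
    use warnings;

    # Create a 100-byte file to read (illustrative only).
    open(my $out, '>', 'data.txt') or die "can't write data.txt: $!";
    print $out 'x' x 100;
    close($out);

    open(my $in, '<', 'data.txt') or die "can't open data.txt: $!";
    binmode($in);              # read raw bytes so -s and read() agree
    my $size = -s $in;         # byte size of the open filehandle
    read($in, my $buf, $size) == $size
        or die "short read on data.txt: $!";
    close($in);

    print "$size\n";  # prints 100
    ```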

  • weixin_41568174
    from.. 2009-06-05 17:18

    You're only getting the first line from the diamond operator <FILE> because you're evaluating it in scalar context:

    $document = <FILE>; 
    

    In list/array context, the diamond operator will return all the lines of the file.

    @lines = <FILE>;
    print @lines;
    
  • csdnceshi80
    胖鸭 2012-02-20 10:48

    This is more of a suggestion on how NOT to do it. I've just had a bad time finding a bug in a rather big Perl application. Most of the modules had their own configuration files. To read a configuration file as a whole, I found this single line of Perl somewhere on the Internet:

    # Bad! Don't do that!
    my $content = do{local(@ARGV,$/)=$filename;<>};
    

    It reassigns the record separator as explained before. But it also hijacks @ARGV, so the diamond operator <> reads the file through the magic ARGV filehandle.

    This had at least one side effect that cost me hours to find: it does not close the implicit file handle properly (since it never calls close at all).

    For example, doing that:

    use strict;
    use warnings;
    
    my $filename = 'some-file.txt';
    
    my $content = do{local(@ARGV,$/)=$filename;<>};
    my $content2 = do{local(@ARGV,$/)=$filename;<>};
    my $content3 = do{local(@ARGV,$/)=$filename;<>};
    
    print "After reading a file 3 times redirecting to STDIN: $.\n";
    
    open (FILE, "<", $filename) or die $!;
    
    print "After opening a file using dedicated file handle: $.\n";
    
    while (<FILE>) {
        print "read line: $.\n";
    }
    
    print "before close: $.\n";
    close FILE;
    print "after close: $.\n";
    

    results in:

    After reading a file 3 times redirecting to STDIN: 3
    After opening a file using dedicated file handle: 3
    read line: 1
    read line: 2
    (...)
    read line: 46
    before close: 46
    after close: 0
    

    The strange thing is that the line counter $. is increased by one for every file. It is not reset, and it does not contain the number of lines. And it is not reset to zero when opening another file until at least one line has been read. In my case, I was doing something like this:

    while($. < $skipLines) {<FILE>};
    

    Because of this problem, the condition was false because the line counter was not reset properly. I don't know if this is a bug or simply wrong code... Calling close; or close STDIN; does not help either.

    I replaced this unreadable code by using open, string concatenation and close. However, the solution posted by Brad Gilbert also works since it uses an explicit file handle instead.

    The three lines at the beginning can be replaced by:

    my $content = do{local $/; open(my $f1, '<', $filename) or die $!; my $tmp1 = <$f1>; close $f1 or die $!; $tmp1};
    my $content2 = do{local $/; open(my $f2, '<', $filename) or die $!; my $tmp2 = <$f2>; close $f2 or die $!; $tmp2};
    my $content3 = do{local $/; open(my $f3, '<', $filename) or die $!; my $tmp3 = <$f3>; close $f3 or die $!; $tmp3};
    

    which properly closes the file handle.

  • csdnceshi66

    These are all good answers. BUT if you're feeling lazy, and the file isn't that big, and security is not an issue (you know you don't have a tainted filename), then you can shell out:

    $x = `cat /tmp/foo`;    # note backticks; qx{cat /tmp/foo} also works
    
  • csdnceshi56
    lrony* 2012-12-27 20:57

    You can use cat in Linux:

    @file1 = `cat /etc/file.txt`;
    
  • csdnceshi69
    YaoRaoLov 2013-05-12 00:43

    Another possible way:

    open my $fh, '<', "filename";
    read $fh, my $string, -s $fh;
    close $fh;
    
  • csdnceshi79
    python小菜 2013-05-28 14:36
    open f, 'test.txt' or die $!;
    $file = join '', <f>;
    

    <f> in list context returns all the lines of the file (when $/ has its default value "\n"), and join '' then concatenates them into a single string.

  • csdnceshi53
    Lotus@ 2013-12-30 16:44

    You could simply create a subroutine:

    # Get file contents
    sub gfc
    {
        my ($filename) = @_;
        open my $fh, '<', $filename or die "can't open $filename: $!";
        return join '', <$fh>;
    }
    
  • weixin_41568208
    北城已荒凉 2014-05-08 20:07

    I would do it in the simplest way, so anyone can understand what happens, even if there are smarter ways:

    my $text = "";
    while (my $line = <FILE>) {
        $text .= $line;
    }
    
  • csdnceshi70
    笑故挽风 2016-03-14 16:29

    Use

     $/ = undef;
    

    before $document = <FILE>;. $/ is the input record separator, which is a newline by default. By setting it to undef, you are saying there is no record separator, so <FILE> reads all the way to the end of the file. This is called "slurp" mode.

    Other forms such as undef $/ and local $/ produce the same effect (my $/ is not allowed, because special variables like $/ cannot be lexically declared).

  • csdnceshi70
    笑故挽风 2017-05-31 10:30

    I don't know if it's good practice, but I used to use this:

    ($a=<F>);
    
