Why is reading lines from stdin much slower in C++ than Python?

I wanted to compare reading lines of string input from stdin using Python and C++ and was shocked to see my C++ code run an order of magnitude slower than the equivalent Python code. Since my C++ is rusty and I'm not yet an expert Pythonista, please tell me if I'm doing something wrong or if I'm misunderstanding something.


(TL;DR answer: include the statement cin.sync_with_stdio(false); or just use fgets instead.

TL;DR results: scroll all the way down to the bottom of my question and look at the table.)


C++ code:

#include <iostream>
#include <time.h>

using namespace std;

int main() {
    string input_line;
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    while (cin) {
        getline(cin, input_line);
        if (!cin.eof())
            line_count++;
    };

    sec = (int) time(NULL) - start;
    cerr << "Read " << line_count << " lines in " << sec << " seconds.";
    if (sec > 0) {
        lps = line_count / sec;
        cerr << " LPS: " << lps << endl;
    } else
        cerr << endl;
    return 0;
}

// Compiled with:
// g++ -O3 -o readline_test_cpp foo.cpp

Python Equivalent:

#!/usr/bin/env python
import time
import sys

count = 0
start = time.time()

for line in sys.stdin:
    count += 1

delta_sec = int(time.time() - start)
if delta_sec > 0:
    lines_per_sec = int(round(count / delta_sec))
    print("Read {0} lines in {1} seconds. LPS: {2}".format(count, delta_sec,
          lines_per_sec))

Here are my results:

$ cat test_lines | ./readline_test_cpp
Read 5570000 lines in 9 seconds. LPS: 618889

$ cat test_lines | ./readline_test.py
Read 5570000 lines in 1 seconds. LPS: 5570000

I should note that I tried this both under Mac OS X v10.6.8 (Snow Leopard) and Linux 2.6.32 (Red Hat Linux 6.2). The former is a MacBook Pro, and the latter is a very beefy server, not that this is too pertinent.

$ for i in {1..5}; do echo "Test run $i at `date`"; echo -n "CPP:"; cat test_lines | ./readline_test_cpp ; echo -n "Python:"; cat test_lines | ./readline_test.py ; done
Test run 1 at Mon Feb 20 21:29:28 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 2 at Mon Feb 20 21:29:39 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 3 at Mon Feb 20 21:29:50 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 4 at Mon Feb 20 21:30:01 EST 2012
CPP:   Read 5570001 lines in 9 seconds. LPS: 618889
Python:Read 5570000 lines in 1 seconds. LPS: 5570000
Test run 5 at Mon Feb 20 21:30:11 EST 2012
CPP:   Read 5570001 lines in 10 seconds. LPS: 557000
Python:Read 5570000 lines in  1 seconds. LPS: 5570000

Tiny benchmark addendum and recap

For completeness, I thought I'd update the read speed for the same file on the same box with the original (synced) C++ code. Again, this is for a 100M line file on a fast disk. Here's the comparison, with several solutions/approaches:

Implementation      Lines per second
python (default)           3,571,428
cin (default/naive)          819,672
cin (no sync)             12,500,000
fgets                     14,285,714
wc (not fair comparison)  54,644,808

Reposted from: https://stackoverflow.com/questions/9371238/why-is-reading-lines-from-stdin-much-slower-in-c-than-python

perhaps? (over 2 years ago): fscanf and scanf aren't necessarily unsafe, but it's a bit kludgy to use them safely, since you have to use %#s, where # is 1 less than the size of your buffer, instead of just %s, if you don't want to risk a buffer overflow.
狐狸.fox (over 2 years ago): You could also fix the C++ loop. That extra test in the loop could be costly. while (getline(cin, input_line)) { line_count++; }
Didn't forge (about 5 years ago): If I were you, I'd look at the mmap and memchr functions. Since memory is not an issue, map the whole file into your program with mmap, then process it with memchr to figure out the "line limits". Also use fadvise to tell the kernel that you are reading sequentially.
必承其重 | 欲带皇冠 (about 5 years ago): This is indeed very educational. It also once again underlines the fact that C++ is powerful, but only when great care is put into its use.
derek5. (over 5 years ago): Also, see my follow-up question about splitting lines in C++ vs. Python... a similar speed story, where the naive approach is slower in C++! Here: stackoverflow.com/q/9378500/379037
零零乙 (almost 7 years ago): I was meaning the one from Edit 6.
derek5. (almost 7 years ago): Which "quick Python version" are you referring to? Thanks for the suggestions; I'll copy the chart to the top.
胖鸭 (almost 7 years ago): According to stackoverflow.com/questions/21107131/… you can speed up your Python code twofold simply by extracting it into a function.
零零乙 (almost 7 years ago): For completeness, why don't you add the quick Python version and the similar C++ code to the chart at the bottom? You could also consider moving the chart to the top, as people might not find it in the rather long post. An interesting read indeed!
derek5. (over 8 years ago): Thanks, these tips sound promising! If you happen to have/write/find any simple example code implementing some of these, and post it as an answer (or at least a link), I and future readers of this question would be very grateful for your teaching. Cheers!
10.24 (over 8 years ago): Wee, famous question! If you'd like to approach the speed of wc, don't call line-by-line functions; instead, read binary blocks and examine them one int at a time, masking for the newline character (this generally leads to fewer single-byte fetches). Furthermore, use the stdio built-in functions to point its buffer at one you allocate, so you can then examine it directly. There's more voodoo if you want to actually abuse stdio, so you haven't even come close to hitting the limit, but you're probably at the disk limit now, so there's no point in abusing it.
乱世@小熊 (over 8 years ago): The answer to "why is my I/O slow?" is almost always "buffering."
旧行李 (over 8 years ago): Nice post. But I would just like to mention that the buffer overflow problem with scanf can be handled by specifying the number of characters to be read (for any datatype). See the width parameter mentioned in the link. As an example: char s[10]; scanf("%9s", s); // reads at most 9 characters from the input. int x; scanf("%2d", &x); // reads a 2-digit number from the input. This can take care of buffer overflow. A dynamic width cannot be specified, but to overcome that one could simply generate the format string at runtime.
bug^君 (over 8 years ago): Related question: stackoverflow.com/questions/8310039/…
derek5. (over 8 years ago): Thanks for that Python code snippet! It looks like wc uses safe_read, which is just a wrapper around plain read, and reads 16k at a time. Modifying your Python code to use a 16k buffer instead of 32k and running it on the same machine and test file, the crunch took 4 seconds (i.e., 25,000,000 LPS). Thanks!
游.程 (over 8 years ago): wc -l is fast because it reads the stream more than one line at a time (it might be an fread(stdin)/memchr('\n') combination). Python results are in the same order of magnitude, e.g., wc-l.py
程序go (over 8 years ago): Since nobody seems to have mentioned why you get an extra line with C++: do not test against cin.eof()! Put the getline call into the loop condition instead.
衫裤跑路 (over 8 years ago): Yes, but so do C and C++. The CPython implementation is almost certainly relying on stdio for this. There are many levels of caching involved in a modern system, AFAIK.
叼花硬汉 (over 8 years ago): The problem is synchronization with stdio; see my answer.
csdnceshi62 (over 8 years ago): Is it possible that the Python VM does some clever buffered reading of stdin when piping a file to it, so it doesn't have to go out to the drive cache for each read of a single line?
local-host (over 8 years ago): Actually, 2-fold when using a bigger file (precision was too low with 1 sec).
local-host (over 8 years ago): I get the difference down to 3-fold using fscanf. That's strange, because at some point Python must use the underlying C API.
local-host (over 8 years ago): There are two copies, and you could save one.
谁还没个明天 (over 8 years ago): How big are the files? Unless the lines are very long, I doubt it could be a caching issue, i.e., the cache wouldn't be cleared/thrashed between runs.
谁还没个明天 (over 8 years ago): Python is definitely not smart enough to avoid copying the data. Besides, the file needs to be read to see where the newlines are.
hurriedly% (over 8 years ago): The Python version is still vastly faster if I collect a character from each line.
hurriedly% (over 8 years ago): You should probably use time.time() instead of datetime.datetime.now().seconds; you get floating-point results and no possibility of divide-by-zero in the LPS computation. I get similar results (Python about 10x faster) testing on my machine.
local-host (over 8 years ago): One more thing: this is the kind of simple test that does not actually reflect the performance of language N vs. language M. Python may have a very smart optimization for that precise case which you would not get when running a full-featured application.
local-host (over 8 years ago): I see two possibilities (assuming you have removed the caching problem suggested by David): 1) <iostream> performance sucks (not the first time that happens); 2) Python is clever enough not to copy the data in the for loop, because you don't use it. You could retest using scanf and a char[]. Alternatively, you could try rewriting the loop so that something is done with the string (e.g., keep the 5th letter and concatenate it into a result).
hurriedly% (over 8 years ago): Try copying your test file to a second, separate file, so that they are cached separately.
derek5. (over 8 years ago): Yes, and on two different machines as well.
叼花硬汉 (over 8 years ago): Did you run your tests multiple times? Perhaps there is a disk cache issue.

10 Answers

By default, cin is synchronized with stdio, which causes it to avoid any input buffering. If you add this to the top of your main, you should see much better performance:

std::ios_base::sync_with_stdio(false);

Normally, when an input stream is buffered, instead of reading one character at a time, the stream will be read in larger chunks. This reduces the number of system calls, which are typically relatively expensive. However, since the FILE* based stdio and iostreams often have separate implementations and therefore separate buffers, this could lead to a problem if both were used together. For example:

int myvalue1;
cin >> myvalue1;
int myvalue2;
scanf("%d",&myvalue2);

If more input was read by cin than it actually needed, then the second integer value wouldn't be available for the scanf function, which has its own independent buffer. This would lead to unexpected results.

To avoid this, by default, streams are synchronized with stdio. One common way to achieve this is to have cin read each character one at a time as needed using stdio functions. Unfortunately, this introduces a lot of overhead. For small amounts of input, this isn't a big problem, but when you are reading millions of lines, the performance penalty is significant.

Fortunately, the library designers decided that you should also be able to disable this feature to get improved performance if you knew what you were doing, so they provided the sync_with_stdio method.
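
Putting it together, here is a minimal sketch of the question's reader with this fix applied (and with getline moved into the loop condition, as the comments suggest, which also avoids the off-by-one in the line count):

#include <iostream>
#include <string>

int main() {
    std::ios_base::sync_with_stdio(false);  // stop syncing iostreams with C stdio
    std::string input_line;
    long line_count = 0;
    while (std::getline(std::cin, input_line))  // the stream test is the loop condition
        line_count++;
    std::cerr << "Read " << line_count << " lines.\n";
    return 0;
}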

七度&光 (over 3 years ago): (P.S.: nor std::wcin.)
七度&光 (over 3 years ago): Wrong; if my understanding of synchronization is correct, you're still free to do what you want as long as it relies on std::cin.rdbuf(). You just can't use cstdio's stdin.
ℙℕℤℝ (almost 6 years ago): Note that sync_with_stdio() is a static member function, and a call to this function on any stream object (e.g., cin) toggles synchronization on or off for all standard iostream objects.
perhaps? (over 7 years ago): Gladly will I burn one of my daily votes for this, and I didn't even arrive here on a search; I just saw it in a possibly-related list from a completely arbitrary question. This is incredibly helpful for people doing speed-sensitive cin input who want the benefits of buffering. Thank you so much, Vaughn. Terrific answer.
bug^君 (over 8 years ago): Glad I ran into this Q&A. Just remember that if you use the cin.sync_with_stdio(false) solution, no other part of your program should read from stdin! An excellent article on stream sync may be found at drdobbs.com/184401305
旧行李 (over 8 years ago): To make cout, cin, cerr and clog faster, do it this way: std::ios_base::sync_with_stdio(false);
游.程 (over 8 years ago): Yes, this actually applies to cout, cerr, and clog as well.
叼花硬汉 (over 8 years ago): What about output/printing? I feel cout is slow too; is it the same?
妄徒之命 (over 8 years ago): It turns out fgets is even faster; please see my Edit 5. Still, your solution is very useful, especially when one needs to use cin and write to a string object, e.g. in contexts where a single line can occasionally be much longer than anticipated and where the fgets route into a char buffer[MAXLINE] would result in truncation. Thanks.
妄徒之命 (over 8 years ago): Yes, adding this line immediately above my original while loop sped the code up to surpass even Python. I'm about to post the results as the final edit. Thanks again!
7*4 (over 8 years ago): This should be at the top. It is almost certainly correct. The answer cannot lie in replacing the read with an fscanf call, because that quite simply doesn't do as much work as Python does. Python must allocate memory for the string, possibly multiple times, as the existing allocation is deemed inadequate, exactly like the C++ approach with std::string. This task is almost certainly I/O-bound, and there is way too much FUD going around about the cost of creating std::string objects in C++ or using <iostream> in and of itself.

Just out of curiosity I've taken a look at what happens under the hood, and I've used dtruss/strace on each test.

C++

./a.out < in
Saw 6512403 lines in 8 seconds.  Crunch speed: 814050

syscalls sudo dtruss -c ./a.out < in

CALL                                        COUNT
__mac_syscall                                   1
<snip>
open                                            6
pread                                           8
mprotect                                       17
mmap                                           22
stat64                                         30
read_nocancel                               25958

Python

./a.py < in
Read 6512402 lines in 1 seconds. LPS: 6512402

syscalls sudo dtruss -c ./a.py < in

CALL                                        COUNT
__mac_syscall                                   1
<snip>
open                                            5
pread                                           8
mprotect                                       17
mmap                                           21
stat64                                         29
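
(dtruss is the macOS tool used here; on Linux, an equivalent per-syscall summary can be produced with strace, e.g. sudo strace -c ./a.out < in.)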

In your second example (with scanf()), the reason it is still slower might be that scanf("%s") parses the string and looks for any space character (space, tab, newline).

Also, yes, CPython does some caching to avoid hard-disk reads.

A first element of an answer: <iostream> is slow. Damn slow. I get a huge performance boost with scanf, as shown below, but it is still two times slower than Python.

#include <iostream>
#include <time.h>
#include <cstdio>

using namespace std;

int main() {
    char buffer[10000];
    long line_count = 0;
    time_t start = time(NULL);
    int sec;
    int lps;

    int read = 1;
    while (read > 0) {
        // Note: an unbounded %s risks a buffer overflow; a width (e.g. "%9999s")
        // would bound the read to the buffer size (see the comments above).
        read = scanf("%s", buffer);
        line_count++;
    }
    sec = (int) time(NULL) - start;
    line_count--;  // compensate for the increment on the final, failed scanf
    cerr << "Saw " << line_count << " lines in " << sec << " seconds." ;
    if (sec > 0) {
        lps = line_count / sec;
        cerr << "  Crunch speed: " << lps << endl;
    } 
    else
        cerr << endl;
    return 0;
}

游.程 (over 8 years ago): fgets is even faster; please see Edit 5 above. Thanks.
℡Wang Yan (over 8 years ago): Same here. The sync to stdio was the trick.
7*4 (over 8 years ago): After fixing the C++ version, this stdio version is substantially slower than the C++ iostreams version on my computer (3 seconds vs. 1 second).
游.程 (over 8 years ago): Didn't see this post until I made my third edit, but thanks again for your suggestion. Strangely, there is no 2x hit for me vs. Python now with the scanf line in Edit 3 above. I'm using 2.7, by the way.

The following code was faster for me than the other code posted here so far: (Visual Studio 2013, 64-bit, 500 MB file with line length uniformly in [0, 1000)).

// Requires <algorithm>, <cstdio>, and <vector>.
const int buffer_size = 500 * 1024;  // Too large/small a buffer is not good.
std::vector<char> buffer(buffer_size);
long line_count = 0;
size_t size;  // fread returns size_t
while ((size = fread(buffer.data(), sizeof(char), buffer_size, stdin)) > 0) {
    line_count += std::count_if(buffer.begin(), buffer.begin() + size,
                                [](char ch) { return ch == '\n'; });
}

It beats all my Python attempts by more than a factor 2.

Well, I see that in your second solution you switched from cin to scanf, which was the first suggestion I was going to make to you (cin is sloooooooooooow). Now, if you switch from scanf to fgets, you will see another boost in performance: fgets is the fastest C function for string input.

BTW, I didn't know about that sync thing; nice. But you should still try fgets.
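
For what it's worth, here is a minimal sketch of the fgets approach; the buffer size is an arbitrary choice of mine, and note the caveat in the comment below this answer about lines longer than the buffer:

#include <cstdio>

int main() {
    char buffer[65536];  // fixed-size line buffer (arbitrary size)
    long line_count = 0;
    // fgets returns NULL at EOF; a line longer than the buffer is read
    // in pieces and would be counted more than once.
    while (fgets(buffer, sizeof(buffer), stdin) != NULL)
        line_count++;
    fprintf(stderr, "Read %ld lines\n", line_count);
    return 0;
}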

程序go (about 2 years ago): Except fgets will be wrong (in terms of line counts, and in terms of splitting lines across loop iterations if you actually need to use them) for sufficiently large lines, unless you add extra checks for incomplete lines (and attempting to compensate for that involves allocating unnecessarily large buffers, whereas std::getline handles reallocation to match the actual input seamlessly). Fast and wrong is easy, but it's almost always worth it to use "slightly slower, but correct", which turning off sync_with_stdio gets you.

I reproduced the original result on my computer using g++ on a Mac.

Adding the following statements to the C++ version just before the while loop brings it in line with the Python version:

std::ios_base::sync_with_stdio(false);
char buffer[1048576];
std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));

sync_with_stdio improved speed to 2 seconds, and setting a larger buffer brought it down to 1 second.
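
(Given the buffer-lifetime and stack-size caveats raised in the comments below, one variant is to keep the buffer out of the stack frame entirely, e.g. with static storage; a sketch:)

std::ios_base::sync_with_stdio(false);
static char buffer[1048576];  // static storage: outlives the enclosing scope
std::cin.rdbuf()->pubsetbuf(buffer, sizeof(buffer));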

MAO-EYE (about 3 years ago): To highlight @SEK's "more importantly" comment: if the buffer is on the stack, you can't allow the function to return until the file is closed, or until the use of the buffer is otherwise discontinued.
local-host (over 5 years ago): It should be noted that the effect of pubsetbuf on cin's buffer is not standardized. The implementation may indeed use the provided buffer, ignore it (the inherited default action from std::basic_streambuf), or possibly even do something else. See also stackoverflow.com/questions/12481463/…
乱世@小熊 (over 6 years ago): Windows' default stack size is 1 MB.
10.24 (almost 7 years ago): Matthieu, Mac uses an 8 MB process stack by default. Linux uses 4 MB per thread by default, IIRC. 1 MB isn't that much of an issue for a program that transforms input with relatively shallow stack depth. More importantly, though, std::cin will trash the stack if the buffer goes out of scope.
YaoRaoLov (over 8 years ago): I would also avoid setting up a 1 MB buffer on the stack. It can lead to stack overflow (though I guess this is a good place to debate that!).
bug^君 (over 8 years ago): I was too hasty in my reply; setting the buffer size to something other than the default did not produce an appreciable difference.
必承其重 | 欲带皇冠 (over 8 years ago): You might want to try different buffer sizes to get more useful information. I suspect you will see rapidly diminishing returns.

getline, stream operators, and scanf can be convenient if you don't care about file loading time or if you are loading small text files. But if performance is something you care about, you should really just buffer the entire file into memory (assuming it will fit).

Here's an example:

//open file in binary mode
std::fstream file(filename, std::ios::in | std::ios::binary);
if( !file ) return NULL;

//read the size...
file.seekg(0, std::ios::end);
size_t length = (size_t)file.tellg();
file.seekg(0, std::ios::beg);

//read into memory buffer, then close it.
char *filebuf = new char[length+1];
file.read(filebuf, length);
filebuf[length] = '\0'; //make it null-terminated
file.close();

If you want, you can wrap a stream around that buffer for more convenient access like this:

std::istrstream header(&filebuf[0], length);  // from <strstream>; deprecated, std::istringstream is the modern alternative

Also, if you are in control of the file, consider using a flat binary data format instead of text. It's more reliable to read and write because you don't have to deal with all the ambiguities of whitespace. It's also smaller and much faster to parse.

I'm a few years behind here, but:

In 'Edit 4/5/6' of the original post, you are using the construction:

$ /usr/bin/time cat big_file | program_to_benchmark

This is wrong in a couple of different ways:

  1. You're actually timing the execution of `cat`, not your benchmark. The 'user' and 'sys' CPU usage displayed by `time` are those of `cat`, not your benchmarked program. Even worse, the 'real' time is also not necessarily accurate. Depending on the implementation of `cat` and of pipelines in your local OS, it is possible that `cat` writes a final giant buffer and exits long before the reader process finishes its work.

  2. Use of `cat` is unnecessary and in fact counterproductive; you're adding moving parts. If you were on a sufficiently old system (i.e., with a single CPU and -- in certain generations of computers -- I/O faster than CPU), the mere fact that `cat` was running could substantially color the results. You are also subject to whatever input and output buffering and other processing `cat` may do. (This would likely earn you a 'Useless Use Of Cat' award if I were Randal Schwartz.)

A better construction would be:

$ /usr/bin/time program_to_benchmark < big_file

In this statement it is the shell which opens big_file, passing it to your program (well, actually to `time` which then executes your program as a subprocess) as an already-open file descriptor. 100% of the file reading is strictly the responsibility of the program you're trying to benchmark. This gets you a real reading of its performance without spurious complications.

I will mention two possible, but actually wrong, 'fixes' which could also be considered (but I 'number' them differently as these are not things which were wrong in the original post):

A. You could 'fix' this by timing only your program:

$ cat big_file | /usr/bin/time program_to_benchmark

B. or by timing the entire pipeline:

$ /usr/bin/time sh -c 'cat big_file | program_to_benchmark'

These are wrong for the same reasons as #2: they're still using `cat` unnecessarily. I mention them for a few reasons:

  • they're more 'natural' for people who aren't entirely comfortable with the I/O redirection facilities of the POSIX shell

  • there may be cases where `cat` is needed (e.g.: the file to be read requires some sort of privilege to access, and you do not want to grant that privilege to the program to be benchmarked: `sudo cat /dev/sda | /usr/bin/time my_compression_test --no-output`)

  • in practice, on modern machines, the added `cat` in the pipeline is probably of no real consequence

But I say that last thing with some hesitation. If we examine the last result in 'Edit 5' --

$ /usr/bin/time cat temp_big_file | wc -l
0.01user 1.34system 0:01.83elapsed 74%CPU ...

-- this claims that `cat` consumed 74% of the CPU during the test; and indeed 1.34/1.83 is approximately 74%. Perhaps a run of:

$ /usr/bin/time wc -l < temp_big_file

would have taken only the remaining .49 seconds! Probably not: `cat` here had to pay for the read() system calls (or equivalent) which transferred the file from 'disk' (actually buffer cache), as well as the pipe writes to deliver them to `wc`. The correct test would still have had to do those read() calls; only the write-to-pipe and read-from-pipe calls would have been saved, and those should be pretty cheap.

Still, I predict you would be able to measure the difference between `cat file | wc -l` and `wc -l < file` and find a noticeable (2-digit percentage) difference. Each of the slower tests will have paid a similar penalty in absolute time; which would however amount to a smaller fraction of its larger total time.

In fact I did some quick tests with a 1.5 gigabyte file of garbage, on a Linux 3.13 (Ubuntu 14.04) system, obtaining these results (these are actually 'best of 3' results; after priming the cache, of course):

$ time wc -l < /tmp/junk
real 0.280s user 0.156s sys 0.124s (total cpu 0.280s)
$ time cat /tmp/junk | wc -l
real 0.407s user 0.157s sys 0.618s (total cpu 0.775s)
$ time sh -c 'cat /tmp/junk | wc -l'
real 0.411s user 0.118s sys 0.660s (total cpu 0.778s)

Notice that the two pipeline results claim to have taken more CPU time (user+sys) than realtime. This is because I'm using the shell (Bash)'s built-in 'time' command, which is cognizant of the pipeline, and I'm on a multi-core machine where separate processes in a pipeline can use separate cores, accumulating CPU time faster than realtime. Using /usr/bin/time I see smaller CPU time than realtime -- showing that it can only time the single pipeline element passed to it on its command line. Also, the shell's output gives milliseconds while /usr/bin/time only gives hundredths of a second.

So at the efficiency level of `wc -l`, the `cat` makes a huge difference: 407 / 280 = 1.454, or 45.4% more realtime, and 775 / 280 = 2.768, or a whopping 177% more CPU used! On my random it-was-there-at-the-time test box.

I should add that there is at least one other significant difference between these styles of testing, and I can't say whether it is a benefit or fault; you have to decide this yourself:

When you run `cat big_file | /usr/bin/time my_program`, your program is receiving input from a pipe, at precisely the pace sent by `cat`, and in chunks no larger than written by `cat`.

When you run `/usr/bin/time my_program < big_file`, your program receives an open file descriptor to the actual file. Your program -- or in many cases the I/O libraries of the language in which it was written -- may take different actions when presented with a file descriptor referencing a regular file. It may use mmap(2) to map the input file into its address space, instead of using explicit read(2) system calls. These differences could have a far larger effect on your benchmark results than the small cost of running the `cat` binary.

Of course it is an interesting benchmark result if the same program performs significantly differently between the two cases. It shows that, indeed, the program or its I/O libraries are doing something interesting, like using mmap(). So in practice it might be good to run the benchmarks both ways; perhaps discounting the `cat` result by some small factor to "forgive" the cost of running `cat` itself.
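
As an aside: for the mmap(2) route mentioned above (and suggested with memchr in the question's comments), here is a rough POSIX-only sketch of counting lines over a mapped file. Error handling is minimal and the filename is just a placeholder:

#include <cstdio>
#include <cstring>
#include <fcntl.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

int main() {
    int fd = open("big_file", O_RDONLY);  // placeholder filename
    struct stat st;
    if (fd < 0 || fstat(fd, &st) != 0 || st.st_size == 0)
        return 1;
    // Map the whole file read-only into our address space.
    char *base = static_cast<char *>(
        mmap(NULL, st.st_size, PROT_READ, MAP_PRIVATE, fd, 0));
    if (base == MAP_FAILED)
        return 1;
    long line_count = 0;
    char *p = base;
    char *end = base + st.st_size;
    // Count newline bytes, which is essentially what wc -l does.
    while ((p = static_cast<char *>(memchr(p, '\n', end - p))) != NULL) {
        line_count++;
        p++;  // step past the newline we just found
    }
    munmap(base, st.st_size);
    close(fd);
    fprintf(stderr, "Read %ld lines\n", line_count);
    return 0;
}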

零零乙 (over 2 years ago): Don't forget that you can still read left to right with redirection: `<file program` does almost the same thing (with the caveats JJC mentioned) as `cat file | program`.
Lotus@ (over 3 years ago): Again, aside from the perhaps uninteresting incremental performance difference due to the cat binary running at the same time, you are giving up the possibility of the program under test being able to mmap() the input file. This could make a profound difference in results. This is true even if you wrote the benchmarks yourself, in the various languages, using only their "input lines from a file" idiom; it depends on the detailed workings of their various I/O libraries.
Lotus@ (over 3 years ago): Redirection is parsed out of the shell command line at an early stage, which allows you to do one of these, if it gives a more pleasing appearance of left-to-right flow: `< big_file time my_program` or `time < big_file my_program`. This should work in any POSIX shell (i.e., not csh, and I'm not sure about exotica like rc).
elliott.david (over 3 years ago): I'll refrain from an upvote, personally, since this doesn't address the original question (note that the use of cat is constant in the competing examples). But, again, thanks for the intellectual discussion about the ins and outs of *nix.
elliott.david (over 3 years ago): Wow, that was quite insightful! While I've been aware that cat is unnecessary for feeding input to stdin of programs, and that the < shell redirect is preferred, I've generally stuck to cat due to the left-to-right flow of data that the former method preserves visually when I reason about pipelines. Performance differences in such cases I've found to be negligible. But I do appreciate your educating us, Bela.

By the way, the reason the line count for the C++ version is one greater than the count for the Python version is that the eof flag only gets set when an attempt is made to read beyond the end of the file. So the correct loop would be:

while (cin) {
    getline(cin, input_line);

    if (!cin.eof())
        line_count++;
};

笑故挽风 (over 8 years ago): The really correct loop would be: while (getline(cin, input_line)) line_count++;