请问下面的这段代码在.keyBy(1)
和.keyBy(0)
时为啥在输出结果的并行度上很大的差异(这段代码没有太多的实际意义,只是对输出有疑惑)
private static void countWindowTest1() throws Exception {
StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
env.setParallelism(3);
DataStreamSource<String> lines = env.socketTextStream("localhost", 8888);
lines
.flatMap(new FlatMapFunction<String, Tuple3<Integer, String, Integer>>() {
public void flatMap(String line, Collector<Tuple3<Integer, String, Integer>> collector) throws Exception {
String[] words = line.split(" ");
for (String word : words) {
Tuple3<Integer, String, Integer> tp = Tuple3.of(Math.abs(word.hashCode() % parallelism), word, 1);
collector.collect(tp);
}
}
})
.keyBy(1)
.countWindow(3)
.sum(2)
.print();
env.execute();
}
输入同样都是下面的两段话
Fight and you may die, run and you'll live. At least a while. And dying in your beds many years from now, would you be willing to trade all the days from this day to that for one chance, just one chance to come back here and tell our enemies that they may take our lives, but they'll never take our freedom!
Fight and you may die, run and you'll live. At least a while. And dying in your beds many years from now, would you be willing to trade all the days from this day to that for one chance, just one chance to come back here and tell our enemies that they may take our lives, but they'll never take our freedom!
- 当
.keyBy(1)
时
- 当
.keyBy(0)
时
为什么第一个输出看起来更符合并行度是3的设置,而第二个输出结果看起来都使用了同一个work呢?