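Write a MapReduce program that takes a CSV file of comma-separated words as input and outputs, for each word, the lines in which that word appears.
For example, given the input: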
goat,chicken,horse
cat,horse
dog,cat,sheep
buffalo,dolphin,cat
sheep
The corresponding output is:
"buffalo" ["buffalo,dolphin,cat"]
"cat" ["buffalo,dolphin,cat", "cat,horse", "dog,cat,sheep"]
"chicken" ["goat,chicken,horse"]
"dog" ["dog,cat,sheep"]
"dolphin" ["buffalo,dolphin,cat"]
"goat" ["goat,chicken,horse"]
"horse" ["cat,horse", "goat,chicken,horse"]
"sheep" ["dog,cat,sheep", "sheep"]
My code is not finished yet; the approach so far is as follows:
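from mrjob.job import MRJob
from mrjob.step import MRStep


class part2(MRJob):
    def steps(self):
        return [MRStep(mapper=self.mapper, reducer=self.reducer)]

    def mapper(self, _, line):
        # Emit (word, full line) rather than (word, 1), so the reducer
        # can collect the lines in which each word appears.
        line = line.strip()
        for word in set(line.split(',')):
            if word:
                yield word, line

    def reducer(self, word, lines):
        # `lines` is a generator; sort it into a list so it can be serialized.
        yield word, sorted(lines)


if __name__ == '__main__':
    part2.run()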
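To check the expected output without installing mrjob or Hadoop, the map, shuffle, and reduce phases can be simulated in plain Python. This is only a sketch of the idea, not the mrjob job itself: the names `map_line`, `inverted_index`, and `sample` are made up for illustration, and sorting the lines for each word is an assumption based on the order shown in the example output.

```python
from collections import defaultdict


def map_line(line):
    """Map phase (hypothetical helper): emit (word, line) per distinct word."""
    line = line.strip()
    for word in set(line.split(',')):
        if word:  # skip empty fields, e.g. from blank lines
            yield word, line


def inverted_index(lines):
    """Simulate shuffle and reduce: group lines by word, then sort both."""
    groups = defaultdict(list)
    for line in lines:
        for word, value in map_line(line):
            groups[word].append(value)
    # Sorting the keys and the values is an assumption taken from the example.
    return {word: sorted(values) for word, values in sorted(groups.items())}


# The example input from the question, one string per CSV line.
sample = [
    "goat,chicken,horse",
    "cat,horse",
    "dog,cat,sheep",
    "buffalo,dolphin,cat",
    "sheep",
]

index = inverted_index(sample)
for word, values in index.items():
    print(word, values)
```

If this simulation matches the expected output, the same logic should carry over to a mapper that yields `(word, line)` and a reducer that yields `(word, sorted(lines))`.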