本人目前遇到flume采集写入hdfs性能等各种问题,大致如下。在10上的xx/xx目录下的数据进行读取 sink到08上的flume 由08上的flume写到07的hdfs上 30多m的文件写了好久。有时候会内存溢出等问题
Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = r09n08
a1.sources.r1.port = 55555
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
#hdfs sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://r09n07:8020/project/dame/input/%Y%m%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = hdfs-
a1.sinks.k1.hdfs.rollInterval = 0
#a1.sinks.k1.hdfs.fileSuffix = .log
#a1.sinks.k1.hdfs.round = true
#a1.sinks.k1.hdfs.roundValue = 1
#a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollSize = 67108864
a1.sinks.k1.hdfs.rollCount = 0
#a1.sinks.k1.hdfs.writeFormat = Text
Use a channel which buffers events in file
a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir=/home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs=/home/nids/wg/apache-flume-1.5.2-bin/datadir
a1.sinks.k1.hdfs.batchSize = 10000
#a1.sinks.k1.hdfs.callTimeout = 6000
#a1.sinks.k1.hdfs.appendTimeout = 6000
#a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
上面是08机器上的配置文件
下面是10机器上的配置文件
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1
# Describe the sink
a1.sinks.k1.type = logger
####
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/nids/wg/apache-flume-1.5.2-bin/ceshi12
a1.sources.r1.fileHeader =false
a1.sources.r1.channels = c1
####
# Describe/configure the source
#a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444
# avro sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = r09n08
a1.sinks.k1.port = 55555
# Use a channel which buffers events in file
a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir = /home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs = /home/nids/wg/apache-flume-1.5.2-bin/datadir
a1.sinks.k1.hdfs.batchSize = 10000
#a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000
# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
求各位高手解答。有时候只写了一部分数据就不再继续了,对单个文件执行时没有问题就是对目录扫描 channel是 memory类型时性能极差。不知道问题出在哪里