hanlianguai
willgone123
2015-03-12 12:27

Performance problems when collecting data into HDFS with Flume

  • Flume-to-HDFS write problems

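I am running into several problems, performance among them, with Flume collecting data and writing it to HDFS. The setup is roughly as follows: the Flume agent on machine 10 reads the data under the xx/xx directory and sinks it to the Flume agent on machine 08, and the agent on 08 writes it to HDFS on machine 07. A file of just over 30 MB takes a very long time to write, and sometimes the agent runs out of memory.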

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = avro

a1.sources.r1.bind = r09n08

a1.sources.r1.port = 55555

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.type = timestamp

#hdfs sink
a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://r09n07:8020/project/dame/input/%Y%m%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = hdfs-
a1.sinks.k1.hdfs.rollInterval = 0
#a1.sinks.k1.hdfs.fileSuffix = .log

#a1.sinks.k1.hdfs.round = true

#a1.sinks.k1.hdfs.roundValue = 1

#a1.sinks.k1.hdfs.roundUnit = minute

a1.sinks.k1.hdfs.rollSize = 67108864

a1.sinks.k1.hdfs.rollCount = 0

#a1.sinks.k1.hdfs.writeFormat = Text

# Use a channel which buffers events in memory

a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir=/home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs=/home/nids/wg/apache-flume-1.5.2-bin/datadir

a1.sinks.k1.hdfs.batchSize = 10000
#a1.sinks.k1.hdfs.callTimeout = 6000
#a1.sinks.k1.hdfs.appendTimeout = 6000

#a1.channels.c1.type = memory

a1.channels.c1.capacity = 100000

a1.channels.c1.transactionCapacity = 10000
a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

The configuration above is for machine 08.
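In the machine-08 configuration above, `hdfs.rollInterval = 0` and `hdfs.rollCount = 0` disable time- and count-based rolling, so the only roll trigger left is `hdfs.rollSize = 67108864` bytes (64 MiB). A file fed by a ~30 MB input never reaches that threshold and can stay open as an in-use file (named with a `.tmp` suffix by default), which may be why the output sometimes looks incomplete. A minimal sketch of the same sink with `hdfs.idleTimeout` added, which closes a file that has received no events for the given number of seconds (0, the default, disables it); the 60-second value is an assumed illustration, not a tuned figure, and only the sink lines are shown:

```properties
# Sketch only: the HDFS sink from machine 08 with an idle timeout added
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://r09n07:8020/project/dame/input/%Y%m%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = hdfs-
# Size-based rolling only (64 MiB), as in the original configuration
a1.sinks.k1.hdfs.rollSize = 67108864
a1.sinks.k1.hdfs.rollCount = 0
a1.sinks.k1.hdfs.rollInterval = 0
# Assumed value: close (and rename) a file after 60 s without new events,
# so files smaller than rollSize are still finalized
a1.sinks.k1.hdfs.idleTimeout = 60
a1.sinks.k1.channel = c1
```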

The configuration below is for machine 10:
# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe the sink
# (superseded: a1.sinks.k1.type is set again to avro below, and the later value takes effect)
#a1.sinks.k1.type = logger

####
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/nids/wg/apache-flume-1.5.2-bin/ceshi12
a1.sources.r1.fileHeader = false
a1.sources.r1.channels = c1
####

# Describe/configure the source
#a1.sources.r1.type = avro
a1.sources.r1.bind = localhost
a1.sources.r1.port = 44444

# avro sink
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = r09n08
a1.sinks.k1.port = 55555

# Use a channel which buffers events in memory
a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir = /home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs = /home/nids/wg/apache-flume-1.5.2-bin/datadir

a1.sinks.k1.hdfs.batchSize = 10000
#a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

# Bind the source and sink to the channel
a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1
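In the machine-10 configuration, `a1.sinks.k1.hdfs.batchSize` is a property of the HDFS sink and is not read by the avro sink. The avro sink has its own `batch-size` property (default 100 events per request), and the spooldir source reads `batchSize` (default 100). A minimal sketch of raising both, assuming 1000 events per batch as an illustrative value; each batch size has to stay no larger than the channel's `transactionCapacity`:

```properties
# Sketch only: batch settings for the spooldir -> avro agent on machine 10 (assumed values)
a1.sources = r1
a1.sinks = k1
a1.channels = c1

a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/nids/wg/apache-flume-1.5.2-bin/ceshi12
# Events the spooldir source puts into the channel per transaction (default 100)
a1.sources.r1.batchSize = 1000
a1.sources.r1.channels = c1

a1.sinks.k1.type = avro
a1.sinks.k1.hostname = r09n08
a1.sinks.k1.port = 55555
# Events the avro sink sends per request (default 100); hdfs.batchSize has no effect here
a1.sinks.k1.batch-size = 1000
a1.sinks.k1.channel = c1

a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
# Both batch sizes above must not exceed this value
a1.channels.c1.transactionCapacity = 10000
```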

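Could anyone experienced help? Sometimes only part of the data is written and then nothing more arrives. Running against a single file works fine; the problems appear only when scanning a directory, and performance is extremely poor when the channel is of the memory type. I don't know where the problem lies.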
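Both configurations contain commented-out `checkpointDir`/`dataDirs` lines, which belong to Flume's file channel. A sketch of the machine-08 channel switched to it, reusing those directories as written. Whether this helps here is an assumption to test: a file channel persists events to disk and is usually slower per event than a memory channel, but it does not keep the buffered event bodies on the JVM heap, which is relevant to the out-of-memory errors.

```properties
# Sketch only: file channel for the agent on 08, using the directories from the commented-out lines
a1.channels = c1
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/nids/wg/apache-flume-1.5.2-bin/checkpoint
a1.channels.c1.dataDirs = /home/nids/wg/apache-flume-1.5.2-bin/datadir
# Maximum number of events held in the channel
a1.channels.c1.capacity = 100000
# Must be at least the HDFS sink's hdfs.batchSize (10000 in the original config)
a1.channels.c1.transactionCapacity = 10000
```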