willgone123 2015-03-12 12:27

Performance problems when collecting data into HDFS with Flume

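I am running into performance problems and various other issues when using Flume to collect data and write it to HDFS. The setup is roughly as follows: the Flume agent on machine 10 reads the data under the xx/xx directory and sinks it to the Flume agent on machine 08, and the agent on 08 writes it to HDFS on machine 07. A file of a little over 30 MB takes a very long time to write, and sometimes there are other problems such as running out of memory.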

# Name the components on this agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Describe/configure the source
a1.sources.r1.type = avro
a1.sources.r1.bind = r09n08
a1.sources.r1.port = 55555
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp

#hdfs sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.hdfs.path = hdfs://r09n07:8020/project/dame/input/%Y%m%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = hdfs-
a1.sinks.k1.hdfs.rollInterval = 0
#a1.sinks.k1.hdfs.fileSuffix = .log
#a1.sinks.k1.hdfs.round = true
#a1.sinks.k1.hdfs.roundValue = 1
#a1.sinks.k1.hdfs.roundUnit = minute
a1.sinks.k1.hdfs.rollSize = 67108864
a1.sinks.k1.hdfs.rollCount = 0
#a1.sinks.k1.hdfs.writeFormat = Text

# Use a channel which buffers events in memory (the file channel settings are commented out)
a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir=/home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs=/home/nids/wg/apache-flume-1.5.2-bin/datadir

a1.sinks.k1.hdfs.batchSize = 10000
#a1.sinks.k1.hdfs.callTimeout = 6000
#a1.sinks.k1.hdfs.appendTimeout = 6000

#a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

a1.sources.r1.channels = c1
a1.sinks.k1.channel = c1

The above is the configuration file on machine 08.
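
For comparison, here is one possible variant of the machine 08 agent, offered as an untested sketch rather than a known fix. It assumes the memory pressure comes from events piling up in the memory channel, so it switches to the file channel using the two directories already present (commented out) in the configuration. It also assumes that with `hdfs.rollInterval = 0` and `hdfs.rollSize = 67108864`, a 30 MB input never triggers a roll, so the output file stays open; `hdfs.idleTimeout` is added to close files that stop receiving events. The values `60` and `1000` are illustrative guesses, not measured settings. The file channel trades throughput for durability, so it may not help raw speed.

```properties
# Hypothetical variant of the machine 08 agent; values marked "assumed" are illustrative
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Avro source receiving events from machine 10 (unchanged)
a1.sources.r1.type = avro
a1.sources.r1.bind = r09n08
a1.sources.r1.port = 55555
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.channels = c1

# File channel: buffers events on disk instead of in the JVM heap
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/nids/wg/apache-flume-1.5.2-bin/checkpoint
a1.channels.c1.dataDirs = /home/nids/wg/apache-flume-1.5.2-bin/datadir
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

# HDFS sink (path and roll settings unchanged)
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://r09n07:8020/project/dame/input/%Y%m%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = hdfs-
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 67108864
a1.sinks.k1.hdfs.rollCount = 0
# Assumed value: close a file after 60 seconds without new events
a1.sinks.k1.hdfs.idleTimeout = 60
# Assumed value: smaller batches, must not exceed the channel's transactionCapacity
a1.sinks.k1.hdfs.batchSize = 1000
```

The JVM heap size is a separate setting, controlled through `JAVA_OPTS` in `conf/flume-env.sh`; if it has been left small, that may be worth checking before changing anything above.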

Below is the configuration file on machine 10:
# Name the components on this agent
a1.sources = r1  
a1.sinks = k1  
a1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger  

####
a1.sources.r1.type = spooldir 
a1.sources.r1.spoolDir = /home/nids/wg/apache-flume-1.5.2-bin/ceshi12
a1.sources.r1.fileHeader = false
a1.sources.r1.channels = c1
####

# Describe/configure the source
#a1.sources.r1.type = avro   
a1.sources.r1.bind = localhost  
a1.sources.r1.port = 44444 

# avro sink   
a1.sinks.k1.type = avro  
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = r09n08  
a1.sinks.k1.port = 55555

# Use a channel which buffers events in memory (the file channel settings are commented out)
a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir = /home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs = /home/nids/wg/apache-flume-1.5.2-bin/datadir

a1.sinks.k1.hdfs.batchSize = 10000 
#a1.channels.c1.type = memory  
a1.channels.c1.capacity = 100000  
a1.channels.c1.transactionCapacity = 10000  

# Bind the source and sink to the channel
a1.sources.r1.channels = c1  
a1.sinks.k1.channel = c1

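I would appreciate any help from the experts here. Sometimes only part of the data is written and then nothing more happens. Running against a single file works without problems; it is only when scanning a directory, with the channel type set to memory, that performance is extremely poor. I don't know where the problem lies.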
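
A cleaned-up version of the machine 10 agent might look like the sketch below. It is untested and rests on a few observations about the configuration as posted: `a1.sinks.k1.type` is set twice (`logger`, then `avro`), `bind` and `port` are set on a `spooldir` source that does not listen on a network port, and `a1.sinks.k1.hdfs.batchSize` uses the HDFS sink's key on an Avro sink, whose batch setting is `batch-size`. The value `1000` is an assumption chosen only for illustration.

```properties
# Hypothetical cleaned-up variant of the machine 10 agent
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Spooling directory source; no bind/port, which apply to network sources
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/nids/wg/apache-flume-1.5.2-bin/ceshi12
a1.sources.r1.fileHeader = false
a1.sources.r1.channels = c1

# Memory channel (unchanged)
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

# Avro sink forwarding to the avro source on machine 08
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = r09n08
a1.sinks.k1.port = 55555
# Assumed value: the Avro sink's own batch setting, replacing hdfs.batchSize
a1.sinks.k1.batch-size = 1000
```

Separately, the spooling directory source expects files to be complete and unchanged once they are placed in `spoolDir`. If files are still being written to after they land there, the source logs an error and stops processing, which could be one explanation for transfers that stop partway; the agent log on machine 10 should show whether that is happening.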