willgone123 2015-03-12 12:27
Views: 11086

Performance problems when collecting data into HDFS with Flume

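I am running into performance and other problems when using Flume to collect data and write it to HDFS. The setup is roughly as follows: a Flume agent on machine 10 reads the data under the xx/xx directory and sinks it to a Flume agent on machine 08, which then writes it to HDFS on machine 07. A file of a little over 30 MB takes a very long time to write, and sometimes the agent runs out of memory.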

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = avro

a1.sources.r1.bind = r09n08

a1.sources.r1.port = 55555

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.type = timestamp

#hdfs sink
a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://r09n07:8020/project/dame/input/%Y%m%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = hdfs-
a1.sinks.k1.hdfs.rollInterval = 0
#a1.sinks.k1.hdfs.fileSuffix = .log

#a1.sinks.k1.hdfs.round = true

#a1.sinks.k1.hdfs.roundValue = 1

#a1.sinks.k1.hdfs.roundUnit = minute

a1.sinks.k1.hdfs.rollSize = 67108864

a1.sinks.k1.hdfs.rollCount = 0

#a1.sinks.k1.hdfs.writeFormat = Text

# Use a channel which buffers events in file

a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir=/home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs=/home/nids/wg/apache-flume-1.5.2-bin/datadir

a1.sinks.k1.hdfs.batchSize = 10000
#a1.sinks.k1.hdfs.callTimeout = 6000
#a1.sinks.k1.hdfs.appendTimeout = 6000

#a1.channels.c1.type = memory

a1.channels.c1.capacity = 100000

a1.channels.c1.transactionCapacity = 10000
a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

The above is the configuration file on machine 08.
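
One possible factor on the machine 08 side, offered as a sketch rather than a confirmed diagnosis: with hdfs.rollInterval = 0 and hdfs.rollCount = 0, the HDFS sink rolls a file only when it reaches hdfs.rollSize (64 MB here). An input of about 30 MB never reaches that threshold, so the output file can stay open as an in-progress .tmp file and look as if writing has stalled. The fragment below adds a time-based roll and an idle timeout. The numeric values (600, 60, 1000) are illustrative assumptions and have not been tested against this cluster.

```properties
# Sketch only: roll/close settings for the HDFS sink on machine 08.
# The numeric values are illustrative assumptions, not tuned recommendations.
a1.sinks.k1.hdfs.rollSize = 67108864
a1.sinks.k1.hdfs.rollCount = 0
# Also roll by time (seconds), so a file smaller than rollSize still gets closed
a1.sinks.k1.hdfs.rollInterval = 600
# Close a file that has received no events for this many seconds
a1.sinks.k1.hdfs.idleTimeout = 60
# A smaller batch means fewer events are held per sink transaction
a1.sinks.k1.hdfs.batchSize = 1000
```

If hdfs.batchSize is changed, the channel's transactionCapacity has to stay at least as large as the batch size.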

Below is the configuration file on machine 10.
 # Name the components on this agent
a1.sources = r1  
a1.sinks = k1  
a1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger  

####
a1.sources.r1.type = spooldir 
a1.sources.r1.spoolDir = /home/nids/wg/apache-flume-1.5.2-bin/ceshi12
a1.sources.r1.fileHeader =false
a1.sources.r1.channels = c1
####

# Describe/configure the source
#a1.sources.r1.type = avro   
a1.sources.r1.bind = localhost  
a1.sources.r1.port = 44444 

# avro sink   
a1.sinks.k1.type = avro  
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = r09n08  
a1.sinks.k1.port = 55555

# Use a channel which buffers events in file
a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir = /home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs = /home/nids/wg/apache-flume-1.5.2-bin/datadir

a1.sinks.k1.hdfs.batchSize = 10000 
#a1.channels.c1.type = memory  
a1.channels.c1.capacity = 100000  
a1.channels.c1.transactionCapacity = 10000  

# Bind the source and sink to the channel
a1.sources.r1.channels = c1  
a1.sinks.k1.channel = c1

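Any help from the experts here would be appreciated. Sometimes only part of the data is written and then it stops. Processing a single file works fine; it is only when scanning a directory, with the channel type set to memory, that performance is extremely poor. I cannot tell where the problem lies.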
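
A few things in the machine 10 configuration stand out and may be worth ruling out. This is a hedged reading of the file as posted, not a verified fix:

- a1.sinks.k1.type is set twice (logger, then avro). The file is loaded as Java properties, so the later avro assignment is presumably the one that takes effect.
- a1.sources.r1.bind and a1.sources.r1.port are not spooling-directory source properties, so they have no effect on a spooldir source.
- a1.sinks.k1.hdfs.batchSize is an HDFS sink property. The Avro sink's batch setting is batch-size, so as posted the Avro sink runs with its default batch size of 100.
- The out-of-memory errors may come from the JVM heap rather than from Flume itself. As far as I know, the stock flume-ng launcher uses a very small default heap (-Xmx20m) unless JAVA_OPTS is set in conf/flume-env.sh, which is easy to exhaust with a memory channel sized for 100000 events.

A cleaned-up sketch of the same agent follows. The batch values of 1000 are illustrative assumptions; everything else is taken from the configuration above.

```properties
# Sketch for the agent on machine 10. Batch sizes are assumed values, not tested.
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Spooling-directory source (no bind/port: those belong to network sources)
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/nids/wg/apache-flume-1.5.2-bin/ceshi12
a1.sources.r1.fileHeader = false
a1.sources.r1.batchSize = 1000
a1.sources.r1.channels = c1

# Avro sink pointing at the agent on machine 08; note the property name batch-size
a1.sinks.k1.type = avro
a1.sinks.k1.hostname = r09n08
a1.sinks.k1.port = 55555
a1.sinks.k1.batch-size = 1000
a1.sinks.k1.channel = c1

# Memory channel; transactionCapacity must be at least as large as the batch sizes above
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 1000
```

If events must survive an agent restart, the commented-out checkpointDir and dataDirs lines in the original suggest a file channel was also being considered. It trades throughput for durability, so it would not be expected to help with the speed problem by itself.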