Performance problems when collecting data into HDFS with Flume

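I am currently running into performance and other problems when using Flume to collect data and write it to HDFS. Roughly: data under the xx/xx directory on machine 10 is read and sent through a sink to the Flume agent on machine 08, and the agent on 08 writes it to HDFS on machine 07. A file of a little over 30 MB takes a very long time to write, and sometimes there are out-of-memory errors and similar problems.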

# Name the components on this agent

a1.sources = r1

a1.sinks = k1

a1.channels = c1

# Describe/configure the source

a1.sources.r1.type = avro

a1.sources.r1.bind = r09n08

a1.sources.r1.port = 55555

a1.sources.r1.interceptors = i1

a1.sources.r1.interceptors.i1.type = timestamp

#hdfs sink
a1.sinks.k1.type = hdfs

a1.sinks.k1.hdfs.path = hdfs://r09n07:8020/project/dame/input/%Y%m%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = hdfs-
a1.sinks.k1.hdfs.rollInterval = 0
#a1.sinks.k1.hdfs.fileSuffix = .log

#a1.sinks.k1.hdfs.round = true

#a1.sinks.k1.hdfs.roundValue = 1

#a1.sinks.k1.hdfs.roundUnit = minute

a1.sinks.k1.hdfs.rollSize = 67108864

a1.sinks.k1.hdfs.rollCount = 0

#a1.sinks.k1.hdfs.writeFormat = Text

# Use a channel which buffers events in file

a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir=/home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs=/home/nids/wg/apache-flume-1.5.2-bin/datadir

a1.sinks.k1.hdfs.batchSize = 10000
#a1.sinks.k1.hdfs.callTimeout = 6000
#a1.sinks.k1.hdfs.appendTimeout = 6000

#a1.channels.c1.type = memory

a1.channels.c1.capacity = 100000

a1.channels.c1.transactionCapacity = 10000
a1.sources.r1.channels = c1

a1.sinks.k1.channel = c1

The above is the configuration file on machine 08.
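For comparison, here is a sketch of a variant of the machine 08 agent that might be worth testing. It is not a verified fix. It makes two changes. First, it switches the channel to the file type, reusing the checkpoint and data directories from the commented-out lines, so buffered events live on disk rather than on the JVM heap. Second, it adds `hdfs.idleTimeout`: with `rollInterval = 0`, `rollCount = 0` and `rollSize` at 64 MB, a 30 MB input never reaches a roll condition, so the output file can stay open as a `.tmp` file. The `idleTimeout` and `batchSize` values are assumptions chosen for illustration.

```properties
# Sketch only: a variant of the agent on machine 08, not a verified fix.
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Avro source receiving events from machine 10
a1.sources.r1.type = avro
a1.sources.r1.bind = r09n08
a1.sources.r1.port = 55555
a1.sources.r1.interceptors = i1
a1.sources.r1.interceptors.i1.type = timestamp
a1.sources.r1.channels = c1

# File channel: events are buffered on disk instead of the JVM heap
a1.channels.c1.type = file
a1.channels.c1.checkpointDir = /home/nids/wg/apache-flume-1.5.2-bin/checkpoint
a1.channels.c1.dataDirs = /home/nids/wg/apache-flume-1.5.2-bin/datadir
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

# HDFS sink
a1.sinks.k1.type = hdfs
a1.sinks.k1.channel = c1
a1.sinks.k1.hdfs.path = hdfs://r09n07:8020/project/dame/input/%Y%m%d/%H
a1.sinks.k1.hdfs.fileType = DataStream
a1.sinks.k1.hdfs.filePrefix = hdfs-
a1.sinks.k1.hdfs.rollInterval = 0
a1.sinks.k1.hdfs.rollSize = 67108864
a1.sinks.k1.hdfs.rollCount = 0
# Assumed value: close a file after 60 seconds without writes, so a
# 30 MB input is not left in an open .tmp file waiting for the 64 MB roll
a1.sinks.k1.hdfs.idleTimeout = 60
# Assumed value: events written per flush; must not exceed transactionCapacity
a1.sinks.k1.hdfs.batchSize = 1000
```

If the memory channel is kept instead, the out-of-memory errors may simply reflect a heap too small for 100000 buffered events. The heap size is normally set through `JAVA_OPTS` in `conf/flume-env.sh`.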

Below is the configuration file on machine 10.
 # Name the components on this agent
a1.sources = r1  
a1.sinks = k1  
a1.channels = c1

# Describe the sink
a1.sinks.k1.type = logger  

####
a1.sources.r1.type = spooldir 
a1.sources.r1.spoolDir = /home/nids/wg/apache-flume-1.5.2-bin/ceshi12
a1.sources.r1.fileHeader =false
a1.sources.r1.channels = c1
####

# Describe/configure the source
#a1.sources.r1.type = avro   
a1.sources.r1.bind = localhost  
a1.sources.r1.port = 44444 

# avro sink   
a1.sinks.k1.type = avro  
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = r09n08  
a1.sinks.k1.port = 55555

# Use a channel which buffers events in file
a1.channels = c1
a1.channels.c1.type = memory
#a1.channels.c1.checkpointDir = /home/nids/wg/apache-flume-1.5.2-bin/checkpoint
#a1.channels.c1.dataDirs = /home/nids/wg/apache-flume-1.5.2-bin/datadir

a1.sinks.k1.hdfs.batchSize = 10000 
#a1.channels.c1.type = memory  
a1.channels.c1.capacity = 100000  
a1.channels.c1.transactionCapacity = 10000  

# Bind the source and sink to the channel
a1.sources.r1.channels = c1  
a1.sinks.k1.channel = c1

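Could any experts help with this? Sometimes only part of the data is written and then nothing more arrives. Processing a single file works without problems; it is only when scanning a directory, with the channel type set to memory, that performance is extremely poor. I don't know where the problem lies.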
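As a starting point for narrowing this down, here is a sketch of a cleaned-up variant of the machine 10 agent. It rests on a few observations about the configuration as posted. The sink type is set twice (`logger`, then `avro`). The `bind` and `port` lines belong to the commented-out avro source and are not spooldir settings. `hdfs.batchSize` is an HDFS sink property, while the avro sink batches through `batch-size`, so the posted file probably leaves the avro sink at its default batch size. The batch values below are assumptions, not tuned numbers. Separately, the spooling directory source stops processing if a file is modified after being placed in `spoolDir` or if a file name is reused. That may be worth checking in the agent log when data stops arriving partway through.

```properties
# Sketch only: a cleaned-up variant of the agent on machine 10.
a1.sources = r1
a1.sinks = k1
a1.channels = c1

# Spooling directory source; files must not be modified or have their
# names reused after they are placed in spoolDir
a1.sources.r1.type = spooldir
a1.sources.r1.spoolDir = /home/nids/wg/apache-flume-1.5.2-bin/ceshi12
a1.sources.r1.fileHeader = false
# Assumed value: events handed to the channel per batch
a1.sources.r1.batchSize = 1000
a1.sources.r1.channels = c1

# Memory channel, as in the original; capacity counts events, not bytes
a1.channels.c1.type = memory
a1.channels.c1.capacity = 100000
a1.channels.c1.transactionCapacity = 10000

# Avro sink forwarding to the agent on machine 08
a1.sinks.k1.type = avro
a1.sinks.k1.channel = c1
a1.sinks.k1.hostname = r09n08
a1.sinks.k1.port = 55555
# Assumed value: the avro sink batches via batch-size, not hdfs.batchSize
a1.sinks.k1.batch-size = 1000
```

Both batch sizes are kept at or below `transactionCapacity`, since a source or sink cannot move more events in one transaction than the channel allows.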
