dongpao1926
2014-01-13 21:24
Viewed 28
Accepted

Will Hadoop come in handy for my project? [closed]

A couple of days ago I was asked by my company to gather requirements for starting a project. The project is an e-book store. It sounds simple, but the total amount of data is about 4 TB, spread across roughly 500,000 files.

As my team members use PHP and MySQL, I looked around the Apache ecosystem for big-data tools. I quickly ran into Apache Hadoop, and MySQL Cluster for big data. But after several days of digging on Google, I'm now completely confused! I now have these questions:

  1. Is this amount of data (4-5 TB) even considered big data? (Some sources say you need at least 5 TB of data before using Hadoop; others say big data for Hadoop means petabytes and zettabytes.)

  2. Does Hadoop ship with its own special database, or should it be used with MySQL or something similar?

  3. Does Hadoop work only on a cluster, or does it run just as well on a single-node server?

Since I came across these terms only very recently, I suspect some or all of my questions may be really silly... but I'd be grateful for any other suggestions for this type of project.



2 Answers

  • duanjing2013 2014-01-14 00:49
    Accepted

    Here are my short answers:

    • Is this amount of data (4-5 TB) even considered big data? (Some sources say you need at least 5 TB of data before using Hadoop; others say big data for Hadoop means petabytes and zettabytes.)

      • Yes and no. For certain use cases this is not big data, while for others it is. Questions that should be asked and answered:

      • Is this data growing? What is the rate of growth?

      • Are you going to run analytics on this data from time to time?
    • Does Hadoop ship with its own special database, or should it be used with MySQL or something similar?

      • Yes, Hadoop has the HDFS file system, which can store flat files and be treated as a data repository. But that may not be the best solution; you may want to look at NoSQL databases like Cassandra, HBase, or MongoDB.
    • Does Hadoop work only on a cluster, or does it run just as well on a single-node server?

      • Technically, yes: Hadoop can run on a single node in pseudo-distributed or standalone mode. But that is used only for learning or for development testing. For any production environment you should plan on a Hadoop cluster spanning multiple VMs; the smallest I've seen in production was 6 VMs.
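    A pseudo-distributed setup on one machine only needs a couple of configuration changes. A minimal sketch, using Hadoop 2.x property names (the port and hostname are illustrative, not a recommendation):

    ```xml
    <!-- core-site.xml: point clients at a local HDFS namenode -->
    <configuration>
      <property>
        <name>fs.defaultFS</name>
        <value>hdfs://localhost:9000</value>
      </property>
    </configuration>

    <!-- hdfs-site.xml: a single node can only keep one copy of each block -->
    <configuration>
      <property>
        <name>dfs.replication</name>
        <value>1</value>
      </property>
    </configuration>
    ```

    This mode exercises the full HDFS/MapReduce code path on one box, which is why it is good for learning but tells you nothing about production sizing.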

    As such, 5 TB is not a very big volume for a relational DB that supports clustering. But the cost of supporting a relational DB grows steeply with capacity, while with Hadoop and plain HDFS the cost is very low; add Cassandra or HBase and there's not much difference. But remember: with plain Hadoop, you are looking at a high-latency batch system. If your expectation is that Hadoop will answer your queries in real time, please look for other solutions. (E.g., for a query like "list all books checked out to Xyz", just get it from the DB; don't use Hadoop for that query.)
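    To make the latency point concrete, here is a hypothetical sketch of that "books checked out to Xyz" lookup. It uses Python's built-in sqlite3 as a stand-in for the asker's MySQL; the table and names are invented for illustration:

    ```python
    import sqlite3

    # A point query like this belongs in the relational DB, not in a
    # high-latency Hadoop batch job. sqlite3 stands in for MySQL here.
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE checkouts (user TEXT, book TEXT)")
    conn.executemany(
        "INSERT INTO checkouts VALUES (?, ?)",
        [("xyz", "Moby-Dick"), ("xyz", "Dune"), ("abc", "Emma")],
    )

    # Millisecond-latency lookup via a simple WHERE clause -- no cluster needed
    rows = conn.execute(
        "SELECT book FROM checkouts WHERE user = ?", ("xyz",)
    ).fetchall()
    print([b for (b,) in rows])  # ['Moby-Dick', 'Dune']
    ```

    A Hadoop job answering the same question would scan files across the cluster and take seconds to minutes; the right tool depends on the access pattern, not the data size alone.
    
    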

    Overall, my suggestion is: take a crash course on Hadoop from YouTube or Cloudera, try to gain some understanding of what Hadoop is and is not, and then decide. Your questions give the impression that you have a long learning curve ahead, and it is worth taking on that challenge.

  • dongyan5641 2014-01-13 21:52

    This should be a comment, but it is too long.

    Hadoop is a framework for writing parallel software, originally written by Yahoo. It is loosely based on a framework developed at Google in the early 2000s, which in turn was a parallel implementation of the map-reduce primitives from the Lisp language. You can think of Hadoop as a bunch of libraries that run either on hardware you own or on hardware in the cloud. These libraries provide a programming interface to Java and to other languages. It allows you to take advantage of a cluster of processors and disks (with HDFS). Its major features are scalability and fault tolerance, both very important for large data problems.

    Hadoop implements a programming methodology built around a parallel implementation of map-reduce. That was the original application. Nowadays, lots of things are built on Hadoop. You should start with the Apache project description and the Wikipedia page to learn more.
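    The map-reduce pattern the paragraph above describes can be sketched in a few lines of plain Python. This is a toy, single-process illustration of the idea only; real Hadoop distributes the map, shuffle, and reduce phases across a cluster:

    ```python
    from itertools import groupby

    def map_phase(docs):
        # map: emit (key, value) pairs -- here (word, 1) for a word count
        return [(word, 1) for doc in docs for word in doc.split()]

    def reduce_phase(pairs):
        # shuffle: group pairs by key; reduce: sum the values in each group
        pairs.sort(key=lambda kv: kv[0])
        return {key: sum(v for _, v in group)
                for key, group in groupby(pairs, key=lambda kv: kv[0])}

    docs = ["big data big books", "books about data"]
    counts = reduce_phase(map_phase(docs))
    print(counts)  # {'about': 1, 'big': 2, 'books': 2, 'data': 2}
    ```

    Because each map call and each per-key reduction is independent, the framework can run them on different machines and rerun any that fail, which is where the scalability and fault tolerance come from.
    
    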

    Several databases support interfaces to Hadoop (Aster Data comes to mind). Often when one thinks of "databases" and "Hadoop", one is thinking of Pig, Hive, or a related open-source project.

    As for your question: if your data conforms naturally to a relational database (tables with columns connected by keys), then use a relational database. If you need fast performance in web applications with hierarchical data, then learn about NoSQL solutions, such as MongoDB. If your data has a complex structure, requires scalability, and you have programming skills on your team, then think about a Hadoop-based component in the solution. And, for a large project, multiple technologies are often needed for different components -- real-time operations using NoSQL, reporting using SQL, ad hoc querying using a combination of SQL and Hadoop (for instance).

