A couple of days ago my company asked me to gather requirements for a new project: an e-book store. The idea is simple, but the total amount of data is about 4 TB and the number of files is around 500,000.
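To put those figures in perspective, here is a rough back-of-the-envelope sketch in Python. It assumes decimal terabytes and that both numbers are only approximate, and the function name is just something I made up for illustration:

```python
# Back-of-the-envelope: average file size for the store described above.
# Both constants are rough figures, not measured values.
TOTAL_BYTES = 4 * 10**12   # about 4 TB, assuming decimal terabytes
FILE_COUNT = 500_000       # approximate number of files


def average_file_size_mb(total_bytes: int, file_count: int) -> float:
    """Return the mean file size in megabytes (1 MB = 10**6 bytes)."""
    return total_bytes / file_count / 10**6


print(f"{average_file_size_mb(TOTAL_BYTES, FILE_COUNT):.1f} MB per file")
```

So on average each file would be around 8 MB, if my numbers are right.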
Since my team members use PHP and MySQL, I looked into what Apache offers for big data. I naturally came across Apache Hadoop and MySQL Cluster. But after several days of searching on Google, I'm completely confused! I now have these questions:
Is this amount of data (4-5 TB) even considered big data? (Some sources say Hadoop should be used for at least 5 TB of data; others say that big data for Hadoop means petabytes and zettabytes.)
Does Hadoop ship with its own special database, or should it be used with MySQL or something similar?
Does Hadoop work only on a cluster, or does it work just as well on a single-node server?
As I only came across these terms very recently, some or all of my questions may be really silly... But I'd be really grateful for any other suggestions for this type of project.