Musings on Tech

Overview of Hadoop Ecosystem

Of late, I have been looking into the Big Data space, and Hadoop in particular. When I started, I found that there are many products and tools related to Hadoop. This post summarizes what I have discovered about the Hadoop ecosystem.

A small overview of each part of the Hadoop ecosystem is listed below:

Data Collection - The primary objective of these tools is to move data into a Hadoop cluster.

Flume: Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. It was developed by Cloudera and is currently being incubated at the Apache Software Foundation. The details can be found here.

Scribe: Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and to be robust to network and node failures. It was developed by Facebook and can be found here.

Chukwa: Chuk...
