Skip to main content

Popular posts from this blog

Dilbert on Agile Programing

Dilbert on Agile and Extreme Programming -  Picked up from dilbert.com - Scott Adams.

Big Data: Why Traditional Data warehouses fail?

Over the years, have been involved with few of the data warehousing efforts.   As a concept, I believe that having a functional and active data  ware house is essential for an organization. Data warehouses facilitate easy analysis and help analysts in gathering insights about the business.   But my practical experiences suggest that the reality is far from the expectations. Many of the data warehousing initiatives end up as a high  cost, long gestation projects with questionable end results.   I have spoken to few of my associates who are involved in the area and it appears that  quite a few of them share my view. When I query the users and intended users of the data warehouses, I hear issues like: The system is inflexible and is not able to quickly adapt to changing business needs.  By the time, the changes get implemented on the system, the analytical need for which the changes were introduced is no longer relevant. The implementors of the datawarehouse are always look

Overview of Hadoop Ecosystem

Of late, have been looking into the Big Data space and Hadoop in particular.  When I started looking into it, found that there are so many products and tools related to Haddop.   Using this post summarize my discovery about Hadoop Ecosystem. Hadoop Ecosystem A small overview on each is listed below: Data Collection  - Primary objective of these is to move data into a Hadoop cluster Flume : - Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating and moving large amounts of log data from many different sources to a centralized data store. Developed by cloudera and currently being incubated at Apache software foundaton. The details about the same can be found here . Scribe : Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and be robust to network and node failures. Dveloped by Facebook and can be found here .  Chuckwa : Chukwa is a Hadoop subproject dev