Big Data: Why Do Traditional Data Warehouses Fail?

Over the years, I have been involved in a few data warehousing efforts. As a concept, I believe that a functional, actively used data warehouse is essential for an organization. Data warehouses facilitate analysis and help analysts gather insights about the business.

But my practical experience suggests that the reality falls far short of expectations. Many data warehousing initiatives end up as high-cost, long-gestation projects with questionable end results. I have spoken to a few associates who work in this area, and quite a few of them share my view.

When I ask current and intended users of these data warehouses, I hear issues like:
  • The system is inflexible and cannot adapt quickly to changing business needs.
  • By the time changes are implemented in the system, the analytical need that prompted them is no longer relevant.
  • The implementors of the data warehouse are always asking for requirements against which the model can be defined and implemented. But the fact is that we are in no position to foresee all the requirements upfront, and we want the system to be flexible enough to handle changing needs.
  • Most of the data warehouse systems I use are legacy systems that are too complex and unmanageable to work with. I am better off building point solutions that address my specific need quickly.
  • The latency of the system is high. There are cases where we need to do quick analysis, but it takes a long time before the data is available in the warehouse.
It is not that the implementors of these systems lack peeves of their own. Some of them go like this:
  • We are building systems that are optimized to provide answers. Unless the users tell us what kinds of answers they intend to get from the system, how can we model it?
  • By definition, data warehouses and OLAP systems are built to be read-intensive, with infrequent writes. This is accomplished through batch loading. Data latency is inherent in this design, and data warehouses are not intended for real-time or pseudo-real-time analytics.
  • Users underestimate the effort and cost of ensuring data quality and integrity. It takes significant time, effort and cost to ensure the integrity of the data stored in the warehouse. Everyone underestimates the impact of having "dirty data" in the warehouse.
  • Everyone should understand that, at a very fundamental level, schemas are static and predefined. Everything is built on this assumption.
  • Any change that affects the schema or storage has a cascading impact, and implementing such changes takes time.
  • The number of people and groups we need to interact with is mind-boggling. We need to work simultaneously with the business owners, the data owners and the analysts. While we are talking to these folks, they are also working in parallel to develop point solutions that address their specific needs. There is always a conflict of priorities.
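
To make the static-schema point concrete, here is a minimal sketch in Python using the standard sqlite3 module. It is only an illustration under assumptions of my own: the `sales` table, its columns, the `batch_load` helper and the sample records are all hypothetical and do not come from any real warehouse. It shows a batch load into a predefined schema, where a record carrying an attribute the schema did not anticipate cannot be loaded until the schema and the loader are both changed.

```python
import sqlite3

# Hypothetical warehouse table whose columns are fixed up front.
SCHEMA_COLUMNS = ("order_id", "region", "amount")


def batch_load(conn, rows):
    """Load a batch of dict records into the fixed-schema table.

    Records whose fields do not exactly match the predefined schema
    are set aside instead of being loaded.
    """
    loaded, rejected = 0, []
    for row in rows:
        if set(row) != set(SCHEMA_COLUMNS):
            rejected.append(row)
            continue
        conn.execute(
            "INSERT INTO sales (order_id, region, amount) VALUES (?, ?, ?)",
            tuple(row[col] for col in SCHEMA_COLUMNS),
        )
        loaded += 1
    conn.commit()
    return loaded, rejected


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (order_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# A made-up nightly batch. The third record reflects a new business need:
# a 'channel' attribute that the original model never anticipated.
nightly_batch = [
    {"order_id": 1, "region": "south", "amount": 120.0},
    {"order_id": 2, "region": "north", "amount": 75.5},
    {"order_id": 3, "region": "east", "amount": 60.0, "channel": "mobile"},
]

loaded, rejected = batch_load(conn, nightly_batch)
print(f"loaded={loaded}, rejected={len(rejected)}")
```

In this toy case the fix is a single extra column, but in a real warehouse the same change would ripple through the load jobs, the stored history and every report built on the table, which is the cascading impact the implementors complain about.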
Looking at it objectively, I find that both sides of the argument are rooted in logic and deserve merit. More importantly, these arguments offer good insight into the problems that affect such initiatives and why many warehousing projects fail.


But these issues also provide an opportunity to start building a system that bridges the growing gap between the users and the implementors.
