Lately I have been looking into the Big Data space, and Hadoop in particular. When I started, I found that there are a great many products and tools related to Hadoop. This post summarizes what I discovered about the Hadoop ecosystem.
Data Collection - The primary objective of these tools is to move data into a Hadoop cluster.
Flume: Apache Flume is a distributed, reliable, and available system for efficiently collecting, aggregating, and moving large amounts of log data from many different sources to a centralized data store. It was developed by Cloudera and is currently being incubated at the Apache Software Foundation. Details can be found here.
Scribe: Scribe is a server for aggregating streaming log data. It is designed to scale to a very large number of nodes and to be robust to network and node failures. It was developed by Facebook and can be found here.
Chukwa: Chukwa is a Hadoop subproject devoted to large-scale log collection and analysis. Chukwa is built on top of the Hadoop Distributed File System (HDFS) and the MapReduce framework, and inherits Hadoop’s scalability and robustness. Details about Chukwa can be found here.
Kafka: Kafka provides a publish-subscribe solution that can handle all activity stream data and processing on a consumer-scale web site. Kafka is being incubated at the Apache Software Foundation, and details can be found here.
Sqoop - Sqoop allows easy import and export of data from structured data stores such as relational databases, enterprise data warehouses, and NoSQL systems. Using Sqoop, you can provision data from an external system onto HDFS and populate tables in Hive and HBase. Cloudera is the creator of Sqoop, and the project is currently undergoing incubation at the Apache Software Foundation. More information on this project can be found here.
HIHO - HIHO aims to integrate Hadoop with existing data-centric systems. Hadoop needs to be connected to different systems, such as databases and report display tools, to fully leverage the functionality offered by each. The HIHO project attempts to do so. Details on HIHO can be found here.
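To make the publish-subscribe idea mentioned under Kafka a little more concrete, here is a minimal in-memory sketch. It is not the Kafka API: the `ToyBroker` class, the topic name, and the consumer names are all hypothetical and exist only for illustration. The idea it shows is that producers append messages to a named topic, and each consumer reads from that topic at its own pace by tracking its own position.

```python
from collections import defaultdict


class ToyBroker:
    """In-memory stand-in for a publish-subscribe broker (hypothetical, not the Kafka API)."""

    def __init__(self):
        self._logs = defaultdict(list)    # topic -> append-only list of messages
        self._offsets = defaultdict(int)  # (topic, consumer) -> next index to read

    def publish(self, topic, message):
        """Append a message to the end of the topic's log."""
        self._logs[topic].append(message)

    def poll(self, topic, consumer):
        """Return the messages this consumer has not seen yet and advance its offset."""
        start = self._offsets[(topic, consumer)]
        messages = self._logs[topic][start:]
        self._offsets[(topic, consumer)] = len(self._logs[topic])
        return messages


broker = ToyBroker()
broker.publish("page_views", {"user": "u1", "url": "/home"})
broker.publish("page_views", {"user": "u2", "url": "/search"})

# Two independent consumers each receive the full stream so far.
analytics_batch = broker.poll("page_views", "analytics")
audit_batch = broker.poll("page_views", "audit")

# A later message is delivered only once to a consumer that is already caught up.
broker.publish("page_views", {"user": "u1", "url": "/cart"})
analytics_next = broker.poll("page_views", "analytics")
```

Because every consumer keeps its own offset, adding a new consumer does not affect the existing ones, which is what makes this model a good fit for feeding the same activity stream to several downstream systems.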
Core Engine - These form the core engine for storage and processing.
HDFS - HDFS (Hadoop Distributed File System) is the primary distributed storage used by Hadoop applications. An HDFS cluster primarily consists of a NameNode that manages the file system metadata and DataNodes that store the actual data. HDFS is part of the core Hadoop framework and is a subproject of the Apache Hadoop project. Additional information pertaining to HDFS can be found here.
MapReduce - Hadoop MapReduce is a programming model and software framework for writing applications that rapidly process vast amounts of data in parallel on large clusters of compute nodes. A good starting point for understanding MapReduce would be here.
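The MapReduce programming model is easiest to grasp through the classic word-count example. The sketch below runs the three phases (map, shuffle, reduce) in a single Python process; it does not use Hadoop itself, and the function names are my own for illustration. On a real cluster the framework would run many map and reduce tasks in parallel and perform the shuffle across nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle: group all emitted values by key, as the framework does between phases."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce: combine all values for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    mapped = (pair for line in lines for pair in map_phase(line))
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


counts = word_count(["big data on hadoop", "hadoop stores big data"])
print(counts)
```

Each call to `map_phase` depends only on its own input line, and each call to `reduce_phase` depends only on its own key, so both phases can be spread across many machines without coordination. That independence is what lets the model scale to large clusters.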
Data Storage - Specialized storage on Hadoop.
HBase - HBase is the Hadoop database. We can think of it as a distributed, scalable, big data store. The most common scenario for HBase is when you need random, real-time read/write access to your Big Data. HBase is a subproject of the Apache Hadoop project, and information on HBase can be found here.
Support Extensions - These help in extending and supporting core activities on Hadoop
Avro - Avro is an addition to the Apache family of products and addresses the area of data serialization. Apache Avro is a data serialization system that provides rich data structures; a compact, fast, binary data format; a container file to store persistent data; remote procedure call (RPC); and integration with dynamic languages. Information on Avro can be found here.
ZooKeeper - ZooKeeper is part of the Apache family of products and is a centralized service for maintaining configuration information, naming, providing distributed synchronization, and providing group services. All of these kinds of services are used in some form or another by distributed applications like Hadoop. Details about ZooKeeper can be found here.
Oozie - Oozie is a workflow/coordination system for managing Apache Hadoop jobs. Oozie is being incubated at the Apache Software Foundation, and details regarding Oozie can be found here.
Thrift - The Apache Thrift software framework, for scalable cross-language services development, combines a software stack with a code generation engine to build services that work efficiently and seamlessly between C++, Java, Python, PHP, Ruby, Erlang, Perl, Haskell, C#, Cocoa, JavaScript, Node.js, Smalltalk, OCaml, Delphi, and other languages. Thrift also provides a set of APIs to connect with HBase. Details on Thrift can be found here.
Scripting Languages - These provide easy ways to facilitate processing on Hadoop
Jaql - Jaql is a high-level scripting language for JavaScript Object Notation (JSON). It is able to run on Hadoop and break most requests down into Map/Reduce tasks. Jaql borrows heavily from SQL, XQuery, LISP, Pig Latin, JavaScript, and Unix pipes. It was developed primarily inside IBM and is part of the BigInsights project. A good starting point for Jaql is here.
Pig - Pig is a high-level scripting language for data transformation. It is a Hadoop Subproject and can be looked at as a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. Details about Pig can be found here.
Hive - Hive is a data warehouse system for Hadoop that facilitates easy data summarization, ad hoc queries, and the analysis of large datasets stored in Hadoop-compatible file systems. Hive is a subproject of the Apache Hadoop project, and information about Hive can be found here.
Data Analytics - This is specialized software for performing analytics on data stored on Hadoop.
Intellicus: Intellicus extends its ad hoc reporting and dashboarding solution to support Big Data. It provides data source connectivity through Hive to Hadoop variants such as Apache and Cloudera, and it offers both background batch processing and online query processing for ad hoc reporting using HiveQL. Intellicus has a basic free edition as well as feature-rich paid editions. The website for Intellicus is https://www.intellicus.com
Karmasphere - Karmasphere provides self-service access to Big Data and analytic functions for faster, more efficient analysis and collaboration. It lets you visually explore data for patterns and trends, and iteratively analyze data using familiar queries and skills. Look up details of Karmasphere at https://karmasphere.com
Monitoring and management - These help to monitor and manage the jobs and data on Hadoop
Ganglia: Ganglia is a scalable distributed monitoring system for high-performance computing systems such as clusters and grids. It uses carefully engineered data structures and algorithms to achieve very low per-node overheads and high concurrency. Ganglia is a BSD-licensed open-source project that grew out of the University of California, Berkeley Millennium Project. The details of Ganglia can be found here.
Hue: Hue is both a web UI for Hadoop and a framework for creating interactive web applications. It features a FileBrowser for accessing HDFS, JobSub and JobBrowser applications for submitting and viewing MapReduce jobs, and a Beeswax application for interacting with Hive. Hue grew out of Cloudera, and details pertaining to Hue can be found here.
Cacti: Cacti is a complete network graphing solution designed to harness the power of RRDTool's data storage and graphing functionality. Cacti provides a fast poller, advanced graph templating, multiple data acquisition methods, and user management features out of the box. There is a good set of Hadoop Cacti templates that can be used to visualize the data exposed by Hadoop JMX. Look for details on Cacti here.
Hadoop high-level applications - These are applications and frameworks that leverage Hadoop for specialized purposes.
Search Related - This is software that leverages Hadoop for web crawling and indexing.
Nutch: Apache Nutch is an open source web-search software project of the Apache Software Foundation. Nutch is a web crawler written in Java that discovers web page hyperlinks in an automated manner. Nutch can run on Hadoop clusters, and a good source to begin reading about this would be here.
Solr: Solr is the popular, fast, open source enterprise search platform from the Apache Lucene project. Its major features include powerful full-text search, hit highlighting, faceted search, dynamic clustering, database integration, rich document (e.g., Word, PDF) handling, and geospatial search. Solr integrates easily with Nutch: Nutch crawls the web and Solr indexes the crawled data. Details of Solr can be found here.
Machine Learning - Specialized software that implement machine learning algorithms over Hadoop.
Mahout - Mahout is a subproject in the Apache family of products on Hadoop. The Apache Mahout project's goal is to build scalable machine learning libraries. Mahout's core algorithms for clustering, classification, and batch-based collaborative filtering are implemented on top of Apache Hadoop using the map/reduce paradigm. A starting point for information on Mahout would be here.
Hi Girish,
I am working with Intellicus, one of the products that you referred to in your blog. I would like to share the progress made by Intellicus in the Big Data space during the last year. The latest release of Intellicus provides data source connectivity to Hive, Pig, HBase, custom MapReduce jobs, HDFS, AWS, S3. One can also invoke an R server from Intellicus and create interactive charts in Intellicus from the response received from the R server. Intellicus also supports ad hoc reporting over Greenplum, Vertica, Teradata, GridSQL & Sand.
Intellicus also provides a unique solution which is MOLAP on Hadoop, which takes the execution of the cube to Hadoop rather than bringing data out of Hadoop.