Difference between Hadoop 1 and Hadoop 2

Last Updated: 23 Jun, 2022

Hadoop is an open-source software framework for storing large amounts of data and running computations on it. The framework is written mainly in Java, with some native code in C and shell scripts.

Hadoop 1 vs Hadoop 2

1. Components: Hadoop 1 has MapReduce, whereas Hadoop 2 has YARN (Yet Another Resource Negotiator) and MapReduce version 2 (MRv2).

| Hadoop 1 | Hadoop 2 |
|---|---|
| HDFS | HDFS |
| MapReduce | YARN / MRv2 |

2. Daemons:

| Hadoop 1 | Hadoop 2 |
|---|---|
| Namenode | Namenode |
| Datanode | Datanode |
| Secondary Namenode | Secondary Namenode |
| Job Tracker | Resource Manager |
| Task Tracker | Node Manager |

3. Working: In Hadoop 1, HDFS is used for storage, and on top of it MapReduce handles both resource management and data processing. This double workload on MapReduce hurts performance.

In Hadoop 2, HDFS is again used for storage, and on top of HDFS sits YARN, which handles resource management. It allocates resources and keeps the cluster's work running, leaving data processing to MapReduce version 2.

4. Limitations: Hadoop 1 is a master-slave architecture consisting of a single master and multiple slaves. If the master node crashes, the cluster becomes unusable no matter how healthy the slave nodes are. Rebuilding the cluster means copying system files, image files, and so on to another system, which is too time-consuming for organizations today.

Hadoop 2 is also a master-slave architecture, but it consists of multiple masters (i.e., active namenodes and standby namenodes) and multiple slaves. If the active master node crashes, a standby master node takes over. You can configure multiple active-standby combinations. Hadoop 2 thus eliminates the single point of failure.

5. Ecosystem: Oozie is a workflow scheduler.
It decides when jobs run according to their dependencies. Pig, Hive and Mahout are data processing tools that run on top of Hadoop. Sqoop is used to import and export structured data: it transfers data directly between SQL databases and HDFS. Flume is used to import and export unstructured and streaming data.

6. Windows Support: Apache provides no support for Microsoft Windows in Hadoop 1, whereas Hadoop 2 supports Microsoft Windows.

Author: AnkitSelwal
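The difference described under Limitations, a single master in Hadoop 1 versus active and standby masters in Hadoop 2, can be illustrated with a small toy model. This is only a sketch: `ToyCluster`, `crash_active` and the node names are hypothetical and are not part of any Hadoop API. In a real cluster, failover is handled by Hadoop's own high-availability machinery, not by application code like this.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ToyCluster:
    """Toy model of a cluster's master nodes (hypothetical, not a Hadoop API)."""
    active: Optional[str]
    standbys: List[str] = field(default_factory=list)

    def crash_active(self) -> bool:
        """Simulate a crash of the active master.

        Returns True if a standby took over, False if no master is left.
        """
        if self.standbys:
            # Hadoop 2 style: promote the first standby to active.
            self.active = self.standbys.pop(0)
            return True
        # Hadoop 1 style: there is no standby, so the cluster has no master.
        self.active = None
        return False

    @property
    def available(self) -> bool:
        """The cluster is usable only while it has an active master."""
        return self.active is not None


# Hadoop 1: a single master. Hadoop 2: one active and one standby master.
hadoop1 = ToyCluster(active="namenode-1")
hadoop2 = ToyCluster(active="namenode-1", standbys=["namenode-2"])

hadoop1.crash_active()
hadoop2.crash_active()

print(hadoop1.available)                  # False
print(hadoop2.available, hadoop2.active)  # True namenode-2
```

In this model the Hadoop 1 cluster is unavailable after a single master crash, while the Hadoop 2 cluster keeps running on the promoted standby. A second crash with no standbys left would leave it without a master as well.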