Feature | Apache Hadoop | Apache Spark | Apache Flink |
--- | --- | --- | --- |
Data Processing | Hadoop is designed mainly for batch processing, which is very efficient for large datasets. | Spark supports batch processing as well as stream processing. | Flink supports both batch and stream processing and provides a single runtime for both. |
Stream Engine | MapReduce takes the complete dataset as input at once and produces the output; it has no streaming engine. | Processes data streams in micro-batches. | A true streaming engine that treats every workload as a stream: streaming, micro-batch, SQL, and batch. |
Data Flow | The data flow contains no loops; MapReduce supports only a linear chain of stages. | Spark represents the data flow as a directed acyclic graph (DAG). | Flink uses a controlled cyclic dependency graph at runtime, which lets it run iterative (e.g. ML) algorithms efficiently. |
Computation Model | Hadoop MapReduce uses a batch-oriented model. | Spark uses a micro-batching computation model. | Flink uses a continuous, operator-based streaming model. |
Performance | Slower than Spark and Flink. | Faster than Hadoop, slower than Flink. | Highest among the three. |
Memory Management | Configurable; memory can be managed either dynamically or statically. | The latest releases of Spark have automatic memory management. | Supports automatic memory management. |
Fault Tolerance | Highly fault-tolerant through its replication mechanism. | Spark RDDs provide fault tolerance through lineage. | Fault tolerance is based on Chandy-Lamport distributed snapshots, which allows high throughput. |
Scalability | Highly scalable and can be scaled up to tens of thousands of nodes. | Highly scalable. | It is also highly scalable. |
Iterative Processing | Does not support iterative processing. | Supports iterative processing. | Supports iterative processing natively, iterating over data through its streaming architecture. |
Supported Languages | Java, C, C++, Python, Perl, Groovy, Ruby, etc. | Java, Python, R, Scala. | Java, Scala, Python. |
Cost | Uses commodity hardware, which is less expensive. | Needs a lot of RAM, so the cost is relatively high. | Flink also needs a lot of RAM, so the cost is relatively high. |
Abstraction | No high-level abstraction in MapReduce. | Spark provides the RDD abstraction. | Flink provides the DataSet abstraction for batch and DataStream for streaming (see the word-count sketch after the table). |
SQL Support | Users can run SQL queries using Apache Hive. | Users can run SQL queries using Spark SQL; Hive is also supported. | Flink provides the Table API, which offers SQL-like expressions, and newer releases add a SQL interface (see the SQL sketch after the table). |
Caching | MapReduce cannot cache data. | Spark can cache data in memory. | Flink can also cache data in memory. |
Hardware Requirements | Runs well on less expensive commodity hardware. | Needs higher-end hardware. | Flink also needs higher-end hardware. |
Machine Learning | Apache Mahout is used for ML. | Spark implements ML algorithms with its own MLlib library. | Flink's FlinkML library is used for ML. |
Lines of Code | Hadoop 2.0 has about 120,000 lines of code. | Spark is developed in about 20,000 lines of code. | Flink is developed in Scala and Java, so its line count is lower than Hadoop's. |
High Availability | Configurable in High Availability Mode. | Configurable in High Availability Mode. | Configurable in High Availability Mode. |
Amazon S3 Connector | Provides support for the Amazon S3 connector. | Provides support for the Amazon S3 connector. | Provides support for the Amazon S3 connector. |
Backpressure Handling | Hadoop handles backpressure through manual configuration. | Spark also handles backpressure through manual configuration. | Flink handles backpressure implicitly through its system architecture. |
Window Criteria | Hadoop has no window criteria, since it does not support streaming. | Spark has time-based windows. | Flink has record-based (count) windows (see the windowing sketch after the table). |
Apache License | Apache License 2.0. | Apache License 2.0. | Apache License 2.0. |
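To make the data-processing, computation-model, and abstraction rows concrete, here is a minimal word-count sketch for each engine, assuming the standard Spark and Flink Scala dependencies; the input path, host, and port are placeholders. The Spark version uses the batch RDD abstraction and processes the whole dataset at once:

```scala
import org.apache.spark.sql.SparkSession

// Spark: batch-oriented RDD abstraction; the complete dataset is read, then processed.
object SparkBatchWordCount {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("batch-word-count")
      .master("local[*]")          // local mode, for illustration only
      .getOrCreate()

    spark.sparkContext
      .textFile("input.txt")       // placeholder input path
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)
      .collect()
      .foreach(println)

    spark.stop()
  }
}
```

The Flink version expresses the same computation over the streaming DataStream abstraction, updating counts continuously as records arrive:

```scala
import org.apache.flink.streaming.api.scala._

// Flink: true streaming DataStream abstraction; records are processed as they arrive.
object FlinkStreamWordCount {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env.socketTextStream("localhost", 9999)   // placeholder socket source
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .keyBy(_._1)
      .sum(1)                                 // running count per word
      .print()

    env.execute("stream-word-count")
  }
}
```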
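The SQL-support row can be illustrated with a small Spark SQL query; the view name and sample data are invented for this sketch, and Flink's Table API would express an equivalent relational query over a DataSet or DataStream.

```scala
import org.apache.spark.sql.SparkSession

object SparkSqlExample {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("sql-example")
      .master("local[*]")
      .getOrCreate()
    import spark.implicits._

    // Register an in-memory DataFrame as a temporary view and query it with SQL.
    Seq(("books", 12.0), ("music", 7.5), ("books", 3.25))
      .toDF("category", "amount")
      .createOrReplaceTempView("sales")

    spark.sql("SELECT category, SUM(amount) AS total FROM sales GROUP BY category")
      .show()

    spark.stop()
  }
}
```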
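For the window-criteria row, the contrast is between windows defined by time and windows defined by record count. The sketches below assume the classic Spark Streaming (DStream) API and Flink's keyed-stream count windows; hosts, ports, and window sizes are illustrative only.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}

// Spark Streaming: time-based window (last 30 s of data, recomputed every 10 s).
object SparkTimeWindow {
  def main(args: Array[String]): Unit = {
    val ssc = new StreamingContext(
      new SparkConf().setAppName("time-window").setMaster("local[2]"),
      Seconds(1))                  // 1-second micro-batches

    ssc.socketTextStream("localhost", 9999)
      .map(line => (line, 1))
      .reduceByKeyAndWindow((a: Int, b: Int) => a + b, Seconds(30), Seconds(10))
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

```scala
import org.apache.flink.streaming.api.scala._

// Flink: record-based (count) window; emit a sum every 100 records per key.
object FlinkCountWindow {
  def main(args: Array[String]): Unit = {
    val env = StreamExecutionEnvironment.getExecutionEnvironment

    env.socketTextStream("localhost", 9999)
      .map(line => (line, 1))
      .keyBy(_._1)
      .countWindow(100)
      .sum(1)
      .print()

    env.execute("count-window")
  }
}
```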