It processes data from diverse data sources such as Hadoop Distributed File System (HDFS), Amazon’s S3 system, Apache Cassandra, MongoDB, Alluxio, Apache Hive. It performs in-memory processing which makes it more powerful and fast. Data scientists believe that Spark executes 100 times faster than MapReduce as it can cache data in memory whereas MapReduce works more by reading and writing on disks. It was developed to overcome the limitations in the MapReduce paradigm of Hadoop. It is a general-purpose cluster computing system that provides high-level APIs in Scala, Python, Java, and R. It is a data processing engine hosted at the vendor-independent Apache Software Foundation to work on large data sets or big data. Spark is an open-source framework for running analytics applications.
0 Comments
Leave a Reply. |