Amazon Cloud Now Supports Spark and Enterprise SQL Server

The Amazon cloud is all about data lately. AWS just added support for the Apache Spark project for Big Data analytics. It also added a machine image offering for SQL Server Enterprise Edition.

Yesterday, Jon Fritz of Amazon Web Services Inc. (AWS) announced that the Amazon EMR Web service now supports the open-source Spark distributed processing framework. Fritz pointed out that Spark improves on MapReduce, the original batch-oriented processing component of Apache Hadoop. This is somewhat ironic, since EMR stands for Elastic MapReduce. Spark uses new technologies and approaches to overcome the limitations of MapReduce.

As Big Data analytics has evolved, it has taken on more complex use cases and new applications. Fritz said that Hadoop MapReduce has long been a great tool for large-scale data processing, batch reports, ad hoc analysis of unstructured data, and machine learning (ML). “Apache Spark is a new distributed processing framework in Hadoop that increases job performance and speeds up development for certain workloads.”

Figure: The Spark Ecosystem (source: Databricks Inc.)

Those workloads include stream processing, machine learning, and fast, interactive SQL queries. Spark suits these use cases because of features such as in-memory computation and a directed acyclic graph (DAG) execution engine. Together these boost analytic performance, particularly in real-time and iterative scenarios. Spark also provides libraries for specific applications such as ML, stream processing, and graph processing.

Spark is currently the most active open-source project in Big Data. Databricks Inc., the project's commercial steward, counted about 230 contributors to the newly released version 1.4. A growing number of companies are jumping on the Spark bandwagon. IBM, for one, announced a major investment in the technology.
Big Blue plans to incorporate Spark into its key data-driven products and services. It is also investing enormous resources in the project, including a commitment of more than 3,500 engineers to help with development efforts. Now Amazon has joined the bandwagon, too.

Figure: Stack Overflow Activity (source: RedMonk)

“Spark on Hadoop is natively supported by Amazon EMR. You can quickly and easily create managed Spark clusters from the AWS Management Console, the AWS CLI, or the Amazon EMR API,” the Amazon EMR Spark page states. Users can also take advantage of other Amazon EMR features. These include fast Amazon S3 connectivity through the Amazon EMR File System (EMRFS), integration with the Amazon EC2 Spot market, and resize commands that add or remove instances from a running cluster. There is no additional charge for using Spark in Amazon EMR.

The Spark announcement is not the only sign of AWS' data-driven focus. AWS also announced a new Microsoft SQL Server Enterprise Edition Amazon Machine Image (AMI) for Amazon Elastic Compute Cloud (EC2), which Jeff Barr described in a blog post. Compared with the Standard Edition, Enterprise can use more compute power and memory. Standard is limited to 16 cores and 128 gigabytes of memory, while Enterprise can use 32 cores and 244 gigabytes of memory on an extra-large instance. The offering includes both SQL Server 2012 Enterprise Edition and SQL Server 2014 Enterprise Edition. These AMIs are available in many regions, as listed in the AWS Marketplace. Barr highlighted the following unique features of the offering.
High availability allows users to configure a primary and up to four active, readable secondary databases into an Al