
Apache Spark 2.0.0 Released with API Updates

[Date: 2016-08-02] Source: OSChina

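  Apache Spark 2.0.0 has been released. Apache Spark is an open-source cluster computing environment similar to Hadoop, but the two differ in ways that make Spark perform better on certain workloads. In particular, Spark supports in-memory distributed datasets, which speed up iterative workloads in addition to enabling interactive queries.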

  This release mainly updates the APIs, adds support for SQL 2003 and R UDFs, and improves performance. 300 developers contributed 2,500 patches.


  API updates in Apache Spark 2.0.0:

  Unifying DataFrame and Dataset: In Scala and Java, DataFrame and Dataset have been unified, i.e. DataFrame is just a type alias for Dataset of Row. In Python and R, given the lack of type safety, DataFrame is the main programming interface.

  SparkSession: new entry point that replaces the old SQLContext and HiveContext for DataFrame and Dataset APIs. SQLContext and HiveContext are kept for backward compatibility.

  A new, streamlined configuration API for SparkSession

  Simpler, more performant accumulator API

  A new, improved Aggregator API for typed aggregation in Datasets
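
  Spark's typed Aggregator is organized around four steps: a zero value, a per-record reduce, a merge of partial buffers, and a finish step. The sketch below mirrors that shape in plain Python so that it runs without a cluster. It is only an illustration: the names `AverageAggregator` and `run_partitioned` are hypothetical and are not part of Spark, whose Aggregator API is exposed in Scala and Java.

```python
from functools import reduce
from typing import Iterable, List, Tuple

Buffer = Tuple[float, int]  # (running sum, running count)


class AverageAggregator:
    """Hypothetical stand-in that follows the zero/reduce/merge/finish shape."""

    def zero(self) -> Buffer:
        # Neutral buffer: merging it with any other buffer changes nothing.
        return (0.0, 0)

    def reduce(self, buf: Buffer, value: float) -> Buffer:
        # Fold one input record into a partition-local buffer.
        return (buf[0] + value, buf[1] + 1)

    def merge(self, left: Buffer, right: Buffer) -> Buffer:
        # Combine two partial buffers produced on different partitions.
        return (left[0] + right[0], left[1] + right[1])

    def finish(self, buf: Buffer) -> float:
        # Turn the final buffer into the output value.
        total, count = buf
        return total / count if count else float("nan")


def run_partitioned(agg: AverageAggregator,
                    partitions: Iterable[List[float]]) -> float:
    """Reduce each partition on its own, then merge the partial results."""
    partials = [reduce(agg.reduce, part, agg.zero()) for part in partitions]
    merged = reduce(agg.merge, partials, agg.zero())
    return agg.finish(merged)


# Three "partitions", one of them empty.
average = run_partitioned(AverageAggregator(), [[1.0, 2.0], [3.0], []])
```

  Keeping reduce and merge separate is what lets each partition be aggregated independently before the partial buffers are combined.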

  SQL updates in Apache Spark 2.0.0:

  A native SQL parser that supports both ANSI SQL and HiveQL

  Native DDL command implementations

  Subquery support, including:

    Uncorrelated scalar subqueries

    Correlated scalar subqueries

    NOT IN predicate subqueries (in WHERE/HAVING clauses)

    IN predicate subqueries (in WHERE/HAVING clauses)

    (NOT) EXISTS predicate subqueries (in WHERE/HAVING clauses)

  View canonicalization support
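
  The subquery forms in the list above are standard SQL. As a rough illustration, the sketch below runs one query of each kind with Python's built-in `sqlite3` module. SQLite is used here only as a stand-in so the example runs without a cluster; it is not Spark, and the `dept` and `emp` tables are made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dept (id INTEGER, name TEXT);
CREATE TABLE emp (name TEXT, dept_id INTEGER, salary INTEGER);
INSERT INTO dept VALUES (1, 'eng'), (2, 'ops'), (3, 'empty');
INSERT INTO emp VALUES ('ann', 1, 120), ('bob', 1, 100), ('cat', 2, 90);
""")


def names(sql):
    """Run a query and return the first column of every row."""
    return [row[0] for row in conn.execute(sql)]


# Uncorrelated scalar subquery: the inner query does not reference the outer row.
above_average = names("""
    SELECT name FROM emp
    WHERE salary > (SELECT AVG(salary) FROM emp)
    ORDER BY name""")

# Correlated scalar subquery: the inner query depends on the outer row (e.dept_id).
top_in_dept = names("""
    SELECT name FROM emp e
    WHERE salary = (SELECT MAX(salary) FROM emp WHERE dept_id = e.dept_id)
    ORDER BY name""")

# IN and NOT IN predicate subqueries in a WHERE clause.
staffed = names("""
    SELECT name FROM dept
    WHERE id IN (SELECT dept_id FROM emp)
    ORDER BY name""")
unstaffed = names("""
    SELECT name FROM dept
    WHERE id NOT IN (SELECT dept_id FROM emp)""")

# EXISTS and NOT EXISTS predicate subqueries.
has_staff = names("""
    SELECT name FROM dept d
    WHERE EXISTS (SELECT 1 FROM emp WHERE emp.dept_id = d.id)
    ORDER BY name""")
no_staff = names("""
    SELECT name FROM dept d
    WHERE NOT EXISTS (SELECT 1 FROM emp WHERE emp.dept_id = d.id)""")
```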

  Some new features:

  Native CSV data source, based on Databricks’ spark-csv module

  Off-heap memory management for both caching and runtime execution

  Hive style bucketing support

  Approximate summary statistics using sketches, including approximate quantile, Bloom filter, and count-min sketch.
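
  To give a feel for how the sketches mentioned above trade accuracy for memory, here is a minimal pure-Python count-min sketch and Bloom filter. This is not Spark's implementation: the class names, the default sizes, and the salted SHA-256 hashing scheme are all assumptions chosen for readability.

```python
import hashlib


def _hashes(item: str, count: int, modulus: int):
    """Derive `count` bucket indexes from salted SHA-256 digests (an assumption)."""
    for seed in range(count):
        digest = hashlib.sha256(f"{seed}:{item}".encode("utf-8")).digest()
        yield int.from_bytes(digest[:8], "big") % modulus


class CountMinSketch:
    """Approximate frequency counts; estimates never fall below the true count."""

    def __init__(self, depth: int = 4, width: int = 256):
        self.depth, self.width = depth, width
        self.table = [[0] * width for _ in range(depth)]

    def add(self, item: str, count: int = 1) -> None:
        for row, col in enumerate(_hashes(item, self.depth, self.width)):
            self.table[row][col] += count

    def estimate(self, item: str) -> int:
        # Collisions can only inflate a counter, so the minimum across rows
        # is the tightest available upper bound on the true count.
        return min(self.table[row][col]
                   for row, col in enumerate(_hashes(item, self.depth, self.width)))


class BloomFilter:
    """Approximate set membership; false positives possible, no false negatives."""

    def __init__(self, num_bits: int = 1024, num_hashes: int = 3):
        self.num_bits, self.num_hashes = num_bits, num_hashes
        self.bits = bytearray(num_bits)  # one byte per bit, for clarity

    def add(self, item: str) -> None:
        for index in _hashes(item, self.num_hashes, self.num_bits):
            self.bits[index] = 1

    def might_contain(self, item: str) -> bool:
        return all(self.bits[index]
                   for index in _hashes(item, self.num_hashes, self.num_bits))


words = ["spark"] * 5 + ["hive"] * 2 + ["orc"]
sketch, bloom = CountMinSketch(), BloomFilter()
for word in words:
    sketch.add(word)
    bloom.add(word)
```

  Both structures use a fixed amount of memory regardless of how many items are added, which is why they suit summary statistics over large datasets.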

  Performance improvements:

  Substantial (2-10x) performance speedups for common operators in SQL and DataFrames via a new technique called whole-stage code generation.

  Improved Parquet scan throughput through vectorization

  Improved ORC performance

  Many improvements in the Catalyst query optimizer for common workloads

  Improved window function performance via native implementations for all window functions

  Automatic file coalescing for native data sources
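
  The idea behind whole-stage code generation, mentioned in the list above, is to collapse a pipeline of operators into a single generated function instead of passing each row through a chain of per-operator iterators. The toy Python analogy below contrasts the two styles. It is not how Spark is implemented; the operator functions and `generate_fused_sum` are hypothetical names invented for this sketch.

```python
from typing import Callable, Iterable, Iterator


# Iterator-style pipeline: each operator pulls rows from its child one at a time.
def filter_op(child: Iterable[int], pred: Callable[[int], bool]) -> Iterator[int]:
    for row in child:
        if pred(row):
            yield row


def project_op(child: Iterable[int], fn: Callable[[int], int]) -> Iterator[int]:
    for row in child:
        yield fn(row)


def sum_op(child: Iterable[int]) -> int:
    return sum(child)


# Fused pipeline: generate the source of one loop that does filter, project,
# and sum together, then compile it once and reuse the resulting function.
def generate_fused_sum(predicate_src: str,
                       projection_src: str) -> Callable[[Iterable[int]], int]:
    source = (
        "def fused(rows):\n"
        "    total = 0\n"
        "    for x in rows:\n"
        f"        if {predicate_src}:\n"
        f"            total += {projection_src}\n"
        "    return total\n"
    )
    namespace = {}
    exec(source, namespace)  # compile the generated source into a function
    return namespace["fused"]


rows = range(10)
iterator_result = sum_op(
    project_op(filter_op(rows, lambda x: x % 2 == 0), lambda x: x * 10))
fused = generate_fused_sum("x % 2 == 0", "x * 10")
fused_result = fused(rows)
```

  Both versions compute the same sum; the fused version does so in one loop with no per-operator hand-off between rows, which is the kind of overhead the technique aims to remove.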

  For more details about the release, see the release notes.

 

  Download: http://spark.apache.org/downloads.html





Posted by: elainebo