Facebook Officially Open-Sources Its Big Data Query Engine, Presto

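Facebook's data warehouse is stored in a small number of large Hadoop/HDFS clusters. As data volumes grew, Facebook needed a more interactive query system. In 2012 it began trying out several external projects, found none of them suitable, and decided to build its own: Presto.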

Presto is a distributed SQL engine that supports interactive queries over petabyte-scale data. It supports ANSI SQL, including complex queries such as joins, left and right outer joins, and subqueries, along with a number of aggregate and scalar functions, including approximate distinct counts (using HyperLogLog) and approximate percentiles (based on quantile digest). The main restrictions at this stage are a size limit on joined tables and on the cardinality of unique keys/groups. The system also cannot yet write output data back to tables; query results are currently streamed to the client.
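To make the idea of an approximate distinct count concrete, here is a minimal textbook HyperLogLog sketch in Python. It is not Presto's implementation: the class name, the SHA-1 hash, and the default of 4096 registers are all assumptions chosen for illustration. It shows why such a count can run in a small, fixed amount of memory no matter how many rows are scanned.

```python
import hashlib
import math


class HyperLogLog:
    """Toy HyperLogLog counter (illustrative only, not Presto's code)."""

    def __init__(self, p=12):
        self.p = p                      # number of index bits (assumed default)
        self.m = 1 << p                 # number of registers, 4096 for p=12
        self.registers = [0] * self.m
        # Bias-correction constant from the HyperLogLog paper, valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, value):
        # Hash the value to a 64-bit integer (SHA-1 is an arbitrary choice here).
        digest = hashlib.sha1(str(value).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        idx = h >> (64 - self.p)                    # top p bits pick a register
        rest = h & ((1 << (64 - self.p)) - 1)       # remaining 64 - p bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = (64 - self.p) - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def count(self):
        # Harmonic mean of 2^register across all registers.
        z = sum(2.0 ** -r for r in self.registers)
        estimate = self.alpha * self.m * self.m / z
        # Small-range correction: fall back to linear counting.
        zeros = self.registers.count(0)
        if estimate <= 2.5 * self.m and zeros:
            estimate = self.m * math.log(self.m / zeros)
        return int(round(estimate))
```

Adding the same value twice leaves the registers unchanged, so duplicates do not inflate the estimate. With 4096 registers the expected relative error is roughly 1.6%, traded for a memory footprint that stays constant as the input grows.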

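Presto is fundamentally different from Hive/MapReduce. Hive breaks a query into multiple MapReduce tasks that run one after another, with each task reading its input from disk and writing its results back to disk. Presto does not use MapReduce; instead it uses an execution engine built to support SQL queries and processes data in memory to keep queries fast.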
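The difference between the two execution styles can be sketched with a toy example. The code below is only an analogy, not how either system is actually built: the sample rows, function names, and the JSON temp file are all hypothetical. One function materializes the intermediate result on disk between stages, as a chain of MapReduce jobs would; the other streams rows through the same steps in memory.

```python
import json
import os
import tempfile
from collections import Counter

# Hypothetical sample data: one dict per log row.
ROWS = [
    {"country": "US", "status": 200},
    {"country": "US", "status": 500},
    {"country": "FR", "status": 200},
    {"country": "US", "status": 200},
]


def staged_count(rows, tmp_dir):
    """Staged style: the filter stage writes its full output to disk,
    and the aggregation stage reads it back before counting."""
    stage1_path = os.path.join(tmp_dir, "stage1.json")
    with open(stage1_path, "w") as f:
        json.dump([r for r in rows if r["status"] == 200], f)
    with open(stage1_path) as f:
        filtered = json.load(f)
    return dict(Counter(r["country"] for r in filtered))


def pipelined_count(rows):
    """Pipelined style: rows flow through scan, filter, and aggregate
    one at a time in memory, with no intermediate files."""
    scan = (r for r in rows)
    filtered = (r for r in scan if r["status"] == 200)
    return dict(Counter(r["country"] for r in filtered))
```

Both functions return the same answer; the staged version simply pays for an extra write and read of the intermediate data at every stage boundary, which is the cost the in-memory approach avoids.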

Presto is written in Java and can be extended to work with external data stores.

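The Presto project started in the fall of 2012 and went into production in the spring of 2013; it is now the main query system for Facebook's data warehouse. It is deployed on more than 1,000 nodes, is used by more than 1,000 employees, and handles about 30,000 queries over petabyte-scale data every day.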

Presto queries data more than 10 times faster than Hive/MapReduce.