Facebook open sources its SQL-on-Hadoop engine, and the web rejoices

SUMMARY: Facebook has open sourced Presto, a SQL engine it says is on average 10 times faster than Hive for running queries across large data sets stored in Hadoop and elsewhere. Presto, the interactive SQL-on-Hadoop engine the company first discussed in June, is Facebook’s take on Cloudera’s Impala or Google’s Dremel, and ...

How to Get Started in Data Science

A lot of people ask me: how do I become a data scientist? The short answer is that, as with any technical role, it isn’t necessarily easy or quick. But if you’re smart, committed and willing to invest in learning and experimentation, you can certainly do it. In a previous post, I described my ...

Evolving Hadoop ecosystem presents new ways to program big data apps

The Hadoop ecosystem is a body in motion. Just a few years ago, you could have described Hadoop, briefly but fairly, as “HDFS, MapReduce and some glue”: the Hadoop Distributed File System, its associated programming model and an emerging collection of APIs and utilities, which together were becoming synonymous with big data systems. What ...