Facebook open sources its SQL-on-Hadoop engine, and the web rejoices

SUMMARY: Facebook has open sourced Presto, a SQL engine it says is on average 10 times faster than Hive for running queries across large data sets stored in Hadoop and elsewhere. Facebook has open sourced Presto, the interactive SQL-on-Hadoop engine the company first discussed in June. Presto is Facebook’s take on Cloudera’s Impala or Google’s Dremel, and ...

How to Get Started in Data Science

A lot of people ask me: how do I become a data scientist? I think the short answer is: as with any technical role, it isn’t necessarily easy or quick, but if you’re smart, committed and willing to invest in learning and experimentation, then of course you can do it. In a previous post, I described my ...

What to know when choosing database as a service

This report was underwritten by Clustrix. In the past couple of years, we’ve seen more innovation in the SQL database management system (DBMS) category than in the 30-plus years since commercial products became available. The past decade’s web 2.0 sites have mostly driven this innovation, which looks to bridge some of the gap between NoSQL DBMS ...

Evolving Hadoop ecosystem presents new ways to program big data apps

The Hadoop ecosystem is a body in motion. Just a few years ago, you might quickly but fairly describe Hadoop as “HDFS, MapReduce and some glue” — referring to the Hadoop Distributed File System, its associated software programming model and an emerging collection of APIs and utilities, which together were becoming synonymous with big data systems. What ...

Teradata gets into the in-memory biz to take on SAP’s HANA

SUMMARY: Teradata is trying to steal some thunder in the in-memory analytics space with a new technology called Intelligent Memory that places hot data in RAM while dispersing the rest across solid-state drives and disk. Data analytics veteran Teradata will not let the new era of data-analysis architectures pass it by ...

Careful: Your big data analytics may be polluted by data scientist bias

This was posted originally on Gigaom. SUMMARY: True believers may be guilty of hype, but there’s no denying that big data presents opportunities for businesses of every stripe. That potential is vulnerable to pollution from data bias, and so it calls for preventive processes. Expectations surrounding the future of big data range from the just huge ...

How HBase converted MySpace’s MySQL champion and is driving Hadoop mainstream

SUMMARY: Gravity CTO Jim Benedetto knows his way around MySQL after managing a 600-instance cluster at MySpace, but he has found HBase religion as his real-time content-recommendation platform grew. And he’s not alone. How’s this for an understatement: Operational databases are important for many, if not the majority, of web applications. And if ...