Tuesday, November 12, 2013

Create Impala DataMart based on Hive backend

Hive queries are slow. Fortunately, on Cloudera it is possible to create an Impala DataMart that can be queried quickly.

An Impala table can be populated with data from a Hive table.
The following table is available in the `default` database:


table tab2 (
  id int,
  col_1 boolean,
  col_2 double)
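
If `tab2` was created through Hive, Impala may not see it until its cached metadata is reloaded. A minimal sketch of that check, assuming an Impala release that supports `INVALIDATE METADATA` with a table name, and assuming `tab2` was created on the Hive side (the post does not say which engine created it):

```sql
-- Run in impala-shell against the `default` database.
-- Assumption: tab2 was created via Hive, so Impala's metadata cache may be stale.
INVALIDATE METADATA tab2;

-- Confirm that Impala now sees the expected columns (id, col_1, col_2).
DESCRIBE tab2;
```

If `tab2` was created in Impala itself, the `INVALIDATE METADATA` step should not be needed.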

The following queries create an Impala table and a Hive table with the same content, which makes it possible to compare how quickly each dataset can be accessed:

Impala:

create table tab5 (
  col1 boolean,
  col2 double)
STORED AS PARQUETFILE;

insert overwrite tab5
select
 col_1,
 sum(col_2)
from tab2
group by col_1;

Hive:

create table tab5h (
  col1 boolean,
  col2 double)
STORED AS sequencefile;

insert overwrite table tab5h
select
 col_1,
 sum(col_2)
from tab2
group by col_1;
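
To see the difference in access speed, the same read query can be run against each table in its own engine. This is only a sketch: the choice of query is an assumption, not something the post prescribes. Because the tables are grouped on a boolean column, each should return at most three rows (true, false, and NULL), so the elapsed time mostly reflects engine overhead rather than data volume.

```sql
-- Run in impala-shell: reads the Parquet-backed table.
select col1, col2 from tab5 order by col1;

-- Run in the Hive CLI: reads the SequenceFile-backed table.
select col1, col2 from tab5h order by col1;
```

Both queries should return the same values if the two inserts ran against the same contents of `tab2`.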