An Impala table can be populated with data from a Hive table.
The following table is available in the `default` database:
`table tab2 (id int, col_1 boolean, col_2 double)`
The following queries create an Impala table and a Hive table with the same content, so that the speed of access to the two datasets can be compared:
| Impala | Hive |
| --- | --- |
| `create table tab5 (col1 boolean, col2 double) STORED AS PARQUETFILE;` `insert overwrite tab5 select col_1, sum(col_2) from tab2 group by col_1;` | `create table tab5h (col1 boolean, col2 double) STORED AS sequencefile;` `insert overwrite table tab5h select col_1, sum(col_2) from tab2 group by col_1;` |
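
Once both tables are loaded, the access speed can be compared by running the same query against each of them. The following is only a minimal sketch for `impala-shell`. It assumes that `tab5h` was created through Hive and that both engines share one metastore, in which case Impala needs its cached metadata refreshed before it can see the new table. The queries themselves are illustrative and not taken from the original text.

```sql
-- Run in impala-shell.
-- Assumption: tab5h was created through Hive, so Impala's cached
-- metadata must be refreshed before the table becomes visible.
INVALIDATE METADATA tab5h;

-- Run the same query against each table and compare the elapsed
-- time that the shell reports after each result set.
SELECT col1, col2 FROM tab5 ORDER BY col1;
SELECT col1, col2 FROM tab5h ORDER BY col1;
```

Because both tables hold the same aggregated rows, any difference in elapsed time should mostly reflect the storage format (Parquet versus SequenceFile) rather than the data itself. With a result set this small, the difference may be negligible.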