Oozie was created to solve workflow scheduling problems and, obviously, can be used to build ETL pipelines; it integrates naturally with Hive.
A workflow is the core component of any Oozie job: it is the list of steps required to accomplish a task. A workflow therefore gives us a way to describe an ETL process, and here is an example of using Hive in an Oozie workflow:
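The workflow definition itself is not shown here, so below is a minimal sketch of what such a workflow.xml could look like. The workflow name, schema versions, and node names are assumptions; the two Hive actions and the script names first_step.hql and second_step.hql come from the text.

```xml
<!-- Sketch of a two-step Hive workflow; names and schema versions are assumed -->
<workflow-app name="hive-etl-wf" xmlns="uri:oozie:workflow:0.4">
    <start to="first_step"/>
    <action name="first_step">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <script>first_step.hql</script>
        </hive>
        <ok to="second_step"/>
        <error to="fail"/>
    </action>
    <action name="second_step">
        <hive xmlns="uri:oozie:hive-action:0.2">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <job-xml>hive-site.xml</job-xml>
            <script>second_step.hql</script>
        </hive>
        <ok to="end"/>
        <error to="fail"/>
    </action>
    <kill name="fail">
        <message>Hive step failed: ${wf:errorMessage(wf:lastErrorNode())}</message>
    </kill>
    <end name="end"/>
</workflow-app>
```

Each action references the shared hive-site.xml via job-xml and moves to the next node on success, so the two scripts run strictly in order.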
Well, it describes a two-step job; the contents of the executed Hive scripts are located in first_step.hql and second_step.hql respectively (both stored on HDFS).
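The actual contents of the scripts are not given; for illustration only, a hypothetical first_step.hql might hold an ordinary ETL statement such as:

```sql
-- Hypothetical example content for first_step.hql;
-- the table and column names are invented for illustration.
INSERT OVERWRITE TABLE clean_events
SELECT event_id, user_id, event_time
FROM raw_events
WHERE event_time IS NOT NULL;
```

Any valid Hive QL can go in these files; Oozie simply hands each script to Hive.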
Some preparation is required before you start using it. Put hive-site.xml to HDFS with the following property added:
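The property itself is not shown in this text; judging by the explanation that follows (about Hive's temporary folders on HDFS), it is likely hive.exec.scratchdir. A sketch, with an assumed value:

```xml
<!-- Assumed property; the value is illustrative -->
<property>
    <name>hive.exec.scratchdir</name>
    <value>/tmp/hive-${user.name}</value>
    <description>HDFS scratch space for per-query temporary data</description>
</property>
```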
Hive uses temporary folders both on the machine running the Hive client and on the default HDFS instance. These folders are used to store per-query temporary/intermediate data sets and are normally cleaned up by the Hive client when the query is finished. However, in cases of abnormal Hive client termination, some data may be left behind. The configuration details are as follows: on the HDFS cluster this is set to /tmp/hive-
After that, a properties file is required:
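The file is not reproduced here; a typical Oozie job.properties for this setup could look as follows. The host names, ports, and the application path are assumptions to be replaced with your cluster's values:

```properties
# Assumed cluster endpoints -- adjust to your environment
nameNode=hdfs://namenode-host:8020
jobTracker=jobtracker-host:8032
# Pull Hive jars from the Oozie sharelib
oozie.use.system.libpath=true
# HDFS directory that contains workflow.xml (assumed path)
oozie.wf.application.path=${nameNode}/user/${user.name}/hive-etl
```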
Put the workflow to the path specified by oozie.wf.application.path. A lib directory may also be created at this path and used to store the jars required by the workflow (for example, a custom Hive UDF).
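The resulting HDFS layout can be sketched with standard HDFS shell commands; the paths and jar name here are assumptions matching the properties file above:

```shell
# Assumed application path -- must match oozie.wf.application.path
hdfs dfs -mkdir -p /user/$USER/hive-etl/lib
hdfs dfs -put workflow.xml hive-site.xml first_step.hql second_step.hql /user/$USER/hive-etl/
# Optional: jars with custom Hive UDFs go into lib/
hdfs dfs -put my-udfs.jar /user/$USER/hive-etl/lib/
```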
And the final step: run the job on the Oozie server. This can be done with the following command (assuming you put the properties file locally):
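The command itself is missing from the text; the standard Oozie CLI invocation looks like this, where the server URL and the properties file name are assumptions:

```shell
# Assumed Oozie server URL and local properties file name
oozie job -oozie http://oozie-host:11000/oozie -config job.properties -run
```

On success the CLI prints the new job id, which can then be passed to `oozie job -info` to watch the workflow progress.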