Let's review a simple example.
First of all, download the Oracle BigData Lite VM (it is free, but takes 25 GB of disk space).
After installation, a test dataset must be created. I put two files with sample website access data into the HDFS directory /user/oracle/xquery/input. Example content:
2013-10-28T06:00:00, chrome, index.html, 200
2013-10-28T08:30:02, firefox, index.html, 200
2013-10-28T08:32:50, ie9, about.html, 200
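One possible way to prepare such a dataset is sketched below. The local file names visits1.txt and visits2.txt are hypothetical (any names ending in .txt would match the query used later), and the split of the sample lines between the two files is arbitrary. The upload step is skipped when the hdfs client is not on the PATH.

```shell
# Create two small local files with the sample lines
# (file names are an assumption made for this sketch).
cat > visits1.txt <<'EOF'
2013-10-28T06:00:00, chrome, index.html, 200
2013-10-28T08:30:02, firefox, index.html, 200
EOF
cat > visits2.txt <<'EOF'
2013-10-28T08:32:50, ie9, about.html, 200
EOF

# Upload them to the HDFS input directory used in this post.
# The guard keeps the script harmless outside the VM.
if command -v hdfs >/dev/null 2>&1; then
  hdfs dfs -mkdir -p /user/oracle/xquery/input
  hdfs dfs -put visits1.txt visits2.txt /user/oracle/xquery/input/
  hdfs dfs -ls /user/oracle/xquery/input
fi
```

Note that hdfs dfs -put refuses to overwrite existing files, so a second run needs the old files removed first.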
Next step: create an XQuery script (my_xquery.xq) to process the data (a simple grouping by the date of the page visit):
import module "oxh:text";

for $line in text:collection("/user/oracle/xquery/input/*.txt")
let $split := fn:tokenize($line, "\s*,\s*")
let $time := xs:dateTime($split[1])
let $day := xs:date($time)
group by $day
return text:put($day || ", " || fn:count($line))
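To get a feel for what this query should return before running it on the cluster, the same grouping can be emulated locally. The following is only a rough sketch with awk, not how the query is actually executed; it assumes the three sample lines are the whole dataset and uses a hypothetical local file sample.txt.

```shell
# Local sanity check, no Hadoop involved.
# sample.txt is a hypothetical file holding the three sample lines.
cat > sample.txt <<'EOF'
2013-10-28T06:00:00, chrome, index.html, 200
2013-10-28T08:30:02, firefox, index.html, 200
2013-10-28T08:32:50, ie9, about.html, 200
EOF

# Split each line on commas with optional surrounding spaces,
# keep the date part of the timestamp (everything before "T"),
# and count the lines seen for each date.
result=$(awk -F' *, *' '
  { split($1, ts, "T"); visits[ts[1]]++ }
  END { for (day in visits) print day ", " visits[day] }
' sample.txt | sort)

echo "$result"
```

For the three sample lines this prints 2013-10-28, 3, since all visits fall on the same day.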
Now the script is ready to run. Execute it from the command line:
hadoop jar $OXH_HOME/lib/oxh.jar my_xquery.xq -output /user/oracle/xquery/output -clean -ls
Options:
-output specifies the output directory
-clean removes the output directory if it already exists
-ls lists the contents of the output directory after the run
Here is the result:
That's it: the XQuery script was translated to MapReduce (similar to Pig Latin or HiveQL). This functionality is part of Oracle BigData Connectors for Hadoop, and more information with examples can be found in its documentation.