вівторок, 9 червня 2015 р.

How to present XML in Hive flat table after XSLT transformation

Let's start from defining a task. Imaging that the dataset is a set of XML files and the requirement is to present some specific information from this file as simple flat structure. Let's illustrate:

Definetely, we can use SerDe for XML, but what if XML structure is not defined before hand and we want to give end-user a chance to control parsing process? One of possible solutions is to incorporate XSLT to transform XML to desired format.