Tuesday, October 6, 2015

Apache Zeppelin: impressions

Notebooks are attracting more and more attention from data analysts, data scientists and developers. Jupyter is a well-known notebook created by the Python community and widely adopted by many different kinds of users. At the same time, a new notebook project was recently born: Apache Zeppelin, whose main focus is integration with the Big Data technology stack.

In fact, Apache Zeppelin provides built-in integration with Apache Spark (and Spark SQL), Apache Flink, Hive, Ignite, Tajo (is anyone outside South Korea using it?), and of course Markdown and HTML, and even AngularJS. That's the good part about Zeppelin. Also, Ambari integration makes it possible to install Zeppelin in "a couple of clicks" and access it through Ambari Views. And in practice it works very well.




Now I'd like to focus on what's wrong with Apache Zeppelin:

1) Security. Zeppelin 0.5 has no security. Anybody can open any notebook, view it and edit it. That doesn't work for enterprises; it doesn't even work for R&D. I want protected notebooks, I want roles and groups, and I want to grant access to a notebook only to a specific group of people for a specific set of actions.
2) Workspace. A flat, one-level list of notebooks, really? That's awful. Guys, please add the ability to organize notebooks into folders (and folders within folders); it's really important. Also, the only way to back up notebooks is to back up the underlying folders on the filesystem. Not very good: at the very least, a button in the UI is needed.
3) Security, part 2. I've already written about notebook security, but the data in storage must also be protected. Currently Zeppelin runs everything as the ZEPPELIN user, so I have to share my data with that user, which is not what I want to do. It would make sense for each notebook to provide a "run as" setting that specifies the user for that particular piece of research. Enterprises really value that.

Personally, I also tried to make it work on Docker (it more or less works) and on EMR (it failed, and as far as I know everybody else has failed too).

To sum up: Zeppelin is an interesting and promising product, but it has too many weaknesses to be seriously used or considered for production projects, especially in enterprises. So, on my technology radar I would definitely put Zeppelin in the "Be informed" section.