понеділок, 13 липня 2015 р.

How to waste the whole day with Spark Streaming and HBase

The "funny" story how to waste the whole day debugging resolving simple case... tips and tricks :)

Spark Streaming application hangs out on action and nothing is changing during hours.
The long story is: custom Receiver accept events from external source and store them to RDD (actually, DStream) for future processing. When I run it I noticed that action hung out! And what was a really scare: the messages were read from source. After spending couple hours trying to find the issue with Reciever, I realized it works fine and finally found the issue ... ... in how I run the job!

I did it in local environment first and submit it to YARN:
...
--num-executors 2
...
It fact, it doesn't work for me because no one worker (spark executor) was able to start! So, just by increasing number of executors to 3, I was able to make everything working.

HBase related Spark Streaming application hangs out and nothing is changing during hours.
Again, the long story is then Spark Streaming application hangs out as soon as it touch HBase. I spent several hours (again) and I was really surprised when found the reason: HBase connection was broken. OMG! I haven't seen any errors or warning related to HBase connection in logs... what is the reason? In fact, HBase tried to establish connection again and again without throwing an error. Consider the following a piece of code (grey - my original part, when blue - an update that helped me to overcome the issue):
     Configuration config = HBaseConfiguration.create();
  config.set(HConstants.ZOOKEEPER_QUORUM, "host:port");
  config.set(HConstants.ZOOKEEPER_ZNODE_PARENT, "/hbase");
  config.set("hbase.client.retries.number", Integer.toString(3));
  config.set("zookeeper.session.timeout", Integer.toString(60000));
  config.set("zookeeper.recovery.retry", Integer.toString(0));

It really helps because of the number of retries was limited. Default value is 35 and can definitely confuse.