Thursday, August 31, 2017

Druid: fixed lambda



Druid is an excellent high-performance, column-oriented, distributed data store. IT giants use it to get sub-second answers from datasets of terabytes (or even petabytes). Needless to say, I fell in love from day one.

Several examples:
Netflix ingests up to 2 TB per hour, with the ability to query data as it is being ingested
eBay ingests over 100,000 events/sec and supports over 100 concurrent queries without impacting ingest rate or query latency

 Main features of Druid that help it stand out from the crowd:

  • Sub-second queries
  • Scalable to PBs
  • Real-time streams
  • Deploy anywhere (can work with Hadoop or without by processing data from S3)
I'm excited that I had an opportunity to work with Druid a year ago. It's really cool, works super fast and delivers excellent results! The JSON-based query language wasn't super hard to learn; I even managed to calculate an average using a post-aggregation :) Previous MapReduce experience really helped.
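
For the curious, here is a rough sketch of what such an "average via post-aggregation" query can look like, built as a Python dict and dumped to JSON. It is not the query from our project: the datasource "events", the metric "latency_ms", the interval and the helper name build_average_query are all made-up placeholders. The idea is simply to sum the metric, count the rows, and divide one by the other in an arithmetic post-aggregation.

```python
import json


def build_average_query(datasource, metric, interval):
    """Build a Druid timeseries query that computes an average.

    Hypothetical helper: the datasource, metric and interval passed in
    below are placeholders, not values from a real deployment.
    """
    return {
        "queryType": "timeseries",
        "dataSource": datasource,
        "granularity": "all",
        "intervals": [interval],
        # Two plain aggregations: the sum of the metric and the row count.
        "aggregations": [
            {"type": "doubleSum", "name": "total", "fieldName": metric},
            {"type": "count", "name": "rows"},
        ],
        # The post-aggregation divides one aggregated value by the other.
        "postAggregations": [
            {
                "type": "arithmetic",
                "name": "average",
                "fn": "/",
                "fields": [
                    {"type": "fieldAccess", "name": "total", "fieldName": "total"},
                    {"type": "fieldAccess", "name": "rows", "fieldName": "rows"},
                ],
            }
        ],
    }


query = build_average_query("events", "latency_ms", "2017-08-01/2017-09-01")
print(json.dumps(query, indent=2))
```

One caveat: the count aggregator counts Druid rows, so if the datasource is rolled up at ingestion time, this is an average per stored row, not per raw event.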

One remark I'd like to add here:
we developed and tested a Druid-based system in the us-east-1 region. Everything was good and deployment was automated, so we moved to prod, which, surprisingly, was placed in the Frankfurt AWS region. We got a pretty nasty error in Druid when the deployment script finished its work there:
 Caused by: io.druid.segment.loading.SegmentLoadingException: S3 fail!

It looks like the problem was the additional configuration required for regions other than us-east-1. Unfortunately, there is no documentation for it, so I derived the following from the source code, and it seems to work now:

On each historical node, add the file “/opt/druid/config/_common/jets3t.properties” with the following content:
storage-service.request-signature-version=AWS4-HMAC-SHA256
s3service.s3-endpoint=s3.eu-central-1.amazonaws.com

The 1st line forces the use of v4 auth (AWS Signature Version 4).

The 2nd line sets the endpoint; the default is us-east-1, but for Frankfurt it must be s3.eu-central-1.amazonaws.com
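
Since our deployment was automated, the same file can be generated per region instead of being copied by hand. Below is a minimal sketch of that idea, not our actual deployment script. The helper names render_jets3t_properties and write_jets3t_properties are hypothetical, and the s3.<region>.amazonaws.com endpoint pattern is an assumption that matches the Frankfurt endpoint above; check it against the AWS endpoint list for your region.

```python
from pathlib import Path


def render_jets3t_properties(region):
    """Return the jets3t.properties content for the given AWS region.

    Hypothetical helper. Assumes the regional S3 endpoint follows the
    s3.<region>.amazonaws.com pattern, as it does for eu-central-1.
    """
    endpoint = f"s3.{region}.amazonaws.com"
    return (
        # Force v4 request signing.
        "storage-service.request-signature-version=AWS4-HMAC-SHA256\n"
        # Point the S3 client at the regional endpoint.
        f"s3service.s3-endpoint={endpoint}\n"
    )


def write_jets3t_properties(config_dir, region):
    """Write jets3t.properties into config_dir and return its path."""
    path = Path(config_dir) / "jets3t.properties"
    path.write_text(render_jets3t_properties(region))
    return path


print(render_jets3t_properties("eu-central-1"), end="")
```

On a historical node this would be called as write_jets3t_properties("/opt/druid/config/_common", "eu-central-1").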

Anyway, Metamarkets team, thank you for a great product! Now I'm going to test Caravel from Airbnb.