Terraform (Kostiantyn's blog, 2017-11-20)<br />
<span style="background-color: white; font-family: Roboto, arial, sans-serif; font-weight: bold;">Terraform by HashiCorp</span><span style="background-color: white; font-family: Roboto, arial, sans-serif;"> enables you to safely and predictably create, change, and improve infrastructure. It is an open source tool that codifies APIs into declarative configuration files that can be shared amongst team members, treated as code, edited, reviewed, and versioned.</span><br />
<span style="background-color: white; font-family: Roboto, arial, sans-serif;"><br /></span>
<span style="background-color: white;"><span style="font-family: Roboto, arial, sans-serif;">Terraform is a simple and reliable way to manage infrastructure in AWS, Google Cloud, Azure, DigitalOcean and other IaaS platforms (called providers in Terraform terms). The main idea behind such tools is reproducible infrastructure: Terraform provides a DSL to describe the infrastructure once and then apply that description to different environments. Previously we used a set of Python and bash scripts to describe what to create in AWS, with conditions that checked whether each resource already existed and created it only if it did not. Under the hood, Terraform does much the same: it compares the configuration with the resources recorded in its state and creates or changes only what differs. This post is an introduction that covers the simple <a href="http://simpletoad.blogspot.com/2017/11/applying-alluxio-to-warm-your-data.html" target="_blank">use case of creating an Alluxio cluster</a> that I used in the previous post.</span></span><br />
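<span style="background-color: white;"><span style="font-family: Roboto, arial, sans-serif;"><br /></span></span>
<span style="font-family: Roboto, arial, sans-serif;">To illustrate the declarative approach, here is a minimal sketch (not the configuration used for the Alluxio cluster): one EC2 instance described as a resource, with its name tag depending on an input variable. The variable environment, the resource name alluxio_master, the instance type and the placeholder AMI id are hypothetical, and an AWS provider is assumed to be configured. There is no "check whether it exists, then create" logic here: on terraform apply, Terraform compares this description with its state and creates the instance if it is missing or changes it if it differs.</span><br />
<pre>
# Hypothetical example: one declaratively described EC2 instance.
# Assumes an AWS provider block is configured elsewhere.
variable "environment" {
  default = "dev" # override per environment, e.g. -var 'environment=prod'
}

resource "aws_instance" "alluxio_master" {
  ami           = "ami-XXXXXXXX" # placeholder AMI id
  instance_type = "r4.xlarge"    # hypothetical instance type
  tags = {
    Name = "alluxio-master-${var.environment}"
  }
}
</pre>
<span style="font-family: Roboto, arial, sans-serif;">Applying the same files with terraform apply -var 'environment=prod' and a separate state (for example, another workspace) would produce an identically shaped instance for a second environment.</span><br />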
<span style="background-color: white;"><span style="font-family: Roboto, arial, sans-serif;"><br /></span></span>
<span style="background-color: white;"><span style="font-family: Roboto, arial, sans-serif;">Terraform supports many different providers, but configurations are provider-specific: a script written for one provider has to be rewritten for each new one.</span></span><br />
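<span style="background-color: white;"><span style="font-family: Roboto, arial, sans-serif;"><br /></span></span>
<span style="font-family: Roboto, arial, sans-serif;">For example, a virtual machine is an aws_instance in the AWS provider but a google_compute_instance, with a different set of arguments, in the Google Cloud provider, so a description written for one cannot simply be pointed at the other. The sketch below is illustrative only: the resource names, machine types, zone and image are arbitrary examples, and each resource additionally needs its own configured provider.</span><br />
<pre>
# AWS: a virtual machine is an aws_instance
resource "aws_instance" "worker" {
  ami           = "ami-XXXXXXXX" # placeholder AMI id
  instance_type = "m4.large"     # example instance type
}

# Google Cloud: a comparable VM is a google_compute_instance with another schema
resource "google_compute_instance" "worker" {
  name         = "worker"
  machine_type = "n1-standard-2" # example machine type
  zone         = "us-west1-a"    # example zone

  boot_disk {
    initialize_params {
      image = "debian-cloud/debian-12" # example public image
    }
  }

  network_interface {
    network = "default"
  }
}
</pre>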
<span style="background-color: white;"><span style="font-family: Roboto, arial, sans-serif;"><br /></span></span>
<a name='more'></a><span style="font-family: Roboto, arial, sans-serif;"><br /></span><br />
<span style="background-color: white;"><span style="font-family: Roboto, arial, sans-serif;">To begin, let's define the AWS provider:</span></span><br />
<div class="line number2 index1 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">provider </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"aws"</code> <code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">{</code></div>
<div class="line number3 index2 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">region = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"us-west-1"</code></div>
<div class="line number4 index3 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">profile = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"xxx-federated"</code> <code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># source of credentials: a named profile from the shared AWS credentials file; without a profile, use access keys</code></div>
<div class="line number5 index4 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
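<span style="background-color: white;"><span style="font-family: Roboto, arial, sans-serif;"><br /></span></span>
<span style="font-family: Roboto, arial, sans-serif;">If you do not use a named profile, the provider can take access keys instead. Below is a minimal sketch, assuming two hypothetical input variables, aws_access_key and aws_secret_key, whose values are supplied at run time (for example through the TF_VAR_aws_access_key and TF_VAR_aws_secret_key environment variables) so that secrets are not committed together with the configuration. It replaces the profile-based provider block rather than sitting next to it, and it uses the "${...}" interpolation syntax of the Terraform versions current at the time of writing.</span><br />
<pre>
# Alternative to the profile-based provider block: explicit access keys.
# Hypothetical variables; values come from -var flags or TF_VAR_* env vars.
variable "aws_access_key" {}
variable "aws_secret_key" {}

provider "aws" {
  region     = "us-west-1"
  access_key = "${var.aws_access_key}"
  secret_key = "${var.aws_secret_key}"
}
</pre>
<span style="font-family: Roboto, arial, sans-serif;">The AWS provider also reads the standard AWS_ACCESS_KEY_ID and AWS_SECRET_ACCESS_KEY environment variables, in which case neither the keys nor a profile need to appear in the file at all.</span><br />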
<span style="background-color: white;"><span style="color: #545454; font-family: Roboto, arial, sans-serif;"><br /></span></span>
<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">Next, create a security group and open the required ports:</span></span><br />
<div class="line number8 index7 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># For simplicity, this opens a wide port range (20-65530); use finer-grained rules in practice</code></div>
<div class="line number9 index8 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">resource </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"aws_security_group"</code> <code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"my_test"</code> <code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">{</code></div>
<div class="line number10 index9 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">description = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"Used in the terraform"</code></div>
<div class="line number11 index10 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">vpc_id = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"vpc-XXXXX"</code> <code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># VPC in which this security group is created</code></div>
<div class="line number12 index11 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code> </div>
<div class="line number13 index12 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># inbound (ingress) rules</code></div>
<div class="line number14 index13 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">ingress {</code></div>
<div class="line number15 index14 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">from_port = 20</code></div>
<div class="line number16 index15 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">to_port = 65530</code></div>
<div class="line number17 index16 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">protocol = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"tcp"</code></div>
<div class="line number18 index17 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">cidr_blocks = [</code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"10.0.0.0/8"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: 
content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"172.16.0.0/12"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">]</code></div>
<div class="line number19 index18 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
<div class="line number20 index19 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
</div>
<div class="line number21 index20 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># outbound</code></div>
<div class="line number22 index21 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">egress {</code></div>
<div class="line number23 index22 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">from_port = 0</code></div>
<div class="line number24 index23 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">to_port = 0</code></div>
<div class="line number25 index24 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">protocol = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"-1"</code></div>
<div class="line number26 index25 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">cidr_blocks = [</code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"0.0.0.0/0"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">]</code></div>
<div class="line number27 index26 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
<div class="line number28 index27 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
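<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">Why the explicit egress block? AWS normally adds an allow-all outbound rule to a new VPC security group, but Terraform removes that default rule, so you have to declare it yourself if you want it. With protocol "-1" (all protocols), Terraform requires from_port and to_port to be 0, so the block above allows all outbound traffic. If that is broader than you need, a narrower group might look like the sketch below. The resource name alluxio_restricted and the allowed destinations (HTTPS anywhere, everything inside 10.0.0.0/8) are assumptions for illustration, and Alluxio may need additional ports in your setup.</span></span><br />
<pre style="font-family: Consolas, 'Courier New', Courier, monospace;"><code># Hypothetical, narrower alternative to the catch-all egress rule above.
# Name and allowed destinations are assumptions, not part of the original setup.
resource "aws_security_group" "alluxio_restricted" {
  name        = "alluxio_restricted"
  description = "Alluxio nodes with restricted outbound traffic"

  # outbound HTTPS anywhere, e.g. to reach S3 and other AWS APIs
  egress {
    from_port   = 443
    to_port     = 443
    protocol    = "tcp"
    cidr_blocks = ["0.0.0.0/0"]
  }

  # any outbound traffic to private addresses, so master and workers can talk
  egress {
    from_port   = 0
    to_port     = 0
    protocol    = "-1"
    cidr_blocks = ["10.0.0.0/8"]
  }
}</code></pre>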
<span style="color: #545454; font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><br /></span></span>
<span style="color: #545454; font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><br /></span></span>
<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">AWS relies heavily on IAM roles, so our Alluxio instances need an IAM role, attached through an instance profile:</span></span><br />
<div class="line number30 index29 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># instance profile for our EC2 instances, used to attach an IAM role to them</code></div>
<div class="line number31 index30 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">resource </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"aws_iam_instance_profile"</code> <code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"my_profile"</code> <code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">{</code></div>
<div class="line number32 index31 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">name = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"my_profile"</code></div>
<div class="line number33 index32 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">role = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"BEST_EVER_Role"</code> <code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># yeah, role we're gonna use</code></div>
<div class="line number34 index33 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
<div class="line number34 index33 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<br /></div>
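<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">The profile above points at a role that must already exist ("BEST_EVER_Role"). If you would rather manage the role with Terraform too, here is a minimal sketch. The names alluxio_role and alluxio_profile are hypothetical, and attaching the AWS-managed AmazonS3ReadOnlyAccess policy assumes Alluxio only needs to read its data from S3; grant more if it also writes there. The trust policy lets EC2 assume the role, and an instance picks up the profile through its iam_instance_profile argument.</span></span><br />
<pre style="font-family: Consolas, 'Courier New', Courier, monospace;"><code># Hypothetical: define the role in Terraform instead of referencing an existing one
resource "aws_iam_role" "alluxio_role" {
  name = "alluxio_role"

  # trust policy: allows EC2 instances to assume this role
  assume_role_policy = &lt;&lt;EOF
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": { "Service": "ec2.amazonaws.com" },
      "Action": "sts:AssumeRole"
    }
  ]
}
EOF
}

# assumption: read-only S3 access is enough for the data Alluxio caches
resource "aws_iam_role_policy_attachment" "alluxio_s3_read" {
  role       = "${aws_iam_role.alluxio_role.name}"
  policy_arn = "arn:aws:iam::aws:policy/AmazonS3ReadOnlyAccess"
}

# profile wrapping the role; an aws_instance would reference it with
# iam_instance_profile = "${aws_iam_instance_profile.alluxio_profile.name}"
resource "aws_iam_instance_profile" "alluxio_profile" {
  name = "alluxio_profile"
  role = "${aws_iam_role.alluxio_role.name}"
}</code></pre>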
<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">The Alluxio architecture consists of a master and several workers (slaves). There is an option to run a secondary master, but it plays a role similar to the Hadoop Secondary NameNode (SNN): it will not serve requests from the workers when the primary master is unavailable. </span></span><br />
<span style="color: #545454; font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><br /></span></span>
<div class="line number36 index35 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># EC2 to host Alluxio master</code></div>
<div class="line number37 index36 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">resource </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"aws_instance"</code> <code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"emr2-master"</code> <code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">{</code></div>
<div class="line number38 index37 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">instance_type = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"m4.2xlarge"</code></div>
<div class="line number39 index38 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">count = 1</code></div>
<div class="line number40 index39 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># ami based on Amazon linux with docker and git</code></div>
<div class="line number41 index40 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">ami = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"ami-YYYY"</code> <code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># our instance is launched from this image</code></div>
<div class="line number42 index41 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># The name of our SSH keypair we created above.</code></div>
<div class="line number43 index42 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">key_name = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"key"</code> <code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># EC2 key pair name which is known to AWS</code></div>
<div class="line number44 index43 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">iam_instance_profile = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"${aws_iam_instance_profile.my_profile.id}"</code> <code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># sweet part: reference to instance profile ID</code></div>
<div class="line number45 index44 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">vpc_security_group_ids = [</code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"${aws_security_group.my_test.id}"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">] </code><code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; 
bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># another sweet part: reference to security group declared above</code></div>
<div class="line number46 index45 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">subnet_id = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"subnet-123ce1da"</code> <code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># subnet, just hardcode for now</code></div>
<div class="line number47 index46 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">tags {</code></div>
<div class="line number48 index47 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">Name = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"Alluxio-master"</code> <code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># here we can describe several tags we need</code></div>
<div class="line number49 index48 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
<div class="line number50 index49 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">provisioner </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"remote-exec"</code> <code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">{ </code><code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># provisioner: what we want to run on the master after it is created</code></div>
<div class="line number51 index50 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">inline = [</code></div>
<div class="line number52 index51 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"git clone <a href="https://github.com/Alluxio/alluxio.git" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; text-decoration-line: none; top: auto; vertical-align: baseline; width: auto;">https://github.com/Alluxio/alluxio.git</a>"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number53 index52 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"cd alluxio/integration/docker"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number54 index53 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"docker build -t alluxio ."</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number55 index54 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"cd ~"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number56 index55 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"mkdir underStorage"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number57 index56 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"sudo mkdir /mnt/ramdisk"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number58 index57 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"sudo mount -t ramfs -o size=12G ramfs /mnt/ramdisk"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number59 index58 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"sudo chmod a+w /mnt/ramdisk"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number60 index59 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"sudo service docker restart"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number61 index60 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"docker run -d --net=host -v $PWD/underStorage:/underStorage -e ALLUXIO_MASTER_HOSTNAME=127.0.0.1 -e ALLUXIO_UNDERFS_ADDRESS=/underStorage alluxio master"</code></div>
<div class="line number62 index61 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">]</code></div>
<div class="line number63 index62 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
<div class="line number64 index63 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> connection {</code></div>
<div class="line number69 index68 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">timeout = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"15m"</code> </div>
<div class="line number70 index69 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">user = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"ec2-user"</code></div>
<div class="line number71 index70 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">private_key = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"${file("</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">/Users/kostia/</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: 
content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">.</code><code class="bash functions" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(255, 20, 147) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">ssh</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">/key</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">.pem</code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", 
"Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">")}"</code></div>
<div class="line number72 index71 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"><span style="color: #333333;"> </span></code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
<div class="line number73 index72 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
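<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">The <code>private_key</code> path above is hard-coded to one user's home directory, so the script only works on that machine. The sketch below moves the path into an input variable. It is a minimal sketch in the Terraform 0.11 interpolation syntax used in this post, and the variable name <code>private_key_path</code> is hypothetical, not part of the original setup:</span></span><br />

```hcl
# Hypothetical input variable; the default reuses the path from the example above.
variable "private_key_path" {
  description = "Local path to the SSH private key used by the provisioners"
  default     = "/Users/kostia/.ssh/key.pem"
}

# Inside the connection block, the key is then read through the variable
# instead of a hard-coded path:
#   private_key = "${file(var.private_key_path)}"
```

<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">With this change, another team member can point to their own key by running <code>terraform apply -var 'private_key_path=/path/to/key.pem'</code>, with no need to edit the configuration.</span></span><br />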
<span style="color: #545454; font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><br /></span></span>
<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">The last step is to describe the Alluxio slaves and connect them to the master:</span></span><br />
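<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">The slave resource below sets <code>count</code> to 3. Terraform therefore creates three identical instances from a single block and exposes their attributes as a list. The sketch below shows one way to see which machines should have joined the master: an output that collects the slaves' private IPs after <code>terraform apply</code>. It uses Terraform 0.11 syntax, and the output name <code>alluxio_slave_ips</code> is a hypothetical example, not part of the original setup:</span></span><br />

```hcl
# Hypothetical output: joins the private IP of every instance created
# by the "emr2-slaves" resource (count = 3) into one comma-separated string.
output "alluxio_slave_ips" {
  value = "${join(",", aws_instance.emr2-slaves.*.private_ip)}"
}
```

<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">After the cluster is created, <code>terraform output alluxio_slave_ips</code> prints the addresses. You can then compare them with the workers listed by the Alluxio master.</span></span><br />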
<div class="line number76 index75 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">resource </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"aws_instance"</code> <code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"emr2-slaves"</code> <code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">{</code></div>
<div class="line number77 index76 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">instance_type = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"r4.xlarge"</code></div>
<div class="line number78 index77 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">count = 3</code></div>
<div class="line number79 index78 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># AMI based on Amazon Linux with Docker and Git</code></div>
<div class="line number80 index79 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">ami = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"ami-ZZZ"</code></div>
<div class="line number81 index80 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash comments" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(0, 130, 0) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"># The name of our SSH keypair we created above.</code></div>
<div class="line number82 index81 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">key_name = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"key"</code></div>
<div class="line number83 index82 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">vpc_security_group_ids = [</code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"${aws_security_group.my_test.id}"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">]</code></div>
<div class="line number84 index83 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">iam_instance_profile = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"${aws_iam_instance_profile.my_profile.id}"</code></div>
<div class="line number85 index84 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">subnet_id = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"subnet-123c1a0b"</code></div>
<div class="line number86 index85 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">tags {</code></div>
<div class="line number87 index86 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">Name = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"Alluxio-slaves"</code></div>
<div class="line number88 index87 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
<div class="line number89 index88 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">provisioner </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"remote-exec"</code> <code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">{</code></div>
<div class="line number90 index89 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">inline = [</code></div>
<div class="line number91 index90 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"git clone <a href="https://github.com/Alluxio/alluxio.git" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; text-decoration-line: none; top: auto; vertical-align: baseline; width: auto;">https://github.com/Alluxio/alluxio.git"</a></code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number92 index91 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"cd alluxio/integration/docker"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number93 index92 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"docker build -t alluxio ."</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number94 index93 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"cd ~"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number95 index94 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"mkdir underStorage"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number96 index95 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"sudo mkdir /mnt/ramdisk"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number97 index96 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"sudo mount -t ramfs -o size=28G ramfs /mnt/ramdisk"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">, # ramfs ignores size=; the worker's usage is capped by ALLUXIO_WORKER_MEMORY_SIZE</code></div>
<div class="line number98 index97 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"sudo chmod a+w /mnt/ramdisk"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number99 index98 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"sudo service docker restart"</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">,</code></div>
<div class="line number100 index99 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"docker run -d --net=host -v /mnt/ramdisk:/opt/ramdisk -v $PWD/underStorage:/underStorage -e ALLUXIO_MASTER_HOSTNAME=${aws_instance.emr2-master.private_ip} -e ALLUXIO_RAM_FOLDER=/opt/ramdisk -e ALLUXIO_WORKER_MEMORY_SIZE=28GB -e ALLUXIO_UNDERFS_ADDRESS=/underStorage alluxio worker"</code></div>
<div class="line number101 index100 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">]</code></div>
<div class="line number102 index101 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
<div class="line number103 index102 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">connection {</code></div>
<div class="line number108 index107 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">timeout = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"15m" # wait up to 15 minutes for the SSH connection to become available</code></div>
<div class="line number109 index108 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">user = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"ec2-user"</code></div>
<div class="line number110 index109 alt1" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: #333333; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash spaces" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"> </code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">private_key = </code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">"${file("</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">/Users/kostia/</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: 
content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">.</code><code class="bash functions" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: rgb(255, 20, 147) !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">ssh</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">/key</code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">.pem</code><code class="bash string" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: blue !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", 
"Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">")}"</code></div>
<div class="line number111 index110 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;"><span style="color: #333333;"> </span></code><code class="bash plain" style="background: 0px center; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; color: black !important; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px; position: static; right: auto; top: auto; vertical-align: baseline; width: auto;">}</code></div>
<div class="line number111 index110 alt2" style="background-attachment: initial; background-clip: initial; background-color: white !important; background-image: initial; background-origin: initial; background-position: 0px center; background-repeat: initial; background-size: initial; border-radius: 0px; border: 0px; bottom: auto; box-sizing: content-box; float: none; font-family: Consolas, "Bitstream Vera Sans Mono", "Courier New", Courier, monospace; height: auto; left: auto; line-height: 20px; margin: 0px; min-height: inherit; outline: 0px; overflow: visible; padding: 0px 1em 0px 0px; position: static; right: auto; top: auto; vertical-align: baseline; white-space: nowrap; width: auto;">
<span style="background-color: initial;">}</span></div>
<span style="color: #545454; font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><br /></span></span>
<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;">The last step is to apply the changes described in the Terraform configuration to the AWS infrastructure. The main commands to know:</span></span><br />
<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><b>terraform apply</b> applies the changes to the AWS account (i.e. creates, updates or deletes resources)</span></span><br />
<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><b>terraform plan</b> shows the execution plan (the list of actions to be performed) without applying it</span></span><br />
<span style="font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><b>terraform destroy</b> destroys the previously created infrastructure</span></span><br />
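To make the plan/apply idea a bit more concrete, here is a toy model in Java of what <b>terraform plan</b> conceptually computes: the difference between the resources you want (the configuration) and the resources Terraform remembers (the state). It's only an illustration under heavy simplifications (every resource is reduced to an address plus one attribute string, and the real refresh against AWS is skipped); the class name and resource addresses are hypothetical, and this is not how Terraform is implemented internally.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

// Toy model of a plan: compares desired resources (configuration) with the
// resources recorded in state and lists the actions needed to converge.
// Hypothetical illustration only, not Terraform's real algorithm.
class PlanSketch {

    // Both maps: resource address -> simplified attribute string
    static List<String> plan(Map<String, String> desired, Map<String, String> current) {
        List<String> actions = new ArrayList<>();
        TreeSet<String> addresses = new TreeSet<>(desired.keySet());
        addresses.addAll(current.keySet()); // sorted union of all addresses
        for (String address : addresses) {
            String want = desired.get(address);
            String have = current.get(address);
            if (have == null) {
                actions.add("+ create " + address);   // only in configuration
            } else if (want == null) {
                actions.add("- destroy " + address);  // only in state
            } else if (!want.equals(have)) {
                actions.add("~ update " + address);   // present in both, but differs
            }
            // identical in both: nothing to do
        }
        return actions;
    }

    public static void main(String[] args) {
        Map<String, String> desired = new TreeMap<>();
        desired.put("aws_instance.alluxio_master", "m4.xlarge");
        desired.put("aws_instance.alluxio_worker_1", "r4.xlarge");

        Map<String, String> current = new TreeMap<>();
        current.put("aws_instance.alluxio_master", "m4.large");
        current.put("aws_instance.old_box", "t2.micro");

        // prints: update alluxio_master, create alluxio_worker_1, destroy old_box
        plan(desired, current).forEach(System.out::println);
    }
}
```

In this model <b>terraform apply</b> executes the computed actions, and <b>terraform destroy</b> is the same diff against an empty desired set.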
<span style="color: #545454; font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><br /></span></span>
<span style="color: #545454; font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"># </span>Introduction to Terraform</span><br />
<span style="color: #545454; font-family: Roboto, arial, sans-serif;"><span style="background-color: white;"><br /></span></span>
<span style="color: #545454; font-family: Roboto, arial, sans-serif; font-size: x-small;"><span style="background-color: white;"><br /></span></span>Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com1tag:blogger.com,1999:blog-8382000407737271014.post-87679095475568593572017-11-20T05:13:00.004+02:002017-11-27T01:37:43.906+02:00Applying Alluxio to warm up your data<blockquote class="tr_bq">
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="background-color: white;">Alluxio</span><span style="background-color: white;">, formerly Tachyon, enables any application to interact with any data from any storage system at memory speed.</span> </span></blockquote>
<span style="font-family: Arial, Helvetica, sans-serif;">states the Alluxio website, https://www.alluxio.org/. In this article I'd like to describe the general idea behind Alluxio and how it helped me. <span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;">Alluxio is not widely known yet, but it offers a lot of features and can be a game changer for your project. Alluxio already powers data processing at Barclays, Alibaba, Baidu, ZTE, Intel, etc. It is licensed under Apache 2.0, and the source code is available at https://github.com/Alluxio/alluxio.</span></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;">Alluxio provides a virtual filesystem that creates a layer between your application (i.e. a computational framework) and the real storage, such as HDFS, S3, Google Cloud Storage, Azure Blob Storage and so on. Alluxio offers several interfaces: a Hadoop-compatible FS, a native key-value interface and an NFS interface. From the component point of view, Alluxio has a single Master (plus a Secondary Master, which is similar to the Secondary NameNode in Hadoop, i.e. it doesn't serve client requests), multiple Slaves (workers) and, obviously, the Client. </span></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;">My use case was inspired by layered (tiered) storage in HDFS, where you can configure HDFS to store specific paths on Hot storage (let's say, in memory), Warm (~SSD) or Cold (~HDD). However, cloud usage grows every day, bare-metal Hadoop clusters are becoming less common, and the cloud brings an issue (which is at the same time a benefit): storage is separated from compute, which makes storage tiers hard or impossible to implement. That's a very good use case for Alluxio: deploy an Alluxio cluster to play the role of Hot storage that holds only frequently used data. </span></span><br />
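The Hot storage idea can be pictured as a read-through cache in front of slow storage. Below is a minimal single-threaded sketch in Java, assuming a hypothetical <i>loader</i> function standing in for S3 and a capacity counted in number of objects. Alluxio workers really cache blocks across configurable storage tiers with their own eviction policies, so treat this only as a mental model, not as Alluxio code.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.function.Function;

// Minimal read-through cache with LRU eviction: a mental model of a "hot"
// in-memory tier in front of slow storage. Not Alluxio code.
class ReadThroughCache<K, V> {
    private final Function<K, V> loader; // hypothetical slow store, e.g. S3
    private final Map<K, V> cache;
    private int misses = 0;

    ReadThroughCache(int capacity, Function<K, V> loader) {
        this.loader = loader;
        // accessOrder = true keeps entries ordered from least to most recently used
        this.cache = new LinkedHashMap<K, V>(16, 0.75f, true) {
            @Override
            protected boolean removeEldestEntry(Map.Entry<K, V> eldest) {
                return size() > capacity; // evict the least recently used entry
            }
        };
    }

    // Serves from memory when possible, otherwise loads and keeps the value.
    // Assumes the loader never returns null.
    V read(K key) {
        V value = cache.get(key); // a hit also refreshes recency
        if (value == null) {
            misses++;
            value = loader.apply(key); // go to the slow store
            cache.put(key, value);     // keep it hot for the next read
        }
        return value;
    }

    int misses() { return misses; }

    boolean isCached(K key) { return cache.containsKey(key); }
}
```

Access order is what turns <i>LinkedHashMap</i> into an LRU structure, which is the simplest way to express "keep what is used often" with the JDK alone.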
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;">When saving data to S3, we partition it by year, month and day to speed up queries over a known time range. However, data is rarely accessed uniformly; much more often there are very specific patterns, like:</span></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<ul>
<li><span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;">actively accessing the last 3 months</span></li>
<li><span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;">actively accessing the last month and the same month of the previous year</span></li>
</ul>
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;">This data is a natural candidate to put into Alluxio to speed up access to it, while the rest of the data stays available directly from S3.</span><br />
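For the two access patterns listed above, the set of hot partitions can be derived from the current date. The sketch below uses only <i>java.time</i> and assumes a hypothetical <i>year=YYYY/month=MM</i> partition layout under <i>s3a://alluxio-poc/data</i> (the actual layout isn't shown here); it also assumes "last 3 months" includes the current month.

```java
import java.time.YearMonth;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

// Computes partition prefixes worth keeping in Alluxio for the two access
// patterns above. The year=/month= layout is a hypothetical assumption.
class HotPartitions {

    static String path(YearMonth ym) {
        return String.format(Locale.ROOT, "s3a://alluxio-poc/data/year=%d/month=%02d",
                ym.getYear(), ym.getMonthValue());
    }

    // Pattern 1: the last `months` months, counting the current one
    static List<String> lastMonths(YearMonth now, int months) {
        List<String> paths = new ArrayList<>();
        for (int i = 0; i < months; i++) {
            paths.add(path(now.minusMonths(i)));
        }
        return paths;
    }

    // Pattern 2: the most recent month plus the same month one year earlier
    static List<String> monthAndYearAgo(YearMonth now) {
        List<String> paths = new ArrayList<>();
        paths.add(path(now));
        paths.add(path(now.minusYears(1)));
        return paths;
    }

    public static void main(String[] args) {
        YearMonth now = YearMonth.now();
        lastMonths(now, 3).forEach(System.out::println);
        monthAndYearAgo(now).forEach(System.out::println);
    }
}
```

These prefixes are the candidates to serve through Alluxio; every other partition remains a direct S3 read.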
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;">Let's see the practical example of working with data stored on S3 using Apache Spark on EMR.</span></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"> I used <a href="http://simpletoad.blogspot.com/2017/11/terraform.html" target="_blank">Terraform</a> to create an Alluxio cluster with 3 </span><span style="background-color: white; white-space: nowrap;">r4.xlarge</span><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"> slaves and one </span><span style="background-color: white; white-space: nowrap;">m4.xlarge</span><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"> master. We will also need compute capacity to run the Spark job, so let's create an AWS EMR cluster:</span></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"><i>aws emr create-cluster --name 'Alluxio_EMR_test' \</i></span></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--instance-type m4.2xlarge \</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--instance-count 3 \</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--ec2-attributes SubnetId=subnet-131cda0a,KeyName=my-key-name,InstanceProfile=EMR_EC2_DefaultRole \</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--service-role EMR_DefaultRole \</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--applications Name=Hadoop Name=spark \</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--region us-west-2 \</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--log-uri s3://alluxio-poc/emrlogs \</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--enable-debugging \</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--release-label emr-5.7.0 \</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>--emrfs Consistent=true</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;">After that, Alluxio is ready to be started and our data is ready to be pulled in:</span></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"><i>[ec2-user@ip-172-16-175-35 ~]$ docker ps</i></span></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>1c876a0ffe4d alluxio "/entrypoint.sh wo..." 9 minutes ago Up 9 minutes cranky_brown</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>[ec2-user@ip-172-16-175-35 ~]$ docker exec -it 1c876a0ffe4d /bin/sh</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>/ # cd /opt/alluxio/bin</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>/opt/alluxio/bin # ./alluxio runTests</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>/opt/alluxio/bin # ./alluxio fs mkdir /mnt</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>Successfully created directory /mnt</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i># the following command mounts the S3 folder into the Alluxio namespace (its data gets cached in Alluxio as it is read)</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>/opt/alluxio/bin # ./alluxio fs mount -readonly alluxio://localhost:19998/mnt/s3 s3a://alluxio-poc/data</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>Mounted s3a://alluxio-poc/data at alluxio://localhost:19998/mnt/s3</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>/opt/alluxio/bin #</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>/opt/alluxio/bin # ./alluxio fs ls /mnt/s3</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><i><span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"></span></i><br /></span>
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>-rwx------ pc-nord-account66 pc-nord-account66 410916576 09-22-2017 18:03:10:815 Not In Memory /mnt/s3/part-00084-2e9dafb0-2d7a-428e-b517-b6eb4d70f781.snappy.parquet</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;">Then go back to the EMR master node and start the Spark shell with the Alluxio client jar:</span></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>spark-shell --jars ~/alluxio-1.5.0/client/spark/alluxio-1.5.0-spark-client.jar</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;">The following commands register the Alluxio file system with the Spark context that spark-shell has already started, then read the mounted data:</span></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>val hadoopConf = sc.hadoopConfiguration</i></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>hadoopConf.set("fs.alluxio.impl", "alluxio.hadoop.FileSystem")</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><i><span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"></span></i><br /></span>
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>val x = spark.read.parquet("alluxio://172.16.175.46:19998/mnt/s3")</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"><i><br /></i></span>
<span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"><i>// let's see how fast it is going to be</i></span></span><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>x.select($"itemid", $"itemdescription", $"GlobalTransactionID", $"amount").orderBy(desc("amount")).show(20) // 4 sec</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><i><span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"></span></i><br /></span>
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>x.count() // 3 sec</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"><i><br /></i></span>
<span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"><i>// now let's compare with the same dataset read directly from S3</i></span></span><br />
<i><span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-size: xx-small;">val p = spark.read.parquet("</span><span style="color: #545454; font-size: xx-small;">s3a://alluxio-poc/data</span><span style="color: #545454; font-size: xx-small;">")</span></span></i><br />
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>p.select($"itemid", $"itemdescription", $"GlobalTransactionID", $"amount").orderBy(desc("amount")).show(20) // 19 sec</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><i><span style="color: #545454; font-family: "courier new" , "courier" , monospace; font-size: xx-small;"></span></i><br /></span>
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;"><i>p.count() // 19 sec</i></span><br />
<span style="font-family: Arial, Helvetica, sans-serif;"><br /></span>
<span style="color: #545454; font-family: Arial, Helvetica, sans-serif; font-size: xx-small;">To sum up, Alluxio provides a great way to speed up data processing in an update-based warehouse when you need frequent access to only a limited part of the dataset. A typical candidate for caching with Alluxio is hot data that is accessed and processed 10 times more often than the rest while making up only 10% of the whole dataset.</span><br />
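A quick back-of-the-envelope check of that rule of thumb, as a small Java calculation. It assumes every access costs the same, that each hot byte is accessed 10 times as often as a cold one, and, loosely, that a query served from Alluxio takes about the 4 sec measured above while a direct S3 query takes about 19 sec. These are illustrative assumptions, not new measurements.

```java
import java.util.Locale;

// Back-of-the-envelope estimate for caching a hot subset in Alluxio.
// All inputs are illustrative assumptions based on the numbers above.
class HotDataEstimate {

    // Share of all accesses that hit the hot subset, when the hot subset is
    // `hotFraction` of the data and each hot byte is accessed `hotRate`
    // times as often as a cold byte.
    static double hotAccessShare(double hotFraction, double hotRate) {
        double hot = hotFraction * hotRate;
        double cold = (1.0 - hotFraction) * 1.0;
        return hot / (hot + cold);
    }

    // Expected time per query when the hot share is served from the cache
    static double expectedSeconds(double hotShare, double cachedSec, double directSec) {
        return hotShare * cachedSec + (1.0 - hotShare) * directSec;
    }

    public static void main(String[] args) {
        double share = hotAccessShare(0.10, 10.0);      // 1.0 / 1.9
        double avg = expectedSeconds(share, 4.0, 19.0); // 4 sec vs 19 sec from the Spark test
        System.out.printf(Locale.ROOT, "hot share of accesses: %.1f%%%n", share * 100);
        System.out.printf(Locale.ROOT, "expected time per query: %.1f sec (vs 19.0 sec)%n", avg);
    }
}
```

So caching just 10% of the data would serve a little over half of all accesses under these assumptions; the real gain depends on how skewed your actual access pattern is.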
<span style="font-family: Arial, Helvetica, sans-serif;"><span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"><br /></span>
<span style="color: #444444; font-family: "roboto" , "arial" , sans-serif; font-size: xx-small;"># Introduction to Alluxio</span></span><br />
<span style="color: #545454; font-family: "roboto" , "arial" , sans-serif; font-size: x-small;"><br /></span>Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com17tag:blogger.com,1999:blog-8382000407737271014.post-29593101110911192002017-08-31T08:47:00.000+03:002017-08-31T08:47:18.286+03:00Druid: fixed lambda<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0CU_fGdYA1zS0MH2pxspYicf8HbcIx43xG6XnVHfXoYCxfIBaRS5oPkQBRoIBqnyj0Q-XuqikoODtvM97XQwOPxAHFtB5AwFYNjEobfBFBIpyznD7pZ6El5YQtUQEGdmmiOgjMxcLYLQQ/s1600/Druid.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="276" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj0CU_fGdYA1zS0MH2pxspYicf8HbcIx43xG6XnVHfXoYCxfIBaRS5oPkQBRoIBqnyj0Q-XuqikoODtvM97XQwOPxAHFtB5AwFYNjEobfBFBIpyznD7pZ6El5YQtUQEGdmmiOgjMxcLYLQQ/s400/Druid.PNG" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<b>Druid</b> is an excellent high-performance, column-oriented, distributed data store. It is used by IT giants to get sub-second answers from TB-scale (or even PB-scale) datasets. Needless to say, I fell in love with it from day one. </div>
<br />
Several <a href="http://druid.io/druid-powered.html" target="_blank">examples</a>:<br />
<blockquote class="tr_bq">
<i>Netflix</i> ingests up to 2 TB per hour, with the ability to query data as it is being ingested</blockquote>
<blockquote class="tr_bq">
<i>eBay</i> ingests over 100,000 events/sec and supports over 100 concurrent queries without impacting ingest rate or query latency</blockquote>
<br />
Main features of Druid that help it stand out from the crowd:<br />
<br />
<ul>
<li>Sub-second queries</li>
<li>Scalable to PBs</li>
<li>Real-time streams</li>
<li>Deploy anywhere (works with Hadoop, or without it by processing data from S3)</li>
</ul>
<div>
I'm glad I had the opportunity to work with Druid a year ago. It's really cool, works super fast and delivers excellent results! The JSON-based query language wasn't hard to learn; I even managed to calculate an average using a post-aggregation :) Previous MapReduce experience really helped.</div>
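Why does an average need a post-aggregation? Sums and counts can be merged across segments and nodes, averages cannot. The Java sketch below mimics that flow with hypothetical per-segment partial results (a sum and a row count per segment), merges them like a combiner in MapReduce, and only then divides, which is the job of an arithmetic post-aggregator with the "/" function in a Druid query. It's a model of the idea, not Druid code.

```java
import java.util.Arrays;
import java.util.List;

// Models why Druid computes an average with a post-aggregation:
// per-segment (sum, count) partials merge correctly, averages do not.
class DruidAverage {

    // Aggregator results for one segment (hypothetical values)
    static final class Partial {
        final long sum;
        final long count;

        Partial(long sum, long count) {
            this.sum = sum;
            this.count = count;
        }
    }

    // Merge step: sums and counts are simply added
    static Partial merge(List<Partial> partials) {
        long sum = 0;
        long count = 0;
        for (Partial p : partials) {
            sum += p.sum;
            count += p.count;
        }
        return new Partial(sum, count);
    }

    // Post-aggregation step: divide only after everything is merged
    static double average(List<Partial> partials) {
        Partial total = merge(partials);
        return total.count == 0 ? 0.0 : (double) total.sum / total.count;
    }

    // The wrong way, for comparison: averaging per-segment averages
    // (assumes every segment has count > 0)
    static double averageOfAverages(List<Partial> partials) {
        double acc = 0;
        for (Partial p : partials) {
            acc += (double) p.sum / p.count;
        }
        return acc / partials.size();
    }

    public static void main(String[] args) {
        List<Partial> segments = Arrays.asList(new Partial(100, 10), new Partial(30, 1));
        System.out.println(average(segments));           // 130 / 11, about 11.82
        System.out.println(averageOfAverages(segments)); // (10 + 30) / 2 = 20.0, wrong
    }
}
```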
<div>
<br /></div>
<div>
One remark I'd like to add here: </div>
<div>
We developed and tested a Druid-based system in the us-east-1 region. Everything was fine and deployment was automated, so we moved to production, which, surprisingly, was placed in the Frankfurt AWS region (eu-central-1). We got a pretty nasty error from Druid when the deployment script finished its work there:</div>
<div>
<span style="background-color: #f9f9f9; color: #2c2d30; font-size: 15px;">Caused by: io.druid.segment.loading.SegmentLoadingException: S3 fail!</span></div>
<div>
<br /></div>
<div>
<div class="MsoNormal">
It looks like the problem was additional configuration required for regions other than US East. Unfortunately, there isn't any documentation on it, so I derived it from the source code; it looks like it works now:</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
On each historical node, add the file <b>/opt/druid/config/_common/jets3t.properties</b> with the following content:</div>
<div class="MsoNormal">
<b>storage-service.request-signature-version=AWS4-HMAC-SHA256</b></div>
<div class="MsoNormal">
<b>s3service.s3-endpoint=s3.eu-central-1.amazonaws.com</b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
The 1<sup>st</sup> line forces AWS Signature Version 4 authentication, the only signature version supported in the Frankfurt region</div>
<br />
<div class="MsoNormal">
The 2<sup>nd</sup> line sets the S3 endpoint: the default is us-east-1, but for Frankfurt it must be s3.eu-central-1.amazonaws.com<br />
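Since our deployment was automated anyway, the file can be generated per region instead of being written by hand. A minimal Java sketch: only the two property keys come from the configuration above; the class and method names are hypothetical, and the <i>s3.&lt;region&gt;.amazonaws.com</i> endpoint naming is an assumption taken from the Frankfurt case, so check the AWS endpoint list before using it for other regions.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

// Builds and writes jets3t.properties for a given AWS region, so Druid
// historical nodes use SigV4 and the regional S3 endpoint.
// Assumes the s3.<region>.amazonaws.com naming shown above for eu-central-1.
class Jets3tConfig {

    static String content(String region) {
        return "storage-service.request-signature-version=AWS4-HMAC-SHA256\n"
                + "s3service.s3-endpoint=s3." + region + ".amazonaws.com\n";
    }

    // configDir would be e.g. /opt/druid/config/_common
    static Path write(Path configDir, String region) throws IOException {
        Files.createDirectories(configDir);
        Path file = configDir.resolve("jets3t.properties");
        Files.write(file, content(region).getBytes(StandardCharsets.UTF_8));
        return file;
    }
}
```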
<br />
Anyway, Metamarkets team, thank you for a great product! Now I'm going to test Caravel from Airbnb<br />
<br />
<br /></div>
</div>
<div>
<br /></div>
<div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-24255816233484326852016-07-29T21:21:00.000+03:002016-07-29T21:21:01.527+03:00Protecting Spark UI, part 2: servlet filter<a href="http://simpletoad.blogspot.com/2016/07/protecting-spark-ui-part-1-nginx.html" target="_blank">In the previous post</a> I described how to configure a simple NGINX instance to add basic auth to the UI of a Spark job. In this part, let's see what <a href="http://spark.apache.org/docs/latest/security.html" target="_blank">Spark itself suggests</a>: implementing a servlet filter.<br />
<br />
A <a href="https://tomcat.apache.org/tomcat-5.5-doc/servletapi/javax/servlet/Filter.html" target="_blank">Filter</a> is a special class that participates in the Java servlet lifecycle and is called on every request (it can act on the response as well). With a filter, a resource can be protected from unauthorized access by basic authentication. According to the documentation, the filter must be implemented and its fully qualified class name passed to Spark through the <i>spark.ui.filters</i> property. Let's pass the valid username and password through environment variables: that should be good enough, as it is the same approach commonly used to pass AWS credentials, for instance. Obviously, these environment variables must be set on the instance where the driver is supposed to run. Another option is to pass them as filter parameters using <i>spark.&lt;full-filter-class-name&gt;.params='param1=value1,param2=value2'</i><br />
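The core check such a filter performs can be written with the JDK alone. Below is a minimal sketch that parses an <i>Authorization: Basic ...</i> header value and compares it with the expected credentials, using <i>java.util.Base64</i> instead of commons-codec. The class name <i>BasicAuthCheck</i> is hypothetical and UTF-8 credentials are an assumption; inside the filter it would be called from <i>doFilter</i>, and on failure the filter would answer 401 with a <i>WWW-Authenticate: Basic realm="..."</i> header.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

// Parses an HTTP Basic Authorization header value and checks the credentials.
// Hypothetical helper; a servlet filter's doFilter would call isAuthorized().
class BasicAuthCheck {

    static boolean isAuthorized(String authHeader, String login, String pass) {
        if (authHeader == null || login == null || pass == null) {
            return false; // no header sent, or credentials not configured
        }
        String prefix = "Basic ";
        if (!authHeader.regionMatches(true, 0, prefix, 0, prefix.length())) {
            return false; // some other auth scheme
        }
        String decoded;
        try {
            byte[] raw = Base64.getDecoder().decode(authHeader.substring(prefix.length()).trim());
            decoded = new String(raw, StandardCharsets.UTF_8); // assumes UTF-8 credentials
        } catch (IllegalArgumentException e) {
            return false; // not valid Base64
        }
        int colon = decoded.indexOf(':'); // the user name can't contain ':', the password can
        if (colon < 0) {
            return false;
        }
        // MessageDigest.isEqual compares equal-length arrays without stopping
        // at the first differing byte
        boolean userOk = MessageDigest.isEqual(
                decoded.substring(0, colon).getBytes(StandardCharsets.UTF_8),
                login.getBytes(StandardCharsets.UTF_8));
        boolean passOk = MessageDigest.isEqual(
                decoded.substring(colon + 1).getBytes(StandardCharsets.UTF_8),
                pass.getBytes(StandardCharsets.UTF_8));
        return userOk && passOk;
    }
}
```

The expected login and password would come from <i>System.getenv("SPARK_LOGIN")</i> and <i>System.getenv("SPARK_PASS")</i>, as in the filter's <i>init</i> method.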
<br />
Let's imagine our class lives in the package <i>my.company.filters</i> and uses a couple of helper libraries (<i>commons-codec</i>, <i>commons-lang3</i>):<br />
<br />
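<span style="font-family: Courier New, Courier, monospace;">package my.company.filters;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">import java.io.IOException;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import java.nio.charset.StandardCharsets;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import javax.servlet.Filter;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import javax.servlet.FilterChain;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import javax.servlet.FilterConfig;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import javax.servlet.ServletException;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import javax.servlet.ServletRequest;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import javax.servlet.ServletResponse;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import javax.servlet.http.HttpServletRequest;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import javax.servlet.http.HttpServletResponse;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import org.apache.commons.codec.binary.Base64;</span><br />
<span style="font-family: Courier New, Courier, monospace;">import org.apache.commons.lang3.StringUtils;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">public class BasicAuthFilter implements Filter {</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> private String login;</span><br />
<span style="font-family: Courier New, Courier, monospace;"> private String pass;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> // called once, when the filter is created; credentials come from env variables</span><br />
<span style="font-family: Courier New, Courier, monospace;"> public void init(FilterConfig config) {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> this.login = System.getenv("SPARK_LOGIN");</span><br />
<span style="font-family: Courier New, Courier, monospace;"> this.pass = System.getenv("SPARK_PASS");</span><br />
<span style="font-family: Courier New, Courier, monospace;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> public void doFilter(ServletRequest req, ServletResponse res, FilterChain chain) throws IOException, ServletException {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> HttpServletRequest hreq = (HttpServletRequest) req;</span><br />
<span style="font-family: Courier New, Courier, monospace;"> HttpServletResponse hres = (HttpServletResponse) res;</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> // expected header: "Authorization: Basic base64(login:password)"</span><br />
<span style="font-family: Courier New, Courier, monospace;"> String auth = hreq.getHeader( "Authorization" );</span><br />
<span style="font-family: Courier New, Courier, monospace;"> if ( auth != null && login != null && pass != null ) {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> int index = auth.indexOf(' ');</span><br />
<span style="font-family: Courier New, Courier, monospace;"> if ( index > 0 ) {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> String decoded = new String( Base64.decodeBase64( auth.substring(index + 1).trim() ), StandardCharsets.UTF_8 );</span><br />
<span style="font-family: Courier New, Courier, monospace;"> String[] creds = StringUtils.split( decoded, ':' );</span><br />
<span style="font-family: Courier New, Courier, monospace;"> if ( creds.length == 2 && login.equals(creds[0]) && pass.equals(creds[1]) ) {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> // auth passed successfully: let the request through to the Spark UI</span><br />
<span style="font-family: Courier New, Courier, monospace;"> chain.doFilter( req, res );</span><br />
<span style="font-family: Courier New, Courier, monospace;"> return;</span><br />
<span style="font-family: Courier New, Courier, monospace;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> // no or wrong credentials: ask the browser to show its login dialog</span><br />
<span style="font-family: Courier New, Courier, monospace;"> hres.setHeader( "WWW-Authenticate", "Basic realm=\"ProtectedSpark\"" );</span><br />
<span style="font-family: Courier New, Courier, monospace;"> hres.sendError( HttpServletResponse.SC_UNAUTHORIZED );</span><br />
<span style="font-family: Courier New, Courier, monospace;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;"> // required by javax.servlet.Filter (before Servlet 4.0 there is no default implementation)</span><br />
<span style="font-family: Courier New, Courier, monospace;"> public void destroy() {</span><br />
<span style="font-family: Courier New, Courier, monospace;"> }</span><br />
<span style="font-family: Courier New, Courier, monospace;"><br /></span>
<span style="font-family: Courier New, Courier, monospace;">}</span><br />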
<br />
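The credential check is the part most worth testing in isolation. Below is a minimal sketch of the same check using only the JDK (java.util.Base64, Java 8+) instead of commons-codec; the class name BasicAuthCheck is hypothetical. It splits on the first colon only, so passwords containing ':' still work, and it compares with MessageDigest.isEqual rather than String.equals, so for equal-length inputs the comparison does not stop at the first differing byte.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.util.Base64;

// Hypothetical helper: validates an "Authorization: Basic ..." header value
// with JDK classes only, so it can be unit-tested without a servlet container.
final class BasicAuthCheck {

    private BasicAuthCheck() {
    }

    static boolean isAuthorized(String header, String login, String pass) {
        if (header == null || login == null || pass == null) {
            return false; // no header, or credentials not configured
        }
        int space = header.indexOf(' ');
        // the auth scheme must be "Basic" (schemes are case-insensitive, RFC 7235)
        if (space <= 0 || !header.substring(0, space).equalsIgnoreCase("Basic")) {
            return false;
        }
        byte[] decoded;
        try {
            decoded = Base64.getDecoder().decode(header.substring(space + 1).trim());
        } catch (IllegalArgumentException e) {
            return false; // not valid Base64
        }
        String creds = new String(decoded, StandardCharsets.UTF_8);
        int colon = creds.indexOf(':');
        if (colon < 0) {
            return false;
        }
        // split on the FIRST colon: the user name cannot contain ':' but the password can
        byte[] user = creds.substring(0, colon).getBytes(StandardCharsets.UTF_8);
        byte[] password = creds.substring(colon + 1).getBytes(StandardCharsets.UTF_8);
        // non-short-circuit '&' so both comparisons always run
        return MessageDigest.isEqual(user, login.getBytes(StandardCharsets.UTF_8))
                & MessageDigest.isEqual(password, pass.getBytes(StandardCharsets.UTF_8));
    }
}
```

Inside doFilter, the whole parsing block could then shrink to: if (BasicAuthCheck.isAuthorized(hreq.getHeader("Authorization"), login, pass)) { chain.doFilter(req, res); return; }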
<br />
OK, the next step is to build a JAR containing this filter. After that, we can run our job in a secured manner: execute spark-submit, pass the newly assembled JAR with the <i>--jars</i> flag, and pass the fully qualified class name through the configuration (a *.conf file or the <i>--conf</i> parameter): <i>spark.ui.filters</i>=<i>my.company.filters.BasicAuthFilter</i><br />
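To make the configuration format concrete, here is a small sketch that assembles the spark-submit arguments for a UI filter. The class name and the JAR name used in the example are hypothetical, and the params entry assumes the spark.FILTER_CLASS.params='param1=value1,param2=value2' form from the Spark 1.x/2.x configuration docs; newer Spark versions may document a different form, so check the docs for your version. As far as I can tell, the single quotes in the docs are only shell quoting and are not needed when arguments are passed as an array.

```java
import java.util.Arrays;

// Hypothetical helper: builds the spark-submit arguments that register a UI filter.
// Assumed parameter format: spark.FILTER_CLASS.params=k1=v1,k2=v2
final class UiFilterSubmitArgs {

    private UiFilterSubmitArgs() {
    }

    // params are "name=value" strings, e.g. "realm=ProtectedSpark"
    static String[] build(String filterJar, String filterClass, String... params) {
        String[] base = {
            "--jars", filterJar,
            "--conf", "spark.ui.filters=" + filterClass
        };
        if (params.length == 0) {
            return base;
        }
        String[] withParams = Arrays.copyOf(base, base.length + 2);
        withParams[base.length] = "--conf";
        // one comma-separated entry holding all name=value pairs for this filter
        withParams[base.length + 1] = "spark." + filterClass + ".params=" + String.join(",", params);
        return withParams;
    }
}
```

The resulting array can be passed, for instance, to a ProcessBuilder after "spark-submit" and before the application JAR.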
<br />
<br />Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com3tag:blogger.com,1999:blog-8382000407737271014.post-56502056948402362982016-07-29T18:56:00.000+03:002016-07-29T21:21:27.959+03:00Protecting Spark UI, part 1: nginx<a href="http://spark.apache.org/docs/latest/monitoring.html" target="_blank">Apache Spark WEB UI</a> is a decent place to check cluster health and monitor job performance, and the starting point for almost every performance optimization. The folks at Databricks keep improving the UI from version to version.<br />
But it still has one issue that I face on every project and have to resolve every time: the information is public. Anyone who can reach the port (by default, 8080 for the standalone master UI or 4040 for the driver UI) can access the UI and everything in it, and there is a lot of stuff there you want to keep private.<br />
<br />
There are several ways to deal with it:<br />
<br />
<ol>
<li>Close all ports and configure nginx to listen on a specific port and forward requests (with basic authentication, of course)</li>
<li>Protect the UI using <a href="http://simpletoad.blogspot.com/2016/07/protecting-spark-ui-part-2-servlet.html" target="_blank">Spark's built-in method</a>: implementing your own filter</li>
</ol>
<div>
In this post, let's start with <b>How to protect Spark UI with NGINX?</b></div>
<div>
<br /></div>
<div>
The instructions below are suitable for <b>protecting the standalone Spark Web UI</b> when the job is executed in client mode (so you can predict where the driver is up and running).</div>
<div>
<br /></div>
<div>
Let's assume there is a node with both Spark and nginx installed (obviously, they can also be on different nodes).</div>
<div>
<br /></div>
<div>
First of all, close all Spark-related ports (and there are a lot of them) to the outside world: they must still be accessible inside the network. In Amazon, it's easy to do with security groups: just specify an appropriate CIDR mask for each inbound rule, for instance <span style="background-color: white; color: #444444; font-family: helvetica neue, roboto, arial, sans-serif; font-size: 14px; line-height: 27px;">172.16.0.0/12</span>. Next, open the ports that are not used by Spark but that you're going to make accessible to reach the Spark master UI or the Spark driver UI: just for example, let's assume 2020 for the master UI (plus another free port of your choice for the driver UI).</div>
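To double-check which addresses a rule like 172.16.0.0/12 actually lets in, here is a minimal JDK-only sketch of CIDR matching (IPv4 only; the class name Cidr is hypothetical): an address matches when its first prefix-length bits equal those of the network address.

```java
// Hypothetical helper: does an IPv4 address fall into a CIDR block such as 172.16.0.0/12?
final class Cidr {

    private Cidr() {
    }

    // "172.16.5.4" -> unsigned 32-bit value stored in a long
    static long toLong(String ip) {
        String[] parts = ip.split("\\.");
        if (parts.length != 4) {
            throw new IllegalArgumentException("not an IPv4 address: " + ip);
        }
        long value = 0;
        for (String part : parts) {
            int octet = Integer.parseInt(part);
            if (octet < 0 || octet > 255) {
                throw new IllegalArgumentException("bad octet in " + ip);
            }
            value = (value << 8) | octet;
        }
        return value;
    }

    static boolean contains(String cidr, String ip) {
        String[] parts = cidr.split("/");
        if (parts.length != 2) {
            throw new IllegalArgumentException("expected ADDRESS/PREFIX: " + cidr);
        }
        int prefix = Integer.parseInt(parts[1]);
        if (prefix < 0 || prefix > 32) {
            throw new IllegalArgumentException("prefix must be 0..32: " + cidr);
        }
        // keep the top 'prefix' bits; a /0 mask is 0 and therefore matches everything
        long mask = (0xFFFFFFFFL << (32 - prefix)) & 0xFFFFFFFFL;
        return (toLong(parts[0]) & mask) == (toLong(ip) & mask);
    }
}
```

For example, 172.31.255.255 is still inside 172.16.0.0/12 while 172.32.0.1 is not, which is why /12 covers the whole 172.16.x.x to 172.31.x.x private range.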
<div>
<br /></div>
<div>
Now only a small part is left: configure nginx to perform basic auth and forward requests to the Spark UI. Since nginx is inside the private network, it can reach Spark: the request is handled by Spark and the UI is presented to the end user. </div>
<div>
<br /></div>
<div>
Before configuring nginx itself, we need to create the file that stores the credentials.</div>
<div>
It's simple to do with the <span style="background-color: rgba(0 , 0 , 0 , 0.0470588); color: #3a3a3a; font-family: monospace , monospace; font-size: 14px; line-height: 21px; white-space: pre;">htpasswd</span> tool, which can be installed by running <span style="background-color: rgba(0 , 0 , 0 , 0.0470588); color: #3a3a3a; font-family: monospace , monospace; font-size: 14px; line-height: 21px; white-space: pre;">sudo yum install -y httpd-tools</span></div>
<div>
<br /></div>
<div>
Then generate a password and store it in the file (the user name will be <i>spark</i>, and the password is entered at the command-line prompt):</div>
<div>
<span style="background-color: rgba(0 , 0 , 0 , 0.0470588); color: #3a3a3a; font-family: monospace , monospace; font-size: 14px; line-height: 21px; white-space: pre;">sudo htpasswd -c /etc/nginx/.htpasswd spark</span></div>
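If installing httpd-tools is not an option, nginx can also read salted SHA-1 entries in the {SSHA} scheme (according to the ngx_http_auth_basic_module documentation, supported since nginx 1.0.3). The sketch below, with a hypothetical class name, generates such a line: the stored value is Base64(SHA-1(password + salt) + salt). htpasswd's default apr1 (MD5-based) format is the more common choice, and salted SHA-1 is weak by today's standards, so treat this only as a fallback.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.security.SecureRandom;
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper: writes "user:{SSHA}..." lines for nginx auth_basic_user_file.
final class SshaHtpasswd {

    private static final int SHA1_LENGTH = 20;

    private SshaHtpasswd() {
    }

    static String line(String user, String password) {
        byte[] salt = new byte[8];
        new SecureRandom().nextBytes(salt);
        return user + ":{SSHA}" + encode(password, salt);
    }

    // Base64( SHA-1(password bytes followed by salt) followed by salt )
    static String encode(String password, byte[] salt) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            sha1.update(password.getBytes(StandardCharsets.UTF_8));
            sha1.update(salt);
            byte[] digest = sha1.digest();
            byte[] out = Arrays.copyOf(digest, digest.length + salt.length);
            System.arraycopy(salt, 0, out, digest.length, salt.length);
            return Base64.getEncoder().encodeToString(out);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 must be available in every JRE", e);
        }
    }

    // What the server does: take the salt after the 20-byte digest, recompute, compare.
    static boolean verify(String password, String sshaValue) {
        byte[] raw = Base64.getDecoder().decode(sshaValue);
        if (raw.length <= SHA1_LENGTH) {
            return false;
        }
        byte[] salt = Arrays.copyOfRange(raw, SHA1_LENGTH, raw.length);
        byte[] expected = Base64.getDecoder().decode(encode(password, salt));
        return MessageDigest.isEqual(raw, expected);
    }
}
```

The printed line, e.g. from SshaHtpasswd.line("spark", password), is then appended to /etc/nginx/.htpasswd.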
<div>
<br /></div>
<div>
The last step is to create the nginx configuration (the example only forwards all requests arriving on port 2020 to the Spark Master UI on port 8080):</div>
<div>
vi <span style="background-color: rgba(0 , 0 , 0 , 0.0470588); color: #3a3a3a; font-family: monospace , monospace; font-size: 14px; line-height: 21px; white-space: pre;">/etc/nginx/nginx2001.conf</span></div>
<div>
<br /></div>
<div>
<pre style="font-family: Courier New, Courier, monospace;">events {
    worker_connections 1000;
}

http {
    server {
        listen 2020;

        auth_basic "Private Beta";
        auth_basic_user_file /etc/nginx/.htpasswd;

        location / {
            proxy_pass http://localhost:8080;
        }
    }
}</pre>
</div>
<div>
<br /></div>
<div>
Actually, that's it. Now we just need to start nginx with this configuration:</div>
<div>
nginx -c <span style="background-color: rgba(0 , 0 , 0 , 0.0470588); color: #3a3a3a; font-family: monospace , monospace; font-size: 14px; line-height: 21px; white-space: pre;">/etc/nginx/nginx2001.conf</span></div>
<div>
<br /></div>
<div>
Then point a browser to HOST:2020: you will be asked to enter credentials, and only after that will you get to the Spark Master UI.</div>
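The browser builds the Authorization header itself. To check the protection from code (say, from a monitoring job), the same header can be sent explicitly. A minimal JDK-only sketch, with a hypothetical class name and HOST left as a placeholder:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Base64;

// Hypothetical client for the nginx-protected Spark UI.
final class ProtectedUiClient {

    private ProtectedUiClient() {
    }

    // What a browser sends after you type the credentials: "Basic " + Base64("user:password")
    static String basicAuthHeader(String user, String password) {
        String token = user + ":" + password;
        return "Basic " + Base64.getEncoder().encodeToString(token.getBytes(StandardCharsets.UTF_8));
    }

    // Returns the HTTP status code; nginx answers 401 when the credentials are rejected.
    static int status(String url, String user, String password) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestProperty("Authorization", basicAuthHeader(user, password));
            return conn.getResponseCode();
        } finally {
            conn.disconnect();
        }
    }
}
```

For example, ProtectedUiClient.status("http://HOST:2020/", "spark", "wrong-password") should give 401.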
<div>
<br /></div>
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com2tag:blogger.com,1999:blog-8382000407737271014.post-16450864422056787302015-10-06T00:20:00.002+03:002015-10-06T00:20:30.658+03:00Apache Zeppelin: impressionsNotebooks are attracting more and more attention from data analysts, data scientists and developers. Jupyter is a famous notebook created by the Python community and widely adopted among different users. At the same time, a new notebook was recently born: <b>Apache Zeppelin, with its main focus on integration with the Big Data technology stack</b>.<br />
<br />
In fact, Apache Zeppelin provides built-in integration with Apache Spark (and Spark SQL), Apache Flink, Hive, Ignite, Tajo (is anyone outside South Korea using that?), of course Markdown and HTML, and even AngularJS. That's the good part about Zeppelin. Also, <b>Ambari integration</b> makes it possible to install Zeppelin in "a couple of clicks" and access it through Ambari Views. And in practice it works very well:<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzM6Tw5WzwReKIkJ7vckFLiWvCvqayQAh6S9yYx_g573J0_fZk4leLBoNp3cM212Th77vj9ArRvR5oB9nNe9bQCjarW5snzNbjRf4jLYPPkG7w-qP1mmLOaX-o3KQEpxlfnMbqFpxGuy3j/s1600/Zeppelin2.PNG" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="364" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhzM6Tw5WzwReKIkJ7vckFLiWvCvqayQAh6S9yYx_g573J0_fZk4leLBoNp3cM212Th77vj9ArRvR5oB9nNe9bQCjarW5snzNbjRf4jLYPPkG7w-qP1mmLOaX-o3KQEpxlfnMbqFpxGuy3j/s640/Zeppelin2.PNG" width="640" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: left;">
<br /></div>
And now I'd like to focus on <b>what's wrong with Apache Zeppelin</b>:<br />
<br />
1) <b>Security</b>. Zeppelin 0.5 has no security. Anybody can open, view and edit any notebook. That doesn't work for enterprises; moreover, it doesn't even work for R&amp;D. I want protected notebooks, I want roles and groups, and I want to grant a notebook only to a specific group of people for a specific set of actions.<br />
2) <b>Workspace</b>. A one-level list of notebooks, really? That's awful. Guys, please add the possibility to organize them into folders (and nested folders); it's really important. Also, the only way to back up notebooks is to back up the underlying folders on the filesystem. Not very good; a UI button is required at least.<br />
3) <b>Security 2</b>. I've already written about notebook security, but data in storage must also be protected. Currently Zeppelin runs everything as the ZEPPELIN user, and I have to share data with the ZEPPELIN user, which is not what I want to do. So it makes sense for each notebook to provide a "run as" setting that specifies the user for this research. Enterprises really value that.<br />
<br />
Personally, I also tried to make it work on Docker (it more or less works) and on EMR (I failed, and as far as I know, so did everybody else).<br />
<br />
<b>To sum up</b>: Zeppelin is an interesting and promising product, but it has too many weaknesses to be seriously considered for production projects, especially in enterprises. So on my technology radar I definitely put Zeppelin into the "Be informed" section.Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com30tag:blogger.com,1999:blog-8382000407737271014.post-83279546376750037072015-07-13T20:09:00.000+03:002015-07-13T20:09:02.532+03:00How to waste the whole day with Spark Streaming and HBaseThe "funny" story of how to waste a whole day debugging simple cases... tips and tricks :)<br />
<br />
<b>A Spark Streaming application hangs</b> on an action, and nothing changes for hours.<br />
The long story: a custom Receiver accepts events from an external source and stores them in an RDD (actually, a DStream) for further processing. When I ran it, I noticed that the action hung! And what was really scary: the messages were being read from the source. After spending a couple of hours looking for a bug in the Receiver, I realized it worked fine and finally found the issue ... &lt;ruffle&gt; ... in how I ran the job!<br />
<br />
I tested it in a local environment first and then submitted it to YARN with:<br />
<span style="color: #38761d;">...</span><br />
<span style="color: #38761d;">--num-executors 2</span><br />
<span style="color: #38761d;">...</span><br />
In fact, it didn't work for me because none of the workers (Spark executors) was able to start! So, just by increasing the number of executors to <span style="color: #38761d;">3</span>, I was able to make everything work.<br />
<br />
<b>An HBase-related Spark Streaming application hangs</b>, and nothing changes for hours.<br />
Again, the long story: the Spark Streaming application hangs as soon as it touches HBase. I spent several hours (again) and was really surprised when I found the reason: the HBase connection was broken. OMG! I hadn't seen any errors or warnings related to the HBase connection in the logs... so what was going on? In fact, the HBase client tried to establish the connection again and again without throwing an error. Consider the following piece of code (grey is my original part, blue is the update that helped me overcome the issue):<br />
<span style="color: #666666;"> <span style="font-family: Courier New, Courier, monospace;">Configuration config = HBaseConfiguration.create();</span></span><br />
<span style="color: #666666; font-family: Courier New, Courier, monospace;"> config.set(HConstants.ZOOKEEPER_QUORUM, "host:port");</span><br />
<span style="color: #666666; font-family: Courier New, Courier, monospace;"> config.set(HConstants.ZOOKEEPER_ZNODE_PARENT, "/hbase");</span><br />
<span style="font-family: Courier New, Courier, monospace;"> <span style="color: #0b5394;"> config.set("hbase.client.retries.number", Integer.toString(3));</span></span><br />
<span style="color: #0b5394; font-family: Courier New, Courier, monospace;"> config.set("zookeeper.session.timeout", Integer.toString(60000));</span><br />
<span style="color: #0b5394; font-family: Courier New, Courier, monospace;"> config.set("zookeeper.recovery.retry", Integer.toString(0));</span><br />
<br />
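To get a feeling for why a large retry count looks like a hang, here is a rough estimate of the total time a client spends sleeping between retries. It is a sketch under explicit assumptions: the backoff multipliers are the ones I recall from HBase 1.x (HConstants.RETRY_BACKOFF), hbase.client.pause is the 100 ms default, and RPC/ZooKeeper timeouts and jitter are ignored, so the real wait is longer. The class name is hypothetical.

```java
// Rough estimate of HBase client retry sleeping time.
// ASSUMPTIONS: multipliers as recalled from HBase 1.x HConstants.RETRY_BACKOFF,
// pause = hbase.client.pause; timeouts and the small random jitter are ignored.
final class HBaseRetryEstimate {

    private static final int[] BACKOFF = {1, 2, 3, 5, 10, 20, 40, 100, 100, 100, 100, 200, 200};

    private HBaseRetryEstimate() {
    }

    static long totalPauseMillis(int retries, long pauseMillis) {
        long total = 0;
        for (int attempt = 0; attempt < retries; attempt++) {
            // past the end of the table, the last multiplier keeps being reused
            int index = Math.min(attempt, BACKOFF.length - 1);
            total += pauseMillis * BACKOFF[index];
        }
        return total;
    }
}
```

Under these assumptions, 35 retries add up to about 528 seconds (almost 9 minutes) of sleeping for a single operation, versus 0.6 seconds for 3 retries; with many operations in a row, an application can easily look frozen for hours.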
The updated configuration really helps because the number of retries is now limited. The default value is 35, and the silent retrying can definitely be confusing.Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-71832346165589055012015-06-09T15:44:00.002+03:002015-06-09T15:44:11.383+03:00How to present XML in Hive flat table after XSLT transformationLet's start by defining the task. Imagine that the dataset is a set of XML files and the requirement is to present some specific information from these files as a simple flat structure. Let's illustrate:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNbvTq6_Pbtgd9UxliAXQcg_lL5NjaTtbyACvHJuQoEl7wbwGo4NQfQoYqPIQ_UY9wyLCEgskEx0j9XkIKvWMf8sLknH3SI0gjG-MWUtqIcO5g6Bxg5ak04doxCb2ZUR5k2Nm6dWgr6A7y/s1600/HierToFlat.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="148" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjNbvTq6_Pbtgd9UxliAXQcg_lL5NjaTtbyACvHJuQoEl7wbwGo4NQfQoYqPIQ_UY9wyLCEgskEx0j9XkIKvWMf8sLknH3SI0gjG-MWUtqIcO5g6Bxg5ak04doxCb2ZUR5k2Nm6dWgr6A7y/s320/HierToFlat.png" width="320" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Definitely, we can use a SerDe for XML, but what if the XML structure is not defined beforehand and we want to give the end user a chance to control the parsing process? One possible solution is to incorporate XSLT to transform the XML into the desired format.<br />
<br />
<a name='more'></a><br /><br />
A bit later I will show how XSLT can be applied from a Hive query, but for now let's focus on XSLT itself.<br />
At a high level, the XSLT looks like this:<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibuIVeWyFgzOth2LYgUa-R9rQVmoBZGbK5rMhQYDg1MTL67KrX6zaxk-z2RJ3cXQ6wdVyWeqrtLgHaFwFHx7DGFqeCpOz0rse34D6NMzPCCDExyRS1rJCnt3gTWil3R7TSYYOCHQCM-JzY/s1600/XSLT.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="237" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEibuIVeWyFgzOth2LYgUa-R9rQVmoBZGbK5rMhQYDg1MTL67KrX6zaxk-z2RJ3cXQ6wdVyWeqrtLgHaFwFHx7DGFqeCpOz0rse34D6NMzPCCDExyRS1rJCnt3gTWil3R7TSYYOCHQCM-JzY/s400/XSLT.png" width="400" /></a></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<br /></div>
Let's store this XSLT in a transformation.xslt file.<br />
We are going to use Hive's TRANSFORM functionality. Groovy provides a really straightforward way to call an XSLT transformation, so it can be used to run the XSLT transformation from Hive. This blog post, <a href="http://www.pleus.net/blog/?p=1448" target="_blank">http://www.pleus.net/blog/?p=1448</a>, gives a great overview of how to do that. After that, we can store the Groovy script as run_transformation.groovy. Don't forget to pass the path to the XSLT file as an argument.<br />
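For readers who prefer to stay on plain Java instead of Groovy, the same TRANSFORM script can be sketched with the JDK's javax.xml.transform API (the class name is hypothetical, and this is not the Groovy approach from the post linked above). Hive streams each input row to the script's stdin as one line, with columns separated by tabs, and reads tab-separated output columns from stdout. So each XML document must fit on one line, and the XSLT should use method="text" and end every output row with a newline.

```java
import java.io.BufferedReader;
import java.io.File;
import java.io.IOException;
import java.io.InputStreamReader;
import java.io.StringReader;
import java.io.StringWriter;
import java.nio.charset.StandardCharsets;
import javax.xml.transform.Templates;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

// Sketch of a Hive TRANSFORM script: one XML document per stdin line in,
// whatever the (text-output) XSLT produces out on stdout.
final class XsltTransformScript {

    private XsltTransformScript() {
    }

    static Templates compile(String xsltText) throws TransformerException {
        return TransformerFactory.newInstance().newTemplates(new StreamSource(new StringReader(xsltText)));
    }

    static String transform(Templates xslt, String xml) throws TransformerException {
        StringWriter out = new StringWriter();
        xslt.newTransformer().transform(new StreamSource(new StringReader(xml)), new StreamResult(out));
        return out.toString();
    }

    // args[0] is the path to the XSLT file, e.g. transformation.xslt
    public static void main(String[] args) throws IOException, TransformerException {
        Templates xslt = TransformerFactory.newInstance().newTemplates(new StreamSource(new File(args[0])));
        BufferedReader in = new BufferedReader(new InputStreamReader(System.in, StandardCharsets.UTF_8));
        String line;
        while ((line = in.readLine()) != null) {
            System.out.print(transform(xslt, line));
        }
        System.out.flush();
    }
}
```

Assuming a JRE on every node, the compiled class could be shipped with add file and referenced as USING 'java XsltTransformScript transformation.xslt'.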
And the last step is to prepare an HQL file that contains the Hive script and runs the transformation on the cluster in distributed mode:
<br />
<br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: black;">add</span> <span style="color: black;">file</span> <span style="color: black;">$</span><span style="color: black; font-weight: bold;">{</span><span style="color: black;">hiveconf</span><span style="color: #ce5c00; font-weight: bold;">:</span><span style="color: black;">resources</span><span style="color: black; font-weight: bold;">}</span><span style="color: #ce5c00; font-weight: bold;">/</span><span style="color: black;">run_transformation</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">groovy</span><span style="color: black; font-weight: bold;">;</span>
<span style="color: black;">add</span> <span style="color: black;">file</span> <span style="color: black;">$</span><span style="color: black; font-weight: bold;">{</span><span style="color: black;">hiveconf</span><span style="color: #ce5c00; font-weight: bold;">:</span><span style="color: black;">resources</span><span style="color: black; font-weight: bold;">}</span><span style="color: #ce5c00; font-weight: bold;">/</span><span style="color: black;">transformation</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">xslt</span><span style="color: black; font-weight: bold;">;</span>
<span style="color: black;">set</span> <span style="color: black;">hive</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">execution</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">engine</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: black;">tez</span><span style="color: black; font-weight: bold;">;</span>
<span style="color: black;">set</span> <span style="color: black;">hive</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">merge</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">mapfiles</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: #204a87; font-weight: bold;">false</span><span style="color: black; font-weight: bold;">;</span>
<span style="color: black;">set</span> <span style="color: black;">hive</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">input</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">format</span><span style="color: #ce5c00; font-weight: bold;">=</span><span style="color: black;">org</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">apache</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">hadoop</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">hive</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">ql</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">io</span><span style="color: black; font-weight: bold;">.</span><span style="color: black;">HiveInputFormat</span><span style="color: black; font-weight: bold;">;</span>
<span style="color: black;">select</span> <span style="color: black;">TRANSFORM</span><span style="color: black; font-weight: bold;">(</span><span style="color: black;">content</span><span style="color: black; font-weight: bold;">)</span>
<span style="color: black;">USING</span> <span style="color: #4e9a06;">'groovy run_transformation.groovy transformation.xslt'</span> <span style="color: black;">as</span> <span style="color: black; font-weight: bold;">(</span><span style="color: black;">A2</span><span style="color: black; font-weight: bold;">,</span><span style="color: black;">E</span><span style="color: black; font-weight: bold;">,</span><span style="color: black;">TAG</span><span style="color: black; font-weight: bold;">)</span>
<span style="color: black;">from</span> <span style="color: black;">$</span><span style="color: black; font-weight: bold;">{</span><span style="color: black;">hiveconf</span><span style="color: #ce5c00; font-weight: bold;">:</span><span style="color: black;">schemaName</span><span style="color: black; font-weight: bold;">}.</span><span style="color: black;">$</span><span style="color: black; font-weight: bold;">{</span><span style="color: black;">hiveconf</span><span style="color: #ce5c00; font-weight: bold;">:</span><span style="color: black;">tableName</span><span style="color: black; font-weight: bold;">};</span>
</pre>
</div>
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-23943599114997841132015-01-30T21:48:00.000+02:002015-01-30T21:48:25.517+02:00Demystify BloomFilter on HadoopI believe most of you have seen the <a href="https://hadoop.apache.org/docs/r2.3.0/api/org/apache/hadoop/util/bloom/BloomFilter.html">BloomFilter</a> class. But how do you use it correctly?<br />
<br />
According to Wikipedia, "<span style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;">A </span><b style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;"><a href="http://en.wikipedia.org/wiki/Bloom_filter" target="_blank">Bloom filter</a></b><span style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;"> is a space-efficient </span><a class="mw-redirect" href="http://en.wikipedia.org/wiki/Probabilistic" style="background: none rgb(255, 255, 255); color: #0b0080; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px; text-decoration: none;" title="Probabilistic">probabilistic</a><span style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;"> </span><a href="http://en.wikipedia.org/wiki/Data_structure" style="background: none rgb(255, 255, 255); color: #0b0080; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px; text-decoration: none;" title="Data structure">data structure</a><span style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;">, conceived by Burton Howard Bloom in 1970, that is used to test whether an </span><a href="http://en.wikipedia.org/wiki/Element_(mathematics)" style="background: none rgb(255, 255, 255); color: #0b0080; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px; text-decoration: none;" title="Element (mathematics)">element</a><span style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;"> is a member of a </span><a class="mw-redirect" href="http://en.wikipedia.org/wiki/Set_(computer_science)" style="background: none rgb(255, 255, 255); color: #0b0080; font-family: sans-serif; 
font-size: 14px; line-height: 22.3999996185303px; text-decoration: none;" title="Set (computer science)">set</a><span style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;">. </span><a href="http://en.wikipedia.org/wiki/Type_I_and_type_II_errors" style="background: none rgb(255, 255, 255); color: #0b0080; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px; text-decoration: none;" title="Type I and type II errors">False positive</a><span style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;"> matches are possible, but </span><a href="http://en.wikipedia.org/wiki/Type_I_and_type_II_errors" style="background: none rgb(255, 255, 255); color: #0b0080; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px; text-decoration: none;" title="Type I and type II errors">false negatives</a><span style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;"> are not, thus a Bloom filter has a 100% </span><a href="http://en.wikipedia.org/wiki/Precision_and_recall" style="background: none rgb(255, 255, 255); color: #0b0080; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px; text-decoration: none;" title="Precision and recall">recall</a><span style="background-color: white; color: #252525; font-family: sans-serif; font-size: 14px; line-height: 22.3999996185303px;"> rate. In other words, a query returns either "possibly in set" or "definitely not in set".</span>"<br />
<br />
I also found this site, which gives a very good description of Bloom filters with excellent visualizations; please <a href="http://billmill.org/bloomfilter-tutorial/" target="_blank">check it out</a><br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://www.tunersports.com/images/products/1337293023-blox-air-filter.jpg" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://www.tunersports.com/images/products/1337293023-blox-air-filter.jpg" height="200" width="200" /></a></div>
<br />
<br />
As the definition makes clear, this data structure is very helpful when we need to filter records. A typical example is a join. We can turn the small dataset into a filter, then apply that filter in the map stage of a second MapReduce job, which performs the actual join. In other words, there are two MapReduce jobs: the first builds the filter, and the second filters records in the map stage and joins them in the reduce stage.<br />
<br />
The first MapReduce job has two stages, a mapper and a reducer, because the result must be exactly one Bloom filter object:<br />
<br />
<ol>
<li>initialize a BloomFilter object as a Mapper class member: <i>BloomFilter filter = new BloomFilter(10000, 10, Hash.MURMUR_HASH);</i></li>
<li>add each record to the filter: <i>filter.add(new Key(str.getBytes()));</i></li>
<li>emit data only in the cleanup method; for example, you can write the filter to a file directly, without using the context at all</li>
</ol>
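The steps above leave one partial filter per mapper, which is why the first job also needs a reducer. Bloom filters built with the same size and the same hash functions can be merged with a bitwise OR. The result is exactly the filter of the combined input. Hadoop's Filter class exposes an or(Filter) operation for this. Below is a stdlib-only Java sketch of that flow. The methods mapSplit and reduce are hypothetical stand-ins for the Mapper and Reducer. The hashing scheme is an assumption (FNV-based, where Hadoop would use Murmur), and the 10,000-bit / 10-hash parameters simply mirror the constructor call above.<br />

```java
import java.nio.charset.StandardCharsets;
import java.util.BitSet;
import java.util.List;

// Hypothetical simulation of the filter-building job: mappers build partial filters, one reducer ORs them.
class BloomFilterBuildJob {
    static final int NUM_BITS = 10_000; // mirrors vectorSize = 10000 above
    static final int NUM_HASHES = 10;   // mirrors nbHash = 10 above

    // Double hashing over two FNV-1a hashes (an assumption; not Hadoop's Murmur hashing).
    static int[] positions(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, h1);
        int[] result = new int[NUM_HASHES];
        for (int i = 0; i < NUM_HASHES; i++) result[i] = Math.floorMod(h1 + i * h2, NUM_BITS);
        return result;
    }

    static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // "Mapper": one filter per input split (step 1), every record added (step 2),
    // returned once at the end, as it would be emitted from cleanup() (step 3).
    static BitSet mapSplit(List<String> split) {
        BitSet filter = new BitSet(NUM_BITS);
        for (String record : split) {
            for (int p : positions(record)) filter.set(p);
        }
        return filter;
    }

    // "Reducer": OR-ing the partial filters yields one filter for the whole dataset.
    static BitSet reduce(List<BitSet> partials) {
        BitSet merged = new BitSet(NUM_BITS);
        for (BitSet partial : partials) merged.or(partial);
        return merged;
    }

    static boolean mightContain(BitSet filter, String key) {
        for (int p : positions(key)) {
            if (!filter.get(p)) return false;
        }
        return true;
    }
}
```

Because OR only unions set bits, merging the partial filters gives the same bit array as building one filter over all records in a single pass.<br />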
<br />
Your filter is now ready; it can be deserialized anywhere and used to filter data.<br />
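Once the single filter exists, the second job only needs its bytes. The stdlib-only sketch below works in four steps:
<ul>
<li>it serializes a BitSet-based filter with BitSet.toByteArray();</li>
<li>it restores the filter with BitSet.valueOf(byte[]), as a map task would after reading the side file;</li>
<li>it drops big-dataset records whose key is definitely not in the small dataset;</li>
<li>it performs an exact lookup, standing in for the reduce-side join, which also discards any false positives.</li>
</ul>
The class and method names are hypothetical, and the hashing scheme is an assumption. In Hadoop, the filter is a Writable that would be written to and read from HDFS instead.<br />

```java
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.BitSet;
import java.util.Collection;
import java.util.List;
import java.util.Map;

// Hypothetical simulation of the second job: Bloom-filter the big dataset in the map stage, then join.
class BloomFilteredJoin {
    static final int NUM_BITS = 10_000;
    static final int NUM_HASHES = 10;

    static int[] positions(String key) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        int h1 = fnv1a(data, 0x811C9DC5);
        int h2 = fnv1a(data, h1);
        int[] result = new int[NUM_HASHES];
        for (int i = 0; i < NUM_HASHES; i++) result[i] = Math.floorMod(h1 + i * h2, NUM_BITS);
        return result;
    }

    static int fnv1a(byte[] data, int seed) {
        int h = seed;
        for (byte b : data) {
            h ^= (b & 0xff);
            h *= 0x01000193;
        }
        return h;
    }

    // Output of the first job, serialized to bytes (e.g. for a side file).
    static byte[] buildSerializedFilter(Collection<String> smallKeys) {
        BitSet bits = new BitSet(NUM_BITS);
        for (String key : smallKeys) {
            for (int p : positions(key)) bits.set(p);
        }
        return bits.toByteArray();
    }

    static boolean mightContain(BitSet filter, String key) {
        for (int p : positions(key)) {
            if (!filter.get(p)) return false;
        }
        return true;
    }

    // big: records as {key, value}; small: key -> value. Returns "key,smallValue,bigValue" lines.
    static List<String> join(Map<String, String> small, List<String[]> big, byte[] serializedFilter) {
        BitSet filter = BitSet.valueOf(serializedFilter); // deserialize, e.g. once per map task
        List<String> joined = new ArrayList<>();
        for (String[] record : big) {
            // Map stage: skip records whose key is definitely not in the small dataset.
            if (!mightContain(filter, record[0])) continue;
            // Join stage: the exact lookup removes any remaining false positives.
            String smallValue = small.get(record[0]);
            if (smallValue != null) joined.add(record[0] + "," + smallValue + "," + record[1]);
        }
        return joined;
    }
}
```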
<br />
<br />Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-23187140318852339972015-01-23T14:39:00.003+02:002015-01-23T14:43:59.807+02:00Composite join with MapReduceAs everyone knows, map-side join is the most effective technique for joining datasets on Hadoop. However, it only lets you join ONE BIG dataset with ONE OR MORE SMALL datasets. This is a limitation, because sometimes you need to join TWO HUGE datasets. Typically, that is the use case for a reduce-side join, but it produces a Cartesian product of the records for each key, and we would obviously like to avoid such a heavy operation.<br />
<br />
This is the time for the <b>Composite join: map-side join on huge datasets</b>. In this case, however, both datasets must meet several requirements:<br />
<br />
<ol>
<li>The datasets are all sorted by the join key</li>
<li>Each dataset has the same number of files (you can achieve that by using the same number of reducers in the jobs that produce them)</li>
<li>For each join key K, the records with that key are in file N of every dataset (the same N everywhere)</li>
<li>The files are not splittable</li>
</ol>
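To see why these requirements matter, here is a stdlib-only Java sketch of what happens conceptually. Both datasets are split into the same number of files with the same hash partitioning. The formula follows the shape of Hadoop's default HashPartitioner, applied here to java.lang.String keys rather than Text. Each file is then sorted by key. Finally, file N of A is merged with file N of B in a single pass, with no shuffle. This illustrates the sort-merge idea only; it is not the CompositeInputFormat implementation, and all names are hypothetical.<br />

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical illustration of a partitioned sort-merge join; records are {key, value} pairs.
class CompositeJoinSketch {
    // Same formula shape as Hadoop's HashPartitioner: non-negative hash modulo the number of reducers.
    static int partition(String key, int numPartitions) {
        return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Requirements 1-3: the same number of "files", same key -> same file index, each file sorted by key.
    static List<List<String[]>> layout(List<String[]> records, int numPartitions) {
        List<List<String[]>> files = new ArrayList<>();
        for (int i = 0; i < numPartitions; i++) files.add(new ArrayList<>());
        for (String[] r : records) files.get(partition(r[0], numPartitions)).add(r);
        for (List<String[]> f : files) f.sort(Comparator.comparing((String[] r) -> r[0]));
        return files;
    }

    // What one map task can do with file N of A and file N of B: a single-pass inner merge join.
    static List<String> mergeJoin(List<String[]> a, List<String[]> b) {
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.size() && j < b.size()) {
            int cmp = a.get(i)[0].compareTo(b.get(j)[0]);
            if (cmp < 0) {
                i++;
            } else if (cmp > 0) {
                j++;
            } else {
                String key = a.get(i)[0];
                // Find the run of equal keys on each side and emit their combinations.
                int iEnd = i, jEnd = j;
                while (iEnd < a.size() && a.get(iEnd)[0].equals(key)) iEnd++;
                while (jEnd < b.size() && b.get(jEnd)[0].equals(key)) jEnd++;
                for (int x = i; x < iEnd; x++) {
                    for (int y = j; y < jEnd; y++) out.add(key + ":" + a.get(x)[1] + "," + b.get(y)[1]);
                }
                i = iEnd;
                j = jEnd;
            }
        }
        return out;
    }

    static List<String> join(List<String[]> a, List<String[]> b, int numPartitions) {
        List<List<String[]>> filesA = layout(a, numPartitions);
        List<List<String[]>> filesB = layout(b, numPartitions);
        List<String> out = new ArrayList<>();
        for (int n = 0; n < numPartitions; n++) out.addAll(mergeJoin(filesA.get(n), filesB.get(n)));
        return out;
    }
}
```

Because a given key always lands in the same file index on both sides and both files are sorted, each pair of files can be joined independently. This is why the join can run entirely in the map stage.<br />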
<div>
Under these conditions, a map task can join file N of dataset A with file N of dataset B. The Hadoop API provides <a href="https://hadoop.apache.org/docs/current/api/org/apache/hadoop/mapred/join/CompositeInputFormat.html#compose(java.lang.String, java.lang.Class, org.apache.hadoop.fs.Path...)" target="_blank">CompositeInputFormat</a> for exactly this. Example of usage:</div>
<div>
<br /></div>
<!-- HTML generated using hilite.me --><br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #8f5902; font-style: italic;">// in job configuration you have to set</span>
<span style="color: black;">job</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">setInputFormatClass</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">CompositeInputFormat</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">class</span><span style="color: #ce5c00; font-weight: bold;">);</span>
<span style="color: #8f5902; font-style: italic;">// inner - reference to inner join (you can specify outer as well)</span>
<span style="color: #8f5902; font-style: italic;">// d1, d2 - Path to both datasets</span>
<span style="color: black;">job</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">getConfiguration</span><span style="color: #ce5c00; font-weight: bold;">().</span><span style="color: #c4a000;">set</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">CompositeInputFormat</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">JOIN_EXPR</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">CompositeInputFormat</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">compose</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: #4e9a06;">"inner"</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">KeyValueTextInputFormat</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">class</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">d1</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">d2</span><span style="color: #ce5c00; font-weight: bold;">));</span>
<span style="color: black;">job</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">setNumReduceTasks</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #ce5c00; font-weight: bold;">);</span>
</pre>
</div>
<div>
<br /></div>
<br />
<br />
The mapper will receive key-value pairs of type Text, TupleWritable:<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #5c35cc; font-weight: bold;">@Override</span>
<span style="color: #204a87; font-weight: bold;">public</span> <span style="color: #204a87; font-weight: bold;">void</span> <span style="color: black;">map</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">Text</span> <span style="color: black;">key</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">TupleWritable</span> <span style="color: black;">value</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">Context</span> <span style="color: black;">ctx</span><span style="color: #ce5c00; font-weight: bold;">)</span> <span style="color: #204a87; font-weight: bold;">throws</span> <span style="color: black;">IOException</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">InterruptedException</span> <span style="color: #ce5c00; font-weight: bold;">{</span>
<span style="color: #8f5902; font-style: italic;">// value.get(0) is the record from d1, value.get(1) is the record from d2</span>
<span style="color: #ce5c00; font-weight: bold;">...</span>
<span style="color: #ce5c00; font-weight: bold;">}</span>
</pre>
</div>
Bonus: you can use this powerful feature with Hive! <b>Composite join in Hive</b>: to do that, the following Hive properties must be set:<br />
hive.input.format=org.apache.hadoop.hive.ql.io.BucketizedHiveInputFormat;<br />
hive.optimize.bucketmapjoin=true;<br />
hive.optimize.bucketmapjoin.sortedmerge=true;<br />
<br />
<br />
Of course, this requires both tables to be sorted by the join key and bucketed into the same number of buckets.Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-15773911351446149192015-01-23T10:02:00.002+02:002015-01-23T10:02:38.548+02:00Kafka web console with DockerMy first Dockerfile runs Kafka Web Console (an application for monitoring Apache Kafka):<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: black;">FROM ubuntu:trusty</span>
<span style="color: black;">RUN apt-get update &amp;&amp; apt-get install -y unzip openjdk-7-jdk wget git docker.io</span>
<span style="color: black;">RUN wget http://downloads.typesafe.com/play/2.2.6/play-2.2.6.zip</span>
<span style="color: black;">RUN unzip play-2.2.6.zip -d /tmp</span>
<span style="color: black;">RUN wget https://github.com/claudemamo/kafka-web-console/archive/master.zip</span>
<span style="color: black;">RUN unzip master.zip -d /tmp</span>
<span style="color: black;">WORKDIR /tmp/kafka-web-console-master</span>
<span style="color: black;">CMD ../play-2.2.6/play "start -DapplyEvolutions.default=true"</span>
</pre>
</div>
<br />
<br />
The image can be built with:<br />
<i>docker build -t kafka/web-console:2.0 .
</i><br />
and run as:<br />
<i>docker run -i -t -p 9000:9000 kafka/web-console:2.0
</i><br />
<br />
Kafka Web Console will then be available at host:9000. Add your ZooKeeper hosts there, and the Kafka brokers will be discovered automatically.Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-61170881113907258002014-11-11T16:14:00.000+02:002014-11-11T16:14:07.434+02:00Spark and Location Sensitive Hashing, part 2This is the second part of the topic on Locality Sensitive Hashing; here is a working example built with Apache Spark.<br />
<br />
Let's start with the task definition: there are two datasets, bank accounts and website visitors. The only field they have in common is the name, and names may be misspelled. Consider the following example:<br />
<br />
<i>Bank Accounts</i><br />
<br />
<table border="1">
<tbody>
<tr><td><b>Name</b></td> <td>Tom Soyer</td> <td>Andy Bin</td> <td>Tom Wiscor</td> <td>Tomas Soyér</td> </tr>
<tr><td><b>Credit score</b></td> <td>10</td> <td>20</td> <td>30</td> <td>40</td> </tr>
</tbody></table>
<br />
<i>Web-site Visitors</i><br />
<br />
<table border="1">
<tbody>
<tr><td><b>Name</b></td> <td>Tom Soyer</td> <td>Andrew Bin</td> <td>Tom Viscor</td> <td>Thomas Soyer</td> </tr>
<tr><td><b>email</b></td> <td>1@1</td> <td>2@1</td> <td>3@1</td> <td>2@2</td> </tr>
</tbody></table>
<br />
<div>
<a name='more'></a>We have to join these two datasets by name. Since names may be misspelled, I will use the edit (Levenshtein) distance to find the most similar names within a bucket. The distance function is the following:</div>
<div>
<br /></div>
<!-- HTML generated using hilite.me --><br />
<div style="background: #272822; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #66d9ef;">private</span> <span style="color: #66d9ef;">def</span> <span style="color: #f8f8f2;">stringDistance</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">s1</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">String</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">s2</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">String</span><span style="color: #f92672;">)</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">Int</span> <span style="color: #f92672;">=</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">def</span> <span style="color: #f8f8f2;">min</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">a</span><span style="color: #66d9ef;">:Int</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">b</span><span style="color: #66d9ef;">:Int</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">c</span><span style="color: #66d9ef;">:Int</span><span style="color: #f92672;">)</span> <span style="color: #66d9ef;">=</span> <span style="color: #a6e22e;">Math</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">min</span><span style="color: #f92672;">(</span> <span style="color: #a6e22e;">Math</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">min</span><span style="color: #f92672;">(</span> <span style="color: #f8f8f2;">a</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">b</span> <span style="color: #f92672;">),</span> <span style="color: #f8f8f2;">c</span><span style="color: #f92672;">)</span>
<span style="color: #66d9ef;">def</span> <span style="color: #f8f8f2;">sd</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">s1</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">List</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">Char</span><span style="color: #f92672;">],</span> <span style="color: #f8f8f2;">s2</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">List</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">Char</span><span style="color: #f92672;">])</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">Int</span> <span style="color: #f92672;">=</span> <span style="color: #f92672;">(</span><span style="color: #f8f8f2;">s1</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">s2</span><span style="color: #f92672;">)</span> <span style="color: #66d9ef;">match</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">case</span> <span style="color: #f92672;">(</span><span style="color: #66d9ef;">_</span><span style="color: #f92672;">,</span> <span style="color: #a6e22e;">Nil</span><span style="color: #f92672;">)</span> <span style="color: #66d9ef;">=></span> <span style="color: #f8f8f2;">s1</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">length</span>
<span style="color: #66d9ef;">case</span> <span style="color: #f92672;">(</span><span style="color: #a6e22e;">Nil</span><span style="color: #f92672;">,</span> <span style="color: #66d9ef;">_</span><span style="color: #f92672;">)</span> <span style="color: #66d9ef;">=></span> <span style="color: #f8f8f2;">s2</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">length</span>
<span style="color: #66d9ef;">case</span> <span style="color: #f92672;">(</span><span style="color: #f8f8f2;">c1</span><span style="color: #f92672;">::</span><span style="color: #f8f8f2;">t1</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">c2</span><span style="color: #f92672;">::</span><span style="color: #f8f8f2;">t2</span><span style="color: #f92672;">)</span> <span style="color: #66d9ef;">=></span> <span style="color: #f8f8f2;">min</span><span style="color: #f92672;">(</span> <span style="color: #f8f8f2;">sd</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">t1</span><span style="color: #f92672;">,</span><span style="color: #f8f8f2;">s2</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">+</span> <span style="color: #ae81ff;">1</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">sd</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">s1</span><span style="color: #f92672;">,</span><span style="color: #f8f8f2;">t2</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">+</span> <span style="color: #ae81ff;">1</span><span style="color: #f92672;">,</span>
<span style="color: #f8f8f2;">sd</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">t1</span><span style="color: #f92672;">,</span><span style="color: #f8f8f2;">t2</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">+</span> <span style="color: #f92672;">(</span><span style="color: #66d9ef;">if</span> <span style="color: #f92672;">(</span><span style="color: #f8f8f2;">c1</span><span style="color: #f92672;">==</span><span style="color: #f8f8f2;">c2</span><span style="color: #f92672;">)</span> <span style="color: #ae81ff;">0</span> <span style="color: #66d9ef;">else</span> <span style="color: #ae81ff;">1</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">)</span>
<span style="color: #f92672;">}</span>
<span style="color: #f8f8f2;">sd</span><span style="color: #f92672;">(</span> <span style="color: #f8f8f2;">s1</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">toList</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">s2</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">toList</span> <span style="color: #f92672;">)</span>
<span style="color: #f92672;">}</span>
</pre>
</div>
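The recursive function above computes the Levenshtein (edit) distance. However, it recomputes the same pairs of suffixes over and over, so its running time grows exponentially with the length of the names. The usual dynamic-programming formulation avoids that. Here is a Java sketch of it that keeps only two rows of the table; the class name EditDistance is hypothetical.<br />

```java
// Hypothetical helper: iterative Levenshtein distance, O(|s1| * |s2|) time, O(|s2|) extra space.
class EditDistance {
    static int distance(String s1, String s2) {
        int[] prev = new int[s2.length() + 1];
        int[] curr = new int[s2.length() + 1];
        // Distance from the empty prefix of s1 to each prefix of s2 is that prefix's length.
        for (int j = 0; j <= s2.length(); j++) prev[j] = j;
        for (int i = 1; i <= s1.length(); i++) {
            curr[0] = i; // deleting all i characters of s1's prefix
            for (int j = 1; j <= s2.length(); j++) {
                int cost = s1.charAt(i - 1) == s2.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(prev[j] + 1,  // deletion
                                            curr[j - 1] + 1), // insertion
                                   prev[j - 1] + cost);       // substitution or match
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[s2.length()]; // after the final swap, prev holds the last computed row
    }
}
```

For the sample data, distance("Tom Wiscor", "Tom Viscor") is 1 and distance("Tom Soyer", "Thomas Soyer") is 3.<br />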
<br />
<br />
The second thing to do is to define the seeded hash function from which the whole family of hash functions for the data will be built:<br />
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #272822; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #75715e;">/**</span>
<span style="color: #75715e;"> * Fowler–Noll–Vo (FNV) hash function</span>
<span style="color: #75715e;"> * @see http://en.wikipedia.org/wiki/Fowler%E2%80%93Noll%E2%80%93Vo_hash_function</span>
<span style="color: #75715e;"> */</span>
<span style="color: #66d9ef;">private</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">impl</span><span style="color: #f92672;">]</span> <span style="color: #66d9ef;">def</span> <span style="color: #a6e22e;">LshHash</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">T</span><span style="color: #f92672;">](</span><span style="color: #f8f8f2;">seedOne</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">Int</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">seedTwo</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">Int</span><span style="color: #f92672;">)(</span><span style="color: #f8f8f2;">input</span><span style="color: #66d9ef;">:T</span><span style="color: #f92672;">)</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">Int</span> <span style="color: #f92672;">=</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">var</span> <span style="color: #f8f8f2;">hash</span> <span style="color: #66d9ef;">=</span> <span style="color: #ae81ff;">2166136261L</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">toInt</span> <span style="color: #75715e;">// offset_basis for FNV-1</span>
<span style="color: #f8f8f2;">hash</span> <span style="color: #66d9ef;">=</span> <span style="color: #f8f8f2;">hash</span><span style="color: #f92672;">*</span><span style="color: #ae81ff;">16777619</span> <span style="color: #f92672;">^</span> <span style="color: #f8f8f2;">seedOne</span> <span style="color: #75715e;">// FNV_prime</span>
<span style="color: #f8f8f2;">hash</span> <span style="color: #66d9ef;">=</span> <span style="color: #f8f8f2;">hash</span><span style="color: #f92672;">*</span><span style="color: #ae81ff;">16777619</span> <span style="color: #f92672;">^</span> <span style="color: #f8f8f2;">seedTwo</span>
<span style="color: #f8f8f2;">hash</span> <span style="color: #66d9ef;">=</span> <span style="color: #f8f8f2;">hash</span><span style="color: #f92672;">*</span><span style="color: #ae81ff;">16777619</span> <span style="color: #f92672;">^</span> <span style="color: #f8f8f2;">input</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">hashCode</span><span style="color: #f92672;">()</span>
<span style="color: #66d9ef;">return</span> <span style="color: #f8f8f2;">hash</span>
<span style="color: #f92672;">}</span>
</pre>
</div>
<br />
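For readers following along in Java, here is a sketch of the same seeded hash. It mirrors LshHash step by step. It starts from the 32-bit FNV offset basis, then three times multiplies by the FNV prime and XORs in the next value: seedOne, seedTwo, and input.hashCode(). It is an FNV-style mix of three 32-bit values, not textbook byte-wise FNV-1. Java and Scala both wrap on Int overflow and give * higher precedence than ^, so for String inputs it should return the same values as the Scala code. The class name SeededFnvHash is hypothetical.<br />

```java
// Hypothetical Java counterpart of LshHash.
class SeededFnvHash {
    static final int FNV_OFFSET_BASIS = (int) 2166136261L; // 0x811C9DC5 as a signed int
    static final int FNV_PRIME = 16777619;

    static int lshHash(int seedOne, int seedTwo, Object input) {
        int hash = FNV_OFFSET_BASIS;
        hash = hash * FNV_PRIME ^ seedOne; // '*' binds tighter than '^', as in the Scala code
        hash = hash * FNV_PRIME ^ seedTwo;
        hash = hash * FNV_PRIME ^ input.hashCode();
        return hash;
    }
}
```

Each step (multiplying by an odd prime modulo 2^32, then XOR with a fixed value) is a bijection. As a result, two different seedOne values can never produce the same final hash for the same seedTwo and input.<br />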
Now we create a set of these functions:
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #272822; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #66d9ef;">private</span> <span style="color: #66d9ef;">val</span> <span style="color: #f8f8f2;">minHashFuns</span> <span style="color: #66d9ef;">=</span> <span style="color: #66d9ef;">new</span> <span style="color: #f8f8f2;">mutable</span><span style="color: #f92672;">.</span><span style="color: #a6e22e;">ArrayBuffer</span><span style="color: #f92672;">[</span> <span style="color: #f92672;">(</span><span style="color: #66d9ef;">Any</span><span style="color: #f92672;">)</span> <span style="color: #66d9ef;">=></span> <span style="color: #66d9ef;">Int</span> <span style="color: #f92672;">]()</span> <span style="color: #75715e;">// array of minhash functions that were initialized with basic Seed values</span>
<span style="color: #a6e22e;">@transient</span> <span style="color: #66d9ef;">private</span> <span style="color: #66d9ef;">val</span> <span style="color: #f8f8f2;">rnd</span> <span style="color: #66d9ef;">=</span> <span style="color: #66d9ef;">new</span> <span style="color: #a6e22e;">Random</span><span style="color: #f92672;">(</span><span style="color: #ae81ff;">2014</span><span style="color: #f92672;">)</span> <span style="color: #75715e;">// the same seed is required to generate the same sequence on different machines</span>
<span style="color: #66d9ef;">private</span> <span style="color: #66d9ef;">def</span> <span style="color: #f8f8f2;">populateMinHashes</span><span style="color: #f92672;">()</span> <span style="color: #66d9ef;">=</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">for</span><span style="color: #f92672;">(</span> <span style="color: #f8f8f2;">i</span> <span style="color: #66d9ef;">&lt;-</span> <span style="color: #ae81ff;">1</span> <span style="color: #f8f8f2;">to</span> <span style="color: #f8f8f2;">signatureSize</span><span style="color: #f92672;">*</span><span style="color: #f8f8f2;">signatureGroups</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">{</span>
<span style="color: #f8f8f2;">minHashFuns</span> <span style="color: #f92672;">+=</span> <span style="color: #f92672;">(</span> <span style="color: #a6e22e;">LshHash</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">rnd</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">nextInt</span><span style="color: #f92672;">(),</span> <span style="color: #f8f8f2;">rnd</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">nextInt</span><span style="color: #f92672;">())</span> <span style="color: #f92672;">)</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">}</span>
</pre>
</div>
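Here is the same idea in Java. This sketch builds signatureSize * signatureGroups hash functions. Each one is closed over two seeds drawn from a Random with a fixed seed, so every machine derives an identical family. scala.util.Random wraps java.util.Random, so with seed 2014 this should draw the same seeds in the same order as the Scala code. The names MinHashFamily and populate are hypothetical, and the embedded lshHash is a copy of the seeded FNV-style mix above.<br />

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;
import java.util.function.ToIntFunction;

// Hypothetical Java counterpart of minHashFuns / populateMinHashes.
class MinHashFamily {
    // Same seeded FNV-style mix as LshHash.
    static int lshHash(int seedOne, int seedTwo, Object input) {
        int hash = (int) 2166136261L;
        hash = hash * 16777619 ^ seedOne;
        hash = hash * 16777619 ^ seedTwo;
        hash = hash * 16777619 ^ input.hashCode();
        return hash;
    }

    static List<ToIntFunction<Object>> populate(int signatureSize, int signatureGroups, long seed) {
        Random rnd = new Random(seed); // fixed seed: the same family on every machine
        List<ToIntFunction<Object>> funs = new ArrayList<>();
        for (int i = 0; i < signatureSize * signatureGroups; i++) {
            int seedOne = rnd.nextInt(); // drawn in the same order as LshHash(rnd.nextInt(), rnd.nextInt())
            int seedTwo = rnd.nextInt();
            funs.add(input -> lshHash(seedOne, seedTwo, input));
        }
        return funs;
    }
}
```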
<br />
And here is how we apply the minhashes:
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #272822; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #66d9ef;">private</span> <span style="color: #66d9ef;">def</span> <span style="color: #f8f8f2;">applyMinHashed</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">T</span> <span style="color: #66d9ef;">&lt;:</span> <span style="color: #66d9ef;">NGramEnabled</span><span style="color: #f92672;">](</span><span style="color: #f8f8f2;">rdd</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">RDD</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">T</span><span style="color: #f92672;">])</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">RDD</span><span style="color: #f92672;">[(</span><span style="color: #66d9ef;">String</span>, <span style="color: #66d9ef;">T</span><span style="color: #f92672;">)]</span> <span style="color: #66d9ef;">=</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">return</span> <span style="color: #f8f8f2;">rdd</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">flatMap</span> <span style="color: #f92672;">{</span>
<span style="color: #f8f8f2;">e</span> <span style="color: #66d9ef;">=></span>
<span style="color: #f92672;">(</span><span style="color: #ae81ff;">0</span> <span style="color: #f8f8f2;">until</span> <span style="color: #f8f8f2;">signatureGroups</span><span style="color: #f92672;">).</span><span style="color: #f8f8f2;">by</span><span style="color: #f92672;">(</span><span style="color: #ae81ff;">1</span><span style="color: #f92672;">).</span><span style="color: #f8f8f2;">map</span> <span style="color: #f92672;">{</span>
<span style="color: #f8f8f2;">i</span> <span style="color: #66d9ef;">=></span> <span style="color: #a6e22e;">Array</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">getMinHashSignatureAsStr</span><span style="color: #f92672;">(</span><span style="color: #a6e22e;">NGrams</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">getNGramms</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">e</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">getStringForNGram</span><span style="color: #f92672;">()),</span> <span style="color: #f8f8f2;">i</span><span style="color: #f92672;">),</span> <span style="color: #f8f8f2;">e</span><span style="color: #f92672;">)</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">}.</span><span style="color: #f8f8f2;">map</span><span style="color: #f92672;">{</span>
<span style="color: #f8f8f2;">x</span> <span style="color: #66d9ef;">=></span>
<span style="color: #f92672;">(</span><span style="color: #f8f8f2;">x</span><span style="color: #f92672;">(</span><span style="color: #ae81ff;">0</span><span style="color: #f92672;">).</span><span style="color: #f8f8f2;">asInstanceOf</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">String</span><span style="color: #f92672;">],</span> <span style="color: #f8f8f2;">x</span><span style="color: #f92672;">(</span><span style="color: #ae81ff;">1</span><span style="color: #f92672;">).</span><span style="color: #f8f8f2;">asInstanceOf</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">T</span><span style="color: #f92672;">])</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">}</span>
<span style="color: #66d9ef;">private</span> <span style="color: #66d9ef;">def</span> <span style="color: #f8f8f2;">getMinHashSignatureAsStr</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">tokens</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">scala.collection.immutable.Set</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">String</span><span style="color: #f92672;">],</span> <span style="color: #f8f8f2;">signatureGroupNum</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">Int</span><span style="color: #f92672;">)</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">String</span> <span style="color: #f92672;">=</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">return</span> <span style="color: #f8f8f2;">getMinHashSignature</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">tokens</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">signatureGroupNum</span><span style="color: #f92672;">).</span><span style="color: #f8f8f2;">mkString</span><span style="color: #f92672;">(</span><span style="color: #e6db74;">"_"</span><span style="color: #f92672;">)</span>
<span style="color: #f92672;">}</span>
<span style="color: #66d9ef;">private</span> <span style="color: #66d9ef;">def</span> <span style="color: #f8f8f2;">getMinHashSignature</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">tokens</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">scala.collection.immutable.Set</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">String</span><span style="color: #f92672;">],</span> <span style="color: #f8f8f2;">signatureGroupNum</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">Int</span><span style="color: #f92672;">)</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">Array</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">Int</span><span style="color: #f92672;">]</span> <span style="color: #66d9ef;">=</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">val</span> <span style="color: #f8f8f2;">minHashValues</span> <span style="color: #66d9ef;">=</span> <span style="color: #a6e22e;">Array</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">fill</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">Int</span><span style="color: #f92672;">](</span><span style="color: #f8f8f2;">signatureSize</span><span style="color: #f92672;">)(</span><span style="color: #a6e22e;">Int</span><span style="color: #f92672;">.</span><span style="color: #a6e22e;">MaxValue</span><span style="color: #f92672;">)</span>
<span style="color: #75715e;">// we don't need to hash the same token more than once, so we keep track of all hashed tokens</span>
<span style="color: #66d9ef;">val</span> <span style="color: #f8f8f2;">uniqueTokens</span> <span style="color: #66d9ef;">=</span> <span style="color: #66d9ef;">new</span> <span style="color: #f8f8f2;">mutable</span><span style="color: #f92672;">.</span><span style="color: #a6e22e;">HashSet</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">String</span><span style="color: #f92672;">]()</span>
<span style="color: #66d9ef;">for</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">token</span> <span style="color: #66d9ef;">&lt;-</span> <span style="color: #f8f8f2;">tokens</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">if</span><span style="color: #f92672;">(</span> <span style="color: #f8f8f2;">uniqueTokens</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">add</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">token</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">)</span> <span style="color: #f92672;">{</span>
<span style="color: #75715e;">// apply each LSH function to token</span>
<span style="color: #66d9ef;">for</span><span style="color: #f92672;">(</span> <span style="color: #f8f8f2;">j</span> <span style="color: #66d9ef;">&lt;-</span> <span style="color: #ae81ff;">0</span> <span style="color: #f8f8f2;">until</span> <span style="color: #f8f8f2;">signatureSize</span> <span style="color: #f92672;">)</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">val</span> <span style="color: #f8f8f2;">currentHashValue</span> <span style="color: #66d9ef;">=</span> <span style="color: #f8f8f2;">minHashFuns</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">signatureGroupNum</span><span style="color: #f92672;">*</span><span style="color: #f8f8f2;">signatureSize</span> <span style="color: #f92672;">+</span> <span style="color: #f8f8f2;">j</span><span style="color: #f92672;">)(</span><span style="color: #f8f8f2;">token</span><span style="color: #f92672;">)</span>
<span style="color: #66d9ef;">if</span><span style="color: #f92672;">(</span> <span style="color: #f8f8f2;">currentHashValue</span> <span style="color: #f92672;">&lt;</span> <span style="color: #f8f8f2;">minHashValues</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">j</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">)</span> <span style="color: #f92672;">{</span>
<span style="color: #f8f8f2;">minHashValues</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">j</span><span style="color: #f92672;">)</span> <span style="color: #66d9ef;">=</span> <span style="color: #f8f8f2;">currentHashValue</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">}</span>
<span style="color: #66d9ef;">return</span> <span style="color: #f8f8f2;">minHashValues</span>
<span style="color: #f92672;">}</span>
</pre>
</div>
<br />
Now we are ready to put all the pieces together and deliver a solution for joining two RDDs:
<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #272822; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"> <span style="color: #66d9ef;">def</span> <span style="color: #f8f8f2;">join</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">accounts</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">RDD</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">BankAccount</span><span style="color: #f92672;">],</span> <span style="color: #f8f8f2;">visitors</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">RDD</span><span style="color: #f92672;">[</span><span style="color: #66d9ef;">Visitor</span><span style="color: #f92672;">])</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">RDD</span><span style="color: #f92672;">[(</span><span style="color: #66d9ef;">Visitor</span>, <span style="color: #66d9ef;">BankAccount</span><span style="color: #f92672;">)]</span> <span style="color: #66d9ef;">=</span> <span style="color: #f92672;">{</span>
<span style="color: #75715e;">/* In Scala, these operations are automatically available on RDDs</span>
<span style="color: #75715e;"> containing Tuple2 objects (the built-in tuples in the language, created by</span>
<span style="color: #75715e;"> simply writing (a, b)), as long as you import</span>
<span style="color: #75715e;"> org.apache.spark.SparkContext._ in your program to enable Spark’s implicit</span>
<span style="color: #75715e;"> conversions. Since Spark 1.3 these conversions are applied automatically, without the import.*/</span>
<span style="color: #66d9ef;">return</span> <span style="color: #f8f8f2;">applyMinHashed</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">accounts</span><span style="color: #f92672;">).</span><span style="color: #f8f8f2;">join</span><span style="color: #f92672;">(</span> <span style="color: #f8f8f2;">applyMinHashed</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">visitors</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">).</span><span style="color: #f8f8f2;">map</span><span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">case</span> <span style="color: #f92672;">(</span><span style="color: #f8f8f2;">key</span><span style="color: #f92672;">,</span> <span style="color: #f92672;">(</span><span style="color: #f8f8f2;">account</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">visitor</span><span style="color: #f92672;">))</span> <span style="color: #66d9ef;">=></span>
<span style="color: #f92672;">(</span><span style="color: #f8f8f2;">visitor</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">account</span><span style="color: #f92672;">)</span>
<span style="color: #f92672;">}.</span><span style="color: #f8f8f2;">groupByKey</span><span style="color: #f92672;">()</span>
<span style="color: #f92672;">.</span><span style="color: #f8f8f2;">map</span><span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">case</span> <span style="color: #f92672;">(</span><span style="color: #f8f8f2;">visitor</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">candidates</span><span style="color: #f92672;">)</span> <span style="color: #66d9ef;">=></span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">var</span> <span style="color: #f8f8f2;">closestAccount</span><span style="color: #66d9ef;">:</span> <span style="color: #66d9ef;">BankAccount</span> <span style="color: #f92672;">=</span> <span style="color: #66d9ef;">null</span>
<span style="color: #66d9ef;">var</span> <span style="color: #f8f8f2;">bestEditDistance</span> <span style="color: #66d9ef;">=</span> <span style="color: #a6e22e;">Int</span><span style="color: #f92672;">.</span><span style="color: #a6e22e;">MaxValue</span>
<span style="color: #66d9ef;">for</span> <span style="color: #f92672;">(</span><span style="color: #f8f8f2;">a</span> <span style="color: #66d9ef;">&lt;-</span> <span style="color: #f8f8f2;">candidates</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">{</span>
<span style="color: #66d9ef;">val</span> <span style="color: #f8f8f2;">curEditDist</span> <span style="color: #66d9ef;">=</span> <span style="color: #f8f8f2;">stringDistance</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">visitor</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">name</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">a</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">name</span><span style="color: #f92672;">)</span>
<span style="color: #66d9ef;">if</span> <span style="color: #f92672;">(</span><span style="color: #f8f8f2;">curEditDist</span> <span style="color: #f92672;">&lt;</span> <span style="color: #f8f8f2;">bestEditDistance</span><span style="color: #f92672;">)</span> <span style="color: #f92672;">{</span>
<span style="color: #f8f8f2;">bestEditDistance</span> <span style="color: #66d9ef;">=</span> <span style="color: #f8f8f2;">curEditDist</span>
<span style="color: #f8f8f2;">closestAccount</span> <span style="color: #66d9ef;">=</span> <span style="color: #f8f8f2;">a</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">(</span><span style="color: #f8f8f2;">visitor</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">closestAccount</span><span style="color: #f92672;">)</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">}</span>
<span style="color: #f92672;">}</span>
</pre>
</div>
<br />
Finally, here is the code that joins the two RDDs and prints the result to the console:
<!-- HTML generated using hilite.me --><br />
<div style="background: #272822; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"> <span style="color: #66d9ef;">val</span> <span style="color: #f8f8f2;">acc2vis</span> <span style="color: #66d9ef;">=</span> <span style="color: #f8f8f2;">service</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">join</span><span style="color: #f92672;">(</span><span style="color: #f8f8f2;">accounts</span><span style="color: #f92672;">,</span> <span style="color: #f8f8f2;">visitors</span><span style="color: #f92672;">)</span>
<span style="color: #66d9ef;">for</span><span style="color: #f92672;">(</span> <span style="color: #f92672;">(</span><span style="color: #f8f8f2;">v</span><span style="color: #f92672;">,</span><span style="color: #f8f8f2;">a</span><span style="color: #f92672;">)</span> <span style="color: #66d9ef;">&lt;-</span> <span style="color: #f8f8f2;">acc2vis</span><span style="color: #f92672;">.</span><span style="color: #f8f8f2;">collect</span><span style="color: #f92672;">()</span> <span style="color: #f92672;">)</span> <span style="color: #f92672;">{</span>
<span style="color: #f8f8f2;">println</span><span style="color: #f92672;">(</span> <span style="color: #f8f8f2;">f</span><span style="color: #e6db74;">"Visitor ${v.name}%s has score level ${a.score}%2.2f (${a.name}%s)"</span> <span style="color: #f92672;">)</span>
<span style="color: #f92672;">}</span>
</pre>
</div>
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com3tag:blogger.com,1999:blog-8382000407737271014.post-89878808661198162642014-11-07T15:37:00.002+02:002014-11-11T19:21:05.585+02:00Spark and Location Sensitive Hashing, part 1Locality-Sensitive Hashing (LSH) is a family of hashing techniques designed to reduce the cost of finding similar items in big data: similar items land in the same bucket with high probability, so only items that share a bucket need to be compared.<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQeg80ZBm1yTtn92ajYDP9ksFVPCVZbpyIZWNxAzw_pq_JlMCRhgCfbceu718AsGMXO4ggHRQ9zfEJaADKAWoY8f1e3kba89zzj5aws_TCV3cS0ubD64oZSrI9UtZLMhnu4cRb_0qU-8vH/s1600/Screen+Shot+2014-11-11+at+7.18.58+PM.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="320" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEjQeg80ZBm1yTtn92ajYDP9ksFVPCVZbpyIZWNxAzw_pq_JlMCRhgCfbceu718AsGMXO4ggHRQ9zfEJaADKAWoY8f1e3kba89zzj5aws_TCV3cS0ubD64oZSrI9UtZLMhnu4cRb_0qU-8vH/s320/Screen+Shot+2014-11-11+at+7.18.58+PM.png" width="315" /></a></div>
<br />
<br />
Let's consider the following example: assume we have two independent systems. The first is a web application that gets user profiles from a social network; the second is an online payment system. Our idea is to merge the profiles from the social network with those from the payment system. Of course, a social network user might not be present in the payment system at all, the accounts were created at different times, and we definitely don't have a foreign key to match them exactly. There are two issues:<br />
<br />
<ul>
<li>both data sets are huge, and they must be merged</li>
<li>a user's name might be written differently in the social network and in the payment system</li>
</ul>
<div>
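The naive approach is to compare every social network user name with every payment system user name, calculate the edit distance (for example, the Levenshtein distance) between them, and pick the most similar pair as a successful match. The biggest issue here is the O(n<sup>2</sup>) complexity of this approach: every record of one data set has to be compared with every record of the other.</div>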
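To see where the quadratic cost comes from, here is a minimal Java sketch of the naive approach. It assumes plain name strings on both sides and uses the classic Levenshtein distance; the class and method names are hypothetical. The inner loop runs once for every pair of records, so n visitors and m accounts cost n*m distance computations.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Naive matching sketch: every visitor name is compared with every account name. */
final class NaiveMatcher {

    /** Levenshtein (edit) distance, computed with two rolling rows of the DP table. */
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                // insertion, deletion, substitution
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev;
            prev = curr;
            curr = tmp;
        }
        return prev[t.length()];
    }

    /** For each visitor, the account name with the smallest edit distance. */
    static Map<String, String> match(List<String> visitors, List<String> accounts) {
        Map<String, String> result = new HashMap<>();
        for (String v : visitors) {
            String best = null;
            int bestDist = Integer.MAX_VALUE;
            for (String a : accounts) { // this inner loop is what makes it O(n*m)
                int d = editDistance(v, a);
                if (d < bestDist) {
                    bestDist = d;
                    best = a;
                }
            }
            result.put(v, best);
        }
        return result;
    }

    public static void main(String[] args) {
        System.out.println(match(List.of("Jon Smith", "Anna Kowalska"),
                List.of("John Smith", "Ana Kowalski", "Peter Pan")));
    }
}
```

With a million records on each side, that is already 10^12 distance computations, which is exactly what we want to avoid.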
<div>
<br /></div>
<div>
We want to minimize the number of comparisons between the two data sets. Fortunately, this is exactly the problem Locality-Sensitive Hashing was invented to solve. Let's start with simple hashing: </div>
<div>
f(str) → x</div>
<div>
We can calculate the hash function f on a string s (the user name from a profile) and get an integer x; then we only need to compare distances for strings that have the same x. The problem is picking a really good hash function, one that maps similar names to the same x, which is almost impossible. Fortunately, we are not limited to one function: we can apply several, tens or even hundreds of hash functions. In this case the data gets duplicated, because one string is assigned to several buckets (hash values). This increases the number of useless comparisons, but at the same time it gives us a bigger chance of a successful match.<br />
<br />
However, this still wouldn't work well enough, because names may contain typos, or use special characters (such as accented letters) in the social profile while the payment system has only plain Latin letters, or vice versa. n-grams and MinHashing come in handy in this situation. The main idea is to take all n-grams of a string and apply the MinHash algorithm to them. As a result, we get a set of new hash codes based on the n-grams, and we compare only strings that were placed into the same buckets according to these hash codes.<br />
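Here is a minimal Java sketch of the n-gram and MinHash idea. The choices are my assumptions, not necessarily those of the original code: character bigrams of the lower-cased name, and random hash functions of the form h(x) = (a*x + b) mod p with p = 2^31 - 1. The share of positions where two signatures agree estimates the Jaccard similarity of the two n-gram sets.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.Random;
import java.util.Set;

/** Sketch: character n-grams plus MinHash (hypothetical class, not the post's original code). */
final class MinHashSketch {
    private static final long P = 2147483647L; // prime modulus, 2^31 - 1
    private final long[] a;
    private final long[] b;

    /** Draws numHashes random functions h_i(x) = (a_i * x + b_i) mod P. */
    MinHashSketch(int numHashes, long seed) {
        Random rnd = new Random(seed);
        a = new long[numHashes];
        b = new long[numHashes];
        for (int i = 0; i < numHashes; i++) {
            a[i] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1); // 1 <= a_i <= P - 1
            b[i] = rnd.nextInt(Integer.MAX_VALUE);         // 0 <= b_i <= P - 1
        }
    }

    /** Distinct character n-grams of the lower-cased string, e.g. n = 2: "anna" gives an, nn, na. */
    static Set<String> ngrams(String s, int n) {
        String norm = s.toLowerCase();
        Set<String> grams = new HashSet<>();
        for (int i = 0; i + n <= norm.length(); i++) {
            grams.add(norm.substring(i, i + n));
        }
        return grams;
    }

    /** For each hash function, the smallest hash value over all n-grams. */
    long[] signature(Set<String> grams) {
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (String g : grams) {
            long x = g.hashCode() & 0x7fffffffL; // non-negative base hash of the n-gram
            for (int i = 0; i < a.length; i++) {
                // a_i * x < 2^62, so the expression cannot overflow a long
                sig[i] = Math.min(sig[i], (a[i] * x + b[i]) % P);
            }
        }
        return sig;
    }

    /** Share of equal positions: an estimate of the Jaccard similarity of the n-gram sets. */
    static double estimatedSimilarity(long[] s1, long[] s2) {
        int equal = 0;
        for (int i = 0; i < s1.length; i++) {
            if (s1[i] == s2[i]) equal++;
        }
        return (double) equal / s1.length;
    }

    /** Exact Jaccard similarity (size of intersection / size of union), for comparison. */
    static double jaccard(Set<String> x, Set<String> y) {
        Set<String> common = new HashSet<>(x);
        common.retainAll(y);
        int union = x.size() + y.size() - common.size();
        return union == 0 ? 1.0 : (double) common.size() / union;
    }
}
```

For example, with n = 2 "John Smith" and "Jon Smith" share 7 of their 10 distinct bigrams (Jaccard similarity 0.7), so their signatures should agree in roughly 70% of the positions, while "Peter Pan" shares no bigram with "John Smith" and practically never agrees with it.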
<br />
The algorithm, step by step:<br />
<br />
<ol>
<li>Define a collection of hash functions</li>
<li>Compute the MinHash signature of each profile's n-grams</li>
<li>Use equal hash codes to get candidate pairs of similar profiles from the social network and the payment system</li>
<li>Calculate the edit distance within each candidate pair and select the closest match for each profile</li>
</ol>
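The four steps can be sketched in plain Java (no Spark) as below. Turning "equal hash codes" into buckets by splitting each signature into bands of a few rows is the usual LSH banding trick and an assumption on my part, as are all names and parameters (bands, rows, n, the seed).

```java
import java.util.*;

/** Plain-Java sketch of the four steps (hypothetical names; no Spark involved). */
final class LshMatcher {
    private static final long P = 2147483647L; // prime modulus, 2^31 - 1
    private final long[] a, b;
    private final int bands, rows, n;

    LshMatcher(int bands, int rows, int n, long seed) {
        this.bands = bands;
        this.rows = rows;
        this.n = n;
        Random rnd = new Random(seed);
        a = new long[bands * rows];
        b = new long[bands * rows];
        // Step 1: a collection of hash functions h_k(x) = (a_k * x + b_k) mod P
        for (int k = 0; k < a.length; k++) {
            a[k] = 1 + rnd.nextInt(Integer.MAX_VALUE - 1);
            b[k] = rnd.nextInt(Integer.MAX_VALUE);
        }
    }

    // Step 2: MinHash signature over the character n-grams of the lower-cased name
    long[] signature(String name) {
        String s = name.toLowerCase();
        long[] sig = new long[a.length];
        Arrays.fill(sig, Long.MAX_VALUE);
        for (int i = 0; i + n <= s.length(); i++) {
            long x = s.substring(i, i + n).hashCode() & 0x7fffffffL;
            for (int k = 0; k < a.length; k++) {
                sig[k] = Math.min(sig[k], (a[k] * x + b[k]) % P);
            }
        }
        return sig;
    }

    // One bucket key per band: the band index plus its slice of the signature
    List<String> bucketKeys(String name) {
        long[] sig = signature(name);
        List<String> keys = new ArrayList<>();
        for (int band = 0; band < bands; band++) {
            keys.add(band + ":" + Arrays.toString(Arrays.copyOfRange(sig, band * rows, (band + 1) * rows)));
        }
        return keys;
    }

    // Step 3: candidates share at least one bucket; step 4: keep the closest by edit distance
    Map<String, String> match(List<String> visitors, List<String> accounts) {
        Map<String, Set<String>> buckets = new HashMap<>();
        for (String acc : accounts) {
            for (String key : bucketKeys(acc)) {
                buckets.computeIfAbsent(key, k -> new HashSet<>()).add(acc);
            }
        }
        Map<String, String> result = new HashMap<>();
        for (String v : visitors) {
            Set<String> candidates = new HashSet<>();
            for (String key : bucketKeys(v)) {
                candidates.addAll(buckets.getOrDefault(key, Collections.emptySet()));
            }
            String best = null;
            int bestDist = Integer.MAX_VALUE;
            for (String c : candidates) {
                int d = editDistance(v.toLowerCase(), c.toLowerCase());
                if (d < bestDist) { bestDist = d; best = c; }
            }
            if (best != null) result.put(v, best); // visitors without candidates stay unmatched
        }
        return result;
    }

    // Levenshtein distance with two rolling rows
    static int editDistance(String s, String t) {
        int[] prev = new int[t.length() + 1];
        int[] curr = new int[t.length() + 1];
        for (int j = 0; j <= t.length(); j++) prev[j] = j;
        for (int i = 1; i <= s.length(); i++) {
            curr[0] = i;
            for (int j = 1; j <= t.length(); j++) {
                int cost = s.charAt(i - 1) == t.charAt(j - 1) ? 0 : 1;
                curr[j] = Math.min(Math.min(curr[j - 1] + 1, prev[j] + 1), prev[j - 1] + cost);
            }
            int[] tmp = prev; prev = curr; curr = tmp;
        }
        return prev[t.length()];
    }
}
```

With 20 bands of 2 rows, two names with Jaccard similarity 0.7 land in at least one common bucket with probability 1 - (1 - 0.7^2)^20, which is practically 1, while names without a single common n-gram never become candidates. More rows per band make buckets stricter; more bands give more chances to collide.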
<br />
In the next part: a source code example and an implementation on Apache Spark.</div>
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-41285360243828145322014-10-10T11:38:00.000+03:002014-10-10T11:38:21.001+03:00Tuning the MapReduce job<blockquote>
<blockquote class="tr_bq">
<i>java.lang.OutOfMemoryError: GC overhead limit exceeded</i></blockquote>
</blockquote>
That's what I got yesterday while running my shiny new MapReduce job.<br />
<br />
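An OutOfMemoryError in Java can have different causes: no more heap memory is available, the garbage collector runs too often while freeing too little (my case), the PermGen space is exhausted, etc. In HotSpot, "GC overhead limit exceeded" is thrown when, with the default settings, more than 98% of the total time is spent in garbage collection while less than 2% of the heap is recovered.<br />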
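The GC log is the main tool here, but a task can also ask its own JVM how much time it has spent in garbage collection, using the standard java.lang.management API. A minimal sketch: where you call it from (for example, at the end of a reducer) is up to you, and the ratio is only a rough indicator, not the JVM's own overhead-limit calculation.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;

/** Sketch: how much of this JVM's uptime went into garbage collection so far. */
final class GcOverheadProbe {

    /** Accumulated collection time (ms) over all collectors; -1 ("undefined") values are skipped. */
    static long totalGcTimeMillis() {
        long total = 0;
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            long t = gc.getCollectionTime();
            if (t > 0) total += t;
        }
        return total;
    }

    /** Fraction of JVM uptime spent in GC, clamped to the range [0, 1]. */
    static double gcTimeFraction() {
        long uptime = ManagementFactory.getRuntimeMXBean().getUptime();
        return uptime <= 0 ? 0.0 : Math.min(1.0, (double) totalGcTimeMillis() / uptime);
    }

    public static void main(String[] args) {
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            System.out.printf("%s: %d collections, %d ms%n",
                    gc.getName(), gc.getCollectionCount(), gc.getCollectionTime());
        }
        System.out.printf("GC time fraction: %.4f%n", gcTimeFraction());
    }
}
```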
<br />
To get more information about the JVM internals, we have to adjust the JVM options. I'm using the Hortonworks distribution, so I went to Ambari, opened the MapReduce configuration tab and found <span style="background-color: white; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px;">mapreduce.reduce.java.opts</span>. This property holds the JVM options for the reducers. Let's add garbage collector logging:<br />
<i style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18.4799995422363px;">-verbose:gc -Xloggc:/tmp/@taskid@.gc </i><span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18.4799995422363px;"><i>-XX:+PrintGCDetails -XX:+PrintGCTimeStamps</i></span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; line-height: 18.4799995422363px;">This writes the GC log to the local file system, into the /tmp folder; Hadoop replaces @taskid@ with the task ID, so every task gets its own file with the .gc extension.</span><br />
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18.4799995422363px;"><i><br /></i></span>
<span style="background-color: white; color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: 13px; line-height: 18.4799995422363px;">In general, the following properties are important for JVM tuning:</span><br />
<br />
<ul>
<li><span style="color: #222222; font-family: Arial, Tahoma, Helvetica, FreeSans, sans-serif; font-size: x-small;"><span style="line-height: 18.4799995422363px;"><span style="background-color: white; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px;">mapred.child.java.opts - </span></span></span><span style="background-color: white; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px;">Provides JVM options to pass to map and reduce tasks. Usually includes the </span><strong style="background-color: white; border: 0px; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">-Xmx</strong><span style="background-color: white; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px;"> option to specify the maximum heap size. May also specify </span><strong style="background-color: white; border: 0px; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px; margin: 0px; outline: 0px; padding: 0px; vertical-align: baseline;">-Xms</strong><span style="background-color: white; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px;"> to specify the start heap size. </span></li>
<li><span style="background-color: white; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px;">mapreduce.map.java.opts - </span><span style="background-color: white; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px;">Overrides mapred.child.java.opts for map tasks.</span></li>
<li><span style="background-color: white; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px;">mapreduce.reduce.java.opts - </span><span style="background-color: white; color: #222222; font-family: Arial, sans-serif; font-size: 12px; line-height: 18px;">Overrides mapred.child.java.opts for reduce tasks.</span></li>
</ul>
<span style="color: #222222; font-family: Arial, sans-serif;"><span style="line-height: 18px;">After entering a new value for the property, the MapReduce service must be restarted (Ambari reminds you with a yellow "Restart" button); the changes are applied only after the restart. The next step is to run the MapReduce job; as a result, the per-task logs are written to the /tmp folder on each node.</span></span><br />
<span style="color: #222222; font-family: Arial, sans-serif;"><span style="line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, sans-serif;"><span style="line-height: 18px;">The log is a bit difficult to read, but fortunately several UI tools exist. I prefer the open source <a href="https://github.com/chewiebug/GCViewer" target="_blank">GCViewer</a>, a Java application that doesn't require installation. It supports a wide range of JVMs; moreover, it has a command-line interface for generating reports, so report generation can be automated.</span></span><br />
<span style="color: #222222; font-family: Arial, sans-serif;"><span style="line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, sans-serif;"><span style="line-height: 18px;">Opening a GC log gives a detailed overview of the memory state:</span></span><br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhm9TEmgtPHi665F-a8pCX05ov4p4rxVJKecZ-qxLibvlYl6O9ojcgmeWXXgmRSATotoUBkAEQGmPQAjtLElCrj6ruJy-kr2PFKIpKrYZG2uo2vZnzPEq7B5c5mm6xia0TqNATdD9JZlOqo/s1600/GC_jvm.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhm9TEmgtPHi665F-a8pCX05ov4p4rxVJKecZ-qxLibvlYl6O9ojcgmeWXXgmRSATotoUBkAEQGmPQAjtLElCrj6ruJy-kr2PFKIpKrYZG2uo2vZnzPEq7B5c5mm6xia0TqNATdD9JZlOqo/s1600/GC_jvm.png" height="400" width="275" /></a></div>
<span style="color: #222222; font-family: Arial, sans-serif;"><span style="line-height: 18px;"><br /></span></span>
<span style="color: #222222; font-family: Arial, sans-serif;"><span style="line-height: 18px;">Legend:</span></span><br />
<br />
<ul>
<li><span style="color: #333333; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 15px; white-space: pre-wrap;">Green line that shows the length of all GCs</span></li>
<li><span style="color: #333333; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 15px; white-space: pre-wrap;">Magenta area that shows the size of the tenured </span><span style="color: #333333; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 15px; white-space: pre-wrap;">generation (not available without PrintGCDetails)</span></li>
<li><span style="color: #333333; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 15px; white-space: pre-wrap;">Orange area that shows the size of the young</span><span style="color: #333333; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 15px; white-space: pre-wrap;"> generation (not available without PrintGCDetails)</span></li>
<li><span style="color: #333333; font-family: Consolas, 'Liberation Mono', Menlo, Courier, monospace; font-size: 15px; white-space: pre-wrap;">Blue line that shows used heap size</span></li>
</ul>
<br />
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-19120787710655619572014-10-09T14:50:00.000+03:002014-10-09T14:50:12.357+03:00Unit test for Hive querySometimes the soul wants something really extraordinary... for example, to write a unit test for a Hive query :)<br />
<br />
Let's see, step by step, how to write a unit test for Hive.<br />
<br />
First of all, a local Hive instance must run, and for that we need a local metastore (I suggest Apache Derby) and directories for temporary data, logs, etc. Since all the configuration is read from system properties, I didn't find a better way than setting all of them programmatically.<br />
Be sure to create all these directories before starting Hive, for example with Apache Commons IO:<br />
<br />
<span style="font-family: Courier New, Courier, monospace;">FileUtils.forceMkdir(HIVE_BASE_DIR);</span><br />
<br />
Then register all of them as system properties:
<br /><!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"> System<span style="color: #333333">.</span><span style="color: #0000CC">setProperty</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"javax.jdo.option.ConnectionURL"</span><span style="color: #333333">,</span> <span style="background-color: #fff0f0">"jdbc:derby:;databaseName="</span> <span style="color: #333333">+</span> HIVE_METADB_DIR<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">()</span> <span style="color: #333333">+</span> <span style="background-color: #fff0f0">";create=true"</span><span style="color: #333333">);</span>
System<span style="color: #333333">.</span><span style="color: #0000CC">setProperty</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"hive.metastore.warehouse.dir"</span><span style="color: #333333">,</span> HIVE_WAREHOUSE_DIR<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">());</span>
System<span style="color: #333333">.</span><span style="color: #0000CC">setProperty</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"hive.exec.scratchdir"</span><span style="color: #333333">,</span> HIVE_SCRATCH_DIR<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">());</span>
System<span style="color: #333333">.</span><span style="color: #0000CC">setProperty</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"hive.exec.local.scratchdir"</span><span style="color: #333333">,</span> HIVE_LOCAL_SCRATCH_DIR<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">());</span>
System<span style="color: #333333">.</span><span style="color: #0000CC">setProperty</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"hive.metastore.metadb.dir"</span><span style="color: #333333">,</span> HIVE_METADB_DIR<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">());</span>
System<span style="color: #333333">.</span><span style="color: #0000CC">setProperty</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"test.log.dir"</span><span style="color: #333333">,</span> HIVE_LOGS_DIR<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">());</span>
System<span style="color: #333333">.</span><span style="color: #0000CC">setProperty</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"hive.querylog.location"</span><span style="color: #333333">,</span> HIVE_TMP_DIR<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">());</span>
System<span style="color: #333333">.</span><span style="color: #0000CC">setProperty</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"hadoop.tmp.dir"</span><span style="color: #333333">,</span> HIVE_HADOOP_TMP_DIR<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">());</span>
System<span style="color: #333333">.</span><span style="color: #0000CC">setProperty</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"derby.stream.error.file"</span><span style="color: #333333">,</span> HIVE_BASE_DIR<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">()</span> <span style="color: #333333">+</span> sep <span style="color: #333333">+</span> <span style="background-color: #fff0f0">"derby.log"</span><span style="color: #333333">);</span>
</pre></div><br />
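If you'd rather not pull in an extra library just for creating directories, the JDK alone can do both steps. A minimal sketch, assuming one temporary base directory: the class and the sub-directory names are hypothetical, the property keys are the ones from the listing above, and the Derby database directory is not pre-created but left to Derby (create=true).

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.LinkedHashMap;
import java.util.Map;

/** Sketch: JDK-only setup of local Hive test directories (hypothetical layout). */
final class LocalHiveDirs {

    static void setUp(Path baseDir) throws IOException {
        // property key (from the post) -> sub-directory name (hypothetical)
        Map<String, String> dirs = new LinkedHashMap<>();
        dirs.put("hive.metastore.warehouse.dir", "warehouse");
        dirs.put("hive.exec.scratchdir", "scratch");
        dirs.put("hive.exec.local.scratchdir", "local-scratch");
        dirs.put("hive.metastore.metadb.dir", "metadb");
        dirs.put("test.log.dir", "logs");
        dirs.put("hive.querylog.location", "tmp");
        dirs.put("hadoop.tmp.dir", "hadoop-tmp");

        for (Map.Entry<String, String> e : dirs.entrySet()) {
            // Like "mkdir -p": creates missing parents, no error if the directory already exists
            Path dir = Files.createDirectories(baseDir.resolve(e.getValue()));
            System.setProperty(e.getKey(), dir.toAbsolutePath().toString());
        }

        // Embedded Derby metastore; with create=true Derby creates the database on first connection
        Path metastoreDb = baseDir.resolve("metastore_db").toAbsolutePath();
        System.setProperty("javax.jdo.option.ConnectionURL",
                "jdbc:derby:;databaseName=" + metastoreDb + ";create=true");
        System.setProperty("derby.stream.error.file",
                baseDir.resolve("derby.log").toAbsolutePath().toString());
    }

    public static void main(String[] args) throws IOException {
        Path base = Files.createTempDirectory("hive-test");
        setUp(base);
        System.out.println("warehouse: " + System.getProperty("hive.metastore.warehouse.dir"));
    }
}
```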
After that, the local Hive executor can be started:
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">HiveInterface client <span style="color: #333333">=</span> <span style="color: #008800; font-weight: bold">new</span> HiveServer<span style="color: #333333">.</span><span style="color: #0000CC">HiveServerHandler</span><span style="color: #333333">();</span>
</pre></div><br />
At this point we are, in fact, ready. Now let's create a Hive table, load data into it and run some queries. Best practice in the Java world is to keep test metadata and data in separate files, so in this example I put them under the resources directory and read them from there:
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">client<span style="color: #333333">.</span><span style="color: #0000CC">execute</span><span style="color: #333333">(</span>readResourceFile<span style="color: #333333">(</span><span style="background-color: #fff0f0">"/Example/table_ddl.hql"</span><span style="color: #333333">));</span>
client<span style="color: #333333">.</span><span style="color: #0000CC">execute</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"LOAD DATA LOCAL INPATH '"</span> <span style="color: #333333">+</span>
<span style="color: #008800; font-weight: bold">this</span><span style="color: #333333">.</span><span style="color: #0000CC">getClass</span><span style="color: #333333">().</span><span style="color: #0000CC">getResource</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"/Example/data.csv"</span><span style="color: #333333">).</span><span style="color: #0000CC">getPath</span><span style="color: #333333">()</span> <span style="color: #333333">+</span> <span style="background-color: #fff0f0">"' OVERWRITE INTO TABLE "</span> <span style="color: #333333">+</span> tableName<span style="color: #333333">);</span>
</pre></div><br />
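The readResourceFile helper isn't shown in the post; a hypothetical version could look like this, assuming UTF-8 files and Java 9+ (for InputStream.readAllBytes). Class.getResourceAsStream treats a name starting with "/" as a path from the classpath root and resolves other names relative to the class's package, so the leading slash matters.

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

/** Hypothetical implementation of the readResourceFile(...) helper used above. */
final class ResourceFiles {

    static String readResourceFile(String name) {
        // "/Example/table_ddl.hql" is looked up from the classpath root
        try (InputStream in = ResourceFiles.class.getResourceAsStream(name)) {
            if (in == null) {
                throw new IllegalArgumentException("Resource not found on the classpath: " + name);
            }
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```

Failing fast with a clear message on a missing resource is easier to debug than the NullPointerException you would otherwise get deeper inside the test.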
OK, now the data is in the table and Hive knows about it. Let's run a query:
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">client<span style="color: #333333">.</span><span style="color: #0000CC">execute</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"select sum(revenue), avg(revenue) from "</span> <span style="color: #333333">+</span> tableName <span style="color: #333333">+</span> <span style="background-color: #fff0f0">" group by state"</span><span style="color: #333333">);</span>
</pre></div><br />
Even better, we can register a custom function and test it!
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">client<span style="color: #333333">.</span><span style="color: #0000CC">execute</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"ADD JAR "</span> <span style="color: #333333">+</span> jar<span style="color: #333333">.</span><span style="color: #0000CC">getAbsolutePath</span><span style="color: #333333">());</span>
client<span style="color: #333333">.</span><span style="color: #0000CC">execute</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"CREATE TEMPORARY FUNCTION TempFun as 'org.my.example.MainFunClass'"</span><span style="color: #333333">);</span>
</pre></div><br />
After that, we can call the new function:
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">client<span style="color: #333333">.</span><span style="color: #0000CC">execute</span><span style="color: #333333">(</span><span style="background-color: #fff0f0">"select TempFun(revenue) from "</span> <span style="color: #333333">+</span> tableName<span style="color: #333333">);</span>
String revenueProcessed <span style="color: #333333">=</span> client<span style="color: #333333">.</span><span style="color: #0000CC">fetchOne</span><span style="color: #333333">();</span>
</pre></div>
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-41292732568615499772014-08-18T15:57:00.001+03:002014-08-18T15:57:51.640+03:00Writing in ElasticSearch directly from Hadoop MapReduce<a href="http://www.elasticsearch.org/" target="_blank">ElasticSearch</a> is a hot topic today. It is a powerful open source search and analytics engine that makes data easy to explore. Several times I have had to load data into ElasticSearch after Hadoop jobs completed. A couple of years ago this was a non-trivial task that required using the binary ElasticSearch client and publishing the data manually. Fortunately, ElasticSearch now provides Hadoop support through its elasticsearch-hadoop library.<br />
<br />
Let's see how it can be done in the simplest case: we need to put JSON-formatted data into ElasticSearch for further analysis. So our goal is to write a map-only job that populates ElasticSearch with data from a text file (already in JSON).<br />
<br />
First of all, let's set up the Configuration object:<br />
<br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"> <span style="color: black;">conf</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">setBoolean</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: #4e9a06;">"mapred.map.tasks.speculative.execution"</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">false</span><span style="color: #ce5c00; font-weight: bold;">);</span>
<span style="color: black;">conf</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">setBoolean</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: #4e9a06;">"mapred.reduce.tasks.speculative.execution"</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: #204a87; font-weight: bold;">false</span><span style="color: #ce5c00; font-weight: bold;">);</span>
<span style="color: black;">conf</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">set</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: #4e9a06;">"es.resource"</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: #4e9a06;">"emailindex/email"</span><span style="color: #ce5c00; font-weight: bold;">);</span> <span style="color: #8f5902; font-style: italic;">// index/type; index names must be lowercase</span>
<span style="color: black;">conf</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">set</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: #4e9a06;">"es.nodes"</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: #4e9a06;">"192.168.12.04"</span><span style="color: #ce5c00; font-weight: bold;">);</span> <span style="color: #8f5902; font-style: italic;">// host</span>
<span style="color: black;">conf</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">set</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: #4e9a06;">"es.port"</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: #4e9a06;">"11000"</span><span style="color: #ce5c00; font-weight: bold;">);</span> <span style="color: #8f5902; font-style: italic;">// port</span>
<span style="color: black;">conf</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">set</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: #4e9a06;">"es.input.json"</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: #4e9a06;">"yes"</span><span style="color: #ce5c00; font-weight: bold;">);</span>
</pre>
</div>
<br />
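Most of these settings are self-explanatory: <i>es.resource</i> is the target index/type, <i>es.nodes</i> and <i>es.port</i> point to the ElasticSearch node, and <i>es.input.json</i> tells the connector that the input is already JSON.<br />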
<br />
It is very important to set the correct output format class; pay attention to letter case:<br />
<br />
<br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"> <span style="color: #8f5902; font-style: italic;">// Set input and output format classes</span>
<span style="color: black;">job</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">setInputFormatClass</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">TextInputFormat</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">class</span><span style="color: #ce5c00; font-weight: bold;">);</span>
<span style="color: black;">job</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">setOutputFormatClass</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">EsOutputFormat</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">class</span><span style="color: #ce5c00; font-weight: bold;">);</span>
<span style="color: #8f5902; font-style: italic;">// Specify the type of output keys and values</span>
<span style="color: black;">job</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">setOutputKeyClass</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">NullWritable</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">class</span><span style="color: #ce5c00; font-weight: bold;">);</span>
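<span style="color: black;">job</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">setOutputValueClass</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">Text</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">class</span><span style="color: #ce5c00; font-weight: bold;">);</span>
<span style="color: #8f5902; font-style: italic;">// Map-only job: no reducers, mapper output goes straight to EsOutputFormat</span>
<span style="color: black;">job</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">setNumReduceTasks</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: #0000cf; font-weight: bold;">0</span><span style="color: #ce5c00; font-weight: bold;">);</span>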
</pre>
</div>
<br />
After that, we implement the Mapper. It emits only a value, with a <i>NullWritable</i> key: EsOutputFormat uses only the value as the document and ignores the key:<br />
<br />
<br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #204a87; font-weight: bold;">public</span> <span style="color: #204a87; font-weight: bold;">static</span> <span style="color: #204a87; font-weight: bold;">class</span> <span style="color: black;">EmailToEsMapper</span> <span style="color: #204a87; font-weight: bold;">extends</span> <span style="color: black;">org</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">apache</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">hadoop</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">mapreduce</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">Mapper</span><span style="color: #ce5c00; font-weight: bold;"><</span><span style="color: black;">LongWritable</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">Text</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">NullWritable</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">Text</span><span style="color: #ce5c00;"><b>></b></span> <span style="color: #ce5c00; font-weight: bold;">{</span>
<span style="color: #204a87; font-weight: bold;">private</span> <span style="color: black;">Text</span> <span style="color: black;">output</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: #204a87; font-weight: bold;">new</span> <span style="color: black;">Text</span><span style="color: #ce5c00; font-weight: bold;">();</span>
<span style="color: #5c35cc; font-weight: bold;">@Override</span>
<span style="color: #204a87; font-weight: bold;">protected</span> <span style="color: #204a87; font-weight: bold;">void</span> <span style="color: black;">map</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">LongWritable</span> <span style="color: black;">key</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">Text</span> <span style="color: black;">value</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">Context</span> <span style="color: black;">context</span><span style="color: #ce5c00; font-weight: bold;">)</span> <span style="color: #204a87; font-weight: bold;">throws</span> <span style="color: black;">IOException</span><span style="color: #ce5c00; font-weight: bold;">,</span> <span style="color: black;">InterruptedException</span> <span style="color: #ce5c00; font-weight: bold;">{</span>
<span style="color: black;">String</span> <span style="color: black;">email</span> <span style="color: #ce5c00; font-weight: bold;">=</span> <span style="color: black;">value</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">toString</span><span style="color: #ce5c00; font-weight: bold;">();</span>
<span style="color: black;">output</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">set</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">email</span><span style="color: #ce5c00; font-weight: bold;">);</span>
<span style="color: black;">context</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">write</span><span style="color: #ce5c00; font-weight: bold;">(</span><span style="color: black;">NullWritable</span><span style="color: #ce5c00; font-weight: bold;">.</span><span style="color: #c4a000;">get</span><span style="color: #ce5c00; font-weight: bold;">(),</span> <span style="color: black;">output</span><span style="color: #ce5c00; font-weight: bold;">);</span>
<span style="color: #ce5c00; font-weight: bold;">}</span>
<span style="color: #ce5c00; font-weight: bold;">}</span>
</pre>
</div>
<br />
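To make it more concrete what happens to each value the mapper emits: the connector writes documents to ElasticSearch in batches through the bulk API. Below is a simplified, dependency-free sketch (my own illustration, not the elasticsearch-hadoop implementation; class and method names are hypothetical) of such a bulk request built from JSON lines: the <i>index/type</i> from <i>es.resource</i> goes into the <i>_bulk</i> URL, and every document is preceded by an <i>index</i> action line. It also rejects upper-case index names, because ElasticSearch requires index names to be lower case.<br />

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

/**
 * Simplified illustration (NOT elasticsearch-hadoop code) of an ElasticSearch
 * bulk request built from JSON lines, like the values emitted by the mapper.
 */
public class BulkBodySketch {

    /** Builds the bulk endpoint path from an "index/type" string such as es.resource. */
    static String bulkPath(String resource) {
        String[] parts = resource.split("/");
        if (parts.length != 2 || parts[0].isEmpty() || parts[1].isEmpty()) {
            throw new IllegalArgumentException("expected index/type, got: " + resource);
        }
        String index = parts[0];
        // ElasticSearch rejects index names that contain upper-case letters
        if (!index.equals(index.toLowerCase(Locale.ROOT))) {
            throw new IllegalArgumentException("index name must be lower case: " + index);
        }
        return "/" + index + "/" + parts[1] + "/_bulk";
    }

    /** Bulk body: an action line plus a source line per document, each newline-terminated. */
    static String bulkBody(List<String> jsonDocs) {
        StringBuilder body = new StringBuilder();
        for (String doc : jsonDocs) {
            String source = doc.trim();
            // the bulk format is newline-delimited, so a document must fit on one line
            if (source.indexOf('\n') >= 0) {
                throw new IllegalArgumentException("bulk documents must be single-line JSON");
            }
            body.append("{\"index\":{}}\n");      // "index" action, id is auto-generated
            body.append(source).append('\n');      // the document itself
        }
        return body.toString();
    }

    public static void main(String[] args) {
        System.out.println("POST " + bulkPath("emailindex/email"));
        System.out.print(bulkBody(Arrays.asList(
                "{\"from\":\"alice@example.com\"}",
                "{\"from\":\"bob@example.com\"}")));
    }
}
```

With <i>"emailIndex/email"</i>, as in the original configuration, <i>bulkPath</i> throws an exception, which mirrors the error ElasticSearch itself returns for an upper-case index name.<br />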
Let's go back to the second code snippet. It uses <span style="line-height: 16.25px;">EsOutputFormat; pay attention to letter case, because the old, deprecated API has an </span><span style="line-height: 16.25px;">ESOutputFormat class. You might also need to add exclusions to your Maven file to pull in the correct versions of the jars and avoid dependency hell:</span><br />
<span style="line-height: 16.25px;"><br /></span>
<span style="line-height: 16.25px;"><br /></span>
<br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #204a87; font-weight: bold;">&lt;dependency&gt;</span>
    <span style="color: #204a87; font-weight: bold;">&lt;groupId&gt;</span>org.elasticsearch<span style="color: #204a87; font-weight: bold;">&lt;/groupId&gt;</span>
    <span style="color: #204a87; font-weight: bold;">&lt;artifactId&gt;</span>elasticsearch-hadoop<span style="color: #204a87; font-weight: bold;">&lt;/artifactId&gt;</span>
    <span style="color: #204a87; font-weight: bold;">&lt;version&gt;</span>1.3.0.M2<span style="color: #204a87; font-weight: bold;">&lt;/version&gt;</span>
    <span style="color: #204a87; font-weight: bold;">&lt;classifier&gt;</span>yarn<span style="color: #204a87; font-weight: bold;">&lt;/classifier&gt;</span>
    <span style="color: #204a87; font-weight: bold;">&lt;exclusions&gt;</span>
        <span style="color: #204a87; font-weight: bold;">&lt;exclusion&gt;</span>
            <span style="color: #204a87; font-weight: bold;">&lt;groupId&gt;</span>cascading<span style="color: #204a87; font-weight: bold;">&lt;/groupId&gt;</span>
            <span style="color: #204a87; font-weight: bold;">&lt;artifactId&gt;</span>cascading-hadoop<span style="color: #204a87; font-weight: bold;">&lt;/artifactId&gt;</span>
        <span style="color: #204a87; font-weight: bold;">&lt;/exclusion&gt;</span>
        <span style="color: #204a87; font-weight: bold;">&lt;exclusion&gt;</span>
            <span style="color: #204a87; font-weight: bold;">&lt;groupId&gt;</span>cascading<span style="color: #204a87; font-weight: bold;">&lt;/groupId&gt;</span>
            <span style="color: #204a87; font-weight: bold;">&lt;artifactId&gt;</span>cascading-local<span style="color: #204a87; font-weight: bold;">&lt;/artifactId&gt;</span>
        <span style="color: #204a87; font-weight: bold;">&lt;/exclusion&gt;</span>
        <span style="color: #204a87; font-weight: bold;">&lt;exclusion&gt;</span>
            <span style="color: #204a87; font-weight: bold;">&lt;groupId&gt;</span>org.apache.pig<span style="color: #204a87; font-weight: bold;">&lt;/groupId&gt;</span>
            <span style="color: #204a87; font-weight: bold;">&lt;artifactId&gt;</span>pig<span style="color: #204a87; font-weight: bold;">&lt;/artifactId&gt;</span>
        <span style="color: #204a87; font-weight: bold;">&lt;/exclusion&gt;</span>
        <span style="color: #204a87; font-weight: bold;">&lt;exclusion&gt;</span>
            <span style="color: #204a87; font-weight: bold;">&lt;groupId&gt;</span>org.apache.hive<span style="color: #204a87; font-weight: bold;">&lt;/groupId&gt;</span>
            <span style="color: #204a87; font-weight: bold;">&lt;artifactId&gt;</span>hive-service<span style="color: #204a87; font-weight: bold;">&lt;/artifactId&gt;</span>
        <span style="color: #204a87; font-weight: bold;">&lt;/exclusion&gt;</span>
    <span style="color: #204a87; font-weight: bold;">&lt;/exclusions&gt;</span>
<span style="color: #204a87; font-weight: bold;">&lt;/dependency&gt;</span>
</pre>
</div>
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-37761803806231862922014-08-13T15:02:00.001+03:002014-08-13T15:02:26.100+03:00Geo Coordinates convertingI made a discovery while working on my latest task: could you imagine that there are many, many geographical coordinate systems in the world? I couldn't. I was pretty sure there was only one: latitude and longitude.<br />
<br />
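Surprise! There are many more of them, and they are widely used. Some are used in particular domains, others are specific to certain countries. For example, you can read more about the <a href="http://en.wikipedia.org/wiki/Gauss%E2%80%93Kr%C3%BCger_coordinate_system" target="_blank">Gauss–Krüger coordinate system</a>. With the GeoTools library, converting a point between two coordinate reference systems takes three steps: decode both systems by their codes (for example, EPSG codes), find the math transform between them, and apply it to the point. Be careful with axis order: for codes such as EPSG:4326, <i>CRS.decode(code)</i> returns latitude-first axes, while <i>CRS.decode(code, true)</i> forces longitude first.<br />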
<br />
<br />
<div style="background: #111111; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #fb660a; font-weight: bold;">import</span> <span style="color: white;">org.geotools.geometry.GeneralDirectPosition;</span>
<span style="color: #fb660a; font-weight: bold;">import</span> <span style="color: white;">org.geotools.referencing.CRS;</span>
<span style="color: #fb660a; font-weight: bold;">import</span> <span style="color: white;">org.opengis.geometry.DirectPosition;</span>
<span style="color: #fb660a; font-weight: bold;">import</span> <span style="color: white;">org.opengis.referencing.FactoryException;</span>
<span style="color: #fb660a; font-weight: bold;">import</span> <span style="color: white;">org.opengis.referencing.NoSuchAuthorityCodeException;</span>
<span style="color: #fb660a; font-weight: bold;">import</span> <span style="color: white;">org.opengis.referencing.crs.CoordinateReferenceSystem;</span>
<span style="color: #fb660a; font-weight: bold;">import</span> <span style="color: white;">org.opengis.referencing.operation.MathTransform;</span>
<span style="color: #fb660a; font-weight: bold;">import</span> <span style="color: white;">org.opengis.referencing.operation.TransformException;</span>
<span style="color: #fb660a; font-weight: bold;">public</span> <span style="color: #fb660a; font-weight: bold;">strictfp</span> <span style="color: #cdcaa9; font-weight: bold;">double</span><span style="color: white;">[]</span> <span style="color: #ff0086; font-weight: bold;">translate</span><span style="color: white;">(String</span> <span style="color: white;">from,</span> <span style="color: white;">String</span> <span style="color: white;">to,</span> <span style="color: #cdcaa9; font-weight: bold;">double</span> <span style="color: white;">x,</span> <span style="color: #cdcaa9; font-weight: bold;">double</span> <span style="color: white;">y)</span>
<span style="color: #fb660a; font-weight: bold;">throws</span> <span style="color: white;">FactoryException,</span> <span style="color: white;">NoSuchAuthorityCodeException,</span> <span style="color: white;">TransformException</span> <span style="color: white;">{</span>
<span style="color: white;">CoordinateReferenceSystem</span> <span style="color: white;">sourceCRS</span> <span style="color: white;">=</span> <span style="color: white;">CRS.</span><span style="color: #ff0086; font-weight: bold;">decode</span><span style="color: white;">(</span> <span style="color: white;">from</span> <span style="color: white;">);</span>
<span style="color: white;">CoordinateReferenceSystem</span> <span style="color: white;">targetCRS</span> <span style="color: white;">=</span> <span style="color: white;">CRS.</span><span style="color: #ff0086; font-weight: bold;">decode</span><span style="color: white;">(</span> <span style="color: white;">to</span> <span style="color: white;">);</span>
<span style="color: white;">MathTransform</span> <span style="color: white;">transform</span> <span style="color: white;">=</span> <span style="color: white;">CRS.</span><span style="color: #ff0086; font-weight: bold;">findMathTransform</span><span style="color: white;">(sourceCRS,</span> <span style="color: white;">targetCRS,</span> <span style="color: #fb660a; font-weight: bold;">true</span><span style="color: white;">);</span>
<span style="color: white;">DirectPosition</span> <span style="color: white;">expPt</span> <span style="color: white;">=</span> <span style="color: #fb660a; font-weight: bold;">new</span> <span style="color: white;">GeneralDirectPosition(x,</span> <span style="color: white;">y);</span>
<span style="color: white;">expPt</span> <span style="color: white;">=</span> <span style="color: white;">transform.</span><span style="color: #ff0086; font-weight: bold;">transform</span><span style="color: white;">(expPt,</span> <span style="color: #fb660a; font-weight: bold;">null</span><span style="color: white;">);</span>
<span style="color: #fb660a; font-weight: bold;">return</span> <span style="color: white;">expPt.</span><span style="color: #ff0086; font-weight: bold;">getCoordinate</span><span style="color: white;">();</span>
<span style="color: white;">}</span>
</pre>
</div>
<br />
OK, it looks simple. The one time-consuming issue is including the correct libraries with Maven: this small piece of code has very wide dependencies, and it took me several hours to find the right combination :)<br />
<br />
So, here are the Maven dependencies:<br />
<br />
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"> <span style="color: #1e90ff; font-weight: bold"><dependency></span>
<span style="color: #1e90ff; font-weight: bold"><groupId></span>org.geotools<span style="color: #1e90ff; font-weight: bold"></groupId></span>
<span style="color: #1e90ff; font-weight: bold"><artifactId></span>gt-opengis<span style="color: #1e90ff; font-weight: bold"></artifactId></span>
<span style="color: #1e90ff; font-weight: bold"><version></span>2.7.0.1<span style="color: #1e90ff; font-weight: bold"></version></span>
<span style="color: #1e90ff; font-weight: bold"></dependency></span>
<span style="color: #1e90ff; font-weight: bold"><dependency></span>
<span style="color: #1e90ff; font-weight: bold"><groupId></span>org.geotools<span style="color: #1e90ff; font-weight: bold"></groupId></span>
<span style="color: #1e90ff; font-weight: bold"><artifactId></span>gt-metadata<span style="color: #1e90ff; font-weight: bold"></artifactId></span>
<span style="color: #1e90ff; font-weight: bold"><version></span>2.7.0.1<span style="color: #1e90ff; font-weight: bold"></version></span>
<span style="color: #1e90ff; font-weight: bold"></dependency></span>
<span style="color: #1e90ff; font-weight: bold"><dependency></span>
<span style="color: #1e90ff; font-weight: bold"><groupId></span>org.geotools<span style="color: #1e90ff; font-weight: bold"></groupId></span>
<span style="color: #1e90ff; font-weight: bold"><artifactId></span>gt-referencing<span style="color: #1e90ff; font-weight: bold"></artifactId></span>
<span style="color: #1e90ff; font-weight: bold"><version></span>2.7.0.1<span style="color: #1e90ff; font-weight: bold"></version></span>
<span style="color: #1e90ff; font-weight: bold"></dependency></span>
<span style="color: #1e90ff; font-weight: bold"><dependency></span>
<span style="color: #1e90ff; font-weight: bold"><groupId></span>org.geotools<span style="color: #1e90ff; font-weight: bold"></groupId></span>
<span style="color: #1e90ff; font-weight: bold"><artifactId></span>gt-epsg-hsql<span style="color: #1e90ff; font-weight: bold"></artifactId></span>
<span style="color: #1e90ff; font-weight: bold"><version></span>2.7.0.1<span style="color: #1e90ff; font-weight: bold"></version></span>
<span style="color: #1e90ff; font-weight: bold"></dependency></span>
<span style="color: #1e90ff; font-weight: bold"><dependency></span>
<span style="color: #1e90ff; font-weight: bold"><groupId></span>javax.media<span style="color: #1e90ff; font-weight: bold"></groupId></span>
<span style="color: #1e90ff; font-weight: bold"><artifactId></span>jai_core<span style="color: #1e90ff; font-weight: bold"></artifactId></span>
<span style="color: #1e90ff; font-weight: bold"><version></span>1.1.3<span style="color: #1e90ff; font-weight: bold"></version></span>
<span style="color: #1e90ff; font-weight: bold"></dependency></span>
</pre></div>
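GeoTools hides the math, but a conversion between coordinate systems is just a deterministic transform of numbers. As a dependency-free illustration, here is a sketch of a much simpler projection than Gauss–Krüger: spherical Web Mercator (EPSG:3857), with its forward and inverse formulas. This is not how GeoTools implements its transforms and it does not cover Gauss–Krüger; the class name is hypothetical, and for real conversions the GeoTools code above is the way to go.<br />

```java
/**
 * Sketch: WGS84 longitude/latitude (degrees) to spherical Web Mercator (EPSG:3857) metres
 * and back. Illustrative only; not Gauss-Krueger and not GeoTools' implementation.
 */
public class WebMercatorSketch {

    // Sphere radius used by Web Mercator (the WGS84 semi-major axis), in metres
    static final double RADIUS = 6378137.0;
    // Mercator y grows without bound towards the poles; Web Mercator is clipped at about +-85.0511 degrees
    static final double MAX_LAT = 85.05112878;

    /** Forward transform: returns {x, y} in metres. */
    static double[] toWebMercator(double lonDeg, double latDeg) {
        if (Math.abs(latDeg) > MAX_LAT) {
            throw new IllegalArgumentException("latitude out of Web Mercator range: " + latDeg);
        }
        double x = RADIUS * Math.toRadians(lonDeg);
        double y = RADIUS * Math.log(Math.tan(Math.PI / 4 + Math.toRadians(latDeg) / 2));
        return new double[] {x, y};
    }

    /** Inverse transform: returns {lon, lat} in degrees. */
    static double[] fromWebMercator(double x, double y) {
        double lon = Math.toDegrees(x / RADIUS);
        double lat = Math.toDegrees(2 * Math.atan(Math.exp(y / RADIUS)) - Math.PI / 2);
        return new double[] {lon, lat};
    }

    public static void main(String[] args) {
        double[] xy = toWebMercator(30.5234, 50.4501); // roughly Kyiv, for illustration
        double[] back = fromWebMercator(xy[0], xy[1]);
        System.out.printf("x=%.2f y=%.2f -> lon=%.6f lat=%.6f%n", xy[0], xy[1], back[0], back[1]);
    }
}
```

The round trip through <i>fromWebMercator</i> is a handy sanity check for any pair of transforms, including the ones GeoTools returns from <i>findMathTransform</i>.<br />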
<br />Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-41640270666109891792014-07-24T17:45:00.001+03:002014-07-24T17:45:24.322+03:00Hadoop 2.2 Distributed Cache and Map JoinIt's very common to use the Distributed Cache for map joins: it makes it possible to implement an extremely fast join of a huge dataset with one or more small ones. Compared to other join techniques, you can get up to a 1000x speed-up, so map joins are extremely useful and widely used. It's also the easiest way to implement outer joins, non-equi joins and so on, so I'd recommend using a map join whenever possible.<br />
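The idea behind a map join can be sketched without Hadoop at all: the small dimension sits in memory (in a real job it would be loaded in <i>setup()</i> from the distributed cache), and each record of the big dataset is joined with a single hash lookup, so no shuffle or reduce phase is needed. The sketch below is a left outer join; the tab-separated record layout and all names are hypothetical assumptions for illustration.<br />

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/**
 * Plain-Java sketch of a map-side (replicated) join: the small side is a HashMap,
 * the big side is streamed record by record. Record layout "key<TAB>rest" is assumed.
 */
public class MapJoinSketch {

    /** Left outer join: every fact record is kept; a missing dimension value becomes "NULL". */
    static List<String> leftOuterJoin(List<String> facts, Map<String, String> dimension) {
        List<String> joined = new ArrayList<>();
        for (String line : facts) {
            String key = line.split("\t", 2)[0];                   // join key is the first field
            String dimValue = dimension.getOrDefault(key, "NULL"); // one in-memory lookup per record
            joined.add(line + "\t" + dimValue);
        }
        return joined;
    }

    public static void main(String[] args) {
        Map<String, String> countries = new HashMap<>();
        countries.put("UA", "Ukraine");
        countries.put("PL", "Poland");
        List<String> facts = Arrays.asList("UA\t10", "DE\t7", "PL\t3");
        leftOuterJoin(facts, countries).forEach(System.out::println);
    }
}
```

Because the whole dimension is in memory, a non-equi condition is just different code inside the loop (for example, scanning ranges instead of a single <i>get</i>).<br />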
<br />
What I don't like about Hadoop is that the API changes very often; each new version brings API changes. The weirdest example is the Mapper interface: it was introduced, then deprecated, and then un-deprecated (in Hadoop 2 it has no @Deprecated annotation)... it is quite difficult to keep track of all the changes...<br />
<br />
The latest change: <i>DistributedCache</i> is now deprecated, so you shouldn't use the good old <i>DistributedCache.addCacheFile</i> anymore.<br />
<br />
Hadoop 2.x introduces a new approach:<br />
1) add the file to the distributed cache (I'm using a symlink here):<br />
<i><span style="font-family: Georgia, Times New Roman, serif;">job.addCacheFile(new URI(conf.get("dimension.file")+"#<b>YOUR_DIM</b>"));</span></i><br />
<br />
2) in the setup method of your Mapper or Reducer, the cached data can be read as follows:<br />
<span style="font-family: Georgia, Times New Roman, serif;"><i>Path[] files = context.getLocalCacheFiles(); // oh, this method is deprecated again <span style="line-height: 19.049999237060547px;">:-)</span></i></span><br />
<span style="line-height: 19.049999237060547px;"><span style="font-family: Georgia, Times New Roman, serif;"><i><br /></i></span></span>
<span style="line-height: 19.049999237060547px;"><span style="font-family: Georgia, Times New Roman, serif;"><i>// loop over all files in cache</i></span></span><br />
<span style="line-height: 19.049999237060547px;"><span style="font-family: Georgia, Times New Roman, serif;"><i>for (Path p : files) {</i></span></span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><i><span style="line-height: 19.049999237060547px;"> if (p.getName().equals("</span><b>YOUR_DIM</b><span style="line-height: 19.049999237060547px;">")) {</span></i></span><br />
<span style="line-height: 19.049999237060547px;"><span style="font-family: Georgia, Times New Roman, serif;"><i> // load cache (for example into Map)</i></span></span><br />
<span style="line-height: 19.049999237060547px;"><span style="font-family: Georgia, Times New Roman, serif;"><i> }</i></span></span><br />
<span style="line-height: 19.049999237060547px;"><span style="font-family: Georgia, Times New Roman, serif;"><i>}</i></span></span><br />
<span style="line-height: 19.049999237060547px;"><span style="font-family: Georgia, Times New Roman, serif;"><br /></span></span>
<span style="font-family: inherit;"><span style="line-height: 19.049999237060547px;">That's all; symlinks are very useful for accessing files from the cache.</span></span><br />
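Thanks to the <i>#YOUR_DIM</i> fragment, the cached file is symlinked into the task's working directory under that name, so the "load cache" step can simply open <i>YOUR_DIM</i> as a local file. Below is a minimal sketch of that step in plain Java, assuming (hypothetically) tab-separated key/value lines; in a Mapper you would call it from <i>setup()</i> and keep the map in a field. As an aside, <i>context.getCacheFiles()</i>, which returns the original <i>URI[]</i>, is not deprecated in Hadoop 2, and together with the symlink it lets you avoid <i>getLocalCacheFiles()</i> altogether.<br />

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.HashMap;
import java.util.Map;

/**
 * Sketch of loading a cached dimension file into a Map.
 * The "key<TAB>value" line format is an assumption for this example.
 */
public class DimensionLoader {

    static Map<String, String> load(Path file) throws IOException {
        Map<String, String> dim = new HashMap<>();
        try (BufferedReader reader = Files.newBufferedReader(file, StandardCharsets.UTF_8)) {
            String line;
            while ((line = reader.readLine()) != null) {
                String[] kv = line.split("\t", 2);
                if (kv.length == 2) {   // skip lines without a tab separator
                    dim.put(kv[0], kv[1]);
                }
            }
        }
        return dim;
    }

    public static void main(String[] args) throws IOException {
        // Inside a task this would be the symlink created in the working directory
        Path link = Paths.get(args.length > 0 ? args[0] : "YOUR_DIM");
        System.out.println(load(link).size() + " dimension rows loaded");
    }
}
```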
<span style="font-family: inherit;"><span style="line-height: 19.049999237060547px;"><br /></span></span>Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-12508517267886963042014-07-03T12:09:00.001+03:002014-07-03T12:09:24.241+03:00Running Spark Unit Test on Windows 7It's a common situation in enterprises that developers work on the Windows platform. When you are working with Hadoop, that sounds like f**ing shit, but it is a fact.<br />
<br />
Recently, I switched from the traditional MapReduce paradigm to Spark and needed to implement some kind of unit/integration testing... which, of course, had to work under Windows 7.<br />
<br />
I've written a very simple test: run the ETL in memory, without touching Hadoop at all (in the future, I'd like to read input from the local filesystem):<br />
<br />
<!-- HTML generated using hilite.me --><div style="background: #272822; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #a6e22e">@Test</span>
<span style="color: #f8f8f2">def</span> <span style="color: #a6e22e">testETL</span><span style="color: #f92672">()</span> <span style="color: #f92672">=</span> <span style="color: #f92672">{</span>
<span style="color: #f8f8f2">val</span> <span style="color: #f8f8f2">conf</span> <span style="color: #f92672">=</span> <span style="color: #66d9ef">new</span> <span style="color: #f8f8f2">SparkConf</span><span style="color: #f92672">()</span>
<span style="color: #f8f8f2">val</span> <span style="color: #f8f8f2">sc</span> <span style="color: #f92672">=</span> <span style="color: #66d9ef">new</span> <span style="color: #f8f8f2">SparkContext</span><span style="color: #f92672">(</span><span style="color: #e6db74">"local"</span><span style="color: #f92672">,</span> <span style="color: #e6db74">"test"</span><span style="color: #f92672">,</span> <span style="color: #f8f8f2">conf</span><span style="color: #f92672">)</span>
<span style="color: #66d9ef">try</span> <span style="color: #f92672">{</span>
<span style="color: #f8f8f2">val</span> <span style="color: #f8f8f2">etl</span> <span style="color: #f92672">=</span> <span style="color: #66d9ef">new</span> <span style="color: #f8f8f2">IxtoolsDailyAgg</span><span style="color: #f92672">()</span> <span style="color: #75715e">// empty constructor</span>
<span style="color: #f8f8f2">val</span> <span style="color: #f8f8f2">data</span> <span style="color: #f92672">=</span> <span style="color: #f8f8f2">sc</span><span style="color: #f92672">.</span><span style="color: #a6e22e">parallelize</span><span style="color: #f92672">(</span><span style="color: #f8f8f2">List</span><span style="color: #f92672">(</span><span style="color: #e6db74">"in1"</span><span style="color: #f92672">,</span> <span style="color: #e6db74">"in2"</span><span style="color: #f92672">,</span> <span style="color: #e6db74">"in3"</span><span style="color: #f92672">))</span>
<span style="color: #f8f8f2">etl</span><span style="color: #f92672">.</span><span style="color: #a6e22e">etl</span><span style="color: #f92672">(</span><span style="color: #f8f8f2">data</span><span style="color: #f92672">)</span> <span style="color: #75715e">// rdd transformation, no access to SparkContext or Hadoop</span>
<span style="color: #f8f8f2">Assert</span><span style="color: #f92672">.</span><span style="color: #a6e22e">assertTrue</span><span style="color: #f92672">(</span><span style="color: #66d9ef">true</span><span style="color: #f92672">)</span>
<span style="color: #f92672">}</span> <span style="color: #66d9ef">finally</span> <span style="color: #f92672">{</span>
<span style="color: #66d9ef">if</span><span style="color: #f92672">(</span><span style="color: #f8f8f2">sc</span> <span style="color: #f92672">!=</span> <span style="color: #66d9ef">null</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">sc</span><span style="color: #f92672">.</span><span style="color: #a6e22e">stop</span><span style="color: #f92672">()</span>
<span style="color: #f92672">}</span>
<span style="color: #f92672">}</span>
</pre></div>
<br />
Boom! I got an exception:<br />
<br />
<!-- HTML generated using hilite.me --><div style="background: #272822; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #f8f8f2">java</span><span style="color: #f92672">.</span><span style="color: #a6e22e">io</span><span style="color: #f92672">.</span><span style="color: #a6e22e">IOException</span><span style="color: #f92672">:</span> <span style="color: #f8f8f2">Could</span> <span style="color: #f8f8f2">not</span> <span style="color: #f8f8f2">locate</span> <span style="color: #f8f8f2">executable</span> <span style="color: #66d9ef">null</span><span style="color: #960050; background-color: #1e0010">\</span><span style="color: #f8f8f2">bin</span><span style="color: #960050; background-color: #1e0010">\</span><span style="color: #f8f8f2">winutils</span><span style="color: #f92672">.</span><span style="color: #a6e22e">exe</span> <span style="color: #f8f8f2">in</span> <span style="color: #f8f8f2">the</span> <span style="color: #f8f8f2">Hadoop</span> <span style="color: #f8f8f2">binaries</span><span style="color: #f92672">.</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">hadoop</span><span style="color: #f92672">.</span><span style="color: #a6e22e">util</span><span style="color: #f92672">.</span><span style="color: #a6e22e">Shell</span><span style="color: #f92672">.</span><span style="color: #a6e22e">getQualifiedBinPath</span><span style="color: #f92672">(</span><span style="color: #f8f8f2">Shell</span><span style="color: #f92672">.</span><span style="color: #a6e22e">java</span><span style="color: #f92672">:</span><span style="color: #ae81ff">318</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">hadoop</span><span style="color: #f92672">.</span><span style="color: #a6e22e">util</span><span style="color: #f92672">.</span><span style="color: #a6e22e">Shell</span><span style="color: #f92672">.</span><span style="color: #a6e22e">getWinUtilsPath</span><span style="color: #f92672">(</span><span style="color: #f8f8f2">Shell</span><span style="color: #f92672">.</span><span style="color: #a6e22e">java</span><span style="color: #f92672">:</span><span style="color: #ae81ff">333</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">hadoop</span><span style="color: #f92672">.</span><span style="color: #a6e22e">util</span><span style="color: #f92672">.</span><span style="color: #a6e22e">Shell</span><span style="color: #f92672">.<</span><span style="color: #f8f8f2">clinit</span><span style="color: #f92672">>(</span><span style="color: #f8f8f2">Shell</span><span style="color: #f92672">.</span><span style="color: #a6e22e">java</span><span style="color: #f92672">:</span><span style="color: #ae81ff">326</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">hadoop</span><span style="color: #f92672">.</span><span style="color: #a6e22e">util</span><span style="color: #f92672">.</span><span style="color: #a6e22e">StringUtils</span><span style="color: #f92672">.<</span><span style="color: #f8f8f2">clinit</span><span style="color: #f92672">>(</span><span style="color: #f8f8f2">StringUtils</span><span style="color: #f92672">.</span><span style="color: #a6e22e">java</span><span style="color: #f92672">:</span><span style="color: #ae81ff">76</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">hadoop</span><span style="color: #f92672">.</span><span style="color: #a6e22e">security</span><span style="color: #f92672">.</span><span style="color: #a6e22e">Groups</span><span style="color: #f92672">.</span><span style="color: #a6e22e">parseStaticMapping</span><span style="color: #f92672">(</span><span style="color: #f8f8f2">Groups</span><span style="color: #f92672">.</span><span style="color: #a6e22e">java</span><span style="color: #f92672">:</span><span style="color: #ae81ff">93</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">hadoop</span><span style="color: #f92672">.</span><span style="color: #a6e22e">security</span><span style="color: #f92672">.</span><span style="color: #a6e22e">Groups</span><span style="color: #f92672">.<</span><span style="color: #f8f8f2">init</span><span style="color: #f92672">>(</span><span style="color: #f8f8f2">Groups</span><span style="color: #f92672">.</span><span style="color: #a6e22e">java</span><span style="color: #f92672">:</span><span style="color: #ae81ff">77</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">hadoop</span><span style="color: #f92672">.</span><span style="color: #a6e22e">security</span><span style="color: #f92672">.</span><span style="color: #a6e22e">Groups</span><span style="color: #f92672">.</span><span style="color: #a6e22e">getUserToGroupsMappingService</span><span style="color: #f92672">(</span><span style="color: #f8f8f2">Groups</span><span style="color: #f92672">.</span><span style="color: #a6e22e">java</span><span style="color: #f92672">:</span><span style="color: #ae81ff">240</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">hadoop</span><span style="color: #f92672">.</span><span style="color: #a6e22e">security</span><span style="color: #f92672">.</span><span style="color: #a6e22e">UserGroupInformation</span><span style="color: #f92672">.</span><span style="color: #a6e22e">initialize</span><span style="color: #f92672">(</span><span style="color: #f8f8f2">UserGroupInformation</span><span style="color: #f92672">.</span><span style="color: #a6e22e">java</span><span style="color: #f92672">:</span><span style="color: #ae81ff">255</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">hadoop</span><span style="color: #f92672">.</span><span style="color: #a6e22e">security</span><span style="color: #f92672">.</span><span style="color: #a6e22e">UserGroupInformation</span><span style="color: #f92672">.</span><span style="color: #a6e22e">setConfiguration</span><span style="color: #f92672">(</span><span style="color: #f8f8f2">UserGroupInformation</span><span style="color: #f92672">.</span><span style="color: #a6e22e">java</span><span style="color: #f92672">:</span><span style="color: #ae81ff">283</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">spark</span><span style="color: #f92672">.</span><span style="color: #a6e22e">deploy</span><span style="color: #f92672">.</span><span style="color: #a6e22e">SparkHadoopUtil</span><span style="color: #f92672">.<</span><span style="color: #f8f8f2">init</span><span style="color: #f92672">>(</span><span style="color: #f8f8f2">SparkHadoopUtil</span><span style="color: #f92672">.</span><span style="color: #a6e22e">scala</span><span style="color: #f92672">:</span><span style="color: #ae81ff">36</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">spark</span><span style="color: #f92672">.</span><span style="color: #a6e22e">deploy</span><span style="color: #f92672">.</span><span style="color: #a6e22e">SparkHadoopUtil</span><span style="color: #f8f8f2">$</span><span style="color: #f92672">.<</span><span style="color: #f8f8f2">init</span><span style="color: #f92672">>(</span><span style="color: #f8f8f2">SparkHadoopUtil</span><span style="color: #f92672">.</span><span style="color: #a6e22e">scala</span><span style="color: #f92672">:</span><span style="color: #ae81ff">109</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">spark</span><span style="color: #f92672">.</span><span style="color: #a6e22e">deploy</span><span style="color: #f92672">.</span><span style="color: #a6e22e">SparkHadoopUtil</span><span style="color: #f8f8f2">$</span><span style="color: #f92672">.<</span><span style="color: #f8f8f2">clinit</span><span style="color: #f92672">>(</span><span style="color: #f8f8f2">SparkHadoopUtil</span><span style="color: #f92672">.</span><span style="color: #a6e22e">scala</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">spark</span><span style="color: #f92672">.</span><span style="color: #a6e22e">SparkContext</span><span style="color: #f92672">.<</span><span style="color: #f8f8f2">init</span><span style="color: #f92672">>(</span><span style="color: #f8f8f2">SparkContext</span><span style="color: #f92672">.</span><span style="color: #a6e22e">scala</span><span style="color: #f92672">:</span><span style="color: #ae81ff">228</span><span style="color: #f92672">)</span>
<span style="color: #f8f8f2">at</span> <span style="color: #f8f8f2">org</span><span style="color: #f92672">.</span><span style="color: #a6e22e">apache</span><span style="color: #f92672">.</span><span style="color: #a6e22e">spark</span><span style="color: #f92672">.</span><span style="color: #a6e22e">SparkContext</span><span style="color: #f92672">.<</span><span style="color: #f8f8f2">init</span><span style="color: #f92672">>(</span><span style="color: #f8f8f2">SparkContext</span><span style="color: #f92672">.</span><span style="color: #a6e22e">scala</span><span style="color: #f92672">:</span><span style="color: #ae81ff">97</span><span style="color: #f92672">)</span>
</pre></div>
<br />
<br />
What?<br />
<blockquote class="tr_bq">
org.apache.hadoop.util.Shell.<clinit>(Shell.java:326)</blockquote>
I swear, I didn't use Hadoop in my code!<br />
Unfortunately, the Hadoop configuration is initialized together with SparkContext :( and there is no way to skip it...<br />
Someone recommended installing HDP on Windows, but I hate that idea...<br />
<br />
I tried the simplest (and dumbest) idea: provide winutils.exe. My hope was that this is only an environment check, and that Hadoop functionality won't be used as long as I don't touch it.<br />
So I downloaded <b>winutils.exe</b> from <a href="http://social.msdn.microsoft.com/Forums/windowsazure/en-US/28a57efb-082b-424b-8d9e-731b1fe135de/please-read-if-experiencing-job-failures?forum=hdinsight" target="_blank">MSDN</a> (MSDN is still helpful, even for a Hadooper), put it into a newly created directory <i>d:\winutil\bin</i>, and then added<br />
<blockquote class="tr_bq">
System.setProperty("hadoop.home.dir", "d:\\winutil\\") </blockquote>
at the beginning of my unit test.Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com21tag:blogger.com,1999:blog-8382000407737271014.post-78538939561136745222014-04-24T17:01:00.002+03:002014-04-24T17:35:30.478+03:00Hue Notifier for Hadoop goes wildSeveral months ago I developed a Chrome browser plugin for my own needs. As a Hadoop engineer, I faced one problem every day: I run a lot of Hive/Pig jobs simultaneously, and they take a long time (from several minutes to several hours). So I had to check for job completion by clicking through Hue's pages in my browser. Well, it was 1) irritating and 2) a distraction from coding...<br />
<br />
As a solution, I developed the <a href="https://chrome.google.com/webstore/detail/hue-notifier-for-hadoop/nphihgndmlbjaenficpnlmaoalpcgioi" target="_blank">Hue Notifier for Hadoop plugin</a> for Google Chrome. It "monitors" the state of a job and notifies you when it completes, similar to how GMail notifies you about new mail (a pop-up over all windows). My knowledge of JavaScript is quite limited, and this was the first time I wrote a browser plugin, so I'm absolutely sure it can be improved. I tested it with the Hue shipped with Cloudera 4.3 and Cloudera 5, as well as HDP 2.0. The most irritating issue with my code: Chrome notifications must be enabled manually before you start using the plugin :(<br />
<br />
The source code is publicly available in <a href="https://github.com/mwacc/hue-chrome" target="_blank">this GitHub repository</a>. You are welcome to fork and improve it. Or, if you just wish to contribute, ping me and I will grant you access (and push the changes to the Chrome Web Store afterwards).<br />
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="http://aral.github.io/fork-me-on-github-retina-ribbons/right-dusk-blue@2x.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="http://aral.github.io/fork-me-on-github-retina-ribbons/right-dusk-blue@2x.png" height="200" width="200" /></a></div>
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com1tag:blogger.com,1999:blog-8382000407737271014.post-70551464447276372662014-04-18T14:18:00.000+03:002014-04-18T14:18:03.356+03:00Building BigData ETL with Hive and OoziePerhaps Hive is the most successful component of today's Hadoop infrastructure. It provides a simple and efficient way of creating Hadoop-based data processing jobs in a comfortable SQL-like language.
But, in contrast to Pig, it's not a workflow-friendly language and requires additional effort to build a real multi-step ETL. <br />
Oozie was created to solve workflow and scheduling issues, so it can obviously be used to build an ETL, and it integrates naturally with Hive. <br />
<br />
<a name='more'></a><br /><br />
The workflow is the core component of any Oozie job: it is the list of steps required to accomplish a task. So a workflow gives us a way to describe an ETL, and here is an example of using Hive in an Oozie workflow:
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #1e90ff; font-weight: bold"><workflow-app</span> <span style="color: #1e90ff">xmlns=</span><span style="color: #aa5500">"uri:oozie:workflow:0.2"</span> <span style="color: #1e90ff">name=</span><span style="color: #aa5500">"etl-by-month-wf"</span> <span style="color: #1e90ff">xmlns:sla=</span><span style="color: #aa5500">"uri:oozie:sla:0.1"</span><span style="color: #1e90ff; font-weight: bold">></span>
<span style="color: #1e90ff; font-weight: bold"><start</span> <span style="color: #1e90ff">to=</span><span style="color: #aa5500">"xxx"</span><span style="color: #1e90ff; font-weight: bold">/></span>
<span style="color: #1e90ff; font-weight: bold"><action</span> <span style="color: #1e90ff">name=</span><span style="color: #aa5500">"xxx"</span><span style="color: #1e90ff; font-weight: bold">></span>
<span style="color: #1e90ff; font-weight: bold"><hive</span> <span style="color: #1e90ff">xmlns=</span><span style="color: #aa5500">"uri:oozie:hive-action:0.2"</span><span style="color: #1e90ff; font-weight: bold">></span>
<span style="color: #1e90ff; font-weight: bold"><job-tracker></span>${jobTracker}<span style="color: #1e90ff; font-weight: bold"></job-tracker></span>
<span style="color: #1e90ff; font-weight: bold"><name-node></span>${nameNode}<span style="color: #1e90ff; font-weight: bold"></name-node></span>
<span style="color: #1e90ff; font-weight: bold"><job-xml></span>${hiveSiteXml}<span style="color: #1e90ff; font-weight: bold"></job-xml></span>
<span style="color: #1e90ff; font-weight: bold"><script></span>${projectSource}/first_step.hql<span style="color: #1e90ff; font-weight: bold"></script></span>
<span style="color: #1e90ff; font-weight: bold"><param></span>hiveSchema=${hiveSchema}<span style="color: #1e90ff; font-weight: bold"></param></span>
<span style="color: #1e90ff; font-weight: bold"><param></span>dataLocality=${dataOutput}<span style="color: #1e90ff; font-weight: bold"></param></span>
<span style="color: #1e90ff; font-weight: bold"><param></span>flowID=${wf:id()}<span style="color: #1e90ff; font-weight: bold"></param></span>
<span style="color: #1e90ff; font-weight: bold"><param></span>arg1=${argument}<span style="color: #1e90ff; font-weight: bold"></param></span>
<span style="color: #1e90ff; font-weight: bold"></hive></span>
<span style="color: #1e90ff; font-weight: bold"><ok</span> <span style="color: #1e90ff">to=</span><span style="color: #aa5500">"yyy"</span><span style="color: #1e90ff; font-weight: bold">/></span>
<span style="color: #1e90ff; font-weight: bold"><error</span> <span style="color: #1e90ff">to=</span><span style="color: #aa5500">"fail"</span><span style="color: #1e90ff; font-weight: bold">/></span>
<span style="color: #1e90ff; font-weight: bold"></action></span>
<span style="color: #1e90ff; font-weight: bold"><action</span> <span style="color: #1e90ff">name=</span><span style="color: #aa5500">"yyy"</span><span style="color: #1e90ff; font-weight: bold">></span>
<span style="color: #1e90ff; font-weight: bold"><hive</span> <span style="color: #1e90ff">xmlns=</span><span style="color: #aa5500">"uri:oozie:hive-action:0.2"</span><span style="color: #1e90ff; font-weight: bold">></span>
<span style="color: #1e90ff; font-weight: bold"><job-tracker></span>${jobTracker}<span style="color: #1e90ff; font-weight: bold"></job-tracker></span>
<span style="color: #1e90ff; font-weight: bold"><name-node></span>${nameNode}<span style="color: #1e90ff; font-weight: bold"></name-node></span>
<span style="color: #1e90ff; font-weight: bold"><job-xml></span>${hiveSiteXml}<span style="color: #1e90ff; font-weight: bold"></job-xml></span>
<span style="color: #1e90ff; font-weight: bold"><script></span>${projectSource}/second_step.hql<span style="color: #1e90ff; font-weight: bold"></script></span>
<span style="color: #1e90ff; font-weight: bold"><param></span>hiveSchema=${hiveSchema}<span style="color: #1e90ff; font-weight: bold"></param></span>
<span style="color: #1e90ff; font-weight: bold"><param></span>dataLocality=${dataOutput}<span style="color: #1e90ff; font-weight: bold"></param></span>
<span style="color: #1e90ff; font-weight: bold"><param></span>flowID=${wf:id()}<span style="color: #1e90ff; font-weight: bold"></param></span>
<span style="color: #1e90ff; font-weight: bold"></hive></span>
<span style="color: #1e90ff; font-weight: bold"><ok</span> <span style="color: #1e90ff">to=</span><span style="color: #aa5500">"end"</span><span style="color: #1e90ff; font-weight: bold">/></span>
<span style="color: #1e90ff; font-weight: bold"><error</span> <span style="color: #1e90ff">to=</span><span style="color: #aa5500">"fail"</span><span style="color: #1e90ff; font-weight: bold">/></span>
<span style="color: #1e90ff; font-weight: bold"></action></span>
<span style="color: #1e90ff; font-weight: bold"><kill</span> <span style="color: #1e90ff">name=</span><span style="color: #aa5500">"fail"</span><span style="color: #1e90ff; font-weight: bold">></span>
<span style="color: #1e90ff; font-weight: bold"><message></span>Error message[${wf:errorMessage(wf:lastErrorNode())}]<span style="color: #1e90ff; font-weight: bold"></message></span>
<span style="color: #1e90ff; font-weight: bold"></kill></span>
<span style="color: #1e90ff; font-weight: bold"><end</span> <span style="color: #1e90ff">name=</span><span style="color: #aa5500">"end"</span><span style="color: #1e90ff; font-weight: bold">/></span>
<span style="color: #1e90ff; font-weight: bold"></workflow-app></span>
</pre></div>
<br/>
<br/>
It describes a two-step job; the Hive scripts it executes are first_step.hql and second_step.hql respectively (both located on HDFS).<br/>
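To make the ok/error transitions concrete, here is a toy model in plain Java. It is not how Oozie executes a workflow internally, and the class and method names are hypothetical. Each action maps to the node reached on success and the node reached on error, and the walk stops at a node that is not an action: the end node or the kill node.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class WorkflowWalk {

    // One action node: the node to go to on <ok> and the node to go to on <error>.
    static final class Transitions {
        final String ok;
        final String error;

        Transitions(String ok, String error) {
            this.ok = ok;
            this.error = error;
        }
    }

    // Mirrors the example: start -> xxx -> yyy -> end, with any error going to the kill node "fail".
    static Map<String, Transitions> exampleWorkflow() {
        Map<String, Transitions> actions = new LinkedHashMap<>();
        actions.put("xxx", new Transitions("yyy", "fail"));
        actions.put("yyy", new Transitions("end", "fail"));
        return actions;
    }

    // Follows transitions from startNode, treating actions in failingActions as failed,
    // until a node that is not an action (the end node or a kill node) is reached.
    static List<String> walk(Map<String, Transitions> actions, String startNode, Set<String> failingActions) {
        List<String> path = new ArrayList<>();
        String node = startNode;
        while (actions.containsKey(node)) {
            if (path.contains(node)) {
                throw new IllegalStateException("Cycle detected at " + node);
            }
            path.add(node);
            Transitions t = actions.get(node);
            node = failingActions.contains(node) ? t.error : t.ok;
        }
        path.add(node); // the terminal node: "end" or the kill node "fail"
        return path;
    }
}
```

For instance, walking from xxx with yyy marked as failed yields xxx, yyy, fail: the job ends in the kill node, which reports the error message defined there.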
Some preparation is required before you can use it:<br/>
Upload hive-site.xml to HDFS with the following property added:
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #1e90ff; font-weight: bold"><property></span>
<span style="color: #1e90ff; font-weight: bold"><name></span>hive.exec.scratchdir<span style="color: #1e90ff; font-weight: bold"></name></span>
<span style="color: #1e90ff; font-weight: bold"><value></span>/user/cloudera/data/tmp<span style="color: #1e90ff; font-weight: bold"></value></span>
<span style="color: #1e90ff; font-weight: bold"></property></span>
</pre></div>
<br/>
Hive uses temporary folders both on the machine running the Hive client and on the default HDFS instance. These folders store per-query temporary/intermediate data sets and are normally cleaned up by the Hive client when the query finishes. However, if the Hive client terminates abnormally, some data may be left behind. The configuration details are as follows:<br/>
On the HDFS cluster, the scratch directory defaults to /tmp/hive-<username> and is controlled by the configuration variable hive.exec.scratchdir.<br/>
On the client machine, it is hardcoded to /tmp/<username>, which is a source of <b>permission issues</b>.
<br/>
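Before uploading hive-site.xml, it can be handy to check which HDFS scratch directory Hive will end up with. The following is a minimal sketch using only the JDK's DOM parser: it reads hive.exec.scratchdir if present and otherwise falls back to the /tmp/hive-USERNAME default quoted above. The class and method names are hypothetical, and the fallback is an assumption that matches the Hive version described here; newer Hive releases may use a different default.

```java
import java.io.StringReader;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

public class HiveScratchDir {

    // Returns the value of the named property in a Hadoop-style configuration XML, or null if absent.
    static String propertyValue(String configXml, String propertyName) {
        try {
            Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder()
                    .parse(new InputSource(new StringReader(configXml)));
            NodeList properties = doc.getElementsByTagName("property");
            for (int i = 0; i < properties.getLength(); i++) {
                Element property = (Element) properties.item(i);
                NodeList names = property.getElementsByTagName("name");
                NodeList values = property.getElementsByTagName("value");
                if (names.getLength() > 0 && values.getLength() > 0
                        && names.item(0).getTextContent().trim().equals(propertyName)) {
                    return values.item(0).getTextContent().trim();
                }
            }
            return null;
        } catch (Exception e) {
            throw new IllegalArgumentException("Cannot parse configuration XML", e);
        }
    }

    // HDFS scratch directory Hive would use: the configured value, else the assumed /tmp/hive-<user> default.
    static String effectiveScratchDir(String hiveSiteXml, String userName) {
        String configured = propertyValue(hiveSiteXml, "hive.exec.scratchdir");
        return configured != null ? configured : "/tmp/hive-" + userName;
    }
}
```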
After that, a properties file is required:<br/>
<!-- HTML generated using hilite.me --><div style="background: #ffffff; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">nameNode=hdfs://localhost.localdomain:8020
jobTracker=localhost.localdomain:8021
user.name=cloudera
base_url=${nameNode}/user/${user.name}
oozie.use.system.libpath=true
oozie.libpath=/user/oozie/share/lib/hive
hiveSiteXml=/user/cloudera/hive-site.xml
oozie.wf.application.path=${base_url}/start.dir/workflow.xml
hiveSchema=your_db
</pre></div>
<br/>
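The properties above reference each other through ${...} placeholders: base_url uses nameNode and user.name, and oozie.wf.application.path uses base_url. Oozie performs this substitution itself, so the following is only a hypothetical plain-Java illustration (the class and method names are mine) of how those chained variables expand. Anything it cannot resolve, such as an EL function like wf:id(), is left untouched.

```java
import java.util.Properties;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class JobPropertiesResolver {

    // Matches ${name} placeholders.
    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{([^}]+)\\}");
    private static final int MAX_DEPTH = 10;

    // Returns the value of key with all resolvable placeholders expanded recursively.
    static String resolve(Properties props, String key) {
        return resolveValue(props, props.getProperty(key), 0);
    }

    private static String resolveValue(Properties props, String value, int depth) {
        if (value == null) {
            return null;
        }
        if (depth > MAX_DEPTH) {
            throw new IllegalStateException("Placeholder nesting too deep (cycle?): " + value);
        }
        Matcher m = PLACEHOLDER.matcher(value);
        StringBuffer out = new StringBuffer();
        while (m.find()) {
            String referenced = props.getProperty(m.group(1));
            // Unknown names are kept verbatim, e.g. EL functions evaluated later by Oozie.
            String replacement = referenced == null ? m.group(0) : resolveValue(props, referenced, depth + 1);
            m.appendReplacement(out, Matcher.quoteReplacement(replacement));
        }
        m.appendTail(out);
        return out.toString();
    }
}
```

With the values from the file above, oozie.wf.application.path expands to hdfs://localhost.localdomain:8020/user/cloudera/start.dir/workflow.xml, which is where workflow.xml has to be uploaded.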
Put the workflow at the path specified by <i>oozie.wf.application.path</i>. A <i>lib</i> directory may also be created next to workflow.xml to hold any jars the workflow needs (for example, a custom Hive UDF). <br/>
The final step is to run the job on the Oozie server, which can be done with the following command (assuming the properties file is on your local file system):<br/>
<!-- HTML generated using hilite.me --><div style="background: #eeeedd; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">oozie job -oozie http://localhost:11000/oozie -config oozie.conf.properties -run
</pre></div>
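The oozie command-line tool talks to the Oozie server over HTTP. If you would rather submit from Java without the oozie-client jar, a rough sketch with plain HttpURLConnection might look like this. It assumes the Oozie Web Services API endpoint POST /v1/jobs?action=start, which, as I understand the Oozie documentation, accepts a Hadoop configuration XML (including user.name) and responds with a JSON body containing the job id. The class and method names are hypothetical, and the network call is untested here.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.nio.charset.StandardCharsets;
import java.util.Map;
import java.util.TreeMap;

public class OozieRestSubmit {

    // Escapes XML special characters so property values survive inside the payload.
    static String xmlEscape(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    // Builds a Hadoop-style configuration document from the job properties (sorted for stable output).
    static String toConfigurationXml(Map<String, String> props) {
        StringBuilder sb = new StringBuilder("<configuration>");
        for (Map.Entry<String, String> e : new TreeMap<>(props).entrySet()) {
            sb.append("<property><name>").append(xmlEscape(e.getKey()))
              .append("</name><value>").append(xmlEscape(e.getValue()))
              .append("</value></property>");
        }
        return sb.append("</configuration>").toString();
    }

    // Submits and starts a workflow job; returns the raw response body (expected to hold the job id).
    static String submitAndStart(String oozieUrl, Map<String, String> props) throws IOException {
        HttpURLConnection conn = (HttpURLConnection)
                URI.create(oozieUrl + "/v1/jobs?action=start").toURL().openConnection();
        conn.setRequestMethod("POST");
        conn.setDoOutput(true);
        conn.setRequestProperty("Content-Type", "application/xml;charset=UTF-8");
        try (OutputStream out = conn.getOutputStream()) {
            out.write(toConfigurationXml(props).getBytes(StandardCharsets.UTF_8));
        }
        try (InputStream in = conn.getInputStream()) {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            byte[] chunk = new byte[4096];
            for (int n; (n = in.read(chunk)) != -1; ) {
                buf.write(chunk, 0, n);
            }
            return new String(buf.toByteArray(), StandardCharsets.UTF_8);
        }
    }
}
```

Usage would be along the lines of submitAndStart("http://localhost:11000/oozie", props), where props holds the same key/value pairs as the properties file above.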
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com1tag:blogger.com,1999:blog-8382000407737271014.post-32207538329483455152014-04-01T16:47:00.000+03:002014-04-18T14:18:29.237+03:00Spark on HDP2Here is my first experience with <b>Apache Spark, running it on Hadoop</b>.
I ran into several issues while running my piece of code.<br />
To be honest, I started with the Cloudera CDH5 distribution: they promised <b>Spark</b> was already included and would be simple to use. No luck in practice: it didn't work at all, not even on a local machine with their spark-cloudera jar. I didn't want to waste more time, so I just downloaded a Spark distribution and ran it on <b>HDP2</b>.<br />
First of all, let's start <b>Spark in standalone mode</b>, following the documentation:
<!-- HTML generated using hilite.me --><br />
<div style="background: #f0f0f0; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #60a0b0; font-style: italic;"># start master</span>
./sbin/start-master.sh
<span style="color: #60a0b0; font-style: italic;"># pick up spark://IP:PORT from the log output</span>
<span style="color: #60a0b0; font-style: italic;"># and then run a worker on each node</span>
./bin/spark-class org.apache.spark.deploy.worker.Worker spark://IP:PORT
<span style="color: #60a0b0; font-style: italic;"># more documentation available here https://spark.apache.org/docs/0.9.0/spark-standalone.html</span>
</pre>
</div>
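If you script the worker startup, the master URL has to be extracted from the master's log. A minimal plain-Java sketch of that step follows; the regular expression and the sample log line used in the test are assumptions rather than the exact Spark log format, and the class name is hypothetical. 7077 is the standalone master's default port.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SparkMasterUrl {

    // spark://host:port, where host is a hostname or an IPv4 address.
    private static final Pattern MASTER_URL = Pattern.compile("spark://[A-Za-z0-9.\\-]+:\\d+");

    // Returns the first spark:// URL found in the given log text, if any.
    static Optional<String> findMasterUrl(String logText) {
        Matcher m = MASTER_URL.matcher(logText);
        return m.find() ? Optional.of(m.group()) : Optional.empty();
    }
}
```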
<br />
After that I wrote a bit of Scala code that simply counts the lines containing two hardcoded words:<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #f0f0f0; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #007020; font-weight: bold;">package</span> <span style="color: #0e84b5; font-weight: bold;">experiment</span>
<span style="color: #007020; font-weight: bold;">import</span> <span style="color: #0e84b5; font-weight: bold;">org.apache.spark.</span><span style="color: #666666;">{</span><span style="color: #0e84b5; font-weight: bold;">SparkConf</span><span style="color: #666666;">,</span> <span style="color: #0e84b5; font-weight: bold;">SparkContext</span><span style="color: #666666;">}</span>
<span style="color: #007020; font-weight: bold;">object</span> <span style="color: #0e84b5; font-weight: bold;">SimpleApp</span> <span style="color: #666666;">{</span>
<span style="color: #007020; font-weight: bold;">def</span> main<span style="color: #666666;">(</span>args<span style="color: #007020; font-weight: bold;">:</span> <span style="color: #902000;">Array</span><span style="color: #666666;">[</span><span style="color: #902000;">String</span><span style="color: #666666;">])</span> <span style="color: #666666;">{</span>
<span style="color: #007020; font-weight: bold;">val</span> logFile <span style="color: #007020; font-weight: bold;">=</span> args<span style="color: #666666;">(</span><span style="color: #40a070;">0</span><span style="color: #666666;">)</span>
<span style="color: #007020; font-weight: bold;">val</span> conf <span style="color: #007020; font-weight: bold;">=</span> <span style="color: #007020; font-weight: bold;">new</span> <span style="color: #0e84b5; font-weight: bold;">SparkConf</span><span style="color: #666666;">()</span>
<span style="color: #666666;">.</span>setMaster<span style="color: #666666;">(</span><span style="color: #4070a0;">"local"</span><span style="color: #666666;">)</span>
<span style="color: #666666;">.</span>setAppName<span style="color: #666666;">(</span><span style="color: #4070a0;">"My Spark application"</span><span style="color: #666666;">)</span>
<span style="color: #666666;">.</span>set<span style="color: #666666;">(</span><span style="color: #4070a0;">"spark.executor.memory"</span><span style="color: #666666;">,</span> <span style="color: #4070a0;">"1g"</span><span style="color: #666666;">)</span>
<span style="color: #007020; font-weight: bold;">val</span> sc <span style="color: #007020; font-weight: bold;">=</span> <span style="color: #007020; font-weight: bold;">new</span> <span style="color: #0e84b5; font-weight: bold;">SparkContext</span><span style="color: #666666;">(</span>conf<span style="color: #666666;">)</span>
<span style="color: #60a0b0; font-style: italic;">// hdfs:///user/hue/input.txt</span>
<span style="color: #007020; font-weight: bold;">val</span> logData <span style="color: #007020; font-weight: bold;">=</span> sc<span style="color: #666666;">.</span>textFile<span style="color: #666666;">(</span>logFile<span style="color: #666666;">,</span> <span style="color: #40a070;">2</span><span style="color: #666666;">).</span>cache<span style="color: #666666;">()</span>
<span style="color: #007020; font-weight: bold;">val</span> numAs <span style="color: #007020; font-weight: bold;">=</span> logData<span style="color: #666666;">.</span>filter<span style="color: #666666;">(</span>line <span style="color: #007020; font-weight: bold;">=></span> line<span style="color: #666666;">.</span>contains<span style="color: #666666;">(</span><span style="color: #4070a0;">"London"</span><span style="color: #666666;">)).</span>count<span style="color: #666666;">()</span>
<span style="color: #007020; font-weight: bold;">val</span> numBs <span style="color: #007020; font-weight: bold;">=</span> logData<span style="color: #666666;">.</span>filter<span style="color: #666666;">(</span>line <span style="color: #007020; font-weight: bold;">=></span> line<span style="color: #666666;">.</span>contains<span style="color: #666666;">(</span><span style="color: #4070a0;">"Lviv"</span><span style="color: #666666;">)).</span>count<span style="color: #666666;">()</span>
println<span style="color: #666666;">(</span><span style="color: #4070a0;">"Lines with London: %s, Lines with Lviv: %s"</span><span style="color: #666666;">.</span>format<span style="color: #666666;">(</span>numAs<span style="color: #666666;">,</span> numBs<span style="color: #666666;">))</span>
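sc<span style="color: #666666;">.</span>stop<span style="color: #666666;">()</span>
<span style="color: #666666;">}</span>
<span style="color: #666666;">}</span>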
</pre>
</div>
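Note that filter(line => line.contains("London")).count() counts matching lines, not individual occurrences of the word. To sanity-check Spark's numbers on a small local sample, the same logic fits in a few lines of plain Java streams; this is a hypothetical helper that reads nothing from HDFS.

```java
import java.util.List;

public class CityLineCount {

    // Plain-Java equivalent of logData.filter(line => line.contains(word)).count():
    // a line mentioning the word twice is still counted once.
    static long countLinesContaining(List<String> lines, String word) {
        return lines.stream().filter(line -> line.contains(word)).count();
    }
}
```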
<br />
<br />
<a name='more'></a><br /><br />
<br />
That was the easy part! After that I spent a couple of hours getting a correct <b>Maven</b> build; the result is:<br />
<!-- HTML generated using hilite.me --><div style="background: #f0f0f0; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #007020"><?xml version="1.0" encoding="UTF-8"?></span>
<span style="color: #062873; font-weight: bold"><project</span> <span style="color: #4070a0">xmlns="http://maven.apache.org/POM/4.0.0"</span>
<span style="color: #4070a0">xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"</span>
<span style="color: #4070a0">xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd"</span><span style="color: #062873; font-weight: bold">></span>
<span style="color: #062873; font-weight: bold"><modelVersion></span>4.0.0<span style="color: #062873; font-weight: bold"></modelVersion></span>
<span style="color: #062873; font-weight: bold"><groupId></span>SparkBegining<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>SparkBegining<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><packaging></span>jar<span style="color: #062873; font-weight: bold"></packaging></span>
<span style="color: #062873; font-weight: bold"><version></span>1.0-SNAPSHOT<span style="color: #062873; font-weight: bold"></version></span>
<span style="color: #062873; font-weight: bold"><properties></span>
<span style="color: #062873; font-weight: bold"><scala.version></span>2.10.0<span style="color: #062873; font-weight: bold"></scala.version></span>
<span style="color: #062873; font-weight: bold"></properties></span>
<span style="color: #062873; font-weight: bold"><repositories></span>
<span style="color: #062873; font-weight: bold"><repository></span>
<span style="color: #062873; font-weight: bold"><id></span>Akka repository<span style="color: #062873; font-weight: bold"></id></span>
<span style="color: #062873; font-weight: bold"><url></span>http://repo.akka.io/releases<span style="color: #062873; font-weight: bold"></url></span>
<span style="color: #062873; font-weight: bold"></repository></span>
<span style="color: #062873; font-weight: bold"><repository></span>
<span style="color: #062873; font-weight: bold"><id></span>scala<span style="color: #062873; font-weight: bold"></id></span>
<span style="color: #062873; font-weight: bold"><name></span>Scala Tools<span style="color: #062873; font-weight: bold"></name></span>
<span style="color: #062873; font-weight: bold"><url></span>http://scala-tools.org/repo-releases/<span style="color: #062873; font-weight: bold"></url></span>
<span style="color: #062873; font-weight: bold"><releases></span>
<span style="color: #062873; font-weight: bold"><enabled></span>true<span style="color: #062873; font-weight: bold"></enabled></span>
<span style="color: #062873; font-weight: bold"></releases></span>
<span style="color: #062873; font-weight: bold"><snapshots></span>
<span style="color: #062873; font-weight: bold"><enabled></span>false<span style="color: #062873; font-weight: bold"></enabled></span>
<span style="color: #062873; font-weight: bold"></snapshots></span>
<span style="color: #062873; font-weight: bold"></repository></span>
<span style="color: #062873; font-weight: bold"></repositories></span>
<span style="color: #062873; font-weight: bold"><pluginRepositories></span>
<span style="color: #062873; font-weight: bold"><pluginRepository></span>
<span style="color: #062873; font-weight: bold"><id></span>scala-tools.org<span style="color: #062873; font-weight: bold"></id></span>
<span style="color: #062873; font-weight: bold"><name></span>Scala-Tools Maven2 Repository<span style="color: #062873; font-weight: bold"></name></span>
<span style="color: #062873; font-weight: bold"><url></span>http://scala-tools.org/repo-releases<span style="color: #062873; font-weight: bold"></url></span>
<span style="color: #062873; font-weight: bold"></pluginRepository></span>
<span style="color: #062873; font-weight: bold"></pluginRepositories></span>
<span style="color: #062873; font-weight: bold"><dependencies></span>
<span style="color: #062873; font-weight: bold"><dependency></span>
<span style="color: #062873; font-weight: bold"><groupId></span>org.scala-lang<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>scala-library<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><version></span>${scala.version}<span style="color: #062873; font-weight: bold"></version></span>
<span style="color: #062873; font-weight: bold"></dependency></span>
<span style="color: #062873; font-weight: bold"><dependency></span>
<span style="color: #062873; font-weight: bold"><groupId></span>org.apache.spark<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>spark-core_2.10<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><version></span>0.9.0-incubating<span style="color: #062873; font-weight: bold"></version></span>
<span style="color: #062873; font-weight: bold"><exclusions></span>
<span style="color: #062873; font-weight: bold"><exclusion></span>
<span style="color: #062873; font-weight: bold"><groupId></span>com.google.protobuf<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>protobuf-java<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"></exclusion></span>
<span style="color: #062873; font-weight: bold"></exclusions></span>
<span style="color: #062873; font-weight: bold"></dependency></span>
<span style="color: #062873; font-weight: bold"><dependency></span>
<span style="color: #062873; font-weight: bold"><groupId></span>com.google.protobuf<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>protobuf-java<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><version></span>2.5.0<span style="color: #062873; font-weight: bold"></version></span>
<span style="color: #062873; font-weight: bold"></dependency></span>
<span style="color: #062873; font-weight: bold"><dependency></span>
<span style="color: #062873; font-weight: bold"><groupId></span>org.apache.hadoop<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>hadoop-client<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><version></span>2.2.0<span style="color: #062873; font-weight: bold"></version></span>
<span style="color: #062873; font-weight: bold"></dependency></span>
<span style="color: #062873; font-weight: bold"><dependency></span>
<span style="color: #062873; font-weight: bold"><groupId></span>junit<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>junit<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><version></span>4.4<span style="color: #062873; font-weight: bold"></version></span>
<span style="color: #062873; font-weight: bold"><scope></span>test<span style="color: #062873; font-weight: bold"></scope></span>
<span style="color: #062873; font-weight: bold"></dependency></span>
<span style="color: #062873; font-weight: bold"><dependency></span>
<span style="color: #062873; font-weight: bold"><groupId></span>org.specs<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>specs<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><version></span>1.2.5<span style="color: #062873; font-weight: bold"></version></span>
<span style="color: #062873; font-weight: bold"><scope></span>test<span style="color: #062873; font-weight: bold"></scope></span>
<span style="color: #062873; font-weight: bold"></dependency></span>
<span style="color: #062873; font-weight: bold"></dependencies></span>
<span style="color: #062873; font-weight: bold"><build></span>
<span style="color: #062873; font-weight: bold"><sourceDirectory></span>src/main/scala<span style="color: #062873; font-weight: bold"></sourceDirectory></span>
<span style="color: #062873; font-weight: bold"><testSourceDirectory></span>src/test/scala<span style="color: #062873; font-weight: bold"></testSourceDirectory></span>
<span style="color: #062873; font-weight: bold"><plugins></span>
<span style="color: #062873; font-weight: bold"><plugin></span>
<span style="color: #062873; font-weight: bold"><groupId></span>org.scala-tools<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>maven-scala-plugin<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><executions></span>
<span style="color: #062873; font-weight: bold"><execution></span>
<span style="color: #062873; font-weight: bold"><goals></span>
<span style="color: #062873; font-weight: bold"><goal></span>compile<span style="color: #062873; font-weight: bold"></goal></span>
<span style="color: #062873; font-weight: bold"><goal></span>testCompile<span style="color: #062873; font-weight: bold"></goal></span>
<span style="color: #062873; font-weight: bold"></goals></span>
<span style="color: #062873; font-weight: bold"></execution></span>
<span style="color: #062873; font-weight: bold"></executions></span>
<span style="color: #062873; font-weight: bold"><configuration></span>
<span style="color: #062873; font-weight: bold"><scalaVersion></span>${scala.version}<span style="color: #062873; font-weight: bold"></scalaVersion></span>
<span style="color: #062873; font-weight: bold"><args></span>
<span style="color: #062873; font-weight: bold"><arg></span>-target:jvm-1.5<span style="color: #062873; font-weight: bold"></arg></span>
<span style="color: #062873; font-weight: bold"></args></span>
<span style="color: #062873; font-weight: bold"></configuration></span>
<span style="color: #062873; font-weight: bold"></plugin></span>
<span style="color: #062873; font-weight: bold"><plugin></span>
<span style="color: #062873; font-weight: bold"><groupId></span>org.apache.maven.plugins<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>maven-eclipse-plugin<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><configuration></span>
<span style="color: #062873; font-weight: bold"><downloadSources></span>true<span style="color: #062873; font-weight: bold"></downloadSources></span>
<span style="color: #062873; font-weight: bold"><buildcommands></span>
<span style="color: #062873; font-weight: bold"><buildcommand></span>ch.epfl.lamp.sdt.core.scalabuilder<span style="color: #062873; font-weight: bold"></buildcommand></span>
<span style="color: #062873; font-weight: bold"></buildcommands></span>
<span style="color: #062873; font-weight: bold"><additionalProjectnatures></span>
<span style="color: #062873; font-weight: bold"><projectnature></span>ch.epfl.lamp.sdt.core.scalanature<span style="color: #062873; font-weight: bold"></projectnature></span>
<span style="color: #062873; font-weight: bold"></additionalProjectnatures></span>
<span style="color: #062873; font-weight: bold"><classpathContainers></span>
<span style="color: #062873; font-weight: bold"><classpathContainer></span>org.eclipse.jdt.launching.JRE_CONTAINER<span style="color: #062873; font-weight: bold"></classpathContainer></span>
<span style="color: #062873; font-weight: bold"><classpathContainer></span>ch.epfl.lamp.sdt.launching.SCALA_CONTAINER<span style="color: #062873; font-weight: bold"></classpathContainer></span>
<span style="color: #062873; font-weight: bold"></classpathContainers></span>
<span style="color: #062873; font-weight: bold"></configuration></span>
<span style="color: #062873; font-weight: bold"></plugin></span>
<span style="color: #062873; font-weight: bold"><plugin></span>
<span style="color: #062873; font-weight: bold"><groupId></span>org.apache.maven.plugins<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>maven-shade-plugin<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><version></span>1.5<span style="color: #062873; font-weight: bold"></version></span>
<span style="color: #062873; font-weight: bold"><executions></span>
<span style="color: #062873; font-weight: bold"><execution></span>
<span style="color: #062873; font-weight: bold"><phase></span>package<span style="color: #062873; font-weight: bold"></phase></span>
<span style="color: #062873; font-weight: bold"><goals></span>
<span style="color: #062873; font-weight: bold"><goal></span>shade<span style="color: #062873; font-weight: bold"></goal></span>
<span style="color: #062873; font-weight: bold"></goals></span>
<span style="color: #062873; font-weight: bold"><configuration></span>
<span style="color: #062873; font-weight: bold"><shadedArtifactAttached></span>true<span style="color: #062873; font-weight: bold"></shadedArtifactAttached></span>
<span style="color: #062873; font-weight: bold"><artifactSet></span>
<span style="color: #062873; font-weight: bold"><includes></span>
<span style="color: #062873; font-weight: bold"><include></span>*:*<span style="color: #062873; font-weight: bold"></include></span>
<span style="color: #062873; font-weight: bold"></includes></span>
<span style="color: #062873; font-weight: bold"></artifactSet></span>
<span style="color: #062873; font-weight: bold"><filters></span>
<span style="color: #062873; font-weight: bold"><filter></span>
<span style="color: #062873; font-weight: bold"><artifact></span>*:*<span style="color: #062873; font-weight: bold"></artifact></span>
<span style="color: #60a0b0; font-style: italic"><!-- it's required to overcome the 'Invalid signature file digest' exception --></span>
<span style="color: #062873; font-weight: bold"><excludes></span>
<span style="color: #062873; font-weight: bold"><exclude></span>META-INF/*.SF<span style="color: #062873; font-weight: bold"></exclude></span>
<span style="color: #062873; font-weight: bold"><exclude></span>META-INF/*.DSA<span style="color: #062873; font-weight: bold"></exclude></span>
<span style="color: #062873; font-weight: bold"><exclude></span>META-INF/*.RSA<span style="color: #062873; font-weight: bold"></exclude></span>
<span style="color: #062873; font-weight: bold"></excludes></span>
<span style="color: #062873; font-weight: bold"></filter></span>
<span style="color: #062873; font-weight: bold"></filters></span>
<span style="color: #062873; font-weight: bold"><transformers></span>
<span style="color: #60a0b0; font-style: italic"><!-- it's required to overcome the 'akka.version' exception (it merges the Akka default configuration files) --></span>
<span style="color: #062873; font-weight: bold"><transformer</span>
<span style="color: #4070a0">implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer"</span><span style="color: #062873; font-weight: bold">></span>
<span style="color: #062873; font-weight: bold"><resource></span>reference.conf<span style="color: #062873; font-weight: bold"></resource></span>
<span style="color: #062873; font-weight: bold"></transformer></span>
<span style="color: #062873; font-weight: bold"><transformer</span>
<span style="color: #4070a0">implementation="org.apache.maven.plugins.shade.resource.ManifestResourceTransformer"</span><span style="color: #062873; font-weight: bold">></span>
<span style="color: #062873; font-weight: bold"><manifestEntries></span>
<span style="color: #062873; font-weight: bold"><Main-Class></span>experiment.SimpleApp<span style="color: #062873; font-weight: bold"></Main-Class></span>
<span style="color: #062873; font-weight: bold"></manifestEntries></span>
<span style="color: #062873; font-weight: bold"></transformer></span>
<span style="color: #60a0b0; font-style: italic"><!-- and it's required to register the handler for the 'hdfs' filesystem --></span>
<span style="color: #062873; font-weight: bold"><transformer</span> <span style="color: #4070a0">implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"</span><span style="color: #062873; font-weight: bold">/></span>
<span style="color: #062873; font-weight: bold"><transformer</span> <span style="color: #4070a0">implementation="org.apache.maven.plugins.shade.resource.AppendingTransformer"</span><span style="color: #062873; font-weight: bold">></span>
<span style="color: #062873; font-weight: bold"><resource></span>META-INF/services/org.apache.hadoop.fs.FileSystem<span style="color: #062873; font-weight: bold"></resource></span>
<span style="color: #062873; font-weight: bold"></transformer></span>
<span style="color: #062873; font-weight: bold"></transformers></span>
<span style="color: #062873; font-weight: bold"></configuration></span>
<span style="color: #062873; font-weight: bold"></execution></span>
<span style="color: #062873; font-weight: bold"></executions></span>
<span style="color: #062873; font-weight: bold"></plugin></span>
<span style="color: #062873; font-weight: bold"></plugins></span>
<span style="color: #062873; font-weight: bold"></build></span>
<span style="color: #062873; font-weight: bold"><reporting></span>
<span style="color: #062873; font-weight: bold"><plugins></span>
<span style="color: #062873; font-weight: bold"><plugin></span>
<span style="color: #062873; font-weight: bold"><groupId></span>org.scala-tools<span style="color: #062873; font-weight: bold"></groupId></span>
<span style="color: #062873; font-weight: bold"><artifactId></span>maven-scala-plugin<span style="color: #062873; font-weight: bold"></artifactId></span>
<span style="color: #062873; font-weight: bold"><configuration></span>
<span style="color: #062873; font-weight: bold"><scalaVersion></span>${scala.version}<span style="color: #062873; font-weight: bold"></scalaVersion></span>
<span style="color: #062873; font-weight: bold"></configuration></span>
<span style="color: #062873; font-weight: bold"></plugin></span>
<span style="color: #062873; font-weight: bold"></plugins></span>
<span style="color: #062873; font-weight: bold"></reporting></span>
<span style="color: #062873; font-weight: bold"></project></span>
</pre></div>
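The signature-file filter matters because a signed dependency jar carries META-INF/*.SF, *.DSA and *.RSA files; once its classes are merged into the shaded jar, those signatures no longer match and the JVM rejects the jar. To check which dependency jars carry such files, a small JDK-only sketch like the following can list them. The class name SignatureFileFinder is hypothetical, and it assumes, as the patterns suggest, that only files directly inside META-INF should match:

```java
import java.io.IOException;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.stream.Collectors;

final class SignatureFileFinder {

    private static final String META_INF = "META-INF/";

    // Lists the signature entries (*.SF, *.DSA, *.RSA directly inside META-INF)
    // of a jar, i.e. the files the shade plugin filter above is meant to drop.
    static List<String> signatureEntries(String jarPath) throws IOException {
        // verify = false: we only read entry names, no signature checking needed
        try (JarFile jar = new JarFile(jarPath, false)) {
            return jar.stream()
                    .map(JarEntry::getName)
                    .filter(SignatureFileFinder::isSignatureFile)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    static boolean isSignatureFile(String name) {
        if (!name.startsWith(META_INF)) {
            return false;
        }
        String rest = name.substring(META_INF.length());
        // assumption: "*" in META-INF/*.SF does not match nested directories
        if (rest.contains("/")) {
            return false;
        }
        return rest.endsWith(".SF") || rest.endsWith(".DSA") || rest.endsWith(".RSA");
    }

    public static void main(String[] args) throws IOException {
        // usage: java SignatureFileFinder some-dependency.jar other.jar ...
        for (String path : args) {
            System.out.println(path + " -> " + signatureEntries(path));
        }
    }
}
```

Running it over the jars in the local Maven repository that end up in the shaded jar shows which of them are signed and therefore need the filter.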
<br />
<br />
You may have noticed that I excluded protobuf from the Spark dependency and added a different version explicitly. The reason:<br />
I got this error:<br />
<blockquote class="tr_bq">
<i>Exception in thread "main" java.lang.VerifyError:
class org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$AppendRequestProto overrides final method
getUnknownFields.()Lcom/google/protobuf/UnknownFieldSet;<br />
at java.lang.ClassLoader.defineClass1(Native Method)</i></blockquote>
<br />
I checked my HDP installation (<i>find / -name "protobuf*.jar"</i>) and found that my Hadoop uses protobuf 2.5.0, while the Spark jar transitively pulled in 2.4.1. This is easy to discover with the Maven command <i>mvn dependency:tree -Dincludes=*protobuf*</i>.<br />
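When two versions of a library end up on the classpath, as with the two protobuf versions here, it helps to ask the JVM which jar a class was actually loaded from. A minimal JDK-only sketch (ClassOrigin is a hypothetical helper name), meant to be run with the same classpath as the failing job:

```java
import java.net.URL;
import java.security.CodeSource;

final class ClassOrigin {

    // Returns the jar or directory the given class was loaded from, or null when
    // there is no location (e.g. JDK classes from the bootstrap class loader).
    static URL locationOf(String className) throws ClassNotFoundException {
        // initialize = false: find the class without running its static initializers
        Class<?> cls = Class.forName(className, false, ClassOrigin.class.getClassLoader());
        CodeSource source = cls.getProtectionDomain().getCodeSource();
        return source == null ? null : source.getLocation();
    }

    public static void main(String[] args) throws ClassNotFoundException {
        // usage (hypothetical jar name):
        // java -cp my-job.jar:. ClassOrigin com.google.protobuf.UnknownFieldSet
        for (String name : args) {
            System.out.println(name + " <- " + locationOf(name));
        }
    }
}
```

This is most useful when the classpath is assembled from several jars (for example in the IDE or with the cluster's Hadoop jars); inside a single shaded jar it will simply report that jar, and mvn dependency:tree remains the better tool for finding which dependency pulled a version in.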
<br />
After that, I was finally able to run the Spark job. Hurray:<br />
<!-- HTML generated using hilite.me --><br />
<div style="background: #f0f0f0; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #888888;">java -jar SparkBegining-1.0-SNAPSHOT-shaded.jar hdfs://10.25.9.155:8020/user/hue/input.txt</span>
</pre>
</div>Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com1tag:blogger.com,1999:blog-8382000407737271014.post-83101143723454400112014-03-20T11:18:00.000+02:002014-03-20T11:18:10.040+02:00XQuery on HadoopJava is the native language of most Hadoop engineers. In recent years Python has become popular, and R is used by data scientists on Hadoop. Pig Latin and HiveQL are the de facto mainstream languages for Hadoop nowadays. Oracle decided not to stop there and made it possible to write MapReduce jobs in XQuery! Unbelievable; XML fans must be happy :)<br/><br/>
Let's review a simple example.<br/><br/>
First, download the <a href="http://www.oracle.com/technetwork/database/bigdata-appliance/oracle-bigdatalite-2104726.html">Oracle BigData Lite VM</a> (it is free, but takes 25 GB of disk space). <br/><br/>
After installation, create a test dataset. I put two files with sample website access records into the HDFS directory /user/oracle/xquery/input. Example content:<br/>
2013-10-28T06:00:00, chrome, index.html, 200<br/>
2013-10-28T08:30:02, firefox, index.html, 200<br/>
2013-10-28T08:32:50, ie9, about.html, 200<br/>
<br/>
Next, create an XQuery script (my_xquery.xq) that processes the data (a simple count of visits grouped by date):<br/><br/>
<!-- HTML generated using hilite.me --><div style="background: #111111; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #fb660a">import</span> <span style="color: #fb660a">module</span> <span style="color: #0086d2">"oxh:text"</span><span style="color: #ffffff">;</span>
<span style="color: #fb660a; font-weight: bold">for</span> <span style="color: #fb660a">$line</span> <span style="color: #ffffff">in</span> <span style="color: #ff0086; font-weight: bold">text:collection</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"/user/oracle/xquery/input/*.txt"</span><span style="color: #ffffff">)</span>
<span style="color: #fb660a; font-weight: bold">let</span> <span style="color: #fb660a">$split</span> <span style="color: #ffffff">:=</span> <span style="color: #ff0086; font-weight: bold">fn:tokenize</span><span style="color: #ffffff">(</span><span style="color: #fb660a">$line</span><span style="color: #ffffff">,</span> <span style="color: #0086d2">"\s*,\s*"</span><span style="color: #ffffff">)</span>
<span style="color: #fb660a; font-weight: bold">let</span> <span style="color: #fb660a">$time</span> <span style="color: #ffffff">:=</span> <span style="color: #ff0086; font-weight: bold">xs:dateTime</span><span style="color: #ffffff">(</span><span style="color: #fb660a">$split</span><span style="color: #ffffff">[</span><span style="color: #0086f7; font-weight: bold">1</span><span style="color: #ffffff">])</span>
<span style="color: #fb660a; font-weight: bold">let</span> <span style="color: #fb660a">$day</span> <span style="color: #ffffff">:=</span> <span style="color: #ff0086; font-weight: bold">xs:date</span><span style="color: #ffffff">(</span><span style="color: #fb660a">$time</span><span style="color: #ffffff">)</span>
<span style="color: #ffffff">group</span> <span style="color: #ffffff">by</span> <span style="color: #fb660a">$day</span>
<span style="color: #fb660a; font-weight: bold">return</span> <span style="color: #ff0086; font-weight: bold">text:put</span><span style="color: #ffffff">(</span><span style="color: #fb660a">$day</span> <span style="color: #ffffff">||</span> <span style="color: #0086d2">", "</span> <span style="color: #ffffff">||</span> <span style="color: #ff0086; font-weight: bold">fn:count</span><span style="color: #ffffff">(</span><span style="color: #fb660a">$line</span><span style="color: #ffffff">))</span>
</pre></div>
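To make the grouping in the script concrete, here is what it computes, written as plain Java over an in-memory list of lines. This is only a sketch of the query's logic, not how Oracle XQuery for Hadoop executes it (it runs the query as MapReduce); VisitsPerDay is a hypothetical name, and it assumes timestamps without a time zone, as in the sample data:

```java
import java.time.LocalDate;
import java.time.LocalDateTime;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

final class VisitsPerDay {

    // Same separator as fn:tokenize($line, "\s*,\s*") in the XQuery script
    private static final Pattern SEPARATOR = Pattern.compile("\\s*,\\s*");

    // Plain-Java equivalent of: group by $day ... fn:count($line)
    static Map<LocalDate, Long> countByDay(List<String> lines) {
        Map<LocalDate, Long> counts = new TreeMap<>(); // sorted by day
        for (String line : lines) {
            String[] fields = SEPARATOR.split(line.trim());
            // fields[0] is a timestamp without time zone, e.g. 2013-10-28T08:30:02
            LocalDate day = LocalDateTime.parse(fields[0]).toLocalDate();
            counts.merge(day, 1L, Long::sum);
        }
        return counts;
    }

    public static void main(String[] args) {
        List<String> sample = Arrays.asList(
                "2013-10-28T06:00:00, chrome, index.html, 200",
                "2013-10-28T08:30:02, firefox, index.html, 200",
                "2013-10-28T08:32:50, ie9, about.html, 200");
        // prints "2013-10-28, 3", the same format as text:put($day || ", " || ...)
        countByDay(sample).forEach((day, n) -> System.out.println(day + ", " + n));
    }
}
```

In MapReduce terms, the day is the map output key and the count is the reduce step.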
<br/><br/>
Now the script is ready to run; execute it from the command line:
<!-- HTML generated using hilite.me --><div style="background: #f8f8f8; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%">hadoop jar <span style="color: #B8860B">$OXH_HOME</span>/lib/oxh.jar my_xquery.xq -output /user/oracle/xquery/output -clean -ls
</pre></div>
<br/>
Options: <br/>
-output: the output directory<br/>
-clean: delete the output directory if it already exists<br/>
-ls: list the contents of the output directory after the run<br/>
<br/>
Here is the result:<br/>
<div class="separator" style="clear: both; text-align: center;"><a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheOEb_aW8YdE0RHdRvf39WGxDKBJ1z9-zPVTn7BKbjMu4MkShL3aZ8kugxKnqLcz4SZqIVCIZjELgYI2bx5wZg-KrVq-RxbI-KAfhNjQbDBeoEfN34nUasyXC33pQ5DBpR397iOZJ0mzdJ/s1600/oraclehadoop.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEheOEb_aW8YdE0RHdRvf39WGxDKBJ1z9-zPVTn7BKbjMu4MkShL3aZ8kugxKnqLcz4SZqIVCIZjELgYI2bx5wZg-KrVq-RxbI-KAfhNjQbDBeoEfN34nUasyXC33pQ5DBpR397iOZJ0mzdJ/s400/oraclehadoop.png" /></a></div>
<br/><br/>
That's it: the XQuery was translated into MapReduce (just as Pig Latin or HiveQL are). This functionality is part of Oracle Big Data Connectors for Hadoop; more information and examples can be <a href="http://docs.oracle.com/cd/E49465_01/doc.23/e49333/oxh.htm#BDCUG526">read here</a>. Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-34932689017456190372014-02-19T21:22:00.000+02:002014-02-19T21:30:16.575+02:00How to write good unit test for Hadoop MapReduce?It is a very common situation that a unit test (or integration test) is needed to check the functionality of a MapReduce job. This approach fits TDD perfectly; moreover, it lets you develop MapReduce jobs faster, because there is no need to redeploy the jar to a cluster each time, and debugging is easy.<br />
<br />
The first line of defence is MRUnit: a great unit-testing framework that is independent of input/output formats and can run and test map and reduce functions separately. Unfortunately, it has several significant drawbacks; for example, there is no access to MR counters, and only one Mapper is allowed in a MapReduce test.<br />
<br />
Hadoop's local execution mode can be used to overcome MRUnit's limitations or to create an integration test for a MapReduce job. Let's assume there is a runnable MapReduce tool with several input sources (mappers) and a reducer:<br />
<br />
<!-- HTML generated using hilite.me --><div style="background: #111111; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #fb660a; font-weight: bold">public</span> <span style="color: #fb660a; font-weight: bold">class</span> <span style="color: #ffffff">ExampleMrDriver</span> <span style="color: #fb660a; font-weight: bold">extends</span> <span style="color: #ffffff">Configured</span> <span style="color: #fb660a; font-weight: bold">implements</span> <span style="color: #ffffff">Tool</span> <span style="color: #ffffff">{</span>
<span style="color: #fb660a; font-weight: bold">public</span> <span style="color: #ffffff">Job</span> <span style="color: #ff0086; font-weight: bold">createMRJob</span><span style="color: #ffffff">(Configuration</span> <span style="color: #ffffff">conf)</span> <span style="color: #fb660a; font-weight: bold">throws</span> <span style="color: #ffffff">IOException</span> <span style="color: #ffffff">{...}</span>
<span style="color: #ffffff">@Override</span>
<span style="color: #fb660a; font-weight: bold">public</span> <span style="color: #cdcaa9; font-weight: bold">int</span> <span style="color: #ff0086; font-weight: bold">run</span><span style="color: #ffffff">(String[]</span> <span style="color: #ffffff">strings)</span> <span style="color: #fb660a; font-weight: bold">throws</span> <span style="color: #ffffff">Exception</span> <span style="color: #ffffff">{</span>
<span style="color: #ffffff">Configuration</span> <span style="color: #ffffff">conf</span> <span style="color: #ffffff">=</span> <span style="color: #ffffff">getConf();</span>
<span style="color: #ffffff">Job</span> <span style="color: #ffffff">job</span> <span style="color: #ffffff">=</span> <span style="color: #ffffff">createMRJob(conf);</span>
<span style="color: #fb660a; font-weight: bold">return</span> <span style="color: #ffffff">job.</span><span style="color: #ff0086; font-weight: bold">waitForCompletion</span><span style="color: #ffffff">(</span><span style="color: #fb660a; font-weight: bold">true</span><span style="color: #ffffff">)</span> <span style="color: #ffffff">?</span> <span style="color: #0086f7; font-weight: bold">0</span> <span style="color: #ffffff">:</span> <span style="color: #ffffff">-</span><span style="color: #0086f7; font-weight: bold">1</span><span style="color: #ffffff">;</span>
<span style="color: #ffffff">}</span>
<span style="color: #fb660a; font-weight: bold">public</span> <span style="color: #fb660a; font-weight: bold">static</span> <span style="color: #cdcaa9; font-weight: bold">void</span> <span style="color: #ff0086; font-weight: bold">main</span><span style="color: #ffffff">(String[]</span> <span style="color: #ffffff">args)</span> <span style="color: #ffffff">{</span>
<span style="color: #fb660a; font-weight: bold">try</span> <span style="color: #ffffff">{</span>
<span style="color: #008800; font-style: italic; background-color: #0f140f">// run the job in an Oozie-friendly manner: call System.exit() only on failure</span>
<span style="color: #cdcaa9; font-weight: bold">int</span> <span style="color: #ffffff">status</span> <span style="color: #ffffff">=</span> <span style="color: #ffffff">ToolRunner.</span><span style="color: #ff0086; font-weight: bold">run</span><span style="color: #ffffff">(</span><span style="color: #fb660a; font-weight: bold">new</span> <span style="color: #ffffff">ExampleMrDriver(),</span> <span style="color: #ffffff">args);</span>
<span style="color: #fb660a; font-weight: bold">if</span><span style="color: #ffffff">(status!=</span><span style="color: #0086f7; font-weight: bold">0</span><span style="color: #ffffff">)</span> <span style="color: #ffffff">{</span>
<span style="color: #ffffff">System.</span><span style="color: #ff0086; font-weight: bold">exit</span><span style="color: #ffffff">(status);</span>
<span style="color: #ffffff">}</span>
<span style="color: #ffffff">}</span> <span style="color: #fb660a; font-weight: bold">catch</span> <span style="color: #ffffff">(Exception</span> <span style="color: #ffffff">e)</span> <span style="color: #ffffff">{</span>
<span style="color: #ffffff">e.</span><span style="color: #ff0086; font-weight: bold">printStackTrace</span><span style="color: #ffffff">();</span>
<span style="color: #ffffff">System.</span><span style="color: #ff0086; font-weight: bold">exit</span><span style="color: #ffffff">(</span><span style="color: #0086f7; font-weight: bold">1</span><span style="color: #ffffff">);</span>
<span style="color: #ffffff">}</span>
<span style="color: #ffffff">}</span>
<span style="color: #ffffff">}</span>
</pre></div>
<br />
<br />
A nice integration test (or unit test; call it and use it as you like) for this Hadoop MapReduce job is listed below:<br />
<br />
<!-- HTML generated using hilite.me --><div style="background: #111111; overflow:auto;width:auto;border:solid gray;border-width:.1em .1em .1em .8em;padding:.2em .6em;"><pre style="margin: 0; line-height: 125%"><span style="color: #fb660a; font-weight: bold">private</span> <span style="color: #ffffff">String</span> <span style="color: #ffffff">outputDir;</span>
<span style="color: #008800; font-style: italic; background-color: #0f140f">// @Before/@After rather than @BeforeClass/@AfterClass: outputDir is an instance field</span>
<span style="color: #ffffff">@Before</span>
<span style="color: #fb660a; font-weight: bold">public</span> <span style="color: #cdcaa9; font-weight: bold">void</span> <span style="color: #ff0086; font-weight: bold">createTmpDir</span><span style="color: #ffffff">()</span> <span style="color: #fb660a; font-weight: bold">throws</span> <span style="color: #ffffff">IOException</span> <span style="color: #ffffff">{</span>
<span style="color: #ffffff">outputDir</span> <span style="color: #ffffff">=</span> <span style="color: #fb660a; font-weight: bold">new</span> <span style="color: #ffffff">File(System.</span><span style="color: #ff0086; font-weight: bold">getProperty</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"java.io.tmpdir"</span><span style="color: #ffffff">),</span> <span style="color: #0086d2">"output"</span><span style="color: #ffffff">).</span><span style="color: #ff0086; font-weight: bold">getPath</span><span style="color: #ffffff">();</span>
<span style="color: #ffffff">}</span>
<span style="color: #ffffff">@Test</span>
<span style="color: #fb660a; font-weight: bold">public</span> <span style="color: #cdcaa9; font-weight: bold">void</span> <span style="color: #ff0086; font-weight: bold">test</span><span style="color: #ffffff">()</span> <span style="color: #fb660a; font-weight: bold">throws</span> <span style="color: #ffffff">Exception</span> <span style="color: #ffffff">{</span>
<span style="color: #ffffff">JobConf</span> <span style="color: #ffffff">jobConf</span> <span style="color: #ffffff">=</span> <span style="color: #fb660a; font-weight: bold">new</span> <span style="color: #ffffff">JobConf();</span>
<span style="color: #ffffff">jobConf.</span><span style="color: #ff0086; font-weight: bold">set</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"fs.default.name"</span><span style="color: #ffffff">,</span> <span style="color: #0086d2">"file:///"</span><span style="color: #ffffff">);</span>
<span style="color: #ffffff">jobConf.</span><span style="color: #ff0086; font-weight: bold">set</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"mapred.job.tracker"</span><span style="color: #ffffff">,</span> <span style="color: #0086d2">"local"</span><span style="color: #ffffff">);</span> <span style="color: #008800; font-style: italic; background-color: #0f140f">// local mode</span>
<span style="color: #ffffff">jobConf.</span><span style="color: #ff0086; font-weight: bold">set</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"mapred.reduce.tasks"</span><span style="color: #ffffff">,</span> <span style="color: #0086d2">"1"</span><span style="color: #ffffff">);</span> <span style="color: #008800; font-style: italic; background-color: #0f140f">// one reducer, so only one output file</span>
<span style="color: #008800; font-style: italic; background-color: #0f140f">// one input file per mapper, placed under src/test/resources/mr</span>
<span style="color: #ffffff">jobConf.</span><span style="color: #ff0086; font-weight: bold">set</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"input.dir.1"</span><span style="color: #ffffff">,</span> <span style="color: #fb660a; font-weight: bold">this</span><span style="color: #ffffff">.</span><span style="color: #ff0086; font-weight: bold">getClass</span><span style="color: #ffffff">().</span><span style="color: #ff0086; font-weight: bold">getResource</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"/mr/input1"</span><span style="color: #ffffff">).</span><span style="color: #ff0086; font-weight: bold">getPath</span><span style="color: #ffffff">());</span>
<span style="color: #ffffff">jobConf.</span><span style="color: #ff0086; font-weight: bold">set</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"input.dir.2"</span><span style="color: #ffffff">,</span> <span style="color: #fb660a; font-weight: bold">this</span><span style="color: #ffffff">.</span><span style="color: #ff0086; font-weight: bold">getClass</span><span style="color: #ffffff">().</span><span style="color: #ff0086; font-weight: bold">getResource</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"/mr/input2"</span><span style="color: #ffffff">).</span><span style="color: #ff0086; font-weight: bold">getPath</span><span style="color: #ffffff">());</span>
<span style="color: #ffffff">jobConf.</span><span style="color: #ff0086; font-weight: bold">set</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"input.dir.3"</span><span style="color: #ffffff">,</span> <span style="color: #fb660a; font-weight: bold">this</span><span style="color: #ffffff">.</span><span style="color: #ff0086; font-weight: bold">getClass</span><span style="color: #ffffff">().</span><span style="color: #ff0086; font-weight: bold">getResource</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"/mr/input3"</span><span style="color: #ffffff">).</span><span style="color: #ff0086; font-weight: bold">getPath</span><span style="color: #ffffff">());</span>
<span style="color: #008800; font-style: italic; background-color: #0f140f">// expected output will be placed here</span>
<span style="color: #ffffff">jobConf.</span><span style="color: #ff0086; font-weight: bold">set</span><span style="color: #ffffff">(</span><span style="color: #0086d2">"output.dir"</span><span style="color: #ffffff">,</span> <span style="color: #ffffff">outputDir);</span>
<span style="color: #ffffff">ExampleMrDriver</span> <span style="color: #ffffff">driver</span> <span style="color: #ffffff">=</span> <span style="color: #fb660a; font-weight: bold">new</span> <span style="color: #ffffff">ExampleMrDriver();</span>
<span style="color: #ffffff">driver.</span><span style="color: #ff0086; font-weight: bold">setConf</span><span style="color: #ffffff">(jobConf);</span>
<span style="color: #cdcaa9; font-weight: bold">int</span> <span style="color: #ffffff">exitCode</span> <span style="color: #ffffff">=</span> <span style="color: #ffffff">driver.</span><span style="color: #ff0086; font-weight: bold">run</span><span style="color: #ffffff">(</span><span style="color: #fb660a; font-weight: bold">new</span> <span style="color: #ffffff">String[]{});</span>
<span style="color: #ffffff">Assert.</span><span style="color: #ff0086; font-weight: bold">assertEquals</span><span style="color: #ffffff">(</span><span style="color: #0086f7; font-weight: bold">0</span><span style="color: #ffffff">,</span> <span style="color: #ffffff">exitCode);</span>
<span style="color: #008800; font-style: italic; background-color: #0f140f">// check content of output file, counters, etc</span>
<span style="color: #ffffff">}</span>
<span style="color: #ffffff">@After</span>
<span style="color: #fb660a; font-weight: bold">public</span> <span style="color: #cdcaa9; font-weight: bold">void</span> <span style="color: #ff0086; font-weight: bold">tearDown</span><span style="color: #ffffff">()</span> <span style="color: #fb660a; font-weight: bold">throws</span> <span style="color: #ffffff">IOException</span> <span style="color: #ffffff">{</span>
<span style="color: #008800; font-style: italic; background-color: #0f140f">// org.apache.hadoop.fs.FileUtil: File.delete() cannot remove a non-empty directory</span>
<span style="color: #ffffff">FileUtil.</span><span style="color: #ff0086; font-weight: bold">fullyDelete</span><span style="color: #ffffff">(</span><span style="color: #fb660a; font-weight: bold">new</span> <span style="color: #ffffff">File(outputDir));</span>
<span style="color: #ffffff">}</span>
</pre></div>
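The final "check content of output file" step could be filled in with a small helper that reads the job's output from the local file system. This sketch assumes the default TextOutputFormat file naming (part-r-00000, ...) and the local output directory used in the test above; JobOutput and readPartFiles are hypothetical names:

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

final class JobOutput {

    // Returns all lines of the part-* files in a local job output directory,
    // in file-name order (part-r-00000, part-r-00001, ...). The glob skips
    // _SUCCESS and the hidden .part-*.crc checksum files of the local file system.
    static List<String> readPartFiles(String outputDir) throws IOException {
        List<Path> parts = new ArrayList<>();
        try (DirectoryStream<Path> dir = Files.newDirectoryStream(Paths.get(outputDir), "part-*")) {
            for (Path part : dir) {
                parts.add(part);
            }
        }
        Collections.sort(parts);
        List<String> lines = new ArrayList<>();
        for (Path part : parts) {
            lines.addAll(Files.readAllLines(part, StandardCharsets.UTF_8));
        }
        return lines;
    }
}
```

In the test it could be used as Assert.assertEquals(expectedLines, JobOutput.readPartFiles(outputDir)), where each expected line is a key and a value separated by a tab, TextOutputFormat's default separator.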
Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0tag:blogger.com,1999:blog-8382000407737271014.post-33171581306384662602014-01-16T10:41:00.002+02:002014-01-16T10:41:43.300+02:00Predicted Age of Abalone based on physical measurementsThe <a href="http://archive.ics.uci.edu/ml/datasets/Abalone">Abalone</a> dataset has been freely available from the UCI Machine Learning Repository since 1995. It contains the results of abalone research in Australia, and the task is to predict the age of abalone from physical measurements. The age of abalone is determined by cutting the shell through the cone, staining it, and counting the number of rings through a microscope, a boring and time-consuming task. Other measurements, which are easier to obtain, are used to predict the age. Of course, the task is more complex in real conditions, and further information, such as weather patterns and location (hence food availability), may be required to solve the problem. <br />
<div>
<br /></div>
<div>
So, Age ~ Rings, and the number of rings must be predicted from a set of measurements such as Diameter, Weight, Height, Length, etc. This is a supervised learning task, because the dataset provides labeled examples of the Result~Features relation. A simple check shows that the number of rings ranges from 1 to 29, which is a huge number of classes for classification, so another supervised learning method, linear regression, is used instead.</div>
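<div>The simple check on the ring range can be sketched in Python (the post itself uses R). This is a minimal illustration, assuming the Rings column has already been read into a list of integers; the helper name ring_summary is hypothetical. It returns the smallest and largest ring count and the number of distinct values, i.e. how many classes a classifier would have to separate.</div>

```python
def ring_summary(rings):
    """Return (min, max, number of distinct values) for a list of ring counts."""
    distinct = set(rings)  # each distinct ring count would be one class
    return min(distinct), max(distinct), len(distinct)


# Toy values for illustration only, not taken from the dataset.
print(ring_summary([15, 7, 9, 10, 7, 8, 20, 16, 9, 19]))  # prints (7, 20, 8)
```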
<div>
<br /></div>
<div>
EDA (exploratory data analysis) is the first step before building any model. Here is the code for loading the dataset into memory and plotting several relations, for example Rings~Diameter:</div>
<!-- HTML generated using hilite.me --><br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;">library(ggplot2)
<span style="color: #008800; font-style: italic;"># read the headerless dataset from a local file</span>
abalone <span style="color: #666666;"><-</span> read.csv(<span style="color: #bb4444;">"/Users/kostya/Downloads/abalone.data.csv"</span>, header<span style="color: #666666;">=</span><span style="color: #aa22ff; font-weight: bold;">FALSE</span>)
<span style="color: #008800; font-style: italic;"># set names for the data frame columns</span>
colnames(abalone) <span style="color: #666666;"><-</span> c(<span style="color: #bb4444;">'Sex'</span>, <span style="color: #bb4444;">'Length'</span>, <span style="color: #bb4444;">'Diameter'</span>, <span style="color: #bb4444;">'Height'</span>, <span style="color: #bb4444;">'WholeWeight'</span>, <span style="color: #bb4444;">'ShuckedWeight'</span>,
<span style="color: #bb4444;">'VisceraWeight'</span>, <span style="color: #bb4444;">'ShellWeight'</span>, <span style="color: #bb4444;">'Rings'</span>)
<span style="color: #008800; font-style: italic;"># plot a histogram of ring counts (density scale)</span>
hist(abalone<span style="color: #666666;">$</span>Rings, freq<span style="color: #666666;">=</span><span style="color: #aa22ff; font-weight: bold;">FALSE</span>)
<span style="color: #008800; font-style: italic;"># draw all sexes on one plot, with a linear fit per sex</span>
ggplot(abalone, aes(Diameter, Rings, color<span style="color: #666666;">=</span>Sex)) <span style="color: #666666;">+</span>
geom_point() <span style="color: #666666;">+</span>
geom_smooth(method<span style="color: #666666;">=</span><span style="color: #bb4444;">"lm"</span>, se<span style="color: #666666;">=</span><span style="color: #aa22ff; font-weight: bold;">FALSE</span>)
</pre>
</div>
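<div>For readers who prefer Python, the loading step above can be sketched with the standard csv module. This assumes the same comma-separated layout without a header row, as in the read.csv call above; the function name load_abalone and the single sample row are illustrative only, not part of the original post.</div>

```python
import csv
import io

# Column names as assigned with colnames() in the R code above.
COLUMNS = ["Sex", "Length", "Diameter", "Height", "WholeWeight",
           "ShuckedWeight", "VisceraWeight", "ShellWeight", "Rings"]


def load_abalone(lines):
    """Parse headerless abalone rows into a list of dicts.

    `lines` is any iterable of CSV text lines, e.g. an open file.
    Sex stays a string, Rings becomes an int, all other columns floats.
    """
    rows = []
    for record in csv.reader(lines):
        if not record:  # skip blank lines
            continue
        row = dict(zip(COLUMNS, record))
        for name in COLUMNS[1:-1]:  # the seven numeric measurements
            row[name] = float(row[name])
        row["Rings"] = int(row["Rings"])
        rows.append(row)
    return rows


sample = io.StringIO("M,0.455,0.365,0.095,0.514,0.2245,0.101,0.15,15\n")
print(load_abalone(sample)[0]["Rings"])  # prints 15
```

<div>To read the real file, pass an open file object instead, e.g. open(path, newline=""), where path is wherever the dataset was saved.</div>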
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrES66lLaDhmESLTTXCAV-_OhfFIsCwQq96KsMXo3qRX7yPKEotJJSMdULWd0Ne-sMCmw7f2SAXy4BIzwH1D-O4YuUMASErhlP8TW6M9_Tihwsgi0LOY3BFylbGUq2P4aUYj2LXlBqSWDJ/s1600/abalone_diameter_rings.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEgrES66lLaDhmESLTTXCAV-_OhfFIsCwQq96KsMXo3qRX7yPKEotJJSMdULWd0Ne-sMCmw7f2SAXy4BIzwH1D-O4YuUMASErhlP8TW6M9_Tihwsgi0LOY3BFylbGUq2P4aUYj2LXlBqSWDJ/s1600/abalone_diameter_rings.png" height="335" width="400" /></a></div>
<div>
<br />
<br />
This plot (as well as other relations such as Rings~WholeWeight) clearly shows a different relation for each sex, so the first idea is to fit a separate regression for each sex or to use Sex as a factor.</div>
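<div>Using Sex as a factor means dummy (treatment) coding, which R's lm() applies to factor columns by default: the first level, F in alphabetical order, becomes the baseline and each other level gets a 0/1 indicator column. A minimal Python sketch of that encoding; the function name dummy_encode is hypothetical.</div>

```python
def dummy_encode(sexes, levels=("F", "I", "M")):
    """Treatment-code a categorical column like R's default contrasts.

    The first level is the baseline and gets no column; every other
    level gets a 0/1 indicator column, in the order given by `levels`.
    """
    others = levels[1:]
    return [[1 if s == level else 0 for level in others] for s in sexes]


print(dummy_encode(["M", "F", "I"]))  # prints [[0, 1], [0, 0], [1, 0]]
```

<div>With these indicator columns in a single model, each sex gets its own intercept shift but all sexes share the same slope for every measurement; separate regressions per sex also let the slopes differ.</div>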
<div>
<br /></div>
<div>
To go on with separate regression models, we need to construct a formula by investigating each relation. For example, here is the Rings~WholeWeight relation for each sex:</div>
<!-- HTML generated using hilite.me --><br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #008800; font-style: italic;"># plot each sex in a separate panel, with a linear fit per panel</span>
ggplot(abalone, aes(WholeWeight, Rings)) <span style="color: #666666;">+</span>
geom_jitter(alpha<span style="color: #666666;">=0.25</span>) <span style="color: #666666;">+</span>
geom_smooth(method<span style="color: #666666;">=</span><span style="color: #bb4444;">"lm"</span>, se<span style="color: #666666;">=</span><span style="color: #aa22ff; font-weight: bold;">FALSE</span>) <span style="color: #666666;">+</span>
facet_grid(. <span style="color: #666666;">~</span> Sex)
</pre>
</div>
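<div>geom_smooth(method=lm) in each facet is an ordinary least-squares line fitted to that facet's points only. A minimal Python sketch of the same idea, assuming rows are dicts with a Sex key and numeric columns; fit_line and fit_per_sex are hypothetical names, and the four rows below are toy values, not abalone data.</div>

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x; returns (a, b).

    Needs at least two distinct x values, otherwise the slope is undefined.
    """
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b


def fit_per_sex(rows, feature, target="Rings"):
    """Fit one straight line per Sex value, like one lm smooth per facet."""
    groups = defaultdict(lambda: ([], []))
    for row in rows:
        xs, ys = groups[row["Sex"]]
        xs.append(row[feature])
        ys.append(row[target])
    return {sex: fit_line(xs, ys) for sex, (xs, ys) in groups.items()}


toy_rows = [
    {"Sex": "M", "WholeWeight": 1.0, "Rings": 3.0},
    {"Sex": "M", "WholeWeight": 2.0, "Rings": 5.0},
    {"Sex": "I", "WholeWeight": 1.0, "Rings": 2.0},
    {"Sex": "I", "WholeWeight": 3.0, "Rings": 4.0},
]
print(fit_per_sex(toy_rows, "WholeWeight"))  # prints {'M': (1.0, 2.0), 'I': (1.0, 1.0)}
```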
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiP6s7GwwCu013kBv6TSQ0QkZpscGyyCONbKEC8q_UE-dayPp0yYM_H54Fhr91DD49nRwiwDml2XNcNY9mDP0POot7FjYR0C7IwJJ7w_rXoPyuJj8NVZ7W_DIVfb8bykocu4JC-3JsJD4q-/s1600/abalone_wholeweight_rings.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEiP6s7GwwCu013kBv6TSQ0QkZpscGyyCONbKEC8q_UE-dayPp0yYM_H54Fhr91DD49nRwiwDml2XNcNY9mDP0POot7FjYR0C7IwJJ7w_rXoPyuJj8NVZ7W_DIVfb8bykocu4JC-3JsJD4q-/s1600/abalone_wholeweight_rings.png" height="335" width="400" /></a></div>
<div>
<br />
<br />
Obviously, for Male and Infant the relation has a logarithmic trend, so it is logical to add 'log' terms to the formula. </div>
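<div>Adding log to the formula keeps the model linear in its coefficients: lm() simply regresses Rings on log(x) instead of x. A minimal Python sketch of fitting Rings = a + b*log(x) for a single feature; fit_log_trend is a hypothetical name, and since log() requires strictly positive values, this assumes the weight column contains no zeros.</div>

```python
import math


def fit_log_trend(xs, ys):
    """Fit y = a + b*log(x) by ordinary least squares on log-transformed x."""
    lx = [math.log(x) for x in xs]  # natural log, as R's log(); x must be > 0
    n = len(lx)
    mean_lx = sum(lx) / n
    mean_y = sum(ys) / n
    sxx = sum((v - mean_lx) ** 2 for v in lx)
    sxy = sum((v - mean_lx) * (y - mean_y) for v, y in zip(lx, ys))
    b = sxy / sxx
    return mean_y - b * mean_lx, b


xs = [0.5, 1.0, 2.0, 4.0]
ys = [2.0 + 3.0 * math.log(x) for x in xs]  # synthetic data with a known log trend
print(fit_log_trend(xs, ys))  # approximately (2.0, 3.0)
```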
<div>
<br /></div>
<!-- HTML generated using hilite.me --><br />
<div style="background: #f8f8f8; border-width: .1em .1em .1em .8em; border: solid gray; overflow: auto; padding: .2em .6em; width: auto;">
<pre style="line-height: 125%; margin: 0;"><span style="color: #008800; font-style: italic;"># Infant: log terms for whole, shell and shucked weight</span>
summary(lm(Rings<span style="color: #666666;">~</span>Length<span style="color: #666666;">+</span>I(Diameter<span style="color: #666666;">^2</span>)<span style="color: #666666;">+</span>log(WholeWeight)<span style="color: #666666;">+</span>log(ShellWeight)<span style="color: #666666;">+</span>log(ShuckedWeight)
<span style="color: #666666;">+</span>Height<span style="color: #666666;">+</span>VisceraWeight, data<span style="color: #666666;">=</span>subset(abalone, Sex <span style="color: #666666;">==</span> <span style="color: #bb4444;">'I'</span>)))
<span style="color: #008800; font-style: italic;"># Male: log terms for whole and shell weight only</span>
summary(lm(Rings<span style="color: #666666;">~</span>Length<span style="color: #666666;">+</span>I(Diameter<span style="color: #666666;">^2</span>)<span style="color: #666666;">+</span>log(WholeWeight)<span style="color: #666666;">+</span>log(ShellWeight)<span style="color: #666666;">+</span>ShuckedWeight
<span style="color: #666666;">+</span>Height<span style="color: #666666;">+</span>VisceraWeight, data<span style="color: #666666;">=</span>subset(abalone, Sex <span style="color: #666666;">==</span> <span style="color: #bb4444;">'M'</span>)))
<span style="color: #008800; font-style: italic;"># Female: no log terms</span>
summary(lm(Rings<span style="color: #666666;">~</span>Length<span style="color: #666666;">+</span>I(Diameter<span style="color: #666666;">^2</span>)<span style="color: #666666;">+</span>WholeWeight<span style="color: #666666;">+</span>ShellWeight<span style="color: #666666;">+</span>ShuckedWeight
<span style="color: #666666;">+</span>Height<span style="color: #666666;">+</span>VisceraWeight, data<span style="color: #666666;">=</span>subset(abalone, Sex <span style="color: #666666;">==</span> <span style="color: #bb4444;">'F'</span>)))
</pre>
</div>
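<div>The Infant model above is still ordinary multiple linear regression once the transformed columns are built. A sketch with NumPy, assuming each row is a dict keyed by the column names used in the post; infant_design_matrix and fit_ols are hypothetical helpers. numpy.linalg.lstsq solves the least-squares problem via SVD rather than the QR decomposition lm() uses, but for a full-rank design both give the same coefficient estimates (not the standard errors or p-values that summary() also prints).</div>

```python
import numpy as np


def infant_design_matrix(rows):
    """Build the design matrix for the Infant model.

    Columns, in the same order as the R formula: intercept, Length,
    Diameter^2, log(WholeWeight), log(ShellWeight), log(ShuckedWeight),
    Height, VisceraWeight.
    """
    return np.array([
        [1.0,
         r["Length"],
         r["Diameter"] ** 2,
         np.log(r["WholeWeight"]),
         np.log(r["ShellWeight"]),
         np.log(r["ShuckedWeight"]),
         r["Height"],
         r["VisceraWeight"]]
        for r in rows
    ])


def fit_ols(X, y):
    """Least-squares coefficients, one per column of X."""
    coef, *_ = np.linalg.lstsq(X, np.asarray(y, dtype=float), rcond=None)
    return coef
```

<div>For a hypothetical list infants holding only the Sex == 'I' rows, fit_ols(infant_design_matrix(infants), [r["Rings"] for r in infants]) should correspond to the Estimate column of the first summary() call.</div>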
<br />
<div>
<br /></div>
As a result, the following formula, based on the fitted linear regression coefficients, can be used to predict the number of rings for infants:<br />
<i><span style="color: #666666;">Rings = 8.5398 - 7.6755*Length + 8.7707*Diameter^2 + 1.4837*log(WholeWeight) + 2.0745*log(ShellWeight) - 2.3415*log(ShuckedWeight) + 27.8275*Height + 5.9972*VisceraWeight</span></i><br />
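<div>The Infant formula translates directly into code. A sketch in Python using the coefficients quoted above, together with the Age = Rings + 1.5 conversion given in the dataset description; the function names are hypothetical, log is the natural logarithm (as in R), and the formula applies only to infants, since the male and female models use different terms.</div>

```python
import math


def predict_infant_rings(length, diameter, whole_weight, shell_weight,
                         shucked_weight, height, viscera_weight):
    """Predicted ring count for an infant abalone (weights must be > 0)."""
    return (8.5398
            - 7.6755 * length
            + 8.7707 * diameter ** 2
            + 1.4837 * math.log(whole_weight)
            + 2.0745 * math.log(shell_weight)
            - 2.3415 * math.log(shucked_weight)
            + 27.8275 * height
            + 5.9972 * viscera_weight)


def rings_to_age(rings):
    """Age in years, per the dataset description: rings + 1.5."""
    return rings + 1.5
```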
<br />
As mentioned in the dataset description, Age = Rings + 1.5 (in years).Anonymoushttp://www.blogger.com/profile/12572224859800644616noreply@blogger.com0