But there is still one issue that I face on every project and that has to be resolved every time: this information is public. Anyone who can reach the port (8080 or 4040 by default) can access the UI and everything in it, and there is a lot there that you want to keep private.
There are several ways to deal with it:
- Close all ports and configure nginx to listen on a specific port and forward requests (with basic authentication, of course)
- Protect the UI using Spark's built-in mechanism: implementing your own filter
In this post, let's start with the first option: how to protect the Spark UI with NGINX.
The instructions below are suitable for protecting the Web UI of standalone Spark when the job is executed in client mode (so you can predict where the driver is up and running).
Let's assume that there is a node with both Spark and nginx installed (they can also be on different nodes).
First of all, close all Spark-related ports (and there are a lot of them) to the outside world; they must still be accessible in-network. In Amazon, it's easy to do with security groups: just specify an appropriate CIDR mask for each inbound rule, for instance 172.16.0.0/12. Next, open two ports that are not used by Spark and that you are going to expose for the Spark master UI and the Spark driver UI: for example, 2020 for the master UI and another unused port for the driver UI.
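As a rough illustration of the security group step, here is a dry-run sketch that only prints the AWS CLI calls instead of executing them. The group ID sg-EXAMPLE is a hypothetical placeholder, the port list covers only the two default UI ports mentioned above, and opening the proxy port to 0.0.0.0/0 is an assumption that you would probably want to narrow.

```shell
#!/usr/bin/env bash
# Dry run: print the AWS CLI calls that would restrict Spark ports to the
# private network. SG_ID is a hypothetical placeholder, not a real group.
SG_ID="${SG_ID:-sg-EXAMPLE}"
PRIVATE_CIDR="172.16.0.0/12"   # in-network range from the text
SPARK_PORTS="8080 4040"        # default master and driver UI ports only
PROXY_PORT=2020                # nginx port used later in this post

ingress_rule() {
  # $1 = port, $2 = CIDR allowed to reach it
  echo "aws ec2 authorize-security-group-ingress --group-id $SG_ID --protocol tcp --port $1 --cidr $2"
}

print_rules() {
  for port in $SPARK_PORTS; do
    ingress_rule "$port" "$PRIVATE_CIDR"
  done
  # Assumption: the proxy port is reachable from anywhere; narrow as needed.
  ingress_rule "$PROXY_PORT" "0.0.0.0/0"
}

print_rules
```

Review the printed commands, add the other Spark ports your cluster uses to SPARK_PORTS, and only then run them by hand.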
Now only a small part is left: configure nginx to perform basic auth and forward requests to the Spark UI. In this case nginx is in the private network, so the request is handled by Spark and the UI is presented to the end user.
Before configuring nginx itself, the file that stores the credentials must be created:
It's simple to do with the htpasswd tool, which can be installed by running sudo yum install -y httpd-tools
Then generate a password and store it in a file (the user name will be spark and the password is entered in the CLI):
sudo htpasswd -c /etc/nginx/.htpasswd spark
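If htpasswd is not available, or the file has to be created from a script without the interactive prompt, an equivalent entry can probably be built with openssl. This is only a sketch: it assumes the openssl CLI is installed, the helper name make_htpasswd_line is made up for illustration, and the password is a placeholder.

```shell
#!/usr/bin/env bash
# Sketch: build an htpasswd-style line without the interactive prompt.
# Assumes the openssl CLI is installed; 'change-me' is a placeholder.
make_htpasswd_line() {
  # $1 = user name, $2 = plain-text password
  # "openssl passwd -apr1" emits an Apache MD5 (apr1) hash,
  # one of the formats nginx accepts in auth_basic_user_file.
  printf '%s:%s\n' "$1" "$(openssl passwd -apr1 "$2")"
}

make_htpasswd_line spark 'change-me'
# To write the file (needs root):
#   make_htpasswd_line spark 'change-me' | sudo tee /etc/nginx/.htpasswd >/dev/null
```

Keep in mind that a password passed as a command-line argument can be visible in the process list and shell history, so the interactive htpasswd prompt is the safer choice on a shared machine.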
The last step is to create a proper nginx configuration (the example only forwards requests from port 2020 to the Spark Master UI on 8080):
vi /etc/nginx/nginx2001.conf
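events {
    worker_connections 1000;
}
http {
    server {
        # port exposed to users instead of the Spark Master UI port
        listen 2020;
        # ask for the credentials stored with htpasswd
        auth_basic "Private Beta";
        auth_basic_user_file /etc/nginx/.htpasswd;
        location / {
            # forward authenticated requests to the Spark Master UI
            proxy_pass http://localhost:8080;
        }
    }
}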
Actually, that's it. After that we just need to start nginx:
nginx -c /etc/nginx/nginx2001.conf
And point the browser to HOST:2020 to be asked for credentials and, only after entering them, be forwarded to the Spark Master UI.
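The same check can be done from the command line. Below is a small sketch using curl; the helper name ui_status is made up for illustration, and HOST and PASSWORD are placeholders for your own values. With the setup described above, a request without credentials should be rejected with 401, and one with valid credentials should reach the Spark UI.

```shell
#!/usr/bin/env bash
# Sketch: confirm that the proxy asks for credentials.
# HOST and PASSWORD in the usage lines are placeholders.
ui_status() {
  # $1 = URL, optional $2 = user:password; prints only the HTTP status code
  if [ -n "${2:-}" ]; then
    curl -s -o /dev/null -w '%{http_code}' -u "$2" "$1"
  else
    curl -s -o /dev/null -w '%{http_code}' "$1"
  fi
}

# Validate the configuration file before starting nginx:
#   sudo nginx -t -c /etc/nginx/nginx2001.conf
# Then, with nginx running:
#   ui_status http://HOST:2020/                  # 401 expected without credentials
#   ui_status http://HOST:2020/ spark:PASSWORD   # 200 expected with valid ones
```

It is also worth running the same check against HOST:8080 from outside the private network: if that request succeeds, the security group rules still leave the UI open.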
This definitely will not work if you try to use a location directive that differs from "/".
For example, "location /something" will not work with Spark UI in your example.