Restrict Access to the ncn-images S3 Bucket
The configuration documented in this procedure is intended to prevent user-facing dedicated nodes (UANs and compute nodes) from retrieving NCN image content from the Ceph S3 services running on storage nodes.
Specifically, the controls enacted via this procedure should do the following:
- Deny access to the ncn-images bucket if the client is not an NCN (NMN) or PXE booting from the MTL network. This is enforced via a HAProxy ACL on the storage servers.
- Deny direct access to the Ceph RADOS GW service (which would otherwise bypass the HAProxy controls) via iptables rules on the storage servers.
This is not designed to prevent UAIs (if in use) from retrieving NCN image content.
If a storage node is rebuilt, this procedure will need to be reapplied for the rebuilt node. The same is true if NCNs are added to or removed from the system, because that changes the source IP ranges for clients.
This procedure should be executed after an install or upgrade is otherwise complete, but before the system is opened for user access.
Unless otherwise noted, the procedure should be run from ncn-m001 (not the PIT node).
This procedure was back-ported from CSM 1.2 and was tested on a CSM 1.0.11 system.
Test connectivity before applying the ACL.
Save the following script to a file (for example, con_test.sh).
#!/bin/bash
# Build storage NCN host name lists for each network (NMN, HMN, CAN).
SNCNS="$(grep 'ncn-s.*\.nmn' /etc/hosts | awk '{print $NF;}' | xargs)"
SCSNS_NMN="$(echo $SNCNS | xargs -n 1 | sed -e 's/$/.nmn/g')"
SCSNS_HMN="$(echo $SNCNS | xargs -n 1 | sed -e 's/$/.hmn/g')"
SCSNS_CAN="$(echo $SNCNS | xargs -n 1 | sed -e 's/$/.can/g')"
RADOS_HTTP_PORT="8080"
HAPROXY_HTTP_PORT="80"
HAPROXY_HTTPS_PORT="443"
PASS="PASS"
FAIL="FAIL"

function rados_test
{
    NODES="$1"
    MSG="$2"
    TTYPE="$3"
    echo "[i] $MSG"
    for n in $NODES
    do
        echo -n "  RADOS $n: "
        if [ "$TTYPE" == "CONN_FAIL" ]
        then
            # Expect the connection to time out (curl return code 28).
            curl -sI --connect-timeout 2 http://${n}:${RADOS_HTTP_PORT}/ &> /dev/null
            rc=$?
            rc_pass=28
        else
            # Expect a "200 OK" response.
            curl -I --connect-timeout 2 http://${n}:${RADOS_HTTP_PORT}/ 2>/dev/null | grep -q "200 OK"
            rc=$?
            rc_pass=0
        fi
        if [ $rc -eq $rc_pass ]
        then
            echo $PASS
        else
            echo $FAIL
        fi
    done
}

function haproxy_test
{
    NODES="$1"
    MSG="$2"
    echo "[i] $MSG"
    for n in $NODES
    do
        echo -n "  HAPROXY (CEPH) HTTP $n: "
        curl -I --connect-timeout 2 http://${n}:${HAPROXY_HTTP_PORT}/ncn-images/ 2>/dev/null | grep -q "x-amz-request-id"
        if [ $? -eq 0 ]
        then
            echo $PASS
        else
            echo $FAIL
        fi
        echo -n "  HAPROXY (CEPH) HTTPS $n: "
        curl -kI --connect-timeout 2 https://${n}:${HAPROXY_HTTPS_PORT}/ncn-images/ 2>/dev/null | grep -q "x-amz-request-id"
        if [ $? -eq 0 ]
        then
            echo $PASS
        else
            echo $FAIL
        fi
    done
}

rados_test "$SCSNS_NMN" "MGMT RADOS over NMN"
rados_test "$SCSNS_HMN" "MGMT RADOS over HMN"
rados_test "$SCSNS_CAN" "MGMT RADOS over CAN" "CONN_FAIL"
haproxy_test "$SCSNS_NMN" "MGMT HAProxy over NMN"
Execute the script. If the ACLs have not yet been applied, results similar to the following will be returned (the CAN failures are expected):
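For example, assuming the script was saved as con_test.sh on ncn-m001:
ncn-m001# bash ./con_test.sh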
[i] MGMT RADOS over NMN
RADOS ncn-s003.nmn: PASS
RADOS ncn-s002.nmn: PASS
RADOS ncn-s001.nmn: PASS
[i] MGMT RADOS over HMN
RADOS ncn-s003.hmn: PASS
RADOS ncn-s002.hmn: PASS
RADOS ncn-s001.hmn: PASS
[i] MGMT RADOS over CAN
RADOS ncn-s003.can: FAIL
RADOS ncn-s002.can: FAIL
RADOS ncn-s001.can: FAIL
[i] MGMT HAProxy over NMN
HAPROXY (CEPH) HTTP ncn-s003.nmn: PASS
HAPROXY (CEPH) HTTPS ncn-s003.nmn: PASS
HAPROXY (CEPH) HTTP ncn-s002.nmn: PASS
HAPROXY (CEPH) HTTPS ncn-s002.nmn: PASS
HAPROXY (CEPH) HTTP ncn-s001.nmn: PASS
HAPROXY (CEPH) HTTPS ncn-s001.nmn: PASS
Build an IP address list of NCNs on the NMN.
Cross-check to verify the count seems appropriate for the system in use.
ncn-m001# grep 'ncn-[mws].*.nmn' /etc/hosts | awk '{print $1;}' | sed -e 's/\./ /g' | sort -nk 4 | sed -e 's/ /\./g' | tee allowed_ncns.lst
10.252.1.4
10.252.1.5
10.252.1.6
10.252.1.7
10.252.1.8
10.252.1.9
10.252.1.10
10.252.1.11
10.252.1.12
10.252.1.13
10.252.1.14
Add the MTL subnet (needed for network boots of NCNs).
ncn-m001# echo '10.1.0.0/16' >> allowed_ncns.lst
Verify that allowed_ncns.lst contains NMN addresses for all management NCNs and the MTL subnet (10.1.0.0/16).
ncn-m001# cat allowed_ncns.lst
10.252.1.4
10.252.1.5
10.252.1.6
10.252.1.7
10.252.1.8
10.252.1.9
10.252.1.10
10.252.1.11
10.252.1.12
10.252.1.13
10.252.1.14
10.1.0.0/16
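As an additional sanity check, the entry count can be compared against the number of management NCN entries in /etc/hosts; the list should contain one line per NCN plus the MTL subnet line (the exact count varies by system):
ncn-m001# grep -c 'ncn-[mws].*\.nmn' /etc/hosts
ncn-m001# wc -l allowed_ncns.lst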
Confirm that the HAProxy configurations are identical across storage nodes.
Adjust the -w argument to represent the full set of storage nodes for the system; this applies to this step and all subsequent steps.
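If preferred, the node list can be derived rather than hard-coded. A sketch, assuming every storage node has an entry in /etc/hosts; the resulting comma-separated list can then be passed to pdsh via -w $SNODES:
ncn-m001# SNODES=$(grep -oE 'ncn-s[0-9]+' /etc/hosts | sort -u | paste -s -d, -)
ncn-m001# echo $SNODES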
ncn-m001# pdsh -w ncn-s00[1-4] "cat /etc/haproxy/haproxy.cfg" | dshbak -c
----------------
ncn-s[001-004]
----------------
# Please do not change this file directly since it is managed by Ansible and will be overwritten
global
    log 127.0.0.1 local2
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 8000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats
    tune.ssl.default-dh-param 4096
    ssl-default-bind-ciphers EECDH+AESGCM:EDH+AESGCM
    ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 8000

frontend http-rgw-frontend
    bind *:80
    default_backend rgw-backend

frontend https-rgw-frontend
    bind *:443 ssl crt /etc/ceph/rgw.pem
    default_backend rgw-backend

backend rgw-backend
    option forwardfor
    balance static-rr
    option httpchk GET /
    server server-ncn-s001-rgw0 10.252.1.7:8080 check weight 100
    server server-ncn-s002-rgw0 10.252.1.6:8080 check weight 100
    server server-ncn-s003-rgw0 10.252.1.5:8080 check weight 100
    server server-ncn-s004-rgw0 10.252.1.4:8080 check weight 100
Create a backup of the haproxy.cfg files on the storage nodes.
ncn-m001# pdsh -w ncn-s00[1-4] "cp /etc/haproxy/haproxy.cfg /etc/haproxy/haproxy.cfg-dist"
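To confirm that the backup file now exists on every node:
ncn-m001# pdsh -w ncn-s00[1-4] "ls -l /etc/haproxy/haproxy.cfg-dist"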
Grab a copy of haproxy.cfg to modify from a storage node, preserving permissions.
ncn-m001# scp -p ncn-s001:/etc/haproxy/haproxy.cfg .
haproxy.cfg
Edit haproxy.cfg, adding the following ACLs and log directives to each frontend (a diff is shown to illustrate the necessary changes).
ncn-m001# diff -Naur haproxy.cfg-dist haproxy.cfg
--- haproxy.cfg-dist    2022-06-30 18:20:55.000000000 +0000
+++ haproxy.cfg 2022-07-07 16:56:40.000000000 +0000
@@ -1,6 +1,6 @@
 # Please do not change this file directly since it is managed by Ansible and will be overwritten
 global
-    log 127.0.0.1 local2
+    log 127.0.0.1:514 local0 info
     chroot /var/lib/haproxy
     pidfile /var/run/haproxy.pid
     maxconn 8000
@@ -31,12 +31,22 @@
     maxconn 8000
 
 frontend http-rgw-frontend
+    log global
+    option httplog
     bind *:80
     default_backend rgw-backend
+    acl allow_ncns src -n -f /etc/haproxy/allowed_ncns.lst
+    acl restrict_ncn_images path_beg /ncn-images
+    http-request deny if restrict_ncn_images !allow_ncns
 
 frontend https-rgw-frontend
+    log global
+    option httplog
     bind *:443 ssl crt /etc/ceph/rgw.pem
     default_backend rgw-backend
+    acl allow_ncns src -n -f /etc/haproxy/allowed_ncns.lst
+    acl restrict_ncn_images path_beg /ncn-images
+    http-request deny if restrict_ncn_images !allow_ncns
 
 backend rgw-backend
     option forwardfor
Create a new rsyslog configuration for HAProxy, having rsyslog listen on UDP port 514 on the local host.
With the log directive additions to HAProxy and a local UDP 514 socket available, access logging should work properly. Set permissions to 640 on the file.
ncn-m001# cat haproxy.conf
# Collect log with UDP
$ModLoad imudp
$UDPServerAddress 127.0.0.1
$UDPServerRun 514
ncn-m001# chmod 0640 haproxy.conf
Make sure HAProxy is running on storage nodes.
ncn-m001# pdsh -w ncn-s00[1-4] "systemctl status haproxy" | grep "Active"
ncn-s001: Active: active (running) since Thu 2022-07-07 17:38:49 UTC; 54min ago
ncn-s003: Active: active (running) since Thu 2022-07-07 17:38:49 UTC; 54min ago
ncn-s002: Active: active (running) since Thu 2022-07-07 17:38:49 UTC; 54min ago
ncn-s004: Active: active (running) since Thu 2022-07-07 17:38:49 UTC; 54min ago
Determine where the HAProxy VIP currently resides (for awareness, in the event that debugging is necessary).
ncn-m001# host rgw-vip
rgw-vip.nmn has address 10.252.1.3
ncn-m001# host rgw-vip.nmn
rgw-vip.nmn has address 10.252.1.3
ncn-m001# host 10.252.1.3
3.1.252.10.in-addr.arpa domain name pointer rgw-vip.
3.1.252.10.in-addr.arpa domain name pointer rgw-vip.local.
3.1.252.10.in-addr.arpa domain name pointer rgw-vip.local.local.
3.1.252.10.in-addr.arpa domain name pointer rgw-vip.nmn.
3.1.252.10.in-addr.arpa domain name pointer rgw-vip.nmn.local.
ncn-m001# ssh rgw-vip 'hostname'
ncn-s001
Propagate the rsyslog configuration out to all storage nodes.
ncn-m001# pdcp -w ncn-s00[1-4] haproxy.conf /etc/rsyslog.d/
Propagate the HAProxy configuration out to all storage nodes.
ncn-m001# pdcp -w ncn-s00[1-4] haproxy.cfg allowed_ncns.lst /etc/haproxy/
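Optionally, syntax-check the distributed configuration on each node before restarting HAProxy (haproxy -c validates a configuration file without starting the service):
ncn-m001# pdsh -w ncn-s00[1-4] "haproxy -c -f /etc/haproxy/haproxy.cfg"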
Verify the configurations are identical across storage nodes.
ncn-m001# pdsh -w ncn-s00[1-4] "cat /etc/haproxy/haproxy.cfg" | dshbak -c
----------------
ncn-s[001-004]
----------------
# Please do not change this file directly since it is managed by Ansible and will be overwritten
global
    log 127.0.0.1:514 local0 info
    chroot /var/lib/haproxy
    pidfile /var/run/haproxy.pid
    maxconn 8000
    user haproxy
    group haproxy
    daemon
    stats socket /var/lib/haproxy/stats
    tune.ssl.default-dh-param 4096
    ssl-default-bind-ciphers EECDH+AESGCM:EDH+AESGCM
    ssl-default-bind-options no-sslv3 no-tlsv10 no-tlsv11 no-tls-tickets

defaults
    mode http
    log global
    option httplog
    option dontlognull
    option http-server-close
    option forwardfor except 127.0.0.0/8
    option redispatch
    retries 3
    timeout http-request 10s
    timeout queue 1m
    timeout connect 10s
    timeout client 1m
    timeout server 1m
    timeout http-keep-alive 10s
    timeout check 10s
    maxconn 8000

frontend http-rgw-frontend
    log global
    option httplog
    bind *:80
    default_backend rgw-backend
    acl allow_ncns src -n -f /etc/haproxy/allowed_ncns.lst
    acl restrict_ncn_images path_beg /ncn-images
    http-request deny if restrict_ncn_images !allow_ncns

frontend https-rgw-frontend
    log global
    option httplog
    bind *:443 ssl crt /etc/ceph/rgw.pem
    default_backend rgw-backend
    acl allow_ncns src -n -f /etc/haproxy/allowed_ncns.lst
    acl restrict_ncn_images path_beg /ncn-images
    http-request deny if restrict_ncn_images !allow_ncns

backend rgw-backend
    option forwardfor
    balance static-rr
    option httpchk GET /
    server server-ncn-s001-rgw0 10.252.1.7:8080 check weight 100
    server server-ncn-s002-rgw0 10.252.1.6:8080 check weight 100
    server server-ncn-s003-rgw0 10.252.1.5:8080 check weight 100
    server server-ncn-s004-rgw0 10.252.1.4:8080 check weight 100
Restart rsyslog across all storage nodes.
ncn-m001# pdsh -w ncn-s00[1-4] "systemctl restart rsyslog"
ncn-m001# pdsh -w ncn-s00[1-4] "systemctl status rsyslog" | grep Active
ncn-s001: Active: active (running) since Thu 2022-07-07 13:50:39 UTC; 7s ago
ncn-s002: Active: active (running) since Thu 2022-07-07 13:50:39 UTC; 7s ago
ncn-s003: Active: active (running) since Thu 2022-07-07 13:50:39 UTC; 7s ago
ncn-s004: Active: active (running) since Thu 2022-07-07 13:50:39 UTC; 7s ago
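Optionally, confirm that rsyslog is now listening on the local UDP 514 socket (ss -ulnp lists UDP listening sockets):
ncn-m001# pdsh -w ncn-s00[1-4] "ss -ulnp | grep ':514'"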
Restart HAProxy across all storage nodes.
ncn-m001# pdsh -w ncn-s00[1-4] "systemctl restart haproxy"
ncn-m001# pdsh -w ncn-s00[1-4] "systemctl status haproxy" | grep Active
ncn-s001: Active: active (running) since Thu 2022-07-07 13:50:39 UTC; 7s ago
ncn-s002: Active: active (running) since Thu 2022-07-07 13:50:39 UTC; 7s ago
ncn-s003: Active: active (running) since Thu 2022-07-07 13:50:39 UTC; 7s ago
ncn-s004: Active: active (running) since Thu 2022-07-07 13:50:39 UTC; 7s ago
Apply server-side iptables rules to the storage nodes.
This is needed to prevent direct access to the Ceph RADOS GW service (that is, access that does not go through HAProxy).
The process is written to support change on individual nodes, but it could be scripted after analysis of the running firewall rule set (notably with respect to local modifications, if they exist).
This process must be completed on each storage node (steps 18-21).
Document which port RADOS GW is running on. It should be the same across all storage nodes.
ncn-s001# ss -tnpl | grep rados
LISTEN 0 128 0.0.0.0:8080 0.0.0.0:* users:(("radosgw",pid=25018,fd=77))
LISTEN 0 128 [::]:8080 [::]:* users:(("radosgw",pid=25018,fd=78))
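To run the same check across all storage nodes in one pass (a convenience variant of the command above):
ncn-m001# pdsh -w ncn-s00[1-4] "ss -tnpl | grep rados" | dshbak -c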
List the existing iptables rules.
ncn-s001# iptables -L -nx -v
Chain INPUT (policy ACCEPT 399480930 packets, 1051007801113 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP tcp -- * * 0.0.0.0/0 10.102.4.135 tcp dpt:22
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 400599807 packets, 1035420933926 bytes)
pkts bytes target prot opt in out source destination
Run the following commands to add iptables rules for control.
The --src-range must cover all of the NCN NMN IP addresses generated for the HAProxy ACL step (10.252.1.4-10.252.1.14 in this example).
iptables -A INPUT -i vlan004 -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -i lo -p tcp --dport 8080 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -m iprange --src-range 10.252.1.4-10.252.1.14 -j ACCEPT
iptables -A INPUT -p tcp --dport 8080 -j LOG --log-prefix "RADOSGW-DROP"
iptables -A INPUT -p tcp --dport 8080 -j DROP
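Where scripting is desired, the range bounds can be derived from the allowed list already distributed to /etc/haproxy/ on the storage nodes. A sketch, assuming the NMN addresses in the list are contiguous (verify the output before use):
ncn-s001# FIRST=$(grep -v '/' /etc/haproxy/allowed_ncns.lst | sort -t. -nk4 | head -1)
ncn-s001# LAST=$(grep -v '/' /etc/haproxy/allowed_ncns.lst | sort -t. -nk4 | tail -1)
ncn-s001# echo "--src-range ${FIRST}-${LAST}"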
List the iptables rules again, and verify that the new rules are in place.
ncn-s001# iptables -L -nx -v
Chain INPUT (policy ACCEPT 22144 packets, 28721015 bytes)
pkts bytes target prot opt in out source destination
0 0 DROP tcp -- * * 0.0.0.0/0 10.102.4.135 tcp dpt:22
0 0 ACCEPT tcp -- vlan004 * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080
85 4862 ACCEPT tcp -- lo * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080
276 15438 ACCEPT tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 source IP range 10.252.1.4-10.252.1.14
0 0 LOG tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080 LOG flags 0 level 4 prefix "RADOSGW-DROP"
0 0 DROP tcp -- * * 0.0.0.0/0 0.0.0.0/0 tcp dpt:8080
Chain FORWARD (policy ACCEPT 0 packets, 0 bytes)
pkts bytes target prot opt in out source destination
Chain OUTPUT (policy ACCEPT 22099 packets, 30141111 bytes)
pkts bytes target prot opt in out source destination
Test connectivity after applying the ACL.
Re-run the connectivity test. The output format will be similar, but all tests should now pass:
ncn-m001# bash ./con_test.sh
[i] MGMT RADOS over NMN
RADOS ncn-s001.nmn: PASS
RADOS ncn-s002.nmn: PASS
RADOS ncn-s003.nmn: PASS
[i] MGMT RADOS over HMN
RADOS ncn-s001.hmn: PASS
RADOS ncn-s002.hmn: PASS
RADOS ncn-s003.hmn: PASS
[i] MGMT RADOS over CAN
RADOS ncn-s001.can: PASS
RADOS ncn-s002.can: PASS
RADOS ncn-s003.can: PASS
[i] MGMT HAProxy over NMN
HAPROXY (CEPH) HTTP ncn-s001.nmn: PASS
HAPROXY (CEPH) HTTPS ncn-s001.nmn: PASS
HAPROXY (CEPH) HTTP ncn-s002.nmn: PASS
HAPROXY (CEPH) HTTPS ncn-s002.nmn: PASS
HAPROXY (CEPH) HTTP ncn-s003.nmn: PASS
HAPROXY (CEPH) HTTPS ncn-s003.nmn: PASS
Validate that no connection can be made to HAProxy for ncn-images, or to the Ceph RADOS GW at all, from compute nodes and UANs.
Use rgw-vip, as it will resolve to one of the storage nodes.
nid000002# host rgw-vip
rgw-vip has address 10.252.1.3
nid000002# curl http://rgw-vip/ncn-images/
<html><body><h1>403 Forbidden</h1>
Request forbidden by administrative rules.
</body></html>
nid000002# curl -k https://rgw-vip/ncn-images/
<html><body><h1>403 Forbidden</h1>
Request forbidden by administrative rules.
</body></html>
nid000002# curl --connect-timeout 2 rgw-vip:8080
curl: (28) Connection timed out after 2001 milliseconds
Look for a 403 response in the HAProxy logs:
ncn-m001# pdsh -N -w ncn-s00[1-4] "cd /var/log && zgrep -h -i -E 'haproxy.*frontend' messages || exit 0" | grep "ncn-images"
2022-07-13T13:57:08+00:00 xxx-ncn-s001.local haproxy[43591]: 10.252.1.13:50238 [13/Jul/2022:13:57:08.363] http-rgw-frontend http-rgw-frontend/<NOSRV> 0/-1/-1/-1/0 403 212 - - PR-- 1/1/0/0/0 0/0 "GET /ncn-images/ HTTP/1.1"
2022-07-13T14:01:11+00:00 xxx-ncn-s001.local haproxy[43591]: 10.252.1.13:50240 [13/Jul/2022:14:01:11.038] http-rgw-frontend http-rgw-frontend/<NOSRV> 0/-1/-1/-1/0 403 212 - - PR-- 1/1/0/0/0 0/0 "GET /ncn-images/ HTTP/1.1"
...
In the firewall logs on the storage nodes, the direct Ceph RADOS GW traffic will show as dropped. For example:
ncn-m001# pdsh -N -w ncn-s00[1-4] "grep RADOSGW /var/log/firewall"
2022-07-13T14:02:03.418750+00:00 xxx-ncn-s001 kernel: [4397628.546654] RADOSGW-DROPIN=vlan002 OUT= MAC=b8:59:9f:f9:1d:22:a4:bf:01:3f:6f:91:08:00 SRC=10.252.1.13 DST=10.252.1.3 LEN=52 TOS=0x00 PREC=0x00 TTL=64 ID=9727 DF PROTO=TCP SPT=59278 DPT=8080 WINDOW=42340 RES=0x00 SYN URGP=0
...
For further validation, the following script can be saved to a UAN or compute node; it takes a storage node count as input.
It also tests select cross-network access that should not be possible with a correctly configured switch ACL posture.
nid000002# cat user_con_test.sh
#!/bin/bash
CURL_O="--connect-timeout 2 -f"
NODE_COUNT="$1"

function curl_rept
{
    echo -n "[i] $1 -> "
    # PASS when the request FAILS, since access is expected to be denied.
    $1 &> /dev/null
    if [ $? -ne 0 ]
    then
        echo "PASS"
    else
        echo "FAIL"
    fi
    return
}

for n in `seq 1 $NODE_COUNT`
do
    for t in nmn can hmn
    do
        curl_rept "curl $CURL_O ncn-s00${n}.${t}:8080"                   # Ceph RADOS
        curl_rept "curl $CURL_O http://ncn-s00${n}.${t}/ncn-images/"     # NCN images, HTTP
        curl_rept "curl $CURL_O -k https://ncn-s00${n}.${t}/ncn-images/" # NCN images, HTTPS
    done
done
nid000002# bash ./user_con_test.sh 4
[i] curl --connect-timeout 2 -f ncn-s001.nmn:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s001.nmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s001.nmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s001.can:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s001.can/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s001.can/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s001.hmn:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s001.hmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s001.hmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s002.nmn:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s002.nmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s002.nmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s002.can:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s002.can/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s002.can/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s002.hmn:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s002.hmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s002.hmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s003.nmn:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s003.nmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s003.nmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s003.can:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s003.can/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s003.can/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s003.hmn:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s003.hmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s003.hmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s004.nmn:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s004.nmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s004.nmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s004.can:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s004.can/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s004.can/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f ncn-s004.hmn:8080 -> PASS
[i] curl --connect-timeout 2 -f http://ncn-s004.hmn/ncn-images/ -> PASS
[i] curl --connect-timeout 2 -f -k https://ncn-s004.hmn/ncn-images/ -> PASS
Save the iptables rule set on all storage nodes and make it persistent across reboots.
Create a directory to hold the iptables configuration.
ncn-m001# pdsh -w ncn-s00[1-4] "mkdir --mode=750 /etc/iptables"
Create a one-shot systemd service to load iptables on system boot.
ncn-m001# cat << EOF > metal-iptables.service
[Unit]
Description=Loads Metal iptables config
After=local-fs.target network.service

[Service]
Type=oneshot
ExecStart=/usr/sbin/iptables-restore /etc/iptables/metal.conf
Restart=no
RemainAfterExit=no

[Install]
WantedBy=multi-user.target
EOF
ncn-m001# chmod 640 metal-iptables.service
Distribute the one-shot systemd service to the storage nodes.
ncn-m001# pdcp -w ncn-s00[1-4] metal-iptables.service /usr/lib/systemd/system
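If the new unit file is not immediately visible to systemd, reload the daemon before enabling it:
ncn-m001# pdsh -w ncn-s00[1-4] "systemctl daemon-reload"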
Enable the service.
ncn-m001# pdsh -w ncn-s00[1-4] "systemctl enable metal-iptables.service"
Use iptables-save to commit the running rules to the persistent configuration.
ncn-m001# pdsh -w ncn-s00[1-4] "iptables-save -f /etc/iptables/metal.conf"
Execute the one-shot systemd service.
ncn-m001# pdsh -w ncn-s00[1-4] "systemctl start metal-iptables.service"
Verify the rule set is consistent across nodes.
ncn-m001# pdsh -w ncn-s00[1-4] "cat /etc/iptables/metal.conf" | grep "8080" | dshbak -c
----------------
ncn-s[001-004]
----------------
-A INPUT -i vlan004 -p tcp -m tcp --dport 8080 -j ACCEPT
-A INPUT -i lo -p tcp -m tcp --dport 8080 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 8080 -m iprange --src-range 10.252.1.4-10.252.1.14 -j ACCEPT
-A INPUT -p tcp -m tcp --dport 8080 -j LOG --log-prefix RADOSGW-DROP
-A INPUT -p tcp -m tcp --dport 8080 -j DROP
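After any subsequent reboot of a storage node, the restored rules can be spot-checked (shown here on ncn-s001):
ncn-s001# systemctl status metal-iptables.service
ncn-s001# iptables -L INPUT -n -v | grep 8080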
NOTE: If SMA log forwarders are not yet running, then it might be necessary to temporarily disable the /etc/rsyslog.d/01-cray-rsyslog.conf rule (for logs to flow to the local nodes without delay). Restart rsyslog if this action is required.
Look for RADOSGW drops in /var/log/firewall on the storage nodes. Note that the connectivity test will attempt access over the CAN.
ncn-m001# pdsh -N -w ncn-s00[1-4] "grep RADOSGW /var/log/firewall" | grep vlan007 | head -3
2022-08-01T21:22:01.049443+00:00 ncn-s003 kernel: [13242021.397679] RADOSGW-DROPIN=vlan007 OUT= MAC=14:02:ec:d9:79:d0:94:40:c9:5f:9a:84:08:00 SRC=10.103.13.13 DST=10.103.13.5 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35159 DF PROTO=TCP SPT=60482 DPT=8080 WINDOW=42340 RES=0x00 SYN URGP=0
2022-08-01T21:22:05.061945+00:00 ncn-s001 kernel: [13248180.144514] RADOSGW-DROPIN=vlan007 OUT= MAC=14:02:ec:da:bc:68:94:40:c9:5f:9a:84:08:00 SRC=10.103.13.13 DST=10.103.13.7 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=10604 DF PROTO=TCP SPT=43034 DPT=8080 WINDOW=42340 RES=0x00 SYN URGP=0
2022-08-01T21:22:02.047541+00:00 ncn-s003 kernel: [13242022.399499] RADOSGW-DROPIN=vlan007 OUT= MAC=14:02:ec:d9:79:d0:94:40:c9:5f:9a:84:08:00 SRC=10.103.13.13 DST=10.103.13.5 LEN=60 TOS=0x00 PREC=0x00 TTL=64 ID=35160 DF PROTO=TCP SPT=60482 DPT=8080 WINDOW=42340 RES=0x00 SYN URGP=0
...
Look in /var/log/messages on the storage nodes for HAProxy access log entries with HTTP 403 responses (or other responses, depending upon context).
ncn-m001# pdsh -N -w ncn-s00[1-3] "cd /var/log && zgrep -h 'haproxy.*frontend' messages || exit 0" | grep " 403 " | sort -k 1
2022-08-01T21:36:28+00:00 localhost haproxy[20903]: 10.252.1.20:37248 [01/Aug/2022:21:36:28.679] http-rgw-frontend http-rgw-frontend/<NOSRV> 0/-1/-1/-1/0 403 212 - - PR-- 1/1/0/0/0 0/0 "GET /ncn-images/ HTTP/1.1"
2022-08-01T21:36:57+00:00 localhost haproxy[20903]: 10.252.1.20:53358 [01/Aug/2022:21:36:57.898] https-rgw-frontend~ https-rgw-frontend/<NOSRV> 0/-1/-1/-1/0 403 212 - - PR-- 1/1/0/0/0 0/0 "GET /ncn-images/ HTTP/1.1"
2022-08-01T21:39:35+00:00 localhost haproxy[20903]: 10.252.1.20:40400 [01/Aug/2022:21:39:35.141] https-rgw-frontend~ https-rgw-frontend/<NOSRV> 1/-1/-1/-1/1 403 212 - - PR-- 1/1/0/0/0 0/0 "GET /ncn-images/ HTTP/1.1"
2022-08-01T21:39:35+00:00 localhost haproxy[20903]: 10.252.1.20:57530 [01/Aug/2022:21:39:35.134] http-rgw-frontend http-rgw-frontend/<NOSRV> 0/-1/-1/-1/0 403 212 - - PR-- 1/1/0/0/0 0/0 "GET /ncn-images/ HTTP/1.1"
2022-08-01T21:39:37+00:00 localhost haproxy[20903]: 10.252.1.20:34828 [01/Aug/2022:21:39:37.152] http-rgw-frontend http-rgw-frontend/<NOSRV> 0/-1/-1/-1/0 403 212 - - PR-- 1/1/0/0/0 0/0 "GET /ncn-images/ HTTP/1.1"
2022-08-01T21:39:37+00:00 localhost haproxy[20903]: 10.252.1.20:57896 [01/Aug/2022:21:39:37.159] https-rgw-frontend~ https-rgw-frontend/<NOSRV> 0/-1/-1/-1/0 403 212 - - PR-- 1/1/0/0/0 0/0 "GET /ncn-images/ HTTP/1.1"