Hub Return Traffic Asymmetry
Network Topology

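The Spoke has two ISP links, ISP1 and ISP2. The Hub has a single ISP link.
The Hub runs a dynamic-mode IPsec VPN. The Spoke configures two static tunnels to establish IPsec connections to the Hub.
The Spoke configures SD-WAN with the two IPsec tunnels as members. It also configures an SD-WAN health check:
- The source is the IP address of the Spoke's internal interface port4.
- The target is the IP address of the Hub's internal interface port3.
Routing:
Spoke → Hub: the Spoke has a static route pointing to the SD-WAN zone. The local (source) selector of the IPsec phase 2 is set to the specific subnet 10.10.1.0/24.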
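Hub → Spoke: the Hub uses add-route in IPsec phase 1 and set route-overlap allow in phase 2. With these, it automatically adds routes to the Spoke's internal network and load-balances them across the Spoke's two tunnels.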
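The following Python sketch illustrates why the Hub ends up with two equal-cost routes to 10.10.1.0/24. It rests on two assumptions:
- Each dynamic tunnel instance installs one add-route entry for its phase 2 selector, with distance 15 as in the Hub routing table shown under Symptoms.
- The route-overlap options use-old, use-new and allow can be reduced to simple list operations.

The Route class and helper functions are hypothetical. This is a model, not FortiOS code.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Route:
    dst: str            # destination prefix, e.g. "10.10.1.0/24"
    tunnel: str         # dynamic tunnel instance on the Hub, e.g. "Hub_0"
    distance: int = 15  # distance seen for add-route entries in this article


def add_route(table: List[Route], new: Route, route_overlap: str) -> List[Route]:
    """Return the table after a tunnel instance installs its selector route.

    Simplified model of route-overlap:
      use-old : keep the existing route, ignore the new one
      use-new : replace the existing route with the new one
      allow   : keep both, producing equal-cost (ECMP) paths
    """
    same_dst = [r for r in table if r.dst == new.dst]
    others = [r for r in table if r.dst != new.dst]
    if not same_dst or route_overlap == "allow":
        return table + [new]
    if route_overlap == "use-old":
        return table
    if route_overlap == "use-new":
        return others + [new]
    raise ValueError(f"unknown route-overlap value: {route_overlap}")


def build_table(route_overlap: str) -> List[Route]:
    table: List[Route] = []
    # Both Spoke tunnels negotiate the same source selector 10.10.1.0/24.
    for tunnel in ("Hub_0", "Hub_1"):
        table = add_route(table, Route("10.10.1.0/24", tunnel), route_overlap)
    return table


if __name__ == "__main__":
    for mode in ("use-old", "use-new", "allow"):
        print(mode, [r.tunnel for r in build_table(mode)])
```

With allow, both instances stay in the table. This is the ECMP pair that later sends replies to the "wrong" tunnel.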
Configuration
Basic interface IP and routing configuration is omitted.
Hub
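On the public interface port2, configure an IPsec tunnel of type dynamic:
- Enable add-route in phase 1.
- Set the phase 2 selectors to 0.0.0.0/0 ↔ 0.0.0.0/0.
- Set route-overlap to allow.
- Leave net-device at its default (disabled).

Related information
For add-route, see: VPN → IPsec VPN → IPsec VPN Troubleshooting → Hub-Spoke Static Route Issue. For route-overlap, see: VPN → IPsec VPN → IPsec VPN Troubleshooting → Hub-Spoke Dual-Link Tunnel Flapping.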
config vpn ipsec phase1-interface
    edit "Hub"
        set type dynamic
        set interface "port2"
        set peertype any
        set net-device disable
        set proposal aes128-sha256
        set dpd on-idle
        set psksecret fortinet
        set dpd-retryinterval 60
    next
end
config vpn ipsec phase2-interface
    edit "Hub"
        set phase1name "Hub"
        set proposal aes128-sha1
        set route-overlap allow
    next
end

Configure a firewall policy to allow traffic between the IPsec tunnel and the internal interface port3.
config firewall policy
    edit 1
        set name "VPN_Hub"
        set srcintf "Hub" "port3"
        set dstintf "Hub" "port3"
        set action accept
        set srcaddr "all"
        set dstaddr "all"
        set schedule "always"
        set service "ALL"
    next
end
Spoke
On the two ISP interfaces, port2 and port3, create one IPsec tunnel each, of type static. In phase 2, both tunnels use the specific internal subnet as the local selector: 10.10.1.0/24 ↔ 0.0.0.0/0.
config vpn ipsec phase1-interface
    edit "vpn_line1"
        set interface "port2"
        set peertype any
        set net-device disable
        set proposal aes128-sha256
        set dpd on-idle
        set remote-gw 202.103.3.3
        set psksecret fortinet
    next
    edit "vpn_line2"
        set interface "port3"
        set peertype any
        set net-device disable
        set proposal aes128-sha256
        set dpd on-idle
        set remote-gw 202.103.3.3
        set psksecret fortinet
    next
end
config vpn ipsec phase2-interface
    edit "vpn_line1"
        set phase1name "vpn_line1"
        set proposal aes128-sha1
        set auto-negotiate enable
        set src-subnet 10.10.1.0 255.255.255.0
    next
    edit "vpn_line2"
        set phase1name "vpn_line2"
        set proposal aes128-sha1
        set auto-negotiate enable
        set src-subnet 10.10.1.0 255.255.255.0
    next
end

Add the two IPsec interfaces to the SD-WAN zone and configure an SD-WAN health check. The check source is the IP address of the Spoke's internal interface port4, and the check target is the IP address of the Hub's internal interface port3.
config system sdwan
    set status enable
    config zone
        edit "virtual-wan-link"
        next
    end
    config members
        edit 1
            set interface "vpn_line1"
        next
        edit 2
            set interface "vpn_line2"
        next
    end
    config health-check
        edit "to_Hub"
            set server "10.10.2.1"
            set update-static-route disable
            set source 10.10.1.1
            set members 0
        next
    end
end

Configure a firewall policy to allow traffic between the two IPsec tunnels (the SD-WAN zone) and the internal interface port4.
config firewall policy
    edit 1
        set name "all"
        set srcintf "virtual-wan-link" "port4"
        set dstintf "port4" "virtual-wan-link"
        set action accept
        set srcaddr "all"
        set dstaddr "all"
        set schedule "always"
        set service "ALL"
    next
end

Configure a static route to the Hub's internal network (10.10.2.0/24) with the SD-WAN zone as the outgoing interface.
config router static
    edit 3
        set dst 10.10.2.0 255.255.255.0
        set distance 1
        set sdwan-zone "virtual-wan-link"
    next
end
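The health check above sends probes from 10.10.1.1 to 10.10.2.1 over every SD-WAN member (set members 0). As the Symptoms section below shows, a probe only counts for the member that sent it if its reply returns on that same tunnel.

The hypothetical Python sketch below models that per-member bookkeeping. Its rule "dead only when no probe on the member was answered" is a deliberate simplification, not the FortiOS failure-detection logic.

```python
from collections import Counter
from typing import Dict, List, Optional, Tuple

# (member that sent the request, member the reply came back on, or None if lost)
Probe = Tuple[str, Optional[str]]


def member_states(probes: List[Probe]) -> Dict[str, Tuple[str, float]]:
    """Per-member state and packet loss. A reply is credited only to the
    member whose tunnel it actually returned on."""
    sent = Counter(member for member, _ in probes)
    answered = Counter(member for member, reply_on in probes if reply_on == member)
    states = {}
    for member, total in sent.items():
        loss = 100.0 * (total - answered[member]) / total
        # Simplification: dead only when no probe on this member was answered.
        states[member] = ("dead" if answered[member] == 0 else "alive", loss)
    return states


if __name__ == "__main__":
    # Asymmetric Hub: the replies for both members return on vpn_line1.
    asymmetric = [("vpn_line1", "vpn_line1"), ("vpn_line2", "vpn_line1")] * 5
    print(member_states(asymmetric))
```

Under this model, vpn_line1 reports 0% loss and vpn_line2 reports 100% loss. This matches the health-check output in the Symptoms section, even though vpn_line2's requests did reach the Hub.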
Symptoms
After the two IPsec tunnels between the Spoke and the Hub come up, the SD-WAN health check misbehaves: vpn_line1 shows alive, while vpn_line2 shows dead.
Spoke # diagnose sys sdwan health-check status
Health Check(to_Hub):
Seq(1 vpn_line1): state(alive), packet-loss(0.000%) latency(1.092), jitter(0.172), mos(4.404), bandwidth-up(9999999), bandwidth-dw(9999999), bandwidth-bi(19999998) sla_map=0x0
Seq(2 vpn_line2): state(dead), packet-loss(100.000%) sla_map=0x0

A packet capture on the Spoke shows the health check sending an ICMP Request out of each tunnel, vpn_line1 and vpn_line2. However, both ICMP Replies come back on vpn_line1. The Hub is sending the ICMP Reply into the wrong tunnel, so the Spoke marks vpn_line2 as dead.

Spoke # diagnose sniffer packet any 'host 10.10.2.1 and icmp' 4
Using Original Sniffing Mode
interfaces=[any]
filters=[host 10.10.2.1 and icmp]
0.277013 vpn_line1 out 10.10.1.1 -> 10.10.2.1: icmp: echo request
0.277148 vpn_line2 out 10.10.1.1 -> 10.10.2.1: icmp: echo request
0.278120 vpn_line1 in 10.10.2.1 -> 10.10.1.1: icmp: echo reply
0.278160 vpn_line1 in 10.10.2.1 -> 10.10.1.1: icmp: echo reply

The routing table on the Hub shows the route to the Spoke's internal network (10.10.1.0/24) load-balanced as equal-cost paths across the two IPsec tunnels, Hub_0 and Hub_1.

Hub # get router info routing-table all
......
Routing table for VRF=0
S       10.10.1.0/24 [15/0] via Hub_1 tunnel 202.103.1.2, [1/0]
                     [15/0] via Hub_0 tunnel 202.103.2.2, [1/0]
......

On the Hub, look at the two sessions of the health check initiated by the Spoke. Because net-device is not enabled on the Hub, both sessions have the same ingress interface: the dynamic-mode IPsec interface Hub (index = 16).

Hub # diagnose netlink interface list Hub | grep index
if=Hub family=00 type=768 index=16 mtu=1420 link=0 master=0

session info: proto=1 proto_state=00 duration=6288 expire=59 timeout=0 refresh_dir=both flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ tun_id=0.0.0.0/202.103.1.2 vlan_cos=0/0
state=local may_dirty
statistic(bytes/packets/allow_err): org=501840/12546/1 reply=501840/12546/1 tuples=2
tx speed(Bps/kbps): 79/0 rx speed(Bps/kbps): 79/0
orgin->sink: org pre->in, reply out->post dev=16->8/8->16 gwy=0.0.0.0/0.0.0.0   <---- ingress interface is Hub
hook=pre dir=org act=noop 10.10.1.1:31400->10.10.2.1:8(0.0.0.0:0)
hook=post dir=reply act=noop 10.10.2.1:31400->10.10.1.1:0(0.0.0.0:0)
misc=0 policy_id=1 pol_uuid_idx=15849 auth_info=0 chk_client_info=0 vd=0
serial=0000a1ef tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=00000000
no_ofld_reason: local

session info: proto=1 proto_state=00 duration=3401 expire=59 timeout=0 refresh_dir=both flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ tun_id=0.0.0.0/202.103.2.2 vlan_cos=0/0
state=local may_dirty
statistic(bytes/packets/allow_err): org=271160/6779/1 reply=271160/6779/1 tuples=2
tx speed(Bps/kbps): 79/0 rx speed(Bps/kbps): 79/0
orgin->sink: org pre->in, reply out->post dev=16->8/8->16 gwy=0.0.0.0/0.0.0.0   <---- ingress interface is Hub
hook=pre dir=org act=noop 10.10.1.1:32554->10.10.2.1:8(0.0.0.0:0)
hook=post dir=reply act=noop 10.10.2.1:32554->10.10.1.1:0(0.0.0.0:0)
misc=0 policy_id=1 pol_uuid_idx=15849 auth_info=0 chk_client_info=0 vd=0
serial=0000a704 tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=00000000
no_ofld_reason: local

The Hub shows the two IPsec tunnels the Spoke has established, Hub_0 and Hub_1. Debug flow on the Hub shows that every ICMP Reply is sent into Hub_0. Even the ICMP Reply to the ICMP Request received on Hub_1 returns via Hub_0, so traffic does not leave through the tunnel it arrived on.
Hub # get vpn ipsec tunnel summary
'Hub_0' 202.103.1.2:0  selectors(total,up): 1/1  rx(pkt,err): 158890/0  tx(pkt,err): 317780/0
'Hub_1' 202.103.2.2:0  selectors(total,up): 1/1  rx(pkt,err): 158890/0  tx(pkt,err): 0/0

Hub # diagnose debug flow filter daddr 10.10.1.1
Hub # diagnose debug flow filter proto 1
Hub # diagnose debug enable
Hub # diagnose debug flow trace start 200
↓↓↓↓ 1st ICMP Reply (ICMP Request received on Hub_0), returned via Hub_0 ↓↓↓↓
id=65308 trace_id=20 func=print_pkt_detail line=6005 msg="vd-root:0 received a packet(proto=1, 10.10.2.1:31400->10.10.1.1:0) tun_id=0.0.0.0 from local. type=0, code=0, id=31400, seq=981."
id=65308 trace_id=20 func=resolve_ip_tuple_fast line=6107 msg="Find an existing session, id-0000a1ef, reply direction"
id=65308 trace_id=20 func=ip_session_core_in line=6732 msg="dir-1, tun_id=202.103.1.2"   <---- reply to the Request received on Hub_0; tun_id is the public IP of the Spoke's vpn_line1
id=65308 trace_id=20 func=ipsecdev_hard_start_xmit line=662 msg="enter IPSec interface Hub, tun_id=202.103.1.2"
id=65308 trace_id=20 func=_do_ipsecdev_hard_start_xmit line=222 msg="output to IPSec tunnel Hub_0, tun_id=202.103.1.2, vrf 0"   <---- sent into Hub_0, which is correct
id=65308 trace_id=20 func=esp_output4 line=917 msg="IPsec encrypt/auth"
id=65308 trace_id=20 func=ipsec_output_finish line=676 msg="send to 202.103.3.1 via intf-port2"
↓↓↓↓ 2nd ICMP Reply (ICMP Request received on Hub_1), returned via Hub_0 ↓↓↓↓
id=65308 trace_id=19 func=print_pkt_detail line=6005 msg="vd-root:0 received a packet(proto=1, 10.10.2.1:31596->10.10.1.1:0) tun_id=0.0.0.0 from local. type=0, code=0, id=31596, seq=981."
id=65308 trace_id=19 func=resolve_ip_tuple_fast line=6107 msg="Find an existing session, id-0000a2bd, reply direction"
id=65308 trace_id=19 func=ip_session_core_in line=6732 msg="dir-1, tun_id=202.103.2.2"   <---- reply to the Request received on Hub_1; tun_id is the public IP of the Spoke's vpn_line2
id=65308 trace_id=19 func=ipsecdev_hard_start_xmit line=662 msg="enter IPSec interface Hub, tun_id=202.103.2.2"
id=65308 trace_id=19 func=_do_ipsecdev_hard_start_xmit line=222 msg="output to IPSec tunnel Hub_0, tun_id=202.103.1.2, vrf 0"   <---- sent into Hub_0, which is wrong
id=65308 trace_id=19 func=esp_output4 line=917 msg="IPsec encrypt/auth"
id=65308 trace_id=19 func=ipsec_output_finish line=676 msg="send to 202.103.3.1 via intf-port2"
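Longer traces can be checked with a small Python helper. For each trace_id, it compares two values:
- the tun_id recorded by ip_session_core_in, which is the peer the session belongs to;
- the tun_id of the tunnel that _do_ipsecdev_hard_start_xmit actually used.

This is a hypothetical sketch that assumes the message formats exactly as printed above. SAMPLE is an excerpt of that output.

```python
import re
from collections import defaultdict

# Excerpt of the debug flow above: trace 20 answers a Hub_0 request, trace 19 a Hub_1 request.
SAMPLE = """\
id=65308 trace_id=20 func=ip_session_core_in line=6732 msg="dir-1, tun_id=202.103.1.2"
id=65308 trace_id=20 func=_do_ipsecdev_hard_start_xmit line=222 msg="output to IPSec tunnel Hub_0, tun_id=202.103.1.2, vrf 0"
id=65308 trace_id=19 func=ip_session_core_in line=6732 msg="dir-1, tun_id=202.103.2.2"
id=65308 trace_id=19 func=_do_ipsecdev_hard_start_xmit line=222 msg="output to IPSec tunnel Hub_0, tun_id=202.103.1.2, vrf 0"
"""

# Session lookup: which peer (tun_id) the matched session belongs to.
SESSION_RE = re.compile(r'trace_id=(\d+) func=ip_session_core_in .*msg="dir-1, tun_id=([\d.]+)"')
# Transmit: which tunnel instance the reply was actually encrypted into.
OUTPUT_RE = re.compile(
    r'trace_id=(\d+) func=_do_ipsecdev_hard_start_xmit .*'
    r'msg="output to IPSec tunnel ([^,\s]+), tun_id=([\d.]+),'
)


def check_reply_tunnels(trace: str) -> dict:
    """Group debug-flow lines by trace_id and flag replies that leave through
    a tunnel other than the one their session belongs to."""
    result = defaultdict(dict)
    for line in trace.splitlines():
        m = SESSION_RE.search(line)
        if m:
            result[m.group(1)]["session_tun_id"] = m.group(2)
            continue
        m = OUTPUT_RE.search(line)
        if m:
            result[m.group(1)]["out_tunnel"] = m.group(2)
            result[m.group(1)]["out_tun_id"] = m.group(3)
    for info in result.values():
        info["symmetric"] = (
            "session_tun_id" in info and info.get("session_tun_id") == info.get("out_tun_id")
        )
    return dict(result)


if __name__ == "__main__":
    for trace_id, info in sorted(check_reply_tunnels(SAMPLE).items()):
        print(trace_id, info)
```

Trace 19 is flagged. Its session belongs to 202.103.2.2 (vpn_line2), but the reply is encrypted into Hub_0, whose peer is 202.103.1.2.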
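Cause
By default, net-device is disabled on the Hub's dynamic-mode IPsec tunnel, so the multiple tunnels from one Spoke share a single phase1-interface. The Hub chooses the return path for a reply as follows:
- Route lookup (usually ECMP, with all paths pointing to the same phase1).
- Selector (proxy-id) match, that is, the interesting traffic.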
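If several tunnels carry the same selectors (interesting traffic), multiple SAs match at the same time. The Hub keeps no binding to the ingress tunnel, so the reply lands on the tunnel of whichever route has higher priority or matches first. The result is asymmetric return traffic.
Typical trigger conditions: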
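- The Hub tunnel is in dynamic mode with net-device disabled.
- The same Spoke has multiple tunnels.
- The selectors (proxy-ids, interesting traffic) of that Spoke's tunnels overlap.
- The Hub's return path is ECMP or aggregated tunnels (multiple candidate paths with equal weight).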
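To make the cause concrete, here is a toy Python model of the Hub's reply decision when net-device is disabled. It assumes, as described above, that the reply lookup sees only the destination address and picks the first SA whose selector matches, with no record of the ingress tunnel. The SA class, the SA ordering and the function name are hypothetical.

```python
import ipaddress
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SA:
    tunnel: str      # dynamic tunnel instance on the Hub, e.g. "Hub_0"
    peer: str        # Spoke public IP (the tun_id in debug flow)
    remote_sel: str  # Spoke-side phase 2 selector


# Both instances hang off phase1 "Hub" and carry the identical selector 10.10.1.0/24.
HUB_SAS = [
    SA("Hub_0", "202.103.1.2", "10.10.1.0/24"),
    SA("Hub_1", "202.103.2.2", "10.10.1.0/24"),
]


def pick_sa_first_match(sas: List[SA], dst_ip: str) -> SA:
    """Reply lookup with no ingress binding: the first SA whose selector
    covers the destination wins, regardless of where the request came in."""
    addr = ipaddress.ip_address(dst_ip)
    for sa in sas:
        if addr in ipaddress.ip_network(sa.remote_sel):
            return sa
    raise LookupError(f"no SA matches {dst_ip}")


if __name__ == "__main__":
    # Health-check requests from 10.10.1.1 arrive on both instances...
    for ingress in ("Hub_0", "Hub_1"):
        # ...but the reply lookup only sees the destination, not the ingress tunnel.
        egress = pick_sa_first_match(HUB_SAS, "10.10.1.1").tunnel
        print(f"request in {ingress} -> reply out {egress}")
```

Both replies come out of Hub_0. This mirrors the tunnel summary above, where Hub_1's tx counter stays at 0.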
Solution 1: Configure location-id on the Spoke
Important
Configuration requirement:
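Configure a unique location-id on each Spoke. The Hub then groups the dial-up tunnels from the same Spoke together and caches the original ingress tunnel. Replies prefer the ingress tunnel within that group, which keeps the traffic symmetric.

location-id is only an identifier that lets the Hub recognize the tunnels belonging to one Spoke; it is not a real IP address. It is still recommended to set it to an IP address that identifies the Spoke, such as a loopback interface address.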
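How it works:
- Grouping: when the Hub receives packets, it reads the location-id reported by the remote peer. Dial-up tunnels with the same location-id on the same phase1-interface go into one site group.
- Tunnel binding: the Hub caches the original ingress tunnel of the traffic. This forms a "tunnel group key", an internal identifier based on phase1 + location-id + SPI.
- Reply path selection: a reply goes straight back into the ingress tunnel within the group. If that tunnel is unavailable, the Hub switches to another tunnel in the same group; it never crosses over to another Spoke's tunnels.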
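The three steps above can be sketched in Python. This is a simplified model under stated assumptions, not FortiOS internals:
- The "tunnel group key" is reduced to a per-flow cache entry that maps a hypothetical flow key to the name of the ingress tunnel.
- A site group is any set of tunnels sharing the same phase1 and location-id.

The class names, the flow-key layout and the third Spoke's location-id 10.10.9.1 are all illustrative.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

FlowKey = Tuple[str, str, int, int]  # hypothetical: (src, dst, proto, icmp id)


@dataclass
class Tunnel:
    name: str          # dynamic tunnel instance, e.g. "Hub_0"
    phase1: str        # parent phase1-interface, e.g. "Hub"
    location_id: str   # location-id the Spoke reported during IKE
    up: bool = True


@dataclass
class SiteAwareReplySelector:
    tunnels: List[Tunnel]
    ingress_cache: Dict[FlowKey, str] = field(default_factory=dict)

    def _by_name(self, name: str) -> Tunnel:
        return next(t for t in self.tunnels if t.name == name)

    def site_group(self, tunnel: Tunnel) -> List[Tunnel]:
        """Same phase1-interface + same location-id => same Spoke site group."""
        return [t for t in self.tunnels
                if t.phase1 == tunnel.phase1 and t.location_id == tunnel.location_id]

    def on_request(self, flow: FlowKey, ingress: str) -> None:
        """Remember the tunnel the original packet arrived on."""
        self.ingress_cache[flow] = ingress

    def reply_tunnel(self, flow: FlowKey) -> Optional[str]:
        name = self.ingress_cache.get(flow)
        if name is None:
            return None
        ingress = self._by_name(name)
        if ingress.up:
            return ingress.name               # symmetric: back out the ingress tunnel
        for t in self.site_group(ingress):    # fail over inside the same Spoke only
            if t.up:
                return t.name
        return None                           # never borrow another Spoke's tunnel


if __name__ == "__main__":
    tunnels = [
        Tunnel("Hub_0", "Hub", "10.10.1.1"),
        Tunnel("Hub_1", "Hub", "10.10.1.1"),
        Tunnel("Hub_2", "Hub", "10.10.9.1"),  # hypothetical second Spoke
    ]
    sel = SiteAwareReplySelector(tunnels)
    sel.on_request(("10.10.1.1", "10.10.2.1", 1, 31400), "Hub_0")
    sel.on_request(("10.10.1.1", "10.10.2.1", 1, 31596), "Hub_1")
    print(sel.reply_tunnel(("10.10.1.1", "10.10.2.1", 1, 31596)))
```

If Hub_1 goes down, the second flow fails over to Hub_0, which shares the same location-id. If both Hub_0 and Hub_1 are down, the sketch returns None rather than using Hub_2. This is also why each Spoke needs a unique location-id.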
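On the Spoke, configure location-id. All tunnels of one Spoke use the same location-id.

config system settings
    set location-id 10.10.1.1
end

Rebuild the tunnels between the Spoke and the Hub, then check the IPsec phase 1 status on the Hub. The Spoke has passed its location-id to the Hub, and the Hub has marked both tunnels established by the Spoke with the same remote_location: 10.10.1.1.

Tip
location-id is carried in the IKE negotiation:
- IKEv1: a Fortinet-proprietary field in messages 5 and 6 of Main Mode.
- IKEv2: a Fortinet-proprietary field in the IKE_AUTH messages.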
Hub # diagnose vpn ike gateway list

vd: root/0
name: Hub_0
version: 1
interface: port2 4
addr: 202.103.3.3:500 -> 202.103.1.2:500
tun_id: 202.103.1.2/::10.0.0.62
remote_location: 10.10.1.1   <---- location-id configured on the Spoke
network-id: 0
transport: UDP
created: 3s ago
peer-id: 202.103.1.2
peer-id-auth: no
pending-queue: 0
IKE SA: created 1/1  established 1/1  time 0/0/0 ms
IPsec SA: created 1/1  established 1/1  time 10/10/10 ms

  id/spi: 72 d1e24bba9ed24c5b/d31a2eaa6a2cec9f
  direction: responder
  status: established 3-3s ago = 0ms
  proposal: aes128-sha256
  key: 827f2d4a98e8918f-77fd9d748ea39e48
  QKD: no
  lifetime/rekey: 86400/86126
  DPD sent/recv: 00000000/00000000
  peer-id: 202.103.1.2

vd: root/0
name: Hub_1
version: 1
interface: port2 4
addr: 202.103.3.3:500 -> 202.103.2.2:500
tun_id: 202.103.2.2/::10.0.0.63
remote_location: 10.10.1.1   <---- location-id configured on the Spoke
network-id: 0
transport: UDP
created: 2s ago
peer-id: 202.103.2.2
peer-id-auth: no
pending-queue: 0
IKE SA: created 1/1  established 1/1  time 0/0/0 ms
IPsec SA: created 1/1  established 1/1  time 0/0/0 ms

  id/spi: 73 4324ed445892ef2b/c7efb986832df49b
  direction: responder
  status: established 2-2s ago = 0ms
  proposal: aes128-sha256
  key: b40c73f8b0deb162-7144c5911025e8fe
  QKD: no
  lifetime/rekey: 86400/86127
  DPD sent/recv: 00000000/00000000
  peer-id: 202.103.2.2
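With many Spokes, it helps to group the gateway list by remote_location. Each group should then contain exactly the tunnels of one Spoke.

The Python sketch below assumes only that the name: and remote_location: labels appear as in the output above, with each remote_location following its own name. SAMPLE is a trimmed excerpt of that output, and the function name is hypothetical.

```python
import re
from collections import defaultdict

# Trimmed excerpt of the gateway list above (only the fields this sketch reads).
SAMPLE = """\
vd: root/0
name: Hub_0
tun_id: 202.103.1.2/::10.0.0.62
remote_location: 10.10.1.1
vd: root/0
name: Hub_1
tun_id: 202.103.2.2/::10.0.0.63
remote_location: 10.10.1.1
"""

FIELD_RE = re.compile(r"\b(name|remote_location):\s*(\S+)")


def tunnels_by_location(gw_list: str) -> dict:
    """Map each remote_location to the tunnel names that reported it."""
    groups = defaultdict(list)
    current = None
    for key, value in FIELD_RE.findall(gw_list):
        if key == "name":
            current = value              # start of a new gateway entry
        elif current is not None:
            groups[value].append(current)
            current = None
    return dict(groups)


if __name__ == "__main__":
    for location, names in tunnels_by_location(SAMPLE).items():
        print(location, names)
```

For the output above, the result is a single group, 10.10.1.1, containing Hub_0 and Hub_1. If a group mixed tunnels from different Spokes, that would suggest a duplicated location-id.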
Check the SD-WAN health-check status on the Spoke. Both tunnels are now alive.

Spoke # diagnose sys sdwan health-check status
Health Check(to_Hub):
Seq(1 vpn_line1): state(alive), packet-loss(0.000%) latency(1.358), jitter(0.377), mos(4.403), bandwidth-up(9999999), bandwidth-dw(9999999), bandwidth-bi(19999998) sla_map=0x0
Seq(2 vpn_line2): state(alive), packet-loss(0.000%) latency(1.194), jitter(0.249), mos(4.404), bandwidth-up(9999999), bandwidth-dw(9999999), bandwidth-bi(19999998) sla_map=0x0

A packet capture on the Spoke shows that the Hub now sends each ICMP Reply back through the correct tunnel.
Spoke # diagnose sniffer packet any 'host 10.10.2.1 and icmp' 4
Using Original Sniffing Mode
interfaces=[any]
filters=[host 10.10.2.1 and icmp]
0.241158 vpn_line1 out 10.10.1.1 -> 10.10.2.1: icmp: echo request
0.241283 vpn_line2 out 10.10.1.1 -> 10.10.2.1: icmp: echo request
0.242357 vpn_line1 in 10.10.2.1 -> 10.10.1.1: icmp: echo reply
0.242594 vpn_line2 in 10.10.2.1 -> 10.10.1.1: icmp: echo reply
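To compare the capture above with the one in the Symptoms section, a small Python helper can count echo requests sent and echo replies received per tunnel. It flags any tunnel that received fewer replies than it sent requests.

The sketch assumes the sniffer line format shown in this article (verbosity 4). BEFORE and AFTER are copies of the two captures. Counting over a short window is only a rough heuristic, since replies still in flight at the end of a capture would also be flagged.

```python
import re
from collections import defaultdict

BEFORE = """\
0.277013 vpn_line1 out 10.10.1.1 -> 10.10.2.1: icmp: echo request
0.277148 vpn_line2 out 10.10.1.1 -> 10.10.2.1: icmp: echo request
0.278120 vpn_line1 in 10.10.2.1 -> 10.10.1.1: icmp: echo reply
0.278160 vpn_line1 in 10.10.2.1 -> 10.10.1.1: icmp: echo reply
"""
AFTER = """\
0.241158 vpn_line1 out 10.10.1.1 -> 10.10.2.1: icmp: echo request
0.241283 vpn_line2 out 10.10.1.1 -> 10.10.2.1: icmp: echo request
0.242357 vpn_line1 in 10.10.2.1 -> 10.10.1.1: icmp: echo reply
0.242594 vpn_line2 in 10.10.2.1 -> 10.10.1.1: icmp: echo reply
"""

# <timestamp> <interface> <in|out> <src> -> <dst>: icmp: echo <request|reply>
PKT_RE = re.compile(
    r"^\s*\d+\.\d+\s+(\S+)\s+(in|out)\s+\S+\s+->\s+\S+:\s+icmp:\s+echo\s+(request|reply)",
    re.MULTILINE,
)


def echo_counts(capture: str) -> dict:
    """Per interface: echo requests sent out and echo replies received."""
    counts = defaultdict(lambda: {"request_out": 0, "reply_in": 0})
    for m in PKT_RE.finditer(capture):
        iface, direction, kind = m.groups()
        if direction == "out" and kind == "request":
            counts[iface]["request_out"] += 1
        elif direction == "in" and kind == "reply":
            counts[iface]["reply_in"] += 1
    return {iface: dict(c) for iface, c in counts.items()}


def asymmetric_members(capture: str) -> list:
    """Tunnels that received fewer replies than they sent requests."""
    return sorted(i for i, c in echo_counts(capture).items() if c["reply_in"] < c["request_out"])


if __name__ == "__main__":
    print("before:", asymmetric_members(BEFORE))
    print("after: ", asymmetric_members(AFTER))
```

On the first capture, vpn_line2 is flagged while vpn_line1 shows one request and two replies. On the capture above, nothing is flagged.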
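Solution 2: Enable net-device on the Hub
Note
This method can cause performance issues on the Hub, so it is not recommended in deployments with a large number of Spokes.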
Enable net-device in the Hub's IPsec phase 1. The tunnels are rebuilt automatically.

config vpn ipsec phase1-interface
    edit "Hub"
        set net-device enable
    next
end

On the Hub, look at the two sessions of the health check initiated by the Spoke. With net-device enabled, each session's ingress interface is now a dynamically created tunnel interface (Hub_0: index = 17, Hub_1: index = 18; the parent interface Hub keeps index = 16). Replies now follow the session back out through the interface they arrived on.

Hub # diagnose netlink interface list | grep Hub
if=Hub family=00 type=768 index=16 mtu=1420 link=0 master=0
if=Hub_0 family=00 type=768 index=17 mtu=1420 link=16 master=0
if=Hub_1 family=00 type=768 index=18 mtu=1420 link=16 master=0

session info: proto=1 proto_state=00 duration=6836 expire=59 timeout=0 refresh_dir=both flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ tun_id=0.0.0.0/202.103.1.2 vlan_cos=0/0
state=local may_dirty
statistic(bytes/packets/allow_err): org=545680/13642/1 reply=545680/13642/1 tuples=2
tx speed(Bps/kbps): 80/0 rx speed(Bps/kbps): 80/0
orgin->sink: org pre->in, reply out->post dev=17->8/8->17 gwy=0.0.0.0/0.0.0.0   <---- ingress interface is Hub_0
hook=pre dir=org act=noop 10.10.1.1:31400->10.10.2.1:8(0.0.0.0:0)
hook=post dir=reply act=noop 10.10.2.1:31400->10.10.1.1:0(0.0.0.0:0)
misc=0 policy_id=1 pol_uuid_idx=15849 auth_info=0 chk_client_info=0 vd=0
serial=0000a1ef tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=00000000
no_ofld_reason: local

session info: proto=1 proto_state=00 duration=3949 expire=59 timeout=0 refresh_dir=both flags=00000000 socktype=0 sockport=0 av_idx=0 use=3
origin-shaper=
reply-shaper=
per_ip_shaper=
class_id=0 ha_id=0 policy_dir=0 tunnel=/ tun_id=0.0.0.0/202.103.2.2 vlan_cos=0/0
state=local may_dirty
statistic(bytes/packets/allow_err): org=315000/7875/1 reply=315000/7875/1 tuples=2
tx speed(Bps/kbps): 80/0 rx speed(Bps/kbps): 80/0
orgin->sink: org pre->in, reply out->post dev=18->8/8->18 gwy=0.0.0.0/0.0.0.0   <---- ingress interface is Hub_1
hook=pre dir=org act=noop 10.10.1.1:32554->10.10.2.1:8(0.0.0.0:0)
hook=post dir=reply act=noop 10.10.2.1:32554->10.10.1.1:0(0.0.0.0:0)
misc=0 policy_id=1 pol_uuid_idx=15849 auth_info=0 chk_client_info=0 vd=0
serial=0000a704 tos=ff/ff app_list=0 app=0 url_cat=0
rpdb_link_id=00000000 ngfwid=n/a
npu_state=00000000
no_ofld_reason: local
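A small Python helper makes the before/after difference explicit. It resolves the interface indexes in each session's dev= field to names using the netlink list. Before the change, both sessions resolve to the shared parent interface Hub, so the session alone cannot tell the two tunnels apart. After enabling net-device, each session resolves to its own tunnel instance.

This is a sketch that assumes the field formats shown in this article. The NETLINK_* and SESSIONS_* strings are trimmed excerpts of the outputs above, and the function name is hypothetical.

```python
import re

NETLINK_BEFORE = "if=Hub family=00 type=768 index=16 mtu=1420 link=0 master=0\n"
NETLINK_AFTER = """\
if=Hub family=00 type=768 index=16 mtu=1420 link=0 master=0
if=Hub_0 family=00 type=768 index=17 mtu=1420 link=16 master=0
if=Hub_1 family=00 type=768 index=18 mtu=1420 link=16 master=0
"""
SESSIONS_BEFORE = """\
session info: proto=1 proto_state=00 duration=6288 expire=59 timeout=0
class_id=0 ha_id=0 policy_dir=0 tunnel=/ tun_id=0.0.0.0/202.103.1.2 vlan_cos=0/0
orgin->sink: org pre->in, reply out->post dev=16->8/8->16 gwy=0.0.0.0/0.0.0.0
session info: proto=1 proto_state=00 duration=3401 expire=59 timeout=0
class_id=0 ha_id=0 policy_dir=0 tunnel=/ tun_id=0.0.0.0/202.103.2.2 vlan_cos=0/0
orgin->sink: org pre->in, reply out->post dev=16->8/8->16 gwy=0.0.0.0/0.0.0.0
"""
SESSIONS_AFTER = """\
session info: proto=1 proto_state=00 duration=6836 expire=59 timeout=0
class_id=0 ha_id=0 policy_dir=0 tunnel=/ tun_id=0.0.0.0/202.103.1.2 vlan_cos=0/0
orgin->sink: org pre->in, reply out->post dev=17->8/8->17 gwy=0.0.0.0/0.0.0.0
session info: proto=1 proto_state=00 duration=3949 expire=59 timeout=0
class_id=0 ha_id=0 policy_dir=0 tunnel=/ tun_id=0.0.0.0/202.103.2.2 vlan_cos=0/0
orgin->sink: org pre->in, reply out->post dev=18->8/8->18 gwy=0.0.0.0/0.0.0.0
"""

IF_RE = re.compile(r"if=(\S+)\s.*?index=(\d+)")
PEER_RE = re.compile(r"tun_id=0\.0\.0\.0/(\S+)")
DEV_RE = re.compile(r"dev=(\d+)->(\d+)/(\d+)->(\d+)")  # org in->out / reply in->out


def session_interfaces(netlink: str, sessions: str) -> dict:
    """Return {Spoke peer IP: (original ingress interface, reply egress interface)}."""
    names = {int(idx): name for name, idx in IF_RE.findall(netlink)}
    result = {}
    for chunk in sessions.split("session info:")[1:]:
        peer = PEER_RE.search(chunk).group(1)
        org_in, _org_out, _rep_in, rep_out = (int(x) for x in DEV_RE.search(chunk).groups())
        result[peer] = (names.get(org_in, str(org_in)), names.get(rep_out, str(rep_out)))
    return result


if __name__ == "__main__":
    print("before:", session_interfaces(NETLINK_BEFORE, SESSIONS_BEFORE))
    print("after: ", session_interfaces(NETLINK_AFTER, SESSIONS_AFTER))
```

In both cases, the reply leaves through the session's ingress interface. Only with net-device enabled is that interface a per-tunnel instance, which is what makes the return path symmetric at the tunnel level.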