SD-WAN Troubleshooting
SD-WAN Logs
Health Checks
Effect of health checks on routing:
The health check detects an outage and removes the static route on the corresponding interface.
date=2024-01-20 time=17:06:31 eventtime=1618963591590008160 tz="-0700" logid="0100022921" type="event" subtype="system" level="critical" vd="root" logdesc="Routing information changed" name="test" interface="R150" status="down" msg="Static route on interface R150 may be removed by health-check test. Route: (10.100.1.2->10.100.2.22 ping-down)"
The health check detects recovery and restores the static route on the corresponding interface.
date=2024-01-20 time=17:11:46 eventtime=1618963906950174240 tz="-0700" logid="0100022921" type="event" subtype="system" level="critical" vd="root" logdesc="Routing information changed" name="test" interface="R150" status="up" msg="Static route on interface R150 may be added by health-check test. Route: (10.100.1.2->10.100.2.22 ping-up)"
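When many such events need to be filtered, it can help to turn each log line into a dictionary first. The following is a minimal sketch, assuming every field has the `key=value` or `key="value"` shape seen in the two log lines above. The name `parse_log` and the abridged sample line are illustrative only and are not part of FortiOS.

```python
import re

# Matches key=value pairs; a value is either double-quoted or a bare token.
FIELD_RE = re.compile(r'(\w+)=(?:"([^"]*)"|(\S+))')

def parse_log(line):
    """Return the key=value fields of one log line as a dict of strings."""
    fields = {}
    for key, quoted, bare in FIELD_RE.findall(line):
        # Exactly one of the two value groups is non-empty for a given field.
        fields[key] = quoted if quoted else bare
    return fields

# Abridged copy of the "down" event shown above (some fields omitted).
sample = (
    'date=2024-01-20 time=17:06:31 logid="0100022921" type="event" '
    'subtype="system" level="critical" vd="root" name="test" '
    'interface="R150" status="down" '
    'msg="Static route on interface R150 may be removed by health-check test. '
    'Route: (10.100.1.2->10.100.2.22 ping-down)"'
)

event = parse_log(sample)
print(event["name"], event["interface"], event["status"])
```

The same function can be applied to the SD-WAN events in the rest of this section, for example to select lines by `logid` or `eventtype`.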
State change of a member in an SD-WAN health check:
The health check fails (dead), and the member stops forwarding traffic.
date=2024-01-20 time=23:04:32 eventtime=1618985072898756700 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" interface="R150" member="1" serviceid=1 service="test" gateway=10.100.1.1 msg="Member link is unreachable or miss threshold. Stop forwarding traffic. "
The health check recovers from dead to alive, and the member resumes forwarding traffic.
date=2024-01-20 time=23:06:08 eventtime=1618985168018789600 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" interface="R150" member="1" serviceid=1 service="test" gateway=10.100.1.1 msg="Member link is available. Start forwarding traffic. "
SLA targets in a health check:
The member fails the SLA target.
date=2024-01-20 time=21:32:33 eventtime=1618979553388763760 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Health Check" healthcheck="test" slatargetid=1 oldvalue="2" newvalue="1" msg="Number of pass member changed."
date=2024-01-20 time=21:32:33 eventtime=1618979553388751880 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Health Check" healthcheck="test" slatargetid=1 member="1" msg="Member status changed. Member out-of-sla."
The member recovers from failing the SLA target to meeting it.
date=2024-01-20 time=21:38:49 eventtime=1618979929908765200 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Health Check" healthcheck="test" slatargetid=1 oldvalue="1" newvalue="2" msg="Number of pass member changed."
date=2024-01-20 time=21:38:49 eventtime=1618979929908754060 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="information" vd="root" logdesc="SDWAN status" eventtype="Health Check" healthcheck="test" slatargetid=1 member="1" msg="Member status changed. Member in sla."
Member Forwarding State
Bandwidth usage on a link:
Usage has reached the bandwidth configured for the member, and the member stops forwarding traffic.
date=2024-01-20 time=21:55:14 eventtime=1618980914728863220 tz="-0700" logid="0113022924" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN volume status" eventtype="Volume" interface="R160" member="2" msg="Member enters into conservative status with limited ablity to receive new sessions for too much traffic."
Usage has dropped back below the configured bandwidth, and the member resumes forwarding traffic.
date=2024-01-20 time=22:12:52 eventtime=1618981972698753360 tz="-0700" logid="0113022924" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN volume status" eventtype="Volume" interface="R160" member="2" msg="Member resume normal status to receive new sessions for internal adjustment"
With an SD-WAN rule of SLA type (Lowest Cost/Maximize bandwidth), the order of forwarding members changes because an SLA check failed.
date=2024-01-20 time=22:40:46 eventtime=1618983646428803040 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" serviceid=1 service="test" seq="2,1" msg="Service prioritized by SLA will be redirected in sequence order."
With an SD-WAN rule of Lowest Cost type, the order of forwarding members changes because the SLA check recovered from failing to passing.
date=2024-01-20 time=22:41:51 eventtime=1618983711678827920 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" serviceid=1 service="test" seq="1,2" msg="Service prioritized by SLA will be redirected in sequence order."
With an SD-WAN rule of Best Quality type, the order of forwarding members changes.
date=2024-01-20 time=22:56:55 eventtime=1618984615708804760 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" serviceid=1 service="test" metric="packet-loss" seq="2,1" msg="Service prioritized by performance metric will be redirected in sequence order."
date=2024-01-20 time=22:56:58 eventtime=1618984618278852140 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" serviceid=1 service="test" metric="packet-loss" seq="1,2" msg="Service prioritized by performance metric will be redirected in sequence order."
With an SD-WAN rule of Maximize bandwidth type:
A forwarding member fails the SLA criteria and stops forwarding traffic.
date=2024-01-20 time=23:10:24 eventtime=1618985425048820800 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" serviceid=1 service="test" member="2(R160)" msg="Service will be load balanced among members with available routing."
A forwarding member recovers from failing the SLA criteria to meeting them and can forward traffic again.
date=2024-01-20 time=23:11:34 eventtime=1618985494478807100 tz="-0700" logid="0113022923" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN status" eventtype="Service" serviceid=1 service="test" member="2(R160),1(R150)" msg="Service will be load balanced among members with available routing."
Periodic SLA logging enabled in the health-check configuration:
With `sla-fail-log-period` enabled in the health-check configuration, the health check periodically generates SLA failure logs.
date=2024-01-20 time=23:18:10 eventtime=1618985890469018260 tz="-0700" logid="0113022925" type="event" subtype="sdwan" level="notice" vd="root" logdesc="SDWAN SLA information" eventtype="SLA" healthcheck="test" slatargetid=1 interface="R150" status="up" latency="0.061" jitter="0.004" packetloss="2.000%" inbandwidthavailable="0kbps" outbandwidthavailable="200.00Mbps" bibandwidthavailable="200.00Mbps" inbandwidthused="1kbps" outbandwidthused="1kbps" bibandwidthused="2kbps" slamap="0x0" metric="packetloss" msg="Health Check SLA status. SLA failed due to being over the performance metric threshold."
With `sla-pass-log-period` enabled in the health-check configuration, the health check periodically generates SLA pass logs.
date=2024-01-20 time=23:18:12 eventtime=1618985892509027220 tz="-0700" logid="0113022925" type="event" subtype="sdwan" level="information" vd="root" logdesc="SDWAN SLA information" eventtype="SLA" healthcheck="test" slatargetid=1 interface="R150" status="up" latency="0.060" jitter="0.003" packetloss="0.000%" inbandwidthavailable="0kbps" outbandwidthavailable="200.00Mbps" bibandwidthavailable="200.00Mbps" inbandwidthused="1kbps" outbandwidthused="1kbps" bibandwidthused="2kbps" slamap="0x1" msg="Health Check SLA status."
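In the two periodic logs above, `slamap` is `0x0` when SLA target 1 fails and `0x1` when it passes. One reading consistent with that is a bitmask in which bit n-1 stands for SLA target n. The sketch below decodes the field under that assumption. The helper name `sla_targets_met` is hypothetical, and the bit layout should be confirmed against your own logs before relying on it.

```python
def sla_targets_met(slamap):
    """Return the 1-based SLA target IDs whose bit is set in slamap.

    Assumes bit 0 corresponds to SLA target 1, bit 1 to target 2, and so on.
    """
    value = int(slamap, 16)  # accepts strings such as "0x1"
    return [bit + 1 for bit in range(value.bit_length()) if (value >> bit) & 1]

print(sla_targets_met("0x0"))  # value from the SLA failure log above
print(sla_targets_met("0x1"))  # value from the SLA pass log above
```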
SD-WAN Debug Commands
SD-WAN Health Checks
View the status of SD-WAN health checks.
FGT # diagnose sys sdwan health-check
Health Check(server):
Seq(1 R150): state(alive), packet-loss(0.000%) latency(0.110), jitter(0.024) sla_map=0x0
Seq(2 R160): state(alive), packet-loss(0.000%) latency(0.068), jitter(0.009) sla_map=0x0
Health Check(ping):
Seq(1 R150): state(alive), packet-loss(0.000%) latency(0.100), jitter(0.017) sla_map=0x0
Seq(2 R160): state(dead), packet-loss(100.000%) sla_map=0x0
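If the output is captured as text (for example over SSH), dead members can be picked out mechanically. This is a minimal sketch that assumes the output keeps the line layout shown above. The function name `dead_members` is illustrative, and the regular expressions are fitted to this sample only.

```python
import re

HEADER_RE = re.compile(r'^Health Check\((?P<name>[^)]+)\):')
MEMBER_RE = re.compile(r'^Seq\((?P<seq>\d+) (?P<intf>[^)]+)\): state\((?P<state>\w+)\)')

def dead_members(output):
    """Return (health_check, seq, interface) for every member in state dead."""
    current = None
    dead = []
    for line in output.splitlines():
        line = line.strip()
        header = HEADER_RE.match(line)
        if header:
            current = header.group("name")  # remember which check we are in
            continue
        member = MEMBER_RE.match(line)
        if member and member.group("state") == "dead":
            dead.append((current, int(member.group("seq")), member.group("intf")))
    return dead

# Output copied from the command above.
output = """\
Health Check(server):
Seq(1 R150): state(alive), packet-loss(0.000%) latency(0.110), jitter(0.024) sla_map=0x0
Seq(2 R160): state(alive), packet-loss(0.000%) latency(0.068), jitter(0.009) sla_map=0x0
Health Check(ping):
Seq(1 R150): state(alive), packet-loss(0.000%) latency(0.100), jitter(0.017) sla_map=0x0
Seq(2 R160): state(dead), packet-loss(100.000%) sla_map=0x0
"""

print(dead_members(output))
```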
FGT # diagnose sys sdwan health-check ping
Health Check(ping):
Seq(1 R150): state(alive), packet-loss(0.000%) latency(0.100), jitter(0.017) sla_map=0x0
Seq(2 R160): state(dead), packet-loss(100.000%) sla_map=0x0
SD-WAN Member Status
View the SD-WAN member status when `load-balance` mode is `source-ip-based` or `source-dest-ip-based`.
FGT # diagnose sys sdwan member
Member(1): interface: R150, gateway: 10.100.1.1 2000:10:100:1::1, priority: 0 1024, weight: 0
Member(2): interface: R160, gateway: 10.100.1.5 2000:10:100:1::5, priority: 0 1024, weight: 0
View the SD-WAN member status when `load-balance` mode is `weight-based`.
FGT # diagnose sys sdwan member
Member(1): interface: R150, gateway: 10.100.1.1 2000:10:100:1::1, priority: 0 1024, weight: 33
  Session count: 15
Member(2): interface: R160, gateway: 10.100.1.5 2000:10:100:1::5, priority: 0 1024, weight: 66
  Session count: 1
When `load-balance` mode is `measured-volume-based`:
All members still have volume room.
FGT # diagnose sys sdwan member
Member(1): interface: R150, gateway: 10.100.1.1 2000:10:100:1::1, priority: 0 1024, weight: 33
  Config volume ratio: 33, last reading: 218067B, volume room 33MB
Member(2): interface: R160, gateway: 10.100.1.5 2000:10:100:1::5, priority: 0 1024, weight: 66
  Config volume ratio: 66, last reading: 202317B, volume room 66MB
One member has used up its volume.
FGT # diagnose sys sdwan member
Member(1): interface: R150, gateway: 10.100.1.1 2000:10:100:1::1, priority: 0 1024, weight: 0
  Config volume ratio: 33, last reading: 1287767633B, overload volume 517MB
Member(2): interface: R160, gateway: 10.100.1.5 2000:10:100:1::5, priority: 0 1024, weight: 63
  Config volume ratio: 66, last reading: 1686997898B, volume room 63MB
When `load-balance` mode is `usage-based` or `spillover`:
Before any spillover occurs, view the SD-WAN member status.
FGT # diagnose sys sdwan member
Member(1): interface: R150, gateway: 10.100.1.1 2000:10:100:1::1, priority: 0 1024, weight: 255
  Egress-spillover-threshold: 400kbit/s, ingress-spillover-threshold: 300kbit/s
  Egress-overbps=0, ingress-overbps=0
Member(2): interface: R160, gateway: 10.100.1.5 2000:10:100:1::5, priority: 0 1024, weight: 254
  Egress-spillover-threshold: 0kbit/s, ingress-spillover-threshold: 0kbit/s
  Egress-overbps=0, ingress-overbps=0
When a member is spilling over.
FGT # diagnose sys sdwan member
Member(1): interface: R150, gateway: 10.100.1.1 2000:10:100:1::1, priority: 0 1024, weight: 255
  Egress-spillover-threshold: 400kbit/s, ingress-spillover-threshold: 300kbit/s
  Egress-overbps=1, ingress-overbps=0
Member(2): interface: R160, gateway: 10.100.1.5 2000:10:100:1::5, priority: 0 1024, weight: 254
  Egress-spillover-threshold: 0kbit/s, ingress-spillover-threshold: 0kbit/s
  Egress-overbps=0, ingress-overbps=0
The `diagnose netlink dstmac list` command can also show whether spillover has occurred.
FGT # diagnose netlink dstmac list R150
dev=R150 mac=00:00:00:00:00:00 vwl rx_tcp_mss=0 tx_tcp_mss=0 egress_overspill_threshold=50000 egress_bytes=100982 egress_over_bps=1 ingress_overspill_threshold=37500 ingress_bytes=40 ingress_over_bps=0 sampler_rate=0 vwl_zone_id=1 intf_qua=0
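Comparing the two commands, the `dstmac` thresholds (50000 and 37500) appear to be the configured spillover thresholds (400 kbit/s and 300 kbit/s) expressed in bytes per second. That is an inference from the numbers above rather than a documented rule. The sketch below reproduces the arithmetic and reads the over-threshold flags from the `dstmac` line. The helper name and the abridged sample string are illustrative.

```python
def kbit_to_bytes_per_sec(kbit_per_sec):
    """Convert kbit/s to bytes/s (1 kbit = 1000 bits, 8 bits per byte)."""
    return kbit_per_sec * 1000 // 8

print(kbit_to_bytes_per_sec(400))  # compare with egress_overspill_threshold
print(kbit_to_bytes_per_sec(300))  # compare with ingress_overspill_threshold

# Abridged copy of the dstmac output above.
dstmac = (
    "dev=R150 egress_overspill_threshold=50000 egress_bytes=100982 "
    "egress_over_bps=1 ingress_overspill_threshold=37500 ingress_bytes=40 "
    "ingress_over_bps=0"
)
fields = dict(pair.split("=", 1) for pair in dstmac.split() if "=" in pair)

egress_spilling = fields["egress_over_bps"] == "1"
ingress_spilling = fields["ingress_over_bps"] == "1"
print(fields["dev"], egress_spilling, ingress_spilling)
```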
SD-WAN Rule Status
An SD-WAN rule in `manual` mode.
FGT # diagnose sys sdwan service
Service(1): Address Mode(IPV4) flags=0x200
  Gen(1), TOS(0x0/0x0), Protocol(0: 1->65535), Mode(manual)
  Members(2):
    1: Seq_num(1 R150), alive, selected
    2: Seq_num(2 R160), alive, selected
  Dst address(1):
    10.100.21.0-10.100.21.255
An SD-WAN rule in `auto` mode.
FGT # diagnose sys sdwan service
Service(1): Address Mode(IPV4) flags=0x200
  Gen(1), TOS(0x0/0x0), Protocol(0: 1->65535), Mode(auto), link-cost-factor(latency), link-cost-threshold(10), heath-check(ping)
  Members(2):
    1: Seq_num(2 R160), alive, latency: 0.066, selected
    2: Seq_num(1 R150), alive, latency: 0.093
  Dst address(1):
    10.100.21.0-10.100.21.255
An SD-WAN rule in `priority` mode (Best Quality).
FGT # diagnose sys sdwan service
Service(1): Address Mode(IPV4) flags=0x200
  Gen(1), TOS(0x0/0x0), Protocol(0: 1->65535), Mode(priority), link-cost-factor(latency), link-cost-threshold(10), heath-check(ping)
  Members(2):
    1: Seq_num(2 R160), alive, latency: 0.059, selected
    2: Seq_num(1 R150), alive, latency: 0.077, selected
  Dst address(1):
    10.100.21.0-10.100.21.255
An SD-WAN rule in `sla` mode (Lowest Cost).
FGT # diagnose sys sdwan service
Service(1): Address Mode(IPV4) flags=0x200
  Gen(1), TOS(0x0/0x0), Protocol(0: 1->65535), Mode(sla), sla-compare-order
  Members(2):
    1: Seq_num(1 R150), alive, sla(0x1), gid(0), cfg_order(0), cost(0), selected
    2: Seq_num(2 R160), alive, sla(0x1), gid(0), cfg_order(1), cost(0), selected
  Dst address(1):
    10.100.21.0-10.100.21.255
An SD-WAN rule in `load-balance` mode (Maximize bandwidth).
FGT # diagnose sys sdwan service
Service(1): Address Mode(IPV4) flags=0x200
  Gen(1), TOS(0x0/0x0), Protocol(0: 1->65535), Mode(load-balance hash-mode=round-robin)
  Members(2):
    1: Seq_num(1 R150), alive, sla(0x1), gid(2), num of pass(1), selected
    2: Seq_num(2 R160), alive, sla(0x1), gid(2), num of pass(1), selected
  Dst address(1):
    10.100.21.0-10.100.21.255
SD-WAN Statistics
SD-WAN interface status statistics log for the past 15 minutes.
FGT (root) # diagnose sys sdwan intf-sla-log R150
Timestamp: Wed Apr 21 16:58:27 2021, used inbandwidth: 655bps, used outbandwidth: 81655306bps, used bibandwidth: 81655961bps, tx bys: 3413479982bytes, rx bytes: 207769bytes.
Timestamp: Wed Apr 21 16:58:37 2021, used inbandwidth: 649bps, used outbandwidth: 81655540bps, used bibandwidth: 81656189bps, tx bys: 3515590414bytes, rx bytes: 208529bytes.
Timestamp: Wed Apr 21 16:58:47 2021, used inbandwidth: 655bps, used outbandwidth: 81655546bps, used bibandwidth: 81656201bps, tx bys: 3617700886bytes, rx bytes: 209329bytes.
Timestamp: Wed Apr 21 16:58:57 2021, used inbandwidth: 620bps, used outbandwidth: 81671580bps, used bibandwidth: 81672200bps, tx bys: 3719811318bytes, rx bytes: 210089bytes.
Timestamp: Wed Apr 21 16:59:07 2021, used inbandwidth: 620bps, used outbandwidth: 81671580bps, used bibandwidth: 81672200bps, tx bys: 3821921790bytes, rx bytes: 210889bytes.
Timestamp: Wed Apr 21 16:59:17 2021, used inbandwidth: 665bps, used outbandwidth: 81688152bps, used bibandwidth: 81688817bps, tx bys: 3924030936bytes, rx bytes: 211926bytes.
Timestamp: Wed Apr 21 16:59:27 2021, used inbandwidth: 671bps, used outbandwidth: 81688159bps, used bibandwidth: 81688830bps, tx bys: 4026141408bytes, rx bytes: 212726bytes.
......
SLA statistics log for the past 10 minutes (`diagnose sys sdwan sla-log <health-check-name> <seq-num>` or `diagnose sys sdwan sla-log <health-check-name> <seq-num> <childname>`). In the example below, `1` is the member ID (seq-num) of the SD-WAN member interface within the SD-WAN zone.
FGT (root) # diagnose sys sdwan sla-log ping 1
Timestamp: Wed Apr 21 17:10:11 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.079, jitter: 0.023, packet loss: 0.000%.
Timestamp: Wed Apr 21 17:10:12 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.079, jitter: 0.023, packet loss: 0.000%.
Timestamp: Wed Apr 21 17:10:12 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.081, jitter: 0.024, packet loss: 0.000%.
Timestamp: Wed Apr 21 17:10:13 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.081, jitter: 0.025, packet loss: 0.000%.
Timestamp: Wed Apr 21 17:10:13 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.082, jitter: 0.026, packet loss: 0.000%.
Timestamp: Wed Apr 21 17:10:14 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.083, jitter: 0.026, packet loss: 0.000%.
Timestamp: Wed Apr 21 17:10:14 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.084, jitter: 0.026, packet loss: 0.000%.
......
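The SLA log is easier to judge once it is summarized. The following is a rough sketch that extracts latency, jitter and packet loss from captured `sla-log` lines and computes a few aggregates. It assumes the line layout shown above, and the names `parse_sla_log` and `sla_log` are illustrative only.

```python
import re
from statistics import mean

SLA_RE = re.compile(
    r'latency: (?P<latency>[\d.]+), jitter: (?P<jitter>[\d.]+), '
    r'packet loss: (?P<loss>[\d.]+)%'
)

def parse_sla_log(text):
    """Return one dict of float metrics per sla-log line that matches."""
    samples = []
    for line in text.splitlines():
        match = SLA_RE.search(line)
        if match:
            samples.append({k: float(v) for k, v in match.groupdict().items()})
    return samples

# First three lines of the sla-log output above.
sla_log = """\
Timestamp: Wed Apr 21 17:10:11 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.079, jitter: 0.023, packet loss: 0.000%.
Timestamp: Wed Apr 21 17:10:12 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.079, jitter: 0.023, packet loss: 0.000%.
Timestamp: Wed Apr 21 17:10:12 2021, vdom root, health-check ping, interface: R150, status: up, latency: 0.081, jitter: 0.024, packet loss: 0.000%.
"""

samples = parse_sla_log(sla_log)
avg_latency = mean(s["latency"] for s in samples)
max_jitter = max(s["jitter"] for s in samples)
max_loss = max(s["loss"] for s in samples)
print(len(samples), round(avg_latency, 4), max_jitter, max_loss)
```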
SD-WAN Status Information
After an SD-WAN rule references application control signatures, view the entries learned by application control.
FGT # diagnose sys sdwan internet-service-app-ctrl-list
Gmail(15817 4294836957): 64.233.191.19 6 443 Thu Apr 22 10:10:34 2021
Gmail(15817 4294836957): 142.250.128.83 6 443 Thu Apr 22 10:06:47 2021
Facebook(15832 4294836806): 69.171.250.35 6 443 Thu Apr 22 10:12:00 2021
Amazon(16492 4294836342): 3.226.60.231 6 443 Thu Apr 22 10:10:57 2021
Amazon(16492 4294836342): 52.46.135.211 6 443 Thu Apr 22 10:10:58 2021
Amazon(16492 4294836342): 52.46.141.85 6 443 Thu Apr 22 10:10:58 2021
Amazon(16492 4294836342): 52.46.155.13 6 443 Thu Apr 22 10:10:58 2021
Amazon(16492 4294836342): 54.82.242.32 6 443 Thu Apr 22 10:10:59 2021
YouTube(31077 4294838537): 74.125.202.138 6 443 Thu Apr 22 10:06:51 2021
YouTube(31077 4294838537): 108.177.121.119 6 443 Thu Apr 22 10:08:24 2021
YouTube(31077 4294838537): 142.250.136.119 6 443 Thu Apr 22 10:02:02 2021
YouTube(31077 4294838537): 142.250.136.132 6 443 Thu Apr 22 10:08:16 2021
YouTube(31077 4294838537): 142.250.148.100 6 443 Thu Apr 22 10:07:28 2021
YouTube(31077 4294838537): 142.250.148.132 6 443 Thu Apr 22 10:10:32 2021
YouTube(31077 4294838537): 172.253.119.91 6 443 Thu Apr 22 10:02:01 2021
YouTube(31077 4294838537): 184.150.64.211 6 443 Thu Apr 22 10:04:36 2021
YouTube(31077 4294838537): 184.150.168.175 6 443 Thu Apr 22 10:02:26 2021
YouTube(31077 4294838537): 184.150.168.211 6 443 Thu Apr 22 10:02:26 2021
YouTube(31077 4294838537): 184.150.186.141 6 443 Thu Apr 22 10:02:26 2021
YouTube(31077 4294838537): 209.85.145.190 6 443 Thu Apr 22 10:10:36 2021
YouTube(31077 4294838537): 209.85.200.132 6 443 Thu Apr 22 10:02:03 2021
View the health-check status of shortcut tunnels created by IPsec (`diagnose sys link-monitor interface <name> <name>_0`).
Spoke1 # diagnose sys link-monitor interface Spoke1_WAN2
Interface(Spoke1_WAN2): state(down, since Fri Jan 19 16:58:13 2024), bandwidth(up:288bps, down:0bps), session count(IPv4:7, IPv6:0), tx(26595 bytes), rx(21568 bytes).
Spoke1 # diagnose sys link-monitor interface Spoke1_WAN2 Spoke1_WAN2_0
Interface(Spoke1_WAN2_0): state(up, since Fri Jan 19 16:58:08 2024), bandwidth(up:320bps, down:320bps), session count(IPv4:0, IPv6:0), tx(12360 bytes), rx(12240 bytes), latency(2.15), jitter(0.53), packet-loss(0.00).
View the BGP route-tag used in SD-WAN.
FGT # get router info bgp network 10.100.11.0/24
VRF 0 BGP routing table entry for 10.100.11.0/24
Paths: (2 available, best #2, table Default-IP-Routing-Table)
  Advertised to non peer-group peers:
  10.100.1.1
  Original VRF 0
  20 10
    10.100.1.1 from 10.100.1.1 (5.5.5.5)
      Origin incomplete metric 0, route tag 15, localpref 100, valid, external, best
      Community: 30:5
      Advertised Path ID: 2
      Last update: Thu Apr 22 10:27:27 2021
  Original VRF 0
  20 10
    10.100.1.5 from 10.100.1.5 (6.6.6.6)
      Origin incomplete metric 0, route tag 15, localpref 100, valid, external, best
      Community: 30:5
      Advertised Path ID: 1
      Last update: Thu Apr 22 10:25:50 2021
FGT # diagnose sys sdwan route-tag-list
Route-tag: 15, address: v4(1), v6(0)
  Last write/now: 6543391 6566007
  service(1), last read route-tag 15 at 6543420
  Prefix(24): Address list(1):
    10.100.11.0-10.100.11.255 oif: 50 48
FGT # diagnose firewall proute list
list route policy info(vf=root):
id=2133196801(0x7f260001) vwl_service=1(DataCenter) vwl_mbr_seq=1 2 dscp_tag=0xff 0xff flags=0x40 order-addr tos=0x00 tos_mask=0x00 protocol=0 sport=0-65535 iif=0 dport=1-65535 oif=48(R150) oif=50(R160)
destination(1): 10.100.11.0-10.100.11.255
source wildcard(1): 0.0.0.0/0.0.0.0
hit_count=0 last_used=2021-04-22 10:25:10
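The `route-tag-list` output refers to outgoing interfaces only by index (`oif: 50 48`), while the `proute list` output pairs each index with a name (`oif=48(R150) oif=50(R160)`). A small sketch for cross-referencing the two is shown below. It assumes interface names contain only word characters, and the variable names are illustrative.

```python
import re

# Fragments copied from the two command outputs above.
proute_line = "sport=0-65535 iif=0 dport=1-65535 oif=48(R150) oif=50(R160)"
tag_line = "10.100.11.0-10.100.11.255 oif: 50 48"

# Build an index -> name map from the policy route entry.
oif_names = {
    int(index): name
    for index, name in re.findall(r'oif=(\d+)\((\w+)\)', proute_line)
}

# Resolve the indexes listed for the route-tag address range.
indexes = [int(token) for token in tag_line.split("oif:")[1].split()]
names = [oif_names[index] for index in indexes]

print(oif_names)
print(names)
```

If both lists resolve to the same set of SD-WAN members, the route-tag prefix and the policy route generated for the rule agree on the outgoing interfaces.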