OSPF GR

Network Requirements

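A FortiGate HA cluster runs OSPF with its upstream and downstream routers, with Graceful Restart (GR) enabled. When an HA failover occurs on the FortiGate, GR must ensure that every router (the FortiGate and its neighbors, which act as GR Helpers) keeps the OSPF routes in its routing table for the duration of the failover, so that traffic is not interrupted.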

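In a FortiGate HA cluster, the OSPF process runs only on the primary unit. After a failover, the newly elected primary starts a fresh OSPF negotiation. While this is in progress, the new primary must keep the OSPF routes learned from the former primary in its kernel routing table until the new OSPF adjacencies are established.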
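
The timing relationship described here can be sketched as a small calculation. This is only an illustration under simplifying assumptions, not a description of FortiOS internals: it assumes the routes synchronized from the former primary are kept for exactly `route_ttl` seconds after the failover (the HA `route-ttl` setting configured later in this guide), and that freshly learned OSPF routes are installed `reconverge_seconds` after the failover. The function and parameter names are hypothetical.

```python
def forwarding_gap_seconds(route_ttl: int, reconverge_seconds: int) -> int:
    """Return how long (in seconds) the new primary would have no OSPF routes.

    Assumptions (illustrative only): synchronized routes stay in the kernel
    table for exactly `route_ttl` seconds after the failover, and newly
    learned OSPF routes are installed `reconverge_seconds` after the failover.
    """
    if route_ttl < 0 or reconverge_seconds < 0:
        raise ValueError("durations must be non-negative")
    # A gap exists only if the retained routes expire before OSPF re-converges.
    return max(0, reconverge_seconds - route_ttl)


if __name__ == "__main__":
    # Hypothetical 30-second re-convergence with a 10-second TTL.
    print(forwarding_gap_seconds(route_ttl=10, reconverge_seconds=30))
    # Same re-convergence time with a 600-second TTL.
    print(forwarding_gap_seconds(route_ttl=600, reconverge_seconds=30))
```

In this simplified model, any TTL at least as long as the re-convergence time leaves no gap, which is the motivation for raising `route-ttl` in the configuration steps.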

Network Topology

image-20240115105813078

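  1. FW1 and FW2 form an HA cluster in active-passive (A-P) mode.
  2. The FW1/FW2 cluster forms OSPF adjacencies with FW3 and with FW4, and each device advertises its own networks.
  3. OSPF Graceful Restart is enabled on FW1/FW2, FW3, and FW4. After a failover between FW1 and FW2, FW3 and FW4 enter GR Helper mode and assist the GR process of FW1/FW2.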

Configuration Steps

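  1. Basic network configuration (omitted).

  2. HA configuration of FW1 and FW2 (omitted).

  3. Firewall policy configuration (omitted).

  4. Configure OSPF on FW1/FW2: advertise the networks, enable GR, and set restart-period to 600 seconds.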

    config router ospf
        set router-id 202.103.12.1
        set restart-mode graceful-restart
        set restart-period 600
        config area
            edit 0.0.0.0
            next
        end
        config network
            edit 1
                set prefix 202.103.12.0 255.255.255.0
            next
            edit 2
                set prefix 202.103.13.0 255.255.255.0
            next
        end
    end
    
  5. Configure OSPF on FW3 and advertise its networks. A router acting as a GR Helper must also have GR enabled.

    config router ospf
        set router-id 202.103.12.2
        set restart-mode graceful-restart
        config area
            edit 0.0.0.0
            next
        end
        config network
            edit 2
                set prefix 202.103.12.0 255.255.255.0
            next
            edit 3
                set prefix 10.10.1.0 255.255.255.0
            next
        end
    end
    
  6. Configure OSPF on FW4. A router acting as a GR Helper must also have GR enabled.

    config router ospf
        set router-id 202.103.13.2
        set restart-mode graceful-restart
        config area
            edit 0.0.0.0
            next
        end
        config network
            edit 1
                set prefix 202.103.13.0 255.255.255.0
            next
            edit 2
                set prefix 10.10.2.0 255.255.255.0
            next
        end
    end
    
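  7. In the HA configuration of FW1/FW2, set route-ttl to 600 seconds (the default is 10 seconds). For this period, the new primary keeps the OSPF routes learned from the former primary in its kernel routing table, until the new OSPF adjacencies are established.

    Important: this step prevents the routes from disappearing, and traffic from being interrupted, after the failover and before the new primary has formed its new OSPF adjacencies.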

    config system ha
        set route-ttl 600
    end
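
As a quick sanity check of the two timers configured above, the following sketch reads `restart-period` and `route-ttl` from configuration text and compares them. The rule it applies (`route-ttl` should be at least `restart-period`) is a conservative heuristic assumed for this example, not a documented FortiOS requirement. The helper names are hypothetical, and the default of 10 seconds for `route-ttl` is the value stated in step 7.

```python
import re


def read_setting(config_text: str, name: str):
    """Return the integer value of a `set <name> <value>` line, or None."""
    pattern = rf"^\s*set\s+{re.escape(name)}\s+(\d+)\s*$"
    match = re.search(pattern, config_text, re.MULTILINE)
    return int(match.group(1)) if match else None


def route_ttl_covers_restart(ospf_config: str, ha_config: str,
                             default_route_ttl: int = 10) -> bool:
    """Heuristic (assumed, not a FortiOS rule): route-ttl >= restart-period."""
    restart_period = read_setting(ospf_config, "restart-period")
    if restart_period is None:
        raise ValueError("restart-period is not set in the OSPF configuration")
    route_ttl = read_setting(ha_config, "route-ttl")
    if route_ttl is None:
        route_ttl = default_route_ttl  # default stated in step 7
    return route_ttl >= restart_period


OSPF_CONFIG = """\
config router ospf
    set router-id 202.103.12.1
    set restart-mode graceful-restart
    set restart-period 600
end
"""

HA_CONFIG = """\
config system ha
    set route-ttl 600
end
"""

if __name__ == "__main__":
    print(route_ttl_covers_restart(OSPF_CONFIG, HA_CONFIG))
```

With the values used in this guide the check passes; if `route-ttl` were left unset, the sketch falls back to the default and the check fails.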
    

Verification

  1. In the initial state, FW1 is the primary unit and FW2 is the secondary unit.

    FW1 # diagnose sys ha status
    HA information
    Statistics
            traffic.local = s:0 p:42869 b:39035389
            traffic.total = s:0 p:42867 b:39033889
            activity.ha_id_changes = 2
            activity.fdb  = c:0 q:0
    
    Model=80008, Mode=2 Group=7 Debug=0
    nvcluster=1, ses_pickup=0, delay=0
    
    [Debug_Zone HA information]
    HA group member information: is_manage_primary=1.
    FGVM08TM23000175:      Primary, serialno_prio=1, usr_priority=128, hostname=FW1
    FGVM08TM23000176:    Secondary, serialno_prio=0, usr_priority=100, hostname=FW2
    
    [Kernel HA information]
    vcluster 1, state=work, primary_ip=169.254.0.2, primary_id=0
    FGVM08TM23000175:      Primary, ha_prio/o_ha_prio=0/0
    FGVM08TM23000176:    Secondary, ha_prio/o_ha_prio=1/1
    
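  2. In the initial state, check the OSPF neighbor state and routes on FW3. The neighbor is in the Full state, OSPF routes are learned normally, and FW3 has not entered GR Helper mode.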

    FW3 # get router info ospf neighbor 
    OSPF process 0, VRF 0:
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    202.103.12.1      1   Full/DR         00:00:34    202.103.12.1    port2
    
    FW3 # get router info routing-table ospf
    Routing table for VRF=0
    O       10.10.2.0/24 [110/3] via 202.103.12.1, port2, 02:47:00, [1/0]
    O       202.103.13.0/24 [110/2] via 202.103.12.1, port2, 02:47:00, [1/0]
    
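  3. In the initial state, check the OSPF neighbor state and routes on FW4. The neighbor is in the Full state, OSPF routes are learned normally, and FW4 has not entered GR Helper mode.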

    FW4 # get router info ospf neighbor 
    OSPF process 0, VRF 0:
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    202.103.12.1      1   Full/Backup     00:00:33    202.103.13.1    port2
    
    FW4 # get router info routing-table ospf 
    Routing table for VRF=0
    O       10.10.1.0/24 [110/3] via 202.103.13.1, port2, 02:50:08, [1/0]
    O       202.103.12.0/24 [110/2] via 202.103.13.1, port2, 02:50:08, [1/0]
    
  4. In the initial state, check the OSPF neighbor state and routes on FW1. Both neighbors are in the Full state and OSPF routes are learned normally.

    FW1 # get router info ospf neighbor 
    OSPF process 0, VRF 0:
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    202.103.12.2      1   Full/Backup     00:00:38    202.103.12.2    port2
    202.103.13.2      1   Full/DR         00:00:40    202.103.13.2    port3
    
    FW1 # get router info routing-table ospf 
    Routing table for VRF=0
    O       10.10.1.0/24 [110/2] via 202.103.12.2, port2, 02:49:28, [1/0]
    O       10.10.2.0/24 [110/2] via 202.103.13.2, port3, 02:49:28, [1/0]
    
  5. Trigger an HA failover between FW1 and FW2. FW2 becomes the HA primary.

    FW2 # diagnose sys ha status
    HA information
    Statistics
            traffic.local = s:0 p:6233 b:4732229
            traffic.total = s:0 p:6286 b:4735596
            activity.ha_id_changes = 4
            activity.fdb  = c:0 q:0
    
    Model=80008, Mode=2 Group=7 Debug=0
    nvcluster=1, ses_pickup=0, delay=0
    
    [Debug_Zone HA information]
    HA group member information: is_manage_primary=1.
    FGVM08TM23000176:      Primary, serialno_prio=0, usr_priority=100, hostname=FW2
    FGVM08TM23000175:    Secondary, serialno_prio=1, usr_priority=128, hostname=FW1
    
    [Kernel HA information]
    vcluster 1, state=work, primary_ip=169.254.0.1, primary_id=0
    FGVM08TM23000176:      Primary, ha_prio/o_ha_prio=0/0
    FGVM08TM23000175:    Secondary, ha_prio/o_ha_prio=1/1
    
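  6. Check the OSPF debug output on FW2. After becoming the HA primary, FW2 sends a Grace-LSA (Type 9) to its OSPF neighbors to announce that it has entered GR. The LSA carries the restart reason and the grace period, and its LS age is 1 second.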

    FW2 # diagnose ip router ospf level info
    FW2 # diagnose ip router ospf all enable
    FW2 # diagnose debug console time enable
    
    2023-08-13 16:27:48 OSPF: SEND[LS-Upd]: To 224.0.0.5 via port2:202.103.12.1, length 120
    2023-08-13 16:27:48 OSPF:     LS type 9 (Link-Local Opaque-LSA)
    2023-08-13 16:27:48 OSPF:     Grace-LSA
    2023-08-13 16:27:48 OSPF:     Grace Period: 600
    2023-08-13 16:27:48 OSPF:     Graceful Restart Reason: Switch to redundant control processor
    2023-08-13 16:27:48 OSPF:     IP interface Address: 202.103.12.1
    
  7. The packet format is shown in the capture below; the LS age is 1.

    image-20240115145113180

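  8. After FW2 enters GR, check the OSPF neighbor state and OSPF routes on FW3. FW3 has entered GR Helper mode (indicated by the * marker), and the Dead Time now reflects the 600-second grace period configured on FW1/FW2. (FW4 is in the same state, so its output is omitted.)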

    FW3 # get router info ospf neighbor
    OSPF process 0, VRF 0:
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    202.103.12.1      1   Full/Backup     00:09:51*   202.103.12.1    port2
    
    FW3 # get router info routing-table ospf
    Routing table for VRF=0
    O       10.10.2.0/24 [110/3] via 202.103.12.1, port2, 00:33:17, [1/0]
    O       202.103.13.0/24 [110/2] via 202.103.12.1, port2, 00:34:16, [1/0]
    
  9. Check the OSPF debug output on FW3. FW3 enters GR Helper mode after receiving the Grace-LSA. (FW4 is in the same state, so its output is omitted.)

    FW3 # diagnose ip router ospf level info
    FW3 # diagnose ip router ospf all enable
    FW3 # diagnose debug console time enable
    
    2023-08-13 16:27:48 OSPF: RECV[LS-Upd]: From 202.103.12.1 via port2:202.103.12.2 (202.103.12.1 -> 224.0.0.5)
    2023-08-13 16:27:48 OSPF:     LS type 9 (Link-Local Opaque-LSA)
    2023-08-13 16:27:48 OSPF:   Grace-LSA
    2023-08-13 16:27:48 OSPF:     Grace Period: 600
    2023-08-13 16:27:48 OSPF:     Graceful Restart Reason: Switch to redundant control processor
    2023-08-13 16:27:48 OSPF:     IP interface Address: 202.103.12.1
    2023-08-13 16:27:48 OSPF: Router: Enter Helper mode by grace LSA
    
  10. Check the Type 9 LSA on FW3. At this point its LS age has not reached 3600 seconds. (FW4 is in the same state, so its output is omitted.)

    FW3 # get router info ospf database opaque-link
    
               OSPF Router with ID (202.103.12.2) (Process ID 0, VRF 0)
                   Link-Local Opaque-LSA (Link port2:202.103.12.2)
     LS age: 5
     Options: 0x2 (*|-|-|-|-|-|E|-)
     LS Type: Link-Local Opaque-LSA
     Link State ID: 3.0.0.0 (Link-Local Opaque-Type/ID)
     Opaque Type: 3
     Opaque ID: 0
     Advertising Router: 202.103.12.1
     LS Seq Number: 80000001
     Checksum: 0x1445
     Length: 44
         Grace Period: 600
         Graceful Restart Reason: Switch to redundant control processor
         IP Interface Address: 202.103.12.1
    
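  11. Wait until the adjacency between FW2 and FW3 returns to Full. FW2 has then completed GR and FW3 has left GR Helper mode. No OSPF routes were lost in the meantime, and the Dead Time counts down from 40 seconds again. (FW4 is in the same state, so its output is omitted.)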

    FW3 # get router info ospf neighbor
    OSPF process 0, VRF 0:
    Neighbor ID     Pri   State           Dead Time   Address         Interface
    202.103.12.1      1   Full/Backup     00:00:36    202.103.12.1    port2
    
    FW3 # get router info routing-table ospf
    Routing table for VRF=0
    O       10.10.2.0/24 [110/3] via 202.103.12.1, port2, 00:00:03, [1/0]
    O       202.103.13.0/24 [110/2] via 202.103.12.1, port2, 00:34:31, [1/0]
    
  12. Check the OSPF debug output on FW2. FW2 exits GR normally and sends a Grace-LSA with an LS age of 3600.

    FW2 # diagnose ip router ospf level info
    FW2 # diagnose ip router ospf all enable
    FW2 # diagnose debug console time enable
    
    2023-08-13 16:27:57 OSPF: ROUTER[Process:0, RouterID:202.103.12.1]: Exit Restarting normally
    2023-08-13 16:27:57 OSPF: SEND[LS-Upd]: To 224.0.0.5 via port3:202.103.13.1, length 72
    2023-08-13 19:05:59 OSPF:     LS age 3600
    2023-08-13 19:05:59 OSPF:     LS type 9 (Link-Local Opaque-LSA)
    2023-08-13 19:05:59 OSPF:   Grace-LSA
    
  13. Check the OSPF debug output on FW3. After receiving the MaxAge LSA from FW2, FW3 leaves GR Helper mode a few seconds later.

    2023-08-13 16:28:08 OSPF: ROUTER: Exit Restart Helper mode for neighbor(port2:202.103.12.2-202.103.12.1) by receiving maxage grace-LSA
    
  14. Check the Type 9 LSA on FW3 again. Its LS age has now jumped straight to 3600 seconds.

    FW3 # get router info ospf database opaque-link
    
                OSPF Router with ID (202.103.12.2) (Process ID 0, VRF 0)
                    Link-Local Opaque-LSA (Link port2:202.103.12.2)
      LS age: 3600
      Options: 0x2 (*|-|-|-|-|-|E|-)
      LS Type: Link-Local Opaque-LSA
      Link State ID: 3.0.0.0 (Link-Local Opaque-Type/ID)
      Opaque Type: 3
      Opaque ID: 0
      Advertising Router: 202.103.12.1
      LS Seq Number: 80000001
      Checksum: 0x1445
      Length: 44
          Grace Period: 600
          Graceful Restart Reason: Switch to redundant control processor
          IP Interface Address: 202.103.12.1
    
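  15. A packet capture of the MaxAge Grace-LSA received by FW3 shows an LS age of 3600 seconds, which is the MaxAge defined by OSPF (https://www.rfc-editor.org/rfc/rfc2328#appendix-B). In other words, once its adjacencies are back to Full, FW2 sends this LSA again to tell the GR Helpers that they can leave GR Helper mode.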

    image-20240115145944862

  16. During GR, the routes that FW2 synchronized from FW1 before the failover remain in the FW2 kernel routing table (route-ttl 600, prio=2164260865).

    FW2 # get router info kernel | grep 10.10.
    tab=254 vf=0 scope=0 type=1 proto=30 prio=2164260865 0.0.0.0/0.0.0.0/0->10.10.1.0/24 pref=0.0.0.0 gwy=202.103.12.2 dev=4(port2)
    tab=254 vf=0 scope=0 type=1 proto=30 prio=2164260865 0.0.0.0/0.0.0.0/0->10.10.2.0/24 pref=0.0.0.0 gwy=202.103.13.2 dev=5(port3)
    
  17. After GR completes, the newly learned OSPF routes appear (prio=1).

    FW2 # get router info kernel | grep 10.10.
    tab=254 vf=0 scope=0 type=1 proto=11 prio=1 0.0.0.0/0.0.0.0/0->10.10.1.0/24 pref=0.0.0.0 gwy=202.103.12.2 dev=4(port2)
    tab=254 vf=0 scope=0 type=1 proto=30 prio=2164260865 0.0.0.0/0.0.0.0/0->10.10.1.0/24 pref=0.0.0.0 gwy=202.103.12.2 dev=4(port2)
    tab=254 vf=0 scope=0 type=1 proto=11 prio=1 0.0.0.0/0.0.0.0/0->10.10.2.0/24 pref=0.0.0.0 gwy=202.103.13.2 dev=5(port3)
    tab=254 vf=0 scope=0 type=1 proto=30 prio=2164260865 0.0.0.0/0.0.0.0/0->10.10.2.0/24 pref=0.0.0.0 gwy=202.103.13.2 dev=5(port3)
    
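  18. Throughout the GR triggered by the HA failover, the forwarding tables on all devices remain effectively unchanged, and traffic is not interrupted.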
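
The claim in step 18 can be checked against the kernel route output from step 17. The sketch below splits the entries into retained routes and newly learned routes and confirms that both sets point to the same destination, gateway, and interface. It identifies retained routes purely by the `prio=2164260865` value seen in the sample output, which is an assumption that may not hold on other builds; the function and constant names are hypothetical.

```python
import re

# Value observed for retained routes in the sample output (assumption:
# other FortiOS versions may use a different value).
RETAINED_PRIO = 2164260865

ROUTE_RE = re.compile(
    r"prio=(?P<prio>\d+)\s+\S+->(?P<dst>\S+)\s+pref=\S+"
    r"\s+gwy=(?P<gwy>\S+)\s+dev=\d+\((?P<dev>[^)]+)\)"
)

SAMPLE = """\
tab=254 vf=0 scope=0 type=1 proto=11 prio=1 0.0.0.0/0.0.0.0/0->10.10.1.0/24 pref=0.0.0.0 gwy=202.103.12.2 dev=4(port2)
tab=254 vf=0 scope=0 type=1 proto=30 prio=2164260865 0.0.0.0/0.0.0.0/0->10.10.1.0/24 pref=0.0.0.0 gwy=202.103.12.2 dev=4(port2)
tab=254 vf=0 scope=0 type=1 proto=11 prio=1 0.0.0.0/0.0.0.0/0->10.10.2.0/24 pref=0.0.0.0 gwy=202.103.13.2 dev=5(port3)
tab=254 vf=0 scope=0 type=1 proto=30 prio=2164260865 0.0.0.0/0.0.0.0/0->10.10.2.0/24 pref=0.0.0.0 gwy=202.103.13.2 dev=5(port3)
"""


def split_routes(kernel_output: str):
    """Return (retained, learned) sets of (destination, gateway, device)."""
    retained, learned = set(), set()
    for line in kernel_output.splitlines():
        match = ROUTE_RE.search(line)
        if not match:
            continue  # ignore prompts and unrelated lines
        hop = (match["dst"], match["gwy"], match["dev"])
        if int(match["prio"]) == RETAINED_PRIO:
            retained.add(hop)
        else:
            learned.add(hop)
    return retained, learned


if __name__ == "__main__":
    retained, learned = split_routes(SAMPLE)
    # True when the new OSPF routes forward exactly like the retained ones.
    print(retained == learned)
```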

Notes

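  1. If FW1/FW2 detects an OSPF topology change during the GR process in the test above, it aborts GR immediately and notifies FW3/FW4 to leave GR Helper mode. The OSPF routes then disappear from the GR Helpers prematurely, which causes a temporary traffic interruption.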

    FW2 # diagnose ip router ospf level info
    FW2 # diagnose ip router ospf all enable
    FW2 # diagnose debug console time enable
    
    2023-08-13 17:45:46 OSPF: ROUTER[Process:0, RouterID:202.103.12.1]: Exit Restarting by NbrChanged (port3:202.103.13.1-202.103.13.2)
    2023-08-13 17:45:46 OSPF: LSA[-:Type9:3.0.0.0:(self)]: Flooding via interface[port2:202.103.12.1]
    2023-08-13 17:45:46 OSPF: LSA[-:Type9:3.0.0.0:(self)]: Flooding to neighbor[202.103.12.2]
    ......
    2023-08-13 17:45:46 OSPF: SEND[LS-Upd]: To 224.0.0.5 via port2:202.103.12.1, length 72
    2023-08-13 17:45:46 OSPF:     LS age 3600
    2023-08-13 17:45:46 OSPF:     LS type 9 (Link-Local Opaque-LSA)
    
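  2. To keep an OSPF topology change from ending GR prematurely, enable restart-on-topology-change under OSPF on FW1/FW2. With this option enabled, the OSPF process ignores topology changes during GR and carries the GR process through to completion.

    Supported only in 7.2.0 and later.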

    config router ospf
        set restart-on-topology-change enable
    end
    
  3. As the following debug output shows, FW1/FW2 detects a topology change during GR but does not exit GR immediately; it goes on to complete GR normally.

    2023-08-13 19:05:56 OSPF: ROUTER[Process:0, RouterID:202.103.12.1]: Exit Restarting by NbrChanged (port3:202.103.13.1-202.103.13.2)
    2023-08-13 19:05:56 OSPF: ROUTER[Process:0, RouterID:202.103.12.1]: skip exiting restart due to topology change
    ......
    2023-08-13 19:05:59 OSPF: ROUTER[Process:0, RouterID:202.103.12.1]: Exit Restarting normally
    ......
    2023-08-13 19:05:59 OSPF: SEND[LS-Upd]: To 224.0.0.6 via port2:202.103.12.1, length 72
    2023-08-13 19:05:59 OSPF:     LS age 3600
    2023-08-13 19:05:59 OSPF:     Options 0x2
    2023-08-13 19:05:59 OSPF:     LS type 9 (Link-Local Opaque-LSA)
    2023-08-13 19:05:59 OSPF:     Advertising Router 202.103.12.1
    2023-08-13 19:05:59 OSPF:   Grace-LSA
    ......
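
The two debug excerpts in these notes differ only in a few message lines, so telling them apart can be automated. The sketch below classifies how a single GR episode ended, based solely on the message strings that appear in the excerpts above; other FortiOS versions may word these messages differently, and the function name is hypothetical.

```python
import re

EXIT_RE = re.compile(r"Exit Restarting (normally|by (\w+))")
SKIP_TEXT = "skip exiting restart due to topology change"


def classify_gr_exit(log_text: str) -> str:
    """Return 'normal', 'aborted:<reason>', or 'unknown' for one GR episode."""
    pending_abort = None
    for line in log_text.splitlines():
        if SKIP_TEXT in line:
            # The abort announced just before was suppressed.
            pending_abort = None
            continue
        match = EXIT_RE.search(line)
        if not match:
            continue
        if match.group(1) == "normally":
            return "normal"
        # Remember the abort reason; a later "skip" line may cancel it.
        pending_abort = match.group(2)
    return f"aborted:{pending_abort}" if pending_abort else "unknown"


ABORTED_LOG = """\
2023-08-13 17:45:46 OSPF: ROUTER[Process:0, RouterID:202.103.12.1]: Exit Restarting by NbrChanged (port3:202.103.13.1-202.103.13.2)
2023-08-13 17:45:46 OSPF:     LS age 3600
"""

COMPLETED_LOG = """\
2023-08-13 19:05:56 OSPF: ROUTER[Process:0, RouterID:202.103.12.1]: Exit Restarting by NbrChanged (port3:202.103.13.1-202.103.13.2)
2023-08-13 19:05:56 OSPF: ROUTER[Process:0, RouterID:202.103.12.1]: skip exiting restart due to topology change
2023-08-13 19:05:59 OSPF: ROUTER[Process:0, RouterID:202.103.12.1]: Exit Restarting normally
"""

if __name__ == "__main__":
    print(classify_gr_exit(ABORTED_LOG))
    print(classify_gr_exit(COMPLETED_LOG))
```

An `aborted:NbrChanged` result on a device that supports it suggests that restart-on-topology-change is not enabled.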
    

Copyright © 2024 Fortinet Inc. All rights reserved. Powered by Fortinet TAC Team.
Page last revised: 2024-01-16 15:44:53
