Traffic Engineering for L2VPN Services

Recently, I received a request to provide a solution requiring tactical traffic engineering. Imagine a network extensively utilizing auto-bandwidth-driven RSVP-TE LSPs. Due to the dynamic nature of these LSPs, some latency-sensitive traffic may take a longer path than desired. The traffic in question belongs to L2VPNs, specifically LDP VPWS (it has many names; I prefer this one).

Everything described below applies to JunOS (release 19) on Juniper MX devices.

To explain the topic, I’ll use a simple topology, although the problem applies to a much wider range of topologies.

Diagram

In the diagram above, the ingress LER computes several LSPs through the topology. Imagine there are many possible destinations for these LSPs (not shown), including the egress LER. For the LSPx, the ingress LER can use the shortest path toward the egress LER, but for the LSPy this is not possible due to a lack of bandwidth.

The described scenario is possible when many LSPs from a single source go to the same destination. This is done to spread the load across the network more evenly and to avoid bin-packing issues. With Auto-Bandwidth, LSPs in such a group can take different paths depending on the available bandwidth. As a result, some traffic flows are routed suboptimally and experience increased latency. Such conditions are acceptable for some services but not for all.

Traffic of our L2VPN service either goes through one LSP or is balanced over both, depending on the number of flows inside the service and the service characteristics. When all or some of the flows use the LSPy, latency is significantly higher than when the LSPx is in use. I was asked to solve this.

The proper solution to the problem is a centralized path computation model, i.e., a controller, but controllers are still rare, especially in mid-size and small networks.

Workaround Overview

I call this a workaround because I’ve already named the true solution, but for several reasons it cannot be deployed here. Thus, we need a workaround.

First, we need an LSP that strictly follows the shortest possible path, through the LSR1. Next, we need to bind our L2VPN service to this LSP. Then, we need other services to avoid this LSP. And finally, this LSP must be protected in case of a path failure. Seems easy, right?

Everything was done in a virtual lab; no real devices were hurt.

First Approach

Even if we nail a new LSP to the desired path, it still contends with others for available bandwidth. Because this LSP transports premium services, its setup and hold priorities were set to give it almost exclusive access to the bandwidth.

Next, the LSP was configured to use only paths whose links carry a specific admin-group, to meet the required latency.

The LSP shares its destination address (the egress LER) with others, so it must be hidden from all other services. For this, its preference value was raised above the RSVP default of 7, which makes it less preferred.

Additionally, bypass protection (node-link-protection) was enabled for this LSP.

admin-groups {
    ag-shortest-path 16;
}

label-switched-path lsp-pf-to-egress-1 {
    to 192.0.2.2;
    bandwidth 100m*;
    admin-group include-all ag-shortest-path;
    priority 3 3;
    preference 9;
    node-link-protection;
    primary PRIMARY;
}
path PRIMARY;

* In the lab, the LSP used a static bandwidth allocation instead of Auto-Bandwidth.

As a result, the inet.3 table holds two LSPs toward the egress LER, 192.0.2.2:
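
The snippet above relies on two pieces that are not shown. For include-all to match anything, the admin-group has to be assigned to the MPLS interfaces along the desired path on every router. And for priority 3 3 to win bandwidth, the competing LSPs need a numerically higher hold priority; the JunOS defaults are setup 7 and hold 0, and an LSP holding at 0 cannot be preempted. A minimal sketch for the ingress LER at the protocols mpls level, assuming lt-0/0/0.2 is the link toward the LSR1 and that the best-effort LSP is the lsp-be-to-egress-1 seen in the outputs; the priority 7 7 value is an assumption, not the lab configuration:

```
interface lt-0/0/0.2 {
    /* assumed: the link toward the LSR1; repeat on every link of the desired path */
    admin-group ag-shortest-path;
}
label-switched-path lsp-be-to-egress-1 {
    to 192.0.2.2;
    /* assumed values: a hold priority numerically higher than 3 keeps this LSP preemptable */
    priority 7 7;
}
```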

root@ls:ingress> show route table inet.3

inet.3: 1 destinations, 2 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.0.2.2/32       *[RSVP/7/1] 05:00:03, metric 1
                    >  to 10.0.0.1 via lt-0/0/0.0, label-switched-path lsp-be-to-egress-1
                    [RSVP/9/1] 00:02:36, metric 1
                    >  to 10.0.0.3 via lt-0/0/0.2, label-switched-path lsp-pf-to-egress-1
                       to 10.0.0.1 via lt-0/0/0.0, label-switched-path Bypass->10.0.0.3->10.0.0.5

root@ls:ingress>

The first one (without protection) is the best-effort LSP; it follows the longer path (via the LSR2, lt-0/0/0.0). The priority forwarding LSP follows the shortest path (via the LSR1, lt-0/0/0.2). The different preference values (7 vs. 9) keep the priority LSP out of service by default.

So far, so good. It’s time to bind the required L2VPN service with this LSP. There is a KB document describing how to achieve this.

A special policy and community were provisioned:

root@ls:ingress> show configuration policy-options
policy-statement ps-ds-load-balance {
    then {
        load-balance per-packet;
    }
}
policy-statement ps-l2vpn-select-path {
    term accept-shortest-straw {
        from community cm-l2vpn-match-shortest-lsp;
        then {
            install-nexthop lsp lsp-pf-to-egress-1;
        }
    }
}
community cm-l2vpn-match-shortest-lsp members 64512:1;

root@ls:ingress> show configuration routing-options
forwarding-table {
    export [ ps-l2vpn-select-path ps-ds-load-balance ];
}
router-id 192.0.2.1;

root@ls:ingress>

The service was marked with this community:

root@ls:ingress> show configuration protocols l2circuit
neighbor 192.0.2.2 {
    interface ge-0/0/0.10 {
        virtual-circuit-id 10;
        control-word;
        community cm-l2vpn-match-shortest-lsp;
        pseudowire-status-tlv;
    }
}

root@ls:ingress>

Now, it is time to check the LSP the service uses:

root@ls:ingress> show route forwarding-table table default family mpls
Logical system: ingress
Routing table: default.mpls
MPLS:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
default            perm     0                    dscd      683     1
...
ge-0/0/0.10  (CCC) user     0                    indr  1048579     2
                              10.0.0.1          Push 299776      772     2 lt-0/0/0.0

root@ls:ingress>

Unfortunately, this is not the LSP we expected.

The problem is in the preference values. JunOS does not treat our priority LSP as usable because a more preferred route (with a lower preference value) exists for the destination, and that route is the one in use.

To check the preference hypothesis, I restored the default preference and raised the metric toward the LSR1, expecting the service to stay on the priority LSP rather than follow the IGP shortest path:

root@ls:ingress# show | compare
[edit logical-systems ingress protocols ospf area 0.0.0.0 interface lt-0/0/0.2]
+     metric 100;
[edit logical-systems ingress protocols mpls label-switched-path lsp-pf-to-egress-1]
-    preference 9;

[edit]
root@ls:ingress#

The path via the LSR2 is now the shortest, yet the L2VPN service uses the priority LSP (via the LSR1):

root@ls:ingress> show route table inet.3

inet.3: 1 destinations, 1 routes (1 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.0.2.2/32       *[RSVP/7/1] 00:00:02, metric 1
                    >  to 10.0.0.1 via lt-0/0/0.0, label-switched-path lsp-be-to-egress-1
                       to 10.0.0.3 via lt-0/0/0.2, label-switched-path lsp-pf-to-egress-1
                       to 10.0.0.1 via lt-0/0/0.0, label-switched-path Bypass->10.0.0.3->10.0.0.5

root@ls:ingress> show route forwarding-table table default family mpls
Logical system: ingress
Routing table: default.mpls
MPLS:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
...
ge-0/0/0.10  (CCC) user     0                    indr  1048579     2
                                                 ulst  1048575     2
                              10.0.0.3          Push 299776, Push 299872(top)      768     2 lt-0/0/0.2
                              10.0.0.1          Push 299776      771     2 lt-0/0/0.0

root@ls:ingress>

This means I was right about the hypothesis but wrong about the solution as a whole: with equal preference values, the priority LSP is no longer hidden from the other services.

Core Diversity

I thought about the core diversity pattern from the very beginning but didn’t want to use it for several reasons that came to my mind back then.

This pattern separates paths in the core network via different routing contexts. In my case, a context is simply a single IGP topology with several (at least two) loopback addresses on the egress LER. But it can be a more complex entity, like a dedicated topology, FAD, slice, etc. What matters is how to attach a service to it, and a loopback address is a convenient option.

So, the egress LER gets an additional loopback address:

root@ls:egress> show configuration interfaces lo0
unit 2 {
    family inet {
        address 192.0.2.2/32 {
            primary;
            preferred;
        }
        address 192.0.2.20/32;
    }
}

root@ls:egress>

The tunnel endpoint address of the priority LSP changes accordingly:

label-switched-path lsp-pf-to-egress-1 {
    to 192.0.2.20;
    ...
}

And here we have a problem. Because this new address is not the TE address of the egress LER (its router ID), a CSPF LSP to 192.0.2.20 cannot be calculated.

For IS-IS, JunOS supports a draft that solves the issue by allowing tunnels to target any local address of the tail end, but this is not the case with OSPF. The corresponding standard is not supported <irony>because they are too busy doing the breakthrough SR-like stuff</irony>.

The workaround is to turn CSPF off and supply a strict explicit path for the LSP:

label-switched-path lsp-pf-to-egress-1 {
    to 192.0.2.20;
    bandwidth 100m;
    priority 3 3;
    no-cspf;
    node-link-protection;
    primary EXPLICIT;
}
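
The EXPLICIT path itself is not shown above. A sketch of what it could look like, assuming the hops are the LSR1 interface 10.0.0.3 and the egress LER interface 10.0.0.5; both addresses are inferred from the bypass name Bypass->10.0.0.3->10.0.0.5 in the outputs and are not taken from the lab configuration:

```
path EXPLICIT {
    /* assumed: the LSR1, next hop of the ingress LER via lt-0/0/0.2 */
    10.0.0.3 strict;
    /* assumed: egress LER interface on the link from the LSR1 */
    10.0.0.5 strict;
}
```

Every hop is listed as strict here; with CSPF off, a loose hop would leave that part of the path to the IGP.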

Fortunately, the bypass (as well as a detour) can still be calculated for such an LSP. One less headache.

Now we have two destinations in the inet.3 table for a single egress router:

root@ls:ingress> show route table inet.3

inet.3: 2 destinations, 2 routes (2 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both

192.0.2.2/32       *[RSVP/7/1] 00:02:26, metric 1
                    >  to 10.0.0.1 via lt-0/0/0.0, label-switched-path lsp-be-to-egress-1
192.0.2.20/32      *[RSVP/7/1] 00:02:24, metric 1
                    >  to 10.0.0.3 via lt-0/0/0.2, label-switched-path lsp-pf-to-egress-1
                       to 10.0.0.1 via lt-0/0/0.0, label-switched-path Bypass->10.0.0.3->10.0.0.5

root@ls:ingress>

This spares us from separating the other services: they all still use the original destination address 192.0.2.2.

All we need now is to tie the target service to this new address:

root@ls:ingress> show configuration protocols l2circuit
inactive: neighbor 192.0.2.2 {
    interface ge-0/0/0.10 {
        virtual-circuit-id 10;
        control-word;
        community cm-l2vpn-match-shortest-lsp;
        pseudowire-status-tlv;
    }
}
neighbor 192.0.2.20 {
    interface ge-0/0/0.10 {
        virtual-circuit-id 10;
        control-word;
        pseudowire-status-tlv;
    }
}

root@ls:ingress>

But it does not work either. The service is down on both sides. What’s wrong?

The problem is that we don’t receive a FEC from 192.0.2.20, the address we target in the neighbor stanza. By this address I mean the LDP Identifier, which is encoded in every LDP PDU, not the session’s transport address.

We have an extended LDP session with the egress LER which is identified as 192.0.2.2:

root@ls:ingress> show ldp session detail
...
  Session ID: 192.0.2.1:0--192.0.2.2:0
...
root@ls:ingress>

JunOS has a special knob for this case:

root@ls:ingress> show configuration protocols l2circuit
neighbor 192.0.2.2 {
    interface ge-0/0/0.10 {
        psn-tunnel-endpoint 192.0.2.20;
        virtual-circuit-id 10;
        control-word;
        pseudowire-status-tlv;
    }
}

root@ls:ingress>

In my understanding, this knob specifies the address that is resolved in the tunnel table (in most cases inet.3) to select a specific tunnel (LSP) for the pseudowire.

Finally, the service works over the correct LSP:

root@ls:ingress> show route forwarding-table table default family mpls
Logical system: ingress
Routing table: default.mpls
MPLS:
Destination        Type RtRef Next hop           Type Index    NhRef Netif
...
ge-0/0/0.10  (CCC) user     0                    indr  1048575     2
                                                 ulst  1048574     2
                              10.0.0.3          Push 299776, Push 299888(top)      763     2 lt-0/0/0.2
                              10.0.0.1          Push 299776      765     2 lt-0/0/0.0

root@ls:ingress>

Summary

Cons

  • Without CSPF, admin-groups do not work.
  • Without CSPF, Auto-Bandwidth does not work either, which leaves us with statically provisioned bandwidth.
  • Creates extra administrative burden: you have to manually provision LSPs and their explicit paths, and each path requires enumerating the strict hops to traverse. Tough to change.
  • Requires additional addresses on routers. Possibly requires altering the CoPP policies.
  • Requires additional LSPs (potentially additional bypass LSPs) which can pose scaling issues depending on the total number of LSPs.

Pros

  • Does not require reoptimization if the explicit path is completely strict.
  • LSPs can be used for BGP services as well.
  • Supports the active/backup scheme with LDP VPWS, where the backup PW also targets an additional loopback with dedicated transport.
  • Solves the task.