I was recently going through old Network Collective episodes and came across the one with Daniel Walton, which was an excellent listen. The episode brought up BGP route oscillation and the work that Daniel Walton put in to help mitigate it.
Being a newer engineer, I went down a small rabbit hole looking at RFC3345, which documents the problem that Daniel was seeing in his role at Cisco. The document covers a potential issue that can occur when a network uses “single-level Route Reflection” and “the network accepts the BGP MULTI_EXIT_DISC (MED) attribute”.
I couldn’t find much online that talked about RFC3345, and decided that it would be something fun to lab out. If you feel you have a good understanding of BGP, feel free to skip ahead.
Multiple books have been written on BGP. This blog will not do the protocol justice, but I would like to cover some basics before going into the problem itself.
Routing protocols are used as a way to scale loop-free route propagation between routing systems. When routes are received, the routers decide on the “preferred” route based on various metrics like lower hop count, lower link cost, etc. BGP is unique in that, by default, there must be a single best path for any given route, and it will not perform equal-cost multipath (ECMP).
BGP advertises prefixes as network layer reachability information (NLRI), and each advertisement carries attributes that color it. There is a whole list of mandatory, optional, transitive, and non-transitive attributes. The NLRI is the prefix being advertised, and the attributes describe that particular route.
This results in a long BGP best path selection process, and below is the list pulled from Juniper’s website.
If a router receives the same NLRI from two BGP sources, it will go down the list until one of the routes is preferred. BGP will then advertise that best path to its neighbors.
Let’s see what that will look like in a small example.

This diagram has 4 ASNs: 100, 200, 300, and 400. R1 (AS100) will be advertising the prefix 198.18.0.0/16 to both R2 in AS200 and R3 in AS300. The BGP advertisement should show only 1 ASN in the BGP path.
root@R2> show route protocol bgp 198.18.0.0/16
inet.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
198.18.0.0/16 *[BGP/170] 01:21:27, localpref 100
AS path: 100 I, validation-state: unverified
> to 192.168.10.1 via ge-0/0/1.0
root@R2>
root@R3> show route protocol bgp 198.18.0.0/16
inet.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
198.18.0.0/16 *[BGP/170] 01:19:07, localpref 100
AS path: 100 I, validation-state: unverified
> to 192.168.15.1 via ge-0/0/2.0
root@R3>
Both routers are in turn advertising 198.18.0.0/16 to R4 and R6, and each has an AS path of 100 I to get to 198.18.0.0/16.
root@R2> show route protocol bgp advertising-protocol bgp 10.10.10.4
inet.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
Prefix Nexthop MED Lclpref AS path
* 198.18.0.0/16 Self 100 I
root@R2>
root@R3> show route protocol bgp advertising-protocol bgp 10.15.15.6
inet.0: 6 destinations, 6 routes (6 active, 0 holddown, 0 hidden)
Prefix Nexthop MED Lclpref AS path
* 198.18.0.0/16 Self 100 I
root@R3>
When R4 and R6 get the 198.18.0.0/16 prefix, they are both going to advertise it to R5. R5 will then have to determine which path it will install into its RIB. Let’s take a look at the BGP table and see if there are any differences between the advertisements.
From R4
root@R4> show route advertising-protocol bgp 5.5.5.5 detail
inet.0: 11 destinations, 11 routes (11 active, 0 holddown, 0 hidden)
* 198.18.0.0/16 (1 entry, 1 announced)
BGP group AS400 type Internal
Nexthop: 10.10.10.2
Localpref: 100
AS path: [400] 200 100 I
root@R4>
In this output, the AS path runs from AS 400 through AS 200 to AS 100, where the prefix originates. The route is advertised over iBGP with the default local preference of 100 and a next hop of 10.10.10.2, which is R2’s address on the link between R2 and R4.
From R6
root@R6> show route advertising-protocol bgp 5.5.5.5 detail
inet.0: 11 destinations, 11 routes (11 active, 0 holddown, 0 hidden)
* 198.18.0.0/16 (1 entry, 1 announced)
BGP group AS400 type Internal
Nexthop: 10.15.15.3
Localpref: 100
AS path: [400] 300 100 I
root@R6>
This is similar to R4, with the exception that the AS path goes through ASN 300 in the middle and the next hop is different. R5 is receiving both of these almost identical routes and must determine which one will be installed in its forwarding table.
R5 will run through the BGP path selection process to determine the desirable path.
Both have the same local preference, AS path length, origin value, and MED value; both are advertised by an iBGP peer; and R5 has the same IGP metric to either advertised next hop. The tie breaker should therefore be the lower router ID, which means R5 should choose the path from R4.
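To make the walk down the list concrete, here is a minimal Python sketch of the comparison R5 is doing. It is not how Junos implements path selection; it only covers the steps named above, and it leaves out the eBGP-over-iBGP step because both paths here are iBGP-learned. The `Path` class and `pick_best` function are names made up for illustration, the MED is modeled as 0 since neither advertisement carries one, and the IGP metric of 2 is the Metric2 value from R5's detailed output.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Path:
    via: str            # iBGP peer that advertised the path
    local_pref: int
    as_path: tuple      # e.g. (200, 100)
    origin: int         # 0 = IGP ("I"), 1 = EGP, 2 = incomplete
    med: int            # assumption: a missing MED is treated as 0
    igp_metric: int     # IGP cost to the protocol next hop
    router_id: tuple    # router ID as octets, so it compares numerically


def pick_best(a: Path, b: Path):
    """Compare two iBGP-learned paths; return (winner, deciding step)."""
    steps = [
        ("Local preference", -a.local_pref, -b.local_pref),  # higher wins
        ("AS path length", len(a.as_path), len(b.as_path)),
        ("Origin", a.origin, b.origin),
    ]
    # MED is only compared when both paths come from the same neighbor AS.
    if a.as_path[0] == b.as_path[0]:
        steps.append(("MED", a.med, b.med))
    steps += [
        ("IGP metric", a.igp_metric, b.igp_metric),
        ("Router ID", a.router_id, b.router_id),
    ]
    for step, left, right in steps:
        if left != right:  # first step where the paths differ decides
            return (a, step) if left < right else (b, step)
    return a, "Identical paths"


via_r4 = Path("R4", 100, (200, 100), 0, 0, 2, (4, 4, 4, 4))
via_r6 = Path("R6", 100, (300, 100), 0, 0, 2, (6, 6, 6, 6))

best, reason = pick_best(via_r4, via_r6)
print(f"best path is via {best.via}, decided by: {reason}")
```

With the values from this lab, every step ties until the router ID, so the sketch lands on R4 for the same reason the router reports.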
R5’s route table does show that it is using the path through R4.
root@R5> show route protocol bgp 198.18.0.0/16
inet.0: 11 destinations, 12 routes (11 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
198.18.0.0/16 *[BGP/170] 00:13:15, localpref 100, from 4.4.4.4
AS path: 200 100 I, validation-state: unverified
> to 172.16.10.4 via ge-0/0/1.0
[BGP/170] 00:13:14, localpref 100, from 6.6.6.6
AS path: 300 100 I, validation-state: unverified
> to 172.16.15.6 via ge-0/0/2.0
Looking at the detailed output shows why the path through R6 was not used:
Inactive reason: Router ID
root@R5> show route protocol bgp 198.18.0.0/16 detail
inet.0: 11 destinations, 12 routes (11 active, 0 holddown, 0 hidden)
198.18.0.0/16 (2 entries, 1 announced)
*BGP Preference: 170/-101
Next hop type: Indirect, Next hop index: 0
Address: 0x77c6624
Next-hop reference count: 2
Kernel Table Id: 0
Source: 4.4.4.4
Next hop type: Router, Next hop index: 588
Next hop: 172.16.10.4 via ge-0/0/1.0, selected
Session Id: 140
Protocol next hop: 10.10.10.2
Indirect next hop: 0x74dd888 1048574 INH Session ID: 322
State: <Active Int Ext>
Local AS: 400 Peer AS: 400
Age: 14:24 Metric2: 2
Validation State: unverified
Task: BGP_400.4.4.4.4
Announcement bits (2): 0-KRT 4-Resolve tree 1
AS path: 200 100 I
Accepted
Localpref: 100
Router ID: 4.4.4.4
Thread: junos-main
BGP Preference: 170/-101
Next hop type: Indirect, Next hop index: 0
Address: 0x77c67e4
Next-hop reference count: 1
Kernel Table Id: 0
Source: 6.6.6.6
Next hop type: Router, Next hop index: 587
Next hop: 172.16.15.6 via ge-0/0/2.0, selected
Session Id: 141
Protocol next hop: 10.15.15.3
Indirect next hop: 0x74ddbb8 - INH Session ID: 0
State: <Int Ext Changed>
Inactive reason: Router ID
Local AS: 400 Peer AS: 400
Age: 14:23 Metric2: 2
Validation State: unverified
Task: BGP_400.6.6.6.6
AS path: 300 100 I
Accepted
Localpref: 100
Router ID: 6.6.6.6
Thread: junos-main
Two BGP path selection steps are the focus of this article: MED and best exit from the AS.
MED is an optional non-transitive attribute that a BGP speaker can advertise to influence which entry point a neighboring AS uses to reach it. MED stands for Multi-Exit Discriminator (MULTI_EXIT_DISC), and it is a metric: the lower the MED value, the better the route. Imagine a scenario where you are dual-homed to a single provider over one 10G link and one 1G link. Advertising a lower MED on the 10G link may steer traffic to enter your AS through that 10G peering. Granted, for MED to matter, all the tie breakers above it (local preference, AS path length, and origin) must be equal.
Best exit from the AS refers to the IGP cost to reach the next hop, which is a mandatory attribute in a BGP advertisement. A lower IGP cost is preferred over a higher one, and it is used as a route tie breaker, though it sits low on the list.
One thing to note is that different IGPs can assign different costs. All else being equal, it is possible to take a less desirable path because of how the IGP is set up.
Running iBGP is a fine and dandy thing to do, but it can run into scaling problems. iBGP routers will not advertise routes they have learned via iBGP to other iBGP speakers. This is because BGP’s loop prevention mechanism relies on the AS path. Since all iBGP speakers share the same ASN, a router cannot use the AS path to determine whether there is a loop within the network.
In the diagram above, there would have to be an iBGP session between every pair of routers in AS 400 for them to share all routes among themselves. This is an n * (n - 1) / 2 problem, where n is the number of iBGP routers.
In AS 400, there would be 3 iBGP sessions (R4 to R5, R4 to R6, and R6 to R5). That is doable in a network of 3 iBGP routers, but in a network of 10 iBGP routers, there would have to be 45 sessions.
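The session count is easy to check with a couple of lines of Python. This is just the full-mesh formula applied to the router counts mentioned above; the function name is made up for illustration.

```python
def full_mesh_sessions(n: int) -> int:
    """Number of iBGP sessions needed to fully mesh n routers."""
    return n * (n - 1) // 2


for routers in (3, 10, 50):
    print(f"{routers} routers -> {full_mesh_sessions(routers)} sessions")
```

The growth is quadratic, which is why a full mesh that is comfortable at 3 routers becomes painful well before 50.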
Route Reflectors (RRs) are one way to solve this issue. Instead of every iBGP router peering with every other, one or more routers become route reflectors, and every other iBGP router peers only with the route reflectors. The route reflectors receive NLRI from their iBGP clients and re-advertise it to the other iBGP devices.
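For a rough sense of the savings, the sketch below compares a full mesh against a route reflector design. It assumes every client peers with every reflector and that the reflectors are fully meshed among themselves; that is one common layout, not the only one, and both function names are hypothetical.

```python
def full_mesh_sessions(n: int) -> int:
    """iBGP sessions needed to fully mesh n routers."""
    return n * (n - 1) // 2


def route_reflector_sessions(n: int, reflectors: int = 1) -> int:
    """Sessions when each client peers with every RR and the RRs are meshed."""
    clients = n - reflectors
    return clients * reflectors + full_mesh_sessions(reflectors)


n = 10
print("full mesh:", full_mesh_sessions(n))
print("one RR:   ", route_reflector_sessions(n, reflectors=1))
print("two RRs:  ", route_reflector_sessions(n, reflectors=2))
```

Under those assumptions the session count grows roughly linearly with the number of routers instead of quadratically.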
RRs do introduce a problem with route obfuscation. Every BGP speaker chooses only the path that is best from its own point of view, and advertises only that route to its peers. This can be an issue depending on where your RRs are placed within the network, and it is a contributing reason to route oscillation. Since an RR advertises only what it thinks is the best route, it can hand inefficient routes to its peers. How inefficient depends on the particular network design, but it is something to keep in mind.
The problem statement is this: with single-tier route reflection and MED accepted on received NLRI, routes can oscillate constantly because the routers keep changing which route they believe is the most preferred. The preference becomes circular (A > B > C > A), which is why the routes never settle. Each time a router compares a newly received route to its chosen best path, it will choose another “best path”.
The following lab will go over an environment that may lead to route oscillations.

This is the lab that we will be working through, from the perspective of AS1 and primarily that of R1 and R2. In this topology, R1 and R2 are route reflectors: R1 has R9 and R5 as clients, and R2 has R6 as its client.
R9, R5, and R6 each have an eBGP peering (R9 to AS10, R5 and R6 to AS6) and are receiving a single route, 10.0.0.0/8. This route is advertised with a different MED value depending on the peering relationship. Remember, the lower the MED value, the more preferred that particular route is. On top of the MED, there are IGP costs of 450, 440, and 600. NH refers to the “next-hop” of the BGP route.
Since BGP can have only one preferred path per prefix by default, we will whiteboard which route AS1 prefers for the 10.0.0.0/8 prefix. This will be a time series, and afterwards we will go over show commands to see what is happening. Let’s start with the perspective of R1 and R2.
R9 and R5 will each advertise 10.0.0.0/8 to R1 with the same AS path length and with MED values of 10 and 1 respectively. The tie breaker will be the “best exit from AS”: local preference is the default of 100 on both, the AS path lengths match, the origin is the same, MED is only compared between routes received from the same neighboring AS, and both routes are learned from iBGP peers. R1 therefore selects the route through R5, which has the lower IGP cost.
R6 will advertise the 10.0.0.0/8 prefix to R2 with an AS path of 6 100 I and a MED of 0. Since R6 is the only one advertising to R2 at this point, this route is what will get installed on R2.
R1 will advertise the 10.0.0.0/8 route through R5 to R2 and R2 will advertise the 10.0.0.0/8 route through R6 to R1.
R2 will prefer the R6 path to the 10.0.0.0/8 prefix.
R2 is now receiving 10.0.0.0/8 through two exit routers, R6 and R5, that both peer with AS 6. MED will therefore be compared on these routes, and the lower the MED, the higher the preference. R2 will choose the route through R6, since a MED of 0 is better than a MED of 1.
R1 will first compare the routes through R6 and R5. Both come from AS 6, so MED applies and the path through R6 wins. R1 will then compare the R9 route against the R6 route. Since R9 and R6 peer with different ASes, MED is skipped and the tie breaker is the best exit. R1 will choose the path through R9, as it has the “best exit from AS”, and will then send a BGP update to R2 for the NLRI through R9.
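R1's flip-flop can be reproduced with a small Python sketch. It assumes local preference, AS path length, and origin are equal (as they are in this lab), so only the MED and IGP steps are modeled. The IGP costs are the Metric2 values R1 reports in its detailed output in this article; the tuple layout and the `best_path` function are made up for illustration.

```python
from collections import defaultdict

# (exit router, neighbor AS, MED) for each path to 10.0.0.0/8
R9 = ("R9", 10, 10)
R5 = ("R5", 6, 1)
R6 = ("R6", 6, 0)

# R1's IGP cost to each exit's next hop
R1_IGP_COST = {"R9": 550, "R5": 540, "R6": 800}


def best_path(paths, igp_cost):
    """Lowest MED per neighbor AS first, then lowest IGP cost overall."""
    groups = defaultdict(list)
    for path in paths:
        groups[path[1]].append(path)
    # Within each neighbor AS only the lowest-MED path survives.
    survivors = [min(group, key=lambda p: p[2]) for group in groups.values()]
    return min(survivors, key=lambda p: igp_cost[p[0]])


before = best_path([R9, R5], R1_IGP_COST)      # R2 has not advertised R6 yet
after = best_path([R9, R5, R6], R1_IGP_COST)   # R2's route via R6 has arrived
print("R1 best without the R6 route:", before[0])
print("R1 best with the R6 route:   ", after[0])
```

The point to notice is that adding the R6 route changes the winner to R9 even though R6 itself never wins: it only knocks R5 out of the running on MED.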
The default behavior of BGP is to prefer only one route. As R2 is now choosing the path through R9 as its preferred path, it will send a withdraw message to R1, retracting its advertisement of the path through R6.
This is the full circle of the route oscillation. R1 goes back to preferring the path through R5, since R2 retracted its route through R6. Without the R6 route to beat it on MED, R5’s path is back in contention, and its lower IGP cost wins over R9. As soon as R1 advertises the path through R5 to R2, R2 will prefer the R6 route again and the cycle will repeat. This route oscillation will continue ad nauseam until the configuration is changed, which we will go over later.
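The whole cycle can be simulated in a few lines of Python. Treat this as a sketch under several assumptions: both reflectors are updated in lockstep rounds (real BGP updates are asynchronous), local preference, AS path length, and origin are equal, and a reflector passes its best path to the other reflector only when that path came from one of its own clients. R1's IGP costs are the Metric2 values from its detailed output in this article; R2's costs are not shown anywhere in the lab, so the values below are hypothetical and were picked only so that R2 prefers R9's exit over R6's, as described above.

```python
from collections import defaultdict

# (exit router, neighbor AS, MED)
R9, R5, R6 = ("R9", 10, 10), ("R5", 6, 1), ("R6", 6, 0)

R1_CLIENT_PATHS = [R9, R5]
R2_CLIENT_PATHS = [R6]

R1_IGP_COST = {"R9": 550, "R5": 540, "R6": 800}
R2_IGP_COST = {"R9": 650, "R5": 640, "R6": 700}  # hypothetical values


def best_path(paths, igp_cost):
    """Lowest MED per neighbor AS first, then lowest IGP cost overall."""
    groups = defaultdict(list)
    for path in paths:
        groups[path[1]].append(path)
    survivors = [min(group, key=lambda p: p[2]) for group in groups.values()]
    return min(survivors, key=lambda p: igp_cost[p[0]])


def simulate(rounds):
    from_r1 = from_r2 = None  # what each reflector currently advertises to the other
    history = []
    for _ in range(rounds):
        r1_best = best_path(R1_CLIENT_PATHS + ([from_r2] if from_r2 else []), R1_IGP_COST)
        r2_best = best_path(R2_CLIENT_PATHS + ([from_r1] if from_r1 else []), R2_IGP_COST)
        history.append((r1_best[0], r2_best[0]))
        # Advertise the best path only if it was learned from a client;
        # otherwise the earlier advertisement is withdrawn.
        from_r1 = r1_best if r1_best in R1_CLIENT_PATHS else None
        from_r2 = r2_best if r2_best in R2_CLIENT_PATHS else None
    return history


history = simulate(8)
for number, (r1, r2) in enumerate(history):
    print(f"round {number}: R1 prefers {r1}, R2 prefers {r2}")
```

Under these assumptions the pair of best paths never settles: the same sequence of states repeats every four rounds, which is the A > B > C > A loop from the problem statement.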
Let’s see how this looks from the CLI:
R1 is preferring the route through R5 due to R5 having a lower IGP metric.
root@vRouter1> show route protocol bgp 10.0.0.0/8
inet.0: 16 destinations, 17 routes (16 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/8 *[BGP/170] 00:00:18, MED 1, localpref 100, from 5.5.5.5
AS path: 6 100 I, validation-state: unverified
> to 172.15.1.5 via ge-0/0/0.0
[BGP/170] 1d 03:01:24, MED 10, localpref 100, from 9.9.9.9
AS path: 10 100 I, validation-state: unverified
> to 172.19.1.9 via ge-0/0/2.0
BGP Preference: 170/-101
Next hop type: Indirect, Next hop index: 0
Source: 9.9.9.9
<output omitted>
Next hop: 172.19.1.9 via ge-0/0/2.0, selected
Session Id: 145
Protocol next hop: 172.39.3.3
Indirect next hop: 0x74ddbb8 1048577 INH Session ID: 328
State: <Int Ext>
Inactive reason: IGP metric
<output omitted>
Let’s take a look at R2, which prefers the path through R6 because it is the only route it has in its table.
root@vRouter2> show route protocol bgp 10.0.0.0/8
inet.0: 10 destinations, 10 routes (10 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/8 *[BGP/170] 1d 03:21:13, MED 0, localpref 100, from 6.6.6.6
AS path: 6 100 I, validation-state: unverified
> to 172.26.2.6 via ge-0/0/0.0
root@vRouter2>
As we can see, R2 has only the one route in its table and prefers it. Now let’s see what happens after R1 and R2 begin advertising routes to each other. We will first look at R1 throughout the oscillation cycle and then look at R2’s output.
root@vRouter1> show route protocol bgp 10.0.0.0/8
inet.0: 20 destinations, 22 routes (20 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/8 *[BGP/170] 00:00:25, MED 10, localpref 100, from 9.9.9.9
AS path: 10 100 I, validation-state: unverified
> to 172.19.1.9 via ge-0/0/2.0
[BGP/170] 00:00:00, MED 0, localpref 100, from 2.2.2.2
AS path: 6 100 I, validation-state: unverified
> to 172.12.1.2 via ge-0/0/1.0
[BGP/170] 00:00:25, MED 1, localpref 100, from 5.5.5.5
AS path: 6 100 I, validation-state: unverified
> to 172.15.1.5 via ge-0/0/0.0
R1 is receiving 10.0.0.0/8 from R9, R2, and R5. Because the routes from R5 and R2 both come from AS 6, R1 compares their MEDs. A MED of 0 beats a MED of 1 (lower is better), so the route from R2 is preferable. R1 then compares the route from R2 against the one from R9, and R9 becomes the best route because it has the lower IGP metric. Remember, MED is only compared between routes received from the same neighboring AS.
A detailed show output will show the “Inactive reason”:
10.0.0.0/8 (3 entries, 1 announced)
*BGP Preference: 170/-101
Source: 9.9.9.9
Next hop type: Router, Next hop index: 596
Next hop: 172.19.1.9 via ge-0/0/2.0, selected
Protocol next hop: 172.39.3.3
Local AS: 1 Peer AS: 1
Age: 3:18 Metric: 10 Metric2: 550
AS path: 10 100 I
Localpref: 100
Router ID: 9.9.9.9
BGP Preference: 170/-101
Source: 2.2.2.2
Next hop type: Router, Next hop index: 586
Next hop: 172.12.1.2 via ge-0/0/1.0, selected
Session Id: 14b
Inactive reason: IGP metric
Local AS: 1 Peer AS: 1
Age: 1 Metric: 0 Metric2: 800
Task: BGP_1.2.2.2.2
AS path: 6 100 I (Originator)
Cluster list: 2.2.2.2
Originator ID: 6.6.6.6
Localpref: 100
Router ID: 2.2.2.2
BGP Preference: 170/-101
Source: 5.5.5.5
Next hop type: Router, Next hop index: 591
Next hop: 172.15.1.5 via ge-0/0/0.0, selected
Protocol next hop: 172.57.5.7
Inactive reason: Not Best in its group - Route Metric or MED comparison
Local AS: 1 Peer AS: 1
Age: 3:18 Metric: 1 Metric2: 540
ORR Generation-ID: 0
AS path: 6 100 I
Localpref: 100
Router ID: 5.5.5.5
Once R2 sends a BGP update message with the R6 path withdrawn, R1 goes back to preferring the path through R5.
root@vRouter1> show route protocol bgp 10.0.0.0/8
inet.0: 20 destinations, 21 routes (20 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/8 *[BGP/170] 00:06:15, MED 1, localpref 100, from 5.5.5.5
AS path: 6 100 I, validation-state: unverified
> to 172.15.1.5 via ge-0/0/0.0
[BGP/170] 00:06:15, MED 10, localpref 100, from 9.9.9.9
AS path: 10 100 I, validation-state: unverified
> to 172.19.1.9 via ge-0/0/2.0
R2 was preferring the R6 route until R1 advertised the path through R9. Once that occurs, R2 prefers the new path and withdraws the path through R6.
root@vRouter2> show route protocol bgp 10.0.0.0/8
inet.0: 19 destinations, 20 routes (19 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/8 *[BGP/170] 00:00:00, MED 10, localpref 100, from 1.1.1.1
AS path: 10 100 I, validation-state: unverified
> to 172.12.1.1 via ge-0/0/1.0
[BGP/170] 1d 14:21:20, MED 0, localpref 100, from 6.6.6.6
AS path: 6 100 I, validation-state: unverified
> to 172.26.2.6 via ge-0/0/0.0
root@vRouter2>
That route withdrawal will cause R1 to recompute its BGP path selection and R1 will choose the path through R5 over R9, and send the update to R2. This will cause R2 to re-run its BGP path selection process and it will choose the path of R6 over the route advertised by R5.
root@vRouter2> show route protocol bgp 10.0.0.0/8
inet.0: 19 destinations, 20 routes (19 active, 0 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/8 *[BGP/170] 1d 14:29:06, MED 0, localpref 100, from 6.6.6.6
AS path: 6 100 I, validation-state: unverified
> to 172.26.2.6 via ge-0/0/0.0
[BGP/170] 00:00:00, MED 1, localpref 100, from 1.1.1.1
AS path: 6 100 I, validation-state: unverified
> to 172.12.1.1 via ge-0/0/1.0
root@vRouter2>
The technical term for this route oscillation is “messed up”. The issue should be apparent at this point, so the question is how to resolve it. There are multiple ways to go about it. The operator could filter MED inbound so that the routers within AS1 do not consider MED values at all, but there are also two nerd knobs that can be used to stop the oscillation.
The first one that we will go over is always-compare-med. By default, MED is only compared between routes received from the same neighboring AS, but always-compare-med overrides that. In the lab above, the path through R6 would then be preferred because it has the lowest MED value. However, this may not be the ideal route for all devices within the network.
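Here is a rough Python sketch of why comparing MED across all neighbor ASes removes the loop. As before, it assumes local preference, AS path length, and origin are equal; the IGP costs are R1's Metric2 values from this article, and the `best_path` function with its `always_compare_med` parameter is a made-up model, not a router API.

```python
from collections import defaultdict

# (exit router, neighbor AS, MED)
R9, R5, R6 = ("R9", 10, 10), ("R5", 6, 1), ("R6", 6, 0)

R1_IGP_COST = {"R9": 550, "R5": 540, "R6": 800}


def best_path(paths, igp_cost, always_compare_med=False):
    if always_compare_med:
        # MED is compared across every path, whatever the neighbor AS.
        lowest = min(p[2] for p in paths)
        survivors = [p for p in paths if p[2] == lowest]
    else:
        # Default: MED only eliminates paths within the same neighbor AS.
        groups = defaultdict(list)
        for path in paths:
            groups[path[1]].append(path)
        survivors = [min(g, key=lambda p: p[2]) for g in groups.values()]
    return min(survivors, key=lambda p: igp_cost[p[0]])


default_choice = best_path([R9, R5, R6], R1_IGP_COST)
forced_choice = best_path([R9, R5, R6], R1_IGP_COST, always_compare_med=True)
print("default behavior:  ", default_choice[0])
print("always-compare-med:", forced_choice[0])
```

With MED compared everywhere, the R6 path beats any other path a router can see, so no later update can dislodge it. The trade-off is the one mentioned above: every router is pushed toward R6 regardless of its IGP distance.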
A better option would be to use BGP add-path. This allows a BGP speaker to advertise multiple paths for a prefix to its peers, not just its best path. Each router then has a fuller picture of the network and runs the BGP best path algorithm across all of those paths.
Let’s change the configuration on R1 and R2 and see which route they both prefer.
root@vRouter1> show configuration protocols bgp | display set
set protocols bgp group internal type internal
set protocols bgp group internal local-address 1.1.1.1
set protocols bgp group internal family inet unicast add-path receive
set protocols bgp group internal family inet unicast add-path send path-count 4
set protocols bgp group internal cluster 1.1.1.1
set protocols bgp group internal peer-as 1
set protocols bgp group internal neighbor 9.9.9.9
set protocols bgp group internal neighbor 5.5.5.5
set protocols bgp group R2 type internal
set protocols bgp group R2 local-address 1.1.1.1
set protocols bgp group R2 family inet unicast add-path receive
set protocols bgp group R2 family inet unicast add-path send path-count 4
set protocols bgp group R2 peer-as 1
set protocols bgp group R2 neighbor 2.2.2.2
The added configuration is the add-path statements, which allow each router to receive multiple paths and to send up to 4 paths per prefix. With each router now receiving all 3 routes at the same time, the best path will be through R9.
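A quick Python sketch shows why seeing all three paths at once settles things. It assumes local preference, AS path length, and origin are equal, and that add-path delivers every client-learned path to both reflectors. R1's IGP costs are the Metric2 values from its detailed output in this article; R2's costs do not appear in the lab output, so the ones below are hypothetical and chosen only so that R2 favors R9's exit. The `best_path` function is a made-up model.

```python
from collections import defaultdict

# (exit router, neighbor AS, MED)
R9, R5, R6 = ("R9", 10, 10), ("R5", 6, 1), ("R6", 6, 0)
ALL_PATHS = [R9, R5, R6]

IGP_COST = {
    "R1": {"R9": 550, "R5": 540, "R6": 800},
    "R2": {"R9": 650, "R5": 640, "R6": 700},  # hypothetical values
}


def best_path(paths, igp_cost):
    """Lowest MED per neighbor AS first, then lowest IGP cost overall."""
    groups = defaultdict(list)
    for path in paths:
        groups[path[1]].append(path)
    survivors = [min(group, key=lambda p: p[2]) for group in groups.values()]
    return min(survivors, key=lambda p: igp_cost[p[0]])


# With add-path, what each reflector sees no longer depends on which path
# the other reflector currently prefers, so there is nothing to oscillate.
choices = {rr: best_path(ALL_PATHS, cost)[0] for rr, cost in IGP_COST.items()}
for rr, exit_router in choices.items():
    print(f"{rr} prefers the exit through {exit_router}")
```

Because the input to each decision is now fixed, each reflector computes its answer once and keeps it.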
root@vRouter1> show route protocol bgp 10.0.0.0/8
inet.0: 20 destinations, 22 routes (20 active, 1 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/8 *[BGP/170] 00:03:23, MED 10, localpref 100, from 9.9.9.9
AS path: 10 100 I, validation-state: unverified
> to 172.19.1.9 via ge-0/0/2.0
[BGP/170] 00:03:23, MED 0, localpref 100, from 2.2.2.2
AS path: 6 100 I, validation-state: unverified
> to 172.12.1.2 via ge-0/0/1.0
[BGP/170] 00:03:23, MED 1, localpref 100, from 5.5.5.5
AS path: 6 100 I, validation-state: unverified
> to 172.15.1.5 via ge-0/0/0.0
root@vRouter1>
Here is the view from R2
root@vRouter2> show route protocol bgp 10.0.0.0/8
inet.0: 19 destinations, 21 routes (19 active, 1 holddown, 0 hidden)
+ = Active Route, - = Last Active, * = Both
10.0.0.0/8 *[BGP/170] 00:10:06, MED 10, localpref 100, from 1.1.1.1
AS path: 10 100 I, validation-state: unverified
> to 172.12.1.1 via ge-0/0/1.0
[BGP/170] 00:10:43, MED 0, localpref 100, from 6.6.6.6
AS path: 6 100 I, validation-state: unverified
> to 172.26.2.6 via ge-0/0/0.0
[BGP/170] 00:10:06, MED 1, localpref 100, from 1.1.1.1
AS path: 6 100 I, validation-state: unverified
> to 172.12.1.1 via ge-0/0/1.0
And we can see that the route is going through R9.
root@vRouter2> traceroute 10.0.0.1
traceroute to 10.0.0.1 (10.0.0.1), 30 hops max, 52 byte packets
1 172.12.1.1 (172.12.1.1) 2.079 ms 1.716 ms 1.598 ms
2 172.19.1.9 (172.19.1.9) 2.330 ms 2.781 ms 2.255 ms
^C
root@vRouter2>