Algorithm to detect a transit free network
In a recent Network AF podcast Avi Freedman (Kentik) joked with the guest about how he finds who is transit free / tier 1 network. He said, “I ask everyone who they think is a tier 1 network. Everyone includes their own name + other names”. Next, he ignores the selfnomination & looks at the common list to find who actually is a tier 1 network. This is funny, intuitive and gives some clue.
This makes me think of other possible ways to find that out.
 Can we determine that from the global routing table?
 Will that give a complete picture?
 How many views of the global table dump we would need to have that picture? Is one view from a random operator sitting somewhere will be good enough?
A reminder on transit free / tier 1 network status
Basically, a transitfree network is the one which is without IP transit and can reach all other networks in the world purely based on it’s peering & downstream relationships. In other words, it does not pays for the IP transit i.e bandwidth on layer 3 to transit. Often these networks are called tier 1 networks but that’s bit abused term in the marketing & often one can find large country telcos claiming the “tier 1 status”. The situation gets further confusing by Mr Bill Norton from The Dr Peering Playbook agrees to call them “National tier 1 networks” in his book. For the purpose of this blog, I am going to be referring only to global tier 1 network / transitfree networks only.
Revisiting the routing table concepts
Few routing tablespecific concepts can be used to build an algorithm to mathematically determine who are transitfree networks.

Transitfree networks must peer with all other transitfree networks. No exception. If there are say “N” number of transitfree networks a network peers with “N1” i.e all except one, then it’s technically not a transitfree network. Surely network trafficwise that would be insignificant but mathematically that does not work.

A transitfree network in theory can be ASN without any downstream. Although that was not practical but might very well be true today. Imagine the status of networks like Google (AS15169) in present times, especially in IPv6. Thus algorithm should consider all ASNs and not just transit ASNs.

A network can be transit free in either IPv4 or IPv6 or both. In other terms, the algorithm must determine the status of each address family independently.

In theory, a network can be transit free by just being a peer of all other transitfree networks but that would be impractical and unlikely. Whether we look at the network of large telcos or content players, they all have many nontransitfree adjacencies. in the case of telcos, they are mostly downstream customers while for content players those are mostly settlementfree peering sessions. Thus while the algorithm should consider all ASNs, the preference in the litmus test can be given to networks sorted by the largest number of adjacency to make them run faster.

How many points of the full table we should consider  This is a hard one. Imagine I look at the full table purely from my upstream ISP here in Haryana. They are a downstream customer of Airtel (AS9498) + some peerings here & there. Airtel (AS9498) is downstream of multiple large networks outside of India in US/EU/Singapore & thus their full table is a blended feed. Imagine a network sitting say Hetzner in Europe, and Airtel peers with them directly. Thus my table feed will simply show Hetzner > Airtel > my provider kind of AS_PATH & not the Hetzner > Hetzner_transit kind of paths. Hence for sure number of feeds matters. I can say from my experience that multiple RIPE collector’s full tables feeds with 7080 full table feeds is a good blend and would cover almost all AS_PATHs but I cannot think how I can mathematically determine that. Is 10 full table feeds OK or 20 or 30? It’s very hard to tell because ultimately what we need is that we get at least one full table feed from behind/downstream of each transitfree network but hey we are here trying to find who are those transitfree networks in the first place hence the cyclic dependency. This can be sorted with a different approach as I define below in the algorithm.
So finally the algorithm to determine who is transit free network:

Pick a view of the full routing table from any point and sort it out to find a list of networks with the largest number of prefixes originated.

Next, use the old school Mathematical method of Mathematical Induction. Basically in Mathematical Induction, we assume something and next build a hypothesis based on that assumption.
Assume two networks out of the full table to be transitfree networks. These can be any two networks but to speed up the algorithm we can assume the first two to be the ones with a maximum number of routes origination.

Find the third transitfree network using the logic that it MUST be peered / adjacent to the first two assumed ones. Find 4th, 5th and so on based on that. After each step, the number of ASNs in the consideration set will reduce as many networks would be adjacent to two ASNs, a few to three and fewer with 4 and so on. Ultimately we will reach a list where ASNs beyond that list won’t have BGP adjacency to all of them. And hence that would be our final list.

Next, try replacing those first two with any of the other two ASNs which are determined and the result should exactly be the same.

Next, try throwing a pair of nontransit free networks to this logic and see how quickly the search terminates!
Good luck in finding the list of transitfree networks. I am not going the spoil the excitement with the list and remember somewhere up that transit path there is always peering! :)