Routing Across a Split Network

Introduction

Sometimes it is necessary to have a single network split across two physical sites and connected by a SLIP or PPP link. If there is more than one host on the remote end of the link, the easy approach is to get another network number assigned and route between the two networks. This is not always possible (among other reasons, because network numbers are in short supply). Since I recently had to get one working, this page documents what I did to make it work, before I forget!

Note that all of this is a routing issue, and assumes that SLIP or PPP is already working. If you can ping the other end of the link, then you are ready for the next level. Thank you for playing routing roulette!

Physical Configuration

There are two sites: call them W, which is directly on the WAN, and S, which is a big support site. To protect the site involved, fake hostnames and domain names will be used.
Figure 1.

Site W contains the WAN router (named wan-router.domain) and the SLIP server host, server.domain. Site S contains several hosts on the same IP network as site W, one of which is the SLIP client, client.domain. The other machines in site S, S1.domain and S2.domain, need to route through the host client.domain. Additionally, the hosts at site S need to talk to several hosts at site W: W1.domain, W2.domain, and W3.domain. Additional hosts on each side can be accommodated by just adding them to the configuration of the appropriate host.

General Configuration Rule

Each end of the SLIP or PPP link needs to advertise both proxy-ARP and host routes to the hosts on the other end of the link. This is in addition to the normal default routes needed at site S in order to communicate to the rest of your network.

Quick Proxy ARP Primer
ARP (Address Resolution Protocol) is how one host converts an IP (logical) address into a MAC (physical: ethernet, etc.) address. Proxy ARP is when a host answers the request for a MAC address for an IP address other than its own (in this case, for the hosts on the other end of the SLIP or PPP link).
See the section on troubleshooting below for things that can go wrong when using Proxy ARP.

Configuration
The configuration on the two ends of the SLIP or PPP link is similar, but there are enough differences that I will describe each one separately.
Configuration of server.domain
On host server.domain, create an /etc/init.d/network.local script (with a symlink in /etc/rc2.d to run after S30network) that contains the following commands:
    arp -s client.domain `netstat -ian | grep :` pub
    arp -s S1.domain `netstat -ian | grep :` pub
    arp -s S2.domain `netstat -ian | grep :` pub
    route add host S1.domain client.domain 1
    route add host S2.domain client.domain 1
Additional arp and route commands would need to be added for each additional host at site S.
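
Since the per-host commands follow one pattern, a small generator can keep the host list in one place. The following is only a sketch, not what was run at this site: the function name gen_proxy_cmds and the MAC address in the example are made up for illustration, and the function only prints the commands so they can be inspected before being run.

```sh
#!/bin/sh
# Dry-run sketch: print the proxy-ARP and host-route commands for each
# remote host instead of running them.

# gen_proxy_cmds MAC GATEWAY HOST...   (hypothetical helper name)
#   MAC      this machine's Ethernet address
#   GATEWAY  the host at the far end of the SLIP/PPP link
#   HOST...  hosts that are reached through GATEWAY
gen_proxy_cmds() {
    mac=$1
    gateway=$2
    shift 2

    # Publish a proxy-ARP entry for the far end of the link itself.
    echo "arp -s $gateway $mac pub"

    for host in "$@"; do
        # Answer ARP requests for the remote host, then route to it
        # through the far end of the link.
        echo "arp -s $host $mac pub"
        echo "route add host $host $gateway 1"
    done
}

# Example for server.domain; the MAC address is a made-up placeholder.
gen_proxy_cmds 08:00:69:00:00:01 client.domain S1.domain S2.domain
```

Once the printed commands look right, the output can be piped to sh. The same function covers the other end of the link by swapping the gateway and the host list.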
Configuration of client.domain
On host client.domain, create an /etc/init.d/network.local script (with a symlink in /etc/rc2.d to run after S30network) that contains the following commands:
    arp -s wan-router.domain `netstat -ian | grep :` pub
    arp -s server.domain `netstat -ian | grep :` pub
    arp -s W1.domain `netstat -ian | grep :` pub
    arp -s W2.domain `netstat -ian | grep :` pub
    arp -s W3.domain `netstat -ian | grep :` pub
    route add host wan-router.domain server.domain 1
    route add host W1.domain server.domain 1
    route add host W2.domain server.domain 1
    route add host W3.domain server.domain 1

Again, add arp and route commands for each additional host that needs to be accessed across the SLIP or PPP link. If you are running IRIX-4.x, then you will need to add a default route as well (slip and ppp in IRIX-5.1 and later can do this automatically; see the respective man pages or my docs).
If you also start slip or ppp from /etc/init.d/network.local on the client (dialing side), start it before you run the arp and route commands. You will probably need a sleep after starting ppp and before the arp commands; experience shows that at least 20 seconds (sleep 20) is needed for ppp to initialize the interface.
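
As an alternative to a fixed sleep, the script could poll until the link answers, with an upper bound on the wait. This is a sketch under assumptions, not something tested at this site: wait_for and link_is_up are hypothetical names, and the ping-based check simply reuses the "can you ping the other end" test from the introduction.

```sh
#!/bin/sh
# wait_for TRIES CMD [ARG...]   (hypothetical helper name)
#   Run CMD about once per second until it succeeds or TRIES attempts
#   have been used. Returns 0 on success, 1 on timeout.
wait_for() {
    tries=$1
    shift
    while [ "$tries" -gt 0 ]; do
        if "$@" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        # Do not sleep after the last failed attempt.
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}

# Hypothetical usage, left commented out so this block has no side effects:
# link_is_up() { ping -c 1 server.domain; }
# wait_for 60 link_is_up || echo "link did not come up" >&2
```

The arp and route commands would then run only after wait_for returns success, so a slow dial-up delays them rather than sending the routes to the wrong interface.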
Also, running a routing daemon like routed or gated will screw up the routing we are doing by hand, so disable them:
    chkconfig routed off
    chkconfig gated off

Configuration of Other Hosts at S
On each host at site S (S1.domain, S2.domain, etc), create a file /etc/init.d/network.local with a symlink in /etc/rc2.d such that it will start after the S30network script, and include the following commands:
route add net default client.domain 1

This creates a default route pointing at client.domain. Putting it in /etc/gateways (and running routed) should work as an alternative to creating an /etc/init.d/network.local script (though I haven't tested it in this case).

Troubleshooting Problems

One of the configuration errors at site W was that some other host (not server.domain) had run the arp commands intended for server.domain, thus causing wan-router.domain to get the wrong MAC address for the hosts at site S. Once those entries were deleted and the ARP timeout was dropped to something reasonable, I was able to clean things up and get it working.
Another error that I have seen is that server.domain was using the wrong MAC address in its Proxy ARP command. I have no idea where it came from. Once the MAC address is corrected, you have to wait for up to the ARP cache timeout for things to start working again. The default ARP cache timeout on a Cisco router is 4 hours, which is absurdly long. Reducing it to 20 minutes (the default in IRIX) or to 10 minutes helps a lot when fixing problems.
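
When checking a published MAC address against the real one by eye, it helps to put both in the same form first. The sketch below assumes that the two addresses may differ only in letter case and in dropped leading zeros (for example 8:0:69:2:a:b versus 08:00:69:02:0A:0B); norm_mac and same_mac are hypothetical helper names, and the addresses are made up.

```sh
#!/bin/sh
# norm_mac MAC   (hypothetical helper name)
#   Print MAC in lower case with every octet padded to two hex digits.
norm_mac() {
    echo "$1" | awk -F: '{
        out = ""
        for (i = 1; i <= NF; i++) {
            octet = tolower($i)
            if (length(octet) == 1) octet = "0" octet
            out = out (i > 1 ? ":" : "") octet
        }
        print out
    }'
}

# same_mac A B: succeed when both addresses name the same hardware address.
same_mac() {
    [ "$(norm_mac "$1")" = "$(norm_mac "$2")" ]
}

# Example with made-up addresses; prints "match".
same_mac 8:0:69:2:a:b 08:00:69:02:0A:0B && echo "match"
```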
If incorrect routes have been advertised (via routed, etc.), then after correcting them you usually have to wait out the 3 minute route timeout before things start working again.
Another problem that came up at this site was a power hit that dropped the machines. The /etc/init.d/network.local scripts ran fine, but the routes were trying to use ec0 interfaces instead of sl0. This was because the SLIP interface was not up when the route add commands were done. Part of the problem was that some enterprising engineers had manually added some of their own routes, so debugging was not straightforward.
The best solution would be to set up a quiet SLIP link on IRIX 5.2, but this was not an option here. We brought the SLIP link back up, then manually removed the routes and re-added them so that they went through the appropriate interfaces. To make it much easier to recover if this happens again, the route add commands were removed from /etc/init.d/network.local and put into a separate, manually run script on the client at site S. That script does a killall slip, removes all the routes that used to be added in /etc/init.d/network.local, kicks off SLIP, re-adds those routes, then removes and re-adds the routes on the SLIP server with rsh server.domain. This makes sure that all the routes are clean, and lets the end initiating the SLIP connection straighten out the link AND the routes with one command.
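
The order of operations in that recovery script can be sketched as follows. This is not the script used at the site: it is a dry run that only prints the steps, recovery_plan is a hypothetical name, the route delete lines assume that delete takes the same destination and gateway arguments as the add commands above, and the command that starts SLIP is left as a placeholder because it depends on the local setup.

```sh
#!/bin/sh
# Dry-run sketch of the recovery sequence: prints each step, runs nothing.

GATEWAY=server.domain
REMOTE_HOSTS="wan-router.domain W1.domain W2.domain W3.domain"
LOCAL_HOSTS="S1.domain S2.domain"

recovery_plan() {
    # 1. Stop SLIP so the link starts from a known state.
    echo "killall slip"

    # 2. Remove the host routes that point across the link.
    for h in $REMOTE_HOSTS; do
        echo "route delete host $h $GATEWAY"
    done

    # 3. Placeholder: the command that starts SLIP is site-specific.
    echo "# start SLIP here and wait for the interface to come up"

    # 4. Re-add the routes now that the SLIP interface exists.
    for h in $REMOTE_HOSTS; do
        echo "route add host $h $GATEWAY 1"
    done

    # 5. Clean up the matching routes on the SLIP server.
    for h in $LOCAL_HOSTS; do
        echo "rsh $GATEWAY route delete host $h client.domain"
        echo "rsh $GATEWAY route add host $h client.domain 1"
    done
}

recovery_plan
```

Keeping the host lists in variables means a new host at either site is added in one place, and the printed plan can be reviewed before any route is touched.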

http://reality.sgi.com/employees/scotth/ Scott Henry <scotth@sgi.com>

Last modified: Fri Mar 12 13:49:25 1999