Normally multiple routes are handled by routers. However you may not have access to the router tables for administrative reasons or because someone wants to protect their turf (they don't want someone not in their department poisoning their router tables). Here we describe setting up multiple routes and how they can be used in an LVS.
If realservers are supplying services through two directors, then the realservers need two default routes (one through each director). This is allowed by the TCP/IP RFCs but rarely implemented. You cannot add a 2nd default route with ip route add; you'll get an error saying that the route already exists. Instead you use the command ip route append. This was worked out by Posko (Malalon).
posko P (dot) Osko (at) elka (dot) pw (dot) edu (dot) pl 17 Apr 2002
I used ip append at home when I was testing source address routing for RealServers. But when I started working with my setup I found that I can't set up two default routes for different addresses in one routing table (in Linux it's by default table 'main' where all normal routes are stored) because only one default route works at the same time (the first added to route table). So I decided to create separate route tables (named 201 and 202 in my setup) containing default route for each alias using the following command:
ip route add 0/0 src 192.168.1.2 via 192.168.1.1 table 201
and route packets with source address from 192.168.1.2 according to this table (201)
ip rule add from 192.168.1.2/25 table 201 prio 220
Here are the details from Pawel Osko, Warsaw University of Technology, Faculty of Electronics and Information Technology.
You can create two (or more) LVS-NAT directors using the Policy Routing. The simplest setup is one RS working with two DIRs:
        --------
       | client |
        --------
           |
           |
       _________
      |         |
     DIR1     DIR2
      |         |
       ---------
           |
           RS
The first step is to create working setup with one DIR and one RS. In my setup I'm using one NIC two Networks LVS-NAT. Example (my) setup: (You can use the Configure Script to set it up.)
    DIR1:
    VIP=A.B.C.70
    DIRECTOR_VIP_DEVICE=eth0:110
    DIRECTOR_INSIDEIP=192.168.1.1
    DIRECTOR_DEFAULT_GW=A.B.C.126

    RS:
    RIP=192.168.1.2
    GW=192.168.1.1
Now test it. If everything is ok, set the second DIR, and change settings on RS:
    DIR2:
    VIP=A.B.C.71
    DIRECTOR_VIP_DEVICE=eth0:110
    DIRECTOR_INSIDEIP=192.168.2.1
    DIRECTOR_DEFAULT_GW=A.B.C.126

    RS:
    RIP=192.168.2.2
    GW=192.168.2.1
and test it.
Now you know for sure that your DIRs are set up properly and your RS can work with both of them.
Step 2.
Keep directors working. Delete addresses on network interface on RS (using `ip addr del` command for example). Add two addresses to NIC (I'm using eth0):
    ip addr add 192.168.1.2/25 broadcast 192.168.1.127 dev eth0 label eth0:1
    ip addr add 192.168.2.2/25 broadcast 192.168.2.127 dev eth0 label eth0:2
Check if everything is ok:
    ip addr show
Each of the addresses will work with a different DIR. Now you must make packets from eth0:1 go to DIR1 and packets from eth0:2 go to DIR2. Routing on the source address will be used to do this.
Create rules for each IP:
    ip rule add from 192.168.1.2/25 table 201 prio 220
    ip rule add from 192.168.2.2/25 table 202 prio 220
where 201 and 202 are names of tables.
Add default routes for each IP:
    ip route add 0/0 src 192.168.1.2 via 192.168.1.1 table 201
    ip route add 0/0 src 192.168.2.2 via 192.168.2.1 table 202
You are done! Now all packets from 192.168.1.2 will go through DIR1 and packets from 192.168.2.2 through DIR2.
New RSs can be added now; simply follow the instructions in Step 2 for the new IPs. You can also have more DIRs; just add more IPs on the RS. I set up LVS-NAT with four DIRs working with four RSs, using mon to detect RS failures, and everything works perfectly (at last!).
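When there are more than a couple of directors, the per-address rule and route commands from Step 2 become repetitive. The following is a minimal sketch, not part of Pawel's setup, that only prints the commands so they can be reviewed first. The function names `rs_policy_cmds` and `rs_policy_all` are made up for this illustration; the addresses, prefix length, table numbers and priority are the ones used in the example above.

```shell
#!/bin/sh
# Print (do not run) the policy-routing commands for one RIP/director pair.
# Arguments: RIP, prefix length, director inside IP, routing table number.
# (rs_policy_cmds is a hypothetical helper name, not an iproute2 command.)
rs_policy_cmds() {
    rip=$1; plen=$2; gw=$3; table=$4
    echo "ip rule add from $rip/$plen table $table prio 220"
    echo "ip route add 0/0 src $rip via $gw table $table"
}

# The two directors from the example above; add one line per extra director.
rs_policy_all() {
    rs_policy_cmds 192.168.1.2 25 192.168.1.1 201
    rs_policy_cmds 192.168.2.2 25 192.168.2.1 202
}

rs_policy_all
```

Once the printed commands look right, they can be run as root on the realserver, for instance by piping the output to sh.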
This is not an LVS problem, just a normal routing problem. You can have multiple default gateways in Linux. The problem is knowing when one of them has died.
The "Connected" site has a discussion of dead gateway detection (http://www.freesoft.org/CIE/RFC/1122/56.htm, site gone 14 Sep 2004) derived from the RFCs. The points raised are
In case you're wondering, what they're really saying is that dead gateway detection was not built into the protocol and no satisfactory solution for its absence has been found.
Ratz 22 Jan 2006
According to RFC816 and RFC1122 there are multiple ways to perform DGD, however I've only seen about 3 of those in the wild:
The Alteon switch does media detection and could also listen to special L2 PDU packets, including advertisements. Media detection under Linux is an often discussed and to date not resolved issue. For about 2 months starting last November, a couple of people on netdev have been working on proper link state propagation in the core kernel, the result will be seen in 2.6.17 ;). Other than that I suggest you use non-cheap but excellently supported NICs, like e1000 and check the media state using ethtool or write a netlink listener.
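As a rough illustration of checking the media state with ethtool, the sketch below tests for the "Link detected: yes" line that ethtool prints for an interface. The function name `link_detected` and the interface name eth0 are assumptions made for the example; this is polling, not the netlink listener Ratz mentions, and it only reports the link state of the local NIC, not whether the gateway beyond it is alive.

```shell
#!/bin/sh
# Succeed (exit 0) if the ethtool output on stdin reports an active link.
# (link_detected is a hypothetical helper name.)
link_detected() {
    grep -q 'Link detected: yes'
}

# Assumed usage on a director, with eth0 as an example interface name:
#   ethtool eth0 | link_detected || echo "eth0: no link"
```

A failover demon could call such a check periodically and treat a lost link as one input to its decision, alongside whatever gateway checks it already does.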
You are allowed to ping, but only if nothing else works for you (3.3.1.4):
Multiple routes to the internet is discussed in Routing for multiple uplinks/providers (http://lartc.org/howto/lartc.rpdb.multiple-links.html) and Multiple Connections to the Internet (http://linux-ip.net/html/adv-multi-internet.html). Julian (immediately below) has a dead gateway detection mechanism and a working setup with dead gateway detection is shown at Nano-Howto to use more than one independent Internet connection. by Christoph Simon (http://www.ssi.bg/~ja/nano.txt). The author warns
The setup of all this is not a question of 5 minutes
Logu lvslog (at) yahoo (dot) com 5 Oct
I have two isdn internet connections from two different isps. I am going to put an lvs_nat between the users and these two links so as to load balance the bandwidth.
Julian
You can use the Linux's multipath feature:
    # ip ru
    0:      from all lookup local
    50:     from all lookup main
    ...
    100:    from 192.168.0.0/24 lookup 100
    200:    from all lookup 200
    32766:  from all lookup main
    32767:  from all lookup 253

    # ip r l t 100
    default src DUMMY_IP
            nexthop via ISP1 dev DEV1 weight 1
            nexthop via ISP2 dev DEV2 weight 1

    # ip r l t 200
    default via ISP1 dev DEV1 src MY_IP1
    default via ISP2 dev DEV2 src MY_IP2
You can add my dead gateway detection extension (for now only against 2.2)
This way you will be able to fully utilize both lines for masquerading. Without this patch you will not be able to select a different public IP for each ISP. They are named "Alternative routes". Of course, in any case the management is not an easy task. It needs understanding.
anon
I currently have multiple adsl modems that connect to the internet.
Alexandre Cassen alexandre (dot) cassen (at) wanadoo (dot) fr 11 Apr 2003
This is a routing design problem, commonly solved by load balancing the default route at the routing level (netlink). You add 2 default gateways with the same weight to provide outbound load balancing. Since current Linux kernel routing suffers from a lack of dead gateway detection, you will need to apply Julian's "dead gateway detection" patch.
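A multipath default route with two equal-weight gateways can be expressed as a single ip route command with two nexthop clauses. The sketch below only prints the command. The function name `multipath_default_cmd` is made up for this illustration, and the gateway addresses and device names are placeholders, not values from Alexandre's posting.

```shell
#!/bin/sh
# Print (do not run) a multipath default route with two equal-weight gateways.
# Arguments: gateway 1, device 1, gateway 2, device 2.
# (multipath_default_cmd is a hypothetical helper name.)
multipath_default_cmd() {
    echo "ip route add default nexthop via $1 dev $2 weight 1 nexthop via $3 dev $4 weight 1"
}

# Placeholder gateways and devices, for illustration only.
multipath_default_cmd 192.0.2.1 ppp0 198.51.100.1 ppp1
```

Without dead gateway detection, the kernel may keep choosing a nexthop whose gateway has died, which is why the patch is recommended above.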
Here I show how to use dynamic routing to handle routing following failure of the link from a director to its default gw. The director with the failed default route gets its new routing information from the adjacent director, which is assumed to have a functional route to the outside world.
After I got this to work, I found out that you don't do dynamic routing if the interfaces on two machines are in the same networks as shown here, as happens with duplicate directors (or routers).
     -------network_A---------
        |               |
      host_1          host_2
        |               |
     -------network_B---------
In the case of common networks, alternate routes are (usually) handled by multiple static routes with different weights e.g. Routing for multiple uplinks/providers in the Linux Advanced Routing and Traffic Control HOWTO. This section then is not exactly central to LVS failover and unless you have some other reason to read about dynamic routing, you may want to skip this section. This was my first attempt at dynamic routing. Even if you use dynamic routing, I won't be surprised if there are better ways of doing it. Suggestions welcome.
You use dynamic routing only if the hosts are connected to non-common networks, as here, where host_1 is not connected to network_C, while host_2 (which is connected to and can communicate with host_1) is.
     ----network_A-----
             |
           host_1
             |
     ----network_B-----
             |
           host_2
             |
     ----network_C----
Dynamic routing would be used by host_2 to send information about network_C to host_1 (etc).
I had previously been handling routing failure with scripts. Script driven failover (where, as well, you have to reconfigure demons to listen to the moveable IP and the router has to think that it has a new name) requires the scripts to run in pairs (to_up on one machine, and to_down on the other). The scripts have to be synchronized and have to run to completion on both machines. If one machine becomes deranged and loses track of its state, then the scripts won't fail over cleanly. You should be able to down/crash/wedge any single NIC/route/disk/demon in a failover router pair without losing routing, no matter what. I found that my scripts would often result in some hung state. Perhaps better scripts would have handled it, but this would indicate that functional scripts are difficult to write.
I was looking for other ways of handling routing failure, when John Reuning posted on the mailing list that he was using zebra. I had not managed to even figure out how to setup the .conf file last time I tried (several years ago) as I found the docs inscrutable (some sections were blank). Here's the posting from John Reuning, which showed me how simple it was to configure zebra and which started me off with dynamic routing.
John Reuning john (at) metalab (dot) unc (dot) edu 17 Feb 2004
I've included the .conf files below. I didn't do anything crazy coming up with this stuff. There were sample config files in the source code, and I just copied what I needed.
To make snmp work, these need to go in the snmpd config:
    smuxpeer 1.3.6.1.4.1.3317.1.2.1 zebra
    smuxpeer 1.3.6.1.4.1.3317.1.2.2 zebra_bgpd

The one quirk I remember is that one of the daemons needs to start before the other. If zebra isn't running when bgpd starts up, it freaks out.
bgpd.conf
    ! bgpd.conf
    !
    hostname bgpd
    password xxxxxx
    enable password xxxxxx
    log syslog
    !log stdout
    !log file bgpd.log
    smux peer .1.3.6.1.4.1.3317.1.2.2 zebra_bgpd
    !
    !bgp multiple-instance
    !
    router bgp 2
     bgp router-id 192.168.1.254
     neighbor 10.0.1.1 remote-as 2
     neighbor 192.168.2.1 remote-as 2
     neighbor 192.168.2.1 route-reflector-client
     neighbor 192.168.2.2 remote-as 2
     neighbor 192.168.2.2 route-reflector-client
    !
    line vty
    !

zebra.conf

    ! zebra.conf
    !
    hostname director
    password xxxxxx
    enable password xxxxxx
    log syslog
    !log stdout
    !log file zebra.log
    !
    smux peer .1.3.6.1.4.1.3317.1.2.1 zebra
    !
I thought it would be better to handle the failover with hardened and well tested demon(s) running on each machine, that maintain communication, and know what to do when one machine is in an arbitrary fault state. These demons then would run the minimum depth of the more fragile, dependent scripts.
Zebra is a GPL package containing the common dynamic routing demons (ripd, bgpd, ospfd). Zebra runs on many platforms and uses a command syntax close to that of cisco IOS (i.e. you can use the cisco documentation if you're stuck). Useful documentation I found
As with most computer documentation, you already have to understand the topic in order to be able to read it. Much documentation about dynamic routing concerns the differences between RIP, BGP, OSPF, and goes into details about convergence, horizons... You don't need any of that right now. All you need to know is that these 3 protocols move routing information from one machine to another and that the syntax of the commands for them is much the same. For moving routing information within an AS, you use rip (the original protocol) or ospf (the newer protocol). For moving routing information between different ASs, you need bgp (I think).
To the LVS client, as far as routing is concerned, an LVS appears to be a single leaf node. For an LVS with one director, all routing is to the director and the LVS really is a single leaf node. When multiple directors are involved, and the VIP hops between directors on failover, the inbound routing can be handled at the arp level (the director uses send-arp to update the location of the VIP). For outbound routing (i.e. packets from the VIP on the director to 0/0), dynamic routing protocols can be used. One place that dynamic routing could be used in an LVS, is following failure of the link to 0/0, a director does a failover and no longer having a route to 0/0, has to route packets through the other director (see diagram below).
Note: I wanted the setup to run a router failover pair. If you are using this to maintain outbound routing for an LVS, you will only need this for LVS-NAT. For LVS-DR and LVS-Tun, for security, there should be no route from the VIP on the director to 0/0 - see default gw for director with LVS-DR/LVS-Tun.
Normally with dynamic routing, the routers (here, the two directors) are in contact with upstream routers (running a dynamic routing protocol), who feed routing information to them. The link state of the network (up||down) can be inferred from the presence (or absence) of the routing advertisements. With routing advertisements exchanged at 30-60 sec intervals, it will take ripd about 3 mins to timeout a dead link. bgp is a little faster and takes about a minute to timeout a dead link.
In the general case, you may not be able to get dynamic routing information from upstream. Some organisations are big and inflexible, there may be turf battles, and the IT department will worry about getting bogus advertisements from you.
Where I live, network link failure (or routing failure, which may appear as a link failure) is the most common problem when maintaining service.
Note: Other problems, e.g. power failures, occur more often, but these can be handled by UPSs; disk failures, which you have to plan for, are handled by disk monitoring tools and pre-emptive replacement of working disks as they approach their warranty expiration.
Assuming that the two routers (directors) are both functional, then failover after a routing/link failure has to handle two problems
detection of link/routing failure
A setup is needed that works without link (or routing) information from upstream machines. In the absence of packets from an upstream machine, link (or routing) failure detection is difficult. I will assume that this is being handled by the failover demon (keepalived/vrrpd or Linux-HA).
reconfiguring the default gw
The director with a failed route to the outside, has to route via the other director.
Here's some info about the differences between routing via tables (i.e. how you set up a leaf node) and routing with dynamic routing protocols (i.e. on a router)
leaf nodes: automatically route to networks on interfaces. All other packets are sent to a default gw. The machine's view of the network is fixed and it knows that it is at the edge of the internet.
routers: advertise networks and IPs. Other routers pick up the messages and figure out the routing. All routers see themselves in the middle of the internet, with no idea where they are in it, or how big the internet is (the Ptolemaic view of the network). In particular, RIP and OSPF routers don't know about edges of ASs or the existence of other ASs. You don't explicitly set routes; rather you list the IPs of the neighbors and then let the routing demons figure out the topology. Except for border routers, the other routers (running RIP or OSPF) don't know about an AS.
Note: What you need to have your own AS depends on your clout and size in the networking world. If you're a big governmental agency with offices throughout the country, have thousands of networked computers, route all your intra-agency's packets through clouds (leased lines, where packets don't go onto the internet) and route all your packets to the internet over multiple redundant links, via local ISPs at each site, then you'll have your own AS. Big ISPs will have an AS. If you're a small organisation, you'll have static links to your provider and you won't have an AS. Small dial-up companies with just a few machines handling the traffic have static links and don't even get routing advertisements from their ISPs. Businesses usually are dealing with computers or networks, but not both. If you're in a business that uses computers (e.g. you're an applications programmer), then you won't have an AS. If you even ask the question "what do I need to have an AS?", then you aren't in the network business and you won't have one.
An AS is connected through border routers (usually two or more for redundancy) to an ISP which is connected to the internet backbone. The border routers act as a default gw for the routers inside the AS (and do so by the instruction "default route originate" in their .conf file) and appear as a "route of last resort" in the routing tables of the inside machines.
If you want your own (private) AS, then use the private AS numbers 64512-65535 (the AS equivalent of 192.168.x.x IPs). Advertisements for these ASs are not propagated.
After convergence, the routers within an AS will know the routes in the AS and will know which machines to use as their default gw (gateway of last resort).
Here's the setup for the demonstration with two routers (directors), working as an active/backup pair, running a dynamic routing protocol.
Note: dummy0 is configured with an IP in the Cimafranca paper, partly for their demonstration. This IP allows you to ping the node from the outside, as long as at least one hardware NIC is up. Supposedly this IP is a convenience to be able to identify a host (although I didn't have any need for it). dummy0 is chosen as it is the interface least likely to go down. cisco routers use lo for this IP, but apparently the convention with Linux is to use dummy0. The IPs on each dummy0 interface are in different networks. If they are in the same network, you can't route to the IP on dummy0 on adjacent machines.
Here is the network during normal functioning
                     ______________         ______________
                    |              |       |              |
                    |    router    |       |    router    |
                    |______________|       |______________|
                     192.168.1.253          192.168.1.254
                           |                      |
                           |                      |
                           |                      |
                           |                      |
                           |                      |
                      192.168.1.1            192.168.1.2
                     ______________         ______________
                    |              |       |              |
 10.0.1.1/24=dummy0-|    backup    |       |    active    |-dummy0=10.0.2.1/24
                    |______________|       |______________|
                           |                      |
                      192.168.2.1            192.168.2.2
                           |                      |
                           ------------------------
                                      |
                                 realservers
Here is the network immediately following link loss to the backup director's default gw.
                     ______________         ______________
                    |              |       |              |
                    |    router    |       |    router    |
                    |______________|       |______________|
                     192.168.1.253          192.168.1.254
                           |                      |
                           X                      |
                           |                      |
                           |                      |
                      192.168.1.1            192.168.1.2
                     ______________         ______________
                    |              |       |              |
 10.0.1.1/24=dummy0-|    backup    |       |    active    |-dummy0=10.0.2.1/24
                    |______________|       |______________|
                           |                      |
                      192.168.2.1            192.168.2.2
                           |                      |
                           ------------------------
                                      |
                                 realservers
Here is the network after re-establishing the default route for the backup master node. This takes about 25 secs with RIP.
                     ______________         ______________
                    |              |       |              |
                    |    router    |       |    router    |
                    |______________|       |______________|
                     192.168.1.253          192.168.1.254
                                                  |
                                                  |
                                                  |
                                                  |
                           -----------------------|
                           |                      |
                      192.168.1.1            192.168.1.2
                     ______________         ______________
                    |              |       |              |
 10.0.1.1/24=dummy0-|    backup    |       |    active    |-dummy0=10.0.2.1/24
                    |______________|       |______________|
                           |                      |
                      192.168.2.1            192.168.2.2
                           |                      |
                           ------------------------
                                      |
                                 realservers
Note: You must install the iproute2 tools for zebra to work, and the CLI commands must be policy routing commands. There are two series of network tools available with Linux: the older ifconfig and route commands, and the iproute2 tools (the ip command). The routes/IPs added by rip/zebra are added by the iproute2 tools. The two series of commands are incompatible. IPs (or routes) added by iproute2 may not be visible to ifconfig (or route). Routes added by ip route add may be visible to route but cannot be deleted by route. All IP and route commands from the command line must use the iproute2 tools.
If you like names rather than port numbers, add these to /etc/services
    #zebra
    zebrasrv        2600/tcp        # zebra service
    zebra           2601/tcp        # zebra vty
    ripd            2602/tcp        # RIPd vty
    ripngd          2603/tcp        # RIPngd vty
    ospfd           2604/tcp        # OSPFd vty
    bgpd            2605/tcp        # BGPd vty
    ospf6d          2606/tcp        # OSPF6d vty
    #
zebra.conf
    !
    ! Zebra configuration saved from vty
    !   2004/02/19 17:48:27
    !
    ! hostname given at zebra prompt and passwd
    hostname zebra
    password zebra
    !
    ! enable "enable" command and give passwd for it.
    enable password zebra
    !
    ! log to a file
    log file /var/log/zebra.log
    ! alternatively, log to a facility
    !log syslog
    !log stdout
    !
    ! turn on vtysh access
    line vty
    !
    ! the interfaces you want Zebra to know about
    ! (tell zebra about all of them)
    interface lo
    !
    interface dummy0
    !
    interface tunl0
    !
    interface eth0
    !
    interface eth1
    !---------------------------------
here's my zebra init script, ripd init script, bgpd init script, ospfd init script.
Now use the zebra shell (vtysh or telnet localhost zebra) to install an IP on dummy0 (following the instructions of Cimafranca and Young).
Note: You can add the IP for dummy0 into zebra.conf with an editor instead. You could also add the IP on bootup, but by adding the information to the .conf file, the IP will only be present after you start up zebra.
    director:/etc/zebra# /etc/rc.d/rc.zebra start
    director:/etc/zebra# telnet localhost zebra
    Trying 127.0.0.1...
    Connected to localhost.
    Escape character is '^]'.

    Hello, this is zebra (version 0.94).
    Copyright 1996-2002 Kunihiro Ishiguro.

    User Access Verification

    Password:
    zebra> enable
    Password:
    zebra# configure terminal
    zebra(config)# interface dummy0
    zebra(config-if)# ip address 10.0.1.1/24
    zebra(config-if)# quit
    zebra(config)# write
    Configuration saved to /etc/zebra/zebra.conf
    zebra(config)# end
    zebra# show run

    Current configuration:
    !
    hostname zebra
    password zebra
    enable password zebra
    log file /var/log/zebra.log
    !
    interface lo
    !
    interface dummy0
     ip address 10.0.1.1/24
    !
    interface tunl0
    !
    interface eth0
    !
    interface eth1
    !
    !
    line vty
    !
    end
    zebra# quit
    Connection closed by foreign host.

    director:/etc/zebra# cat zebra.conf
    !
    ! Zebra configuration saved from vty
    !   2004/02/24 17:51:02
    !
    hostname zebra
    password zebra
    enable password zebra
    log file /var/log/zebra.log
    !
    interface lo
    !
    interface dummy0
     ip address 10.0.1.1/24
    !
    interface tunl0
    !
    interface eth0
    !
    interface eth1
    !
    !
    line vty
    !
Next time you start up zebra, the new zebra.conf script will add the IP to dummy0 and the src route (as if you'd run ip addr add 10.0.1.1/24 dev dummy0 brd + from the command line).
Start up zebra on the second director and add an IP to dummy0 there (you can copy the zebra.conf file here to the other director and change the IP for dummy0).
Now you're going to start ripd. Here's my ripd.conf
    !
    ! Zebra configuration saved from vty
    !   2004/03/01 14:38:03
    !
    hostname ripd
    password zebra
    enable password zebra
    log file /var/log/ripd.log
    !
    interface lo
    !
    interface dummy0
    !
    interface tunl0
    !
    interface eth0
    !
    interface eth1
    !
    router rip
     network eth0
     network eth1
    !
    line vty
    !
Here I add networks to the conf file from the zebra interface (you could use an editor on the conf file too).
    director:/etc/zebra# telnet 0 ripd
    Trying 0.0.0.0...
    Connected to 0.
    Escape character is '^]'.

    Hello, this is zebra (version 0.94).
    Copyright 1996-2002 Kunihiro Ishiguro.

    User Access Verification

    Password:
    ripd> enable
    Password:
    ripd# configure terminal
    ripd(config)# router rip
    ripd(config-router)# network 10.0.1.0/24
    ripd(config-router)# network 192.168.1.0/24
    ripd(config-router)# write
    Configuration saved to /etc/zebra/ripd.conf
    ripd(config-router)# show run
    ripd(config-router)# show run

    Current configuration:
    !
    hostname ripd
    password zebra
    enable password zebra
    log file /var/log/ripd.log
    !
    interface lo
    !
    interface dummy0
    !
    interface tunl0
    !
    interface eth0
    !
    interface eth1
    !
    router rip
     network 10.0.1.0/24
     network 192.168.1.0/24
     network eth0
     network eth1
    !
    line vty
    !
    end
    ripd(config-router)# quit
    ripd(config)# exit
    ripd# exit
    Connection closed by foreign host.
    director:/etc/zebra#
Here's the ripd.conf I used for the demo.
    ! -*- rip -*-
    !
    ! RIPd sample configuration file
    !
    ! $Id: ripd.conf.sample,v 1.11 1999/02/19 17:28:42 developer Exp $
    !
    hostname ripd
    password zebra
    enable password zebra
    !
    ! debug rip events
    ! debug rip packet
    !
    router rip
     network 0.0.0.0/0
     network 192.168.1.0/24
     network 192.168.2.0/24
     network eth0
     network eth1
     redistribute kernel
    !
    !default-information originate
    !
    log file /var/log/ripd.log
Make sure both routers have default routes.
    backup:/etc/zebra: ip route add default via 192.168.1.253
    active:/etc/zebra: ip route add default via 192.168.1.254
Activate debugging in zebra (so you will see notices of rip updates on the screen) and then show the routes
    backup:/etc/zebra# telnet 0 zebra
    Trying 0.0.0.0...
    Connected to 0.
    Escape character is '^]'.

    Hello, this is zebra (version 0.94).
    Copyright 1996-2002 Kunihiro Ishiguro.

    User Access Verification

    Password:
    zebra> enable
    Password:
    zebra# debug zebra packet
    zebra# show ip route
    Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
           B - BGP, > - selected route, * - FIB route

    K>* 0.0.0.0/0 via 192.168.1.253, eth1
    R>* 10.0.1.0/24 [120/2] via 192.168.2.1, eth0, 00:07:44
    C>* 10.0.2.0/24 is directly connected, dummy0
    K * 127.0.0.0/8 is directly connected, lo
    C>* 127.0.0.0/8 is directly connected, lo
    K * 192.168.1.0/24 is directly connected, eth1
    C>* 192.168.1.0/24 is directly connected, eth1
    K * 192.168.2.0/24 is directly connected, eth0
    C>* 192.168.2.0/24 is directly connected, eth0
The output shows that the backup router has a default route added by the kernel (at the CLI above) and a route to 10.0.1.0 added by RIP, which enables routing to 10.0.1.1 on the other machine. (A similar view will be seen by running ip route show at the CLI.) The [120/2] indicates the administrative weight of the route [120] and the number of hops [2].
Then do the following in order -
    zebra# show ip route
    Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
           B - BGP, > - selected route, * - FIB route

    R>* 10.0.1.0/24 [120/2] via 192.168.2.1, eth0, 00:18:01
    C>* 10.0.2.0/24 is directly connected, dummy0
    K * 127.0.0.0/8 is directly connected, lo
    C>* 127.0.0.0/8 is directly connected, lo
    K * 192.168.1.0/24 is directly connected, eth1
    C>* 192.168.1.0/24 is directly connected, eth1
    K * 192.168.2.0/24 is directly connected, eth0
    C>* 192.168.2.0/24 is directly connected, eth0
    zebra# 2004/03/02 21:21:54 ZEBRA: zebra message received [ZEBRA_IPV4_ROUTE_ADD] 14
    zebra# show ip route
    Codes: K - kernel route, C - connected, S - static, R - RIP, O - OSPF,
           B - BGP, > - selected route, * - FIB route

    R>* 0.0.0.0/0 [120/2] via 192.168.1.254, eth1, 00:00:03
    R>* 10.0.1.0/24 [120/2] via 192.168.2.1, eth0, 00:18:31
    C>* 10.0.2.0/24 is directly connected, dummy0
    K * 127.0.0.0/8 is directly connected, lo
    C>* 127.0.0.0/8 is directly connected, lo
    K * 192.168.1.0/24 is directly connected, eth1
    C>* 192.168.1.0/24 is directly connected, eth1
    K * 192.168.2.0/24 is directly connected, eth0
    C>* 192.168.2.0/24 is directly connected, eth0
The new (x.x.x.254 rather than x.x.x.253) default gw is now installed and this time it's installed by RIP (rather than the kernel). Here's the view of the routing as shown from the CLI
    backup:/etc/zebra# ip route show
    10.0.1.0/24 via 192.168.2.1 dev eth0 proto zebra metric 2 equalize
    192.168.2.0/24 dev eth0 scope link
    192.168.2.0/24 dev eth0 proto kernel scope link src 192.168.2.253
    192.168.1.0/24 dev eth1 scope link
    192.168.1.0/24 dev eth1 proto kernel scope link src 192.168.1.253
    10.0.2.0/24 dev dummy0 proto kernel scope link src 10.0.2.253
    127.0.0.0/8 dev lo scope link
    default via 192.168.1.254 dev eth1 proto zebra metric 2 equalize
This time the default route is installed by zebra.
You can time the route failover: At 18:31 (min:sec since executing), the new route has been up for 00:03 seconds. The failover occurred at 18:01, showing that the new route took ((31-3)-1)=27 seconds to appear after failover.
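The timing uses the age of the 10.0.1.0/24 RIP route as a clock. A small sketch of the same arithmetic, using the three timestamps from the outputs above; the function names `to_secs` and `failover_delay` are made up for this illustration.

```shell
#!/bin/sh
# Convert an hh:mm:ss route age (as printed by "show ip route") to seconds.
to_secs() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

# Seconds between failover and the appearance of the new route.
# Arguments: age of the reference route at failover,
#            age of the reference route when the new route is seen,
#            age of the new route at that moment.
failover_delay() {
    echo $(( $(to_secs "$2") - $(to_secs "$3") - $(to_secs "$1") ))
}

failover_delay 00:18:01 00:18:31 00:00:03   # prints 27
```

The result is only as precise as the moment at which the first show ip route was run relative to the actual link failure.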
The new default gw is the other director's default gw. I had initially hoped that the new default gw would be an IP on the active director, and that ICMP redirects would handle re-routing to the active director's default gw. However this didn't work, although I thought it would for a while. Here's what happened.
If you activate the line
    default-information originate
in ripd.conf on just the active director, the active director, having a default route of its own, will advertise that it is a default route. If you then do the failover, the default route on the backup director will be an IP on the active director. (I thought I was home at this stage.) Since you want to do this symmetrically, you activate the same line in ripd.conf on the backup director. The problem (from talking to Steve Buchanan) is that the backup director, if it's been told to advertise that it is the default route, is not going to accept an advertisement from anyone else (like the active director) declaring that they are the default gw instead. After activating the option default-information originate, then on failure of the link, the backup master node will not accept the RIP update of a default route and will not show a default route.
With dynamic routing then, after failover, the default route for the backup router is the default route of the active router, and not an IP on the active router. Functionally these achieve the same result if there are no other problems with the routing on the backup router.
Patrick LeBoutillier patl (at) fusemail (dot) com 26 May 2004
Here is a "recipe" for creating LVS clusters with machines that support redundant networking.
Our production environment is fully redundant at the network level (each machine has two network interfaces, each connected to a different network). All machines are connected to both these networks and data can come from either network. On each machine, services run on a local network address and gated announces the route for these networks via both network interfaces. My task was to create an LVS cluster of 2 such machines (each a potential director and realserver as well).
The network setup:
    Network 1 is 192.168.10.0/24
    Network 2 is 192.168.11.0/24

    Machine 1:
    - eth0: 192.168.10.1
    - eth1: 192.168.11.1
    - local network on loopback (lo:real): 192.168.20.1/32

    Machine 2:
    - eth0: 192.168.10.2
    - eth1: 192.168.11.2
    - local network on loopback (lo:real): 192.168.21.1/32

    Virtual IP is 192.168.30.1
gated setup:
Have gated announce (and accept) the following routes:
Machine 1:
- announce 192.168.20.1/32
- accept routes from 192.168.10.2 and 192.168.11.2
Machine 2:
- announce 192.168.21.1/32
- accept routes from 192.168.10.1 and 192.168.11.1
These routes will be used by ldirectord to monitor the realservers.
Recipe
Install UltraMonkey as usual, but:
Make sure to configure ping nodes in both networks.
Note: A "ping node" is a pingable IP that is used by the heartbeat ipfail plugin, to determine if a director has lost network connectivity. The "ping node" terminology is defined at Getting Started with Linux-HA (heartbeat) (http://linux-ha.org/download/GettingStarted.html).
- Create the virtual IP alias as 192.168.30.1
- A virtual service definition in ldirectord.cf should look something like this:
    virtual=192.168.30.1:80
            real=192.168.20.1:80 gate
            real=192.168.21.1:80 gate
            service=http
            checkport=80
            request="/test.html"
            receive="test"
            scheduler=rr
            protocol=tcp
In a normal setup, heartbeat manages the virtual IP alias and brings it up on the active director. If I understand correctly, an arp request is then sent, making the other machines in the local network aware that the active director is now the machine to be reached for the virtual IP.
In this setup we will tell heartbeat to leave the virtual IP alias alone and have it tell gated to announce the route for the 192.168.30.1/32 network instead. Therefore ONLY the active director will announce the routes to reach the virtual IP network.
Change your haresources line to something like this:
node1.cluster.tld gated-toggle ldirectord
Place the following (or equivalent) code in a file called /etc/ha.d/resource.d/gated-toggle:
--------8<--------

    #!/bin/bash
    #
    # This gated control script should only be called by heartbeat!
    #
    # start: RESTART gated with the director config
    # stop:  RESTART gated with the original (non-director) config
    #

    # Source function library.
    . /etc/rc.d/init.d/functions

    # Source networking configuration.
    . /etc/sysconfig/network

    # Check that networking is up.
    [ "${NETWORKING}" = "no" ] && exit 0

    gdc=/usr/sbin/gdc
    gated=/usr/sbin/gated
    prog=gated

    if [ ! -f /etc/gated.conf -o ! -f $gdc ] ; then
        action $"Not starting $prog: " true
        exit 0
    fi

    PATH=$PATH:/usr/bin:/usr/sbin

    RETVAL=0

    start() {
        echo -n $"Starting $prog: "
        CFG=$1
        if [ "$CFG" != "" ] ; then
            # Uncomment every line that ends in "# heartbeat-toggle"
            # and write the result to the director config file.
            RES='$2$3'
            RE="s/^(\s*\#+)(.*)(\#\s*heartbeat-toggle\s*)$/$RES/"
            /usr/bin/perl -p -e "$RE" /etc/gated.conf > $CFG
            daemon $gated -f $CFG
        else
            daemon $gated
        fi
        RETVAL=$?
        [ $RETVAL -eq 0 ] && touch /var/lock/subsys/gated
        echo
        return $RETVAL
    }

    stop() {
        # Stop daemons.
        action $"Stopping $prog" $gdc stop
        RETVAL=$?
        if [ $RETVAL -eq 0 ] ; then
            rm -f /var/lock/subsys/gated
        fi
        return $RETVAL
    }

    # See how we were called.
    case "$1" in
        start)
            # Resource acquired: restart gated with the director config.
            stop
            start "/etc/gated-heartbeat.conf"
            ;;
        stop)
            # Resource lost: restart gated with the original config.
            stop
            start
            ;;
        *)
            echo $"Usage: $0 {start|stop}"
            exit 1
            ;;
    esac

    exit $RETVAL

-------->8--------
What this script does is:
On resource acquisition:
Copy the gated configuration file (/etc/gated.conf) to another file (/etc/gated-heartbeat.conf), activate the route for the virtual IP network and restart gated using the new file.
On resource loss:
Restart gated using the original configuration.
Note: gated must always be running and must start at boot time using the non-active (default) config.
Modify /etc/gated.conf accordingly. Here is the /etc/gated.conf file for machine 1:
--------8<--------

    options syslog upto debug;
    smux off;
    bgp off;
    egp off;
    ospf off;

    rip yes {
        interface all noripin noripout;
        interface eth0 ripin ripout version 2 multicast;
        interface eth1 ripin ripout version 2 multicast;
        trustedgateways 192.168.10.2 192.168.11.2 (...) # other routers in the network
        ;
    };

    static {
        192.168.20.1 masklen 32 interface 127.0.0.1 preference 0 retain;
        192.168.30.1 masklen 32 interface 127.0.0.1 preference 0 retain;
    };

    import proto rip {
        all;
    };

    # Export different routes depending on the operating mode (production/standby)
    export proto rip {
        proto static {
            host 192.168.20.1 metric 1;
    #       host 192.168.30.1 metric 1; # heartbeat-toggle
        };
    };

-------->8--------
The gated-toggle script will look for all lines ending with "# heartbeat-toggle" and turn them on (or off) depending on the cluster state.
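The effect of the substitution can be tried on a scrap of configuration without touching gated. This is a minimal sketch that uses sed -E in place of the perl one-liner from the script (a swap made here only so that the example needs nothing beyond a shell and sed); the function name toggle_on is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: read a gated config on stdin and uncomment every line
# that ends in "# heartbeat-toggle". Other lines pass through unchanged.
# Mirrors the perl substitution in gated-toggle: the leading run of '#' is
# dropped, the rest of the line (including the marker) is kept.
toggle_on() {
    sed -E 's/^[[:space:]]*#+(.*#[[:space:]]*heartbeat-toggle[[:space:]]*)$/\1/'
}

# Two lines taken from the export section of /etc/gated.conf above.
sample='host 192.168.20.1 metric 1;
# host 192.168.30.1 metric 1; # heartbeat-toggle'

printf '%s\n' "$sample" | toggle_on
```

The second line comes out with its leading "#" removed, so gated started with that file also exports the host route for the virtual IP. The marker comment stays at the end of the line, which does no harm since gated ignores it.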
I suspect you could do something similar with zebra or some other routing software, as long as you can restart it with a different config or (even better) change its config dynamically (maybe you can dynamically change the config for gated, but I'm not aware of this).
Shaun McCullagh shaun (dot) mccullagh (dot) marviq (dot) com 27 May 2004
I've encountered some flapping problems with Keepalived v1.1.1 (on RH Linux 7.3, kernel 2.4.18-5) when used with Cisco 2948 and C3548-XL switches. Both Master and Backup PCs use 3COM 905C NICs. As an experiment I tried ifconfig eth2 down on the Backup system to check that it recovered from a FAULT state. The system went into FAULT state as expected, but when I did ifconfig eth2 up, keepalived initially went to BACKUP state, then started oscillating between MASTER and BACKUP state.
I fixed the problem by increasing the advert_int to 35 seconds (on both Master and Backup system). The problem with this is that when Keepalived is started, the VIPs obviously take much longer to come up than if the advert_int is set to 5 seconds.
I'd be grateful for suggestions as to what to investigate, as I'd quite like to set the advert_int back to 5 seconds.
Graeme Fowler keepalived (at) graemef (dot) net 27 May 2004
Hard set your switch speed/duplex settings for those ports, and use "mii-tool" (assuming it will support your cards) to do the same at the server end. Cisco switches take up to 30 seconds to complete their autonegotiation - if they're hard set, they don't.
Kjetil Torgrim Homme kjetilho (at) ifi (dot) uio (dot) no 27 May 2004
It's not auto-negotiation which takes time, it's the spanning tree algorithm. It's required to wait for 30 seconds to discover loops in the topology (nodes will only announce their presence so often). You can turn this off with the configuration option spanning-tree portfast, if you're certain the port will never be used to connect to switches.
Graeme Fowler keepalived (at) graemef (dot) net 28 May 2004
Whoops! My mistake; indeed it is the spanning tree algorithm. I also ensure that I have "spanning-tree portfast" set on interfaces which I know will always be connected to hosts rather than switches (or in fact where I know that the port may connect to a switch which is not spanning tree capable).
One point of note though is that I have on occasion been bitten by interfaces which continually autonegotiate - whilst connectivity seems OK, the interface itself flaps wildly every few seconds. Hence the comments about hard-setting port speeds :)
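The numbers in this thread can be checked with a little arithmetic. The sketch below is one plausible reading of why the longer advert_int hid the problem, not something stated by the posters: a VRRP backup declares the master dead after 3 * advert_int plus a skew of (256 - priority)/256 seconds (the formula from the VRRP RFC), while a switch port without portfast stays blocked for about 30 seconds (the figure quoted above). The priority of 100 and the function name master_down_ms are assumptions for illustration.

```shell
#!/bin/sh
# master_down_ms ADVERT_INT PRIORITY
# Prints the VRRP master-down interval in milliseconds:
#   3 * advert_int + (256 - priority) / 256 seconds
# Integer arithmetic, so the skew is truncated to whole milliseconds.
master_down_ms() {
    echo $(( 3 * $1 * 1000 + (256 - $2) * 1000 / 256 ))
}

STP_DELAY_MS=30000   # about 30 s before the port forwards, as quoted in the thread
PRIORITY=100         # assumed backup priority, not given in the thread

for advert_int in 5 35; do
    ms=$(master_down_ms "$advert_int" "$PRIORITY")
    if [ "$ms" -lt "$STP_DELAY_MS" ]; then
        echo "advert_int=$advert_int: backup gives up on the master after $ms ms, before the port forwards"
    else
        echo "advert_int=$advert_int: backup waits $ms ms, longer than the port stays blocked"
    fi
done
```

With these assumptions the backup waits roughly 15.6 seconds at advert_int 5 and roughly 105.6 seconds at advert_int 35, so only the larger value outlasts the 30 second spanning tree delay. Setting spanning-tree portfast removes the delay itself, which should allow the shorter interval again.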