We rarely hear of anyone using this to make a director function as a normal realserver. However, more specialised roles have been found for localnode.
Note: 2008: plenty of people are using it now, particularly the two box LVS.
With localnode, the director machine can be a realserver too. This is convenient when only a small number of machines are available as servers.
To use localnode, you add a realserver with ipvsadm using the IP 127.0.0.1 (or any local IP on your director). You then set up the service to listen on the VIP on the director, so that when the service replies to the client, the src_addr of the reply packets is the VIP. The client is not connecting to a service on 127.0.0.1 (or a local IP on the director), despite ipvsadm installing a service with RIP=127.0.0.1.
Some services, e.g. telnet, listen on all IPs on the machine and you won't have to do anything special for them; they will already be listening on the VIP. Other services, e.g. http, sshd, have to be specifically configured to listen on each IP.
Note: Configuring the service to listen on an IP which is not the VIP is the most common mistake of people reporting problems with setting up LocalNode.
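As an illustration (using the example VIP 192.168.1.110 from the walkthrough later in this section, and assuming Apache and OpenSSH), the config lines that bind the demon to the VIP look something like this:

# httpd.conf (Apache) - listen on the VIP, not just a local IP
Listen 192.168.1.110:80

# /etc/ssh/sshd_config - likewise for sshd
ListenAddress 192.168.1.110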
LocalNode operates independently of the NAT, TUN or DR modules (i.e. you can have LocalNode running on a director that is forwarding packets to realservers by any of the forwarding methods).
Horms 04 Mar 2003
from memory, this is what is going to happen: The connection will come in for the VIP. LVS will pick this up and send it to the realserver (which happens to be a local address on the director, e.g. 192.168.0.1). As this address is a local IP address, the packet will be sent directly to the local port without any modification. That is, the destination IP address will still be the VIP, not 192.168.0.1. So I am guessing that an application that is only bound to 192.168.0.1 will not get this connection.
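A quick way to check which address a demon is bound to (a hypothetical check using stock netstat; port 80 assumed):

# 0.0.0.0:80 means "all addresses" (will receive VIP traffic);
# a specific RIP like 192.168.0.1:80 means it will not
netstat -tln | grep ':80 '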
We've only had the ability to have one service in LocalNode, till Horms made this proposal (below). Let us know if it works.
Horms 5 Jun 2007
If you want to use LVS to have two local services on the director, wouldn't an easy way be to bind the processes to 127.0.0.1 and 127.0.0.2 respectively and set them up as the real-servers in LVS?
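Taken literally, the proposal would look something like the sketch below (untested, as the thread says; whether packets destined for the VIP actually reach demons bound only to 127.0.0.1/127.0.0.2 is exactly the open question raised above - the example VIP and the LVS-NAT forwarding flag are assumptions):

# two copies of the demon, bound to 127.0.0.1:80 and 127.0.0.2:80 respectively,
# both added as localnode realservers for the same virtual service
ipvsadm -A -t 192.168.1.110:80 -s rr
ipvsadm -a -t 192.168.1.110:80 -r 127.0.0.1:80 -m
ipvsadm -a -t 192.168.1.110:80 -r 127.0.0.2:80 -m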
It's possible to have an LVS with full failover using just two boxes. The machine acting as director is also a realserver, using localnode. The second box is a normal realserver. The two boxes run failover code allowing them to swap roles as directors. The two box LVS is the minimal setup with both director and realserver functions protected by failover.
An example two box LVS setup can be found at http://www.ultramonkey.org/2.0.1/topologies/sl-ha-lb-eg.html. UltraMonkey uses LVS so this setup should be applicable to anyone else using LVS.
Salvatore D. Tepedino sal (at) tepedino (dot) org 21 Jan 2004
I've set one up before and it works well. Here's a page http://www.ultramonkey.org/2.0.1/topologies/sl-ha-lb-overview.html that explains how it's done. You do not have to use the ultramonkey packages if you don't want to. I didn't and it worked fine.
In practice, having the director also function as a realserver complicates failover. The realserver, which had a connection on VIP:port, will have to release it before it can function as the director, which only forwards connections on VIP:port (but doesn't accept them). If, after failover, the new active director is still listening on the LVS'ed port, it won't be able to forward connections.
Karl Kopper karl (at) gardengrown (dot) org 22 Jan 2004
At failover time, the open sockets on the backup Director may survive when the backup Director acquires the (now arp-able) VIP (of course the localnode connections to the primary director are dropped anyway), but that's not going to happen at failback time automatically. You may be able to rig something up with ipvsadm using the --start-daemon master/backup, but it is not supported "out-of-the-box" with Heartbeat+ldirectord. (I think this might be easier on the 2.6 kernel btw). Perhaps what you want to achieve is only possible with dedicated Directors not using LocalNode mode.
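The connection synchronisation Karl refers to is started by hand with ipvsadm's sync daemon; a minimal sketch (interface name assumed):

# on the active director
ipvsadm --start-daemon master --mcast-interface eth0
# on the backup director
ipvsadm --start-daemon backup --mcast-interface eth0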
The "Two Box LVS" is only suitable for low loads and is more difficult to manage than a standard (non localnode) LVS.
Horms horms (at) verge (dot) net (dot) au 23 Jan 2004
The only thing that you really need to consider is capacity. If you have 2 nodes and one goes down, will that be sufficient until you can bring the failed node back up again? If so, go for it. Obviously the more nodes you have, the more capacity you have - though this also depends on the capacity of each node.
My thinking is that for smallish sites having the linux director as a machine which is also a realserver is fine. The overhead in being a linux director is typically much smaller than that of a realserver. But once you start pushing a lot of traffic you really want a dedicated pair of linux directors.
Also once you have a bunch of nodes it is probably easier to manage things if you know that these servers are realservers and those ones are linux directors, and spec out the hardware as appropriate for each task - e.g. linux directors don't need much in the way of storage, just CPU and memory.
Horms horms (at) verge (dot) net (dot) au 26 Aug 2003
The discussion revolves around using LVS where Linux Directors are also realservers. To complicate matters more there are usually two such Linux Directors that may be active or standby at any point in time, but will be Real Servers as long as they are available.
The key problem that I think you have is that, unless you are using a fwmark virtual service, the VIP on the _active_ Linux Director must be on an interface that will answer ARP requests.
To complicate things, this setup really requires the use of LVS-DR and thus, unless you use an iptables redirect of some sort, the VIP needs to be on an interface that will not answer ARP on all the realservers. In this setup that means the stand-by Linux Director.
Thus when using this type of setup with the constraints outlined above, when a Linux Director goes from stand-by to active, the VIP must go from being on an interface that does not answer ARP to an interface that does answer ARP. The opposite is true if a Linux Director goes from being active to stand-by.
In the example on ultramonkey.org the fail-over is controlled by heartbeat (as opposed to Keepalived, which I believe you are using). As part of the fail-over process heartbeat can move the VIP from lo:0 to ethX:Y and reverse this change as need be. This fits the requirement above. Unfortunately I don't think that Keepalived does this, though I would imagine that it would be trivial to implement.
Another option would be to change the hidden status of lo as fail-over occurs. This should be easily scriptable.
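A minimal sketch of such a script, assuming the hidden patch is applied and the VIP stays configured on lo:0 (the script name and arguments here are illustrative, not from the thread):

#!/bin/sh
# toggle whether this box answers ARP for addresses on lo (i.e. the VIP)
case "$1" in
    active)  echo 0 > /proc/sys/net/ipv4/conf/lo/hidden ;;  # start answering ARP
    standby) echo 1 > /proc/sys/net/ipv4/conf/lo/hidden ;;  # stop answering ARP
    *) echo "usage: $0 active|standby" >&2; exit 1 ;;
esac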
There are some more options too: use a fwmark service and be rid of your VIP on an interface altogether. Unfortunately this probably won't solve your problem though, as you really need one VIP in there somewhere. Or instead of using hidden interfaces just use an iptables redirect rule. I have heard good reports of people getting this to work on redhat kernels. I still haven't had time to chase up whether this works on stock kernels or not (sorry, I know it has been a while).
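A sketch of the iptables redirect alternative (example VIP assumed): each realserver, including the stand-by director, grabs packets addressed to the VIP locally instead of having the VIP on a hidden/non-ARPing interface.

# on each realserver: deliver packets for the VIP to the local demon
iptables -t nat -A PREROUTING -d 192.168.1.110 -p tcp --dport 80 -j REDIRECT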
(For other postings on this thread see the mailing list archive http://marc.theaimsgroup.com/?l=linux-virtual-server&m=103612116901768&w=2.)
Note: The normal way to run the Two Box LVS is with no ipvsadm entries on the backup director. However keepalived does have ipvsadm entries, and a non-arp'ing VIP. If the backup director has ipvsadm entries, then even though it's not receiving packets directly from the internet, a connection request can be forwarded to it from the active director. The backup director will attempt to loadbalance this request, which could be sent back to the active director. You are now in a loop. Here's the story of the discovery of the problem and the fix by Graeme.
Martijn Grendelman martijn (at) pocos (dot) nl 19 Dec 2007
I have a quite straightforward LVS-DR setup: two machines, both running a webserver on port 80, one of them directing traffic to either the local node or the other machine. I am using the 'sh' scheduler, as I have been for ages.
For a while now, directing traffic to the other machine (not the director) hasn't worked anymore, BUT ONLY for a specific VIP:PORT combination. During my tests, the LVS setup is as follows:
martijn@whisky:~> ipvsadm -L -n
IP Virtual Server version 1.2.1 (size=4096)
Prot LocalAddress:Port Scheduler Flags
  -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
TCP  213.207.104.20:80 sh
  -> 213.207.104.11:80            Route   500    0          0
TCP  213.207.104.20:81 sh
  -> 213.207.104.11:81            Route   500    0          0
TCP  213.207.104.50:80 sh
  -> 213.207.104.11:80            Route   500    0          0

Note that all references to the local node have been temporarily removed. Now, the service defined second (port 81) works. The third one (port 80, but a different VIP) works too. The first one, the one that I need, does not.
When I connect to 213.207.104.20:80, I see some kind of SYN storm on both the director and the realserver 213.207.104.11: mostly identical lines (and nothing else) keep appearing at a high rate, even after I kill the connection on the client. Only after I remove the service from LVS does this stop.
Graeme
You likely have a pair of "battling" directors. Consider this: the client sends a SYN to director1. Director1 sends it on to director2, which is the other realserver - so far this is your scenario.
(Joe: this only happens if director2 also has LVS rules active): Director2's LVS catches the packet and sends it back to director1 for service, but director1 has already assigned that connection to director2, so it sends the packet back again.
What happens now is that the second step repeats ad nauseam, until the ethernet between the machines is full of the same SYN packet, performance degrades, and the directors (eventually) fall over under the load.
Martijn
Indeed, the other realserver, being the backup director, had its LVS rules loaded. After clearing the LVS table on this machine, everything worked as before.
In the past, the machines only had LVS active if they were actually the active director, but at some point in time I figured I could just leave it active, because the stand-by director didn't get any requests anyway. But of course, with LVS-DR forwarding, that is not true.
Graeme
I found it eventually, on the keepalived-devel@lists.sourceforge.net list - I'll post it verbatim below.
This *should* allow you, with some modifications, to sort out your problem and keep an active/active master/backup (by this I mean with IPVS loaded and configured on both directors).
Client sends packet to VIP. Director1 (Master) has VIP on external interface, and has LVS to catch packets and load balance them. Director1 uses LVS-DR to route packet either to itself (which is fine), or to Director2 (Backup).
There's the problem... In the case of keepalived, Director2 *also* has a VIP, and has ipvsadm rules configured to forward packets regardless of the VRRP mode (MASTER, BACKUP or FAULT). This makes for faster failover but leads directly to this problem/solution. On the backup director, keepalived moves the VIP from the VRRP interface to lo, which is configured to not reply to arp requests. In the basic case, 50% of the packets being forwarded by Director1 to Director2 *will now get sent back to Director1* by Director2's LVS-DR configuration. And because Director1's LVS has already assigned that connection to Director2, it forwards the traffic to Director2 again.
Time passes, friend.
Your servers collapse under the weight of the amplifying traffic on their intermediate or backend (or frontend, if you're on a one-net setup) network.
The solution? A real nice easy one - use iptables to set a MARK on the incoming traffic - something like:
iptables -t mangle -I PREROUTING -i eth0 -p tcp -m tcp \
    -s 0/0 -d $VIP --dport $VIP_PORT \
    -j MARK --set-mark 0x1

Then configure keepalived to match traffic on the MARK value instead of the VIP/PORT combination, like so:
virtual_server fwmark 1 {
    delay_loop 10
    lb_algo rr
    lb_kind DR
    protocol TCP
    real_server x.x.x.72 25 {
        TCP_CHECK {
            connect_timeout 5
        }
    }
    real_server x.x.x.73 25 {
        TCP_CHECK {
            connect_timeout 5
        }
    }
}

...and so on for the other MARK values you define in your iptables setup.
This works perfectly where you have more than one interface and are routing inter-director traffic via a "backend". In the case of a single NIC on each box, you need a modified rule to NOT apply the mark value to packets sourced from the "other" director:
On node1 create an iptables rule of the form:
-t mangle -I PREROUTING -d $VIP -p tcp -m tcp --dport $VPORT -m mac \
    ! --mac-source $MAC_NODE2 -j MARK --set-mark 0x6

where $MAC_NODE2 is node2's MAC address as seen by node1. Do a similar trick on node2:
-t mangle -I PREROUTING -d $VIP -p tcp -m tcp --dport $VPORT -m mac \
    ! --mac-source $MAC_NODE1 -j MARK --set-mark 0x7

where $MAC_NODE1 is node1's MAC address as seen by node2.
Change your keepalived.conf so that it uses fwmarks.
node1:  virtual_server fwmark 6 {
node2:  virtual_server fwmark 7 {

Note: The problem came up again, before the solution went into the HOWTO. Graeme directed Thomas to the original posting in the archive.
Thomas Pedoussaut thomas (at) pedoussaut (dot) com 15 Apr 2008
I have a very light infrastructure, with 2 servers acting as directors AND real servers.
I came across the packet storm problem: when the MASTER forwards a connection to the realserver on the BACKUP (via DR), the BACKUP treats it as a VIP connection to be loadbalanced rather than a realserver connection to process, and decides to load balance it back to the MASTER.
I'm sure there is a way to do it, maybe with iptables. I'm looking for a schema explaining how a packet coming on an interface traverses the various layers (ipvs, netfilter, routing) so I could figure out how to do it.
Luckily, I have 2 physical interfaces, one public and one private, so if a packet arrives on the private interface for the VIP, it's DR traffic from the MASTER, and if it comes in on the public one, it's pre-loadbalance traffic.
Another option would be to make sure that the tables are in sync between the 2 machines, so the BACKUP knows that the connection has to be handled locally. I have tried to set up that feature, but it doesn't seem to sync properly.
Joe: Here's my explanation of Graeme's problem, in case you haven't got it yet. The problem only occurs if ipvsadm rules are loaded on the backup director (having the rules loaded makes failover simpler). Here's the two NIC version of the problem
ipvsadm balances on VIP:port

           CIP                      CIP
            |                        ^
            v                        |
            --------------------------
            |
       eth0 VIP                 eth0 VIP
        _______                  _______
       |       |                |       |
       | active|                | backup|
       |_______|                |_______|
       eth1 RIP1                eth1 RIP2
            |                        |
            --------------------------
        CIP->MAC of RIP2    normal packet
        MAC of RIP1<-CIP    spurious packet
Solution: fwmark packets coming in on eth0, to VIP:80. Load balance on the fwmark. Packets for the VIP coming from 0/0 to eth0 will be load balanced. Packets for the VIP coming in on eth1 will not be load balanced and will be delivered to the demon.
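In ipvsadm terms (addresses taken from Martijn's setup above, with the localnode entry put back; the mark value is assumed), the two-NIC fix looks something like:

# mark only VIP traffic arriving on the public NIC
iptables -t mangle -A PREROUTING -i eth0 -d 213.207.104.20 -p tcp --dport 80 \
    -j MARK --set-mark 1
# balance on the mark, not on VIP:port; DR traffic arriving on eth1 stays
# unmarked and is delivered to the local demon
ipvsadm -A -f 1 -s sh
ipvsadm -a -f 1 -r 213.207.104.11 -g
ipvsadm -a -f 1 -r 127.0.0.1 -g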
Here's the 1 NIC version of the problem
ip_vs() balances on VIP:port

           CIP    CIP
            |      ^
            v      |      CIP->MAC of eth0 on backup    normal packet
                   |      MAC of eth0 on active<-CIP    spurious packet
            ----------------------
            |                    |
       eth0 VIP             eth0 VIP
        _______              _______
       |       |            |       |
       | active|            | backup|
       |_______|            |_______|
Solution: fwmark packets coming in on eth0 to VIP:80, but not those coming from the MAC of the other director. Load balance on the fwmark. Packets to VIP:port from 0/0 will be loadbalanced. Packets to VIP:port from the other MAC address will not be loadbalanced and will be delivered directly to the demon.
Joe: It occurs to me that ipvsadm doesn't have the -i eth0 or -o eth0 options that the other netfilter commands have. Does a packet arriving at LOCAL_IN know which NIC it came in on?
Horms 29 Dec 2008
It would be possible, and I believe that the information is available in LOCAL_IN. But there are a lot of different filters that can be applied through netfilter. And rather than adding them all to ipvsadm, I think it makes a lot more sense to just make use of fwmark.
If you want to explore installing localnode by hand, try this. With an httpd listening on the VIP (192.168.1.110:80) of the director (192.168.1.1) AND with _no_ entries in the ipvsadm table, the director appears as a normal non-LVS node and you can connect to this service at 192.168.1.110:80 from an outside client. Now add the virtual service with round robin scheduling (realservers added below, with no explicit forwarding method, will default to direct routing)
#ipvsadm -A -t 192.168.1.110:80 -s rr |
and then add an external realserver in the normal manner with
#/sbin/ipvsadm -a -t 192.168.1.110:80 -r 192.168.1.2 |
Connecting to 192.168.1.110:80 will now display the webpage at the realserver 192.168.1.2:80 and not the director. This is easier to see if the pages are different (eg put the real IP of each machine at the top of the webpage).
Now comes the LocalNode part -
You can now add the director back into the ipvsadm table with
/sbin/ipvsadm -a -t 192.168.1.110:80 -r 127.0.0.1 |
(or replace 127.0.0.1 by another IP on the director)
Note, the port is the same for LocalNode. LocalNode is independent of the LVS mode (LVS-NAT/Tun/DR) that you are using for the other IP:ports.
Shift-reloading the webpage at 192.168.1.110:80 will alternately display the webpages at the realserver 192.168.1.2 and the director at 192.168.1.1 (if the scheduling is unweighted round robin). If you remove the (external) realserver with
/sbin/ipvsadm -d -t 192.168.1.110:80 -r 192.168.1.2 |
you will connect to the LVS only at the director's port. The ipvsadm table on the director will then look like

Protocol Local Addr:Port    ==> Remote Addr  Weight ActiveConns TotalConns
...
TCP      192.168.1.110:80   ==> 127.0.0.1    2      3           3
From the client, you cannot tell whether you are connecting directly to the 192.168.1.110:80 socket or through the LVS code.
With dual directors in active/backup mode, some people are interested in running services in localnode, so that the backup director can function as a normal realserver rather than just sit idle. This should be do-able. There will be extra complexity in setting up the scripts to do this, so make sure that robustness is not compromised. The cost of another server is small compared to the penalties for downtime if you have tight SLAs.
Jan Klopper janklopper (at) gmail (dot) com 2005/03/02
I have 2 directors running heartbeat and 3 realservers to process the requests. I use LVS-DR and want the balancers to also be realservers. Both directors are set up with localnode to serve requests when they are the active director, but when one is the inactive director, it sits idle.
If I add the VIP with noarp to the director, heartbeat would not be able to set up the VIP when it becomes the active director. Is there any way to tell heartbeat to toggle the noarp switch on the load balancers instead of adding/removing the VIP?
The ideal solution would be like this: the secondary loadbalancer carries the VIP with noarp (through noarp2.0/noarpctl) and can thus be used to process queries like any realserver. If the primary loadbalancer fails, the secondary loadbalancer disables the noarp program and thus starts arping for the VIP, becoming the load balancer and using the localnode feature to continue processing requests. If the primary load balancer comes back up, it either takes the role of secondary server (and adds the VIP with noarp to become a realserver), or becomes the primary load balancer again, which would trigger the secondary load balancer to add the noarp patch again (which would make it behave like a realserver again).
I figured we could just do the following:
the only point I don't know for sure is: will the new server begin replying to arp requests as soon as noarp has been deleted?
Joe
Yes. However the arp caches on the other nodes will still have the old MAC address for the VIP and these take about 90 secs to expire. Until the arp cache expires and the node makes another arp request, the node will have the wrong MAC address. Heartbeat handles this situation by sending 5 gratuitous arps (arp broadcasts) using send_arp, just to make sure everyone on the net knows the new MAC address for the VIP.
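If you are scripting this yourself rather than using heartbeat's send_arp, iputils arping can send the same gratuitous ARPs (VIP and interface here are assumptions):

# announce the VIP's new MAC address to everyone on the segment
arping -U -c 5 -I eth0 192.168.1.110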
Graeme Fowler graeme (at) graemef (dot) net (arguing that the extra complexity is not a problem in practice)
I've got a 3-node DNS system using LVS-DR, where all 3 nodes are directors and realservers simultaneously. I'm using keepalived to manage it all and do the failover, with a single script running when keepalived transitions from MASTER to BACKUP or FAULT and back again. It uses iptables to add an fwmark to the incoming requests, then defines the LVS virtual service on the fwmark. The basic configuration is as follows:
global_defs {
    <snipped notifications>
    lvs_id DNS02
}

static_routes {
    # backend managment LAN
    1.2.0.0/16 via 1.2.0.126 dev eth0
}

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! VRRP synchronisation
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

vrrp_sync_group SYNC1 {
    group {
        DNS_OUT
        GW_IN
    }
}

vrrp_instance DNS_1 {
    state MASTER
    interface eth0
    track_interface {
        eth1
    }
    lvs_sync_daemon_interface eth0
    virtual_router_id 111
    priority 100
    advert_int 5
    smtp_alert
    virtual_ipaddress {
        5.6.7.1 dev eth1
        5.6.7.2 dev eth1
    }
    virtual_ipaddress_excluded {
        5.6.7.8 dev eth1
        5.6.7.9 dev eth1
    }
    virtual_routes {
    }
    notify_master "/usr/local/bin/transitions MASTER"
    notify_backup "/usr/local/bin/transitions BACKUP"
    notify_fault  "/usr/local/bin/transitions FAULT"
}

vrrp_instance GW_IN {
    state MASTER
    garp_master_delay 10
    interface eth0
    track_interface {
        eth0
    }
    lvs_sync_interface eth0
    virtual_router_id 11
    priority 100
    advert_int 5
    smtp_alert
    virtual_ipaddress {
        1.2.0.125 dev eth0
    }
    virtual_routes {
    }
}

!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
! DNS TCP
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!

virtual_server fwmark 5 {
    smtp_alert
    delay_loop 30
    lb_algo wlc
    lb_kind DR
    persistence_timeout 0
    protocol TCP
    real_server 1.2.0.2 53 {
        weight 10
        inhibit_on_failure
        TCP_CHECK {
            connect_timeout 10
            connect_port 53
        }
        MISC_CHECK {
            misc_path "/usr/bin/dig @1.2.0.2 -p 53 known_zone soa"
            misc_timeout 10
        }
    }
    <snip other realservers>
}

<snip UDP realservers>
...Where /usr/local/bin/transitions is:
#!/bin/bash

IPLIST="/etc/resolver_ips"
IPCMD="/sbin/ip addr"

if [ ! -f $IPLIST ]
then
    echo "No resolver list found, exiting"
    exit 127
fi

if [ $1 ]
then
    SWITCH=$1
else
    # No command, quit
    echo "No command given, exiting"
    exit 126
fi

if [ $SWITCH = "MASTER" ]
then
    DO="del"
elif [ $SWITCH = "BACKUP" -o $SWITCH = "FAULT" ]
then
    DO="add"
else
    # No command, quit
    echo "Invalid command given, exiting"
    exit 126
fi

if [ $DO = "add" ]
then
    # we cycle through and make the IPs in /etc/resolver_ips loopback live
    # We're in BACKUP or FAULT here
    for addr in `cat $IPLIST`
    do
        $IPCMD $DO $addr dev lo
    done
    /sbin/route del -net 5.6.7.0 netmask 255.255.255.0 dev eth1
    /usr/bin/killall -HUP named
elif [ $DO = "del" ]
then
    # we do the reverse
    # We're in MASTER here
    for addr in `cat $IPLIST`
    do
        $IPCMD $DO $addr dev lo
    done
    /sbin/route add -net 5.6.7.0 netmask 255.255.255.0 dev eth1
    /usr/bin/killall -HUP named
else
    echo "Something is wrong, exiting"
    exit 125
fi

### EOF /usr/local/bin/transitions
...and /etc/resolver_ips contains:
5.6.7.1/32
5.6.7.2/32
5.6.7.3/32
5.6.7.4/32
...and in /etc/sysctl.conf we have (amongst other things):
# Don't hide mac addresses from arping out of each interface
net.ipv4.conf.all.arp_filter = 0
# Enable configuration of hidden devices
net.ipv4.conf.all.hidden = 1
# Make the loopback device hidden
net.ipv4.conf.lo.hidden = 1
So we have a single MASTER and two BACKUP directors in normal operation, where the MASTER has "resolver" IP addresses on its "external" NIC, and the BACKUP directors have them on the loopback adapter. Upon failover, the transitions script moves them from loopback to NIC or vice-versa. The DNS server processes themselves are serving in excess of 880000 zones using the DLZ patch to BIND, so startup times for the cluster as a whole are really very short (it can be cold-started in a matter of minutes). In practice the system can cope with many thousands of queries per minute without breaking a sweat, and fails over from server to server without a problem. You might think that this is an unmanageable methodology and is impossible to understand, but I think it works rather well :)
see Re-mapping ports in LVS-DR with iptables
For an alternate take on mapping ports for LocalNode see one box lvs.