Note | |
---|---|
fwmark nomenclature: Karl Kopper (Apr 2004) said that he thinks the correct term for this is "netfilter mark". A google search finds references to "netfilter mark" back to 2001, and with "fwmark" current at least to 2003. Both terms seem to be in use. The various netfilter HOWTOs don't say anything about new terminology. Horms (who wrote the fwmark code) doesn't know anything about a change in terminology, but thinks it's possible that fwmark is the implementation of netfilter marks. I asked Harald Welte about this at OLS_2004 and the explanation was as clear as day, except that I didn't write it down and now I've forgotten it (geez, sorry about this). It was a matter of nomenclature rather than logic: it was something like - the entity in the command line is called a mark while the method of marking packets is called fwmark. Whatever it is, you can use either term and people will know what you're talking about. |
fwmark is a way of aggregating an arbitrary collection of VIP:port services into one virtual service (the entry made with ipvsadm -A). Thus a virtual service could be composed of multiple VIP:ports (e.g. VIP1:port1, VIP2:port2...VIPn:portn). This is useful if the client needs to connect to all of the VIP:port services together on one realserver.
Common uses for fwmark are
A minor advantage is that a realserver can be added, removed and re-weighted with one ipvsadm command. To enable fwmark, the packets coming into the director have to be labelled with a fwmark (a number attached to the packet's skb inside the director's kernel; the bits of the tcp packet itself are not changed). This is done with iptables (or ipchains).
Note | |
---|---|
The fwmark is only a part of the packet while it stays in the skb of the machine which marked the packet (here the director). The fwmark is not on the packet when the packet is put out on the external network (i.e. the fwmark that is put on the packet when it is on the director is not on the packet when it arrives at the realserver). |
Once the component services are fwmark'ed, filtering (with iptables/ipchains) can be done on the fwmark, rather than on the individual IP:ports.
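As a rough sketch of what filtering on the mark can look like (assuming iptables with the `mark` match is available; the VIP 192.168.2.110 and the mark value 1 are borrowed from the examples in this section, and the function names are made up for illustration), the script below only prints the commands, so they can be reviewed before being run as root on the director:

```shell
#!/bin/sh
# Dry run: print (do not execute) rules that mark two services
# and then filter on the mark instead of on each IP:port.
VIP=192.168.2.110   # assumed VIP, as used in the examples in this section
MARK=1              # arbitrary mark value

# $1 = destination port (number or service name)
mark_rule() {
    printf 'iptables -t mangle -A PREROUTING -p tcp -d %s/32 --dport %s -j MARK --set-mark %s\n' \
        "$VIP" "$1" "$MARK"
}

# One filter rule covers every service that carries the mark.
filter_rule() {
    printf 'iptables -A INPUT -m mark --mark %s -j ACCEPT\n' "$MARK"
}

mark_rule http
mark_rule https
filter_rule
```

Adding a third service to the group then needs one more `mark_rule` line, while the filter rule stays the same.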
The original method for setting up an LVS used the VIP as the target for ipvsadm commands. Using the VIP as the target, it is possible for LVS to forward multiple services on the same VIP and to forward packets for several different VIPs. However this method does not scale well to large numbers of services or IPs. As well, the connections to each service are independent, unless persistence is invoked.
The more flexible fwmark method was introduced by Horms in Apr 2000. Ted Pavlic then showed how to use fwmarks to group arbitrary services. In this way connections to two otherwise independent services, e.g. http and https, will be linked as one service as far as ipvs is concerned and the client will stay on the same realserver for both services. The fwmarks method is more flexible and simpler to administer for large numbers of services than is the VIP method.
fwmark is used
Setting up an LVS on fwmarks rather than the VIP is now the method of choice for setups with multiple VIPs or a group of ports that need to be aggregated.
fwmark can be used with all forwarding methods and should have no effect on performance (throughput, latency).
Fwmarks are numbers but can be translated into names using the fwmark name translation table patch.
Some history from Horms (this has also been described in "Wired" Magazine - see LVS in the News).
The story starts with a trip from Silicon Valley, where I was working for VA Linux Systems, to a VA Linux Systems Professional Services customer site in Fort Lauderdale. It was mid-February 2000. I was called onsite to help sway the customer towards using LVS. The customer was interested in using LVS for a very large number of customers. Part of their requirement called for a very large number of virtual services to be configured. I suggested that we could simplify this by collecting the virtual services into contiguous network blocks and modifying LVS to recognise all addresses in a block as belonging to a virtual service. The customer seemed to like this idea. My original proposal and implementation was to allow virtual services based on netmasks. Wensong rejected this because of some potential performance issues.
I distinctly remember working on the original implementation on a train trip from the Blue Mountains to Sydney's Central Station with my then girlfriend. By the time I had to change trains to go to Wynyard the code was working :)
When I got back home to Silicon Valley I finished off the changes and emailed them to Wensong. That was on the 20th March. He wasn't particularly happy with some aspects of the change, particularly some performance overhead that my implementation introduced. I made some changes and sent him a new version. He suggested making the new code optional; I made that so too. We exchanged email and code for about a week.
A few days later Julian came up with the idea of using a fwmark, a feature of the ip_masq code that had been around for a while, but wasn't heavily used. Wensong passed this on to me (30 Mar). Wensong clearly was not happy with my approach to the problem and suggested the implementation that he and Julian had hashed out. The change involved using netfilter (iptables) to handle deciding which packets belong to a virtual service, rather than putting that logic into LVS itself - it was this portion of the code that Wensong was worried about the performance of.
We talked this over a little bit over email and I implemented the idea. On the 6th of April I sent the new code to Wensong and Julian. On the 7th Wensong wrote back explaining a few changes he was going to make, mostly involving having the code always compiled in rather than making it an option as there didn't appear to be any performance overhead in the new code. The new option, which by then was known as firewall mark virtual services, was included in IPVS 0.9.10 which was released on the 9th April. Minor fixes were made, mainly by Wensong, over the following few months and made it into subsequent releases.
I wrote the kernel, ipvsadm and ldirectord changes and largely have maintained them ever since.
It is of note that as a part of the work that came out of this customer the -R and -S options to ipvsadm were suggested and implemented by myself. These were released just before the inclusion of the fwmark code.
This customer was also the impetus for putting together what is now known as Ultra Monkey. All in all quite an interesting outcome for a couple of days on site. Pleasingly I believe that the customer in question is using Ultra Monkey with the fwmark support in LVS.
A bio of Horms:
I am from Sydney Australia. I have been involved in Linux for, well, a long time. My main area of expertise is High Availability and Load Balancing. Though anything from email to routing is just fine by me. You can see a list of the projects I have worked on as well as the papers that I have presented at conferences on my web page (http://www.vergenet.net/linux/).
I used to work at VA Research which became VA Linux Systems until they changed their business model and became VA Software. During that time I was based in Silicon Valley, New York City and Sydney (though not all at the same time :). I currently work for VA Linux Systems Japan, in Tokyo - which I should point out is majority owned by the Sumitomo Corporation and is independent of VA Software (USA) these days. I primarily work on the Ultra Monkey Project in conjunction with NTT Commware.
http://www.ultramonkey.org/, http://www.vasoftware.com/, http://www.valinux.co.jp/, http://www.nttcom.co.jp/.
(Joe) I first saw Horms when he gave a talk at the 4th Annual Linux Expo at Duke University, Durham, NC in May 1998, on Creating Redundant Linux Servers (http://www.vergenet.net/linux/redundant_linux_paper/). Although I attended the talk and thought it pretty neat, it never occurred to me to introduce myself. Later when we both joined the LVS project, it took quite some time before I connected Horms on the LVS mailing list with the person who gave the presentation at the Linux Expo.
Sample configurations/topologies for fwmarks are at Ultramonkey.
You can enter a port number with a fwmark command with ipvsadm but it is ignored.
Leonard Soetedjo
From the HOWTO, when using fwmark, I can set the port to be 0. Is this correct? Is it ok if I do that for a single port service such as telnet? for example
    iptables -t mangle -A PREROUTING -i eth0 -p tcp -s 0/0 -d VIP --dport telnet -j MARK --set-mark 1
    ipvsadm -a -f 1 -r RS1:0 -g -w 1

Is the use of "0" not important? i.e. I can set to whatever I want?
Horms 17 Dec 2002: The LVS kernel code that handles fwmarks really doesn't care about ports at all. If you want a service to match on specific ports, then you should set up the iptables rules to only mark packets to that port or ports.
nick garratt Mar 25, 2004
I'm experiencing issues with port translation using LVS-NAT and FWMARK:
    iptables -t mangle -A PREROUTING -d VIP -p tcp -m tcp --syn --dport 1237:1239 -j MARK --set-mark 1238
    ipvsadm -A -f 1238 -s wlc -p 900
    ipvsadm -a -f 1238 -r 192.168.20.1:1237 -m -w 5    # daemon instance 1
    ipvsadm -a -f 1238 -r 192.168.20.1:1238 -m -w 5    # daemon instance 2

What I am trying to achieve is the following: we have a custom written SMPP service that accepts two connections (transmitter and receiver) from a client. We have run into problems with maximum threads per process and large numbers of binds. As an interim measure we are considering running multiple instances of the daemon on the same server. It is imperative that a user's two binds are routed to the same daemon instance. The user may connect to a port range so as to allow them to specify different receiver and transmitter ports according to their whim or the peculiarities of their client software but the daemon instance will handle both connections on the same port.
The intention is to group the VIP port range using FWMARK as we do with many other services and load balance them across the RIP service ports ensuring that:
    userIP:56789 -> VIP:1237 -> RIP:n
    userIP:56790 -> VIP:1238 -> RIP:n

where n is the same port guaranteed by persistence. Problem: FWMARK and LVS-NAT port translation does not seem to work at all. What actually happens is:

    userIP:56789 -> VIP:1237 -> RIP:1237
    userIP:56790 -> VIP:1238 -> RIP:1238

which splits the binds across daemon instances.
Horms horms (at) verge (dot) net (dot) au 06 Apr 2004
Yes, port translation does not work with fwmarks, because there is no way for LVS to tell what the port translation should be. In a fwmark service the virtual service does not have a port (or address for that matter). So it can't know that it is accepting packets for, say port 1237, and then use the realserver entry to translate that to port 1237 (not much of a translation) or 1238 (or anything else). It has to just assume that the port will be unchanged.
It would be possible to modify LVS to allow this kind of translation to take place, but it isn't immediately obvious how this would be configured.
Another approach to the problem is to configure multiple virtual interfaces on my realserver, get the daemon instances to bind to specific IPs/same port ranges and handle as per normal i.e. no port translation:
    iptables -t mangle -A PREROUTING -d VIP -p tcp -m tcp --syn --dport 1237:1239 -j MARK --set-mark 1238
    ipvsadm -A -f 1238 -s wlc -p 900
    ipvsadm -a -f 1238 -r 192.168.20.11:0 -m -w 5    # daemon instance 1 listening on 1237 - 1239
    ipvsadm -a -f 1238 -r 192.168.20.12:0 -m -w 5    # daemon instance 2 listening on 1237 - 1239

However I would prefer to keep down the number of IPs I need to failover.
I would suggest doing this. You shouldn't need to failover the IP addresses of your realservers anyway. Just use something like ldirectord to monitor their availability and manipulate the LVS table accordingly.
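The port handling Horms describes can be summarised with a toy model. This is only an illustration of the behaviour reported in this thread, not LVS code; the function name `forwarded_port` and its arguments are invented here, and the "vip" branch assumes LVS-NAT, where the realserver port in the ipvsadm entry is honoured.

```shell
#!/bin/sh
# Toy model (not LVS code): which destination port does the realserver see?
# $1 = service type: "vip" (ipvsadm -t) or "fwmark" (ipvsadm -f)
# $2 = port the client connected to on the VIP
# $3 = port given in the realserver entry (ipvsadm -a ... -r RIP:port)
forwarded_port() {
    case "$1" in
        fwmark) echo "$2" ;;   # realserver port is ignored, port passes through
        vip)    echo "$3" ;;   # LVS-NAT rewrites to the realserver port
        *)      echo "unknown service type: $1" >&2; return 1 ;;
    esac
}

forwarded_port fwmark 1238 1237   # prints 1238
forwarded_port vip    1238 1237   # prints 1237
```

This matches what nick observed: with a fwmark service, a connection to VIP:1238 arrives at RIP:1238 whatever port is written in the realserver entry.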
If you are accepting packets by a fwmark rather than by the VIP, then (in principle) you don't need the VIP on the node with the fwmark rules (which could be either the director or realserver).
To get a working LVS without configuring the VIP on a machine, you need to
To do the examples below, you can either setup the fwmarks to mark only one IP (the VIP) and install the VIP on the director, or you can read the section on routing and delivery of packets and use one of the methods suggested there.
Assuming you already have setup the networks and default gw for the machines in your LVS, here's how you'd setup telnet without fwmarks (i.e. the "normal" method, using the VIP as the target for ipvsadm commands) on a two realserver LVS-DR.
    #make a table for connections to VIP:telnet, with round robin scheduling
    #schedule realserver RS2 for connections to VIP:telnet, weight=1, forwarding method=DR
    #schedule realserver RS1 for connections to VIP:telnet, weight=1, forwarding method=DR
    director:# ipvsadm -A -t VIP:telnet -s rr
    director:# ipvsadm -a -t VIP:telnet -r RS2:telnet -g -w 1
    director:# ipvsadm -a -t VIP:telnet -r RS1:telnet -g -w 1
Here's how to do the same thing with fwmarks. You first mark the packets with ipchains or iptables.
Here's the recipe for setting a fwmark with ipchains:
    #flush ipchains tables
    #mark with value=1, tcp packets from anywhere,
    #arriving on eth1 (holds the VIP on my setup),
    #with dst_addr=192.168.2.110 (the VIP) for port telnet
    #show ipchains tables
    director:# ipchains -F
    director:# ipchains -A input -p tcp -i eth1 -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport telnet -m 1
    director:# ipchains -L input
    Chain input (policy ACCEPT):
    target     prot opt     source       destination     ports
    -          tcp  ------  anywhere     lvs2.mack.net   any ->   telnet
Here's the recipe for setting a fwmark with iptables:
The iptables parameters are taken from an example by Paul Schulz (http://www.foursticks.com.au/~pschulz/qos/pfifo.sample, link dead Jan 2003), which I found through google.
First put a mark of value=1 on tcp packets which arrive from anywhere with dst_addr=VIP:telnet (the VIP is on eth1 in my setup).
    #flush the mangle table
    #in the skb, put mark=1 on all tcp packets arriving on eth1 from anywhere, with dest=VIP:telnet
    #output the mangle table, just for a look
    director:# iptables -F -t mangle
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport telnet -j MARK --set-mark 1
    director:/etc/lvs# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source       destination
    MARK       tcp  --  anywhere     lvs2.mack.net   tcp dpt:telnet MARK set 0x1

    Chain OUTPUT (policy ACCEPT)
    target     prot opt source       destination
The fwmark is only associated with the packet while it is in the director skb (socket buffer). The packet which emerges from the director and is forwarded to the realserver is a normal (unmarked) packet. (You can't use the director's fwmark information when the packet arrives on the realserver to decide on how to handle the packet.)
    #setup an ipvsadm table for packets with mark=1,
    #schedule them with round robin.
    #schedule realserver RS1 for connections with mark=1, forwarding method=DR, weight=1
    #schedule realserver RS2 for connections with mark=1, forwarding method=DR, weight=1
    director:# ipvsadm -A -f 1 -s rr
    director:# ipvsadm -a -f 1 -r RS1.mack.net:telnet -g -w 1
    director:# ipvsadm -a -f 1 -r RS2.mack.net:telnet -g -w 1

Here's the output of ipvsadm

    director:/etc/lvs# ipvsadm
    IP Virtual Server version 0.2.7 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr
      -> RS2.mack.net:23             Route   1      0          0
      -> RS1.mack.net:23             Route   1      0          0
You can now telnet to the VIP. You'll get the expected round robin scheduling of your connections to RS2 and RS1.
The telnet example above could equally well be done using the VIP or a fwmark as the target for ipvsadm commands. The same is true for any one port service, where connections to services are made independently of each other. Sometimes we need to group services together, e.g. port 20,21 for an ftp server or port 80, 443 for an e-commerce site. With persistence, you can only make ports persistent singly (but you can make persistent as many or as few as you want, they will be persistent independently); or make all ports persistent at once (with the :0 option), in which case persistence of the ports will be linked. There is no way to make pairs (or groups) of ports persistent with the current persistence code. The current method for handling this, persistent connection, links all ports on the VIP, and the director will forward connections to all ports, not just the two we are interested in. For security purposes, if persistence is used to group services, then connection requests to the other ports will have to be blocked. Although workable, it's an ugly solution.
For background on how the specifications for fwmarks were set to allow services to be grouped, see Appendix 1 for the initial discussion between Ted and the LVS developers (Horms and Julian), Appendix 2 where Ted let me know that he'd had it working, and Appendix 3 for Ted's announcement to the mailing list.
Here's an example grouping ports 20,21 for ftp. This uses persistence and the VIP as the target for ipvsadm commands (this is the original, VIP way of setting up ftp).
    #make a table for connections to all ports on VIP
    #with round robin scheduling, persistence timeout=360secs
    #schedule realserver RS2 for connections to all ports on VIP, weight=1, forwarding method=DR
    #schedule realserver RS1 for connections to all ports on VIP, weight=1, forwarding method=DR
    director:# ipvsadm -A -t VIP:0 -s rr -p 360
    director:# ipvsadm -a -t VIP:0 -r RS1.mack.net:0 -g -w 1
    director:# ipvsadm -a -t VIP:0 -r RS2.mack.net:0 -g -w 1

Here's the output of ipvsadm

    director:/etc/lvs# ipvsadm
    IP Virtual Server version 0.2.7 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    TCP  lvs2.mack.net:0 rr persistent 360
      -> RS1.mack.net:0              Route   1      0          0
      -> RS2.mack.net:0              Route   1      0          0
After the client has made the initial connection on port 21, then any subsequent connection on port 20 (within the 360sec timeout period) will go to the same realserver.
The problem is that the director will forward to the same realserver, connection requests made to any port by the client. If we have listeners on port 80 and 443 on the realserver, then these services will be linked to each other (which we may want), and they will also be linked to the ftp service (which we may not want). If you telnet to the VIP, this request will be forwarded to the realservers too (in production you'll have to block this).
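One way the blocking might be done is sketched below. It assumes iptables on the director, and that the filter table's INPUT chain sees packets addressed to the VIP before ipvs picks them up, which is worth checking on your own kernel. The function name `allow_only` is invented for this example, and the script only prints the rules rather than installing them.

```shell
#!/bin/sh
# Dry run: print rules that accept only the wanted ports on the VIP
# and drop every other tcp connection request to it.
VIP=192.168.2.110   # assumed VIP, as used in the examples in this section

# args = ports (numbers or service names) that should stay reachable
allow_only() {
    for port in "$@"; do
        printf 'iptables -A INPUT -p tcp -d %s/32 --dport %s -j ACCEPT\n' "$VIP" "$port"
    done
    # anything else addressed to the VIP is dropped
    printf 'iptables -A INPUT -p tcp -d %s/32 -j DROP\n' "$VIP"
}

allow_only ftp ftp-data
```

Note that rules this strict would also drop the high ports used by passive ftp, so they only suit the active ftp case.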
Here's how to setup an ftp server with fwmarks. First mark the packets of interest with ipchains or iptables (i.e. mark all tcp packets destined for VIP:ftp and VIP:ftp-data arriving on eth1).

    #flush ipchains tables
    #mark ftp packets
    #put the same mark on ftp-data packets
    #show ipchains tables
    director:# ipchains -F
    director:# ipchains -A input -p tcp -i eth1 -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport ftp -m 1
    director:# ipchains -A input -p tcp -i eth1 -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport ftp-data -m 1
    director:# ipchains -L input
    Chain input (policy ACCEPT):
    target     prot opt     source       destination     ports
    -          tcp  ------  anywhere     lvs2.mack.net   any ->   ftp
    -          tcp  ------  anywhere     lvs2.mack.net   any ->   ftp-data

    #clear mangle table
    #mark ftp packets
    #put the same mark on ftp-data packets
    #show mangle table
    director:# iptables -F -t mangle
    director:/etc/lvs# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport ftp -j MARK --set-mark 1
    director:/etc/lvs# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport ftp-data -j MARK --set-mark 1
    director:/etc/lvs# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source       destination
    MARK       tcp  --  anywhere     lvs2.mack.net   tcp dpt:ftp MARK set 0x1
    MARK       tcp  --  anywhere     lvs2.mack.net   tcp dpt:ftp-data MARK set 0x1

    Chain OUTPUT (policy ACCEPT)
    target     prot opt source       destination
Next setup ipvsadm to schedule packets marked with fwmark=1 to your realservers. You need persistence (here timeout set to 600secs).
    director:# ipvsadm -A -f 1 -s rr -p 600
    director:# ipvsadm -a -f 1 -r RS1.mack.net:0 -g -w 1
    director:# ipvsadm -a -f 1 -r RS2.mack.net:0 -g -w 1

Here's the output of ipvsadm with two current connections to the LVS and 3 expiring ones. Note they are all to the same realserver, as expected for a persistent connection. Since forwarding is by LVS-NAT, the ip_vs_ftp module automatically loads.

    director:/etc/lvs# ipvsadm
    IP Virtual Server version 0.2.7 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600
      -> RS2.mack.net:0              Route   1      2          3
      -> RS1.mack.net:0              Route   1      0          0
A netpipe test showed the same latency and throughput for a connection based on fwmark or based on VIP.
What happens now when you telnet from the client to the VIP? (pause to let you think.) The director is only forwarding packets with fwmark=1 to the LVS, so a telnet request to the VIP is accepted by the director and not forwarded to the realservers. If telnetd is running on the director, you'll get a login prompt from the director. In production you'll have to block this too (just like you had to when setting up on a VIP).
So what's the difference, you ask, between setting up an ftp server with persistence on the VIP on one hand (which requires you to block all other packets with iptables rules), and grouping 20,21 with fwmarks on the other (which requires exactly the same blocking of unwanted packets)? Not a lot. At the moment you're at least even.
Lars Marowsky-Brée lmb (at) suse (dot) de 2000-05-11
When using the LVS box as a firewall/router, the fwmark technique is a perfectly adequate solution, which doesn't cost anything.
But look at the next example.
Setup 2 groups of services, group 1 - ftp(20,21), group 2 - ecommerce(80,443).
First mark packets in 2 groups.
    director:# ipchains -F
    director:# ipchains -A input -p tcp -i eth1 -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport ftp -m 1
    director:# ipchains -A input -p tcp -i eth1 -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport ftp-data -m 1
    director:# ipchains -A input -p tcp -i eth1 -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport http -m 2
    director:# ipchains -A input -p tcp -i eth1 -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport https -m 2
    director:# ipchains -L input
    Chain input (policy ACCEPT):
    target     prot opt     source       destination     ports
    -          tcp  ------  anywhere     lvs2.mack.net   any ->   ftp
    -          tcp  ------  anywhere     lvs2.mack.net   any ->   ftp-data
    -          tcp  ------  anywhere     lvs2.mack.net   any ->   www
    -          tcp  ------  anywhere     lvs2.mack.net   any ->   https

    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport ftp -j MARK --set-mark 1
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport ftp-data -j MARK --set-mark 1
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport http -j MARK --set-mark 2
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
                   --dport https -j MARK --set-mark 2
    director:/etc/lvs# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source       destination
    MARK       tcp  --  anywhere     lvs2.mack.net   tcp dpt:ftp MARK set 0x1
    MARK       tcp  --  anywhere     lvs2.mack.net   tcp dpt:ftp-data MARK set 0x1
    MARK       tcp  --  anywhere     lvs2.mack.net   tcp dpt:www MARK set 0x2
    MARK       tcp  --  anywhere     lvs2.mack.net   tcp dpt:https MARK set 0x2

    Chain OUTPUT (policy ACCEPT)
    target     prot opt source       destination
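The marking rules follow a regular pattern (one mark per group of ports), so they can be generated. The sketch below is a dry run that prints the iptables and ipvsadm commands for one group per call; the function name `group_rules` and the `REALSERVERS` variable are invented for this example, while the addresses, interface, scheduler and timeout are copied from the listings in this section.

```shell
#!/bin/sh
# Dry run: print the commands for one fwmark group per call.
VIP=192.168.2.110
REALSERVERS="RS1.mack.net RS2.mack.net"

# $1 = fwmark, remaining args = ports belonging to the group
group_rules() {
    mark=$1
    shift
    # every port in the group receives the same mark
    for port in "$@"; do
        printf 'iptables -t mangle -A PREROUTING -i eth1 -p tcp -d %s/32 --dport %s -j MARK --set-mark %s\n' \
            "$VIP" "$port" "$mark"
    done
    # one persistent virtual service per mark
    printf 'ipvsadm -A -f %s -s rr -p 600\n' "$mark"
    for rs in $REALSERVERS; do
        printf 'ipvsadm -a -f %s -r %s:0 -g -w 1\n' "$mark" "$rs"
    done
}

group_rules 1 ftp ftp-data
group_rules 2 http https
```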
Note: The ipvs code in Apr 2001 needed a patch to get the expected behaviour. This section describes the function of LVS before and after this patch. As a result of these tests, the patch will be applied to future releases. ipvs-1.0.7-2.2.19 is already patched (Apr 2001). The 2.4.3 series are not patched yet. To see if the code has been patched look in ipvs/Changelog for something like this
Julian changed persistent connection template for fwmark-based service from <CIP,VIP,RIP> to <CIP,FWMARK,RIP>, so that different fwmark-based services that share the same VIP can work correctly.
If your ipvs code is pre-patched, then you can skip down to the part where the behaviour after applying the patch is described. If your code isn't patched, you should just go get the patch and skip to the part where the expected behaviour is described.
Here's what happened with the original code.
    director:# ipvsadm -A -f 1 -s rr -p 600
    director:# ipvsadm -a -f 1 -r RS1.mack.net:0 -g -w 1
    director:# ipvsadm -a -f 1 -r RS2.mack.net:0 -g -w 1
    director:# ipvsadm -A -f 2 -s rr -p 600
    director:# ipvsadm -a -f 2 -r RS1.mack.net:0 -g -w 1
    director:# ipvsadm -a -f 2 -r RS2.mack.net:0 -g -w 1

    IP Virtual Server version 0.2.7 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600
      -> RS2.mack.net:0              Route   1      0          0
      -> RS1.mack.net:0              Route   1      0          0
    FWM  2 rr persistent 600
      -> RS2.mack.net:0              Route   1      0          0
      -> RS1.mack.net:0              Route   1      0          0
If you ftp and http to the VIP, you'd expect the ftp connections to go to fwmark 1 (presumably to the first realserver RS2) and the http connections to go to fwmark 2 (again presumably to RS2).
With the director running 1.0.6-2.2.19 (ipvs/kernel version), all connections (ftp, http) go to group 1. With the director 0.2.7-2.4.2, all connections go to group 2. Here's the output from ipvsadm for the 2.2.19 example immediately after downloading a webpage. You would expect the http InActConn to be associated with FWM2.
    IP Virtual Server version 1.0.6 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 30
      -> RS2.mack.net:0              Route   1      0          2
      -> RS1.mack.net:0              Route   1      0          0
    FWM  2 rr persistent 30
      -> RS2.mack.net:0              Route   1      0          0
      -> RS1.mack.net:0              Route   1      0          0
    director:/etc/lvs#
It appears (Apr 2001) that the ipvs code doesn't really follow the persistent fwmarks spec. When there is a collision between VIP space and fwmark space (eg in these examples, where all packets are going to the same VIP), then the VIP takes precedence and the two fwmark groups are not differentiated. The collision arises because there is only one set of templates for the connection tables.
Note: May 2001: the ipvs code now has the persistent-fwmark behaviour.
( The code to produce the expected behaviour requires a separate set of templates for fwmarks and VIP. The patch to do this is on Julian's patch page and has names like persistent-fwmark-0.2.8-2.4-1.diff, persistent-fwmark-1.0.5-2.2.18-1.diff. (Note: the 0.2.8 patch had DOS carriage control and wouldn't patch till I removed the ^M characters). (Note: as of ipvs-0.9.0, this patch has been applied to the source tree.)
After patching the ip_vs code to produce the new ip_vs.o module (rmmod the old one first), you get the expected fwmark behaviour. )
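The collision, and why the patch removes it, can be shown with a toy model of the template lookup. This is an illustration only and not the kernel data structure: the function `template_key`, the scheme names "old" and "new", and the client address 10.0.0.5 are invented for the example, and the RIP is left out of the key because it is the result of the lookup.

```shell
#!/bin/sh
# Toy model of the persistence template key (not the kernel structure).
# $1 = scheme: "old" keys on <CIP,VIP>, "new" keys on <CIP,FWMARK>
# $2 = client IP, $3 = VIP, $4 = fwmark of the matching virtual service
template_key() {
    case "$1" in
        old) echo "$2,$3" ;;
        new) echo "$2,fwmark=$4" ;;
        *)   echo "unknown scheme: $1" >&2; return 1 ;;
    esac
}

CIP=10.0.0.5        # hypothetical client
VIP=192.168.2.110

# Unpatched: ftp (mark 1) and http (mark 2) produce the same key,
# so both connections land in the same group.
template_key old "$CIP" "$VIP" 1
template_key old "$CIP" "$VIP" 2

# Patched: the keys differ, so each fwmark group keeps its own template.
template_key new "$CIP" "$VIP" 1
template_key new "$CIP" "$VIP" 2
```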
Here's the output of ipvsadm after ftp'ing and http'ing from a client. Note that the ftp connection is to fwmark=1. The InActConn is the expiring connection from the http client to fwmark=2.
    director:/etc/lvs# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 30
      -> RS2.mack.net:0              Route   1      1          0
      -> RS1.mack.net:0              Route   1      0          0
    FWM  2 rr persistent 30
      -> RS2.mack.net:0              Route   1      0          1
      -> RS1.mack.net:0              Route   1      0          0
Here's an example of using persistence granularity (from Ratz 3 Jan 2001). The -M 255.255.255.255 sets up /32 granularity. Here port 80 and port 443 are being linked by fwmarks.
    ipchains -A input -j ACCEPT -p tcp -d 192.168.1.100/32 80 -m 1 -l
    ipchains -A input -j ACCEPT -p tcp -d 192.168.1.100/32 443 -m 1 -l
    director:/etc/lvs# ipvsadm -A -f 1 -s wlc -p 333 -M 255.255.255.255
    director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.1.1 -g -w 1
    director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.1.2 -g -w 1
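To see what the -M netmask does to client addresses, the small sketch below applies a netmask to an IPv4 address; clients that give the same masked value are treated as one client for persistence. The function name `masked_client` and the 10.1.2.x addresses are made up for the example.

```shell
#!/bin/sh
# Apply a persistence netmask to a client address.
# $1 = client IP, $2 = netmask, both dotted-quad IPv4
masked_client() {
    IFS=. read -r a b c d <<EOF
$1
EOF
    IFS=. read -r m1 m2 m3 m4 <<EOF
$2
EOF
    # bitwise AND, octet by octet
    echo "$((a & m1)).$((b & m2)).$((c & m3)).$((d & m4))"
}

masked_client 10.1.2.3  255.255.255.255   # prints 10.1.2.3
masked_client 10.1.2.3  255.255.255.0     # prints 10.1.2.0
masked_client 10.1.2.77 255.255.255.0     # prints 10.1.2.0
```

With /32 granularity each client address is kept distinct; with 255.255.255.0 the two clients above would share one persistence entry and so go to the same realserver.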
Ted Pavlic tpavlic (at) netwalk (dot) com 2000-10-08
Just another persistence option that you may or may not have thought of... LVS does support port-group sticky persistence. Before FWMARK support was added to LVS, the only types of persistence one could do were:
One port persistence (all queries to 80 return to the same realserver per CIP)
ALL port persistence (all queries to all ports return to the same RIP per CIP)
But now that FWMARK support exists in LVS, it is easy to create group-based sticky persistence. That is... It adds the option where:
Only these two ports (443 and 80) return to the same RIP per CIP
Meanwhile, another persistence table keeps track of 20, 21, and 1024:65535
Any other port is not persistent
Just have ipchains keep track of flagging the incoming packets with the correct port group identifier:
    ipchains -A input -d VIPNET/VIPMASK PORT -p PROTOCOL -m FWMARK

And have IPVS stop looking at IPs and start looking at FWMARKs:

    director:/etc/lvs# ipvsadm -A -f FWMARK
    director:/etc/lvs# ipvsadm -a -f FWMARK -r RIP:0
Ted Pavlic tpavlic (at) netwalk (dot) com 2000-10-13
LVS DIRECTLY supports two types of persistence and INDIRECTLY supports another. If you are just asking how to make port 443 persistent so that those who receive a cookie on 443 will come back to the same realserver on 443, simply:
    /sbin/ipvsadm -A -t 192.168.1.110:443 -p
    /sbin/ipvsadm -a -t 192.168.1.110:443 -R 192.168.2.1
    /sbin/ipvsadm -a -t 192.168.1.110:443 -R 192.168.2.2
    /sbin/ipvsadm -a -t 192.168.1.110:443 -R 192.168.2.3
    ...

Will setup persistence just for port 443.
However, say someone gets a cookie on port 80 and gives it back on port 443 -- in that case you want to have persistence between multiple ports. Using port 0 accomplishes this:

    /sbin/ipvsadm -A -t 192.168.1.110:0 -p
    /sbin/ipvsadm -a -t 192.168.1.110:0 -R 192.168.2.1
    /sbin/ipvsadm -a -t 192.168.1.110:0 -R 192.168.2.2
    /sbin/ipvsadm -a -t 192.168.1.110:0 -R 192.168.2.3
    ...
In this setup, anyone who visits ANY service will continue to go back to the same realserver. So requests which come in on 80 or 443 will continue to come in to the same realserver regardless of port.
This is an OK solution, but it basically makes all services persistent which might mess up scheduling. That is, this is a decent solution but sometimes not extremely desirable.
If you want to simply group ports 80 and 443 together, you need to do something more intuitive. Use FWMARK...
    ipchains -A input -d 192.168.1.110/32 80 -p tcp -m 1
    ipchains -A input -d 192.168.1.110/32 443 -p tcp -m 1
    /sbin/ipvsadm -A -f 1 -p
    /sbin/ipvsadm -a -f 1 -R 192.168.2.1
    /sbin/ipvsadm -a -f 1 -R 192.168.2.2
    /sbin/ipvsadm -a -f 1 -R 192.168.2.3
    ...

Now only port 80 and 443 will be grouped together via persistence. Any other ipvsadm rules will be completely separate. This means that you can make 80 and 443 persistent as their own little "port group" and leave ports 25 and 110 (for example) not persistent. OR... You could group all the FTP ports together as well on a completely different persistence group... i.e.

    ipchains -A input -d 192.168.1.110/32 80 -p tcp -m 1
    ipchains -A input -d 192.168.1.110/32 443 -p tcp -m 1
    /sbin/ipvsadm -A -f 1 -p
    /sbin/ipvsadm -a -f 1 -R 192.168.2.1
    /sbin/ipvsadm -a -f 1 -R 192.168.2.2
    /sbin/ipvsadm -a -f 1 -R 192.168.2.3
    # Really adding port 20 isn't needed
    ipchains -A input -d 192.168.1.110/32 20 -p tcp -m 2
    ipchains -A input -d 192.168.1.110/32 21 -p tcp -m 2
    ipchains -A input -d 192.168.1.110/32 1024:65535 -p tcp -m 2
    /sbin/ipvsadm -A -f 2 -p
    /sbin/ipvsadm -a -f 2 -R 192.168.2.1
    /sbin/ipvsadm -a -f 2 -R 192.168.2.2
    /sbin/ipvsadm -a -f 2 -R 192.168.2.3
    ...
and again
Wayne wrote
Is there an easy way to relate servers on both port 80 and port 443 (with LVS-NAT)?
Say I have two farms, each with the same three servers. One farm is load balancing HTTP requests and the other farm is load balancing HTTPS requests. To make sure a user in persistent mode who is connected to an HTTP server always goes to the same server for HTTPS service, we would like to have some way to relate the services between the two farms. Is there an easy way to do it?
ratz ratz (at) tac (dot) ch 2001-01-03
Two possibilities to solve this with LVS
Use port 0 in your setup. (advantage: easy to set up and easy to understand)
Use fwmark and group them together. (advantage: finer port granularity possible)
Example (1):
    director:/etc/lvs# ipvsadm -A -t 192.168.1.100:0 -s wlc -p 333 -M 255.255.255.255
    director:/etc/lvs# ipvsadm -a -t 192.168.1.100:0 -r 192.168.1.1 -g -w 1
    director:/etc/lvs# ipvsadm -a -t 192.168.1.100:0 -r 192.168.1.2 -g -w 1
Example (2):
    ipchains -A input -j ACCEPT -p tcp -d 192.168.1.100/32 80 -m 1 -l
    ipchains -A input -j ACCEPT -p tcp -d 192.168.1.100/32 443 -m 1 -l
    director:/etc/lvs# ipvsadm -A -f 1 -s wlc -p 333 -M 255.255.255.255
    director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.1.1 -g -w 1
    director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.1.2 -g -w 1
You can set up passive ftp with the VIP as the target using persistence. This is not a particularly satisfactory solution, as connect requests to all ports will be forwarded. As well, if another service on the realserver fails (eg http), then all services have to be failed out together.
Here's a solution to passive ftp from Ted Pavlic using fwmark. This allows setting up passive ftp independently of other services. Passive ftp listens on an unknown and unpredictable high port on the realserver. This is handled by forwarding requests to all high ports (it's still ugly, but at least this way, we can fail out ftp independently of other services).
Here's ftp setup in active mode, as a control.
    director:# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpt:ftp MARK set 0x1
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpt:ftp-data MARK set 0x1
    #
    #setup ipvsadm, making all packets with mark=1 persistent
    director:# ipvsadm -A -f 1 -s rr -p 600
    director:# ipvsadm -a -f 1 -r RS1:0 -g -w 1
    director:# ipvsadm -a -f 1 -r RS2:0 -g -w 1
    director:# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600
      -> RS2.mack.net:0              Route   1      0          0
      -> RS1.mack.net:0              Route   1      0          0
Here's netstat -an on the client and the realserver (RS2) immediately after an ftp file transfer (with the client still connected).
    #client:
    client:~# netstat -an | grep 110    #110 is part of the VIP
    tcp        0      0 client:1176        VIP:21             ESTABLISHED
    #realserver
    RS2:/home/ftp/pub# netstat -an | grep 254    #254 is part of the client IP
    tcp        0      0 VIP:20             client:1180        TIME_WAIT
    tcp        0      0 VIP:20             client:1178        TIME_WAIT
    tcp        0      0 VIP:20             client:1177        TIME_WAIT
    tcp        0      0 VIP:21             client:1176        ESTABLISHED
Only ports 20 and 21 are involved here.
Here's the command line at the client during the active ftp transfer (all expected output).
    ftp> get tulip.c
    local: tulip.c remote: tulip.c
    200 PORT command successful.
    150 Opening BINARY mode data connection for tulip.c (104241 bytes).
    226 Transfer complete.
    104241 bytes received in 0.0232 secs (4.4e+03 Kbytes/sec)
The iptables rules on the director do not allow passive ftp connections. To test this, put the ftp client into passive mode.
    ftp> pass
    Passive mode on.
    ftp> dir
    227 Entering Passive Mode (192,168,2,110,4,72)
    ftp: connect: Connection refused
    ftp>
The connection is refused. To check that the system is still functioning, put the client back into active mode.
    ftp> pass
    Passive mode off.
    ftp> dir
    200 PORT command successful.
    150 Opening ASCII mode data connection for /bin/ls.
    total 155178
    .
    .
    -rw-r--r--   1 root     root       104241 Nov 10  1999 tulip.c
    226 Transfer complete.
    ftp>
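The 227 reply in the failed passive attempt tells the client where to connect for data: the first four numbers are the IP (here the VIP, 192.168.2.110) and the last two are the high and low bytes of the port. Here's a small sketch that decodes the port. The function name pasv_port is made up for this illustration.

```shell
#!/bin/sh
# Decode the data port from the six numbers in an FTP "227 Entering Passive
# Mode (h1,h2,h3,h4,p1,p2)" reply: port = p1 * 256 + p2.
# pasv_port is a hypothetical helper name used only for this illustration.
pasv_port() {
    old_ifs=$IFS
    IFS=,
    set -- $1          # split the comma separated list into $1..$6
    IFS=$old_ifs
    echo $(( $5 * 256 + $6 ))
}

pasv_port 192,168,2,110,4,72    # prints 1096
```

Port 1096 is a high port. The rules used for the active ftp test mark only the ftp and ftp-data ports, so a connect request to VIP:1096 is not part of the fwmark service and is not forwarded to the realserver.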
Here's the setup for passive ftp (2.4.x director) (you can leave ipvsadm untouched).
    director:# iptables -F -t mangle
    #mark ftp packets
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
        --dport ftp -j MARK --set-mark 1
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
        --dport 1024: -j MARK --set-mark 1
    director:# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpt:ftp MARK set 0x1
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpts:1024:65535 MARK set 0x1
Here's the command line from the ftp client still in active mode
    ftp> dir
    200 PORT command successful.
The session hangs. The server shows an established connection to port 21, and the client session has to be killed.
Here's the passive session.
    client:~# ftp VIP
    Connected to VIP.
    220 RS2.mack.net FTP server (Version wu-2.4.2-academ[BETA-15](1) Wed May 20 13:45:04 CDT 1998) ready.
    Name (VIP:root): ftp
    331 Guest login ok, send your complete e-mail address as password.
    Password:
    230 Guest login ok, access restrictions apply.
    Remote system type is UNIX.
    Using binary mode to transfer files.
    ftp> pass
    Passive mode on.
    ftp> cd pub
    250 CWD command successful.
    ftp> dir *.c
    227 Entering Passive Mode (192,168,2,110,4,75)
    150 Opening ASCII mode data connection for /bin/ls.
    -rw-r--r--   1 root     root       104241 Nov 10  1999 tulip.c
    226 Transfer complete.
    ftp> mget *.c
    mget tulip.c? y
    227 Entering Passive Mode (192,168,2,110,4,78)
    150 Opening BINARY mode data connection for tulip.c (104241 bytes).
    226 Transfer complete.
    104241 bytes received in 0.0233 secs (4.4e+03 Kbytes/sec)
    ftp>
Here's the connections at the realserver immediately after the file transfer. There is the regular connection at the ftp port (21) and a connection timing out to a high port on the realserver.
    RS2:/home/ftp/pub# netstat -an | grep 254    #254 is part of the client IP
    tcp        0      0 VIP:1104           client:1191        TIME_WAIT
    tcp        0      0 VIP:21             client:1184        ESTABLISHED
Here's the output from ipvsadm after connecting to the URL ftp://vip/ using a web-browser
    director:# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600
      -> RS2.mack.net:0              Route   1      1          5
      -> RS1.mack.net:0              Route   1      0          0
    #fwmark rules
    director:# iptables -F -t mangle
    #active and passive ftp in group 1
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
        --dport ftp -j MARK --set-mark 1
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
        --dport ftp-data -j MARK --set-mark 1
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
        --dport 1024: -j MARK --set-mark 1
    #http as group 2
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
        --dport http -j MARK --set-mark 2
    director:/etc/lvs# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpt:ftp MARK set 0x1
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpt:ftp-data MARK set 0x1
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpts:1024:65535 MARK set 0x1
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpt:www MARK set 0x2
    #
    #setup LVS for 2 groups
    director:# ipvsadm -C
    #ftp (active and passive) are persistent as group 1
    director:# ipvsadm -A -f 1 -s rr -p 600
    director:# ipvsadm -a -f 1 -r RS1:0 -g -w 1
    director:# ipvsadm -a -f 1 -r RS2:0 -g -w 1
    #http as group 2 (not persistent)
    director:# ipvsadm -A -f 2 -s rr
    director:# ipvsadm -a -f 2 -r RS1:http -g -w 1
    director:# ipvsadm -a -f 2 -r RS2:http -g -w 1
    director:# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600
      -> RS2.mack.net:0              Route   1      0          0
      -> RS1.mack.net:0              Route   1      0          0
    FWM  2 rr
      -> RS1.mack.net:80             Route   1      0          0
      -> RS2.mack.net:80             Route   1      0          0
The client connected (in order) to ftp://VIP/ (passive ftp), to http://VIP/ and then by active (command line) ftp to the VIP. Here's the ipvsadm output.
    director:# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600
      -> RS2.mack.net:0              Route   1      2          3
      -> RS1.mack.net:0              Route   1      0          0
    FWM  2 rr
      -> RS1.mack.net:80             Route   1      2          2
      -> RS2.mack.net:80             Route   1      4          0
Here's the connections showing on the realserver. The most recent ones are at the top of the list. The connection list shows (from the bottom, i.e. in the order of connection), passive ftp, http, and active ftp.
    RS2:/home/ftp/pub# netstat -an | grep 254    #254 is part of the CIP
    tcp        0      0 VIP:21             client:1207        ESTABLISHED
    tcp        0      0 VIP:80             client:1206        FIN_WAIT2
    tcp        0      0 VIP:80             client:1204        FIN_WAIT2
    tcp        0      0 VIP:1108           client:1202        TIME_WAIT
    tcp        0      0 VIP:21             client:1201        ESTABLISHED
The whole point of this setup is to make ftp and http, which belonged to one persistence group when set up on a VIP, into two groups. Now you can bring the httpd and the ftpd up and down independently (if you want to fail them out, to change the configuration or software).
(based on a posting by Horms on 14 Jul 2000)
Here we set up an LVS-NAT LVS on a 2.4.x director. (Note: With 2.4 LVS, the masquerading is set up by the ipvs code, i.e. you don't have to masquerade the packets back from the realservers.) These examples assume that the VIP is on eth1 and your network is already set up (i.e. the realservers are using the director as the default gw etc).
Mark packets for the VIP and setup the LVS for telnet.
Warning | |
---|---|
this first example is not going to get you anything you want. |
    #
    #mark packets
    director:# iptables -F -t mangle
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 -j MARK --set-mark 1
    director:# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    MARK       tcp  --  anywhere             lvs2.mack.net      MARK set 0x1
    #
    #Setup ipvsadm
    director:# ipvsadm -C
    director:# ipvsadm -A -f 1 -s rr
    director:# ipvsadm -a -f 1 -r RS1.mack.net:telnet -m -w 1
    director:# ipvsadm -a -f 1 -r RS2.mack.net:telnet -m -w 1
    director:# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr
      -> RS2.mack.net:23             Masq    1      0          0
      -> RS1.mack.net:23             Masq    1      0          0
You can connect with telnet to the VIP and you'll be forwarded to both realservers in the expected way.
All packets from the client will be marked and processed by the ipvsadm rules. What happens if you attempt to connect to VIP:80 (pause to think)?
Here's the answer.
    client:~# telnet VIP 80
    Trying 192.168.2.110...
    Connected to lvs2.mack.net.
    Escape character is '^]'.

    Welcome to Linux 2.2.19.

    RS2 login: root
    Linux 2.2.19.
    Last login: Fri Apr 13 11:43:52 on ttyp1 from client2.mack.net.
    No mail.
If you connect to VIP:80 using a browser for a client, it sits there showing the watch symbol for quite a while.
What happened? The explanation is that you told the director to mark all packets from the client (i.e. to any port on the VIP), rewrite them to have dest_addr=RIP:telnet and forward the rewritten packets to the realserver. So when you telnet'ed to VIP:80, the packets were forwarded to RIP:23.
Just to make sure that I'd interpreted this correctly, here's the first packets seen by tcpdump running on the client and the realserver during the connect attempts. (These are from different sessions, so the ports shown on the client are different.)
client: here the client is connecting to VIP:80 (lvs2.www)
    12:09:44.449566 client2.1118 > lvs2.www: S 2887976275:2887976275(0) win 5840 <mss 1460,sackOK,timestamp 118456418[|tcp]> (DF) [tos 0x10]
    12:09:44.450453 lvs2.www > client2.1118: S 1441372470:1441372470(0) ack 2887976276 win 32120 <mss 1460,sackOK,timestamp 117741798[|tcp]> (DF)
    12:09:44.450579 client2.1118 > lvs2.www: . ack 1 win 5840 <nop,nop,timestamp 118456418 117741798> (DF) [tos 0x10]
realserver (RS2): here the realserver is receiving packets to the RIP:23 (RS2.telnet)
    11:44:28.319675 client2.1116 > RS2.telnet: S 2722509719:2722509719(0) win 5840 <mss 1460,sackOK,timestamp 118440378[|tcp]> (DF) [tos 0x10]
    11:44:28.319974 RS2.telnet > client2.1116: S 1283414485:1283414485(0) ack 2722509720 win 32120 <mss 1460,sackOK,timestamp 117725760[|tcp]> (DF)
    11:44:28.320681 client2.1116 > RS2.telnet: . ack 1 win 5840 <nop,nop,timestamp 118440378 117725760> (DF) [tos 0x10]
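The behaviour seen in the tcpdump output can be put into a toy model. This is only a sketch of the reasoning, not of how the ipvs code works internally. The function names mark_of and nat_dest are made up for this illustration.

```shell
#!/bin/sh
# Toy model of the first (catch-all) rule set: every TCP packet to the VIP gets
# mark 1, and fwmark service 1 rewrites the destination to RIP:telnet (23).
# mark_of and nat_dest are hypothetical names used only in this sketch.
mark_of() {          # mark_of DPORT: the catch-all rule ignores the port
    echo 1
}

nat_dest() {         # nat_dest MARK: port of the realserver entry for the mark
    case $1 in
        1) echo "RIP:23" ;;
        *) echo "not forwarded" ;;
    esac
}

for dport in 23 80; do
    echo "VIP:$dport -> $(nat_dest "$(mark_of "$dport")")"
done
```

Both VIP:23 and VIP:80 come out as RIP:23, which is what the telnet session to VIP:80 showed.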
If you want only telnet requests to be forwarded to the realservers, you should mark only packets for VIP:telnet. If you want both telnet and http forwarded then you should give them each their own mark. Here's how to set up LVS-NAT with fwmark for both telnet and http.
    director:# iptables -F -t mangle
    #telnet packets to the VIP get fwmark=1
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
        --dport telnet -j MARK --set-mark 1
    #http packets to the VIP get fwmark=2
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
        --dport http -j MARK --set-mark 2
    director:# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpt:telnet MARK set 0x1
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpt:www MARK set 0x2
    #
    #setup ipvsadm
    director:# ipvsadm -C
    #forward packets with mark=1 to the telnet port
    director:# ipvsadm -A -f 1 -s rr
    director:# ipvsadm -a -f 1 -r RS1.mack.net:telnet -m -w 1
    director:# ipvsadm -a -f 1 -r RS2.mack.net:telnet -m -w 1
    #forward packets with mark=2 to the httpd port
    director:# ipvsadm -A -f 2 -s rr
    director:# ipvsadm -a -f 2 -r RS1.mack.net:http -m -w 1
    director:# ipvsadm -a -f 2 -r RS2.mack.net:http -m -w 1
    director:# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr
      -> RS2.mack.net:23             Masq    1      0          0
      -> RS1.mack.net:23             Masq    1      0          0
    FWM  2 rr
      -> RS2.mack.net:80             Masq    1      0          0
      -> RS1.mack.net:80             Masq    1      0          0
Here's the (expected) output of ipvsadm showing the client with 2 telnet sessions and having just downloaded a webpage from the LVS.
    director:/etc/lvs# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr
      -> RS2.mack.net:23             Masq    1      1          0
      -> RS1.mack.net:23             Masq    1      1          0
    FWM  2 rr
      -> RS2.mack.net:80             Masq    1      0          1
      -> RS1.mack.net:80             Masq    1      0          0
Since it's possible to write iptables rules that include many different types of packets, it's possible to write VIP and fwmark rules that would conflict by accepting the same packet. Here's a setup that would accept telnet by both VIP and fwmarks.
    director:# iptables -t mangle -A PREROUTING -i eth1 -p tcp -s 0.0.0.0/0 -d 192.168.2.110/32 \
        --dport ftp -j MARK --set-mark 1
    director:# ipvsadm -A -t lvs2.mack.net:telnet -s rr
    director:# ipvsadm -a -t lvs2.mack.net:telnet -r RS1.mack.net:telnet -g -w 1
    director:# ipvsadm -a -t lvs2.mack.net:telnet -r RS2.mack.net:telnet -g -w 1
    director:# ipvsadm -A -f 1 -s rr
    director:# ipvsadm -a -f 1 -r RS1.mack.net:telnet -g -w 1
    director:# ipvsadm -a -f 1 -r RS2.mack.net:telnet -g -w 1
    #
    director:# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    MARK       tcp  --  anywhere             lvs2.mack.net      tcp dpt:ftp MARK set 0x1
    #
    director:# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    TCP  lvs2.mack.net:telnet rr
      -> RS2.mack.net:telnet         Route   1      0          0
      -> RS1.mack.net:telnet         Route   1      0          0
    FWM  1 rr
      -> RS2.mack.net:telnet         Route   1      0          0
      -> RS1.mack.net:telnet         Route   1      0          0
Here's the ipvsadm output after 4 telnet connections from a client
    director:/etc/lvs# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    TCP  lvs2.mack.net:telnet rr
      -> RS2.mack.net:telnet         Route   1      2          0
      -> RS1.mack.net:telnet         Route   1      2          0
    FWM  1 rr
      -> RS2.mack.net:telnet         Route   1      0          0
      -> RS1.mack.net:telnet         Route   1      0          0
All connections go to the first (here VIP) entries. The same ipvsadm table and connection pattern results if you feed the VIP and fwmark rules into ipvsadm in the reverse order. This behaviour is not part of the spec (yet). You might want to check the behaviour if you are doing this sort of setup.
Persistence granularity was added to LVS by Lars lmb (at) suse (dot) de 1999-10-13
This patch adds netmasks to persistent ports, so you can adjust the granularity of the templates. It should help solve the problems created with non-persistent cache clusters on the client side.
The problem being addressed is that some clients (eg AOL customers) connect to the internet via large proxy farms. The IP they present to the server will not necessarily be the same for different sessions (tcp connections), even though they remain connected to their proxy machine. Persistence granularity makes all clients from a network equivalent as far as persistence is concerned. Thus a client could appear as CIP=x.x.x.13 for their http connections, but CIP=x.x.x.14 for their https connections. With persistence granularity set to /24, all CIPs from the same class C network will be sent to the same realserver. With the default behaviour (i.e. persistence granularity is /32), all connections from the same CIP are sent to the one realserver, but other connections from the same network will be scheduled to other realservers.
Persistence granularity is applied to the CIP and works the same whether you are using fwmark or the VIP to setup the LVS.
You set the netmask (granularity) for persistence granularity with ipvsadm. If the LVS was setup with the following command, the persistence granularity is 255.255.255.0.
director:/etc/lvs# ipvsadm -A -t 192.168.1.100:0 -s wlc -p 333 -M 255.255.255.0 |
Let's say a client from a class C network (e.g. with IP=100.100.100.2) connects to the LVS. If any other client connects from 100.100.100.0/24 they will also connect to the same realserver as long as the original client's entry in the persistence table has not expired (i.e. the first client is still connected, or disconnected < 333 secs ago).
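The grouping is just a bitwise AND of the client IP with the netmask given to -M. Here's a small sketch of the arithmetic in shell. The function name cipnet is made up for this illustration (LVS does the masking inside the kernel).

```shell
#!/bin/sh
# Apply a persistence netmask to a client IP: CIPNET = CIP & netmask.
# cipnet is a hypothetical helper name; LVS does this inside the kernel.
cipnet() {
    old_ifs=$IFS
    IFS=.
    set -- $1 $2       # split both dotted quads into $1..$8
    IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

cipnet 100.100.100.2  255.255.255.0      # prints 100.100.100.0
cipnet 100.100.100.77 255.255.255.0      # prints 100.100.100.0
cipnet 100.100.100.2  255.255.255.255    # prints 100.100.100.2
```

With -M 255.255.255.0 the first two clients reduce to the same value and so share one persistence entry. With the default /32 mask each client keeps its own.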
Here's an example LVS-DR LVS set to mark packets for an IP on the outside of the director (this IP serves as the VIP in the usual LVS setup, but there's no such thing as a VIP with fwmarks) with --dport telnet. Persistence granularity is set to the default (-M 255.255.255.255).
    director:# ipvsadm -C
    director:# ipvsadm -A -f 1 -s rr -p 600
    director:# ipvsadm -a -f 1 -r RS1:0 -g -w 1
    director:# ipvsadm -a -f 1 -r RS2:0 -g -w 1
Two clients (192.168.2.254, 192.168.2.253) connect to the LVS. Each client connects to a different realserver, but repeated connections from one client go to the same realserver (i.e. client A always goes to realserver A; client B always goes to realserver B, at least till the persistence timeout clears). Here both clients have connected twice.
    director:# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600
      -> RS2.mack.net:0              Route   1      2          0
      -> RS1.mack.net:0              Route   1      2          0
This is the connection pattern expected if the connections were based on the CIP/32 and fwmark (ie all clients are scheduled independently).
Here's the same setup with persistence granularity set to /24.
    director:# ipvsadm -C
    director:# ipvsadm -A -f 1 -s rr -p 600 -M 255.255.255.0
    director:# ipvsadm -a -f 1 -r RS1:0 -g -w 1
    director:# ipvsadm -a -f 1 -r RS2:0 -g -w 1
    director:# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600 mask 255.255.255.0
      -> RS2.mack.net:0              Route   1      0          0
      -> RS1.mack.net:0              Route   1      0          0
Here's what happens when the 2 clients, both of whom belong to the same CIP/24 persistence group, connect twice: all connections go to the same realserver.
    director:/etc/lvs# ipvsadm
    IP Virtual Server version 0.2.8 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600 mask 255.255.255.0
      -> RS2.mack.net:0              Route   1      4          0
      -> RS1.mack.net:0              Route   1      0          0
Joe
I expect if you were using persistence with fwmark, then any connection requests arriving with the same fwmark will be treated as belonging to that persistence group. Presumably any combination of client IPs and/or networks could have been used to make the rules which marks the packets.
Julian
Yes, it is for the same group but in one fwmark group there are many templates created. These templates are different for the client groups. The template looks like this:
CIPNET:0 -> SERVICE(FWMARK/VIP):0 -> RIP:0 |
All ports 0 for the fwmark-based services
So, for client 10.1.2.3/24 (24=persistent granularity) the template looks like this:
10.1.2.0:0 -> VIP:0 -> RIP:0 |
LVS patched with the persistent_fwmark patch:
10.1.2.0:0 -> FWMARK:0 -> RIP:0 |
So, the templates are created with CIP/GRAN in mind and the lookup uses CIPNET too. We use
CIPNET = CIP & CNETMASK |
before creation and lookup.
so if I did
    iptables -t mangle -A PREROUTING -s 10.1.2.3 -j MARK --set-mark 1
    director:/etc/lvs# ipvsadm -A -f 1 -s rr -p 600 -M 255.255.255.0
only packets from 10.1.2.3 will have a fwmark on them, but the director would forward all packets from 10.1.2.0/24, even those without fwmarks?
The patched LVS will accept only the marked packets for this fwmark service, from the same /24 client subnet. If only one client IP sends packets that are marked then the real service will receive packets only from 10.1.2.3. The current LVS versions don't consider the service and all packets CIPNET -> VIP will be forwarded using the first created template for CIPNET:0->VIP:0, i.e. these packets will randomly hit one of the many services that accept packets for the same VIP (just like in your setup) and then possibly the wrong realserver.
"The current LVS versions don't consider the service and all packets CIPNET -> VIP"
But there is no VIP here; I'm using fwmark only. What does the -M 255.255.255.0 do in this case?
The current LVS versions (i.e. without the persistent_fwmark patch) assume the VIP is the iphdr->daddr, i.e. the destination address in the datagram, and this address is used to look up or create the template.
How about your persistent-patch, which I've been working with?
The patch ignores this daddr when creating or looking for templates. Instead, the service fwmark value is used when the service is fwmark-based: CIPNET:0 -> FWMARK:0 -> RIP:0
The normal services use daddr as VIP when looking for or creating templates: CIPNET:0 -> daddr:0 -> RIP:0
The persistence is associated with the client address (CIP). The sequence is this:
- packet comes from CIP to VIP1
- fw marking, optional
- lookup for existing connection CIP:CPORT->VIP1:VPORT; if found => forward; if not found:
- lookup service => fwmark 1, persistent
- try to select real service in context of the virtual service:
  - apply the persistence granularity to the client address
    CIPNET = CIP & svc->netmask
  - now look up the template
  - if there is a template, bind the new connection to the template's destination
  - if there is no existing template, get one destination using the scheduler and bind it to the newly created template and the new connection. The created template is CIPNET:0 -> FWMARK:0 -> RIP:0 for a fwmark-based service with the persistent_fwmark patch (CIPNET:0 -> daddr:0 -> RIP:0 otherwise).
- forward the packet
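The template part of this sequence can be tried out with a toy model in shell. This is only a sketch under several assumptions: it models the patched (fwmark keyed) templates, it has two realservers called RS1 and RS2 scheduled round-robin, it leaves out the lookup for an existing connection, and it never expires templates. All the names (schedule, cipnet, templates, next_rs, demo) are made up for this illustration; the real logic is in the kernel ipvs code.

```shell
#!/bin/sh
# Toy model of persistence-template lookup for a fwmark service.
# All names here are hypothetical; nothing below talks to ipvs.
templates=""          # one "CIPNET FWMARK RIP" entry per line
next_rs=1             # round-robin position (RS1, RS2)

cipnet() {            # CIPNET = CIP & netmask
    old_ifs=$IFS; IFS=.; set -- $1 $2; IFS=$old_ifs
    echo "$(( $1 & $5 )).$(( $2 & $6 )).$(( $3 & $7 )).$(( $4 & $8 ))"
}

schedule() {          # schedule CIP FWMARK NETMASK; result is left in $rip
    net=$(cipnet "$1" "$3")
    # look for an existing template CIPNET:0 -> FWMARK:0 -> RIP:0
    rip=$(printf '%s\n' "$templates" |
          awk -v n="$net" -v m="$2" '$1 == n && $2 == m { print $3; exit }')
    if [ -z "$rip" ]; then
        # no template: ask the scheduler, then create the template
        rip="RS$next_rs"
        next_rs=$(( next_rs % 2 + 1 ))
        templates="$templates
$net $2 $rip"
    fi
}

demo() {              # demo NETMASK: two clients connect twice each
    templates=""; next_rs=1
    for cip in 192.168.2.254 192.168.2.253 192.168.2.254 192.168.2.253; do
        schedule "$cip" 1 "$1"
        echo "$cip -> $rip"
    done
}

demo 255.255.255.255  # each client gets (and keeps) its own realserver
demo 255.255.255.0    # both clients share one template and one realserver
```

The two demo runs follow the pattern of the two ipvsadm listings above: with /32 the connections are split between the realservers, with /24 they all land on one.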
Persistence granularity was designed for people coming in from large proxy servers (eg AOL). With fwmarks, this can be handled by iptables rules.
Yes, the fact that we group the clients using this netmask is not related to the virtual service type: normal or fwmark-based.
Yes, each different IP is treated as different client. When a netmask <32 is used, the group of addresses is treated as one client when applying the persistence rules. This is not related to the packet marking and virtual service type.
If a LVS-DR director is accepting packets by fwmarks, then it does not have a VIP. The director can then be the default gw for the realservers (see LVS-DR director is default gw for realservers).
If a fwmark rule accepts packets for a /24 network, then 254 IPs are configured in one instruction. The next sections are examples.
Horms horms (at) vergenet (dot) net 2000-12-06
Assume that packets from our local network (192.168.0.0/23) are outgoing traffic.
Mark all outgoing packets with fwmark 1
    ipchains -A input -s 192.168.0.0/23 -m 1
    # Now, set up a virtual service to act on the marked packets
    director:/etc/lvs# ipvsadm -A -f 1
    director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.1.7
    director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.1.8
    director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.1.9
Where 192.168.1.7, 192.168.1.8 and 192.168.1.9 are your firewall boxen.
Matthew S. Crocker wrote:
I would like to put a CIDR block of addresses (/25) through my LVS server. Is there a way I can set one entry for a VIP range so that the load balancing is handled over the entire range?
Horms horms (at) vergenet (dot) net 2001-01-13
Set up fwmark rules on the input chain to match incoming packets for the CIDR and mark them with a fwmark.
e.g.
ipchains -A input -d 192.168.192.0/24 -m 1 |
Use the fwmark (1 in this case) as the virtual service.
    director:/etc/lvs# ipvsadm -A -f 1
    director:/etc/lvs# ipvsadm -a -f 1 -r 10.0.0.1
    director:/etc/lvs# ipvsadm -a -f 1 -r 10.0.0.2
Miri Groentman, 11 Jul 2001
Is it possible to configure a range of ports rather than a single port?
Joe
if you mean ports for services, yes, see fwmark in the HOWTO. You can also forward a range of IPs.
client A (from 192.x.x.x) should go to realserver 1..3, and client B (from 10.x.x.x) should go to realserver 4..6.
(Julian, 10-05-2000)
Write fwmark rules based on the source IP of the packets. Then create two virtual services, one for each fwmark.
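Here's a sketch of what that could look like on a 2.4 director. Everything specific in it is an assumption made for the illustration: the source networks 192.0.0.0/8 and 10.0.0.0/8 stand for "192.x.x.x" and "10.x.x.x", the VIP is borrowed from the earlier examples, RS1..RS6 are placeholder realserver names, LVS-DR (-g) and the rr scheduler are arbitrary choices, and source_group is a made up helper name. The script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch: print the rules for two fwmark services selected by the
# client's source network. RS1..RS6 and the VIP are placeholders.
VIP=192.168.2.110    # assumed address, borrowed from the examples above

source_group() {     # source_group MARK SOURCE_NET REALSERVER...
    mark=$1; src=$2; shift 2
    echo "iptables -t mangle -A PREROUTING -p tcp -s $src -d $VIP/32 -j MARK --set-mark $mark"
    echo "ipvsadm -A -f $mark -s rr"
    for rs in "$@"; do
        echo "ipvsadm -a -f $mark -r $rs -g -w 1"
    done
}

# client A (192.x.x.x) goes to realservers 1..3
source_group 1 192.0.0.0/8 RS1 RS2 RS3
# client B (10.x.x.x) goes to realservers 4..6
source_group 2 10.0.0.0/8  RS4 RS5 RS6
```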
Ian Courtney wrote:
Basically here at our ISP, we tend to have 2-3 Class C's worth of hosting per server. We would like to move to LVS, but I'm not exactly sure how I should be setting it up.
Chris chris (at) isg (dot) de 2001-01-15
You can use the fwmark option for the loadbalancing
    #mark the incoming packets with ipchains
    ipchains -A input -s 0.0.0.0/0 -d 192.168.0.0/24 -m 1
    #then you can setup your LVS like
    director:/etc/lvs# ipvsadm -A -f 1 -s wlc
    director:/etc/lvs# ipvsadm -a -f 1 -r 10.10.10.15 -g
    director:/etc/lvs# ipvsadm -a -f 1 -r 10.10.10.16 -g
the router should point to the director.
Ian Courtney wrote back:
It didn't work until I aliased all 3 class C's to my director. Do I have to do this?
Julian Anastasov ja (at) ssi (dot) bg 2001-01-16
Yes, only the packets destined for local addresses/networks are accepted. The others are dropped or forwarded to another box.
The next project involves redoing our standard linux web space, which so far consists of about 8 webservers, each hosting at least 2 class C's worth of hosting. I somehow don't think Linux will take nicely to having 16 or more class C's aliased to it.
If possible use netmask <24. I assume you execute (replace with the right Class C nets):
    ifconfig lo:1 207.228.79.0 netmask 255.255.254.0
    ifconfig lo:2 207.148.155.0 netmask 255.255.255.0
    ifconfig lo:3 207.148.151.0 netmask 255.255.255.0
on the director and on each realserver and solve the arp problem using:
    echo 1 > /proc/sys/net/ipv4/conf/all/hidden
    echo 1 > /proc/sys/net/ipv4/conf/lo/hidden
in the realservers. If you don't want to advertise these addresses using ARP to the Cisco LAN, you can execute the above two commands in the director too.
Thomas Proell, 16 Aug 2000
How do you use fwmark if you want the director to accept packets for a wide range of addresses for which it doesn't have IPs?
(Horms)
Here's a setup I used...
                                                Internet
                                                   |
                                                 Router
                                              192.168.128.1
      "client"              Linux Director         |
        va2-------------------------va3------------+--------- proxy (va4)
    192.168.16.3       192.168.16.1   192.168.128.2          192.168.128.5
I have used 192.168/16, but these could be real addresses too. I have only put one proxy server in the diagram but I did test it with 2
    Client:
      default gw va3 (192.168.16.1)

    Linux Director:
      eth0: 192.168.128.2 (internet/proxy side)
      eth1: 192.168.16.1 (client side)
      Default gw: Router, 192.168.128.1
      IPV4 forwarding enabled.

      Ipvsadm rules - these can be translated into ldirectord configuration.

      director:/etc/lvs# ipvsadm -A -f 1 -s wlc
      director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.128.3:0 -g -w 1
      ... add additional proxy servers
Interestingly enough if you add a proxy that just forwards traffic then it will end up going direct. This may be useful as a failback server if the proxy servers fail.
    ipchains -A input -s 0.0.0.0/0.0.0.0 -d 127.0.0.1/255.255.255.255 -j ACCEPT
    ipchains -A input -s 0.0.0.0/0.0.0.0 -d 192.168.128.2/255.255.255.255 -j ACCEPT
    ipchains -A input -s 0.0.0.0/0.0.0.0 -d 192.168.16.2/255.255.255.255 -j ACCEPT
    ipchains -A input -s 0.0.0.0/0.0.0.0 -d 0.0.0.0/0.0.0.0 80 -p tcp -j REDIRECT 80 -m 1
The -m 1 means that IPVS will recognise packets matched by this filter as belonging to the virtual service as long as it sees the packets as local. -j REDIRECT 80 makes the packets appear as local. It is of note that the port you redirect to is _ignored_ because of the way IPVS works: packets using fwmark are sent to the port they arrived on. This means that packets will be sent to the proxy servers as port 80 traffic.
    Proxy:
      eth0: 192.168.128.5
      Default gw: 192.168.128.1 (router)
      IPV4 forwarding enabled.

      ipchains -A input -s 0.0.0.0/0.0.0.0 -d 127.0.0.1/255.255.255.255 -j ACCEPT
      ipchains -A input -s 0.0.0.0/0.0.0.0 -d 192.168.128.5/255.255.255.255 -j ACCEPT
      ipchains -A input -s 0.0.0.0/0.0.0.0 -d 0.0.0.0/0.0.0.0 80:80 -p 6 -j REDIRECT +8080
Note, this is where the redirection to port 8080 takes place.
Pongsit (at) yahoo (dot) samart (dot) co (dot) th May 08, 2000
If I would like to use LVS to balance 3 transparent proxies, is this how I do it?
                          Internet
                             |
                             |
    ------------------------------------------- hub 1
         |          |          |
         |eth0      |          |        proxy1, 2 and 3 set as a
       proxy1     proxy2     proxy3     transparent proxy with firewall
         |eth1      |          |        where eth0 connects to the internet
         |          |          |        and eth1 to the internal network
    ___________________________________________
         |      |     |     |     |     hub 2
         |      |     |     |     |
       LVS/DR     client machines
         |      |     |
    ___________________________________________ hub 3 if I have more internal users
Horms horms (at) vergenet (dot) net 2000-05-08
If you want to do transparent proxying then I would suggest a topology more along the lines of:
                          Internet
                             |
                             |
    ------------------------------------------------ hub 1
                             |
                             |
                           LVS/DR
                             |
                             |
    ________________________________________________
       |       |       |       |     |     |     |   hub 2
       |       |       |       |     |     |     |
     proxy1  proxy2  proxy3       client machines
                                   |     |     |
    _________________________________________________ hub 3 if I have more internal users
Use ipchains to mark all outgoing port 80 traffic, other than that from the 3 proxy servers, with firewall mark 1 (ipchains -m 1...).
Set up an IPVS virtual service matching fwmark 1 (ipvsadm -A -f 1...).
The proxy servers will need to be set up to recognise all port 80 traffic forwarded to them as local.
This way all outgoing traffic hits the LVS box. If it is for port 80 and isn't from one of the proxy servers then it gets load balanced and forwarded to one of the proxy servers.
You may want to consider a hot standby LVS/DR host to eliminate a single point of failure on your network.
I haven't tested this but I think it should work.
(Joe: my initial -apparently incorrect- reaction was that routing protocols would handle this better.)
Martin Skøtt martin (at) xenux (dot) dk 19 Jun 2001
I have several ADSL connections to the Internet (same ISP) and I want all the users on my network to be using them. I would like it to work in a way so that all the lines are utilised all the time and without assigning groups of users to specific gateways.
                                         Internet
                                        /
    My users ---- Linux box with LVS - Internet
                                        \
                                         Internet
What I want to do is assign one default gateway, the LVS box.
Joe
To do this with LVS, you could set up a director as a router, configured as if it were in front of 4 squid boxes (you'll need the IPs of the other end of each ADSL link).
There's an example proxy above.
Alexandre Cassen Alexandre (dot) Cassen (at) wanadoo (dot) fr
I tried that kind of setup some time ago. I tested 4 different topologies:
- Using a dynamic routing protocol like BGP. With BGP you can assign costs to your routing paths. To set up a multipath Internet connection with BGP, every ISP connected to your BGP setup must be asked to configure BGP on their side. This is the setup ISPs recommend for corporate Internet use. It is usually expensive because of the reconfiguration needed on the ISP's routers.
- Implementing a load-sharing topology like the one described in the "Linux 2.4 advanced routing HOWTO", section 9.5. Here you need to use the same ISP for all your Internet connections, because the ISP must implement the symmetric configuration. This means that the ISP must support Linux 2.4 load sharing over multiple interfaces. ISPs rarely implement this, because selling a vendor ("constructor") integration, which is more expensive, is much more attractive to them. That's my experience in France :/
- Setting up a router with multiple default gateways. That way you load balance by TCP conversation. I have only implemented this on Cisco; you are limited to the maximum number of default gateways the router supports (3 or 4 for Cisco).
- Implementing the solution described in the LVS HOWTO (above): load balancing a squid server pool, with each squid directly connected to one of your ADSL lines.
Personally, I prefer the LVS solution, which is much easier and which I recommend because it is independent of the ISP's configuration. I have tested it on an RTSP proxy pool.
Initially when testing you should use a non-persistent (in the netscape sense) client, e.g. telnet VIP 80, or lynx VIP. Or else revert to these if you don't understand what you're seeing with netscape.
Peter Mastren Peter (dot) Mastren (at) chron (dot) com 18 Dec 2001
For the past several weeks, we have experienced almost daily denial of service attacks/events on our www servers. A remote client somewhere has opened a number of TCP connections to LVS that have absolutely no traffic whatsoever, save a single keepalive packet every two minutes. I have seen as few as 3 and as many as 120 connections in the various incidents over the weeks.
These open connections are counted in the algorithm LVS uses to schedule servers, so the server that has all these open connections receives proportionately fewer new connections, in most cases taking the target server completely out of rotation.
Yesterday, I noticed an event coming from 130.80.XXX.XXX, our firewall address. Three connections were being held open from a machine inside our network. The culprit was my own workstation. I killed my browser and the connections went away. I fired up my browser again and tried to retrace my steps to duplicate the situation.
To make a long story short, it appears that Opera version 6.0 beta will leave a connection open to a server even after the window that was used for that connection has been closed. The only time the connection is closed is when Opera exits.
I will submit a problem report to Opera. In the meantime, there could be hundreds if not thousands of beta Opera browsers out there that could lock up ports on our servers for hours or days or longer.
This morning I made a configuration change to LVS that seems to have solved the problem. The masquerading portion of the Linux kernel (using LVS_NAT) uses default times to keep connections open, one for TCP connections, one for closed TCP connections that have received a FIN, and one for UDP connections. These defaults are 15, 2, and 5 minutes respectively. I changed the TCP timeout from 15 minutes to 110 seconds, which is shorter than the two minute interval at which the keepalive packets occur, yet long enough for any imaginable connection to a web server.
The change I made was:
    ipchains -M -S 110 0 0

Wensong Aug 2002

For 2.4 kernels:

    director:/etc/lvs# ipvsadm --set tcp tcpfin udp
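As a concrete instance, here is a small sketch of what Peter's change would look like with ipvsadm. `ipvsadm --set` takes three values in seconds (tcp, tcpfin, udp), and a value of 0 leaves the corresponding timeout unchanged. The function name is made up for the example, and the script only prints the command unless RUN is set to empty.

```shell
#!/bin/sh
# Sketch: the ipvsadm counterpart of "ipchains -M -S 110 0 0".
# Dry run by default: the command is printed, not executed.
# Run as root on the director with RUN= (empty) to apply it.
RUN=${RUN-echo}

set_tcp_timeout() {
    # $1 = idle timeout (seconds) for established TCP connections.
    # The two zeros keep the current tcpfin and udp timeouts.
    $RUN ipvsadm --set "$1" 0 0
}

set_tcp_timeout 110
```

The values currently in effect can be listed with `ipvsadm -L --timeout`.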
One of the assumptions of setting up an LVS is that the content presented on the realservers is identical. This is required because the client can be sent to any of the realservers. This requirement is not handled if the client fills in a form which produces a gif on the realserver.
Alois Treindl alois (at) astro (dot) ch 30 Apr 2001
If a page is created by a CGI and contains dynamically created GIFs, the requests for these gifs will land on a different realserver than the one where the cgi runs. Will I need persistence?
I am running an astrology site; a typical request is to a CGI which creates an astrological drawing, based on some form data; this drawing is stored as a temporary GIF file on the server. A html page is output by the CGI which contains a reference to this GIF.
The browser receives the html, and then requests the GIF file from the server. It will mostly hit a different server than the one who created the GIF.
So either we make sure that the new client request for the GIF hits the same realserver which ran the CGI (i.e. have persistence) or we must create the GIF on a shared directory, so that each realserver sees it.
I have not tested it yet (not ported the CGIs yet to the new LVS box) but I think things are not so simple. In a 'rr' scheduling configuration, for example, the scheduler could play dirty, depending on the number of http requests for the given page, and the number of realservers. Both could be incommensurable in a way that the http request for the GIF never reaches the same realserver as the one which ran the CGI request.
I had already decided that I need shared directories between all realservers for our CGI environment which does computationally expensive things all the time. Some CGIs create also data files which are used by later CGIs. It is either shared directories for such files, or a shared database (which we also use).
These temp files will be sitting in the RAM cache of the NFS server, so that only network bandwidth between the realservers and the NFS server is the limiting factor. This is why I give the NFS server 2 gb of RAM, the max it will physically take, and this is why I chose 2.2.19 as the kernel because it contains NFS-3, which is said to be faster than NFS-2.
(Joe)
I tested it here on a page which generates a gif for the client. I found that I could never get the gif. Presumably after downloading the page containing the reference to the gif, the round robin scheduler sends the request for the gif to another realserver.
Presumably even page counters will have this problem. Writing to a shared directory should work.
Here's a solution with persistent fwmark, set up with iptables on a 2.4.x kernel. (Note: for page counters, this method will increment a separate count on each realserver, rather than the total page count over all the realservers that you would get with a shared directory.)

    #put fwmark=1 on all tcp packets for VIP:http arriving on eth0
    director:# iptables -t mangle -A PREROUTING -i eth0 -p tcp -s 0.0.0.0/0 -d 192.168.1.110/32 \
            --dport http -j MARK --set-mark 1

    #setup a 2 realserver LVS to persistently forward packets with fwmark=1 using rr scheduling.
    director:# ipvsadm -A -f 1 -s rr -p 600
    director:# ipvsadm -a -f 1 -r RS1.mack.net:0 -g -w 1
    director:# ipvsadm -a -f 1 -r RS2.mack.net:0 -g -w 1

    #output setup
    director:# iptables -L -t mangle
    Chain PREROUTING (policy ACCEPT)
    target     prot opt source               destination
    MARK       tcp  --  anywhere             lvs.mack.net       tcp dpt:http MARK set 0x1

    director:# ipvsadm
    IP Virtual Server version 0.2.11 (size=16384)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port             Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600
      -> RS2.mack.net:0                 Route   1      0          0
      -> RS1.mack.net:0                 Route   1      0          0
Here's the output of ipvsadm after the successful generation and display of the dynamically generated gif. Note all connections went to one realserver.
    director:/etc/lvs# ipvsadm
    IP Virtual Server version 0.2.11 (size=16384)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port             Forward Weight ActiveConn InActConn
    FWM  1 rr persistent 600
      -> RS2.mack.net:0                 Route   1      5          3
      -> RS1.mack.net:0                 Route   1      0          0
The simplest LVS balances requests to a VIP:port amongst a group of realservers. If you are servicing many VIPs, then few requests may be present for any particular IP at any time and a disproportionate number of requests will be sent to the first realserver. In this case you should balance all the different IPs as one group.
Josh Marcus josh (at) serve (dot) com 02 Oct 2001
I'm using LVS to serve a few thousand domains, but I don't see how I can setup LVS to load balance all of the domains as if they were all a single ip. In my ideal world, I would have a single entry *:80 that would forward all of our ips at port 80 to our set of realservers, and load balance all requests coming in. The way LVS is working for us now, the vast majority of all of our requests are going to the server that is for some reason being listed first. Only sites with heavy traffic get pushed along to the other servers.
Michael E Brown michael_e_brown (at) dell (dot) com
fwmarks
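To spell out the one-word answer: mark the port 80 traffic for all of the VIPs with the same fwmark and create a single virtual service for that mark. A minimal sketch follows; the VIP network and the realserver addresses are hypothetical placeholders, LVS-DR forwarding (-g) is assumed, and the script only prints the commands unless RUN is set to empty.

```shell
#!/bin/sh
# Sketch: balance many VIPs as one virtual service.
# Assumptions (not from the posting): all VIPs live in VIP_NET and the
# realservers are the two hypothetical RIPs below.
# Dry run by default; run as root with RUN= (empty) to execute.
RUN=${RUN-echo}
VIP_NET="192.0.2.0/24"
REALSERVERS="10.0.0.11 10.0.0.12"

one_service_for_all_vips() {
    # Every http packet for any VIP in the network gets fwmark 1.
    $RUN iptables -t mangle -A PREROUTING -p tcp -d "$VIP_NET" --dport 80 -j MARK --set-mark 1
    # A single virtual service, so the scheduler sees all VIPs together.
    $RUN ipvsadm -A -f 1 -s wlc
    for rs in $REALSERVERS; do
        $RUN ipvsadm -a -f 1 -r "$rs" -g -w 1
    done
}

one_service_for_all_vips
```

With one service, the scheduler's connection counts cover the traffic for all the domains together, so a quiet domain no longer always lands on the first realserver.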
In an LVS, you may want requests from a certain IP/netmask to be forwarded to one set of realservers/services (which may be a subset of the total realservers, or may be other dedicated realservers), while the rest of the requests are forwarded normally to the whole LVS.
Or another way of putting it... You may want 2 (or more) LVSs set up on the one director, with one of the LVSs accepting only packets from an IP/netmask, while the rest of the requests go to the other LVS.
Peter Mueller pmueller (at) sidestep (dot) com 18 Apr 2002
source-controlled routing for us gives a few advantages.
- when clients inside our company launch the client for our product (Sidestep), we want that client to redirect automatically to staging. Going to staging directly means it is easier to test code, etc. This is a small advantage and is merely the "proving grounds" or first step.
- One of our customers has a proxy server java-code caching problem (their client doesn't work) and we want to steer them to a server that won't have the problem. Unfortunately the customer is not technically competent, and we'd like to avoid changing anything at their end.
- It'd be nice to redirect our competitors/delinquent customers to a machine that had incorrect or out of date information. Surely most companies would think this is a cool feature!
- it is advantageous to have more control in case of mishap.
Julian supplied the recipe.
    #for each $client, mark their packets with fwmark 1
    director:# ipchains -A input -p TCP -s $client -d VIP 80 -m 1 -j ACCEPT
    .
    .
    #create an LVS for packets with fwmark 1
    director:# ipvsadm -A -f 1 -s wlc
    director:# ipvsadm -a -f 1 -r $real_server

    #create LVS for other client IPs (or for everyone)
    i.e. normal LVS setup here
    .
    .
Here's the implementation for iptables.
Armin.Haken Armin (dot) Haken (at) Sun (dot) COM 10 Feb 2006
How to forward packets based on source address
Using the fwmarks in iptables you can create ipvs rules to forward packets to particular realservers or groups of realservers based on source address or source network. I got this information out of a December 2002 post to the lvs-users group by Ratz.
Here is an example using LVS-NAT with 2 realservers with RIP 10.1.1.1 and 10.1.1.2. The first realserver serves clients on the 10.0.1.X network and the 10.0.5.X network, while the other realserver serves clients on 10.0.2.X. The VIP is 10.0.1.1.
Packets destined for server 1 get mark 1, packets destined for server 2 get mark 2:

    iptables -t mangle -A PREROUTING -s 10.0.1.0/24 -d 10.0.1.1 -j MARK --set-mark 1
    iptables -t mangle -A PREROUTING -s 10.0.5.0/24 -d 10.0.1.1 -j MARK --set-mark 1
    iptables -t mangle -A PREROUTING -s 10.0.2.0/24 -d 10.0.1.1 -j MARK --set-mark 2

The following command shows you the counters of the matched rules:

    iptables -t mangle -L PREROUTING -n -v

ipvs forwards based on the marks:

    ipvsadm -A -f 1
    ipvsadm -A -f 2
    ipvsadm -a -f 1 -r 10.1.1.1 -m
    ipvsadm -a -f 2 -r 10.1.1.2 -m
The iptables rules also allow you to specify the protocol or interface of the packets you mark and you can use negations, specify port numbers, etc. If a packet matches several of the rules, the marks get overwritten so the last matching rule determines the mark.
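The overwrite behaviour can be used deliberately: put a general rule first and let more specific rules replace its mark. The sketch below is a variation on the example above, not part of the original posting: every client of the VIP gets mark 2 by default and the 10.0.1.X and 10.0.5.X networks are then re-marked with mark 1. The script only prints the commands unless RUN is set to empty.

```shell
#!/bin/sh
# Sketch: "last matching rule wins" used as default-plus-exceptions.
# MARK does not end chain traversal, so a later match replaces the mark.
# Dry run by default; run as root with RUN= (empty) to execute.
RUN=${RUN-echo}
VIP=10.0.1.1

mark_rules() {
    # General rule first: any client of the VIP gets mark 2 ...
    $RUN iptables -t mangle -A PREROUTING -d "$VIP" -j MARK --set-mark 2
    # ... then the specific source networks overwrite it with mark 1.
    for net in 10.0.1.0/24 10.0.5.0/24; do
        $RUN iptables -t mangle -A PREROUTING -s "$net" -d "$VIP" -j MARK --set-mark 1
    done
}

mark_rules
```

If the order were reversed, the general rule would come last and every packet would end up with mark 2.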
For failover you could either configure multiple realservers per fwmark or put in a system that changes the marking rules or forwarding rules once a failed realserver is detected.
Here are the discussions that have resulted in the current specifications for handling of persistence with fwmarks in LVS.
Ted Pavlic Jul 14, 2000
What I was asking about would be something like this:
    virtual=192.168.6.2-192.168.6.30:80
            real=192.168.6.240:80 gate
            service=http
            request="index.html"
            receive="Test Page"
            scheduler=rr
I have 1029 virtual servers -- that is I have 1029 hosts which need to be load balanced.
Horms horms (at) vergenet (dot) net 2000-07-14
(fwmark) has the advantage of reducing the amount of _kernel_ configuration that has to be done, which is a big win, even if this is automated by a user space application. The basic idea is that this provides a means for LVS to have virtual services that have more than one host/port/protocol triplet. In your situation this means that you can have a single virtual service that handles many virtual IP addresses and all ports and protocols (UDP, TCP and
You should take a look at ultramonkey (note from Joe, April 2001, UM is now 1.0.2, look for examples there). My understanding is that this is quite similar to how your LVS topology will be set up, though I understand you will be having more than one of these configured.
Basically, you set up a virtual service in LVS for packets carrying a given fwmark. It is set up like any other LVS virtual service, except that no VIP is specified.
e.g.
    director:/etc/lvs# ipvsadm -A -f 1 -s rr
    director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.6.3:80 -m
    director:/etc/lvs# ipvsadm -a -f 1 -r 192.168.6.2:80 -m
    director:/etc/lvs# ipvsadm -L -n
    IP Virtual Server version 0.9.11 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port             Forward Weight ActiveConn InActConn
    FWM  1 rr
      -> 192.168.6.3:80                 Masq    1      0          0
      -> 192.168.6.2:80                 Masq    1      0          0
The other half of the equation is that ipchains is used to match incoming traffic for virtual IP addresses and mark them with fwmark 1. Say you have 8 contiguous class C's of virtual addresses beginning at 192.168.0.0/24. The ipchains command to set up matching of these packets would be:
    ipchains -A input -d 192.168.0.0/21 -m 1
You also need to set up a silent interface so that the LVS box sees traffic for the VIPs as local. To do this use:
    ifconfig lo:0 192.168.0.0 netmask 255.255.248.0 mtu 1500
    echo 1 > /proc/sys/net/ipv4/conf/all/hidden
    echo 1 > /proc/sys/net/ipv4/conf/lo/hidden
Now, as long as 192.168.0.0/21 is routed to the LVS box, or more particularly the floating IP address of the LVS box brought up by heartbeat, traffic for the VIPs will be routed to the LVS box, the ipchains rules will mark it with fwmark 1 and LVS will see this fwmark and consider the traffic as destined for a virtual service.
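The /21 comes from simple arithmetic: 8 contiguous class C networks are 2^3 networks, so 3 bits come off the 24-bit prefix (netmask 255.255.248.0). The sketch below does that calculation and prints an iptables rule corresponding to the ipchains command above. The function names are made up for the example, the number of networks is assumed to be a power of two and the block is assumed to be aligned on its own size. Nothing is executed unless RUN is set to empty.

```shell
#!/bin/sh
# Sketch: prefix length for N contiguous /24 networks, plus the
# iptables counterpart of "ipchains -A input -d 192.168.0.0/21 -m 1".
# Dry run by default; run as root with RUN= (empty) to execute.
RUN=${RUN-echo}

# Prefix length covering $1 contiguous, aligned class C networks.
# $1 is assumed to be a power of two (1, 2, 4, 8, ...).
prefix_for_class_cs() {
    n=$1
    bits=0
    while [ "$n" -gt 1 ]; do
        n=$((n / 2))
        bits=$((bits + 1))
    done
    echo $((24 - bits))
}

mark_vip_block() {
    # $1 = first network address, $2 = number of class C networks
    $RUN iptables -t mangle -A PREROUTING -d "$1/$(prefix_for_class_cs "$2")" -j MARK --set-mark 1
}

mark_vip_block 192.168.0.0 8
```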
Ted Jul 14, 2000
for me to enable persistent connections to every port using direct routing, would this work?
    director:/etc/lvs# ipvsadm -A -f 1 -s rr -p 1800
    director:/etc/lvs# ipvsadm -a -f 1 -r 216.69.192.201:0 -g
    director:/etc/lvs# ipvsadm -a -f 1 -r 216.69.192.202:0 -g
Horms
Yes, that would work. The port in the "ipvsadm -a" commands is ignored if the realservers are being added to a fwmark service. Connections will be sent to the same port on the realserver as the one they were received on at the virtual server. So port 80 traffic will go to port 80, port 443 traffic will go to port 443, etc.
As a caveat, you should really make sure that your ipchains statements catch all traffic for the given addresses, including ICMP traffic, so that ICMP traffic is handled correctly by LVS.
(Julian on catching ICMP traffic)
IIRC, this is no longer a requirement in recent LVS versions. If we look in skb->fwmark for ICMP packets it is impossible to use normal and fwmark virtual services for the same VIP, because we can't create such ipchains rules. The good news is that in 2.4 (0.0.3) the virtual service lookup (the fwmark field) is used only for new connections. In 2.2 the service is looked up even for existing entries but we don't want to break the MASQ code entirely
Ted Pavlic tpavlic (at) netwalk (dot) com 19 Jul 2000
When using fwmark to assign realservers to virtual servers, how is scheduling and persistence handled?
In my particular example, I have: 216.69.196.0/22 (ie 4 class C networks) all marked with a fwmark of 1. ipvsadm setup is
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port             Forward Weight ActiveConn InActConn
    FWM  1 lc persistent 600
      -> nw01:0                         Route   1      0          0
      -> nw02:0                         Route   1      0          0
Say someone connects to 216.69.196.1 and the connection is assigned to nw01. At this point ipvsadm shows
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port             Forward Weight ActiveConn InActConn
    FWM  1 lc persistent 600
      -> nw01:0                         Route   1      1          0
      -> nw02:0                         Route   1      0          0
A new person connects to another IP in 216.69.196.0/22 (say 216.69.196.2). Will this new connection to 216.69.196.2 go to nw02 because it has the least number of TOTAL connections, or will it go to nw01 because for that PARTICULAR IP, both have 0 connections?
Now then say that the person who just connected to 216.69.196.1 makes a connection (within the 600 persistence seconds) to 216.69.196.3. Will this new connection go to nw01 because it's being persistent? Or will it go to either server depending on the number of connections?
Here's what I think would be the best way to do things...
If multiple IPs are marked with FWMARK 1, LVS should consider them all one entry in its active/inactive table. I don't believe that's how things are currently being handled.
(Julian)
The templates are not accounted in the active/inactive counters.
(Joe, almost a year later - Julian, what do you mean here?) (Julian 13 Apr 2001)
Ted here thinks that the templates are accounted in the inactive/active counters. And before the persistent-fwmark patch we can have many templates for one fwmark-based service:
    CIPNET:0 -> VIP1:0 -> RIP_A:0
    CIPNET:0 -> VIP2:0 -> RIP_B:0
where VIP1 and VIP2 are marked with same fwmark.
Ted recommends these two templates to be replaced with one, i.e. just like in the persistent-fwmark patch:
CIPNET:0 -> FWMARK:0 -> RIP1:0
We can't see the templates (which are normal connection entries with some reserved values in the connections structure fields) accounted in the inactive/active counters. The reason for this is that the inactive/active counters are used to represent the realserver load but our templates don't lead to any load in the realservers, we use them only to maintain the persistence.
When a service is marked persistent all connections from CIP to VIP go to same RIP for the specified period. Even for the fwmark based services. This works for many independent VIPs.
The other case is a fwmark service covering a DNS name. I expect comments from users with SSL problems and persistent fwmark services. Is there a problem, or maybe not?
I agree, maybe both cases can be useful:

    1. CIP->VIP
    2. CIP->FWMARK
Any examples where/why (2) is needed?
But switching the LVS code always to use (2) for the persistent fwmark services is possible.
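The difference between the two kinds of template can be shown with a toy model. This is only an illustration of the lookup key and not the IPVS code: it keeps a list of templates in a shell variable and hands out RIP_A and RIP_B round robin when no template matches. The function name and the CIP1/VIP1/VIP2 labels are made up for the example.

```shell
#!/bin/sh
# Toy model of persistence templates; it does not talk to IPVS.
# Mode "vip" keys templates on CIP->VIP (case 1); mode "fwmark" keys
# them on CIP->FWMARK (case 2). All VIPs are assumed to carry fwmark 1.
simulate() {
    mode=$1; shift
    templates=""    # one "key=realserver" entry per line
    next=0          # round-robin position
    for conn in "$@"; do            # each connection is written CIP:VIP
        cip=${conn%%:*}
        vip=${conn#*:}
        if [ "$mode" = vip ]; then key="$cip>$vip"; else key="$cip>FWM1"; fi
        # Look for an existing template for this key.
        rs=$(printf '%s\n' "$templates" | sed -n "s/^$key=//p")
        if [ -z "$rs" ]; then
            # No template yet: schedule round robin and remember the choice.
            if [ $((next % 2)) -eq 0 ]; then rs=RIP_A; else rs=RIP_B; fi
            next=$((next + 1))
            templates="$templates
$key=$rs"
        fi
        echo "$conn -> $rs"
    done
}

echo "case 1 (CIP->VIP):"
simulate vip CIP1:VIP1 CIP1:VIP2 CIP1:VIP1
echo "case 2 (CIP->FWMARK):"
simulate fwmark CIP1:VIP1 CIP1:VIP2 CIP1:VIP1
```

In case 1 the client's connection to VIP2 creates a second template and goes to RIP_B; in case 2 it reuses the client's single template and stays on RIP_A.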
(Ted)
In my opinion, here are some pros and cons of case 2:
Pros:
Improves scheduling, I think, and true load balancing. If someone is using [W]RR or [W]LC, the LVS box will actually look at the realservers as a whole rather than separate realserver entries for EACH VIP. Does that make sense?
For example, in my particular configuration I have over one thousand VIPs which are load balanced onto four RIPs. When I configure the LVS server to use LC scheduling, I'd like it to look at how many TOTAL connections are being made to each RIP not how many connections are being made to each RIP PER VIP. I would like to load balance all one thousand VIPs as a WHOLE onto the four RIPs rather than load balance EACH VIP.
That is, in some of my less active sites, most of their traffic will probably hit one VIP just because not much traffic will need to be load balanced. However, more active sites will hit both servers. The load will then not be distributed equally among the servers as one server will probably get not only the active traffic but also the less active traffic and the other server will only get the more active traffic (in the case of having two RIPs).
Cons:
One person on the Internet will keep connecting to the same RIP for many different VIPs if persistence is turned on.
If this causes a problem, the LVS administrator can do one of two different things:
1) Rather than load balancing a fwmark template, go back to load balancing specific VIPs. The scheduling will then be unique for those particular VIPs.
2) Create multiple fwmark templates. The scheduling for each template will be unique.
In my opinion if you group a bunch of IPs together by marking them with an fwmark, that you say that you want to load balance all of those COLLECTIVELY -- almost like load balancing one site.
I'm just saying, are there any examples where CIP->FWMARK is not needed?
As far as the LVS is concerned, if someone connects to a VIP marked with fwmark 1, it should treat it just like every other VIP marked with fwmark 1 -- as if they were all one VIP.
But today on my LVS (where I have a ten minute persistence setup) I connected to one virtual server marked with fwmark 1 and got a certain real server. I then expected to connect to another virtual server also marked with fwmark 1 and get that same realserver. I did not, however. If what you're telling me is correct, the persistence should have connected me to the same realserver as long as I was connecting within that ten minute window.
Now in this particular example -- connecting to DIFFERENT virtual servers -- it isn't so necessary for persistence to be carried through PER virtual server. I'm just worried that least connection scheduling and round-robin scheduling aren't working at the fwmark level -- I'm worried that they are working at the VIP level as if I had setup hundreds of explicit VIP rules inside IPVSADM.
Julian
I hope this feature (2) will be implemented in the next LVS version (if Wensong doesn't see any problems). I.e. the templates can be changed to case (2) for the persistent fwmark services. For now we (I and Horms) don't see any problems after this change. Then connections from one client IP to different VIPs (from the same fwmark service) will go to the same realserver (only for the persistent fwmark services).
Do you see any reason why enabling CIP->FWMARK for all cases would be a bad thing?
That is, not only using case 2 for persistent fwmark, but whenever fwmark is used. Personally, I cannot foresee a scenario in which a person would set up an fwmark for load balancing and want each VIP associated with that fwmark to act independently.
Web cluster for independent domains (VIPs). fwmark service is used only to reduce the amount of work for configuration.
I've always thought that the scheduling algorithms should look directly at the realservers rather than the realserver stats for each particular virtual server. That is, least connection scheduling would look at the total number of connections on a realserver, not just the connections from that particular VIP. Round-robin would go round-robin from realserver to realserver based on the last connection from ANY VIP to the realservers... However, before fwmark I realized that this would probably be very difficult to do, especially in cases where an LVS administrator was load balancing to a number of different realserver clusters that may overlap.
This is a job for the user space tools: WRR scheduling method + weights derived from the realserver load. Yes, one real server can be loaded from:
many directors
many virtual services
other processes that are not part of the real service
In this case the director's opinion (for each virtual service) about the realserver load is wrong. The only way to handle such case properly is to use WRR method. In the other cases WLC, LC and RR can do their job.
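A user space tool of the kind Julian describes would periodically measure the load on each realserver and feed it back as a WRR weight. The sketch below shows only the last step. The load-to-weight formula (100 minus the load percentage, never below 1) and the function names are assumptions made for the example, not something from the thread; it also assumes fwmark service 1 uses the wrr scheduler and LVS-DR forwarding. The script only prints the command unless RUN is set to empty.

```shell
#!/bin/sh
# Sketch: turn a measured realserver load into a WRR weight.
# Assumption: the load arrives as a percentage (0-100) from some
# monitoring agent; how it is measured is outside this sketch.
# Dry run by default; run as root with RUN= (empty) to execute.
RUN=${RUN-echo}

# Hypothetical mapping: an idle server gets 100, a saturated one gets 1.
weight_for_load() {
    w=$((100 - $1))
    if [ "$w" -lt 1 ]; then
        w=1
    fi
    echo "$w"
}

set_weight() {
    # $1 = realserver, $2 = its load in percent
    $RUN ipvsadm -e -f 1 -r "$1" -g -w "$(weight_for_load "$2")"
}

set_weight nw01 30
set_weight nw02 95
```

Because the weight reflects everything running on the realserver, it stays meaningful when the same machine is also loaded by other directors, other virtual services or unrelated processes.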
fwmark, to me, just by causing all VIPs marked with a particular fwmark to look like one big VIP makes it possible to do basically that which I just described. I don't see why anyone would not want such functionality with the fwmark services. If one did want such functionality, he would probably partition the VIPs associated with his fwmark into separate fwmarks or even explicit VIP entries anyway.
Yes. IMO, this can be a problem only for the balancing, but I don't think so. The problems will come when one realserver dies and the client can't access any VIP that is part of the fwmark service for a period of time.
Here's the original e-mail between Ted tpavlic (at) netwalk (dot) com 3 Aug 2000 and Joe
One of the things fwmarks let me do is make ports sticky by groups.
Basically I setup ipchains rules that say all packets to ports 80/tcp and 443/tcp mark with a 1. All packets to ports 20/tcp and 21/tcp as well as 1024:65535/tcp mark with a 2. Voila... I just made ports stick by groups.
I then go into IPVS and setup my realservers under FWMARK1 and FWMARK2. Ports 80 and 443 are now persistent as a group just as 20 and 21 and 1024:65535 are persistent as a group. If my HTTP goes down on one of my real servers, I do not have to take my FTP down as well. I only have to remove the realserver from the FWMARK1 group. It's great!
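For readers on iptables, here is a sketch of Ted's port groups. It is an illustration, not Ted's actual ruleset: the VIP network and realserver names are borrowed from his listings in this section, the scheduler and the 600 second persistence follow his ipvsadm output, and it assumes an iptables whose multiport match accepts port ranges. The script only prints the commands unless RUN is set to empty.

```shell
#!/bin/sh
# Sketch: persistence by port group with fwmarks.
# Dry run by default; run as root with RUN= (empty) to execute.
RUN=${RUN-echo}
VIP_NET="216.69.196.0/22"
REALSERVERS="nw01 nw02 nw03 nw04"

port_group_rules() {
    # Group 1: HTTP and HTTPS stick together.
    $RUN iptables -t mangle -A PREROUTING -d "$VIP_NET" -p tcp -m multiport --dports 80,443 -j MARK --set-mark 1
    # Group 2: FTP plus the high ports used by passive FTP.
    $RUN iptables -t mangle -A PREROUTING -d "$VIP_NET" -p tcp -m multiport --dports 20,21,1024:65535 -j MARK --set-mark 2
    # One persistent virtual service per group, same realservers in each.
    for mark in 1 2; do
        $RUN ipvsadm -A -f "$mark" -s lc -p 600
        for rs in $REALSERVERS; do
            $RUN ipvsadm -a -f "$mark" -r "$rs" -g
        done
    done
}

# HTTP has died on one realserver: take it out of group 1 only.
drop_http_on() {
    $RUN ipvsadm -d -f 1 -r "$1"
}

port_group_rules
drop_http_on nw02
```

After `drop_http_on nw02`, nw02 still receives FTP connections through fwmark 2, which is the point of grouping by port.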
Joe
most people don't program their own on-line transaction processing program and the point of an LVS is for the realservers to be running the same code as when they're stand alone.
My users run PHP scripts as well as ASPs that keep session information. That session information is unique per server and usually is stored in a local /tmp directory. Users are handed cookies which tie them to their session information. If they go to the wrong realserver, that session information won't exist and a number of things could go wrong.
most of my realservers run a lot of services... HTTP, HTTPS, FTP, SMTP, POP3, IMAP, DNS, And when one of them went down (with persistence set up), I would have to take the entire realserver down.
Several problems:
*) One little thing goes down... POP3, for example. Now the load increases a great deal on all my other realservers... Perhaps causing the load to become so high that sendmail starts rejecting connections... and then THAT realserver also is taken COMPLETELY down... domino effect. If I could have just taken POP3 down off of that server, it would have been perfect.
*) Say something horrible happens causing sendmail to go down on all the servers... or HTTP... or POP3... any one service -- just as long as it goes down on all servers. Rather than just causing that service to be affected, ALL of my services go down because every realserver was taken completely off-line until that ONE service is fixed. :(
But I figured that those two problems wouldn't be that big of a deal... I could probably put such a system in production.
Well -- I put such a system in production and those problems weren't that big of a deal... Except for a COUPLE of times when all services went down and caused a BIG hassle. So my superiors wanted something better -- needed in fact.
So at first I came up with the interim idea of separating persistent services and non-persistent services by IP. All of my persistent services were basically on one supernet and all of my non-persistent services were on another subnet. Consequently, I could tie the one supernet to one FWMARK and the other subnet to another FWMARK. Now if a persistent service went down, it would bring down only all of the persistent services. Also, if a non-persistent service went down, it would only bring down all of the non-persistent services.
This was definitely an interim solution because it required a lot more IPs than any one administrator should need, and it still was far from perfect.. BUT... I started to realize that just as I could mark different supernets and subnets with different FWMARKs, I could go farther down the TCP/IP layers and mark things at their protocol and port level. That's where I realized that we COULD do persistence by port group just with a little help from ipchains.
Joe
I asked Horms if there was any point in having multiple fwmarks. His only example was if you had duplicate sets of realservers. Eg the paying customers get the fast servers, while the people coming into the free site get the 486 with 16M.
Similar idea here... except rather than setting up your policies like:
Paying customers -> fast server
Free -> slow server
You have:
SMTP -> a realserver
POP3 -> another realserver
HTTP/HTTPS -> yet another realserver
FTP -> and another realserver
The key of it all is the fact that you can group by about any parameter that ipchains can see. If ipchains can segregate it, you can group it. Anything that ipchains can do IPVS can then add onto itself.
Joe
Have you solved passive ftp without using persistence?
I really don't think there's any way to get around it... In order to get passive FTP to work, you need to make TCP port 21 persistent with every TCP port above 1024. I mean -- how else could you do it without putting some big brother software inside of LVS which would keep an eye on FTP and see what port it tells the end-user to connect to.
Still, putting 21 and 1024:65535 together is a lot better than putting everything together. Personally I only plan on load balancing things in the <1024 range anyway, so I have no problem including that huge group above 1024.
This is my setup
    FWMARK1 => HTTP/HTTPS (persistent)
    FWMARK2 => FTP (persistent)
    FWMARK3 => SMTP
    FWMARK4 => POP3
    FWMARK5 => DOMAIN
    FWMARK6 => IMAP
    FWMARK7 => ICMP (for kicks)
    ================
    IP Virtual Server version 0.9.12 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port             Forward Weight ActiveConn InActConn
    FWM  1 lc persistent 600
      -> nw04:0                         Route   1      58         121
      -> nw03:0                         Route   1      49         76
      -> nw02:0                         Route   1      60         98
      -> nw01:0                         Route   1      61         44
    FWM  2 lc persistent 600
      -> nw04:0                         Route   1      0          2
      -> nw03:0                         Route   1      0          2
      -> nw02:0                         Route   1      1          13
      -> nw01:0                         Route   1      1          0
    FWM  3 lc
      -> nw04:0                         Route   1      4          11
      -> nw03:0                         Route   1      4          12
      -> nw02:0                         Route   1      3          20
      -> nw01:0                         Route   1      3          16
    FWM  4 lc
      -> nw04:0                         Route   1      3          54
      -> nw03:0                         Route   1      1          74
      -> nw02:0                         Route   1      3          51
      -> nw01:0                         Route   1      2          73
    FWM  5 lc
      -> nw03:0                         Route   1      0          46
      -> nw01:0                         Route   1      0          44
      -> nw02:0                         Route   1      0          45
      -> nw04:0                         Route   1      0          45
    FWM  6 lc
      -> nw04:0                         Route   1      0          0
      -> nw03:0                         Route   1      0          0
      -> nw02:0                         Route   1      1          0
      -> nw01:0                         Route   1      0          0
    FWM  7 lc
      -> nw04:0                         Route   1      0          0
      -> nw03:0                         Route   1      0          0
      -> nw02:0                         Route   1      0          0
      -> nw01:0                         Route   1      0          0
    ==============
Is this anything new?
Joe
It's new to me and Horms didn't have any other ideas for multiple fwmarks 3 weeks ago, so I expect it will be new to him.
I've been thinking of ways of combining different programs which already exist out there to get L7 scheduling working. For example -- you have some program (sorta like policy routing but one more layer up) that filters packets at the application layer and does something to them... routes them to a particular IP... something like that... and then have ipchains mark each one of those packets with a particular mark... and have LVS work from there.
You see -- using multiple fwmarks makes me think that you can do a lot more with LVS.
We could probably borrow some of the ideas used for some of the dynamic routing protocols, like BGP or RIP. A master could advertise its IPVS hash table. If it didn't advertise within a given interval of time, other LVS's could take over.
During the failover, rather than trading an IP like we were talking about, all LVSs could know which one is the active one and ICMP redirect to that LVS or something like that.
Right now I'm routing every virtual server through the active LVS. This lets me do a lot of nifty things (for me at least):
* Very little has to happen on the LVS during failovers. They basically just trade an IP. In fact -- I COULD do the failover right at the router before the LVS's -- just have it route to another IP.
* I do not have to bring every IP up on my realservers -- I just have to bring the network that they're on up on a hidden loopback device. When you route an entire network to a loopback device, the loopback device answers every IP on that subnet automatically. So even with 1024+ IPs, I have to set up very few interfaces/aliases because a great many of them are on the same subnet.
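On a current Linux realserver, the "route a whole network to loopback" trick might look like the sketch below. It is not Ted's configuration: the network 192.0.2.0/24 is a placeholder, and the arp_ignore/arp_announce sysctls are an assumed modern stand-in for the "hidden" device patch he would have been using in 2000. The script only prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch only. VIP_NET is an assumed placeholder network.
VIP_NET=192.0.2.0/24
RUN=${RUN-echo}   # prints by default; run with RUN= (empty) as root to apply

accept_vip_network() {
    # A "local" route makes the kernel treat every address in the
    # network as one of its own, so no per-IP alias is needed.
    $RUN ip route add local "$VIP_NET" dev lo
}

hide_from_arp() {
    # Assumed substitute for the "hidden" patch: do not answer or
    # announce ARP for addresses that live only on lo.
    $RUN sysctl -w net.ipv4.conf.all.arp_ignore=1
    $RUN sysctl -w net.ipv4.conf.all.arp_announce=2
}

accept_vip_network
hide_from_arp
```

One route entry covers the whole subnet, which is why 1024+ VIPs need only a handful of entries when most of them share a few networks.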
Ted Pavlic tpavlic (at) netwalk (dot) com 4 Aug 2000
Periodically the issue comes up regarding wanting to do persistence by groups of ports. Until now, an LVS administrator could make a single-port persistent or all ports persistent.
Single port persistence was nice for quite a few things. However, things like HTTP and HTTPS caused complications with it. Someone who connected to a webpage on HTTP and started a session tied to them with a cookie would want to return to that same realserver when they went to the HTTPS version of that site. FTP would also cause a problem with single-port persistence, as someone who wanted to use passive FTP wouldn't be guaranteed the same server when they returned on a random TCP port above 1024. There are other examples as well.
So the solution to these problems would be to make every port persistent. This works pretty well, but now any time a user of a large network behind a firewall connects to a realserver on ANY service, everyone behind that firewall hits that same realserver. Plus, if an administrator wanted to stop scheduling a single service to a single realserver, he would have to take all services down on that realserver. This causes many problems as well... especially if one small service dies on every realserver: that brings down every service on every realserver.
So there has been the need for persistence by port GROUPS. Rather than saying all ports are persistent, it would be nice to tell LVS to tie just 80/tcp and 443/tcp together or just 21/tcp and 1024:65535/tcp together. Before the wonderful FWMARK additions to LVS, this was not possible.
But now that LVS listens to FWMARKs, it becomes possible to group ports together inside ipchains with different FWMARKs and then tell LVS to listen to those FWMARKs.
For example, one can set up rules inside ipchains to do this...

    80/tcp, 443/tcp        --> FWMARK1
    21/tcp, 1024:65535/tcp --> FWMARK2
    25/tcp                 --> FWMARK3
    110/tcp                --> FWMARK4
Then inside LVS (assume on this setup all of these services are served by the same realserver cluster), say:
    FWMARK1 -> PERSISTENT -> real1, real2, real3, real4
    FWMARK2 -> PERSISTENT -> real1, real2, real3, real4
    FWMARK3 -> real1, real2, real3, real4
    FWMARK4 -> real1, real2, real3, real4

Not only have you now set up persistence by port groups, but you've also split your services back up into autonomous services that will not bring EVERY server down for the sake of persistence. If FTP goes down on real1, only FTP needs to stop being scheduled to real1.
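With iptables (the post itself predates it and uses ipchains), the port groups and virtual services above might be set up roughly as follows. This is a sketch, not Ted's actual configuration: the VIP 192.0.2.10, the lc scheduler, the 600 second persistence timeout and LVS-DR forwarding (-g) are all assumptions. The script only prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch only. VIP, scheduler, timeout and forwarding method are assumed.
VIP=192.0.2.10
REALSERVERS="real1 real2 real3 real4"
RUN=${RUN-echo}   # prints by default; run with RUN= (empty) as root to apply

mark_port_groups() {
    # HTTP and HTTPS share mark 1
    $RUN iptables -t mangle -A PREROUTING -d "$VIP" -p tcp -m multiport --dports 80,443 -j MARK --set-mark 1
    # FTP control plus the high ports used by passive FTP share mark 2
    $RUN iptables -t mangle -A PREROUTING -d "$VIP" -p tcp -m multiport --dports 21,1024:65535 -j MARK --set-mark 2
    # SMTP and POP3 each get a mark of their own
    $RUN iptables -t mangle -A PREROUTING -d "$VIP" -p tcp --dport 25 -j MARK --set-mark 3
    $RUN iptables -t mangle -A PREROUTING -d "$VIP" -p tcp --dport 110 -j MARK --set-mark 4
}

add_virtual_services() {
    # Marks 1 and 2 are persistent, marks 3 and 4 are not
    $RUN ipvsadm -A -f 1 -s lc -p 600
    $RUN ipvsadm -A -f 2 -s lc -p 600
    $RUN ipvsadm -A -f 3 -s lc
    $RUN ipvsadm -A -f 4 -s lc
    # Every mark is served by the same set of realservers
    for mark in 1 2 3 4; do
        for rs in $REALSERVERS; do
            $RUN ipvsadm -a -f "$mark" -r "$rs" -g -w 1
        done
    done
}

mark_port_groups
add_virtual_services
```

Because each mark is its own virtual service, a realserver can later be removed from one mark without touching the other three.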
Ted Pavlic tpavlic (at) netwalk (dot) com 2000-09-15
Using fwmark, you can set up something which used to be a big desire in LVS: persistence by port groups.
For example... Say you were serving HTTP and HTTPS. In this case, you would probably want calls to one HTTP server to end up hitting the same HTTPS server. This way session information and such would be accessible no matter how the end-user was accessing the website.
Say you also wanted all forms of FTP to work... You would need persistence there, but not necessarily the same persistence as HTTP/HTTPS.
And other protocols do not need to be persistent.
Back in the olden days before fwmark, to do any of this you would have to make ALL ports persistent. You couldn't simply say "Group 80 and 443 together and make them persistent and then make 21, 20, AND 1024:65535 persistent." If one service went down, you would have to bring down ALL services. Some sort of persistence by port groups would allow you to only need to take down whatever went down and the affected server could still serve other services.
FWMARK allows you to do this by way of setting up multiple FWMARKs.
That is -- you can use ipchains to say that:
    HTTP,HTTPS --> FWMARK1
    FTP        --> FWMARK2
    SMTP       --> FWMARK3
    POP        --> FWMARK4

Then in LVS, set up:

    FWMARK1 --> WLC Persistent 600
    FWMARK2 --> WLC Persistent 300
    FWMARK3 --> WLC
    FWMARK4 --> WLC
And if FTP went down, all you'd have to do is stop scheduling FTP rather than stop scheduling EVERYTHING.
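Stopping FTP scheduling to one realserver then comes down to a single ipvsadm command against the FTP mark. A minimal sketch, assuming FTP is mark 2 as in the table above, a realserver named real1, and LVS-DR forwarding (-g); the helper function names are made up for this example, and the script only prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch only. The mark number follows the example table; real1 is assumed.
FTP_MARK=2
RUN=${RUN-echo}   # prints by default; run with RUN= (empty) as root to apply

# usage: quiesce_realserver <fwmark> <realserver>
# Weight 0 stops new connections being scheduled to the realserver
# for this mark only; connections already open are left alone.
quiesce_realserver() {
    $RUN ipvsadm -e -f "$1" -r "$2" -g -w 0
}

# usage: remove_realserver <fwmark> <realserver>
# Deletes the realserver from this one virtual service.
remove_realserver() {
    $RUN ipvsadm -d -f "$1" -r "$2"
}

quiesce_realserver "$FTP_MARK" real1
```

The entries for real1 under marks 1, 3 and 4 are untouched, so it keeps serving HTTP/HTTPS, SMTP and POP.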
Also note that FWMARK makes setting up MASS VIPs really easy (of course because of recent ARIN policy changes, this probably won't be done much anymore). That is, if you wanted to load balance 1000 VIPs, it might be easy to set up one single rule in ipchains to cover them all, where it would be 1000 rules for EACH realserver in ipvsadm.
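To make the rule count concrete, here is a sketch with iptables standing in for ipchains. The network 10.1.0.0/22 (1024 addresses), port 80, the wlc scheduler and the realserver names are assumptions for illustration. The script only prints the commands unless RUN is set to an empty string.

```shell
#!/bin/sh
# Sketch only. VIP_NET, the port, and the realserver names are assumed.
VIP_NET=10.1.0.0/22
REALSERVERS="real1 real2 real3 real4"
RUN=${RUN-echo}   # prints by default; run with RUN= (empty) as root to apply

mark_vip_block() {
    # One rule marks traffic to every VIP in the block
    $RUN iptables -t mangle -A PREROUTING -d "$VIP_NET" -p tcp --dport 80 -j MARK --set-mark 1
}

balance_mark() {
    # One virtual service, plus one entry per realserver
    $RUN ipvsadm -A -f 1 -s wlc
    for rs in $REALSERVERS; do
        $RUN ipvsadm -a -f 1 -r "$rs" -g
    done
}

mark_vip_block
balance_mark
```

That is six commands in total, where per-VIP virtual services would need one -A entry per VIP and one -a entry per VIP per realserver. Getting packets for the whole block delivered to the director is a separate routing matter not covered by this sketch.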
It makes me think that if there was a utility already out there that could sit on a director and figure out where name-based packets were going it might be able to mark each name-based host with a different FWMARK and pass that right back to LVS... Then LVS wouldn't have to worry about handling name-based stuff ITSELF. Of course the name-based challenge is even more challenging considering how much data needs to be looked at to figure out if a TCP stream is a name-based HTTP session going to specific name X.... but that's a completely other argument... Just food for thought.
Simone Sestini, September 23, 2003
I would like to use the director and a backup director as realservers too. I would like to run http and https on the backup director/realserver.
How can I configure more than one https domain for each server? Apache needs a unique IP for each https domain.
Matthew Crocker matthew (at) crocker (dot) com 24 Sep 2003
Search some of the archives a bit. I handle my HTTPS servers with LVS-DR going through my LVS director. The actual web servers are not on the Internet. Here is what I do.
This works great for me. Only packets in the /24 that are marked with the firewall mark actually hit the LVS server and/or the realservers. All other packets are not treated as local by the LVS server and will be routed to its default route, which will create a routing loop. If you ping/traceroute it will look broken, but if you telnet to port 80 on one of the IPs you will get an answer. This also eliminates any ARP issues, because the realservers are not on the same LAN segment as the LVS directors and the router doesn't ARP for the /24 IPs anyway because of the static route.
Most of the configs for steps 3,4,5 are in the archives from a couple months ago.