While single-port services all use the same scheme (server listens, client connects), multi-port services each have their own scheme (ftp has two schemes, active and passive). For multi-port services, the initial connection is the standard single-port connection, but the setup of the 2nd (or more) port occurs through information sent in the payload of the connection to the first port. The director does not inspect the payload of packets and has no information about the subsequent connection(s) that the client and realserver are attempting to set up. Approaches used to load balance multi-port services are:
Use persistence to all ports at once on the realserver. (Persistence can also be set for a single port, but this is not used here).
This is a brute force approach. Once the initial connection is made from the client to the first port on the realserver, then any packet from any port on the client is forwarded to any port that the client requests on the realserver. This has been the approach historically used for ftp on LVS-DR or LVS-Tun. While it works, it is not secure, since any packets are allowed between the client and the LVS, and not just the packets required for the ftp transfer. For ftp, where no state is maintained on the realserver and where idle timeouts are just a matter of the client reconnecting, then persistence is a satisfactory solution for LVS/ftp. It would be nice if we could do better than this, but currently this is the state of the art for LVS with ftp.
e-commerce sites with fwmark listening on ports 80 and 443: This is not a multi-port tcpip protocol. A multi-port tcpip protocol requires one demon running on the realserver sending packets on two ports. For an e-commerce site, the connections are independent at the tcpip level and are serviced by different demons. For LVS, it is convenient to think of an e-commerce site as multi-port because, following the initial connection to port 80, you want the client's subsequent connection to 443 to go to the same realserver. This is handled by persistence or by persistent fwmark.
ftp is a 2 port service in both active and passive modes. For a description of active and passive ftp see Active FTP vs. Passive FTP, a Definitive Explanation on Slacksite. Also see RFC 1579 for passive ftp and RFC 959 for ftp (where ftp is referred to as just "ftp", but with the arrival of passive ftp, is now called "active ftp"). The usual resource for this sort of information, "TCP/IP Illustrated Vol 1", by W. Richard Stevens (Chapter 27 on FTP), only discusses what is now called active FTP.
Useful links (from Ratz 30 Nov 2003) http://www.ssh.com/support/documentation/online/ssh/winhelp/32/Forwarding_FTP.html Forwarding ftp. (port forwarded ftp is not the same as sftp or ftps, ssl based ftp).
Because of the problems securing ftp, Ratz suggests that you use a single ftp server that is not part of your LVS and secure it separately.
The ip_vs build produces the modules ip_masq_ftp (2.2.x) or ip_vs_ftp (2.4.x and later, written as a netfilter module). The ip_masq_ftp module is a patched version of the file which allowed ftp through a NAT box. The patch stopped the original function (at least in early versions of LVS) and is probably why it has a new name in 2.4.x kernels.
The ip_vs_ftp module will autoload (Nov 2003) when ipvsadm is invoked - check that the module is loaded by running lsmod.
The ipvs ftp helper module needed for LVS-NAT has resulted in a disproportionate number of problems on the LVS mailing list (presumably this will continue). In Dec 2006, Eric Robinson eric (dot) robinson (at) pmcipa (dot) com was the unwitting guinea pig in straightening some of this out.
Problems include:
the docs and functionality were out of step for quite a while.
Tony Clarke sam (at) palamon (dot) ie found (Sep 2002) that the ftp helper module ip_masq_ftp had not been patched for LVS for 2.2.19 at least a year after its release.
I was testing ftp with its default settings (without being terribly aware that I was using active ftp) and found that I didn't need the helper module. It took at least a year before anyone else (Wensong 17 Sep 2002) would agree with me. The conventional wisdom from 2002-2006 was that the ftp helper module wasn't needed for active ftp. I thought the helper function for active ftp must have been in ip_vs. A possible explanation is Mark de Vries' comment immediately below, although, not having the setup around any more, I don't know for sure.
Mark de Vries markdv (dot) lvsuser (at) asphyx (dot) net 23 Dec 2006.
ftp-clients don't care which IP the connection originates from.
Joe - the ftp-data connection then would originate on the RIP, rather than the VIP. With the ftp helper, the ftp-data connections would be nat'ed to src_addr=VIP. In my test setup, with no ftp helper and two private networks (which I routed locally), the packets src_addr=RIP:ftp-data would have been routed directly through the director to the CIP. Complicating matters, I don't remember whether the ftpd was listening to the VIP or 0.0.0.0.
Mark de Vries markdv (dot) lvsuser (at) asphyx (dot) net 23 Dec 2006
from ip_vs_ftp.c:
    /*
     * Look at incoming ftp packets to catch the PASV/PORT command
     * (outside-to-inside).
     *
     * The incoming packet having the PORT command should be something like
     *      "PORT xxx,xxx,xxx,xxx,ppp,ppp\n".
     * xxx,xxx,xxx,xxx is the client address, ppp,ppp is the client port number.
     * In this case, we create a connection entry using the client address and
     * port, so that the active ftp data connection from the server can reach
     * the client.
     */
So that would suggest (to me) that you do need the ip_vs_ftp helper module, to do the src address translation in the active connection from server to client.
Horms 27 Dec 2006
I just skimmed through the code, and the helper seems to listen for both the PASV and PORT command. My FTP knowledge is a bit rusty, but I think the latter is for non-passive ftp, so yes it seems to be needed for both.
The auto-loading is just a hack for the convenience of most people. Basically, in recent versions of ipvsadm, if you're setting up a virtual service on port 21, it guesses that there is a good chance that it is ftp and tries to load ip_vs_ftp. The ftp helper auto-load went in on 9 Oct 2003 - look at the date of your ipvsadm (due to a release procedure that is beyond my control, it seems that ipvsadm has been released multiple times with the version number of 1.24. Indeed, the version only seems to denote that it is the ipvsadm that works with the 2.6 kernels, or perhaps a revision of the ABI, rather than a release of the utility itself. Grrr. - i.e. the version number doesn't mean anything.)
If you are using a port other than 21, then you will need to set the ports argument to the module when it is loaded:
    insmod ip_vs_ftp.ko ports=8021
The default is 21. You can have up to IP_VS_APP_MAX_PORTS (8). They are comma delimited
    insmod ip_vs_ftp.ko ports=21,8021,9021
If the ftp helper module doesn't load, maybe you have an old version of ipvsadm? ftp is running on a port other than 21? The module couldn't be found by modprobe for some reason?
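The ports argument described above is just a comma-delimited list with an upper limit on the number of entries. As a minimal sketch (in Python, not part of ipvsadm or the kernel), the following builds and validates such an argument. The function name `ftp_helper_ports_arg` is hypothetical, and the limit of 8 is taken from the IP_VS_APP_MAX_PORTS value quoted above.

```python
# Limit quoted in the text above for IP_VS_APP_MAX_PORTS.
IP_VS_APP_MAX_PORTS = 8


def ftp_helper_ports_arg(ports):
    """Build the 'ports=...' module argument (hypothetical helper)."""
    ports = list(ports)
    if not ports:
        raise ValueError("at least one port is required")
    if len(ports) > IP_VS_APP_MAX_PORTS:
        raise ValueError("at most %d ports are allowed" % IP_VS_APP_MAX_PORTS)
    for port in ports:
        if not 1 <= port <= 65535:
            raise ValueError("invalid TCP port: %r" % (port,))
    # The module expects a comma-delimited list with no spaces.
    return "ports=" + ",".join(str(port) for port in ports)


if __name__ == "__main__":
    # Prints the same command line as the example above.
    print("insmod ip_vs_ftp.ko " + ftp_helper_ports_arg([21, 8021, 9021]))
```

The sketch only assembles the command line; loading the module still has to be done with insmod or modprobe on the director.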
Eric: with the ftp helper loaded, the ftp-data packets arriving at the client have src_addr=VIP (the expected behaviour).
Joe - The 2.2.x ftp module is only available as a module (i.e. it can't be built into the kernel).
Juri Haberland juri (at) koschikode (dot) com 30 Apr 2001
AFAIK the IP_MASQ_* parts can only be built as modules. They are automagically selected if you select CONFIG_IP_MASQUERADE.
Julian Anastasov May 01, 2001
Starting from 2.2.19 the following module parameter is required:
    modprobe ip_masq_ftp in_ports=21
Joe
I don't see this mentioned in /usr/src/linux/Documentation, ipvs-1.0.7-2.2.19/Changelog, google or dejanews. Is this an ip_vs feature or is it a new kernel feature?
ratz
I see info only in the source. This is a new 2.2.19 feature. It's /usr/src/linux/net/ipv4/ip_masq_ftp.c:
     * Multiple Port Support
     *      The helper can be made to handle up to MAX_MASQ_APP_PORTS (normally 12)
     *      with the port numbers being defined at module load time. The module
     *      uses the symbol "ports" to define a list of monitored ports, which can
     *      be specified on the insmod command line as
     *              ports=x1,x2,x3...
     *      where x[n] are integer port numbers. This option can be put into
     *      /etc/conf.modules (or /etc/modules.conf depending on your config)
     *      where modload will pick it up should you use modload to load your
     *      modules.
     * Additional portfw Port Support
     *      Module parameter "in_ports" specifies the list of forwarded ports
     *      at firewall (portfw and friends) that must be hooked to allow
     *      PASV connections to inside servers.
     *      Same as before:
     *              in_ports=fw1,fw2,...
     *      Eg:
     *              ipmasqadm portfw -a -P tcp -L a.b.c.d 2021 -R 192.168.1.1 21
     *              ipmasqadm portfw -a -P tcp -L a.b.c.d 8021 -R 192.168.1.1 21
     *              modprobe ip_masq_ftp in_ports=2021,8021
And it is a new kernel feature, not LVS feature.
what are these modules for: from ipvsadm(8) (ipvs 0.2.11)
If a virtual service is to handle FTP connections then persistence must be set for the virtual service if Direct Routing or Tunnelling is used as the forwarding mechanism. If Masquerading is used in conjunction with an FTP service than persistence is not necessary, but the ip_vs_ftp kernel module must be used. This module may be manually inserted into the kernel using insmod(8)
The modules are NOT used for LVS-DR or LVS-Tun: in these cases persistence is used (or fwmarks version of persistence).
Joe 23 May 2001:
I run these rules on the director (without the ftp module) and ftp works fine
    $ ipchains -A forward -p tcp -j MASQ -s RIP ftp -d 0.0.0.0/0
    $ ipchains -A forward -p tcp -j MASQ -s RIP ftp-data -d 0.0.0.0/0
    $ ipchains -A forward -p tcp -j MASQ -s RIP 1024:65535 -d 0.0.0.0/0
Julian - these rules are risky. What happens with ICMP? It is not masqueraded. I hope there is a similar rule for ICMP.
Note: Joe Dec 2006 - We're a little more careful nat'ing out clients running on the realservers now. We'd at least make sure the packets came out with src_addr=VIP.
Stephane Klein
I've tried to use your example to setup active and passive FTP. I can authenticate, but i can't list or send data. I can see packet in the conntrack file that with dport=20, but the ftp server tried to send a SYN_SENT and have no reply.
ip_vs_ftp is loaded as module, ip_nat_ftp and ip_conntrack_ftp are in the kernel. I used iptables rules of your example in the HOWTO.
I saw this article where you said it's necessary to patch the kernel to work with ip_nat_ftp (http://www.in-addr.de/pipermail/lvs-users/2004-June/011955.html) That patch is for kernel 2.6.5. Is this patch included in your NFCT patch or is it necessary to apply this patch?
Julian 29 Aug 2004
Yes, it is needed if you are loading ip_nat_ftp. I didn't received any replies from the netfilter coreteam about this patch, so I just linked it to the web site: ip_nat_ftp-2.6.5-1.diff
There are problems with the helper module approach for ftp, since there is no agreement amongst ftpd code authors about the responses given. To help passive ftp, the ip_vs_ftp module looks for the response
    227 Entering Passive Mode
from the ftpd. Postings to the LVS mailing list (starting with a posting by Tom Cronin on LVS-NAT ftp) show that this response is not universal for ftpds. Rutger van Oosten also found that, for passive ftp, the ftpd must be set to listen on the correct IP.
Mark de Vries found that his ftp LVS-NAT didn't work, the reason being that the ftp helper module wasn't forwarding the reply packets from the ftp-data port (usually port 20). On further exploration, Mark found that the ftpd (the GPL'ed vsftp) wasn't using the standard ftp-data port, but was using a high (>1024) port, thus allowing the ftpd to run with lower privileges (vsftp can be setup to run with the standard ftp-data port). Currently the ftp helper expects ftp-data=20. We're working on a fix for this. Here's the discussion so far.
Mark de Vries markdv (dot) lvsuser (at) asphyx (dot) net 25 Nov 2005
Problem found... The thing is that ip_vs(_ftp) seems to assume that the ftp-data connection will be initiated from port 20. Seems like a valid assumption... But unfortunately this is not always the case... the vsftpd I was testing with was configured to "connect_from_port_20=NO" by default. Once I switched to "=YES" active FTP worked fine. Otherwise I just used some SNAT rules on the director. So.... Now the question is: is this a vsftpd 'problem'? MUST ftp-data connections originate from port 20? Or should this assumption be relaxed?
Apparently the iptables conntrack_ftp module does not assume it; connections from ports other than 20 are considered "RELATED". (I have not checked the src or debugged anything, I just observed that this type of connection is indeed matched by a "RELATED" rule in my own iptables setup.)
I don't think adding an option --data-port="some_number" to the ftp helper would get us anywhere - the src port is not always the same. vsftpd (probably) just connects without binding to a specific port, just getting a random one in the ip_local_port_range... Is there anything against not matching on the src port like the ip_conntrack(_ftp) stuff, i.e. matching/finding the source port on the fly?
vsftp has passive ftp (pasv_enable = YES). A lot of clients will default to passive mode or fallback to it if active does not seem to be working. which is probably the main reason I've had relatively few complaints about active ftp not working.
As far as I understand, the RFC leaves no room for a different src port for the data connection. It's not fixed at 20 but should be 1 below the control port. Which is what ip_vs uses literally IIRC.
ip_vs_ftp and ip_conntrack_ftp do much of the same thing. The only difference is that in iptables you need an explicit rule to handle the connection entries created, when in ipvs they are always used. The real difference is only in the details of the connection entry they create. In ipvs there is the assumption/requirement that the connection will originate from port 20 (assuming the ftpd is listening on port 21). The ip_conntrack_ftp module (apparently) does not make this assumption. Taking the RFC as a guide the assumption is of course valid.
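The two matching policies discussed in this thread can be contrasted with a small sketch. This is an illustration in Python only, not the ip_vs_ftp or ip_conntrack_ftp code: the `Expectation` tuple and the function `matches_active_data` are hypothetical names, and the "control port minus 1" rule is the one Mark describes above (from memory).

```python
from collections import namedtuple

# What a helper learns from the client's PORT command on the control
# connection (hypothetical structure, for illustration only).
Expectation = namedtuple(
    "Expectation", "server_ip control_port client_ip client_port"
)


def matches_active_data(exp, src_ip, src_port, dst_ip, dst_port, strict=True):
    """Return True if a server->client connection matches the expectation.

    strict=True  : source port must be control_port - 1 (20 for port 21).
    strict=False : any source port is accepted; only the addresses and
                   the announced client port are compared.
    """
    if (src_ip, dst_ip, dst_port) != (exp.server_ip, exp.client_ip, exp.client_port):
        return False
    if strict:
        return src_port == exp.control_port - 1
    return True


if __name__ == "__main__":
    exp = Expectation("192.168.1.11", 21, "192.168.1.254", 1030)
    # ftpd with connect_from_port_20=YES: the source port is 20.
    print(matches_active_data(exp, "192.168.1.11", 20, "192.168.1.254", 1030))
    # ftpd connecting from an arbitrary high port fails the strict check.
    print(matches_active_data(exp, "192.168.1.11", 32768, "192.168.1.254", 1030))
```

With `strict=False` the second connection would be accepted, which corresponds to the behaviour Mark observed with the iptables "RELATED" rule.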
Graeme Fowler graeme (at) graemef (dot) net 23 Aug 2006
This is a 2 port service.
Here's part of my /etc/services
    ftp-data        20/tcp    #File Transfer [Default Data]
    ftp-data        20/udp    #File Transfer [Default Data]
    ftp             21/tcp    #File Transfer [Control]
    ftp             21/udp    #File Transfer [Control]
To set up ftp with LVS, you schedule only port 21 for forwarding. While the realserver is listening on port 21, it calls the client from port 20 (i.e. it's not listening on port 20) rather than the client calling the realserver (through the director). You do not add entries for port 20 with ipvsadm. Port 20 is handled by persistence for LVS-DR and LVS-Tun. For active ftp with LVS-NAT, you don't need the ipvs ftp helper module (the ftp helper module is only needed for passive ftp, Wensong 17 Sep 2002) (however, see the section on the ftp helper module).
Here's a standard non-LVS active ftp session using phatcat. The ftp "client" machine (192.168.1.254) connects to the ftp server machine "sneezy" (192.168.1.11). Since two ports are involved, phatcat is run from two windows, xterm_1, xterm_2.
xterm_1:
    client:~# phatcat sneezy 21
    sneezy.mack.net [192.168.1.11] 21 (ftp) open
    220 sneezy.mack.net FTP server (Version wu-2.4.2-academ[BETA-15](1) Wed May 20 13:45:04 CDT 1998) ready.
    help
    214-The following commands are recognized (* =>'s unimplemented).
       USER    PORT    STOR    MSAM*   RNTO    NLST    MKD     CDUP
       PASS    PASV    APPE    MRSQ*   ABOR    SITE    XMKD    XCUP
       ACCT*   TYPE    MLFL*   MRCP*   DELE    SYST    RMD     STOU
       SMNT*   STRU    MAIL*   ALLO    CWD     STAT    XRMD    SIZE
       REIN*   MODE    MSND*   REST    XCWD    HELP    PWD     MDTM
       QUIT    RETR    MSOM*   RNFR    LIST    NOOP    XPWD
    214 Direct comments to ftp-bugs@sneezy.mack.net.
    user ftp
    331 Guest login ok, send your complete e-mail address as password.
    pass mack
    230 Guest login ok, access restrictions apply.
On the client, use netstat -an to find the highest unprivileged port in use (in this case port 1029).
xterm_2: tell the client to listen on the first unused port (here 1030).
    client:~# phatcat -l -p 1030
xterm_1: tell the ftpserver to connect to client:1030 (192,168,1,254,4,6) (1030 = 256 x 4 + 6), and then list the contents of the directory
    port 192,168,1,254,4,6
    200 PORT command successful.
    list
    150 Opening ASCII mode data connection for /bin/ls.
    226 Transfer complete.
xterm_2: receives the output of list.
    connect to [192.168.1.254] from (UNKNOWN) [192.168.1.11] 20
    total 9
    drwxr-xr-x   8 root     root         1024 Nov  6 20:15 .
    drwxr-xr-x   8 root     root         1024 Nov  6 20:15 ..
    drwxr-xr-x   2 root     root         1024 Apr  7  1998 bin
    drwxr-xr-x   2 root     root         1024 Aug 30  1993 etc
    drwxr-xr-x   2 root     root         1024 Dec  3  1993 incoming
    drwxr-xr-x   2 root     root         1024 Nov 17  1993 lib
    drwxr-xr-x   2 root     root         1024 Jun  4  2001 pub
    -rw-r--r--   1 root     root            0 Oct 24 13:24 this_is_sneezy
    drwxr-xr-x   3 root     root         1024 Aug 30  1993 usr
    -rw-r--r--   1 root     root          312 Aug  1  1994 welcome.msg
The ftpserver then closes the data connection from port 20 (i.e. you can't do a second listing over it).
xterm_1:
    list
    425 Can't build data connection: Connection refused.
xterm_2: on the ftp client, initiate another listener (on the next unused port).
    client:~# phatcat -l -p 1033
xterm_1: tell the ftp server to connect to client:1033 (1033 = 256 x 4 + 9), prepare for the transfer of an ascii file (type a), check the size of the file about to be downloaded (size welcome.msg), then retrieve it (retr welcome.msg). (The ftp server will then close the connection from port 20.)
    port 192,168,1,254,4,9
    200 PORT command successful.
    type a
    200 Type set to A.
    size welcome.msg
    213 317
    retr welcome.msg
    150 Opening ASCII mode data connection for welcome.msg (312 bytes).
    226 Transfer complete.
xterm_2: watch welcome.msg being delivered.
    connect to [192.168.1.254] from (UNKNOWN) [192.168.1.11] 20
    Welcome, archive user! This is an experimental FTP server. If have any
    unusual problems, please report them via e-mail to root@%L
    If you do have problems, please try using a dash (-) as the first character
    of your password -- this will turn off the continuation messages that may
    be confusing your ftp client.
xterm_1:say goodbye (the data connection has closed, so you can't list using the same connection).
    list
    425 Can't build data connection: Connection refused.
    quit
    221 Goodbye.
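The arguments to the port command in the session above encode the client's IP and port as six decimal bytes: four for the address, then the high and low bytes of the port (port = 256 x high + low). A minimal sketch of the arithmetic in Python follows; the function names `encode_port_arg` and `decode_port_arg` are made up for this illustration and are not part of any ftp client or of LVS.

```python
def encode_port_arg(ip, port):
    """Return the h1,h2,h3,h4,p1,p2 argument for the ftp PORT command."""
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % (ip,))
    if not 0 <= port <= 65535:
        raise ValueError("invalid port: %r" % (port,))
    # High byte first, then low byte.
    return ",".join(octets + [str(port // 256), str(port % 256)])


def decode_port_arg(arg):
    """Return (ip, port) from an h1,h2,h3,h4,p1,p2 argument."""
    fields = [int(f) for f in arg.split(",")]
    if len(fields) != 6 or not all(0 <= f <= 255 for f in fields):
        raise ValueError("expected six comma-separated bytes: %r" % (arg,))
    ip = ".".join(str(f) for f in fields[:4])
    return ip, fields[4] * 256 + fields[5]


if __name__ == "__main__":
    # The two listeners used in the session above.
    print("port " + encode_port_arg("192.168.1.254", 1030))
    print("port " + encode_port_arg("192.168.1.254", 1033))
    print(decode_port_arg("192,168,1,254,4,6"))
```

The same six-byte encoding appears in the server's reply to a PASV request, with the server's address and listening port in place of the client's.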
The example illustrates what happens with active ftp on LVS-DR without persistence (it is not going to work). Set up a working one network LVS-DR (i.e. all IPs are in the same network), add rules to forward ftp
Note: Here you are only running commands to forward port 21. You have not handled the data port 20 in any way.
    pip:/etc/lvs# ipvsadm -A -t lvs.mack.net:ftp -s rr
    pip:/etc/lvs# ipvsadm -a -t lvs.mack.net:ftp -r bashfull.mack.net -g -w 1
    pip:/etc/lvs# ipvsadm -a -t lvs.mack.net:ftp -r sneezy.mack.net -g -w 1
    pip:/etc/lvs# ipvsadm
    IP Virtual Server version 0.9.4 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    TCP  lvs.mack.net:ftp rr
      -> sneezy.mack.net:ftp         Route   1      0          0
      -> bashfull.mack.net:ftp       Route   1      0          0
Use phatcat (as above) to attempt to set up an ftp session with the VIP.
xterm_1:connect to VIP:ftp
    client:~# phatcat lvs 21
    lvs.mack.net [192.168.1.110] 21 (ftp) open
    220 sneezy.mack.net FTP server (Version wu-2.4.2-academ[BETA-15](1) Wed May 20 13:45:04 CDT 1998) ready.
    user ftp
    331 Guest login ok, send your complete e-mail address as password.
    pass mack
    230 Guest login ok, access restrictions apply.
With netstat -an on the realserver, note that the client is connected to VIP:21, not to RIP:21.
xterm_2:listen on the next available port
    client:~# phatcat -l -p 1036
xterm_1:tell the realserver to connect to client:1036, and then list the contents of /home/ftp. (The connection hangs for a while - eventually you'll get the 425 message).
    port 192,168,1,254,4,12
    200 PORT command successful.
    list
    425 Can't build data connection: Connection timed out.
On the realserver, netstat -an shows
    sneezy:/home/ftp# netstat -an | grep 103
    tcp        0      1 192.168.1.110:20        192.168.1.254:1036      SYN_SENT
    tcp        5      0 192.168.1.110:21        192.168.1.254:1035      ESTABLISHED
On the client, netstat -an shows that client is listening, but not connecting
    client:~# netstat -an | grep 103
    tcp        0      0 0.0.0.0:1036            0.0.0.0:*               LISTEN
    tcp        0      0 192.168.1.254:1035      192.168.1.110:21        ESTABLISHED
If you run tcpdump on the realserver when you run the list command, you'll see that the realserver is sending SYN packets from VIP:20->client:1036 but not receiving any replies. The problem is that the reply (SYN-ACK) from the client is sent to VIP:20, which is routed to the director, which has no forwarding rules for VIP:20. Even if the director had forwarding rules for VIP:20, it requires the first packet in a connection to be a SYN, to start the process of making an entry in the ipvsadm table for packets to port 20. Thus the director will reject the reply from the client to VIP:20 and no connection will be made.
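The director's decision described above can be mimicked with a toy model. This is a deliberately simplified sketch in Python and not the ip_vs code: the class `ToyDirector`, its method names and its return strings are all invented for illustration, and real IPVS tracks far more state than a set of tuples.

```python
class ToyDirector:
    """Toy model: forward only known connections or new SYNs to a service."""

    def __init__(self, services):
        self.services = set(services)  # {(vip, vport)} added with ipvsadm
        self.connections = set()       # {(cip, cport, vip, vport)}

    def handle(self, cip, cport, vip, vport, flags):
        key = (cip, cport, vip, vport)
        if key in self.connections:
            return "forwarded"
        if (vip, vport) not in self.services:
            return "not forwarded: no virtual service"
        # A new connection entry is only created by a bare SYN.
        if set(flags) != {"SYN"}:
            return "not forwarded: first packet is not a SYN"
        self.connections.add(key)
        return "forwarded"


if __name__ == "__main__":
    vip, client = "192.168.1.110", "192.168.1.254"
    director = ToyDirector({(vip, 21)})
    # Control connection: client:1035 -> VIP:21 starts with a SYN.
    print(director.handle(client, 1035, vip, 21, {"SYN"}))
    # Client's reply to the realserver's SYN from VIP:20 arrives at the director.
    print(director.handle(client, 1036, vip, 20, {"SYN", "ACK"}))
    # Adding a service for VIP:20 does not help: the packet is not a bare SYN.
    director.services.add((vip, 20))
    print(director.handle(client, 1036, vip, 20, {"SYN", "ACK"}))
```

In the model, as in the session above, the control connection is forwarded while the client's reply on the data connection never reaches the realserver.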
This is the normal method of setting up LVS-DR for ftp.
    pip:/etc/lvs# ipvsadm -A -t lvs.mack.net:ftp -s rr -p 600
    pip:/etc/lvs# ipvsadm -a -t lvs.mack.net:ftp -r bashfull.mack.net -g -w 1
    pip:/etc/lvs# ipvsadm -a -t lvs.mack.net:ftp -r sneezy.mack.net -g -w 1
    pip:/etc/lvs# ipvsadm
    IP Virtual Server version 0.9.4 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port          Forward Weight ActiveConn InActConn
    TCP  lvs.mack.net:ftp rr persistent 600
      -> sneezy.mack.net:ftp         Route   1      0          0
      -> bashfull.mack.net:ftp       Route   1      0          0
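The effect of "-p 600" in the setup above is that a client keeps being sent to the same realserver until the persistence timeout expires. The following Python sketch is a toy model of that stickiness only. It is not how ip_vs implements persistence templates: the class name `ToyPersistentScheduler` is hypothetical, time is passed in explicitly, and refreshing the timeout on every new connection is a simplifying assumption.

```python
import itertools


class ToyPersistentScheduler:
    """Round-robin scheduling with per-client stickiness (toy model)."""

    def __init__(self, realservers, timeout):
        self._rr = itertools.cycle(realservers)
        self.timeout = timeout
        self._templates = {}  # client_ip -> (realserver, expires_at)

    def pick(self, client_ip, now):
        entry = self._templates.get(client_ip)
        if entry is not None and now < entry[1]:
            realserver = entry[0]          # template still valid: stay put
        else:
            realserver = next(self._rr)    # no template: schedule normally
        # Assumption: each new connection restarts the persistence timer.
        self._templates[client_ip] = (realserver, now + self.timeout)
        return realserver


if __name__ == "__main__":
    sched = ToyPersistentScheduler(["sneezy", "bashfull"], timeout=600)
    print(sched.pick("192.168.1.254", now=0))    # first connection is scheduled
    print(sched.pick("192.168.1.254", now=100))  # same realserver as before
```

Without the persistence entry, each new connection from the client would simply be handed to the next realserver in the round-robin order.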
Passive ftp is used by netscape to get files from an ftp url like ftp://ftp.domain.com/pub/ . Here's an explanation of passive ftp from http://www.tm.net.my/learning/technotes/960513-36.html
If you can't open connections from Netscape Navigator through a firewall to ftp servers outside your site, then try configuring the firewall to allow outgoing connections on high-numbered ports.
Usually, ftp'ing involves opening a connection to an ftp server and then accepting a connection from the ftp server back to your computer on a randomly-chosen high-numbered telnet port. the connection from your computer is called the "control" connection, and the one from the ftp server is known as the "data" connection. All commands you send and the ftp server's responses to those commands will go over the control connection, but any data sent back (such as "ls" directory lists or actual file data in either direction) will go over the data connection.
However, this approach usually doesn't work through a firewall, which typically doesn't let any connections come in at all; In this case you might see your ftp connection appear to work, but then as soon as you do an "ls" or a "dir" or a "get", the connection will appear to hang.
Netscape Navigator uses a different method, known as "PASV" ("passive ftp"), to retrieve files from an ftp site. This means it opens a control connection to the ftp server, tells the ftp server to expect a second connection, then opens the data connection to the ftp server itself on a randomly-chosen high-numbered port. This works with most firewalls, unless your firewall restricts outgoing connections on high-numbered ports too, in which case you're out of luck (and you should tell your sysadmins about this).
"Passive FTP" is described as part of the ftp protocol specification in RFC 959 ("http://www.cis.ohio-state.edu/htbin/rfc/rfc959.html").
If you are setting up an LVS ftp farm, it is likely that users will retrieve files with a browser and you will need to setup the LVS to handle passive ftp. You will need the ftp helper module or persistent connection (also see on the LVS website under documentation; persistence handling in LVS) or fwmark persistent connection for ftp.
For passive ftp, the ftpd sets up a listener on a high port for the data transfer. The problem for LVS is that the IP for the listener is the RIP and not the VIP.
Wenzhuo Zhang 1 May 2001
I've been using 2.2.19 on my dialup masquerading box for quite some time. It doesn't seem to me that the option is required, whether in PASV or PORT mode. We can actually get ftp to work in NAT mode without using the ip_masq_ftp module. The trick is to tell the real ftp servers to use the VIP as the passive address for connections from outside; e.g. in wu-ftpd, add the following lines to the /etc/ftpaccess:
    passive address RIP <localnet>
    passive address 127.0.0.1 127.0.0.0/8
    passive address VIP 0.0.0.0/0
Of course, the ftp virtual service has to be persistent port 0.
Alois Treindl, 3 May 2001
I found (with kernel 2.2.19) that I needed the command
    modprobe ip_masq_ftp in_ports=21
so that (passive mode) ftp from Netscape would work. Without the in_ports=21 it did not work.
Julian Anastasov ja (at) ssi (dot) bg 03 May 2001
Yes, it seems this option is not useful for the active FTP transfers because if the data connection is not created while the client's PORT command is detected in the command stream, then it is created later when the internal realserver creates normal in->out connection to the client. So, it is not a fatal problem for active FTP to avoid this option. The only problem is that these two connections are independent and the command connection can die before the data connection, for long transfers. With the in_ports option used this can not happen.
Note: Joe - in previous HOWTOs I had a comment from Julian saying that the ftp helper was "recommended" for active ftp (presumably not required). Presumably this is what he's talking about.
The fatal problems come for the passive transfers when the data connection from the client must hit the LVS service. For this, the ip_masq_ftp module must detect the 227 response from the realserver in the in->out packets and to open a hole for the client's data connection. And the "good" news is that this works only with in_ports/in_mark options used.
Alois
on option so that I could configure on the server that it gives the VIP to clients making a PASV request; it always gives the realserver IP address in replies to such requests.
Bad ftpd :) It seems the following rules are valid:
Jeremy Kusnetz:
although Julian says that all you need for ftp with LVS-NAT is the ip_masq_ftp module, it doesn't work for me (director 2.2.19-1.0.7 with ip_masq_ftp in_ports=21) my ftp client just hangs.
Julian
The Netfilter guys use another approach when detecting the 227 message in Linux 2.4, i.e. they try to ignore the message and to use only the code (I'm not sure what is the final status of this handling there). But in Linux 2.2 the word "Entering" may be a requirement :( You have to select another FTPd, IMO.
Jeremy Kusnetz JKusnetz (at) nrtc (dot) org 24 May 2001
It was my ftp server. When going into passive mode it said:
    Passive mode on (x,x,x,x,x,x)
instead of:
    Entering Passive Mode (x,x,x,x,x,x)
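The two ways of recognising the 227 reply mentioned in this exchange (requiring the words "Entering Passive Mode" versus relying only on the reply code and the six numbers) can be compared with a short sketch. This is illustrative Python, not the ip_masq_ftp, ip_vs_ftp or netfilter code; the regular expressions and the function `parse_227` are assumptions made for this example.

```python
import re

# Strict: the reply must contain the literal text, as Julian suspects
# the Linux 2.2 helper requires.
STRICT_227 = re.compile(
    r"^227 Entering Passive Mode \((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)"
)
# Lenient: only the 227 code and the first run of six numbers matter.
LENIENT_227 = re.compile(r"^227 .*?(\d+),(\d+),(\d+),(\d+),(\d+),(\d+)")


def parse_227(line, pattern):
    """Return (ip, port) announced in a 227 reply, or None if no match."""
    match = pattern.search(line)
    if match is None:
        return None
    fields = [int(g) for g in match.groups()]
    if any(f > 255 for f in fields):
        return None
    ip = ".".join(str(f) for f in fields[:4])
    return ip, fields[4] * 256 + fields[5]


if __name__ == "__main__":
    standard = "227 Entering Passive Mode (192,168,1,110,4,12)"
    odd = "227 Passive mode on (192,168,1,110,4,12)"
    print(parse_227(standard, STRICT_227))
    print(parse_227(odd, STRICT_227))    # None: wording differs
    print(parse_227(odd, LENIENT_227))
```

An ftpd that answers like Jeremy's would be invisible to a helper using the strict form, which is consistent with his client hanging after the PASV request.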
In early 2005 Johan van den Berg and Simon Schwendemann sent a report of a problem with LVS-NAT (2.4.x) where the ACK reply to a SYN would not be source-NAT'ed and so would emerge with src_addr=RIP and not src_addr=VIP (http://archive.linuxvirtualserver.org/html/lvs-users/2005-02/msg00299.html). Johan van den Berg switched to using LVS-DR.
Even to figure this out took a while. Initially only one in 60 or so SYNs would have the problem. No-one had any idea what the problem was and cries for help were greeted by silence. Then Jari Takkala Jari (dot) Takkala (at) Q9 (dot) com 15 Aug 2005 found it only occurred when the LVS-NAT was forwarding ftp, but the problem occurred on all VIPs, not just the VIP that had the ftp service.
With Jari's posting, other people started to recognise the problem too.
Graeme Fowler graeme (at) graemef (dot) net 16 Aug 2005
This is very interesting; I have a number of clusters behind LVS-NAT and hadn't managed to observe myself that the one having problems - which I posted about sometime in the last year - is the only one of the whole lot which has ip_vs_ftp loaded. It's also a 2.4.x kernel, and can't be in-service upgraded.
Julian Anastasov ja (at) ssi (dot) bg 26 Aug 2005
I can not reproduce it, I tried with 2.4.32-pre3 as it contains some changes. Can you show your vs settings?:
    grep . /proc/sys/net/ipv4/vs/*
So, you don't have any iptables rules, fwmarking, NAT or linux ethernet bridging? Any extra patches for IPVS?
From your explanation ip_vs_ftp leads to problems where SYN creates web connection, it is hashed in table, DNAT-ed to RS, then RS replies SYN+ACK which can not match the connection in table. It looks like this connection is not present (may be removed, do you see something in debug logs from the SYN to the SYN+ACK) or the hash table is damaged. Do you still think it is caused by ip_vs_ftp? About your tests, is the client IP on lan? Do you think this client IP has many connections to the director?
Jari (data dumps omitted)
The client IP is not on the LAN. The problem occurs from any source IP trying to visit a load balanced VIP. Whenever we add the FTP service to ipvsadm, and begin load balancing to it, the problem begins to occur on all services. However, it is not consistent. Some outgoing SYN+ACK packets will get translated correctly for a certain period of time, then after awhile some packets will not be translated. I do not think it is load related. We have other load balancers built from the same image handling many more connections.
There were various discussions (under the title "LVS bugs") between Julian and Agostino di Salle a (dot) disalle (at) fineco (dot) it that you can find in the archives if you want to know more.
Julian
As reported from some users, the ip_nat_ftp module causes some problems with other virtual services. ip_nat_ftp can keep ip_vs_conn_no_cport_cnt > 0 for the time it expects connections from unknown client ports. This is fatal for the persistence services as the normal packets start to hit persistence templates instead of valid connections. Such packets are correctly forwarded to real servers but the reply packets do not see connections as they are not created. As result, the reply packets are not SNAT-ed by the IPVS code.
It is enough to have passive FTP connection that waits to learn its client port to trigger problems with non-ftp persistent services. The used VIPs do not matter.
I tried to fix this problem with the following patch: Linux 2.6.13: http://www.ssi.bg/~ja/tmp/ipvs-2.6/ct-2.6.13-1.diff, Linux 2.4.32-pre3: http://www.ssi.bg/~ja/tmp/ipvs-2.4/ct-2.4.32-pre3-1.diff
These patches do the following:
There is a second patch that properly invalidates the templates as Agostino di Salle noticed: Linux 2.6.13: http://www.ssi.bg/~ja/tmp/ipvs-2.6/invct-2.6.13-1.diff Linux 2.4.32-pre3: http://www.ssi.bg/~ja/tmp/ipvs-2.4/invct-2.4.32-pre3-1.diff
I performed simple tests, so please test these patches, for example, persistence+ip_nat_ftp, the ip_vs_sync code is changed too. If there is a better solution please speak before including them in next kernel releases. I'm expecting confirmation from people with the problem that reply packets were not translated from IPVS.
Jari Takkala Jari (dot) Takkala (at) Q9 (dot) com 9 Sep 2005
We applied these patches to a production load balancer on kernel 2.4.26. Our IPVS code is one version behind, however the patches applied cleanly. We began load balancing FTP last night, and so far everything is working properly. Thanks very much for your help!
The patches worked for Graeme Fowler too.
Julian thinks this problem has been affecting people for a while.
Julian Anastasov ja (at) ssi (dot) bg 12 Sep 2005
thanks to Graeme and to Jari for the tests. It seems the problems reported from many users in last 2 years and more are now fixed.
Roberto Nibali ratz (at) tac (dot) ch 06 May 2001
If you are trying to secure the LVS by using the LVS as a packetfilter, you will not have much success doing it for the ftp protocol, because it is so open. You can do a lot to minimize full breaches. At least put the ftp daemon in a chroot environment.
We have multiple choices if we want to narrow down the input ipchains rules on the front interface of director
The biggest problem is with the ip_masq_ftp module. It should create an ip_fw entry in the masq_table for the PORT port. It doesn't do this and we have to open the whole port range. For PASV we have to DNAT the range.
    ipchains -A forward -i $EXT_IF -s $INTERNAL_NET $UNPRIV_PORTS -d $DEP -j MASQ
FTP is made up of two connections, the Control- and the Data- Connection.
ftp Control Connection
The client contacts the server's port 21 from an UNPRIV port. No trouble, standard, plain, vanilla TCP-Connection, we all love it. Over this connection the client sends commands to the server. We will see examples later.
FTP Data Connection
"Data" can be either the content of a file (sent as e.g. the result of a "get" or "put" command) or the content of a directory-listing (i.e. the result of a "ls" or "dir" command).
The data connection is where the trouble starts. To transfer data, a second connection is opened.
Usually the client opens this second connection to the server. But for active ftp, the server opens this second connection, using the well-known port 20 (called ftp-data) as source port. But which port on the client should it connect to? The client announces the port via a "port"-command over the control connection. This is nasty: ports are negotiated at the application level, where L4 switches like LVS cannot see what's going on.
For passive ftp, the server announces the port the client should connect to in its reply to the client's "pasv"-command (this command starts passive FTP, active is the default). The client then opens the data-connection to the server. The port that the server listens on is an unprivileged port (rather than a privileged port as is normal for internet services). A passive ftp transfer then requires that connections be allowed between all of the roughly 64000 unprivileged ports (1024-65535) on both the client and realservers rather than just one. A passive ftp server is difficult to secure with packet filter rules.
If we have to protect a client, we would like to only allow passive ftp, because then we do not have to allow incoming connections. If we have to protect a server, we would like to only allow active ftp, because then we only have to allow the incoming control-connection. This is a deadlock.
We need 2 xterms (x1, x2), phatcat and an ftp-server (here "ftpserver" 172.23.2.30).
First passive mode (because it is conceptually easier)
    #x1: Open the control-connection to the server,
    #and send the command "pasv" to the server.
    $ phatcat ftpserver 21
    220 ftpserver.terreactive.ch FTP server (Version 6.4/OpenBSD/Linux-ftpd-0.16) ready.
    user ftp
    331 Guest login ok, send your complete e-mail address as password.
    pass ftp
    230 Guest login ok, access restrictions apply.
    pasv
    227 Entering Passive Mode (172,23,2,30,169,29)
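The server replied with 6 numbers: the first four are the server's IP address (172.23.2.30) and the last two encode the port it is listening on (169*256+29 = 43293).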
In x2 I open a second connection with a second phatcat
    $ phatcat 172.23.2.30 43293
    # x2 will now display output from this connection
Now in x1 (the control-connection)
$ list list 150 Opening ASCII mode data connection for '/bin/ls'. 226 Transfer complete. |
and in x2 the listing appears.
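The port used in x2 can be recovered mechanically from the 227 reply. The following is a minimal sketch, not part of the original session: the function name `parse_pasv_reply` is made up for illustration, and it assumes the reply carries the usual parenthesised six-number tuple seen in the transcript above.

```python
import re
from typing import Tuple


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Return (ip, port) from a '227 Entering Passive Mode (h1,h2,h3,h4,p1,p2)' line.

    Hypothetical helper: assumes the six numbers are enclosed in parentheses,
    as in the transcript above.
    """
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError("no address tuple found in reply: %r" % reply)
    numbers = [int(n) for n in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("each of the six fields must fit in one byte")
    ip = ".".join(str(n) for n in numbers[:4])
    # The last two numbers are the high and low bytes of the port.
    port = numbers[4] * 256 + numbers[5]
    return ip, port


if __name__ == "__main__":
    print(parse_pasv_reply("227 Entering Passive Mode (172,23,2,30,169,29)"))
```

A director that does not read the payload never sees these six numbers, which is why the data connection cannot be anticipated by packet filter rules alone.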
Active ftp
I use the same control-connection in x1 as above, but I want the server to open a connection. Therefore I first need a listener. I do it with phatcat in x2:
    $ phatcat -l -p 2560
Now I tell the server on the control connection to connect (2560=10*256+0)
    port 172,23,2,8,10,0
    200 PORT command successful.
Now you see why I used port 2560. 172.23.2.8 is, of course, my own IP-address. And now, using x1, I ask for a directory-listing with the list command, and it appears in x2. For completeness' sake, here is the full in/output.
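The argument of the port command is built the other way round: the listener's address and port are split into six byte values. A small sketch under the same assumptions as the session above; `port_argument` is an illustrative name, not a function of any ftp client.

```python
def port_argument(ip: str, port: int) -> str:
    """Build the 'h1,h2,h3,h4,p1,p2' argument of the FTP port command.

    Hypothetical helper for illustration only.
    """
    octets = ip.split(".")
    if len(octets) != 4 or not all(o.isdigit() and int(o) <= 255 for o in octets):
        raise ValueError("not a dotted-quad IPv4 address: %r" % ip)
    if not 0 < port < 65536:
        raise ValueError("port out of range: %r" % port)
    # Split the 16-bit port into its high and low byte.
    high, low = divmod(port, 256)
    return ",".join(octets + [str(high), str(low)])


if __name__ == "__main__":
    # The listener from the session above: 172.23.2.8, port 2560.
    print("port " + port_argument("172.23.2.8", 2560))
```

Choosing 2560 keeps the low byte at zero, which makes the arithmetic easy to do by hand when typing the command into the control connection.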
First the xterm 1:
    phatcat ftpserver 21
    220 ftpserver.terreactive.ch FTP server (Version 6.4/OpenBSD/Linux-ftpd-0.16) ready.
    user ftp
    331 Guest login ok, send your complete e-mail address as password.
    pass ftp
    230 Guest login ok, access restrictions apply.
    pasv
    227 Entering Passive Mode (172,23,2,30,169,29)
    list
    150 Opening ASCII mode data connection for '/bin/ls'.
    226 Transfer complete.
    port 172,23,2,8,10,0
    200 PORT command successful.
    list
    150 Opening ASCII mode data connection for '/bin/ls'.
    226 Transfer complete.
    quit
    221 Goodbye.
xterm 2:
    phatcat 172.23.2.30 43293
    total 7
    dr-x--x--x   2 root root 1024 Jul 26  2000 bin
    drwxr-xr-x   2 root root 1024 Jul 26  2000 dev
    dr-x--x--x   2 root root 1024 Aug 20  2000 etc
    drwxr-xr-x   2 root root 1024 Jul 26  2000 lib
    drwxr-xr-x   2 root root 1024 Jul 26  2000 msgs
    dr-xr-xr-x  11 root root 1024 Mar 15 14:26 pub
    drwxr-xr-x   3 root root 1024 Mar 11  2000 usr

    phatcat -l -p 2560
    total 7
    dr-x--x--x   2 root root 1024 Jul 26  2000 bin
    drwxr-xr-x   2 root root 1024 Jul 26  2000 dev
    dr-x--x--x   2 root root 1024 Aug 20  2000 etc
    drwxr-xr-x   2 root root 1024 Jul 26  2000 lib
    drwxr-xr-x   2 root root 1024 Jul 26  2000 msgs
    dr-xr-xr-x  11 root root 1024 Mar 15 14:26 pub
    drwxr-xr-x   3 root root 1024 Mar 11  2000 usr
Joe
I see that ftp is hard to make secure and your prime recommendation is to have an ftp server isolated from all other machines. Do you recommend that people not use ftp and instead use, say, http for LVSs that are delivering files? I don't like http for file download. At home (28k phone ppp link) if I do anything else over the line (like load a webpage) while doing a download, the download stalls and doesn't start up again. This is a pain as a 10M file takes 2hrs and I have to start again.
Joe Cooper joe (at) swelltech (dot) com 07 May 2001
    wget -c http://url
will solve that problem.
sftp is now available as part of the openssh packages I believe, but requires clients to have a recent version of openssh -- probably not what folks want if they have enough clients to justify an LVS cluster. I don't think LVS really has anything to do with whether someone should use ftp for security reasons or not. Securing ftp is a separate issue from securing LVS.
Note: This is not ftp port forwarded through ssh (see port forwarded ftp), nor is it sftp.
From Ratz, 30 Nov 2003, see http://www.stunnel.org/examples/ftp.html FTP+SSL, FTP+TLS. There are two deprecated methods of doing SSL+FTP. Make sure that what you're doing and talking about is http://www.ietf.org/rfc/rfc2228.txt RFC2228 ftps. The session starts by the client connecting to port 21 and issuing the "PROT P" command. Quite what happens after that I don't know (which ports, are the packets encrypted?).
Kai
I am using LVS/NAT with ssl based ftp. I can ftp via realserver by using either port mode or passive mode.
ratz 29 Nov 2003
Over the director, correct?
For security reasons SSL based ftp was required. After adding ssl based ftp auth to the realservers, the client computers cannot connect to the realserver with passive mode, but port mode works well.
IIRC you need to load balance port 22 too.
I think the problem is that the data which the ftp server sends to the client, including the server's passive port, is encrypted by ssl, so the LVS doesn't know which port should be translated and opened.
AFAICR this isn't the issue. The client receives the PASV command and then translates the PORT into a local ssh tunnel forward. So I think you have to also load balance port 22 TCP. You can use the port 0 feature :).
Kai reposted this on 19 Feb 2004
I think the problem is that the data which the ftp server sends to the client, including the server's passive port, is encrypted by SSL. So the LVS doesn't know which port should be translated and opened.
Horms 20 Feb 2004
Yes, that sounds likely. Try tracing the traffic using something like ngrep.
Does LVS support the SSL based FTP? If not, is there any solution?
If your guess is correct, then no. Well, not unless you get the linux director to handle the ssl and just talk plain-text to the real-servers, but then that isn't LVS.
(from the IPCHAINS-HOWTO) DNS doesn't always use UDP; if the reply from the server exceeds 512 bytes, the client uses a TCP connection to port number 53, to get the data. Usually this is for a zone transfer.
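The switch from UDP to TCP is signalled inside the DNS reply itself: a server that cannot fit its answer into a UDP reply sets the truncation (TC) flag in the header, and the client then repeats the query over TCP to port 53. The TC flag and its position come from RFC 1035, not from the text above. The sketch below only inspects a raw reply header; the function name and the sample headers are made up for illustration.

```python
import struct

TC_FLAG = 0x0200       # truncation bit in the 16-bit flags field (RFC 1035)
DNS_HEADER_LEN = 12    # fixed size of the DNS header in bytes


def is_truncated(message: bytes) -> bool:
    """Return True if a raw DNS message has the TC (truncated) flag set."""
    if len(message) < DNS_HEADER_LEN:
        raise ValueError("message shorter than a DNS header")
    # The header starts with a 16-bit ID followed by the 16-bit flags field.
    _query_id, flags = struct.unpack("!HH", message[:4])
    return bool(flags & TC_FLAG)


if __name__ == "__main__":
    # Two hand-built headers (ID, flags, four section counts); no network involved.
    complete = struct.pack("!HHHHHH", 0x1234, 0x8180, 1, 1, 0, 0)
    truncated = struct.pack("!HHHHHH", 0x1234, 0x8380, 1, 0, 0, 0)
    for name, reply in (("complete", complete), ("truncated", truncated)):
        action = "retry over tcp/53" if is_truncated(reply) else "use udp reply"
        print(name, "->", action)
```

For an LVS this is the practical reason both a udp and a tcp virtual service are needed on port 53: the retry arrives as a new, independent tcp connection.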
The name resolution process (see LVS-HOWTO.services.general.html#name_resolution) is broken. It's possible for a client (resolver) to get a reply from a hung nameserver which it interprets as "no resolution for that name", rather than allowing the client to go on to the next nameserver in the list. This is a design flaw that will take some fixing (all the clients and all the nameservers must be fixed). DNS should have its own failover mechanism, but it doesn't. In the meantime, some other failover mechanism will have to present a perfect nameserver to the clients.
There is no consensus amongst people running LVS as to whether it's best to have named/bind LVS'ed or just to have a set of machines in a failover setup (Horms, 2 Oct 2006, is of the opinion that named can't be load balanced by nature of the protocol). If you're running DNS in a failover setup, you might think that you could have one primary machine and a secondary machine and that on failover of the primary you could promote the secondary to be the primary. By design of DNS, there can only be one primary machine. The primary and secondary have different config files and it's not simple to programmatically switch the secondary into the primary role (it can be done, it just requires some thinking). A failover DNS setup then requires two machines with identical config files, one as the master and one as the backup. However if you have dhcpd running on the network, the primary name server machine will be updated continuously with the addresses from the dhcpd, which the backup primary will not get. On failover, you will lose name resolution on these addresses until they renew their lease.
If you are going to LVS named, and are running LVS-DR or LVS-Tun, as usual make sure your named is listening on the VIP (not the RIP).
dhcpd has its own failover/redundancy mechanism. You can't LVS a dhcpd server - it has a database of its leases and no other machine can have the same list. dhcpd can be setup with multiple dhcpd servers on the same network and they pass the updates to each other. Unfortunately it doesn't work - you get to a stage where one machine will mistakenly think that another machine is in charge of all the IPs and both machines refuse to answer requests. The problem has been posted to the dhcpd mailing list for several years without any answers from the dhcpd authors. The only thing to do when this happens is to kill all the dhcpd servers, erase the lease table files, touch new ones, and start the servers again. I went back to having only one dhcpd server and left the other one turned off waiting as a backup.
This setup for LVS'ing named is from Ted Pavlic. Two (independent) connections, tcp and udp to port 53, are needed.
Here is part of an lvs.conf file which has dns on two realservers.
    #dns, note: need both udp and tcp
    #A realserver must be able to determine its own name.
    #(log onto machine from console and use nslookup
    # to see if it knows who it is)
    # and to do DNS on the VIP and name associated with the VIP
    #To test a running LVS, on client machine, run nslookup and set server = VIP.
    SERVICE=t dns wlc 192.168.1.1 192.168.1.8
    SERVICE=u dns wlc 192.168.1.1 192.168.1.8
If the LVS is run without mon, then any setup that allows the realservers to resolve names is fine (ie if you can sit at the console of each realserver and run nslookup, you're OK).
If the LVS is run with mon (e.g. for production), then dns needs to be setup in a way that dns.monitor can tell if the LVS'ed form of dns is working. When dns.monitor tests a realserver for valid dns service, it first asks for the zone serial number from the authoritative (SOA) nameserver of the virtualserver's domain. This is compared with the serialnumber for the zone returned from the realserver. If these match then dns.monitor declares that the realserver's dns is working.
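The comparison described above can be sketched as follows. This is not the code of dns.monitor; it only illustrates the logic of comparing serial numbers. The lookup is passed in as a function so that the sketch runs without a network, and the names `realserver_dns_ok`, `fake_lookup`, "master", "rs1", "rs2" and "example.org" are all made up. Treating an unanswered query as a failure is an assumption.

```python
from typing import Callable, Optional

# (server, zone) -> SOA serial number, or None if the server did not answer.
SerialLookup = Callable[[str, str], Optional[int]]


def realserver_dns_ok(zone: str, authoritative: str, realserver: str,
                      lookup_serial: SerialLookup) -> bool:
    """Declare the realserver healthy only if its zone serial matches the SOA's."""
    reference = lookup_serial(authoritative, zone)
    candidate = lookup_serial(realserver, zone)
    if reference is None or candidate is None:
        # Assumption: no answer from either side counts as a failed check.
        return False
    return reference == candidate


if __name__ == "__main__":
    # Hypothetical serial numbers standing in for real DNS queries.
    serials = {
        ("master", "example.org"): 2004012001,
        ("rs1", "example.org"): 2004012001,   # up to date
        ("rs2", "example.org"): 2004011501,   # stale secondary
    }

    def fake_lookup(server: str, zone: str) -> Optional[int]:
        return serials.get((server, zone))

    for rs in ("rs1", "rs2", "rs3"):
        print(rs, realserver_dns_ok("example.org", "master", rs, fake_lookup))
```

The stale case is the one to watch for in practice: a realserver that answers queries correctly but has not yet picked up a new serial would be taken out of service by a check of this kind.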
The simplest way of setting up an LVS dns server is for the realservers to be secondaries (writing their secondary zone info to local files, so that you can look at the date and contents of the files) and some other machine (e.g. the director) to be the authoritative nameserver. Any changes to the authoritative nameserver (say the director) will have to be propagated to the secondaries (here the realservers) (delete the secondary's zone files and HUP named on the realservers). After the HUP, new files will be created on the secondary nameservers (the realservers) with the time of the HUP and with the new serial numbers. If the files on the secondary nameservers are not deleted before the HUP, then they will not be updated till the refresh/expire time in the zonefile and the secondary nameservers will appear to dns.monitor to not be working.
LVS is no better than DNS for the same number of working DNS servers. However if a DNS server fails...
Nick Burrett nick (at) dsvr (dot) net 20 Jan 2004
Consider a client with a resolv.conf with IPs:
    10.0.0.10
    10.0.0.11
If 10.0.0.10 is taken offline, then the client application's speed at getting domains resolved is drastically reduced, because the resolver library will always query 10.0.0.10 before querying 10.0.0.11. Sticking DNS behind LVS alleviates this. Monitoring software will failout the dead DNS realserver.
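The cost of a dead first nameserver can be put into numbers with a small model. This is a simplified sketch, not a description of any particular resolver library: it assumes servers are tried strictly in resolv.conf order with one fixed timeout per dead server. The 2 second timeout is the figure quoted by Peter Mueller later in this section, and the 20 ms answer time is an arbitrary illustrative value.

```python
from typing import Optional, Sequence


def lookup_delay_ms(servers_alive: Sequence[bool],
                    timeout_ms: int = 2000,
                    answer_ms: int = 20) -> Optional[int]:
    """Delay of one lookup when nameservers are tried in list order.

    servers_alive[i] says whether the i-th resolv.conf entry answers.
    Returns None if no server answers at all.
    """
    waited = 0
    for alive in servers_alive:
        if alive:
            return waited + answer_ms
        # A dead server costs a full timeout before the next one is tried.
        waited += timeout_ms
    return None


if __name__ == "__main__":
    print("both up:        ", lookup_delay_ms([True, True]), "ms")
    print("10.0.0.10 down: ", lookup_delay_ms([False, True]), "ms")
    # Behind an LVS the client sees a single VIP; once monitoring has removed
    # the dead realserver, every query reaches a live one.
    print("VIP, after failout:", lookup_delay_ms([True]), "ms")
```

Under these assumptions every single lookup pays the timeout while the first server is down, which is the slowdown Nick describes.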
anon
I'm planning to put my company's dns on lvs with ha.
Greg Woods woods (at) ucar (dot) edu 30 Aug 2002
Unless you have a really unusual situation, I think using LVS for DNS is massive overkill. There is no way that DNS load should overwhelm a single server. If it does, you probably are in dire need of some subdomains. What I do here is just use the heartbeat code so that the hot spare backup machine will take over if the primary goes down, and I do have a restart script that uses scp to move the data files that have been modified over to the backup machine. scp is called out of a script that will keep trying the scp until it succeeds, in case the backup machine is down at the time a change is made. This seems to work for us.
I do use LVS for our mail system, but then, the mail system does anti-spam IP address blacklist checking, and virus scanning. That means the overhead of establishing a connection through LVS is small compared to the load on the server to process a connection. I don't think this is the case for DNS.
Jeff Kilbride
Does anybody else agree that load balancing DNS servers with LVS is not worthwhile?
Peter Mueller pmueller (at) sidestep (dot) com 2005/04/18
Yes, for authoritative. ISC-bind has some kind of response-latency measurement built-in. For client side, LVS is useful. In the event of the first server in /etc/resolv.conf failing, there's a 2 second timeout that can be avoided.
If you're LVS'ing named, you may wind up with many VIP's on your director.
The problems to be solved with setting up an LVS'ed samba are
Lapin(c) lapin (at) linagora (dot) com 04 Mar 2004
Here is a draft for an LVS-Samba HOWTO (http://www.lapinux.org/howto/) that load balances Samba with LVS-NAT. There are still modifications to add and some tricks to point out, but all feedback will be helpful.
I just tried to make the samba realservers invisible to each other with iptables rules. The only visible machines are an LDAP server and the director. It still has some (undocumented) drawbacks, but I can authenticate against 2 samba realservers and I can access shares on each of them (directly in their filesystem). Unsolved is the problem of sync for the shares: I've thought about a SAN, or some DRBD cross definition. I still have to solve this.
Joe: This is big news. I haven't read all of Fred's docs yet, or set one of these up, but Fred seems to have solved the many reader/ single writer problem by having a single LDAP database for all Samba servers and by having (or assuming) a single file system for the shares.
Will McDonald wmcdonald (at) gmail (dot) com 21 Mar 2006
We have a simple Samba share available on some systems sat behind a pair of LVSs. We have 2 directors in Active/Passive NATing through to two realservers running Heartbeat in Active/Passive. So only one of the realservers has the Heartbeat managed VIP the LVSs NAT through to at any one time. I know for our purposes the realservers could just sit on the same subnet as our other servers but this is an inherited setup and there are other reasonable reasons for it to be like this. Samba's not the realservers *primary* role, there are other services too. The reason they're Active/Passive is because DRBD devices can only be mounted on one node at any one time.
The LVSs are running CentOS4 and the repackaged Ultramonkey packages out of the CentOS Extras repository
    heartbeat-ldirectord-1.2.3.cvs.20050927-1.centos4
    heartbeat-stonith-1.2.3.cvs.20050927-1.centos4
    heartbeat-pils-1.2.3.cvs.20050927-1.centos4
    heartbeat-1.2.3.cvs.20050927-1.centos4

    ipvsadm: #
    TCP  192.168.24.45:445 rr persistent 600
      -> 192.168.25.10:445            Masq    1      2          0
    TCP  192.168.24.45:139 rr persistent 600
      -> 192.168.25.10:139            Masq    1      0          0
    UDP  192.168.24.45:137 rr persistent 600
      -> 192.168.25.10:137            Masq    1      0          0
    UDP  192.168.24.45:138 rr persistent 600
      -> 192.168.25.10:138            Masq    1      0          0
The ldirectord.cf on the LVSs looks as follows...
    # TEST SAMBA THROUGH TO DBVIP
    virtual=192.168.24.45:137
            real=192.168.25.10:137 masq
            service=none
            scheduler=rr
            persistent=600
            protocol=udp

    # TEST SAMBA THROUGH TO DBVIP
    virtual=192.168.24.45:138
            real=192.168.25.10:138 masq
            service=none
            scheduler=rr
            persistent=600
            protocol=udp

    # TEST SAMBA THROUGH TO DBVIP
    virtual=192.168.24.45:139
            real=192.168.25.10:139 masq
            service=none
            scheduler=rr
            persistent=600
            protocol=tcp

    # TEST SAMBA THROUGH TO DBVIP
    virtual=192.168.24.45:445
            real=192.168.25.10:445 masq
            service=none
            scheduler=rr
            persistent=600
            protocol=tcp
The back-end boxes are FC3 running Ultramonkey packages again, and DRBD for disk replication.
    heartbeat-stonith-1.2.3-2.fr.c.1
    heartbeat-pils-1.2.3-2.fr.c.1
    heartbeat-1.2.3-2.fr.c.1
Samba startup is handled from /etc/ha.d/haresources by simply including "smb" as a resource which starts/stops on failover. The smb.conf's very simple too...
    [global]
       server string = Samba on %h
       hosts allow = 192.168.24.
       log file = /var/log/samba/%m.log
       max log size = 5000
       security = share
       socket options = TCP_NODELAY SO_RCVBUF=8192 SO_SNDBUF=8192
       interfaces = 192.168.25.6/32 192.168.25.10/32
       dns proxy = no

    [ftp]
       comment = Test FTP Homes
       browseable = yes
       writeable = yes
       guest ok = yes
       path = /mnt/sharedhomes/
This has been pretty reliable but it's not high volume by any stretch of the imagination. Nor is it attached to a domain so I'm not sure how you'll get on with browser mastering etc.
The topic of serving samba on an LVS was first raised by John Rodkey rodkey (at) wesmont (dot) edu who wondered if he could serve 300 w2k machines with samba/LVS. Not knowing much about SMB I put out a request for help on samba-technical (at) lists (dot) samba (dot) org. I got replies from several people, including the samba developer Chris Hertal, and from Ryan Fox (who had setup an LVS and who had even read the LVS-HOWTO). I also got a free 2hr phone tutorial by John Terpstra jht (at) samba (dot) org on 26 Oct 2001. One of the big problems is that samba is peer-peer, while LVS works with server-client connections. Wensong has said on the mailing list that you can use samba in read-only mode over LVS, but this will not be of much use to a bunch of windows boxes.
Apparently there's a lot of interest in the commercial world in highly available samba clustering and some effort has been put into making LVS work with samba, by people who don't come up on the LVS mailing list. No-one has succeeded and now it's generally thought that LVS is not the way to go.
Here's John's tutorial as I copied it down over the phone. Thanks John for your help and time.
Microsoft's clustering service is derived from the DEC Wolfpack, which was originally a bulletproof, all things to all men, industrial strength, cluster and failout framework. Microsoft is using the part of Wolfpack that corresponds to Linux-HA.
Most communication between windows machines in setting up logins, finding resources (printers, disks, network) is between peers, rather than server/client as for unix. Any machine then will be able to find the resources on the network, whereas in unix, the clients have to find out by some mechanism external to the host (e.g. phone up the sysadmin). The same ports are used at both ends and communication is usually by broadcast (at least initially). Thus there is no distinction between a samba server and a samba client. One host may have the files the user wants and to unix people, this host would be the server. But in windows there are two peers: one machine has a file and the other machine may want it - the role can in principle be reversed without any change in the setup. Election amongst peers is used to determine who will have the role of knowing the location of other resources (e.g. becoming the domain node controller). Unlike unix where you setup a machine deliberately to be a server, by setting up demons listening on a socket, with windows you cannot guarantee that a certain machine will assume a particular role. You can bias the election (e.g. machines which have been up longer have more weight), but you can't rig the election. If you have to bring a machine down, it's down and it's less likely to win any new elections (possibly for a long time). In an LVS clustered samba setup (if such a thing could be made to exist), a long running client machine out on the internet might win the election and assume the role of locator service.
Communication between machines using IP is in fact encapsulating netware (if running Novell) or NetBEUI datagrams inside IP. Samba uses netbios over IP.
To unix people the network is sometimes thought of as a hardware layer. To windows, the network is a netbios messaging layer. Two windows machines could be connected by several protocols (NetBEUI, netbios, tcpip) over the same piece of wire (ethernet). These connections are regarded as being separate and independent networks - i.e. they use different names for the machines at each end.
Here is an application talking to the kernel in windows.
                ----------
               |          |
               |   app    |
               |          |
                ----------
                    |
                    |                 user space
            __________________
                    |
                    |                 kernel space
                    |
                ----------
               |          |
               | WIN32API |   communicates with cloud by broadcast messages
               |          |
                ----------
                    |
     /--------------------------------------------\
     |                    CLOUD                    |
     |   replies to bcast messages from WIN32API   |
     |                                             |
     |   ---------    ---------    ------------    |
     |  |         |  |         |  |            |   |
     |  | locator |  | SMB API |  | redirector |   |
     |  | service |  |         |  |            |   |
     |   ---------    ---------    ------------    |
     |                                             |
     |   ---------    -----------    -----------   |
     |  | file    |  | local     |  | remote    |  |
     |  | system  |  | procedure |  | procedure |  |
     |  | drivers |  | calls     |  | calls     |  |
     |   ---------    -----------    -----------   |
     |                                             |
     \--------------------------------------------/
WIN32API - all communication with cloud is by broadcast. The appropriate box from the cloud will reply. There is no direct connection to drivers as in unix, where the kernel asks the disk driver to "open file X" (on behalf of the application).
SMB API - nothing happens in windows without SMB being involved.
locator - knows where resources (printers, disks, network connections) are
redirector - sends services.
resolver - uses SMB messages to find out where to go.
netware - messes everything up, it's incompatible with the rest of the kernel (e.g. as you'll find if you try to connect by netware _and_ tcpip).
let's look a little more closely
     ----------
    | win32api |
     ----------
         |
     ---------
    | smb api |
     ---------
         |
     ---------
    | Netbios |
    |  api    |
     ---------
SMB has to decide if request is local or remote
Netbios converts the SMB message to a netbios datagram and puts it on the wire as a NetBEUI or netware message when running IP.
SMB uses netbios over tcpip (not netbios or netware). It uses 3 ports
Every client has to be able to find the local master browser, (domain master browser != local master browser). This could be any machine. Election is conducted by broadcast over udp 137,138 (the election can be biased, but the outcome cannot be forced/guaranteed). What we think of as the samba server, may not win. Broadcast udp will not go over a router, so if the network is routed, then tcp unicast is used for the election (as well as udp broadcasts), telling client to use WINserver (which will be a samba machine or NTWINserver).
When a new windows machine comes on the net (e.g. an smb client or our samba server), it needs to establish that it has a unique name. Name space is handled by contest. The machine udp broadcasts its name (e.g. JACK) 4 times at 200msec interval and asks "who is local master browser?". A samba server will announce that it is "JILL". If there is another machine of the same name already, it will send back a <NACK>. If there are no <NACK>s, the local master browser will accept the name. The client will register its name by udp broadcast (or possibly tcp unicast) with the WINserver, into the workgroup or domain.
The user will then see something in "network neighborhood". The client machine will do a udp 138 unicast to the local master browser "give me browse list enumeration" (the local master browser has information from the domain master browser too).
On a multisegmented, routed network, each segment has its own local master browser. One machine will be both a local master browser and a domain master browser.
If the user clicks on a machine in "network neighbourhood" (and is using WINserver), the client machine will send a "name lookup request" (like a DNS request) - a netbios unicast request to udp 137 on the local master browser and get the IP of the machine. The client registers (includes services available) with the other machine.
The client machine will then send a tcp 139 "session setup request", and then sets up a netbios connection over tcp to IPC$share on the machine. This setup involves an SMB "net_prot" (negotiate protocol) exchange to setup protocol(s) and establish whether the client can use long filename support and UC/lc letters.
The client has connected with an empty username and passwd at this stage. The client now authenticates and receives back a list of printers, files and is given a persistent connection. The original (passwdless) connection is pulled down.
After 10-15mins of inactivity, the client kernel may elect to drop its session (even if an application is in the middle of editing an open file on the remote machine). The application has no knowledge of this disconnect. When something happens in the application again (or you click on network neighbourhood etc), the session will be renegotiated.
If the remote machine has gone down in the mean time or the client is connected to our hypothetical samba LVS and is redirected to a new samba server (which doesn't know anything about the client's original connection), the user will get a message that the connection cannot be re-established and that the user will have to exit from the application (without saving the edits). Ha-ha, just kidding - that's what you should get - you'll actually get the BSOD.
Kai Suchomel1 KAISUCH (at) de (dot) ibm (dot) com 12 Jun 2006
The Samba Service uses a SAN Filesystem, here GPFS. This File system is shared among all the Samba Services on the RS. When I connect to VIP and the SAN Filesystem, the client can connect to any realserver. When the RS fails, after doing a reconnect, the Client can access the SAN Filesystem over another RS.
Note: Multiple ports are involved here. However you don't have to LVS all the ports. As far as LVS is concerned, only port 177 needs to be LVS'ed. However you have to know about all the ports to get xdmcp to work, so it's in the multi-port services section.
Not so long ago, a common practice was to serve all applications from a central server to a diskless X-terminal (like an NCD) which ran an X-server from ROM. User's files were backed up centrally. Upgrades/fixes to applications for 100s of clients was a matter of writing the new files to the single, large, reliable central server. The fixes appeared simultaneously for all clients. We've all realised the fundamental flaw in this setup and now we have the applications running on several 100 desktop machines, where upgrades and fixes can take weeks to propagate. The fix to the fix is to run thin clients on the desktops (no wait, didn't we already do that one).
Warning: This method does not work
X window is another client-server protocol. The X-client asks for a connection to the X-server, which listens on ports starting at 6000, and the server will start displaying X images on its display. If you don't think about it too much, it seems that X should work through LVS. However
Lidsa lidsa (at) legend (dot) com (dot) cn 24 Apr 2002
..but the realserver is the X-client and the X-server does not reside on the realserver. So I think it impossible for LVS to forward X-window.
If you connect from your LVS client to VIP:telnet on an LVS-DR (you will now be connected to one of the realservers) and start xclock on the realserver, you'll get the xclock image on the lvs client (provided that you have a direct connection also between the realserver and the client). If you look with netstat -an you'll find that RIP:1025 is ESTABLISHED with CIP:6000. Yes, the LVS client is the X-server - it is not the X-client. The realserver is the X-client. You can't use LVS to forward X-sessions.
This method is most like the login from a diskless xterm.
Severin Olloz showed that you can use an LVS to serve X-sessions by running xdmcpd on the realservers. Severin had problems initially with some logins locking up, but this apparently was due to a misconfiguration of one of his realservers.
When I tried it, I was still able to login from the lvs client after leaving the connection idle for a few hours at the xdm login screen. After leaving the nodes idle overnight I couldn't get a login at the LVS client anymore. On one occasion xdm was running on the node corresponding to the login shown on the client. I restarted xdm on that node and could connect again. On another occasion xdm had died on one of the realservers and the client was just showing the background color for the X-window and a functional mouse, but no xdm login screen. The connections to port 6000 on the lvs client were also gone. I restarted xdm on all the realservers and restarted the client ("X :1 -query VIP") but did not get the xdm login screen. I could connect after running ipvsadm again.
Presumably timeouts will need to be explored to make a working xdmcp LVS.
Severin Olloz S (dot) Olloz (at) soid (dot) ch 30 Apr 2002
I have set up an LVS-DR X11-Server. The LVS client makes a XDMCP-Query with a command like this:
    X :1 -query VIP
and the director of the cluster sends the UDP packets on port 177 (XDMCP) to the realserver. The realserver accepts the request and opens an X11-session for the user. (Note: the realserver is opening a direct connection to CIP:600x - this is not under control of the LVS. The LVS client and the realserver must be able to exchange packets directly.) My ipvsadm table looks like this:
    director:~# grep xdmcp /etc/services
    xdmcp           177/tcp         # X Display Manager Control Protocol
    xdmcp           177/udp

    IP Virtual Server version 1.0.2 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port           Forward Weight ActiveConn InActConn
    UDP  VIP:xdmcp wlc persistent 360
      -> node1:xdmcp                  Local   100    0          0
      -> node2:xdmcp                  Route   100    0          0
The director is a realserver too, using localnode.
Here's more details I discovered when I reproduced Severin's setup.
For info on XDMCP, see the Linux XDMCP HOWTO and the many links provided therein.
In this method, X-clients on the realservers connect directly to the X-server on the LVS client. The LVS is only used for xdmcp authentication. Once this step has been accomplished, the LVS steps out of the way and the X-session is between the realserver selected and the client. The client then must be able to send packets directly to the RIP on the realserver. In a normal LVS-DR, the RIP is not routable from the lvs client. The RIPs will have to be routable or public IPs.
For a test, first connect directly from your lvs client box to a realserver (no director or LVS involved yet). Setup your xdm-config, Xaccess files on the realserver(s) as described in the XDMCP HOWTO and check the permissions of Xservers and Xsetup_0. Make sure xdm is running on the realserver (the XDMCP-HOWTO does this via the inittab file, but you can just fire it up from the command line for a test). Check that xdm is running
    RS1:/etc# ps -auxw | grep xdm
    root       329  0.0  1.7  2892 1088 ?   S   11:40   0:00 xdm
    root       331  0.5  3.9  5612 2456 ?   S   11:40   0:01 /usr/X11R6/bin/X -auth /usr/X11R6/lib/X11/xdm/authdir/authfiles/A:0-Z
Run the next command. If you don't have X running, it will be started for you. If your LVS client is displaying an X-window (i.e. you ran `startx`) then the client at the other end will overwrite your current X-session.
    client# X :1 -query RIP
The original window manager screen should disappear on your client box, to be replaced by the xdm login from the realserver. If you just have a blank screen on the client, with an X-shaped mouse cursor but no login box, check that xdm is running on the realserver. After you log in, you get the window manager set in /etc/X11/xdm/Xsession. In the default Xsession file, xsm is used, which defaults (see man xsm) to running twm with smproxy and an xterm. This is pretty gruesome, so I substituted xsm with my window manager, fvwm2. Here's part of Xsession.
    if [ -f "$startup" ]; then
        exec "$startup"
    else
        if [ -f "$resources" ]; then
            xrdb -load "$resources"
        fi
        #exec xsm
        exec /usr/X11/bin/fvwm2
    fi
From the console on the client, you can see the connections from the realserver back to the X-server on the client.
    client# netstat -an
    Active Internet connections (including servers)
    Proto Recv-Q Send-Q Local Address     Foreign Address     State
    tcp        0      0 client:6001       realserver:1067     ESTABLISHED
    tcp        0      0 client:6001       realserver:1066     ESTABLISHED
    tcp        0      0 client:6001       realserver:1065     ESTABLISHED
    tcp        0      0 client:6001       realserver:1063     ESTABLISHED
    tcp        0      0 client:6001       realserver:1059     ESTABLISHED
    tcp        0      0 0.0.0.0:6001      0.0.0.0:*           LISTEN
    tcp        0      0 0.0.0.0:6000      0.0.0.0:*           LISTEN
    .
    .
    Active UNIX domain sockets (including servers)
    Proto RefCnt Flags       Type       State         I-Node Path
    unix  1      [ ACC ]     STREAM     LISTENING     399263 /tmp/.X11-unix/X1
    .
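The check above can be scripted. The following is a minimal sketch, not part of the original setup: `x_display_port` and `count_x_connections` are hypothetical helper names, and the parsing assumes `netstat -an` output in the column layout shown above. It relies on the X convention that display :N listens on TCP port 6000+N.

```shell
# Sketch only: helper names are hypothetical, not part of xdm or LVS.

# An X server for display :N listens on TCP port 6000+N.
x_display_port() {
    display_number=${1#:}            # strip the leading colon from ":1"
    echo $((6000 + display_number))
}

# Read `netstat -an` output on stdin and print the number of ESTABLISHED
# TCP connections whose local port is the X port of the display in $1.
count_x_connections() {
    port=$(x_display_port "$1")
    awk -v port="$port" '
        $1 == "tcp" && $6 == "ESTABLISHED" && $4 ~ (":" port "$") { n++ }
        END { print n + 0 }'
}
```

On the client you might run `netstat -an | count_x_connections :1`; a count of zero after logging in via xdm would suggest the realserver cannot reach the client's X-server directly.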
Now set up the director to forward xdmcp/udp and connect to VIP:xdmcp. Note: I'm not using persistence, while Severin is. Non-persistence seems to work for an LVS of 4 realservers.
    client# X :1 -query VIP

    director:~# ipvsadm
    IP Virtual Server version 0.9.4 (size=4096)
    Prot LocalAddress:Port Scheduler Flags
      -> RemoteAddress:Port      Forward Weight ActiveConn InActConn
    UDP  lvs.mack.net:xdmcp rr
      -> RS4.mack.net:xdmcp      Route   1      0          1
      -> RS3.mack.net:xdmcp      Route   1      0          0
      -> RS2.mack.net:xdmcp      Route   1      0          0
      -> RS1.mack.net:xdmcp      Route   1      0          0
`netstat -an` doesn't show any connections to the VIP (it's udp after all), but the connections to the X-server ports on the client are seen along with the entry for X1. Once you have logged in via xdm, the client and realserver are connected directly and LVS is not involved anymore. After you log out from the X-session at the client and return to the XDMCP login screen, the connections to port 600x are gone.
After exiting from an X-session the client will be presented with a new xdm login screen. Watching with tcpdump shows the following steps following the termination of an X-session.
It appears that xdmcp only presents the login screen and that login occurs via the X connection later. If both painting the login screen initially and (after the timeout on the director, about 3 mins) sending the name/passwd used xdmcp, then it's possible that the login data could be sent to a different host from the one that painted the screen. This apparently can't happen. (Joe, May 2004: I have no idea why I said that this "apparently can't happen".)
You could setup an xterm farm with this using a bunch of diskless 486 PCs with 16M memory.
In this method, you have your window manager running on your LVS client and you are displaying realserver X-clients on the X-server running on your LVS client.
To set up, on the realserver, have the entry "X11Forwarding yes" in sshd_config (re-HUP sshd if necessary). On the client, have the entry "ForwardX11 yes" in ssh_config. If you like, as a test, ssh (with `ssh -v`) directly from the client box to the realserver (not to the VIP) as if you were doing a regular ssh login. After login, check that X-forwarding is turned on by looking at the DISPLAY variable.
    .
    - verbose output from login with `ssh -v remote_node` -
    .
    debug1: channel_free: channel 1: status: The following connections are open:
      #0 client-session (t4 r0 i1/0 o16/0 fd 4/5)
      #1 x11 (t4 r2 i8/0 o128/0 fd 7/7)
    realserver:~# echo $DISPLAY
    realserver:10.0
    realserver:~#
Note | |
---|---|
"realserver" is the name of the machine you have logged into (it might be "localhost"). The name you get will NOT be the name of the machine with the X-server that will be displaying the X (as would normally happen with non-forwarded X connections). Here your X-server is the lvs client. |
The $DISPLAY variable is showing where X-clients running on the realserver will send their output. In this case "realserver:10.0" is a proxy X-server running on the remote machine, which will forward the X-calls to the X-server running on the lvs client machine (the output will not go to the realserver). If you now run `xclock` on the realserver, it will be displayed on the lvs client machine.
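To make the structure of the DISPLAY value concrete, here is a small sketch that splits it into host, display number and screen. `parse_display` is a hypothetical helper, not part of ssh or X. The display number 10 in the example is what you would expect from sshd, which by default starts its proxy displays at 10 (the X11DisplayOffset setting).

```shell
# Sketch only: parse_display is a hypothetical helper name.
# Splits a DISPLAY value of the form [host]:display[.screen].
parse_display() {
    value=$1
    host=${value%%:*}        # text before the first colon (may be empty)
    rest=${value#*:}         # text after the first colon, e.g. "10.0"
    number=${rest%%.*}       # display number
    case $rest in
        *.*) screen=${rest#*.} ;;
        *)   screen=0 ;;     # screen defaults to 0 when omitted
    esac
    echo "host=${host:-local} display=$number screen=$screen"
}
```

Running `parse_display "$DISPLAY"` in the ssh session shown above would report host `realserver` and display `10`, i.e. the proxy X-server on the realserver rather than the real X-server on the lvs client.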
Next setup your director to forward ssh. For more info see the section on sshd. In particular make sure all the host keys on the realservers are identical. Connect to VIP:sshd. You should now be able to start X-clients apparently running on the VIP (but really running on the realserver).
An example of using rsh to copy files is in performance data for single realserver LVS Sect 5.2.
Note | |
---|---|
Caution: The matter of rsh came up in a private e-mail exchange. The person had found that rshd, operating as an LVS'ed service, initiated a call (rsh client request) to the rshd running on the LVS client. (See Stevens "Unix Network Programming" Chapter 14, which explains rsh.) This call will come from the RIP rather than the VIP. This will require rsh to be run under LVS-NAT or else the realservers must be able to contact the client directly. Similar requests from the authd/identd client and passive ftp on realservers cause problems for LVS. |
David Lambe david (dot) lambe (at) netunlimited (dot) com Mon, 13 Nov 2000
I've recently completed "construction" of a LVS cluster consisting of 1 LVS and 3 realservers. Everything seems to work OK with the setup except for rcp. All it ever gives is "Permission Denied" when running rcp blahfile node2:/tmp/blahfile from a console on node1. Both rsh and rlogin function, BUT require the password to be entered twice.
Joe
Sounds like you are running RedHat. You have to fix the pam files. The beowulf people have been through all of this. You can either recompile the r* executables without pam (my solution), or you can fiddle with the pam files. For suggestions, go to the beowulf mailing list archives; you have to download the whole archive and grep through it.
If you go to the beowulf site, you'll find people are moving to replace rsh etc with ssh etc on sites which could be attacked from outside (and turning off telnet, r* etc). For example setup files for ssh, see the section on sshd.
Jerry Glomph Black black (at) real (dot) com August 25, 2000
RealNetworks' streaming protocols are
The server configuration can be altered to run on any port, but the above numbers are the customary, and almost universally-used ones.
Mark Winter, a network/system engineer in my group wrote up the following detailed recipe on how we do it with LVS:
add IP binding in the G2 server config file
    <List Name="IPBindings">
        <Var Address_1="<real ip address>"/>
        <Var Address_2="127.0.0.1"/>
        <Var Address_3="<virtual ip address>"/>
    </List>

On the LVS side

    ./ipvsadm -A -u <VIP>:0 -p
    ./ipvsadm -A -t <VIP>:554 -p
    ./ipvsadm -A -t <VIP>:7070 -p
    ./ipvsadm -A -t <VIP>:8080 -p
    ./ipvsadm -a -u <VIP>:0 -r <REAL IP ADDRESS>
    ./ipvsadm -a -t <VIP>:554 -r <REAL IP ADDRESS>
    ./ipvsadm -a -t <VIP>:7070 -r <REAL IP ADDRESS>
    ./ipvsadm -a -t <VIP>:8080 -r <REAL IP ADDRESS>
(Ted)
I just wanted to add that if you use FWMARK, you might be able to make it a little simpler and not have to worry about forwarding EVERY UDP port.
    # Mark packets with FWMARK1
    ipchains -A input -d <VIP>/32 7070 -p tcp -m 1
    ipchains -A input -d <VIP>/32 554 -p tcp -m 1
    ipchains -A input -d <VIP>/32 8080 -p tcp -m 1
    ipchains -A input -d <VIP>/32 6970:7170 -p udp -m 1
    # Setup the LVS to listen to FWMARK1
    director:/etc/lvs# ipvsadm -A -f 1 -p
    # Setup the realserver
    director:/etc/lvs# ipvsadm -a -f 1 -r <RIP>
Not only is this only six lines rather than eight, but now you've setup a persistent port grouping. You do not have to forward EVERY UDP port, and you're still free to setup non-persistent services (or other persistent services that are persistent based on other ports).
When you want to remove a realserver, you now do not have to remove FOUR realservers, you just remove one. Same thing with adding. Plus, if you want to change what's forwarded to each realserver, you can do so with ipchains and not bother with taking up and down the LVS. ALSO... if you have an entire network of VIPs, you can setup IPCHAINS rules which will forward the entire network automatically rather than each VIP one by one.
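The bookkeeping advantage described here can be illustrated with a dry-run sketch. The helpers below only print commands; they do not run them. `mark_rules` and `realserver_cmd` are hypothetical names, the addresses in the usage note are placeholders, and the marking rules use the iptables mangle-table syntax that appears later in this section rather than ipchains.

```shell
# Sketch only: both helpers print commands instead of executing them.

# Print one iptables marking rule per proto:port spec,
# e.g. tcp:554 or udp:6970:7170 (a port range).
mark_rules() {
    vip=$1; mark=$2; shift 2
    for spec in "$@"; do
        proto=${spec%%:*}    # text before the first colon
        ports=${spec#*:}     # everything after it (port or port range)
        echo "iptables -t mangle -A PREROUTING -d $vip -p $proto --dport $ports -j MARK --set-mark $mark"
    done
}

# Print the single ipvsadm command needed to add or remove a realserver
# from a fwmark service, however many ports the mark groups together.
realserver_cmd() {
    action=$1; mark=$2; rip=$3
    case $action in
        add)    echo "ipvsadm -a -f $mark -r $rip" ;;
        remove) echo "ipvsadm -d -f $mark -r $rip" ;;
        *)      echo "usage: realserver_cmd add|remove MARK RIP" >&2; return 1 ;;
    esac
}
```

For example, `mark_rules 10.0.0.1 1 tcp:554 tcp:7070 tcp:8080 udp:6970:7170` prints four marking rules, while `realserver_cmd remove 1 192.168.1.2` prints one command that takes the realserver out of all four grouped ports at once.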
Jerry Glomph Black black (at) prognet (dot) com 07 Jun 2001
Following is a currently-operational configuration for LVS balancing of a set of 3 RealServers (or Real Servers, in LVS-terminology). It has been running at very high loads (thousands of simultaneous connections) for months, in addition to numerous conventional LVS setups for more familiar web load-balancing at massive loads.

    #!/bin/sh
    # LVS initialization for RealNetworks streaming.
    #
    # client connects on TCP ports 554 (rtsp) or 7070 (pnm, deprecated)
    # data returns to client either as UDP on port-range 6970-7170, or
    # via the initial TCP socket, if the client cannot receive the UDP stream.
    # written and tested to very high (several thousand simultaneous) client load by
    # Mark Winter, network department, RealNetworks
    # additional LVS work by Rodney Rutherford and Glen Raynor, internet operations
    # with random comments by Jerry Black, former Director of Internet Operations
    # supplied with no warranty, support, or sympathy, but it works great for us
    # Setup IP Addresses
    VIP="publicly-advertised-IP-number.mynet.com"
    RIP_1="RealServer-1.mynet.com"
    RIP_2="RealServer-2.mynet.com"
    RIP_3="RealServer-3.mynet.com"
    # Load needed modules
    BALANCE="wrr"
    # Load LVS fwmark module
    /sbin/modprobe ip_masq_mfw
    # Load appropriate LVS load-balance algorithm module
    /sbin/modprobe ip_vs_$BALANCE
    # Mark packets with FWMARK1
    /sbin/ipchains -F
    /sbin/ipchains -A input -d ${VIP}/32 7070 -p tcp -m 1
    /sbin/ipchains -A input -d ${VIP}/32 554 -p tcp -m 1
    /sbin/ipchains -A input -d ${VIP}/32 8080 -p tcp -m 1
    /sbin/ipchains -A input -s 0.0.0.0/0 6970:7170 -d ${VIP}/32 -p udp -m 1
    # Setup the LVS to listen to FWMARK1
    /sbin/ipvsadm -C
    /sbin/ipvsadm -A -f 1 -p -s $BALANCE
    # Setup the realservers
    /sbin/ipvsadm -a -f 1 -r ${RIP_1}
    /sbin/ipvsadm -a -f 1 -r ${RIP_2}
    /sbin/ipvsadm -a -f 1 -r ${RIP_3}
Roberto Nibali ratz (at) tac (dot) ch 08 Jun 2001
there is no fwmark module, and the ip_vs module is loaded by ipvsadm now. Why do you need persistence?
philz (at) testengeer (dot) com 3 Apr 2000
A realnetworks g2 server is the daemon that serves up real audio/video streams (http://real.com). I'm using LVS-Tun. When I tried to set up a realnetworks g2 server I could not get it to accept the connection (tcp port 7070). A telnet to port 7070 on the VIP yields a connection refused, while a telnet to the realserver IP yields a "connect" (it also serves video and audio if you use the proper client).
Joe
Is the service listening on the VIP (a common thing to forget when setting up LVS-DR or LVS-Tun)?
That's it. Success! Here is what has to be done:
- The real real audio/video daemon must be configured to listen/respond to _BOTH_ the VIP and its RIP (see Configure->General Setup->IP Binding on the RealAdministrator web page).
- Both the 7070 and 554 (PNAPort and RTSPPort respectively) must be redirected. You might have to do more ports for other features of the real audio/video daemon.
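A quick way to confirm the first point on a realserver is to check that the daemon has a listening socket on the VIP as well as on the RIP. The following is a sketch under stated assumptions: `listens_on` is a hypothetical helper, it expects `netstat -ltn` output on stdin in the usual six-column layout, and it only understands IPv4 sockets (an IPv6 wildcard socket is not recognised).

```shell
# Sketch only: listens_on is a hypothetical helper name.
# Reads `netstat -ltn` output on stdin. Succeeds (exit 0) if some TCP
# socket is listening on ADDR:PORT or on the IPv4 wildcard 0.0.0.0:PORT.
listens_on() {
    addr=$1; port=$2
    awk -v addr="$addr" -v port="$port" '
        $6 == "LISTEN" && ($4 == addr ":" port || $4 == "0.0.0.0:" port) { found = 1 }
        END { exit (found ? 0 : 1) }'
}
```

On the realserver, something like `netstat -ltn | listens_on "$VIP" 7070 || echo "nothing listening on VIP:7070"` (with `VIP` set by you) would flag the "connection refused" situation described above; repeat for port 554.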
er OK. The demon listening on the RIP never hears from anyone though ;-\
You actually need the RIP to respond so that you can manage/monitor it.
congratulations. You've got a realserver to be a RealServer. Is this the thing that costs $2995 with RedHat?
Nope. This is the free one that supports 25 session per server ;-)
What's on each of 7070 and 554? Is one video and the other audio? What does PNAPort and RTSPPort stand for? What happens if the client gets 7070 from one realserver and 554 from another? Did you have to link the 2 services with persistent connection?
First, a quicktime primer from Andy Wettstein:
It is similar to Real. 554 is rtsp, and there is an option on the quicktime server to stream over port 80 to avoid firewall problems. The ports 6970:7170 are what the client will actually send/receive the stream on (if not blocked by firewall rules, etc). The udp stuff is why you need persistence. The stream would try to switch between servers without persistence enabled (since udp is really connectionless).
Andy Wettstein awettstein (at) cait (dot) org 20 Dec 2002
I'm trying to set up the quicktime (darwin) streaming server through lvs. It kind of works, but it is very slow, much slower than just accessing the stream without going through lvs. I have set it up exactly the same as the Real rtsp examples. I am using lvs-dr with fwmark on ports. Here are the iptables commands I used:
    # iptables -t mangle -A PREROUTING -i eth0 -p tcp -s 0.0.0.0/0 -d 209.174.123.48 --dport 80 -j MARK --set-mark 1
    # iptables -t mangle -A PREROUTING -i eth0 -p tcp -s 0.0.0.0/0 -d 209.174.123.48 --dport 554 -j MARK --set-mark 1
    # iptables -t mangle -A PREROUTING -i eth0 -p udp -s 0.0.0.0/0 -d 209.174.123.48 --dport 6970:7170 -j MARK --set-mark 1

Then I added the lvs-dr like the examples:

    # ipvsadm -A -f 1 -s rr
    # ipvsadm -a -f 1 -r 209.174.123.45
    # ipvsadm -a -f 1 -r 209.174.123.47

And I get this with ipvsadm:

    FWM  1 rr
      -> lead.web.cait.org:0     Route   1      0          1
      -> tin.web.cait.org:0      Route   1      1          1

I am also unable to access the stream on port 80 through lvs. If anyone has experience with quicktime please let me know if there is anything further that I need to do.
I figured it out. It needs persistence (or streaming movies will fail) i.e. the ipvsadm -A command needs a "-p".
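Since a missing "-p" is easy to overlook, a check of the ipvsadm listing can catch it. This is a sketch, not a tested tool: `fwmark_is_persistent` is a hypothetical helper, and it assumes that a persistent fwmark service is listed as a line beginning "FWM", then the mark, with the word "persistent" after the scheduler, by analogy with the "UDP VIP:xdmcp wlc persistent 360" line shown earlier in this section.

```shell
# Sketch only: fwmark_is_persistent is a hypothetical helper name.
# Reads an ipvsadm listing on stdin. Succeeds (exit 0) only if the
# virtual service for fwmark MARK carries the "persistent" flag.
fwmark_is_persistent() {
    mark=$1
    awk -v mark="$mark" '
        $1 == "FWM" && $2 == mark {
            seen = 1
            for (i = 3; i <= NF; i++) if ($i == "persistent") ok = 1
        }
        END { exit ((seen && ok) ? 0 : 1) }'
}
```

On the director, `ipvsadm -L -n | fwmark_is_persistent 1 || echo "fwmark 1 is not persistent"` would have flagged the listing shown above, which has "FWM 1 rr" with no persistence.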
Here's the mon.cf
    watch tin
        service rtsp
            interval 30s
            monitor tcp.monitor -p 554
            period wd {Sun-Sat}
                startupalert qtss.alert -u -V caittv.cait.org -R tin.cait.org -W 3 -m 1 -S wlc
                upalert qtss.alert -R tin.cait.org -W 3 -m 1 -F dr -s wlc
                alert qtss.alert -R tin.cait.org -m 1
and the qtss.alert
    #!/bin/bash
    #
    IPTABLES="/sbin/iptables"
    IPVSADM="/sbin/ipvsadm"
    while getopts ":s:g:h:l:t:V:m:o:W:R:S:u" Option
    do
        case $Option in
            V) VIRTUALSERVER="$OPTARG";;
            m) MARK="$OPTARG";;
            o) OPTION="$OPTARG";;
            W) WEIGHT="$OPTARG";;
            R) REALSERVER="$OPTARG";;
            S) SCHEDULER="$OPTARG";;
            u) UP=1;;
        esac
    done
    shift $(($OPTIND - 1))
    if [ $UP ]; then
        # won't add more iptables MARK rules after the initial go
        # so we don't clog up the rules
        # you'll have to resolve problems if you need to add more to the marked service
        if ! $IPTABLES -L -t mangle | grep "MARK set 0x$MARK" > /dev/null; then
            $IPTABLES -t mangle -A PREROUTING -i eth0 -p tcp -s 0.0.0.0/0 -d $VIRTUALSERVER --dport 80 -j MARK --set-mark $MARK
            $IPTABLES -t mangle -A PREROUTING -i eth0 -p tcp -s 0.0.0.0/0 -d $VIRTUALSERVER --dport 554 -j MARK --set-mark $MARK
            $IPTABLES -t mangle -A PREROUTING -i eth0 -p udp -s 0.0.0.0/0 -d $VIRTUALSERVER --dport 6970:7170 -j MARK --set-mark $MARK
        fi
        # set up the virtual server
        $IPVSADM -A -f $MARK -s $SCHEDULER -p
        # add the realserver
        $IPVSADM -a -f $MARK -w $WEIGHT -r $REALSERVER
    else
        # remove
        $IPVSADM -d -f $MARK -r $REALSERVER
    fi
    exit 0
Mark Weaver mark (at) npsl (dot) co (dot) uk 23 Mar 2004
Here's how to setup Windows Media Server. This information is not easy to come across as I can't find a simple published document which lists what WMS actually does. There is also some attempt here at WMS9 support, but that's untested and is just based on what the player tries to do (the player connects more quickly, however, if you reject rather than drop those connection attempts, which I'm letting the server do).
    # WMS: we want to group TCP 1755 and UDP 1024-5000
    # Also uses 554/tcp + 554/udp for WMS9.
    # You might also want to add port 80 if serving up via http as well.
    # To do this, set an fw mark on such connections, and use LVS fwmark balancing
    # (will forward matching IP+fwmark to the same server). Just what we need.
    EXT_IP="1.2.3.4"
    EXT_IF="eth0"
    WMS_MARK="1"
    RS1_IP="192.168.1.2"
    RS2_IP="192.168.1.3"
    # Allow appropriate ports in...
    $IPTABLES -A INPUT -i $EXT_IF -p tcp -s 0/0 -d $EXT_IP --dport 1755 -j ACCEPT
    $IPTABLES -A INPUT -i $EXT_IF -p tcp -s 0/0 -d $EXT_IP --dport 554 -j ACCEPT
    $IPTABLES -A INPUT -i $EXT_IF -p udp -s 0/0 -d $EXT_IP --dport 554 -j ACCEPT
    $IPTABLES -A INPUT -i $EXT_IF -p udp -s 0/0 -d $EXT_IP --dport 1024:5000 -j ACCEPT
    # Group with fwmark...
    $IPTABLES -t mangle -A PREROUTING -i $EXT_IF -p tcp -s 0/0 -d $EXT_IP --dport 1755 -j MARK --set-mark $WMS_MARK
    $IPTABLES -t mangle -A PREROUTING -i $EXT_IF -p tcp -s 0/0 -d $EXT_IP --dport 554 -j MARK --set-mark $WMS_MARK
    $IPTABLES -t mangle -A PREROUTING -i $EXT_IF -p udp -s 0/0 -d $EXT_IP --dport 554 -j MARK --set-mark $WMS_MARK
    $IPTABLES -t mangle -A PREROUTING -i $EXT_IF -p udp -s 0/0 -d $EXT_IP --dport 1024:5000 -j MARK --set-mark $WMS_MARK
    # Tell LVS to do the load balancing
    $IPVSADM -D -f $WMS_MARK
    $IPVSADM -A -f $WMS_MARK -s rr -p 600
    $IPVSADM -a -f $WMS_MARK -r $RS1_IP:0 -m
    $IPVSADM -a -f $WMS_MARK -r $RS2_IP:0 -m
Francois Baligant 2000-05-10
We have a very weird problem load-balancing UDP-based RADIUS packets.
    UDP  195.74.212.37:16450 rr
      -> 195.74.212.26:16450     Route   1      0          0
      -> 195.74.212.34:16450     Route   1      0          0
    UDP  195.74.212.31:1646 wlc
      -> 195.74.212.26:1646      Route   1      0          106
      -> 195.74.212.10:1646      Route   1      0          106
    UDP  195.74.212.31:1645 wlc
      -> 195.74.212.26:1645      Route   1      0          1
      -> 195.74.212.10:1645      Route   1      0          0

I have a series of NAS (Network Access Server) sending Authentication Requests to a single central Proxy Radius server (packets arrive sometimes at 5 packets/sec). This Proxy Radius Server then forwards Authentication Requests to the load-balancer, which should normally dispatch them to several nodes for processing (check with DB etc.).
We want to load-balance 3 ports: 1645 (authentication), 1646 (accounting) and 16450 (authentication for another kind of service).
The rule for port 1646 loadbalances. However for rules 16450 and 1645, all UDP requests go to only one realserver. (Rule 16450 is not used at the moment; 1645 is. You can see the strange little "1" for 195.74.212.26.) What's weird is that 1646 works really fine but the 2 other rules just do not load-balance. Packets are always sent to the same host (in fact the first that was added to the VS IP).
Joe
Someone had a similar sounding problem with udp ntp. All packets would go to one host and then after a little while to another. In the short term the load balancing was bad, but over the long term (>15mins) the loadbalancing was fine. The udp LVS code sends all udp packets to one realserver, till a timeout is reached, and then sends the next packets to another realserver.
(See also Scheduling TCP/UDP.)
Julian
Single Radius Server? Does that mean that all packets come from a single IP:port too?
Don't forget that for UDP the autobind ports are not rotated. For TCP you have ports selected in the 1024..4999 range, but it is possible for all your client UDP packets to come from the same port on the client. This can be a good reason for them to be redirected to the same realserver if the UDP entry has not expired. Show a tcpdump session or try to set the UDP timeout to a small value:

    ipchains -M -S 0 0 2
Any difference? How many clients (UDP sockets) you have? If you have one, it can't be balanced. There is a persistency according to the default UDP timeout value.
    14:06:36.277177 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
    14:06:36.277205 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
    14:06:36.430549 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
    14:06:36.430575 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
    14:06:36.639869 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
    14:06:36.639894 195.74.193.40.60774 > 195.74.212.31.1645: udp 244 (DF)
    14:06:38.040246 195.74.193.40.60774 > 195.74.212.31.1645: udp 246 (DF)
    14:06:38.040276 195.74.193.40.60774 > 195.74.212.31.1645: udp 246 (DF)
    14:06:38.117694 195.74.193.40.60774 > 195.74.212.31.1645: udp 243 (DF)
    14:06:49.899222 195.74.193.40.40190 > 195.74.212.31.1646: udp 349 (DF)
    14:06:49.899256 195.74.193.40.40190 > 195.74.212.31.1646: udp 349 (DF)
    14:06:50.358085 195.74.193.40.40223 > 195.74.212.31.1646: udp 349 (DF)
    14:06:50.358114 195.74.193.40.40223 > 195.74.212.31.1646: udp 349 (DF)
    14:06:51.494628 195.74.193.40.40346 > 195.74.212.31.1646: udp 349 (DF)
    14:06:51.494656 195.74.193.40.40346 > 195.74.212.31.1646: udp 349 (DF)
    14:06:51.810022 195.74.193.40.40381 > 195.74.212.31.1646: udp 349 (DF)
    14:06:51.810051 195.74.193.40.40381 > 195.74.212.31.1646: udp 349 (DF)
    14:06:52.351541 195.74.193.40.40485 > 195.74.212.31.1646: udp 199 (DF)

I think you just helped me to understand what the problem was. Port 1645 is not loadbalancing. I will patch the radius to increase port number for accounting request too.
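The pattern in the tcpdump output above (one source port for 1645, a new source port for each 1646 request) can be summarised automatically. This is a minimal sketch: `distinct_src_ports` is a hypothetical helper, and it assumes tcpdump lines in exactly the format shown, with IPv4 address and port joined by a dot.

```shell
# Sketch only: distinct_src_ports is a hypothetical helper name.
# Reads tcpdump lines of the form
#   TIME SRC_IP.SPORT > DST_IP.DPORT: udp LEN (DF)
# and prints, per destination port, how many distinct source ports were seen.
distinct_src_ports() {
    awk '$3 == ">" {
        src = $2; sub(/.*\./, "", src)                       # keep source port
        dst = $4; sub(/:$/, "", dst); sub(/.*\./, "", dst)   # keep dest port
        key = dst SUBSEP src
        if (!(key in seen)) { seen[key] = 1; count[dst]++ }
    }
    END { for (d in count) print d, count[d] }' | sort -n
}
```

A destination port that reports only one distinct source port from a single client IP will look like one "connection" to the director until the UDP entry times out, so it cannot be spread across realservers.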