43. LVS: Newer networking tools: Policy Routing

43.1. Introduction

The standard network tools (e.g. ifconfig, route and netstat) can't set up some of the features used in newer LVSs, e.g. routing based on src_addr. For this we use iproute2, which allows routing based on almost any of the parameters of a packet (src, dest, proto, tos...). iproute2 is available at iproute2-current.tar.gz. iproute2 implements functionality similar to that of Cisco's IOS.

For nodes attached to only one network (leaf nodes, i.e. there is only one possible route for packets), ifconfig and route are just fine. If multiple routes exist then iproute2 is needed.

Presumably routing in Linux and the setup of LVS will move more toward using iproute2. The configure script will use the iproute2 package to do some configuration if you have it installed.

Instead of aliases (e.g. eth0:110) iproute2 uses labels. ip_tables is based on the same underlying code and also requires labels to recognise ip_aliases. If you want to see the network as ip_tables sees it, you need the iproute2 tools.

ifconfig, route and netstat are not compatible with iproute2. Entries added by the iproute2 tools may not be seen by ifconfig/route etc., and the output of ifconfig/route etc. will then be incorrect. You can't tell from looking at the output of ifconfig/route whether iproute2 commands have been run - you just have to know. The iproute2 tools correctly interpret the results of ifconfig/route commands and will give the correct state of the network.

Unfortunately the user interface to iproute2 is not easy.

The documentation is not easy to read (although it was all Julian needed).
Ratz suggested "Policy Routing Using Linux" by Matthew G. Marsh, Pub Sams 2001, ISBN 0-672-32052-5, to get you started (it helped me). (Oct 2002) Ratz has just found that the book is also online
Padraig Brady padraig (at) antefactor (dot) com suggests Linux Advanced Routing and Traffic Control HOWTO.
See Guide to IP Layer Network Administration with Linux (http://linux-ip.net/) where Appendix C has information on using the iproute2 tools.
The output from the commands is difficult to parse (see the comments in the configure script for more details) - i.e. it's not machine readable. If the route is 0/0 then it is not listed in the output and the next output item shifts one field. This means that you have to know the route before you can parse the output. Ratz is developing a wrapper for iproute2 that will give machine readable output. (To have a command line utility which is not machine readable is intolerable.)
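
Until such a wrapper is available, one way round the shifting-field problem is to look values up by keyword rather than by position. This is only a sketch: route_field is a name made up for this example, and it assumes lines of the form "destination key value key value ..." (like the "8.8.8.8/29 proto kernel scope link src 8.8.8.8" line shown in Ratz's examples later in this section). The "default via ..." sample line is invented.

```sh
# route_field KEY: read `ip route show` style lines on stdin and print, for
# each line, the value that follows KEY (an empty line if KEY is absent).
# The first token of each line is taken to be the destination and skipped.
route_field() {
    awk -v key="$1" '{
        val = ""
        for (i = 2; i < NF; i++)
            if ($i == key) { val = $(i + 1); break }
        print val
    }'
}

# Sample input: the first line is invented, the second is of the form ip prints.
printf '%s\n' 'default via 192.168.1.9 dev eth0' \
              '8.8.8.8/29  proto kernel  scope link  src 8.8.8.8' |
    route_field dev
```

For the sample input this prints eth0 for the first line and an empty line for the second (which has no dev field), so a missing field no longer shifts the others. On a live system you would pipe the output of ip route show into route_field instead.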

There are other problems

Joe, Dec 2003

The latest on Alexey's ftp site is 2.4.7 from Jan 2002. Is this really the latest?

Alejandro Mery amery (at) geeks (dot) cl 24 Dec 2003

2.4.7-now-ss010824 is the official latest 'stable' but Bert Hubert (ahu) (Bert's website, http://ds9a.nl/) from lartc.org had an 'almost-branch' with some fixes and improvements with the date 2002-10-20. Bert's code is downloadable at http://ds9a.nl/cgi-bin/viewcvs.cgi/iproute2-ahu/iproute2-ahu.tar.gz?tarball=1&only_with_tag=HEAD . Sadly both Bert's and Alexey's code are unmaintained.

43.2. Policy Routing and ifconfig

Example:

In a normally functioning LVS-DR, with routing set up by "route", the realservers will be sending packets with the following routing
src_addr=VIP dest_addr=0/0. dest=0/0 - route via default gw
src_addr=RIP dest_addr=RIP network. dest=RIP network - route to RIP network
In LVS-DR a packet leaving the realserver can exit via the default gw or the director. In the standard setup, packets with dst_addr=RIPnetwork are put onto the local network and all other packets are sent to the default gw.
If instead the routing is set up by "iproute2", packets with src_addr=VIP are sent to the default gw, while packets with src_addr=RIP are put onto the local network. The realservers will be sending packets with the following routing
src_addr=VIP dest_addr=0/0. src=VIP - route via default gw
src_addr=RIP dest_addr=RIP network. src=RIP - route to RIP network
The result for a normally working LVS will be the same (i.e. the LVS will still work). However with the standard setup, packets with src_addr=RIP cannot get to the outside world (the director does not have a default route to 0/0). If a process needs this (e.g. the operator needs to telnet out, or the realserver needs DNS), then those packets from the RIP can be NAT'ed out via the director (or you can set up the realservers as if they are part of a 3-Tier LVS). For security, all packets from the VIP have to go out the default gw (including any to, say, the DIP, which will be dropped by rules on the default gw to prevent spoofing).
src_addr=VIP dest_addr=RIP network. src=VIP - route via default gw, will be dropped
src_addr=RIP dest_addr=0/0. src=RIP - route to RIP network. If the director has the correct NAT rules, then these packets can pass to the outside world.
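
As a rough illustration of routing by src_addr, the rules on a realserver might look something like the following. This is a sketch, not a tested LVS configuration: all the addresses and the table numbers 100 and 101 are invented for the example, and the function only prints the commands so that you can inspect them before running them as root.

```sh
# Hypothetical addresses and table numbers; substitute your own.
VIP=192.168.1.110   # VIP configured on the realserver
RIP=192.168.1.11    # RIP of this realserver
GW=192.168.1.254    # default gw, next hop for packets with src_addr=VIP
DIR=192.168.1.9     # director, next hop for packets with src_addr=RIP

# policy_routes: print (not run) the ip commands for source-based routing.
policy_routes() {
    cat <<EOF
ip route add default via $GW table 100
ip rule add from $VIP table 100
ip route add default via $DIR table 101
ip rule add from $RIP table 101
ip route flush cache
EOF
}

policy_routes
```

Each ip rule selects a routing table by source address, and each table holds its own default route. Routes to directly connected networks are still taken from the main table, as long as the main table has no default route of its own that matches first.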

Lawrence Strydom laurie (at) midafrica (dot) com 26 May 2003

Is it possible to set up heartbeat between a Linux and a Windose box? The MS box will be the master node and the Linux box will provide redundancy (don't ask! it is what the client wants).

Horms

It should be theoretically possible to run heartbeat on Windows. But to my knowledge no one has done this in the past. The heartbeat code is reasonably portable (between different Unix-like operating systems) but it is likely that you will need to do quite a lot of work to get it to compile and work correctly on Windows. I have no experience with using cygwin so I can't comment any further than that.

43.3. Various debugging techniques for routes

(with Julian)

(I needed this information to setup a one-net LVS-NAT LVS. However since it is about routing and not LVS specifically, maybe I should move it elsewhere.)

The routes added with route go into the kernel FIB (Forwarding Information Base) route table. The contents are displayed with route (or netstat -r).

Following an icmp redirect, the route updates go into the kernel's route cache (route -C).

You can flush the route cache with

	echo 1 > /proc/sys/net/ipv4/route/flush
or
	ip route flush cache

Here's the route cache on the realserver before any packets are sent.

realserver:/etc/rc.d# route -C
Kernel IP routing cache
Source          Destination     Gateway         Flags Metric Ref    Use Iface
realserver      director        director              0      1        0 eth0
director        realserver      realserver      il    0      0        9 lo

With icmp redirects enabled on the director, repeatedly running traceroute to the client shows the routes changing from 2 hops to 1 hop. This indicates that the realserver has received an icmp redirect packet telling it of a better route to the client.

realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.932 ms  0.562 ms  0.503 ms
 2  client (192.168.1.254)  1.174 ms  0.597 ms  0.571 ms
realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.72 ms  0.581 ms  0.532 ms
 2  client (192.168.1.254)  0.845 ms  0.559 ms  0.5 ms
realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  client (192.168.1.254)  0.69 ms *  0.579 ms

Although route shows no change in the FIB, the route cache has changed. (The new route of interest is bracketed by >< signs in the table below.)

 realserver:/etc/rc.d# route -C
 Kernel IP routing cache
 Source          Destination     Gateway         Flags Metric Ref    Use Iface
 client          realserver      realserver      l     0      0        8 lo
 realserver      realserver      realserver      l     0      0     1038 lo
 realserver      director        director              0      1      138 eth0
>realserver      client          client                0      0        6 eth0<
 director        realserver      realserver      l     0      0        9 lo
 director        realserver      realserver      l     0      0      168 lo

Packets to the client now go directly to the client instead of via the director (which you don't want).

It takes about 10 minutes for the cached route to the client to expire (experimental result). The timeouts may be in /proc/sys/net/ipv4/route/gc_*, but their location and values are well encrypted in the sources :) (some more info from Alexey at LVS archives)
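
To see which timers your kernel exposes, something like the following lists the gc_* entries with their current values. It's only a sketch: show_route_gc is a made-up helper, and which entries exist (and which of them governs the expiry seen here) depends on the kernel version.

```sh
# show_route_gc [DIR]: print "name = value" for each readable gc_* entry
# in DIR (default: /proc/sys/net/ipv4/route). Prints nothing if none exist.
show_route_gc() (
    dir=${1:-/proc/sys/net/ipv4/route}
    for f in "$dir"/gc_*; do
        if [ -r "$f" ]; then
            printf '%s = %s\n' "${f##*/}" "$(cat "$f")"
        fi
    done
)

show_route_gc
```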

Here's the route cache after 10mins.

realserver:/etc/rc.d# route -C
Kernel IP routing cache
Source          Destination     Gateway         Flags Metric Ref    Use Iface
realserver      realserver      realserver      l     0      0     1049 lo
realserver      director        director              0      1      139 eth0
director        realserver      realserver      l     0      0        0 lo
director        realserver      realserver      l     0      0      236 lo

There are no routes to the client anymore. Checking with traceroute, shows that 2 hops are initially required to get to the client (i.e. the routing cache has reverted to using the director as the route to the client). After 2 iterations, icmp redirects route the packets directly to the client again.

realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.908 ms  0.572 ms  0.537 ms
 2  client (192.168.1.254)  1.179 ms  0.6 ms  0.577 ms
realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.695 ms  0.552 ms  0.492 ms
 2  client (192.168.1.254)  0.804 ms  0.55 ms  0.502 ms
realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  client (192.168.1.254)  0.686 ms  0.533 ms *

Now turn off icmp redirects on the director.

director:/etc/lvs# echo 0 > /proc/sys/net/ipv4/conf/all/send_redirects
director:/etc/lvs# echo 0 > /proc/sys/net/ipv4/conf/default/send_redirects
director:/etc/lvs# echo 0 > /proc/sys/net/ipv4/conf/eth0/send_redirects

Checking routes on the realserver -

realserver:/etc/lvs# netstat -rn
Kernel IP routing table
Destination     Gateway         Genmask         Flags   MSS Window  irtt Iface
127.0.0.0       0.0.0.0         255.0.0.0       U         0 0          0 lo
0.0.0.0         director        0.0.0.0         UG        0 0          0 eth0

nothing has changed here.

Flush the kernel route cache and then display it -

realserver:/etc/lvs# ip route flush cache
realserver:/etc/lvs# route -C
Kernel IP routing cache
Source          Destination     Gateway         Flags Metric Ref    Use Iface
realserver      director        director              0      1        0 eth0
director        realserver      realserver      l     0      0        1 lo

There are now no routes to the client.

Now when you send packets to the client, the route stays via the director, needing 2 hops to get to the client. There are no one hop packets to the client.

realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.951 ms  0.56 ms  0.491 ms
 2  client (192.168.1.254)  0.76 ms  0.599 ms  0.574 ms
realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.696 ms  0.562 ms  0.583 ms
 2  client (192.168.1.254)  0.62 ms  0.603 ms  0.576 ms
realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.692 ms *  0.599 ms
 2  client (192.168.1.254)  0.667 ms  0.603 ms  0.579 ms
realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.689 ms  0.558 ms  0.487 ms
 2  client (192.168.1.254)  0.61 ms  0.63 ms  0.567 ms
realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.705 ms  0.563 ms  0.526 ms
 2  client (192.168.1.254)  0.611 ms  0.595 ms *
realserver:/etc/rc.d# traceroute client
traceroute to client (192.168.1.254), 30 hops max, 40 byte packets
 1  director (192.168.1.9)  0.706 ms  0.558 ms  0.535 ms
 2  client (192.168.1.254)  0.614 ms  0.593 ms  0.573 ms

The kernel route cache

 realserver:/etc/rc.d# route -C
 Kernel IP routing cache
 Source          Destination     Gateway         Flags Metric Ref    Use Iface
 client          realserver      realserver      l     0      0       17 lo
 realserver      realserver      realserver      l     0      0        2 lo
 realserver      director        director              0      1        0 eth0
>realserver      client          director              0      0       35 eth0<
 director        realserver      realserver      l     0      0       16 lo
 director        realserver      realserver      l     0      0       63 lo

shows that the only route to the client (labelled with ><) is via the director.

For send_redirects, what's the difference between all, default and eth0?

Julian

see the LVS archives
When the kernel needs to check for a feature (e.g. send_redirects) it uses calls like:
if (IN_DEV_TX_REDIRECTS(in_dev)) ...
These macros are defined in /usr/src/linux/include/linux/inetdevice.h
The macro returns a value using expression from all/<var> and <dev>/<var>. So, these macros check for example for: all/send_redirects || eth0/send_redirects or all/hidden && eth0/hidden.
When you create eth0 for the first time using ifconfig eth0 ... up, the kernel internally copies default/send_redirects to eth0/send_redirects, i.e. default/ contains the initial values the device inherits when it is created. This is the safest way for a device to appear with correct conf/<dev>/ values.
When we put a value in all/<var> you can assume that we set the <var> for all devices in this way:
                all/<var>       the macro returns:
for &&          0               0
for &&          1               the value from <dev>/<var>
for ||          0               the value from <dev>/<var>
for ||          1               1
This scheme allows the different devices to have different values for their vars, e.g. if we set all/send_redirects to 0, the 3rd line applies, i.e. the result from the macro is the real value in <dev>/send_redirects. If we set all/send_redirects to 1 then, according to the 4th line, the macro always returns 1 regardless of <dev>/send_redirects.
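
The table can be restated as a few lines of shell. This is just an illustration of the rule Julian describes, not the kernel's code: effective is a made-up name, and the values are assumed to be 0 or 1.

```sh
# effective OP ALL DEV: combine the all/<var> and <dev>/<var> values.
# OP is "or" (e.g. send_redirects) or "and" (e.g. hidden).
effective() {
    case $1 in
        or)  echo $(( $2 || $3 )) ;;
        and) echo $(( $2 && $3 )) ;;
        *)   echo "unknown op: $1" >&2; return 2 ;;
    esac
}

# The four rows of the table, with <dev>/<var> set to 1 and then to 0.
for dev in 1 0; do
    echo "dev=$dev: and/all=0 -> $(effective and 0 $dev)," \
         "and/all=1 -> $(effective and 1 $dev)," \
         "or/all=0 -> $(effective or 0 $dev)," \
         "or/all=1 -> $(effective or 1 $dev)"
done
```

To apply it to a live system you could feed it the two /proc values, e.g. the contents of /proc/sys/net/ipv4/conf/all/send_redirects and /proc/sys/net/ipv4/conf/eth0/send_redirects with OP set to or. This also shows why the example above sets all, default and eth0 to 0: with an "or" variable, leaving all/send_redirects at 1 would keep redirects on whatever eth0/send_redirects says.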

how to debug/understand TCP/IP packets?

Julian

The RFC documents http://www.ietf.cnri.reston.va.us/rfc.html are your friends. The numbers you need:
793     TRANSMISSION CONTROL PROTOCOL
1122    Requirements for Internet Hosts -- Communication Layers
1812    Requirements for IP Version 4 Routers
826     An Ethernet Address Resolution Protocol
for tcpdump, see man tcpdump.

for Microsoft NT _server_

Steve (dot) Gonczi (at) networkengines (dot) com

there is a uSoft supplied packet capture utility as well.

also -W. Richard Stevens: TCP-IP Illustrated, Vol 1, a good intro into packet layouts and protocol basics. (anything by Stevens is good - Joe).

Ivan Figueredo idf (at) weewannabe (dot) com

for windump - http://netgroup-serv.polito.it/windump/

43.4. checking source routed packets

Packets leaving a LVS-DR realserver can have src_addr=VIP or src_addr=RIP. If the default gw is different for each packet, it would be nice to have a command line testing tool like ping or traceroute to test the route. The normal tools will create packets with src_addr=RIP and you won't be able to test the packets with src_addr=VIP.

Roberto Nibali ratz (at) tac (dot) ch 22 May 2001

maybe hping can help you.

Joe

Ah, the file hping2.8 is the man page i.e. {hping2}.8 - I thought it was v2.8 of hping.

How about:

ip route get $IP?

didn't know about "get". yes that works. It's like a -C with iptables. I'd still like to send a packet and see where it goes rather than getting an answer about where it is expected to go.

Julian

Not possible with src interface "lo" but possible with source address configured in "lo". Oh yes, "source interface" for some tools means "get one address from this iface and use it". In most of the cases these tools don't do the Right Thing.

from iproute2

$ ping -I src dst

arping -I if -s src dst
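
Putting the two suggestions together, a small helper could first ask the kernel which route it would choose for a given source address and then send one probe with that source. This is a sketch under assumptions: check_src_route is a made-up name, and the source address is assumed to be one that is already configured on the host (e.g. the VIP on lo). If your version of ip rejects the from argument, drop that line and rely on ping alone.

```sh
# check_src_route SRC DST: show the route the kernel would pick for a
# locally generated packet from SRC to DST, then send a single ping
# using SRC as the source address.
check_src_route() {
    ip route get "$2" from "$1" &&
    ping -c 1 -I "$1" "$2"
}
```

For example, check_src_route 192.168.1.110 192.168.1.254 would test the path from a (hypothetical) VIP of 192.168.1.110 to the client used in the traceroute examples above.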

43.5. handling arp problem with iproute2

See Julian's notes and patches to handle the arp problem with iproute2 (this is somewhat developmental).

43.6. ip commands you mightn't know about

43.6.1. ip route get

from Julian

This will look at the routing tables and tell you the route to xxx.xxx.xxx.xxx

ip route get xxx.xxx.xxx.xxx

43.6.2. ip route append

If a route to a destination already exists, you can't add a second route to the same destination with ip route add; you have to append the extra route with ip route append.

dynnema dynnema (at) yahoo (dot) com Mar 22 2002

Lets say I got one RS and two NAT DIRs.

 RS:
 RIP1:   192.168.1.2/24 dev eth0
 RIP2    192.168.2.2/24 dev eth0:10

 DIR1:
 VIP:    x.x.x.69        eth0:110
 DIP     192.168.1.1

 DIR2:
 VIP:    x.x.x.70        eth0:110
 DIP     192.168.2.1

I add the first route

ip route add src 192.168.1.2 via 192.168.1.1

but then I can't add the second route:

ip route add src 192.168.2.2 via 192.168.2.1:
"RTNETLINK answers: File exists"

Careful reading of IProute mailing list was very useful. It should be

ip route append src 192.168.2.2 via 192.168.2.1

43.7. Ratz's corrections on common iproute2/aliases misconceptions

Joe: we used to have ip aliases with ifconfig. We still have ip aliases, but as of kernel 2.1.128, the semantics have changed. Be careful using the old style ip aliases (e.g. eth0:1, lo:127) with the newer tools (e.g. iproute2), which expect a different syntax.

Ratz 25 Nov 2003

the basic problem with route/netstat -rn is, that they only see the main table, which is rather limited.
iproute2 uses labels to provide the same ip aliases as are used by ifconfig. It's not up to the tool to decide if labels work or not. The misconception people have with ip aliasing is that people think an aliased interface is a logically separated interface while it is _not_. And this is the case since 2.1.128 or so.

ipchains doesn't recognize aliases either, because since the 2.2.x kernel we moved to the iproute2 architecture.

	Note
	Joe: in other parts of the HOWTO, I've incorrectly said the changeover started with the 2.4 kernels. Hopefully this error has been fixed. The change from 2.2 to 2.4 involved the different packet path through the kernel and the replacement of ipchains with netfilter (http://www.netfilter.org). Netfilter is most familiar through its user space tool iptables which defines rule set for packets.

Packet filtering on aliases stopped working after the decay of ipfwadm in the old 2.0.x kernel days. Today you can still filter on so-called ip aliases but as the name implies you specify the IP ADDRESS as a classifier and if you want to restrict it further, you add the underlying _physical_ interface definition to the classifying rule.

iproute2 is compatible with ifconfig/route/netstat but not vice versa. The two biggest issues people new to iproute2 have to struggle with are:
- if you add secondary ip addresses without a label (alias interface) ifconfig is confused and doesn't print the information
- if you add rules for branching into different routing tables than the main routing table, route or netstat -rn will not show you those routes. This is also the case for blackhole, throw, unreachable and prohibit routes.

Ratz ratz (at) drugphish (dot) ch 07 Mar 2007

Before the arrival of the Linux kernel version 2.2 a network device named eth0:3 was actually a "real" (kernel-wise) network device by the name of eth0:3. You could filter on that device and you could route on that device (please send this packet out eth0:3).

After that the Linux network model changed and the so called logical/virtual devices were degraded to aliases. The nomenclature was never standardised, so in the 2.0.x kernels, a device eth0:3 was called an alias, but it was a real independent device. In later kernels, the name alias meant "another name for". In current kernels, an alias is actually a string related to an IP address, nothing more and nothing less. It has no semantic meaning whatsoever, besides being a backwards compatible string for the ifconfig tool.

The label is optional for secondary IP addresses. Secondary IPs configured with iproute2 without an explicit label do not show up in ifconfig.

	Note
	If the first IP configured on an interface with ip addr add is 192.168.1.1/24, then any subsequent addresses in that network (192.168.1.2/24..192.168.1.254/24) will be secondary addresses and 192.168.1.1 will be the primary address. If the primary address is removed, then all secondary addresses will also be removed. If another address not in that network is added (e.g. 10.10.1.1/16 or 192.168.1.5/30) then it will be another primary address.
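
The rule in this note can be written down as a small check: a new address becomes a secondary only if it has the same prefix length as an existing primary and falls in the same network. The sketch below follows the note's description rather than the kernel source; ip2int and is_secondary are made-up names.

```sh
# ip2int A.B.C.D: print the address as a 32 bit integer.
ip2int() (
    IFS=.
    set -- $1
    echo $(( ($1 << 24) + ($2 << 16) + ($3 << 8) + $4 ))
)

# is_secondary NEW/LEN PRIMARY/LEN: succeed if NEW would be a secondary
# address of PRIMARY (same prefix length and same network).
is_secondary() (
    new=${1%/*}; newlen=${1#*/}
    pri=${2%/*}; prilen=${2#*/}
    [ "$newlen" -eq "$prilen" ] || exit 1
    # 4294967295 is 0xFFFFFFFF; shift it to get the netmask for LEN bits.
    mask=$(( (4294967295 << (32 - $newlen)) & 4294967295 ))
    [ $(( $(ip2int "$new") & $mask )) -eq $(( $(ip2int "$pri") & $mask )) ]
)

# The addresses from the note, tested against the primary 192.168.1.1/24.
for a in 192.168.1.2/24 10.10.1.1/16 192.168.1.5/30; do
    if is_secondary "$a" 192.168.1.1/24; then
        echo "$a: secondary of 192.168.1.1/24"
    else
        echo "$a: another primary"
    fi
done
```

This reports 192.168.1.2/24 as a secondary and the other two as primaries, matching the note: 192.168.1.5/30 lies inside 192.168.1.0/24 but has a different prefix length.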

Going the other direction, an alias configured with ifconfig always shows up in ip addr show.

ifconfig intf:label ip.ad.dr.ess netmask ne.tm.as.k broadcast br.oa.dc.ast up

is essentially the same as

ip addr add ip.ad.dr.ess/cidr brd + dev intf label intf:label
ip link set dev intf up
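
The only part of the translation that takes any work is turning the dotted netmask into a prefix length. A minimal sketch follows; mask2cidr and ifconfig_to_ip are made-up names, the address is invented, and the conversion does not check that the mask bits are contiguous. The second function only prints the ip commands.

```sh
# mask2cidr NETMASK: print the prefix length for a dotted netmask.
# Does not verify that the mask is contiguous (e.g. 255.0.255.0).
mask2cidr() (
    IFS=.
    set -- $1
    bits=0
    for octet in "$@"; do
        case $octet in
            255) bits=$((bits + 8)) ;;
            254) bits=$((bits + 7)) ;;
            252) bits=$((bits + 6)) ;;
            248) bits=$((bits + 5)) ;;
            240) bits=$((bits + 4)) ;;
            224) bits=$((bits + 3)) ;;
            192) bits=$((bits + 2)) ;;
            128) bits=$((bits + 1)) ;;
            0)   ;;
            *)   echo "bad netmask octet: $octet" >&2; exit 1 ;;
        esac
    done
    echo "$bits"
)

# ifconfig_to_ip INTF:LABEL ADDRESS NETMASK: print the iproute2 equivalent
# of "ifconfig INTF:LABEL ADDRESS netmask NETMASK up".
ifconfig_to_ip() (
    dev=${1%%:*}
    echo "ip addr add $2/$(mask2cidr "$3") brd + dev $dev label $1"
    echo "ip link set dev $dev up"
)

ifconfig_to_ip eth0:110 192.168.1.110 255.255.255.0
```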

Sentences like "Network aliases or IP aliases or device aliases don't work with netfilter anymore" are not correct, since it's rather the other way around, but generally not a correct sentence, since no packet filtering mechanism ever worked with pure strings :). If you want to filter on an alias, find out its corresponding IP address (using ip addr show) and filter based on the IP address and the physical underlying interface. So:

1.1.1.1 eth0
1.1.2.1 eth0:3
1.1.3.1 eth0:foobar

If you want to filter based on eth0:3, you set up a filter as follows on eth0 and 1.1.2.1 (didn't check on the correct syntax):

iptables -t filter -A INPUT -j DROP -i eth0 -s 1.1.2.1 ...

	Note
	you can't filter on eth0:3 and iptables doesn't use labels either. So you can't use alias interfaces (such as eth0:3) in iptables rules.
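
The lookup from label to address can be scripted. The following is a sketch under assumptions: label_addr is a made-up helper that reads the one-line-per-address output of ip -o -4 addr show, the sample lines (including the /24 prefix lengths) are invented to match the three addresses above, and the rule is only printed. It uses -d, on the assumption that the intent is to drop packets addressed to the alias; use -s if you want to match on the source instead.

```sh
# label_addr LABEL: read `ip -o -4 addr show` output on stdin and print the
# address (without the prefix length) that carries LABEL.
label_addr() {
    awk -v label="$1" '$3 == "inet" {
        for (i = 5; i <= NF; i++)
            if ($i == label) { sub(/\/.*/, "", $4); print $4; break }
    }'
}

# Invented sample of what `ip -o -4 addr show dev eth0` might print.
sample='2: eth0    inet 1.1.1.1/24 brd 1.1.1.255 scope global eth0
2: eth0    inet 1.1.2.1/24 brd 1.1.2.255 scope global eth0:3
2: eth0    inet 1.1.3.1/24 brd 1.1.3.255 scope global eth0:foobar'

addr=$(printf '%s\n' "$sample" | label_addr eth0:3)
echo "iptables -t filter -A INPUT -i eth0 -d $addr -j DROP"
```

On a live system you would replace the sample with the real output, e.g. ip -o -4 addr show dev eth0 | label_addr eth0:3.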

A Linux host/router behaves like a modern managed (application) switch: IP addresses are no longer assigned to network interface cards. An IP address is attached to the host, and this confuses most people, especially when they have multiple NICs in their node, configure different IP addresses on each NIC, ping one IP address and get the reply from a seemingly different collision domain. For example, even if a machine is physically connected only through eth0 with IP address 192.168.1.1, and eth1 with IP address 10.10.10.10 is not wired to a switch, you will be able to ping 10.10.10.10 through eth0. This is because IP addresses do not belong to network interface cards. It is also one reason why you can filter per network interface card and also per IP address. In some cases, the machine will have a route to (say) 10.10.10.10, but you don't have a routing entry for /32 IP addresses and they'll still reply.

	Note
	Joe: the Marsh book, p8, explains that from 2.2.x, when you configure an IP on a NIC, the route to that network is configured as well (i.e. you don't have to add a route to the network, as you did with the 2.0.x kernels). If you don't want the route, you have to configure a host (i.e. /32) address.

There is no assignment anymore of IP addresses to network interface cards. An IP address is attached to the host and this confuses most people.

Joe

yes. So why are addresses configured with the name of the NIC (eth0, eth1)? Why not just tell the kernel which IPs it has and let it figure out what to do with all the NICs?

How?

An IP alias or logical/virtual device is simply a string, which you'll clearly see in the output of ip addr show:

root@laphish2:~# ip addr show
1: lo: <LOOPBACK,UP,10000> mtu 16436 qdisc noqueue
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:08:74:9d:e7:0a brd ff:ff:ff:ff:ff:ff
3: eth1: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:0b:db:22:82:53 brd ff:ff:ff:ff:ff:ff
    inet 192.168.1.32/24 brd 192.168.1.255 scope global eth1
4: vmnet8: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:56:c0:00:08 brd ff:ff:ff:ff:ff:ff
    inet 172.16.39.1/24 brd 172.16.39.255 scope global vmnet8
5: vmnet1: <BROADCAST,MULTICAST,UP,10000> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:50:56:c0:00:01 brd ff:ff:ff:ff:ff:ff
    inet 192.168.136.1/24 brd 192.168.136.255 scope global vmnet1

If I add a new IP address to the host, I can specify a physical interface to which it will add the routing entries and that "alias" string (label in iproute2 speak):

root@laphish2:~# ip addr add 7.7.7.7/32 brd + dev eth0 label "eth0joe_bloggs"
root@laphish2:~# ip addr add 8.8.8.8/29 brd + dev eth0 label "eth0:ratzfatz"
root@laphish2:~# ip addr show dev eth0
2: eth0: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:08:74:9d:e7:0a brd ff:ff:ff:ff:ff:ff
    inet 7.7.7.7/32 scope global eth0joe_bloggs
    inet 8.8.8.8/29 brd 8.8.8.15 scope global eth0:ratzfatz

root@laphish2:~# ip route show dev eth0 table main
8.8.8.8/29  proto kernel  scope link  src 8.8.8.8

Even the eth0 is a string with no special meaning:

root@laphish2:~# ip link set dev eth0 down
root@laphish2:~# ip link set dev eth0 name kkk
root@laphish2:~# ip addr show dev eth0
Device "eth0" does not exist.
root@laphish2:~# ip addr show dev kkk
2: kkk: <BROADCAST,MULTICAST> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:08:74:9d:e7:0a brd ff:ff:ff:ff:ff:ff
    inet 7.7.7.7/32 scope global kkk
    inet 8.8.8.8/29 brd 8.8.8.15 scope global kkk:2
root@laphish2:~# ip link set dev kkk up
root@laphish2:~# ip addr show dev kkk
2: kkk: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:08:74:9d:e7:0a brd ff:ff:ff:ff:ff:ff
    inet 7.7.7.7/32 scope global kkk
    inet 8.8.8.8/29 brd 8.8.8.15 scope global kkk:2

This should remedy the last concerns regarding the Linux networking on the link and addressing level. What is interesting though is that if you rename an existing device, the associated labels will get renamed as well and enumerated, so ifconfig will find them.

And for some final fun:

root@laphish2:~# ip addr add 9.9.9.9/29 brd + dev kkk label "kkklllllvvvv"
root@laphish2:~# ifconfig
eth1      Link encap:Ethernet  HWaddr 00:0B:DB:22:82:53
          inet addr:192.168.1.32  Bcast:192.168.1.255  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:71064 errors:0 dropped:0 overruns:0 frame:0
          TX packets:59472 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:71731379 (68.4 MiB)  TX bytes:7894335 (7.5 MiB)
          Interrupt:11 Base address:0x8800

kkk       Link encap:Ethernet  HWaddr 00:08:74:9D:E7:0A
          inet addr:7.7.7.7  Bcast:0.0.0.0  Mask:255.255.255.255
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)
          Interrupt:11 Base address:0x6c00

kkk:2     Link encap:Ethernet  HWaddr 00:08:74:9D:E7:0A
          inet addr:8.8.8.8  Bcast:8.8.8.15  Mask:255.255.255.248
          UP BROADCAST MULTICAST  MTU:1500  Metric:1
          Interrupt:11 Base address:0x6c00

kkklllllvvvv: error fetching interface information: Device not found
root@laphish2:~# ip addr show dev kkk
2: kkk: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 1000
    link/ether 00:08:74:9d:e7:0a brd ff:ff:ff:ff:ff:ff
    inet 7.7.7.7/32 scope global kkk
    inet 8.8.8.8/29 brd 8.8.8.15 scope global kkk:2
    inet 9.9.9.9/29 brd 9.9.9.15 scope global kkklllllvvvv

Not only does ifconfig not show all the IP addresses configured to kkk, it also barfs about an unknown device, which actually is a label for an IP address. Thankfully we have iproute2, which displays the exact state of configuration.

43.8. Ratz's wrappers (for iproute2)

One of the problems with the iproute2 utils is that the syntax is not machine readable (and difficult for humans too). Ratz has built some wrappers around these utils.

Ratz 25 Nov 2003

If you guys are interested I'll offer my first semi-official release of some of the replacement tools I've written for ifconfig/route. You can download them from Ratz's wrappers http://www.drugphish.ch/~ratz/iproute2/

It's still not really scriptable (I wrote it with really gross bash constructs and by using external tools ;). BUT, it solves some of architectural principles, such as separation of concern, correctness, flexibility, conceptional integrity, coupling and cohesion! You are given two tools to maintain almost everything network related. (I'm aware that iptables/netfilter and mii-tool, ethtool are also network related)

ifconfig gives you the (wrong) impression that eth0:0 is an interface, just like the others in the output of ifconfig -a. This is not true. The iproute2 tools correctly display the relationship between aliases/labels and their corresponding physical interface.

Example:

laphish:~ # ifconfig -a | grep -A2 eth0
eth0      Link encap:Ethernet  HWaddr 00:20:E0:68:71:3A
           inet addr:172.23.2.131  Bcast:172.23.255.255  Mask:255.255.0.0
           inet6 addr: fe80::220:e0ff:fe68:713a/64 Scope:Link
--
eth0:0    Link encap:Ethernet  HWaddr 00:20:E0:68:71:3A
           inet addr:10.98.43.233  Bcast:10.98.43.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
--
eth0:foo  Link encap:Ethernet  HWaddr 00:20:E0:68:71:3A
           inet addr:10.23.7.233  Bcast:10.23.7.255  Mask:255.255.255.0
           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
laphish:~ #

Sure, one could argue that all HWaddr of those "interfaces" are the same and thus something with the interpretation of them being _real_ physical interfaces must be fishy. But it gives you the wrong idea of connection or entity relationship between link and ip layer.

Now let's compare the same output for iproute2:

laphish:~ # ip addr show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP> mtu 1500 qdisc pfifo_fast qlen 100
     link/ether 00:20:e0:68:71:3a brd ff:ff:ff:ff:ff:ff
     inet 172.23.2.131/16 brd 172.23.255.255 scope global eth0
     inet 10.23.7.233/24 brd 10.23.7.255 scope global eth0:foo
     inet 10.98.43.233/24 brd 10.98.43.255 scope global eth0:0
     inet 10.239.10.1/24 brd 10.239.10.255 scope global eth0
     inet6 fe80::220:e0ff:fe68:713a/64 scope link
laphish:~ #

As you can see we have a physical interface (link layer entity) called eth0 and associated with this interface we have 5 (not 4 like with ifconfig) IP addresses. And you can easily spot the labels, which ifconfig displayed as independent interfaces, at the end of each line starting with inet, right?

Plus there you certainly noted that in the second output we have one additional address which was not shown in the ifconfig output but is very well routable and _is_ a valid configuration. I simply didn't want to put an alias there.

Tools like ipchains and iptables and their underlying state machine are better off matching on ip addresses and the _one_ physical interface those are attached to than trying to fiddle around with a label that is optional and doesn't give you real valuable information. Additionally, with iproute2 you have a better approach to conceptual integrity, which is one of the key ingredients of architectures: even if I have multiple addresses for one interface, I still send out the packet through the physical interface and not through a labeled, aliased or virtual interface.

ifconfig is an example of a "hiding complexity" tool. Hiding complexity is a concept the software industry has not yet adopted to the extent that we can trust it, and thus ifconfig is broken by design.

The reasons why people still use those deprecated tools are:

All other Unices still have those and it worked reliably for 10+ years
Most Linux Distributors except (notably) SuSE haven't switched their network setup to the iproute2 concept yet.
Documentation was seriously lacking and the tools are ... uhmm to a certain degree complex and incoherent in their syntax and semantics.
