From: (Lennart Sorensen)
	Len Sorensen <>,
	"David S. Miller" <>,
	Jakub Kicinski <>
Subject: macvlan breaks garp for vrrp
Date: Thu, 15 Jul 2021 11:37:20 -0400

We are seeing an occasional problem where a node in a cluster running
vrrp stops being able to send packets to the current master because it
has the wrong arp entry for the virtual IP.  It appears that if, around
the time vrrp changes who the master is, a packet is sent from the new
master to another node, that node learns the mac of the virtual IP as
the mac of the physical interface of the master (since the master always
transmits from the physical mac, not the virtual mac).  In most cases
nodes correctly send an arp request for the virtual IP and get back the
virtual mac (from the macvlan interface used by keepalived with use_vmac
enabled), in which case all is fine.  Only if the timing is unlucky does
a node end up learning the wrong mac, and then it can no longer send
packets back to the master, since the master doesn't accept packets on
the physical interface mac, only the virtual mac.

Of course one would have thought that the garp sent by the master would
take care of the problem, but unfortunately the macvlan code does not
allow garp packets with a source mac matching the macvlan interface to
reach the physical interface.  The code causing this is below.  keepalived
uses MACVLAN_MODE_PRIVATE, which is neither of the modes the condition
excludes, so the code tries to do its work and consumes the packet.  The
physical interface hence never gets to see it, and the arp table is not
corrected.  I can't actually make sense of what this code is trying to
do in this case, where the source mac of the packet matches the macvlan
interface even though it was received from outside (from the current
master in the vrrp cluster).

        port = macvlan_port_get_rcu(skb->dev);
        if (is_multicast_ether_addr(eth->h_dest)) {
                unsigned int hash;

                skb = ip_check_defrag(dev_net(skb->dev), skb, IP_DEFRAG_MACVLAN);
                if (!skb)
                        return RX_HANDLER_CONSUMED;
                *pskb = skb;
                eth = eth_hdr(skb);
                if (macvlan_forward_source(skb, port, eth->h_source))
                        return RX_HANDLER_CONSUMED;
                src = macvlan_hash_lookup(port, eth->h_source);  /* <- this of course has a match */
                if (src && src->mode != MACVLAN_MODE_VEPA &&     /* <- so this is true and eats the packet */
                    src->mode != MACVLAN_MODE_BRIDGE) {
                        /* forward to original port. */
                        vlan = src;
                        ret = macvlan_broadcast_one(skb, vlan, eth, 0) ?:
                              netif_rx(skb);
                        handle_res = RX_HANDLER_CONSUMED;
                        goto out;
                }

                hash = mc_hash(NULL, eth->h_dest);
                if (test_bit(hash, port->mc_filter))
                        macvlan_broadcast_enqueue(port, src, skb);

                return RX_HANDLER_PASS;
        }

If I patch it to do:

                   if (src && src->mode != MACVLAN_MODE_VEPA &&
+                      src->mode != MACVLAN_MODE_PRIVATE &&
                       src->mode != MACVLAN_MODE_BRIDGE) {

the problem goes away, but since I don't understand what the idea of
this code was, that doesn't feel safe.

Can someone who understands the point of this code perhaps give an idea
how to correctly fix this?  Packets received with a source mac matching
this macvlan should definitely still be able to make it through to
the underlying interface, since that is how vrrp is supposed to operate
per the RFC.  Having the wrong arp entry is bad since each time we
receive a packet we increase the timeout so it never ages out, even
though it is the wrong mac address for actually replying (due to how
macvlan interfaces seem to work).

So far this doesn't seem to happen very often since the timing has to
be just "wrong" on the packets, but I have now seen it 4 or 5 times in
the last year.  The solution each time was to manually delete the arp
entry for the virtual IP on the affected node, after which it correctly
arp'd for the right mac and everything worked again.

To test it I manually set a temporary arp entry for the virtual IP using
the physical mac of the interface on the master, then watched whether
the garp ever fixed it.  It did not until I applied my small patch, after
which the garp does fix the arp table reliably as far as I can tell.

Len Sorensen
