Netdev Archive on lore.kernel.org
* VRRP not working on i40e X722 S2600WFT
@ 2020-08-27 18:30 Lennart Sorensen
  2020-08-28 15:56 ` Lennart Sorensen
  0 siblings, 1 reply; 5+ messages in thread
From: Lennart Sorensen @ 2020-08-27 18:30 UTC
  To: Linux Kernel Mailing List
  Cc: netdev, intel-wired-lan, Jeff Kirsher, Len Sorensen

I have hit a new problem with the X722 chipset (Intel R1304WFT server).
VRRP simply does not work.

When keepalived registers a vmac interface, and starts transmitting
multicast packets with the VRRP message, it never receives those packets
from the peers, so all nodes think they are the master.  tcpdump shows
transmits, but no receives.  If I stop keepalived, which deletes the
vmac interface, then I start to receive the multicast packets from the
other nodes.  Even in promisc mode, tcpdump can't see those packets.

So it seems the hardware is dropping all packets with a source mac that
matches the source mac of the vmac interface, even when the destination
is a multicast address that was subscribed to.  This is clearly not
proper behaviour.

I tried a stock 5.8 kernel to check if a driver update helped, and updated
the nvm firmware to the latest 4.10 (which appears to be over a year old),
and nothing changes the behaviour at all.

Seems other people have hit this problem too:
http://mails.dpdk.org/archives/users/2018-May/003128.html

Unless someone has a way to fix this, we will have to change away from
this hardware very quickly.  The IPsec NAT RSS defect we could tolerate
although didn't like, while this is just unworkable.

Quite frustrated by this.  Intel network hardware was always great,
how did the X722 make it out in this state?

-- 
Len Sorensen

* Re: VRRP not working on i40e X722 S2600WFT
  2020-08-27 18:30 VRRP not working on i40e X722 S2600WFT Lennart Sorensen
@ 2020-08-28 15:56 ` Lennart Sorensen
  2020-08-31 17:35   ` [Intel-wired-lan] " Jesse Brandeburg
  0 siblings, 1 reply; 5+ messages in thread
From: Lennart Sorensen @ 2020-08-28 15:56 UTC
  To: Linux Kernel Mailing List; +Cc: netdev, intel-wired-lan, Jeff Kirsher

On Thu, Aug 27, 2020 at 02:30:39PM -0400, Lennart Sorensen wrote:
> I have hit a new problem with the X722 chipset (Intel R1304WFT server).
> VRRP simply does not work.
> 
> When keepalived registers a vmac interface, and starts transmitting
> multicast packets with the VRRP message, it never receives those packets
> from the peers, so all nodes think they are the master.  tcpdump shows
> transmits, but no receives.  If I stop keepalived, which deletes the
> vmac interface, then I start to receive the multicast packets from the
> other nodes.  Even in promisc mode, tcpdump can't see those packets.
> 
> So it seems the hardware is dropping all packets with a source mac that
> matches the source mac of the vmac interface, even when the destination
> is a multicast address that was subscribed to.  This is clearly not
> proper behaviour.
> 
> I tried a stock 5.8 kernel to check if a driver update helped, and updated
> the nvm firmware to the latest 4.10 (which appears to be over a year old),
> and nothing changes the behaviour at all.
> 
> Seems other people have hit this problem too:
> http://mails.dpdk.org/archives/users/2018-May/003128.html
> 
> Unless someone has a way to fix this, we will have to change away from
> this hardware very quickly.  The IPsec NAT RSS defect we could tolerate
> although didn't like, while this is just unworkable.
> 
> Quite frustrated by this.  Intel network hardware was always great,
> how did the X722 make it out in this state?

Another case with the same problem on an X710:

https://www.talkend.net/post/13256.html

-- 
Len Sorensen

* Re: [Intel-wired-lan] VRRP not working on i40e X722 S2600WFT
  2020-08-28 15:56 ` Lennart Sorensen
@ 2020-08-31 17:35   ` Jesse Brandeburg
  2020-09-01  1:35     ` Lennart Sorensen
  0 siblings, 1 reply; 5+ messages in thread
From: Jesse Brandeburg @ 2020-08-31 17:35 UTC
  To: Lennart Sorensen; +Cc: Linux Kernel Mailing List, netdev, intel-wired-lan

Lennart Sorensen wrote:

> On Thu, Aug 27, 2020 at 02:30:39PM -0400, Lennart Sorensen wrote:
> > I have hit a new problem with the X722 chipset (Intel R1304WFT server).
> > VRRP simply does not work.
> > 
> > When keepalived registers a vmac interface, and starts transmitting
> > multicast packets with the VRRP message, it never receives those packets
> > from the peers, so all nodes think they are the master.  tcpdump shows
> > transmits, but no receives.  If I stop keepalived, which deletes the
> > vmac interface, then I start to receive the multicast packets from the
> > other nodes.  Even in promisc mode, tcpdump can't see those packets.
> > 
> > So it seems the hardware is dropping all packets with a source mac that
> > matches the source mac of the vmac interface, even when the destination
> > is a multicast address that was subscribed to.  This is clearly not
> > proper behaviour.

Thanks for the report Lennart, I understand your frustration, as this
should probably work without user configuration.

However, please give this command a try:
ethtool --set-priv-flags ethX disable-source-pruning on
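
A hedged sketch of applying and verifying the flag.  ethtool
--show-priv-flags lists the private flags a driver exposes, so a driver
too old to know disable-source-pruning can be detected before trying to
set it.  The function name and the interface argument are placeholders,
not part of ethtool or the i40e driver.

```shell
#!/bin/sh
# Enable disable-source-pruning on one interface if the driver exposes it.
# Returns non-zero when the flag is missing or cannot be set.
set_no_source_pruning() {
    iface="$1"
    if ! ethtool --show-priv-flags "$iface" | grep -q 'disable-source-pruning'; then
        echo "$iface: driver does not expose disable-source-pruning" >&2
        return 1
    fi
    ethtool --set-priv-flags "$iface" disable-source-pruning on || return 1
    # Print the flag's current state for confirmation.
    ethtool --show-priv-flags "$iface" | grep 'disable-source-pruning'
}
```

Usage would be along the lines of "set_no_source_pruning ethX", run as root.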


> > I tried a stock 5.8 kernel to check if a driver update helped, and updated
> > the nvm firmware to the latest 4.10 (which appears to be over a year old),
> > and nothing changes the behaviour at all.
> > 
> > Seems other people have hit this problem too:
> > http://mails.dpdk.org/archives/users/2018-May/003128.html
> > 
> > Unless someone has a way to fix this, we will have to change away from
> > this hardware very quickly.  The IPsec NAT RSS defect we could tolerate
> > although didn't like, while this is just unworkable.
> > 
> > Quite frustrated by this.  Intel network hardware was always great,
> > how did the X722 make it out in this state?
> 
> Another case with the same problem on an X710:
> 
> https://www.talkend.net/post/13256.html

I don't know how to reply to this other thread, but it is about DPDK,
which would require a code change or further investigation to issue the
right command to the hardware to disable source pruning.


* Re: [Intel-wired-lan] VRRP not working on i40e X722 S2600WFT
  2020-08-31 17:35   ` [Intel-wired-lan] " Jesse Brandeburg
@ 2020-09-01  1:35     ` Lennart Sorensen
  2020-09-02 13:47       ` Lennart Sorensen
  0 siblings, 1 reply; 5+ messages in thread
From: Lennart Sorensen @ 2020-09-01  1:35 UTC
  To: Jesse Brandeburg; +Cc: Linux Kernel Mailing List, netdev, intel-wired-lan

On Mon, Aug 31, 2020 at 10:35:12AM -0700, Jesse Brandeburg wrote:
> Thanks for the report Lennart, I understand your frustration, as this
> should probably work without user configuration.
> 
> However, please give this command a try:
> ethtool --set-priv-flags ethX disable-source-pruning on

Hmm, our 4.9 kernel is just a touch too old to support that.  And yes
that really should not require a flag to be set, given the card has no
reason to ever do that pruning.  There is no justification you could
have for doing it in the first place.

-- 
Len Sorensen

* Re: [Intel-wired-lan] VRRP not working on i40e X722 S2600WFT
  2020-09-01  1:35     ` Lennart Sorensen
@ 2020-09-02 13:47       ` Lennart Sorensen
  0 siblings, 0 replies; 5+ messages in thread
From: Lennart Sorensen @ 2020-09-02 13:47 UTC
  To: Jesse Brandeburg; +Cc: Linux Kernel Mailing List, netdev, intel-wired-lan

On Mon, Aug 31, 2020 at 09:35:19PM -0400, Lennart Sorensen wrote:
> On Mon, Aug 31, 2020 at 10:35:12AM -0700, Jesse Brandeburg wrote:
> > Thanks for the report Lennart, I understand your frustration, as this
> > should probably work without user configuration.
> > 
> > However, please give this command a try:
> > ethtool --set-priv-flags ethX disable-source-pruning on
> 
> Hmm, our 4.9 kernel is just a touch too old to support that.  And yes
> that really should not require a flag to be set, given the card has no
> reason to ever do that pruning.  There is no justification you could
> have for doing it in the first place.

So backporting the patch that enabled that flag does allow it to work.
Of course there isn't a particularly good place to put an ethtool command
in the boot up to make sure it runs before vrrp is started.  This has to
be the default. I know I wasted about a week trying things to get this to
work, and clearly lots of other people have wasted a ton of time on this
"feature" too (calling it a feature is clearly wrong, it is a bug).
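
One possible way to handle the ordering, as a sketch only: a small script
that sets the flag on every interface bound to the i40e driver, which could
then be run before the VRRP daemon starts (for example from a systemd
ExecStartPre= line in a drop-in for the keepalived unit; that wiring is an
assumption, not something tested here).  The function names and the
SYSFS_NET override are hypothetical; the override exists only so the loop
can be exercised without hardware.

```shell
#!/bin/sh
# Directory holding one entry per network interface (standard sysfs path).
SYSFS_NET="${SYSFS_NET:-/sys/class/net}"

# Print the name of every interface whose bound driver is i40e.
i40e_interfaces() {
    for dev in "$SYSFS_NET"/*; do
        [ -e "$dev/device/driver" ] || continue
        # The driver entry is a symlink; its last path component is the
        # driver name.
        driver=$(basename "$(readlink -f "$dev/device/driver")")
        if [ "$driver" = "i40e" ]; then
            basename "$dev"
        fi
    done
    return 0
}

# Turn off source pruning on each i40e interface, reporting any failure.
disable_pruning_all() {
    for iface in $(i40e_interfaces); do
        ethtool --set-priv-flags "$iface" disable-source-pruning on \
            || echo "failed to set flag on $iface" >&2
    done
}
```

Calling disable_pruning_all as root before keepalived starts would apply the
flag to all matching ports at once.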

By default the NIC should work as expected.  Any weird questionable
optimizations have to be turned on by the user explicitly when they
understand the consequences.  I can't find any use case documented
anywhere for this bug, I can only find things it has broken (like
apparently arp monitoring on bonding, and vrrp).

So who should make the patch to change this to be the default?  Clearly
the current behaviour is harming and confusing more people than could
possibly be impacted by changing the current default setting for the flag
(in fact I would just about be willing to bet there are no people that
want the current behaviour.  After all no other NIC does this, so clearly
there is no need for it to be done).

-- 
Len Sorensen
