From: Xie He <xie.he.0141@gmail.com>
To: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>,
	Jakub Kicinski <kuba@kernel.org>,
	John Ogness <john.ogness@linutronix.de>,
	Eric Dumazet <edumazet@google.com>,
	Or Cohen <orcohen@paloaltonetworks.com>,
	Arnd Bergmann <arnd@arndb.de>,
	Network Development <netdev@vger.kernel.org>,
	linux-kernel <linux-kernel@vger.kernel.org>,
	Eric Dumazet <eric.dumazet@gmail.com>,
	Brian Norris <briannorris@chromium.org>,
	Cong Wang <xiyou.wangcong@gmail.com>
Subject: Re: [PATCH net] net/packet: Fix a comment about hard_header_len and headroom allocation
Date: Mon, 7 Sep 2020 17:07:17 -0700
Message-ID: <CAJht_EO13aYPXBV7sEgOTuUhuHFTFFfdg7NBN2cEKAo6LK0DMQ@mail.gmail.com>
In-Reply-To: <CA+FuTSfOeMB7Wv1t12VCTOqPYcTLq2WKdG4AJUO=gxotVRZiQw@mail.gmail.com>

Thank you for your comment!

On Mon, Sep 7, 2020 at 2:41 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> On Sun, Sep 6, 2020 at 5:18 AM Xie He <xie.he.0141@gmail.com> wrote:
> >
> > This comment is outdated and no longer reflects the actual implementation
> > of af_packet.c.
>
> If it was previously true, can you point to a commit that changes the behavior?

This is my understanding of the history of "af_packet.c":

1. Pre git history

At first, before "needed_headroom" was introduced, "hard_header_len"
was the only way for a driver to request headroom. However,
"hard_header_len" was also used in "af_packet.c" for processing the
header. There was confusion / disagreement between "af_packet.c"
developers and driver developers about the use of "hard_header_len".
"af_packet.c" developers would assume that all headers were visible to
them through dev->header_ops (called dev->hard_header at that time?).
But the developers of some drivers were not able to expose all their
headers to "af_packet.c" through header_ops (for example, in tunnel
drivers). These drivers still requested the headroom via
"hard_header_len", but this created bugs for "af_packet.c" because
"af_packet.c" would assume "hard_header_len" was the length of the
header visible to it through header_ops.

Therefore, in Linux version 2.1.43pre1, the FIXME comment was added.
In this comment, "af_packet.c" developers clearly stated that not
exposing the header through header_ops was a bug that needed to be
fixed in the drivers. But I think driver developers were not able to
agree because some drivers really had a need to add their own header
without using header_ops (for example in tunnel drivers).

In Linux version 2.1.68, the developer of "af_packet.c" compromised
and recognized the use of "hard_header_len" even when there is no
header_ops, by adding the comment I'm trying to change now. But I
guess some other developers of "af_packet.c" continued to treat
"hard_header_len" as the length of the header built through
header_ops, which created a lot of problems.

2. Introduction of "needed_headroom"

Because this issue had troubled developers for a long time, in 2008
developers introduced "needed_headroom" to solve this problem.
"needed_headroom" has only one purpose - reserving headroom. It is not
used in af_packet.c for header processing, so drivers can safely use
it to request headroom without exposing the header via header_ops.

The commit was:
commit f5184d267c1a ("net: Allow netdevices to specify needed head/tailroom")

After "needed_headroom" was introduced, all drivers that needed to
reserve the headroom but didn't want "af_packet.c" to interfere should
switch to "needed_headroom".

From this point on, "af_packet.c" developers were able to assume
"hard_header_len" was only used for header processing purposes in
"af_packet.c".

3. Not reserving the headroom of hard_header_len for RAW sockets

Another very important point in history is these two commits in 2018:
commit b84bbaf7a6c8 ("packet: in packet_snd start writing at link
layer allocation")
commit 9aad13b087ab ("packet: fix reserve calculation")

These two commits changed packet_snd to its present state and made it
no longer reserve the headroom of hard_header_len for RAW sockets. This
made it urgent for drivers to switch from hard_header_len to
needed_headroom, because otherwise they might cause a kernel panic
when used with RAW sockets.

> > In this file, the function packet_snd first reserves a headroom of
> > length (dev->hard_header_len + dev->needed_headroom).
> > Then if the socket is a SOCK_DGRAM socket, it calls dev_hard_header,
> > which calls dev->header_ops->create, to create the link layer header.
> > If the socket is a SOCK_RAW socket, it "un-reserves" a headroom of
> > length (dev->hard_header_len), and checks if the user has provided a
> > header of length (dev->hard_header_len) (in dev_validate_header).
>
> Not entirely, a header greater than dev->min_header_len that passes
> dev_validate_header.

Yes, I understand. The function checks both hard_header_len and
min_header_len. I want to explain the role of hard_header_len in
dev_validate_header, but I find it a little hard to explain this
concisely without simplifying a little bit.
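
The simplified headroom accounting described in the quoted text can be
sketched as a small userspace model. This is only an illustration under
stated assumptions, not the code in af_packet.c: every model_* name is
hypothetical, the rounding done by the kernel's reserved-space macros
is ignored, and dev_validate_header is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumptions). */
struct model_dev {
	size_t hard_header_len;  /* length of the header built via header_ops */
	size_t needed_headroom;  /* extra headroom the driver asks for */
	bool has_header_ops;     /* true if the header is exposed to upper layers */
};

struct model_skb {
	size_t headroom;         /* free bytes in front of the data */
	size_t len;              /* bytes of data in the buffer */
};

enum model_sock_type { MODEL_SOCK_DGRAM, MODEL_SOCK_RAW };

/* Grow the headroom of a still-empty buffer. */
static void model_reserve(struct model_skb *skb, size_t n)
{
	skb->headroom += n;
}

/* Prepend n bytes of header; this consumes headroom. */
static void model_push(struct model_skb *skb, size_t n)
{
	assert(skb->headroom >= n);
	skb->headroom -= n;
	skb->len += n;
}

/*
 * Model of the send path described above. Returns the headroom that is
 * left for the driver once the link layer header is in place.
 */
static size_t model_packet_snd(const struct model_dev *dev,
			       enum model_sock_type type,
			       size_t user_len, struct model_skb *skb)
{
	skb->headroom = 0;
	skb->len = 0;

	/* First reserve hard_header_len + needed_headroom. */
	model_reserve(skb, dev->hard_header_len + dev->needed_headroom);

	if (type == MODEL_SOCK_DGRAM) {
		/* The header is created through header_ops, if there are any. */
		if (dev->has_header_ops)
			model_push(skb, dev->hard_header_len);
	} else {
		/*
		 * SOCK_RAW: the user data already starts with the header,
		 * so the hard_header_len part is "un-reserved".
		 */
		skb->headroom -= dev->hard_header_len;
	}

	skb->len += user_len;
	return skb->headroom;
}
```

In this model, a driver without header_ops that sets hard_header_len
to 24 and needed_headroom to 0 is left with no headroom on a SOCK_RAW
socket, while the same driver using needed_headroom = 24 keeps all 24
bytes.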

> >  /*
> >     Assumptions:
> > -   - if device has no dev->hard_header routine, it adds and removes ll header
> > -     inside itself. In this case ll header is invisible outside of device,
> > -     but higher levels still should reserve dev->hard_header_len.
> > -     Some devices are enough clever to reallocate skb, when header
> > -     will not fit to reserved space (tunnel), another ones are silly
> > -     (PPP).
> > +   - If the device has no dev->header_ops, there is no LL header visible
> > +     outside of the device. In this case, its hard_header_len should be 0.
>
> Such a constraint is more robustly captured with a compile time
> BUILD_BUG_ON check. Please do add a comment that summarizes why the
> invariant holds.

I'm not sure how to do this. I guess both header_ops and
hard_header_len are assigned at runtime. (Right?) I guess we are not
able to check this at compile-time.
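
For illustration only, a runtime form of the same constraint might look
like the sketch below. The struct and function names are hypothetical
userspace stand-ins, and nothing like this is proposed in the patch; in
the kernel such a check would presumably be wrapped in something like
WARN_ON_ONCE, but where it would go is an open question.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the kernel structures (assumptions). */
struct model_header_ops {
	int unused;
};

struct model_dev {
	const struct model_header_ops *header_ops; /* assigned at runtime */
	unsigned short hard_header_len;            /* assigned at runtime */
};

/*
 * The invariant from the proposed comment: a device without header_ops
 * has no link layer header visible to upper layers, so its
 * hard_header_len should be 0. Both fields are only known at runtime,
 * which is why this is a function and not a compile-time assertion.
 */
static bool model_header_invariant_holds(const struct model_dev *dev)
{
	return dev->header_ops != NULL || dev->hard_header_len == 0;
}
```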

> More about the older comment, but if reusing: it's not entirely clear
> to me what "outside of the device" means. The upper layers that
> receive data from the device and send data to it, including
> packet_snd, I suppose? Not the lower layers, clearly. Maybe that can
> be more specific.

Yes, right. If a header is visible "outside of the device", it means
the header is exposed to upper layers via "header_ops". If a header is
not visible "outside of the device" and is only used "internally", it
means the header is not exposed to upper layers via "header_ops".
Maybe we can change it to "outside of the device driver"? We can
borrow the idea of encapsulation in object-oriented programming - some
things that happen inside a software component should not be visible
outside of that software component.

