Netdev Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH RFC 0/9] sk_buff: optimize layout for GRO @ 2021-07-21 16:44 Paolo Abeni 2021-07-21 18:15 ` Casey Schaufler 0 siblings, 1 reply; 14+ messages in thread From: Paolo Abeni @ 2021-07-21 16:44 UTC (permalink / raw) To: netdev Cc: David S. Miller, Jakub Kicinski, Florian Westphal, Eric Dumazet, linux-security-module, selinux This is a very early draft - in a different world would be replaced by hallway discussion at in-person conference - aimed at outlining some ideas and collect feedback on the overall outlook. There are still bugs to be fixed, more test and benchmark need, etc. There are 3 main goals: - [try to] avoid the overhead for uncommon conditions at GRO time (patches 1-4) - enable backpressure for the veth GRO path (patches 5-6) - reduce the number of cacheline used by the sk_buff lifecycle from 4 to 3, at least in some common scenarios (patches 1,7-9). The idea here is avoid the initialization of some fields and control their validity with a bitmask, as presented by at least Florian and Jesper in the past. The above requires a bit of code churn in some places and, yes, a few new bits in the sk_buff struct (using some existing holes) Paolo Abeni (9): sk_buff: track nfct status in newly added skb->_state sk_buff: track dst status in skb->_state sk_buff: move the active_extensions into the state bitfield net: optimize GRO for the common case. skbuff: introduce has_sk state bit. veth: use skb_prepare_for_gro() sk_buff: move inner header fields after tail sk_buff: move vlan field after tail. sk_buff: access secmark via getter/setter drivers/net/veth.c | 2 +- include/linux/skbuff.h | 117 ++++++++++++++++++++++--------- include/net/dst.h | 3 + include/net/sock.h | 9 +++ net/core/dev.c | 31 +++++--- net/core/skbuff.c | 40 +++++++---- net/netfilter/nfnetlink_queue.c | 6 +- net/netfilter/nft_meta.c | 6 +- net/netfilter/xt_CONNSECMARK.c | 8 +-- net/netfilter/xt_SECMARK.c | 2 +- security/apparmor/lsm.c | 15 ++-- security/selinux/hooks.c | 10 +-- security/smack/smack_lsm.c | 4 +- security/smack/smack_netfilter.c | 4 +- 14 files changed, 175 insertions(+), 82 deletions(-) -- 2.26.3 ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-21 16:44 [PATCH RFC 0/9] sk_buff: optimize layout for GRO Paolo Abeni @ 2021-07-21 18:15 ` Casey Schaufler 2021-07-22 7:10 ` Paolo Abeni 0 siblings, 1 reply; 14+ messages in thread From: Casey Schaufler @ 2021-07-21 18:15 UTC (permalink / raw) To: Paolo Abeni, netdev Cc: David S. Miller, Jakub Kicinski, Florian Westphal, Eric Dumazet, linux-security-module, selinux, Casey Schaufler On 7/21/2021 9:44 AM, Paolo Abeni wrote: > This is a very early draft - in a different world would be > replaced by hallway discussion at in-person conference - aimed at > outlining some ideas and collect feedback on the overall outlook. > There are still bugs to be fixed, more test and benchmark need, etc. > > There are 3 main goals: > - [try to] avoid the overhead for uncommon conditions at GRO time > (patches 1-4) > - enable backpressure for the veth GRO path (patches 5-6) > - reduce the number of cacheline used by the sk_buff lifecycle > from 4 to 3, at least in some common scenarios (patches 1,7-9). > The idea here is avoid the initialization of some fields and > control their validity with a bitmask, as presented by at least > Florian and Jesper in the past. If I understand correctly, you're creating an optimized case which excludes ct, secmark, vlan and UDP tunnel. Is this correct, and if so, why those particular fields? What impact will this have in the non-optimal (with any of the excluded fields) case? > > The above requires a bit of code churn in some places and, yes, > a few new bits in the sk_buff struct (using some existing holes) > > Paolo Abeni (9): > sk_buff: track nfct status in newly added skb->_state > sk_buff: track dst status in skb->_state > sk_buff: move the active_extensions into the state bitfield > net: optimize GRO for the common case. > skbuff: introduce has_sk state bit. > veth: use skb_prepare_for_gro() > sk_buff: move inner header fields after tail > sk_buff: move vlan field after tail. > sk_buff: access secmark via getter/setter > > drivers/net/veth.c | 2 +- > include/linux/skbuff.h | 117 ++++++++++++++++++++++--------- > include/net/dst.h | 3 + > include/net/sock.h | 9 +++ > net/core/dev.c | 31 +++++--- > net/core/skbuff.c | 40 +++++++---- > net/netfilter/nfnetlink_queue.c | 6 +- > net/netfilter/nft_meta.c | 6 +- > net/netfilter/xt_CONNSECMARK.c | 8 +-- > net/netfilter/xt_SECMARK.c | 2 +- > security/apparmor/lsm.c | 15 ++-- > security/selinux/hooks.c | 10 +-- > security/smack/smack_lsm.c | 4 +- > security/smack/smack_netfilter.c | 4 +- > 14 files changed, 175 insertions(+), 82 deletions(-) > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-21 18:15 ` Casey Schaufler @ 2021-07-22 7:10 ` Paolo Abeni 2021-07-22 16:04 ` Casey Schaufler 0 siblings, 1 reply; 14+ messages in thread From: Paolo Abeni @ 2021-07-22 7:10 UTC (permalink / raw) To: Casey Schaufler, netdev Cc: David S. Miller, Jakub Kicinski, Florian Westphal, Eric Dumazet, linux-security-module, selinux Hello, On Wed, 2021-07-21 at 11:15 -0700, Casey Schaufler wrote: > On 7/21/2021 9:44 AM, Paolo Abeni wrote: > > This is a very early draft - in a different world would be > > replaced by hallway discussion at in-person conference - aimed at > > outlining some ideas and collect feedback on the overall outlook. > > There are still bugs to be fixed, more test and benchmark need, etc. > > > > There are 3 main goals: > > - [try to] avoid the overhead for uncommon conditions at GRO time > > (patches 1-4) > > - enable backpressure for the veth GRO path (patches 5-6) > > - reduce the number of cacheline used by the sk_buff lifecycle > > from 4 to 3, at least in some common scenarios (patches 1,7-9). > > The idea here is avoid the initialization of some fields and > > control their validity with a bitmask, as presented by at least > > Florian and Jesper in the past. > > If I understand correctly, you're creating an optimized case > which excludes ct, secmark, vlan and UDP tunnel. Is this correct, > and if so, why those particular fields? What impact will this have > in the non-optimal (with any of the excluded fields) case? Thank you for the feedback. There are 2 different relevant points: - the GRO stage. packets carring any of CT, dst, sk or skb_ext will do 2 additional conditionals per gro_receive WRT the current code. My understanding is that having any of such field set at GRO receive time is quite exceptional for real nic. All others packet will do 4 or 5 less conditionals, and will traverse a little less code. - sk_buff lifecycle * packets carrying vlan and UDP will not see any differences: sk_buff lifecycle will stil use 4 cachelines, as currently does, and no additional conditional is introduced. * packets carring nfct or secmark will see an additional conditional every time such field is accessed. The number of cacheline used will still be 4, as in the current code. My understanding is that when such access happens, there is already a relevant amount of "additional" code to be executed, the conditional overhead should not be measurable. Cheers, Paolo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-22 7:10 ` Paolo Abeni @ 2021-07-22 16:04 ` Casey Schaufler 2021-07-22 16:57 ` Paolo Abeni 0 siblings, 1 reply; 14+ messages in thread From: Casey Schaufler @ 2021-07-22 16:04 UTC (permalink / raw) To: Paolo Abeni, netdev Cc: David S. Miller, Jakub Kicinski, Florian Westphal, Eric Dumazet, linux-security-module, selinux, Casey Schaufler On 7/22/2021 12:10 AM, Paolo Abeni wrote: > Hello, > > On Wed, 2021-07-21 at 11:15 -0700, Casey Schaufler wrote: >> On 7/21/2021 9:44 AM, Paolo Abeni wrote: >>> This is a very early draft - in a different world would be >>> replaced by hallway discussion at in-person conference - aimed at >>> outlining some ideas and collect feedback on the overall outlook. >>> There are still bugs to be fixed, more test and benchmark need, etc. >>> >>> There are 3 main goals: >>> - [try to] avoid the overhead for uncommon conditions at GRO time >>> (patches 1-4) >>> - enable backpressure for the veth GRO path (patches 5-6) >>> - reduce the number of cacheline used by the sk_buff lifecycle >>> from 4 to 3, at least in some common scenarios (patches 1,7-9). >>> The idea here is avoid the initialization of some fields and >>> control their validity with a bitmask, as presented by at least >>> Florian and Jesper in the past. >> If I understand correctly, you're creating an optimized case >> which excludes ct, secmark, vlan and UDP tunnel. Is this correct, >> and if so, why those particular fields? What impact will this have >> in the non-optimal (with any of the excluded fields) case? > Thank you for the feedback. You're most welcome. You did request comments. > > There are 2 different relevant points: > > - the GRO stage. > packets carring any of CT, dst, sk or skb_ext will do 2 additional > conditionals per gro_receive WRT the current code. My understanding is > that having any of such field set at GRO receive time is quite > exceptional for real nic. All others packet will do 4 or 5 less > conditionals, and will traverse a little less code. > > - sk_buff lifecycle > * packets carrying vlan and UDP will not see any differences: sk_buff > lifecycle will stil use 4 cachelines, as currently does, and no > additional conditional is introduced. > * packets carring nfct or secmark will see an additional conditional > every time such field is accessed. The number of cacheline used will > still be 4, as in the current code. My understanding is that when such > access happens, there is already a relevant amount of "additional" code > to be executed, the conditional overhead should not be measurable. I'm responsible for some of that "additonal" code. If the secmark is considered to be outside the performance critical data there are changes I would like to make that will substantially improve the performance of that "additional" code that would include a u64 secmark. If use of a secmark is considered indicative of a "slow" path, the rationale for restricting it to u32, that it might impact the "usual" case performance, seems specious. I can't say that I understand all the nuances and implications involved. It does appear that the changes you've suggested could negate the classic argument that requires the u32 secmark. > > Cheers, > > Paolo > ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-22 16:04 ` Casey Schaufler @ 2021-07-22 16:57 ` Paolo Abeni 2021-07-22 18:41 ` Paul Moore 0 siblings, 1 reply; 14+ messages in thread From: Paolo Abeni @ 2021-07-22 16:57 UTC (permalink / raw) To: Casey Schaufler, netdev Cc: David S. Miller, Jakub Kicinski, Florian Westphal, Eric Dumazet, linux-security-module, selinux On Thu, 2021-07-22 at 09:04 -0700, Casey Schaufler wrote: > On 7/22/2021 12:10 AM, Paolo Abeni wrote: > > On Wed, 2021-07-21 at 11:15 -0700, Casey Schaufler wrote: > > > On 7/21/2021 9:44 AM, Paolo Abeni wrote: > > > > This is a very early draft - in a different world would be > > > > replaced by hallway discussion at in-person conference - aimed at > > > > outlining some ideas and collect feedback on the overall outlook. > > > > There are still bugs to be fixed, more test and benchmark need, etc. > > > > > > > > There are 3 main goals: > > > > - [try to] avoid the overhead for uncommon conditions at GRO time > > > > (patches 1-4) > > > > - enable backpressure for the veth GRO path (patches 5-6) > > > > - reduce the number of cacheline used by the sk_buff lifecycle > > > > from 4 to 3, at least in some common scenarios (patches 1,7-9). > > > > The idea here is avoid the initialization of some fields and > > > > control their validity with a bitmask, as presented by at least > > > > Florian and Jesper in the past. > > > If I understand correctly, you're creating an optimized case > > > which excludes ct, secmark, vlan and UDP tunnel. Is this correct, > > > and if so, why those particular fields? What impact will this have > > > in the non-optimal (with any of the excluded fields) case? > > Thank you for the feedback. > > You're most welcome. You did request comments. > > > There are 2 different relevant points: > > > > - the GRO stage. > > packets carring any of CT, dst, sk or skb_ext will do 2 additional > > conditionals per gro_receive WRT the current code. My understanding is > > that having any of such field set at GRO receive time is quite > > exceptional for real nic. All others packet will do 4 or 5 less > > conditionals, and will traverse a little less code. > > > > - sk_buff lifecycle > > * packets carrying vlan and UDP will not see any differences: sk_buff > > lifecycle will stil use 4 cachelines, as currently does, and no > > additional conditional is introduced. > > * packets carring nfct or secmark will see an additional conditional > > every time such field is accessed. The number of cacheline used will > > still be 4, as in the current code. My understanding is that when such > > access happens, there is already a relevant amount of "additional" code > > to be executed, the conditional overhead should not be measurable. > > I'm responsible for some of that "additonal" code. If the secmark > is considered to be outside the performance critical data there are > changes I would like to make that will substantially improve the > performance of that "additional" code that would include a u64 > secmark. If use of a secmark is considered indicative of a "slow" > path, the rationale for restricting it to u32, that it might impact > the "usual" case performance, seems specious. I can't say that I > understand all the nuances and implications involved. It does > appear that the changes you've suggested could negate the classic > argument that requires the u32 secmark. I see now I did not reply to one of you questions - why I picked-up vlan, tunnel secmark fields to move them at sk_buff tail. Tow main drivers on my side: - there are use cases/deployments that do not use them. - moving them around was doable in term of required changes. There are no "slow-path" implications on my side. For example, vlan_* fields are very critical performance wise, if the traffic is tagged. But surely there are busy servers not using tagget traffic which will enjoy the reduced cachelines footprint, and this changeset will not impact negatively the first case. WRT to the vlan example, secmark and nfct require an extra conditional to fetch the data. My understanding is that such additional conditional is not measurable performance-wise when benchmarking the security modules (or conntrack) because they have to do much more intersting things after fetching a few bytes from an already hot cacheline. Not sure if the above somehow clarify my statements. As for expanding secmark to 64 bits, I guess that could be an interesting follow-up discussion :) Cheers, Paolo ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-22 16:57 ` Paolo Abeni @ 2021-07-22 18:41 ` Paul Moore 2021-07-24 18:51 ` Florian Westphal 0 siblings, 1 reply; 14+ messages in thread From: Paul Moore @ 2021-07-22 18:41 UTC (permalink / raw) To: Paolo Abeni, Casey Schaufler Cc: netdev, David S. Miller, Jakub Kicinski, Florian Westphal, Eric Dumazet, linux-security-module, selinux On Thu, Jul 22, 2021 at 12:59 PM Paolo Abeni <pabeni@redhat.com> wrote: > On Thu, 2021-07-22 at 09:04 -0700, Casey Schaufler wrote: > > On 7/22/2021 12:10 AM, Paolo Abeni wrote: > > > On Wed, 2021-07-21 at 11:15 -0700, Casey Schaufler wrote: > > > > On 7/21/2021 9:44 AM, Paolo Abeni wrote: > > > > > This is a very early draft - in a different world would be > > > > > replaced by hallway discussion at in-person conference - aimed at > > > > > outlining some ideas and collect feedback on the overall outlook. > > > > > There are still bugs to be fixed, more test and benchmark need, etc. > > > > > > > > > > There are 3 main goals: > > > > > - [try to] avoid the overhead for uncommon conditions at GRO time > > > > > (patches 1-4) > > > > > - enable backpressure for the veth GRO path (patches 5-6) > > > > > - reduce the number of cacheline used by the sk_buff lifecycle > > > > > from 4 to 3, at least in some common scenarios (patches 1,7-9). > > > > > The idea here is avoid the initialization of some fields and > > > > > control their validity with a bitmask, as presented by at least > > > > > Florian and Jesper in the past. > > > > If I understand correctly, you're creating an optimized case > > > > which excludes ct, secmark, vlan and UDP tunnel. Is this correct, > > > > and if so, why those particular fields? What impact will this have > > > > in the non-optimal (with any of the excluded fields) case? > > > Thank you for the feedback. > > > > You're most welcome. You did request comments. > > > > > There are 2 different relevant points: > > > > > > - the GRO stage. > > > packets carring any of CT, dst, sk or skb_ext will do 2 additional > > > conditionals per gro_receive WRT the current code. My understanding is > > > that having any of such field set at GRO receive time is quite > > > exceptional for real nic. All others packet will do 4 or 5 less > > > conditionals, and will traverse a little less code. > > > > > > - sk_buff lifecycle > > > * packets carrying vlan and UDP will not see any differences: sk_buff > > > lifecycle will stil use 4 cachelines, as currently does, and no > > > additional conditional is introduced. > > > * packets carring nfct or secmark will see an additional conditional > > > every time such field is accessed. The number of cacheline used will > > > still be 4, as in the current code. My understanding is that when such > > > access happens, there is already a relevant amount of "additional" code > > > to be executed, the conditional overhead should not be measurable. > > > > I'm responsible for some of that "additonal" code. If the secmark > > is considered to be outside the performance critical data there are > > changes I would like to make that will substantially improve the > > performance of that "additional" code that would include a u64 > > secmark. If use of a secmark is considered indicative of a "slow" > > path, the rationale for restricting it to u32, that it might impact > > the "usual" case performance, seems specious. I can't say that I > > understand all the nuances and implications involved. It does > > appear that the changes you've suggested could negate the classic > > argument that requires the u32 secmark. > > I see now I did not reply to one of you questions - why I picked-up > vlan, tunnel secmark fields to move them at sk_buff tail. > > Tow main drivers on my side: > - there are use cases/deployments that do not use them. > - moving them around was doable in term of required changes. > > There are no "slow-path" implications on my side. For example, vlan_* > fields are very critical performance wise, if the traffic is tagged. > But surely there are busy servers not using tagget traffic which will > enjoy the reduced cachelines footprint, and this changeset will not > impact negatively the first case. > > WRT to the vlan example, secmark and nfct require an extra conditional > to fetch the data. My understanding is that such additional conditional > is not measurable performance-wise when benchmarking the security > modules (or conntrack) because they have to do much more intersting > things after fetching a few bytes from an already hot cacheline. > > Not sure if the above somehow clarify my statements. > > As for expanding secmark to 64 bits, I guess that could be an > interesting follow-up discussion :) The intersection between netdev and the LSM has a long and somewhat tortured past with each party making sacrifices along the way to get where we are at today. It is far from perfect, at least from a LSM perspective, but it is what we've got and since performance is usually used as a club to beat back any changes proposed by the LSM side, I would like to object to these changes that negatively impact the LSM performance without some concession in return. It has been a while since Casey and I have spoken about this, but I think the prefered option would be to exchange the current __u32 "sk_buff.secmark" field with a void* "sk_buff.security" field, like so many other kernel level objects. Previous objections have eventually boiled down to the additional space in the sk_buff for the extra bits (there is some additional editorializing that could be done here, but I'll refrain), but based on the comments thus far in this thread it sounds like perhaps we can now make a deal here: move the LSM field down to a "colder" cacheline in exchange for converting the LSM field to a proper pointer. Thoughts? -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-22 18:41 ` Paul Moore @ 2021-07-24 18:51 ` Florian Westphal 2021-07-25 14:57 ` Paul Moore 0 siblings, 1 reply; 14+ messages in thread From: Florian Westphal @ 2021-07-24 18:51 UTC (permalink / raw) To: Paul Moore Cc: Paolo Abeni, Casey Schaufler, netdev, David S. Miller, Jakub Kicinski, Florian Westphal, Eric Dumazet, linux-security-module, selinux Paul Moore <paul@paul-moore.com> wrote: > Tow main drivers on my side: > > - there are use cases/deployments that do not use them. > > - moving them around was doable in term of required changes. > > > > There are no "slow-path" implications on my side. For example, vlan_* > > fields are very critical performance wise, if the traffic is tagged. > > But surely there are busy servers not using tagget traffic which will > > enjoy the reduced cachelines footprint, and this changeset will not > > impact negatively the first case. > > > > WRT to the vlan example, secmark and nfct require an extra conditional > > to fetch the data. My understanding is that such additional conditional > > is not measurable performance-wise when benchmarking the security > > modules (or conntrack) because they have to do much more intersting > > things after fetching a few bytes from an already hot cacheline. > > > > Not sure if the above somehow clarify my statements. > > > > As for expanding secmark to 64 bits, I guess that could be an > > interesting follow-up discussion :) > > The intersection between netdev and the LSM has a long and somewhat > tortured past with each party making sacrifices along the way to get > where we are at today. It is far from perfect, at least from a LSM > perspective, but it is what we've got and since performance is usually > used as a club to beat back any changes proposed by the LSM side, I > would like to object to these changes that negatively impact the LSM > performance without some concession in return. It has been a while > since Casey and I have spoken about this, but I think the prefered > option would be to exchange the current __u32 "sk_buff.secmark" field > with a void* "sk_buff.security" field, like so many other kernel level > objects. Previous objections have eventually boiled down to the > additional space in the sk_buff for the extra bits (there is some > additional editorializing that could be done here, but I'll refrain), > but based on the comments thus far in this thread it sounds like > perhaps we can now make a deal here: move the LSM field down to a > "colder" cacheline in exchange for converting the LSM field to a > proper pointer. > > Thoughts? Is there a summary disucssion somewhere wrt. what exactly LSMs need? There is the skb extension infra, does that work for you? ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-24 18:51 ` Florian Westphal @ 2021-07-25 14:57 ` Paul Moore 2021-07-25 16:25 ` Florian Westphal 0 siblings, 1 reply; 14+ messages in thread From: Paul Moore @ 2021-07-25 14:57 UTC (permalink / raw) To: Florian Westphal Cc: Paolo Abeni, Casey Schaufler, netdev, David S. Miller, Jakub Kicinski, Eric Dumazet, linux-security-module, selinux On Sat, Jul 24, 2021 at 2:51 PM Florian Westphal <fw@strlen.de> wrote: > Paul Moore <paul@paul-moore.com> wrote: > > Tow main drivers on my side: > > > - there are use cases/deployments that do not use them. > > > - moving them around was doable in term of required changes. > > > > > > There are no "slow-path" implications on my side. For example, vlan_* > > > fields are very critical performance wise, if the traffic is tagged. > > > But surely there are busy servers not using tagget traffic which will > > > enjoy the reduced cachelines footprint, and this changeset will not > > > impact negatively the first case. > > > > > > WRT to the vlan example, secmark and nfct require an extra conditional > > > to fetch the data. My understanding is that such additional conditional > > > is not measurable performance-wise when benchmarking the security > > > modules (or conntrack) because they have to do much more intersting > > > things after fetching a few bytes from an already hot cacheline. > > > > > > Not sure if the above somehow clarify my statements. > > > > > > As for expanding secmark to 64 bits, I guess that could be an > > > interesting follow-up discussion :) > > > > The intersection between netdev and the LSM has a long and somewhat > > tortured past with each party making sacrifices along the way to get > > where we are at today. It is far from perfect, at least from a LSM > > perspective, but it is what we've got and since performance is usually > > used as a club to beat back any changes proposed by the LSM side, I > > would like to object to these changes that negatively impact the LSM > > performance without some concession in return. It has been a while > > since Casey and I have spoken about this, but I think the prefered > > option would be to exchange the current __u32 "sk_buff.secmark" field > > with a void* "sk_buff.security" field, like so many other kernel level > > objects. Previous objections have eventually boiled down to the > > additional space in the sk_buff for the extra bits (there is some > > additional editorializing that could be done here, but I'll refrain), > > but based on the comments thus far in this thread it sounds like > > perhaps we can now make a deal here: move the LSM field down to a > > "colder" cacheline in exchange for converting the LSM field to a > > proper pointer. > > > > Thoughts? > > Is there a summary disucssion somewhere wrt. what exactly LSMs need? My network access is limited for the next week so I don't have the ability to dig through the list archives, but if you look through the netdev/LSM/lists over the past decade (maybe go back ~15 years?) you will see multiple instances where we/I've brought up different solutions with the netdev folks only to hit a brick wall. The LSM ask for sk_buff is really the same as any other kernel object that we want to control with LSM access controls, e.g. inodes; we basically want a void* blob with the necessary hooks so that the opaque blob can be managed through the skb's lifetime. > There is the skb extension infra, does that work for you? I was hopeful that when the skb_ext capability was introduced we might be able to use it for the LSM(s), but when I asked netdev if they would be willing to accept patches to leverage the skb_ext infrastructure I was told "no". -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-25 14:57 ` Paul Moore @ 2021-07-25 16:25 ` Florian Westphal 2021-07-25 21:53 ` Casey Schaufler 0 siblings, 1 reply; 14+ messages in thread From: Florian Westphal @ 2021-07-25 16:25 UTC (permalink / raw) To: Paul Moore Cc: Florian Westphal, Paolo Abeni, Casey Schaufler, netdev, David S. Miller, Jakub Kicinski, Eric Dumazet, linux-security-module, selinux Paul Moore <paul@paul-moore.com> wrote: > > There is the skb extension infra, does that work for you? > > I was hopeful that when the skb_ext capability was introduced we might > be able to use it for the LSM(s), but when I asked netdev if they > would be willing to accept patches to leverage the skb_ext > infrastructure I was told "no". I found https://lore.kernel.org/netdev/CAHC9VhSz1_KA1tCJtNjwK26BOkGhKGbPT7v1O82mWPduvWwd4A@mail.gmail.com/#r and from what I gather from your comments and that of Casey I think skb extensions is the correct thing for this (i.e., needs netlabel/secid config/enablement so typically won't be active on a distro kernel by default). It certainly makes more sense to me than doing lookups in a hashtable based on a ID (I tried to do that to get rid of skb->nf_bridge pointer years ago and it I could not figure out how to invalidate an entry without adding a new skb destructor callback). ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-25 16:25 ` Florian Westphal @ 2021-07-25 21:53 ` Casey Schaufler 2021-07-25 22:52 ` Florian Westphal 0 siblings, 1 reply; 14+ messages in thread From: Casey Schaufler @ 2021-07-25 21:53 UTC (permalink / raw) To: Florian Westphal, Paul Moore Cc: Paolo Abeni, netdev, David S. Miller, Jakub Kicinski, Eric Dumazet, linux-security-module, selinux, Casey Schaufler On 7/25/2021 9:25 AM, Florian Westphal wrote: > Paul Moore <paul@paul-moore.com> wrote: >>> There is the skb extension infra, does that work for you? >> I was hopeful that when the skb_ext capability was introduced we might >> be able to use it for the LSM(s), but when I asked netdev if they >> would be willing to accept patches to leverage the skb_ext >> infrastructure I was told "no". > I found > > https://lore.kernel.org/netdev/CAHC9VhSz1_KA1tCJtNjwK26BOkGhKGbPT7v1O82mWPduvWwd4A@mail.gmail.com/#r > > and from what I gather from your comments and that of Casey > I think skb extensions is the correct thing for this (i.e., needs > netlabel/secid config/enablement so typically won't be active on > a distro kernel by default). RedHat and android use SELinux and will want this. Ubuntu doesn't yet, but netfilter in in the AppArmor task list. Tizen definitely uses it with Smack. The notion that security modules are only used in fringe cases is antiquated. > It certainly makes more sense to me than doing lookups > in a hashtable based on a ID Agreed. The data burden required to support a hash scheme for the security module stacking case is staggering. > (I tried to do that to get rid of skb->nf_bridge > pointer years ago and it I could not figure out how to invalidate an entry > without adding a new skb destructor callback). ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-25 21:53 ` Casey Schaufler @ 2021-07-25 22:52 ` Florian Westphal 2021-07-26 15:13 ` Casey Schaufler 0 siblings, 1 reply; 14+ messages in thread From: Florian Westphal @ 2021-07-25 22:52 UTC (permalink / raw) To: Casey Schaufler Cc: Florian Westphal, Paul Moore, Paolo Abeni, netdev, David S. Miller, Jakub Kicinski, Eric Dumazet, linux-security-module, selinux Casey Schaufler <casey@schaufler-ca.com> wrote: > RedHat and android use SELinux and will want this. Ubuntu doesn't > yet, but netfilter in in the AppArmor task list. Tizen definitely > uses it with Smack. The notion that security modules are only used > in fringe cases is antiquated. I was not talking about LSM in general, I was referring to the extended info that Paul mentioned. If thats indeed going to be used on every distro then skb extensions are not suitable for this, it would result in extr akmalloc for every skb. > > It certainly makes more sense to me than doing lookups > > in a hashtable based on a ID > > Agreed. The data burden required to support a hash scheme > for the security module stacking case is staggering. It depends on the type of data (and its lifetime). I suspect you have something that is more like skb->dev/dst, i.e. reference to object that persists after the skb is free'd. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-25 22:52 ` Florian Westphal @ 2021-07-26 15:13 ` Casey Schaufler 2021-07-27 2:51 ` Paul Moore 0 siblings, 1 reply; 14+ messages in thread From: Casey Schaufler @ 2021-07-26 15:13 UTC (permalink / raw) To: Florian Westphal Cc: Paul Moore, Paolo Abeni, netdev, David S. Miller, Jakub Kicinski, Eric Dumazet, linux-security-module, selinux, Casey Schaufler On 7/25/2021 3:52 PM, Florian Westphal wrote: > Casey Schaufler <casey@schaufler-ca.com> wrote: >> RedHat and android use SELinux and will want this. Ubuntu doesn't >> yet, but netfilter in in the AppArmor task list. Tizen definitely >> uses it with Smack. The notion that security modules are only used >> in fringe cases is antiquated. > I was not talking about LSM in general, I was referring to the > extended info that Paul mentioned. > > If thats indeed going to be used on every distro then skb extensions > are not suitable for this, it would result in extr akmalloc for every > skb. I am explicitly talking about the use of secmarks. All my references are uses of secmarks. >>> It certainly makes more sense to me than doing lookups >>> in a hashtable based on a ID >> Agreed. The data burden required to support a hash scheme >> for the security module stacking case is staggering. > It depends on the type of data (and its lifetime). > > I suspect you have something that is more like skb->dev/dst, > i.e. reference to object that persists after the skb is free'd. Just so. Only to make it more complicated, SELinux and Smack, the two LSMs currently using secmarks, use them differently. SELinux uses u32 "secids" natively, but Smack suffers serious performance degradation because it has to look up (efficiently, but look up nonetheless) the real Smack value on every packet. Please, I know about hash caches, cache hashes and all sorts of clever tricks to reduce the impact. Nothing beats having the end value up front. ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-26 15:13 ` Casey Schaufler @ 2021-07-27 2:51 ` Paul Moore 2021-07-28 16:21 ` Paolo Abeni 0 siblings, 1 reply; 14+ messages in thread From: Paul Moore @ 2021-07-27 2:51 UTC (permalink / raw) To: Casey Schaufler Cc: Florian Westphal, Paolo Abeni, netdev, David S. Miller, Jakub Kicinski, Eric Dumazet, linux-security-module, selinux On Mon, Jul 26, 2021 at 11:13 AM Casey Schaufler <casey@schaufler-ca.com> wrote: > On 7/25/2021 3:52 PM, Florian Westphal wrote: > > Casey Schaufler <casey@schaufler-ca.com> wrote: > >> RedHat and android use SELinux and will want this. Ubuntu doesn't > >> yet, but netfilter in in the AppArmor task list. Tizen definitely > >> uses it with Smack. The notion that security modules are only used > >> in fringe cases is antiquated. > > I was not talking about LSM in general, I was referring to the > > extended info that Paul mentioned. > > > > If thats indeed going to be used on every distro then skb extensions > > are not suitable for this, it would result in extr akmalloc for every > > skb. > > I am explicitly talking about the use of secmarks. All my > references are uses of secmarks. I'm talking about a void* which would contain LSM specific data; as I said earlier, think of inodes. This LSM specific data would include the existing secmark data as well as network peer security information which would finally (!!!) allow us to handle forwarded traffic and enable a number of other fixes and performance improvements. (The details are a bit beyond this discussion but it basically revolves around us not having to investigate the import the packet headers every time we want to determine the network peer security attributes, we could store the resolved LSM information in the sk_buff.security blob.) -- paul moore www.paul-moore.com ^ permalink raw reply [flat|nested] 14+ messages in thread
* Re: [PATCH RFC 0/9] sk_buff: optimize layout for GRO 2021-07-27 2:51 ` Paul Moore @ 2021-07-28 16:21 ` Paolo Abeni 0 siblings, 0 replies; 14+ messages in thread From: Paolo Abeni @ 2021-07-28 16:21 UTC (permalink / raw) To: Paul Moore, Casey Schaufler Cc: Florian Westphal, netdev, David S. Miller, Jakub Kicinski, Eric Dumazet, linux-security-module, selinux On Mon, 2021-07-26 at 22:51 -0400, Paul Moore wrote: > On Mon, Jul 26, 2021 at 11:13 AM Casey Schaufler <casey@schaufler-ca.com> wrote: > > On 7/25/2021 3:52 PM, Florian Westphal wrote: > > > Casey Schaufler <casey@schaufler-ca.com> wrote: > > > > RedHat and android use SELinux and will want this. Ubuntu doesn't > > > > yet, but netfilter in in the AppArmor task list. Tizen definitely > > > > uses it with Smack. The notion that security modules are only used > > > > in fringe cases is antiquated. > > > I was not talking about LSM in general, I was referring to the > > > extended info that Paul mentioned. > > > > > > If thats indeed going to be used on every distro then skb extensions > > > are not suitable for this, it would result in extr akmalloc for every > > > skb. > > > > I am explicitly talking about the use of secmarks. All my > > references are uses of secmarks. > > I'm talking about a void* which would contain LSM specific data; as I > said earlier, think of inodes. This LSM specific data would include > the existing secmark data as well as network peer security information > which would finally (!!!) allow us to handle forwarded traffic and > enable a number of other fixes and performance improvements. > > (The details are a bit beyond this discussion but it basically > revolves around us not having to investigate the import the packet > headers every time we want to determine the network peer security > attributes, we could store the resolved LSM information in the > sk_buff.security blob.) I've investigated the feasibility of extending the secmark field to long/void*. I think that performance wise it should be doable on top of this series: the amount of allocated memory for sk_buff will not change, nor the amount of memory memseted at skb initialization time. I stumbled upon some uAPIs issues, as CT/nft expose a secmark related field via uAPI, changing that size without breaking esisting user-space looks hard to me. Additionally, even patch 7/9 is problematic, as there are some in kernel users accessing and using the inner_ field regardless skb- >encapsulation. That works while inner_* field are always initializared/zeored, but will break with the mentioned patch. The fix is doable, but large and complex. To keep the scope of this series sane, I'll drop in the next iteration all the problematic patches - that is: no sk_buff layout change at all. If there is interest for such thing, it could still be added incrementally. Cheers, Paolo ^ permalink raw reply [flat|nested] 14+ messages in thread
end of thread, other threads:[~2021-07-28 16:21 UTC | newest] Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2021-07-21 16:44 [PATCH RFC 0/9] sk_buff: optimize layout for GRO Paolo Abeni 2021-07-21 18:15 ` Casey Schaufler 2021-07-22 7:10 ` Paolo Abeni 2021-07-22 16:04 ` Casey Schaufler 2021-07-22 16:57 ` Paolo Abeni 2021-07-22 18:41 ` Paul Moore 2021-07-24 18:51 ` Florian Westphal 2021-07-25 14:57 ` Paul Moore 2021-07-25 16:25 ` Florian Westphal 2021-07-25 21:53 ` Casey Schaufler 2021-07-25 22:52 ` Florian Westphal 2021-07-26 15:13 ` Casey Schaufler 2021-07-27 2:51 ` Paul Moore 2021-07-28 16:21 ` Paolo Abeni
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).