LKML Archive on lore.kernel.org
* [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3]
@ 2007-03-20 19:58 ` David Howells
  2007-03-20 19:59   ` [PATCH 1/5] AF_RXRPC: Add blkcipher accessors for using kernel data directly " David Howells
                     ` (11 more replies)
  0 siblings, 12 replies; 14+ messages in thread
From: David Howells @ 2007-03-20 19:58 UTC (permalink / raw)
  To: davem, netdev, herbert.xu; +Cc: linux-kernel, hch, arjan, dhowells


These patches together supply secure client-side RxRPC connectivity as a Linux
kernel socket family.  Only the transport/session side is supplied - the
presentation side (marshalling the data) is left to the client.  Copies of the
patches can be found here:

	http://people.redhat.com/~dhowells/rxrpc/01-crypto-kernel-buff.diff
	http://people.redhat.com/~dhowells/rxrpc/02-move-skb-generic.diff
	http://people.redhat.com/~dhowells/rxrpc/03-timers.diff
	http://people.redhat.com/~dhowells/rxrpc/04-keys.diff
	http://people.redhat.com/~dhowells/rxrpc/05-af_rxrpc.diff

The userspace access methods make use of the control data passed to/by
sendmsg() and recvmsg().  See the three simple test programs:

	http://people.redhat.com/~dhowells/rxrpc/klog.c
	http://people.redhat.com/~dhowells/rxrpc/rxrpc.c
	http://people.redhat.com/~dhowells/rxrpc/listen.c

I've attached the current in-kernel documentation to this message.

TODO:

 (*) Make fs/afs/ use it and delete the current contents of net/rxrpc/

 (*) Make certain parameters (such as connection timeouts) userspace
     configurable.

 (*) Make userspace utilities use it; librxrpc.

 (*) Userspace documentation.

 (*) KerberosV security.

Changes:

 (*) SOCK_RPC has been removed.  AF_RXRPC sockets now simply ignore the "type"
     argument to socket().

 (*) I've added a facility whereby calls can be made to destinations other than
     the connect() address of a client socket by making use of msg_name in the
     msghdr struct when using sendmsg() to send the first data packet of a
     call.  Indeed, a client socket need not be connected before being used in
     this way.

 (*) I've also added a facility whereby client calls may also be made on
     server sockets, again by using msg_name in the msghdr struct.  In such a
     case, the server's local transport endpoint is used.

David


			    ======================
			    RxRPC NETWORK PROTOCOL
			    ======================

The RxRPC protocol driver provides a reliable two-phase transport on top of UDP
that can be used to perform RxRPC remote operations.  This is done over sockets
of AF_RXRPC family, using sendmsg() and recvmsg() with control data to send and
receive data, aborts and errors.

Contents of this document:

 (*) Overview.

 (*) RxRPC protocol summary.

 (*) AF_RXRPC driver model.

 (*) Security.

 (*) Example client usage.

 (*) Example server usage.


========
OVERVIEW
========

RxRPC is a two-layer protocol.  There is a session layer which provides
reliable virtual connections using UDP over IPv4 (or IPv6) as the transport
layer, but implements a real network protocol; and there's the presentation
layer which renders structured data to binary blobs and back again using XDR
(as does SunRPC):

		+-------------+
		| Application |
		+-------------+
		|     XDR     |		Presentation
		+-------------+
		|    RxRPC    |		Session
		+-------------+
		|     UDP     |		Transport
		+-------------+


AF_RXRPC provides:

 (1) Part of an RxRPC facility for both kernel and userspace applications by
     making the session part of it a Linux network protocol (AF_RXRPC).

 (2) A two-phase protocol.  The client transmits a blob (the request) and then
     receives a blob (the reply), and the server receives the request and then
     transmits the reply.

 (3) Retention of the reusable bits of the transport system set up for one call
     to speed up subsequent calls.

 (4) A secure protocol, using the Linux kernel's key retention facility to
     manage security on the client end.  The server end must of necessity be
     more active in security negotiations.

AF_RXRPC does not provide XDR marshalling/presentation facilities.  That is
left to the application.  AF_RXRPC only deals in blobs.  Even the operation ID
is just the first four bytes of the request blob, and as such is beyond the
kernel's interest.


Sockets of AF_RXRPC family are:

 (1) created as any type, but 0 is recommended;

 (2) provided with a protocol of the type of underlying transport they're going
     to use - currently only PF_INET is supported.


The Andrew File System (AFS) is an example of an application that uses this and
that has both kernel (filesystem) and userspace (utility) components.


======================
RXRPC PROTOCOL SUMMARY
======================

An overview of the RxRPC protocol:

 (*) RxRPC sits on top of another networking protocol (UDP is the only option
     currently), and uses this to provide network transport.  UDP ports, for
     example, provide transport endpoints.

 (*) RxRPC supports multiple virtual "connections" from any given transport
     endpoint, thus allowing the endpoints to be shared, even to the same
     remote endpoint.

 (*) Each connection goes to a particular "service".  A connection may not go
     to multiple services.  A service may be considered the RxRPC equivalent of
     a port number.  AF_RXRPC permits multiple services to share an endpoint.

 (*) Client-originating packets are marked, thus a transport endpoint can be
     shared between client and server connections (connections have a
     direction).

 (*) Up to a billion connections may be supported concurrently between one
     local transport endpoint and one service on one remote endpoint.  An RxRPC
     connection is described by seven numbers:

	Local address	}
	Local port	} Transport (UDP) address
	Remote address	}
	Remote port	}
	Direction
	Connection ID
	Service ID

 (*) Each RxRPC operation is a "call".  A connection may make up to four
     billion calls, but only up to four calls may be in progress on a
     connection at any one time.

 (*) Calls are two-phase and asymmetric: the client sends its request data,
     which the service receives; then the service transmits the reply data
     which the client receives.

 (*) The data blobs are of indefinite size; the end of a phase is marked with a
     flag in the packet.  The number of packets of data making up one blob may
     not exceed 4 billion, however, as this would cause the sequence number to
     wrap.

 (*) The first four bytes of the request data are the service operation ID.

 (*) Security is negotiated on a per-connection basis.  The connection is
     initiated by the first data packet on it arriving.  If security is
     requested, the server then issues a "challenge" and then the client
     replies with a "response".  If the response is successful, the security is
     set for the lifetime of that connection, and all subsequent calls made
     upon it use that same security.  In the event that the server lets a
     connection lapse before the client, the security will be renegotiated if
     the client uses the connection again.

 (*) Calls use ACK packets to handle reliability.  Data packets are also
     explicitly sequenced per call.

 (*) There are two types of positive acknowledgement: hard-ACKs and soft-ACKs.
     A hard-ACK indicates to the far side that all the data received to a point
     has been received and processed; a soft-ACK indicates that the data has
     been received but may yet be discarded and re-requested.  The sender may
     not discard any transmittable packets until they've been hard-ACK'd.

 (*) Reception of a reply data packet implicitly hard-ACK's all the data
     packets that make up the request.

 (*) A call is complete when the request has been sent, the reply has been
     received and the final hard-ACK on the last packet of the reply has
     reached the server.

 (*) A call may be aborted by either end at any time up to its completion.


=====================
AF_RXRPC DRIVER MODEL
=====================

About the AF_RXRPC driver:

 (*) The AF_RXRPC protocol transparently uses internal sockets of the transport
     protocol to represent transport endpoints.

 (*) AF_RXRPC sockets map onto RxRPC connection bundles.  Actual RxRPC
     connections are handled transparently.  One client socket may be used to
     make multiple simultaneous calls to the same service.  One server socket
     may handle calls from many clients.

 (*) Additional parallel client connections will be initiated to support extra
     concurrent calls, up to a tunable limit.

 (*) Each connection is retained for a certain amount of time [tunable] after
     the last call currently using it has completed in case a new call is made
     that could reuse it.

 (*) Each internal UDP socket is retained for a certain amount of time
     [tunable] after the last connection using it is discarded, in case a new
     connection is made that could use it.

 (*) A client-side connection is only shared between calls if they have
     the same key struct describing their security (and assuming the calls
     would otherwise share the connection).  Non-secured calls would also be
     able to share connections with each other.

 (*) A server-side connection is shared if the client says it is.

 (*) ACK'ing is handled by the protocol driver automatically, including ping
     replying.

 (*) SO_KEEPALIVE automatically pings the other side to keep the connection
     alive [TODO].

 (*) If an ICMP error is received, all calls affected by that error will be
     aborted with an appropriate network error passed through recvmsg().


Interaction with the user of the RxRPC socket:

 (*) A socket is made into a server socket by binding an address with a
     non-zero service ID.

 (*) In the client, sending a request is achieved with one or more sendmsgs,
     followed by the reply being received with one or more recvmsgs.

 (*) The first sendmsg for a request to be sent from a client contains a tag to
     be used in all other sendmsgs or recvmsgs associated with that call.  The
     tag is carried in the control data.

 (*) connect() is used to supply a default destination address for a client
     socket.  This may be overridden by supplying an alternate address to the
     first sendmsg() of a call (struct msghdr::msg_name).

 (*) If connect() is called on an unbound client, a random local port will be
     bound before the operation takes place.

 (*) A server socket may also be used to make client calls.  To do this, the
     first sendmsg() of the call must specify the target address.  The server's
     transport endpoint is used to send the packets.

 (*) Once the client has received the last message associated with a call, the
     tag is guaranteed not to be seen again, and so it can be used to pin
     client resources.  A new call can then be initiated with the same tag
     without fear of interference.

 (*) In the server, a request is received with one or more recvmsgs, then the
     reply is transmitted with one or more sendmsgs, and then the final ACK is
     received with a last recvmsg.

 (*) When sending data, sendmsg is given MSG_MORE if there's more data to come.

 (*) An abort may be issued by adding a control message to the control data.
     Issuing an abort terminates the kernel's use of that call's tag.

 (*) Aborts, busy notifications and challenge packets are collected by recvmsg
     with a control data message to indicate the context.  Receiving an abort
     or a busy message terminates the kernel's use of that call's tag.

 (*) The control data part of the msghdr struct is used for a number of things:

     (*) The tag of the intended or affected call.

     (*) Sending or receiving errors, aborts and busy notifications.

     (*) Notifications of incoming calls.

     (*) Sending debug requests and receiving debug replies [TODO].

 (*) When the kernel has received and set up an incoming call, it sends a
     message to the server application to let it know there's a new call
     awaiting its acceptance [recvmsg reports a special control message].  The
     server application then uses sendmsg to assign a tag to the new call.
     Once that is done, the first part of the request data will be delivered by
     recvmsg.

 (*) The server application has to provide the server socket with a keyring of
     secret keys corresponding to the security types it permits.  When a secure
     connection is being set up, the kernel looks up the appropriate secret key
     in the keyring and then sends a challenge packet to the client and
     receives a response packet.  The kernel then checks the authorisation of
     the packet and either aborts the connection or sets up the security.

 (*) The name of the key a client will use to secure its communications is
     nominated by a socket option.


========
SECURITY
========

Currently, only the kerberos 4 equivalent protocol has been implemented
(security index 2 - rxkad).  This requires the rxkad module to be loaded and,
on the client, tickets of the appropriate type to be obtained from the AFS
kaserver or the kerberos server and installed as "rxrpc" type keys.  This is
normally done using the klog program.  A simple example klog program can be
found at:

	http://people.redhat.com/~dhowells/rxrpc/klog.c

The payload provided to add_key() on the client should be of the following
form:

	struct rxrpc_key_sec2_v1 {
		uint16_t	security_index;	/* 2 */
		uint16_t	ticket_length;	/* length of ticket[] */
		uint32_t	expiry;		/* time at which expires */
		uint8_t		kvno;		/* key version number */
		uint8_t		__pad[3];
		uint8_t		session_key[8];	/* DES session key */
		uint8_t		ticket[0];	/* the encrypted ticket */
	};

Where the ticket blob is just appended to the above structure.
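
A minimal sketch of building and installing such a payload (the key
description simply reuses the name from the client example below; ticket,
ticket_len, session_key, kvno and ticket_expiry are assumed to have come from
the kaserver exchange, and add_key() is the libkeyutils/syscall wrapper):

	size_t plen = sizeof(struct rxrpc_key_sec2_v1) + ticket_len;
	struct rxrpc_key_sec2_v1 *payload = calloc(1, plen);

	payload->security_index = 2;		/* rxkad */
	payload->ticket_length  = ticket_len;
	payload->expiry         = ticket_expiry;
	payload->kvno           = kvno;
	memcpy(payload->session_key, session_key, 8);
	memcpy(payload->ticket, ticket, ticket_len);

	add_key("rxrpc", "AFS:cambridge.redhat.com", payload, plen,
		KEY_SPEC_SESSION_KEYRING);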


For the server, keys of type "rxrpc_s" must be made available to the server.
They have a description of "<serviceID>:<securityIndex>" (eg: "52:2" for an
rxkad key for the AFS VL service).  When such a key is created, it should be
given the server's secret key as the instantiation data (see the example
below).

	add_key("rxrpc_s", "52:2", secret_key, 8, keyring);

A keyring is passed to the server socket by naming it in a sockopt.  The server
socket then looks the server secret keys up in this keyring when secure
incoming connections are made.  This can be seen in an example program that can
be found at:

	http://people.redhat.com/~dhowells/rxrpc/listen.c


====================
EXAMPLE CLIENT USAGE
====================

A client would issue an operation by:

 (1) An RxRPC socket is set up by:

	client = socket(AF_RXRPC, 0, PF_INET);

     Where the third parameter indicates the protocol family of the transport
     socket used - usually IPv4 but it can also be IPv6 [TODO].

 (2) A local address can optionally be bound:

	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= 0,  /* we're a client */
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport.sin_family	= AF_INET,
		.transport.sin_port	= htons(7000), /* AFS callback */
		.transport.sin_address	= 0,  /* all local interfaces */
	};
	bind(client, &srx, sizeof(srx));

     This specifies the local UDP port to be used.  If not given, a random
     non-privileged port will be used.  A UDP port may be shared between
     several unrelated RxRPC sockets.  Security is handled per RxRPC virtual
     connection.

 (3) The security is set:

	const char *key = "AFS:cambridge.redhat.com";
	setsockopt(client, SOL_RXRPC, RXRPC_SECURITY_KEY, key, strlen(key));

     This issues a request_key() to get the key representing the security
     context.  The minimum security level can be set:

	unsigned int sec = RXRPC_SECURITY_ENCRYPTED;
	setsockopt(client, SOL_RXRPC, RXRPC_MIN_SECURITY_LEVEL,
		   &sec, sizeof(sec));

 (4) The server to be contacted can then be specified (alternatively this can
     be done through sendmsg):

	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= VL_SERVICE_ID,
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport.sin_family	= AF_INET,
		.transport.sin_port	= htons(7005), /* AFS volume manager */
		.transport.sin_address	= ...,
	};
	connect(client, &srx, sizeof(srx));

 (5) The request is then sent:

	sendmsg(client, msg, 0);

 (6) And the reply received:

	recvmsg(client, msg, 0);

     If an abort or error occurred, this will be returned in the control data
     buffer.
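
As a rough illustration of steps (5) and (6) (a sketch only: request,
request_len, reply and reply_size are assumed buffers, and the cmsg level and
type constants are assumed to match the definitions shipped with these patches
- see the klog.c and rxrpc.c test programs for working code):

	unsigned long call_id = 1;	/* the tag for this call */
	struct msghdr msg;
	struct iovec iov;
	struct cmsghdr *cmsg;
	union {				/* keep the control buffer aligned */
		struct cmsghdr hdr;
		unsigned char buf[CMSG_SPACE(sizeof(call_id))];
	} control;

	iov.iov_base = request;		/* marshalled request blob */
	iov.iov_len = request_len;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

	sendmsg(client, &msg, 0);	/* no MSG_MORE: request complete */

	/* Each part of the reply carries the same RXRPC_USER_CALL_ID control
	 * message; aborts and errors are also reported as control data. */
	iov.iov_base = reply;
	iov.iov_len = reply_size;
	msg.msg_controllen = sizeof(control.buf);
	recvmsg(client, &msg, 0);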


====================
EXAMPLE SERVER USAGE
====================

A server would be set up to accept operations in the following manner:

 (1) An RxRPC socket is created by:

	server = socket(AF_RXRPC, 0, PF_INET);

     Where the third parameter indicates the address type of the transport
     socket used - usually IPv4.

 (2) Security is set up if desired by giving the socket a keyring with server
     secret keys in it:

	keyring = add_key("keyring", "AFSkeys", NULL, 0,
			  KEY_SPEC_PROCESS_KEYRING);

	const char secret_key[8] = {
		0xa7, 0x83, 0x8a, 0xcb, 0xc7, 0x83, 0xec, 0x94 };
	add_key("rxrpc_s", "52:2", secret_key, 8, keyring);

	setsockopt(server, SOL_RXRPC, RXRPC_SECURITY_KEYRING, "AFSkeys", 7);

     The keyring can be manipulated after it has been given to the socket. This
     permits the server to add more keys, replace keys, etc. whilst it is live.

 (3) A local address must then be bound:

	struct sockaddr_rxrpc srx = {
		.srx_family	= AF_RXRPC,
		.srx_service	= VL_SERVICE_ID, /* RxRPC service ID */
		.transport_type	= SOCK_DGRAM,	/* type of transport socket */
		.transport.sin_family	= AF_INET,
		.transport.sin_port	= htons(7000), /* AFS callback */
		.transport.sin_address	= 0,  /* all local interfaces */
	};
	bind(server, &srx, sizeof(srx));

 (4) The server is then set to listen out for incoming calls:

	listen(server, 100);

 (5) The kernel notifies the server of pending incoming connections by sending
     it a message for each.  This is received with recvmsg() on the server
     socket.  It has no data, and has a single dataless control message
     attached:

	RXRPC_NEW_CALL

     The address that can be passed back by recvmsg() at this point should be
     ignored since the call for which the message was posted may have gone by
     the time it is accepted - in which case the first call still on the queue
     will be accepted.

 (6) The server then accepts the new call by issuing a sendmsg() with two
     pieces of control data and no actual data (see the sketch at the end of
     this section):

	RXRPC_ACCEPT		- indicate connection acceptance
	RXRPC_USER_CALL_ID	- specify user ID for this call

 (7) The first request data packet will then be posted to the server socket for
     recvmsg() to pick up.  At that point, the RxRPC address for the call can
     be read from the address fields in the msghdr struct.

     Subsequent request data packets will be posted to the server socket for
     recvmsg() to collect as they arrive.  The last packet in the request will
     be posted with MSG_EOR set in msghdr::msg_flags.

     All data packets will be delivered with the following control message
     attached:

	RXRPC_USER_CALL_ID	- specifies the user ID for this call

 (8) The reply data should then be posted to the server socket using a series
     of sendmsg() calls, each with the following control messages attached:

	RXRPC_USER_CALL_ID	- specifies the user ID for this call

     MSG_MORE should be set in msghdr::msg_flags on all but the last call.

 (9) The final ACK from the client will be posted for retrieval by recvmsg()
     when it is received.  It will take the form of a dataless message with two
     control messages attached:

	RXRPC_USER_CALL_ID	- specifies the user ID for this call
	RXRPC_ACK		- indicates final ACK (no data)

(10) Up to the point the final packet of reply data is sent, the call can be
     aborted by calling sendmsg() with a dataless message with the following
     control messages attached:

	RXRPC_USER_CALL_ID	- specifies the user ID for this call
	RXRPC_ABORT		- indicates abort code (4 byte data)

Note that all the communications for a particular service take place through
the one server socket, using control messages on sendmsg() and recvmsg() to
determine the call affected.
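
A rough sketch of the accept sequence in steps (5) and (6) (next_call_id is an
assumed application-owned counter, and the cmsg layout is assumed to match the
definitions shipped with these patches - see the listen.c test program for
working code):

	struct msghdr msg;
	struct cmsghdr *cmsg;
	unsigned long call_id;
	union {				/* keep the control buffer aligned */
		struct cmsghdr hdr;
		unsigned char buf[CMSG_SPACE(0) +
				  CMSG_SPACE(sizeof(unsigned long))];
	} control;

	/* Wait for an RXRPC_NEW_CALL notification (no data). */
	memset(&msg, 0, sizeof(msg));
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);
	recvmsg(server, &msg, 0);
	/* ... check for a SOL_RXRPC/RXRPC_NEW_CALL control message ... */

	/* Accept the call, assigning the tag by which it will be known. */
	call_id = next_call_id++;

	memset(&msg, 0, sizeof(msg));
	msg.msg_control = control.buf;
	msg.msg_controllen = sizeof(control.buf);

	cmsg = CMSG_FIRSTHDR(&msg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_ACCEPT;
	cmsg->cmsg_len = CMSG_LEN(0);

	cmsg = CMSG_NXTHDR(&msg, cmsg);
	cmsg->cmsg_level = SOL_RXRPC;
	cmsg->cmsg_type = RXRPC_USER_CALL_ID;
	cmsg->cmsg_len = CMSG_LEN(sizeof(call_id));
	memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

	msg.msg_controllen = CMSG_SPACE(0) + CMSG_SPACE(sizeof(call_id));
	sendmsg(server, &msg, 0);	/* no data: just the acceptance */

The request data then arrives tagged with the same RXRPC_USER_CALL_ID, as
described in step (7).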

^ permalink raw reply	[flat|nested] 14+ messages in thread

* [PATCH 1/5] AF_RXRPC: Add blkcipher accessors for using kernel data directly [try #3]
  2007-03-20 19:58 ` David Howells
@ 2007-03-20 19:59   ` David Howells
  2007-03-20 19:59   ` [PATCH 2/5] AF_RXRPC: Move generic skbuff stuff from XFRM code to generic code " David Howells
                     ` (10 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-20 19:59 UTC (permalink / raw)
  To: davem, netdev, herbert.xu; +Cc: linux-kernel, hch, arjan, dhowells

Add blkcipher accessors for using kernel data directly without the use of
scatter lists.

Also add a CRYPTO_ALG_DMA algorithm capability flag to permit or deny the use
of DMA and hardware accelerators.  A hardware accelerator may not be used to
access an arbitrary piece of kernel memory, as that memory may not be in a
DMA'able region.  Only software algorithms may do that.

If kernel data is going to be accessed directly, then CRYPTO_ALG_DMA must be
passed in the mask of crypto_alloc_blkcipher(), for instance, but not in the
type.

This is used by AF_RXRPC to do quick encryptions, where the size of the data
being encrypted or decrypted is 8 bytes or, occasionally, 16 bytes (ie: one or
two chunks only), and since these data are generally on the stack they may be
split over two pages.  Because they're so small, and because they may be
misaligned, setting up a scatter-gather list is overly expensive.  It is very
unlikely that a hardware FCrypt PCBC engine will be encountered (there is not,
as far as I know, any such thing), and even if one is encountered, the
setup/teardown costs for such small transactions will almost certainly be
prohibitive.

Encrypting and decrypting whole packets, on the other hand, is done through the
scatter-gather list interface, as the amount of data is large enough that the
expense of the virtual address to page calculations is small by comparison.
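
For illustration, a caller might use the new accessors roughly as follows (a
sketch only: it assumes the "pcbc(fcrypt)" software cipher that rxkad wants,
session_key is an assumed 8-byte key and the IV choice is illustrative; error
handling is omitted):

	struct crypto_blkcipher *tfm;
	struct blkcipher_desc desc;
	u8 iv[8], buf[8];

	/* CRYPTO_ALG_DMA goes in the mask, not the type, so that only
	 * implementations able to touch arbitrary kernel memory match. */
	tfm = crypto_alloc_blkcipher("pcbc(fcrypt)", 0,
				     CRYPTO_ALG_ASYNC | CRYPTO_ALG_DMA);
	crypto_blkcipher_setkey(tfm, session_key, 8);

	memcpy(iv, session_key, 8);
	desc.tfm = tfm;
	desc.flags = 0;
	desc.info = iv;

	crypto_blkcipher_encrypt_kernel_iv(&desc, buf, buf, sizeof(buf));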

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 crypto/blkcipher.c     |    2 +
 crypto/pcbc.c          |   62 +++++++++++++++++++++++++
 include/linux/crypto.h |  118 ++++++++++++++++++++++++++++++++++++++++++++++++
 3 files changed, 181 insertions(+), 1 deletions(-)

diff --git a/crypto/blkcipher.c b/crypto/blkcipher.c
index b5befe8..4498b2d 100644
--- a/crypto/blkcipher.c
+++ b/crypto/blkcipher.c
@@ -376,6 +376,8 @@ static int crypto_init_blkcipher_ops(struct crypto_tfm *tfm, u32 type, u32 mask)
 	crt->setkey = setkey;
 	crt->encrypt = alg->encrypt;
 	crt->decrypt = alg->decrypt;
+	crt->encrypt_kernel = alg->encrypt_kernel;
+	crt->decrypt_kernel = alg->decrypt_kernel;
 
 	addr = (unsigned long)crypto_tfm_ctx(tfm);
 	addr = ALIGN(addr, align);
diff --git a/crypto/pcbc.c b/crypto/pcbc.c
index 5174d7f..fa76111 100644
--- a/crypto/pcbc.c
+++ b/crypto/pcbc.c
@@ -126,6 +126,36 @@ static int crypto_pcbc_encrypt(struct blkcipher_desc *desc,
 	return err;
 }
 
+static int crypto_pcbc_encrypt_kernel(struct blkcipher_desc *desc,
+				      u8 *dst, const u8 *src,
+				      unsigned int nbytes)
+{
+	struct blkcipher_walk walk;
+	struct crypto_blkcipher *tfm = desc->tfm;
+	struct crypto_pcbc_ctx *ctx = crypto_blkcipher_ctx(tfm);
+	struct crypto_cipher *child = ctx->child;
+	void (*xor)(u8 *, const u8 *, unsigned int bs) = ctx->xor;
+
+	BUG_ON(crypto_tfm_alg_capabilities(crypto_cipher_tfm(child)) &
+	       CRYPTO_ALG_DMA);
+
+	if (nbytes == 0)
+		return 0;
+
+	memset(&walk, 0, sizeof(walk));
+	walk.src.virt.addr = (u8 *) src;
+	walk.dst.virt.addr = (u8 *) dst;
+	walk.nbytes = nbytes;
+	walk.total = nbytes;
+	walk.iv = desc->info;
+
+	if (walk.src.virt.addr == walk.dst.virt.addr)
+		nbytes = crypto_pcbc_encrypt_inplace(desc, &walk, child, xor);
+	else
+		nbytes = crypto_pcbc_encrypt_segment(desc, &walk, child, xor);
+	return 0;
+}
+
 static int crypto_pcbc_decrypt_segment(struct blkcipher_desc *desc,
 				       struct blkcipher_walk *walk,
 				       struct crypto_cipher *tfm,
@@ -211,6 +241,36 @@ static int crypto_pcbc_decrypt(struct blkcipher_desc *desc,
 	return err;
 }
 
+static int crypto_pcbc_decrypt_kernel(struct blkcipher_desc *desc,
+				      u8 *dst, const u8 *src,
+				      unsigned int nbytes)
+{
+	struct blkcipher_walk walk;
+	struct crypto_blkcipher *tfm = desc->tfm;
+	struct crypto_pcbc_ctx *ctx = crypto_blkcipher_ctx(tfm);
+	struct crypto_cipher *child = ctx->child;
+	void (*xor)(u8 *, const u8 *, unsigned int bs) = ctx->xor;
+
+	BUG_ON(crypto_tfm_alg_capabilities(crypto_cipher_tfm(child)) &
+		CRYPTO_ALG_DMA);
+
+	if (nbytes == 0)
+		return 0;
+
+	memset(&walk, 0, sizeof(walk));
+	walk.src.virt.addr = (u8 *) src;
+	walk.dst.virt.addr = (u8 *) dst;
+	walk.nbytes = nbytes;
+	walk.total = nbytes;
+	walk.iv = desc->info;
+
+	if (walk.src.virt.addr == walk.dst.virt.addr)
+		nbytes = crypto_pcbc_decrypt_inplace(desc, &walk, child, xor);
+	else
+		nbytes = crypto_pcbc_decrypt_segment(desc, &walk, child, xor);
+	return 0;
+}
+
 static void xor_byte(u8 *a, const u8 *b, unsigned int bs)
 {
 	do {
@@ -313,6 +373,8 @@ static struct crypto_instance *crypto_pcbc_alloc(void *param, unsigned int len)
 	inst->alg.cra_blkcipher.setkey = crypto_pcbc_setkey;
 	inst->alg.cra_blkcipher.encrypt = crypto_pcbc_encrypt;
 	inst->alg.cra_blkcipher.decrypt = crypto_pcbc_decrypt;
+	inst->alg.cra_blkcipher.encrypt_kernel = crypto_pcbc_encrypt_kernel;
+	inst->alg.cra_blkcipher.decrypt_kernel = crypto_pcbc_decrypt_kernel;
 
 out_put_alg:
 	crypto_mod_put(alg);
diff --git a/include/linux/crypto.h b/include/linux/crypto.h
index 779aa78..17e786a 100644
--- a/include/linux/crypto.h
+++ b/include/linux/crypto.h
@@ -40,7 +40,10 @@
 #define CRYPTO_ALG_LARVAL		0x00000010
 #define CRYPTO_ALG_DEAD			0x00000020
 #define CRYPTO_ALG_DYING		0x00000040
-#define CRYPTO_ALG_ASYNC		0x00000080
+
+#define CRYPTO_ALG_CAP_MASK		0x00000180	/* capabilities mask */
+#define CRYPTO_ALG_ASYNC		0x00000080	/* capable of async operation */
+#define CRYPTO_ALG_DMA			0x00000100	/* capable of using of DMA */
 
 /*
  * Set this bit if and only if the algorithm requires another algorithm of
@@ -125,6 +128,10 @@ struct blkcipher_alg {
 	int (*decrypt)(struct blkcipher_desc *desc,
 		       struct scatterlist *dst, struct scatterlist *src,
 		       unsigned int nbytes);
+	int (*encrypt_kernel)(struct blkcipher_desc *desc, u8 *dst,
+			      const u8 *src, unsigned int nbytes);
+	int (*decrypt_kernel)(struct blkcipher_desc *desc, u8 *dst,
+			      const u8 *src, unsigned int nbytes);
 
 	unsigned int min_keysize;
 	unsigned int max_keysize;
@@ -240,6 +247,10 @@ struct blkcipher_tfm {
 		       struct scatterlist *src, unsigned int nbytes);
 	int (*decrypt)(struct blkcipher_desc *desc, struct scatterlist *dst,
 		       struct scatterlist *src, unsigned int nbytes);
+	int (*encrypt_kernel)(struct blkcipher_desc *desc, u8 *dst,
+			      const u8 *src, unsigned int nbytes);
+	int (*decrypt_kernel)(struct blkcipher_desc *desc, u8 *dst,
+			      const u8 *src, unsigned int nbytes);
 };
 
 struct cipher_tfm {
@@ -372,6 +383,11 @@ static inline u32 crypto_tfm_alg_type(struct crypto_tfm *tfm)
 	return tfm->__crt_alg->cra_flags & CRYPTO_ALG_TYPE_MASK;
 }
 
+static inline u32 crypto_tfm_alg_capabilities(struct crypto_tfm *tfm)
+{
+	return tfm->__crt_alg->cra_flags & CRYPTO_ALG_CAP_MASK;
+}
+
 static inline unsigned int crypto_tfm_alg_blocksize(struct crypto_tfm *tfm)
 {
 	return tfm->__crt_alg->cra_blocksize;
@@ -529,6 +545,56 @@ static inline int crypto_blkcipher_encrypt_iv(struct blkcipher_desc *desc,
 	return crypto_blkcipher_crt(desc->tfm)->encrypt(desc, dst, src, nbytes);
 }
 
+/**
+ * crypto_blkcipher_encrypt_kernel - Encrypt flat kernel buffer
+ * - @desc - block cipher descriptor indicating the encryption to apply
+ * - @dst - output buffer
+ * - @src - input data
+ * - @nbytes - amount of data
+ *
+ * Encrypt data contained in a flat kernel buffer into another flat kernel
+ * buffer.  This avoids the need to spend resources to set up a scatterlist for
+ * a very small amount of data.  The encryption begins by selecting the
+ * initialisation vector of the actual block cipher as the initialisation
+ * vector to use and update.  This leaves the IV in the cipher altered.
+ *
+ * This should not be used with a cipher that's marked CRYPTO_ALG_DMA as the
+ * DMA process requires a scatterlist to locate the physical pages on which the
+ * data resides.
+ */
+static inline void crypto_blkcipher_encrypt_kernel(struct blkcipher_desc *desc,
+						   u8 *dst, const u8 *src,
+						   unsigned int nbytes)
+{
+	desc->info = crypto_blkcipher_crt(desc->tfm)->iv;
+	crypto_blkcipher_crt(desc->tfm)->encrypt_kernel(desc, dst, src,
+							nbytes);
+}
+
+/**
+ * crypto_blkcipher_encrypt_kernel_iv - Encrypt flat kernel buffer
+ * - @desc - block cipher descriptor indicating the encryption to apply
+ * - @dst - output buffer
+ * - @src - input data
+ * - @nbytes - amount of data
+ *
+ * Encrypt data contained in a flat kernel buffer into another flat kernel
+ * buffer.  This avoids the need to spend resources to set up a scatterlist for
+ * a very small amount of data.  The encryption proceeds from the
+ * initialisation vector held within the block cipher descriptor.
+ *
+ * This should not be used with a cipher that's marked CRYPTO_ALG_DMA as the
+ * DMA process requires a scatterlist to locate the physical pages on which the
+ * data resides.
+ */
+static inline void crypto_blkcipher_encrypt_kernel_iv(
+	struct blkcipher_desc *desc, u8 *dst, const u8 *src,
+	unsigned int nbytes)
+{
+	crypto_blkcipher_crt(desc->tfm)->encrypt_kernel(desc, dst, src,
+							nbytes);
+}
+
 static inline int crypto_blkcipher_decrypt(struct blkcipher_desc *desc,
 					   struct scatterlist *dst,
 					   struct scatterlist *src,
@@ -546,6 +612,56 @@ static inline int crypto_blkcipher_decrypt_iv(struct blkcipher_desc *desc,
 	return crypto_blkcipher_crt(desc->tfm)->decrypt(desc, dst, src, nbytes);
 }
 
+/**
+ * crypto_blkcipher_decrypt_kernel - Decrypt flat kernel buffer
+ * - @desc - block cipher descriptor indicating the decryption to apply
+ * - @dst - output buffer
+ * - @src - input data
+ * - @nbytes - amount of data
+ *
+ * Decrypt data contained in a flat kernel buffer into another flat kernel
+ * buffer.  This avoids the need to spend resources to set up a scatterlist for
+ * a very small amount of data.  The decryption begins by selecting the
+ * initialisation vector of the actual block cipher as the initialisation
+ * vector to use and update.  This leaves the IV in the cipher altered.
+ *
+ * This should not be used with a cipher that's marked CRYPTO_ALG_DMA as the
+ * DMA process requires a scatterlist to locate the physical pages on which the
+ * data resides.
+ */
+static inline void crypto_blkcipher_decrypt_kernel(struct blkcipher_desc *desc,
+						   u8 *dst, const u8 *src,
+						   unsigned int nbytes)
+{
+	desc->info = crypto_blkcipher_crt(desc->tfm)->iv;
+	crypto_blkcipher_crt(desc->tfm)->decrypt_kernel(desc, dst, src,
+							nbytes);
+}
+
+/**
+ * crypto_blkcipher_decrypt_kernel_iv - Decrypt flat kernel buffer
+ * - @desc - block cipher descriptor indicating the decryption to apply
+ * - @dst - output buffer
+ * - @src - input data
+ * - @nbytes - amount of data
+ *
+ * Decrypt data contained in a flat kernel buffer into another flat kernel
+ * buffer.  This avoids the need to spend resources to set up a scatterlist for
+ * a very small amount of data.  The decryption proceeds from the
+ * initialisation vector held within the block cipher descriptor.
+ *
+ * This should not be used with a cipher that's marked CRYPTO_ALG_DMA as the
+ * DMA process requires a scatterlist to locate the physical pages on which the
+ * data resides.
+ */
+static inline void crypto_blkcipher_decrypt_kernel_iv(
+	struct blkcipher_desc *desc, u8 *dst, const u8 *src,
+	unsigned int nbytes)
+{
+	crypto_blkcipher_crt(desc->tfm)->decrypt_kernel(desc, dst, src,
+							nbytes);
+}
+
 static inline void crypto_blkcipher_set_iv(struct crypto_blkcipher *tfm,
 					   const u8 *src, unsigned int len)
 {


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 2/5] AF_RXRPC: Move generic skbuff stuff from XFRM code to generic code [try #3]
  2007-03-20 19:58 ` David Howells
  2007-03-20 19:59   ` [PATCH 1/5] AF_RXRPC: Add blkcipher accessors for using kernel data directly " David Howells
@ 2007-03-20 19:59   ` David Howells
  2007-03-20 19:59   ` [PATCH 3/5] AF_RXRPC: Make it possible to merely try to cancel timers and delayed work " David Howells
                     ` (9 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-20 19:59 UTC (permalink / raw)
  To: davem, netdev, herbert.xu; +Cc: linux-kernel, hch, arjan, dhowells

Move generic skbuff stuff from XFRM code to generic code so that AF_RXRPC can
use it too.

The kdoc comments I've attached to the functions need to be checked by whoever
wrote these functions, as I had to make some guesses about how they work.
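
For illustration, the usual calling pattern (a sketch of an assumed caller,
not from the patch; the fixed bound on the scatterlist is an assumption) looks
like this:

	static int encrypt_skb_payload(struct crypto_blkcipher *tfm,
				       struct sk_buff *skb)
	{
		struct scatterlist sg[16];	/* bound assumed for the sketch */
		struct blkcipher_desc desc = { .tfm = tfm };
		struct sk_buff *trailer;
		int nsg;

		/* Make the data writable and count the sg entries needed. */
		nsg = skb_cow_data(skb, 0, &trailer);
		if (nsg < 0 || nsg > ARRAY_SIZE(sg))
			return -ENOMEM;

		skb_to_sgvec(skb, sg, 0, skb->len);
		return crypto_blkcipher_encrypt(&desc, sg, sg, skb->len);
	}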

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 include/linux/skbuff.h |    6 ++
 include/net/esp.h      |    2 -
 net/core/skbuff.c      |  188 ++++++++++++++++++++++++++++++++++++++++++++++++
 net/xfrm/xfrm_algo.c   |  169 -------------------------------------------
 4 files changed, 194 insertions(+), 171 deletions(-)

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 4ff3940..9e70270 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -83,6 +83,7 @@
  */
 
 struct net_device;
+struct scatterlist;
 
 #ifdef CONFIG_NETFILTER
 struct nf_conntrack {
@@ -363,6 +364,11 @@ extern struct sk_buff *skb_realloc_headroom(struct sk_buff *skb,
 extern struct sk_buff *skb_copy_expand(const struct sk_buff *skb,
 				       int newheadroom, int newtailroom,
 				       gfp_t priority);
+extern int	       skb_to_sgvec(struct sk_buff *skb,
+				    struct scatterlist *sg, int offset,
+				    int len);
+extern int	       skb_cow_data(struct sk_buff *skb, int tailbits,
+				    struct sk_buff **trailer);
 extern int	       skb_pad(struct sk_buff *skb, int pad);
 #define dev_kfree_skb(a)	kfree_skb(a)
 extern void	      skb_over_panic(struct sk_buff *skb, int len,
diff --git a/include/net/esp.h b/include/net/esp.h
index 713d039..d05d8d2 100644
--- a/include/net/esp.h
+++ b/include/net/esp.h
@@ -40,8 +40,6 @@ struct esp_data
 	} auth;
 };
 
-extern int skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len);
-extern int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer);
 extern void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len);
 
 static inline int esp_mac_digest(struct esp_data *esp, struct sk_buff *skb,
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 702fa8f..f3ed31b 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -55,6 +55,7 @@
 #include <linux/cache.h>
 #include <linux/rtnetlink.h>
 #include <linux/init.h>
+#include <linux/scatterlist.h>
 
 #include <net/protocol.h>
 #include <net/dst.h>
@@ -2060,6 +2061,190 @@ void __init skb_init(void)
 						NULL, NULL);
 }
 
+/**
+ *	skb_to_sgvec - Fill a scatter-gather list from a socket buffer
+ *	@skb: Socket buffer containing the buffers to be mapped
+ *	@sg: The scatter-gather list to map into
+ *	@offset: The offset into the buffer's contents to start mapping
+ *	@len: Length of buffer space to be mapped
+ *
+ *	Fill the specified scatter-gather list with mappings/pointers into a
+ *	region of the buffer space attached to a socket buffer.
+ */
+int
+skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
+{
+	int start = skb_headlen(skb);
+	int i, copy = start - offset;
+	int elt = 0;
+
+	if (copy > 0) {
+		if (copy > len)
+			copy = len;
+		sg[elt].page = virt_to_page(skb->data + offset);
+		sg[elt].offset = (unsigned long)(skb->data + offset) % PAGE_SIZE;
+		sg[elt].length = copy;
+		elt++;
+		if ((len -= copy) == 0)
+			return elt;
+		offset += copy;
+	}
+
+	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
+		int end;
+
+		BUG_TRAP(start <= offset + len);
+
+		end = start + skb_shinfo(skb)->frags[i].size;
+		if ((copy = end - offset) > 0) {
+			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
+
+			if (copy > len)
+				copy = len;
+			sg[elt].page = frag->page;
+			sg[elt].offset = frag->page_offset+offset-start;
+			sg[elt].length = copy;
+			elt++;
+			if (!(len -= copy))
+				return elt;
+			offset += copy;
+		}
+		start = end;
+	}
+
+	if (skb_shinfo(skb)->frag_list) {
+		struct sk_buff *list = skb_shinfo(skb)->frag_list;
+
+		for (; list; list = list->next) {
+			int end;
+
+			BUG_TRAP(start <= offset + len);
+
+			end = start + list->len;
+			if ((copy = end - offset) > 0) {
+				if (copy > len)
+					copy = len;
+				elt += skb_to_sgvec(list, sg+elt, offset - start, copy);
+				if ((len -= copy) == 0)
+					return elt;
+				offset += copy;
+			}
+			start = end;
+		}
+	}
+	BUG_ON(len);
+	return elt;
+}
+
+/**
+ *	skb_cow_data - Check that a socket buffer's data buffers are writable
+ *	@skb: The socket buffer to check.
+ *	@tailbits: Amount of trailing space to be added
+ *	@trailer: Returned pointer to the skb where the @tailbits space begins
+ *
+ *	Make sure that the data buffers attached to a socket buffer are
+ *	writable. If they are not, private copies are made of the data buffers
+ *	and the socket buffer is set to use these instead.
+ *
+ *	If @tailbits is given, make sure that there is space to write @tailbits
+ *	bytes of data beyond current end of socket buffer.  @trailer will be
+ *	set to point to the skb in which this space begins.
+ *
+ *	The number of scatterlist elements required to completely map the
+ *	COW'd and extended socket buffer will be returned.
+ */
+int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
+{
+	int copyflag;
+	int elt;
+	struct sk_buff *skb1, **skb_p;
+
+	/* If skb is cloned or its head is paged, reallocate
+	 * head pulling out all the pages (pages are considered not writable
+	 * at the moment even if they are anonymous).
+	 */
+	if ((skb_cloned(skb) || skb_shinfo(skb)->nr_frags) &&
+	    __pskb_pull_tail(skb, skb_pagelen(skb)-skb_headlen(skb)) == NULL)
+		return -ENOMEM;
+
+	/* Easy case. Most of packets will go this way. */
+	if (!skb_shinfo(skb)->frag_list) {
+		/* A little of trouble, not enough of space for trailer.
+		 * This should not happen, when stack is tuned to generate
+		 * good frames. OK, on miss we reallocate and reserve even more
+		 * space, 128 bytes is fair. */
+
+		if (skb_tailroom(skb) < tailbits &&
+		    pskb_expand_head(skb, 0, tailbits-skb_tailroom(skb)+128, GFP_ATOMIC))
+			return -ENOMEM;
+
+		/* Voila! */
+		*trailer = skb;
+		return 1;
+	}
+
+	/* Misery. We are in troubles, going to mincer fragments... */
+
+	elt = 1;
+	skb_p = &skb_shinfo(skb)->frag_list;
+	copyflag = 0;
+
+	while ((skb1 = *skb_p) != NULL) {
+		int ntail = 0;
+
+		/* The fragment is partially pulled by someone,
+		 * this can happen on input. Copy it and everything
+		 * after it. */
+
+		if (skb_shared(skb1))
+			copyflag = 1;
+
+		/* If the skb is the last, worry about trailer. */
+
+		if (skb1->next == NULL && tailbits) {
+			if (skb_shinfo(skb1)->nr_frags ||
+			    skb_shinfo(skb1)->frag_list ||
+			    skb_tailroom(skb1) < tailbits)
+				ntail = tailbits + 128;
+		}
+
+		if (copyflag ||
+		    skb_cloned(skb1) ||
+		    ntail ||
+		    skb_shinfo(skb1)->nr_frags ||
+		    skb_shinfo(skb1)->frag_list) {
+			struct sk_buff *skb2;
+
+			/* Fuck, we are miserable poor guys... */
+			if (ntail == 0)
+				skb2 = skb_copy(skb1, GFP_ATOMIC);
+			else
+				skb2 = skb_copy_expand(skb1,
+						       skb_headroom(skb1),
+						       ntail,
+						       GFP_ATOMIC);
+			if (unlikely(skb2 == NULL))
+				return -ENOMEM;
+
+			if (skb1->sk)
+				skb_set_owner_w(skb2, skb1->sk);
+
+			/* Looking around. Are we still alive?
+			 * OK, link new skb, drop old one */
+
+			skb2->next = skb1->next;
+			*skb_p = skb2;
+			kfree_skb(skb1);
+			skb1 = skb2;
+		}
+		elt++;
+		*trailer = skb1;
+		skb_p = &skb1->next;
+	}
+
+	return elt;
+}
+
 EXPORT_SYMBOL(___pskb_trim);
 EXPORT_SYMBOL(__kfree_skb);
 EXPORT_SYMBOL(kfree_skb);
@@ -2094,3 +2279,6 @@ EXPORT_SYMBOL(skb_seq_read);
 EXPORT_SYMBOL(skb_abort_seq_read);
 EXPORT_SYMBOL(skb_find_text);
 EXPORT_SYMBOL(skb_append_datato_frags);
+
+EXPORT_SYMBOL_GPL(skb_to_sgvec);
+EXPORT_SYMBOL_GPL(skb_cow_data);
diff --git a/net/xfrm/xfrm_algo.c b/net/xfrm/xfrm_algo.c
index f373a8a..6249a94 100644
--- a/net/xfrm/xfrm_algo.c
+++ b/net/xfrm/xfrm_algo.c
@@ -612,175 +612,6 @@ EXPORT_SYMBOL_GPL(skb_icv_walk);
 
 #if defined(CONFIG_INET_ESP) || defined(CONFIG_INET_ESP_MODULE) || defined(CONFIG_INET6_ESP) || defined(CONFIG_INET6_ESP_MODULE)
 
-/* Looking generic it is not used in another places. */
-
-int
-skb_to_sgvec(struct sk_buff *skb, struct scatterlist *sg, int offset, int len)
-{
-	int start = skb_headlen(skb);
-	int i, copy = start - offset;
-	int elt = 0;
-
-	if (copy > 0) {
-		if (copy > len)
-			copy = len;
-		sg[elt].page = virt_to_page(skb->data + offset);
-		sg[elt].offset = (unsigned long)(skb->data + offset) % PAGE_SIZE;
-		sg[elt].length = copy;
-		elt++;
-		if ((len -= copy) == 0)
-			return elt;
-		offset += copy;
-	}
-
-	for (i = 0; i < skb_shinfo(skb)->nr_frags; i++) {
-		int end;
-
-		BUG_TRAP(start <= offset + len);
-
-		end = start + skb_shinfo(skb)->frags[i].size;
-		if ((copy = end - offset) > 0) {
-			skb_frag_t *frag = &skb_shinfo(skb)->frags[i];
-
-			if (copy > len)
-				copy = len;
-			sg[elt].page = frag->page;
-			sg[elt].offset = frag->page_offset+offset-start;
-			sg[elt].length = copy;
-			elt++;
-			if (!(len -= copy))
-				return elt;
-			offset += copy;
-		}
-		start = end;
-	}
-
-	if (skb_shinfo(skb)->frag_list) {
-		struct sk_buff *list = skb_shinfo(skb)->frag_list;
-
-		for (; list; list = list->next) {
-			int end;
-
-			BUG_TRAP(start <= offset + len);
-
-			end = start + list->len;
-			if ((copy = end - offset) > 0) {
-				if (copy > len)
-					copy = len;
-				elt += skb_to_sgvec(list, sg+elt, offset - start, copy);
-				if ((len -= copy) == 0)
-					return elt;
-				offset += copy;
-			}
-			start = end;
-		}
-	}
-	BUG_ON(len);
-	return elt;
-}
-EXPORT_SYMBOL_GPL(skb_to_sgvec);
-
-/* Check that skb data bits are writable. If they are not, copy data
- * to newly created private area. If "tailbits" is given, make sure that
- * tailbits bytes beyond current end of skb are writable.
- *
- * Returns amount of elements of scatterlist to load for subsequent
- * transformations and pointer to writable trailer skb.
- */
-
-int skb_cow_data(struct sk_buff *skb, int tailbits, struct sk_buff **trailer)
-{
-	int copyflag;
-	int elt;
-	struct sk_buff *skb1, **skb_p;
-
-	/* If skb is cloned or its head is paged, reallocate
-	 * head pulling out all the pages (pages are considered not writable
-	 * at the moment even if they are anonymous).
-	 */
-	if ((skb_cloned(skb) || skb_shinfo(skb)->nr_frags) &&
-	    __pskb_pull_tail(skb, skb_pagelen(skb)-skb_headlen(skb)) == NULL)
-		return -ENOMEM;
-
-	/* Easy case. Most of packets will go this way. */
-	if (!skb_shinfo(skb)->frag_list) {
-		/* A little of trouble, not enough of space for trailer.
-		 * This should not happen, when stack is tuned to generate
-		 * good frames. OK, on miss we reallocate and reserve even more
-		 * space, 128 bytes is fair. */
-
-		if (skb_tailroom(skb) < tailbits &&
-		    pskb_expand_head(skb, 0, tailbits-skb_tailroom(skb)+128, GFP_ATOMIC))
-			return -ENOMEM;
-
-		/* Voila! */
-		*trailer = skb;
-		return 1;
-	}
-
-	/* Misery. We are in troubles, going to mincer fragments... */
-
-	elt = 1;
-	skb_p = &skb_shinfo(skb)->frag_list;
-	copyflag = 0;
-
-	while ((skb1 = *skb_p) != NULL) {
-		int ntail = 0;
-
-		/* The fragment is partially pulled by someone,
-		 * this can happen on input. Copy it and everything
-		 * after it. */
-
-		if (skb_shared(skb1))
-			copyflag = 1;
-
-		/* If the skb is the last, worry about trailer. */
-
-		if (skb1->next == NULL && tailbits) {
-			if (skb_shinfo(skb1)->nr_frags ||
-			    skb_shinfo(skb1)->frag_list ||
-			    skb_tailroom(skb1) < tailbits)
-				ntail = tailbits + 128;
-		}
-
-		if (copyflag ||
-		    skb_cloned(skb1) ||
-		    ntail ||
-		    skb_shinfo(skb1)->nr_frags ||
-		    skb_shinfo(skb1)->frag_list) {
-			struct sk_buff *skb2;
-
-			/* Fuck, we are miserable poor guys... */
-			if (ntail == 0)
-				skb2 = skb_copy(skb1, GFP_ATOMIC);
-			else
-				skb2 = skb_copy_expand(skb1,
-						       skb_headroom(skb1),
-						       ntail,
-						       GFP_ATOMIC);
-			if (unlikely(skb2 == NULL))
-				return -ENOMEM;
-
-			if (skb1->sk)
-				skb_set_owner_w(skb2, skb1->sk);
-
-			/* Looking around. Are we still alive?
-			 * OK, link new skb, drop old one */
-
-			skb2->next = skb1->next;
-			*skb_p = skb2;
-			kfree_skb(skb1);
-			skb1 = skb2;
-		}
-		elt++;
-		*trailer = skb1;
-		skb_p = &skb1->next;
-	}
-
-	return elt;
-}
-EXPORT_SYMBOL_GPL(skb_cow_data);
-
 void *pskb_put(struct sk_buff *skb, struct sk_buff *tail, int len)
 {
 	if (tail != skb) {


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 3/5] AF_RXRPC: Make it possible to merely try to cancel timers and delayed work [try #3]
  2007-03-20 19:58 ` David Howells
  2007-03-20 19:59   ` [PATCH 1/5] AF_RXRPC: Add blkcipher accessors for using kernel data directly " David Howells
  2007-03-20 19:59   ` [PATCH 2/5] AF_RXRPC: Move generic skbuff stuff from XFRM code to generic code " David Howells
@ 2007-03-20 19:59   ` David Howells
  2007-03-20 19:59   ` [PATCH 4/5] AF_RXRPC: Key facility changes for AF_RXRPC " David Howells
                     ` (8 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-20 19:59 UTC (permalink / raw)
  To: davem, netdev, herbert.xu; +Cc: linux-kernel, hch, arjan, dhowells

Export try_to_del_timer_sync() for use by the RxRPC module.

Add a try_to_cancel_delayed_work() so that it is possible to merely attempt to
cancel a delayed work timer.
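
For illustration only (struct foo and its fields are assumed names, not taken
from the patch), a caller that must not block might use it like this:

	struct foo {
		struct delayed_work	reaper;
		atomic_t		usage;
	};

	static void foo_stop_reaper(struct foo *foo)
	{
		/* Returns 1 if the timer was removed, 0 if nothing was
		 * scheduled and -1 if the handler is currently running, in
		 * which case we just leave it to run rather than blocking. */
		if (try_to_cancel_delayed_work(&foo->reaper) > 0)
			atomic_dec(&foo->usage);	/* drop the timer's ref */
	}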

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 include/linux/workqueue.h |   21 +++++++++++++++++++++
 kernel/timer.c            |    2 ++
 2 files changed, 23 insertions(+), 0 deletions(-)

diff --git a/include/linux/workqueue.h b/include/linux/workqueue.h
index 2a7b38d..40a61ae 100644
--- a/include/linux/workqueue.h
+++ b/include/linux/workqueue.h
@@ -204,4 +204,25 @@ static inline int cancel_delayed_work(struct delayed_work *work)
 	return ret;
 }
 
+/**
+ * try_to_cancel_delayed_work - Try to kill pending scheduled, delayed work
+ * @work: the work to cancel
+ *
+ * Try to kill off a pending schedule_delayed_work().
+ * - The timer may still be running afterwards, and if so, the work may still
+ *   be pending
+ * - Returns -1 if timer still active, 1 if timer removed, 0 if not scheduled
+ * - Can be called from the work routine; if it's still pending, just return
+ *   and it'll be called again.
+ */
+static inline int try_to_cancel_delayed_work(struct delayed_work *work)
+{
+	int ret;
+
+	ret = try_to_del_timer_sync(&work->timer);
+	if (ret > 0)
+		work_release(&work->work);
+	return ret;
+}
+
 #endif
diff --git a/kernel/timer.c b/kernel/timer.c
index 797cccb..447506a 100644
--- a/kernel/timer.c
+++ b/kernel/timer.c
@@ -505,6 +505,8 @@ out:
 	return ret;
 }
 
+EXPORT_SYMBOL(try_to_del_timer_sync);
+
 /**
  * del_timer_sync - deactivate a timer and wait for the handler to finish.
  * @timer: the timer to be deactivated


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* [PATCH 4/5] AF_RXRPC: Key facility changes for AF_RXRPC [try #3]
  2007-03-20 19:58 ` David Howells
                     ` (2 preceding siblings ...)
  2007-03-20 19:59   ` [PATCH 3/5] AF_RXRPC: Make it possible to merely try to cancel timers and delayed work " David Howells
@ 2007-03-20 19:59   ` David Howells
  2007-03-20 20:22   ` [PATCH 0/5] [RFC] AF_RXRPC socket family implementation " David Howells
                     ` (7 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-20 19:59 UTC (permalink / raw)
  To: davem, netdev, herbert.xu; +Cc: linux-kernel, hch, arjan, dhowells

Export the keyring key type definition and document its availability.

Add alternative types into the key's type_data union to make it more useful.
Not all users necessarily want to use it as a list_head (AF_RXRPC doesn't, for
example), so make it clear that it can be used in other ways.
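
For illustration (a sketch of an assumed in-kernel user; my_key_type is a
placeholder for whatever key type the caller wants to find), the exported
keyring type might be used like this:

	struct key *keyring;
	key_ref_t kref;

	/* Find the named keyring among the caller's keyrings... */
	keyring = request_key(&key_type_keyring, "AFSkeys", NULL);

	/* ...then search that specific keyring directly. */
	kref = keyring_search(make_key_ref(keyring, 1),
			      &my_key_type, "52:2");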

Signed-Off-By: David Howells <dhowells@redhat.com>
---

 Documentation/keys.txt  |   12 ++++++++++++
 include/linux/key.h     |    2 ++
 security/keys/keyring.c |    2 ++
 3 files changed, 16 insertions(+), 0 deletions(-)

diff --git a/Documentation/keys.txt b/Documentation/keys.txt
index 60c665d..81d9aa0 100644
--- a/Documentation/keys.txt
+++ b/Documentation/keys.txt
@@ -859,6 +859,18 @@ payload contents" for more information.
 	void unregister_key_type(struct key_type *type);
 
 
+Under some circumstances, it may be desirable to deal with a
+bundle of keys.  The facility provides access to the keyring type for managing
+such a bundle:
+
+	struct key_type key_type_keyring;
+
+This can be used with a function such as request_key() to find a specific
+keyring in a process's keyrings.  A keyring thus found can then be searched
+with keyring_search().  Note that it is not possible to use request_key() to
+search a specific keyring, so using keyrings in this way is of limited utility.
+
+
 ===================================
 NOTES ON ACCESSING PAYLOAD CONTENTS
 ===================================
diff --git a/include/linux/key.h b/include/linux/key.h
index 169f05e..a9220e7 100644
--- a/include/linux/key.h
+++ b/include/linux/key.h
@@ -160,6 +160,8 @@ struct key {
 	 */
 	union {
 		struct list_head	link;
+		unsigned long		x[2];
+		void			*p[2];
 	} type_data;
 
 	/* key data
diff --git a/security/keys/keyring.c b/security/keys/keyring.c
index ad45ce7..88292e3 100644
--- a/security/keys/keyring.c
+++ b/security/keys/keyring.c
@@ -66,6 +66,8 @@ struct key_type key_type_keyring = {
 	.read		= keyring_read,
 };
 
+EXPORT_SYMBOL(key_type_keyring);
+
 /*
  * semaphore to serialise link/link calls to prevent two link calls in parallel
  * introducing a cycle


^ permalink raw reply related	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3] 
  2007-03-20 19:58 ` David Howells
                     ` (3 preceding siblings ...)
  2007-03-20 19:59   ` [PATCH 4/5] AF_RXRPC: Key facility changes for AF_RXRPC " David Howells
@ 2007-03-20 20:22   ` David Howells
  2007-03-20 21:12   ` Alan Cox
                     ` (6 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-20 20:22 UTC (permalink / raw)
  To: Alan Cox; +Cc: davem, netdev, herbert.xu, linux-kernel, hch, arjan

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> Some of them don't seem to be making it through to the list and are
> dropped each time btw

Yeah.  It's a size issue.  The fifth patch is a third of a megabyte.  The
patches at the URLs should now be updated (I'd forgotten to do that before
sending the emails).

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3]
  2007-03-20 19:58 ` David Howells
                     ` (4 preceding siblings ...)
  2007-03-20 20:22   ` [PATCH 0/5] [RFC] AF_RXRPC socket family implementation " David Howells
@ 2007-03-20 21:12   ` Alan Cox
  2007-03-20 21:36   ` Alan Cox
                     ` (5 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Alan Cox @ 2007-03-20 21:12 UTC (permalink / raw)
  To: David Howells
  Cc: davem, netdev, herbert.xu, linux-kernel, hch, arjan, dhowells

>  (*) SOCK_RPC has been removed.  AF_RXRPC sockets now simply ignore the "type"
>      argument to socket().

This is also incorrect

NAK


^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3]
  2007-03-20 19:58 ` David Howells
@ 2007-03-20 21:14 Alan Cox
  2007-03-20 19:58 ` David Howells
  11 siblings, 1 reply; 14+ messages in thread
From: Alan Cox @ 2007-03-20 21:14 UTC (permalink / raw)
  To: David Howells
  Cc: davem, netdev, herbert.xu, linux-kernel, hch, arjan, dhowells

> These patches together supply secure client-side RxRPC connectivity as a Linux
> kernel socket family.  Only the transport/session side is supplied - the
> presentation side (marshalling the data) is left to the client.  Copies of the
> patches can be found here:

Some of them don't seem to be making it through to the list and are
dropped each time btw

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3]
  2007-03-20 19:58 ` David Howells
                     ` (5 preceding siblings ...)
  2007-03-20 21:12   ` Alan Cox
@ 2007-03-20 21:36   ` Alan Cox
  2007-03-20 22:10   ` David Howells
                     ` (4 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: Alan Cox @ 2007-03-20 21:36 UTC (permalink / raw)
  To: David Howells
  Cc: davem, netdev, herbert.xu, linux-kernel, hch, arjan, dhowells

Ok quickly going over the code that hasn't made the list

- recvmsg not supporting MSG_TRUNC is rather weird and really ought to be
fixed one day as it's useful to find out the size of the message pending when
combined with MSG_PEEK

- RXRPC_MIN_SECURITY_LEVEL reads into rx->min_sec_level and then, if it is
invalid, reports an error but doesn't restore the previous valid level

- Why does rxrpc_writable always return 0 ?

- rxrpc_process_soft_ACKs doesn't itself limit or check that acns->nAcks is
always below RXRPC_MAXACKS; as this is a stack variable it ought to be
paranoid about it. I think it's ok from the caller check but it's very hard
to prove...


It needs a lot more eyes/review due to the complexity and network
exposure though - not your fault, whoever designed RXRPC's 8)

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3]
  2007-03-20 19:58 ` David Howells
                     ` (6 preceding siblings ...)
  2007-03-20 21:36   ` Alan Cox
@ 2007-03-20 22:10   ` David Howells
  2007-03-20 22:26   ` David Howells
                     ` (3 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-20 22:10 UTC (permalink / raw)
  To: Alan Cox
  Cc: davem, netdev, herbert.xu, linux-kernel, hch, arjan, David Howells

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> >  (*) SOCK_RPC has been removed.  AF_RXRPC sockets now simply ignore the
> >      "type" argument to socket().
>
> This is also incorrect

Sigh.

And what would you have me do?  There *isn't* an appropriate SOCK_xxx constant
available, and you won't let me add one that is.  Maybe I should just pick
SOCK_DCCP and have done with it; it's as appropriate as DGRAM, RDM, SEQPACKET
or STREAM - except that would be silly.  I assert that RAW and PACKET are both
even less appropriate than any of the other choices.


Let me explain again why I think each choice is incorrect.  You gave me four
choices, which you and POSIX classify thus:

	Constant	Your Service type	POSIX Service Type
	===============	=======================	=======================
	SOCK_DGRAM	Datagram		Datagram
	SOCK_RDM	Datagram		Datagram
	SOCK_SEQPACKET	Datagram		Stream (maybe Datagram)
	SOCK_STREAM	Stream			Stream

A datagram service, by definition (as can be found on various websites),
transfers a piece of data from one place to another with no dependence on and
no regard to any other pieces of data that the service is asked to transport.
At its simplest level (SOCK_DGRAM), that's _all_ it does.  SOCK_DGRAM makes no
assertions about whether the datagram will get there, and requires no report
that it did get there.  Furthermore, no ordering at all is imposed on the
sequence in which the far side sees any such pieces of data.

SOCK_RDM is a step up from that.  Again, like SOCK_DGRAM, it presents a
datagram service to the application.  But unlike SOCK_DGRAM, it will attempt to
report the success or failure of the attempt to transfer the data.  This may
involve exchanging further packets with the peer behind the scenes, but
ultimately, all it does is to transfer one piece of data from one peer to
another; the added value is that the sender can attempt to determine whether
this worked.  Furthermore, as for SOCK_DGRAM, no ordering at all is imposed on
the sequence in which the far side sees any such pieces of data.

SOCK_SEQPACKET can be considered a step up from SOCK_RDM.  It provides a
reliable datagram service to the application (as SOCK_RDM), but
one in which the datagrams are guaranteed to be seen by the receiver in
precisely the same order as they are sent by the sender, with no datagrams
being lost from the sequence.  In this model, SOCK_SEQPACKET would be seen as
providing two independent, independently ordered streams of datagrams, one in
each direction.

SOCK_STREAM is a data streaming service in which data is guaranteed to come out
of the receiver in precisely the same order as it was put into the transmitter.
Furthermore, SOCK_STREAM can be seen as providing two independent,
independently ordered streams of data, one in each direction.

SOCK_SEQPACKET can also be considered to provide a streaming service (similar
to SOCK_STREAM) in which record boundaries are maintained.  Data is guaranteed
to come out of the receiver in precisely the same order as it was put into the
transmitter, but the receiver will break off and flag MSG_EOR at points in the
data flow that correspond to those at which the sender flagged a record
boundary.  In this model, SOCK_SEQPACKET would be seen as providing two
independent, independently ordered streams of data and record markers, one in
each direction.
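
To illustrate that record-marker view with the ordinary sockets API, here's a
minimal POSIX-style sketch (nothing RxRPC-specific; exact partial-record
behaviour varies between protocols, so treat it as illustrative only):

	/* Sketch: reassemble one record from a SOCK_SEQPACKET socket,
	 * using MSG_EOR in msg_flags to spot the sender's record boundary.
	 * Assumes <sys/socket.h> and a connected socket in fd. */
	ssize_t read_record(int fd, char *buf, size_t size)
	{
		size_t got = 0;

		for (;;) {
			struct iovec iov = { buf + got, size - got };
			struct msghdr msg = { .msg_iov = &iov, .msg_iovlen = 1 };
			ssize_t n = recvmsg(fd, &msg, 0);

			if (n <= 0)
				return got ? (ssize_t) got : n;
			got += n;
			if (msg.msg_flags & MSG_EOR)
				return got;	/* end of this record */
		}
	}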


In effect, SOCK_STREAM and SOCK_SEQPACKET can each be viewed as a pair of
independent, symmetric unidirectional services, one for each direction:

	+--------+                           +--------+
	|        |      +-------------+      |        |
	|        |----->|  Tx Stream  |----->|        |
	| Local  |      +-------------+      | Remote |
	| Socket |                           | Socket |
   _____|        |___________________________|        |_____
	|        |                           |        |
	|        |      +-------------+      |        |
	|        |<-----|  Rx Stream  |<-----|        |
	|        |      +-------------+      |        |
	+--------+                           +--------+

SOCK_SEQPACKET can give the appearance of being an ordered, reliable datagram
service simply by the application that uses it assuming that the record
boundaries delimit separate datagrams.

SOCK_DGRAM and SOCK_RDM, on the other hand, can be viewed as being a pair of
independent unidirectional asymmetric services, one that spits out datagrams
and one that collects them.

	+--------+     +--+                   +--------+
	|        |     |Tx|----------         |        |
	|        |     +--+          \        |        |
	|        |    /               \       |        |
	|        |   /   +--+          \      |        |
	|        |->+--->|Tx|---------------->|        |
	| Local  |   \   +--+            \    | Remote |
	| Socket |    \                   --->| Socket |
	|        |     +--+                   |        |
	|        |     |Tx|--------X          |        |
	|        |     +--+                   |        |
   _____|        |____________________________|        |_____
	|        |                            |        |
	|        |                    +--+    |        |
	|        |              X-----|Rx|    |        |
	|        |                    +--+    |        |
	|        |                        \   |        |
	|        |                         +<-|        |
	|        |                        /   |        |
	|        |                    +--+    |        |
	|        |<-------------------|Rx|    |        |
	+--------+                    +--+    +--------+

The fact that SOCK_RDM may exchange extra packets behind the scenes is more or
less transparent to the application, and doesn't affect the apparent data flow
model.

Another way to look at it is that with all four of these types, a half-way
shutdown() operation makes sense.  With SOCK_STREAM or SOCK_SEQPACKET it just
eliminates one of the two independent stream services; with SOCK_DGRAM or
SOCK_RDM, it eliminates either the capability to send datagrams or the
capability to receive them.

So, a quick summary of the guarantees:

 (1) SOCK_DGRAM: Datagram passing service.  Totally unordered and unreliable;
     the presentation of received packets is totally independent of the
     datagrams accepted for transmission - either side might be shut down
     without affecting the other.

 (2) SOCK_RDM: As SOCK_DGRAM, but includes reporting of lost datagrams and
     presumably resending and duplication elimination.  Again, reception is
     apparently (to the application) independent of transmission, and either
     side may be shut down without affecting the other.

 (3) SOCK_STREAM: Reliable and precisely ordered.  The data coming in is
     totally independent of the data going out as far as the application need
     be concerned.  Either side might be shut down without affecting the other.

 (4) SOCK_SEQPACKET: As SOCK_STREAM, but record boundaries may be placed in the
     transmission data stream by the sender, and these are also precisely
     ordered with respect to that data stream (and each other).  The input and
     output data streams are totally independent as far as the application need
     be concerned.  Either side might be shut down without affecting the other.

Now, do you note a common theme?  In each case, you can draw a line through a
pair of sockets; as far as the application is concerned, all the Tx side is on
one side of the line, and all the Rx side is on the other side of the line.
Either side might be shut down without affecting the other.  What goes on
underneath is (or should be) transparent to the application.

Furthermore, the above are either totally unordered or precisely ordered, but
only with respect to each side.  There is no ordering between sides at all
(they're completely independent to the application).

Do you agree with what I've said so far in my explanation?


Okay, now to RxRPC:

 (1) In RxRPC the sending side and the receiving side are irrevocably
     intertwined.  The reply phase of a call is dependent on the preceding
     request phase, despite the fact they go in opposite directions.  Both
     phases must take place in order to complete a call, and so there's a
     partial ordering across the two sides.

 (2) The messages that make up a particular request and its corresponding reply
     are precisely ordered with respect to each other, but are in no way
     ordered with respect to the messages that make up another request and its
     corresponding reply that go over the same socket.  This is something the
     application sees and has to deal with.

 (3) Partial shutdown() makes no sense with RxRPC sockets.  Shutting down the
     receiver in a client socket, for example, means that the app can't be
     presented with the reply to any call it sent, in which case the call would
     have to automatically be aborted.

 (4) A socket may maintain many simultaneous but independent calls.  The
     messages passed to sendmsg() for these may be interleaved at will, and
     recvmsg() will present them to the application in some random interleaving
     (though the ordering of the messages making up a particular call will be
     maintained).

Based on (1), (2) and (3), RxRPC is not a datagram service, and so neither
SOCK_DGRAM nor SOCK_RDM are correct.

Based on (1), (2), (3) and (4), RxRPC is not a streaming service, and so
neither SOCK_STREAM nor SOCK_SEQPACKET are correct.


Note also, the fact that RxRPC transits a SOCK_DGRAM protocol is irrelevant.
The application does not see that (though it may need to note it in its
address).  Similarly, an application does not see that TCP (SOCK_STREAM) deals
with the raw IP datagram protocol or an ethernet protocol (SOCK_RAW)
underneath.
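
As an aside, to make (2) and (4) above concrete, here's a very rough userspace
sketch (illustrative only, not taken from the patches) of starting two
independent calls on one connected client socket.  Each call is tagged with
its own RXRPC_USER_CALL_ID control message, and not setting MSG_MORE marks
the end of each request:

	#include <string.h>
	#include <sys/socket.h>
	/* plus whatever header ends up exporting SOL_RXRPC and
	 * RXRPC_USER_CALL_ID to userspace */

	/* Sketch: issue one complete request on a connected AF_RXRPC client
	 * socket, tagged with the given user call ID.  Not setting MSG_MORE
	 * marks this as the last (here, only) chunk of request data. */
	static ssize_t start_call(int fd, unsigned long call_id,
				  const void *req, size_t len)
	{
		struct iovec iov = { (void *) req, len };
		char control[CMSG_SPACE(sizeof(call_id))];
		struct msghdr msg = {
			.msg_iov	= &iov,
			.msg_iovlen	= 1,
			.msg_control	= control,
			.msg_controllen	= sizeof(control),
		};
		struct cmsghdr *cmsg = CMSG_FIRSTHDR(&msg);

		cmsg->cmsg_level = SOL_RXRPC;
		cmsg->cmsg_type	 = RXRPC_USER_CALL_ID;
		cmsg->cmsg_len	 = CMSG_LEN(sizeof(call_id));
		memcpy(CMSG_DATA(cmsg), &call_id, sizeof(call_id));

		return sendmsg(fd, &msg, 0);
	}

	/* e.g.:
	 *	start_call(fd, 1, req1, req1_len);
	 *	start_call(fd, 2, req2, req2_len);
	 * The two replies may come back interleaved; recvmsg() attaches the
	 * RXRPC_USER_CALL_ID control message so the app can tell them apart. */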

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3] 
  2007-03-20 19:58 ` David Howells
                     ` (7 preceding siblings ...)
  2007-03-20 22:10   ` David Howells
@ 2007-03-20 22:26   ` David Howells
  2007-03-21 13:26   ` David Howells
                     ` (2 subsequent siblings)
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-20 22:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: davem, netdev, herbert.xu, linux-kernel, hch, arjan, dhowells

Alan Cox <alan@lxorguk.ukuu.org.uk> wrote:

> - recvmsg not supporting MSG_TRUNC is rather weird and really ought to be
> fixed one day, as it's useful to find out the size of the message pending
> when combined with MSG_PEEK

Hmmm...  I hadn't considered that.  I assumed MSG_TRUNC not to be useful as
arbitrarily chopping bits out of the request or reply would seem to be
pointless.
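
(Presumably the idiom you mean is the usual datagram-socket one, something
like the sketch below: with MSG_TRUNC, the receive call returns the real size
of the pending message even if the buffer is smaller, and MSG_PEEK leaves it
queued.)

	/* Sketch: find the size of the next queued message without
	 * consuming it (standard datagram-socket idiom). */
	static ssize_t next_message_size(int fd)
	{
		char dummy;

		return recv(fd, &dummy, sizeof(dummy), MSG_PEEK | MSG_TRUNC);
	}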

> - RXRPC_MIN_SECURITY_LEVEL reads into rx->min_sec_level and then if it is
> invalid reports an error but doesn't restore the valid level

Fixed.

> - Why does rxrpc_writable always return 0 ?

Good point.  That's slightly tricky to deal with as output messages don't
remain queued on the socket struct itself.  Hmmm...

One thing I'd like to be able to do is pass the sk_buffs I've set up to UDP
directly rather than having to call the UDP socket's sendmsg.  That'd eliminate
a copy.  But I decided to get it working right first, then look at cute
optimisations like that.

Such a thing would also be useful for the AFS filesystem: it could pass skbuffs
it has preloaded to AF_RXRPC, which would then hand them on to UDP.

> - rxrpc_process_soft_ACKs doesn't itself limit and check that acns->nAcks is
> always below RXRPC_MAXACKS; as this is a stack variable it ought to be
> paranoid about it. I think it's OK from the caller check, but it's very hard
> to prove...

nAcks is a uint8_t.  If that can exceed RXRPC_MAXACKS (255) then I suspect I'll
have more pressing worries.  I could put a check in there, but the compiler
would give me a warning:-/

> It needs a lot more eyes/review due to the complexity and network
> exposure though - not your fault, whoever designed RXRPC's 8)

It's not an entirely insane protocol:-) Actually, part of the problem is Linux
itself.

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3] 
  2007-03-20 19:58 ` David Howells
                     ` (8 preceding siblings ...)
  2007-03-20 22:26   ` David Howells
@ 2007-03-21 13:26   ` David Howells
  2007-03-21 18:10   ` David Howells
  2007-03-21 21:32   ` David Howells
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-21 13:26 UTC (permalink / raw)
  To: Alan Cox; +Cc: davem, netdev, herbert.xu, linux-kernel, hch, arjan, dhowells

David Howells <dhowells@redhat.com> wrote:

> > - Why does rxrpc_writable always return 0 ?
> 
> Good point.  That's slightly tricky to deal with as output messages don't
> remain queued on the socket struct itself.  Hmmm...

Okay, I've fixed that by changing it to:

	static inline int rxrpc_writable(struct sock *sk)
	{
		return atomic_read(&sk->sk_wmem_alloc) < (size_t) sk->sk_sndbuf;
	}

All the rest of the sk_wmem_alloc management should be automatic through my
use of sock_alloc_send_skb() to allocate Tx buffers.

I notice that AF_UNIX effectively divides sk_sndbuf by four before doing the
comparison.  Any idea why (there's no comment to say)?  I presume it's so that
we show willingness to accept any chunk of data up to 3/4 of sk_sndbuf in size
if we flag POLLOUT, rather than the app finding that POLLOUT is set but it can
only send one byte of data.
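
For reference, the AF_UNIX check in question is (roughly, from memory) the one
below; wmem_alloc is shifted left by two before the comparison, which is where
the effective divide-by-four comes from:

	static inline int unix_writable(struct sock *sk)
	{
		/* writable while no more than a quarter of sk_sndbuf is
		 * already committed to queued skbs */
		return (atomic_read(&sk->sk_wmem_alloc) << 2) <= sk->sk_sndbuf;
	}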


I've still got a problem with this, though, and I'm not sure it's easy to
solve.  An RxRPC socket may be performing several calls at once.  It may have
sufficient space on the *socket* to accept more Tx data, but it may not be
possible to do so because the ACK window on the intended call may be full.  So
POLLOUT could be set, and we might still block anyway:-/

This is unlike SOCK_STREAM sockets for TCP or AF_UNIX, I think, because each
of those has but a single transmit queue:-/

I'm not sure there's a lot I can do about that, but there are a few
possibilities:

 (1) Always send data in non-blocking mode if we know there are multiple calls
     and we don't want to get stuck (a rough sketch of this follows the list).
     We'll get a suitable error message, but there's no notification that the
     Tx queue has been unblocked.  I could add such a notification (it
     shouldn't be hard); there are two immediately obvious ways of doing it:

     (a) Raise a one-shot pollable event to indicate that one of the
     	 in-progress calls' Tx queues has come unstuck.  This'd require
     	 trying the non-stuck calls individually to find out which one it was.

     (b) Queue a message for recvmsg() to grab that says a Tx queue has become
     	 unstuck.  This would permit the control message to indicate the call.

 (2) Require a separate socket for each individual call.  This is a very
     heavyweight way of doing things, as it would require socket inodes and
     other stuff per call.  On the other hand it would permit a certain amount
     more flexibility.  The real downside is that I wouldn't want to be doing
     this for the in-kernel AFS filesystem, but maybe that could bypass the
     use of sockets.

 (3) Allow the app to nominate which call on a socket it wants a poll() on
     that socket to probe next.
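
A rough sketch of what option (1) looks like from the app's side (illustrative
only; the msghdr would carry the per-call control data as usual): the send is
attempted with MSG_DONTWAIT and EAGAIN is taken to mean "this particular
call's window is full, come back to it later":

	#include <errno.h>
	#include <sys/socket.h>

	/* Sketch of option (1): try to push the next chunk of a call's
	 * request without blocking.  Returns bytes queued, 0 if this call
	 * is currently stuck, or -1 on a real error. */
	static ssize_t try_send(int fd, struct msghdr *msg)
	{
		ssize_t n = sendmsg(fd, msg, MSG_DONTWAIT);

		if (n >= 0)
			return n;
		if (errno == EAGAIN || errno == EWOULDBLOCK)
			return 0;	/* call's Tx window full; retry later */
		return -1;		/* genuine error */
	}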

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3] 
  2007-03-20 19:58 ` David Howells
                     ` (9 preceding siblings ...)
  2007-03-21 13:26   ` David Howells
@ 2007-03-21 18:10   ` David Howells
  2007-03-21 21:32   ` David Howells
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-21 18:10 UTC (permalink / raw)
  Cc: Alan Cox, davem, netdev, herbert.xu, linux-kernel, hch, arjan

David Howells <dhowells@redhat.com> wrote:

> > - recvmsg not supporting MSG_TRUNC is rather weird and really ought to be
> > fixed one day, as it's useful to find out the size of the message pending
> > when combined with MSG_PEEK
> 
> Hmmm...  I hadn't considered that.  I assumed MSG_TRUNC not to be useful as
> arbitrarily chopping bits out of the request or reply would seem to be
> pointless.

But why do I need to support MSG_TRUNC?  I currently have things arranged so
that if you do a recvmsg() that doesn't pull everything out of a packet then
the next time you do a recvmsg() you'll get the next part of the data in that
packet.  MSG_EOR is flagged when recvmsg copies across the last byte of data
of a particular phase.

I might at some point in the future enable recvmsg() to keep pulling packets
off the Rx queue and copying them into userspace until the userspace buffer is
full or we find that the next packet is not the logical next in sequence.

Hmmm...  I'm actually overloading MSG_EOR.  MSG_EOR is flagged on the last
data read, and is also flagged for terminal messages (end of reply data,
abort, net error, final ACK, etc).  I wonder if I should use MSG_MORE (or its
lack) instead to indicate the end of data, and only set MSG_EOR on the
terminal message.

MSG_MORE is set by the app to flag to sendmsg() that there's more data to
come, so it would be consistent to use it for recvmsg() too.
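
Under that scheme, a client's receive loop would look something like the
sketch below (illustrative only; fd is the client socket, buf a reply buffer,
handle_data() a stand-in for whatever the app does with the data, and parsing
of the control messages for the user call ID, aborts and errors is elided):
keep calling recvmsg() while MSG_MORE is set, and treat MSG_EOR as the end of
the call:

	/* Sketch: drain one call under the proposed flag usage.
	 * MSG_MORE set => more reply data still to come for this call;
	 * MSG_EOR set  => terminal message, the call is finished. */
	for (;;) {
		struct iovec iov = { buf, sizeof(buf) };
		char control[128];
		struct msghdr msg = {
			.msg_iov	= &iov,
			.msg_iovlen	= 1,
			.msg_control	= control,
			.msg_controllen	= sizeof(control),
		};
		ssize_t n = recvmsg(fd, &msg, 0);

		if (n < 0)
			break;				/* error */
		handle_data(buf, n);			/* hypothetical helper */
		if (msg.msg_flags & MSG_EOR)
			break;				/* call complete */
		/* otherwise MSG_MORE should still be set; go round again */
	}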

David

^ permalink raw reply	[flat|nested] 14+ messages in thread

* Re: [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3] 
  2007-03-20 19:58 ` David Howells
                     ` (10 preceding siblings ...)
  2007-03-21 18:10   ` David Howells
@ 2007-03-21 21:32   ` David Howells
  11 siblings, 0 replies; 14+ messages in thread
From: David Howells @ 2007-03-21 21:32 UTC (permalink / raw)
  To: Alan Cox; +Cc: davem, netdev, herbert.xu, linux-kernel, hch, arjan, dhowells


David Howells <dhowells@redhat.com> wrote:

> > > - recvmsg not supporting MSG_TRUNC is rather weird and really ought to be
> > > fixed one day, as it's useful to find out the size of the message pending
> > > when combined with MSG_PEEK
> > 
> > Hmmm...  I hadn't considered that.  I assumed MSG_TRUNC not to be useful as
> > arbitrarily chopping bits out of the request or reply would seem to be
> > pointless.
> 
> But why do I need to support MSG_TRUNC?  I currently have things arranged so
> that if you do a recvmsg() that doesn't pull everything out of a packet then
> the next time you do a recvmsg() you'll get the next part of the data in that
> packet.  MSG_EOR is flagged when recvmsg copies across the last byte of data
> of a particular phase.

Okay...  I've rewritten my recvmsg implementation for RxRPC.  The one I had
could pull messages belonging to a call off the socket in the wrong order if
two threads both tried to pull simultaneously.

Also:

 (1) If there's a sequence of data messages belonging to a particular call on
     the receive queue, then recvmsg() will keep eating them until it meets
     either a non-data message or a message belonging to a different call or
     until it fills the user buffer.  If it doesn't fill the user buffer, it
     will sleep unless it is non-blocking.

 (2) MSG_PEEK operates similarly, but will return immediately if it has put any
     data in the buffer rather than waiting for further packets to arrive.

 (3) If a packet is only partially consumed in filling a user buffer, then the
     shrunken packet will be left on the front of the queue for the next taker.

 (4) If there is more data to be had on a call (we haven't copied the last byte
     of the last data packet in that phase yet), then MSG_MORE will be flagged.

 (5) MSG_EOR will be flagged on the terminal message of a call.  No more
     messages from that call will be received, and the user ID may be reused.

Patch attached.

David

diff --git a/net/rxrpc/Makefile b/net/rxrpc/Makefile
index 3369534..f12cd28 100644
--- a/net/rxrpc/Makefile
+++ b/net/rxrpc/Makefile
@@ -17,6 +17,7 @@ af-rxrpc-objs := \
 	ar-local.o \
 	ar-output.o \
 	ar-peer.o \
+	ar-recvmsg.o \
 	ar-security.o \
 	ar-skbuff.o \
 	ar-transport.o
diff --git a/net/rxrpc/af_rxrpc.c b/net/rxrpc/af_rxrpc.c
index b25d931..06963e6 100644
--- a/net/rxrpc/af_rxrpc.c
+++ b/net/rxrpc/af_rxrpc.c
@@ -385,217 +385,6 @@ out:
 }
 
 /*
- * receive a message from an RxRPC socket
- */
-static int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
-			 struct msghdr *msg, size_t len, int flags)
-{
-	struct rxrpc_skb_priv *sp;
-	struct rxrpc_call *call;
-	struct rxrpc_sock *rx = rxrpc_sk(sock->sk);
-	struct sk_buff *skb;
-	int copy, ret, ullen;
-	u32 abort_code;
-
-	_enter(",,,%zu,%d", len, flags);
-
-	if (flags & (MSG_OOB | MSG_TRUNC))
-		return -EOPNOTSUPP;
-
-try_again:
-	if (RB_EMPTY_ROOT(&rx->calls) &&
-	    rx->sk.sk_state != RXRPC_SERVER_LISTENING)
-		return -ENODATA;
-
-	/* receive the next message from the common Rx queue */
-	skb = skb_recv_datagram(&rx->sk, flags, flags & MSG_DONTWAIT, &ret);
-	if (!skb) {
-		_leave(" = %d", ret);
-		return ret;
-	}
-
-	sp = rxrpc_skb(skb);
-	call = sp->call;
-	ASSERT(call != NULL);
-
-	/* make sure we wait for the state to be updated in this call */
-	spin_lock_bh(&call->lock);
-	spin_unlock_bh(&call->lock);
-
-	if (test_bit(RXRPC_CALL_RELEASED, &call->flags)) {
-		_debug("packet from release call");
-		rxrpc_free_skb(skb);
-		goto try_again;
-	}
-
-	rxrpc_get_call(call);
-
-	/* copy the peer address. */
-	if (msg->msg_name && msg->msg_namelen > 0)
-		memcpy(&msg->msg_name, &call->conn->trans->peer->srx,
-		       sizeof(call->conn->trans->peer->srx));
-
-	/* set up the control messages */
-	ullen = msg->msg_flags & MSG_CMSG_COMPAT ? 4 : sizeof(unsigned long);
-
-	sock_recv_timestamp(msg, &rx->sk, skb);
-
-	if (skb->mark == RXRPC_SKB_MARK_NEW_CALL) {
-		_debug("RECV NEW CALL");
-		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_NEW_CALL, 0, &abort_code);
-		if (ret < 0)
-			goto error_requeue_packet;
-		goto done;
-	}
-
-	ret = put_cmsg(msg, SOL_RXRPC, RXRPC_USER_CALL_ID,
-		       ullen, &call->user_call_ID);
-	if (ret < 0)
-		goto error_requeue_packet;
-	ASSERT(test_bit(RXRPC_CALL_HAS_USERID, &call->flags));
-
-	switch (skb->mark) {
-	case RXRPC_SKB_MARK_DATA:
-		_debug("recvmsg DATA #%u { %d, %d }",
-		       ntohl(sp->hdr.seq), skb->len, sp->offset);
-
-		ASSERTCMP(ntohl(sp->hdr.seq), >=, call->rx_data_recv);
-		ASSERTCMP(ntohl(sp->hdr.seq), <=, call->rx_data_recv + 1);
-		call->rx_data_recv = ntohl(sp->hdr.seq);
-
-		ASSERTCMP(ntohl(sp->hdr.seq), >, call->rx_data_eaten);
-
-		copy = skb->len - sp->offset;
-		if (copy > len)
-			copy = len;
-
-		if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
-			ret = skb_copy_datagram_iovec(skb, sp->offset, msg->msg_iov,
-						      copy);
-		} else {
-			ret = skb_copy_and_csum_datagram_iovec(skb, sp->offset,
-							       msg->msg_iov);
-			if (ret == -EINVAL)
-				goto csum_copy_err;
-		}
-
-		if (ret < 0)
-			goto error_requeue_packet;
-
-		/* handle piecemeal consumption of data packets */
-		sp->offset += copy;
-		ret = copy;
-
-		if (sp->hdr.flags & RXRPC_LAST_PACKET)
-			msg->msg_flags |= MSG_EOR;
-
-		if (sp->offset < skb->len) {
-			if (!(flags & MSG_PEEK)) {
-				skb_queue_head(&rx->sk.sk_receive_queue, skb);
-				atomic_add(skb->truesize, &rx->sk.sk_rmem_alloc);
-			}
-		} else if (!(flags & MSG_PEEK)) {
-			if (call->conn->out_clientflag &&
-			    sp->hdr.flags & RXRPC_LAST_PACKET)
-				goto terminal_message;
-		}
-		break;
-
-	case RXRPC_SKB_MARK_FINAL_ACK:
-		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_ACK, 0, &abort_code);
-		if (ret < 0)
-			goto error_requeue_packet;
-		goto terminal_message;
-
-	case RXRPC_SKB_MARK_BUSY:
-		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_BUSY, 0, &abort_code);
-		if (ret < 0)
-			goto error_requeue_packet;
-		goto terminal_message;
-
-	case RXRPC_SKB_MARK_REMOTE_ABORT:
-		abort_code = call->abort_code;
-		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_ABORT,
-			       sizeof(abort_code), &abort_code);
-		if (ret < 0)
-			goto error_requeue_packet;
-		goto terminal_message;
-
-	case RXRPC_SKB_MARK_NET_ERROR:
-		_debug("RECV NET ERROR %d", sp->error);
-		abort_code = sp->error;
-		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_NET_ERROR, 4, &abort_code);
-		if (ret < 0)
-			goto error_requeue_packet;
-		goto terminal_message;
-
-	case RXRPC_SKB_MARK_LOCAL_ERROR:
-		_debug("RECV LOCAL ERROR %d", sp->error);
-		abort_code = sp->error;
-		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_LOCAL_ERROR, 4, &abort_code);
-		if (ret < 0)
-			goto error_requeue_packet;
-		goto terminal_message;
-
-	default:
-		BUG();
-		break;
-	}
-
-done:
-	if (!(flags & MSG_PEEK)) {
-		rxrpc_kill_skb(skb);
-		skb_free_datagram(&rx->sk, skb);
-	}
-	rxrpc_put_call(call);
-	_leave(" = %d", ret);
-  	return ret;
-
-terminal_message:
-	if (!(flags & MSG_PEEK)) {
-		_net("free terminal skb %p", skb);
-		rxrpc_kill_skb(skb);
-		skb_free_datagram(&rx->sk, skb);
-		_debug("RELEASE CALL %d", call->debug_id);
-
-		/* withdraw the user ID mapping so that it's not seen again in
-		 * association with that call */
-		if (test_bit(RXRPC_CALL_HAS_USERID, &call->flags)) {
-			write_lock_bh(&rx->call_lock);
-			rb_erase(&call->sock_node, &call->socket->calls);
-			clear_bit(RXRPC_CALL_HAS_USERID, &call->flags);
-			write_unlock_bh(&rx->call_lock);
-		}
-		read_lock_bh(&call->state_lock);
-		if (!test_bit(RXRPC_CALL_RELEASED, &call->flags) &&
-		    !test_and_set_bit(RXRPC_CALL_RELEASE, &call->events))
-			schedule_work(&call->processor);
-		read_unlock_bh(&call->state_lock);
-	}
-	rxrpc_put_call(call);
-	msg->msg_flags |= MSG_EOR;
-	_leave(" = %d", ret);
-  	return ret;
-
-error_requeue_packet:
-	if (!(flags & MSG_PEEK)) {
-		skb_queue_head(&rx->sk.sk_receive_queue, skb);
-		atomic_add(skb->truesize, &rx->sk.sk_rmem_alloc);
-	}
-	rxrpc_put_call(call);
-	_leave(" = %d", ret);
-  	return ret;
-
-csum_copy_err:
-	rxrpc_kill_skb(skb);
-	skb_kill_datagram(&rx->sk, skb, flags);
-	rxrpc_put_call(call);
-	if (flags & MSG_DONTWAIT)
-		return -EAGAIN;
-	goto try_again;
-}
-
-/*
  * set RxRPC socket options
  */
 static int rxrpc_setsockopt(struct socket *sock, int level, int optname,
diff --git a/net/rxrpc/ar-internal.h b/net/rxrpc/ar-internal.h
index 00ebf1b..213a640 100644
--- a/net/rxrpc/ar-internal.h
+++ b/net/rxrpc/ar-internal.h
@@ -580,6 +580,12 @@ extern struct file_operations rxrpc_call_seq_fops;
 extern struct file_operations rxrpc_connection_seq_fops;
 
 /*
+ * ar-recvmsg.c
+ */
+extern int rxrpc_recvmsg(struct kiocb *, struct socket *, struct msghdr *,
+			 size_t, int);
+
+/*
  * ar-security.c
  */
 extern int rxrpc_register_security(struct rxrpc_security *);
diff --git a/net/rxrpc/ar-recvmsg.c b/net/rxrpc/ar-recvmsg.c
new file mode 100644
index 0000000..74e08cc
--- /dev/null
+++ b/net/rxrpc/ar-recvmsg.c
@@ -0,0 +1,362 @@
+/* RxRPC recvmsg() implementation
+ *
+ * Copyright (C) 2007 Red Hat, Inc. All Rights Reserved.
+ * Written by David Howells (dhowells@redhat.com)
+ *
+ * This program is free software; you can redistribute it and/or
+ * modify it under the terms of the GNU General Public License
+ * as published by the Free Software Foundation; either version
+ * 2 of the License, or (at your option) any later version.
+ */
+
+#include <linux/net.h>
+#include <linux/skbuff.h>
+#include <net/sock.h>
+#include <net/af_rxrpc.h>
+#include "ar-internal.h"
+
+/*
+ * remove a call's user ID from the socket tree to make the user ID available
+ * again and so that it won't be seen again in association with that call
+ */
+static void rxrpc_remove_user_ID(struct rxrpc_sock *rx, struct rxrpc_call *call)
+{
+	_debug("RELEASE CALL %d", call->debug_id);
+
+	if (test_bit(RXRPC_CALL_HAS_USERID, &call->flags)) {
+		write_lock_bh(&rx->call_lock);
+		rb_erase(&call->sock_node, &call->socket->calls);
+		clear_bit(RXRPC_CALL_HAS_USERID, &call->flags);
+		write_unlock_bh(&rx->call_lock);
+	}
+
+	read_lock_bh(&call->state_lock);
+	if (!test_bit(RXRPC_CALL_RELEASED, &call->flags) &&
+	    !test_and_set_bit(RXRPC_CALL_RELEASE, &call->events))
+		schedule_work(&call->processor);
+	read_unlock_bh(&call->state_lock);
+}
+
+/*
+ * receive a message from an RxRPC socket
+ */
+int rxrpc_recvmsg(struct kiocb *iocb, struct socket *sock,
+		  struct msghdr *msg, size_t len, int flags)
+{
+	struct rxrpc_skb_priv *sp;
+	struct rxrpc_call *call, *continue_call = NULL;
+	struct rxrpc_sock *rx = rxrpc_sk(sock->sk);
+	struct sk_buff *skb;
+	long timeo;
+	int copy, ret, ullen, offset, copied = 0;
+	u32 abort_code;
+
+	DEFINE_WAIT(wait);
+
+	_enter(",,,%zu,%d", len, flags);
+
+	if (flags & (MSG_OOB | MSG_TRUNC))
+		return -EOPNOTSUPP;
+
+	ullen = msg->msg_flags & MSG_CMSG_COMPAT ? 4 : sizeof(unsigned long);
+
+	timeo = sock_rcvtimeo(&rx->sk, flags & MSG_DONTWAIT);
+	msg->msg_flags |= MSG_MORE;
+
+	lock_sock(&rx->sk);
+
+	for (;;) {
+		/* return immediately if a client socket has no outstanding
+		 * calls */
+		if (RB_EMPTY_ROOT(&rx->calls) &&
+		    rx->sk.sk_state != RXRPC_SERVER_LISTENING) {
+			release_sock(&rx->sk);
+			return -ENODATA;
+		}
+
+		/* get the next message on the Rx queue */
+		skb = skb_peek(&rx->sk.sk_receive_queue);
+		if (!skb) {
+			/* wait for a message to turn up */
+			release_sock(&rx->sk);
+
+			if (msg->msg_flags & MSG_PEEK && copied) {
+				if (continue_call)
+					rxrpc_put_call(continue_call);
+				_leave(" = %d [peekwait]", copied);
+				return copied;
+			}
+
+			prepare_to_wait_exclusive(rx->sk.sk_sleep, &wait,
+						  TASK_INTERRUPTIBLE);
+			ret = sock_error(&rx->sk);
+			if (ret)
+				goto wait_error;
+
+			if (skb_queue_empty(&rx->sk.sk_receive_queue)) {
+				if (signal_pending(current))
+					goto wait_interrupted;
+				timeo = schedule_timeout(timeo);
+			}
+			finish_wait(rx->sk.sk_sleep, &wait);
+			lock_sock(&rx->sk);
+			continue;
+		}
+
+	peek_next_packet:
+		sp = rxrpc_skb(skb);
+		call = sp->call;
+		ASSERT(call != NULL);
+
+		_debug("next pkt %s", rxrpc_pkts[sp->hdr.type]);
+
+		/* make sure we wait for the state to be updated in this call */
+		spin_lock_bh(&call->lock);
+		spin_unlock_bh(&call->lock);
+
+		if (test_bit(RXRPC_CALL_RELEASED, &call->flags)) {
+			_debug("packet from released call");
+			if (skb_dequeue(&rx->sk.sk_receive_queue) != skb)
+				BUG();
+			rxrpc_free_skb(skb);
+			continue;
+		}
+
+		/* determine whether to continue last data receive */
+		if (continue_call) {
+			_debug("maybe cont");
+			if (call != continue_call ||
+			    skb->mark != RXRPC_SKB_MARK_DATA) {
+				release_sock(&rx->sk);
+				rxrpc_put_call(continue_call);
+				_leave(" = %d [noncont]", copied);
+				return copied;
+			}
+		}
+
+		rxrpc_get_call(call);
+
+		/* copy the peer address and timestamp */
+		if (!continue_call) {
+			if (msg->msg_name && msg->msg_namelen > 0)
+				memcpy(&msg->msg_name, &call->conn->trans->peer->srx,
+				       sizeof(call->conn->trans->peer->srx));
+			sock_recv_timestamp(msg, &rx->sk, skb);
+		}
+
+		/* receive the message */
+		if (skb->mark != RXRPC_SKB_MARK_DATA)
+			goto receive_non_data_message;
+
+		_debug("recvmsg DATA #%u { %d, %d }",
+		       ntohl(sp->hdr.seq), skb->len, sp->offset);
+
+		if (!continue_call) {
+			/* only set the control data once per recvmsg() */
+			ret = put_cmsg(msg, SOL_RXRPC, RXRPC_USER_CALL_ID,
+				       ullen, &call->user_call_ID);
+			if (ret < 0)
+				goto copy_error;
+			ASSERT(test_bit(RXRPC_CALL_HAS_USERID, &call->flags));
+		}
+
+		ASSERTCMP(ntohl(sp->hdr.seq), >=, call->rx_data_recv);
+		ASSERTCMP(ntohl(sp->hdr.seq), <=, call->rx_data_recv + 1);
+		call->rx_data_recv = ntohl(sp->hdr.seq);
+
+		ASSERTCMP(ntohl(sp->hdr.seq), >, call->rx_data_eaten);
+
+		offset = sp->offset;
+		copy = skb->len - offset;
+		if (copy > len - copied)
+			copy = len - copied;
+
+		if (skb->ip_summed == CHECKSUM_UNNECESSARY) {
+			ret = skb_copy_datagram_iovec(skb, offset,
+						      msg->msg_iov, copy);
+		} else {
+			ret = skb_copy_and_csum_datagram_iovec(skb, offset,
+							       msg->msg_iov);
+			if (ret == -EINVAL)
+				goto csum_copy_error;
+		}
+
+		if (ret < 0)
+			goto copy_error;
+
+		/* handle piecemeal consumption of data packets */
+		_debug("copied %d+%d", copy, copied);
+
+		offset += copy;
+		copied += copy;
+
+		if (!(flags & MSG_PEEK))
+			sp->offset = offset;
+
+		if (sp->offset < skb->len) {
+			_debug("buffer full");
+			ASSERTCMP(copied, ==, len);
+			break;
+		}
+
+		/* we transferred the whole data packet */
+		if (sp->hdr.flags & RXRPC_LAST_PACKET) {
+			_debug("last");
+			if (call->conn->out_clientflag) {
+				 /* last byte of reply received */
+				ret = copied;
+				goto terminal_message;
+			}
+
+			/* last bit of request received */
+			if (!(flags & MSG_PEEK)) {
+				_debug("eat packet");
+				if (skb_dequeue(&rx->sk.sk_receive_queue) !=
+				    skb)
+					BUG();
+				rxrpc_free_skb(skb);
+			}
+			msg->msg_flags &= ~MSG_MORE;
+			break;
+		}
+
+		/* move on to the next data message */
+		_debug("next");
+		if (!continue_call)
+			continue_call = sp->call;
+		else
+			rxrpc_put_call(call);
+		call = NULL;
+
+		if (flags & MSG_PEEK) {
+			_debug("peek next");
+			skb = skb->next;
+			if (skb == (struct sk_buff *) &rx->sk.sk_receive_queue)
+				break;
+			goto peek_next_packet;
+		}
+
+		_debug("eat packet");
+		if (skb_dequeue(&rx->sk.sk_receive_queue) != skb)
+			BUG();
+		rxrpc_free_skb(skb);
+	}
+
+	/* end of non-terminal data packet reception for the moment */
+	_debug("end rcv data");
+out:
+	release_sock(&rx->sk);
+	if (call)
+		rxrpc_put_call(call);
+	if (continue_call)
+		rxrpc_put_call(continue_call);
+	_leave(" = %d [data]", copied);
+	return copied;
+
+	/* handle non-DATA messages such as aborts, incoming connections and
+	 * final ACKs */
+receive_non_data_message:
+	_debug("non-data");
+
+	if (skb->mark == RXRPC_SKB_MARK_NEW_CALL) {
+		_debug("RECV NEW CALL");
+		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_NEW_CALL, 0, &abort_code);
+		if (ret < 0)
+			goto copy_error;
+		if (!(flags & MSG_PEEK)) {
+			if (skb_dequeue(&rx->sk.sk_receive_queue) != skb)
+				BUG();
+			rxrpc_free_skb(skb);
+		}
+		goto out;
+	}
+
+	ret = put_cmsg(msg, SOL_RXRPC, RXRPC_USER_CALL_ID,
+		       ullen, &call->user_call_ID);
+	if (ret < 0)
+		goto copy_error;
+	ASSERT(test_bit(RXRPC_CALL_HAS_USERID, &call->flags));
+
+	switch (skb->mark) {
+	case RXRPC_SKB_MARK_DATA:
+		BUG();
+	case RXRPC_SKB_MARK_FINAL_ACK:
+		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_ACK, 0, &abort_code);
+		break;
+	case RXRPC_SKB_MARK_BUSY:
+		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_BUSY, 0, &abort_code);
+		break;
+	case RXRPC_SKB_MARK_REMOTE_ABORT:
+		abort_code = call->abort_code;
+		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_ABORT, 4, &abort_code);
+		break;
+	case RXRPC_SKB_MARK_NET_ERROR:
+		_debug("RECV NET ERROR %d", sp->error);
+		abort_code = sp->error;
+		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_NET_ERROR, 4, &abort_code);
+		break;
+	case RXRPC_SKB_MARK_LOCAL_ERROR:
+		_debug("RECV LOCAL ERROR %d", sp->error);
+		abort_code = sp->error;
+		ret = put_cmsg(msg, SOL_RXRPC, RXRPC_LOCAL_ERROR, 4,
+			       &abort_code);
+		break;
+	default:
+		BUG();
+		break;
+	}
+
+	if (ret < 0)
+		goto copy_error;
+
+terminal_message:
+	_debug("terminal");
+	msg->msg_flags &= ~MSG_MORE;
+	msg->msg_flags |= MSG_EOR;
+
+	if (!(flags & MSG_PEEK)) {
+		_net("free terminal skb %p", skb);
+		if (skb_dequeue(&rx->sk.sk_receive_queue) != skb)
+			BUG();
+		rxrpc_free_skb(skb);
+		rxrpc_remove_user_ID(rx, call);
+	}
+
+	release_sock(&rx->sk);
+	rxrpc_put_call(call);
+	if (continue_call)
+		rxrpc_put_call(continue_call);
+	_leave(" = %d", ret);
+	return ret;
+
+copy_error:
+	_debug("copy error");
+	release_sock(&rx->sk);
+	rxrpc_put_call(call);
+	if (continue_call)
+		rxrpc_put_call(continue_call);
+	_leave(" = %d", ret);
+	return ret;
+
+csum_copy_error:
+	_debug("csum error");
+	release_sock(&rx->sk);
+	if (continue_call)
+		rxrpc_put_call(continue_call);
+	rxrpc_kill_skb(skb);
+	skb_kill_datagram(&rx->sk, skb, flags);
+	rxrpc_put_call(call);
+	return -EAGAIN;
+
+wait_interrupted:
+	ret = sock_intr_errno(timeo);
+wait_error:
+	finish_wait(rx->sk.sk_sleep, &wait);
+	if (continue_call)
+		rxrpc_put_call(continue_call);
+	if (copied)
+		copied = ret;
+	_leave(" = %d [waitfail %d]", copied, ret);
+	return copied;
+
+}

^ permalink raw reply related	[flat|nested] 14+ messages in thread

end of thread, other threads:[~2007-03-22  0:57 UTC | newest]

Thread overview: 14+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2007-03-20 21:14 [PATCH 0/5] [RFC] AF_RXRPC socket family implementation [try #3] Alan Cox
2007-03-20 19:58 ` David Howells
2007-03-20 19:59   ` [PATCH 1/5] AF_RXRPC: Add blkcipher accessors for using kernel data directly " David Howells
2007-03-20 19:59   ` [PATCH 2/5] AF_RXRPC: Move generic skbuff stuff from XFRM code to generic code " David Howells
2007-03-20 19:59   ` [PATCH 3/5] AF_RXRPC: Make it possible to merely try to cancel timers and delayed work " David Howells
2007-03-20 19:59   ` [PATCH 4/5] AF_RXRPC: Key facility changes for AF_RXRPC " David Howells
2007-03-20 20:22   ` [PATCH 0/5] [RFC] AF_RXRPC socket family implementation " David Howells
2007-03-20 21:12   ` Alan Cox
2007-03-20 21:36   ` Alan Cox
2007-03-20 22:10   ` David Howells
2007-03-20 22:26   ` David Howells
2007-03-21 13:26   ` David Howells
2007-03-21 18:10   ` David Howells
2007-03-21 21:32   ` David Howells

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).