LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
From: Steffen Maier <maier@linux.ibm.com>
To: Greg Kroah-Hartman <gregkh@linuxfoundation.org>
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org,
	Steven Rostedt <rostedt@goodmis.org>,
	Ingo Molnar <mingo@redhat.com>, Jens Axboe <axboe@kernel.dk>,
	Li Zefan <lizf@cn.fujitsu.com>, Li Zefan <lizefan@huawei.com>
Subject: Re: [PATCH 2/2] tracing/events: block: dev_t via driver core for plug and unplug events
Date: Mon, 16 Apr 2018 18:33:27 +0200	[thread overview]
Message-ID: <59186bf6-abf1-87b0-914d-eed1b40ef4a8@linux.ibm.com> (raw)
In-Reply-To: <20180415083154.GA12254@kroah.com>

Hi Greg,

On 04/15/2018 10:31 AM, Greg Kroah-Hartman wrote:
> On Fri, Apr 13, 2018 at 03:07:18PM +0200, Steffen Maier wrote:
>> Complements v2.6.31 commit 55782138e47d ("tracing/events: convert block
>> trace points to TRACE_EVENT()") to be equivalent to traditional blktrace
>> output. Also this allows event filtering to not always get all (un)plug
>> events.
>>
>> NB: The NULL pointer check for q->kobj.parent is certainly racy and
>> I don't have enough experience if it's good enough for a trace event.
>> The change did work for my cases (block device read/write I/O on
>> zfcp-attached SCSI disks and dm-mpath on top).
>>
>> While I haven't seen any prior art using driver core (parent) relations
>> for trace events, there are other cases using this when no direct pointer
>> exists between objects, such as:
>>   #define to_scsi_target(d)	container_of(d, struct scsi_target, dev)
>>   static inline struct scsi_target *scsi_target(struct scsi_device *sdev)
>>   {
>> 	return to_scsi_target(sdev->sdev_gendev.parent);
>>   }
> 
> That is because you "know" the parent of a target device is a
> scsi_target.

true

>> This is the object model we make use of here:
>>
>> struct gendisk {
>>          struct hd_struct {
>>                  struct device {      /*container_of*/
>>                          struct kobject kobj; <--+
>>                          dev_t  devt; /*deref*/  |
>>                  } __dev;                        |
>>          } part0;                                |
>>          struct request_queue *queue; ..+        |
>> }                                      :        |
>>                                         :        |
>> struct request_queue {  <..............+        |
>>          /* queue kobject */                     |
>>          struct kobject {                        |
>>                  struct kobject *parent; --------+
> 
> Are you sure about this?

I double checked it with crash on a running system chasing pointers and 
looking at structure debug symbols.
But of course I cannot guarantee it's always been like this and will be.

>>          } kobj;
>> }
>>

>> The difference to blktrace parsed output is that block events don't use the
>> partition's minor number but the containing block device's minor number:
> 
> Why do you want the block device's minor number here?  What is wrong
> with the partition's minor number?  I would think you want that instead.

No change introduced with my patch. I just describe state of the art 
since the mentioned 
https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=55782138e47d.
It (or even its predecessor) used request_queue as trace function 
argument (plus mostly either request or bio). So that's the currently 
available context for these events. My change is consistent with that.
But then again, it's not much of a problem as we do have the remap event 
which shows the mapping from partition to blockdev.

blktrace, hooking with callbacks on the block trace events, has its own 
context information [struct blk_trace] and can get to e.g. the dev_t 
with its own real pointers without using driver core relations. But I 
had the impression that's only wired if one uses the blktrace IOCTL or 
the blk tracer [do_blk_trace_setup()], not for "pure" block events.
> static void blk_add_trace_plug(void *ignore, struct request_queue *q)
> {
> 	struct blk_trace *bt = q->blk_trace;
                                ^^^^^^^^^^^^
> 
> 	if (bt)
> 		__blk_add_trace(bt, 0, 0, 0, 0, BLK_TA_PLUG, 0, 0, NULL, NULL);
> }
> 
> static void blk_add_trace_unplug(void *ignore, struct request_queue *q,
> 				    unsigned int depth, bool explicit)
> {
> 	struct blk_trace *bt = q->blk_trace;
                                ^^^^^^^^^^^^
> 
> 	if (bt) {
> 		__be64 rpdu = cpu_to_be64(depth);
> 		u32 what;
> 
> 		if (explicit)
> 			what = BLK_TA_UNPLUG_IO;
> 		else
> 			what = BLK_TA_UNPLUG_TIMER;
> 
> 		__blk_add_trace(bt, 0, 0, 0, 0, what, 0, sizeof(rpdu), &rpdu, NULL);
> 	}
> }

> struct blk_trace {
> 	int trace_state;
> 	struct rchan *rchan;
> 	unsigned long __percpu *sequence;
> 	unsigned char __percpu *msg_data;
> 	u16 act_mask;
> 	u64 start_lba;
> 	u64 end_lba;
> 	u32 pid;
> 	u32 dev;
             ^^^
> 	struct dentry *dir;
> 	struct dentry *dropped_file;
> 	struct dentry *msg_file;
> 	struct list_head running_list;
> 	atomic_t dropped;
> };



>> $ dd if=/dev/sdf1 count=1
>>
>> $ cat /sys/kernel/debug/tracing/trace
>> block_bio_remap: 8,80 R 2048 + 32 <- (8,81) 0
>> block_bio_queue: 8,80 R 2048 + 32 [dd]
>> block_getrq: 8,80 R 2048 + 32 [dd]
>> block_plug: 8,80 [dd]
>>              ^^^^
>> block_rq_insert: 8,80 R 16384 () 2048 + 32 [dd]
>> block_unplug: 8,80 [dd] 1 explicit
>>                ^^^^
>> block_rq_issue: 8,80 R 16384 () 2048 + 32 [dd]
>> block_rq_complete: 8,80 R () 2048 + 32 [0]

>> diff --git a/include/trace/events/block.h b/include/trace/events/block.h
>> index a13613d27cee..cffedc26e8a3 100644
>> --- a/include/trace/events/block.h
>> +++ b/include/trace/events/block.h
>> @@ -460,14 +460,18 @@ TRACE_EVENT(block_plug,
>>   	TP_ARGS(q),
>>   
>>   	TP_STRUCT__entry(
>> +		__field( dev_t,		dev			)
>>   		__array( char,		comm,	TASK_COMM_LEN	)
>>   	),
>>   
>>   	TP_fast_assign(
>> +		__entry->dev = q->kobj.parent ?
>> +		container_of(q->kobj.parent, struct device, kobj)->devt : 0;
> 
> That really really really scares me.  It feels very fragile and messing
> with parent pointers is ripe for things breaking in the future in odd
> and unexplainable ways.
> 
> And how can the parent be NULL?

I'm hoping for help by block layer experts.

I suppose block device unplug/removal could be a case. But I don't know 
the details how this works and if object release is protected while I/O 
is pending and new I/O is rejected beforehand. That might make my 
approach safe as it would not call the trace functions while the parent 
pointer changes.

> I don't know the block layer but this feels very wrong to me.  Are you
> sure there isn't some other way to get this info?

No, I'm not sure at all. But I'm no block layer expert either. This is 
just an idea I had which did work for my cases and I'm looking for 
confirmation or denial by the experts. So if it's not safe from a block 
layer point of view either, then I have to ditch it.

-- 
Mit freundlichen Grüßen / Kind regards
Steffen Maier

Linux on z Systems Development

IBM Deutschland Research & Development GmbH
Vorsitzende des Aufsichtsrats: Martina Koederitz
Geschaeftsfuehrung: Dirk Wittkopp
Sitz der Gesellschaft: Boeblingen
Registergericht: Amtsgericht Stuttgart, HRB 243294

  reply	other threads:[~2018-04-16 16:33 UTC|newest]

Thread overview: 11+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2018-04-13 13:07 [PATCH 0/2] tracing/events: block: bring more on a par with blktrace Steffen Maier
2018-04-13 13:07 ` [PATCH 1/2] tracing/events: block: track and print if unplug was explicit or schedule Steffen Maier
2018-04-13 14:16   ` Steven Rostedt
2018-04-13 13:07 ` [PATCH 2/2] tracing/events: block: dev_t via driver core for plug and unplug events Steffen Maier
2018-04-15  8:31   ` Greg Kroah-Hartman
2018-04-16 16:33     ` Steffen Maier [this message]
2018-04-19 19:24       ` Omar Sandoval
2018-04-19 20:56         ` Bart Van Assche
2018-04-24 14:49           ` Steffen Maier
2018-04-27 16:38             ` Bart Van Assche
2018-04-13 13:07 ` [RFC] tracing/events: block: also try to get dev_t via driver core for some events Steffen Maier

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=59186bf6-abf1-87b0-914d-eed1b40ef4a8@linux.ibm.com \
    --to=maier@linux.ibm.com \
    --cc=axboe@kernel.dk \
    --cc=gregkh@linuxfoundation.org \
    --cc=linux-block@vger.kernel.org \
    --cc=linux-kernel@vger.kernel.org \
    --cc=lizefan@huawei.com \
    --cc=lizf@cn.fujitsu.com \
    --cc=mingo@redhat.com \
    --cc=rostedt@goodmis.org \
    --subject='Re: [PATCH 2/2] tracing/events: block: dev_t via driver core for plug and unplug events' \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).