LKML Archive on lore.kernel.org
* [patch 0/6] mm: bdi: updates
@ 2008-01-29 15:49 Miklos Szeredi
  2008-01-29 15:49 ` [patch 1/6] mm: bdi: tweak task dirty penalty Miklos Szeredi
                   ` (6 more replies)
  0 siblings, 7 replies; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-29 15:49 UTC (permalink / raw)
  To: akpm; +Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm

This is a series from Peter Zijlstra, with various updates by me.  The
patchset mostly deals with exporting BDI attributes in sysfs.

Should be in a mergeable state, at least into -mm.

--


* [patch 1/6] mm: bdi: tweak task dirty penalty
  2008-01-29 15:49 [patch 0/6] mm: bdi: updates Miklos Szeredi
@ 2008-01-29 15:49 ` Miklos Szeredi
  2008-01-31  0:13   ` Andrew Morton
  2008-01-29 15:49 ` [patch 2/6] mm: bdi: export BDI attributes in sysfs Miklos Szeredi
                   ` (5 subsequent siblings)
  6 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-29 15:49 UTC (permalink / raw)
  To: akpm; +Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm

[-- Attachment #1: bdi-task-dirty.patch --]
[-- Type: text/plain, Size: 1341 bytes --]

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Penalizing heavy dirtiers with one eighth of the total dirty limit might be
rather excessive on large-memory machines. Use sqrt to scale it sub-linearly.
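
A rough userspace sketch of how the two penalties compare follows. It is not kernel code. `isqrt()` is a hypothetical stand-in for the kernel's `int_sqrt()`, limits are counted in pages, and p_t is passed as the numerator/denominator pair that `task_dirties_fraction()` provides. The two penalty bases are equal at 4096 pages, because dirty/8 = 8*sqrt(dirty) exactly when sqrt(dirty) = 64. Above that size, the square-root penalty is the smaller one.

```c
#include <assert.h>

/*
 * Integer square root by the digit-by-digit method. Hypothetical
 * stand-in for the kernel's int_sqrt(); returns floor(sqrt(x)).
 */
static unsigned long isqrt(unsigned long x)
{
	unsigned long r = 0;
	unsigned long bit = 1UL << (sizeof(unsigned long) * 8 - 2);

	while (bit > x)		/* largest power of four <= x (0 if x == 0) */
		bit >>= 2;
	while (bit) {
		if (x >= r + bit) {
			x -= r + bit;
			r = (r >> 1) + bit;
		} else {
			r >>= 1;
		}
		bit >>= 2;
	}
	return r;
}

/* Old penalty base: one eighth of the dirty limit (dirty >> 3). */
static unsigned long penalty_old(unsigned long dirty)
{
	return dirty >> 3;
}

/* New penalty base: 8 * sqrt(dirty), as in the patch. */
static unsigned long penalty_new(unsigned long dirty)
{
	return 8 * isqrt(dirty);
}

/* Task limit: dirty - base * p_t, where p_t = numerator / denominator. */
static unsigned long task_limit(unsigned long dirty, unsigned long base,
				unsigned long numerator,
				unsigned long denominator)
{
	assert(denominator != 0 && numerator <= denominator);
	return dirty - (unsigned long)((unsigned long long)base * numerator /
				       denominator);
}
```

Assuming 4 KiB pages, take a limit of 262144 pages (1 GiB). The old penalty base is 32768 pages and the new one is 4096. A task responsible for half of the recent dirtying therefore loses 2048 pages of its limit instead of 16384.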

Update the comment while we're there.

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/mm/page-writeback.c
===================================================================
--- linux.orig/mm/page-writeback.c	2008-01-17 19:00:56.000000000 +0100
+++ linux/mm/page-writeback.c	2008-01-18 13:07:16.000000000 +0100
@@ -219,17 +219,21 @@ static inline void task_dirties_fraction
 }
 
 /*
- * scale the dirty limit
+ * Task specific dirty limit:
  *
- * task specific dirty limit:
+ *   dirty -= 8 * sqrt(dirty) * p_{t}
  *
- *   dirty -= (dirty/8) * p_{t}
+ * Penalize tasks that dirty a lot of pages by lowering their dirty limit. This
+ * prevents infrequent dirtiers from getting stuck behind other tasks' dirty
+ * pages.
+ *
+ * Use a sub-linear function to scale the penalty; we only need a little room.
  */
 static void task_dirty_limit(struct task_struct *tsk, long *pdirty)
 {
 	long numerator, denominator;
 	long dirty = *pdirty;
-	u64 inv = dirty >> 3;
+	u64 inv = 8*int_sqrt(dirty);
 
 	task_dirties_fraction(tsk, &numerator, &denominator);
 	inv *= numerator;

--


* [patch 2/6] mm: bdi: export BDI attributes in sysfs
  2008-01-29 15:49 [patch 0/6] mm: bdi: updates Miklos Szeredi
  2008-01-29 15:49 ` [patch 1/6] mm: bdi: tweak task dirty penalty Miklos Szeredi
@ 2008-01-29 15:49 ` Miklos Szeredi
  2008-01-29 17:39   ` Greg KH
                     ` (2 more replies)
  2008-01-29 15:49 ` [patch 3/6] mm: bdi: expose the BDI object in sysfs for NFS Miklos Szeredi
                   ` (4 subsequent siblings)
  6 siblings, 3 replies; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-29 15:49 UTC (permalink / raw)
  To: akpm
  Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm, Kay Sievers,
	Greg KH, Trond Myklebust

[-- Attachment #1: bdi-sysfs.patch --]
[-- Type: text/plain, Size: 9352 bytes --]

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info
object.  This allows us to see and set the various BDI specific
variables.

In particular this properly exposes the read-ahead window for all
relevant users and /sys/block/<block>/queue/read_ahead_kb should be
deprecated.
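
The kernel keeps the read-ahead window in pages, and the attribute converts it in both directions: `read_ahead_kb_store()` shifts kilobytes down to pages and the `K()` macro shifts pages back up. The sketch below models that round trip. It assumes 4 KiB pages (PAGE_SHIFT of 12), and the helper names are hypothetical. A write that is not a multiple of the page size reads back rounded down.

```c
#include <assert.h>

/* Assumption: 4 KiB pages, i.e. PAGE_SHIFT == 12. */
#define SKETCH_PAGE_SHIFT 12

/* Like read_ahead_kb_store(): kilobytes to pages, rounding down. */
static unsigned long kb_to_pages(unsigned long kb)
{
	return kb >> (SKETCH_PAGE_SHIFT - 10);
}

/* Like the K() macro used by the show function: pages to kilobytes. */
static unsigned long pages_to_kb(unsigned long pages)
{
	return pages << (SKETCH_PAGE_SHIFT - 10);
}

/* What reading read_ahead_kb returns after writing @kb to it. */
static unsigned long read_back_kb(unsigned long kb)
{
	unsigned long kb_read = pages_to_kb(kb_to_pages(kb));

	assert(kb_read <= kb);	/* the round trip never rounds up */
	return kb_read;
}
```

With 4 KiB pages, writing 130 stores 32 pages, and reading the file back gives 128.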

With patient help from Kay Sievers and Greg KH.

[mszeredi@suse.cz]

 - split off NFS and FUSE changes into separate patches
 - document new sysfs attributes under Documentation/ABI
 - do bdi_class_init as a core_initcall, otherwise the "default" BDI
   won't be initialized
 - remove the bdi_init_fmt macro; it's not used very much

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Kay Sievers <kay.sievers@vrfy.org>
CC: Greg KH <greg@kroah.com>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/block/genhd.c
===================================================================
--- linux.orig/block/genhd.c	2008-01-29 13:02:41.000000000 +0100
+++ linux/block/genhd.c	2008-01-29 13:02:46.000000000 +0100
@@ -183,6 +183,8 @@ void add_disk(struct gendisk *disk)
 			    disk->minors, NULL, exact_match, exact_lock, disk);
 	register_disk(disk);
 	blk_register_queue(disk);
+	bdi_register(&disk->queue->backing_dev_info, NULL,
+		"blk-%s", disk->disk_name);
 }
 
 EXPORT_SYMBOL(add_disk);
@@ -191,6 +193,7 @@ EXPORT_SYMBOL(del_gendisk);	/* in partit
 void unlink_gendisk(struct gendisk *disk)
 {
 	blk_unregister_queue(disk);
+	bdi_unregister(&disk->queue->backing_dev_info);
 	blk_unregister_region(MKDEV(disk->major, disk->first_minor),
 			      disk->minors);
 }
Index: linux/include/linux/backing-dev.h
===================================================================
--- linux.orig/include/linux/backing-dev.h	2008-01-29 13:02:41.000000000 +0100
+++ linux/include/linux/backing-dev.h	2008-01-29 13:02:46.000000000 +0100
@@ -11,6 +11,8 @@
 #include <linux/percpu_counter.h>
 #include <linux/log2.h>
 #include <linux/proportions.h>
+#include <linux/kernel.h>
+#include <linux/device.h>
 #include <asm/atomic.h>
 
 struct page;
@@ -48,11 +50,17 @@ struct backing_dev_info {
 
 	struct prop_local_percpu completions;
 	int dirty_exceeded;
+
+	struct device *dev;
 };
 
 int bdi_init(struct backing_dev_info *bdi);
 void bdi_destroy(struct backing_dev_info *bdi);
 
+int bdi_register(struct backing_dev_info *bdi, struct device *parent,
+		const char *fmt, ...);
+void bdi_unregister(struct backing_dev_info *bdi);
+
 static inline void __add_bdi_stat(struct backing_dev_info *bdi,
 		enum bdi_stat_item item, s64 amount)
 {
Index: linux/include/linux/writeback.h
===================================================================
--- linux.orig/include/linux/writeback.h	2008-01-29 13:02:41.000000000 +0100
+++ linux/include/linux/writeback.h	2008-01-29 13:02:46.000000000 +0100
@@ -113,6 +113,9 @@ struct file;
 int dirty_writeback_centisecs_handler(struct ctl_table *, int, struct file *,
 				      void __user *, size_t *, loff_t *);
 
+void get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty,
+		 struct backing_dev_info *bdi);
+
 void page_writeback_init(void);
 void balance_dirty_pages_ratelimited_nr(struct address_space *mapping,
 					unsigned long nr_pages_dirtied);
Index: linux/mm/backing-dev.c
===================================================================
--- linux.orig/mm/backing-dev.c	2008-01-29 13:02:41.000000000 +0100
+++ linux/mm/backing-dev.c	2008-01-29 13:03:23.000000000 +0100
@@ -4,12 +4,118 @@
 #include <linux/fs.h>
 #include <linux/sched.h>
 #include <linux/module.h>
+#include <linux/writeback.h>
+#include <linux/device.h>
+
+
+static struct class *bdi_class;
+
+static ssize_t read_ahead_kb_store(struct device *dev,
+				  struct device_attribute *attr,
+				  const char *buf, size_t count)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+	char *end;
+
+	bdi->ra_pages = simple_strtoul(buf, &end, 10) >> (PAGE_SHIFT - 10);
+
+	return end - buf;
+}
+
+#define K(pages) ((pages) << (PAGE_SHIFT - 10))
+
+#define BDI_SHOW(name, expr)						\
+static ssize_t name##_show(struct device *dev,				\
+			   struct device_attribute *attr, char *page)	\
+{									\
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);		\
+									\
+	return snprintf(page, PAGE_SIZE-1, "%lld\n", (long long)expr);	\
+}
+
+BDI_SHOW(read_ahead_kb, K(bdi->ra_pages))
+
+BDI_SHOW(reclaimable_kb, K(bdi_stat(bdi, BDI_RECLAIMABLE)))
+BDI_SHOW(writeback_kb, K(bdi_stat(bdi, BDI_WRITEBACK)))
+
+static inline unsigned long get_dirty(struct backing_dev_info *bdi, int i)
+{
+	unsigned long thresh[3];
+
+	get_dirty_limits(&thresh[0], &thresh[1], &thresh[2], bdi);
+
+	return thresh[i];
+}
+
+BDI_SHOW(dirty_kb, K(get_dirty(bdi, 1)))
+BDI_SHOW(bdi_dirty_kb, K(get_dirty(bdi, 2)))
+
+#define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store)
+
+static struct device_attribute bdi_dev_attrs[] = {
+	__ATTR_RW(read_ahead_kb),
+	__ATTR_RO(reclaimable_kb),
+	__ATTR_RO(writeback_kb),
+	__ATTR_RO(dirty_kb),
+	__ATTR_RO(bdi_dirty_kb),
+	__ATTR_NULL,
+};
+
+static __init int bdi_class_init(void)
+{
+	bdi_class = class_create(THIS_MODULE, "bdi");
+	bdi_class->dev_attrs = bdi_dev_attrs;
+	return 0;
+}
+
+core_initcall(bdi_class_init);
+
+int bdi_register(struct backing_dev_info *bdi, struct device *parent,
+		const char *fmt, ...)
+{
+	char *name;
+	va_list args;
+	int ret = 0;
+	struct device *dev;
+
+	va_start(args, fmt);
+	name = kvasprintf(GFP_KERNEL, fmt, args);
+	va_end(args);
+
+	if (!name)
+		return -ENOMEM;
+
+	dev = device_create(bdi_class, parent, MKDEV(0, 0), name);
+	if (IS_ERR(dev)) {
+		ret = PTR_ERR(dev);
+		goto exit;
+	}
+
+	bdi->dev = dev;
+	dev_set_drvdata(bdi->dev, bdi);
+
+exit:
+	kfree(name);
+	return ret;
+}
+EXPORT_SYMBOL(bdi_register);
+
+void bdi_unregister(struct backing_dev_info *bdi)
+{
+	if (bdi->dev) {
+		device_unregister(bdi->dev);
+		bdi->dev = NULL;
+	}
+}
+EXPORT_SYMBOL(bdi_unregister);
 
 int bdi_init(struct backing_dev_info *bdi)
 {
 	int i;
 	int err;
 
+	bdi->dev = NULL;
+
 	for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
 		err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0);
 		if (err)
@@ -33,6 +139,8 @@ void bdi_destroy(struct backing_dev_info
 {
 	int i;
 
+	bdi_unregister(bdi);
+
 	for (i = 0; i < NR_BDI_STAT_ITEMS; i++)
 		percpu_counter_destroy(&bdi->bdi_stat[i]);
 
Index: linux/mm/page-writeback.c
===================================================================
--- linux.orig/mm/page-writeback.c	2008-01-29 13:02:41.000000000 +0100
+++ linux/mm/page-writeback.c	2008-01-29 13:02:46.000000000 +0100
@@ -304,7 +304,7 @@ static unsigned long determine_dirtyable
 	return x + 1;	/* Ensure that we never return 0 */
 }
 
-static void
+void
 get_dirty_limits(long *pbackground, long *pdirty, long *pbdi_dirty,
 		 struct backing_dev_info *bdi)
 {
Index: linux/lib/percpu_counter.c
===================================================================
--- linux.orig/lib/percpu_counter.c	2008-01-29 13:02:41.000000000 +0100
+++ linux/lib/percpu_counter.c	2008-01-29 13:02:46.000000000 +0100
@@ -102,6 +102,7 @@ void percpu_counter_destroy(struct percp
 		return;
 
 	free_percpu(fbc->counters);
+	fbc->counters = NULL;
 #ifdef CONFIG_HOTPLUG_CPU
 	mutex_lock(&percpu_counters_lock);
 	list_del(&fbc->list);
Index: linux/mm/readahead.c
===================================================================
--- linux.orig/mm/readahead.c	2008-01-29 13:02:41.000000000 +0100
+++ linux/mm/readahead.c	2008-01-29 13:02:46.000000000 +0100
@@ -235,7 +235,13 @@ unsigned long max_sane_readahead(unsigne
 
 static int __init readahead_init(void)
 {
-	return bdi_init(&default_backing_dev_info);
+	int err;
+
+	err = bdi_init(&default_backing_dev_info);
+	if (!err)
+		bdi_register(&default_backing_dev_info, NULL, "default");
+
+	return err;
 }
 subsys_initcall(readahead_init);
 
Index: linux/Documentation/ABI/testing/sysfs-class-bdi
===================================================================
--- /dev/null	1970-01-01 00:00:00.000000000 +0000
+++ linux/Documentation/ABI/testing/sysfs-class-bdi	2008-01-29 13:02:46.000000000 +0100
@@ -0,0 +1,50 @@
+What:		/sys/class/bdi/<bdi>/
+Date:		January 2008
+Contact:	Peter Zijlstra <a.p.zijlstra@chello.nl>
+Description:
+
+Provide a place in sysfs for the backing_dev_info object.
+This allows us to see and set the various BDI specific variables.
+
The <bdi> identifier can take the following forms:
+
+blk-NAME
+
	Block devices; NAME is 'sda', 'loop0', etc.
+
+FSTYPE-MAJOR:MINOR
+
	Filesystems not backed by a block device that provide their own
	BDI, such as NFS and FUSE.  MAJOR:MINOR is the value of st_dev
	for files on this filesystem.

default

	The default backing device, used by filesystems not backed by a
	block device that do not provide their own BDI.
+
+Files under /sys/class/bdi/<bdi>/
+---------------------------------
+
+read_ahead_kb (read-write)
+
+	Size of the read-ahead window in kilobytes
+
+reclaimable_kb (read-only)
+
+	Reclaimable (dirty or unstable) memory destined for writeback
+	to this device
+
+writeback_kb (read-only)
+
+	Memory currently under writeback to this device
+
+dirty_kb (read-only)
+
+	Global threshold for reclaimable + writeback memory
+
+bdi_dirty_kb (read-only)
+
+	Current threshold on this BDI for reclaimable + writeback
+	memory
+

--


* [patch 3/6] mm: bdi: expose the BDI object in sysfs for NFS
  2008-01-29 15:49 [patch 0/6] mm: bdi: updates Miklos Szeredi
  2008-01-29 15:49 ` [patch 1/6] mm: bdi: tweak task dirty penalty Miklos Szeredi
  2008-01-29 15:49 ` [patch 2/6] mm: bdi: export BDI attributes in sysfs Miklos Szeredi
@ 2008-01-29 15:49 ` Miklos Szeredi
  2008-01-29 15:49 ` [patch 4/6] mm: bdi: expose the BDI object in sysfs for FUSE Miklos Szeredi
                   ` (3 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-29 15:49 UTC (permalink / raw)
  To: akpm; +Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm, Trond Myklebust

[-- Attachment #1: bdi-sysfs-nfs.patch --]
[-- Type: text/plain, Size: 2432 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Register NFS' backing_dev_info under sysfs with the name
"nfs-MAJOR:MINOR".

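The name is built from the superblock's device number, so userspace can find the /sys/class/bdi entry for a mount by calling stat() on any file in it. The sketch below assumes glibc's <sys/sysmacros.h> for major(), minor() and makedev(). The helper names are hypothetical. The same scheme covers FUSE, whose entries are named "fuse-MAJOR:MINOR".

```c
#include <assert.h>
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>	/* major(), minor(), makedev() on glibc */

/*
 * Build the expected /sys/class/bdi entry name, e.g. "nfs-0:21", from a
 * filesystem type and an st_dev value. Returns the name length, or -1 if
 * the buffer is too small.
 */
static int bdi_name_from_dev(const char *fstype, dev_t dev,
			     char *buf, size_t len)
{
	int n;

	assert(fstype != NULL && buf != NULL);
	n = snprintf(buf, len, "%s-%u:%u", fstype, major(dev), minor(dev));
	return (n < 0 || (size_t)n >= len) ? -1 : n;
}

/* Same, but taking any path on the mounted filesystem. */
static int bdi_name_from_path(const char *fstype, const char *path,
			      char *buf, size_t len)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return bdi_name_from_dev(fstype, st.st_dev, buf, len);
}
```
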
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
CC: Trond Myklebust <trond.myklebust@fys.uio.no>
---

Index: linux/fs/nfs/super.c
===================================================================
--- linux.orig/fs/nfs/super.c	2008-01-29 10:26:47.000000000 +0100
+++ linux/fs/nfs/super.c	2008-01-29 12:12:38.000000000 +0100
@@ -1475,6 +1475,12 @@ static int nfs_compare_super(struct supe
 	return nfs_compare_mount_options(sb, server, mntflags);
 }
 
+static int nfs_bdi_register(struct nfs_server *server)
+{
+	return bdi_register(&server->backing_dev_info, NULL, "nfs-%u:%u",
+			    MAJOR(server->s_dev), MINOR(server->s_dev));
+}
+
 static int nfs_get_sb(struct file_system_type *fs_type,
 	int flags, const char *dev_name, void *raw_data, struct vfsmount *mnt)
 {
@@ -1515,6 +1521,10 @@ static int nfs_get_sb(struct file_system
 	if (s->s_fs_info != server) {
 		nfs_free_server(server);
 		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
 	}
 
 	if (!s->s_root) {
@@ -1555,6 +1565,7 @@ static void nfs_kill_super(struct super_
 {
 	struct nfs_server *server = NFS_SB(s);
 
+	bdi_unregister(&server->backing_dev_info);
 	kill_anon_super(s);
 	nfs_free_server(server);
 }
@@ -1599,6 +1610,10 @@ static int nfs_xdev_get_sb(struct file_s
 	if (s->s_fs_info != server) {
 		nfs_free_server(server);
 		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
 	}
 
 	if (!s->s_root) {
@@ -1889,6 +1904,10 @@ static int nfs4_get_sb(struct file_syste
 	if (s->s_fs_info != server) {
 		nfs_free_server(server);
 		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
 	}
 
 	if (!s->s_root) {
@@ -1974,6 +1993,10 @@ static int nfs4_xdev_get_sb(struct file_
 	if (s->s_fs_info != server) {
 		nfs_free_server(server);
 		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
 	}
 
 	if (!s->s_root) {
@@ -2053,6 +2076,10 @@ static int nfs4_referral_get_sb(struct f
 	if (s->s_fs_info != server) {
 		nfs_free_server(server);
 		server = NULL;
+	} else {
+		error = nfs_bdi_register(server);
+		if (error)
+			goto error_splat_super;
 	}
 
 	if (!s->s_root) {

--


* [patch 4/6] mm: bdi: expose the BDI object in sysfs for FUSE
  2008-01-29 15:49 [patch 0/6] mm: bdi: updates Miklos Szeredi
                   ` (2 preceding siblings ...)
  2008-01-29 15:49 ` [patch 3/6] mm: bdi: expose the BDI object in sysfs for NFS Miklos Szeredi
@ 2008-01-29 15:49 ` Miklos Szeredi
  2008-01-29 15:49 ` [patch 5/6] mm: bdi: allow setting a minimum for the bdi dirty limit Miklos Szeredi
                   ` (2 subsequent siblings)
  6 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-29 15:49 UTC (permalink / raw)
  To: akpm; +Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm

[-- Attachment #1: bdi-sysfs-fuse.patch --]
[-- Type: text/plain, Size: 3462 bytes --]

From: Miklos Szeredi <mszeredi@suse.cz>

Register FUSE's backing_dev_info under sysfs with the name
"fuse-MAJOR:MINOR".

Make the fuse control filesystem use s_dev instead of a fuse specific
ID.  This makes it easier to match directories under
/sys/fs/fuse/connections/ with directories under /sys/class/bdi, and
with actual mounts.

Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
CC: Peter Zijlstra <a.p.zijlstra@chello.nl>
---

Index: linux/fs/fuse/control.c
===================================================================
--- linux.orig/fs/fuse/control.c	2008-01-29 10:26:47.000000000 +0100
+++ linux/fs/fuse/control.c	2008-01-29 12:16:06.000000000 +0100
@@ -117,7 +117,7 @@ int fuse_ctl_add_conn(struct fuse_conn *
 
 	parent = fuse_control_sb->s_root;
 	inc_nlink(parent->d_inode);
-	sprintf(name, "%llu", (unsigned long long) fc->id);
+	sprintf(name, "%u", fc->dev);
 	parent = fuse_ctl_add_dentry(parent, fc, name, S_IFDIR | 0500, 2,
 				     &simple_dir_inode_operations,
 				     &simple_dir_operations);
Index: linux/fs/fuse/fuse_i.h
===================================================================
--- linux.orig/fs/fuse/fuse_i.h	2008-01-29 10:26:47.000000000 +0100
+++ linux/fs/fuse/fuse_i.h	2008-01-29 12:16:06.000000000 +0100
@@ -384,8 +384,8 @@ struct fuse_conn {
 	/** Entry on the fuse_conn_list */
 	struct list_head entry;
 
-	/** Unique ID */
-	u64 id;
+	/** Device ID from super block */
+	dev_t dev;
 
 	/** Dentries in the control filesystem */
 	struct dentry *ctl_dentry[FUSE_CTL_NUM_DENTRIES];
Index: linux/fs/fuse/inode.c
===================================================================
--- linux.orig/fs/fuse/inode.c	2008-01-29 10:26:47.000000000 +0100
+++ linux/fs/fuse/inode.c	2008-01-29 12:57:26.000000000 +0100
@@ -448,7 +448,7 @@ static int fuse_show_options(struct seq_
 	return 0;
 }
 
-static struct fuse_conn *new_conn(void)
+static struct fuse_conn *new_conn(struct super_block *sb)
 {
 	struct fuse_conn *fc;
 	int err;
@@ -468,19 +468,27 @@ static struct fuse_conn *new_conn(void)
 		atomic_set(&fc->num_waiting, 0);
 		fc->bdi.ra_pages = (VM_MAX_READAHEAD * 1024) / PAGE_CACHE_SIZE;
 		fc->bdi.unplug_io_fn = default_unplug_io_fn;
+		fc->dev = sb->s_dev;
 		err = bdi_init(&fc->bdi);
-		if (err) {
-			kfree(fc);
-			fc = NULL;
-			goto out;
-		}
+		if (err)
+			goto error_kfree;
+		err = bdi_register(&fc->bdi, NULL, "fuse-%u:%u",
+				   MAJOR(fc->dev), MINOR(fc->dev));
+		if (err)
+			goto error_bdi_destroy;
 		fc->reqctr = 0;
 		fc->blocked = 1;
 		fc->attr_version = 1;
 		get_random_bytes(&fc->scramble_key, sizeof(fc->scramble_key));
 	}
-out:
 	return fc;
+
+error_bdi_destroy:
+	bdi_destroy(&fc->bdi);
+error_kfree:
+	mutex_destroy(&fc->inst_mutex);
+	kfree(fc);
+	return NULL;
 }
 
 void fuse_conn_put(struct fuse_conn *fc)
@@ -578,12 +586,6 @@ static void fuse_send_init(struct fuse_c
 	request_send_background(fc, req);
 }
 
-static u64 conn_id(void)
-{
-	static u64 ctr = 1;
-	return ctr++;
-}
-
 static int fuse_fill_super(struct super_block *sb, void *data, int silent)
 {
 	struct fuse_conn *fc;
@@ -621,7 +623,7 @@ static int fuse_fill_super(struct super_
 	if (file->f_op != &fuse_dev_operations)
 		return -EINVAL;
 
-	fc = new_conn();
+	fc = new_conn(sb);
 	if (!fc)
 		return -ENOMEM;
 
@@ -659,7 +661,6 @@ static int fuse_fill_super(struct super_
 	if (file->private_data)
 		goto err_unlock;
 
-	fc->id = conn_id();
 	err = fuse_ctl_add_conn(fc);
 	if (err)
 		goto err_unlock;

--


* [patch 5/6] mm: bdi: allow setting a minimum for the bdi dirty limit
  2008-01-29 15:49 [patch 0/6] mm: bdi: updates Miklos Szeredi
                   ` (3 preceding siblings ...)
  2008-01-29 15:49 ` [patch 4/6] mm: bdi: expose the BDI object in sysfs for FUSE Miklos Szeredi
@ 2008-01-29 15:49 ` Miklos Szeredi
  2008-01-29 15:49 ` [patch 6/6] mm: bdi: allow setting a maximum " Miklos Szeredi
  2008-01-29 17:06 ` [patch 0/6] mm: bdi: updates Peter Zijlstra
  6 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-29 15:49 UTC (permalink / raw)
  To: akpm; +Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm

[-- Attachment #1: bdi-min.patch --]
[-- Type: text/plain, Size: 4546 bytes --]

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Add "min_ratio" to /sys/class/bdi.  This indicates the minimum
percentage of the global dirty threshold allocated to this bdi.
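
The get_dirty_limits() change further down splits the calculation in two. First, the bdi gets its writeout share of the part of the global threshold that no min_ratio has reserved. Then its own reservation is added on top. The sketch below is a userspace model of that arithmetic and of the bookkeeping in bdi_set_min_ratio(). `struct bdi_model` and the helpers are hypothetical. Locking and clip_bdi_dirty_limit() are left out.

```c
#include <assert.h>

/* Hypothetical userspace stand-in for struct backing_dev_info. */
struct bdi_model {
	unsigned int min_ratio;	/* percent of the global threshold */
};

/* Models bdi_min_ratio: the sum of every bdi's min_ratio. */
static unsigned int sum_min_ratio;

/* Like bdi_set_min_ratio() in this patch, minus the locking. */
static int set_min_ratio(struct bdi_model *bdi, unsigned int min_ratio)
{
	/* May wrap when lowering; the unsigned sum below wraps back. */
	unsigned int delta = min_ratio - bdi->min_ratio;

	if (sum_min_ratio + delta >= 100)
		return -1;	/* the kernel returns -EINVAL */
	sum_min_ratio += delta;
	bdi->min_ratio += delta;
	return 0;
}

/* Like the bdi_dirty computation in get_dirty_limits(), before clipping. */
static unsigned long long bdi_dirty_limit(unsigned long long dirty,
					  const struct bdi_model *bdi,
					  unsigned long numerator,
					  unsigned long denominator)
{
	/* share of the part of the threshold not reserved by min_ratio */
	unsigned long long bdi_dirty = dirty * (100 - sum_min_ratio) / 100;

	assert(denominator != 0);
	bdi_dirty = bdi_dirty * numerator / denominator;
	/* plus this bdi's own reservation */
	return bdi_dirty + dirty * bdi->min_ratio / 100;
}
```

Take a global threshold of 1000 pages and reservations of 10% and 69%, which leaves 210 pages unreserved. A bdi with no recent writeout still gets its 100 reserved pages. The other bdi, with half the writeouts, gets 210/2 + 690 = 795 pages.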

[mszeredi@suse.cz]

 - fix parsing in min_ratio_store()
 - document new sysfs attribute
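
The fixed parser accepts a number only if it makes up the whole write, optionally followed by a single newline such as the one echo appends. The sketch below is a userspace model of that check, with strtoul() standing in for simple_strtoul(). strtoul() also skips leading whitespace and accepts a sign, so this model may be more permissive than the kernel code. Like the original check, it accepts a lone "\n" and parses it as 0. `parse_ratio()` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/*
 * Accept "N" or "N\n" (whole buffer) and store N in *ratio;
 * reject anything with trailing characters or an empty buffer.
 */
static bool parse_ratio(const char *buf, unsigned int *ratio)
{
	char *end;
	unsigned long val;

	assert(buf != NULL && ratio != NULL);
	val = strtoul(buf, &end, 10);
	if (!*buf || !(end[0] == '\0' || (end[0] == '\n' && end[1] == '\0')))
		return false;
	*ratio = (unsigned int)val;
	return true;
}
```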

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/include/linux/backing-dev.h
===================================================================
--- linux.orig/include/linux/backing-dev.h	2008-01-29 14:40:35.000000000 +0100
+++ linux/include/linux/backing-dev.h	2008-01-29 15:35:34.000000000 +0100
@@ -51,6 +51,8 @@ struct backing_dev_info {
 	struct prop_local_percpu completions;
 	int dirty_exceeded;
 
+	unsigned int min_ratio;
+
 	struct device *dev;
 };
 
@@ -136,6 +138,8 @@ static inline unsigned long bdi_stat_err
 #endif
 }
 
+int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio);
+
 /*
  * Flags in backing_dev_info::capability
  * - The first two flags control whether dirty pages will contribute to the
Index: linux/mm/backing-dev.c
===================================================================
--- linux.orig/mm/backing-dev.c	2008-01-29 14:40:35.000000000 +0100
+++ linux/mm/backing-dev.c	2008-01-29 15:36:35.000000000 +0100
@@ -50,6 +50,24 @@ static inline unsigned long get_dirty(st
 BDI_SHOW(dirty_kb, K(get_dirty(bdi, 1)))
 BDI_SHOW(bdi_dirty_kb, K(get_dirty(bdi, 2)))
 
+static ssize_t min_ratio_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+	char *end;
+	unsigned int ratio;
+	ssize_t ret = -EINVAL;
+
+	ratio = simple_strtoul(buf, &end, 10);
+	if (*buf && (end[0] == '\0' || (end[0] == '\n' && end[1] == '\0'))) {
+		ret = bdi_set_min_ratio(bdi, ratio);
+		if (!ret)
+			ret = count;
+	}
+	return ret;
+}
+BDI_SHOW(min_ratio, bdi->min_ratio)
+
 #define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store)
 
 static struct device_attribute bdi_dev_attrs[] = {
@@ -58,6 +76,7 @@ static struct device_attribute bdi_dev_a
 	__ATTR_RO(writeback_kb),
 	__ATTR_RO(dirty_kb),
 	__ATTR_RO(bdi_dirty_kb),
+	__ATTR_RW(min_ratio),
 	__ATTR_NULL,
 };
 
@@ -116,6 +135,8 @@ int bdi_init(struct backing_dev_info *bd
 
 	bdi->dev = NULL;
 
+	bdi->min_ratio = 0;
+
 	for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
 		err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0);
 		if (err)
Index: linux/mm/page-writeback.c
===================================================================
--- linux.orig/mm/page-writeback.c	2008-01-29 14:40:35.000000000 +0100
+++ linux/mm/page-writeback.c	2008-01-29 15:35:34.000000000 +0100
@@ -247,6 +247,29 @@ static void task_dirty_limit(struct task
 }
 
 /*
+ * bdi_lock serializes updates to bdi_min_ratio, the sum of all bdi->min_ratio.
+ */
+static DEFINE_SPINLOCK(bdi_lock);
+static unsigned int bdi_min_ratio;
+
+int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio)
+{
+	int ret = 0;
+	unsigned long flags;
+
+	spin_lock_irqsave(&bdi_lock, flags);
+	min_ratio -= bdi->min_ratio;
+	if (bdi_min_ratio + min_ratio < 100) {
+		bdi_min_ratio += min_ratio;
+		bdi->min_ratio += min_ratio;
+	} else
+		ret = -EINVAL;
+	spin_unlock_irqrestore(&bdi_lock, flags);
+
+	return ret;
+}
+
+/*
  * Work out the current dirty-memory clamping and background writeout
  * thresholds.
  *
@@ -334,7 +357,7 @@ get_dirty_limits(long *pbackground, long
 	*pdirty = dirty;
 
 	if (bdi) {
-		u64 bdi_dirty = dirty;
+		u64 bdi_dirty;
 		long numerator, denominator;
 
 		/*
@@ -342,8 +365,10 @@ get_dirty_limits(long *pbackground, long
 		 */
 		bdi_writeout_fraction(bdi, &numerator, &denominator);
 
+		bdi_dirty = (dirty * (100 - bdi_min_ratio)) / 100;
 		bdi_dirty *= numerator;
 		do_div(bdi_dirty, denominator);
+		bdi_dirty += (dirty * bdi->min_ratio) / 100;
 
 		*pbdi_dirty = bdi_dirty;
 		clip_bdi_dirty_limit(bdi, dirty, pbdi_dirty);
Index: linux/Documentation/ABI/testing/sysfs-class-bdi
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-class-bdi	2008-01-29 14:40:35.000000000 +0100
+++ linux/Documentation/ABI/testing/sysfs-class-bdi	2008-01-29 15:37:24.000000000 +0100
@@ -48,3 +48,9 @@ bdi_dirty_kb (read-only)
 	Current threshold on this BDI for reclaimable + writeback
 	memory
 
+min_ratio (read-write)
+
	Minimum percentage of global dirty threshold allocated to this
	bdi.  If the value written to this file would make the sum
	of all min_ratio values 100 or more, then EINVAL is returned.
	The default is zero.

--


* [patch 6/6] mm: bdi: allow setting a maximum for the bdi dirty limit
  2008-01-29 15:49 [patch 0/6] mm: bdi: updates Miklos Szeredi
                   ` (4 preceding siblings ...)
  2008-01-29 15:49 ` [patch 5/6] mm: bdi: allow setting a minimum for the bdi dirty limit Miklos Szeredi
@ 2008-01-29 15:49 ` Miklos Szeredi
  2008-01-31  0:39   ` Andrew Morton
  2008-01-29 17:06 ` [patch 0/6] mm: bdi: updates Peter Zijlstra
  6 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-29 15:49 UTC (permalink / raw)
  To: akpm; +Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm

[-- Attachment #1: bdi-max.patch --]
[-- Type: text/plain, Size: 7889 bytes --]

From: Peter Zijlstra <a.p.zijlstra@chello.nl>

Add "max_ratio" to /sys/class/bdi.  This indicates the maximum
percentage of the global dirty threshold allocated to this bdi.

[mszeredi@suse.cz]

 - fix parsing in max_ratio_store().
 - export bdi_set_max_ratio() to modules
 - limit bdi_dirty with bdi->max_ratio
 - document new sysfs attribute
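
The patch enforces max_ratio in two places. get_dirty_limits() clamps bdi_dirty directly. __bdi_writeout_inc() also passes max_prop_frac down, so __prop_inc_percpu_max() stops counting writeout completions once the bdi's share of the current period exceeds that fraction. The sketch below is a userspace model of that arithmetic, reusing the constants from the proportions.h hunk. The helper names are hypothetical, and `denominator` stands in for period_2 + (global_count & counter_mask).

```c
#include <assert.h>
#include <limits.h>

/* Constants as in the proportions.h hunk; BITS_PER_LONG derived here. */
#define SKETCH_BITS_PER_LONG	((int)(sizeof(long) * CHAR_BIT))
#define PROP_MAX_SHIFT		(3 * SKETCH_BITS_PER_LONG / 4)
#define PROP_FRAC_SHIFT		(SKETCH_BITS_PER_LONG - PROP_MAX_SHIFT - 1)
#define PROP_FRAC_BASE		(1UL << PROP_FRAC_SHIFT)

/* As in bdi_set_max_ratio(): percentage to fixed-point fraction. */
static unsigned long max_prop_frac(unsigned int max_ratio)
{
	assert(max_ratio <= 100);
	return (PROP_FRAC_BASE * max_ratio) / 100;
}

/* As in get_dirty_limits(): clamp bdi_dirty to max_ratio% of dirty. */
static unsigned long long clamp_bdi_dirty(unsigned long long bdi_dirty,
					  unsigned long long dirty,
					  unsigned int max_ratio)
{
	unsigned long long cap = dirty * max_ratio / 100;

	return bdi_dirty > cap ? cap : bdi_dirty;
}

/* The test in __prop_inc_percpu_max(): is this event counted? */
static int event_counted(long numerator, long denominator, long frac)
{
	if (frac == (long)PROP_FRAC_BASE)
		return 1;	/* no limit configured */
	return !(numerator > ((denominator * frac) >> PROP_FRAC_SHIFT));
}
```

On a 64-bit build, PROP_FRAC_SHIFT is 64 - 48 - 1 = 15, so PROP_FRAC_BASE is 32768. A max_ratio of 50 then gives a max_prop_frac of 16384.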

Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
Signed-off-by: Miklos Szeredi <mszeredi@suse.cz>
---

Index: linux/include/linux/backing-dev.h
===================================================================
--- linux.orig/include/linux/backing-dev.h	2008-01-29 16:33:14.000000000 +0100
+++ linux/include/linux/backing-dev.h	2008-01-29 16:33:14.000000000 +0100
@@ -52,6 +52,7 @@ struct backing_dev_info {
 	int dirty_exceeded;
 
 	unsigned int min_ratio;
+	unsigned int max_ratio, max_prop_frac;
 
 	struct device *dev;
 };
@@ -139,6 +140,7 @@ static inline unsigned long bdi_stat_err
 }
 
 int bdi_set_min_ratio(struct backing_dev_info *bdi, unsigned int min_ratio);
+int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned int max_ratio);
 
 /*
  * Flags in backing_dev_info::capability
Index: linux/include/linux/proportions.h
===================================================================
--- linux.orig/include/linux/proportions.h	2008-01-29 16:25:14.000000000 +0100
+++ linux/include/linux/proportions.h	2008-01-29 16:33:14.000000000 +0100
@@ -78,6 +78,19 @@ void prop_inc_percpu(struct prop_descrip
 }
 
 /*
+ * Limit the time part in order to ensure there are some bits left for the
+ * cycle counter and fraction multiply.
+ */
+#define PROP_MAX_SHIFT (3*BITS_PER_LONG/4)
+
+#define PROP_FRAC_SHIFT		(BITS_PER_LONG - PROP_MAX_SHIFT - 1)
+#define PROP_FRAC_BASE		(1UL << PROP_FRAC_SHIFT)
+
+void __prop_inc_percpu_max(struct prop_descriptor *pd,
+			   struct prop_local_percpu *pl, long frac);
+
+
+/*
  * ----- SINGLE ------
  */
 
Index: linux/lib/proportions.c
===================================================================
--- linux.orig/lib/proportions.c	2008-01-29 16:25:14.000000000 +0100
+++ linux/lib/proportions.c	2008-01-29 16:33:14.000000000 +0100
@@ -73,12 +73,6 @@
 #include <linux/proportions.h>
 #include <linux/rcupdate.h>
 
-/*
- * Limit the time part in order to ensure there are some bits left for the
- * cycle counter.
- */
-#define PROP_MAX_SHIFT (3*BITS_PER_LONG/4)
-
 int prop_descriptor_init(struct prop_descriptor *pd, int shift)
 {
 	int err;
@@ -268,6 +262,38 @@ void __prop_inc_percpu(struct prop_descr
 }
 
 /*
+ * identical to __prop_inc_percpu, except that it limits this pl's fraction to
+ * @frac/PROP_FRAC_BASE by ignoring events when this limit has been exceeded.
+ */
+void __prop_inc_percpu_max(struct prop_descriptor *pd,
+			   struct prop_local_percpu *pl, long frac)
+{
+	struct prop_global *pg = prop_get_global(pd);
+
+	prop_norm_percpu(pg, pl);
+
+	if (unlikely(frac != PROP_FRAC_BASE)) {
+		unsigned long period_2 = 1UL << (pg->shift - 1);
+		unsigned long counter_mask = period_2 - 1;
+		unsigned long global_count;
+		long numerator, denominator;
+
+		numerator = percpu_counter_read_positive(&pl->events);
+		global_count = percpu_counter_read(&pg->events);
+		denominator = period_2 + (global_count & counter_mask);
+
+		if (numerator > ((denominator * frac) >> PROP_FRAC_SHIFT))
+			goto out_put;
+	}
+
+	percpu_counter_add(&pl->events, 1);
+	percpu_counter_add(&pg->events, 1);
+
+out_put:
+	prop_put_global(pd, pg);
+}
+
+/*
  * Obtain a fraction of this proportion
  *
  *   p_{j} = x_{j} / (period/2 + t % period/2)
Index: linux/mm/backing-dev.c
===================================================================
--- linux.orig/mm/backing-dev.c	2008-01-29 16:33:14.000000000 +0100
+++ linux/mm/backing-dev.c	2008-01-29 16:33:14.000000000 +0100
@@ -68,6 +68,24 @@ static ssize_t min_ratio_store(struct de
 }
 BDI_SHOW(min_ratio, bdi->min_ratio)
 
+static ssize_t max_ratio_store(struct device *dev,
+		struct device_attribute *attr, const char *buf, size_t count)
+{
+	struct backing_dev_info *bdi = dev_get_drvdata(dev);
+	char *end;
+	unsigned int ratio;
+	ssize_t ret = -EINVAL;
+
+	ratio = simple_strtoul(buf, &end, 10);
+	if (*buf && (end[0] == '\0' || (end[0] == '\n' && end[1] == '\0'))) {
+		ret = bdi_set_max_ratio(bdi, ratio);
+		if (!ret)
+			ret = count;
+	}
+	return ret;
+}
+BDI_SHOW(max_ratio, bdi->max_ratio)
+
 #define __ATTR_RW(attr) __ATTR(attr, 0644, attr##_show, attr##_store)
 
 static struct device_attribute bdi_dev_attrs[] = {
@@ -77,6 +95,7 @@ static struct device_attribute bdi_dev_a
 	__ATTR_RO(dirty_kb),
 	__ATTR_RO(bdi_dirty_kb),
 	__ATTR_RW(min_ratio),
+	__ATTR_RW(max_ratio),
 	__ATTR_NULL,
 };
 
@@ -136,6 +155,8 @@ int bdi_init(struct backing_dev_info *bd
 	bdi->dev = NULL;
 
 	bdi->min_ratio = 0;
+	bdi->max_ratio = 100;
+	bdi->max_prop_frac = PROP_FRAC_BASE;
 
 	for (i = 0; i < NR_BDI_STAT_ITEMS; i++) {
 		err = percpu_counter_init_irq(&bdi->bdi_stat[i], 0);
Index: linux/mm/page-writeback.c
===================================================================
--- linux.orig/mm/page-writeback.c	2008-01-29 16:33:14.000000000 +0100
+++ linux/mm/page-writeback.c	2008-01-29 16:33:40.000000000 +0100
@@ -164,7 +164,8 @@ int dirty_ratio_handler(struct ctl_table
  */
 static inline void __bdi_writeout_inc(struct backing_dev_info *bdi)
 {
-	__prop_inc_percpu(&vm_completions, &bdi->completions);
+	__prop_inc_percpu_max(&vm_completions, &bdi->completions,
+			      bdi->max_prop_frac);
 }
 
 static inline void task_dirty_inc(struct task_struct *tsk)
@@ -258,17 +259,43 @@ int bdi_set_min_ratio(struct backing_dev
 	unsigned long flags;
 
 	spin_lock_irqsave(&bdi_lock, flags);
-	min_ratio -= bdi->min_ratio;
-	if (bdi_min_ratio + min_ratio < 100) {
-		bdi_min_ratio += min_ratio;
-		bdi->min_ratio += min_ratio;
-	} else
+	if (min_ratio > bdi->max_ratio) {
 		ret = -EINVAL;
+	} else {
+		min_ratio -= bdi->min_ratio;
+		if (bdi_min_ratio + min_ratio < 100) {
+			bdi_min_ratio += min_ratio;
+			bdi->min_ratio += min_ratio;
+		} else {
+			ret = -EINVAL;
+		}
+	}
 	spin_unlock_irqrestore(&bdi_lock, flags);
 
 	return ret;
 }
 
+int bdi_set_max_ratio(struct backing_dev_info *bdi, unsigned max_ratio)
+{
+	unsigned long flags;
+	int ret = 0;
+
+	if (max_ratio > 100)
+		return -EINVAL;
+
+	spin_lock_irqsave(&bdi_lock, flags);
+	if (bdi->min_ratio > max_ratio) {
+		ret = -EINVAL;
+	} else {
+		bdi->max_ratio = max_ratio;
+		bdi->max_prop_frac = (PROP_FRAC_BASE * max_ratio) / 100;
+	}
+	spin_unlock_irqrestore(&bdi_lock, flags);
+
+	return ret;
+}
+EXPORT_SYMBOL(bdi_set_max_ratio);
+
 /*
  * Work out the current dirty-memory clamping and background writeout
  * thresholds.
@@ -369,6 +396,8 @@ get_dirty_limits(long *pbackground, long
 		bdi_dirty *= numerator;
 		do_div(bdi_dirty, denominator);
 		bdi_dirty += (dirty * bdi->min_ratio) / 100;
+		if (bdi_dirty > (dirty * bdi->max_ratio) / 100)
+			bdi_dirty = dirty * bdi->max_ratio / 100;
 
 		*pbdi_dirty = bdi_dirty;
 		clip_bdi_dirty_limit(bdi, dirty, pbdi_dirty);
Index: linux/Documentation/ABI/testing/sysfs-class-bdi
===================================================================
--- linux.orig/Documentation/ABI/testing/sysfs-class-bdi	2008-01-29 16:33:14.000000000 +0100
+++ linux/Documentation/ABI/testing/sysfs-class-bdi	2008-01-29 16:33:14.000000000 +0100
@@ -53,4 +53,11 @@ min_ratio (read-write)
 	Minimal percentage of global dirty threshold allocated to this
 	bdi.  If the value written to this file would make the the sum
 	of all min_ratio values exceed 100, then EINVAL is returned.
-	The default is zero
+	If min_ratio would become larger than the current max_ratio,
+	EINVAL is also returned.  The default is zero.
+
+max_ratio (read-write)
+
+	Maximal percentage of global dirty threshold allocated to this
+	bdi.  If max_ratio would become smaller than the current
+	min_ratio, then EINVAL is returned.  The default is 100.

--

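A userspace model of the ratio handling in patch 6/6 may help when reading the hunks above. It parses a sysfs-style write, enforces the rules the ABI text describes (max_ratio at most 100, min_ratio never above max_ratio, all min_ratio values summing to less than 100), and derives a fixed-point max_prop_frac. This is a sketch under assumptions, not kernel code: struct bdi_model, the model_* functions and MODEL_FRAC_BASE are hypothetical, strtoul() stands in for simple_strtoul(), and the model additionally rejects input that contains no digits.

```c
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for PROP_FRAC_BASE; the real value differs. */
#define MODEL_FRAC_BASE (1UL << 16)

/* Hypothetical model of the per-bdi ratio fields. */
struct bdi_model {
	unsigned int min_ratio;
	unsigned int max_ratio;
	unsigned long max_prop_frac;
};

/* Models the global sum of all min_ratio values (bdi_min_ratio). */
static int model_sum_min_ratio;

static void model_bdi_init(struct bdi_model *bdi)
{
	bdi->min_ratio = 0;
	bdi->max_ratio = 100;
	bdi->max_prop_frac = MODEL_FRAC_BASE;
}

/* Accept "NN" or "NN\n" like the store routine; reject input with no digits. */
static int model_parse_ratio(const char *buf, unsigned long *ratio)
{
	char *end;

	if (*buf == '\0')
		return -EINVAL;
	*ratio = strtoul(buf, &end, 10);
	if (end == buf)
		return -EINVAL;
	if (end[0] == '\0' || (end[0] == '\n' && end[1] == '\0'))
		return 0;
	return -EINVAL;
}

static int model_set_min_ratio(struct bdi_model *bdi, unsigned long min_ratio)
{
	int delta;

	if (min_ratio > bdi->max_ratio)
		return -EINVAL;
	delta = (int)min_ratio - (int)bdi->min_ratio;
	/* The sum of all min_ratio values must stay below 100. */
	if (model_sum_min_ratio + delta >= 100)
		return -EINVAL;
	model_sum_min_ratio += delta;
	bdi->min_ratio = (unsigned int)min_ratio;
	return 0;
}

static int model_set_max_ratio(struct bdi_model *bdi, unsigned long max_ratio)
{
	if (max_ratio > 100 || bdi->min_ratio > max_ratio)
		return -EINVAL;
	bdi->max_ratio = (unsigned int)max_ratio;
	/* Fixed-point form of max_ratio, as used to gate completion counting. */
	bdi->max_prop_frac = (MODEL_FRAC_BASE * max_ratio) / 100;
	return 0;
}

/* Returns count on success or a negative errno, like a sysfs store. */
static long model_max_ratio_store(struct bdi_model *bdi, const char *buf,
				  long count)
{
	unsigned long ratio;
	int ret = model_parse_ratio(buf, &ratio);

	if (ret)
		return ret;
	ret = model_set_max_ratio(bdi, ratio);
	return ret ? ret : count;
}
```

For example, writing "50\n" to a fresh model returns 3 (the byte count) and leaves max_prop_frac at half of MODEL_FRAC_BASE; a later min_ratio of 60 on the same device is then refused with -EINVAL. Keeping the min_ratio sum strictly below 100 presumably guarantees that some part of the global limit is always left to be shared out proportionally.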

* Re: [patch 0/6] mm: bdi: updates
  2008-01-29 15:49 [patch 0/6] mm: bdi: updates Miklos Szeredi
                   ` (5 preceding siblings ...)
  2008-01-29 15:49 ` [patch 6/6] mm: bdi: allow setting a maximum " Miklos Szeredi
@ 2008-01-29 17:06 ` Peter Zijlstra
  2008-01-29 18:32   ` Miklos Szeredi
  6 siblings, 1 reply; 19+ messages in thread
From: Peter Zijlstra @ 2008-01-29 17:06 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-kernel, linux-fsdevel, linux-mm


On Tue, 2008-01-29 at 16:49 +0100, Miklos Szeredi wrote:
> This is a series from Peter Zijlstra, with various updates by me.  The
> patchset mostly deals with exporting BDI attributes in sysfs.
> 
> Should be in a mergeable state, at least into -mm.

Thanks for picking these up Miklos!

While they do not strictly depend upon the /proc/<pid>/mountinfo patch I
think it's good to mention they go hand in hand. The mountinfo file gives
the information needed to associate a mount with a given bdi for non
block devices.




* Re: [patch 2/6] mm: bdi: export BDI attributes in sysfs
  2008-01-29 15:49 ` [patch 2/6] mm: bdi: export BDI attributes in sysfs Miklos Szeredi
@ 2008-01-29 17:39   ` Greg KH
  2008-01-31  0:28   ` Andrew Morton
  2008-02-29 11:26   ` Andrew Morton
  2 siblings, 0 replies; 19+ messages in thread
From: Greg KH @ 2008-01-29 17:39 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: akpm, a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm,
	Kay Sievers, Trond Myklebust

On Tue, Jan 29, 2008 at 04:49:02PM +0100, Miklos Szeredi wrote:
> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info
> object.  This allows us to see and set the various BDI specific
> variables.
> 
> In particular this properly exposes the read-ahead window for all
> relevant users and /sys/block/<block>/queue/read_ahead_kb should be
> deprecated.
> 
> With patient help from Kay Sievers and Greg KH
> 
> [mszeredi@suse.cz]
> 
>  - split off NFS and FUSE changes into separate patches
>  - document new sysfs attributes under Documentation/ABI
>  - do bdi_class_init as a core_initcall, otherwise the "default" BDI
>    won't be initialized
>  - remove bdi_init_fmt macro, it's not used very much
> 
> Signed-off-by: Peter Zijlstra <a.p.zijlstra@chello.nl>
> CC: Kay Sievers <kay.sievers@vrfy.org>
> CC: Greg KH <greg@kroah.com>

Acked-by: Greg Kroah-Hartman <gregkh@suse.de>



* Re: [patch 0/6] mm: bdi: updates
  2008-01-29 17:06 ` [patch 0/6] mm: bdi: updates Peter Zijlstra
@ 2008-01-29 18:32   ` Miklos Szeredi
  0 siblings, 0 replies; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-29 18:32 UTC (permalink / raw)
  To: a.p.zijlstra; +Cc: miklos, akpm, linux-kernel, linux-fsdevel, linux-mm

> On Tue, 2008-01-29 at 16:49 +0100, Miklos Szeredi wrote:
> > This is a series from Peter Zijlstra, with various updates by me.  The
> > patchset mostly deals with exporting BDI attributes in sysfs.
> > 
> > Should be in a mergeable state, at least into -mm.
> 
> Thanks for picking these up Miklos!
> 
> While they do not strictly depend upon the /proc/<pid>/mountinfo patch I
> think it's good to mention they go hand in hand. The mountinfo file gives
> the information needed to associate a mount with a given bdi for non
> block devices.

More precisely, /proc/<pid>/mountinfo is only needed to find mounts for
a given BDI (which might not be a very common scenario), and not the
other way round.

But both patches are useful, and they are even more useful together ;)

Miklos


* Re: [patch 1/6] mm: bdi: tweak task dirty penalty
  2008-01-29 15:49 ` [patch 1/6] mm: bdi: tweak task dirty penalty Miklos Szeredi
@ 2008-01-31  0:13   ` Andrew Morton
  0 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2008-01-31  0:13 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm

On Tue, 29 Jan 2008 16:49:01 +0100
Miklos Szeredi <miklos@szeredi.hu> wrote:

> Penalizing heavy dirtiers with 1/8-th the total dirty limit might be rather
> excessive on large memory machines. Use sqrt to scale it sub-linearly.

Then again, it might not be.

I'll skip this one.  Please resend if/when it is proven to be a net benefit
across a broad range of workloads.  And stuff like that.


* Re: [patch 2/6] mm: bdi: export BDI attributes in sysfs
  2008-01-29 15:49 ` [patch 2/6] mm: bdi: export BDI attributes in sysfs Miklos Szeredi
  2008-01-29 17:39   ` Greg KH
@ 2008-01-31  0:28   ` Andrew Morton
  2008-01-31  9:39     ` Miklos Szeredi
  2008-02-29 11:26   ` Andrew Morton
  2 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2008-01-31  0:28 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm, kay.sievers,
	greg, trond.myklebust

On Tue, 29 Jan 2008 16:49:02 +0100
Miklos Szeredi <miklos@szeredi.hu> wrote:

> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info
> object.  This allows us to see and set the various BDI specific
> variables.
> 
> In particular this properly exposes the read-ahead window for all
> relevant users and /sys/block/<block>/queue/read_ahead_kb should be
> deprecated.

This description is not complete.  It implies that the readahead window is
not "properly" exposed for some "relevant" users.  The reader is left
wondering what on earth this is referring to.  I certainly don't know.
Perhaps when this information is revealed, we can work out what was
wrong with per-queue readahead tuning.

> --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> +++ linux/Documentation/ABI/testing/sysfs-class-bdi	2008-01-29 13:02:46.000000000 +0100
> @@ -0,0 +1,50 @@
> +What:		/sys/class/bdi/<bdi>/
> +Date:		January 2008
> +Contact:	Peter Zijlstra <a.p.zijlstra@chello.nl>
> +Description:
> +
> +Provide a place in sysfs for the backing_dev_info object.
> +This allows us to see and set the various BDI specific variables.
> +
> +The <bdi> identifyer can take the following forms:

"identifier"

> +blk-NAME
> +
> +	Block devices, NAME is 'sda', 'loop0', etc...

But if I've done `mknod /dev/pizza-party 8 0', I'm looking for
blk-pizza-party, not blk-sda.

But I might still have /dev/sda, too.

> +FSTYPE-MAJOR:MINOR
> +
> +	Non-block device backed filesystems which provide their own
> +	BDI, such as NFS and FUSE.  MAJOR:MINOR is the value of st_dev
> +	for files on this filesystem.
> +
> +default
> +
> +	The default backing dev, used for non-block device backed
> +	filesystems which do not provide their own BDI.
> +
> +Files under /sys/class/bdi/<bdi>/
> +---------------------------------
> +
> +read_ahead_kb (read-write)
> +
> +	Size of the read-ahead window in kilobytes
> +
> +reclaimable_kb (read-only)
> +
> +	Reclaimable (dirty or unstable) memory destined for writeback
> +	to this device
> +
> +writeback_kb (read-only)
> +
> +	Memory currently under writeback to this device
> +
> +dirty_kb (read-only)
> +
> +	Global threshold for reclaimable + writeback memory
> +
> +bdi_dirty_kb (read-only)
> +
> +	Current threshold on this BDI for reclaimable + writeback
> +	memory
> +

I dunno.  A number of the things which you're exposing are closely tied to
present-day kernel implementation and may be irrelevant or even
unimplementable in a few years' time.

At the very least you should put a HUGE warning in here telling everyone
that these files may disappear or be renamed with new semantics in the
future, and that they should design their userspace code with this in mind.

But that will only prevent userspace from outright crashing.  Once we
expose functionality of this nature, people will come to depend upon it.
We can't stop this.

Suppose $CLUELESS_CORP modifies $LARGE_DATABASE so that it uses these new
fields to optimise its cache population and cache flushout strategies. 
Later, we are forced to remove these fields.  The database now runs
slowly.

It's just a bad idea to expose deep kernel guts in this way.  We need really
good reasons for doing so, and those reasons should be in the changelog.



* Re: [patch 6/6] mm: bdi: allow setting a maximum for the bdi dirty limit
  2008-01-29 15:49 ` [patch 6/6] mm: bdi: allow setting a maximum " Miklos Szeredi
@ 2008-01-31  0:39   ` Andrew Morton
  2008-01-31  9:46     ` Miklos Szeredi
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2008-01-31  0:39 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm

On Tue, 29 Jan 2008 16:49:06 +0100
Miklos Szeredi <miklos@szeredi.hu> wrote:

> Add "max_ratio" to /sys/class/bdi.  This indicates the maximum
> percentage of the global dirty threshold allocated to this bdi.

Maybe I'm having a stupid day, but I don't understand the semantics of this
min and max at all.  I've read the code, and I've read the comments (well,
I've hunted for some) and I've read the docs.

I really don't know how anyone could use this in its current state without
doing a lot of code-reading and complex experimentation.  All of which
would be unneeded if this tunable was properly documented.

So.  Please provide adequate documentation for this tunable.  I'd suggest
that it be pitched at the level of a reasonably competent system operator. 
It should help them understand why the tunable exists, why they might
choose to alter it, and what effects they can expect to see.  Hopefully a
reasonably competent kernel developer can then understand it too.



* Re: [patch 2/6] mm: bdi: export BDI attributes in sysfs
  2008-01-31  0:28   ` Andrew Morton
@ 2008-01-31  9:39     ` Miklos Szeredi
  2008-01-31  9:54       ` Andrew Morton
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-31  9:39 UTC (permalink / raw)
  To: akpm
  Cc: miklos, a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm,
	kay.sievers, greg, trond.myklebust

> On Tue, 29 Jan 2008 16:49:02 +0100
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > 
> > Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info
> > object.  This allows us to see and set the various BDI specific
> > variables.
> > 
> > In particular this properly exposes the read-ahead window for all
> > relevant users and /sys/block/<block>/queue/read_ahead_kb should be
> > deprecated.
> 
> This description is not complete.  It implies that the readahead window is
> not "properly" exposed for some "relevant" users.  The reader is left
> wondering what on earth this is referring to.  I certainly don't know.
> Perhaps when this information is revealed, we can work out what was
> wrong with per-queue readahead tuning.

I think Peter meant that the readahead window was only exposed for
block devices, and not things like NFS or FUSE.

> > --- /dev/null	1970-01-01 00:00:00.000000000 +0000
> > +++ linux/Documentation/ABI/testing/sysfs-class-bdi	2008-01-29 13:02:46.000000000 +0100
> > @@ -0,0 +1,50 @@
> > +What:		/sys/class/bdi/<bdi>/
> > +Date:		January 2008
> > +Contact:	Peter Zijlstra <a.p.zijlstra@chello.nl>
> > +Description:
> > +
> > +Provide a place in sysfs for the backing_dev_info object.
> > +This allows us to see and set the various BDI specific variables.
> > +
> > +The <bdi> identifyer can take the following forms:
> 
> "identifier"

Arrgh.  Must run spellchecker on doc files :)

> > +blk-NAME
> > +
> > +	Block devices, NAME is 'sda', 'loop0', etc...
> 
> But if I've done `mknod /dev/pizza-party 8 0', I'm looking for
> blk-pizza-party, not blk-sda.
> 
> But I might still have /dev/sda, too.

An alternative would be to uniformly use MAJOR:MINOR in there.  It
would work for block devices and anonymous devices (NFS/FUSE) as well.

Would that be any better?
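The uniform MAJOR:MINOR naming suggested here would let userspace find a BDI from a device number alone. A minimal sketch, assuming glibc's <sys/sysmacros.h> for major()/minor()/makedev(); the bdi_name_* helpers are hypothetical. For a file on a mounted filesystem the relevant number is st_dev, while for a block device node it is st_rdev, so a "mknod /dev/pizza-party 8 0" alias and /dev/sda would both map to "8:0".

```c
#include <stdio.h>
#include <sys/stat.h>
#include <sys/sysmacros.h>	/* major(), minor(), makedev() on glibc */
#include <sys/types.h>

/* Hypothetical helper: format a dev_t as "MAJOR:MINOR". */
static int bdi_name_from_dev(dev_t dev, char *buf, size_t len)
{
	return snprintf(buf, len, "%u:%u", major(dev), minor(dev));
}

/*
 * For a file on a mounted filesystem, the filesystem's device is st_dev.
 * Returns the snprintf() result, or -1 if stat() fails.
 */
static int bdi_name_for_file(const char *path, char *buf, size_t len)
{
	struct stat st;

	if (stat(path, &st) != 0)
		return -1;
	return bdi_name_from_dev(st.st_dev, buf, len);
}

/*
 * For a block device node, the device it refers to is st_rdev, so the
 * node's name does not matter, only its numbers.
 * Returns -1 if stat() fails or the path is not a block device.
 */
static int bdi_name_for_blockdev(const char *path, char *buf, size_t len)
{
	struct stat st;

	if (stat(path, &st) != 0 || !S_ISBLK(st.st_mode))
		return -1;
	return bdi_name_from_dev(st.st_rdev, buf, len);
}
```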

> 
> > +FSTYPE-MAJOR:MINOR
> > +
> > +	Non-block device backed filesystems which provide their own
> > +	BDI, such as NFS and FUSE.  MAJOR:MINOR is the value of st_dev
> > +	for files on this filesystem.
> > +
> > +default
> > +
> > +	The default backing dev, used for non-block device backed
> > +	filesystems which do not provide their own BDI.
> > +
> > +Files under /sys/class/bdi/<bdi>/
> > +---------------------------------
> > +
> > +read_ahead_kb (read-write)
> > +
> > +	Size of the read-ahead window in kilobytes
> > +
> > +reclaimable_kb (read-only)
> > +
> > +	Reclaimable (dirty or unstable) memory destined for writeback
> > +	to this device
> > +
> > +writeback_kb (read-only)
> > +
> > +	Memory currently under writeback to this device
> > +
> > +dirty_kb (read-only)
> > +
> > +	Global threshold for reclaimable + writeback memory
> > +
> > +bdi_dirty_kb (read-only)
> > +
> > +	Current threshold on this BDI for reclaimable + writeback
> > +	memory
> > +
> 
> I dunno.  A number of the things which you're exposing are closely tied to
> present-day kernel implementation and may be irrelevant or even
> unimplementable in a few years' time.

Which ones?  They could possibly be moved to debugfs, or something.

I agree that sysfs should be relatively stable.

Thanks,
Miklos


* Re: [patch 6/6] mm: bdi: allow setting a maximum for the bdi dirty limit
  2008-01-31  0:39   ` Andrew Morton
@ 2008-01-31  9:46     ` Miklos Szeredi
  2008-01-31 10:17       ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Miklos Szeredi @ 2008-01-31  9:46 UTC (permalink / raw)
  To: akpm; +Cc: miklos, a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm

> On Tue, 29 Jan 2008 16:49:06 +0100
> Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > Add "max_ratio" to /sys/class/bdi.  This indicates the maximum
> > percentage of the global dirty threshold allocated to this bdi.
> 
> Maybe I'm having a stupid day, but I don't understand the semantics of this
> min and max at all.  I've read the code, and I've read the comments (well,
> I've hunted for some) and I've read the docs.
> 
> I really don't know how anyone could use this in its current state without
> doing a lot of code-reading and complex experimentation.  All of which
> would be unneeded if this tunable was properly documented.
> 
> So.  Please provide adequate documentation for this tunable.  I'd suggest
> that it be pitched at the level of a reasonably competent system operator. 
> It should help them understand why the tunable exists, why they might
> choose to alter it, and what effects they can expect to see.  Hopefully a
> reasonably competent kernel developer can then understand it too.

OK.  I think what's missing from the docs is a high-level
description of the per-bdi throttling algorithm and how it affects
writeback.  With that info, I think the min and max ratios are
trivially understandable: they just override the result of the
algorithm whenever it would yield too high or too low a threshold.

Peter, could you write something about that?

Thanks,
Miklos


* Re: [patch 2/6] mm: bdi: export BDI attributes in sysfs
  2008-01-31  9:39     ` Miklos Szeredi
@ 2008-01-31  9:54       ` Andrew Morton
  2008-01-31 10:08         ` Peter Zijlstra
  0 siblings, 1 reply; 19+ messages in thread
From: Andrew Morton @ 2008-01-31  9:54 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm, kay.sievers,
	greg, trond.myklebust

On Thu, 31 Jan 2008 10:39:02 +0100 Miklos Szeredi <miklos@szeredi.hu> wrote:

> > On Tue, 29 Jan 2008 16:49:02 +0100
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > 
> > > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > 
> > > Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info
> > > object.  This allows us to see and set the various BDI specific
> > > variables.
> > > 
> > > In particular this properly exposes the read-ahead window for all
> > > relevant users and /sys/block/<block>/queue/read_ahead_kb should be
> > > deprecated.
> > 
> > This description is not complete.  It implies that the readahead window is
> > not "properly" exposed for some "relevant" users.  The reader is left
> > wondering what on earth this is referring to.  I certainly don't know.
> > Perhaps when this information is revealed, we can work out what was
> > wrong with per-queue readahead tuning.
> 
> I think Peter meant that the readahead window was only exposed for
> block devices, and not things like NFS or FUSE.

OK.

> 
> > > +blk-NAME
> > > +
> > > +	Block devices, NAME is 'sda', 'loop0', etc...
> > 
> > But if I've done `mknod /dev/pizza-party 8 0', I'm looking for
> > blk-pizza-party, not blk-sda.
> > 
> > But I might still have /dev/sda, too.
> 
> An alternative would be to uniformly use MAJOR:MINOR in there.  It
> would work for block devices and anonymous devices (NFS/FUSE) as well.
> 
> Would that be any better?

I suppose so.  sysfs likes to use symlinks to point over at related
things in different directories...

> > 
> > > +FSTYPE-MAJOR:MINOR
> > > +
> > > +	Non-block device backed filesystems which provide their own
> > > +	BDI, such as NFS and FUSE.  MAJOR:MINOR is the value of st_dev
> > > +	for files on this filesystem.
> > > +
> > > +default
> > > +
> > > +	The default backing dev, used for non-block device backed
> > > +	filesystems which do not provide their own BDI.
> > > +
> > > +Files under /sys/class/bdi/<bdi>/
> > > +---------------------------------
> > > +
> > > +read_ahead_kb (read-write)
> > > +
> > > +	Size of the read-ahead window in kilobytes
> > > +
> > > +reclaimable_kb (read-only)
> > > +
> > > +	Reclaimable (dirty or unstable) memory destined for writeback
> > > +	to this device
> > > +
> > > +writeback_kb (read-only)
> > > +
> > > +	Memory currently under writeback to this device
> > > +
> > > +dirty_kb (read-only)
> > > +
> > > +	Global threshold for reclaimable + writeback memory
> > > +
> > > +bdi_dirty_kb (read-only)
> > > +
> > > +	Current threshold on this BDI for reclaimable + writeback
> > > +	memory
> > > +
> > 
> > I dunno.  A number of the things which you're exposing are closely tied to
> > present-day kernel implementation and may be irrelevant or even
> > unimplementable in a few years' time.
> 
> Which ones?

I don't know - I misplaced my copy of linux-2.6.44 :)

The whole concept of a BDI might go away, who knows?  Progress in
non-volatile semiconductor storage might make the whole
rotating-platter-with-a-seek-head thing obsolete.

read_ahead_kb is likely to be stable.  writeback_kb is a stable concept
too, although we might lose the ability to keep track of it some time in
the future.

Suppose that /dev/sda and /dev/sdb share the same queue - we lose the ability
to track some of these things?

>  They could possibly be moved to debugfs, or something.
> 
> I agree that sysfs should be relatively stable.

This does look more like a debugging feature than a permanently-offered,
support-it-forever part of the kernel ABI.



* Re: [patch 2/6] mm: bdi: export BDI attributes in sysfs
  2008-01-31  9:54       ` Andrew Morton
@ 2008-01-31 10:08         ` Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2008-01-31 10:08 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Miklos Szeredi, linux-kernel, linux-fsdevel, linux-mm,
	kay.sievers, greg, trond.myklebust


On Thu, 2008-01-31 at 01:54 -0800, Andrew Morton wrote:
> On Thu, 31 Jan 2008 10:39:02 +0100 Miklos Szeredi <miklos@szeredi.hu> wrote:
> 
> > > On Tue, 29 Jan 2008 16:49:02 +0100
> > > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > > 
> > > > From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> > > > 
> > > > Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info
> > > > object.  This allows us to see and set the various BDI specific
> > > > variables.
> > > > 
> > > > In particular this properly exposes the read-ahead window for all
> > > > relevant users and /sys/block/<block>/queue/read_ahead_kb should be
> > > > deprecated.
> > > 
> > > This description is not complete.  It implies that the readahead window is
> > > not "properly" exposed for some "relevant" users.  The reader is left
> > > wondering what on earth this is referring to.  I certainly don't know.
> > > Perhaps when this information is revealed, we can work out what was
> > > wrong with per-queue readahead tuning.
> > 
> > I think Peter meant that the readahead window was only exposed for
> > block devices, and not things like NFS or FUSE.
> 
> OK.

And queue-less block devices like loopback, md/dm and whatnot.

> > 
> > > > +blk-NAME
> > > > +
> > > > +	Block devices, NAME is 'sda', 'loop0', etc...
> > > 
> > > But if I've done `mknod /dev/pizza-party 8 0', I'm looking for
> > > blk-pizza-party, not blk-sda.
> > > 
> > > But I might still have /dev/sda, too.
> > 
> > An alternative would be to uniformly use MAJOR:MINOR in there.  It
> > would work for block devices and anonymous devices (NFS/FUSE) as well.
> > 
> > Would that be any better?
> 
> I suppose so.  sysfs likes to use symlinks to point over at related
> things in different directories...

Yeah, I think that would work best. It's more consistent as well.

> > > 
> > > > +FSTYPE-MAJOR:MINOR
> > > > +
> > > > +	Non-block device backed filesystems which provide their own
> > > > +	BDI, such as NFS and FUSE.  MAJOR:MINOR is the value of st_dev
> > > > +	for files on this filesystem.
> > > > +
> > > > +default
> > > > +
> > > > +	The default backing dev, used for non-block device backed
> > > > +	filesystems which do not provide their own BDI.
> > > > +
> > > > +Files under /sys/class/bdi/<bdi>/
> > > > +---------------------------------
> > > > +
> > > > +read_ahead_kb (read-write)
> > > > +
> > > > +	Size of the read-ahead window in kilobytes
> > > > +
> > > > +reclaimable_kb (read-only)
> > > > +
> > > > +	Reclaimable (dirty or unstable) memory destined for writeback
> > > > +	to this device
> > > > +
> > > > +writeback_kb (read-only)
> > > > +
> > > > +	Memory currently under writeback to this device
> > > > +
> > > > +dirty_kb (read-only)
> > > > +
> > > > +	Global threshold for reclaimable + writeback memory
> > > > +
> > > > +bdi_dirty_kb (read-only)
> > > > +
> > > > +	Current threshold on this BDI for reclaimable + writeback
> > > > +	memory
> > > > +
> > > 
> > > I dunno.  A number of the things which you're exposing are closely tied to
> > > present-day kernel implementation and may be irrelevant or even
> > > unimplementable in a few years' time.
> > 
> > Which ones?
> 
> I don't know - I misplaced my copy of linux-2.6.44 :)
> 
> The whole concept of a BDI might go away, who knows?  Progress in
> non-volatile semiconductor storage might make the whole
> rotating-platter-with-a-seek-head thing obsolete.
> 
> read_ahead_kb is likely to be stable.  writeback_kb is a stable concept
> too, although we might lose the ability to keep track of it some time in
> the future.
> 
> Suppose that /dev/sda and /dev/sdb share the same queue - we lose the ability
> to track some of these things?
> 
> >  They could possibly be moved to debugfs, or something.
> > 
> > I agree that sysfs should be relatively stable.
> 
> This does look more like a debugging feature than a permanently-offered,
> support-it-forever part of the kernel ABI.

Agreed, all except the read_ahead tunable are debugish. The min/max
things are real tunables though. (writing up a little text on the
why/how of those as we speak - well, write)



* Re: [patch 6/6] mm: bdi: allow setting a maximum for the bdi dirty limit
  2008-01-31  9:46     ` Miklos Szeredi
@ 2008-01-31 10:17       ` Peter Zijlstra
  0 siblings, 0 replies; 19+ messages in thread
From: Peter Zijlstra @ 2008-01-31 10:17 UTC (permalink / raw)
  To: Miklos Szeredi; +Cc: akpm, linux-kernel, linux-fsdevel, linux-mm


On Thu, 2008-01-31 at 10:46 +0100, Miklos Szeredi wrote:
> > On Tue, 29 Jan 2008 16:49:06 +0100
> > Miklos Szeredi <miklos@szeredi.hu> wrote:
> > 
> > > Add "max_ratio" to /sys/class/bdi.  This indicates the maximum
> > > percentage of the global dirty threshold allocated to this bdi.
> > 
> > Maybe I'm having a stupid day, but I don't understand the semantics of this
> > min and max at all.  I've read the code, and I've read the comments (well,
> > I've hunted for some) and I've read the docs.
> > 
> > I really don't know how anyone could use this in its current state without
> > doing a lot of code-reading and complex experimentation.  All of which
> > would be unneeded if this tunable was properly documented.
> > 
> > So.  Please provide adequate documentation for this tunable.  I'd suggest
> > that it be pitched at the level of a reasonably competent system operator. 
> > It should help them understand why the tunable exists, why they might
> > choose to alter it, and what effects they can expect to see.  Hopefully a
> > reasonably competent kernel developer can then understand it too.
> 
> OK.  I think what's missing from the docs is a high-level
> description of the per-bdi throttling algorithm and how it affects
> writeback.  With that info, I think the min and max ratios are
> trivially understandable: they just override the result of the
> algorithm whenever it would yield too high or too low a threshold.
> 
> Peter, could you write something about that?

Sure.

How about something like:

Under normal circumstances each device is given a part of the total
write-back cache that relates to its current average writeout speed
relative to the other devices.

min_ratio - allows one to assign a minimum portion of the write-back
cache to a particular device. This is useful in situations where you
might want to provide a minimum QoS. (One request for this feature came
from flash-based storage people who wanted to avoid writing out at all
costs; they of course needed some pdflush hacks as well.)

max_ratio - allows one to assign a maximum portion of the dirty limit to
a particular device. This is useful in situations where you want to
avoid one device taking all or most of the write-back cache. E.g. an NFS
mount that is prone to get stuck, or a FUSE mount which you don't trust
to play fair.
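The clamping described here corresponds to the get_dirty_limits() hunk in patch 6/6. The sketch below is a userspace model of that arithmetic only: model_bdi_dirty() and its parameters are hypothetical, the caller supplies the device's writeout share as numerator/denominator instead of the kernel's floating proportions, and the first step (scaling the global limit by 100 minus the sum of all min_ratio values) comes from code just above the quoted hunk, so treat it as an assumption. The later clip_bdi_dirty_limit() adjustment is left out.

```c
#include <assert.h>
#include <stdint.h>

/*
 * dirty:         global dirty threshold, in pages
 * sum_min_ratio: sum of min_ratio over all devices
 * numerator/denominator: this device's share of recent writeout
 */
static unsigned long model_bdi_dirty(unsigned long dirty,
				     unsigned int sum_min_ratio,
				     unsigned long numerator,
				     unsigned long denominator,
				     unsigned int min_ratio,
				     unsigned int max_ratio)
{
	uint64_t bdi_dirty, cap;

	assert(denominator > 0 && numerator <= denominator);
	assert(sum_min_ratio < 100 && min_ratio <= max_ratio && max_ratio <= 100);

	/* Proportional part: share of the limit not reserved by min_ratio. */
	bdi_dirty = (uint64_t)dirty * (100 - sum_min_ratio) / 100;
	bdi_dirty = bdi_dirty * numerator / denominator;

	/* min_ratio: a guaranteed floor added on top of the proportional part. */
	bdi_dirty += (uint64_t)dirty * min_ratio / 100;

	/* max_ratio: a hard ceiling on the result. */
	cap = (uint64_t)dirty * max_ratio / 100;
	if (bdi_dirty > cap)
		bdi_dirty = cap;

	return (unsigned long)bdi_dirty;
}
```

With dirty = 1000 pages and a device doing half of all writeout, the model yields 500 pages; max_ratio = 30 caps that at 300. If this device alone has min_ratio = 10, it gets 450 + 100 = 550, and an idle device with the same setting still keeps 100.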



* Re: [patch 2/6] mm: bdi: export BDI attributes in sysfs
  2008-01-29 15:49 ` [patch 2/6] mm: bdi: export BDI attributes in sysfs Miklos Szeredi
  2008-01-29 17:39   ` Greg KH
  2008-01-31  0:28   ` Andrew Morton
@ 2008-02-29 11:26   ` Andrew Morton
  2 siblings, 0 replies; 19+ messages in thread
From: Andrew Morton @ 2008-02-29 11:26 UTC (permalink / raw)
  To: Miklos Szeredi
  Cc: a.p.zijlstra, linux-kernel, linux-fsdevel, linux-mm, Kay Sievers,
	Greg KH, Trond Myklebust, linux-ia64

On Tue, 29 Jan 2008 16:49:02 +0100 Miklos Szeredi <miklos@szeredi.hu> wrote:

> From: Peter Zijlstra <a.p.zijlstra@chello.nl>
> 
> Provide a place in sysfs (/sys/class/bdi) for the backing_dev_info
> object.  This allows us to see and set the various BDI specific
> variables.
> 
> In particular this properly exposes the read-ahead window for all
> relevant users and /sys/block/<block>/queue/read_ahead_kb should be
> deprecated.
> 
> With patient help from Kay Sievers and Greg KH
> 
> [mszeredi@suse.cz]
> 
>  - split off NFS and FUSE changes into separate patches
>  - document new sysfs attributes under Documentation/ABI
>  - do bdi_class_init as a core_initcall, otherwise the "default" BDI
>    won't be initialized
>  - remove bdi_init_fmt macro, it's not used very much

please always provide diffstats.

 Documentation/ABI/testing/sysfs-class-bdi |   50 +++++++++++++                
 block/genhd.c                             |    3 
 include/linux/backing-dev.h               |    8 ++
 include/linux/writeback.h                 |    3 
 lib/percpu_counter.c                      |    1 
 mm/backing-dev.c                          |  108 ++++++++++++++++++++++++++++++
 mm/page-writeback.c                       |    2 
 mm/readahead.c                            |    8 +-
 8 files changed, 181 insertions(+), 2 deletions(-)

would you believe this breaks ia64 allmodconfig, in the usual place:

In file included from arch/ia64/ia32/sys_ia32.c:59:
arch/ia64/ia32/ia32priv.h:342:1: warning: "SET_PERSONALITY" redefined
In file included from include/linux/elf.h:7,
                 from include/linux/module.h:14,
                 from include/linux/device.h:21,
                 from include/linux/backing-dev.h:15,
                 from include/linux/nfs_fs_sb.h:5,
                 from include/linux/nfs_fs.h:50,
                 from arch/ia64/ia32/sys_ia32.c:35:
include/asm/elf.h:180:1: warning: this is the location of the previous definition


We keep on hitting stupid build errors in this area: ia64 and elf.  It is
obviously quite fragile.  It would be nice to fix it properly.


For now, the easy fix:

--- a/include/linux/backing-dev.h~mm-bdi-export-bdi-attributes-in-sysfs-ia64-fix
+++ a/include/linux/backing-dev.h
@@ -12,10 +12,10 @@
 #include <linux/log2.h>
 #include <linux/proportions.h>
 #include <linux/kernel.h>
-#include <linux/device.h>
 #include <asm/atomic.h>
 
 struct page;
+struct device;
 
 /*
  * Bits in backing_dev_info.state

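The easy fix works because backing-dev.h only needs pointers to struct device, and C accepts a pointer to an incomplete type after a bare forward declaration. That avoids the <linux/device.h> include chain, and with it the elf.h redefinition shown in the build log. A standalone illustration; struct bdi_like is a hypothetical stand-in, not the real struct backing_dev_info.

```c
#include <stddef.h>

/* Forward declaration: struct device is an incomplete type from here on. */
struct device;

/* Hypothetical, heavily trimmed stand-in for struct backing_dev_info. */
struct bdi_like {
	unsigned long ra_pages;
	struct device *dev;	/* a pointer to an incomplete type is fine */
};

/* Only the pointer is inspected; struct device's layout is never needed. */
static int bdi_like_has_dev(const struct bdi_like *bdi)
{
	return bdi->dev != NULL;
}
```

Code that actually dereferences bdi->dev still has to include the full definition itself, which keeps the cost of that include local to the users that need it.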

Thread overview: 19+ messages
2008-01-29 15:49 [patch 0/6] mm: bdi: updates Miklos Szeredi
2008-01-29 15:49 ` [patch 1/6] mm: bdi: tweak task dirty penalty Miklos Szeredi
2008-01-31  0:13   ` Andrew Morton
2008-01-29 15:49 ` [patch 2/6] mm: bdi: export BDI attributes in sysfs Miklos Szeredi
2008-01-29 17:39   ` Greg KH
2008-01-31  0:28   ` Andrew Morton
2008-01-31  9:39     ` Miklos Szeredi
2008-01-31  9:54       ` Andrew Morton
2008-01-31 10:08         ` Peter Zijlstra
2008-02-29 11:26   ` Andrew Morton
2008-01-29 15:49 ` [patch 3/6] mm: bdi: expose the BDI object in sysfs for NFS Miklos Szeredi
2008-01-29 15:49 ` [patch 4/6] mm: bdi: expose the BDI object in sysfs for FUSE Miklos Szeredi
2008-01-29 15:49 ` [patch 5/6] mm: bdi: allow setting a minimum for the bdi dirty limit Miklos Szeredi
2008-01-29 15:49 ` [patch 6/6] mm: bdi: allow setting a maximum " Miklos Szeredi
2008-01-31  0:39   ` Andrew Morton
2008-01-31  9:46     ` Miklos Szeredi
2008-01-31 10:17       ` Peter Zijlstra
2008-01-29 17:06 ` [patch 0/6] mm: bdi: updates Peter Zijlstra
2008-01-29 18:32   ` Miklos Szeredi
