From: Andrew Morton @ 2007-07-10  8:31 UTC
To: linux-kernel
Subject: -mm merge plans for 2.6.23


When replying, please rewrite the subject suitably and try to Cc: the
appropriate developer(s).



add-lzo1x-algorithm-to-the-kernel.patch
make-common-helpers-for-seq_files-that-work-with-list_head-s.patch
lots-of-architectures-enable-arbitary-speed-tty-support.patch

 Merge

serial-assert-dtr-for-serial-console-devices.patch

 Don't know.  I worry about Russell's concern (see the changelog)

git-acpi-s390-struct-bin_attribute-changes.patch
cpuidle-add-rating-to-the-governors-and-pick-the-one-with-highest-rating-by-default-fix.patch
exit-acpi-processor-module-gracefully-if-acpi-is-disabled.patch
fix-empty-macros-in-acpi.patch
drivers-acpi-sbsc-remove-dead-code.patch
acpi-enable-c3-power-state-on-dell-inspiron-8200.patch
drivers-acpi-pci_linkc-lower-printk-severity.patch

 Sent to lenb

working-3d-dri-intel-agpko-resume-for-i815-chip.patch

 Sent to davej

cifs-use-simple_prepare_write-to-zero-page-data.patch
cifs-zero_user_page-conversion.patch

 Sent to sfrench

bugfix-cpufreq-in-combination-with-performance-governor.patch
restore-previously-used-governor-on-a-hot-replugged-cpu.patch

 Sent to davej

kcopyd-use-mutex-instead-of-semaphore.patch

 Sent to agk

powerpc-promc-remove-undef-printk.patch
8xx-mpc885ads-pcmcia-support.patch
dts-kill-hardcoded-phandles.patch
ppc-remove-dead-code-for-preventing-pread-and-pwrite-calls.patch
viotape-use-designated-initializers-for-fops-member.patch
make-drivers-char-hvc_consoleckhvcd-static.patch
powerpc-enable-arbitary-speed-tty-ioctls-and-split.patch
powerpc-tlb_32c-build-fix.patch
sky-cpu-and-nexus-code-style-improvement.patch
sky-cpu-and-nexus-include-ioh.patch
sky-cpu-and-nexus-check-for-platform_get_resource-ret.patch
sky-cpu-and-nexus-check-for-create_proc_entry-ret-code.patch
sky-cpu-use-c99-style-for-struct-init.patch

 Sent to paulus

revert-gregkh-driver-block-device.patch
driver-core-check-return-code-of-sysfs_create_link.patch
driver-core-coding-style-cleanup.patch
pm-do-not-use-saved_state-from-struct-dev_pm_info-on-arm.patch
nozomi-remove-termios-checks-from-various-old-char-serial-drivers.patch

 Sent to greg

git-dvb-saa7134-tvaudio-fix.patch
dvb_en_50221-convert-to-kthread-api.patch

 Sent to mchehab

hdaps-switch-to-using-input-polldev.patch
applesmc-switch-to-using-input-polldev.patch
applesmc-add-temperature-sensors-set-for-macbook.patch
ams-switch-to-using-input-polldev.patch

 Sent to mhoffman

sn-correct-rom-resource-length-for-bios-copy.patch

 Sent to Tony

make-input-layer-use-seq_list_xxx-helpers.patch
touchscreen-fujitsu-touchscreen-driver.patch
serio_raw_read-warning-fix.patch
tsdev-fix-broken-usecto-millisecs-conversion.patch

 Sent to Dmitry

use-posix-bre-in-headers-install-target.patch
modpost-white-list-pattern-adjustment.patch
strip-config_-automatically-in-kernel-configuration-search.patch
fix-the-warning-when-running-make-tags.patch
kconfig-reset-generated-values-only-if-kconfig-and-config-agree.patch

 Sent to Sam

led_colour_show-warning-fix.patch

 Sent to rpurdie

libata-config_pm=n-compile-fix.patch
pata_acpi-restore-driver.patch
libata-core-convert-to-use-cancel_rearming_delayed_work.patch
libata-implement-ata_wait_after_reset.patch
sata_promise-sata-hotplug-support.patch
libata-add-irq_flags-to-struct-pata_platform_info-fix.patch
ata-add-the-sw-ncq-support-to-sata_nv-for-mcp51-mcp55-mcp61.patch
sata_nv-allow-changing-queue-depth.patch
pata_hpt3x3-major-reworking-and-testing.patch
iomap-sort-out-the-broken-address-reporting-caused-by-the-iomap-layer.patch
ata-use-iomap_name.patch

 Sent to jgarzik

libata-check-for-an-support.patch
scsi-expose-an-to-user-space.patch
libata-expose-an-to-user-space.patch
scsi-save-disk-in-scsi_device.patch
libata-send-event-when-an-received.patch

 Am sitting on these due to confusion regarding the status of the ata-ahci
 patches.

ata-ahci-alpm-store-interrupt-value.patch
ata-ahci-alpm-expose-power-management-policy-option-to-users.patch
ata-ahci-alpm-enable-link-power-management-for-ata-drivers.patch
ata-ahci-alpm-enable-aggressive-link-power-management-for-ahci-controllers.patch

 These appear to need some work.

libata-add-human-readable-error-value-decoding.patch
libata-fix-hopefully-all-the-remaining-problems-with.patch
testing-patch-for-ali-pata-fixes-hopefully-for-the-problems-with-atapi-dma.patch
pata_ali-more-work.patch

 Dead/dying/abandoned ata things.  Might drop.

mips-make-resources-for-ds1742-static-__initdata.patch

 Sent to Ralf.

tty-add-the-new-ioctls-and-definitionto-the-mips.patch

 Awaiting merge of
 lots-of-architectures-enable-arbitary-speed-tty-support.patch

mmc-at91_mci-typo.patch

 Sent to drzeus

mtd-onenand-build-fix.patch
nommu-present-backing-device-capabilities-for-mtd.patch
nommu-add-support-for-direct-mapping-through-mtdconcat.patch
nommu-make-it-possible-for-romfs-to-use-mtd-devices.patch
romfs-printk-format-warnings.patch
mtd-add-module-license-to-mtdbdi.patch

 Sent to dwmw2

8139too-force-media-setting-fix.patch
blackfin-on-chip-ethernet-mac-controller-driver.patch
atari_pamsnetc-old-declaration-ritchie-style-fix.patch
sundance-phy-address-form-0-only-for-device-id-0x0200.patch
use-is_power_of_2-in-cxgb3-cxgb3_mainc.patch
use-is_power_of_2-in-myri10ge-myri10gec.patch
3csoho100-tx-needs-extra_preamble.patch

 Sent to jgarzik

3x59x-fix-pci-resource-management.patch
update-smc91x-driver-with-arm-versatile-board-info.patch
drivers-net-ns83820c-add-paramter-to-disable-auto.patch

 netdev patches which are stuck in limbo land.

make-atm-driver-use-seq_list_xxx-helpers.patch
make-some-network-related-proc-files-use-seq_list_xxx.patch
wrong-timeout-value-in-sk_wait_data-v2-fix.patch
use-mutex-instead-of-semaphore-in-vlsi-82c147-irda-controller-driver.patch
bonding-bond_mainc-make-2-functions-static.patch
net-make-struct-dccp_li_cachep-static.patch
net-ipv4-netfilter-ip_tablesc-lower-printk-severity.patch
rpc-remove-makefile-reference-to-obsolete-rxrpc-config.patch

 Sent to davem (mostly merged now, I think)

bluetooth-remove-the-redundant-non-seekable-llseek-method.patch
rfcomm-hangup-ttys-before-releasing-rfcomm_dev.patch

 Sent to Marcel

git-ioat-vs-git-md-accel.patch
ioat-warning-fix.patch
fix-i-oat-for-kexec.patch

 I don't seem to be able to get rid of these.  Chris Leech appears to have
 vanished.

auth_gss-unregister-gss_domain-when-unloading-module.patch

 Sent to Trond and Bruce, needs work.

pa-risc-use-page-allocator-instead-of-slab-allocator.patch

 Sent to Kyle

pcmcia-delete-obsolete-pcmcia_ioctl-feature.patch
use-menuconfig-objects-pcmcia.patch

 Am a bit stuck with the pcmcia patches.  Dominik has disappeared.

pcmcia-pccard-deadlock-fix.patch

 I think this isn't a good patch.  Am holding onto it as a reminder that
 pcmcia deadlocks.

dont-optimise-away-baud-rate-changes-when-bother-is-used.patch
serial-add-support-for-ite-887x-chips.patch
serial_txx9-fix-modem-control-line-handling.patch
serial_txx9-cleanup-includes.patch

 Serial stuff.  Will run these past rmk and Alan and will merge them if they
 survive.

revert-gregkh-pci-pci_bridge-device.patch
fix-gregkh-pci-pci-syscallc-switch-to-refcounting-api.patch
pci-x-pci-express-read-control-interfaces-fix.patch
remove-pci_dac_dma_-apis.patch
round_up-macro-cleanup-in-drivers-pci.patch
pcie-remove-spin_lock_unlocked.patch
add-pci_try_set_mwi.patch
cpci_hotplug-convert-to-use-the-kthread-api.patch
pci_set_power_state-check-for-pm-capabilities-earlier.patch

 Sent to Greg

s390-rename-cpu_idle-to-s390_cpu_idle.patch

 Sent to Martin.

restore-acpi-change-for-scsi.patch
git-scsi-misc-vs-greg-sysfs-stuff.patch
aacraid-rename-check_reset.patch
scsi-dont-build-scsi_dma_mapunmap-for-has_dma.patch
drivers-scsi-small-cleanups.patch
sym53c8xx_2-claims-cpqarray-device.patch
drivers-scsi-wd33c93c-cleanups.patch
make-seagate_st0x_detect-static.patch
pci-error-recovery-symbios-scsi-base-support.patch
pci-error-recovery-symbios-scsi-first-failure.patch
drivers-scsi-pcmcia-nsp_csc-remove-kernel-24-code.patch
drivers-message-i2o-devicec-remove-redundant-gfp_atomic-from-kmalloc.patch
drivers-scsi-aic7xxx_oldc-remove-redundant-gfp_atomic-from-kmalloc.patch
use-menuconfig-objects-ii-scsi.patch
remove-dead-references-to-module_parm-macro.patch
ppa-coding-police-and-printk-levels.patch
remove-the-dead-cyberstormiii_scsi-option.patch
config_scsi_fd_8xx-no-longer-exists.patch
use-mutex-instead-of-semaphore-in-megaraid-mailbox-driver.patch

 Sent to James.

scsi-lpfc-lpfc_initc-remove-unused-variable.patch

 Will add to the James queue once add-pci_try_set_mwi.patch is merged.

use-menuconfig-objects-block-layer.patch
use-menuconfig-objects-ib-block.patch
use-menuconfig-objects-ii-block-devices.patch
block-device-elevator-use-list_for_each_entry-instead-of-list_for_each.patch
update-documentation-block-barriertxt.patch

 Sent to Jens.

videopix-frame-grabber-fix-unreleased-lock-in-vfc_debug.patch

 Sent to davem

fix-gregkh-usb-usb-ehci-cpufreq-fix.patch
fix-gregkh-usb-usb-use-menuconfig-objects.patch
make-usb-autosuspend-timer-1-sec-jiffy-aligned.patch
drivers-block-ubc-use-list_for_each_entry.patch
ftdi_sio-fix-something.patch
usb-make-the-usb_device-numa_node-to-get-assigned-from.patch
mos7840c-turn-this-into-a-serial-driver.patch
pl2303-remove-bogus-checks-and-fix-speed-support-to-use.patch
visor-and-whiteheat-remove-bogus-termios-change-checks.patch
mos7720-remove-bogus-no-termios-change-check.patch
io_-remove-bogus-termios-no-change-checks.patch
usb-remove-makefile-reference-to-obsolete-ohci_at91.patch

 Sent to Greg.

use-list_for_each_entry-for-iteration-in-prism-54-driver.patch

 Sent to linville

revert-x86_64-mm-verify-cpu-rename.patch
add-kstrndup-fix.patch
xen-build-fix.patch
fix-x86_64-numa-fake-apicid_to_node-mapping-for-fake-numa-2.patch
fix-x86_64-mm-xen-xen-smp-guest-support.patch
more-fix-x86_64-mm-xen-xen-smp-guest-support.patch
fix-x86_64-mm-sched-clock-share.patch
fix-x86_64-mm-xen-add-xen-virtual-block-device-driver.patch
fix-x86_64-mm-add-common-orderly_poweroff.patch
fix-x86_64-mm-xen-xen-event-channels.patch
arch-i386-xen-mmuc-must-include-linux-schedh.patch
tidy-up-usermode-helper-waiting-a-bit-fix.patch
update-x86_64-mm-xen-use-iret-directly-where-possible.patch
i386-add-support-for-picopower-irq-router.patch
make-arch-i386-kernel-setupcremapped_pgdat_init-static.patch
arch-i386-kernel-i8253c-should-include-asm-timerh.patch
make-arch-i386-kernel-io_apicctimer_irq_works-static-again.patch
quicklist-support-for-x86_64.patch
x86_64-extract-helper-function-from-e820_register_active_regions.patch
x86_64-fix-e820_hole_size-based-on-address-ranges.patch
x86_64-acpi-disable-srat-when-numa-emulation-succeeds.patch
x86_64-slit-fake-pxm-to-node-mapping-for-fake-numa-2.patch
x86_64-numa-fake-apicid_to_node-mapping-for-fake-numa-2.patch
x86-use-elfnoteh-to-generate-vsyscall-notes-fix.patch
mmconfig-x86_64-i386-insert-unclaimed-mmconfig-resources.patch
x86_64-fix-smp_call_function_single-return-value.patch
x86_64-o_excl-on-dev-mcelog.patch
x86_64-support-poll-on-dev-mcelog.patch
x86_64-mcelog-tolerant-level-cleanup.patch
x86_64-mce-poll-at-idle_start-and-printk-fix.patch
i386-fix-machine-rebooting.patch
x86-fix-section-mismatch-warnings-in-mtrr.patch
x86_64-ratelimit-segfault-reporting-rate.patch
x86_64-pm_trace-support.patch
make-alt-sysrq-p-display-the-debug-register-contents.patch
i386-flush_tlb_kernel_range-add-reference-to-the-arguments.patch
round_jiffies-for-i386-and-x86-64-non-critical-corrected-mce-polling.patch
pci-disable-decode-of-io-memory-during-bar-sizing.patch
mmconfig-validate-against-acpi-motherboard-resources.patch
x86_64-irq-check-remote-irr-bit-before-migrating-level-triggered-irq-v3.patch
i386-remove-support-for-the-rise-cpu.patch
x86-64-calgary-generalize-calgary_increase_split_completion_timeout.patch
x86-64-calgary-update-copyright-notice.patch
x86-64-calgary-introduce-handle_quirks-for-various-chipset-quirks.patch
x86-64-calgary-introduce-chipset-specific-ops.patch
x86-64-calgary-abstract-how-we-find-the-iommu_table-for-a-device.patch
x86-64-calgary-introduce-calioc2-support.patch
x86-64-calgary-add-chip_ops-and-a-quirk-function-for-calioc2.patch
x86-64-calgary-implement-calioc2-tce-cache-flush-sequence.patch
x86-64-calgary-make-dump_error_regs-a-chip-op.patch
x86-64-calgary-grab-plssr-too-when-a-dma-error-occurs.patch
x86-64-calgary-reserve-tces-with-the-same-address-as-mem-regions.patch
x86-64-calgary-cleanup-of-unneeded-macros.patch
x86-64-calgary-tabify-and-trim-trailing-whitespace.patch
x86-64-calgary-only-reserve-the-first-1mb-of-io-space-for-calioc2.patch
x86-64-calgary-tidy-up-debug-printks.patch
i386-make-arch-i386-mm-pgtablecpgd_cdtor-static.patch
i386-fix-section-mismatch-warning-in-intel_cacheinfo.patch
i386-do-not-restore-reserved-memory-after-hibernation.patch
paravirt-helper-to-disable-all-io-space-fix.patch
dmi_match-patch-in-rebootc-for-sff-dell-optiplex-745-fixes-hang.patch
i386-hpet-check-if-the-counter-works.patch
i386-trim-memory-not-covered-by-wb-mtrrs.patch
kprobes-x86_64-fix-for-mark-ro-data.patch
kprobes-i386-fix-for-mark-ro-data.patch
divorce-config_x86_pae-from-config_highmem64g.patch
remove-unneeded-test-of-task-in-dump_trace.patch
i386-move-the-kernel-to-16mb-for-numa-q.patch
i386-show-unhandled-signals.patch
i386-minor-nx-handling-adjustment.patch
x86-smp-alt-once-option-is-only-useful-with-hotplug_cpu.patch
x86-64-remove-unused-variable-maxcpus.patch
move-functions-declarations-to-header-file.patch
x86_64-during-vm-oom-condition.patch
i386-during-vm-oom-condition.patch
x86-64-disable-the-gart-in-shutdown.patch
x86_84-move-iommu-declaration-from-proto-to-iommuh.patch
i386-uaccessh-replace-hard-coded-constant-with-appropriate-macro-from-kernelh.patch
i386-add-cpu_relax-to-cmos_lock.patch
x86_64-flush_tlb_kernel_range-warning-fix.patch
x86_64-add-ioapic-nmi-support.patch
x86_64-change-_map_single-to-static-in-pci_gartc-etc.patch
x86_64-geode-hw-random-number-generator-depend-on-x86_3.patch
x86_64-fix-wrong-comment-regarding-set_fixmap.patch
arch-x86_64-kernel-processc-lower-printk-severity.patch
nohz-fix-nohz-x86-dyntick-idle-handling.patch
acpi-move-timer-broadcast-and-pmtimer-access-before-c3-arbiter-shutdown.patch
clockevents-fix-typo-in-acpi_pmc.patch
timekeeping-fixup-shadow-variable-argument.patch
timerc-cleanup-recently-introduced-whitespace-damage.patch
clockevents-remove-prototypes-of-removed-functions.patch
clockevents-fix-resume-logic.patch
clockevents-fix-device-replacement.patch
tick-management-spread-timer-interrupt.patch
highres-improve-debug-output.patch
hrtimer-speedup-hrtimer_enqueue.patch
pcspkr-use-the-global-pit-lock.patch
ntp-move-the-cmos-update-code-into-ntpc.patch
i386-pit-stop-only-when-in-periodic-or-oneshot-mode.patch
i386-remove-volatile-in-apicc.patch
i386-hpet-assumes-boot-cpu-is-0.patch
i386-move-pit-function-declarations-and-constants-to-correct-header-file.patch
x86_64-untangle-asm-hpeth-from-asm-timexh.patch
x86_64-use-generic-cmos-update.patch
x86_64-remove-dead-code-and-other-janitor-work-in-tscc.patch
x86_64-fix-apic-typo.patch
x86_64-convert-to-cleckevents.patch
acpi-remove-the-useless-ifdef-code.patch
x86_64-hpet-restore-vread.patch
x86_64-restore-restore-nohpet-cmdline.patch
x86_64-block-irq-balancing-for-timer.patch
x86_64-prep-idle-loop-for-dynticks.patch
x86_64-enable-high-resolution-timers-and-dynticks.patch
x86_64-dynticks-disable-hpet_id_legsup-hpets.patch
xen-fix-x86-config-dependencies.patch
x86_64-get-mp_bus_to_node-as-early.patch
xen-suppress-abs-symbol-warnings-for-unused-reloc-pointers.patch
xen-cant-support-numa-yet.patch
x86-fix-iounmaps-use-of-vm_structs-size-field.patch
arch-x86_64-kernel-aperturec-lower-printk-severity.patch
arch-x86_64-kernel-e820c-lower-printk-severity.patch
ich-force-hpet-make-generic-time-capable-of-switching-broadcast-timer.patch
ich-force-hpet-restructure-hpet-generic-clock-code.patch
ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable.patch
ich-force-hpet-late-initialization-of-hpet-after-quirk.patch
ich-force-hpet-ich5-quirk-to-force-detect-enable.patch
ich-force-hpet-ich5-fix-a-bug-with-suspend-resume.patch
ich-force-hpet-add-ich7_0-pciid-to-quirk-list.patch
geode-basic-infrastructure-support-for-amd-geode-class.patch
geode-mfgpt-support-for-geode-class-machines.patch
geode-mfgpt-clock-event-device-support.patch
i386-x86_64-insert-hpet-firmware-resource-after-pci-enumeration-has-completed.patch
i386-ioapic-remove-old-irq-balancing-debug-cruft.patch
i386-deactivate-the-test-for-the-dead-config_debug_page_type.patch

 Sent to Andi

fix-xfs_ioc_fsgeometry_v1-in-compat-mode.patch
fix-xfs_ioc__to_handle-and-xfs_ioc_openreadlink_by_handle-in-compat-mode.patch
fix-xfs_ioc_fsbulkstat_single-and-xfs_ioc_fsinumbers-in-compat-mode.patch

 Sent to Tim & David.

xtensa-enable-arbitary-tty-speed-setting-ioctls.patch

 Sent to czankel

kgdb-warning-fix.patch
kgdb-kconfig-fix.patch
kgdb-use-new-style-interrupt-flags.patch
kgdb-section-fix.patch
kgdb_skipexception-warning-fix.patch
kgdb-ia64-fixes.patch
kgdb-bust-on-ia64.patch
kgdb-build-fix-2.patch

 Sent to Jason

pci-x-pci-express-read-control-interfaces-myrinet.patch
pci-x-pci-express-read-control-interfaces-mthca.patch
pci-x-pci-express-read-control-interfaces-e1000.patch
pci-x-pci-express-read-control-interfaces-qla2xxx.patch

 Will send these to maintainers once
 gregkh-pci-pci-add-pci-x-pci-express-read-control-interfaces.patch gets
 merged.

gen_estimator-fix-locking-and-timer-related-bugs.patch
netpoll-fix-a-leak-n-bug-in-netpoll_cleanup.patch

 I think these might be defunct.  Will let the net guys sort that out.

vmscan-give-referenced-active-and-unmapped-pages-a-second-trip-around-the-lru.patch

 This is scary.  Will sit and admire it until it has been demonstrated
 to be a net gain.

console-more-buf-for-index-parsing.patch
console-console-handover-to-preferred-console.patch

 Merge

x86-initial-fixmap-support.patch
serial-convert-early_uart-to-earlycon-for-8250.patch
change-zonelist-order-zonelist-order-selection-logic.patch
hugetlb-remove-unnecessary-nid-initialization.patch
mm-use-div_round_up-in-mm-memoryc.patch
make-proc-slabinfo-use-seq_list_xxx-helpers.patch
mm-alloc_large_system_hash-can-free-some-memory-for.patch
remove-the-deprecated-kmem_cache_t-typedef-from-slabh.patch
slob-rework-freelist-handling.patch
slob-remove-bigblock-tracking.patch
slob-improved-alignment-handling.patch
vmscan-fix-comments-related-to-shrink_list.patch

 mm stuff: will merge.

mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch
mm-merge-populate-and-nopage-into-fault-fixes-nonlinear.patch
mm-merge-nopfn-into-fault.patch
convert-hugetlbfs-to-use-vm_ops-fault.patch
mm-remove-legacy-cruft.patch
mm-debug-check-for-the-fault-vs-invalidate-race.patch
mm-fix-clear_page_dirty_for_io-vs-fault-race.patch
invalidate_mapping_pages-add-cond_resched.patch
ocfs2-release-page-lock-before-calling-page_mkwrite.patch
document-page_mkwrite-locking.patch

 The fault-vs-invalidate race fix.  I have belatedly learned that these need
 more work, so their state is uncertain.

slub-support-slub_debug-on-by-default.patch
numa-mempolicy-dynamic-interleave-map-for-system-init.patch
oom-stop-allocating-user-memory-if-tif_memdie-is-set.patch
numa-mempolicy-trivial-debug-fixes.patch
mm-fix-improper-init-type-section-references.patch
page-table-handling-cleanup.patch
kill-vmalloc_earlyreserve.patch
mm-more-__meminit-annotations.patch
mm-slabc-start_cpu_timer-should-be-__cpuinit.patch
madvise_need_mmap_write-usage.patch
slob-initial-numa-support.patch
mm-page_allocc-lower-printk-severity.patch
mm-avoid-tlb-gather-restarts.patch
mm-remove-ptep_establish.patch
mm-remove-ptep_test_and_clear_dirty-and-ptep_clear_flush_dirty.patch

 mm misc: will merge.

mm-revert-kernel_ds-buffered-write-optimisation.patch
revert-81b0c8713385ce1b1b9058e916edcf9561ad76d6.patch
revert-6527c2bdf1f833cc18e8f42bd97973d583e4aa83.patch
mm-clean-up-buffered-write-code.patch
mm-debug-write-deadlocks.patch
mm-trim-more-holes.patch
mm-buffered-write-cleanup.patch
mm-write-iovec-cleanup.patch
mm-fix-pagecache-write-deadlocks.patch
mm-buffered-write-iterator.patch
fs-fix-data-loss-on-error.patch
fs-introduce-write_begin-write_end-and-perform_write-aops.patch
mm-restore-kernel_ds-optimisations.patch
implement-simple-fs-aops.patch
block_dev-convert-to-new-aops.patch
ext2-convert-to-new-aops.patch
ext3-convert-to-new-aops.patch
ext3-convert-to-new-aops-fix.patch
ext4-convert-to-new-aops.patch
ext4-convert-to-new-aops-fix.patch
xfs-convert-to-new-aops.patch
gfs2-convert-to-new-aops.patch
fs-new-cont-helpers.patch
fat-convert-to-new-aops.patch
#adfs-convert-to-new-aops.patch
hfs-convert-to-new-aops.patch
hfsplus-convert-to-new-aops.patch
hpfs-convert-to-new-aops.patch
bfs-convert-to-new-aops.patch
qnx4-convert-to-new-aops.patch
reiserfs-use-generic-write.patch
reiserfs-convert-to-new-aops.patch
reiserfs-use-generic_cont_expand_simple.patch
with-reiserfs-no-longer-using-the-weird-generic_cont_expand-remove-it-completely.patch
nfs-convert-to-new-aops.patch
smb-convert-to-new-aops.patch
fuse-convert-to-new-aops.patch
hostfs-convert-to-new-aops.patch
jffs2-convert-to-new-aops.patch
ufs-convert-to-new-aops.patch
udf-convert-to-new-aops.patch
sysv-convert-to-new-aops.patch
minix-convert-to-new-aops.patch
jfs-convert-to-new-aops.patch
fs-adfs-convert-to-new-aops.patch
fs-affs-convert-to-new-aops.patch
ocfs2-convert-to-new-aops.patch

 pagefault-in-write deadlock fixes.  Will hold for 2.6.24.

fix-read-truncate-race.patch
make-sure-readv-stops-reading-when-it-hits-end-of-file.patch
fs-remove-some-aop_truncated_page.patch
remove-alloc_zeroed_user_highpage.patch

 Will merge if they're mergeable.

add-a-bitmap-that-is-used-to-track-flags-affecting-a-block-of-pages.patch
add-__gfp_movable-for-callers-to-flag-allocations-from-high-memory-that-may-be-migrated.patch
split-the-free-lists-for-movable-and-unmovable-allocations.patch
choose-pages-from-the-per-cpu-list-based-on-migration-type.patch
add-a-configure-option-to-group-pages-by-mobility.patch
drain-per-cpu-lists-when-high-order-allocations-fail.patch
move-free-pages-between-lists-on-steal.patch
group-short-lived-and-reclaimable-kernel-allocations.patch
group-high-order-atomic-allocations.patch
do-not-group-pages-by-mobility-type-on-low-memory-systems.patch
bias-the-placement-of-kernel-pages-at-lower-pfns.patch
be-more-agressive-about-stealing-when-migrate_reclaimable-allocations-fallback.patch
fix-corruption-of-memmap-on-ia64-sparsemem-when-mem_section-is-not-a-power-of-2.patch
bias-the-location-of-pages-freed-for-min_free_kbytes-in-the-same-max_order_nr_pages-blocks.patch
remove-page_group_by_mobility.patch
dont-group-high-order-atomic-allocations.patch
fix-calculation-in-move_freepages_block-for-counting-pages.patch
breakout-page_order-to-internalh-to-avoid-special-knowledge-of-the-buddy-allocator.patch
do-not-depend-on-max_order-when-grouping-pages-by-mobility.patch
print-out-statistics-in-relation-to-fragmentation-avoidance-to-proc-pagetypeinfo.patch

 Mel's page allocator work.  Might merge this, but I'm still not hearing
 sufficiently convincing noises from a sufficient number of people over this.

create-the-zone_movable-zone.patch
allow-huge-page-allocations-to-use-gfp_high_movable.patch
handle-kernelcore=-generic.patch

 Mel's moveable-zone work.  In a similar situation.  We need to stop whatever
 we're doing and get down and work out what we're going to do with all this
 stuff.

maps2-uninline-some-functions-in-the-page-walker.patch
maps2-eliminate-the-pmd_walker-struct-in-the-page-walker.patch
maps2-remove-vma-from-args-in-the-page-walker.patch
maps2-propagate-errors-from-callback-in-page-walker.patch
maps2-add-callbacks-for-each-level-to-page-walker.patch
maps2-move-the-page-walker-code-to-lib.patch
maps2-simplify-interdependence-of-proc-pid-maps-and-smaps.patch
maps2-move-clear_refs-code-to-task_mmuc.patch
maps2-regroup-task_mmu-by-interface.patch
maps2-make-proc-pid-smaps-optional-under-config_embedded.patch
maps2-make-proc-pid-clear_refs-option-under-config_embedded.patch
maps2-add-proc-pid-pagemap-interface.patch
maps2-add-proc-kpagemap-interface.patch

 The advanced process-memory-inspection interfaces.

 These weren't quite ready for 2.6.22 and nothing has changed in the past
 month or two.  Not looking like 2.6.23 material either.

lumpy-reclaim-v4.patch
have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch
only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch

 Lumpy reclaim.  In a similar situation to Mel's patches.  Stuck due to
 general lack of interest and effort.

mm-clean-up-and-kernelify-shrinker-registration.patch
mm-clean-up-and-kernelify-shrinker-registration-vs-git-nfs.patch

 Merge.

split-mmap.patch
only-allow-nonlinear-vmas-for-ram-backed-filesystems.patch
mm-document-fault_data-and-flags.patch
slub-mm-only-make-slub-the-default-slab-allocator.patch

 Merge.

slub-exploit-page-mobility-to-increase-allocation-order.patch
slub-reduce-antifrag-max-order.patch

 These are slub changes which are dependent on Mel's stuff, and I have a note
 here that there were reports of page allocation failures with these.  What's
 up with that?


 Maybe I should just drop the 100-odd marginal-looking MM patches?  We're
 simply not showing compelling reasons for merging them and quite a lot of them
 are stuck in a 90% complete state.


slub-change-error-reporting-format-to-follow-lockdep-loosely.patch
slub-use-list_for_each_entry-for-loops-over-all-slabs.patch
slub-slab-validation-move-tracking-information-alloc-outside-of.patch
slub-ensure-that-the-object-per-slabs-stays-low-for-high-orders.patch
slub-debug-fix-initial-object-debug-state-of-numa-bootstrap-objects.patch
slab-allocators-consolidate-code-for-krealloc-in-mm-utilc.patch
slab-allocators-consistent-zero_size_ptr-support-and-null-result-semantics.patch
slab-allocators-support-__gfp_zero-in-all-allocators.patch
slub-add-some-more-inlines-and-ifdef-config_slub_debug.patch
slub-extract-dma_kmalloc_cache-from-get_cache.patch
slub-do-proper-locking-during-dma-slab-creation.patch
slub-faster-more-efficient-slab-determination-for-__kmalloc.patch
slub-simplify-dma-index-size-calculation.patch
mm-slubc-make-code-static.patch
slub-style-fix-up-the-loop-to-disable-small-slabs.patch
slub-do-not-use-length-parameter-in-slab_alloc.patch
slab-allocators-cleanup-zeroing-allocations.patch
slab-allocators-replace-explicit-zeroing-with-__gfp_zero.patch
slub-do-not-allocate-object-bit-array-on-stack.patch
slub-move-sysfs-operations-outside-of-slub_lock.patch
slub-fix-config_slub_debug-use-for-config_numa.patch

 Slub stuff.  Will merge whatever's mergeable after the above droppage and
 stalls.

add-vm_bug_on-in-case-someone-uses-page_mapping-on-a-slab-page.patch
mm-make-needlessly-global-hugetlb_no_page-static.patch

 Merge

fs-introduce-some-page-buffer-invariants.patch
nfs-invariant-fix.patch
fs-introduce-some-page-buffer-invariants-obnoxiousness.patch

 Re-review, maybe merge.

memory-unplug-v7-migration-by-kernel.patch
memory-unplug-v7-isolate_lru_page-fix.patch
memory-unplug-v7-memory-hotplug-cleanup.patch
memory-unplug-v7-page-isolation.patch
memory-unplug-v7-page-offline.patch
memory-unplug-v7-ia64-interface.patch

 These are new, and are dependent on Mel's stuff.  Not for 2.6.23.

freezer-make-kernel-threads-nonfreezable-by-default.patch

 Merge, subject to re-review.

implement-file-posix-capabilities.patch
implement-file-posix-capabilities-fix.patch
file-capabilities-introduce-cap_setfcap.patch
file-capabilities-get_file_caps-cleanups.patch
file-caps-update-selinux-xattr-hooks.patch

 file-caps seems to be stuck.  There has been some movement lately; might
 merge it subject to suitable acks from suitable parties.

frv-connect-up-new-syscalls.patch
frv-be-self-consistent-and-use-config_gdb_console-everywhere.patch
frv-remove-some-dead-code.patch

 Merge

blackfin-enable-arbitary-speed-serial-setting.patch

 Will send to Bryan when
 lots-of-architectures-enable-arbitary-speed-tty-support.patch is merged

nommu-stub-expand_stack-for-nommu-case.patch
m68knommu-use-trhead_size-instead-of-hard-constant.patch
m68knommu-remove-cruft-from-setup-code.patch
m68knommu-remove-old-cache-management-cruft-from-mm-code.patch

 Merge

h8300-enable-arbitary-speed-tty-port-setup.patch
h8300-zimage-support-update.patch

 Merge

alpha-fix-trivial-section-mismatch-warnings.patch
fix-alpha-isa-support.patch

 Merge

arm26-enable-arbitary-speed-tty-ioctls-and-split.patch
arm26-remove-broken-and-unused-macro.patch

 Will send to Ian

freezer-run-show_state-when-freezing-times-out.patch
pm-do-not-require-dev-spew-to-get-pm_debug.patch
swsusp-remove-incorrect-code-from-userc.patch
swsusp-remove-code-duplication-between-diskc-and-userc.patch
swsusp-introduce-restore-platform-operations.patch
swsusp-fix-hibernation-code-ordering.patch
hibernation-prepare-to-enter-the-low-power-state.patch
freezer-avoid-freezing-kernel-threads-prematurely.patch
freezer-use-__set_current_state-in-refrigerator.patch
freezer-return-int-from-freeze_processes.patch
freezer-remove-redundant-check-in-try_to_freeze_tasks.patch
pm-introduce-hibernation-and-suspend-notifiers.patch
pm-disable-usermode-helper-before-hibernation-and-suspend.patch
pm-prevent-frozen-user-mode-helpers-from-failing-the-freezing-of-tasks-rev-2.patch
pm-reduce-code-duplication-between-mainc-and-userc-updated.patch
acpi-do-not-prepare-for-hibernation-in-acpi_shutdown.patch
pm-introduce-pm_power_off_prepare.patch
pm-optional-beeping-during-resume-from-suspend-to-ram.patch
pm-integrate-beeping-flag-with-existing-acpi_sleep-flags.patch

 Merge

m32r-enable-arbitary-speed-tty-rate-setting.patch

 Merge

etrax-enable-arbitary-speed-setting-on-tty-ports.patch
cris-replace-old-style-member-inits-with-designated-inits.patch

 Merge

uml-fix-request-sector-update.patch
uml-use-get_free_pages-to-allocate-kernel-stacks.patch
add-generic-exit-time-stack-depth-checking-to-config_debug_stack_usage.patch
uml-debug_shirq-fixes.patch
uml-xterm-driver-tidying.patch
uml-pty-channel-tidying.patch
uml-handle-errors-on-opening-host-side-of-consoles.patch
uml-sigio-support-cleanup.patch
uml-simplify-helper-stack-handling.patch
uml-eliminate-kernel-allocator-wrappers.patch

 Merge

v850-enable-arbitary-speed-tty-ioctls.patch

 Merge

deprecate-smbfs-in-favour-of-cifs.patch

 Send to sfrench

cpuset-remove-sched-domain-hooks-from-cpusets.patch

 Stuck.

clone-flag-clone_parent_tidptr-leaves-invalid-results-in-memory.patch

 ebiederm no likee.  Stuck.

cache-pipe-buf-page-address-for-non-highmem-arch.patch

 Ugly, will probably drop.

fix-rmmod-read-write-races-in-proc-entries.patch

 Merge

more-scheduled-oss-driver-removal.patch
doc-kernel-parameters-use-x86-32-tag-instead-of-ia-32.patch
introduce-write_trylock_irqsave.patch
use-write_trylock_irqsave-in-ptrace_attach.patch
use-menuconfig-objects-ii-auxdisplay.patch
use-menuconfig-objects-ii-edac.patch
use-menuconfig-objects-ii-ipmi.patch
use-menuconfig-objects-ii-misc-strange-dev.patch
use-menuconfig-objects-ii-module-menu.patch
use-menuconfig-objects-ii-oprofile.patch
use-menuconfig-objects-ii-telephony.patch
use-menuconfig-objects-ii-tpm.patch
use-menuconfig-objects-connector.patch
use-menuconfig-objects-crypto-hw.patch
use-menuconfig-objects-i2o.patch
use-menuconfig-objects-parport.patch
use-menuconfig-objects-pnp.patch
use-menuconfig-objects-w1.patch
fix-jvc-cdrom-drive-lockup.patch
use-no_pci_devices-in-pci-searchc.patch
introduce-boot-based-time.patch
use-boot-based-time-for-process-start-time-and-boot-time.patch
use-boot-based-time-for-uptime-in-proc.patch
udf-check-for-allocated-memory-for-data-of-new-inodes.patch
add-argv_split-fix.patch
add-common-orderly_poweroff-fix.patch
prevent-an-o_ndelay-writer-from-blocking-when-a-tty-write-is-blocked-by.patch
udf-check-for-allocated-memory-for-inode-data-v2.patch
fix-stop_machine_run-problem-with-naughty-real-time-process.patch
cpu-hotplug-fix-ksoftirqd-termination-on-cpu-hotplug-with-naughty-realtime-process.patch
use-mutexes-instead-of-semaphores-in-i2o-driver.patch
fuse-warning-fix.patch
vxfs-warning-fixes.patch
percpu_counters-use-cpu-notifiers.patch
percpu_counters-use-for_each_online_cpu.patch
make-afs-use-seq_list_xxx-helpers.patch
make-crypto-api-use-seq_list_xxx-helpers.patch
make-proc-misc-use-seq_list_xxx-helpers.patch
make-proc-modules-use-seq_list_xxx-helpers.patch
make-proc-tty-drivers-use-seq_list_xxx-helpers.patch
make-proc-self-mountstats-use-seq_list_xxx-helpers.patch
make-nfs-client-use-seq_list_xxx-helpers.patch
fat-gcc-43-warning-fix.patch
remove-unnecessary-includes-of-spinlockh-under-include-linux.patch
drivers-block-z2ram-remove-true-false-defines.patch
fix-compiler-warnings-in-acornc.patch
update-zilog-timeout.patch
edd-switch-to-pci_get-based-api.patch
fix-up-codingstyle-in-isofs.patch
define-config_bounce-to-avoid-useless-inclusion-of-bounce-buffer.patch
mpu401-warning-fixes.patch
introduce-config_virt_to_bus.patch
pie-randomization.patch
remove-unused-tif_notify_resume-flag.patch
rocketc-fix-unchecked-mutex_lock_interruptible.patch
only-send-sigxfsz-when-exceeding-rlimits.patch
procfs-directory-entry-cleanup.patch
8xx-fix-whitespace-and-indentation.patch
vdso-print-fatal-signals.patch
rtc-ratelimit-lost-interrupts-message.patch
reduce-cpusetc-write_lock_irq-to-read_lock.patch
char-n_hdlc-allow-restartsys-retval-of-tty-write.patch
afs-implement-file-locking.patch
tty_io-use-kzalloc.patch
remove-clockevents_releaserequest_device.patch
kconfig-no-strange-misc-devices.patch
afs-drop-explicit-extern.patch
remove-useless-tolower-in-isofs.patch
char-mxser_new-fix-sparse-warning.patch
char-tty_ioctl-use-wait_event_interruptible_timeout.patch
char-tty_ioctl-little-whitespace-cleanup.patch
char-genrtc-use-wait_event_interruptible.patch
char-n_r3964-use-wait_event_interruptible.patch
char-ip2-use-msleep-for-sleeping.patch
proc-environ-wrong-placing-of-ptrace_may_attach-check.patch
udf-coding-style-conversion-lindent.patch
ext2-fix-a-comment-when-ext2_release_file-is-called.patch
mutex_unlock-later-in-seq_lseek.patch
zs-move-to-the-serial-subsystem.patch
fs-block_devc-use-list_for_each_entry.patch
fault-injection-add-min-order-parameter-to-fail_page_alloc.patch
fault-injection-fix-example-scripts-in-documentation.patch
add-printktime-option-deprecate-time.patch
fs-clarify-dummy-member-in-struct.patch
dma-mapping-prevent-dma-dependent-code-from-linking-on.patch
remove-odd-and-misleading-comments-from-uioh.patch
add-a-flag-to-indicate-deferrable-timers-in-proc-timer_stats.patch
buffer-kill-old-incorrect-comment.patch
introduce-o_cloexec-take-2.patch
o_cloexec-for-scm_rights.patch
init-wait-for-asynchronously-scanned-block-devices.patch
atmel_serial-fix-break-handling.patch
documentation-proc-pid-stat-files.patch
seq_file-more-atomicity-in-traverse.patch
lib-add-idr_for_each.patch
lib-add-idr_remove_all.patch
remove-capabilityh-from-mmh.patch
kernel-utf-8-handling.patch
remove-sonypi_camera_command.patch
drop-an-empty-isicomh-from-being-exported-to-user-space.patch
ext3-ext4-orphan-list-check-on-destroy_inode.patch
ext3-ext4-orphan-list-corruption-due-bad-inode.patch
remove-apparently-useless-commented-apm_get_battery_status.patch
taskstats-add-context-switch-counters.patch
sony-laptop-use-null-for-pointer.patch
undeprecate-raw-driver.patch
hfsplus-change-kmalloc-memset-to-kzalloc.patch
submitchecklist-update-fix-spelling-error.patch
add-support-for-xilinx-systemace-compactflash-interface.patch
fix-typo-in-prefetchh.patch
zsc-drain-the-transmission-line.patch
hugetlbfs-use-lib-parser-fix-docs.patch
report-that-kernel-is-tainted-if-there-were-an-oops-before.patch
intel-rng-undo-mess-made-by-an-80-column-extremist.patch
improve-behaviour-of-spurious-irq-detect.patch
audit-add-tty-input-auditing.patch
remove-config_uts_ns-and-config_ipc_ns.patch

 Merge, subject to re-review.

user-namespace-add-the-framework.patch

 I still think the magical root-user thing in here is odd and perhaps poorly
 thought-out.

user-namespace-add-unshare.patch
revert-vanishing-ioctl-handler-debugging.patch
binfmt_elf-warning-fix.patch
document-the-fact-that-rcu-callbacks-can-run-in-parallel.patch
cobalt-remove-all-references-to-cobalt-nvram.patch
allow-softlockup-to-be-runtime-disabled.patch
dirty_writeback_centisecs_handler-cleanup.patch
mm-fix-create_new_namespaces-return-value.patch
add-a-kmem_cache-for-nsproxy-objects.patch
ptrace_peekdata-consolidation.patch
ptrace_pokedata-consolidation.patch
adjust-nosmp-handling.patch
ext3-fix-deadlock-in-ext3_remount-and-orphan-list-handling.patch
ext4-fix-deadlock-in-ext4_remount-and-orphan-list-handling.patch
remove-unused-lock_cpu_hotplug_interruptible-definition.patch
kerneldoc-fix-in-audit_core_dumps.patch
introduce-compat_u64-and-compat_s64-types.patch
diskquota-32bit-quota-tools-on-64bit-architectures.patch
remove-final-two-references-to-__obsolete_setup-macro.patch
update-procfs-guide-doc-of-read_func.patch
ext3-remove-extra-is_rdonly-check.patch
namespace-ensure-clone_flags-are-always-stored-in-an-unsigned-long.patch
doc-oops-tracing-add-code-decode-info.patch
drop-obsolete-sys_ioctl-export.patch
is_power_of_2-ext3-superc.patch
is_power_of_2-jbd.patch

 Merge, subject to re-review

sys_time-speedup.patch

 Am skeptical about this one.

cdrom-replace-hard-coded-constants-by-kernelh-macro.patch
update-description-in-documentation-filesystems-vfstxt-typo-fixed.patch
futex-tidy-up-the-code-v2.patch
add-documentation-sysctl-ctl_unnumberedtxt.patch
sysctlc-add-text-telling-people-to-use-ctl_unnumbered.patch
# drivers-pmc-msp71xx-gpio-char-driver.patch: david-b panned it
drivers-pmc-msp71xx-gpio-char-driver.patch
mistaken-ext4_inode_bitmap-for-ext4_block_bitmap.patch
hfs-refactor-ascii-to-unicode-conversion-routine.patch
hfs-add-custom-dentry-hash-and-comparison-operations.patch
sprint_symbol-cleanup.patch

 Merge, subject to re-review.

hwrng-add-type-categories.patch

 This generated a flamewar.  Will probably drop.

fs-namespacec-should-include-internalh.patch
proper-prototype-for-proc_nr_files.patch
replace-obscure-constructs-in-fs-block_devc.patch
bd_claim_by_disk-fix-warning.patch
fs-reiserfs-cleanups.patch
adb_probe_task-remove-unneeded-flush_signals-call.patch
kcdrwd-remove-unneeded-flush_signals-call.patch
nbdcsock_xmit-cleanup-signal-related-code.patch
move-seccomp-from-proc-to-a-prctl.patch
make-seccomp-zerocost-in-schedule.patch
is_power_of_2-kernel-kfifoc.patch
parport_pc-it887x-fix.patch
is_power_of_2-ufs-superc.patch
codingstyle-add-information-about-trailing-whitespace.patch
codingstyle-add-information-about-editor-modelines.patch
uninline-check_signature.patch
add-werror-implicit-function-declaration.patch
generic-bug-use-show_regs-instead-of-dump_stack.patch
udf-fix-function-name-from-udf_crc16-to-udf_crc.patch
dma-make-dma-pool-to-use-kmalloc_node.patch
unregister_chrdev-ignore-the-return-value.patch
unregister_chrdev-return-void.patch
unregister_blkdev-do-warn_on-on-failure.patch
unregister_blkdev-delete-redundant-messages-in-callers.patch
unregister_blkdev-delete-redundant-message.patch
unregister_blkdev-return-void.patch
add-missing-files-and-dirs-to-00-index-in-documentation.patch
remove-the-last-few-umsdos-leftovers.patch
update-documentation-filesystems-vfstxt-second-part.patch
rename-cancel_rearming_delayed_work-to-cancel_delayed_work_sync.patch
make-cancel_xxx_work_sync-return-a-boolean.patch
ext3-fix-error-handling-in-ext3_create_journal.patch
ext4-fix-error-handling-in-ext4_create_journal.patch
modules-remove-modlist_lock.patch
amiserial-remove-incorrect-no-termios-change-check.patch
genericserial-remove-bogus-optimisation-check-and-dead-code-paths.patch
synclink-remove-bogus-no-change-termios-optimisation.patch
68360serial-remove-broken-optimisation.patch
serial-remove-termios-checks-from-various-old-char-serial.patch
docs-static-initialization-of-spinlocks-is-ok.patch
kernel-printkc-document-possible-deadlock-against-scheduler.patch
remove-mm-backing-devccongestion_wait_interruptible.patch
gitignore-update.patch
isapnp-remove-pointless-check-of-type-against-0-in-isapnp_read_tag.patch
fix-trivial-typos-in-anon_inodesc-comments.patch
vsprintfc-optimizing-part-1-easy-and-obvious-stuff.patch
vsprintfc-optimizing-part-2-base-10-conversion-speedup-v2.patch
drivers-char-ipmi-ipmi_poweroffc-lower-printk-severity.patch
drivers-char-ipmi-ipmi_si_intfc-lower-printk-severity.patch
drivers-block-rdc-lower-printk-severity.patch
ext2-statfs-speed-up.patch
ext3-statfs-speed-up.patch
ext4-statfs-speed-up.patch
permit-mempool_freenull.patch
nls-remove-obsolete-makefile-entries.patch
compat32-ignore-the-loop_clr_fd-ioctl.patch
ia64-arbitary-speed-tty-ioctl-support.patch

 Merge, subject to re-review.

writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-2.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-3.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-4.patch
writeback-fix-comment-use-helper-function.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-5.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-6.patch
writeback-fix-time-ordering-of-the-per-superblock-dirty-inode-lists-7.patch

 I guess these should be merged.  There are still bugs in there which I think
 Ken Chen has fixed, but I haven't got onto that yet.

introduce-i_sync.patch
introduce-i_sync-fix.patch

 Merge, I guess.

ibmasm-whitespace-cleanup.patch
ibmasm-dont-use-extern-in-function-declarations.patch
ibmasm-miscellaneous-fixes.patch
ibmasm-must-depend-on-config_input.patch

 Merge.

sync_sb_inodes-propagate-errors.patch

 Needs work.

spi-controller-drivers-check-for-unsupported-modes.patch
spi-add-3wire-mode-flag.patch
crc7-support.patch
spidev-compiler-warning-gone.patch
spi_lm70llp-parport-adapter-driver.patch
spi_mpc83xxc-underclocking-hotfix.patch
atmel_spi-minor-updates.patch
s3c24xx-spi-controllers-both-select-bitbang.patch
spi-tle620x-power-switch-driver.patch
spi-master-driver-for-xilinx-virtex.patch
spi_mpc83xxc-support-qe-enabled-83xx-cpus-like-mpc832x.patch
spi-omap2_mcspi-driver.patch
spi_txx9-controller-driver.patch

 Merge

move-page-writeback-acounting-out-of-macros.patch
ext2-balloc-use-io_error-label.patch

 Might merge.

ext2-reservations.patch

 Still needs decent testing.

use-mutex-instead-of-semaphore-in-capi-20-driver.patch
mismatching-declarations-of-revision-strings-in-hisax.patch
make-isdn-capi-use-seq_list_xxx-helpers.patch
update-isdn-tree-to-use-pci_get_device.patch
sane-irq-initialization-in-sedlbauer-hisax.patch
use-menuconfig-objects-isdn-config_isdn.patch
use-menuconfig-objects-isdn-config_isdn_drv_gigaset.patch
use-menuconfig-objects-isdn-config_isdn_capi.patch
use-menuconfig-objects-isdn-config_capi_avm.patch
use-menuconfig-objects-isdn-config_capi_eicon.patch
isdn-capi-warning-fixes.patch
i4l-leak-in-eicon-idifuncc.patch

 Merge

use-menuconfig-objects-isdn-config_isdn_i4l.patch

 tilman didn't like it - might drop

i2o_cfg_passthru-cleanup.patch
wrong-memory-access-in-i2o_block_device_lock.patch
i2o-message-leak-in-i2o_msg_post_wait_mem.patch
i2o-proc-reading-oops.patch
i2o-debug-output-cleanup.patch

 Merge

knfsd-exportfs-add-exportfsh-header.patch
knfsd-exportfs-remove-iget-abuse.patch
knfsd-exportfs-add-procedural-interface-for-nfsd.patch
knfsd-exportfs-remove-call-macro.patch
knfsd-exportfs-untangle-isdir-logic-in-find_exported_dentry.patch
knfsd-exportfs-move-acceptable-check-into-find_acceptable_alias.patch
knfsd-exportfs-add-find_disconnected_root-helper.patch
knfsd-exportfs-split-out-reconnecting-a-dentry-from-find_exported_dentry.patch
nfsd-warning-fix.patch
knfsd-lockd-nfsd4-use-same-grace-period-for-lockd-and-nfsd4.patch
knfsd-nfsd4-fix-nfsv4-filehandle-size-units-confusion.patch
knfsd-nfsd4-silence-a-compiler-warning-in-acl-code.patch
knfsd-nfsd4-fix-enc_stateid_sz-for-nfsd-callbacks.patch
knfsd-nfsd4-fix-handling-of-acl-errrors.patch
knfsd-nfsd-remove-unused-header-interfaceh.patch
knfsd-nfsd4-vary-maximum-delegation-limit-based-on-ram-size.patch
knfsd-nfsd4-dont-delegate-files-that-have-had-conflicts.patch

 Merge

couple-fixes-to-fs-ecryptfs-inodec.patch
ecryptfs-move-ecryptfs-docs-into-documentation-filesystems.patch

 Merge

rtc-ds1307-cleanups.patch
rtc-rs5c372-becomes-a-new-style-i2c-driver.patch
thecus-n2100-register-rtc-rs5c372-i2c-device.patch
rtc-make-example-code-jump-to-done-instead-of-return-when-ioctl-not-supported.patch
rtc-dev-return-enotty-in-ioctl-if-irq_set_freq-is-not-implemented-by-driver.patch
driver-for-the-atmel-on-chip-rtc-on-at32ap700x-devices.patch
rtc_class-is-no-longer-considered-experimental.patch
rtc-kconfig-tweax.patch
rtc-add-rtc-m41t80-driver-take-2.patch
rtc-watchdog-support-for-rtc-m41t80-driver-take-2.patch
rtc-add-support-for-the-st-m48t59-rtc.patch
rtc-add-support-for-the-st-m48t59-rtc-vs-git-acpi.patch
rtc-driver-for-ds1216-chips.patch
rtc-driver-for-ds1216-chips-fix.patch
rtc-ds1307-oscillator-restart-for-ds1337383940.patch

 Merge.

revoke-special-mmap-handling.patch
revoke-special-mmap-handling-vs-fault-vs-invalidate.patch
revoke-core-code.patch
revoke-support-for-ext2-and-ext3.patch
revoke-add-documentation.patch
revoke-wire-up-i386-system-calls.patch
fs-introduce-write_begin-write_end-and-perform_write-aops-revoke.patch
revoke-vs-git-block.patch

 Don't know.  Need to ping suitable developers over this work.

lguest-export-symbols-for-lguest-as-a-module.patch
lguest-the-guest-code.patch
lguest-the-host-code.patch
lguest-the-host-code-lguest-vs-clockevents-fix-resume-logic.patch
lguest-the-asm-offsets.patch
lguest-the-makefile-and-kconfig.patch
lguest-the-console-driver.patch
lguest-the-net-driver.patch
lguest-the-block-driver.patch
lguest-the-documentation-example-launcher.patch

 Merge

oss-trident-massive-whitespace-removal.patch
oss-trident-fix-locking-around-write_voice_regs.patch
oss-trident-replace-deprecated-pci_find_device-with-pci_get_device.patch
remove-options-depending-on-oss_obsolete.patch

 Merge

unprivileged-mounts-add-user-mounts-to-the-kernel.patch
unprivileged-mounts-allow-unprivileged-umount.patch
unprivileged-mounts-account-user-mounts.patch
unprivileged-mounts-propagate-error-values-from-clone_mnt.patch
unprivileged-mounts-allow-unprivileged-bind-mounts.patch
unprivileged-mounts-put-declaration-of-put_filesystem-in-fsh.patch
unprivileged-mounts-allow-unprivileged-mounts.patch
unprivileged-mounts-allow-unprivileged-fuse-mounts.patch
unprivileged-mounts-propagation-inherit-owner-from-parent.patch
unprivileged-mounts-add-no-submounts-flag.patch

 Don't know.  Need to ping suitable developers over this work.
 
char-cyclades-add-firmware-loading.patch
char-cyclades-fix-sparse-warning.patch
char-isicom-cleanup-locking.patch
char-isicom-del_timer-at-exit.patch
char-isicom-proper-variables-types.patch
char-moxa-eliminate-busy-waiting.patch
char-specialix-remove-busy-waiting.patch
char-riscom8-eliminate-busy-loop.patch
char-vt-use-kzalloc.patch
char-vt-use-array_size.patch
char-kconfig-mxser_new-remove-experimental-comment.patch
char-stallion-remove-user-class-report-request.patch
char-istallion-initlocking-fixes-try-2.patch
stallion-remove-unneeded-lock_kernel.patch

 Merge

fbcon-smart-blitter-usage-for-scrolling.patch
nvidiafb-adjust-flags-to-take-advantage-of-new-scroll-method.patch
fbcon-cursor-blink-control.patch
fbcon-use-struct-device-instead-of-struct-class_device.patch
fbdev-move-arch-specific-bits-to-their-respective.patch
fbdev-detect-primary-display-device.patch
fbcon-allow-fbcon-to-use-the-primary-display-driver.patch
radeonfb-add-support-for-radeon-xpress-200m-rs485.patch
nvidiafb-add-proper-support-for-geforce-7600-chipset.patch
pm2fb-white-spaces-clean-up.patch
fbcon-set_con2fb_map-fixes.patch
fbcon-revise-primary-device-selection.patch
fbdev-fbcon-console-unregistration-from-unregister_framebuffer.patch
vt-add-comment-for-unbind_con_driver.patch
68328fb-the-pseudo_palette-is-only-16-elements-long.patch
controlfb-the-pseudo_palette-is-only-16-elements-long.patch
cyblafb-fix-pseudo_palette-array-overrun-in-setcolreg.patch
epson1355fb-color-setting-fixes.patch
fm2fb-the-pseudo_palette-is-only-16-elements-long.patch
gbefb-the-pseudo_palette-is-only-16-elements-long.patch
macfb-fix-pseudo_palette-size-and-overrun.patch
offb-the-pseudo_palette-is-only-16-elements-long.patch
platinumfb-the-pseudo_palette-is-only-16-elements.patch
pvr2fb-fix-pseudo_palette-array-overrun-and-typecast.patch
q40fb-the-pseudo_palette-is-only-16-elements-long.patch
sgivwfb-the-pseudo_palette-is-only-16-elements-long.patch
tgafb-actually-allocate-memory-for-the-pseudo_palette.patch
tridentfb-fix-pseudo_palette-array-overrun-in-setcolreg.patch
tx3912fb-fix-improper-assignment-of-info-pseudo_palette.patch
atyfb-the-pseudo_palette-is-only-16-elements-long.patch
radeonfb-the-pseudo_palette-is-only-16-elements-long.patch
i810fb-the-pseudo_palette-is-only-16-elements-long.patch
intelfb-the-pseudo_palette-is-only-16-elements-long.patch
sisfb-fix-pseudo_palette-array-size-and-overrun.patch
matroxfb-color-setting-fixes.patch
pm3fb-fillrect-acceleration.patch
pm3fb-possible-cleanups.patch
vt8623fbc-make-code-static.patch
matroxfb-color-setting-fixes-fix.patch
fb-epson1355fb-kill-off-dead-sh-support.patch
fix-the-graphic-corruption-issue-on-ia64-machines.patch

 Merge

omap-add-ti-omap-framebuffer-driver.patch
omap-add-ti-omap1610-accelerator-entry.patch
omap-add-ti-omap1-internal-lcd-controller.patch
omap-add-ti-omap2-internal-display-controller-support.patch
omap-add-ti-omap1-external-lcd-controller-support-sossi.patch
omap-add-ti-omap2-external-lcd-controller-support-rfbi.patch
omap-add-external-epson-hwa742-lcd-controller-support.patch
omap-add-external-epson-blizzard-lcd-controller-support.patch
omap-lcd-panel-support-for-the-ti-omap-h4-board.patch
omap-lcd-panel-support-for-the-ti-omap-h3-board.patch
omap-lcd-panel-support-for-the-palm-tungsten-e.patch
omap-lcd-panel-support-for-palm-tungstent.patch
omap-lcd-panel-support-for-the-palm-zire71.patch
omap-lcd-panel-support-for-the-ti-omap1610-innovator-board.patch
omap-lcd-panel-support-for-the-ti-omap1510-innovator-board.patch
omap-lcd-panel-support-for-the-ti-omap-osk-board.patch
omap-lcd-panel-support-for-the-siemens-sx1-mobile-phone.patch

 Merge

use-menuconfig-objects-ii-md.patch
md-improve-message-about-invalid-superblock-during-autodetect.patch
md-improve-the-is_mddev_idle-test-fix.patch
md-check-that-internal-bitmap-does-not-overlap-other-data.patch
md-change-bitmap_unplug-and-others-to-void-functions.patch

 Merge

raid5-add-the-stripe_queue-object-for-tracking-raid.patch
raid5-use-stripe_queues-to-prioritize-the-most.patch

 Ping Neil

readahead-introduce-pg_readahead.patch
readahead-add-look-ahead-support-to-__do_page_cache_readahead.patch
readahead-min_ra_pages-max_ra_pages-macros.patch
readahead-data-structure-and-routines.patch
readahead-on-demand-readahead-logic.patch
readahead-convert-filemap-invocations.patch
readahead-convert-splice-invocations.patch
readahead-convert-ext3-ext4-invocations.patch
readahead-remove-the-old-algorithm.patch
readahead-move-synchronous-readahead-call-out-of-splice-loop.patch
readahead-pass-real-splice-size.patch
mm-share-pg_readahead-and-pg_reclaim.patch
readahead-split-ondemand-readahead-interface-into-two-functions.patch
readahead-sanify-file_ra_state-names.patch

 Merge

fallocate-implementation-on-i86-x86_64-and-powerpc.patch
fallocate-on-s390.patch
fallocate-on-ia64.patch
fallocate-on-ia64-fix.patch

 Merge.

jprobes-make-struct-jprobeentry-a-void.patch
jprobes-remove-jprobe_entry.patch
jprobes-make-jprobes-a-little-safer-for-users.patch

 Merge.

intel-iommu-dmar-detection-and-parsing-logic.patch
intel-iommu-pci-generic-helper-function.patch
intel-iommu-clflush_cache_range-now-takes-size-param.patch
intel-iommu-iova-allocation-and-management-routines.patch
intel-iommu-intel-iommu-driver.patch
intel-iommu-avoid-memory-allocation-failures-in-dma-map-api-calls.patch
intel-iommu-intel-iommu-cmdline-option-forcedac.patch
intel-iommu-dmar-fault-handling-support.patch
intel-iommu-iommu-gfx-workaround.patch
intel-iommu-iommu-floppy-workaround.patch

 Don't know.  I don't think there were any great objections, but I don't
 think much benefit has been demonstrated?

define-new-percpu-interface-for-shared-data-version-4.patch
use-the-new-percpu-interface-for-shared-data-version-4.patch

 Merge

arch-personality-independent-stack-top.patch
audit-rework-execve-audit.patch
mm-variable-length-argument-support.patch

 Merge.

ext4-zero_user_page-conversion.patch
ext4-remove-extra-is_rdonly-check.patch
is_power_of_2-ext4-superc.patch

 Send to tytso

fs-introduce-vfs_path_lookup.patch
sunrpc-use-vfs_path_lookup.patch
nfsctl-use-vfs_path_lookup.patch
fs-mark-link_path_walk-static.patch
fs-remove-path_walk-export.patch

 Merge, after poking suitable maintainers

kernel-doc-add-tools-doc-in-makefile.patch
kernel-doc-fix-unnamed-struct-union-warning.patch
kernel-doc-strip-c99-comments.patch
kernel-doc-fix-leading-dot-in-man-mode-output.patch

 Merge

coredump-masking-bound-suid_dumpable-sysctl.patch
coredump-masking-reimplementation-of-dumpable-using-two-flags.patch
coredump-masking-add-an-interface-for-core-dump-filter.patch
coredump-masking-elf-enable-core-dump-filtering.patch
coredump-masking-elf-fdpic-remove-an-unused-argument.patch
coredump-masking-elf-fdpic-enable-core-dump-filtering.patch
coredump-masking-documentation-for-proc-pid-coredump_filter.patch

 Merge

kernel-relayc-make-functions-static.patch

 Merge

configfsdlm-separate-out-__configfs_attr-into-configfsh.patch
configfsdlmocfs2-convert-subsystem-semaphore-to-mutex.patch
configfsdlm-rename-config_group_find_obj-and-state-semantics-clearly.patch

 Merge, subject to Joel acks

use-data_data-in-cris.patch
add-missing-data_data-in-powerpc.patch
use-data_data-in-xtensa.patch

 Merge

drivers-edac-add-edac_mc_find-api.patch
drivers-edac-core-make-functions-static.patch
drivers-edac-add-rddr2-memory-types.patch
drivers-edac-split-out-functions-to-unique-files.patch
drivers-edac-add-edac_device-class.patch
drivers-edac-mc-sysfs-add-missing-mem-types.patch
drivers-edac-change-from-semaphore-to-mutex-operation.patch
drivers-edac-new-intel-5000-mc-driver.patch
drivers-edac-new-intel-5000-mc-driver-fix.patch
drivers-edac-coreh-fix-scrubdefs.patch
drivers-edac-new-i82443bxgz-mc-driver.patch
drivers-edac-new-i82443bxgz-mc-driver-broken.patch
drivers-edac-add-new-nmi-rescan.patch
drivers-edac-mod-use-edac_coreh.patch
drivers-edac-add-dev_name-getter-function.patch
drivers-edac-new-inte-30x0-mc-driver.patch
drivers-edac-mod-mc-to-use-workq-instead-of-kthread.patch
drivers-edac-updated-pci-monitoring.patch
drivers-edac-mod-assert_error-check.patch
drivers-edac-mod-pci-poll-names.patch
drivers-edac-core-lindent-cleanup.patch
drivers-edac-edac_device-sysfs-cleanup.patch
drivers-edac-cleanup-workq-ifdefs.patch
drivers-edac-lindent-amd76x.patch
drivers-edac-lindent-i5000.patch
drivers-edac-lindent-e7xxx.patch
drivers-edac-lindent-i3000.patch
drivers-edac-lindent-i82860.patch
drivers-edac-lindent-i82875p.patch
drivers-edac-lindent-e752x.patch
drivers-edac-lindent-i82443bxgx.patch
drivers-edac-lindent-r82600.patch
drivers-edac-drivers-to-use-new-pci-operation.patch
drivers-edac-add-device-sysfs-attributes.patch
drivers-edac-device-output-clenaup.patch
drivers-edac-add-info-kconfig.patch
drivers-edac-update-maintainers-files-for-edac.patch
drivers-edac-cleanup-spaces-gotos-after-lindent-messup.patch
driver-edac-add-mips-and-ppc-visibility.patch
driver-edac-mod-race-fix-i82875p.patch
driver-edac-fix-ignored-return-i82875p.patch
include-linux-pci_id-h-add-amd-northbridge-defines.patch
driver-edac-i5000-define-typo.patch
driver-edac-remove-null-from-statics.patch
driver-edac-i5000-code-tidying.patch
driver-edac-edac_device-code-tidying.patch
driver-edac-mod-edac_align_ptr-function.patch
driver-edac-mod-edac_opt_state_to_string-function.patch
driver-edac-remove-file-edac_mc-h.patch

 Probably hold - there are sysfs issues and a large number of update patches
 in my inbox.  Might merge, undecided.

cpuset-zero-malloc-revert-the-old-cpuset-fix.patch
containersv10-basic-container-framework.patch
containersv10-basic-container-framework-fix.patch
containersv10-basic-container-framework-fix-2.patch
containersv10-basic-container-framework-fix-3.patch
containersv10-basic-container-framework-fix-for-bad-lock-balance-in-containers.patch
containersv10-example-cpu-accounting-subsystem.patch
containersv10-example-cpu-accounting-subsystem-fix.patch
containersv10-add-tasks-file-interface.patch
containersv10-add-tasks-file-interface-fix.patch
containersv10-add-tasks-file-interface-fix-2.patch
containersv10-add-fork-exit-hooks.patch
containersv10-add-fork-exit-hooks-fix.patch
containersv10-add-container_clone-interface.patch
containersv10-add-container_clone-interface-fix.patch
containersv10-add-procfs-interface.patch
containersv10-add-procfs-interface-fix.patch
containersv10-make-cpusets-a-client-of-containers.patch
containersv10-make-cpusets-a-client-of-containers-whitespace.patch
containersv10-share-css_group-arrays-between-tasks-with-same-container-memberships.patch
containersv10-share-css_group-arrays-between-tasks-with-same-container-memberships-fix.patch
containersv10-share-css_group-arrays-between-tasks-with-same-container-memberships-cpuset-zero-malloc-fix-for-new-containers.patch
containersv10-simple-debug-info-subsystem.patch
containersv10-simple-debug-info-subsystem-fix.patch
containersv10-simple-debug-info-subsystem-fix-2.patch
containersv10-support-for-automatic-userspace-release-agents.patch
containersv10-support-for-automatic-userspace-release-agents-whitespace.patch
add-containerstats-v3.patch
add-containerstats-v3-fix.patch
update-getdelays-to-become-containerstats-aware.patch
containers-implement-subsys-post_clone.patch
containers-implement-namespace-tracking-subsystem-v3.patch

 Container stuff.  Hold, I guess.  I was expecting updates from Paul.

fix-raw_spinlock_t-vs-lockdep.patch
lockdep-sanitise-config_prove_locking.patch
lockdep-reduce-the-ifdeffery.patch
lockstat-core-infrastructure.patch
lockstat-human-readability-tweaks.patch
lockstat-hook-into-spinlock_t-rwlock_t-rwsem-and-mutex.patch

 Merge

lockdep-various-fixes.patch
lockdep-fixup-sk_callback_lock-annotation.patch
lockstat-measure-lock-bouncing.patch
lockstat-better-class-name-representation.patch
lockdep-debugging-give-stacktrace-for-init_error.patch
stacktrace-fix-header-file-for-config_stacktrace.patch

 Merge

some-kmalloc-memset-kzalloc-tree-wide.patch

 Merge

reiser4-sb_sync_inodes.patch
reiser4-export-remove_from_page_cache.patch
reiser4-export-radix_tree_preload.patch
reiser4-export-find_get_pages.patch
make-copy_from_user_inatomic-not-zero-the-tail-on-i386-vs-reiser4.patch
reiser4.patch
mm-clean-up-and-kernelify-shrinker-registration-reiser4.patch
reiser4-fix-for-new-aops-patches.patch
git-block-vs-reiser4.patch

 Hold.

make-sure-nobodys-leaking-resources.patch
journal_add_journal_head-debug.patch
page-owner-tracking-leak-detector.patch
releasing-resources-with-children.patch
nr_blockdev_pages-in_interrupt-warning.patch
detect-atomic-counter-underflows.patch
device-suspend-debug.patch
#slab-cache-shrinker-statistics.patch
mm-debug-dump-pageframes-on-bad_page.patch
make-frame_pointer-default=y.patch
mutex-subsystem-synchro-test-module.patch
slab-leaks3-default-y.patch
profile-likely-unlikely-macros.patch
put_bh-debug.patch
acpi_format_exception-debug.patch
lockdep-show-held-locks-when-showing-a-stackdump.patch
add-debugging-aid-for-memory-initialisation-problems.patch
kmap_atomic-debugging.patch
shrink_slab-handle-bad-shrinkers.patch
keep-track-of-network-interface-renaming.patch
workaround-for-a-pci-restoring-bug.patch
prio_tree-debugging-patch.patch
check_dirty_inode_list.patch
alloc_pages-debug.patch
squash-ipc-warnings.patch
random-warning-squishes.patch
w1-build-fix.patch

 -mm only things.



* intel iommu (Re: -mm merge plans for 2.6.23)
From: Jan Engelhardt @ 2007-07-10  9:04 UTC
  To: Andrew Morton; +Cc: linux-kernel


On Jul 10 2007 01:31, Andrew Morton wrote:

>intel-iommu-dmar-detection-and-parsing-logic.patch
>intel-iommu-pci-generic-helper-function.patch
>intel-iommu-clflush_cache_range-now-takes-size-param.patch
>intel-iommu-iova-allocation-and-management-routines.patch
>intel-iommu-intel-iommu-driver.patch
>intel-iommu-avoid-memory-allocation-failures-in-dma-map-api-calls.patch
>intel-iommu-intel-iommu-cmdline-option-forcedac.patch
>intel-iommu-dmar-fault-handling-support.patch
>intel-iommu-iommu-gfx-workaround.patch
>intel-iommu-iommu-floppy-workaround.patch

Here are some fixes:


Signed-off-by: Jan Engelhardt <jengelh@gmx.de>

---
 arch/x86_64/Kconfig |   20 ++++++++++----------
 1 file changed, 10 insertions(+), 10 deletions(-)

Index: linux-2.6.22-rc6/arch/x86_64/Kconfig
===================================================================
--- linux-2.6.22-rc6.orig/arch/x86_64/Kconfig
+++ linux-2.6.22-rc6/arch/x86_64/Kconfig
@@ -753,11 +753,11 @@ config DMAR
 	depends on PCI_MSI && ACPI && EXPERIMENTAL
 	default y
 	help
-	  DMA remapping(DMAR) devices support enables independent address
-	  translations for Direct Memory Access(DMA) from Devices.
+	  DMA remapping (DMAR) devices support enables independent address
+	  translations for Direct Memory Access (DMA) from devices.
 	  These DMA remapping devices are reported via ACPI tables
-	  and includes pci device scope covered by these DMA
-	  remapping device.
+	  and include PCI device scope covered by these DMA
+	  remapping devices.
 
 config DMAR_GFX_WA
 	bool "Support for Graphics workaround"
@@ -765,9 +765,9 @@ config DMAR_GFX_WA
 	default y
 	help
 	 Current Graphics drivers tend to use physical address
-	 for DMA and avoid using DMA api's. Setting this config
+	 for DMA and avoid using DMA APIs. Setting this config
 	 option permits the IOMMU driver to set a unity map for
-	 all the OS visible memory. Hence the driver can continue
+	 all the OS-visible memory. Hence the driver can continue
 	 to use physical addresses for DMA.
 
 config DMAR_FLOPPY_WA
@@ -775,10 +775,10 @@ config DMAR_FLOPPY_WA
 	depends on DMAR
 	default y
 	help
-	 Floppy disk drivers are know to by pass dma api calls
-	 their by failing to work when IOMMU is enabled. This
-	 work around will setup a 1 to 1 mappings for the first
-	 16M to make floppy(isa device) work.
+	 Floppy disk drivers are known to bypass DMA API calls
+	 thereby failing to work when IOMMU is enabled. This
+	 workaround will set up a 1:1 mapping for the first
+	 16M to make floppy (an ISA device) work.
 
 source "drivers/pci/pcie/Kconfig"
 


* Re: -mm merge plans for 2.6.23 -- sys_fallocate
From: Heiko Carstens @ 2007-07-10  9:07 UTC
  To: Andrew Morton; +Cc: linux-kernel, Andi Kleen, Amit Arora, Martin Schwidefsky

> fallocate-implementation-on-i86-x86_64-and-powerpc.patch

Still broken: arch/x86_64/ia32/ia32entry.S wants compat_sys_fallocate instead
of sys_fallocate. Also compat_sys_fallocate probably should be moved to
fs/compat.c.
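Background on why the 32-bit entry table needs a compat wrapper rather than sys_fallocate itself: fallocate(fd, mode, offset, len) takes two 64-bit loff_t arguments, and a 32-bit caller passes each one split across two 32-bit registers, so the 64-bit kernel has to reassemble them before calling the native handler. A minimal sketch; the function names and the low-word-first argument order are assumptions for illustration, since the actual register layout is fixed by the ia32 entry code:

```c
#include <stdint.h>

/* Hypothetical native handler: just records what it was called with. */
static int64_t last_offset, last_len;

static long native_fallocate(int fd, int mode, int64_t offset, int64_t len)
{
	(void)fd;
	(void)mode;
	last_offset = offset;
	last_len = len;
	return 0;
}

/* Rebuild a 64-bit value from the two 32-bit halves a 32-bit caller passed. */
static inline int64_t merge_64(uint32_t lo, uint32_t hi)
{
	return (int64_t)(((uint64_t)hi << 32) | lo);
}

/* Hypothetical compat wrapper: each loff_t arrives as a (lo, hi) pair. */
static long compat_fallocate_sketch(int fd, int mode,
				    uint32_t offset_lo, uint32_t offset_hi,
				    uint32_t len_lo, uint32_t len_hi)
{
	return native_fallocate(fd, mode,
				merge_64(offset_lo, offset_hi),
				merge_64(len_lo, len_hi));
}
```

Pointing the 32-bit syscall table straight at the native handler would skip this reassembly, so the 32-bit halves would land in the wrong parameters.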

> fallocate-on-s390.patch

We reserved a different syscall number than the one that is used right now
in the patch. Please drop this patch... Martin or I will wire up the syscall
as soon as the x86 variant is merged. Everything else just causes trouble and
confusion.

* Re: cpuset-remove-sched-domain-hooks-from-cpusets
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
  2007-07-10  9:04 ` intel iommu (Re: -mm merge plans for 2.6.23) Jan Engelhardt
  2007-07-10  9:07 ` -mm merge plans for 2.6.23 -- sys_fallocate Heiko Carstens
@ 2007-07-10  9:17 ` Paul Jackson
  2007-07-10 10:15 ` -mm merge plans for 2.6.23 Con Kolivas
                   ` (23 subsequent siblings)
  26 siblings, 0 replies; 484+ messages in thread
From: Paul Jackson @ 2007-07-10  9:17 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Dinakar Guniguntala, Cliff Wickman

Andrew wrote:
> cpuset-remove-sched-domain-hooks-from-cpusets.patch
> 
>  Stuck.

Well ... a few hours ago I just finished the 'unrelated task' that kept
me from doing much cpuset work the last six months.

So, after a little bit of saved up vacation (SGI sabbatical - yippee!),
I should be able to dig into this, see what Dinakar and Cliff have been
up to here, and make some progress on this.  Cliff -- did I see some
work from you go by relating to cpusets and the sched domain hooks in
cpusets?

As before, Andrew, if you're getting bored holding this patch, it's ok
to drop it, even though I still figure you'll see it again, as part of
a patch set to improve this cpuset to sched domain interaction.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10  9:07 ` -mm merge plans for 2.6.23 -- sys_fallocate Heiko Carstens
@ 2007-07-10  9:22   ` Andrew Morton
  2007-07-10 15:45     ` Theodore Tso
  0 siblings, 1 reply; 484+ messages in thread
From: Andrew Morton @ 2007-07-10  9:22 UTC (permalink / raw)
  To: Heiko Carstens; +Cc: linux-kernel, Andi Kleen, Amit Arora, Martin Schwidefsky

On Tue, 10 Jul 2007 11:07:37 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > fallocate-implementation-on-i86-x86_64-and-powerpc.patch
> 
> Still broken: arch/x86_64/ia32/ia32entry.S wants compat_sys_fallocate instead
> of sys_fallocate. Also compat_sys_fallocate probably should be moved to
> fs/compat.c.
> 
> > fallocate-on-s390.patch
> 
> We reserved a different syscall number than the one that is used right now
> in the patch. Please drop this patch... Martin or I will wire up the syscall
> as soon as the x86 variant is merged. Everything else just causes trouble and
> confusion.

OK, I dropped all the fallocate patches.

That means that a few other syscalls (or at least, revoke) get renumbered.

* Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (2 preceding siblings ...)
  2007-07-10  9:17 ` cpuset-remove-sched-domain-hooks-from-cpusets Paul Jackson
@ 2007-07-10 10:15 ` Con Kolivas
       [not found]   ` <b21f8390707101802o2d546477n2a18c1c3547c3d7a@mail.gmail.com>
                     ` (2 more replies)
  2007-07-10 10:52 ` containers (was Re: -mm merge plans for 2.6.23) Srivatsa Vaddagiri
                   ` (22 subsequent siblings)
  26 siblings, 3 replies; 484+ messages in thread
From: Con Kolivas @ 2007-07-10 10:15 UTC (permalink / raw)
  To: Andrew Morton, ck list, Ingo Molnar, Paul Jackson, linux-mm; +Cc: linux-kernel

On Tuesday 10 July 2007 18:31, Andrew Morton wrote:
> When replying, please rewrite the subject suitably and try to Cc: the
> appropriate developer(s).

~swap prefetch

Nick's only remaining issue which I could remotely identify was to make it 
cpuset aware:
http://marc.info/?l=linux-mm&m=117875557014098&w=2
As discussed with Paul Jackson, it was already cpuset aware:
http://marc.info/?l=linux-mm&m=117895463120843&w=2

I fixed all bugs I could find and improved it as much as I could last kernel 
cycle.

Put me and the users out of our misery and merge it now or delete it forever 
please. And if the meaningless handwaving that I 100% expect as a response 
begins again, then that's fine. I'll take that as a no and you can dump it.

-- 
-ck

* containers (was Re: -mm merge plans for 2.6.23)
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (3 preceding siblings ...)
  2007-07-10 10:15 ` -mm merge plans for 2.6.23 Con Kolivas
@ 2007-07-10 10:52 ` Srivatsa Vaddagiri
  2007-07-10 11:19   ` Ingo Molnar
  2007-07-10 18:34   ` Paul Menage
  2007-07-10 11:52 ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: " Theodore Tso
                   ` (21 subsequent siblings)
  26 siblings, 2 replies; 484+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-10 10:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, menage, containers

On Tue, Jul 10, 2007 at 01:31:52AM -0700, Andrew Morton wrote:
> cpuset-zero-malloc-revert-the-old-cpuset-fix.patch
> containersv10-basic-container-framework.patch
> containersv10-basic-container-framework-fix.patch
> containersv10-basic-container-framework-fix-2.patch
> containersv10-basic-container-framework-fix-3.patch
> containersv10-basic-container-framework-fix-for-bad-lock-balance-in-containers.patch
> containersv10-example-cpu-accounting-subsystem.patch
> containersv10-example-cpu-accounting-subsystem-fix.patch
> containersv10-add-tasks-file-interface.patch
> containersv10-add-tasks-file-interface-fix.patch
> containersv10-add-tasks-file-interface-fix-2.patch
> containersv10-add-fork-exit-hooks.patch
> containersv10-add-fork-exit-hooks-fix.patch
> containersv10-add-container_clone-interface.patch
> containersv10-add-container_clone-interface-fix.patch
> containersv10-add-procfs-interface.patch
> containersv10-add-procfs-interface-fix.patch
> containersv10-make-cpusets-a-client-of-containers.patch
> containersv10-make-cpusets-a-client-of-containers-whitespace.patch
> containersv10-share-css_group-arrays-between-tasks-with-same-container-memberships.patch
> containersv10-share-css_group-arrays-between-tasks-with-same-container-memberships-fix.patch
> containersv10-share-css_group-arrays-between-tasks-with-same-container-memberships-cpuset-zero-malloc-fix-for-new-containers.patch
> containersv10-simple-debug-info-subsystem.patch
> containersv10-simple-debug-info-subsystem-fix.patch
> containersv10-simple-debug-info-subsystem-fix-2.patch
> containersv10-support-for-automatic-userspace-release-agents.patch
> containersv10-support-for-automatic-userspace-release-agents-whitespace.patch
> add-containerstats-v3.patch
> add-containerstats-v3-fix.patch
> update-getdelays-to-become-containerstats-aware.patch
> containers-implement-subsys-post_clone.patch
> containers-implement-namespace-tracking-subsystem-v3.patch
> 
>  Container stuff.  Hold, I guess.  I was expecting updates from Paul.

Paul,
	Are you working on a new version? I thought it was mostly ready
for mainline.

-- 
Regards,
vatsa

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-10 10:52 ` containers (was Re: -mm merge plans for 2.6.23) Srivatsa Vaddagiri
@ 2007-07-10 11:19   ` Ingo Molnar
  2007-07-10 18:34   ` Paul Menage
  1 sibling, 0 replies; 484+ messages in thread
From: Ingo Molnar @ 2007-07-10 11:19 UTC (permalink / raw)
  To: Srivatsa Vaddagiri; +Cc: Andrew Morton, linux-kernel, menage, containers


* Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> wrote:

> On Tue, Jul 10, 2007 at 01:31:52AM -0700, Andrew Morton wrote:
> > cpuset-zero-malloc-revert-the-old-cpuset-fix.patch
> > containersv10-basic-container-framework.patch
> > containersv10-basic-container-framework-fix.patch
> > containersv10-basic-container-framework-fix-2.patch
> > containersv10-basic-container-framework-fix-3.patch
> > containersv10-basic-container-framework-fix-for-bad-lock-balance-in-containers.patch
> > containersv10-example-cpu-accounting-subsystem.patch
> > containersv10-example-cpu-accounting-subsystem-fix.patch
> > containersv10-add-tasks-file-interface.patch
> > containersv10-add-tasks-file-interface-fix.patch
> > containersv10-add-tasks-file-interface-fix-2.patch
> > containersv10-add-fork-exit-hooks.patch
> > containersv10-add-fork-exit-hooks-fix.patch
> > containersv10-add-container_clone-interface.patch
> > containersv10-add-container_clone-interface-fix.patch
> > containersv10-add-procfs-interface.patch
> > containersv10-add-procfs-interface-fix.patch
> > containersv10-make-cpusets-a-client-of-containers.patch
> > containersv10-make-cpusets-a-client-of-containers-whitespace.patch
> > containersv10-share-css_group-arrays-between-tasks-with-same-container-memberships.patch
> > containersv10-share-css_group-arrays-between-tasks-with-same-container-memberships-fix.patch
> > containersv10-share-css_group-arrays-between-tasks-with-same-container-memberships-cpuset-zero-malloc-fix-for-new-containers.patch
> > containersv10-simple-debug-info-subsystem.patch
> > containersv10-simple-debug-info-subsystem-fix.patch
> > containersv10-simple-debug-info-subsystem-fix-2.patch
> > containersv10-support-for-automatic-userspace-release-agents.patch
> > containersv10-support-for-automatic-userspace-release-agents-whitespace.patch
> > add-containerstats-v3.patch
> > add-containerstats-v3-fix.patch
> > update-getdelays-to-become-containerstats-aware.patch
> > containers-implement-subsys-post_clone.patch
> > containers-implement-namespace-tracking-subsystem-v3.patch
> > 
> >  Container stuff.  Hold, I guess.  I was expecting updates from Paul.
> 
> Paul,
> 	Are you working on a new version? I thought it was mostly ready
> for mainline.

In particular, CONFIG_FAIR_GROUP_SCHED depends on these APIs. Once the 
configuration APIs are upstream, the group scheduler can be enabled too. 
(Basically all the group scheduling bits are upstream now as part of 
CFS.)

	Ingo

* fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: re: -mm merge plans for 2.6.23)
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (4 preceding siblings ...)
  2007-07-10 10:52 ` containers (was Re: -mm merge plans for 2.6.23) Srivatsa Vaddagiri
@ 2007-07-10 11:52 ` Theodore Tso
  2007-07-10 17:15   ` Andrew Morton
  2007-07-10 12:37 ` clam Andy Whitcroft
                   ` (20 subsequent siblings)
  26 siblings, 1 reply; 484+ messages in thread
From: Theodore Tso @ 2007-07-10 11:52 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Amit Arora, Andi Kleen, Paul Mackerras,
	Benjamin Herrenschmidt, Arnd Bergmann, Luck, Tony,
	Heiko Carstens, Martin Schwidefsky, Theodore Ts'o,
	Mark Fasheh, Andrew Morton

On Tue, Jul 10, 2007 at 01:31:52AM -0700, Andrew Morton wrote:
>  Merge
> 
> fallocate-implementation-on-i86-x86_64-and-powerpc.patch

Andrew,

Could you replace the comment/header section of
fallocate-implementation-on-i86-x86_64-and-powerpc.patch with the
following (attached below)?  This is from the ext4 patches, where
Amit had cleaned up the description, which will make for a cleaner and
easier-to-understand submission into the git tree.

I've reviewed the other fallocate patches, noting the request to drop
the s390 patches since Martin has said he will wire it up after
this hits mainline, and the only other change that I've found between
what we have in the ext4 tree and -mm is that we have 

	fallocate-on-ia64.patch
and
	fallocate-on-ia64-fix.patch 

merged into a single patch.  It probably would be better to merge them
before sending it off to Linus, in the interests of cleanliness and
making the tree more git-bisect friendly.

Regards,

						- Ted


From: Amit Arora <aarora@in.ibm.com>

sys_fallocate() implementation on i386, x86_64 and powerpc

fallocate() is a new system call being proposed here which will allow
applications to preallocate space to any file(s) in a file system.
Each file system implementation that wants to use this feature will need
to support an inode operation called ->fallocate().
Applications can use this feature to avoid fragmentation to a certain
level and thus get faster access speed. With preallocation, applications
also get a guarantee of space for particular file(s) - even if later the
system becomes full.

Currently, glibc provides an interface called posix_fallocate() which
can be used for a similar purpose. Although this has the advantage of
working on all file systems, it is quite slow (since it writes zeroes to
each block that has to be preallocated). Without a doubt, file systems
can do this more efficiently within the kernel, by implementing
the proposed fallocate() system call. It is expected that
posix_fallocate() will be modified to call this new system call first
and, in case the kernel/filesystem does not implement it, to fall
back to the current implementation of writing zeroes to the new blocks.
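The try-the-syscall-then-fall-back behaviour described above might look roughly like the following user-space sketch. It assumes the fallocate() wrapper that glibc later gained (declared in <fcntl.h> with _GNU_SOURCE); the helper names prealloc_range() and zero_fill() are illustrative and not part of this patch or of glibc.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <string.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Slow path: write zeroes over [offset, offset + len). */
static int zero_fill(int fd, off_t offset, off_t len)
{
	char buf[4096];

	memset(buf, 0, sizeof(buf));
	while (len > 0) {
		size_t chunk = len < (off_t)sizeof(buf) ? (size_t)len : sizeof(buf);
		ssize_t n = pwrite(fd, buf, chunk, offset);

		if (n < 0) {
			if (errno == EINTR)
				continue;
			return errno;
		}
		if (n == 0)
			return EIO;
		offset += n;
		len -= n;
	}
	return 0;
}

/* Preallocate [offset, offset + len): try the fallocate() system call
 * first and fall back to writing zeroes only if the kernel or file
 * system does not support it.  Returns 0 or an errno value. */
static int prealloc_range(int fd, off_t offset, off_t len)
{
	struct stat st;

	if (fallocate(fd, 0, offset, len) == 0)
		return 0;
	if (errno != EOPNOTSUPP && errno != ENOSYS)
		return errno;

	/* Fallback: only the part beyond the current end of file is
	 * written, so existing data is never overwritten. */
	if (fstat(fd, &st) < 0)
		return errno;
	if (offset + len <= st.st_size)
		return 0;
	if (offset < st.st_size) {
		len -= st.st_size - offset;
		offset = st.st_size;
	}
	return zero_fill(fd, offset, len);
}
```

As a simplification, this sketch only zero-fills past the current end of file, so holes inside an existing sparse file are not allocated by the fallback path.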
ToDos:
1. Implementation on other architectures (other than i386, x86_64,
ppc64 and s390(x)). David Chinner has already posted a patch for ia64.
2. A generic file system operation to handle fallocate
(generic_fallocate), for filesystems that do _not_ have the fallocate
inode operation implemented.
3. Changes to glibc,
   a) to support fallocate() system call
   b) to make posix_fallocate() and posix_fallocate64() call fallocate()


Signed-off-by: Amit Arora <aarora@in.ibm.com>
Cc: Andi Kleen <ak@suse.de>
Cc: Paul Mackerras <paulus@samba.org>
Cc: Benjamin Herrenschmidt <benh@kernel.crashing.org>
Cc: Arnd Bergmann <arnd@arndb.de>
Cc: "Luck, Tony" <tony.luck@intel.com>
Cc: Heiko Carstens <heiko.carstens@de.ibm.com>
Cc: Martin Schwidefsky <schwidefsky@de.ibm.com>
Cc: Theodore Ts'o <tytso@mit.edu>
Cc: Mark Fasheh <mark.fasheh@oracle.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>

* clam
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (5 preceding siblings ...)
  2007-07-10 11:52 ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: " Theodore Tso
@ 2007-07-10 12:37 ` Andy Whitcroft
  2007-07-11  9:34   ` Re: -mm merge plans -- lumpy reclaim Andy Whitcroft
  2007-07-10 15:08 ` -mm merge plans for 2.6.23 Serge E. Hallyn
                   ` (19 subsequent siblings)
  26 siblings, 1 reply; 484+ messages in thread
From: Andy Whitcroft @ 2007-07-10 12:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Mel Gorman, Christoph Lameter

Andrew Morton wrote:

[...]
> lumpy-reclaim-v4.patch
> have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch
> only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
> 
>  Lumpy reclaim.  In a similar situation to Mel's patches.  Stuck due to
>  general lack of interest and effort.

The lumpy reclaim patches originally came out of work to support Mel's
anti-fragmentation work.  As such I think they have become somewhat
attached to those patches.  Whilst lumpy is most effective where
placement controls are in place as offered by Mel's work, we see benefit
from reduction in the "blunderbuss" effect when we reclaim at higher
orders.  While placement control is pretty much required for the very
highest orders, such as huge page size, lower-order allocations
benefit from reduced collateral damage.

There are now a few areas other than huge page allocations which can
benefit.  Stacks are still order 1.  Jumbo frames want higher order
contiguous pages for their incoming hardware buffers.  SLUB is showing
performance benefits from moving to a higher allocation order.  All of
these should benefit from more aggressive targeted reclaim, indeed I
have been surprised just how often my test workloads trigger lumpy at
order 1 to get new stacks.
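For reference, an order-N allocation is 2^N physically contiguous pages. A small sketch of the size-to-order arithmetic, assuming 4 KiB pages; size_to_order() is a name made up here (the kernel has its own get_order() helper). With those assumptions an 8 KiB stack is order 1, and a 9000-byte jumbo-frame buffer rounds up to four pages, i.e. order 2:

```c
#include <stddef.h>

#define SKETCH_PAGE_SIZE 4096UL		/* assumption: 4 KiB pages */

/* Smallest order such that (1 << order) pages cover size bytes. */
static unsigned int size_to_order(size_t size)
{
	unsigned int order = 0;
	size_t pages = (size + SKETCH_PAGE_SIZE - 1) / SKETCH_PAGE_SIZE;

	while (((size_t)1 << order) < pages)
		order++;
	return order;
}
```

Within a zone of P pages there are only P / 2^N naturally aligned order-N blocks, so each step up in order halves the number of candidate blocks.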

Truly representative workloads are hard to generate for some of these,
though we have heard some encouraging noises from those who can
reproduce these problems.

[...]

-apw

* Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (6 preceding siblings ...)
  2007-07-10 12:37 ` clam Andy Whitcroft
@ 2007-07-10 15:08 ` Serge E. Hallyn
  2007-07-10 15:11 ` Rafael J. Wysocki
                   ` (18 subsequent siblings)
  26 siblings, 0 replies; 484+ messages in thread
From: Serge E. Hallyn @ 2007-07-10 15:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Andrew Morgan

Quoting Andrew Morton (akpm@linux-foundation.org):
...
> implement-file-posix-capabilities.patch
> implement-file-posix-capabilities-fix.patch
> file-capabilities-introduce-cap_setfcap.patch
> file-capabilities-get_file_caps-cleanups.patch
> file-caps-update-selinux-xattr-hooks.patch
> 
>  file-caps seems to be stuck.  There has been some movement lately, might
>  merge it subject to suitable acks from suitable parties.

Andrew Morgan has requested a series of changes.  Since one of these
would involve a change in the on-disk format of file capabilities, I
guess these should (sigh) wait another cycle.

I will try to get that change out the door next, as soon as possible, so
that hopefully there are no more definite blocking requests.

thanks,
-serge

* Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (7 preceding siblings ...)
  2007-07-10 15:08 ` -mm merge plans for 2.6.23 Serge E. Hallyn
@ 2007-07-10 15:11 ` Rafael J. Wysocki
  2007-07-10 16:29 ` -mm merge plans for 2.6.23 (pcmcia) Randy Dunlap
                   ` (17 subsequent siblings)
  26 siblings, 0 replies; 484+ messages in thread
From: Rafael J. Wysocki @ 2007-07-10 15:11 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Nigel Cunningham, Pavel Machek

On Tuesday, 10 July 2007 10:31, Andrew Morton wrote:
[--snip--]
>
> freezer-make-kernel-threads-nonfreezable-by-default.patch
> 
>  Merge, subject to re-review.

Hmm, I'm not sure what that means.  Am I supposed to do anything about it?

Greetings,
Rafael


-- 
"Premature optimization is the root of all evil." - Donald Knuth

* Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10  9:22   ` Andrew Morton
@ 2007-07-10 15:45     ` Theodore Tso
  2007-07-10 17:27       ` Andrew Morton
                         ` (2 more replies)
  0 siblings, 3 replies; 484+ messages in thread
From: Theodore Tso @ 2007-07-10 15:45 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Heiko Carstens, linux-kernel, Andi Kleen, Amit Arora, Martin Schwidefsky

On Tue, Jul 10, 2007 at 02:22:13AM -0700, Andrew Morton wrote:
> On Tue, 10 Jul 2007 11:07:37 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > We reserved a different syscall number than the one that is used right now
> > in the patch. Please drop this patch... Martin or I will wire up the syscall
> > as soon as the x86 variant is merged. Everything else just causes trouble and
> > confusion.
> 
> OK, I dropped all the fallocate patches.

Andrew, I want to clarify who is going to push the fallocate patches.
I can either push them to Linus as part of the ext4 patch set, or we
can wait for you to push them.  I thought that since you had them in -mm,
we were going to wait for you to push them (and presumed that this was
going to happen soon).

Alternatively I can push them directly to Linus along with other ext4
patches.  We can drop the s390 patch if Martin or Heiko wants to wire
it up themselves.

As far as I know there hasn't been any real contention on the actual
syscall patches, other than the numbering issues, so it seems that
pushing them to Linus sooner rather than later is the right thing to
do. 

I don't particularly care who pushes them, just as long as they get
pushed.  :-)  

So if you've dropped them, shall I push them to Linus as part of the ext4
patches we've been planning on pushing?

Regards,

					- Ted

* Re: -mm merge plans for 2.6.23 (pcmcia)
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (8 preceding siblings ...)
  2007-07-10 15:11 ` Rafael J. Wysocki
@ 2007-07-10 16:29 ` Randy Dunlap
  2007-07-10 17:30   ` Andrew Morton
  2007-07-10 16:31 ` -mm merge plans for 2.6.23 - ioat/dma engine Kok, Auke
                   ` (16 subsequent siblings)
  26 siblings, 1 reply; 484+ messages in thread
From: Randy Dunlap @ 2007-07-10 16:29 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-pcmcia

On Tue, 10 Jul 2007 01:31:52 -0700 Andrew Morton wrote:

> 
> When replying, please rewrite the subject suitably and try to Cc: the
> appropriate developer(s).

...

> pcmcia-delete-obsolete-pcmcia_ioctl-feature.patch
> use-menuconfig-objects-pcmcia.patch
> 
>  Am a bit stuck with the pcmcia patches.  Dominik has disappeared.

The menuconfig patch looks fine.

I looked at the May-2007 discussion of ioctl removal.
I don't see that much has changed since then, so it's either be
brave/foolish/whatever and see what happens or just wait.
I'll gladly send a patch to update the removal date in
feature-removal-schedule.txt

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

* Re: -mm merge plans for 2.6.23 - ioat/dma engine
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (9 preceding siblings ...)
  2007-07-10 16:29 ` -mm merge plans for 2.6.23 (pcmcia) Randy Dunlap
@ 2007-07-10 16:31 ` Kok, Auke
  2007-07-10 18:05   ` Nelson, Shannon
  2007-07-10 17:42 ` ata and netdev (was Re: -mm merge plans for 2.6.23) Jeff Garzik
                   ` (15 subsequent siblings)
  26 siblings, 1 reply; 484+ messages in thread
From: Kok, Auke @ 2007-07-10 16:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Nelson, Shannon, Leech, Christopher

Andrew Morton wrote:
> git-ioat-vs-git-md-accel.patch
> ioat-warning-fix.patch
> fix-i-oat-for-kexec.patch
> 
>  I don't seem to be able to get rid of these.  Chris Leech appears to have
>  vanished.

Chris is a moving target. Thankfully we have Shannon Nelson taking over Chris' 
duties. Shannon, can you take a look at these and see what needs to happen to
them? Most likely these just need to be pushed to the right person.

Cheers,

Auke


PS: I think we should add an I/OAT / DMA engine section in the MAINTAINERS...

* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: re: -mm merge plans for 2.6.23)
  2007-07-10 11:52 ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: " Theodore Tso
@ 2007-07-10 17:15   ` Andrew Morton
  2007-07-10 17:44     ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Jeff Garzik
  2007-07-10 19:07     ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: re: -mm merge plans for 2.6.23) Theodore Tso
  0 siblings, 2 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 17:15 UTC (permalink / raw)
  To: Theodore Tso
  Cc: linux-kernel, Amit Arora, Andi Kleen, Paul Mackerras,
	Benjamin Herrenschmidt, Arnd Bergmann, Luck, Tony,
	Heiko Carstens, Martin Schwidefsky, Mark Fasheh

On Tue, 10 Jul 2007 07:52:51 -0400 Theodore Tso <tytso@mit.edu> wrote:

> On Tue, Jul 10, 2007 at 01:31:52AM -0700, Andrew Morton wrote:
> >  Merge
> > 
> > fallocate-implementation-on-i86-x86_64-and-powerpc.patch
> 
> Andrew,
> 
> Could you replace the comment/header section of
> fallocate-implementation-on-i86-x86_64-and-powerpc.patch with the
> following (attached below)?  This is from the ext4 patches, where
> Amit had cleaned up the description, which will make for a cleaner and
> easier-to-understand submission into the git tree.

There were issues with the x86 patch, the s390 patch was wrong and Tony
wants the ia64 patch to use a different syscall number.

So I dropped everything.  Let's start again from scratch.  I'd suggest that
for now we go with just an i386/x86_64 implementation, let the arch
maintainers wire things up when that has settled down.

* Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10 15:45     ` Theodore Tso
@ 2007-07-10 17:27       ` Andrew Morton
  2007-07-10 18:05       ` Heiko Carstens
  2007-07-10 18:20       ` Mark Fasheh
  2 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 17:27 UTC (permalink / raw)
  To: Theodore Tso
  Cc: Heiko Carstens, linux-kernel, Andi Kleen, Amit Arora, Martin Schwidefsky

On Tue, 10 Jul 2007 11:45:03 -0400 Theodore Tso <tytso@mit.edu> wrote:

> On Tue, Jul 10, 2007 at 02:22:13AM -0700, Andrew Morton wrote:
> > On Tue, 10 Jul 2007 11:07:37 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > We reserved a different syscall number than the one that is used right now
> > > in the patch. Please drop this patch... Martin or I will wire up the syscall
> > > as soon as the x86 variant is merged. Everything else just causes trouble and
> > > confusion.
> > 
> > OK, I dropped all the fallocate patches.
> 
> Andrew, I want to clarify who is going to push the fallocate patches.
> I can either push them to Linus as part of the ext4 patch set, or we
> can wait for you to push them.  I thought that since you had them in -mm,
> we were going to wait for you to push them (and presumed that this was
> going to happen soon).

How about you send them?  The syscall numbers might need to be changed
based upon when/whether the revoke patches get merged.

> Alternatively I can push them directly to Linus along with other ext4
> patches.

I note that nobody really bothered reviewing all those ext4 patches.

Do you feel that they have been adequately reviewed?  I don't.  I guess I
know what I'll be doing today :(

>  We can drop the s390 patch if Martin or Heiko wants to wire
> it up themselves.

ia64 needs changing too.


* Re: -mm merge plans for 2.6.23 (pcmcia)
  2007-07-10 16:29 ` -mm merge plans for 2.6.23 (pcmcia) Randy Dunlap
@ 2007-07-10 17:30   ` Andrew Morton
  0 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 17:30 UTC (permalink / raw)
  To: Randy Dunlap; +Cc: linux-kernel, linux-pcmcia

On Tue, 10 Jul 2007 09:29:58 -0700 Randy Dunlap <randy.dunlap@oracle.com> wrote:

> On Tue, 10 Jul 2007 01:31:52 -0700 Andrew Morton wrote:
> 
> > 
> > When replying, please rewrite the subject suitably and try to Cc: the
> > appropriate developer(s).
> 
> ...
> 
> > pcmcia-delete-obsolete-pcmcia_ioctl-feature.patch
> > use-menuconfig-objects-pcmcia.patch
> > 
> >  Am a bit stuck with the pcmcia patches.  Dominik has disappeared.
> 
> The menuconfig patch looks fine.

Yeah, I'll merge that.

> I looked at the May-2007 discussion of ioctl removal.
> I don't see that much has changed since then, so it's either be
> brave/foolish/whatever and see what happens or just wait.
> I'll gladly send a patch to update the removal date in
> feature-removal-schedule.txt

I have a note here that the ioctl-removal patch needs Dominik's
consideration.  I see no rush on it so I'll just sit on it.


* ata and netdev (was Re: -mm merge plans for 2.6.23)
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (10 preceding siblings ...)
  2007-07-10 16:31 ` -mm merge plans for 2.6.23 - ioat/dma engine Kok, Auke
@ 2007-07-10 17:42 ` Jeff Garzik
  2007-07-10 18:24   ` Andrew Morton
  2007-07-10 19:56   ` Sergei Shtylyov
  2007-07-10 17:49 ` ext2 reservations (Re: " Alexey Dobriyan
                   ` (14 subsequent siblings)
  26 siblings, 2 replies; 484+ messages in thread
From: Jeff Garzik @ 2007-07-10 17:42 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, IDE/ATA development list, netdev, Tejun Heo


(just to provide my indicator of status)

Andrew Morton wrote:
> libata-config_pm=n-compile-fix.patch

that's for a branch that you don't get via libata-dev#ALL, #mv-ahci-pata.


> pata_acpi-restore-driver.patch

see Alan's comments.  I've been ignoring pata_acpi for a while, because 
IMO it always needed work.


> libata-core-convert-to-use-cancel_rearming_delayed_work.patch

will merge


> libata-implement-ata_wait_after_reset.patch

I'm pretty sure this is obsolete.  Tejun?


> sata_promise-sata-hotplug-support.patch

will merge


> libata-add-irq_flags-to-struct-pata_platform_info-fix.patch

are other pata_platform people happy with this?  I don't know embedded 
well enough to know if adding this struct member will break things.


> ata-add-the-sw-ncq-support-to-sata_nv-for-mcp51-mcp55-mcp61.patch
> sata_nv-allow-changing-queue-depth.patch

should be combined, really.  will merge eventually.  basic concept OK, 
but need to review in depth.

> pata_hpt3x3-major-reworking-and-testing.patch
> iomap-sort-out-the-broken-address-reporting-caused-by-the-iomap-layer.patch
> ata-use-iomap_name.patch

generally OK


> libata-check-for-an-support.patch
> scsi-expose-an-to-user-space.patch
> libata-expose-an-to-user-space.patch
> scsi-save-disk-in-scsi_device.patch
> libata-send-event-when-an-received.patch
> 
>  Am sitting on these due to confusion regarding the status of the ata-ahci
>  patches.

I will apply what I can, but it seems there are lifetime problems.


> ata-ahci-alpm-store-interrupt-value.patch
> ata-ahci-alpm-expose-power-management-policy-option-to-users.patch
> ata-ahci-alpm-enable-link-power-management-for-ata-drivers.patch
> ata-ahci-alpm-enable-aggressive-link-power-management-for-ahci-controllers.patch
> 
>  These appear to need some work.

seemed mostly OK to me.  what comments did I miss?


> libata-add-human-readable-error-value-decoding.patch

still pondering; in my mbox queue

> libata-fix-hopefully-all-the-remaining-problems-with.patch
> testing-patch-for-ali-pata-fixes-hopefully-for-the-problems-with-atapi-dma.patch
> pata_ali-more-work.patch

No idea.  I would poke Alan.  Probably drop.


> 8139too-force-media-setting-fix.patch
> blackfin-on-chip-ethernet-mac-controller-driver.patch
> atari_pamsnetc-old-declaration-ritchie-style-fix.patch
> sundance-phy-address-form-0-only-for-device-id-0x0200.patch

Needs a bug fix, so that the newly modified loop doesn't scan the final 
phy id, then loop back around to scan the first again.
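One way to avoid that double visit is to iterate a fixed count of 32 steps and derive the PHY address from the step, so every one of the 32 MII addresses is visited exactly once whatever the starting address. A sketch, not the sundance driver's actual code; the probe callback and helper names are hypothetical, and the scan is assumed to start at a chosen address (for example 1, finishing with 0):

```c
#include <stdbool.h>

#define MII_PHY_ADDRS 32	/* MII PHY addresses are 5 bits: 0..31 */

/* Hypothetical probe callback: returns true if a PHY answers at addr. */
typedef bool (*phy_probe_fn)(int addr, void *ctx);

/* Address visited at the given step of a scan that begins at start. */
static int phy_addr_at(int start, int step)
{
	return (start + step) & (MII_PHY_ADDRS - 1);
}

/* Visit every PHY address exactly once, beginning at start and wrapping
 * (start = 1 visits 1, 2, ..., 31, 0).  Returns the first responding
 * address, or -1 if none answers. */
static int scan_phys(int start, phy_probe_fn probe, void *ctx)
{
	for (int step = 0; step < MII_PHY_ADDRS; step++) {
		int addr = phy_addr_at(start, step);

		if (probe(addr, ctx))
			return addr;
	}
	return -1;
}
```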


> 3x59x-fix-pci-resource-management.patch
> update-smc91x-driver-with-arm-versatile-board-info.patch
> drivers-net-ns83820c-add-paramter-to-disable-auto.patch
> 
>  netdev patches which are stuck in limbo land.

?  I don't think I've seen these.


> bonding-bond_mainc-make-2-functions-static.patch

FWIW bonding stuff should go to me, since it lives mostly in drivers/net


> x86-initial-fixmap-support.patch

Andi material?


> mm-revert-kernel_ds-buffered-write-optimisation.patch
> revert-81b0c8713385ce1b1b9058e916edcf9561ad76d6.patch
> revert-6527c2bdf1f833cc18e8f42bd97973d583e4aa83.patch
> mm-clean-up-buffered-write-code.patch
> mm-debug-write-deadlocks.patch
> mm-trim-more-holes.patch
> mm-buffered-write-cleanup.patch
> mm-write-iovec-cleanup.patch
> mm-fix-pagecache-write-deadlocks.patch
> mm-buffered-write-iterator.patch
> fs-fix-data-loss-on-error.patch
> mm-restore-kernel_ds-optimisations.patch
>  pagefault-in-write deadlock fixes.  Will hold for 2.6.24.

Any of the above worth 2.6.23?  Just wondering if they were useful 
cleanups / minor fixes prior to new aops patches?


> more-scheduled-oss-driver-removal.patch

ACK


> oss-trident-massive-whitespace-removal.patch
> oss-trident-fix-locking-around-write_voice_regs.patch
> oss-trident-replace-deprecated-pci_find_device-with-pci_get_device.patch
> remove-options-depending-on-oss_obsolete.patch
> 
>  Merge

what about just removing the OSS drivers in question?  :)


> intel-iommu-dmar-detection-and-parsing-logic.patch
> intel-iommu-pci-generic-helper-function.patch
> intel-iommu-clflush_cache_range-now-takes-size-param.patch
> intel-iommu-iova-allocation-and-management-routines.patch
> intel-iommu-intel-iommu-driver.patch
> intel-iommu-avoid-memory-allocation-failures-in-dma-map-api-calls.patch
> intel-iommu-intel-iommu-cmdline-option-forcedac.patch
> intel-iommu-dmar-fault-handling-support.patch
> intel-iommu-iommu-gfx-workaround.patch
> intel-iommu-iommu-floppy-workaround.patch
> 
>  Don't know.  I don't think there were any great objections, but I don't
>  think much benefit has been demonstrated?

Just the general march of progress on new hardware :)

I would like to see this support merged in /some/ form.  We've been 
telling Intel for years they were sillyheads for not bothering with an 
IOMMU.  Now that they have, we should give them a cookie and support 
good technology.

	Jeff



^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch
  2007-07-10 17:15   ` Andrew Morton
@ 2007-07-10 17:44     ` Jeff Garzik
  2007-07-10 23:27       ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Paul Mackerras
  2007-07-10 19:07     ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: re: -mm merge plans for 2.6.23) Theodore Tso
  1 sibling, 1 reply; 484+ messages in thread
From: Jeff Garzik @ 2007-07-10 17:44 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Theodore Tso, linux-kernel, Amit Arora, Andi Kleen,
	Paul Mackerras, Benjamin Herrenschmidt, Arnd Bergmann, Luck,
	Tony, Heiko Carstens, Martin Schwidefsky, Mark Fasheh,
	linux-arch

Andrew Morton wrote:
> So I dropped everything.  Let's start again from scratch.  I'd suggest that
> for now we go with just an i386/x86_64 implementation, let the arch
> maintainers wire things up when that has settled down.


It's my observation that that plan usually works the best.  Arch 
maintainers come along and wire up batches of syscalls when they have a 
chance to glance at the ABI, and catch up with x86[-64].

	Jeff





^ permalink raw reply	[flat|nested] 484+ messages in thread

* ext2 reservations (Re: -mm merge plans for 2.6.23)
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (11 preceding siblings ...)
  2007-07-10 17:42 ` ata and netdev (was Re: -mm merge plans for 2.6.23) Jeff Garzik
@ 2007-07-10 17:49 ` Alexey Dobriyan
  2007-07-10 18:34 ` PCI probing changes Jesse Barnes
                   ` (13 subsequent siblings)
  26 siblings, 0 replies; 484+ messages in thread
From: Alexey Dobriyan @ 2007-07-10 17:49 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-ext4

> ext2-reservations.patch
> 
>  Still needs decent testing.

Was this oops silently fixed?
http://lkml.org/lkml/2007/3/2/138
2.6.21-rc2-mm1: EIP is at ext2_discard_reservation+0x1c/0x52

I still have that ext2 partition backed up.


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10 15:45     ` Theodore Tso
  2007-07-10 17:27       ` Andrew Morton
@ 2007-07-10 18:05       ` Heiko Carstens
  2007-07-10 18:39         ` Amit K. Arora
  2007-07-10 18:41         ` Andrew Morton
  2007-07-10 18:20       ` Mark Fasheh
  2 siblings, 2 replies; 484+ messages in thread
From: Heiko Carstens @ 2007-07-10 18:05 UTC (permalink / raw)
  To: Theodore Tso, Andrew Morton, linux-kernel, Andi Kleen,
	Amit Arora, Martin Schwidefsky

> Alternatively I can push them directly to Linus along with other ext4
> patches.  We can drop the s390 patch if Martin or Heiko wants to wire
> it up themselves.

Yes, please drop the s390 patch. In general it seems to be better if only
one architecture gets a syscall wired up initially and let other arches
follow later.

Just wondering if the x86_64 compat syscall will ever get fixed? I think
I have already mentioned three or four times to Amit that it is broken.
Or is it that nobody cares? Dunno..

In addition there used to be a somewhat unofficial rule that new syscalls
have to come with a test program, so people can easily test whether they
wired up the syscall correctly.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* RE: -mm merge plans for 2.6.23 - ioat/dma engine
  2007-07-10 16:31 ` -mm merge plans for 2.6.23 - ioat/dma engine Kok, Auke
@ 2007-07-10 18:05   ` Nelson, Shannon
  2007-07-10 18:47     ` Andrew Morton
  0 siblings, 1 reply; 484+ messages in thread
From: Nelson, Shannon @ 2007-07-10 18:05 UTC (permalink / raw)
  To: Kok, Auke-jan H, Andrew Morton; +Cc: linux-kernel, Leech, Christopher

Kok, Auke wrote: 
>Andrew Morton wrote:
>> git-ioat-vs-git-md-accel.patch
>> ioat-warning-fix.patch
>> fix-i-oat-for-kexec.patch
>> 
>>  I don't seem to be able to get rid of these.  Chris Leech appears to have
>>  vanished.
>
>Chris is a moving target. Thankfully we have Shannon Nelson taking over
>Chris' duties. Shannon, can you take a look at these and see what needs
>to happen to it? Most likely these just need to be pushed to the right
>person.

Auke: Thanks for the introduction :-).

Andrew: All three of these patches are reasonable and can be pushed on
up.  You can add my sign-off to all three:
Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>

>
>Cheers,
>
>Auke
>
>
>PS: I think we should add an I/OAT / DMA engine section in the 
>MAINTAINERS...
>

I'll be posting a MAINTAINERS patch Real Soon Now with my name on
IOAT/DMA.

sln
======================================================================
Mr. Shannon Nelson                 LAN Access Division, Intel Corp.
Shannon.Nelson@intel.com                I don't speak for Intel
(503) 712-7659                    Parents can't afford to be squeamish.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10 15:45     ` Theodore Tso
  2007-07-10 17:27       ` Andrew Morton
  2007-07-10 18:05       ` Heiko Carstens
@ 2007-07-10 18:20       ` Mark Fasheh
  2007-07-10 20:28         ` Amit K. Arora
  2 siblings, 1 reply; 484+ messages in thread
From: Mark Fasheh @ 2007-07-10 18:20 UTC (permalink / raw)
  To: Theodore Tso, Andrew Morton, Heiko Carstens, linux-kernel,
	Andi Kleen, Amit Arora, Martin Schwidefsky

On Tue, Jul 10, 2007 at 11:45:03AM -0400, Theodore Tso wrote:
> On Tue, Jul 10, 2007 at 02:22:13AM -0700, Andrew Morton wrote:
> > On Tue, 10 Jul 2007 11:07:37 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > We reserved a different syscall number than the one that is used right now
> > > in the patch. Please drop this patch... Martin or I will wire up the syscall
> > > as soon as the x86 variant is merged. Everything else just causes trouble and
> > > confusion.
> > 
> > OK, I dropped all the fallocate patches.
> 
> Andrew, I want to clarify who is going to push the fallocate patches.
> I can either push them to Linus as part of the ext4 patch set, or we
> can wait for you to push them.  I thought since you had them in -mm
> and we were going to wait for you to push them (and presume that this was
> going to happen soon).
> 
> Alternatively I can push them directly to Linus along with other ext4
> patches.  We can drop the s390 patch if Martin or Heiko wants to wire
> it up themselves.
> 
> As far as I know there hasn't been any real contention on the actual
> syscall patches, other than the numbering issues, so it seems that
> pushing them to Linus sooner rather than later is the right thing to
> do. 

Where is the latest and greatest version of those patches? Is it still the
patch set distributed in 2.6.22-rc6-mm1? I'd mostly like to see the final
set of flags we're planning on supporting. But yeah, I second the "sooner
rather than later" :)
	--Mark

--
Mark Fasheh
Senior Software Developer, Oracle
mark.fasheh@oracle.com

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: ata and netdev (was Re: -mm merge plans for 2.6.23)
  2007-07-10 17:42 ` ata and netdev (was Re: -mm merge plans for 2.6.23) Jeff Garzik
@ 2007-07-10 18:24   ` Andrew Morton
  2007-07-10 18:55     ` James Bottomley
                       ` (3 more replies)
  2007-07-10 19:56   ` Sergei Shtylyov
  1 sibling, 4 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 18:24 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: linux-kernel, IDE/ATA development list, netdev, Tejun Heo,
	Alan Cox, Deepak Saxena, Dan Faerch, Benjamin LaHaise

On Tue, 10 Jul 2007 13:42:16 -0400
Jeff Garzik <jeff@garzik.org> wrote:

> 
> (just to provide my indicator of status)

Thanks.

> > libata-add-irq_flags-to-struct-pata_platform_info-fix.patch
> 
> are other pata_platform people happy with this?  I don't know embedded 
> well enough to know if adding this struct member will break things.

This is just a silly remove-unneeded-cast-of-void* cleanup.  I wrote this
as a fixup against
libata-add-irq_flags-to-struct-pata_platform_info.patch with the intention
of folding it into that base patch, but you went and merged the submitter's
original patch so this trivial fixup got stranded in -mm.  Feel free to give
it the piss-off-too-trivial treatment.

> > ata-ahci-alpm-store-interrupt-value.patch
> > ata-ahci-alpm-expose-power-management-policy-option-to-users.patch
> > ata-ahci-alpm-enable-link-power-management-for-ata-drivers.patch
> > ata-ahci-alpm-enable-aggressive-link-power-management-for-ahci-controllers.patch
> > 
> >  These appear to need some work.
> 
> seemed mostly OK to me.  what comments did I miss?

Oh, I thought these were the patches which affected scsi and which James
had issues with.  I guess I got confused.

> 
> > libata-add-human-readable-error-value-decoding.patch
> 
> still pondering; in my mbox queue
> 
> > libata-fix-hopefully-all-the-remaining-problems-with.patch
> > testing-patch-for-ali-pata-fixes-hopefully-for-the-problems-with-atapi-dma.patch
> > pata_ali-more-work.patch
> 
> No idea.  I would poke Alan.  Probably drop.
> 

Alan: poke.

> 
> > 8139too-force-media-setting-fix.patch
> > blackfin-on-chip-ethernet-mac-controller-driver.patch
> > atari_pamsnetc-old-declaration-ritchie-style-fix.patch
> > sundance-phy-address-form-0-only-for-device-id-0x0200.patch
> 
> Needs a bug fix, so that the newly modified loop doesn't scan the final 
> phy id, then loop back around to scan the first again.
> 
> 
> > 3x59x-fix-pci-resource-management.patch
> > update-smc91x-driver-with-arm-versatile-board-info.patch
> > drivers-net-ns83820c-add-paramter-to-disable-auto.patch
> > 
> >  netdev patches which are stuck in limbo land.
> 
> ?  I don't think I've seen these.
> 

3x59x-fix-pci-resource-management.patch: you wrote it ;) I have a comment
here:

- I don't remember the story with cardbus either.  Presumably once upon a
  time the cardbus layer was claiming IO regions on behalf of cardbus
  devices (?)

Need to think about that.

update-smc91x-driver-with-arm-versatile-board-info.patch:

See comment from rmk in changelog:
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/broken-out/update-smc91x-driver-with-arm-versatile-board-info.patch

Deepak, can we move this along a bit please?

drivers-net-ns83820c-add-paramter-to-disable-auto.patch:

See comments in changelog: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/broken-out/drivers-net-ns83820c-add-paramter-to-disable-auto.patch

Dan, Ben: is there any prospect of progress here?

> 
> > bonding-bond_mainc-make-2-functions-static.patch
> 
> FWIW bonding stuff should go to me, since it lives mostly in drivers/net
> 

Ah, noted.

> > x86-initial-fixmap-support.patch
> 
> Andi material?
> 

Spose so.  But it's buried in the middle of a series of four patches.

> 
> > mm-revert-kernel_ds-buffered-write-optimisation.patch
> > revert-81b0c8713385ce1b1b9058e916edcf9561ad76d6.patch
> > revert-6527c2bdf1f833cc18e8f42bd97973d583e4aa83.patch
> > mm-clean-up-buffered-write-code.patch
> > mm-debug-write-deadlocks.patch
> > mm-trim-more-holes.patch
> > mm-buffered-write-cleanup.patch
> > mm-write-iovec-cleanup.patch
> > mm-fix-pagecache-write-deadlocks.patch
> > mm-buffered-write-iterator.patch
> > fs-fix-data-loss-on-error.patch
> > mm-restore-kernel_ds-optimisations.patch
> >  pagefault-in-write deadlock fixes.  Will hold for 2.6.24.
> 
> Any of the above worth 2.6.23?  Just wondering if they were useful 
> cleanups / minor fixes prior to new aops patches?
> 

The first few patches will a) fix up our writev performance regression and
b) reintroduce the writev() deadlock which the writev()-regression-adding
patch fixed.

So it's all a bit ugly.

> 
> > oss-trident-massive-whitespace-removal.patch
> > oss-trident-fix-locking-around-write_voice_regs.patch
> > oss-trident-replace-deprecated-pci_find_device-with-pci_get_device.patch
> > remove-options-depending-on-oss_obsolete.patch
> > 
> >  Merge
> 
> what about just removing the OSS drivers in question?  :)
> 

Hey, I only work here.

> 
> > intel-iommu-dmar-detection-and-parsing-logic.patch
> > intel-iommu-pci-generic-helper-function.patch
> > intel-iommu-clflush_cache_range-now-takes-size-param.patch
> > intel-iommu-iova-allocation-and-management-routines.patch
> > intel-iommu-intel-iommu-driver.patch
> > intel-iommu-avoid-memory-allocation-failures-in-dma-map-api-calls.patch
> > intel-iommu-intel-iommu-cmdline-option-forcedac.patch
> > intel-iommu-dmar-fault-handling-support.patch
> > intel-iommu-iommu-gfx-workaround.patch
> > intel-iommu-iommu-floppy-workaround.patch
> > 
> >  Don't know.  I don't think there were any great objections, but I don't
> >  think much benefit has been demonstrated?
> 
> Just the general march of progress on new hardware :)
> 
> I would like to see this support merged in /some/ form.  We've been 
> telling Intel for years they were sillyheads for not bothering with an 
> IOMMU.  Now that they have, we should give them a cookie and support 
> good technology.

OK, thanks.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: PCI probing changes
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (12 preceding siblings ...)
  2007-07-10 17:49 ` ext2 reservations (Re: " Alexey Dobriyan
@ 2007-07-10 18:34 ` Jesse Barnes
  2007-07-10 18:55   ` Andrew Morton
  2007-07-10 18:44 ` agp / cpufreq Dave Jones
                   ` (12 subsequent siblings)
  26 siblings, 1 reply; 484+ messages in thread
From: Jesse Barnes @ 2007-07-10 18:34 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Tuesday, July 10, 2007 1:31:52 Andrew Morton wrote:
> pci-disable-decode-of-io-memory-during-bar-sizing.patch

This is a core PCI change, should probably go through Greg and/or 
linux-pci instead?  Or just send it to Linus directly, iirc everyone 
was ok with the change for 2.6.23.
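
For readers unfamiliar with BAR sizing, here is a hedged sketch of the arithmetic. Per the PCI specification, software writes all ones to a BAR and reads it back; the device hardwires its low address bits to zero, so the size falls out of the read-back value. While the BAR holds all ones, the device could decode a bogus address range, which is presumably what disabling I/O and memory decode during sizing (as the patch name says) guards against. The helper mem_bar_size() is hypothetical, handles only 32-bit memory BARs, and leaves out the config-space accesses entirely.

```c
#include <assert.h>
#include <stdint.h>

/* The low 4 bits of a memory BAR are flag bits (space, type, prefetchable). */
#define BAR_MEM_FLAGS_MASK 0xFu

/*
 * Hypothetical helper: compute the size of a 32-bit memory BAR from the
 * value read back after writing 0xFFFFFFFF to it.  Returns 0 for an
 * unimplemented BAR (all address bits read back as zero).
 */
static uint32_t mem_bar_size(uint32_t readback)
{
	uint32_t mask = readback & ~BAR_MEM_FLAGS_MASK;

	if (mask == 0)
		return 0;
	/* Two's complement of the writable-bit mask is its lowest set bit. */
	return ~mask + 1;
}
```

For example, a read-back of 0xFFF00000 means the low 20 address bits are hardwired to zero, so the BAR spans 1 MiB.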

Also, you can add a
Signed-off-by:  Jesse Barnes <jesse.barnes@intel.com>
to it.

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-10 10:52 ` containers (was Re: -mm merge plans for 2.6.23) Srivatsa Vaddagiri
  2007-07-10 11:19   ` Ingo Molnar
@ 2007-07-10 18:34   ` Paul Menage
  2007-07-10 18:53     ` Andrew Morton
  1 sibling, 1 reply; 484+ messages in thread
From: Paul Menage @ 2007-07-10 18:34 UTC (permalink / raw)
  To: vatsa; +Cc: Andrew Morton, linux-kernel, containers

On 7/10/07, Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> wrote:
> >
> >  Container stuff.  Hold, I guess.  I was expecting updates from Paul.
>
> Paul,
>         Are you working on a new version? I thought it was mostly ready
> for mainline.
>

There are definitely some big changes that I want to make internally
to the framework, but I guess they don't have to block pushing the
basic framework to mainline.

I've got a new patchset that's primarily got all the various -mm fix
patches rolled into the appropriate original patches, along with
some small tweaks:

- changed the Kconfig files to avoid using "select"
- adding the subsystem name as a prefix for each control file to
enforce namespace scoping
- misc contributions from others

Short-term I also want to:

- rethink the linked list that runs through each task to its css_group
object, since that seemed to hurt performance a bit, but for now that
can probably be solved by just ripping it out and going back to
scanning the tasklist to enumerate tasks in a container.

- extend the options parsing, so we can have more than just a list of
subsystems. Probably changing the existing -o<subsys1>,<subsys2>,...
to be one of:
  -osubsys=<subsys1>:<subsys2>:...,<otheropt>=<otherval>
  -osubsys=<subsys1>,subsys=<subsys2>,subsys=...,<otheropt>=<otherval>
  (what's the preferred convention for fs mount options with multiple values?)
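
As a rough illustration of the two option layouts listed above, the userspace C sketch below accepts both: a colon-separated list after subsys=, and repeated subsys= entries, mixed with other key=value options. It is not the container filesystem's parser; struct container_opts, parse_container_opts(), the size limits, and the foo=bar option in the example are all hypothetical.

```c
#define _POSIX_C_SOURCE 200809L	/* for strtok_r() and strdup() */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define MAX_SUBSYS 8	/* hypothetical limits */
#define NAME_LEN 32

struct container_opts {
	char subsys[MAX_SUBSYS][NAME_LEN];
	int nr_subsys;
	int nr_other;	/* count of non-subsys key=value options */
};

/*
 * Split the mount data on ',' and, inside each subsys= option, on ':'.
 * Repeated subsys= options simply append, so both proposed layouts parse
 * the same way.  Returns 0 on success, -1 on malformed input or overflow.
 */
static int parse_container_opts(const char *data, struct container_opts *o)
{
	char *buf, *opt, *save_opt;
	int ret = 0;

	memset(o, 0, sizeof(*o));
	buf = strdup(data);
	if (!buf)
		return -1;

	for (opt = strtok_r(buf, ",", &save_opt); opt;
	     opt = strtok_r(NULL, ",", &save_opt)) {
		if (strncmp(opt, "subsys=", 7) == 0) {
			char *name, *save_name;

			for (name = strtok_r(opt + 7, ":", &save_name); name;
			     name = strtok_r(NULL, ":", &save_name)) {
				if (o->nr_subsys == MAX_SUBSYS ||
				    strlen(name) >= NAME_LEN) {
					ret = -1;
					goto out;
				}
				strcpy(o->subsys[o->nr_subsys++], name);
			}
		} else if (strchr(opt, '=')) {
			o->nr_other++;	/* some other key=value option */
		} else {
			ret = -1;	/* bare word: reject */
			goto out;
		}
	}
out:
	free(buf);
	return ret;
}
```

With this approach the choice between the two layouts is mostly cosmetic for the parser; the nested strtok_r() calls use separate save pointers, so the inner ':' split does not disturb the outer ',' split.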

I'd not realised that anything else depending on containers was ready
for upstream merge, but if CFS group support is ready then merging a
subset of them is probably a good idea, since this is an application
that I can see a lot of people wanting to play with.

Andrew, how about we merge enough of the container framework to
support CFS? Bits we could leave out for now include container_clone()
support and the nsproxy subsystem, fork/exit callback hooks, and
possibly leave cpusets alone for now (which would also mean we could
skip the automatic release-agent stuff). I'm in Tokyo for the Linux
Foundation Japan symposium right now, but I should be able to get the
new patchset to you for Friday afternoon.

Paul

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10 18:05       ` Heiko Carstens
@ 2007-07-10 18:39         ` Amit K. Arora
  2007-07-10 18:41         ` Andrew Morton
  1 sibling, 0 replies; 484+ messages in thread
From: Amit K. Arora @ 2007-07-10 18:39 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Theodore Tso, Andrew Morton, linux-kernel, Andi Kleen,
	Martin Schwidefsky

On Tue, Jul 10, 2007 at 08:05:31PM +0200, Heiko Carstens wrote:
> > Alternatively I can push them directly to Linus along with other ext4
> > patches.  We can drop the s390 patch if Martin or Heiko wants to wire
> > it up themselves.
> 
> Yes, please drop the s390 patch. In general it seems to be better if only
> one architecture gets a syscall wired up initially and let other arches
> follow later.
> 
> Just wondering if the x86_64 compat syscall will ever get fixed? I think
> I have already mentioned three or four times to Amit that it is broken.
> Or is it that nobody cares? Dunno..

Last time it was brought up was when TAKE5 of the patchset was posted
and I had planned to fix this in the TAKE6 - which didn't happen since
there was no final decision on the mode flags.
Anyhow, the x86_64 compat syscall has already been fixed in the ext4
patch queue.
I will repost all the patches rebased on 2.6.22 (as they are in the
ext4 patch queue), since these have already been dropped from -mm.
 
> In addition there used to be a somewhat unofficial rule that new syscalls
> have to come with a test program, so people can easily test if they wired
> up the syscall correctly.

Ok. Will work on a small testcase and post it soon.
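
The kind of wiring test Heiko asks for might look like the sketch below. It is not Amit's testcase. It substitutes the fallocate() wrapper that later glibc versions provide for a raw syscall(), since the syscall number and argument order were still being settled in this thread. The function name, the scratch-file path, and the 77 "skip" return code (an automake test-suite convention) are assumptions.

```c
#define _GNU_SOURCE	/* for the glibc fallocate() wrapper */
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

#define TEST_LEN (1024 * 1024)	/* 1 MiB */
#define RC_SKIP 77		/* "skipped", automake convention */

/*
 * Preallocate TEST_LEN bytes in a scratch file with mode 0 and check that
 * both the file size and the allocated block count cover the range.
 * Returns 0 on success, RC_SKIP if the kernel or filesystem lacks
 * fallocate support, and 1 on any other failure.
 */
static int run_fallocate_test(void)
{
	char path[] = "/tmp/fallocate-test-XXXXXX";
	struct stat st;
	int rc = 1;
	int fd = mkstemp(path);

	if (fd < 0) {
		perror("mkstemp");
		return 1;
	}
	unlink(path);	/* the file goes away once fd is closed */

	if (fallocate(fd, 0, 0, TEST_LEN) != 0) {
		if (errno == EOPNOTSUPP || errno == ENOSYS) {
			fprintf(stderr, "fallocate unsupported: %s\n",
				strerror(errno));
			rc = RC_SKIP;
		} else {
			perror("fallocate");
		}
		goto out;
	}

	if (fstat(fd, &st) != 0) {
		perror("fstat");
		goto out;
	}
	/* Mode 0 extends the file size to offset + len. */
	if (st.st_size != TEST_LEN) {
		fprintf(stderr, "size %lld, expected %d\n",
			(long long)st.st_size, TEST_LEN);
		goto out;
	}
	/* st_blocks is counted in 512-byte units. */
	if ((long long)st.st_blocks * 512 < TEST_LEN) {
		fprintf(stderr, "only %lld bytes allocated\n",
			(long long)st.st_blocks * 512);
		goto out;
	}
	rc = 0;
out:
	close(fd);
	return rc;
}
```

A main() would simply return run_fallocate_test(). Checking st_blocks as well as st_size is what distinguishes real preallocation from a plain ftruncate()-style size change.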

--
Regards,
Amit Arora

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10 18:05       ` Heiko Carstens
  2007-07-10 18:39         ` Amit K. Arora
@ 2007-07-10 18:41         ` Andrew Morton
  2007-07-11  9:36           ` testcases, was " Christoph Hellwig
  2007-07-11  9:40           ` Andi Kleen
  1 sibling, 2 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 18:41 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Theodore Tso, linux-kernel, Andi Kleen, Amit Arora, Martin Schwidefsky

On Tue, 10 Jul 2007 20:05:31 +0200
Heiko Carstens <heiko.carstens@de.ibm.com> wrote:

> > Alternatively I can push them directly to Linus along with other ext4
> > patches.  We can drop the s390 patch if Martin or Heiko wants to wire
> > it up themselves.
> 
> Yes, please drop the s390 patch. In general it seems to be better if only
> one architecture gets a syscall wired up initially and let other arches
> follow later.

Yep.

otoh, fallocate() was special, because we had so many problems working out
how to organise the args so that certain kooky architectures can implement
it.
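
The argument-layout trouble comes from 32-bit ABIs passing each 64-bit loff_t as two 32-bit register halves, which a compat wrapper has to reassemble. A minimal sketch follows, assuming a low-word-first order as on little-endian i386. The struct and both helpers are hypothetical; this is not the actual x86_64 compat code, nor a diagnosis of the bug Heiko reported.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical decoded argument set for fallocate(fd, mode, offset, len). */
struct fallocate_args {
	int fd;
	int mode;
	int64_t offset;
	int64_t len;
};

/* Glue two 32-bit register halves back into one 64-bit value. */
static uint64_t merge_u64(uint32_t lo, uint32_t hi)
{
	return ((uint64_t)hi << 32) | lo;
}

/*
 * Hypothetical compat entry: a 32-bit caller passes each loff_t as a
 * (low, high) pair of 32-bit registers, low word first (an assumption).
 * Swapping the halves would turn small offsets into enormous ones.
 */
static struct fallocate_args
compat_fallocate_args(int fd, int mode, uint32_t off_lo, uint32_t off_hi,
		      uint32_t len_lo, uint32_t len_hi)
{
	struct fallocate_args a = {
		.fd = fd,
		.mode = mode,
		.offset = (int64_t)merge_u64(off_lo, off_hi),
		.len = (int64_t)merge_u64(len_lo, len_hi),
	};
	return a;
}
```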

> Just wondering if the x86_64 compat syscall will ever get fixed? I think
> I have already mentioned three or four times to Amit that it is broken.
> Or is it that nobody cares? Dunno..
> 
> In addition there used to be a somewhat unofficial rule that new syscalls
> have to come with a test program, so people can easily test if they wired
> up the syscall correctly.

Yes please.  I normally just slam the whole .c file into the changelog.

I'd support an ununofficial rule that submitters of new syscalls also raise
a patch against LTP, come to that...

^ permalink raw reply	[flat|nested] 484+ messages in thread

* agp / cpufreq.
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (13 preceding siblings ...)
  2007-07-10 18:34 ` PCI probing changes Jesse Barnes
@ 2007-07-10 18:44 ` Dave Jones
  2007-07-10 20:09 ` -mm merge plans for 2.6.23 Christoph Lameter
                   ` (11 subsequent siblings)
  26 siblings, 0 replies; 484+ messages in thread
From: Dave Jones @ 2007-07-10 18:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

On Tue, Jul 10, 2007 at 01:31:52AM -0700, Andrew Morton wrote:

 > working-3d-dri-intel-agpko-resume-for-i815-chip.patch
 > 
 >  Sent to davej

You managed to sneak this to me just hours before I handed
AGP maintainership to Dave Airlie.  FWIW, I think this needs
to be redone in a much more generic manner before it goes mainline.

 > bugfix-cpufreq-in-combination-with-performance-governor.patch
 > restore-previously-used-governor-on-a-hot-replugged-cpu.patch
 > 
 >  Sent to davej

Will start merging the backlog once someone (ahem) pulls
the last lot of stuff.

	Dave

-- 
http://www.codemonkey.org.uk

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23 - ioat/dma engine
  2007-07-10 18:05   ` Nelson, Shannon
@ 2007-07-10 18:47     ` Andrew Morton
  2007-07-10 21:18       ` Nelson, Shannon
  0 siblings, 1 reply; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 18:47 UTC (permalink / raw)
  To: Nelson, Shannon; +Cc: Kok, Auke-jan H, linux-kernel, Leech, Christopher

On Tue, 10 Jul 2007 11:05:45 -0700
"Nelson, Shannon" <shannon.nelson@intel.com> wrote:

> Kok, Auke wrote: 
> >Andrew Morton wrote:
> >> git-ioat-vs-git-md-accel.patch
> >> ioat-warning-fix.patch
> >> fix-i-oat-for-kexec.patch
> >> 
> >>  I don't seem to be able to get rid of these.  Chris Leech appears to have
> >>  vanished.
> >
> >Chris is a moving target. Thankfully we have Shannon Nelson taking over
> >Chris' duties. Shannon, can you take a look at these and see what needs
> >to happen to it? Most likely these just need to be pushed to the right
> >person.
> 
> Auke: Thanks for the introduction :-).

Hi, Shannon.

> Andrew: All three of these patches are reasonable and can be pushed on
> up.  You can add my sign-off to all three:
> Signed-off-by: Shannon Nelson <shannon.nelson@intel.com>

OK, the way it works is that I send these patches at the git tree
maintainer, then the git tree maintainer merges them (this step is
unreliable) and then when I repull that git tree maintainer's tree I see
that they got merged so I drop them from -mm.  The git tree maintainer
decides when to send them to Linus.

I am presently pulling git://lost.foo-projects.org/~cleech/linux-2.6#master
into -mm.

Will you be taking over the IOAT git tree?  If so, please send me a
suitable git URL when it's ready.

The above tree has several changes in it from January (see
ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/broken-out/git-ioat.patch).
Please take a look at those, work out what we should do with it all.


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-10 18:34   ` Paul Menage
@ 2007-07-10 18:53     ` Andrew Morton
  2007-07-10 19:05       ` Paul Menage
  2007-07-11  4:55       ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 18:53 UTC (permalink / raw)
  To: Paul Menage; +Cc: vatsa, linux-kernel, containers

On Tue, 10 Jul 2007 11:34:38 -0700
"Paul Menage" <menage@google.com> wrote:

> Andrew, how about we merge enough of the container framework to
> support CFS? Bits we could leave out for now include container_clone()
> support and the nsproxy subsystem, fork/exit callback hooks, and
> possibly leave cpusets alone for now (which would also mean we could
> skip the automatic release-agent stuff). I'm in Tokyo for the Linux
> Foundation Japan symposium right now, but I should be able to get the
> new patchset to you for Friday afternoon.

mm..  Given that you propose leaving bits out for the 2.6.23 merge, and
that changes are still pending and that nothing will _use_ the framework in
2.6.23 I'd be inclined to err on the side of caution and hold it all back
from 2.6.23.

This has the advantage that the merge will happen after the kernel-summit
containers discussion which I suspect will be an important point in the
life of this project...

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: ata and netdev (was Re: -mm merge plans for 2.6.23)
  2007-07-10 18:24   ` Andrew Morton
@ 2007-07-10 18:55     ` James Bottomley
  2007-07-10 18:57     ` Jeff Garzik
                       ` (2 subsequent siblings)
  3 siblings, 0 replies; 484+ messages in thread
From: James Bottomley @ 2007-07-10 18:55 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeff Garzik, linux-kernel, IDE/ATA development list, netdev,
	Tejun Heo, Alan Cox, Deepak Saxena, Dan Faerch, Benjamin LaHaise

On Tue, 2007-07-10 at 11:24 -0700, Andrew Morton wrote:
> > > ata-ahci-alpm-store-interrupt-value.patch
> > > ata-ahci-alpm-expose-power-management-policy-option-to-users.patch
> > > ata-ahci-alpm-enable-link-power-management-for-ata-drivers.patch
> > > ata-ahci-alpm-enable-aggressive-link-power-management-for-ahci-controllers.patch
> > > 
> > >  These appear to need some work.
> > 
> > seemed mostly OK to me.  what comments did I miss?
> 
> Oh, I thought these were the patches which affected scsi and which James
> had issues with.  I guess I got confused.

Well ... my concern was really how to make them more generic ... ahci
isn't the only controller that can do phy power management, and it also
seemed to me that the most generic entity for power management was the
transport rather than the SCSI mid-layer, but that debate is still
ongoing.

James



^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: PCI probing changes
  2007-07-10 18:34 ` PCI probing changes Jesse Barnes
@ 2007-07-10 18:55   ` Andrew Morton
  0 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 18:55 UTC (permalink / raw)
  To: Jesse Barnes; +Cc: linux-kernel

On Tue, 10 Jul 2007 11:34:03 -0700
Jesse Barnes <jesse.barnes@intel.com> wrote:

> On Tuesday, July 10, 2007 1:31:52 Andrew Morton wrote:
> > pci-disable-decode-of-io-memory-during-bar-sizing.patch
> 
> This is a core PCI change, should probably go through Greg and/or 
> linux-pci instead?  Or just send it to Linus directly, iirc everyone 
> was ok with the change for 2.6.23.

Ah, thanks.  I moved it to the gregkh-pci queue.

> Also, you can add a
> Signed-off-by:  Jesse Barnes <jesse.barnes@intel.com>
> to it.

Updated, thanks.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: ata and netdev (was Re: -mm merge plans for 2.6.23)
  2007-07-10 18:24   ` Andrew Morton
  2007-07-10 18:55     ` James Bottomley
@ 2007-07-10 18:57     ` Jeff Garzik
  2007-07-10 20:31     ` Sergei Shtylyov
  2007-07-11 16:47     ` Dan Faerch
  3 siblings, 0 replies; 484+ messages in thread
From: Jeff Garzik @ 2007-07-10 18:57 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, IDE/ATA development list, netdev, Tejun Heo,
	Alan Cox, Deepak Saxena, Dan Faerch, Benjamin LaHaise

Andrew Morton wrote:
> On Tue, 10 Jul 2007 13:42:16 -0400
> Jeff Garzik <jeff@garzik.org> wrote:
> 
>> (just to provide my indicator of status)
> 
> Thanks.
> 
>>> libata-add-irq_flags-to-struct-pata_platform_info-fix.patch
>> are other pata_platform people happy with this?  I don't know embedded 
>> well enough to know if adding this struct member will break things.
> 
> This is just a silly remove-unneeded-cast-of-void* cleanup.  I wrote this
> as a fixup against
> libata-add-irq_flags-to-struct-pata_platform_info.patch with the intention
> of folding it into that base patch, but you went and merged the submitter's
> original patch so this trivial fixup got stranded in -mm.  Feel free to give
> it the piss-off-too-trivial treatment.

I'm sorry, I didn't look closely enough.  I was referring to the 
add-irq-flags patch itself, not your small fix.


>>> ata-ahci-alpm-store-interrupt-value.patch
>>> ata-ahci-alpm-expose-power-management-policy-option-to-users.patch
>>> ata-ahci-alpm-enable-link-power-management-for-ata-drivers.patch
>>> ata-ahci-alpm-enable-aggressive-link-power-management-for-ahci-controllers.patch
>>>
>>>  These appear to need some work.
>> seemed mostly OK to me.  what comments did I miss?
> 
> Oh, I thought these were the patches which affected scsi and which James
> had issues with.  I guess I got confused.

hrm.  ISTR James wanted some cleanups, Kristen did some cleanups, then 
looking at the cleanups decided they were needed / appropriate at this time.

Anyway, these are in my mbox queue and the libata portions (of which the 
code is the majority) seem OK.  Need to give them a final review.

	Jeff




^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-10 18:53     ` Andrew Morton
@ 2007-07-10 19:05       ` Paul Menage
  2007-07-11  4:55       ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 484+ messages in thread
From: Paul Menage @ 2007-07-10 19:05 UTC (permalink / raw)
  To: Andrew Morton; +Cc: vatsa, linux-kernel, containers

On 7/10/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > Andrew, how about we merge enough of the container framework to
> > support CFS? Bits we could leave out for now include container_clone()
> > support and the nsproxy subsystem, fork/exit callback hooks, and
> > possibly leave cpusets alone for now (which would also mean we could
> > skip the automatic release-agent stuff). I'm in Tokyo for the Linux
> > Foundation Japan symposium right now, but I should be able to get the
> > new patchset to you for Friday afternoon.
>
> mm..  Given that you propose leaving bits out for the 2.6.23 merge, and
> that changes are still pending and that nothing will _use_ the framework in
> 2.6.23

That's what I was originally thinking too, but since CFS has been
merged, CFS group scheduling would use it.

Paul

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: re: -mm merge plans for 2.6.23)
  2007-07-10 17:15   ` Andrew Morton
  2007-07-10 17:44     ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Jeff Garzik
@ 2007-07-10 19:07     ` Theodore Tso
  2007-07-10 19:31       ` Andrew Morton
  1 sibling, 1 reply; 484+ messages in thread
From: Theodore Tso @ 2007-07-10 19:07 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Amit Arora, Andi Kleen, Paul Mackerras,
	Benjamin Herrenschmidt, Arnd Bergmann, Luck, Tony,
	Heiko Carstens, Martin Schwidefsky, Mark Fasheh

On Tue, Jul 10, 2007 at 10:15:58AM -0700, Andrew Morton wrote:
> So I dropped everything.  Let's start again from scratch.  I'd suggest that
> for now we go with just an i386/x86_64 implementation, let the arch
> maintainers wire things up when that has settled down.

Ok, so no objections if we push the i386/x86_64 implementation (only),
plus the ext4 support to Linus?

						- Ted




* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: re: -mm merge plans for 2.6.23)
  2007-07-10 19:07     ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch (was: re: -mm merge plans for 2.6.23) Theodore Tso
@ 2007-07-10 19:31       ` Andrew Morton
  0 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 19:31 UTC (permalink / raw)
  To: Theodore Tso
  Cc: linux-kernel, Amit Arora, Andi Kleen, Paul Mackerras,
	Benjamin Herrenschmidt, Arnd Bergmann, Luck, Tony,
	Heiko Carstens, Martin Schwidefsky, Mark Fasheh

On Tue, 10 Jul 2007 15:07:35 -0400
Theodore Tso <tytso@mit.edu> wrote:

> On Tue, Jul 10, 2007 at 10:15:58AM -0700, Andrew Morton wrote:
> > So I dropped everything.  Let's start again from scratch.  I'd suggest that
> > for now we go with just an i386/x86_64 implementation, let the arch
> > maintainers wire things up when that has settled down.
> 
> Ok, so no objections if we push the i386/x86_64 implementation (only),
> plus the ext4 support to Linus?
> 

Sounds like a plan.

I haven't seriously looked at ext4 code in many months.  When I did I found
the changes to be quite incomprehensible, very, very poorly commented and
with quite a lot of odd-looking things about which I asked but got, iirc,
no useful reply.

Hopefully it got better.

Which patches are you proposing merging into 2.6.23?


* Re: ata and netdev (was Re: -mm merge plans for 2.6.23)
  2007-07-10 17:42 ` ata and netdev (was Re: -mm merge plans for 2.6.23) Jeff Garzik
  2007-07-10 18:24   ` Andrew Morton
@ 2007-07-10 19:56   ` Sergei Shtylyov
  1 sibling, 0 replies; 484+ messages in thread
From: Sergei Shtylyov @ 2007-07-10 19:56 UTC (permalink / raw)
  To: Jeff Garzik; +Cc: Andrew Morton, linux-kernel, netdev

Hello.

Jeff Garzik wrote:

>> 3x59x-fix-pci-resource-management.patch

    Now that the fix for CONFIG_PCI=n has been merged, what's left is to test 
this on EISA (at least Andrew wanted it :-).

>>  netdev patches which are stuck in limbo land.

> ?  I don't think I've seen these.

    You should have; I was sending it to you.

WBR, Sergei


* Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (14 preceding siblings ...)
  2007-07-10 18:44 ` agp / cpufreq Dave Jones
@ 2007-07-10 20:09 ` Christoph Lameter
  2007-07-11  9:42   ` Mel Gorman
  2007-07-11 11:35 ` Christoph Hellwig
                   ` (10 subsequent siblings)
  26 siblings, 1 reply; 484+ messages in thread
From: Christoph Lameter @ 2007-07-10 20:09 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Mel Gorman

On Tue, 10 Jul 2007, Andrew Morton wrote:

> slub-exploit-page-mobility-to-increase-allocation-order.patch
> slub-reduce-antifrag-max-order.patch
> 
>  These are slub changes which are dependent on Mel's stuff, and I have a note
>  here that there were reports of page allocation failures with these.  What's
>  up with that?

Those were fixed and all has been well since as far as I know.

>  Maybe I should just drop the 100-odd marginal-looking MM patches?  We're
>  simply not showing compelling reasons for merging them and quite a lot of them
>  are stuck in a 90% complete state.

As far as I can tell, the antifrag patches are stable, significantly
enhance various aspects of the VM and also make it more reliable. SLUB
can use them to increase scalability. -mm has been using order-3 allocations
via SLUB for months now without a problem. Without the antifrag patches,
order-1 allocations could cause OOMs.

It opens the door to functionality that we have wanted for a long time, such
as memory unplug.



* Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10 18:20       ` Mark Fasheh
@ 2007-07-10 20:28         ` Amit K. Arora
  0 siblings, 0 replies; 484+ messages in thread
From: Amit K. Arora @ 2007-07-10 20:28 UTC (permalink / raw)
  To: Mark Fasheh
  Cc: Theodore Tso, Andrew Morton, Heiko Carstens, linux-kernel,
	Andi Kleen, Martin Schwidefsky

On Tue, Jul 10, 2007 at 11:20:47AM -0700, Mark Fasheh wrote:
> On Tue, Jul 10, 2007 at 11:45:03AM -0400, Theodore Tso wrote:
> > On Tue, Jul 10, 2007 at 02:22:13AM -0700, Andrew Morton wrote:
> > > On Tue, 10 Jul 2007 11:07:37 +0200 Heiko Carstens <heiko.carstens@de.ibm.com> wrote:
> > > > We reserved a different syscall number than the one that is used right now
> > > > in the patch. Please drop this patch... Martin or I will wire up the syscall
> > > > as soon as the x86 variant is merged. Everything else just causes trouble and
> > > > confusion.
> > > 
> > > OK, I dropped all the fallocate patches.
> > 
> > Andrew, I want to clarify who is going to push the fallocate patches.
> > I can either push them to Linus as part of the ext4 patch set, or we
> > can wait for you to push them.  I thought that since you had them in -mm,
> > we were going to wait for you to push them (and presumed that this was
> > going to happen soon).
> > 
> > Alternatively I can push them directly to Linus along with other ext4
> > patches.  We can drop the s390 patch if Martin or Heiko wants to wire
> > it up themselves.
> > 
> > As far as I know there hasn't been any real contention on the actual
> > syscall patches, other than the numbering issues, so it seems that
> > pushing them to Linus sooner rather than later is the right thing to
> > do. 
> 
> Where is the latest and greatest version of those patches? Is it still the
> patch set distributed in 2.6.22-rc6-mm1? I'd mostly like to see the final
> set of flags we're planning on supporting. But yeah, I second the "sooner
> rather than later" :)

I have posted the latest fallocate patches as part of TAKE6. These
patches are exactly the same as the ones currently in the ext4 patch
queue maintained by Ted.

--
Regards,
Amit Arora


* Re: ata and netdev (was Re: -mm merge plans for 2.6.23)
  2007-07-10 18:24   ` Andrew Morton
  2007-07-10 18:55     ` James Bottomley
  2007-07-10 18:57     ` Jeff Garzik
@ 2007-07-10 20:31     ` Sergei Shtylyov
  2007-07-10 20:35       ` Andrew Morton
  2007-07-11 16:47     ` Dan Faerch
  3 siblings, 1 reply; 484+ messages in thread
From: Sergei Shtylyov @ 2007-07-10 20:31 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Jeff Garzik, linux-kernel, netdev

Hello.

Andrew Morton wrote:

> 3x59x-fix-pci-resource-management.patch: you wrote it ;) I have a comment

    No, I did, almost a year ago already. :-)

> here:

> - I don't remember the story with cardbus either.  Presumably once upon a
>   time the cardbus layer was claiming IO regions on behalf of cardbus
>   devices (?)

    IIRC, that's your own comment.

> Need to think about that.

WBR, Sergei


* Re: ata and netdev (was Re: -mm merge plans for 2.6.23)
  2007-07-10 20:31     ` Sergei Shtylyov
@ 2007-07-10 20:35       ` Andrew Morton
  0 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-10 20:35 UTC (permalink / raw)
  To: Sergei Shtylyov; +Cc: Jeff Garzik, linux-kernel, netdev

On Wed, 11 Jul 2007 00:31:23 +0400
Sergei Shtylyov <sshtylyov@ru.mvista.com> wrote:

> Hello.
> 
> Andrew Morton wrote:
> 
> > 3x59x-fix-pci-resource-management.patch: you wrote it ;) I have a comment
> 
>     No, I did, almost a year ago already. :-)

I thought that was odd.  I fixed the attribution.
 
> > here:
> 
> > - I don't remember the story with cardbus either.  Presumably once upon a
> >   time the cardbus layer was claiming IO regions on behalf of cardbus
> >   devices (?)
> 
>     IIRC, that's your own comment.

yup, that's what "I have a comment" meant ;)

The comment seems rather bogus actually.  Let's just merge it.



* RE: -mm merge plans for 2.6.23 - ioat/dma engine
  2007-07-10 18:47     ` Andrew Morton
@ 2007-07-10 21:18       ` Nelson, Shannon
  0 siblings, 0 replies; 484+ messages in thread
From: Nelson, Shannon @ 2007-07-10 21:18 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Kok, Auke-jan H, linux-kernel, Leech, Christopher

Andrew Morton [mailto:akpm@linux-foundation.org] 
> 
>I am presently pulling 
>git://lost.foo-projects.org/~cleech/linux-2.6#master
>into -mm.
>
>Will you be taking over the IOAT git tree?  If so, please send me a
>suitable git URL when it's ready.

I'll be getting there Real Soon Now.  The transition seems to be a
little rocky at the moment...

>The above tree has several changes in it from January (see
>ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/broken-out/git-ioat.patch).
>Please take a look at those, work out what we should do with it all.

Will do.  Thanks for your patience.

sln
======================================================================
Mr. Shannon Nelson                 LAN Access Division, Intel Corp.
Shannon.Nelson@intel.com                I don't speak for Intel
(503) 712-7659                    Parents can't afford to be squeamish.


* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch
  2007-07-10 17:44     ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Jeff Garzik
@ 2007-07-10 23:27       ` Paul Mackerras
  2007-07-11  0:16         ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Andrew Morton
  0 siblings, 1 reply; 484+ messages in thread
From: Paul Mackerras @ 2007-07-10 23:27 UTC (permalink / raw)
  To: Jeff Garzik
  Cc: Andrew Morton, Theodore Tso, linux-kernel, Amit Arora,
	Andi Kleen, Benjamin Herrenschmidt, Arnd Bergmann, Luck, Tony,
	Heiko Carstens, Martin Schwidefsky, Mark Fasheh, linux-arch

Jeff Garzik writes:

> Andrew Morton wrote:
> > So I dropped everything.  Let's start again from scratch.  I'd suggest that
> > for now we go with just an i386/x86_64 implementation, let the arch
> > maintainers wire things up when that has settled down.
> 
> 
> It's my observation that that plan usually works the best.  Arch 

... except when the initial implementer picks an argument order which
doesn't work on some archs, as happened with sys_sync_file_range.
That is also the case with fallocate IIRC.

We did come up with an order that worked for everybody, but that
discussion seemed to get totally ignored by the ext4 developers.

Paul.


* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch
  2007-07-10 23:27       ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Paul Mackerras
@ 2007-07-11  0:16         ` Andrew Morton
  2007-07-11  0:50           ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Paul Mackerras
  0 siblings, 1 reply; 484+ messages in thread
From: Andrew Morton @ 2007-07-11  0:16 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Jeff Garzik, Theodore Tso, linux-kernel, Amit Arora, Andi Kleen,
	Benjamin Herrenschmidt, Arnd Bergmann, Luck, Tony,
	Heiko Carstens, Martin Schwidefsky, Mark Fasheh, linux-arch

On Wed, 11 Jul 2007 09:27:40 +1000
Paul Mackerras <paulus@samba.org> wrote:

> We did come up with an order that worked for everybody, but that
> discussion seemed to get totally ignored by the ext4 developers.

It was a long discussion.

Can someone please remind us what the signature of the syscall
(and the compat handler) should be?


* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch
  2007-07-11  0:16         ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Andrew Morton
@ 2007-07-11  0:50           ` Paul Mackerras
  2007-07-11 15:39             ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Theodore Tso
  0 siblings, 1 reply; 484+ messages in thread
From: Paul Mackerras @ 2007-07-11  0:50 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeff Garzik, Theodore Tso, linux-kernel, Amit Arora, Andi Kleen,
	Benjamin Herrenschmidt, Arnd Bergmann, Luck, Tony,
	Heiko Carstens, Martin Schwidefsky, Mark Fasheh, linux-arch

Andrew Morton writes:

> On Wed, 11 Jul 2007 09:27:40 +1000
> Paul Mackerras <paulus@samba.org> wrote:
> 
> > We did come up with an order that worked for everybody, but that
> > discussion seemed to get totally ignored by the ext4 developers.
> 
> It was a long discussion.
> 
> Can someone please remind us what the signature of the syscall
> (and the compat handler) should be?

long sys_fallocate(loff_t offset, loff_t len, int fd, int mode)

should work for everybody.  The compat handler would be

long compat_sys_fallocate(u32 offset_hi, u32 offset_lo, u32 len_hi, u32 len_lo,
			  int fd, int mode)

for big-endian, or swap hi/lo for little-endian.  (Actually it would
be good to have an arch-dependent "stitch two args together" macro and
call them offset_0, offset_1 etc.)

Paul.


* Re: [ck] Re: -mm merge plans for 2.6.23
       [not found]   ` <b21f8390707101802o2d546477n2a18c1c3547c3d7a@mail.gmail.com>
@ 2007-07-11  1:14     ` Andrew Morton
       [not found]       ` <b8bf37780707101852g25d835b4ubbf8da5383755d4b@mail.gmail.com>
                         ` (5 more replies)
  0 siblings, 6 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-11  1:14 UTC (permalink / raw)
  To: Matthew Hawkins
  Cc: Con Kolivas, ck list, Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 11 Jul 2007 11:02:56 +1000 "Matthew Hawkins" <darthmdh@gmail.com> wrote:

> We all know swap prefetch has been tested out the wazoo since Moses was a
> little boy, is compile-time and runtime selectable, and gives an important
> and quantifiable performance increase to desktop systems.

Always interested.  Please provide us more details on your usage and
testing of that code.  Amount of memory, workload, observed results,
etc?

>  Save a Redhat
> employee some time reinventing the wheel and just merge it.  This wheel
> already has dope 21" rims, homes ;-)

ooh, kernel bling.


* Fwd: [ck] Re: -mm merge plans for 2.6.23
       [not found]       ` <b8bf37780707101852g25d835b4ubbf8da5383755d4b@mail.gmail.com>
@ 2007-07-11  1:53         ` André Goddard Rosa
  0 siblings, 0 replies; 484+ messages in thread
From: André Goddard Rosa @ 2007-07-11  1:53 UTC (permalink / raw)
  To: linux list

On 7/10/07, Andrew Morton <akpm@linux-foundation.org> wrote:
>  On Wed, 11 Jul 2007 11:02:56 +1000 "Matthew Hawkins" <darthmdh@gmail.com> wrote:
>
> > We all know swap prefetch has been tested out the wazoo since Moses was a
> > little boy, is compile-time and runtime selectable, and gives an important
> > and quantifiable performance increase to desktop systems.
>
> Always interested.  Please provide us more details on your usage and
> testing of that code.  Amount of memory, workload, observed results,
> etc?
>
>

It keeps my machine responsive after some time of inactivity,
i.e. when I try to use Firefox in the morning after leaving it running
overnight with multiple tabs open. I have 1GB of memory in this machine.

With regards,
-- 
[]s,
André Goddard


* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  1:14     ` [ck] " Andrew Morton
       [not found]       ` <b8bf37780707101852g25d835b4ubbf8da5383755d4b@mail.gmail.com>
@ 2007-07-11  2:21       ` Ira Snyder
  2007-07-11  3:37         ` timotheus
  2007-07-11  2:54       ` Matthew Hawkins
                         ` (3 subsequent siblings)
  5 siblings, 1 reply; 484+ messages in thread
From: Ira Snyder @ 2007-07-11  2:21 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Hawkins, linux-kernel, Con Kolivas, ck list, linux-mm,
	Paul Jackson


On Tue, 10 Jul 2007 18:14:19 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 11 Jul 2007 11:02:56 +1000 "Matthew Hawkins" <darthmdh@gmail.com> wrote:
> 
> > We all know swap prefetch has been tested out the wazoo since Moses was a
> > little boy, is compile-time and runtime selectable, and gives an important
> > and quantifiable performance increase to desktop systems.
> 
> Always interested.  Please provide us more details on your usage and
> testing of that code.  Amount of memory, workload, observed results,
> etc?
> 

I often leave long compiles running overnight (I'm a gentoo user). I always have the desktop running, with quite a few applications open, usually firefox, amarok, sylpheed, and liferea at the minimum. I've recently tried using a "stock" gentoo kernel, without the swap prefetch patch, and in the morning when I get on the computer, it hits the disk pretty hard pulling my applications (especially firefox) in from swap. With swap prefetch, the system responds like I expect: quick. It doesn't hit the swap at all, at least that I can tell.

Swap prefetch definitely makes a difference for me: it makes my experience MUCH better.

My system is a Core Duo 1.83GHz laptop, with 1GB ram and a 5400 rpm disk. With the disk being so slow, the less I hit swap, the better.

I'll cast my vote to merge swap prefetch.



* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  1:14     ` [ck] " Andrew Morton
       [not found]       ` <b8bf37780707101852g25d835b4ubbf8da5383755d4b@mail.gmail.com>
  2007-07-11  2:21       ` Ira Snyder
@ 2007-07-11  2:54       ` Matthew Hawkins
  2007-07-11  5:18         ` Nick Piggin
  2007-07-11  3:59       ` Grzegorz Kulewski
                         ` (2 subsequent siblings)
  5 siblings, 1 reply; 484+ messages in thread
From: Matthew Hawkins @ 2007-07-11  2:54 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Con Kolivas, ck list, Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 7/11/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 11 Jul 2007 11:02:56 +1000 "Matthew Hawkins" <darthmdh@gmail.com> wrote:

> Always interested.  Please provide us more details on your usage and
> testing of that code.  Amount of memory, workload, observed results,
> etc?

My usual workstation has 1GB of RAM & 2GB of swap (single partition -
though in the past with multiple drives I would spread swap around the
less-used disks & fiddle with the priority).  It's acting as a server for
my home network too (so it has squid, cups, bind, dhcpd, apache, mysql
& postgresql) but for the most part I'll have Listen playing music
while I switch between Flock &/or Firefox, Thunderbird, and
xvncviewer.  On the odd occasion I'll fire up some game (gweled,
actioncube, critical mass).  Compiling these days has been mostly
limited to kernels, I've been building mostly -ck and -cfs - keeping
up-to-date and also doing some odd things (like patching the non-SD
-ck stuff on top of CFS).  Mainly just to get swap prefetch, but also
not to lose skills since I'm out of the daily coding routine now.

Anyhow with swap prefetch, applications that may have been sitting
there idle for a while become responsive in the single-digit seconds
rather than double-digit or worse.  The same goes for a morning wakeup
(ie after nightly cron jobs throw things out) and also after doing
basically anything that wants memory, like benchmarking the various
kernels I'm messing with or doing some local DB work or coding a
memory leak into a web application running under apache ;)

-- 
Matt


* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  2:21       ` Ira Snyder
@ 2007-07-11  3:37         ` timotheus
  0 siblings, 0 replies; 484+ messages in thread
From: timotheus @ 2007-07-11  3:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Hawkins, linux-kernel, Con Kolivas, ck list, linux-mm,
	Paul Jackson


Ira Snyder <kernel@irasnyder.com> writes:

>> Always interested.  Please provide us more details on your usage and
>> testing of that code.  Amount of memory, workload, observed results,
>> etc?
>> 
>
> I often leave long compiles running overnight (I'm a gentoo user). I
> always have the desktop running, with quite a few applications open,
> usually firefox, amarok, sylpheed, and liferea at the minimum. I've
> recently tried using a "stock" gentoo kernel, without the swap
> prefetch patch, and in the morning when I get on the computer, it hits
> the disk pretty hard pulling my applications (especially firefox) in
> from swap. With swap prefetch, the system responds like I expect:
> quick. It doesn't hit the swap at all, at least that I can tell.
>
> Swap prefetch definitely makes a difference for me: it makes my
> experience MUCH better.
>
> My system is a Core Duo 1.83GHz laptop, with 1GB ram and a 5400 rpm
> disk. With the disk being so slow, the less I hit swap, the better.
>
> I'll cast my vote to merge swap prefetch.

Very similar experiences. Other usage patterns in which swap prefetch
brings improvements:

- Idling VMware session with large memory. Since VMware (server) can use
  mixed swap/RAM, the prefetch allows it to swap back into RAM without
  having to make the application active in the foreground.

- Firefox, OpenOffice, long from-source compilations: all the usual.

- My largest RAM capacity machine is a Core 2 Duo Laptop with 2 GB of
  RAM. It still benefits from the prefetch after running long
  compilations or backups.

- Also, I have an old Pentium 4 server (1.3 GHz, original RDRAM, ...)
  that uses the CK patches including swap prefetch. It has only 640 MB
  of RAM, and backs up gigabytes of data every night. The swap is split
  among multiple disks, and swap usage can easily reach 0.5 GB
  overnight. Applications that run in a VNC session, web browsers, office
  programs, etc., all resume much faster with the prefetch. Even the
  initial ssh login appears snappier, but I think that is just CK's fine
  work elsewhere. :)

I am curious how much of the benefit is due to prefetch, and how much to CK
using `vm_mapped' rather than `vm_swappiness'. Swappiness always seemed
such an unbeneficial hack to me...

(Over the past 6 months I've spent weeks or months at a time on various
kernels: -mm, -ck, vanilla, genpatches, combinations thereof -- x86 and ppc.)

I vote for prefetch and `vm_mapped'.



* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  1:14     ` [ck] " Andrew Morton
                         ` (2 preceding siblings ...)
  2007-07-11  2:54       ` Matthew Hawkins
@ 2007-07-11  3:59       ` Grzegorz Kulewski
  2007-07-11 12:26       ` Kevin Winchester
  2007-07-12 12:06       ` Kacper Wysocki
  5 siblings, 0 replies; 484+ messages in thread
From: Grzegorz Kulewski @ 2007-07-11  3:59 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Hawkins, linux-kernel, Con Kolivas, ck list, linux-mm,
	Paul Jackson

On Tue, 10 Jul 2007, Andrew Morton wrote:
> On Wed, 11 Jul 2007 11:02:56 +1000 "Matthew Hawkins" <darthmdh@gmail.com> wrote:
>
>> We all know swap prefetch has been tested out the wazoo since Moses was a
>> little boy, is compile-time and runtime selectable, and gives an important
>> and quantifiable performance increase to desktop systems.
>
> Always interested.  Please provide us more details on your usage and
> testing of that code.  Amount of memory, workload, observed results,
> etc?

I have been using swap prefetch in -ck kernels since it was introduced.

My machine: Athlon XP 2000MHz, 1GB DDR 266, fast SATA disk, different 
swap configurations but usually heaps of swap (2GB and/or 8GB).

My workload: desktop usage, KDE, software development, Firefox (HUGE 
memory hog), Eclipse and all that stuff (HUGE memory hog), sometimes other 
applications, sometimes some game such as America's Army (that one will eat 
all your memory in any configuration), Konsole with heaps of tabs, usually 
some heavy compilations in the background.

Observed result (with non-broken swap prefetch versions): after closing some 
memory hog (for example stopping playing game and starting to write some 
code or reloading Firefox after it leaked enough memory to nearly bring 
the system down) the disk will work for some time and after that 
everything works as expected, no heavy swap-in when switching between 
applications and so on, nearly no lags in desktop usage.

This is nearly unnoticeable, unless I have to run pure mainline. In that
case I can tell very quickly that swap prefetch is off, because after
closing such a memory hog and returning to some other application the system
is slow for a long time. Worse: after it starts to work reasonably and I try
to switch to some other application, or even try to use some dialog window
or module of the current application, I have to wait, sometimes > 10s, for it
to swap back in (even if 70% of my RAM is free at that time, after the memory
hog is gone). It is painful.

I observed similar results on my laptop (Athlon 64, 512MB RAM, slow ATA 
disk, similar workload but reduced because hardware is weak).

For me swap prefetch makes huge difference. The system lags a lot less in 
such circumstances.

Personally, I think swap prefetch is a hack. Maybe not very dirty and ugly 
but still a hack. But since:

* nobody proposed anything that can replace it and can be considered a 
no-hack,
* swap prefetch is rather well tested and shouldn't cause regressions (no 
known regressions as far as I know, the patch does not look very 
invasive, was reviewed several times, ...),
* Con said he won't make further -ck releases and won't port these patches 
to newer kernels,
* there are at least several people who see the difference,
* if somebody really hates it (s)he can turn it off

I think it could get merged, at least temporarily, until somebody can
suggest a better or extended solution.

Personally, I would be very happy to see it in, so people like me don't have
to patch it in or (worse) port it (possibly causing bugs and filing
additional bug reports and asking additional questions on these lists).

I even wonder whether also adding the opposite of swap prefetch would be even
better for many workloads. Something like: "when the system and swap disk are
idle, try to copy some pages to swap, so that when the system needs memory,
swap-out is much cheaper". I suspect a patch like that could reduce startup
times (and other operations) of big memory hogs, because the disk (the slowest
device) would only have to read the application and wouldn't have to swap out
half of the RAM at the same time.

I am happy to provide further info if needed.


Thanks,

Grzegorz Kulewski



* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-10 18:53     ` Andrew Morton
  2007-07-10 19:05       ` Paul Menage
@ 2007-07-11  4:55       ` Srivatsa Vaddagiri
  2007-07-11  5:29         ` Andrew Morton
  1 sibling, 1 reply; 484+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-11  4:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Paul Menage, linux-kernel, containers, Ingo Molnar

On Tue, Jul 10, 2007 at 11:53:19AM -0700, Andrew Morton wrote:
> On Tue, 10 Jul 2007 11:34:38 -0700
> "Paul Menage" <menage@google.com> wrote:
> 
> > Andrew, how about we merge enough of the container framework to
> > support CFS? Bits we could leave out for now include container_clone()
> > support and the nsproxy subsystem, fork/exit callback hooks, and
> > possibly leave cpusets alone for now (which would also mean we could
> > skip the automatic release-agent stuff). I'm in Tokyo for the Linux
> > Foundation Japan symposium right now, but I should be able to get the
> > new patchset to you for Friday afternoon.
> 
> mm..  Given that you propose leaving bits out for the 2.6.23 merge, and
> that changes are still pending and that nothing will _use_ the framework in
> 2.6.23 [...]

Andrew,
	The cpu group scheduler is ready and waiting for the container patches 
in 2.6.23 :)

Here are some options with us:

	a. (As Paul says) merge enough of container patches to enable
	   its use with cfs group scheduler (and possibly cpusets?)

	b. Enable group scheduling bits in 2.6.23 using the user-id grouping 
	   mechanism (aka fair user scheduler). For 2.6.24, we could remove 
	   this interface and use Paul's container patches instead. Since this 
	   means change of API interface between 2.6.23 and 2.6.24, I don't 
	   prefer this option.

	c. Enable group scheduling bits only in -mm for now (2.6.23-mmX), using 
	   Paul's container patches. I can send you a short patch that hooks up 
	   cfs group scheduler with Paul's container infrastructure.

If a. is not possible, I would prefer c.

Let me know your thoughts ..

-- 
Regards,
vatsa


* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  2:54       ` Matthew Hawkins
@ 2007-07-11  5:18         ` Nick Piggin
  2007-07-11  5:47           ` Ray Lee
  0 siblings, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-11  5:18 UTC (permalink / raw)
  To: Matthew Hawkins
  Cc: Andrew Morton, Con Kolivas, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

Matthew Hawkins wrote:
> On 7/11/07, Andrew Morton <akpm@linux-foundation.org> wrote:

> Anyhow with swap prefetch, applications that may have been sitting
> there idle for a while become responsive in the single-digit seconds
> rather than double-digit or worse.  The same goes for a morning wakeup
> (ie after nightly cron jobs throw things out)

OK that's a good data point. It would be really good to be able to
do an analysis on your overnight IO patterns and the corresponding
memory reclaim behaviour and see why things are getting evicted.

Not that swap prefetching isn't a good solution for this situation,
but the fact that things are getting swapped out for you also means
that mapped files and possibly important pagecache and dentries are
being flushed out, which we might be able to avoid.

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.


* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11  4:55       ` Srivatsa Vaddagiri
@ 2007-07-11  5:29         ` Andrew Morton
  2007-07-11  6:03           ` Srivatsa Vaddagiri
                             ` (2 more replies)
  0 siblings, 3 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-11  5:29 UTC (permalink / raw)
  To: vatsa; +Cc: Paul Menage, linux-kernel, containers, Ingo Molnar

On Wed, 11 Jul 2007 10:25:16 +0530 Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> wrote:

> On Tue, Jul 10, 2007 at 11:53:19AM -0700, Andrew Morton wrote:
> > On Tue, 10 Jul 2007 11:34:38 -0700
> > "Paul Menage" <menage@google.com> wrote:
> > 
> > > Andrew, how about we merge enough of the container framework to
> > > support CFS? Bits we could leave out for now include container_clone()
> > > support and the nsproxy subsystem, fork/exit callback hooks, and
> > > possibly leave cpusets alone for now (which would also mean we could
> > > skip the automatic release-agent stuff). I'm in Tokyo for the Linux
> > > Foundation Japan symposium right now, but I should be able to get the
> > > new patchset to you for Friday afternoon.
> > 
> > mm..  Given that you propose leaving bits out for the 2.6.23 merge, and
> > that changes are still pending and that nothing will _use_ the framework in
> > 2.6.23 [...]
> 
> Andrew,
> 	The cpu group scheduler is ready and waiting for the container patches 
> in 2.6.23 :)
> 
> Here are some options with us:
> 
> 	a. (As Paul says) merge enough of container patches to enable
> 	   its use with cfs group scheduler (and possibly cpusets?)
> 
> 	b. Enable group scheduling bits in 2.6.23 using the user-id grouping 
> 	   mechanism (aka fair user scheduler). For 2.6.24, we could remove 
> 	   this interface and use Paul's container patches instead. Since this 
> 	   means change of API interface between 2.6.23 and 2.6.24, I don't 
> 	   prefer this option.
> 
> 	c. Enable group scheduling bits only in -mm for now (2.6.23-mmX), using 
> 	   Paul's container patches. I can send you a short patch that hooks up 
> 	   cfs group scheduler with Paul's container infrastructure.
> 
> If a. is not possible, I would prefer c.
> 
> Let me know your thoughts ..

I'm inclined to take the cautious route here - I don't think people will be
dying for the CFS thingy (which I didn't even know about?) in .23, and it's
rather a lot of infrastructure to add for a CPU scheduler configurator
gadget (what does it do, anyway?)

We have plenty of stuff for 2.6.23 already ;)

Is this liveable with??

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  5:18         ` Nick Piggin
@ 2007-07-11  5:47           ` Ray Lee
  2007-07-11  5:54             ` Nick Piggin
  2007-07-11  6:00             ` [ck] Re: -mm merge plans for 2.6.23 Nick Piggin
  0 siblings, 2 replies; 484+ messages in thread
From: Ray Lee @ 2007-07-11  5:47 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matthew Hawkins, Andrew Morton, Con Kolivas, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 7/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Matthew Hawkins wrote:
> > On 7/11/07, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> > Anyhow with swap prefetch, applications that may have been sitting
> > there idle for a while become responsive in the single-digit seconds
> > rather than double-digit or worse.  The same goes for a morning wakeup
> > (ie after nightly cron jobs throw things out)
>
> OK that's a good data point. It would be really good to be able to
> do an analysis on your overnight IO patterns and the corresponding
> memory reclaim behaviour and see why things are getting evicted.

Eviction can happen for multiple reasons, as I'm sure you're painfully
aware. It can happen because of poor balancing choices, or it can
happen because the system is just short of RAM for the workload. As
for the former, you're absolutely right, it would be good to know
where those come from and see if they can be addressed.

However, it's the latter that swap prefetch can help and no amount of
fiddling with the aging code can address.

As an honest question, what's it going to take here? If I were to
write something that watched the task stats at process exit (cool
feature, that), and recorded the IO wait time or some such, and showed
it was lower with a kernel with the prefetch, would *that* get us some
forward motion on this?

I mean, from my point of view, it's a simple mental proof to show that
if you're out of RAM for your working set, things that you'll
eventually need again will get kicked out, and prefetch will bring
those back in before normal access patterns would fault them back in
under today's behavior. That seems like an obvious win. Where's the
corresponding obvious loss that makes this a questionable addition to
the kernel?
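Ray's argument can be checked on a deliberately tiny model. The sketch below is hypothetical and ignores file-backed pages, readahead and the real reclaim code. RAM is a strict LRU of `capacity` pages. An idle application's working set is partly pushed to swap by a transient job. With `prefetch` enabled, the swapped pages are read back into the RAM that the job freed before the application is used again. The model only counts swap-in faults, so it says nothing about the cost side Ray asks about, such as I/O spent prefetching pages that are never used.

```python
from collections import OrderedDict


def faults_on_return(capacity, app_pages, job_pages, prefetch):
    """Count swap-in faults when an idle application is used again.

    Toy model: RAM holds `capacity` pages with strict LRU eviction. The
    application touches `app_pages` pages, a transient job touches
    `job_pages` pages and exits, then the application touches its pages
    again.  With `prefetch`, swapped-out application pages are read back
    into free RAM during the idle period before that second pass.
    """
    ram = OrderedDict()  # resident pages, least recently used first
    swap = set()         # pages currently on swap

    def touch(page):
        """Access one page; return 1 if it had to be swapped in, else 0."""
        if page in ram:
            ram.move_to_end(page)
            return 0
        if len(ram) >= capacity:
            victim, _ = ram.popitem(last=False)  # evict the LRU page
            swap.add(victim)
        faulted = 1 if page in swap else 0
        swap.discard(page)
        ram[page] = None
        return faulted

    app = [("app", i) for i in range(app_pages)]
    job = [("job", i) for i in range(job_pages)]

    for page in app:
        touch(page)
    for page in job:
        touch(page)
    for page in job:  # the job exits: its RAM and swap are freed
        ram.pop(page, None)
        swap.discard(page)

    if prefetch:  # idle period: refill free RAM from swap
        for page in app:
            if page in swap and len(ram) < capacity:
                swap.discard(page)
                ram[page] = None

    return sum(touch(page) for page in app)
```

With capacity=100, app_pages=60 and job_pages=80, the job pushes 40 application pages to swap. Without prefetch, all 40 are faulted back in. With prefetch, none are.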

Ray

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  5:47           ` Ray Lee
@ 2007-07-11  5:54             ` Nick Piggin
  2007-07-11  6:04               ` Ray Lee
  2007-07-11  6:00             ` [ck] Re: -mm merge plans for 2.6.23 Nick Piggin
  1 sibling, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-11  5:54 UTC (permalink / raw)
  To: Ray Lee
  Cc: Matthew Hawkins, Andrew Morton, Con Kolivas, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

Ray Lee wrote:
> On 7/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>> Matthew Hawkins wrote:
>> > On 7/11/07, Andrew Morton <akpm@linux-foundation.org> wrote:
>>
>> > Anyhow with swap prefetch, applications that may have been sitting
>> > there idle for a while become responsive in the single-digit seconds
>> > rather than double-digit or worse.  The same goes for a morning wakeup
>> > (ie after nightly cron jobs throw things out)
>>
>> OK that's a good data point. It would be really good to be able to
>> do an analysis on your overnight IO patterns and the corresponding
>> memory reclaim behaviour and see why things are getting evicted.
> 
> 
> Eviction can happen for multiple reasons, as I'm sure you're painfully
> aware. It can happen because of poor balancing choices, or it can

s/balancing/reclaim, yes. And for the nightly cron job case, this
could quite possibly be the cause. At least updatedb should be fairly
easy to apply use-once heuristics for, so if they're not working then
we should hopefully be able to improve it.
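For readers unfamiliar with the term, a use-once policy keeps pages that have been referenced only once from displacing pages that are in repeated use. Linux approximates this with separate active and inactive LRU lists. The class below is a much-simplified, hypothetical model of that idea, with fixed list sizes and no referenced bits or list rotation. It shows why a one-pass scan such as updatedb should not, in principle, evict the hot working set.

```python
from collections import OrderedDict


class TwoListCache:
    """Simplified use-once cache with an active and an inactive LRU list.

    New pages start on the inactive list; a second access while still
    inactive promotes them to the active list.  Reclaim takes the oldest
    inactive page first, so pages touched only once (a streaming scan)
    cannot push out pages that are used repeatedly.
    """

    def __init__(self, capacity, max_active):
        self.capacity = capacity
        self.max_active = max_active
        self.active = OrderedDict()    # oldest first
        self.inactive = OrderedDict()  # oldest first

    def access(self, page):
        if page in self.active:
            self.active.move_to_end(page)
        elif page in self.inactive:
            del self.inactive[page]    # second reference: promote
            self.active[page] = None
            if len(self.active) > self.max_active:
                demoted, _ = self.active.popitem(last=False)
                self.inactive[demoted] = None
        else:
            self.inactive[page] = None  # first reference
        while len(self.active) + len(self.inactive) > self.capacity:
            if self.inactive:
                self.inactive.popitem(last=False)
            else:
                self.active.popitem(last=False)

    def resident(self, page):
        return page in self.active or page in self.inactive
```

Touching five "hot" pages twice and then streaming 100 pages through a 10-page cache leaves all five hot pages resident.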

-- 
SUSE Labs, Novell Inc.

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  5:47           ` Ray Lee
  2007-07-11  5:54             ` Nick Piggin
@ 2007-07-11  6:00             ` Nick Piggin
  1 sibling, 0 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-11  6:00 UTC (permalink / raw)
  To: Ray Lee
  Cc: Matthew Hawkins, Andrew Morton, Con Kolivas, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

Ray Lee wrote:

> As an honest question, what's it going to take here? If I were to
> write something that watched the task stats at process exit (cool
> feature, that), and recorded the IO wait time or some such, and showed
> it was lower with a kernel with the prefetch, would *that* get us some
> forward motion on this?

Honest answer? Sure, why not. Numbers are good.
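Ray's proposal reads per-task delay accounting through taskstats at exit, which needs a netlink client. A lighter-weight stand-in (an assumption, not what Ray described) is field 42 of /proc/<pid>/stat, `delayacct_blkio_ticks`. proc(5) documents this field as aggregated block I/O delays in clock ticks. It is only meaningful when the kernel has delay accounting enabled. Sampling it for the same workload on kernels with and without prefetch would give one crude number to compare.

```python
import os


def blkio_ticks(stat_line):
    """Return delayacct_blkio_ticks (field 42) from a /proc/<pid>/stat line.

    The command name (field 2) is in parentheses and may itself contain
    spaces or ')', so split after the *last* ')' before counting fields.
    """
    rest = stat_line.rsplit(")", 1)[1].split()
    # rest[0] is field 3 (state), so field N lives at index N - 3.
    return int(rest[42 - 3])


def blkio_seconds(pid):
    """Read a live process's accumulated block I/O delay, in seconds."""
    with open(f"/proc/{pid}/stat") as f:
        ticks = blkio_ticks(f.read())
    return ticks / os.sysconf("SC_CLK_TCK")
```

Unlike taskstats, this has to be sampled while the process is still running, for example just before it is told to quit.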

-- 
SUSE Labs, Novell Inc.

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11  5:29         ` Andrew Morton
@ 2007-07-11  6:03           ` Srivatsa Vaddagiri
  2007-07-11  9:04           ` Ingo Molnar
  2007-07-11 19:44           ` Paul Menage
  2 siblings, 0 replies; 484+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-11  6:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Paul Menage, linux-kernel, containers, Ingo Molnar

On Tue, Jul 10, 2007 at 10:29:42PM -0700, Andrew Morton wrote:
> I'm inclined to take the cautious route here - I don't think people will be
> dying for the CFS thingy (which I didn't even know about?) in .23, and it's
> rather a lot of infrastructure to add for a CPU scheduler configurator
> gadget (what does it do, anyway?)

Hmm, OK, if you think the container patches are too early for 2.6.23, fine.
We should definitely aim to have them in 2.6.24, by which time I
expect the memory RSS controller will also be in good shape.

> We have plenty of stuff for 2.6.23 already ;)
> 
> Is this liveable with??

Fine. I would then ask you to enable group CPU scheduling in
2.6.23-rcX-mmY at least, so that it gets some testing. The
essential group scheduling bits are already in Linus' tree (as part
of the CFS merge), so what -mm needs is a slim patch to hook them up to Paul's
container infrastructure (which I trust will stay in -mm until it goes
mainline). I will send that slim patch later (to be included in
2.6.23-rc1-mm1, perhaps).

-- 
Regards,
vatsa

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  5:54             ` Nick Piggin
@ 2007-07-11  6:04               ` Ray Lee
  2007-07-11  6:24                 ` Nick Piggin
  0 siblings, 1 reply; 484+ messages in thread
From: Ray Lee @ 2007-07-11  6:04 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Matthew Hawkins, Andrew Morton, Con Kolivas, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 7/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >> OK that's a good data point. It would be really good to be able to
> >> do an analysis on your overnight IO patterns and the corresponding
> >> memory reclaim behaviour and see why things are getting evicted.
> >
> > Eviction can happen for multiple reasons, as I'm sure you're painfully
> > aware. It can happen because of poor balancing choices, or it can
>
> s/balancing/reclaim, yes. And for the nightly cron job case, this
> could quite possibly be the cause. At least updatedb should be fairly
> easy to apply use-once heuristics for, so if they're not working then
> we should hopefully be able to improve it.

<nod> Sorry, I'm not so clear on the terminology, am I.

So, that's one part of it: one could argue that for that bit swap
prefetch is a bit of a band-aid over the issue. A useful band-aid,
that works today, isn't invasive, and can be ripped out at some future
time if the underlying issue is eventually solved by a proper use-once
aging mechanism, but nevertheless a band-aid.

The other part is when I've got evolution and a few other things open,
then I run gimp on a raw photo and do some work on it, quit out of
gimp, do a couple of things in a shell to upload the photo to my
server, then switch back to evolution. Hang, waiting on swap in. Well,
the kernel had some free time there to repopulate evolution's working
set, and swap prefetch would help that, while better (or perfect!)
heuristics in the reclaim *won't*.

That's the real issue here.

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  6:04               ` Ray Lee
@ 2007-07-11  6:24                 ` Nick Piggin
  2007-07-11  7:50                   ` swap prefetch (Re: -mm merge plans for 2.6.23) Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-11  6:24 UTC (permalink / raw)
  To: Ray Lee
  Cc: Matthew Hawkins, Andrew Morton, Con Kolivas, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

Ray Lee wrote:
> On 7/10/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>> >> OK that's a good data point. It would be really good to be able to
>> >> do an analysis on your overnight IO patterns and the corresponding
>> >> memory reclaim behaviour and see why things are getting evicted.
>> >
>> > Eviction can happen for multiple reasons, as I'm sure you're painfully
>> > aware. It can happen because of poor balancing choices, or it can
>>
>> s/balancing/reclaim, yes. And for the nightly cron job case, this
>> could quite possibly be the cause. At least updatedb should be fairly
>> easy to apply use-once heuristics for, so if they're not working then
>> we should hopefully be able to improve it.
> 
> 
> <nod> Sorry, I'm not so clear on the terminology, am I.
> 
> So, that's one part of it: one could argue that for that bit swap
> prefetch is a bit of a band-aid over the issue. A useful band-aid,
> that works today, isn't invasive, and can be ripped out at some future
> time if the underlying issue is eventually solved by a proper use-once
> aging mechanism, but nevertheless a band-aid.

I think for some workloads it is probably a band-aid, while for others
prefetching data that is likely to be used again back into memory is
undeniably going to be a win.

A lot of the positive reports I have seen about this say that the
desktop is more responsive the next morning. So I kind of want to know
what's happening here -- as far as I can tell, swap prefetching
shouldn't help a huge amount to recover from a simple updatedb alone --
although if other cron jobs ran afterwards, used a bit more memory
and pushed out some of updatedb's cache, perhaps that's when swap
prefetching finds its niche. I don't know.

However, I don't like the fact that there is _any_ swap happening
on 1GB desktops after a single updatedb run. Is something else
running that hogs a huge amount of memory? Maybe that explains it,
but I don't know. I do know that we probably don't have very good
use-once algorithms for the dentry and inode caches, so updatedb
might cause them to push other memory out to swap. We could test
that by winding the VFS reclaim right up.
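If the VFS reclaim knob meant here is the `vm.vfs_cache_pressure` sysctl (an assumption), the experiment is a one-line write. The default is 100, and larger values make the kernel prefer reclaiming dentries and inodes over pagecache and swap. Below is a minimal helper that sets the value and returns the old one. The `proc_root` parameter is added so that it can be exercised against a scratch directory. Writing the real /proc file needs root.

```python
import os


def set_vfs_cache_pressure(value, proc_root="/proc"):
    """Write vm.vfs_cache_pressure and return the previous value.

    100 is the default; larger values make the kernel reclaim dentries
    and inodes more aggressively relative to pagecache and swap.
    """
    if value < 0:
        raise ValueError("vfs_cache_pressure must be non-negative")
    path = os.path.join(proc_root, "sys", "vm", "vfs_cache_pressure")
    with open(path) as f:
        previous = int(f.read())
    with open(path, "w") as f:
        f.write(f"{value}\n")
    return previous
```

For example, call set_vfs_cache_pressure(1000) before the nightly jobs and restore the returned value afterwards. The value 1000 is just an arbitrary "high" setting.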


> The other part is when I've got evolution and a few other things open,
> then I run gimp on a raw photo and do some work on it, quit out of
> gimp, do a couple of things in a shell to upload the photo to my
> server, then switch back to evolution. Hang, waiting on swap in. Well,
> the kernel had some free time there to repopulate evolution's working
> set, and swap prefetch would help that, while better (or perfect!)
> heuristics in the reclaim *won't*.
> 
> That's the real issue here.

Yeah, that's an issue, and swap prefetching has the potential to
help there, no doubt at all. How big is the saving? I don't think
it will be an order of magnitude, because unfortunately mapped
pagecache gets thrown out as well as swap, so, for example, your
evolution mailbox, libraries, executable, etc. are still going to
have to be paged back in.
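The "not an order of magnitude" point can be turned into a back-of-the-envelope bound. This rests on three simplifying assumptions: prefetch restores every swapped-out anonymous page, every file-backed page must still be read on demand, and each page costs the same to bring back. Under those assumptions, the best-case speedup is total pages divided by file-backed pages. The figures in the usage line are invented for illustration.

```python
def best_case_speedup(anon_pages, file_pages):
    """Upper bound on the speedup if prefetch restores every anonymous
    page but file-backed pages must still be faulted in on demand.

    Assumes every page costs the same to bring back (no readahead or
    clustering effects), which is a large simplification.
    """
    without_prefetch = anon_pages + file_pages  # everything faulted back
    with_prefetch = file_pages                  # only file pages remain
    if with_prefetch == 0:
        return float("inf")
    return without_prefetch / with_prefetch
```

best_case_speedup(500, 500) gives 2.0. An order of magnitude (10.0) needs something like best_case_speedup(900, 100), i.e. 90% of the memory to be faulted back in being anonymous.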

Regarding swap prefetching. I'm not going to argue for or against
it anymore because I have really stopped following where it is
up to, for now. If the code and the results meet the standard that
Andrew wants then I don't particularly mind if he merges it.

It would be nice if some of you guys would still test with prefetching
turned off and report problems with reclaim -- I have
never encountered the morning-after sluggishness (although I don't
doubt for a minute that it is a problem for some).

-- 
SUSE Labs, Novell Inc.

* Re: swap prefetch (Re: -mm merge plans for 2.6.23)
  2007-07-11  6:24                 ` Nick Piggin
@ 2007-07-11  7:50                   ` Ingo Molnar
  0 siblings, 0 replies; 484+ messages in thread
From: Ingo Molnar @ 2007-07-11  7:50 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ray Lee, Matthew Hawkins, Andrew Morton, Con Kolivas, ck list,
	Paul Jackson, linux-mm, linux-kernel


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> Regarding swap prefetching. I'm not going to argue for or against it 
> anymore because I have really stopped following where it is up to, for 
> now. If the code and the results meet the standard that Andrew wants 
> then I don't particularly mind if he merges it.

I have tested it and have read the code, and it looks fine to me. (I've
reported my test results elsewhere already.) We should include this in
v2.6.23.

Acked-by: Ingo Molnar <mingo@elte.hu>

	Ingo

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11  5:29         ` Andrew Morton
  2007-07-11  6:03           ` Srivatsa Vaddagiri
@ 2007-07-11  9:04           ` Ingo Molnar
  2007-07-11  9:23             ` Paul Jackson
  2007-07-11 19:44           ` Paul Menage
  2 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-11  9:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: vatsa, Paul Menage, linux-kernel, containers


* Andrew Morton <akpm@linux-foundation.org> wrote:

> > 	c. Enable group scheduling bits only in -mm for now (2.6.23-mmX), using
> > 	   Paul's container patches. I can send you a short patch that hooks up 
> > 	   cfs group scheduler with Paul's container infrastructure.
> > 
> > If a. is not possible, I would prefer c.
> > 
> > Let me know your thoughts ..
> 
> I'm inclined to take the cautious route here - I don't think people 
> will be dying for the CFS thingy (which I didn't even know about?) in 
> .23, and it's rather a lot of infrastructure to add for a CPU 
> scheduler configurator gadget (what does it do, anyway?)
> 
> We have plenty of stuff for 2.6.23 already ;)
> 
> Is this liveable with??

another option would be to trivially hook up CONFIG_FAIR_GROUP_SCHED 
with cpusets, and to offer CONFIG_FAIR_GROUP_SCHED in the Kconfig, 
dependent on CPUSETS and defaulting to off. That would give it a chance 
to be tested, benchmarked, etc.

	Ingo

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11  9:04           ` Ingo Molnar
@ 2007-07-11  9:23             ` Paul Jackson
  2007-07-11 10:03               ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 484+ messages in thread
From: Paul Jackson @ 2007-07-11  9:23 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: akpm, vatsa, menage, linux-kernel, containers

Ingo wrote:
> another option would be to trivially hook up CONFIG_FAIR_GROUP_SCHED 
> with cpusets, ...

ah ... you triggered my procmail filter for 'cpuset' ... ;).

What would it mean to hook up CFS with cpusets?  I've a pretty
good idea what a cpuset is, but don't know what kind of purpose
you have in mind for such a hook.  Could you say a few words to
that?  Thanks.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: Re: -mm merge plans -- lumpy reclaim
  2007-07-10 12:37 ` clam Andy Whitcroft
@ 2007-07-11  9:34   ` Andy Whitcroft
  2007-07-11 16:46     ` Andrew Morton
  0 siblings, 1 reply; 484+ messages in thread
From: Andy Whitcroft @ 2007-07-11  9:34 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: Andrew Morton, linux-kernel, Mel Gorman, Christoph Lameter,
	Peter Zijlstra

[Seems a PEBKAC occurred on the subject line; resending lest it become a
victim of "oh that's spam".]

Andy Whitcroft wrote:
> Andrew Morton wrote:
> 
> [...]
>> lumpy-reclaim-v4.patch
>> have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch
>> only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
>>
>>  Lumpy reclaim.  In a similar situation to Mel's patches.  Stuck due to
>>  general lack of interest and effort.
> 
> The lumpy reclaim patches originally came out of work to support Mel's
> anti-fragmentation work.  As such I think they have become somewhat
> attached to those patches.  Whilst lumpy is most effective where
> placement controls are in place as offered by Mel's work, we see benefit
> from reduction in the "blunderbuss" effect when we reclaim at higher
> orders.  While placement control is pretty much required for the very
> highest orders such as huge page size, lower order allocations are
> benefited in terms of lower collateral damage.
> 
> There are now a few areas other than huge page allocations which can
> benefit.  Stacks are still order 1.  Jumbo frames want higher order
> contiguous pages for their incoming hardware buffers.  SLUB is showing
> performance benefits from moving to a higher allocation order.  All of
> these should benefit from more aggressive targeted reclaim; indeed, I
> have been surprised just how often my test workloads trigger lumpy at
> order 1 to get new stacks.
> 
> Truly representative workloads are hard to generate for some of these,
> though we have heard some encouraging noises from those who can
> reproduce these problems.
> 
> [...]
> 
> -apw
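The "blunderbuss" difference can be illustrated with a toy model. This is hypothetical code, not the lumpy reclaim patches. The goal is a free, naturally aligned block of 2**order pages. Plain LRU reclaim frees pages in LRU order until some aligned block happens to be entirely free. Lumpy reclaim instead takes the page at the tail of the LRU and reclaims the rest of that page's aligned block. The model assumes every occupied page is reclaimable. In reality, a single pinned or unmovable neighbour defeats lumpy reclaim, which is where Mel's placement work helps.

```python
def aligned_block(pfn, order):
    """Page frame numbers of the naturally aligned 2**order block holding pfn."""
    size = 1 << order
    start = pfn & ~(size - 1)
    return range(start, start + size)


def has_free_block(free, order):
    """True if some aligned 2**order block is entirely free.

    Assumes len(free) is a multiple of the block size.
    """
    size = 1 << order
    return any(all(free[s:s + size]) for s in range(0, len(free), size))


def lru_reclaim(free, lru, order):
    """Reclaim in LRU order until an aligned block is free; return count.

    lru lists page frame numbers, least recently used first.
    """
    free = list(free)
    reclaimed = 0
    for pfn in lru:
        if has_free_block(free, order):
            break
        if not free[pfn]:
            free[pfn] = True
            reclaimed += 1
    return reclaimed


def lumpy_reclaim(free, lru, order):
    """Reclaim the LRU tail page plus its aligned neighbours; return count."""
    free = list(free)
    reclaimed = 0
    for pfn in aligned_block(lru[0], order):
        if not free[pfn]:
            free[pfn] = True
            reclaimed += 1
    return reclaimed
```

Take eight in-use pages (free = [False] * 8) with an LRU order of [0, 2, 4, 6, 1, 3, 5, 7]. For an order-1 block, lru_reclaim frees five pages, while lumpy_reclaim frees only two.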



* testcases, was Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10 18:41         ` Andrew Morton
@ 2007-07-11  9:36           ` Christoph Hellwig
  2007-07-11  9:40             ` Nick Piggin
  2007-07-11  9:40           ` Andi Kleen
  1 sibling, 1 reply; 484+ messages in thread
From: Christoph Hellwig @ 2007-07-11  9:36 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Heiko Carstens, Theodore Tso, linux-kernel, Andi Kleen,
	Amit Arora, Martin Schwidefsky

On Tue, Jul 10, 2007 at 11:41:19AM -0700, Andrew Morton wrote:
> I'd support an ununofficial rule that submitters of new syscalls also raise
> a patch against LTP, come to that...

s/ununofficial//, please.  And extend this to every new kernel interface
that's not bound to a specific piece of hardware.

* Re: testcases, was Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-11  9:36           ` testcases, was " Christoph Hellwig
@ 2007-07-11  9:40             ` Nick Piggin
  2007-07-11 10:36               ` Michael Kerrisk
  0 siblings, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-11  9:40 UTC (permalink / raw)
  To: Christoph Hellwig
  Cc: Andrew Morton, Heiko Carstens, Theodore Tso, linux-kernel,
	Andi Kleen, Amit Arora, Martin Schwidefsky, Michael Kerrisk

Christoph Hellwig wrote:
> On Tue, Jul 10, 2007 at 11:41:19AM -0700, Andrew Morton wrote:
> 
>>I'd support an ununofficial rule that submitters of new syscalls also raise
>>a patch against LTP, come to that...
> 
> 
> s/ununofficial//, please.  And extend this to every new kernel interface
> that's not bound to a specific piece of hardware.

Agree, and cc manpages maintainer too? (preferably write most
of the manpage body as well, IMO).

-- 
SUSE Labs, Novell Inc.

* Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-10 18:41         ` Andrew Morton
  2007-07-11  9:36           ` testcases, was " Christoph Hellwig
@ 2007-07-11  9:40           ` Andi Kleen
  1 sibling, 0 replies; 484+ messages in thread
From: Andi Kleen @ 2007-07-11  9:40 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Heiko Carstens, Theodore Tso, linux-kernel, Amit Arora,
	Martin Schwidefsky


> I'd support an ununofficial rule that submitters of new syscalls also raise
> a patch against LTP, come to that...

And a patch for the manpages. Definitely in favor. 

-Andi

* Re: -mm merge plans for 2.6.23
  2007-07-10 20:09 ` -mm merge plans for 2.6.23 Christoph Lameter
@ 2007-07-11  9:42   ` Mel Gorman
  2007-07-11 17:49     ` Christoph Lameter
  0 siblings, 1 reply; 484+ messages in thread
From: Mel Gorman @ 2007-07-11  9:42 UTC (permalink / raw)
  To: Christoph Lameter; +Cc: Andrew Morton, linux-kernel

On (10/07/07 13:09), Christoph Lameter didst pronounce:
> On Tue, 10 Jul 2007, Andrew Morton wrote:
> 
> > slub-exploit-page-mobility-to-increase-allocation-order.patch
> > slub-reduce-antifrag-max-order.patch
> > 
> >  These are slub changes which are dependent on Mel's stuff, and I have a note
> >  here that there were reports of page allocation failures with these.  What's
> >  up with that?
> 
> Those were fixed and all has been well since as far as I know.
> 

SLUB's use of high orders without page allocation failures does depend on
two very questionable patches that I drew attention to in my initial
merge mail. If grouping pages by mobility goes through, I'll revisit
that properly to make sure it can never deadlock under any
circumstances. Right now, it could theoretically livelock, although I've
never been able to reproduce it.

The patches as they are will work for high-order allocations if you are
willing to wait and reclaim memory. The more stressful users need more
effort, but the last few months in -mm have shown that it can be made
to work with one approach.

> >  Maybe I should just drop the 100-odd marginal-looking MM patches?  We're
> >  simply not showing compelling reasons for merging them and quite a lot of them
> >  are stuck in a 90% complete state.
> 
> As far as I can tell the antifrag patches are stable and are significantly 
> enhancing various aspects of the VM and also make it more reliable. SLUB 
> can use it to increase scalability. MM has been using order 3 allocs via 
> SLUB for months now without a problem. Without the antifrag patches order 
> 1 allocs could cause OOMs.
> 
> It opens the door for functionality that we wanted for a long time such as
> memory unplug etc.

And I want to avoid a catch-22 here where the features that depend on
grouping pages by mobility have to exist before grouping pages by
mobility is pushed through.

I would like the patches to go through on the grounds that higher order
allocations can succeed. However, I am also happy to say that order-0
pages should be used as much as possible, that case should always be
made as fast as possible and the world must not end if a high-order
allocation fails.
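The position that order-0 pages should be used where possible, and that a failed high-order allocation must not be fatal, is commonly implemented as "try the larger order, then fall back". Below is a minimal, hypothetical sketch of that pattern. `try_alloc` stands in for whatever allocator is being called and is assumed to return None on failure rather than retrying hard.

```python
def alloc_with_fallback(try_alloc, preferred_order, min_order=0):
    """Try preferred_order first, then successively smaller orders.

    try_alloc(order) returns an allocation handle, or None on failure.
    Returns (order, handle), or (None, None) if even min_order fails.
    """
    for order in range(preferred_order, min_order - 1, -1):
        handle = try_alloc(order)
        if handle is not None:
            return order, handle
    return None, None
```

A caller such as a slab cache would then size its object layout from the order it actually obtained rather than the one it asked for.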

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11  9:23             ` Paul Jackson
@ 2007-07-11 10:03               ` Srivatsa Vaddagiri
  2007-07-11 10:19                 ` Ingo Molnar
  2007-07-11 11:10                 ` Paul Jackson
  0 siblings, 2 replies; 484+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-11 10:03 UTC (permalink / raw)
  To: Paul Jackson; +Cc: Ingo Molnar, akpm, menage, linux-kernel, containers

On Wed, Jul 11, 2007 at 02:23:52AM -0700, Paul Jackson wrote:
> Ingo wrote:
> > another option would be to trivially hook up CONFIG_FAIR_GROUP_SCHED 
> > with cpusets, ...
> 
> ah ... you triggered my procmail filter for 'cpuset' ... ;).

:-)

> What would it mean to hook up CFS with cpusets?

CFS is the new CPU scheduler in Linus's tree (http://lwn.net/Articles/241085/).
It has some group scheduling capabilities, i.e. the core scheduler
now recognizes the concept of a task group and provides fair CPU time
to each task group (in addition to providing fair time to each task within a
group).
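In steady state, with every task CPU-bound, the policy of being fair to each group and then fair to each task within the group works out as shown in the sketch below. This is only the arithmetic of the policy, with equal group weights unless given otherwise. It is not how CFS implements the policy: CFS tracks weighted virtual runtime rather than computing shares up front.

```python
def group_fair_shares(groups, group_weights=None):
    """Steady-state CPU share of each task under two-level fairness.

    groups maps a group name to its list of runnable task names. Each
    non-empty group gets CPU in proportion to its weight (default 1),
    and that group share is split equally among the group's tasks.
    """
    weights = group_weights or {}
    active = {g: tasks for g, tasks in groups.items() if tasks}
    total = sum(weights.get(g, 1) for g in active)
    shares = {}
    for g, tasks in active.items():
        group_share = weights.get(g, 1) / total
        for task in tasks:
            shares[task] = group_share / len(tasks)
    return shares
```

Take one task in group "alice" and three in group "bob". Alice's task gets half the CPU and each of Bob's tasks gets a sixth, whereas plain per-task fairness would give every task a quarter.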

The core scheduler, however, is not concerned with how task groups are formed
or how tasks migrate between groups. That's where a patch like Paul Menage's
container infrastructure comes in handy: it provides a user interface for
managing task groups (creating/deleting task groups, migrating tasks from one
group to another, etc.). Whatever the chosen user interface is, the CPU
scheduler needs to be told about task-group creation/destruction,
migration of tasks across groups, etc.

Unfortunately, the group-scheduler bits will be ready in 2.6.23 while
Paul Menage's container patches aren't ready for 2.6.23 yet.

So Ingo was proposing we use cpuset as that user interface to manage
task-groups. This will be only for 2.6.23. In 2.6.24, when hopefully Paul
Menage's container patches will be ready and will be merged, the group
cpu scheduler will stop using cpuset as that interface and use the
container infrastructure instead.

If you recall, I have attempted to use cpuset for such an interface in
the past (metered cpusets - see [1]). It brings in some semantic changes
for cpusets, most notably:

	- metered cpusets cannot have grand-children
	- all cpusets under a metered cpuset need to share the same set
	  of cpus.
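The two restrictions can be expressed as a check over a cpuset tree. Below is a minimal sketch. The `Cpuset` class is a hypothetical in-memory model, not the kernel's struct cpuset. The second rule is interpreted (an assumption) as requiring every child to have exactly its metered parent's CPUs.

```python
from dataclasses import dataclass, field


@dataclass
class Cpuset:
    """Hypothetical in-memory model of a cpuset, not the kernel's struct."""
    name: str
    cpus: frozenset
    metered: bool = False
    children: list = field(default_factory=list)


def metered_violations(cs):
    """List violations of the two metered-cpuset rules in the tree at cs."""
    problems = []
    if cs.metered:
        for child in cs.children:
            if child.children:
                problems.append(
                    f"{child.name}: {cs.name} is metered, so no grandchildren")
            if child.cpus != cs.cpus:
                problems.append(
                    f"{child.name}: must use the same CPUs as {cs.name}")
    for child in cs.children:
        problems.extend(metered_violations(child))
    return problems
```

A validator like this would run whenever a cpuset is created, moved, or has its CPUs changed under a metered parent.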

Is it fine if I introduce these semantic changes, only for 2.6.23 and
only when CONFIG_FAIR_GROUP_SCHED is enabled? This will let the group
CPU scheduler receive some testing.

The other alternative is to hook up group scheduler with user-id's
(again only for 2.6.23).

> I've a pretty
> good idea what a cpuset is, but don't know what kind of purpose
> you have in mind for such a hook.  Could you say a few words to
> that?  Thanks.

Reference:

1. http://marc.info/?l=linux-kernel&m=115946525811848&w=2


-- 
Regards,
vatsa

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 10:03               ` Srivatsa Vaddagiri
@ 2007-07-11 10:19                 ` Ingo Molnar
  2007-07-11 11:39                   ` Srivatsa Vaddagiri
  2007-07-11 11:10                 ` Paul Jackson
  1 sibling, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-11 10:19 UTC (permalink / raw)
  To: Srivatsa Vaddagiri; +Cc: Paul Jackson, akpm, menage, linux-kernel, containers


* Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> wrote:

> The other alternative is to hook up group scheduler with user-id's 
> (again only for 2.6.23).

could you just try this and send as simple a patch as possible? This is
actually something that non-container people would be interested in as 
well. (btw., if this goes into 2.6.23 then we cannot possibly turn it 
off in 2.6.24, so it must be sane - but per UID task groups are 
certainly sane, the only question is how to configure the per-UID weight 
after bootup. [the default after-bootup should be something along the 
lines of 'same weight for all users, a bit more for root'.]) This would 
make it possible for users to test that thing. (it would also help 
X-heavy workloads.)
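A per-UID grouping that gives the same weight to all users and a bit more to root might look like the sketch below. The weight values and the override mapping are hypothetical; the override mapping stands in for whatever post-boot configuration interface is chosen. Only the policy comes from the mail.

```python
DEFAULT_WEIGHT = 1024  # hypothetical weight for every ordinary user
ROOT_WEIGHT = 1536     # hypothetical "a bit more for root"


def uid_group_shares(task_uids, weight_overrides=None):
    """Group tasks by UID and return each UID's fraction of the CPU.

    task_uids maps task name -> uid; weight_overrides maps uid -> weight
    and models configuring the per-UID weight after bootup.
    """
    overrides = weight_overrides or {}
    uids = set(task_uids.values())

    def weight(uid):
        if uid in overrides:
            return overrides[uid]
        return ROOT_WEIGHT if uid == 0 else DEFAULT_WEIGHT

    total = sum(weight(u) for u in uids)
    return {u: weight(u) / total for u in uids}
```

Take three make jobs under one UID and an xterm under another. Each UID gets half the CPU, whereas per-task fairness would give the make jobs three quarters between them.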

	Ingo

* Re: testcases, was Re: -mm merge plans for 2.6.23 -- sys_fallocate
  2007-07-11  9:40             ` Nick Piggin
@ 2007-07-11 10:36               ` Michael Kerrisk
  0 siblings, 0 replies; 484+ messages in thread
From: Michael Kerrisk @ 2007-07-11 10:36 UTC (permalink / raw)
  To: Nick Piggin, hch
  Cc: schwidefsky, aarora, ak, linux-kernel, tytso, heiko.carstens, akpm


> Christoph Hellwig wrote:
> > On Tue, Jul 10, 2007 at 11:41:19AM -0700, Andrew Morton wrote:
> > 
> >>I'd support an ununofficial rule that submitters of new syscalls also
> >> raise a patch against LTP, come to that...
> > 
> > 
> > s/ununofficial//, please.  And extend this to every new kernel interface
> > that's not bound to a specific piece of hardware.
> 
> Agree, and cc manpages maintainer too? (preferably write most
> of the manpage body as well, IMO).

Yes, please.  Docs written by, or with input from, the implementer
provide one of the few ways for everyone else to spot differences
between implementation and intention.

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 10:03               ` Srivatsa Vaddagiri
  2007-07-11 10:19                 ` Ingo Molnar
@ 2007-07-11 11:10                 ` Paul Jackson
  2007-07-11 11:24                   ` Peter Zijlstra
  1 sibling, 1 reply; 484+ messages in thread
From: Paul Jackson @ 2007-07-11 11:10 UTC (permalink / raw)
  To: vatsa; +Cc: mingo, akpm, menage, linux-kernel, containers

Srivatsa wrote:
> So Ingo was proposing we use cpuset as that user interface to manage
> task-groups. This will be only for 2.6.23.

Good explanation - thanks.

In short, the proposal was to use the task partition defined by cpusets
to define CFS task-groups, until the real process containers are
available.

Or, I see in the next message, Ingo responding favorably to your
alternative, using task uid's to partition the tasks into CFS
task-groups.

Yeah, Ingo's preference for using uid's (or gid's ??) sounds right to
me - a sustainable API.

Wouldn't want to be adding a cpuset API for a single 2.6.N release.

... gid's -- why not?

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 11:10                 ` Paul Jackson
@ 2007-07-11 11:24                   ` Peter Zijlstra
  2007-07-11 11:30                     ` Peter Zijlstra
  0 siblings, 1 reply; 484+ messages in thread
From: Peter Zijlstra @ 2007-07-11 11:24 UTC (permalink / raw)
  To: Paul Jackson; +Cc: vatsa, mingo, akpm, menage, linux-kernel, containers

On Wed, 2007-07-11 at 04:10 -0700, Paul Jackson wrote:
> Srivatsa wrote:
> > So Ingo was proposing we use cpuset as that user interface to manage
> > task-groups. This will be only for 2.6.23.
> 
> Good explanation - thanks.
> 
> In short, the proposal was to use the task partition defined by cpusets
> to define CFS task-groups, until the real process containers are
> available.
> 
> Or, I see in the next message, Ingo responding favorably to your
> alternative, using task uid's to partition the tasks into CFS
> task-groups.
> 
> Yeah, Ingo's preference for using uid's (or gid's ??) sounds right to
> me - a sustainable API.
> 
> Wouldn't want to be adding a cpuset API for a single 2.6.N release.
> 
> .... gid's -- why not?


Or process or process groups, or all of the above :-)

One thing to think about, though: we cannot have per-process, per-uid,
per-gid or per-pgrp scheduling for one release only, so we'd have to
manage its interaction with process containers. It might be that a simple
weight multiplication scheme is good enough:

  weight = uid_weight * pgrp_weight * container_weight

Of course, if we only had a single-level group scheduler (as was
proposed IIRC), it would have to create intersection sets (as there might
be non-trivial overlaps) based on these various weights, and schedule
those resulting sets instead of the initial groupings.



* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 11:24                   ` Peter Zijlstra
@ 2007-07-11 11:30                     ` Peter Zijlstra
  2007-07-11 13:14                       ` Srivatsa Vaddagiri
  0 siblings, 1 reply; 484+ messages in thread
From: Peter Zijlstra @ 2007-07-11 11:30 UTC (permalink / raw)
  To: Paul Jackson; +Cc: vatsa, mingo, akpm, menage, linux-kernel, containers

On Wed, 2007-07-11 at 13:24 +0200, Peter Zijlstra wrote:
> On Wed, 2007-07-11 at 04:10 -0700, Paul Jackson wrote:
> > Srivatsa wrote:
> > > So Ingo was proposing we use cpuset as that user interface to manage
> > > task-groups. This will be only for 2.6.23.
> > 
> > Good explanation - thanks.
> > 
> > In short, the proposal was to use the task partition defined by cpusets
> > to define CFS task-groups, until the real process containers are
> > available.
> > 
> > Or, I see in the next message, Ingo responding favorably to your
> > alternative, using task uid's to partition the tasks into CFS
> > task-groups.
> > 
> > Yeah, Ingo's preference for using uid's (or gid's ??) sounds right to
> > me - a sustainable API.
> > 
> > Wouldn't want to be adding a cpuset API for a single 2.6.N release.
> > 
> > .... gid's -- why not?
> 
> 
> Or process or process groups, or all of the above :-)
> 
> One thing to think on though, we cannot have per process,uid,gid,pgrp
> scheduling for one release only. So we'd have to manage interaction with
> process containers. It might be that a simple weight multiplication
> scheme is good enough:
> 
>   weight = uid_weight * pgrp_weight * container_weight
> 
> Of course, if we'd only have a single level group scheduler (as was
> proposed IIRC) it'd have to create intersection sets (as there might be
> non trivial overlaps) based on these various weights and schedule these
> resulting sets instead of the initial groupings.

Let's illustrate with some ASCII art.

Say we have this dual-level weight grouping (uid, container):

uid:          a a a a a b b b b b c c c c c
container:    A A A A A A A B B B B B B B B

set:          1 1 1 1 1 2 2 3 3 3 4 4 4 4 4 

resulting in schedule sets 1,2,3,4

so that (for instance) weight_2 = weight_b * weight_A
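A minimal C sketch of the intersection-set idea, assuming each task is labelled by one uid and one container (single characters here, matching the diagram) and that group weights are 10-bit fixed point (1024 == 1.0). The weight values and the names build_sets(), uid_weight() and container_weight() are hypothetical and only serve the illustration; this is not scheduler code.

```c
#include <assert.h>

#define WEIGHT_ONE 1024	/* 10-bit fixed point: 1024 == 1.0 */
#define MAX_SETS 64

struct sched_set {
	char uid;
	char container;
	unsigned long weight;	/* uid weight * container weight, fixed point */
};

/* Hypothetical per-group weights, for illustration only */
static unsigned long uid_weight(char uid)
{
	return uid == 'b' ? 2 * WEIGHT_ONE : WEIGHT_ONE;
}

static unsigned long container_weight(char container)
{
	return container == 'A' ? WEIGHT_ONE / 2 : WEIGHT_ONE;
}

/*
 * Split n tasks into intersection sets: each distinct (uid, container)
 * pair becomes one set.  set_of[i] receives task i's 1-based set number;
 * the return value is the number of sets created.
 */
static int build_sets(const char *uid, const char *container, int n,
		      int *set_of, struct sched_set *sets)
{
	int nsets = 0;

	for (int i = 0; i < n; i++) {
		int s;

		for (s = 0; s < nsets; s++)
			if (sets[s].uid == uid[i] &&
			    sets[s].container == container[i])
				break;
		if (s == nsets) {
			assert(nsets < MAX_SETS);
			sets[s].uid = uid[i];
			sets[s].container = container[i];
			sets[s].weight = uid_weight(uid[i]) *
					 container_weight(container[i]) / WEIGHT_ONE;
			nsets++;
		}
		set_of[i] = s + 1;
	}
	return nsets;
}
```

Feeding it the rows "aaaaabbbbbccccc" and "AAAAAAABBBBBBBB" gives the set numbers 1 1 1 1 1 2 2 3 3 3 4 4 4 4 4, and with the sample weights set 2 (uid b, container A) gets 2.0 * 0.5 = 1.0, i.e. 1024.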


* Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (15 preceding siblings ...)
  2007-07-10 20:09 ` -mm merge plans for 2.6.23 Christoph Lameter
@ 2007-07-11 11:35 ` Christoph Hellwig
  2007-07-11 11:39   ` David Woodhouse
  2007-07-11 11:37 ` scsi, was " Christoph Hellwig
                   ` (9 subsequent siblings)
  26 siblings, 1 reply; 484+ messages in thread
From: Christoph Hellwig @ 2007-07-11 11:35 UTC (permalink / raw)
  To: Andrew Morton; +Cc: dwmw2, linux-kernel, linux-fsdevel

On Tue, Jul 10, 2007 at 01:31:52AM -0700, Andrew Morton wrote:
> romfs-printk-format-warnings.patch

NACK on this one.  This bloats romfs by almost half of its previous
size to add mtd support to it.  Given that romfs is a completely
trivial filesystem it's much better to have a separate filesystem
driver handling the format on mtd instead of adding all these
indirections.  In addition to that argument, the switch on the
underlying subsystem is done horribly.  There are lots of ifdefs instead
of proper function pointers, there's one file containing both block
and mtd code instead of separate files, etc.

And the get_unmapped_area method in a bare filesystem needs a _lot_
of explanation.


* scsi, was Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (16 preceding siblings ...)
  2007-07-11 11:35 ` Christoph Hellwig
@ 2007-07-11 11:37 ` Christoph Hellwig
  2007-07-11 17:22   ` Andrew Morton
  2007-07-11 11:39 ` buffered write patches, " Christoph Hellwig
                   ` (8 subsequent siblings)
  26 siblings, 1 reply; 484+ messages in thread
From: Christoph Hellwig @ 2007-07-11 11:37 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-scsi

> restore-acpi-change-for-scsi.patch
> git-scsi-misc-vs-greg-sysfs-stuff.patch
> aacraid-rename-check_reset.patch
> scsi-dont-build-scsi_dma_mapunmap-for-has_dma.patch
> drivers-scsi-small-cleanups.patch
> sym53c8xx_2-claims-cpqarray-device.patch
> drivers-scsi-wd33c93c-cleanups.patch
> make-seagate_st0x_detect-static.patch
> pci-error-recovery-symbios-scsi-base-support.patch
> pci-error-recovery-symbios-scsi-first-failure.patch
> drivers-scsi-pcmcia-nsp_csc-remove-kernel-24-code.patch
> drivers-message-i2o-devicec-remove-redundant-gfp_atomic-from-kmalloc.patch
> drivers-scsi-aic7xxx_oldc-remove-redundant-gfp_atomic-from-kmalloc.patch
> use-menuconfig-objects-ii-scsi.patch
> remove-dead-references-to-module_parm-macro.patch
> ppa-coding-police-and-printk-levels.patch
> remove-the-dead-cyberstormiii_scsi-option.patch
> config_scsi_fd_8xx-no-longer-exists.patch
> use-mutex-instead-of-semaphore-in-megaraid-mailbox-driver.patch
> 
>  Sent to James.

Care to drop the patches James NACKed every single time?


* Re: -mm merge plans for 2.6.23
  2007-07-11 11:35 ` Christoph Hellwig
@ 2007-07-11 11:39   ` David Woodhouse
  2007-07-11 17:21     ` Andrew Morton
  0 siblings, 1 reply; 484+ messages in thread
From: David Woodhouse @ 2007-07-11 11:39 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrew Morton, linux-kernel, linux-fsdevel

On Wed, 2007-07-11 at 13:35 +0200, Christoph Hellwig wrote:
> On Tue, Jul 10, 2007 at 01:31:52AM -0700, Andrew Morton wrote:
> > romfs-printk-format-warnings.patch
> 
> NACK on this one. 

The rest of it is nacked anyway, until we unify the point and
get_unmapped_area methods of the MTD API.

-- 
dwmw2


* Re: buffered write patches, -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (17 preceding siblings ...)
  2007-07-11 11:37 ` scsi, was " Christoph Hellwig
@ 2007-07-11 11:39 ` Christoph Hellwig
  2007-07-11 17:23   ` Andrew Morton
  2007-07-11 11:55 ` Christoph Hellwig
                   ` (7 subsequent siblings)
  26 siblings, 1 reply; 484+ messages in thread
From: Christoph Hellwig @ 2007-07-11 11:39 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-mm

>  pagefault-in-write deadlock fixes.  Will hold for 2.6.24.

Why?  This stuff has been in forever and is needed at various
levels.  We need this in for anything to move forward on the buffered
write front.


* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 10:19                 ` Ingo Molnar
@ 2007-07-11 11:39                   ` Srivatsa Vaddagiri
  2007-07-11 11:42                     ` Paul Jackson
  2007-07-11 12:30                     ` Srivatsa Vaddagiri
  0 siblings, 2 replies; 484+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-11 11:39 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: containers, menage, Paul Jackson, akpm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 2593 bytes --]

On Wed, Jul 11, 2007 at 12:19:58PM +0200, Ingo Molnar wrote:
> * Srivatsa Vaddagiri <vatsa@linux.vnet.ibm.com> wrote:
> 
> > The other alternative is to hook up group scheduler with user-id's 
> > (again only for 2.6.23).
> 
> could you just try this and send an as simple patch as possible? This is 
> actually something that non-container people would be interested in as 
> well. 

Note that interfacing with the container infrastructure doesn't preclude
fair-user scheduling (which a normal university server or desktop user
would want). All that is needed is a daemon which listens for uid-change
events from the kernel (using the process-event connector) and moves the
task whose uid is changing to an appropriate container for that user.
Primitive source for such a daemon is attached.

> (btw., if this goes into 2.6.23 then we cannot possibly turn it off in 2.6.24,

The fact that we will have two interfaces for the group scheduler in 2.6.24
is what worries me a bit (one user-id based and the other container based).
We would need some mechanism for the admin to choose only one interface
(not both together, otherwise the group definitions may conflict), which
doesn't sound very clean to me.

Ideally I would have liked to hook onto only container infrastructure
and let user-space decide grouping policy (whether user-id based or
something else).

Hmm... would it help if I maintain a patch outside mainline which turns on
fair-user scheduling in 2.6.23-rcX? Folks would have to apply that patch on
top of 2.6.23-rcX to use and test fair-user scheduling.

In 2.6.24, when container infrastructure goes in, people can get
fair-user scheduling off-the-shelf by simply starting the daemon attached
at bootup/initrd time.

Or would you prefer that I add a user-id based interface permanently, and
in 2.6.24 introduce a compile/run-time switch for the admin to select one of
the two interfaces (user-id based or container based)?

> so it must be sane - but per UID task groups are 
> certainly sane, the only question is how to configure the per-UID weight 
> after bootup.

Yeah... the container-based infrastructure makes such things very easy to
configure through a filesystem-based interface. In the absence of that,
we either provide some /proc interface or settle for the non-configurable
default that you mention below.

>  [the default after-bootup should be something along the 
> lines of 'same weight for all users, a bit more for root'.]) This would 
> make it possible for users to test that thing. (it would also help 
> X-heavy workloads.)


-- 
Regards,
vatsa

[-- Attachment #2: cpuctld.c --]
[-- Type: text/plain, Size: 7072 bytes --]

/*
 * cpuctl_group_changer.c
 *
 * Used to change the group of running tasks to the correct
 * uid container.
 *
 * Copyright IBM Corporation, 2007
 * Author: Dhaval Giani <dhaval@linux.vnet.ibm.com>
 * Derived from test_cn_proc.c by Matt Helsley
 * Original copyright notice follows
 *
 * Copyright (C) Matt Helsley, IBM Corp. 2005
 * Derived from fcctl.c by Guillaume Thouvenin
 * Original copyright notice follows:
 *
 * Copyright (C) 2005 BULL SA.
 * Written by Guillaume Thouvenin <guillaume.thouvenin@bull.net>
 *
 * This program is free software; you can redistribute it and/or modify
 * it under the terms of the GNU General Public License as published by
 * the Free Software Foundation; either version 2 of the License, or
 * (at your option) any later version.
 *
 * This program is distributed in the hope that it will be useful,
 * but WITHOUT ANY WARRANTY; without even the implied warranty of
 * MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
 * GNU General Public License for more details.
 *
 * You should have received a copy of the GNU General Public License
 * along with this program; if not, write to the Free Software
 * Foundation, Inc., 59 Temple Place, Suite 330, Boston, MA 02111-1307 USA
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

#include <sys/socket.h>
#include <sys/types.h>

#include <sys/stat.h>
#include <sys/param.h>

#include <linux/connector.h>
#include <linux/netlink.h>
#include "linux/cn_proc.h"

#include <errno.h>

#include <signal.h>
#include <setjmp.h>

#define SEND_MESSAGE_LEN (NLMSG_LENGTH(sizeof(struct cn_msg) + \
				       sizeof(enum proc_cn_mcast_op)))
#define RECV_MESSAGE_LEN (NLMSG_LENGTH(sizeof(struct cn_msg) + \
				       sizeof(struct proc_event)))

#define SEND_MESSAGE_SIZE    (NLMSG_SPACE(SEND_MESSAGE_LEN))
#define RECV_MESSAGE_SIZE    (NLMSG_SPACE(RECV_MESSAGE_LEN))

#define max(x,y) ((y)<(x)?(x):(y))
#define min(x,y) ((y)>(x)?(x):(y))

#define BUFF_SIZE (max(max(SEND_MESSAGE_SIZE, RECV_MESSAGE_SIZE), 1024))
#define MIN_RECV_SIZE (min(SEND_MESSAGE_SIZE, RECV_MESSAGE_SIZE))

#define PROC_CN_MCAST_LISTEN (1)
#define PROC_CN_MCAST_IGNORE (2)

/*
 * SIGINT causes the program to exit gracefully
 * this could happen any time after the LISTEN message has
 * been sent
 */
#define INTR_SIG SIGINT


sigjmp_buf g_jmp;
char cpuctl_fs_path[MAXPATHLEN];

void handle_intr (int signum)
{
	siglongjmp(g_jmp, signum);
}

static inline void itos(int i, char* str)
{
	sprintf(str, "%d", i);
}

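int set_notify_release(int val)
{
	FILE *f;

	f = fopen("notify_on_release", "r+");
	if (f == NULL) {
		perror("notify_on_release");
		return -1;
	}
	fprintf(f, "%d\n", val);
	fclose(f);
	return 0;
}

int add_task_pid(int pid)
{
	FILE *f;

	f = fopen("tasks", "a");
	if (f == NULL) {
		perror("tasks");
		return -1;
	}
	fprintf(f, "%d\n", pid);
	/* the buffered write reaches the kernel here; it can fail, e.g. if the task exited */
	if (fclose(f) == EOF) {
		perror("tasks");
		return -1;
	}
	return 0;
}

int set_value(char* file, char *str)
{
	FILE *f;

	f = fopen(file, "w");
	if (f == NULL) {
		perror(file);
		return -1;
	}
	fprintf(f, "%s", str);
	fclose(f);
	return 0;
}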

int change_group(int pid, int uid)
{
	char str[100];
	int ret;

	ret = chdir(cpuctl_fs_path);
	if (ret == -1) {
		perror("chdir");
		return -1;
	}
	itos(uid, str);
	ret = mkdir(str, 0777);
	if (ret == -1) {
		/*
		 * If the directory already exists, that is fine; any other
		 * error is fatal.
		 */
		if (errno != EEXIST) {
			perror("mkdir");
			return -1;
		}
	}
	ret = chdir(str);
	if (ret == -1) {
		/* Again, give up; the caller then exits the daemon. */
		perror("chdir");
		return -1;
	}
	/* If using cpusets, set cpus and mems:
	 *
	 * set_value("cpus","0");
	 * set_value("mems","0");
	 */
	set_notify_release(1);
	add_task_pid(pid);
	return 0;
}

int handle_msg (struct cn_msg *cn_hdr)
{
	struct proc_event *ev;
	int ret = 0;	/* events we do not act on are not errors */

	ev = (struct proc_event*)cn_hdr->data;

	switch(ev->what){
	case PROC_EVENT_UID:
		printf("UID Change happening\n");
		/* report the real uid, which is what change_group() files the task under */
		printf("UID = %u\tPID=%d\n", ev->event_data.id.r.ruid,
						ev->event_data.id.process_pid);
		ret = change_group(ev->event_data.id.process_pid,
					ev->event_data.id.r.ruid);
		break;
	case PROC_EVENT_FORK:
	case PROC_EVENT_EXEC:
	case PROC_EVENT_EXIT:
	default:
		break;
	}
	return ret;
}
int main(int argc, char **argv)
{
	int sk_nl;
	int err;
	struct sockaddr_nl my_nla, kern_nla, from_nla;
	socklen_t from_nla_len;
	char buff[BUFF_SIZE];
	int rc = -1;
	struct nlmsghdr *nl_hdr;
	struct cn_msg *cn_hdr;
	enum proc_cn_mcast_op *mcop_msg;
	ssize_t recv_len = 0;	/* signed: recvfrom() returns -1 on error */
	FILE *f;

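	/* bounded copy: argv[1] may be longer than MAXPATHLEN */
	if (argc == 1)
		snprintf(cpuctl_fs_path, sizeof(cpuctl_fs_path), "%s", "/dev/cpuctl");
	else
		snprintf(cpuctl_fs_path, sizeof(cpuctl_fs_path), "%s", argv[1]);
	if (chdir(cpuctl_fs_path) == -1) {
		perror(cpuctl_fs_path);
		return -1;
	}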
	f = fopen("tasks", "r");
	if (f == NULL) {
		printf("Container not mounted at %s\n", cpuctl_fs_path);
		return -1;
	}
	fclose(f);
	f = fopen("notify_on_release", "r");
	if (f == NULL) {
		printf("Container not mounted at %s\n", cpuctl_fs_path);
		return -1;
	}
	fclose(f);
	if (getuid() != 0) {
		printf("Only root can run this daemon\n");
		return -1;
	}
	/*
	 * Create an endpoint for communication. Use the kernel user
	 * interface device (PF_NETLINK) which is a datagram oriented
	 * service (SOCK_DGRAM). The protocol used is the connector
	 * protocol (NETLINK_CONNECTOR)
	 */
	sk_nl = socket(PF_NETLINK, SOCK_DGRAM, NETLINK_CONNECTOR);
	if (sk_nl == -1) {
		perror("socket");
		return rc;
	}
	/* zero the addresses so nl_pad and any unset fields are not garbage */
	memset(&my_nla, 0, sizeof(my_nla));
	my_nla.nl_family = AF_NETLINK;
	my_nla.nl_groups = CN_IDX_PROC;
	my_nla.nl_pid = getpid();

	memset(&kern_nla, 0, sizeof(kern_nla));
	kern_nla.nl_family = AF_NETLINK;
	kern_nla.nl_groups = CN_IDX_PROC;
	kern_nla.nl_pid = 1;

	err = bind(sk_nl, (struct sockaddr *)&my_nla, sizeof(my_nla));
	if (err == -1) {
		perror("bind");
		goto close_and_exit;
	}
	nl_hdr = (struct nlmsghdr *)buff;
	cn_hdr = (struct cn_msg *)NLMSG_DATA(nl_hdr);
	mcop_msg = (enum proc_cn_mcast_op*)&cn_hdr->data[0];
	printf("sending proc connector: PROC_CN_MCAST_LISTEN... ");
	memset(buff, 0, sizeof(buff));
	*mcop_msg = PROC_CN_MCAST_LISTEN;
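	/*
	 * handle_intr() siglongjmp()s back here on SIGINT, so g_jmp must
	 * be initialised before the handler is installed.
	 */
	if (sigsetjmp(g_jmp, 1) != 0) {
		rc = 0;
		goto close_and_exit;
	}
	signal(INTR_SIG, handle_intr);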
	/* fill the netlink header */
	nl_hdr->nlmsg_len = SEND_MESSAGE_LEN;
	nl_hdr->nlmsg_type = NLMSG_DONE;
	nl_hdr->nlmsg_flags = 0;
	nl_hdr->nlmsg_seq = 0;
	nl_hdr->nlmsg_pid = getpid();
	/* fill the connector header */
	cn_hdr->id.idx = CN_IDX_PROC;
	cn_hdr->id.val = CN_VAL_PROC;
	cn_hdr->seq = 0;
	cn_hdr->ack = 0;
	cn_hdr->len = sizeof(enum proc_cn_mcast_op);
	if (send(sk_nl, nl_hdr, nl_hdr->nlmsg_len, 0) != nl_hdr->nlmsg_len) {
		printf("failed to send proc connector mcast ctl op!\n");
		goto close_and_exit;
	}
	printf("sent\n");
	for(memset(buff, 0, sizeof(buff)), from_nla_len = sizeof(from_nla);
	  ; memset(buff, 0, sizeof(buff)), from_nla_len = sizeof(from_nla)) {
		struct nlmsghdr *nlh = (struct nlmsghdr*)buff;
		memcpy(&from_nla, &kern_nla, sizeof(from_nla));
		recv_len = recvfrom(sk_nl, buff, BUFF_SIZE, 0,
				(struct sockaddr*)&from_nla, &from_nla_len);
		if (recv_len < 1)
			continue;
		while (NLMSG_OK(nlh, recv_len)) {
			if (nlh->nlmsg_type == NLMSG_NOOP) {
				/* advance before skipping, or this loops forever */
				nlh = NLMSG_NEXT(nlh, recv_len);
				continue;
			}
			if ((nlh->nlmsg_type == NLMSG_ERROR) ||
			    (nlh->nlmsg_type == NLMSG_OVERRUN))
				break;
			cn_hdr = NLMSG_DATA(nlh);
			if (handle_msg(cn_hdr) < 0)
				goto close_and_exit;
			if (nlh->nlmsg_type == NLMSG_DONE)
				break;
			nlh = NLMSG_NEXT(nlh, recv_len);
		}
	}
close_and_exit:
	close(sk_nl);
	exit(rc);

	return 0;
}


* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 11:39                   ` Srivatsa Vaddagiri
@ 2007-07-11 11:42                     ` Paul Jackson
  2007-07-11 12:06                       ` Peter Zijlstra
  2007-07-11 12:30                     ` Srivatsa Vaddagiri
  1 sibling, 1 reply; 484+ messages in thread
From: Paul Jackson @ 2007-07-11 11:42 UTC (permalink / raw)
  To: vatsa; +Cc: mingo, containers, menage, akpm, linux-kernel

Srivatsa wrote:
> The fact that we will have two interface for group scheduler in 2.6.24
> is what worries me a bit (one user-id based and other container based).

Yeah.

One -could- take linear combinations, as Peter drew in his ascii art,
but would one -want- to do that?

I imagine some future time, when users of this wonder why the API is
more complicated than seems necessary, with two factors determining
task-groups where one seems sufficient, and the answer is "the other
factor, user-id's, is just there because we needed it as an interim
mechanism, and then had to keep it, to preserve ongoing compatibility."
That's not a very persuasive justification.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

* Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (18 preceding siblings ...)
  2007-07-11 11:39 ` buffered write patches, " Christoph Hellwig
@ 2007-07-11 11:55 ` Christoph Hellwig
  2007-07-11 12:00 ` fallocate, " Christoph Hellwig
                   ` (6 subsequent siblings)
  26 siblings, 0 replies; 484+ messages in thread
From: Christoph Hellwig @ 2007-07-11 11:55 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

> mutex_unlock-later-in-seq_lseek.patch
> zs-move-to-the-serial-subsystem.patch
> fs-block_devc-use-list_for_each_entry.patch
>
> introduce-o_cloexec-take-2.patch
> o_cloexec-for-scm_rights.patch
>

Umm, Andrew - mixing new userspace interfaces, completely rewritten
drivers and simple fixes in a single misc category doesn't exactly
make this list easier to read :)

* fallocate, Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (19 preceding siblings ...)
  2007-07-11 11:55 ` Christoph Hellwig
@ 2007-07-11 12:00 ` Christoph Hellwig
  2007-07-11 12:23 ` lguest, " Christoph Hellwig
                   ` (5 subsequent siblings)
  26 siblings, 0 replies; 484+ messages in thread
From: Christoph Hellwig @ 2007-07-11 12:00 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, linux-fsdevel

> fallocate-implementation-on-i86-x86_64-and-powerpc.patch
> fallocate-on-s390.patch
> fallocate-on-ia64.patch
> fallocate-on-ia64-fix.patch
> 
>  Merge.

Hopefully this will be done during the 2.6.23 merge window, but right now
it's not done yet.

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 11:42                     ` Paul Jackson
@ 2007-07-11 12:06                       ` Peter Zijlstra
  2007-07-11 17:03                         ` Paul Jackson
  0 siblings, 1 reply; 484+ messages in thread
From: Peter Zijlstra @ 2007-07-11 12:06 UTC (permalink / raw)
  To: Paul Jackson; +Cc: vatsa, mingo, containers, menage, akpm, linux-kernel

On Wed, 2007-07-11 at 04:42 -0700, Paul Jackson wrote:
> Srivatsa wrote:
> > The fact that we will have two interface for group scheduler in 2.6.24
> > is what worries me a bit (one user-id based and other container based).
> 
> Yeah.
> 
> One -could- take linear combinations, as Peter drew in his ascii art,
> but would one -want- to do that?

I'd very much like to have it, but that is just me. We could take a
weight of 0 to mean that the grouping is disabled, and make that the
default. That way it would not complicate regular behaviour.

It could be implemented with a simple hashing scheme where
sched_group_hash(tsk) and sched_group_cmp(tsk, group->some_task) could
be used to identify a schedule group.

pseudo code:

u64 sched_group_hash(struct task_struct *tsk)
{
	u64 hash = 0;

	if (tsk->pid->weight)
		hash_add(&hash, tsk->pid);

	if (tsk->pgrp->weight)
		hash_add(&hash, tsk->pgrp);

	if (tsk->uid->weight)
		hash_add(&hash, tsk->uid);

	if (tsk->container->weight)
		hash_add(&hash, tsk->container);

	...

	return hash;
}

s64 sched_group_cmp(struct task_struct *t1, struct task_struct *t2)
{
	s64 cmp;

	if (t1->pid->weight || t2->pid->weight) {
		cmp = t1->pid->weight - t2->pid->weight;
		if (cmp)
			return cmp;
	}

	...

	return 0;
}

u64 sched_group_weight(struct task_struct *tsk)
{
	u64 weight = 1024; /* 1 fixed point 10 bits */

	if (tsk->pid->weight) {
		weight *= tsk->pid->weight;
		weight /= 1024;
	}

	....

	return weight;
}
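The pseudo code above can be turned into a small userspace model to check the idea. This is a sketch under stated assumptions: struct task, struct grouping and the G_* indices are hypothetical stand-ins for the per-task pid, pgrp, uid and container objects, hash_add() is implemented here as FNV-1a, weights are 10-bit fixed point with 0 meaning the grouping is disabled, and WEIGHT_MAX is an invented cap. The pseudo code's cmp compares weights; this version compares the grouping ids themselves, so two different uids that happen to share a weight still land in different schedule sets.

```c
#include <assert.h>
#include <stdint.h>

#define WEIGHT_ONE 1024			/* 10-bit fixed point: 1024 == 1.0 */
#define WEIGHT_MAX (16 * WEIGHT_ONE)	/* hypothetical cap, keeps products small */

/* Hypothetical grouping object: an identity plus a weight, 0 == disabled */
struct grouping {
	uint32_t id;
	uint32_t weight;
};

enum { G_PID, G_PGRP, G_UID, G_CONTAINER, G_NR };

struct task {
	struct grouping g[G_NR];
};

/* Mix a 32-bit value into the hash, one byte at a time (FNV-1a). */
static void hash_add(uint64_t *hash, uint32_t val)
{
	for (int i = 0; i < 4; i++) {
		*hash ^= (val >> (8 * i)) & 0xff;
		*hash *= 1099511628211ULL;		/* 64-bit FNV prime */
	}
}

/* Tasks in the same schedule set hash alike; sched_group_cmp() resolves collisions. */
static uint64_t sched_group_hash(const struct task *t)
{
	uint64_t hash = 14695981039346656037ULL;	/* FNV offset basis */

	for (int k = 0; k < G_NR; k++) {
		if (!t->g[k].weight)
			continue;			/* disabled grouping */
		hash_add(&hash, (uint32_t)k);		/* which grouping */
		hash_add(&hash, t->g[k].id);		/* which instance of it */
	}
	return hash;
}

/* Returns 0 iff both tasks belong to the same schedule set. */
static int sched_group_cmp(const struct task *t1, const struct task *t2)
{
	for (int k = 0; k < G_NR; k++) {
		int on1 = t1->g[k].weight != 0;
		int on2 = t2->g[k].weight != 0;

		if (on1 != on2)
			return on1 - on2;
		if (on1 && t1->g[k].id != t2->g[k].id)
			return t1->g[k].id < t2->g[k].id ? -1 : 1;
	}
	return 0;
}

/* Product of all enabled grouping weights, in fixed point. */
static uint64_t sched_group_weight(const struct task *t)
{
	uint64_t weight = WEIGHT_ONE;

	for (int k = 0; k < G_NR; k++) {
		if (!t->g[k].weight)
			continue;
		assert(t->g[k].weight <= WEIGHT_MAX);
		weight = weight * t->g[k].weight / WEIGHT_ONE;
	}
	return weight;
}
```

For example, a task whose uid has weight 2048 (2.0) and whose container has weight 512 (0.5) ends up with 1.0 * 2.0 * 0.5 = 1.0, i.e. 1024; a grouping whose weight is 0 is skipped, so it neither splits sets nor changes the weight.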





* lguest, Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (20 preceding siblings ...)
  2007-07-11 12:00 ` fallocate, " Christoph Hellwig
@ 2007-07-11 12:23 ` Christoph Hellwig
  2007-07-11 15:45   ` Randy Dunlap
                     ` (2 more replies)
  2007-07-11 12:43 ` x86 status was " Andi Kleen
                   ` (4 subsequent siblings)
  26 siblings, 3 replies; 484+ messages in thread
From: Christoph Hellwig @ 2007-07-11 12:23 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, rusty, linux-mm

> lguest-export-symbols-for-lguest-as-a-module.patch

__put_task_struct is one of those no way in hell should this be exported
things because we don't want modules messing with task lifetimes.

Fortunately I can't find anything actually using this in lguest, so
it looks like the issue has been solved in the meantime.


I also have a rather bad feeling about exporting access_process_vm.
This is the proverbial sledge hammer for access to user vm addresses
and I'd rather keep it away from module programmers with "if all
you have is a hammer ..." in mind.

In lguest this is used by send_dma which from my short reading of the
code seems to be the central IPC mechanism.  The double copy here
doesn't look very efficient to me either.  Maybe some VM folks could
look into a better way to achieve this that might be both more
efficient and not require the export.


> lguest-the-guest-code.patch
> lguest-the-host-code.patch
> lguest-the-host-code-lguest-vs-clockevents-fix-resume-logic.patch
> lguest-the-asm-offsets.patch
> lguest-the-makefile-and-kconfig.patch
> lguest-the-console-driver.patch
> lguest-the-net-driver.patch
> lguest-the-block-driver.patch
> lguest-the-documentation-example-launcher.patch

Just started reading this (again), so no useful comment here, but it
would be nice if the code could follow CodingStyle and place the || and
&& at the end of the line in multiline conditionals instead of at the
beginning of the new one.
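For illustration, the layout being asked for looks like this in a multi-line condition; pid_in_range() is a made-up helper, not code from the lguest patches.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper, only here to show the layout: when a condition
 * spans several lines, && and || end a line instead of starting the next.
 */
static bool pid_in_range(int pid, int lo, int hi, bool allow_idle)
{
	assert(lo <= hi);

	if ((pid >= lo && pid <= hi) ||
	    (allow_idle && pid == 0))
		return true;
	return false;
}
```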

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  1:14     ` [ck] " Andrew Morton
                         ` (3 preceding siblings ...)
  2007-07-11  3:59       ` Grzegorz Kulewski
@ 2007-07-11 12:26       ` Kevin Winchester
  2007-07-11 12:36         ` Jesper Juhl
  2007-07-12 12:06       ` Kacper Wysocki
  5 siblings, 1 reply; 484+ messages in thread
From: Kevin Winchester @ 2007-07-11 12:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Hawkins, Con Kolivas, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1411 bytes --]

On Tue, 10 Jul 2007 18:14:19 -0700
Andrew Morton <akpm@linux-foundation.org> wrote:

> On Wed, 11 Jul 2007 11:02:56 +1000 "Matthew Hawkins" <darthmdh@gmail.com> wrote:
> 
> > We all know swap prefetch has been tested out the wazoo since Moses was a
> > little boy, is compile-time and runtime selectable, and gives an important
> > and quantifiable performance increase to desktop systems.
> 
> Always interested.  Please provide us more details on your usage and
> testing of that code.  Amount of memory, workload, observed results,
> etc?
> 

I only have 512 MB of memory on my Athlon64 desktop box, and I switch between -mm and mainline kernels regularly.  I have noticed that -mm is always much more responsive, especially first thing in the morning.  I believe this has been due to the new schedulers in -mm (because I notice an improvement in mainline now that CFS has been merged), as well as swap prefetch.  I haven't tested swap prefetch alone to know for sure, but it seems pretty likely.

My workload is compiling kernels, with sylpheed, pidgin and firefox[1] open, and sometimes MonoDevelop if I want to slow my system to a crawl.

I will be getting another 512 MB of RAM at Christmas time, but from the other reports, it seems that swap prefetch will still be useful.

[1] Is there a graphical browser for linux that doesn't suck huge amounts of RAM?

-- 
Kevin Winchester

[-- Attachment #2: Type: application/pgp-signature, Size: 189 bytes --]

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 11:39                   ` Srivatsa Vaddagiri
  2007-07-11 11:42                     ` Paul Jackson
@ 2007-07-11 12:30                     ` Srivatsa Vaddagiri
  1 sibling, 0 replies; 484+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-11 12:30 UTC (permalink / raw)
  To: Ingo Molnar; +Cc: containers, menage, Paul Jackson, akpm, linux-kernel

On Wed, Jul 11, 2007 at 05:09:53PM +0530, Srivatsa Vaddagiri wrote:
> > (btw., if this goes into 2.6.23 then we cannot possibly turn it off in 2.6.24,
> 
> The fact that we will have two interface for group scheduler in 2.6.24
> is what worries me a bit (one user-id based and other container based).

I know breaking a user interface across releases is a bad thing. But in
this particular case it's probably OK (since fair-group scheduling is a
brand new feature in Linux)?

If we have the option of breaking the API between 2.6.23 and 2.6.24
for the fair-group scheduler, then we are in a much more flexible position.

For 2.6.23, I can send a user-id based interface for the fair-group
scheduler (with some /proc interface to tune the group nice value).

For 2.6.24, this user-id interface will be removed and we will instead
switch to the container-based interface. Fair-user scheduling will continue
to work; it's just that users will have to run a daemon (source sent in the
previous mail) to enable it on top of the container-based interface.

Hmm..?

-- 
Regards,
vatsa

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11 12:26       ` Kevin Winchester
@ 2007-07-11 12:36         ` Jesper Juhl
  0 siblings, 0 replies; 484+ messages in thread
From: Jesper Juhl @ 2007-07-11 12:36 UTC (permalink / raw)
  To: Kevin Winchester
  Cc: Andrew Morton, Matthew Hawkins, Con Kolivas, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 11/07/07, Kevin Winchester <kjwinchester@gmail.com> wrote:
[snip]
>
> [1] Is there a graphical browser for linux that doesn't suck huge amounts of RAM?
>

Dillo (http://www.dillo.org/) is really, really tiny, with a memory
footprint somewhere in the hundreds of K area IIRC.

links 2 (http://links.twibright.com/) has a graphical mode in addition
to the traditional text only mode (links -g) and the memory footprint
is really tiny.

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

* x86 status was Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (21 preceding siblings ...)
  2007-07-11 12:23 ` lguest, " Christoph Hellwig
@ 2007-07-11 12:43 ` Andi Kleen
  2007-07-11 17:33   ` Jesse Barnes
                     ` (3 more replies)
  2007-07-11 23:03 ` generic clockevents/ (hr)time(r) patches " Thomas Gleixner
                   ` (3 subsequent siblings)
  26 siblings, 4 replies; 484+ messages in thread
From: Andi Kleen @ 2007-07-11 12:43 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, tglx, jeremy, Tim Hockin, jesse.barnes

Andrew Morton <akpm@linux-foundation.org> writes:

> revert-x86_64-mm-verify-cpu-rename.patch
> add-kstrndup-fix.patch
> xen-build-fix.patch
> fix-x86_64-numa-fake-apicid_to_node-mapping-for-fake-numa-2.patch
> fix-x86_64-mm-xen-xen-smp-guest-support.patch
> more-fix-x86_64-mm-xen-xen-smp-guest-support.patch

> fix-x86_64-mm-sched-clock-share.patch
> fix-x86_64-mm-xen-add-xen-virtual-block-device-driver.patch
> fix-x86_64-mm-add-common-orderly_poweroff.patch
> fix-x86_64-mm-xen-xen-event-channels.patch
> arch-i386-xen-mmuc-must-include-linux-schedh.patch
> tidy-up-usermode-helper-waiting-a-bit-fix.patch
> update-x86_64-mm-xen-use-iret-directly-where-possible.patch


Xen is probably going to be merged. I'm still not fully happy
about the review status of the drivers and xenbus, but there doesn't seem
to be much value in delaying it further.

I'll consolidate the fixes and fixes-to-fixes.


These all need re-review:

> i386-add-support-for-picopower-irq-router.patch
> make-arch-i386-kernel-setupcremapped_pgdat_init-static.patch
> arch-i386-kernel-i8253c-should-include-asm-timerh.patch
> make-arch-i386-kernel-io_apicctimer_irq_works-static-again.patch
> quicklist-support-for-x86_64.patch
> x86_64-extract-helper-function-from-e820_register_active_regions.patch
> x86_64-fix-e820_hole_size-based-on-address-ranges.patch
> x86_64-acpi-disable-srat-when-numa-emulation-succeeds.patch
> x86_64-slit-fake-pxm-to-node-mapping-for-fake-numa-2.patch
> x86_64-numa-fake-apicid_to_node-mapping-for-fake-numa-2.patch
> x86-use-elfnoteh-to-generate-vsyscall-notes-fix.patch
> mmconfig-x86_64-i386-insert-unclaimed-mmconfig-resources.patch
> x86_64-fix-smp_call_function_single-return-value.patch
> x86_64-o_excl-on-dev-mcelog.patch
> x86_64-support-poll-on-dev-mcelog.patch

It's still not clear to me that this is at all useful. The current code
can already run a program on MCE, which should be fast enough
for machine check handling.

> i386-fix-machine-rebooting.patch
> x86-fix-section-mismatch-warnings-in-mtrr.patch
> x86_64-ratelimit-segfault-reporting-rate.patch

I think that one was bogus.

> x86_64-pm_trace-support.patch
> make-alt-sysrq-p-display-the-debug-register-contents.patch
> i386-flush_tlb_kernel_range-add-reference-to-the-arguments.patch
> round_jiffies-for-i386-and-x86-64-non-critical-corrected-mce-polling.patch
> pci-disable-decode-of-io-memory-during-bar-sizing.patch
> mmconfig-validate-against-acpi-motherboard-resources.patch
> x86_64-irq-check-remote-irr-bit-before-migrating-level-triggered-irq-v3.patch
> i386-remove-support-for-the-rise-cpu.patch

> i386-make-arch-i386-mm-pgtablecpgd_cdtor-static.patch
> i386-fix-section-mismatch-warning-in-intel_cacheinfo.patch
> i386-do-not-restore-reserved-memory-after-hibernation.patch
> paravirt-helper-to-disable-all-io-space-fix.patch
> dmi_match-patch-in-rebootc-for-sff-dell-optiplex-745-fixes-hang.patch
> i386-hpet-check-if-the-counter-works.patch
> i386-trim-memory-not-covered-by-wb-mtrrs.patch

Might need more testing? 

More review:

> kprobes-x86_64-fix-for-mark-ro-data.patch
> kprobes-i386-fix-for-mark-ro-data.patch
> divorce-config_x86_pae-from-config_highmem64g.patch
> remove-unneeded-test-of-task-in-dump_trace.patch
> i386-move-the-kernel-to-16mb-for-numa-q.patch
> i386-show-unhandled-signals.patch
> i386-minor-nx-handling-adjustment.patch
> x86-smp-alt-once-option-is-only-useful-with-hotplug_cpu.patch
> x86-64-remove-unused-variable-maxcpus.patch
> move-functions-declarations-to-header-file.patch
> x86_64-during-vm-oom-condition.patch
> i386-during-vm-oom-condition.patch
> x86-64-disable-the-gart-in-shutdown.patch
> x86_84-move-iommu-declaration-from-proto-to-iommuh.patch
> i386-uaccessh-replace-hard-coded-constant-with-appropriate-macro-from-kernelh.patch
> i386-add-cpu_relax-to-cmos_lock.patch
> x86_64-flush_tlb_kernel_range-warning-fix.patch
> x86_64-add-ioapic-nmi-support.patch
> x86_64-change-_map_single-to-static-in-pci_gartc-etc.patch
> x86_64-geode-hw-random-number-generator-depend-on-x86_3.patch
> x86_64-fix-wrong-comment-regarding-set_fixmap.patch
> arch-x86_64-kernel-processc-lower-printk-severity.patch
> nohz-fix-nohz-x86-dyntick-idle-handling.patch
> acpi-move-timer-broadcast-and-pmtimer-access-before-c3-arbiter-shutdown.patch
> clockevents-fix-typo-in-acpi_pmc.patch
> timekeeping-fixup-shadow-variable-argument.patch
> timerc-cleanup-recently-introduced-whitespace-damage.patch
> clockevents-remove-prototypes-of-removed-functions.patch
> clockevents-fix-resume-logic.patch
> clockevents-fix-device-replacement.patch
> tick-management-spread-timer-interrupt.patch
> highres-improve-debug-output.patch
> hrtimer-speedup-hrtimer_enqueue.patch
> pcspkr-use-the-global-pit-lock.patch
> ntp-move-the-cmos-update-code-into-ntpc.patch
> i386-pit-stop-only-when-in-periodic-or-oneshot-mode.patch
> i386-remove-volatile-in-apicc.patch
> i386-hpet-assumes-boot-cpu-is-0.patch
> i386-move-pit-function-declarations-and-constants-to-correct-header-file.patch
> x86_64-untangle-asm-hpeth-from-asm-timexh.patch
> x86_64-use-generic-cmos-update.patch
> x86_64-remove-dead-code-and-other-janitor-work-in-tscc.patch
> x86_64-fix-apic-typo.patch
> x86_64-convert-to-cleckevents.patch
> acpi-remove-the-useless-ifdef-code.patch
> x86_64-hpet-restore-vread.patch
> x86_64-restore-restore-nohpet-cmdline.patch
> x86_64-block-irq-balancing-for-timer.patch
> x86_64-prep-idle-loop-for-dynticks.patch
> x86_64-enable-high-resolution-timers-and-dynticks.patch
> x86_64-dynticks-disable-hpet_id_legsup-hpets.patch


I'm sceptical about the dynticks code. It just rips out the
x86-64 timing code completely, which needs a lot more review and testing.
Probably not .23

More review: 

> xen-fix-x86-config-dependencies.patch
> x86_64-get-mp_bus_to_node-as-early.patch
> xen-suppress-abs-symbol-warnings-for-unused-reloc-pointers.patch
> xen-cant-support-numa-yet.patch
> x86-fix-iounmaps-use-of-vm_structs-size-field.patch
> arch-x86_64-kernel-aperturec-lower-printk-severity.patch
> arch-x86_64-kernel-e820c-lower-printk-severity.patch
> ich-force-hpet-make-generic-time-capable-of-switching-broadcast-timer.patch
> ich-force-hpet-restructure-hpet-generic-clock-code.patch
> ich-force-hpet-ich7-or-later-quirk-to-force-detect-enable.patch
> ich-force-hpet-late-initialization-of-hpet-after-quirk.patch
> ich-force-hpet-ich5-quirk-to-force-detect-enable.patch
> ich-force-hpet-ich5-fix-a-bug-with-suspend-resume.patch
> ich-force-hpet-add-ich7_0-pciid-to-quirk-list.patch
> geode-basic-infrastructure-support-for-amd-geode-class.patch
> geode-mfgpt-support-for-geode-class-machines.patch
> geode-mfgpt-clock-event-device-support.patch
> i386-x86_64-insert-hpet-firmware-resource-after-pci-enumeration-has-completed.patch
> i386-ioapic-remove-old-irq-balancing-debug-cruft.patch
> i386-deactivate-the-test-for-the-dead-config_debug_page_type.patch

-Andi

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 11:30                     ` Peter Zijlstra
@ 2007-07-11 13:14                       ` Srivatsa Vaddagiri
  0 siblings, 0 replies; 484+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-11 13:14 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Paul Jackson, akpm, linux-kernel, containers, menage, mingo

On Wed, Jul 11, 2007 at 01:30:40PM +0200, Peter Zijlstra wrote:
> > One thing to think on though, we cannot have per process,uid,gid,pgrp
> > scheduling for one release only. So we'd have to manage interaction with
> > process containers. It might be that a simple weight multiplication
> > scheme is good enough:
> > 
> >   weight = uid_weight * pgrp_weight * container_weight

We would need something like this to flatten the hierarchy, so that, for
example, it is possible to do fair-container scheduling plus
fair-user/process scheduling inside a container using a hierarchy depth of
just 1 (containers) that the core scheduler understands. We discussed this
a bit at http://marc.info/?l=linux-kernel&m=118054481416140&w=2 and it is
very much on my todo list to experiment with.

> > Of course, if we'd only have a single level group scheduler (as was
> > proposed IIRC) it'd have to create intersection sets (as there might be
> > non trivial overlaps) based on these various weights and schedule these
> > resulting sets instead of the initial groupings.
> 
> Lets illustrate with some ASCII art:
> 
> so we have this dual level weight grouping (uid, container)
> 
> uid:          a a a a a b b b b b c c c c c
> container:    A A A A A A A B B B B B B B B
> 
> set:          1 1 1 1 1 2 2 3 3 3 4 4 4 4 4 
> 
> resulting in schedule sets 1,2,3,4

Wouldn't it be simpler if the admin created these sets as containers
directly? i.e.:


uid:          a a a a a b b b b b c c c c c
container:    1 1 1 1 1 2 2 3 3 3 4 4 4 4 4

That way the scheduler will not have to "guess" such intersecting
schedulable sets/groups. It seems much simpler to me this way.

Surely there is some policy which is driving some tasks of userid
'b' to be in container A and some to be in B. It should be trivial
enough to hook into that policy-making script and create separate
containers like above.

> so that (for instance) weight_2 = weight_b * weight_A



-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch
  2007-07-11  0:50           ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Paul Mackerras
@ 2007-07-11 15:39             ` Theodore Tso
  2007-07-11 18:47               ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Heiko Carstens
  0 siblings, 1 reply; 484+ messages in thread
From: Theodore Tso @ 2007-07-11 15:39 UTC (permalink / raw)
  To: Paul Mackerras
  Cc: Andrew Morton, Jeff Garzik, linux-kernel, Amit Arora, Andi Kleen,
	Benjamin Herrenschmidt, Arnd Bergmann, Luck, Tony,
	Heiko Carstens, Martin Schwidefsky, Mark Fasheh, linux-arch

On Wed, Jul 11, 2007 at 10:50:49AM +1000, Paul Mackerras wrote:
> > On Wed, 11 Jul 2007 09:27:40 +1000
> > Paul Mackerras <paulus@samba.org> wrote:
> > 
> > > We did come up with an order that worked for everybody, but that
> > > discussion seemed to get totally ignored by the ext4 developers.

Well, in the end it was a toss-up between

asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)

and 

asmlinkage long sys_fallocate(loff_t offset, loff_t len, int fd, int mode)

There were a number of folks who preferred having int fd first, and I
*thought* Amit had gotten agreement from either Martin or Heiko that
it was ok to do this as an exception, even though it was extra work
for that arch.  But if not, we can try going back to the second
alternative, or even the six 32-bit args (fd, mode, off_high, off_low,
len_high, len_low) approach, but I think that drew even more fire.

Basically, no one approach made everyone happy, and at the end of the
day sometimes you have to choose.  I thought we had settled this in
May with something that people could live with, but if we need to
reopen the discussion, better now than later.....

						- Ted

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-11 12:23 ` lguest, " Christoph Hellwig
@ 2007-07-11 15:45   ` Randy Dunlap
  2007-07-11 18:04   ` Andrew Morton
  2007-07-12  1:21   ` Rusty Russell
  2 siblings, 0 replies; 484+ messages in thread
From: Randy Dunlap @ 2007-07-11 15:45 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrew Morton, linux-kernel, rusty, linux-mm

On Wed, 11 Jul 2007 14:23:24 +0200 Christoph Hellwig wrote:

...

> > lguest-the-guest-code.patch
> > lguest-the-host-code.patch
> > lguest-the-host-code-lguest-vs-clockevents-fix-resume-logic.patch
> > lguest-the-asm-offsets.patch
> > lguest-the-makefile-and-kconfig.patch
> > lguest-the-console-driver.patch
> > lguest-the-net-driver.patch
> > lguest-the-block-driver.patch
> > lguest-the-documentation-example-launcher.patch
> 
> Just started to reading this (again) so no useful comment here, but it
> would be nice if the code could follow CodingStyle and place the || and
> && at the end of the line in multiline conditionals instead of at the
> beginning of the new one.

I prefer them at the ends of lines also, but that's not in CodingStyle;
it's just how we do it most of the time (so "coding style", without
caps).

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans -- lumpy reclaim
  2007-07-11  9:34   ` Re: -mm merge plans -- lumpy reclaim Andy Whitcroft
@ 2007-07-11 16:46     ` Andrew Morton
  2007-07-11 18:38       ` Andy Whitcroft
  2007-07-16 10:37       ` Mel Gorman
  0 siblings, 2 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-11 16:46 UTC (permalink / raw)
  To: Andy Whitcroft
  Cc: linux-kernel, Mel Gorman, Christoph Lameter, Peter Zijlstra

On Wed, 11 Jul 2007 10:34:31 +0100 Andy Whitcroft <apw@shadowen.org> wrote:

> [Seems a PEBKAC occured on the subject line, resending lest it become a
> victim of "oh thats spam".]
> 
> Andy Whitcroft wrote:
> > Andrew Morton wrote:
> > 
> > [...]
> >> lumpy-reclaim-v4.patch
> >> have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch
> >> only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
> >>
> >>  Lumpy reclaim.  In a similar situation to Mel's patches.  Stuck due to
> >>  general lack or interest and effort.
> > 
> > The lumpy reclaim patches originally came out of work to support Mel's
> > anti-fragmentation work.  As such I think they have become somewhat
> > attached to those patches.  Whilst lumpy is most effective where
> > placement controls are in place as offered by Mel's work, we see benefit
> > from reduction in the "blunderbuss" effect when we reclaim at higher
> > orders.  While placement control is pretty much required for the very
> > highest orders such as huge page size, lower order allocations are
> > benefited in terms of lower collateral damage.
> > 
> > There are now a few areas other than huge page allocations which can
> > benefit.  Stacks are still order 1.  Jumbo frames want higher order
> > contiguous pages for there incoming hardware buffers.  SLUB is showing
> > performance benefits from moving to a higher allocation order.  All of
> > these should benefit from more aggressive targeted reclaim, indeed I
> > have been surprised just how often my test workloads trigger lumpy at
> > order 1 to get new stacks.
> > 
> > Truly representative work loads are hard to generate for some of these.
> >  Though we have heard some encouraging noises from those who can
> > reproduce these problems.

I'd expect that the main application for lumpy-reclaim is in keeping a pool
of order-2 (say) pages in reserve for GFP_ATOMIC allocators, i.e. jumbo
frames.

At present this relies upon the wakeup_kswapd(..., order) mechanism.

How effective is this at solving the jumbo frame problem?

(And do we still have a jumbo frame problem?  Reports seem to have subsided.)

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: ata and netdev (was Re: -mm merge plans for 2.6.23)
  2007-07-10 18:24   ` Andrew Morton
                       ` (2 preceding siblings ...)
  2007-07-10 20:31     ` Sergei Shtylyov
@ 2007-07-11 16:47     ` Dan Faerch
  3 siblings, 0 replies; 484+ messages in thread
From: Dan Faerch @ 2007-07-11 16:47 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Jeff Garzik, linux-kernel, IDE/ATA development list, netdev,
	Tejun Heo, Alan Cox, Deepak Saxena, Benjamin LaHaise

Andrew Morton wrote:
> drivers-net-ns83820c-add-paramter-to-disable-auto.patch:
>
> See comments in changelog: ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.22-rc6/2.6.22-rc6-mm1/broken-out/drivers-net-ns83820c-add-paramter-to-disable-auto.patch
>
> Dan, Ben: is there any prospect of progress here?
Mmm.. Ben had 2 comments last year:

Regarding the ethtool code I wrote:

> This part is good, although doing something for copper cards needs doing, 

I know very little about hardware and only own the fiber version of this
card. Even if I tried to write code for the copper version, it would
probably blow up the PHY and set the switches on fire ;).

And regarding the '"disable_autoneg" module argument':

> This is the part I disagree with.  Are you sure it isn't a bug in the 
> link autonegotiation state machine for fibre cards?  It should be defaulting 
> to 1Gbit/full duplex if no autonegotiation is happening, and if it isn't 
> then that should be fixed instead of papering over things with a config 
> option.

This is pretty much Russian to me.
I wouldn't know where to find the "link-autonegotiation-state-machine-for-fibre-cards" or know what to do with it anyway :).

The "disable_autoneg" option is a convenient feature (for me and the other guy who made the same patch last year) and I consider it harmless in every way.
It is simply an 'if' statement that skips the "start autoneg" function upon load.
We can simply remove the feature entirely if it is deemed undesirable.



So in conclusion:
- I vote "use the patch as-is", but I'm fine with it being changed.
- If it needs support for copper, someone else has to code it.


Regards
- Dan


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 12:06                       ` Peter Zijlstra
@ 2007-07-11 17:03                         ` Paul Jackson
  2007-07-11 18:47                           ` Peter Zijlstra
  0 siblings, 1 reply; 484+ messages in thread
From: Paul Jackson @ 2007-07-11 17:03 UTC (permalink / raw)
  To: Peter Zijlstra; +Cc: vatsa, mingo, containers, menage, akpm, linux-kernel

Peter wrote:
> I'd very much like to have it, but that is just me.

Why? [linear combinations of uid, container, pid, pgrp weighting]

You provide some implementation details and complications, but no
motivation that I noticed.

Well ... a little motivation ... "just me", which would go a long
way if your first name was Linus.  For the rest of us ... ;).

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-11 11:39   ` David Woodhouse
@ 2007-07-11 17:21     ` Andrew Morton
  2007-07-11 17:28       ` Randy Dunlap
  0 siblings, 1 reply; 484+ messages in thread
From: Andrew Morton @ 2007-07-11 17:21 UTC (permalink / raw)
  To: David Woodhouse
  Cc: Christoph Hellwig, linux-kernel, linux-fsdevel, David Howells

On Wed, 11 Jul 2007 12:39:42 +0100 David Woodhouse <dwmw2@infradead.org> wrote:

> On Wed, 2007-07-11 at 13:35 +0200, Christoph Hellwig wrote:
> > On Tue, Jul 10, 2007 at 01:31:52AM -0700, Andrew Morton wrote:
> > > romfs-printk-format-warnings.patch
> > 
> > NACK on this one. 
> 
> The rest of it is nacked anyway, until we unify the point and
> get_unmapped_area methods of the MTD API.
> 

Methinks you meant
nommu-make-it-possible-for-romfs-to-use-mtd-devices.patch, not
romfs-printk-format-warnings.patch.

I'll drop nommu-make-it-possible-for-romfs-to-use-mtd-devices.patch, thanks.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: scsi, was Re: -mm merge plans for 2.6.23
  2007-07-11 11:37 ` scsi, was " Christoph Hellwig
@ 2007-07-11 17:22   ` Andrew Morton
  0 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-11 17:22 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, linux-scsi

On Wed, 11 Jul 2007 13:37:18 +0200 Christoph Hellwig <hch@lst.de> wrote:

> > restore-acpi-change-for-scsi.patch
> > git-scsi-misc-vs-greg-sysfs-stuff.patch
> > aacraid-rename-check_reset.patch
> > scsi-dont-build-scsi_dma_mapunmap-for-has_dma.patch
> > drivers-scsi-small-cleanups.patch
> > sym53c8xx_2-claims-cpqarray-device.patch
> > drivers-scsi-wd33c93c-cleanups.patch
> > make-seagate_st0x_detect-static.patch
> > pci-error-recovery-symbios-scsi-base-support.patch
> > pci-error-recovery-symbios-scsi-first-failure.patch
> > drivers-scsi-pcmcia-nsp_csc-remove-kernel-24-code.patch
> > drivers-message-i2o-devicec-remove-redundant-gfp_atomic-from-kmalloc.patch
> > drivers-scsi-aic7xxx_oldc-remove-redundant-gfp_atomic-from-kmalloc.patch
> > use-menuconfig-objects-ii-scsi.patch
> > remove-dead-references-to-module_parm-macro.patch
> > ppa-coding-police-and-printk-levels.patch
> > remove-the-dead-cyberstormiii_scsi-option.patch
> > config_scsi_fd_8xx-no-longer-exists.patch
> > use-mutex-instead-of-semaphore-in-megaraid-mailbox-driver.patch
> > 
> >  Sent to James.
> 
> Care to drop the patches James NACKed every single time?

I'm not aware of any which fit that description.

There may be a couple in there which fix real bugs in an unapproved way. 
But I keep such patches as a matter of policy, so people keep on getting
pestered about their bugs.


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: buffered write patches, -mm merge plans for 2.6.23
  2007-07-11 11:39 ` buffered write patches, " Christoph Hellwig
@ 2007-07-11 17:23   ` Andrew Morton
  0 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-11 17:23 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, linux-mm, Nick Piggin

On Wed, 11 Jul 2007 13:39:44 +0200 Christoph Hellwig <hch@lst.de> wrote:

> >  pagefault-in-write deadlock fixes.  Will hold for 2.6.24.
> 
> Why that?

At Nick's request.  More work is needed and the code hasn't had a lot of
testing/thought/exposure/review.

>  This stuff has been in forever and is needed at various
> levels.  We need this in for anything to move forward on the buffered
> write front.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-11 17:21     ` Andrew Morton
@ 2007-07-11 17:28       ` Randy Dunlap
  0 siblings, 0 replies; 484+ messages in thread
From: Randy Dunlap @ 2007-07-11 17:28 UTC (permalink / raw)
  To: Andrew Morton
  Cc: David Woodhouse, Christoph Hellwig, linux-kernel, linux-fsdevel,
	David Howells

On Wed, 11 Jul 2007 10:21:03 -0700 Andrew Morton wrote:

> On Wed, 11 Jul 2007 12:39:42 +0100 David Woodhouse <dwmw2@infradead.org> wrote:
> 
> > On Wed, 2007-07-11 at 13:35 +0200, Christoph Hellwig wrote:
> > > On Tue, Jul 10, 2007 at 01:31:52AM -0700, Andrew Morton wrote:
> > > > romfs-printk-format-warnings.patch
> > > 
> > > NACK on this one. 
> > 
> > The rest of it is nacked anyway, until we unify the point and
> > get_unmapped_area methods of the MTD API.
> > 
> 
> Methinks you meant
> nommu-make-it-possible-for-romfs-to-use-mtd-devices.patch, not
> romfs-printk-format-warnings.patch.
> 
> I'll drop nommu-make-it-possible-for-romfs-to-use-mtd-devices.patch, thamks.

Thanks.  I was certainly getting confused.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 12:43 ` x86 status was " Andi Kleen
@ 2007-07-11 17:33   ` Jesse Barnes
  2007-07-11 17:42   ` Ingo Molnar
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 484+ messages in thread
From: Jesse Barnes @ 2007-07-11 17:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Andrew Morton, linux-kernel

> > i386-trim-memory-not-covered-by-wb-mtrrs.patch
>
> Might need more testing?

For the mtrr trim patch at least, I think the coverage we've received 
in -mm is probably sufficient (the failure mode would be fairly 
obvious).  The only thing I'm nervous about is adding AMD support for 
the quirk, since I don't have any way of testing it.  We can easily add 
that later though, if a tester steps forward or we see demand for it 
(should just be an extra conditional in the trim code).

Thanks,
Jesse

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 12:43 ` x86 status was " Andi Kleen
  2007-07-11 17:33   ` Jesse Barnes
@ 2007-07-11 17:42   ` Ingo Molnar
  2007-07-11 21:02     ` Randy Dunlap
                       ` (2 more replies)
  2007-07-11 18:14   ` Jeremy Fitzhardinge
  2007-07-12 19:33   ` Christoph Lameter
  3 siblings, 3 replies; 484+ messages in thread
From: Ingo Molnar @ 2007-07-11 17:42 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, linux-kernel, Thomas Gleixner, Arjan van de Ven,
	Linus Torvalds, Chris Wright


* Andi Kleen <andi@firstfloor.org> wrote:

> > clockevents-fix-typo-in-acpi_pmc.patch
> > timekeeping-fixup-shadow-variable-argument.patch
> > timerc-cleanup-recently-introduced-whitespace-damage.patch
> > clockevents-remove-prototypes-of-removed-functions.patch
> > clockevents-fix-resume-logic.patch
> > clockevents-fix-device-replacement.patch
> > tick-management-spread-timer-interrupt.patch
> > highres-improve-debug-output.patch
> > hrtimer-speedup-hrtimer_enqueue.patch
> > pcspkr-use-the-global-pit-lock.patch
> > ntp-move-the-cmos-update-code-into-ntpc.patch
> > i386-pit-stop-only-when-in-periodic-or-oneshot-mode.patch
> > i386-remove-volatile-in-apicc.patch
> > i386-hpet-assumes-boot-cpu-is-0.patch
> > i386-move-pit-function-declarations-and-constants-to-correct-header-file.patch
> > x86_64-untangle-asm-hpeth-from-asm-timexh.patch
> > x86_64-use-generic-cmos-update.patch
> > x86_64-remove-dead-code-and-other-janitor-work-in-tscc.patch
> > x86_64-fix-apic-typo.patch
> > x86_64-convert-to-cleckevents.patch
> > acpi-remove-the-useless-ifdef-code.patch
> > x86_64-hpet-restore-vread.patch
> > x86_64-restore-restore-nohpet-cmdline.patch
> > x86_64-block-irq-balancing-for-timer.patch
> > x86_64-prep-idle-loop-for-dynticks.patch
> > x86_64-enable-high-resolution-timers-and-dynticks.patch
> > x86_64-dynticks-disable-hpet_id_legsup-hpets.patch
> 
> I'm sceptical about the dynticks code. It just rips out the x86-64 
> timing code completely, which needs a lot more review and testing. 
> Probably not .23

What you just did here is a slap in the face to a lot of contributors 
who worked hard on this code :(

Let me tell you about the history of this project first. Arjan wrote the 
first version of it a year ago, and it was added to -rt and tested there 
by many people and went through many iterations and fixes. Chris Wright 
then created an x86_64 clockevents cleanup and dynticks enabling patchset
from it this spring and sent it to lkml three and a half months ago, on 
March 31:

    http://lwn.net/Articles/229094/

Thomas, the high resolution timers and clockevents maintainer, 
immediately picked up Chris' splitup/splitout/cleanup work and fixed and 
extended it, and sent a first cut to lkml on May 6th:

    http://lwn.net/Articles/233226/

Thomas then sent an updated version of the x86_64 clockevents cleanup 
and dynticks code to lkml (on June 10th), for a second round of review:

    http://lwn.net/Articles/237687/

As Thomas stated it in his submission:

  " The patch set has been tested in the -hrt and -rt trees for quite a 
    while and the initial problems have been sorted out. Thanks to the 
    folks from the PowerTop project for testing and feedback. "

Then on June 16th Thomas sent the third series:

    http://lwn.net/Articles/238834/

(which too was in -rt and was tested there on numerous machines. It was 
also added to -mm.)

Then on June 23rd Thomas sent the fourth series of the x86_64 
clockevents and dynticks code:

    http://lwn.net/Articles/239620/

We finally have someone (Thomas) with core kernel clue who actually 
_cares_ about the x86 time code and does not see it as an ugly chore, 
one who collects the right patches and maintains the -hrt tree and 
co-maintains the -rt tree and interacts with other contributors. What he 
did was _hard_ to do but we are making really good progress:

   http://lkml.org/lkml/2007/7/5/242

   " All in all, personally I'm very happy to see Linux making such a 
     huge step forward with tickless and can't wait for this step to be 
     available in all distros and for all architectures... "

Yes, touching the time code is a pain because both the hardware and the 
kernel have skeletons hidden in the closet, but we are mapping them one
by one, and we already fixed several kernel skeletons in the process. 
The code is in -mm and there is no open regression related to this queue 
of patches at the moment.

But what is curiously absent from all this positive and dynamic activity 
around these patches on lkml? There is not a single email from Andi 
Kleen, the official maintainer of the x86_64 tree directly reacting to 
this submission in any way, shape or form. Not a single email from you 
thanking Arjan, Chris and Thomas for this amount of cleanup to the 
architecture you are maintaining:

   31 files changed, 698 insertions(+), 1367 deletions(-)

Not a single email from you reviewing the patchset in any meaningful 
way. Not a single email from you to indicate that you even did so much 
as boot into this patchset.

What contribution do we have from you instead? A week before the .23 
merge window closes, at the very last possible moment, we finally get
your first reaction to this patchset, albeit in the form of three terse 
sentences:

  > I'm sceptical about the dynticks code. It just rips out the x86-64 
  > timing code completely, which needs a lot more review and testing. 
  > Probably not .23

In the past 3+ months there was not a single email from you indicating 
that you are "doubtful" about this submission, and that you think that 
it needs "lot more review and testing". You don't offer any alternative,
you don't offer any feedback, no review, no testing, no support, just a
simple rejection on lkml that prevents this project from going upstream.

Yes, maintainers have veto power and often have to make hard decisions,
but, and let me stress this properly:

   Only if they actually act as honest maintainers!

Altogether 197 emails on lkml discussed these patches, and you were 
Cc:-ed to every one of them. Over a dozen kernel developers reviewed it 
or reacted to the patchset in one way or another. And your only reaction 
to this is silence and a rejection claiming that it needs "lot more 
review"? I'm utterly speechless.

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-11  9:42   ` Mel Gorman
@ 2007-07-11 17:49     ` Christoph Lameter
  0 siblings, 0 replies; 484+ messages in thread
From: Christoph Lameter @ 2007-07-11 17:49 UTC (permalink / raw)
  To: Mel Gorman; +Cc: Andrew Morton, linux-kernel

On Wed, 11 Jul 2007, Mel Gorman wrote:

> I would like the patches to go through on the grounds that higher order
> allocations can succeed. However, I am also happy to say that order-0
> pages should be used as much as possible, that case should always be
> made as fast as possible and the world must not end if a high-order
> allocation fails.

SLUB can easily be made to not use higher order pages.

If the SLUB mobility patches are not merged, then higher order page use
can be explicitly enabled by passing the following to the kernel on boot:

	slub_max_order=<desired max order>

If they are merged, then higher order page use can be disabled in case
of trouble via:

	slub_max_order=0

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-11 12:23 ` lguest, " Christoph Hellwig
  2007-07-11 15:45   ` Randy Dunlap
@ 2007-07-11 18:04   ` Andrew Morton
  2007-07-12  1:21   ` Rusty Russell
  2 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-11 18:04 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: linux-kernel, rusty, linux-mm

On Wed, 11 Jul 2007 14:23:24 +0200
Christoph Hellwig <hch@lst.de> wrote:

> > lguest-export-symbols-for-lguest-as-a-module.patch
> 
> __put_task_struct is one of those no way in hell should this be exported
> things because we don't want modules messing with task lifetimes.
> 
> Fortunately I can't find anything actually using this in lguest, so
> it looks the issue has been solved in the meantime.
> 

There are a couple of calls to put_task_struct() in there, and that needs
__put_task_struct() exported.

> 
> I also have a rather bad feeling about exporting access_process_vm.
> This is the proverbial sledge hammer for access to user vm addresses
> and I'd rather keep it away from module programmers with "if all
> you have is a hammer ..." in mind.

hm, well, access_process_vm() is a convenience wrapper around
get_user_pages(), which is exported.

> In lguest this is used by send_dma which from my short reading of the
> code seems to be the central IPC mechanism.  The double copy here
> doesn't look very efficient to me either.  Maybe some VM folks could
> look into a better way to archive this that might be both more
> efficient and not require the export.
> 
> 
> > lguest-the-guest-code.patch
> > lguest-the-host-code.patch
> > lguest-the-host-code-lguest-vs-clockevents-fix-resume-logic.patch
> > lguest-the-asm-offsets.patch
> > lguest-the-makefile-and-kconfig.patch
> > lguest-the-console-driver.patch
> > lguest-the-net-driver.patch
> > lguest-the-block-driver.patch
> > lguest-the-documentation-example-launcher.patch
> 
> Just started reading this (again) so no useful comment here, but it
> would be nice if the code could follow CodingStyle and place the || and
> && at the end of the line in multiline conditionals instead of at the
> beginning of the new one.


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 12:43 ` x86 status was " Andi Kleen
  2007-07-11 17:33   ` Jesse Barnes
  2007-07-11 17:42   ` Ingo Molnar
@ 2007-07-11 18:14   ` Jeremy Fitzhardinge
  2007-07-12 19:33   ` Christoph Lameter
  3 siblings, 0 replies; 484+ messages in thread
From: Jeremy Fitzhardinge @ 2007-07-11 18:14 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, linux-kernel, tglx, Tim Hockin, jesse.barnes,
	Adrian Bunk, dave young

Andi Kleen wrote:
> More review: 
>
>   
>> xen-fix-x86-config-dependencies.patch
>> xen-suppress-abs-symbol-warnings-for-unused-reloc-pointers.patch
>> xen-cant-support-numa-yet.patch
>>     

The first and third of these are just simple Kconfig updates, and the 
middle one just updates the list of symbols which shouldn't be warned 
about in CONFIG_RELOCATABLE's absolute symbol check.  They're completely 
harmless but may prevent someone from generating an unbuildable .config.

>> x86-fix-iounmaps-use-of-vm_structs-size-field.patch
>>     

This appears to fix a real bug; the only question is whether x86-64 
needs the same treatment.  I'm not sure if the original bug reporter 
(dave young) has confirmed it fixed his problem.

    J


* Re: -mm merge plans -- lumpy reclaim
  2007-07-11 16:46     ` Andrew Morton
@ 2007-07-11 18:38       ` Andy Whitcroft
  2007-07-16 10:37       ` Mel Gorman
  1 sibling, 0 replies; 484+ messages in thread
From: Andy Whitcroft @ 2007-07-11 18:38 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Mel Gorman, Christoph Lameter, Peter Zijlstra

Andrew Morton wrote:

>>>> lumpy-reclaim-v4.patch
>>>> have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch
>>>> only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
>>>>
>>>>  Lumpy reclaim.  In a similar situation to Mel's patches.  Stuck due to
>>>>  general lack of interest and effort.
>>> The lumpy reclaim patches originally came out of work to support Mel's
>>> anti-fragmentation work.  As such I think they have become somewhat
>>> attached to those patches.  Whilst lumpy is most effective where
>>> placement controls are in place as offered by Mel's work, we see benefit
>>> from reduction in the "blunderbuss" effect when we reclaim at higher
>>> orders.  While placement control is pretty much required for the very
>>> highest orders such as huge page size, lower order allocations are
>>> benefited in terms of lower collateral damage.
>>>
>>> There are now a few areas other than huge page allocations which can
>>> benefit.  Stacks are still order 1.  Jumbo frames want higher order
>>> contiguous pages for their incoming hardware buffers.  SLUB is showing
>>> performance benefits from moving to a higher allocation order.  All of
>>> these should benefit from more aggressive targeted reclaim, indeed I
>>> have been surprised just how often my test workloads trigger lumpy at
>>> order 1 to get new stacks.
>>>
>>> Truly representative work loads are hard to generate for some of these.
>>>  Though we have heard some encouraging noises from those who can
>>> reproduce these problems.
> 
> I'd expect that the main application for lumpy-reclaim is in keeping a pool
> of order-2 (say) pages in reserve for GFP_ATOMIC allocators.  ie: jumbo
> frames.
> 
> At present this relies upon the wakeup_kswapd(..., order) mechanism.
> 
> How effective is this at solving the jumbo frame problem?

The tie in between allocator and kswapd is essentially unchanged,
so if allocators are dropping below the watermarks at the specified
order, reclaim will be triggered at that order.  Reclaim continues
until we return above the high watermarks, at the order at which
we are reclaiming.

What lumpy brings is a greater targeting of effort to get the pages.
kswapd now uses the desired allocator order when applying reclaim.
This leads to pressure being applied to contiguous areas at the
required order, and so a higher chance of that order becoming
available.  Traditional reclaim could end up applying pressure to
a number of pages, but not all pages in any area at the required
order, leading to a very low chance of success.  By targeting
areas at the required order we significantly increase the chances
of success for any given amount of reclaim.  As we will reclaim
until we have the desired number of free pages, we will have to
reclaim less to achieve this compared to random reclaim.

This certainly is appealing intuitively, and our testing at higher
orders shows that the cost of each reclaimed page is lower and more
importantly the time to reclaim each page is reduced.  So for a
'continuing' consumer like an incoming packet stream, we should
have to do much less work and thus disrupt the system as a whole
much less to get its pages.

Where demand for atomic higher order pages is not heavy, we would
expect kswapd to maintain free page levels more readily, and so cope
better under higher demand.  It should be stressed, though, that without
placement control success rates drop off significantly at higher orders, as
the probability of reclaim succeeding on all pages in the area
tends to zero.

> (And do we still have a jumbo frame problem?  Reports seem to have
> subsided)

It is not in the least bit clear if the problem is resolved or if the
reporters have simply gone quiet.

Overall the approach taken in lumpy reclaim seems to be a logical
extension of the regular reclaim algorithm, leading to more
efficient reclaim.

-apw


* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch
  2007-07-11 15:39             ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Theodore Tso
@ 2007-07-11 18:47               ` Heiko Carstens
  2007-07-11 20:32                 ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Martin Schwidefsky
  0 siblings, 1 reply; 484+ messages in thread
From: Heiko Carstens @ 2007-07-11 18:47 UTC (permalink / raw)
  To: Theodore Tso, Paul Mackerras, Andrew Morton, Jeff Garzik,
	linux-kernel, Amit Arora, Andi Kleen, Benjamin Herrenschmidt,
	Arnd Bergmann, Luck, Tony, Martin Schwidefsky, Mark Fasheh,
	linux-arch

On Wed, Jul 11, 2007 at 11:39:39AM -0400, Theodore Tso wrote:
> On Wed, Jul 11, 2007 at 10:50:49AM +1000, Paul Mackerras wrote:
> > > On Wed, 11 Jul 2007 09:27:40 +1000
> > > Paul Mackerras <paulus@samba.org> wrote:
> > > 
> > > > We did come up with an order that worked for everybody, but that
> > > > discussion seemed to get totally ignored by the ext4 developers.
> 
> Well, in the end it was a toss-up between
> 
> asmlinkage long sys_fallocate(int fd, int mode, loff_t offset, loff_t len)
> 
> and 
> 
> asmlinkage long sys_fallocate(loff_t offset, loff_t len, int fd, int mode)
> 
> There were a number of folks who preferred having int fd first, and I
> *thought* Amit had gotten agreement from either Martin or Heiko that
> it was ok to do this as an exception, even though it was extra work
> for that arch.  But if not, we can try going back the second
> alternative, or even the 6 32-bits args (off_high, off_low, len_high,
> len_low) approach, but I think that drew even more fire. 

The second approach would work for all architectures, but some people
didn't like (no technical reason) not having fd as first argument.

Just go ahead with the current approach. s390 seems to be the only
architecture which suffers from this and I wouldn't like to start this
discussion again.


* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 17:03                         ` Paul Jackson
@ 2007-07-11 18:47                           ` Peter Zijlstra
  0 siblings, 0 replies; 484+ messages in thread
From: Peter Zijlstra @ 2007-07-11 18:47 UTC (permalink / raw)
  To: Paul Jackson; +Cc: vatsa, mingo, containers, menage, akpm, linux-kernel

On Wed, 2007-07-11 at 10:03 -0700, Paul Jackson wrote:
> Peter wrote:
> > I'd very much like to have it, but that is just me.
> 
> Why? [linear combinations of uid, container, pid, pgrp weighting]

Good question, and I really have no other answer than that it seems
useful and not impossible (or even hard) to implement :-/

I'm not even that interested in using it, it just seems like a nice
idea.




* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11  5:29         ` Andrew Morton
  2007-07-11  6:03           ` Srivatsa Vaddagiri
  2007-07-11  9:04           ` Ingo Molnar
@ 2007-07-11 19:44           ` Paul Menage
  2007-07-12  5:39             ` Srivatsa Vaddagiri
  2 siblings, 1 reply; 484+ messages in thread
From: Paul Menage @ 2007-07-11 19:44 UTC (permalink / raw)
  To: Andrew Morton; +Cc: vatsa, linux-kernel, containers, Ingo Molnar

On 7/10/07, Andrew Morton <akpm@linux-foundation.org> wrote:
>
> I'm inclined to take the cautious route here - I don't think people will be
> dying for the CFS thingy (which I didn't even know about?) in .23, and it's
> rather a lot of infrastructure to add for a CPU scheduler configurator

Selecting the relevant patches to give enough of the container
framework to support a CFS container subsystem (slightly
tweaked/updated versions of the base patch, procfs interface patch and
tasks file interface patch) is about 1600 lines in kernel/container.c
and another 200 in kernel/container.h, which is about 99% of the
non-documentation changes.

So not tiny, but it's not very intrusive on the rest of the kernel,
and would avoid having to introduce a temporary API based on uids.

Paul


* Re: fallocate-implementation-on-i86-x86_64-and-powerpc.patch
  2007-07-11 18:47               ` fallocate-implementation-on-i86-x86_64-and-powerpc.patch Heiko Carstens
@ 2007-07-11 20:32                 ` Martin Schwidefsky
  0 siblings, 0 replies; 484+ messages in thread
From: Martin Schwidefsky @ 2007-07-11 20:32 UTC (permalink / raw)
  To: Heiko Carstens
  Cc: Theodore Tso, Paul Mackerras, Andrew Morton, Jeff Garzik,
	linux-kernel, Amit Arora, Andi Kleen, Benjamin Herrenschmidt,
	Arnd Bergmann, Luck, Tony, Mark Fasheh, linux-arch

On Wed, 2007-07-11 at 20:47 +0200, Heiko Carstens wrote:
> > There were a number of folks who preferred having int fd first, and I
> > *thought* Amit had gotten agreement from either Martin or Heiko that
> > it was ok to do this as an exception, even though it was extra work
> > for that arch.  But if not, we can try going back to the second
> > alternative, or even the 6 32-bits args (off_high, off_low, len_high,
> > len_low) approach, but I think that drew even more fire. 
> 
> The second approach would work for all architectures, but some people
> didn't like (no technical reason) not having fd as first argument.

For s390 we would have liked the second approach with the two ints as
last arguments, since it would avoid the wrapper in the kernel. It does
not avoid the wrapper in user space, since the call uses 6 registers on 31
bit. So the fallocate call needs special treatment in glibc anyway, and I don't
mind that it needs another wrapper in the kernel.

> Just go ahead with the current approach. s390 seems to be the only
> architecture which suffers from this and I wouldn't like to start this
> discussion again.

Yes, don't worry about s390 for fallocate, the patch that had been in
-mm only had an incorrect system call number. The wrapper is fine.

-- 
blue skies,
  Martin.

"Reality continues to ruin my life." - Calvin.




* Re: RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23]
  2007-07-26 14:59                             ` Al Viro
@ 2007-07-11 20:41                               ` Pavel Machek
  0 siblings, 0 replies; 484+ messages in thread
From: Pavel Machek @ 2007-07-11 20:41 UTC (permalink / raw)
  To: Al Viro
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, Frank Kingswood,
	Nick Piggin, Ray Lee, Jesper Juhl, ck list, Paul Jackson,
	linux-mm, linux-kernel

Hi!

> > That would just save reading the directories. Not sure
> > it helps that much. Much better would be actually if it didn't stat the 
> > individual files (and force their dentries/inodes in). I bet it does that to 
> > find out if they are directories or not. But in a modern system it could just 
> > check the type in the dirent on file systems that support 
> that and not do a stat. Then you would get far fewer dentries/inodes.
>  
> FWIW, find(1) does *not* stat non-directories (and neither would this
> approach).  So it's just dentries for directories and you can't realistically
> skip those.  OK, you could - if you had banned cross-directory rename
> for directories and propagated "dirty since last look" towards root (note
> that it would be a boolean, not a timestamp).  Then we could skip unchanged
> subtrees completely...

Could we help it a little from kernel and set 'dirty since last look'
on directory renames?

I mean, this is not only updatedb. KDE startup is limited by this,
too. It would be nice to have an effective 'what changed in the tree'
operation.
							Pavel
-- 
(english) http://www.livejournal.com/~pavelmachek
(cesky, pictures) http://atrey.karlin.mff.cuni.cz/~pavel/picture/horses/blog.html


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 17:42   ` Ingo Molnar
@ 2007-07-11 21:02     ` Randy Dunlap
  2007-07-11 21:39       ` Thomas Gleixner
  2007-07-11 21:16     ` Andi Kleen
  2007-07-11 21:42     ` x86 status was Re: -mm merge plans for 2.6.23 Linus Torvalds
  2 siblings, 1 reply; 484+ messages in thread
From: Randy Dunlap @ 2007-07-11 21:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Thomas Gleixner,
	Arjan van de Ven, Linus Torvalds, Chris Wright

On Wed, 11 Jul 2007 19:42:52 +0200 Ingo Molnar wrote:

> 
> * Andi Kleen <andi@firstfloor.org> wrote:
> 
> > > clockevents-fix-typo-in-acpi_pmc.patch
> > > timekeeping-fixup-shadow-variable-argument.patch
> > > timerc-cleanup-recently-introduced-whitespace-damage.patch
> > > clockevents-remove-prototypes-of-removed-functions.patch
> > > clockevents-fix-resume-logic.patch
> > > clockevents-fix-device-replacement.patch
> > > tick-management-spread-timer-interrupt.patch
> > > highres-improve-debug-output.patch
> > > hrtimer-speedup-hrtimer_enqueue.patch
> > > pcspkr-use-the-global-pit-lock.patch
> > > ntp-move-the-cmos-update-code-into-ntpc.patch
> > > i386-pit-stop-only-when-in-periodic-or-oneshot-mode.patch
> > > i386-remove-volatile-in-apicc.patch
> > > i386-hpet-assumes-boot-cpu-is-0.patch
> > > i386-move-pit-function-declarations-and-constants-to-correct-header-file.patch
> > > x86_64-untangle-asm-hpeth-from-asm-timexh.patch
> > > x86_64-use-generic-cmos-update.patch
> > > x86_64-remove-dead-code-and-other-janitor-work-in-tscc.patch
> > > x86_64-fix-apic-typo.patch
> > > x86_64-convert-to-cleckevents.patch
> > > acpi-remove-the-useless-ifdef-code.patch
> > > x86_64-hpet-restore-vread.patch
> > > x86_64-restore-restore-nohpet-cmdline.patch
> > > x86_64-block-irq-balancing-for-timer.patch
> > > x86_64-prep-idle-loop-for-dynticks.patch
> > > x86_64-enable-high-resolution-timers-and-dynticks.patch
> > > x86_64-dynticks-disable-hpet_id_legsup-hpets.patch
> > 
> > I'm sceptical about the dynticks code. It just rips out the x86-64 
> > timing code completely, which needs a lot more review and testing. 
> > Probably not .23
> 
> What you just did here is a slap in the face to a lot of contributors 
> who worked hard on this code :(
> 
> Let me tell you about the history of this project first.

... [lwn.net articles and other quotes snipped]

> But what is curiously absent from all this positive and dynamic activity 
> around these patches on lkml? There is not a single email from Andi 
> Kleen, the official maintainer of the x86_64 tree directly reacting to 
> this submission in any way, shape or form. Not a single email from you 
> thanking Arjan, Chris and Thomas for this amount of cleanup to the 
> architecture you are maintaining:
> 
>    31 files changed, 698 insertions(+), 1367 deletions(-)

Hm, I don't usually get thanks emails.  Do other people?

> Not a single email from you reviewing the patchset in any meaningful 
> way. Not a single email from you to indicate that you even did so much 
> as boot into this patchset.
> 
> What contribution do we have from you instead? A week before the .23 
> merge window is closed, in the very last possible moment, we finally get 
> your first reaction to this patchset, albeit in the form of three terse 
> sentences:
> 
>   > I'm sceptical about the dynticks code. It just rips out the x86-64 
>   > timing code completely, which needs a lot more review and testing. 
>   > Probably not .23
> 
> In the past 3+ months there was not a single email from you indicating 
> that you are "doubtful" about this submission, and that you think that 
> it needs "lot more review and testing". You don't offer any alternative,
> you don't offer any feedback, no review, no testing, no support, just a
> simple rejection on lkml that prevents this project from going upstream.
> 
> Yes, maintainers have veto power and often have to make hard decisions,
> but, and let me stress this properly:
> 
>    Only if they actually act as honest maintainers!
> 
> Altogether 197 emails on lkml discussed these patches, and you were 
> Cc:-ed to every one of them. Over a dozen kernel developers reviewed it 
> or reacted to the patchset in one way or another. And your only reaction 
> to this is silence and a rejection claiming that it needs "lot more 
> review"? I'm utterly speechless.

I can understand being disappointed, but not quite as upset as you
appear to be.

so have you (Ingo) reviewed the ext4 patches?  or reiser4 patches?
or lumpy reclaim?  or anti-fragmentation?

I certainly haven't.  I can barely keep up with reading about 1/2
of lkml emails.  And in my non-scientific method, I think that we
are suffering from both (a) more patch submittals and (b) fewer
qualified reviewers (per kernel KLOC) than we had 3-5 years ago.

I don't see how you can expect Andrew to review these or any other
specific patchset.  Do you have some suggestions on how to clone
Andrew?

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 17:42   ` Ingo Molnar
  2007-07-11 21:02     ` Randy Dunlap
@ 2007-07-11 21:16     ` Andi Kleen
  2007-07-11 21:46       ` Valdis.Kletnieks
                         ` (2 more replies)
  2007-07-11 21:42     ` x86 status was Re: -mm merge plans for 2.6.23 Linus Torvalds
  2 siblings, 3 replies; 484+ messages in thread
From: Andi Kleen @ 2007-07-11 21:16 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Thomas Gleixner,
	Arjan van de Ven, Linus Torvalds, Chris Wright


Well I spent a lot of time making the x86-64 timing code work
well on a variety of machines; working around a wide variety
of hardware and platform bugs. I obviously don't agree on your description
of its maintenance state. 

> What contribution do we have from you instead? A week before the .23 

I told him my objections privately earlier. Basically I would
like to see an actually debuggable step-by-step change, not a rip everything 
out.

If that isn't possible it needs very careful review which just hasn't
happened yet. But I'm not convinced even step by step is not possible
here.

I thought it was clear that rip everything out is rarely a good idea
in Linux land?  That's really not something I should need to harp on 
repeatedly.

-Andi



* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:02     ` Randy Dunlap
@ 2007-07-11 21:39       ` Thomas Gleixner
  2007-07-11 23:21         ` Randy Dunlap
  0 siblings, 1 reply; 484+ messages in thread
From: Thomas Gleixner @ 2007-07-11 21:39 UTC (permalink / raw)
  To: Randy Dunlap
  Cc: Ingo Molnar, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Linus Torvalds, Chris Wright

Randy,

On Wed, 2007-07-11 at 14:02 -0700, Randy Dunlap wrote: 
> I certainly haven't.  I can barely keep up with reading about 1/2
> of lkml emails.  And in my non-scientific method, I think that we
> are suffering from both (a) more patch submittals and (b) fewer
> qualified reviewers (per kernel KLOC) than we had 3-5 years ago.
> 
> I don't see how you can expect Andrew to review these or any other
> specific patchset.  Do you have some suggestions on how to clone
> Andrew?

Ingo was talking to Andi, the x86_64 maintainer, not to Andrew.

And I share his opinion that the maintainer of the subsystem, which is
affected by such a fundamental patch, could have at least shown any
public sign of interest, disgust, comment or whatever in a 3+ month
time frame.

Especially about a patch, which is a logical consequence of an almost
two years public and transparent effort to consolidate the time code in
the kernel.

I for my part have no problem maintaining the set out of tree for another
round and weeding out any remaining problems in -mm, but my expectation of a
qualified response from the responsible maintainer is exactly zero right
now.

Thanks,

	tglx




* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 17:42   ` Ingo Molnar
  2007-07-11 21:02     ` Randy Dunlap
  2007-07-11 21:16     ` Andi Kleen
@ 2007-07-11 21:42     ` Linus Torvalds
  2007-07-11 22:04       ` Thomas Gleixner
  2007-07-11 23:19       ` Ingo Molnar
  2 siblings, 2 replies; 484+ messages in thread
From: Linus Torvalds @ 2007-07-11 21:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Thomas Gleixner,
	Arjan van de Ven, Chris Wright



On Wed, 11 Jul 2007, Ingo Molnar wrote:
> 
> What you just did here is a slap in the face to a lot of contributors 
> who worked hard on this code :(

Ingo, I'm sorry to say so, but your answer just convinced me that you're 
wrong, and we MUST NOT take that code.

That was *exactly* the same thing you talked about when I refused to take 
the original timer changes into 2.6.20. You were talking about how lots of 
people had worked really hard, and how it was really tested.

And it damn well was NOT really tested, and 2.6.21 ended up being a 
horribly painful experience (one of the more painful kernel releases in 
recent times), and we ended up having to fix a *lot* of stuff.

And you admitted you were wrong at the time.

Now you do the *exact* same thing.

Here's a big clue: it doesn't matter one _whit_ how much face-slapping you 
get, or how much effort some programmers have put into the code. It's 
untested. And no, we are *not* going to do another "rip everything out, 
and replace it with new code" again.

Over my dead body.

We're going to do this thing gradually, or not at all.

And if somebody feels slighted by the face-slap, and thinks he has already 
done enough, and isn't interested in doing it gradually, then good 
riddance. The "not at all" seems like a good idea, and maybe we can 
re-visit this in a year or two.

I'm not going to have another 2.6.21 on my hands.

		Linus


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:16     ` Andi Kleen
@ 2007-07-11 21:46       ` Valdis.Kletnieks
  2007-07-11 21:54         ` Chris Wright
  2007-07-11 22:12         ` Linus Torvalds
  2007-07-11 21:46       ` Thomas Gleixner
  2007-07-11 21:46       ` Andrea Arcangeli
  2 siblings, 2 replies; 484+ messages in thread
From: Valdis.Kletnieks @ 2007-07-11 21:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Thomas Gleixner,
	Arjan van de Ven, Linus Torvalds, Chris Wright


On Wed, 11 Jul 2007 23:16:38 +0200, Andi Kleen said:

(Note - I'm just a usually-confused crash test dummy here...)

> Well I spent a lot of time making the x86-64 timing code work
> well on a variety of machines; working around a wide variety
> of hardware and platform bugs. I obviously don't agree on your description
> of its maintenance state. 

I'm seeing a bit of a disconnect here.  If you spent all that time making it
work, how come the guys who developed the patch are saying you didn't provide
any feedback about the patchset?

> > What contribution do we have from you instead? A week before the .23 
> 
> I told him my objections privately earlier. Basically I would
> like to see an actually debuggable step-by-step change, not a rip everything 
> out.

Odd, I looked at the patchset fairly closely a number of times, as I was
hand-retrofitting the -rc[1-4] versions onto -rc[1-4]-mm kernels, and it looked
to *me* like it was a nice set of 20 or so step-by-step changes (bisectable
and everything - I got to do that once trying to figure out which one I botched).
Was there something in there that I missed?




* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:16     ` Andi Kleen
  2007-07-11 21:46       ` Valdis.Kletnieks
@ 2007-07-11 21:46       ` Thomas Gleixner
  2007-07-11 21:52         ` Chris Wright
  2007-07-11 22:18         ` Andi Kleen
  2007-07-11 21:46       ` Andrea Arcangeli
  2 siblings, 2 replies; 484+ messages in thread
From: Thomas Gleixner @ 2007-07-11 21:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Arjan van de Ven,
	Linus Torvalds, Chris Wright

Andi,

On Wed, 2007-07-11 at 23:16 +0200, Andi Kleen wrote: 
> > What contribution do we have from you instead? A week before the .23 
> 
> I told him my objections privately earlier. Basically I would
> like to see an actually debuggable step-by-step change, not a rip everything 
> out.

You promised privately to do a thorough review as well, which I have been
waiting for for months.

> If that isn't possible it needs very careful review which just hasn't
> happened yet. But I'm not convinced even step by step is not possible
> here.

There is no step by step thing. You convert an arch to clock events or
you convert it not.

> I thought it was clear that rip everything out is rarely a good idea
> in Linux land?  That's really not something I should need to harp on 
> repeatedly.

If you have technical objections, put them on the table. Point by point.

All I heard so far from you are platitudes, which are not worth the
electrons to transport them.

	tglx




* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:16     ` Andi Kleen
  2007-07-11 21:46       ` Valdis.Kletnieks
  2007-07-11 21:46       ` Thomas Gleixner
@ 2007-07-11 21:46       ` Andrea Arcangeli
  2007-07-11 22:09         ` Linus Torvalds
  2 siblings, 1 reply; 484+ messages in thread
From: Andrea Arcangeli @ 2007-07-11 21:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Andrew Morton, linux-kernel, Thomas Gleixner,
	Arjan van de Ven, Linus Torvalds, Chris Wright

Hi Andi,

On Wed, Jul 11, 2007 at 11:16:38PM +0200, Andi Kleen wrote:
> I thought it was clear that rip everything out is rarely a good idea
> in Linux land?  That's really not something I should need to harp on 
> repeatedly.

I'm going to change topic big time because your sentence above
perfectly applies to the O(1) scheduler too. It's not like process
schedulers are sacred and there shall be only one, while I/O
schedulers and packet schedulers are profane and there can be many of
them. FWIW IMHO the right way would have been to make the new
scheduler pluggable and switchable at runtime; too bad it was ripped
out instead. The difficulty of making the scheduler pluggable isn't
really enormous; there have been patches floating around to achieve
it, and I even dealt with some of them myself once.

The only positive side of being forced to CFS I can imagine, is that
more testing will make it more stable and more tuned more quickly. But
I'm fairly certain Ingo's good enough to achieve without it, perhaps
with a few more weeks.

Personally I very much like the unfairness of O(1). I'm afraid CFS
will overschedule under a certain number of workloads in its attempt
to provide completely fair queueing at all costs, and it won't deal
with the X server as nicely as O(1), but I may well be wrong. The
only thing I'm more sure about is that the computational complexity is
higher, and that reason alone is a good technical reason to provide
both and let the java folks stick with O(1) if they want.


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:46       ` Thomas Gleixner
@ 2007-07-11 21:52         ` Chris Wright
  2007-07-11 22:18         ` Andi Kleen
  1 sibling, 0 replies; 484+ messages in thread
From: Chris Wright @ 2007-07-11 21:52 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Arjan van de Ven, Linus Torvalds, Chris Wright

* Thomas Gleixner (tglx@linutronix.de) wrote:
> > If that isn't possible it needs very careful review which just hasn't
> > happened yet. But I'm not convinced even step by step is not possible
> > here.
> 
> There is no step by step thing. You convert an arch to clock events or
> you convert it not.

Indeed, about the only thing that can be done is to take a slower approach
to converging the arch specific implementations (hpet, pit, etc).

thanks,
-chris


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:46       ` Valdis.Kletnieks
@ 2007-07-11 21:54         ` Chris Wright
  2007-07-11 22:11           ` Valdis.Kletnieks
  2007-07-11 22:12         ` Linus Torvalds
  1 sibling, 1 reply; 484+ messages in thread
From: Chris Wright @ 2007-07-11 21:54 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Thomas Gleixner, Arjan van de Ven, Linus Torvalds, Chris Wright

* Valdis.Kletnieks@vt.edu (Valdis.Kletnieks@vt.edu) wrote:
> On Wed, 11 Jul 2007 23:16:38 +0200, Andi Kleen said:
> (Note - I'm just a usually-confused crash test dummy here...)
> 
> > Well I spent a lot of time making the x86-64 timing code work
> > well on a variety of machines; working around a wide variety
> > of hardware and platform bugs. I obviously don't agree on your description
> > of its maintenance state. 
> 
> I'm seeing a bit of a disconnect here.  If you spent all that time making it
> work, how come the guys who developed the patch are saying you didn't provide
> any feedback about the patchset?

I think Andi's referring to the existing x86_64 code, which gets
replaced by the patchset in question.


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:42     ` x86 status was Re: -mm merge plans for 2.6.23 Linus Torvalds
@ 2007-07-11 22:04       ` Thomas Gleixner
  2007-07-11 22:20         ` Linus Torvalds
  2007-07-11 23:19       ` Ingo Molnar
  1 sibling, 1 reply; 484+ messages in thread
From: Thomas Gleixner @ 2007-07-11 22:04 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Linus,

On Wed, 2007-07-11 at 14:42 -0700, Linus Torvalds wrote: 
> Here's a big clue: it doesn't matter one _whit_ how much face-slapping you 
> get, or how much effort some programmers have put into the code. It's 
> untested. And no, we are *not* going to do another "rip everything out, 
> and replace it with new code" again.
> 
> Over my dead body.
>
> We're going to do this thing gradually, or not at all.

Can you please shed some light on how exactly you switch an
architecture to clock events gradually?

You simply cannot convert the PIT today, the HPET next week and the
local APIC in three months.

At least not to my knowledge.

> And if somebody feels slighted by the face-slap, and thinks he has already 
> done enough, and isn't interested in doing it gradually, then good 
> riddance. The "not at all" seems like a good idea, and maybe we can 
> re-visit this in a year or two.

I have no problem letting this brew for some more time. I was not put off
by the 2.6.20 decision, but I have no clue how to communicate with a black
hole.

	tglx



* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:46       ` Andrea Arcangeli
@ 2007-07-11 22:09         ` Linus Torvalds
  2007-07-12 15:36           ` Oleg Verych
  2007-07-13  2:23           ` Roman Zippel
  0 siblings, 2 replies; 484+ messages in thread
From: Linus Torvalds @ 2007-07-11 22:09 UTC (permalink / raw)
  To: Andrea Arcangeli
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Thomas Gleixner, Arjan van de Ven, Chris Wright



On Wed, 11 Jul 2007, Andrea Arcangeli wrote:
>
> I'm going to change topic big time because your sentence above
> perfectly applies to the O(1) scheduler too.

I disagree to a large degree.

We almost never have problems with code you can "think about".

Sure, bugs happen, but code that everybody runs the same generally doesn't 
break. So a CPU scheduler doesn't worry me all that much. CPU schedulers 
are "easy".

What worries me is interfaces to hardware that we know looks different for 
different people. That means that any testing that one person has done 
doesn't necessarily translate to anything at *all* on another person's
machine.

The timer problems we had when merging the stuff in 2.6.21 just scarred 
me. I'd _really_ hate to have to go through that again. And no, the 
"gradual" thing where the patch that actually *enables* something isn't 
very gradual at all, so that's the absolutely worst kind of thing, because 
then people can "git bisect" to the point where it got enabled and tell us 
that's where things broke, but that doesn't actually say anything at all 
about the patch that actually implements the new behaviour.

So the "enable" kind of patch is actually the worst of the lot, when it 
comes to hardware.

When it comes to pure software algorithms, and things like schedulers, 
you'll still obviously have timing issues and tuning, but generally things 
*work*, which makes it a lot easier to debug and describe.

		Linus

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:54         ` Chris Wright
@ 2007-07-11 22:11           ` Valdis.Kletnieks
  2007-07-11 22:20             ` Chris Wright
  2007-07-11 22:33             ` Linus Torvalds
  0 siblings, 2 replies; 484+ messages in thread
From: Valdis.Kletnieks @ 2007-07-11 22:11 UTC (permalink / raw)
  To: Chris Wright
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Thomas Gleixner, Arjan van de Ven, Linus Torvalds

On Wed, 11 Jul 2007 14:54:12 PDT, Chris Wright said:
> * Valdis.Kletnieks@vt.edu (Valdis.Kletnieks@vt.edu) wrote:
> > On Wed, 11 Jul 2007 23:16:38 +0200, Andi Kleen said:
> > (Note - I'm just a usually-confused crash test dummy here...)
> > 
> > > Well I spent a lot of time making the x86-64 timing code work
> > > well on a variety of machines; working around a wide variety
> > > of hardware and platform bugs. I obviously don't agree with your description
> > > of its maintenance state.
> > 
> > I'm seeing a bit of a disconnect here.  If you spent all that time making it
> > work, how come the guys who developed the patch are saying you didn't provide
> > any feedback about the patchset?
> 
> I think Andi's referring to the existing x86_64 code, which gets
> replaced by the patchset in question.

<Takes a closer look at the patches>  D'Oh! :)  Yeah, the -rc4 version I'm
looking at is like a dozen 1-3K patches setting up and cleaning up, and then
one monster 65K patch doing the clockevents conversion, then another 6 or 8
small ones.

Yeah, that one big patch really doesn't look separable to me.  But as I said,
I'm just a crash test dummy here. :)

Andrew - how do you feel about keeping this in the -mm tree until Linus,
Andi, Ingo, and Thomas get on the same page (which may be around the 2.6.24
merge window, by my guesstimate)?


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:46       ` Valdis.Kletnieks
  2007-07-11 21:54         ` Chris Wright
@ 2007-07-11 22:12         ` Linus Torvalds
  1 sibling, 0 replies; 484+ messages in thread
From: Linus Torvalds @ 2007-07-11 22:12 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Thomas Gleixner, Arjan van de Ven, Chris Wright



On Wed, 11 Jul 2007, Valdis.Kletnieks@vt.edu wrote:
> 
> Odd, I looked at the patchset fairly closely a number of times, as I was
> hand-retrofitting the -rc[1-4] versions onto -rc[1-4]-mm kernels, and it looked
> to *me* like it was a nice set of 20 or so step-by-step changes (bisectable
> and everything - I got to do that once trying to figure out which one I botched).
> Was there something in there that I missed?

The patch-set itself actually looks fine, as far as I'm concerned.

But it does seem to have that "enable everything in one go" problem.

I'd much rather see one time source at a time being converted, and enabled 
then and there, so that when people report problems and do a bisection, if 
it was HPET that broke, you get the commit that changed HPET.

As it is, looking at that set, it *looks* like you'd get the "ok, now 
enable it all" as the commit that breaks, which tells you hardly anything, 
since the commit that _shows_ the behaviour has absolutely nothing to do 
with the code that actually causes it.

But yeah, the patch series per se doesn't look bad. If it wasn't for me 
being burnt by the last big switch-over for timers, I probably wouldn't 
mind it at all, personally.

		Linus
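The bisection argument above can be illustrated with a toy model. This is a hypothetical Python simulation, not kernel code and not the actual patch series: the commit names, the `Commit` type and the `machine_broken()` predicate are invented for illustration. It assumes a latent bug only shows symptoms once the code containing it is switched on and, as `git bisect` does, that the tip of the series is known bad and every commit after the first bad one is also bad.

```python
from typing import NamedTuple, Tuple


class Commit(NamedTuple):
    name: str
    breaks: str = ""                 # subsystem whose latent bug this commit adds
    enables: Tuple[str, ...] = ()    # subsystems this commit switches on


def machine_broken(applied):
    """A box misbehaves only if some buggy subsystem is also active."""
    buggy = {c.breaks for c in applied if c.breaks}
    active = {s for c in applied for s in c.enables}
    return bool(buggy & active)


def bisect(series):
    """Return the first commit whose tree fails, git-bisect style.

    Assumes the base (before series[0]) is good, series[-1] is bad,
    and badness is monotonic along the series.
    """
    good, bad = -1, len(series) - 1
    while bad - good > 1:
        mid = (good + bad) // 2
        if machine_broken(series[: mid + 1]):
            bad = mid
        else:
            good = mid
    return series[bad].name


big_bang = [
    Commit("prep-cleanups"),
    Commit("hpet-rework", breaks="hpet"),      # bug lands here, but stays dormant
    Commit("apic-rework"),
    Commit("enable-all", enables=("pit", "hpet", "apic")),
    Commit("followup-fixes"),
]

one_at_a_time = [
    Commit("prep-cleanups"),
    Commit("pit-convert", enables=("pit",)),
    Commit("hpet-convert", breaks="hpet", enables=("hpet",)),
    Commit("apic-convert", enables=("apic",)),
    Commit("followup-fixes"),
]

print(bisect(big_bang))       # enable-all
print(bisect(one_at_a_time))  # hpet-convert
```

In the big-bang series the search lands on `enable-all`, two commits after the change that actually introduced the defect; when each time source is converted and enabled in its own commit, it lands directly on `hpet-convert`.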

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:46       ` Thomas Gleixner
  2007-07-11 21:52         ` Chris Wright
@ 2007-07-11 22:18         ` Andi Kleen
  1 sibling, 0 replies; 484+ messages in thread
From: Andi Kleen @ 2007-07-11 22:18 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Arjan van de Ven, Linus Torvalds, Chris Wright

On Wed, Jul 11, 2007 at 11:46:41PM +0200, Thomas Gleixner wrote:
> Andi,
> 
> On Wed, 2007-07-11 at 23:16 +0200, Andi Kleen wrote: 
> > > What contribution do we have from you instead? A week before the .23 
> > 
> > I told him my objections privately earlier. Basically i would
> > like to see an actually debuggable step-by-step change, not a rip everything 
> > out.
> 
> You promised privately to do a thorough review as well, which I have
> been waiting for for months.

I did some reviewing, but never the big write-up and feedback. That was my
fault, sorry.

-Andi

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:04       ` Thomas Gleixner
@ 2007-07-11 22:20         ` Linus Torvalds
  2007-07-11 22:50           ` Thomas Gleixner
  2007-07-11 22:51           ` Chris Wright
  0 siblings, 2 replies; 484+ messages in thread
From: Linus Torvalds @ 2007-07-11 22:20 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright



On Thu, 12 Jul 2007, Thomas Gleixner wrote:
> 
> Can you please shed some light on how exactly you switch an
> architecture to clock events gradually?

For example, we can make sure that the code in question that actually 
touches the hardware stays exactly the same, and then just move the 
interfaces around - and basically guarantee that _zero_ hardware-specific 
issues pop up when you switch over, for example.

That way there is a gradual change-over.

The other approach (which would be nice _too_) is to actually try to 
convert one clock source at a time. Why is that not an option? 

		Linus
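One way to read the "keep the hardware code, move the interfaces" suggestion is as an adapter: the routine that programs the hardware is left untouched and the new interface merely calls it, so the register writes cannot differ. The sketch below is a hypothetical Python model, not kernel code: `PortLog`, `pit_set_periodic()` and `ClockEventDevice` are invented names and do not mirror the kernel's real `struct clock_event_device`. The port numbers and the 0x34 mode byte follow the usual i8253/i8254 channel 0 periodic setup, and `outb(value, port)` follows the Linux x86 argument order.

```python
PIT_CH0, PIT_MODE = 0x40, 0x43
PIT_TICK_RATE = 1193182  # Hz, i8253/i8254 input clock


class PortLog:
    """Records outb() calls instead of touching real I/O ports."""
    def __init__(self):
        self.writes = []

    def outb(self, value, port):
        self.writes.append((port, value & 0xFF))


def pit_set_periodic(io, hz):
    """Legacy hardware routine, left exactly as it was (hypothetical)."""
    latch = (PIT_TICK_RATE + hz // 2) // hz
    io.outb(0x34, PIT_MODE)         # channel 0, lo/hi byte, mode 2 (rate generator)
    io.outb(latch & 0xFF, PIT_CH0)  # low byte of the divisor
    io.outb(latch >> 8, PIT_CH0)    # high byte of the divisor


def old_timer_init(io, hz):
    """Old-style caller: calls the hardware routine directly."""
    pit_set_periodic(io, hz)


class ClockEventDevice:
    """New-style interface: only the call path changes, the hardware code is reused."""
    def __init__(self, name, set_periodic):
        self.name = name
        self._set_periodic = set_periodic

    def set_mode_periodic(self, io, hz):
        self._set_periodic(io, hz)


old, new = PortLog(), PortLog()
old_timer_init(old, 1000)
ClockEventDevice("pit", pit_set_periodic).set_mode_periodic(new, 1000)
assert old.writes == new.writes
print(new.writes)  # [(67, 52), (64, 169), (64, 4)]
```

Identical register writes only rule out one class of regressions: as Chris Wright notes later in the thread, the PIT also broke on some boxes when only the frequency of calls into unchanged code changed.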


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:11           ` Valdis.Kletnieks
@ 2007-07-11 22:20             ` Chris Wright
  2007-07-11 22:33             ` Linus Torvalds
  1 sibling, 0 replies; 484+ messages in thread
From: Chris Wright @ 2007-07-11 22:20 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Chris Wright, Andi Kleen, Ingo Molnar, Andrew Morton,
	linux-kernel, Thomas Gleixner, Arjan van de Ven, Linus Torvalds

* Valdis.Kletnieks@vt.edu (Valdis.Kletnieks@vt.edu) wrote:
> Andrew - how do you feel about keeping this in the -mm tree until Linus,
> Andi, Ingo, and Thomas get on the same page (which may be around the 2.6.24
> merge window, by my guesstimate)?

Well, that's supposed to be Andi's tree and aggregated by Andrew into -mm.
But keeping it in -mm isn't the hard part.  It's getting enough testing
to convince Linus it's safe, since there's no simple way to enable
clockevents in a slow manner.  IOW, keeping it in -mm just postpones the
issue.

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:11           ` Valdis.Kletnieks
  2007-07-11 22:20             ` Chris Wright
@ 2007-07-11 22:33             ` Linus Torvalds
  1 sibling, 0 replies; 484+ messages in thread
From: Linus Torvalds @ 2007-07-11 22:33 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: Chris Wright, Andi Kleen, Ingo Molnar, Andrew Morton,
	linux-kernel, Thomas Gleixner, Arjan van de Ven



On Wed, 11 Jul 2007, Valdis.Kletnieks@vt.edu wrote:
> 
> <Takes a closer look at the patches>  D'Oh! :)  Yeah, the -rc4 version I'm
> looking at is like a dozen 1-3K patches setting up and cleaning up, and then
> one monster 65K patch doing the clockevents conversion, then another 6 or 8
> small ones.
> 
> Yeah, that one big patch really doesn't look separable to me. 

I think it should be.

That big patch really does do a *lot* more than just the "clockevents 
conversion". It does all the hpet clock setup changes etc that are about 
the hardware, and have *nothing* to do with actually changing the 
interfaces.

For example, look at the hpet.c part of that patch. Totally independent 
cleanups of everything else.

Or look at the changes to __setup_APIC_LVTT(). Same thing. 

All the actual hardware interface changes are *totally* independent of the 
software interface changes, and a lot of them are just cleanups.

But those hardware interface changes are easily the things that can break, 
where some cleanup results in register writes being done in a different 
order or something, and so if there's a bug there (and it's not visible on 
most setups), now you cannot tell where the bug is.

Another example: setup_APIC_timer() used to wait for a timer interrupt 
trigger to happen on the i8259 timer (or HPET). That code just got 
removed (or maybe it got moved so subtly that I just don't see it). 

What has that got to do with switching from the old timer interface to the 
new one?

NOTHING.

So those kinds of changes that change hardware access functions should 
have been done separately. Maybe there's a machine where that early 
synchronization was necessary for some subtle timing reason. If so, 
removing it sounds like a bug, no? Wouldn't it have been nice to see that 
removal as a separate patch that was independent of the interface switch- 
over?

I'd be a *lot* happier with switching over interfaces if I thought that 
the low-level hardware drivers didn't change at the same time. But they 
*do* change, afaik.

		Linus

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:20         ` Linus Torvalds
@ 2007-07-11 22:50           ` Thomas Gleixner
  2007-07-11 23:03             ` Chris Wright
                               ` (2 more replies)
  2007-07-11 22:51           ` Chris Wright
  1 sibling, 3 replies; 484+ messages in thread
From: Thomas Gleixner @ 2007-07-11 22:50 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Linus,

On Wed, 2007-07-11 at 15:20 -0700, Linus Torvalds wrote:
> For example, we can make sure that the code in question that actually 
> touches the hardware stays exactly the same, and then just move the 
> interfaces around - and basically guarantee that _zero_ hardware-specific 
> issues pop up when you switch over, for example.
> 
> That way there is a gradual change-over.

Ok, I can try to split this down further.

> The other approach (which would be nice _too_) is to actually try to 
> convert one clock source at a time. Why is that not an option? 

We need to give control to the clock events core code once we convert
one clock event device. Having two competing subsystems controlling
different devices (e.g. PIT and APIC) is not really desirable.

The HPET change, which is the larger part of the conversion set simply
because we now share the code with i386, might be split out by disabling
HPET in the first step, doing the PIT / APIC conversion and then the
HPET one in a separate step.

Thanks,

	tglx



* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:20         ` Linus Torvalds
  2007-07-11 22:50           ` Thomas Gleixner
@ 2007-07-11 22:51           ` Chris Wright
  2007-07-11 22:58             ` Linus Torvalds
  1 sibling, 1 reply; 484+ messages in thread
From: Chris Wright @ 2007-07-11 22:51 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

* Linus Torvalds (torvalds@linux-foundation.org) wrote:
> For example, we can make sure that the code in question that actually 
> touches the hardware stays exactly the same, and then just move the 
> interfaces around - and basically guarantee that _zero_ hardware-specific 
> issues pop up when you switch over, for example.

That's not quite right.  Leaving the code unchanged caused breakage
already.  The PIT is damn stupid and can be sensitive to how quickly it's
programmed.  So the code that enables/disables it didn't change, but the
frequency with which it is called did, and that broke some random boxes.

> The other approach (which would be nice _too_) is to actually try to 
> convert one clock source at a time. Why is that not an option? 

It was that way for x86_64; that's the first thing I fixed (since it was
done by fully disabling all timers other than the one converted ;-)

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:51           ` Chris Wright
@ 2007-07-11 22:58             ` Linus Torvalds
  2007-07-12  2:53               ` Arjan van de Ven
  0 siblings, 1 reply; 484+ messages in thread
From: Linus Torvalds @ 2007-07-11 22:58 UTC (permalink / raw)
  To: Chris Wright
  Cc: Thomas Gleixner, Ingo Molnar, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven



On Wed, 11 Jul 2007, Chris Wright wrote:
> 
> That's not quite right.  Leaving the code unchanged caused breakage
> already.  The PIT is damn stupid and can be sensitive to how quickly it's
> programmed.  So the code that enables/disables it didn't change, but the
> frequency with which it is called did, and that broke some random boxes.

Sure. We cannot avoid *all* problems. Bugs happen. 

But at least we could try to make sure that there aren't totally 
unnecessary changes in that switch-over patch. Which there definitely 
were, as far as I can tell.

		Linus

* generic clockevents/ (hr)time(r) patches was Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (22 preceding siblings ...)
  2007-07-11 12:43 ` x86 status was " Andi Kleen
@ 2007-07-11 23:03 ` Thomas Gleixner
  2007-07-11 23:57   ` Andrew Morton
  2007-07-11 23:59   ` Andi Kleen
  2007-07-12  0:54 ` fault vs invalidate race (Re: -mm merge plans for 2.6.23) Nick Piggin
                   ` (2 subsequent siblings)
  26 siblings, 2 replies; 484+ messages in thread
From: Thomas Gleixner @ 2007-07-11 23:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andi Kleen

Andrew, Linus, 

On Tue, 2007-07-10 at 01:31 -0700, Andrew Morton wrote:
> When replying, please rewrite the subject suitably and try to Cc: the
> appropriate developer(s).

> i386-hpet-check-if-the-counter-works.patch
> nohz-fix-nohz-x86-dyntick-idle-handling.patch
> acpi-move-timer-broadcast-and-pmtimer-access-before-c3-arbiter-shutdown.patch
> clockevents-fix-typo-in-acpi_pmc.patch
> timekeeping-fixup-shadow-variable-argument.patch
> timerc-cleanup-recently-introduced-whitespace-damage.patch
> clockevents-remove-prototypes-of-removed-functions.patch
> clockevents-fix-resume-logic.patch
> clockevents-fix-device-replacement.patch
> tick-management-spread-timer-interrupt.patch
> highres-improve-debug-output.patch
> hrtimer-speedup-hrtimer_enqueue.patch
> pcspkr-use-the-global-pit-lock.patch
> ntp-move-the-cmos-update-code-into-ntpc.patch
> i386-pit-stop-only-when-in-periodic-or-oneshot-mode.patch
> i386-remove-volatile-in-apicc.patch
> i386-hpet-assumes-boot-cpu-is-0.patch
> i386-move-pit-function-declarations-and-constants-to-correct-header-file.patch

These got sent to Andi as well, but the patches are independent of the
x86_64 conversion.

These are bugfixes (nohz-fix-nohz-x86-dyntick-idle-handling.patch) and
general improvements of the core code and the existing i386 code.

Can we please merge the above now?

I can resend them or setup a git repo if you want.

Andi, any objections against the above i386 fixlets ?

Thanks,

	tglx



* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:50           ` Thomas Gleixner
@ 2007-07-11 23:03             ` Chris Wright
  2007-07-11 23:07             ` Linus Torvalds
  2007-07-12 20:38             ` Matt Mackall
  2 siblings, 0 replies; 484+ messages in thread
From: Chris Wright @ 2007-07-11 23:03 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

* Thomas Gleixner (tglx@linutronix.de) wrote:
> The HPET change, which is the larger part of the conversion set simply
> because we now share the code with i386, might be split out by disabling
> HPET in the first step, doing the PIT / APIC conversion and then the
> HPET one in a separate step.

The timer specific changes (i.e. the merges between arches) can be
done more slowly, but the setup above is basically where I started,
and it was already broken on one of my test boxes.  Anyway, I'll help
you however I can, because it's important to me to get this merged.

thanks,
-chris

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:50           ` Thomas Gleixner
  2007-07-11 23:03             ` Chris Wright
@ 2007-07-11 23:07             ` Linus Torvalds
  2007-07-11 23:29               ` Thomas Gleixner
  2007-07-11 23:36               ` Andi Kleen
  2007-07-12 20:38             ` Matt Mackall
  2 siblings, 2 replies; 484+ messages in thread
From: Linus Torvalds @ 2007-07-11 23:07 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright



On Thu, 12 Jul 2007, Thomas Gleixner wrote:
>
> The HPET change, which is the larger part of the conversion set simply
> because we now share the code with i386, might be split out by disabling
> HPET in the first step, doing the PIT / APIC conversion and then the
> HPET one in a separate step.

But that misses the point. It means that the commit that actually 
*changes* the code never actually gets tested on its own.

Why not just fix up the HPET code so that it can be shared *first*,
without the other conversion? Really, what's so wrong with the hpet.c
changes in the *absence* of conversion to clockevents? Those changes seem
to be totally independent - just abstracting out the
"hpet_get_virt_address()" stuff etc.

None of that has anything to do with clockevents, as far as I can see.

In other words, you now change an i386-only file, and maybe it breaks 
subtly on i386 as a result. Wouldn't it be nicer to see that breakage as a 
separate event?

Then, the x86-64 clockevents code will switch over entirely, but now it 
switches over to something we can say has gotten testing, and we know the 
switch-over won't break any 32-bit code, because the switch-over literally 
didn't change anything at all for that case.

See? THAT is what I mean by "gradual". Bugs happen, but if we can make 
_independent_ bugs show up in _independent_ commits, that will make it 
much easier to figure out what happened.

The same is true of a lot of the APIC timer code. Sure, that patch has the 
actual conversion in it, and you don't have the cross-architecture issues, 
but more than 50% of the patch seems to be just cleanup that is 
independent of the actual switch-over, no?

Again, if it was done as a "one patch for cleanup, and another patch that 
actually switches the higher-level interfaces around", then the two mostly 
independent issues (of "hardware access/initialization" vs "higher-level 
changes in how it got called") get done as two independent commits.

And no, I really probably wouldn't ask for this, but 2.6.21 showed 
*exactly* this problem. Trivial debugging helps like "git bisect" didn't 
help at all, because all the problems started when the new code was 
"activated", not when it was actually brought in.

		Linus

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:42     ` x86 status was Re: -mm merge plans for 2.6.23 Linus Torvalds
  2007-07-11 22:04       ` Thomas Gleixner
@ 2007-07-11 23:19       ` Ingo Molnar
  2007-07-11 23:45         ` Linus Torvalds
  1 sibling, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-11 23:19 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Thomas Gleixner,
	Arjan van de Ven, Chris Wright


* Linus Torvalds <torvalds@linux-foundation.org> wrote:

> That was *exactly* the same thing you talked about when I refused to 
> take the original timer changes into 2.6.20. You were talking about 
> how lots of people had worked really hard, and how it was really 
> tested.

yes - i was (way too!) upset about it, and your reasoning for the 
rejection was hard (on us) but fair: you wanted a quiet 2.6.20, and you 
felt fundamentally uneasy about the patches.

> And it damn well was NOT really tested, and 2.6.21 ended up being a 
> horribly painful experience (one of the more painful kernel releases 
> in recent times), and we ended up having to fix a *lot* of stuff.

yes. We had 12 -hrt/dynticks merge related regressions between 
2.6.21-rc1 and -final, and 4 after final. Here's a quick post-mortem:

12 fixes after -rc1:

    [PATCH] i386: Fix bogus return value in hpet_next_event()
    [PATCH] clockevents: remove bad designed sysfs support for now
    [PATCH] clocksource: Fix thinko in watchdog selection
    [PATCH] dynticks: fix hrtimer rounding error in next_timer_interrupt
    [PATCH] i386: add command line option "local_apic_timer_c2_ok"
    [PATCH] i386: disable local apic timer via command line or dmi quirk
    [PATCH] i386: clockevents fix breakage on Geode/Cyrix PIT     
    [PATCH] i386: trust the PM-Timer calibration of the local APIC timer
    [PATCH] clockevents: Fix suspend/resume to disk hangs
    [PATCH] highres: do not run the TIMER_SOFTIRQ after switching to highres mode
    [PATCH] hrtimer: prevent overrun DoS in hrtimer_forward()
    [PATCH] Save/restore periodic tick information over suspend/resume implementations

4 fixes after -final:

 2.6.21.1: -
 2.6.21.2:
    [PATCH] clocksource: fix resume logic
 2.6.21.3: -
 2.6.21.4: -
 2.6.21.5:
    [PATCH] NOHZ: Rate limit the local softirq pending warning output
    [PATCH] Ignore bogus ACPI info for offline CPUs
    [PATCH] i386: HPET, check if the counter works
 2.6.21.6: -

it's all pretty quiet today on the dynticks regressions front. (there 
are no open regressions in either the upstream i386 code or in the devel 
patches we are aware of. Forced-HPET in -mm, which is not part of this 
queue in question [but which is done for dynticks], has one open 
regression.)

The majority of the above bugs were in the infrastructure code. (the 
worst was the generic resume/suspend one fixed in 2.6.21.2) And sadly, a 
fair number of the infrastructure bugs were introduced during the frantic
clockevents/dynticks rewrites/redesigns we did between .20 and .21. That 
was a royally stupid mistake for us to make - instead of patiently waiting 
for the bugs to be shaken out we destabilized the infrastructure. (it 
was an instinctive "let's make this thing so nice that it's impossible
to reject" gut reaction.)

In the 'weird arch bugs' category, out of the 6 i386 breakages listed 
above, 'i386 legacy systems' was/is by far the worst offender: 4-5 were 
on such old (not 64-bit-capable) systems. (this is not really a 
surprise) While x86_64 certainly has weird crap hardware too, there is
probably an order of magnitude less of it than on i386 - just due to the
sheer volume, time and diversity difference. (On the other hand if 
there's crap then it will be debugged/tested slower than on 32-bit, 
which offsets that advantage.)

The most prominent bugs were the ones that were in the infrastructure - 
they affected many machines. (But i'd expect the infrastructure to be 
pretty robust by now.)

The x86_64 hrt/dynticks code makes the x86_64 PIT driver (and hpet too) 
shared between the two architectures - which is perhaps another 
difference to the original i386 clockevents merge.

We also integrated _all_ feedback we got, and we had the capacity and 
capability to fix whatever other feedback comes back - it just never 
came ... until today.

But i fully agree with you that the cleanups should be done separately - 
it's just so hard to actually hack on the old hpet code (and to 
understand it to begin with) without first cleaning it up a bit so that 
it does not cause permanent brain damage ;)

	Ingo

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 21:39       ` Thomas Gleixner
@ 2007-07-11 23:21         ` Randy Dunlap
  0 siblings, 0 replies; 484+ messages in thread
From: Randy Dunlap @ 2007-07-11 23:21 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Ingo Molnar, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Linus Torvalds, Chris Wright

On Wed, 11 Jul 2007 23:39:19 +0200 Thomas Gleixner wrote:

> Randy,
> 
> On Wed, 2007-07-11 at 14:02 -0700, Randy Dunlap wrote: 
> > I certainly haven't.  I can barely keep up with reading about 1/2
> > of lkml emails.  And in my non-scientific method, I think that we
> > are suffering from both (a) more patch submittals and (b) fewer
> > qualified reviewers (per kernel KLOC) than we had 3-5 years ago.
> > 
> > I don't see how you can expect Andrew to review these or any other
> > specific patchset.  Do you have some suggestions on how to clone
> > Andrew?
> 
> Ingo was talking to Andi, the x86_64 maintainer, not to Andrew.

Yep, I see that when I re-read it.  I apologize.

---
~Randy
*** Remember to use Documentation/SubmitChecklist when testing your code ***

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 23:07             ` Linus Torvalds
@ 2007-07-11 23:29               ` Thomas Gleixner
  2007-07-11 23:36               ` Andi Kleen
  1 sibling, 0 replies; 484+ messages in thread
From: Thomas Gleixner @ 2007-07-11 23:29 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Linus,

On Wed, 2007-07-11 at 16:07 -0700, Linus Torvalds wrote:
> Why not just fix up the HPET code so that it can be shared *first*,
> without the other conversion? Really, what's so wrong with the hpet.c
> changes in the *absence* of conversion to clockevents? Those changes seem
> to be totally independent - just abstracting out the
> "hpet_get_virt_address()" stuff etc.
> 
> None of that has anything to do with clockevents, as far as I can see.
>
> In other words, you now change an i386-only file, and maybe it breaks 
> subtly on i386 as a result. Wouldn't it be nicer to see that breakage as a 
> separate event?

Sure, I meant to do the HPET changes to i386 separately, as a preparatory
patch.

Sharing HPET before the conversion is nasty at best (it involves a ton
of ifdeffery at least).

> Then, the x86-64 clockevents code will switch over entirely, but now it 
> switches over to something we can say has gotten testing, and we know the 
> switch-over won't break any 32-bit code, because the switch-over literally 
> didn't change anything at all for that case.

Well, we know that it works on i386, but once we turn on the x64 switch
we will not yet have tested the shared code on x64.

I am trying to find a practicable compromise between the big bang patch
and the theoretical gradual optimum.

> The same is true of a lot of the APIC timer code. Sure, that patch has the 
> actual conversion in it, and you don't have the cross-architecture issues, 
> but more than 50% of the patch seems to be just cleanup that is 
> independent of the actual switch-over, no?

I said before that I'm going to split them further.

	tglx



* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 23:07             ` Linus Torvalds
  2007-07-11 23:29               ` Thomas Gleixner
@ 2007-07-11 23:36               ` Andi Kleen
  2007-07-11 23:48                 ` Thomas Gleixner
  2007-07-11 23:58                 ` Ingo Molnar
  1 sibling, 2 replies; 484+ messages in thread
From: Andi Kleen @ 2007-07-11 23:36 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Thomas Gleixner, Ingo Molnar, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

> The same is true of a lot of the APIC timer code. Sure, that patch has the 
> actual conversion in it, and you don't have the cross-architecture issues, 
> but more than 50% of the patch seems to be just cleanup that is 
> independent of the actual switch-over, no?

I don't think it's that much cleanup. One of my goals for x86-64 was always
to have it support modern x86 only; in particular, this means having most of
the old bug workarounds removed. With the APIC timer merging a lot of that crap
gets back in.

I would prefer to keep APIC code separate.

-Andi


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 23:19       ` Ingo Molnar
@ 2007-07-11 23:45         ` Linus Torvalds
  0 siblings, 0 replies; 484+ messages in thread
From: Linus Torvalds @ 2007-07-11 23:45 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Andrew Morton, linux-kernel, Thomas Gleixner,
	Arjan van de Ven, Chris Wright



On Thu, 12 Jul 2007, Ingo Molnar wrote:
> 
> We also integrated _all_ feedback we got, and we had the capacity and 
> capability to fix whatever other feedback comes back - it just never 
> came ... until today.

One thing I'll happily talk about is that while 2.6.21 was painful, you 
and Thomas in particular were both very responsible about the thing, so 
no, I'm not at all complaining or worried about it in that sense!

I just really _really_ wish we could have two fairly stable releases in a 
row. I think 2.6.22 has the potential to be a pretty good setup, and I'd 
really like to avoid having another 2.6.21 immediately afterwards.

So I'm not worried about integration and getting fixes when things break 
per se, but I *am* worried that this is an area where we've traditionally 
had lots of unexpected problems.

And hey, maybe this time there will be none. I just still smart from the 
last time, so I'd prefer it to go more smoothly this time around.

		Linus


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 23:36               ` Andi Kleen
@ 2007-07-11 23:48                 ` Thomas Gleixner
  2007-07-11 23:58                 ` Ingo Molnar
  1 sibling, 0 replies; 484+ messages in thread
From: Thomas Gleixner @ 2007-07-11 23:48 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Andi,

On Thu, 2007-07-12 at 01:36 +0200, Andi Kleen wrote:
> > The same is true of a lot of the APIC timer code. Sure, that patch has the 
> > actual conversion in it, and you don't have the cross-architecture issues, 
> > but more than 50% of the patch seems to be just cleanup that is 
> > independent of the actual switch-over, no?
> 
> I don't think it's that much cleanup. One of my goals for x86-64 was always
> to have it support modern x86 only; in particular, this means having most of
> the old bug workarounds removed. With the APIC timer merging a lot of that crap
> gets back in.
> 
> I would prefer to keep APIC code separate.

Care to look at the patch? It _IS_ separate.

Only HPET and PIT got shared.

	tglx




* Re: generic clockevents/ (hr)time(r) patches was Re: -mm merge plans for 2.6.23
  2007-07-11 23:03 ` generic clockevents/ (hr)time(r) patches " Thomas Gleixner
@ 2007-07-11 23:57   ` Andrew Morton
  2007-07-12  0:04     ` Thomas Gleixner
  2007-07-11 23:59   ` Andi Kleen
  1 sibling, 1 reply; 484+ messages in thread
From: Andrew Morton @ 2007-07-11 23:57 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andi Kleen

On Thu, 12 Jul 2007 01:03:28 +0200
Thomas Gleixner <tglx@linutronix.de> wrote:

> Andrew, Linus, 
> 
> On Tue, 2007-07-10 at 01:31 -0700, Andrew Morton wrote:
> > When replying, please rewrite the subject suitably and try to Cc: the
> > appropriate developer(s).
> 
> > i386-hpet-check-if-the-counter-works.patch
> > nohz-fix-nohz-x86-dyntick-idle-handling.patch
> > acpi-move-timer-broadcast-and-pmtimer-access-before-c3-arbiter-shutdown.patch
> > clockevents-fix-typo-in-acpi_pmc.patch
> > timekeeping-fixup-shadow-variable-argument.patch
> > timerc-cleanup-recently-introduced-whitespace-damage.patch
> > clockevents-remove-prototypes-of-removed-functions.patch
> > clockevents-fix-resume-logic.patch
> > clockevents-fix-device-replacement.patch
> > tick-management-spread-timer-interrupt.patch
> > highres-improve-debug-output.patch
> > hrtimer-speedup-hrtimer_enqueue.patch
> > pcspkr-use-the-global-pit-lock.patch
> > ntp-move-the-cmos-update-code-into-ntpc.patch
> > i386-pit-stop-only-when-in-periodic-or-oneshot-mode.patch
> > i386-remove-volatile-in-apicc.patch
> > i386-hpet-assumes-boot-cpu-is-0.patch
> > i386-move-pit-function-declarations-and-constants-to-correct-header-file.patch
> 
> These got sent to Andi as well, but the patches are independent of the
> x86_64 conversion.
> 
> These are bugfixes (nohz-fix-nohz-x86-dyntick-idle-handling.patch) and
> general improvements of the core code and the existing i386 code.
> 
> Can we please merge the above now ?
> 
> I can resend them or setup a git repo if you want.
> 
> Andi, any objections against the above i386 fixlets?
> 

They all look pretty innocuous to me.

Could you please take a second look, decide if any of them should also be
in 2.6.22.x and let me know?


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 23:36               ` Andi Kleen
  2007-07-11 23:48                 ` Thomas Gleixner
@ 2007-07-11 23:58                 ` Ingo Molnar
  2007-07-12  0:07                   ` Andi Kleen
  1 sibling, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-11 23:58 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Thomas Gleixner, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Andi Kleen <andi@firstfloor.org> wrote:

> > The same is true of a lot of the APIC timer code. Sure, that patch 
> > has the actual conversion in it, and you don't have the 
> > cross-architecture issues, but more than 50% of the patch seems to 
> > be just cleanup that is independent of the actual switch-over, no?
> 
> I don't think it's that much cleanup. One of my goals for x86-64 was 
> always to have it support modern x86 only; in particular, this means 
> having most of the old bug workarounds removed. With the APIC timer 
> merging a lot of that crap gets back in.

I don't think "clean, modern x86 code" will ever happen - x86_64 has and 
is going to have the exact same type of crap. And I'll say a weird thing 
now: that is a _blessing_. Why? Because this crap in question originates 
from the _diversity_ of the platform, and that is a much larger asset 
than the cost of the quirks can ever be!

What you suggest does not end up in "clean 64-bit code", it ends up in 
"a bit less crappy 64-bit code", plus a lot of unnecessary duplication 
of effort and duplication of code - which easily introduces more crap 
total than it gets rid of ...

The x86 architecture isn't fully analogous to a random piece of device 
hardware that evolves. It is more of a collector of random pieces of 
hardware that evolve independently, and as such it will always be 
exposed to human messups in a factorized way. "The pristine, clean 
architecture" is a utopia, and it will never come as long as humans 
design hardware.

Under your scheme what we'll end up with is two sets of code which share 
some of the workarounds and don't share others. No, in fact we _already_ 
ended up with two sets of code that are crappy in different ways. We had 
countless cases of bugs fixed in i386 but not in x86_64 (and vice 
versa). Sharing code for similar hardware is almost always good.

I think the PowerPC experience (although it is not a fully equivalent 
case) about them merging their 32-bit and 64-bit architectures was an 
overwhelmingly positive move, and x86 could learn a thing or two from 
that.

The only way to fight crappy hardware is to map it, to understand it and 
to design as cleanly in the presence of it as possible. Having two sets 
of code for the same thing hardly serves that purpose. In fact, having 
_more_ crappy hardware _forces_ us to do a cleaner design (up to a pain 
threshold).

	Ingo


* Re: generic clockevents/ (hr)time(r) patches was Re: -mm merge plans for 2.6.23
  2007-07-11 23:03 ` generic clockevents/ (hr)time(r) patches " Thomas Gleixner
  2007-07-11 23:57   ` Andrew Morton
@ 2007-07-11 23:59   ` Andi Kleen
  2007-07-12  0:33     ` Andrew Morton
  1 sibling, 1 reply; 484+ messages in thread
From: Andi Kleen @ 2007-07-11 23:59 UTC (permalink / raw)
  To: Thomas Gleixner; +Cc: Andrew Morton, linux-kernel, Linus Torvalds, Ingo Molnar


> Andi, any objections against the above i386 fixlets?

No, they are fine for me.

-Andi


* Re: generic clockevents/ (hr)time(r) patches was Re: -mm merge plans for 2.6.23
  2007-07-11 23:57   ` Andrew Morton
@ 2007-07-12  0:04     ` Thomas Gleixner
  2007-07-12  0:17       ` [stable] " Chris Wright
  2007-07-12  0:43       ` Andi Kleen
  0 siblings, 2 replies; 484+ messages in thread
From: Thomas Gleixner @ 2007-07-12  0:04 UTC (permalink / raw)
  To: Andrew Morton
  Cc: linux-kernel, Linus Torvalds, Ingo Molnar, Andi Kleen, Stable Team

Andrew,

On Wed, 2007-07-11 at 16:57 -0700, Andrew Morton wrote:
> They all look pretty innocuous to me.
> 
> Could you please take a second look, decide if any of them should also be
> in 2.6.22.x and let me know?

i386-hpet-check-if-the-counter-works.patch
pcspkr-use-the-global-pit-lock.patch

are the only candidates.

	tglx




* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 23:58                 ` Ingo Molnar
@ 2007-07-12  0:07                   ` Andi Kleen
  2007-07-12  0:15                     ` Chris Wright
  2007-07-12  0:18                     ` Ingo Molnar
  0 siblings, 2 replies; 484+ messages in thread
From: Andi Kleen @ 2007-07-12  0:07 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Linus Torvalds, Thomas Gleixner, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

> I don't think "clean, modern x86 code" will ever happen - x86_64 has and 
> is going to have the exact same type of crap. And I'll say a weird thing 

Yes, but it will be new crap, with no old crap anymore.

If you always pile the new crap on the old crap at some point
the whole thing might fall over. 64bit was intended as a fresh start.

Admittedly we're getting more and more workarounds too, and sometimes
when I want to remove cruft I find out it is still needed on some
64bit boxes (e.g. see my repeated attempts to clean up the irq 0
routing), but it's still much better than i386.

> I think the PowerPC experience (although it is not a fully equivalent 
> case) about them merging their 32-bit and 64-bit architectures was an 
> overwhelmingly positive move, and x86 could learn a thing or two from 
> that.

The equivalent to the powerpc way would be essentially to port i386
into the x86-64 code base and leave the really old hardware only
in arch/i386. I've considered doing it, but it would be an awful
lot of work, and tempting distributions to actually use the new
port would require going back quite a long time. And at least
initially it would end up with three ways to do things instead
of two like currently.

-Andi



* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-12  0:07                   ` Andi Kleen
@ 2007-07-12  0:15                     ` Chris Wright
  2007-07-12  0:18                     ` Ingo Molnar
  1 sibling, 0 replies; 484+ messages in thread
From: Chris Wright @ 2007-07-12  0:15 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Ingo Molnar, Linus Torvalds, Thomas Gleixner, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

* Andi Kleen (andi@firstfloor.org) wrote:
> The equivalent to the powerpc way would be essentially to port i386
> into the x86-64 code base and leave the really old hardware only
> in arch/i386. I've considered doing it, but it would be an awful
> lot of work, and tempting distributions to actually use the new
> port would require going back quite a long time. And at least
> initially it would end up with three ways to do things instead
> of two like currently.

Well that's just silly.  The right way will never create 3 ways, but
always keep the limit to the existing 2 where the differences aren't
worth reconciling, and 1 for anything that is common.

It will be a fair amount of work, so any constructive input you have
upfront would be helpful.

thanks,
-chris


* Re: [stable] generic clockevents/ (hr)time(r) patches was Re: -mm merge plans for 2.6.23
  2007-07-12  0:04     ` Thomas Gleixner
@ 2007-07-12  0:17       ` Chris Wright
  2007-07-12  0:43       ` Andi Kleen
  1 sibling, 0 replies; 484+ messages in thread
From: Chris Wright @ 2007-07-12  0:17 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, Stable Team, Ingo Molnar, Linus Torvalds,
	linux-kernel, Andi Kleen

* Thomas Gleixner (tglx@linutronix.de) wrote:
> Andrew,
> 
> On Wed, 2007-07-11 at 16:57 -0700, Andrew Morton wrote:
> > They all look pretty innocuous to me.
> > 
> > Could you please take a second look, decide if any of them should also be
> > in 2.6.22.x and let me know?
> 
> i386-hpet-check-if-the-counter-works.patch
> pcspkr-use-the-global-pit-lock.patch
> 
> are the only candidates.

yup, they've come through -stable a few times; it would be great to get
them upstream and into .22.y

thanks,
-chris


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-12  0:07                   ` Andi Kleen
  2007-07-12  0:15                     ` Chris Wright
@ 2007-07-12  0:18                     ` Ingo Molnar
  2007-07-12  0:37                       ` Andi Kleen
  1 sibling, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-12  0:18 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Linus Torvalds, Thomas Gleixner, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Andi Kleen <andi@firstfloor.org> wrote:

> > I don't think "clean, modern x86 code" will ever happen - x86_64 has 
> > and is going to have the exact same type of crap. And I'll say a 
> > weird thing
> 
> Yes, but it will be new crap, with no old crap anymore.
> 
> If you always pile the new crap on the old crap at some point the 
> whole thing might fall over. 64bit was intended as a fresh start.

I think there's no such thing as a fresh start for a diverse 
architecture - the ia64 failure has proven that. x86_64 CPUs still do 
A20 emulation today (!). We still have people running industrial boards 
on real i386 DX CPUs, with the latest upstream kernel. 15 years ago an 
i386 DX was already quite obsolete. 32-bit is not going to go away in 
our lifetime, and we'll want to support it in a first-grade way. We 
had better realize that prospect and have it right before our eyes in a 
single tree wherever it makes sense to share code - I'm certainly not 
talking about sharing mtrr/centaur.c or k8.c (and I'm not necessarily 
suggesting sharing io_apic.c either - although it's certainly 
borderline).

	Ingo


* Re: generic clockevents/ (hr)time(r) patches was Re: -mm merge plans for 2.6.23
  2007-07-11 23:59   ` Andi Kleen
@ 2007-07-12  0:33     ` Andrew Morton
  0 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-12  0:33 UTC (permalink / raw)
  To: Andi Kleen; +Cc: Thomas Gleixner, linux-kernel, Linus Torvalds, Ingo Molnar

On Thu, 12 Jul 2007 01:59:23 +0200
Andi Kleen <ak@suse.de> wrote:

> 
> > Andi, any objections against the above i386 fixlets?
> 
> No, they are fine for me.
> 

OK, I queued them up for an akpm->linus transfer.  Which will of course be
abandoned if an akpm->andi or andi->linus merge happens in the next week or
so.


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-12  0:18                     ` Ingo Molnar
@ 2007-07-12  0:37                       ` Andi Kleen
  0 siblings, 0 replies; 484+ messages in thread
From: Andi Kleen @ 2007-07-12  0:37 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Andi Kleen, Linus Torvalds, Thomas Gleixner, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

On Thu, Jul 12, 2007 at 02:18:07AM +0200, Ingo Molnar wrote:
> > > I don't think "clean, modern x86 code" will ever happen - x86_64 has 
> > > and is going to have the exact same type of crap. And I'll say a 
> > > weird thing
> > 
> > Yes, but it will be new crap, with no old crap anymore.
> > 
> > If you always pile the new crap on the old crap at some point the 
> > whole thing might fall over. 64bit was intended as a fresh start.
> 
> I think there's no such thing as a fresh start for a diverse 
> architecture - the ia64 failure has proven that. x86_64 CPUs still do 
> A20 emulation today (!). 

x86-64 doesn't care about a lot of x86 baggage, and a lot of
things have even been obsoleted in the platform.

In practice the backwards compatibility on x86 isn't that
great either.  For example a significant number of new systems don't 
even work correctly in PIC mode anymore.

> We still have people running industrial boards 
> on real i386 DX CPUs, with the latest upstream kernel. 15 years ago an 

Yes, but those for example would be perfectly happy with an arch/i386
with all APIC and SMP code stripped out.

Only the few people who still run dual P5s might not, but
those could continue using old kernels.

But eventually I think that would be the right, clean way:

- arch/i386: a stripped-down port for truly old systems, like the
  embedded 386 up to 586 or early 686. No SMP or APIC.
- arch/x86: supporting 32bit and 64bit for reasonably modern systems.
- NUMAQ/Voyager/P5-SMP/visual workstation gone [frankly the user
  base of those is too small to justify the code impact]

It's just quite ugly to get there, and when you think it through, the
actual advantages of such a setup are likely not enough to justify
the significant work to make it work.

Also I wouldn't have any idea how to regression test significant
changes to arch/i386 aimed at old systems. E.g. I don't think
the powerpc people actually tried to keep supporting really
old systems that are hard to regression test anymore, only
the really supported platforms.

So while such a setup would be quite nice the practical
problems of getting there are nasty. Also I must admit I prefer
hacking on new code instead.

-Andi


* Re: generic clockevents/ (hr)time(r) patches was Re: -mm merge plans for 2.6.23
  2007-07-12  0:04     ` Thomas Gleixner
  2007-07-12  0:17       ` [stable] " Chris Wright
@ 2007-07-12  0:43       ` Andi Kleen
  2007-07-12  0:46         ` [stable] " Chris Wright
  1 sibling, 1 reply; 484+ messages in thread
From: Andi Kleen @ 2007-07-12  0:43 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Andrew Morton, linux-kernel, Linus Torvalds, Ingo Molnar, Stable Team

On Thursday 12 July 2007 02:04, Thomas Gleixner wrote:
> Andrew,
>
> On Wed, 2007-07-11 at 16:57 -0700, Andrew Morton wrote:
> > They all look pretty innocuous to me.
> >
> > Could you please take a second look, decide if any of them should also be
> > in 2.6.22.x and let me know?
>
> i386-hpet-check-if-the-counter-works.patch
> pcspkr-use-the-global-pit-lock.patch

Ok by me, although I suspect a lot of the cases where the hpet one
was needed got resolved with the PCI HPET resource fix. But it's still
safer to check.

However I don't think patches should go into stable before they 
hit Linus' tree.

-Andi


* Re: [stable] generic clockevents/ (hr)time(r) patches was Re: -mm merge plans for 2.6.23
  2007-07-12  0:43       ` Andi Kleen
@ 2007-07-12  0:46         ` Chris Wright
  0 siblings, 0 replies; 484+ messages in thread
From: Chris Wright @ 2007-07-12  0:46 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Thomas Gleixner, Andrew Morton, Ingo Molnar, Linus Torvalds,
	linux-kernel, Stable Team

* Andi Kleen (ak@suse.de) wrote:
> Ok by me, although I suspect a lot of the cases where the hpet one
> was needed got resolved with the PCI HPET resource fix. But it's still
> safer to check.
> 
> However I don't think patches should go into stable before they 
> hit Linus' tree.

Agreed, we're just waiting ;-)

thanks,
-chris


* fault vs invalidate race (Re: -mm merge plans for 2.6.23)
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (23 preceding siblings ...)
  2007-07-11 23:03 ` generic clockevents/ (hr)time(r) patches " Thomas Gleixner
@ 2007-07-12  0:54 ` Nick Piggin
  2007-07-12  2:31   ` block_page_mkwrite? (Re: fault vs invalidate race (Re: -mm merge plans for 2.6.23)) David Chinner
  2007-07-13  9:46 ` -mm merge plans for 2.6.23 Jan Engelhardt
  2007-07-17  8:55 ` unprivileged mounts (was: Re: -mm merge plans for 2.6.23) Andrew Morton
  26 siblings, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-12  0:54 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel, Linux Memory Management

Andrew Morton wrote:

> mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch
> mm-merge-populate-and-nopage-into-fault-fixes-nonlinear.patch
> mm-merge-nopfn-into-fault.patch
> convert-hugetlbfs-to-use-vm_ops-fault.patch
> mm-remove-legacy-cruft.patch
> mm-debug-check-for-the-fault-vs-invalidate-race.patch
> mm-fix-clear_page_dirty_for_io-vs-fault-race.patch
> invalidate_mapping_pages-add-cond_resched.patch
> ocfs2-release-page-lock-before-calling-page_mkwrite.patch
> document-page_mkwrite-locking.patch
> 
>  The fault-vs-invalidate race fix.  I have belatedly learned that these need
>  more work, so their state is uncertain.

The extra work may turn out to be too much for you (although it is nothing
exactly tricky that would introduce subtle bugs, it is a fair amount of churn).

However, in that case we can still merge these two:

mm-fix-fault-vs-invalidate-race-for-linear-mappings.patch
mm-fix-clear_page_dirty_for_io-vs-fault-race.patch

Which fix real bugs that need fixing (and will at least help to get some of
my patches off your hands).

-- 
SUSE Labs, Novell Inc.


* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-11 12:23 ` lguest, " Christoph Hellwig
  2007-07-11 15:45   ` Randy Dunlap
  2007-07-11 18:04   ` Andrew Morton
@ 2007-07-12  1:21   ` Rusty Russell
  2007-07-12  2:28     ` David Miller
  2 siblings, 1 reply; 484+ messages in thread
From: Rusty Russell @ 2007-07-12  1:21 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrew Morton, linux-kernel, linux-mm

On Wed, 2007-07-11 at 14:23 +0200, Christoph Hellwig wrote:
> > lguest-export-symbols-for-lguest-as-a-module.patch
> 
> __put_task_struct is one of those no way in hell should this be exported
> things because we don't want modules messing with task lifetimes.
> 
> Fortunately I can't find anything actually using this in lguest, so
> it looks like the issue has been solved in the meantime.

To do inter-guest (ie. inter-process) I/O you really have to make sure
the other side doesn't go away.

> I also have a rather bad feeling about exporting access_process_vm.
> This is the proverbial sledge hammer for access to user vm addresses
> and I'd rather keep it away from module programmers with "if all
> you have is a hammer ..." in mind.
> 
> In lguest this is used by send_dma which from my short reading of the
> code seems to be the central IPC mechanism.  The double copy here
> doesn't look very efficient to me either.  Maybe some VM folks could
> look into a better way to achieve this that might be both more
> efficient and not require the export.

It's not a double copy: it's a map & copy.

If KVM develops inter-guest I/O then this could all be extracted into a
helper function and made more efficient.

> Just started to reading this (again) so no useful comment here, but it
> would be nice if the code could follow CodingStyle and place the || and
> && at the end of the line in multiline conditionals instead of at the
> beginning of the new one.

Surprisingly, you have a point here.  Since the key purpose of lguest is
as demonstration code, it should meticulously match kernel style.

I shall immediately prepare a patch to convert the rest of the kernel to
the correct "&& at beginning of line" style.

Rusty.
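Christoph's request is purely about layout. As a rough illustration (the helper below and its name are hypothetical, not lguest code), here is a multiline condition written with each && ending the line it continues, with the operator-first layout he objects to shown in the comment for contrast:

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper: true if @addr lies in [@start, @end) and the
 * range is writable. Each && ends the line it continues. The
 * operator-first layout would instead read:
 *
 *	return writable
 *	       && addr >= start
 *	       && addr < end;
 */
static bool addr_in_writable_range(unsigned long addr, unsigned long start,
				   unsigned long end, bool writable)
{
	return writable &&
	       addr >= start &&
	       addr < end;
}
```

Both layouts compile to the same thing; the convention only affects how a reader scans the continuation lines.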



* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12  1:21   ` Rusty Russell
@ 2007-07-12  2:28     ` David Miller
  2007-07-12  2:48       ` Rusty Russell
  0 siblings, 1 reply; 484+ messages in thread
From: David Miller @ 2007-07-12  2:28 UTC (permalink / raw)
  To: rusty; +Cc: hch, akpm, linux-kernel, linux-mm

From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 12 Jul 2007 11:21:51 +1000

> To do inter-guest (ie. inter-process) I/O you really have to make sure
> the other side doesn't go away.

You should just let it exit and when it does you receive some kind of
exit notification that resets your virtual device channel.

I think the reference counting approach is error and deadlock prone.
Be more loose and let the events reset the virtual devices when
guests go splat.


* block_page_mkwrite? (Re: fault vs invalidate race (Re: -mm merge plans for 2.6.23))
  2007-07-12  0:54 ` fault vs invalidate race (Re: -mm merge plans for 2.6.23) Nick Piggin
@ 2007-07-12  2:31   ` David Chinner
  2007-07-12  2:42     ` Nick Piggin
  0 siblings, 1 reply; 484+ messages in thread
From: David Chinner @ 2007-07-12  2:31 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Andrew Morton, linux-kernel, Linux Memory Management,
	linux-fsdevel, xfs-oss

On Thu, Jul 12, 2007 at 10:54:57AM +1000, Nick Piggin wrote:
> Andrew Morton wrote:
> > The fault-vs-invalidate race fix.  I have belatedly learned that these 
> > need
> > more work, so their state is uncertain.
> 
> The extra work may turn out to be too much for you (although it is nothing
> exactly tricky that would introduce subtle bugs, it is a fair amount of 
> churn).

OK, so does that mean we can finally get the block_page_mkwrite
patches merged?

i.e.:

http://marc.info/?l=linux-kernel&m=117426058311032&w=2
http://marc.info/?l=linux-kernel&m=117426070111136&w=2

I've got up-to-date versions of them ready to go and they've been
consistently tested thanks to the XFSQA test I wrote for the bug
that it fixes. I've been holding them out-of-tree for months now
because ->fault was supposed to supersede this interface.....

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group


* Re: block_page_mkwrite? (Re: fault vs invalidate race (Re: -mm merge plans for 2.6.23))
  2007-07-12  2:31   ` block_page_mkwrite? (Re: fault vs invalidate race (Re: -mm merge plans for 2.6.23)) David Chinner
@ 2007-07-12  2:42     ` Nick Piggin
  0 siblings, 0 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-12  2:42 UTC (permalink / raw)
  To: David Chinner
  Cc: Andrew Morton, linux-kernel, Linux Memory Management,
	linux-fsdevel, xfs-oss

David Chinner wrote:
> On Thu, Jul 12, 2007 at 10:54:57AM +1000, Nick Piggin wrote:
> 
>>Andrew Morton wrote:
>>
>>>The fault-vs-invalidate race fix.  I have belatedly learned that these 
>>>need
>>>more work, so their state is uncertain.
>>
>>The extra work may turn out to be too much for you (although it is nothing
>>exactly tricky that would introduce subtle bugs, it is a fair amount of 
>>churn).
> 
> 
> OK, so does that mean we can finally get the block_page_mkwrite
> patches merged?
> 
> i.e.:
> 
> http://marc.info/?l=linux-kernel&m=117426058311032&w=2
> http://marc.info/?l=linux-kernel&m=117426070111136&w=2
> 
> I've got up-to-date versions of them ready to go and they've been
> consistently tested thanks to the XFSQA test I wrote for the bug
> that it fixes. I've been holding them out-of-tree for months now
> because ->fault was supposed to supersede this interface.....

Yeah, as I've said, don't hold them back because of me. They are
simple enough that I don't see why they couldn't be
merged in this window.

-- 
SUSE Labs, Novell Inc.


* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12  2:28     ` David Miller
@ 2007-07-12  2:48       ` Rusty Russell
  2007-07-12  2:51         ` David Miller
  2007-07-12  4:24         ` Andrew Morton
  0 siblings, 2 replies; 484+ messages in thread
From: Rusty Russell @ 2007-07-12  2:48 UTC (permalink / raw)
  To: David Miller; +Cc: hch, akpm, linux-kernel, linux-mm

On Wed, 2007-07-11 at 19:28 -0700, David Miller wrote:
> From: Rusty Russell <rusty@rustcorp.com.au>
> Date: Thu, 12 Jul 2007 11:21:51 +1000
> 
> > To do inter-guest (ie. inter-process) I/O you really have to make sure
> > the other side doesn't go away.
> 
> You should just let it exit and when it does you receive some kind of
> exit notification that resets your virtual device channel.
> 
> I think the reference counting approach is error and deadlock prone.
> Be more loose and let the events reset the virtual devices when
> guests go splat.

There are two places where we grab task refcnt.  One might be avoidable
(will test and get back) but the deferred wakeup isn't really:

        /* We cache one process to wakeup: helps for batching & wakes outside locks. */
        void set_wakeup_process(struct lguest *lg, struct task_struct *p)
        {
        	if (p == lg->wake)
        		return;
        
        	if (lg->wake) {
        		wake_up_process(lg->wake);
        		put_task_struct(lg->wake);
        	}
        	lg->wake = p;
        	if (lg->wake)
        		get_task_struct(lg->wake);
        }

We drop the lock after I/O, and then do this wakeup.  Meanwhile the
other task might have exited.

I could get rid of it, but I don't think there's anything wrong with the
code...

Cheers,
Rusty.
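Why the reference matters can be modelled in userspace. This is a sketch under assumptions, not lguest code: struct fake_task, fake_get()/fake_put() and set_wakeup() are hypothetical stand-ins for task_struct, get_task_struct()/put_task_struct() and the set_wakeup_process() quoted above, and bumping a counter stands in for wake_up_process(). Because the cache holds its own reference, the task's memory survives the task dropping its own reference on exit, until the deferred wakeup has been delivered.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for task_struct: a refcount plus a counter
 * recording how many deferred wakeups were delivered to it. */
struct fake_task {
	int refcnt;
	int wakeups;
};

/* Number of fake_task objects actually freed, so a caller can see
 * when the last reference went away. */
static int fake_tasks_freed;

static struct fake_task *fake_task_new(void)
{
	struct fake_task *t = calloc(1, sizeof(*t));

	if (t)
		t->refcnt = 1;	/* reference owned by the live task itself */
	return t;
}

static void fake_get(struct fake_task *t)	/* cf. get_task_struct() */
{
	t->refcnt++;
}

static void fake_put(struct fake_task *t)	/* cf. put_task_struct() */
{
	if (--t->refcnt == 0) {
		free(t);
		fake_tasks_freed++;
	}
}

/* Hypothetical one-entry cache of the task to wake, like lg->wake. */
struct waker {
	struct fake_task *wake;
};

/* Same shape as set_wakeup_process(): flush the previously cached
 * task, then cache and pin the new one so it cannot be freed before
 * the deferred wakeup reaches it. */
static void set_wakeup(struct waker *w, struct fake_task *p)
{
	if (p == w->wake)
		return;

	if (w->wake) {
		w->wake->wakeups++;	/* stands in for wake_up_process() */
		fake_put(w->wake);
	}
	w->wake = p;
	if (w->wake)
		fake_get(w->wake);
}
```

If set_wakeup() skipped the fake_get(), the exiting task's own fake_put() would free the object while w->wake still pointed at it, and the next set_wakeup() call would write to freed memory.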




* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12  2:48       ` Rusty Russell
@ 2007-07-12  2:51         ` David Miller
  2007-07-12  3:15           ` Rusty Russell
  2007-07-12  4:24         ` Andrew Morton
  1 sibling, 1 reply; 484+ messages in thread
From: David Miller @ 2007-07-12  2:51 UTC (permalink / raw)
  To: rusty; +Cc: hch, akpm, linux-kernel, linux-mm

From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 12 Jul 2007 12:48:41 +1000

> We drop the lock after I/O, and then do this wakeup.  Meanwhile the
> other task might have exited.

I already understand what you're doing.

Is it possible to use exit notifiers to handle this case?
That's what I'm trying to suggest. :)


* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:58             ` Linus Torvalds
@ 2007-07-12  2:53               ` Arjan van de Ven
  0 siblings, 0 replies; 484+ messages in thread
From: Arjan van de Ven @ 2007-07-12  2:53 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Chris Wright, Thomas Gleixner, Ingo Molnar, Andi Kleen,
	Andrew Morton, linux-kernel

On Wed, 2007-07-11 at 15:58 -0700, Linus Torvalds wrote:
> 
> On Wed, 11 Jul 2007, Chris Wright wrote:
> > 
> > That's not quite right.  Leaving the code unchanged caused breakage
> > already.  The PIT is damn stupid and can be sensitive to how quickly it's
> programmed.  So the enable/disable code didn't change, but the frequency
> with which it is called did, and that broke some random boxes.
> 
> Sure. We cannot avoid *all* problems. Bugs happen. 
> 
> But at least we could try to make sure that there aren't totally 
> unnecessary changes in that switch-over patch. Which there definitely 
> were, as far as I can tell.


one note is that the "talk differently to hardware" thing is in part
already tested with the 32 bit tickless code; a lot of people (80% ?)
are still using the 32 bit OS on their 64 bit machines, and the 32 bit
code already talks in the "new way" to this hardware.... 
(and since Fedora 7 already ships tickless for 32 bit there are quite a
lot of people using that in practice, in addition to the kernel.org
kernel users)

I would expect just about all the hardware interaction issues to have
popped up already because of this "run 32 bit on 64 bit hardware" thing.
-- 
if you want to mail me at work (you don't), use arjan (at) linux.intel.com
Test the interaction between Linux and your BIOS via http://www.linuxfirmwarekit.org


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12  2:51         ` David Miller
@ 2007-07-12  3:15           ` Rusty Russell
  2007-07-12  3:35             ` David Miller
  0 siblings, 1 reply; 484+ messages in thread
From: Rusty Russell @ 2007-07-12  3:15 UTC (permalink / raw)
  To: David Miller; +Cc: hch, akpm, linux-kernel, linux-mm

On Wed, 2007-07-11 at 19:51 -0700, David Miller wrote:
> From: Rusty Russell <rusty@rustcorp.com.au>
> Date: Thu, 12 Jul 2007 12:48:41 +1000
> 
> > We drop the lock after I/O, and then do this wakeup.  Meanwhile the
> > other task might have exited.
> 
> I already understand what you're doing.
> 
> Is it possible to use exit notifiers to handle this case?
> That's what I'm trying to suggest. :)

Sure, the process has /dev/lguest open, so I can do something in the
close routine.  Instead of keeping a reference to the tsk, I can keep a
reference to the struct lguest (currently it doesn't have or need a
refcnt).  Then I need another lock, to protect lg->tsk.

This seems like a lot of dancing to avoid one export.  If it's that
important I'd far rather drop the code and do a normal wakeup under the
big lguest lock for 2.6.23.

Cheers,
Rusty.



^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12  3:15           ` Rusty Russell
@ 2007-07-12  3:35             ` David Miller
  0 siblings, 0 replies; 484+ messages in thread
From: David Miller @ 2007-07-12  3:35 UTC (permalink / raw)
  To: rusty; +Cc: hch, akpm, linux-kernel, linux-mm

From: Rusty Russell <rusty@rustcorp.com.au>
Date: Thu, 12 Jul 2007 13:15:18 +1000

> Sure, the process has /dev/lguest open, so I can do something in the
> close routine.  Instead of keeping a reference to the tsk, I can keep a
> reference to the struct lguest (currently it doesn't have or need a
> refcnt).  Then I need another lock, to protect lg->tsk.
> 
> This seems like a lot of dancing to avoid one export.  If it's that
> important I'd far rather drop the code and do a normal wakeup under the
> big lguest lock for 2.6.23.

I'm not against the export, so use it if it really helps.

Ref-counting just seems clumsy to me given how the hw assisted
virtualization stuff works on platforms I am intimately familiar with
:)

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12  2:48       ` Rusty Russell
  2007-07-12  2:51         ` David Miller
@ 2007-07-12  4:24         ` Andrew Morton
  2007-07-12  4:52           ` Rusty Russell
  1 sibling, 1 reply; 484+ messages in thread
From: Andrew Morton @ 2007-07-12  4:24 UTC (permalink / raw)
  To: Rusty Russell; +Cc: David Miller, hch, linux-kernel, linux-mm

On Thu, 12 Jul 2007 12:48:41 +1000 Rusty Russell <rusty@rustcorp.com.au> wrote:

> On Wed, 2007-07-11 at 19:28 -0700, David Miller wrote:
> > From: Rusty Russell <rusty@rustcorp.com.au>
> > Date: Thu, 12 Jul 2007 11:21:51 +1000
> > 
> > > To do inter-guest (ie. inter-process) I/O you really have to make sure
> > > the other side doesn't go away.
> > 
> > You should just let it exit and when it does you receive some kind of
> > exit notification that resets your virtual device channel.
> > 
> > I think the reference counting approach is error and deadlock prone.
> > Be more loose and let the events reset the virtual devices when
> > guests go splat.
> 
> There are two places where we grab task refcnt.  One might be avoidable
> (will test and get back) but the deferred wakeup isn't really:
> 
>         /* We cache one process to wakeup: helps for batching & wakes outside locks. */
>         void set_wakeup_process(struct lguest *lg, struct task_struct *p)
>         {
>         	if (p == lg->wake)
>         		return;
>         
>         	if (lg->wake) {
>         		wake_up_process(lg->wake);
>         		put_task_struct(lg->wake);
>         	}
>         	lg->wake = p;
>         	if (lg->wake)
>         		get_task_struct(lg->wake);
>         }

<handwaving>

We seem to be taking the reference against the wrong thing here.  It should
be against the mm, not against a task_struct?

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12  4:24         ` Andrew Morton
@ 2007-07-12  4:52           ` Rusty Russell
  2007-07-12 11:10             ` Avi Kivity
  2007-07-19 17:27             ` Christoph Hellwig
  0 siblings, 2 replies; 484+ messages in thread
From: Rusty Russell @ 2007-07-12  4:52 UTC (permalink / raw)
  To: Andrew Morton; +Cc: David Miller, hch, linux-kernel, linux-mm

On Wed, 2007-07-11 at 21:24 -0700, Andrew Morton wrote:
> We seem to be taking the reference against the wrong thing here.  It should
> be against the mm, not against a task_struct?

This is solely for the wakeup: you don't wake an mm 8)

The mm reference is held as well under the big lguest_mutex (mm gets
destroyed before files get closed, so we definitely do need to hold a
reference).

I just completed benchmarking: the cached wakeup with the current naive
drivers makes no difference (at one stage I was playing with batched
hypercalls, where it seemed to help).

Thanks Christoph, DaveM!
===
Remove export of __put_task_struct, and usage in lguest

lguest takes a reference count of tasks for two reasons.  The first is
bogus: the /dev/lguest close callback will be called before the task
is destroyed anyway, so no need to take a reference on open.

The second is code to defer waking up tasks for inter-guest I/O, but
the current lguest drivers are too simplistic to benefit (only batched
hypercalls will see an effect, and it's likely that lguest's entire
I/O model will be replaced with virtio and ringbuffers anyway).

Signed-off-by: Rusty Russell <rusty@rustcorp.com.au>
---
 drivers/lguest/hypercalls.c  |    1 -
 drivers/lguest/io.c          |   18 +-----------------
 drivers/lguest/lg.h          |    1 -
 drivers/lguest/lguest_user.c |    2 --
 kernel/fork.c                |    1 -
 5 files changed, 1 insertion(+), 22 deletions(-)

===================================================================
--- a/drivers/lguest/hypercalls.c
+++ b/drivers/lguest/hypercalls.c
@@ -189,5 +189,4 @@ void do_hypercalls(struct lguest *lg)
 		do_hcall(lg, lg->regs);
 		clear_hcall(lg);
 	}
-	set_wakeup_process(lg, NULL);
 }
===================================================================
--- a/drivers/lguest/io.c
+++ b/drivers/lguest/io.c
@@ -296,7 +296,7 @@ static int dma_transfer(struct lguest *s
 
 	/* Do this last so dst doesn't simply sleep on lock. */
 	set_bit(dst->interrupt, dstlg->irqs_pending);
-	set_wakeup_process(srclg, dstlg->tsk);
+	wake_up_process(dstlg->tsk);
 	return i == dst->num_dmas;
 
 fail:
@@ -333,7 +333,6 @@ again:
 			/* Give any recipients one chance to restock. */
 			up_read(&current->mm->mmap_sem);
 			mutex_unlock(&lguest_lock);
-			set_wakeup_process(lg, NULL);
 			empty++;
 			goto again;
 		}
@@ -360,21 +359,6 @@ void release_all_dma(struct lguest *lg)
 			unlink_dma(&lg->dma[i]);
 	}
 	up_read(&lg->mm->mmap_sem);
-}
-
-/* We cache one process to wakeup: helps for batching & wakes outside locks. */
-void set_wakeup_process(struct lguest *lg, struct task_struct *p)
-{
-	if (p == lg->wake)
-		return;
-
-	if (lg->wake) {
-		wake_up_process(lg->wake);
-		put_task_struct(lg->wake);
-	}
-	lg->wake = p;
-	if (lg->wake)
-		get_task_struct(lg->wake);
 }
 
 /* Userspace wants a dma buffer from this guest. */
===================================================================
--- a/drivers/lguest/lg.h
+++ b/drivers/lguest/lg.h
@@ -240,7 +240,6 @@ void release_all_dma(struct lguest *lg);
 void release_all_dma(struct lguest *lg);
 unsigned long get_dma_buffer(struct lguest *lg, unsigned long key,
 			     unsigned long *interrupt);
-void set_wakeup_process(struct lguest *lg, struct task_struct *p);
 
 /* hypercalls.c: */
 void do_hypercalls(struct lguest *lg);
===================================================================
--- a/drivers/lguest/lguest_user.c
+++ b/drivers/lguest/lguest_user.c
@@ -141,7 +141,6 @@ static int initialize(struct file *file,
 	setup_guest_gdt(lg);
 	init_clockdev(lg);
 	lg->tsk = current;
-	get_task_struct(lg->tsk);
 	lg->mm = get_task_mm(lg->tsk);
 	init_waitqueue_head(&lg->break_wq);
 	lg->last_pages = NULL;
@@ -205,7 +204,6 @@ static int close(struct inode *inode, st
 	hrtimer_cancel(&lg->hrt);
 	release_all_dma(lg);
 	free_guest_pagetable(lg);
-	put_task_struct(lg->tsk);
 	mmput(lg->mm);
 	if (!IS_ERR(lg->dead))
 		kfree(lg->dead);
===================================================================
--- a/kernel/fork.c
+++ b/kernel/fork.c
@@ -128,7 +128,6 @@ void __put_task_struct(struct task_struc
 	if (!profile_handoff_task(tsk))
 		free_task(tsk);
 }
-EXPORT_SYMBOL_GPL(__put_task_struct);
 
 void __init fork_init(unsigned long mempages)
 {



^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: containers (was Re: -mm merge plans for 2.6.23)
  2007-07-11 19:44           ` Paul Menage
@ 2007-07-12  5:39             ` Srivatsa Vaddagiri
  0 siblings, 0 replies; 484+ messages in thread
From: Srivatsa Vaddagiri @ 2007-07-12  5:39 UTC (permalink / raw)
  To: Paul Menage; +Cc: Andrew Morton, containers, Ingo Molnar, linux-kernel

On Wed, Jul 11, 2007 at 12:44:42PM -0700, Paul Menage wrote:
> >I'm inclined to take the cautious route here - I don't think people will be
> >dying for the CFS thingy (which I didn't even know about?) in .23, and it's
> >rather a lot of infrastructure to add for a CPU scheduler configurator
> 
> Selecting the relevant patches to give enough of the container
> framework to support a CFS container subsystem (slightly
> tweaked/updated versions of the base patch, procfs interface patch and
> tasks file interface patch) is about 1600 lines in kernel/container.c
> and another 200 in kernel/container.h, which is about 99% of the
> non-documentation changes.
> 
> So not tiny, but it's not very intrusive on the rest of the kernel,
> and would avoid having to introduce a temporary API based on uids.

Yes that would be good. As long as the user-land interface for process
containers doesn't change (much?) between 2.6.23 and later releases this
should be a good workaround for us.

-- 
Regards,
vatsa

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12  4:52           ` Rusty Russell
@ 2007-07-12 11:10             ` Avi Kivity
  2007-07-12 23:20               ` Rusty Russell
  2007-07-19 17:27             ` Christoph Hellwig
  1 sibling, 1 reply; 484+ messages in thread
From: Avi Kivity @ 2007-07-12 11:10 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Andrew Morton, David Miller, hch, linux-kernel, linux-mm

Rusty Russell wrote:
> Remove export of __put_task_struct, and usage in lguest
>
> lguest takes a reference count of tasks for two reasons.  The first is
> bogus: the /dev/lguest close callback will be called before the task
> is destroyed anyway, so no need to take a reference on open.
>
>   

What about

  Open /dev/lguest
  transfer fd using SCM_RIGHTS (or clone()?)
  close fd in original task
  exit()

?

My feeling is that if you want to be bound to a task, not a file, you 
need to use syscalls, not ioctls.
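The sequence above can be reproduced from plain userspace. In this sketch a child process opens a file, hands the descriptor to its parent with SCM_RIGHTS over a UNIX-domain socketpair, closes its own copy and exits; the parent then receives a descriptor that still works. /dev/null stands in for /dev/lguest here (an assumption, since the lguest device will not normally exist), and the helper names are hypothetical. The point is that the open file, not the task that opened it, is what outlives the exit.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <fcntl.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <sys/uio.h>
#include <sys/wait.h>
#include <unistd.h>

/* Send one file descriptor (plus one dummy data byte) over a UNIX socket. */
static int send_fd(int sock, int fd)
{
	char byte = 0;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
	struct msghdr msg;

	memset(&msg, 0, sizeof(msg));
	memset(&ctl, 0, sizeof(ctl));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
	assert(cm != NULL);  /* control buffer is sized for exactly one fd */
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_RIGHTS;
	cm->cmsg_len = CMSG_LEN(sizeof(int));
	memcpy(CMSG_DATA(cm), &fd, sizeof(int));

	return sendmsg(sock, &msg, 0) == 1 ? 0 : -1;
}

/* Receive one file descriptor; returns the new fd, or -1 on failure. */
static int recv_fd(int sock)
{
	char byte;
	struct iovec iov = { .iov_base = &byte, .iov_len = 1 };
	union { struct cmsghdr hdr; char buf[CMSG_SPACE(sizeof(int))]; } ctl;
	struct msghdr msg;
	int fd = -1;

	memset(&msg, 0, sizeof(msg));
	msg.msg_iov = &iov;
	msg.msg_iovlen = 1;
	msg.msg_control = ctl.buf;
	msg.msg_controllen = sizeof(ctl.buf);

	if (recvmsg(sock, &msg, 0) != 1)
		return -1;
	struct cmsghdr *cm = CMSG_FIRSTHDR(&msg);
	if (cm && cm->cmsg_level == SOL_SOCKET && cm->cmsg_type == SCM_RIGHTS)
		memcpy(&fd, CMSG_DATA(cm), sizeof(int));
	return fd;
}

/* The opener passes its fd on, closes its own copy and exits before the
 * receiver even picks the message up. Returns the received fd or -1. */
static int fd_outlives_opener(const char *path)
{
	int sv[2];

	if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) < 0)
		return -1;
	pid_t pid = fork();
	if (pid < 0)
		return -1;
	if (pid == 0) {
		close(sv[0]);
		int fd = open(path, O_RDONLY);
		int rc = fd >= 0 ? send_fd(sv[1], fd) : -1;
		if (fd >= 0)
			close(fd);         /* "close fd in original task" */
		_exit(rc == 0 ? 0 : 1);    /* "exit()" */
	}
	close(sv[1]);
	waitpid(pid, NULL, 0);             /* the opening task is gone now */
	int fd = recv_fd(sv[0]);
	close(sv[0]);
	return fd;
}

static int is_char_device(int fd)
{
	struct stat st;
	return fstat(fd, &st) == 0 && S_ISCHR(st.st_mode);
}
```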

-- 
error compiling committee.c: too many arguments to function


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-11  1:14     ` [ck] " Andrew Morton
                         ` (4 preceding siblings ...)
  2007-07-11 12:26       ` Kevin Winchester
@ 2007-07-12 12:06       ` Kacper Wysocki
  2007-07-12 12:35         ` Avuton Olrich
  5 siblings, 1 reply; 484+ messages in thread
From: Kacper Wysocki @ 2007-07-12 12:06 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Matthew Hawkins, linux-kernel, Con Kolivas, ck list, linux-mm,
	Paul Jackson

On 7/11/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 11 Jul 2007 11:02:56 +1000 "Matthew Hawkins" <darthmdh@gmail.com> wrote:
>
> > We all know swap prefetch has been tested out the wazoo since Moses was a
> > little boy, is compile-time and runtime selectable, and gives an important
> > and quantifiable performance increase to desktop systems.
>
> Always interested.  Please provide us more details on your usage and
> testing of that code.  Amount of memory, workload, observed results,
> etc?

Swap prefetch has been around for years, and it's a complete boon for
the desktop user and a noop in any other situation. In addition to the
sp_tester tool which consistently shows a definite advantage, there
are many user reports that show the noticeable improvements it has.
The many people who have tried it out have generally chosen to switch
to patched kernels because of the performance increase.

It's been discussed on the lkml many times before, we've been over
performance, testing and impact. The big question is: why *don't* we
merge it?

-Kacper

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-12 12:06       ` Kacper Wysocki
@ 2007-07-12 12:35         ` Avuton Olrich
  0 siblings, 0 replies; 484+ messages in thread
From: Avuton Olrich @ 2007-07-12 12:35 UTC (permalink / raw)
  To: Kacper Wysocki
  Cc: Andrew Morton, Matthew Hawkins, linux-kernel, Con Kolivas,
	ck list, linux-mm, Paul Jackson

On 7/12/07, Kacper Wysocki <kacperw@online.no> wrote:
> performance, testing and impact. The big question is: why *don't* we
> merge it?

The stranger thing to me is that this feels like déjà vu. Many have asked
this same question. When users were asked for comments before, many
end users and some developers gave rave reviews. I don't
remember anyone giving it the heavy thumbs-down, with exception of
when things needed fixing (over 6 months ago?). It continues to go
unmerged. Is there a clear answer on what needs to happen for it to
get merged?
-- 
avuton
--
 Anyone who quotes me in their sig is an idiot. -- Rusty Russell.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:09         ` Linus Torvalds
@ 2007-07-12 15:36           ` Oleg Verych
  2007-07-13  2:23           ` Roman Zippel
  1 sibling, 0 replies; 484+ messages in thread
From: Oleg Verych @ 2007-07-12 15:36 UTC (permalink / raw)
  To: linux-kernel

* Linus Torvalds "Wed, 11 Jul 2007 15:09:28 -0700 (PDT)"
>
> On Wed, 11 Jul 2007, Andrea Arcangeli wrote:
>>
>> I'm going to change topic big time because your sentence above
>> perfectly applies to the O(1) scheduler too.
>
> I disagree to a large degree.
>
> We almost never have problems with code you can "think about".
>
> Sure, bugs happen, but code that everybody runs the same generally doesn't 
> break. So a CPU scheduler doesn't worry me all that much. CPU schedulers 
> are "easy".
>
> What worries me is interfaces to hardware that we know looks different for 
> different people. That means that any testing that one person has done 
> doesn't necessarily translate to anything at *all* on another persons 
> machine.
>
> The timer problems we had when merging the stuff in 2.6.21 just scarred 
> me. I'd _really_ hate to have to go through that again. And no, the 
> "gradual" thing where the patch that actually *enables* something isn't 
> very gradual at all, so that's the absolutely worst kind of thing, because 
> then people can "git bisect" to the point where it got enabled and tell us 
> that's where things broke, but that doesn't actually say anything at all 
> about the patch that actually implements the new behaviour.
>
> So the "enable" kind of patch is actually the worst of the lot, when it 
> comes to hardware.
>
> When it comes to pure software algorithms, and things like schedulers, 
> you'll still obviously have timing issues and tuning, but generally things 
> *work*, which makes it a lot easier to debug and describe.
>
> 		Linus

Seconded (obviously).

--
-o--=O`C
 #oo'L O
<___=E M


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 12:43 ` x86 status was " Andi Kleen
                     ` (2 preceding siblings ...)
  2007-07-11 18:14   ` Jeremy Fitzhardinge
@ 2007-07-12 19:33   ` Christoph Lameter
  2007-07-12 20:38     ` Andi Kleen
  3 siblings, 1 reply; 484+ messages in thread
From: Christoph Lameter @ 2007-07-12 19:33 UTC (permalink / raw)
  To: Andi Kleen
  Cc: Andrew Morton, linux-kernel, tglx, jeremy, Tim Hockin, jesse.barnes

On Wed, 11 Jul 2007, Andi Kleen wrote:

> These all need re-review:
> 
> > i386-add-support-for-picopower-irq-router.patch
> > make-arch-i386-kernel-setupcremapped_pgdat_init-static.patch
> > arch-i386-kernel-i8253c-should-include-asm-timerh.patch
> > make-arch-i386-kernel-io_apicctimer_irq_works-static-again.patch


> > quicklist-support-for-x86_64.patch

^^^ That patch was supposed to be merged for 2.6.22 (you told me you 
forgot to merge it) and has been for a long time in mm. Does it now 
need to be rereviewed for 2.6.23? The other pieces of the quicklist patch 
for core and other arches were merged for 2.6.22.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:50           ` Thomas Gleixner
  2007-07-11 23:03             ` Chris Wright
  2007-07-11 23:07             ` Linus Torvalds
@ 2007-07-12 20:38             ` Matt Mackall
  2 siblings, 0 replies; 484+ messages in thread
From: Matt Mackall @ 2007-07-12 20:38 UTC (permalink / raw)
  To: Thomas Gleixner
  Cc: Linus Torvalds, Ingo Molnar, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

On Thu, Jul 12, 2007 at 12:50:19AM +0200, Thomas Gleixner wrote:
> Linus,
> 
> On Wed, 2007-07-11 at 15:20 -0700, Linus Torvalds wrote:
> > For example, we can make sure that the code in question that actually 
> > touches the hardware stays exactly the same, and then just move the 
> > interfaces around - and basically guarantee that _zero_ hardware-specific 
> > issues pop up when you switch over, for example.
> > 
> > That way there is a gradual change-over.
> 
> Ok, I can try to split this down further.
> 
> > The other approach (which would be nice _too_) is to actually try to 
> > convert one clock source at a time. Why is that not an option? 
> 
> We need to give control to the clock events core code once we convert
> one clock event device. Having two competing subsystems controlling
> different devices (e.g. PIT and APIC) is not really desirable.

Can't you take the entire legacy clock system and wrap it as a single
legacy clock source? Then you take bits out of the old system and put
them as independent sources in the new system? When the legacy clock
system is empty, you remove the legacy clock source.
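The migration described above can be sketched as pure bookkeeping. This is a hypothetical model, not clockevents code: struct framework, struct legacy_shim and every function name are invented for illustration, and devices are tracked only by name. The whole legacy system is first registered as a single "legacy" source, so only one subsystem is ever in control; each device is then converted in its own step (which keeps each step individually bisectable), and the shim unregisters itself once it is empty.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MAX_SOURCES 8

/* Hypothetical stand-in for the new framework's list of devices. */
struct framework {
	const char *names[MAX_SOURCES];
	size_t nr;
};

/* One framework source that still drives every unconverted device. */
struct legacy_shim {
	const char *devs[MAX_SOURCES];
	size_t nr;
	bool registered;
};

static void fw_register(struct framework *fw, const char *name)
{
	assert(fw->nr < MAX_SOURCES);
	fw->names[fw->nr++] = name;
}

static void fw_unregister(struct framework *fw, const char *name)
{
	for (size_t i = 0; i < fw->nr; i++) {
		if (strcmp(fw->names[i], name) == 0) {
			fw->names[i] = fw->names[--fw->nr];  /* swap-remove */
			return;
		}
	}
}

/* Step 1: wrap the entire legacy system as a single framework source. */
static void shim_init(struct legacy_shim *shim, struct framework *fw,
		      const char *const *devs, size_t n)
{
	assert(n <= MAX_SOURCES);
	for (size_t i = 0; i < n; i++)
		shim->devs[i] = devs[i];
	shim->nr = n;
	fw_register(fw, "legacy");
	shim->registered = true;
}

/* Step 2: move one device out of the shim and register it natively.
 * Step 3 follows automatically: an empty shim removes itself. */
static bool convert_one(struct legacy_shim *shim, struct framework *fw,
			const char *name)
{
	for (size_t i = 0; i < shim->nr; i++) {
		if (strcmp(shim->devs[i], name) != 0)
			continue;
		shim->devs[i] = shim->devs[--shim->nr];
		fw_register(fw, name);
		if (shim->nr == 0 && shim->registered) {
			fw_unregister(fw, "legacy");
			shim->registered = false;
		}
		return true;
	}
	return false;
}
```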

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-12 19:33   ` Christoph Lameter
@ 2007-07-12 20:38     ` Andi Kleen
  0 siblings, 0 replies; 484+ messages in thread
From: Andi Kleen @ 2007-07-12 20:38 UTC (permalink / raw)
  To: Christoph Lameter
  Cc: Andi Kleen, Andrew Morton, linux-kernel, tglx, jeremy,
	Tim Hockin, jesse.barnes

On Thu, Jul 12, 2007 at 12:33:43PM -0700, Christoph Lameter wrote:
> On Wed, 11 Jul 2007, Andi Kleen wrote:
> 
> > These all need re-review:
> > 
> > > i386-add-support-for-picopower-irq-router.patch
> > > make-arch-i386-kernel-setupcremapped_pgdat_init-static.patch
> > > arch-i386-kernel-i8253c-should-include-asm-timerh.patch
> > > make-arch-i386-kernel-io_apicctimer_irq_works-static-again.patch
> 
> 
> > > quicklist-support-for-x86_64.patch
> 
> ^^^ That patch was supposed to be merged for 2.6.22 (you told me you 
> forgot to merge it) and has been for a long time in mm. Does it now 
> need to be rereviewed for 2.6.23? The other pieces of the quicklist patch 
> for core and other arches were merged for 2.6.22.

It's just on the normal re-review list. But it'll likely go in.

-Andi

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12 11:10             ` Avi Kivity
@ 2007-07-12 23:20               ` Rusty Russell
  0 siblings, 0 replies; 484+ messages in thread
From: Rusty Russell @ 2007-07-12 23:20 UTC (permalink / raw)
  To: Avi Kivity; +Cc: Andrew Morton, David Miller, hch, linux-kernel, carsteno

On Thu, 2007-07-12 at 14:10 +0300, Avi Kivity wrote:
> Rusty Russell wrote:
> > Remove export of __put_task_struct, and usage in lguest
> >
> > lguest takes a reference count of tasks for two reasons.  The first is
> > bogus: the /dev/lguest close callback will be called before the task
> > is destroyed anyway, so no need to take a reference on open.
> >
> >   
> 
> What about
> 
>   Open /dev/lguest
>   transfer fd using SCM_RIGHTS (or clone()?)
>   close fd in original task
>   exit()
> 
> ?
> 
> My feeling is that if you want to be bound to a task, not a file, you 
> need to use syscalls, not ioctls.

"Don't do that".  You'll lose the ability to access the operations on
the fd once you are no longer the original task (explicit check).

It's not an exact match, but a file is a remarkably convenient
abstraction for a non-ABI such as lguest.  Of course, Carsten was
talking about unifying the lguest & kvm userspace interface, so this
could well change anyway.

Cheers,
Rusty.


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-11 22:09         ` Linus Torvalds
  2007-07-12 15:36           ` Oleg Verych
@ 2007-07-13  2:23           ` Roman Zippel
  2007-07-13  4:40             ` Andrew Morton
  2007-07-13  4:47             ` Mike Galbraith
  1 sibling, 2 replies; 484+ messages in thread
From: Roman Zippel @ 2007-07-13  2:23 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Andrea Arcangeli, Andi Kleen, Ingo Molnar, Andrew Morton,
	linux-kernel, Thomas Gleixner, Arjan van de Ven, Chris Wright

Hi,

On Wed, 11 Jul 2007, Linus Torvalds wrote:

> Sure, bugs happen, but code that everybody runs the same generally doesn't 
> break. So a CPU scheduler doesn't worry me all that much. CPU schedulers 
> are "easy".

A little more advance warning wouldn't have hurt though.
The new scheduler does _a_lot_ of heavy 64 bit calculations without any 
attempt to scale that down a little...
One can blame me now for not having it brought up earlier, but discussions 
with Ingo are not something I'm looking forward to. :(

bye, Roman

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-13  2:23           ` Roman Zippel
@ 2007-07-13  4:40             ` Andrew Morton
  2007-07-13  4:47             ` Mike Galbraith
  1 sibling, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-13  4:40 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Andrea Arcangeli, Andi Kleen, Ingo Molnar,
	linux-kernel, Thomas Gleixner, Arjan van de Ven, Chris Wright

On Fri, 13 Jul 2007 04:23:43 +0200 (CEST) Roman Zippel <zippel@linux-m68k.org> wrote:

> Hi,
> 
> On Wed, 11 Jul 2007, Linus Torvalds wrote:
> 
> > Sure, bugs happen, but code that everybody runs the same generally doesn't 
> > break. So a CPU scheduler doesn't worry me all that much. CPU schedulers 
> > are "easy".
> 
> A little more advance warning wouldn't have hurt though.
> The new scheduler does _a_lot_ of heavy 64 bit calculations without any 
> attempt to scale that down a little...
> One can blame me now for not having it brought up earlier, but discussions 
> with Ingo are not something I'm looking forward to. :(
> 

I brought that up a couple of weeks ago, got handwaved at and gave up.

It still isn't obvious to me that all that arith needs to be 64-bit
on 32-bit machines, or even on 64-bit.  4e9 is a big number.
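For scale: a u32 holds values up to 2^32 - 1, about 4.29e9, which as a nanosecond count is roughly 4.29 seconds. The sketch below (helper names are hypothetical) shows that short deltas fit comfortably in 32 bits and that unsigned subtraction stays correct across a single counter wrap, while anything that accumulates past ~4.29 s would not fit. It does not try to say which CFS quantities actually need 64 bits.

```c
#include <assert.h>
#include <stdint.h>

#define NSEC_PER_SEC 1000000000ULL

/* Longest interval, in whole milliseconds, that a u32 ns count can hold. */
static uint32_t u32_ns_span_ms(void)
{
	return (uint32_t)(UINT32_MAX / (NSEC_PER_SEC / 1000));
}

/* Narrow a 64-bit nanosecond interval to u32; only valid below ~4.29 s. */
static uint32_t ns_to_u32(uint64_t ns)
{
	assert(ns <= UINT32_MAX);
	return (uint32_t)ns;
}

/* Wrap-safe delta of two u32 timestamps: correct as long as the real
 * interval is shorter than 2^32 ns, because unsigned arithmetic wraps. */
static uint32_t ns_delta32(uint32_t now, uint32_t then)
{
	return now - then;
}
```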

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-13  2:23           ` Roman Zippel
  2007-07-13  4:40             ` Andrew Morton
@ 2007-07-13  4:47             ` Mike Galbraith
  2007-07-13 17:23               ` Roman Zippel
  1 sibling, 1 reply; 484+ messages in thread
From: Mike Galbraith @ 2007-07-13  4:47 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Andrea Arcangeli, Andi Kleen, Ingo Molnar,
	Andrew Morton, linux-kernel, Thomas Gleixner, Arjan van de Ven,
	Chris Wright

On Fri, 2007-07-13 at 04:23 +0200, Roman Zippel wrote:
> Hi,

Hi,

> The new scheduler does _a_lot_ of heavy 64 bit calculations without any 
> attempt to scale that down a little...

See prio_to_weight[], prio_to_wmult[] and sysctl_sched_stat_granularity.
Perhaps more can be done, but "without any attempt..." isn't accurate.
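One way to read the pointer to prio_to_wmult[] is that storing inv = 2^32 / weight lets a division by a load weight become a multiply and a 32-bit shift, which is much cheaper than a 64-bit division on 32-bit machines. The sketch below is a simplified illustration of that idea under the assumption that delta * weight * inv fits in 64 bits; it is not the scheduler's actual routine, which also has to deal with rounding and overflow. Because inv is truncated, the result can come out one unit below the exact quotient.

```c
#include <assert.h>
#include <stdint.h>

/* delta * weight / divisor, computed as (delta * weight * inv) >> 32
 * with inv = 2^32 / divisor precomputed (as in a wmult-style table). */
static uint64_t div_by_inv(uint64_t delta, uint32_t weight, uint32_t inv)
{
	return (delta * weight * inv) >> 32;
}

/* Reference result using a real division, for comparison. */
static uint64_t div_exact(uint64_t delta, uint32_t weight, uint32_t divisor)
{
	assert(divisor != 0);
	return delta * weight / divisor;
}
```

With a power-of-two divisor such as 2048 (inv = 2097152) the two agree exactly; with 3125 (inv = 1374389) the reciprocal version gives 327679 where real division gives 327680.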

	-Mike


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (24 preceding siblings ...)
  2007-07-12  0:54 ` fault vs invalidate race (Re: -mm merge plans for 2.6.23) Nick Piggin
@ 2007-07-13  9:46 ` Jan Engelhardt
  2007-07-13 23:09   ` Tilman Schmidt
  2007-07-17  8:55 ` unprivileged mounts (was: Re: -mm merge plans for 2.6.23) Andrew Morton
  26 siblings, 1 reply; 484+ messages in thread
From: Jan Engelhardt @ 2007-07-13  9:46 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List, tilman


On Jul 10 2007 01:31, Andrew Morton wrote:

>use-menuconfig-objects-isdn-config_isdn_i4l.patch
>
> tilman didn't like it - might drop

Or replace by his suggestion patch ( http://lkml.org/lkml/2007/5/31/222 )


	Jan
-- 

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-13  4:47             ` Mike Galbraith
@ 2007-07-13 17:23               ` Roman Zippel
  2007-07-13 19:43                 ` [PATCH] CFS: Fix missing digit off in wmult table Thomas Gleixner
  2007-07-14  5:04                 ` x86 status was Re: -mm merge plans for 2.6.23 Mike Galbraith
  0 siblings, 2 replies; 484+ messages in thread
From: Roman Zippel @ 2007-07-13 17:23 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Linus Torvalds, Andrea Arcangeli, Andi Kleen, Ingo Molnar,
	Andrew Morton, linux-kernel, Thomas Gleixner, Arjan van de Ven,
	Chris Wright

Hi,

On Fri, 13 Jul 2007, Mike Galbraith wrote:

> > The new scheduler does _a_lot_ of heavy 64 bit calculations without any 
> > attempt to scale that down a little...
> 
> See prio_to_weight[], prio_to_wmult[] and sysctl_sched_stat_granularity.
> Perhaps more can be done, but "without any attempt..." isn't accurate.

Calculating these values at runtime would have been completely insane, the 
alternative would be a crummy approximation, so using a lookup table is 
actually a good thing. That's not the problem.
BTW could someone please verify the prio_to_wmult table, especially [16] 
and [21] look a little off, like a digit was cut off.
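A quick way to check the table is to recompute each entry. This sketch assumes each prio_to_wmult[] entry is 2^32 / weight, truncated, and that the prio_to_weight[] entries for indices 15..21 (nice -5..+1) are 3125, 2500, 2000, 1600, 1280, 1024 and 819; those weights are taken from the CFS table of the time and are not quoted in this thread. Under those assumptions entry 16 should be 1717986 (the leading "1" is missing), while entry 21 (5244160) is consistent with 2^32 / 819.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inverse weight as assumed to be stored in prio_to_wmult[]. */
static uint32_t wmult_for(uint32_t weight)
{
	assert(weight != 0);
	return (uint32_t)((1ULL << 32) / weight);
}

/* First index where table[i] != 2^32 / weights[i], or -1 if all match. */
static int first_bad_wmult(const uint32_t *weights, const uint32_t *table,
			   size_t n)
{
	for (size_t i = 0; i < n; i++)
		if (table[i] != wmult_for(weights[i]))
			return (int)i;
	return -1;
}
```

Feeding it the seven assumed weights and the entries 1374389, 717986, 2147483, 2684354, 3355443, 4194304, 5244160 flags position 1 of that slice, i.e. table index 16.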

While I'm at this, the 10% scaling there looks a little much (unless there 
are other changes I haven't looked at yet), the old code used more like 
5%. This would mean a prio -20 task would get 98.86% cpu time compared to 
a prio 0 task, that was previously about the difference between -20 and 
19 (and it would have previously gotten only 88.89%), now a prio -20 task 
would get 99.98% cpu time compared to a prio 19 task.
The individual levels are unfortunately not that easily comparable, but at 
the overall scale the change looks IMHO a little drastic.
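The percentages above follow from proportional sharing: with two runnable tasks, A gets weight_a / (weight_a + weight_b) of the CPU. This sketch assumes the CFS weights of that time for nice -20, 0 and 19 are 88818, 1024 and 15 (taken from prio_to_weight[], not quoted in this mail); the old 88.89% figure corresponds to an 8:1 ratio.

```c
#include <assert.h>
#include <stdint.h>

/* CPU share (percent) of task A when competing only with task B,
 * under proportional-share scheduling by load weight. */
static double cpu_share_pct(uint32_t weight_a, uint32_t weight_b)
{
	assert(weight_a != 0 || weight_b != 0);
	return 100.0 * weight_a / ((double)weight_a + weight_b);
}
```

With those weights, nice -20 against nice 0 gives about 98.86%, nice -20 against nice 19 about 99.98%, and an 8:1 ratio about 88.89%.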

bye, Roman

^ permalink raw reply	[flat|nested] 484+ messages in thread

* [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-13 17:23               ` Roman Zippel
@ 2007-07-13 19:43                 ` Thomas Gleixner
  2007-07-16  6:18                   ` James Bruce
  2007-07-14  5:04                 ` x86 status was Re: -mm merge plans for 2.6.23 Mike Galbraith
  1 sibling, 1 reply; 484+ messages in thread
From: Thomas Gleixner @ 2007-07-13 19:43 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Mike Galbraith, Linus Torvalds, Andrea Arcangeli, Andi Kleen,
	Ingo Molnar, Andrew Morton, linux-kernel, Arjan van de Ven,
	Chris Wright

Roman Zippel noticed an inconsistency in the wmult table.

wmult[16] has a missing digit.

Signed-off-by: Thomas Gleixner <tglx@linutronix.de>

diff --git a/kernel/sched.c b/kernel/sched.c
index 0559665..3332bbb 100644
--- a/kernel/sched.c
+++ b/kernel/sched.c
@@ -750,7 +750,7 @@ static const u32 prio_to_wmult[40] = {
 	48356,   60446,   75558,   94446,  118058,  147573,
 	184467,  230589,  288233,  360285,  450347,
 	562979,  703746,  879575, 1099582, 1374389,
-	717986, 2147483, 2684354, 3355443, 4194304,
+	1717986, 2147483, 2684354, 3355443, 4194304,
 	5244160, 6557201, 8196502, 10250518, 12782640,
 	16025997, 19976592, 24970740, 31350126, 39045157,
 	49367440, 61356675, 76695844, 95443717, 119304647,



^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-13  9:46 ` -mm merge plans for 2.6.23 Jan Engelhardt
@ 2007-07-13 23:09   ` Tilman Schmidt
  2007-07-14 10:02     ` Jan Engelhardt
  0 siblings, 1 reply; 484+ messages in thread
From: Tilman Schmidt @ 2007-07-13 23:09 UTC (permalink / raw)
  To: Jan Engelhardt; +Cc: Andrew Morton, Linux Kernel Mailing List

[-- Attachment #1: Type: text/plain, Size: 786 bytes --]

On 13.07.2007 11:46, Jan Engelhardt wrote:
> On Jul 10 2007 01:31, Andrew Morton wrote:
> 
>> use-menuconfig-objects-isdn-config_isdn_i4l.patch
>>
>> tilman didn't like it - might drop
> 
> Or replace by his suggestion patch ( http://lkml.org/lkml/2007/5/31/222 )

That posting was just a change proposal for the drivers/isdn/Kconfig
part, not a complete replacement for the entire patch. If you'd care to
reissue that patch with the modification I proposed, I'll gladly ack it.
Alternatively I can also send a full replacement patch if you prefer.

Regards,
Tilman

-- 
Tilman Schmidt                          E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)


[-- Attachment #2: OpenPGP digital signature --]
[-- Type: application/pgp-signature, Size: 253 bytes --]

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: x86 status was Re: -mm merge plans for 2.6.23
  2007-07-13 17:23               ` Roman Zippel
  2007-07-13 19:43                 ` [PATCH] CFS: Fix missing digit off in wmult table Thomas Gleixner
@ 2007-07-14  5:04                 ` Mike Galbraith
  2007-08-01  3:41                   ` CFS review Roman Zippel
  1 sibling, 1 reply; 484+ messages in thread
From: Mike Galbraith @ 2007-07-14  5:04 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Andrea Arcangeli, Andi Kleen, Ingo Molnar,
	Andrew Morton, linux-kernel, Thomas Gleixner, Arjan van de Ven,
	Chris Wright

On Fri, 2007-07-13 at 19:23 +0200, Roman Zippel wrote:
> Hi,
> 
> On Fri, 13 Jul 2007, Mike Galbraith wrote:
> 
> > > The new scheduler does _a_lot_ of heavy 64 bit calculations without any 
> > > attempt to scale that down a little...
> > 
> > See prio_to_weight[], prio_to_wmult[] and sysctl_sched_stat_granularity.
> > Perhaps more can be done, but "without any attempt..." isn't accurate.
> 
> Calculating these values at runtime would have been completely insane, the 
> alternative would be a crummy approximation, so using a lookup table is 
> actually a good thing. That's not the problem.

I meant see usage.

	-Mike




^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-13 23:09   ` Tilman Schmidt
@ 2007-07-14 10:02     ` Jan Engelhardt
       [not found]       ` <20070715131144.3467DFC040@xenon.ts.pxnet.com>
  0 siblings, 1 reply; 484+ messages in thread
From: Jan Engelhardt @ 2007-07-14 10:02 UTC (permalink / raw)
  To: Tilman Schmidt; +Cc: Andrew Morton, Linux Kernel Mailing List


On Jul 14 2007 01:09, Tilman Schmidt wrote:
>On 13.07.2007 11:46, Jan Engelhardt wrote:
>> On Jul 10 2007 01:31, Andrew Morton wrote:
>> 
>>> use-menuconfig-objects-isdn-config_isdn_i4l.patch
>>>
>>> tilman didn't like it - might drop
>> 
>> Or replace by his suggestion patch ( http://lkml.org/lkml/2007/5/31/222 )
>
>That posting was just a change proposal for the drivers/isdn/Kconfig
>part, not a complete replacement for the entire patch. If you'd care to
>reissue that patch with the modification I proposed, I'll gladly ack it.
>Alternatively I can also send a full replacement patch if you prefer.

Since I did not really see much of a difference between our two
approaches, I'd be grateful if you could send a full replacement in
the hopes that I see the global picture.


Thanks,
	Jan
-- 

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-13 19:43                 ` [PATCH] CFS: Fix missing digit off in wmult table Thomas Gleixner
@ 2007-07-16  6:18                   ` James Bruce
  2007-07-16  7:06                     ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: James Bruce @ 2007-07-16  6:18 UTC (permalink / raw)
  To: linux-kernel
  Cc: Roman Zippel, Mike Galbraith, Linus Torvalds, Andrea Arcangeli,
	Andi Kleen, Ingo Molnar, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Thomas Gleixner wrote:
> Roman Zippel noticed inconsistency of the wmult table.
> wmult[16] has a missing digit.
[snip]

While we're at it, isn't the comment above the wmult table incorrect?
The multiplier is 1.25, meaning a 25% change per nice level, not 10%.

  - Jim


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16  6:18                   ` James Bruce
@ 2007-07-16  7:06                     ` Ingo Molnar
  2007-07-16  7:41                       ` Ingo Molnar
  2007-07-16 10:18                       ` Roman Zippel
  0 siblings, 2 replies; 484+ messages in thread
From: Ingo Molnar @ 2007-07-16  7:06 UTC (permalink / raw)
  To: James Bruce
  Cc: Thomas Gleixner, Roman Zippel, Mike Galbraith, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* James Bruce <bruce@andrew.cmu.edu> wrote:

> While we're at it, isn't the comment above the wmult table incorrect? 
> The multiplier is 1.25, meaning a 25% change per nice level, not 10%.

yes, the weight multiplier is 1.25, but the actual difference in CPU 
utilization, when running two CPU intense tasks, is ~10%:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 8246 mingo     20   0  1576  244  196 R   55  0.0   0:11.96 loop
 8247 mingo     21   1  1576  244  196 R   45  0.0   0:10.52 loop

so the first task 'wins' +10% CPU utilization (relative to the 50% it 
had before), the second task 'loses' -10% CPU utilization (relative to 
the 50% it had before).

so what the comment says is true:

 * The "10% effect" is relative and cumulative: from _any_ nice level,
 * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
 * it's +10% CPU usage.

for there to be a ~+10% change in CPU utilization for a task that races 
against another CPU-intense task there needs to be a ~25% change in the 
weight.

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16  7:06                     ` Ingo Molnar
@ 2007-07-16  7:41                       ` Ingo Molnar
  2007-07-16 15:02                         ` James Bruce
  2007-07-16 10:18                       ` Roman Zippel
  1 sibling, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-16  7:41 UTC (permalink / raw)
  To: James Bruce
  Cc: Thomas Gleixner, Roman Zippel, Mike Galbraith, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Ingo Molnar <mingo@elte.hu> wrote:

> * James Bruce <bruce@andrew.cmu.edu> wrote:
> 
> > While we're at it, isn't the comment above the wmult table incorrect? 
> > The multiplier is 1.25, meaning a 25% change per nice level, not 10%.
> 
> yes, the weight multiplier 1.25, but the actual difference in CPU 
> utilization, when running two CPU intense tasks, is ~10%:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  8246 mingo     20   0  1576  244  196 R   55  0.0   0:11.96 loop
>  8247 mingo     21   1  1576  244  196 R   45  0.0   0:10.52 loop
> 
> so the first task 'wins' +10% CPU utilization (relative to the 50% it 
> had before), the second task 'loses' -10% CPU utilization (relative to 
> the 50% it had before).
> 
> so what the comment says is true:
> 
>  * The "10% effect" is relative and cumulative: from _any_ nice level,
>  * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
>  * it's +10% CPU usage.
> 
> for there to be a ~+10% change in CPU utilization for a task that 
> races against another CPU-intense task there needs to be a ~25% change 
> in the weight.

in any case more documentation is justified, so i've added some 
clarification to the comments - see the patch below.

	Ingo

------------------------>
Subject: sched: improve weight-array comments
From: Ingo Molnar <mingo@elte.hu>

improve the comments around the wmult array (which controls the weight
of niced tasks). Clarify that to achieve a 10% difference in CPU
utilization, a weight multiplier of 1.25 has to be used.

Signed-off-by: Ingo Molnar <mingo@elte.hu>
---
 kernel/sched.c |    4 +++-
 1 file changed, 3 insertions(+), 1 deletion(-)

Index: linux/kernel/sched.c
===================================================================
--- linux.orig/kernel/sched.c
+++ linux/kernel/sched.c
@@ -736,7 +736,9 @@ static void update_curr_load(struct rq *
  *
  * The "10% effect" is relative and cumulative: from _any_ nice level,
  * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
- * it's +10% CPU usage.
+ * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
+ * If a task goes up by ~10% and another task goes down by ~10% then
+ * the relative distance between them is ~25%.)
  */
 static const int prio_to_weight[40] = {
 /* -20 */ 88818, 71054, 56843, 45475, 36380, 29104, 23283, 18626, 14901, 11921,

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16  7:06                     ` Ingo Molnar
  2007-07-16  7:41                       ` Ingo Molnar
@ 2007-07-16 10:18                       ` Roman Zippel
  2007-07-16 11:20                         ` Ingo Molnar
  1 sibling, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-16 10:18 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: James Bruce, Thomas Gleixner, Mike Galbraith, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> yes, the weight multiplier 1.25, but the actual difference in CPU 
> utilization, when running two CPU intense tasks, is ~10%:
> 
>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>  8246 mingo     20   0  1576  244  196 R   55  0.0   0:11.96 loop
>  8247 mingo     21   1  1576  244  196 R   45  0.0   0:10.52 loop
> 
> so the first task 'wins' +10% CPU utilization (relative to the 50% it 
> had before), the second task 'loses' -10% CPU utilization (relative to 
> the 50% it had before).

As soon as you add another loop the difference changes again, while it's 
always correct to say it gets 25% more cpu time (which I still think is a 
little too much).

bye, Roman

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans -- lumpy reclaim
  2007-07-11 16:46     ` Andrew Morton
  2007-07-11 18:38       ` Andy Whitcroft
@ 2007-07-16 10:37       ` Mel Gorman
  1 sibling, 0 replies; 484+ messages in thread
From: Mel Gorman @ 2007-07-16 10:37 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Andy Whitcroft, linux-kernel, Christoph Lameter, Peter Zijlstra

On (11/07/07 09:46), Andrew Morton didst pronounce:
> On Wed, 11 Jul 2007 10:34:31 +0100 Andy Whitcroft <apw@shadowen.org> wrote:
> 
> > [Seems a PEBKAC occurred on the subject line, resending lest it become a
> > victim of "oh thats spam".]
> > 
> > Andy Whitcroft wrote:
> > > Andrew Morton wrote:
> > > 
> > > [...]
> > >> lumpy-reclaim-v4.patch
> > >> have-kswapd-keep-a-minimum-order-free-other-than-order-0.patch
> > >> only-check-absolute-watermarks-for-alloc_high-and-alloc_harder-allocations.patch
> > >>
> > >>  Lumpy reclaim.  In a similar situation to Mel's patches.  Stuck due to
> > >>  general lack of interest and effort.
> > > 
> > > The lumpy reclaim patches originally came out of work to support Mel's
> > > anti-fragmentation work.  As such I think they have become somewhat
> > > attached to those patches.  Whilst lumpy is most effective where
> > > placement controls are in place as offered by Mel's work, we see benefit
> > > from reduction in the "blunderbuss" effect when we reclaim at higher
> > > orders.  While placement control is pretty much required for the very
> > > highest orders such as huge page size, lower order allocations are
> > > benefited in terms of lower collateral damage.
> > > 
> > > There are now a few areas other than huge page allocations which can
> > > benefit.  Stacks are still order 1.  Jumbo frames want higher order
> > > contiguous pages for their incoming hardware buffers.  SLUB is showing
> > > performance benefits from moving to a higher allocation order.  All of
> > > these should benefit from more aggressive targeted reclaim, indeed I
> > > have been surprised just how often my test workloads trigger lumpy at
> > > order 1 to get new stacks.
> > > 
> > > Truly representative work loads are hard to generate for some of these.
> > >  Though we have heard some encouraging noises from those who can
> > > reproduce these problems.
> 
> I'd expect that the main application for lumpy-reclaim is in keeping a pool
> of order-2 (say) pages in reserve for GFP_ATOMIC allocators.  ie: jumbo
> frames.
> 
> At present this relies upon the wakeup_kswapd(..., order) mechanism.
> 
> How effective is this at solving the jumbo frame problem?
> 
> (And do we still have a jumbo frame problem?  Reports seem to have subsided)

The patches have an application with hugepage pool resizing.

When lumpy-reclaim is used with ZONE_MOVABLE, the hugepage pool can
be resized with greater reliability. Testing on a desktop machine with 2GB
of RAM showed that growing the hugepage pool with ZONE_MOVABLE on its own
was very slow as the success rate was quite low. Without lumpy-reclaim, each
attempt to grow the pool by 100 pages would yield 1 or 2 hugepages. With
lumpy-reclaim, getting 40 to 70 hugepages on each attempt was typical.

-- 
Mel Gorman
Part-time Phd Student                          Linux Technology Center
University of Limerick                         IBM Dublin Software Lab

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 10:18                       ` Roman Zippel
@ 2007-07-16 11:20                         ` Ingo Molnar
  2007-07-16 11:58                           ` Roman Zippel
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-16 11:20 UTC (permalink / raw)
  To: Roman Zippel
  Cc: James Bruce, Thomas Gleixner, Mike Galbraith, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > yes, the weight multiplier 1.25, but the actual difference in CPU 
> > utilization, when running two CPU intense tasks, is ~10%:
> > 
> >   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> >  8246 mingo     20   0  1576  244  196 R   55  0.0   0:11.96 loop
> >  8247 mingo     21   1  1576  244  196 R   45  0.0   0:10.52 loop
> > 
> > so the first task 'wins' +10% CPU utilization (relative to the 50% 
> > it had before), the second task 'loses' -10% CPU utilization 
> > (relative to the 50% it had before).
> 
> As soon as you add another loop the difference changes again, while 
> it's always correct to say it gets 25% more cpu time [...]

yep, and i'll add the relative effect to the comment too.

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 11:20                         ` Ingo Molnar
@ 2007-07-16 11:58                           ` Roman Zippel
  2007-07-16 12:12                             ` Ingo Molnar
  2007-07-16 17:47                             ` Linus Torvalds
  0 siblings, 2 replies; 484+ messages in thread
From: Roman Zippel @ 2007-07-16 11:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: James Bruce, Thomas Gleixner, Mike Galbraith, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> > As soon as you add another loop the difference changes again, while 
> > it's always correct to say it gets 25% more cpu time [...]
> 
> yep, and i'll add the relative effect to the comment too.

Why did you cut off the rest of the sentence?
To illustrate the problem a little differently: a task with a nice level -20 
got around 700% more cpu time (or 8 times more), now it gets 8500% more 
cpu time (or 86.7 times more).
You don't think that change to the nice levels is a little drastic?

bye, Roman

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 11:58                           ` Roman Zippel
@ 2007-07-16 12:12                             ` Ingo Molnar
  2007-07-16 12:42                               ` Roman Zippel
  2007-07-16 17:47                             ` Linus Torvalds
  1 sibling, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-16 12:12 UTC (permalink / raw)
  To: Roman Zippel
  Cc: James Bruce, Thomas Gleixner, Mike Galbraith, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> On Mon, 16 Jul 2007, Ingo Molnar wrote:
> 
> > > As soon as you add another loop the difference changes again, 
> > > while it's always correct to say it gets 25% more cpu time [...]
> > 
> > yep, and i'll add the relative effect to the comment too.
> 
> Why did you cut off the rest of the sentence?

(no need to become hostile, i answered to that portion of your sentence 
separately, which was logically detached from the other portion of your 
sentence. I marked the cut with the '[...]' sign. )

> To illustrate the problem a little different: a task with a nice level 
> -20 got around 700% more cpu time (or 8 times more), now it gets 8500% 
> more cpu time (or 86.7 times more). You don't think that change to the 
> nice levels is a little drastic?

This was discussed on lkml in detail, see the CFS threads. It has been a 
common request for nice levels to be more logical (i.e. to make them 
universal and to detach them from HZ) and for them to be more effective 
as well.

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 12:12                             ` Ingo Molnar
@ 2007-07-16 12:42                               ` Roman Zippel
  2007-07-16 13:40                                 ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-16 12:42 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: James Bruce, Thomas Gleixner, Mike Galbraith, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> > > > As soon as you add another loop the difference changes again, 
> > > > while it's always correct to say it gets 25% more cpu time [...]
> > > 
> > > yep, and i'll add the relative effect to the comment too.
> > 
> > Why did you cut off the rest of the sentence?
> 
> (no need to become hostile, i answered to that portion of your sentence 
> separately, which was logically detached from the other portion of your 
> sentence. I marked the cut with the '[...]' sign. )

Could you please stop with these accusations?
Could you please point me to the mail with the separate answer?

> > To illustrate the problem a little different: a task with a nice level 
> > -20 got around 700% more cpu time (or 8 times more), now it gets 8500% 
> > more cpu time (or 86.7 times more). You don't think that change to the 
> > nice levels is a little drastic?
> 
> This was discussed on lkml in detail, see the CFS threads.

Which are quite big, so I skipped most of it, a more precise pointer would 
be appreciated.

> It has been a 
> common request for nice levels to be more logical (i.e. to make them 
> universal and to detach them from HZ) and for them to be more effective 
> as well.

Huh? What has this to do with HZ? The scheduler used ticks internally, but 
it's irrelevant to what the user sees via the nice levels.
So the question still stands that this change may be a little drastic, as 
you changed the nice levels of _all_ users, not just of those who were 
previously interested in CFS.

bye, Roman

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 12:42                               ` Roman Zippel
@ 2007-07-16 13:40                                 ` Ingo Molnar
  2007-07-16 14:01                                   ` Roman Zippel
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-16 13:40 UTC (permalink / raw)
  To: Roman Zippel
  Cc: James Bruce, Thomas Gleixner, Mike Galbraith, Linus Torvalds,
	Andi Kleen, Andrew Morton, linux-kernel, Arjan van de Ven,
	Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > It has been a common request for nice levels to be more logical 
> > (i.e. to make them universal and to detach them from HZ) and for 
> > them to be more effective as well.
> 
> Huh? What has this to do with HZ? The scheduler used ticks internally, 
> but it's irrelevant to what the user sees via the nice levels. [...]

unfortunately you are wrong again - there are various HZ related 
artifacts in the nice level support code of the old scheduler.

v2.6.22, CONFIG_HZ=100, nice +19 task against a nice-0 CPU-intense task:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2446 mingo     25   0  1576  244  196 R 90.9  0.0   0:32.79 loop
 2448 mingo     39  19  1580  248  196 R  9.1  0.0   0:02.94 loop

v2.6.22, CONFIG_HZ=250, nice +19 task against a nice-0 CPU-intense task:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2358 mingo     25   0  1576  248  196 R 96.1  0.0   0:31.97 loop_silent
 2363 mingo     39  19  1576  244  196 R  3.9  0.0   0:01.24 loop_silent

v2.6.22, CONFIG_HZ=300, nice +19 task against a nice-0 CPU-intense task:

  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
 2332 mingo     25   0  1580  248  196 R 95.1  0.0   0:11.84 loop_silent
 2335 mingo     39  19  1576  244  196 R  3.1  0.0   0:00.39 loop_silent

to sum it up: a nice +19 task (the most commonly used nice level in 
practice) gets 9.1%, 3.9%, 3.1% of CPU time on the old scheduler, 
depending on the value of HZ. This is quite inconsistent and illogical.

this HZ dependency of nice levels existed for many years, and the new 
scheduler solves that inconsistency - every nice level will get the same 
amount of time, regardless of HZ.

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 13:40                                 ` Ingo Molnar
@ 2007-07-16 14:01                                   ` Roman Zippel
  2007-07-16 20:31                                     ` Matt Mackall
  0 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-16 14:01 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: James Bruce, Thomas Gleixner, Mike Galbraith, Linus Torvalds,
	Andi Kleen, Andrew Morton, linux-kernel, Arjan van de Ven,
	Chris Wright

Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> to sum it up: a nice +19 task (the most commonly used nice level in 
> practice) gets 9.1%, 3.9%, 3.1% of CPU time on the old scheduler, 
> depending on the value of HZ. This is quite inconsistent and illogical.

You're correct that you can find artifacts in the extreme cases, it's 
subjective whether this is a serious problem.
It's nice that these artifacts are gone, but that still doesn't explain 
why this ratio had to be increased that much from around 1:10 to 1:69.

bye, Roman

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16  7:41                       ` Ingo Molnar
@ 2007-07-16 15:02                         ` James Bruce
  0 siblings, 0 replies; 484+ messages in thread
From: James Bruce @ 2007-07-16 15:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Thomas Gleixner, Roman Zippel, Mike Galbraith, Linus Torvalds,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Ingo Molnar wrote:
> * Ingo Molnar <mingo@elte.hu> wrote:
>> * James Bruce <bruce@andrew.cmu.edu> wrote:
>>> While we're at it, isn't the comment above the wmult table incorrect? 
>>> The multiplier is 1.25, meaning a 25% change per nice level, not 10%.
>> yes, the weight multiplier 1.25, but the actual difference in CPU 
>> utilization, when running two CPU intense tasks, is ~10%:
>>
>>   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
>>  8246 mingo     20   0  1576  244  196 R   55  0.0   0:11.96 loop
>>  8247 mingo     21   1  1576  244  196 R   45  0.0   0:10.52 loop
>>
>> so the first task 'wins' +10% CPU utilization (relative to the 50% it 
>> had before), the second task 'loses' -10% CPU utilization (relative to 
>> the 50% it had before).
>>
>> so what the comment says is true:
>>
>>  * The "10% effect" is relative and cumulative: from _any_ nice level,
>>  * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
>>  * it's +10% CPU usage.
>>
>> for there to be a ~+10% change in CPU utilization for a task that 
>> races against another CPU-intense task there needs to be a ~25% change 
>> in the weight.
> 
> in any case more documentation is justified, so i've added some 
> clarification to the comments - see the patch below.

Ah ok, so it's 10% of the original CPU usage, not relative to a task's
share from before.  While I guess I still think in terms of relative CPU
share, your comments now make sense to me.  Thanks for the
clarification.

  - Jim

> ------------------------>
> Subject: sched: improve weight-array comments
> From: Ingo Molnar <mingo@elte.hu>
> 
> improve the comments around the wmult array (which controls the weight
> of niced tasks). Clarify that to achieve a 10% difference in CPU
> utilization, a weight multiplier of 1.25 has to be used.
> 
> Signed-off-by: Ingo Molnar <mingo@elte.hu>
> ---
>  kernel/sched.c |    4 +++-
>  1 file changed, 3 insertions(+), 1 deletion(-)
> 
> Index: linux/kernel/sched.c
> ===================================================================
> --- linux.orig/kernel/sched.c
> +++ linux/kernel/sched.c
> @@ -736,7 +736,9 @@ static void update_curr_load(struct rq *
>   *
>   * The "10% effect" is relative and cumulative: from _any_ nice level,
>   * if you go up 1 level, it's -10% CPU usage, if you go down 1 level
> - * it's +10% CPU usage.
> + * it's +10% CPU usage. (to achieve that we use a multiplier of 1.25.
> + * If a task goes up by ~10% and another task goes down by ~10% then
> + * the relative distance between them is ~25%.)
>   */
>  static const int prio_to_weight[40] = {
>  /* -20 */ 88818, 71054, 56843, 45475, 36380, 29104, 23283, 18626, 14901, 11921,


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 11:58                           ` Roman Zippel
  2007-07-16 12:12                             ` Ingo Molnar
@ 2007-07-16 17:47                             ` Linus Torvalds
  2007-07-16 18:12                               ` Roman Zippel
  2007-07-18 10:27                               ` Peter Zijlstra
  1 sibling, 2 replies; 484+ messages in thread
From: Linus Torvalds @ 2007-07-16 17:47 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, James Bruce, Thomas Gleixner, Mike Galbraith,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright



On Mon, 16 Jul 2007, Roman Zippel wrote:
>
> To illustrate the problem a little different: a task with a nice level -20 
> got around 700% more cpu time (or 8 times more), now it gets 8500% more 
> cpu time (or 86.7 times more).

Ingo, that _does_ sound excessive. 

How about trying a much less aggressive nice-level (and preferably linear, 
not exponential)?

		Linus

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 17:47                             ` Linus Torvalds
@ 2007-07-16 18:12                               ` Roman Zippel
  2007-07-18 10:27                               ` Peter Zijlstra
  1 sibling, 0 replies; 484+ messages in thread
From: Roman Zippel @ 2007-07-16 18:12 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, James Bruce, Thomas Gleixner, Mike Galbraith,
	Andrea Arcangeli, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Mon, 16 Jul 2007, Linus Torvalds wrote:

> How about trying a much less aggressive nice-level (and preferably linear, 
> not exponential)?

I think the exponential increase isn't the problem. The old code did 
approximate something like this rather crudely with the result that there 
was a big gap between level 0 and -1.

Something like this:

echo 'for (i=-20;i<=20;i++) print i, " : ", 1024*e(l(2)*(-i/20*3)), "\n";' | bc -l

would produce a range similar to the old code. Replacing the factor 3 
with 4 would IMO be a more reasonable increase, and would have the advantage
for the user that it's easier to understand that every 5 levels the time a 
process gets is doubled.

bye, Roman

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 14:01                                   ` Roman Zippel
@ 2007-07-16 20:31                                     ` Matt Mackall
  2007-07-16 21:18                                       ` Ingo Molnar
  2007-07-16 21:25                                       ` Roman Zippel
  0 siblings, 2 replies; 484+ messages in thread
From: Matt Mackall @ 2007-07-16 20:31 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Ingo Molnar, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

On Mon, Jul 16, 2007 at 04:01:17PM +0200, Roman Zippel wrote:
> Hi,
> 
> On Mon, 16 Jul 2007, Ingo Molnar wrote:
> 
> > to sum it up: a nice +19 task (the most commonly used nice level in 
> > practice) gets 9.1%, 3.9%, 3.1% of CPU time on the old scheduler, 
> > depending on the value of HZ. This is quite inconsistent and illogical.
> 
> You're correct that you can find artifacts in the extreme cases, it's 
> subjective whether this is a serious problem.
> It's nice that these artifacts are gone, but that still doesn't explain 
> why this ratio had to be increase that much from around 1:10 to 1:69.

More dynamic range is better? If you actually want a task to get 20x
the CPU time of another, the older scheduler doesn't really allow it.

Getting 1/69th of a modern CPU is still a fair number of cycles.
Never mind 1/69th of a machine with > 64 cores.

-- 
Mathematics is the supreme nostalgia of our time.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 20:31                                     ` Matt Mackall
@ 2007-07-16 21:18                                       ` Ingo Molnar
  2007-07-16 22:13                                         ` Roman Zippel
  2007-07-16 21:25                                       ` Roman Zippel
  1 sibling, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-16 21:18 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Roman Zippel, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Matt Mackall <mpm@selenic.com> wrote:

> More dynamic range is better? If you actually want a task to get 20x 
> the CPU time of another, the older scheduler doesn't really allow it.
> 
> Getting 1/69th of a modern CPU is still a fair number of cycles. 
> Nevermind 1/69th of a machine with > 64 cores.

yeah. furthermore, nice -20 is only admin-selectable.

Here are the current CPU-use values for positive nice levels:

 nice  0: 100.00%
 nice  1: 80.00%
 nice  2: 64.10%
 nice  3: 51.28%
 nice  4: 40.98%
 nice  5: 32.78%
 nice  6: 26.24%
 nice  7: 21.00%
 nice  8: 16.77%
 nice  9: 13.42%
 nice 10: 10.74%
 nice 11: 8.59%
 nice 12: 6.87%
 nice 13: 5.50%
 nice 14: 4.39%
 nice 15: 3.51%
 nice 16: 2.81%
 nice 17: 2.25%
 nice 18: 1.80%
 nice 19: 1.44%

here's the CPU utilization table for negative nice levels (relative to a 
nice -20 task):

 nice  0: 1.15%
 nice -1: 1.44%
 nice -2: 1.80%
 nice -3: 2.25%
 nice -4: 2.81%
 nice -5: 3.51%
 nice -6: 4.39%
 nice -7: 5.50%
 nice -8: 6.87%
 nice -9: 8.59%
 nice -10: 10.74%
 nice -11: 13.42%
 nice -12: 16.77%
 nice -13: 21.00%
 nice -14: 26.24%
 nice -15: 32.78%
 nice -16: 40.98%
 nice -17: 51.28%
 nice -18: 64.10%
 nice -19: 80.00%
 nice -20: 100.00%

these are pretty sane, and symmetric across the origin. Nice -20 is the 
odd one out, because there is no nice +20. But its value is still 
logical, it's the mirror image of an imaginary nice +20.

and note that even on the old scheduler, nice-0 was "3200% more 
powerful" than nice +19 (with CONFIG_HZ=300), and nice -19 was only 700% 
more powerful than nice-0. So not only was it inconsistent (and i can 
create scary numbers too ;), it gave the admin-controlled negative nice 
levels less of a punch than the user-controlled nice +19. A number of 
people complained about that, and CFS addresses this.

in fact i like it that nice -20 has a slightly bigger punch than it used 
to have before: it might remove the need to run audio apps (and other 
multimedia apps) under SCHED_FIFO. (SCHED_FIFO is unprotected against 
lockups, while under CFS a nice 0 task is still starvation protected 
against a nice -20 task.)

furthermore, there is a quality of implementation issue as well, look at 
the definition of the nice system call:

   asmlinkage long sys_nice(int increment)

the "increment" is relative. So nice(1) has the same behavioral effect 
under CFS, regardless of which nice level you start out from. Under the 
old scheduler, the result depended on which nice level you started out 
from.
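
The same geometric model shows why a relative nice(1) behaves the same everywhere: two CPU-bound tasks split the CPU in proportion to their weights, and that proportion depends only on their nice difference. This minimal sketch assumes weights that shrink by exactly 1.25x per level; the helper names are hypothetical.

```python
STEP = 1.25  # assumed weight ratio between adjacent nice levels


def weight(nice: int) -> float:
    """Assumed weight: 1.0 at nice 0, divided by STEP for every level above it."""
    return STEP ** -nice


def split(nice_a: int, nice_b: int) -> tuple[float, float]:
    """Percent of one CPU given to each of two always-runnable tasks."""
    wa, wb = weight(nice_a), weight(nice_b)
    total = wa + wb
    return 100.0 * wa / total, 100.0 * wb / total


# Renicing one of two equal tasks by +1 gives the same split at any starting level.
print(split(0, 1))    # roughly 55.6% / 44.4%
print(split(10, 11))  # same split, up to floating-point rounding
```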

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 20:31                                     ` Matt Mackall
  2007-07-16 21:18                                       ` Ingo Molnar
@ 2007-07-16 21:25                                       ` Roman Zippel
  2007-07-17  7:53                                         ` Ingo Molnar
  1 sibling, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-16 21:25 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Ingo Molnar, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Mon, 16 Jul 2007, Matt Mackall wrote:

> > It's nice that these artifacts are gone, but that still doesn't explain 
> > why this ratio had to be increased that much from around 1:10 to 1:69.
> 
> More dynamic range is better? If you actually want a task to get 20x
> the CPU time of another, the older scheduler doesn't really allow it.

You can already have that: the complete range from nice 19 to -20 was 
about 1:80.
There is also such a thing as too much range. I tried it with top at 19, and 
as soon as something runs at -20 top is practically dead, because it now 
gets only 1/5900 of the CPU time.
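
The 1/5900 figure can be roughly checked: 39 nice steps separate +19 from -20. The pure 1.25x model gives about 1:6000. The integer weights below, 88761 for nice -20 and 15 for nice +19, are believed to be the CFS `prio_to_weight[]` entries and give about 1:5900. Treat both the model and those two constants as assumptions.

```python
STEP = 1.25                   # assumed per-level weight ratio
WEIGHT_NICE_MINUS_20 = 88761  # assumed prio_to_weight[] entry for nice -20
WEIGHT_NICE_PLUS_19 = 15      # assumed prio_to_weight[] entry for nice +19

geometric = STEP ** (19 - (-20))                     # 39 steps apart
integer = WEIGHT_NICE_MINUS_20 / WEIGHT_NICE_PLUS_19

# With only these two tasks runnable, the nice +19 task's share is 1/(ratio + 1).
print(f"geometric model: 1:{geometric:.0f}")
print(f"integer weights: 1:{integer:.0f}")
```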

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 21:18                                       ` Ingo Molnar
@ 2007-07-16 22:13                                         ` Roman Zippel
  2007-07-16 22:29                                           ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-16 22:13 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Mon, 16 Jul 2007, Ingo Molnar wrote:

> and note that even on the old scheduler, nice-0 was "3200% more 
> powerful" than nice +19 (with CONFIG_HZ=300),

How did you get that value? At any HZ the ratio should be around 1:10
(+- rounding error).

> in fact i like it that nice -20 has a slightly bigger punch than it used 
> to have before:

"Slightly bigger"??? You're joking, right?
Especially the user levels are doing something completely different now, 
which may break user expectations. While the user couldn't expect anything 
precise, it's still a big difference whether a process at nice 5 gets 75% 
of the time or only 30%.

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 22:13                                         ` Roman Zippel
@ 2007-07-16 22:29                                           ` Ingo Molnar
  2007-07-17  0:02                                             ` Roman Zippel
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-16 22:29 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> Hi,
> 
> On Mon, 16 Jul 2007, Ingo Molnar wrote:
> 
> > and note that even on the old scheduler, nice-0 was "3200% more 
> > powerful" than nice +19 (with CONFIG_HZ=300),
> 
> How did you get that value? At any HZ the ratio should be around 1:10 
> (+- rounding error).

you are wrong again. I sent you the numbers earlier today already:

|   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
|  2332 mingo     25   0  1580  248  196 R 95.1  0.0   0:11.84 loop
|  2335 mingo     39  19  1576  244  196 R  3.1  0.0   0:00.39 loop

3.1% is 3067% more than 95.1%, and the ratio is 1:30.67. You again deny 
above that this is the case, and there's nothing i can do about your 
denial of facts - that is your own private problem.
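
The ratio follows directly from the two %CPU values in the top output. "N times as much" and "N% more" are easy to mix up in this thread, so the short calculation below shows both for the quoted figures. The same conversion links the "8 times" and "700% more" figures used for nice -20. It is plain arithmetic, not a scheduler model.

```python
def compare(a: float, b: float) -> tuple[float, float, float]:
    """Return (a/b, a as a percentage of b, how many percent more a is than b)."""
    ratio = a / b
    return ratio, 100.0 * ratio, 100.0 * (ratio - 1.0)


# nice 0 vs nice +19, %CPU values from the top output above
ratio, pct_of, pct_more = compare(95.1, 3.1)
print(f"ratio 1:{ratio:.2f}, {pct_of:.0f}% of, {pct_more:.0f}% more")
```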

	Ingo

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 22:29                                           ` Ingo Molnar
@ 2007-07-17  0:02                                             ` Roman Zippel
  2007-07-17  3:20                                               ` Roman Zippel
  0 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-17  0:02 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Tue, 17 Jul 2007, Ingo Molnar wrote:

> * Roman Zippel <zippel@linux-m68k.org> wrote:
> 
> > Hi,
> > 
> > On Mon, 16 Jul 2007, Ingo Molnar wrote:
> > 
> > > and note that even on the old scheduler, nice-0 was "3200% more 
> > > powerful" than nice +19 (with CONFIG_HZ=300),
> > 
> > How did you get that value? At any HZ the ratio should be around 1:10 
> > (+- rounding error).
> 
> you are wrong again. I sent you the numbers earlier today already:
> 
> |   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND
> |  2332 mingo     25   0  1580  248  196 R 95.1  0.0   0:11.84 loop
> |  2335 mingo     39  19  1576  244  196 R  3.1  0.0   0:00.39 loop
> 
> 3.1% is 3067% more than 95.1%, and the ratio is 1:30.67. You again deny 
> above that this is the case, and there's nothing i can do about your 
> denial of facts - that is your own private problem.

Ingo, how am I supposed to react to this? I'm asking a simple question
and I get this? I'm at serious loss how to deal with you. :-(

The above is based on theoretical values: for a 300HZ kernel these two 
processes should get 30 and 3 ticks. Should there be any rounding error or 
off-by-one error, so that the processes get one tick less than they should 
get or one tick is accounted to the wrong process, my theoretical value is 
still within the possible error range and doesn't contradict your
practical values.
Playing around with some other nice levels confirms the theory that 
something is a little off, so I'm quite correct in saying that the ratio 
_should_ be 1:10.
OTOH you are the one who is wrong about me (again). :-(

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-17  0:02                                             ` Roman Zippel
@ 2007-07-17  3:20                                               ` Roman Zippel
  2007-07-17  8:02                                                 ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-17  3:20 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Tue, 17 Jul 2007, I wrote:

> Playing around with some other nice levels confirms the theory that 
> something is a little off, so I'm quite correct in saying that the ratio 
> _should_ be 1:10.

Rechecking everything, I found there was actually a small error in my test 
program, so the ratio should be 1:20. Sorry about that mistake.
Nice level 19 shows the largest artifacts, as that level only gets a 
single tick, so the ratio is often 1:HZ/10 (except for 1000HZ where it's 
5:100). Nevertheless it's still true that in general nice levels were 
independent of HZ (that's all I wanted to say a couple of mails ago).
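
The HZ dependence described here falls out of integer tick arithmetic. The sketch below models the 2.6 O(1) scheduler's static-priority timeslice, reconstructed from memory of the `DEF_TIMESLICE`, `MIN_TIMESLICE` and `SCALE_PRIO` macros. Treat the formula as an assumption, not a quote of kernel/sched.c. It also ignores the old scheduler's interactivity bonuses and assumes two CPU hogs share the CPU in proportion to their timeslices.

```python
MAX_PRIO = 140       # assumed: 100 RT priorities + 40 nice levels
MAX_USER_PRIO = 40
NICE_0_PRIO = 120    # static priority of a nice-0 task


def timeslice_ticks(nice: int, hz: int) -> int:
    """Assumed O(1)-scheduler timeslice, in ticks, for a task at `nice`."""
    def_timeslice = 100 * hz // 1000          # 100 ms in ticks
    min_timeslice = max(5 * hz // 1000, 1)    # 5 ms, but at least one tick
    static_prio = NICE_0_PRIO + nice
    # negative nice levels scale from a 4x larger base slice
    base = def_timeslice * 4 if static_prio < NICE_0_PRIO else def_timeslice
    return max(base * (MAX_PRIO - static_prio) // (MAX_USER_PRIO // 2), min_timeslice)


for hz in (100, 250, 300, 1000):
    t0, t19 = timeslice_ticks(0, hz), timeslice_ticks(19, hz)
    print(f"HZ={hz:4d}: nice 0 = {t0:3d} ticks, nice +19 = {t19} ticks, ratio 1:{t0 // t19}")
```

Under these assumptions nice +19 gets a single tick at 100, 250 and 300 HZ, giving 1:HZ/10. At 1000 HZ it gets five ticks, giving 5:100. Nice -20 gets 8x the nice-0 slice at every HZ shown.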

Ingo, you can start gloating now, but unlike you I have no problem 
with admitting mistakes and apologizing for them. The point is just that 
I'm reacting better to factual arguments instead of flames (and I think 
it's not just me), so I'm pretty sure I'm still correct about this:

> OTOH you are the one who is wrong about me (again). :-(

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 21:25                                       ` Roman Zippel
@ 2007-07-17  7:53                                         ` Ingo Molnar
  2007-07-17 15:12                                           ` Roman Zippel
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-17  7:53 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > > It's nice that these artifacts are gone, but that still doesn't 
> > > explain why this ratio had to be increased that much from around 
> > > 1:10 to 1:69.
> > 
> > More dynamic range is better? If you actually want a task to get 20x 
> > the CPU time of another, the older scheduler doesn't really allow 
> > it.
> 
> You can already have that: the complete range from nice 19 to -20 was 
> about 1:80.

But that is irrelevant: all tasks start out at nice 0, and what matters 
is the dynamic range around 0.

So the dynamic range has been made uniform: in the positive direction it 
went from 1:10...1:20...1:30 to 1:69 for nice +19, and in the negative 
direction from 1:8 to 1:69 (with 1:86 for nice -20). If you look at the 
negative nice levels alone it's a substantial increase, but if you compare 
it with positive nice levels you'll see that similar dynamic ranges were 
already present in the old scheduler, and why we've done it.

Negative nice levels are admin-controlled, so the increase in the negative 
levels is not a big issue and people actually like the increased 
dynamic range and the consistency. The positive range _might_ be a 
bigger issue but there we were largely inconsistent anyway, and again, 
people like the increased dynamic range.

	Ingo

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-17  3:20                                               ` Roman Zippel
@ 2007-07-17  8:02                                                 ` Ingo Molnar
  2007-07-17 14:06                                                   ` Roman Zippel
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-17  8:02 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> Nice level 19 shows the largest artifacts, as that level only gets a 
> single tick, so the ratio is often 1:HZ/10 (except for 1000HZ where 
> it's 5:100). [...]

Roman, please do me a favor, and ask me the following question:

 " Ingo, you've been maintaining the scheduler for years. In fact you 
   wrote the old nice code we are talking about here. You changed it a
   number of times since then. So you really know what's going on here. 
   Why does the old nice code behave like that for nice +19 levels? "

I've been waiting for that obvious question, and i _might_ be able to 
answer it, but somehow it never occurred to you ;-) Thanks,

	Ingo

* unprivileged mounts (was: Re: -mm merge plans for 2.6.23)
  2007-07-10  8:31 -mm merge plans for 2.6.23 Andrew Morton
                   ` (25 preceding siblings ...)
  2007-07-13  9:46 ` -mm merge plans for 2.6.23 Jan Engelhardt
@ 2007-07-17  8:55 ` Andrew Morton
  26 siblings, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-17  8:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: Christoph Hellwig, Al Viro, Miklos Szeredi

On Tue, 10 Jul 2007 01:31:52 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> unprivileged-mounts-add-user-mounts-to-the-kernel.patch
> unprivileged-mounts-allow-unprivileged-umount.patch
> unprivileged-mounts-account-user-mounts.patch
> unprivileged-mounts-propagate-error-values-from-clone_mnt.patch
> unprivileged-mounts-allow-unprivileged-bind-mounts.patch
> unprivileged-mounts-put-declaration-of-put_filesystem-in-fsh.patch
> unprivileged-mounts-allow-unprivileged-mounts.patch
> unprivileged-mounts-allow-unprivileged-fuse-mounts.patch
> unprivileged-mounts-propagation-inherit-owner-from-parent.patch
> unprivileged-mounts-add-no-submounts-flag.patch
> 
>  Don't know.  Need to ping suitable developers over this work.

ping.

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-17  8:02                                                 ` Ingo Molnar
@ 2007-07-17 14:06                                                   ` Roman Zippel
  2007-07-18 10:40                                                     ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-17 14:06 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Tue, 17 Jul 2007, Ingo Molnar wrote:

> Roman, please do me a favor, and ask me the following question:
> 
>  " Ingo, you've been maintaining the scheduler for years. In fact you 
>    wrote the old nice code we are talking about here. You changed it a
>    number of times since then. So you really know what's going on here. 
>    Why does the old nice code behave like that for nice +19 levels? "
> 
> I've been waiting for that obvious question, and i _might_ be able to 
> answer it, but somehow it never occurred to you ;-) Thanks,

Do you have any idea how insulting and arrogant this is?
Let me translate for you how this came across:

"O Ingo, who art our god of the scheduler. You have blessed the paths I 
walked in. You kept me from sinning numerous times. Your wisdom is 
infinite. Guide me on the journey that layeth ahead of me into this world 
knowledge of Your truth."

(I apologize in advance, in case I have hurt anyone's religious 
feelings.)

It's obvious that you have more experience with the scheduler code, but 
does that make you infallible? Does that give you the right to act like a 
jerk?
I do make mistakes, I try to learn from them and life goes on, I have no 
problem with that, but what I have a problem with is if someone is abusing 
this to his own advantage. I have to be extremely careful about what I say to 
you, because you jump on the first small mistake and I have to bear your 
insults like "there's nothing i can do about your denial of facts - that 
is your own private problem." I have no problems with facts, I'm only 
trying very hard to ignore your arrogant behaviour...
If you have something to contribute to this discussion which might clear 
things up, then just say it, but I'm not going to beg for it.

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-17  7:53                                         ` Ingo Molnar
@ 2007-07-17 15:12                                           ` Roman Zippel
  0 siblings, 0 replies; 484+ messages in thread
From: Roman Zippel @ 2007-07-17 15:12 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Tue, 17 Jul 2007, Ingo Molnar wrote:

> * Roman Zippel <zippel@linux-m68k.org> wrote:
> 
> > > > It's nice that these artifacts are gone, but that still doesn't 
> > > > explain why this ratio had to be increased that much from around 
> > > > 1:10 to 1:69.
> > > 
> > > More dynamic range is better? If you actually want a task to get 20x 
> > > the CPU time of another, the older scheduler doesn't really allow 
> > > it.
> > 
> > You can already have that: the complete range from nice 19 to -20 was 
> > about 1:80.
> 
> But that is irrelevant: all tasks start out at nice 0, and what matters 
> is the dynamic range around 0.
> 
> So the dynamic range has been made uniform: in the positive direction it 
> went from 1:10...1:20...1:30 to 1:69 for nice +19, and in the negative 
> direction from 1:8 to 1:69 (with 1:86 for nice -20). If you look at the 
> negative nice levels alone it's a substantial increase, but if you compare 
> it with positive nice levels you'll see that similar dynamic ranges were 
> already present in the old scheduler, and why we've done it.

So let's look at them:

for (i=0;i<20;i++) print i, " : ", (20-i)*5, " : ", 100*1.25^-i, " : ", e(l(2)*(-i/5))*100, "\n";
0 : 100 : 100 : 100.00000000000000000000
1 : 95 : 80.00000000000000000000 : 87.05505632961241391300
2 : 90 : 64.00000000000000000000 : 75.78582832551990411700
3 : 85 : 51.20000000000000000000 : 65.97539553864471296900
4 : 80 : 40.96000000000000000000 : 57.43491774985175034000
5 : 75 : 32.76800000000000000000 : 50.00000000000000000000
6 : 70 : 26.21440000000000000000 : 43.52752816480620695700
7 : 65 : 20.97152000000000000000 : 37.89291416275995205900
8 : 60 : 16.77721600000000000000 : 32.98769776932235648400
9 : 55 : 13.42177280000000000000 : 28.71745887492587517000
10 : 50 : 10.73741824000000000000 : 25.00000000000000000000
11 : 45 : 8.58993459200000000000 : 21.76376408240310347800
12 : 40 : 6.87194767360000000000 : 18.94645708137997602900
13 : 35 : 5.49755813888000000000 : 16.49384888466117824200
14 : 30 : 4.39804651110400000000 : 14.35872943746293758500
15 : 25 : 3.51843720888320000000 : 12.50000000000000000000
16 : 20 : 2.81474976710656000000 : 10.88188204120155173900
17 : 15 : 2.25179981368524800000 : 9.47322854068998801400
18 : 10 : 1.80143985094819840000 : 8.24692444233058912100
19 : 5 : 1.44115188075855872000 : 7.17936471873146879200

(nice level : old % : new % : my suggested %)

Your levels diverge very quickly from what they used to be (up to a factor 
of 7), it's also not really easy to remember what the individual levels 
mean.
I at least try to keep them somewhat in the range they used to be (and 
the difference is limited to a factor of about 2), also every 5 levels the 
amount of cpu time is halved, which is very easy to remember.
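
The bc one-liner above can be ported to Python to check the two claimed deviation factors. This sketch takes the three columns exactly as the table defines them: old is linear 5% steps, CFS is 1.25x per level, and the suggestion halves every 5 levels. It reports the largest ratio between each new scheme and the old percentage over nice 0..19. The function names are hypothetical.

```python
def old(n: int) -> float:
    return (20 - n) * 5.0           # old column: linear in the nice level


def cfs(n: int) -> float:
    return 100.0 * 1.25 ** -n       # CFS column: 1.25x per level


def suggested(n: int) -> float:
    return 100.0 * 2.0 ** (-n / 5)  # suggested column: share halves every 5 levels


def max_factor(model) -> float:
    """Largest ratio between the old percentage and `model` over nice 0..19."""
    return max(max(old(n), model(n)) / min(old(n), model(n)) for n in range(20))


print(f"CFS deviates from old by up to {max_factor(cfs):.2f}x")
print(f"suggested deviates from old by up to {max_factor(suggested):.2f}x")
```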

If you need more dynamic range, is there a law that prevents us from going 
beyond 19? For example:

for (i=20;i<=30;i++) print i, " : ", (20-i)*5, " : ", 100*1.25^-i, " : ", e(l(2)*(-i/5))*100, "\n";
20 : 0 : 1.15292150460684697600 : 6.25000000000000000000
21 : -5 : .92233720368547758000 : 5.44094102060077586900
22 : -10 : .73786976294838206400 : 4.73661427034499400700
23 : -15 : .59029581035870565100 : 4.12346222116529456000
24 : -20 : .47223664828696452100 : 3.58968235936573439600
25 : -25 : .37778931862957161700 : 3.12500000000000000000
26 : -30 : .30223145490365729300 : 2.72047051030038793400
27 : -35 : .24178516392292583400 : 2.36830713517249700300
28 : -40 : .19342813113834066700 : 2.06173111058264728000
29 : -45 : .15474250491067253400 : 1.79484117968286719800
30 : -50 : .12379400392853802700 : 1.56250000000000000000

setpriority() accepts such values without error.
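
What the kernel does with such a request can be checked from user space. This sketch uses Python's `os.setpriority`/`os.getpriority` in a throwaway child process, so the caller's own nice value is untouched, and asks for nice 30. On Linux the call succeeds, but the kernel clamps out-of-range values to the [-20, 19] range. Honouring levels above 19 would therefore need a kernel change. The helper name is hypothetical.

```python
import os
import subprocess
import sys

CHILD = """
import os, sys
os.setpriority(os.PRIO_PROCESS, 0, int(sys.argv[1]))  # no error is raised
print(os.getpriority(os.PRIO_PROCESS, 0))            # the value the kernel kept
"""


def nice_after_request(requested: int) -> int:
    """Request a nice value in a child process and return the value actually set."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD, str(requested)],
        capture_output=True, text=True, check=True,
    )
    return int(result.stdout)


if __name__ == "__main__":
    print(nice_after_request(30))
```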

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-16 17:47                             ` Linus Torvalds
  2007-07-16 18:12                               ` Roman Zippel
@ 2007-07-18 10:27                               ` Peter Zijlstra
  2007-07-18 12:45                                 ` Roman Zippel
  1 sibling, 1 reply; 484+ messages in thread
From: Peter Zijlstra @ 2007-07-18 10:27 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Roman Zippel, Ingo Molnar, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

On Mon, 2007-07-16 at 10:47 -0700, Linus Torvalds wrote:
> 
> On Mon, 16 Jul 2007, Roman Zippel wrote:
> >
> > To illustrate the problem a little different: a task with a nice level -20 
> > got around 700% more cpu time (or 8 times more), now it gets 8500% more 
> > cpu time (or 86.7 times more).
> 
> Ingo, that _does_ sound excessive. 
> 
> How about trying a much less aggressive nice-level (and preferably linear, 
> not exponential)?

I actually like the extra range, it allows for a much softer punch of
background tasks even on somewhat slower boxen.

I've been testing CFS on my 1200 MHz lappy for some time and a strongly
niced kbuild leaves a very usable system. 

The old scheduler would leave the thing rather jumpy. And while CFS
fully fixes the jumpiness, I just ran a nice +13 kbuild (which should be
equivalent to the old scheduler's nice +19 for my HZ) and a nice +19
kbuild, and I can definitely feel the difference between them.

Early CFS versions had a pretty aggressive nice range (0.1% for +19),
and that has been toned down based on feedback. The current levels seem
to work well, at least on my boxen.

- Peter


* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-17 14:06                                                   ` Roman Zippel
@ 2007-07-18 10:40                                                     ` Ingo Molnar
  2007-07-18 12:40                                                       ` Roman Zippel
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-18 10:40 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> On Tue, 17 Jul 2007, Ingo Molnar wrote:
> 
> > Roman, please do me a favor, and ask me the following question:
> > 
> >  " Ingo, you've been maintaining the scheduler for years. In fact you 
> >    wrote the old nice code we are talking about here. You changed it a
> >    number of times since then. So you really know what's going on here. 
> >    Why does the old nice code behave like that for nice +19 levels? "
> > 
> > I've been waiting for that obvious question, and i _might_ be able 
> > to answer it, but somehow it never occurred to you ;-) Thanks,

[...]
> It's obvious that you have more experience with the scheduler code, 
> but does that make you infallible? [...]

Roman, it is really not about 'experience', and yes, we all make 
frequent mistakes.

it's about the plain fact that i happened to write _both_ the old and 
the new code you were talking about all along. In this discussion about 
nice levels you were (very) aggressively asserting things that were 
untrue, you were suggesting that i don't understand the code, instead of 
simply asking me why the code was written in such a way and what the 
motivation behind it was. I'd be glad to attempt to answer such a 
friendly question, if you are interested in asking it and if you are 
interested in my answer. Thanks,

	Ingo

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 10:40                                                     ` Ingo Molnar
@ 2007-07-18 12:40                                                       ` Roman Zippel
  2007-07-18 16:17                                                         ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-18 12:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Wed, 18 Jul 2007, Ingo Molnar wrote:

> > > Roman, please do me a favor, and ask me the following question:
> > > 
> > >  [insult deleted]


> In this discussion about 
> nice levels you were (very) aggressively asserting things that were 
> untrue,

Instead of simply asserting things, how about you provide some examples?
So far I have made a single mistake: mixing up nice levels 18 and 19.
If you would point me to such examples, I could learn how to tone it down 
a little, since the nice levels are not the only issue I have with the new 
scheduler; the heavy stuff is still to come. The problem here is that 
there is too much burnt ground, so I can't just present raw ideas that then 
get flamed by you; I have to be sufficiently confident they are valid, 
which you might then interpret as "aggressive assertion".

> you were suggesting that i don't understand the code,

Again, please point me to examples, so I at least have a chance to clear 
things up, since it was never my intention to make such a suggestion, but 
this gives me no chance to defend myself.

OTOH I can tell you exactly how you continuously insult me, e.g. by 
suggesting I ask "stupid questions" or that I'm in "denial of facts".
Don't make such suggestions if you have no idea how insulting they are. 
Especially the one deleted insult above where you have the impertinence to 
quote it, such tone is more appropriate between lord and inferior, where 
the latter has to make a request and the former "might" grant it. 
_Never_ make me beg. :-(

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 10:27                               ` Peter Zijlstra
@ 2007-07-18 12:45                                 ` Roman Zippel
  2007-07-18 12:52                                   ` Peter Zijlstra
  0 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-18 12:45 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

Hi,

On Wed, 18 Jul 2007, Peter Zijlstra wrote:

> I actually like the extra range, it allows for a much softer punch of
> background tasks even on somewhat slower boxen.

The extra range is not really a problem, in 

http://www.ussg.iu.edu/hypermail/linux/kernel/0707.2/0850.html

I suggested how we can have both.

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 12:45                                 ` Roman Zippel
@ 2007-07-18 12:52                                   ` Peter Zijlstra
  2007-07-18 12:59                                     ` Ingo Molnar
                                                       ` (2 more replies)
  0 siblings, 3 replies; 484+ messages in thread
From: Peter Zijlstra @ 2007-07-18 12:52 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Ingo Molnar, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

On Wed, 2007-07-18 at 14:45 +0200, Roman Zippel wrote:
> Hi,
> 
> On Wed, 18 Jul 2007, Peter Zijlstra wrote:
> 
> > I actually like the extra range, it allows for a much softer punch of
> > background tasks even on somewhat slower boxen.
> 
> The extra range is not really a problem, in 
> 
> http://www.ussg.iu.edu/hypermail/linux/kernel/0707.2/0850.html
> 
> I suggested how we can have both.

By breaking the UNIX model of nice levels. Not an option in my book.


* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 12:52                                   ` Peter Zijlstra
@ 2007-07-18 12:59                                     ` Ingo Molnar
  2007-07-18 13:07                                     ` Roman Zippel
  2007-07-18 13:26                                     ` Roman Zippel
  2 siblings, 0 replies; 484+ messages in thread
From: Ingo Molnar @ 2007-07-18 12:59 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Roman Zippel, Linus Torvalds, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright


* Peter Zijlstra <peterz@infradead.org> wrote:

> On Wed, 2007-07-18 at 14:45 +0200, Roman Zippel wrote:
> > Hi,
> > 
> > On Wed, 18 Jul 2007, Peter Zijlstra wrote:
> > 
> > > I actually like the extra range, it allows for a much softer punch of
> > > background tasks even on somewhat slower boxen.
> > 
> > The extra range is not really a problem, in 
> > 
> > http://www.ussg.iu.edu/hypermail/linux/kernel/0707.2/0850.html
> > 
> > I suggested how we can have both.
> 
> By breaking the UNIX model of nice levels. Not an option in my book.

yeah, that's pretty much out of question.

	Ingo

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 12:52                                   ` Peter Zijlstra
  2007-07-18 12:59                                     ` Ingo Molnar
@ 2007-07-18 13:07                                     ` Roman Zippel
  2007-07-18 13:27                                       ` Peter Zijlstra
  2007-07-18 13:48                                       ` Ingo Molnar
  2007-07-18 13:26                                     ` Roman Zippel
  2 siblings, 2 replies; 484+ messages in thread
From: Roman Zippel @ 2007-07-18 13:07 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

Hi,

On Wed, 18 Jul 2007, Peter Zijlstra wrote:

> By breaking the UNIX model of nice levels. Not an option in my book.

Breaking user expectations of nice levels is?

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 12:52                                   ` Peter Zijlstra
  2007-07-18 12:59                                     ` Ingo Molnar
  2007-07-18 13:07                                     ` Roman Zippel
@ 2007-07-18 13:26                                     ` Roman Zippel
  2007-07-18 13:31                                       ` Peter Zijlstra
  2 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-18 13:26 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

Hi,

On Wed, 18 Jul 2007, Peter Zijlstra wrote:

> By breaking the UNIX model of nice levels. Not an option in my book.

BTW what is the "UNIX model of nice levels"?

SUS specifies the limit via NZERO, which is defined as "Minimum Acceptable 
Value: 20"; I can't find any information that it must be exactly 20.

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 13:07                                     ` Roman Zippel
@ 2007-07-18 13:27                                       ` Peter Zijlstra
  2007-07-18 13:58                                         ` Roman Zippel
  2007-07-18 13:48                                       ` Ingo Molnar
  1 sibling, 1 reply; 484+ messages in thread
From: Peter Zijlstra @ 2007-07-18 13:27 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Ingo Molnar, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

On Wed, 2007-07-18 at 15:07 +0200, Roman Zippel wrote:
> Hi,
> 
> On Wed, 18 Jul 2007, Peter Zijlstra wrote:
> 
> > By breaking the UNIX model of nice levels. Not an option in my book.
> 
> Breaking user expectations of nice levels is?

http://www.opengroup.org/onlinepubs/009695399/basedefs/xbd_chap03.html

specifically:

"3.239 Nice Value

  A number used as advice to the system to alter process scheduling.
  Numerically smaller values give a process additional preference when
  scheduling a process to run. Numerically larger values reduce the
  preference and make a process less likely to run. Typically, a process
  with a smaller nice value runs to completion more quickly than an
  equivalent process with a higher nice value. The symbol {NZERO}
  specifies the default nice value of the system."


The only expectation is that a process with a lower nice level gets more
time. Any other expectation is a bug.




* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 13:26                                     ` Roman Zippel
@ 2007-07-18 13:31                                       ` Peter Zijlstra
  0 siblings, 0 replies; 484+ messages in thread
From: Peter Zijlstra @ 2007-07-18 13:31 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Linus Torvalds, Ingo Molnar, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

On Wed, 2007-07-18 at 15:26 +0200, Roman Zippel wrote:
> Hi,
> 
> On Wed, 18 Jul 2007, Peter Zijlstra wrote:
> 
> > By breaking the UNIX model of nice levels. Not an option in my book.
> 
> BTW what is the "UNIX model of nice levels"?
> 
> SUS specifies the limit via NZERO, which is defined as "Minimum Acceptable 
> Value: 20", I can't find any information that it must be 20.

I have never encountered a UNIX where it is anything other than 20.
Convention (alas not specification) does dictate 20.


* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 13:07                                     ` Roman Zippel
  2007-07-18 13:27                                       ` Peter Zijlstra
@ 2007-07-18 13:48                                       ` Ingo Molnar
  2007-07-18 14:14                                         ` Roman Zippel
  1 sibling, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-18 13:48 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Peter Zijlstra, Linus Torvalds, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > By breaking the UNIX model of nice levels. Not an option in my book.
> 
> Breaking user expectations of nice levels is?

_changing_ it is an option within reason, and we've done it a couple of 
times already in the past, and even within CFS (as Peter correctly 
observed) we've been through a couple of iterations already. And as i 
mentioned it before, the outer edge of nice levels (+19, by far the most 
commonly used nice level) was inconsistent to begin with: 3%, 5%, 9% of 
nice-0, depending on HZ. So changing that to a consistent (and 
user-requested) 1.5% is a much smaller change than you seem to make it 
out to be. CFS itself is a far larger "change of expectations" than this 
tweak to nice levels. So by your standard we could never change the 
scheduler. (which your ultimate argument might be after all =B-)

	Ingo

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 13:27                                       ` Peter Zijlstra
@ 2007-07-18 13:58                                         ` Roman Zippel
  0 siblings, 0 replies; 484+ messages in thread
From: Roman Zippel @ 2007-07-18 13:58 UTC (permalink / raw)
  To: Peter Zijlstra
  Cc: Linus Torvalds, Ingo Molnar, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

Hi,

On Wed, 18 Jul 2007, Peter Zijlstra wrote:

> The only expectation is that a process with a lower nice level gets more
> time. Any other expectation is a bug.

Yes, users are buggy, they expect a lot of stupid things...
Is this really reason enough to break this?

What exactly is the damage if setpriority() accepts a few more levels?

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 13:48                                       ` Ingo Molnar
@ 2007-07-18 14:14                                         ` Roman Zippel
  2007-07-18 16:02                                           ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Roman Zippel @ 2007-07-18 14:14 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

Hi,

On Wed, 18 Jul 2007, Ingo Molnar wrote:

> _changing_ it is an option within reason, and we've done it a couple of 
> times already in the past, and even within CFS (as Peter correctly 
> observed) we've been through a couple of iterations already. And as i 
> mentioned it before, the outer edge of nice levels (+19, by far the most 
> commonly used nice level) was inconsistent to begin with: 3%, 5%, 9% of 
> nice-0, depending on HZ.

Why do you constantly stress level 19? Yes, that one is special, all other 
positive levels were already relatively consistent.

> So changing that to a consistent (and 
> user-requested)

How old is CFS and how many users did it have so far? How many users does 
the old scheduler have, all of whom will soon be exposed to the new one?

> 1.5% is a much smaller change than you seem to make it 
> out to be.

The percentage levels are off by a factor of up to _seven_; sorry, I fail 
to see how you can characterize this as "small".

> So by your standard we could never change the 
> scheduler. (which your ultimate argument might be after all =B-)

Careful, you are making assertions about me for which you have absolutely 
no basis; adding a smiley doesn't make this any funnier.

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 14:14                                         ` Roman Zippel
@ 2007-07-18 16:02                                           ` Ingo Molnar
  2007-07-20 15:03                                             ` Roman Zippel
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-18 16:02 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Peter Zijlstra, Linus Torvalds, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> > _changing_ it is an option within reason, and we've done it a couple 
> > of times already in the past, and even within CFS (as Peter 
> > correctly observed) we've been through a couple of iterations 
> > already. And as i mentioned it before, the outer edge of nice levels 
> > (+19, by far the most commonly used nice level) was inconsistent to 
> > begin with: 3%, 5%, 9% of nice-0, depending on HZ.
> 
> Why do you constantly stress level 19? Yes, that one is special, all 
> other positive levels were already relatively consistent.

i constantly stress it for the reason i mentioned a good number of 
times: because it's by far the most commonly used (and complained about) 
nice level. =B-)

but because you are asking, i'm glad to give you some first-hand 
historic background about Linux nice levels (in case you are interested) 
and the motivations behind their old and new implementations:

nice levels were always so weak under Linux (just read Peter's report) 
that people continuously bugged me about making nice +19 tasks use up 
much less CPU time. Unfortunately that was not that easy to implement 
(otherwise we'd have done it long ago) because nice level support was 
historically coupled to timeslice length, and timeslice units were 
driven by the HZ tick, so the smallest timeslice was 1/HZ.

In the O(1) scheduler (about 4 years ago) i changed negative nice levels 
to be much stronger than they were before in 2.4 (and people were happy 
about that change), and i also intentionally calibrated the linear 
timeslice rule so that nice +19 level would be _exactly_ 1 jiffy. To 
better understand it, the timeslice graph went like this (cheesy ASCII 
art alert!):


                   A
             \     | [timeslice length]
              \    |
               \   |
                \  |
                 \ |
                  \|___100msecs
                   |^ . _
                   |      ^ . _
                   |            ^ . _
 -*----------------------------------*-----> [nice level]
 -20               |                +19
                   |
                   |

so that if someone wants to really renice tasks, +19 would give a much 
bigger hit than the normal linear rule would do. (The solution of 
changing the ABI to extend priorities was discarded early on.)

This approach worked to some degree for some time, but later on with 
HZ=1000 it caused 1 jiffy to be 1 msec, which meant 0.1% CPU usage which 
we felt to be a bit excessive. Excessive _not_ because it's too small of 
a CPU utilization, but because it causes too frequent (once per 
millisec) rescheduling. (and would thus thrash the cache, etc. Remember, 
this was 4-5 years ago when hardware was weaker and caches were smaller, 
and people were running number crunching apps at nice +19.)

So for HZ=1000 i changed nice +19 to 5msecs, because that felt like the 
right minimal granularity - and this translates to 5% CPU utilization. 
But the fundamental HZ-sensitive property for nice+19 still remained, 
and i never got a single complaint about nice +19 being too _weak_ in 
terms of CPU utilization, i only got complaints about it (still) being 
way too _strong_.

To sum it up: i always wanted to make nice levels more consistent, but 
within the constraints of HZ and jiffies and their nasty design level 
coupling to timeslices and granularity it was not really viable.
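To make the HZ dependence concrete, here is a small model of the linear
timeslice rule pictured above. It is a sketch under stated assumptions,
not the actual O(1) scheduler source: the constants (a 100 msec slice at
nice 0, a 4x steeper slope for negative nice levels, a one-jiffy floor)
and the helper names are illustrative. Pitting one nice +19 task against
one nice 0 task, the model gives about 9% at HZ=100, about 3.8% at
HZ=250 and about 4.8% at HZ=1000, in line with the 3%-5%-9% spread
mentioned in this thread.

```c
/*
 * Hypothetical model of the old linear timeslice rule. Constants are
 * assumptions modeled on 2.6-era defaults, not copied from the kernel.
 */
#define MAX_PRIO        140            /* static priorities 100..139 */
#define NICE_TO_PRIO(n) (120 + (n))    /* nice -20..19 -> 100..139 */
#define DEF_SLICE_MS    100            /* nice 0 timeslice */

/* Timeslice in jiffies for a nice level at a given HZ. */
static int timeslice_jiffies(int nice, int hz)
{
	int prio = NICE_TO_PRIO(nice);
	int def = DEF_SLICE_MS * hz / 1000;        /* 100 msec in jiffies */
	int base = nice < 0 ? def * 4 : def;       /* steeper for nice < 0 */
	int slice = base * (MAX_PRIO - prio) / 20; /* integer jiffies */

	return slice < 1 ? 1 : slice;              /* never below 1 jiffy */
}

/* Percent of the CPU one nice +19 task gets against one nice 0 task. */
static double nice19_share_percent(int hz)
{
	double s19 = timeslice_jiffies(19, hz);
	double s0 = timeslice_jiffies(0, hz);

	return 100.0 * s19 / (s19 + s0);
}
```

With HZ=100 the +19 slice rounds down to zero jiffies and is clamped to
one 10 msec jiffy, which is why the coarsest tick gives the weakest
nice +19 in this model.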

The second (less frequent but still periodically occurring) complaint 
about Linux's nice level support was its asymmetry around the origin 
(which you can see demonstrated in the picture above), or more 
accurately: the fact that nice level behavior depended on the _absolute_ 
nice level as well, while the nice API itself is fundamentally 
"relative":

   int nice(int inc);

   asmlinkage long sys_nice(int increment);

(the first one is the glibc API, the second one is the syscall API.) 
Note that the 'inc' is relative to the current nice level. Tools like 
the "nice" command-line utility mirror this relative API.
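Because the increment is relative, a program only ever asks for "N steps
nicer than now". Below is a minimal sketch of using the glibc nice()
wrapper from C; the helper names renice_self_by() and current_nice()
are hypothetical. One real pitfall it shows: nice() returns the new nice
level, which can legitimately be -1, so errno has to be cleared before
the call and checked after it.

```c
#ifndef _DEFAULT_SOURCE
#define _DEFAULT_SOURCE 1  /* expose nice() in <unistd.h> */
#endif
#include <errno.h>
#include <sys/resource.h>
#include <unistd.h>

/*
 * Hypothetical helper: make the calling process 'inc' levels nicer,
 * relative to its current level. Stores the resulting nice level in
 * *new_level and returns 0, or returns -1 on error.
 */
static int renice_self_by(int inc, int *new_level)
{
	errno = 0;                 /* -1 is a valid nice level */
	int ret = nice(inc);

	if (ret == -1 && errno != 0)
		return -1;         /* real failure, e.g. EPERM for inc < 0 */
	*new_level = ret;
	return 0;
}

/* Hypothetical helper: read the absolute nice level of this process. */
static int current_nice(void)
{
	return getpriority(PRIO_PROCESS, 0);
}
```

Called with an increment of 1 from a shell at nice 5, the process ends
up at nice 6; from a shell at nice -10 it ends up at -9. The request is
the same in both cases, which is why one would expect the same CPU split
effect regardless of the starting level.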

With the old scheduler, if you for example started a niced task with +1 
and another task with +2, the CPU split between the two tasks would 
depend on the nice level of the parent shell - if it was at nice -10 the 
CPU split was different than if it was at +5 or +10.

A third complaint against Linux's nice level support was that negative 
nice levels were not 'punchy enough', so lots of people had to resort to 
running audio (and other multimedia) apps under RT priorities such as 
SCHED_FIFO. But this caused other problems: SCHED_FIFO is not starvation 
proof, and a buggy SCHED_FIFO app can also lock up the system for good.

CFS addresses all three types of complaints:

To address the first complaint (of nice levels being not "punchy" 
enough), i decoupled the scheduler from 'time slice' and HZ concepts 
(and made granularity a separate concept from nice levels) and thus CFS 
was able to implement better and more consistent nice +19 support: now 
in CFS nice +19 tasks get a HZ-independent 1.5%, instead of the variable 
3%-5%-9% range they got in the old scheduler.

To address the second complaint (of nice levels not being consistent), i 
made a nice increment of 1 have the same CPU utilization effect on tasks, regardless 
of their absolute nice levels. So on CFS, running a nice +10 and a nice 
+11 task has the same CPU utilization "split" between them as running a 
nice -5 and a nice -4 task. (one will get 55% of the CPU, the other 
45%.) That is why I changed nice levels to be "multiplicative" (or 
exponential) - that way it does not matter which nice level you start 
out from, the 'relative result' will always be the same.
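The "multiplicative" rule can be illustrated with a few lines of C. This
is a sketch, not the CFS source: it assumes a per-level weight factor of
1.25 and a nice 0 weight of 1024, which is roughly what the CFS weight
table encodes, and the function names are made up for the example.
Because each step scales the weight by the same factor, two adjacent
nice levels split the CPU the same way (about 55.6% / 44.4%) wherever
you start, and nice +19 ends up with about 1.4% of the weight of a nice
0 task.

```c
#define NICE_0_WEIGHT 1024.0   /* assumed weight of a nice 0 task */
#define NICE_STEP     1.25     /* assumed weight factor per nice level */

/* Hypothetical: load weight of a task at the given nice level. */
static double nice_weight(int nice)
{
	double w = NICE_0_WEIGHT;

	for (int i = 0; i < nice; i++)   /* positive nice: lighter */
		w /= NICE_STEP;
	for (int i = 0; i > nice; i--)   /* negative nice: heavier */
		w *= NICE_STEP;
	return w;
}

/* Fraction of the CPU task A gets when competing only with task B. */
static double cpu_share(int nice_a, int nice_b)
{
	double wa = nice_weight(nice_a);
	double wb = nice_weight(nice_b);

	return wa / (wa + wb);
}
```

cpu_share(10, 11) and cpu_share(-5, -4) both come out at 1/1.8, about
0.556, which is the "same relative result" property described above.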

The third complaint (of negative nice levels not being "punchy" enough 
and forcing audio apps to run under the more dangerous SCHED_FIFO 
scheduling policy) is addressed by CFS almost automatically: stronger 
negative nice levels are an automatic side-effect of the recalibrated 
dynamic range of nice levels.

Hope this helps,

	Ingo

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 12:40                                                       ` Roman Zippel
@ 2007-07-18 16:17                                                         ` Ingo Molnar
  2007-07-20 13:38                                                           ` Roman Zippel
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-18 16:17 UTC (permalink / raw)
  To: Roman Zippel
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright


* Roman Zippel <zippel@linux-m68k.org> wrote:

> Don't make such suggestions if you have no idea how insulting they 
> are. Especially the one deleted insult above where you have the 
> impertinence to quote it, such tone is more appropriate between lord 
> and inferior, where the latter have to make a request and the former 
> "might" grant it. [...]

uhm, [and the uninterested reader might want to skip to the next mail 
;-)], i'm really confused about your reply. Do you really mean this:

> > Roman, please do me a favor, and ask me the following question:
> >
> >  " Ingo, you've been maintaining the scheduler for years. In fact you
> >    wrote the old nice code we are talking about here. You changed it a
> >    number of times since then. So you really know what's going on here.
> >    Why does the old nice code behave like that for nice +19 levels? "
> >
> > I've been waiting for that obvious question, and i _might_ be able
> > to answer it, but somehow it never occured to you ;-) Thanks,

the ";-)" emoticon (and its contents) clearly signals this as a 
sarcastic, tongue-in-cheek remark. To make it even clearer, please 
re-read it with the <sarcastic> tag added as well for clarity:

> > <sarcastic>
> >
> > Roman, please do me a favor, and ask me the following question:
> >
> >  " Ingo, you've been maintaining the scheduler for years. In fact you
> >    wrote the old nice code we are talking about here. You changed it a
> >    number of times since then. So you really know what's going on here.
> >    Why does the old nice code behave like that for nice +19 levels? "
> >
> > I've been waiting for that obvious question, and i _might_ be able
> > to answer it, but somehow it never occured to you ;-) Thanks,
> >
> > </sarcastic>

ok? (If you didn't see/read it as sarcastic straight away then my 
apologies for insulting you!)

The "_might_ be able to answer" bit is of course sarcastic too, and 
contrary to your (i have to say, pretty absurd) suggestion i did not 
suggest that i "might be _willing_ to answer" - which would be quite 
arrogant indeed and which i never said or suggested. To make it even 
clearer: i'm definitely able to answer questions about code i wrote 
originally and which i just changed, were you to show genuine interest 
in hearing my opinion :-)

	Ingo

* Re: [PATCH] Use menuconfig objects - CONFIG_ISDN_I4L [v2]
       [not found]       ` <20070715131144.3467DFC040@xenon.ts.pxnet.com>
@ 2007-07-18 18:18         ` Jan Engelhardt
  2007-07-18 18:22         ` [more PATCHes] Use menuconfig objects - CONFIG_ISDN_I4L Jan Engelhardt
  1 sibling, 0 replies; 484+ messages in thread
From: Jan Engelhardt @ 2007-07-18 18:18 UTC (permalink / raw)
  To: Tilman Schmidt; +Cc: Andrew Morton, Karsten Keil, Linux Kernel Mailing List


>Remove a menu statement and several dependencies from the Kconfig files in
>the drivers/isdn tree as they have become unnecessary by the transformation
>of CONFIG_ISDN from "menu, config" into "menuconfig".
>(Modified version of a patch originally proposed by Jan Engelhardt.)
>
>Signed-off-by: Tilman Schmidt <tilman@imap.cc>
>---
>
>This is my alternative proposal for
>use-menuconfig-objects-isdn-config_isdn_i4l.patch (patch 2 of 6 in Jan's
>"Use menuconfig objects 4 - ISDN" series). It must go between patch 1 and
>3 of that series because they touch some of the same files.

This looks good to me. (Applies on top of today's git.)


Thanks,
	Jan
-- 

* Re: [more PATCHes] Use menuconfig objects - CONFIG_ISDN_I4L
       [not found]       ` <20070715131144.3467DFC040@xenon.ts.pxnet.com>
  2007-07-18 18:18         ` [PATCH] Use menuconfig objects - CONFIG_ISDN_I4L [v2] Jan Engelhardt
@ 2007-07-18 18:22         ` Jan Engelhardt
  2007-07-18 18:23           ` [patch 1/2] Use menuconfig objects - ISDN Jan Engelhardt
                             ` (2 more replies)
  1 sibling, 3 replies; 484+ messages in thread
From: Jan Engelhardt @ 2007-07-18 18:22 UTC (permalink / raw)
  To: Tilman Schmidt; +Cc: Andrew Morton, Karsten Keil, Linux Kernel Mailing List

Hi,


here are two more changes I propose for the isdn submenu(s).
They go on top of Tilman's patch; each of the two following patches is 
independent of the other.
Opinions please :)


	Jan
-- 

* [patch 1/2] Use menuconfig objects - ISDN
  2007-07-18 18:22         ` [more PATCHes] Use menuconfig objects - CONFIG_ISDN_I4L Jan Engelhardt
@ 2007-07-18 18:23           ` Jan Engelhardt
  2007-07-18 18:23           ` [patch 2/2] Use menuconfig objects - ISDN/Gigaset Jan Engelhardt
  2007-07-22  0:32           ` [more PATCHes] Use menuconfig objects - CONFIG_ISDN_I4L Tilman Schmidt
  2 siblings, 0 replies; 484+ messages in thread
From: Jan Engelhardt @ 2007-07-18 18:23 UTC (permalink / raw)
  To: Tilman Schmidt; +Cc: Andrew Morton, Karsten Keil, Linux Kernel Mailing List


Unclutter the ISDN menu a tiny bit by moving the ISDN4Linux and CAPI 2.0
layers into their own menus.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>

---
 drivers/isdn/Kconfig |    4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Index: linux-2.6.23/drivers/isdn/Kconfig
===================================================================
--- linux-2.6.23.orig/drivers/isdn/Kconfig
+++ linux-2.6.23/drivers/isdn/Kconfig
@@ -21,7 +21,7 @@ menuconfig ISDN
 
 if ISDN
 
-config ISDN_I4L
+menuconfig ISDN_I4L
 	tristate "Old ISDN4Linux (deprecated)"
 	---help---
 	  This driver allows you to use an ISDN adapter for networking
@@ -43,7 +43,7 @@ if ISDN_I4L
 source "drivers/isdn/i4l/Kconfig"
 endif
 
-config ISDN_CAPI
+menuconfig ISDN_CAPI
 	tristate "CAPI 2.0 subsystem"
 	help
 	  This provides the CAPI (Common ISDN Application Programming

* [patch 2/2] Use menuconfig objects - ISDN/Gigaset
  2007-07-18 18:22         ` [more PATCHes] Use menuconfig objects - CONFIG_ISDN_I4L Jan Engelhardt
  2007-07-18 18:23           ` [patch 1/2] Use menuconfig objects - ISDN Jan Engelhardt
@ 2007-07-18 18:23           ` Jan Engelhardt
  2007-07-22  0:32           ` [more PATCHes] Use menuconfig objects - CONFIG_ISDN_I4L Tilman Schmidt
  2 siblings, 0 replies; 484+ messages in thread
From: Jan Engelhardt @ 2007-07-18 18:23 UTC (permalink / raw)
  To: Tilman Schmidt; +Cc: Andrew Morton, Karsten Keil, Linux Kernel Mailing List


Change Kconfig objects from "menu, config" into "menuconfig" so
that the user can disable the whole feature without having to
enter the menu first.

Signed-off-by: Jan Engelhardt <jengelh@gmx.de>

---
 drivers/isdn/gigaset/Kconfig |    8 ++------
 1 file changed, 2 insertions(+), 6 deletions(-)

Index: linux-2.6.23/drivers/isdn/gigaset/Kconfig
===================================================================
--- linux-2.6.23.orig/drivers/isdn/gigaset/Kconfig
+++ linux-2.6.23/drivers/isdn/gigaset/Kconfig
@@ -1,6 +1,4 @@
-menu "Siemens Gigaset"
-
-config ISDN_DRV_GIGASET
+menuconfig ISDN_DRV_GIGASET
 	tristate "Siemens Gigaset support (isdn)"
 	select CRC_CCITT
 	select BITREVERSE
@@ -53,6 +51,4 @@ config GIGASET_UNDOCREQ
 	  features like configuration mode of M105, say yes. If you
 	  care about your device, say no.
 
-endif
-
-endmenu
+endif # ISDN_DRV_GIGASET != n

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-12  4:52           ` Rusty Russell
  2007-07-12 11:10             ` Avi Kivity
@ 2007-07-19 17:27             ` Christoph Hellwig
  2007-07-20  3:27               ` Rusty Russell
  1 sibling, 1 reply; 484+ messages in thread
From: Christoph Hellwig @ 2007-07-19 17:27 UTC (permalink / raw)
  To: Rusty Russell; +Cc: Andrew Morton, David Miller, hch, linux-kernel, linux-mm

On Thu, Jul 12, 2007 at 02:52:23PM +1000, Rusty Russell wrote:
> This is solely for the wakeup: you don't wake an mm 8)
> 
> The mm reference is held as well under the big lguest_mutex (mm gets
> destroyed before files get closed, so we definitely do need to hold a
> reference).
> 
> I just completed benchmarking: the cached wakeup with the current naive
> drivers makes no difference (at one stage I was playing with batched
> hypercalls, where it seemed to help).
> 
> Thanks Christoph, DaveM!

The version that just got into mainline still has the __put_task_struct
export despite not needing it anymore.  Care to fix this up?

* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-19 17:27             ` Christoph Hellwig
@ 2007-07-20  3:27               ` Rusty Russell
  2007-07-20  7:15                 ` Christoph Hellwig
  0 siblings, 1 reply; 484+ messages in thread
From: Rusty Russell @ 2007-07-20  3:27 UTC (permalink / raw)
  To: Christoph Hellwig; +Cc: Andrew Morton, David Miller, linux-kernel, linux-mm

On Thu, 2007-07-19 at 19:27 +0200, Christoph Hellwig wrote:
> The version that just got into mainline still has the __put_task_struct
> export despite not needing it anymore.  Care to fix this up?

No, it got patched in then immediately patched out again.  Andrew
mis-mixed my patches, but there have been so many of them I find it hard
to blame him.

Rusty.


* Re: lguest, Re: -mm merge plans for 2.6.23
  2007-07-20  3:27               ` Rusty Russell
@ 2007-07-20  7:15                 ` Christoph Hellwig
  0 siblings, 0 replies; 484+ messages in thread
From: Christoph Hellwig @ 2007-07-20  7:15 UTC (permalink / raw)
  To: Rusty Russell
  Cc: Christoph Hellwig, Andrew Morton, David Miller, linux-kernel, linux-mm

On Fri, Jul 20, 2007 at 01:27:26PM +1000, Rusty Russell wrote:
> On Thu, 2007-07-19 at 19:27 +0200, Christoph Hellwig wrote:
> > The version that just got into mainline still has the __put_task_struct
> > export despite not needing it anymore.  Care to fix this up?
> 
> No, it got patched in then immediately patched out again.  Andrew
> mis-mixed my patches, but there have been so many of them I find it hard
> to blame him.

Indeed, the export is gone in the latest mainline.

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 16:17                                                         ` Ingo Molnar
@ 2007-07-20 13:38                                                           ` Roman Zippel
  0 siblings, 0 replies; 484+ messages in thread
From: Roman Zippel @ 2007-07-20 13:38 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Matt Mackall, James Bruce, Thomas Gleixner, Mike Galbraith,
	Linus Torvalds, Andi Kleen, Andrew Morton, linux-kernel,
	Arjan van de Ven, Chris Wright

Hi,

On Wed, 18 Jul 2007, Ingo Molnar wrote:

> > > [more rude insults deleted]
> > > I've been waiting for that obvious question, and i _might_ be able
> > > to answer it, but somehow it never occured to you ;-) Thanks,
> 
> the ";-)" emoticon (and its contents) clearly signals this as a 
> sarcastic, tongue-in-cheek remark.

To give another example of why this is still insulting and inappropriate: 
this is behaviour I would characterize as school bullying.
A bully attacks someone obviously weaker than himself, for example 
takes something away, and then continues with "If you ask nicely I'll give 
it back to you.", often accompanied by laughter to signal he's enjoying 
himself and the power he has, but for the other person it's anything but 
funny.

Maybe you don't know what it feels like, but I do and I can't find 
anything funny, sarcastic or whatever about this, no matter how many 
smileys or other tags you add there. If the communication is already as 
troubled as this, such "humor" is really the worst thing you can do and I 
find it rather sad that you can't realize this yourself.

> ok? (If you didnt see/read it as sarcastic straight away then my 
> apologies for insulting you!)

Sorry, that is too little too late. You've apologized before and you 
continued to make fun of me personally to the point of spreading wrong 
information about me, which you could have very easily verified yourself, 
if you only wanted.
What I want from you is that you treat me with respect and to keep your 
"sarcasm" to yourself.

I told you very clearly how I think about you requoting this crap and yet 
you repeat it again _twice_, so on the one hand I get this apology attempt 
and on the other hand you continue to kick me in the crotch? How do you 
think I am supposed to feel about this?

It's also always interesting what you don't respond to. I asked you for 
examples which would prove the (rather strong) assertions you made about 
me, what does it tell me now if you can't back up your statements?

bye, Roman

* Re: [PATCH] CFS: Fix missing digit off in wmult table
  2007-07-18 16:02                                           ` Ingo Molnar
@ 2007-07-20 15:03                                             ` Roman Zippel
  0 siblings, 0 replies; 484+ messages in thread
From: Roman Zippel @ 2007-07-20 15:03 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Peter Zijlstra, Linus Torvalds, James Bruce, Thomas Gleixner,
	Mike Galbraith, Andrea Arcangeli, Andi Kleen, Andrew Morton,
	linux-kernel, Arjan van de Ven, Chris Wright

Hi,

On Wed, 18 Jul 2007, Ingo Molnar wrote:

> > Why do you constantly stress level 19? Yes, that one is special, all 
> > other positive levels were already relatively consistent.
> 
> i constantly stress it for the reason i mentioned a good number of 
> times: because it's by far the most commonly used (and complained about) 
> nice level. =B-)

How do you know that? Most complained about makes most commonly used?

> but because you are asking, i'm glad to give you some first-hand 
> historic background about Linux nice levels (in case you are interested) 
> and the motivations behind their old and new implementations:

I guess I should be thankful now?
I'm curious why you post this now, after I "asked" about this. Most of the 
information is either rather generic or not specific enough for the 
problem at hand. If you had posted this information earlier, it would have been 
far more valuable, as it could have been a good basis for a discussion.
But posting it this late, I can't shake the feeling you're more interested 
in "teaching" me.

> nice levels were always so weak under Linux (just read Peter's report) 

-ENOLINK

> Hope this helps,

Not completely.

For negative nice levels you mentioned audio apps, but these aren't really 
interested in a fair share, they would use the higher percentage only to 
guarantee they get the amount of time they need independent of the 
current load. I think they would be better served with e.g. a deadline 
scheduler, which guarantees them an absolute time share not a relative 
one.
On the other end with positive levels I more remember requests for 
something closer to idle scheduling, where a process only runs when 
nothing else is running.

So assuming we had scheduling classes for the above use cases, what other 
reasons are left for such extreme nice levels?

My proposed nice levels otherwise have the same properties as yours (e.g. 
being consistent). There is one property you haven't commented on at all 
yet: my proposed levels give the average user a far better idea of what they 
actually mean, i.e. that every 5 levels the process gets double/half the 
CPU time. This is IMO a considerable advantage.
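For comparison, the "double every 5 levels" scale proposed here can be
sketched the same way. Again this is only an illustration under
assumptions: nice 0 is given weight 1024, each level scales the weight
by 2^(1/5) (about 1.1487), and the helper name is hypothetical. Under
this scale, going 5 levels up halves the weight and going 5 levels down
doubles it, and nice +19 keeps about 7.2% of the weight of a nice 0 task
(2^(-19/5)), compared to roughly 1.4% with a per-level factor of 1.25.

```c
#define NICE_0_WEIGHT  1024.0             /* assumed nice 0 weight */
#define STEP_2_POW_1_5 1.148698354997035  /* 2^(1/5): x2 per 5 levels */

/* Hypothetical: weight under a "halve every 5 nice levels" scale. */
static double nice_weight_pow2(int nice)
{
	double w = NICE_0_WEIGHT;

	for (int i = 0; i < nice; i++)   /* positive nice: lighter */
		w /= STEP_2_POW_1_5;
	for (int i = 0; i > nice; i--)   /* negative nice: heavier */
		w *= STEP_2_POW_1_5;
	return w;
}
```

The per-level factor is the only difference from a 1.25-per-level
scale; both are multiplicative, so both keep the "same split between
adjacent levels" property, and they differ only in dynamic range.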

bye, Roman

* Re: [more PATCHes] Use menuconfig objects - CONFIG_ISDN_I4L
  2007-07-18 18:22         ` [more PATCHes] Use menuconfig objects - CONFIG_ISDN_I4L Jan Engelhardt
  2007-07-18 18:23           ` [patch 1/2] Use menuconfig objects - ISDN Jan Engelhardt
  2007-07-18 18:23           ` [patch 2/2] Use menuconfig objects - ISDN/Gigaset Jan Engelhardt
@ 2007-07-22  0:32           ` Tilman Schmidt
  2 siblings, 0 replies; 484+ messages in thread
From: Tilman Schmidt @ 2007-07-22  0:32 UTC (permalink / raw)
  To: Jan Engelhardt
  Cc: Tilman Schmidt, Andrew Morton, Karsten Keil, Linux Kernel Mailing List

Hi,

sorry for the late reply.
On 18.07.2007 20:22 Jan Engelhardt wrote:
> here are two more changes I propose for the isdn submenu(s).
> They go on top of Tilman's patch; each of the two following patches is 
> independent of another.
> Opinions please :)

These are fine by me.

Thanks
Tilman

-- 
Tilman Schmidt                          E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, best before: (see reverse side)


* Re: -mm merge plans for 2.6.23
  2007-07-10 10:15 ` -mm merge plans for 2.6.23 Con Kolivas
       [not found]   ` <b21f8390707101802o2d546477n2a18c1c3547c3d7a@mail.gmail.com>
@ 2007-07-23 23:08   ` Jesper Juhl
  2007-07-24  3:22     ` Nick Piggin
  2007-07-24  0:08   ` Con Kolivas
  2 siblings, 1 reply; 484+ messages in thread
From: Jesper Juhl @ 2007-07-23 23:08 UTC (permalink / raw)
  To: Con Kolivas
  Cc: Andrew Morton, ck list, Ingo Molnar, Paul Jackson, linux-mm,
	linux-kernel

On 10/07/07, Con Kolivas <kernel@kolivas.org> wrote:
> On Tuesday 10 July 2007 18:31, Andrew Morton wrote:
> > When replying, please rewrite the subject suitably and try to Cc: the
> > appropriate developer(s).
>
> ~swap prefetch
>
> Nick's only remaining issue which I could remotely identify was to make it
> cpuset aware:
> http://marc.info/?l=linux-mm&m=117875557014098&w=2
> as discussed with Paul Jackson it was cpuset aware:
> http://marc.info/?l=linux-mm&m=117895463120843&w=2
>
> I fixed all bugs I could find and improved it as much as I could last kernel
> cycle.
>
> Put me and the users out of our misery and merge it now or delete it forever
> please. And if the meaningless handwaving that I 100% expect as a response
> begins again, then that's fine. I'll take that as a no and you can dump it.
>
For what it's worth; put me down as supporting the merger of swap
prefetch. I've found it useful in the past, Con has maintained it
nicely and cleaned up everything that people have pointed out - it's
mature, does no harm - let's just get it merged.  It's too late for
2.6.23-rc1 now, but let's try and get this in by -rc2 - it's long
overdue...

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

* Re: -mm merge plans for 2.6.23
  2007-07-10 10:15 ` -mm merge plans for 2.6.23 Con Kolivas
       [not found]   ` <b21f8390707101802o2d546477n2a18c1c3547c3d7a@mail.gmail.com>
  2007-07-23 23:08   ` Jesper Juhl
@ 2007-07-24  0:08   ` Con Kolivas
  2 siblings, 0 replies; 484+ messages in thread
From: Con Kolivas @ 2007-07-24  0:08 UTC (permalink / raw)
  To: Andrew Morton; +Cc: ck list, Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Tuesday 10 July 2007 20:15, Con Kolivas wrote:
> On Tuesday 10 July 2007 18:31, Andrew Morton wrote:
> > When replying, please rewrite the subject suitably and try to Cc: the
> > appropriate developer(s).
>
> ~swap prefetch
>
> Nick's only remaining issue which I could remotely identify was to make it
> cpuset aware:
> http://marc.info/?l=linux-mm&m=117875557014098&w=2
> as discussed with Paul Jackson it was cpuset aware:
> http://marc.info/?l=linux-mm&m=117895463120843&w=2
>
> I fixed all bugs I could find and improved it as much as I could last
> kernel cycle.
>
> Put me and the users out of our misery and merge it now or delete it
> forever please. And if the meaningless handwaving that I 100% expect as a
> response begins again, then that's fine. I'll take that as a no and you can
> dump it.

The window for 2.6.23 has now closed and your position on this is clear. I've 
been supporting this code in -mm for 21 months since 16-Oct-2005 without any 
obvious decision for this code forwards or backwards.

I am no longer part of your operating system's kernel's world; thus I cannot 
support this code any longer. Unless someone takes over the code base for 
swap prefetch you have to assume it is now unmaintained and should delete it.

Please respect my request to not be contacted further regarding this or any 
other kernel code.

-- 
-ck

* Re: -mm merge plans for 2.6.23
  2007-07-23 23:08   ` Jesper Juhl
@ 2007-07-24  3:22     ` Nick Piggin
  2007-07-24  4:53       ` Ray Lee
  0 siblings, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-24  3:22 UTC (permalink / raw)
  To: Jesper Juhl
  Cc: Andrew Morton, ck list, Ingo Molnar, Paul Jackson, linux-mm,
	linux-kernel

Jesper Juhl wrote:
> On 10/07/07, Con Kolivas <kernel@kolivas.org> wrote:
> 
>> On Tuesday 10 July 2007 18:31, Andrew Morton wrote:
>> > When replying, please rewrite the subject suitably and try to Cc: the
>> > appropriate developer(s).
>>
>> ~swap prefetch
>>
>> Nick's only remaining issue which I could remotely identify was to make it
>> cpuset aware:
>> http://marc.info/?l=linux-mm&m=117875557014098&w=2
>> as discussed with Paul Jackson it was cpuset aware:
>> http://marc.info/?l=linux-mm&m=117895463120843&w=2
>>
>> I fixed all bugs I could find and improved it as much as I could last kernel
>> cycle.
>>
>> Put me and the users out of our misery and merge it now or delete it forever
>> please. And if the meaningless handwaving that I 100% expect as a response
>> begins again, then that's fine. I'll take that as a no and you can dump it.
>>
> For what it's worth, put me down as supporting the merging of swap
> prefetch. I've found it useful in the past, Con has maintained it
> nicely and cleaned up everything that people have pointed out - it's
> mature, does no harm - let's just get it merged.  It's too late for
> 2.6.23-rc1 now, but let's try and get this in by -rc2 - it's long
> overdue...


Not talking about swap prefetch itself, but every time I have asked
anyone to instrument or produce some workload where swap prefetch
helps, they never do.

Fair enough if swap prefetch helps them, but I also want to look at
why that is the case and try to improve page reclaim in some of
these situations (for example standard overnight cron jobs shouldn't
need swap prefetch on a 1 or 2GB system, I would hope).

Anyway, back to swap prefetch, I don't know why I've been singled out
as the bad guy here. I'm one of the only people who has had a look at
the damn thing and tried to point out areas where it could be improved
to the point of being included, and outlining things that are needed
for it to be merged (ie. numbers). If anyone thinks that makes me the
bad guy then they have an utterly inverted understanding of what peer
review is for.

Finally, everyone who has ever hacked on these heuristicy parts of the
VM has heaps of patches that help some workload or some silly test
case or (real or perceived) shortfall but have not been merged. It
really isn't anything personal.

If something really works, then it should be possible to get real
numbers in real situations where it helps (OK, swap prefetching won't
be as easy as a straight line performance improvement, but still much
easier than trying to measure something like scheduler interactivity).

Numbers are the best way to add weight to the pro-merge argument, so
for all the people who are whining about merging this and don't want
to actually work on the code -- post some numbers for where it helps
you!!

-- 
SUSE Labs, Novell Inc.

* Re: -mm merge plans for 2.6.23
  2007-07-24  3:22     ` Nick Piggin
@ 2007-07-24  4:53       ` Ray Lee
  2007-07-24  5:10         ` Jeremy Fitzhardinge
                           ` (2 more replies)
  0 siblings, 3 replies; 484+ messages in thread
From: Ray Lee @ 2007-07-24  4:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jesper Juhl, Andrew Morton, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

On 7/23/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Not talking about swap prefetch itself, but every time I have asked
> anyone to instrument or produce some workload where swap prefetch
> helps, they never do.
[...]
> so for all the people who are whining about merging this and don't want
> to actually work on the code -- post some numbers for where it helps
> you!!

<Raised eyebrow> You sound frustrated. Perhaps we could be
communicating better. I'll start.

Unlike others on the cc: line, I don't get paid to hack on the kernel,
not even indirectly. So if you find that my lack of providing numbers
is giving you heartache, I can only apologize and point at my paying
work that requires my attention.

That said, I'm willing to run my day to day life through both a swap
prefetch kernel and a normal one. *However*, before I go through all
the work of instrumenting the damn thing, I'd really like Andrew (or
Linus) to lay out his acceptance criteria on the feature. Exactly what
*should* I be paying attention to? I've suggested keeping track of
process swapin delay total time, and comparing with and without. Is
that reasonable? Is it incomplete?

Without Andrew's criteria, we're back to where we've been for a long
time: lots of work, no forward motion. Perhaps it's a character flaw
of mine, but I'd really like to know what would constitute proof here
before I invest the effort. Especially given that Con has already
written a test case that shows that swap prefetch works, and that I've
given you a clear argument for why better (or even perfect) page
reclaim can't provide full coverage to all the situations that swap
prefetch helps. (Also, it's not like I've got tons of free time, y'know?
Just like all the rest of you all, I have to pick and choose my
battles if I'm going to be effective.)

Since this merge period has appeared particularly frazzling for
Andrew, I've been keeping silent and waiting for him to get to a point
where there's a breather. I didn't feel it would be polite to request
yet more work out of him while he had a mess on his hands.

But, given this has come to a head, I'm asking now.

Andrew? You've always given the impression that you want this run more
as an engineering effort than an artistic endeavour, so help us out
here. What are your concerns with swap prefetch? What sort of
comparative data would you like to see to justify its inclusion, or to
prove that it's not needed?

Or are we reading too much into the fact that it isn't merged? In
short, communicate please, it will help.

* Re: -mm merge plans for 2.6.23
  2007-07-24  4:53       ` Ray Lee
@ 2007-07-24  5:10         ` Jeremy Fitzhardinge
  2007-07-24  5:18           ` Ray Lee
  2007-07-24  5:16         ` Nick Piggin
  2007-07-24  5:18         ` Andrew Morton
  2 siblings, 1 reply; 484+ messages in thread
From: Jeremy Fitzhardinge @ 2007-07-24  5:10 UTC (permalink / raw)
  To: Ray Lee
  Cc: Nick Piggin, Jesper Juhl, Andrew Morton, ck list, Ingo Molnar,
	Paul Jackson, linux-mm, linux-kernel

Ray Lee wrote:
> That said, I'm willing to run my day to day life through both a swap
> prefetch kernel and a normal one. *However*, before I go through all
> the work of instrumenting the damn thing, I'd really like Andrew (or
> Linus) to lay out his acceptance criteria on the feature. Exactly what
> *should* I be paying attention to? I've suggested keeping track of
> process swapin delay total time, and comparing with and without. Is
> that reasonable? Is it incomplete?

Um, isn't it up to you?  The questions that need to be answered are:

   1. What are you trying to achieve?  Presumably you have some intended
      or desired effect you're trying to get.  What's the intended
      audience?  Who would be expected to see a benefit?  Who suffers?
   2. How does the code achieve that end?  Is it nasty or nice?  Has
      everyone who's interested in the affected areas at least looked at
      the changes, or ideally given them a good review?  Does it need
      lots of tunables, or is it set-and-forget?
   3. Does it achieve the intended end?  Numbers are helpful here.
   4. Does it make anything worse?  A lot or a little?  Rare corner
      cases, or a real world usage?  Again, numbers make the case most
      strongly.


I can't say I've been following this particular feature very closely,
but these are the fundamental questions that need to be dealt with in
merging any significant change.  And as Nick says, historically point 4
is very important in VM tuning changes, because "obvious" improvements
have often ended up giving pathologically bad results on unexpected
workloads.

    J

* Re: -mm merge plans for 2.6.23
  2007-07-24  4:53       ` Ray Lee
  2007-07-24  5:10         ` Jeremy Fitzhardinge
@ 2007-07-24  5:16         ` Nick Piggin
  2007-07-24 16:15           ` Ray Lee
  2007-07-24  5:18         ` Andrew Morton
  2 siblings, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-24  5:16 UTC (permalink / raw)
  To: Ray Lee
  Cc: Jesper Juhl, Andrew Morton, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

Ray Lee wrote:
> On 7/23/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> That said, I'm willing to run my day to day life through both a swap
> prefetch kernel and a normal one. *However*, before I go through all
> the work of instrumenting the damn thing, I'd really like Andrew (or
> Linus) to lay out his acceptance criteria on the feature. Exactly what
> *should* I be paying attention to? I've suggested keeping track of
> process swapin delay total time, and comparing with and without. Is
> that reasonable? Is it incomplete?

I don't feel it is so useful without more context. For example, in
most situations where pages get pushed to swap, there will *also* be
useful file backed pages being thrown out. Swap prefetch might
improve the total swapin delay time very significantly but that may
be just a tiny portion of the real problem.
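
A minimal sketch of how a swap-in figure could be put in the wider context described above: snapshot the system-wide counters in /proc/vmstat before and after a workload and report swap-in activity next to the total number of major faults. It assumes a kernel that exports the pswpin, pswpout and pgmajfault counters. pswpin counts pages read from swap while pgmajfault counts faults of any kind (file-backed ones included), so the two are reported side by side rather than divided. The deltas are system-wide, not per-process, and whether pages read in by swap prefetch itself show up in pswpin is not verified here. The function names are hypothetical.

```python
"""Snapshot /proc/vmstat around a command and report counter deltas."""

import subprocess

# pswpin/pswpout: pages read from / written to swap.
# pgmajfault: major faults of any kind, including file-backed pages.
KEYS = ("pswpin", "pswpout", "pgmajfault")


def parse_vmstat(text):
    """Parse the 'name value' lines of /proc/vmstat into a dict of ints."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def read_vmstat(path="/proc/vmstat"):
    with open(path) as f:
        return parse_vmstat(f.read())


def vmstat_delta(before, after, keys=KEYS):
    """Return after - before for each key present in both snapshots."""
    return {k: after[k] - before[k] for k in keys if k in before and k in after}


def measure(cmd):
    """Run cmd (an argv list) and return the system-wide counter deltas."""
    before = read_vmstat()
    subprocess.call(cmd)
    after = read_vmstat()
    return vmstat_delta(before, after)
```

Running the same command on a kernel with and without the patch and comparing the two sets of deltas gives a with/without comparison that also shows how much file-backed faulting was going on.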

Also, a random day at the desktop is quite a broad scope and
pretty well impossible to analyse. It would be better if we could first
look at some specific problems that are easily identified.

Looking at your past email, you have a 1GB desktop system and your
overnight updatedb run is causing stuff to get swapped out such that
swap prefetch makes it significantly better. This is really
intriguing to me, and I would hope we can start by making this
particular workload "not suck" without swap prefetch (and hopefully
make it even better than it currently is with swap prefetch because
we'll try not to evict useful file backed pages as well).

After that we can look at other problems that swap prefetch helps
with, or think of some ways to measure your "whole day" scenario.

So when/if you have time, I can cook up a list of things to monitor
and possibly a patch to add some instrumentation over this updatedb
run.

Anyway, I realise swap prefetching has some situations where it will
fundamentally outperform even the page replacement oracle. This is
why I haven't asked for it to be dropped: it isn't a bad idea at all.

However, if we can improve basic page reclaim where it is obviously
lacking, that is always preferable. eg: being a highly speculative
operation, swap prefetch is not great for power efficiency -- but we
still want laptop users to have a good experience as well, right?

-- 
SUSE Labs, Novell Inc.

* Re: -mm merge plans for 2.6.23
  2007-07-24  5:10         ` Jeremy Fitzhardinge
@ 2007-07-24  5:18           ` Ray Lee
  0 siblings, 0 replies; 484+ messages in thread
From: Ray Lee @ 2007-07-24  5:18 UTC (permalink / raw)
  To: Jeremy Fitzhardinge
  Cc: Nick Piggin, Jesper Juhl, Andrew Morton, ck list, Ingo Molnar,
	Paul Jackson, linux-mm, linux-kernel

On 7/23/07, Jeremy Fitzhardinge <jeremy@goop.org> wrote:
> Ray Lee wrote:
> > That said, I'm willing to run my day to day life through both a swap
> > prefetch kernel and a normal one. *However*, before I go through all
> > the work of instrumenting the damn thing, I'd really like Andrew (or
> > Linus) to lay out his acceptance criteria on the feature. Exactly what
> > *should* I be paying attention to? I've suggested keeping track of
> > process swapin delay total time, and comparing with and without. Is
> > that reasonable? Is it incomplete?
>
> Um, isn't it up to you?

Huh? I'm not Linus or Andrew, with the power to merge a patch to the
2.6 kernel, so I think that the answer to that is a really clear 'No.'

> 4. Does it make anything worse?  A lot or a little?  Rare corner
> cases, or a real world usage?  Again, numbers make the case most
> strongly.
>
> I can't say I've been following this particular feature very closely,
> but these are the fundamental questions that need to be dealt with in
> merging any significant change.  And as Nick says, historically point 4
> is very important in VM tuning changes, because "obvious" improvements
> have often ended up giving pathologically bad results on unexpected
> workloads.

Dude. My whole question was *what* numbers. Please go back and read it
all again. Maybe I was unclear, but I really don't think so.

* Re: -mm merge plans for 2.6.23
  2007-07-24  4:53       ` Ray Lee
  2007-07-24  5:10         ` Jeremy Fitzhardinge
  2007-07-24  5:16         ` Nick Piggin
@ 2007-07-24  5:18         ` Andrew Morton
  2007-07-24  6:01           ` Ray Lee
  2007-07-25  1:26           ` [ck] " Matthew Hawkins
  2 siblings, 2 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-24  5:18 UTC (permalink / raw)
  To: Ray Lee
  Cc: Nick Piggin, Jesper Juhl, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

On Mon, 23 Jul 2007 21:53:38 -0700 "Ray Lee" <ray-lk@madrabbit.org> wrote:

> 
> Since this merge period has appeared particularly frazzling for
> Andrew, I've been keeping silent and waiting for him to get to a point
> where there's a breather. I didn't feel it would be polite to request
> yet more work out of him while he had a mess on his hands.

Let it just be noted that Con is not the only one who has expended effort
on this patch.  It's been in -mm for nearly two years and it has meant
ongoing effort for me and, to a lesser extent, other MM developers to keep
it alive.

> But, given this has come to a head, I'm asking now.
> 
> Andrew? You've always given the impression that you want this run more
> as an engineering effort than an artistic endeavour, so help us out
> here. What are your concerns with swap prefetch? What sort of
> comparative data would you like to see to justify its inclusion, or to
> prove that it's not needed?

Criteria are different for each patch, but it usually comes down to a
cost/benefit judgement.  Does the benefit of the patch exceed its
maintenance cost over the lifetime of the kernel (whatever that is).

In this case the answer to that has never been clear to me.  The (much
older) fs-aio patches were (are) in a similar situation.

The other consideration here is, as Nick points out, are the problems which
people see this patch solving for them solvable in other, better ways?
IOW, is this patch fixing up preexisting deficiencies post-facto?

To attack the second question we could start out with bug reports: system A
with workload B produces result C.  I think result C is wrong for <reasons>
and would prefer to see result D.

> Or are we reading too much into the fact that it isn't merged? In
> short, communicate please, it will help.

Well.  The above, plus there's always a lot of stuff happening in MM land,
and I haven't seen much in the way of enthusiasm from the usual MM
developers.


* Re: -mm merge plans for 2.6.23
  2007-07-24  5:18         ` Andrew Morton
@ 2007-07-24  6:01           ` Ray Lee
  2007-07-24  6:10             ` Andrew Morton
  2007-07-24  9:38             ` Tilman Schmidt
  2007-07-25  1:26           ` [ck] " Matthew Hawkins
  1 sibling, 2 replies; 484+ messages in thread
From: Ray Lee @ 2007-07-24  6:01 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Jesper Juhl, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

On 7/23/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> Let it just be noted that Con is not the only one who has expended effort
> on this patch.  It's been in -mm for nearly two years and it has meant
> ongoing effort for me and, to a lesser extent, other MM developers to keep
> it alive.

<nods> Yes, keeping patches from crufting up and stepping on other
patches' toes is hard work; I did it for a bit, and it was one of the
more thankless tasks I've tried a hand at.

So, thanks.

> Criteria are different for each patch, but it usually comes down to a
> cost/benefit judgement.  Does the benefit of the patch exceed its
> maintenance cost over the lifetime of the kernel (whatever that is).

Well, I suspect it's 'lifetime of the feature,' in this case as it's
no more user-visible than the page replacement algorithm in the first
place.

> The other consideration here is, as Nick points out, are the problems which
> people see this patch solving for them solvable in other, better ways?
> IOW, is this patch fixing up preexisting deficiencies post-facto?

In some cases, it almost certainly is. It also has the troubling
aspect of mitigating future regressions without anyone terribly
noticing, due to it being able to paper over those hypothetical future
deficiencies when they're introduced.

> To attack the second question we could start out with bug reports: system A
> with workload B produces result C.  I think result C is wrong for <reasons>
> and would prefer to see result D.

I spend a lot of time each day watching my computer fault my
working set back in when I switch contexts. I'd rather I didn't have to
do that. Unfortunately, that's a pretty subjective problem report. For
whatever it's worth, we have pretty subjective solution reports
pointing to swap prefetch as providing a fix for them.

My concern is that a subjective problem report may not be good enough.
So, what do I measure to make this an objective problem report? And if
I do that (and it shows a positive result), will that be good enough
to argue for inclusion?

* Re: -mm merge plans for 2.6.23
  2007-07-24  6:01           ` Ray Lee
@ 2007-07-24  6:10             ` Andrew Morton
  2007-07-24  9:38             ` Tilman Schmidt
  1 sibling, 0 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-24  6:10 UTC (permalink / raw)
  To: Ray Lee
  Cc: Nick Piggin, Jesper Juhl, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

On Mon, 23 Jul 2007 23:01:41 -0700 "Ray Lee" <ray-lk@madrabbit.org> wrote:

> So, what do I measure to make this an objective problem report?

Ideal would be to find a reproducible-by-others testcase which does what you
believe to be the wrong thing.
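
A rough skeleton of such a testcase, modelled loosely on the overnight updatedb scenario mentioned in this thread: dirty a block of anonymous memory, run an lstat()-heavy scan over a directory tree, optionally sit idle (the window in which swap prefetch would do its work), then time how long it takes to touch the memory again. This is a sketch under stated assumptions, not a validated reproducer. Whether the scan actually pushes the buffer out to swap depends on the machine, so working_set_mb has to be tuned to its memory size, and all names here are hypothetical.

```python
"""Skeleton testcase: dirty memory, run a metadata scan, idle, re-touch."""

import os
import time

PAGE = os.sysconf("SC_PAGE_SIZE")


def touch(buf, page=PAGE):
    """Write one byte per page so every page of buf is faulted in."""
    for off in range(0, len(buf), page):
        buf[off] = (buf[off] + 1) & 0xFF


def stat_tree(root):
    """updatedb/git-style metadata scan: lstat() every entry under root."""
    count = 0
    for dirpath, dirnames, filenames in os.walk(root):
        for name in dirnames + filenames:
            try:
                os.lstat(os.path.join(dirpath, name))
                count += 1
            except OSError:
                pass  # entries can vanish while we walk
    return count


def run(working_set_mb, root, idle_seconds=0.0):
    """Return (entries scanned, seconds taken to re-touch the working set)."""
    buf = bytearray(working_set_mb * 1024 * 1024)
    touch(buf)                 # establish the "working set"
    scanned = stat_tree(root)  # competing metadata-heavy workload
    time.sleep(idle_seconds)   # idle window in which prefetch could act
    start = time.monotonic()
    touch(buf)                 # the switch back to the original task
    return scanned, time.monotonic() - start
```

Repeating the same run() call on kernels with and without the patch, with the same tree and idle period, would give a number that others can reproduce on their own hardware.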

> And if
> I do that (and it shows a positive result), will that be good enough
> to argue for inclusion?

That depends upon whether there are more suitable ways of fixing "the
wrong thing".

There may not be - it could well be that present behaviour
is correct for the testcase, but it leaves the system in the wrong
state for your large workload shift.  In that case, prefetching (ie:
restoring system state approximately to that which prevailed prior to
"testcase") might well be a suitable fix.

* Re: -mm merge plans for 2.6.23
  2007-07-24  6:01           ` Ray Lee
  2007-07-24  6:10             ` Andrew Morton
@ 2007-07-24  9:38             ` Tilman Schmidt
  1 sibling, 0 replies; 484+ messages in thread
From: Tilman Schmidt @ 2007-07-24  9:38 UTC (permalink / raw)
  To: Ray Lee
  Cc: Andrew Morton, Nick Piggin, Jesper Juhl, ck list, Ingo Molnar,
	Paul Jackson, linux-mm, linux-kernel

Ray Lee wrote:
> I spend a lot of time each day watching my computer fault my
> working set back in when I switch contexts. I'd rather I didn't have to
> do that. Unfortunately, that's a pretty subjective problem report. For
> whatever it's worth, we have pretty subjective solution reports
> pointing to swap prefetch as providing a fix for them.

Add me.

> My concern is that a subjective problem report may not be good enough.

That's my impression too, seeing the insistence on numbers.

> So, what do I measure to make this an objective problem report?

That seems to be the crux of the matter: how to measure subjective
usability issues (aka user experience) when simple reports along the
lines of "A is much better than B for everyday work" are not enough.
The same problem already impaired the "fair scheduler" discussion.
It would really help to have a clear direction there.

-- 
Tilman Schmidt                    E-Mail: tilman@imap.cc
Bonn, Germany
This message consists of 100% recycled bits.
Unopened, keeps at least until: (see reverse side)


* Re: -mm merge plans for 2.6.23
  2007-07-24  5:16         ` Nick Piggin
@ 2007-07-24 16:15           ` Ray Lee
  2007-07-24 17:46             ` [ck] " Rashkae
                               ` (2 more replies)
  0 siblings, 3 replies; 484+ messages in thread
From: Ray Lee @ 2007-07-24 16:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jesper Juhl, Andrew Morton, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

On 7/23/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Ray Lee wrote:
> > That said, I'm willing to run my day to day life through both a swap
> > prefetch kernel and a normal one. *However*, before I go through all
> > the work of instrumenting the damn thing, I'd really like Andrew (or
> > Linus) to lay out his acceptance criteria on the feature. Exactly what
> > *should* I be paying attention to? I've suggested keeping track of
> > process swapin delay total time, and comparing with and without. Is
> > that reasonable? Is it incomplete?
>
> I don't feel it is so useful without more context. For example, in
> most situations where pages get pushed to swap, there will *also* be
> useful file backed pages being thrown out. Swap prefetch might
> improve the total swapin delay time very significantly but that may
> be just a tiny portion of the real problem.

Agreed, it's important to make sure we're not being penny-wise and
pound-foolish here.

> Also, a random day at the desktop is quite a broad scope and
> pretty well impossible to analyse.

It is pretty broad, but that's also what swap prefetch is targeting.
As for hard to analyze, I'm not sure I agree. One can black-box test
this stuff with only a few controls. e.g., if I use the same apps each
day (mercurial, firefox, xorg, gcc), and the total I/O wait time
consistently goes down on a swap prefetch kernel (normalized by some
control statistic, such as application CPU time or total I/O, or
something), then that's a useful measurement.
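
A minimal sketch of that kind of black-box measurement, assuming the aggregate "cpu" line of /proc/stat (fields user, nice, system, idle, iowait, ... in USER_HZ ticks) is an acceptable data source: snapshot it before and after the day's work and divide the I/O-wait ticks by busy CPU ticks (user + nice + system) as the control statistic. iowait is a system-wide and fairly approximate figure, and the choice of normalizer is an assumption; the helper names are hypothetical.

```python
"""Compare I/O wait across runs, normalized by busy CPU time (a sketch)."""


def parse_cpu_totals(text):
    """Extract busy and iowait ticks from the aggregate 'cpu' line.

    Field order on that line is: user nice system idle iowait ...
    """
    for line in text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":
            user, nice, system, _idle, iowait = (int(x) for x in fields[1:6])
            return {"busy": user + nice + system, "iowait": iowait}
    raise ValueError("no aggregate 'cpu' line found")


def read_cpu_totals(path="/proc/stat"):
    with open(path) as f:
        return parse_cpu_totals(f.read())


def normalized_iowait(before, after):
    """I/O-wait ticks per busy CPU tick between two snapshots."""
    busy = after["busy"] - before["busy"]
    if busy <= 0:
        raise ValueError("no CPU time elapsed between snapshots")
    return (after["iowait"] - before["iowait"]) / busy
```

A consistently lower ratio on the swap prefetch kernel, for the same set of applications used the same way, would be the kind of signal described above.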

> It would be better if we could first
> look at some specific problems that are easily identified.

Always easier, true. Let's start with "My mouse jerks around under
memory load." A Google Summer of Code student working on X.Org claims
that mlocking the mouse handling routines gives a smooth cursor under
load ([1]). It's surprising that the kernel would swap that out in the
first place.

[1] http://vignatti.wordpress.com/2007/07/06/xorg-input-thread-summary-or-something/

> Looking at your past email, you have a 1GB desktop system and your
> overnight updatedb run is causing stuff to get swapped out such that
> swap prefetch makes it significantly better. This is really
> intriguing to me, and I would hope we can start by making this
> particular workload "not suck" without swap prefetch (and hopefully
> make it even better than it currently is with swap prefetch because
> we'll try not to evict useful file backed pages as well).

updatedb is an annoying case, because one would hope that there would
be a better way to deal with that highly specific workload. It's also
pretty stat()-dominant, which puts it roughly in the same category as a
git diff. (They differ in that updatedb does a lot of open()s and
getdents() calls on directories, while git merely does a ton of lstat()s instead.)

Anyway, my point is that I worry that tuning for an unusual and
infrequent workload (which updatedb certainly is), is the wrong way to
go.

> After that we can look at other problems that swap prefetch helps
> with, or think of some ways to measure your "whole day" scenario.
>
> So when/if you have time, I can cook up a list of things to monitor
> and possibly a patch to add some instrumentation over this updatedb
> run.

That would be appreciated. Don't spend huge amounts of time on it,
okay? Point me the right direction, and we'll see how far I can run
with it.

> Anyway, I realise swap prefetching has some situations where it will
> fundamentally outperform even the page replacement oracle. This is
> why I haven't asked for it to be dropped: it isn't a bad idea at all.

<nod>

> However, if we can improve basic page reclaim where it is obviously
> lacking, that is always preferable. eg: being a highly speculative
> operation, swap prefetch is not great for power efficiency -- but we
> still want laptop users to have a good experience as well, right?

Absolutely. Disk I/O is the enemy, and the best I/O is one you never
had to do in the first place.

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-24 16:15           ` Ray Lee
@ 2007-07-24 17:46             ` Rashkae
  2007-07-25  4:06             ` Nick Piggin
  2007-07-25  4:46             ` david
  2 siblings, 0 replies; 484+ messages in thread
From: Rashkae @ 2007-07-24 17:46 UTC (permalink / raw)
  To: Ray Lee; +Cc: ck, linux-kernel

> 
>> However, if we can improve basic page reclaim where it is obviously
>> lacking, that is always preferable. eg: being a highly speculative
>> operation, swap prefetch is not great for power efficiency -- but we
>> still want laptop users to have a good experience as well, right?
>

Sounds like something that can be altered with a tuneable for workloads 
where power efficiency is more important than performance.

As far as performance goes, empty memory is wasted memory.  I think the 
most important 'measurement' people can make for swap prefetch, if this 
is even possible to capture, is a positive hit ratio.  Under everyman's 
typical workload, what percentage of pages prefetched end up being used? 
And what percentage end up discarded?  I'm pulling these numbers out of 
thin air, but I would say that if > 10% is referenced and < 70% discarded,
then that would be a significant performance boost, well worthwhile.
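
The proposed criterion is simple arithmetic. A sketch, assuming one could obtain counts of pages prefetched, of those later referenced, and of those discarded (the patch is not claimed here to export such counters), with the rough thresholds above as defaults. The function names are hypothetical.

```python
def prefetch_ratios(prefetched, referenced, discarded):
    """Fractions of prefetched pages that were later used or thrown away."""
    if prefetched <= 0:
        raise ValueError("no pages were prefetched")
    return referenced / prefetched, discarded / prefetched


def looks_worthwhile(prefetched, referenced, discarded,
                     min_hit=0.10, max_discard=0.70):
    """Apply the suggested thresholds: > 10% referenced, < 70% discarded."""
    hit, discard = prefetch_ratios(prefetched, referenced, discarded)
    return hit > min_hit and discard < max_discard
```

For instance, 200 referenced and 500 discarded out of 1000 prefetched pages gives ratios of 0.2 and 0.5, which passes both thresholds.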


To be clear, I don't know what I'm talking about.  It just seems to me 
however, that debating whether or not to implement a performance boost 
because we can better tune corner cases is silly.  For as long as 
computers have used swap and unused memory, there will be a performance 
gain to background prefetching.  That doesn't preclude developers from 
tuning the specific workloads that lead to such.  It's not as if this
were a theoretical discussion of whether we should develop this or not.
Prefetch is here, now, and working.  The only questions I see are:

Does the performance gain from Prefetch compensate for the prefetch code 
memory requirements?

Is there someone who's comfortable with lkml politics willing to 
maintain the thing?

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-24  5:18         ` Andrew Morton
  2007-07-24  6:01           ` Ray Lee
@ 2007-07-25  1:26           ` Matthew Hawkins
  2007-07-25  1:35             ` David Miller
  1 sibling, 1 reply; 484+ messages in thread
From: Matthew Hawkins @ 2007-07-25  1:26 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ray Lee, Nick Piggin, Jesper Juhl, linux-kernel, ck list,
	linux-mm, Paul Jackson

On 7/24/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> The other consideration here is, as Nick points out, are the problems which
> people see this patch solving for them solvable in other, better ways?
> IOW, is this patch fixing up preexisting deficiencies post-facto?

So let me get this straight - you don't want to merge swap prefetch
which exists now and solves issues many people are seeing, and has
been tested more than a gazillion other bits & pieces that do get
merged - because it could be possible that in the future some other
patch, which doesn't yet exist and nobody is working on, may solve the
problem better?

You know what, just release Linux 0.02 as 2.6.23 because, using your
logic, everything that was merged since October 5, 1991 could be
replaced by something better.  Perhaps.  So there's obviously no point
having it there in the first place & there'll be untold savings in
storage costs and compilation time for the kernel tree, also bandwidth
for the mirror sites etc. in the meantime while we wait for the magic
pixies to come and deliver the one true piece of code that cannot be
improved upon.

> Well.  The above, plus there's always a lot of stuff happening in MM land,
> and I haven't seen much in the way of enthusiasm from the usual MM
> developers.

I haven't seen much in the way of enthusiasm from developers, period.
People are tired of maintaining patches for years that never get
merged into mainline because of totally bullshit reasons (usually
amounting to NIH syndrome).

-- 
Matt

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25  1:26           ` [ck] " Matthew Hawkins
@ 2007-07-25  1:35             ` David Miller
  0 siblings, 0 replies; 484+ messages in thread
From: David Miller @ 2007-07-25  1:35 UTC (permalink / raw)
  To: darthmdh
  Cc: akpm, ray-lk, nickpiggin, jesper.juhl, linux-kernel, ck, linux-mm, pj

From: "Matthew Hawkins" <darthmdh@gmail.com>
Date: Wed, 25 Jul 2007 11:26:57 +1000

> On 7/24/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> > The other consideration here is, as Nick points out, are the problems which
> > people see this patch solving for them solvable in other, better ways?
> > IOW, is this patch fixing up preexisting deficiencies post-facto?
> 
> So let me get this straight - you don't want to merge swap prefetch
> which exists now and solves issues many people are seeing, and has
> been tested more than a gazillion other bits & pieces that do get
> merged - because it could be possible that in the future some other
> patch, which doesn't yet exist and nobody is working on, may solve the
> problem better?

I have to generally agree that the objections to the swap prefetch
patches have been conjecture and in general wasting time and
frustrating people.

There is a point at which it might be wise to just step back and let
the river run its course and see what happens.  Initially, it's good
to play games of "what if", but after several months it's not a
productive thing and slows down progress for no good reason.

If a better mechanism gets implemented, great!  We can easily
replace the swap prefetch stuff at such time.  But until then swap
prefetch is what we have, and it has sat in -mm long enough with no major
problems to be merged.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-24 16:15           ` Ray Lee
  2007-07-24 17:46             ` [ck] " Rashkae
@ 2007-07-25  4:06             ` Nick Piggin
  2007-07-25  4:55               ` Rene Herman
                                 ` (4 more replies)
  2007-07-25  4:46             ` david
  2 siblings, 5 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-25  4:06 UTC (permalink / raw)
  To: Ray Lee
  Cc: Jesper Juhl, Andrew Morton, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

Ray Lee wrote:
> On 7/23/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:

>> Also a random day at the desktop, it is quite a broad scope and
>> pretty well impossible to analyse.
> 
> 
> It is pretty broad, but that's also what swap prefetch is targeting.
> As for hard to analyze, I'm not sure I agree. One can black-box test
> this stuff with only a few controls. e.g., if I use the same apps each
> day (mercurial, firefox, xorg, gcc), and the total I/O wait time
> consistently goes down on a swap prefetch kernel (normalized by some
> control statistic, such as application CPU time or total I/O, or
> something), then that's a useful measurement.
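Ray's black-box measurement could be approximated from user space by sampling the aggregate "cpu" line of /proc/stat before and after a stretch of normal use and comparing the iowait share. The sketch below uses one possible normalization (iowait as a fraction of all CPU ticks); that choice and the helper names are assumptions for illustration, not something proposed in the thread. It assumes the field order documented in proc(5) (user, nice, system, idle, iowait, irq, softirq, steal) and ignores any later guest fields, since guest time is already counted in user time.

```python
"""Hypothetical helpers: iowait share of CPU time from /proc/stat (Linux only)."""


def parse_cpu_line(stat_text: str) -> list[int]:
    """Return the tick counters from the aggregate 'cpu' line of /proc/stat."""
    for line in stat_text.splitlines():
        fields = line.split()
        if fields and fields[0] == "cpu":  # 'cpu0', 'cpu1', ... are per-CPU lines
            return [int(v) for v in fields[1:]]
    raise ValueError("no aggregate 'cpu' line found")


def iowait_fraction(before: list[int], after: list[int]) -> float:
    """Share of elapsed CPU ticks spent in iowait between two samples.

    Only the first eight fields (user .. steal) are summed; guest time,
    where present, is already included in user time.
    """
    deltas = [a - b for a, b in zip(after[:8], before[:8])]
    total = sum(deltas)
    return deltas[4] / total if total else 0.0  # index 4 is iowait


def sample_cpu(path: str = "/proc/stat") -> list[int]:
    """Read the current aggregate CPU counters."""
    with open(path) as f:
        return parse_cpu_line(f.read())
```

Usage would be `start = sample_cpu()` before the workload, `end = sample_cpu()` after it, then `iowait_fraction(start, end)`, compared across kernels with and without swap prefetch. proc(5) warns that the iowait value is not reliable, so several runs would be needed before drawing conclusions.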

I'm not saying that we can't try to tackle that problem, but first of
all you have a really nice narrow problem where updatedb seems to be
causing the kernel to completely do the wrong thing. So we start on
that.


>> If we can first try looking at
>> some specific problems that are easily identified.
> 
> 
> Always easier, true. Let's start with "My mouse jerks around under
> memory load." A Google Summer of Code student working on X.Org claims
> that mlocking the mouse handling routines gives a smooth cursor under
> load ([1]). It's surprising that the kernel would swap that out in the
> first place.
> 
> [1] 
> http://vignatti.wordpress.com/2007/07/06/xorg-input-thread-summary-or-something/ 

OK, I'm not sure what the point is though. Under heavy memory load,
things are going to get swapped out... and swap prefetch isn't going
to help there (at least, not during the memory load).

There are also other issues like whether the CPU scheduler is at fault,
etc. Interactive workloads are always the hardest to work out. updatedb
is a walk in the park by comparison.


>> Looking at your past email, you have a 1GB desktop system and your
>> overnight updatedb run is causing stuff to get swapped out such that
>> swap prefetch makes it significantly better. This is really
>> intriguing to me, and I would hope we can start by making this
>> particular workload "not suck" without swap prefetch (and hopefully
>> make it even better than it currently is with swap prefetch because
>> we'll try not to evict useful file backed pages as well).
> 
> 
> updatedb is an annoying case, because one would hope that there would
> be a better way to deal with that highly specific workload. It's also
> pretty stat dominant, which puts it roughly in the same category as a
> git diff. (They differ in that updatedb does a lot of open()s and
> getdents on directories, git merely does a ton of lstat()s instead.)

Yeah, and I suspect we might be able to do better use-once of
inode and dentry caches. It isn't really highly specific: lots
of things tend to just scan over a few files once -- updatedb
just scans a lot, so the problem becomes more noticeable.


> Anyway, my point is that I worry that tuning for an unusual and
> infrequent workload (which updatedb certainly is), is the wrong way to
> go.

Well it runs every day or so for every desktop Linux user, and
it has similarities with other workloads. We don't want to optimise
it at the expense of other things, but it _really_ should not be
pushing a 1-2GB desktop into swap, I don't think.


>> After that we can look at other problems that swap prefetch helps
>> with, or think of some ways to measure your "whole day" scenario.
>>
>> So when/if you have time, I can cook up a list of things to monitor
>> and possibly a patch to add some instrumentation over this updatedb
>> run.
> 
> 
> That would be appreciated. Don't spend huge amounts of time on it,
> okay? Point me the right direction, and we'll see how far I can run
> with it.

I guess /proc/meminfo, /proc/zoneinfo, /proc/vmstat, /proc/slabinfo
before and after the updatedb run with the latest kernel would be a
first step. top and vmstat output during the run wouldn't hurt either.
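The before/after snapshots Nick asks for can be compared mechanically. A minimal sketch for the /proc/vmstat part only, with hypothetical helper names: it handles the simple "name value" layout of /proc/vmstat, whereas /proc/meminfo and /proc/slabinfo use different formats (and slabinfo is on many systems readable only by root).

```python
"""Hypothetical helpers: diff /proc/vmstat across a workload such as updatedb."""


def parse_vmstat(text: str) -> dict[str, int]:
    """Parse the 'name value' lines of /proc/vmstat into a dict."""
    values = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2:
            values[parts[0]] = int(parts[1])
    return values


def diff_vmstat(before: dict[str, int], after: dict[str, int]) -> dict[str, int]:
    """Map each entry that changed to (after - before); missing entries count as 0."""
    return {
        name: after[name] - before.get(name, 0)
        for name in after
        if after[name] != before.get(name, 0)
    }


def snapshot_vmstat(path: str = "/proc/vmstat") -> dict[str, int]:
    """Read /proc/vmstat once."""
    with open(path) as f:
        return parse_vmstat(f.read())
```

Taking one snapshot before the nightly run and one after, the `pswpin` and `pswpout` entries of the diff show how many pages were swapped in and out. Entries prefixed `nr_` are levels rather than event counters, so their difference is a change in level, not a count of events.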

Thanks,
Nick

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-24 16:15           ` Ray Lee
  2007-07-24 17:46             ` [ck] " Rashkae
  2007-07-25  4:06             ` Nick Piggin
@ 2007-07-25  4:46             ` david
  2007-07-25  8:00               ` Rene Herman
  2007-07-25 15:55               ` Ray Lee
  2 siblings, 2 replies; 484+ messages in thread
From: david @ 2007-07-25  4:46 UTC (permalink / raw)
  To: Ray Lee
  Cc: Nick Piggin, Jesper Juhl, Andrew Morton, ck list, Ingo Molnar,
	Paul Jackson, linux-mm, linux-kernel

On Tue, 24 Jul 2007, Ray Lee wrote:

> On 7/23/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
>>  Ray Lee wrote:
>
>>  Looking at your past email, you have a 1GB desktop system and your
>>  overnight updatedb run is causing stuff to get swapped out such that
>>  swap prefetch makes it significantly better. This is really
>>  intriguing to me, and I would hope we can start by making this
>>  particular workload "not suck" without swap prefetch (and hopefully
>>  make it even better than it currently is with swap prefetch because
>>  we'll try not to evict useful file backed pages as well).
>
> updatedb is an annoying case, because one would hope that there would
> be a better way to deal with that highly specific workload. It's also
> pretty stat dominant, which puts it roughly in the same category as a
> git diff. (They differ in that updatedb does a lot of open()s and
> getdents on directories, git merely does a ton of lstat()s instead.)
>
> Anyway, my point is that I worry that tuning for an unusual and
> infrequent workload (which updatedb certainly is), is the wrong way to
> go.

The problem of updatedb pushing out program data may be improved on with
drop-behind or something similar.

However, another scenario that causes a similar problem is when a user is
busy using one of the big memory hogs and then switches to another (think
switching between OpenOffice and Firefox).

>>  After that we can look at other problems that swap prefetch helps
>>  with, or think of some ways to measure your "whole day" scenario.
>>
>>  So when/if you have time, I can cook up a list of things to monitor
>>  and possibly a patch to add some instrumentation over this updatedb
>>  run.
>
> That would be appreciated. Don't spend huge amounts of time on it,
> okay? Point me the right direction, and we'll see how far I can run
> with it.

You could make a synthetic test by writing a memory hog that allocates 3/4
of your RAM, then pauses waiting for input, then randomly accesses the
memory for a while (say, randomly accessing 2x the number of pages
allocated), and then pauses again before repeating.

Run two of these, alternating which one is running at any one time, and
time how long it takes to do the random accesses.

The difference in this time (e.g. between kernels with and without swap
prefetch) should be a fair indication of how much it would impact the user.
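David's synthetic test might look roughly like the sketch below. Everything in it is an assumption for illustration: the function names, the use of an anonymous `mmap` as the hog's memory, and writing one byte per page to stand in for "accessing" a page. Two copies would be started, each sized to about 3/4 of RAM, and Enter pressed in one terminal at a time so that only one hog does its random pass at any moment.

```python
"""Hypothetical sketch of the alternating memory-hog benchmark."""
import mmap
import random
import time

PAGE = mmap.PAGESIZE


def allocate(nbytes: int) -> mmap.mmap:
    """Anonymous mapping, written once per page so every page is faulted in."""
    buf = mmap.mmap(-1, nbytes)
    for off in range(0, nbytes, PAGE):
        buf[off] = 1
    return buf


def random_pass(buf: mmap.mmap, touches: int, rng: random.Random) -> float:
    """Touch `touches` randomly chosen pages; return elapsed wall time in seconds."""
    npages = len(buf) // PAGE
    start = time.perf_counter()
    for _ in range(touches):
        off = rng.randrange(npages) * PAGE
        buf[off] = (buf[off] + 1) & 0xFF
    return time.perf_counter() - start


def run(nbytes: int, rounds: int, wait=input) -> list[float]:
    """Allocate once, then repeatedly wait for a go signal and do a random pass.

    Each pass touches 2x as many pages as were allocated.
    """
    buf = allocate(nbytes)
    rng = random.Random()
    touches = 2 * (nbytes // PAGE)
    timings = []
    for _ in range(rounds):
        wait()  # e.g. press Enter once the *other* hog has finished its pass
        timings.append(random_pass(buf, touches, rng))
    buf.close()
    return timings
```

In each terminal one would call something like `run(os.sysconf("SC_PHYS_PAGES") * mmap.PAGESIZE * 3 // 4, rounds=10)`. Python's per-iteration overhead dominates while the pages are resident, so the interesting signal is how much longer a pass takes right after the other hog has forced this one's pages out to swap.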

By the way, I've also seen comments on the Postgres performance mailing
list about how slow Linux is, compared to other OSes, at pulling data back
in after it has been pushed out to swap (not a factor on dedicated database
machines, but a big factor on multi-purpose machines).

>>  Anyway, I realise swap prefetching has some situations where it will
>>  fundamentally outperform even the page replacement oracle. This is
>>  why I haven't asked for it to be dropped: it isn't a bad idea at all.
>
> <nod>
>
>>  However, if we can improve basic page reclaim where it is obviously
>>  lacking, that is always preferable. eg: being a highly speculative
>>  operation, swap prefetch is not great for power efficiency -- but we
>>  still want laptop users to have a good experience as well, right?
>
> Absolutely. Disk I/O is the enemy, and the best I/O is one you never
> had to do in the first place.

Almost always true; however, there is some amount of I/O that is free with
today's drives (remember, they read the entire track into RAM and then
give you the sectors on the track that you asked for). And if you have a
RAID array, this is even more true.

If you read one sector in from a RAID5 array, you have done all the same
I/O that you would have to do to read in the entire stripe, but I don't
believe that the current system will keep it all around if it exceeds the
readahead limit.

So in many cases readahead may end up being significantly cheaper than you
expect.

David Lang

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  4:06             ` Nick Piggin
@ 2007-07-25  4:55               ` Rene Herman
  2007-07-25  5:00                 ` Nick Piggin
                                   ` (2 more replies)
  2007-07-25  6:09               ` [ck] " Matthew Hawkins
                                 ` (3 subsequent siblings)
  4 siblings, 3 replies; 484+ messages in thread
From: Rene Herman @ 2007-07-25  4:55 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ray Lee, Jesper Juhl, Andrew Morton, ck list, Ingo Molnar,
	Paul Jackson, linux-mm, linux-kernel

On 07/25/2007 06:06 AM, Nick Piggin wrote:

> Ray Lee wrote:

>> Anyway, my point is that I worry that tuning for an unusual and 
>> infrequent workload (which updatedb certainly is), is the wrong way to 
>> go.
> 
> Well it runs every day or so for every desktop Linux user, and it has
> similarities with other workloads.

It certainly doesn't run for me ever. Always kind of a "that's not the 
point" comment but I just keep wondering whenever I see anyone complain 
about updatedb why the _hell_ they are running it in the first place. If 
everyone who never uses "locate" for anything simply disabled updatedb, the 
problem would for a large part be solved.

This is not just meant as a cheap comment; while I can think of a few similar 
loads even on the desktop (scanning a browser cache, a media player indexing 
a large amount of media files, ...) I've never heard of problems _other_ 
than updatedb. So just junk that crap and be happy.

Rene.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  4:55               ` Rene Herman
@ 2007-07-25  5:00                 ` Nick Piggin
  2007-07-25  5:12                 ` david
  2007-07-25  5:30                 ` Eric St-Laurent
  2 siblings, 0 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-25  5:00 UTC (permalink / raw)
  To: Rene Herman
  Cc: Ray Lee, Jesper Juhl, Andrew Morton, ck list, Ingo Molnar,
	Paul Jackson, linux-mm, linux-kernel

Rene Herman wrote:
> On 07/25/2007 06:06 AM, Nick Piggin wrote:
> 
>> Ray Lee wrote:
> 
> 
>>> Anyway, my point is that I worry that tuning for an unusual and 
>>> infrequent workload (which updatedb certainly is), is the wrong way 
>>> to go.
>>
>>
>> Well it runs every day or so for every desktop Linux user, and it has
>> similarities with other workloads.
> 
> 
> It certainly doesn't run for me ever. Always kind of a "that's not the 
> point" comment but I just keep wondering whenever I see anyone complain 
> about updatedb why the _hell_ they are running it in the first place. If 
> everyone who never uses "locate" for anything simply disabled updatedb, the 
> problem would for a large part be solved.
> 
> This is not just meant as a cheap comment; while I can think of a few 
> similar loads even on the desktop (scanning a browser cache, a media 
> player indexing a large amount of media files, ...) I've never heard of 
> problems _other_ than updatedb. So just junk that crap and be happy.

OK, fair point, but the counterpoint is that there are real patterns
that use a lot of metadata just once (ls, for example; grep, even).

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  4:55               ` Rene Herman
  2007-07-25  5:00                 ` Nick Piggin
@ 2007-07-25  5:12                 ` david
  2007-07-25  5:30                   ` Rene Herman
  2007-07-25  5:30                 ` Eric St-Laurent
  2 siblings, 1 reply; 484+ messages in thread
From: david @ 2007-07-25  5:12 UTC (permalink / raw)
  To: Rene Herman
  Cc: Nick Piggin, Ray Lee, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 25 Jul 2007, Rene Herman wrote:

> On 07/25/2007 06:06 AM, Nick Piggin wrote:
>
>>  Ray Lee wrote:
>
>> >  Anyway, my point is that I worry that tuning for an unusual and 
>> >  infrequent workload (which updatedb certainly is), is the wrong way to 
>> >  go.
>>
>>  Well it runs every day or so for every desktop Linux user, and it has
>>  similarities with other workloads.
>
> It certainly doesn't run for me ever. Always kind of a "that's not the point" 
> comment but I just keep wondering whenever I see anyone complain about 
> updatedb why the _hell_ they are running it in the first place. If everyone who 
> never uses "locate" for anything simply disabled updatedb, the problem would 
> for a large part be solved.
>
> This is not just meant as a cheap comment; while I can think of a few similar 
> loads even on the desktop (scanning a browser cache, a media player indexing 
> a large amount of media files, ...) I've never heard of problems _other_ than 
> updatedb. So just junk that crap and be happy.

But if you do use locate, then the alternative becomes sitting around and 
waiting for find to complete on a regular basis.

David Lang

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:12                 ` david
@ 2007-07-25  5:30                   ` Rene Herman
  2007-07-25  5:51                     ` david
                                       ` (2 more replies)
  0 siblings, 3 replies; 484+ messages in thread
From: Rene Herman @ 2007-07-25  5:30 UTC (permalink / raw)
  To: david
  Cc: Nick Piggin, Ray Lee, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 07/25/2007 07:12 AM, david@lang.hm wrote:

> On Wed, 25 Jul 2007, Rene Herman wrote:

>> It certainly doesn't run for me ever. Always kind of a "that's not the 
>> point" comment but I just keep wondering whenever I see anyone 
>> complain about updatedb why the _hell_ they are running it in the 
>> first place. If everyone who never uses "locate" for anything simply 
>> disabled updatedb, the problem would for a large part be solved.
>>
>> This is not just meant as a cheap comment; while I can think of a few 
>> similar loads even on the desktop (scanning a browser cache, a media 
>> player indexing a large amount of media files, ...) I've never heard 
>> of problems _other_ than updatedb. So just junk that crap and be happy.
> 
> But if you do use locate, then the alternative becomes sitting around and 
> waiting for find to complete on a regular basis.

Yes, but what's locate's usage scenario? I've never, ever wanted to use it. 
When do you know the name of something but not where it's located, other 
than situations that "which" already covers, or after just having 
installed/unpacked something, in which case locate doesn't know about it yet either?

Rene.


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  4:55               ` Rene Herman
  2007-07-25  5:00                 ` Nick Piggin
  2007-07-25  5:12                 ` david
@ 2007-07-25  5:30                 ` Eric St-Laurent
  2007-07-25  5:37                   ` Nick Piggin
  2 siblings, 1 reply; 484+ messages in thread
From: Eric St-Laurent @ 2007-07-25  5:30 UTC (permalink / raw)
  To: Rene Herman
  Cc: Nick Piggin, Ray Lee, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 2007-25-07 at 06:55 +0200, Rene Herman wrote:

> It certainly doesn't run for me ever. Always kind of a "that's not the 
> point" comment but I just keep wondering whenever I see anyone complain 
> about updatedb why the _hell_ they are running it in the first place. If 
> everyone who never uses "locate" for anything simply disabled updatedb, the 
> problem would for a large part be solved.
> 
> This is not just meant as a cheap comment; while I can think of a few similar 
> loads even on the desktop (scanning a browser cache, a media player indexing 
> a large amount of media files, ...) I've never heard of problems _other_ 
> than updatedb. So just junk that crap and be happy.

From my POV, there are two different problems discussed recently:

- updatedb type of workloads that add tons of inodes and dentries in the
slab caches which of course use the pagecache.

- streaming large files (read or copying) that fill the pagecache with
useless used-once data

Swap prefetch fixes the first case; drop-behind fixes the second case.

Both have the same symptoms but the cause is different.

Personally updatedb doesn't really hurt me.  But I don't have that many
files on my desktop.  I've tried the swap prefetch patch in the past and
it was not so noticeable for me. (I don't doubt it's helpful for others)

But every time I read or copy a large file around (usually from a
server) the slowdown is noticeable for some moments.
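For the streaming case, an application can already approximate drop-behind from user space with `posix_fadvise(POSIX_FADV_DONTNEED)`. The sketch below is that user-space workaround, not the in-kernel drop-behind patch being discussed; the function name and chunk size are hypothetical. Because DONTNEED does not drop pages that are still dirty, the written range is flushed with `fdatasync()` first, which lets the copy's own pages be dropped as well as the source's. Linux only: `os.posix_fadvise` and `os.fdatasync` are not available on every platform.

```python
"""Hypothetical helper: copy a file without leaving it in the page cache (Linux)."""
import os

CHUNK = 1 << 20  # 1 MiB per read/write; an arbitrary value for this sketch


def copy_drop_behind(src: str, dst: str, chunk: int = CHUNK) -> int:
    """Copy src to dst, telling the kernel each copied range will not be reused.

    Returns the number of bytes copied.
    """
    copied = 0
    in_fd = os.open(src, os.O_RDONLY)
    try:
        out_fd = os.open(dst, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o644)
        try:
            while True:
                data = os.read(in_fd, chunk)
                if not data:
                    break
                view = memoryview(data)
                while view:  # os.write may write less than requested
                    view = view[os.write(out_fd, view):]
                # DONTNEED skips dirty pages, so flush the written range first.
                os.fdatasync(out_fd)
                os.posix_fadvise(in_fd, copied, len(data), os.POSIX_FADV_DONTNEED)
                os.posix_fadvise(out_fd, copied, len(data), os.POSIX_FADV_DONTNEED)
                copied += len(data)
        finally:
            os.close(out_fd)
    finally:
        os.close(in_fd)
    return copied
```

Flushing after every chunk keeps the sketch simple but is slow; a real tool would batch the flushes and the advice calls over larger ranges.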

I just wanted to point this out, in case it wasn't clear enough to everyone.
I hope both problems get fixed.


Best regards,

- Eric



^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:30                 ` Eric St-Laurent
@ 2007-07-25  5:37                   ` Nick Piggin
  2007-07-25  5:53                     ` david
                                       ` (4 more replies)
  0 siblings, 5 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-25  5:37 UTC (permalink / raw)
  To: Eric St-Laurent
  Cc: Rene Herman, Ray Lee, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

Eric St-Laurent wrote:
> On Wed, 2007-25-07 at 06:55 +0200, Rene Herman wrote:
> 
> 
>>It certainly doesn't run for me ever. Always kind of a "that's not the 
>>point" comment but I just keep wondering whenever I see anyone complain 
>>about updatedb why the _hell_ they are running it in the first place. If 
>>everyone who never uses "locate" for anything simply disabled updatedb, the 
>>problem would for a large part be solved.
>>
>>This is not just meant as a cheap comment; while I can think of a few similar 
>>loads even on the desktop (scanning a browser cache, a media player indexing 
>>a large amount of media files, ...) I've never heard of problems _other_ 
>>than updatedb. So just junk that crap and be happy.
> 
> 
> From my POV, there are two different problems discussed recently:
> 
> - updatedb type of workloads that add tons of inodes and dentries in the
> slab caches which of course use the pagecache.
> 
> - streaming large files (read or copying) that fill the pagecache with
> useless used-once data
> 
> Swap prefetch fixes the first case; drop-behind fixes the second case.

OK, this is where I start to worry. Swap prefetch AFAIKS doesn't fix
the updatedb problem very well, because if updatedb has caused swapout
then it has filled memory, and swap prefetch doesn't run unless there
is free memory (not to mention that updatedb would have paged out other
files as well).

And drop behind doesn't fix your usual problem where you are downloading
from a server, because that is use-once write(2) data which is the
problem. And this readahead-based drop behind also doesn't help if data
you were reading happened to be a sequence of small files, or otherwise
not in good readahead order.

Not to say that they don't fix some problems, but for such conceptually
big changes, getting them merged should take a little more effort than a
constructed test case and no consideration of the alternatives.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:30                   ` Rene Herman
@ 2007-07-25  5:51                     ` david
  2007-07-25  7:14                     ` Valdis.Kletnieks
  2007-07-25 16:02                     ` Ray Lee
  2 siblings, 0 replies; 484+ messages in thread
From: david @ 2007-07-25  5:51 UTC (permalink / raw)
  To: Rene Herman
  Cc: Nick Piggin, Ray Lee, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 25 Jul 2007, Rene Herman wrote:

> On 07/25/2007 07:12 AM, david@lang.hm wrote:
>
>>  On Wed, 25 Jul 2007, Rene Herman wrote:
>
>> >  It certainly doesn't run for me ever. Always kind of a "that's not the 
>> >  point" comment but I just keep wondering whenever I see anyone complain 
>> >  about updatedb why the _hell_ they are running it in the first place. If 
>> >  everyone who never uses "locate" for anything simply disabled updatedb, the 
>> >  problem would for a large part be solved.
>> > 
>> >  This is not just meant as a cheap comment; while I can think of a few 
>> >  similar loads even on the desktop (scanning a browser cache, a media 
>> >  player indexing a large amount of media files, ...) I've never heard of 
>> >  problems _other_ than updatedb. So just junk that crap and be happy.
>>
>>  But if you do use locate, then the alternative becomes sitting around and
>>  waiting for find to complete on a regular basis.
>
> Yes, but what's locate's usage scenario? I've never, ever wanted to use it. 
> When do you know the name of something but not where it's located, other than 
> situations that "which" already covers, or after just having 
> installed/unpacked something, in which case locate doesn't know about it yet either?

"which" only finds executables that are in the PATH.

I commonly use locate to find config files (or sample config files) for
packages that were installed at some point in the past with fairly default
configs, which I now want to go and tweak. So I start reading the
documentation and then need to find out where $distro moved the files to
in this release (I am commonly working on machines with over half a dozen
different distro releases, and none of them Red Hat).

David Lang

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:37                   ` Nick Piggin
@ 2007-07-25  5:53                     ` david
  2007-07-25  6:04                       ` Nick Piggin
  2007-07-25  6:19                     ` [ck] " Matthew Hawkins
                                       ` (3 subsequent siblings)
  4 siblings, 1 reply; 484+ messages in thread
From: david @ 2007-07-25  5:53 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Eric St-Laurent, Rene Herman, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Ingo Molnar, Paul Jackson, linux-mm,
	linux-kernel

On Wed, 25 Jul 2007, Nick Piggin wrote:

> Eric St-Laurent wrote:
>>  On Wed, 2007-25-07 at 06:55 +0200, Rene Herman wrote:
>>
>> 
>> > It certainly doesn't run for me ever. Always kind of a "that's not the 
>> > point" comment but I just keep wondering whenever I see anyone complain 
>> > about updatedb why the _hell_ they are running it in the first place. If 
>> > everyone who never uses "locate" for anything simply disabled updatedb, the 
>> > problem would for a large part be solved.
>> > 
>> > This is not just meant as a cheap comment; while I can think of a few 
>> > similar loads even on the desktop (scanning a browser cache, a media 
>> > player indexing a large amount of media files, ...) I've never heard of 
>> > problems _other_ than updatedb. So just junk that crap and be happy.
>>
>> 
>>  From my POV, there are two different problems discussed recently:
>>
>>  - updatedb type of workloads that add tons of inodes and dentries in the
>>  slab caches which of course use the pagecache.
>>
>>  - streaming large files (read or copying) that fill the pagecache with
>>  useless used-once data
>>
>>  Swap prefetch fixes the first case; drop-behind fixes the second case.
>
> OK, this is where I start to worry. Swap prefetch AFAIKS doesn't fix
> the updatedb problem very well, because if updatedb has caused swapout
> then it has filled memory, and swap prefetch doesn't run unless there
> is free memory (not to mention that updatedb would have paged out other
> files as well).
>
> And drop behind doesn't fix your usual problem where you are downloading
> from a server, because that is use-once write(2) data which is the
> problem. And this readahead-based drop behind also doesn't help if data
> you were reading happened to be a sequence of small files, or otherwise
> not in good readahead order.
>
> Not to say that they don't fix some problems, but for such conceptually
> big changes, getting them merged should take a little more effort than a
> constructed test case and no consideration of the alternatives.

Well, there appears to be a fairly large group of people whose subjective
opinion is that it helps them, but those reports were dismissed because
they aren't measurements.

So now the measurements of the constructed test case aren't acceptable.

What sort of test case would be acceptable?

David Lang

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:53                     ` david
@ 2007-07-25  6:04                       ` Nick Piggin
  2007-07-25  6:23                         ` david
  2007-07-25 10:41                         ` Jesper Juhl
  0 siblings, 2 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-25  6:04 UTC (permalink / raw)
  To: david
  Cc: Eric St-Laurent, Rene Herman, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Ingo Molnar, Paul Jackson, linux-mm,
	linux-kernel

david@lang.hm wrote:
> On Wed, 25 Jul 2007, Nick Piggin wrote:

>> OK, this is where I start to worry. Swap prefetch AFAIKS doesn't fix
>> the updatedb problem very well, because if updatedb has caused swapout
>> then it has filled memory, and swap prefetch doesn't run unless there
>> is free memory (not to mention that updatedb would have paged out other
>> files as well).
>>
>> And drop behind doesn't fix your usual problem where you are downloading
>> from a server, because that is use-once write(2) data which is the
>> problem. And this readahead-based drop behind also doesn't help if data
>> you were reading happened to be a sequence of small files, or otherwise
>> not in good readahead order.
>>
>> Not to say that they don't fix some problems, but for such conceptually
>> big changes, getting them merged should take a little more effort than a
>> constructed test case and no consideration of the alternatives.
> 
> 
> Well, there appears to be a fairly large group of people whose subjective
> opinion is that it helps them, but those reports were dismissed because
> they aren't measurements.

Not at all. But there also seem to be some people experiencing
problems with basic page reclaim on some of the workloads where these
things help. I am not dismissing anybody's claims about anything; I want
to try to solve some of these problems.

Interestingly, some of the people ranting the most about how the VM sucks
are the ones helping least in solving these basic problems.


> So now the measurements of the constructed test case aren't acceptable.
> 
> What sort of test case would be acceptable?

Well, I never said real-world tests aren't acceptable; they are. There is
a difference between "it feels better for me" and some actual
measurement and analysis of said workload.

And constructed test cases of course are useful as well, I didn't say
they weren't. I don't know what you mean by "acceptable", but you should
read my last paragraph again.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25  4:06             ` Nick Piggin
  2007-07-25  4:55               ` Rene Herman
@ 2007-07-25  6:09               ` Matthew Hawkins
  2007-07-25  6:18                 ` Nick Piggin
  2007-07-25 16:19               ` Ray Lee
                                 ` (2 subsequent siblings)
  4 siblings, 1 reply; 484+ messages in thread
From: Matthew Hawkins @ 2007-07-25  6:09 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ray Lee, Jesper Juhl, linux-kernel, ck list, linux-mm,
	Paul Jackson, Andrew Morton

On 7/25/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> I'm not saying that we can't try to tackle that problem, but first of
> all you have a really nice narrow problem where updatedb seems to be
> causing the kernel to completely do the wrong thing. So we start on
> that.

updatedb isn't the only problem, it's just an obvious one.  I like the
idea of looking into the vfs for this and other one-shot applications
(rather than looking at updatedb itself specifically)

Many modern applications have a lot of open file handles.  For
example, I just fired up my usual audio player and /proc/sys/fs/file-nr
showed another 600 open files (funnily enough, I have roughly that
many audio files :)  I'm not exactly sure what happens when this one
gets swapped out for whatever reason (firefox/java/vmware/etc chews
ram, updatedb, whatever) but I'm fairly confident what happens between
kswapd and the vfs and whatever else we're caching is not optimal come
time for this process to context-switch back in.  We're not running a
highly-optimised number-crunching scientific app on desktops, we're
running a full herd of poorly-coded hogs simultaneously through
smaller pens.
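The file-handle count mentioned above can be read programmatically. A minimal sketch with hypothetical helper names: per proc(5), /proc/sys/fs/file-nr holds three numbers (allocated handles, free allocated handles, and the system-wide maximum), and since Linux 2.6 the middle value is always 0, so the first value is effectively the number of handles in use.

```python
"""Hypothetical helpers: system-wide open file handles from /proc/sys/fs/file-nr."""


def parse_file_nr(text: str) -> tuple[int, int, int]:
    """Return (allocated, free, maximum) from the three fields of file-nr."""
    allocated, free, maximum = (int(v) for v in text.split())
    return allocated, free, maximum


def handles_in_use(path: str = "/proc/sys/fs/file-nr") -> int:
    """Allocated minus free handles (free is always 0 on 2.6 and later)."""
    with open(path) as f:
        allocated, free, _maximum = parse_file_nr(f.read())
    return allocated - free
```

Calling `handles_in_use()` before and after starting a program gives the kind of delta quoted above. It counts handles across the whole system, so other processes opening or closing files in the meantime will skew the figure.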

I don't think anyone is trying to claim that swap prefetch is the be-all
and end-all solution to this problem; however, without it the effects
are an order of magnitude worse (I've cited numbers elsewhere, as have
several others). It's relatively non-intrusive (600+ of the 755 changed
lines are self-contained), is selectable at compile time and at runtime,
and still has a maintainer now that Con has retired.  If there were a
better solution, it should have been developed at some point in the 23
months that swap prefetch has been addressing the problem.  That's how
we got rmap versus aa, and so on.  But nobody chose to do so, and
continuing to hold out on merging it on the promise of vapourware is
ridiculous.  That has never been the way Linux kernel development has
operated.

-- 
Matt

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25  6:09               ` [ck] " Matthew Hawkins
@ 2007-07-25  6:18                 ` Nick Piggin
  0 siblings, 0 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-25  6:18 UTC (permalink / raw)
  To: Matthew Hawkins
  Cc: Ray Lee, Jesper Juhl, linux-kernel, ck list, linux-mm,
	Paul Jackson, Andrew Morton

Matthew Hawkins wrote:
> On 7/25/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>> I'm not saying that we can't try to tackle that problem, but first of
>> all you have a really nice narrow problem where updatedb seems to be
>> causing the kernel to completely do the wrong thing. So we start on
>> that.
> 
> 
> updatedb isn't the only problem, it's just an obvious one.  I like the
> idea of looking into the vfs for this and other one-shot applications
> (rather than looking at updatedb itself specifically)

That's the point, it is an obvious one. So it should be easy to work
out why it is going wrong, and fix it. (And hopefully that fixes some
of the less obvious problems too.)


> Many modern applications have a lot of open file handles.  For
> example, I just fired up my usual audio player and /proc/sys/fs/file-nr
> showed another 600 open files (funnily enough, I have roughly that
> many audio files :)  I'm not exactly sure what happens when this one
> gets swapped out for whatever reason (firefox/java/vmware/etc chews
> ram, updatedb, whatever) but I'm fairly confident what happens between
> kswapd and the vfs and whatever else we're caching is not optimal come
> time for this process to context-switch back in.  We're not running a
> highly-optimised number-crunching scientific app on desktops, we're
> running a full herd of poorly-coded hogs simultaneously through
> smaller pens.

And yet nobody wants to take the time to properly analyse why these
things are going wrong and report their findings? Or if they have,
where is that documented?

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25  5:37                   ` Nick Piggin
  2007-07-25  5:53                     ` david
@ 2007-07-25  6:19                     ` Matthew Hawkins
  2007-07-25  6:30                       ` Nick Piggin
  2007-07-25  6:47                       ` Mike Galbraith
  2007-07-25  6:44                     ` Eric St-Laurent
                                       ` (2 subsequent siblings)
  4 siblings, 2 replies; 484+ messages in thread
From: Matthew Hawkins @ 2007-07-25  6:19 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Eric St-Laurent, Ray Lee, Jesper Juhl, linux-kernel, ck list,
	linux-mm, Paul Jackson, Andrew Morton, Rene Herman

On 7/25/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Not to say that neither fix some problems, but for such conceptually
> big changes, it should take a little more effort than a constructed test
> case and no consideration of the alternatives to get it merged.

Swap Prefetch has existed since September 5, 2005.  Please Nick,
enlighten us all with your "alternatives" which have been offered (in
practical, not theoretical form) in the past 23 months, along with
their non-constructed benchmarks proving their case and the hordes of
happy users and kernel developers who have tested them out the wazoo
and given their backing.  Or just take a nice steaming jug of STFU.

-- 
Matt

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  6:04                       ` Nick Piggin
@ 2007-07-25  6:23                         ` david
  2007-07-25  7:25                           ` Nick Piggin
  2007-07-25 10:41                         ` Jesper Juhl
  1 sibling, 1 reply; 484+ messages in thread
From: david @ 2007-07-25  6:23 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Eric St-Laurent, Rene Herman, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Ingo Molnar, Paul Jackson, linux-mm,
	linux-kernel

On Wed, 25 Jul 2007, Nick Piggin wrote:

> david@lang.hm wrote:
>>  On Wed, 25 Jul 2007, Nick Piggin wrote:
>
>> >  OK, this is where I start to worry. Swap prefetch AFAIKS doesn't fix
>> >  the updatedb problem very well, because if updatedb has caused swapout
>> >  then it has filled memory, and swap prefetch doesn't run unless there
>> >  is free memory (not to mention that updatedb would have paged out other
>> >  files as well).
>> > 
>> >  And drop behind doesn't fix your usual problem where you are downloading
>> >  from a server, because that is use-once write(2) data which is the
>> >  problem. And this readahead-based drop behind also doesn't help if data
>> >  you were reading happened to be a sequence of small files, or otherwise
>> >  not in good readahead order.
>> > 
>> >  Not to say that neither fix some problems, but for such conceptually
>> >  big changes, it should take a little more effort than a constructed test
>> >  case and no consideration of the alternatives to get it merged.
>>
>>
>>  well, there appears to be a fairly large group of people who have
>>  subjective opinions that it helps them. but those were dismissed because
>>  they aren't measurements.
>
> Not at all. But there also seem to be some people experiencing
> problems with basic page reclaim on some of the workloads where these
> things help. I am not dismissing anybody's claims about anything; I want
> to try to solve some of these problems.
>
> Interestingly, some of the people ranting the most about how the VM sucks
> are the ones helping least in solving these basic problems.
>
>
>>  so now the measurements of the constructed test case aren't acceptable.
>>
>>  what sort of test case would be acceptable?
>
> Well I never said real world tests aren't acceptable, they are. There is
> a difference between an "it feels better for me", and some actual real
> measurement and analysis of said workload.
>
> And constructed test cases of course are useful as well, I didn't say
> they weren't. I don't know what you mean by "acceptable", but you should
> read my last paragraph again.

this problem has been around for many years, with many different people 
working on solutions. it's hardly a case of getting a proposal and trying 
to get it in without anyone looking at other options.

it seems that there are some people (not necessarily including you) who
will oppose this feature until a test is created that shows that it's
better. the question is what sort of test will be accepted as valid? I'm
not using this patch, but it sounds as if the people who are using it
are interested in doing whatever testing is required, but so far the
situation seems to be a series of "here's a test", "that test isn't valid,
try again" loops. which don't seem to be doing anyone any good and are
frustrating lots of people, so like several people over the last few days
I'm asking the question, "what sort of test would be acceptable as proof
that this patch does some good?"

David Lang

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25  6:19                     ` [ck] " Matthew Hawkins
@ 2007-07-25  6:30                       ` Nick Piggin
  2007-07-25  6:47                       ` Mike Galbraith
  1 sibling, 0 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-25  6:30 UTC (permalink / raw)
  To: Matthew Hawkins
  Cc: Eric St-Laurent, Ray Lee, Jesper Juhl, linux-kernel, ck list,
	linux-mm, Paul Jackson, Andrew Morton, Rene Herman

Matthew Hawkins wrote:
> On 7/25/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
>> Not to say that neither fix some problems, but for such conceptually
>> big changes, it should take a little more effort than a constructed test
>> case and no consideration of the alternatives to get it merged.
> 
> 
> Swap Prefetch has existed since September 5, 2005.  Please Nick,
> enlighten us all with your "alternatives" which have been offered (in
> practical, not theoretical form) in the past 23 months, along with
> their non-constructed benchmarks proving their case and the hordes of
> happy users and kernel developers who have tested them out the wazoo
> and given their backing.  Or just take a nice steaming jug of STFU.

The alternatives comment was in relation to the readahead based drop
behind patch, for which an alternative would be improving use-once,
possibly in the way I described.

As for swap prefetch, I don't know, I'm not in charge of it being
merged or not merged. I do know some people have reported that their
updatedb problem gets much better with swap prefetch turned on, and
I am trying to work on that too.

For you? You also have the alternative to help improve things yourself,
and you can modify your own kernel.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:37                   ` Nick Piggin
  2007-07-25  5:53                     ` david
  2007-07-25  6:19                     ` [ck] " Matthew Hawkins
@ 2007-07-25  6:44                     ` Eric St-Laurent
  2007-07-25 16:09                     ` Ray Lee
  2007-07-25 17:55                     ` Frank A. Kingswood
  4 siblings, 0 replies; 484+ messages in thread
From: Eric St-Laurent @ 2007-07-25  6:44 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Rene Herman, Ray Lee, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 2007-25-07 at 15:37 +1000, Nick Piggin wrote:

> OK, this is where I start to worry. Swap prefetch AFAIKS doesn't fix
> the updatedb problem very well, because if updatedb has caused swapout
> then it has filled memory, and swap prefetch doesn't run unless there
> is free memory (not to mention that updatedb would have paged out other
> files as well).
> 
> And drop behind doesn't fix your usual problem where you are downloading
> from a server, because that is use-once write(2) data which is the
> problem. And this readahead-based drop behind also doesn't help if data
> you were reading happened to be a sequence of small files, or otherwise
> not in good readahead order.
> 
> Not to say that neither fix some problems, but for such conceptually
> big changes, it should take a little more effort than a constructed test
> case and no consideration of the alternatives to get it merged.


Sorry for the confusion.

For swap prefetch I should have said "some people claim that it fixes
their problem". I didn't want to hurt anybody's feelings; some people are
tired of hearing others speak hypothetically about this patch, as it
works-for-them (TM).

I don't experience the problem. Can't help.

For drop behind, it fixes half the problem. The read case is handled
perfectly by Peter's patch, and the copy (read+write) case is unchanged. My
test case demonstrates it very easily, just look at the numbers.

So, I agree with you that drop behind doesn't fix the write() case.
Peter has said so himself when I offered to test his patch.
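For readers unfamiliar with the term: drop-behind means discarding the page-cache pages of streamed data once they have been consumed. Peter's patch does this inside the kernel's readahead path; the sketch below is NOT that patch, only a userspace approximation for a file copy using posix_fadvise(POSIX_FADV_DONTNEED), assumed here purely for illustration. It also shows why the write side is the harder half: dirty pages are not dropped until they have been written back, hence the fdatasync() before advising on the destination. The helper name copy_drop_behind() and the 1 MiB chunk size are hypothetical choices.

```c
#define _POSIX_C_SOURCE 200112L
#include <fcntl.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

#define CHUNK (1 << 20) /* bytes per read(2); an arbitrary choice */

/* Hypothetical helper: copy src to dst, asking the kernel to drop each
 * chunk from the page cache once it has been copied.
 * Returns the number of bytes copied, or -1 on error. */
static long long copy_drop_behind(const char *src, const char *dst)
{
	static char buf[CHUNK];
	long long total = 0;
	int out = -1;
	int in = open(src, O_RDONLY);

	if (in < 0)
		return -1;
	out = open(dst, O_WRONLY | O_CREAT | O_TRUNC, 0644);
	if (out < 0)
		goto fail;

	for (;;) {
		ssize_t n = read(in, buf, sizeof(buf));
		ssize_t off;

		if (n < 0)
			goto fail;
		if (n == 0)
			break;
		for (off = 0; off < n; ) {
			ssize_t w = write(out, buf + off, (size_t)(n - off));
			if (w < 0)
				goto fail;
			off += w;
		}
		/* Source pages are clean and can be dropped right away.
		 * The advice is only a hint, so its result is ignored. */
		posix_fadvise(in, (off_t)total, (off_t)n, POSIX_FADV_DONTNEED);
		/* Destination pages are dirty and stay cached until written
		 * back, so flush them first. Syncing every chunk is slow; it
		 * is here only to make the point visible. */
		if (fdatasync(out) < 0)
			goto fail;
		posix_fadvise(out, (off_t)total, (off_t)n, POSIX_FADV_DONTNEED);
		total += n;
	}
	close(in);
	return close(out) < 0 ? -1 : total;

fail:
	close(in);
	if (out >= 0)
		close(out);
	return -1;
}
```

The per-chunk fdatasync() is the price of doing this from userspace: without it the destination's dirty pages would simply accumulate in the page cache, which is the write(2) case Nick describes as still unsolved.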

As I do experience this problem, I have written a small test program and
batch file to help push the patch for acceptance.  I'm very willing to
help improve the test cases, test patches and write code, time
permitting.

About this very subject, earlier this year Andrew suggested that I
come up with a test case to demonstrate my problem; well, I've finally
done so.

http://lkml.org/lkml/2007/3/3/164
http://lkml.org/lkml/2007/3/3/166

Lastly, I would go as far as to say that the use-once read-then-copy fix
must also work with copies over NFS. I don't know if NFS changes the
workload on the client station versus the local case, and I don't know
if it's still possible to consider data copied this way as use-once.


- Eric



^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25  6:19                     ` [ck] " Matthew Hawkins
  2007-07-25  6:30                       ` Nick Piggin
@ 2007-07-25  6:47                       ` Mike Galbraith
  2007-07-25  7:19                         ` Eric St-Laurent
  1 sibling, 1 reply; 484+ messages in thread
From: Mike Galbraith @ 2007-07-25  6:47 UTC (permalink / raw)
  To: Matthew Hawkins
  Cc: Nick Piggin, Eric St-Laurent, Ray Lee, Jesper Juhl, linux-kernel,
	ck list, linux-mm, Paul Jackson, Andrew Morton, Rene Herman

On Wed, 2007-07-25 at 16:19 +1000, Matthew Hawkins wrote:
> On 7/25/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> > Not to say that neither fix some problems, but for such conceptually
> > big changes, it should take a little more effort than a constructed test
> > case and no consideration of the alternatives to get it merged.
> 
> Swap Prefetch has existed since September 5, 2005.  Please Nick,
> enlighten us all with your "alternatives" which have been offered (in
> practical, not theoretical form) in the past 23 months, along with
> their non-constructed benchmarks proving their case and the hordes of
> happy users and kernel developers who have tested them out the wazoo
> and given their backing.  Or just take a nice steaming jug of STFU.

Heh.  Here we have a VM developer expressing his interest in the problem
space, and you offer him a steaming jug of STFU because he doesn't say
what you want to hear.  I wonder how many killfiles you just entered.

	-Mike


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:30                   ` Rene Herman
  2007-07-25  5:51                     ` david
@ 2007-07-25  7:14                     ` Valdis.Kletnieks
  2007-07-25  8:18                       ` Rene Herman
  2007-07-25 16:02                     ` Ray Lee
  2 siblings, 1 reply; 484+ messages in thread
From: Valdis.Kletnieks @ 2007-07-25  7:14 UTC (permalink / raw)
  To: Rene Herman
  Cc: david, Nick Piggin, Ray Lee, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 1659 bytes --]

On Wed, 25 Jul 2007 07:30:37 +0200, Rene Herman said:

> Yes, but what's locate's usage scenario? I've never, ever wanted to use it. 
> When do you know the name of something but not where it's located, other 
> than situations which "which" wouldn't cover and after just having 
> installed/unpacked something meaning locate doesn't know about it yet either?

My favorite use - with 5 Fedora kernels and as many -mm kernels on my laptop,
doing a 'locate moby' finds all the moby.c and moby.o and moby.ko for
the various releases. For bonus points, something like:

ls -lt `locate iwl3945.ko`

to find all 19 copies that are on my system, and remind me which ones were
compiled when.  Or just when you remember the name of some one-off 100-line
Perl program that you wrote 6 months ago, but not sure which directory you
left it in... ;)

You want hard numbers? Here you go - 'locate' versus 'find'
(/usr/src/ has about 290K files on it):

%  strace locate iwl3945.ko  >| /tmp/foo3 2>&1
% wc /tmp/foo3
  96  592 6252 /tmp/foo3
% strace find /usr/src /lib -name iwl3945.ko >| /tmp/foo4 2>&1
% wc /tmp/foo4
  328380  1550032 15708205 /tmp/foo4

# echo 1 > /proc/sys/vm/drop_caches     (to empty the page cache; 3 would also drop dentries and inodes)

% time locate iwl3945.ko > /dev/null

real    0m0.872s
user    0m0.867s
sys     0m0.008s

% time find /usr/src /lib -name iwl3945.ko > /dev/null
find: /usr/src/lost+found: Permission denied

real    1m12.241s
user    0m1.128s
sys     0m3.566s

So 96 system calls in under a second, against 328K calls in over a minute
(about 83 times slower).  There's your
use case, right there.  Now if we can just find a way for that find/updatedb
to not be as painful to the rest of the system.....





[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25  6:47                       ` Mike Galbraith
@ 2007-07-25  7:19                         ` Eric St-Laurent
  0 siblings, 0 replies; 484+ messages in thread
From: Eric St-Laurent @ 2007-07-25  7:19 UTC (permalink / raw)
  To: Mike Galbraith
  Cc: Matthew Hawkins, Nick Piggin, Ray Lee, Jesper Juhl, linux-kernel,
	ck list, linux-mm, Paul Jackson, Andrew Morton, Rene Herman

On Wed, 2007-25-07 at 08:47 +0200, Mike Galbraith wrote:

> Heh.  Here we have a VM developer expressing his interest in the problem
> space, and you offer him a steaming jug of STFU because he doesn't say
> what you want to hear.  I wonder how many killfiles you just entered.
> 

Agreed.

(a bit OT)

People should understand that it's not (I think) about a desktop
workload vs enterprise workloads war.

I see it mostly as a progression versus regressions trade-off. And
adding potentially useless or unmaintained code is a regression from the
maintainers' POV.

The best way to justify a patch and have it integrated is to have a
scientific testing method with repeatable numbers.

Con has done so for his patch, his benchmark demonstrated good
improvements.

But I feel some of his supporters have indirectly harmed his cause by
their comments.  Also, the fact that Con recently stopped maintaining
his work out of frustration doesn't help get his patch merged.

Again I'm not personally pushing this patch, I don't need it.

Con has worked for many years on two areas that still cause problems for
desktop users: scheduler interactivity and pagecache thrashing.  Now that
the scheduler has been fixed, let's have the VM fixed too.

Sorry for the slightly OT post, and please don't start a flame war...


- Eric



^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  6:23                         ` david
@ 2007-07-25  7:25                           ` Nick Piggin
  2007-07-25  7:49                             ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-25  7:25 UTC (permalink / raw)
  To: david
  Cc: Eric St-Laurent, Rene Herman, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Ingo Molnar, Paul Jackson, linux-mm,
	linux-kernel

david@lang.hm wrote:
> On Wed, 25 Jul 2007, Nick Piggin wrote:

>> And constructed test cases of course are useful as well, I didn't say
>> they weren't. I don't know what you mean by "acceptable", but you should
>> read my last paragraph again.
> 
> 
> this problem has been around for many years, with many different people 
> working on solutions. it's hardly a case of getting a proposal and 
> trying to get it in without anyone looking at other options.

What is "this problem"? People have an updatedb problem that is solved
by swap prefetching which I want to fix in a different way.

There would be a different problem of "run something that uses heaps of
memory and swap everything else out, then quit it, wait for a while, and
swap prefetching helps". OK, definitely swap prefetching would help there.
How much? I don't know. I'd be slightly surprised if it was like an order
of magnitude, because not only swap but everything else has been thrown
out too.


> it seems that there are some people (not necessarily including you) who
> will oppose this feature until a test is created that shows that it's
> better. the question is what sort of test will be accepted as valid? I'm
> not using this patch, but it sounds as if the people who are using it
> are interested in doing whatever testing is required, but so far the 
> situation seems to be a series of "here's a test", "that test isn't 
> valid, try again" loops. which don't seem to be doing anyone any good 

And yet despite my repeated pleas, none of those people has yet spent a
bit of time with me to help analyse what is happening.


> and are frustrating lots of people, so like several people over the last 
> few days I'm asking the question, "what sort of test would be acceptable
> as proof that this patch does some good?"

I don't think any further proof is needed that the patch does "some"
good. Rig up a test case and you could see some seconds shaved off it.
Maybe you want to know "how to get this patch merged"? And I don't know
that one. I do know that it is fuzzy, and probably doesn't include
demanding things of Andrew or Linus.

BTW. If you find out the answer to that one, let me know because I have
this lockless pagecache patch that has also been around for years, is
also just a few hundred lines in the VM, and does do some good too. I'm
sure the buffered AIO people and many others would also like to know.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  7:25                           ` Nick Piggin
@ 2007-07-25  7:49                             ` Ingo Molnar
  2007-07-25  7:58                               ` Nick Piggin
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-25  7:49 UTC (permalink / raw)
  To: Nick Piggin
  Cc: david, Eric St-Laurent, Rene Herman, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Paul Jackson, linux-mm, linux-kernel


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> And yet despite my repeated pleas, none of those people has yet spent 
> a bit of time with me to help analyse what is happening.

btw., it might help to give specific, precise instructions about what 
people should do to help you analyze this problem.

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  7:49                             ` Ingo Molnar
@ 2007-07-25  7:58                               ` Nick Piggin
  2007-07-25  8:15                                 ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-25  7:58 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: david, Eric St-Laurent, Rene Herman, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Paul Jackson, linux-mm, linux-kernel

Ingo Molnar wrote:
> * Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>And yet despite my repeated pleas, none of those people has yet spent 
>>a bit of time with me to help analyse what is happening.
> 
> 
> btw., it might help to give specific, precise instructions about what 
> people should do to help you analyze this problem.

Ray has been the first one to offer (thank you), and yes I have asked
him for precise details of info to collect to hopefully work out what
is happening with his first problem.

For the general "it feels better for me" it is harder, but not as hard
as CPU scheduler. We can measure various types of IO waits, swap in/out
events, swap prefetch events and successfulness; see what happens to
those as we change swappiness or vfs_cache_pressure etc.
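As a starting point for the swap in/out part of that list: the kernel exports cumulative counters in /proc/vmstat, among them pswpin and pswpout (pages swapped in and out since boot). The sketch below only parses that "name value" format; sampling it before and after a run, with different /proc/sys/vm/swappiness or vfs_cache_pressure settings, and subtracting would give per-run swap traffic. The helper name vmstat_get() is hypothetical, and any counters the swap prefetch patch itself may add are not assumed here.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return counter `name` from a stream in the
 * /proc/vmstat format (one "name value" pair per line), or -1 if the
 * counter is absent. Rewinds first, so one stream can be queried for
 * several counters. */
static long long vmstat_get(FILE *f, const char *name)
{
	char key[64];
	long long val;

	rewind(f);
	while (fscanf(f, "%63s %lld", key, &val) == 2)
		if (strcmp(key, name) == 0)
			return val;
	return -1;
}
```

A caller would fopen("/proc/vmstat", "r"), read pswpin and pswpout, fclose it, run the workload, then take a second sample the same way and subtract. Opening the file afresh for each sample keeps the example simple.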

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  4:46             ` david
@ 2007-07-25  8:00               ` Rene Herman
  2007-07-25  8:07                 ` david
  2007-07-25 15:55               ` Ray Lee
  1 sibling, 1 reply; 484+ messages in thread
From: Rene Herman @ 2007-07-25  8:00 UTC (permalink / raw)
  To: david
  Cc: Ray Lee, Nick Piggin, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 760 bytes --]

On 07/25/2007 06:46 AM, david@lang.hm wrote:

> you could make a synthetic test by writing a memory hog that allocates 
> 3/4 of your ram then pauses waiting for input and then randomly accesses 
> the memory for a while (say randomly accessing 2x # of pages allocated) 
> and then pausing again before repeating

Something like this?

> run two of these, alternating which one is running at any one time. time 
> how long it takes to do the random accesses.
> 
> the difference in this time should be a fair example of how much it 
> would impact the user.

Notenotenote, not sure what you're going to show with it (times are simply 
as horrendous as I'd expect) but thought I'd try to inject something other 
than steaming cups of 4-letter beverages.

Rene.

[-- Attachment #2: hog.c --]
[-- Type: text/plain, Size: 974 bytes --]

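/* gcc -W -Wall -o hog hog.c */

/* timersub() is a BSD/glibc extension; request it explicitly. */
#define _DEFAULT_SOURCE

#include <stdlib.h>
#include <stdio.h>

#include <sys/time.h>
#include <unistd.h>

int main(void)
{
	/* long, not int: pages * pagesize overflows int above 2 GB of RAM */
	long pages, pagesize, i;
	unsigned char *mem;
	struct timeval tv;

	pages = sysconf(_SC_PHYS_PAGES);
	if (pages < 0) {
		perror("_SC_PHYS_PAGES");
		return EXIT_FAILURE;
	}
	/* hog 3/4 of physical RAM */
	pages = (3 * pages) / 4;

	pagesize = sysconf(_SC_PAGESIZE);
	if (pagesize < 0) {
		perror("_SC_PAGESIZE");
		return EXIT_FAILURE;
	}

	mem = malloc((size_t)pages * pagesize);
	if (!mem) {
		fprintf(stderr, "out of memory\n");
		return EXIT_FAILURE;
	}
	/* touch every page once so it is really allocated */
	for (i = 0; i < pages; i++)
		mem[(size_t)i * pagesize] = 0;

	gettimeofday(&tv, NULL);
	srand((unsigned int)tv.tv_sec);

	while (1) {
		struct timeval start;

		/* wait for input; stop at EOF instead of spinning forever */
		if (getchar() == EOF)
			break;

		/* dirty 2 * pages randomly chosen pages and time it;
		 * rand() / (RAND_MAX / pages + 1) is always < pages */
		gettimeofday(&start, NULL);
		for (i = 0; i < 2 * pages; i++)
			mem[(size_t)(rand() / (RAND_MAX / pages + 1)) * pagesize] = 0;
		gettimeofday(&tv, NULL);

		timersub(&tv, &start, &tv);
		/* zero-pad microseconds: 5 us must print as .000005, not .5 */
		printf("%ld.%06ld\n", (long)tv.tv_sec, (long)tv.tv_usec);
		fflush(stdout);
	}

	free(mem);
	return EXIT_SUCCESS;
}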

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  8:00               ` Rene Herman
@ 2007-07-25  8:07                 ` david
  2007-07-25  8:29                   ` Rene Herman
  0 siblings, 1 reply; 484+ messages in thread
From: david @ 2007-07-25  8:07 UTC (permalink / raw)
  To: Rene Herman
  Cc: Ray Lee, Nick Piggin, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 25 Jul 2007, Rene Herman wrote:

> On 07/25/2007 06:46 AM, david@lang.hm wrote:
>
>>  you could make a synthetic test by writing a memory hog that allocates 3/4
>>  of your ram then pauses waiting for input and then randomly accesses the
>>  memory for a while (say randomly accessing 2x # of pages allocated) and
>>  then pausing again before repeating
>
> Something like this?
>
>>  run two of these, alternating which one is running at any one time. time
>>  how long it takes to do the random accesses.
>>
>>  the difference in this time should be a fair example of how much it would
>>  impact the user.
>
> Notenotenote, not sure what you're going to show with it (times are simply as 
> horrendous as I'd expect) but thought I'd try to inject something other than 
> steaming cups of 4-letter beverages.

when the swap readahead is enabled does it make a significant difference 
in the time to do the random access?

if it does, that should show a direct benefit of the patch in a simulation
of a relatively common workflow (start up a memory hog like openoffice then
try and go back to your prior work)
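The "run two of these, alternating" procedure can be driven automatically. The sketch below uses popen() to start two copies of a command such as "./hog" and feeds each a newline in turn, pausing after each so the active hog's burst can finish and the idle one has time to be swapped out (or, on a kernel with swap prefetch, prefetched back in). The function name, round count and pause length are assumptions to tune per machine; each hog prints its own timing to the shared standard output. Because pclose() waits for each child to exit, this assumes the hog terminates when its standard input reaches EOF; a hog that ignores EOF from getchar() would make pclose() block forever.

```c
#define _POSIX_C_SOURCE 200809L
#include <stdio.h>
#include <unistd.h>

/* Hypothetical driver: start two copies of `cmd` and, `rounds` times,
 * send each one a newline in turn, sleeping `pause_s` seconds after
 * each. Returns 0 on success, -1 on error. */
static int run_alternating(const char *cmd, int rounds, unsigned pause_s)
{
	FILE *hog[2];
	int r, k, ret = 0;

	hog[0] = popen(cmd, "w");
	hog[1] = popen(cmd, "w");
	if (!hog[0] || !hog[1]) {
		ret = -1;
		goto out;
	}

	for (r = 0; r < rounds; r++) {
		for (k = 0; k < 2; k++) {
			/* one newline = one timed burst of random accesses */
			if (fputc('\n', hog[k]) == EOF || fflush(hog[k]) == EOF) {
				ret = -1;
				goto out;
			}
			sleep(pause_s);
		}
	}

out:
	/* closing the pipe delivers EOF; pclose() then waits for the child */
	for (k = 0; k < 2; k++)
		if (hog[k] && pclose(hog[k]) == -1)
			ret = -1;
	return ret;
}
```

For example, calling run_alternating("./hog", 5, 30) from a small main() would give each hog five turns with 30 seconds between turns; whether 30 seconds is long enough for both the access burst and any prefetching depends on the machine.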

David Lang


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  7:58                               ` Nick Piggin
@ 2007-07-25  8:15                                 ` Ingo Molnar
  0 siblings, 0 replies; 484+ messages in thread
From: Ingo Molnar @ 2007-07-25  8:15 UTC (permalink / raw)
  To: Nick Piggin
  Cc: david, Eric St-Laurent, Rene Herman, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Paul Jackson, linux-mm, linux-kernel


* Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> > > And yet despite my repeated pleas, none of those people has yet 
> > > spent a bit of time with me to help analyse what is happening.
> >
> > btw., it might help to give specific, precise instructions about 
> > what people should do to help you analyze this problem.
> 
> Ray has been the first one to offer (thank you), and yes I have asked 
> him for precise details of info to collect to hopefully work out what 
> is happening with his first problem.

do you mean this paragraph:

| I guess /proc/meminfo, /proc/zoneinfo, /proc/vmstat, /proc/slabinfo 
| before and after the updatedb run with the latest kernel would be a 
| first step. top and vmstat output during the run wouldn't hurt either.

correct? Does "latest kernel" mean v2.6.22.1, or does it have to be 
v2.6.23-rc1? I guess v2.6.22.1 would be fine as this is a VM problem, 
not a scheduling problem.

the following script will gather all the above information for a 10 
seconds interval:

  http://people.redhat.com/mingo/cfs-scheduler/tools/cfs-debug-info.sh

Ray, please run this script before the updatedb run, once during the 
updatedb run and once after the updatedb run, and send Nick the 3 files 
it creates. (feel free to Cc: me too)
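Where fetching the script is not convenient, the four /proc files from Nick's list could be captured with something like the sketch below. It is not a replacement for cfs-debug-info.sh, which per the description above also gathers top and vmstat output. The helper names append_file() and vm_snapshot() are hypothetical; the idea is to call vm_snapshot() into three separate files, before, during and after the updatedb run. /proc/slabinfo is readable only by root on many systems, so either run it as root or expect that entry to be marked unreadable.

```c
#include <stdio.h>

/* Append a "==> path <==" header and the contents of `path` to `out`.
 * Returns the number of content bytes copied, or -1 if `path` cannot
 * be opened. */
static long append_file(FILE *out, const char *path)
{
	char buf[4096];
	size_t n;
	long total = 0;
	FILE *in = fopen(path, "r");

	if (!in)
		return -1;
	fprintf(out, "==> %s <==\n", path);
	while ((n = fread(buf, 1, sizeof(buf), in)) > 0) {
		fwrite(buf, 1, n, out);
		total += (long)n;
	}
	fclose(in);
	return total;
}

/* Write one snapshot of the VM-related /proc files to `out`. */
static void vm_snapshot(FILE *out)
{
	static const char *const files[] = {
		"/proc/meminfo", "/proc/zoneinfo",
		"/proc/vmstat", "/proc/slabinfo",
	};
	size_t i;

	for (i = 0; i < sizeof(files) / sizeof(files[0]); i++)
		if (append_file(out, files[i]) < 0)
			fprintf(out, "==> %s: unreadable <==\n", files[i]);
}
```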

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  7:14                     ` Valdis.Kletnieks
@ 2007-07-25  8:18                       ` Rene Herman
  2007-07-25  8:28                         ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Rene Herman @ 2007-07-25  8:18 UTC (permalink / raw)
  To: Valdis.Kletnieks
  Cc: david, Nick Piggin, Ray Lee, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 07/25/2007 09:14 AM, Valdis.Kletnieks@vt.edu wrote:

> On Wed, 25 Jul 2007 07:30:37 +0200, Rene Herman said:
> 
>> Yes, but what's locate's usage scenario? I've never, ever wanted to use
>> it. When do you know the name of something but not where it's located,
>> other than situations which "which" wouldn't cover and after just
>> having installed/unpacked something meaning locate doesn't know about
>> it yet either?
> 
> My favorite use - with 5 Fedora kernels and as many -mm kernels on my
> laptop, doing a 'locate moby' finds all the moby.c and moby.o and moby.ko
> for the various releases.

Supposing you know the path in one tree, you know the path in all of them, 
right? :-?

> You want hard numbers? Here you go - 'locate' versus 'find'

These are of course not necessary. If you discount the time updatedb itself
takes it's utterly obvious that _if_ you use it, it's going to be wildly 
faster than find.

Regardless, I'll stand by "[by disabling updatedb] the problem will for a 
large part be solved" as I expect approximately 94.372 percent of Linux 
desktop users couldn't care less about locate.

Rene.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  8:18                       ` Rene Herman
@ 2007-07-25  8:28                         ` Ingo Molnar
  2007-07-25  8:43                           ` Rene Herman
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-25  8:28 UTC (permalink / raw)
  To: Rene Herman
  Cc: Valdis.Kletnieks, david, Nick Piggin, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Paul Jackson, linux-mm, linux-kernel


* Rene Herman <rene.herman@gmail.com> wrote:

> Regardless, I'll stand by "[by disabling updatedb] the problem will 
> for a large part be solved" as I expect approximately 94.372 percent 
> of Linux desktop users couldn't care less about locate.

i think that approach is illogical: because Linux mis-handled a mixed 
workload the answer is to ... remove a portion of that workload?

To bring your approach to the extreme: what if Linux sucked at running 
more than two CPU-intense tasks at once. Most desktop users don't do
that, so probably more than 94.372 percent of Linux desktop users
couldn't care less about a proper scheduler. Still, anyone who builds a
kernel (the average desktop user won't do that) while using firefox will
attest to the fact that it's quite handy that the Linux scheduler can 
handle mixed workloads pretty well.

now, it might be the case that this mixed VM/VFS workload cannot be 
handled any more intelligently - but that wasn't your argument! The
swap-prefetch patch certainly tried to do things more intelligently and 
the test-case (measurement app) Con provided showed visible improvements 
in swap-in latency. (and a good number of people posted those results)

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  8:07                 ` david
@ 2007-07-25  8:29                   ` Rene Herman
  2007-07-25  8:31                     ` david
  0 siblings, 1 reply; 484+ messages in thread
From: Rene Herman @ 2007-07-25  8:29 UTC (permalink / raw)
  To: david
  Cc: Ray Lee, Nick Piggin, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 07/25/2007 10:07 AM, david@lang.hm wrote:

> On Wed, 25 Jul 2007, Rene Herman wrote:

>> Something like this?

[ ... ]

> when the swap readahead is enabled does it make a significant difference 
> in the time to do the random access?

I don't use swap prefetch (nor -ck or -mm). If someone who has the patch 
applied waits to hit enter until swap prefetch has prefetched it all back in 
again, it certainly will.

Swap prefetch's potential to do larger reads back from swap space than a
randomly page-faulting app could well be very significant. Reads are dwarfed by
seeks. If this program does what you wanted, please use it to show us.

Rene.

* Re: -mm merge plans for 2.6.23
  2007-07-25  8:29                   ` Rene Herman
@ 2007-07-25  8:31                     ` david
  2007-07-25  8:33                       ` david
  0 siblings, 1 reply; 484+ messages in thread
From: david @ 2007-07-25  8:31 UTC (permalink / raw)
  To: Rene Herman
  Cc: Ray Lee, Nick Piggin, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 25 Jul 2007, Rene Herman wrote:

> On 07/25/2007 10:07 AM, david@lang.hm wrote:
>
>>  On Wed, 25 Jul 2007, Rene Herman wrote:
>
>> >  Something like this?
>
> [ ... ]
>
>>  when the swap readahead is enabled does it make a significant difference
>>  in the time to do the random access?
>
> I don't use swap prefetch (nor -ck or -mm). If someone who has the patch 
> applied waits to hit enter until swap prefetch has prefetched it all back in 
> again, it certainly will.
>
> Swap prefetch's potential to do larger reads back from swap space than a
> randomly page-faulting app could well be very significant. Reads are dwarfed by
> seeks. If this program does what you wanted, please use it to show us.

I haven't used swap prefetch either, the call was put out for what could 
be used to test the performance, and I was suggesting a test.

if nobody else follows up on this I'll try to get some time to test it 
myself in a day or two.

David Lang

* Re: -mm merge plans for 2.6.23
  2007-07-25  8:31                     ` david
@ 2007-07-25  8:33                       ` david
  2007-07-25 10:58                         ` Rene Herman
  0 siblings, 1 reply; 484+ messages in thread
From: david @ 2007-07-25  8:33 UTC (permalink / raw)
  To: Rene Herman
  Cc: Ray Lee, Nick Piggin, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 25 Jul 2007, david@lang.hm wrote:

> Subject: Re: -mm merge plans for 2.6.23
> 
> On Wed, 25 Jul 2007, Rene Herman wrote:
>
>>  On 07/25/2007 10:07 AM, david@lang.hm wrote:
>> 
>> >   On Wed, 25 Jul 2007, Rene Herman wrote:
>> 
>> > >   Something like this?
>>
>>  [ ... ]
>> 
>> >   when the swap readahead is enabled does it make a significant
>> >   difference in the time to do the random access?
>>
>>  I don't use swap prefetch (nor -ck or -mm). If someone who has the patch
>>  applied waits to hit enter until swap prefetch has prefetched it all back
>>  in again, it certainly will.
>>
>>  Swap prefetch's potential to do larger reads back from swap space than a
>>  randomly page-faulting app could well be very significant. Reads are dwarfed
>>  by seeks. If this program does what you wanted, please use it to show us.
>
> I haven't used swap prefetch either, the call was put out for what could be 
> used to test the performance, and I was suggesting a test.
>
> if nobody else follows up on this I'll try to get some time to test it myself 
> in a day or two.

this assumes that this isn't ruled an invalid test in the meantime.

in any case, thanks for coding this up so quickly.

David Lang

* Re: -mm merge plans for 2.6.23
  2007-07-25  8:28                         ` Ingo Molnar
@ 2007-07-25  8:43                           ` Rene Herman
  2007-07-25 11:34                             ` Ingo Molnar
       [not found]                             ` <5c77e14b0707250353r48458316x5e6adde6dbce1fbd@mail.gmail.com>
  0 siblings, 2 replies; 484+ messages in thread
From: Rene Herman @ 2007-07-25  8:43 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Valdis.Kletnieks, david, Nick Piggin, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Paul Jackson, linux-mm, linux-kernel

On 07/25/2007 10:28 AM, Ingo Molnar wrote:

>> Regardless, I'll stand by "[by disabling updatedb] the problem will 
>> for a large part be solved" as I expect approximately 94.372 percent 
>> of Linux desktop users couldn't care less about locate.
> 
> i think that approach is illogical: because Linux mis-handled a mixed 
> workload the answer is to ... remove a portion of that workload?

No. It got snipped but I introduced the comment by saying it was a "that's 
not the point" kind of thing. Sometimes things that aren't the point are 
still true though and in the case of Linux desktop users complaining about 
updatedb runs, a comment that says that for many an obvious solution would 
be to stop running the damned thing is not in any sense illogical.

Also note I'm not against swap prefetch or anything. I don't use it and do 
not believe I have a pressing need for it, but do suspect it has potential 
to make quite a bit of difference on some things -- if only to drastically 
reduce seeks if it means it's swapping in larger chunks than a randomly 
faulting program would.
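A back-of-the-envelope sketch of why larger swap-in requests matter, assuming a simple disk model in which every separate request pays one average access time (seek plus rotational latency) before transferring at the drive's sequential rate. The 10 ms and 50 MiB/s figures in the usage note are illustrative assumptions, not measurements, and the model assumes the prefetched pages sit contiguously in swap, which real swap layouts need not guarantee. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical disk model: each request costs one access (seek +
 * rotation), then data moves at the sequential transfer rate. */
struct disk_model {
    double access_ms; /* average seek + rotational latency per request */
    double mb_per_s;  /* sustained sequential transfer rate, MiB/s */
};

/* Estimated time in ms to read back 'pages' pages of 'page_kib' KiB,
 * issuing requests that each cover 'pages_per_req' contiguous pages. */
double swapin_ms(struct disk_model d, long pages, long page_kib,
                 long pages_per_req)
{
    long requests = (pages + pages_per_req - 1) / pages_per_req;
    double mib = (double)pages * (double)page_kib / 1024.0;

    return (double)requests * d.access_ms + mib / d.mb_per_s * 1000.0;
}

/* Print the estimate for a few request sizes. */
void print_swapin_table(struct disk_model d, long pages, long page_kib)
{
    static const long sizes[] = { 1, 8, 64, 256 };

    for (size_t i = 0; i < sizeof sizes / sizeof sizes[0]; i++)
        printf("%4ld pages/request: %10.1f ms\n", sizes[i],
               swapin_ms(d, pages, page_kib, sizes[i]));
}
```

With 10 ms per access and 50 MiB/s, bringing back 100 MiB (25600 pages of 4 KiB) one page per fault comes to 258000 ms, roughly 4.3 minutes, almost all of it spent seeking; in 64-page requests the same data takes 6000 ms. Under those assumptions, that is the seek argument in numbers.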

Rene.

* Re: -mm merge plans for 2.6.23
  2007-07-25  6:04                       ` Nick Piggin
  2007-07-25  6:23                         ` david
@ 2007-07-25 10:41                         ` Jesper Juhl
  1 sibling, 0 replies; 484+ messages in thread
From: Jesper Juhl @ 2007-07-25 10:41 UTC (permalink / raw)
  To: Nick Piggin
  Cc: david, Eric St-Laurent, Rene Herman, Ray Lee, Andrew Morton,
	ck list, Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 25/07/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
[snip]
>
> Well I never said real world tests aren't acceptable, they are. There is
> a difference between an "it feels better for me", and some actual real
> measurement and analysis of said workload.
>

Let me tell you about the use-case where swap prefetch helps me. I
don't have actual numbers currently, only a subjective "it feels
better", but when I get home from work tonight I'll try to collect
some actual numbers for you.

Anyway, here's a description of the scenario (machine is an AMD
Athlon64 X2 4400+, 2GB RAM, 1GB swap, running 32bit kernel &
userspace):

A KDE desktop with the following running is common for me:
 - A few (konsole) shells open running vim, pine, less, ssh sessions etc.
 - Eclipse (with CDT) with 20-30 files open in a project.
 - Firefox with 30+ tabs open.
 - LyX, with a 200+ page document I'm working on open.
 - Gimp running, usually with at least one or two images open (~1280x1024).
 - Amarok open and playing my playlist (a few days worth of music).
 - At least one Konqueror window in filemanager mode running.
 - More often than not OpenOffice is running with a spreadsheet or
text document open.
 - In the background the machine is running Apache, MySQL, BIND and
NFS services for my local LAN, but they see very little actual use.

Now, a thing I commonly do is fire up a new shell, pull the latest
changes from Linus' git tree and start a script running that builds an
allnoconfig kernel, an allmodconfig kernel, an allyesconfig kernel and
then 30 randconfig kernels.  Obviously that script takes quite a while
to run and loads the box quite a bit, so I usually just leave the box
alone for a few hours until it is done (sometimes I leave it over
night, in which case updatedb also gets added to the mix during the
night). This usually pushes the box to use some amount of swap.

Without swap prefetch, when I start working with one of the apps I had
running before starting the compile job, it always feels a little laggy
at first. With swap prefetch, app response time is not laggy when I
come back.  The "lagginess" doesn't last too long and is hard to
quantify, but I'll try getting some numbers (if in no other way, then
perhaps by using a stop watch)...

Fact is, this is a scenario that is common to me and one where swap
prefetch definitely makes the box feel nicer to work with.


-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

* Re: -mm merge plans for 2.6.23
  2007-07-25  8:33                       ` david
@ 2007-07-25 10:58                         ` Rene Herman
  0 siblings, 0 replies; 484+ messages in thread
From: Rene Herman @ 2007-07-25 10:58 UTC (permalink / raw)
  To: david
  Cc: Ray Lee, Nick Piggin, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 07/25/2007 10:33 AM, david@lang.hm wrote:

>> I haven't used swap prefetch either, the call was put out for what 
>> could be used to test the performance, and I was suggesting a test.
>>
>> if nobody else follows up on this I'll try to get some time to test it 
>> myself in a day or two.
> 
> this assumes that this isn't ruled an invalid test in the meantime.

Let's save a little time and guess. While two instances of the hog are
running, no physical memory is free (together they take up 1.5x physical),
meaning that swap-prefetch wouldn't get a chance to do anything and wouldn't
make a difference. As such, the two-instance test you suggested would in
fact not be testing anything, it seems.

However, if you quit one and stay idle long enough before continuing with
the other that swap-prefetch has prefetched all its memory back in, it should
make a difference on the order of minutes, even a total one if swap prefetch
fetched it back in without seeking all over swap space, and "total" isn't
applicable if the idle time really is free.

A program randomly touching single pages all over memory is a contrived 
worst case scenario and not a real-world issue. It is a boundary condition 
though, and it's simply quite impossible to think of any example where 
swap-prefetch would _not_ give you a snappier feeling machine after you've 
been idling.

So really the only question would seem to be -- does it hurt any if you have 
_not_ been?

Rene.


* Re: [ck] Re: -mm merge plans for 2.6.23
       [not found]                             ` <5c77e14b0707250353r48458316x5e6adde6dbce1fbd@mail.gmail.com>
@ 2007-07-25 11:06                               ` Nick Piggin
  2007-07-25 13:30                               ` Rene Herman
  1 sibling, 0 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-25 11:06 UTC (permalink / raw)
  To: Jos Poortvliet
  Cc: Rene Herman, Ingo Molnar, david, Valdis.Kletnieks, Ray Lee,
	Jesper Juhl, linux-kernel, ck list, linux-mm, Paul Jackson,
	Andrew Morton

Jos Poortvliet wrote:

> Nick has been talking about 'fixing the updatedb thing' for years now, no
> patch yet.

Wrong Nick, I think.

First I heard about the updatedb problem was a few months ago with people
saying updatedb was causing their system to swap (that is, swap prefetching
helped after updatedb). I haven't been able to even try to fix it because I
can't reproduce it (I'm sitting on a machine with 256MB RAM), and nobody
has wanted to help me.


> Besides, he won't fix OO.o nor all other userspace stuff - so actually,
> he does NOT even promise an alternative. Not that I think fixing updatedb
> would be cool, btw - it sure would, but it's no reason not to include swap
> prefetch - it's mostly unrelated.
> 
> I think everyone with >1 gb ram should stop saying 'I don't need it' because
> that's obvious for that hardware. Just like ppl having a dual- or quadcore
> shouldn't even talk about scheduler interactivity stuff...

Actually there are people with >1GB of ram who are saying it helps. Why do
you want to shut people out of the discussion?


> Desktop users want it, tests show it works, there is no alternative and the
> maybe-promised-one won't even fix all cornercases. It's small, mostly
> selfcontained. There is a maintainer. It's been stable for a long time.
> It's been in MM for a long time.
> 
> Yet it doesn't make it. Andrew says 'some ppl have objections' (he means
> Nick) and he doesn't see an advantage in it (at least 4 gig ram, right,
> Andrew?).
> 
> Do I miss things?

You could try constructively contributing?


> Apparently, it didn't get in yet - and I find it hard to believe Andrew
> holds swapprefetch for reasons like the above. So it must be something 
> else.
> 
> 
> Nick is saying tests have already proven swap prefetch to be helpful,
> that's not the problem. He calls the requirements to get in 'fuzzy'. OK.

The test I have seen is the one that forces a huge amount of memory to
swap out, waits, then touches it. That speeds up, and that's fine. That's
a good sanity test to ensure it is working. Beyond that there are other
considerations to getting something merged.

-- 
SUSE Labs, Novell Inc.

* Re: -mm merge plans for 2.6.23
  2007-07-25  8:43                           ` Rene Herman
@ 2007-07-25 11:34                             ` Ingo Molnar
  2007-07-25 11:40                               ` Rene Herman
                                                 ` (2 more replies)
       [not found]                             ` <5c77e14b0707250353r48458316x5e6adde6dbce1fbd@mail.gmail.com>
  1 sibling, 3 replies; 484+ messages in thread
From: Ingo Molnar @ 2007-07-25 11:34 UTC (permalink / raw)
  To: Rene Herman
  Cc: Valdis.Kletnieks, david, Nick Piggin, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Paul Jackson, linux-mm, linux-kernel


* Rene Herman <rene.herman@gmail.com> wrote:

> On 07/25/2007 10:28 AM, Ingo Molnar wrote:
> 
> >>Regardless, I'll stand by "[by disabling updatedb] the problem will 
> >>for a large part be solved" as I expect approximately 94.372 percent 
> >>of Linux desktop users couldn't care less about locate.
> >
> > i think that approach is illogical: because Linux mis-handled a 
> > mixed workload the answer is to ... remove a portion of that 
> > workload?
> 
> No. It got snipped but I introduced the comment by saying it was a 
> "that's not the point" kind of thing. [...]

ok - with that qualification i understand.

still, especially for someone like me who frequently deals with source 
code, 'locate' is indispensable.

and the fact is: updatedb discards a considerable portion of the cache 
completely unnecessarily: on a reasonably complex box no way do all the 
inodes and dentries fit into all of RAM, so we just trash everything. 
Maybe the kernel could be extended with a method of opening files in a 
'drop from the dcache after use' way. (beagled and backup tools could 
make use of that facility too.) (Or some other sort of 
file-cache-invalidation syscall that already exists, which would _also_
result in the immediate zapping of the dentry+inode from the dcache.)
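For comparison, one interface that already exists is posix_fadvise(2) with POSIX_FADV_DONTNEED, which asks the kernel to drop a file's cached data pages after use. A minimal sketch of a backup- or indexer-style reader using it follows. Note that it only covers file data: the dentry and inode caches discussed here are left untouched, so it is at most a partial answer for a stat-heavy tool like updatedb. The helper name read_then_drop is hypothetical.

```c
#define _GNU_SOURCE
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper: read an entire file (as a backup or indexing
 * tool would), then tell the kernel its cached data will not be
 * reused. Returns the number of bytes read, or -1 on error. */
long read_then_drop(const char *path)
{
    char buf[65536];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof buf)) > 0)
        total += n;             /* a real tool would process buf here */
    if (n < 0) {
        close(fd);
        return -1;
    }
    /* offset 0, len 0 means the whole file. This is only advice: the
     * kernel may drop the file's cached data pages, while the dentry
     * and inode for 'path' stay cached. Failure is harmless, so the
     * return value is deliberately ignored. */
    (void)posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return total;
}
```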

	Ingo

* Re: -mm merge plans for 2.6.23
  2007-07-25 11:34                             ` Ingo Molnar
@ 2007-07-25 11:40                               ` Rene Herman
  2007-07-25 11:50                                 ` Ingo Molnar
  2007-07-25 16:08                               ` Valdis.Kletnieks
  2007-07-25 22:05                               ` Paul Jackson
  2 siblings, 1 reply; 484+ messages in thread
From: Rene Herman @ 2007-07-25 11:40 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Valdis.Kletnieks, david, Nick Piggin, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Paul Jackson, linux-mm, linux-kernel

On 07/25/2007 01:34 PM, Ingo Molnar wrote:

> and the fact is: updatedb discards a considerable portion of the cache 
> completely unnecessarily: on a reasonably complex box no way do all the 
> inodes and dentries fit into all of RAM, so we just trash everything.

Okay, but unless I've now managed to really quite horribly confuse myself, 
that wouldn't have anything to do with _swap_ prefetch would it?

Rene.

* Re: -mm merge plans for 2.6.23
  2007-07-25 11:40                               ` Rene Herman
@ 2007-07-25 11:50                                 ` Ingo Molnar
  0 siblings, 0 replies; 484+ messages in thread
From: Ingo Molnar @ 2007-07-25 11:50 UTC (permalink / raw)
  To: Rene Herman
  Cc: Valdis.Kletnieks, david, Nick Piggin, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Paul Jackson, linux-mm, linux-kernel


* Rene Herman <rene.herman@gmail.com> wrote:

> > and the fact is: updatedb discards a considerable portion of the 
> > cache completely unnecessarily: on a reasonably complex box no way 
> > do all the inodes and dentries fit into all of RAM, so we just trash 
> > everything.
> 
> Okay, but unless I've now managed to really quite horribly confuse 
> myself, that wouldn't have anything to do with _swap_ prefetch would 
> it?

it's connected: it would remove updatedb from the VM picture altogether. 
(updatedb would just cycle through the files while leaving minimal cache
disturbance.)

hence swap-prefetch could concentrate on the cases where it makes sense 
to start swap prefetching _without_ destroying other, already cached 
content: such as when a large app exits and frees gobs of memory back 
into the buddy allocator. _That_ would be a definitive "no costs and 
side-effects" point for swap-prefetch to kick in, and it would eliminate 
this pretty artificial (and unnecessary) 'desktop versus server' 
controversy and would turn it into a 'helps everyone' feature.

	Ingo

* Re: [ck] Re: -mm merge plans for 2.6.23
       [not found]                             ` <5c77e14b0707250353r48458316x5e6adde6dbce1fbd@mail.gmail.com>
  2007-07-25 11:06                               ` Nick Piggin
@ 2007-07-25 13:30                               ` Rene Herman
  2007-07-25 13:50                                 ` Ingo Molnar
  1 sibling, 1 reply; 484+ messages in thread
From: Rene Herman @ 2007-07-25 13:30 UTC (permalink / raw)
  To: Jos Poortvliet
  Cc: Ingo Molnar, david, Nick Piggin, Valdis.Kletnieks, Ray Lee,
	Jesper Juhl, linux-kernel, ck list, linux-mm, Paul Jackson,
	Andrew Morton

On 07/25/2007 12:53 PM, Jos Poortvliet wrote:

> On 7/25/07, Rene Herman <rene.herman@gmail.com> wrote:

>> Also note I'm not against swap prefetch or anything. I don't use it and
>> do not believe I have a pressing need for it, but do suspect it has 
>> potential to make quite a bit of difference on some things -- if only
>> to drastically reduce seeks if it means it's swapping in larger chunks
>> than a randomly faulting program would.
> 
> I wonder what your hardware is. Con talked about the diff in hardware 
> between most endusers and the kernel developers. 

I'm afraid you will need to categorize me more as an innocent bystander than 
a kernel developer and, as such, I have an endusery x86 with 768M (but I 
still think of myself as one of the cool kids!)

> Yes, swap prefetch doesn't help if you have 3 GB ram, but it DOES do a
> lot on a 256 mb laptop...

Rather, it does not help if you are not swapping or idling. Probably largely
because I'm a rather serial person, I never seem to push even my git tree out
of cache. Hence my belief that "I don't have a pressing need for it".

Taking a laptop as an example is interesting in itself by the way since a 
spun-down disk (most applicable to laptops) is an argument against swap
prefetch even when idle and when I'm not mistaken the feature actually 
disables itself when the machine's set to laptop mode...

> After using OO.o, the system continues to be slow for a long time. With
> swap prefetch, it's back up to speed much faster. Con has shown a benchmark
> for this with speedups of 10 times and more, users mentioned they liked
> it.

After using and quitting OO.o. If you simply don't have any free memory to
prefetch into, swap prefetch won't help any. The fact that it helps the case
of OO.o having pushed out firefox is fairly obvious.

> Nick has been talking about 'fixing the updatedb thing' for years now, no
> patch yet. Besides, he won't fix OO.o nor all other userspace stuff - so
> actually, he does NOT even promise an alternative. Not that I think
> fixing updatedb would be cool, btw - it sure would, but it's no reason
> not to include swap prefetch - it's mostly unrelated.

Well, the trouble at least to some is that they indeed seem to be rather 
unrelated. Why does the updatedb run even cause swapout? (Swapout itself is,
of course, a precondition for swap-prefetch to help.)

> I think everyone with >1 gb ram should stop saying 'I don't need it' 
> because that's obvious for that hardware. Just like ppl having a dual- 
> or quadcore shouldn't even talk about scheduler interactivity stuff...

Actually, interactivity is largely about latency and latency is largely or 
partly independent of CPU speed -- if something's keeping the system from 
scheduling for too long it's likely that it's hogging the CPU for a fixed 
number of usecs and those pass in the same amount of time on all CPUs (we 
hope...).

But that's a tangent anyway. I'm just glad that I get to say that I believe 
I don't need it with my 768M!

> Apparently, it didn't get in yet - and I find it hard to believe Andrew 
> holds swapprefetch for reasons like the above. So it must be something 
> else.
> 
> Nick is saying tests have already proven swap prefetch to be helpful,
> that's not the problem. He calls the requirements to get in 'fuzzy'. OK.
> Beer is fuzzy, do we need to offer beer to someone? If Andrew promises 
> to come to FOSDEM again next year, I'll offer him a beer, if that 
> helps... Anything else? A nice massage?

Personally I'd go for sexual favours directly (but then again, I always do).

But please also note that I even _literally_ said above that I myself am not 
against swap-prefetch or anything and yet I get what appears to be an at least
somewhat adversarial rant directed at me. Which in itself is fine, but not too
helpful...

Nick Piggin is the person to convince it seems and if I've read things right 
(I only stepped into this thing at the updatedb mention, so maybe I haven't) 
his main question is _why_ the hell it helps updatedb. As long as you don't 
know this, then even a solution that helps could be papering over a problem 
which you'd much rather fix at the root rather than at the symptom.

Rene.


* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25 13:30                               ` Rene Herman
@ 2007-07-25 13:50                                 ` Ingo Molnar
  2007-07-25 17:33                                   ` Satyam Sharma
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-25 13:50 UTC (permalink / raw)
  To: Rene Herman
  Cc: Jos Poortvliet, david, Nick Piggin, Valdis.Kletnieks, Ray Lee,
	Jesper Juhl, linux-kernel, ck list, linux-mm, Paul Jackson,
	Andrew Morton


* Rene Herman <rene.herman@gmail.com> wrote:

> Nick Piggin is the person to convince it seems and if I've read things 
> right (I only stepped into this thing at the updatedb mention, so 
> maybe I haven't) his main question is _why_ the hell it helps 
> updatedb. [...]

btw., i'd like to make this clear: if you want stuff to go upstream, do 
not concentrate on 'convincing the maintainer'.

Instead concentrate on understanding the _problem_, concentrate on 
making sure that both you and the maintainer understand the problem
correctly, possibly write some testcase that clearly exposes it, and 
help the maintainer debug the problem. _Optionally_, if you find joy in 
it, you are also free to write a proposed solution for that problem and 
submit it to the maintainer.

But a "here is a solution, take it or leave it" approach, before having 
communicated the problem to the maintainer and before having debugged 
the problem is the wrong way around. It might still work out fine if the 
solution is correct (especially if the patch is small and obvious), but 
if there are any non-trivial tradeoffs involved, or if a non-trivial amount
of code is involved, you might see your patch at the end of a really 
long (and constantly growing) waiting list of patches.

	Ingo

* Re: -mm merge plans for 2.6.23
  2007-07-25  4:46             ` david
  2007-07-25  8:00               ` Rene Herman
@ 2007-07-25 15:55               ` Ray Lee
  2007-07-25 20:16                 ` Al Boldi
  1 sibling, 1 reply; 484+ messages in thread
From: Ray Lee @ 2007-07-25 15:55 UTC (permalink / raw)
  To: david, Al Boldi
  Cc: Nick Piggin, Jesper Juhl, Andrew Morton, ck list, Ingo Molnar,
	Paul Jackson, linux-mm, linux-kernel

Hoo boy, lots of messages this morning.

(Al? I've added you to the CC: because of your swap-in vs swap-out
speed report from January. See below -- half-way down or so -- for
more details.)

On 7/24/07, david@lang.hm <david@lang.hm> wrote:
> On Tue, 24 Jul 2007, Ray Lee wrote:
>
> > On 7/23/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >>  Ray Lee wrote:
> >
> >>  Looking at your past email, you have a 1GB desktop system and your
> >>  overnight updatedb run is causing stuff to get swapped out such that
> >>  swap prefetch makes it significantly better. This is really
> >>  intriguing to me, and I would hope we can start by making this
> >>  particular workload "not suck" without swap prefetch (and hopefully
> >>  make it even better than it currently is with swap prefetch because
> >>  we'll try not to evict useful file backed pages as well).
> >
> > updatedb is an annoying case, because one would hope that there would
> > be a better way to deal with that highly specific workload. It's also
> > pretty stat dominant, which puts it roughly in the same category as a
> > git diff. (They differ in that updatedb does a lot of open()s and
> > getdents on directories, git merely does a ton of lstat()s instead.)
> >
> > Anyway, my point is that I worry that tuning for an unusual and
> > infrequent workload (which updatedb certainly is), is the wrong way to
> > go.
>
> updatedb pushing out program data may be able to be improved on with drop
> behind or similar.

Hmm, I thought drop-behind wasn't going to be able to target metadata?
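To make the "stat-dominant" point concrete: a minimal sketch of a metadata-only walk of the kind updatedb performs, using nftw(3) with FTW_PHYS so every entry is lstat()ed and symlinks are not followed. It reads directories and stats every entry but never opens regular files, so what it pulls into memory is mostly dentries, inodes and directory blocks rather than file data. It is an illustration, not updatedb's actual code, and the function names are hypothetical.

```c
#define _GNU_SOURCE
#include <ftw.h>
#include <sys/stat.h>

/* Counters updated by the nftw() callback (nftw offers no user
 * pointer, hence the file-scope state; not thread-safe). */
static long walk_entries, walk_dirs;

static int count_entry(const char *path, const struct stat *sb,
                       int type, struct FTW *ftwbuf)
{
    (void)path; (void)sb; (void)ftwbuf;  /* a real indexer would record path */
    walk_entries++;
    if (type == FTW_D)
        walk_dirs++;
    return 0;                            /* 0 = keep walking */
}

/* Hypothetical helper: walk 'root' without following symlinks
 * (FTW_PHYS, so entries are lstat()ed), keeping at most 32 directory
 * fds open. Returns the number of entries seen, including 'root'
 * itself, and stores the directory count in *dirs if non-NULL;
 * returns -1 if the walk fails. */
long metadata_walk(const char *root, long *dirs)
{
    walk_entries = walk_dirs = 0;
    if (nftw(root, count_entry, 32, FTW_PHYS) != 0)
        return -1;
    if (dirs)
        *dirs = walk_dirs;
    return walk_entries;
}
```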

> however another scenario that causes a similar problem is when a user is
> busy using one of the big memory hogs and then switches to another (think
> switching between openoffice and firefox)

Yes, and that was the core of my original report months ago. I'm
working for a while on one task, go to openoffice to view a report, or
gimp to tweak the colors on a photo before uploading it, and then go
back to my email and... and... and... there we go. The faults that
occur when I context switch are what's most annoying.

> >>  After that we can look at other problems that swap prefetch helps
> >>  with, or think of some ways to measure your "whole day" scenario.
> >>
> >>  So when/if you have time, I can cook up a list of things to monitor
> >>  and possibly a patch to add some instrumentation over this updatedb
> >>  run.
> >
> > That would be appreciated. Don't spend huge amounts of time on it,
> > okay? Point me the right direction, and we'll see how far I can run
> > with it.
>
> you could make a synthetic test by writing a memory hog that allocates 3/4
> of your ram then pauses waiting for input and then randomly accesses the
> memory for a while (say randomly accessing 2x # of pages allocated) and
> then pausing again before repeating

Con wrote a benchmark much like that. It showed measurable improvement
with swap prefetch.
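A minimal sketch of the synthetic test described above: allocate about 3/4 of physical RAM, fault it all in, then on each Enter touch twice as many randomly chosen pages as were allocated and report the elapsed time. It is not Con's benchmark or the program Rene posted. The 3/4 ratio and the 2x touch count come from the description; the xorshift generator, the helper names and the reliance on sysconf(_SC_PHYS_PAGES) (a glibc extension on Linux) are assumptions of this sketch.

```c
#define _GNU_SOURCE
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <time.h>
#include <unistd.h>

/* Small xorshift64 PRNG; the state must be non-zero. */
static uint64_t xorshift64(uint64_t *state)
{
    uint64_t x = *state;
    x ^= x << 13;
    x ^= x >> 7;
    x ^= x << 17;
    return *state = x;
}

/* Write one byte in each of 'touches' randomly chosen pages of buf,
 * faulting in any page that was swapped out. Returns elapsed seconds. */
double touch_random_pages(unsigned char *buf, size_t len, size_t page,
                          size_t touches, uint64_t seed)
{
    uint64_t state = seed ? seed : 0x9E3779B97F4A7C15ull;
    size_t npages = len / page;
    struct timespec t0, t1;

    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < touches && npages > 0; i++)
        buf[(xorshift64(&state) % npages) * page]++;
    clock_gettime(CLOCK_MONOTONIC, &t1);
    return (double)(t1.tv_sec - t0.tv_sec) +
           (double)(t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Hypothetical driver: grab ~3/4 of RAM, then time 'rounds' rounds of
 * random access, pausing for Enter before each one. */
int run_hog(int rounds)
{
    long pages = sysconf(_SC_PHYS_PAGES);
    long psize = sysconf(_SC_PAGESIZE);
    char line[64];

    if (pages <= 0 || psize <= 0)
        return -1;
    size_t page = (size_t)psize;
    size_t len = (size_t)pages / 4 * 3 * page;   /* ~3/4 of physical RAM */
    size_t touches = 2 * (len / page);           /* 2x pages allocated */
    unsigned char *buf = malloc(len);
    if (buf == NULL)
        return -1;
    memset(buf, 1, len);                         /* fault every page in once */

    for (int r = 0; r < rounds; r++) {
        printf("round %d: press Enter to start\n", r);
        fflush(stdout);
        if (fgets(line, sizeof line, stdin) == NULL)
            break;
        double secs = touch_random_pages(buf, len, page, touches,
                                         (uint64_t)r + 1);
        printf("round %d: %.3f s for %zu random page touches\n",
               r, secs, touches);
    }
    free(buf);
    return 0;
}
```

A main() that calls run_hog(5) is enough to drive it. Starting a second memory hog during a pause is what pushes the first one's pages out to swap; as noted elsewhere in the thread, swap prefetch can only make a difference once that second hog has exited and the machine has idled.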

> by the way, I've also seen comments on the Postgres performance mailing
> list about how slow linux is compared to other OS's in pulling data back
> in that's been pushed out to swap (not a factor on dedicated database
> machines, but a big factor on multi-purpose machines)

Yeah, akpm and... one of the usual suspects, had mentioned something
such as 2.6 is half the speed of 2.4 for swapin. (Let's see if I can
find a reference for that, it's been a year or more...) Okay,
misremembered. Swap in is half the speed of swap out (
http://lkml.org/lkml/2007/1/22/173 ). Al Boldi (added to the CC:, poor
sod), is the one who knows how to measure that, I'm guessing.

Al? How are you coming up with those figures? I'm interested in
reproducing it. It could be due to something stupid, such as the VM
faulting things out in reverse order or something...

> >>  Anyway, I realise swap prefetching has some situations where it will
> >>  fundamentally outperform even the page replacement oracle. This is
> >>  why I haven't asked for it to be dropped: it isn't a bad idea at all.
> >
> > <nod>
> >
> >>  However, if we can improve basic page reclaim where it is obviously
> >>  lacking, that is always preferable. eg: being a highly speculative
> >>  operation, swap prefetch is not great for power efficiency -- but we
> >>  still want laptop users to have a good experience as well, right?
> >
> > Absolutely. Disk I/O is the enemy, and the best I/O is one you never
> > had to do in the first place.
>
> almost always true, however there is some amount of I/O that is free with
> today's drives (remember, they read the entire track into ram and then
> give you the sectors on the track that you asked for). and if you have a
> raid array this is even more true.

Yeah, I knew I'd get called on that one :-). It's the seeks that'll
really kill you, and as you say once you're on the track the rest is
practically free (which is why the VM should prefer to evict larger
chunks at a time rather than lots of small things; see
http://lkml.org/lkml/2007/7/23/214 for something that's heading in the
right direction, though the side-effects are unfortunate).

> if you read one sector in from a raid5 array you have done all the same
> I/O that you would have to do to read in the entire stripe, but I don't
> believe that the current system will keep it all around if it exceeds the
> readahead limit.

Fengguang Wu is doing lots of active work on making the readahead suck
less. Ping him and he'll likely take an active interest in the RAID
stuff.

Ray

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:30                   ` Rene Herman
  2007-07-25  5:51                     ` david
  2007-07-25  7:14                     ` Valdis.Kletnieks
@ 2007-07-25 16:02                     ` Ray Lee
  2007-07-25 20:55                       ` Zan Lynx
  2007-07-26  1:15                       ` [ck] " Matthew Hawkins
  2 siblings, 2 replies; 484+ messages in thread
From: Ray Lee @ 2007-07-25 16:02 UTC (permalink / raw)
  To: Rene Herman
  Cc: david, Nick Piggin, Jesper Juhl, Andrew Morton, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 7/24/07, Rene Herman <rene.herman@gmail.com> wrote:
> Yes, but what's locate's usage scenario? I've never, ever wanted to use it.
> When do you know the name of something but not where it's located, other
> than situations which "which" wouldn't cover and after just having
> installed/unpacked something meaning locate doesn't know about it yet either?

I use it to find source files and documents all the time. One of my
work boxes has <runs a locate work | wc -l> ~38500 files and
directories under my source directory. And then there's the "I wrote
that tech doc two years ago, where was that. Hmm, what did I name it?
Bet it had 323 in the name, and doc in the path."

I'd just like updatedb to amortize its work better. If we had some way
to track all filesystem events, updatedb could keep a live and
accurate index on the filesystem. And this isn't just updatedb that
wants that, beagle and tracker et al also want to know filesystem
events so that they can index the documents themselves as well as the
metadata. And if they do it live, that spreads the cost out, including
the VM pressure.
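For a flavour of what a live indexer would consume, the existing inotify interface (in mainline since 2.6.13) already reports per-directory events. Below is a minimal sketch, assuming a non-blocking inotify descriptor with watches already added. `drain_events` and `event_cb` are hypothetical names, and a real indexer would also need to handle IN_Q_OVERFLOW by rescanning.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdint.h>
#include <sys/inotify.h>
#include <unistd.h>

/* Called once per event; a real indexer would update its database here. */
typedef void (*event_cb)(uint32_t mask, const char *name, void *ctx);

/*
 * Drain every pending event from a non-blocking inotify descriptor.
 * Returns the number of events handled, or -1 on a read error.
 */
int drain_events(int fd, event_cb cb, void *ctx)
{
    /* Aligned for struct inotify_event, as the inotify(7) example does. */
    char buf[4096] __attribute__((aligned(__alignof__(struct inotify_event))));
    int handled = 0;

    for (;;) {
        ssize_t len = read(fd, buf, sizeof(buf));

        if (len < 0) {
            if (errno == EINTR)
                continue;
            return errno == EAGAIN ? handled : -1;  /* EAGAIN: queue empty */
        }
        if (len == 0)
            return handled;
        /* One read() may return several variable-length records. */
        for (char *p = buf; p < buf + len;) {
            const struct inotify_event *ev = (const struct inotify_event *)p;

            cb(ev->mask, ev->len ? ev->name : "", ctx);
            handled++;
            p += sizeof(struct inotify_event) + ev->len;
        }
    }
}
```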

Ray

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25 11:34                             ` Ingo Molnar
  2007-07-25 11:40                               ` Rene Herman
@ 2007-07-25 16:08                               ` Valdis.Kletnieks
  2007-07-25 22:05                               ` Paul Jackson
  2 siblings, 0 replies; 484+ messages in thread
From: Valdis.Kletnieks @ 2007-07-25 16:08 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rene Herman, david, Nick Piggin, Ray Lee, Jesper Juhl,
	Andrew Morton, ck list, Paul Jackson, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 860 bytes --]

On Wed, 25 Jul 2007 13:34:01 +0200, Ingo Molnar said:

> Maybe the kernel could be extended with a method of opening files in a 
> 'drop from the dcache after use' way. (beagled and backup tools could 
> make use of that facility too.) (Or some other sort of 
> file-cache-invalidation syscall that already exist, which would _also_ 
> result in the immediate zapping of the dentry+inode from the dcache.)

The semantic that would benefit my work patterns the most would not be
"immediate zapping" - I have 2G of RAM, so often there's no memory pressure,
and often a 'find' will be followed by another similar 'find' that will hit a
lot of the same dentries and inodes, so may as well save them if we can.
Flagging it as "the first to be heaved over the side the instant there *is*
pressure" would suit just fine.

Or is that the semantic you actually meant?
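For file data (not dentries or inodes), the existing call along these lines is posix_fadvise() with POSIX_FADV_DONTNEED, which backup-style tools can issue after reading a file. The sketch below uses a hypothetical helper, `read_then_drop`. Note that it asks the kernel to drop the clean cached pages right away (best effort), which is the "immediate zapping" semantic rather than "first to go under pressure", and that it leaves the dcache and icache untouched.

```c
#define _POSIX_C_SOURCE 200809L
#include <fcntl.h>
#include <unistd.h>

/*
 * Read a file to the end, then advise the kernel that its cached data
 * pages will not be needed again. Returns bytes read, or -1 on error.
 * Only page cache for the file's data is affected; the dentry and inode
 * stay cached.
 */
long read_then_drop(const char *path)
{
    char buf[65536];
    long total = 0;
    ssize_t n;
    int fd = open(path, O_RDONLY);

    if (fd < 0)
        return -1;
    while ((n = read(fd, buf, sizeof(buf))) > 0)
        total += n;
    /* offset 0, len 0 means "through end of file"; returns an error number, not -1 */
    if (n < 0 || posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED) != 0) {
        close(fd);
        return -1;
    }
    close(fd);
    return total;
}
```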


[-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --]

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:37                   ` Nick Piggin
                                       ` (2 preceding siblings ...)
  2007-07-25  6:44                     ` Eric St-Laurent
@ 2007-07-25 16:09                     ` Ray Lee
  2007-07-26  4:57                       ` Andrew Morton
  2007-07-25 17:55                     ` Frank A. Kingswood
  4 siblings, 1 reply; 484+ messages in thread
From: Ray Lee @ 2007-07-25 16:09 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Eric St-Laurent, Rene Herman, Jesper Juhl, Andrew Morton,
	ck list, Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

Hey Eric,

On 7/24/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Eric St-Laurent wrote:
> > On Wed, 2007-25-07 at 06:55 +0200, Rene Herman wrote:
> >
> >
> >>It certainly doesn't run for me ever. Always kind of a "that's not the
> >>point" comment but I just keep wondering whenever I see anyone complain
> >>about updatedb why the _hell_ they are running it in the first place. If
> >>anyone who never uses "locate" for anything simply disable updatedb, the
> >>problem will for a large part be solved.
> >>
> >>This not just meant as a cheap comment; while I can think of a few similar
> >>loads even on the desktop (scanning a browser cache, a media player indexing
> >>a large amount of media files, ...) I've never heard of problems _other_
> >>than updatedb. So just junk that crap and be happy.
> >
> >
> > From my POV there's two different problems discussed recently:
> >
> > - updatedb type of workloads that add tons of inodes and dentries in the
> > slab caches which of course use the pagecache.
> >
> > - streaming large files (read or copying) that fill the pagecache with
> > useless used-once data

No, there's a third case which I find the most annoying. I have
multiple working sets, the sum of which won't fit into RAM. By the
time I finish with one, the kernel has had time to preemptively swap
the other back in, and yet it hasn't. So I sit around, twiddling my
thumbs, waiting for my music player to come back to life, or
thunderbird, or...

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  4:06             ` Nick Piggin
  2007-07-25  4:55               ` Rene Herman
  2007-07-25  6:09               ` [ck] " Matthew Hawkins
@ 2007-07-25 16:19               ` Ray Lee
  2007-07-25 20:46               ` Andi Kleen
  2007-07-31 16:37               ` [ck] Re: -mm merge plans for 2.6.23 Matthew Hawkins
  4 siblings, 0 replies; 484+ messages in thread
From: Ray Lee @ 2007-07-25 16:19 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Jesper Juhl, Andrew Morton, ck list, Ingo Molnar, Paul Jackson,
	linux-mm, linux-kernel

On 7/24/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> Ray Lee wrote:
> > On 7/23/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> >> If we can first try looking at
> >> some specific problems that are easily identified.
> >
> > Always easier, true. Let's start with "My mouse jerks around under
> > memory load." A Google Summer of Code student working on X.Org claims
> > that mlocking the mouse handling routines gives a smooth cursor under
> > load ([1]). It's surprising that the kernel would swap that out in the
> > first place.
> >
> > [1]
> > http://vignatti.wordpress.com/2007/07/06/xorg-input-thread-summary-or-something/
>
> OK, I'm not sure what the point is though. Under heavy memory load,
> things are going to get swapped out... and swap prefetch isn't going
> to help there (at least, not during the memory load).

Sorry, I headed slightly off-topic. Or perhaps 'up-topic' to the
larger issue, which is that the desktop experience has some suckiness
to it.

My point is that the page replacement algorithm has some choice as to
what to evict. The xorg input handler never should have been evicted.
It was meant as a concrete example of where the current page
replacement policy falls flat on its face.

All that said, this could really easily be handled by xorg mlocking
the critical realtime stuff.

> There are also other issues like whether the CPU scheduler is at fault,
> etc. Interactive workloads are always the hardest to work out.

This one is not a scheduler issue, as mlock()ing the mouse handling
routines gives a smooth cursor. It's just a pure page replacement
problem, as the kernel should never have swapped that out in the first
place.
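A minimal sketch of the mlock() approach, under the assumption that the latency-critical state lives in a dedicated buffer. `alloc_locked` and `free_locked` are hypothetical helpers, not anything in X.Org. Pinning the handler's code as well would mean locking the pages covering its text, or simply calling mlockall(MCL_CURRENT | MCL_FUTURE). Unprivileged processes are limited by RLIMIT_MEMLOCK, so the call can fail.

```c
#define _POSIX_C_SOURCE 200112L
#include <errno.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Allocate a page-aligned buffer for latency-critical state (for example,
 * a hypothetical input-event ring) and pin it in RAM so it cannot be
 * swapped out. Returns NULL with errno set on failure.
 */
void *alloc_locked(size_t size)
{
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;

    if (page <= 0)
        page = 4096;
    if (posix_memalign(&buf, (size_t)page, size) != 0) {
        errno = ENOMEM;
        return NULL;
    }
    /* May fail with EPERM or ENOMEM if RLIMIT_MEMLOCK is too small. */
    if (mlock(buf, size) != 0) {
        int saved = errno;
        free(buf);
        errno = saved;
        return NULL;
    }
    return buf;
}

void free_locked(void *buf, size_t size)
{
    munlock(buf, size);
    free(buf);
}
```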

<snip things I agreed with>

<snip list of things to watch during updatedb run>

Ray

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25 13:50                                 ` Ingo Molnar
@ 2007-07-25 17:33                                   ` Satyam Sharma
  2007-07-25 20:35                                     ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Satyam Sharma @ 2007-07-25 17:33 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Rene Herman, Jos Poortvliet, david, Nick Piggin,
	Valdis.Kletnieks, Ray Lee, Jesper Juhl, linux-kernel, ck list,
	linux-mm, Paul Jackson, Andrew Morton

Hi Ingo,

[ Going off-topic, nothing related to swap/prefetch/etc. Just getting
the hang of how development goes on here ... ]

On 7/25/07, Ingo Molnar <mingo@elte.hu> wrote:
>
> * Rene Herman <rene.herman@gmail.com> wrote:
>
> > Nick Piggin is the person to convince it seems and if I've read things
> > right (I only stepped into this thing at the updatedb mention, so
> > maybe I haven't) his main question is _why_ the hell it helps
> > updatedb. [...]
>
> btw., i'd like to make this clear: if you want stuff to go upstream, do
> not concentrate on 'convincing the maintainer'.

It's not so easy or clear-cut, see below.

> Instead concentrate on understanding the _problem_,

Of course -- that's a given.

> concentrate on
> making sure that both you and the maintainer understands the problem
> correctly,

This itself may require some "convincing" to do. What if the maintainer
just doesn't recognize the problem? Note that the development model
here is more about the "social" thing than purely a "technical" thing.
People do handwave, possibly due to innocent misunderstandings,
possibly not. Often it's just a case of seeing different reasons behind
the "problematic behaviour". Or it could be a case of all of the above.

> possibly write some testcase that clearly exposes it, and

Oh yes -- that'll be helpful, but not necessarily a prerequisite
for all issues, and then you can't even expect everybody to write or
test/benchmark with testcases. (oh, btw, this is assuming you do find
consensus on a testcase)

> help the maintainer debug the problem.

Umm ... well. Should this "dance-with-the-maintainer" and all be really
necessary? What you're saying is easy if a "bug" is simple and objective,
with mathematically few (probably just one) possible correct solutions.
Often (most often, in fact) it's a subjective issue -- could be about APIs,
high level design, tradeoffs, even little implementation nits ... with one
person wanting to do it one way, another thinks there's something hacky
or "band-aidy" about it and a more beautiful/elegant solution exists elsewhere.
I think there's a similar deadlock here (?)

> _Optionally_, if you find joy in
> it, you are also free to write a proposed solution for that problem

Oh yes. But why "optionally"? This is *precisely* what the spirit of
development in such open / distributed projects is ... unless Linux
wants to die the same, slow, ivory-towered, miserable death that
*BSD have.

> and
> submit it to the maintainer.

Umm, ok ... pretty unlikely Linus or Andrew would take patches for any
kernel subsystem (that isn't obvious/trivial) from anybody just like that,
so you do need to Cc: the ones they trust (maintainer) to ensure they
review/ack your work and pick it up.

> But a "here is a solution, take it or leave it" approach,

Agreed. That's definitely not the way to go.

> before having
> communicated the problem to the maintainer

Umm, well, this could vary from problem to problem.

> and before having debugged
> the problem

Again, agreed -- but people can plausibly see different root causes for
the same symptoms -- and different solutions.

> is the wrong way around. It might still work out fine if the
> solution is correct (especially if the patch is small and obvious), but
> if there are any non-trivial tradeoffs involved, or if nontrivial amount
> of code is involved, you might see your patch at the end of a really
> long (and constantly growing) waiting list of patches.

That's the whole point. For non-trivial / non-obvious / subjective issues,
the "process" you laid out above could itself become a problem ...

Satyam

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  5:37                   ` Nick Piggin
                                       ` (3 preceding siblings ...)
  2007-07-25 16:09                     ` Ray Lee
@ 2007-07-25 17:55                     ` Frank A. Kingswood
  4 siblings, 0 replies; 484+ messages in thread
From: Frank A. Kingswood @ 2007-07-25 17:55 UTC (permalink / raw)
  To: linux-kernel; +Cc: ck, linux-mm

Nick Piggin wrote:
> OK, this is where I start to worry. Swap prefetch AFAIKS doesn't fix
> the updatedb problem very well, because if updatedb has caused swapout
> then it has filled memory, and swap prefetch doesn't run unless there
> is free memory (not to mention that updatedb would have paged out other
> files as well).

It is *not* about updatedb. That is just a trivial case which people 
notice. Therefore fixing updatedb to be nicer, as was discussed at 
various points in this thread, is *not* the solution.
Most users are also *not*at*all* interested in kernel builds as a metric 
of system performance.

When I'm at work, I run a large, commercial, engineering application. 
While running, it takes most of the system memory (4GB and up), and it 
reads and writes very large files. Swap prefetch noticeably helps my 
desktop too. Can I measure it? Not sure. Can people on lkml fix the 
application? Certainly not.

Frank


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25 15:55               ` Ray Lee
@ 2007-07-25 20:16                 ` Al Boldi
  2007-07-27  0:28                   ` Magnus Naeslund
  0 siblings, 1 reply; 484+ messages in thread
From: Al Boldi @ 2007-07-25 20:16 UTC (permalink / raw)
  To: Ray Lee, david
  Cc: Nick Piggin, Jesper Juhl, Andrew Morton, ck list, Ingo Molnar,
	Paul Jackson, linux-mm, linux-kernel

Ray Lee wrote:
> On 7/24/07, david@lang.hm <david@lang.hm> wrote:
> > by the way, I've also seen comments on the Postgres performance mailing
> > list about how slow linux is compared to other OS's in pulling data back
> > in that's been pushed out to swap (not a factor on dedicated database
> > machines, but a big factor on multi-purpose machines)
>
> Yeah, akpm and... one of the usual suspects, had mentioned something
> such as 2.6 is half the speed of 2.4 for swapin. (Let's see if I can
> find a reference for that, it's been a year or more...) Okay,
> misremembered. Swap in is half the speed of swap out (
> http://lkml.org/lkml/2007/1/22/173 ). Al Boldi (added to the CC:, poor
> sod), is the one who knows how to measure that, I'm guessing.
>
> Al? How are you coming up with those figures? I'm interested in
> reproducing it. It could be due to something stupid, such as the VM
> faulting things out in reverse order or something...

Thanks for asking.  I'm rather surprised that nobody's noticing any of this 
slowdown.  To be fair, it's not really a regression; on the contrary, 2.4 is 
a lot worse wrt swapin and swapout, and Rik van Riel even considers a 50% 
swapin slowdown relative to swapout to be better than expected (see thread 
'[RFC] kswapd: Kernel Swapper performance').  He probably meant random 
swapin, which seems to show a 4x slowdown.

There are two ways to reproduce this:

1. swsusp to disk reports ~44MB/s swapout and ~25MB/s swapin during resume

2. tmpfs swapout is superfast, whereas swapin is really slow
(see thread '[PATCH] free swap space when (re)activating page')

Here is an excerpt from that thread (note machine config in first line):

============================================
 RAM 512mb , SWAP 1G
 #mount -t tmpfs -o size=1G none /dev/shm
 #time cat /dev/full > /dev/shm/x.dmp
 15sec
 #time cat /dev/shm/x.dmp > /dev/null
 58sec
 #time cat /dev/shm/x.dmp > /dev/null
 72sec
 #time cat /dev/shm/x.dmp > /dev/null
 85sec
 #time cat /dev/shm/x.dmp > /dev/null
 93sec
 #time cat /dev/shm/x.dmp > /dev/null
 99sec
============================================

As you can see, swapout runs at full wirespeed, whereas swapin is not only 
~4x slower, but increasingly tangles up the VM, ending at a ~6x slowdown.
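The ratios follow directly from the timings in the excerpt; this trivial sketch just recomputes them (the variable and function names are illustrative):

```c
#include <stddef.h>

/* Wall-clock seconds from the tmpfs excerpt: one write, then five read-backs. */
const double write_secs = 15.0;
const double read_secs[] = { 58.0, 72.0, 85.0, 93.0, 99.0 };
const size_t n_reads = sizeof(read_secs) / sizeof(read_secs[0]);

/* How many times slower read-back i (swapin) was than the write (swapout). */
double slowdown(size_t i)
{
    return read_secs[i] / write_secs;
}
```

slowdown(0) is about 3.9 (the "4x") and slowdown(4) is 6.6 (the "~6x"). The swsusp figures give 44/25, roughly 1.8x.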

So again, I'm really surprised people haven't noticed.


Thanks!

--
Al


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25 17:33                                   ` Satyam Sharma
@ 2007-07-25 20:35                                     ` Ingo Molnar
  2007-07-26  2:32                                       ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-25 20:35 UTC (permalink / raw)
  To: Satyam Sharma
  Cc: Rene Herman, Jos Poortvliet, david, Nick Piggin,
	Valdis.Kletnieks, Ray Lee, Jesper Juhl, linux-kernel, ck list,
	linux-mm, Paul Jackson, Andrew Morton


* Satyam Sharma <satyam.sharma@gmail.com> wrote:

> > concentrate on making sure that both you and the maintainer 
> > understand the problem correctly,
> 
> This itself may require some "convincing" to do. What if the 
> maintainer just doesn't recognize the problem? Note that the 
> development model here is more about the "social" thing than purely a 
> "technical" thing. People do handwave, possibly due to innocent 
> misunderstandings, possibly not. Often it's just a case of seeing 
> different reasons behind the "problematic behaviour". Or it could be a 
> case of all of the above.

sure - but i was not really talking from the user's perspective, 
but from the enterprising kernel developer's perspective who'd like to 
solve a particular problem. And the nice thing about concentrating on 
the problem: if you do that well, it does not really matter what the 
maintainer thinks!

( Talking to the maintainer can of course be of enormous help in the 
  quest for understanding the problem and figuring out the best fix - 
  the maintainer will most likely know more about the subject than 
  yourself. More communication never hurts. It's an additional bonus if 
  you manage to convince the maintainer to take up the matter for 
  himself. It's not a given right though - a maintainer's main task is 
  to judge code that is being submitted, to keep a subsystem running
  smoothly and to not let it regress - but otherwise there can easily be
  different priorities of what tasks to tackle first, and in that sense 
  the maintainer is just one of the many overworked kernel developers 
  who has no real obligation as to what to tackle first. )

If the maintainer rejects something despite it being well-reasoned, 
well-researched and robustly implemented with no tradeoffs or 
maintenance problems at all, then it's a bad maintainer. (but from all 
i've seen in the past few years the VM maintainers do their job pretty 
damn fine.) And note that i _do_ disagree with them in this particular 
swap-prefetch case, but still, the non-merging of swap-prefetch was not 
a final decision at all. It was more of a "hm, dunno, i still dont 
really like it - shouldnt this be done differently? Could we debug this 
a bit better?" reaction. Yes, it can be frustrating after more than one 
year.

> > possibly write some testcase that clearly exposes it, and
> 
> Oh yes -- that'll be helpful, but not necessarily a 
> prerequisite for all issues, and then you can't even expect everybody 
> to write or test/benchmark with testcases. (oh, btw, this is assuming 
> you do find consensus on a testcase)

no, but Con is/was certainly more than capable of writing testcases and 
debugging various scenarios. That's how new maintainers are found 
within Linux: people take matters in their own hands and improve a 
subsystem so that they'll either peacefully co-work with the other 
maintainers or they replace them (sometimes not so peacefully - like in 
the IDE/SATA/PATA saga).

> > help the maintainer debug the problem.
> 
> Umm ... well. Should this "dance-with-the-maintainer" and all be 
> really necessary? What you're saying is easy if a "bug" is simple and 
> objective, with mathematically few (probably just one) possible 
> correct solutions. Often (most often, in fact) it's a subjective issue 
> -- could be about APIs, high level design, tradeoffs, even little 
> implementation nits ... with one person wanting to do it one way, 
> another thinks there's something hacky or "band-aidy" about it and a 
> more beautiful/elegant solution exists elsewhere. I think there's a 
> similar deadlock here (?)

you dont _have to_ cooperate with the maintainer, but it's certainly 
useful to work with good maintainers, if your goal is to improve Linux. 
Or if for some reason communication is not working out fine then grow 
into the job and replace the maintainer by doing a better job.

> > _Optionally_, if you find joy in it, you are also free to write a 
> > proposed solution for that problem
> 
> Oh yes. But why "optionally"? This is *precisely* what the spirit of 
> development in such open / distributed projects is ... unless Linux 
> wants to die the same, slow, ivory-towered, miserable death that *BSD 
> have.

perhaps you misunderstood how i meant the 'optional': it is certainly 
not required to write a solution for every problem you are reporting. 
Best-case the maintainer picks the issue up and solves it. Worst-case 
you get ignored. But you always have the option to take matters into 
your own hands and solve the problem.

> > and submit it to the maintainer.
> 
> Umm, ok ... pretty unlikely Linus or Andrew would take patches for any 
> kernel subsystem (that isn't obvious/trivial) from anybody just like 
> that, so you do need to Cc: the ones they trust (maintainer) to ensure 
> they review/ack your work and pick it up.

actually, it happens pretty frequently, and NACK-ing perfectly 
reasonable patches is a sure way towards getting replaced as a 
maintainer.

> > is the wrong way around. It might still work out fine if the 
> > solution is correct (especially if the patch is small and obvious), 
> > but if there are any non-trivial tradeoffs involved, or if 
> > nontrivial amount of code is involved, you might see your patch at 
> > the end of a really long (and constantly growing) waiting list of 
> > patches.
> 
> That's the whole point. For non-trivial / non-obvious / subjective 
> issues, the "process" you laid out above could itself become a problem 
> ...

firstly, there's rarely any 'subjective' issue in maintenance 
decisions, even when it comes to complex patches. The 'subjective' issue 
becomes a factor mostly when a problem has not been researched well 
enough, when it becomes more of a faith thing ('i believe it helps me') 
than a fully fact-backed argument. Maintainers tend to dodge such issues 
until they become more clearly fact-backed.

providing more and more facts gradually reduces the 'judgement/taste' 
leeway of maintainers, down to an almost algorithmic level.

but in any case there's always the ultimate way out: prove that you can 
do a better job yourself and replace the maintainer. But providing an 
overwhelming, irresistible body of facts in favor of a patch does the 
trick too in 99.9% of the cases.

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25  4:06             ` Nick Piggin
                                 ` (2 preceding siblings ...)
  2007-07-25 16:19               ` Ray Lee
@ 2007-07-25 20:46               ` Andi Kleen
  2007-07-26  8:38                 ` Frank Kingswood
  2007-07-31 16:37               ` [ck] Re: -mm merge plans for 2.6.23 Matthew Hawkins
  4 siblings, 1 reply; 484+ messages in thread
From: Andi Kleen @ 2007-07-25 20:46 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ray Lee, Jesper Juhl, Andrew Morton, ck list, Ingo Molnar,
	Paul Jackson, linux-mm, linux-kernel

Nick Piggin <nickpiggin@yahoo.com.au> writes:

> Ray Lee wrote:
> > On 7/23/07, Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> >> Also a random day at the desktop, it is quite a broad scope and
> >> pretty well impossible to analyse.
> > It is pretty broad, but that's also what swap prefetch is targeting.
> > As for hard to analyze, I'm not sure I agree. One can black-box test
> > this stuff with only a few controls. e.g., if I use the same apps each
> > day (mercurial, firefox, xorg, gcc), and the total I/O wait time
> > consistently goes down on a swap prefetch kernel (normalized by some
> > control statistic, such as application CPU time or total I/O, or
> > something), then that's a useful measurement.
> 
> I'm not saying that we can't try to tackle that problem, but first of
> all you have a really nice narrow problem where updatedb seems to be
> causing the kernel to completely do the wrong thing. So we start on
> that.

One simple way to fix this would be to implement a fadvise() flag
that puts the dentry/inode on a "soon to be expired" list if there
are no other references. Then if a dentry allocation needs more
memory, try to reuse dentries from that list (or, better, a queue) first. Any other
access will remove the dentry from the list. 
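To make the proposed policy concrete, here is a userspace model of the queueing logic only; it is not kernel code. All names are hypothetical, `cache_hint_expire` stands in for the proposed (not yet existing) fadvise() flag, and a plain LRU stands in for real dcache reclaim. Nothing is freed until reclaim actually runs, which also matches the "first over the side when there *is* pressure" semantic asked for elsewhere in the thread.

```c
#include <stddef.h>

/* Minimal intrusive doubly linked list, in the spirit of the kernel's list_head. */
struct node {
    struct node *prev, *next;
};

static void list_init(struct node *h) { h->prev = h->next = h; }

static void list_add_between(struct node *n, struct node *prev, struct node *next)
{
    next->prev = n;
    n->next = next;
    n->prev = prev;
    prev->next = n;
}

static void list_add_head(struct node *h, struct node *n) { list_add_between(n, h, h->next); }
static void list_add_tail(struct node *h, struct node *n) { list_add_between(n, h->prev, h); }

/* Unlink n and leave it self-linked, so deleting it twice is harmless. */
static void list_del(struct node *n)
{
    n->prev->next = n->next;
    n->next->prev = n->prev;
    list_init(n);
}

/* Hypothetical cache entry standing in for an unused dentry/inode pair. */
struct entry {
    struct node link;  /* on either the lru list or the expire queue */
    int refcount;      /* entries with users are never reclaimed */
    int id;
};

struct cache {
    struct node lru;     /* head = most recently used, tail = coldest */
    struct node expire;  /* "soon to be expired" FIFO, reclaimed first */
};

static struct entry *to_entry(struct node *n)
{
    return (struct entry *)((char *)n - offsetof(struct entry, link));
}

void cache_init(struct cache *c)
{
    list_init(&c->lru);
    list_init(&c->expire);
}

void cache_insert(struct cache *c, struct entry *e)
{
    list_init(&e->link);
    list_add_head(&c->lru, &e->link);
}

/* Any ordinary access takes the entry off the expire queue (if it was on it). */
void cache_access(struct cache *c, struct entry *e)
{
    list_del(&e->link);
    list_add_head(&c->lru, &e->link);
}

/* Stand-in for the proposed fadvise() hint: queue for early reuse if unused. */
void cache_hint_expire(struct cache *c, struct entry *e)
{
    if (e->refcount > 0)
        return;
    list_del(&e->link);
    list_add_tail(&c->expire, &e->link);
}

/* Reclaim one unreferenced entry: oldest hinted entry first, then coldest LRU. */
struct entry *cache_reclaim(struct cache *c)
{
    struct node *n;

    for (n = c->expire.next; n != &c->expire; n = n->next)
        if (to_entry(n)->refcount == 0)
            goto found;
    for (n = c->lru.prev; n != &c->lru; n = n->prev)
        if (to_entry(n)->refcount == 0)
            goto found;
    return NULL;
found:
    list_del(n);
    return to_entry(n);
}
```

In this model a hinted entry is reused before even the coldest ordinary entry, and touching it again puts it back on the normal LRU.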

Disadvantage would be that the userland would need to be patched,
but I guess it's better than adding very dubious heuristics to the
kernel.

Similar thing could be done for directory buffers although they
are probably less of a problem.

I expect that C. Lameter's directed dentry/inode freeing in SLUB will also
make a big difference. People who have problems with updatedb should
definitely try -mm, which I believe has it, with SLUB enabled.

-Andi (who always thought swap prefetch was just a workaround, not
a real solution) 

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25 16:02                     ` Ray Lee
@ 2007-07-25 20:55                       ` Zan Lynx
  2007-07-25 21:28                         ` Ray Lee
  2007-07-26  1:15                       ` [ck] " Matthew Hawkins
  1 sibling, 1 reply; 484+ messages in thread
From: Zan Lynx @ 2007-07-25 20:55 UTC (permalink / raw)
  To: Ray Lee
  Cc: Rene Herman, david, Nick Piggin, Jesper Juhl, Andrew Morton,
	ck list, Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 854 bytes --]

On Wed, 2007-07-25 at 09:02 -0700, Ray Lee wrote:

> I'd just like updatedb to amortize its work better. If we had some way
> to track all filesystem events, updatedb could keep a live and
> accurate index on the filesystem. And this isn't just updatedb that
> wants that, beagle and tracker et al also want to know filesystem
> events so that they can index the documents themselves as well as the
> metadata. And if they do it live, that spreads the cost out, including
> the VM pressure.

That would be nice.  It'd be great if there was a per-filesystem inotify
mode.  I can't help but think it'd be more efficient than recursing
every directory and adding a watch.
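The per-directory cost is easy to see in a sketch of the recursive approach: one inotify watch per directory, added by walking the tree with nftw(). `watch_tree` is a hypothetical helper. A real tool must also add watches for directories created during or after the walk, and each watch counts against /proc/sys/fs/inotify/max_user_watches.

```c
#define _XOPEN_SOURCE 700
#include <ftw.h>
#include <sys/inotify.h>
#include <sys/stat.h>

/* nftw() passes no user-data pointer, so the fd and count live in globals. */
static int watch_fd = -1;
static int watch_count;

static int add_one(const char *path, const struct stat *sb, int type, struct FTW *ftw)
{
    (void)sb;
    (void)ftw;
    if (type != FTW_D)
        return 0;  /* files are covered by their parent directory's watch */
    if (inotify_add_watch(watch_fd, path, IN_CREATE | IN_DELETE | IN_MOVE) < 0)
        return -1; /* e.g. ENOSPC once max_user_watches is exhausted */
    watch_count++;
    return 0;
}

/*
 * Add one watch per directory under `root`. Returns the number of watches
 * added, or -1 if the walk or any inotify_add_watch() call failed.
 */
int watch_tree(int fd, const char *root)
{
    watch_fd = fd;
    watch_count = 0;
    if (nftw(root, add_one, 32, FTW_PHYS) != 0)
        return -1;
    return watch_count;
}
```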

Or maybe a netlink thing that could buffer events since filesystem mount
until a daemon could get around to starting, so none were lost.
-- 
Zan Lynx <zlynx@acm.org>

[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25 20:55                       ` Zan Lynx
@ 2007-07-25 21:28                         ` Ray Lee
  0 siblings, 0 replies; 484+ messages in thread
From: Ray Lee @ 2007-07-25 21:28 UTC (permalink / raw)
  To: Zan Lynx
  Cc: Rene Herman, david, Nick Piggin, Jesper Juhl, Andrew Morton,
	ck list, Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 7/25/07, Zan Lynx <zlynx@acm.org> wrote:
> On Wed, 2007-07-25 at 09:02 -0700, Ray Lee wrote:
>
> > I'd just like updatedb to amortize its work better. If we had some way
> > to track all filesystem events, updatedb could keep a live and
> > accurate index on the filesystem. And this isn't just updatedb that
> > wants that, beagle and tracker et al also want to know filesystem
> > events so that they can index the documents themselves as well as the
> > metadata. And if they do it live, that spreads the cost out, including
> > the VM pressure.
>
> That would be nice.  It'd be great if there was a per-filesystem inotify
> mode.  I can't help but think it'd be more efficient than recursing
> every directory and adding a watch.
>
> Or maybe a netlink thing that could buffer events since filesystem mount
> until a daemon could get around to starting, so none were lost.

See "Filesystem Event Reporter" by Yi Yang, which does pretty much
exactly that: http://lkml.org/lkml/2006/9/30/98 . The author had things
to update but, as far as I can tell, never resubmitted it.

Ray

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25 11:34                             ` Ingo Molnar
  2007-07-25 11:40                               ` Rene Herman
  2007-07-25 16:08                               ` Valdis.Kletnieks
@ 2007-07-25 22:05                               ` Paul Jackson
  2007-07-25 22:22                                 ` Zan Lynx
                                                   ` (3 more replies)
  2 siblings, 4 replies; 484+ messages in thread
From: Paul Jackson @ 2007-07-25 22:05 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: rene.herman, Valdis.Kletnieks, david, nickpiggin, ray-lk,
	jesper.juhl, akpm, ck, linux-mm, linux-kernel

> and the fact is: updatedb discards a considerable portion of the cache 
> completely unnecessarily: on a reasonably complex box no way do all the 

I'm wondering how much of this updatedb problem is due to poor layout
of swap and other file systems across disk spindles.

I'll wager that those most impacted by updatedb have just one disk.

I have the following three boxes - three different setups, each with
different updatedb behaviour:

    The first box, with 1 GB ram, becomes dog slow as soon as it
    breathes on the swap device.  Updatedb and backups are painful
    intrusions on any interactive work on that system.  I sometimes
    wait a half minute for a response from an interactive application
    anytime it has to go to disk.  This box has a single disk spindle,
    on an old cheap slow disk, with swap on the opposite end of the
    disk from root and the main usr partition.  It's a worst case
    disk seek test device.

    The second box, also with 1 GB ram, has multiple disk spindles,
    and swap on its own spindle.  I can still notice updatedb and
    backup, but it's far far less painful.

    The third box has dual CPU cores and 4 GB ram.  Updatedb runs
    over the entire system in perhaps 30 seconds with no perceptible
    impact at all on interactive uses.  Everything is still in memory
    from the previous updatedb run; the disk is just used to write
    out new stuff.  Swap is never used on this (sweet) rig.

I'd think that prefetch would help in the single disk spindle
configuration, because it does the swap accesses separately, instead
of intermingling them with root or usr partition accesses, which
would require a lot of disk head seeking.
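A crude way to see the seek argument is to count total head travel for the same requests served in two different orders. The track numbers are hypothetical (root near track 10, swap near track 1000 at the opposite end of the disk), and the model ignores rotational latency and any reordering by the I/O scheduler. `head_travel` is an illustrative helper.

```c
#include <stddef.h>
#include <stdlib.h>

/*
 * Total head travel, in arbitrary track units, for servicing `n` requests
 * in the given order starting from track `start`. A crude proxy for seek cost.
 */
long head_travel(long start, const long *tracks, size_t n)
{
    long pos = start, total = 0;

    for (size_t i = 0; i < n; i++) {
        total += labs(tracks[i] - pos);
        pos = tracks[i];
    }
    return total;
}
```

Starting from track 0, the interleaved order {10, 1000, 11, 1001, 12, 1002} costs 4958 units of travel, while the batched order {10, 11, 12, 1000, 1001, 1002} costs 1002, roughly a fifth.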

Pretty much anytime that ordinary desktop users complain about
performance as much as they have about this one, it's either disk
head seeks or network delays.  Nothing else is -that- slow, to be so
noticeable to so many users just doing ordinary work.

Question:
  Could those who have found that this prefetch helps them a lot say how
  many disks they have?  In particular, is their swap on the same
  disk spindle as their root and user files?

Answer - for me:
  On my system where updatedb is a big problem, I have one, slow, disk.
  On my system where updatedb is a small problem, swap is on a separate
    spindle.
  On my system where updatedb is -no- problem, I have so much memory
    I never use swap.

I'd expect the laptop crowd to mostly have a single, slow, disk, and
hence to find updatedb more painful.

-- 
                  I won't rest till it's the best ...
                  Programmer, Linux Scalability
                  Paul Jackson <pj@sgi.com> 1.925.600.0401

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25 22:05                               ` Paul Jackson
@ 2007-07-25 22:22                                 ` Zan Lynx
  2007-07-25 22:27                                 ` Jesper Juhl
                                                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 484+ messages in thread
From: Zan Lynx @ 2007-07-25 22:22 UTC (permalink / raw)
  To: Paul Jackson
  Cc: Ingo Molnar, rene.herman, Valdis.Kletnieks, david, nickpiggin,
	ray-lk, jesper.juhl, akpm, ck, linux-mm, linux-kernel

[-- Attachment #1: Type: text/plain, Size: 953 bytes --]

On Wed, 2007-07-25 at 15:05 -0700, Paul Jackson wrote:
[snip]
> Question:
>   Could those who have found that this prefetch helps them a lot say how
>   many disks they have?  In particular, is their swap on the same
>   disk spindle as their root and user files?
> 
> Answer - for me:
>   On my system where updatedb is a big problem, I have one, slow, disk.
>   On my system where updatedb is a small problem, swap is on a separate
>     spindle.
>   On my system where updatedb is -no- problem, I have so much memory
>     I never use swap.
> 
> I'd expect the laptop crowd to mostly have a single, slow, disk, and
> hence to find updatedb more painful.

A well done swap-to-flash would help here.  I sometimes do it anyway to
a 4GB CF card but I can tell it's hitting the read/update/write cycles
on the flash blocks.  The sad thing is that it is still a speed
improvement over swapping to laptop disk.
-- 
Zan Lynx <zlynx@acm.org>

* Re: -mm merge plans for 2.6.23
  2007-07-25 22:05                               ` Paul Jackson
  2007-07-25 22:22                                 ` Zan Lynx
@ 2007-07-25 22:27                                 ` Jesper Juhl
  2007-07-25 22:28                                 ` [ck] " Michael Chang
  2007-07-25 23:45                                 ` André Goddard Rosa
  3 siblings, 0 replies; 484+ messages in thread
From: Jesper Juhl @ 2007-07-25 22:27 UTC (permalink / raw)
  To: Paul Jackson
  Cc: Ingo Molnar, rene.herman, Valdis.Kletnieks, david, nickpiggin,
	ray-lk, akpm, ck, linux-mm, linux-kernel

On 26/07/07, Paul Jackson <pj@sgi.com> wrote:
> > and the fact is: updatedb discards a considerable portion of the cache
> > completely unnecessarily: on a reasonably complex box no way do all the
>
> I'm wondering how much of this updatedb problem is due to poor layout
> of swap and other file systems across disk spindles.
>
> I'll wager that those most impacted by updatedb have just one disk.
>
[snip]
>
> Question:
>   Could those who have found this prefetch helps them alot say how
>   many disks they have?  In particular, is their swap on the same
>   disk spindle as their root and user files?
>

Swap prefetch helps me.

In my case I have a single (10K RPM, Ultra 160 SCSI) disk.

# fdisk -l /dev/sda

Disk /dev/sda: 36.7 GB, 36703918080 bytes
255 heads, 63 sectors/track, 4462 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sda1               1         974     7823623+  83  Linux
/dev/sda2             975        1218     1959930   83  Linux
/dev/sda3            1219        1341      987997+  82  Linux swap
/dev/sda4            1342        4462    25069432+  83  Linux

sda1 is "/", sda2 is "/usr/local/" and sda4 is "/home/"


But I don't think updatedb is the problem, at least not updatedb
on its own.
My machine has 2GB of RAM, so a single updatedb on its own will not
cause it to start swapping, but it does eat up a chunk of memory, no
doubt about that.
The problem with updatedb is simply that it can be a contributing
factor to stuff being swapped out, but any memory-hungry application
can do that - just try building an allyesconfig kernel and see how
much the linker eats towards the end.

What swap prefetch helps is not updatedb specifically. In my
experience it helps any case where you have applications running, then
start some memory-hungry job that runs for a limited time, pushes the
previously started apps out to swap, and then dies (like updatedb or a
compile job).

Without swap prefetch those apps that were pushed to swap won't be
brought back in before they are used (at which time the user is going
to have to sit there and wait for them).
With swap prefetch, the apps that got swapped out will slowly make
their way back once the mem hungry app has died and will then be fully
or partly back in memory when the user comes back to them.

That's how swap prefetch helps, it's got nothing to do with updatedb
as such - at least not as I see it.

-- 
Jesper Juhl <jesper.juhl@gmail.com>
Don't top-post  http://www.catb.org/~esr/jargon/html/T/top-post.html
Plain text mails only, please      http://www.expita.com/nomime.html

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25 22:05                               ` Paul Jackson
  2007-07-25 22:22                                 ` Zan Lynx
  2007-07-25 22:27                                 ` Jesper Juhl
@ 2007-07-25 22:28                                 ` Michael Chang
  2007-07-25 23:45                                 ` André Goddard Rosa
  3 siblings, 0 replies; 484+ messages in thread
From: Michael Chang @ 2007-07-25 22:28 UTC (permalink / raw)
  To: Paul Jackson
  Cc: Ingo Molnar, david, nickpiggin, Valdis.Kletnieks, ray-lk,
	jesper.juhl, linux-kernel, ck, linux-mm, akpm, rene.herman

On 7/25/07, Paul Jackson <pj@sgi.com> wrote:
> Question:
>   Could those who have found this prefetch helps them alot say how
>   many disks they have?  In particular, is their swap on the same
>   disk spindle as their root and user files?

I have found that swap prefetch helped on all four of the machines
I have, although the effect is more noticeable on machines
with slower disks. They all have one hard disk, and root and swap were
always on the same disk. I have no idea how to determine how many disk
spindles they have, but since the drives are mainly low-end consumer
models sold with low-end sub-$500 PCs...

-- 
Michael Chang

Please avoid sending me Word or PowerPoint attachments. Send me ODT,
RTF, or HTML instead.
See http://www.gnu.org/philosophy/no-word-attachments.html
Thank you.

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25 22:05                               ` Paul Jackson
                                                   ` (2 preceding siblings ...)
  2007-07-25 22:28                                 ` [ck] " Michael Chang
@ 2007-07-25 23:45                                 ` André Goddard Rosa
  3 siblings, 0 replies; 484+ messages in thread
From: André Goddard Rosa @ 2007-07-25 23:45 UTC (permalink / raw)
  To: Paul Jackson
  Cc: Ingo Molnar, david, nickpiggin, Valdis.Kletnieks, ray-lk,
	jesper.juhl, linux-kernel, ck, linux-mm, akpm, rene.herman

> Question:
>   Could those who have found this prefetch helps them alot say how
>   many disks they have?  In particular, is their swap on the same
>   disk spindle as their root and user files?
>
> Answer - for me:
>   On my system where updatedb is a big problem, I have one, slow, disk.

On both desktop and laptop.

Cheers,
-- 
[]s,
André Goddard

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25 16:02                     ` Ray Lee
  2007-07-25 20:55                       ` Zan Lynx
@ 2007-07-26  1:15                       ` Matthew Hawkins
  2007-07-26  1:32                         ` Ray Lee
  2007-07-26 22:30                         ` Michael Chang
  1 sibling, 2 replies; 484+ messages in thread
From: Matthew Hawkins @ 2007-07-26  1:15 UTC (permalink / raw)
  To: Ray Lee; +Cc: linux-kernel, ck list, linux-mm

On 7/26/07, Ray Lee <ray-lk@madrabbit.org> wrote:
> I'd just like updatedb to amortize its work better. If we had some way
> to track all filesystem events, updatedb could keep a live and
> accurate index on the filesystem. And this isn't just updatedb that
> wants that, beagle and tracker et al also want to know filesystem
> events so that they can index the documents themselves as well as the
> metadata. And if they do it live, that spreads the cost out, including
> the VM pressure.

We already have this, it's called inotify (and if I'm not mistaken,
beagle already uses it).  Several years ago when it was still a little
flaky patch, I built a custom filesystem indexer into an enterprise
search engine using it (I needed to pull apart Unix mbox files).  The
only trouble of course is the action is triggered immediately, which
may not always be ideal (but that's a userspace problem).

-- 
Matt

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-26  1:15                       ` [ck] " Matthew Hawkins
@ 2007-07-26  1:32                         ` Ray Lee
  2007-07-26  3:16                           ` Matthew Hawkins
  2007-07-26 22:30                         ` Michael Chang
  1 sibling, 1 reply; 484+ messages in thread
From: Ray Lee @ 2007-07-26  1:32 UTC (permalink / raw)
  To: Matthew Hawkins; +Cc: linux-kernel, ck list, linux-mm

On 7/25/07, Matthew Hawkins <darthmdh@gmail.com> wrote:
> On 7/26/07, Ray Lee <ray-lk@madrabbit.org> wrote:
> > I'd just like updatedb to amortize its work better. If we had some way
> > to track all filesystem events, updatedb could keep a live and
> > accurate index on the filesystem. And this isn't just updatedb that
> > wants that, beagle and tracker et al also want to know filesystem
> > events so that they can index the documents themselves as well as the
> > metadata. And if they do it live, that spreads the cost out, including
> > the VM pressure.
>
> We already have this, its called inotify (and if I'm not mistaken,
> beagle already uses it).

Yeah, I know about inotify, but it doesn't scale.

ray@phoenix:~$ find ~ -type d | wc -l
17933
ray@phoenix:~$

That's not fun with inotify, and that's just my home directory. The
vast majority of those are quiet the vast majority of the time, which
is the crux of the problem, and why inotify isn't a great fit for
on-demand virus scanners or indexers.

Ray

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-25 20:35                                     ` Ingo Molnar
@ 2007-07-26  2:32                                       ` Bartlomiej Zolnierkiewicz
  2007-07-26  4:13                                         ` Jeff Garzik
  0 siblings, 1 reply; 484+ messages in thread
From: Bartlomiej Zolnierkiewicz @ 2007-07-26  2:32 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Satyam Sharma, Rene Herman, Jos Poortvliet, david, Nick Piggin,
	Valdis.Kletnieks, Ray Lee, Jesper Juhl, linux-kernel, ck list,
	linux-mm, Paul Jackson, Andrew Morton


Hi,

Some general thoughts about submitter/maintainer responsibilities,
not necessarily connected with the recent events (I haven't been
following them closely - some people don't have that much free time
on their hands ;)...

On Wednesday 25 July 2007, Ingo Molnar wrote:
> 
> * Satyam Sharma <satyam.sharma@gmail.com> wrote:
> 
> > > concentrate on making sure that both you and the maintainer 
> > > understands the problem correctly,
> > 
> > This itself may require some "convincing" to do. What if the 
> > maintainer just doesn't recognize the problem? Note that the 
> > development model here is more about the "social" thing than purely a 
> > "technical" thing. People do handwave, possibly due to innocent 
> > misunderstandings, possibly without. Often it's just a case of seeing 
> > different reasons behind the "problematic behaviour". Or it could be a 
> > case of all of the above.
> 
> sure - but i was really not talking about from the user's perspective, 
> but from the enterprising kernel developer's perspective who'd like to 
> solve a particular problem. And the nice thing about concentrating on 
> the problem: if you do that well, it does not really matter what the 
> maintainer thinks!

Yes, this is a really good strategy to get your changes upstream (and it
works) - just make changes so perfect that nobody can really complain. :)

The only problem is that the bigger the change becomes, the less likely it
is to be perfect, so for really big changes it is also useful to show the
maintainer that you take responsibility for your changes (by taking bug
reports and potential review issues very seriously instead of ignoring them;
the history of your merged changes also has a big influence here), so he
will know that you won't leave him out in the cold with your code when bug
reports happen - and be _sure_ that they will happen with bigger changes.

> ( Talking to the maintainer can of course be of enormous help in the 
>   quest for understanding the problem and figuring out the best fix - 
>   the maintainer will most likely know more about the subject than 
>   yourself. More communication never hurts. It's an additional bonus if 
>   you manage to convince the maintainer to take up the matter for 
>   himself. It's not a given right though - a maintainer's main task is 
>   to judge code that is being submitted, to keep a subsystem running
>   smoothly and to not let it regress - but otherwise there can easily be
>   different priorities of what tasks to tackle first, and in that sense 
>   the maintainer is just one of the many overworked kernel developers 
>   who has no real obligation what to tackle first. )

Yep, and the patch author should try to help the maintainer understand both
the problem he is trying to fix and the solution, i.e. throwing some
undocumented patches and screaming at the maintainer to merge them is not
the way to go.

> If the maintainer rejects something despite it being well-reasoned, 
> well-researched and robustly implemented with no tradeoffs and 
> maintainance problems at all then it's a bad maintainer. (but from all 
> i've seen in the past few years the VM maintainers do their job pretty 
> damn fine.) And note that i _do_ disagree with them in this particular 
> swap-prefetch case, but still, the non-merging of swap-prefetch was not 
> a final decision at all. It was more of a "hm, dunno, i still dont 
> really like it - shouldnt this be done differently? Could we debug this 
> a bit better?" reaction. Yes, it can be frustrating after more than one 
> year.
> 
> > > possibly write some testcase that clearly exposes it, and
> > 
> > Oh yes -- that'll be helpful, but definitely not necessarily a 
> > prerequisite for all issues, and then you can't even expect everybody 
> > to write or test/benchmark with testcases. (oh, btw, this is assuming 
> > you do find consensus on a testcase)
> 
> no, but Con is/was certainly more than capable to write testcases and to 
> debug various scenarios. That's the way how new maintainers are found 
> within Linux: people take matters in their own hands and improve a 
> subsystem so that they'll either peacefully co-work with the other 
> maintainers or they replace them (sometimes not so peacefully - like in 
> the IDE/SATA/PATA saga).

Heh, now that you've raised the IDE saga I feel obligated to stand up
and say a few words...

The latest chapter of the IDE saga was quite interesting in the current
context because we had exactly the reverse situation there - an
"independent" maintainer and an "enterprise" developer (imagine the amount
of frustration on both sides) but the root cause was quite similar
(inability to get changes merged).

IMO the root of the conflict lay in coming from different perspectives
and having somewhat different priorities (stabilising/cleaning current code
vs adding new features on top of a pile of crap).  In such situations it is
very important to be able to stop for a moment and look at the situation
from the other person's perspective.

In summary:

The IDE wars are a thing of the past, so let's learn from the IDE world's
mistakes instead of repeating them in other subsystems, OK? :)

> > > help the maintainer debug the problem.
> > 
> > Umm ... well. Should this "dance-with-the-maintainer" and all be 
> > really necessary? What you're saying is easy if a "bug" is simple and 
> > objective, with mathematically few (probably just one) possible 
> > correct solutions. Often (most often, in fact) it's a subjective issue 
> > -- could be about APIs, high level design, tradeoffs, even little 
> > implementation nits ... with one person wanting to do it one way, 
> > another thinks there's something hacky or "band-aidy" about it and a 
> > more beautiful/elegant solution exists elsewhere. I think there's a 
> > similar deadlock here (?)
> 
> you dont _have to_ cooperative with the maintainer, but it's certainly 
> useful to work with good maintainers, if your goal is to improve Linux. 
> Or if for some reason communication is not working out fine then grow 
> into the job and replace the maintainer by doing a better job.

The idea of growing into the job and replacing the maintainer by proving
that you are doing a better job was viable a few years ago but may not be
feasible today.

If the maintainer is an "enterprise" developer and maintaining is part of
his job, replacing him may not be possible at all because you simply lack
the time to do the job.  You may actually be better, but you can't afford
to show it, and without showing it you won't replace him (catch-22).

Oh, and it could happen that if the maintainer works for a distro, he puts
his competing solution to the problem into the distro kernel and suddenly
gets an order of magnitude more testers and sometimes even contributors.

How are you supposed to win such a competition?  [ A: You can't. ]

And I won't even mention the situation where the maintainer is just a genius
and one of the best kernel hackers ever (I'm talking about you actually :)
so your chances are pretty slim from the start...

> > > _Optionally_, if you find joy in it, you are also free to write a 
> > > proposed solution for that problem
> > 
> > Oh yes. But why "optionally"? This is *precisely* what the spirit of 
> > development in such open / distributed projects is ... unless Linux 
> > wants to die the same, slow, ivory-towered, miserable death that *BSD 
> > have.
> 
> perhaps you misunderstood how i meant the 'optional': it is certainly 
> not required to write a solution for every problem you are reporting. 
> Best-case the maintainer picks the issue up and solves it. Worst-case 
> you get ignored. But you always have the option to take matters into 
> your own hands and solve the problem.
> 
> > >and submit it to the maintainer.
> > 
> > Umm, ok ... pretty unlikely Linus or Andrew would take patches for any 
> > kernel subsystem (that isn't obvious/trivial) from anybody just like 
> > that, so you do need to Cc: the ones they trust (maintainer) to ensure 
> > they review/ack your work and pick it up.
> 
> actually, it happens pretty frequently, and NACK-ing perfectly 

It actually happens really rarely (there are pretty good reasons for that).

> reasonable patches is a sure way towards getting replaced as a 
> maintainer.

"reasonable" is highly subjective

> > > is the wrong way around. It might still work out fine if the 
> > > solution is correct (especially if the patch is small and obvious), 
> > > but if there are any non-trivial tradeoffs involved, or if 
> > > nontrivial amount of code is involved, you might see your patch at 
> > > the end of a really long (and constantly growing) waiting list of 
> > > patches.
> > 
> > That's the whole point. For non-trivial / non-obvious / subjective 
> > issues, the "process" you laid out above could itself become a problem 
> > ...
> 
> firstly, there's rarely any 'subjective' issue in maintainance 
> decisions, even when it comes to complex patches. The 'subjective' issue 
> becomes a factor mostly when a problem has not been researched well 
> enough, when it becomes more of a faith thing ('i believe it helps me') 
> than a fully fact-backed argument. Maintainers tend to dodge such issues 
> until they become more clearly fact-backed.

Yep.

However, there is some reasonable time limit for this dodging, and two
years isn't reasonable.  By being a maintainer you frequently have to
sacrifice your own goals and instead work on other people's changes first
(sometimes even on changes that you don't find particularly interesting or
important).  Sure, it doesn't give you the same credit you'd get for your
own changes, but you're investing in people who will help you in the long
term.

Can you afford the luxury of losing these people?

Another problem is that sometimes it seems that independent developers
have to jump through more hoops than enterprise ones, and it is a really
frustrating experience for them.  There is no conspiracy here - it is only
the natural mechanism of trusting more the code of people you work with more.

> providing more and more facts gradually reduces the 'judgement/taste' 
> leeway of maintainers, down to an almost algorithmic level.
> but in any case there's always the ultimate way out: prove that you can 
> do a better job yourself and replace the maintainer. But providing an 

As stated before - this is nearly impossible in some cases.

I'm not proposing any kind of justice or fair chances here; I'm just saying
that in the long term it is gonna hurt said maintainer, because he will lose
talented people willing to work on the code that he maintains.

> overwhelming, irresistable body of facts in favor of a patch does the 
> trick too in 99.9% of the cases.

Now could I ask people to stop all these -ck threads and give the developers
involved in the recent events some time to calmly rethink the whole case?

Please?

Thanks,
Bart

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-26  1:32                         ` Ray Lee
@ 2007-07-26  3:16                           ` Matthew Hawkins
  0 siblings, 0 replies; 484+ messages in thread
From: Matthew Hawkins @ 2007-07-26  3:16 UTC (permalink / raw)
  To: Ray Lee; +Cc: linux-kernel, ck list, linux-mm

On 7/26/07, Ray Lee <ray-lk@madrabbit.org> wrote:
> Yeah, I know about inotify, but it doesn't scale.

Yeah, the nonrecursive behaviour is a bugger.  Also I found it helped
to queue operations in userspace and execute periodically rather than
trying to execute on every single notification.  Worked well for
indexing, for virus scanning though you'd want to do some risk
analysis.

It'd be nice to have a filesystem that handled that sort of thing
internally *cough*winfs*cough*.  That was my hope for reiserfs a very
long time ago with its pluggable fs modules feature.

-- 
Matt

* Re: [ck] Re: -mm merge plans for 2.6.23
  2007-07-26  2:32                                       ` Bartlomiej Zolnierkiewicz
@ 2007-07-26  4:13                                         ` Jeff Garzik
  2007-07-26 10:22                                           ` Bartlomiej Zolnierkiewicz
  0 siblings, 1 reply; 484+ messages in thread
From: Jeff Garzik @ 2007-07-26  4:13 UTC (permalink / raw)
  To: Bartlomiej Zolnierkiewicz
  Cc: Ingo Molnar, Satyam Sharma, Rene Herman, Jos Poortvliet, david,
	Nick Piggin, Valdis.Kletnieks, Ray Lee, Jesper Juhl,
	linux-kernel, ck list, linux-mm, Paul Jackson, Andrew Morton

Bartlomiej Zolnierkiewicz wrote:
> On Wednesday 25 July 2007, Ingo Molnar wrote:
>> you dont _have to_ cooperative with the maintainer, but it's certainly 
>> useful to work with good maintainers, if your goal is to improve Linux. 
>> Or if for some reason communication is not working out fine then grow 
>> into the job and replace the maintainer by doing a better job.
> 
> The idea of growing into the job and replacing the maintainer by proving
> the you are doing better job was viable few years ago but may not be
> feasible today.

IMO...  Tejun is an excellent counter-example.  He showed up as an 
independent developer, put a bunch of his own spare time and energy into 
the codebase, and is probably libata's main engineer (in terms of code 
output) today.  If I get hit by a bus tomorrow, I think the Linux 
community would be quite happy with him as the libata maintainer.


> The another problem is that sometimes it seems that independent developers
> has to go through more hops than entreprise ones and it is really frustrating
> experience for them.  There is no conspiracy here - it is only the natural
> mechanism of trusting more in the code of people who you are working with more.

I think Tejun is a counter-example here too :)  Everyone's experience is 
different, but from my perspective, Tejun "appeared out of nowhere" 
producing good code, and so, it got merged rapidly.

Personally, for merging code, I tend to trust people who are most in 
tune with "the Linux Way(tm)."  It is hard to quantify, but quite often, 
independent developers "get it" when enterprise developers do not.


> Now could I ask people to stop all this -ck threads and give the developers
> involved in the recent events some time to calmly rethink the whole case.

Indeed...

	Jeff



* Re: -mm merge plans for 2.6.23
  2007-07-25 16:09                     ` Ray Lee
@ 2007-07-26  4:57                       ` Andrew Morton
  2007-07-26  5:53                         ` Nick Piggin
                                           ` (4 more replies)
  0 siblings, 5 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-26  4:57 UTC (permalink / raw)
  To: Ray Lee
  Cc: Nick Piggin, Eric St-Laurent, Rene Herman, Jesper Juhl, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 25 Jul 2007 09:09:01 -0700
"Ray Lee" <ray-lk@madrabbit.org> wrote:

> No, there's a third case which I find the most annoying. I have
> multiple working sets, the sum of which won't fit into RAM. When I
> finish one, the kernel had time to preemptively swap back in the
> other, and yet it didn't. So, I sit around, twiddling my thumbs,
> waiting for my music player to come back to life, or thunderbird,
> or...

Yes, I'm thinking that's a good problem statement and it isn't something
which the kernel even vaguely attempts to address, apart from normal
demand paging.

We could perhaps improve things with larger and smarter fault readaround,
perhaps guided by refault-rate measurement.  But that's still demand-paged
rather than being proactive/predictive/whatever.

None of this is swap-specific though: exactly the same problem would need
to be solved for mmapped files and even plain old pagecache.

In fact I'd restate the problem as "system is in steady state A, then there
is a workload shift causing transition to state B, then the system goes
idle.  We now wish to reinstate state A in anticipation of a resumption of
the original workload".

swap-prefetch solves a part of that.

A complete solution for anon and file-backed memory could be implemented
(ta-da) in userspace using the kernel inspection tools in -mm's maps2-*
patches.  We would need to add a means by which userspace can repopulate
swapcache, but that doesn't sound too hard (especially when you haven't
thought about it).

And userspace can right now work out which pages from which files are in
pagecache so this application can handle pagecache, swap and file-backed
memory.  (file-backed memory might not even need special treatment, given
that it's pagecache anyway).
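For the pagecache part, the existing interface is mincore(2): map a file and ask which of its pages are resident. A minimal sketch of that inspection step, which a userspace reloader could use to snapshot a working set; the helper name is hypothetical:

```c
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Count how many pages of a file are currently in the page cache.
 * *total receives the file size in pages. Returns -1 on error. */
static long resident_pages(const char *path, long *total)
{
    long pagesize = sysconf(_SC_PAGESIZE);
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    *total = (st.st_size + pagesize - 1) / pagesize;
    if (*total == 0) {          /* mmap() rejects a zero-length mapping */
        close(fd);
        return 0;
    }

    /* Mapping the file does not read it in, and mincore() only reports
     * residency; neither call triggers I/O on the file's pages. */
    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(*total);   /* one byte per page */
    long resident = -1;
    if (vec && mincore(map, st.st_size, vec) == 0) {
        resident = 0;
        for (long i = 0; i < *total; i++)
            resident += vec[i] & 1;        /* low bit: page is resident */
    }
    free(vec);
    munmap(map, st.st_size);
    return resident;
}
```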


And userspace can do a much better implementation of this
how-to-handle-large-load-shifts problem, because it is really quite
complex.  The system needs to be monitored to determine what is the "usual"
state (ie: the thing we wish to reestablish when the transient workload
subsides).  The system then needs to be monitored to determine when the
exceptional workload has started, and when it has subsided, and userspace
then needs to decide when to start reestablishing the old working set, at
what rate, when to abort doing that, etc.
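The "determine when the exceptional workload has subsided" step could start from counters the kernel already exports: /proc/vmstat has pswpin and pswpout, the number of pages swapped in and out since boot. A minimal sketch; the quiet-threshold heuristic and all function names are assumptions for illustration, not anything proposed in this thread:

```c
#include <stdio.h>
#include <string.h>

/* Extract pswpin/pswpout from the text of /proc/vmstat.
 * Returns 0 if both counters were found, -1 otherwise. */
static int parse_swap_counters(const char *text, unsigned long *in,
                               unsigned long *out)
{
    int found = 0;
    for (const char *line = text; line && *line;) {
        unsigned long v;
        if (sscanf(line, "pswpin %lu", &v) == 1) {
            *in = v;
            found |= 1;
        } else if (sscanf(line, "pswpout %lu", &v) == 1) {
            *out = v;
            found |= 2;
        }
        line = strchr(line, '\n');
        if (line)
            line++;
    }
    return found == 3 ? 0 : -1;
}

/* Read the live counters; returns -1 if /proc/vmstat is unavailable. */
static int read_swap_counters(unsigned long *in, unsigned long *out)
{
    static char buf[65536];
    FILE *f = fopen("/proc/vmstat", "r");
    if (!f)
        return -1;
    size_t n = fread(buf, 1, sizeof buf - 1, f);
    fclose(f);
    buf[n] = '\0';
    return parse_swap_counters(buf, in, out);
}

/* Hypothetical heuristic: treat the transient workload as over once the
 * swap traffic between two samples is at most `threshold` pages. */
static int swap_quiet(unsigned long in0, unsigned long out0,
                      unsigned long in1, unsigned long out1,
                      unsigned long threshold)
{
    return (in1 - in0) <= threshold && (out1 - out0) <= threshold;
}
```

A monitor would sample read_swap_counters() periodically and only start reinstating the old working set after several quiet samples in a row; the sampling period and threshold are exactly the kind of tunables Andrew expects to live in userspace.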

All this would end up needing runtime configurability and tweakability and
customisability.  All standard fare for userspace stuff - much easier than
patching the kernel.


So.  We can

a) provide a way for userspace to reload pagecache and

b) merge maps2 (once it's finished) (pokes mpm)

and we're done?

* Re: -mm merge plans for 2.6.23
  2007-07-26  4:57                       ` Andrew Morton
@ 2007-07-26  5:53                         ` Nick Piggin
  2007-07-26  6:06                           ` Andrew Morton
  2007-07-26  6:33                         ` Ray Lee
                                           ` (3 subsequent siblings)
  4 siblings, 1 reply; 484+ messages in thread
From: Nick Piggin @ 2007-07-26  5:53 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ray Lee, Eric St-Laurent, Rene Herman, Jesper Juhl, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

Andrew Morton wrote:

> All this would end up needing runtime configurability and tweakability and
> customisability.  All standard fare for userspace stuff - much easier than
> patching the kernel.
> 
> 
> So.  We can
> 
> a) provide a way for userspace to reload pagecache and
> 
> b) merge maps2 (once it's finished) (pokes mpm)
> 
> and we're done?


The userspace solution has been brought up before. It could be a good way
to go. I was thinking about how to do refetching of file backed pages from
the kernel, and it isn't impossible, but it seems like locking would be
quite hard and it would be pretty complex and inflexible compared to a
userspace solution. Userspace might know what to chuck out, what to keep,
what access patterns to use...

Not that I want to say anything about swap prefetch getting merged: my
inbox is already full of enough "helpful suggestions" about that, so I'll
just be happy to have a look at little things like updatedb.

-- 
SUSE Labs, Novell Inc.

* Re: -mm merge plans for 2.6.23
  2007-07-26  5:53                         ` Nick Piggin
@ 2007-07-26  6:06                           ` Andrew Morton
  2007-07-26  6:17                             ` Nick Piggin
  0 siblings, 1 reply; 484+ messages in thread
From: Andrew Morton @ 2007-07-26  6:06 UTC (permalink / raw)
  To: Nick Piggin
  Cc: Ray Lee, Eric St-Laurent, Rene Herman, Jesper Juhl, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Thu, 26 Jul 2007 15:53:37 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:

> Not that I want to say anything about swap prefetch getting merged: my
> inbox is already full of enough "helpful suggestions" about that,

give them the kernel interfaces, they can do it themselves ;)

> so I'll
> just be happy to have a look at little things like updatedb.

Yes, that is a little thing.  I mean, even if the kernel's behaviour
during an updatedb run was "perfect" (ie: does what we the designers
curently intend it to do (whatever that is)) then the core problem isn't
solved: short-term workload evicts your working set and you have to
synchronously reestablish it.

* Re: -mm merge plans for 2.6.23
  2007-07-26  6:06                           ` Andrew Morton
@ 2007-07-26  6:17                             ` Nick Piggin
  0 siblings, 0 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-26  6:17 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Ray Lee, Eric St-Laurent, Rene Herman, Jesper Juhl, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

Andrew Morton wrote:
> On Thu, 26 Jul 2007 15:53:37 +1000 Nick Piggin <nickpiggin@yahoo.com.au> wrote:
> 
> 
>>Not that I want to say anything about swap prefetch getting merged: my
>>inbox is already full of enough "helpful suggestions" about that,
> 
> 
> give them the kernel interfaces, they can do it themselves ;)

It is a good idea if we can give enough to get started. Then if
they run into something they really need to do in the kernel, we
can take a look.

Page eviction order / prefetch-back-in-order might be tricky to
expose.


>>so I'll
>>just be happy to have a look at little things like updatedb.
> 
> 
> Yes, that is a little thing.  I mean, even if the kernel's behaviour
> during an updatedb run was "perfect" (ie: does what we the designers
> curently intend it to do (whatever that is)) then the core problem isn't
> solved: short-term workload evicts your working set and you have to
> synchronously reestablish it.

Sure, I know and I was never against swap (and/or file) prefetching to
solve this problem. I'm just saying, I'm staying out of that :)

-- 
SUSE Labs, Novell Inc.

* Re: -mm merge plans for 2.6.23
  2007-07-26  4:57                       ` Andrew Morton
  2007-07-26  5:53                         ` Nick Piggin
@ 2007-07-26  6:33                         ` Ray Lee
  2007-07-26  6:50                           ` Andrew Morton
  2007-07-26 14:19                         ` [ck] " Michael Chang
                                           ` (2 subsequent siblings)
  4 siblings, 1 reply; 484+ messages in thread
From: Ray Lee @ 2007-07-26  6:33 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Eric St-Laurent, Rene Herman, Jesper Juhl, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 7/25/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 25 Jul 2007 09:09:01 -0700
> "Ray Lee" <ray-lk@madrabbit.org> wrote:
>
> > No, there's a third case which I find the most annoying. I have
> > multiple working sets, the sum of which won't fit into RAM. When I
> > finish one, the kernel had time to preemptively swap back in the
> > other, and yet it didn't. So, I sit around, twiddling my thumbs,
> > waiting for my music player to come back to life, or thunderbird,
> > or...
>
> Yes, I'm thinking that's a good problem statement and it isn't something
> which the kernel even vaguely attempts to address, apart from normal
> demand paging.
>
> We could perhaps improve things with larger and smarter fault readaround,
> perhaps guided by refault-rate measurement.  But that's still demand-paged
> rather than being proactive/predictive/whatever.
>
> None of this is swap-specific though: exactly the same problem would need
> to be solved for mmapped files and even plain old pagecache.

<nod> Could be what I'm noticing, but it's important to note that as
others have shown improvement with Con's swap prefetch, it's easily
arguable that targeting just swap is good enough for a first
approximation.

> In fact I'd restate the problem as "system is in steady state A, then there
> is a workload shift causing transition to state B, then the system goes
> idle.  We now wish to reinstate state A in anticipation of a resumption of
> the original workload".

Yes, that's a fair transformation / generalization. It's always nice
talking to someone with more clarity than one's self.

> swap-prefetch solves a part of that.
>
> A complete solution for anon and file-backed memory could be implemented
> (ta-da) in userspace using the kernel inspection tools in -mm's maps2-*
> patches.
> We would need to add a means by which userspace can repopulate
> swapcache,

Okay, let's run with that for argument's sake.

> but that doesn't sound too hard (especially when you haven't
> thought about it).

I've always thought your sense of humor was underappreciated.

> And userspace can right now work out which pages from which files are in
> pagecache so this application can handle pagecache, swap and file-backed
> memory.  (file-backed memory might not even need special treatment, given
> that it's pagecache anyway).
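A minimal sketch of the pagecache inspection Andrew mentions, assuming the standard mmap(2) + mincore(2) route. This is one existing way to ask "which pages of this file are cached", not necessarily what a real tool would use, and the helper name `resident_pages` is hypothetical:

```c
#define _DEFAULT_SOURCE          /* for mincore() */
#include <fcntl.h>
#include <stdlib.h>
#include <sys/mman.h>
#include <sys/stat.h>
#include <unistd.h>

/* Return how many pages of `path` are currently resident in the page
 * cache, or -1 on error.  An empty file reports 0. */
long resident_pages(const char *path)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    struct stat st;
    if (fstat(fd, &st) < 0) {
        close(fd);
        return -1;
    }
    if (st.st_size == 0) {
        close(fd);
        return 0;
    }

    long page = sysconf(_SC_PAGESIZE);
    size_t npages = (st.st_size + page - 1) / page;

    /* Mapping the file does not fault it in; mincore() only reports. */
    void *map = mmap(NULL, st.st_size, PROT_READ, MAP_SHARED, fd, 0);
    close(fd);
    if (map == MAP_FAILED)
        return -1;

    unsigned char *vec = malloc(npages);
    long count = -1;
    if (vec && mincore(map, st.st_size, vec) == 0) {
        count = 0;
        for (size_t i = 0; i < npages; i++)
            count += vec[i] & 1;     /* low bit set = page is resident */
    }
    free(vec);
    munmap(map, st.st_size);
    return count;
}
```

Running this over the files a process has mapped (as listed in its maps file) gives a residency picture without touching any page.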

So in your proposed scheme, would userspace be polling, er, <goes and
looks through email for maps2 stuff, only finds Rusty's patches to
it>, well, /proc/<pids>/something_or_another?

A userspace daemon that wakes up regularly to poll a bunch of proc
files fills me with glee. Wait, is that glee? I think, no... wait...
horror, yes, horror is what I'm feeling.

I'm wrong, right? I love being wrong about this kind of stuff.

> And userspace can do a much better implementation of this
> how-to-handle-large-load-shifts problem, because it is really quite
> complex.  The system needs to be monitored to determine what is the "usual"
> state (ie: the thing we wish to reestablish when the transient workload
> subsides).  The system then needs to be monitored to determine when the
> exceptional workload has started, and when it has subsided, and userspace
> then needs to decide when to start reestablishing the old working set, at
> what rate, when to abort doing that, etc.

Oy. I mean this in the most respectful way possible, but you're too
smart for your own good.

I mean, sure, it's possible one could have multiply-chained transient
workloads each of which have their optimum workingset, of which
there's little overlap with the previous. Mainframes made their names
on such loads. Workingset A starts, generates data, finishes and
invokes workingset B, of which the only thing they share in common is
said data. B finishes and invokes C, etc.

So, yeah, that's way too complex to stuff into the kernel. Even if it
were possible to do so, I cringe at the thought. And I can't believe
that would be a common enough pattern nowadays to justify any
heuristics on anyone's part. It's certainly complex enough that I'd
like to punt that scenario out of the conversation entirely -- I think
it has the potential to give a false impression as to how involved a
process we're talking about here.

Let's go back to your restatement:

> In fact I'd restate the problem as "system is in steady state A, then there
> is a workload shift causing transition to state B, then the system goes
> idle.  We now wish to reinstate state A in anticipation of a resumption of
> the original workload".

I'll take an 80% solution for that one problem, and happily declare
that the kernel's job is done. In particular, when a resource hog
exits (or whatever heuristics prefetch is currently hooking in to),
the kernel (or userspace, if that interface could be made sane) could
exercise a completely workload agnostic refetch of the last n things
evicted, where n is determined by what's suddenly become free (or
whatever Con came up with).

Just, y'know, MRU style.
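Ray's "refetch the last n things evicted, MRU style" heuristic can be sketched as a fixed-size ring of eviction records. This is a hypothetical userspace model, not Con's swap-prefetch code: `struct evict_log`, its capacity and the page identifiers are illustrative assumptions, and how evictions would be reported to it is exactly the open API question in this thread.

```c
#include <stddef.h>

#define EVICT_LOG_SIZE 8         /* hypothetical capacity */

struct evict_log {
    unsigned long page[EVICT_LOG_SIZE];  /* hypothetical page identifiers */
    size_t head;                         /* next slot to write */
    size_t count;                        /* valid entries, <= capacity */
};

/* Record one eviction, overwriting the oldest record when full. */
void evict_log_add(struct evict_log *log, unsigned long page)
{
    log->page[log->head] = page;
    log->head = (log->head + 1) % EVICT_LOG_SIZE;
    if (log->count < EVICT_LOG_SIZE)
        log->count++;
}

/* Copy the min(freed_pages, log->count) most recently evicted pages into
 * out[], newest first.  Returns how many were copied. */
size_t evict_log_pick(const struct evict_log *log, size_t freed_pages,
                      unsigned long *out)
{
    size_t n = freed_pages < log->count ? freed_pages : log->count;

    for (size_t i = 0; i < n; i++) {
        size_t idx = (log->head + EVICT_LOG_SIZE - 1 - i) % EVICT_LOG_SIZE;
        out[i] = log->page[idx];
    }
    return n;
}
```

Capping the refetch at the amount of memory the exiting hog just freed means the prefetch only refills memory that would otherwise sit idle, instead of evicting something else to make room.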

> All this would end up needing runtime configurability and tweakability and
> customisability.  All standard fare for userspace stuff - much easier than
> patching the kernel.

We're talking about patching the kernel for whatever API you're coming
up with to repopulate pagecache, swap, and inodes, aren't we? If we
are, it doesn't seem like we're saving any work here. Also we're
talking about creating a new user-visible API instead of augmenting
a pre-existing heuristic -- page replacement -- that the kernel
doesn't export and so can change at a moment's notice. Augmenting an
opaque heuristic seems a lot more friendly to long-term maintenance.

> So.  We can
>
> a) provide a way for userspace to reload pagecache and
>
> b) merge maps2 (once it's finished) (pokes mpm)
>
> and we're done?

Eh, dunno. Maybe?

We're assuming we come up with an API for userspace to get
notifications of evictions (without polling, though poll() would be
fine -- you know what I mean), and an API for re-victing those things
on demand. If you think that adding that API and maintaining it is
simpler/better than including a variation on the above heuristic I
offered, then yeah, I guess we are. It'll all have that vague
userspace s2ram odor about it, but I'm sure it could be made to work.

As I think I've successfully Peter Principled my way through this
conversation to my level of incompetence, I'll shut up now.

Ray

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-26  6:33                         ` Ray Lee
@ 2007-07-26  6:50                           ` Andrew Morton
  2007-07-26  7:43                             ` Ray Lee
  2007-07-28  0:24                             ` Matt Mackall
  0 siblings, 2 replies; 484+ messages in thread
From: Andrew Morton @ 2007-07-26  6:50 UTC (permalink / raw)
  To: Ray Lee
  Cc: Nick Piggin, Eric St-Laurent, Rene Herman, Jesper Juhl, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On Wed, 25 Jul 2007 23:33:24 -0700 "Ray Lee" <ray-lk@madrabbit.org> wrote:

> > So.  We can
> >
> > a) provide a way for userspace to reload pagecache and
> >
> > b) merge maps2 (once it's finished) (pokes mpm)
> >
> > and we're done?
> 
> Eh, dunno. Maybe?
> 
> We're assuming we come up with an API for userspace to get
> notifications of evictions (without polling, though poll() would be
> fine -- you know what I mean), and an API for re-victing those things
> on demand.

I was assuming that polling would work OK.  I expect it would.

> If you think that adding that API and maintaining it is
> simpler/better than including a variation on the above heuristic I
> offered, then yeah, I guess we are. It'll all have that vague
> userspace s2ram odor about it, but I'm sure it could be made to work.

Actually, I overdesigned the API, I suspect.  What we _could_ do is to
provide a way of allowing userspace to say "pretend process A touched page
B": adopt its mm and go touch the page.  We in fact already have that:
PTRACE_PEEKTEXT.

So I suspect this could all be done by polling maps2 and using PEEKTEXT. 
The tricky part would be working out when to poll, and when to reestablish.
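Andrew's "pretend process A touched page B" via PTRACE_PEEKTEXT could look roughly like this. It is a sketch, assuming the caller is permitted to ptrace the target (same uid or CAP_SYS_PTRACE, subject to any local ptrace restrictions); `touch_remote_page` is a hypothetical name, and a real tool would attach once and touch many pages rather than attach per page.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/ptrace.h>
#include <sys/types.h>
#include <sys/wait.h>

/* Fault in the page holding `addr` in process `pid` by reading one word
 * of it through ptrace.  The read goes through the kernel's normal page
 * lookup, so a swapped-out page is brought back in as a side effect.
 * Returns 0 on success, -1 on failure. */
int touch_remote_page(pid_t pid, unsigned long addr)
{
    int status;

    if (ptrace(PTRACE_ATTACH, pid, NULL, NULL) == -1)
        return -1;
    /* PTRACE_ATTACH sends SIGSTOP; wait until the target has stopped. */
    if (waitpid(pid, &status, 0) == -1) {
        ptrace(PTRACE_DETACH, pid, NULL, NULL);
        return -1;
    }

    /* PEEKTEXT returns the word itself, so -1 is only an error if errno
     * was set by the call. */
    errno = 0;
    long word = ptrace(PTRACE_PEEKTEXT, pid, (void *)addr, NULL);
    int failed = (word == -1 && errno != 0);
    int saved = errno;

    ptrace(PTRACE_DETACH, pid, NULL, NULL);   /* lets the target run again */
    errno = saved;
    return failed ? -1 : 0;
}
```

The attach/stop/detach cycle is the expensive part, which is one reason writable maps2 files would be the neater interface.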

A neater implementation than PEEKTEXT would be to make the maps2 files
writeable(!) so as a party trick you could tar 'em up and then, when you
want to reestablish firefox's previous working set, do an untar in
/proc/$(pidof firefox)/


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-26  6:50                           ` Andrew Morton
@ 2007-07-26  7:43                             ` Ray Lee
  2007-07-26  7:59                               ` Nick Piggin
  2007-07-28  0:24                             ` Matt Mackall
  1 sibling, 1 reply; 484+ messages in thread
From: Ray Lee @ 2007-07-26  7:43 UTC (permalink / raw)
  To: Andrew Morton
  Cc: Nick Piggin, Eric St-Laurent, Rene Herman, Jesper Juhl, ck list,
	Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

On 7/25/07, Andrew Morton <akpm@linux-foundation.org> wrote:
> On Wed, 25 Jul 2007 23:33:24 -0700 "Ray Lee" <ray-lk@madrabbit.org> wrote:
> > If you think that adding that API and maintaining it is
> > simpler/better than including a variation on the above heuristic I
> > offered, then yeah, I guess we are. It'll all have that vague
> > userspace s2ram odor about it, but I'm sure it could be made to work.
>
> Actually, I overdesigned the API, I suspect.  What we _could_ do is to
> provide a way of allowing userspace to say "pretend process A touched page
> B": adopt its mm and go touch the page.  We in fact already have that:
> PTRACE_PEEKTEXT.

Huh. All right.

> So I suspect this could all be done by polling maps2 and using PEEKTEXT.
> The tricky part would be working out when to poll, and when to reestablish.

Welllllll.... there is the taskstats interface. It's not required
right now, though, and lacks most of what userspace would need, I
think. It does at least currently provide a notification of process
exit, which is a clue for when to start reestablishment. Gotta be
another way we can get at that...

Oh, stat on /proc, does that work? Huh, it does, sort of. It seems to
be off by 12 or 13, but hey, that's something.
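Ray's /proc observation can be polled with a plain stat(2). This sketch assumes, as he found empirically, that the link count of /proc tracks the number of processes plus a roughly constant offset; the helper name `proc_link_count` is hypothetical.

```c
#include <sys/stat.h>

/* Return the link count of /proc, or -1 on error.  It rises and falls
 * with the process count (plus a fixed offset), so a drop between two
 * polls hints that processes have exited. */
long proc_link_count(void)
{
    struct stat st;

    if (stat("/proc", &st) != 0)
        return -1;
    return (long)st.st_nlink;
}
```

A drop in the count between two polls would be the cue to go and look at which processes vanished, and possibly to start reestablishment.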

Wish I had the time to look at the maps2 stuff, but regardless, it
probably currently provides too much detail for continual polling? I
suspect what we'd want to do is to take a detailed snapshot a little
after the beginning of a process's lifetime (once the block-in counts
subside), then poll aggregate residency or eviction counts to know
which processes are suffering the burden of the transient workload.
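The "detailed snapshot, then compare later" idea reduces, for a single mapping, to diffing two residency vectors. A minimal sketch, assuming both vectors come from mincore(2) over the same unchanged mapping (low bit set means resident); `lost_pages` is a hypothetical helper.

```c
#include <stddef.h>

/* Write into out[] the indices of pages that were resident in `snapshot`
 * but are no longer resident in `now`.  Returns how many were found.
 * out[] must have room for npages entries. */
size_t lost_pages(const unsigned char *snapshot, const unsigned char *now,
                  size_t npages, size_t *out)
{
    size_t n = 0;

    for (size_t i = 0; i < npages; i++)
        if ((snapshot[i] & 1) && !(now[i] & 1))
            out[n++] = i;
    return n;
}
```

The returned pages are the candidates to fault back in once the transient workload has subsided; the aggregate count alone is enough to tell which processes are suffering.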

Eh, wait, that doesn't help with inodes. No matter, I guess; I'm the
one who said targeting swap-in would be good enough for a first pass.

On process exit, if userspace can get a hold of an estimate of the
size of what just freed up, it could then spend
min(that,evicted_count) on repopulation. That's probably already
available by polling whatever `free` calls.

> A neater implementation than PEEKTEXT would be to make the maps2 files
> writeable(!) so as a party trick you could tar 'em up and then, when you
> want to reestablish firefox's previous working set, do an untar in
> /proc/$(pidof firefox)/

I'm going to get into trouble if I wake up the other person in the
house with my laughter. That's laughter in a positive sense, not a
"you're daft" kind of way.

Huh. <thinks> So, to go back a little bit, I guess one of my problems
with polling is that it means that userspace can only approximate an
MRU of what's been evicted. Perhaps an approximation is good enough, I
don't know, but that's something to keep in mind. (Hmm, how many pages
can an average desktop evict per second? If we poll everything once
per second, that's how off we could be.)

Another is a more philosophical hangup -- running a process that polls
periodically to improve system performance seems backward. Okay, so
that's my problem to get over, not yours.

Another problem is what poor sod would be willing to write and test
this, given that there's already a written and tested kernel patch to
do much the same thing? Yeah, that's sorta rhetorical, but it's sorta
not. Given that swap prefetch could be ripped out of 2.6.n+1 if it's
introduced in 2.6.n, and nothing in userspace would be the wiser,
where's the burden?  There is some, just as any kernel code has some,
and as it's core code (versus, say, a driver), the burden is
correspondingly greater per line, but given the massive changesets
flowing through each release now, I have to think that the burden this
introduces is marginal compared to the rest of the bulk sweeping
through the kernel weekly.

This is obviously where I'm totally conjecturing, and you'll know far,
far better than I.

Offline for about 20 hours or so, not that anyone would probably notice :-).

Ray

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-26  7:43                             ` Ray Lee
@ 2007-07-26  7:59                               ` Nick Piggin
  0 siblings, 0 replies; 484+ messages in thread
From: Nick Piggin @ 2007-07-26  7:59 UTC (permalink / raw)
  To: Ray Lee
  Cc: Andrew Morton, Eric St-Laurent, Rene Herman, Jesper Juhl,
	ck list, Ingo Molnar, Paul Jackson, linux-mm, linux-kernel

Ray Lee wrote:

> Another is a more philosophical hangup -- running a process that polls
> periodically to improve system performance seems backward.

You mean like the kprefetchd of swap prefetch? ;)


> Okay, so
> that's my problem to get over, not yours.

If it was a problem you could add some event trigger to wake it up.

-- 
SUSE Labs, Novell Inc.

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-25 20:46               ` Andi Kleen
@ 2007-07-26  8:38                 ` Frank Kingswood
  2007-07-26  9:20                   ` Ingo Molnar
  0 siblings, 1 reply; 484+ messages in thread
From: Frank Kingswood @ 2007-07-26  8:38 UTC (permalink / raw)
  To: linux-kernel; +Cc: ck, linux-mm

Andi Kleen wrote:
> One simple way to fix this would be to implement a fadvise() flag
> that puts the dentry/inode on a "soon to be expired" list if there
> are no other references. Then if a dentry allocation needs more
> memory try to reuse dentries from that list (or better queue) first. Any other
> access will remove the dentry from the list. 
> 
> Disadvantage would be that the userland would need to be patched,
> but I guess it's better than adding very dubious heuristics to the
> kernel.

Are you going to change every single large memory application in the 
world? As I wrote before, it is *not* about updatedb, but about all 
applications that use a lot of memory, and then terminate.

Frank


^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-26  8:38                 ` Frank Kingswood
@ 2007-07-26  9:20                   ` Ingo Molnar
  2007-07-26  9:34                     ` Andrew Morton
  0 siblings, 1 reply; 484+ messages in thread
From: Ingo Molnar @ 2007-07-26  9:20 UTC (permalink / raw)
  To: Frank Kingswood
  Cc: Andi Kleen, Nick Piggin, Ray Lee, Jesper Juhl, Andrew Morton,
	ck list, Paul Jackson, linux-mm, linux-kernel


* Frank Kingswood <frank@kingswood-consulting.co.uk> wrote:

> > Disadvantage would be that the userland would need to be patched, 
> > but I guess it's better than adding very dubious heuristics to the 
> > kernel.
> 
> Are you going to change every single large memory application in the 
> world? As I wrote before, it is *not* about updatedb, but about all 
> applications that use a lot of memory, and then terminate.

it is about multiple problems, _one_ problem is updatedb. The _second_ 
problem is large memory applications.

note that updatedb is not a "large memory application". It simply scans 
through the filesystem and has pretty minimal memory footprint.

the _kernel_ ends up blowing up the dentry cache to a rather large size 
(because it has no idea that updatedb uses every dentry only once).

Once we give the kernel the knowledge that the dentry won't be used again
by this app, the kernel can make a much more intelligent decision and not
balloon the dentry cache.

( we _do_ want to balloon the dentry cache otherwise - for things like
  "find" - having a fast VFS is important. But known-use-once things 
  like the daily updatedb job can clearly be annotated properly. )
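For file data, a use-once annotation already exists: posix_fadvise(2) with POSIX_FADV_DONTNEED. Note that it covers page cache only; updatedb's problem is the dentry and inode caches, which is why Andi proposes a new flag, and no such flag is assumed here. A sketch of the existing file-data case, with a hypothetical helper name:

```c
#define _XOPEN_SOURCE 700        /* for posix_fadvise() */
#include <fcntl.h>
#include <unistd.h>

/* Read a whole file once, then advise the kernel that its cached pages
 * will not be needed again.  Returns bytes read, or -1 on error. */
long read_once(const char *path)
{
    char buf[65536];
    ssize_t r;
    long total = 0;

    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;
    while ((r = read(fd, buf, sizeof buf)) > 0)
        total += r;
    if (r < 0)
        total = -1;

    /* Advisory only: offset 0 with len 0 means "the whole file".  Clean
     * pages are dropped; failure here is harmless, so it is ignored. */
    posix_fadvise(fd, 0, 0, POSIX_FADV_DONTNEED);
    close(fd);
    return total;
}
```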

the 'large memory apps' are a second category of problems. And those are 
where swap-prefetch could indeed help. (as long as it only 'fills up' 
the free memory that a large-memory-exit left behind it.)

the 'morning after' phenomenon that the majority of testers complained 
about will likely be resolved by the updatedb change. The second 
category is likely an improvement too, for swap-happy desktop (and 
server) workloads.

	Ingo

^ permalink raw reply	[flat|nested] 484+ messages in thread

* Re: -mm merge plans for 2.6.23
  2007-07-26  9:20                   ` Ingo Molnar
@ 2007-07-26  9:34                     ` Andrew Morton
  2007-07-26  9:40                       ` RFT: updatedb "morning after" problem [was: Re: -mm merge plans for 2.6.23] Ingo Molnar
  0 siblings, 1 reply; 4