LKML Archive on lore.kernel.org
* 2.6.0-test9-mm3
@ 2003-11-13  7:30 Andrew Morton
  2003-11-13 20:03 ` [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0 john stultz
                   ` (4 more replies)
  0 siblings, 5 replies; 49+ messages in thread
From: Andrew Morton @ 2003-11-13  7:30 UTC (permalink / raw)
  To: linux-kernel, linux-mm


http://www.zip.com.au/~akpm/linux/patches/2.6.0-test9-mm3.gz

  kernel.org is being slow.  Will appear at:

ftp://ftp.kernel.org/pub/linux/kernel/people/akpm/patches/2.6/2.6.0-test9/2.6.0-test9-mm3/

- Various new fixes; generally uncritical ones.

- Significant changes to the AIO and direct-io code.  This needs beating
  on; hopefully we're now close to a solution to the fairly complex problems
  in there.

- Several ext2 and ext3 allocator fixes.  These need serious testing on big
  SMP.

- Anyone who has patches in here which they think should go into 2.6.0,
  please retest them in -mm3 and let me know, thanks.



 linus.patch

 Latest Linus tree

-as-badness-warning-fix.patch
-3c509-mca-fix.patch
-ext2-allocation-fix.patch
-ohci-locking-fix.patch
-disable-ide-tcq.patch
-via-quirk-fix.patch
-raid1-recovery-fix.patch
-journal_remove_journal_head-assertion-fix.patch
-x86_64-tss-limit-fix.patch
-keyboard-repeat-rate-setting-fix.patch
-aio-refcounting-fix.patch

 Merged

-RD16-rest-B6.patch

 Al said to drop this.

+cramfs-use-pagecache.patch

 cramfs fixes

-ia32-MSI-support-tweaks.patch

 Folded into ia32-MSI-support.patch

+ia32-MSI-support-x86_64-fixes.patch

 x86_64 build fix

-ia32-efi-asm-warning-fix.patch
-ia32-efi-support-mem-equals-fix.patch
-CONFIG_ACPI_EFI-defaults-off.patch
-ia32-efi-support-warning-fixes.patch
-ia32-efi-support-tidy.patch
-ia32-efi-other-arch-fix.patch
-efi-constant-sizing-fix.patch
-ia32-efi-config-option.patch
-ia32-efi-config-option-tweaks.patch
-ia32-efi-config-help-update.patch
-ia64-CONFIG_EFI-update.patch

 Folded into ia32-efi-support.patch

+ia64-ia32-missing-compat-syscalls.patch
+compat-layer-fixes.patch

 32-bit compat layer fixes

+compat-ioctl-for-i2c.patch

 compat layer for i2c (old version)

+loop-bio-handling-fix.patch

 Loop driver fixlet

-gcc-Os-if-embedded-better-help.patch

 Folded into gcc-Os-if-embedded.patch

+as-request-poisoning-fix.patch
+as-fix-all-known-bugs.patch

 Anticipatory scheduler fixes.

+more-than-256-cpus.patch

 cpumask fixes for huge SMP

+acpi-pm-timer.patch
+acpi-pm-timer-fixes.patch

 Yet another timer source for ia32

+ZONE_SHIFT-from-NODES_SHIFT.patch

 Memory zone arith fixup

+ext2_new_inode-fixes.patch
+ext2_new_inode-fixes-tweaks.patch
+remove-ext2_reverve_inode.patch

 ext2 fixes

+memmove-speedup.patch

 Make memmove() faster.

+percpu-counter-linkage-fix.patch

 Fix the build for when ext2 and ext3 are modular

+ide-scsi-warnings.patch

 Print warnings when someone tries to use ide-scsi for a cdrom

+pipe-readv-writev.patch

 pipe readv() and writev() correctness fix and speedup

+ext3_new_inode-scan-fix.patch

 ext3 inode allocator fix

+lockless-semop.patch

 sysv semaphore SMP speedup

+percpu_counter-use-alloc_percpu.patch

 Fix the percpu counters for huge SMP.

+i450nx-scanning-fix.patch

 PCI bridge fix for i450nx chipset machines

+serio-pm-fix.patch

 Fix psmouse PM resume

+find_busiest_queue-commentary.patch

 CPU scheduler comments

+ext2-block-allocator-fixes.patch

 More ext2 allocator fixes.

+SOUND_CMPCI-config-typo-fix.patch

 Sound driver config fix

+atkbd-24-compatibility.patch

 Make AT keyboard userspace interface compatible with 2.4's.

+init_h-needs-compiler_h.patch
+init_h-needs-compiler_h-fix.patch

 Compile fix

+cpu_sibling_map-fix.patch

 cpu_sibling_map is broken on summit.

+tulip-hash-fix.patch

 Fix multicast hash generation for some tulips

+context-switch-accounting-fix.patch

 Fix CPU scheduler beancounting with CONFIG_PREEMPT.

+access-vfs_permission-fix.patch

 Fix access()

+eicon-linkage-fix.patch

 ISDN build fix

+kobject-docco-additions.patch

 Documentation additions.

-O_DIRECT-race-fixes-rework-XFS-fix.patch
-O_DIRECT-race-fixes-rework-XFS-fix-fix.patch

 Folded into O_DIRECT-race-fixes-rollup.patch

+dio-aio-fixes.patch
+dio-aio-fixes-fixes.patch

 AIO/direct-io fixes

+promise-sata-id.patch

 Additional SATA PCI ID.




All 201 patches


linus.patch

mm.patch
  add -mmN to EXTRAVERSION

kgdb-ga.patch
  kgdb stub for ia32 (George Anzinger's one)
  kgdb: warning fix

kgdb-buff-too-big.patch
  kgdb buffer overflow fix

kgdb-warning-fix.patch
  kgdb: warning fix

kgdb-build-fix.patch

kgdb-spinlock-fix.patch

kgdb-fix-debug-info.patch
  kgdb: CONFIG_DEBUG_INFO fix

kgdb-cpumask_t.patch

kgdb-x86_64-fixes.patch
  x86_64 fixes

kgdb-over-ethernet.patch
  kgdb-over-ethernet patch

kgdb-over-ethernet-fixes.patch
  kgdb-over-ethernet fixlets

kgdb-CONFIG_NET_POLL_CONTROLLER.patch
  kgdb: replace CONFIG_KGDB with CONFIG_NET_RX_POLL in net drivers

kgdb-handle-stopped-NICs.patch
  kgdb: handle netif_stopped NICs

eepro100-poll-controller.patch

tlan-poll_controller.patch

tulip-poll_controller.patch

tg3-poll_controller.patch
  kgdb: tg3 poll_controller

8139too-poll_controller.patch
  8139too poll controller

kgdb-eth-smp-fix.patch
  kgdb-over-ethernet: fix SMP

kgdb-eth-reattach.patch

kgdb-skb_reserve-fix.patch
  kgdb-over-ethernet: skb_reserve() fix

must-fix.patch

should-fix.patch

must-fix-update-01.patch
  must fix lists update

RD1-cdrom_ioctl-B6.patch

RD2-ioctl-B6.patch

RD2-ioctl-B6-fix.patch
  RD2-ioctl-B6 fixes

RD3-cdrom_open-B6.patch

RD4-open-B6.patch

RD5-cdrom_release-B6.patch

RD6-release-B6.patch

RD7-presto_journal_close-B6.patch

RD8-f_mapping-B6.patch

RD9-f_mapping2-B6.patch

RD10-i_sem-B6.patch

RD11-f_mapping3-B6.patch

RD12-generic_osync_inode-B6.patch

RD13-bd_acquire-B6.patch

RD14-generic_write_checks-B6.patch

RD15-I_BDEV-B6.patch

cramfs-use-pagecache.patch
  cramfs: use pagecache better

invalidate_inodes-speedup.patch
  invalidate_inodes speedup

invalidate_inodes-speedup-fixes-2.patch
  more invalidate_inodes speedup fixes

serio-01-renaming.patch
  serio: rename serio_[un]register_slave_port to __serio_[un]register_port

serio-02-race-fix.patch
  serio: possible race between port removal and kseriod

serio-03-blacklist.patch
  Add black list to handler<->device matching

serio-04-synaptics-cleanup.patch
  Synaptics: code cleanup

serio-05-reconnect-facility.patch
  serio: reconnect facility

serio-06-synaptics-use-reconnect.patch
  Synaptics: use serio_reconnect

acpi_off-fix.patch
  fix acpi=off

cfq-4.patch
  CFQ io scheduler
  CFQ fixes

config_spinline.patch
  uninline spinlocks for profiling accuracy.

ppc64-bar-0-fix.patch
  Allow PCI BARs that start at 0

ppc64-reloc_hide.patch

sym-do-160.patch
  make the SYM driver do 160 MB/sec

input-use-after-free-checks.patch
  input layer debug checks

aic7xxx-parallel-build-fix.patch
  fix parallel builds for aic7xxx

ramdisk-cleanup.patch

intel8x0-cleanup.patch
  intel8x0 cleanups

pdflush-diag.patch

kobject-oops-fixes.patch
  fix oopses if kobject parent is removed before child

futex-uninlinings.patch
  futex uninlining

zap_page_range-debug.patch
  zap_page_range() debug

call_usermodehelper-retval-fix-3.patch
  Make call_usermodehelper report exit status

asus-L5-fix.patch
  Asus L5 framebuffer fix

jffs-use-daemonize.patch

tulip-NAPI-support.patch
  tulip NAPI support

tulip-napi-disable.patch
  tulip NAPI: disable poll in close

get_user_pages-handle-VM_IO.patch

ia32-MSI-support.patch
  Updated ia32 MSI Patches

ia32-MSI-support-x86_64-fixes.patch

ia32-efi-support.patch
  EFI support for ia32
  efi warning fix
  fix EFI for ppc64, ia64
  efi: warning fixes
  ia32 EFI: Add CONFIG_EFI
  efi: Update Kconfig help
  efi update patch (ia64)

support-zillions-of-scsi-disks.patch
  support many SCSI disks

SGI-IOC4-IDE-chipset-support.patch
  Add support for SGI's IOC4 chipset

sparc32-sched_clock.patch

pcibios_test_irq-fix.patch
  Fix pcibios test IRQ handler return

fixmap-in-proc-pid-maps.patch
  report user-readable fixmap area in /proc/PID/maps

i82365-sysfs-ordering-fix.patch
  Fix init_i82365 sysfs ordering oops

pci_set_power_state-might-sleep.patch

ia64-ia32-missing-compat-syscalls.patch
  From: Arun Sharma <arun.sharma@intel.com>
  Subject: Missing compat syscalls in ia64

compat-layer-fixes.patch
  Minor bug fixes to the compat layer

compat-ioctl-for-i2c.patch
  compat_ioctl for i2c

compat_ioctl-cleanup.patch
  cleanup of compat_ioctl functions

fix-sqrt.patch
  sqrt() fixes

scale-min_free_kbytes.patch
  scale the initial value of min_free_kbytes

cdrom-allocation-try-harder.patch
  Use __GFP_REPEAT for cdrom buffer

sym-2.1.18f.patch

CONFIG_STANDALONE-default-to-n.patch
  Make CONFIG_STANDALONE default to N

extra-buffer-diags.patch

nosysfs.patch

constant_test_bit-doesnt-like-zwanes-gcc.patch
  gcc bug workaround for constant_test_bit()

slab-leak-detector.patch
  slab leak detector

early-serial-registration-fix.patch
  serial console registration bugfix

3c527-smp-update.patch
  SMP support on 3c527 net driver

3c527-race-fix.patch

ext3-latency-fix.patch
  ext3 scheduling latency fix

videobuf_waiton-race-fix.patch

firmware-kernel_thread-on-demand.patch
  Remove workqueue usage from request_firmware_async()

loop-autoloading-fix.patch
  Fix loop module auto loading

loop-module-alias.patch
  loop needs MODULE_ALIAS_BLOCK

loop-remove-blkdev-special-case.patch

loop-highmem.patch
  remove useless highmem bounce from loop/cryptoloop

loop-highmem-fixes.patch

loop-bio-handling-fix.patch
  loop: BIO handling fix

cmpci-set_fs-fix.patch
  cmpci.c: remove pointless set_fs()

dentry-bloat-fix-2.patch
  Fix dcache and icache bloat with deep directories

nls-config-fixes.patch
  NLS config fixes

proc_pid_lookup-vs-exit-race-fix.patch
  Fix proc_pid_lookup vs exit race

gcc-Os-if-embedded.patch
  Add `gcc -Os' config option

aic7xxx-sleep-in-spinlock-fix.patch

vm86-sysenter-fix.patch
  Fix sysenter disabling in vm86 mode

gettimeofday-resolution-fix.patch
  gettimeofday resolution fix

refill_counter-overflow-fix.patch
  vmscan: reset refill_counter after refilling the inactive list

verbose-timesource.patch
  be verbose about the time source

as-regression-fix.patch
  Fix IO scheduler regression

as-request-poisoning.patch
  AS: request poisoning

as-request-poisoning-fix.patch
  AS: request poisoning fix

as-fix-all-known-bugs.patch
  AS fixes

as-new-process-estimation.patch
  AS: new process estimation

as-cooperative-thinktime.patch
  AS: thinktime improvement

scale-nr_requests.patch
  scale nr_requests with TCQ depth

truncate_inode_pages-check.patch

local_bh_enable-warning-fix.patch

cdc-acm-softirq-rx.patch
  cdc-acm: move rx processing to softirq

forcedeth.patch
  forcedeth: nForce ethernet driver

reiserfs-pinned-buffer-fix.patch
  reiserfs pinned buffer fix

proc-pid-maps-output-fix.patch
  Restore /proc/pid/maps formatting

atomic_dec-debug.patch
  atomic_dec debug

sis900-pm-support.patch
  Add PM support to sis900 network driver

8139too-locking-fix.patch
  8139too locking fix

ia32-wp-test-cleanup.patch
  ia32 WP test cleanup

hugetlb-needs-pse.patch
  ia32: hugetlb needs pse

powermate-payload-size-fix.patch
  Griffin Powermate fix

more-than-256-cpus.patch
  Fix for more than 256 CPUs

acpi-pm-timer.patch
  ACPI PM Timer

acpi-pm-timer-fixes.patch
  ACPI PM-Timer fixes

ZONE_SHIFT-from-NODES_SHIFT.patch
  Use NODES_SHIFT to calculate ZONE_SHIFT

ext2_new_inode-fixes.patch
  Fix bugs in ext2_new_inode()

ext2_new_inode-fixes-tweaks.patch
  ext2_new_inode: more tweaking

remove-ext2_reverve_inode.patch

memmove-speedup.patch
  optimize ia32 memmove

percpu-counter-linkage-fix.patch
  fix percpu_counter_mod linkage problem

ide-scsi-warnings.patch
  ide-scsi: warn when used for cdroms

pipe-readv-writev.patch
  Fix writev atomicity on pipe/fifo

ext3_new_inode-scan-fix.patch
  ext3_new_inode fixlet

lockless-semop.patch
  lockless semop

percpu_counter-use-alloc_percpu.patch
  use alloc_percpu in percpu_counters

i450nx-scanning-fix.patch
  i450nx PCI scanning fix

serio-pm-fix.patch
  psmouse pm resume fix

find_busiest_queue-commentary.patch
  find_busiest_queue() commentary fix

ext2-block-allocator-fixes.patch
  ext2 block allocator fixes

SOUND_CMPCI-config-typo-fix.patch
  fix SOUND_CMPCI Configure help entry

atkbd-24-compatibility.patch
  Fixes for keyboard 2.4 compatibility

init_h-needs-compiler_h.patch
  init.h needs to include compiler.h

init_h-needs-compiler_h-fix.patch
  compile fix for older gcc's

cpu_sibling_map-fix.patch
  cpu_sibling_map fix

tulip-hash-fix.patch
  tulip filter hash fix

context-switch-accounting-fix.patch
  Fix context switch accounting

access-vfs_permission-fix.patch
  Subject: Re: [PATCH] fix access() / vfs_permission() bug

eicon-linkage-fix.patch
  eicon/ and hardware/eicon/ drivers using the same symbols

kobject-docco-additions.patch
  Improve documentation for kobjects

list_del-debug.patch
  list_del debug check

print-build-options-on-oops.patch

show_task-free-stack-fix.patch
  show_task() fix and cleanup

oops-dump-preceding-code.patch
  i386 oops output: dump preceding code

lockmeter.patch

printk-oops-mangle-fix.patch
  disentangle printk's whilst oopsing on SMP

4g-2.6.0-test2-mm2-A5.patch
  4G/4G split patch
  4G/4G: remove debug code
  4g4g: pmd fix
  4g/4g: fixes from Bill
  4g4g: fpu emulation fix
  4g/4g usercopy atomicity fix
  4G/4G: remove debug code
  4g4g: pmd fix
  4g/4g: fixes from Bill
  4g4g: fpu emulation fix
  4g/4g usercopy atomicity fix
  4G/4G preempt on vstack
  4G/4G: even number of kmap types
  4g4g: fix __get_user in slab
  4g4g: Remove extra .data.idt section definition
  4g/4g linker error (overlapping sections)
  4G/4G: remove debug code
  4g4g: pmd fix
  4g/4g: fixes from Bill
  4g4g: fpu emulation fix
  4g4g: show_registers() fix
  4g/4g usercopy atomicity fix
  4g4g: debug flags fix
  4g4g: Fix wrong asm-offsets entry
  cyclone time fixmap fix
  4G/4G preempt on vstack
  4G/4G: even number of kmap types
  4g4g: fix __get_user in slab
  4g4g: Remove extra .data.idt section definition
  4g/4g linker error (overlapping sections)
  4G/4G: remove debug code
  4g4g: pmd fix
  4g/4g: fixes from Bill
  4g4g: fpu emulation fix
  4g4g: show_registers() fix
  4g/4g usercopy atomicity fix
  4g4g: debug flags fix
  4g4g: Fix wrong asm-offsets entry
  cyclone time fixmap fix
  use direct_copy_{to,from}_user for kernel access in mm/usercopy.c
  4G/4G might_sleep warning fix
  4g/4g pagetable accounting fix

4g4g-athlon-prefetch-handling-fix.patch

4g4g-wp-test-fix.patch
  Fix 4G/4G and WP test lockup

4g4g-KERNEL_DS-usercopy-fix.patch
  4G/4G KERNEL_DS usercopy again

ppc-fixes.patch
  make mm4 compile on ppc

aic7xxx_old-oops-fix.patch

O_DIRECT-race-fixes-rollup.patch
  DIO fixes forward port and AIO-DIO fix
  O_DIRECT race fixes comments
  O_DIRECT race fixes fix fix fix
  DIO locking rework
  O_DIRECT XFS fix

dio-aio-fixes.patch
  direct-io AIO fixes

dio-aio-fixes-fixes.patch
  dio-aio fix fix

readahead-multiple-fixes.patch
  readahead: multiple performance fixes

readahead-simplification.patch
  readahead simplification

aio-sysctl-parms.patch
  aio sysctl parms

aio-01-retry.patch
  AIO: Core retry infrastructure
  Fix aio process hang on EINVAL
  AIO: flush workqueues before destroying ioctx'es
  AIO: hold the context lock across unuse_mm
  take task_lock in use_mm()

4g4g-aio-hang-fix.patch
  Fix AIO and 4G-4G hang

aio-retry-elevated-refcount.patch
  aio: extra ref count during retry

aio-splice-runlist.patch
  Splice AIO runlist for fairer handling of multiple io contexts

aio-02-lockpage_wq.patch
  AIO: Async page wait

aio-03-fs_read.patch
  AIO: Filesystem aio read

aio-04-buffer_wq.patch
  AIO: Async buffer wait
  lock_buffer_wq fix

aio-05-fs_write.patch
  AIO: Filesystem aio write

aio-06-bread_wq.patch
  AIO: Async block read

aio-07-ext2getblk_wq.patch
  AIO: Async get block for ext2

O_SYNC-speedup-2.patch
  speed up O_SYNC writes

O_SYNC-speedup-2-f_mapping-fixes.patch

aio-09-o_sync.patch
  aio O_SYNC
  AIO: fix a BUG
  Unify o_sync changes for aio and regular writes
  aio-O_SYNC-fix bits got lost
  aio: writev nr_segs fix
  More AIO O_SYNC related fixes

aio-09-o_sync-f_mapping-fixes.patch

gang_lookup_next.patch
  Change the page gang lookup API

aio-gang_lookup-fix.patch
  AIO gang lookup fixes

aio-O_SYNC-short-write-fix.patch
  Fix for O_SYNC short writes

aio-12-readahead.patch
  AIO: readahead fixes
  aio O_DIRECT no readahead
  Unified page range readahead for aio and regular reads

aio-12-readahead-f_mapping-fix.patch

aio-readahead-speedup.patch
  Readahead issues and AIO read speedup

promise-sata-id.patch
  add Promise 20376 PCI ID





* [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0
  2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
@ 2003-11-13 20:03 ` john stultz
  2003-11-13 22:03 ` 2.6.0-test9-mm3 - AIO test results Daniel McNeil
                   ` (3 subsequent siblings)
  4 siblings, 0 replies; 49+ messages in thread
From: john stultz @ 2003-11-13 20:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: lkml, linux-mm

On Wed, 2003-11-12 at 23:30, Andrew Morton wrote:
> +acpi-pm-timer.patch
> +acpi-pm-timer-fixes.patch
> 
>  Yet another timer source for ia32
> 
[snip]
> verbose-timesource.patch
>   be verbose about the time source

Andrew, 
	I forgot that I sent you the verbose-timesource patch. The ACPI PM time
source will need this simple fix to work alongside that patch.

thanks
-john

===== arch/i386/kernel/timers/timer_pm.c 1.6 vs edited =====
--- 1.6/arch/i386/kernel/timers/timer_pm.c	Tue Nov  4 11:39:50 2003
+++ edited/arch/i386/kernel/timers/timer_pm.c	Thu Nov 13 11:12:23 2003
@@ -185,6 +185,7 @@
 
 /* acpi timer_opts struct */
 struct timer_opts timer_pmtmr = {
+	.name			= "pmtmr",
 	.init 			= init_pmtmr,
 	.mark_offset		= mark_offset_pmtmr, 
 	.get_offset		= get_offset_pmtmr,
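
A minimal sketch of why the missing field matters, assuming
verbose-timesource.patch reports the selected source through the .name
member: a designated initializer that omits .name leaves that pointer NULL,
so there is nothing useful to print. The struct below is a simplified,
hypothetical stand-in for the kernel's struct timer_opts, and
timesource_label() is an illustrative helper, not a kernel function.

```c
#include <assert.h>
#include <stddef.h>

/* Simplified, hypothetical stand-in for the kernel's struct timer_opts. */
struct timer_opts {
	const char *name;
	int (*init)(void);
};

/* Stub standing in for the real init_pmtmr(); always reports success. */
static int init_pmtmr(void)
{
	return 0;
}

/* Before the fix: .name is not listed, so C initializes it to NULL. */
static struct timer_opts timer_pmtmr_before = {
	.init = init_pmtmr,
};

/* After the fix: .name is set, as in the one-line patch above. */
static struct timer_opts timer_pmtmr_after = {
	.name = "pmtmr",
	.init = init_pmtmr,
};

/* Illustrative helper: the label a verbose boot message could print. */
static const char *timesource_label(const struct timer_opts *t)
{
	assert(t != NULL);
	return t->name ? t->name : "(unnamed)";
}
```

With these definitions, timesource_label(&timer_pmtmr_before) yields
"(unnamed)" and timesource_label(&timer_pmtmr_after) yields "pmtmr".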





* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
  2003-11-13 20:03 ` [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0 john stultz
@ 2003-11-13 22:03 ` Daniel McNeil
  2003-11-17  5:25   ` Suparna Bhattacharya
  2003-11-13 22:04 ` 2.6.0-test9-mm3 (compile stats) John Cherry
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-13 22:03 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Linux Kernel Mailing List, linux-mm, linux-aio

Andrew,

I'm testing test9-mm3 on a 2-proc Xeon with an ext3 file system.
I tested using the test programs aiocp and aiodio_sparse.
(see http://developer.osdl.org/daniel/AIO/)

Using aiocp with i/o sizes from 1k to 512k to copy files worked
without any errors or kernel debug messages.

With 64k i/o, the aiodio_sparse program completes without any errors.
There are no kernel error messages, so that is good.

There are still problems with non power of 2 i/o sizes using AIO and
O_DIRECT.  It hangs with aio's that do not seem to complete.  The test
does exit when hitting ^c and there are no kernel messages.  Test output
below:

$ ./aiodio_sparse

$ ./aiodio_sparse -dd -s 1751k -r 18k -w 11k
child 1843, read loop count 0
io_submit() return 16
aiodio_sparse: 16 i/o in flight
aiodio_sparse: offset 180224 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 191488 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 202752 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 214016 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 225280 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 236544 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 247808 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 259072 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 270336 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 281600 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 292864 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 304128 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
child 1843, read loop count 10
io_submit() return 1
aiodio_sparse: offset 315392 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 326656 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 337920 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 349184 filesize 1793024 inflight 16
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 11264 res2 0
io_submit() return 1
aiodio_sparse: offset 360448 filesize 1793024 inflight 16
child 1843, read loop count 20
child 1843, read loop count 30
child 1843, read loop count 40
child 1843, read loop count 50
child 1843, read loop count 60
child 1843, read loop count 70

$ ./aiodio_sparse -i 9 -d -s 180k -r 18k -w 18k  
io_submit() return 9
aiodio_sparse: 9 i/o in flight
aiodio_sparse: offset 165888 filesize 184320 inflight 9
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 18432 res2 0
io_submit() return 1
child 2060, read loop count 0
child 2060, read loop count 10
child 2060, read loop count 20
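
The byte counts in the two runs above can be cross-checked with a little
arithmetic. The sketch below assumes the "k" suffix means 1024 bytes and
that the writes are queued back to back from offset 0; the helper names are
illustrative and not taken from aiodio_sparse itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* "k" suffix as on the command lines above, assumed to mean 1024 bytes. */
static size_t kib(size_t n)
{
	return n * 1024;
}

/* True for 1, 2, 4, 8, ...: exactly one bit set. */
static bool is_power_of_two(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* True if n is a whole multiple of align (align must be non-zero). */
static bool is_multiple_of(size_t n, size_t align)
{
	assert(align != 0);
	return n % align == 0;
}

/*
 * Offset of the next write once `queued` writes of `io_size` bytes have
 * been submitted back to back starting at offset 0 (an assumption).
 */
static size_t next_offset(size_t queued, size_t io_size)
{
	return queued * io_size;
}
```

Under those assumptions kib(11) is 11264 and kib(18) is 18432, matching the
"res" values in the logs; kib(1751) is 1793024 and kib(180) is 184320,
matching the reported file sizes; next_offset(16, kib(11)) is 180224 and
next_offset(9, kib(18)) is 165888, matching the first offsets printed.
Both write sizes are multiples of 512 but not of 4096, and neither is a
power of two, whereas the 64k case that passes is. Whether that alignment
difference is what triggers the hang is not established by these logs.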

Daniel

On Wed, 2003-11-12 at 23:30, Andrew Morton wrote:

> - Significant changes to the AIO and direct-io code.  This needs beating
>   on; hopefully we're now close to a solution to the fairly complex problems
>   in there.
> 




* Re: 2.6.0-test9-mm3 (compile stats)
  2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
  2003-11-13 20:03 ` [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0 john stultz
  2003-11-13 22:03 ` 2.6.0-test9-mm3 - AIO test results Daniel McNeil
@ 2003-11-13 22:04 ` John Cherry
  2003-11-14  5:07 ` 2.6.0-test9-mm3 Martin J. Bligh
  2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
  4 siblings, 0 replies; 49+ messages in thread
From: John Cherry @ 2003-11-13 22:04 UTC (permalink / raw)
  To: Andrew Morton; +Cc: linux-kernel

Linux 2.6 (mm tree) Compile Statistics (gcc 3.2.2)
Warnings/Errors Summary

Kernel             bzImage   bzImage  bzImage  modules  bzImage  modules
                 (defconfig) (allno)  (allyes) (allyes) (allmod) (allmod)
---------------  ---------- -------- -------- -------- -------- ---------
2.6.0-test9-mm3    0w/0e     0w/0e   172w/ 0e  12w/0e   3w/0e    211w/0e
2.6.0-test9-mm2    0w/0e     0w/0e   172w/ 0e  12w/0e   3w/0e    211w/1e
2.6.0-test9-mm1    0w/0e     0w/0e   179w/ 1e  12w/0e   3w/0e    213w/1e
2.6.0-test8-mm1    0w/0e     0w/0e   183w/ 1e  13w/0e   3w/0e    223w/1e
2.6.0-test7-mm1    0w/0e     1w/0e   176w/ 1e   9w/0e   3w/0e    231w/1e
2.6.0-test6-mm4    0w/0e     1w/0e   179w/ 1e   9w/0e   3w/0e    234w/1e
2.6.0-test6-mm3    0w/0e     1w/0e   178w/ 1e   9w/0e   3w/0e    252w/2e
2.6.0-test6-mm2    0w/0e     1w/0e   179w/ 1e   9w/0e   3w/0e    252w/2e
2.6.0-test6-mm1    0w/0e     1w/0e   179w/ 1e   9w/0e   3w/0e    252w/2e

Web page with links to complete details:
   http://developer.osdl.org/cherry/compile/

Version information for host [ cherrypit.pdx.osdl.net ]
 gcc:    3.2.2
 patch:  2.5.4

Kernel version: 2.6.0-test9-mm3
Kernel build: 
   Making bzImage (defconfig): 0 warnings, 0 errors
   Making modules (defconfig): 0 warnings, 0 errors
   Making bzImage (allnoconfig): 0 warnings, 0 errors
   Making bzImage (allyesconfig): 172 warnings, 0 errors
   Making modules (allyesconfig): 12 warnings, 0 errors
   Making bzImage (allmodconfig): 3 warnings, 0 errors
   Making modules (allmodconfig): 211 warnings, 0 errors

Building directories:
   Building fs/adfs: clean
   Building fs/affs: clean
   Building fs/afs: clean
   Building fs/autofs: clean
   Building fs/autofs4: clean
   Building fs/befs: clean
   Building fs/bfs: clean
   Building fs/cifs: clean
   Building fs/coda: clean
   Building fs/cramfs: clean
   Building fs/devfs: clean
   Building fs/devpts: clean
   Building fs/efs: clean
   Building fs/exportfs: clean
   Building fs/ext2: clean
   Building fs/ext3: clean
   Building fs/fat: clean
   Building fs/freevxfs: clean
   Building fs/hfs: clean
   Building fs/hpfs: clean
   Building fs/hugetlbfs: clean
   Building fs/intermezzo: clean
   Building fs/isofs: clean
   Building fs/jbd: clean
   Building fs/jffs: clean
   Building fs/jffs2: clean
   Building fs/jfs: clean
   Building fs/lockd: clean
   Building fs/minix: clean
   Building fs/msdos: clean
   Building fs/ncpfs: clean
   Building fs/nfs: clean
   Building fs/nfsd: clean
   Building fs/nls: clean
   Building fs/ntfs: clean
   Building fs/partitions: clean
   Building fs/proc: clean
   Building fs/qnx4: clean
   Building fs/ramfs: clean
   Building fs/reiserfs: clean
   Building fs/romfs: clean
   Building fs/smbfs: clean
   Building fs/sysfs: clean
   Building fs/sysv: clean
   Building fs/udf: clean
   Building fs/ufs: clean
   Building fs/vfat: clean
   Building fs/xfs: clean
   Building drivers/i2c: clean
   Building drivers/net: 31 warnings, 0 errors
   Building drivers/media: 1 warnings, 0 errors
   Building drivers/base: clean
   Building drivers/pci: clean
   Building drivers/eisa: clean
   Building drivers/isdn: clean
   Building drivers/char: 1 warnings, 0 errors
   Building drivers/acpi: clean
   Building drivers/serial: 1 warnings, 0 errors
   Building drivers/fc4: clean
   Building drivers/parport: clean
   Building drivers/mtd: 23 warnings, 0 errors
   Building drivers/usb: clean
   Building drivers/block: 1 warnings, 0 errors
   Building drivers/pcmcia: 3 warnings, 0 errors
   Building drivers/input: clean
   Building drivers/atm: clean
   Building drivers/ide: 30 warnings, 0 errors
   Building drivers/pnp: clean
   Building drivers/oprofile: clean
   Building drivers/ieee1394: clean
   Building drivers/cdrom: 3 warnings, 0 errors
   Building drivers/md: clean
   Building drivers/message: 1 warnings, 0 errors
   Building drivers/cpufreq: clean
   Building drivers/sbus: clean
   Building drivers/bluetooth: clean
   Building drivers/telephony: 5 warnings, 0 errors
   Building drivers/zorro: clean
   Building drivers/acorn: clean
   Building drivers/tc: clean
   Building drivers/mca: clean
   Building drivers/nubus: clean
   Building drivers/misc: clean
   Building drivers/dio: clean
   Building drivers/scsi/aacraid: clean
   Building drivers/scsi/aic7xxx: clean
   Building drivers/scsi/pcmcia: 4 warnings, 0 errors
   Building drivers/scsi/sym53c8xx_2: clean
   Building drivers/video/aty: 3 warnings, 0 errors
   Building drivers/video/console: 2 warnings, 0 errors
   Building drivers/video/i810: clean
   Building drivers/video/logo: clean
   Building drivers/video/matrox: 5 warnings, 0 errors
   Building drivers/video/riva: clean
   Building drivers/video/sis: 1 warnings, 0 errors
   Building sound/core: clean
   Building sound/drivers: clean
   Building sound/i2c: clean
   Building sound/isa: 3 warnings, 0 errors
   Building sound/oss: 33 warnings, 0 errors
   Building sound/pci: clean
   Building sound/pcmcia: clean
   Building sound/synth: clean
   Building sound/usb: clean
   Building arch/i386: clean
   Building crypto: clean
   Building lib: clean
   Building net: 9 warnings, 0 errors
   Building security: clean
   Building sound: clean
   Building usr: clean
   Building fs: clean
   Building drivers/video: 8 warnings, 0 errors
   Building drivers/scsi: 44 warnings, 0 errors
   Building drivers/net: 0 warnings, 1 errors


Error Summary (individual module builds):

   drivers/net: 0 warnings, 1 errors


Warning Summary (individual module builds):

   drivers/block: 1 warnings, 0 errors
   drivers/cdrom: 3 warnings, 0 errors
   drivers/char: 1 warnings, 0 errors
   drivers/ide: 30 warnings, 0 errors
   drivers/media: 1 warnings, 0 errors
   drivers/message: 1 warnings, 0 errors
   drivers/mtd: 23 warnings, 0 errors
   drivers/net: 31 warnings, 0 errors
   drivers/pcmcia: 3 warnings, 0 errors
   drivers/scsi/pcmcia: 4 warnings, 0 errors
   drivers/scsi: 44 warnings, 0 errors
   drivers/serial: 1 warnings, 0 errors
   drivers/telephony: 5 warnings, 0 errors
   drivers/video/aty: 3 warnings, 0 errors
   drivers/video/console: 2 warnings, 0 errors
   drivers/video/matrox: 5 warnings, 0 errors
   drivers/video/sis: 1 warnings, 0 errors
   drivers/video: 8 warnings, 0 errors
   net: 9 warnings, 0 errors
   sound/isa: 3 warnings, 0 errors
   sound/oss: 33 warnings, 0 errors


Error List:

make[1]: [arch/i386/boot/bzImage] Error 1 (ignored)
make[2]: [drivers/net/wan/wanxlfw.inc] Error 127 (ignored)


Warning List:

arch/i386/kernel/cpu/cpufreq/powernow-k8.c:38:2: warning: #warning this driver has not been tested on a preempt system
arch/i386/kernel/cpu/cpufreq/powernow-k8.c:938:2: warning: #warning pol->policy is in undefined state here
drivers/cdrom/aztcd.c:379: warning: `pa_ok' defined but not used
drivers/cdrom/isp16.c:124: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/cdrom/mcdx.h:180:2: warning: #warning You have not edited mcdx.h
drivers/cdrom/mcdx.h:181:2: warning: #warning Perhaps irq and i/o settings are wrong.
drivers/cdrom/sjcd.c:1700: warning: `check_region' is deprecated (declared at include/linux/ioport.h:119)
drivers/char/applicom.c:522:2: warning: #warning "Je suis stupide. DW. - copy*user in cli"
drivers/char/applicom.c:67: warning: `applicom_pci_tbl' defined but not used
drivers/char/watchdog/alim1535_wdt.c:320: warning: `ali_pci_tbl' defined but not used
drivers/ide/ide-probe.c:1326: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/ide-probe.c:1353: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494)
drivers/ide/ide-tape.c:6213: warning: duplicate `const'
drivers/ide/ide.c:2470: warning: implicit declaration of function `pnpide_init'
drivers/ide/legacy/ide-cs.c:365: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/legacy/ide-cs.c:411: warning: `MOD_DEC_USE_COUNT' is deprecated (declared at include/linux/module.h:494)
drivers/ide/pci/aec62xx.c:533: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/alim15x3.c:871: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/amd74xx.c:451: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/cmd64x.c:755: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/cs5520.c:294: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/cs5530.c:416: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/cy82c693.c:437: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/hpt34x.c:334: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/hpt366.c:1223: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/ns87415.c:228: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/opti621.c:364: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/pdc202xx_new.c:631: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/pdc202xx_old.c:925: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/piix.c:746: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/rz1000.c:65: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/sc1200.c:557: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/serverworks.c:804: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/siimage.c:1174: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/sis5513.c:956: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/slc90e66.c:376: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/triflex.c:227: warning: `MOD_INC_USE_COUNT' is deprecated (declared at include/linux/module.h:482)
drivers/ide/pci/trm290.c:378: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/ide/pci/trm290.c:406: warning: `MOD_INC_USE_COUNT' is deprecated
(declared at include/linux/module.h:482)
drivers/ide/pci/via82cxxx.c:618: warning: `MOD_INC_USE_COUNT' is
deprecated (declared at include/linux/module.h:482)
drivers/input/gameport/ns558.c:121: warning: `check_region' is
deprecated (declared at include/linux/ioport.h:119)
drivers/input/gameport/ns558.c:80: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/media/common/saa7146_vbi.c:6: warning: `vbi_workaround' defined
but not used
drivers/media/video/zoran_card.c:149: warning: `zr36067_pci_tbl' defined
but not used
drivers/message/fusion/mptscsih.c:6922: warning: `mptscsih_setup'
defined but not used
drivers/message/i2o/i2o_block.c:1506: warning: `MOD_DEC_USE_COUNT' is
deprecated (declared at include/linux/module.h:494)
drivers/mtd/chips/amd_flash.c:783: warning: `MOD_INC_USE_COUNT' is
deprecated (declared at include/linux/module.h:482)
drivers/mtd/chips/cfi_cmdset_0001.c:381: warning: unsigned int format,
different type arg (arg 2)
drivers/mtd/chips/cfi_cmdset_0001.c:965: warning: unsigned int format,
different type arg (arg 2)
drivers/mtd/chips/cfi_cmdset_0002.c:1157: warning: unsigned int format,
different type arg (arg 4)
drivers/mtd/chips/cfi_cmdset_0002.c:513: warning: unsigned int format,
different type arg (arg 4)
drivers/mtd/chips/cfi_cmdset_0002.c:651: warning: unsigned int format,
different type arg (arg 4)
drivers/mtd/chips/cfi_cmdset_0002.c:977: warning: unsigned int format,
different type arg (arg 4)
drivers/mtd/chips/cfi_cmdset_0020.c:1139: warning: unsigned int format,
different type arg (arg 3)
drivers/mtd/chips/cfi_cmdset_0020.c:1288: warning: unsigned int format,
different type arg (arg 3)
drivers/mtd/chips/cfi_cmdset_0020.c:493: warning: unsigned int format,
different type arg (arg 3)
drivers/mtd/chips/cfi_cmdset_0020.c:853: warning: unsigned int format,
different type arg (arg 3)
drivers/mtd/chips/sharp.c:157: warning: `MOD_INC_USE_COUNT' is
deprecated (declared at include/linux/module.h:482)
drivers/mtd/cmdlinepart.c:344: warning: `mtdpart_setup' defined but not
used
drivers/mtd/devices/doc2000.c:567: warning: assignment from incompatible
pointer type
drivers/mtd/devices/doc2000.c:568: warning: assignment from incompatible
pointer type
drivers/mtd/devices/doc2001.c:376: warning: assignment from incompatible
pointer type
drivers/mtd/devices/doc2001.c:377: warning: assignment from incompatible
pointer type
drivers/mtd/nftlcore.c:354: warning: passing arg 7 of pointer to
function makes pointer from integer without a cast
drivers/mtd/nftlcore.c:358: warning: passing arg 7 of pointer to
function makes pointer from integer without a cast
drivers/mtd/nftlcore.c:363: warning: passing arg 7 of pointer to
function makes pointer from integer without a cast
drivers/mtd/nftlcore.c:632: warning: passing arg 7 of pointer to
function makes pointer from integer without a cast
drivers/mtd/nftlcore.c:696: warning: passing arg 7 of pointer to
function makes pointer from integer without a cast
drivers/mtd/nftlmount.c:220: warning: passing arg 7 of pointer to
function makes pointer from integer without a cast
drivers/net/3c515.c:529: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
drivers/net/acenic.c:135: warning: `acenic_pci_tbl' defined but not used
drivers/net/arcnet/arc-rimi.c:319: warning: `dev_alloc' is deprecated
(declared at include/linux/netdevice.h:525)
drivers/net/arcnet/com20020-isa.c:152: warning: `dev_alloc' is
deprecated (declared at include/linux/netdevice.h:525)
drivers/net/arcnet/com20020-pci.c:71: warning: `dev_alloc' is deprecated
(declared at include/linux/netdevice.h:525)
drivers/net/arcnet/com90io.c:385: warning: `dev_alloc' is deprecated
(declared at include/linux/netdevice.h:525)
drivers/net/arcnet/com90xx.c:146: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/net/arcnet/com90xx.c:412: warning: `dev_alloc' is deprecated
(declared at include/linux/netdevice.h:525)
drivers/net/arcnet/com90xx.c:609: warning: `dev_alloc' is deprecated
(declared at include/linux/netdevice.h:525)
drivers/net/dgrs.c:124: warning: `dgrs_pci_tbl' defined but not used
drivers/net/eepro.c:575: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
drivers/net/ewrk3.c:1291: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/net/ewrk3.c:1335: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/net/hp100.c:288: warning: `hp100_pci_tbl' defined but not used
drivers/net/hp100.c:385: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
drivers/net/hp100.c:432: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
drivers/net/hp100.c:463: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
drivers/net/hp100.c:471: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
drivers/net/sk98lin/skaddr.c:1092: warning: `ReturnCode' might be used
uninitialized in this function
drivers/net/sk98lin/skaddr.c:1624: warning: `ReturnCode' might be used
uninitialized in this function
drivers/net/skfp/skfddi.c:185: warning: `skfddi_pci_tbl' defined but not
used
drivers/net/tokenring/smctr.c:3494: warning: `MOD_INC_USE_COUNT' is
deprecated (declared at include/linux/module.h:482)
drivers/net/tokenring/smctr.c:733: warning: `MOD_DEC_USE_COUNT' is
deprecated (declared at include/linux/module.h:494)
drivers/net/tulip/winbond-840.c:149: warning: `version' defined but not
used
drivers/net/wan/cycx_drv.c:430: warning: long unsigned int format, u32
arg (arg 2)
drivers/net/wan/farsync.c:1316: warning: `MOD_INC_USE_COUNT' is
deprecated (declared at include/linux/module.h:482)
drivers/net/wan/farsync.c:1329: warning: `MOD_DEC_USE_COUNT' is
deprecated (declared at include/linux/module.h:494)
drivers/net/wan/hostess_sv11.c:125: warning: `MOD_INC_USE_COUNT' is
deprecated (declared at include/linux/module.h:482)
drivers/net/wan/hostess_sv11.c:157: warning: `MOD_DEC_USE_COUNT' is
deprecated (declared at include/linux/module.h:494)
drivers/net/wan/lmc/lmc_main.c:1063: warning: `check_region' is
deprecated (declared at include/linux/ioport.h:119)
drivers/net/wan/lmc/lmc_main.c:1184: warning: `MOD_INC_USE_COUNT' is
deprecated (declared at include/linux/module.h:482)
drivers/net/wan/lmc/lmc_main.c:1355: warning: `MOD_DEC_USE_COUNT' is
deprecated (declared at include/linux/module.h:494)
drivers/net/wan/pc300_drv.c:3168: warning: `MOD_INC_USE_COUNT' is
deprecated (declared at include/linux/module.h:482)
drivers/net/wan/pc300_drv.c:3204: warning: `MOD_DEC_USE_COUNT' is
deprecated (declared at include/linux/module.h:494)
drivers/net/wan/sbni.c:308: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/pcmcia/i82365.c:680: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/pcmcia/i82365.c:817: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/pcmcia/tcic.c:340: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:1003: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:1008: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:700: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:704: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:708: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:712: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:716: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:720: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:973: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:988: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:993: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/BusLogic.c:998: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/NCR5380.c:396: warning: `phases' defined but not used
drivers/scsi/NCR5380.c:699: warning: `NCR5380_probe_irq' defined but not
used
drivers/scsi/NCR5380.c:756: warning: `NCR5380_print_options' defined but
not used
drivers/scsi/NCR53c406a.c:611: warning: `NCR53c406a_setup' defined but
not used
drivers/scsi/NCR53c406a.c:660: warning: initialization from incompatible
pointer type
drivers/scsi/NCR53c406a.c:669: warning: `wait_intr' defined but not used
drivers/scsi/advansys.c:10006: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/advansys.c:4622: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/aha152x.c:396: warning: `id_table' defined but not used
drivers/scsi/aha152x.c:793: warning: `aha152x_setup' defined but not
used
drivers/scsi/aha152x.c:852: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/aha152x.c:870: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/atp870u.c:2350: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/atp870u.c:2422: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/cpqfcTSinit.c:1583: warning: unused variable `timeout'
drivers/scsi/cpqfcTSinit.c:1584: warning: unused variable `retries'
drivers/scsi/cpqfcTSinit.c:1585: warning: unused variable `scsi_cdb'
drivers/scsi/cpqfcTSinit.c:471: warning: `my_ioctl_done' defined but not
used
drivers/scsi/dtc.c:187: warning: `dtc_setup' defined but not used
drivers/scsi/eata_pio.c:596: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/fd_mcs.c:300: warning: `fd_mcs_setup' defined but not used
drivers/scsi/fd_mcs.c:311: warning: initialization from incompatible
pointer type
drivers/scsi/fd_mcs.h:27: warning: `fd_mcs_command' declared `static'
but never defined
drivers/scsi/fdomain.c:763: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/g_NCR5380.c:926: warning: `id_table' defined but not used
drivers/scsi/gdth.c:881: warning: `gdthtable' defined but not used
drivers/scsi/inia100.h:70: warning: `inia100_detect' declared `static'
but never defined
drivers/scsi/inia100.h:71: warning: `inia100_release' declared `static'
but never defined
drivers/scsi/inia100.h:72: warning: `inia100_queue' declared `static'
but never defined
drivers/scsi/inia100.h:73: warning: `inia100_abort' declared `static'
but never defined
drivers/scsi/inia100.h:74: warning: `inia100_device_reset' declared
`static' but never defined
drivers/scsi/inia100.h:75: warning: `inia100_bus_reset' declared
`static' but never defined
drivers/scsi/libata-core.c:2133: warning: `ata_qc_push' defined but not
used
drivers/scsi/psi240i.c:713: warning: initialization from incompatible
pointer type
drivers/scsi/psi240i.c:714: warning: initialization from incompatible
pointer type
drivers/scsi/sym53c416.c:627: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/sym53c416.c:715: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/scsi/wd7000.c:1611: warning: `wd7000_abort' defined but not used
drivers/serial/8250.c:693: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/telephony/ixj.c:7737: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/telephony/ixj.c:7799: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/telephony/ixj.c:7835: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
drivers/telephony/ixj.h:41: warning: `ixj_h_rcsid' defined but not used
drivers/usb/class/usb-midi.h:150: warning: `usb_midi_ids' defined but
not used
drivers/video/aty/aty128fb.c:2335: warning: `aty128fb_exit' defined but
not used
drivers/video/aty/aty128fb.c:254: warning: `mode' defined but not used
drivers/video/aty/aty128fb.c:256: warning: `nomtrr' defined but not used
drivers/video/console/mdacon.c:374: warning: `MOD_INC_USE_COUNT' is
deprecated (declared at include/linux/module.h:482)
drivers/video/console/mdacon.c:384: warning: `MOD_DEC_USE_COUNT' is
deprecated (declared at include/linux/module.h:494)
drivers/video/hgafb.c:452: warning: `hgafb_fillrect' defined but not
used
drivers/video/hgafb.c:472: warning: `hgafb_copyarea' defined but not
used
drivers/video/hgafb.c:502: warning: `hgafb_imageblit' defined but not
used
drivers/video/imsttfb.c:1089: warning: `imsttfb_load_cursor_image'
defined but not used
drivers/video/imsttfb.c:1159: warning: `imstt_set_cursor' defined but
not used
drivers/video/matrox/matroxfb_base.c:1250: warning: `inverse' defined
but not used
drivers/video/matrox/matroxfb_g450.c:129: warning: duplicate `const'
drivers/video/matrox/matroxfb_g450.c:130: warning: duplicate `const'
drivers/video/matrox/matroxfb_maven.c:347: warning: duplicate `const'
drivers/video/matrox/matroxfb_maven.c:348: warning: duplicate `const'
drivers/video/sis/sis_main.c:622: warning: unused variable `reg'
drivers/video/tdfxfb.c:1005: warning: `tdfxfb_cursor' defined but not
used
drivers/video/tdfxfb.c:198: warning: `inverse' defined but not used
drivers/video/tdfxfb.c:199: warning: `mode_option' defined but not used
drivers/video/tridentfb.c:455: warning: `tridentfb_fillrect' defined but
not used
drivers/video/tridentfb.c:473: warning: `tridentfb_copyarea' defined but
not used
include/linux/ixjuser.h:45: warning: `ixjuser_h_rcsid' defined but not
used
include/linux/mca-legacy.h:12:2: warning: #warning "MCA legacy - please
move your driver to the new sysfs api"
net/decnet/dn_nsp_in.c:805: warning: `skb_linearize' is deprecated
(declared at include/linux/skbuff.h:1136)
net/decnet/dn_route.c:639: warning: `skb_linearize' is deprecated
(declared at include/linux/skbuff.h:1136)
net/ipv4/ipcomp.c:189: warning: `skb_linearize' is deprecated (declared
at include/linux/skbuff.h:1136)
net/ipv4/ipcomp.c:72: warning: `skb_linearize' is deprecated (declared
at include/linux/skbuff.h:1136)
net/ipv6/ipcomp6.c:174: warning: `skb_linearize' is deprecated (declared
at include/linux/skbuff.h:1136)
net/ipv6/ipcomp6.c:61: warning: `skb_linearize' is deprecated (declared
at include/linux/skbuff.h:1136)
net/ipv6/netfilter/ip6_tables.c:349: warning: `skb_linearize' is
deprecated (declared at include/linux/skbuff.h:1136)
net/ipv6/netfilter/ip6table_mangle.c:162: warning: `skb_linearize' is
deprecated (declared at include/linux/skbuff.h:1136)
net/wanrouter/wanmain.c:729: warning: `dev_get' is deprecated (declared
at include/linux/netdevice.h:514)
sound/isa/opti9xx/opti92x-ad1848.c:1670: warning: `check_region' is
deprecated (declared at include/linux/ioport.h:119)
sound/isa/opti9xx/opti92x-ad1848.c:1686: warning: `check_region' is
deprecated (declared at include/linux/ioport.h:119)
sound/isa/opti9xx/opti92x-ad1848.c:314: warning: `check_region' is
deprecated (declared at include/linux/ioport.h:119)
sound/oss/ad1848.c:1580: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/ad1848.c:2530: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/ad1848.c:2967: warning: `id_table' defined but not used
sound/oss/cmpci.c:1465: warning: unused variable `s'
sound/oss/cmpci.c:2865: warning: `cmpci_pci_tbl' defined but not used
sound/oss/cs4232.c:141: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/cs4232.c:193: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/gus_card.c:76: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/gus_card.c:78: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/gus_card.c:93: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/gus_card.c:94: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/mad16.c:322: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/maui.c:307: warning: `check_region' is deprecated (declared at
include/linux/ioport.h:119)
sound/oss/mpu401.c:1217: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/msnd.c:74: warning: `MOD_INC_USE_COUNT' is deprecated
(declared at include/linux/module.h:482)
sound/oss/msnd.c:95: warning: `MOD_DEC_USE_COUNT' is deprecated
(declared at include/linux/module.h:494)
sound/oss/msnd_pinnacle.c:1123: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
sound/oss/msnd_pinnacle.c:1811: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
sound/oss/opl3sa.c:114: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/opl3sa.c:122: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/pss.c:1004: warning: `check_region' is deprecated (declared at
include/linux/ioport.h:119)
sound/oss/pss.c:191: warning: `check_region' is deprecated (declared at
include/linux/ioport.h:119)
sound/oss/pss.c:640: warning: `check_region' is deprecated (declared at
include/linux/ioport.h:119)
sound/oss/pss.c:710: warning: `check_region' is deprecated (declared at
include/linux/ioport.h:119)
sound/oss/sb_common.c:1224: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
sound/oss/sb_common.c:523: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
sound/oss/sgalaxy.c:89: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/sgalaxy.c:97: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/sscape.c:1113: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/sscape.c:1132: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/sscape.c:1137: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/sscape.c:737: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)
sound/oss/trix.c:147: warning: `check_region' is deprecated (declared at
include/linux/ioport.h:119)
sound/oss/trix.c:292: warning: `check_region' is deprecated (declared at
include/linux/ioport.h:119)
sound/oss/trix.c:85: warning: `check_region' is deprecated (declared at
include/linux/ioport.h:119)
sound/oss/wavfront.c:2426: warning: `check_region' is deprecated
(declared at include/linux/ioport.h:119)
sound/oss/wf_midi.c:788: warning: `check_region' is deprecated (declared
at include/linux/ioport.h:119)





* Re: 2.6.0-test9-mm3
  2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
                   ` (2 preceding siblings ...)
  2003-11-13 22:04 ` 2.6.0-test9-mm3 (compile stats) John Cherry
@ 2003-11-14  5:07 ` Martin J. Bligh
  2003-11-14 20:57   ` 2.6.0-test9-mm3 Zwane Mwaikambo
  2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
  4 siblings, 1 reply; 49+ messages in thread
From: Martin J. Bligh @ 2003-11-14  5:07 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm

> - Several ext2 and ext3 allocator fixes.  These need serious testing on big
>   SMP.

Survives kernbench and SDET on ext2 at least on 16-way. I'll try ext3
later.

M.


* Re: 2.6.0-test9-mm3
  2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-14 18:59   ` Andrew Morton
  2003-11-14 19:32     ` 2.6.0-test9-mm3 Mike Fedyk
  2003-11-14 19:10   ` 2.6.0-test9-mm3 Badari Pulavarty
  1 sibling, 1 reply; 49+ messages in thread
From: Andrew Morton @ 2003-11-14 18:59 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: linux-kernel, linux-mm

"Martin J. Bligh" <mbligh@aracnet.com> wrote:
>
> 
> 
> > - Several ext2 and ext3 allocator fixes.  These need serious testing on big
> >   SMP.
> 
> OK, ext3 survived a swatting on the 16-way as well.

Great, thanks.

> It's still slow as snot, but it does work ;-)

I think SDET generates storms of metadata updates.  Making the journal
larger may help get that idle time down.

Probably the default journal size is too small nowadays.  Most tests seem
to run faster when it is enlarged.




* Re: 2.6.0-test9-mm3
  2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
                   ` (3 preceding siblings ...)
  2003-11-14  5:07 ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-14 19:08 ` Martin J. Bligh
  2003-11-14 18:59   ` 2.6.0-test9-mm3 Andrew Morton
  2003-11-14 19:10   ` 2.6.0-test9-mm3 Badari Pulavarty
  4 siblings, 2 replies; 49+ messages in thread
From: Martin J. Bligh @ 2003-11-14 19:08 UTC (permalink / raw)
  To: Andrew Morton, linux-kernel, linux-mm



> - Several ext2 and ext3 allocator fixes.  These need serious testing on big
>   SMP.

OK, ext3 survived a swatting on the 16-way as well. It's still slow as snot,
but it does work ;-) No changes from before, methinks.

Diffprofile for kernbench (-j) from ext2 to ext3 on mm3

     27022    16.3% total
     24069    53.3% default_idle
       583     2.4% page_remove_rmap
       539   248.4% fd_install
       478   388.6% __blk_queue_bounce
       319     4.0% __d_lookup
       220   122.9% may_open
       204    68.2% filemap_nopage
       124     0.0% journal_add_journal_head
       122   321.1% __find_get_block_slow
       122     0.0% do_get_write_access
       101    57.1% generic_fillattr
...
       -52   -73.2% .text.lock.highmem
       -52   -94.5% generic_file_read
       -53   -18.7% do_generic_mapping_read
       -58    -3.3% do_no_page
       -65   -13.0% page_address
       -65   -60.2% kmap_high
       -74  -100.0% grab_block
       -75    -3.3% do_page_fault
       -85    -1.9% __copy_from_user_ll
      -273   -19.5% link_path_walk
      -299    -6.5% find_get_page
      -758  -100.0% generic_file_open

SDET:

   1726439   214.7% total
   1383611   345.4% default_idle
    115417     0.0% .text.lock.transaction
     79362     0.0% find_next_usable_block
     38003     0.0% do_get_write_access
     32429  2316.4% __down
     31231     0.0% journal_dirty_metadata
     15114   553.8% schedule
     14350  1253.3% __wake_up
     13459     0.0% start_this_handle
     13100     0.0% journal_stop
...
     -1105   -25.1% copy_mm
     -1144  -100.0% generic_file_open
     -1205   -45.0% .text.lock.dec_and_lock
     -1342  -100.0% ext2_new_inode
     -1365   -50.5% follow_mount
     -1453  -100.0% grab_block
     -1580   -30.5% remove_shared_vm_struct
     -1759   -11.0% copy_page_range
     -2145   -18.4% __d_lookup
     -2157   -35.6% path_lookup
     -2222   -33.7% atomic_dec_and_lock
     -2813   -25.0% release_pages
     -3764   -19.1% zap_pte_range
     -8954   -21.2% page_add_rmap
    -22707   -25.0% page_remove_rmap
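
For reading the two listings above: each row looks like the change in
profile ticks for one symbol between the ext2 and ext3 runs, followed by
that change as a percentage of the ext2 count.  Symbols that only appear
in the ext3 profile show 0.0%, and symbols that vanish show -100.0%.
The following is a minimal Python sketch of that arithmetic, not the
diffprofile script itself; the function names and the tick counts are
made up for illustration.

```python
def diffprofile(before, after):
    """Return (delta, percent, symbol) rows, largest increase first.

    `before` and `after` map symbol names to profile tick counts.
    """
    rows = []
    for sym in set(before) | set(after):
        old = before.get(sym, 0)
        new = after.get(sym, 0)
        delta = new - old
        if delta == 0:
            continue  # unchanged symbols are not interesting
        # A symbol missing from the baseline has no meaningful percentage;
        # this sketch reports 0.0 for it, matching the listings above.
        pct = 100.0 * delta / old if old else 0.0
        rows.append((delta, pct, sym))
    # Sort by delta descending, then by name so the order is stable.
    rows.sort(key=lambda r: (-r[0], r[2]))
    return rows


def format_rows(rows):
    """Format rows in the same three-column layout as the listings."""
    return ["%10d %8.1f%% %s" % row for row in rows]


# Hypothetical tick counts, chosen only to exercise the three cases:
# a symbol in both runs, one only in the new run, one only in the old run.
ext2 = {"total": 1000, "default_idle": 400, "grab_block": 74}
ext3 = {"total": 1163, "default_idle": 612, "do_get_write_access": 122}

for line in format_rows(diffprofile(ext2, ext3)):
    print(line)
```

Read this way, the large positive default_idle rows mean the ext3 runs
spent more ticks idle, not that more work was done.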



* Re: 2.6.0-test9-mm3
  2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
  2003-11-14 18:59   ` 2.6.0-test9-mm3 Andrew Morton
@ 2003-11-14 19:10   ` Badari Pulavarty
  2003-11-14 20:29     ` 2.6.0-test9-mm3 Martin J. Bligh
  1 sibling, 1 reply; 49+ messages in thread
From: Badari Pulavarty @ 2003-11-14 19:10 UTC (permalink / raw)
  To: Martin J. Bligh, Andrew Morton, linux-kernel, linux-mm

On Friday 14 November 2003 11:08 am, Martin J. Bligh wrote:
> > - Several ext2 and ext3 allocator fixes.  These need serious testing on
> > big SMP.
>
> OK, ext3 survived a swatting on the 16-way as well. It's still slow as
> snot, but it does work ;-) No changes from before, methinks.
>
> Diffprofile for kernbench (-j) from ext2 to ext3 on mm3
>
>      27022    16.3% total
>      24069    53.3% default_idle
>        583     2.4% page_remove_rmap
>        539   248.4% fd_install
>        478   388.6% __blk_queue_bounce

What driver are you using? Why are you bouncing?

Thanks,
Badari


* Re: 2.6.0-test9-mm3
  2003-11-14 18:59   ` 2.6.0-test9-mm3 Andrew Morton
@ 2003-11-14 19:32     ` Mike Fedyk
  2003-11-14 20:27       ` 2.6.0-test9-mm3 John Stoffel
  0 siblings, 1 reply; 49+ messages in thread
From: Mike Fedyk @ 2003-11-14 19:32 UTC (permalink / raw)
  To: Andrew Morton; +Cc: Martin J. Bligh, linux-kernel, linux-mm

On Fri, Nov 14, 2003 at 10:59:47AM -0800, Andrew Morton wrote:
> "Martin J. Bligh" <mbligh@aracnet.com> wrote:
> >
> > 
> > 
> > > - Several ext2 and ext3 allocator fixes.  These need serious testing on big
> > >   SMP.
> > 
> > OK, ext3 survived a swatting on the 16-way as well.
> 
> Great, thanks.
> 
> > It's still slow as snot, but it does work ;-)
> 
> I think SDET generates storms of metadata updates.  Making the journal
> larger may help get that idle time down.
> 
> Probably the default journal size is too small nowadays.  Most tests seem
> to run faster when it is enlarged.

Or maybe if it didn't start sync committing from the journal once it hits 50%.


* Re: 2.6.0-test9-mm3
  2003-11-14 19:32     ` 2.6.0-test9-mm3 Mike Fedyk
@ 2003-11-14 20:27       ` John Stoffel
  2003-11-15  1:01         ` 2.6.0-test9-mm3 Mike Fedyk
  0 siblings, 1 reply; 49+ messages in thread
From: John Stoffel @ 2003-11-14 20:27 UTC (permalink / raw)
  To: Mike Fedyk; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel, linux-mm


Mike> Or maybe if it didn't start sync committing from the journal
Mike> once it hits 50%.

Instead of using a percentage like this, would it make sense to flush
the journal when there are only N free journal slots/entries left?
Now the question is how to compute N in a sane way that works for
small (memory) systems, as well as for larger systems.

You don't want to grow N too aggressively, or base it on the memory of
the system, do you?  When you have a 20MB journal, maybe starting
writeout after 10MB is used makes sense, because you've only got 10
transaction slots open.  But when you have a 200MB journal, does it
make sense to start writeout when you only have 100 transaction slots
left?

Since I don't know the internals of Ext3 at all, I'm probably
completely missing the idea here, but my gut feeling is that the
scaling we use in these cases shouldn't be linear at all, but more
likely inverse logarithmic instead.  Basically, the larger we get with
a resource, the slower we grow our usage, or the smaller we grow the
absolute size of the writeout buffer(s).

Hmmm... this doesn't sound clear even to me.  But the idea I think I'm
trying to get at is that if we have X size of a journal, we want to
start writeout when we have X/2 available.  But when we have Y size of
a journal, where Y is X*10 (or larger), we don't want Y/2 as the
cutover point, we want something like  Y/10.  The idea is that we grow
the denominator here at a slow rate, since it will shrink the free
buffer percentage nicely, yet not let us get too close to a truly zero
sized buffer.

     X     X/N
    ----- --------
     10    5
     100   10
     1000  25
     10000 125

Does this make any sense to anyone?
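
One way to make the table concrete is to treat its four rows as fixed
points and interpolate between them on a log-log scale.  This is only a
sketch of the idea, in Python and not ext3/JBD code: writeout_threshold,
the choice of log-log interpolation, and the behaviour outside the
table's range are all assumptions.

```python
import math

# (journal size X, free space X/N at which writeout starts),
# copied from the table above; units are whatever the table uses.
TABLE = [(10, 5), (100, 10), (1000, 25), (10000, 125)]


def writeout_threshold(size):
    """Free journal space below which writeout would start."""
    first_x, _ = TABLE[0]
    last_x, last_y = TABLE[-1]
    if size <= first_x:
        # Assumption: small journals keep the plain 50% rule.
        return size / 2
    if size >= last_x:
        # Assumption: beyond the table, hold the last row's ratio.
        return size * last_y / last_x
    # Interpolate linearly in log(size) -> log(threshold) between rows.
    for (x0, y0), (x1, y1) in zip(TABLE, TABLE[1:]):
        if x0 <= size <= x1:
            t = (math.log(size) - math.log(x0)) / (math.log(x1) - math.log(x0))
            return math.exp(math.log(y0) + t * (math.log(y1) - math.log(y0)))


# Compare against the flat 50% rule for a few journal sizes.
for size in (10, 50, 100, 1000, 10000, 100000):
    print("%7d  half=%9.1f  scaled=%8.1f" % (size, size / 2, writeout_threshold(size)))
```

The point is only that the reserved free space grows much more slowly
than the journal does, so a big journal gets used nearly to the end
before writeout is forced.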

John


* Re: 2.6.0-test9-mm3
  2003-11-14 19:10   ` 2.6.0-test9-mm3 Badari Pulavarty
@ 2003-11-14 20:29     ` Martin J. Bligh
  2003-11-17 20:58       ` 2.6.0-test9-mm3 bill davidsen
  0 siblings, 1 reply; 49+ messages in thread
From: Martin J. Bligh @ 2003-11-14 20:29 UTC (permalink / raw)
  To: Badari Pulavarty, Andrew Morton, linux-kernel, linux-mm

>> > - Several ext2 and ext3 allocator fixes.  These need serious testing on
>> > big SMP.
>> 
>> OK, ext3 survived a swatting on the 16-way as well. It's still slow as
>> snot, but it does work ;-) No changes from before, methinks.
>> 
>> Diffprofile for kernbench (-j) from ext2 to ext3 on mm3
>> 
>>      27022    16.3% total
>>      24069    53.3% default_idle
>>        583     2.4% page_remove_rmap
>>        539   248.4% fd_install
>>        478   388.6% __blk_queue_bounce
> 
> What driver are you using? Why are you bouncing?

qlogicisp. Because the driver is crap? ;-)

M.



* Re: 2.6.0-test9-mm3
  2003-11-14  5:07 ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-14 20:57   ` Zwane Mwaikambo
  2003-11-14 21:57     ` 2.6.0-test9-mm3 Martin J. Bligh
  0 siblings, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-14 20:57 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, linux-mm

On Thu, 13 Nov 2003, Martin J. Bligh wrote:

> > - Several ext2 and ext3 allocator fixes.  These need serious testing on big
> >   SMP.
> 
> Survives kernbench and SDET on ext2 at least on 16-way. I'll try ext3
> later.

It's actually triple faulting my laptop (K6 family=5 model=8 step=12) when
I have CONFIG_X86_4G enabled and try to run X11. The same kernel is fine
on all my other test boxes. Any hints?


* Re: 2.6.0-test9-mm3
  2003-11-14 21:57     ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-14 21:37       ` Zwane Mwaikambo
  2003-11-14 21:47       ` 2.6.0-test9-mm3 Linus Torvalds
  1 sibling, 0 replies; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-14 21:37 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Andrew Morton, linux-kernel, linux-mm

On Fri, 14 Nov 2003, Martin J. Bligh wrote:

> >> > - Several ext2 and ext3 allocator fixes.  These need serious testing on big
> >> >   SMP.
> >> 
> >> Survives kernbench and SDET on ext2 at least on 16-way. I'll try ext3
> >> later.
> > 
> > It's actually triple faulting my laptop (K6 family=5 model=8 step=12) when 
> > i have CONFIG_X86_4G enabled and try and run X11. The same kernel is fine 
> > on all my other test boxes. Any hints?
> 
> Linus had some debug thing for triple faults, a few months ago, IIRC ...
> probably in the archives somewhere ...

It should all be in the kernel right now; arch/i386/kernel/doublefault.c 
but i think i may be a bit low on luck =)

* Re: 2.6.0-test9-mm3
  2003-11-14 21:57     ` 2.6.0-test9-mm3 Martin J. Bligh
  2003-11-14 21:37       ` 2.6.0-test9-mm3 Zwane Mwaikambo
@ 2003-11-14 21:47       ` Linus Torvalds
  2003-11-15  0:55         ` 2.6.0-test9-mm3 Zwane Mwaikambo
  1 sibling, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2003-11-14 21:47 UTC (permalink / raw)
  To: Martin J. Bligh; +Cc: Zwane Mwaikambo, Andrew Morton, linux-kernel, linux-mm


On Fri, 14 Nov 2003, Martin J. Bligh wrote:
> 
> Linus had some debug thing for triple faults, a few months ago, IIRC ...
> probably in the archives somewhere ...

Triple faults you can't debug, they raise a line outside the CPU, and 
normal PC hardware will cause that to just trigger a reboot.

But double faults do get caught, and that debugging stuff actually is in
the standard kernel. It won't give _nearly_ as good a debug report as a
"normal" oops, since I didn't want the double-fault handler to touch
anything even remotely unsafe, but it often gives a good hint about what
might be wrong. Certainly better than triple-faulting did (which we still
do for _catastrophic_ corruption, eg totally munged kernel page tables etc
- it's just very hard to avoid once you get corrupted enough).

		Linus


* Re: 2.6.0-test9-mm3
  2003-11-14 20:57   ` 2.6.0-test9-mm3 Zwane Mwaikambo
@ 2003-11-14 21:57     ` Martin J. Bligh
  2003-11-14 21:37       ` 2.6.0-test9-mm3 Zwane Mwaikambo
  2003-11-14 21:47       ` 2.6.0-test9-mm3 Linus Torvalds
  0 siblings, 2 replies; 49+ messages in thread
From: Martin J. Bligh @ 2003-11-14 21:57 UTC (permalink / raw)
  To: Zwane Mwaikambo; +Cc: Andrew Morton, linux-kernel, linux-mm

>> > - Several ext2 and ext3 allocator fixes.  These need serious testing on big
>> >   SMP.
>> 
>> Survives kernbench and SDET on ext2 at least on 16-way. I'll try ext3
>> later.
> 
> It's actually triple faulting my laptop (K6 family=5 model=8 step=12) when 
> i have CONFIG_X86_4G enabled and try and run X11. The same kernel is fine 
> on all my other test boxes. Any hints?

Linus had some debug thing for triple faults, a few months ago, IIRC ...
probably in the archives somewhere ...

M.


* Re: 2.6.0-test9-mm3
  2003-11-14 21:47       ` 2.6.0-test9-mm3 Linus Torvalds
@ 2003-11-15  0:55         ` Zwane Mwaikambo
  2003-11-15 19:34           ` [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops Zwane Mwaikambo
  0 siblings, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-15  0:55 UTC (permalink / raw)
  To: Linus Torvalds; +Cc: Martin J. Bligh, Andrew Morton, linux-kernel, linux-mm

On Fri, 14 Nov 2003, Linus Torvalds wrote:

> Triple faults you can't debug, they raise a line outside the CPU, and 
> normal PC hardware will cause that to just trigger a reboot.
> 
> But double faults do get caught, and that debugging stuff actually is in
> the standard kernel. It won't give _nearly_ as good a debug report as a
> "normal" oops, since I didn't want the double-fault handler to touch
> anything even remotely unsafe, but it often gives a good hint about what
> might be wrong. Certainly better than triple-faulting did (which we still
> do for _catastrophic_ corruption, eg totally munged kernel page tables etc
> - it's just very hard to avoid once you get corrupted enough).

"Catastrophic" seems to be rather apt here. 2.6.0-test8-mm1 produced the
following; i'm still doing a binary search.

Unable to handle kernel paging request at virtual address 00002000
 printing eip:
00007341
*pde = 00000000
Oops: 0004 [#1]
PREEMPT SMP DEBUG_PAGEALLOC
CPU:    0
EIP:    c000:[<00007341>]    Not tainted VLI
EFLAGS: 00033246
EIP is at 0x7341
eax: 32454256   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: 00000000   edi: 00002000   ebp: 00000fd6   esp: 08763f24
ds: 0000   es: 0000   ss: 0068
Process X (pid: 939, threadinfo=08762000 task=0890b330)
Stack: 00000fcb 00000100 00000000 0000c000 00000000 00000000 00000000 00000000
       00000005 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
Call Trace:

Code:  Bad EIP value.
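
For reference, the "Oops: 0004" value above can be decoded with the
hardware-defined i386 page-fault error-code bits. This is only a minimal
sketch: struct pf_info and decode_pf_error() are hypothetical names made up
for illustration, not kernel symbols.

```c
/* i386 page-fault error-code bits, as pushed by the CPU. */
#define PF_PROT  0x1UL	/* 0: page not present, 1: protection violation */
#define PF_WRITE 0x2UL	/* 0: read access,      1: write access */
#define PF_USER  0x4UL	/* 0: fault at CPL < 3, 1: fault at CPL 3 */

/* Hypothetical container for the decoded bits (illustration only). */
struct pf_info {
	int protection;	/* 1 if the page was present but access was denied */
	int write;	/* 1 if the faulting access was a write */
	int user;	/* 1 if the CPU was at CPL 3 (this includes vm86) */
};

/* Split an error code such as the 0004 in the oops into its flags. */
static struct pf_info decode_pf_error(unsigned long error_code)
{
	struct pf_info info;

	info.protection = (error_code & PF_PROT) != 0;
	info.write = (error_code & PF_WRITE) != 0;
	info.user = (error_code & PF_USER) != 0;
	return info;
}
```

decode_pf_error(0x0004) gives a read of a not-present page taken at CPL 3,
which would fit a vm86 task touching address 00002000 rather than a fault
raised by kernel code.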

* Re: 2.6.0-test9-mm3
  2003-11-14 20:27       ` 2.6.0-test9-mm3 John Stoffel
@ 2003-11-15  1:01         ` Mike Fedyk
  0 siblings, 0 replies; 49+ messages in thread
From: Mike Fedyk @ 2003-11-15  1:01 UTC (permalink / raw)
  To: John Stoffel; +Cc: Andrew Morton, Martin J. Bligh, linux-kernel, linux-mm

On Fri, Nov 14, 2003 at 03:27:01PM -0500, John Stoffel wrote:
> You don't want to grow N too aggressively, or base it on the memory of
> the system, do you?  When you have a 20mb journal, maybe starting
> writeout after 10mb is used makes sense, because you've only got 10
> transaction slots open.  But when you have a 200mb journal, does it
> make sense to start writeout when you only have 100 transaction slots
> left?  

The minimum transaction size is one block (since ext3 is the only journaling
FS to log entire blocks, instead of the specific logical changes made during
the transaction), and your blocks are 1k, 2k, or 4k.

Though many times you'll have several blocks per transaction since each
transaction can change bitmaps, directory blocks, and so on.

> Since I don't know the internals of Ext3 at all, I'm probably
> completely missing the idea here, but my gut feeling is that the
> scaling we use in these cases shouldn't be linear at all, but more
> likely inverse logarithmic instead.  Basically, the larger we get with
> a resource, the slower we grow our usage, or the smaller we grow the
> absolute size of the writeout buffer(s).
> 
> Hmmm... this doesn't sound clear even to me.  But the idea I think I'm
> trying to get at is that if we have X size of a journal, we want to
> start writeout when we have X/2 available.  But when we have Y size of
> a journal, where Y is X*10 (or larger), we don't want Y/2 as the
> cutover point, we want something like  Y/10.  The idea is that we grow
> the denominator here at a slow rate, since it will shrink the free
> buffer percentage nicely, yet not let us get too close to a truly zero
> sized buffer.

Last I heard, ext3 will try to flush the journal with an async process and
if that isn't able to keep up, once the journal hits 50% full, the system
will write synchronously until the journal is empty (or was that until it was
25% full or less, I forget...).

AFAIK everyone agrees that this is not optimal, but nobody's taken the time
to fix it yet either.
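
To make the quoted proposal concrete, here is a minimal sketch of a
sub-linear threshold. It is not what ext3/JBD does: BASE_JOURNAL_MB,
ilog2_ul() and writeout_free_threshold_mb() are made-up names, and the
particular curve (half the journal up to 20mb, then a logarithmic increment)
is just one arbitrary way to get "X/2 for a small journal, a much smaller
fraction for a large one".

```c
/* Hypothetical reference size, taken from the 20mb example above. */
#define BASE_JOURNAL_MB 20UL

/* Integer floor(log2(v)) for v >= 1. */
static unsigned int ilog2_ul(unsigned long v)
{
	unsigned int n = 0;

	while (v >>= 1)
		n++;
	return n;
}

/*
 * Free journal space (in MB) at or below which writeout would start.
 * Small journals use half their size; larger ones add only a quarter of
 * the base size per doubling, so the fraction shrinks as the journal grows.
 */
static unsigned long writeout_free_threshold_mb(unsigned long journal_mb)
{
	if (journal_mb <= BASE_JOURNAL_MB)
		return journal_mb / 2;

	return BASE_JOURNAL_MB / 2 +
	       (BASE_JOURNAL_MB / 4) * ilog2_ul(journal_mb / BASE_JOURNAL_MB);
}
```

With these assumptions a 20mb journal starts writeout at 10mb free (X/2),
a 200mb journal at 25mb free (Y/8), and a 2000mb journal at 40mb free.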

Mike

* [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-15  0:55         ` 2.6.0-test9-mm3 Zwane Mwaikambo
@ 2003-11-15 19:34           ` Zwane Mwaikambo
  2003-11-15 19:52             ` Zwane Mwaikambo
  2003-11-17 21:46             ` Zwane Mwaikambo
  0 siblings, 2 replies; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-15 19:34 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins

The 4G/4G page fault handling path doesn't appear to handle faults
happening whilst in vm86 mode. There regs->xcs != __USER_CS, which confuses
the in-kernel test.
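
A minimal sketch of why a code-segment comparison misclassifies a vm86
task. It assumes __USER_CS is 0x73 (as on stock 2.6 i386) and that VM_MASK
is the EFLAGS VM bit, 0x00020000; struct regs_sketch and the helper names
are made up for illustration and are not the kernel's own definitions.

```c
#define SKETCH_USER_CS 0x73U		/* assumed value of __USER_CS */
#define SKETCH_VM_MASK 0x00020000UL	/* EFLAGS.VM: virtual-8086 mode */

/* Hypothetical cut-down register frame (illustration only). */
struct regs_sketch {
	unsigned long eflags;
	unsigned int xcs;
};

/* The test that gets confused: vm86 code runs with a real-mode style CS. */
static int cs_is_user(const struct regs_sketch *regs)
{
	return regs->xcs == SKETCH_USER_CS;
}

/* vm86 is flagged in EFLAGS, independently of what CS holds. */
static int in_vm86(const struct regs_sketch *regs)
{
	return (regs->eflags & SKETCH_VM_MASK) != 0;
}

/* A fault counts as "from userspace" if either condition holds. */
static int fault_from_userspace(const struct regs_sketch *regs)
{
	return in_vm86(regs) || cs_is_user(regs);
}
```

With the values from the oops reports in this thread (CS c000, EFLAGS
00033246), cs_is_user() is false while in_vm86() is true, so only a test
that also looks at the VM flag treats the fault as a user fault.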

However i'm still debugging the X11 triple fault in test9-mm3

Unable to handle kernel paging request at virtual address 00002000
 printing eip:
00007341
*pde = 00000000
Oops: 0004 [#1]
SMP DEBUG_PAGEALLOC
CPU:    0
EIP:    c000:[<00007341>]    Not tainted VLI
EFLAGS: 00033246
EIP is at 0x7341
eax: 32454256   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: 00000000   edi: 00002000   ebp: 00000fd6   esp: 087bbf24
ds: 0000   es: 0000   ss: 0068
Process X (pid: 939, threadinfo=087ba000 task=0891c690)
Stack: 00000fcb 00000100 00000000 0000c000 00000000 00000000 00000000 00000000
       00000005 ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
       ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
Call Trace:

Index: linux-2.6.0-test9-mm3/arch/i386/mm/fault.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test9-mm3/arch/i386/mm/fault.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 fault.c
--- linux-2.6.0-test9-mm3/arch/i386/mm/fault.c	13 Nov 2003 08:07:17 -0000	1.1.1.1
+++ linux-2.6.0-test9-mm3/arch/i386/mm/fault.c	15 Nov 2003 19:08:34 -0000
@@ -264,7 +264,9 @@ asmlinkage void do_page_fault(struct pt_
 		if (error_code & 3)
 			goto bad_area_nosemaphore;
 
- 		goto vmalloc_fault;
+		/* If it's vm86 fall through */
+		if (!(error_code & 4))
+			goto vmalloc_fault;
 	}
 #else
 	if (unlikely(address >= TASK_SIZE)) { 

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-15 19:34           ` [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops Zwane Mwaikambo
@ 2003-11-15 19:52             ` Zwane Mwaikambo
  2003-11-17 21:46             ` Zwane Mwaikambo
  1 sibling, 0 replies; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-15 19:52 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins

On Sat, 15 Nov 2003, Zwane Mwaikambo wrote:

> The 4G/4G page fault handling path doesn't appear to handle faults
> happening whilst in vm86 mode. There regs->xcs != __USER_CS, which confuses
> the in-kernel test.

Perhaps this would be more desirable?

Index: linux-2.6.0-test9-mm3/arch/i386/mm/fault.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test9-mm3/arch/i386/mm/fault.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 fault.c
--- linux-2.6.0-test9-mm3/arch/i386/mm/fault.c	13 Nov 2003 08:07:17 -0000	1.1.1.1
+++ linux-2.6.0-test9-mm3/arch/i386/mm/fault.c	15 Nov 2003 19:40:17 -0000
@@ -264,7 +264,9 @@ asmlinkage void do_page_fault(struct pt_
 		if (error_code & 3)
 			goto bad_area_nosemaphore;
 
- 		goto vmalloc_fault;
+		/* If it's vm86 fall through */
+		if (!(regs->eflags & VM_MASK))
+			goto vmalloc_fault;
 	}
 #else
 	if (unlikely(address >= TASK_SIZE)) { 

* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-13 22:03 ` 2.6.0-test9-mm3 - AIO test results Daniel McNeil
@ 2003-11-17  5:25   ` Suparna Bhattacharya
  2003-11-18  1:15     ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-11-17  5:25 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

On Thu, Nov 13, 2003 at 02:03:58PM -0800, Daniel McNeil wrote:
> Andrew,
> 
> I'm testing test9-mm3 on a 2-proc Xeon with a ext3 file system.
> I tested using the test programs aiocp and aiodio_sparse.
> (see http://developer.osdl.org/daniel/AIO/)
> 
> Using aiocp with i/o sizes from 1k to 512k to copy files worked
> without any errors or kernel debug messages.
> 
> With 64k i/o, the aiodio_sparse program completes without any errors.
> There are no kernel error messages, so that is good.
> 
> There are still problems with non power of 2 i/o sizes using AIO and
> O_DIRECT.  It hangs with aio's that do not seem to complete.  The test
> does exit when hitting ^c and there are no kernel messages.  Test output
> below:

Could you check if the following patch fixes the problem for you ?

Regards
Suparna

--------------------------------------------------------------

With this patch, when the DIO code falls back to buffered i/o after
having submitted part of the i/o, then buffered i/o is issued only
for the remaining part of the request (i.e. the part not already 
covered by DIO).
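
The bookkeeping this implies can be sketched in isolation. This is a
simplified illustration under stated assumptions, not the patch itself:
struct write_req and advance_past_dio() are hypothetical names, and only
the position/count arithmetic from the mm/filemap.c hunk is modelled.

```c
#include <stddef.h>

/* Hypothetical stand-in for the (pos, count) pair of a write request. */
struct write_req {
	long long pos;	/* file offset of the next byte to write */
	size_t count;	/* bytes still to be written */
};

/*
 * Account for what direct I/O already wrote.
 * Returns 1 if buffered I/O must finish the rest, 0 if there is nothing
 * left to do (error, or the whole request was covered by DIO).
 */
static int advance_past_dio(struct write_req *req, long written)
{
	if (written < 0 || (size_t)written >= req->count)
		return 0;

	/* Skip the part DIO handled; buffered I/O takes the remainder. */
	req->pos += written;
	req->count -= (size_t)written;
	return 1;
}
```

For a 10000-byte write at offset 4096 where DIO managed 8192 bytes, the
buffered path would then be asked for the last 1808 bytes at offset 12288,
instead of reissuing the whole request.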

diff -ur pure-mm3/fs/direct-io.c linux-2.6.0-test9-mm3/fs/direct-io.c
--- pure-mm3/fs/direct-io.c	2003-11-14 09:09:06.000000000 +0530
+++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-17 09:00:47.000000000 +0530
@@ -74,6 +74,7 @@
 					   been performed at the start of a
 					   write */
 	int pages_in_io;		/* approximate total IO pages */
+	size_t	size;			/* total request size (doesn't change)*/
 	sector_t block_in_file;		/* Current offset into the underlying
 					   file in dio_block units. */
 	unsigned blocks_available;	/* At block_in_file.  changes */
@@ -226,7 +227,7 @@
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
 			/* Complete AIO later if falling back to buffered i/o */
-			if (dio->result != -ENOTBLK) {
+			if (dio->result >= dio->size || dio->rw == READ) {
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
 			} else {
@@ -889,6 +890,7 @@
 	dio->blkbits = blkbits;
 	dio->blkfactor = inode->i_blkbits - blkbits;
 	dio->start_zero_done = 0;
+	dio->size = 0;
 	dio->block_in_file = offset >> blkbits;
 	dio->blocks_available = 0;
 	dio->cur_page = NULL;
@@ -925,7 +927,7 @@
 
 	for (seg = 0; seg < nr_segs; seg++) {
 		user_addr = (unsigned long)iov[seg].iov_base;
-		bytes = iov[seg].iov_len;
+		dio->size += bytes = iov[seg].iov_len;
 
 		/* Index into the first page of the first block */
 		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
@@ -956,6 +958,13 @@
 		}
 	} /* end iovec loop */
 
+	if (ret == -ENOTBLK && rw == WRITE) {
+		/*
+		 * The remaining part of the request will
+		 * be handled by buffered I/O when we return
+		 */
+		ret = 0;
+	}
 	/*
 	 * There may be some unwritten disk at the end of a part-written
 	 * fs-block-sized block.  Go zero that now.
@@ -986,19 +995,13 @@
 	 */
 	if (dio->is_async) {
 		if (ret == 0)
-			ret = dio->result;	/* Bytes written */
-		if (ret == -ENOTBLK) {
-			/*
-			 * The request will be reissued via buffered I/O
-			 * when we return; Any I/O already issued
-			 * effectively becomes redundant.
-			 */
-			dio->result = ret;
+			ret = dio->result;
+		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
 		}
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (ret == -ENOTBLK) {
+		if (dio->waiter) {
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
@@ -1032,7 +1035,8 @@
 		}
 		dio_complete(dio, offset, ret);
 		/* We could have also come here on an AIO file extend */
-		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
+		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 && 
+			dio->result < dio->size))
 			aio_complete(iocb, ret, 0);
 		kfree(dio);
 	}
diff -ur pure-mm3/mm/filemap.c linux-2.6.0-test9-mm3/mm/filemap.c
--- pure-mm3/mm/filemap.c	2003-11-14 09:15:08.000000000 +0530
+++ linux-2.6.0-test9-mm3/mm/filemap.c	2003-11-15 11:11:16.000000000 +0530
@@ -1895,14 +1895,16 @@
 		 */
 		if (written >= 0 && file->f_flags & O_SYNC)
 			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-		if (written >= 0 && !is_sync_kiocb(iocb))
+		if (written >= count && !is_sync_kiocb(iocb))
 			written = -EIOCBQUEUED;
-		if (written != -ENOTBLK)
+		if (written < 0 || written >= count)
 			goto out_status;
 		/*
 		 * direct-io write to a hole: fall through to buffered I/O
+		 * for completing the rest of the request.
 		 */
-		written = 0;
+		pos += written;
+		count -= written;
 	}
 
 	buf = iov->iov_base;

* Re: 2.6.0-test9-mm3
  2003-11-14 20:29     ` 2.6.0-test9-mm3 Martin J. Bligh
@ 2003-11-17 20:58       ` bill davidsen
  0 siblings, 0 replies; 49+ messages in thread
From: bill davidsen @ 2003-11-17 20:58 UTC (permalink / raw)
  To: linux-kernel

In article <100480000.1068841761@flay>,
Martin J. Bligh <mbligh@aracnet.com> wrote:
| >> > - Several ext2 and ext3 allocator fixes.  These need serious testing on
| >> > big SMP.
| >> 
| >> OK, ext3 survived a swatting on the 16-way as well. It's still slow as
| >> snot, but it does work ;-) No changes from before, methinks.
| >> 
| >> Diffprofile for kernbench (-j) from ext2 to ext3 on mm3
| >> 
| >>      27022    16.3% total
| >>      24069    53.3% default_idle
| >>        583     2.4% page_remove_rmap
| >>        539   248.4% fd_install
| >>        478   388.6% __blk_queue_bounce
| > 
| > What driver are you using ? Why are you bouncing ?
| 
| qlogicisp. Because the driver is crap? ;-)

The question is, does that make your testing better or worse in terms of
checking the new code? Clearly you have done a good job of checking the
"disk can't keep up" case, is there a need to test further with a much
higher transaction rate?

I would assume that if there were lock issues they would have shown up,
which is probably all that's needed.
-- 
bill davidsen <davidsen@tmr.com>
  CTO, TMR Associates, Inc
Doing interesting things with little computers since 1979.

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-15 19:34           ` [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops Zwane Mwaikambo
  2003-11-15 19:52             ` Zwane Mwaikambo
@ 2003-11-17 21:46             ` Zwane Mwaikambo
  2003-11-17 22:42               ` Linus Torvalds
  1 sibling, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-17 21:46 UTC (permalink / raw)
  To: Ingo Molnar
  Cc: Martin J. Bligh, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins

On Sat, 15 Nov 2003, Zwane Mwaikambo wrote:

> The 4G/4G page fault handling path doesn't appear to handle faults
> happening whilst in vm86 mode. There regs->xcs != __USER_CS, which confuses
> the in-kernel test.
> 
> However i'm still debugging the X11 triple fault in test9-mm3

I've managed to `fix` the triple fault (see further below for the patch
in all its glory). Unfortunately i have been unable to come up with a
simpler workaround which is fewer instructions and easier to debug. I have
tried the following:

mb()/barrier()
flush_tlb_all()
wbinvd()
outb(0x80,0x00)
local_irq_save(flags); local_irq_enable(); loop(); local_irq_restore(flags);
long_loop()

What i do know is that in the following code:

	__asm__ __volatile__(
		"xorl %%eax,%%eax; movl %%eax,%%fs; movl %%eax,%%gs\n\t"
		"movl %0,%%esp\n\t"
		"movl %1,%%ebp\n\t"
		"jmp resume_userspace"
		: /* no outputs */
		:"r" (&info->regs), "r" (tsk->thread_info) : "ax");

it does get to resume_userspace, as putting a $0 into %ebp will oops in
__switch_to.

And here is the current 'workaround'. Any hints?

Index: arch/i386/kernel/vm86.c
===================================================================
RCS file: /build/cvsroot/linux-2.6.0-test9-mm3/arch/i386/kernel/vm86.c,v
retrieving revision 1.1.1.1
diff -u -p -B -r1.1.1.1 vm86.c
--- arch/i386/kernel/vm86.c	13 Nov 2003 08:07:17 -0000	1.1.1.1
+++ arch/i386/kernel/vm86.c	17 Nov 2003 21:45:13 -0000
@@ -312,6 +311,8 @@ static void do_sys_vm86(struct kernel_vm
 	tsk->thread.screen_bitmap = info->screen_bitmap;
 	if (info->flags & VM86_SCREEN_BITMAP)
 		mark_screen_rdonly(tsk);
+
+	printk("ooh la la\n");
 	__asm__ __volatile__(
 		"xorl %%eax,%%eax; movl %%eax,%%fs; movl %%eax,%%gs\n\t"
 		"movl %0,%%esp\n\t"

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-17 21:46             ` Zwane Mwaikambo
@ 2003-11-17 22:42               ` Linus Torvalds
  2003-11-17 23:01                 ` Zwane Mwaikambo
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2003-11-17 22:42 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel,
	linux-mm, Hugh Dickins


On Mon, 17 Nov 2003, Zwane Mwaikambo wrote:
> 
> I've managed to `fix` the triple fault (see further below for the patch
> in all its glory).

What's the generated assembly language for this function with and without 
the "fix"?

If adding that printk fixes a triple fault, the issue is not likely to be 
the printk itself as much as the difference in code that the compiler 
generates - stack frame, memory re-ordering etc...

		Linus


* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-17 22:42               ` Linus Torvalds
@ 2003-11-17 23:01                 ` Zwane Mwaikambo
  2003-11-17 23:14                   ` Zwane Mwaikambo
  0 siblings, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-17 23:01 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel,
	linux-mm, Hugh Dickins

On Mon, 17 Nov 2003, Linus Torvalds wrote:

> What's the generated assembly language for this function with and without 
> the "fix"?
> 
> If adding that printk fixes a triple fault, the issue is not likely to be 
> the printk itself as much as the difference in code that the compiler 
> generates - stack frame, memory re-ordering etc...

This would be my 'trusty' gcc 3.2.2 from RedHat 9
(gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5))

With the fix:
0x0210e860 <do_sys_vm86+0>:     push   %edi
0x0210e861 <do_sys_vm86+1>:     mov    $0xffffe000,%eax
0x0210e866 <do_sys_vm86+6>:     push   %esi
0x0210e867 <do_sys_vm86+7>:     and    %esp,%eax
0x0210e869 <do_sys_vm86+9>:     push   %ebx
0x0210e86a <do_sys_vm86+10>:    mov    0x10(%esp,1),%edi
0x0210e86e <do_sys_vm86+14>:    mov    0x14(%esp,1),%esi
0x0210e872 <do_sys_vm86+18>:    movl   $0x0,0x1c(%edi)
0x0210e879 <do_sys_vm86+25>:    movl   $0x0,0x20(%edi)
0x0210e880 <do_sys_vm86+32>:    mov    (%eax),%edx
0x0210e882 <do_sys_vm86+34>:    mov    0x30(%edi),%eax
0x0210e885 <do_sys_vm86+37>:    mov    %eax,0x5b8(%edx)
0x0210e88b <do_sys_vm86+43>:    mov    0x30(%edi),%edx
0x0210e88e <do_sys_vm86+46>:    mov    0xbc(%edi),%eax
0x0210e894 <do_sys_vm86+52>:    and    $0xdd5,%edx
0x0210e89a <do_sys_vm86+58>:    mov    %edx,0x30(%edi)
0x0210e89d <do_sys_vm86+61>:    mov    0x30(%eax),%eax
0x0210e8a0 <do_sys_vm86+64>:    and    $0xfffff22a,%eax
0x0210e8a5 <do_sys_vm86+69>:    or     %eax,%edx
0x0210e8a7 <do_sys_vm86+71>:    mov    0x54(%edi),%eax
0x0210e8aa <do_sys_vm86+74>:    or     $0x20000,%edx
0x0210e8b0 <do_sys_vm86+80>:    cmp    $0x3,%eax
0x0210e8b3 <do_sys_vm86+83>:    mov    %edx,0x30(%edi)
0x0210e8b6 <do_sys_vm86+86>:    je     0x210e9f0 <do_sys_vm86+400>
0x0210e8bc <do_sys_vm86+92>:    cmp    $0x3,%eax
0x0210e8bf <do_sys_vm86+95>:    ja     0x210e9d5 <do_sys_vm86+373>
0x0210e8c5 <do_sys_vm86+101>:   cmp    $0x2,%eax
0x0210e8c8 <do_sys_vm86+104>:   je     0x210e9c6 <do_sys_vm86+358>
0x0210e8ce <do_sys_vm86+110>:   movl   $0x247000,0x5bc(%esi)
0x0210e8d8 <do_sys_vm86+120>:   mov    0xbc(%edi),%eax
0x0210e8de <do_sys_vm86+126>:   movl   $0x0,0x18(%eax)
0x0210e8e5 <do_sys_vm86+133>:   mov    0x360(%esi),%eax
0x0210e8eb <do_sys_vm86+139>:   mov    %eax,0x5c0(%esi)
0x0210e8f1 <do_sys_vm86+145>:   movl   %fs,0x5c4(%esi)
0x0210e8f7 <do_sys_vm86+151>:   movl   %gs,0x5c8(%esi)
0x0210e8fd <do_sys_vm86+157>:   mov    $0xffffe000,%ebx
0x0210e902 <do_sys_vm86+162>:   and    %esp,%ebx
0x0210e904 <do_sys_vm86+164>:   mov    0x14(%ebx),%eax
0x0210e907 <do_sys_vm86+167>:   inc    %eax
0x0210e908 <do_sys_vm86+168>:   mov    %eax,0x14(%ebx)
0x0210e90b <do_sys_vm86+171>:   mov    0x10(%ebx),%eax
0x0210e90e <do_sys_vm86+174>:   mov    0x4(%esi),%edx
0x0210e911 <do_sys_vm86+177>:   shl    $0x9,%eax
0x0210e914 <do_sys_vm86+180>:   lea    0x26ff000(%eax),%ecx
0x0210e91a <do_sys_vm86+186>:   lea    0x4c(%edi),%eax
0x0210e91d <do_sys_vm86+189>:   mov    %eax,0x360(%esi)
0x0210e923 <do_sys_vm86+195>:   sub    0x1c(%edx),%eax
0x0210e926 <do_sys_vm86+198>:   add    0x20(%edx),%eax
0x0210e929 <do_sys_vm86+201>:   mov    %eax,0x4(%ecx)
0x0210e92c <do_sys_vm86+204>:   mov    0x25fe52c,%eax
0x0210e931 <do_sys_vm86+209>:   test   $0x800,%eax
0x0210e936 <do_sys_vm86+214>:   je     0x210e942 <do_sys_vm86+226>
0x0210e938 <do_sys_vm86+216>:   movl   $0x0,0x364(%esi)
0x0210e942 <do_sys_vm86+226>:   lea    0x340(%esi),%edx
0x0210e948 <do_sys_vm86+232>:   mov    0x20(%edx),%eax
0x0210e94b <do_sys_vm86+235>:   mov    %eax,0x4(%ecx)
0x0210e94e <do_sys_vm86+238>:   mov    0x10(%ecx),%ax
0x0210e952 <do_sys_vm86+242>:   and    $0xffff,%eax
0x0210e957 <do_sys_vm86+247>:   cmp    0x24(%edx),%eax
0x0210e95a <do_sys_vm86+250>:   jne    0x210e9b0 <do_sys_vm86+336>
0x0210e95c <do_sys_vm86+252>:   mov    0x14(%ebx),%eax
0x0210e95f <do_sys_vm86+255>:   dec    %eax
0x0210e960 <do_sys_vm86+256>:   mov    %eax,0x14(%ebx)
0x0210e963 <do_sys_vm86+259>:   mov    0x8(%ebx),%eax
0x0210e966 <do_sys_vm86+262>:   and    $0x8,%eax
0x0210e969 <do_sys_vm86+265>:   jne    0x210e9a9 <do_sys_vm86+329>
0x0210e96b <do_sys_vm86+267>:   push   $0x255f121
0x0210e970 <do_sys_vm86+272>:   call   0x21285a0 <printk>
0x0210e975 <do_sys_vm86+277>:   mov    0x50(%edi),%eax
0x0210e978 <do_sys_vm86+280>:   mov    %eax,0x5b4(%esi)
0x0210e97e <do_sys_vm86+286>:   pop    %eax
0x0210e97f <do_sys_vm86+287>:   testb  $0x1,0x4c(%edi)
0x0210e983 <do_sys_vm86+291>:   jne    0x210e9a0 <do_sys_vm86+320>
0x0210e985 <do_sys_vm86+293>:   mov    0x4(%esi),%edx
0x0210e988 <do_sys_vm86+296>:   xor    %eax,%eax
0x0210e98a <do_sys_vm86+298>:   mov    %eax,%fs
0x0210e98c <do_sys_vm86+300>:   mov    %eax,%gs
0x0210e98e <do_sys_vm86+302>:   mov    %edi,%esp
0x0210e990 <do_sys_vm86+304>:   mov    %edx,%ebp
0x0210e992 <do_sys_vm86+306>:   jmp    0xfffeb100 <resume_userspace>
0x0210e997 <do_sys_vm86+311>:   pop    %ebx
0x0210e998 <do_sys_vm86+312>:   pop    %esi
0x0210e999 <do_sys_vm86+313>:   pop    %edi
0x0210e99a <do_sys_vm86+314>:   ret
0x0210e99b <do_sys_vm86+315>:   nop
0x0210e99c <do_sys_vm86+316>:   lea    0x0(%esi,1),%esi
0x0210e9a0 <do_sys_vm86+320>:   push   %esi
0x0210e9a1 <do_sys_vm86+321>:   call   0x210e5b0 <mark_screen_rdonly>
0x0210e9a6 <do_sys_vm86+326>:   pop    %eax
0x0210e9a7 <do_sys_vm86+327>:   jmp    0x210e985 <do_sys_vm86+293>
0x0210e9a9 <do_sys_vm86+329>:   call   0x21222d0 <preempt_schedule>
0x0210e9ae <do_sys_vm86+334>:   jmp    0x210e96b <do_sys_vm86+267>
0x0210e9b0 <do_sys_vm86+336>:   mov    0x24(%edx),%ax
0x0210e9b4 <do_sys_vm86+340>:   mov    %ax,0x10(%ecx)
0x0210e9b8 <do_sys_vm86+344>:   mov    $0x174,%ecx
0x0210e9bd <do_sys_vm86+349>:   mov    0x24(%edx),%eax
0x0210e9c0 <do_sys_vm86+352>:   xor    %edx,%edx
0x0210e9c2 <do_sys_vm86+354>:   wrmsr
0x0210e9c4 <do_sys_vm86+356>:   jmp    0x210e95c <do_sys_vm86+252>
0x0210e9c6 <do_sys_vm86+358>:   movl   $0x0,0x5bc(%esi)
0x0210e9d0 <do_sys_vm86+368>:   jmp    0x210e8d8 <do_sys_vm86+120>
0x0210e9d5 <do_sys_vm86+373>:   cmp    $0x4,%eax
0x0210e9d8 <do_sys_vm86+376>:   jne    0x210e8ce <do_sys_vm86+110>
0x0210e9de <do_sys_vm86+382>:   movl   $0x47000,0x5bc(%esi)
0x0210e9e8 <do_sys_vm86+392>:   jmp    0x210e8d8 <do_sys_vm86+120>
0x0210e9ed <do_sys_vm86+397>:   lea    0x0(%esi),%esi
0x0210e9f0 <do_sys_vm86+400>:   movl   $0x7000,0x5bc(%esi)
0x0210e9fa <do_sys_vm86+410>:   jmp    0x210e8d8 <do_sys_vm86+120>

Without the fix:
0x0210e860 <do_sys_vm86+0>:     push   %edi
0x0210e861 <do_sys_vm86+1>:     mov    $0xffffe000,%eax
0x0210e866 <do_sys_vm86+6>:     push   %esi
0x0210e867 <do_sys_vm86+7>:     and    %esp,%eax
0x0210e869 <do_sys_vm86+9>:     push   %ebx
0x0210e86a <do_sys_vm86+10>:    mov    0x10(%esp,1),%edi
0x0210e86e <do_sys_vm86+14>:    mov    0x14(%esp,1),%esi
0x0210e872 <do_sys_vm86+18>:    movl   $0x0,0x1c(%edi)
0x0210e879 <do_sys_vm86+25>:    movl   $0x0,0x20(%edi)
0x0210e880 <do_sys_vm86+32>:    mov    (%eax),%edx
0x0210e882 <do_sys_vm86+34>:    mov    0x30(%edi),%eax
0x0210e885 <do_sys_vm86+37>:    mov    %eax,0x5b8(%edx)
0x0210e88b <do_sys_vm86+43>:    mov    0x30(%edi),%edx
0x0210e88e <do_sys_vm86+46>:    mov    0xbc(%edi),%eax
0x0210e894 <do_sys_vm86+52>:    and    $0xdd5,%edx
0x0210e89a <do_sys_vm86+58>:    mov    %edx,0x30(%edi)
0x0210e89d <do_sys_vm86+61>:    mov    0x30(%eax),%eax
0x0210e8a0 <do_sys_vm86+64>:    and    $0xfffff22a,%eax
0x0210e8a5 <do_sys_vm86+69>:    or     %eax,%edx
0x0210e8a7 <do_sys_vm86+71>:    mov    0x54(%edi),%eax
0x0210e8aa <do_sys_vm86+74>:    or     $0x20000,%edx
0x0210e8b0 <do_sys_vm86+80>:    cmp    $0x3,%eax
0x0210e8b3 <do_sys_vm86+83>:    mov    %edx,0x30(%edi)
0x0210e8b6 <do_sys_vm86+86>:    je     0x210e9e0 <do_sys_vm86+384>
0x0210e8bc <do_sys_vm86+92>:    cmp    $0x3,%eax
0x0210e8bf <do_sys_vm86+95>:    ja     0x210e9c5 <do_sys_vm86+357>
0x0210e8c5 <do_sys_vm86+101>:   cmp    $0x2,%eax
0x0210e8c8 <do_sys_vm86+104>:   je     0x210e9b6 <do_sys_vm86+342>
0x0210e8ce <do_sys_vm86+110>:   movl   $0x247000,0x5bc(%esi)
0x0210e8d8 <do_sys_vm86+120>:   mov    0xbc(%edi),%eax
0x0210e8de <do_sys_vm86+126>:   movl   $0x0,0x18(%eax)
0x0210e8e5 <do_sys_vm86+133>:   mov    0x360(%esi),%eax
0x0210e8eb <do_sys_vm86+139>:   mov    %eax,0x5c0(%esi)
0x0210e8f1 <do_sys_vm86+145>:   movl   %fs,0x5c4(%esi)
0x0210e8f7 <do_sys_vm86+151>:   movl   %gs,0x5c8(%esi)
0x0210e8fd <do_sys_vm86+157>:   mov    $0xffffe000,%ebx
0x0210e902 <do_sys_vm86+162>:   and    %esp,%ebx
0x0210e904 <do_sys_vm86+164>:   mov    0x14(%ebx),%eax
0x0210e907 <do_sys_vm86+167>:   inc    %eax
0x0210e908 <do_sys_vm86+168>:   mov    %eax,0x14(%ebx)
0x0210e90b <do_sys_vm86+171>:   mov    0x10(%ebx),%eax
0x0210e90e <do_sys_vm86+174>:   mov    0x4(%esi),%edx
0x0210e911 <do_sys_vm86+177>:   shl    $0x9,%eax
0x0210e914 <do_sys_vm86+180>:   lea    0x26ff000(%eax),%ecx
0x0210e91a <do_sys_vm86+186>:   lea    0x4c(%edi),%eax
0x0210e91d <do_sys_vm86+189>:   mov    %eax,0x360(%esi)
0x0210e923 <do_sys_vm86+195>:   sub    0x1c(%edx),%eax
0x0210e926 <do_sys_vm86+198>:   add    0x20(%edx),%eax
0x0210e929 <do_sys_vm86+201>:   mov    %eax,0x4(%ecx)
0x0210e92c <do_sys_vm86+204>:   mov    0x25fe52c,%eax
0x0210e931 <do_sys_vm86+209>:   test   $0x800,%eax
0x0210e936 <do_sys_vm86+214>:   je     0x210e942 <do_sys_vm86+226>
0x0210e938 <do_sys_vm86+216>:   movl   $0x0,0x364(%esi)
0x0210e942 <do_sys_vm86+226>:   lea    0x340(%esi),%edx
0x0210e948 <do_sys_vm86+232>:   mov    0x20(%edx),%eax
0x0210e94b <do_sys_vm86+235>:   mov    %eax,0x4(%ecx)
0x0210e94e <do_sys_vm86+238>:   mov    0x10(%ecx),%ax
0x0210e952 <do_sys_vm86+242>:   and    $0xffff,%eax
0x0210e957 <do_sys_vm86+247>:   cmp    0x24(%edx),%eax
0x0210e95a <do_sys_vm86+250>:   jne    0x210e9a0 <do_sys_vm86+320>
0x0210e95c <do_sys_vm86+252>:   mov    0x14(%ebx),%eax
0x0210e95f <do_sys_vm86+255>:   dec    %eax
0x0210e960 <do_sys_vm86+256>:   mov    %eax,0x14(%ebx)
0x0210e963 <do_sys_vm86+259>:   mov    0x8(%ebx),%eax
0x0210e966 <do_sys_vm86+262>:   and    $0x8,%eax
0x0210e969 <do_sys_vm86+265>:   jne    0x210e999 <do_sys_vm86+313>
0x0210e96b <do_sys_vm86+267>:   mov    0x50(%edi),%eax
0x0210e96e <do_sys_vm86+270>:   mov    %eax,0x5b4(%esi)
0x0210e974 <do_sys_vm86+276>:   testb  $0x1,0x4c(%edi)
0x0210e978 <do_sys_vm86+280>:   jne    0x210e990 <do_sys_vm86+304>
0x0210e97a <do_sys_vm86+282>:   mov    0x4(%esi),%edx
0x0210e97d <do_sys_vm86+285>:   xor    %eax,%eax
0x0210e97f <do_sys_vm86+287>:   mov    %eax,%fs
0x0210e981 <do_sys_vm86+289>:   mov    %eax,%gs
0x0210e983 <do_sys_vm86+291>:   mov    %edi,%esp
0x0210e985 <do_sys_vm86+293>:   mov    %edx,%ebp
0x0210e987 <do_sys_vm86+295>:   jmp    0xfffeb100 <resume_userspace>
0x0210e98c <do_sys_vm86+300>:   pop    %ebx
0x0210e98d <do_sys_vm86+301>:   pop    %esi
0x0210e98e <do_sys_vm86+302>:   pop    %edi
0x0210e98f <do_sys_vm86+303>:   ret
0x0210e990 <do_sys_vm86+304>:   push   %esi
0x0210e991 <do_sys_vm86+305>:   call   0x210e5b0 <mark_screen_rdonly>
0x0210e996 <do_sys_vm86+310>:   pop    %eax
0x0210e997 <do_sys_vm86+311>:   jmp    0x210e97a <do_sys_vm86+282>
0x0210e999 <do_sys_vm86+313>:   call   0x21222c0 <preempt_schedule>
0x0210e99e <do_sys_vm86+318>:   jmp    0x210e96b <do_sys_vm86+267>
0x0210e9a0 <do_sys_vm86+320>:   mov    0x24(%edx),%ax
0x0210e9a4 <do_sys_vm86+324>:   mov    %ax,0x10(%ecx)
0x0210e9a8 <do_sys_vm86+328>:   mov    $0x174,%ecx
0x0210e9ad <do_sys_vm86+333>:   mov    0x24(%edx),%eax
0x0210e9b0 <do_sys_vm86+336>:   xor    %edx,%edx
0x0210e9b2 <do_sys_vm86+338>:   wrmsr
0x0210e9b4 <do_sys_vm86+340>:   jmp    0x210e95c <do_sys_vm86+252>
0x0210e9b6 <do_sys_vm86+342>:   movl   $0x0,0x5bc(%esi)
0x0210e9c0 <do_sys_vm86+352>:   jmp    0x210e8d8 <do_sys_vm86+120>
0x0210e9c5 <do_sys_vm86+357>:   cmp    $0x4,%eax
0x0210e9c8 <do_sys_vm86+360>:   jne    0x210e8ce <do_sys_vm86+110>
0x0210e9ce <do_sys_vm86+366>:   movl   $0x47000,0x5bc(%esi)
0x0210e9d8 <do_sys_vm86+376>:   jmp    0x210e8d8 <do_sys_vm86+120>
0x0210e9dd <do_sys_vm86+381>:   lea    0x0(%esi),%esi
0x0210e9e0 <do_sys_vm86+384>:   movl   $0x7000,0x5bc(%esi)
0x0210e9ea <do_sys_vm86+394>:   jmp    0x210e8d8 <do_sys_vm86+120>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-17 23:01                 ` Zwane Mwaikambo
@ 2003-11-17 23:14                   ` Zwane Mwaikambo
  2003-11-18  7:21                     ` Zwane Mwaikambo
  0 siblings, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-17 23:14 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel,
	linux-mm, Hugh Dickins

On Mon, 17 Nov 2003, Zwane Mwaikambo wrote:

> On Mon, 17 Nov 2003, Linus Torvalds wrote:
> 
> > What's the generated assembly language for this function with and without 
> > the "fix"?
> > 
> > If adding that printk fixes a triple fault, the issue is not likely to be 
> > the printk itself as much as the difference in code that the compiler 
> > generates - stack frame, memory re-ordering etc...
> 
> This would be my 'trusty' gcc 3.2.2 from Red Hat 9
> (gcc version 3.2.2 20030222 (Red Hat Linux 3.2.2-5))

A little bird told me to send diffs... but there is a lot of noise due to 
offsets, I'm afraid.

--- buggy	2003-11-17 18:09:35.302964248 -0500
+++ works	2003-11-17 18:09:47.744072912 -0500
@@ -21,11 +21,11 @@
 0x0210e8aa <do_sys_vm86+74>:    or     $0x20000,%edx
 0x0210e8b0 <do_sys_vm86+80>:    cmp    $0x3,%eax
 0x0210e8b3 <do_sys_vm86+83>:    mov    %edx,0x30(%edi)
-0x0210e8b6 <do_sys_vm86+86>:    je     0x210e9e0 <do_sys_vm86+384>
+0x0210e8b6 <do_sys_vm86+86>:    je     0x210e9f0 <do_sys_vm86+400>
 0x0210e8bc <do_sys_vm86+92>:    cmp    $0x3,%eax
-0x0210e8bf <do_sys_vm86+95>:    ja     0x210e9c5 <do_sys_vm86+357>
+0x0210e8bf <do_sys_vm86+95>:    ja     0x210e9d5 <do_sys_vm86+373>
 0x0210e8c5 <do_sys_vm86+101>:   cmp    $0x2,%eax
-0x0210e8c8 <do_sys_vm86+104>:   je     0x210e9b6 <do_sys_vm86+342>
+0x0210e8c8 <do_sys_vm86+104>:   je     0x210e9c6 <do_sys_vm86+358>
 0x0210e8ce <do_sys_vm86+110>:   movl   $0x247000,0x5bc(%esi)
 0x0210e8d8 <do_sys_vm86+120>:   mov    0xbc(%edi),%eax
 0x0210e8de <do_sys_vm86+126>:   movl   $0x0,0x18(%eax)
@@ -57,47 +57,52 @@
 0x0210e94e <do_sys_vm86+238>:   mov    0x10(%ecx),%ax
 0x0210e952 <do_sys_vm86+242>:   and    $0xffff,%eax
 0x0210e957 <do_sys_vm86+247>:   cmp    0x24(%edx),%eax
-0x0210e95a <do_sys_vm86+250>:   jne    0x210e9a0 <do_sys_vm86+320>
+0x0210e95a <do_sys_vm86+250>:   jne    0x210e9b0 <do_sys_vm86+336>
 0x0210e95c <do_sys_vm86+252>:   mov    0x14(%ebx),%eax
 0x0210e95f <do_sys_vm86+255>:   dec    %eax
 0x0210e960 <do_sys_vm86+256>:   mov    %eax,0x14(%ebx)
 0x0210e963 <do_sys_vm86+259>:   mov    0x8(%ebx),%eax
 0x0210e966 <do_sys_vm86+262>:   and    $0x8,%eax
-0x0210e969 <do_sys_vm86+265>:   jne    0x210e999 <do_sys_vm86+313>
-0x0210e96b <do_sys_vm86+267>:   mov    0x50(%edi),%eax
-0x0210e96e <do_sys_vm86+270>:   mov    %eax,0x5b4(%esi)
-0x0210e974 <do_sys_vm86+276>:   testb  $0x1,0x4c(%edi)
-0x0210e978 <do_sys_vm86+280>:   jne    0x210e990 <do_sys_vm86+304>
-0x0210e97a <do_sys_vm86+282>:   mov    0x4(%esi),%edx
-0x0210e97d <do_sys_vm86+285>:   xor    %eax,%eax
-0x0210e97f <do_sys_vm86+287>:   mov    %eax,%fs
-0x0210e981 <do_sys_vm86+289>:   mov    %eax,%gs
-0x0210e983 <do_sys_vm86+291>:   mov    %edi,%esp
-0x0210e985 <do_sys_vm86+293>:   mov    %edx,%ebp
-0x0210e987 <do_sys_vm86+295>:   jmp    0xfffeb100 <resume_userspace>
-0x0210e98c <do_sys_vm86+300>:   pop    %ebx
-0x0210e98d <do_sys_vm86+301>:   pop    %esi
-0x0210e98e <do_sys_vm86+302>:   pop    %edi
-0x0210e98f <do_sys_vm86+303>:   ret
-0x0210e990 <do_sys_vm86+304>:   push   %esi
-0x0210e991 <do_sys_vm86+305>:   call   0x210e5b0 <mark_screen_rdonly>
-0x0210e996 <do_sys_vm86+310>:   pop    %eax
-0x0210e997 <do_sys_vm86+311>:   jmp    0x210e97a <do_sys_vm86+282>
-0x0210e999 <do_sys_vm86+313>:   call   0x21222c0 <preempt_schedule>
-0x0210e99e <do_sys_vm86+318>:   jmp    0x210e96b <do_sys_vm86+267>
-0x0210e9a0 <do_sys_vm86+320>:   mov    0x24(%edx),%ax
-0x0210e9a4 <do_sys_vm86+324>:   mov    %ax,0x10(%ecx)
-0x0210e9a8 <do_sys_vm86+328>:   mov    $0x174,%ecx
-0x0210e9ad <do_sys_vm86+333>:   mov    0x24(%edx),%eax
-0x0210e9b0 <do_sys_vm86+336>:   xor    %edx,%edx
-0x0210e9b2 <do_sys_vm86+338>:   wrmsr
-0x0210e9b4 <do_sys_vm86+340>:   jmp    0x210e95c <do_sys_vm86+252>
-0x0210e9b6 <do_sys_vm86+342>:   movl   $0x0,0x5bc(%esi)
-0x0210e9c0 <do_sys_vm86+352>:   jmp    0x210e8d8 <do_sys_vm86+120>
-0x0210e9c5 <do_sys_vm86+357>:   cmp    $0x4,%eax
-0x0210e9c8 <do_sys_vm86+360>:   jne    0x210e8ce <do_sys_vm86+110>
-0x0210e9ce <do_sys_vm86+366>:   movl   $0x47000,0x5bc(%esi)
-0x0210e9d8 <do_sys_vm86+376>:   jmp    0x210e8d8 <do_sys_vm86+120>
-0x0210e9dd <do_sys_vm86+381>:   lea    0x0(%esi),%esi
-0x0210e9e0 <do_sys_vm86+384>:   movl   $0x7000,0x5bc(%esi)
-0x0210e9ea <do_sys_vm86+394>:   jmp    0x210e8d8 <do_sys_vm86+120>
+0x0210e969 <do_sys_vm86+265>:   jne    0x210e9a9 <do_sys_vm86+329>
+0x0210e96b <do_sys_vm86+267>:   push   $0x255f121
+0x0210e970 <do_sys_vm86+272>:   call   0x21285a0 <printk>
+0x0210e975 <do_sys_vm86+277>:   mov    0x50(%edi),%eax
+0x0210e978 <do_sys_vm86+280>:   mov    %eax,0x5b4(%esi)
+0x0210e97e <do_sys_vm86+286>:   pop    %eax
+0x0210e97f <do_sys_vm86+287>:   testb  $0x1,0x4c(%edi)
+0x0210e983 <do_sys_vm86+291>:   jne    0x210e9a0 <do_sys_vm86+320>
+0x0210e985 <do_sys_vm86+293>:   mov    0x4(%esi),%edx
+0x0210e988 <do_sys_vm86+296>:   xor    %eax,%eax
+0x0210e98a <do_sys_vm86+298>:   mov    %eax,%fs
+0x0210e98c <do_sys_vm86+300>:   mov    %eax,%gs
+0x0210e98e <do_sys_vm86+302>:   mov    %edi,%esp
+0x0210e990 <do_sys_vm86+304>:   mov    %edx,%ebp
+0x0210e992 <do_sys_vm86+306>:   jmp    0xfffeb100 <resume_userspace>
+0x0210e997 <do_sys_vm86+311>:   pop    %ebx
+0x0210e998 <do_sys_vm86+312>:   pop    %esi
+0x0210e999 <do_sys_vm86+313>:   pop    %edi
+0x0210e99a <do_sys_vm86+314>:   ret
+0x0210e99b <do_sys_vm86+315>:   nop
+0x0210e99c <do_sys_vm86+316>:   lea    0x0(%esi,1),%esi
+0x0210e9a0 <do_sys_vm86+320>:   push   %esi
+0x0210e9a1 <do_sys_vm86+321>:   call   0x210e5b0 <mark_screen_rdonly>
+0x0210e9a6 <do_sys_vm86+326>:   pop    %eax
+0x0210e9a7 <do_sys_vm86+327>:   jmp    0x210e985 <do_sys_vm86+293>
+0x0210e9a9 <do_sys_vm86+329>:   call   0x21222d0 <preempt_schedule>
+0x0210e9ae <do_sys_vm86+334>:   jmp    0x210e96b <do_sys_vm86+267>
+0x0210e9b0 <do_sys_vm86+336>:   mov    0x24(%edx),%ax
+0x0210e9b4 <do_sys_vm86+340>:   mov    %ax,0x10(%ecx)
+0x0210e9b8 <do_sys_vm86+344>:   mov    $0x174,%ecx
+0x0210e9bd <do_sys_vm86+349>:   mov    0x24(%edx),%eax
+0x0210e9c0 <do_sys_vm86+352>:   xor    %edx,%edx
+0x0210e9c2 <do_sys_vm86+354>:   wrmsr
+0x0210e9c4 <do_sys_vm86+356>:   jmp    0x210e95c <do_sys_vm86+252>
+0x0210e9c6 <do_sys_vm86+358>:   movl   $0x0,0x5bc(%esi)
+0x0210e9d0 <do_sys_vm86+368>:   jmp    0x210e8d8 <do_sys_vm86+120>
+0x0210e9d5 <do_sys_vm86+373>:   cmp    $0x4,%eax
+0x0210e9d8 <do_sys_vm86+376>:   jne    0x210e8ce <do_sys_vm86+110>
+0x0210e9de <do_sys_vm86+382>:   movl   $0x47000,0x5bc(%esi)
+0x0210e9e8 <do_sys_vm86+392>:   jmp    0x210e8d8 <do_sys_vm86+120>
+0x0210e9ed <do_sys_vm86+397>:   lea    0x0(%esi),%esi
+0x0210e9f0 <do_sys_vm86+400>:   movl   $0x7000,0x5bc(%esi)
+0x0210e9fa <do_sys_vm86+410>:   jmp    0x210e8d8 <do_sys_vm86+120>

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-17  5:25   ` Suparna Bhattacharya
@ 2003-11-18  1:15     ` Daniel McNeil
  2003-11-18  1:37       ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-18  1:15 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Suparna,

Good news and bad news.  Your patch does fix the non-power of two i/o
size problems where AIO previously did not complete:

$ ./aiodio_sparse  -s 1751k -r 18k -w 11k
$ aiodio_sparse -i 9 -dd -s 180k -r 18k -w 18k  
io_submit() return 9
aiodio_sparse: 9 i/o in flight
aiodio_sparse: offset 165888 filesize 184320 inflight 9
aiodio_sparse: io_getevent() returned 1
aiodio_sparse: io_getevent() res 18432 res2 0
io_submit() return 1
AIO DIO write done unlinking file
dio_sparse done writing, kill children
aiodio_sparse 0 children had errors

But when using aiocp with O_DIRECT to copy a file to
an already allocated file, the aiocp process hangs.  I used an i/o
size of 4k and that completed.  Using i/o sizes of 1k and 2k,
the aiocp processes hung during io_submit() and are unkillable.
Here are the stack traces:

# ps -fu daniel | grep aiocp
daniel    1920     1  0 16:45 ?        00:00:07 aiocp -b 1k -n 1 -f DIRECT glibc-2.3.2.tar ff2
daniel    2083  2037  0 17:00 pts/2    00:00:03 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2


aiocp         D 00000001  1920      1                1902 (NOTLB)
e70abd04 00200086 c18dbc80 00000001 00000003 c02897fc 00000060 00200246
       f7cdb8b4 c16522f0 c18dbc80 0000309c 640a05eb 0000008b e6d9e660 c0289a16
       f7cdb8b4 e87e95cc c18dbc80 00000000 00000001 e70abd10 c0123712 e70aa000
Call Trace:
 [<c02897fc>] generic_unplug_device+0x50/0xbd
 [<c0289a16>] blk_run_queues+0xa9/0x15c
 [<c0123712>] io_schedule+0x26/0x30
 [<c0192242>] direct_io_worker+0x376/0x5ab
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
 [<c0121b70>] schedule+0x3ac/0x7ef
 [<c0145f48>] generic_file_aio_read+0x33/0x37
 [<c0194ad3>] aio_pread+0x34/0x5f
 [<c0193bec>] aio_run_iocb+0xa6/0x1ed
 [<c019316f>] __aio_get_req+0x27/0x158
 [<c0194a9f>] aio_pread+0x0/0x5f
 [<c0194f62>] io_submit_one+0x1ea/0x2b7
 [<c0195110>] sys_io_submit+0xe1/0x194
 [<c03c29a7>] syscall_call+0x7/0xb
 [<c03c007b>] rpc_depopulate+0x1aa/0x24b


aiocp         D 366EDC94  2083   2037                     (NOTLB)
e758bd04 00200082 f71ba000 366edc94 00000161 c02897fc 00000060 366edc94
       00000161 f71ba000 c18d3c80 000069a9 366f5a0e 00000161 e8d4acc0 c0289a16
       f7cdb8b4 e960465c c18d3c80 00000000 00000001 e758bd10 c0123712 e758a000
Call Trace:
 [<c02897fc>] generic_unplug_device+0x50/0xbd
 [<c0289a16>] blk_run_queues+0xa9/0x15c
 [<c0123712>] io_schedule+0x26/0x30
 [<c0192242>] direct_io_worker+0x376/0x5ab
 [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
 [<c0259d3e>] write_chan+0x165/0x21e
 [<c0145f48>] generic_file_aio_read+0x33/0x37
 [<c0194ad3>] aio_pread+0x34/0x5f
 [<c0193bec>] aio_run_iocb+0xa6/0x1ed
 [<c019316f>] __aio_get_req+0x27/0x158
 [<c0194a9f>] aio_pread+0x0/0x5f
 [<c02532ab>] tty_write+0x1e8/0x3b2
 [<c0194f62>] io_submit_one+0x1ea/0x2b7
 [<c0195110>] sys_io_submit+0xe1/0x194
 [<c03c29a7>] syscall_call+0x7/0xb
 [<c03c007b>] rpc_depopulate+0x1aa/0x24b



Daniel

On Sun, 2003-11-16 at 21:25, Suparna Bhattacharya wrote:
> On Thu, Nov 13, 2003 at 02:03:58PM -0800, Daniel McNeil wrote:
> > Andrew,
> > 
> > I'm testing test9-mm3 on a 2-proc Xeon with an ext3 file system.
> > I tested using the test programs aiocp and aiodio_sparse.
> > (see http://developer.osdl.org/daniel/AIO/)
> > 
> > Using aiocp with i/o sizes from 1k to 512k to copy files worked
> > without any errors or kernel debug messages.
> > 
> > With 64k i/o, the aiodio_sparse program completes without any errors.
> > There are no kernel error messages, so that is good.
> > 
> > There are still problems with non power of 2 i/o sizes using AIO and
> > O_DIRECT.  It hangs with aio's that do not seem to complete.  The test
> > does exit when hitting ^c and there are no kernel messages.  Test output
> > below:
> 
> Could you check if the following patch fixes the problem for you ?
> 
> Regards
> Suparna
> 
> --------------------------------------------------------------
> 
> With this patch, when the DIO code falls back to buffered i/o after
> having submitted part of the i/o, then buffered i/o is issued only
> for the remaining part of the request (i.e. the part not already 
> covered by DIO).
> 
> diff -ur pure-mm3/fs/direct-io.c linux-2.6.0-test9-mm3/fs/direct-io.c
> --- pure-mm3/fs/direct-io.c	2003-11-14 09:09:06.000000000 +0530
> +++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-17 09:00:47.000000000 +0530
> @@ -74,6 +74,7 @@
>  					   been performed at the start of a
>  					   write */
>  	int pages_in_io;		/* approximate total IO pages */
> +	size_t	size;			/* total request size (doesn't change)*/
>  	sector_t block_in_file;		/* Current offset into the underlying
>  					   file in dio_block units. */
>  	unsigned blocks_available;	/* At block_in_file.  changes */
> @@ -226,7 +227,7 @@
>  			dio_complete(dio, dio->block_in_file << dio->blkbits,
>  					dio->result);
>  			/* Complete AIO later if falling back to buffered i/o */
> -			if (dio->result != -ENOTBLK) {
> +			if (dio->result >= dio->size || dio->rw == READ) {
>  				aio_complete(dio->iocb, dio->result, 0);
>  				kfree(dio);
>  			} else {
> @@ -889,6 +890,7 @@
>  	dio->blkbits = blkbits;
>  	dio->blkfactor = inode->i_blkbits - blkbits;
>  	dio->start_zero_done = 0;
> +	dio->size = 0;
>  	dio->block_in_file = offset >> blkbits;
>  	dio->blocks_available = 0;
>  	dio->cur_page = NULL;
> @@ -925,7 +927,7 @@
>  
>  	for (seg = 0; seg < nr_segs; seg++) {
>  		user_addr = (unsigned long)iov[seg].iov_base;
> -		bytes = iov[seg].iov_len;
> +		dio->size += bytes = iov[seg].iov_len;
>  
>  		/* Index into the first page of the first block */
>  		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
> @@ -956,6 +958,13 @@
>  		}
>  	} /* end iovec loop */
>  
> +	if (ret == -ENOTBLK && rw == WRITE) {
> +		/*
> +		 * The remaining part of the request will be
> +		 * handled by buffered I/O when we return
> +		 */
> +		ret = 0;
> +	}
>  	/*
>  	 * There may be some unwritten disk at the end of a part-written
>  	 * fs-block-sized block.  Go zero that now.
> @@ -986,19 +995,13 @@
>  	 */
>  	if (dio->is_async) {
>  		if (ret == 0)
> -			ret = dio->result;	/* Bytes written */
> -		if (ret == -ENOTBLK) {
> -			/*
> -			 * The request will be reissued via buffered I/O
> -			 * when we return; Any I/O already issued
> -			 * effectively becomes redundant.
> -			 */
> -			dio->result = ret;
> +			ret = dio->result;
> +		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
>  			dio->waiter = current;
>  		}
>  		finished_one_bio(dio);		/* This can free the dio */
>  		blk_run_queues();
> -		if (ret == -ENOTBLK) {
> +		if (dio->waiter) {
>  			/*
>  			 * Wait for already issued I/O to drain out and
>  			 * release its references to user-space pages
> @@ -1032,7 +1035,8 @@
>  		}
>  		dio_complete(dio, offset, ret);
>  		/* We could have also come here on an AIO file extend */
> -		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
> +		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 && 
> +			dio->result < dio->size))
>  			aio_complete(iocb, ret, 0);
>  		kfree(dio);
>  	}
> diff -ur pure-mm3/mm/filemap.c linux-2.6.0-test9-mm3/mm/filemap.c
> --- pure-mm3/mm/filemap.c	2003-11-14 09:15:08.000000000 +0530
> +++ linux-2.6.0-test9-mm3/mm/filemap.c	2003-11-15 11:11:16.000000000 +0530
> @@ -1895,14 +1895,16 @@
>  		 */
>  		if (written >= 0 && file->f_flags & O_SYNC)
>  			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
> -		if (written >= 0 && !is_sync_kiocb(iocb))
> +		if (written >= count && !is_sync_kiocb(iocb))
>  			written = -EIOCBQUEUED;
> -		if (written != -ENOTBLK)
> +		if (written < 0 || written >= count)
>  			goto out_status;
>  		/*
>  		 * direct-io write to a hole: fall through to buffered I/O
> +		 * for completing the rest of the request.
>  		 */
> -		written = 0;
> +		pos += written;
> +		count -= written;
>  	}
>  
>  	buf = iov->iov_base;
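
For anyone reading along, the mm/filemap.c hunk quoted above boils down to a
small piece of arithmetic: if the direct I/O attempt wrote only part of the
request, buffered I/O picks up at pos + written for count - written bytes.
Below is a minimal userspace sketch of just that arithmetic. The struct and
function names are hypothetical and do not appear in the patch; only the
two conditions and the pos/count adjustment are taken from the quoted diff.

```c
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical type (not from the patch): the range left for buffered I/O. */
struct fallback_range {
	off_t  pos;	/* file offset at which buffered I/O would resume */
	size_t count;	/* bytes still to be written; 0 means no fallback */
};

/*
 * 'written' stands for the result of the direct I/O attempt: bytes
 * written, or a negative errno.  This mirrors the patched logic:
 *
 *	if (written < 0 || written >= count)
 *		goto out_status;
 *	pos += written;
 *	count -= written;
 */
static struct fallback_range
remaining_after_dio(off_t pos, size_t count, ssize_t written)
{
	struct fallback_range r = { pos, 0 };

	/* Error, or the whole request was covered by direct I/O. */
	if (written < 0 || (size_t)written >= count)
		return r;

	/* Partial direct I/O: hand only the tail to buffered I/O. */
	r.pos = pos + written;
	r.count = count - (size_t)written;
	return r;
}
```

With an 18k request at offset 4096 of which direct I/O covered 8k, this gives
a buffered tail of 10240 bytes starting at offset 12288.  Before the patch the
same case restarted from written = 0, i.e. the whole request was reissued.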


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-18  1:15     ` Daniel McNeil
@ 2003-11-18  1:37       ` Daniel McNeil
  2003-11-18 11:55         ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-18  1:37 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Obviously, the ps output in my previous email showed that the hangs were
with 1k i/o sizes.  

More testing using 2k, 4k, 16k, 32k, 64k, 128k, 256k and 512k all
completed correctly.

Even 11k and 17k worked.

$ ls -l
-rw-------    1 daniel   daniel   88289280 Jun  9 16:54 glibc-2.3.2.tar
-rw-rw-r--    1 daniel   daniel   88289280 Nov 17 17:32 ff2


So, only 1k is hanging so far.

Daniel

On Mon, 2003-11-17 at 17:15, Daniel McNeil wrote:
> Suparna,
> 
> Good news and bad news.  Your patch does fix the non-power of two i/o
> size problems where AIO previously did not complete:
> 
> $ ./aiodio_sparse  -s 1751k -r 18k -w 11k
> $ aiodio_sparse -i 9 -dd -s 180k -r 18k -w 18k  
> io_submit() return 9
> aiodio_sparse: 9 i/o in flight
> aiodio_sparse: offset 165888 filesize 184320 inflight 9
> aiodio_sparse: io_getevent() returned 1
> aiodio_sparse: io_getevent() res 18432 res2 0
> io_submit() return 1
> AIO DIO write done unlinking file
> dio_sparse done writing, kill children
> aiodio_sparse 0 children had errors
> 
> But when using aiocp with O_DIRECT to copy a file to
> an already allocated file, the aiocp process hangs.  I used an i/o
> size of 4k and that completed.  Using i/o sizes of 1k and 2k,
> the aiocp processes hung during io_submit() and are unkillable.
> Here are the stack traces:
> 
> # ps -fu daniel | grep aiocp
> daniel    1920     1  0 16:45 ?        00:00:07 aiocp -b 1k -n 1 -f DIRECT glibc-2.3.2.tar ff2
> daniel    2083  2037  0 17:00 pts/2    00:00:03 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2
> 
> 
> aiocp         D 00000001  1920      1                1902 (NOTLB)
> e70abd04 00200086 c18dbc80 00000001 00000003 c02897fc 00000060 00200246
>        f7cdb8b4 c16522f0 c18dbc80 0000309c 640a05eb 0000008b e6d9e660 c0289a16
>        f7cdb8b4 e87e95cc c18dbc80 00000000 00000001 e70abd10 c0123712 e70aa000
> Call Trace:
>  [<c02897fc>] generic_unplug_device+0x50/0xbd
>  [<c0289a16>] blk_run_queues+0xa9/0x15c
>  [<c0123712>] io_schedule+0x26/0x30
>  [<c0192242>] direct_io_worker+0x376/0x5ab
>  [<c014840f>] generic_file_direct_IO+0x70/0x89
>  [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
>  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
>  [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
>  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
>  [<c014840f>] generic_file_direct_IO+0x70/0x89
>  [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
>  [<c0121b70>] schedule+0x3ac/0x7ef
>  [<c0145f48>] generic_file_aio_read+0x33/0x37
>  [<c0194ad3>] aio_pread+0x34/0x5f
>  [<c0193bec>] aio_run_iocb+0xa6/0x1ed
>  [<c019316f>] __aio_get_req+0x27/0x158
>  [<c0194a9f>] aio_pread+0x0/0x5f
>  [<c0194f62>] io_submit_one+0x1ea/0x2b7
>  [<c0195110>] sys_io_submit+0xe1/0x194
>  [<c03c29a7>] syscall_call+0x7/0xb
>  [<c03c007b>] rpc_depopulate+0x1aa/0x24b
> 
> 
> aiocp         D 366EDC94  2083   2037                     (NOTLB)
> e758bd04 00200082 f71ba000 366edc94 00000161 c02897fc 00000060 366edc94
>        00000161 f71ba000 c18d3c80 000069a9 366f5a0e 00000161 e8d4acc0 c0289a16
>        f7cdb8b4 e960465c c18d3c80 00000000 00000001 e758bd10 c0123712 e758a000
> Call Trace:
>  [<c02897fc>] generic_unplug_device+0x50/0xbd
>  [<c0289a16>] blk_run_queues+0xa9/0x15c
>  [<c0123712>] io_schedule+0x26/0x30
>  [<c0192242>] direct_io_worker+0x376/0x5ab
>  [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
>  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
>  [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
>  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
>  [<c014840f>] generic_file_direct_IO+0x70/0x89
>  [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
>  [<c0259d3e>] write_chan+0x165/0x21e
>  [<c0145f48>] generic_file_aio_read+0x33/0x37
>  [<c0194ad3>] aio_pread+0x34/0x5f
>  [<c0193bec>] aio_run_iocb+0xa6/0x1ed
>  [<c019316f>] __aio_get_req+0x27/0x158
>  [<c0194a9f>] aio_pread+0x0/0x5f
>  [<c02532ab>] tty_write+0x1e8/0x3b2
>  [<c0194f62>] io_submit_one+0x1ea/0x2b7
>  [<c0195110>] sys_io_submit+0xe1/0x194
>  [<c03c29a7>] syscall_call+0x7/0xb
>  [<c03c007b>] rpc_depopulate+0x1aa/0x24b
> 
> 
> 
> Daniel
> 
> On Sun, 2003-11-16 at 21:25, Suparna Bhattacharya wrote:
> > On Thu, Nov 13, 2003 at 02:03:58PM -0800, Daniel McNeil wrote:
> > > Andrew,
> > > 
> > > I'm testing test9-mm3 on a 2-proc Xeon with an ext3 file system.
> > > I tested using the test programs aiocp and aiodio_sparse.
> > > (see http://developer.osdl.org/daniel/AIO/)
> > > 
> > > Using aiocp with i/o sizes from 1k to 512k to copy files worked
> > > without any errors or kernel debug messages.
> > > 
> > > With 64k i/o, the aiodio_sparse program completes without any errors.
> > > There are no kernel error messages, so that is good.
> > > 
> > > There are still problems with non power of 2 i/o sizes using AIO and
> > > O_DIRECT.  It hangs with aio's that do not seem to complete.  The test
> > > does exit when hitting ^c and there are no kernel messages.  Test output
> > > below:
> > 
> > Could you check if the following patch fixes the problem for you ?
> > 
> > Regards
> > Suparna
> > 
> > --------------------------------------------------------------
> > 
> > With this patch, when the DIO code falls back to buffered i/o after
> > having submitted part of the i/o, then buffered i/o is issued only
> > for the remaining part of the request (i.e. the part not already 
> > covered by DIO).
> > 
> > diff -ur pure-mm3/fs/direct-io.c linux-2.6.0-test9-mm3/fs/direct-io.c
> > --- pure-mm3/fs/direct-io.c	2003-11-14 09:09:06.000000000 +0530
> > +++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-17 09:00:47.000000000 +0530
> > @@ -74,6 +74,7 @@
> >  					   been performed at the start of a
> >  					   write */
> >  	int pages_in_io;		/* approximate total IO pages */
> > +	size_t	size;			/* total request size (doesn't change)*/
> >  	sector_t block_in_file;		/* Current offset into the underlying
> >  					   file in dio_block units. */
> >  	unsigned blocks_available;	/* At block_in_file.  changes */
> > @@ -226,7 +227,7 @@
> >  			dio_complete(dio, dio->block_in_file << dio->blkbits,
> >  					dio->result);
> >  			/* Complete AIO later if falling back to buffered i/o */
> > -			if (dio->result != -ENOTBLK) {
> > +			if (dio->result >= dio->size || dio->rw == READ) {
> >  				aio_complete(dio->iocb, dio->result, 0);
> >  				kfree(dio);
> >  			} else {
> > @@ -889,6 +890,7 @@
> >  	dio->blkbits = blkbits;
> >  	dio->blkfactor = inode->i_blkbits - blkbits;
> >  	dio->start_zero_done = 0;
> > +	dio->size = 0;
> >  	dio->block_in_file = offset >> blkbits;
> >  	dio->blocks_available = 0;
> >  	dio->cur_page = NULL;
> > @@ -925,7 +927,7 @@
> >  
> >  	for (seg = 0; seg < nr_segs; seg++) {
> >  		user_addr = (unsigned long)iov[seg].iov_base;
> > -		bytes = iov[seg].iov_len;
> > +		dio->size += bytes = iov[seg].iov_len;
> >  
> >  		/* Index into the first page of the first block */
> >  		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
> > @@ -956,6 +958,13 @@
> >  		}
> >  	} /* end iovec loop */
> >  
> > +	if (ret == -ENOTBLK && rw == WRITE) {
> > +		/*
> > +		 * The remaining part of the request will be
> > +		 * handled by buffered I/O when we return
> > +		 */
> > +		ret = 0;
> > +	}
> >  	/*
> >  	 * There may be some unwritten disk at the end of a part-written
> >  	 * fs-block-sized block.  Go zero that now.
> > @@ -986,19 +995,13 @@
> >  	 */
> >  	if (dio->is_async) {
> >  		if (ret == 0)
> > -			ret = dio->result;	/* Bytes written */
> > -		if (ret == -ENOTBLK) {
> > -			/*
> > -			 * The request will be reissued via buffered I/O
> > -			 * when we return; Any I/O already issued
> > -			 * effectively becomes redundant.
> > -			 */
> > -			dio->result = ret;
> > +			ret = dio->result;
> > +		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
> >  			dio->waiter = current;
> >  		}
> >  		finished_one_bio(dio);		/* This can free the dio */
> >  		blk_run_queues();
> > -		if (ret == -ENOTBLK) {
> > +		if (dio->waiter) {
> >  			/*
> >  			 * Wait for already issued I/O to drain out and
> >  			 * release its references to user-space pages
> > @@ -1032,7 +1035,8 @@
> >  		}
> >  		dio_complete(dio, offset, ret);
> >  		/* We could have also come here on an AIO file extend */
> > -		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
> > +		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 && 
> > +			dio->result < dio->size))
> >  			aio_complete(iocb, ret, 0);
> >  		kfree(dio);
> >  	}
> > diff -ur pure-mm3/mm/filemap.c linux-2.6.0-test9-mm3/mm/filemap.c
> > --- pure-mm3/mm/filemap.c	2003-11-14 09:15:08.000000000 +0530
> > +++ linux-2.6.0-test9-mm3/mm/filemap.c	2003-11-15 11:11:16.000000000 +0530
> > @@ -1895,14 +1895,16 @@
> >  		 */
> >  		if (written >= 0 && file->f_flags & O_SYNC)
> >  			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
> > -		if (written >= 0 && !is_sync_kiocb(iocb))
> > +		if (written >= count && !is_sync_kiocb(iocb))
> >  			written = -EIOCBQUEUED;
> > -		if (written != -ENOTBLK)
> > +		if (written < 0 || written >= count)
> >  			goto out_status;
> >  		/*
> >  		 * direct-io write to a hole: fall through to buffered I/O
> > +		 * for completing the rest of the request.
> >  		 */
> > -		written = 0;
> > +		pos += written;
> > +		count -= written;
> >  	}
> >  
> >  	buf = iov->iov_base;
> 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-17 23:14                   ` Zwane Mwaikambo
@ 2003-11-18  7:21                     ` Zwane Mwaikambo
  2003-11-18 15:47                       ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-18  7:21 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel,
	linux-mm, Hugh Dickins

On Mon, 17 Nov 2003, Zwane Mwaikambo wrote:

> A little bird told me to send diffs... but there is a lot of noise due to 
> offsets, I'm afraid.

Another note from our avian friends: I seem to have sent a dump that 
differs slightly from the patch, although both achieve the same 
effect. I shall append it for completeness.

0x0210e860 <do_sys_vm86+0>:     push   %edi
0x0210e861 <do_sys_vm86+1>:     mov    $0xffffe000,%eax
0x0210e866 <do_sys_vm86+6>:     push   %esi
0x0210e867 <do_sys_vm86+7>:     and    %esp,%eax
0x0210e869 <do_sys_vm86+9>:     push   %ebx
0x0210e86a <do_sys_vm86+10>:    mov    0x10(%esp,1),%edi
0x0210e86e <do_sys_vm86+14>:    mov    0x14(%esp,1),%esi
0x0210e872 <do_sys_vm86+18>:    movl   $0x0,0x1c(%edi)
0x0210e879 <do_sys_vm86+25>:    movl   $0x0,0x20(%edi)
0x0210e880 <do_sys_vm86+32>:    mov    (%eax),%edx
0x0210e882 <do_sys_vm86+34>:    mov    0x30(%edi),%eax
0x0210e885 <do_sys_vm86+37>:    mov    %eax,0x5b8(%edx)
0x0210e88b <do_sys_vm86+43>:    mov    0x30(%edi),%edx
0x0210e88e <do_sys_vm86+46>:    mov    0xbc(%edi),%eax
0x0210e894 <do_sys_vm86+52>:    and    $0xdd5,%edx
0x0210e89a <do_sys_vm86+58>:    mov    %edx,0x30(%edi)
0x0210e89d <do_sys_vm86+61>:    mov    0x30(%eax),%eax
0x0210e8a0 <do_sys_vm86+64>:    and    $0xfffff22a,%eax
0x0210e8a5 <do_sys_vm86+69>:    or     %eax,%edx
0x0210e8a7 <do_sys_vm86+71>:    mov    0x54(%edi),%eax
0x0210e8aa <do_sys_vm86+74>:    or     $0x20000,%edx
0x0210e8b0 <do_sys_vm86+80>:    cmp    $0x3,%eax
0x0210e8b3 <do_sys_vm86+83>:    mov    %edx,0x30(%edi)
0x0210e8b6 <do_sys_vm86+86>:    je     0x210e9f0 <do_sys_vm86+400>
0x0210e8bc <do_sys_vm86+92>:    cmp    $0x3,%eax
0x0210e8bf <do_sys_vm86+95>:    ja     0x210e9d5 <do_sys_vm86+373>
0x0210e8c5 <do_sys_vm86+101>:   cmp    $0x2,%eax
0x0210e8c8 <do_sys_vm86+104>:   je     0x210e9c6 <do_sys_vm86+358>
0x0210e8ce <do_sys_vm86+110>:   movl   $0x247000,0x5bc(%esi)
0x0210e8d8 <do_sys_vm86+120>:   mov    0xbc(%edi),%eax
0x0210e8de <do_sys_vm86+126>:   movl   $0x0,0x18(%eax)
0x0210e8e5 <do_sys_vm86+133>:   mov    0x360(%esi),%eax
0x0210e8eb <do_sys_vm86+139>:   mov    %eax,0x5c0(%esi)
0x0210e8f1 <do_sys_vm86+145>:   movl   %fs,0x5c4(%esi)
0x0210e8f7 <do_sys_vm86+151>:   movl   %gs,0x5c8(%esi)
0x0210e8fd <do_sys_vm86+157>:   mov    $0xffffe000,%ebx
0x0210e902 <do_sys_vm86+162>:   and    %esp,%ebx
0x0210e904 <do_sys_vm86+164>:   mov    0x14(%ebx),%eax
0x0210e907 <do_sys_vm86+167>:   inc    %eax
0x0210e908 <do_sys_vm86+168>:   mov    %eax,0x14(%ebx)
0x0210e90b <do_sys_vm86+171>:   mov    0x10(%ebx),%eax
0x0210e90e <do_sys_vm86+174>:   mov    0x4(%esi),%edx
0x0210e911 <do_sys_vm86+177>:   shl    $0x9,%eax
0x0210e914 <do_sys_vm86+180>:   lea    0x26ff000(%eax),%ecx
0x0210e91a <do_sys_vm86+186>:   lea    0x4c(%edi),%eax
0x0210e91d <do_sys_vm86+189>:   mov    %eax,0x360(%esi)
0x0210e923 <do_sys_vm86+195>:   sub    0x1c(%edx),%eax
0x0210e926 <do_sys_vm86+198>:   add    0x20(%edx),%eax
0x0210e929 <do_sys_vm86+201>:   mov    %eax,0x4(%ecx)
0x0210e92c <do_sys_vm86+204>:   mov    0x25fe52c,%eax
0x0210e931 <do_sys_vm86+209>:   test   $0x800,%eax
0x0210e936 <do_sys_vm86+214>:   je     0x210e942 <do_sys_vm86+226>
0x0210e938 <do_sys_vm86+216>:   movl   $0x0,0x364(%esi)
0x0210e942 <do_sys_vm86+226>:   lea    0x340(%esi),%edx
0x0210e948 <do_sys_vm86+232>:   mov    0x20(%edx),%eax
0x0210e94b <do_sys_vm86+235>:   mov    %eax,0x4(%ecx)
0x0210e94e <do_sys_vm86+238>:   mov    0x10(%ecx),%ax
0x0210e952 <do_sys_vm86+242>:   and    $0xffff,%eax
0x0210e957 <do_sys_vm86+247>:   cmp    0x24(%edx),%eax
0x0210e95a <do_sys_vm86+250>:   jne    0x210e9b0 <do_sys_vm86+336>
0x0210e95c <do_sys_vm86+252>:   mov    0x14(%ebx),%eax
0x0210e95f <do_sys_vm86+255>:   dec    %eax
0x0210e960 <do_sys_vm86+256>:   mov    %eax,0x14(%ebx)
0x0210e963 <do_sys_vm86+259>:   mov    0x8(%ebx),%eax
0x0210e966 <do_sys_vm86+262>:   and    $0x8,%eax
0x0210e969 <do_sys_vm86+265>:   jne    0x210e9a9 <do_sys_vm86+329>
0x0210e96b <do_sys_vm86+267>:   mov    0x50(%edi),%eax
0x0210e96e <do_sys_vm86+270>:   mov    %eax,0x5b4(%esi)
0x0210e974 <do_sys_vm86+276>:   testb  $0x1,0x4c(%edi)
0x0210e978 <do_sys_vm86+280>:   jne    0x210e9a0 <do_sys_vm86+320>
0x0210e97a <do_sys_vm86+282>:   push   $0x255f121
0x0210e97f <do_sys_vm86+287>:   call   0x21285a0 <printk>
0x0210e984 <do_sys_vm86+292>:   mov    0x4(%esi),%edx
0x0210e987 <do_sys_vm86+295>:   xor    %eax,%eax
0x0210e989 <do_sys_vm86+297>:   mov    %eax,%fs
0x0210e98b <do_sys_vm86+299>:   mov    %eax,%gs
0x0210e98d <do_sys_vm86+301>:   mov    %edi,%esp
0x0210e98f <do_sys_vm86+303>:   mov    %edx,%ebp
0x0210e991 <do_sys_vm86+305>:   jmp    0xfffeb100 <resume_userspace>
0x0210e996 <do_sys_vm86+310>:   pop    %esi
0x0210e997 <do_sys_vm86+311>:   pop    %ebx
0x0210e998 <do_sys_vm86+312>:   pop    %esi
0x0210e999 <do_sys_vm86+313>:   pop    %edi
0x0210e99a <do_sys_vm86+314>:   ret
0x0210e99b <do_sys_vm86+315>:   nop
0x0210e99c <do_sys_vm86+316>:   lea    0x0(%esi,1),%esi
0x0210e9a0 <do_sys_vm86+320>:   push   %esi
0x0210e9a1 <do_sys_vm86+321>:   call   0x210e5b0 <mark_screen_rdonly>
0x0210e9a6 <do_sys_vm86+326>:   pop    %eax
0x0210e9a7 <do_sys_vm86+327>:   jmp    0x210e97a <do_sys_vm86+282>
0x0210e9a9 <do_sys_vm86+329>:   call   0x21222d0 <preempt_schedule>
0x0210e9ae <do_sys_vm86+334>:   jmp    0x210e96b <do_sys_vm86+267>
0x0210e9b0 <do_sys_vm86+336>:   mov    0x24(%edx),%ax
0x0210e9b4 <do_sys_vm86+340>:   mov    %ax,0x10(%ecx)
0x0210e9b8 <do_sys_vm86+344>:   mov    $0x174,%ecx
0x0210e9bd <do_sys_vm86+349>:   mov    0x24(%edx),%eax
0x0210e9c0 <do_sys_vm86+352>:   xor    %edx,%edx
0x0210e9c2 <do_sys_vm86+354>:   wrmsr
0x0210e9c4 <do_sys_vm86+356>:   jmp    0x210e95c <do_sys_vm86+252>
0x0210e9c6 <do_sys_vm86+358>:   movl   $0x0,0x5bc(%esi)
0x0210e9d0 <do_sys_vm86+368>:   jmp    0x210e8d8 <do_sys_vm86+120>
0x0210e9d5 <do_sys_vm86+373>:   cmp    $0x4,%eax
0x0210e9d8 <do_sys_vm86+376>:   jne    0x210e8ce <do_sys_vm86+110>
0x0210e9de <do_sys_vm86+382>:   movl   $0x47000,0x5bc(%esi)
0x0210e9e8 <do_sys_vm86+392>:   jmp    0x210e8d8 <do_sys_vm86+120>
0x0210e9ed <do_sys_vm86+397>:   lea    0x0(%esi),%esi
0x0210e9f0 <do_sys_vm86+400>:   movl   $0x7000,0x5bc(%esi)
0x0210e9fa <do_sys_vm86+410>:   jmp    0x210e8d8 <do_sys_vm86+120>


* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-18  1:37       ` Daniel McNeil
@ 2003-11-18 11:55         ` Suparna Bhattacharya
  2003-11-18 23:47           ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-11-18 11:55 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

I don't seem to be able to recreate this at my end - even with 1k
block sizes.  Did you notice if this problem occurs without
the latest patch?

Regards
Suparna

On Mon, Nov 17, 2003 at 05:37:14PM -0800, Daniel McNeil wrote:
> Obviously, the ps output in my previous email showed that the hangs were
> with 1k i/o sizes.  
> 
> More testing using 2k, 4k, 16k, 32k, 64k, 128k, 256k and 512k all
> completed correctly.
> 
> Even 11k and 17k worked.
> 
> $ ls -l
> -rw-------    1 daniel   daniel   88289280 Jun  9 16:54 glibc-2.3.2.tar
> -rw-rw-r--    1 daniel   daniel   88289280 Nov 17 17:32 ff2
> 
> 
> So, only 1k is hanging so far.
> 
> Daniel
> 
> On Mon, 2003-11-17 at 17:15, Daniel McNeil wrote:
> > Suparna,
> > 
> > Good news and bad news.  Your patch does fix the non-power of two i/o
> > size problems where AIO previously did not complete:
> > 
> > $ ./aiodio_sparse  -s 1751k -r 18k -w 11k
> > $ aiodio_sparse -i 9 -dd -s 180k -r 18k -w 18k  
> > io_submit() return 9
> > aiodio_sparse: 9 i/o in flight
> > aiodio_sparse: offset 165888 filesize 184320 inflight 9
> > aiodio_sparse: io_getevent() returned 1
> > aiodio_sparse: io_getevent() res 18432 res2 0
> > io_submit() return 1
> > AIO DIO write done unlinking file
> > dio_sparse done writing, kill children
> > aiodio_sparse 0 children had errors
> > 
> > But when testing using aiocp using O_DIRECT to copy a file to
> > an already allocated file, the aiocp process hangs.  I used i/o
> > size of 4k and that completed.  Using i/o size of 1k and 2k,
> > the aiocp processes hung during io_submit() and are unkillable.
> > Here are the stack traces:
> > 
> > # ps -fu daniel | grep aiocp
> > daniel    1920     1  0 16:45 ?        00:00:07 aiocp -b 1k -n 1 -f DIRECT glibc-2.3.2.tar ff2
> > daniel    2083  2037  0 17:00 pts/2    00:00:03 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2
> > 
> > 
> > aiocp         D 00000001  1920      1                1902 (NOTLB)
> > e70abd04 00200086 c18dbc80 00000001 00000003 c02897fc 00000060 00200246
> >        f7cdb8b4 c16522f0 c18dbc80 0000309c 640a05eb 0000008b e6d9e660 c0289a16
> >        f7cdb8b4 e87e95cc c18dbc80 00000000 00000001 e70abd10 c0123712 e70aa000
> > Call Trace:
> >  [<c02897fc>] generic_unplug_device+0x50/0xbd
> >  [<c0289a16>] blk_run_queues+0xa9/0x15c
> >  [<c0123712>] io_schedule+0x26/0x30
> >  [<c0192242>] direct_io_worker+0x376/0x5ab
> >  [<c014840f>] generic_file_direct_IO+0x70/0x89
> >  [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
> >  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
> >  [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
> >  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
> >  [<c014840f>] generic_file_direct_IO+0x70/0x89
> >  [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
> >  [<c0121b70>] schedule+0x3ac/0x7ef
> >  [<c0145f48>] generic_file_aio_read+0x33/0x37
> >  [<c0194ad3>] aio_pread+0x34/0x5f
> >  [<c0193bec>] aio_run_iocb+0xa6/0x1ed
> >  [<c019316f>] __aio_get_req+0x27/0x158
> >  [<c0194a9f>] aio_pread+0x0/0x5f
> >  [<c0194f62>] io_submit_one+0x1ea/0x2b7
> >  [<c0195110>] sys_io_submit+0xe1/0x194
> >  [<c03c29a7>] syscall_call+0x7/0xb
> >  [<c03c007b>] rpc_depopulate+0x1aa/0x24b
> > 
> > 
> > aiocp         D 366EDC94  2083   2037                     (NOTLB)
> > e758bd04 00200082 f71ba000 366edc94 00000161 c02897fc 00000060 366edc94
> >        00000161 f71ba000 c18d3c80 000069a9 366f5a0e 00000161 e8d4acc0 c0289a16
> >        f7cdb8b4 e960465c c18d3c80 00000000 00000001 e758bd10 c0123712 e758a000
> > Call Trace:
> >  [<c02897fc>] generic_unplug_device+0x50/0xbd
> >  [<c0289a16>] blk_run_queues+0xa9/0x15c
> >  [<c0123712>] io_schedule+0x26/0x30
> >  [<c0192242>] direct_io_worker+0x376/0x5ab
> >  [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
> >  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
> >  [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
> >  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
> >  [<c014840f>] generic_file_direct_IO+0x70/0x89
> >  [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
> >  [<c0259d3e>] write_chan+0x165/0x21e
> >  [<c0145f48>] generic_file_aio_read+0x33/0x37
> >  [<c0194ad3>] aio_pread+0x34/0x5f
> >  [<c0193bec>] aio_run_iocb+0xa6/0x1ed
> >  [<c019316f>] __aio_get_req+0x27/0x158
> >  [<c0194a9f>] aio_pread+0x0/0x5f
> >  [<c02532ab>] tty_write+0x1e8/0x3b2
> >  [<c0194f62>] io_submit_one+0x1ea/0x2b7
> >  [<c0195110>] sys_io_submit+0xe1/0x194
> >  [<c03c29a7>] syscall_call+0x7/0xb
> >  [<c03c007b>] rpc_depopulate+0x1aa/0x24b
> > 
> > 
> > 
> > Daniel
> > 
> > On Sun, 2003-11-16 at 21:25, Suparna Bhattacharya wrote:
> > > On Thu, Nov 13, 2003 at 02:03:58PM -0800, Daniel McNeil wrote:
> > > > Andrew,
> > > > 
> > > > I'm testing test9-mm3 on a 2-proc Xeon with a ext3 file system.
> > > > I tested using the test programs aiocp and aiodio_sparse.
> > > > (see http://developer.osdl.org/daniel/AIO/)
> > > > 
> > > > Using aiocp with i/o sizes from 1k to 512k to copy files worked
> > > > without any errors or kernel debug messages.
> > > > 
> > > > With 64k i/o, the aiodio_sparse program completes without any errors.
> > > > There are no kernel error messages, so that is good.
> > > > 
> > > > There are still problems with non power of 2 i/o sizes using AIO and
> > > > O_DIRECT.  It hangs with aio's that do not seem to complete.  The test
> > > > does exit when hitting ^c and there are no kernel messages.  Test output
> > > > below:
> > > 
> > > Could you check if the following patch fixes the problem for you ?
> > > 
> > > Regards
> > > Suparna
> > > 
> > > --------------------------------------------------------------
> > > 
> > > With this patch, when the DIO code falls back to buffered i/o after
> > > having submitted part of the i/o, then buffered i/o is issued only
> > > for the remaining part of the request (i.e. the part not already 
> > > covered by DIO).
> > > 
> > > diff -ur pure-mm3/fs/direct-io.c linux-2.6.0-test9-mm3/fs/direct-io.c
> > > --- pure-mm3/fs/direct-io.c	2003-11-14 09:09:06.000000000 +0530
> > > +++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-17 09:00:47.000000000 +0530
> > > @@ -74,6 +74,7 @@
> > >  					   been performed at the start of a
> > >  					   write */
> > >  	int pages_in_io;		/* approximate total IO pages */
> > > +	size_t	size;			/* total request size (doesn't change)*/
> > >  	sector_t block_in_file;		/* Current offset into the underlying
> > >  					   file in dio_block units. */
> > >  	unsigned blocks_available;	/* At block_in_file.  changes */
> > > @@ -226,7 +227,7 @@
> > >  			dio_complete(dio, dio->block_in_file << dio->blkbits,
> > >  					dio->result);
> > >  			/* Complete AIO later if falling back to buffered i/o */
> > > -			if (dio->result != -ENOTBLK) {
> > > +			if (dio->result >= dio->size || dio->rw == READ) {
> > >  				aio_complete(dio->iocb, dio->result, 0);
> > >  				kfree(dio);
> > >  			} else {
> > > @@ -889,6 +890,7 @@
> > >  	dio->blkbits = blkbits;
> > >  	dio->blkfactor = inode->i_blkbits - blkbits;
> > >  	dio->start_zero_done = 0;
> > > +	dio->size = 0;
> > >  	dio->block_in_file = offset >> blkbits;
> > >  	dio->blocks_available = 0;
> > >  	dio->cur_page = NULL;
> > > @@ -925,7 +927,7 @@
> > >  
> > >  	for (seg = 0; seg < nr_segs; seg++) {
> > >  		user_addr = (unsigned long)iov[seg].iov_base;
> > > -		bytes = iov[seg].iov_len;
> > > +		dio->size += bytes = iov[seg].iov_len;
> > >  
> > >  		/* Index into the first page of the first block */
> > >  		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
> > > @@ -956,6 +958,13 @@
> > >  		}
> > >  	} /* end iovec loop */
> > >  
> > > +	if (ret == -ENOTBLK && rw == WRITE) {
> > > +		/*
> > > +		 * The remaining part of the request will be 
> > > +		 * be handled by buffered I/O when we return
> > > +		 */
> > > +		ret = 0;
> > > +	}
> > >  	/*
> > >  	 * There may be some unwritten disk at the end of a part-written
> > >  	 * fs-block-sized block.  Go zero that now.
> > > @@ -986,19 +995,13 @@
> > >  	 */
> > >  	if (dio->is_async) {
> > >  		if (ret == 0)
> > > -			ret = dio->result;	/* Bytes written */
> > > -		if (ret == -ENOTBLK) {
> > > -			/*
> > > -			 * The request will be reissued via buffered I/O
> > > -			 * when we return; Any I/O already issued
> > > -			 * effectively becomes redundant.
> > > -			 */
> > > -			dio->result = ret;
> > > +			ret = dio->result;
> > > +		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
> > >  			dio->waiter = current;
> > >  		}
> > >  		finished_one_bio(dio);		/* This can free the dio */
> > >  		blk_run_queues();
> > > -		if (ret == -ENOTBLK) {
> > > +		if (dio->waiter) {
> > >  			/*
> > >  			 * Wait for already issued I/O to drain out and
> > >  			 * release its references to user-space pages
> > > @@ -1032,7 +1035,8 @@
> > >  		}
> > >  		dio_complete(dio, offset, ret);
> > >  		/* We could have also come here on an AIO file extend */
> > > -		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
> > > +		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 && 
> > > +			dio->result < dio->size))
> > >  			aio_complete(iocb, ret, 0);
> > >  		kfree(dio);
> > >  	}
> > > diff -ur pure-mm3/mm/filemap.c linux-2.6.0-test9-mm3/mm/filemap.c
> > > --- pure-mm3/mm/filemap.c	2003-11-14 09:15:08.000000000 +0530
> > > +++ linux-2.6.0-test9-mm3/mm/filemap.c	2003-11-15 11:11:16.000000000 +0530
> > > @@ -1895,14 +1895,16 @@
> > >  		 */
> > >  		if (written >= 0 && file->f_flags & O_SYNC)
> > >  			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
> > > -		if (written >= 0 && !is_sync_kiocb(iocb))
> > > +		if (written >= count && !is_sync_kiocb(iocb))
> > >  			written = -EIOCBQUEUED;
> > > -		if (written != -ENOTBLK)
> > > +		if (written < 0 || written >= count)
> > >  			goto out_status;
> > >  		/*
> > >  		 * direct-io write to a hole: fall through to buffered I/O
> > > +		 * for completing the rest of the request.
> > >  		 */
> > > -		written = 0;
> > > +		pos += written;
> > > +		count -= written;
> > >  	}
> > >  
> > >  	buf = iov->iov_base;
> > 
> > --
> > To unsubscribe, send a message with 'unsubscribe linux-aio' in
> > the body to majordomo@kvack.org.  For more info on Linux AIO,
> > see: http://www.kvack.org/aio/
> > Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
> 

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India
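
The patch quoted above describes the fallback as "buffered i/o is issued
only for the remaining part of the request".  As a rough user-space
sketch of just that arithmetic (not the kernel code itself), the
following mirrors the filemap.c hunk: skip the fallback on error or on a
fully completed direct write, otherwise advance pos and shrink count by
what direct I/O already covered.  The names io_plan and plan_fallback
are hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical result type for this sketch: what is left for buffered I/O. */
struct io_plan {
    long long buffered_pos;    /* file offset where buffered I/O would start */
    size_t    buffered_count;  /* bytes still to be written by buffered I/O */
    int       need_buffered;   /* 1 if a buffered fallback is needed */
};

/*
 * pos/count describe the original request; written is what the direct-I/O
 * path reported (negative for an error, otherwise bytes covered).
 * Mirrors the patched check "written < 0 || written >= count".
 */
static struct io_plan plan_fallback(long long pos, size_t count,
                                    long long written)
{
    struct io_plan p = { pos, 0, 0 };

    assert(pos >= 0);   /* file offsets are non-negative */

    if (written < 0 || (size_t)written >= count)
        return p;       /* error, or direct I/O covered everything */

    /* Partial direct write: hand only the remainder to buffered I/O. */
    p.buffered_pos   = pos + written;
    p.buffered_count = count - (size_t)written;
    p.need_buffered  = 1;
    return p;
}
```

For example, an 18k write at offset 4096 of which direct I/O covered 8192
bytes would leave 10240 bytes for buffered I/O starting at offset 12288.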



* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-18  7:21                     ` Zwane Mwaikambo
@ 2003-11-18 15:47                       ` Linus Torvalds
  2003-11-18 16:16                         ` Zwane Mwaikambo
  0 siblings, 1 reply; 49+ messages in thread
From: Linus Torvalds @ 2003-11-18 15:47 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel,
	linux-mm, Hugh Dickins


On Tue, 18 Nov 2003, Zwane Mwaikambo wrote:
> 
> Another note from our avian friends; i seem to have sent a slightly 
> different dump from the patch, although they do both achieve the same 
> effect. I shall append it for completeness.

Hmm. I don't see anything. However, it's a lot easier to read the
gcc-generated assembly ("make arch/i386/kernel/vm86.s") than it is to read
the objdump disassembly.

It's also a lot easier to see what the assembly language is when giving 
the

	-fno-reorder-blocks

switch to gcc. Without it, modern gcc's tend to have _way_ too many jumps 
around. But maybe that actually changes the behaviour too.

		Linus



* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-18 15:47                       ` Linus Torvalds
@ 2003-11-18 16:16                         ` Zwane Mwaikambo
  2003-11-18 16:37                           ` Linus Torvalds
  0 siblings, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-18 16:16 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel,
	linux-mm, Hugh Dickins

On Tue, 18 Nov 2003, Linus Torvalds wrote:

> Hmm. I don't see anything. However, it's a lot easier to read the
> gcc-generated assembly ("make arch/i386/kernel/vm86.s") than it is to read
> the objdump disassembly.
>
> 
> It's also a lot easier to see what the assembly language is when giving 
> the
> 
> 	-fno-reorder-blocks

I'll recompile and verify that the bug can be reproduced and worked around 
with that flag.
 
> switch to gcc. Without it, modern gcc's tend to have _way_ too many jumps 
> around. But maybe that actually changes the behaviour too.

Here are diffs from the do_sys_vm86 only.

--- asm-before	2003-11-18 10:56:02.967643808 -0500
+++ asm-after	2003-11-18 10:55:37.880457640 -0500
@@ -897,6 +897,10 @@
 .LFE473:
 .Lfe4:
 	.size	sys_vm86,.Lfe4-sys_vm86
+	.section	.rodata.str1.1
+.LC6:
+	.string	"ooh la la\n"
+	.text
 	.p2align 4,,15
 	.type	do_sys_vm86,@function
 do_sys_vm86:
@@ -1053,29 +1057,37 @@
 	jne	.L213
 .L210:
 	.loc 1 315 0
+	pushl	$.LC6
+.LCFI98:
+	call	printk
+	.loc 1 316 0
 	movl	4(%esi), %edx
 #APP
 	xorl %eax,%eax; movl %eax,%fs; movl %eax,%gs
 	movl %edi,%esp
 	movl %edx,%ebp
 	jmp resume_userspace
-	.loc 1 323 0
 #NO_APP
-	popl	%ebx
-.LCFI98:
+.LBE53:
 	popl	%esi
 .LCFI99:
-	popl	%edi
+	.loc 1 324 0
+	popl	%ebx
 .LCFI100:
+	popl	%esi
+.LCFI101:
+	popl	%edi
+.LCFI102:
 	ret
 	.loc 1 313 0
 	.p2align 4,,7
 .L213:
+.LBB65:
 	pushl	%esi
-.LCFI101:
+.LCFI103:
 	call	mark_screen_rdonly
 	popl	%eax
-.LCFI102:
+.LCFI104:
 	jmp	.L210
 	.loc 1 310 0
 .L212:
@@ -1083,7 +1095,7 @@
 	jmp	.L197
 	.loc 14 454 0
 .L211:
-.LBB65:
+.LBB66:
 	movw	36(%edx), %ax
 	movw	%ax, 16(%ecx)
 	.loc 14 455 0
@@ -1097,7 +1109,7 @@
 	.p2align 4,,7
 .L183:
 	.loc 1 283 0
-.LBE65:
+.LBE66:
 	movl	$0, 1468(%esi)
 	.loc 1 284 0
 	jmp	.L182
@@ -1115,7 +1127,7 @@
 	movl	$28672, 1468(%esi)
 	.loc 1 287 0
 	jmp	.L182
-.LBE53:
+.LBE65:
 .LFE475:
 .Lfe5:
 	.size	do_sys_vm86,.Lfe5-do_sys_vm86



* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-18 16:16                         ` Zwane Mwaikambo
@ 2003-11-18 16:37                           ` Linus Torvalds
  2003-11-18 17:08                             ` Zwane Mwaikambo
  2003-11-19 20:32                             ` Matt Mackall
  0 siblings, 2 replies; 49+ messages in thread
From: Linus Torvalds @ 2003-11-18 16:37 UTC (permalink / raw)
  To: Zwane Mwaikambo
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel,
	linux-mm, Hugh Dickins


On Tue, 18 Nov 2003, Zwane Mwaikambo wrote:
> 
> Here are diffs from the do_sys_vm86 only.

Ok. Much more readable.

And there is something very suspicious there.

The code with and without the printk() looks _identical_ apart from some 
trivial label renumbering, and the added

	pushl   $.LC6
	call    printk
	.. asm ..
	popl %esi

which all looks fine (esi is dead at that point, so the compiler is just
using a "popl" as a shorter form of "addl $4,%esp").

Btw, you seem to compile with debugging, which makes the assembly 
language pretty much unreadable and accounts for most of the 
differences: the line numbers change. If you compile a kernel where the 
line numbers don't change (by commenting _out_ the printk rather than 
removing the whole line), your diff would be more readable.
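
A small user-space sketch of why keeping the line matters; this is a
variation on the suggestion above (the call is compiled away by a macro
rather than commented out), and TRACE_ENABLED, TRACE and lines_spanned
are hypothetical names made up for the illustration.  The disabled call
still occupies its source line, so every later line keeps its number and
the line-number directives in the generated assembly do not shift.

```c
#include <assert.h>
#include <stdio.h>

/* TRACE_ENABLED is a hypothetical knob for this sketch, not a kernel option. */
#define TRACE_ENABLED 0

#if TRACE_ENABLED
#define TRACE(msg) fputs((msg), stderr)
#else
#define TRACE(msg) ((void)0)    /* emits no code, but the line remains */
#endif

/*
 * Returns how many source lines separate the two __LINE__ uses.
 * The TRACE() line stays in the file whether or not it emits code,
 * so the distance is the same with tracing on or off.
 */
static int lines_spanned(void)
{
    int first = __LINE__;
    TRACE("ooh la la\n");
    return __LINE__ - first;
}

/* Usage check: deleting the TRACE() line outright would make this 1. */
static void check_lines(void)
{
    assert(lines_spanned() == 2);
}
```

Flipping TRACE_ENABLED between 0 and 1 changes the generated code but not
the line numbering, which is what keeps a before/after diff small.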

Anyway, there are _zero_ differences.

Just for fun, try this: move the "printk()" to _below_ the "asm"  
statement. It will never actually get executed, but if it's an issue of
some subtle code or data placement things (cache lines etc), maybe that
also hides the oops, since all the same code and data will be generated, 
just not run...

		Linus



* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-18 16:37                           ` Linus Torvalds
@ 2003-11-18 17:08                             ` Zwane Mwaikambo
  2003-11-18 17:38                               ` Martin J. Bligh
  2003-11-19 20:32                             ` Matt Mackall
  1 sibling, 1 reply; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-18 17:08 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Ingo Molnar, Martin J. Bligh, Andrew Morton, Linux Kernel,
	linux-mm, Hugh Dickins

On Tue, 18 Nov 2003, Linus Torvalds wrote:

> Ok. Much more readable.
> 
> And there is something very suspicious there.
> 
> The code with and without the printk() looks _identical_ apart from some 
> trivial label renumbering, and the added
> 
> 	pushl   $.LC6
> 	call    printk
> 	.. asm ..
> 	popl %esi
> 
> which all looks fine (esi is dead at that point, so the compiler is just
> using a "popl" as a shorter form of "addl $4,%esp").
> 
> Btw, you seem to compile with debugging, which makes the assembly 
> language pretty much unreadable and accounts for most of the 
> differences: the line numbers change. If you compile a kernel where the 
> line numbers don't change (by commenting _out_ the printk rather than 
> removing the whole line), your diff would be more readable.

Aha! Thanks for mentioning that, noted.

> Anyway, there are _zero_ differences.
> 
> Just for fun, try this: move the "printk()" to _below_ the "asm"  
> statement. It will never actually get executed, but if it's an issue of
> some subtle code or data placement things (cache lines etc), maybe that
> also hides the oops, since all the same code and data will be generated, 
> just not run...

Ok i just tried that and it still fails. Matt Mackall suggested i also try 
writing a minimal printk which has the same effect.


* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-18 17:38                               ` Martin J. Bligh
@ 2003-11-18 17:22                                 ` Zwane Mwaikambo
  0 siblings, 0 replies; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-18 17:22 UTC (permalink / raw)
  To: Martin J. Bligh
  Cc: Linus Torvalds, Ingo Molnar, Andrew Morton, Linux Kernel,
	linux-mm, Hugh Dickins

On Tue, 18 Nov 2003, Martin J. Bligh wrote:

> The other thing I've found printks to hide before is timing bugs / races.
> Unfortunately I can't see one here, but maybe someone else can ;-)
> Maybe inserting a 1ms delay or something in place of the printk would
> have the same effect?

I've tried a number of timing-related workarounds, namely
schedule_timeout(2*HZ) and some long spinning loops. I've also thrown a
schedule() in there at some point.


* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-18 17:08                             ` Zwane Mwaikambo
@ 2003-11-18 17:38                               ` Martin J. Bligh
  2003-11-18 17:22                                 ` Zwane Mwaikambo
  0 siblings, 1 reply; 49+ messages in thread
From: Martin J. Bligh @ 2003-11-18 17:38 UTC (permalink / raw)
  To: Zwane Mwaikambo, Linus Torvalds
  Cc: Ingo Molnar, Andrew Morton, Linux Kernel, linux-mm, Hugh Dickins

>> Btw, you seem to compile with debugging, which makes the assembly 
>> language pretty much unreadable and accounts for most of the 
>> differences: the line numbers change. If you compile a kernel where the 
>> line numbers don't change (by commenting _out_ the printk rather than 
>> removing the whole line), your diff would be more readable.
> 
> Aha! Thanks for mentioning that, noted.
> 
>> Anyway, there are _zero_ differences.
>> 
>> Just for fun, try this: move the "printk()" to _below_ the "asm"  
>> statement. It will never actually get executed, but if it's an issue of
>> some subtle code or data placement things (cache lines etc), maybe that
>> also hides the oops, since all the same code and data will be generated, 
>> just not run...
> 
> Ok i just tried that and it still fails. Matt Mackall suggested i also try 
> writing a minimal printk which has the same effect.

The other thing I've found printks to hide before is timing bugs / races.
Unfortunately I can't see one here, but maybe someone else can ;-)
Maybe inserting a 1ms delay or something in place of the printk would
have the same effect?

M.



* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-18 11:55         ` Suparna Bhattacharya
@ 2003-11-18 23:47           ` Daniel McNeil
  2003-11-24  9:42             ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-18 23:47 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Suparna,

I was unable to reproduce the hang in io_submit() without your patch.
I ran aiocp with 1k i/o size constantly for 2 hours and it never hung.

I re-ran with your patch with both as-iosched and deadline and both
hung in io_submit().  aiocp would run a few times, but I put the
aiocp in a while loop and it hung on the 1st or 2nd time.  It
did get most of the way through copying the file before hanging.
This is on a 2-proc machine, going to IDE disks running ext3.

Here is the stack trace and other info for as-iosched:
daniel    2005  0.7  0.0  1388  384 pts/0    D    13:51   0:08 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2
$ cat /proc/2005/wchan
io_schedule

aiocp         D 00000001  2005   1870                     (NOTLB)
e53cfc08 00200086 c18d3c80 00000001 00000003 c02897fc 00000060 00200246
       f7cdb8b4 c0191630 c18d3c80 0000bfc6 78d5d3e5 00000233 e4dc1980 c0289a16
       f7cdb8b4 d92978e4 c18d3c80 00000000 00000001 e53cfc14 c0123712 e53ce000
Call Trace:
 [<c02897fc>] generic_unplug_device+0x50/0xbd
 [<c0191630>] dio_bio_add_page+0x34/0x79
 [<c0289a16>] blk_run_queues+0xa9/0x15c
 [<c0123712>] io_schedule+0x26/0x30
 [<c0192242>] direct_io_worker+0x376/0x5ab
 [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c0147a80>] __generic_file_aio_write_nolock+0xa3a/0xda5
 [<c025b049>] pty_write+0x1c8/0x1ca
 [<c01480a4>] generic_file_aio_write+0x7e/0x115
 [<c0256d12>] opost+0x9e/0x1cf
 [<c01aa4a3>] ext3_file_write+0x3f/0xcc
 [<c0194b3a>] aio_pwrite+0x3c/0xad
 [<c0193bec>] aio_run_iocb+0xa6/0x1ed
 [<c019316f>] __aio_get_req+0x27/0x158
 [<c0194afe>] aio_pwrite+0x0/0xad
 [<c02532ab>] tty_write+0x1e8/0x3b2
 [<c0194f62>] io_submit_one+0x1ea/0x2b7
 [<c0195110>] sys_io_submit+0xe1/0x194
 [<c03c29a7>] syscall_call+0x7/0xb

For deadline iosched:

daniel    1889  0.1  0.0  1388  384 pts/0    D    15:12   0:01 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2

$ cat /proc/1889/wchan
io_schedule

$ cat /sys/block/hdb/stat
 209058    23145    45744    58542   209022    22069        0    20758    45210

aiocp         D 0AD7701D  1889   1752                     (NOTLB)
ee2ddd04 00200086 f75e6660 0ad7701d 0000004e 00200282 ebd37cbc 0ad7701d
       0000004e f75e6660 c18d3c80 00060539 0ad7701d 0000004e f75e6000 0000006b
       ee2ddd10 c0192212 c18d3c80 00000000 00000001 ee2ddd10 c0123712 ee2dc000
Call Trace:
 [<c0192212>] direct_io_worker+0x346/0x5ab
 [<c0123712>] io_schedule+0x26/0x30
 [<c0192242>] direct_io_worker+0x376/0x5ab
 [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
 [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
 [<c014840f>] generic_file_direct_IO+0x70/0x89
 [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
 [<c0259d3e>] write_chan+0x165/0x21e
 [<c0145f48>] generic_file_aio_read+0x33/0x37
 [<c0194ad3>] aio_pread+0x34/0x5f
 [<c0193bec>] aio_run_iocb+0xa6/0x1ed
 [<c019316f>] __aio_get_req+0x27/0x158
 [<c0194a9f>] aio_pread+0x0/0x5f
 [<c02532ab>] tty_write+0x1e8/0x3b2
 [<c0194f62>] io_submit_one+0x1ea/0x2b7
 [<c0195110>] sys_io_submit+0xe1/0x194
 [<c03c29a7>] syscall_call+0x7/0xb


The hung processes are stuck in the 'D' state and unkillable, of course.
It would appear something is wrong with your patch.  Any ideas?
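
For anyone wanting to check for the 'D' state from a test harness rather
than from ps output, here is a minimal sketch, assuming the documented
"pid (comm) state ..." layout of a /proc/<pid>/stat line.  It is not
part of aiocp; proc_stat_state and is_uninterruptible are hypothetical
helper names, and the caller is assumed to have already read the line
from /proc.

```c
#include <assert.h>
#include <string.h>

/*
 * Return the one-letter process state from a /proc/<pid>/stat line,
 * or '?' if the line does not look like "pid (comm) state ...".
 * comm may itself contain spaces or ')', so search from the right.
 */
static char proc_stat_state(const char *line)
{
    const char *close;

    assert(line != NULL);

    close = strrchr(line, ')');
    if (close == NULL || close[1] != ' ' || close[2] == '\0')
        return '?';
    return close[2];
}

/* 'D' is uninterruptible sleep, the state the hung aiocp shows in ps. */
static int is_uninterruptible(const char *line)
{
    return proc_stat_state(line) == 'D';
}
```

Fed a line such as "1889 (aiocp) D 1752 ...", is_uninterruptible()
reports true; a harness could poll this and then read /proc/<pid>/wchan
as done above.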

Daniel

On Tue, 2003-11-18 at 03:55, Suparna Bhattacharya wrote:
> I don't seem to be able to recreate this at my end - even with 1k
> block sizes.  Did you notice if this problem occurs without
> the latest patch?
> 
> Regards
> Suparna
> 
> On Mon, Nov 17, 2003 at 05:37:14PM -0800, Daniel McNeil wrote:
> > Obviously, the ps output in my previous email showed that the hangs were
> > with 1k i/o sizes.  
> > 
> > More testing using 2k, 4k, 16k, 32k, 64k, 128k, 256k and 512k all
> > completed correctly.
> > 
> > Even 11k and 17k worked.
> > 
> > $ ls -l
> > -rw-------    1 daniel   daniel   88289280 Jun  9 16:54 glibc-2.3.2.tar
> > -rw-rw-r--    1 daniel   daniel   88289280 Nov 17 17:32 ff2
> > 
> > 
> > So, only 1k is hanging so far.
> > 
> > Daniel
> > 
> > On Mon, 2003-11-17 at 17:15, Daniel McNeil wrote:
> > > Suparna,
> > > 
> > > Good news and bad news.  Your patch does fix the non-power of two i/o
> > > size problems where AIO previously did not complete:
> > > 
> > > $ ./aiodio_sparse  -s 1751k -r 18k -w 11k
> > > $ aiodio_sparse -i 9 -dd -s 180k -r 18k -w 18k  
> > > io_submit() return 9
> > > aiodio_sparse: 9 i/o in flight
> > > aiodio_sparse: offset 165888 filesize 184320 inflight 9
> > > aiodio_sparse: io_getevent() returned 1
> > > aiodio_sparse: io_getevent() res 18432 res2 0
> > > io_submit() return 1
> > > AIO DIO write done unlinking file
> > > dio_sparse done writing, kill children
> > > aiodio_sparse 0 children had errors
> > > 
> > > But when testing using aiocp using O_DIRECT to copy a file to
> > > an already allocated file, the aiocp process hangs.  I used i/o
> > > size of 4k and that completed.  Using i/o size of 1k and 2k,
> > > the aiocp processes hung during io_submit() and are unkillable.
> > > Here are the stack traces:
> > > 
> > > # ps -fu daniel | grep aiocp
> > > daniel    1920     1  0 16:45 ?        00:00:07 aiocp -b 1k -n 1 -f DIRECT glibc-2.3.2.tar ff2
> > > daniel    2083  2037  0 17:00 pts/2    00:00:03 aiocp -dd -b 1k -n 8 -f DIRECT glibc-2.3.2.tar ff2
> > > 
> > > 
> > > aiocp         D 00000001  1920      1                1902 (NOTLB)
> > > e70abd04 00200086 c18dbc80 00000001 00000003 c02897fc 00000060 00200246
> > >        f7cdb8b4 c16522f0 c18dbc80 0000309c 640a05eb 0000008b e6d9e660 c0289a16
> > >        f7cdb8b4 e87e95cc c18dbc80 00000000 00000001 e70abd10 c0123712 e70aa000
> > > Call Trace:
> > >  [<c02897fc>] generic_unplug_device+0x50/0xbd
> > >  [<c0289a16>] blk_run_queues+0xa9/0x15c
> > >  [<c0123712>] io_schedule+0x26/0x30
> > >  [<c0192242>] direct_io_worker+0x376/0x5ab
> > >  [<c014840f>] generic_file_direct_IO+0x70/0x89
> > >  [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
> > >  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
> > >  [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
> > >  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
> > >  [<c014840f>] generic_file_direct_IO+0x70/0x89
> > >  [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
> > >  [<c0121b70>] schedule+0x3ac/0x7ef
> > >  [<c0145f48>] generic_file_aio_read+0x33/0x37
> > >  [<c0194ad3>] aio_pread+0x34/0x5f
> > >  [<c0193bec>] aio_run_iocb+0xa6/0x1ed
> > >  [<c019316f>] __aio_get_req+0x27/0x158
> > >  [<c0194a9f>] aio_pread+0x0/0x5f
> > >  [<c0194f62>] io_submit_one+0x1ea/0x2b7
> > >  [<c0195110>] sys_io_submit+0xe1/0x194
> > >  [<c03c29a7>] syscall_call+0x7/0xb
> > >  [<c03c007b>] rpc_depopulate+0x1aa/0x24b
> > > 
> > > 
> > > aiocp         D 366EDC94  2083   2037                     (NOTLB)
> > > e758bd04 00200082 f71ba000 366edc94 00000161 c02897fc 00000060 366edc94
> > >        00000161 f71ba000 c18d3c80 000069a9 366f5a0e 00000161 e8d4acc0 c0289a16
> > >        f7cdb8b4 e960465c c18d3c80 00000000 00000001 e758bd10 c0123712 e758a000
> > > Call Trace:
> > >  [<c02897fc>] generic_unplug_device+0x50/0xbd
> > >  [<c0289a16>] blk_run_queues+0xa9/0x15c
> > >  [<c0123712>] io_schedule+0x26/0x30
> > >  [<c0192242>] direct_io_worker+0x376/0x5ab
> > >  [<c019264a>] __blockdev_direct_IO+0x1d3/0x2d5
> > >  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
> > >  [<c01ad72d>] ext3_direct_IO+0xc0/0x1e1
> > >  [<c01ac73e>] ext3_direct_io_get_blocks+0x0/0xbf
> > >  [<c014840f>] generic_file_direct_IO+0x70/0x89
> > >  [<c0145e11>] __generic_file_aio_read+0xfb/0x1ff
> > >  [<c0259d3e>] write_chan+0x165/0x21e
> > >  [<c0145f48>] generic_file_aio_read+0x33/0x37
> > >  [<c0194ad3>] aio_pread+0x34/0x5f
> > >  [<c0193bec>] aio_run_iocb+0xa6/0x1ed
> > >  [<c019316f>] __aio_get_req+0x27/0x158
> > >  [<c0194a9f>] aio_pread+0x0/0x5f
> > >  [<c02532ab>] tty_write+0x1e8/0x3b2
> > >  [<c0194f62>] io_submit_one+0x1ea/0x2b7
> > >  [<c0195110>] sys_io_submit+0xe1/0x194
> > >  [<c03c29a7>] syscall_call+0x7/0xb
> > >  [<c03c007b>] rpc_depopulate+0x1aa/0x24b
> > > 
> > > 
> > > 
> > > Daniel
> > > 
> > > On Sun, 2003-11-16 at 21:25, Suparna Bhattacharya wrote:
> > > > On Thu, Nov 13, 2003 at 02:03:58PM -0800, Daniel McNeil wrote:
> > > > > Andrew,
> > > > > 
> > > > > I'm testing test9-mm3 on a 2-proc Xeon with a ext3 file system.
> > > > > I tested using the test programs aiocp and aiodio_sparse.
> > > > > (see http://developer.osdl.org/daniel/AIO/)
> > > > > 
> > > > > Using aiocp with i/o sizes from 1k to 512k to copy files worked
> > > > > without any errors or kernel debug messages.
> > > > > 
> > > > > With 64k i/o, the aiodio_sparse program completes without any errors.
> > > > > There are no kernel error messages, so that is good.
> > > > > 
> > > > > There are still problems with non power of 2 i/o sizes using AIO and
> > > > > O_DIRECT.  It hangs with aio's that do not seem to complete.  The test
> > > > > does exit when hitting ^c and there are no kernel messages.  Test output
> > > > > below:
> > > > 
> > > > Could you check if the following patch fixes the problem for you ?
> > > > 
> > > > Regards
> > > > Suparna
> > > > 
> > > > --------------------------------------------------------------
> > > > 
> > > > With this patch, when the DIO code falls back to buffered i/o after
> > > > having submitted part of the i/o, then buffered i/o is issued only
> > > > for the remaining part of the request (i.e. the part not already 
> > > > covered by DIO).
> > > > 
> > > > diff -ur pure-mm3/fs/direct-io.c linux-2.6.0-test9-mm3/fs/direct-io.c
> > > > --- pure-mm3/fs/direct-io.c	2003-11-14 09:09:06.000000000 +0530
> > > > +++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-17 09:00:47.000000000 +0530
> > > > @@ -74,6 +74,7 @@
> > > >  					   been performed at the start of a
> > > >  					   write */
> > > >  	int pages_in_io;		/* approximate total IO pages */
> > > > +	size_t	size;			/* total request size (doesn't change)*/
> > > >  	sector_t block_in_file;		/* Current offset into the underlying
> > > >  					   file in dio_block units. */
> > > >  	unsigned blocks_available;	/* At block_in_file.  changes */
> > > > @@ -226,7 +227,7 @@
> > > >  			dio_complete(dio, dio->block_in_file << dio->blkbits,
> > > >  					dio->result);
> > > >  			/* Complete AIO later if falling back to buffered i/o */
> > > > -			if (dio->result != -ENOTBLK) {
> > > > +			if (dio->result >= dio->size || dio->rw == READ) {
> > > >  				aio_complete(dio->iocb, dio->result, 0);
> > > >  				kfree(dio);
> > > >  			} else {
> > > > @@ -889,6 +890,7 @@
> > > >  	dio->blkbits = blkbits;
> > > >  	dio->blkfactor = inode->i_blkbits - blkbits;
> > > >  	dio->start_zero_done = 0;
> > > > +	dio->size = 0;
> > > >  	dio->block_in_file = offset >> blkbits;
> > > >  	dio->blocks_available = 0;
> > > >  	dio->cur_page = NULL;
> > > > @@ -925,7 +927,7 @@
> > > >  
> > > >  	for (seg = 0; seg < nr_segs; seg++) {
> > > >  		user_addr = (unsigned long)iov[seg].iov_base;
> > > > -		bytes = iov[seg].iov_len;
> > > > +		dio->size += bytes = iov[seg].iov_len;
> > > >  
> > > >  		/* Index into the first page of the first block */
> > > >  		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
> > > > @@ -956,6 +958,13 @@
> > > >  		}
> > > >  	} /* end iovec loop */
> > > >  
> > > > +	if (ret == -ENOTBLK && rw == WRITE) {
> > > > +		/*
> > > > +		 * The remaining part of the request will be 
> > > > +		 * be handled by buffered I/O when we return
> > > > +		 */
> > > > +		ret = 0;
> > > > +	}
> > > >  	/*
> > > >  	 * There may be some unwritten disk at the end of a part-written
> > > >  	 * fs-block-sized block.  Go zero that now.
> > > > @@ -986,19 +995,13 @@
> > > >  	 */
> > > >  	if (dio->is_async) {
> > > >  		if (ret == 0)
> > > > -			ret = dio->result;	/* Bytes written */
> > > > -		if (ret == -ENOTBLK) {
> > > > -			/*
> > > > -			 * The request will be reissued via buffered I/O
> > > > -			 * when we return; Any I/O already issued
> > > > -			 * effectively becomes redundant.
> > > > -			 */
> > > > -			dio->result = ret;
> > > > +			ret = dio->result;
> > > > +		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
> > > >  			dio->waiter = current;
> > > >  		}
> > > >  		finished_one_bio(dio);		/* This can free the dio */
> > > >  		blk_run_queues();
> > > > -		if (ret == -ENOTBLK) {
> > > > +		if (dio->waiter) {
> > > >  			/*
> > > >  			 * Wait for already issued I/O to drain out and
> > > >  			 * release its references to user-space pages
> > > > @@ -1032,7 +1035,8 @@
> > > >  		}
> > > >  		dio_complete(dio, offset, ret);
> > > >  		/* We could have also come here on an AIO file extend */
> > > > -		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
> > > > +		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 && 
> > > > +			dio->result < dio->size))
> > > >  			aio_complete(iocb, ret, 0);
> > > >  		kfree(dio);
> > > >  	}
> > > > diff -ur pure-mm3/mm/filemap.c linux-2.6.0-test9-mm3/mm/filemap.c
> > > > --- pure-mm3/mm/filemap.c	2003-11-14 09:15:08.000000000 +0530
> > > > +++ linux-2.6.0-test9-mm3/mm/filemap.c	2003-11-15 11:11:16.000000000 +0530
> > > > @@ -1895,14 +1895,16 @@
> > > >  		 */
> > > >  		if (written >= 0 && file->f_flags & O_SYNC)
> > > >  			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
> > > > -		if (written >= 0 && !is_sync_kiocb(iocb))
> > > > +		if (written >= count && !is_sync_kiocb(iocb))
> > > >  			written = -EIOCBQUEUED;
> > > > -		if (written != -ENOTBLK)
> > > > +		if (written < 0 || written >= count)
> > > >  			goto out_status;
> > > >  		/*
> > > >  		 * direct-io write to a hole: fall through to buffered I/O
> > > > +		 * for completing the rest of the request.
> > > >  		 */
> > > > -		written = 0;
> > > > +		pos += written;
> > > > +		count -= written;
> > > >  	}
> > > >  
> > > >  	buf = iov->iov_base;
> > > 
> > > --
> > > To unsubscribe, send a message with 'unsubscribe linux-aio' in
> > > the body to majordomo@kvack.org.  For more info on Linux AIO,
> > > see: http://www.kvack.org/aio/
> > > Don't email: <a href=mailto:"aart@kvack.org">aart@kvack.org</a>
> > 


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-18 16:37                           ` Linus Torvalds
  2003-11-18 17:08                             ` Zwane Mwaikambo
@ 2003-11-19 20:32                             ` Matt Mackall
  2003-11-19 23:09                               ` Matt Mackall
  1 sibling, 1 reply; 49+ messages in thread
From: Matt Mackall @ 2003-11-19 20:32 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Zwane Mwaikambo, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Linux Kernel, linux-mm, Hugh Dickins

On Tue, Nov 18, 2003 at 08:37:25AM -0800, Linus Torvalds wrote:
> 
> On Tue, 18 Nov 2003, Zwane Mwaikambo wrote:
> > 
> > Here are diffs from the do_sys_vm86 only.
> 
> Ok. Much more readable.
> 
> And there is something very suspicious there.
> 
> The code with and without the printk() looks _identical_ apart from some 
> trivial label renumbering, and the added
> 
> 	pushl   $.LC6
> 	call    printk
> 	.. asm ..
> 	popl %esi
> 
> which all looks fine (esi is dead at that point, so the compiler is just
> using a "popl" as a shorter form of "addl $4,%esp").
> 
> Btw, you seem to compile with debugging, which makes the assembly 
> language pretty much unreadable and accounts for most of the 
> differences: the line numbers change. If you compile a kernel where the 
> line numbers don't change (by commenting _out_ the printk rather than 
> removing the whole line), your diff would be more readable.
> 
> Anyway, there are _zero_ differences.
> 
> Just for fun, try this: move the "printk()" to _below_ the "asm"  
> statement. It will never actually get executed, but if it's an issue of
> some subtle code or data placement things (cache lines etc), maybe that
> also hides the oops, since all the same code and data will be generated, 
> just not run...

Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my
1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit
doesn't help. So my suspicion is that the printk is changing the
timing just enough on Zwane's box that he's getting a timer interrupt
knocking him out of vm86 mode before he hits a fatal bit in the fault
handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault,
do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so
there's probably something amiss in the trampoline code.

-- 
Matt Mackall : http://www.selenic.com : Linux development and consulting

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-19 20:32                             ` Matt Mackall
@ 2003-11-19 23:09                               ` Matt Mackall
  2003-11-20  7:14                                 ` Zwane Mwaikambo
  2003-11-20  7:44                                 ` Matt Mackall
  0 siblings, 2 replies; 49+ messages in thread
From: Matt Mackall @ 2003-11-19 23:09 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Zwane Mwaikambo, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Linux Kernel, linux-mm, Hugh Dickins

On Wed, Nov 19, 2003 at 02:32:10PM -0600, Matt Mackall wrote:
> 
> Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my
> 1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit
> doesn't help. So my suspicion is that the printk is changing the
> timing just enough on Zwane's box that he's getting a timer interrupt
> knocking him out of vm86 mode before he hits a fatal bit in the fault
> handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault,
> do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so
> there's probably something amiss in the trampoline code.

Some more datapoints:

CPU          distro          compiler  video        X     result
K6-2/500     connectiva 9    2.96      trident      4.3   reboot (zwane)
K6-2/500     connectiva 9    3.2.2     trident      4.3   reboot (zwane)
Opteron 240  debian unstable 3.2       S3           4.2.1 reboot
Athlon 2100  debian unstable 3.2       radeon 7500  4.2.1 works
P4M 1800     debian unstable 3.2       radeon m7    4.2.1 reboot

-- 
Matt Mackall : http://www.selenic.com : Linux development and consulting

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-19 23:09                               ` Matt Mackall
@ 2003-11-20  7:14                                 ` Zwane Mwaikambo
  2003-11-20  7:44                                 ` Matt Mackall
  1 sibling, 0 replies; 49+ messages in thread
From: Zwane Mwaikambo @ 2003-11-20  7:14 UTC (permalink / raw)
  To: Matt Mackall
  Cc: Linus Torvalds, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Linux Kernel, linux-mm, Hugh Dickins

On Wed, 19 Nov 2003, Matt Mackall wrote:

> On Wed, Nov 19, 2003 at 02:32:10PM -0600, Matt Mackall wrote:
> > 
> > Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my
> > 1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit
> > doesn't help. So my suspicion is that the printk is changing the
> > timing just enough on Zwane's box that he's getting a timer interrupt
> > knocking him out of vm86 mode before he hits a fatal bit in the fault
> > handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault,
> > do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so
> > there's probably something amiss in the trampoline code.
> 
> Some more datapoints:

Thanks for trying those out, I've got another one to add.

> CPU          distro          compiler  video        X     result
> K6-2/500     connectiva 9    2.96      trident      4.3   reboot (zwane)
> K6-2/500     connectiva 9    3.2.2     trident      4.3   reboot (zwane)
> Opteron 240  debian unstable 3.2       S3           4.2.1 reboot
> Athlon 2100  debian unstable 3.2       radeon 7500  4.2.1 works
> P4M 1800     debian unstable 3.2       radeon m7    4.2.1 reboot

  P4/Xeon 2000 Fedora Core 1   3.3.2     ATI Rage XL  4.3.0 reboot

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-19 23:09                               ` Matt Mackall
  2003-11-20  7:14                                 ` Zwane Mwaikambo
@ 2003-11-20  7:44                                 ` Matt Mackall
  2003-11-20  7:53                                   ` Andrew Morton
  2003-11-20  8:13                                   ` Matt Mackall
  1 sibling, 2 replies; 49+ messages in thread
From: Matt Mackall @ 2003-11-20  7:44 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Zwane Mwaikambo, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Linux Kernel, linux-mm, Hugh Dickins

On Wed, Nov 19, 2003 at 05:09:28PM -0600, Matt Mackall wrote:
> On Wed, Nov 19, 2003 at 02:32:10PM -0600, Matt Mackall wrote:
> > 
> > Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my
> > 1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit
> > doesn't help. So my suspicion is that the printk is changing the
> > timing just enough on Zwane's box that he's getting a timer interrupt
> > knocking him out of vm86 mode before he hits a fatal bit in the fault
> > handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault,
> > do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so
> > there's probably something amiss in the trampoline code.
> 
> Some more datapoints:
> 
> CPU          distro          compiler  video        X     result
> K6-2/500     connectiva 9    2.96      trident      4.3   reboot (zwane)
> K6-2/500     connectiva 9    3.2.2     trident      4.3   reboot (zwane)
> Opteron 240  debian unstable 3.2       S3           4.2.1 reboot
> Athlon 2100  debian unstable 3.2       radeon 7500  4.2.1 works
> P4M 1800     debian unstable 3.2       radeon m7    4.2.1 reboot

And indeed it does turn out to be a problem with the trampoline
mechanics. The fix for -mm4:


Fix triple faulting on some boxes with 4G/4G


 mm-mpm/arch/i386/kernel/vm86.c |    2 +-
 1 files changed, 1 insertion(+), 1 deletion(-)

diff -puN arch/i386/kernel/vm86.c~virtual-esp arch/i386/kernel/vm86.c
--- mm/arch/i386/kernel/vm86.c~virtual-esp	2003-11-20 01:36:32.000000000 -0600
+++ mm-mpm/arch/i386/kernel/vm86.c	2003-11-20 01:36:32.000000000 -0600
@@ -306,7 +306,7 @@ static void do_sys_vm86(struct kernel_vm
 	tss->esp0 = virtual_esp0(tsk);
 	if (cpu_has_sep)
 		tsk->thread.sysenter_cs = 0;
-	load_esp0(tss, &tsk->thread);
+	load_virtual_esp0(tss, tsk);
 	put_cpu();
 
 	tsk->thread.screen_bitmap = info->screen_bitmap;

_
 

-- 
Matt Mackall : http://www.selenic.com : Linux development and consulting

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-20  7:44                                 ` Matt Mackall
@ 2003-11-20  7:53                                   ` Andrew Morton
  2003-11-20  8:13                                   ` Matt Mackall
  1 sibling, 0 replies; 49+ messages in thread
From: Andrew Morton @ 2003-11-20  7:53 UTC (permalink / raw)
  To: Matt Mackall; +Cc: torvalds, zwane, mingo, mbligh, linux-kernel, linux-mm, hugh

Matt Mackall <mpm@selenic.com> wrote:
>
>  -	load_esp0(tss, &tsk->thread);
>  +	load_virtual_esp0(tss, tsk);

Thanks guys.

Now I'll have to put something else in there to keep you amused ;)



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops
  2003-11-20  7:44                                 ` Matt Mackall
  2003-11-20  7:53                                   ` Andrew Morton
@ 2003-11-20  8:13                                   ` Matt Mackall
  1 sibling, 0 replies; 49+ messages in thread
From: Matt Mackall @ 2003-11-20  8:13 UTC (permalink / raw)
  To: Linus Torvalds
  Cc: Zwane Mwaikambo, Ingo Molnar, Martin J. Bligh, Andrew Morton,
	Linux Kernel, linux-mm, Hugh Dickins

On Thu, Nov 20, 2003 at 01:44:05AM -0600, Matt Mackall wrote:
> On Wed, Nov 19, 2003 at 05:09:28PM -0600, Matt Mackall wrote:
> > On Wed, Nov 19, 2003 at 02:32:10PM -0600, Matt Mackall wrote:
> > > 
> > > Zwane's got a K6-2 500MHz. I've just managed to reproduce this on my
> > > 1.4GHz Opteron box (with Debian gcc 3.2). Here, the "ooh la la" bit
> > > doesn't help. So my suspicion is that the printk is changing the
> > > timing just enough on Zwane's box that he's getting a timer interrupt
> > > knocking him out of vm86 mode before he hits a fatal bit in the fault
> > > handling path for 4/4. Printks in handle_vm86_trap, handle_vm86_fault,
> > > do_trap:vm86_trap, and do_general_protection:gp_in_vm86 never fire so
> > > there's probably something amiss in the trampoline code.
> > 
> > Some more datapoints:
> > 
> > CPU          distro          compiler  video        X     result
> > K6-2/500     connectiva 9    2.96      trident      4.3   reboot (zwane)
> > K6-2/500     connectiva 9    3.2.2     trident      4.3   reboot (zwane)
> > Opteron 240  debian unstable 3.2       S3           4.2.1 reboot
> > Athlon 2100  debian unstable 3.2       radeon 7500  4.2.1 works
> > P4M 1800     debian unstable 3.2       radeon m7    4.2.1 reboot
> 
> And indeed it does turn out to be a problem with the trampoline
> mechanics. The fix for -mm4:

Cleanup, as pointed out by Zwane:

Fix triple faulting on some boxes with 4G/4G


 mm-mpm/arch/i386/kernel/vm86.c |    3 +--
 1 files changed, 1 insertion(+), 2 deletions(-)

diff -puN arch/i386/kernel/vm86.c~virtual-esp arch/i386/kernel/vm86.c
--- mm/arch/i386/kernel/vm86.c~virtual-esp	2003-11-20 01:36:32.000000000 -0600
+++ mm-mpm/arch/i386/kernel/vm86.c	2003-11-20 02:08:38.000000000 -0600
@@ -303,10 +303,9 @@ static void do_sys_vm86(struct kernel_vm
 
 	tss = init_tss + get_cpu();
 	tsk->thread.esp0 = (unsigned long) &info->VM86_TSS_ESP0;
-	tss->esp0 = virtual_esp0(tsk);
 	if (cpu_has_sep)
 		tsk->thread.sysenter_cs = 0;
-	load_esp0(tss, &tsk->thread);
+	load_virtual_esp0(tss, tsk);
 	put_cpu();
 
 	tsk->thread.screen_bitmap = info->screen_bitmap;

_
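
As a rough illustration of the ordering problem this cleanup removes,
here is a user-space toy model.  Everything named toy_* is hypothetical.
It assumes that load_esp0() copies thread.esp0 into the TSS unchanged
(as the mainline helper does) and that load_virtual_esp0() stores the
translated value instead; the "virtual = real + offset" translation is
only a stand-in for whatever virtual_esp0() really computes in 4G/4G.

```c
#include <assert.h>

/* Toy stand-ins, not the kernel's real structures. */
struct toy_thread { unsigned long esp0; };
struct toy_task   { struct toy_thread thread; unsigned long virt_offset; };
struct toy_tss    { unsigned long esp0; };

/* Assumed shape of virtual_esp0(): translate the real stack address. */
static unsigned long toy_virtual_esp0(const struct toy_task *tsk)
{
	return tsk->thread.esp0 + tsk->virt_offset;
}

/* Assumed mainline-style load_esp0(): stores thread.esp0 verbatim. */
static void toy_load_esp0(struct toy_tss *tss, const struct toy_thread *thread)
{
	tss->esp0 = thread->esp0;
}

/* Assumed load_virtual_esp0(): stores the translated value. */
static void toy_load_virtual_esp0(struct toy_tss *tss, const struct toy_task *tsk)
{
	tss->esp0 = toy_virtual_esp0(tsk);
}

/* Sequence before the fix: the second store clobbers the first. */
static unsigned long toy_before_fix(struct toy_tss *tss, const struct toy_task *tsk)
{
	tss->esp0 = toy_virtual_esp0(tsk);
	toy_load_esp0(tss, &tsk->thread);
	return tss->esp0;
}

/* Sequence after the cleanup: a single store of the virtual value. */
static unsigned long toy_after_fix(struct toy_tss *tss, const struct toy_task *tsk)
{
	toy_load_virtual_esp0(tss, tsk);
	return tss->esp0;
}
```

In this model the old sequence leaves the untranslated address in the
TSS, which would be consistent with the faults seen on entry from vm86
mode; the new sequence leaves the translated one.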



-- 
Matt Mackall : http://www.selenic.com : Linux development and consulting

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: 2.6.0-test9-mm3 - AIO test results
  2003-11-18 23:47           ` Daniel McNeil
@ 2003-11-24  9:42             ` Suparna Bhattacharya
  2003-11-25 23:49               ` [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-11-24  9:42 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

On Tue, Nov 18, 2003 at 03:47:53PM -0800, Daniel McNeil wrote:
> Suparna,
> 
> I was unable to reproduce the hang in io_submit() without your patch.
> I ran aiocp with 1k i/o size constantly for 2 hours and it never hung.
> 
> I re-ran with your patch with both as-iosched and deadline and both
> hung in io_submit().  aiocp would run a few times, but I put the
> aiocp in a while loop and it hung on the 1st or 2nd time.  It
> did get most of the way through copying the file before hanging.
> This is on a 2-proc to ide disks running ext3.
> 

Found one race ... not sure if it's the one causing the hangs
you see. The attached patch is not a complete fix (there is one
other race to close), but it would be interesting to see if 
this makes any difference for you.

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India

------------------------------------------------------
Don't access dio fields if it's possible that the dio could 
already have been freed asynchronously during i/o completion.
Fixme: This still leaves a window between decrement of
bio_count and accessing dio->waiter during i/o completion 
wherein the dio could get freed by the submission path.


--- pure-mm3/fs/direct-io.c	2003-11-24 13:00:33.000000000 +0530
+++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-24 14:15:30.000000000 +0530
@@ -994,14 +995,17 @@
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;
-		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
+		int should_wait = 0;
+
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (dio->waiter) {
+		if (should_wait) {
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
@@ -1013,7 +1017,7 @@
 				set_current_state(TASK_UNINTERRUPTIBLE);
 			}
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);
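
The pattern in the hunk above (decide whether to wait while the dio is
still known to be valid, then consult only a local afterwards) can be
sketched in user space as follows.  This is a simplified model, not the
kernel code: the toy_* names are hypothetical, and a "freed" flag
stands in for kfree() so the outcome can be observed.

```c
#include <assert.h>

/* Toy stand-in for struct dio; only the fields the decision needs. */
struct toy_dio {
	long result;	/* bytes completed by direct I/O */
	long size;	/* total request size */
	int is_write;	/* nonzero for a write request */
	int freed;	/* set where the real code would kfree() */
};

/* Stand-in for finished_one_bio(): may release the dio. */
static void toy_finish(struct toy_dio *dio, int completes_fully)
{
	if (completes_fully)
		dio->freed = 1;
}

/*
 * Returns 1 if the caller must wait and fall back to buffered I/O.
 * The decision is taken before toy_finish(), because the dio may be
 * gone afterwards; only the local should_wait is read after that.
 */
static int toy_submit_tail(struct toy_dio *dio)
{
	int should_wait = 0;

	if (dio->result < dio->size && dio->is_write)
		should_wait = 1;

	toy_finish(dio, !should_wait);

	/* Do not touch *dio here unless should_wait is set. */
	return should_wait;
}
```

A short write (result < size) returns 1 and leaves the dio alive for
the waiter to free; a complete request returns 0 and the dio is
released on the completion side.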

^ permalink raw reply	[flat|nested] 49+ messages in thread

* [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-11-24  9:42             ` Suparna Bhattacharya
@ 2003-11-25 23:49               ` Daniel McNeil
  2003-11-26  7:55                 ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-11-25 23:49 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

[-- Attachment #1: Type: text/plain, Size: 1861 bytes --]

Suparna,

Yes your patch did help.  I originally had CONFIG_DEBUG_SLAB=y which
was helping me see problems because the freed dio was getting
poisoned.  I also tested with CONFIG_DEBUG_PAGEALLOC=y which is
very good at catching these.

I updated your AIO fallback patch, added your AIO race fix, and fixed
the bio_count decrement race.  This patch has all three fixes and
it is working for me.

I fixed the bio_count race by changing bio_list_lock into bio_lock
and using that for all the bio fields.  I changed bio_count and
bios_in_flight from atomics into ints; they are now protected by
the bio_lock.  In finished_one_bio(), I fixed the race by
leaving the bio_count at 1 until after the dio_complete()
and then doing the bio_count decrement and wakeup while holding the bio_lock.
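
A minimal user-space sketch of that scheme, for readers who want the
shape without the diff.  It models only the fall-back-to-buffered
branch; a pthread mutex stands in for the spinlock, and all toy_*
names and the bookkeeping fields are hypothetical.

```c
#include <assert.h>
#include <pthread.h>

struct toy_dio {
	pthread_mutex_t bio_lock;	/* stand-in for the bio_lock spinlock */
	int bio_count;			/* starts at 1: the submitter's reference */
	int completed;			/* set where dio_complete() would run */
	int count_at_completion;	/* bio_count seen during completion */
	int woke_waiter;		/* set where wake_up_process() would run */
};

static void toy_dio_init(struct toy_dio *dio)
{
	pthread_mutex_init(&dio->bio_lock, NULL);
	dio->bio_count = 1;
	dio->completed = 0;
	dio->count_at_completion = 0;
	dio->woke_waiter = 0;
}

/* Each submitted bio takes a reference under the lock. */
static void toy_bio_submit(struct toy_dio *dio)
{
	pthread_mutex_lock(&dio->bio_lock);
	dio->bio_count++;
	pthread_mutex_unlock(&dio->bio_lock);
}

static void toy_finished_one_bio(struct toy_dio *dio)
{
	pthread_mutex_lock(&dio->bio_lock);
	if (dio->bio_count == 1) {
		/*
		 * Last reference.  Drop the lock and complete while the
		 * count is still 1, so a waiter polling bio_count cannot
		 * free the dio underneath the completion work.
		 */
		pthread_mutex_unlock(&dio->bio_lock);
		dio->completed = 1;
		dio->count_at_completion = dio->bio_count;

		/* Only now decrement and wake, both under the lock. */
		pthread_mutex_lock(&dio->bio_lock);
		dio->bio_count--;
		dio->woke_waiter = 1;
		pthread_mutex_unlock(&dio->bio_lock);
		return;
	}
	dio->bio_count--;
	pthread_mutex_unlock(&dio->bio_lock);
}
```

The point of the ordering is that the waiter only sees bio_count reach
zero after the completion side has finished touching the dio.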

Take a look, give it a try, and let me know what you think.

I've tested this on my 2-way and so far all my tests have passed.
I have more testing to do, but this is working better.

Thanks,

Daniel



On Mon, 2003-11-24 at 01:42, Suparna Bhattacharya wrote:
> On Tue, Nov 18, 2003 at 03:47:53PM -0800, Daniel McNeil wrote:
> > Suparna,
> > 
> > I was unable to reproduce the hang in io_submit() without your patch.
> > I ran aiocp with 1k i/o size constantly for 2 hours and it never hung.
> > 
> > I re-ran with your patch with both as-iosched and deadline and both
> > hung in io_submit().  aiocp would run a few times, but I put the
> > aiocp in a while loop and it hung on the 1st or 2nd time.  It
> > did get most of the way through copying the file before hanging.
> > This is on a 2-proc to ide disks running ext3.
> > 
> 
> Found one race ... not sure if it's the one causing the hangs
> you see. The attached patch is not a complete fix (there is one
> other race to close), but it would be interesting to see if 
> this makes any difference for you.
> 
> Regards
> Suparna

[-- Attachment #2: 2.6.0-test9-mm5.aio-dio-fallback-bio_count-race.patch --]
[-- Type: text/x-patch, Size: 9088 bytes --]

diff -rupN -X /home/daniel/dontdiff linux-2.6.0-test9-mm5/fs/direct-io.c linux-2.6.0-test9-mm5.ddm/fs/direct-io.c
--- linux-2.6.0-test9-mm5/fs/direct-io.c	2003-11-24 09:06:05.000000000 -0800
+++ linux-2.6.0-test9-mm5.ddm/fs/direct-io.c	2003-11-25 14:52:43.566103685 -0800
@@ -74,6 +74,7 @@ struct dio {
 					   been performed at the start of a
 					   write */
 	int pages_in_io;		/* approximate total IO pages */
+	size_t	size;			/* total request size (doesn't change)*/
 	sector_t block_in_file;		/* Current offset into the underlying
 					   file in dio_block units. */
 	unsigned blocks_available;	/* At block_in_file.  changes */
@@ -115,9 +116,9 @@ struct dio {
 	int page_errors;		/* errno from get_user_pages() */
 
 	/* BIO completion state */
-	atomic_t bio_count;		/* nr bios to be completed */
-	atomic_t bios_in_flight;	/* nr bios in flight */
-	spinlock_t bio_list_lock;	/* protects bio_list */
+	spinlock_t bio_lock;		/* protects BIO fields below */
+	int bio_count;			/* nr bios to be completed */
+	int bios_in_flight;		/* nr bios in flight */
 	struct bio *bio_list;		/* singly linked via bi_private */
 	struct task_struct *waiter;	/* waiting task (NULL if none) */
 
@@ -221,20 +222,38 @@ static void dio_complete(struct dio *dio
  */
 static void finished_one_bio(struct dio *dio)
 {
-	if (atomic_dec_and_test(&dio->bio_count)) {
+	unsigned long flags;
+
+	spin_lock_irqsave(&dio->bio_lock, flags);
+	if (dio->bio_count == 1) {
 		if (dio->is_async) {
+			/*
+			 * Last reference to the dio is going away.
+			 * Drop spinlock and complete the DIO.
+			 */
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			dio_complete(dio, dio->block_in_file << dio->blkbits,
 					dio->result);
 			/* Complete AIO later if falling back to buffered i/o */
-			if (dio->result != -ENOTBLK) {
+			if (dio->result >= dio->size || dio->rw == READ) {
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
+				return;
 			} else {
+				/*
+				 * Falling back to buffered
+				 */
+				spin_lock_irqsave(&dio->bio_lock, flags);
+				dio->bio_count--;
 				if (dio->waiter)
 					wake_up_process(dio->waiter);
+				spin_unlock_irqrestore(&dio->bio_lock, flags);
+				return;
 			}
 		}
 	}
+	dio->bio_count--;
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 }
 
 static int dio_bio_complete(struct dio *dio, struct bio *bio);
@@ -268,13 +287,13 @@ static int dio_bio_end_io(struct bio *bi
 	if (bio->bi_size)
 		return 1;
 
-	spin_lock_irqsave(&dio->bio_list_lock, flags);
+	spin_lock_irqsave(&dio->bio_lock, flags);
 	bio->bi_private = dio->bio_list;
 	dio->bio_list = bio;
-	atomic_dec(&dio->bios_in_flight);
-	if (dio->waiter && atomic_read(&dio->bios_in_flight) == 0)
+	dio->bios_in_flight--;
+	if (dio->waiter && dio->bios_in_flight == 0)
 		wake_up_process(dio->waiter);
-	spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	return 0;
 }
 
@@ -307,10 +326,13 @@ dio_bio_alloc(struct dio *dio, struct bl
 static void dio_bio_submit(struct dio *dio)
 {
 	struct bio *bio = dio->bio;
+	unsigned long flags;
 
 	bio->bi_private = dio;
-	atomic_inc(&dio->bio_count);
-	atomic_inc(&dio->bios_in_flight);
+	spin_lock_irqsave(&dio->bio_lock, flags);
+	dio->bio_count++;
+	dio->bios_in_flight++;
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	if (dio->is_async && dio->rw == READ)
 		bio_set_pages_dirty(bio);
 	submit_bio(dio->rw, bio);
@@ -336,22 +358,22 @@ static struct bio *dio_await_one(struct 
 	unsigned long flags;
 	struct bio *bio;
 
-	spin_lock_irqsave(&dio->bio_list_lock, flags);
+	spin_lock_irqsave(&dio->bio_lock, flags);
 	while (dio->bio_list == NULL) {
 		set_current_state(TASK_UNINTERRUPTIBLE);
 		if (dio->bio_list == NULL) {
 			dio->waiter = current;
-			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			blk_run_queues();
 			io_schedule();
-			spin_lock_irqsave(&dio->bio_list_lock, flags);
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			dio->waiter = NULL;
 		}
 		set_current_state(TASK_RUNNING);
 	}
 	bio = dio->bio_list;
 	dio->bio_list = bio->bi_private;
-	spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+	spin_unlock_irqrestore(&dio->bio_lock, flags);
 	return bio;
 }
 
@@ -393,7 +415,12 @@ static int dio_await_completion(struct d
 	if (dio->bio)
 		dio_bio_submit(dio);
 
-	while (atomic_read(&dio->bio_count)) {
+	/*
+	 * The bio_lock is not held for the read of bio_count.
+	 * This is ok since it is the dio_bio_complete() that changes
+	 * bio_count.
+	 */
+	while (dio->bio_count) {
 		struct bio *bio = dio_await_one(dio);
 		int ret2;
 
@@ -420,10 +447,10 @@ static int dio_bio_reap(struct dio *dio)
 			unsigned long flags;
 			struct bio *bio;
 
-			spin_lock_irqsave(&dio->bio_list_lock, flags);
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			bio = dio->bio_list;
 			dio->bio_list = bio->bi_private;
-			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			ret = dio_bio_complete(dio, bio);
 		}
 		dio->reap_counter = 0;
@@ -889,6 +916,7 @@ direct_io_worker(int rw, struct kiocb *i
 	dio->blkbits = blkbits;
 	dio->blkfactor = inode->i_blkbits - blkbits;
 	dio->start_zero_done = 0;
+	dio->size = 0;
 	dio->block_in_file = offset >> blkbits;
 	dio->blocks_available = 0;
 	dio->cur_page = NULL;
@@ -913,9 +941,9 @@ direct_io_worker(int rw, struct kiocb *i
 	 * (or synchronous) device could take the count to zero while we're
 	 * still submitting BIOs.
 	 */
-	atomic_set(&dio->bio_count, 1);
-	atomic_set(&dio->bios_in_flight, 0);
-	spin_lock_init(&dio->bio_list_lock);
+	dio->bio_count = 1;
+	dio->bios_in_flight = 0;
+	spin_lock_init(&dio->bio_lock);
 	dio->bio_list = NULL;
 	dio->waiter = NULL;
 
@@ -925,7 +953,7 @@ direct_io_worker(int rw, struct kiocb *i
 
 	for (seg = 0; seg < nr_segs; seg++) {
 		user_addr = (unsigned long)iov[seg].iov_base;
-		bytes = iov[seg].iov_len;
+		dio->size += bytes = iov[seg].iov_len;
 
 		/* Index into the first page of the first block */
 		dio->first_block_in_page = (user_addr & ~PAGE_MASK) >> blkbits;
@@ -956,6 +984,13 @@ direct_io_worker(int rw, struct kiocb *i
 		}
 	} /* end iovec loop */
 
+	if (ret == -ENOTBLK && rw == WRITE) {
+		/*
+		 * The remaining part of the request will be
+		 * handled by buffered I/O when we return
+		 */
+		ret = 0;
+	}
 	/*
 	 * There may be some unwritten disk at the end of a part-written
 	 * fs-block-sized block.  Go zero that now.
@@ -985,32 +1020,35 @@ direct_io_worker(int rw, struct kiocb *i
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;	/* Bytes written */
-		if (ret == -ENOTBLK) {
-			/*
-			 * The request will be reissued via buffered I/O
-			 * when we return; Any I/O already issued
-			 * effectively becomes redundant.
-			 */
-			dio->result = ret;
+		int should_wait = 0;
+		
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (ret == -ENOTBLK) {
+		if (should_wait) {
+			unsigned long flags;
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
 			 * before returning to fallback on buffered I/O
 			 */
+
+			spin_lock_irqsave(&dio->bio_lock, flags);
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			while (atomic_read(&dio->bio_count)) {
+			while (dio->bio_count) {
+				spin_unlock_irqrestore(&dio->bio_lock, flags);
 				io_schedule();
+				spin_lock_irqsave(&dio->bio_lock, flags);
 				set_current_state(TASK_UNINTERRUPTIBLE);
 			}
+			spin_unlock_irqrestore(&dio->bio_lock, flags);
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);
@@ -1032,7 +1070,8 @@ direct_io_worker(int rw, struct kiocb *i
 		}
 		dio_complete(dio, offset, ret);
 		/* We could have also come here on an AIO file extend */
-		if (!is_sync_kiocb(iocb) && (ret != -ENOTBLK))
+		if (!is_sync_kiocb(iocb) && !(rw == WRITE && ret >= 0 && 
+			dio->result < dio->size))
 			aio_complete(iocb, ret, 0);
 		kfree(dio);
 	}
diff -rupN -X /home/daniel/dontdiff linux-2.6.0-test9-mm5/mm/filemap.c linux-2.6.0-test9-mm5.ddm/mm/filemap.c
--- linux-2.6.0-test9-mm5/mm/filemap.c	2003-11-24 09:06:06.000000000 -0800
+++ linux-2.6.0-test9-mm5.ddm/mm/filemap.c	2003-11-21 14:20:09.000000000 -0800
@@ -1908,14 +1908,16 @@ __generic_file_aio_write_nolock(struct k
 		 */
 		if (written >= 0 && file->f_flags & O_SYNC)
 			status = generic_osync_inode(inode, mapping, OSYNC_METADATA);
-		if (written >= 0 && !is_sync_kiocb(iocb))
+		if (written >= count && !is_sync_kiocb(iocb))
 			written = -EIOCBQUEUED;
-		if (written != -ENOTBLK)
+		if (written < 0 || written >= count)
 			goto out_status;
 		/*
 		 * direct-io write to a hole: fall through to buffered I/O
+		 * for completing the rest of the request.
 		 */
-		written = 0;
+		pos += written;
+		count -= written;
 	}
 
 	buf = iov->iov_base;

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-11-25 23:49               ` [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch Daniel McNeil
@ 2003-11-26  7:55                 ` Suparna Bhattacharya
  2003-12-02  1:35                   ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-11-26  7:55 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

On Tue, Nov 25, 2003 at 03:49:31PM -0800, Daniel McNeil wrote:
> Suparna,
> 
> Yes, your patch did help.  I originally had CONFIG_DEBUG_SLAB=y which
> was helping me see problems because the freed dio was getting
> poisoned.  I also tested with CONFIG_DEBUG_PAGEALLOC=y which is
> very good at catching these.

Ah I see - perhaps that explains why neither Janet nor I could
recreate the problem that you were hitting so easily. So we 
should probably try running with CONFIG_DEBUG_SLAB and
CONFIG_DEBUG_PAGEALLOC as well.

> 
> I updated your AIO fallback patch, folded in your AIO race fix, and
> fixed the bio_count decrement race.  This patch has all three fixes and
> it is working for me.
> 
> I fixed the bio_count race by changing bio_list_lock into bio_lock
> and using that for all the bio fields.  I changed bio_count and
> bios_in_flight from atomics into ints.  They are now protected by
> the bio_lock.  In finished_one_bio(), I fixed the race by
> leaving the bio_count at 1 until after the dio_complete()
> and then doing the bio_count decrement and wakeup while holding the bio_lock.
> 
> Take a look, give it a try, and let me know what you think.

I had been trying a slightly different kind of fix -- appended is
the updated version of the patch I last posted. It uses the bio_list_lock
to protect the dio->waiter field, which finished_one_bio sets back
to NULL after it has issued the wakeup; and the code that waits for
i/o to drain out checks the dio->waiter field instead of bio_count.
This might not seem very obvious given the nomenclature of the 
bio_list_lock, so I was holding back wondering if it could be 
improved. 

Your approach looks clearer in that sense -- it's pretty unambiguous
about what lock protects what fields. The only thing that bothers me (and
this is what I was trying to avoid in my patch) is the increased
use of spin_lock_irq (the overhead of turning interrupts off and on)
instead of simple atomic inc/dec in most places.

Thoughts ?

Regards
Suparna

-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Labs, India

------------------------------------

Don't access dio fields if it's possible that the dio could
already have been freed asynchronously during i/o completion.
The dio->bio_list_lock protects the dio->waiter field as in
the case of synchronous i/o.

--- pure-mm3/fs/direct-io.c	2003-11-24 13:00:33.000000000 +0530
+++ linux-2.6.0-test9-mm3/fs/direct-io.c	2003-11-25 14:08:26.000000000 +0530
@@ -231,8 +231,17 @@
 				aio_complete(dio->iocb, dio->result, 0);
 				kfree(dio);
 			} else {
-				if (dio->waiter)
-					wake_up_process(dio->waiter);
+				struct task_struct *waiter;
+				unsigned long flags;
+
+				spin_lock_irqsave(&dio->bio_list_lock, flags);
+				waiter = dio->waiter;
+				if (waiter) {
+					dio->waiter = NULL;
+					wake_up_process(waiter);
+				}
+				spin_unlock_irqrestore(&dio->bio_list_lock, 
+					flags);
 			}
 		}
 	}
@@ -994,26 +1004,35 @@
 	 * reflect the number of to-be-processed BIOs.
 	 */
 	if (dio->is_async) {
-		if (ret == 0)
-			ret = dio->result;
-		if (ret > 0 && dio->result < dio->size && rw == WRITE) {
+		int should_wait = 0;
+
+		if (dio->result < dio->size && rw == WRITE) {
 			dio->waiter = current;
+			should_wait = 1;
 		}
+		if (ret == 0)
+			ret = dio->result;
 		finished_one_bio(dio);		/* This can free the dio */
 		blk_run_queues();
-		if (dio->waiter) {
+		if (should_wait) {
+			unsigned long flags;
 			/*
 			 * Wait for already issued I/O to drain out and
 			 * release its references to user-space pages
 			 * before returning to fallback on buffered I/O
 			 */
+			spin_lock_irqsave(&dio->bio_list_lock, flags);
 			set_current_state(TASK_UNINTERRUPTIBLE);
-			while (atomic_read(&dio->bio_count)) {
+			while (dio->waiter) {
+				spin_unlock_irqrestore(&dio->bio_list_lock, 
+				flags);
 				io_schedule();
 				set_current_state(TASK_UNINTERRUPTIBLE);
+				spin_lock_irqsave(&dio->bio_list_lock, flags);
 			}
 			set_current_state(TASK_RUNNING);
-			dio->waiter = NULL;
+			spin_unlock_irqrestore(&dio->bio_list_lock, flags);
+			kfree(dio);
 		}
 	} else {
 		finished_one_bio(dio);

^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-11-26  7:55                 ` Suparna Bhattacharya
@ 2003-12-02  1:35                   ` Daniel McNeil
  2003-12-02 15:25                     ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-12-02  1:35 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Suparna,

Sorry I did not respond sooner, I was on vacation.

Your patch should also fix the problem.  I like mine with the
cleaner locking.

I am not sure your approach has less overhead.  At least
on x86, cli/sti are fairly inexpensive.  The locked xchg or locked
inc/dec is what is expensive (from what I understand).

So comparing:

my patch:				Your patch:

dio_bio_submit()
	spin_lock()			atomic_inc(bio_count);
	bio_count++			atomic_inc(bios_in_flight);
	bios_in_flight++
	spin_unlock()

My guess is that the spin_lock/spin_unlock is faster than 2 atomic_inc's
since it is only 1 locked operation (spin_lock) versus 2 (atomic_inc's).

finished_one_bio() (normal case)

My patch:				Your patch:
	spin_lock()			atomic_dec_and_test(bio_count)
	bio_count--
	spin_unlock()

1 locked instruction each, so very close -- atomic_dec_and_test() does
not disable interrupts, so it is probably a little bit faster.

finished_one_bio() (fallback case):

	spin_lock()			spin_lock()
	bio_count--			dio->waiter = NULL
	spin_unlock()			spin_unlock()

Both approaches are the same.

dio_bio_complete()

	spin_lock()			spin_lock()
	bios_in_flight--		atomic_dec()
	spin_unlock()			spin_unlock()

My patch is faster since it removes 1 locked instruction.

Conclusion:

My guess would be that both approaches are close; my patch
has fewer locked instructions but does disable interrupts more.
My preference is for the cleaner locking approach that is easier
to understand and modify in the future.
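
For illustration only, the two counting schemes compared above can be
sketched in userspace C.  This is a rough sketch under assumptions, not
the kernel code: a C11 atomic_flag spin loop stands in for dio->bio_lock
(it does not model disabling interrupts), the struct and function names
are hypothetical, and submission and completion are each collapsed into
a single function.

```c
#include <assert.h>
#include <stdatomic.h>

/* Hypothetical stand-ins for the counters in struct dio. */
struct locked_counts {
	atomic_flag lock;		/* plays the role of dio->bio_lock */
	int bio_count;
	int bios_in_flight;
};

struct atomic_counts {
	atomic_int bio_count;
	atomic_int bios_in_flight;
};

/* Toy spinlock: one locked test-and-set per acquisition attempt. */
static void toy_lock(atomic_flag *l)
{
	while (atomic_flag_test_and_set_explicit(l, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void toy_unlock(atomic_flag *l)
{
	atomic_flag_clear_explicit(l, memory_order_release);
}

static void locked_init(struct locked_counts *c)
{
	atomic_flag_clear(&c->lock);
	c->bio_count = 0;
	c->bios_in_flight = 0;
}

static void atomic_counts_init(struct atomic_counts *c)
{
	atomic_init(&c->bio_count, 0);
	atomic_init(&c->bios_in_flight, 0);
}

/* Lock scheme: one lock acquisition covers both plain increments. */
static void locked_submit(struct locked_counts *c)
{
	toy_lock(&c->lock);
	c->bio_count++;
	c->bios_in_flight++;
	toy_unlock(&c->lock);
}

/* Returns 1 if this call dropped the last bio_count reference. */
static int locked_finish(struct locked_counts *c)
{
	int last;

	toy_lock(&c->lock);
	assert(c->bio_count > 0);
	c->bios_in_flight--;
	last = (--c->bio_count == 0);
	toy_unlock(&c->lock);
	return last;
}

/* Atomic scheme: two separate atomic read-modify-write operations. */
static void atomic_submit(struct atomic_counts *c)
{
	atomic_fetch_add(&c->bio_count, 1);
	atomic_fetch_add(&c->bios_in_flight, 1);
}

/* Returns 1 if this call dropped the last bio_count reference. */
static int atomic_finish(struct atomic_counts *c)
{
	int prev;

	atomic_fetch_sub(&c->bios_in_flight, 1);
	prev = atomic_fetch_sub(&c->bio_count, 1);
	assert(prev > 0);
	return prev == 1;
}
```

In the lock scheme both counters change together under one acquisition,
which is what makes it easy to see which lock protects which field.  In
the atomic scheme each counter is updated independently, so a reader can
observe one counter updated and the other not.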

Daniel



^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-12-02  1:35                   ` Daniel McNeil
@ 2003-12-02 15:25                     ` Suparna Bhattacharya
  2003-12-03 23:14                       ` Daniel McNeil
  0 siblings, 1 reply; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-12-02 15:25 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

I suspect the degree to which the spin_lock_irq is costlier 
than atomic_inc/dec would vary across architectures - cli/sti
is probably more expensive on certain archs than others.

The patch I sent just kept things the way they were in terms of 
locking costs, assuming that those choices were thought through
at that time (should check with akpm). Yours changes it by 
switching to spin_lock(unlock)_irq instead of atomic_dec in 
the normal (common) path for finished_one_bio, for both sync 
and async i/o. At the same time, for the sync i/o case, as 
you observe it takes away one atomic_dec from dio_bio_end_io. 

Since these probably aren't really very hot paths ... possibly
the difference doesn't matter that much. I do agree that your
patch makes the locking easier to follow.

Regards
Suparna


-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-12-02 15:25                     ` Suparna Bhattacharya
@ 2003-12-03 23:14                       ` Daniel McNeil
  2003-12-04  4:40                         ` Suparna Bhattacharya
  0 siblings, 1 reply; 49+ messages in thread
From: Daniel McNeil @ 2003-12-03 23:14 UTC (permalink / raw)
  To: Suparna Bhattacharya
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Suparna,

I did a quick test of your patch and my patch by running my aiocp
program to write zeros to a file and to read a file.  I used
a 50MB file on ext2 file system on a ramdisk.  The machine
is a 2-proc IBM box with:

model name      : Intel(R) Xeon(TM) CPU 1700MHz
stepping        : 10
cpu MHz         : 1686.033
cache size      : 256 KB

The write test was:

time aiocp -n 32 -b 1k -s 50m -z -f DIRECT file

The read test was:

time aiocp -n 32 -b 1k -s 50 -w -f DIRECT file

I ran each test more than 10 times and here are the averages:

		my patch		your patch

aiocp write	real 0.7328		real 0.7756
		user 0.01425		user 0.01221
		sys  0.716		sys  0.76157

aiocp read	real 0.7250		real 0.7456
		user 0.0144		user 0.0130
		sys  0.07149		sys  0.7307

It looks like using the spin_lock instead of the atomic inc/dec
is very close performance-wise.  The spin_lock averages a bit
faster.  This is not testing the fallback case, but both patches
would be very similar in performance for that case.
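
To put a number on "a bit faster", the gap between the averaged "real"
times in the table can be worked out as a percentage.  The helper names
below are hypothetical and this is only a sketch of the arithmetic; no
variance was recorded for the runs, so it says nothing about whether
the gap is significant.

```c
#include <assert.h>
#include <stdio.h>

/* Percentage by which time_a is lower than time_b. */
static double percent_lower(double time_a, double time_b)
{
	assert(time_b > 0.0);
	return 100.0 * (time_b - time_a) / time_b;
}

/* Print the relative gap for one row of the table ("real" seconds). */
static void report(const char *test, double mine, double yours)
{
	printf("%-12s %.4f vs %.4f  (%.1f%% lower)\n",
	       test, mine, yours, percent_lower(mine, yours));
}
```

Calling report("aiocp write", 0.7328, 0.7756) and
report("aiocp read", 0.7250, 0.7456) gives gaps of roughly 5.5% and
2.8% respectively.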

I don't have any non-intel hardware to test with.

Daniel




^ permalink raw reply	[flat|nested] 49+ messages in thread

* Re: [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch
  2003-12-03 23:14                       ` Daniel McNeil
@ 2003-12-04  4:40                         ` Suparna Bhattacharya
  0 siblings, 0 replies; 49+ messages in thread
From: Suparna Bhattacharya @ 2003-12-04  4:40 UTC (permalink / raw)
  To: Daniel McNeil
  Cc: Andrew Morton, Linux Kernel Mailing List, linux-mm, linux-aio

Interesting :) 
Could it be the extra atomic_dec in dio_bio_end_io in the current code
that makes the difference ? Or are spin_lock_irq and atomic_inc/dec
really that close ? Would be good to know from other archs just to keep
in mind in the future.

As for the fallback case, performance doesn't even matter - it's not a
typical situation. So we don't need to bother with that.

Unless someone thinks otherwise, we should go with your fix.

Regards
Suparna


-- 
Suparna Bhattacharya (suparna@in.ibm.com)
Linux Technology Center
IBM Software Lab, India


^ permalink raw reply	[flat|nested] 49+ messages in thread

end of thread, other threads:[~2003-12-04  4:35 UTC | newest]

Thread overview: 49+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2003-11-13  7:30 2.6.0-test9-mm3 Andrew Morton
2003-11-13 20:03 ` [PATCH] linux-2.6.0-test9-mm3_verbose-timesource-acpi-pm_A0 john stultz
2003-11-13 22:03 ` 2.6.0-test9-mm3 - AIO test results Daniel McNeil
2003-11-17  5:25   ` Suparna Bhattacharya
2003-11-18  1:15     ` Daniel McNeil
2003-11-18  1:37       ` Daniel McNeil
2003-11-18 11:55         ` Suparna Bhattacharya
2003-11-18 23:47           ` Daniel McNeil
2003-11-24  9:42             ` Suparna Bhattacharya
2003-11-25 23:49               ` [PATCH 2.6.0-test9-mm5] aio-dio-fallback-bio_count-race.patch Daniel McNeil
2003-11-26  7:55                 ` Suparna Bhattacharya
2003-12-02  1:35                   ` Daniel McNeil
2003-12-02 15:25                     ` Suparna Bhattacharya
2003-12-03 23:14                       ` Daniel McNeil
2003-12-04  4:40                         ` Suparna Bhattacharya
2003-11-13 22:04 ` 2.6.0-test9-mm3 (compile stats) John Cherry
2003-11-14  5:07 ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14 20:57   ` 2.6.0-test9-mm3 Zwane Mwaikambo
2003-11-14 21:57     ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14 21:37       ` 2.6.0-test9-mm3 Zwane Mwaikambo
2003-11-14 21:47       ` 2.6.0-test9-mm3 Linus Torvalds
2003-11-15  0:55         ` 2.6.0-test9-mm3 Zwane Mwaikambo
2003-11-15 19:34           ` [PATCH][2.6-mm] Fix 4G/4G X11/vm86 oops Zwane Mwaikambo
2003-11-15 19:52             ` Zwane Mwaikambo
2003-11-17 21:46             ` Zwane Mwaikambo
2003-11-17 22:42               ` Linus Torvalds
2003-11-17 23:01                 ` Zwane Mwaikambo
2003-11-17 23:14                   ` Zwane Mwaikambo
2003-11-18  7:21                     ` Zwane Mwaikambo
2003-11-18 15:47                       ` Linus Torvalds
2003-11-18 16:16                         ` Zwane Mwaikambo
2003-11-18 16:37                           ` Linus Torvalds
2003-11-18 17:08                             ` Zwane Mwaikambo
2003-11-18 17:38                               ` Martin J. Bligh
2003-11-18 17:22                                 ` Zwane Mwaikambo
2003-11-19 20:32                             ` Matt Mackall
2003-11-19 23:09                               ` Matt Mackall
2003-11-20  7:14                                 ` Zwane Mwaikambo
2003-11-20  7:44                                 ` Matt Mackall
2003-11-20  7:53                                   ` Andrew Morton
2003-11-20  8:13                                   ` Matt Mackall
2003-11-14 19:08 ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-14 18:59   ` 2.6.0-test9-mm3 Andrew Morton
2003-11-14 19:32     ` 2.6.0-test9-mm3 Mike Fedyk
2003-11-14 20:27       ` 2.6.0-test9-mm3 John Stoffel
2003-11-15  1:01         ` 2.6.0-test9-mm3 Mike Fedyk
2003-11-14 19:10   ` 2.6.0-test9-mm3 Badari Pulavarty
2003-11-14 20:29     ` 2.6.0-test9-mm3 Martin J. Bligh
2003-11-17 20:58       ` 2.6.0-test9-mm3 bill davidsen
