LKML Archive on lore.kernel.org
* Strange DMA-errors and system hang with Promise 20268
@ 2004-03-06 19:47 Henrik Persson
2004-03-06 19:55 ` Henrik Persson
` (2 more replies)
0 siblings, 3 replies; 26+ messages in thread
From: Henrik Persson @ 2004-03-06 19:47 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 688 bytes --]
Hi.
For the last month or so I've experienced some strangeness with one of
my boxes. It is up and running without any problems and then suddenly I
get this in the syslog:
Mar 6 20:29:42 eurisco kernel: hdf: dma_timer_expiry: dma status == 0x61
(sometimes dma status has been 0x41)
And a few seconds later the system has frozen and I have to reset the
box to get it back up and running.
It isn't always the same hdX. If I remove the device that produces the
error, another device fails, but it's always a device on the Promise
controller.
I've seen this behaviour with 2.4.25, 2.4.24 and 2.4.23 (I think).
Any ideas?
--
Henrik Persson <nix@syndicalist.net>
[-- Attachment #2: This is a digitally signed message part --]
[-- Type: application/pgp-signature, Size: 189 bytes --]
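The status bytes quoted above (0x61, 0x41) can be read bit by bit. The following is a minimal sketch, assuming the value printed by `dma_timer_expiry` is the raw bus-master IDE status register and that the controller follows the standard SFF-8038i bit layout; the names `BMIDE_STATUS_BITS` and `decode_dma_status` are made up for this illustration.

```python
# Standard bus-master IDE status register bits (SFF-8038i layout).
# Assumption: the Promise controller reports status in this layout.
BMIDE_STATUS_BITS = {
    0x01: "DMA transfer active",
    0x02: "DMA error",
    0x04: "interrupt raised",
    0x20: "drive 0 DMA capable",
    0x40: "drive 1 DMA capable",
    0x80: "simplex only",
}


def decode_dma_status(status):
    """Return the names of the flags set in a bus-master status byte."""
    return [name for bit, name in sorted(BMIDE_STATUS_BITS.items())
            if status & bit]


if __name__ == "__main__":
    # Values reported in this thread.
    for value in (0x61, 0x41, 0x21):
        print(f"0x{value:02x}: {', '.join(decode_dma_status(value))}")
```

Under that assumption, every value reported in this thread has the "active" bit set and the error and interrupt bits clear, which would mean the DMA engine still considered the transfer to be in progress when the timer expired. That narrows the symptom but does not identify the cause.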
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-06 19:47 Strange DMA-errors and system hang with Promise 20268 Henrik Persson
@ 2004-03-06 19:55 ` Henrik Persson
2004-03-07 1:05 ` Mario 'BitKoenig' Holbe
[not found] ` <200405052339.i45NdXsx003369@darkside.22.kls.lan>
2 siblings, 0 replies; 26+ messages in thread
From: Henrik Persson @ 2004-03-06 19:55 UTC (permalink / raw)
To: linux-kernel
[-- Attachment #1: Type: text/plain, Size: 803 bytes --]
On Sat, 2004-03-06 at 20:47, Henrik Persson wrote:
> Hi.
>
> For the last month or so I've experienced some strangeness with one of
> my boxes. It is up and running without any problems and then suddenly I
> get this in the syslog:
>
> Mar 6 20:29:42 eurisco kernel: hdf: dma_timer_expiry: dma status == 0x61
>
> (sometimes dma status has been 0x41)
>
> And a few seconds later the system has frozen and I have to reset the
> box to get it back up and running.
>
> It isn't always the same hdX. If I remove the device that produces the
> error, another device fails, but it's always a device on the Promise
> controller.
>
> I've seen this behaviour with 2.4.25, 2.4.24 and 2.4.23 (I think).
>
> Any ideas?
And ah. .config, lspci and cpuinfo attached.
--
Henrik Persson <nix@syndicalist.net>
[-- Attachment #2: eurisco.config --]
[-- Type: text/plain, Size: 16507 bytes --]
#
# Automatically generated by make menuconfig: don't edit
#
CONFIG_X86=y
# CONFIG_SBUS is not set
CONFIG_UID16=y
#
# Code maturity level options
#
CONFIG_EXPERIMENTAL=y
#
# Loadable module support
#
CONFIG_MODULES=y
CONFIG_MODVERSIONS=y
CONFIG_KMOD=y
#
# Processor type and features
#
# CONFIG_M386 is not set
# CONFIG_M486 is not set
# CONFIG_M586 is not set
# CONFIG_M586TSC is not set
# CONFIG_M586MMX is not set
# CONFIG_M686 is not set
# CONFIG_MPENTIUMIII is not set
# CONFIG_MPENTIUM4 is not set
# CONFIG_MK6 is not set
CONFIG_MK7=y
# CONFIG_MK8 is not set
# CONFIG_MELAN is not set
# CONFIG_MCRUSOE is not set
# CONFIG_MWINCHIPC6 is not set
# CONFIG_MWINCHIP2 is not set
# CONFIG_MWINCHIP3D is not set
# CONFIG_MCYRIXIII is not set
# CONFIG_MVIAC3_2 is not set
CONFIG_X86_WP_WORKS_OK=y
CONFIG_X86_INVLPG=y
CONFIG_X86_CMPXCHG=y
CONFIG_X86_XADD=y
CONFIG_X86_BSWAP=y
CONFIG_X86_POPAD_OK=y
# CONFIG_RWSEM_GENERIC_SPINLOCK is not set
CONFIG_RWSEM_XCHGADD_ALGORITHM=y
CONFIG_X86_L1_CACHE_SHIFT=6
CONFIG_X86_HAS_TSC=y
CONFIG_X86_GOOD_APIC=y
CONFIG_X86_USE_3DNOW=y
CONFIG_X86_PGE=y
CONFIG_X86_USE_PPRO_CHECKSUM=y
CONFIG_X86_F00F_WORKS_OK=y
CONFIG_X86_MCE=y
# CONFIG_TOSHIBA is not set
# CONFIG_I8K is not set
# CONFIG_MICROCODE is not set
# CONFIG_X86_MSR is not set
# CONFIG_X86_CPUID is not set
# CONFIG_EDD is not set
CONFIG_NOHIGHMEM=y
# CONFIG_HIGHMEM4G is not set
# CONFIG_HIGHMEM64G is not set
# CONFIG_HIGHMEM is not set
# CONFIG_MATH_EMULATION is not set
CONFIG_MTRR=y
# CONFIG_SMP is not set
# CONFIG_X86_UP_APIC is not set
# CONFIG_X86_UP_IOAPIC is not set
# CONFIG_X86_TSC_DISABLE is not set
CONFIG_X86_TSC=y
#
# General setup
#
CONFIG_NET=y
CONFIG_PCI=y
# CONFIG_PCI_GOBIOS is not set
# CONFIG_PCI_GODIRECT is not set
CONFIG_PCI_GOANY=y
CONFIG_PCI_BIOS=y
CONFIG_PCI_DIRECT=y
# CONFIG_ISA is not set
CONFIG_PCI_NAMES=y
# CONFIG_EISA is not set
# CONFIG_MCA is not set
# CONFIG_HOTPLUG is not set
# CONFIG_PCMCIA is not set
# CONFIG_HOTPLUG_PCI is not set
CONFIG_SYSVIPC=y
# CONFIG_BSD_PROCESS_ACCT is not set
CONFIG_SYSCTL=y
CONFIG_KCORE_ELF=y
# CONFIG_KCORE_AOUT is not set
CONFIG_BINFMT_AOUT=m
CONFIG_BINFMT_ELF=y
CONFIG_BINFMT_MISC=m
# CONFIG_OOM_KILLER is not set
# CONFIG_PM is not set
# CONFIG_APM is not set
#
# ACPI Support
#
CONFIG_ACPI=y
CONFIG_ACPI_BOOT=y
CONFIG_ACPI_BUS=y
CONFIG_ACPI_INTERPRETER=y
CONFIG_ACPI_EC=y
CONFIG_ACPI_POWER=y
CONFIG_ACPI_PCI=y
CONFIG_ACPI_SLEEP=y
CONFIG_ACPI_SYSTEM=y
# CONFIG_ACPI_AC is not set
# CONFIG_ACPI_BATTERY is not set
CONFIG_ACPI_BUTTON=y
# CONFIG_ACPI_FAN is not set
# CONFIG_ACPI_PROCESSOR is not set
# CONFIG_ACPI_THERMAL is not set
# CONFIG_ACPI_ASUS is not set
# CONFIG_ACPI_TOSHIBA is not set
# CONFIG_ACPI_DEBUG is not set
# CONFIG_ACPI_RELAXED_AML is not set
#
# Memory Technology Devices (MTD)
#
# CONFIG_MTD is not set
#
# Parallel port support
#
# CONFIG_PARPORT is not set
#
# Plug and Play configuration
#
CONFIG_PNP=y
# CONFIG_ISAPNP is not set
#
# Block devices
#
CONFIG_BLK_DEV_FD=y
# CONFIG_BLK_DEV_XD is not set
# CONFIG_PARIDE is not set
# CONFIG_BLK_CPQ_DA is not set
# CONFIG_BLK_CPQ_CISS_DA is not set
# CONFIG_CISS_SCSI_TAPE is not set
# CONFIG_CISS_MONITOR_THREAD is not set
# CONFIG_BLK_DEV_DAC960 is not set
# CONFIG_BLK_DEV_UMEM is not set
CONFIG_BLK_DEV_LOOP=y
# CONFIG_BLK_DEV_NBD is not set
# CONFIG_BLK_DEV_RAM is not set
# CONFIG_BLK_DEV_INITRD is not set
# CONFIG_BLK_STATS is not set
#
# Multi-device support (RAID and LVM)
#
# CONFIG_MD is not set
# CONFIG_BLK_DEV_MD is not set
# CONFIG_MD_LINEAR is not set
# CONFIG_MD_RAID0 is not set
# CONFIG_MD_RAID1 is not set
# CONFIG_MD_RAID5 is not set
# CONFIG_MD_MULTIPATH is not set
# CONFIG_BLK_DEV_LVM is not set
#
# Networking options
#
CONFIG_PACKET=y
CONFIG_PACKET_MMAP=y
# CONFIG_NETLINK_DEV is not set
# CONFIG_NETFILTER is not set
# CONFIG_FILTER is not set
CONFIG_UNIX=y
CONFIG_INET=y
CONFIG_IP_MULTICAST=y
# CONFIG_IP_ADVANCED_ROUTER is not set
# CONFIG_IP_PNP is not set
# CONFIG_NET_IPIP is not set
# CONFIG_NET_IPGRE is not set
# CONFIG_IP_MROUTE is not set
# CONFIG_ARPD is not set
# CONFIG_INET_ECN is not set
# CONFIG_SYN_COOKIES is not set
# CONFIG_IPV6 is not set
# CONFIG_KHTTPD is not set
#
# SCTP Configuration (EXPERIMENTAL)
#
CONFIG_IPV6_SCTP__=y
# CONFIG_IP_SCTP is not set
# CONFIG_ATM is not set
# CONFIG_VLAN_8021Q is not set
# CONFIG_IPX is not set
# CONFIG_ATALK is not set
#
# Appletalk devices
#
# CONFIG_DEV_APPLETALK is not set
# CONFIG_DECNET is not set
# CONFIG_BRIDGE is not set
# CONFIG_X25 is not set
# CONFIG_LAPB is not set
# CONFIG_LLC is not set
# CONFIG_NET_DIVERT is not set
# CONFIG_ECONET is not set
# CONFIG_WAN_ROUTER is not set
# CONFIG_NET_FASTROUTE is not set
# CONFIG_NET_HW_FLOWCONTROL is not set
#
# QoS and/or fair queueing
#
# CONFIG_NET_SCHED is not set
#
# Network testing
#
# CONFIG_NET_PKTGEN is not set
#
# Telephony Support
#
# CONFIG_PHONE is not set
# CONFIG_PHONE_IXJ is not set
# CONFIG_PHONE_IXJ_PCMCIA is not set
#
# ATA/IDE/MFM/RLL support
#
CONFIG_IDE=y
#
# IDE, ATA and ATAPI Block devices
#
CONFIG_BLK_DEV_IDE=y
# CONFIG_BLK_DEV_HD_IDE is not set
# CONFIG_BLK_DEV_HD is not set
CONFIG_BLK_DEV_IDEDISK=y
CONFIG_IDEDISK_MULTI_MODE=y
# CONFIG_IDEDISK_STROKE is not set
# CONFIG_BLK_DEV_IDECS is not set
CONFIG_BLK_DEV_IDECD=y
# CONFIG_BLK_DEV_IDETAPE is not set
# CONFIG_BLK_DEV_IDEFLOPPY is not set
# CONFIG_BLK_DEV_IDESCSI is not set
# CONFIG_IDE_TASK_IOCTL is not set
# CONFIG_BLK_DEV_CMD640 is not set
# CONFIG_BLK_DEV_CMD640_ENHANCED is not set
# CONFIG_BLK_DEV_ISAPNP is not set
CONFIG_BLK_DEV_IDEPCI=y
# CONFIG_BLK_DEV_GENERIC is not set
CONFIG_IDEPCI_SHARE_IRQ=y
CONFIG_BLK_DEV_IDEDMA_PCI=y
# CONFIG_BLK_DEV_OFFBOARD is not set
# CONFIG_BLK_DEV_IDEDMA_FORCED is not set
CONFIG_IDEDMA_PCI_AUTO=y
# CONFIG_IDEDMA_ONLYDISK is not set
CONFIG_BLK_DEV_IDEDMA=y
# CONFIG_IDEDMA_PCI_WIP is not set
# CONFIG_BLK_DEV_ADMA100 is not set
# CONFIG_BLK_DEV_AEC62XX is not set
# CONFIG_BLK_DEV_ALI15X3 is not set
# CONFIG_WDC_ALI15X3 is not set
# CONFIG_BLK_DEV_AMD74XX is not set
# CONFIG_AMD74XX_OVERRIDE is not set
# CONFIG_BLK_DEV_CMD64X is not set
# CONFIG_BLK_DEV_TRIFLEX is not set
# CONFIG_BLK_DEV_CY82C693 is not set
# CONFIG_BLK_DEV_CS5530 is not set
# CONFIG_BLK_DEV_HPT34X is not set
# CONFIG_HPT34X_AUTODMA is not set
# CONFIG_BLK_DEV_HPT366 is not set
# CONFIG_BLK_DEV_PIIX is not set
# CONFIG_BLK_DEV_NS87415 is not set
# CONFIG_BLK_DEV_OPTI621 is not set
# CONFIG_BLK_DEV_PDC202XX_OLD is not set
# CONFIG_PDC202XX_BURST is not set
CONFIG_BLK_DEV_PDC202XX_NEW=y
# CONFIG_PDC202XX_FORCE is not set
# CONFIG_BLK_DEV_RZ1000 is not set
# CONFIG_BLK_DEV_SC1200 is not set
# CONFIG_BLK_DEV_SVWKS is not set
# CONFIG_BLK_DEV_SIIMAGE is not set
# CONFIG_BLK_DEV_SIS5513 is not set
# CONFIG_BLK_DEV_SLC90E66 is not set
# CONFIG_BLK_DEV_TRM290 is not set
CONFIG_BLK_DEV_VIA82CXXX=y
# CONFIG_IDE_CHIPSETS is not set
CONFIG_IDEDMA_AUTO=y
# CONFIG_IDEDMA_IVB is not set
# CONFIG_DMA_NONPCI is not set
CONFIG_BLK_DEV_PDC202XX=y
# CONFIG_BLK_DEV_ATARAID is not set
# CONFIG_BLK_DEV_ATARAID_PDC is not set
# CONFIG_BLK_DEV_ATARAID_HPT is not set
# CONFIG_BLK_DEV_ATARAID_SII is not set
#
# SCSI support
#
# CONFIG_SCSI is not set
#
# Fusion MPT device support
#
# CONFIG_FUSION is not set
# CONFIG_FUSION_BOOT is not set
# CONFIG_FUSION_ISENSE is not set
# CONFIG_FUSION_CTL is not set
# CONFIG_FUSION_LAN is not set
#
# IEEE 1394 (FireWire) support (EXPERIMENTAL)
#
# CONFIG_IEEE1394 is not set
#
# I2O device support
#
# CONFIG_I2O is not set
# CONFIG_I2O_PCI is not set
# CONFIG_I2O_BLOCK is not set
# CONFIG_I2O_LAN is not set
# CONFIG_I2O_SCSI is not set
# CONFIG_I2O_PROC is not set
#
# Network device support
#
CONFIG_NETDEVICES=y
#
# ARCnet devices
#
# CONFIG_ARCNET is not set
# CONFIG_DUMMY is not set
# CONFIG_BONDING is not set
# CONFIG_EQUALIZER is not set
# CONFIG_TUN is not set
CONFIG_ETHERTAP=y
#
# Ethernet (10 or 100Mbit)
#
CONFIG_NET_ETHERNET=y
# CONFIG_SUNLANCE is not set
# CONFIG_HAPPYMEAL is not set
# CONFIG_SUNBMAC is not set
# CONFIG_SUNQE is not set
# CONFIG_SUNGEM is not set
CONFIG_NET_VENDOR_3COM=y
# CONFIG_EL1 is not set
# CONFIG_EL2 is not set
# CONFIG_ELPLUS is not set
# CONFIG_EL16 is not set
# CONFIG_ELMC is not set
# CONFIG_ELMC_II is not set
CONFIG_VORTEX=y
# CONFIG_TYPHOON is not set
# CONFIG_LANCE is not set
# CONFIG_NET_VENDOR_SMC is not set
# CONFIG_NET_VENDOR_RACAL is not set
# CONFIG_HP100 is not set
# CONFIG_NET_ISA is not set
# CONFIG_NET_PCI is not set
# CONFIG_NET_POCKET is not set
#
# Ethernet (1000 Mbit)
#
# CONFIG_ACENIC is not set
# CONFIG_DL2K is not set
# CONFIG_E1000 is not set
# CONFIG_MYRI_SBUS is not set
# CONFIG_NS83820 is not set
# CONFIG_HAMACHI is not set
# CONFIG_YELLOWFIN is not set
# CONFIG_R8169 is not set
# CONFIG_SK98LIN is not set
# CONFIG_TIGON3 is not set
# CONFIG_FDDI is not set
# CONFIG_HIPPI is not set
# CONFIG_PLIP is not set
# CONFIG_PPP is not set
# CONFIG_SLIP is not set
#
# Wireless LAN (non-hamradio)
#
# CONFIG_NET_RADIO is not set
#
# Token Ring devices
#
# CONFIG_TR is not set
# CONFIG_NET_FC is not set
# CONFIG_RCPCI is not set
# CONFIG_SHAPER is not set
#
# Wan interfaces
#
# CONFIG_WAN is not set
#
# Amateur Radio support
#
# CONFIG_HAMRADIO is not set
#
# IrDA (infrared) support
#
# CONFIG_IRDA is not set
#
# ISDN subsystem
#
# CONFIG_ISDN is not set
#
# Input core support
#
# CONFIG_INPUT is not set
# CONFIG_INPUT_KEYBDEV is not set
# CONFIG_INPUT_MOUSEDEV is not set
# CONFIG_INPUT_JOYDEV is not set
# CONFIG_INPUT_EVDEV is not set
# CONFIG_INPUT_UINPUT is not set
#
# Character devices
#
CONFIG_VT=y
CONFIG_VT_CONSOLE=y
CONFIG_SERIAL=y
# CONFIG_SERIAL_CONSOLE is not set
# CONFIG_SERIAL_EXTENDED is not set
# CONFIG_SERIAL_NONSTANDARD is not set
CONFIG_UNIX98_PTYS=y
CONFIG_UNIX98_PTY_COUNT=256
#
# I2C support
#
# CONFIG_I2C is not set
#
# Mice
#
# CONFIG_BUSMOUSE is not set
CONFIG_MOUSE=y
CONFIG_PSMOUSE=y
# CONFIG_82C710_MOUSE is not set
# CONFIG_PC110_PAD is not set
# CONFIG_MK712_MOUSE is not set
#
# Joysticks
#
# CONFIG_INPUT_GAMEPORT is not set
# CONFIG_QIC02_TAPE is not set
# CONFIG_IPMI_HANDLER is not set
# CONFIG_IPMI_PANIC_EVENT is not set
# CONFIG_IPMI_DEVICE_INTERFACE is not set
# CONFIG_IPMI_KCS is not set
# CONFIG_IPMI_WATCHDOG is not set
#
# Watchdog Cards
#
# CONFIG_WATCHDOG is not set
# CONFIG_SCx200 is not set
# CONFIG_SCx200_GPIO is not set
# CONFIG_AMD_RNG is not set
# CONFIG_INTEL_RNG is not set
CONFIG_HW_RANDOM=y
# CONFIG_AMD_PM768 is not set
# CONFIG_NVRAM is not set
CONFIG_RTC=y
# CONFIG_DTLK is not set
# CONFIG_R3964 is not set
# CONFIG_APPLICOM is not set
# CONFIG_SONYPI is not set
#
# Ftape, the floppy tape device driver
#
# CONFIG_FTAPE is not set
CONFIG_AGP=y
# CONFIG_AGP_INTEL is not set
# CONFIG_AGP_I810 is not set
CONFIG_AGP_VIA=y
# CONFIG_AGP_AMD is not set
# CONFIG_AGP_AMD_K8 is not set
# CONFIG_AGP_SIS is not set
# CONFIG_AGP_ALI is not set
# CONFIG_AGP_SWORKS is not set
# CONFIG_AGP_NVIDIA is not set
# CONFIG_AGP_ATI is not set
#
# Direct Rendering Manager (XFree86 DRI support)
#
CONFIG_DRM=y
# CONFIG_DRM_OLD is not set
CONFIG_DRM_NEW=y
CONFIG_DRM_TDFX=y
# CONFIG_DRM_GAMMA is not set
# CONFIG_DRM_R128 is not set
# CONFIG_DRM_RADEON is not set
# CONFIG_DRM_I810 is not set
# CONFIG_DRM_I810_XFREE_41 is not set
# CONFIG_DRM_I830 is not set
# CONFIG_DRM_MGA is not set
# CONFIG_DRM_SIS is not set
# CONFIG_MWAVE is not set
# CONFIG_OBMOUSE is not set
#
# Multimedia devices
#
# CONFIG_VIDEO_DEV is not set
#
# File systems
#
# CONFIG_QUOTA is not set
# CONFIG_QFMT_V2 is not set
# CONFIG_AUTOFS_FS is not set
# CONFIG_AUTOFS4_FS is not set
# CONFIG_REISERFS_FS is not set
# CONFIG_REISERFS_CHECK is not set
# CONFIG_REISERFS_PROC_INFO is not set
# CONFIG_ADFS_FS is not set
# CONFIG_ADFS_FS_RW is not set
# CONFIG_AFFS_FS is not set
# CONFIG_HFS_FS is not set
# CONFIG_HFSPLUS_FS is not set
# CONFIG_BEFS_FS is not set
# CONFIG_BEFS_DEBUG is not set
# CONFIG_BFS_FS is not set
CONFIG_EXT3_FS=y
CONFIG_JBD=y
# CONFIG_JBD_DEBUG is not set
CONFIG_FAT_FS=m
CONFIG_MSDOS_FS=m
# CONFIG_UMSDOS_FS is not set
CONFIG_VFAT_FS=m
# CONFIG_EFS_FS is not set
# CONFIG_JFFS_FS is not set
# CONFIG_JFFS2_FS is not set
# CONFIG_CRAMFS is not set
CONFIG_TMPFS=y
CONFIG_RAMFS=y
CONFIG_ISO9660_FS=y
CONFIG_JOLIET=y
# CONFIG_ZISOFS is not set
# CONFIG_JFS_FS is not set
# CONFIG_JFS_DEBUG is not set
# CONFIG_JFS_STATISTICS is not set
# CONFIG_MINIX_FS is not set
# CONFIG_VXFS_FS is not set
# CONFIG_NTFS_FS is not set
# CONFIG_NTFS_RW is not set
# CONFIG_HPFS_FS is not set
CONFIG_PROC_FS=y
# CONFIG_DEVFS_FS is not set
# CONFIG_DEVFS_MOUNT is not set
# CONFIG_DEVFS_DEBUG is not set
CONFIG_DEVPTS_FS=y
# CONFIG_QNX4FS_FS is not set
# CONFIG_QNX4FS_RW is not set
# CONFIG_ROMFS_FS is not set
CONFIG_EXT2_FS=y
# CONFIG_SYSV_FS is not set
# CONFIG_UDF_FS is not set
# CONFIG_UDF_RW is not set
# CONFIG_UFS_FS is not set
# CONFIG_UFS_FS_WRITE is not set
CONFIG_XFS_FS=y
# CONFIG_XFS_QUOTA is not set
# CONFIG_XFS_RT is not set
# CONFIG_XFS_TRACE is not set
# CONFIG_XFS_DEBUG is not set
#
# Network File Systems
#
# CONFIG_CODA_FS is not set
# CONFIG_INTERMEZZO_FS is not set
CONFIG_NFS_FS=y
CONFIG_NFS_V3=y
# CONFIG_NFS_DIRECTIO is not set
# CONFIG_ROOT_NFS is not set
CONFIG_NFSD=y
CONFIG_NFSD_V3=y
CONFIG_NFSD_TCP=y
CONFIG_SUNRPC=y
CONFIG_LOCKD=y
CONFIG_LOCKD_V4=y
# CONFIG_SMB_FS is not set
# CONFIG_NCP_FS is not set
# CONFIG_NCPFS_PACKET_SIGNING is not set
# CONFIG_NCPFS_IOCTL_LOCKING is not set
# CONFIG_NCPFS_STRONG is not set
# CONFIG_NCPFS_NFS_NS is not set
# CONFIG_NCPFS_OS2_NS is not set
# CONFIG_NCPFS_SMALLDOS is not set
# CONFIG_NCPFS_NLS is not set
# CONFIG_NCPFS_EXTRAS is not set
# CONFIG_ZISOFS_FS is not set
#
# Partition Types
#
# CONFIG_PARTITION_ADVANCED is not set
CONFIG_MSDOS_PARTITION=y
# CONFIG_SMB_NLS is not set
CONFIG_NLS=y
#
# Native Language Support
#
CONFIG_NLS_DEFAULT="iso8859-1"
CONFIG_NLS_CODEPAGE_437=y
# CONFIG_NLS_CODEPAGE_737 is not set
# CONFIG_NLS_CODEPAGE_775 is not set
# CONFIG_NLS_CODEPAGE_850 is not set
# CONFIG_NLS_CODEPAGE_852 is not set
# CONFIG_NLS_CODEPAGE_855 is not set
# CONFIG_NLS_CODEPAGE_857 is not set
# CONFIG_NLS_CODEPAGE_860 is not set
# CONFIG_NLS_CODEPAGE_861 is not set
# CONFIG_NLS_CODEPAGE_862 is not set
# CONFIG_NLS_CODEPAGE_863 is not set
# CONFIG_NLS_CODEPAGE_864 is not set
# CONFIG_NLS_CODEPAGE_865 is not set
# CONFIG_NLS_CODEPAGE_866 is not set
# CONFIG_NLS_CODEPAGE_869 is not set
# CONFIG_NLS_CODEPAGE_936 is not set
# CONFIG_NLS_CODEPAGE_950 is not set
# CONFIG_NLS_CODEPAGE_932 is not set
# CONFIG_NLS_CODEPAGE_949 is not set
# CONFIG_NLS_CODEPAGE_874 is not set
# CONFIG_NLS_ISO8859_8 is not set
# CONFIG_NLS_CODEPAGE_1250 is not set
# CONFIG_NLS_CODEPAGE_1251 is not set
CONFIG_NLS_ISO8859_1=y
# CONFIG_NLS_ISO8859_2 is not set
# CONFIG_NLS_ISO8859_3 is not set
# CONFIG_NLS_ISO8859_4 is not set
# CONFIG_NLS_ISO8859_5 is not set
# CONFIG_NLS_ISO8859_6 is not set
# CONFIG_NLS_ISO8859_7 is not set
# CONFIG_NLS_ISO8859_9 is not set
# CONFIG_NLS_ISO8859_13 is not set
# CONFIG_NLS_ISO8859_14 is not set
# CONFIG_NLS_ISO8859_15 is not set
# CONFIG_NLS_KOI8_R is not set
# CONFIG_NLS_KOI8_U is not set
CONFIG_NLS_UTF8=y
#
# Console drivers
#
CONFIG_VGA_CONSOLE=y
CONFIG_VIDEO_SELECT=y
# CONFIG_MDA_CONSOLE is not set
#
# Frame-buffer support
#
# CONFIG_FB is not set
#
# Sound
#
CONFIG_SOUND=y
# CONFIG_SOUND_ALI5455 is not set
# CONFIG_SOUND_BT878 is not set
# CONFIG_SOUND_CMPCI is not set
# CONFIG_SOUND_EMU10K1 is not set
# CONFIG_MIDI_EMU10K1 is not set
# CONFIG_SOUND_FUSION is not set
# CONFIG_SOUND_CS4281 is not set
# CONFIG_SOUND_ES1370 is not set
# CONFIG_SOUND_ES1371 is not set
# CONFIG_SOUND_ESSSOLO1 is not set
# CONFIG_SOUND_MAESTRO is not set
# CONFIG_SOUND_MAESTRO3 is not set
# CONFIG_SOUND_FORTE is not set
# CONFIG_SOUND_ICH is not set
# CONFIG_SOUND_RME96XX is not set
# CONFIG_SOUND_SONICVIBES is not set
# CONFIG_SOUND_TRIDENT is not set
# CONFIG_SOUND_MSNDCLAS is not set
# CONFIG_SOUND_MSNDPIN is not set
CONFIG_SOUND_VIA82CXXX=y
# CONFIG_MIDI_VIA82CXXX is not set
# CONFIG_SOUND_OSS is not set
# CONFIG_SOUND_TVMIXER is not set
# CONFIG_SOUND_AD1980 is not set
# CONFIG_SOUND_WM97XX is not set
#
# USB support
#
# CONFIG_USB is not set
#
# Support for USB gadgets
#
# CONFIG_USB_GADGET is not set
#
# Bluetooth support
#
# CONFIG_BLUEZ is not set
#
# Kernel hacking
#
# CONFIG_DEBUG_KERNEL is not set
CONFIG_LOG_BUF_SHIFT=0
#
# Cryptographic options
#
# CONFIG_CRYPTO is not set
#
# Library routines
#
CONFIG_CRC32=y
# CONFIG_ZLIB_INFLATE is not set
# CONFIG_ZLIB_DEFLATE is not set
[-- Attachment #3: eurisco.cpuinfo --]
[-- Type: text/plain, Size: 404 bytes --]
processor : 0
vendor_id : AuthenticAMD
cpu family : 6
model : 4
model name : AMD Athlon(tm) Processor
stepping : 4
cpu MHz : 1394.454
cache size : 256 KB
fdiv_bug : no
hlt_bug : no
f00f_bug : no
coma_bug : no
fpu : yes
fpu_exception : yes
cpuid level : 1
wp : yes
flags : fpu vme de pse tsc msr pae mce cx8 sep mtrr pge mca cmov pat pse36 mmx fxsr syscall mmxext 3dnowext 3dnow
bogomips : 2778.72
[-- Attachment #4: eurisco.lspci --]
[-- Type: text/plain, Size: 5568 bytes --]
00:00.0 Host bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133] (rev 03)
Control: I/O- Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 8
Region 0: Memory at d0000000 (32-bit, prefetchable) [size=64M]
Capabilities: <available only to root>
00:01.0 PCI bridge: VIA Technologies, Inc. VT8363/8365 [KT133/KM133 AGP] (prog-if 00 [Normal decode])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort+ >SERR- <PERR-
Latency: 0
Bus: primary=00, secondary=01, subordinate=01, sec-latency=0
I/O behind bridge: 0000a000-0000afff
Memory behind bridge: d4000000-d7ffffff
Prefetchable memory behind bridge: d8000000-d9ffffff
BridgeCtl: Parity- SERR- NoISA+ VGA+ MAbort- >Reset- FastB2B-
Capabilities: <available only to root>
00:07.0 ISA bridge: VIA Technologies, Inc. VT82C686 [Apollo Super South] (rev 40)
Subsystem: VIA Technologies, Inc. VT82C686/A PCI to ISA Bridge
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping+ SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 0
Capabilities: <available only to root>
00:07.1 IDE interface: VIA Technologies, Inc. VT82C586A/B/VT82C686/A/B/VT8233/A/C/VT8235 PIPC Bus Master IDE (rev 06) (prog-if 8a [Master SecP PriP])
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32
Region 4: I/O ports at b000 [size=16]
Capabilities: <available only to root>
00:07.2 USB Controller: VIA Technologies, Inc. USB (rev 16) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, Cache Line Size: 0x08 (32 bytes)
Interrupt: pin D routed to IRQ 11
Region 4: I/O ports at b400 [size=32]
Capabilities: <available only to root>
00:07.3 USB Controller: VIA Technologies, Inc. USB (rev 16) (prog-if 00 [UHCI])
Subsystem: VIA Technologies, Inc. (Wrong ID) USB Controller
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32, Cache Line Size: 0x08 (32 bytes)
Interrupt: pin D routed to IRQ 11
Region 4: I/O ports at b800 [size=32]
Capabilities: <available only to root>
00:07.4 Host bridge: VIA Technologies, Inc. VT82C686 [Apollo Super ACPI] (rev 40)
Control: I/O- Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B+ ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin ? routed to IRQ 12
Capabilities: <available only to root>
00:07.5 Multimedia audio controller: VIA Technologies, Inc. VT82C686 AC97 Audio Controller (rev 50)
Subsystem: Micro-Star International Co., Ltd.: Unknown device 3300
Control: I/O+ Mem- BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Interrupt: pin C routed to IRQ 10
Region 0: I/O ports at bc00 [size=256]
Region 1: I/O ports at c000 [size=4]
Region 2: I/O ports at c400 [size=4]
Capabilities: <available only to root>
00:0d.0 Unknown mass storage controller: Promise Technology, Inc. 20268 (rev 02) (prog-if 85)
Subsystem: Promise Technology, Inc. Ultra100TX2
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B- ParErr- DEVSEL=slow >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (1000ns min, 4500ns max), Cache Line Size: 0x08 (32 bytes)
Interrupt: pin A routed to IRQ 11
Region 0: I/O ports at cc00 [size=8]
Region 1: I/O ports at d000 [size=4]
Region 2: I/O ports at d400 [size=8]
Region 3: I/O ports at d800 [size=4]
Region 4: I/O ports at dc00 [size=16]
Region 5: Memory at db000000 (32-bit, non-prefetchable) [size=16K]
Expansion ROM at <unassigned> [disabled] [size=16K]
Capabilities: <available only to root>
00:0e.0 Ethernet controller: 3Com Corporation 3c905 100BaseTX [Boomerang]
Control: I/O+ Mem+ BusMaster+ SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap- 66Mhz- UDF- FastB2B- ParErr- DEVSEL=medium >TAbort- <TAbort- <MAbort- >SERR- <PERR-
Latency: 32 (750ns min, 2000ns max)
Interrupt: pin A routed to IRQ 10
Region 0: I/O ports at e000 [size=64]
Expansion ROM at <unassigned> [disabled] [size=64K]
01:00.0 VGA compatible controller: 3Dfx Interactive, Inc. Voodoo 3 (rev 01) (prog-if 00 [VGA])
Subsystem: 3Dfx Interactive, Inc. Voodoo3 AGP
Control: I/O+ Mem+ BusMaster- SpecCycle- MemWINV- VGASnoop- ParErr- Stepping- SERR- FastB2B-
Status: Cap+ 66Mhz+ UDF- FastB2B+ ParErr- DEVSEL=fast >TAbort- <TAbort- <MAbort- >SERR- <PERR+
Interrupt: pin A routed to IRQ 11
Region 0: Memory at d4000000 (32-bit, non-prefetchable) [size=32M]
Region 1: Memory at d8000000 (32-bit, prefetchable) [size=32M]
Region 2: I/O ports at a000 [size=256]
Expansion ROM at <unassigned> [disabled] [size=64K]
Capabilities: <available only to root>
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-06 19:47 Strange DMA-errors and system hang with Promise 20268 Henrik Persson
2004-03-06 19:55 ` Henrik Persson
@ 2004-03-07 1:05 ` Mario 'BitKoenig' Holbe
2004-03-08 13:30 ` Henrik Persson
` (2 more replies)
[not found] ` <200405052339.i45NdXsx003369@darkside.22.kls.lan>
2 siblings, 3 replies; 26+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2004-03-07 1:05 UTC (permalink / raw)
To: linux-kernel
Henrik Persson <nix@syndicalist.net> wrote:
> boxes. It is up and running without any problems and then suddenly I
> get this in the syslog:
> Mar 6 20:29:42 eurisco kernel: hdf: dma_timer_expiry: dma status == 0x61
> And a few seconds later the system has frozen and I have to reset the
Same here:
Mar 4 01:01:06 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
Mar 5 01:02:00 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
Mar 6 01:10:22 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
Can you somehow correlate this with the start of S.M.A.R.T. self-tests?
I suspect it has something to do with the new "One last read after the
timeout" code in ide-iops.c in 2.4.25 and with accessing the drive
while a self-test is running (possibly short self-tests in particular).
Here, smartmontools runs SMART short self-tests daily at 01:00, and a
bit later the machine hangs.
Today I disabled that job, and the machine has stayed stable.
> error another device, but it's always a device on the Promise
> controller, fails.
Ditto... PDC20269 U133TX2
CONFIG_BLK_DEV_PDC202XX_NEW=y
And until now it has always been hde, which is connected to the Promise
controller.
> I've seen this behaviour with 2.4.25, 2.4.24 and 2.4.23 (I think).
My machine has been running at least since:
Jan 18 09:41:21 darkside kernel: Linux version 2.4.24
...
Feb 28 01:43:48 darkside kernel: Linux version 2.4.24
Feb 28 04:58:47 darkside kernel: Linux version 2.4.25
The first time the problem occurred was Mar 4 01:01:06.
The last smartmontools update was on Feb 14 at 03:04.
regards,
Mario
--
I heard, if you play a NT-CD backwards, you get satanic messages...
That's nothing. If you play it forwards, it installs NT.
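The correlation suggested in this message (timeouts appearing shortly after the daily 01:00 self-test job) can be checked mechanically. This is only a sketch: the regular expression assumes the syslog layout shown in the excerpts above, and `timeout_offsets` and `SAMPLE` are hypothetical names introduced for the example.

```python
import re
from datetime import datetime, timedelta

# Matches lines such as:
#   Mar 4 01:01:06 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
LINE_RE = re.compile(
    r"^\w{3}\s+\d+\s+(\d\d:\d\d:\d\d)\s+\S+\s+kernel:\s+(hd\w+):\s+dma_timer_expiry"
)


def timeout_offsets(lines, job_time="01:00:00"):
    """Return (device, seconds since the daily job started) per timeout line."""
    job = datetime.strptime(job_time, "%H:%M:%S")
    offsets = []
    for line in lines:
        match = LINE_RE.match(line)
        if match is None:
            continue  # not a dma_timer_expiry line
        stamp = datetime.strptime(match.group(1), "%H:%M:%S")
        # Wrap around midnight so the offset is always in [0, 24h).
        delta = (stamp - job) % timedelta(days=1)
        offsets.append((match.group(2), int(delta.total_seconds())))
    return offsets


SAMPLE = [
    "Mar 4 01:01:06 darkside kernel: hde: dma_timer_expiry: dma status == 0x21",
    "Mar 5 01:02:00 darkside kernel: hde: dma_timer_expiry: dma status == 0x21",
    "Mar 6 01:10:22 darkside kernel: hde: dma_timer_expiry: dma status == 0x21",
]

if __name__ == "__main__":
    for device, seconds in timeout_offsets(SAMPLE):
        print(f"{device}: {seconds} s after the self-test job")
```

Small, consistent offsets would support the self-test theory for that machine; offsets scattered across the day would argue against it.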
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-07 1:05 ` Mario 'BitKoenig' Holbe
@ 2004-03-08 13:30 ` Henrik Persson
2004-03-10 11:50 ` Bruce Allen
2004-05-19 17:20 ` Sebastian
2004-05-29 14:42 ` Strange DMA-errors and system hang with Promise 20268 Mario 'BitKoenig' Holbe
2 siblings, 1 reply; 26+ messages in thread
From: Henrik Persson @ 2004-03-08 13:30 UTC (permalink / raw)
To: Mario 'BitKoenig' Holbe; +Cc: linux-kernel
On Sun, 2004-03-07 at 02:05, Mario 'BitKoenig' Holbe wrote:
> Same here:
>
> Mar 4 01:01:06 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
> Mar 5 01:02:00 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
> Mar 6 01:10:22 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
>
> Can you somehow correlate this with the start of S.M.A.R.T. self-tests?
Nope. Until now I wasn't running anything of the sort. I ran a few
self-tests just now, though, and nothing happened.
> I suspect it has something to do with the new "One last read after the
> timeout" code in ide-iops.c in 2.4.25 and with accessing the drive
> while a self-test is running (possibly short self-tests in particular).
> Here, smartmontools runs SMART short self-tests daily at 01:00, and a
> bit later the machine hangs.
> Today I disabled that job, and the machine has stayed stable.
This happens every now and then. Sometimes it's once a week or once a
month, sometimes once per hour. I can't correlate this behaviour with
anything the box in question is doing (mysql, nfsd).
> > error another device, but it's always a device on the Promise
> > controller, fails.
>
> Ditto... PDC20269 U133TX2
> CONFIG_BLK_DEV_PDC202XX_NEW=y
>
> And until now it has always been hde, which is connected to the Promise
> controller.
>
> > I've seen this behaviour with 2.4.25, 2.4.24 and 2.4.23 (I think).
>
> My machine has been running at least since:
> Jan 18 09:41:21 darkside kernel: Linux version 2.4.24
> ...
> Feb 28 01:43:48 darkside kernel: Linux version 2.4.24
> Feb 28 04:58:47 darkside kernel: Linux version 2.4.25
>
> The first time the problem occurred was Mar 4 01:01:06.
I've had those problems for at least a month. ;/
I just have no clue what's wrong with the damn thing.
--
Henrik Persson <nix@syndicalist.net>
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-08 13:30 ` Henrik Persson
@ 2004-03-10 11:50 ` Bruce Allen
2004-03-10 12:36 ` Mario 'BitKoenig' Holbe
2004-03-10 15:41 ` Mario 'BitKoenig' Holbe
0 siblings, 2 replies; 26+ messages in thread
From: Bruce Allen @ 2004-03-10 11:50 UTC (permalink / raw)
To: Henrik Persson; +Cc: Mario 'BitKoenig' Holbe, linux-kernel
> > Mar 4 01:01:06 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
> > Mar 5 01:02:00 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
> > Mar 6 01:10:22 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
> >
> > Can you somehow correlate this with the start of S.M.A.R.T. self-tests?
>
> Nope. Until now I wasn't running anything of the sort. I ran a few
> self-tests just now, though, and nothing happened.
>
> > I suspect it has something to do with the new "One last read after the
> > timeout" code in ide-iops.c in 2.4.25 and with accessing the drive
> > while a self-test is running (possibly short self-tests in particular).
> > Here, smartmontools runs SMART short self-tests daily at 01:00, and a
> > bit later the machine hangs.
> > Today I disabled that job, and the machine has stayed stable.
Does the disk's SMART error log (smartctl -l error) show any entries
related to this problem? If so, please print them with the latest version
of smartmontools (5.30), which makes them more "human readable" than
previous versions.
Bruce
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-10 11:50 ` Bruce Allen
@ 2004-03-10 12:36 ` Mario 'BitKoenig' Holbe
2004-03-10 15:00 ` Henrik Persson
2004-03-10 15:41 ` Mario 'BitKoenig' Holbe
1 sibling, 1 reply; 26+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2004-03-10 12:36 UTC (permalink / raw)
To: Bruce Allen; +Cc: Henrik Persson, linux-kernel
On Wed, Mar 10, 2004 at 05:50:12AM -0600, Bruce Allen wrote:
> Does the disk's SMART error log (smartctl -l error) show any entries
> related to this problem? If so, please print them with the latest version
No, none at all. This was the first thing I looked at, because
I initially thought it was a disk problem.
regards,
Mario
--
"Why are we hiding from the police, daddy?" | J. E. Guenther
"Because we use SuSE son, they use SYSVR4." | de.alt.sysadmin.recovery
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-10 12:36 ` Mario 'BitKoenig' Holbe
@ 2004-03-10 15:00 ` Henrik Persson
2004-03-11 9:36 ` Bruce Allen
0 siblings, 1 reply; 26+ messages in thread
From: Henrik Persson @ 2004-03-10 15:00 UTC (permalink / raw)
To: Mario 'BitKoenig' Holbe; +Cc: Bruce Allen, linux-kernel
[-- Attachment #1: Type: text/plain, Size: 944 bytes --]
On Wed, 2004-03-10 at 13:36, Mario 'BitKoenig' Holbe wrote:
> On Wed, Mar 10, 2004 at 05:50:12AM -0600, Bruce Allen wrote:
> > Does the disk's SMART error log (smartctl -l error) show any entries
> > related to this problem? If so, please print them with the latest version
>
> No, none at all. This was the first thing I looked at, because
> I thought it was some disk problem.
Same here. Just one of the discs that has stopped during the last month
has any entries in the log at all. Those errors are attached.
The funny thing is that the machine stops responding after the
dma_timer_expiry.. Why doesn't the kernel (or the controller, for
that matter) just disable DMA? If the problem is related to DMA, that
would solve it, right? Sure, the speed (or lack of it) would be
painful, but I wouldn't need to sit 60km from home wondering why
my box just stopped responding. ;/
--
Henrik Persson <nix@syndicalist.net>
[-- Attachment #2: smarterrors --]
[-- Type: text/plain, Size: 3753 bytes --]
smartctl version 5.26 Copyright (C) 2002-3 Bruce Allen
Home page is http://smartmontools.sourceforge.net/
=== START OF READ SMART DATA SECTION ===
SMART Error Log Version: 1
ATA Error Count: 4
CR = Command Register [HEX]
FR = Features Register [HEX]
SC = Sector Count Register [HEX]
SN = Sector Number Register [HEX]
CL = Cylinder Low Register [HEX]
CH = Cylinder High Register [HEX]
DH = Device/Head Register [HEX]
DC = Device Command Register [HEX]
ER = Error register [HEX]
ST = Status register [HEX]
Timestamp = decimal seconds since the previous disk power-on.
Note: timestamp "wraps" after 2^32 msec = 49.710 days.
Error 4 occurred at disk power-on lifetime: 6619 hours
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 e0 Error: ICRC, ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 ff 01 00 00 00 e0 08 546.992 READ DMA
ef 03 45 20 77 a5 e0 08 546.992 SET FEATURES [Set transfer mode]
c6 ff 10 20 77 a5 e0 08 546.992 SET MULTIPLE MODE
10 ff 50 20 77 a5 e0 08 546.992 RECALIBRATE [OBS-4]
91 03 3f 20 77 a5 ef 08 546.992 INITIALIZE DEVICE PARAMETERS [OBS-6]
Error 3 occurred at disk power-on lifetime: 6619 hours
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 e0 Error: ICRC, ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 ff 01 00 00 00 e0 08 516.560 READ DMA
ef 03 45 c5 7b e3 e0 08 516.560 SET FEATURES [Set transfer mode]
c6 ff 10 c5 7b e3 e0 08 516.560 SET MULTIPLE MODE
10 ff 50 c5 7b e3 e0 08 516.544 RECALIBRATE [OBS-4]
91 03 3f c5 7b e3 ef 08 516.544 INITIALIZE DEVICE PARAMETERS [OBS-6]
Error 2 occurred at disk power-on lifetime: 6619 hours
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 e0 Error: ICRC, ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 ff 01 00 00 00 e0 08 501.328 READ DMA
ef 03 45 18 bb 65 e0 08 501.328 SET FEATURES [Set transfer mode]
c6 ff 10 18 bb 65 e0 08 501.328 SET MULTIPLE MODE
10 ff 50 18 bb 65 e0 08 501.312 RECALIBRATE [OBS-4]
91 03 3f 18 bb 65 ef 08 501.312 INITIALIZE DEVICE PARAMETERS [OBS-6]
Error 1 occurred at disk power-on lifetime: 6619 hours
When the command that caused the error occurred, the device was in an unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 e0 Error: ICRC, ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 ff 01 00 00 00 e0 08 420.528 READ DMA
ef 03 45 73 3d 65 e0 08 412.896 SET FEATURES [Set transfer mode]
c6 ff 10 73 3d 65 e0 08 412.896 SET MULTIPLE MODE
10 ff 50 73 3d 65 e0 08 412.896 RECALIBRATE [OBS-4]
91 03 3f 73 3d 65 ef 08 412.896 INITIALIZE DEVICE PARAMETERS [OBS-6]
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-10 11:50 ` Bruce Allen
2004-03-10 12:36 ` Mario 'BitKoenig' Holbe
@ 2004-03-10 15:41 ` Mario 'BitKoenig' Holbe
2004-03-11 9:25 ` Bruce Allen
1 sibling, 1 reply; 26+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2004-03-10 15:41 UTC (permalink / raw)
To: Bruce Allen; +Cc: Henrik Persson, linux-kernel
On Wed, Mar 10, 2004 at 05:50:12AM -0600, Bruce Allen wrote:
> > > I suspect it having something to do with 2.4.25 new "One last
> > > read after the timeout" in ide-iops.c and accessing the drive
> > > while selftest running (possibly especially short selftest).
> Does the disk's SMART error log (smartctl -l error) show any entries
Just in addition, to point this out more clearly:
I personally don't suspect smartmontools of having some
problem.
I have run Debian's smartmontools package for a long time
and it has done the selftests for a long time as well. I never
had problems with it; it wasn't updated close to the first
occurrence of the problem or changed in any other way.
I have 4 disks, 2 on the onboard VIA controller, 2 on
the Promise. The problem always occurred on a Promise
disk (as Henrik pointed out too).
I rather suspect some kernel ide <-> promise-driver timing
problem. Maybe smartmontools makes it more likely that
this timing problem occurs, maybe not (with Henrik's
answer to my question I rather favor the 'maybe not'),
maybe it's even just some load issue making the problem
occur.
Mario
--
<jv> Oh well, config
<jv> one actually wonders what force in the universe is holding it
<jv> and makes it working
<Beeth> chances and accidents :)
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-10 15:41 ` Mario 'BitKoenig' Holbe
@ 2004-03-11 9:25 ` Bruce Allen
0 siblings, 0 replies; 26+ messages in thread
From: Bruce Allen @ 2004-03-11 9:25 UTC (permalink / raw)
To: Mario 'BitKoenig' Holbe; +Cc: Henrik Persson, linux-kernel
> On Wed, Mar 10, 2004 at 05:50:12AM -0600, Bruce Allen wrote:
> > > > I suspect it having something to do with 2.4.25 new "One last
> > > > read after the timeout" in ide-iops.c and accessing the drive
> > > > while selftest running (possibly especially short selftest).
> > Does the disk's SMART error log (smartctl -l error) show any entries
>
> Just in addition, to point this out more clearly:
> I personally don't suspect smartmontools of having some
> problem.
> I have run Debian's smartmontools package for a long time
> and it has done the selftests for a long time as well. I never
> had problems with it; it wasn't updated close to the first
> occurrence of the problem or changed in any other way.
> I have 4 disks, 2 on the onboard VIA controller, 2 on
> the Promise. The problem always occurred on a Promise
> disk (as Henrik pointed out too).
> I rather suspect some kernel ide <-> promise-driver timing
> problem. Maybe smartmontools makes it more likely that
> this timing problem occurs, maybe not (with Henrik's
> answer to my question I rather favor the 'maybe not'),
> maybe it's even just some load issue making the problem
> occur.
OK, thanks for the reassurance. There have been some warnings about
promise 20262 and 20265 controllers interacting badly with smartmontools
(locking up systems). See
http://cvs.sourceforge.net/viewcvs.py/smartmontools/sm5/WARNINGS?view=markup
Perhaps this is in some way related.
Cheers,
Bruce
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-10 15:00 ` Henrik Persson
@ 2004-03-11 9:36 ` Bruce Allen
2004-03-11 14:31 ` Henrik Persson
0 siblings, 1 reply; 26+ messages in thread
From: Bruce Allen @ 2004-03-11 9:36 UTC (permalink / raw)
To: Henrik Persson; +Cc: Mario 'BitKoenig' Holbe, Bruce Allen, linux-kernel
> On Wed, 2004-03-10 at 13:36, Mario 'BitKoenig' Holbe wrote:
> > On Wed, Mar 10, 2004 at 05:50:12AM -0600, Bruce Allen wrote:
> > > Does the disk's SMART error log (smartctl -l error) show any entries
> > > related to this problem? If so, please print them with the latest version
> >
> > No, none at all. This was the first thing I looked at, because
> > I thought it was some disk problem.
>
> Same here. Just one of the discs that has stopped during the last
> month has any entries in the log at all. Those errors are attached.
FWIW, these four errors:
Error 4 occurred at disk power-on lifetime: 6619 hours
When the command that caused the error occurred, the device was in an
unknown state.
After command completion occurred, registers were:
ER ST SC SN CL CH DH
-- -- -- -- -- -- --
84 51 00 00 00 00 e0 Error: ICRC, ABRT
Commands leading to the command that caused the error were:
CR FR SC SN CL CH DH DC Timestamp Command/Feature_Name
-- -- -- -- -- -- -- -- --------- --------------------
c8 ff 01 00 00 00 e0 08 546.992 READ DMA
ef 03 45 20 77 a5 e0 08 546.992 SET FEATURES [Set transfer mode]
c6 ff 10 20 77 a5 e0 08 546.992 SET MULTIPLE MODE
10 ff 50 20 77 a5 e0 08 546.992 RECALIBRATE [OBS-4]
91 03 3f 20 77 a5 ef 08 546.992 INITIALIZE DEVICE PARAMETERS [OBS-6]
are all 'conventional' DMA errors, in which there was a CRC error in the
hardware interface between the disk and the controller. Either the cable
connections to this disk were briefly problematic or there was electrical
noise on the lines. Probably not anything to worry about.
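For anyone who wants to read the ER and ST columns directly, here is a minimal sketch of the decoding. It assumes the usual ATA bit layout for the Error and Status registers as I recall it; the table and function names are made up for illustration and are not taken from smartmontools.

```python
# Assumed bit names for the ATA Error register (ICRC applies to Ultra DMA transfers).
ERROR_BITS = {
    0x80: "ICRC",
    0x40: "UNC",
    0x20: "MC",
    0x10: "IDNF",
    0x08: "MCR",
    0x04: "ABRT",
    0x02: "NM",
    0x01: "AMNF",
}

# Assumed bit names for the ATA Status register.
STATUS_BITS = {
    0x80: "BSY",
    0x40: "DRDY",
    0x20: "DF",
    0x10: "DSC",
    0x08: "DRQ",
    0x04: "CORR",
    0x02: "IDX",
    0x01: "ERR",
}


def decode(value, names):
    """Return the names of all bits set in value, highest bit first."""
    return [name for mask, name in sorted(names.items(), reverse=True)
            if value & mask]


print(decode(0x84, ERROR_BITS))   # ER value from the log above
print(decode(0x51, STATUS_BITS))  # ST value from the log above
```

Under these assumptions ER = 84 comes out as ICRC plus ABRT, which matches the "Error: ICRC, ABRT" text smartctl prints, and ST = 51 is a ready drive with the error bit set.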
> The funny thing is that the machine stops responding after the
> dma_timer_expiry.. Why doesn't the kernel (or the controller, for
> that matter) just disable DMA? If the problem is related to DMA, that
> would solve it, right? Sure, the speed (or lack of it) would be
> painful, but I wouldn't need to sit 60km from home wondering why
> my box just stopped responding. ;/
FWIW, there have been reports of problems (system lockup) with
smartmontools on systems with Promise 20262 and 20265 controllers. See:
http://cvs.sourceforge.net/viewcvs.py/smartmontools/sm5/WARNINGS?sortby=date&view=markup
So I guess I will need to add the 20268 controller to this list, although
as Mario says, smartmontools may play only an indirect role, in exposing
an existing problem.
Cheers,
Bruce
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-11 9:36 ` Bruce Allen
@ 2004-03-11 14:31 ` Henrik Persson
0 siblings, 0 replies; 26+ messages in thread
From: Henrik Persson @ 2004-03-11 14:31 UTC (permalink / raw)
To: Bruce Allen; +Cc: Mario 'BitKoenig' Holbe, linux-kernel
On Thu, 2004-03-11 at 10:36, Bruce Allen wrote:
*snip*
> FWIW, there have been reports of problems (system lockup) with
> smartmontools on systems with Promise 20262 and 20265 controllers. See:
> http://cvs.sourceforge.net/viewcvs.py/smartmontools/sm5/WARNINGS?sortby=date&view=markup
> So I guess I will need to add the 20268 controller to this list, although
> as Mario says, smartmontools may play only an indirect role, in exposing
> an existing problem.
Well. I guess it's an existing problem, cause I didn't even have
smartmontools installed until Mario brought it up. ;)
--
Henrik Persson <nix@syndicalist.net>
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
[not found] ` <1083849053.6994.10.camel@vega>
@ 2004-05-06 14:22 ` Mario 'BitKoenig' Holbe
0 siblings, 0 replies; 26+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2004-05-06 14:22 UTC (permalink / raw)
To: Henrik Persson; +Cc: linux-kernel
On Thu, May 06, 2004 at 03:10:54PM +0200, Henrik Persson wrote:
> On Thu, 2004-05-06 at 01:39, Mario 'BitKoenig' Holbe wrote:
> > just to verify things: what is your io_32bit set to?
> io_32bit 0 0 3 rw
Well, it was worth a shot :) However, I had io_32bit=1 but I
also had a freeze even after setting it back to 0.
> And well. I don't have any problems nowadays. Not a freeze since I sent
> that mail. Strange indeed.
I found a few things. 1st: it happens when the disk is
loaded. The best way to freeze the system here is a find over
a big subtree (around 60,000 files in lots of
directories, ext2), better several of them in parallel,
and even better some find -print0 | xargs -0 cat > /dev/null.
But even then it doesn't happen every time; the chances
just go up.
2nd: it seems the disk spins down shortly before the error
appears. At least every time I was in front of the
machine when the problem appeared, I heard the disk spinning
down, just as if it had gone to standby mode.
But it was definitely loaded before (I hear the head seeks
and then the spindown). And no, I don't have any automated
spindown like noflushd or something like that.
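The load pattern from the first point (reading every file below a big subtree, several readers in parallel) could be approximated roughly as follows. This is only a sketch: the function names and the default worker count are made up for illustration, and it is not the exact find/xargs pipeline, just the same kind of read load.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def read_file(path, chunk_size=1 << 20):
    """Read one file to the end, discard the data, return the bytes read."""
    total = 0
    try:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
    except OSError:
        pass  # unreadable files are skipped; whatever was read still counts
    return total


def read_tree(root, workers=4):
    """Read every regular file below root with several threads in parallel."""
    paths = []
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            if os.path.isfile(path):  # skips FIFOs, sockets and device nodes
                paths.append(path)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(read_file, paths))
```

Calling read_tree() on a large directory, possibly from several processes at once, gives a comparable load. Note that files already in the page cache will not touch the disk, so a second run over the same tree is much gentler than the first.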
CC:ing to lkml again. Maybe this helps somehow.
regards,
Mario
--
[mod_nessus for iauth]
<delta> "scanning your system...found depreciated OS...found
hole...installing new OS...please reboot and reconnect now"
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-07 1:05 ` Mario 'BitKoenig' Holbe
2004-03-08 13:30 ` Henrik Persson
@ 2004-05-19 17:20 ` Sebastian
2004-05-19 17:28 ` Mario 'BitKoenig' Holbe
2004-05-29 14:42 ` Strange DMA-errors and system hang with Promise 20268 Mario 'BitKoenig' Holbe
2 siblings, 1 reply; 26+ messages in thread
From: Sebastian @ 2004-05-19 17:20 UTC (permalink / raw)
To: Mario 'BitKoenig' Holbe; +Cc: linux-kernel
On Sun, 2004-03-07 at 02:05, Mario 'BitKoenig' Holbe wrote:
> Mar 6 01:10:22 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
>
> Can you somehow correlate this to start of S.M.A.R.T selftests?
>
> I suspect it having something to do with 2.4.25 new "One last
> read after the timeout" in ide-iops.c and accessing the drive
> while selftest running (possibly especially short selftest).
> Here, daily at 01:00 smartmontools runs smart short selftests
> and a bit later the machine hangs.
> Today, I disabled that job and the machine stays stable.
Same thing here. The machine crashes on weekends shortly after 01:00
after I had upgraded to 2.4.25 and 2.4.26. I disabled smart as you
suggested, but too recently to be sure that it was the cause. It could
just be related to additional load caused by cron jobs?
Any confirmed solutions yet?
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-05-19 17:20 ` Sebastian
@ 2004-05-19 17:28 ` Mario 'BitKoenig' Holbe
2004-05-19 18:12 ` Sebastian
0 siblings, 1 reply; 26+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2004-05-19 17:28 UTC (permalink / raw)
To: Sebastian; +Cc: linux-kernel
On Wed, May 19, 2004 at 07:20:58PM +0200, Sebastian wrote:
> Same thing here. The machine crashes on weekends shortly after 01:00
Are WDC drives involved by any chance (i.e. do you have any
WDC drive connected to your Promise controller(s))?
> suggested, but too recently to be sure that it was the cause. It could
> just be related to additional load caused by cron jobs?
Yes.
> Any confirmed solutions yet?
Depends :) Not really.
Currently, I suspect it to be some Promise<->WDC issue,
thus it depends on your answer to my question :)
regards,
Mario
--
By the way... whoever dies earlier is dead longer!
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-05-19 17:28 ` Mario 'BitKoenig' Holbe
@ 2004-05-19 18:12 ` Sebastian
[not found] ` <1648.128.150.143.219.1084992082.squirrel@webmail.seven4sky.com>
2004-05-20 9:23 ` Strange DMA-errors and system hang with Promise 20268 Bruce Allen
0 siblings, 2 replies; 26+ messages in thread
From: Sebastian @ 2004-05-19 18:12 UTC (permalink / raw)
To: Mario 'BitKoenig' Holbe; +Cc: linux-kernel
On Wed, 2004-05-19 at 19:28, Mario 'BitKoenig' Holbe wrote:
> Are WDC drives involved by any chance (i.e. do you have any
> WDC drive connected to your Promise controller(s))?
Actually, it is not a promise controller.
Setup:
00:1f.1 IDE interface: Intel Corp. 82820 820 (Camino 2) Chipset IDE U100
(rev 02)
Device Model: IC35L040AVER07-0
Symptoms are the same, though.
> > suggested, but too recently to be sure that it was the cause. It could
> > just be related to additional load caused by cron jobs?
>
> Yes.
>
> > Any confirmed solutions yet?
>
> Depends :) Not really.
> Currently, I suspect it to be some Promise<->WDC issue,
> thus it depends on your answer to my question :)
Hhmm. I have not changed anything major on that machine except the
Kernel for years. Only after upgrading from 2.4.23 to 2.4.25, I got
these problems.
If there is no problem with the kernel, I have to assume a hardware
failure of some kind. Badblocks/smartlog reveal no errors.
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors... (was: ...and system hang with Promise 20268)
[not found] ` <1648.128.150.143.219.1084992082.squirrel@webmail.seven4sky.com>
@ 2004-05-19 20:12 ` Sebastian
2004-05-19 23:47 ` Mario 'BitKoenig' Holbe
0 siblings, 1 reply; 26+ messages in thread
From: Sebastian @ 2004-05-19 20:12 UTC (permalink / raw)
To: samg; +Cc: Mario 'BitKoenig' Holbe, linux-kernel
Hi Sam,
On Wed, 2004-05-19 at 20:41, Sam Gill wrote:
> Have you tried changing the dma settings
>
> turn off dma
yes, it probably will help as DMA is currently on, however, that is not
an option as I need acceptable hard drive performance. As I said, the
system worked nicely until that kernel update. If no hardware failure is
causing this, it must be a change in the kernel. I will replace the hard
disk and see if the problem disappears.
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors... (was: ...and system hang with Promise 20268)
2004-05-19 20:12 ` Strange DMA-errors... (was: ...and system hang with Promise 20268) Sebastian
@ 2004-05-19 23:47 ` Mario 'BitKoenig' Holbe
0 siblings, 0 replies; 26+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2004-05-19 23:47 UTC (permalink / raw)
To: Sebastian; +Cc: linux-kernel
On Wed, May 19, 2004 at 10:12:57PM +0200, Sebastian wrote:
> system worked nicely until that kernel update. If no hardware failure is
> causing this, it must be a change in the kernel. I will replace the hard
> disk and see if the problem disappears.
Wouldn't it be more reasonable (and easier too) to
first undo the kernel upgrade, i.e. downgrade back to
your old kernel and see what happens? :)
just my 2 cents,
Mario
--
<snupidity> bjmg: yes, logic is my specialty. it's in the gene
<uepsie> in which one?
<snupidity> in the second X
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-05-19 18:12 ` Sebastian
[not found] ` <1648.128.150.143.219.1084992082.squirrel@webmail.seven4sky.com>
@ 2004-05-20 9:23 ` Bruce Allen
2004-05-20 10:35 ` Strange DMA-errors and system hang with SMART (was: ...and system hang with Promise 20268) Sebastian
1 sibling, 1 reply; 26+ messages in thread
From: Bruce Allen @ 2004-05-20 9:23 UTC (permalink / raw)
To: Sebastian; +Cc: Mario 'BitKoenig' Holbe, linux-kernel
> Hhmm. I have not changed anything major on that machine except the
> Kernel for years. Only after upgrading from 2.4.23 to 2.4.25, I got
> these problems.
> If there is no problem with the kernel, I have to assume a hardware
> failure of some kind. Badblocks/smartlog reveal no errors.
Sebastian, does the disk's SMART error log (smartctl -l error) give any
indication of what's wrong?
Bruce
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with SMART (was: ...and system hang with Promise 20268)
2004-05-20 9:23 ` Strange DMA-errors and system hang with Promise 20268 Bruce Allen
@ 2004-05-20 10:35 ` Sebastian
2004-05-23 12:46 ` Bruce Allen
0 siblings, 1 reply; 26+ messages in thread
From: Sebastian @ 2004-05-20 10:35 UTC (permalink / raw)
To: Bruce Allen; +Cc: Mario 'BitKoenig' Holbe, linux-kernel
On Thu, 2004-05-20 at 11:23, Bruce Allen wrote:
> Sebastian, does the disk's SMART error log (smartctl -l error) give any
> indication of what's wrong?
Hi Bruce,
no, no errors were logged. However, and that was the reason why I
initially saw a connection to the old thread from March 2004 that talked
about SMART tests correlated with these DMA errors:
SMART Self-test log
# 1 Short offline Interrupted (host reset) 10%
#10 Short offline Interrupted (host reset) 10%
#15 Short offline Interrupted (host reset) 10%
Thus, short offline self-tests were running at the times where the DMA
errors and system hangs occurred.
Further, there seems to be a known problem with SMART related to the
hard drive that I am using:
Device Model: IC35L040AVER07-0
However, I had been running SMART self-tests without any problems until
that kernel upgrade.
I already turned off SMART self-tests - no crashes so far, but too early
to be sure. Maybe it is just a hard drive problem that did not show up
before. Since I had planned to replace that disk drive anyhow, I will do
that sooner.
Thanks,
Sebastian
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with SMART (was: ...and system hang with Promise 20268)
2004-05-20 10:35 ` Strange DMA-errors and system hang with SMART (was: ...and system hang with Promise 20268) Sebastian
@ 2004-05-23 12:46 ` Bruce Allen
2004-06-02 19:00 ` Sebastian
0 siblings, 1 reply; 26+ messages in thread
From: Bruce Allen @ 2004-05-23 12:46 UTC (permalink / raw)
To: Sebastian; +Cc: Mario 'BitKoenig' Holbe, Linux Kernel Mailing List
Hi Sebastian,
Sorry it's taken me so long to reply. My usual googling of smartmontools
didn't turn this up because you changed the subject line and started a new
thread. You wrote:
> Further, there seems to be a known problem with SMART related to the
> hard drive that I am using:
> Device Model: IC35L040AVER07-0
I hadn't realized until now that the drive is an IBM GXP60.
smartctl is *supposed* to print a warning message for these drives, to
tell users to look at http://www.geocities.com/dtla_update/index.html#rel
for pointers to updated firmware for this drive! What firmware version do
you have?
If you do smartctl -P showall, you'll see that there is the following
entry -- but the regular expression doesn't match your drive because of
the '-0' and '-1' suffix (which usually indicates RAM cache size of the
disk drive). I'll do a bit of research and then probably modify the
smartmontools regular expression to be sure to recognize the drive.
MODEL REGEXP: IC35L0[12346]0AVER07
FIRMWARE REGEXP: .*
ATTRIBUTE OPTIONS: None preset; no -v options are required.
WARNINGS: IBM Deskstar 60GXP drives may need upgraded SMART
firmware. Please see http://www.geocities.com/dtla_update/index.html#rel
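To make the mismatch concrete, here is a minimal sketch. It assumes the drive database pattern has to match the complete model string (which is what the behaviour described above suggests); the helper name and the relaxed pattern are hypothetical and are not the actual smartmontools change.

```python
import re

MODEL_REGEXP = r"IC35L0[12346]0AVER07"  # pattern from the database entry above
MODEL = "IC35L040AVER07-0"              # model string reported by the drive

# Hypothetical relaxed pattern that also accepts a '-0' or '-1' suffix.
RELAXED_REGEXP = r"IC35L0[12346]0AVER07(-[01])?"


def whole_string_match(pattern, model):
    """Assumption: a database entry applies only if it matches the whole model."""
    return re.fullmatch(pattern, model) is not None


print(whole_string_match(MODEL_REGEXP, MODEL))    # suffix is not covered
print(whole_string_match(RELAXED_REGEXP, MODEL))  # suffix is optional here
```

With whole-string matching the existing pattern rejects the model because of the trailing '-0', while the relaxed pattern accepts it with or without the suffix.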
Meanwhile, what firmware version do you have? I suggest you upgrade it --
this may fix the problem. The final firmware with the SMART fixes seems
to be A46A.
Cheers,
Bruce
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-03-07 1:05 ` Mario 'BitKoenig' Holbe
2004-03-08 13:30 ` Henrik Persson
2004-05-19 17:20 ` Sebastian
@ 2004-05-29 14:42 ` Mario 'BitKoenig' Holbe
2004-05-29 22:51 ` Gene Heskett
` (2 more replies)
2 siblings, 3 replies; 26+ messages in thread
From: Mario 'BitKoenig' Holbe @ 2004-05-29 14:42 UTC (permalink / raw)
To: linux-kernel; +Cc: Henrik Persson, Sebastian, Bruce Allen
Hi,
On Sun, Mar 07, 2004 at 02:05:46AM +0100, Mario 'BitKoenig' Holbe wrote:
> Mar 4 01:01:06 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
hmmm, it seems I solved the issue in my case.
I just connected the two disks on the Promise to a second, separate
power supply and everything works rock-stable and survives all the things
that went wrong before (parallel fsck, parallel hdparm -t as well as
'normal operation' over days).
Things remain stable even with the WDC disk connected back to the
Promise, which was heavily unstable before.
Moreover, when I connect the disks back to the machine's internal power
supply, the problems arise again - immediately.
IMHO, this also explains why the problems happen during heavy I/O
(parallel over all disks) and/or while S.M.A.R.T selftests are running
(also: parallel over all disks).
Well, I thought a 300W power supply would be enough for a 1GHz P-III
and 4 disks. And it definitely was enough for a long time.
Most likely, the power supply just aged and its capacity decreased.
Perhaps the disks consume more power with newer kernels, but while
crawling the lkml archives I also found similar reports for 2.4.19.
I have no idea why it was always the disks connected to the Promise
controller that failed. Perhaps the Promise is more sensitive regarding
signal quality on the IDE wire.
I have no idea why my WDC disk failed when connected to the Promise,
while others worked far more stably. Perhaps the WDC disk's signal
quality under low power is worse than that of other disks.
I have no idea why the WDC disk worked well when it was connected
to the onboard controller. Perhaps the lower signal quality of the
WDC (if so) and the sensitivity of the Promise (if so) added together
were just too much.
Henrik, Sebastian: if you still have problems, this is probably
something for you to test.
Sebastian: if you experience more instability after exchanging your
drive for a newer (bigger? faster?) one, chances are high that
a bigger power supply will solve it, I'd guess :)
Bruce: this is probably something worth a hint on the smartmontools
warning page.
regards,
Mario
--
It is practically impossible to teach good programming style to students
that have had prior exposure to BASIC: as potential programmers they are
mentally mutilated beyond hope of regeneration. -- Dijkstra
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-05-29 14:42 ` Strange DMA-errors and system hang with Promise 20268 Mario 'BitKoenig' Holbe
@ 2004-05-29 22:51 ` Gene Heskett
2004-05-30 10:41 ` Henrik Persson
2004-06-04 20:09 ` Bruce Allen
2 siblings, 0 replies; 26+ messages in thread
From: Gene Heskett @ 2004-05-29 22:51 UTC (permalink / raw)
To: Mario 'BitKoenig' Holbe, linux-kernel, Henrik Persson,
Sebastian, Bruce Allen
On Saturday 29 May 2004 10:42, Mario 'BitKoenig' Holbe wrote:
>Hi,
>
>On Sun, Mar 07, 2004 at 02:05:46AM +0100, Mario 'BitKoenig' Holbe wrote:
>> Mar 4 01:01:06 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
>
>hmmm, it seems I solved the issue in my case.
>
>I did just connect the two disks on the Promise to a second separate
>power supply and everything works rock-stable and survives all
> things that went wrong before (parallel fsck, parallel hdparm -t as
> well as 'normal operation' over days).
>Things remain stable even with the WDC disk connected back to the
>Promise, which was heavy unstable before.
>
>Moreover, when I connect the disks back to the machines internal
> power supply, the problems arise again - immediately.
>
>IMHO, this does also explain, why the problems happen while heavy
> I/O (parallel over all disks) and/or while S.M.A.R.T selftests are
> running (also: parallel over all disks).
>
>Well, I thought, a 300W power supply would be enough for 1GHz P-III
>and 4 disks. And it definitely was enough for a long time.
It probably was, back when that 1GHz P-III was new.
However, PSUs tend to suck up and trap all the dust in the universe
(I mean there can't be that much dust in *my* house, can there? :)
and can get ungodly hot, thereby cooking the goodies out of the
electrolytic caps, which have failure temps ranging from 85C to 105C.
This is one of the reasons I try to pull all my gear apart and give it
a good blasting with the air hose from my junky old air compressor at
least annually.
Anyway, to make a long story shorter, either have a technician equipped
with an ESR meter look over all the caps in the PSU and
replace any that aren't up to snuff, or replace the supply with a
fresh one. But finding a tech with an ESR meter could be quite a
chore, so I'd go PSU shopping at Circuit City or wherever, since the
tech will probably want more in labor than a fresh new one costs.
>Most likely, the power supply just aged and its capacity decreased.
>
>Perhaps, the disks consume more power with newer kernels, but while
>crawling the lkm archives I found similar reports also for 2.4.19.
>
>I have no idea, why always the disks connected to the Promise
>controller did fail. Perhaps, the Promise is more sensitive
> regarding signal quality on the IDE wire.
>
>I have no idea, why my WDC disk failed when connected to the
> Promise, while others did work far more stable. Perhaps, the WDC
> disks signal quality under low-power is more bad than the one of
> other disks.
>
>I have no idea, why the WDC disk did work well when it was connected
>to the onboard controller. Perhaps, the lower signal quality of the
>WDC (if so) and the sensitivity of the Promise (if so) added
> together was too much at all.
>
>
>Henrik, Sebastian: if you still have problems, this is probably
>something to test for you.
>
>Sebastian: if you experience more instability after exchanging your
>drive with a newer (bigger? faster?) one, your chance is high to get
>it solved with a bigger power supply, I'd guess :)
>
>Bruce: this is probably something to hint for at the smartmontools
>warning page.
>
>
>regards,
> Mario
--
Cheers, Gene
"There are four boxes to be used in defense of liberty:
soap, ballot, jury, and ammo. Please use in that order."
-Ed Howdershelt (Author)
99.23% setiathome rank, not too shabby for a WV hillbilly
Yahoo.com attorneys please note, additions to this message
by Gene Heskett are:
Copyright 2004 by Maurice Eugene Heskett, all rights reserved.
^ permalink raw reply [flat|nested] 26+ messages in thread
* Re: Strange DMA-errors and system hang with Promise 20268
2004-05-29 14:42 ` Strange DMA-errors and system hang with Promise 20268 Mario 'BitKoenig' Holbe
2004-05-29 22:51 ` Gene Heskett
@ 2004-05-30 10:41 ` Henrik Persson
2004-06-04 20:09 ` Bruce Allen
2 siblings, 0 replies; 26+ messages in thread
From: Henrik Persson @ 2004-05-30 10:41 UTC (permalink / raw)
To: Mario 'BitKoenig' Holbe; +Cc: linux-kernel, Sebastian, Bruce Allen
On Sat, 2004-05-29 at 16:42, Mario 'BitKoenig' Holbe wrote:
> Hi,
>
> On Sun, Mar 07, 2004 at 02:05:46AM +0100, Mario 'BitKoenig' Holbe wrote:
> > Mar 4 01:01:06 darkside kernel: hde: dma_timer_expiry: dma status == 0x21
>
> hmmm, it seems I solved the issue in my case.
>
> I just connected the two disks on the Promise to a second, separate
> power supply, and everything is rock-stable and survives everything
> that went wrong before (parallel fsck, parallel hdparm -t, as well as
> 'normal operation' over days).
> Things remain stable even with the WDC disk connected back to the
> Promise, which was heavily unstable before.
>
> Moreover, when I connect the disks back to the machine's internal power
> supply, the problems come back immediately.
>
> IMHO, this also explains why the problems happen during heavy I/O
> (in parallel over all disks) and/or while S.M.A.R.T. self-tests are
> running (also in parallel over all disks).
>
> Well, I thought a 300W power supply would be enough for a 1GHz P-III
> and 4 disks. And it definitely was enough for a long time.
>
> Most likely, the power supply just aged and its capacity decreased.
>
> Perhaps the disks consume more power with newer kernels, but while
> crawling the lkml archives I found similar reports for 2.4.19 as well.
>
> I have no idea why it was always the disks connected to the Promise
> controller that failed. Perhaps the Promise is more sensitive to
> signal quality on the IDE wire.
>
> I have no idea why my WDC disk failed when connected to the Promise,
> while others worked far more stably. Perhaps the WDC disk's signal
> quality under low power is worse than that of other disks.
>
> I have no idea why the WDC disk worked well when it was connected
> to the onboard controller. Perhaps the lower signal quality of the
> WDC (if so) and the sensitivity of the Promise (if so) added together
> were simply too much.
>
>
> Henrik, Sebastian: if you still have problems, this is probably
> something for you to test.
Well, I don't have those problems anymore, but I have a question:
should the box freeze just because of some power failures? :/
--
Henrik Persson <nix@syndicalist.net>
* Re: Strange DMA-errors and system hang with SMART (was: ...and system hang with Promise 20268)
2004-05-23 12:46 ` Bruce Allen
@ 2004-06-02 19:00 ` Sebastian
2004-06-03 15:06 ` Bruce Allen
0 siblings, 1 reply; 26+ messages in thread
From: Sebastian @ 2004-06-02 19:00 UTC (permalink / raw)
To: Bruce Allen; +Cc: Mario 'BitKoenig' Holbe, Linux Kernel Mailing List
On Sun, 2004-05-23 at 14:46, Bruce Allen wrote:
> Hi Sebastian,
>
> Sorry it's taken me so long to reply. My usual googling of smartmontools
> didn't turn this up because you changed the subject line and started a new
> thread.
Sorry for my late reply, too. I had been out of the country and away
from the Internet.
> I hadn't realized until now that the drive is an IBM GXP60.
>
> smartctl is *supposed* to print a warning message for these drives, to
> tell users to look at http://www.geocities.com/dtla_update/index.html#rel
> for pointers to updated firmware for this drive! What firmware version do
> you have?
Yes, the warning is there. However, there had never been a problem with
it for years until I upgraded the kernel. I probably should have paid
more attention to that warning... The problem is that most of the links
are broken on the page that you refer to.
I am now pretty sure that the DMA error is related to SMART, as the
server ran without problems for a couple of weeks until someone started
smartd again by mistake. Three days later the box froze again just after
1 am.
> Meanwhile, what firmware version do you have? I suggest you upgrade it --
> this may fix the problem. The final firmware with the SMART fixes seems
> to be A46A.
ER4OA44A
Thanks for the info,
Sebastian
* Re: Strange DMA-errors and system hang with SMART (was: ...and system hang with Promise 20268)
2004-06-02 19:00 ` Sebastian
@ 2004-06-03 15:06 ` Bruce Allen
0 siblings, 0 replies; 26+ messages in thread
From: Bruce Allen @ 2004-06-03 15:06 UTC (permalink / raw)
To: Sebastian; +Cc: Mario 'BitKoenig' Holbe, Linux Kernel Mailing List
> Sorry for my late reply, too. I had been out of the country and away
> from the Internet.
No problem.
> > I hadn't realized until now that the drive is an IBM GXP60.
> >
> > smartctl is *supposed* to print a warning message for these drives, to
> > tell users to look at http://www.geocities.com/dtla_update/index.html#rel
> > for pointers to updated firmware for this drive! What firmware version do
> > you have?
>
> Yes, the warning is there. However, there had never been a problem with
> it for years until I upgraded the kernel. I probably should have paid
> more attention to that warning... The problem is that most of the links
> are broken on the page that you refer to.
Indeed, you are right. Try this:
http://www-3.ibm.com/pc/support/site.wss/document.do?lndocid=MIGR-42215
* Re: Strange DMA-errors and system hang with Promise 20268
2004-05-29 14:42 ` Strange DMA-errors and system hang with Promise 20268 Mario 'BitKoenig' Holbe
2004-05-29 22:51 ` Gene Heskett
2004-05-30 10:41 ` Henrik Persson
@ 2004-06-04 20:09 ` Bruce Allen
2 siblings, 0 replies; 26+ messages in thread
From: Bruce Allen @ 2004-06-04 20:09 UTC (permalink / raw)
To: Mario 'BitKoenig' Holbe; +Cc: linux-kernel, Henrik Persson, Sebastian
> hmmm, it seems I solved the issue in my case.
>
> I did just connect the two disks on the Promise to a second separate
> power supply and everything works rock-stable and survives all things
>
> <SNIP>
>
> Bruce: this is probably worth mentioning on the smartmontools
> warning page.
Done.
end of thread
Thread overview: 26+ messages
2004-03-06 19:47 Strange DMA-errors and system hang with Promise 20268 Henrik Persson
2004-03-06 19:55 ` Henrik Persson
2004-03-07 1:05 ` Mario 'BitKoenig' Holbe
2004-03-08 13:30 ` Henrik Persson
2004-03-10 11:50 ` Bruce Allen
2004-03-10 12:36 ` Mario 'BitKoenig' Holbe
2004-03-10 15:00 ` Henrik Persson
2004-03-11 9:36 ` Bruce Allen
2004-03-11 14:31 ` Henrik Persson
2004-03-10 15:41 ` Mario 'BitKoenig' Holbe
2004-03-11 9:25 ` Bruce Allen
2004-05-19 17:20 ` Sebastian
2004-05-19 17:28 ` Mario 'BitKoenig' Holbe
2004-05-19 18:12 ` Sebastian
[not found] ` <1648.128.150.143.219.1084992082.squirrel@webmail.seven4sky.com>
2004-05-19 20:12 ` Strange DMA-errors... (was: ...and system hang with Promise 20268) Sebastian
2004-05-19 23:47 ` Mario 'BitKoenig' Holbe
2004-05-20 9:23 ` Strange DMA-errors and system hang with Promise 20268 Bruce Allen
2004-05-20 10:35 ` Strange DMA-errors and system hang with SMART (was: ...and system hang with Promise 20268) Sebastian
2004-05-23 12:46 ` Bruce Allen
2004-06-02 19:00 ` Sebastian
2004-06-03 15:06 ` Bruce Allen
2004-05-29 14:42 ` Strange DMA-errors and system hang with Promise 20268 Mario 'BitKoenig' Holbe
2004-05-29 22:51 ` Gene Heskett
2004-05-30 10:41 ` Henrik Persson
2004-06-04 20:09 ` Bruce Allen
[not found] ` <200405052339.i45NdXsx003369@darkside.22.kls.lan>
[not found] ` <1083849053.6994.10.camel@vega>
2004-05-06 14:22 ` Mario 'BitKoenig' Holbe