LKML Archive on lore.kernel.org
* doubled idle count
@ 2008-09-18 12:26 Ferenc Wagner
  2008-09-18 19:34 ` Jeremy Fitzhardinge
  0 siblings, 1 reply; 7+ messages in thread
From: Ferenc Wagner @ 2008-09-18 12:26 UTC
  To: linux-kernel

Hi,

After upgrading a Xen virtual machine to Debian's 2.6.26-4 kernel, I
noticed that the idle counter doubled its pace on one of the machines:

service2:~# yes >/dev/null &
[1] 32578
service2:~# grep cpu0 /proc/stat; sleep 1; grep cpu0 /proc/stat
cpu0 141208 9113 57273 13379659 61012 0 792 2350 0
cpu0 141310 9113 57274 13379659 61012 0 792 2350 0
service2:~# fg
yes > /dev/null
^C
service2:~# grep cpu0 /proc/stat; sleep 1; grep cpu0 /proc/stat
cpu0 141952 9113 57277 13383481 61012 0 792 2350 0
cpu0 141953 9113 57278 13383681 61012 0 792 2350 0

One out of three machines shows this effect, with the exact same kernel
and Xen versions (3.2.0, dom0 is Debian's stock Etch 2.6.18 kernel).
They aren't hosted by the same machine, though: the misbehaving one is
on a different installation with very similar hardware (3 vs 2 GHz).
All the guests are paravirtual.

service2:~# cat /proc/cpuinfo 
processor	: 0
vendor_id	: GenuineIntel
cpu family	: 15
model		: 2
model name	: Intel(R) Xeon(TM) CPU 3.06GHz
stepping	: 5
cpu MHz		: 3056.482
cache size	: 1024 KB
physical id	: 0
siblings	: 1
core id		: 0
cpu cores	: 1
apicid		: 1
initial apicid	: 1
fdiv_bug	: no
hlt_bug		: no
f00f_bug	: no
coma_bug	: no
fpu		: yes
fpu_exception	: yes
cpuid level	: 2
wp		: yes
flags		: fpu de tsc msr pae cx8 cmov pat pse36 clflush dts mmx fxsr sse sse2 ss ht pbe up pebs bts cid xtpr
bogomips	: 6128.11
clflush size	: 64
power management:

I don't know what else would be useful, please ask.
-- 
Thanks,
Feri.


* Re: doubled idle count
  2008-09-18 12:26 doubled idle count Ferenc Wagner
@ 2008-09-18 19:34 ` Jeremy Fitzhardinge
  2008-09-18 21:24   ` Ferenc Wagner
                     ` (4 more replies)
  0 siblings, 5 replies; 7+ messages in thread
From: Jeremy Fitzhardinge @ 2008-09-18 19:34 UTC
  To: Ferenc Wagner; +Cc: linux-kernel, Xen-devel

Ferenc Wagner wrote:
> Hi,
>
> After upgrading a Xen virtual machine to Debian's 2.6.26-4 kernel, I
> noticed that the idle counter doubled its pace on one of the machines:
>
> service2:~# yes >/dev/null &
> [1] 32578
> service2:~# grep cpu0 /proc/stat; sleep 1; grep cpu0 /proc/stat
> cpu0 141208 9113 57273 13379659 61012 0 792 2350 0
> cpu0 141310 9113 57274 13379659 61012 0 792 2350 0
> service2:~# fg
> yes > /dev/null
> ^C
> service2:~# grep cpu0 /proc/stat; sleep 1; grep cpu0 /proc/stat
> cpu0 141952 9113 57277 13383481 61012 0 792 2350 0
> cpu0 141953 9113 57278 13383681 61012 0 792 2350 0
>   

Sorry, could you be clearer here?  What are we looking at?

> One out of three machines shows this effect, with the exact same kernel
> and Xen versions (3.2.0, dom0 is Debian's stock Etch 2.6.18 kernel).
> They aren't hosted by the same machine, though: the misbehaving one is
> on a different installation with very similar hardware (3 vs 2 GHz).
> All the guests are paravirtual.
>   

So you're saying that they are identical Xen and guest kernel binaries,
but one of three is showing doubled idle time?

That seems unlikely.  The source of that time is from Xen itself, and I
think it should be hardware independent, though I guess it's possible
there's something going on in the Xen-level timekeeping.

That said, I think there's some chance that stolen time may get counted
as idle time.  Does the one machine with a different outcome have
something else running in another virtual machine (including dom0)?

    J

> [...]



* Re: doubled idle count
  2008-09-18 19:34 ` Jeremy Fitzhardinge
@ 2008-09-18 21:24   ` Ferenc Wagner
  2008-09-19 13:20   ` Ferenc Wagner
                     ` (3 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Ferenc Wagner @ 2008-09-18 21:24 UTC
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Xen-devel

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Ferenc Wagner wrote:
>
>> After upgrading a Xen virtual machine to Debian's 2.6.26-4 kernel, I
>> noticed that the idle counter doubled its pace on one of the machines:
>
> Sorry, could you be clearer here?  What are we looking at?

Sure, see below:

>> service2:~# yes >/dev/null &
>> [1] 32578

Here I made the processor (a single virtual one) fully busy.

>> service2:~# grep cpu0 /proc/stat; sleep 1; grep cpu0 /proc/stat
>> cpu0 141208 9113 57273 13379659 61012 0 792 2350 0
>> cpu0 141310 9113 57274 13379659 61012 0 792 2350 0

Above, the difference between the first (user) fields is 102, which,
given that USER_HZ=100, means that the CPU spent all its cycles during
the sleep processing user code (the yes running in the background).
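For reference, the nine numbers on a cpuN line of /proc/stat are, in
order, user, nice, system, idle, iowait, irq, softirq, steal and guest
time, all counted in USER_HZ ticks (order as documented in proc(5)).
A minimal Python sketch of the subtraction done by eye above; the
helper names are made up for illustration:

```python
# Field order of a "cpuN" line in /proc/stat, as documented in proc(5).
# A 2.6.26 kernel prints these nine; guest_nice only appeared in 2.6.33.
FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest")


def parse_cpu_line(line):
    """Split 'cpuN v1 v2 ...' into the CPU name and a {field: ticks} dict."""
    name, *values = line.split()
    return name, dict(zip(FIELDS, (int(v) for v in values)))


def tick_deltas(before, after):
    """Per-field tick difference between two samples of the same CPU line."""
    name_a, a = parse_cpu_line(before)
    name_b, b = parse_cpu_line(after)
    if name_a != name_b:
        raise ValueError(f"samples are for {name_a} and {name_b}")
    return {field: b[field] - a[field] for field in a}


# The two samples taken while `yes` was running in the background:
busy = tick_deltas("cpu0 141208 9113 57273 13379659 61012 0 792 2350 0",
                   "cpu0 141310 9113 57274 13379659 61012 0 792 2350 0")
print(busy["user"], busy["system"], busy["idle"])  # 102 1 0
```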

>> service2:~# fg
>> yes > /dev/null
>> ^C

Now the CPU is idle again: I killed the yes started above.

>> service2:~# grep cpu0 /proc/stat; sleep 1; grep cpu0 /proc/stat
>> cpu0 141952 9113 57277 13383481 61012 0 792 2350 0
>> cpu0 141953 9113 57278 13383681 61012 0 792 2350 0

And here, the difference between the fourth (idle) fields is 200,
meaning that the processor spent 200% of its time in the idle state
during this second!  (If I read the procfs documentation correctly, of
course.)

This seems wrong by a factor of two, as there are only 100 "ticks" in
a second (actually, this kernel is tickless, but USER_HZ=100, as I'm
running a 686 kernel).
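The sanity check behind "wrong by a factor of two" can be written
down: over t seconds a single CPU has only about t * USER_HZ ticks to
hand out, so a field above 100% of that, or a sum of all deltas well
above it, suggests double counting.  A self-contained sketch, assuming
USER_HZ=100 and treating the interval as exactly one second (the sleep
plus two greps make it slightly longer in reality):

```python
USER_HZ = 100  # assumption for this i686 kernel; os.sysconf("SC_CLK_TCK") reports it

FIELDS = ("user", "nice", "system", "idle", "iowait",
          "irq", "softirq", "steal", "guest")


def deltas(before, after):
    """Per-field tick difference between two 'cpuN ...' lines of /proc/stat."""
    a = [int(v) for v in before.split()[1:]]
    b = [int(v) for v in after.split()[1:]]
    return dict(zip(FIELDS, (y - x for x, y in zip(a, b))))


def percent_of_wall_clock(d, seconds=1.0, user_hz=USER_HZ):
    """Express each delta as a percentage of the ticks one CPU can have."""
    capacity = seconds * user_hz
    return {field: 100.0 * ticks / capacity for field, ticks in d.items()}


# The two samples taken after killing `yes`:
idle = deltas("cpu0 141952 9113 57277 13383481 61012 0 792 2350 0",
              "cpu0 141953 9113 57278 13383681 61012 0 792 2350 0")
pct = percent_of_wall_clock(idle)
print(pct["idle"], sum(idle.values()))  # 200.0 202
```

With the idle samples this gives 200% idle and 202 ticks accounted in a
one-second interval.  The steal field stays at 2350 in all four samples
quoted above, so none of the extra time shows up as stolen.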

>> One out of three machines shows this effect, with the exact same kernel
>> and Xen versions (3.2.0, dom0 is Debian's stock Etch 2.6.18 kernel).
>> They aren't hosted by the same machine, though: the misbehaving one is
>> on a different installation with very similar hardware (3 vs 2 GHz).
>> All the guests are paravirtual.
>
> So you're saying that they are identical Xen and guest kernel binaries,
> but one of three is showing doubled idle time?

Yes.

> That seems unlikely.

I was very much surprised myself, too...  The version numbers surely
are the same, but the binaries came from different downloads.  I'll
compare them, and also start another domU next to the misbehaving one.

> The source of that time is from Xen itself, and I think it should be
> hardware independent, though I guess it's possible there's something
> going on in the Xen-level timekeeping.
>
> That said, I think there's some chance that stolen time may get counted
> as idle time.  Does the one machine with a different outcome have
> something else running in another virtual machine (including dom0)?

Yes, both Xen instances run other domUs, and about one on each
consumes significant CPU.  The other domUs are mostly idle, and so are
the dom0s.
-- 
Thanks for taking time,
Feri.


* Re: doubled idle count
  2008-09-18 19:34 ` Jeremy Fitzhardinge
  2008-09-18 21:24   ` Ferenc Wagner
@ 2008-09-19 13:20   ` Ferenc Wagner
  2008-09-21 12:05   ` Ferenc Wagner
                     ` (2 subsequent siblings)
  4 siblings, 0 replies; 7+ messages in thread
From: Ferenc Wagner @ 2008-09-19 13:20 UTC
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Xen-devel

Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Ferenc Wagner wrote:
>
>> After upgrading a Xen virtual machine to Debian's 2.6.26-4 kernel, I
>> noticed that the idle counter doubled its pace on one of the machines:
>> [...]
>> One out of three machines shows this effect, with the exact same kernel
>> and Xen versions (3.2.0, dom0 is Debian's stock Etch 2.6.18 kernel).
>> They aren't hosted by the same machine, though: the misbehaving one is
>> on a different installation with very similar hardware (3 vs 2 GHz).
>> All the guests are paravirtual.
>
> So you're saying that they are identical Xen and guest kernel binaries,
> but one of three is showing doubled idle time?

Now I upgraded another domU, and that also shows this doubling effect,
so I've got two domUs (running on xen2-ha) misbehaving, and the other
two (running on xen2) behaving correctly.  On the first:

wferi@xen2-ha:~$ sudo xm info
host                   : xen2-ha
release                : 2.6.18-6-xen-686
version                : #1 SMP Mon Aug 18 12:56:50 UTC 2008
machine                : i686
nr_cpus                : 4
nr_nodes               : 1
cores_per_socket       : 1
threads_per_core       : 2
cpu_mhz                : 3056
hw_caps                : bfebfbff:00000000:00000000:00000080:00004400
total_memory           : 3071
free_memory            : 991
node_to_cpu            : node0:0-3
xen_major              : 3
xen_minor              : 2
xen_extra              : -1
xen_caps               : xen-3.0-x86_32p 
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xf5800000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.1.2 20061115 (prerelease) (Debian 4.1.1-21)
cc_compile_by          : fs
cc_compile_domain      : debian.org
cc_compile_date        : Mon Mar 10 15:50:27 UTC 2008
xend_config_format     : 4

while on the other the differing lines are:

host                   : xen2
cpu_mhz                : 1993
total_memory           : 4991
free_memory            : 2464
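
Spotting the differing lines by hand works for two hosts; a
hypothetical helper (not part of the xm tools) can do the same
key-by-key comparison of two `xm info` outputs.  The sample strings
below are excerpts of the two outputs in this message:

```python
def parse_xm_info(text):
    """Turn 'key : value' lines, as printed by `xm info`, into a dict."""
    info = {}
    for line in text.splitlines():
        # Split on the first colon only: values such as "node0:0-3" or a
        # build timestamp contain colons of their own.
        key, sep, value = line.partition(":")
        if sep:
            info[key.strip()] = value.strip()
    return info


def differing_keys(a_text, b_text):
    """Return {key: (value_a, value_b)} for every key whose value differs."""
    a, b = parse_xm_info(a_text), parse_xm_info(b_text)
    return {key: (a.get(key), b.get(key))
            for key in sorted(a.keys() | b.keys())
            if a.get(key) != b.get(key)}


xen2_ha = """\
host                   : xen2-ha
cpu_mhz                : 3056
total_memory           : 3071
free_memory            : 991
node_to_cpu            : node0:0-3
xen_scheduler          : credit"""

xen2 = """\
host                   : xen2
cpu_mhz                : 1993
total_memory           : 4991
free_memory            : 2464
node_to_cpu            : node0:0-3
xen_scheduler          : credit"""

for key, (ha, plain) in differing_keys(xen2_ha, xen2).items():
    print(f"{key}: {ha} vs {plain}")
```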

Hope this adds some useful info.
-- 
Regards,
Feri.


* Re: doubled idle count
  2008-09-18 19:34 ` Jeremy Fitzhardinge
  2008-09-18 21:24   ` Ferenc Wagner
  2008-09-19 13:20   ` Ferenc Wagner
@ 2008-09-21 12:05   ` Ferenc Wagner
  2008-09-21 12:27   ` Ferenc Wagner
  2008-11-07 15:20   ` Ferenc Wagner
  4 siblings, 0 replies; 7+ messages in thread
From: Ferenc Wagner @ 2008-09-21 12:05 UTC
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Xen-devel

Ferenc Wagner <wferi@niif.hu> writes:

> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>> Ferenc Wagner wrote:
>>
>>> After upgrading a Xen virtual machine to Debian's 2.6.26-4 kernel, I
>>> noticed that the idle counter doubled its pace on one of the machines:
>>> [...]
>>> One out of three machines shows this effect, with the exact same kernel
>>> and Xen versions (3.2.0, dom0 is Debian's stock Etch 2.6.18 kernel).
>>> They aren't hosted by the same machine, though: the misbehaving one is
>>> on a different installation with very similar hardware (3 vs 2 GHz).
>>> All the guests are paravirtual.
>>
>> So you're saying that they are identical Xen and guest kernel binaries,
>> but one of three is showing doubled idle time?
>
> Now I upgraded another domU, and that also shows this doubling effect,
> so I've got two domUs (running on xen2-ha) misbehaving, and the other
> two (running on xen2) behaving correctly.

Possibly related: three of the above four 2.6.26 domUs froze
yesterday afternoon.  Their consoles show nothing and are dead; they
don't even react to sysrq.  According to xm list, the frozen domains
are constantly running (stuck in the r state).

I haven't destroyed them yet, in case it is possible to extract any
info from the system.  If not, please tell me so I can restart them.

On the other hand, I'll have a look whether the same kernel running on
the bare metal produces this idle count discrepancy.
-- 
Thanks,
Feri.

Ps: Please note that xen-devel swallows my mails without a trace,
    probably because I'm not subscribed.


* Re: doubled idle count
  2008-09-18 19:34 ` Jeremy Fitzhardinge
                     ` (2 preceding siblings ...)
  2008-09-21 12:05   ` Ferenc Wagner
@ 2008-09-21 12:27   ` Ferenc Wagner
  2008-11-07 15:20   ` Ferenc Wagner
  4 siblings, 0 replies; 7+ messages in thread
From: Ferenc Wagner @ 2008-09-21 12:27 UTC
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Xen-devel

Ferenc Wagner <wferi@niif.hu> writes:

> Jeremy Fitzhardinge <jeremy@goop.org> writes:
>> Ferenc Wagner wrote:
>>
>>> After upgrading a Xen virtual machine to Debian's 2.6.26-4 kernel, I
>>> noticed that the idle counter doubled its pace on one of the machines:
>>> [...]
>>> One out of three machines shows this effect, with the exact same kernel
>>> and Xen versions (3.2.0, dom0 is Debian's stock Etch 2.6.18 kernel).
>>> They aren't hosted by the same machine, though: the misbehaving one is
>>> on a different installation with very similar hardware (3 vs 2 GHz).
>>> All the guests are paravirtual.
>>
>> So you're saying that they are identical Xen and guest kernel binaries,
>> but one of three is showing doubled idle time?
>
> Now I upgraded another domU, and that also shows this doubling effect,
> so I've got two domUs (running on xen2-ha) misbehaving, and the other
> two (running on xen2) behaving correctly.

Also, top is not confused; it shows idle times below 100%.  Maybe it's
all right, or maybe top simply subtracts all the rest from 100...
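
That would be consistent with how procps top derives its percentages:
as far as I understand it, top divides each field's delta by the sum of
all deltas in its refresh interval rather than by elapsed time times
USER_HZ, so an inflated idle count can never push it above 100%.  A
sketch of the two normalizations side by side, using the idle samples
from the first message (the top-style function illustrates the idea;
it is not procps code):

```python
# Per-field tick deltas from the idle measurement in the first message
# (user, system and idle; every other field stayed unchanged).
deltas = {"user": 1, "system": 1, "idle": 200}


def share_of_total(d):
    """top-style: each field as a percentage of all ticks counted."""
    total = sum(d.values())
    return {k: 100.0 * v / total for k, v in d.items()}


def share_of_wall_clock(d, seconds=1.0, user_hz=100):
    """/proc/stat read literally: ticks over the ticks one CPU can have."""
    return {k: 100.0 * v / (seconds * user_hz) for k, v in d.items()}


print(round(share_of_total(deltas)["idle"], 1))  # 99.0
print(share_of_wall_clock(deltas)["idle"])       # 200.0
```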
-- 
Feri.


* Re: doubled idle count
  2008-09-18 19:34 ` Jeremy Fitzhardinge
                     ` (3 preceding siblings ...)
  2008-09-21 12:27   ` Ferenc Wagner
@ 2008-11-07 15:20   ` Ferenc Wagner
  4 siblings, 0 replies; 7+ messages in thread
From: Ferenc Wagner @ 2008-11-07 15:20 UTC
  To: Jeremy Fitzhardinge; +Cc: linux-kernel, Xen-devel


Jeremy Fitzhardinge <jeremy@goop.org> writes:

> Ferenc Wagner wrote:
>
>> After upgrading a Xen virtual machine to Debian's 2.6.26-4 kernel, I
>> noticed that the idle counter doubled its pace on one of the machines:
>> [command line stuff elided, graph follows]
>
> So you're saying that they are identical Xen and guest kernel binaries,
> but one of three is showing doubled idle time?

I confirmed this in detail on Sept 19.  Now I'm trying to resurrect
this thread, maybe some visual stimulation will help...


[-- Attachment #2: domU CPU graph of the last week (PNG image, not reproduced here) --]


Consider the above CPU graph (courtesy of Munin, tweaked to accept
values above 100%).  This virtual machine (Linux 2.6.26.6 + migration
patches from 2.6.27) can run on two identically configured hosts
running the 3.2.1 hypervisor and a Linux 2.6.18 dom0 kernel.

Nov 2 15:22 (rising edge): domU shut down and recreated with
    clocksource=jiffies.

Nov 3 16:01 (nothing to see): domU shut down again and recreated
    without specifying clocksource on the kernel command line.  It
    selected the xen clocksource.

Nov 6 17:22 (falling edge): domU live migrated to the other host.

So live migration alone shook the idle figures back into shape.
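
For anyone repeating the clocksource experiment: a parameter such as
clocksource=jiffies would normally be passed through the `extra` line
of the xm domU config, and the guest reports which clocksource it
actually picked in sysfs under
/sys/devices/system/clocksource/clocksource0/.  A small sketch that
reads it back (the base directory is a parameter only so the function
can be exercised outside a guest):

```python
from pathlib import Path

# Standard sysfs location of the clocksource selection on Linux.
SYSFS = Path("/sys/devices/system/clocksource/clocksource0")


def clocksource_state(base=SYSFS):
    """Return (current, available) clocksources as reported by sysfs."""
    base = Path(base)
    current = (base / "current_clocksource").read_text().strip()
    available = (base / "available_clocksource").read_text().split()
    return current, available
```

Calling clocksource_state() inside the domU before and after a
migration would show whether the selected clocksource changes.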

I really wonder if anybody has some insight into this anomaly.  In
itself it isn't a problem, but I've got other issues (some timing
related, like serial overruns, and also some dom0 crashes), so I'd
like to get rid of anything suspicious.

Regards,
Feri.

xm info on the two dom0-s:

host                   : xen2-ha
release                : 2.6.18-6-xen-686
version                : #1 SMP Mon Oct 13 20:36:55 UTC 2008
machine                : i686
nr_cpus                : 4
nr_nodes               : 1
cores_per_socket       : 1
threads_per_core       : 2
cpu_mhz                : 3056
hw_caps                : bfebfbff:00000000:00000000:00000080:00004400
total_memory           : 3071
free_memory            : 2781
node_to_cpu            : node0:0-3
xen_major              : 3
xen_minor              : 2
xen_extra              : -1
xen_caps               : xen-3.0-x86_32p 
xen_scheduler          : credit
xen_pagesize           : 4096
platform_params        : virt_start=0xf5800000
xen_changeset          : unavailable
cc_compiler            : gcc version 4.3.1 (Debian 4.3.1-2) 
cc_compile_by          : waldi
cc_compile_domain      : debian.org
cc_compile_date        : Sat Jun 28 15:25:00 UTC 2008
xend_config_format     : 4

domU config:

name = "noc.grid"
kernel = "/home/wferi/vmlinuz-2.6.26-1-686-bigmem"
ramdisk = "/home/wferi/initrd.img-2.6.26-1-686-bigmem"
memory = 512
vif = [ 'mac=00:16:3e:00:00:13, bridge=br894',
	'mac=00:16:3e:00:00:30, bridge=br897' ]
root = "/dev/mapper/noc-root ro"
extra = ""
on_poweroff = 'destroy'
on_reboot   = 'restart'
on_crash    = 'restart'

