LKML Archive on lore.kernel.org help / color / mirror / Atom feed
* [PATCH] correct inconsistent ntp interval/tick_length usage @ 2008-01-24 2:38 john stultz 2008-01-24 9:14 ` Valdis.Kletnieks 2008-01-25 14:07 ` Roman Zippel 0 siblings, 2 replies; 36+ messages in thread From: john stultz @ 2008-01-24 2:38 UTC (permalink / raw) To: lkml, Andrew Morton; +Cc: Ingo Molnar, Steven Rostedt, Roman Zippel I recently noticed on one of my boxes that when synched with an NTP server, the drift value reported for the system was ~283ppm. While in some cases, clock hardware can be that bad, it struck me as unusual as the system was using the acpi_pm clocksource, which is one of the more trustworthy and accurate clocksources on x86 hardware. I brought up another system and let it sync to the same NTP server, and I noticed a similar 280some ppm drift. In looking at the code, I found that the acpi_pm's constant frequency was being computed correctly at boot-up, however once the system was up, even without the ntp daemon running, the clocksource's frequency was being modified by the clocksource_adjust() function. Digging deeper, I realized that in the code that keeps track of how much the clocksource is skewing from the ntp desired time, we were using different lengths to establish how long an time interval was. The clocksource was being setup with the following interval: NTP_INTERVAL_LENGTH = NSEC_PER_SEC/NTP_INTERVAL_FREQ While the ntp code was using the tick_length_base value: tick_length_base ~= (tick_usec * NSEC_PER_USEC * USER_HZ) /NTP_INTERVAL_FREQ The subtle difference is: (tick_usec * NSEC_PER_USEC * USER_HZ) != NSEC_PER_SEC This difference in calculation was causing the clocksource correction code to apply a correction factor to the clocksource so the two intervals were the same, however this results in the actual frequency of the clocksource to be made incorrect. I believe this difference would affect all clocksources, although to differing degrees depending on the clocksource resolution. The issue was introduced when my HZ free ntp patch landed in 2.6.21-rc1, so my apologies for the mistake, and for not noticing it until now. The following patch, corrects the clocksource's initialization code so it uses the same interval length as the code in ntp.c. After applying this patch, the drift value for the same system went from ~283ppm to only 2.635ppm. I believe this patch to be good, however it does affect all arches and I've only tested on x86, so some caution is advised. I do think it would be a likely candidate for a stable 2.6.24.x release. Any thoughts or feedback would be appreciated. Signed-off-by: John Stultz <johnstul@us.ibm.com> Index: linux/kernel/time/timekeeping.c =================================================================== --- linux.orig/kernel/time/timekeeping.c +++ linux/kernel/time/timekeeping.c @@ -208,7 +208,8 @@ static void change_clocksource(void) clock->error = 0; clock->xtime_nsec = 0; - clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); + clocksource_calculate_interval(clock, + (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); tick_clock_notify(); @@ -265,7 +266,8 @@ void __init timekeeping_init(void) ntp_clear(); clock = clocksource_get_next(); - clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); + clocksource_calculate_interval(clock, + (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); clock->cycle_last = clocksource_read(clock); xtime.tv_sec = sec; ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-01-24 2:38 [PATCH] correct inconsistent ntp interval/tick_length usage john stultz @ 2008-01-24 9:14 ` Valdis.Kletnieks 2008-01-25 14:07 ` Roman Zippel 1 sibling, 0 replies; 36+ messages in thread From: Valdis.Kletnieks @ 2008-01-24 9:14 UTC (permalink / raw) To: john stultz Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt, Roman Zippel [-- Attachment #1: Type: text/plain, Size: 862 bytes --] On Wed, 23 Jan 2008 18:38:53 PST, john stultz said: > I recently noticed on one of my boxes that when synched with an NTP > server, the drift value reported for the system was ~283ppm. While in > some cases, clock hardware can be that bad, it struck me as unusual as > the system was using the acpi_pm clocksource, which is one of the more > trustworthy and accurate clocksources on x86 hardware. > > I brought up another system and let it sync to the same NTP server, and > I noticed a similar 280some ppm drift. So I got curious, and dropped the patch onto my workstation - 24-rc8-mm1 x86_64 and it applied just fine with an offset reported. Before the patch, the drift was reported as 140.67 or so. After, it's sitting at -0.681. Am running with hpet clocksource rather than acpi_pm, which probably explains the 140 I see rather than the 280 John saw.... [-- Attachment #2: Type: application/pgp-signature, Size: 226 bytes --] ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-01-24 2:38 [PATCH] correct inconsistent ntp interval/tick_length usage john stultz 2008-01-24 9:14 ` Valdis.Kletnieks @ 2008-01-25 14:07 ` Roman Zippel 2008-01-29 2:28 ` john stultz 1 sibling, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-01-25 14:07 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Wed, 23 Jan 2008, john stultz wrote: > This difference in calculation was causing the clocksource correction > code to apply a correction factor to the clocksource so the two > intervals were the same, however this results in the actual frequency of > the clocksource to be made incorrect. I believe this difference would > affect all clocksources, although to differing degrees depending on the > clocksource resolution. Let's look at why the correction is done in first place. The update steps don't add up precisely to 1sec (LATCH*HZ != CLOCK_TICK_RATE), so a small addjustment is used to make up for it. The problem here is that if the update frequency changes, the addjustment isn't correct anymore. The simple fix is to just omit the addjustment in these cases in ntp.c: #if NTP_INTERVAL_FREQ == HZ ... #else #define CLOCK_TICK_ADJUST 0 #endif bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-01-25 14:07 ` Roman Zippel @ 2008-01-29 2:28 ` john stultz 2008-01-29 4:02 ` Roman Zippel 0 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2008-01-29 2:28 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Fri, 2008-01-25 at 15:07 +0100, Roman Zippel wrote: > Hi, > > On Wed, 23 Jan 2008, john stultz wrote: > > > This difference in calculation was causing the clocksource correction > > code to apply a correction factor to the clocksource so the two > > intervals were the same, however this results in the actual frequency of > > the clocksource to be made incorrect. I believe this difference would > > affect all clocksources, although to differing degrees depending on the > > clocksource resolution. > > Let's look at why the correction is done in first place. The update steps > don't add up precisely to 1sec (LATCH*HZ != CLOCK_TICK_RATE), so a small > addjustment is used to make up for it. The problem here is that if the > update frequency changes, the addjustment isn't correct anymore. > The simple fix is to just omit the addjustment in these cases in ntp.c: > > #if NTP_INTERVAL_FREQ == HZ > ... > #else > #define CLOCK_TICK_ADJUST 0 > #endif Hmmm, although this doesn't explain why the issue is seen when NTP_INTERVAL_FREQ == HZ (as it is in my system's case). Or am I missing something? Regardless, current_tick_length() really is the base interval we're using in the error accumulation loop, so it seems the cleanest interface to use (just to avoid redundancy at least) when establishing the clocksource's interval length. Or do you not agree? thanks -john ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-01-29 2:28 ` john stultz @ 2008-01-29 4:02 ` Roman Zippel 2008-01-30 2:14 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-01-29 4:02 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Mon, 28 Jan 2008, john stultz wrote: > Regardless, current_tick_length() really is the base interval we're > using in the error accumulation loop, so it seems the cleanest interface > to use (just to avoid redundancy at least) when establishing the > clocksource's interval length. Or do you not agree? I see, what you need to use in timex.h for !CONFIG_NO_HZ is: #define NTP_INTERVAL_LENGTH ((s64)LATCH * NSEC_PER_SEC) / (s64)CLOCK_TICK_RATE) this calculates the base length of a clock tick in nsec. current_tick_length() would only work during boot. If you switch clocks later, it would include random adjustments specific to the old clock. bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-01-29 4:02 ` Roman Zippel @ 2008-01-30 2:14 ` john stultz 2008-01-31 1:55 ` Roman Zippel 0 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2008-01-30 2:14 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Tue, 2008-01-29 at 05:02 +0100, Roman Zippel wrote: > Hi, > > On Mon, 28 Jan 2008, john stultz wrote: > > > Regardless, current_tick_length() really is the base interval we're > > using in the error accumulation loop, so it seems the cleanest interface > > to use (just to avoid redundancy at least) when establishing the > > clocksource's interval length. Or do you not agree? > > I see, what you need to use in timex.h for !CONFIG_NO_HZ is: > > #define NTP_INTERVAL_LENGTH ((s64)LATCH * NSEC_PER_SEC) / (s64)CLOCK_TICK_RATE) > > this calculates the base length of a clock tick in nsec. > > current_tick_length() would only work during boot. If you switch clocks > later, it would include random adjustments specific to the old clock. Ah. Good point. How about the following, tested on x86_64 both with and without CONFIG_NO_HZ? Thanks for the review and feedback! -john Correct NTP drift caused by using inconsistent NTP_INTERVAL_LENGTHs in clocksource initialization and error accumulation. This corrects a 280ppm drift seen on some systems using acpi_pm, and affects other clocksources as well (likely to a lesser degree). Signed-off-by: John Stultz <johnstul@us.ibm.com> Index: linux/include/linux/timex.h =================================================================== --- linux.orig/include/linux/timex.h +++ linux/include/linux/timex.h @@ -231,7 +231,13 @@ static inline int ntp_synced(void) #else #define NTP_INTERVAL_FREQ (HZ) #endif -#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ) + +#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE) +#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \ + (s64)CLOCK_TICK_RATE) + +/* Because using NSEC_PER_SEC would be too easy */ +#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ) /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */ extern u64 current_tick_length(void); Index: linux/kernel/time/ntp.c =================================================================== --- linux.orig/kernel/time/ntp.c +++ linux/kernel/time/ntp.c @@ -43,10 +43,6 @@ long time_freq; /* frequency offset ( static long time_reftime; /* time at last adjustment (s) */ long time_adjust; -#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE) -#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \ - (s64)CLOCK_TICK_RATE) - static void ntp_update_frequency(void) { u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-01-30 2:14 ` john stultz @ 2008-01-31 1:55 ` Roman Zippel 2008-01-31 2:16 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-01-31 1:55 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Tue, 29 Jan 2008, john stultz wrote: > +/* Because using NSEC_PER_SEC would be too easy */ > +#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ) Why are you using USER_HZ? Did you test this with HZ!=100? Anyway, please don't make more complicated than it already is. What I said previously about the update interval is still valid, so the correct solution is to use the simpler NTP_INTERVAL_LENGTH calculation from my last mail and to omit the correction for NO_HZ. bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-01-31 1:55 ` Roman Zippel @ 2008-01-31 2:16 ` john stultz 2008-01-31 5:02 ` Roman Zippel 0 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2008-01-31 2:16 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Thu, 2008-01-31 at 02:55 +0100, Roman Zippel wrote: > Hi, > > On Tue, 29 Jan 2008, john stultz wrote: > > > +/* Because using NSEC_PER_SEC would be too easy */ > > +#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ) > > Why are you using USER_HZ? Did you test this with HZ!=100? I'm using USER_HZ, because ntp_update_frequency() uses USER_HZ. And my tests were run at HZ=1000. > Anyway, please don't make more complicated than it already is. > What I said previously about the update interval is still valid, so the > correct solution is to use the simpler NTP_INTERVAL_LENGTH calculation > from my last mail and to omit the correction for NO_HZ. I'm not sure I follow how having two separate and totally different definitions of NTP_INTERVAL_LENGTH is less complicated then my last patch. Maybe I'm missing something from your suggestion? Or maybe could you contribute your suggestion as a patch? My concern is we state the accumulation interval is X ns long. Then current_tick_length() is to return (X + ntp_adjustment), so each accumulation interval we can keep track of the error and adjust our interval length. So if ntp_update_frequency() sets tick_length_base to be: u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << TICK_LENGTH_SHIFT; second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); tick_length_base = second_length; do_div(tick_length_base, NTP_INTERVAL_FREQ); The above is basically (X + part of ntp_adjustment) So I'm trying to calculate the X as: #define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ) thanks -john ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-01-31 2:16 ` john stultz @ 2008-01-31 5:02 ` Roman Zippel 2008-02-02 1:02 ` John Stultz 0 siblings, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-01-31 5:02 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Wed, 30 Jan 2008, john stultz wrote: > My concern is we state the accumulation interval is X ns long. Then > current_tick_length() is to return (X + ntp_adjustment), so each > accumulation interval we can keep track of the error and adjust our > interval length. > > So if ntp_update_frequency() sets tick_length_base to be: > > u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) > << TICK_LENGTH_SHIFT; > second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; > second_length += (s64)time_freq > << (TICK_LENGTH_SHIFT - SHIFT_NSEC); > > tick_length_base = second_length; > do_div(tick_length_base, NTP_INTERVAL_FREQ); > > > The above is basically (X + part of ntp_adjustment) CLOCK_TICK_ADJUST is based on LATCH and HZ, if the update frequency isn't based on HZ, there is no point in using it! Let's look at what actually needs to be done: 1. initializing clock interval: clock_cycle_interval = timer_cycle_interval * clock_frequency / timer_frequency It's simply about converting timer cycles into clock cycles, so they're about the same time interval. We already make it a bit more complicated than necessary as we go via nsec: ntp_interval = timer_cycle_interval * 10^9nsec / timer_frequency and in clocksource_calculate_interval() basically: clock_cycle_interval = ntp_interval * clock_frequency / 10^9nsec Without a fixed timer tick it's actually even easier, then we use the same frequency for clock and timer and the cycle interval is simply: clock_cycle_interval = timer_cycle_interval = clock_frequency / NTP_INTERVAL_FREQ There is no need to use the adjustment here, you'll only cause a mismatch between the clock and timer cycle interval, which had to be corrected by NTP. 2. initializing clock adjustment: clock_adjust = timer_cycle_interval * NTP_INTERVAL_FREQ / timer_frequency - 1sec This adjustment is used make up for the difference that the timer frequency isn't evenly divisible by HZ, so that the clock is advanced by 1sec after timer_frequency cycles. Like above the clock frequency is used for the timer frequency for this calculation for CONFIG_NO_HZ, so it would be incorrect to use CLOCK_TICK_RATE/LATCH/HZ here and since NTP_INTERVAL_FREQ is quite small the resulting adjustment would be rather small, it's easier not to bother in this case. What you're basically trying is to add an error to the clock initialization, so that we can later compensate for it. The correct solution is really to not add the error in first place, so that there is no need to compensate for it. bye. Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-01-31 5:02 ` Roman Zippel @ 2008-02-02 1:02 ` John Stultz 2008-02-08 17:33 ` Roman Zippel 0 siblings, 1 reply; 36+ messages in thread From: John Stultz @ 2008-02-02 1:02 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Thu, 2008-01-31 at 06:02 +0100, Roman Zippel wrote: > Hi, > > On Wed, 30 Jan 2008, john stultz wrote: > > > My concern is we state the accumulation interval is X ns long. Then > > current_tick_length() is to return (X + ntp_adjustment), so each > > accumulation interval we can keep track of the error and adjust our > > interval length. > > > > So if ntp_update_frequency() sets tick_length_base to be: > > > > u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) > > << TICK_LENGTH_SHIFT; > > second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; > > second_length += (s64)time_freq > > << (TICK_LENGTH_SHIFT - SHIFT_NSEC); > > > > tick_length_base = second_length; > > do_div(tick_length_base, NTP_INTERVAL_FREQ); > > > > > > The above is basically (X + part of ntp_adjustment) > > CLOCK_TICK_ADJUST is based on LATCH and HZ, if the update frequency isn't > based on HZ, there is no point in using it! Hey Roman, Again, I'm sorry I don't seem to be following your objections. If you want to suggest a different patch to fix the issue, it might help. The big issue for me, is that we have to initialize the clocksource cycle interval so it matches the base tick_length that NTP uses. To be clear, the issue I'm trying to address is only this: Assuming there is no NTP adjustment yet to be made, if we initialize the clocksource interval to X, then compare it with Y when we accumulate, we introduce error if X and Y are not the same. It really doesn't matter how long the length is, if we're including CLOCK_TICK_ADJUST, or if it really matches the actual HZ tick length or not. The issue is that we have to be consistent. If we're not, then we introduce error that ntpd has to additionally correct for. Now, once we're consistent, then CLOCK_TICK_ADJUST can be zero or not and it won't really matter, other then making sure we accumulate at exact tick length units. You can set it either way with my patch and things will still work out to be correct. My apologies for being so thick headed if I'm just missing something. I do appreciate the feedback. -john ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-02 1:02 ` John Stultz @ 2008-02-08 17:33 ` Roman Zippel 2008-02-09 2:17 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-02-08 17:33 UTC (permalink / raw) To: John Stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Fri, 1 Feb 2008, John Stultz wrote: > > CLOCK_TICK_ADJUST is based on LATCH and HZ, if the update frequency isn't > > based on HZ, there is no point in using it! > > Hey Roman, > > Again, I'm sorry I don't seem to be following your objections. If you > want to suggest a different patch to fix the issue, it might help. I already gave you the necessary details for how to set NTP_INTERVAL_LENGTH and in the previous mail I explained the basis for it. I really don't understand what's your problem with it. Why do you try to make it more complex than necessary? > The big issue for me, is that we have to initialize the clocksource > cycle interval so it matches the base tick_length that NTP uses. > > To be clear, the issue I'm trying to address is only this: > Assuming there is no NTP adjustment yet to be made, if we initialize the > clocksource interval to X, then compare it with Y when we accumulate, we > introduce error if X and Y are not the same. > > It really doesn't matter how long the length is, if we're including > CLOCK_TICK_ADJUST, or if it really matches the actual HZ tick length or > not. The issue is that we have to be consistent. If we're not, then we > introduce error that ntpd has to additionally correct for. You don't create consistency by adding corrections all over the place until it adds up to the right sum. The current correction is already somewhat of a hack and I'd rather get rid of it than to let it spread all over the place (it's really only needed so that people with weird HZ settings don't hit the 500ppm limit and we're basically cheating to the ntpd by not telling it the real frequency). Please keep the knowledge about this crutch at a single place and don't spread it. Anyway, for NO_HZ this correction is completely irrelevant, so again there's no point in adding random values all over the place until you get the correct result. The only other alternative would be to calculate this correction dynamically. For this you leave NTP_INTERVAL_LENGTH as is and when changing clocks you check whether "abs(((cs->xtime_interval * NTP_INTERVAL_FREQ) >> cs->shift) - NSEC_PER_SEC)" exceeds a certain limit (e.g. 200usec) and in this case you print a warning message, that the clock has large base drift value and is a bad ntp source and apply a correction value. This way the correction only hits the very few system which might need it and it would be the prefered solution, but it also requires a few more changes. bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-08 17:33 ` Roman Zippel @ 2008-02-09 2:17 ` john stultz 2008-02-09 4:47 ` Roman Zippel 2008-02-10 18:45 ` Roman Zippel 0 siblings, 2 replies; 36+ messages in thread From: john stultz @ 2008-02-09 2:17 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Fri, 2008-02-08 at 18:33 +0100, Roman Zippel wrote: > Hi, > > On Fri, 1 Feb 2008, John Stultz wrote: > > > > CLOCK_TICK_ADJUST is based on LATCH and HZ, if the update frequency isn't > > > based on HZ, there is no point in using it! > > > > Hey Roman, > > > > Again, I'm sorry I don't seem to be following your objections. If you > > want to suggest a different patch to fix the issue, it might help. > > I already gave you the necessary details for how to set > NTP_INTERVAL_LENGTH and in the previous mail I explained the basis for it. -ENOPATCH We're taking weeks to critique fairly small bug fix. I'm sure we both have better things to do then continue to misunderstand each other. I'll do the testing and will happily ack it if it resolves the issue. > I really don't understand what's your problem with it. Why do you try to > make it more complex than necessary? Again, I profusely apologize for not understanding your suggestions. I do appreciate your feedback and I really respect your insistence on getting this right, but I do not see exactly how your comments connect to resolving the bug I'm trying to fix. > > The big issue for me, is that we have to initialize the clocksource > > cycle interval so it matches the base tick_length that NTP uses. > > > > To be clear, the issue I'm trying to address is only this: > > Assuming there is no NTP adjustment yet to be made, if we initialize the > > clocksource interval to X, then compare it with Y when we accumulate, we > > introduce error if X and Y are not the same. > > > > It really doesn't matter how long the length is, if we're including > > CLOCK_TICK_ADJUST, or if it really matches the actual HZ tick length or > > not. The issue is that we have to be consistent. If we're not, then we > > introduce error that ntpd has to additionally correct for. > > You don't create consistency by adding corrections all over the place > until it adds up to the right sum. Uh, I'd argue that you create consistency by using the same method on both sides of the equation. That's what I'm trying to do. Now, If you're disputing that I'm correcting the wrong side of the equation, then we're getting somewhere. But its still not clear to me how you're suggesting the other side (which is calculated in ntp_update_frequency) be changed. > The current correction is already somewhat of a hack and I'd rather get > rid of it than to let it spread all over the place (it's really only > needed so that people with weird HZ settings don't hit the 500ppm limit > and we're basically cheating to the ntpd by not telling it the real > frequency). Please keep the knowledge about this crutch at a single place > and don't spread it. > Anyway, for NO_HZ this correction is completely irrelevant, so again > there's no point in adding random values all over the place until you get > the correct result. You keep on bringing up NO_HZ, and again, the bug I'm trying to fix occurs with or without NO_HZ. The fix I proposed resolves the issue with or without NO_HZ. > The only other alternative would be to calculate this correction > dynamically. For this you leave NTP_INTERVAL_LENGTH as is and when > changing clocks you check whether "abs(((cs->xtime_interval * > NTP_INTERVAL_FREQ) >> cs->shift) - NSEC_PER_SEC)" exceeds a certain limit > (e.g. 200usec) and in this case you print a warning message, that the > clock has large base drift value and is a bad ntp source and apply a > correction value. This way the correction only hits the very few system > which might need it and it would be the prefered solution, but it also > requires a few more changes. Uh, that seems to be just checking if the xtime_interval is off base, or if the ntp correction has gone too far. I just don't see how this connects to the issue at hand. Just so we're current, I've refined the patch further and included it below. It now addresses error caused by the intervals of adjusted clocksource being recalculated on clocksource changes (and thus including the adjustment in the base interval). thanks -john Fix time skew caused by inconsistent use of NTP_INTERVAL_LENGTH and current_tick_length(). Patch has three components: 1) Reverts the first version of this patch that made it upstream (while not perfect, its still avoids the issue for most users). 2) Changes NTP_INTERVAL_LENGTH to be consistent with the calculations made in ntp_update_frequency(). 3) Splits the clocksource mult value into two parts: the original mult component and the ntp adjusted mult_adj component. This allows us to correctly recalculate the base xtime_intervals if users switch back and forth between adjusted clocksources. Signed-off-by: John Stultz <johnstul@us.ibm.com> diff --git a/arch/ia64/kernel/time.c b/arch/ia64/kernel/time.c index 3ab0427..07cacc9 100644 --- a/arch/ia64/kernel/time.c +++ b/arch/ia64/kernel/time.c @@ -357,7 +357,7 @@ void update_vsyscall(struct timespec *wall, struct clocksource *c) /* copy fsyscall clock data */ fsyscall_gtod_data.clk_mask = c->mask; - fsyscall_gtod_data.clk_mult = c->mult; + fsyscall_gtod_data.clk_mult = c->mult + c->mult_adj; fsyscall_gtod_data.clk_shift = c->shift; fsyscall_gtod_data.clk_fsys_mmio = c->fsys_mmio; fsyscall_gtod_data.clk_cycle_last = c->cycle_last; diff --git a/arch/powerpc/kernel/time.c b/arch/powerpc/kernel/time.c index 3b26fbd..36aa8da 100644 --- a/arch/powerpc/kernel/time.c +++ b/arch/powerpc/kernel/time.c @@ -814,7 +814,7 @@ void update_vsyscall(struct timespec *wall_time, struct clocksource *clock) /* XXX this assumes clock->shift == 22 */ /* 4611686018 ~= 2^(20+64-22) / 1e9 */ - t2x = (u64) clock->mult * 4611686018ULL; + t2x = (u64) (clock->mult + clock->mult_adj) * 4611686018ULL; stamp_xsec = (u64) xtime.tv_nsec * XSEC_PER_SEC; do_div(stamp_xsec, 1000000000); stamp_xsec += (u64) xtime.tv_sec * XSEC_PER_SEC; diff --git a/arch/x86/kernel/vsyscall_64.c b/arch/x86/kernel/vsyscall_64.c index 3f82427..14afd48 100644 --- a/arch/x86/kernel/vsyscall_64.c +++ b/arch/x86/kernel/vsyscall_64.c @@ -83,7 +83,7 @@ void update_vsyscall(struct timespec *wall_time, struct clocksource *clock) vsyscall_gtod_data.clock.vread = clock->vread; vsyscall_gtod_data.clock.cycle_last = clock->cycle_last; vsyscall_gtod_data.clock.mask = clock->mask; - vsyscall_gtod_data.clock.mult = clock->mult; + vsyscall_gtod_data.clock.mult = clock->mult + clock->mult_adj; vsyscall_gtod_data.clock.shift = clock->shift; vsyscall_gtod_data.wall_time_sec = wall_time->tv_sec; vsyscall_gtod_data.wall_time_nsec = wall_time->tv_nsec; diff --git a/include/linux/clocksource.h b/include/linux/clocksource.h index 85778a4..6719e50 100644 --- a/include/linux/clocksource.h +++ b/include/linux/clocksource.h @@ -63,6 +63,7 @@ struct clocksource { cycle_t (*read)(void); cycle_t mask; u32 mult; + s32 mult_adj; u32 shift; unsigned long flags; cycle_t (*vread)(void); @@ -179,7 +180,7 @@ static inline cycle_t clocksource_read(struct clocksource *cs) static inline s64 cyc2ns(struct clocksource *cs, cycle_t cycles) { u64 ret = (u64)cycles; - ret = (ret * cs->mult) >> cs->shift; + ret = (ret * (cs->mult + cs->mult_adj)) >> cs->shift; return ret; } diff --git a/include/linux/timex.h b/include/linux/timex.h index 8ea3e71..66ae8dd 100644 --- a/include/linux/timex.h +++ b/include/linux/timex.h @@ -232,7 +232,13 @@ static inline int ntp_synced(void) #else #define NTP_INTERVAL_FREQ (HZ) #endif -#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ) + +#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE) +#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \ + (s64)CLOCK_TICK_RATE) + +/* Becaues using NSEC_PER_SEC would be too easy */ +#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC*NSEC_PER_USEC*USER_HZ)+CLOCK_TICK_ADJUST)/NTP_INTERVAL_FREQ) /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */ extern u64 current_tick_length(void); diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index e64efaf..c88b591 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -43,10 +43,6 @@ long time_freq; /* frequency offset (scaled ppm)*/ static long time_reftime; /* time at last adjustment (s) */ long time_adjust; -#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE) -#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \ - (s64)CLOCK_TICK_RATE) - static void ntp_update_frequency(void) { u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index cd5dbc4..e319ecc 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -187,8 +187,7 @@ static void change_clocksource(void) clock->error = 0; clock->xtime_nsec = 0; - clocksource_calculate_interval(clock, - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); tick_clock_notify(); @@ -245,8 +244,7 @@ void __init timekeeping_init(void) ntp_clear(); clock = clocksource_get_next(); - clocksource_calculate_interval(clock, - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); clock->cycle_last = clocksource_read(clock); xtime.tv_sec = sec; @@ -426,7 +424,7 @@ static void clocksource_adjust(s64 offset) } else return; - clock->mult += adj; + clock->mult_adj += adj; clock->xtime_interval += interval; clock->xtime_nsec -= offset; clock->error -= (interval - offset) << ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-09 2:17 ` john stultz @ 2008-02-09 4:47 ` Roman Zippel 2008-02-09 5:04 ` Andrew Morton 2008-02-10 18:45 ` Roman Zippel 1 sibling, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-02-09 4:47 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Fri, 8 Feb 2008, john stultz wrote: > > clock = clocksource_get_next(); > - clocksource_calculate_interval(clock, > - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); > + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); > clock->cycle_last = clocksource_read(clock); > Only now I noticed that the first patch had been merged without any further question. :-( What point is there in reviewing patches, if everything is merged anyway. :-( bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-09 4:47 ` Roman Zippel @ 2008-02-09 5:04 ` Andrew Morton 2008-02-09 14:13 ` Roman Zippel 0 siblings, 1 reply; 36+ messages in thread From: Andrew Morton @ 2008-02-09 5:04 UTC (permalink / raw) To: Roman Zippel; +Cc: john stultz, lkml, Ingo Molnar, Steven Rostedt On Sat, 9 Feb 2008 05:47:18 +0100 (CET) Roman Zippel <zippel@linux-m68k.org> wrote: > Hi, > > On Fri, 8 Feb 2008, john stultz wrote: > > > > > clock = clocksource_get_next(); > > - clocksource_calculate_interval(clock, > > - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); > > + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); > > clock->cycle_last = clocksource_read(clock); > > > > Only now I noticed that the first patch had been merged without any > further question. :-( > What point is there in reviewing patches, if everything is merged anyway. :-( > oops, mistake, sorry. There's plenty of time to fix it though. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-09 5:04 ` Andrew Morton @ 2008-02-09 14:13 ` Roman Zippel 0 siblings, 0 replies; 36+ messages in thread From: Roman Zippel @ 2008-02-09 14:13 UTC (permalink / raw) To: Andrew Morton; +Cc: john stultz, lkml, Ingo Molnar, Steven Rostedt Hi, On Fri, 8 Feb 2008, Andrew Morton wrote: > > Only now I noticed that the first patch had been merged without any > > further question. :-( > > What point is there in reviewing patches, if everything is merged anyway. :-( > > > > oops, mistake, sorry. There's plenty of time to fix it though. It has been signed off by both Ingo and Thomas and neither noticed anything? This makes me very afraid of the merging process... bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-09 2:17 ` john stultz 2008-02-09 4:47 ` Roman Zippel @ 2008-02-10 18:45 ` Roman Zippel 2008-02-12 0:09 ` john stultz 1 sibling, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-02-10 18:45 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Fri, 8 Feb 2008, john stultz wrote: > -ENOPATCH > > We're taking weeks to critique fairly small bug fix. I'm sure we both > have better things to do then continue to misunderstand each other. I'll > do the testing and will happily ack it if it resolves the issue. I don't want to just send a patch, I want you to understand why your approach is wrong. > Now, If you're disputing that I'm correcting the wrong side of the > equation, then we're getting somewhere. But its still not clear to me > how you're suggesting the other side (which is calculated in > ntp_update_frequency) be changed. > [..] > You keep on bringing up NO_HZ, and again, the bug I'm trying to fix > occurs with or without NO_HZ. The fix I proposed resolves the issue with > or without NO_HZ. The correction is incorrect for NO_HZ. Let's try it the other way around, as my explanation seem to lack something. Please try to explain what this correction actually means and why it should be correct for NO_HZ as well. > > The only other alternative would be to calculate this correction > > dynamically. For this you leave NTP_INTERVAL_LENGTH as is and when > > changing clocks you check whether "abs(((cs->xtime_interval * > > NTP_INTERVAL_FREQ) >> cs->shift) - NSEC_PER_SEC)" exceeds a certain limit > > (e.g. 200usec) and in this case you print a warning message, that the > > clock has large base drift value and is a bad ntp source and apply a > > correction value. This way the correction only hits the very few system > > which might need it and it would be the prefered solution, but it also > > requires a few more changes. > > Uh, that seems to be just checking if the xtime_interval is off base, or > if the ntp correction has gone too far. I just don't see how this > connects to the issue at hand. Above is the key to understanding the problem, if this difference is small enough there is no need to correct anything. This is the original patch which introduced the correction: http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=69c1cd9218e4cf3016b0f435d6ef3dffb5a53860 Keep in mind that at that time PIT was used as the base clock (even if the tsc was used, it was relative to PIT). So how much of those assumptions are still valid today (especially with NO_HZ enabled)? bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-10 18:45 ` Roman Zippel @ 2008-02-12 0:09 ` john stultz 2008-02-12 14:36 ` Roman Zippel 0 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2008-02-12 0:09 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Sun, 2008-02-10 at 19:45 +0100, Roman Zippel wrote: > Hi, > > On Fri, 8 Feb 2008, john stultz wrote: > > > -ENOPATCH > > > > We're taking weeks to critique fairly small bug fix. I'm sure we both > > have better things to do then continue to misunderstand each other. I'll > > do the testing and will happily ack it if it resolves the issue. > > I don't want to just send a patch, I want you to understand why your > approach is wrong. With all due respect, it also keeps the critique in one direction and makes your review less collaborative and more confrontational then I suspect (or maybe just hope) you intend. > > Now, If you're disputing that I'm correcting the wrong side of the > > equation, then we're getting somewhere. But its still not clear to me > > how you're suggesting the other side (which is calculated in > > ntp_update_frequency) be changed. > > [..] > > You keep on bringing up NO_HZ, and again, the bug I'm trying to fix > > occurs with or without NO_HZ. The fix I proposed resolves the issue with > > or without NO_HZ. > > The correction is incorrect for NO_HZ. > Let's try it the other way around, as my explanation seem to lack > something. > Please try to explain what this correction actually means and why it > should be correct for NO_HZ as well. For the course of this discussion, lets assume we're talking about 2.6.24. One of the goals of the timekeeping design is to let it be flexible enough that accumulation interval can be irregular and it will still behave correctly. This stems from the one of original issues with the old tick based timekeeping: lost or irregular timer interrupts. Now, we do used fixed interval accumulation in update_wall_time, but that was for two reasons: 1) In the common case, its faster to do the compare and add/subtract then the mult orignally used (*a) 2) It allows for the extra fine-grained NTP error accounting. This fine grained error accounting is where the bug I'm trying to address is cropping up from. In order to have the comparison we need to have two values: A: The clocksource's notion of how long the fixed interval is. B: NTP's notion of how long the fixed interval is. When no NTP adjustment is being made, these two values should be equal, but currently they are not. This is what causes the 280ppm error seen on my test system. Part A is calculated in the following fashion: #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ) Which is then functionally shifted up by TICK_LENGTH_SHIFT, but for the course of this discussion, lets ignore that. Part B is calculated in ntp_update_frequency() as: u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << TICK_LENGTH_SHIFT; second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); tick_length_base = second_length; do_div(tick_length_base, NTP_INTERVAL_FREQ); If we're assuming there is no NTP adjustment, and avoiding the TICK_LENGTH_SHIFT, this can be shorted to: B = ((TICK_USEC * NSEC_PER_USEC * USER_HZ) + CLOCK_TICK_ADJUST)/NTP_ITNERVAL_FREQ The A vs B comparison can be shortened further to: NSEC_PER_SEC != (TICK_USEC * NSEC_PER_USEC * USER_HZ) + CLOCK_TICK_ADJUST So now on to what seems to be the point of contention: If A != B, which side do we fix? My patches fix the A side so that it matches B, which on its face isn't terribly complicated, but you seem to be suggesting we fix the B side instead (Now I'm assuming here, because there's no patch. So I can only speak to your emails, which were not clear to me). Really it doesn't matter to me, as long as it works in all the cases needed. You're argument of simplifying B so its closer to A does sound initially compelling. Simpler is better, so lets follow the discussion below to see if it will work, or if I'm just missing something. *a: Note: this does backfire on us if the fixed interval is too small, causing the loop to run too many times, which was seen when NO_HZ was being merged. This was why we changed the NTP_INTERVAL_FREQ to be about half a second w/ NO_HZ, which reduced the time spent in the accumulation loop. > This is the original patch which introduced the correction: > > http://git.kernel.org/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commitdiff;h=69c1cd9218e4cf3016b0f435d6ef3dffb5a53860 > > Keep in mind that at that time PIT was used as the base clock (even if the > tsc was used, it was relative to PIT). So how much of those assumptions > are still valid today (especially with NO_HZ enabled)? Indeed! When we use the TSC, acpi_pm, HPET or other fine grained clocksource (regardless of NO_HZ), it does seem silly to take these odd PIT based values into account. However, what happens if a system uses the PIT or jiffies clocksource, which do suffer from the HZ / ACTHZ error resolved by the patch linked to above? Since those clocksources are so course, we still need to take that error into account. So in that regard the assumptions are still valid, no? You've seemingly suggested we set CLOCK_TICK_ADJUST to zero in the CONFIG_NO_HZ situation, however note that a CONFIG_NO_HZ kernel should be able to boot and run on a box that only supports the jiffies or pit clocksource (just not be able to switching into NO_HZ mode). Further, how does that resolve the issue w/ the tsc/hpet/acpi_pm clocksources that still exist when CONFIG_NO_HZ=n? I just don't see how this will work. Now, don't get me wrong, I'd love it if we didn't have to include CLOCK_TICK_ADJUST, simpler is better, but I don't see how it will work correctly. So instead of trying to fix part B by removing the CLOCK_TICK_ADJUST component, lets look at my solution, which adds the pit based corrections to part A. ***This is the key part***: Since the hpet/acpi_pm/tsc are high res clocksources, they can easily adapt to the request for a slightly odd interval length, and it will not affect them. They are flexible like that. AGAIN: For high res clocksources, the interval length requested can be almost(see note *a above, as well as hardware wrapping which puts some limitations on us) arbitrary and the code should function correctly. So CLOCK_TICK_ADJUST doesn't really mean anything in relation to hpet/acpi_pm/tsc, but it also doesn't hurt us to include it. But if, when using the same kernel, we switch to the jiffies clocksource, we need to request a interval length that matches the actual jiffy/pit tick length. This is why CLOCK_TICK_ADJUST is still needed. So by modifying part A, we achieve the consistency needed (that is A==B), and we still function well with low-res clocksources. Further we don't need any config time decisions to be made to function correctly. It just all works out. Now, I'm sure its quite likely I've somehow not understood your suggestions. Maybe you are not suggesting adding lots of compile time logic to try to even the B side out. Again, a patch would likely clarify all of this. thanks -john ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-12 0:09 ` john stultz @ 2008-02-12 14:36 ` Roman Zippel 2008-02-14 4:36 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-02-12 14:36 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Mon, 11 Feb 2008, john stultz wrote: > > I don't want to just send a patch, I want you to understand why your > > approach is wrong. > > With all due respect, it also keeps the critique in one direction and > makes your review less collaborative and more confrontational then I > suspect (or maybe just hope) you intend. I don't think that's necessarily a contradiction, if we keep it to confronting the problem. A simple patch wouldn't have provided any further understanding of the problem compared to what I already said. You would have seen what the patch does (which I described already differently), but not really why it does that. In this sense I prefer to force the confrontation of the problem. I'm afraid a working patch would encourage to simply ignore the problem, as your problem at hand would be solved without completely understanding it. The point is that I'd really like you to understand the problem, so I'm not the only one who understands this code :) and in the end it might allow better collaboration to further improve this code. To make it very clear this is just about understanding the problem, I don't want to force a specific solution (which a patch would practically do). If we both understand the problem, we can also discuss the solution and maybe we find something better, but maybe I'm also totally wrong, which would be a little embarrassing :), but that would be fine too. There may be better ways to go about this problem, but IMO it would still be better than just ignoring the problem and force it with a patch. > This fine grained error accounting is where the bug I'm trying to > address is cropping up from. In order to have the comparison we need to > have two values: > A: The clocksource's notion of how long the fixed interval is. > B: NTP's notion of how long the fixed interval is. > > When no NTP adjustment is being made, these two values should be equal, > but currently they are not. This is what causes the 280ppm error seen on > my test system. > > Part A is calculated in the following fashion: > #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ) > > Which is then functionally shifted up by TICK_LENGTH_SHIFT, but for the > course of this discussion, lets ignore that. > > Part B is calculated in ntp_update_frequency() as: > > u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) > << TICK_LENGTH_SHIFT; > second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; > second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); > > tick_length_base = second_length; > do_div(tick_length_base, NTP_INTERVAL_FREQ); > > > If we're assuming there is no NTP adjustment, and avoiding the > TICK_LENGTH_SHIFT, this can be shorted to: > B = ((TICK_USEC * NSEC_PER_USEC * USER_HZ) > + CLOCK_TICK_ADJUST)/NTP_ITNERVAL_FREQ > > > The A vs B comparison can be shortened further to: > NSEC_PER_SEC != (TICK_USEC * NSEC_PER_USEC * USER_HZ) > + CLOCK_TICK_ADJUST > > So now on to what seems to be the point of contention: > If A != B, which side do we fix? > > > My patches fix the A side so that it matches B, which on its face isn't > terribly complicated, but you seem to be suggesting we fix the B side > instead (Now I'm assuming here, because there's no patch. So I can only > speak to your emails, which were not clear to me). If we go from your base assumption above "there is no NTP adjustment", I would actually agree with you and it wouldn't matter much on which side to correct the equation. The question is now what happens, if there are NTP adjustments, i.e. when the time_freq value isn't zero. Based on this initialization we tell the NTP daemon the base frequency, although not directly but it knows the length freq1 == 1 sec. If the clock now needs adjustment, the NTP daemon tells the kernel via time_freq how to change the speed so that freq1 == 1 sec + time_freq. The problem is now that by using CLOCK_TICK_ADJUST we are cheating and we don't tell the NTP daemon the real frequency. We define 1 sec as freq2 + tick_adjust and with the NTP adjustment we have freq2 + tick_adj == 1 sec + time_freq. Above initialization now calcalutes the base time length for an update interval of freq2 / NTP_INTERVAL_FREQ, this means the requested time_freq adjustment isn't applied to (freq2 + tick_adj) cycles but to freq2 cycles, so this finally means any adjustment made by the NTP daemon is slightly off. To demonstrate this let's look at some real values and let's use the PIT for it (as this is where this originated and on which CLOCK_TICK_ADJUST is based on). With freq1=1193182 and HZ=1000 we program the timer with 1193 cycles and the actual update frequency is freq2=1193000. To adjust for this difference we change the length of a timer tick: (NSEC_PER_SEC + CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ or (10^9 nsec - 152533 nsec) / 1000 This way every 1000 interrupts or 1193000 cycles the clock is advanced by 999847467 nsec, so that we advance time according to freq1, but any further adjustments aren't applied to an interval of what the NTP daemon thinks is 1 sec but 999847467 nsec instead. In consequence this means, if we want to improve timekeeping, we first set the (update_cycles * NTP_INTERVAL_FREQ) interval as close as possible to the real frequency. It doesn't has to be perfect as we usually don't know the real frequency with 100% certainty anyway. Second, we drop the tick adjustment if possible and leave the adjustments to the NTP daemon and as long as the drift is within the 500ppm limit it has no problem to manage this. > However, what happens if a system uses the PIT or jiffies clocksource, > which do suffer from the HZ / ACTHZ error resolved by the patch linked > to above? Since those clocksources are so course, we still need to take > that error into account. So in that regard the assumptions are still > valid, no? > > You've seemingly suggested we set CLOCK_TICK_ADJUST to zero in the > CONFIG_NO_HZ situation, however note that a CONFIG_NO_HZ kernel should > be able to boot and run on a box that only supports the jiffies or pit > clocksource (just not be able to switching into NO_HZ mode). Further, > how does that resolve the issue w/ the tsc/hpet/acpi_pm clocksources > that still exist when CONFIG_NO_HZ=n? > > I just don't see how this will work. Look at your Part A, with NO_HZ the update interval is much larger, so any rounding error becomes practically irrelevant. In the above PIT example the update interval wouldn't be 1193 cycles but 596591 cycles instead. I hope this helps to understand what I suggested. For NO_HZ we can just drop this correction without problems. In the other case it's a little more complicated, but we basically have to check the update interval and if (interval * NTP_INTERVAL_FREQ) is within the 500ppm limit there isn't much we really have to do. This correction was only introduced to accommodate clocks which exceed this 500ppm limit for whatever reason, so the NTP daemon isn't confused. The best solution would be really to limit this correction to these clocks. bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-12 14:36 ` Roman Zippel @ 2008-02-14 4:36 ` john stultz 2008-02-16 4:24 ` Roman Zippel 0 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2008-02-14 4:36 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt [-- Attachment #1: Type: text/plain, Size: 12352 bytes --] On Tue, 2008-02-12 at 15:36 +0100, Roman Zippel wrote: > On Mon, 11 Feb 2008, john stultz wrote: > > This fine grained error accounting is where the bug I'm trying to > > address is cropping up from. In order to have the comparison we need to > > have two values: > > A: The clocksource's notion of how long the fixed interval is. > > B: NTP's notion of how long the fixed interval is. > > > > When no NTP adjustment is being made, these two values should be equal, > > but currently they are not. This is what causes the 280ppm error seen on > > my test system. > > > > Part A is calculated in the following fashion: > > #define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ) > > > > Which is then functionally shifted up by TICK_LENGTH_SHIFT, but for the > > course of this discussion, lets ignore that. > > > > Part B is calculated in ntp_update_frequency() as: > > > > u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) > > << TICK_LENGTH_SHIFT; > > second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; > > second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); > > > > tick_length_base = second_length; > > do_div(tick_length_base, NTP_INTERVAL_FREQ); > > > > > > If we're assuming there is no NTP adjustment, and avoiding the > > TICK_LENGTH_SHIFT, this can be shorted to: > > B = ((TICK_USEC * NSEC_PER_USEC * USER_HZ) > > + CLOCK_TICK_ADJUST)/NTP_ITNERVAL_FREQ > > > > > > The A vs B comparison can be shortened further to: > > NSEC_PER_SEC != (TICK_USEC * NSEC_PER_USEC * USER_HZ) > > + CLOCK_TICK_ADJUST > > > > So now on to what seems to be the point of contention: > > If A != B, which side do we fix? > > > > > > My patches fix the A side so that it matches B, which on its face isn't > > terribly complicated, but you seem to be suggesting we fix the B side > > instead (Now I'm assuming here, because there's no patch. So I can only > > speak to your emails, which were not clear to me). > > If we go from your base assumption above "there is no NTP adjustment", I > would actually agree with you and it wouldn't matter much on which side > to correct the equation. > > The question is now what happens, if there are NTP adjustments, i.e. when > the time_freq value isn't zero. Based on this initialization we tell the > NTP daemon the base frequency, although not directly but it knows the > length freq1 == 1 sec. If the clock now needs adjustment, the NTP daemon > tells the kernel via time_freq how to change the speed so that freq1 == > 1 sec + time_freq. > > The problem is now that by using CLOCK_TICK_ADJUST we are cheating and we > don't tell the NTP daemon the real frequency. We define 1 sec as freq2 + > tick_adjust and with the NTP adjustment we have freq2 + tick_adj == 1 sec > + time_freq. Above initialization now calcalutes the base time length for > an update interval of freq2 / NTP_INTERVAL_FREQ, this means the requested > time_freq adjustment isn't applied to (freq2 + tick_adj) cycles but to > freq2 cycles, so this finally means any adjustment made by the NTP daemon > is slightly off. Oh! So your issue is that since time_freq is stored in ppm, or in effect a usecs per sec offset, when we add it to something other then 1 second we mis-apply what NTP tells us to. Is that right? > To demonstrate this let's look at some real values and let's use the PIT > for it (as this is where this originated and on which CLOCK_TICK_ADJUST is > based on). With freq1=1193182 and HZ=1000 we program the timer with 1193 > cycles and the actual update frequency is freq2=1193000. To adjust for > this difference we change the length of a timer tick: > > (NSEC_PER_SEC + CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ > > or > > (10^9 nsec - 152533 nsec) / 1000 > > This way every 1000 interrupts or 1193000 cycles the clock is advanced by > 999847467 nsec, so that we advance time according to freq1, but any > further adjustments aren't applied to an interval of what the NTP daemon > thinks is 1 sec but 999847467 nsec instead. Right, so if NTP has us apply a 10ppm adjustment, instead of doing: NSEC_PER_SEC + 10,000 We're doing: NSEC_PER_SEC + CLOCK_TICK_ADJUST + 10,000 Which, if I'm doing my math right, results in a 10.002ppm adjustment (using the 999847467ns number above), instead of just a 10ppm adjustment. Now, true, this is an error, but it is a pretty small one. Even at the maximum 500ppm value, it only results in an error of 76 parts per billion. As you'll see below, that tends to be less error then what we get from the clock granularity. Is there something else I'm missing here or is this really the core issue you're concerned with? If not, I'll agree that we may need to tune the "B" side of the equation, but can we also agree that the very large error caused, even without NTP adjustments, being involved by the A != B can be addressed separately. That way if A is calculated in the same way as B, when we fix B, we will at the same time fix A. Does that sound like a step forward? > In consequence this means, if we want to improve timekeeping, we first set > the (update_cycles * NTP_INTERVAL_FREQ) interval as close as possible to > the real frequency. It doesn't has to be perfect as we usually don't know > the real frequency with 100% certainty anyway. This might need some more explanation, as I'm not certain I know what update_cycles refers to. Do you mean cycle_interval? I guess I'm not completely sure how you're suggesting we change things here. > Second, we drop the tick > adjustment if possible and leave the adjustments to the NTP daemon and as > long as the drift is within the 500ppm limit it has no problem to manage > this. Dropping the tick adjustment? By that do you mean the tick_usec value set by adjtimex()? I don't quite see why we want that. Could you expand here? > > However, what happens if a system uses the PIT or jiffies clocksource, > > which do suffer from the HZ / ACTHZ error resolved by the patch linked > > to above? Since those clocksources are so course, we still need to take > > that error into account. So in that regard the assumptions are still > > valid, no? > > > > You've seemingly suggested we set CLOCK_TICK_ADJUST to zero in the > > CONFIG_NO_HZ situation, however note that a CONFIG_NO_HZ kernel should > > be able to boot and run on a box that only supports the jiffies or pit > > clocksource (just not be able to switching into NO_HZ mode). Further, > > how does that resolve the issue w/ the tsc/hpet/acpi_pm clocksources > > that still exist when CONFIG_NO_HZ=n? > > > > I just don't see how this will work. > > Look at your Part A, with NO_HZ the update interval is much larger, so any > rounding error becomes practically irrelevant. In the above PIT example > the update interval wouldn't be 1193 cycles but 596591 cycles instead. Ok, so sitting down and doing the math, I see for the pit clocksource your point is correct. For relatively fine grained clocksources (the pit having ~900ns resolution) the larger the cycle_interval gets, the smaller the error becomes. So when the NTP_INTERVAL_FREQ is 2 instead of HZ, the error does get quite small, even if we don't add in the CLOCK_TICK_ADJUST factor. I think here's where the contention has been, as I've been using the jiffies clocksource in my head for the previous back-and-forths. The jiffies clocksource is different, as due to its coarseness, the normal cycles_interval is only 1. So dropping NTP_INTERVAL_FREQ from HZ to 2 doesn't improve the error, it basically stays constant. I sat down and ran through some numbers, seeing how the error rates changed as HZ changed, with and without CLOCK_TICK_ADJUST and how NO_HZ effects it as well. The .c file is attached, but here is the output for jiffies, pit, and acpi_pm: HZ=1000 CLOCK_TICK_ADJUST=-152533 jiffies 467 ppb error jiffies NOHZ 467 ppb error pit 0 ppb error pit NOHZ 0 ppb error acpi_pm -280 ppb error acpi_pm NOHZ 279 ppb error HZ=100 CLOCK_TICK_ADJUST=15085 jiffies 15085 ppb error jiffies NOHZ 15085 ppb error pit -1 ppb error pit NOHZ -1 ppb error acpi_pm -281 ppb error acpi_pm NOHZ 278 ppb error HZ=250 CLOCK_TICK_ADJUST=56990 jiffies -5510 ppb error jiffies NOHZ -5510 ppb error pit 0 ppb error pit NOHZ 0 ppb error acpi_pm -281 ppb error acpi_pm NOHZ 278 ppb error HZ=300 CLOCK_TICK_ADJUST=-68723 jiffies -3523 ppb error jiffies NOHZ -3523 ppb error pit 1 ppb error pit NOHZ 1 ppb error acpi_pm -279 ppb error acpi_pm NOHZ 279 ppb error HZ=1000 CLOCK_TICK_ADJUST=0 jiffies 153000 ppb error jiffies NOHZ 153000 ppb error pit 152533 ppb error pit NOHZ 0 ppb error acpi_pm -127112 ppb error acpi_pm NOHZ 279 ppb error HZ=100 CLOCK_TICK_ADJUST=0 jiffies 0 ppb error jiffies NOHZ 0 ppb error pit -15086 ppb error pit NOHZ 0 ppb error acpi_pm 12571 ppb error acpi_pm NOHZ 279 ppb error HZ=250 CLOCK_TICK_ADJUST=0 jiffies -62500 ppb error jiffies NOHZ -62500 ppb error pit -56990 ppb error pit NOHZ 0 ppb error acpi_pm 12571 ppb error acpi_pm NOHZ 279 ppb error HZ=300 CLOCK_TICK_ADJUST=0 jiffies 65200 ppb error jiffies NOHZ 65200 ppb error pit 68724 ppb error pit NOHZ 0 ppb error acpi_pm -15366 ppb error acpi_pm NOHZ 279 ppb error So you are right, w/ pit & NO_HZ, the granularity error is always very small both with or without CLOCK_TICK_ADJUST. However, without CLOCK_TICK_ADJUST, the jiffies error increases for all values of HZ except 100 (which at first seems odd, but seems to be due to loss from rounding in the ACTHZ calculation). One interesting surprise in the data: With CLOCK_TICK_ADJUST=0, the acpi_pm's error frequency shot up in the !NO_HZ cases. This ends up being due to the acpi_pm being a very close to a multiple (3x) of the pit frequency, so CLOCK_TICK_ADJUST helps it as well. > I hope this helps to understand what I suggested. For NO_HZ we can just > drop this correction without problems. In the other case it's a little > more complicated, but we basically have to check the update interval and > if (interval * NTP_INTERVAL_FREQ) is within the 500ppm limit there isn't > much we really have to do. Well, if it weren't for the jiffies clocksource error, to correct for the sub 100ppb error, I'd agree with you to #define CLOCK_TICK_ADJUST 0 for CONFIG_NO_HZ. But until that's worked out I'm not sure. Further it seems to point that if we are going to be chasing down small sub-100ppb errors (which I think would be great to do, but lets not make users to endure 200+ppm errors while we debate the fine-tuning :) we might want to consider a method where we let ntp_update_freq take into account the current clocksource's interval length, so it becomes the base value against which we apply adjustments (scaled appropriately). Phew, this is turning into a long mail. So let's summarize: There are 3 sources of error that we've discussed here: 1) The large (280ppm) error seen with no-NTP adjustment, caused by the inconsistent (A!=B) interval comparisons which started this discussion, which my patch does address. 2) The medium (153-0ppm)clocksource granularity error shown in the data above, caused by the actual interval length not matching the requested interval length (what CLOCK_TICK_ADJUST tries to compensate for when using jiffies/PIT). 3) The smaller (0.076ppm max) NTP adjustment error caused by the parts-per-billion == nsec-per-sec shortcut used in combination w/ a base interval that is not actually a second (due to CLOCK_TICK_ADJUST being added in) in ntp_update_frequency() All of these are somewhat interdependent. One could consider that #1 is in part due to #2, and #2 also causes #3, as there isn't proper scaling being done in #3. Fixing all of these in one swoop would be great, and I think we should spend time continuing to refine the code. However, I'm just not sure that we have a solution at this point for all the cases we have to consider. So given I do want to continue this discussion and work, would it be possible that I fix up the #1 item above (including resolving the double conflicting merges that have already gone in) as implemented in my last patch, without your objection? Again, I really appreciate you're continued deep observations and feedback here. thanks -john [-- Attachment #2: error-calc.c --] [-- Type: text/x-csrc, Size: 2313 bytes --] /* The following contains adapted code from the kernel, thus is Licensed under the GPLv2 */ #include <stdio.h> #define HZ 1000 #define NSEC_PER_SEC 1000000000LL #define PMTMR_TICKS_PER_SEC 3579545 #define CLOCK_TICK_RATE 1193182 #define LATCH ((CLOCK_TICK_RATE + HZ/2) / HZ) /* For divider */ #define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE) #define SH_DIV(NOM,DEN,LSH) ( (((NOM) / (DEN)) << (LSH)) \ + ((((NOM) % (DEN)) << (LSH)) + (DEN) / 2) / (DEN)) #define ACTHZ (SH_DIV (CLOCK_TICK_RATE, LATCH, 8)) #define NSEC_PER_JIFFY ((unsigned long)((((unsigned long long)NSEC_PER_SEC)<<8)/ACTHZ)) #define CLOCK_TICK_ADJUST 0 #define XCLOCK_TICK_ADJUST (((long long)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \ (long long)CLOCK_TICK_RATE) unsigned long hz2mult(unsigned long hz, unsigned long shift) { unsigned long long tmp = NSEC_PER_SEC << shift; return (tmp + hz/2)/hz; } void calculate_values(char* name, unsigned long mult, unsigned long shift, unsigned long ntp_hz) { unsigned long long tmp; unsigned long long length_nsec = (NSEC_PER_SEC + CLOCK_TICK_ADJUST); unsigned long long interval_length = length_nsec/ntp_hz; long long error; // printf("\n%s\n==========\n", name); printf("%s ", name); tmp = interval_length; tmp <<= shift; tmp += mult/2; tmp /= mult; if(tmp == 0) { printf("dived to zero!\n"); tmp = 1; } // printf("%llu cycles per %llu ns\n", tmp, interval_length); // printf("%llu ns per cycle\n", (1*mult)>>shift); // printf("%llu ns per interval\n", (tmp*mult)>>shift); error = ((interval_length*ntp_hz) << shift) - ((tmp*ntp_hz*mult)); error >>= shift; printf("%lld ppb error\n",error); } void main(void) { unsigned long shift, mult; printf("HZ=%ld CLOCK_TICK_ADJUST=%ld\n", HZ, CLOCK_TICK_ADJUST); /* jiffies */ calculate_values("jiffies ", NSEC_PER_JIFFY << 8, 8, HZ); calculate_values("jiffies NOHZ", NSEC_PER_JIFFY << 8, 8, 2); /* pit */ calculate_values("pit ", hz2mult(CLOCK_TICK_RATE,20), 20, HZ); calculate_values("pit NOHZ", hz2mult(CLOCK_TICK_RATE,20), 20, 2); /* acpi_pm */ calculate_values("acpi_pm ", hz2mult(PMTMR_TICKS_PER_SEC,22), 22, HZ); calculate_values("acpi_pm NOHZ", hz2mult(PMTMR_TICKS_PER_SEC,22), 22, 2); printf("\n"); } ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-14 4:36 ` john stultz @ 2008-02-16 4:24 ` Roman Zippel 2008-02-19 1:02 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-02-16 4:24 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Wed, 13 Feb 2008, john stultz wrote: > Oh! So your issue is that since time_freq is stored in ppm, or in effect > a usecs per sec offset, when we add it to something other then 1 second > we mis-apply what NTP tells us to. Is that right? Pretty much everything is centered around that 1sec, so the closer the update frequency is to it the better. > Right, so if NTP has us apply a 10ppm adjustment, instead of doing: > NSEC_PER_SEC + 10,000 > > We're doing: > NSEC_PER_SEC + CLOCK_TICK_ADJUST + 10,000 > > Which, if I'm doing my math right, results in a 10.002ppm adjustment > (using the 999847467ns number above), instead of just a 10ppm > adjustment. > > Now, true, this is an error, but it is a pretty small one. Even at the > maximum 500ppm value, it only results in an error of 76 parts per > billion. As you'll see below, that tends to be less error then what we > get from the clock granularity. Is there something else I'm missing here > or is this really the core issue you're concerned with? The error accumulates and there is no good reason to do this for the common case. > > In consequence this means, if we want to improve timekeeping, we first set > > the (update_cycles * NTP_INTERVAL_FREQ) interval as close as possible to > > the real frequency. It doesn't has to be perfect as we usually don't know > > the real frequency with 100% certainty anyway. > > This might need some more explanation, as I'm not certain I know what > update_cycles refers to. Do you mean cycle_interval? I guess I'm not > completely sure how you're suggesting we change things here. clock->cycle_interval > > Second, we drop the tick > > adjustment if possible and leave the adjustments to the NTP daemon and as > > long as the drift is within the 500ppm limit it has no problem to manage > > this. > > Dropping the tick adjustment? By that do you mean the tick_usec value > set by adjtimex()? I don't quite see why we want that. Could you expand > here? CLOCK_TICK_ADJUST > HZ=1000 CLOCK_TICK_ADJUST=-152533 > jiffies 467 ppb error > jiffies NOHZ 467 ppb error > pit 0 ppb error > pit NOHZ 0 ppb error > acpi_pm -280 ppb error > acpi_pm NOHZ 279 ppb error > > HZ=1000 CLOCK_TICK_ADJUST=0 > jiffies 153000 ppb error > jiffies NOHZ 153000 ppb error > pit 152533 ppb error > pit NOHZ 0 ppb error > acpi_pm -127112 ppb error > acpi_pm NOHZ 279 ppb error > > So you are right, w/ pit & NO_HZ, the granularity error is always very > small both with or without CLOCK_TICK_ADJUST. If you change the frequency of acpi_pm to 3579000 you'll get this: HZ=1000 CLOCK_TICK_ADJUST=0 jiffies 153000 ppb error jiffies NOHZ 153000 ppb error pit 152533 ppb error pit NOHZ 0 ppb error acpi_pm 0 ppb error acpi_pm NOHZ 0 ppb error HZ=1000 CLOCK_TICK_ADJUST=-152533 jiffies 0 ppb error jiffies NOHZ 466 ppb error pit -467 ppb error pit NOHZ -1 ppb error acpi_pm 126407 ppb error acpi_pm NOHZ 22 ppb error CLOCK_TICK_ADJUST has only any meaning for PIT (and indirectly for jiffies). For every other clock you just add some random value, where it doesn't do _any_ good. The worst case error there will always be (ntp_hz/freq/2*10^9nsec), all you do with CLOCK_TICK_ADJUST is to do shift it around, but it doesn't actually fix the error - it's still there. > However, without CLOCK_TICK_ADJUST, the jiffies error increases for all > values of HZ except 100 (which at first seems odd, but seems to be due > to loss from rounding in the ACTHZ calculation). jiffies depends on the timer resolution, so it will practically produce the same results as PIT (assuming it's used to generate the timer tick). > One interesting surprise in the data: With CLOCK_TICK_ADJUST=0, the > acpi_pm's error frequency shot up in the !NO_HZ cases. This ends up > being due to the acpi_pm being a very close to a multiple (3x) of the > pit frequency, so CLOCK_TICK_ADJUST helps it as well. What exactly does it help with? All you are doing is number cosmetics, it has _no_ practically value and only decreases the quality of timekeeping. > Further it seems to point that if we are going to be chasing down small > sub-100ppb errors (which I think would be great to do, but lets not make > users to endure 200+ppm errors while we debate the fine-tuning :) we > might want to consider a method where we let ntp_update_freq take into > account the current clocksource's interval length, so it becomes the > base value against which we apply adjustments (scaled appropriately). The error at least is real, the use value of CLOCK_TICK_ADJUST for the common case is not existent. > There are 3 sources of error that we've discussed here: > 1) The large (280ppm) error seen with no-NTP adjustment, caused by the > inconsistent (A!=B) interval comparisons which started this discussion, > which my patch does address. Part of the error is caused by CLOCK_TICK_ADJUST, but the other part of the error is real, all you do is hiding it. > 2) The medium (153-0ppm)clocksource granularity error shown in the data > above, caused by the actual interval length not matching the requested > interval length (what CLOCK_TICK_ADJUST tries to compensate for when > using jiffies/PIT). To be precise: CLOCK_TICK_ADJUST doesn't compensate anything, it only hides the error for jiffies/PIT. In any other case it's just some random value added to it. > 3) The smaller (0.076ppm max) NTP adjustment error caused by the > parts-per-billion == nsec-per-sec shortcut used in combination w/ a base > interval that is not actually a second (due to CLOCK_TICK_ADJUST being > added in) in ntp_update_frequency() The maximum is not that trivial, e.g. alpha has a CLOCK_TICK_RATE value of 32768 and has two possible HZ values 1024/1200. Everything is fine with 1024, but with 1200 you create an error of ~11230ppm. In the alpha case you have the interesting problem, that the timer tick is generated by the RTC (if I see it correctly). If the alpha used clock sources, what value should CLOCK_TICK_RATE have? Which of the three possible clocks do you want to adjust - the jiffies, the PIT or the cpu cycle clock? That's the other big problem with CLOCK_TICK_RATE, for many archs it's just some random value and in some unlucky configuration it now may do more harm than it does any good. It had some value when process timer were jiffies based, but since hrtimer that reason is gone and in the NTP code it does more harm than good. bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-16 4:24 ` Roman Zippel @ 2008-02-19 1:02 ` john stultz 2008-02-19 4:04 ` Roman Zippel 0 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2008-02-19 1:02 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Sat, 2008-02-16 at 05:24 +0100, Roman Zippel wrote: > Hi, > > On Wed, 13 Feb 2008, john stultz wrote: > > > Oh! So your issue is that since time_freq is stored in ppm, or in effect > > a usecs per sec offset, when we add it to something other then 1 second > > we mis-apply what NTP tells us to. Is that right? > > Pretty much everything is centered around that 1sec, so the closer the > update frequency is to it the better. > > > Right, so if NTP has us apply a 10ppm adjustment, instead of doing: > > NSEC_PER_SEC + 10,000 > > > > We're doing: > > NSEC_PER_SEC + CLOCK_TICK_ADJUST + 10,000 > > > > Which, if I'm doing my math right, results in a 10.002ppm adjustment > > (using the 999847467ns number above), instead of just a 10ppm > > adjustment. > > > > Now, true, this is an error, but it is a pretty small one. Even at the > > maximum 500ppm value, it only results in an error of 76 parts per > > billion. As you'll see below, that tends to be less error then what we > > get from the clock granularity. Is there something else I'm missing here > > or is this really the core issue you're concerned with? > > The error accumulates and there is no good reason to do this for the > common case. I agree, but we can also easily resolve this by scaling the ppm adjustment to the interval length, rather then just applying it as usec/sec. No? So instead of: adjusted_length = NSEC_PER_SEC + time_adj We could do: adjusted_length = ntp_interval_length + (ntp_interval_length * time_adj)/MILLION So this fixes the time_adj scaling error, while letting us be more flexible w/ our interval length, so we can better map the requested length to the clocksource granularity, lessening the granularity error. > > HZ=1000 CLOCK_TICK_ADJUST=-152533 > > jiffies 467 ppb error > > jiffies NOHZ 467 ppb error > > pit 0 ppb error > > pit NOHZ 0 ppb error > > acpi_pm -280 ppb error > > acpi_pm NOHZ 279 ppb error > > > > HZ=1000 CLOCK_TICK_ADJUST=0 > > jiffies 153000 ppb error > > jiffies NOHZ 153000 ppb error > > pit 152533 ppb error > > pit NOHZ 0 ppb error > > acpi_pm -127112 ppb error > > acpi_pm NOHZ 279 ppb error > > > > So you are right, w/ pit & NO_HZ, the granularity error is always very > > small both with or without CLOCK_TICK_ADJUST. > > If you change the frequency of acpi_pm to 3579000 you'll get this: > > HZ=1000 CLOCK_TICK_ADJUST=0 > jiffies 153000 ppb error > jiffies NOHZ 153000 ppb error > pit 152533 ppb error > pit NOHZ 0 ppb error > acpi_pm 0 ppb error > acpi_pm NOHZ 0 ppb error > > HZ=1000 CLOCK_TICK_ADJUST=-152533 > jiffies 0 ppb error > jiffies NOHZ 466 ppb error > pit -467 ppb error > pit NOHZ -1 ppb error > acpi_pm 126407 ppb error > acpi_pm NOHZ 22 ppb error Right, but the acpi_pm's frequency isn't 3579000. It's supposed to be 3579545. That is injecting error, and I believe it to be different then what I'm claiming is achieved by CLOCK_TICK_ADJUST. Further, the specific example above was agreeing with you that that CLOCK_TICK_ADJUST=0 looks ok for NOHZ, with the exception in the case of the jiffies clocksource. That one still has the 153ppm drift. What do we do about that? > CLOCK_TICK_ADJUST has only any meaning for PIT (and indirectly for > jiffies). For every other clock you just add some random value, where > it doesn't do _any_ good. While not the main point, the aside I included about the acpi_pm being interesting, is that because the ACPI PM's frequency is a multiple of the PIT's (and thus jiffies), the same granularity issues arise (although to a lesser degree). That is why CLOCK_TICK_ADJUST helps it in the !NOHZ case. > The worst case error there will always be (ntp_hz/freq/2*10^9nsec), all > you do with CLOCK_TICK_ADJUST is to do shift it around, but it doesn't > actually fix the error - it's still there. > > > However, without CLOCK_TICK_ADJUST, the jiffies error increases for all > > values of HZ except 100 (which at first seems odd, but seems to be due > > to loss from rounding in the ACTHZ calculation). > > jiffies depends on the timer resolution, so it will practically produce > the same results as PIT (assuming it's used to generate the timer tick). > > > One interesting surprise in the data: With CLOCK_TICK_ADJUST=0, the > > acpi_pm's error frequency shot up in the !NO_HZ cases. This ends up > > being due to the acpi_pm being a very close to a multiple (3x) of the > > pit frequency, so CLOCK_TICK_ADJUST helps it as well. > > What exactly does it help with? > All you are doing is number cosmetics, it has _no_ practically value and > only decreases the quality of timekeeping. Unless you want to clarify what aspect of "quality" you're talking about, I'd very much disagree with your claim. This is not cosmetics, this is about addressing granularity error. By using a base interval that does not match the clocksource's natural granularity, we effectively inject a per-interval error. Since the error is tracked and compensated by the clocksource_adjust() function, we end up with a incorrect clocksource frequency. This then has to be discovered and compensated for by NTP, in addition to the actual hardware error. By using a base interval that does match the clocksource's natural granularity, we avoid the per-interval error. Without this per-interval error, we utilize the specified frequency of the hardware, and NTP only has to correct for the actual hardware error. If we are building a train station, and each train car is 60ft, it doesn't make much sense to build 1000ft stations, right? Instead you'll do better if you build a 1020ft station. That's what CLOCK_TICK_ADJ is trying to address. Now, I'm open to doing something other then just adding CLOCK_TICK_ADJ. For instance, if we find a way to push the initial xtime_interval (without clocksource_adjust()'s adjustments - see the mult-adj-split patch from my patchset friday) back to the ntp code so it uses that as its interval base, and then use the time_adj scaling pointed out above. That I think would work. > > Further it seems to point that if we are going to be chasing down small > > sub-100ppb errors (which I think would be great to do, but lets not make > > users to endure 200+ppm errors while we debate the fine-tuning :) we > > might want to consider a method where we let ntp_update_freq take into > > account the current clocksource's interval length, so it becomes the > > base value against which we apply adjustments (scaled appropriately). > > The error at least is real, the use value of CLOCK_TICK_ADJUST for the > common case is not existent. Again, I disagree. > > There are 3 sources of error that we've discussed here: > > 1) The large (280ppm) error seen with no-NTP adjustment, caused by the > > inconsistent (A!=B) interval comparisons which started this discussion, > > which my patch does address. > > Part of the error is caused by CLOCK_TICK_ADJUST, but the other part of > the error is real, all you do is hiding it. Explain hiding it. I'm just making sure we do equal comparisons. If ntp_update_frequency() is using CLOCK_TICK_ADJUST, so should the clocksource_calculate_interval() functions. And yes, if we remove CLOCK_TICK_ADJUST, that would also resolve the (A!=B) issue, but it doesn't address the error from #2 below. > > 2) The medium (153-0ppm)clocksource granularity error shown in the data > > above, caused by the actual interval length not matching the requested > > interval length (what CLOCK_TICK_ADJUST tries to compensate for when > > using jiffies/PIT). > > To be precise: CLOCK_TICK_ADJUST doesn't compensate anything, it only > hides the error for jiffies/PIT. In any other case it's just some random > value added to it. Wrong. We're using a base interval that maps to the granularity of the PIT/jiffies clocksource. It doesn't matter too much for other clocksources, as they're fine grained enough to not really matter, esp if NTP_INTERVAL_FREQ is 2. > > 3) The smaller (0.076ppm max) NTP adjustment error caused by the > > parts-per-billion == nsec-per-sec shortcut used in combination w/ a base > > interval that is not actually a second (due to CLOCK_TICK_ADJUST being > > added in) in ntp_update_frequency() > > The maximum is not that trivial, e.g. alpha has a CLOCK_TICK_RATE value of > 32768 and has two possible HZ values 1024/1200. Everything is fine with > 1024, but with 1200 you create an error of ~11230ppm. Uh. Check that again. I'm seeing the 11230ppm error only when looking at the _granularity error_ when CLOCK_TICK_ADJUST is *not* used (where CLOCK_TICK_ADJUST is used, we get 1.2ppm granularity error). To see the maximum NTP adjustment error caused by the nsec-per-sec==ppb shortcut, its: (NSEC_PER_SEC + 500,000)/NSEC_PER_SEC - (NSEC_PER_SEC + CLOCK_TICK_ADJ + 500,000)/(NSEC_PER_SEC + CLOCK_TICK_ADJ) = -5689ppb error So even here, in the case where a *very* poor HZ value is selected given the CLOCK_TICK_RATE (its that train station analogy again), the error from point #3 is fairly small. > In the alpha case you have the interesting problem, that the timer tick is > generated by the RTC (if I see it correctly). If the alpha used clock > sources, what value should CLOCK_TICK_RATE have? Which of the three > possible clocks do you want to adjust - the jiffies, the PIT or the cpu > cycle clock? On this I agree. If we're using fine grained clocksources, CLOCK_TICK_RATE makes less and less sense. But it also should hurt less, due to the granularity error being smaller on those clocksources. And yes, one could imagine a system with multiple course clocks that have different granularity errors and in that case CLOCK_TICK_RATE doesn't address everything. However right now CLOCK_TICK_RATE is important for very course clocksources such as jiffies and the PIT. And many systems use those clocksources (it should be noted, EVERY non GENERIC_TIME arch uses the jiffies clocksource as its clocksource), so it effects many users. Until systems do not use those as clocksources (when everyone has very fine grained clocks) we need a solution here. I'm not seeing how your suggestions solve this. > That's the other big problem with CLOCK_TICK_RATE, for many archs it's > just some random value and in some unlucky configuration it now may do > more harm than it does any good. It had some value when process timer were > jiffies based, but since hrtimer that reason is gone and in the NTP code > it does more harm than good. Oh yes. Setting it correctly is a responsibility of the arch maintainer. If the arch has a incorrect CLOCK_TICK_RATE, it will very much do damage to the accuracy of the jiffies clocksource and will hurt timekeeping. However, if we just remove CLOCK_TICK_RATE, we will hurt the jiffies clocksource as well. Overall, I think I agree with your basic dislike for CLOCK_TICK_RATE, but I do not see how we just get rid of it as you suggest and still have good timekeeping for all systems. So again I claim these are the issues, in severity order: 1) A!=B interval comparison needs to get solved. This is just a brain dead source of large error, and is easily fixed one way or the other. 2) We need a solution that handles granularity error well, as this is a moderate source of error for course clocksources such as jiffies. CLOCK_TICK_ADJUST does cover this fairly well in most cases. I suspect we could do even better, but that will take some deeper changes. 3) We should make sure the even smaller error done by not scaling the ntp frequency adjustment to the interval length is corrected. My approach solves only #1. #2 is addressed as CLOCK_TICK_ADJUST handles it relatively well in most cases. I've proposed that we scale the frequency adjustment properly to handle #3. My understanding of your approach (removing CLOCK_TICK_ADJUST), addresses issues #1 and #3, but hurts issue #2. So have we talked this out enough? :) If I'm still mis-characterizing your solution, feel free to send a patch and I'll try to refine my critique. thanks -john ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-19 1:02 ` john stultz @ 2008-02-19 4:04 ` Roman Zippel 2008-02-20 1:50 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-02-19 4:04 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Mon, 18 Feb 2008, john stultz wrote: > If we are building a train station, and each train car is 60ft, it > doesn't make much sense to build 1000ft stations, right? Instead you'll > do better if you build a 1020ft station. That would assume trains are always 60ft long, which is the basic error in your assumption. Since we're using analogies: What you're doing is to put you winter clothes on your weight scale and reset the scale to zero to offset for the weigth of the clothes. If you stand now with your bathing clothes on the scale, does that mean you lost weight? That's all you do - you change the scale and slightly screw the scale for everyone else trying to use it. To keep in mind what time adjusting is supposed to do: freq = 1sec + time_freq What we do instead is: freq + tick_adj = 1sec + time_freq Where exactly is now the problem to integrate tick_adj into time_freq? The result is _exactly_ the same. The only visible difference is a slightly higher time_freq value and as long as it is within the 500 ppm limit there is simply no problem. > And yes, if we remove CLOCK_TICK_ADJUST, that would also resolve the > (A!=B) issue, but it doesn't address the error from #2 below. > [..] > 2) We need a solution that handles granularity error well, as this is a > moderate source of error for course clocksources such as jiffies. > CLOCK_TICK_ADJUST does cover this fairly well in most cases. I suspect > we could do even better, but that will take some deeper changes. How exactly does CLOCK_TICK_ADJUST address this problem? The error due to insufficient resolution is still there, all it does is shifting the scale, so it's not immediately visible. > My understanding of your approach (removing CLOCK_TICK_ADJUST), > addresses issues #1 and #3, but hurts issue #2. What exactly is hurt? bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-19 4:04 ` Roman Zippel @ 2008-02-20 1:50 ` john stultz 2008-02-20 17:08 ` Roman Zippel 0 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2008-02-20 1:50 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Tue, 2008-02-19 at 05:04 +0100, Roman Zippel wrote: > Hi, > > On Mon, 18 Feb 2008, john stultz wrote: > > > If we are building a train station, and each train car is 60ft, it > > doesn't make much sense to build 1000ft stations, right? Instead you'll > > do better if you build a 1020ft station. > > That would assume trains are always 60ft long, which is the basic error in > your assumption. > > Since we're using analogies: What you're doing is to put you winter > clothes on your weight scale and reset the scale to zero to offset for the > weigth of the clothes. If you stand now with your bathing clothes on the > scale, does that mean you lost weight? Sheesh, now you're calling me deceiving *and* fat! ;) > That's all you do - you change the scale and slightly screw the scale for > everyone else trying to use it. I don't think so. This all has to do with intervals, not scale offsets. I'm having a hard time seeing how we're not connecting on this. To better keep with your analogy, you'd have to imagine a scale that only reads in X pound increments. As long as X is fairly small, it should measure everyone's weight fairly well. However, if X is large, like say 50kg, then it won't weigh a 70kg person very accurately (even if he is a liar and he really weighs 77kgs). This is the granularity error I'm talking about. Again, this all has to do with our accumulation intervals. Lets look at the code. I'm going to focus on jiffies as its the best example, being both very common and very course. Let's say we make our base interval 1,000,000ns. Using the jiffies clocksource, the closest we can get to this interval is 1 jiffy, which is 999,847ns. Now every interval we will see 153ns error. The clocksource_adjust() function will watch this error accumulate and adjust the jiffies's frequency by 153ppm to try to compensate. Note, this compensation is not adjusting for any hardware drift, it is only compensating for the error we've injected by choosing an interval the clocksource cannot match. Now, when NTP starts up, if we had perfect hardware and there was no hardware drift, NTP would have to inject a -153ppm correction to offset the systematic error we've introduced. If we do not have perfect hardware, then NTP would have to correct for both the 153ppm error and the hardware error. Sp if the hardware error was in the same direction, we can now only compensate for up to a 347ppm hardware drift, before we hit the 500ppm bound in NTP. Using the test program I sent out before: HZ=1000 CLOCK_TICK_ADJUST=0 jiffies ========== 1 cycles per 1000000 ns 999847 ns per cycle 999847 ns per interval 153000 ppb error jiffies NOHZ ========== 500 cycles per 500000000 ns 999847 ns per cycle 499923500 ns per interval 153000 ppb error Now, if we make our base interval match the clocksource granularity, this can avoid the error. If the base interval we accumulate with is 999,847ns, we avoid needlessly accumulating error. This means without NTP correction, the system clock drift will match the hardware clock drift. So when NTP does kick in, it only has to correct for that hardware drift. HZ=1000 CLOCK_TICK_ADJUST=-152533 jiffies ========== 1 cycles per 999847 ns 999847 ns per cycle 999847 ns per interval 0 ppb error jiffies NOHZ ========== 500 cycles per 499923733 ns 999847 ns per cycle 499923500 ns per interval 466 ppb error > To keep in mind what time adjusting is supposed to do: > > freq = 1sec + time_freq But it is this fixation on 1sec that is the cause of the granularity error. I believe the following is also correct (assuming time_freq is in ppm units): adjusted_freq = interval_length + (interval_length * time_freq)/MILLION This properly scales the adjustment to any interval length. > What we do instead is: > > freq + tick_adj = 1sec + time_freq > > Where exactly is now the problem to integrate tick_adj into time_freq? The > result is _exactly_ the same. The only visible difference is a slightly > higher time_freq value and as long as it is within the 500 ppm limit there > is simply no problem. Well, it is a problem if its large. The 500ppm limit is supposed to be for hardware frequency error correction, not hardware frequency + software error correction. Now, if it were 1-10ppm, it wouldn't be that big of an issue, but with the jiffies example above, 153ppm does cut into the correctable space a good bit. Show me how we cut that error down, and I'll be happy to kill CLOCK_TICK_ADJUST. > > And yes, if we remove CLOCK_TICK_ADJUST, that would also resolve the > > (A!=B) issue, but it doesn't address the error from #2 below. > > [..] > > 2) We need a solution that handles granularity error well, as this is a > > moderate source of error for course clocksources such as jiffies. > > CLOCK_TICK_ADJUST does cover this fairly well in most cases. I suspect > > we could do even better, but that will take some deeper changes. > > How exactly does CLOCK_TICK_ADJUST address this problem? The error due to > insufficient resolution is still there, all it does is shifting the scale, > so it's not immediately visible. Again, I agree there is a small error (76ppb in my earlier example) caused when we time_freq adjustment to what we assume is 1 second, when CLOCK_TICK_ADJ is involved. However, relatively this is small compared with the example 153ppm error above caused by ignoring the granularity issue. Additionally we can fix it using the scaling method I pointed out above. > > My understanding of your approach (removing CLOCK_TICK_ADJUST), > > addresses issues #1 and #3, but hurts issue #2. > > What exactly is hurt? By injecting 153ppm of error, the ability for NTP to correct hardware error within 500ppm is hurt. Sigh. So at this point, if we're not closing the gap in our understanding, I'm not sure how much its worth to continue on the discussion in this manner. I'd welcome anyone to help clarify what I'm missing, or maybe assistance in better communicating my point. In the meantime, I'll add the scaling fix from above to my code and re-submit it. If you want you can send your own patch and I'll gladly test, review it and try to figure out what I'm missing. Once again, thanks for the feedback. -john ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-20 1:50 ` john stultz @ 2008-02-20 17:08 ` Roman Zippel 2008-02-22 2:39 ` john stultz 0 siblings, 1 reply; 36+ messages in thread From: Roman Zippel @ 2008-02-20 17:08 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Tue, 19 Feb 2008, john stultz wrote: > To better keep with your analogy, you'd have to imagine a scale that > only reads in X pound increments. As long as X is fairly small, it > should measure everyone's weight fairly well. However, if X is large, > like say 50kg, then it won't weigh a 70kg person very accurately (even > if he is a liar and he really weighs 77kgs). > > > This is the granularity error I'm talking about. There is a big difference between accuracy and granularity. Even a coarse grained scale can be accurate (just within its resolution) and a fine grained scale can be very inaccurate if you shift around the scale. Please keep this separate, otherwise this can go on forever... > Now, when NTP starts up, if we had perfect hardware and there was no > hardware drift, NTP would have to inject a -153ppm correction to offset > the systematic error we've introduced. > > If we do not have perfect hardware, then NTP would have to correct for > both the 153ppm error and the hardware error. Sp if the hardware error > was in the same direction, we can now only compensate for up to a 347ppm > hardware drift, before we hit the 500ppm bound in NTP. Out of curiosity, what kind of hardware error do you expect? If you have that crappy hardware you're better off initializing the clock with the correct frequency. > > To keep in mind what time adjusting is supposed to do: > > > > freq = 1sec + time_freq > > But it is this fixation on 1sec that is the cause of the granularity > error. You need some kind of fix point and everyone is using 1sec for as base length, for some mysterious reason you're now trying to redefine it. The PIT (and any clock based on it) has a certain resolution, so with an update frequency of HZ=1000 it produces a certain error, but why on earth are you trying to introduce this error as universal constant? This is a property of the PIT clock, applying this error to every other clock makes no sense. > I believe the following is also correct (assuming time_freq is in ppm units): > > adjusted_freq = interval_length + (interval_length * time_freq)/MILLION > > This properly scales the adjustment to any interval length. Actually it's not that simple, ntp_update_frequency() only calculates the base length, time_offset had to be scaled as well (and back when requested from user space). You would add a lot of complexity for a silly little error, which the current mechanisms can handle just fine. > > What we do instead is: > > > > freq + tick_adj = 1sec + time_freq > > > > Where exactly is now the problem to integrate tick_adj into time_freq? The > > result is _exactly_ the same. The only visible difference is a slightly > > higher time_freq value and as long as it is within the 500 ppm limit there > > is simply no problem. > > Well, it is a problem if its large. The 500ppm limit is supposed to be > for hardware frequency error correction, not hardware frequency + > software error correction. Now, if it were 1-10ppm, it wouldn't be that > big of an issue, but with the jiffies example above, 153ppm does cut > into the correctable space a good bit. Again, what kind of crappy hardware do you expect? Aren't clocks supposed to get better and not worse? Where do you get this idea that the 500ppm are exclusively for hardware errors? If you have such bad hardware, there is another simple solution: change HZ to 100 and the error is reduced to 15ppm. I would see the point if this problem had actually any practically relevance, but this error is not a problem for pretty much all existing standard hardware. Why are you insisting on redesigning timekeeping for broken hardware? > > > My understanding of your approach (removing CLOCK_TICK_ADJUST), > > > addresses issues #1 and #3, but hurts issue #2. > > > > What exactly is hurt? > > By injecting 153ppm of error, the ability for NTP to correct hardware > error within 500ppm is hurt. There's nothing 'injected', that resolution error is very real and the 500ppm limit is more than enough to deal with this. _Nobody_ is hurt by this. > Sigh. So at this point, if we're not closing the gap in our > understanding, I'm not sure how much its worth to continue on the > discussion in this manner. I'd welcome anyone to help clarify what I'm > missing, or maybe assistance in better communicating my point. The point is you are redefining a mechanism which has _never_ been intendend for purpose you're trying to abuse it for now. Reread the original patch, it was intended for kernel with HZ of 2000, where the error would be 687ppm. Now we have other ways to increase the resolution, timekeeping isn't solely based on the PIT anymore. The whole reason for the original patch is pretty much gone by now. If you really need some kind of adjustment for your extremely broken hardware, below is the absolute maximum you need, which doesn't inflict more insanity on all the sane hardware. bye, Roman Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST completely. Add a optional kernel parameter ntp_tick_adj instead to allow adjusting of a large base drift and thus keeping ntpd happy. The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the primary clock, but we have a varity of clock sources now, so a global PIT specific adjustment makes little sense anymore. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> --- include/linux/timex.h | 9 +-------- kernel/time/ntp.c | 11 ++++++++++- kernel/time/timekeeping.c | 6 ++---- 3 files changed, 13 insertions(+), 13 deletions(-) Index: linux-2.6/include/linux/timex.h =================================================================== --- linux-2.6.orig/include/linux/timex.h +++ linux-2.6/include/linux/timex.h @@ -232,14 +232,7 @@ static inline int ntp_synced(void) #else #define NTP_INTERVAL_FREQ (HZ) #endif - -#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE) -#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \ - (s64)CLOCK_TICK_RATE) - -/* Because using NSEC_PER_SEC would be too easy */ -#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC * NSEC_PER_USEC * USER_HZ) + \ - CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ) +#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ) /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */ extern u64 current_tick_length(void); Index: linux-2.6/kernel/time/ntp.c =================================================================== --- linux-2.6.orig/kernel/time/ntp.c +++ linux-2.6/kernel/time/ntp.c @@ -42,12 +42,13 @@ long time_esterror = NTP_PHASE_LIMIT; /* long time_freq; /* frequency offset (scaled ppm)*/ static long time_reftime; /* time at last adjustment (s) */ long time_adjust; +long ntp_tick_adj; static void ntp_update_frequency(void) { u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << TICK_LENGTH_SHIFT; - second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; + second_length += (s64)ntp_tick_adj << TICK_LENGTH_SHIFT; second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); tick_length_base = second_length; @@ -400,3 +401,11 @@ leave: if ((time_status & (STA_UNSYNC|ST notify_cmos_timer(); return(result); } + +static int __init ntp_tick_adj_setup(char *str) +{ + ntp_tick_adj = simple_strtol(str, NULL, 0); + return 1; +} + +__setup("ntp_tick_adj=", ntp_tick_adj_setup); Index: linux-2.6/kernel/time/timekeeping.c =================================================================== --- linux-2.6.orig/kernel/time/timekeeping.c +++ linux-2.6/kernel/time/timekeeping.c @@ -187,8 +187,7 @@ static void change_clocksource(void) clock->error = 0; clock->xtime_nsec = 0; - clocksource_calculate_interval(clock, - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); tick_clock_notify(); @@ -245,8 +244,7 @@ void __init timekeeping_init(void) ntp_clear(); clock = clocksource_get_next(); - clocksource_calculate_interval(clock, - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); clock->cycle_last = clocksource_read(clock); xtime.tv_sec = sec; ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-20 17:08 ` Roman Zippel @ 2008-02-22 2:39 ` john stultz 2008-02-25 14:44 ` Roman Zippel ` (2 more replies) 0 siblings, 3 replies; 36+ messages in thread From: john stultz @ 2008-02-22 2:39 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Wed, 2008-02-20 at 18:08 +0100, Roman Zippel wrote: > > Well, it is a problem if its large. The 500ppm limit is supposed to be > > for hardware frequency error correction, not hardware frequency + > > software error correction. Now, if it were 1-10ppm, it wouldn't be that > > big of an issue, but with the jiffies example above, 153ppm does cut > > into the correctable space a good bit. > > Again, what kind of crappy hardware do you expect? Aren't clocks supposed > to get better and not worse? Well, while I've seen much worse, I consider crappy hardware to be 100 +ppm error. So if the hardware is perfect and the system results in 153ppm error, I'd consider that pretty crappy, especially if its not the hardware's fault. > Where do you get this idea that the 500ppm are exclusively for hardware > errors? If you have such bad hardware, there is another simple solution: > change HZ to 100 and the error is reduced to 15ppm. True its not exclusively for hardware errors, and if we were talking about only 15ppm I wouldn't really worry about it. But when we're saying the system is adding 30% of the maximum error, that's just not good. > I would see the point if this problem had actually any practically > relevance, but this error is not a problem for pretty much all existing > standard hardware. Why are you insisting on redesigning timekeeping for > broken hardware? Remember my earlier data? Where I was talking about the acpi_pm being a multiple of the PIT frequency? By removing CLOCK_TICK_ADJUST we got a 127ppm error when HZ=1000. NO_HZ drops that down to where we don't care, but this _does_ effect current hardware, so I'd call it relevant. > > > > My understanding of your approach (removing CLOCK_TICK_ADJUST), > > > > addresses issues #1 and #3, but hurts issue #2. > > > > > > What exactly is hurt? > > > > By injecting 153ppm of error, the ability for NTP to correct hardware > > error within 500ppm is hurt. > > There's nothing 'injected', that resolution error is very real and the > 500ppm limit is more than enough to deal with this. _Nobody_ is hurt by > this. Sure, 500ppm is enough for most people with good hardware. But remember the alpha example you brought up earlier? The HZ=1200 case, with the CLOCK_TICK_RATE=32768? If we don't take CLOCK_TICK_ADJUST into account, we end up with a **11230ppm** error from the granularity issue. NTP just won't work on those systems. Now granted, the three types of alpha systems that actually use that HZ value is probably as close to "nobody" as you're going to get, but I don't think we can just throw the granularity issue aside. > Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and > e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST > completely. Add a optional kernel parameter ntp_tick_adj instead to allow > adjusting of a large base drift and thus keeping ntpd happy. > The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the > primary clock, but we have a varity of clock sources now, so a global PIT > specific adjustment makes little sense anymore. > > Signed-off-by: Roman Zippel <zippel@linux-m68k.org> So thanks so much for sending the patch. It makes clear your solution. My initial comments: Its simple and that really does have its merits. It resolves the inconsistent comparison issue, and does not have the smallish scaling error we talked about as well. As I've said before, I do like the idea, I'm just worried about the corner cases (mainly jiffies based systems). The granularity issue is still present. Depending on the HZ settings and the clocksource hardware, systems may see large errors added on to the actual hardware error, and its possible the kernel error may dominate the actual hardware error. The ntp_tick_adjust option does give a way out if you have, for example, one of those alpha systems where it would be necessary, but I do wish there was a better way then forcing users to calculate for themselves what the granularity adjustment should be (esp given that it is more a function of the kernel compile options, so different kernels would need different values for the same system). So then yes, your patch is simple and corrects the issue that started the discussion. I think we're closing the gaps. :) I still think my claims hold that your patch as it stands may worsen the drift error depending on HZ settings, especially on jiffies based systems (which means every non-GENERIC_TIME arch). However, if folks don't really care, then that may be acceptable. As promised, here is my own patch, which takes the scaling error you pointed out into account, as well as resolving the granularity issue in a way similar to your ntp_tick_adjust option does, only the kernel will calculate such a granularity correction on a per-clocksource base, so users don't have to do the math. Sadly I've not had the chance to really test and debug this (there's a lot of shifting logic, so I may have flubed something there), but I wanted to send it out for comments, as I think it communicates the gist of the idea. I'll test and refine it tomorrow and send it out again if needed. Let me know what you think. Again, thanks for sending the patch and your continued feedback! -john -----------------------------------------> My first version of the ntp_interval/tick_length inconsistent usage patch was recently merged as bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 While the fix did greatly improve the situation, Roman correctly pointed out that it does have a small bug: If the users change clocksources after the system has been running and NTP has made corrections, the correctoins made against the old clocksource will be applied against the new clocksource, causing error. My second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH definition has also made it up-stream as commit e13a2e61dd5152f5499d2003470acf9c838eab84 http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e13a2e61dd5152f5499d2003470acf9c838eab84 Roman also had objections to this patch, and I agree with some, but not all of them. This patch reverts both of those changes, and introduces a new solution: Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated based on the PIT's frequency, and isn't really relevant to non-PIT driven clocksources (that is, clocksources other then jiffies and pit). However, by removing CLOCK_TICK_ADJUST, we no longer take into account the granularity error that the PIT driven clocksources have. This can result in large errors in time accumulation for systems that use jiffies or the pit. This patch tries to rectify the issue by introducing the ntp_set_granularity_error() function, which is called when we begin using a clocksource. This function allows the actual clocksource granularity error to be taken into consideration by ntp when it calculates the current_tick_length() function. Using this method, we should be able to have the system clock drift match very closely to the actual hardware drift, regardless of the clocksource granularity or the tick frequency. UNTESTED! DO NOT MERGE! Signed-off-by: John Stultz <johnstul@us.ibm.com> UNTESTED! DO NOT MERGE! diff --git a/include/linux/timex.h b/include/linux/timex.h index c3f3747..24fa43b 100644 --- a/include/linux/timex.h +++ b/include/linux/timex.h @@ -233,17 +233,11 @@ static inline int ntp_synced(void) #define NTP_INTERVAL_FREQ (HZ) #endif -#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE) -#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \ - (s64)CLOCK_TICK_RATE) - -/* Because using NSEC_PER_SEC would be too easy */ -#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC * NSEC_PER_USEC * USER_HZ) + \ - CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ) +#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC / NTP_INTERVAL_FREQ) /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */ extern u64 current_tick_length(void); - +extern void ntp_set_granularity_error(s64 len); extern void second_overflow(void); extern void update_ntp_one_tick(void); extern int do_adjtimex(struct timex *); diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c index c88b591..fe25c94 100644 --- a/kernel/time/ntp.c +++ b/kernel/time/ntp.c @@ -43,19 +43,32 @@ long time_freq; /* frequency offset (scaled ppm)*/ static long time_reftime; /* time at last adjustment (s) */ long time_adjust; +static s64 granularity_error_adjust; + +void ntp_set_granularity_error(s64 len) +{ + granularity_error_adjust = len * NTP_INTERVAL_FREQ; +} + static void ntp_update_frequency(void) { u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << TICK_LENGTH_SHIFT; - second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; - second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); + s64 adj; + + /* Compensate for clocksource granularity error */ + second_length += granularity_error_adjust; + + /* Scale the base second length by the frequency adjustment */ + adj = second_length * time_freq; + do_div(adj, 1000000); + second_length += adj>>SHIFT_NSEC; tick_length_base = second_length; + do_div(tick_length_base, NTP_INTERVAL_FREQ); do_div(second_length, HZ); tick_nsec = second_length >> TICK_LENGTH_SHIFT; - - do_div(tick_length_base, NTP_INTERVAL_FREQ); } /** diff --git a/kernel/time/timekeeping.c b/kernel/time/timekeeping.c index 1af9fb0..c91f3ec 100644 --- a/kernel/time/timekeeping.c +++ b/kernel/time/timekeeping.c @@ -172,6 +172,7 @@ static void change_clocksource(void) struct clocksource *new; cycle_t now; u64 nsec; + s64 adj; new = clocksource_get_next(); @@ -187,8 +188,15 @@ static void change_clocksource(void) clock->error = 0; clock->xtime_nsec = 0; - clocksource_calculate_interval(clock, - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); + + /* Calculate the granularity error */ + adj = clock->cycle_interval * clock->mult; + adj <<= TICK_LENGTH_SHIFT - clock->shift; + adj = (NTP_INTERVAL_LENGTH << TICK_LENGTH_SHIFT) - adj; + ntp_set_granularity_error(adj); + + ntp_clear(); tick_clock_notify(); @@ -245,8 +253,7 @@ void __init timekeeping_init(void) ntp_clear(); clock = clocksource_get_next(); - clocksource_calculate_interval(clock, - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); clock->cycle_last = clocksource_read(clock); xtime.tv_sec = sec; ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-22 2:39 ` john stultz @ 2008-02-25 14:44 ` Roman Zippel 2008-02-29 4:49 ` [PATCH] Remove obsolete CLOCK_TICK_ADJUST Roman Zippel 2008-02-29 18:54 ` [PATCH] correct inconsistent ntp interval/tick_length usage Jörg-Volker Peetz 2 siblings, 0 replies; 36+ messages in thread From: Roman Zippel @ 2008-02-25 14:44 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Thu, 21 Feb 2008, john stultz wrote: > > Again, what kind of crappy hardware do you expect? Aren't clocks supposed > > to get better and not worse? > > Well, while I've seen much worse, I consider crappy hardware to be 100 > +ppm error. So if the hardware is perfect and the system results in > 153ppm error, I'd consider that pretty crappy, especially if its not the > hardware's fault. Nevertheless this error is real, why are you trying to hide it? This is isn't an error we can't handle, it's still perfectly within the limit and except that NTP reports a somewhat larger drift than you'd like to see, everything works fine. > > Where do you get this idea that the 500ppm are exclusively for hardware > > errors? If you have such bad hardware, there is another simple solution: > > change HZ to 100 and the error is reduced to 15ppm. > > True its not exclusively for hardware errors, and if we were talking > about only 15ppm I wouldn't really worry about it. But when we're saying > the system is adding 30% of the maximum error, that's just not good. Another 30% is required for normal to crappy hardware clocks and then there is still enough room left. > > I would see the point if this problem had actually any practically > > relevance, but this error is not a problem for pretty much all existing > > standard hardware. Why are you insisting on redesigning timekeeping for > > broken hardware? > > Remember my earlier data? Where I was talking about the acpi_pm being a > multiple of the PIT frequency? By removing CLOCK_TICK_ADJUST we got a > 127ppm error when HZ=1000. NO_HZ drops that down to where we don't care, > but this _does_ effect current hardware, so I'd call it relevant. How exactly does it effect current hardware in a way that it breaks them? Despite this error everything still works fine, the hardware doesn't care. > > There's nothing 'injected', that resolution error is very real and the > > 500ppm limit is more than enough to deal with this. _Nobody_ is hurt by > > this. > > Sure, 500ppm is enough for most people with good hardware. But remember > the alpha example you brought up earlier? The HZ=1200 case, with the > CLOCK_TICK_RATE=32768? If we don't take CLOCK_TICK_ADJUST into account, > we end up with a **11230ppm** error from the granularity issue. NTP just > won't work on those systems. > > Now granted, the three types of alpha systems that actually use that HZ > value is probably as close to "nobody" as you're going to get, but I > don't think we can just throw the granularity issue aside. That's actually a good example, why it's irrelevant. First it's using a cycle based clock, thus the rounding error is irrelevant. Second in the common case they already use 1024 as HZ to reduce this error, so something similiar could be done for the HZ=1200 case and I suspect that it was already done and only CLOCK_TICK_RATE is just wrong. This mail http://consortiumlibrary.org/axp-list/archive/2002-11/0101.html suggest that this is the right thing to do. There is _no_ reason to artificially optimize this error value, there are still enough other ways to improve timekeeping. The granularity error is there no matter what you do and as long as it's within a reasonable limit there is nothing that needs fixing. bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* [PATCH] Remove obsolete CLOCK_TICK_ADJUST 2008-02-22 2:39 ` john stultz 2008-02-25 14:44 ` Roman Zippel @ 2008-02-29 4:49 ` Roman Zippel 2008-02-29 6:25 ` Ray Lee ` (2 more replies) 2008-02-29 18:54 ` [PATCH] correct inconsistent ntp interval/tick_length usage Jörg-Volker Peetz 2 siblings, 3 replies; 36+ messages in thread From: Roman Zippel @ 2008-02-29 4:49 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, Can we please clean up the current mess and get the patch below merged? bye, Roman Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST completely. Add a optional kernel parameter ntp_tick_adj instead to allow adjusting of a large base drift and thus keeping ntpd happy. The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the primary clock, but we have a varity of clock sources now, so a global PIT specific adjustment makes little sense anymore. Signed-off-by: Roman Zippel <zippel@linux-m68k.org> --- include/linux/timex.h | 9 +-------- kernel/time/ntp.c | 11 ++++++++++- kernel/time/timekeeping.c | 6 ++---- 3 files changed, 13 insertions(+), 13 deletions(-) Index: linux-2.6/include/linux/timex.h =================================================================== --- linux-2.6.orig/include/linux/timex.h +++ linux-2.6/include/linux/timex.h @@ -232,14 +232,7 @@ static inline int ntp_synced(void) #else #define NTP_INTERVAL_FREQ (HZ) #endif - -#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE) -#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \ - (s64)CLOCK_TICK_RATE) - -/* Because using NSEC_PER_SEC would be too easy */ -#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC * NSEC_PER_USEC * USER_HZ) + \ - CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ) +#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ) /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */ extern u64 current_tick_length(void); Index: linux-2.6/kernel/time/ntp.c =================================================================== --- linux-2.6.orig/kernel/time/ntp.c +++ linux-2.6/kernel/time/ntp.c @@ -42,12 +42,13 @@ long time_esterror = NTP_PHASE_LIMIT; /* long time_freq; /* frequency offset (scaled ppm)*/ static long time_reftime; /* time at last adjustment (s) */ long time_adjust; +long ntp_tick_adj; static void ntp_update_frequency(void) { u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << TICK_LENGTH_SHIFT; - second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; + second_length += (s64)ntp_tick_adj << TICK_LENGTH_SHIFT; second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); tick_length_base = second_length; @@ -400,3 +401,11 @@ leave: if ((time_status & (STA_UNSYNC|ST notify_cmos_timer(); return(result); } + +static int __init ntp_tick_adj_setup(char *str) +{ + ntp_tick_adj = simple_strtol(str, NULL, 0); + return 1; +} + +__setup("ntp_tick_adj=", ntp_tick_adj_setup); Index: linux-2.6/kernel/time/timekeeping.c =================================================================== --- linux-2.6.orig/kernel/time/timekeeping.c +++ linux-2.6/kernel/time/timekeeping.c @@ -187,8 +187,7 @@ static void change_clocksource(void) clock->error = 0; clock->xtime_nsec = 0; - clocksource_calculate_interval(clock, - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); tick_clock_notify(); @@ -245,8 +244,7 @@ void __init timekeeping_init(void) ntp_clear(); clock = clocksource_get_next(); - clocksource_calculate_interval(clock, - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); clock->cycle_last = clocksource_read(clock); xtime.tv_sec = sec; ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST 2008-02-29 4:49 ` [PATCH] Remove obsolete CLOCK_TICK_ADJUST Roman Zippel @ 2008-02-29 6:25 ` Ray Lee 2008-02-29 13:31 ` Roman Zippel 2008-02-29 22:08 ` Andrew Morton 2008-02-29 23:11 ` john stultz 2 siblings, 1 reply; 36+ messages in thread From: Ray Lee @ 2008-02-29 6:25 UTC (permalink / raw) To: Roman Zippel Cc: john stultz, lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Thu, Feb 28, 2008 at 8:49 PM, Roman Zippel <zippel@linux-m68k.org> wrote: > Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and > e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST > completely. Add a optional kernel parameter ntp_tick_adj instead to allow > adjusting of a large base drift and thus keeping ntpd happy. Wait, so you're saying that something that the kernel correctly figures out on its own today should instead be a kernel parameter, with users having to supply it explicitly? Am I confused? If I'm not confused, would you'd be so kind to explain to the peanut gallery over here why this is a good thing? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST 2008-02-29 6:25 ` Ray Lee @ 2008-02-29 13:31 ` Roman Zippel 0 siblings, 0 replies; 36+ messages in thread From: Roman Zippel @ 2008-02-29 13:31 UTC (permalink / raw) To: Ray Lee; +Cc: john stultz, lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Thu, 28 Feb 2008, Ray Lee wrote: > > Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and > > e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST > > completely. Add a optional kernel parameter ntp_tick_adj instead to allow > > adjusting of a large base drift and thus keeping ntpd happy. > > Wait, so you're saying that something that the kernel correctly > figures out on its own today should instead be a kernel parameter, > with users having to supply it explicitly? Am I confused? > > If I'm not confused, would you'd be so kind to explain to the peanut > gallery over here why this is a good thing? Sorry, for causing this confusion. Unless you run a computer with the PIT or jiffies clock, this value is just a random value added to the calculation. Even if you use the PIT clock, the drift isn't large enough to be worried about it with standard configurations. You have to manually set HZ to 2000 and use the PIT clock, then you might have a need for this parameter. bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST 2008-02-29 4:49 ` [PATCH] Remove obsolete CLOCK_TICK_ADJUST Roman Zippel 2008-02-29 6:25 ` Ray Lee @ 2008-02-29 22:08 ` Andrew Morton 2008-02-29 22:27 ` Roman Zippel 2008-02-29 23:11 ` john stultz 2 siblings, 1 reply; 36+ messages in thread From: Andrew Morton @ 2008-02-29 22:08 UTC (permalink / raw) To: Roman Zippel; +Cc: johnstul, linux-kernel, mingo, rostedt On Fri, 29 Feb 2008 05:49:35 +0100 (CET) Roman Zippel <zippel@linux-m68k.org> wrote: > Can we please clean up the current mess and get the patch below merged? > > bye, Roman > > > Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and > e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST > completely. Add a optional kernel parameter ntp_tick_adj instead to allow > adjusting of a large base drift and thus keeping ntpd happy. > The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the > primary clock, but we have a varity of clock sources now, so a global PIT > specific adjustment makes little sense anymore. > The changelog provides no reason for the revert of those two patches. Look at it from the point of view of a person who hasn't been following the discussion (whose initials might be LT). That person might get puzzled and upset, no? ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST 2008-02-29 22:08 ` Andrew Morton @ 2008-02-29 22:27 ` Roman Zippel 2008-02-29 22:38 ` Andrew Morton 2008-02-29 23:11 ` john stultz 0 siblings, 2 replies; 36+ messages in thread From: Roman Zippel @ 2008-02-29 22:27 UTC (permalink / raw) To: Andrew Morton; +Cc: johnstul, linux-kernel, mingo, rostedt Hi, On Fri, 29 Feb 2008, Andrew Morton wrote: > The changelog provides no reason for the revert of those two patches. > > Look at it from the point of view of a person who hasn't been following the > discussion (whose initials might be LT). That person might get puzzled and > upset, no? I'm puzzled at how to explain this... The whole details have been explained over and over during the discussion. The simple answer is that CLOCK_TICK_ADJUST has been causing extra clock drift, John's attempt didn't fix the real cause. My patch doesn't just revert the patches, it also includes the _real_ fix, so why would the real fix require more justification? That person also didn't get puzzled why two patches claiming to fix the same problem got merged... bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST 2008-02-29 22:27 ` Roman Zippel @ 2008-02-29 22:38 ` Andrew Morton 2008-02-29 23:11 ` john stultz 1 sibling, 0 replies; 36+ messages in thread From: Andrew Morton @ 2008-02-29 22:38 UTC (permalink / raw) To: Roman Zippel; +Cc: johnstul, linux-kernel, mingo, rostedt On Fri, 29 Feb 2008 23:27:01 +0100 (CET) Roman Zippel <zippel@linux-m68k.org> wrote: > Hi, > > On Fri, 29 Feb 2008, Andrew Morton wrote: > > > The changelog provides no reason for the revert of those two patches. > > > > Look at it from the point of view of a person who hasn't been following the > > discussion (whose initials might be LT). That person might get puzzled and > > upset, no? > > I'm puzzled at how to explain this... Please try. > The whole details have been explained over and over during the discussion. That's no use to someone who is trying to understand this patch. > The simple answer is that CLOCK_TICK_ADJUST has been causing extra clock > drift, John's attempt didn't fix the real cause. > My patch doesn't just revert the patches, it also includes the _real_ fix, > so why would the real fix require more justification? That person also > didn't get puzzled why two patches claiming to fix the same problem got > merged... A revert patch's changelog should explain why the code is being reverted. ie: what was wrong with it. A bugfix patch's changelog should explain the problem which is being solved, and how it solves it. Look at it from the point of view of someone who isn't intimately involved in that subsystem and who isn't following every email on whatever mailing list. Your changelog des not provide sufficient detail for such a person to understand your patch. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST 2008-02-29 22:27 ` Roman Zippel 2008-02-29 22:38 ` Andrew Morton @ 2008-02-29 23:11 ` john stultz 1 sibling, 0 replies; 36+ messages in thread From: john stultz @ 2008-02-29 23:11 UTC (permalink / raw) To: Roman Zippel; +Cc: Andrew Morton, linux-kernel, mingo, rostedt On Fri, 2008-02-29 at 23:27 +0100, Roman Zippel wrote: > Hi, > > On Fri, 29 Feb 2008, Andrew Morton wrote: > > > The changelog provides no reason for the revert of those two patches. > > > > Look at it from the point of view of a person who hasn't been following the > > discussion (whose initials might be LT). That person might get puzzled and > > upset, no? > > I'm puzzled at how to explain this... > The whole details have been explained over and over during the discussion. > The simple answer is that CLOCK_TICK_ADJUST has been causing extra clock > drift, John's attempt didn't fix the real cause. > My patch doesn't just revert the patches, it also includes the _real_ fix, > so why would the real fix require more justification? That person also > didn't get puzzled why two patches claiming to fix the same problem got > merged... How about this tweaked version of what I was using: The first version of the ntp_interval/tick_length inconsistent usage patch was recently merged as bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 While the fix did greatly improve the situation, it was correctly pointed out by Roman that it does have a small bug: If the users change clocksources after the system has been running and NTP has made corrections, the correctoins made against the old clocksource will be applied against the new clocksource, causing error. The second attempt, which corrects the issue in the NTP_INTERVAL_LENGTH definition has also made it up-stream as commit e13a2e61dd5152f5499d2003470acf9c838eab84 http://git.kernel.org/gitweb.cgi?p=linux/kernel/git/torvalds/linux-2.6.git;a=commit;h=e13a2e61dd5152f5499d2003470acf9c838eab84 Roman has correctly pointed out that CLOCK_TICK_ADJUST is calculated based on the PIT's frequency, and isn't really relevant to non-PIT driven clocksources (that is, clocksources other then jiffies and pit). This patch reverts both of those changes, and simply removes CLOCK_TICK_ADJUST. This does remove the granularity error correction for users of PIT and Jiffies clocksource users, but the granularity error but for the majority of users, it should be within the 500ppm range NTP can accommodate for. For systems that have granularity errors greater then 500ppm, the "ntp_tick_adj=" boot option can be used to compensate. ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST 2008-02-29 4:49 ` [PATCH] Remove obsolete CLOCK_TICK_ADJUST Roman Zippel 2008-02-29 6:25 ` Ray Lee 2008-02-29 22:08 ` Andrew Morton @ 2008-02-29 23:11 ` john stultz 2008-03-02 4:03 ` Roman Zippel 2 siblings, 1 reply; 36+ messages in thread From: john stultz @ 2008-02-29 23:11 UTC (permalink / raw) To: Roman Zippel; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt On Fri, 2008-02-29 at 05:49 +0100, Roman Zippel wrote: > Hi, > > Can we please clean up the current mess and get the patch below merged? Yea. Sorry for being slow this week, had some other distractions going on. I'm still not happy about the granularity issue (or non-issue in your mind), but both my version and your version of the solution are pretty close at this point and I can rework my difference on-top of this patch without much trouble. > Revert bbe4d18ac2e058c56adb0cd71f49d9ed3216a405 and > e13a2e61dd5152f5499d2003470acf9c838eab84 and remove CLOCK_TICK_ADJUST > completely. Add a optional kernel parameter ntp_tick_adj instead to allow > adjusting of a large base drift and thus keeping ntpd happy. > The CLOCK_TICK_ADJUST mechanism was introduced at a time PIT was the > primary clock, but we have a varity of clock sources now, so a global PIT > specific adjustment makes little sense anymore. > > Signed-off-by: Roman Zippel <zippel@linux-m68k.org> Acked-by: John Stultz <johnstul@us.ibm.com> > --- > include/linux/timex.h | 9 +-------- > kernel/time/ntp.c | 11 ++++++++++- > kernel/time/timekeeping.c | 6 ++---- > 3 files changed, 13 insertions(+), 13 deletions(-) > > Index: linux-2.6/include/linux/timex.h > =================================================================== > --- linux-2.6.orig/include/linux/timex.h > +++ linux-2.6/include/linux/timex.h > @@ -232,14 +232,7 @@ static inline int ntp_synced(void) > #else > #define NTP_INTERVAL_FREQ (HZ) > #endif > - > -#define CLOCK_TICK_OVERFLOW (LATCH * HZ - CLOCK_TICK_RATE) > -#define CLOCK_TICK_ADJUST (((s64)CLOCK_TICK_OVERFLOW * NSEC_PER_SEC) / \ > - (s64)CLOCK_TICK_RATE) > - > -/* Because using NSEC_PER_SEC would be too easy */ > -#define NTP_INTERVAL_LENGTH ((((s64)TICK_USEC * NSEC_PER_USEC * USER_HZ) + \ > - CLOCK_TICK_ADJUST) / NTP_INTERVAL_FREQ) > +#define NTP_INTERVAL_LENGTH (NSEC_PER_SEC/NTP_INTERVAL_FREQ) > > /* Returns how long ticks are at present, in ns / 2^(SHIFT_SCALE-10). */ > extern u64 current_tick_length(void); > Index: linux-2.6/kernel/time/ntp.c > =================================================================== > --- linux-2.6.orig/kernel/time/ntp.c > +++ linux-2.6/kernel/time/ntp.c > @@ -42,12 +42,13 @@ long time_esterror = NTP_PHASE_LIMIT; /* > long time_freq; /* frequency offset (scaled ppm)*/ > static long time_reftime; /* time at last adjustment (s) */ > long time_adjust; > +long ntp_tick_adj; > > static void ntp_update_frequency(void) > { > u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) > << TICK_LENGTH_SHIFT; > - second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; > + second_length += (s64)ntp_tick_adj << TICK_LENGTH_SHIFT; > second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); > > tick_length_base = second_length; > @@ -400,3 +401,11 @@ leave: if ((time_status & (STA_UNSYNC|ST > notify_cmos_timer(); > return(result); > } > + > +static int __init ntp_tick_adj_setup(char *str) > +{ > + ntp_tick_adj = simple_strtol(str, NULL, 0); > + return 1; > +} > + > +__setup("ntp_tick_adj=", ntp_tick_adj_setup); > Index: linux-2.6/kernel/time/timekeeping.c > =================================================================== > --- linux-2.6.orig/kernel/time/timekeeping.c > +++ linux-2.6/kernel/time/timekeeping.c > @@ -187,8 +187,7 @@ static void change_clocksource(void) > > clock->error = 0; > clock->xtime_nsec = 0; > - clocksource_calculate_interval(clock, > - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); > + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); > > tick_clock_notify(); > > @@ -245,8 +244,7 @@ void __init timekeeping_init(void) > ntp_clear(); > > clock = clocksource_get_next(); > - clocksource_calculate_interval(clock, > - (unsigned long)(current_tick_length()>>TICK_LENGTH_SHIFT)); > + clocksource_calculate_interval(clock, NTP_INTERVAL_LENGTH); > clock->cycle_last = clocksource_read(clock); > > xtime.tv_sec = sec; ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] Remove obsolete CLOCK_TICK_ADJUST 2008-02-29 23:11 ` john stultz @ 2008-03-02 4:03 ` Roman Zippel 0 siblings, 0 replies; 36+ messages in thread From: Roman Zippel @ 2008-03-02 4:03 UTC (permalink / raw) To: john stultz; +Cc: lkml, Andrew Morton, Ingo Molnar, Steven Rostedt Hi, On Fri, 29 Feb 2008, john stultz wrote: > I'm still not happy about the granularity issue (or non-issue in your > mind), It's an issue, but it's not as big as you seem to think. The error drift should be small and 150ppm isn't too good, but it isn't an error to be too worried about. As the alpha example shows, in the past we already tried to reduce this error anyway, in this case by choosing an appropriate HZ value. > but both my version and your version of the solution are pretty > close at this point and I can rework my difference on-top of this patch > without much trouble. Practically it should be enough to just check the base drift and print a warning message, as it suggest to be more likely a configuration error. bye, Roman ^ permalink raw reply [flat|nested] 36+ messages in thread
* Re: [PATCH] correct inconsistent ntp interval/tick_length usage 2008-02-22 2:39 ` john stultz 2008-02-25 14:44 ` Roman Zippel 2008-02-29 4:49 ` [PATCH] Remove obsolete CLOCK_TICK_ADJUST Roman Zippel @ 2008-02-29 18:54 ` Jörg-Volker Peetz 2 siblings, 0 replies; 36+ messages in thread From: Jörg-Volker Peetz @ 2008-02-29 18:54 UTC (permalink / raw) To: linux-kernel john stultz wrote: <snip> > diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c > index c88b591..fe25c94 100644 > --- a/kernel/time/ntp.c > +++ b/kernel/time/ntp.c > @@ -43,19 +43,32 @@ long time_freq; /* frequency offset (scaled ppm)*/ > static long time_reftime; /* time at last adjustment (s) */ > long time_adjust; > > +static s64 granularity_error_adjust; > + > +void ntp_set_granularity_error(s64 len) > +{ > + granularity_error_adjust = len * NTP_INTERVAL_FREQ; > +} > + > static void ntp_update_frequency(void) > { > u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) > << TICK_LENGTH_SHIFT; > - second_length += (s64)CLOCK_TICK_ADJUST << TICK_LENGTH_SHIFT; > - second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); > + s64 adj; > + > + /* Compensate for clocksource granularity error */ > + second_length += granularity_error_adjust; > + > + /* Scale the base second length by the frequency adjustment */ > + adj = second_length * time_freq; > + do_div(adj, 1000000); > + second_length += adj>>SHIFT_NSEC; > > tick_length_base = second_length; > + do_div(tick_length_base, NTP_INTERVAL_FREQ); > > do_div(second_length, HZ); > tick_nsec = second_length >> TICK_LENGTH_SHIFT; > - > - do_div(tick_length_base, NTP_INTERVAL_FREQ); > } > > /** <snip> Hi John, out of curiosity and inspired by the patch you suggested, I did a test with the following ntp_update_frequency function in kernel/time/ntp.c of kernel 2.6.24.3 using NO_HZ and the hpet timer: static void ntp_update_frequency(void) { s64 adj; u64 second_length = (u64)(tick_usec * NSEC_PER_USEC * USER_HZ) << TICK_LENGTH_SHIFT; printk(KERN_NOTICE "*ntp* timefreq = %lld\n", (s64)time_freq); printk(KERN_NOTICE "*ntp* s len = %lld\n", second_length); printk(KERN_NOTICE "*ntp* corr 1 = %lld\n", (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC)); /* Scale the base second length by the frequency adjustment */ adj = second_length * time_freq; do_div(adj, 1000000); printk(KERN_NOTICE "*ntp* corr 2 = %lld\n", adj>>SHIFT_NSEC); second_length += (s64)time_freq << (TICK_LENGTH_SHIFT - SHIFT_NSEC); tick_length_base = second_length; do_div(tick_length_base, NTP_INTERVAL_FREQ); do_div(second_length, HZ); tick_nsec = second_length >> TICK_LENGTH_SHIFT; } Running this kernel and ntpd I get numbers like the following in my syslog file: Feb 29 12:28:16 skadi kernel: *ntp* timefreq = 305345 Feb 29 12:28:16 skadi kernel: *ntp* s len = 4294967296000000000 Feb 29 12:28:16 skadi kernel: *ntp* corr 1 = 320177438720 Feb 29 12:28:16 skadi kernel: *ntp* corr 2 = 3030411349 Feb 29 12:36:53 skadi kernel: *ntp* timefreq = 730456 Feb 29 12:36:53 skadi kernel: *ntp* s len = 4294967296000000000 Feb 29 12:36:53 skadi kernel: *ntp* corr 1 = 765938630656 Feb 29 12:36:53 skadi kernel: *ntp* corr 2 = 2434829845 Feb 29 12:39:03 skadi kernel: *ntp* timefreq = 868771 Feb 29 12:39:03 skadi kernel: *ntp* s len = 4294967296000000000 Feb 29 12:39:03 skadi kernel: *ntp* corr 1 = 910972420096 Feb 29 12:39:03 skadi kernel: *ntp* corr 2 = 2301870005 So the original correction and the correction suggested by you differ significantly by a factor of approximately 1000. Incidentally, both corrections are nearly neglectable compared to second_length. But I think the correction suggested by you is calculated wrong due to an overflow of the multiplication second_length * time_freq Calculating the correction as (second_length / 1000000) * time_freq the correction would be 1311446788997120000 which is much bigger as the original correction 320177438720 (by a factor of 10^7). Is it normal to have a second_length of approx. 4 * 10^18 ? In what units? -- Regards, Jörg-Volker. ^ permalink raw reply [flat|nested] 36+ messages in thread
end of thread, other threads:[~2008-03-02 4:04 UTC | newest] Thread overview: 36+ messages (download: mbox.gz / follow: Atom feed) -- links below jump to the message on this page -- 2008-01-24 2:38 [PATCH] correct inconsistent ntp interval/tick_length usage john stultz 2008-01-24 9:14 ` Valdis.Kletnieks 2008-01-25 14:07 ` Roman Zippel 2008-01-29 2:28 ` john stultz 2008-01-29 4:02 ` Roman Zippel 2008-01-30 2:14 ` john stultz 2008-01-31 1:55 ` Roman Zippel 2008-01-31 2:16 ` john stultz 2008-01-31 5:02 ` Roman Zippel 2008-02-02 1:02 ` John Stultz 2008-02-08 17:33 ` Roman Zippel 2008-02-09 2:17 ` john stultz 2008-02-09 4:47 ` Roman Zippel 2008-02-09 5:04 ` Andrew Morton 2008-02-09 14:13 ` Roman Zippel 2008-02-10 18:45 ` Roman Zippel 2008-02-12 0:09 ` john stultz 2008-02-12 14:36 ` Roman Zippel 2008-02-14 4:36 ` john stultz 2008-02-16 4:24 ` Roman Zippel 2008-02-19 1:02 ` john stultz 2008-02-19 4:04 ` Roman Zippel 2008-02-20 1:50 ` john stultz 2008-02-20 17:08 ` Roman Zippel 2008-02-22 2:39 ` john stultz 2008-02-25 14:44 ` Roman Zippel 2008-02-29 4:49 ` [PATCH] Remove obsolete CLOCK_TICK_ADJUST Roman Zippel 2008-02-29 6:25 ` Ray Lee 2008-02-29 13:31 ` Roman Zippel 2008-02-29 22:08 ` Andrew Morton 2008-02-29 22:27 ` Roman Zippel 2008-02-29 22:38 ` Andrew Morton 2008-02-29 23:11 ` john stultz 2008-02-29 23:11 ` john stultz 2008-03-02 4:03 ` Roman Zippel 2008-02-29 18:54 ` [PATCH] correct inconsistent ntp interval/tick_length usage Jörg-Volker Peetz
This is a public inbox, see mirroring instructions for how to clone and mirror all data and code used for this inbox; as well as URLs for NNTP newsgroup(s).