LKML Archive on lore.kernel.org
help / color / mirror / Atom feed
* [PATCH] time, ntp: Do not update time_state in middle of leap second
@ 2015-02-04 12:28 Prarit Bhargava
  2015-02-04 16:30 ` [PATCH] time, ntp: Do not update time_state in middle of leap Miroslav Lichvar
  0 siblings, 1 reply; 10+ messages in thread
From: Prarit Bhargava @ 2015-02-04 12:28 UTC (permalink / raw)
  To: linux-kernel; +Cc: Prarit Bhargava, John Stultz, Thomas Gleixner

Resending ...

P.

----8<----

During leap second insertion testing it was noticed that a small window
exists where the time_state could be reset such that
time_state = TIME_OK, which then causes the leap second to not occur, or
causes the entire leap second state machine to fail.

While this is highly unlikely to ever happen in the real world it is
still something we should protect against, as breaking the state machine
is obviously bad.

If the time_state == TIME_OOP (ie, the leap second is in progress) do not
allow an external update to time_state.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
---
 kernel/time/ntp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 28bf91c..f9ebf06 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -534,7 +534,8 @@ void ntp_notify_cmos_timer(void) { }
  */
 static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
 {
-	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
+	if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
+	    (time_state != TIME_OOP)) {
 		time_state = TIME_OK;
 		time_status = STA_UNSYNC;
 		/* restart PPS frequency calibration */
-- 
1.7.9.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] time, ntp: Do not update time_state in middle of leap
  2015-02-04 12:28 [PATCH] time, ntp: Do not update time_state in middle of leap second Prarit Bhargava
@ 2015-02-04 16:30 ` Miroslav Lichvar
  2015-02-05 13:20   ` Prarit Bhargava
  0 siblings, 1 reply; 10+ messages in thread
From: Miroslav Lichvar @ 2015-02-04 16:30 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: John Stultz, Thomas Gleixner, linux-kernel

Prarit Bhargava wrote:
> While this is highly unlikely to ever happen in the real world it is
> still something we should protect against, as breaking the state machine
> is obviously bad.

I'm not sure what exactly breaks here. If the PLL is disabled before
time_state is set to TIME_OOP, the insertion/deletion will be aborted.
If after that, adjtimex() will return with TIME_ERROR as expected, or
not?

>  static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
>  {
> -	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
> +	if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
> +	    (time_state != TIME_OOP)) {
>  		time_state = TIME_OK;
>  		time_status = STA_UNSYNC;
>  		/* restart PPS frequency calibration */

Shouldn't be time_status reset and the PPS calibration restarted even
when state is TIME_OOP?

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] time, ntp: Do not update time_state in middle of leap
  2015-02-04 16:30 ` [PATCH] time, ntp: Do not update time_state in middle of leap Miroslav Lichvar
@ 2015-02-05 13:20   ` Prarit Bhargava
  2015-02-06 10:38     ` Miroslav Lichvar
  0 siblings, 1 reply; 10+ messages in thread
From: Prarit Bhargava @ 2015-02-05 13:20 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: John Stultz, Thomas Gleixner, linux-kernel



On 02/04/2015 11:30 AM, Miroslav Lichvar wrote:
> Prarit Bhargava wrote:
>> While this is highly unlikely to ever happen in the real world it is
>> still something we should protect against, as breaking the state machine
>> is obviously bad.
> 
> I'm not sure what exactly breaks here. If the PLL is disabled before
> time_state is set to TIME_OOP, the insertion/deletion will be aborted.

Yes, that is correct.

> If after that, adjtimex() will return with TIME_ERROR as expected, or
> not?

It is possible that an adjtimex() will set the time_state here back to TIME_OK
and return TIME_OK to userspace.  Again, and I want to stress this, this is
extremely unlikely to happen.  I only hit this due to a bug in a test program.
But at the end of the day, it is possible that this happens and we should
protect against it.


[  942.952833] time_state [1] change from TIME_OK to TIME_INS

Fri Feb 13 18:59:51 2015 + 318126 us    TIME_INS
Fri Feb 13 18:59:51 2015 + 818167 us    TIME_INS
Fri Feb 13 18:59:52 2015 + 318208 us    TIME_INS
Fri Feb 13 18:59:52 2015 + 818248 us    TIME_INS
Fri Feb 13 18:59:53 2015 + 318290 us    TIME_INS
Fri Feb 13 18:59:53 2015 + 818331 us    TIME_INS
Fri Feb 13 18:59:54 2015 + 318372 us    TIME_INS
Fri Feb 13 18:59:54 2015 + 818413 us    TIME_INS
Fri Feb 13 18:59:55 2015 + 318454 us    TIME_INS
Fri Feb 13 18:59:55 2015 + 818495 us    TIME_INS
Fri Feb 13 18:59:56 2015 + 318534 us    TIME_INS
Fri Feb 13 18:59:56 2015 + 818575 us    TIME_INS
Fri Feb 13 18:59:57 2015 + 318617 us    TIME_INS
Fri Feb 13 18:59:57 2015 + 818660 us    TIME_INS
Fri Feb 13 18:59:58 2015 + 318702 us    TIME_INS
Fri Feb 13 18:59:58 2015 + 818744 us    TIME_INS
Fri Feb 13 18:59:59 2015 + 318785 us    TIME_INS
Fri Feb 13 18:59:59 2015 + 818837 us    TIME_INS

[  952.953143] time_state [4] change from TIME_INS to TIME_OOP
[  952.953150] Clock: inserting leap second 23:59:60 UTC
[  953.299905] process_adj_status: insert_leap_sec[1223] setting time_state back
to TIME_OK [1, 1]   <<< adjtimex() call
[  953.299913] time_state [9] change from TIME_OOP to TIME_OK

Fri Feb 13 18:59:59 2015 + 318878 us    TIME_OK
Fri Feb 13 18:59:59 2015 + 818931 us    TIME_OK

[  954.064237] time_state [1] change from TIME_OK to TIME_INS

Fri Feb 13 19:00:00 2015 + 318972 us    TIME_INS
Fri Feb 13 19:00:00 2015 + 819012 us    TIME_INS
Fri Feb 13 19:00:01 2015 + 319051 us    TIME_INS
Fri Feb 13 19:00:01 2015 + 819089 us    TIME_INS
Fri Feb 13 19:00:02 2015 + 319128 us    TIME_INS

P.

> 
>>  static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
>>  {
>> -	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
>> +	if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
>> +	    (time_state != TIME_OOP)) {
>>  		time_state = TIME_OK;
>>  		time_status = STA_UNSYNC;
>>  		/* restart PPS frequency calibration */
> 
> Shouldn't be time_status reset and the PPS calibration restarted even
> when state is TIME_OOP?

No, this should only happen after the leap second is done IMO (which should be
no more than 2 seconds later).

> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] time, ntp: Do not update time_state in middle of leap
  2015-02-05 13:20   ` Prarit Bhargava
@ 2015-02-06 10:38     ` Miroslav Lichvar
  2015-02-06 10:50       ` Prarit Bhargava
  0 siblings, 1 reply; 10+ messages in thread
From: Miroslav Lichvar @ 2015-02-06 10:38 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: John Stultz, Thomas Gleixner, linux-kernel

On Thu, Feb 05, 2015 at 08:20:08AM -0500, Prarit Bhargava wrote:
> On 02/04/2015 11:30 AM, Miroslav Lichvar wrote:
> > If after that, adjtimex() will return with TIME_ERROR as expected, or
> > not?
> 
> It is possible that an adjtimex() will set the time_state here back to TIME_OK
> and return TIME_OK to userspace.  Again, and I want to stress this, this is
> extremely unlikely to happen.  I only hit this due to a bug in a test program.
> But at the end of the day, it is possible that this happens and we should
> protect against it.

Could it break any applications? I guess PLL is normally disabled only
when a time synchronization process ends. FWIW, the reference
nanokernel implementation has this too.

> >> -	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
> >> +	if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
> >> +	    (time_state != TIME_OOP)) {
> >>  		time_state = TIME_OK;
> >>  		time_status = STA_UNSYNC;
> >>  		/* restart PPS frequency calibration */
> > 
> > Shouldn't be time_status reset and the PPS calibration restarted even
> > when state is TIME_OOP?
> 
> No, this should only happen after the leap second is done IMO (which should be
> no more than 2 seconds later).

But that will not happen automatically, the application would have to
enable and disable the PLL again. Interestingly, the "time_status =
STA_UNSYNC" assignment doesn't seem to do anything here, as the
variable is always reset couple lines after that, STA_UNSYNC is not a
readonly flag.

-- 
Miroslav Lichvar

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] time, ntp: Do not update time_state in middle of leap
  2015-02-06 10:38     ` Miroslav Lichvar
@ 2015-02-06 10:50       ` Prarit Bhargava
  0 siblings, 0 replies; 10+ messages in thread
From: Prarit Bhargava @ 2015-02-06 10:50 UTC (permalink / raw)
  To: Miroslav Lichvar; +Cc: John Stultz, Thomas Gleixner, linux-kernel



On 02/06/2015 05:38 AM, Miroslav Lichvar wrote:
> On Thu, Feb 05, 2015 at 08:20:08AM -0500, Prarit Bhargava wrote:
>> On 02/04/2015 11:30 AM, Miroslav Lichvar wrote:
>>> If after that, adjtimex() will return with TIME_ERROR as expected, or
>>> not?
>>
>> It is possible that an adjtimex() will set the time_state here back to TIME_OK
>> and return TIME_OK to userspace.  Again, and I want to stress this, this is
>> extremely unlikely to happen.  I only hit this due to a bug in a test program.
>> But at the end of the day, it is possible that this happens and we should
>> protect against it.
> 
> Could it break any applications? I guess PLL is normally disabled only
> when a time synchronization process ends. FWIW, the reference
> nanokernel implementation has this too.

Not that I saw.  I did take a look with top, etc., to see if anything in
userspace went bad, and I ran programs that were calling gettimeofday() and
clock_gettime() to see if there were any problems.  I didn't see anything.  I
also played around with a program to see if the timer expiry failed but again,
didn't see anything.

The outcome of TIME_INS->TIME_OOP->TIME_OK->TIME_INS, AFAICT was only that
TIME_INS was left issued.  Which could lead to another leap second insertion
down the road unless ntp (or some other program) was left to reset the state.

> 
>>>> -	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
>>>> +	if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
>>>> +	    (time_state != TIME_OOP)) {
>>>>  		time_state = TIME_OK;
>>>>  		time_status = STA_UNSYNC;
>>>>  		/* restart PPS frequency calibration */
>>>
>>> Shouldn't be time_status reset and the PPS calibration restarted even
>>> when state is TIME_OOP?
>>
>> No, this should only happen after the leap second is done IMO (which should be
>> no more than 2 seconds later).
> 
> But that will not happen automatically, the application would have to
> enable and disable the PLL again. Interestingly, the "time_status =
> STA_UNSYNC" assignment doesn't seem to do anything here, as the

Hmmm ... good point.  I didn't think of that.  Let me go back and change the
code to do the reset.

> variable is always reset couple lines after that, STA_UNSYNC is not a
> readonly flag.
> 

P.

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second
  2015-02-10 23:47 ` John Stultz
@ 2015-02-11 10:47   ` Prarit Bhargava
  0 siblings, 0 replies; 10+ messages in thread
From: Prarit Bhargava @ 2015-02-11 10:47 UTC (permalink / raw)
  To: John Stultz; +Cc: lkml, Thomas Gleixner, Miroslav Lichvar



On 02/10/2015 06:47 PM, John Stultz wrote:
> On Sun, Feb 8, 2015 at 2:29 AM, Prarit Bhargava <prarit@redhat.com> wrote:
>> During leap second insertion testing it was noticed that a small window
>> exists where the time_state could be reset such that
>> time_state = TIME_OK, which then causes the leap second to not occur, or
>> causes the entire leap second state machine to fail.
> 
> 
> I think this description is fairly opaque, and probably needs the
> specific example of the state change transitions that motivates this
> patch.
> 
>> While this is highly unlikely to ever happen in the real world it is
>> still something we should protect against, as breaking the state machine
>> is obviously bad.
> 
> In this case it was a test-case bug where uninitialized data being
> passed to adjtimex (when the test intended to only read the time
> state) was causing an unexpected state change transition. So its not
> immediately obvious that resetting the state machine when the root
> called adjtimex is invalid, so it would be good to make this more
> clear and explicit (ie: show the expected state transitions and the
> command that caused the strange transition you saw).
> 
> Sorry for the slow response here, I've been on the fence as to if this
> is the right thing or not, and have needed to get some time to stare
> at this a bit more to see if I can convince myself its the right
> thing, so improving the commit message might make it more obvious to
> me and others. :)

Will do :)  I'll write up a proper and detailed description.  My bad.

P.

> 
> thanks
> -john
> 

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second
  2015-02-07 18:29 [PATCH] time, ntp: Do not update time_state in middle of leap second Prarit Bhargava
  2015-02-10 14:01 ` Peter Zijlstra
@ 2015-02-10 23:47 ` John Stultz
  2015-02-11 10:47   ` Prarit Bhargava
  1 sibling, 1 reply; 10+ messages in thread
From: John Stultz @ 2015-02-10 23:47 UTC (permalink / raw)
  To: Prarit Bhargava; +Cc: lkml, Thomas Gleixner, Miroslav Lichvar

On Sun, Feb 8, 2015 at 2:29 AM, Prarit Bhargava <prarit@redhat.com> wrote:
> During leap second insertion testing it was noticed that a small window
> exists where the time_state could be reset such that
> time_state = TIME_OK, which then causes the leap second to not occur, or
> causes the entire leap second state machine to fail.


I think this description is fairly opaque, and probably needs the
specific example of the state change transitions that motivates this
patch.

> While this is highly unlikely to ever happen in the real world it is
> still something we should protect against, as breaking the state machine
> is obviously bad.

In this case it was a test-case bug where uninitialized data being
passed to adjtimex (when the test intended to only read the time
state) was causing an unexpected state change transition. So its not
immediately obvious that resetting the state machine when the root
called adjtimex is invalid, so it would be good to make this more
clear and explicit (ie: show the expected state transitions and the
command that caused the strange transition you saw).

Sorry for the slow response here, I've been on the fence as to if this
is the right thing or not, and have needed to get some time to stare
at this a bit more to see if I can convince myself its the right
thing, so improving the commit message might make it more obvious to
me and others. :)

thanks
-john

^ permalink raw reply	[flat|nested] 10+ messages in thread

* Re: [PATCH] time, ntp: Do not update time_state in middle of leap second
  2015-02-07 18:29 [PATCH] time, ntp: Do not update time_state in middle of leap second Prarit Bhargava
@ 2015-02-10 14:01 ` Peter Zijlstra
  2015-02-10 23:47 ` John Stultz
  1 sibling, 0 replies; 10+ messages in thread
From: Peter Zijlstra @ 2015-02-10 14:01 UTC (permalink / raw)
  To: Prarit Bhargava
  Cc: linux-kernel, John Stultz, Thomas Gleixner, Miroslav Lichvar

On Sat, Feb 07, 2015 at 01:29:39PM -0500, Prarit Bhargava wrote:
> During leap second insertion testing it was noticed that a small window
> exists where the time_state could be reset such that
> time_state = TIME_OK, which then causes the leap second to not occur, or
> causes the entire leap second state machine to fail.
> 
> While this is highly unlikely to ever happen in the real world it is
> still something we should protect against, as breaking the state machine
> is obviously bad.
> 
> If the time_state == TIME_OOP (ie, the leap second is in progress) do not
> allow an external update to time_state.
> 
> [v2]: Only block time_state change when TIME_OOP
> 
> Signed-off-by: Prarit Bhargava <prarit@redhat.com>
> Cc: John Stultz <john.stultz@linaro.org>
> Cc: Thomas Gleixner <tglx@linutronix.de>
> Cc: Miroslav Lichvar <mlichvar@redhat.com>

John, ACK?

^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] time, ntp: Do not update time_state in middle of leap second
@ 2015-02-07 18:29 Prarit Bhargava
  2015-02-10 14:01 ` Peter Zijlstra
  2015-02-10 23:47 ` John Stultz
  0 siblings, 2 replies; 10+ messages in thread
From: Prarit Bhargava @ 2015-02-07 18:29 UTC (permalink / raw)
  To: linux-kernel
  Cc: Prarit Bhargava, John Stultz, Thomas Gleixner, Miroslav Lichvar

During leap second insertion testing it was noticed that a small window
exists where the time_state could be reset such that
time_state = TIME_OK, which then causes the leap second to not occur, or
causes the entire leap second state machine to fail.

While this is highly unlikely to ever happen in the real world it is
still something we should protect against, as breaking the state machine
is obviously bad.

If the time_state == TIME_OOP (ie, the leap second is in progress) do not
allow an external update to time_state.

[v2]: Only block time_state change when TIME_OOP

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
Cc: John Stultz <john.stultz@linaro.org>
Cc: Thomas Gleixner <tglx@linutronix.de>
Cc: Miroslav Lichvar <mlichvar@redhat.com>
---
 kernel/time/ntp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 28bf91c..6ff5cd5 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -535,7 +535,8 @@ void ntp_notify_cmos_timer(void) { }
 static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
 {
 	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
-		time_state = TIME_OK;
+		if (time_state != TIME_OOP)
+			time_state = TIME_OK;
 		time_status = STA_UNSYNC;
 		/* restart PPS frequency calibration */
 		pps_reset_freq_interval();
-- 
1.7.9.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

* [PATCH] time, ntp: Do not update time_state in middle of leap second
@ 2015-01-29 13:35 Prarit Bhargava
  0 siblings, 0 replies; 10+ messages in thread
From: Prarit Bhargava @ 2015-01-29 13:35 UTC (permalink / raw)
  To: linux-kernel; +Cc: Prarit Bhargava

During leap second insertion testing it was noticed that a small window
exists where the time_state could be reset such that
time_state = TIME_OK, which then causes the leap second to not occur, or
causes the entire leap second state machine to fail.

While this is highly unlikely to ever happen in the real world it is
still something we should protect against, as breaking the state machine
is obviously bad.

If the time_state == TIME_OOP (ie, the leap second is in progress) do not
allow an external update to time_state.

Signed-off-by: Prarit Bhargava <prarit@redhat.com>
---
 kernel/time/ntp.c |    3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

diff --git a/kernel/time/ntp.c b/kernel/time/ntp.c
index 28bf91c..f9ebf06 100644
--- a/kernel/time/ntp.c
+++ b/kernel/time/ntp.c
@@ -534,7 +534,8 @@ void ntp_notify_cmos_timer(void) { }
  */
 static inline void process_adj_status(struct timex *txc, struct timespec64 *ts)
 {
-	if ((time_status & STA_PLL) && !(txc->status & STA_PLL)) {
+	if ((time_status & STA_PLL) && !(txc->status & STA_PLL) &&
+	    (time_state != TIME_OOP)) {
 		time_state = TIME_OK;
 		time_status = STA_UNSYNC;
 		/* restart PPS frequency calibration */
-- 
1.7.9.3


^ permalink raw reply	[flat|nested] 10+ messages in thread

end of thread, other threads:[~2015-02-11 10:47 UTC | newest]

Thread overview: 10+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-02-04 12:28 [PATCH] time, ntp: Do not update time_state in middle of leap second Prarit Bhargava
2015-02-04 16:30 ` [PATCH] time, ntp: Do not update time_state in middle of leap Miroslav Lichvar
2015-02-05 13:20   ` Prarit Bhargava
2015-02-06 10:38     ` Miroslav Lichvar
2015-02-06 10:50       ` Prarit Bhargava
  -- strict thread matches above, loose matches on Subject: below --
2015-02-07 18:29 [PATCH] time, ntp: Do not update time_state in middle of leap second Prarit Bhargava
2015-02-10 14:01 ` Peter Zijlstra
2015-02-10 23:47 ` John Stultz
2015-02-11 10:47   ` Prarit Bhargava
2015-01-29 13:35 Prarit Bhargava

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for NNTP newsgroup(s).