From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (majordomo@vger.kernel.org) by vger.kernel.org via listexpand
	id S1031625AbbD2IOU (ORCPT ); Wed, 29 Apr 2015 04:14:20 -0400
Received: from mail.skyhub.de ([78.46.96.112]:46820 "EHLO mail.skyhub.de"
	rhost-flags-OK-OK-OK-OK) by vger.kernel.org with ESMTP
	id S1031044AbbD2IOP (ORCPT ); Wed, 29 Apr 2015 04:14:15 -0400
Date: Wed, 29 Apr 2015 10:13:55 +0200
From: Borislav Petkov
To: "Zheng, Lv"
Cc: linux-edac, Jiri Kosina, Borislav Petkov, "Rafael J. Wysocki",
	Len Brown, "Luck, Tony", Tomasz Nowicki, "Chen, Gong",
	Wolfram Sang, Naoya Horiguchi, "linux-acpi@vger.kernel.org",
	"linux-kernel@vger.kernel.org"
Subject: Re: [RFC PATCH 5/5] GHES: Make NMI handler have a single reader
Message-ID: <20150429081355.GA5498@pd.tnic>
References: <1427448178-20689-1-git-send-email-bp@alien8.de>
	<1427448178-20689-6-git-send-email-bp@alien8.de>
	<1AE640813FDE7649BE1B193DEA596E880270F835@SHSMSX101.ccr.corp.intel.com>
	<20150428135913.GD19025@pd.tnic>
	<1AE640813FDE7649BE1B193DEA596E880270FB3B@SHSMSX101.ccr.corp.intel.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Disposition: inline
In-Reply-To: <1AE640813FDE7649BE1B193DEA596E880270FB3B@SHSMSX101.ccr.corp.intel.com>
User-Agent: Mutt/1.5.23 (2014-03-12)
Sender: linux-kernel-owner@vger.kernel.org
List-ID:
X-Mailing-List: linux-kernel@vger.kernel.org

On Wed, Apr 29, 2015 at 12:49:59AM +0000, Zheng, Lv wrote:
> > > We absolutely want to use atomic_add_unless() because we get to save us
> > > the expensive
> > >
> > > 	LOCK; CMPXCHG
> > >
> > > if the value was already 1. Which is exactly what this patch is trying
> > > to avoid - a thundering herd of cores CMPXCHGing a global variable.
> >
> > IMO, on most architectures, the "cmp" part should work just like what
> > you've done with "if". And on some architectures, if the "xchg" doesn't
> > happen, the "cmp" part even won't cause a pipeline hazard.
Even if CMPXCHG is split into several microops, they all still need to
flow down the pipe and require resources and tracking. And you only know
at retire time what the CMP result is and can "discard" the XCHG part,
provided the uarch is smart enough to do that.

This is probably why CMPXCHG needs 5, 6, 7, 10, 22, ... cycles depending
on uarch and vendor, if I can trust Agner Fog's tables. And I bet those
numbers are best-case only; in real life they probably tend to fall out
even worse.

CMP needs only 1 cycle, on almost every uarch and vendor, and even that
cycle probably gets hidden with a good branch predictor.

> If you mean the LOCK prefix, I understand now.

And that makes it several times worse: 22, 40, 80, ... cycles.

-- 
Regards/Gruss,
    Boris.

ECO tip #101: Trim your mails when you reply.
--