Subject: Re: [PATCH v1 4/4] iommu/tegra: gart: Optimize map/unmap
From: Dmitry Osipenko
To: Joerg Roedel
Cc: Robin Murphy, Thierry Reding, linux-tegra@vger.kernel.org, iommu@lists.linux-foundation.org, linux-kernel@vger.kernel.org, Jonathan Hunter
References: <20180427100202.GO30388@ulmo> <716edf58-38a7-21e5-1668-b866bf392e34@arm.com> <6827bda3-1aa2-da60-a749-8e2dd2e595f3@gmail.com> <20180507080420.GB18595@8bytes.org>
Message-ID: <8967e349-3af1-af17-dfa2-187d06dca18c@gmail.com>
Date: Mon, 7 May 2018 20:38:31 +0300

On 07.05.2018 18:51, Dmitry Osipenko wrote:
[snip]
> Secondly, the interesting part is that mapping / unmapping of a contiguous
> allocation (CMA using DMA API) is slower by ~50% then doing it for a sparse
> allocation (get_pages using bare IOMMU API). /I think/ it's a shortcoming of the
> arch/arm/mm/dma-mapping.c, which also suffers from other inflexibilities that
> Thierry faced recently. Though I haven't really tried to figure out what is the
> bottleneck yet and Thierry was going to re-write ARM's dma-mapping
> implementation anyway, I'll take a closer look at this issue a bit later.

Please scratch my accusation of ARM's dma-mapping, it's not the culprit at all.
I completely forgot that in the case of a sparse allocation the display's framebuffer IOMMU mapping is "pinned" to the GART, and hence it was not being dynamically mapped / unmapped during my testing. I also forgot to set the CPU freq governor to "performance"; doing so reduced the above perf difference from 50% to 20%. The rest of the testing is unaffected: flushing once after the whole mapping is still much more efficient than flushing after the modification of each page entry. And yet again, the performance of sparse mapping is nearly the same as that of contiguous mapping unless the sparse allocation is large and _very_ fragmented.
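
To illustrate the flushing point, here is a rough userspace toy model, not the driver code. It only counts how many flushes each strategy issues when mapping a run of pages. Every name in it (toy_gart, toy_map_flush_each, toy_map_flush_once, the TOY_* constants) is made up for this sketch, and the "flush" is just a counter standing in for whatever the real hardware flush costs.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_GART_ENTRIES 64u    /* hypothetical aperture size, in pages */
#define TOY_PAGE_SIZE    4096u  /* hypothetical page size */
#define TOY_PTE_VALID    1u     /* hypothetical "entry valid" bit */

/* Toy stand-in for the GART page table plus a flush counter. */
struct toy_gart {
	uint32_t pte[TOY_GART_ENTRIES];
	unsigned long flushes;
};

/* Stand-in for the (comparatively expensive) hardware flush. */
static void toy_gart_flush(struct toy_gart *g)
{
	g->flushes++;
}

/* Write one page table entry, without flushing. */
static void toy_gart_set_pte(struct toy_gart *g, size_t idx, uint32_t val)
{
	assert(idx < TOY_GART_ENTRIES);
	g->pte[idx] = val;
}

/* Strategy A: flush after each modified entry. Returns flushes issued. */
static unsigned long toy_map_flush_each(struct toy_gart *g, size_t first,
					size_t npages, uint32_t pa_base)
{
	unsigned long before = g->flushes;

	for (size_t i = 0; i < npages; i++) {
		toy_gart_set_pte(g, first + i,
				 (pa_base + (uint32_t)i * TOY_PAGE_SIZE) | TOY_PTE_VALID);
		toy_gart_flush(g);
	}
	return g->flushes - before;
}

/* Strategy B: write all entries, then flush once. Returns flushes issued. */
static unsigned long toy_map_flush_once(struct toy_gart *g, size_t first,
					size_t npages, uint32_t pa_base)
{
	unsigned long before = g->flushes;

	for (size_t i = 0; i < npages; i++)
		toy_gart_set_pte(g, first + i,
				 (pa_base + (uint32_t)i * TOY_PAGE_SIZE) | TOY_PTE_VALID);
	toy_gart_flush(g);
	return g->flushes - before;
}
```

Mapping 16 pages costs 16 flushes with toy_map_flush_each() and a single flush with toy_map_flush_once(), while both leave the same entries in the table. How much wall-clock time that saves on real hardware depends on the cost of the actual flush, which this model does not try to capture.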