[PATCH 092/104] mm: fix aio performance regression for database caused by THP

Mon Sep 30 13:31:35 UTC 2013

On 09/30/2013 07:26 AM, Greg Kroah-Hartman wrote:
> On Mon, Sep 30, 2013 at 03:14:52PM +0200, Jack Wang wrote:
>> On 09/30/2013 12:11 PM, Luis Henriques wrote:
>>> 3.5.7.22 -stable review patch.  If anyone has any objections, please let me know.
>>>
>>> ------------------
>>>
>>> From: Khalid Aziz <khalid.aziz at oracle.com>
>>>
>>> commit 7cb2ef56e6a8b7b368b2e883a0a47d02fed66911 upstream.
>>>
>>> I am working with a tool that simulates Oracle database I/O workload.
>>> This tool (orion to be specific -
>>> <http://docs.oracle.com/cd/E11882_01/server.112/e16638/iodesign.htm#autoId24>)
>>> allocates hugetlbfs pages using shmget() with SHM_HUGETLB flag.  It then
>>> does aio into these pages from flash disks using various common block
>>> sizes used by databases.  I am looking at performance with two of the most
>>> common block sizes - 1M and 64K.  aio performance with these two block
>>> sizes plunged after Transparent HugePages was introduced in the kernel.
>>> Here are performance numbers:
>>>
>>> 		pre-THP		2.6.39		3.11-rc5
>>> 1M read		8384 MB/s	5629 MB/s	6501 MB/s
>>> 64K read	7867 MB/s	4576 MB/s	4251 MB/s
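
For readers unfamiliar with the allocation path mentioned above (shmget() with the SHM_HUGETLB flag), here is a minimal userspace sketch. The helper names hugetlb_shm_attach() and hugetlb_shm_release() are made up for illustration and are not taken from orion or from the patch; the sketch also assumes huge pages have been reserved on the system, otherwise shmget() simply fails and the helper returns NULL.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/ipc.h>
#include <sys/shm.h>

/* Fallback in case the libc headers do not expose the flag (Linux value). */
#ifndef SHM_HUGETLB
#define SHM_HUGETLB 04000
#endif

/*
 * Hypothetical helper: create a private SysV shared memory segment backed
 * by hugetlbfs pages and attach it.  len should be a multiple of the huge
 * page size.  Returns the mapped address, or NULL on failure, in which
 * case *shmid_out is set to -1.
 */
void *hugetlb_shm_attach(size_t len, int *shmid_out)
{
    void *addr;
    int id = shmget(IPC_PRIVATE, len, SHM_HUGETLB | IPC_CREAT | 0600);

    *shmid_out = -1;
    if (id < 0)
        return NULL;                    /* e.g. no huge pages reserved */

    addr = shmat(id, NULL, 0);
    if (addr == (void *)-1) {
        shmctl(id, IPC_RMID, NULL);     /* do not leak the segment */
        return NULL;
    }

    *shmid_out = id;
    return addr;
}

/* Detach and remove a segment obtained from hugetlb_shm_attach(). */
void hugetlb_shm_release(void *addr, int shmid)
{
    assert(addr != NULL);
    shmdt(addr);
    shmctl(shmid, IPC_RMID, NULL);
}
```

Buffers inside such a segment are what the aio reads land in, which is why every I/O ends up taking and dropping references on hugetlbfs pages.
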
>>>
>>> I have narrowed the performance impact down to the overheads introduced by
>>> THP in __get_page_tail() and put_compound_page() routines.  perf top shows
>>> more than 40% of cycles being spent in these two routines.  Every time direct I/O
>>> to hugetlbfs pages starts, the kernel calls get_page() to grab a reference to
>>> the pages and calls put_page() when I/O completes to put the reference
>>> away.  THP introduced a significant amount of locking overhead to get_page()
>>> and put_page() when dealing with compound pages because hugepages can be
>>> split underneath get_page() and put_page().  It added this overhead
>>> irrespective of whether it is dealing with hugetlbfs pages or transparent
>>> hugepages.  This resulted in a 20%-45% drop in aio performance when using
>>> hugetlbfs pages.
>>>
>>> Since hugetlbfs pages cannot be split, there is no reason to go through
>>> all the locking overhead for these pages from what I can see.  I added
>>> code to __get_page_tail() and put_compound_page() to bypass all the
>>> locking code when working with hugetlbfs pages.  This improved performance
>>> significantly.  Performance numbers with this patch:
>>>
>>> 		pre-THP		3.11-rc5	3.11-rc5 + Patch
>>> 1M read		8384 MB/s	6501 MB/s	8371 MB/s
>>> 64K read	7867 MB/s	4251 MB/s	6510 MB/s
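
The idea of the bypass described above can be illustrated with a small userspace model. This is only a sketch under loud assumptions: struct model_page, its fields, and the model_* functions are invented for illustration and are not the kernel's struct page, __get_page_tail() or put_compound_page(); the real code uses atomic reference counts and real locking, which the model replaces with a plain counter and a "slow path" hit count.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-in for a page descriptor; all field names are illustrative. */
struct model_page {
    struct model_page *head;    /* compound head page (or the page itself) */
    bool is_hugetlbfs;          /* set on the head of a hugetlbfs huge page */
    int refcount;               /* reference count, kept on the head */
    int slow_path_hits;         /* times the split-safe slow path was taken */
};

/* Models taking a reference through a tail page. */
void model_get_page_tail(struct model_page *page)
{
    struct model_page *head = page->head;

    if (head->is_hugetlbfs) {
        /* Fast path: hugetlbfs pages cannot be split, so skip the locking. */
        head->refcount++;
        return;
    }

    /* Slow path: stands in for the locking needed to guard against a split. */
    head->slow_path_hits++;
    head->refcount++;
}

/* Models dropping a reference on a compound page. */
void model_put_compound_page(struct model_page *page)
{
    struct model_page *head = page->head;

    assert(head->refcount > 0);
    if (!head->is_hugetlbfs)
        head->slow_path_hits++;     /* same split-safe bookkeeping as above */
    head->refcount--;
}
```

In this model a get/put pair on a hugetlbfs page never touches the slow path, while the same pair on a transparent hugepage pays for it twice, once on each side of the I/O.
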
>>>
>>> Performance with 64K read is still lower than what it was before THP, but
>>> still a 53% improvement.  It does mean there is more work to be done but I
>>> will take a 53% improvement for now.
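
As a quick check of the percentages quoted above, the arithmetic can be written down as a tiny helper. The function name percent_change() is just an illustrative choice; the inputs are the MB/s figures from the tables.

```c
#include <assert.h>

/* Percent change going from before to after; a negative result is a drop. */
double percent_change(double before, double after)
{
    assert(before > 0.0);
    return (after - before) / before * 100.0;
}
```

With the table values, percent_change(4251, 6510) comes out at roughly 53.1 (the 64K improvement from the patch), while percent_change(8384, 6501) and percent_change(7867, 4251) come out at roughly -22.5 and -46.0, in line with the approximate 20%-45% drop cited for 3.11-rc5.
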
>>>
>>> Please take a look at the following patch and let me know if it looks
>>> reasonable.
>>>
>>> [akpm at linux-foundation.org: tweak comments]
>>> Signed-off-by: Khalid Aziz <khalid.aziz at oracle.com>
>>> Cc: Pravin B Shelar <pshelar at nicira.com>
>>> Cc: Christoph Lameter <cl at linux.com>
>>> Cc: Andrea Arcangeli <aarcange at redhat.com>
>>> Cc: Johannes Weiner <hannes at cmpxchg.org>
>>> Cc: Mel Gorman <mel at csn.ul.ie>
>>> Cc: Rik van Riel <riel at redhat.com>
>>> Cc: Minchan Kim <minchan at kernel.org>
>>> Cc: Andi Kleen <andi at firstfloor.org>
>>> Signed-off-by: Andrew Morton <akpm at linux-foundation.org>
>>> Signed-off-by: Linus Torvalds <torvalds at linux-foundation.org>
>>> [ luis: backported to 3.5: adjusted context ]
>>> Signed-off-by: Luis Henriques <luis.henriques at canonical.com>
>> Hi Greg,
>>
>> I suppose this patch is also needed for 3.4, right?
>
> As it didn't originally apply there, I didn't apply it.
>
> If people think it should be applicable for 3.4, I'll take it.
>
> thanks,
>
> greg k-h
>

Hi Greg,

I did send you a backported version of this patch to apply to 3.0, 3.2
and 3.4 last Monday and cc'd stable at vger.kernel.org. That patch should
apply cleanly to those three kernels.

--
Khalid