Jaunty SRU #348836 - ext4: fix bb_prealloc_list corruption due to wrong group locking

Tim Gardner tim.gardner at canonical.com
Tue Apr 14 16:57:46 UTC 2009


Given that ext4 is not the default FS, then this patch is likely not
going to make the release candidate (unless I have to upload for some
other show stopper).

Yes - these and other Jaunty SRUs are about to start falling on your head.

rtg

Stefan Bader wrote:
> Does this indicate that Steve did not think this is worth another pre-release 
> upload? You wrote that you got positive feedback, so ACK from me. But as you 
> put up this and the other mail, I take it, this fall onto my head. Then we need 
> one more ACK.
> 
> Tim Gardner wrote:
>> The following changes since commit 1901c1b176e0d8c62aeb1fd9512c45bc6010b180:
>>   Eric Sandeen (1):
>>         ext4: fix bb_prealloc_list corruption due to wrong group locking
>>
>> are available in the git repository at:
>>
>>   git://kernel.ubuntu.com/rtg/ubuntu-jaunty lp348836
>>
>> From 1901c1b176e0d8c62aeb1fd9512c45bc6010b180 Mon Sep 17 00:00:00 2001
>> From: Eric Sandeen <sandeen at redhat.com>
>> Date: Mon, 16 Mar 2009 23:25:40 -0400
>> Subject: [PATCH] ext4: fix bb_prealloc_list corruption due to wrong group locking
>>
>> Bug: #348836
>>
>> This is for Red Hat bug 490026: EXT4 panic, list corruption in
>> ext4_mb_new_inode_pa
>>
>> ext4_lock_group(sb, group) is supposed to protect this list for
>> each group, and a common code flow to remove an album is like
>> this:
>>
>>     ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
>>     ext4_lock_group(sb, grp);
>>     list_del(&pa->pa_group_list);
>>     ext4_unlock_group(sb, grp);
>>
>> so it's critical that we get the right group number back for
>> this prealloc context, to lock the right group (the one
>> associated with this pa) and prevent concurrent list manipulation.
>>
>> however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a
>> comment, "-1 is to protect from crossing allocation group".
>>
>> This makes sense for the group_pa, where pa_pstart is advanced
>> by the length which has been used (in ext4_mb_release_context()),
>> and when the entire length has been used, pa_pstart has been
>> advanced to the first block of the next group.
>>
>> However, for inode_pa, pa_pstart is never advanced; it's just
>> set once to the first block in the group and not moved after
>> that.  So in this case, if we subtract one in ext4_mb_put_pa(),
>> we are actually locking the *previous* group, and opening the
>> race with the other threads which do not subtract off the extra
>> block.
>>
>> Signed-off-by: Eric Sandeen <sandeen at redhat.com>
>> Signed-off-by: "Theodore Ts'o" <tytso at mit.edu>
>> Signed-off-by: Tim Gardner <tim.gardner at canonical.com>
>> ---
>>  fs/ext4/mballoc.c |    9 +++++++--
>>  1 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index f8e923f..add854a 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -3586,6 +3586,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
>>  			struct super_block *sb, struct ext4_prealloc_space *pa)
>>  {
>>  	unsigned long grp;
>> +	ext4_fsblk_t grp_blk;
>>  
>>  	if (!atomic_dec_and_test(&pa->pa_count) || pa->pa_free != 0)
>>  		return;
>> @@ -3600,8 +3601,12 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
>>  	pa->pa_deleted = 1;
>>  	spin_unlock(&pa->pa_lock);
>>  
>> -	/* -1 is to protect from crossing allocation group */
>> -	ext4_get_group_no_and_offset(sb, pa->pa_pstart - 1, &grp, NULL);
>> +	grp_blk = pa->pa_pstart;
>> +	/* If linear, pa_pstart may be in the next group when pa is used up */
>> +	if (pa->pa_linear)
>> +		grp_blk--;
>> +
>> +	ext4_get_group_no_and_offset(sb, grp_blk, &grp, NULL);
>>  
>>  	/*
>>  	 * possible race:
> 
> 


-- 
Tim Gardner tim.gardner at canonical.com




More information about the kernel-team mailing list