Jaunty SRU #348836 - ext4: fix bb_prealloc_list corruption due to wrong group locking
Tim Gardner
tim.gardner at canonical.com
Tue Apr 14 16:57:46 UTC 2009
Given that ext4 is not the default FS, then this patch is likely not
going to make the release candidate (unless I have to upload for some
other show stopper).
Yes - these and other Jaunty SRUs are about to start falling on your head.
rtg
Stefan Bader wrote:
> Does this indicate that Steve did not think this is worth another pre-release
> upload? You wrote that you got positive feedback, so ACK from me. But as you
> put up this and the other mail, I take it, this fall onto my head. Then we need
> one more ACK.
>
> Tim Gardner wrote:
>> The following changes since commit 1901c1b176e0d8c62aeb1fd9512c45bc6010b180:
>> Eric Sandeen (1):
>> ext4: fix bb_prealloc_list corruption due to wrong group locking
>>
>> are available in the git repository at:
>>
>> git://kernel.ubuntu.com/rtg/ubuntu-jaunty lp348836
>>
>> From 1901c1b176e0d8c62aeb1fd9512c45bc6010b180 Mon Sep 17 00:00:00 2001
>> From: Eric Sandeen <sandeen at redhat.com>
>> Date: Mon, 16 Mar 2009 23:25:40 -0400
>> Subject: [PATCH] ext4: fix bb_prealloc_list corruption due to wrong group locking
>>
>> Bug: #348836
>>
>> This is for Red Hat bug 490026: EXT4 panic, list corruption in
>> ext4_mb_new_inode_pa
>>
>> ext4_lock_group(sb, group) is supposed to protect this list for
>> each group, and a common code flow to remove an album is like
>> this:
>>
>> ext4_get_group_no_and_offset(sb, pa->pa_pstart, &grp, NULL);
>> ext4_lock_group(sb, grp);
>> list_del(&pa->pa_group_list);
>> ext4_unlock_group(sb, grp);
>>
>> so it's critical that we get the right group number back for
>> this prealloc context, to lock the right group (the one
>> associated with this pa) and prevent concurrent list manipulation.
>>
>> however, ext4_mb_put_pa() passes in (pa->pa_pstart - 1) with a
>> comment, "-1 is to protect from crossing allocation group".
>>
>> This makes sense for the group_pa, where pa_pstart is advanced
>> by the length which has been used (in ext4_mb_release_context()),
>> and when the entire length has been used, pa_pstart has been
>> advanced to the first block of the next group.
>>
>> However, for inode_pa, pa_pstart is never advanced; it's just
>> set once to the first block in the group and not moved after
>> that. So in this case, if we subtract one in ext4_mb_put_pa(),
>> we are actually locking the *previous* group, and opening the
>> race with the other threads which do not subtract off the extra
>> block.
>>
>> Signed-off-by: Eric Sandeen <sandeen at redhat.com>
>> Signed-off-by: "Theodore Ts'o" <tytso at mit.edu>
>> Signed-off-by: Tim Gardner <tim.gardner at canonical.com>
>> ---
>> fs/ext4/mballoc.c | 9 +++++++--
>> 1 files changed, 7 insertions(+), 2 deletions(-)
>>
>> diff --git a/fs/ext4/mballoc.c b/fs/ext4/mballoc.c
>> index f8e923f..add854a 100644
>> --- a/fs/ext4/mballoc.c
>> +++ b/fs/ext4/mballoc.c
>> @@ -3586,6 +3586,7 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
>> struct super_block *sb, struct ext4_prealloc_space *pa)
>> {
>> unsigned long grp;
>> + ext4_fsblk_t grp_blk;
>>
>> if (!atomic_dec_and_test(&pa->pa_count) || pa->pa_free != 0)
>> return;
>> @@ -3600,8 +3601,12 @@ static void ext4_mb_put_pa(struct ext4_allocation_context *ac,
>> pa->pa_deleted = 1;
>> spin_unlock(&pa->pa_lock);
>>
>> - /* -1 is to protect from crossing allocation group */
>> - ext4_get_group_no_and_offset(sb, pa->pa_pstart - 1, &grp, NULL);
>> + grp_blk = pa->pa_pstart;
>> + /* If linear, pa_pstart may be in the next group when pa is used up */
>> + if (pa->pa_linear)
>> + grp_blk--;
>> +
>> + ext4_get_group_no_and_offset(sb, grp_blk, &grp, NULL);
>>
>> /*
>> * possible race:
>
>
--
Tim Gardner tim.gardner at canonical.com
More information about the kernel-team
mailing list