[Bug 852760] Re: valgrind false positives on gcc-generated string routines
Bug Watch Updater
852760 at bugs.launchpad.net
Fri Jun 15 13:03:24 UTC 2012
Launchpad has imported 15 comments from the remote bug at
https://bugs.kde.org/show_bug.cgi?id=264936.
If you reply to an imported comment from within Launchpad, your comment
will be sent to the remote bug automatically. Read more about
Launchpad's inter-bugtracker facilities at
https://help.launchpad.net/InterBugTracking.
------------------------------------------------------------------------
On 2011-01-31T12:17:11+00:00 Joost-vandevondele wrote:
This bug report relates to two (closed invalid) bug reports in gcc
bugzilla.
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183
PR47522 includes a runnable example in the first comment.
The issue appears to be that vectorization can result in code that loads
elements beyond the last element of an allocated array. However, these
loads will only happen for unaligned data, in situations where the access
to the last+1 element can't trigger a page fault or other side effects
(according to my interpretation of comments by gcc developers), and the
extra elements are never used. As such, this is considered valid.
Since this kind of code will be produced increasingly by gcc, especially
for numerical codes (essentially whenever vectorization triggers), it
would be great to have this handled somehow in valgrind.
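For orientation, a hedged C analogue of the kind of loop being described might look like the following (the PR's own testcase is Fortran; whether a given gcc version actually emits an over-reading vector load here depends on flags, target and alignment assumptions, so treat this purely as a sketch of the scenario):

#include <stdlib.h>

/* Illustrative only: a scalar reduction over two heap arrays of odd
   length, similar in spirit to the Fortran loop in PR47522.  If the
   vectorizer processes two doubles per iteration, the final vector
   load may touch the 8 bytes just past the allocation; those bytes
   never influence the result. */
double dot(const double *a, const double *b, int n)
{
    double s = 0.0;
    for (int i = 0; i < n; i++)
        s += a[i] * b[i];
    return s;
}

int main(void)
{
    int n = 17;                          /* deliberately not a multiple of the vector width */
    double *a = malloc(n * sizeof *a);
    double *b = malloc(n * sizeof *b);
    for (int i = 0; i < n; i++) { a[i] = i; b[i] = 1.0; }
    double s = dot(a, b, n);
    free(a);
    free(b);
    return s == 136.0 ? 0 : 1;           /* 0 + 1 + ... + 16 = 136 */
}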
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/0
------------------------------------------------------------------------
On 2011-01-31T12:58:55+00:00 Jseward wrote:
> http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522#c4
>
> I think valgrind should simply special-case these kind of out of bounds
> checks based on the instruction that was used.
Great. Why don't you tell me then how I am supposed to differentiate
between a vector load that is deliberately out of bounds vs one that is
out of bounds by accident, so I can emit an error for the latter but
not for the former?
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/1
------------------------------------------------------------------------
On 2011-01-31T13:06:33+00:00 Joost-vandevondele wrote:
(In reply to comment #1)
> Great. Why don't you tell me then how I am supposed to differentiate
> between a vector load that is deliberately out of bounds vs one that is
> out of bounds by accident, so I can emit an error for the latter but
> not for the former?
Hey.... I'm a user, you're the developer ;-)
I'm really not the right person to ask. I guess there are some
signatures... it is a vector load, with at least one element that is
still part of an allocated array. Additionally, based on alignment, the
'offending load(s)' cannot cross a page boundary. Finally, the loaded
byte(s) propagate as uninitialized data, but never trigger the 'used
uninitialized' error. I suppose you might get more details in the
gcc bugzilla.
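A hedged sketch of that signature, written as the predicate a checker might apply (is_addressable() is a hypothetical placeholder standing in for memcheck's real addressability lookup, and the 4 KiB page size is an assumption):

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define PAGE_SIZE 4096u            /* assumption: 4 KiB pages */

/* Placeholder for memcheck's per-byte addressability query;
   not a real valgrind API. */
bool is_addressable(uintptr_t addr);

/* A load matches the signature if it is naturally aligned and at
   least its first byte lies inside an allocated block.  Such a load
   could be treated as a benign over-read, with the out-of-range bytes
   tracked as undefined instead of reported immediately. */
bool looks_like_benign_overread(uintptr_t addr, size_t len)
{
    if (addr % len != 0)           /* not naturally aligned */
        return false;
    if (!is_addressable(addr))     /* leading byte must be in a block */
        return false;
    /* The access must not cross a page boundary (so it cannot fault);
       with natural alignment this always holds, but check explicitly. */
    return addr / PAGE_SIZE == (addr + len - 1) / PAGE_SIZE;
}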
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/2
------------------------------------------------------------------------
On 2011-01-31T13:14:09+00:00 Jseward wrote:
Can you objdump -d the loop containing the complained-about load,
and post the results?
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/3
------------------------------------------------------------------------
On 2011-01-31T13:34:08+00:00 Joost-vandevondele wrote:
So the valgrind message I have is:
==12860== Invalid read of size 8
==12860== at 0x400A38: integrate_gf_npbc_ (in /data03/vondele/bugs/valgrind/a.out)
==12860== by 0x40245B: main (in /data03/vondele/bugs/valgrind/a.out)
==12860== Address 0x58e9e40 is 0 bytes after a block of size 272 alloc'd
==12860== at 0x4C26C3A: malloc (in /usr/lib64/valgrind/vgpreload_memcheck-amd64-linux.so)
==12860== by 0x402209: main (in /data03/vondele/bugs/valgrind/a.out)
The corresponding asm from objdump is:
0000000000400720 <integrate_gf_npbc_>:
400720: 41 57 push %r15
400722: 41 56 push %r14
400724: 41 55 push %r13
400726: 41 54 push %r12
400728: 49 89 fc mov %rdi,%r12
40072b: 31 ff xor %edi,%edi
40072d: 55 push %rbp
40072e: 53 push %rbx
40072f: 48 83 ec 50 sub $0x50,%rsp
400733: 49 63 18 movslq (%r8),%rbx
400736: 45 8b 09 mov (%r9),%r9d
400739: 48 89 54 24 20 mov %rdx,0x20(%rsp)
40073e: 49 63 50 04 movslq 0x4(%r8),%rdx
400742: 48 89 74 24 a0 mov %rsi,-0x60(%rsp)
400747: 49 63 70 08 movslq 0x8(%r8),%rsi
40074b: 48 8b 84 24 b0 00 00 mov 0xb0(%rsp),%rax
400752: 00
400753: 48 83 c2 01 add $0x1,%rdx
400757: 48 29 da sub %rbx,%rdx
40075a: 48 0f 48 d7 cmovs %rdi,%rdx
40075e: 48 89 54 24 f0 mov %rdx,-0x10(%rsp)
400763: 49 63 50 0c movslq 0xc(%r8),%rdx
400767: 48 8b 6c 24 f0 mov -0x10(%rsp),%rbp
40076c: 48 83 c2 01 add $0x1,%rdx
400770: 48 29 f2 sub %rsi,%rdx
400773: 48 0f af 54 24 f0 imul -0x10(%rsp),%rdx
400779: 48 85 d2 test %rdx,%rdx
40077c: 48 0f 49 fa cmovns %rdx,%rdi
400780: 48 89 da mov %rbx,%rdx
400783: 48 01 db add %rbx,%rbx
400786: 48 0f af ee imul %rsi,%rbp
40078a: 48 89 7c 24 c0 mov %rdi,-0x40(%rsp)
40078f: 48 f7 da neg %rdx
400792: 49 63 78 10 movslq 0x10(%r8),%rdi
400796: 48 01 f6 add %rsi,%rsi
400799: 48 f7 d3 not %rbx
40079c: 48 f7 d6 not %rsi
40079f: 48 89 5c 24 b0 mov %rbx,-0x50(%rsp)
4007a4: 44 89 4c 24 cc mov %r9d,-0x34(%rsp)
4007a9: 48 89 74 24 10 mov %rsi,0x10(%rsp)
4007ae: 48 8b b4 24 88 00 00 mov 0x88(%rsp),%rsi
4007b5: 00
4007b6: 48 29 ea sub %rbp,%rdx
4007b9: 48 8b 6c 24 c0 mov -0x40(%rsp),%rbp
4007be: 48 8d 1c 3f lea (%rdi,%rdi,1),%rbx
4007c2: 8b 36 mov (%rsi),%esi
4007c4: 48 0f af ef imul %rdi,%rbp
4007c8: 48 f7 d3 not %rbx
4007cb: 89 74 24 08 mov %esi,0x8(%rsp)
4007cf: 48 29 ea sub %rbp,%rdx
4007d2: 41 39 f1 cmp %esi,%r9d
4007d5: 0f 8f 4d 05 00 00 jg 400d28 <integrate_gf_npbc_+0x608>
4007db: 48 8b b4 24 90 00 00 mov 0x90(%rsp),%rsi
4007e2: 00
4007e3: 48 8b 7c 24 10 mov 0x10(%rsp),%rdi
4007e8: 4c 8b 74 24 20 mov 0x20(%rsp),%r14
4007ed: 8b 36 mov (%rsi),%esi
4007ef: 89 74 24 04 mov %esi,0x4(%rsp)
4007f3: 48 8b b4 24 98 00 00 mov 0x98(%rsp),%rsi
4007fa: 00
4007fb: 8b 36 mov (%rsi),%esi
4007fd: 89 74 24 0c mov %esi,0xc(%rsp)
400801: 83 ee 01 sub $0x1,%esi
400804: 89 74 24 1c mov %esi,0x1c(%rsp)
400808: 2b 74 24 04 sub 0x4(%rsp),%esi
40080c: d1 ee shr %esi
40080e: 89 74 24 2c mov %esi,0x2c(%rsp)
400812: 48 63 74 24 0c movslq 0xc(%rsp),%rsi
400817: 44 8b 7c 24 2c mov 0x2c(%rsp),%r15d
40081c: 48 89 74 24 30 mov %rsi,0x30(%rsp)
400821: 49 63 f1 movslq %r9d,%rsi
400824: 48 8d 5c 73 01 lea 0x1(%rbx,%rsi,2),%rbx
400829: 48 0f af 74 24 c0 imul -0x40(%rsp),%rsi
40082f: 4c 8d 2c d9 lea (%rcx,%rbx,8),%r13
400833: 48 8b 4c 24 f0 mov -0x10(%rsp),%rcx
400838: 48 01 d6 add %rdx,%rsi
40083b: 48 8b 54 24 30 mov 0x30(%rsp),%rdx
400840: 48 0f af 54 24 f0 imul -0x10(%rsp),%rdx
400846: 48 8d 14 16 lea (%rsi,%rdx,1),%rdx
40084a: 48 89 54 24 f8 mov %rdx,-0x8(%rsp)
40084f: 48 63 54 24 04 movslq 0x4(%rsp),%rdx
400854: 48 0f af ca imul %rdx,%rcx
400858: 48 8d 14 57 lea (%rdi,%rdx,2),%rdx
40085c: 49 8d 14 d6 lea (%r14,%rdx,8),%rdx
400860: 48 8d 0c 0e lea (%rsi,%rcx,1),%rcx
400864: 48 89 54 24 38 mov %rdx,0x38(%rsp)
400869: 8b 54 24 04 mov 0x4(%rsp),%edx
40086d: 48 89 4c 24 e0 mov %rcx,-0x20(%rsp)
400872: 48 89 4c 24 e8 mov %rcx,-0x18(%rsp)
400877: 48 8b 4c 24 30 mov 0x30(%rsp),%rcx
40087c: 46 8d 7c 7a 01 lea 0x1(%rdx,%r15,2),%r15d
400881: 44 89 7c 24 44 mov %r15d,0x44(%rsp)
400886: 48 8d 4c 4f 01 lea 0x1(%rdi,%rcx,2),%rcx
40088b: 48 89 4c 24 48 mov %rcx,0x48(%rsp)
400890: 8b 5c 24 1c mov 0x1c(%rsp),%ebx
400894: 39 5c 24 04 cmp %ebx,0x4(%rsp)
400898: ba ff ff ff 7f mov $0x7fffffff,%edx
40089d: 0f 8f 51 03 00 00 jg 400bf4 <integrate_gf_npbc_+0x4d4>
4008a3: 48 8b b4 24 a0 00 00 mov 0xa0(%rsp),%rsi
4008aa: 00
4008ab: 48 8b bc 24 a8 00 00 mov 0xa8(%rsp),%rdi
4008b2: 00
4008b3: 48 8b 4c 24 e8 mov -0x18(%rsp),%rcx
4008b8: 4c 8b 74 24 f0 mov -0x10(%rsp),%r14
4008bd: 4c 8b 5c 24 f0 mov -0x10(%rsp),%r11
4008c2: 4c 03 5c 24 e8 add -0x18(%rsp),%r11
4008c7: 8b 36 mov (%rsi),%esi
4008c9: 8b 3f mov (%rdi),%edi
4008cb: 48 8b 5c 24 b0 mov -0x50(%rsp),%rbx
4008d0: f2 44 0f 10 10 movsd (%rax),%xmm10
4008d5: f2 44 0f 10 48 08 movsd 0x8(%rax),%xmm9
4008db: 49 c1 e6 04 shl $0x4,%r14
4008df: 48 63 d6 movslq %esi,%rdx
4008e2: 89 74 24 9c mov %esi,-0x64(%rsp)
4008e6: 89 7c 24 98 mov %edi,-0x68(%rsp)
4008ea: 48 8d 0c 0a lea (%rdx,%rcx,1),%rcx
4008ee: f2 44 0f 10 40 10 movsd 0x10(%rax),%xmm8
4008f4: 4c 89 74 24 88 mov %r14,-0x78(%rsp)
4008f9: 4c 8b 74 24 a0 mov -0x60(%rsp),%r14
4008fe: 4e 8d 1c 1a lea (%rdx,%r11,1),%r11
400902: 49 8d 34 cc lea (%r12,%rcx,8),%rsi
400906: 8b 4c 24 98 mov -0x68(%rsp),%ecx
40090a: 2b 4c 24 9c sub -0x64(%rsp),%ecx
40090e: 48 8d 14 53 lea (%rbx,%rdx,2),%rdx
400912: 4c 8b 7c 24 f0 mov -0x10(%rsp),%r15
400917: 4c 8b 54 24 f0 mov -0x10(%rsp),%r10
40091c: 4c 03 54 24 e0 add -0x20(%rsp),%r10
400921: 48 8b 7c 24 38 mov 0x38(%rsp),%rdi
400926: 49 c1 e3 03 shl $0x3,%r11
40092a: 4d 8d 74 d6 10 lea 0x10(%r14,%rdx,8),%r14
40092f: 4c 8b 4c 24 e0 mov -0x20(%rsp),%r9
400934: 44 8b 44 24 2c mov 0x2c(%rsp),%r8d
400939: 83 c1 01 add $0x1,%ecx
40093c: 4d 01 ff add %r15,%r15
40093f: 48 89 54 24 b8 mov %rdx,-0x48(%rsp)
400944: 89 cd mov %ecx,%ebp
400946: 89 4c 24 a8 mov %ecx,-0x58(%rsp)
40094a: 4c 89 7c 24 90 mov %r15,-0x70(%rsp)
40094f: d1 ed shr %ebp
400951: 4c 89 74 24 d0 mov %r14,-0x30(%rsp)
400956: 8d 4c 2d 00 lea 0x0(%rbp,%rbp,1),%ecx
40095a: 89 4c 24 ac mov %ecx,-0x54(%rsp)
40095e: 03 4c 24 9c add -0x64(%rsp),%ecx
400962: 89 4c 24 dc mov %ecx,-0x24(%rsp)
400966: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
40096d: 00 00 00
400970: 44 8b 7c 24 98 mov -0x68(%rsp),%r15d
400975: 44 39 7c 24 9c cmp %r15d,-0x64(%rsp)
40097a: 66 0f 57 ed xorpd %xmm5,%xmm5
40097e: 66 0f 28 c5 movapd %xmm5,%xmm0
400982: 66 0f 28 fd movapd %xmm5,%xmm7
400986: 0f 8f 1e 02 00 00 jg 400baa <integrate_gf_npbc_+0x48a>
40098c: 8b 54 24 ac mov -0x54(%rsp),%edx
400990: 85 d2 test %edx,%edx
400992: 0f 84 9f 03 00 00 je 400d37 <integrate_gf_npbc_+0x617>
400998: 83 7c 24 a8 09 cmpl $0x9,-0x58(%rsp)
40099d: 0f 86 94 03 00 00 jbe 400d37 <integrate_gf_npbc_+0x617>
4009a3: 66 0f 57 e4 xorpd %xmm4,%xmm4
4009a7: 48 8b 54 24 b8 mov -0x48(%rsp),%rdx
4009ac: 4c 8b 74 24 a0 mov -0x60(%rsp),%r14
4009b1: 48 8b 5c 24 d0 mov -0x30(%rsp),%rbx
4009b6: 4f 8d 3c 1c lea (%r12,%r11,1),%r15
4009ba: 66 0f 28 fc movapd %xmm4,%xmm7
4009be: 66 0f 28 ec movapd %xmm4,%xmm5
4009c2: 66 44 0f 28 dc movapd %xmm4,%xmm11
4009c7: 49 8d 4c d6 08 lea 0x8(%r14,%rdx,8),%rcx
4009cc: 31 d2 xor %edx,%edx
4009ce: 45 31 f6 xor %r14d,%r14d
4009d1: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
4009d8: f2 44 0f 10 24 16 movsd (%rsi,%rdx,1),%xmm12
4009de: 41 83 c6 01 add $0x1,%r14d
4009e2: f2 0f 10 31 movsd (%rcx),%xmm6
4009e6: 66 44 0f 16 64 16 08 movhpd 0x8(%rsi,%rdx,1),%xmm12
4009ed: f2 41 0f 10 04 17 movsd (%r15,%rdx,1),%xmm0
4009f3: 66 0f 16 71 08 movhpd 0x8(%rcx),%xmm6
4009f8: 66 41 0f 28 dc movapd %xmm12,%xmm3
4009fd: f2 44 0f 10 61 10 movsd 0x10(%rcx),%xmm12
400a03: 66 0f 28 ce movapd %xmm6,%xmm1
400a07: 66 41 0f 16 44 17 08 movhpd 0x8(%r15,%rdx,1),%xmm0
400a0e: 66 44 0f 16 61 18 movhpd 0x18(%rcx),%xmm12
400a14: f2 0f 10 33 movsd (%rbx),%xmm6
400a18: 66 0f 28 d0 movapd %xmm0,%xmm2
400a1c: 48 83 c2 10 add $0x10,%rdx
400a20: 66 41 0f 14 cc unpcklpd %xmm12,%xmm1
400a25: 66 0f 16 73 08 movhpd 0x8(%rbx),%xmm6
400a2a: f2 44 0f 10 63 10 movsd 0x10(%rbx),%xmm12
400a30: 48 83 c1 20 add $0x20,%rcx
400a34: 66 0f 28 c6 movapd %xmm6,%xmm0
400a38: 66 44 0f 16 63 18 movhpd 0x18(%rbx),%xmm12
400a3e: 66 0f 28 f1 movapd %xmm1,%xmm6
400a42: 66 0f 59 ca mulpd %xmm2,%xmm1
400a46: 48 83 c3 20 add $0x20,%rbx
400a4a: 41 39 ee cmp %ebp,%r14d
400a4d: 66 41 0f 14 c4 unpcklpd %xmm12,%xmm0
400a52: 66 0f 59 f3 mulpd %xmm3,%xmm6
400a56: 66 0f 59 d8 mulpd %xmm0,%xmm3
400a5a: 66 0f 58 f9 addpd %xmm1,%xmm7
400a5e: 66 0f 59 c2 mulpd %xmm2,%xmm0
400a62: 66 44 0f 58 de addpd %xmm6,%xmm11
400a67: 66 0f 58 eb addpd %xmm3,%xmm5
400a6b: 66 0f 58 e0 addpd %xmm0,%xmm4
400a6f: 0f 82 63 ff ff ff jb 4009d8 <integrate_gf_npbc_+0x2b8>
400a75: 66 0f 28 c4 movapd %xmm4,%xmm0
400a79: 8b 54 24 a8 mov -0x58(%rsp),%edx
400a7d: 66 44 0f 28 e7 movapd %xmm7,%xmm12
400a82: 39 54 24 ac cmp %edx,-0x54(%rsp)
400a86: 66 0f 15 c0 unpckhpd %xmm0,%xmm0
400a8a: 8b 4c 24 dc mov -0x24(%rsp),%ecx
400a8e: 66 45 0f 15 e4 unpckhpd %xmm12,%xmm12
400a93: 66 0f 28 f0 movapd %xmm0,%xmm6
400a97: 66 0f 28 c5 movapd %xmm5,%xmm0
400a9b: f2 0f 58 f4 addsd %xmm4,%xmm6
400a9f: 66 41 0f 28 e4 movapd %xmm12,%xmm4
400aa4: 66 0f 15 c0 unpckhpd %xmm0,%xmm0
400aa8: 66 45 0f 28 e3 movapd %xmm11,%xmm12
400aad: f2 0f 58 e7 addsd %xmm7,%xmm4
400ab1: 66 45 0f 15 e4 unpckhpd %xmm12,%xmm12
400ab6: 66 0f 28 f8 movapd %xmm0,%xmm7
400aba: f2 0f 58 fd addsd %xmm5,%xmm7
400abe: 66 41 0f 28 ec movapd %xmm12,%xmm5
400ac3: f2 41 0f 58 eb addsd %xmm11,%xmm5
400ac8: 0f 84 90 00 00 00 je 400b5e <integrate_gf_npbc_+0x43e>
400ace: 48 63 d1 movslq %ecx,%rdx
400ad1: 4c 8b 7c 24 b0 mov -0x50(%rsp),%r15
400ad6: 4a 8d 1c 0a lea (%rdx,%r9,1),%rbx
400ada: 4d 8d 34 dc lea (%r12,%rbx,8),%r14
400ade: 4a 8d 1c 12 lea (%rdx,%r10,1),%rbx
400ae2: 49 8d 54 57 01 lea 0x1(%r15,%rdx,2),%rdx
400ae7: 4c 8b 7c 24 a0 mov -0x60(%rsp),%r15
400aec: 49 8d 1c dc lea (%r12,%rbx,8),%rbx
400af0: 49 8d 14 d7 lea (%r15,%rdx,8),%rdx
400af4: 44 8b 7c 24 98 mov -0x68(%rsp),%r15d
400af9: 41 29 cf sub %ecx,%r15d
400afc: 31 c9 xor %ecx,%ecx
400afe: 4e 8d 3c fd 08 00 00 lea 0x8(,%r15,8),%r15
400b05: 00
400b06: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1)
400b0d: 00 00 00
400b10: f2 41 0f 10 1e movsd (%r14),%xmm3
400b15: 48 83 c1 08 add $0x8,%rcx
400b19: f2 0f 10 0b movsd (%rbx),%xmm1
400b1d: 49 83 c6 08 add $0x8,%r14
400b21: f2 0f 10 12 movsd (%rdx),%xmm2
400b25: 48 83 c3 08 add $0x8,%rbx
400b29: f2 0f 10 42 08 movsd 0x8(%rdx),%xmm0
400b2e: 48 83 c2 10 add $0x10,%rdx
400b32: 66 44 0f 28 db movapd %xmm3,%xmm11
400b37: 4c 39 f9 cmp %r15,%rcx
400b3a: f2 0f 59 d8 mulsd %xmm0,%xmm3
400b3e: f2 44 0f 59 da mulsd %xmm2,%xmm11
400b43: f2 0f 59 c1 mulsd %xmm1,%xmm0
400b47: f2 0f 59 d1 mulsd %xmm1,%xmm2
400b4b: f2 0f 58 fb addsd %xmm3,%xmm7
400b4f: f2 41 0f 58 eb addsd %xmm11,%xmm5
400b54: f2 0f 58 f0 addsd %xmm0,%xmm6
400b58: f2 0f 58 e2 addsd %xmm2,%xmm4
400b5c: 75 b2 jne 400b10 <integrate_gf_npbc_+0x3f0>
400b5e: f2 0f 10 4f 08 movsd 0x8(%rdi),%xmm1
400b63: f2 0f 10 57 18 movsd 0x18(%rdi),%xmm2
400b68: 66 0f 28 dc movapd %xmm4,%xmm3
400b6c: f2 0f 10 47 10 movsd 0x10(%rdi),%xmm0
400b71: f2 0f 59 da mulsd %xmm2,%xmm3
400b75: f2 0f 59 c5 mulsd %xmm5,%xmm0
400b79: f2 0f 59 67 20 mulsd 0x20(%rdi),%xmm4
400b7e: f2 0f 59 e9 mulsd %xmm1,%xmm5
400b82: f2 0f 59 f9 mulsd %xmm1,%xmm7
400b86: f2 0f 59 f2 mulsd %xmm2,%xmm6
400b8a: f2 41 0f 10 4d 00 movsd 0x0(%r13),%xmm1
400b90: f2 0f 58 eb addsd %xmm3,%xmm5
400b94: f2 0f 58 c4 addsd %xmm4,%xmm0
400b98: f2 0f 58 fe addsd %xmm6,%xmm7
400b9c: f2 41 0f 59 6d 08 mulsd 0x8(%r13),%xmm5
400ba2: f2 0f 59 c1 mulsd %xmm1,%xmm0
400ba6: f2 0f 59 f9 mulsd %xmm1,%xmm7
400baa: f2 44 0f 58 d7 addsd %xmm7,%xmm10
400baf: 48 83 c7 20 add $0x20,%rdi
400bb3: 48 03 74 24 88 add -0x78(%rsp),%rsi
400bb8: f2 44 0f 58 c8 addsd %xmm0,%xmm9
400bbd: 4c 03 5c 24 88 add -0x78(%rsp),%r11
400bc2: 4c 03 4c 24 90 add -0x70(%rsp),%r9
400bc7: f2 44 0f 58 c5 addsd %xmm5,%xmm8
400bcc: 4c 03 54 24 90 add -0x70(%rsp),%r10
400bd1: 45 85 c0 test %r8d,%r8d
400bd4: f2 44 0f 11 10 movsd %xmm10,(%rax)
400bd9: f2 44 0f 11 48 08 movsd %xmm9,0x8(%rax)
400bdf: f2 44 0f 11 40 10 movsd %xmm8,0x10(%rax)
400be5: 74 09 je 400bf0 <integrate_gf_npbc_+0x4d0>
400be7: 41 83 e8 01 sub $0x1,%r8d
400beb: e9 80 fd ff ff jmpq 400970 <integrate_gf_npbc_+0x250>
400bf0: 8b 54 24 44 mov 0x44(%rsp),%edx
400bf4: 3b 54 24 0c cmp 0xc(%rsp),%edx
400bf8: 0f 84 f5 00 00 00 je 400cf3 <integrate_gf_npbc_+0x5d3>
400bfe: 48 8b 94 24 a0 00 00 mov 0xa0(%rsp),%rdx
400c05: 00
400c06: 48 8b 9c 24 a8 00 00 mov 0xa8(%rsp),%rbx
400c0d: 00
400c0e: 66 0f 57 c0 xorpd %xmm0,%xmm0
400c12: 8b 0a mov (%rdx),%ecx
400c14: 8b 33 mov (%rbx),%esi
400c16: 66 0f 28 d0 movapd %xmm0,%xmm2
400c1a: 66 0f 28 d8 movapd %xmm0,%xmm3
400c1e: 39 f1 cmp %esi,%ecx
400c20: 0f 8f b1 00 00 00 jg 400cd7 <integrate_gf_npbc_+0x5b7>
400c26: 48 8b 5c 24 f8 mov -0x8(%rsp),%rbx
400c2b: 48 8b 7c 24 b0 mov -0x50(%rsp),%rdi
400c30: 48 63 d1 movslq %ecx,%rdx
400c33: 4c 8b 74 24 a0 mov -0x60(%rsp),%r14
400c38: 66 0f 57 c9 xorpd %xmm1,%xmm1
400c3c: 29 ce sub %ecx,%esi
400c3e: 31 c9 xor %ecx,%ecx
400c40: 48 8d 1c 1a lea (%rdx,%rbx,1),%rbx
400c44: 48 8d 54 57 01 lea 0x1(%rdi,%rdx,2),%rdx
400c49: 48 8d 34 f5 08 00 00 lea 0x8(,%rsi,8),%rsi
400c50: 00
400c51: 66 0f 28 d1 movapd %xmm1,%xmm2
400c55: 49 8d 1c dc lea (%r12,%rbx,8),%rbx
400c59: 49 8d 14 d6 lea (%r14,%rdx,8),%rdx
400c5d: 0f 1f 00 nopl (%rax)
400c60: f2 0f 10 03 movsd (%rbx),%xmm0
400c64: 48 83 c1 08 add $0x8,%rcx
400c68: f2 0f 10 1a movsd (%rdx),%xmm3
400c6c: 48 83 c3 08 add $0x8,%rbx
400c70: f2 0f 59 d8 mulsd %xmm0,%xmm3
400c74: f2 0f 59 42 08 mulsd 0x8(%rdx),%xmm0
400c79: 48 83 c2 10 add $0x10,%rdx
400c7d: 48 39 f1 cmp %rsi,%rcx
400c80: f2 0f 58 cb addsd %xmm3,%xmm1
400c84: f2 0f 58 d0 addsd %xmm0,%xmm2
400c88: 75 d6 jne 400c60 <integrate_gf_npbc_+0x540>
400c8a: 48 8b 54 24 20 mov 0x20(%rsp),%rdx
400c8f: 4c 8b 7c 24 48 mov 0x48(%rsp),%r15
400c94: f2 41 0f 10 65 00 movsd 0x0(%r13),%xmm4
400c9a: 48 8b 4c 24 30 mov 0x30(%rsp),%rcx
400c9f: 48 8b 5c 24 10 mov 0x10(%rsp),%rbx
400ca4: 48 8b 74 24 20 mov 0x20(%rsp),%rsi
400ca9: f2 42 0f 10 04 fa movsd (%rdx,%r15,8),%xmm0
400caf: 66 0f 28 d8 movapd %xmm0,%xmm3
400cb3: 48 8d 54 4b 02 lea 0x2(%rbx,%rcx,2),%rdx
400cb8: f2 41 0f 59 45 08 mulsd 0x8(%r13),%xmm0
400cbe: f2 0f 59 dc mulsd %xmm4,%xmm3
400cc2: f2 0f 59 da mulsd %xmm2,%xmm3
400cc6: f2 0f 10 14 d6 movsd (%rsi,%rdx,8),%xmm2
400ccb: f2 0f 59 c1 mulsd %xmm1,%xmm0
400ccf: f2 0f 59 d4 mulsd %xmm4,%xmm2
400cd3: f2 0f 59 d1 mulsd %xmm1,%xmm2
400cd7: f2 0f 58 18 addsd (%rax),%xmm3
400cdb: f2 0f 58 50 08 addsd 0x8(%rax),%xmm2
400ce0: f2 0f 58 40 10 addsd 0x10(%rax),%xmm0
400ce5: f2 0f 11 18 movsd %xmm3,(%rax)
400ce9: f2 0f 11 50 08 movsd %xmm2,0x8(%rax)
400cee: f2 0f 11 40 10 movsd %xmm0,0x10(%rax)
400cf3: 48 8b 7c 24 c0 mov -0x40(%rsp),%rdi
400cf8: 49 83 c5 10 add $0x10,%r13
400cfc: 48 01 7c 24 f8 add %rdi,-0x8(%rsp)
400d01: 48 01 7c 24 e0 add %rdi,-0x20(%rsp)
400d06: 44 8b 74 24 08 mov 0x8(%rsp),%r14d
400d0b: 48 01 7c 24 e8 add %rdi,-0x18(%rsp)
400d10: 44 39 74 24 cc cmp %r14d,-0x34(%rsp)
400d15: 74 11 je 400d28 <integrate_gf_npbc_+0x608>
400d17: 83 44 24 cc 01 addl $0x1,-0x34(%rsp)
400d1c: e9 6f fb ff ff jmpq 400890 <integrate_gf_npbc_+0x170>
400d21: 0f 1f 80 00 00 00 00 nopl 0x0(%rax)
400d28: 48 83 c4 50 add $0x50,%rsp
400d2c: 5b pop %rbx
400d2d: 5d pop %rbp
400d2e: 41 5c pop %r12
400d30: 41 5d pop %r13
400d32: 41 5e pop %r14
400d34: 41 5f pop %r15
400d36: c3 retq
400d37: 66 0f 57 e4 xorpd %xmm4,%xmm4
400d3b: 8b 4c 24 9c mov -0x64(%rsp),%ecx
400d3f: 66 0f 28 ec movapd %xmm4,%xmm5
400d43: 66 0f 28 f4 movapd %xmm4,%xmm6
400d47: 66 0f 28 fc movapd %xmm4,%xmm7
400d4b: e9 7e fd ff ff jmpq 400ace <integrate_gf_npbc_+0x3ae>
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/4
------------------------------------------------------------------------
On 2011-01-31T14:03:38+00:00 Jseward wrote:
This seems to me like a bug in gcc. From the following analysis
(start reading at 0x400a38), the value loaded from memory is never
used -- xmm12 is completely overwritten by subsequent instructions,
either in the post-loop block, or in the first instruction of the
next iteration.
==12860== Invalid read of size 8
==12860== at 0x400A38: integrate_gf_npbc_
# def xmm12 (low half loaded, high half zeroed)
4009d8: f2 44 0f 10 24 16 movsd (%rsi,%rdx,1),%xmm12
4009de: 41 83 c6 01 add $0x1,%r14d
4009e2: f2 0f 10 31 movsd (%rcx),%xmm6
4009e6: 66 44 0f 16 64 16 08 movhpd 0x8(%rsi,%rdx,1),%xmm12
4009ed: f2 41 0f 10 04 17 movsd (%r15,%rdx,1),%xmm0
4009f3: 66 0f 16 71 08 movhpd 0x8(%rcx),%xmm6
4009f8: 66 41 0f 28 dc movapd %xmm12,%xmm3
4009fd: f2 44 0f 10 61 10 movsd 0x10(%rcx),%xmm12
400a03: 66 0f 28 ce movapd %xmm6,%xmm1
400a07: 66 41 0f 16 44 17 08 movhpd 0x8(%r15,%rdx,1),%xmm0
400a0e: 66 44 0f 16 61 18 movhpd 0x18(%rcx),%xmm12
400a14: f2 0f 10 33 movsd (%rbx),%xmm6
400a18: 66 0f 28 d0 movapd %xmm0,%xmm2
400a1c: 48 83 c2 10 add $0x10,%rdx
400a20: 66 41 0f 14 cc unpcklpd %xmm12,%xmm1
400a25: 66 0f 16 73 08 movhpd 0x8(%rbx),%xmm6
400a2a: f2 44 0f 10 63 10 movsd 0x10(%rbx),%xmm12
400a30: 48 83 c1 20 add $0x20,%rcx
400a34: 66 0f 28 c6 movapd %xmm6,%xmm0
# load high half xmm12 (error reported here). low half unchanged.
400a38: 66 44 0f 16 63 18 movhpd 0x18(%rbx),%xmm12
400a3e: 66 0f 28 f1 movapd %xmm1,%xmm6
400a42: 66 0f 59 ca mulpd %xmm2,%xmm1
400a46: 48 83 c3 20 add $0x20,%rbx
400a4a: 41 39 ee cmp %ebp,%r14d
# reads low half xmm12 only
400a4d: 66 41 0f 14 c4 unpcklpd %xmm12,%xmm0
400a52: 66 0f 59 f3 mulpd %xmm3,%xmm6
400a56: 66 0f 59 d8 mulpd %xmm0,%xmm3
400a5a: 66 0f 58 f9 addpd %xmm1,%xmm7
400a5e: 66 0f 59 c2 mulpd %xmm2,%xmm0
400a62: 66 44 0f 58 de addpd %xmm6,%xmm11
400a67: 66 0f 58 eb addpd %xmm3,%xmm5
400a6b: 66 0f 58 e0 addpd %xmm0,%xmm4
400a6f: 0f 82 63 ff ff ff jb 4009d8 # (loop head)
400a75: 66 0f 28 c4 movapd %xmm4,%xmm0
400a79: 8b 54 24 a8 mov -0x58(%rsp),%edx
# def xmm12 (overwrite both halves)
400a7d: 66 44 0f 28 e7 movapd %xmm7,%xmm12
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/5
------------------------------------------------------------------------
On 2011-01-31T14:04:53+00:00 Jakub Jelinek wrote:
A similar testcase is gcc's own libcpp/lex.c optimization, which can also access a few bytes after the malloced area, as long as at least one byte in the value read is from within the malloced area. See the search_line_* routines in lex.c: not just the SSE4.2/SSE2 variants, even the generic C version actually does this.
I guess valgrind could somehow mark the extra bytes as undefined content and propagate that through the following arithmetic instructions, complaining only if some conditional jump depends solely on the undefined bits, or if the undefined bits are stored somewhere (or similar heuristics).
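For context, the generic word-at-a-time scan being referred to follows roughly this pattern (an illustrative sketch, not the actual lex.c code; real implementations sidestep strict aliasing with may_alias attributes or unions):

#include <stdint.h>
#include <stddef.h>

/* Whole aligned words are read, so the final read may cover a few
   bytes past the end of the buffer, but the result only ever depends
   on bytes up to and including the first NUL. */
static size_t wordwise_strlen(const char *s)
{
    const char *p = s;

    /* Scalar prologue: walk byte by byte up to an 8-byte boundary. */
    while ((uintptr_t)p % sizeof(uint64_t) != 0) {
        if (*p == '\0')
            return (size_t)(p - s);
        p++;
    }

    const uint64_t ones  = 0x0101010101010101ULL;
    const uint64_t highs = 0x8080808080808080ULL;

    /* Word loop: an aligned 8-byte load cannot cross a page boundary,
       so it cannot fault even when it reads past the allocation --
       which is exactly the access memcheck reports. */
    for (;;) {
        uint64_t w = *(const uint64_t *)p;   /* illustrative cast */
        if ((w - ones) & ~w & highs)          /* some byte in w is zero */
            break;
        p += sizeof(uint64_t);
    }

    /* Scalar epilogue: pinpoint the NUL within the final word. */
    while (*p != '\0')
        p++;
    return (size_t)(p - s);
}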
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/6
------------------------------------------------------------------------
On 2011-01-31T14:22:34+00:00 Joost-vandevondele wrote:
(In reply to comment #5)
> This seems to me like a bug in gcc.
Unfortunately, I'm an asm novice, so I can't tell. I see Jakub is on the
CC as well, so maybe he can judge?
Alternatively, I can reopen
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522
and refer here?
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/7
------------------------------------------------------------------------
On 2011-01-31T14:43:59+00:00 Jseward wrote:
(In reply to comment #6)
> Similar testcase is gcc's own libcpp/lex.c optimization, which also can access
> a few bytes after malloced area, as long as at least one byte in the value read
> is from within the malloced area.
Those loops are (effectively) vectorised while loops, in which you use
standard carry-chain propagation tricks to ensure that the stopping
condition for the loop does not rely on the data from beyond the malloced
area. It is not possible to vectorise them without such over-reading.
By contrast, Joost's loop (and anything gcc can vectorise) is a countable
loop: the trip count is known (at run time) before the loop begins. It
is always possible to vectorise such a loop without generating memory
over-reads, by having a vector loop do (trip_count / vector_width)
iterations and a scalar fixup loop do the final (trip_count % vector_width)
iterations.
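An illustrative sketch of that structure, using SSE2 intrinsics purely for concreteness (this is not what gcc emits, just the shape of a vectorisation that never reads past the arrays):

#include <emmintrin.h>   /* SSE2 intrinsics */
#include <stddef.h>

/* Vector loop handles trip_count / 2 iterations; scalar fixup loop
   handles the trip_count % 2 remainder, so no load ever goes past
   a[n-1] or b[n-1]. */
double dot(const double *a, const double *b, size_t n)
{
    __m128d acc = _mm_setzero_pd();
    size_t i = 0;

    /* Vector loop: two doubles per iteration, never past the end. */
    for (; i + 2 <= n; i += 2) {
        __m128d va = _mm_loadu_pd(a + i);
        __m128d vb = _mm_loadu_pd(b + i);
        acc = _mm_add_pd(acc, _mm_mul_pd(va, vb));
    }

    double tmp[2];
    _mm_storeu_pd(tmp, acc);
    double s = tmp[0] + tmp[1];

    /* Scalar fixup loop for the final n % 2 elements. */
    for (; i < n; i++)
        s += a[i] * b[i];
    return s;
}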
> I guess valgrind could mark somehow the extra bytes as undefined content and
> propagate it through following arithmetic instructions, complain only if some
> conditional jump was made solely on the undefined bits or if the undefined bits
> were stored somewhere (or similar heuristics).
Well, maybe .. but Memcheck is too slow already. I don't want to junk it up
with expensive and complicated heuristics that are irrelevant for 99.9% of
the loads it will encounter.
If you can show me some way to identify just the loads that need special
treatment, then maybe. I don't see how to identify them, though.
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/8
------------------------------------------------------------------------
On 2011-02-18T18:28:20+00:00 Jakub Jelinek wrote:
Another simple testcase: https://bugzilla.redhat.com/show_bug.cgi?id=678518
I don't think 99% above is the right figure; at least with recent
gcc-generated code, these false positives are just way too common. We
disable a bunch of them in glibc through a suppression file or by
overriding the str* implementations, but when gcc inlines those there
is no way to get rid of the false positives.
Can't valgrind just start tracking in more detail whether the bytes are
actually used when memcheck sees a suspect read (in most cases just an
aligned read where at least the first byte is still in the allocated
region and perhaps some further ones aren't)? Then force retranslation
of the bb it occurred in, or something similar?
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/9
------------------------------------------------------------------------
On 2011-02-18T19:28:18+00:00 Jseward wrote:
(In reply to comment #9)
> I don't think 99% above is the right figure, at least with recent
> gcc generated
What version of gcc?
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/10
------------------------------------------------------------------------
On 2011-02-18T19:47:47+00:00 Jakub Jelinek wrote:
The following testcase, where strlen does the over-read, is expanded that way with GCC 4.6 (currently used e.g. in Fedora 15) at default options; 4.5 or even earlier versions expand it the same way with -O2 -minline-all-stringops:
#include <stdlib.h>
#include <string.h>

__attribute__((noinline)) void
foo (void *p)
{
  memcpy (p, "0123456789abcd", 15);
}

int
main (void)
{
  void *p = malloc (15);
  foo (p);
  return strlen (p) - 14;
}
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/11
------------------------------------------------------------------------
On 2011-02-18T20:02:46+00:00 Jseward wrote:
I can see this problem isn't going to go away (alas), and we are
seeing similar things with icc-generated code. I'll look into it,
but that won't happen for at least a couple of weeks.
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/12
------------------------------------------------------------------------
On 2012-02-17T16:16:17+00:00 Patrick J. LoPresti wrote:
Isn't this exactly the problem that "--partial-loads-ok" is meant to
address? (cf. bug 294285)
http://valgrind.org/docs/manual/mc-manual.html#opt.partial-loads-ok
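For reference, the behaviour that option selects is roughly the following, shown as a hedged per-byte sketch with hypothetical placeholder shadow-memory helpers rather than valgrind's actual implementation:

#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Placeholders for memcheck's shadow-memory operations; not real
   valgrind APIs. */
bool byte_is_addressable(uintptr_t addr);
void mark_byte_undefined(uintptr_t addr);
void report_addressing_error(uintptr_t addr, size_t len);

/* Rough sketch of what --partial-loads-ok=yes does for a naturally
   aligned word-sized load that straddles the end of a block: the load
   is allowed, bytes coming from unaddressable memory are marked
   undefined (so using them still gets reported later), and no
   "invalid read" is emitted for the load itself. */
void handle_aligned_word_load(uintptr_t addr, size_t len)
{
    bool any_valid = false;
    for (size_t i = 0; i < len; i++)
        if (byte_is_addressable(addr + i))
            any_valid = true;

    if (!any_valid) {               /* entirely out of bounds: real error */
        report_addressing_error(addr, len);
        return;
    }
    for (size_t i = 0; i < len; i++)
        if (!byte_is_addressable(addr + i))
            mark_byte_undefined(addr + i);
}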
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/14
------------------------------------------------------------------------
On 2012-06-14T20:01:27+00:00 Kevyn-Alexandre Pare wrote:
Could this bug be the same issue as bug 301922?
Reply at: https://bugs.launchpad.net/valgrind/+bug/852760/comments/16
** Changed in: valgrind
Status: Unknown => New
** Changed in: valgrind
Importance: Unknown => Medium
** Bug watch added: GCC Bugzilla #47522
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=47522
** Bug watch added: GCC Bugzilla #44183
http://gcc.gnu.org/bugzilla/show_bug.cgi?id=44183
--
You received this bug notification because you are a member of Ubuntu
Foundations Bugs, which is subscribed to valgrind in Ubuntu.
https://bugs.launchpad.net/bugs/852760
Title:
valgrind false positives on gcc-generated string routines
Status in Valgrind:
New
Status in “valgrind” package in Ubuntu:
New
Status in “valgrind” package in ALT Linux:
New
Status in “valgrind” package in Fedora:
Unknown
Bug description:
#include <string.h>
#include <stdio.h>
#include <stdlib.h>
int main(void)
{
    char *a = malloc(1);
    a[0] = '\0';
    printf("%lu\n", (unsigned long)strlen(a));
}
Compile with "gcc -O2" and run valgrind.
==5977== Invalid read of size 4
==5977== at 0x400494: main (x.c:9)
==5977== Address 0x51ce040 is 0 bytes inside a block of size 1 alloc'd
==5977== at 0x4C28F9F: malloc (vg_replace_malloc.c:236)
==5977== by 0x40048D: main (x.c:7)
To manage notifications about this bug go to:
https://bugs.launchpad.net/valgrind/+bug/852760/+subscriptions