As you can see, the result is an always clean bytes32 word with the address. In the case of only doing calldataload(sub(calldatasize(), 32)) it will load the last 32 bytes without cleaning. I think it's not a big deal but a change requires discussion, and I'm personally in favor of keeping it as it is right now.
But you mention cleaning. Note that you're assigning shr(96, calldataload(sub(calldatasize(), 20))) to address type.
What this means is that if you do sender := calldataload(sub(calldatasize(), 32)), this will always load 20bytes from the right side in which case, address sender won't anyways contain 0xFFFFFFF..., but only aaaaa. This is why I was asking that it doesn't make any difference, even with cleaning argument you mentioned. Thoughts ?
But you mention cleaning. Note that you're assigning shr(96, calldataload(sub(calldatasize(), 20))) to address type.
Consider that under the hood everything is bytes32 given the EVM works on 32-bytes words. I'm sure Solidity cleans those bytes before assigning to an address. The problem here is that the address is being assigned in assembly, which means that assigning doesn't come with the regular guarantees of a variable assignment in Solidity.
Anyway, I think you're right and it's generally safe if you're using regular Solidity. I guess the problem may come if you try to load an address from the stack directly using assembly, in that case, those bytes could leak to other variables if you're not careful.
I would suggest opening an issue in the OpenZeppelin Contracts repository for discussion, but our rule of thumb is that if we can save a few gas units at the expense of leaving a footgun, we'll rather keep users safe.
Then this is interpreted as an address by only reading the RIGHTMOST 20bytes ... so that would be interpreted as addess 0xbbbbbbbbbbbbbbbb000000000000000000000000 ... which is obviously incorrect!
So we need to take our bytes32 0xbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb000000000000000000000000, and shift it to the right so it becomes 0x000000000000000000000000bbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbbb. At that point, considering its an address is ok. The rightmost 20bytes contain the correct value.
Please feel free to checkout v4.8, change the code (remove the shr) and run the tests ... you'll see by yourself!
Thanks so much. It makes sense. I had a feeling that it was not about the dirty bytes. Somehow I missed the case of testing less than 32 bytes calldata. <3