I don't know why, but the LEA assembler instruction haunts me.
C++
int f(int t)
{
    return t + 1;
}
int f(int* t)
{
    return *t + 1;
}
int f(int& t)
{
    return t + 1;
}
Assembler
f(int):                         # @f(int)
        lea eax, [rdi + 1]
        ret
f(int*):                        # @f(int*)
        mov eax, dword ptr [rdi]
        inc eax
        ret
f(int&):                        # @f(int&)
        mov eax, dword ptr [rdi]
        inc eax
        ret
While the MOV instruction is clear as day, the LEA instruction is not!
I know that LEA calculates the address of its second operand and writes it to the first operand (that's all I know).
In the by-value example, namely lea eax, [rdi + 1], it clearly isn't calculating an address and writing it to the first operand; it seems to be doing something else. Or am I misunderstanding something? Please explain with reference to the C++ code.
P.S. I searched, but I couldn't find an exhaustive answer to my question. I even looked in Kalashnikov's book, and this instruction isn't even mentioned there... sigh.
Answer 1, authority 100%
lea eax, [rdi + 1]
This instruction loads into eax the address of the value located at address rdi + 1. That is, it simply loads rdi + 1 into eax (its low 32 bits, since eax is a 32-bit register).
This looks strange. To understand why lea is needed, and why it is better than a similar mov or computing the address by hand, you need to understand how instructions are encoded in memory and executed by the processor.
For example, you have a command to read a value:
mov eax, [rdi + 1]; take the value at the address "rdi + 1"
It assembles to something like
[mov opcode] [field: the destination is eax] [field: the address is based on rdi] [+ 1]
i.e., in 64-bit mode, the bytes 8B 47 01
Now suppose you want the address rdi + 1 itself in eax. You can do one of two things:
Calculate it by hand:
mov eax, edi + 1 ; does not work, mov cannot add!
so you have to write:
mov eax, edi
inc eax
i.e. execute two instructions. That may be a fine option, but only for a simple +1. What about addresses like [bp + si + 4]?
mov ax, bp
add ax, si
add ax, 4 ; yes, ugly!
or execute lea:
lea eax, [rdi + 1]
Compare its encoding with that of mov:
Machine code: 8D 47 01
Only the opcode differs: 8B -> 8D.
The processor has a ready-made, very efficient mechanism for basic address arithmetic. It is already implemented for the mov instruction: after all, mov can read a value from an address!
When executing lea, the processor does everything it does for mov but skips the last step, reading the value at the address. Instead, it writes the address itself into eax. This is much more convenient and faster than spelling out things like rdi + 1 as separate instructions.
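As a rough model of the two steps just described, the sketch below separates "compute the effective address" from "read memory at that address". The toy memory array and the names effective_address, mov_model, and lea_model are invented for illustration; real hardware does this in its address-generation logic, not in software.

```cpp
#include <cassert>
#include <cstdint>

// Toy memory, invented for this sketch: byte i holds the value i * 10.
static const std::uint8_t toy_memory[8] = {0, 10, 20, 30, 40, 50, 60, 70};

// Step shared by mov and lea: base + index * scale + displacement.
std::uint64_t effective_address(std::uint64_t base, std::uint64_t index,
                                std::uint64_t scale, std::uint64_t disp) {
    return base + index * scale + disp;
}

// Models a byte-sized "mov dst, [base + index*scale + disp]":
// compute the address, then read the byte stored there.
// The caller must keep the address inside toy_memory (0..7).
std::uint8_t mov_model(std::uint64_t base, std::uint64_t index,
                       std::uint64_t scale, std::uint64_t disp) {
    return toy_memory[effective_address(base, index, scale, disp)];
}

// Models "lea dst, [base + index*scale + disp]":
// compute the address and stop; memory is never touched.
std::uint64_t lea_model(std::uint64_t base, std::uint64_t index,
                        std::uint64_t scale, std::uint64_t disp) {
    return effective_address(base, index, scale, disp);
}

// Small self-check of the model.
void check_lea_model() {
    assert(lea_model(3, 0, 1, 1) == 4);   // like lea with [rdi + 1] and rdi = 3
    assert(mov_model(3, 0, 1, 1) == 40);  // same address, but the byte at 4 is read
    assert(lea_model(1, 2, 1, 4) == 7);   // base + index + 4 in a single step
}
```

The only difference between mov_model and lea_model is the final memory read, which mirrors the single-opcode difference between the two instructions.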
How does this relate to your example?
In your example, the parameter is in rdi, and you should return the result in eax.
Strictly speaking, the compiler could have written
mov eax, edi ; 89 F8
add eax, 1 ; 83 C0 01
Well, OK, for 1 you can use inc:
mov eax, edi ; 89 F8
inc eax ; FF C0
But these are still two instructions, and the processor has to fetch, decode, and execute both of them.
The compiler is smart. It knows that the processor can add small constants to register values while processing the lea instruction, so it emits a single instruction that produces the same result:
lea eax, [rdi + 1]
It doesn't matter that no address is actually used anywhere. What matters is that the result is the same, and that one short instruction replaces two; as a bonus, lea leaves the flags untouched 🙂
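One step worth spelling out is why the substitution is safe even though lea does 64-bit address arithmetic on rdi while the function only cares about a 32-bit int. A minimal sketch, assuming the argument sits in the low 32 bits of rdi and the upper bits may hold anything; the function names are hypothetical:

```cpp
#include <cassert>
#include <cstdint>

// Two-instruction version: mov eax, edi ; add eax, 1
std::uint32_t mov_then_add(std::uint64_t rdi) {
    std::uint32_t eax = static_cast<std::uint32_t>(rdi);  // keep the low 32 bits
    eax += 1;                                             // wraps modulo 2^32
    return eax;
}

// One-instruction version: lea eax, [rdi + 1]
std::uint32_t lea_plus_one(std::uint64_t rdi) {
    std::uint64_t address = rdi + 1;                      // 64-bit address arithmetic
    return static_cast<std::uint32_t>(address);           // writing eax keeps the low 32 bits
}

// Small self-check: both versions agree, whatever the upper half of rdi holds.
void check_lea_plus_one() {
    assert(mov_then_add(41) == 42);
    assert(lea_plus_one(41) == 42);
    assert(mov_then_add(0xFFFFFFFFull) == lea_plus_one(0xFFFFFFFFull));
    assert(mov_then_add(0xDEADBEEF00000005ull) == lea_plus_one(0xDEADBEEF00000005ull));
}
```

Because addition carries only propagate upward, the low 32 bits of rdi + 1 depend only on the low 32 bits of rdi, so truncating before or after the addition gives the same value.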
Answer 2, authority 12%
This instruction loads into eax the address of the operand on the right, i.e. rdi + 1. It is a clever way to combine
mov eax, edi
inc eax
In a sense, translated into C++ it is
&*(rdi + 1)
🙂 That is, it takes the address of the object located at [rdi + 1].
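The C++ analogy can be tried directly. This is only a sketch: address_of_next is a hypothetical helper, and the pointer passed in must have a valid element after it.

```cpp
#include <cassert>

// Forms the address of the element after p without reading that element,
// which is roughly what lea does with [rdi + 1].
const char* address_of_next(const char* p) {
    return &*(p + 1);  // same pointer value as p + 1
}

// Small self-check on a short buffer.
void check_address_of_next() {
    const char buffer[] = "abc";
    assert(address_of_next(buffer) == buffer + 1);  // only an address was computed
    assert(*address_of_next(buffer) == 'b');        // the read happens here, separately
}
```

The dereference and the address-of cancel out, so the expression yields an address and never loads the value, just as lea drops the memory access that mov would perform.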
Answer 3, authority 12%
This command
lea eax, [rdi + 1]
sets the eax register to rdi + 1, where rdi holds the argument value.
In the other two cases, where the argument is passed by reference or through a pointer to the original argument, the rdi register contains the address of the argument.
f(int*):                        # @f(int*)
        mov eax, dword ptr [rdi]
        inc eax
        ret
f(int&):                        # @f(int&)
        mov eax, dword ptr [rdi]
        inc eax
        ret
Therefore, the value of the argument is first loaded into the eax register from the address held in rdi
mov eax, dword ptr [rdi]
and then the value in eax is incremented by 1.
inc eax
That is, the difference between the first function definition and the other two is that in the first case the rdi register contains a copy of the argument value, while in the other two cases rdi holds the address of the original argument.
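The three cases can be compared side by side in C++. In this sketch the functions carry distinct, hypothetical names (f_value, f_pointer, f_reference), because calling the original overloads f(int) and f(int&) with a variable would be ambiguous; the comments about rdi follow the listing in the question.

```cpp
#include <cassert>

// By value: the callee receives a copy of the argument (the value itself is in rdi).
int f_value(int t) { return t + 1; }

// By pointer: the callee receives the address of the argument and must load from it.
int f_pointer(int* t) { return *t + 1; }

// By reference: same machine-level picture as the pointer version.
int f_reference(int& t) { return t + 1; }

// Small self-check: all three return the same result and leave the original intact.
void check_three_variants() {
    int x = 41;
    assert(f_value(x) == 42);
    assert(f_pointer(&x) == 42);
    assert(f_reference(x) == 42);
    assert(x == 41);  // none of the functions writes to the argument
}
```

Only the by-value version can get away with a single lea, because it already has the number in a register; the other two need a mov to fetch the number from memory first.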