I'm porting 32-bit Delphi BASM code to 64-bit FPC (Win64 target OS) and wonder why the following instruction does not compile in 64-bit FPC:
    {$IFDEF FPC}
      {$ASMMODE INTEL}
    {$ENDIF}

    procedure DoesNotCompile;
    asm
      LEA ECX,[ECX + ESI + $265E5A51]
    end;
    // Error: Asm: 16 or 32 Bit references not supported
Possible workarounds are:
    procedure Compiles1;
    asm
      ADD ECX,ESI
      ADD ECX,$265E5A51
    end;

    procedure Compiles2;
    asm
      LEA ECX,[RCX + RSI + $265E5A51]
    end;
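Compiles2 can stand in for the original instruction because only the low 32 bits of the computed address end up in ECX, and carries in an addition only travel upward. The following is a minimal sketch of that arithmetic in plain Pascal, not of the instructions themselves: `Lea32` and `Lea64To32` are hypothetical helper names that model the two address computations, and the register values are arbitrary test inputs.

```pascal
program LeaTruncation;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}
{$Q-,R-} // wrap-around is the point here, so disable overflow and range checks

const
  Disp = $265E5A51;

// Hypothetical model of LEA ECX,[ECX + ESI + Disp]: sum taken modulo 2^32.
function Lea32(C, S: Cardinal): Cardinal;
begin
  Result := Cardinal((UInt64(C) + UInt64(S) + UInt64(Disp)) and $FFFFFFFF);
end;

// Hypothetical model of LEA ECX,[RCX + RSI + Disp]: 64-bit sum, low 32 bits kept.
function Lea64To32(C, S: UInt64): Cardinal;
begin
  Result := Cardinal((C + S + UInt64(Disp)) and $FFFFFFFF);
end;

var
  C, S: UInt64;
begin
  // The upper halves deliberately hold junk to show it does not reach the result.
  C := $5EADBEEF89ABCDEF;
  S := $0123456776543210;
  // Prints TRUE: both models agree on the low 32 bits.
  Writeln(Lea32(Cardinal(C and $FFFFFFFF), Cardinal(S and $FFFFFFFF)) = Lea64To32(C, S));
end.
```

The low halves were chosen so that their sum overflows 32 bits, which exercises the wrap-around case as well.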
I just don't understand what is wrong with a 32-bit LEA instruction on the Win64 target (it compiles fine in 32-bit Delphi, so it is a valid CPU instruction).
Optimization remarks:
The following code, compiled by 64-bit FPC 2.6.2:
    {$MODE DELPHI}
    {$ASMMODE INTEL}

    procedure Test;
    asm
      LEA ECX,[RCX + RSI + $265E5A51]
      NOP
      LEA RCX,[RCX + RSI + $265E5A51]
      NOP
      ADD ECX,$265E5A51
      ADD ECX,ESI
      NOP
    end;
generates the following assembler output:
    00000000004013F0 4883ec08                 sub    $0x8,%rsp
    project1.lpr:10  LEA ECX,[RCX + RSI + $265E5A51]
    00000000004013F4 8d8c31515a5e26           lea    0x265e5a51(%rcx,%rsi,1),%ecx
    project1.lpr:11  NOP
    00000000004013FB 90                       nop
    project1.lpr:12  LEA RCX,[RCX + RSI + $265E5A51]
    00000000004013FC 488d8c31515a5e26         lea    0x265e5a51(%rcx,%rsi,1),%rcx
    project1.lpr:13  NOP
    0000000000401404 90                       nop
    project1.lpr:14  ADD ECX,$265E5A51
    0000000000401405 81c1515a5e26             add    $0x265e5a51,%ecx
    project1.lpr:15  ADD ECX,ESI
    000000000040140B 01f1                     add    %esi,%ecx
    project1.lpr:16  NOP
    000000000040140D 90                       nop
    project1.lpr:17  end;
    000000000040140E 4883c408                 add    $0x8,%rsp
The winner (7 bytes long) is:
LEA ECX,[RCX + RSI + $265E5A51]
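All three alternatives (including LEA ECX,[ECX + ESI + $265E5A51], which 64-bit FPC refuses to compile) are 8 bytes long: the form with 32-bit address registers carries a 67h address-size prefix, the form with a 64-bit destination carries a REX.W prefix, and the two ADDs take 6 + 2 bytes.
I'm not sure the winner is also the fastest.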
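The byte counts can be checked directly from the machine-code bytes. This is a small sketch that only counts hex digits: the first three encodings are copied from the FPC listing above, the last one is the standard encoding of the 32-bit-address form (67h prefix followed by 8D 8C 0E and the displacement), and the `TInstr` record is a name made up for the illustration.

```pascal
program EncodingLengths;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

type
  // Hypothetical helper record: source text plus its encoding as hex digits.
  TInstr = record
    Source: string;
    HexBytes: string;
  end;

const
  Instrs: array[0..3] of TInstr = (
    (Source: 'LEA ECX,[RCX + RSI + $265E5A51]'; HexBytes: '8d8c31515a5e26'),
    (Source: 'LEA RCX,[RCX + RSI + $265E5A51]'; HexBytes: '488d8c31515a5e26'),
    (Source: 'ADD ECX,$265E5A51 / ADD ECX,ESI'; HexBytes: '81c1515a5e2601f1'),
    (Source: 'LEA ECX,[ECX + ESI + $265E5A51]'; HexBytes: '678d8c0e515a5e26')
  );

var
  I: Integer;
begin
  // Two hex digits per byte.
  for I := Low(Instrs) to High(Instrs) do
    Writeln(Length(Instrs[I].HexBytes) div 2, ' bytes: ', Instrs[I].Source);
end.
```

It reports 7 bytes for the first entry and 8 bytes for each of the other three.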
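I believe this is a bug in the FPC assembler. The asm code you present is valid, and in 64-bit mode it is perfectly valid to use LEA with 32-bit registers. The Intel processor documentation is clear on this. The Delphi 64-bit inline assembler accepts this code. To work around it you need to hand-assemble the code: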
DQ $265e5a510e8c8d67
In the Delphi CPU view it shows up as:
    Project1.dpr.12: DQ $265e5a510e8c8d67
    0000000000424160 678D8C0E515A5E26 lea ecx,[esi+ecx+$265e5a51]
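The quadword looks reversed relative to the bytes in the CPU view because x86 stores multi-byte values little-endian, so the lowest byte of the DQ operand is emitted first. A minimal sketch that splits the constant into its bytes in memory order (the program and variable names are illustrative only):

```pascal
program DecodeDQ;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

uses
  SysUtils;

const
  // The quadword from the hand-assembled workaround.
  Encoded: UInt64 = $265E5A510E8C8D67;

var
  I, B: Integer;
  Bytes: string;
begin
  Bytes := '';
  // Little-endian: byte 0 (lowest) comes first in memory.
  for I := 0 to 7 do
  begin
    B := Integer((Encoded shr (8 * I)) and $FF);
    Bytes := Bytes + IntToHex(B, 2);
  end;
  // 67 = address-size prefix, 8D = LEA opcode, 8C 0E = ModRM and SIB,
  // 51 5A 5E 26 = the 32-bit displacement $265E5A51, itself little-endian.
  Writeln(Bytes); // prints 678D8C0E515A5E26
end.
```

The printed string matches the byte column in the CPU view above.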
I ran a very simple benchmark to compare the use of 32-bit and 64-bit operands, plus a version that uses two ADDs. The code looks like this:
    {$APPTYPE CONSOLE}

    uses
      System.Diagnostics;

    // Note: each routine parks ESI in EDX and restores it at the end. On Win64
    // this preserves only the low 32 bits of the non-volatile RSI register.

    function BenchWithTwoAdds: Integer;
    asm
      MOV   EDX,ESI
      XOR   EAX,EAX
      MOV   ESI,$98C34
      MOV   ECX,$ffffffff
    @loop:
      ADD   EAX,ESI
      ADD   EAX,$265E5A51
      DEC   ECX
      CMP   ECX,0
      JNZ   @loop
      MOV   ESI,EDX
    end;

    function BenchWith32bitOperands: Integer;
    asm
      MOV   EDX,ESI
      XOR   EAX,EAX
      MOV   ESI,$98C34
      MOV   ECX,$ffffffff
    @loop:
      LEA   EAX,[EAX + ESI + $265E5A51]
      DEC   ECX
      CMP   ECX,0
      JNZ   @loop
      MOV   ESI,EDX
    end;

    {$IFDEF CPUX64}
    function BenchWith64bitOperands: Integer;
    asm
      MOV   EDX,ESI
      XOR   EAX,EAX
      MOV   ESI,$98C34
      MOV   ECX,$ffffffff
    @loop:
      LEA   EAX,[RAX + RSI + $265E5A51]
      DEC   ECX
      CMP   ECX,0
      JNZ   @loop
      MOV   ESI,EDX
    end;
    {$ENDIF}

    var
      Stopwatch: TStopwatch;

    begin
    {$IFDEF CPUX64}
      Writeln('64 bit');
    {$ELSE}
      Writeln('32 bit');
    {$ENDIF}
      Writeln;

      Writeln('BenchWithTwoAdds');
      Stopwatch := TStopwatch.StartNew;
      Writeln('Value = ', BenchWithTwoAdds);
      Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
      Writeln;

      Writeln('BenchWith32bitOperands');
      Stopwatch := TStopwatch.StartNew;
      Writeln('Value = ', BenchWith32bitOperands);
      Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
      Writeln;

    {$IFDEF CPUX64}
      Writeln('BenchWith64bitOperands');
      Stopwatch := TStopwatch.StartNew;
      Writeln('Value = ', BenchWith64bitOperands);
      Writeln('Elapsed time = ', Stopwatch.ElapsedMilliseconds);
    {$ENDIF}

      Readln;
    end.
Output on an Intel i5-2300 (elapsed times in milliseconds):
    32 bit

    BenchWithTwoAdds
    Value = -644343429
    Elapsed time = 2615

    BenchWith32bitOperands
    Value = -644343429
    Elapsed time = 3915

    ----------------------

    64 bit

    BenchWithTwoAdds
    Value = -644343429
    Elapsed time = 2612

    BenchWith32bitOperands
    Value = -644343429
    Elapsed time = 3917

    BenchWith64bitOperands
    Value = -644343429
    Elapsed time = 3918
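As you can see, there is nothing to choose between the LEA options on this evidence. The difference between their timings is well within the variability of measurement. However, the variant that uses ADD twice wins hands down.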
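All variants print the same Value, which is a useful sanity check that they compute the same thing. Each loop adds $98C34 + $265E5A51 to EAX a total of $FFFFFFFF times with 32-bit wrap-around. A minimal sketch that reproduces the number without running the loop (the program and variable names are illustrative):

```pascal
program ExpectedValue;
{$IFDEF FPC}{$MODE DELPHI}{$ENDIF}
{$APPTYPE CONSOLE}

var
  Step, Total: Cardinal;
  Product: UInt64;
begin
  // Amount added per iteration: the value loaded into ESI plus the displacement.
  Step := $98C34 + $265E5A51;
  // $FFFFFFFF iterations; the product fits comfortably in 64 bits.
  Product := UInt64(Step) * UInt64($FFFFFFFF);
  // EAX keeps only the low 32 bits of the running sum.
  Total := Cardinal(Product and $FFFFFFFF);
  // The functions return Integer, so reinterpret the bits as signed.
  Writeln(Integer(Total)); // prints -644343429
end.
```

Multiplying by $FFFFFFFF modulo 2^32 is the same as negating, which is why the result is simply the negative of the per-iteration step.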
Different machines give somewhat different results. Here is the output on a Xeon E5530:
    64 bit

    BenchWithTwoAdds
    Value = -644343429
    Elapsed time = 3434

    BenchWith32bitOperands
    Value = -644343429
    Elapsed time = 3295

    BenchWith64bitOperands
    Value = -644343429
    Elapsed time = 3279
And on a Xeon E5-4640 v2:
    64 bit

    BenchWithTwoAdds
    Value = -644343429
    Elapsed time = 4102

    BenchWith32bitOperands
    Value = -644343429
    Elapsed time = 5868

    BenchWith64bitOperands
    Value = -644343429
    Elapsed time = 5868