There is a lot of contradictory information about the performance of float and double on the x86-64 platform, and I would like to get to the bottom of it. Since this question is hard to answer unequivocally, and in practice people usually just use double precision, I propose considering the situations in which it is genuinely worth using float instead of double.
Answer 1, Authority 100%
tl;dr: float is, as expected, faster than double, so if you work with large amounts of data and float gives you enough accuracy, choose float. If float's accuracy is not enough, your choice is simple: double. If you have no particular requirements at all, choose either; you will not see the difference.

As a participant in the dispute mentioned above, I decided to write an answer. To understand what the performance would be, I first studied a little theory, for which I wrote the following code:
#include <cstddef>

int main()
{
    volatile double darray[] = {5.234234, 2.2143213, 3.214212, 4.123155};
    volatile float farray[] = {5.234234f, 2.2143213f, 3.214212f, 4.123155f};
    volatile double dres = 0.0;
    volatile float fres = 0.0f;
    for (size_t i = 0; i < 4; ++i)
        dres += darray[i];
    for (size_t i = 0; i < 4; ++i)
        fres += farray[i];
    fres = 0.0f;
}
which gives us the following assembly (GCC):
        mov     rax, QWORD PTR [rbp-96]
        movsd   xmm0, QWORD PTR [rbp-64+rax*8]
        movsd   xmm1, QWORD PTR [rbp-104]
        addsd   xmm1, xmm0
        movq    rax, xmm1
        mov     QWORD PTR [rbp-104], rax
        add     QWORD PTR [rbp-96], 1
.L2:
        cmp     QWORD PTR [rbp-96], 3
        jbe     .L3
        mov     QWORD PTR [rbp-88], 0
        jmp     .L4
.L5:
        mov     rax, QWORD PTR [rbp-88]
        movss   xmm0, DWORD PTR [rbp-80+rax*4]
        movss   xmm1, DWORD PTR [rbp-108]
        addss   xmm1, xmm0
        movd    eax, xmm1
        mov     DWORD PTR [rbp-108], eax
        add     QWORD PTR [rbp-88], 1
.L4:
        cmp     QWORD PTR [rbp-88], 3
        jbe     .L5
This is not the entire output, but there is enough information here. Two instructions interest us: addss and addsd, the scalar SSE instructions for adding float (the first) and double (the second). My first thought was to look in a manual; maybe it says there which one is faster? Such a manual exists, but a quick inspection showed that I would not find the answer there: judging by the manual, these instructions should execute equally fast. Fine. Let us leave this path and try to build the previous code with AVX2 in Visual Studio; we get the following assembly:
; 6    :
; 7    :     volatile double dres = 0.0;
; 8    :     volatile float fres = 0.0f;
; 9    :     for (size_t i = 0; i < 4; ++i)
        xor     eax, eax
        vxorps  xmm2, xmm2, xmm2
        vmovsd  QWORD PTR dres$[rsp], xmm0
        vmovss  DWORD PTR fres$[rsp], xmm2
        mov     ecx, eax
        npad    9
$LL4@main:
; 10   :         dres += darray[i];
        vmovsd  xmm1, QWORD PTR darray$[rsp+rcx*8]
        vmovsd  xmm0, QWORD PTR dres$[rsp]
        inc     rcx
        vaddsd  xmm1, xmm1, xmm0
        vmovsd  QWORD PTR dres$[rsp], xmm1
        cmp     rcx, 4
        jb      SHORT $LL4@main
        npad    1
$LL7@main:
; 11   :     for (size_t i = 0; i < 4; ++i)
; 12   :         fres += farray[i];
        vmovss  xmm1, DWORD PTR farray$[rsp+rax*4]
        vmovss  xmm0, DWORD PTR fres$[rsp]
        inc     rax
        vaddss  xmm1, xmm1, xmm0
        vmovss  DWORD PTR fres$[rsp], xmm1
        cmp     rax, 4
        jb      SHORT $LL7@main
The code has hardly changed, except that the operations are now called vaddsd and vaddss. I did not dig into the manual for these instructions; I assume the situation there is similar to what we have already seen.
So let us take another route: we know that float is 32-bit, while double is 64-bit. This inevitably has to affect performance; the only question is how. My knowledge of SIMD instructions is very limited, so I do not understand why neither GCC nor Visual Studio used any packed instructions for adding the numbers. Can anyone tell me why? I had already decided that there were no such instructions, but this article claims that there are: vaddpd and vaddps. Both take 256-bit arguments, i.e. one such operation can add 8 floats or 4 doubles at once. Now that is something: float, by virtue of its smaller size, should indeed be faster, and here we have found a mechanism by which that is actually so.
Another important factor that can put float ahead is its smaller impact on the cache: since it is half the size, the load on the cache is lower. Thus, without digging any deeper, we arrive at the conclusion that, in general, comes to mind immediately: float is faster than double.
It remains to check this in practice; for that we use the following code:
#include <iostream>
#include <vector>
#include <numeric>
#include <chrono>
#include <algorithm>
#include <string>

int main()
{
    const size_t size = 1'000'000'000;
    std::vector<double> dvector(size, 2.2143213);
    std::vector<float> fvector(size, 2.2143213f);
    auto start = std::chrono::high_resolution_clock::now();
    volatile double dres = std::accumulate(dvector.begin(), dvector.end(), 0.0);
    auto doubleElapsed = (std::chrono::high_resolution_clock::now() - start).count();
    start = std::chrono::high_resolution_clock::now();
    volatile float fres = std::accumulate(fvector.begin(), fvector.end(), 0.0f);
    auto floatElapsed = (std::chrono::high_resolution_clock::now() - start).count();
    std::cout << "float elapsed: " << floatElapsed << "\n";
    std::cout << "double elapsed: " << doubleElapsed << "\n";
    float ratio = std::max<float>(floatElapsed, doubleElapsed) /
                  std::min<float>(floatElapsed, doubleElapsed);
    std::string relation = floatElapsed < doubleElapsed ?
        std::string("faster") : std::string("slower");
    std::cout << "float is " << ratio << " times " << relation << "!\n";
}
On my PC (Haswell), this code built with Visual Studio 2015 with AVX2 gives float a stable advantage of 1.2-1.3x; there are much higher peak values, but I did not pay attention to them. Even without AVX2 (I tried different options) everything looks about the same.

Of course, the measurements are quite simple and the argument quite superficial (I did not set out to do a full-fledged study; I do not currently have time for that), but even this shows that people claiming you should choose double because double is faster than float are not right.
And one more test, where I used intrinsics to compute the sum (I may not have used them in the best way, but this is the extent of my knowledge):
#include <immintrin.h>
#include <iostream>
#include <vector>
#include <numeric>
#include <chrono>
#include <algorithm>
#include <string>

float accumulate(const std::vector<float>& vec)
{
    // start from zero: an undefined initial accumulator would corrupt the sum
    __m256 res = _mm256_setzero_ps();
    for (size_t i = 0; i < vec.size(); i += 8)
    {
        // unaligned load: vector storage is not guaranteed to be 32-byte aligned
        __m256 m1 = _mm256_loadu_ps(&vec[i]);
        res = _mm256_add_ps(m1, res);
    }
    float out[8];
    _mm256_storeu_ps(out, res);
    return std::accumulate(std::begin(out), std::end(out), 0.0f);
}

double accumulate(const std::vector<double>& vec)
{
    __m256d res = _mm256_setzero_pd();
    for (size_t i = 0; i < vec.size(); i += 4)
    {
        __m256d m1 = _mm256_loadu_pd(&vec[i]);
        res = _mm256_add_pd(m1, res);
    }
    double out[4];
    _mm256_storeu_pd(out, res);
    return std::accumulate(std::begin(out), std::end(out), 0.0);
}
int main()
{
    const size_t size = 1'000'000;
    std::vector<double> dvector(size, 2.2143213);
    std::vector<float> fvector(size, 2.2143213f);
    auto start = std::chrono::high_resolution_clock::now();
    volatile double dres = accumulate(dvector);
    auto doubleElapsed = (std::chrono::high_resolution_clock::now() - start).count();
    start = std::chrono::high_resolution_clock::now();
    volatile float fres = accumulate(fvector);
    auto floatElapsed = (std::chrono::high_resolution_clock::now() - start).count();
    std::cout << "float elapsed: " << floatElapsed << "\n";
    std::cout << "double elapsed: " << doubleElapsed << "\n";
    float ratio = std::max<float>(floatElapsed, doubleElapsed) /
                  std::min<float>(floatElapsed, doubleElapsed);
    std::string relation = floatElapsed < doubleElapsed ?
        std::string("faster") : std::string("slower");
    std::cout << "float is " << ratio << " times " << relation << "!\n";
}
With this code, on the same machine, I get a 2.3-2.5x advantage for float.
Answer 2, Authority 35%
And what does performance even mean in this context? SIMD itself tells us clearly that any vector holds more floats, so any vector operation on float will always be faster than the same operation on double, if you rely only on the quantitative characteristics of the algorithms. There is nothing to compare in such a context.
If you still want to compare, you can take the code from this answer, changing the operand types to double/float respectively (and the instructions with _mm_cmpgt_epi32 to _mm_cmpgt_pd/_mm_cmpgt_ps). All the performance measurements are there.
Answer 3
Since the weight of float and double is the same, the speed of compilation and execution will depend only on the number of digits before and after the decimal point; in my opinion, use the type float.