Recently, in connection with a molecular docking software, I wrote a sort of benchmark program, which takes a large list of points in 3D space, calculates the distances between each pair of points, and finds the largest distance. The core of the program is a double loop which looks like this:
for (int i = 0; i < N3; i++)
{
P = points[i];
for (int j=i+1; j < N; j++)
{
if ( dist(P, Q) > max_dist)
max_dist = dist(P,Q);
}
}
... where points is a System.Collections.Generic.List with 27000 items. I have rewritten this program also in C++, using STL vectors, and compiled with GCC for my Intel Core 2 processor, using
--mtune=nativeIf I compare the running time on Windows between the C++ and the C# versions, they are roughly equal. However, on Linux, if I use Mono 2.4 as the CLR, the GNU/C++ version is much faster, it is around 2.5 sec, while running with Mono takes more than 7 seconds.
So why running with Mono is slow? Using the Mono Profiler, it turns out that each access to the
points list entails an 'array bounds check' for the i and j indices:
########################
58070.103 364540502 0.000 System.Collections.Generic.List`1::get_Count()
Callers (with count) that contribute at least for 1%:
364540502 5 % Benchmarks.MainClass::Main(string[])
... even though allegedly array bound checks are eliminated from for loops! Indeed, they are, but here I am using List and not Array. Let's switch to Array, and the running time decreases from 7.3 sec to 5.4. This is still twice as much as the time taken by the C++ version, so let's run
mono --profile again:
########################
59853.928 364486500 0.000 Benchmarks.MainClass::dist(Point,Point)
Callers (with count) that contribute at least for 1%:
... we still have 3 billion calls to the dist(Point,Point) function. Why is it not inlined? Checking the GNU Compiler's output for the C++ version, I can see that the dist() is inlined. But for mono, it is probably too big (let me add here that I always use the
--optimize=all option for mono, which includes inline.)Let's inline the dist(...) function manually:
for (int i = 0; i < N3; i++)
{
P = points[i];
for (int j=i+1; j < N; j++)
{
double d = Math.Sqrt( .... );
if ( d > max_dist)
max_dist = d;
}
}
This speeds up the C# / Mono version to 3.9 sec, still far from the 2.5 s of the C++ version.
Why is it that the C# version on Windows / Microsoft CLR is as fast as the C++ version, without any manual optimization (using generic List, no manual inlining...)
Comparing the GCC generated assembly code with the code, generated by
mono --aot -O=all, we can see the the GCC-emitted code is using SSE-based arithmetics, while the Mono JIT code is using x87 floating point instructions:
10ae: dd 45 e8 fldl 0xffffffe8(%ebp)
10b1: dd 45 e8 fldl 0xffffffe8(%ebp)
10b4: de c9 fmulp %st,%st(1)
10b6: de c1 faddp %st,%st(1)
10b8: d9 fa fsqrt
Moreover, for the
if ( d > max_dist) max_dist = d; statement, the GCC code is using the maxsd instruction, while the Mono-generated code is branching with a jump...So Miguel de Icasa and his team still has a lot to do... I look forward to Mono 2.6, which perhaps will amend some of these issues.

0 comments:
Post a Comment