Tuesday, September 29, 2009

The Saint and the Farmers

Imagine a land where people eat software. The farmers of the land make a decent living by sowing and reaping compilers, editors, browsers, and so on.

One day a Saint arrives in this land. He sees that some people are starving because they don't get their daily software to eat. He starts to preach the New Era, in which software is free. He sets up a Free Foodware Foundation that distributes tasty new software to everyone at no cost.

People rush to his Store and stock up on delicious, crispy, fresh software. Everybody is celebrating... except the farmers. They suddenly cannot earn their living any more, as no one needs their products. A few of them manage to switch to growing some obscure 'embedded' software that is not offered by the Foundation, but most of them go bankrupt.

Yet some farmers decide to pick up the gauntlet and start distributing their own software for free, while still making some revenue from offering support, selling ad space, etc.

When the Saint learns this, he becomes furious. What 'traitors' these farmers are! They are worse than those who sold their software for money: they do this only to keep their pathetic products on the market, standing in the way of Free Software! He warns all his followers not to eat this software, because it is evil and might be poisonous!

Well, this story came to my mind while reading blog posts and comments about the recent Software Freedom Day in Boston, 19th Sept 2009. As a great fan of Mono, C#, and .NET, I was left with bitter feelings by Richard Stallman's harsh words about Miguel de Icaza.

I use a lot of free software, both GNU and other. But we should not think that without free software the world would come to a sudden stop. What would happen if we did not have GCC, Linux, OpenOffice, GNOME, Firefox, Apache, and so on?

We would simply buy software, as we buy books, music, PC games, or even food.

We can afford it. And the reason we can afford it is that we have jobs, thanks to the fact that there are still products and services for which people are willing to pay.

Saturday, September 12, 2009

Why is Mono 2.4 slow?

Let me start by saying that I am a big fan of .NET, and Mono. I believe that C# as a language is superior to Java and C++, and if I have the choice, I always work in C#. So I would like to believe that C# is as portable as Java, thanks to Novell and the Mono project.

Recently, in connection with a molecular docking application, I wrote a sort of benchmark program, which takes a large list of points in 3D space, calculates the distance between each pair of points, and finds the largest distance. The core of the program is a double loop which looks like this:

for (int i = 0; i < N3; i++)
{
    P = points[i];

    for (int j = i + 1; j < N; j++)
    {
        Q = points[j];

        if (dist(P, Q) > max_dist)
            max_dist = dist(P, Q);
    }
}

... where points is a System.Collections.Generic.List with 27,000 items. I have also rewritten the program in C++, using STL vectors, and compiled it with GCC for my Intel Core 2 processor, using -mtune=native.

On Windows, the running times of the C++ and the C# versions are roughly equal. On Linux, however, with Mono 2.4 as the CLR, the GNU C++ version is much faster: it finishes in around 2.5 seconds, while the Mono version takes more than 7 seconds.

So why is running with Mono slow? The Mono profiler reveals that each access to the points list entails an 'array bounds check' for the i and j indices:

########################
58070.103 364540502 0.000 System.Collections.Generic.List`1::get_Count()
Callers (with count) that contribute at least for 1%:
364540502 5 % Benchmarks.MainClass::Main(string[])

... even though array bounds checks are allegedly eliminated from for loops! Indeed, they are, but here I am using a List and not an Array. Let's switch to an Array: the running time decreases from 7.3 sec to 5.4 sec. This is still more than twice the time taken by the C++ version, so let's run mono --profile again:

########################
59853.928 364486500 0.000 Benchmarks.MainClass::dist(Point,Point)
Callers (with count) that contribute at least for 1%:

... we still have more than 360 million calls to the dist(Point,Point) function. Why is it not inlined? Checking the GNU compiler's output for the C++ version, I can see that dist() is inlined there. For Mono, it is probably too big to inline (let me add here that I always use the --optimize=all option for mono, which includes inlining).
Let's inline the dist(...) function manually:

for (int i = 0; i < N3; i++)
{
    P = points[i];

    for (int j = i + 1; j < N; j++)
    {
        double d = Math.Sqrt( .... );

        if (d > max_dist)
            max_dist = d;
    }
}

This speeds up the C# / Mono version to 3.9 sec, still far from the 2.5 s of the C++ version.

Why is it that the C# version on Windows / Microsoft CLR is as fast as the C++ version, without any manual optimization (using a generic List, no manual inlining...)?

Comparing the GCC-generated assembly code with the code generated by mono --aot -O=all, we can see that the GCC-emitted code uses SSE-based arithmetic, while the Mono-generated code uses x87 floating-point instructions:

10ae:  dd 45 e8    fldl    0xffffffe8(%ebp)
10b1:  dd 45 e8    fldl    0xffffffe8(%ebp)
10b4:  de c9       fmulp   %st,%st(1)
10b6:  de c1       faddp   %st,%st(1)
10b8:  d9 fa       fsqrt

Moreover, for the if ( d > max_dist) max_dist = d; statement, the GCC code is using the maxsd instruction, while the Mono-generated code is branching with a jump...

So Miguel de Icaza and his team still have a lot to do... I look forward to Mono 2.6, which will perhaps fix some of these issues.

Saturday, February 21, 2009

Evolution and speciation as global optimization

The current model of phylogenetic evolution is basically a random search to optimize a fitness function. Genetic programming works by the same principles, and it is able to find good approximations to global optima.

Genetic programming is an optimization technique in which we maintain a set of candidate solutions, the 'genomes', and modify them randomly by (1) point-like mutations or (2) crossovers: creating a new solution by combining halves of existing 'genomes'.

Crossovers are very important; without crossover, genetic programming would be only a local search, capable of finding only local optima.
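
To make these two operators concrete, here is a toy sketch in C. Strictly speaking it is a simple genetic algorithm on bit strings rather than genetic programming on program trees, and the bit-string encoding, the OneMax-style fitness function, and all the names are placeholders of my own, not anything from the post above:

#include <stdio.h>
#include <stdlib.h>

#define POP  64       /* number of candidate solutions ('genomes') */
#define LEN  128      /* bits per genome                           */
#define GENS 20000    /* offspring generated and evaluated         */

/* placeholder fitness: count the 1-bits (OneMax) */
static int fitness(const unsigned char *g)
{
    int f = 0;
    for (int i = 0; i < LEN; i++)
        f += g[i];
    return f;
}

int main(void)
{
    static unsigned char pop[POP][LEN];
    srand(42);

    /* random initial population */
    for (int p = 0; p < POP; p++)
        for (int i = 0; i < LEN; i++)
            pop[p][i] = rand() & 1;

    for (int gen = 0; gen < GENS; gen++) {
        unsigned char child[LEN];
        int a = rand() % POP, b = rand() % POP;   /* two random parents  */
        int cut = rand() % LEN;                   /* one-point crossover */

        for (int i = 0; i < LEN; i++)
            child[i] = (i < cut) ? pop[a][i] : pop[b][i];

        child[rand() % LEN] ^= 1;                 /* point-like mutation */

        /* 'selection': replace a random member if the child is fitter */
        int w = rand() % POP;
        if (fitness(child) > fitness(pop[w]))
            for (int i = 0; i < LEN; i++)
                pop[w][i] = child[i];
    }

    int best = 0;
    for (int p = 0; p < POP; p++)
        if (fitness(pop[p]) > best)
            best = fitness(pop[p]);
    printf("best fitness: %d out of %d\n", best, LEN);
    return 0;
}

Dropping the crossover step and keeping only the mutation turns this into the purely local search mentioned above.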

Seemingly, evolution has the same twofold mechanism: small genetic changes that 'implement' a local search on the fitness landscape, and large, abrupt changes, like whole-genome duplication, that cause jumps landing on the slopes of faraway peaks.

However, this is only an illusion. Even chromosome or whole-genome duplication is not radical enough for a truly global search for optimum, as no new genes are introduced. Lateral gene transfer works only for bacteria and viruses.

There is one more problem with abrupt, large-scale genetic changes in multicellular, diploid life forms: the offspring of the new phenotype would be isolated and could not find a mate. While this is not a problem for slow genetic drift, where offspring are similar enough to their parents and relatives, it poses a problem for the 'jumps' that are so successful in genetic programming.

One solution could be to state that evolution is a local search, and that the fitness landscape is smooth enough to make every peak attainable along a local search path.

I don't know whether the fossil record supports this hypothesis or not; I feel that there are large 'gaps' between groups (taxa), which would suggest that abrupt changes are necessary from time to time, and that evolution cannot be explained as a smooth, simple local search.

In the following I share my own 'crackpot' theory of evolution; it is not the result of any kind of study or investigation, just my thoughts on the topic; it was not reviewed or approved by any expert in the field (and as for me, I am an absolute novice, an amateur...). So quit reading, or take the following cautiously, as it could be the dumbest thing you've ever heard.

I think there are two distinct mechanisms of evolution. One is the well-known Darwinian process: small random mutations accumulate in exons and are selected for or eliminated by natural selection; this is the local search.

And I think there is another mechanism behind the abrupt, jump-like changes in the genome, one that is much less known or discussed in mainstream science. These changes accumulate silently in non-expressed parts of the genome, like introns or 'junk' DNA, and they are passed on to offspring, so that they are widespread in the population before the Big Change.

Then there comes a time when the genome becomes bistable: a single point mutation can turn many of the exons into introns or junk DNA, while the latent, hidden new genes suddenly become expressed.

As the latent new genome is already widespread at this point, this 'flip-over' can happen simultaneously in many offspring, so they will be able to find mates and create their own population.

The crucial notion here is the bistability of the genome: a state in which it really contains two genomes, one suppressed and the other expressed, but a single mutation can turn this around, so that the latent genome becomes expressed. Of course, there could be thousands of such possible mutations, so this flip-over event is not at all improbable.

Alternative splicing in some genes is a model for this flip-over on a small scale.

Thursday, February 12, 2009

PROPKA vs H++: pKa predictors for protein residues

Recently I decided to try out some free online servers for predicting pKa values of protein residues: PROPKA, H++, and Karlsberg+. The latter promises to send the results by email, but I never got anything back, so I guess it drops out of the competition.

First, I tried a famous example of an abnormal pKa value: bovine chymotrypsin, one of the first proteases whose catalytic mechanism was unveiled. All biochemistry textbooks describe it: there is a 'catalytic triad', consisting of Asp102, His57, and Ser195. Asp102 forms a hydrogen bond with His57, making its other aromatic nitrogen a strong base, which deprotonates Ser195. Thus the serine becomes a nucleophile and attacks the carbonyl carbon of the peptide bond.

So I expected the pKa predictors to find that the histidine becomes a strong base and that the serine is deprotonated.

I started with the PROPKA server, giving it 1ab9 (BOVINE GAMMA-CHYMOTRYPSIN):

             pKa     pKmodel
ASP 102B     0.40    3.80
HIS  57B     6.19    6.50

This is not what I expected. The histidine actually becomes slightly more acidic, while the aspartate becomes very acidic. Trying H++:

HIS  57     11.3
ASP 102     -5.1

I am satisfied with the prediction for the histidine: it became a strong base; but -5.1 for the Asp is a bit off the mark...

Next, I tried Caspase 3, a cysteine protease, with the PDB structure 1rhk. H++ did not accept it, complaining about missing residues. The catalytic residue is Cys165A, for which PROPKA gives pKa = 11.14 instead of the standard 9.0. So the cysteine has indeed become a strong base and nucleophile.

Sunday, February 01, 2009

MIT OpenCourseWare: there is no such thing as a free lunch

The idea is great. Collect all the course materials (handouts, PowerPoint slides, lecture notes, problem sets and answers, 'further reading' lists, and so on) from the brilliant professors of MIT, and put them on the web so that anyone can access them freely.

A generous gesture for all the students and young people in poor countries, who are eager to learn, but can't afford it. Obviously MIT has nothing to lose with this move, but can gain a lot by its PR value.

I come back to the MIT OCW website from time to time, look around, and leave disappointed. There are appealing titles that I would love to go through, like "7.343 Sophisticated Survival Skills of Simple Microorganisms", "7.343 Neuron-glial Cell Interactions in Biology and Disease", or "7.343 Protein Folding, Misfolding and Human Disease", just to mention a few (I mostly browse the graduate courses in biology). But when I want to download the 'courseware', it turns out that it includes nothing more than a meagre syllabus, a short abstract of each lecture (or only some of them), and a list of references to papers in scientific journals that are behind paywalls.

This list of references, in particular, is the most common 'courseware' on offer, although I suspect that someone who goes to MIT OCW for knowledge does not have a ScienceDirect account. Obviously all these courses are taught from PowerPoint slides - why can't they just put those online? (I dare not think of the copyright issues...)

Working at a huge multinational company, comparable in size to MIT, I think I know the answer. Top managers love to be creative; they are enthusiastic about novel ideas and about being a 'leader' in an area like OCW (indisputably initiated by MIT), but they are too sluggish and lazy to follow up on these ideas and see that they are implemented correctly and completely.

Most probably these top officials of MIT never actually go to the OCW site and click through to see any of the course materials they are so proud of. If they did, they would probably feel cheated and angry, just like any third-world student who finds nothing but a teaser in place of the knowledge they seek.

Friday, January 16, 2009

GNU make: target-specific settings

I am a great fan of Make, and specifically of GNU Make: I use it as my primary scripting and build tool. Although I have tried other tools like Ant, I soon got disappointed and fled back to Make.

An important feature of GNU make is target-specific variable settings. This is the feature you need to create Makefiles that work for several platforms and compilers. E.g., you could write (just an oversimplified illustration):

gcc: CC = gcc
gcc: CCFLAGS = -g -I.. -std=c99 -pedantic -O3
gcc: CCFLAGS += -Wall -Wwrite-strings -Wmissing-format-attribute -Wstrict-aliasing
gcc: all

icc: CC = icc
icc: CCFLAGS = -g -I.. -std=c99 -fast -fbuiltin -finline -check-uninit
icc: all

all: ... ( the actual targets)

- so that you can compile your code with both the GNU and the Intel compiler, depending on the main target (gcc or icc).

I use this technique all the time. But recently I found that these Makefiles are not really portable, since they don't work well on GNU make version 3.79 or earlier. And as the current version of GNU make is 3.81, we cannot say that 3.79 is a terribly old and obsolete release, and in fact lots of computers at my workplace have this version.

Actually, target-specific variable settings do exist in version 3.79; the problem is that the feature is buggy and crashes on slight changes to the Makefile. For example, consider the following code:

.PHONY: test

MYVAR = Error

test: MYVAR = Hello,
test: MYVAR += world!
test:
	@echo $(MYVAR)

which works fine with make 3.81, printing "Hello, world!", as intended. However, on 3.79, the output is "Error world!". The following Makefile (adding only the export keyword)

.PHONY: test

export MYVAR = Error

test: MYVAR = Hello,
test: MYVAR += world!
test:
	@echo $(MYVAR)

works fine with 3.81, but it crashes with 3.79 with the error

../3.79.1/expand.c:489: failed assertion `current_variable_set_list->next != 0'
0 [sig] make-3.79 7772 open_stackdumpfile: Dumping stack trace to make-3.79.exe.stackdump
Aborted (core dumped)

It is unfortunate that this bug exists in a widely distributed and used version of GNU Make, since open source projects rely on the GNU make installed at the user's site.

Wednesday, January 14, 2009

Automatic Vectorization - Part 2

I wonder if automatic vectorization is also possible without the __attribute__((vector_size(16))) annotation, which is cheating in a way, since we give a quite explicit hint about how to pack data in vectors... The simplest example of truly automatic vectorization is this:
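
(The code listing has not survived in this copy of the post; what follows is my reconstruction of the kind of function meant here: element-wise addition of fixed-size global float arrays, so that the compiler knows both the trip count and the alignment. The names and the length 1023 are guesses inferred from the addps/addss remark below.)

#define N 1023    /* guess: 1020 elements go through addps, the last 3 through addss */

float a[N], b[N], c[N];    /* file-scope arrays: length and alignment are known to GCC */

void add_arrays(void)
{
    for (int i = 0; i < N; i++)
        c[i] = a[i] + b[i];
}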



When compiling with gcc -O3 -msse3 -mfpmath=sse -std=c99 -S, we get this assembly output:



We can see that four float values are added in every iteration, using the addps instruction, which adds four packed single-precision floating-point numbers. If the length of the arrays is not a multiple of 4, we still obtain this vectorization: addps is used up to element 1020, and the last 3 elements are added with addss.

Now if we try to compile the following, very similar C function:
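
(This listing is also missing; a plausible reconstruction is the same loop, but with the arrays passed in as pointer arguments of unknown alignment and a run-time length:)

void add_arrays_ptr(float *c, const float *a, const float *b, int n)
{
    /* neither the alignment of the pointers nor the value of n is known at compile time */
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}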



the output is much more complicated, see it here. There are two loops: the L7 loop uses scalar instructions (addss), while the L5 loop uses addps (parallel addition of 4 floats). Which one is taken is decided according to the alignment of the input data; this is checked by the
testb $15, %bl
instruction at the beginning.

I was not able to modify my C code so that the function arguments are aligned on 16-byte boundaries. For example, we could try:
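
(The attempted declaration is missing as well; one guess at what it may have looked like is an aligned typedef for the element type:)

/* hypothetical attempt: promise 16-byte alignment through the element type */
typedef float afloat __attribute__((aligned(16)));

void add_arrays_aligned(afloat *c, const afloat *a, const afloat *b, int n)
{
    for (int i = 0; i < n; i++)
        c[i] = a[i] + b[i];
}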



... but it gives the same assembly code as before, checking the alignment and deciding whether it is worth vectorizing the loop or not.