Hi Pat: Thanks for the reply!
I have to compile the codes myself, and I've been using gfortran + OpenMPI + netCDF 3.6.3. I've tried the OpenMPI mailing list - not very useful. Some of the codes have OpenMP as an option, but the heavy lifting is always done by domain-decomposition (MPI) parallelism.
Just one standalone box (so the wifi thing was just me sobbing about not having a YaST look-alike anywhere except openSUSE...). The main reason for this post was that I was seeing different stability with different distros on this SuperMicro 64-core box. I was really surprised that distros would differ that way. And I was hoping someone would have wise words about things like the BIOS settings: NUMA support, HPC support, CPB (core performance boost) support, etc. (and possibly how they relate to the kernel panic I was seeing). The Opteron 63xx chips have been out for a long time, so I would expect all the distros to support them fully by now.
EDIT: I do remember coming across numactl (and one related command I forget) to "bind" processes to specific CPUs and their local memory. I think OpenMPI also supports binding. But there's more to it that I haven't yet learnt - basically, you want adjacent pieces of the computational domain on the same processor, insofar as possible, so they only have to pass data within the socket rather than going out across a limited HyperTransport link. I'm not sure how up-to-date this image is, but notice that 2 "hops" are needed to get from, say, CPU1 to CPU4. http://www.google.com/imgres?imgurl=htt ... g&tbm=isch
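For what it's worth, here's a rough sketch of the commands I mean - this assumes a Linux box with the numactl package installed, and "./model" is just a stand-in name for the executable; the mpirun flag spellings vary between OpenMPI releases, so check mpirun --help on your version:

```shell
# Show the machine's NUMA layout - node count, per-node CPUs and memory
# (needs the numactl package installed):
#   numactl --hardware

# OpenMPI 1.7+ binding syntax (older 1.6-era releases used
# --bind-to-core / --bysocket instead):
#   mpirun -np 64 --bind-to core --map-by socket --report-bindings ./model

# Pin a single process to node 0's cores and node 0's memory:
#   numactl --cpunodebind=0 --membind=0 ./model

# From inside any running process you can see which cores it is
# currently allowed to run on (standard Linux /proc interface):
grep Cpus_allowed_list /proc/self/status
```

The --report-bindings flag is handy because it prints, per rank, which socket/core each MPI process actually landed on, so you can at least see what the runtime decided.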
I have not figured out how to identify which MPI ranks hold which parts of the solution domain, so I can't yet profit from numactl or from the MPI binding. I've been told that modern Linux kernels are pretty good at doing such placement on the fly, but I don't know how or where to actually check whether that's true...
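In case it helps anyone else wondering the same thing, the two places I'd look on a Linux box are below - with the caveat that the numa_balancing sysctl only exists on kernel 3.8 and later, so on older kernels the first file simply won't be there:

```shell
# 1 = automatic NUMA balancing is on, 0 = off; the file is absent
# on kernels older than 3.8:
[ -r /proc/sys/kernel/numa_balancing ] && cat /proc/sys/kernel/numa_balancing

# Per-node allocation counters; a steadily rising numa_miss relative
# to numa_hit suggests processes are touching remote memory:
cat /sys/devices/system/node/node*/numastat 2>/dev/null \
    || echo "no NUMA nodes exposed"
```

The numastat(8) tool (from the numactl package) prints the same counters in a friendlier table, and "numastat -p <pid>" breaks them down per process, which is probably the most direct way to check whether the kernel's automatic placement is actually keeping the memory local.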