Building a stable server for Linux compiling

Michael_Hathaway
Level 4
Level 4
Posts: 313
Joined: Sat Oct 09, 2021 2:27 am
Location: Shebang, USA
Contact:

Building a stable server for Linux compiling

Post by Michael_Hathaway »

Firstly, I apologize in advance. I am in no way attempting to be offensive. I appreciate everyone’s time and advice. I personally do not have an Intel or AMD preference. I use what works for me.

It is my opinion that when you compile software on extremely stable platforms, the builds have a higher success rate. Despite all the internet hype, AMD desktop computer design is a huge disappointment to me, for my purposes. It is fast at the expense of stability. And for most users, gamers included, I am sure the small compromise in stability makes little to no difference.

I recently built an AMD Ryzen 5950X (16-core/32-thread) system and now regret it. I used the most advanced motherboard I could find and the best air cooling solution that would still fit in a 4U server case. I have tested several different kernels, including 5.16; however, 5.14 runs as stable as I can make it. I also have to run Windows, which now crashes with the 5950X. It has a suspend-to-memory issue. There are also virtualization problems, and Wine instability that I do not have on any of my other systems.

My goal is to compile different kinds of programs and packages, including kernels. To do this, I have considered building a very stable server to compile on. AMD Epyc is just not as stable as Intel Xeon, although Epyc is less expensive. Some of the new Xeon CPUs are very expensive, priced at $20,000 US each. I cannot afford two of these CPUs. Instead I am considering the purchase of two Xeon 8081 at $5000.00 each. These will be loaded onto an X11DPI-N: 28 cores, 56 threads, 2.5 GHz each, times two. I have been advised to wait another year for 10nm to come down in price, or for 7nm, which is around the corner.

Your thoughts, opinions, experience on this?
Last edited by LockBot on Wed Dec 28, 2022 7:16 am, edited 1 time in total.
Reason: Topic automatically closed 6 months after creation. New replies are no longer allowed.
Enterprise Dual Xeon 8081 (112) @3.8Ghz, 16TB NVMe Raid, 387Gb ECC, AMD Pro W7700 16Gb
Debian Support. Deb 12/13 Trixie 6.7.9
rene
Level 20
Level 20
Posts: 12212
Joined: Sun Mar 27, 2016 6:58 pm

Re: Building a stable server for Linux compiling

Post by rene »

Michael_Hathaway wrote: Sun Dec 05, 2021 5:37 pm It is my opinion that when you compile software on extremely stable platforms, the builds have a higher success rate.
Basically, no, other than of course in the sense of the system not crashing during the compile. A compiled program is a bag of bytes: if, on an otherwise identical system with the same compiler and settings, you found even a single-bit difference between a compile on system 1 and one on system 2 with just a different CPU, you'd have found a compiler bug.
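
A minimal Python sketch of that check, not taken from the thread: hash the same artifact built on two machines and compare the digests. The function names and the paths in the usage line are hypothetical, and the sketch assumes the build embeds no timestamps or build paths, which can make outputs differ even on identical hardware.

```python
import hashlib


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, read in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def same_build(path_a, path_b):
    """True if both build artifacts are byte-for-byte identical."""
    return sha256_of(path_a) == sha256_of(path_b)
```

Usage would be along the lines of same_build("ryzen/vmlinux", "xeon/vmlinux"); a False result with identical sources, compiler and flags points at the toolchain or at flaky hardware, not at the CPU brand.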

I do not otherwise have many well-founded opinions on stability of server-type AMD vs Intel systems. 4U would seem big enough for sufficient cooling but I'd explicitly monitor that; a 5950X would tend to be coupled with high-end liquid cooling certainly.
Michael_Hathaway
Level 4
Level 4
Posts: 313
Joined: Sat Oct 09, 2021 2:27 am
Location: Shebang, USA
Contact:

Re: Building a stable server for Linux compiling

Post by Michael_Hathaway »

The 5950X has a TDP of 105 W, unlike the Threadrippers, which are 280 W. Nevertheless, I have the largest Noctua heat sink and high-pressure fans I could fit in a 4U. It was very unstable for the first 48 hours, but has calmed down after several heat cycles. I am thinking that I should replace the RAM with ECC and just use this RAM in a different system. My core temps are typically 69-71C at a forced full load.
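
For keeping an eye on temperatures like these during long compiles, a minimal Python sketch (an illustration, not something used in the thread) can read the kernel's hwmon sysfs files, which report millidegrees Celsius. The function names and the 85 C limit are arbitrary example choices, and which chips and sensors show up (k10temp on Ryzen, for instance) depends on the drivers that are loaded.

```python
from pathlib import Path


def read_temps(root="/sys/class/hwmon"):
    """Return {(chip, sensor): degrees_celsius} for every temp*_input file."""
    temps = {}
    for chip_dir in sorted(Path(root).glob("hwmon*")):
        name_file = chip_dir / "name"
        # Fall back to the directory name if the driver exposes no "name" file.
        chip = name_file.read_text().strip() if name_file.exists() else chip_dir.name
        for sensor in sorted(chip_dir.glob("temp*_input")):
            try:
                millideg = int(sensor.read_text().strip())
            except (OSError, ValueError):
                continue  # some sensors are unreadable or report garbage
            temps[(chip, sensor.name)] = millideg / 1000.0
    return temps


def too_hot(temps, limit_c=85.0):
    """Return only the readings above limit_c (the default is an example value)."""
    return {key: value for key, value in temps.items() if value > limit_c}
```

Calling too_hot(read_temps()) from a loop or a cron job while a forced full load runs would show whether the cooler keeps up over time.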
rene
Level 20
Level 20
Posts: 12212
Joined: Sun Mar 27, 2016 6:58 pm

Re: Building a stable server for Linux compiling

Post by rene »

Temps look good then. While ECC is a good idea in and of itself, I would say it shouldn't be necessary; if the issue is memory related, you're more likely looking at memory that is not very compatible. Ryzen is (or was...) indeed more sensitive, and you'd buy (or would have bought) RAM from e.g. G.Skill that specifically mentioned good Ryzen support.

In any case, not much else relevant to add, other than maybe noting that both Linus Torvalds and Greg KH, citizens 1 and 2 of Linux so to speak, use an AMD Threadripper 3970X to compile kernels all day long...
Michael_Hathaway
Level 4
Level 4
Posts: 313
Joined: Sat Oct 09, 2021 2:27 am
Location: Shebang, USA
Contact:

Re: Building a stable server for Linux compiling

Post by Michael_Hathaway »

rene, thank you for taking the time to answer; your advice is appreciated and welcome.
Yes, Linus uses the Threadripper 3970X; he also uses ECC registered memory. But he doesn't actually compile the final released kernel on that machine. They have an enterprise server that compiles it for them. I found 64 GB of ECC that is compatible with my motherboard for $355 USD. I am going to try that and see if it improves my system stability.
rene
Level 20
Level 20
Posts: 12212
Joined: Sun Mar 27, 2016 6:58 pm

Re: Building a stable server for Linux compiling

Post by rene »

That's somewhat oddly put. Linus doesn't release a compiled kernel; he integrates code and recompiles the kernel to test said code/integration all day long and, yes, simply on his own machine. Of course there are many automated build systems/farms that in turn test what he releases (and that test things before he's even been sent anything, in the so-called "-next" tree), but note that a kernel compile as such on hardware such as that takes under a minute; it is very much a thing any kernel developer does "at home" like any user would.
Hoser Rob
Level 20
Level 20
Posts: 11796
Joined: Sat Dec 15, 2012 8:57 am

Re: Building a stable server for Linux compiling

Post by Hoser Rob »

rene wrote: Sun Dec 05, 2021 5:56 pm
Michael_Hathaway wrote: Sun Dec 05, 2021 5:37 pm It is my opinion that when you compile software on extremely stable platforms, the builds have a higher success rate.
Basically, no, other of course than in the sense of the system not crashing during the compile. ..
Agree 100%. Of course, earlier attempts at compiling and building may have destabilized the machine.
For every complex problem there is an answer that is clear, simple, and wrong - H. L. Mencken
DisturbedDragon
Level 5
Level 5
Posts: 574
Joined: Mon Oct 29, 2012 6:29 pm
Location: Texas

Re: Building a stable server for Linux compiling

Post by DisturbedDragon »

Just throwing this out there. Your sig shows " Deb12+Mint20.3 packages - 5.14.0-4". Maybe this is part of the stability issue? I mention it only because the system in my sig runs current Mint Cinnamon, current Fedora Cinnamon, openSUSE and other distros (my boot menu is ridiculous) and all are rock solid. Among many other things, I compile quite a bit and have not experienced any issues.

Not to say there is not an issue, just not sure the issue is hardware related.
AMD Ryzen 9 5950X 16C/32T | MSI MPG x570 Gaming Plus | 2TB Mushkin Pilot-E NVMe | 1TB Crucial P1 NVMe | 2x 2TB Inland Gen4 NVMe | 32GB Trident Z DDR4 3600 | Nvidia RTX4090 | Fedora 39 Cinnamon | Linux Mint 21.3 Cinnamon | Kernel 5.15.x lowlatency
Portreve
Level 13
Level 13
Posts: 4870
Joined: Mon Apr 18, 2011 12:03 am
Location: Within 20,004 km of YOU!
Contact:

Re: Building a stable server for Linux compiling

Post by Portreve »

I can't really offer comment on a compiling system, but I can tell you that my preferred server OS is Debian because it will outlive the hardware it runs on.

I've used it in exactly that capacity on two very dated (at that point, much less today) computers: a PowerMacintosh G3/266 Desktop, and a Mac mini G4 1.42 GHz. I ran the PowerMac G3 like that for about 3 1/2 years, and the Mac mini for almost 5 (IIRC) and what attracted me to running Debian, apart from the fact they (used to) make ports for everything including your kitchen toaster, was its reputation for stability. I can tell you for a fact those machines ran, one after the other, for the time stated above, and never crashed. Ever.
Flying this flag in support of freedom 🇺🇦

Recommended keyboard layout: English (intl., with AltGR dead keys)

Podcasts: Linux Unplugged, Destination Linux

Also check out Thor Hartmannsson's Linux Tips YouTube Channel
Hoser Rob
Level 20
Level 20
Posts: 11796
Joined: Sat Dec 15, 2012 8:57 am

Re: Building a stable server for Linux compiling

Post by Hoser Rob »

FWIW, for anyone who actually knows about programming, the stability of the underlying system is the least of your concerns. The stability of YOUR CODE is going to be the problem. Programming is actually platform agnostic.
rene
Level 20
Level 20
Posts: 12212
Joined: Sun Mar 27, 2016 6:58 pm

Re: Building a stable server for Linux compiling

Post by rene »

Hoser Rob wrote: Tue Dec 07, 2021 10:06 am Programming is actually platform agnostic.
Muddying the thread but I remember you said this before and that I commented on it before: you may wish to make precise what you mean by that statement, because any non-trivial/non-internal code tends to be anything but platform agnostic.
legacypowers
Level 4
Level 4
Posts: 270
Joined: Sat Dec 19, 2020 8:53 am

Re: Building a stable server for Linux compiling

Post by legacypowers »

My experience with Ryzen is a mixed bag. Most of the issues are BIOS (UEFI) related, and flashing the newest version usually resolved them. I had a customer with a Windows Server that would lock up or BSOD under heavy load, and guess what? BIOS issues.
My now-broken laptop wouldn't even boot any Linux; a BIOS update fixed that.
Terminal - zsh wrote: ╭─legacy@forums.linuxmint.com
╰─➜ _
Michael_Hathaway
Level 4
Level 4
Posts: 313
Joined: Sat Oct 09, 2021 2:27 am
Location: Shebang, USA
Contact:

Re: Building a stable server for Linux compiling

Post by Michael_Hathaway »

DisturbedDragon wrote: Mon Dec 06, 2021 7:46 pm Just throwing this out there. Your sig shows " Deb12+Mint20.3 packages - 5.14.0-4". Maybe this is part of the stability issue? I mention it only because the system in my sig runs current Mint Cinnamon, current Fedora Cinnamon, openSUSE and other distros (my boot menu is ridiculous) and all are rock solid. Among many other things, I compile quite a bit and have not experienced any issues.

Not to say there is not an issue, just not sure the issue is hardware related.
Interesting that you caught that. The Debian 12 install is just for playing around on; someone asked me if it were possible to load Mint 20.3 packages into Debian. I will obviously reload to Debian 11. I use Mint as a tool to help new users who wish to come to Linux and learn. Windows is where I am seeing the majority of my crashes on the 5950X, and that is a problem because I use that computer for my professional work in Autocad, Solidworks and Mastercam. After talking with several people in the professional server realm, it was recommended that I upgrade to ECC unbuffered memory on my machine. I purchased 64 GB of 3200, but I think I am going to return this kit and get the 128. Hopefully this helps my stability issue. I have upgraded all firmware I could find. Wendell from Level1Techs says that it could be any number of issues, even DC power routed too close to the memory.

There are different levels of stability from different viewpoints, I guess. I have just an entry-level workstation computer, which seems to work great for the money spent. Obviously an Epyc or Xeon system is far more expensive. But with that expense comes a more stable platform and other things like better memory bandwidth and support. I looked into the newer Threadrippers, and rather than pay $4000 for a CPU, I would buy a 64-core Epyc for $5000, as it also comes with the motherboard at that price point. You have the scalability to run two CPUs if needed.

Having better code does help keep your compile from crashing. We found this out yesterday during a live stream when Chris Titus was trying to compile Debian sid into a live ISO. I don't even know if it is possible to do that. I think you have to build a bullseye ISO and then change the sources. I actually wrote tutorials on this, but they were all deleted by xenopeek.
DisturbedDragon
Level 5
Level 5
Posts: 574
Joined: Mon Oct 29, 2012 6:29 pm
Location: Texas

Re: Building a stable server for Linux compiling

Post by DisturbedDragon »

Michael_Hathaway wrote: Wed Dec 08, 2021 6:14 am Interesting that you caught that. The Debian 12 install is just for playing around on; someone asked me if it were possible to load Mint 20.3 packages into Debian. I will obviously reload to Debian 11. I use Mint as a tool to help new users who wish to come to Linux and learn. Windows is where I am seeing the majority of my crashes on the 5950X, and that is a problem because I use that computer for my professional work in Autocad, Solidworks and Mastercam. After talking with several people in the professional server realm, it was recommended that I upgrade to ECC unbuffered memory on my machine. I purchased 64 GB of 3200, but I think I am going to return this kit and get the 128. Hopefully this helps my stability issue. I have upgraded all firmware I could find. Wendell from Level1Techs says that it could be any number of issues, even DC power routed too close to the memory.
Given that the issue persists across operating systems/distros, and for the most part shows up under higher workloads, what kind of power are you pushing on this hardware? It could very well be a power supply issue. I had a server (dual 16-core Opterons w/256GB ECC) that kept crashing randomly a little over a year ago. I thought it was the CPU, board, RAM or a failing drive. Funnily, I did test the PSU and it tested out fine. I ran memtest, swapped the CPU, and checked the board with a magnifying glass looking for issues. Nothing but crash, crash, crash, reboot. I swapped out the power supply just for giggles. System has been up for a year now...
Michael_Hathaway wrote: Wed Dec 08, 2021 6:14 am There are different levels of stability from different viewpoints I guess.
Perhaps... Program crashes are unacceptable for me, much less system crashes.
MurphCID
Level 15
Level 15
Posts: 5908
Joined: Fri Sep 25, 2015 10:29 pm
Location: Near San Antonio, Texas

Re: Building a stable server for Linux compiling

Post by MurphCID »

That seems odd to me, but then again I AM an AMD fanboy. I am running a gen 1 Ryzen 1700 and it has been superb. Now, at the beginning I did have a lot of issues with the Gigabyte motherboard (their tech support is horrible), but once I changed to an ASUS Crosshair VI Hero, it has been rock solid. From your description I would suspect either motherboard or RAM issues. If I could afford one, I would absolutely get a Threadripper 32-core system. Do I need one? Absolutely not. Do I WANT one? Yes. Please keep us posted on the issues; this thread is interesting.
Michael_Hathaway
Level 4
Level 4
Posts: 313
Joined: Sat Oct 09, 2021 2:27 am
Location: Shebang, USA
Contact:

Re: Building a stable server for Linux compiling

Post by Michael_Hathaway »

DisturbedDragon wrote: Wed Dec 08, 2021 7:46 am Given that the issue persists across operating systems/distros, and for the most part shows up under higher workloads, what kind of power are you pushing on this hardware? It could very well be a power supply issue. I had a server (dual 16-core Opterons w/256GB ECC) that kept crashing randomly a little over a year ago. I thought it was the CPU, board, RAM or a failing drive. Funnily, I did test the PSU and it tested out fine. I ran memtest, swapped the CPU, and checked the board with a magnifying glass looking for issues. Nothing but crash, crash, crash, reboot. I swapped out the power supply just for giggles. System has been up for a year now...
That is a good point you made. Yes, an Opteron system crashing like that is very odd. I am used to working on dual Xeon workstations, so you can see my reference point in building this system and having it crash. Most of the crashing happens either when suspending to RAM or when I have large workloads in RAM. This is what led me to believe it is either an incompatibility with my RAM or a problem with the pathways to that RAM on the motherboard.

I will check the power supply. I purchased an EVGA 80 Plus Platinum 850 W power supply. This is overkill for the amount of power needed, but it might be faulty. I recently tested the Precision Boost Overdrive (PBO) setting to see if I had enough cooling. Forced full load resulted in a CPU temperature of 91C. I decided to turn PBO off. For those not familiar with it, PBO allows single-core boost to be sustained for longer periods by allowing more voltage and current to go to the CPU. The reason I bring this up is that the higher current sent to the processor did not change the amount of instability that I observed; it only increased the CPU temperature.

AMD processors are basically set to their maximum stable performance out of the box. Any changes to this will boost performance at the cost of stability. I want stability over performance, so I will leave this in its out-of-the-box, stock form.
Michael_Hathaway
Level 4
Level 4
Posts: 313
Joined: Sat Oct 09, 2021 2:27 am
Location: Shebang, USA
Contact:

Re: Building a stable server for Linux compiling

Post by Michael_Hathaway »

MurphCID wrote: Wed Dec 08, 2021 8:07 am That seems odd to me, but then again I AM an AMD fanboy. I am running a gen 1 Ryzen 1700 and it has been superb. Now, at the beginning I did have a lot of issues with the Gigabyte motherboard (their tech support is horrible), but once I changed to an ASUS Crosshair VI Hero, it has been rock solid. From your description I would suspect either motherboard or RAM issues. If I could afford one, I would absolutely get a Threadripper 32-core system. Do I need one? Absolutely not. Do I WANT one? Yes. Please keep us posted on the issues; this thread is interesting.
I am thinking that you are correct, and that it is motherboard and memory as well. Your Crosshair VI Hero is an X370 chipset and I needed the X570. I did try to buy the Asus X570 board, but it is not available, and even the used ones are selling for $750.

At this point, I have ordered a new Aorus Xtreme motherboard, which I found new for $600, and 128 GB of ECC 3200. Not all X570 boards will run ECC, so I am limited. The Aorus Xtreme uses heat piping to cool the X570, whereas the Master I have now has a gigantic heatsink. I also ordered a new rack case, in which I plan to hole-saw two additional 120mm fan mounts on the port side. I will have a total of three high-pressure 120mm Noctua inlet fans and one 80mm.

As for Threadripper, in my opinion it is a foolish purchase. The difference between the 5950X (16 cores/32 threads, $700) and the 3970X (32 cores/64 threads, $3000) is basically thread count. A Chromium compile on the 5950X takes exactly 10 minutes longer than on the 3970X; that is not twice as long, so there are huge diminishing returns here. And even if you could justify spending $2300.00 more, why would you, when you can buy an Epyc for about the same money and have 8 memory channels? Yes, the Threadripper has a higher clock speed, but it doesn't compile any faster than the Epyc because it is limited to 4 memory channels. So the real difference is that you get a faster single-core speed on the Threadripper at the cost of a lot more power consumption and less stability. But you can awe all your buddies by saying you have a Threadripper, except for the ones that have Epyc.
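
The diminishing-returns argument can be put into numbers with a small Python sketch. It uses only the figures quoted in this post (prices, thread counts, the 10-minute Chromium difference); the function names are made up for illustration, and it ignores motherboard, memory and power costs.

```python
# Figures as quoted in the post above (thread counts and prices in USD).
cpus = {
    "Ryzen 9 5950X": {"threads": 32, "price_usd": 700},
    "Threadripper 3970X": {"threads": 64, "price_usd": 3000},
}


def usd_per_thread(cpu):
    """Purchase price divided by hardware thread count."""
    return cpu["price_usd"] / cpu["threads"]


def usd_per_minute_saved(extra_cost_usd, minutes_saved):
    """Up-front cost of each minute shaved off a single compile."""
    return extra_cost_usd / minutes_saved


extra_cost = cpus["Threadripper 3970X"]["price_usd"] - cpus["Ryzen 9 5950X"]["price_usd"]

for name, cpu in cpus.items():
    print(f"{name}: ${usd_per_thread(cpu):.3f} per thread")
print(f"Extra cost: ${extra_cost}")
print(f"Per minute saved on one Chromium build: ${usd_per_minute_saved(extra_cost, 10):.2f}")
```

With these inputs the 5950X works out to $21.875 per thread against $46.875 for the 3970X, and the extra $2300 buys back 10 minutes per Chromium build, or $230 per minute saved on a single build.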
MurphCID
Level 15
Level 15
Posts: 5908
Joined: Fri Sep 25, 2015 10:29 pm
Location: Near San Antonio, Texas

Re: Building a stable server for Linux compiling

Post by MurphCID »

Michael_Hathaway wrote: Thu Dec 09, 2021 6:09 am
I am thinking that you are correct, and that it is motherboard and memory as well. Your Crosshair VI Hero is an X370 chipset and I needed the X570. I did try to buy the Asus X570 board, but it is not available, and even the used ones are selling for $750.

At this point, I have ordered a new Aorus Xtreme motherboard, which I found new for $600, and 128 GB of ECC 3200. Not all X570 boards will run ECC, so I am limited. The Aorus Xtreme uses heat piping to cool the X570, whereas the Master I have now has a gigantic heatsink. I also ordered a new rack case, in which I plan to hole-saw two additional 120mm fan mounts on the port side. I will have a total of three high-pressure 120mm Noctua inlet fans and one 80mm.
I had the top-of-the-line Gigabyte X370, since I only buy top-of-the-line motherboards (fewer issues normally, plus lots of ports), and the Gigabyte was awful. I was constantly having to install the latest BIOS they were churning out to fix issues, until it finally died one day and corrupted my NVMe drive. I got the ASUS and have never looked back. Also, when the Gigabyte died it took a RAM stick with it. Gigabyte insisted it was my fault (I do not overclock, etc.; I run stock). Then after they "fixed" it they would never tell me what was wrong. I will never purchase another Gigabyte product. Also, once I went to G.Skill Trident Z RAM at 2666 MHz I have not had a single issue. I never run RAM overclocked. I always run the max supported speed.

My ASUS system has been running stable since July 2017. I was an early adopter. The funny thing is that the 1700 is more than enough processor for me; I truly do not need any more than what I have. I will build a retirement system sometime early next year, and I am still working out the specifications. Most likely either an 8- or 12-core Ryzen 5000 series with 32 or 64 GB of RAM.
Michael_Hathaway
Level 4
Level 4
Posts: 313
Joined: Sat Oct 09, 2021 2:27 am
Location: Shebang, USA
Contact:

Re: Building a stable server for Linux compiling

Post by Michael_Hathaway »

I have built systems from just about every manufacturer there is. You are going to run into issues no matter the brand name.

I think you have a good plan for your retirement system. The 5900X is a better CPU for most people than the 5950X. It's half the cost, runs cooler and is nearly the same speed. Threadripper/Epyc is a peripherals game, as AMD has made Threadripper hardware more attractive. With Xeon, however, you have a much larger selection of motherboards, for example, but at a higher cost. The new 10 and 7nm Xeons are looking promising. I almost purchased a W-3375 (38 cores/76 threads, 4 GHz) 10nm Xeon with 8-channel memory. But I was told to wait for Intel to release its new DDR5 for Xeon. Penguin Computing was able to put 7,626 cores into a single rack using Xeon 9200 CPUs (Relion XO1122eAP server). Competition is a good thing, but I think that AMD is poking the bear.
Michael_Hathaway
Level 4
Level 4
Posts: 313
Joined: Sat Oct 09, 2021 2:27 am
Location: Shebang, USA
Contact:

Re: Building a stable server for Linux compiling

Post by Michael_Hathaway »

Only half of my memory arrived today. I installed 64 GB of ECC 3200, and everything is working solidly now at 3211 MHz. I did not have to adjust any settings, though the motherboard did take extra time to POST, like all servers seem to do. There are no XMP profiles on ECC. CPU temperature went down 3 degrees under full load, and idle is down 7 degrees.
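
Whether ECC is actually correcting anything can be watched from Linux through the EDAC sysfs counters. The Python sketch below is an illustration only, not something from the thread: the function name is made up, and it assumes an EDAC driver is loaded and the board really runs the modules in ECC mode (otherwise the directory is simply empty or missing).

```python
from pathlib import Path


def read_edac_counts(root="/sys/devices/system/edac/mc"):
    """Return {controller: {"corrected": n, "uncorrected": n}} from EDAC sysfs."""
    counts = {}
    for mc_dir in sorted(Path(root).glob("mc[0-9]*")):
        entry = {}
        for label, filename in (("corrected", "ce_count"), ("uncorrected", "ue_count")):
            try:
                entry[label] = int((mc_dir / filename).read_text().strip())
            except (OSError, ValueError):
                entry[label] = None  # counter missing or unreadable
        counts[mc_dir.name] = entry
    return counts
```

A corrected count that keeps climbing under load would point at marginal memory or memory settings, while an empty result suggests ECC reporting is not active on that system.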