
LPMC Type: D-Cache Parity Error

This document discusses a D-Cache parity error detected on a server. It provides the following information: 1) A single D-Cache or I-Cache parity error can be caused by background radiation or overheating CPUs and may not indicate a problem. 2) If more than one error occurs within a day or two, it suggests a physical defect in the CPU that could cause failure, and the CPU should be replaced. 3) Low Priority Machine Check errors like this one indicate a single-bit memory error was detected and corrected, but multiple occurrences per week may warrant replacing the CPU board to prevent future issues.

Uploaded by

liuyl

LPMC type : D-Cache Parity Error

Just a note, a D-Cache or I-Cache parity error (Single Bit) can be caused by anything from
background radiation to overheating CPUs. When only one appears (this one seems to be the only
one in the last 4 months) I would assume that it is due to background radiation. Of course you
should always call HP support (why not, if you paid for it). When it appears more than once, there
is a very good chance that the CPU has a physical defect or heat damage that will cause it to fail
soon. It's always better to be able to schedule downtime to replace a CPU on the weekend, rather
than to have the server go down in the middle of the day.

This is definitely a hardware problem, but it may not be a problem. This can literally be caused
by stray radiation, including cosmic radiation. If you see more than one of these over a day or so
(unless you are in an environment with high background radiation) then it's time to call your HP Mr.
Goodwrench and have him replace a processor board.

Low Priority Machine Check - Data Cache. This means a single-bit error was found and corrected.
If this occurs very rarely then it's no big deal but if you are seeing it often (perhaps twice a week)
then you should call your HP Mr. Goodwrench guys and have them replace a CPU board.

As Clay already wrote, one bit flipped in the data cache, and if this behavior repeats more often
you should consider kissing the CPU goodbye.
But what is a good measure of when to keep and when to replace the CPU?
From my point of view, this depends on the CPU type. From the PA8500 on, the data cache is ECC
protected, which means the system can recover from a single-bit error by calculating the
correct contents of the memory. Prior to the PA8500, the data cache is parity protected. That means
the system can recognize a single-bit error and, in the case of a clean line, re-fetch the contents
from main memory. In the case of a dirty line the system would perform an HPMC.
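
To make the parity-versus-ECC distinction above concrete, here is a minimal Python sketch. It is only a toy model of the behavior described in this reply, not HP firmware logic: the names `Protection`, `even_parity`, `parity_detects_flip` and `single_bit_outcome`, and the outcome strings, are made up for illustration.

```python
from enum import Enum


class Protection(Enum):
    PARITY = "parity"  # data cache before the PA8500, per the reply above
    ECC = "ecc"        # data cache from the PA8500 on, per the reply above


def even_parity(word: int) -> int:
    """Return the even-parity bit of a non-negative integer word."""
    return bin(word).count("1") % 2


def parity_detects_flip(word: int, bit: int) -> bool:
    """Flip one bit and report whether the stored parity bit exposes it."""
    stored_parity = even_parity(word)
    corrupted = word ^ (1 << bit)
    # Parity only says "something changed"; it cannot say which bit.
    return even_parity(corrupted) != stored_parity


def single_bit_outcome(protection: Protection, line_dirty: bool) -> str:
    """Toy model of what happens after a single-bit data cache error."""
    if protection is Protection.ECC:
        return "corrected"   # ECC can recompute the right contents
    if not line_dirty:
        return "refetched"   # clean line: main memory still has a good copy
    return "hpmc"            # dirty line: the only copy is the damaged one


if __name__ == "__main__":
    print(parity_detects_flip(0b1011, 2))
    print(single_bit_outcome(Protection.PARITY, line_dirty=True))
```

The dirty-line case is the reason a parity-protected cache is less forgiving: the modified data exists only in the cache, so detecting the error is not enough to recover from it.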

I wouldn't replace a CPU for just one cache error. This might happen and you will never see it
again. On a pre-PA8500 CPU, two data cache errors in one week are a solid indicator that it is
time to kiss the CPU goodbye.

If you were getting these parity errors every 30 mins then something was seriously wrong with that
CPU. Don't forget to consider environmental causes - especially power & heat.
If you have had any overtemps in the past, they can cause these types of errors down the road. And
the bad part is they'll never cease; in fact, they'll get worse.

So I'd recommend that if you ever get just 2 of these in a day in the future, then have that CPU
replaced and verify the integrity of the environment - steady cooling/humidity & clean, reliable
power.
The only other external causes of these types of problems could be strong RF (radio frequency) or EM
(electromagnetic) emissions near the system and, believe it or not, radioactive sources (usually
cosmic rays).

These are called low-priority errors because the hardware corrected the single bit errors. However,
if one bit is bad, more will often follow and eventually they become uncorrectable errors. When that
happens, the system WILL crash. So consider LPMCs as early warnings about pending failures.
NOTE: if only one or two messages have occurred in the last few months, don't worry about it. But
if you have had several in a day or two, schedule a service call and get the problem processor
replaced. The cstm logfile seems to have a number of potential problems logged.
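
The rule of thumb repeated in this thread (one error in months is noise, a cluster within a day or two means schedule a replacement) can be sketched as a sliding-window count. This is a hedged illustration only: the function names are hypothetical, the timestamps are assumed to have been collected by hand from your own logs (nothing here parses cstm output), and the default window of two days and threshold of two events are assumptions picked from the range of opinions above, not an HP recommendation.

```python
from datetime import datetime, timedelta


def max_events_in_window(timestamps, window):
    """Largest number of events that fall inside any span of length `window`."""
    ts = sorted(timestamps)
    best = 0
    start = 0
    for end in range(len(ts)):
        # Shrink the window from the left until it spans at most `window`.
        while ts[end] - ts[start] > window:
            start += 1
        best = max(best, end - start + 1)
    return best


def should_schedule_replacement(timestamps,
                                window=timedelta(days=2),  # assumed default
                                threshold=2):              # assumed default
    """True if at least `threshold` LPMCs occurred within one `window`."""
    return max_events_in_window(timestamps, window) >= threshold


if __name__ == "__main__":
    # Hypothetical LPMC times, for illustration only.
    events = [
        datetime(2024, 1, 10, 9, 0),
        datetime(2024, 5, 12, 14, 30),
        datetime(2024, 5, 12, 17, 45),
    ]
    print(should_schedule_replacement(events))
```

For a pre-PA8500 CPU, the stricter "two in one week" rule from the earlier reply corresponds to calling it with `window=timedelta(days=7)`.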

It is my understanding that LPMC D-Cache Parity Errors are memory related, not CPU related...
LPMC is a Low Priority Machine Check, which means that the hardware fixed its own problem.
Possibly a RAM module has gone completely bad.
