ACK: [PATCH] olog: olog.json: Refresh OPAL skiboot errors to check on olog scan
Alex Hung
alex.hung at canonical.com
Thu Oct 5 00:32:28 UTC 2017
On 2017-10-04 02:25 PM, Deb McLemore wrote:
> This is a periodic refresh of the OPAL olog.json data which
> is produced (see data/README_JSON.txt).
>
> Signed-off-by: Deb McLemore <debmc at linux.vnet.ibm.com>
> ---
> data/olog.json | 856 ++++++++++++++++++++++++++++-----------------------------
> 1 file changed, 428 insertions(+), 428 deletions(-)
>
> diff --git a/data/olog.json b/data/olog.json
> index 3084300..e876db3 100644
> --- a/data/olog.json
> +++ b/data/olog.json
> @@ -1,354 +1,218 @@
> {
> "olog_error_warning_patterns": [
> {
> - "advice": "NVLink not functional",
> + "advice": "HDAT described an invalid size for timebase, which means there's a disagreement between HDAT and OPAL. This is most certainly a firmware bug.",
> "compare_mode": "regex",
> - "label": "NPUisnInvalid",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "NPU[0-9]+: isn 0x[0-9a-f]+ not valid for this NPU"
> - },
> - {
> - "advice": "Firmware probably ran out of memory creating NPU slot. NVLink functionality could be broken.",
> - "compare_mode": "string",
> - "label": "NPUCannotCreatePHBSlot",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "NPU: Cannot create PHB slot"
> - },
> - {
> - "advice": "OPAL failed to add the power-mgt device tree node. This could mean that firmware ran out of memory, or there's a bug somewhere.",
> - "compare_mode": "string",
> - "label": "CreateDTPowerMgtNodeFail",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "creating dt node /ibm,opal/power-mgt failed"
> - },
> - {
> - "advice": "An error condition occurred in sleep/winkle engines timer state machine. Dumping debug information to root-cause. OPAL/skiboot may be stuck on some operation that requires SLW timer state machine (e.g. core powersaving)",
> - "compare_mode": "string",
> - "label": "SLWRegisterDump",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_LOW",
> - "pattern": "SLW: Register state:"
> - },
> - {
> - "advice": "The SLeep/Winkle Engine (SLW) failed to increment the generation number within our timeout period (it *should* have done so within ~10us, not >1ms. OPAL uses the SLW timer to schedule some operations, but can fall back to the (much less frequent OPAL poller, which although does not affect functionality, runs *much* less frequently. This could have the effect of slow I2C operations (for example). It may also mean that you *had* an increase in jitter, due to slow interactions with SLW. This error may also occur if the machine is connected to via soft FSI.",
> - "compare_mode": "string",
> - "label": "SLWTimerStuck",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "SLW: timer stuck, falling back to OPAL pollers. You will likely have slower I2C and may have experienced increased jitter."
> - },
> - {
> - "advice": "No GPU/NPU slot information was found. NVLink2 functionality will not work.",
> - "compare_mode": "string",
> - "label": "NPUNoPHBSlotLabel",
> - "last_tag": "skiboot-5.6.0",
> + "label": "HDATBadTimebaseSize",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "NPU: Cannot find GPU slot information"
> + "pattern": "HDAT: Bad timebase size [0-9]+ @ (0x[0-9a-f]+|nil)"
> },
> {
> - "advice": "Firmware probably ran out of memory creating NPU slot. NVLink functionality could be broken.",
> + "advice": "AST BMC is likely to be non-functional when accessed from host.",
> "compare_mode": "string",
> - "label": "NPUCannotCreatePHBSlot",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "NPU: Cannot create PHB slot"
> - },
> - {
> - "advice": "OPAL marked a Centaur (memory buffer) as offline due to CENTAUR_ERR_OFFLINE_THRESHOLD (10) consecutive errors on XSCOMs to this centaur. OPAL will now return OPAL_XSCOM_CTR_OFFLINED and not try any further XSCOMs. This is likely caused by some hardware issue or PRD recovery issue.",
> - "compare_mode": "regex",
> - "label": "CentaurOfflinedTooManyErrors",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "CENTAUR: Offlined [0-9a-f]+ due to > [0-9]+ consecutive XSCOM errors. No more XSCOMs to this centaur."
> - },
> - {
> - "advice": "xscom_read was called with an invalid partid. There's likely a bug somewhere in the stack that's causing someone to try an xscom_read on something that isn't a processor, Centaur or EX chiplet.",
> - "compare_mode": "regex",
> - "label": "XSCOMReadInvalidPartID",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: invalid XSCOM partid 0x[0-9a-f]+"
> - },
> - {
> - "advice": "xscom_write was called with an invalid partid. There's likely a bug somewhere in the stack that's causing someone to try an xscom_write on something that isn't a processor, Centaur or EX chiplet.",
> - "compare_mode": "regex",
> - "label": "XSCOMWriteInvalidPartID",
> - "last_tag": "skiboot-5.6.0",
> + "label": "ASTBMCFailedSetEnables",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: invalid XSCOM partid 0x[0-9a-f]+"
> + "pattern": "ASTBMC: failed to set enables"
> },
> {
> - "advice": "The HOMER base address for a chip was not valid. This means that OCC (On Chip Controller) will be non-functional and CPU frequency scaling will not be functional. CPU may be set to a safe, low frequency. Power savings in CPU idle or CPU hotplug may be impacted.",
> + "advice": "Device tree didn't contain enough information to correctly report back PCI inventory. Service processor is likely to be missing information about what hardware is physically present in the machine.",
> "compare_mode": "regex",
> - "label": "OCCInvalidHomerBase",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FirenzePCIInventory",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Chip: [0-9a-f]+ homer_base is not valid"
> + "pattern": "No chip device node for PHB[0-9a-f]+"
> },
> {
> - "advice": "The pstate table for a chip was not valid. This means that OCC (On Chip Controller) will be non-functional and CPU frequency scaling will not be functional. CPU may be set to a low, safe frequency. This means that CPU idle states and CPU frequency scaling may not be functional.",
> + "advice": "More PCI inventory than we can send to service processor. The service processor will have an incomplete view of the world.",
> "compare_mode": "regex",
> - "label": "OCCInvalidPStateTable",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Chip: [0-9a-f]+ PState table is not valid"
> - },
> - {
> - "advice": "The pstate table for the first chip was not valid. This means that OCC (On Chip Controller) will be non-functional. This means that CPU idle states and CPU frequency scaling will not be functional as OPAL doesn't populate the device tree with pstates in this case.",
> - "compare_mode": "string",
> - "label": "OCCInvalidPStateTableDT",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FirenzePCIInventoryTooLarge",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: PState table is not valid"
> + "pattern": "Inventory ([0-9]+ bytes) too large"
> },
> {
> - "advice": "The PState table layout version is not supported in P9. So OPAL will not parse the PState table. CPU frequency scaling will not be functional as frequency and pstate-ids are not added to DT.",
> + "advice": "Error communicating with service processor when sending PCI Inventory.",
> "compare_mode": "regex",
> - "label": "OCCInvalidVersion02",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FirenzePCIInventoryError",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Version [0-9a-f]+ is not supported in P9"
> + "pattern": "Error [0-9]+ sending inventory"
> },
> {
> - "advice": "The PState table layout version is not supported in P8. So OPAL will not parse the PState table. CPU frequency scaling will not be functional as frequency and pstate-ids are not added to DT.",
> + "advice": "On Firenze platforms, I2C is used to control power to PCI slots. Errors here mean we may be in trouble in regards to PCI slot power on/off.",
> "compare_mode": "regex",
> - "label": "OCCInvalidVersion90",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FirenzePCII2CError",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Version [0-9a-f]+ is not supported in P8"
> + "pattern": "Error [0-9]+ from I2C request on slot [0-9a-f]+"
> },
> {
> - "advice": "The PState table layout version is not supported. So OPAL will not parse the PState table. CPU frequency scaling will not be functional as OPAL doesn't populate the device tree with pstates.",
> + "advice": "Likely a coding error: invalid I2C request.",
> "compare_mode": "regex",
> - "label": "OCCUnsupportedVersion",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FirenzePCII2CInvalid",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Unsupported pstate table layout version [0-9]+"
> + "pattern": "Invalid I2C request [0-9]+ on slot [0-9a-f]+"
> },
> {
> - "advice": "The min pstate is greater than the max pstate, this could be due to corrupted/invalid data in OCC-OPAL shared memory region. So OPAL has not added pstates to device tree. This means that CPU Frequency management will not be functional in the host.",
> + "advice": "The Firenze platform uses I2C to control power to PCI slots. Something went wrong in the state machine controlling that. Slots may/may not have power.",
> "compare_mode": "regex",
> - "label": "OCCInvalidPStateLimits",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FirenzePCISlotI2CStateError",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Invalid pstate limits. Pmin([0-9]+) > Pmax ([0-9]+)"
> + "pattern": "Wrong state [0-9a-f]+ on slot [0-9a-f]+"
> },
> {
> - "advice": "The nominal pstate is greater than the max pstate, this could be due to corrupted/invalid data in OCC-OPAL shared memory region. So OPAL has limited the nominal pstate to max pstate.",
> + "advice": "Unexpected state in the FIRENZE PCI Slot state machine. This could mean PCI is not functioning correctly.",
> "compare_mode": "regex",
> - "label": "OCCInvalidNominalPState",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FirenzePCISlotGPowerState",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Clipping nominal pstate([0-9]+) to Pmax([0-9]+)"
> + "pattern": "[0-9a-f]+ GPOWER: Unexpected state [0-9a-f]+"
> },
> {
> - "advice": "The number of pstates is outside the valid range (currently <=1 or > 128), so OPAL has not added pstates to the device tree. This means that OCC (On Chip Controller) will be non-functional. This means that CPU idle states and CPU frequency scaling will not be functional.",
> + "advice": "Unexpected state in the FIRENZE PCI Slot state machine. This could mean PCI is not functioning correctly.",
> "compare_mode": "regex",
> - "label": "OCCInvalidPStateRange",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FirenzePCISlotSPowerState",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: OCC range is not valid; No of pstates = [0-9]+"
> + "pattern": "[0-9a-f]+ SPOWER: Unexpected state [0-9a-f]+"
> },
> {
> - "advice": "Device tree node /ibm,opal/power-mgt not found. OPAL didn't add pstate information to device tree. Probably a firmware bug.",
> + "advice": "You are running in manufacturing mode. This mode should only be enabled in a factory during manufacturing.",
> "compare_mode": "string",
> - "label": "OCCDTNodeNotFound",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: dt node /ibm,opal/power-mgt not found"
> - },
> - {
> - "advice": "Failed to create /ibm,opal/power-mgt/occ. Per-chip pstate properties are not added to Device Tree.",
> - "compare_mode": "regex",
> - "label": "OCCDTFailedNodeCreation",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Failed to create /ibm,opal/power-mgt/occ@[0-9a-f]+"
> + "label": "ManufacturingMode",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_MEDIUM",
> + "pattern": "PLAT: Manufacturing mode ON"
> },
> {
> - "advice": "ENOMEM while allocating OCC load message. OCCs not started, consequently no power/frequency scaling will be functional.",
> + "advice": "opal_i2c_request was passed an invalid bus ID. This has likely come from the OS rather than OPAL and thus could indicate an OS bug rather than an OPAL bug.",
> "compare_mode": "string",
> - "label": "OCCload_reqENOMEM",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Could not allocate occ_load_req"
> - },
> - {
> - "advice": "Invalid request for loading OCCs. Power and frequency management not functional",
> - "compare_mode": "regex",
> - "label": "OCCLoadInvalidScope",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Load message with invalid scope 0x[0-9a-f]+"
> - },
> - {
> - "advice": "Invalid request for resetting OCCs. Power and frequency management not functional",
> - "compare_mode": "regex",
> - "label": "OCCResetInvalidScope",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OCC: Reset message with invalid scope 0x[0-9a-f]+"
> - },
> - {
> - "advice": "OPAL had trouble creating the sensor nodes in the device tree as there was already one there. This indicates either the device tree from Hostboot already filled in sensors or an OPAL bug.",
> - "compare_mode": "regex",
> - "label": "OPALSensorNodeExists",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "SENSOR: node .* exists"
> - },
> - {
> - "advice": "Failed to update System Attention Indicator. Likely means some bug with OPAL interacting with FSP.",
> - "compare_mode": "regex",
> - "label": "FSPSAIFailed",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Update SAI cmd failed [rc=[0-9]+]."
> - },
> - {
> - "advice": "OPAL ran out of memory while trying to allocate an FSP message in SAI code path. This indicates an OPAL bug that caused OPAL to run out of memory.",
> - "compare_mode": "regex",
> - "label": "SAIMallocFail",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: Memory allocation failed."
> - },
> - {
> - "advice": "Error in queueing message to FSP in SAI code path. Likely an OPAL bug.",
> - "compare_mode": "regex",
> - "label": "SAIQueueFail",
> - "last_tag": "skiboot-5.6.0",
> + "label": "I2CInvalidBusID",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: Failed to queue the message"
> + "pattern": "I2C: Invalid 'bus_id' passed to the OPAL"
> },
> {
> - "advice": "Possibly an error on FSP side, OPAL failed to read state from FSP.",
> - "compare_mode": "regex",
> - "label": "FSPSAIGetFailed",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "OPAL failed to allocate memory for an i2c_request. This points to an OPAL bug as OPAL ran out of memory and this should never happen.",
> + "compare_mode": "string",
> + "label": "I2CFailedAllocation",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Read real SAI cmd failed [rc = 0x[0-9a-f]+]."
> + "pattern": "I2C: Failed to allocate 'i2c_request'"
> },
> {
> - "advice": "OPAL ran out of memory: OPAL bug.",
> - "compare_mode": "regex",
> - "label": "FSPGetSAIMallocFail",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "OPAL could not find an NVRAM partition on the system flash. Check that the system flash has a valid partition table, and that the firmware build process has added a NVRAM partition.",
> + "compare_mode": "string",
> + "label": "NVRAMNoPartition",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: Memory allocation failed."
> + "pattern": "FLASH: Can't parse ffs info for NVRAM"
> },
> {
> - "advice": "Failed to queue message to FSP: OPAL bug",
> + "advice": "OPAL Found multiple system flash. Since we've already found a system flash we are going to use that one but this ordering is not guaranteed so may change in future.",
> "compare_mode": "regex",
> - "label": "FSPGetSAIQueueFail",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: Failed to queue the message"
> + "label": "SystemFlashMultiple",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_HIGH",
> + "pattern": "FLASH: Attempted to register multiple system flash: .*"
> },
> {
> - "advice": "OPAL failed to allocate memory for FSP LED command. Likely an OPAL bug led to out of memory.",
> + "advice": "System flash isn't formatted as expected. This could mean several OPAL utilities do not function as expected. e.g. gard, pflash.",
> "compare_mode": "string",
> - "label": "FSPLEDRequestMallocFail",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "SPCN set command node allocation failed"
> + "label": "NoFFS",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_HIGH",
> + "pattern": "FLASH: No ffs info; using raw device only"
> },
> {
> - "advice": "Inconsistent state between OPAL and FSP in code path for handling failure of fetching error log from FSP. Likely a bug in interaction between FSP and OPAL.",
> + "advice": "No system flash was found. Check for missing calls flash_register(...).",
> "compare_mode": "regex",
> - "label": "ElogFetchFailureInconsistent",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: Inconsistent internal list state !"
> + "label": "SystemFlashNotFound",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_HIGH",
> + "pattern": "FLASH: Can't load resource id:[0-9]+. No system flash found"
> },
> {
> - "advice": "Bug in interaction between FSP and OPAL. We expected there to be a pending read from FSP but the list was empty.",
> - "compare_mode": "regex",
> - "label": "ElogQueueInconsistent",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "HBRT triggered assert: you need to debug HBRT",
> + "compare_mode": "string",
> + "label": "HBRTassert",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: Inconsistent internal list state !"
> + "pattern": "HBRT: Assertion from hostservices"
> },
> {
> - "advice": "We expected there to be an entry in the list of error logs for the error log we're fetching information for. There wasn't. This means there's a bug.",
> - "compare_mode": "regex",
> - "label": "ElogInfoInconsistentState",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "Firmware should have aborted boot",
> + "compare_mode": "string",
> + "label": "HBRTlidLoadFail",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: Inconsistent internal list state !"
> + "pattern": "HBRT: LID Load failed"
> },
> {
> - "advice": "Inconsistent state while reading error log from FSP. Bug in OPAL and FSP interaction.",
> + "advice": "OPAL was called with a bad token. On POWER8 and earlier, Linux kernels had a bug where they wouldn't check if firmware supported particular OPAL calls before making them. It is, in fact, harmless for these cases. On systems newer than POWER8, this should never happen and indicates a kernel bug where OPAL_CHECK_TOKEN isn't being called where it should be.",
> "compare_mode": "regex",
> - "label": "ElogReadInconsistentState",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OPALBadToken",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: Inconsistent internal list state !"
> + "pattern": "OPAL: Called with bad token [0-9]+ !"
> },
> {
> - "advice": "Error in acknowledging heartbeat to FSP. This could mean the FSP has gone away or it may mean the FSP may kill us for missing too many heartbeats.",
> + "advice": "Currently removing a poller is DANGEROUS and MUST NOT be done in production firmware.",
> "compare_mode": "string",
> - "label": "FSPHeartbeatAckError",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "SURV: Heartbeat Acknowledgment error from FSP"
> - },
> - {
> - "advice": "Bug in interaction between FSP and OPAL. The state maintained by OPAL didn't match what the FSP sent.",
> - "compare_mode": "regex",
> - "label": "ElogListInconsistent",
> - "last_tag": "skiboot-5.6.0",
> + "label": "UnsupportedOPALdelpoller",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": ".*: Inconsistent internal list state !"
> + "pattern": "WARNING: Unsupported opal_del_poller. Interesting locking issues, don't call this."
> },
> {
> - "advice": "Bug in skiboot/FSP code for EPOW event handling",
> + "advice": "Recursion detected in opal_run_pollers(). This indicates a bug in OPAL where a poller ended up running pollers, which doesn't lead anywhere good.",
> "compare_mode": "string",
> - "label": "EPOWSignatureMismatch",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OPALPollerRecursion",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Signature mismatch"
> + "pattern": "OPAL: Poller recursion detected."
> },
> {
> - "advice": "Queueing a message from OPAL to FSP failed. This is likely due to either an OPAL bug or the FSP going away.",
> + "advice": "opal_run_pollers() was called with a lock held, which could lead to deadlock if not excessively lucky/careful.",
> "compare_mode": "string",
> - "label": "EPOWMessageQueueFailed",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OPALPollerWithLock",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OPAL EPOW message queuing failed"
> + "pattern": "Running pollers with lock held !"
> },
> {
> - "advice": "OPAL doesn't know that the BMC supports PNOR access commands. This will be a bug in the OPAL support for this BMC.",
> + "advice": "Your firmware is buggy, see the 64 messages complaining about opal_run_pollers with lock held.",
> "compare_mode": "string",
> - "label": "PNORAccessYeahButNoBut",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OPALPollerWithLock64",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "PNOR BUG: access requested but BMC doesn't support request"
> + "pattern": "opal_run_pollers with lock run 64 times, disabling warning."
> },
> {
> - "advice": "In negotiating PNOR access with BMC, we got an odd/invalid request from the BMC. Likely a bug in OPAL/BMC interaction.",
> + "advice": "OPAL will parse the Flattened Device Tree(FDT), which can be generated from different firmware sources. During expansion of FDT, OPAL observed a node assigned multiple times (a duplicate). This indicates either a Hostboot bug *OR*, more likely, a bug in the platform XML. Check the platform XML for duplicate IDs for this type of device. Because of this duplicate node, OPAL won't add the hardware device found with a duplicate node ID into DT, rendering the corresponding device not functional.",
> "compare_mode": "regex",
> - "label": "InvalidPNORAccessRequest",
> - "last_tag": "skiboot-5.6.0",
> + "label": "DTHasDuplicateNodeID",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "invalid PNOR access requested: [0-9a-f]+"
> + "pattern": "DT: Found duplicate node: .*"
> },
> {
> - "advice": "Likely bug in what sent us the OCC reset.",
> + "advice": "OPAL hit an assert(). During normal usage (even testing) we should never hit an assert. There are other code paths for controlled shutdown/panic in the event of catastrophic errors.",
> "compare_mode": "regex",
> - "label": "SELUnknownOCCReset",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FailedAssert",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "SEL message to reset an unknown OCC (sensor ID 0x[0-9a-f]+)"
> + "pattern": "Assert fail: .*"
> },
> {
> "advice": "The resource is not registered in the resource_map[] array, but it should be otherwise the resource cannot be measured if trusted mode is on.",
> "compare_mode": "regex",
> "label": "STBMeasureResourceNotMapped",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "STB: .* failed, resource [0-9]+ not mapped"
> },
> @@ -356,7 +220,7 @@
> "advice": "Null resource passed to tb_measure. This has come from the resource load framework and likely indicates a bug in the framework.",
> "compare_mode": "regex",
> "label": "STBNullResourceReceived",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "STB: .* failed: resource .*, buf null"
> },
> @@ -364,15 +228,23 @@
> "advice": "Unregistered resources can be verified, but not measured. The resource should be registered in the resource_map[] array, otherwise the resource cannot be measured if trusted mode is on.",
> "compare_mode": "regex",
> "label": "STBVerifyResourceNotMapped",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_HIGH",
> "pattern": "STB: verifying the non-expected resource [0-9]+"
> },
> {
> + "advice": "ibm,secureboot already registered. Check if rom_init called twice or the same driver is probed twice",
> + "compare_mode": "regex",
> + "label": "ROMAlreadyRegistered",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_HIGH",
> + "pattern": "ROM: .* driver already registered"
> + },
> + {
> "advice": "STB_DEBUG should not be enabled in production. PCR read operation failed. This TSS implementation is part of hostboot, but the source code is shared with skiboot. 1) The hostboot TSS may have been updated. 2) This may be caused by the short I2C timeout and can be fixed by increasing the timeout. Otherwise this indicates a bug in the TSS or the TPM device driver. Each one has local debug macros that can help.",
> "compare_mode": "regex",
> "label": "STBPCRReadFailed",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "STB: tpmCmdPcrRead() failed: tpm[0-9]+, alg=[0-9a-f]+, pcr[0-9]+, rc=[0-9]+"
> },
> @@ -380,7 +252,7 @@
> "advice": "TPM node already registered. The same node is being registered twice or there is a tpm node duplicate in the device tree",
> "compare_mode": "regex",
> "label": "TPMAlreadyRegistered",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_HIGH",
> "pattern": "TPM: tpm[0-9]+ already registered"
> },
> @@ -388,7 +260,7 @@
> "advice": "linux,sml-base property not found. This indicates a Hostboot bug if the property really doesn't exist in the tpm node.",
> "compare_mode": "regex",
> "label": "TPMSmlBaseNotFound",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "TPM: linux,sml-base property not found tpm node (0x[0-9a-f]+|nil)"
> },
> @@ -396,7 +268,7 @@
> "advice": "linux,sml-size property not found. This indicates a Hostboot bug if the property really doesn't exist in the tpm node.",
> "compare_mode": "regex",
> "label": "TPMSmlSizeNotFound",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "TPM: linux,sml-size property not found, tpm node (0x[0-9a-f]+|nil)"
> },
> @@ -404,7 +276,7 @@
> "advice": "Hostboot creates and adds entries to the event log. The failed init function is part of hostboot, but the source code is shared with skiboot. If the hostboot TpmLogMgr code (or friends) has been updated, the changes need to be applied to skiboot as well.",
> "compare_mode": "regex",
> "label": "TPMInitEventLogFailed",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "TPM: eventlog init failed: tpm[0-9]+ rc=[0-9]+"
> },
> @@ -412,7 +284,7 @@
> "advice": "TPM already initialized. Check if tpm is being initialized more than once.",
> "compare_mode": "string",
> "label": "TPMAlreadyInitialized",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_HIGH",
> "pattern": "TPM: tpm device(s) already initialized"
> },
> @@ -420,7 +292,7 @@
> "advice": "No TPM chip has been initialized. We may not have a compatible tpm driver or there is no tpm node in the device tree with the expected bindings.",
> "compare_mode": "string",
> "label": "NoTPMRegistered",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "TPM: no tpm chip registered"
> },
> @@ -428,7 +300,7 @@
> "advice": "TpmLogMgr failed to add a new event to the event log. TpmLogMgr is part of hostboot, but the source code is shared with skiboot. 1) The hostboot TpmLogMgr code may have been updated. 2) Check that max event log size was not reached and log marshall executed with no error. Enabling the trace routines in trustedbootUtils.H may help.",
> "compare_mode": "regex",
> "label": "STBAddEventFailed",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "TPM: .* -> elog[0-9]+ FAILED: pcr[0-9]+ et=[0-9a-f]+ rc=[0-9]+"
> },
> @@ -436,23 +308,23 @@
> "advice": "PCR extend operation failed. This TSS implementation is part of hostboot, but the source code is shared with skiboot. 1) The hostboot TSS may have been updated. 2) This may be caused by the short I2C timeout and can be fixed by increasing the timeout. Otherwise, this indicates a bug in the TSS or the TPM device driver. Each one has local debug macros that can help.",
> "compare_mode": "regex",
> "label": "STBPCRExtendFailed",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "TPM: .* -> tpm[0-9]+ FAILED: pcr[0-9]+ rc=[0-9]+"
> },
> {
> - "advice": "ibm,secureboot already registered. Check if rom_init called twice or the same driver is probed twice",
> + "advice": "Hostboot creates the ibm,secureboot node and the hash-algo property. Check that the ibm,secureboot node layout has not changed.",
> "compare_mode": "regex",
> - "label": "ROMAlreadyRegistered",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_HIGH",
> - "pattern": "ROM: .* driver already registered"
> + "label": "ROMHashAlgorithmInvalid",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "ROM: hash-algo=.* not expected"
> },
> {
> "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> "compare_mode": "regex",
> "label": "TPMReadCmdReady",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: fail to read sts.commandReady, rc=[0-9]+"
> },
> @@ -460,7 +332,7 @@
> "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> "compare_mode": "regex",
> "label": "TPMWriteCmdReady",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: fail to write sts.commandReady, rc=[0-9]+"
> },
> @@ -468,7 +340,7 @@
> "advice": "The command ready bit of the tpm status register is taking longer to be settled. Either the wait time need to be increased or the TPM device is not functional.",
> "compare_mode": "regex",
> "label": "TPMCmdReadyTimeout",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: timeout on sts.commandReady, delay > [0-9]+"
> },
> @@ -476,7 +348,7 @@
> "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> "compare_mode": "regex",
> "label": "TPMReadFifoStatus",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: fail to read fifo status: mask [0-9a-f]+, expected [0-9a-f]+, rc=[0-9]+"
> },
> @@ -484,7 +356,7 @@
> "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> "compare_mode": "regex",
> "label": "TPMReadDataAvail",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: fail to read sts.dataAvail, rc=[0-9]+"
> },
> @@ -492,7 +364,7 @@
> "advice": "The data avail bit of the tpm status register is taking longer to be settled. Either the wait time need to be increased or the TPM device is not functional.",
> "compare_mode": "regex",
> "label": "TPMDataAvailBitTimeout",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: timeout on sts.dataAvail, delay=[0-9]+/[0-9]+"
> },
> @@ -500,7 +372,7 @@
> "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> "compare_mode": "regex",
> "label": "TPMReadBurstCount",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: fail to read sts.burstCount, rc=[0-9]+"
> },
> @@ -508,7 +380,7 @@
> "advice": "The burstcount bit of the tpm status register is taking longer to be settled. Either the wait time need to be increased or the TPM device is not functional.",
> "compare_mode": "regex",
> "label": "TPMBurstCountTimeout",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: timeout on sts.burstCount, delay=[0-9]+/[0-9]+"
> },
> @@ -516,7 +388,7 @@
> "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> "compare_mode": "regex",
> "label": "TPMWriteFifo",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: fail to write fifo, count=[0-9]+, rc=[0-9]+"
> },
> @@ -524,7 +396,7 @@
> "advice": "The write to the TPM FIFO overflowed, the TPM is not expecting more data. This indicates a bug in the TPM device driver.",
> "compare_mode": "string",
> "label": "TPMWriteFifoNotExpecting",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: write FIFO overflow, not expecting more data"
> },
> @@ -532,7 +404,7 @@
> "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> "compare_mode": "regex",
> "label": "TPMWriteFifoLastByte",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "NUVOTON: fail to write fifo (last byte), count=[0-9]+, rc=[0-9]+"
> },
> @@ -540,289 +412,417 @@
> "advice": "The write to the TPM FIFO overflowed. It is expecting more data even though we think we are done. This indicates a bug in the TPM device driver.",
> "compare_mode": "string",
> "label": "TPMWriteFifoExpecting",
> - "last_tag": "skiboot-5.6.0",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> "pattern": "TPM: write FIFO overflow, expecting more data"
> },
> {
> - "advice": "The read from TPM FIFO overflowed. It is expecting more data even though we think we are done. This indicates a bug in the TPM device driver.",
> - "compare_mode": "regex",
> - "label": "TPMReadFifoOverflow",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "The read from TPM FIFO overflowed. It is expecting more data even though we think we are done. This indicates a bug in the TPM device driver.",
> + "compare_mode": "regex",
> + "label": "TPMReadFifoOverflow",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "NUVOTON: overflow on fifo read, c=[0-9]+, bc=[0-9]+, bl=[0-9]+"
> + },
> + {
> + "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> + "compare_mode": "regex",
> + "label": "TPMReadFifo",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "NUVOTON: fail to read fifo, count=[0-9]+, rc=[0-9]+"
> + },
> + {
> + "advice": "TPM device is not initialized. This indicates a bug in the tpm_transmit() caller",
> + "compare_mode": "string",
> + "label": "TPMDeviceNotInitialized",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "TPM: tpm device not initialized"
> + },
> + {
> + "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> + "compare_mode": "regex",
> + "label": "TPMWriteGo",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "NUVOTON: fail to write sts.go, rc=[0-9]+"
> + },
> + {
> + "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> + "compare_mode": "regex",
> + "label": "TPMReleaseTpm",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "NUVOTON: fail to release tpm, rc=[0-9]+"
> + },
> + {
> + "advice": "tpm_i2c_request_send was passed an invalid bus ID. This indicates a tb_init() bug.",
> + "compare_mode": "regex",
> + "label": "TPMI2CInvalidBusID",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "TPM: Invalid bus_id=[0-9a-f]+"
> + },
> + {
> + "advice": "OPAL failed to allocate memory for an i2c_request. This points to an OPAL bug as OPAL run out of memory and this should never happen.",
> + "compare_mode": "string",
> + "label": "TPMI2CAllocationFailed",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "TPM: i2c_alloc_req failed"
> + },
> + {
> + "advice": "Hostboot creates the ibm,secureboot node and the hash-algo property. Check that the ibm,secureboot node layout has not changed.",
> + "compare_mode": "regex",
> + "label": "ROMHashAlgorithmInvalid",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "ROM: hash-algo=.* not expected"
> + },
> + {
> + "advice": "NVLink not functional",
> + "compare_mode": "regex",
> + "label": "NPUisnInvalid",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "NPU[0-9]+: isn 0x[0-9a-f]+ not valid for this NPU"
> + },
> + {
> + "advice": "Firmware probably ran out of memory creating NPU slot. NVLink functionality could be broken.",
> + "compare_mode": "string",
> + "label": "NPUCannotCreatePHBSlot",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "NPU: Cannot create PHB slot"
> + },
> + {
> + "advice": "The HOMER base address for a chip was not valid. This means that OCC (On Chip Controller) will be non-functional and CPU frequency scaling will not be functional. CPU may be set to a safe, low frequency. Power savings in CPU idle or CPU hotplug may be impacted.",
> + "compare_mode": "regex",
> + "label": "OCCInvalidHomerBase",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "OCC: Chip: [0-9a-f]+ homer_base is not valid"
> + },
> + {
> + "advice": "The pstate table for a chip was not valid. This means that OCC (On Chip Controller) will be non-functional and CPU frequency scaling will not be functional. CPU may be set to a low, safe frequency. This means that CPU idle states and CPU frequency scaling may not be functional.",
> + "compare_mode": "regex",
> + "label": "OCCInvalidPStateTable",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "OCC: Chip: [0-9a-f]+ PState table is not valid"
> + },
> + {
> + "advice": "The pstate table for the first chip was not valid. This means that OCC (On Chip Controller) will be non-functional. This means that CPU idle states and CPU frequency scaling will not be functional as OPAL doesn't populate the device tree with pstates in this case.",
> + "compare_mode": "string",
> + "label": "OCCInvalidPStateTableDT",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "NUVOTON: overflow on fifo read, c=[0-9]+, bc=[0-9]+, bl=[0-9]+"
> + "pattern": "OCC: PState table is not valid"
> },
> {
> - "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> + "advice": "The PState table layout version is not supported in P9. So OPAL will not parse the PState table. CPU frequency scaling will not be functional as frequency and pstate-ids are not added to DT.",
> "compare_mode": "regex",
> - "label": "TPMReadFifo",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OCCInvalidVersion02",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "NUVOTON: fail to read fifo, count=[0-9]+, rc=[0-9]+"
> + "pattern": "OCC: Version [0-9a-f]+ is not supported in P9"
> },
> {
> - "advice": "TPM device is not initialized. This indicates a bug in the tpm_transmit() caller",
> - "compare_mode": "string",
> - "label": "TPMDeviceNotInitialized",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "The PState table layout version is not supported in P8. So OPAL will not parse the PState table. CPU frequency scaling will not be functional as frequency and pstate-ids are not added to DT.",
> + "compare_mode": "regex",
> + "label": "OCCInvalidVersion90",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "TPM: tpm device not initialized"
> + "pattern": "OCC: Version [0-9a-f]+ is not supported in P8"
> },
> {
> - "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> + "advice": "The PState table layout version is not supported. So OPAL will not parse the PState table. CPU frequency scaling will not be functional as OPAL doesn't populate the device tree with pstates.",
> "compare_mode": "regex",
> - "label": "TPMWriteGo",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OCCUnsupportedVersion",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "NUVOTON: fail to write sts.go, rc=[0-9]+"
> + "pattern": "OCC: Unsupported pstate table layout version [0-9]+"
> },
> {
> - "advice": "Either the tpm device or the tpm-i2c interface doesn't seem to be working properly. Check the return code (rc) for further details.",
> + "advice": "The min pstate is greater than the max pstate, this could be due to corrupted/invalid data in OCC-OPAL shared memory region. So OPAL has not added pstates to device tree. This means that CPU Frequency management will not be functional in the host.",
> "compare_mode": "regex",
> - "label": "TPMReleaseTpm",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OCCInvalidPStateLimits",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "NUVOTON: fail to release tpm, rc=[0-9]+"
> + "pattern": "OCC: Invalid pstate limits. Pmin([0-9]+) > Pmax ([0-9]+)"
> },
> {
> - "advice": "Hostboot creates the ibm,secureboot node and the hash-algo property. Check that the ibm,secureboot node layout has not changed.",
> + "advice": "The nominal pstate is greater than the max pstate, this could be due to corrupted/invalid data in OCC-OPAL shared memory region. So OPAL has limited the nominal pstate to max pstate.",
> "compare_mode": "regex",
> - "label": "ROMHashAlgorithmInvalid",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OCCInvalidNominalPState",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "ROM: hash-algo=.* not expected"
> + "pattern": "OCC: Clipping nominal pstate([0-9]+) to Pmax([0-9]+)"
> },
> {
> - "advice": "tpm_i2c_request_send was passed an invalid bus ID. This indicates a tb_init() bug.",
> + "advice": "The number of pstates is outside the valid range (currently <=1 or > 128), so OPAL has not added pstates to the device tree. This means that OCC (On Chip Controller) will be non-functional. This means that CPU idle states and CPU frequency scaling will not be functional.",
> "compare_mode": "regex",
> - "label": "TPMI2CInvalidBusID",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OCCInvalidPStateRange",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "TPM: Invalid bus_id=[0-9a-f]+"
> + "pattern": "OCC: OCC range is not valid; No of pstates = [0-9]+"
> },
> {
> - "advice": "OPAL failed to allocate memory for an i2c_request. This points to an OPAL bug as OPAL run out of memory and this should never happen.",
> + "advice": "Device tree node /ibm,opal/power-mgt not found. OPAL didn't add pstate information to device tree. Probably a firmware bug.",
> "compare_mode": "string",
> - "label": "TPMI2CAllocationFailed",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OCCDTNodeNotFound",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "TPM: i2c_alloc_req failed"
> + "pattern": "OCC: dt node /ibm,opal/power-mgt not found"
> },
> {
> - "advice": "Hostboot creates the ibm,secureboot node and the hash-algo property. Check that the ibm,secureboot node layout has not changed.",
> + "advice": "Failed to create /ibm,opal/power-mgt/occ. Per-chip pstate properties are not added to Device Tree.",
> "compare_mode": "regex",
> - "label": "ROMHashAlgorithmInvalid",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OCCDTFailedNodeCreation",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "ROM: hash-algo=.* not expected"
> + "pattern": "OCC: Failed to create /ibm,opal/power-mgt/occ@[0-9a-f]+"
> },
> {
> - "advice": "opal_i2c_request was passed an invalid bus ID. This has likely come from the OS rather than OPAL and thus could indicate an OS bug rather than an OPAL bug.",
> + "advice": "ENOMEM while allocating OCC load message. OCCs not started, consequently no power/frequency scaling will be functional.",
> "compare_mode": "string",
> - "label": "I2CInvalidBusID",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OCCload_reqENOMEM",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "I2C: Invalid 'bus_id' passed to the OPAL"
> + "pattern": "OCC: Could not allocate occ_load_req"
> },
> {
> - "advice": "OPAL failed to allocate memory for an i2c_request. This points to an OPAL bug as OPAL ran out of memory and this should never happen.",
> - "compare_mode": "string",
> - "label": "I2CFailedAllocation",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "Invalid request for loading OCCs. Power and frequency management not functional",
> + "compare_mode": "regex",
> + "label": "OCCLoadInvalidScope",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "I2C: Failed to allocate 'i2c_request'"
> + "pattern": "OCC: Load message with invalid scope 0x[0-9a-f]+"
> },
> {
> - "advice": "HBRT triggered assert: you need to debug HBRT",
> - "compare_mode": "string",
> - "label": "HBRTassert",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "Invalid request for resetting OCCs. Power and frequency management not functional",
> + "compare_mode": "regex",
> + "label": "OCCResetInvalidScope",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "HBRT: Assertion from hostservices"
> + "pattern": "OCC: Reset message with invalid scope 0x[0-9a-f]+"
> },
> {
> - "advice": "Firmware should have aborted boot",
> + "advice": "OPAL failed to add the power-mgt device tree node. This could mean that firmware ran out of memory, or there's a bug somewhere.",
> "compare_mode": "string",
> - "label": "HBRTlidLoadFail",
> - "last_tag": "skiboot-5.6.0",
> + "label": "CreateDTPowerMgtNodeFail",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "HBRT: LID Load failed"
> + "pattern": "creating dt node /ibm,opal/power-mgt failed"
> },
> {
> - "advice": "You are running in manufacturing mode. This mode should only be enabled in a factory during manufacturing.",
> + "advice": "An error condition occurred in sleep/winkle engines timer state machine. Dumping debug information to root-cause. OPAL/skiboot may be stuck on some operation that requires SLW timer state machine (e.g. core powersaving)",
> "compare_mode": "string",
> - "label": "ManufacturingMode",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_MEDIUM",
> - "pattern": "PLAT: Manufacturing mode ON"
> + "label": "SLWRegisterDump",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_LOW",
> + "pattern": "SLW: Register state:"
> },
> {
> - "advice": "OPAL could not find an NVRAM partition on the system flash. Check that the system flash has a valid partition table, and that the firmware build process has added a NVRAM partition.",
> + "advice": "The SLeep/Winkle Engine (SLW) failed to increment the generation number within our timeout period (it *should* have done so within ~10us, not >1ms. OPAL uses the SLW timer to schedule some operations, but can fall back to the (much less frequent OPAL poller, which although does not affect functionality, runs *much* less frequently. This could have the effect of slow I2C operations (for example). It may also mean that you *had* an increase in jitter, due to slow interactions with SLW. This error may also occur if the machine is connected to via soft FSI.",
> "compare_mode": "string",
> - "label": "NVRAMNoPartition",
> - "last_tag": "skiboot-5.6.0",
> + "label": "SLWTimerStuck",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "FLASH: Can't parse ffs info for NVRAM"
> + "pattern": "SLW: timer stuck, falling back to OPAL pollers. You will likely have slower I2C and may have experienced increased jitter."
> },
> {
> - "advice": "OPAL Found multiple system flash. Since we've already found a system flash we are going to use that one but this ordering is not guaranteed so may change in future.",
> + "advice": "OPAL marked a Centaur (memory buffer) as offline due to CENTAUR_ERR_OFFLINE_THRESHOLD (10) consecutive errors on XSCOMs to this centaur. OPAL will now return OPAL_XSCOM_CTR_OFFLINED and not try any further XSCOMs. This is likely caused by some hardware issue or PRD recovery issue.",
> "compare_mode": "regex",
> - "label": "SystemFlashMultiple",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_HIGH",
> - "pattern": "FLASH: Attempted to register multiple system flash: .*"
> + "label": "CentaurOfflinedTooManyErrors",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "CENTAUR: Offlined [0-9a-f]+ due to > [0-9]+ consecutive XSCOM errors. No more XSCOMs to this centaur."
> },
> {
> - "advice": "System flash isn't formatted as expected. This could mean several OPAL utilities do not function as expected. e.g. gard, pflash.",
> + "advice": "No GPU/NPU2 slot information was found. NVLink2 functionality will not work.",
> "compare_mode": "string",
> - "label": "NoFFS",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_HIGH",
> - "pattern": "FLASH: No ffs info; using raw device only"
> + "label": "NPUNoPHBSlotLabel",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "NPU2: Cannot find GPU slot information"
> },
> {
> - "advice": "No system flash was found. Check for missing calls flash_register(...).",
> + "advice": "Firmware probably ran out of memory creating NPU2 slot. NVLink functionality could be broken.",
> + "compare_mode": "string",
> + "label": "NPUCannotCreatePHBSlot",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "NPU2: Cannot create PHB slot"
> + },
> + {
> + "advice": "xscom_read was called with an invalid partid. There's likely a bug somewhere in the stack that's causing someone to try an xscom_read on something that isn't a processor, Centaur or EX chiplet.",
> "compare_mode": "regex",
> - "label": "SystemFlashNotFound",
> - "last_tag": "skiboot-5.6.0",
> - "log_level": "LOG_LEVEL_HIGH",
> - "pattern": "FLASH: Can't load resource id:[0-9]+. No system flash found"
> + "label": "XSCOMReadInvalidPartID",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": ".*: invalid XSCOM partid 0x[0-9a-f]+"
> },
> {
> - "advice": "OPAL was called with a bad token. On POWER8 and earlier, Linux kernels had a bug where they wouldn't check if firmware supported particular OPAL calls before making them. It is, in fact, harmless for these cases. On systems newer than POWER8, this should never happen and indicates a kernel bug where OPAL_CHECK_TOKEN isn't being called where it should be.",
> + "advice": "xscom_write was called with an invalid partid. There's likely a bug somewhere in the stack that's causing someone to try an xscom_write on something that isn't a processor, Centaur or EX chiplet.",
> "compare_mode": "regex",
> - "label": "OPALBadToken",
> - "last_tag": "skiboot-5.6.0",
> + "label": "XSCOMWriteInvalidPartID",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OPAL: Called with bad token [0-9]+ !"
> + "pattern": ".*: invalid XSCOM partid 0x[0-9a-f]+"
> },
> {
> - "advice": "Currently removing a poller is DANGEROUS and MUST NOT be done in production firmware.",
> + "advice": "OPAL doesn't know that the BMC supports PNOR access commands. This will be a bug in the OPAL support for this BMC.",
> "compare_mode": "string",
> - "label": "UnsupportedOPALdelpoller",
> - "last_tag": "skiboot-5.6.0",
> + "label": "PNORAccessYeahButNoBut",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "WARNING: Unsupported opal_del_poller. Interesting locking issues, don't call this."
> + "pattern": "PNOR BUG: access requested but BMC doesn't support request"
> },
> {
> - "advice": "Recursion detected in opal_run_pollers(). This indicates a bug in OPAL where a poller ended up running pollers, which doesn't lead anywhere good.",
> - "compare_mode": "string",
> - "label": "OPALPollerRecursion",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "In negotiating PNOR access with BMC, we got an odd/invalid request from the BMC. Likely a bug in OPAL/BMC interaction.",
> + "compare_mode": "regex",
> + "label": "InvalidPNORAccessRequest",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "OPAL: Poller recursion detected."
> + "pattern": "invalid PNOR access requested: [0-9a-f]+"
> },
> {
> - "advice": "opal_run_pollers() was called with a lock held, which could lead to deadlock if not excessively lucky/careful.",
> - "compare_mode": "string",
> - "label": "OPALPollerWithLock",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "Likely bug in what sent us the OCC reset.",
> + "compare_mode": "regex",
> + "label": "SELUnknownOCCReset",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Running pollers with lock held !"
> + "pattern": "SEL message to reset an unknown OCC (sensor ID 0x[0-9a-f]+)"
> },
> {
> - "advice": "Your firmware is buggy, see the 64 messages complaining about opal_run_pollers with lock held.",
> - "compare_mode": "string",
> - "label": "OPALPollerWithLock64",
> - "last_tag": "skiboot-5.6.0",
> + "advice": "Inconsistent state between OPAL and FSP in code path for handling failure of fetching error log from FSP. Likely a bug in interaction between FSP and OPAL.",
> + "compare_mode": "regex",
> + "label": "ElogFetchFailureInconsistent",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "opal_run_pollers with lock run 64 times, disabling warning."
> + "pattern": ".*: Inconsistent internal list state !"
> },
> {
> - "advice": "OPAL will parse the Flattened Device Tree(FDT), which can be generated from different firmware sources. During expansion of FDT, OPAL observed a node assigned multiple times (a duplicate). This indicates either a Hostboot bug *OR*, more likely, a bug in the platform XML. Check the platform XML for duplicate IDs for this type of device. Because of this duplicate node, OPAL won't add the hardware device found with a duplicate node ID into DT, rendering the corresponding device not functional.",
> + "advice": "Bug in interaction between FSP and OPAL. We expected there to be a pending read from FSP but the list was empty.",
> "compare_mode": "regex",
> - "label": "DTHasDuplicateNodeID",
> - "last_tag": "skiboot-5.6.0",
> + "label": "ElogQueueInconsistent",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "DT: Found duplicate node: .*"
> + "pattern": ".*: Inconsistent internal list state !"
> },
> {
> - "advice": "OPAL hit an assert(). During normal usage (even testing) we should never hit an assert. There are other code paths for controlled shutdown/panic in the event of catastrophic errors.",
> + "advice": "We expected there to be an entry in the list of error logs for the error log we're fetching information for. There wasn't. This means there's a bug.",
> "compare_mode": "regex",
> - "label": "FailedAssert",
> - "last_tag": "skiboot-5.6.0",
> + "label": "ElogInfoInconsistentState",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Assert fail: .*"
> + "pattern": ".*: Inconsistent internal list state !"
> },
> {
> - "advice": "Device tree didn't contain enough information to correctly report back PCI inventory. Service processor is likely to be missing information about what hardware is physically present in the machine.",
> + "advice": "Inconsistent state while reading error log from FSP. Bug in OPAL and FSP interaction.",
> "compare_mode": "regex",
> - "label": "FirenzePCIInventory",
> - "last_tag": "skiboot-5.6.0",
> + "label": "ElogReadInconsistentState",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "No chip device node for PHB[0-9a-f]+"
> + "pattern": ".*: Inconsistent internal list state !"
> },
> {
> - "advice": "More PCI inventory than we can send to service processor. The service processor will have an incomplete view of the world.",
> + "advice": "Error in acknowledging heartbeat to FSP. This could mean the FSP has gone away or it may mean the FSP may kill us for missing too many heartbeats.",
> + "compare_mode": "string",
> + "label": "FSPHeartbeatAckError",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "SURV: Heartbeat Acknowledgment error from FSP"
> + },
> + {
> + "advice": "Failed to update System Attention Indicator. Likely means some bug with OPAL interacting with FSP.",
> "compare_mode": "regex",
> - "label": "FirenzePCIInventoryTooLarge",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FSPSAIFailed",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Inventory ([0-9]+ bytes) too large"
> + "pattern": "Update SAI cmd failed [rc=[0-9]+]."
> },
> {
> - "advice": "Error communicating with service processor when sending PCI Inventory.",
> + "advice": "OPAL ran out of memory while trying to allocate an FSP message in SAI code path. This indicates an OPAL bug that caused OPAL to run out of memory.",
> "compare_mode": "regex",
> - "label": "FirenzePCIInventoryError",
> - "last_tag": "skiboot-5.6.0",
> + "label": "SAIMallocFail",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Error [0-9]+ sending inventory"
> + "pattern": ".*: Memory allocation failed."
> },
> {
> - "advice": "On Firenze platforms, I2C is used to control power to PCI slots. Errors here mean we may be in trouble in regards to PCI slot power on/off.",
> + "advice": "Error in queueing message to FSP in SAI code path. Likely an OPAL bug.",
> "compare_mode": "regex",
> - "label": "FirenzePCII2CError",
> - "last_tag": "skiboot-5.6.0",
> + "label": "SAIQueueFail",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Error [0-9]+ from I2C request on slot [0-9a-f]+"
> + "pattern": ".*: Failed to queue the message"
> },
> {
> - "advice": "Likely a coding error: invalid I2C request.",
> + "advice": "Possibly an error on FSP side, OPAL failed to read state from FSP.",
> "compare_mode": "regex",
> - "label": "FirenzePCII2CInvalid",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FSPSAIGetFailed",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Invalid I2C request [0-9]+ on slot [0-9a-f]+"
> + "pattern": "Read real SAI cmd failed [rc = 0x[0-9a-f]+]."
> },
> {
> - "advice": "The Firenze platform uses I2C to control power to PCI slots. Something went wrong in the state machine controlling that. Slots may/may not have power.",
> + "advice": "OPAL ran out of memory: OPAL bug.",
> "compare_mode": "regex",
> - "label": "FirenzePCISlotI2CStateError",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FSPGetSAIMallocFail",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "Wrong state [0-9a-f]+ on slot [0-9a-f]+"
> + "pattern": ".*: Memory allocation failed."
> },
> {
> - "advice": "Unexpected state in the FIRENZE PCI Slot state machine. This could mean PCI is not functioning correctly.",
> + "advice": "Failed to queue message to FSP: OPAL bug",
> "compare_mode": "regex",
> - "label": "FirenzePCISlotGPowerState",
> - "last_tag": "skiboot-5.6.0",
> + "label": "FSPGetSAIQueueFail",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "[0-9a-f]+ GPOWER: Unexpected state [0-9a-f]+"
> + "pattern": ".*: Failed to queue the message"
> },
> {
> - "advice": "Unexpected state in the FIRENZE PCI Slot state machine. This could mean PCI is not functioning correctly.",
> + "advice": "OPAL failed to allocate memory for FSP LED command. Likely an OPAL bug led to out of memory.",
> + "compare_mode": "string",
> + "label": "FSPLEDRequestMallocFail",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "SPCN set command node allocation failed"
> + },
> + {
> + "advice": "OPAL had trouble creating the sensor nodes in the device tree as there was already one there. This indicates either the device tree from Hostboot already filled in sensors or an OPAL bug.",
> "compare_mode": "regex",
> - "label": "FirenzePCISlotSPowerState",
> - "last_tag": "skiboot-5.6.0",
> + "label": "OPALSensorNodeExists",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "[0-9a-f]+ SPOWER: Unexpected state [0-9a-f]+"
> + "pattern": "SENSOR: node .* exists"
> },
> {
> - "advice": "AST BMC is likely to be non-functional when accessed from host.",
> + "advice": "Bug in skiboot/FSP code for EPOW event handling",
> "compare_mode": "string",
> - "label": "ASTBMCFailedSetEnables",
> - "last_tag": "skiboot-5.6.0",
> + "label": "EPOWSignatureMismatch",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "ASTBMC: failed to set enables"
> + "pattern": "Signature mismatch"
> },
> {
> - "advice": "HDAT described an invalid size for timebase, which means there's a disagreement between HDAT and OPAL. This is most certainly a firmware bug.",
> + "advice": "Queueing a message from OPAL to FSP failed. This is likely due to either an OPAL bug or the FSP going away.",
> + "compare_mode": "string",
> + "label": "EPOWMessageQueueFailed",
> + "last_tag": "v5.8",
> + "log_level": "LOG_LEVEL_CRITICAL",
> + "pattern": "OPAL EPOW message queuing failed"
> + },
> + {
> + "advice": "Bug in interaction between FSP and OPAL. The state maintained by OPAL didn't match what the FSP sent.",
> "compare_mode": "regex",
> - "label": "HDATBadTimebaseSize",
> - "last_tag": "skiboot-5.6.0",
> + "label": "ElogListInconsistent",
> + "last_tag": "v5.8",
> "log_level": "LOG_LEVEL_CRITICAL",
> - "pattern": "HDAT: Bad timebase size [0-9]+ @ (0x[0-9a-f]+|nil)"
> + "pattern": ".*: Inconsistent internal list state !"
> }
> ]
> }
>
Acked-by: Alex Hung <alex.hung at canonical.com>
More information about the fwts-devel
mailing list