What should be done when a NuoDB storage manager (SM) fails to start? What steps should be taken when starting to avoid corruption?
In most cases, restarting the engine is sufficient to recover from an SM crash. However, certain situations can leave the underlying archive (data) in an invalid state. This is referred to as "corruption". NuoDB supplies the nuoarchive tool, which can validate and attempt to repair database archives.
If an SM fails to start, the first step should be to run this command in dry run mode, pointing to the archive directory:
nuoarchive check --dry-run /var/opt/nuodb/production-archives/test
The above command prints every problem it finds and indicates which of them can be fixed. If the issues are reported as fixable, run the nuoarchive tool with the --repair flag:
nuoarchive check --repair /var/opt/nuodb/production-archives/test
In production, a large database can cause nuoarchive to take longer to process the archive data. More memory can be allocated with the --mem flag. If the journal directory is in a separate location, it can be specified with the --journal-dir flag.
More information can be found in the NuoDB documentation for nuoarchive.
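As a rough illustration of how the optional flags combine with the check command, the sketch below assembles the command line and prints it for review instead of running it. The function name and the NUOARCHIVE_MEM and NUOARCHIVE_JOURNAL_DIR variables are hypothetical, and passing each flag's value as a separate argument is an assumption; confirm the exact syntax and the accepted format of the --mem value against the nuoarchive help for your version.

```shell
# Sketch only: prints a nuoarchive command line so it can be reviewed before use.
# NUOARCHIVE_MEM and NUOARCHIVE_JOURNAL_DIR are hypothetical variable names.
# Assumption: --mem and --journal-dir take their value as the next argument.
build_nuoarchive_cmd() {
    mode="$1"       # --dry-run or --repair
    archive="$2"    # path to the archive directory

    # Start from the base command shown in this article.
    set -- nuoarchive check "$mode"

    # Add the optional flags only when a value has been supplied.
    if [ -n "${NUOARCHIVE_MEM:-}" ]; then
        set -- "$@" --mem "$NUOARCHIVE_MEM"
    fi
    if [ -n "${NUOARCHIVE_JOURNAL_DIR:-}" ]; then
        set -- "$@" --journal-dir "$NUOARCHIVE_JOURNAL_DIR"
    fi

    # The archive directory goes last, as in the commands above.
    set -- "$@" "$archive"
    echo "$*"
}
```

Calling build_nuoarchive_cmd --dry-run /var/opt/nuodb/production-archives/test with neither variable set prints the same dry-run command shown earlier.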
If nuoarchive is unable to repair the archive, analyze the archives of the other SMs in the same way. If none of them is valid or repairable, it's necessary to restore from a valid backup.
It's recommended that all backups be checked with nuoarchive after being taken to ensure validity.
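One way to make this check routine is a small helper that runs the dry-run check against each backup directory. This is a minimal sketch, not a NuoDB-supplied script: the function name and the NUOARCHIVE override variable are hypothetical, and it assumes nuoarchive exits with a non-zero status when it finds problems, which should be confirmed for your version before relying on it.

```shell
# Sketch only: dry-run check of each backup directory passed as an argument.
# Assumption: nuoarchive returns a non-zero exit status when problems are found.
# NUOARCHIVE is a hypothetical override for the path to the nuoarchive binary.
NUOARCHIVE="${NUOARCHIVE:-nuoarchive}"

check_backups() {
    failed=0
    for dir in "$@"; do
        # --dry-run reports problems without modifying the backup.
        if "$NUOARCHIVE" check --dry-run "$dir"; then
            echo "OK: $dir"
        else
            echo "NEEDS REVIEW: $dir"
            failed=$((failed + 1))
        fi
    done
    # Succeed only if every backup passed the check.
    [ "$failed" -eq 0 ]
}
```

The directories to check are supplied by the caller, for example check_backups followed by one or more backup paths.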