
docker start ... after unmanaged shutdown

I have written a Java library that creates a NuoDB database programmatically for testing purposes. I already use it to manage docker containers for Postgres, MySQL, SQL Server etc., and now NuoDB.

The library is there to assist automated testing on CI servers and locally for developers. Developers can just run their tests from the IDE or the command line (maven/gradle), and the library automatically starts/runs docker containers as necessary and sets up the database and schema, ready for the tests to run.

For CI servers we shut down and remove the docker containers.

For local development we keep the docker containers running (the library only starts/runs the docker containers if they are not already running).

I think I have it working well now with NuoDB apart from the case where shutdown is unmanaged.  A nice shutdown of the containers means that the library issues a `nuocmd shutdown database` command.

An unmanaged shutdown is when the developer just shuts down their laptop, or we just do `docker stop nuo_admin nuo_sm nuo_tm` (so we stop the containers in a fairly brutal, unordered way).

When we bring the docker containers back up via docker start we first start the admin container (no problem) ... then start the storage manager container ... then the transaction manager.
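For reference, the ordered restart described above can be sketched roughly as follows. This is only an illustration and not the actual library code: the class and method names (`OrderedDockerStart`, `startCommands`, `startAll`) are hypothetical, and the container names are the ones from the `docker stop` example.

```java
import java.io.IOException;
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper for illustration; not the actual library code. */
final class OrderedDockerStart {

    /** Builds one "docker start <name>" command per container, preserving the given order. */
    static List<List<String>> startCommands(List<String> containersInOrder) {
        List<List<String>> commands = new ArrayList<>();
        for (String name : containersInOrder) {
            commands.add(List.of("docker", "start", name));
        }
        return commands;
    }

    /** Runs the commands one at a time and stops at the first non-zero exit code. */
    static void startAll(List<String> containersInOrder)
            throws IOException, InterruptedException {
        for (List<String> command : startCommands(containersInOrder)) {
            Process process = new ProcessBuilder(command).inheritIO().start();
            int exitCode = process.waitFor();
            if (exitCode != 0) {
                throw new IOException(
                        String.join(" ", command) + " exited with " + exitCode);
            }
        }
    }
}
```

Calling `OrderedDockerStart.startAll(List.of("nuo_admin", "nuo_sm", "nuo_tm"))` would start the admin container first, then the storage manager, then the transaction manager. Note that `docker start` returning only means the container was started, not that the process inside it is ready, so this sketch does not wait for readiness between steps.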

Sometimes the transaction manager and storage manager do not come up properly, and when I look in the logs it looks like they have received a request to shut down:

AdminService::shutdownServer received shutdown request

It is as if the AD service didn't get to fire out the shutdown request (when all 3 docker containers were brought down brutally) ... but then sent the shutdown request later, after the docker start (when the commands are trying to start the database).

 

SM Logs:

2019-08-29T10:58:28.533+0000 [38] (testdb node 1) Change in inter-node network detected by node 1 (SM), node list now: 1 (SM), 2 (TE) (node registered: 2 source node: 1)
2019-08-29T10:59:06.416+0000 [38] (testdb node 1) Listener for node 2 -> 1 failed: connection closed by 172.23.0.4:38888
2019-08-29T10:59:06.416+0000 [38] (testdb node 1) Connection to node id 2 (TE, 172.23.0.4:48006) closed: connection closed by 172.23.0.4:38888
2019-08-29T10:59:06.416+0000 [38] (testdb node 1) Connection from node 172.23.0.4 died: connection closed by 172.23.0.4:38888
2019-08-29T10:59:06.416+0000 [38] (testdb node 1) Unsent queue size to node 2 on shutdown is 0, buffer size is 117
2019-08-29T10:59:06.517+0000 [38] (testdb node 1) Change in inter-node network detected by node 1 (SM), node list now: 1 (SM) (node de-registered: 2)
2019-08-29T10:59:06.720+0000 [38] (testdb node 1) Lost connection to local agent
2019-08-29T10:59:16.873+0000 [38] (testdb node 1) AdminService::shutdownServer received shutdown request

...

 

TE Logs:

2019-08-29T10:59:06.289+0000 [38] (testdb node 2) AdminService::shutdownServer received shutdown request
2019-08-29T10:59:06.289+0000 [38] (testdb node 2) Node state transition from Running to ShuttingDown
2019-08-29T10:59:06.416+0000 [38] (testdb node 2) Connection from node nuodb_sm.nuodb-net died: Shutdown in progress
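One thing the library could do after a restart is scan the container output for this message so the failure is at least detected automatically. A minimal sketch, assuming the log lines have already been collected into a list (for example from `docker logs`, which is my assumption about how they would be fetched); the class name `ShutdownLogCheck` is hypothetical and the marker string is copied from the logs above.

```java
import java.util.List;

/** Hypothetical helper for illustration; not the actual library code. */
final class ShutdownLogCheck {

    /** Marker text copied verbatim from the SM and TE logs. */
    static final String SHUTDOWN_MARKER =
            "AdminService::shutdownServer received shutdown request";

    /** Returns true if any of the given log lines contains the shutdown marker. */
    static boolean receivedShutdownRequest(List<String> logLines) {
        for (String line : logLines) {
            if (line.contains(SHUTDOWN_MARKER)) {
                return true;
            }
        }
        return false;
    }
}
```

This only detects the symptom. It would need to look only at lines logged since the latest `docker start`, otherwise a shutdown request from an earlier, managed shutdown would also match.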

 

What I possibly need to do is docker start the AD container ... and then reset its status somehow so that it does not send out the shutdown request.

Does this sound reasonable, or do you have any suggestions?

Thanks, Rob.
