Troubleshoot failed containers

Problem

A workspace can sometimes fail to load because its container crashes shortly after launch.

From the user’s perspective, launching the workspace appears to work correctly but ends with the user being returned to the user dashboard.

From the Kasm administrator’s perspective, you might see error logs stating that the container or workspace “is restarting, wait until the container is running”.

To diagnose the root cause of this error, review the docker logs from the workspace’s docker container.

In a deployment with a limited number of agents this is easy to do, because the agent where the workspace will attempt to provision can be guessed. To know definitively where a workspace will launch, edit the workspace and set the “Restrict to Agent” dropdown to a specific agent.

NOTE: Restricting a workspace to a single agent will affect all users who try to use the same workspace. One way to limit this risk is to clone the workspace and restrict the clone instead. The disadvantage of cloning a workspace is that you must copy all existing file mappings, and the persistent profile path might change. If the persistent profile or a file mapping is the root cause of your workspace crashes, making these modifications on the clone might fix the problem there, which could make it impossible to determine the root cause using the clone.

Solution

To troubleshoot a failed workspace container, review the docker logs to determine where the container is failing.

SSH to the Kasm Agent where the workspace will run. If you are unsure, you can apply this to all Kasm Agents.
Delete all workspace sessions from the UI.
Remove any excess auto-generated nginx confs: rm -f /opt/kasm/current/conf/nginx/containers.d/*
Stop the Kasm services: sudo /opt/kasm/bin/stop
Modify /opt/kasm/current/conf/app/agent.app.config.yaml and set remove_failed_containers to false.
Start the services: sudo /opt/kasm/bin/start.
Repeat as necessary for the remaining Kasm Agents.
Launch the failing workspace using the UI.
View the docker logs for the container. Replace CONTAINER_ID in the docker logs commands with the correct container ID.
    # Get the CONTAINER_ID
    sudo docker ps -a
    # Get the docker logs for the CONTAINER_ID
    sudo docker logs -n 1000 -f CONTAINER_ID
    # Get the docker logs for the CONTAINER_ID and save them to a file
    sudo docker logs -n 1000 -f CONTAINER_ID 2>&1 | tee container.log
If the workspace fails, tail the logs on the services and review them for the point where the errors started.
The starting point of the errors will likely be a script, a missing file or directory, a permission denial, or a system resource error.
The most common causes of workspace crashes are missing file references in a persistent profile or modifications to a startup script (i.e. the container’s entry point).
If you determine that the error is caused by a Kasm-provided workspace, please contact Kasm Customer Support to report the bug.
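Two of the steps above lend themselves to a small script: changing remove_failed_containers in the agent config, and searching a saved log for the first signs of trouble. The following is a minimal sketch, not a Kasm-provided tool. The function names are hypothetical, the config key is assumed to sit on its own line in the form "remove_failed_containers: <value>", and the search patterns are only a guess at typical script, file, permission, and resource errors.

```shell
#!/bin/sh
# Hypothetical helpers for the steps above; not part of Kasm.

# Set remove_failed_containers to the given value in an agent config file.
# Assumption: the key appears on its own line as
# "remove_failed_containers: <value>", possibly indented.
# The original file is kept next to it with a .bak suffix.
set_remove_failed_containers() {
    config_file=$1
    value=$2
    cp "$config_file" "$config_file.bak" || return 1
    sed "s/^\([[:space:]]*remove_failed_containers:\).*/\1 $value/" \
        "$config_file.bak" > "$config_file"
}

# Print up to five numbered lines from a saved container log that match
# common failure messages. The pattern list is illustrative, not exhaustive.
first_errors() {
    log_file=$1
    grep -n -i -E \
        'no such file or directory|permission denied|cannot allocate memory|no space left on device' \
        "$log_file" | head -n 5
}

# Example usage (run as a user that can write the config file):
#   set_remove_failed_containers /opt/kasm/current/conf/app/agent.app.config.yaml false
#   first_errors container.log
```

The lowest line number printed by first_errors is usually the best place to start reading, since later errors are often consequences of the first one. Once troubleshooting is finished, the same helper can set remove_failed_containers back to its previous value, which can be read from the .bak copy.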

Impacted Versions:

Issues were first observed on the following versions, but may impact other versions as well.

1.12.0+

Resolved Versions:

The issue is no longer applicable for the following and newer versions.

None

Related Docs:

None