Intro

Quoting from the website: "Home Assistant is an open-source home automation platform running on Python 3. Track and control all devices at home and automate control."

I've been using Home Assistant for a while now, and it quickly became my favorite home automation system. Before Home Assistant, I was using my own home-grown solution, and before that I was using openHAB. openHAB was great, but the configuration and scripting had a steep learning curve. Home Assistant is all Python, so it's already easier (for me) to extend it with any features I need that aren't already included.

At some point I do plan on doing a detailed write-up of what I'm using Home Assistant for, but this article is meant as a quick solution to a recent problem I've been experiencing.

The Problem

In the last month or so, I've noticed that Home Assistant periodically becomes unresponsive. The process is still running, but all automations halt and the UI is no longer accessible. Admittedly I have not debugged this in depth, mostly due to lack of time. However, it happens frequently enough (once every day or two) that it is a major annoyance. A few parts of our daily routine rely on Home Assistant. Things like turning lights on and off automatically no longer happen, and then we have to remember how to use light switches again.

Prerequisites

  • Already have a working Home Assistant installation
  • Using the official Home Assistant docker image
  • Using docker-compose to bring up the container
  • Not using an orchestration tool, since those typically have their own health checks and monitoring for restarting unhealthy containers

Adding the health check

Docker-compose added support for the healthcheck instruction in version 2.1 of the compose file format, so you'll need to be using at least that. My Home Assistant configuration.yaml includes an http section like this, which defines a custom port:

http:
  base_url: 'hass.my.lan:8100'
  server_port: 8100

By default, server_port is 8123 so you can use that if you haven't changed it in your configuration. In my docker-compose.yml file, I added a healthcheck section which looks like this:

    healthcheck:
      test: "curl --connect-timeout 10 --silent -f http://127.0.0.1:8100/ || exit 1"
      interval: 45s
      timeout: 30s
      retries: 3

This goes under the service definition, at the same level as image, volumes, or ports in the compose file. If curl succeeds in requesting the / URL, it returns exit code 0 and the container is marked as healthy. If it returns anything other than 0 (the -f flag makes curl fail on HTTP error responses as well as on connection problems), Docker counts the check as failed. After 3 consecutive failed checks, which is the retries value, the container is marked as unhealthy.
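To make that behavior concrete, here is a rough Python model of the check. It is only a sketch under my own assumptions, not what Docker actually runs. probe approximates the curl command (one known difference: urllib follows redirects, while curl without -L does not), and health_status models the bookkeeping as I understand it: a passing check marks the container healthy, and retries consecutive failures mark it unhealthy. Both function names are made up for illustration.

```python
import http.client
import urllib.error
import urllib.request
from typing import Sequence


def probe(url: str, timeout: float = 10.0) -> int:
    """Return 0 if the URL answers without an HTTP error, else 1 (like an exit code)."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return 0 if response.status < 400 else 1
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Refused connections, timeouts, and 4xx/5xx responses all end up here.
        return 1


def health_status(exit_codes: Sequence[int], retries: int = 3) -> str:
    """Simplified model of how a series of check results maps to a health status."""
    status = "starting"  # no check has passed or failed often enough yet
    failure_streak = 0
    for code in exit_codes:
        if code == 0:
            failure_streak = 0
            status = "healthy"
        else:
            failure_streak += 1
            if failure_streak >= retries:
                status = "unhealthy"
    return status
```

With the settings above, health_status([0, 1, 1]) is still "healthy" because only two checks in a row have failed, while a third consecutive failure flips it to "unhealthy". Calling probe("http://127.0.0.1:8100/") from inside the container would be the equivalent of the curl test.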

My entire docker-compose file looks similar to this for the Home Assistant container:

version: '3'
services:
  home-assistant:
    image: homeassistant/home-assistant:latest
    volumes:
      - "/srv/hass:/config"
      - "/srv/hass/options.xml:/usr/local/lib/python3.6/site-packages/python_openzwave/ozw_config/options.xml"
      - "/srv/hass/zwcfg_0xd299b900.xml:/usr/local/lib/python3.6/site-packages/python_openzwave/ozw_config/zwcfg_0xd299b900.xml"
      - "/etc/localtime:/etc/localtime:ro"
    ports:
      - "3478:3478"
      - "3478:3478/udp"
      - "5228:5228"
      - "8080:8080"
      - "8100:8100"
      - "8443:8443"
      - "8989:8989"
      - "5353:5353/udp"
      - "45772:45772/udp"
      - "50039:50039/udp"
      - "30000:30000/udp"
      - "34790:34790/udp"
    network_mode: "host"
    restart: always
    healthcheck:
      test: "curl --connect-timeout 10 --silent -f http://127.0.0.1:8100/ || exit 1"
      interval: 45s
      timeout: 30s
      retries: 3
    devices:
      - "/dev/ttyACM0:/dev/ttyACM0:rwm"

Once you use docker-compose up -d to bring the container up, you should see the container as "healthy" in the docker ps output.

$ docker ps | grep home
8311a7877ecb        homeassistant/home-assistant:latest             "python -m homeass..."   17 hours ago        Up 17 hours (healthy) hass_home-assistant_1

You can also use docker inspect to determine the health of the container:

$ docker inspect hass_home-assistant_1 | jq -r ".[].State.Health.Status"
healthy
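If jq isn't installed, the same lookup can be sketched in Python. This is a minimal example that assumes docker inspect prints a JSON array with one object per container, which is what the jq filter above relies on. The function names health_from_inspect and container_health are hypothetical helpers of mine, not part of any Docker tooling.

```python
import json
import subprocess
from typing import List, Optional


def health_from_inspect(inspect_json: str) -> List[Optional[str]]:
    """Extract State.Health.Status for every container in `docker inspect` output."""
    statuses = []
    for container in json.loads(inspect_json):
        # Containers without a health check have no Health entry, so fall back to None.
        health = container.get("State", {}).get("Health") or {}
        statuses.append(health.get("Status"))
    return statuses


def container_health(name: str) -> List[Optional[str]]:
    """Run `docker inspect <name>` and return the health status it reports."""
    result = subprocess.run(
        ["docker", "inspect", name],
        capture_output=True,
        text=True,
        check=True,  # raise if the container does not exist
    )
    return health_from_inspect(result.stdout)
```

For the container above, container_health("hass_home-assistant_1") should return a one-element list holding the same status string that jq prints.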

Automatically restarting when unhealthy

When healthchecks were added to Docker, the ability to automatically restart a container that fails its health check was originally rejected. A restart policy like restart: always only reacts to the container's process exiting, not to the container being marked unhealthy.

Since this isn't a built-in feature, a kind soul on the internet created docker-autoheal, which watches for containers marked as unhealthy and restarts them. You can follow the instructions in the readme to run the container, or you can clone the repo to a directory and use this docker-compose file to bring it up:

autoheal:
  build: .
  restart: always
  volumes:
    - "/var/run/docker.sock:/tmp/docker.sock"
  environment:
    - AUTOHEAL_CONTAINER_LABEL=all

The AUTOHEAL_CONTAINER_LABEL=all env variable means it will monitor all containers, not just ones with a specific label. Once running, check docker-compose logs for errors. Assuming no errors, it will wait for a container to become unhealthy and then restart it.
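To illustrate the general idea of watching for unhealthy containers and restarting them, here is a small Python sketch. It is not how docker-autoheal itself is implemented. It simply shells out to the docker CLI, using docker ps with the health=unhealthy filter and docker restart, and all of the function names are my own. The run parameter exists only so the logic can be exercised without a Docker daemon.

```python
import subprocess
import time
from typing import Callable, List

# A runner takes docker CLI arguments and returns the command's stdout.
Runner = Callable[[List[str]], str]


def run_docker(args: List[str]) -> str:
    """Run the docker CLI with the given arguments and return its output."""
    result = subprocess.run(
        ["docker", *args], capture_output=True, text=True, check=True
    )
    return result.stdout


def unhealthy_containers(run: Runner = run_docker) -> List[str]:
    """List the names of containers currently marked as unhealthy."""
    output = run(["ps", "--filter", "health=unhealthy", "--format", "{{.Names}}"])
    return [line.strip() for line in output.splitlines() if line.strip()]


def restart_unhealthy(run: Runner = run_docker) -> List[str]:
    """Restart every unhealthy container and return the names that were restarted."""
    restarted = []
    for name in unhealthy_containers(run):
        run(["restart", name])
        restarted.append(name)
    return restarted


def watch(interval_seconds: float = 5.0, run: Runner = run_docker) -> None:
    """Poll forever, restarting unhealthy containers on each pass."""
    while True:
        for name in restart_unhealthy(run):
            print(f"restarted unhealthy container: {name}")
        time.sleep(interval_seconds)
```

Calling watch() on the Docker host would poll every five seconds. The interval is an arbitrary value I picked for the sketch, and a real tool would also want to handle errors from a failed restart.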

Conclusion

I've been using the configuration mentioned here for close to a week now, and docker-autoheal has restarted HASS for me several times.

Not having to worry about whether automations will trigger is a huge benefit, and will let me wait even longer to debug the issue that causes HASS to stop responding. Even if I weren't experiencing regular problems, having Home Assistant automatically restart when unresponsive helps reassure me that my home automation will be more reliable.

Resources