How I Learned to Stop Worrying and Love Couchbase?

TL;DR 🤓

echo "ns_1@$(cat /proc/sys/kernel/random/uuid).\${!DNS_PRIVATE}" > /opt/couchbase/var/lib/couchbase/couchbase-server.node\
  && exec /entrypoint.sh couchbase-server

I inherited an Ansible playbook with Couchbase resources, which unfortunately did not survive an idempotency test. Since the infrastructure was already hosted in AWS on EC2 instances, I decided to replace it with a CloudFormation stack and run the database as Docker containers on ECS. To keep things simple, we use the vendor-maintained Dockerfile, which at the time of writing deploys Community Edition 6.0.0 build 1693.

This approach posed a number of challenges, since Couchbase doesn't play well in dynamic environments, especially with changing IP addresses. The Couchbase blog article on this subject is also incomplete. So, the following describes in some detail what I've done to solve (or work around) these challenges.

In broad terms, this solution is designed to automatically scale the Couchbase cluster as ECS container instances are added/removed by the EC2 Auto Scaling group. In our case, this is performed by a tiny orchestrator container called CouchbaseHelper. This container utilises EFS shared storage to manage cluster state and takes care of initialising the cluster, creating indexes, seeding data and adding/removing Couchbase servers (re-balancing). Volume-mapped local storage on the ECS container instances is used for Couchbase container data. We use awsvpc network mode for the Couchbase service, enabling the containers to be assigned VPC (private) IPs instead of local Docker bridge IPs. There is always exactly one helper container per ECS cluster and one Couchbase container per ECS container instance, enforced using the ECS service scheduling strategy, e.g.:

  CouchbaseECSService:
    Type: 'AWS::ECS::Service'
    Properties:
      SchedulingStrategy: 'DAEMON'
...

As a starting point, I've used the excellent CFN template by https://cloudonaut.io/. This template builds out an ECS cluster with (almost) everything required for extension. In my fork, I've added a number of stack exports required to extend the solution further.
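
If you want to sanity-check the exports from the deployed stacks, the AWS CLI will list them; the 'StackName-' export prefix below follows the Outputs convention used throughout this post:

aws cloudformation list-exports \
  --query "Exports[?starts_with(Name, 'StackName-')].[Name,Value]" \
  --output table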

We start by creating:

  • shared resources (e.g. VPC, IAM, ACM, ECR, S3, Route53, CloudWatch, etc.)
  • EFS (NFS) shared storage
  • ECS cluster

Having a main.yml CFN stack containing nested resources of Type: 'AWS::CloudFormation::Stack' is a relatively scalable way to organise your software stack. The following is a stub example of what your main.yml parent template may look like.

---
AWSTemplateFormatVersion: '2010-09-09'
Description: Couchbase

Metadata:
  'AWS::CloudFormation::Interface':
    ParameterGroups:
    - Label:
        default: 'Nested templates'
      Parameters:
      - VPCTemplate
      - ECSTemplate
      - R53Template
...

Parameters:
  VPCTemplate:
    Description: 'Nested template containing VPC resources.'
    Type: String
    Default: ''
  ECSTemplate:
    Description: 'Nested template containing ECS resources.'
    Type: String
    Default: ''
...

Conditions:
  HasVPC: !Not [ !Equals [ '', !Ref 'VPCTemplate' ]]
...

Resources:
  VPCStack:
    Type: 'AWS::CloudFormation::Stack'
    Condition: HasVPC
    Properties:
      TemplateURL: !Ref 'VPCTemplate'
      Parameters:
        NameTag: !Sub '${AWS::StackName}'
        ...

  ECSStack:
    Type: 'AWS::CloudFormation::Stack'
...

Outputs:
  StackName:
    Value: !Ref 'AWS::StackName'
    Export:
      Name: !Sub 'StackName-${AWS::StackName}'
  VPCStack:
    Condition: HasVPC
    Value: !GetAtt [ VPCStack, Outputs.VPCStackName ]
    Export:
      Name: !Sub 'VPCStackName-${AWS::StackName}'
...

The first prerequisite for the orchestration to work is a private DNS namespace, where we can create unique DNS records for our Couchbase cluster nodes. While ECS automatically registers our containers using the AWS::ServiceDiscovery::PrivateDnsNamespace resource (which effectively creates a private hosted zone in Route53), this hosted zone doesn't allow us to add our own custom DNS records to it. So, within our main stack, we create a route53.yml nested template containing a private hosted zone.

---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Route53 resources'

Parameters:
  NameTag:
    Type: String
  HostedZone:
    Type: String
  VpcId:
    Type: String

Resources:
  PrivateHostedZone:
    Type: 'AWS::Route53::HostedZone'
    Properties:
      HostedZoneConfig:
        Comment: !Sub 'Private hosted zone for ${VpcId}.'
      Name:
        Fn::Join:
        - ''
        - - 'private.'
          - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
          - !Sub '.${HostedZone}.'
      VPCs:
      - VPCId: !Ref 'VpcId'
        VPCRegion: !Ref 'AWS::Region'
      HostedZoneTags:
      - Key: Name
        Value: !Ref 'NameTag'

Outputs:
  R53StackName:
    Value: !Ref 'AWS::StackName'
    Export:
      Name: !Sub 'R53StackName-${AWS::StackName}'
  PrivateHostedZone:
    Value: !Ref 'PrivateHostedZone'
    Export:
      Name: !Sub 'PrivateHostedZone-${AWS::StackName}'
  DNSPrivate:
    Value:
      Fn::Join:
      - '.'
      - - 'private'
        - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
        - !Ref 'HostedZone'
    Export:
      Name: !Sub 'DNSPrivate-${AWS::StackName}'

We assemble our DNS name using the unique alpha-numeric suffix of the nested stack's AWS::StackName (e.g. private.y6w42p6ucx4m.grsThr!ve.com), so make sure to select the correct element from the split array.
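
To pick the correct index, it can help to replicate the split locally; the nested stack name below is made up for illustration:

stack_name='grs-thrive-couchbase-main-R53Stack-Y6W42P6UCX4M'  # hypothetical nested stack name
IFS='-' read -ra parts <<< "${stack_name}"
echo "private.${parts[5]}.example.com"                        # element 5 is the random suffix; DNS is case-insensitive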

Next, within our main stack we nest our application stack (e.g. app.yml), which will contain all of our custom resources, such as ECS tasks, services and service discovery for Couchbase.

---
AWSTemplateFormatVersion: '2010-09-09'
Description: 'Application resources'

Parameters:
  NameTag:
    Type: String
...

Mappings:
  InstanceLookup:
    t3.medium:
      'DCPU': 512       # CPU limit
      'DMEM': 1920      # container memory limit (MB) = (DATA+IDX+FTS) + reserve
      'DATA': 1024      # data memory size (MB) = sum(bucket_memory)
      'IDX': 256        # index memory size (MB)
      'FTS': 256        # full-text search memory size (MB)
      # bucket-spec: "[<bucket_name>:<bucket_type>:<bucket_memory> ...]"
      'bucketspec': 'bucketA:couchbase:256 bucketB:couchbase:512 memcached:memcached:256'
...

Resources:
  ServiceDiscoveryNamespace:
    Type: 'AWS::ServiceDiscovery::PrivateDnsNamespace'
    Properties:
      Description: !Sub '${NameTag} discovery namespace.'
      Vpc: !Ref 'VpcId'
      Name:
        Fn::Join:
        - ''
        - - 'private.'
          - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
          - !Sub '.${HostedZone}.'

  CouchbaseDiscoveryService:
    Type: 'AWS::ServiceDiscovery::Service'
    Properties:
      Description: !Sub '${NameTag} Couchbase discovery service.'
      Name: !Sub '${NameTag}-couchbase'
      NamespaceId: !Ref 'ServiceDiscoveryNamespace'
      DnsConfig:
        DnsRecords:
        - Type: A
          TTL: 60
        NamespaceId: !Ref 'ServiceDiscoveryNamespace'
      HealthCheckCustomConfig:
        FailureThreshold: 1

  CouchbaseTaskDefinition:
    Type: 'AWS::ECS::TaskDefinition'
    Properties:
      Volumes:
      - Name: 'efs'
        Host:
          SourcePath: !Sub '/mnt/efs/${NameTag}'
      - Name: 'local'
        Host:
          SourcePath: !Sub '/opt/${NameTag}'
      - Name: 'local-couchbase-data'
        Host:
          SourcePath: !Sub '/opt/${NameTag}/couchbase-data'
      NetworkMode: awsvpc
      ContainerDefinitions:
      - Image: 'couchbase:community-6.0.0'
        Environment:
        - Name: NAME_TAG
          Value: !Sub '${NameTag}'
        - Name: AWS_REGION
          Value: !Ref 'AWS::Region'
        - Name: AWS_ACCOUNT_ID
          Value: !Ref 'AWS::AccountId'
        - Name: AWS_STACK_NAME
          Value: !Ref 'AWS::StackName'
        - Name: AWS_STACK_ID
          Value: !Ref 'AWS::StackId'
        - Name: ECS_CLUSTER
          Value: !Ref 'Cluster'
        - Name: PRIVATE_DNSNAME
          Value:
            Fn::Join:
            - '.'
            - - !Sub '${NameTag}-couchbase'
              - 'private'
              - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
              - !Sub '${HostedZone}'
        - Name: DNS_PRIVATE
          Value: !Ref 'DNSPrivate'
        Command:
        - '/local-data/couchbase-bootstrap.sh'
        Cpu: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'DCPU' ]
        Memory: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'DMEM' ]
        DockerLabels:
          Name: !Sub '${NameTag}-couchbase'
        Ulimits:
        - HardLimit: 70000
          Name: nofile
          SoftLimit: 70000
        Privileged: true
        LinuxParameters:
          Capabilities:
            Add:
            - ALL
        LogConfiguration:
          LogDriver: 'awslogs'
          Options:
            awslogs-group: !Ref 'LogGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: 'couchbase-server'
        MountPoints:
        - ContainerPath: '/shared-data'
          SourceVolume: 'efs'
        - ContainerPath: '/local-data'
          SourceVolume: 'local'
        - ContainerPath: '/opt/couchbase/var'
          SourceVolume: 'local-couchbase-data'
        Name: 'couchbase-container'

  HelperTaskDefinition:
    Type: 'AWS::ECS::TaskDefinition'
    Properties:
      Volumes:
      - Name: 'efs'
        Host:
          SourcePath: !Sub '/mnt/efs/${NameTag}'
      - Name: 'local'
        Host:
          SourcePath: !Sub '/opt/${NameTag}'
      ContainerDefinitions:
      - Image: 'couchbase:community-6.0.0'
        Environment:
        - Name: NAME_TAG
          Value: !Sub '${NameTag}'
        - Name: PRIVATE_DNSNAME
          Value:
            Fn::Join:
            - '.'
            - - !Sub '${NameTag}-couchbase'
              - 'private'
              - !Select [ 5, !Split [ '-', !Ref 'AWS::StackName' ]]
              - !Sub '${HostedZone}'
        - Name: CB_MEM_DATA
          Value: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'DATA' ]
        - Name: CB_MEM_INDEX
          Value: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'IDX' ]
        - Name: CB_MEM_FTS
          Value: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'FTS' ]
        - Name: CB_BUCKETS
          Value: !FindInMap [ InstanceLookup, !Ref 'InstanceSize', 'bucketspec' ]
        - Name: AWS_REGION
          Value: !Ref 'AWS::Region'
        - Name: AWS_ACCOUNT_ID
          Value: !Ref 'AWS::AccountId'
        - Name: AWS_STACK_NAME
          Value: !Ref 'AWS::StackName'
        - Name: AWS_STACK_ID
          Value: !Ref 'AWS::StackId'
        - Name: ECS_CLUSTER
          Value: !Ref 'Cluster'
        Command:
        - '/local-data/couchbase-init.sh'
        Cpu: 128
        Memory: 128
        DockerLabels:
          Name: !Sub '${NameTag}-helper'
        User: root
        Privileged: true
        LinuxParameters:
          Capabilities:
            Add:
            - ALL
        LogConfiguration:
          LogDriver: 'awslogs'
          Options:
            awslogs-group: !Ref 'LogGroup'
            awslogs-region: !Ref 'AWS::Region'
            awslogs-stream-prefix: 'couchbase-helper'
        MountPoints:
        - ContainerPath: '/shared-data'
          SourceVolume: 'efs'
        - ContainerPath: '/local-data'
          SourceVolume: 'local'
        Name: 'helper-container'

  CouchbaseECSService:
    Type: 'AWS::ECS::Service'
    Properties:
      SchedulingStrategy: 'DAEMON'
      Cluster: !Ref 'Cluster'
      ServiceRegistries:
      - RegistryArn: !GetAtt CouchbaseDiscoveryService.Arn
        ContainerName: 'couchbase-container'
      TaskDefinition: !Ref 'CouchbaseTaskDefinition'
      NetworkConfiguration:
        AwsvpcConfiguration:
          AssignPublicIp: DISABLED
          SecurityGroups:
          - !Ref 'SecurityGroup'
          Subnets: !Split [ ',', !Ref 'PrivateSubnets' ]

  CouchbaseHelperService:
    Type: 'AWS::ECS::Service'
    DependsOn: CouchbaseECSService
    Properties:
      DeploymentConfiguration:
        MinimumHealthyPercent: 0
      PlacementConstraints:
      - Type: 'memberOf'
        Expression: 'agentConnected == true'
      DesiredCount: !Ref 'DesiredCount'
      Cluster: !Ref 'Cluster'
      TaskDefinition: !Ref 'HelperTaskDefinition'
...

Outputs:
  AppStackName:
    Value: !Ref 'AWS::StackName'
    Export:
      Name: !Sub 'AppStackName-${AWS::StackName}'
...

Our task definitions are started using shell scripts which we create on each ECS container instance under /opt and map into the containers using a volume mount. To create the shell scripts, we use AWS Systems Manager (SSM) by nesting ssm.yml within our parent stack.

Firstly, SSM allows us to mount our EFS (NFS) shared storage on each ECS container instance using an AWS-RunShellScript association, as follows.

  MountNFS:
    Type: 'AWS::SSM::Association'
    Properties:
      Name: 'AWS-RunShellScript'
      Parameters:
        commands:
        - !Sub |
            echo ${CurrentTimeStamp}
            
            yum list installed nfs-utils || yum install -y nfs-utils
            which telnet || yum install -y telnet
            which dig || yum install -y bind-utils

            [ -d /mnt/efs ] || mkdir -p /mnt/efs

            grep ${StorageNFS}.efs.${AWS::Region}.amazonaws.com /etc/fstab\
              || echo '${StorageNFS}.efs.${AWS::Region}.amazonaws.com:/    /mnt/efs    nfs    nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2    0 0' >> /etc/fstab

            mount | grep -q /mnt/efs\
              || while ! (echo > /dev/tcp/${StorageNFS}.efs.${AWS::Region}.amazonaws.com/2049) >/dev/null 2>&1; do sleep 10; done

            sleep 10

            mount | grep -q /mnt/efs\
              || mount -t nfs4 -o nfsvers=4.1,rsize=1048576,wsize=1048576,hard,timeo=600,retrans=2 "${StorageNFS}.efs.${AWS::Region}.amazonaws.com:/" /mnt/efs

            chown -R 1000:1000 /mnt/efs/${NameTag}
      Targets:
      - Key: 'tag:aws:autoscaling:groupName'
        Values:
        - !Ref 'AutoScalingGroup'

The association is assigned to the AutoScalingGroup, meaning new ECS container instances automatically mount shared storage on start-up.
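
If an instance comes up without the mount, the association state can be checked with the AWS CLI (the instance id is illustrative):

aws ssm describe-instance-associations-status \
  --instance-id i-0123456789abcdef0 \
  --query 'InstanceAssociationStatusInfos[].[AssociationId,Status,DetailedStatus]' \
  --output table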

Using the following associations, we create a bootstrap script for the Couchbase containers as well as our main helper script.

The Couchbase bootstrap script generates a unique hostname for each fresh Couchbase node, upserts a Route53 CNAME record to point to the internal EC2 hostname (e.g. ip-172-31-24-238.ec2.internal) and records the hostname in the .uuids state database, which is effectively a text file on the shared storage mapped into each container. Lastly, it renames the Couchbase node from the default IP address to the uniquely generated hostname. For existing nodes (ECS container restarts) coming up on different EC2 private IPs, the script simply updates the Route53 record before handing back over to the default entrypoint script.

  CouchbaseScripts:
    Type: 'AWS::SSM::Association'
    DependsOn: MountNFS
    Properties:
      Name: 'AWS-RunShellScript'
      Parameters:
        commands:
        - !Sub |
            echo ${CurrentTimeStamp}
   
            mkdir -p /opt/${NameTag}
            mkdir -p /opt/${NameTag}/couchbase-data
            chown -R 1000:1000 /opt/${NameTag}

            # -------------------------- #
            # Couchbase bootstrap script #
            # -------------------------- #
            cat << EOF > /opt/${NameTag}/couchbase-bootstrap.sh
            #!/usr/bin/env bash

            curl_opts='--silent --fail --retry 3'

            PATH=/root/.local/bin:\${!PATH}

            [ -f /root/.local/bin/pip ] || (wget --quiet https://bootstrap.pypa.io/get-pip.py && python get-pip.py --user)
            [ -f /root/.local/bin/aws ] || /root/.local/bin/pip install awscli --user --quiet

            ec2_ip=\$(hostname -i)
            ec2_hostname=\$(hostname)

            mkdir -p /opt/couchbase/var/lib/couchbase

            if [ -f /opt/couchbase/var/lib/couchbase/ip ] || [ -f /opt/couchbase/var/lib/couchbase/ip_start ]; then
                cb_hostname=\$(cat /opt/couchbase/var/lib/couchbase/ip || cat /opt/couchbase/var/lib/couchbase/ip_start)
            fi

            if ! [[ "\${!cb_hostname}" =~ ^[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]+-[a-z0-9]+\.\${!DNS_PRIVATE}\$ ]]; then
                cb_hostname="\$(cat /proc/sys/kernel/random/uuid).\${!DNS_PRIVATE}"
            fi

            echo "ec2_ip=\${!ec2_ip} ec2_hostname=\${!ec2_hostname} cb_hostname=\${!cb_hostname}"

            grep \${!cb_hostname} /shared-data/\${!ECS_CLUSTER}.uuids || echo \${!cb_hostname} >> /shared-data/\${!ECS_CLUSTER}.uuids

            change_id=\$(/root/.local/bin/aws route53 change-resource-record-sets\
              --hosted-zone-id ${PrivateHostedZoneId}\
              --change-batch "{\"Changes\":[{\"Action\":\"UPSERT\",\"ResourceRecordSet\":{\"Name\":\"\${!cb_hostname}.\",\"Type\":\"CNAME\",\"TTL\":60,\"ResourceRecords\":[{\"Value\":\"\${!ec2_hostname}\"}]}}]}"\
              --query 'ChangeInfo.Id' --output text)

            /root/.local/bin/aws route53 wait resource-record-sets-changed --id \${!change_id}

            echo "set Couchbase hostname: \${!cb_hostname}"
            echo "\${!cb_hostname}" > /opt/couchbase/var/lib/couchbase/ip
            echo "\${!cb_hostname}" > /opt/couchbase/var/lib/couchbase/ip_start
            echo "ns_1@\${!cb_hostname}" > /opt/couchbase/var/lib/couchbase/couchbase-server.node

            exec /entrypoint.sh couchbase-server
            EOF
            chmod +x /opt/${NameTag}/couchbase-bootstrap.sh
...
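
Once a node has bootstrapped, a quick sanity check from the ECS container instance (which has dig installed via bind-utils above) is to confirm the generated hostname resolves back to this host; ${NameTag} below stands in for the actual stack name tag:

cb_hostname=$(cat /opt/${NameTag}/couchbase-data/lib/couchbase/ip)
dig +short ${cb_hostname}    # expect the ip-172-31-x-x.ec2.internal CNAME target and its A record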

This next long snippet contains our main orchestrator script, which takes care of cluster initialisation, data seeding, index creation and adding/removing nodes. It uses the .uuids state information recorded by the Couchbase bootstrap script in the shared-data location to create the initial cluster by always taking the first hostname from the list. Note: this node becomes the master cluster node, where indexes are created.

Any failed nodes are flagged in the .remove file on NFS shared storage, removed and the cluster re-balanced.

Similarly, any new nodes found in the .uuids file are added to the cluster and the cluster re-balanced.

# --------------------- #
# Couchbase init script #
# --------------------- #
cat << EOF > /opt/${NameTag}/couchbase-init.sh
#!/usr/bin/env bash

echo "\$@"

curl_opts='--silent --fail --retry 3'
aws_opts='--region ${AWS::Region}'

printenv

apt-get -qq update > /dev/null
which python || apt-get -qq install -y python > /dev/null
which pip || (apt-get -qq install -y python-pip > /dev/null && pip install pip --upgrade --quiet)
which dig || apt-get -qq install -y dnsutils > /dev/null
which openssl || apt-get -qq install -y openssl > /dev/null
which git || apt-get -qq install -y git > /dev/null
which jq || apt-get -qq install -y jq > /dev/null
which curl || apt-get -qq install -y curl > /dev/null
which aws || pip install awscli --upgrade --quiet
pip list | grep bcrypt || pip install bcrypt --upgrade --quiet

if ! [ -d /opt/bmemcached-cli ]; then
    mkdir -p /opt/bmemcached-cli
    git clone https://github.com/RedisLabs/bmemcached-cli.git /opt/bmemcached-cli
    pushd /opt/bmemcached-cli
    pip install . -r requirements.pip
    popd
fi

ecs_metadata=\$(curl \${!curl_opts} \${!ECS_CONTAINER_METADATA_URI} | jq -r '.')
ecs_cluster=\$(echo \${!ecs_metadata} | jq -r '.Labels."com.amazonaws.ecs.cluster"')
cb_admin_passwd=\$(openssl rand -base64 18)
echo "\${!cb_admin_passwd}"

while ! [ -f /shared-data/\${!ecs_cluster}.uuids ]; do sleep 5s; done

while true; do
    cb_cluster=\$(cat /shared-data/\${!ecs_cluster}.uuids | head -n 1)
    cluster_ip=\$(dig +short \${!cb_cluster})

    if ! [ -f /shared-data/\${!ecs_cluster}.init ]; then
        while ! curl \${!curl_opts} http://\${!cluster_ip}:8091/pools; do
            echo "waiting for cluster \${!cluster_ip} to become available..."
            sleep 5s
        done

        cb_pools=\$(curl \${!curl_opts} http://\${!cb_cluster}:8091/pools | jq -r '.pools | length')
        echo "cluster=\${!cb_cluster} cluster_ip=\${!cluster_ip} pools=\${!cb_pools}"

        # initialise new cluster
        if [[ "\${!cb_cluster}" != '' ]] && [[ "\${!cluster_ip}" != '' ]] && [[ \${!cb_pools} -eq 0 ]]; then
            echo "initialise cluster \${!cb_cluster}"
            /opt/couchbase/bin/couchbase-cli cluster-init\
              --cluster \${!cb_cluster}\
              --cluster-name \${!ecs_cluster}\
              --services 'data,index,query,fts'\
              --cluster-ramsize \${!CB_MEM_DATA}\
              --cluster-index-ramsize \${!CB_MEM_INDEX}\
              --cluster-fts-ramsize \${!CB_MEM_FTS}\
              --cluster-username admin\
              --cluster-password "\${!cb_admin_passwd}"

            while [[ \$(curl \${!curl_opts} --user "admin:\${!cb_admin_passwd}"\
              http://\${!cluster_ip}:8091/pools\
              | jq -r '.pools[] | select(.name=="default").name') != 'default' ]]; do
                sleep 5s
            done

            echo "bucket-spec=\${!CB_BUCKETS}"
            for spec in \$(echo \${!CB_BUCKETS}); do
                bucket_name=\$(echo \${!spec} | awk -F':' '{print \$1}')
                bucket_type=\$(echo \${!spec} | awk -F':' '{print \$2}')
                bucket_mem=\$(echo \${!spec} | awk -F':' '{print \$3}')
                bucket_replica=''
                [[ \${!bucket_name} != 'memcached' ]] && bucket_replica='--enable-index-replica 1 --bucket-replica ${DesiredCapacity}'
                echo "create bucket=\${!bucket_name} type=\${!bucket_type} mem=\${!bucket_mem} cluster=\${!cb_cluster}"
                /opt/couchbase/bin/couchbase-cli bucket-create\
                  --cluster \${!cb_cluster}\
                  --bucket-type \${!bucket_type}\
                  --bucket \${!bucket_name}\
                  --bucket-ramsize \${!bucket_mem}\
                  \${!bucket_replica}\
                  --username admin\
                  --password "\${!cb_admin_passwd}"
                bucket_passwd=\$(openssl rand -base64 18)
                echo "\${!bucket_passwd}" > /shared-data/\${!ecs_cluster}.\${!bucket_name}

                echo "create user=\${!bucket_name} cluster=\${!cb_cluster}"
                /opt/couchbase/bin/couchbase-cli user-manage\
                  --cluster \${!cb_cluster}\
                  --username admin\
                  --password "\${!cb_admin_passwd}"\
                  --set\
                  --rbac-username "\${!bucket_name}"\
                  --rbac-password "\${!bucket_passwd}"\
                  --rbac-name "\${!bucket_name}"\
                  --roles "bucket_full_access[\${!bucket_name}]"\
                  --auth-domain local
            done

            sleep 30s

            echo stats | bmemcached-cli memcached:\$(cat /shared-data/\${!ecs_cluster}.memcached | head -n 1)@\${!cb_cluster}:11210

            echo \${!cb_cluster} > /shared-data/\${!ecs_cluster}.init
        fi
    fi

    cluster_hosts=\$(/opt/couchbase/bin/couchbase-cli server-list\
      --cluster \${!cb_cluster}\
      --username admin\
      --password "\${!cb_admin_passwd}")

    echo "\${!cluster_hosts}"

    for failed_host in \$(/opt/couchbase/bin/couchbase-cli server-list\
      --cluster \${!cb_cluster}\
      --username admin\
      --password "\${!cb_admin_passwd}"\
      | grep 'unhealthy inactiveFailed'\
      | awk '{print \$1}' | awk -F'@' '{print \$2}'); do
        if [[ "\${!failed_host}" != "\${!cb_cluster}" ]]; then
            echo "flagging \${!failed_host} for removal"
            echo \${!failed_host} >> /shared-data/\${!ecs_cluster}.remove
        fi
    done

    # remove defunct nodes
    if [ -f /shared-data/\${!ecs_cluster}.remove ]; then
        for cb_host in \$(cat /shared-data/\${!ecs_cluster}.remove); do
            if [[ "\${!cb_host}" != "\${!cb_cluster}" ]]; then
                echo "(hard) failover \${!cb_host} on cluster \${!cb_cluster}"
                /opt/couchbase/bin/couchbase-cli failover\
                  --cluster \${!cb_cluster}\
                  --server-failover \${!cb_host}\
                  --force\
                  --username admin\
                  --password "\${!cb_admin_passwd}"

                while /opt/couchbase/bin/couchbase-cli server-list\
                  --cluster \${!cb_cluster}\
                  --username admin\
                  --password "\${!cb_admin_passwd}" | grep 'warmup'; do
                    for spec in \$(echo \${!CB_BUCKETS}); do
                        bucket=\$(echo \${!spec} | awk -F':' '{print \$1}')
                        bucket_type=\$(echo \${!spec} | awk -F':' '{print \$2}')
                        bucket_pass=\$(cat /shared-data/\${!ecs_cluster}.\${!bucket} | head -n 1)
                        if [[ "\${!bucket_type}" != 'memcached' ]]; then
                            /opt/couchbase/bin/cbstats\
                              \${!cb_cluster}\
                              -b \${!bucket}\
                              -p "\${!bucket_pass}"\
                              -j warmup
                        fi
                    done
                    sleep 60s
                done

                echo "remove \${!cb_host} from cluster \${!cb_cluster}"
                /opt/couchbase/bin/couchbase-cli rebalance\
                  --cluster \${!cb_cluster}\
                  --server-remove \${!cb_host}\
                  --username admin\
                  --password "\${!cb_admin_passwd}"

                ec2_hostname=\$(dig +short \${!cb_host})

                change_id=\$(/root/.local/bin/aws route53 change-resource-record-sets\
                  --hosted-zone-id ${PrivateHostedZoneId}\
                  --change-batch "{\"Changes\":[{\"Action\":\"DELETE\",\"ResourceRecordSet\":{\"Name\":\"\${!cb_host}.\",\"Type\":\"CNAME\",\"TTL\":60,\"ResourceRecords\":[{\"Value\":\"\${!ec2_hostname}\"}]}}]}"\
                  --query 'ChangeInfo.Id' --output text)

                /root/.local/bin/aws route53 wait resource-record-sets-changed --id \${!change_id}

                tmpfile=\$(mktemp)
                sed "/\${!cb_host}/d" /shared-data/\${!ecs_cluster}.remove > \${!tmpfile}
                cat \${!tmpfile} > /shared-data/\${!ecs_cluster}.remove
                sed "/\${!cb_host}/d" /shared-data/\${!ecs_cluster}.init > \${!tmpfile}
                cat \${!tmpfile} > /shared-data/\${!ecs_cluster}.init
            fi
        done
    fi

    cb_hosts=(\$(cat /shared-data/\${!ecs_cluster}.uuids))

    # add the server to existing cluster
    for cb_host in \${!cb_hosts[@]}; do
        if [ -f /shared-data/\${!ecs_cluster}.init ]\
          && [[ "\${!cb_host}" != "\${!cb_cluster}" ]]\
          && ! grep \${!cb_host} /shared-data/\${!ecs_cluster}.init; then
            if ! /opt/couchbase/bin/couchbase-cli server-list\
              --cluster \${!cb_cluster}\
              --username admin\
              --password "\${!cb_admin_passwd}"\
              | grep 'unhealthy'; then
                echo "adding \${!cb_host} to Couchbase cluster \${!cb_cluster}"
                echo \${!cb_host} > /shared-data/\${!ecs_cluster}.add
                /opt/couchbase/bin/couchbase-cli server-add\
                  --cluster \${!cb_cluster}\
                  --server-add \${!cb_host}\
                  --services 'data,index,query,fts'\
                  --server-add-username admin\
                  --server-add-password "\${!cb_admin_passwd}"\
                  --username admin\
                  --password "\${!cb_admin_passwd}"

                while /opt/couchbase/bin/couchbase-cli server-list\
                  --cluster \${!cb_cluster}\
                  --username admin\
                  --password "\${!cb_admin_passwd}" | grep 'warmup'; do
                    for spec in \$(echo \${!CB_BUCKETS}); do
                        bucket=\$(echo \${!spec} | awk -F':' '{print \$1}')
                        bucket_type=\$(echo \${!spec} | awk -F':' '{print \$2}')
                        bucket_pass=\$(cat /shared-data/\${!ecs_cluster}.\${!bucket} | head -n 1)
                        if [[ "\${!bucket_type}" != 'memcached' ]]; then
                            /opt/couchbase/bin/cbstats\
                              \${!cb_cluster}\
                              -b \${!bucket}\
                              -p "\${!bucket_pass}"\
                              -j warmup
                        fi
                    done
                    sleep 60s
                done

                echo "rebalancing \${!cb_host}"
                /opt/couchbase/bin/couchbase-cli rebalance\
                  --cluster \${!cb_cluster}\
                  --username admin\
                  --password "\${!cb_admin_passwd}"

                echo 'update bucket replicas'
                cluster_hosts=\$(/opt/couchbase/bin/couchbase-cli server-list\
                  --cluster \${!cb_cluster}\
                  --username admin\
                  --password "\${!cb_admin_passwd}" | wc -l)

                echo "bucket-spec=\${!CB_BUCKETS} cluster_hosts=\${!cluster_hosts}"
                for spec in \$(echo \${!CB_BUCKETS}); do
                    bucket_name=\$(echo \${!spec} | awk -F':' '{print \$1}')
                    if [[ \${!bucket_name} != 'memcached' ]]; then
                        echo "edit bucket=\${!bucket_name} cluster_hosts=\${!cluster_hosts}"
                        /opt/couchbase/bin/couchbase-cli bucket-edit\
                          --cluster \${!cb_cluster}\
                          --bucket \${!bucket_name}\
                          --bucket-replica \${!cluster_hosts}\
                          --username admin\
                          --password "\${!cb_admin_passwd}"
                    fi
                done

                echo stats | bmemcached-cli memcached:\$(cat /shared-data/\${!ecs_cluster}.memcached | head -n 1)@\${!cb_host}:11210

                echo \${!cb_host} >> /shared-data/\${!ecs_cluster}.init
                rm -rf /shared-data/\${!ecs_cluster}.add
            fi
        fi
    done

    while /opt/couchbase/bin/couchbase-cli server-list\
      --cluster \${!cb_cluster}\
      --username admin\
      --password "\${!cb_admin_passwd}" | grep 'warmup'; do
        for spec in \$(echo \${!CB_BUCKETS}); do
            bucket=\$(echo \${!spec} | awk -F':' '{print \$1}')
            bucket_type=\$(echo \${!spec} | awk -F':' '{print \$2}')
            bucket_pass=\$(cat /shared-data/\${!ecs_cluster}.\${!bucket} | head -n 1)
            if [[ "\${!bucket_type}" != 'memcached' ]]; then
                /opt/couchbase/bin/cbstats\
                  \${!cb_cluster}\
                  -b \${!bucket}\
                  -p "\${!bucket_pass}"\
                  -j warmup
            fi
        done
        sleep 60s
    done

    sleep 300s
done
EOF
chmod +x /opt/${NameTag}/couchbase-init.sh

In our case, the helper script also handles minor configuration tasks, such as enabling email alerts, and bootstraps our application containers by (re)setting application credentials. I've left these out for brevity, but they follow the same approach documented above, such as inserting data into the database using cbq and writing state out to /shared-data.
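
As an illustration of that pattern, a one-shot seeding step might look like the following; the bucket name and document are made up, the variables match those in the helper script (shown unescaped for readability):

# hypothetical seeding step, guarded by a state file on shared storage
if ! [ -f /shared-data/${ecs_cluster}.seeded ]; then
    /opt/couchbase/bin/cbq -e "http://${cb_cluster}:8093" -u admin -p "${cb_admin_passwd}" \
      -s "INSERT INTO \`bucketA\` (KEY, VALUE) VALUES ('seed::1', {\"type\": \"seed\"});"
    echo ${cb_cluster} > /shared-data/${ecs_cluster}.seeded
fi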

The above helper script is limited to automatically scaling the cluster to a maximum of two nodes. With three or more nodes in the cluster, the default automatic failover mechanism in Couchbase will prevent the script from completing the re-balancing activities, since it will never exit the warmup wait loops. However, it should be trivial to change the helper script to automate more than two nodes if desired. Manually failing over one of the nodes using the Couchbase UI or CLI will allow the script to proceed as is.
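
One possible tweak (not tested here) is to disable automatic failover after cluster-init, leaving the hard failover in the removal path as the only failover mechanism (shown unescaped):

# disable Couchbase automatic failover so only the helper performs failovers
/opt/couchbase/bin/couchbase-cli setting-autofailover \
  --cluster ${cb_cluster} \
  --enable-auto-failover 0 \
  --username admin \
  --password "${cb_admin_passwd}"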

The other noteworthy item is protecting the ECS container instance holding at least the master (first) cluster node. It would be worthwhile to protect this resource using the Auto Scaling group scale-in protection as well as the EC2 termination protection mechanisms. Having said that, it's also worth remembering that the Community Edition of Couchbase does not support index replication, and in this example the indexes are always created on the first (master) node.
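
For reference, both protections can be applied with the AWS CLI (the instance id and Auto Scaling group name are illustrative):

# protect the instance running the master node from scale-in and API termination
aws autoscaling set-instance-protection \
  --auto-scaling-group-name my-couchbase-asg \
  --instance-ids i-0123456789abcdef0 \
  --protected-from-scale-in

aws ec2 modify-instance-attribute \
  --instance-id i-0123456789abcdef0 \
  --disable-api-termination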

For the record, I don't love Couchbase and never have.

-- belodetek