Key Generation

This guide provides detailed instructions for generating SSH key pairs and securely deploying public keys to remote worker machines in various operating system configurations within the AGILab environment.

The rest of this guide uses the following terms:

  • Manager: The local machine from which SSH connections are initiated.

  • Worker: The remote machine that accepts SSH connections.

  • Remote account: The user account on the worker machine used for SSH login.

Use the real remote account name for your cluster. In the command examples, set remote_user to that account instead of hardcoding a workstation-specific login.

1. Generate the keys

manager$ ssh-keygen -a 100 -t ed25519

You will be prompted for a passphrase. Don’t enter one (press Return twice).

If you have not changed the default path, the public key will be stored in ~/.ssh/id_ed25519.pub and the private key in ~/.ssh/id_ed25519.
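
For scripted setups, the same key can be created without prompts. The following is a minimal sketch, not part of AGILab: the function name generate_key_if_missing is hypothetical, and it relies only on the standard ssh-keygen options -N (passphrase), -f (output file) and -q (quiet). It skips generation when the private key file already exists, so an existing key is never overwritten.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not an AGILab command):
# create an ed25519 key pair only if the private key file is absent.
generate_key_if_missing() {
    local key_path="${1:-$HOME/.ssh/id_ed25519}"

    if [ -e "$key_path" ]; then
        echo "exists: $key_path"
        return 0
    fi

    # Create the parent directory with owner-only access if it is missing.
    mkdir -p -m 700 "$(dirname "$key_path")"

    # -a 100: KDF rounds, -N "": empty passphrase, -q: no progress output.
    ssh-keygen -q -a 100 -t ed25519 -N "" -f "$key_path" || return 1
    echo "created: $key_path"
}
```

Calling generate_key_if_missing with no argument targets the default path ~/.ssh/id_ed25519.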

2. Loading the private key in SSH Agent

2.1 Load the private key

manager$ ssh-add ~/.ssh/id_ed25519

2.2 Verify the key addition

manager$ ssh-add -l

It should list the fingerprint of the loaded key (never the private key). To display the public key itself:

manager$ cat ~/.ssh/id_ed25519.pub

If you set a passphrase, ssh-add asks for it when loading the key. If you encounter any permission-related errors, refer to the Permissions section.

On Linux, if a window titled “Enter password to unlock the private key” appears when trying to establish an SSH connection, enter the passphrase and check the box “Automatically unlock this key whenever I’m logged in”.

3. Copy the public key to the workers

3.1 Allow your key

Follow these steps to add your key to the authorized_keys file of each worker:

Worker Linux:

manager$ remote_user="worker-user"
worker_ip="192.0.2.20"
ssh-copy-id -i ~/.ssh/id_ed25519 "$remote_user@$worker_ip"

Worker Windows:

manager$ remote_user="worker-user"
worker_ip="192.0.2.20"
cat ~/.ssh/id_ed25519.pub | ssh "$remote_user@$worker_ip" powershell -NoProfile -Command "Add-Content -Encoding ascii -Path \"\$env:USERPROFILE\\.ssh\\authorized_keys\" -Value ([Console]::In.ReadToEnd())"

3.2 Verification

manager$ remote_user="worker-user"
worker_ip="192.0.2.20"
ssh "$remote_user@$worker_ip"

Success

It should connect without asking for the account password.
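
With several workers, the check can be scripted so that it fails instead of prompting. The sketch below is an illustration under assumptions: check_passwordless is a hypothetical function name, and the targets are whatever "$remote_user@$worker_ip" values apply to your cluster. It uses the standard OpenSSH options BatchMode=yes (never prompt) and ConnectTimeout.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not an AGILab command):
# report which user@host targets accept key-based login without a prompt.
check_passwordless() {
    local status=0 target

    for target in "$@"; do
        # BatchMode=yes makes ssh fail instead of asking for a password.
        # -n keeps ssh from consuming the caller's standard input.
        if ssh -n -o BatchMode=yes -o ConnectTimeout=5 "$target" true; then
            echo "ok: $target"
        else
            echo "failed: $target"
            status=1
        fi
    done

    return "$status"
}
```

For example, check_passwordless "$remote_user@$worker_ip" returns a non-zero status if any listed worker still requires a password.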

Bidirectional trust between worker Macs

When two macOS workers must talk to each other (for example <worker_a_ip> and <worker_b_ip>), install the same SSH key pair on both hosts so either side can ssh without a password prompt. Set remote_user to the login account used on both workers:

  1. Enable Remote Login on each Mac if it is not already on:

    sudo systemsetup -setremotelogin on
    
  2. From <worker_a_ip> push the public key to <worker_b_ip>:

    remote_user="worker-user"
    worker_b_ip="192.0.2.22"
    ssh-copy-id -i ~/.ssh/id_ed25519 "$remote_user@$worker_b_ip"
    
  3. From <worker_b_ip> push the same key back to <worker_a_ip>:

    remote_user="worker-user"
    worker_a_ip="192.0.2.21"
    ssh-copy-id -i ~/.ssh/id_ed25519 "$remote_user@$worker_a_ip"
    
  4. Verify both directions once so the host keys land in ~/.ssh/known_hosts:

    remote_user="worker-user"
    worker_a_ip="192.0.2.21"
    worker_b_ip="192.0.2.22"
    ssh "$remote_user@$worker_b_ip" hostname
    ssh "$remote_user@$worker_a_ip" hostname
    

Each command should print the remote hostname without asking for a password. If either side still prompts, re-run ssh-copy-id and make sure ~/.ssh/authorized_keys on the target contains the public key content.

Reverse SSH for SSHFS-backed cluster shares

When the cluster share is mounted with SSHFS, the worker initiates a second SSH connection back to the scheduler/manager path, so manager-to-worker SSH alone is not enough. From the worker, validate the reverse direction before running agilab doctor --setup-share sshfs --apply:

manager_user="<manager-user>"
manager_ip="<manager-ip>"
ssh -o BatchMode=yes "$manager_user@$manager_ip" hostname

If this fails with a host-key error, refresh the manager key on the worker:

manager_ip="<manager-ip>"
ssh-keygen -R "$manager_ip" -f ~/.ssh/known_hosts
ssh-keyscan -H -t ed25519,rsa,ecdsa "$manager_ip" >> ~/.ssh/known_hosts

If this fails with Permission denied, add the worker public key to the manager account:

worker$ ssh-keygen -t ed25519 -N "" -f ~/.ssh/id_ed25519
worker$ cat ~/.ssh/id_ed25519.pub

Append that public key to ~/.ssh/authorized_keys for the manager account on the manager, then rerun the reverse SSH command. Keep ~/.ssh at 0700 and authorized_keys at 0600.
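
Appending the key by hand can create duplicates or leave loose permissions. The following sketch shows one way to make the step repeatable; authorize_key is a hypothetical function name, and the optional second argument (the target .ssh directory) exists only so the helper can be tried outside the real home directory.

```shell
#!/usr/bin/env bash
# Hypothetical helper (an assumption, not an AGILab command):
# append a public key line to authorized_keys once, with strict permissions.
authorize_key() {
    local pubkey_line="$1"
    local ssh_dir="${2:-$HOME/.ssh}"
    local auth_file="$ssh_dir/authorized_keys"

    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$auth_file"
    chmod 600 "$auth_file"

    # -x: match the whole line, -F: treat the key as a fixed string.
    if grep -qxF -- "$pubkey_line" "$auth_file"; then
        echo "already authorized"
    else
        printf '%s\n' "$pubkey_line" >> "$auth_file"
        echo "added"
    fi
}
```

Run on the manager as authorize_key '<public key content>', it prints "added" the first time and "already authorized" on later runs.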

Node reinstalled or host key changed

If a worker was reinstalled or its SSH host keys changed, fix the host-key trust first, then restore user-key authentication:

  1. Verify the new SSH host key fingerprint out of band.

  2. On the manager, remove the stale host key and register the new one:

    worker_ip="<worker-ip>"
    ssh-keygen -R "$worker_ip"
    ssh-keyscan -H -t ed25519 "$worker_ip" >> ~/.ssh/known_hosts
    ssh-keygen -F "$worker_ip" -f ~/.ssh/known_hosts
    
  3. Re-push the manager public key to the rebuilt worker:

    remote_user="<remote-user>"
    worker_ip="<worker-ip>"
    ssh-copy-id -i ~/.ssh/id_ed25519 "$remote_user@$worker_ip"
    
  4. If ssh-copy-id is unavailable, recreate ~/.ssh/authorized_keys on the worker manually and keep strict permissions:

    mkdir -p ~/.ssh
    chmod 700 ~/.ssh
    printf '%s\n' '<public key content>' >> ~/.ssh/authorized_keys
    chmod 600 ~/.ssh/authorized_keys
    
  5. Verify passwordless access again before relaunching AGILAB:

    remote_user="<remote-user>"
    worker_ip="<worker-ip>"
    ssh "$remote_user@$worker_ip" hostname
    

If the worker also lost its AGILAB cluster mount, restore ~/.agilab/.env and remount the configured user-scoped clustershare/<user> path before rerunning cluster installs or pipelines.

Troubleshooting

SSHD Service

Check the service status:

sudo systemctl status ssh  # Check SSH Server status
ssh-add -L  # Check if the SSH agent is running

Check the configuration

Check the SSH server configuration in the sshd_config file:

  • Windows Server: C:\ProgramData\ssh\sshd_config

  • Unix Server: /etc/ssh/sshd_config

Ensure the following configuration is set:

PubkeyAuthentication yes
PasswordAuthentication no

To modify, open the file in an elevated text editor, update the lines as shown above, and restart the SSH server (see Restart the SSHD service section).
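
Before editing, it can help to see what the file currently sets. The sketch below is a simplified reader under stated assumptions: sshd_option and check_sshd_auth are hypothetical names, the parser takes the first uncommented "Keyword value" line (sshd uses the first value it obtains for a keyword), and it does not follow Include directives, evaluate Match blocks, or handle the "Keyword=value" form. It reports success only when both options are set explicitly.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (assumptions, not AGILab commands).

# Print the first uncommented value of a keyword in an sshd_config-style file.
sshd_option() {
    local keyword="$1"
    local config_file="${2:-/etc/ssh/sshd_config}"

    awk -v key="$keyword" '
        $1 ~ /^#/ { next }                              # skip comment lines
        tolower($1) == tolower(key) { print $2; exit }  # first match wins
    ' "$config_file"
}

# Succeed only if key login is enabled and password login is disabled.
check_sshd_auth() {
    local config_file="${1:-/etc/ssh/sshd_config}"
    local pubkey password

    pubkey="$(sshd_option PubkeyAuthentication "$config_file")"
    password="$(sshd_option PasswordAuthentication "$config_file")"

    echo "PubkeyAuthentication=${pubkey:-unset}"
    echo "PasswordAuthentication=${password:-unset}"

    [ "$pubkey" = "yes" ] && [ "$password" = "no" ]
}
```

On a live Unix server, sudo sshd -T prints the effective configuration and is the more reliable check; the sketch is mainly useful for inspecting a file before restarting the service.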

Restart the SSHD service

worker$ sudo systemctl restart ssh  # Restart SSH Server
worker$ eval "$(ssh-agent -s)"  # Start a new SSH agent in the current shell

Permissions

chmod 700 ~/.ssh
chmod 600 ~/.ssh/id_ed25519

To verify:

ls -ld ~/.ssh ~/.ssh/id_ed25519
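
The same check can be automated. The sketch below is one possible approach, with hypothetical function names file_mode and check_ssh_permissions. It assumes either GNU stat (Linux) or BSD stat (macOS), and it also checks authorized_keys against 0600 when that file exists.

```shell
#!/usr/bin/env bash
# Hypothetical helpers (assumptions, not AGILab commands).

# Print the octal permission bits of a path.
file_mode() {
    # GNU stat uses -c, BSD/macOS stat uses -f.
    stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1" 2>/dev/null
}

# Report the chmod commands needed to reach the modes used in this guide.
check_ssh_permissions() {
    local ssh_dir="${1:-$HOME/.ssh}"
    local status=0 name

    if [ "$(file_mode "$ssh_dir")" != "700" ]; then
        echo "fix: chmod 700 $ssh_dir"
        status=1
    fi

    for name in id_ed25519 authorized_keys; do
        [ -e "$ssh_dir/$name" ] || continue
        if [ "$(file_mode "$ssh_dir/$name")" != "600" ]; then
            echo "fix: chmod 600 $ssh_dir/$name"
            status=1
        fi
    done

    return "$status"
}
```

Running check_ssh_permissions with no argument inspects ~/.ssh and prints nothing when the modes already match.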