Recreating GitHub Pages on my own servers

Auto-building a Jekyll site and deploying it over SFTP without GitHub

April 14 2021

GitHub Pages is a free service from GitHub that automatically builds and hosts a Jekyll website from a GitHub repo. It’s really convenient to work with, and you can even connect your own domain names to it, but what if you don’t want to be beholden to Microsoft? Here’s a guide to recreating GitHub Pages, but fully self-hosted.

Prerequisites

How to set up a git remote

You don’t need to install anything on your server beyond the normal git CLI and an SSH server. On Debian/Ubuntu that’s as simple as sudo apt-get install git openssh-server

Next you need to decide which user to use for the git operations. This will also be the user that the deploy script is executed as. I’ll be creating a user called git. That can be done with sudo adduser git. This makes a new user and gives them a home directory, in this case /home/git/ which we can use to keep our repos in.

(Optional) SSH key authentication

If you want to use key authentication rather than password authentication, run sudo su git to get a shell as the new user, then mkdir -p ~/.ssh to make a directory to hold the SSH configuration files. Next run chmod 700 ~/.ssh so that only the git user can read and write its contents. Then create a file to hold the authorised keys with nano ~/.ssh/authorized_keys and copy in any public keys you want to allow to log in as this user. Finally, restrict the file to the git user with chmod 600 ~/.ssh/authorized_keys.
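The same steps can also be scripted. The following is a minimal sketch, assuming Python 3 is already installed on the server; install_public_key is a hypothetical helper name introduced here for illustration, and it takes the home directory as an argument so it can be tried out against a scratch directory first.

```python
import os
from pathlib import Path


def install_public_key(home: Path, public_key: str) -> Path:
    """Add public_key to <home>/.ssh/authorized_keys with strict permissions."""
    ssh_dir = home / ".ssh"
    ssh_dir.mkdir(parents=True, exist_ok=True)
    os.chmod(ssh_dir, 0o700)  # only the owner may list or enter the directory

    keys_file = ssh_dir / "authorized_keys"
    key = public_key.strip()
    existing = keys_file.read_text().splitlines() if keys_file.exists() else []
    if key not in existing:  # avoid adding the same key twice
        with keys_file.open("a") as handle:
            handle.write(key + "\n")
    os.chmod(keys_file, 0o600)  # only the owner may read or write the file
    return keys_file
```

Run as the git user, a call such as install_public_key(Path.home(), key_text), where key_text holds the contents of your own .pub file, would have the same effect as the manual steps.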

Setting up the git repo

Now that we have a user configured, we can make a git repo to hold our website. First make a folder for it with mkdir -p ~/website.git (you don’t have to call it website, but it’s good practice to end it with .git), then enter it with cd ~/website.git and initialise the repo with git --bare init. This is slightly different to how you might normally create a git repo. The --bare tells git not to create a working directory for it, and instead to make the files that would normally go in the .git folder directly in ~/website.git. We do this because we aren’t going to be doing any development on this repo; that’ll be done on a local clone, so we just need the history.

With this we should now be able to clone it locally with git clone git@your_server_domain_or_IP:website.git

Building and deploying automatically with git hooks

Git provides a convenient way to trigger scripts when certain events occur: hooks. In our example we can find them in ~/website.git/hooks/. By default there are a bunch of sample scripts in there that we can ignore. Instead we are going to create a ~/website.git/hooks/post-receive script (use whichever editor you like). I’m going to write the script in Python 3, which can be installed on your Debian/Ubuntu server with sudo apt-get install python3 python3-pip. If you want to deploy the website to a separate host over SFTP, you’ll also need to run sudo pip install pysftp.

#!/usr/bin/env python3

import sys, os, subprocess, shutil, pysftp

The first line here is the ‘shebang’ that tells Linux that this script should be executed using the Python 3 interpreter. After that we have all the library imports that this script will use. If you don’t need to deploy over SFTP, you can remove , pysftp from the end of the line.

def check_refs():
    for line in sys.stdin:
        old_hash, new_hash, ref = line.strip().split(' ')
        if ref == "refs/heads/master":
            return (True, old_hash, new_hash)
    return (False, None, None)

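Next we have a function that checks whether the master branch has been updated as part of this push. Git passes one line per updated ref to the hook on standard input, in the form old hash, new hash, ref name. If you want to change which branch it is sensitive to, change "refs/heads/master" to "refs/heads/branch_name".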
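To make the input format concrete, here is a small sketch that parses the same lines from any iterable rather than from sys.stdin, which makes it easy to try out by hand. RefUpdate, parse_ref_updates and branch_was_updated are hypothetical names introduced for illustration. The sketch also treats an all-zero new hash as a branch deletion and skips it, which the function above does not do; that is an optional extra, not part of the original script.

```python
from typing import Iterable, List, NamedTuple


class RefUpdate(NamedTuple):
    old_hash: str
    new_hash: str
    ref: str


def parse_ref_updates(lines: Iterable[str]) -> List[RefUpdate]:
    """Parse post-receive input: one '<old> <new> <ref>' triple per line."""
    updates = []
    for line in lines:
        line = line.strip()
        if line:  # ignore blank lines
            updates.append(RefUpdate(*line.split(" ", 2)))
    return updates


def branch_was_updated(updates: List[RefUpdate], branch: str = "master") -> bool:
    """True if the branch received commits; a deleted branch does not count."""
    target = f"refs/heads/{branch}"
    return any(
        update.ref == target and set(update.new_hash) != {"0"}  # all zeros = deleted
        for update in updates
    )
```

Inside the hook, parse_ref_updates(sys.stdin) would yield the updates for the current push.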

def make_temp_dirs():
    try:
        os.mkdir("/tmp/website_checkout")
    except FileExistsError:
        shutil.rmtree("/tmp/website_checkout")
        os.mkdir("/tmp/website_checkout")
    try:
        os.mkdir("/tmp/website_build")
    except FileExistsError:
        shutil.rmtree("/tmp/website_build")
        os.mkdir("/tmp/website_build")

This function will create temporary directories to store the checkout of the pushed branch and the results of the Jekyll build.
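The two try/except blocks follow the same pattern, so they could be folded into one helper. This is only a sketch of that idea; fresh_dir is a hypothetical name, and the behaviour is intended to match the function above: whatever was at the path is deleted and an empty directory is created in its place.

```python
import shutil
from pathlib import Path


def fresh_dir(path: str) -> Path:
    """Return an empty directory at path, deleting anything already there."""
    target = Path(path)
    shutil.rmtree(target, ignore_errors=True)  # no error if it does not exist yet
    target.mkdir(parents=True)  # still fails loudly if the old copy could not be removed
    return target
```

With this helper, make_temp_dirs would reduce to calling fresh_dir("/tmp/website_checkout") and fresh_dir("/tmp/website_build").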

def git_checkout():
    subprocess.run(
        ["git", "--git-dir=.", "--work-tree=/tmp/website_checkout", "checkout", "master", "."]
    ).check_returncode()

This function uses subprocess to invoke git to create a checkout of the pushed branch in our temporary directory. If you want to make it work on a different branch, change "master" to "branch_name".
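Because the branch name appears both here and in check_refs, it is easy to change one and forget the other. One way to keep them in sync, sketched below, is to derive both strings from a single constant. BRANCH, branch_ref and checkout_command are hypothetical names; the git arguments are the same ones used in the function above.

```python
from typing import List

BRANCH = "master"  # hypothetical constant: the one branch the hook should build


def branch_ref(branch: str = BRANCH) -> str:
    """Full ref name as it appears on the hook's standard input."""
    return f"refs/heads/{branch}"


def checkout_command(work_tree: str, branch: str = BRANCH) -> List[str]:
    """Argument list for checking the branch out into work_tree."""
    return ["git", "--git-dir=.", f"--work-tree={work_tree}", "checkout", branch, "."]
```

The result of checkout_command("/tmp/website_checkout") can be passed straight to subprocess.run.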

def jekyll_build():
    os.environ["GEM_HOME"] = "/home/git/gems"
    os.environ["PATH"] = "/home/git/gems/bin:" + os.environ["PATH"]
    os.environ["BUNDLE_GEMFILE"] = "/tmp/website_checkout/Gemfile"
    subprocess.run(["bundle", "install"]).check_returncode()
    subprocess.run(
        ["bundle", "exec", "jekyll", "build", "-s", "/tmp/website_checkout", "-d", "/tmp/website_build"]
    ).check_returncode()

This is the function that actually performs the build. It sets up environment variables, then makes sure all build dependencies are installed, and then executes the Jekyll build with the source and destination directories set to the two temporary directories.
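As a variation, the environment can be built as a separate dictionary and handed to subprocess.run through its env parameter, which leaves the hook’s own os.environ untouched. This is a sketch under that assumption; jekyll_env is a hypothetical helper name, and the paths are the ones used above.

```python
import os
from typing import Dict, Mapping


def jekyll_env(base: Mapping[str, str], gem_home: str, checkout_dir: str) -> Dict[str, str]:
    """Copy base and add the variables Bundler and Jekyll need."""
    env = dict(base)  # copy, so the caller's mapping is not modified
    env["GEM_HOME"] = gem_home
    # Put the gem executables first so that the user-installed bundle is found.
    env["PATH"] = os.path.join(gem_home, "bin") + os.pathsep + env.get("PATH", "")
    env["BUNDLE_GEMFILE"] = os.path.join(checkout_dir, "Gemfile")
    return env
```

It would be used as subprocess.run(["bundle", "install"], env=jekyll_env(os.environ, "/home/git/gems", "/tmp/website_checkout")).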

def rmdir(sftp: pysftp.Connection, path: str):
    if not sftp.exists(path):
        return None
    for p in sftp.listdir(path):
        if sftp.isdir(f"{path}/{p}"):
            rmdir(sftp, f"{path}/{p}")
        else:
            sftp.remove(f"{path}/{p}")
    sftp.rmdir(path)

You’ll only need this function if you want to deploy over SFTP. The pysftp library lacks the ability to recursively delete a directory, so we use this function to provide that missing functionality. It will be used to delete the old deployment before copying over the new one.

def get_sftp_password():
    # The with block closes the file even if reading fails
    with open("/home/git/sftp_pass", "r") as password_file:
        return password_file.read().strip()

Another SFTP-only function. This one reads the password to use from a file.

def deploy(password: str):
    with pysftp.Connection("sftp_host_domain", username="sftp_user", password=password) as sftp:
        rmdir(sftp, "/home/sftp_user/www")
        sftp.mkdir("/home/sftp_user/www")
        sftp.put_r("/tmp/website_build", "/home/sftp_user/www")

This function deploys the website over SFTP. You’ll need to change "sftp_host_domain" to be the domain of the host you want to deploy to, and any occurrences of sftp_user to the appropriate user. You may also need to change www if you want it to end up in a different directory.

def cleanup():
    shutil.rmtree("/tmp/website_build")
    shutil.rmtree("/tmp/website_checkout")

This is a simple cleanup function that deletes the two temporary directories once we’re done.

def main():
    should_update, old_hash, new_hash = check_refs()
    if not should_update:
        print("Build and Deploy: nothing to do")
        return None
    print(f"Updates master from {old_hash} to {new_hash}, rebuilding Jekyll")
    print("Making temporary dirs")
    make_temp_dirs()
    print("Checking out master")
    git_checkout()
    print("Building Jekyll")
    jekyll_build()
    print("Deploying via SFTP")
    deploy(get_sftp_password())
    print("Cleaning up temporary dirs")
    cleanup()
    print("Build and Deploy complete!")


if __name__ == "__main__":
    main()

Lastly we have the main function that calls our functions in sequence, with some simple logging in between, and the __name__ == "__main__" check that runs our main function when the script is executed.
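As written, a failed build or deploy raises an exception and the cleanup step never runs. One possible refinement, sketched here with the hypothetical helper run_steps, is to wrap the sequence in try/finally so the temporary directories are removed either way. The exit status of a post-receive hook does not affect the push itself, so the sketch only reports the failure.

```python
from typing import Callable, Iterable, Tuple

Step = Tuple[str, Callable[[], None]]


def run_steps(steps: Iterable[Step], cleanup: Callable[[], None]) -> bool:
    """Run each (label, function) pair in order; always run cleanup afterwards."""
    try:
        for label, step in steps:
            print(label)
            step()
        return True
    except Exception as error:  # report the failure instead of a raw traceback
        print(f"Build and Deploy failed: {error}")
        return False
    finally:
        cleanup()
```

In the hook this might be called as run_steps([("Making temporary dirs", make_temp_dirs), ("Checking out master", git_checkout), ("Building Jekyll", jekyll_build)], cleanup), with the deploy step added in the same way.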

All that’s left to do is make it executable with chmod +x ~/website.git/hooks/post-receive, create the SFTP password file at /home/git/sftp_pass, and make it readable and writable only by the git user with chmod 600 ~/sftp_pass.
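If you want the hook to notice when the password file’s permissions are looser than intended, a check along these lines could replace the plain read. This is an optional sketch, not part of the original script; read_secret is a hypothetical name.

```python
import os
import stat


def read_secret(path: str) -> str:
    """Read a secret from path, refusing files that group or others can access."""
    mode = stat.S_IMODE(os.stat(path).st_mode)
    if mode & (stat.S_IRWXG | stat.S_IRWXO):  # any group/other permission bit set
        raise PermissionError(f"{path} has mode {mode:o}; expected 600")
    with open(path, "r") as secret_file:
        return secret_file.read().strip()
```

Calling read_secret("/home/git/sftp_pass") would then fail early if the chmod 600 step was skipped.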

You can download the full Python script here
