Metaist

Google Drive Links: rclone

How I actually get Google Drive links for many documents at once.

Previously: Problem: Google Drive Links, Workspace Add-On

It's been over a year since I wrote a Google Drive Workspace Add-On to get links to multiple documents at once. However, the add-on was rejected because the video I made didn't meet the reviewers' requirements. While I still hope to fix the video and resubmit, I ended up with a completely different solution that I've been using on Linux and macOS.

Let me introduce rclone: it's like sftp but for every cloud provider, including Google Drive.

After installing rclone, run rclone config to set up a named remote for the Google Drive you want to interact with. Then create an empty folder and mount the remote into it with rclone mount $name: $GDRIVE_ROOT --daemon. The result behaves like any other mounted file system.

I wrote a Python script that takes one or more locations relative to $GDRIVE_ROOT and returns HTML that can be pasted into an email. I discuss the relevant bits below.

First, I define a few objects to describe the input files (PathInfo), the intermediate data retrieved from rclone (RCloneInfo), and the output format with information about gdrive (GDriveInfo).

from pathlib import Path  # needed for the `PathInfo.path` annotation
from typing import NamedTuple

class PathInfo(NamedTuple):
    path: Path
    is_dir: bool


class RCloneInfo(NamedTuple):
    Path: str
    Name: str
    Size: int
    # MimeType: str
    ModTime: str
    IsDir: bool
    ID: str


class GDriveInfo(NamedTuple):
    name: str
    url: str

    @staticmethod
    def from_rclone(item: RCloneInfo) -> "GDriveInfo":  # quoted: class is not yet defined here
        """Return a `GDriveInfo` from an `RCloneInfo`."""
        return GDriveInfo(item.Name, f"https://drive.google.com/open?id={item.ID}")

I learned how to access a Google Drive document directly from Amit Agarwal's great post on Google Drive URL tricks.
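As a quick illustration of that URL pattern, here is a minimal sketch that builds the link for a made-up file. The helper name `open_url`, the file name, and the ID `abc123` are hypothetical placeholders, not values from my script or from a real Drive.

```python
from typing import NamedTuple


class GDriveInfo(NamedTuple):
    name: str
    url: str


def open_url(file_id: str) -> str:
    """Build the `open?id=` link for a Drive ID (hypothetical helper)."""
    return f"https://drive.google.com/open?id={file_id}"


# "abc123" is a placeholder, not a real Drive ID.
info = GDriveInfo("notes.txt", open_url("abc123"))
print(info.url)  # https://drive.google.com/open?id=abc123
```

The same link form works for files and folders alike, which is why the script only needs the ID that rclone reports.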

Next, I define some helper functions to deal with local and remote paths.

from pathlib import Path
from typing import Dict
from typing import List

def is_relative_to(self: Path, *other: Path) -> bool:
    """Return `True` if `self` is relative to `other`.

    See: https://github.com/python/cpython/blob/41756e3960a38249b9e0076412ef5e08625a7acc/Lib/pathlib.py#L736
    """
    try:
        self.relative_to(*other)
        return True
    except ValueError:
        return False


def group_by_parent(paths: List[Path]) -> Dict[Path, List[Path]]:
    """Return a mapping from each parent path to a list of its child paths."""
    result: Dict[Path, List[Path]] = {}
    for path in paths:
        parent = path.parent
        if parent in result:
            result[parent].append(path)
        else:
            result[parent] = [path]
    return result


def remote_path(path: Path) -> str:
    """Convert a path to a remote path."""
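    # PATH_GDRIVE is defined elsewhere in the script; the first path
    # component below it is treated as the rclone remote name.
    relpath = path.relative_to(PATH_GDRIVE)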
    remote = str(relpath).replace("/", ":", 1)  # first slash becomes a colon
    if ":" not in remote:
        remote += ":"  # if you're at the top level
    return remote
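To make the path handling concrete, here is a small sketch of how local paths get grouped and turned into rclone remote paths. The root `/home/user/gdrive`, the remote name `work`, and the file names are assumptions made up for this example. It uses `PurePosixPath` so it runs without a real mount, and the two helpers are compact restatements of the ones above.

```python
from pathlib import PurePosixPath
from typing import Dict, List

# Hypothetical root; the real script defines its own PATH_GDRIVE.
PATH_GDRIVE = PurePosixPath("/home/user/gdrive")


def remote_path(path: PurePosixPath) -> str:
    """Convert a local path under PATH_GDRIVE to `remote:path` form."""
    relpath = path.relative_to(PATH_GDRIVE)
    remote = str(relpath).replace("/", ":", 1)  # first slash becomes a colon
    if ":" not in remote:
        remote += ":"  # top level of the remote
    return remote


def group_by_parent(paths: List[PurePosixPath]) -> Dict[PurePosixPath, List[PurePosixPath]]:
    """Map each parent directory to the given paths it contains."""
    result: Dict[PurePosixPath, List[PurePosixPath]] = {}
    for path in paths:
        result.setdefault(path.parent, []).append(path)
    return result


paths = [
    PATH_GDRIVE / "work" / "reports" / "q1.pdf",
    PATH_GDRIVE / "work" / "reports" / "q2.pdf",
    PATH_GDRIVE / "work" / "notes.txt",
]
for parent, children in group_by_parent(paths).items():
    print(remote_path(parent), [child.name for child in children])
# work:reports ['q1.pdf', 'q2.pdf']
# work: ['notes.txt']
```

Each printed line corresponds to one folder that would be listed, so two files in the same folder cost a single query.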

Then, I define functions for fetching information about gdrive through rclone.

import json  # used to parse the output of `rclone lsjson`
from subprocess import run

def rclone_lsjson(remote: str, *args: str) -> List[RCloneInfo]:
    """Return rclone info about `remote`."""
    cmd = ["rclone", "lsjson", remote, "--no-modtime", "--no-mimetype"] + list(args)
    proc = run(cmd, capture_output=True, check=True)  # raise if rclone exits with an error
    return [RCloneInfo(**data) for data in json.loads(proc.stdout)]


def gdrive_by_path(path: Path) -> List[GDriveInfo]:
    """Return info by searching for a path."""
    remote = remote_path(path)
    items = rclone_lsjson(remote)
    return [GDriveInfo.from_rclone(item) for item in items]


def gdrive_by_name(parent: Path, props: Dict[str, PathInfo]) -> List[GDriveInfo]:
    """Return info by searching for names in a directory."""
    remote = remote_path(parent)
    items = rclone_lsjson(remote)
    return [
        GDriveInfo.from_rclone(item)
        for item in items
        if item.Name in props and item.IsDir == props[item.Name].is_dir
    ]


def gdrive_info(paths: List[Path]) -> List[GDriveInfo]:
    """Return GDrive url and name for several paths."""
    result = {path: None for path in paths}
    for parent, children in group_by_parent(paths).items():
        if len(children) == 1 and not children[0].is_dir():
            # one child, non-directory
            path = children[0]
            result[path] = gdrive_by_path(path)[0]
        else:  # multiple children or single directory
            props = {
                p.name: PathInfo(p, p.is_dir())  # assumes unique names
                for p in children
            }
            items = gdrive_by_name(parent, props)
            for item in items:
                path = props[item.name].path
                result[path] = item

    return [v for v in result.values() if v]

Most of the work is being done by rclone lsjson which returns JSON information about specific paths. I learned a lot from this forum post about getting the folder id for a Google Drive path.
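To show what happens to that JSON without needing a configured remote, here is a sketch that parses a hand-written payload. The payload, the IDs, and the helper name `parse_lsjson` are hypothetical; the payload is only shaped to match the fields that `RCloneInfo` expects, so treat it as an assumption rather than captured rclone output.

```python
import json
from typing import Dict, List, NamedTuple


class RCloneInfo(NamedTuple):
    Path: str
    Name: str
    Size: int
    ModTime: str
    IsDir: bool
    ID: str


# Hand-written sample with made-up IDs (not real rclone output).
SAMPLE = """
[
  {"Path": "q1.pdf", "Name": "q1.pdf", "Size": 1024,
   "ModTime": "", "IsDir": false, "ID": "id-q1"},
  {"Path": "drafts", "Name": "drafts", "Size": -1,
   "ModTime": "", "IsDir": true, "ID": "id-drafts"}
]
"""


def parse_lsjson(text: str) -> List[RCloneInfo]:
    """Turn a JSON listing into `RCloneInfo` records (hypothetical helper)."""
    return [RCloneInfo(**data) for data in json.loads(text)]


items = parse_lsjson(SAMPLE)

# Keep only the entries we asked for, matching on name and on file/dir kind.
wanted: Dict[str, bool] = {"q1.pdf": False}  # name -> is_dir
matches = [i for i in items if i.Name in wanted and i.IsDir == wanted[i.Name]]
print([(m.Name, m.ID) for m in matches])  # [('q1.pdf', 'id-q1')]
```

Because `RCloneInfo(**data)` passes every JSON key as a keyword argument, a key that is not a declared field would raise a `TypeError`; that is the trade-off of this compact style.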

One surprising thing about Google Drive is that it is an ID-based file system, so a single folder can contain multiple files with exactly the same name. Setting that aside, the main goal is to find the folder that contains our files: that lets us query Google Drive once per folder for several files and build the output from the single response.

Finally, I render the collected items as HTML: a bare link for a single item, or a list when there are several.

from html import escape

def render_html(items: List[GDriveInfo]) -> str:
    """Return a list of items as HTML."""
    result = []
    pre, post = "", ""
    many = len(items) > 1
    if many:
        result.append("<ul>")
        pre, post = "\t<li>", "</li>"

    for item in items:
        # escape so that names containing `&`, `<`, or quotes stay valid HTML
        result.append(f'{pre}<a href="{escape(item.url)}">{escape(item.name)}</a>{post}')

    if many:
        result.append("</ul>")
    return "\n".join(result)
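For reference, here is a sketch of what the rendered output looks like for one item versus several. The names and IDs are made up, and `render_html` is restated compactly here so the example runs on its own.

```python
from html import escape
from typing import List, NamedTuple


class GDriveInfo(NamedTuple):
    name: str
    url: str


def render_html(items: List[GDriveInfo]) -> str:
    """Render one item as a bare link and several items as a list."""
    links = [f'<a href="{escape(i.url)}">{escape(i.name)}</a>' for i in items]
    if len(links) <= 1:
        return "\n".join(links)
    return "\n".join(["<ul>"] + [f"\t<li>{link}</li>" for link in links] + ["</ul>"])


# Placeholder IDs, not real Drive IDs.
q1 = GDriveInfo("q1.pdf", "https://drive.google.com/open?id=id-q1")
q2 = GDriveInfo("q2.pdf", "https://drive.google.com/open?id=id-q2")

print(render_html([q1]))
# <a href="https://drive.google.com/open?id=id-q1">q1.pdf</a>

print(render_html([q1, q2]))
# <ul>
# 	<li><a href="https://drive.google.com/open?id=id-q1">q1.pdf</a></li>
# 	<li><a href="https://drive.google.com/open?id=id-q2">q2.pdf</a></li>
# </ul>
```

A single link pastes cleanly into the middle of a sentence, while the list form suits an email that references several files.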

Inspired by Commands with Comma, I named this ,gdrive-id.py.

However, that's not very accessible. I don't want to have to open a terminal just to get some URLs. In the next part, I describe how I added context menus in Thunar and macOS Finder to make my life easier.