No, fractional reserve banking doesn't create money

A great example of why you need double-entry bookkeeping and balance sheets.

There are (at least) two claims that have never made sense to me since I first heard them. One is that double-entry bookkeeping, where you track the capital inside a company by keeping two lists of money amounts that add up to the same total, is actually useful and not a formality that is 50% redundant. The other is that fractional-reserve banking (FRB), whereby a bank is not forced to have at least as much cash on hand as the combined total of all account balances open at that bank, allows banks to create cycles of infinite money. The former is true but hard to believe at first. The latter is false but hard to disbelieve at first. Both become apparent when you consider the typical claim about FRB through the lens of double-entry bookkeeping.
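To make the double-entry idea concrete, here is a minimal toy sketch in Python (my own illustration; the names and numbers are made up): one list tracks what a company owns, the other tracks who has a claim on it, and every transaction has to keep the two totals equal.

```python
# Toy double-entry balance sheet: two lists of amounts that must always
# add up to the same total.

class BalanceSheet:
    def __init__(self):
        self.assets = {}  # what the company owns
        self.claims = {}  # who has a claim on it (liabilities + equity)

    def post(self, asset_changes: dict, claim_changes: dict):
        """Apply one transaction, expressed as changes to both sides."""
        for name, amount in asset_changes.items():
            self.assets[name] = self.assets.get(name, 0) + amount
        for name, amount in claim_changes.items():
            self.claims[name] = self.claims.get(name, 0) + amount
        assert sum(self.assets.values()) == sum(self.claims.values()), \
            "The two lists must add up to the same total."

sheet = BalanceSheet()
sheet.post({"cash": 1000}, {"equity": 1000})      # owner puts in capital
sheet.post({"cash": -400, "equipment": 400}, {})  # buy equipment: total unchanged
sheet.post({"cash": 500}, {"bank loan": 500})     # borrow: both sides grow by 500
```

The apparent redundancy is what makes that final assertion possible: a half-recorded transaction fails immediately.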

Does DeBERTa have infinite context length, and how large is the receptive field of a token?

Disentangling the strangeness of relative distance and more.

The promise of DeBERTa is that it does away with the absolute positional embeddings added at the start of the transformer encoder and replaces them with an attention mechanism that takes the relative distance between tokens into account, in every head of every layer. Since absolute positional embeddings are the only part of a transformer whose dimensionality is tied to the length of the input, would taking them out mean DeBERTa can process an infinite number of tokens at once? Let’s analyse.
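As a warm-up, here is a toy sketch (my own, not DeBERTa’s actual code) of the relative-distance idea: the positional signal between tokens i and j depends only on i − j, clipped into a fixed window, so the table of positional parameters has a fixed size no matter how long the input is.

```python
# Toy clamped relative positions (illustrative only; the window size here is
# tiny for display, real models use a much larger one).
import torch

def clamped_relative_positions(seq_len: int, max_distance: int) -> torch.Tensor:
    positions = torch.arange(seq_len)
    rel = positions[:, None] - positions[None, :]  # rel[i, j] = i - j
    # Clip to [-max_distance, max_distance - 1] and shift so the result can
    # index a (2 * max_distance)-row embedding table.
    return rel.clamp(-max_distance, max_distance - 1) + max_distance

buckets = clamped_relative_positions(seq_len=10, max_distance=4)
print(buckets)  # at most 2 * max_distance = 8 distinct values, regardless of seq_len
```

Nothing in this construction depends on the sequence length, which is exactly what makes the infinite-context question worth asking.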

How does HuggingFace's from_pretrained() know which weights in a checkpoint go where?

I dove deep so you don’t have to.

The famous roberta-base HuggingFace checkpoint is a serialised version of a RobertaForMaskedLM model, consisting of a roberta field and an lm_head field. Yet you can still call .from_pretrained("roberta-base") on RobertaForTokenClassification and get an object that has a roberta field with exactly the checkpoint’s roberta weights, but a head with a different architecture and randomly initialised weights. Even more strikingly, you can call .from_pretrained("roberta-base") on RobertaModel, which is what normally sits inside that roberta field and consists of the fields embeddings and encoder, and somehow it still matches all the relevant checkpoint weights to the correct fields. Ever wondered how that’s done? Here’s how.
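The gist, heavily simplified (this is my own sketch, not the actual transformers source), is bookkeeping on the checkpoint’s state-dict keys: try each key as-is, with the base-model prefix (here "roberta") stripped, and with it added, and keep whatever matches the target model’s own parameter names; everything else stays randomly initialised or gets dropped.

```python
# Simplified illustration of prefix-based weight matching (hypothetical,
# not the real from_pretrained() logic).

def match_state_dict(checkpoint: dict, model_keys: set, base_prefix: str = "roberta") -> dict:
    matched = {}
    for name, tensor in checkpoint.items():
        stripped = name[len(base_prefix) + 1:] if name.startswith(base_prefix + ".") else None
        if name in model_keys:                        # full model loading a full checkpoint
            matched[name] = tensor
        elif stripped in model_keys:                  # bare RobertaModel loading "roberta.*" keys
            matched[stripped] = tensor
        elif base_prefix + "." + name in model_keys:  # the opposite direction
            matched[base_prefix + "." + name] = tensor
        # anything else (e.g. "lm_head.*" for a token classifier) is simply ignored
    return matched

# Hypothetical usage with placeholder tensors:
checkpoint = {"roberta.embeddings.weight": "...", "lm_head.bias": "..."}
model_keys = {"embeddings.weight", "encoder.layer.0.weight"}  # a bare RobertaModel
print(match_state_dict(checkpoint, model_keys))  # {'embeddings.weight': '...'}
```

The real loader has far more moving parts (tied weights, missing/unexpected key reporting, sharded checkpoints), but prefix juggling of this sort is the part that answers the question here.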

How to create your own Python packages and dependencies

A short tutorial on how to lay out a repo, declare metadata, install editable code, and do it all recursively.

At the time of writing, I’ve publicly released a handful of Python packages (five, to be precise: Fiject, BPE-knockout, TkTkT, LaMoTO, and MoDeST) covering various parts of the NLP research workflow, with more on the way. It took me a bit to learn how to set up Python code as a package (also inaccurately called a “library”, or more accurately, a “module”), and as I later discovered, it’s not so trivial to have one custom package be installed automatically upon installing another custom package, especially when you are the author of both and are already using a working version. Let’s dive straight in!
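As a taste of where the tutorial ends up, here is one minimal way to declare such a package with setuptools (all names below are made up, and a pyproject.toml-based setup is an equally valid route):

```python
# setup.py — minimal hypothetical example.
from setuptools import setup, find_packages

setup(
    name="my-nlp-package",
    version="0.1.0",
    packages=find_packages(),  # every folder with an __init__.py becomes importable
    python_requires=">=3.9",
    install_requires=[
        "numpy",
        # The "recursive" part: one of your own packages, pulled straight from
        # its repository whenever this package is installed.
        "my-other-package @ git+https://github.com/your-username/my-other-package",
    ],
)
```

Running `pip install -e .` in the repo root then gives you an editable install of this package together with its declared dependencies, including the one fetched from Git.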

Bits-per-character and its relation to perplexity

A short formalisation of an obscure metric.

I was recently reading a paper on how to train an adapter between a given model and any tokeniser, and noticed that they measured their causal language modelling performance in bits-per-character (BPC) rather than the more standard perplexity (PPL) with no citation to show for it. Let’s dive into that!
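For orientation, this is the relation I keep in my head (my notation, not the paper’s), assuming PPL is the usual token-level perplexity of the same model on the same text, which contains $n_\text{tok}$ tokens and $n_\text{char}$ characters:

$$\mathrm{BPC} \;=\; \frac{-\sum_i \log_2 p(x_i \mid x_{<i})}{n_\text{char}} \;=\; \frac{n_\text{tok}}{n_\text{char}}\,\log_2 \mathrm{PPL}, \qquad\text{so}\qquad \mathrm{PPL} \;=\; 2^{\,\mathrm{BPC}\cdot n_\text{char}/n_\text{tok}}.$$

In other words, BPC normalises by characters instead of tokens, which is what makes it comparable across tokenisers.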

BPE-knockout: Pruning Pre-existing BPE Tokenisers with Backwards-compatible Morphological Semi-supervision

Quick overview of my first-ever published paper, presented as a poster at NAACL 2024, one of the leading conferences in NLP.

After a harrowing round of peer review earlier this year (story for another day), I’m pleased to say that I am officially a published author. This article gives a short overview of my first paper, BPE-knockout: Pruning Pre-existing BPE Tokenisers with Backwards-compatible Morphological Semi-supervision. For a more detailed and slightly more technical overview, watch the 13-minute video explainer here.

Exposure 101: How to Use Manual Mode in Practice

So many tutorials teach the exposure triangle, but how is it actually applied in practice? Take full control of the power of M(anual) mode by training your muscle memory on this one recipe!

When I was starting out in hobby photography in my early teens, I felt that I had to cling to all forms of automatic assistance to coordinate my shots for me: I would merely point the camera and let auto-focus and auto-exposure (the infamous P mode, or even worse, AUTO mode, with the assistance of the on-camera flash) handle the rest. After all, computers are much better at multi-tasking than we mortals are, and that dreaded M mode seemed to have way too many settings with opposite effects. Better safe than sorry… right?
