The famous roberta-base HuggingFace checkpoint is a serialised version of a RobertaForMaskedLM model, consisting of a roberta field and an lm_head field. Despite this, you can still call .from_pretrained("roberta-base") on RobertaForTokenClassification and get an object whose roberta field holds exactly the checkpoint’s roberta weights, but whose head has a different architecture and randomly initialised weights. Even more strikingly, you can call .from_pretrained("roberta-base") on RobertaModel, the class that normally sits inside that roberta field and itself consists of the fields embeddings and encoder, and somehow all the relevant checkpoint weights still get matched to the correct fields. Ever wondered how that’s done? Here’s how.
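As a quick taste of the behaviour in question, here is a minimal sketch using the Hugging Face transformers package (class and checkpoint names as above; the exact warnings printed vary by version):

```python
from transformers import (
    RobertaForMaskedLM,
    RobertaForTokenClassification,
    RobertaModel,
)

# 1) The checkpoint matches this architecture exactly: a roberta backbone plus an lm_head.
mlm = RobertaForMaskedLM.from_pretrained("roberta-base")

# 2) Same backbone weights, but the token-classification head does not exist in the
#    checkpoint, so it is randomly initialised (transformers warns about this).
clf = RobertaForTokenClassification.from_pretrained("roberta-base")

# 3) Only the backbone: embeddings and encoder still receive the checkpoint's
#    corresponding weights, even though the checkpoint stores them under a "roberta." prefix.
backbone = RobertaModel.from_pretrained("roberta-base")
```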
A lesson in supply-and-demand for adults who still believe in fairy dust.
I recently watched a video online of someone complaining about the existence of millionaires and billionaires. I commented with the realistic and optimistic message that anyone has the ability to become a multi-millionaire, even if just through supplying a silly backend commodity like toilet rolls. Then, all hell broke loose.
Cross-platform app development is surprisingly easy in 2024. Learnt it in a weekend.
Knowing how to build mobile apps is a skill that will likely stay relevant for the foreseeable future. I’ve had some ideas for mobile apps in the past, and if I was ever going to learn to develop them, I wanted a framework that lets me keep one code base and still run on both iOS and Android. We’re in luck: Google’s Flutter framework, built on top of the Dart language, does exactly that. Here are my notes from learning it.
A short tutorial on how to lay out a repo, declare metadata, install editable code, and do it all recursively.
At the time of writing, I’ve publicly released a handful of Python packages (five, to be precise: Fiject, BPE-knockout, TkTkT, LaMoTO, and MoDeST) covering various parts of the NLP research workflow, with more on the way. It took me a while to learn how to set up Python code as a package (sometimes inaccurately called a “library”, or more accurately a “module”), and as I later discovered, it’s not so trivial to have one custom package be installed automatically upon installing another custom package, especially when you are the author of both and are already using a working version. Let’s dive straight in!
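To make the “recursively” part concrete, here is a minimal, hypothetical sketch of how one setuptools-based package can declare a dependency on another custom package so that it gets pulled in automatically; the names and URL are placeholders, not my actual packages:

```python
# setup.py (hypothetical example; a pyproject.toml can express the same metadata)
from setuptools import setup, find_packages

setup(
    name="my-package",          # placeholder name
    version="0.1.0",
    packages=find_packages(where="src"),
    package_dir={"": "src"},
    install_requires=[
        # A PEP 508 direct reference: installing my-package (e.g. with `pip install -e .`)
        # will also fetch and install my-other-package from its repository.
        "my-other-package @ git+https://github.com/example/my-other-package.git",
    ],
)
```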
A short formalisation of an obscure metric.
I was recently reading a paper on how to train an adapter between a given model and any tokeniser, and noticed that they measured their causal language modelling performance in bits-per-character (BPC) rather than the more standard perplexity (PPL), with no citation to show for it. Let’s dive into that!
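For orientation, here is one common formalisation (my notation, not necessarily the paper’s): for a test corpus of N_c characters tokenised into N_t tokens, with the model assigning probability p_i to token i,

```latex
\mathrm{BPC} = \frac{1}{N_c} \sum_{i=1}^{N_t} -\log_2 p_i
\qquad\text{versus}\qquad
\mathrm{PPL} = \exp\!\left(\frac{1}{N_t} \sum_{i=1}^{N_t} -\ln p_i\right) = 2^{\frac{N_c}{N_t}\,\mathrm{BPC}}
```

so the two can be converted into one another once you know the corpus’s tokens-per-character ratio.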
So many tutorials teach the exposure triangle, but how is it actually applied in practice? Take full control of the power of M(anual) mode by training your muscle memory on this one recipe!
When I was starting out in hobby photography in my early teens, I felt that I had to cling to all forms of automatic assistance to coordinate my shots for me: I would merely point the camera, and would let auto-focus and auto-exposure (the infamous P mode, or even worse, AUTO mode, with assistance of the on-camera flash) handle the rest. After all, computers are much better at multi-tasking than we mortals are, and that dreaded M mode seemed to have way too many settings with opposite effects. Better safe than sorry… right?