Coding LLMs from scratch!

Today I’m sharing some results of my experiments with coding LLMs from scratch in Python/PyTorch, without any extra libraries.

The purpose of this is obviously educational. For me, the true value lies in struggling, running into issues and bugs, and fixing them myself; still, I’m sharing the code because it can sometimes help to resolve a frustrating issue by looking at other implementations.

Like many others, I was inspired by Andrej Karpathy’s fantastic GPT-2-from-scratch tutorial. The material he prepared is a true treasure trove, but its biggest value for me was the realisation that I can code and train modern LLM architectures (albeit at a smaller size) fully on my own, without any major problems or costs!

Having worked as a data scientist for many years, I have used multiple Transformer-based models, including most of the LLMs out there. For some I modified or extended the code, and I have implemented the basic Transformer on my own several times to learn how it works, but I never built an entire language model from start to finish, training included … until now.

I started by following Karpathy’s tutorial, but then, inspired by it, I moved on to the famous LLMs shared by other major companies. As of the publication date of this post, I have implemented:

  1. GPT-2 – an example of a decoder-based LLM
  2. BERT – as opposed to GPT-2, an example of an encoder-based language model
  3. Llama 2 – similar to GPT-2 but with some interesting modifications (like rotary positional embeddings, which are used by pretty much everybody nowadays)
  4. T5 – an example of an encoder-decoder model, like the one from the original Attention Is All You Need paper
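Since rotary positional embeddings come up above, here is a minimal sketch of how they can be applied to a query or key tensor in PyTorch. This is an illustration of the general technique, not the exact code from my repository; the function name is mine, and base=10000 is the constant from the original RoPE formulation:

```python
import torch

def rotary_embedding(x, base=10000):
    # x: (batch, seq_len, n_heads, head_dim); head_dim must be even
    b, t, h, d = x.shape
    # one frequency per pair of dimensions
    inv_freq = 1.0 / (base ** (torch.arange(0, d, 2).float() / d))
    pos = torch.arange(t).float()
    angles = torch.outer(pos, inv_freq)        # (seq_len, head_dim / 2)
    cos = angles.cos()[None, :, None, :]       # broadcast over batch and heads
    sin = angles.sin()[None, :, None, :]
    x1, x2 = x[..., 0::2], x[..., 1::2]
    # rotate each 2D pair by its position-dependent angle
    out = torch.empty_like(x)
    out[..., 0::2] = x1 * cos - x2 * sin
    out[..., 1::2] = x1 * sin + x2 * cos
    return out
```

Because each pair is rotated rather than shifted, token vectors keep their norm and the attention dot product ends up depending only on relative position, which is the property everybody is after.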

I think I got the most out of my BERT implementation, because there aren’t many open, tutorial-like implementations out there. I found a few, but they usually don’t do the full pre-training, so it’s hard to compare my own results against them. As a result, I had to solve many problems and issues myself, basically following the same methodology as Karpathy in his GPT-2 tutorial: reading the BERT and subsequent RoBERTa papers to figure out all the tiny implementation details. I also peeked into the Hugging Face implementation but, as noted in the GPT-2 tutorial as well, it’s super convoluted. When I ran into multiple frustrating issues, I basically ripped the entire code apart line by line. Time-consuming, but it ended up being quite satisfying (after all, everything finally started working ;D).
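One of those tiny details from the BERT/RoBERTa papers is the masking scheme: 15% of tokens are selected for prediction, and of those, 80% become [MASK], 10% become a random token, and 10% stay unchanged. A rough PyTorch sketch of that procedure (the special-token ids and the function name here are hypothetical placeholders, not values from my code):

```python
import torch

def mask_tokens(input_ids, mask_token_id, vocab_size, mlm_prob=0.15,
                special_ids=(0, 101, 102)):  # hypothetical PAD/CLS/SEP ids
    # NOTE: modifies input_ids in place
    labels = input_ids.clone()
    # select ~15% of positions (never special tokens) as prediction targets
    prob = torch.full(input_ids.shape, mlm_prob)
    for sid in special_ids:
        prob[input_ids == sid] = 0.0
    masked = torch.bernoulli(prob).bool()
    labels[~masked] = -100  # -100 is ignored by PyTorch cross-entropy
    # of the selected tokens: 80% -> [MASK]
    replace = torch.bernoulli(torch.full(input_ids.shape, 0.8)).bool() & masked
    input_ids[replace] = mask_token_id
    # 10% -> random token (half of the remaining 20%); the rest stay unchanged
    randomize = (torch.bernoulli(torch.full(input_ids.shape, 0.5)).bool()
                 & masked & ~replace)
    input_ids[randomize] = torch.randint(vocab_size, input_ids.shape)[randomize]
    return input_ids, labels
```

The 10% unchanged slice is the easy one to forget; without it the model never learns to predict a token it can actually see.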

Example results of pre-training accuracy from my BERT implementation.

Overall, I can recommend the entire exercise even to those who have nothing to do with pre- or post-training those huge LLMs. I don’t expect to do that during my regular day-to-day work any time soon either; however, I fine-tune all those major models with billions of parameters on smaller datasets pretty much on a daily basis. Understanding in great detail how they are implemented and how they differ from one another makes me feel a lot more comfortable working with all this. And doing it all from scratch in PyTorch is quite a good exercise in itself.

For anybody interested, I’ve shared all my code and all pre-training results/accuracies/settings in a new repository on my GitHub account.

Daily wallpaper tool for macOS

For quite a while now, Windows 8 and 10 have had this neat feature that sets the Bing Daily Wallpaper as the lock screen image or desktop wallpaper (via the Bing Desktop app). Whatever you think of the quality of the Bing search engine, I find the selection of those daily images really well curated.

In the past, I used a bunch of shell scripts to get similar functionality on macOS, and there are also some paid apps available in the App Store. However, I wasn’t fully happy with either of those solutions, so… I wrote my own Cocoa macOS app to do the job!

Wallpaper Switcher works as a Preference Pane that integrates into macOS System Preferences. You can download it here and simply open it to install it on your machine.

My intention was to make an easy-to-use, single-click UI without the need to mess around with shell scripts or the command line each time I reinstall or update macOS. The app makes the entire process seamless, and afterwards it just disappears into the background as if it were part of the OS.

Wallpaper Switcher uses the native macOS scheduler, so it does not consume any additional system resources, saving memory and battery time on a laptop. I tested it primarily with macOS Mojave (10.14).

As a bonus, apart from Bing Daily Images, Wallpaper Switcher lets you use the National Geographic Photo of the Day, images from Reddit posts (like /r/wallpapers or /r/art, which have a daily stream of community-voted images), or just your own custom URL.

I wrote the entire app in Objective-C using Apple’s Cocoa framework. Aside from the Preference Pane, the app embeds a small command-line tool that does all the downloading and wallpaper setting in the background. To save resources, the daily updates are scheduled with launchd, the scheduler built into macOS. All of this is set up and managed by the app based on the user preferences chosen in the Preference Pane.

If anyone is interested in how all this works under the hood, I’ve published the entire source code and the latest binaries on GitHub.