-
The Essence of Global Convolution Models
-
Lossless Tokenizer via Byte-level BPE with Tiktoken
The design of Tiktoken, the byte-level BPE tokenizer behind GPT.
-
Magnetism from Relativistic Electricity
A sketch of how magnetism arises from special relativity.
-
Helion Fusion in a Nutshell
A personal note on Helion's approach to fusion energy.
-
Measuring Code Generation Abilities of GPT-4 in 10+ Languages
-
ChatGPT-4 on Physics Olympiad Problems
-
Unreasonable Effectiveness of LLMs for Code Generation
-
OpenAI Still Makes 2X Profits on ChatGPT at 0.2 Cents Per 1K Tokens
-
Memory IO Efficiency of Multi-Query Attention
Multi-query attention can be far more memory IO-efficient at large batch sizes and long context lengths.
-
The Illustrated Tensor Parallelism
How tensor parallelism enables large language model training and inference, explained with math, code, and illustrations.
-
The Illustrated Attention via Einstein Summation
An introduction to einsum, using attention operations as examples.