-
The Essence of Global Convolution Models
-
Lossless Tokenizer via Byte-level BPE with Tiktoken
The design of Tiktoken, the byte-level BPE tokenizer behind GPT.
-
Magnetism from Relativistic Electricity
A sketch of how magnetism arises from special relativity.
-
Helion Fusion in a Nutshell
A personal note on Helion's approach to fusion energy.
-
Measuring Code Generation Abilities of GPT-4 in 10+ Languages
-
ChatGPT-4 on Physics Olympiad Problems
-
Unreasonable Effectiveness of LLMs for Code Generation
-
OpenAI Still Makes 2X Profits on ChatGPT at 0.2 Cents Per 1K Tokens
-
Memory IO Efficiency of Multi-Query Attention
Multi-query attention can be far more memory IO-efficient at large batch sizes and long context lengths.
-
The Illustrated Tensor Parallelism
How tensor parallelism enables large language model training and inference, explained with math, code, and illustrations.
-
The Illustrated Attention via Einstein Summation
An introduction to einsum, using attention operations as examples.