-
Dream 7B
Introducing Dream 7B, the most powerful open diffusion large language model to date.
-
EvaByte: Efficient Byte-level Language Models at Scale
Introducing EvaByte, an efficient and strong byte-level language model.
-
Randomized Attention: a Generalized Random Feature Attention Algorithm
A blog post on novel perspectives for understanding random feature attention.