Anonymous
03/18/2026 (Wed) 13:12
Id: e14872
No.178435
>>178382,
>>178383,
>>178384,
>>178385,
>>178386,
>>178387,
>>178388,
>>178389,
>>178390,
>>178391,
>>178392,
>>178393,
>>178394,
>>178395,
>>178396,
>>178397,
>>178398,
>>178399,
>>178400,
>>178401,
>>178402,
>>178403,
>>178404,
>>178405,
>>178406,
>>178407,
>>178408,
>>178409,
>>178410,
>>178411,
>>178413,
>>178414,
>>178415,
>>178416,
>>178417,
>>178418,
>>178419,
>>178420,
>>178421,
>>178422,
>>178423,
>>178424,
>>178425,
>>178426,
>>178427,
>>178428,
>>178429,
>>178430,
>>178431,
>>178432,
>>178433,
>>178434
Nainsi Dwivedi @NainsiDwiv50980 - Holy shit... Microsoft open sourced an inference framework that runs a 100B parameter LLM on a single CPU.
It's called BitNet. And it does what was supposed to be impossible.
No GPU. No cloud. No $10K hardware setup. Just your laptop running a 100-billion parameter model at human reading speed.
Here's how it works:
Every other LLM stores weights in 32-bit or 16-bit floats.
BitNet uses 1.58 bits.
Weights are ternary: just -1, 0, or +1. That's it. No floats. No expensive matrix math. Pure integer operations your CPU was already built for.
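(The "1.58" is log2(3) ≈ 1.585, the information content of a three-valued weight. The b1.58 paper describes an "absmean" scheme for getting there: scale each weight matrix by its mean absolute value, then round and clip to {-1, 0, +1}. A minimal numpy sketch of that idea; the function name and the tiny example matrix are illustrative, not from the BitNet codebase:)

```python
import numpy as np

def absmean_ternary_quantize(W):
    """Quantize a float weight matrix to ternary {-1, 0, +1}.

    Sketch of the absmean scheme: divide by the mean absolute
    weight, round to the nearest integer, clip to [-1, 1].
    The scale is kept so activations can be rescaled later.
    """
    scale = np.mean(np.abs(W)) + 1e-8  # epsilon avoids division by zero
    W_ternary = np.clip(np.round(W / scale), -1, 1).astype(np.int8)
    return W_ternary, scale

# Toy 2x3 weight matrix (made up for illustration)
W = np.array([[0.4, -0.9, 0.05],
              [-0.2, 0.7, -0.6]])
W_q, s = absmean_ternary_quantize(W)
print(W_q)
# [[ 1 -1  0]
#  [ 0  1 -1]]
```

Large weights snap to ±1, small ones to 0 — which is also why the sparsity (the zeros) comes for free.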
The result:
- 100B model runs on a single CPU at 5-7 tokens/second
- 2.37x to 6.17x faster than llama.cpp on x86
- 82% lower energy consumption on x86 CPUs
- 1.37x to 5.07x speedup on ARM (your MacBook)
- Memory drops by 16-32x vs full-precision models
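(The "pure integer operations" point is what makes those CPU numbers plausible: multiplying by -1, 0, or +1 degenerates into subtract, skip, or add, so a matrix-vector product needs no multiplies at all. A hypothetical pure-Python sketch — real kernels like bitnet.cpp do this with packed weights and SIMD, not a loop:)

```python
def ternary_matvec(W_q, x):
    """Multiply a ternary weight matrix by a vector using only adds/subs.

    Each output element is a signed sum of selected inputs:
    +1 adds, -1 subtracts, 0 skips the input entirely.
    """
    out = []
    for row in W_q:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi
            elif w == -1:
                acc -= xi
            # w == 0: contributes nothing, no work done
        out.append(acc)
    return out

# Ternary weights from a (hypothetical) quantized layer:
W_q = [[1, -1, 0],
       [0, 1, -1]]
x = [3, 5, 2]
print(ternary_matvec(W_q, x))  # [-2, 3]
```

Zeros cost nothing, which is why sparser ternary matrices get faster, not just smaller.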
The wildest part:
Accuracy barely moves.
BitNet b1.58 2B4T, their flagship model, was trained on 4 trillion tokens and benchmarks competitively against full-precision models of the same size. The quantization isn't destroying quality. It's just removing the bloat.