MIT researchers have developed Attention Matching, a KV cache compaction technique that compresses an LLM's memory by 50x in seconds ...
The breakthrough could make AI far more practical for large-scale use, as the method promises lower cloud computing costs and faster processing of huge datasets.
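The article gives no details on how Attention Matching itself works. As a rough illustration of what KV cache compaction means in general, the sketch below uses a common baseline from the literature: evicting cached tokens that have received little attention, keeping roughly 1 in 50 entries to mirror the claimed 50x ratio. The function `compact_kv_cache`, its signature, and the attention-score heuristic are all illustrative assumptions, not the MIT method.

```python
# Illustrative sketch only: the article does not describe Attention
# Matching's internals. This shows a *generic* KV cache compaction
# strategy (evicting tokens with low cumulative attention), a common
# baseline in the literature -- not the MIT technique.
import numpy as np

def compact_kv_cache(keys, values, attn_scores, keep_ratio=0.02):
    """Keep only the most-attended tokens in the KV cache.

    keys, values : (seq_len, head_dim) cached key/value projections
    attn_scores  : (seq_len,) cumulative attention each cached token
                   has received from recent queries (assumed available)
    keep_ratio   : fraction of tokens to retain (0.02 ~= 50x compression)
    """
    seq_len = keys.shape[0]
    n_keep = max(1, int(seq_len * keep_ratio))
    # Indices of the n_keep highest-scoring tokens, kept in original order
    top = np.sort(np.argsort(attn_scores)[-n_keep:])
    return keys[top], values[top]

# Example: a 10,000-token cache compacted to ~200 entries (~50x)
rng = np.random.default_rng(0)
k = rng.standard_normal((10_000, 128)).astype(np.float32)
v = rng.standard_normal((10_000, 128)).astype(np.float32)
scores = rng.random(10_000)
k_small, v_small = compact_kv_cache(k, v, scores)
print(k_small.shape)  # (200, 128)
```

In a serving stack, a step like this would run periodically during decoding, so the cache, and with it GPU memory use, stays near the compacted size instead of growing with sequence length; that is the mechanism behind the cost and throughput gains the article describes.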