: It is significantly faster than second-order methods like SparseGPT because it only requires a single forward pass over a small calibration dataset to estimate activation norms.

Comparison with Other Pruning Methods

Wanda bridges the gap between simple magnitude-based pruning and complex optimization-based approaches:

| Method | Metric | Complexity | Retraining Required? |
| --- | --- | --- | --- |
| Magnitude Pruning | $\lvert W \rvert$ only | | |
| Wanda (Sun et al.) | $\lvert W \rvert \times \lVert X \rVert_2$ | | |
| SparseGPT | Second-order Hessian | High | No |
| Standard Fine-tuning | Gradients | Very High | Yes |

Technical Background

The core insight behind Wanda is that weight magnitude alone does not tell the full story of importance; a large weight that rarely receives significant activation might be less critical than a smaller weight that is frequently "fired". By multiplying the weight by the norm of its input activations, Wanda captures this dynamic importance.

For further technical exploration, you can find the original paper and associated research on
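The scoring rule described in the technical background can be illustrated with a small sketch. This is a minimal pure-Python illustration under stated assumptions, not the authors' implementation: the function names `wanda_scores` and `prune_per_output` are hypothetical, the activation norm is assumed to be the L2 norm taken over the calibration samples, and weights are assumed to be compared within each output row.

```python
import math


def wanda_scores(weights, activations):
    """Score each weight as |W[i][j]| * ||X[:, j]||_2.

    weights: list of rows, shape (out_features, in_features)
    activations: calibration samples, shape (n_samples, in_features)
    """
    in_features = len(weights[0])
    # L2 norm of each input feature across the calibration samples (assumed norm).
    norms = [
        math.sqrt(sum(sample[j] ** 2 for sample in activations))
        for j in range(in_features)
    ]
    return [[abs(w) * norms[j] for j, w in enumerate(row)] for row in weights]


def prune_per_output(weights, scores, sparsity):
    """Zero the lowest-scoring fraction of weights within each output row."""
    pruned = []
    for row, row_scores in zip(weights, scores):
        n_prune = int(len(row) * sparsity)
        # Indices of the n_prune smallest scores in this row.
        order = sorted(range(len(row)), key=lambda j: row_scores[j])
        drop = set(order[:n_prune])
        pruned.append([0.0 if j in drop else w for j, w in enumerate(row)])
    return pruned


# Toy data: input feature 0 carries tiny activations, feature 1 large ones.
W = [[4.0, 1.0], [0.5, 2.0]]
X = [[0.1, 3.0], [0.1, 4.0]]

scores = wanda_scores(W, X)
pruned = prune_per_output(W, scores, sparsity=0.5)
# pruned == [[0.0, 1.0], [0.0, 2.0]]
```

In the first row, magnitude pruning alone would keep the weight 4.0 and drop 1.0. Because the input to the larger weight is almost always near zero, the activation-weighted score reverses that choice.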
A collective gasp swept through Oakhaven. The sky turned a deep, bruising purple. For the first time in living memory, the air grew crisp. A breeze swept through, carrying the scent of rain from the distant mountains.
He squinted. He saw the gardens, overgrown and wild, fighting for space because there was no winter to kill the pests. He saw the faces: smooth, unlined, but lacking depth. He saw a town that hadn't changed in forty years.
People screamed, running for cover, hiding under awnings, clutching their children. They shivered in the sudden coolness.
"No," Wanda said, placing a hand on his wrench. Her touch was ice cold. "I built you into the system too, Elias. You are the failsafe. You are the one who has to let it go."
Hand trembling, he gave it to her.