Original title: SiGeo: Sub-One-Shot NAS via Information Theory and Geometry of Loss Landscape
Authors: Hua Zheng, Kuang-Hung Liu, Igor Fedorov, Xin Zhang, Wen-Yen Chen, Wei Wen
The article dives into improving how we design neural networks using Neural Architecture Search (NAS). Existing approaches fall into two camps: one-shot methods, which are reliable but require expensive supernet training, and zero-shot methods, which skip training entirely but aren't always dependable, especially for complex tasks like Recommender Systems (RecSys). To bridge this gap, the authors introduce a new "sub-one-shot" setting: a warm-up phase trains a supernet (a versatile network that contains many candidate architectures) on only a small fraction of the data, after which candidates are scored without further training. Their method, SiGeo, builds on a new theoretical framework that connects the degree of supernet warm-up to the effectiveness of the proxy, drawing on information theory and the geometry of the loss landscape. Their experiments show that SiGeo consistently outperforms other proxies on established NAS benchmarks and matches the results of the time-consuming one-shot methods while using far less computation, roughly 60% less.
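To make the workflow concrete, here is a minimal PyTorch sketch of the sub-one-shot idea: briefly warm up the supernet on a small slice of data, then score candidate architectures without any further training. The `proxy_score` function below is a hypothetical stand-in that combines minibatch loss and gradient norm; the actual SiGeo proxy is defined in the paper.

```python
import torch
import torch.nn as nn

def warm_up(supernet, loader, steps, lr=1e-3):
    """Sub-one-shot warm-up: train the supernet for only a handful of
    steps on a small fraction of the data (far less than full one-shot)."""
    opt = torch.optim.SGD(supernet.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()
    supernet.train()
    for step, (x, y) in enumerate(loader):
        if step >= steps:
            break
        opt.zero_grad()
        loss_fn(supernet(x), y).backward()
        opt.step()

def proxy_score(model, x, y):
    """Hypothetical training-free score computed after warm-up. It combines
    the minibatch loss with the gradient norm as a crude stand-in for the
    information-theoretic / loss-geometry terms of the real SiGeo proxy."""
    loss_fn = nn.CrossEntropyLoss()
    model.zero_grad()
    loss = loss_fn(model(x), y)
    loss.backward()
    sq_sum = sum((p.grad ** 2).sum() for p in model.parameters() if p.grad is not None)
    # Treat a larger gradient norm relative to the loss as a better candidate.
    return (sq_sum.sqrt() / (loss.detach() + 1e-8)).item()

# Usage: rank candidate subnetworks sampled from the warmed-up supernet, e.g.
# best = max(candidates, key=lambda net: proxy_score(net, x_batch, y_batch))
```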
Original article: https://arxiv.org/abs/2311.13169