Information dynamics plays a crucial role in complex systems, from cells to societies. Recent advances in statistical physics have made it possible to capture key network properties, such as flow diversity and signal speed, using entropy and free energy. However, large system sizes pose computational challenges. We use graph neural networks to identify suitable groups of components for coarse-graining a network and achieve a low computational complexity, suitable for practical application. Our approach preserves information flow even under significant compression, as shown through theoretical analysis and experiments on synthetic and empirical networks. We find that the model merges nodes with similar structural properties, suggesting they perform redundant roles in information transmission. This method enables low-complexity compression for extremely large networks, offering a multiscale perspective that preserves information flow in biological, social, and technological networks better than existing methods mostly focused on network structure.