Efficient AI Across Edge, Near-Edge, and Cloud
Overview
Modern applications like smart cameras, self-driving cars, and VR devices rely on powerful AI models. Running these models quickly and efficiently across phones, edge devices, and cloud servers is a tough challenge.
Our work develops two frameworks to make this possible:
- DONNA finds the best way to split and run AI models across different types of devices, from traditional CPUs and GPUs to new Compute-In-Memory (CIM) accelerators, so they use less energy while staying fast.
- HiDist takes the idea further by looking at the whole system: edge devices near the user, stronger near-edge servers, and powerful cloud machines. It decides where each part of the model should run to save energy and boost performance, instead of simply sending everything to the cloud.
Fig. 1 shows this idea in action: AI models are broken into pieces and distributed across device tiers, with the system automatically choosing the best balance between speed and efficiency.
Fig. 1: A typical multi-tier edge-to-cloud network with DNN offloading guided by a search-based optimal distribution strategy.
Inside DONNA: Approach & Key Results
The Distributed Optimized Neural Network Allocation (DONNA) framework introduces a smarter way to run AI models across different devices. Instead of pushing the entire model to a single machine, DONNA carefully splits the model into pieces and assigns each part to the device best suited for it, whether that’s a CPU, a GPU, or an emerging Compute-In-Memory (CIM) accelerator.
As Fig. 2 shows, DONNA uses a profiler to understand how fast and energy-efficient each device is. It then searches for the distribution strategy that best balances two goals at once: high throughput and low energy use. Unlike earlier approaches that optimized for only one of these goals, DONNA balances both through a user-controllable parameter, making it more efficient and adaptable across a variety of hardware setups.
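To make the search concrete, here is a minimal sketch of a profile-guided, energy-aware assignment search. Everything in it is an assumption for illustration: the per-layer latency and energy numbers are invented, `alpha` merely stands in for DONNA’s user-controllable parameter, and communication costs between devices are ignored. It is not DONNA’s actual implementation.

```python
from itertools import product

# Hypothetical per-layer profiles: device -> [(latency_ms, energy_mJ), ...].
# All numbers are invented; a real run would use measured profiles, and a
# real cost model would also account for inter-device communication.
profiles = {
    "cpu": [(4.0, 2.0), (6.0, 3.0), (5.0, 2.5)],
    "gpu": [(1.5, 5.0), (2.0, 7.0), (1.8, 6.0)],
    "cim": [(2.5, 0.8), (3.0, 1.0), (2.8, 0.9)],
}
NUM_LAYERS = 3

def cost(assignment, alpha):
    """Weighted objective: alpha near 1 favors speed, near 0 favors energy."""
    latency = sum(profiles[dev][i][0] for i, dev in enumerate(assignment))
    energy = sum(profiles[dev][i][1] for i, dev in enumerate(assignment))
    return alpha * latency + (1 - alpha) * energy

def best_assignment(alpha):
    """Exhaustive search over per-layer device choices (fine at toy scale)."""
    return min(product(profiles, repeat=NUM_LAYERS), key=lambda a: cost(a, alpha))

print(best_assignment(alpha=0.9))  # speed-leaning -> ('gpu', 'gpu', 'gpu')
print(best_assignment(alpha=0.1))  # energy-leaning -> ('cim', 'cim', 'cim')
```

Sweeping `alpha` from 0 to 1 traces out exactly the kind of throughput-versus-energy trade-off curve discussed below.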
Highlighted Results
DONNA consistently finds the sweet spot between speed and energy. As shown in Fig. 3, it produces smooth trade-offs, or “Pareto curves”, across devices and networks. With strong communication links and heterogeneous devices (Fig. 3a), DONNA maps out a clear Pareto curve, while under weaker links (Fig. 3c) it still spreads its optimization points across the trade-off space to reflect user preferences and maintain flexibility.
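For intuition about what those curves mean, the short sketch below extracts a Pareto front from a set of made-up (latency, energy) candidates; none of these numbers come from Fig. 3.

```python
# Each candidate strategy is a (latency_ms, energy_mJ) pair; a point sits on
# the Pareto front if no other point is at least as good on both axes.
def pareto_front(points):
    dominates = lambda q, p: q[0] <= p[0] and q[1] <= p[1] and q != p
    return sorted(p for p in points if not any(dominates(q, p) for q in points))

candidates = [(10, 3.0), (8, 4.5), (12, 2.0), (9, 4.0), (8, 5.0), (15, 2.5)]
print(pareto_front(candidates))
# -> [(8, 4.5), (9, 4.0), (10, 3.0), (12, 2.0)]: every kept point trades
#    some latency for energy; (8, 5.0) and (15, 2.5) are dominated.
```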
HiDist: Smarter Distribution Across Tiers
While DONNA showed how to balance throughput and energy, it relied on a single energy-aware parameter that treats all devices the same. In reality, devices behave very differently: some scale well with larger workloads and deliver big speedups (Fig. 4a), while others are far more energy-efficient for the same task (Fig. 4b).
Fig. 4: HiDist leverages heterogeneity and tiered resource availability across real-world edge environments.
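To make that contrast concrete, here is a toy model of the two behaviors sketched in Fig. 4; the throughput and energy numbers are invented, not measurements from the paper.

```python
# Toy model of device heterogeneity (all numbers invented).
def gpu_like_throughput(batch_size):
    """Scales well with workload: throughput grows until the device saturates."""
    return min(20.0 * batch_size, 160.0)   # images/s

def cim_like_throughput(batch_size):
    """Barely scales with batch size, but costs far less energy per image."""
    return 30.0                            # images/s regardless of batch

energy_per_image_j = {"gpu_like": 2.0, "cim_like": 0.4}  # invented

for batch in (1, 4, 16):
    print(f"batch={batch:2d}  gpu-like: {gpu_like_throughput(batch):6.1f} img/s"
          f"  cim-like: {cim_like_throughput(batch):5.1f} img/s")
# A single global knob cannot capture this: the right placement depends on
# both workload size and whether speed or energy matters more to the user.
```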
Highlighted Results
The HiDist paper is still a work in progress, but even in its current form HiDist clearly outperforms traditional “full offloading” strategies. As shown in Fig. 5, instead of pushing all work to the cloud or to near-edge servers, HiDist distributes it intelligently across tiers. This yields dramatic improvements: up to 7.7× higher throughput and 1.4× better energy efficiency compared to the best full-offload baseline.
What stands out is that HiDist doesn’t chase just one goal. It produces a balanced trade-off, a Pareto front, where users can benefit from both speed and efficiency, while full offloading is stuck with poor compromises. This shows HiDist’s potential as a next-generation framework for real-world AI systems.
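The sketch below gives a back-of-the-envelope sense of why splitting can beat full offloading, using a simple pipeline model with invented tier numbers and no communication overhead; it is an illustration, not HiDist’s actual algorithm.

```python
# Invented single-tier numbers: (images_per_second, joules_per_image) when
# the *entire* model runs on that tier.
tiers = {
    "edge":      (12.0, 0.6),
    "near_edge": (45.0, 1.1),
    "cloud":     (90.0, 2.4),
}

def full_offload(tier):
    """Baseline: run the whole model on a single tier."""
    return tiers[tier]

def tiered_split(shares):
    """Pipeline the model across tiers; `shares` maps tier -> fraction of the
    model it runs. A stage's rate scales inversely with its share; pipeline
    throughput is set by the slowest stage, while energy per image adds up."""
    active = {t: s for t, s in shares.items() if s > 0}
    throughput = min(tiers[t][0] / s for t, s in active.items())
    energy = sum(tiers[t][1] * s for t, s in active.items())
    return throughput, energy

print(full_offload("cloud"))   # (90.0, 2.4): fast but energy-hungry
print(tiered_split({"edge": 0.1, "near_edge": 0.35, "cloud": 0.55}))
# -> (120.0, 1.765): higher throughput AND lower energy than full cloud offload
```

Because the slowest stage limits throughput, giving the weakest tier only a small share keeps the pipeline balanced while every tier still contributes; that is the intuition behind the tiered Pareto front HiDist exposes.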
Watch Our Presentation
Copyright
The data and results presented in this work are protected by copyright and may only be used with proper citation. Any use of this work should reference the following papers: