SOAR: Minimizing Network Utilization Cost With Bounded In-Network Computing

Research output: Contribution to journal › Article › peer-review

Abstract

In-network computing via smart networking devices is a recent trend in modern datacenter networks. State-of-the-art switches with near line-rate computing and aggregation capabilities enable acceleration and improved resource utilization for modern applications such as large-scale distributed and federated machine learning, as well as big data analytics. This work studies how to deploy a bounded number of in-network computing devices while reducing the overall cost incurred by such a deployment. Such limitations on the number of in-network computing elements arise, e.g., in incremental upgrades of network infrastructure, and also stem from the need for specialized middleboxes or FPGAs to support heterogeneous workloads and multiple tenants. We present an efficient optimal algorithm for placing such devices in tree networks with arbitrary link rates, and evaluate its performance in various scenarios and for various tasks, including federated/distributed ML and big data analytics. Our results show that even a small fraction of network devices supporting in-network aggregation leads to a significant reduction in network utilization cost. Furthermore, we show that various intuitive placement strategies are significantly inferior to our solution across varying workloads, tasks, and link rates.
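To make the placement problem in the abstract concrete, the following is a minimal sketch under stated assumptions, not the paper's algorithm. It assumes a toy cost model in which every host sends messages toward the root of a tree, an aggregating node merges everything it receives into a single outgoing message, and the utilization cost is the number of messages on each link weighted by an assumed per-message link cost. The topology, the costs, and the names `PARENT`, `LINK_COST`, `LOAD`, `utilization_cost`, and `best_placement` are all hypothetical. The search is plain brute force over placements, which is only feasible for tiny trees, whereas the paper reports an efficient optimal algorithm.

```python
from itertools import combinations

# Hypothetical toy tree: child -> parent; "r" is the root (destination).
PARENT = {"s1": "r", "s2": "r", "h1": "s1", "h2": "s1", "h3": "s2", "h4": "s2"}
# Assumed per-message cost of the link from each node up to its parent
# (for example, inversely proportional to the link rate).
LINK_COST = {"s1": 2.0, "s2": 1.0, "h1": 1.0, "h2": 1.0, "h3": 1.0, "h4": 1.0}
# Assumed workload: number of messages originating at each host.
LOAD = {"h1": 1, "h2": 1, "h3": 1, "h4": 1}


def utilization_cost(aggregators):
    """Total weighted message count over all links for a given aggregator set."""
    children = {}
    for child, parent in PARENT.items():
        children.setdefault(parent, []).append(child)

    total = 0.0

    def messages_up(node):
        nonlocal total
        # Messages arriving at this node: its own load plus its subtree's traffic.
        incoming = LOAD.get(node, 0) + sum(
            messages_up(c) for c in children.get(node, [])
        )
        # Assumption: an aggregator merges all incoming messages into one.
        outgoing = min(incoming, 1) if node in aggregators else incoming
        if node in PARENT:  # the root has no uplink
            total += outgoing * LINK_COST[node]
        return outgoing

    messages_up("r")
    return total


def best_placement(candidates, k):
    """Brute-force search for the k-node aggregator set with minimum cost."""
    best = min(
        combinations(sorted(candidates), k),
        key=lambda placement: utilization_cost(set(placement)),
    )
    return set(best), utilization_cost(set(best))
```

In this toy instance, placing the single available aggregator on the switch behind the more expensive uplink (`s1`) gives the lower cost, which illustrates why link rates matter for the placement decision.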

Original language: English
Pages (from-to): 1832-1851
Number of pages: 20
Journal: IEEE Transactions on Network and Service Management
Volume: 21
Issue number: 2
State: Published - 1 Apr 2024

Keywords

  • In-network computing
  • big data analytics
  • data center networks
  • distributed machine learning
  • federated machine learning
  • minimum network utilization cost

ASJC Scopus subject areas

  • Computer Networks and Communications
  • Electrical and Electronic Engineering
