End-to-End Network Performance Monitoring for Dispersed Computing


With growing demand for computationally intensive applications that draw on widely distributed data sources, there is an increased need for dispersed computing systems that can schedule computation on cloud nodes across a network. A key challenge in dispersed computing is to characterize the end-to-end network performance of data transfers between compute points, and to measure it at run-time so that optimized computation scheduling decisions can be made. To address this challenge, we empirically study file transfer times between geographically dispersed cloud computing points using SCP (secure copy). We show, to our knowledge for the first time, that the end-to-end file transfer latency experienced by this widely used protocol is better modelled as having a quadratic dependency on the file size, rather than the simple linear dependency that would be expected if the network were treated as a bit pipe with a deterministic average bandwidth. We incorporate this observation into the design of a real-time network profiler for dispersed computing that determines best-fit quadratic regression parameters between each pair of nodes and reports them to the scheduler node. Our end-to-end quadratic network latency profiler has been released as a key part of an open-source dispersed computing profiler tool called DRUPE, and also as part of a DAG-based dispersed computing scheduling tool called CIRCE.
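To make the quadratic model concrete, the following is a minimal sketch of fitting transfer time t to file size s as t ≈ a·s² + b·s + c by ordinary least squares for a single node pair. It is an illustration only and is not taken from DRUPE or CIRCE: the function names `fit_quadratic` and `predict_time`, the units (megabytes and seconds), and the sample measurements are all assumptions made for this example.

```python
def fit_quadratic(sizes, times):
    """Least-squares fit of times ~ a*s**2 + b*s + c; returns (a, b, c)."""
    if len(sizes) != len(times) or len(sizes) < 3:
        raise ValueError("need at least three (size, time) pairs")
    # Each measurement contributes one design-matrix row [s^2, s, 1].
    rows = [(s * s, s, 1.0) for s in sizes]
    # Augmented 3x4 matrix of the normal equations (X^T X) p = X^T y.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(3)]
         + [sum(r[i] * t for r, t in zip(rows, times))]
         for i in range(3)]
    # Gauss-Jordan elimination with partial pivoting.
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("degenerate data: need three distinct sizes")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(3):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    a, b, c = (m[i][3] / m[i][i] for i in range(3))
    return a, b, c


def predict_time(params, size):
    """Predicted transfer time for a file of the given size."""
    a, b, c = params
    return a * size * size + b * size + c


# Hypothetical measurements for one node pair (made-up numbers, not real data):
# file sizes in MB and observed transfer times in seconds.
sizes_mb = [1.0, 2.0, 4.0, 8.0, 16.0, 32.0]
times_s = [0.6, 0.7, 0.9, 1.3, 2.3, 5.1]
params = fit_quadratic(sizes_mb, times_s)
estimate = predict_time(params, 10.0)  # estimated seconds for a 10 MB file
```

Under this sketch, a profiler would rerun the fit for each node pair as new transfer measurements arrive and report the three parameters to the scheduler, which could then estimate the transfer time for any file size without issuing a new probe.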

2018 International Conference on Computing, Networking and Communications (ICNC)