Python native workflows

Leverage Python multiprocessing on small and large distributed systems

Dragon is a composable and distributed runtime that enables users to create scalable, complex, and resilient HPC and AI applications, workflows, and services through standard Python interfaces. Dragon provides capabilities to address many of the challenges around programmability, memory management, transparency, and efficiency on distributed computing systems.

Features Dragon provides to address these challenges include portable and transparent programmability based on standard Python APIs, fine-grained process management, high-speed communication, telemetry tools, integration with Jupyter notebooks, compatibility with a range of existing Python apps and tools, and a high-performance distributed dictionary.

Build locally and then deploy globally

Develop scalable workflows with minimal code modification

Dragon implements nearly all of the Python multiprocessing API, allowing users to manage processes, high-level communication constructs (e.g., Queue, Connection, Semaphore, Barrier), and shared data (e.g., Value, Array, Dict) at supercomputer scale. The current Dragon v0.10 release scales to roughly 1,000 supercomputing nodes for several use cases.
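As a rough illustration of the kind of code this applies to, here is a minimal sketch of a Queue-based worker pattern written against the standard library only. The names `square_worker` and `run` are hypothetical and chosen for this example. Dragon's documentation describes enabling the runtime by importing `dragon` and selecting the "dragon" start method; those lines appear only as comments here and should be checked against the release you install.

```python
import multiprocessing as mp

# Assumption (verify against the Dragon docs for your release): to run this
# under Dragon you would add the two lines below before creating any
# processes, and start the script with Dragon's launcher instead of python.
#   import dragon
#   mp.set_start_method("dragon")


def square_worker(tasks, results):
    """Read integers from `tasks` until a None sentinel; emit (n, n*n) pairs."""
    for n in iter(tasks.get, None):
        results.put((n, n * n))


def run(values, num_workers=2):
    """Fan `values` out to worker processes and collect the squares in a dict."""
    values = list(values)
    tasks, results = mp.Queue(), mp.Queue()
    workers = [
        mp.Process(target=square_worker, args=(tasks, results))
        for _ in range(num_workers)
    ]
    for w in workers:
        w.start()
    for v in values:
        tasks.put(v)
    for _ in workers:
        tasks.put(None)  # one sentinel per worker so each loop terminates
    # Drain results before joining so no worker blocks on a full pipe.
    squares = dict(results.get() for _ in values)
    for w in workers:
        w.join()
    return squares


if __name__ == "__main__":
    print(sorted(run(range(5)).items()))
```

The worker only depends on the `get`/`put` interface of its queues, so the same function can be exercised on a laptop first and moved to a larger system later without restructuring.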

Composable workflows

Dragon is a distributed environment for developing high-performance tools, libraries, and applications at scale.

Dragon also extends the multiprocessing API to support sophisticated HPC/AI workflows. For example, users can orchestrate ensembles of MPI and model training processes using multiprocessing and Dragon-native interfaces.

Key features of the DragonHPC stack

Some highlights that make DragonHPC a valuable distributed runtime

Telemetry tools

Visualize what your apps are doing in real time.

Ease of use

Documentation and cookbook examples to get you up and running.

Composable versatility

Deploy Dragon as part of your current stack for added performance and functionality.

High-speed communication

Dragon is optimized for the HPE Cray Slingshot network, but also supports other networks, including InfiniBand and standard Ethernet.

Distributed dictionary

In-memory distributed dictionary for performance and versatility in managing data.

Portable

Deploy on a range of systems and with a variety of other workflow tools.

Release v0.10 out now on GitHub

Available now at https://github.com/DragonHPC/dragon

Join our Slack channel

Our next v0.11 release is planned for early 2025