EFA and OpenFabrics
This post clarifies how EFA (Elastic Fabric Adapter) and Open MPI are used together in multi-node inference, focusing on how the Matching Transport Layer (MTL), libfabric, and the OpenFabrics Interfaces (OFI) framework enable efficient, low-latency communication. Installing EFA for a node: Ref
Terms Involved
Matching Transport Layer (MTL)
The Matching Transport Layer (MTL) is the Open MPI component that handles two-sided tagged messages when Open MPI runs over libfabric. It is responsible for matching message tags so that each message is delivered to the correct receive. The layer works closely with the underlying network fabric, such as EFA, to provide efficient and reliable message passing between nodes in a high-performance computing (HPC) environment.
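To make tag matching concrete, here is a minimal sketch in plain Python. It is not Open MPI's implementation: `TagMatcher`, `Message`, and `PostedReceive` are hypothetical names, and in a real OFI-based stack the matching is delegated to libfabric's tagged-message interface rather than done in a Python list. The sketch only shows the idea of pairing arriving messages with posted receives by source and tag, with wildcards in the spirit of `MPI_ANY_SOURCE` and `MPI_ANY_TAG`.

```python
from dataclasses import dataclass
from typing import Optional

ANY_SOURCE = -1  # wildcard, in the spirit of MPI_ANY_SOURCE
ANY_TAG = -1     # wildcard, in the spirit of MPI_ANY_TAG


@dataclass
class Message:
    source: int
    tag: int
    payload: bytes


@dataclass
class PostedReceive:
    source: int
    tag: int
    result: Optional[Message] = None


class TagMatcher:
    """Toy matcher with two queues: posted receives and unexpected messages."""

    def __init__(self):
        self.posted = []      # receives waiting for a matching message
        self.unexpected = []  # messages that arrived before a matching receive

    @staticmethod
    def _matches(recv, msg):
        return (recv.source in (ANY_SOURCE, msg.source)
                and recv.tag in (ANY_TAG, msg.tag))

    def post_receive(self, source, tag):
        """Post a receive; complete it at once if a matching message is queued."""
        recv = PostedReceive(source, tag)
        for i, msg in enumerate(self.unexpected):  # oldest message first
            if self._matches(recv, msg):
                recv.result = self.unexpected.pop(i)
                return recv
        self.posted.append(recv)
        return recv

    def on_arrival(self, msg):
        """Hand an arriving message to the oldest matching posted receive."""
        for i, recv in enumerate(self.posted):
            if self._matches(recv, msg):
                self.posted.pop(i)
                recv.result = msg
                return recv
        self.unexpected.append(msg)  # nobody is waiting yet
        return None


matcher = TagMatcher()
r1 = matcher.post_receive(source=0, tag=7)
matcher.on_arrival(Message(1, 7, b"from rank 1"))  # no match, queued as unexpected
matcher.on_arrival(Message(0, 7, b"from rank 0"))  # completes r1
r2 = matcher.post_receive(ANY_SOURCE, 7)           # picks up the queued message
```

After these calls, `r1` holds the message from rank 0 and `r2` holds the earlier message from rank 1, even though that message arrived first. Matching is by source and tag, not by arrival order alone.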
EFA (Elastic Fabric Adapter)
EFA integrates with the libfabric API, which is part of the OpenFabrics Interfaces (OFI) framework. This integration lets applications bypass the operating system kernel and communicate directly with the network interface hardware, reducing overhead and enabling low-latency, high-throughput inter-node communication. This is critical for scaling HPC and machine learning applications on AWS.
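As a rough illustration of how these pieces are selected at launch time, the sketch below assembles an `mpirun` command line that asks Open MPI to use the `cm` PML with the `ofi` MTL and asks libfabric to use the `efa` provider through the `FI_PROVIDER` environment variable. The helper name `build_mpirun_command`, the host file `hosts.txt`, and the program `./inference_worker` are hypothetical. The exact flags a given cluster needs depend on the Open MPI and libfabric versions installed, so treat this as a starting point rather than a recipe.

```python
import shlex


def build_mpirun_command(num_procs, hostfile, program, program_args=()):
    """Return an argv list for launching `program` with Open MPI over EFA."""
    return [
        "mpirun",
        "-np", str(num_procs),
        "--hostfile", hostfile,
        "--mca", "pml", "cm",     # the cm PML delegates matching to an MTL
        "--mca", "mtl", "ofi",    # the ofi MTL sits on top of libfabric
        "-x", "FI_PROVIDER=efa",  # export the libfabric provider choice to all ranks
        program,
        *program_args,
    ]


# Hypothetical job: 16 ranks described by hosts.txt running ./inference_worker.
cmd = build_mpirun_command(16, "hosts.txt", "./inference_worker")
print(shlex.join(cmd))
```

The function only builds the argument list; passing it to `subprocess.run(cmd)` would launch the job on a machine where Open MPI and the EFA software stack are installed.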
MPI (Message Passing Interface)
A standardized message-passing API for parallel programming in distributed computing environments, particularly HPC clusters. It gives processes a standard way to communicate with each other across the nodes of a cluster and supports both point-to-point and collective communication.
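The difference between point-to-point and collective communication can be sketched without an MPI installation. The toy below uses threads as stand-ins for ranks and queues as stand-ins for the network. `ToyComm` and its methods are hypothetical names, not an MPI binding. It builds a sum all-reduce (a collective, in the spirit of `MPI_Allreduce`) out of plain sends and receives (point-to-point, in the spirit of `MPI_Send` and `MPI_Recv`). Real MPI libraries use far more efficient algorithms.

```python
import queue
import threading


class ToyComm:
    """In-process stand-in for an MPI communicator; threads play the ranks."""

    def __init__(self, size):
        self.size = size
        self._inbox = [queue.Queue() for _ in range(size)]

    def send(self, value, dest):
        """Point-to-point send: deliver `value` to rank `dest`."""
        self._inbox[dest].put(value)

    def recv(self, rank):
        """Point-to-point receive: block until rank `rank` has a message."""
        return self._inbox[rank].get()

    def allreduce_sum(self, rank, value):
        """Collective: reduce all values to rank 0, then broadcast the total."""
        if rank == 0:
            total = value
            for _ in range(self.size - 1):
                total += self.recv(0)
            for dest in range(1, self.size):
                self.send(total, dest)
            return total
        self.send(value, 0)
        return self.recv(rank)


def run(size):
    comm = ToyComm(size)
    results = [None] * size

    def worker(rank):
        # Each rank contributes rank + 1 and receives the global sum.
        results[rank] = comm.allreduce_sum(rank, rank + 1)

    threads = [threading.Thread(target=worker, args=(r,)) for r in range(size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


results = run(4)
```

With four ranks contributing 1, 2, 3, and 4, every rank ends up holding the same total, which is the defining property of an all-reduce.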
OFED (OpenFabrics Enterprise Distribution)
A set of open-source software components that enable high-performance networking on clusters, especially those using InfiniBand and other high-performance fabrics. It provides the drivers, libraries, and tools needed for low-latency, high-bandwidth communication between nodes in a cluster. OFED is commonly used in environments where RDMA (Remote Direct Memory Access) and InfiniBand technologies are deployed, facilitating direct memory access and efficient data transfer.
Libfabric
A low-level communication library designed to abstract hardware-specific communication protocols. It provides a unified API for building high-performance network communication systems and is often used for RDMA, shared memory, and other communication technologies. Libfabric allows applications to use different network fabrics (such as InfiniBand, iWARP, and RoCE) without being tightly coupled to a particular hardware implementation, making it adaptable to a wide range of cluster environments.
RDMA (Remote Direct Memory Access)
Allows high-speed data transfer between nodes in a cluster without involving the remote node's CPU or operating system in the data path, which significantly reduces latency and CPU utilization. Because the network adapter reads and writes the memory of a remote node directly, RDMA moves data faster than traditional networking methods, making it well suited to applications that exchange large amounts of data between nodes with minimal overhead. RDMA is supported by technologies like InfiniBand and RoCE (RDMA over Converged Ethernet), and is critical in HPC, machine learning, and cloud computing environments.
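The one-sided nature of RDMA can be illustrated with a toy model. In the sketch below, `ToyRdmaNic` is a hypothetical class, not a real verbs or libfabric API. The target registers a buffer once and shares a remote key. After that, the initiator writes into the buffer by presenting the key, and no receive call runs on the target side. Real RDMA hardware does this over the network with its own registration and protection mechanisms; the sketch only mirrors the shape of the interaction.

```python
class ToyRdmaNic:
    """Toy model of a NIC that serves one-sided writes into registered memory."""

    def __init__(self):
        self._regions = {}  # remote key -> writable view of registered memory
        self._next_key = 1

    def register(self, buf):
        """Register a buffer and return the remote key a peer must present."""
        key = self._next_key
        self._next_key += 1
        self._regions[key] = memoryview(buf)
        return key

    def remote_write(self, rkey, offset, data):
        """Place `data` into the registered region; the target posts no receive."""
        region = self._regions[rkey]  # KeyError if the key was never registered
        end = offset + len(data)
        if offset < 0 or end > len(region):
            raise ValueError("write falls outside the registered region")
        region[offset:end] = data


# Target side: register memory once and hand the key to the peer.
target_memory = bytearray(16)
nic = ToyRdmaNic()
rkey = nic.register(target_memory)

# Initiator side: write straight into the target's buffer using the key.
nic.remote_write(rkey, 4, b"tensor")
```

The target's application code never runs during the write; it simply finds the bytes in `target_memory` afterwards, which is the property that keeps CPU utilization low.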
OpenFabrics Interfaces (OFI)
OpenFabrics Interfaces (OFI) is a framework designed to expose communication services to middleware and applications, particularly in high-performance computing (HPC) environments. Here are the key aspects of OFI:
Purpose and Design
OFI is specifically designed to meet the performance and scalability requirements of HPC applications such as Message Passing Interface (MPI) libraries, Symmetric Hierarchical Memory Access (SHMEM) libraries, Partitioned Global Address Space (PGAS) programming models, Database Management Systems (DBMS), and enterprise applications running in tightly coupled network environments. Its design aligns fabric services with application needs, providing a tight semantic fit between applications and the underlying fabric hardware. This reduces software overhead and improves efficiency when transmitting or receiving data over a fabric.
Components
- Libfabric: The primary implementation of OFI is the libfabric library, which defines and exports the user-space API of OFI. Libfabric is designed to be independent of the underlying network protocols and the specific implementation of networking devices, making it versatile and widely applicable.
- Provider Libraries: These libraries interface with the hardware and provide the necessary services to the applications through libfabric.
- Kernel Services and Daemons: These components support the user-space libraries and manage the communication between the application and the hardware.
- Test applications: These are used to validate and benchmark the OFI framework.
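The split between a common API and interchangeable providers can be sketched as follows. This is a toy model in Python, not libfabric's C API: `Provider`, `ToyEfaProvider`, `ToyTcpProvider`, `select_provider`, and `app_send` are hypothetical names, with `efa` and `tcp` borrowed as provider names for illustration. The point is that application code is written once against the interface, and the choice of provider happens separately.

```python
from abc import ABC, abstractmethod


class Provider(ABC):
    """Interface every toy provider implements; the application sees only this."""

    name = "base"

    @abstractmethod
    def available(self):
        """Return True if this provider can be used on the current node."""

    @abstractmethod
    def send(self, dest, data):
        """Send `data` to `dest` and return a short description of what happened."""


class ToyEfaProvider(Provider):
    name = "efa"

    def __init__(self, device_present):
        self._device_present = device_present

    def available(self):
        return self._device_present

    def send(self, dest, data):
        return f"efa -> {dest}: {len(data)} bytes"


class ToyTcpProvider(Provider):
    name = "tcp"

    def available(self):
        return True  # assume sockets always work in this toy model

    def send(self, dest, data):
        return f"tcp -> {dest}: {len(data)} bytes"


def select_provider(providers, preferred=None):
    """Return the first available provider, optionally restricted by name."""
    for provider in providers:
        if not provider.available():
            continue
        if preferred is None or provider.name == preferred:
            return provider
    raise RuntimeError("no usable provider found")


def app_send(provider, dest, data):
    """Application code: identical regardless of which provider was chosen."""
    return provider.send(dest, data)


with_efa = select_provider([ToyEfaProvider(True), ToyTcpProvider()])
without_efa = select_provider([ToyEfaProvider(False), ToyTcpProvider()])
message = app_send(without_efa, "node-2", b"hello")
```

On a node without an EFA device the same `app_send` call falls back to the `tcp` toy provider, which is the kind of portability the provider model is meant to give.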
References
If you found this useful, please cite this post using
Senthilkumar Gopal. (Dec 2023). EFA and OpenFabrics. sengopal.me. https://sengopal.me/posts/efa-and-openfabrics
or
@article{gopal2023efaandopenfabrics,
  title = {EFA and OpenFabrics},
  author = {Senthilkumar Gopal},
  journal = {sengopal.me},
  year = {2023},
  month = {Dec},
  url = {https://sengopal.me/posts/efa-and-openfabrics}
}