

# Many-Task Computing on the Intel Xeon Phi Coprocessor

Jeffrey Johnson, Department of Electrical Engineering, Illinois Institute of Technology, johnson@hawk.iit.edu

Scott Krieder, Benjamin Grimmer, Dustin Shahidehpour, Ioan Raicu, Department of Computer Science, Illinois Institute of Technology

# Abstract

This work provides an in-depth understanding of Many-Task Computing (MTC) on the Intel Xeon Phi and presents preliminary results from running MTC workloads on pre-production Intel Xeon Phi hardware. By using Intel's Symmetric Communications Interface (SCIF) protocol for communication across the PCI-Express bus, we achieve over 90% efficiency. These results match or outperform OpenMP when offloading tasks longer than 300 µs. They demonstrate that a framework for executing heterogeneous tasks on the Intel Xeon Phi can be developed further.
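To make the relationship between task length and efficiency concrete, the sketch below uses a simple assumed model in which each task pays a fixed dispatch overhead, so that efficiency is useful time divided by total time. The model, the function names, and the overhead values it produces are illustrative assumptions and not measurements from this work.

```python
def efficiency(task_us: float, overhead_us: float) -> float:
    """Fraction of time spent on useful work for one task (assumed model)."""
    return task_us / (task_us + overhead_us)


def max_overhead_us(task_us: float, target: float) -> float:
    """Largest fixed per-task overhead that still meets the target efficiency."""
    return task_us * (1.0 - target) / target


if __name__ == "__main__":
    # Overhead budget implied by 90% efficiency at a 300 us task length.
    budget = max_overhead_us(300.0, 0.90)
    print(f"overhead budget: {budget:.1f} us")

    # With that fixed overhead, shorter tasks fall below 90% and longer ones exceed it.
    for task_us in (50.0, 300.0, 1000.0):
        print(f"{task_us:7.1f} us task -> {efficiency(task_us, budget):.3f}")
```

Under this assumed model, the 300 µs threshold corresponds to a per-task overhead budget of roughly 33 µs, which is why avoiding code offload and thread creation on every task matters for short tasks.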

## What is the Xeon Phi?



The Intel Xeon Phi is a PCI-Express expansion card comprising 60 physical cores that support 240 hardware threads and deliver up to 1 teraflop of double-precision floating-point performance in a single accelerator. Production versions are available for \$2,000-\$2,700, depending on memory bandwidth.
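The 1 teraflop figure can be sanity-checked with a short peak-throughput calculation. The sketch below assumes 512-bit vector units with fused multiply-add and a clock of about 1.05 GHz. The clock is an assumption, since it is not stated here and pre-production parts may differ.

```python
def peak_dp_gflops(cores: int, clock_ghz: float,
                   simd_bits: int = 512, fma: bool = True) -> float:
    """Theoretical peak double-precision GFLOPS for a vector processor."""
    lanes = simd_bits // 64                      # 64-bit doubles per vector register
    flops_per_cycle = lanes * (2 if fma else 1)  # FMA counts as two operations
    return cores * clock_ghz * flops_per_cycle


if __name__ == "__main__":
    cores, threads = 60, 240
    assumed_clock_ghz = 1.053  # assumption: not given in the text
    print("hardware threads per core:", threads // cores)
    print(f"peak: {peak_dp_gflops(cores, assumed_clock_ghz):.2f} GFLOPS")
```

With these assumed parameters each core contributes 16 double-precision operations per cycle, and 60 cores land just above 1,000 GFLOPS, consistent with the "up to 1 teraflop" figure.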

# Micro benchmarks



## **SCIF Framework**



Our framework uses SCIF to offload work to predefined microkernels on the Intel Xeon Phi, avoiding the overhead of code offload and thread creation while providing the same interface as GeMTC.
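The dispatch pattern described here can be illustrated with a host-only simulation. In the sketch below, worker threads are created once and receive small task messages naming a predefined microkernel, so no code is shipped and no thread is spawned per task. It does not use SCIF: a Python queue stands in for the SCIF endpoint, and `MicrokernelPool`, `push`, `poll`, and the kernel table are hypothetical names chosen for illustration, not the framework's or GeMTC's actual API.

```python
import queue
import threading

# Hypothetical microkernel table: a task message carries only a kernel ID
# and its arguments, never the code itself.
MICROKERNELS = {
    0: lambda n: sum(range(n)),          # toy compute kernel
    1: lambda xs: [x * x for x in xs],   # toy vector-square kernel
}


class MicrokernelPool:
    """Persistent workers fed by a queue (a stand-in for a SCIF endpoint)."""

    def __init__(self, num_workers: int = 4):
        self._tasks = queue.Queue()
        self._results = queue.Queue()
        self._workers = [
            threading.Thread(target=self._run, daemon=True)
            for _ in range(num_workers)
        ]
        for worker in self._workers:
            worker.start()  # threads are created once, not per task

    def _run(self):
        while True:
            msg = self._tasks.get()
            if msg is None:  # shutdown sentinel
                break
            task_id, kernel_id, args = msg
            self._results.put((task_id, MICROKERNELS[kernel_id](*args)))

    def push(self, task_id, kernel_id, *args):
        """Enqueue a task descriptor for any idle worker."""
        self._tasks.put((task_id, kernel_id, args))

    def poll(self, timeout=None):
        """Block until one (task_id, result) pair is available."""
        return self._results.get(timeout=timeout)

    def shutdown(self):
        for _ in self._workers:
            self._tasks.put(None)
        for worker in self._workers:
            worker.join()


if __name__ == "__main__":
    pool = MicrokernelPool(num_workers=4)
    pool.push(1, 0, 10)
    pool.push(2, 1, [1, 2, 3])
    results = dict(pool.poll(timeout=5) for _ in range(2))
    pool.shutdown()
    print(results)
```

Results may arrive in any order, so callers match them to submissions by task ID. In a real deployment the queue operations would be replaced by messages over the PCI-Express bus, which is where the per-task overhead is incurred.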



## **Alternative Solutions**

#### **Programming Models**

- OpenMP
- OpenCL
- Native Xeon Phi Applications
- Intel Math Kernel Library

### In Production

These cards have already been deployed in HPC machines such as TACC's Stampede, which hosts over 6,400 Xeon Phi accelerators totaling over 7 petaflops of double-precision performance.

## Conclusions

The Xeon Phi is an extremely promising new product in the world of accelerators. Its familiar programming model and libraries make it an excellent choice for HPC, and it may complement or replace graphics-card-based systems.

## **Future Work**

- Compute Intensive Workloads
- Multi-card single-node support
- Swift/T Integration
- Optimize for data transfer "sweet spot"

## References

- Xeon Phi - http://www.intel.com/content/www/us/en/high-performance-computing/high-performance-xeon-phi-coprocessor-brief.html
- Stampede - http://www.tacc.utexas.edu/resources/hpc/stampede
- GeMTC - http://datasys.cs.iit.edu/projects/GeMTC/