Summary: SIMD Optimization for ARM and RISC-V Vector Extensions (arxiv.org)
4,695 words - PDF document
One Line
Migrating ARM NEON Intrinsics code to RISC-V Vector Extensions with an RVV-enhanced SIMDe yields speedups of 1.51x to 5.13x over the original SIMDe on functions from Google's XNNPACK library.
Key Points
- The paper discusses the migration of performance-critical legacy code from ARM NEON Intrinsics to RISC-V Vector Extensions (RVV).
- The authors propose the use of the open-source tool "SIMD Everywhere" (SIMDe) to automate the migration process.
- They enhance SIMDe to enable the conversion of ARM NEON Intrinsics types and functions to their corresponding RVV Intrinsics types and functions.
- The enhanced SIMDe achieves speedups ranging from 1.51x to 5.13x over the original SIMDe.
- The authors validate their implementation with unit tests within SIMDe and the Spike simulator.
- They benchmark the implementation on 10 commonly used neural network functions from Google's XNNPACK library, converted from NEON Intrinsics to RVV Intrinsics with SIMDe.
- Performance is measured by dynamic instruction count, and the RVV-enhanced SIMDe outperforms the original SIMDe on every tested function.
- This migration strategy has the potential to drive significant enhancements across a range of applications in the Android ecosystem.
Summaries
22 word summary
ARM NEON Intrinsics code is migrated to RISC-V Vector Extensions using SIMDe, achieving speedups of 1.51x to 5.13x over the original SIMDe on Google's XNNPACK library.
77 word summary
This paper explores the migration of ARM NEON Intrinsics code to RISC-V Vector Extensions (RVV) using the open-source tool "SIMD Everywhere" (SIMDe). The authors enhance SIMDe to convert ARM NEON Intrinsics types and functions to their RVV counterparts. In experiments with the Google XNNPACK library, they achieve speedups ranging from 1.51x to 5.13x over the original SIMDe. The paper provides background information, migration strategies, and experimental results, highlighting the performance gains and potential benefits for Android applications.
152 word summary
This paper discusses the migration of performance-critical legacy code from ARM NEON Intrinsics to RISC-V Vector Extensions (RVV). The authors propose using the open-source tool "SIMD Everywhere" (SIMDe) to automate the migration process. They enhance SIMDe to enable the conversion of ARM NEON Intrinsics types and functions to their corresponding RVV Intrinsics types and functions. The authors evaluate their enhanced SIMDe with the Google XNNPACK library and find that it achieves speedups ranging from 1.51x to 5.13x over the original SIMDe. The paper provides background information on Arm NEON and RISC-V Vector Extensions, details strategies for migrating ARM NEON Intrinsics to RISC-V Vector Extensions, explains how to use SIMDe for code porting, and presents experimental results comparing the original SIMDe with the RVV-enhanced SIMDe. The authors successfully automate the migration process, improving performance, and the approach could benefit a range of applications in the Android ecosystem.
390 word summary
This paper discusses the migration of performance-critical legacy code from ARM NEON Intrinsics to RISC-V Vector Extensions (RVV). Many libraries, such as OpenCV, FFmpeg, XNNPACK, and Eigen, use Arm or x86 SIMD Intrinsics to optimize performance. With the emergence of RVV, these libraries and legacy codebases need to be migrated to perform well on RISC-V platforms.
To automate the migration process, the authors propose using the open-source tool "SIMD Everywhere" (SIMDe). They enhance SIMDe to enable the conversion of ARM NEON Intrinsics types and functions to their corresponding RVV Intrinsics types and functions. They devise strategies for converting NEON Intrinsics types to RVV Intrinsics types that account for RVV's vector-length agnostic (VLA) architecture, and they develop customized conversions for each function based on analysis of the generated RVV code.
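To make the type-and-function conversion concrete, here is a minimal C sketch of the dispatch pattern a SIMDe-style wrapper can follow for NEON's `vaddq_f32`: a fixed 4-lane type, an RVV path selected at compile time, and a portable scalar fallback. The names `my_float32x4_t` and `my_vaddq_f32` are hypothetical and are not SIMDe's actual API; the RVV path assumes a compiler that provides the `__riscv_`-prefixed RVV intrinsics and a minimum VLEN of 128 bits.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__riscv_vector) && defined(__riscv_v_intrinsic) && __riscv_v_min_vlen >= 128
#include <riscv_vector.h>
#define MY_HAVE_RVV 1
#endif

/* Hypothetical stand-in for NEON float32x4_t: always 4 lanes (128 bits). */
typedef struct {
    float values[4];
} my_float32x4_t;

/* Hypothetical equivalent of NEON vaddq_f32: lane-wise a + b. */
static my_float32x4_t my_vaddq_f32(my_float32x4_t a, my_float32x4_t b) {
    my_float32x4_t r;
#if defined(MY_HAVE_RVV)
    /* RVV path: with VLEN >= 128, four 32-bit lanes fit in one LMUL=1
       register. vl is fixed at 4 to mirror the fixed-width NEON vector. */
    const size_t vl = 4;
    vfloat32m1_t va = __riscv_vle32_v_f32m1(a.values, vl);
    vfloat32m1_t vb = __riscv_vle32_v_f32m1(b.values, vl);
    __riscv_vse32_v_f32m1(r.values, __riscv_vfadd_vv_f32m1(va, vb, vl), vl);
#else
    /* Portable fallback: a plain loop the compiler may or may not vectorize. */
    for (size_t i = 0; i < 4; i++) {
        r.values[i] = a.values[i] + b.values[i];
    }
#endif
    return r;
}
```

On a non-RISC-V host the fallback loop runs; compiling for a target with the V extension enabled selects the intrinsic path. Fixing `vl` at 4 mirrors NEON's fixed width rather than exploiting RVV's vector-length agnosticism; it is the simplest mapping, not necessarily the one the authors chose.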
The authors conduct experiments with the Google XNNPACK library to evaluate the performance of their enhanced SIMDe. Compared with the original SIMDe, the enhanced SIMDe achieves speedups ranging from 1.51x to 5.13x.
The remainder of the paper is organized as follows. In Section 2, the authors provide background information on Arm NEON and RISC-V Vector Extensions. In Section 3.1, they introduce the SIMD Everywhere design pattern for intrinsic function and type conversion. They detail their strategies for leveraging SIMD Everywhere to migrate ARM NEON Intrinsics to RISC-V Vector Extensions in Sections 3.2 and 3.3. They explain how to use SIMDe for code porting in Section 3.4. Finally, in Section 4, they present the experimental results, comparing the original (native) SIMDe with their RVV-enhanced SIMDe.
The authors validate their implementation with unit tests within SIMDe and the Spike simulator. They then benchmark it with XNNPACK: they choose 10 commonly used neural network computation functions implemented with NEON Intrinsics in XNNPACK and convert them to RVV Intrinsics using SIMDe. The performance of the converted code is measured by dynamic instruction count.
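Assuming speedup is defined as the ratio of dynamic instruction counts (original SIMDe divided by RVV-enhanced SIMDe), the metric can be sketched as below. The function name is hypothetical, and the counts in the usage note are illustrative, not measurements from the paper.

```c
#include <assert.h>
#include <stdint.h>

/* Speedup of the RVV-enhanced build over the original build, using dynamic
   instruction count as the cost metric: fewer executed instructions for the
   same work gives a larger speedup. */
static double speedup_from_insn_counts(uint64_t original_count, uint64_t enhanced_count) {
    assert(enhanced_count > 0);
    return (double)original_count / (double)enhanced_count;
}
```

For example, a kernel that executes 300 instructions under the original SIMDe and 200 under the enhanced version would show a 1.5x speedup. Instruction count is a simulator-friendly proxy; it does not capture memory stalls or per-instruction latency differences.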
The experimental results show that the RVV-enhanced SIMDe achieves significant speedup compared to the original SIMDe, ranging from 1.51x to 5.13x across the tested functions.
In conclusion, the authors successfully automate the migration process from ARM NEON to RISC-V Vector Extensions using SIMDe, achieving improved performance. This migration strategy has the potential to drive significant enhancements across a range of applications in the Android ecosystem.
454 word summary
SIMD Everywhere Optimization from ARM NEON to RISC-V Vector Extensions
The paper discusses the migration of performance-critical legacy code from ARM NEON Intrinsics to RISC-V Vector Extensions (RVV). Many libraries, such as OpenCV, FFmpeg, XNNPACK, and Eigen, use Arm or x86 SIMD Intrinsics to optimize performance. With the emergence of RVV, these libraries and legacy codebases need to be migrated to perform well on RISC-V platforms. The migration currently requires manual rewriting, which is time-consuming and error-prone.
To address this issue, the authors propose using the open-source tool "SIMD Everywhere" (SIMDe) to automate the migration process. They enhance SIMDe to enable the conversion of ARM NEON Intrinsics types and functions to their corresponding RVV Intrinsics types and functions. For type conversion, they devise strategies to convert NEON Intrinsics types to RVV Intrinsics types that account for RVV's vector-length agnostic (VLA) architecture. They also analyze the conversion methods commonly used in SIMDe and develop customized conversions for each function based on analysis of the generated RVV code.
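One way to see why VLA matters for type conversion: a NEON Q register always holds 128 bits, whereas an RVV register group holds VLMAX = (VLEN / SEW) * LMUL elements, where VLEN is fixed by the hardware implementation rather than by the ISA. The sketch below (hypothetical helper names, modeling only integer LMUL) checks whether a 128-bit NEON vector fits in one RVV register group; it illustrates the constraint and is not the authors' conversion code.

```c
#include <assert.h>
#include <stdbool.h>

/* VLMAX for one RVV register group: (VLEN / SEW) * LMUL.
   Only integer LMUL values (1, 2, 4, 8) are modeled here. */
static unsigned rvv_vlmax(unsigned vlen_bits, unsigned sew_bits, unsigned lmul) {
    assert(sew_bits != 0 && vlen_bits % sew_bits == 0);
    return (vlen_bits / sew_bits) * lmul;
}

/* True if a 128-bit NEON vector with SEW-bit lanes (e.g. float32x4_t has
   4 lanes of 32 bits) fits in one RVV register group of the given LMUL. */
static bool neon_q_fits(unsigned vlen_bits, unsigned sew_bits, unsigned lmul) {
    unsigned neon_lanes = 128 / sew_bits;
    return rvv_vlmax(vlen_bits, sew_bits, lmul) >= neon_lanes;
}
```

With VLEN = 128, the minimum the full V extension guarantees, an LMUL=1 register holds exactly one NEON Q vector; smaller embedded configurations (Zve*) would need a larger LMUL, and wider implementations leave lanes unused unless the code is written vector-length agnostic.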
The authors conduct experiments with the Google XNNPACK library to evaluate the performance of their enhanced SIMDe. They compare it with the original SIMDe, which does not use customized RVV implementations for the conversions. The enhanced SIMDe achieves speedups ranging from 1.51x to 5.13x over the original SIMDe.
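As an illustration of what a customized RVV implementation can offer over a generic fallback, consider a horizontal reduction such as NEON's `vaddvq_f32` (sum of all four lanes). The sketch below is hypothetical (`my_vaddvq_f32` is not a SIMDe function) and assumes a compiler that provides the `__riscv_`-prefixed RVV intrinsics and a minimum VLEN of 128 bits; on other targets the scalar fallback is compiled.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__riscv_vector) && defined(__riscv_v_intrinsic) && __riscv_v_min_vlen >= 128
#include <riscv_vector.h>
#define MY_HAVE_RVV 1
#endif

/* Hypothetical equivalent of NEON vaddvq_f32: sum of the 4 lanes of a
   128-bit float vector. */
static float my_vaddvq_f32(const float a[4]) {
#if defined(MY_HAVE_RVV)
    /* Customized RVV path: one unordered-sum reduction instruction replaces
       the lane-by-lane adds. Element 0 of `zero` seeds the accumulator. */
    const size_t vl = 4;
    vfloat32m1_t v = __riscv_vle32_v_f32m1(a, vl);
    vfloat32m1_t zero = __riscv_vfmv_v_f_f32m1(0.0f, vl);
    vfloat32m1_t sum = __riscv_vfredusum_vs_f32m1_f32m1(v, zero, vl);
    return __riscv_vfmv_f_s_f32m1_f32(sum);
#else
    /* Generic fallback: pairwise scalar adds. */
    return (a[0] + a[1]) + (a[2] + a[3]);
#endif
}
```

Because `vfredusum` leaves the summation order unspecified, inputs whose partial sums are not exactly representable may round differently from the pairwise fallback, so a unit test for such a conversion should compare with a tolerance rather than bit-exactly.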
The remainder of the paper is organized as follows. In Section 2, the authors provide background information on Arm NEON and RISC-V Vector Extensions. In Section 3.1, they introduce the SIMD Everywhere design pattern for intrinsic function and type conversion. They detail their strategies for leveraging SIMD Everywhere to migrate ARM NEON Intrinsics to RISC-V Vector Extensions in Sections 3.2 and 3.3. They explain how to use SIMDe for code porting in Section 3.4. Finally, in Section 4, they present the experimental results, comparing the original (native) SIMDe with their RVV-enhanced SIMDe.
The authors validate their implementation with unit tests within SIMDe and the Spike simulator. They then benchmark it with XNNPACK: they choose 10 commonly used neural network computation functions implemented with NEON Intrinsics in XNNPACK and convert them to RVV Intrinsics using SIMDe. The performance of the converted code is measured by dynamic instruction count.
The experimental results show that the RVV-enhanced SIMDe achieves significant speedup compared to the original SIMDe. The speedup ranges from 1.51x to 5.13x across the tested functions.
In conclusion, the authors successfully automate the migration process from ARM NEON to RISC-V Vector Extensions using SIMDe. Their enhanced SIMDe achieves improved performance compared to the original SIMDe when converting NEON code to RVV code. This migration strategy has the potential to drive significant enhancements across a range of applications in the Android ecosystem.