ARM Cortex-A processors and the GCC command line


The GNU Compiler Collection (GCC) command line options for ARM processors were designed many years ago, when the list of available processors and variants was much shorter than it is today. As the ARM architecture has evolved, the options needed to get the best code out of GCC have changed, but every effort has been made to ensure that existing options keep their original meaning. As a result, the options needed to make the most efficient use of an ARM Cortex-A processor are now quite complex. This blog post covers three aspects of the GCC command line options: the CPU, floating point, and SIMD (single instruction, multiple data) acceleration.

What options should I use for my CPU?

First, let's take a look at the main options for telling the compiler which CPU you are using; later, we'll discuss some of the more advanced options available in special cases.

Whenever you compile a file, the compiler needs to know which type of CPU will run the generated code. The primary option for this purpose is -mcpu=<cpu-name>. As you might expect, <cpu-name> is replaced with the specific name of your CPU type, written in lowercase. For example, for Cortex-A9 the option is -mcpu=cortex-a9. GCC currently supports all Cortex-A processors up to and including Cortex-A15, namely:

Cortex-A5 -mcpu=cortex-a5

Cortex-A7 -mcpu=cortex-a7

Cortex-A8 -mcpu=cortex-a8

Cortex-A9 -mcpu=cortex-a9

Cortex-A15 -mcpu=cortex-a15

If your GCC does not recognize one of these processors, its version may be too old and you should consider upgrading. If you don't specify which CPU to use, GCC falls back to its built-in default. That default depends on how the compiler was originally built, and it may mean that the generated code runs very slowly (or not at all) on the CPU you have.

Add floating point and SIMD

Currently, all ARM Cortex-A processors on the market are equipped with a floating-point unit, and most also have a SIMD unit (commonly known as NEON) that implements the ARM Advanced SIMD extension. However, the exact instruction set available depends on the processor you have, and GCC requires a separate option to control it; it does not try to work this out from the -mcpu option. The selection of floating-point and SIMD instructions is controlled by the -mfpu option, and the following table lists the recommended choices for various CPUs:

VFPv3 and VFPv4 implementations normally have 32 double-precision registers, but when NEON is not present the upper 16 registers are optional; this is indicated by the d16 part of the option name. The fp16 part of the name indicates that half-precision (16-bit) floating-point load, store, and conversion instructions are available; this is an optional extension to VFPv3, but it is present in all VFPv4 implementations.

For historical reasons, GCC only uses these instructions when it is explicitly told that floating-point and NEON instructions can be used safely. The options that control this are somewhat confusing, partly because one of them can also change the ABI that the compiler follows. The -mfloat-abi option takes three possible values:

-mfloat-abi=soft -- Ignore all FPU and NEON instructions, use only the core register set, and use library calls to emulate all floating-point operations.

-mfloat-abi=softfp -- Use the same calling conventions as -mfloat-abi=soft, but use floating-point and NEON instructions where applicable. This option is binary compatible with -mfloat-abi=soft, so it can be used to improve the performance of code that must conform to a soft-float environment but is known to run where the relevant hardware instructions are available.

-mfloat-abi=hard -- Use floating-point and NEON instructions where applicable, and also change the ABI calling conventions to generate more efficient function calls. Floating-point and vector types can now be passed between functions in the extension registers, which not only saves a lot of copying but also means that fewer calls need to pass arguments on the stack.

Which of the above options you should use depends largely on your target system, and the chances are that the compiler's default is already the correct one. Ubuntu 12.04 (Precise), for example, currently uses -mfloat-abi=hard by default.
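One way to check which of these settings a particular build actually used is to look at the macros the compiler predefines. The sketch below assumes GCC's ARM predefined macros __ARM_PCS_VFP (hard-float calling convention), __SOFTFP__ (no floating-point instructions) and __ARM_NEON / __ARM_NEON__ (NEON enabled). The function names float_abi_name and neon_enabled are made up for this illustration, and other compilers or very old GCC releases may define a different set of macros.

```c
#include <stdio.h>

/* Hypothetical helper: describe the -mfloat-abi setting this file was
 * compiled with, based on macros that GCC predefines for ARM targets
 * (assumed: __ARM_PCS_VFP, __SOFTFP__, __arm__). */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    return "hard";      /* FP arguments are passed in extension registers */
#elif defined(__SOFTFP__)
    return "soft";      /* no FP instructions, library calls only */
#elif defined(__arm__)
    return "softfp";    /* soft calling convention, FP instructions allowed */
#else
    return "not a 32-bit ARM target";
#endif
}

/* Hypothetical helper: 1 if the compiler was allowed to use NEON, else 0. */
int neon_enabled(void)
{
#if defined(__ARM_NEON) || defined(__ARM_NEON__)
    return 1;
#else
    return 0;
#endif
}

/* Print a one-line summary of the floating-point configuration. */
void print_fp_config(void)
{
    printf("float ABI: %s, NEON: %s\n",
           float_abi_name(), neon_enabled() ? "yes" : "no");
}
```

Calling print_fp_config() from a small test program built with each set of flags is a quick sanity check that the options reached the compiler as intended.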

Vectorized floating point arithmetic

The NEON architecture includes instructions for both integer and floating-point data types, and GCC now has a powerful auto-vectorizing optimization pass that detects when using the vector engine would improve performance. However, to the surprise of many users, the compiler often fails to vectorize code that they expect it to.

The first thing to remember is that the auto-vectorizer is enabled by default only at -O3. There are options to turn it on at other optimization levels (for example -ftree-vectorize); you can find them in the GCC manual.

However, floating-point code often fails to vectorize even when vectorization is enabled. The reason is that, although floating-point operations in NEON use the IEEE single-precision format to hold values, the vector engine is fully standard-compliant only when the inputs and results are within the normal operating range (i.e. when no value is denormal or NaN); this design minimizes the power consumed by the NEON unit and maximizes throughput. GCC is configured by default to generate code that strictly follows the IEEE floating-point arithmetic rules, so with these limitations it is not appropriate to use the SIMD instructions by default.

Fortunately, GCC provides a number of command line options that precisely control the level of IEEE compliance required. The details are not discussed here, but in most cases it is safe to use the -ffast-math option to relax the rules and enable vectorization.

You can also use the -Ofast option in GCC 4.6 or higher to achieve essentially the same effect. It turns on -O3 and a variety of other optimizations that are often safe to use to get the best performance out of your code.
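As an illustration of the kind of code this applies to, here is a minimal sketch of a single-precision loop that is a typical candidate for auto-vectorization. The function name scale_add and its parameters are invented for this example; the restrict qualifiers are an assumption that the arrays do not overlap, which spares the vectorizer from having to prove or check it.

```c
#include <stddef.h>

/* Hypothetical example: out[i] = k * a[i] + b[i] for n elements.
 * Single-precision data, unit stride, and no aliasing (restrict)
 * make this loop a natural fit for a SIMD unit. */
void scale_add(float *restrict out, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        out[i] = k * a[i] + b[i];
}
```

Built for a NEON target at -O3 without -ffast-math, a loop like this may well stay scalar for the IEEE-compliance reasons described above; with -ffast-math (or -Ofast) it becomes eligible. Whether a given GCC version actually vectorizes it is best confirmed from the compiler's own output rather than assumed.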

Another point to remember is that NEON supports floating-point vector operations only on single-precision data. Unless your code is written to work in this format, you may find that it cannot be vectorized. Also be aware that unsuffixed floating-point constants (literals) have type double and can force the compiler to perform a calculation in double precision. In C and C++, write "1.0F" instead of "1.0" to make sure the compiler knows what you mean.
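The effect of the suffix can be seen directly from the type of the expression. The following sketch uses invented function names; it relies only on the standard C rule that an unsuffixed floating-point constant has type double, so multiplying a float by it yields a double.

```c
#include <stddef.h>

/* The constant 1.5 is a double, so x is converted to double, the
 * multiplication is done in double precision, and the result is
 * converted back to float on return. */
float scale_double_literal(float x)
{
    return x * 1.5;
}

/* The constant 1.5f is a float, so the whole calculation stays in
 * single precision. */
float scale_float_literal(float x)
{
    return x * 1.5f;
}

/* sizeof reports the type of each product without evaluating it. */
size_t size_of_product_with_double_literal(void)
{
    float x = 1.0f;
    return sizeof(x * 1.5);     /* same as sizeof(double) */
}

size_t size_of_product_with_float_literal(void)
{
    float x = 1.0f;
    return sizeof(x * 1.5f);    /* same as sizeof(float) */
}
```

Both scaling functions return the same value for simple inputs, which is why the hidden promotion is easy to miss. GCC 4.6 and later also offer the -Wdouble-promotion warning, which flags implicit float-to-double conversions of this kind.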

Finally, if you still have questions about why the vectorizer doesn't behave as you expect, and you're ready to dig in yourself, GCC can provide a wealth of information about what it is doing. The options -fdump-tree-vect and -ftree-vectorizer-verbose=<level> control the amount of information generated, where <level> is a number from 1 to 9. Although most of the output is of interest only to compiler developers, you will sometimes find hints in it that explain why your code was not vectorized as expected.

In summary, that's a lot of options, so which ones should you use day to day? Fortunately, once the target environment has been decided, most of the options rarely need to change. Here are some examples:

A Cortex-A15 processor with NEON, running floating-point code that operates on arrays of data of the "float" type. The target environment supports passing parameters in floating-point registers:

arm-gcc -O3 -mcpu=cortex-a15 -mfpu=neon-vfpv4 -mfloat-abi=hard \
    -ffast-math -o myprog.exe myprog.c

A Cortex-A7 processor without NEON, running floating-point code. The target environment only supports passing parameters in integer registers, but the floating-point hardware can be used:

arm-gcc -O3 -mcpu=cortex-a7 -mfpu=vfpv4-d16 -mfloat-abi=softfp \
    -o myprog2.exe myprog2.c

Finally, a Cortex-A9 processor running code in an environment where the floating-point/NEON register set cannot be used at all (for example, in an interrupt handler, where the floating-point context is reserved for user-mode code):

arm-gcc -O3 -mcpu=cortex-a9 -mfloat-abi=soft -c -o myfile.o myfile.c
