Author Topic: Compiler Tweaks

Offline lawrence

Compiler Tweaks
« on: February 02, 2013, 04:01:57 am »
I'm a believer in getting the basics going first, *then* moving on to tweaking for chip specifics.
I'm at that point now with my code, so I thought I'd give a few notes on some A10-specific optimizations that can be made while compiling.

The A10 includes support for the following:
  • ARM Cortex-A8 core (ARMv7-A architecture)
  • Thumb-2 (a more compact instruction encoding)
  • Hardware floating point - NEON and VFPv3

A more detailed explanation of each of these features is on the TI wiki (the Allwinner site is rather empty when it comes to info like this) -
http://processors.wiki.ti.com/index.php/Cortex-A8
http://processors.wiki.ti.com/index.php/Cortex-A8_Features
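
If you want to confirm these features on a running board, the kernel reports what it detected (a quick sanity check; exact output varies by kernel version) -

Code: [Select]
grep -i -E 'features|cpu part' /proc/cpuinfo
# Expect "neon" and "vfpv3" in the Features line, and CPU part 0xc08 (Cortex-A8).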

Hard (float) choices
So, how do we tell our compiler to optimize for some of these things?

Well, the first choice we have is between hard floating point and soft floating point.
Our CPU has hardware floating point instructions, which are faster than the software equivalents, so they can be used.
Seems like a simple choice - hardware is faster than a software implementation, so let's use that.

Bzzzzt.  Wrong.

The caveat is that the two use incompatible ABIs (calling conventions) under Linux, so you use one or the other, but not both at the same time.

So you either choose hard float or soft float and stick with it throughout your kernel / user space apps - every library and binary on the system has to agree.

Luckily we have hard float compilers readily available, our kernel is hard float compatible, and most of the kernels and images produced by others seem to use the hard float ABI, so it's an easy choice.
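
If you're not sure whether an existing toolchain or rootfs is hard float, a couple of quick checks (readelf comes with binutils; /bin/ls is just an example binary) -

Code: [Select]
readelf -A /bin/ls | grep Tag_ABI_VFP_args
# "Tag_ABI_VFP_args: VFP registers" means the hard float ABI; if the tag is
# missing you're almost certainly looking at a soft float build.
readelf -l /bin/ls | grep interpreter
# Debian-style hard float userlands use /lib/ld-linux-armhf.so.3 here.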

Our next choice is *which* hard float unit to target!  We have two sets of optimizations to choose from.
NEON and VFPv3.   Lucky or what!  I remember when you had to buy a math co-processor, and it cost crazy money.  Ahh, progress :)


So, how do we tell our compiler to use hard float for floating point stuff -

For VFP -

Code: [Select]
-mfloat-abi=hard
-mfpu=vfpv3

For NEON -

Code: [Select]
-mfloat-abi=hard
-mfpu=neon
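
To double-check that your toolchain actually honours those flags, you can dump its predefined macros (I'm assuming an arm-linux-gnueabihf-gcc cross compiler here; substitute your own) -

Code: [Select]
echo | arm-linux-gnueabihf-gcc -mfloat-abi=hard -mfpu=neon -dM -E - | grep -E '__ARM_PCS_VFP|__ARM_NEON__'
# __ARM_PCS_VFP  -> hard float calling convention is in effect
# __ARM_NEON__   -> NEON is available to the compiler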


Which do I choose?
Well, probably NEON, as it's a superset of VFP, but it depends on the math (you do)  ::)

More info on that here -
http://wiki.debian.org/ArmHardFloatPort/VfpComparison



So, hard float using NEON looks like a no-brainer, but what about the other stuff?

Our next choice is quite easy - our CPU is a Cortex-A8, so we tell the compiler to target and tune for that.


If we include our previous optimizations, we get:


Code: [Select]
-mfloat-abi=hard
-mfpu=neon
-mcpu=cortex-a8
-mtune=cortex-a8
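
On a one-off compile that looks like this (hello.c and the cross compiler name are just placeholders) -

Code: [Select]
arm-linux-gnueabihf-gcc -mfloat-abi=hard -mfpu=neon -mcpu=cortex-a8 -mtune=cortex-a8 -c -o hello.o hello.c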


But wait, there's more!

We can tweak further.   "Safe(ish)" additions are things like
Code: [Select]
-O3
-funroll-loops
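
If you're curious what -O3 actually switches on, GCC will list it for you -

Code: [Select]
gcc -Q --help=optimizers -O3 | less
# Compare against "gcc -Q --help=optimizers -O2" to see the extra passes.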


So, let's do that and see what our final evil call looks like

Code: [Select]
-mfloat-abi=hard  -mfpu=neon  -mcpu=cortex-a8 -mtune=cortex-a8 -O3 -funroll-loops

Great, now how do we integrate that with our compiles?
Well...

Code: [Select]
export CFLAGS='-mfloat-abi=hard  -mfpu=neon  -mcpu=cortex-a8 -mtune=cortex-a8 -O3 -funroll-loops'
Then ./configure / make / make install as usual.
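
For C++ projects you'll want CXXFLAGS set too, so a slightly fuller version (a sketch only; your package's build system may differ) -

Code: [Select]
export CFLAGS='-mfloat-abi=hard -mfpu=neon -mcpu=cortex-a8 -mtune=cortex-a8 -O3 -funroll-loops'
export CXXFLAGS="$CFLAGS"
./configure
make
make install
# Most autotools packages also accept the flags as configure arguments:
#   ./configure CFLAGS="..." CXXFLAGS="..."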

If you want to be even scarier, there is... more.

Code: [Select]
export CFLAGS='-mfloat-abi=hard  -mfpu=neon  -mcpu=cortex-a8 -mtune=cortex-a8 -O3 -funroll-loops -ftree-vectorize -fassociative-math -funsafe-math-optimizations -Os'
We're getting into Gentoo Linux territory here though (mild joke).


What do those new bits do?

Quote
-fassociative-math:
Needed to enable auto-vectorization on ARM. Already implied by -funsafe-math-optimizations, -ffast-math and -Ofast.

-funsafe-math-optimizations:
Needed to enable auto-vectorization for NEON (because NEON is not fully IEEE 754 compliant). Already implied by -ffast-math and -Ofast.

-Os:
Optimize for size - the idea being that NAND/SD and the caches are the real bottleneck, so smaller code can end up faster.

-ftree-vectorize:
Activates auto-vectorization, but is arguably worth dropping: it gives between zero and negligible performance gains with NEON (the auto-vectorizer is a weak spot of GCC and other compilers). Already implied by -O3 and -Ofast.

Do note that these are also known as the good old segfault flags, as turning up compiler optimization can lead to strange things (bugs...).  Having troubleshot my way from odd issues all the way back to compiler bugs and gone "grr", I usually don't go that far unless I really need to.  YMMV though...
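
If you do go down that road, it's worth checking whether the vectorizer actually did anything for your code. A minimal sketch (the file name and cross compiler are placeholders; -ftree-vectorizer-verbose is the GCC 4.x spelling, newer GCC uses -fopt-info-vec) -

Code: [Select]
cat > vectest.c <<'EOF'
void scale(float *a, const float *b, int n)
{
    int i;
    for (i = 0; i < n; i++)
        a[i] = b[i] * 2.0f;   /* simple loop the vectorizer should handle */
}
EOF
arm-linux-gnueabihf-gcc -S -O3 -mfloat-abi=hard -mfpu=neon -mcpu=cortex-a8 \
    -ftree-vectorize -funsafe-math-optimizations \
    -ftree-vectorizer-verbose=2 vectest.c
# The verbose output reports which loops were vectorized; NEON code should also
# show up as q-register instructions (e.g. vmul.f32 q...) in vectest.s:
grep -E 'q[0-9]+' vectest.s | head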



So, to recap:

(Below is a sliding scale from safe to fast, in order)

Safe +-
Code: [Select]
-mfloat-abi=hard  -mfpu=neon  -mcpu=cortex-a8 -mtune=cortex-a8  -O3
Less Safe
Code: [Select]
-mfloat-abi=hard  -mfpu=neon  -mcpu=cortex-a8 -mtune=cortex-a8  -O3  -funroll-loops
May even work, but I wouldn't build a kernel with it  8)
Code: [Select]
-mfloat-abi=hard  -mfpu=neon  -mcpu=cortex-a8 -mtune=cortex-a8 -O3 -funroll-loops -ftree-vectorize -fassociative-math -funsafe-math-optimizations -Os
To use -
export CFLAGS=' <your choice from above> '

./configure
make
...




More references -
http://gcc.gnu.org/onlinedocs/gcc/ARM-Options.html

Speed runs for different options -
https://wiki.linaro.org/MichaelHope/Sandbox/CoreMark1

Testing bits you can use for .. testing
http://www.fourmilab.ch/fbench/
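
A rough way to use that for before/after numbers, run natively on the board (fbench.c is the C source from the fourmilab page above; check that page for the exact file and how to set the iteration count) -

Code: [Select]
for flags in '-O2' '-O3 -mfloat-abi=hard -mfpu=neon -mcpu=cortex-a8 -funroll-loops'; do
    gcc $flags -o fbench fbench.c -lm
    echo "== $flags =="
    time ./fbench
done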
« Last Edit: February 02, 2013, 04:06:44 am by lawrence »

Offline MaQ

Re: Compiler Tweaks
« Reply #1 on: July 12, 2013, 07:39:59 am »
found more:
http://www.cnx-software.com/2011/04/22/compile-with-arm-thumb2-reduce-memory-footprint-and-improve-performance/

export CFLAGS="-mthumb -march=armv7-a"

& kernel config option
Kernel Features --->
  (*) Compile the kernel in Thumb-2 mode [CONFIG_THUMB2_KERNEL=y]
A10
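
To flip that on in a sunxi 3.x tree and confirm it took (just a sketch; as noted in the next reply it may not actually boot) -

Code: [Select]
make ARCH=arm CROSS_COMPILE=arm-linux-gnueabihf- menuconfig
# Kernel Features ---> Compile the kernel in Thumb-2 mode
grep THUMB2 .config
# expect CONFIG_THUMB2_KERNEL=y once enabled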

Offline ryba84

Re: Compiler Tweaks
« Reply #2 on: October 02, 2013, 02:22:50 pm »
The linux-sunxi sources don't support Thumb-2. A kernel built with this doesn't boot.

Offline con

Re: Compiler Tweaks
« Reply #3 on: October 08, 2013, 06:17:46 am »
Using NEON as the FP unit without the unsafe math optimizations enabled is pretty pointless, as GCC will decline to use it for most operations due to the lack of full IEEE 754 compliance.

NEVER compile your kernel with all the speedy options enabled. Programs, however, will do fine with them and might even get a decent speedup.

-fassociative-math together with -funsafe-math-optimizations is redundant, as the second already enables the first.

-Os in combination with the other optimizations is not recommended (-Os is roughly -O2 minus the passes that grow code size), as it negates the things you want -O3 to do. My recommendation is to avoid the size optimizations here.
Unrolling loops while optimizing for size is like using water to get more fire.
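
Worth knowing: GCC only honours the last -O option on the command line, so in the long example earlier the trailing -Os quietly overrides the earlier -O3. You can see what is really in effect with -Q (sketch) -

Code: [Select]
gcc -Q --help=optimizers -O3 -Os | less
# compare with: gcc -Q --help=optimizers -O3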

see http://gcc.gnu.org/onlinedocs/gcc/Optimize-Options.html for more fun.