Monday, November 22, 2010

Dhrystone benchmark results for the cdot-beagleXM-0-3

After a few weeks of research and tests, I finally gathered enough information about the general processor performance of the cdot-beagleXM-0-3 builder by running a benchmark program named Dhrystone.

Chris Tyler recommended that I start my benchmarking with Dhrystone just to measure the general performance of the beagleXM. The goal is simple:

-Run Dhrystone and record the result
-Optimize Dhrystone for armv5tel/armv7 and compare/record the results.

I thought it was going to be as easy as downloading the program, running it, and reporting the results. It never occurred to me to ask why Chris recommended Dhrystone as the benchmark program for the project. As I worked through the 0.1 goal of my project, I gained a better understanding of Dhrystone, its limitations, and its advantages.

Reasons for using Dhrystone:
  • ARM® recognizes the benchmark and quotes its results as a performance figure for its processors.
  • Dhrystone provides a more meaningful MIPS (Million Instructions Per Second) figure because results are normalized against a reference machine.
  • Dhrystone numbers reflect the performance of the C compiler and libraries more so than the performance of the processor itself. (This is considered a weakness of the program.)
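
To make the "reference machine" point concrete, here is a minimal sketch of how a raw Dhrystone score is usually turned into DMIPS. The divisor 1757 is the Dhrystones-per-second score of the VAX 11/780, which is treated as a 1 MIPS machine. The function name dmips is my own, and awk is only there for the floating point division.

```shell
# Convert a raw Dhrystones-per-second score to DMIPS.
# 1757 is the score of the VAX 11/780 reference machine (defined as 1 MIPS).
# The function name "dmips" is made up for this sketch.
dmips() {
    awk -v dps="$1" 'BEGIN { printf "%.2f\n", dps / 1757 }'
}

# A machine scoring exactly the reference value is 1 DMIPS.
dmips 1757
```

A score of twice the reference value (3514 Dhrystones per second) would come out as 2.00 DMIPS, and so on.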

Dhrystone’s execution time is largely spent in standard C library functions. The compiler and the C library both affect the Dhrystone benchmark result, which makes it suitable for measuring the general performance of cdot-beagleXM-0-3. The cdot-beagleXM-0-3 builder currently uses the armv5tel glibc standard library. If an optimized Dhrystone running against the armv5tel glibc is as fast as, or only insignificantly slower than, one running against an armv7 glibc (on the same machine), there is little or no need to focus on recompiling glibc and a lot of C library dependent programs/packages for the cdot-beagleXM-0-3 builder.

Here is a graph showing the results of the Dhrystone benchmark:
Notice how both optimizations (armv5tel/armv7) resulted in the same number of DMIPS.

Legend:
Normal = 758.869322709 DMIPS
Optimized for armv5tel = 1034.82179852 DMIPS
Optimized for armv7 = 1034.82179852 DMIPS

Both sets of gcc optimization options used for compiling Dhrystone are ARM architecture specific. One good thing about the result of the benchmark is that optimization increased the Dhrystone score of cdot-beagleXM-0-3 by about 36%. The results for the armv5tel and armv7 optimizations are the same. Since both builds link against the same armv5tel glibc, it seems reasonable to assume that the armv5tel glibc is what limits the performance of C library dependent programs such as Dhrystone.
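
For anyone who wants to check the percentage, here is a small sketch of the arithmetic using the two DMIPS values from the legend. The function name speedup_percent is made up for this example.

```shell
# Percentage gain of an optimized score over a baseline score.
# "speedup_percent" is a hypothetical helper name, not part of Dhrystone.
speedup_percent() {
    awk -v base="$1" -v opt="$2" 'BEGIN { printf "%.1f\n", (opt / base - 1) * 100 }'
}

# Normal build vs. optimized build, values taken from the legend.
speedup_percent 758.869322709 1034.82179852   # prints 36.4
```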

The next Dhrystone run will use an armv7 glibc. The results can then be compared to the data I have gathered so far. I am also looking at recompiling the kernel as a sort of "application benchmark".

Useful Links:

Dhrystone and MIPs performance of ARM processors
DMIPS vs MIPS
Download Dhrystone 2.1

White Papers/Guides:

  • ECL Dhrystone White Paper
  • Dhrystone MIPS - Criticism by ARM
  • Dhrystone 2.1 README notes

Thursday, November 11, 2010

Playing around with GIT

"Git is distributed version control system focused on speed, effectivity and real-world usability on large projects." It's a wonderful tool for individual developers to work on code in a peer-to-peer fashion.

I have yet to explore more of Git, but for now I'll show how it can be used to track changes to some local files and back out of them when needed.

Here's how I played around with Git:

I created a parent directory called testplay which contains a directory for the nled package.
All the commands that follow are done inside the testplay directory.

To let Git take revision control of the directory, I typed:
#git init
Git initializes with the message "Initialized empty Git repository in /home/user/testplay/.git/"

So far Git has no idea about the files or "lines" that it will track. In order for Git to have a snapshot of the directory, 2 commands are needed:
#git add .
AND
#git commit

This enables Git to store the snapshot permanently in the repository. Every commit creates a version. (When commit is run without a message, Git automatically opens an editor for the commit message; in my case that was Vi.)

Alternatively, git commit -a stages and commits in one step, but only for files Git is already tracking (modified or deleted). New files still need a git add first.
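
Here is a minimal sketch of the add/commit sequence, run in a throwaway directory so nothing real is touched. The file names, the commit messages, the function name and the example identity are all made up for the illustration. It also shows what git commit -a does with a brand new file.

```shell
# Runs in a subshell so the "cd" does not leak into the calling shell.
demo_commit_a() (
    repo="$(mktemp -d)"
    cd "$repo" || exit 1
    git init -q
    # Local identity so the commits work on a fresh machine (example values).
    git config user.email "you@example.com"
    git config user.name "Example User"

    echo "first line" > tracked.txt
    git add .                              # stage everything in the directory
    git commit -q -m "Initial snapshot"    # -m skips the editor

    echo "second line" >> tracked.txt      # change to a file Git already tracks
    echo "brand new" > untracked.txt       # file Git has never seen

    git commit -q -a -m "Commit tracked changes only"

    # Anything still listed here was not included in the commit.
    git status --short
)

demo_commit_a
```

The only line left in the status output should be the new file, marked with "??", because git commit -a never picks up untracked files.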

Branches

The snapshot/version I created was made under a "branch" named "master" by default. Any commits will update this branch. Git can have multiple branches. Branches are names that point to a particular commit. The commands below help explain what branches are and what they do.

Display the branches in the repository
#git branch
In my case, I only have 1 branch so the output shows:
* master (where the "*" tells the user that the master branch is active.)

Create another branch from the master branch
#git branch extend1

Git now has 2 branches. I can now create 2 versions of my project based on these branches. A commit or change in one branch wouldn't affect the other unless merged.

To make a desired branch active (In my case 'extend1'):
#git checkout extend1
Produces a message: "Switched to branch 'extend1'"

I created a random file inside the nled/ directory and ran git add and git commit. If I go back to my "master" branch, the file I created isn't visible; this approach is very useful for debugging and testing out program compilations. :)
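
The whole branch experiment can be replayed with a short script. This is only a sketch in a temporary directory: the file names, the function name and the example identity are made up, and the starting branch is looked up rather than assumed, since newer Git setups may call it something other than "master".

```shell
# Runs in a subshell so the "cd" does not leak into the calling shell.
demo_branches() (
    repo="$(mktemp -d)"
    cd "$repo" || exit 1
    git init -q
    git config user.email "you@example.com"   # example identity
    git config user.name "Example User"

    mkdir nled
    echo "original" > nled/README
    git add .
    git commit -q -m "Initial snapshot"
    start="$(git symbolic-ref --short HEAD)"  # name of the branch we started on

    git branch extend1                        # create the second branch
    git checkout -q extend1                   # make it active
    echo "experiment" > nled/random.txt
    git add .
    git commit -q -m "Add a file on extend1"

    git checkout -q "$start"                  # back to the starting branch
    if [ -e nled/random.txt ]; then
        echo "visible on the starting branch"
    else
        echo "hidden on the starting branch"
    fi

    git checkout -q extend1
    [ -e nled/random.txt ] && echo "visible on extend1"
)

demo_branches
```

The file committed on extend1 disappears from the working directory after switching back and reappears once extend1 is checked out again.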


Useful Links

Official Git page
gittutorial page
gitmagic

Thursday, November 4, 2010

Optimizing a package for ARMv7 architecture

This is part of the 0.1 stage of the Supporting Architectures above armv5tel project. I need to test an optimized package for the BeagleboardXM ARMv7 architecture and document the results. Packages need to be optimized to take advantage of ARMv7 processor features, namely:
  • Thumb®-2/ Thumb
  • NEON™
  • VFPv3 Floating Point

I was supposed to test my compile on cdot-beagleboardXM-0-3 yesterday and today, but it’s down because the adapter is loaned out.

Anyway, I’ll just show my compiler optimization options and would appreciate any comments.

Compiling a package using ARM specific optimization options (using GCC 4.4.x):
CFLAGS="-O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=hard -fomit-frame-pointer"

*Using -mfloat-abi=hard with VFP coprocessors is not supported. Use -mfloat-abi=softfp with the appropriate -mfpu option to allow the compiler to generate code that makes use of the hardware floating-point capabilities for these CPUs. I would have to use this in case -mfloat-abi=hard doesn't work.
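
One way to handle the hard/softfp question without guessing is to ask the compiler. The sketch below is untested on the builder and rests on one assumption: that the compiler exits with a non-zero status when it rejects a flag combination. The function names try_cflags and pick_float_abi are made up. It only checks that a trivial file compiles; it says nothing about whether the libraries on the system use a matching ABI.

```shell
# Return success if the given compiler accepts the given flags for a trivial C file.
# $1 is the compiler command, the remaining arguments are the flags to try.
try_cflags() {
    cc="$1"
    shift
    printf 'int main(void) { return 0; }\n' |
        "$cc" "$@" -x c -c -o /dev/null - 2>/dev/null
}

# Print the first float ABI the compiler accepts, preferring hard over softfp.
pick_float_abi() {
    cc="$1"
    for abi in hard softfp; do
        if try_cflags "$cc" -march=armv7-a -mfpu=neon "-mfloat-abi=$abi"; then
            printf '%s\n' "$abi"
            return 0
        fi
    done
    return 1
}

# Example use on the builder (commented out, needs an ARM gcc):
# abi="$(pick_float_abi gcc)" &&
#     CFLAGS="-O2 -march=armv7-a -mtune=cortex-a8 -mfpu=neon -mfloat-abi=$abi -fomit-frame-pointer"
```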

Optimization definition:

-O2
-O2 turns on all -O optimizations and all other optimizations that don't greatly increase binary size or interfere with debugging. -O2 is even better than -O, and usually just as safe. This is the optimization level most commonly used for packages and distributions in the Linux world and for the Linux kernel.

-march=your_arch
-march= tells gcc to optimize for a certain architecture. Basically, you just need to know what your CPU is, and the GCC name for it. This may break compatibility with other architectures!

-mtune=/-mcpu=
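-mtune=, or -mcpu in older versions of GCC on x86, is similar to -march and accepts the same options. Unlike -march it doesn't break compatibility with older arches. -march and -mtune/-mcpu options can be mixed to get the desired effect. Note that on ARM, -mcpu= is not just an old name for -mtune=: it names the target processor, and GCC derives both the architecture (as with -march) and the tuning (as with -mtune) from it.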

-mfpu=name
This specifies what floating point hardware (or hardware emulation) is available on the target.

-mfloat-abi=name
Specifies which floating-point ABI to use. Permissible values are: `soft', `softfp' and `hard'.

Specifying `soft' causes GCC to generate output containing library calls for floating-point operations. `softfp' allows the generation of code using hardware floating-point instructions, but still uses the soft-float calling conventions. `hard' allows generation of floating-point instructions and uses FPU-specific calling conventions.

-fomit-frame-pointer
-fomit-frame-pointer tells gcc to omit frame pointers, freeing up an additional register on the CPU. This is mainly useful on x86 as most other arches, like AMD64, have it on by default at -O2 or greater, though binary size may increase slightly. This flag breaks debugging on x86 and possibly other arches unless you're compiling with gcc 4.x and the -fvar-tracking flag.

Sources:

CFLAGS Definition
GNU Compiler Collection
GCC ARM Options
Armin76's Blog

I would like to extend my thanks to the people on IRC and the community, including Chris Tyler and Paul Whalen, for guiding me in this project.

#seneca
#fedora-arm
#ubuntu-arm
#gentoo-embedded


Documentation is still in progress; the test results will be posted by next week.

Monday, November 1, 2010

FSOSS 2010 - Day 2 (October 29, 2010)

I wasn't able to blog last Friday (October 29, 2010) because I was sick. After my volunteer shift ended around 3:00PM that day, I started feeling lightheaded. I rushed home and missed the presentation I had been waiting for (Paul Whalen's ARM Project). As soon as I got home, I crashed into bed, and I had a fever later that night.
I'm feeling a little better now. I hope I don't miss tomorrow's SBR600 class because of this.


Here are the presentations I attended last Friday:

The Business of Linux - How individuals can get in the game
It was an hour-long talk about the do's and don'ts of basic business practices. It was presented by Ms. Karlie Robinson.

Systemtap - System Wide Probing on Linux: Focus on User Space
I personally think this is an awesome project, but I didn't really follow the presentation. I don't know if I was just not techie enough or if it was just too technical. It was presented by Mr. Dave Brolley and his team of 2 guys (their names didn't appear on the presenter list).

How CMS architectures affect dev. communities
I was expecting too much from this presentation. I thought I was going to find out which CMS is the most popular, most widely used, etc. Although the presentation only lasted for 30 minutes, I still learned about architectures, the programming languages used, and a little about the evolution of Content Management Systems. It was presented by Mr. Julian Egelstaff.


Useful Links:

Karlie Robinson's Blog.
Systemtap