Intel Fortran Compiler WTF?

TDC

I'm currently learning Fortran at University (No, I'm studying Physics, not CS).
Anyway, today's program involved rather large arrays.. and strange things happened.

Let's take these two test programs:

PROGRAM test1

IMPLICIT NONE
DOUBLE PRECISION, DIMENSION(1000000) :: bla = 0

PRINT *, 'bla'

END

and

PROGRAM test2

IMPLICIT NONE
DOUBLE PRECISION, DIMENSION(1000000) :: bla = 0
INTEGER :: i

DO i=1,1000000
PRINT *, bla(i)
ENDDO

END

One would imagine that both would result in fairly small programs, with the latter one being a bit bigger..
But let's look at the executables, when compiled with the "Intel Fortran Compiler":
-rwxr-xr-x 1 f-- edv1 266196 2006-06-28 18:42 test1
-rwxr-xr-x 1 f-- edv1 8266261 2006-06-28 18:42 test2

So, given that allocating such a large array needs 260 KB of instructions, why does a simple loop over the elements need 8 MB?
Does it compile to one million instructions?
WTF?

snoofle

It's been decades since I've used FORTRAN, but as a general guide, it's likely that print *, bla(i) caused some additional function in the I/O library to get linked in, and that caused other dependent stuff to get linked in, ad nauseam, to the tune of 8 MB worth of crap.

BTW: the code that gets generated for what you wrote is most likely very tiny - just a couple dozen instructions. It's the I/O library, and underlying runtime system that fills most of the first 260-ish K

TDC

Doesn't explain the 8MB though, as both programs have the PRINT *, (And I have other programs with PRINT *, that are much, much smaller)

edit: I overlooked your underlining. I changed the first one to print out only the first element, and yes, it is 8 MB now, too.

I guess the real WTF is now that it links 8MB of instructions in order to print out an array element.

snoofle

@TDC said:

Doesn't explain the 8MB though, as both programs have the PRINT *, (And I have other programs with PRINT *, that are much, much smaller)

Then try looking at how it's linked. When an executable blows up like that, it's ALWAYS a result of something getting linked in (either implicitly or explicitly), and then it's cascade inclusions.

Some things to look for: static linking, pre-allocating the memory space in the executable as opposed to dynamically at run-time, or even just defining a chunk of heap memory in the data segment.

Try dynamically linking - both executables should shrink down to next to nothing.

TDC

Just tried with reducing the array size, and yes, that makes the program smaller.
(This also explains why it's really small when the array is not used, talk about compiler optimization ^^)

Anyway, why does it pre-allocate the array in the executable by default?
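
A quick size check supports the pre-allocation explanation. The sketch below assumes a DOUBLE PRECISION element takes 8 bytes (typical, but compiler-dependent) and uses the two file sizes from the listing in the first post; the program and parameter names are made up for illustration.

```fortran
PROGRAM sizecheck

IMPLICIT NONE
INTEGER, PARAMETER :: n = 1000000           ! elements in bla
INTEGER, PARAMETER :: bytes_per_elem = 8    ! assumed size of DOUBLE PRECISION
INTEGER, PARAMETER :: size_test1 = 266196   ! from the ls listing
INTEGER, PARAMETER :: size_test2 = 8266261  ! from the ls listing

! Storage the array would need if it were written into the file
PRINT *, 'array storage (bytes): ', n * bytes_per_elem
! How much bigger test2 actually is
PRINT *, 'test2 - test1 (bytes): ', size_test2 - size_test1

END
```

The array needs 8000000 bytes and the two executables differ by 8000065 bytes, so nearly all of the extra size matches the array data rather than extra instructions.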

snoofle

@TDC said:

Doesn't explain the 8MB though, as both programs have the PRINT *, (And I have other programs with PRINT *, that are much, much smaller)

edit: I overlooked your underlining. I changed the first one to print out only the first element, and yes, it is 8 MB now, too.

I guess the real WTF is now that it links 8MB of instructions in order to print out an array element.

Your program probably doesn't need most of the extra stuff that gets linked in. I've actually traced through some of these over the years.

Let's say that to print out an entire array, it uses some generic I/O routine. However, to print out one array element, it calls a routine: printDouble(Double d). But this routine is in a separate library (from the main print function) that has a routine that references some function which is defined in yet another library. Whoops - you just sucked in the entire second library, even though it's not really needed for your purposes.

This can play out in all sorts of combinations, but you get the idea. If you dynamically link the code into a .o you'll see just how tiny your code is.

BTW: this is a function of the linker, not the compiler.

snoofle

@TDC said:

Just tried with reducing the array size, and yes, that makes the program smaller.
(This also explains why it's really small when the array is not used, talk about compiler optimization ^^)

Anyway, why does it pre-allocate the array in the executable by default?

It's been too long since I've used FORTRAN to answer for sure, but I *vaguely* remember that if you put the array in COMMON (which is just a data map for space on the heap, that is allocated at program startup), instead of putting it in the code, it doesn't allocate the data segment in the linked file.

Again, build/link options. Read up on 'common' data sections (essentially data that is global to multiple modules) - dredging up FORTRAN IV memories here ;)
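
For comparison, a more modern way to keep the array out of the executable is to allocate it at run time. This is only a sketch, assuming a Fortran 90 (or later) compiler; the names test_alloc and ierr are made up, and whether the file actually shrinks depends on the compiler and linker.

```fortran
PROGRAM test_alloc

IMPLICIT NONE
DOUBLE PRECISION, DIMENSION(:), ALLOCATABLE :: bla
INTEGER :: ierr

! Request the storage at run time instead of declaring a fixed-size array
ALLOCATE(bla(1000000), STAT=ierr)
IF (ierr /= 0) STOP 'allocation failed'

bla = 0          ! set every element at run time
PRINT *, bla(1)

DEALLOCATE(bla)

END
```

Because the array only exists while the program runs, there is nothing for the linker to write into the file for it.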

TDC

@snoofle said:

If you dynamically link the code into a .o you'll see just how tiny your code is.

Nah, results in about the same size, see above post I made simultaneously. :)

edit: Man, we should stop posting that fast :/

snoofle

@TDC said:

@snoofle said:
If you dynamically link the code into a .o you'll see just how tiny your code is.

Nah, results in about the same size, see above post I made simultaneously. :)

edit: Man, we should stop posting that fast :/

Try doing a dump of the resulting file, and note the difference in the code and data segment sizes.

If the data segment is huge and the code segment is small, then it's pre-allocating the data segment in the executable. If it's the other way around, then it's sucking in some massive library.

Good luck.

TDC

@snoofle said:

@TDC said:
Just tried with reducing the array size, and yes, that makes the program smaller.
(This also explains why it's really small when the array is not used, talk about compiler optimization ^^)

Anyway, why does it pre-allocate the array in the executable by default?

It's been too long since I've used FORTRAN to answer for sure, but I *vaguely* remember that if you put the array in COMMON (which is just a data map for space on the heap, that is allocated at program startup), instead of putting it in the code, it doesn't allocate the data segment in the linked file.

Again, build/link options. Read up on 'common' data sections (essentially data that is global to multiple modules) - dredging up FORTRAN IV memories here ;)

As I said, it's a course for non-CS-people, so what we hear is "compile like this", "link like that". Thus I never heard of those COMMON-thing, and I doubt I'm allowed to use it.

@snoofle said:

@TDC said:
@snoofle said:
If you dynamically link the code into a .o you'll see just how tiny your code is.

Nah, results in about the same size, see above post I made simultaneously. :)

edit: Man, we should stop posting that fast :/

Try doing a dump of the resulting file, and note the difference in the code and data segment sizes.

If the data segment is huge and the code segment is small, then it's pre-allocating the data segment in the executable. If it's the other way around, then it's sucking in some massive library.

Good luck.

As I said, with decreasing array size the executable shrinks, so I guess it really is the pre-allocation.
(Just curious: Are there other languages that do this? I guess it was pretty useful in the old days where 640k were 'enough'..)

I wonder why nobody told us about this, seeing they have to consider everyone as total programming beginners, while the current program now has 17 MB..

Well maybe they don't know themselves.

Jim: Hehe, that was my first thought, too. I guess I read TDWTF way too often.
And yeah, the argument why we learn Fortran really was the speed ;)

edit: omg, it really does loop unrolling.

Loop unrolling might have been worthwhile back when processor time was valuable, but nowadays, with processors that spend 99% of their time doing nothing because they are so fast, it's really a waste to bother with it. I can't believe your school is making you use that.

snoofle

@Jim Suruda said:

Dude- maybe it's unrolling the loop! "High performance" compilers used to do that back in the old days, to save overhead. it makes sense, too. Think how much faster your loop will execute if the compiler replaces that loop with 100000 print statements. No more loop variable to increment and check. The resulting exe will be screamingly fast! Extreme Jim

Damn - the Fortran IV compiler I used to work with didn't support that - interesting feature, but, as in this example, seems to require some diligence on the part of the programmer to disable it in the case of large loops.

I wonder if this Fortran compiler provides the ability to see the assembly code that it generates - should be pretty easy to verify the loop unrolling...

Ixpah

Are all these loop unrolling posts missing the required /sarcasm tags?

AFAIK no commercial compiler will unroll a loop more than around 16 times, due to diminishing returns, e.g. register pressure, cache misses, etc.

It's fairly obvious that the size is due to the array being allocated in the EXE. Is it possible to leave it uninitialised in Fortran?
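
Leaving the array uninitialised in the declaration is possible. A minimal sketch, with the made-up program name test_noinit: the "= 0" initializer is dropped from the declaration and the zeros are assigned by an ordinary statement at run time. Where the compiler then places the array, and whether the executable really gets smaller, is compiler-dependent and is an assumption here.

```fortran
PROGRAM test_noinit

IMPLICIT NONE
DOUBLE PRECISION, DIMENSION(1000000) :: bla   ! no "= 0" initializer here
INTEGER :: i

bla = 0   ! executable statement: fills the array when the program runs

DO i = 1, 1000000
PRINT *, bla(i)
ENDDO

END
```

The output should be the same as test2; only the point at which the zeros come into existence changes.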

Boris

what version of ifc do you use? i compiled your programs with
ifc -free -o test1 test1.f
(the -free to have free form code, can't remember what the exact code formatting should be*)

-rwxr-xr-x 393065 Jun 30 13:01 test1
-rwxr-xr-x 394791 Jun 30 13:02 test2

version info:
Intel(R) Fortran Compiler for 32-bit applications, Version 8.0 Build 20031016Z Package ID: l_fc_p_8.0.034

ifc is called ifort now, so probably you have an old version. I prefer g95, a free fortran compiler.

my guess is loop unrolling; that's easy to find out: do the loop 2 times and see whether the file gets twice as big.
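
That test could look like the sketch below (the name test2b is made up): test2 with the same loop written twice. If the size came from unrolled loop code, the executable should grow noticeably compared to test2; if it comes from the array data, it should stay about the same.

```fortran
PROGRAM test2b

IMPLICIT NONE
DOUBLE PRECISION, DIMENSION(1000000) :: bla = 0
INTEGER :: i

! First copy of the loop from test2
DO i = 1, 1000000
PRINT *, bla(i)
ENDDO

! Second, identical copy
DO i = 1, 1000000
PRINT *, bla(i)
ENDDO

END
```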

Btw, ifc deliberately creates bad code on AMD's (the i stands for Intel after all):

http://www.swallowtail.org/naughty-intel.html

Boris

*for the uninitiated, it is truly horrible, something like this: statements have to start in column 7 (so every line begins with 6 spaces), a 'C' in the first column makes the line a comment, lines can only run up to column 72, and you continue a statement on the next line by putting a character such as '2' in the sixth column. i could get the first program to run but not the second without -free..
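
For illustration, here is roughly what test1 looks like in standard fixed-form layout: a 'C' in column 1 marks a comment, statements sit in columns 7 to 72, and any non-blank, non-zero character in column 6 continues the previous line. The split of the PRINT statement is artificial and only there to show a continuation line; this sketch assumes the compiler accepts Fortran 90 declarations in fixed-form source.

```fortran
C     Fixed-form version of test1; this line is a comment
      PROGRAM test1
      IMPLICIT NONE
      DOUBLE PRECISION, DIMENSION(1000000) :: bla = 0
      PRINT *,
     &      'bla'
      END
```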