COBOL Quick Guide



  • @Unixwolf said:

    Imagine trying to write C string processing functions on a system with +ve and -ve zeros!

    The CDC Cyber that @Steve_The_Cynic and I were fondly remembering above, as well as having non-byte-addressable memory (addressable words are 60 bits each) and using character code zero for the printable : character, also featured ones-complement integer arithmetic.

    I'm not aware of a C compiler for that architecture, but if any exist they would certainly need interesting rules for dealing with char pointers and arrays, let alone strings.



  • @flabdablet said:

    @Unixwolf said:
    Imagine trying to write C string processing functions on a system with +ve and -ve zeros!

    The CDC Cyber that @Steve_The_Cynic and I were fondly remembering above, as well as having non-byte-addressable memory (addressable words are 60 bits each) and using character code zero for the printable : character, also featured ones-complement integer arithmetic.

    I'm not aware of a C compiler for that architecture, but if any exist they would certainly need interesting rules for dealing with char pointers and arrays, let alone strings.


    It's worse than that, because you can't actually build a conformant C compiler for a machine based on CDC Display Code - C requires that char is at least 8 bits long. One's complement wouldn't be a major source of stress in terms of C conformance, though. I suppose you could use 60 bit characters...



  • @Steve_The_Cynic said:

    you can't actually build a conformant C compiler for a machine based on CDC Display Code - C requires that char is at least 8 bits long

    Well you could, but it wouldn't be particularly tidy.

    One way would be to make char a 12-bit type with an encoding based on CDC Display Code, with the high 6 bits set to zero for all the code points that Display Code would normally pack into a single 6-bit CDC character field, much as UCS-2 does for the Unicode BMP.

    Another approach would be to make char an 8-bit type inside C programs, ignoring all the support that the CPU and PPUs have got for handling data in 6 bit chunks, and work out some consistently-applied scheme for packing those into 60-bit words. Handling the common idiom of allowing the use of char[] for raw binary buffers would require packing fifteen chars into each pair of 60-bit words, with one of the fifteen split between them. If that proved too ugly, you could maybe make char a 10-bit type and pack them six to a word.
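    The fifteen-chars-per-word-pair packing can be sketched by treating each pair of 60-bit words as a single 120-bit stream (a simulation in uint64_t; the bit ordering and layout are assumptions of this sketch, not anything a real Cyber did):

    ```c
    /* Sketch: packing 15 eight-bit chars into a pair of 60-bit words.
       Hypothetical layout: big-endian bit order, each word simulated in
       the low 60 bits of a uint64_t. */
    #include <stdint.h>
    #include <assert.h>

    #define WORD_BITS 60

    /* Read bit i (0 = most significant) of the 120-bit stream w[0], w[1]. */
    static int get_bit(const uint64_t w[2], int i) {
        return (int)((w[i / WORD_BITS] >> (WORD_BITS - 1 - i % WORD_BITS)) & 1u);
    }

    static void set_bit(uint64_t w[2], int i, int b) {
        uint64_t mask = (uint64_t)1 << (WORD_BITS - 1 - i % WORD_BITS);
        if (b) w[i / WORD_BITS] |= mask; else w[i / WORD_BITS] &= ~mask;
    }

    /* Pack 15 chars into two 60-bit words; char 7 straddles the boundary. */
    static void pack15(const unsigned char c[15], uint64_t w[2]) {
        w[0] = w[1] = 0;
        for (int i = 0; i < 15 * 8; i++)
            set_bit(w, i, (c[i / 8] >> (7 - i % 8)) & 1);
    }

    static void unpack15(const uint64_t w[2], unsigned char c[15]) {
        for (int i = 0; i < 15; i++) c[i] = 0;
        for (int i = 0; i < 15 * 8; i++)
            c[i / 8] = (unsigned char)((c[i / 8] << 1) | get_bit(w, i));
    }
    ```

    The eighth character ends up split 4 bits / 4 bits across the word boundary, which is exactly the ugliness the post is describing.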

    Standard library char-I/O routines would convert back and forth as required by external I/O devices. Which sounds kind of insane, but is actually pretty similar to what NOS had to do anyway when talking to ASCII terminals, just in the opposite direction (it might even save work when talking to ASCII equipment if you used ASCII or UTF-8 as the C internal character encoding).

    The result would certainly be something only its author could ever conceivably love.


  • Java Dev

    @Steve_The_Cynic said:

    I suppose you could use 60 bit characters...

    You'd have to - individual characters must be addressable.



  • @PleegWat said:

    @Steve_The_Cynic said:
    I suppose you could use 60 bit characters...

    You'd have to - individual characters must be addressable.


    Well, not necessarily. Yes, individual characters must be addressable, but there's plenty of room in a Cyber's pointer variable (only 18 bits long, stored in a 60 bit word) to have an extension applicable only to char * and (by arcane implication) void * that would allow you to address individual characters within a 60-bit word. (The arcane implication is simply the fact that you must be able to recover the original pointer if you cast from X * to void * and back, provided that X * is not a function pointer type, since casting between function pointers and data pointers is UB.)
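    Such an extended char * could be modelled as a struct (an entirely hypothetical layout: a word address plus a character offset, with increment carrying into the word, ten 6-bit characters per word):

    ```c
    /* Sketch of a "fat" char pointer for a word-addressed machine:
       an 18-bit word address plus a character offset within the word.
       Names and field sizes are illustrative assumptions. */
    #include <assert.h>

    struct cyber_char_ptr {
        unsigned word;   /* 18-bit word address */
        unsigned offset; /* which 6-bit character within the word: 0..9 */
    };

    /* p++ for such a pointer: bump the offset, carry into the word address. */
    static struct cyber_char_ptr cptr_inc(struct cyber_char_ptr p) {
        if (++p.offset == 10) { p.offset = 0; p.word++; }
        return p;
    }
    ```

    Casting such a pointer to void * and back would just mean preserving both fields, which is why the "arcane implication" above pulls void * along for the ride.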

    Why is function-data pointer casting UB? Because on certain architectures (compact and medium memory models on DOS machines, for example), function and data pointers aren't even the same size! A similar argument says that in C++, pointers to extern "C" functions cannot portably be stored in undecorated function pointers (doing so is UB), since those are reserved for pointing to C++ functions. This time it is IBM's fault, for one of their reasonably popular architectures.



  • If the Cyber is sufficiently recent (such machines existed when I used one in the second half of the 80s), you could just run it in ASCII mode, where the word size becomes 64 bits, and everything runs with 8-bit ASCII characters.



  • What are you, sane?



  • @flabdablet said:

    What are you, sane?

    I hope not.



  • You know what they say, "sanity is for the weak"



  • @Magus said:

    sanity is for the weak

    There are classes of weakness that I certainly prefer to the alternatives.


  • Discourse touched me in a no-no place

    @PleegWat said:

    You'd have to - individual characters must be addressable.

    I remember once fielding a code change to support someone who was using my code on some sort of Cray system. The change? Removing the assumption that void * and char * were types of the same size. That weird Cray had char * be actually a structure saying what machine word the character was in, followed by what the offset within the word was.

    That such a thing might even be conceivably standards compliant is the scary thing. (These days, I just say that architectures other than ILP32 and I32LP64 are essentially unsupported. Very few people care about that.)



  • @dkf said:

    (These days, I just say that architectures other than ILP32 and I32LP64 are essentially unsupported. Very few people care about that.)

    What about W64? (IL32LLP64)


  • Discourse touched me in a no-no place

    @tarunik said:

    IL32LLP64

    That's just weird. ;) But not as strange as ILP64.



  • See, THIS is why signed overflow is undefined.



  • Steve, don't talk about 60 bit characters like that! These buggers might hear you!

    http://unicode.org/consortium/consort.html


  • Discourse touched me in a no-no place

    @riking said:

    THIS is why signed overflow is undefined.

    Naw, it's undefined because C is allowed to use 1's complement machines too. A platform ABI should define what signed overflow does.
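    Whatever a platform ABI chooses to do, portable C has to test for overflow *before* performing a signed addition, since the overflowed result itself is UB. A common sketch:

    ```c
    /* Overflow check that never performs the overflowing addition itself,
       so it is well-defined on two's-complement and one's-complement
       machines alike. */
    #include <limits.h>
    #include <stdbool.h>
    #include <assert.h>

    static bool add_would_overflow(int a, int b) {
        if (b > 0) return a > INT_MAX - b;  /* would exceed INT_MAX */
        return a < INT_MIN - b;             /* would go below INT_MIN */
    }
    ```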



  • Sorry, I was responding to the thread in general, and hit the wrong reply button ¯\_(ツ)_/¯



  • @flabdablet said:

    The result would certainly be something only its author could ever conceivably love.

    I have a few of those programs. For most though, my relationship is more like that of Frankenstein to his monster.



  • Filed under: :fa_headache:





  • COBOL has its problems - but the ability to serve that up isn't one of them.


  • Discourse touched me in a no-no place

    @Magus said:

    That looks like Widescreen - just added this to it:

    nav.post-controls .like-count {
      margin-right: inherit; /* was -5px */
    }
    


  • @Steve_The_Cynic said:

    All in all, that machine [CDC] was a major WTF in its own right...

    Its word design (6 bit characters, 60 bit word, 10 characters to the word) was strange, all right.

    But WTF?

    Up until the first Cray, the CDC series machines were the fastest in the world. CDC-6400? Record. CDC-6600? Record. CDC-7600? Record. I.e., not a WTF.

    I sort of drifted away from the products after that, so I don't know what came next. The CDC 8600 was faster than the 7600, but I'm not sure it ever sold. The CDC STAR-100 sold a few, I think; it was faster than the 7600, but I'm not sure whether it was faster than the 8600.

    Seymour Cray, designer of the Cray machines, was a principal architect at CDC.

    @flabdablet said:

    Then there was the whole business about zero-padding (that is, colon-padding) to the end of the word meaning end-of-line, 12-bit ^-prefix encodings for lowercase characters and other delights.

    Yes, those were fun, but I think you forgot about the fact that 12 bits of zero-padding was end of line. Which meant you needed another word if your string ended on the ninth-character boundary.

    And I think everyone forgot another oddity: CDC was the only machine I know of with a 3-address instruction set. So a single instruction could do something like "add register 2 to register 5 giving register 3".

    @Unixwolf said:

    Imagine trying to write C string processing
    functions on a system with +ve and -ve zeros!

    Everyone always brings that up because, yes, CDC did support a negative zero. However, no instruction produced a negative zero as a result, though they all handled negative zero on input. If you wanted negative zero, you had to force the value with a load. So I don't see a string issue with that - the 12 bits of zero that end a string would be much more painful - it would only be an issue if the programmer made it one.
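    The +0/-0 business is easy to play with by simulating one's-complement in the low 60 bits of a uint64_t (a sketch; the mask and the value function are illustrative assumptions, not CDC semantics):

    ```c
    /* Sketch: one's-complement arithmetic simulated in the low 60 bits
       of a uint64_t.  Negation is bitwise complement, so the all-ones
       pattern is "minus zero" - a second bit pattern for the value 0. */
    #include <stdint.h>
    #include <assert.h>

    #define MASK60 (((uint64_t)1 << 60) - 1)

    static uint64_t oc_neg(uint64_t x) { return ~x & MASK60; }

    /* Interpret a 60-bit one's-complement pattern as a signed value. */
    static int64_t oc_value(uint64_t x) {
        if (x >> 59) return -(int64_t)(~x & MASK60); /* sign bit set */
        return (int64_t)x;
    }
    ```

    A C-style strcmp written against raw bit patterns would see +0 and -0 as different "characters", even though arithmetically they are the same value - which is the trap being alluded to.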

    It never occurred to me at the time (not creative enough) but there's no reason a language on CDC couldn't have been built around 12-bit characters, or even 10-bit. I can't find an instruction set reference, but if I recall you had to use bit masking to access the characters anyway.

    @dkf said:

    I remember once fielding a code change to support someone who was using my code on some sort of Cray system. The change? Removing the assumption that void * and char * were types of the same size. That weird Cray had char * be actually a structure saying what machine word the character was in, followed by what the offset within the word was.

    That such a thing might even be conceivably standards compliant is the scary thing. (These days, I just say that architectures other than ILP32 and I32LP64 are essentially unsupported. Very few people care about that.)

    That bit about word + offset would sure break all the library routines, wouldn't it?

    But it's not hard to understand it being standard; the C standard is pretty forgiving. I can't find it now, but somewhere there was an extensive list of misconceptions about C. Here's a few misconceptions I remember:

    • null == 0
    • pointer addresses are sequential
    • pointers are the same size as int
    • pointers for different types use the same addressing scheme and are the same size
    • characters are 8 bits

    There were others, but these are the ones I remember right off. Odds are that C on the CDC would disprove the last four, if not the first.


  • Discourse touched me in a no-no place

    @CoyneTheDup said:

    And I think everyone forgot another oddity: CDC was the only machine I know of with a 3-address instruction set. So a single instruction could do something like "add register 2 to register 5 giving register 3".

    The VAX had those, too: http://www2.hmc.edu/www_common/OVMS072-OLD/72final/4515/4515pro_016.html#4515ch9_5


  • Discourse touched me in a no-no place

    @CoyneTheDup said:

    And I think everyone forgot another oddity: CDC was the only machine I know of with a 3-address instruction set. So a single instruction could do something like "add register 2 to register 5 giving register 3".

    That's quite a common feature of RISC systems, including the ARM series.



  • @CoyneTheDup said:

    I think everyone forgot another oddity: CDC was the only machine I know of with a 3-address instruction set. So a single instruction could do something like "add register 2 to register 5 giving register 3".

    VAX and AMD 29000 instruction sets both have this feature as well, IIRC. It allows for rather more efficient pipeline management.

    There's one Cray design touch I don't recall ever seeing in anybody else's architectures: the memory address registers are made available as explicitly programmable user registers, and memory access works by side effect. Use the 18-bit A6 or A7 register as the destination for an operation, and you get a memory write to the resulting address from the corresponding 60-bit X6 or X7 register. Use A1..A5 as destinations and you get a memory read into the corresponding X1..X5 register. So the programmer's model has no explicit memory load or memory store instructions, which is kind of interesting.
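    That side-effect model can be caricatured in a few lines of C (a toy sketch; register widths, names, and behaviour are simplified for illustration and are not real Cyber semantics):

    ```c
    /* Toy sketch of the side-effect load/store model described above:
       writing A1..A5 triggers a read into X1..X5, and writing A6..A7
       triggers a write from X6..X7.  A0 touches no memory, matching the
       recollection later in the thread. */
    #include <stdint.h>
    #include <assert.h>

    #define MEM_WORDS 256

    struct cpu {
        uint64_t X[8];           /* 60-bit operand registers (simulated) */
        unsigned A[8];           /* 18-bit address registers */
        uint64_t mem[MEM_WORDS];
    };

    /* "Set Ai" - the only way memory is touched in this model. */
    static void set_A(struct cpu *c, int i, unsigned addr) {
        c->A[i] = addr;
        if (i >= 1 && i <= 5) c->X[i] = c->mem[addr];  /* implicit load  */
        else if (i >= 6)      c->mem[addr] = c->X[i];  /* implicit store */
        /* i == 0: no memory access */
    }
    ```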

    @CoyneTheDup said:

    I can't find an instruction set reference, but if I recall you had to use bit masking to access the characters anyway.

    Later members of the family did have some string processing instructions built in that assumed 6-bit characters, but they were actually slower in most cases than handrolled code that did the same job.



  • Okay, I didn't know about the other 3-address machines (never used those, not even the ARM).

    I did remember about the memory accesses by register store. Of course, technically, any instruction setting an A register was a memory store or load. Except, maybe, A0, which as I recall was conventionally left set to 0 for quick copies and tests... or was it B0? At any rate, A0 did not do memory access.


  • Discourse touched me in a no-no place

    @CoyneTheDup said:

    never used those, not even the ARM

    You've probably used devices that used them. They're ubiquitous.



  • @CoyneTheDup said:

    Except, maybe, A0, which as I recall was conventionally left set to 0 for quick copies and tests....or was it B0?

    A0 was a pure scratch register. B0 ignored writes and always read as zero. B1 was conventionally set to 1 and left that way; you could break lots of CDC code by deliberately setting it to something else.

    Wikipedia's summary is decent.


  • Discourse touched me in a no-no place

    @CoyneTheDup said:

    Here's a few misconceptions I remember:

    null == 0

    No, null /is/ zero. In source code that is. It need not, however, end up being all-bits-zero.



  • @CoyneTheDup said:

    But it's not hard to understand it being standard; C standard is pretty forgiving. I can't find it now, but somewhere there was an extensive list of misconceptions about C. Here's a few misconceptions I remember:

    • null == 0
    • pointer addresses are sequential
    • pointers are the same size as int
    • pointers for different types use the same addressing scheme and are the same size
    • characters are 8 bits

    There were others, but these are the ones I remember right off. Odds that C on the CDC proves the last 4, if not the first.


    Be careful with denying that NULL == 0. This expression is actually true - in a pointer context, the token 0 (or any compile-time constant with that value) means the same thing as NULL. What you mean is that the all-bits-zero bit pattern is not necessarily NULL. That one will catch you out on AS/400 architectures. This is important to remember if you are tasked with porting code that zero-fills (memset, bzero, etc.) structures to "initialise" them: zero-filling doesn't set pointer members to NULL, and assuming it does leads to UB. (Oddly, the memset itself is not UB on C-compatible data structures - in C++ objects it is UB, however. Only the subsequent use of the pointers in the structure as if they were NULL is UB. And see below.)
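    One portable way around the memset trap is to let the compiler do the zeroing: an aggregate initializer of {0} is required to set pointer members to genuine null pointers and floating-point members to 0.0, whatever their bit patterns. A minimal sketch (the struct is hypothetical):

    ```c
    /* {0} initializes every member as if assigned the constant 0, so the
       pointer member becomes a real null pointer even on machines where
       NULL is not the all-bits-zero pattern. */
    #include <stddef.h>
    #include <assert.h>

    struct node {
        int value;
        struct node *next;
        double weight;
    };

    static struct node make_node(void) {
        struct node n = {0};  /* next == NULL and weight == 0.0, guaranteed */
        return n;
    }
    ```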

    16-bit x86 with "far" pointers (e.g. large memory model, although compact and medium added their own special sauce) gave sizeof(pointer) != sizeof(int). That wasn't a problem, because it taught you very, very quickly not to assume things about pointer sizes. (Unlike one mature student at my uni, who worked with VAXes, and tended to assume that he could freely cast between pointers and ints.)

    16-bit x86 also (in either real or protected mode) does in the sequential-address assumption quite effectively. In real mode, a thing in memory has multiple (up to 4096) bitwise-different addresses, although the last few will be hard to use if the thing is bigger than 16 bytes.
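    The real-mode aliasing is just arithmetic: a 20-bit linear address is segment * 16 + offset, so many segment:offset pairs name the same byte. A sketch:

    ```c
    /* Real-mode address formation: segment shifted left 4, plus offset,
       wrapped at 1 MiB.  Different segment:offset pairs can yield the
       same linear address. */
    #include <stdint.h>
    #include <assert.h>

    static uint32_t linear(uint16_t seg, uint16_t off) {
        return (((uint32_t)seg << 4) + off) & 0xFFFFF; /* wrap at 1 MiB */
    }
    ```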

    Medium and compact memory models in 16-bit x86 have different pointer sizes for function and data pointers, something that people usually manage to forget about, even if they remember that not all data pointers are the same size. A function pointer cannot automatically be shoved into a void *, unless you don't mind some of it not coming back.

    Characters are indeed not necessarily exactly 8 bits, although they are not allowed (in a conformant implementation) to be shorter than 8 bits.

    An important one you might have missed: Like pointers, floating point types do not necessarily become 0 if you fill their memory with zeroes.


  • Discourse touched me in a no-no place

    @Steve_The_Cynic said:

    something that people usually manage to forget about

    You know why people forget about these things? It reduces the amount of mental scarring.

    @Steve_The_Cynic said:

    Like pointers, floating point types do not necessarily become 0 if you fill their memory with zeroes.

    But they do if you're using IEEE floating point. Hardly anyone uses anything else these days (if they're using floats at all) since all the problems with producing decent hardware implementations of at least the binary variants are now solved.
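    For the IEEE case this is easy to check on a typical machine (a sketch; it holds only where doubles are IEEE 754 binary64, in which the all-bits-zero pattern is +0.0):

    ```c
    /* Zero-filling a double: well-defined and equal to 0.0 under
       IEEE 754, but not guaranteed by the C standard in general. */
    #include <string.h>
    #include <assert.h>

    static double zero_filled_double(void) {
        double d;
        memset(&d, 0, sizeof d);  /* all-bits-zero */
        return d;
    }
    ```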



  • @Steve_The_Cynic said:

    Be careful with denying that NULL == 0. This expression is actually true - in a pointer context, the token 0 (or any compile-time constant with that value) means the same thing as NULL. What you mean is that the all-bits-zero bit pattern is not necessarily NULL.

    Yeah, there were a couple of problems with that statement. First of all, there apparently is no "null"; "NULL" is actually a macro. The actual pointer may be any bit pattern, but you can compare that bit pattern to 0 (or set the pointer equal to 0) and it will test true if the bit pattern is the null pattern.

    I got my statement from something I read a long time ago (that I can't find) and I'm guessing I didn't understand the import even then.

    The other parts are still true, though; for example, an "int *p" and "char *p" may not have the same memory structure.

    Basically internal pointer structure could bite sloppy code authors lots of ways.


  • Discourse touched me in a no-no place

    @CoyneTheDup said:

    Basically internal pointer structure could bite sloppy code authors lots of ways.

    Though that sort of weirdness is getting much rarer now. Thank the blessed Dennis!

