Yeah, this kind of stuff doesn't surprise me at all anymore.



  • void CRun::UpdateGreyState()
    {
        int greyed_state = 0;

        switch(greyed_state)
        {
        case 0:
            GetDlgItem(IDC_VCR_STOP)->EnableWindow(true);
            GetDlgItem(IDC_VCR_PAUSE)->EnableWindow(true);
            break;
        case 2:
            GetDlgItem(IDC_VCR_STOP)->EnableWindow(false);
            GetDlgItem(IDC_VCR_PAUSE)->EnableWindow(false);
            break;
        default:
            break;
        }
    }
    

    This function is called 65 times across 34 files. 😭
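    For contrast, here is a hypothetical version where the grey state is actually an input rather than a local constant. This is a sketch, not the original code: the `Item` struct is a stub standing in for the MFC control, and the state values 0 and 2 are just the ones that appear in the original switch.

```cpp
// Stub standing in for an MFC control; only tracks enabled state.
struct Item {
    bool enabled = true;
    void EnableWindow(bool on) { enabled = on; }
};

// Hypothetical corrected version: the grey state is a parameter,
// so the switch can actually take more than one path.
void UpdateGreyState(int greyed_state, Item& stop, Item& pause) {
    switch (greyed_state) {
    case 0:
        stop.EnableWindow(true);
        pause.EnableWindow(true);
        break;
    case 2:
        stop.EnableWindow(false);
        pause.EnableWindow(false);
        break;
    default:
        break;
    }
}
```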



  • At least it's simple enough for the compiler to just remove the unreachable code, so this just increases the LOC while not making your program any slower.

    Who doesn't want free extra lines in his program?



  • Couldn't you just add normal NOPs instead?


    Filed under: ; //Do nothing, asm("nop");



  • Oh this would really be a way to do a Worse Than Failure.

    //wait for n processor cycles
    for (i=0; i<x; x++) {
       CRun::UpdateGreyState()
    }

  • Java Dev

    Hidden speedup loop?



  • Wow, that's some bad code from me... no Red Bull in my blood is the only excuse I can make (I'm trying to kick the habit). OTOH, what's wrong with Discourse? I was not attacked on my for statement (initializing i and incrementing x)

    //wait for n processor cycles
    for (i=0; i<n; i++) {
       CRun::UpdateGreyState();
    }
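    As an aside, a delay loop like this tends to get deleted outright by a modern optimizer, since an empty loop with no side effects is dead code. A hedged sketch of how a busy-wait is usually kept alive in C/C++ (the function name is made up, and the iteration count still says nothing about actual processor cycles):

```cpp
// Busy-wait for n iterations, returning the final count.
// 'volatile' forces the compiler to perform every load and store
// on i; without it the whole loop may be optimized away.
long spin_wait(long n) {
    volatile long i = 0;
    for (; i < n; i = i + 1) {
        // deliberately empty
    }
    return i;
}
```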

  • ♿ (Parody)

    @Dlareg said:

    I was not attacked on my for statement (initializing i and incrementing x)

    And saying something about n in the comment. 😄


  • Discourse touched me in a no-no place

    @Dlareg said:

    for (i=0; i<n; n++) {

    Oops! I think you did it again.



  • look again ;)



  • Why fix the code when you can fix the comment and not have to re-compile?

    //wait for n processor cycles
    //where n = INT_MAX - x
    for (i=0; i<x; x++) {
       CRun::UpdateGreyState()
    }

  • ♿ (Parody)

    @NedFodder said:

    Why fix the code when you can fix the comment and not have to re-compile?

    make says yes.



  • @aliceif said:

    Couldn't you just add normal NOPs instead?
    Filed under: ; //Do nothing, asm("nop");

    Sure, and you could do it 6809 style:

    [code]
    NOP
    BRN there
    LBRN there
    [/code]
    That's "no-operation", "[short] branch never", and "long branch never". Yes, the 6809 had two different conditional branch (i.e. branch when this condition is true) instructions where the branching condition was false.

    Or you could do it x86 style. x86 has no separate NOP instruction. Instead, you can use any of a variety of register-to-register operations that have no net effect on the state of the machine except moving the program counter / instruction pointer to the next instruction. (Well, maybe. The normal candidate was XCHG AX,AX in 16-bit code segments, and of course once they started introducing register tracking to allow side-by-side (UV) execution and out-of-order execution, it suddenly started mattering a lot which instruction you used.)



  • @Steve_The_Cynic said:

    x86 had no separate NOP instruction.

    They fixed that. You can use XCHG EAX, EAX or XCHG AX, AX (the latter with the 0x66 operand-size override prefix) if you specifically need a one- or two-byte NOP, but there has been a dedicated NOP instruction since the Pentium Pro in 1995.



  • @TwelveBaud said:

    You can use XCHG EAX, EAX or XCHG AX, AX (the latter with the 0x66 operand-size override prefix) if you specifically need a one- or two-byte NOP, but there has been a dedicated NOP instruction since the Pentium Pro in 1995.

    The one-byte NOP instruction is an alias mnemonic for the XCHG (E)AX, (E)AX instruction

    So you end up with XCHG EAX, EAX anyway. Not sure in which scenario that matters, though.


    Filed under: someone booby-trapped your EAX register?


  • Discourse touched me in a no-no place

    @Steve_The_Cynic said:

    Yes, the 6809 had two different conditional branch (i.e. branch when this condition is true) instructions where the branching condition was false.

    That'd be useful if you were writing self-modifying code, as it would let you change the type of jump by just altering the appropriate bits in the opcode.



  • Main use for a never-taken branch instruction is as a NOP with different timing. The explicit 6809 NOP takes two cycles, which is the shortest possible execution time for any 6809 instruction; the short branch-never takes three, just like all the other branches.

    The 6502 doesn't have unconditional branch instructions, but there's a fair bit of Apple II code that relies on known conditions to force a branch to be taken, which takes 3 cycles (on the 6502, but not the 6809, a branch not taken consumes only two cycles).

    Given that 6809 short branch instructions do always take exactly three cycles whether taken or not, you could argue that branch-never is redundant because you could just use branch-always or a conditional branch with a branch target immediately following the branch instruction itself. However, branch-never means that the branch offset byte following the opcode is guaranteed to be completely ignored. If you're tight for space, sometimes it's useful to be able to tuck a one-byte instruction on there; conceptually, then, the branch-never opcode can be treated as a "skip next single-byte instruction" opcode.

    I can construct no similarly plausible rationalization for the existence of long branch never, which IIRC requires four bytes and five cycles. Best that can be said about that is that it was easier not to bother making the hardware do something else with it.



  • @flabdablet said:

    If you're tight for space, sometimes it's useful to be able to tuck a one-byte instruction on there; conceptually, then, the branch-never opcode can be treated as a "skip next single-byte instruction" opcode.

    TRS-80 Model III did that. There were two BASIC commands, TRON and TROFF, that activated and deactivated printing the current line number every time execution moved to a different line. TRON loaded the (non-zero) opcode of a CLR A instruction into the A register, while TROFF executed that CLR A instruction to get a zero in there. They then stored A into the "trace active" flag variable.



  • @flabdablet said:

    I can construct no similarly plausible rationalization for the existence of long branch never, which IIRC requires four bytes and five cycles. Best that can be said about that is that it was easier not to bother making the hardware do something else with it.

    The whole branch-never thing is most likely exactly that. The opcodes for the branch instructions were arranged into opposed pairs, with e.g. BEQ being odd and BNE being even. Each pair tested a particular CPU flag or combination of flags, and branched if the condition was true (even) or false (odd). The condition, of course, might not be what you think: for BEQ/BNE, on this schema, the condition was "Z flag not set".

    And of course, the unconditional branch instructions (BRA/LBRA/BRN/LBRN) weren't unconditional. They, too, came in a pair (with the 0x10 prefix to mean "long"), and that pair's condition was "TRUE", so the even-opcode BRA would branch if TRUE was TRUE, which it is, and the odd-opcode BRN would branch if TRUE was FALSE, which it isn't, so BRN didn't branch.

    It was easier, I suspect, to simply design the chip like that than it was to fix it.



  • One of the standard idioms near the start of the 256-byte firmware on Apple II peripheral cards was

    IENTRY     SEC
               DFB $90  ;opcode for BCC
    OENTRY     CLC
               ...
    

    Note that the BCC is effectively a branch-never because of the SEC that precedes it.

    It was done this way because

    • Space was very tight
    • Code had to be position-independent as peripheral card ROMs are memory mapped to a slot-dependent address, and the 6502 has no relative subroutine call instruction
    • For both input and output, the firmware needed to work out the memory address of the card's I/O registers before doing anything else. Having the carry flag set on an input call and cleared on an output call allowed those cases to be distinguished after slot determination code common to both, but unable to be called as a subroutine, had finished.

    @Steve_The_Cynic said:

    It was easier, I suspect, to simply design the chip like that than it was to fix it.

    Yes - dependent, of course, on the way the instruction set was laid out. The 6502 set, for example, has no multi-byte opcodes and only three bits allocated to distinguishing branch variants, which isn't enough to allow for luxurious combinations like Always and Never.


  • I survived the hour long Uno hand

    Dammit, the only thing I remember from my Computer Organization and Assembly class that's ever been relevant to a discussion, and I end up :hanzo:'d due to a need for sleep.


  • Discourse touched me in a no-no place

    @Steve_The_Cynic said:

    It was easier, I suspect, to simply design the chip like that than it was to fix it.

    I think the original ARM designs were the same way, except with lots more parts of the ALU logic as well. Though without the loopy different-lengths-of-jump stuff. (God, I hate architectures that do that, even if I understand why…)



  • Well, at least it's in a function and not everywhere.



  • @Maciejasjmj said:

    So you end up with XCHG EAX, EAX anyway. Not sure in which scenario that matters, though.

    My question is whether the execution core knows enough to make such an instruction NOT dependent on the value in (E)AX... It should be, but it's the sort of thing that can get overlooked, and of course it adds complexity, because one particular XCHG instruction (the one blessed as NOP) is special-cased as not-data-dependent. (Making it not-data-dependent allows it to be executed out of order, and even in the absence of out-of-order execution, it allows it to go down the V pipe while an EAX-using instruction goes down the U pipe or vice versa, because even though the NOP and the other instruction both touch EAX, there's no real dependency between them.)



  • @dkf said:

    @Steve_The_Cynic said:
    It was easier, I suspect, to simply design the chip like that than it was to fix it.

    I think the original ARM designs were the same way, except with lots more parts of the ALU logic as well. Though without the loopy different-lengths-of-jump stuff. (God, I hate architectures that do that, even if I understand why…)

    6809 was particularly odd about branch lengths because it had conventional jump and call instructions with addressing modes up the wazoo - when these were executed with an "immediate" operand, it was an absolute address - as well as 8-bit and 16-bit offset relative jumps (branches) for both jump and call. This, combined with having the "Direct Page" addressing modes (instead of the 6502's "Zero Page" modes) allowed it to do PIC relatively easily, including code that didn't care where its data segment was, so long as the data segment was relatively small. (And even then, there were ways to get around the requirement for a small data segment without resorting to position dependent code.) OS-9 used this extensively. (No, not MacOS 9, OS-9.)



  • @dkf said:

    the loopy different-lengths-of-jump stuff. (God, I hate architectures that do that, even if I understand why…)

    How do you feel about architectures like the 65816, where the length of an instruction can depend on the current setting of a processor status flag? Disassembly is so much more fun when you require dynamic code analysis just to get the instruction parsing right.
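    The 65816 pain is concrete: REP and SEP flip the M (accumulator width) and X (index width) status flags at runtime, and immediate-operand instructions change size with them, so a linear disassembler has to carry the flag state along as it parses. A toy sketch of the length calculation (the opcodes are real 65816 values; everything else is simplified, and only a few immediate loads are shown):

```cpp
// Toy 65816 instruction-length calculator. Immediate-operand
// instructions are 2 bytes in 8-bit mode and 3 bytes in 16-bit
// mode, depending on the M or X status flag at execution time.
int insn_length(unsigned char opcode, bool m_8bit, bool x_8bit) {
    switch (opcode) {
    case 0xA9: return m_8bit ? 2 : 3;  // LDA #imm (accumulator width)
    case 0xC9: return m_8bit ? 2 : 3;  // CMP #imm (accumulator width)
    case 0xA2: return x_8bit ? 2 : 3;  // LDX #imm (index width)
    default:   return 1;               // everything else elided
    }
}
```

    Static disassembly can't know the flag values unless it tracks every REP/SEP (and every path into the code), which is why dynamic analysis ends up being needed.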



  • @Steve_The_Cynic said:

    My question is whether the execution core knows enough to make such an instruction NOT dependent on the value

    For x86, almost certainly. The ALU and data paths inside that family of CPUs have been a long way removed from the instruction set architecture for a very long time; in 2015, x86 machine code can reasonably be considered just another JIT-compiled bytecode.



  • @flabdablet said:

    For x86, almost certainly. The ALU and data paths inside that family of CPUs have been a long way removed from the instruction set architecture for a very long time; in 2015, x86 machine code can reasonably be considered just another JIT-compiled bytecode.

    Yes, I know all that. (And the bytecode thing isn't JIT-compiled in quite the same way that e.g. a JVM or the CLR would do it - the compiled version is discarded much sooner than a JVM would do it because there ain't 'nuff mem'ry in there, dammit!) "A very long time" is over 20 years, with the introduction of NexGen's Nx586 line.

    But that doesn't mean that it takes such things into account, just that it is flexible enough to do it right.



  • @Steve_The_Cynic said:

    the bytecode thing isn't [exactly] JIT-compiled

    It's definitely a bit more than straight-up interpreted though. The microcode that controls the actual ALUs, registers and data paths is chewing on instructions in an internal code generated by what is effectively an x86 parser, and I believe that internal code is in fact cached to some extent.


  • Java Dev

    What use are NOPs anyway? I can think of two uses:

    • Instruction-level timing, which probably doesn't work when reordering, multiple pipelines, or pre-emptive multitasking exist
    • Occupying space for future injection of modified code

    In both cases, if a reordering module can identify an instruction (or possibly a pair of instructions) as NOP, what reason does it have to execute those instructions at all? Is there something I'm missing?



  • @PleegWat said:

    What use are NOPs anyway?

    They're good for patching out code during debugging (some more so than others: the 8048 NOP is zero, IIRC, which is handy when you're patching EPROMs) and they're good for fiddling with timing, in little processors where fiddling with timing is a thing.



  • @aliceif said:

    asm("nop");

    Doesn't that force the compiler to move registers around? What you really need is a function called something like utils.DoNothing(). And then the compiler can inline it.



    Some alignment fixing, too. Also, NOP sledding - fill the page with NOPs, and if the instruction pointer accidentally lands in this page, there's a better chance it'll sled to the error handling code instead of executing potentially dangerous gibberish.

    Also I think there was one architecture where an instruction after a branch would get executed regardless of the result of the branching. And probably more @Placeholder usages.

    As for actually executing them... well, I'm gonna bet $10 there's at least one guy who still uses timing loops.



  • @Maciejasjmj said:

    Also I think there was one architecture where an instruction after a branch would get executed regardless of the result of the branching.

    Yes. A very long time ago, I used to write assembly — mostly x86, but I remember doing some stuff on a RISC processor (MIPS, maybe; I don't remember for sure any more). The way the fetch and execute was pipelined, the following instruction was already flowing through the pipeline before the PC was updated to the branch target. Your choices were either to manually — no optimizing compiler to help you — reorder the instructions, or if speed of writing the code was more important than performance (or it couldn't be optimized), keep the instructions in the order that made sense to the programmer and just put a NOP after the branches.
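    What's being described matches the MIPS-style branch delay slot. A toy model in C++ can make the rule concrete; this is not real MIPS, just the control-flow quirk reduced to "the instruction immediately after a branch executes anyway, taken or not" (the instruction set and branch target handling here are invented for the sketch):

```cpp
#include <cstddef>
#include <vector>

enum Op { ADD1, BRANCH, HALT };

// Tiny model of a delay-slot machine: when the BRANCH at pc is
// taken, the instruction at pc+1 (the delay slot) still executes
// before control arrives at the branch target. For simplicity the
// branch target is hardwired to the final HALT.
int run(const std::vector<Op>& prog) {
    int acc = 0;
    std::size_t pc = 0;
    while (prog[pc] != HALT) {
        if (prog[pc] == ADD1) {
            acc++;
            pc++;
        } else {
            // BRANCH: execute the delay slot first...
            if (prog[pc + 1] == ADD1) acc++;
            // ...then jump to the target (the HALT at the end).
            pc = prog.size() - 1;
        }
    }
    return acc;
}
```

    With `{ADD1, BRANCH, ADD1, ADD1, HALT}` the result is 2, not 1: the ADD1 sitting in the delay slot runs even though the branch is taken, which is exactly why compilers (or tired assembly programmers) stuff a NOP in there.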



  • @Maciejasjmj said:

    Also I think there was one architecture where an instruction after a branch would get executed regardless of the result of the branching.

    "Branch delay slot." I know offhand two architectures that use them even at the ASM level, MIPS and SPARC. Wikipedia lists a few others.


  • Discourse touched me in a no-no place

    @HardwareGeek said:

    The way the fetch and execute was pipelined, the following instruction was already flowing through the pipeline before the PC was updated to the branch target.

    That's kind-of died as a technique as memory speeds ended up not keeping pace with CPU speeds. By the time you've put in place all the caching you need, you can do better things than that (such as dynamic pipelining and reordering of instructions).



  • @Maciejasjmj said:

    there was one architecture where an instruction after a branch would get executed regardless of the result of the branching

    That's a fairly well trodden road.



  • @dkf said:

    That's kind-of died as a technique

    I said this was a long time ago — maybe 1992, or thereabouts.

    Incidentally, IIRC which job was responsible for the acquisition of this knowledge, that was my one, brief foray into the realm of having Software Engineer as my official job title — writing BIOS code.


  • Discourse touched me in a no-no place

    @PleegWat said:

    Instruction-level timing, which probably doesn't work when reordering, multiple pipelines, or pre-emptive multitasking exist





  • ConstantTimeCompare returns 1 iff the two slices, x and y, have equal contents. The time taken is a function of the length of the slices and is independent of the contents.

    It's independent of the contents... and dependent on the contents instead?


  • Discourse touched me in a no-no place

    @HardwareGeek said:

    I said this was a long time ago — maybe 1992, or thereabouts.

    That was about when I learned about it too. Funny how time flies…



  • It goes through the entire array no matter what. And there's no conditional branching.
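    That's the standard constant-time comparison idiom: visit every byte, OR-accumulate the differences, and avoid any data-dependent branch. A sketch of how such a function is typically written (modeled on Go's crypto/subtle.ConstantTimeCompare; the C++ names here are assumptions):

```cpp
#include <cstddef>
#include <cstdint>

// Returns 1 iff x and y (both of length n) have equal contents,
// taking time that depends only on n. Differences are OR-folded
// into 'diff' so every byte is always examined.
int constant_time_compare(const unsigned char* x,
                          const unsigned char* y, std::size_t n) {
    unsigned char diff = 0;
    for (std::size_t i = 0; i < n; i++) {
        diff = static_cast<unsigned char>(diff | (x[i] ^ y[i]));
    }
    // Branchless map {0 -> 1, nonzero -> 0}: (d - 1) wraps to
    // 0xFFFFFFFF only when d == 0, so the top bit is the answer.
    uint32_t d = diff;
    return static_cast<int>((d - 1u) >> 31);
}
```

    Note even the final 0-vs-nonzero test is done with unsigned arithmetic rather than a comparison, so there is no conditional branch anywhere in the function.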



  • @dkf said:

    Funny how time flies…

    :obligatory_Groucho_joke.mp3: (that he probably never actually said)

