MOVs all the way down
-
The M/o/Vfuscator (short 'o', sounds like "mobfuscator") compiles programs into "mov" instructions, and only "mov" instructions. Arithmetic, comparisons, jumps, function calls, and everything else a program needs are all performed through mov operations; there is no self-modifying code, no transport-triggered calculation, and no other form of non-mov cheating.
-
@boomzilla I knew I had seen this before:
-
@boomzilla said in MOVs all the way down:
The M/o/Vfuscator (short 'o', sounds like "mobfuscator") compiles programs into "mov" instructions, and only "mov" instructions. Arithmetic, comparisons, jumps, function calls, and everything else a program needs are all performed through mov operations; there is no self-modifying code, no transport-triggered calculation, and no other form of non-mov cheating.
Is this one of those "I did a crazy thing and it was fun to implement!" programs, or is there a real reason to have every instruction be a MOV? I imagine there's a perf hit to using this.
-
I thought this was about Quicktime movie files and the truth was disappointing.
-
@JBert said in MOVs all the way down:
@boomzilla I knew I had seen this before:
I was going to say "What, like SUBLEQ?" But then...
-
@PotatoEngineer said in MOVs all the way down:
is there a real reason to have every instruction be a MOV
I'd say it could conceal malware, but it's probably a safe heuristic to say any program that's entirely mov is malicious.
-
I like to MOV it MOV it
I like to MOV it MOV it
-
@PotatoEngineer said in MOVs all the way down:
Is this one of those "I did a crazy thing and it was fun to implement!" programs, or is there a real reason to have every instruction be a MOV? I imagine there's a perf hit to using this.
There's this talk from ~5 years ago:
IIRC it talks about using it as an "anti-reverse engineering technique". That particular aspect starts at ~30:00 minutes in. Apparently it also used to crash IDA. They also mention methods to thwart some of the simple ways of undoing the obfuscation. (FWIW- they also mention there is a pretty heavy perf hit.)
While searching for the talk above, I found a talk from one year later that (based on the title) "breaks" the movfuscation. Haven't watched it, though.
Edit: Conclusion at the end is that it's not entirely clear if this is a valid anti-reverse engineering approach, but it certainly is fun.
-
@cvi I can't decide what's more clever - swapping between real and scratch data to simulate non-execution, or setting up the main function to be its own signal handler.
-
@Gąska Yeah, there's a pile of clever/interesting tricks in there. SIGILL to loop is not something I ever hope to see, but the scratch-vs-real-data trick has been handy occasionally (remove conditions + branches in SIMD-like execution).
There's a follow up talk by the same guy that continues on this idea. It's a bit of a slow start, but the conclusion is kinda neat.
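For anyone who hasn't met the scratch-vs-real-data trick outside the talk, here is a minimal C sketch of the general idea (not code from the talk or from the M/o/Vfuscator). The function name `zero_negatives` and the `scratch` slot are made up for illustration. Every iteration performs exactly one store; the condition only decides whether that store lands on the real element or on a throwaway slot.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Zero every negative element without an `if` in the loop body.
 * The store always happens; when the element should be left alone,
 * the store is redirected to a scratch slot instead of the real data. */
void zero_negatives(int32_t *data, size_t n)
{
    int32_t scratch = 0;                 /* sink for stores that must not take effect */
    assert(data != NULL || n == 0);
    for (size_t i = 0; i < n; ++i) {
        /* index 0: discard into scratch, index 1: write the real element */
        int32_t *select[2] = { &scratch, &data[i] };
        *select[data[i] < 0] = 0;        /* comparison yields 0 or 1, used as an index */
    }
}
```

Whether the compiler really emits branch-free code for this is up to the compiler; the sketch only shows the data-flow shape of the trick.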
-
@cvi said in MOVs all the way down:
There's a follow up talk by the same guy that continues on this idea. It's a bit of a slow start, but the conclusion is kinda neat.
I wonder why comments are off.
-
Note the sponsor -- a deeply weird organization with a very long, and occasionally startling, history.
-
@Frank-Wilhoit said in MOVs all the way down:
Note the sponsor -- a deeply weird organization with a very long, and occasionally startling, history.
Was about to say that—I'm not sure whether this kind of idea coming out of a place that makes medical and military software is a good or a bad sign.
-
@Frank-Wilhoit said in MOVs all the way down:
Note the sponsor -- a deeply weird organization with a very long, and occasionally startling, history.
Kind of makes this quote :
Q: Why did you make this? A: I thought it would be funny.
seem a bit dishonest, no?
-
@strangeways all commits are by the same guy, the one who made the presentation linked above. Hell, even the readme on upstream repo points to his personal fork. I have zero doubt that it's really his personal project that he worked on in his free time. Why the upstream repo even exists, I can't explain.
-
@cvi said in MOVs all the way down:
There's a follow up talk by the same guy that continues on this idea. It's a bit of a slow start, but the conclusion is kinda neat.
The movfuscator part is interesting, although I've seen that before.
But the part where he "reduces" the whole program to just a few movs (that aren't even movs anymore but also other stuff) and a data table makes no sense. He even admits that, of course, if you write an emulator then it's only the data that changes but the code is the same for every program. That the architecture he wrote this emulator for is itself built out of mov (and his big MOVE) instructions isn't really that interesting anymore at that point, IMO.
Also, I kinda don't really get it.
Let's take his example for how to do an if:
: X == Y
mov eax, [X]
mov [eax], 0
mov eax, [Y]
mov [eax], 4
mov eax, [X]
: X = 100
mov eax, [SELECT_X + eax]
mov [eax], 100
So, after line 5, eax contains either 0 or 4 depending on whether X and Y are equal, which allows us to swap between either the data or the scratch selector.
But we also wrote random garbage to the addresses with the values of X and Y, which can be anything. So how do we not corrupt memory (or more likely incur a page fault) every time we do something like this?
And also I might have missed this: how does he call an external function (printf) with that?
-
@topspin said in MOVs all the way down:
And also I might have missed this: how does he call an external function (printf) with that?
IIRC from scanning the site (I wasn't going to dig deeper), external functions need a little bit of non-MOV code.
-
@topspin said in MOVs all the way down:
: X == Y
mov eax, [X]
mov [eax], 0
mov eax, [Y]
mov [eax], 4
mov eax, [X]
: X = 100
mov eax, [SELECT_X + eax]
mov [eax], 100
I have no idea what this is doing, whether the square brackets are significant, or generally what's going on
-
@Jaloopa said in MOVs all the way down:
@topspin said in MOVs all the way down:
: X == Y
mov eax, [X]
mov [eax], 0
mov eax, [Y]
mov [eax], 4
mov eax, [X]
: X = 100
mov eax, [SELECT_X + eax]
mov [eax], 100
I have no idea what this is doing, whether the square brackets are significant, or generally what's going on
The square brackets mean indirection.
You take the integers X and Y and interpret them as addresses, then write a 0 at the address corresponding to integer X and a 4 at the address corresponding to integer Y. If X and Y are the same, the second write overwrites the first and thus the location contains a 4. If they are different, the first location still contains a 0.
Basically:
int x, y;
// ...
int* x_addr = (int*) x;   // treat the value of x as an address
*x_addr = 0;
int* y_addr = (int*) y;   // treat the value of y as an address
*y_addr = 4;              // overwrites the 0 only if x == y
int ofs = *x_addr;        // 4 if x == y, otherwise 0
int* target = (int*) (selector_x + ofs);
*target = 100;
He also has a C version of the whole program's assembly at 38:00.
-
@topspin said in MOVs all the way down:
if you write an emulator then it's only the data that changes but the code is the same for every program
He mentions that as well. The point he makes is that in an emulator you select a bunch of instructions to execute based on the data (i.e., there is a large branch somewhere). So different code is executed for different instructions. In his version, there is no branch. It's always the same handful of instructions.
Whether or not that is a "big deal" is debatable. I found it a neat twist. On some level, it's certainly different from a traditional interpreter/emulator.
And also I might have missed this: how does he call an external function (printf) with that?
He mentions at one point that I/O can be done by mmap()ing /dev/stdin and /dev/stdout. I didn't know that was a thing, so can't really comment on how you end up using that.
-
@cvi said in MOVs all the way down:
He mentions at one point that I/O can be done by mmap()ing /dev/stdin and /dev/stdout. I didn't know that was a thing, so can't really comment on how you end up using that.
It's only sort of a thing. It works for files, but not for terminals or network sockets.
-
@dkf said in MOVs all the way down:
How is the SIGILL being triggered?
Illegal mov instruction. Didn't know that was a thing, but apparently if you set the destination register to something specific that you're not allowed to write, it'll trigger SIGILL.
-
@dkf said in MOVs all the way down:
It's only sort of a thing. It works for files, but not for terminals or network sockets.
He made it sound like you could do it for IO to the console/terminal, but that does sound a bit sketchy (e.g., what range do you mmap?). But, yeah, even if that's possible somehow, most other syscalls will need a non-mov somewhere.
-
@dkf said in MOVs all the way down:
@cvi said in MOVs all the way down:
He mentions at one point that I/O can be done by mmap()ing /dev/stdin and /dev/stdout. I didn't know that was a thing, so can't really comment on how you end up using that.
It's only sort of a thing. It works for files, but not for terminals or network sockets.
Sure you can mmap a socket. Though for transmits you still need to do a system call.
-
@cvi said in MOVs all the way down:
@dkf said in MOVs all the way down:
It's only sort of a thing. It works for files, but not for terminals or network sockets.
He made it sound like you could do it for IO to the console/terminal, but that does sound a bit sketchy (e.g., what range do you mmap?). But, yeah, even if that's possible somehow, most other syscalls will need a non-mov somewhere.
Also, you reduced the problem of calling printf to the problem of calling mmap.
-
@topspin Once at setup, but, yeah, that's a fair point. It's not the only thing, he also needs to set up the signal handlers, which isn't just movs either.
-
I'd rather write JavaScript with no alphanumeric characters.
-
@error You could have linked to @Yamikuronue's front page article...
-
@TwelveBaud said in MOVs all the way down:
@error You could have linked to @Yamikuronue's front page article...
We have a front page?
-
@error said in MOVs all the way down:
I'd rather write JavaScript with no alphanumeric characters.
Come on, you know the kink thread is !
-
@topspin said in MOVs all the way down:
So, after line 5, eax contains either 0 or 4 depending on whether X and Y are equal, which allows us to swap between either the data or the scratch selector.
But we also wrote random garbage to the addresses with the values of X and Y, which can be anything. So how do we not corrupt memory (or more likely incur a page fault) every time we do something like this?
The example code is just theoretical. The actual implementation uses al instead of eax and generally just operates on bytes rather than dwords. So you have just 256 possible memory addresses to worry about. Also, another thing that the example skipped over is that [eax] is offset by a constant so that it points to a designated, preallocated buffer.
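A rough C rendering of that byte-sized variant, as I understand the description above; this is a sketch, not the M/o/Vfuscator's actual output. The names `cmp_table`, `bytes_equal` and `store_if_equal` are invented for illustration, and I use 0/1 as the result instead of the 0/4 byte offsets from the assembly, since C array indexing does the scaling.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Preallocated buffer: one slot per possible byte value, so every
 * "address" derived from a byte operand stays inside this table. */
static uint8_t cmp_table[256];

/* Returns 1 if x == y and 0 otherwise, using only loads and stores. */
uint8_t bytes_equal(uint8_t x, uint8_t y)
{
    cmp_table[x] = 0;       /* roughly: mov [table + x], 0 */
    cmp_table[y] = 1;       /* roughly: mov [table + y], 1; clobbers the 0 iff x == y */
    return cmp_table[x];    /* roughly: mov al, [table + x] */
}

/* "if (x == y) *real = value;" without a branch: the store always
 * happens, but goes to a scratch slot when the operands differ. */
void store_if_equal(uint8_t x, uint8_t y, int32_t *real, int32_t value)
{
    static int32_t scratch;                    /* sink for the "not taken" case */
    assert(real != NULL);
    int32_t *select[2] = { &scratch, real };   /* plays the role of SELECT_X */
    *select[bytes_equal(x, y)] = value;
}
```

Because the operands are bytes and the table has 256 entries, no input can write outside the buffer, which is the part the slide version glosses over.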
-
@dkf said in MOVs all the way down:
@topspin said in MOVs all the way down:
And also I might have missed this: how does he call an external function (printf) with that?
IIRC from scanning the site (I wasn't going to dig deeper), external functions need a little bit of non-MOV code.
One int for making the syscall. He mentions something I forgot for how to avoid that but I guess it boils down to some clever startup code that will itself need another instruction or three.
-
@dkf said in MOVs all the way down:
@cvi said in MOVs all the way down:
SIGILL to loop
How is the SIGILL being triggered?
A move to the segment register; apparently that's the only illegal mov in user mode.
-
@Gąska said in MOVs all the way down:
@topspin said in MOVs all the way down:
So, after line 5, eax contains either 0 or 4 depending on whether X and Y are equal, which allows us to swap between either the data or the scratch selector.
But we also wrote random garbage to the addresses with the values of X and Y, which can be anything. So how do we not corrupt memory (or more likely incur a page fault) every time we do something like this?
The example code is just theoretical. The actual implementation uses al instead of eax and generally just operates on bytes rather than dwords. So you have just 256 possible memory addresses to worry about. Also, another thing that the example skipped over is that [eax] is offset by a constant so that it points to a designated, preallocated buffer.
And he chose to skip over that, saving one minute in a one hour talk for the sake of not making any sense whatsoever.
-
@topspin he reminds me of a university professor. "Let's consider this super basic example that doesn't resemble anything you might actually encounter in the future. I'll explain it to you in great detail. Now, let's take a brief look at some very complicated cases. The results are such and such, I hope it's obvious to everyone how we got here."
-
@Gąska I mean if he just explained it in principle that'd be fine. The idea of comparing things by writing to either the same or different addresses was quite interesting in itself. But at that point he didn't just explain that idea but he's literally presenting assembly code, so he might as well present the actual assembly and not something that obviously crashes immediately with a page fault. Or if that's too much clutter at least mention "we're not writing directly to memory but only indexing into this buffer."
-
@PleegWat said in MOVs all the way down:
Sure you can mmap a socket.
Doesn't look like it would work with a stream-oriented socket.
I think I'm glad of that.
-
Briefly tried the whole mmap to output thing. mmap tends to fail with MAP_FAILED / ENODEV (i.e., the filesystem or the specified file doesn't support memory mapping). Which makes sense. Sanity restored, but wtf talk dude?
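For anyone who wants to repeat the experiment, a minimal sketch along these lines should do (POSIX only). The helper names `try_mmap_fd` and `stdout_mmap_errno` and the 4096-byte length are my own choices for illustration, not anything from the talk.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/* Try to map `len` bytes of `fd` with protection `prot`.
 * Returns 0 if the mapping succeeded, otherwise the errno from mmap(). */
int try_mmap_fd(int fd, size_t len, int prot)
{
    assert(len > 0);
    void *p = mmap(NULL, len, prot, MAP_SHARED, fd, 0);
    if (p == MAP_FAILED)
        return errno;
    munmap(p, len);          /* only probing; release the mapping again */
    return 0;
}

/* Probe whatever stdout currently is (terminal, pipe, file, ...). */
int stdout_mmap_errno(void)
{
    return try_mmap_fd(STDOUT_FILENO, 4096, PROT_READ | PROT_WRITE);
}
```

What `stdout_mmap_errno()` returns depends on what stdout is attached to and how it was opened, so treat the result as a data point about your setup rather than a general answer.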