How I cut GTA Online loading times by 70%
-
I mean, I didn't, this guy did, but that's the title of his post.
The WTFs:
Disassembling the now-less-obfuscated dump reveals that one of the addresses has a label pulled out of somewhere! It’s
strlen
? Going down the call stack the next one is labeledvscan_fn
and after that the labels end, tho I’m fairly confident it’ssscanf
.It’s parsing something. Parsing what? Untangling the disassembly would take forever so I decided to dump some samples from the running process using x64dbg. Some debug-stepping later it turns out it’s… JSON! They’re parsing JSON. A whopping 10 megabytes worth of JSON with some 63k item entries.
(...)
But 10 megs? That’s nothing! And using sscanf may not be optimal but surely it’s not that bad? Well…
(...)
The second problem? Right after parsing an item, it’s stored in an array (or an inlined C++ list? not sure). Each entry looks something like this:
struct { uint64_t *hash; item_t *item; } entry;
But before it’s stored? It checks the entire array, one by one, comparing the hash of the item to see if it’s in the list or not. With ~63k entries that’s (n^2+n)/2 = (63000^2+63000)/2 = 1984531500 checks if my math is right. Most of them useless. You have unique hashes why not use a hash map.
(Don't take that struct declaration too seriously, this looks like someone who only has the faintest idea of what C++ looks like. These probably aren't actually pointers but direct values - at least that's what it looks like to me in the disassembly.)
He made a quick and dirty PoC that fixes those two issues.
The plan? Write a .dll, inject it in GTA, hook some functions, ???, profit.
The JSON problem is hairy, I can’t realistically replace their parser. Replacing
sscanf
with one that doesn’t depend onstrlen
would be more realistic. But there’s an even easier way.- hook
strlen
- wait for a long string
- “cache” the start and length of it
- if it’s called again within the string’s range, return cached value
(...)
And as for the hash-array problem, it’s more straightforward - just skip the duplicate checks entirely and insert the items directly since we know the values are unique.
(...)
If this somehow reaches Rockstar: the problems shouldn’t take more than a day for a single dev to solve. Please do something about it :<
As someone on Hacker News put it:
This online gamemode alone made $1 billion in 2017 alone.
Tweaking two functions to go from a load time of 6 minutes to less than two minutes is something any developer worth their salt should be able to do in a codebase like this equipped with a good profiler.
Instead, someone with no source code managed to do this to an obfuscated executable loaded with anti-cheat measures.
- hook
-
@error_bot gif facepalm
-
-
@Gąska
Smells of a temporary solution to data transfer early in development.
-
@izzion said in How I cut GTA Online loading times by 70%:
@Gąska
Smells of a temporary solution to data transfer early in development.Ah, the typical "we'll fix the tech debt later"...
-
There was another comment in the HN thread which said something like
They show ads on the loading screen, so not a bug.
-
@izzion said in How I cut GTA Online loading times by 70%:
@Gąska
Smells of a temporary solution to data transfer early in development.Hey boss, what should we have the intern work on so that he doesn't get in the way?
Have him work on the UI, surely he can't do too much damage there!
-
@Gąska said in How I cut GTA Online loading times by 70%:
He made a quick and dirty PoC that fixes those two issues.
I think this just goes to show that C strings are pretty horrible if you're not careful, and that parsers aren't something that the intern should be writing for you. And C++ can put lipstick on the turd, but doesn't mean that you're doing anything better. People who use C++ like to ridicule scripting languages, but the scripting languages tend to have competently written data structures in consistent use, whereas in C++ it's all too easy to make horrible blunders that only show up as ghastly execution times in production.
-
@Gąska said in How I cut GTA Online loading times by 70%:
sscanf
Forbidden in our codebase, though mostly because of how it handles field sizes. I use
strtoull(p, &ep, 0)
a lot.
-
TRWTF is him calling an AMD FX-8350 "old but decent".
On the other hand, any CPU is good enough in this day and age. So glad I decided to build my current PC in February 2020, right before this shitstorm started.
Also ... why the hell does
sscanf
callstrlen
? It shouldn't care about this at all : if it gets a pointer to something that's not null terminated, it's the caller's fault. And with such a trivial format string there's no reason why it shouldn't work as "discard leading whitespace and read base-10 digits from string until a terminator is found, whichever comes first". There's even a bug in glibc about this, but the maintainers say "There is no complexity class requirement in the standard" and they're right - but it's still very unexpected, at least to me. Oh well.
-
@strangeways said in How I cut GTA Online loading times by 70%:
there's no reason why it shouldn't work as "discard leading whitespace and read base-10 digits from string until a terminator is found, whichever comes first".
which, on my quick scan of the glibc source, is exactly what
strtoul
(or actually__strtol_l
) does. Including the skipping of leading whitespace.
-
TRWTF here is sscanf doing a full string read when it doesn't need to. I can totally understand a dev not knowing that and so not realising that algorithm is O(n²).
Although not using a hash map looks like a genuine WTF.
-
@dkf For some reason, text IO has been neglected by the standard for a long time. It wasn't until recently when things like
std::format
(for output) andstd::from_chars()
(for simple input) started things moving again.
-
@strangeways said in How I cut GTA Online loading times by 70%:
the maintainers say "There is no complexity class requirement in the standard"
The standard also doesn't say that they can't call
sleep(600)
between parsing each character, but they don't do that…
-
@bobjanova said in How I cut GTA Online loading times by 70%:
Although not using a hash map looks like a genuine WTF.
The higher-level WTF is that nobody thought “this is taking a long time, I wonder why” and looked into it.
-
@dkf said in How I cut GTA Online loading times by 70%:
@strangeways said in How I cut GTA Online loading times by 70%:
the maintainers say "There is no complexity class requirement in the standard"
The standard also doesn't say that they can't call
sleep(600)
between parsing each character, but they don't do that…Just think of the future optimization potential!
-
@strangeways That bug is amusing. Reported in late 2014, ignored for the next ~6 years, until about a week ago.
Not unsurprising, though. Messing with
sscanf()
et al. is going to be messy. There's a good chance that it will impact a very significant chunk of all software out there, so, you know, no pressure.
-
@boomzilla said in How I cut GTA Online loading times by 70%:
Just think of the future optimization potential!
Pull request:
sleep()
is taking unnecessarily long. With the attached patch, processing time taken by sleep is cut by 99+%.Filed under: It's not like
sleep()
has any observable side-effects.
-
@cvi said in How I cut GTA Online loading times by 70%:
@boomzilla said in How I cut GTA Online loading times by 70%:
Just think of the future optimization potential!
Pull request:
sleep()
is taking unnecessarily long. With the attached patch, processing time taken by sleep is cut by 99+%.Filed under: It's not like
sleep()
has any observable side-effects.Whoa whoa whoa! Let's start smaller...
-- sleep(600); ++ sleep(550);
That's enough to make a noticeable difference but shouldn't overly interfere with anyone's workflow. And still plenty of future potential!
-
@dkf said in How I cut GTA Online loading times by 70%:
@Gąska said in How I cut GTA Online loading times by 70%:
He made a quick and dirty PoC that fixes those two issues.
I think this just goes to show that C strings are pretty horrible if you're not careful, and that parsers aren't something that the intern should be writing for you. And C++ can put lipstick on the turd, but doesn't mean that you're doing anything better. People who use C++ like to ridicule scripting languages, but the scripting languages tend to have competently written data structures in consistent use, whereas in C++ it's all too easy to make horrible blunders that only show up as ghastly execution times in production.
C++ strings don't have quadratic runtimes on append. Python strings do1. So I don't think your generalization holds up.
1 Apparently. Reminds me that I wanted to submit a patch to that gdb bridge when I had some downtime during the Christmas holidays. But apparently got the better of me.
-
@topspin said in How I cut GTA Online loading times by 70%:
Python strings do1.
But Python's a toxic hellstew anyway. And it's fairly rare there to build a string by using
+
or+=
.The problem is one of models; Python says that strings are strictly immutable and doesn't provide any good way of working around that; fixing concatenation requires either a change to the basic semantics of strings or to the fundamental memory model (in order to make it possible to cheaply decide whether a string needs to be copied). But f-strings and the
join
method mean that there's reasonable options in practice.
-
@boomzilla said in How I cut GTA Online loading times by 70%:
Whoa whoa whoa! Let's start smaller...
Go big, or go home:
- unsigned sleep( unsigned ) - { - //... - } + unsigned sleep __attribute__((section(".text#"))) = 12828721u;
Edit: Briefly thinking about it,
= 12843145u
might be a more conforming implementation. Hmm.
-
@cvi backs away slowly
-
@cvi said in How I cut GTA Online loading times by 70%:
@boomzilla said in How I cut GTA Online loading times by 70%:
Whoa whoa whoa! Let's start smaller...
Go big, or go home:
- unsigned sleep( unsigned ) - { - //... - } + unsigned sleep __attribute__((section(".text#"))) = 12828721u;
Edit: Briefly thinking about it,
= 12843145u
might be a more conforming implementation. Hmm.But...you've eliminated future optimization opportunities.
-
@dkf:
People who use C++ like to ridicule scripting languages, but the scripting languages tend to have competently written data structures in consistent use
Also @dkf:
But Python's a toxic hellstew anyway.
How the turn tables!
-
@Gąska said in How I cut GTA Online loading times by 70%:
@dkf:
People who use C++ like to ridicule scripting languages, but the scripting languages tend to have competently written data structures in consistent use
Also @dkf:
But Python's a toxic hellstew anyway.
How the turn tables!
Stop fighting the assimilation and let Javascript be the One True Universal Language.
-
@PleegWat said in How I cut GTA Online loading times by 70%:
@cvi backs away slowly
Let me guess. You're probably one of those people that wouldn't let code with an unnecessarily complicated, unnecessarily platform-dependent, unnecessarily round-about and unnecessarily dumb way of achieving its goal get through review? :disappoint:
-
@boomzilla said in How I cut GTA Online loading times by 70%:
But...you've eliminated future optimization opportunities.
Nope.
#define sleep(x) (x)
is even more optimized!(But @cvi's solution is worthy of the Evil Ideas Thread.)
-
@Zerosquare said in How I cut GTA Online loading times by 70%:
(But @cvi's solution is worthy of the Evil Ideas Thread.)
Can someone explain is even going on in that code?
-
@boomzilla said in How I cut GTA Online loading times by 70%:
But...you've eliminated future optimization opportunities.
Gotta go fast.No half-measures when it comes to perf. Cut corners elsewhere.
Filed under: Who cares about maintainability? That's just catering to dumb people, like future-me.
-
@error said in How I cut GTA Online loading times by 70%:
Can someone explain is even going on in that code?
It's a convoluted way to create a function that does nothing more than
return x;
.
12843145u
is encoded as 89 F8 C3, which ismov eax, edi / ret
in assembly.
Thesection(".text#")
part puts it in the code segment instead of the data segment.
-
@cvi said in How I cut GTA Online loading times by 70%:
@PleegWat said in How I cut GTA Online loading times by 70%:
@cvi backs away slowly
Let me guess. You're probably one of those people that wouldn't let code with an unnecessarily complicated, unnecessarily platform-dependent, unnecessarily round-about and unnecessarily dumb way of achieving its goal get through review? :disappoint:
Well, I wouldn't put it that bluntly.
-
@Zerosquare said in How I cut GTA Online loading times by 70%:
@error said in How I cut GTA Online loading times by 70%:
Can someone explain is even going on in that code?
It's a convoluted way to create a function that does nothing more than
return x;
.
12843145u
is encoded as 89 F8 C3, which ismov eax, edi / ret
in assembler.
Thesection(".text#")
part puts it in the code segment instead of the data segment.I'd love to see someone trying to port the codebase to ARM finding that.
-
@error said in How I cut GTA Online loading times by 70%:
I'd love to see someone trying to port the codebase to ARM finding that.
No worries, just change the constants: 12843145u => 3778019102u. (Or if you want to return zero instead: 12828721u => 16226468490572201984llu (warning 64 bits)).
-
@cvi said in How I cut GTA Online loading times by 70%:
12828721u
What's the meaning of this magic?
E: Should've just read the thread. Didn't realize people disassemble in their heads...
-
@topspin That one returns zero instead (xor eax, eax; ret).
-
@Zerosquare said in How I cut GTA Online loading times by 70%:
(But @cvi's solution is worthy of the Evil Ideas Thread.)
Next level is to put this in a section that is read-write-execute so that you can change behaviours (and architectures!) at runtime by assigning the different constant values.
-
@cvi said in How I cut GTA Online loading times by 70%:
@boomzilla said in How I cut GTA Online loading times by 70%:
But...you've eliminated future optimization opportunities.
Gotta go fast.No half-measures when it comes to perf. Cut corners elsewhere.
Filed under: Who cares about maintainability? That's just catering to dumb people, like future-me.
But...I wanna milk that a lot over time. It's a nice thing to have in your back pocket when you really need a win.
-
@strangeways said in How I cut GTA Online loading times by 70%:
TRWTF is him calling an AMD FX-8350 "old but decent".
Also ... why the hell does
sscanf
callstrlen
? It shouldn't care about this at all : if it gets a pointer to something that's not null terminated, it's the caller's fault. And with such a trivial format string there's no reason why it shouldn't work as "discard leading whitespace and read base-10 digits from string until a terminator is found, whichever comes first". There's even a bug in glibc about this, but the maintainers say "There is no complexity class requirement in the standard" and they're right - but it's still very unexpected, at least to me. Oh well.That's a really idiotic
bugQoI issue. So the largest part wasn't even a bug in Rockstar's code (though they should have fixed it) but in their C library.
Also, I'm surprised, did they actually use gcc (mingw?) on Windows? Or is the same bug in MSVC's implementation too?To be fair I had no idea most sscanf implementations called strlen so I can’t blame the developer who wrote this.
-
@cvi said in How I cut GTA Online loading times by 70%:
@Zerosquare said in How I cut GTA Online loading times by 70%:
(But @cvi's solution is worthy of the Evil Ideas Thread.)
Next level is to put this in a section that is read-write-execute so that you can change behaviours (and architectures!) at runtime by assigning the different constant values.
W^X says no.
-
@Zerosquare said in How I cut GTA Online loading times by 70%:
@error said in How I cut GTA Online loading times by 70%:
Can someone explain is even going on in that code?
It's a convoluted way to create a function that does nothing more than
return x;
.
12843145u
is encoded as 89 F8 C3, which ismov eax, edi / ret
in assembly.
Thesection(".text#")
part puts it in the code segment instead of the data segment.Is this the cricket thread?
-
@error said in How I cut GTA Online loading times by 70%:
let Javascript be the One True Universal Language
-
@Benjamin-Hall said in How I cut GTA Online loading times by 70%:
Is this the cricket thread?
Hey, you're one of us now. You could at least pretend to understand what we're talking about like everybody else!
-
@Benjamin-Hall said in How I cut GTA Online loading times by 70%:
@Zerosquare said in How I cut GTA Online loading times by 70%:
@error said in How I cut GTA Online loading times by 70%:
Can someone explain is even going on in that code?
It's a convoluted way to create a function that does nothing more than
return x;
.
12843145u
is encoded as 89 F8 C3, which ismov eax, edi / ret
in assembly.
Thesection(".text#")
part puts it in the code segment instead of the data segment.Is this the cricket thread?
Is this an endofunctor in your pocket or are you just happy to see us?
-
@Benjamin-Hall The ref is blowing his wistle... Oh, and would you look at that! It's a green card! That's not one you see every day...
-
@TwelveBaud said in How I cut GTA Online loading times by 70%:
@Benjamin-Hall The ref is blowing his wistle... Oh, and would you look at that! It's a green card! That's not one you see every day...
I'm taking that as an affirmative answer to my question.
-
@topspin said in How I cut GTA Online loading times by 70%:
That's a really idiotic
bugQoI issue. So the largest part wasn't even a bug in Rockstar's code (though they should have fixed it) but in their C library.
Also, I'm surprised, did they actually use gcc (mingw?) on Windows? Or is the same bug in MSVC's implementation too?To be fair I had no idea most sscanf implementations called strlen so I can’t blame the developer who wrote this.
The code from that glibc bug runs in 1.3 seconds under MSVCRT, but only in Debug mode. Release mode brings this down to the (expected) <10ms territory, which is understandable since the debug versions of MSVCRT are riddled with various asserts and checks that can be useful during development. Before anyone asks : yes, the generated assembly for both versions shows a call to
sscanf
.Interestingly, I can't reproduce the issue with glibc 2.33 (~40ms) or the current musl (~5ms). glibc 2.31 runs the code in ~300ms, but on a much slower Celeron-class CPU.
tl;dr : I wish I knew which libc they were using.
-
"After a thorough investigation, we can confirm that player t0st did, in fact, reveal an aspect of the game code related to load times for the PC version of GTA Online that could be improved," the company said in a statement. "As a result of these investigations, we have made some changes that will be implemented in a forthcoming title update."
Update: Shortly after Rockstar confirmed that a load time fix is in the works, tostercx said that they were in fact awarded $10,000 through its Bug Bounty program. That's normally reserved for discovering security or privacy issues in Rockstar's online games, but the studio decided to award the bounty "as an exception" in this case, tostercx said. Rockstar confirmed to PC Gamer that the payment is being made.
-
@Zecc that's pretty cool.
-
@Zecc Well honestly that's a really great ending to the story, well played all parties.