How to speed up my program's startup



  • TRWTF doesn't make an appearance until the OP's comment to his question (http://stackoverflow.com/questions/6816769/how-to-speed-up-my-programs-startup):

    Q: My program needs to load many big wordlist files, so it always takes a long time to start up....

    Comment: define big and many first... – KillianDS

    Comment: there are 10 files of over 20GB size, in binary format. – user859147

    My first thought (having just seen the title) was, of course, "Stop using Java." Turns out he's using C++.



  • Weird. I don't know what the guy means by "wordlist" files, but it certainly doesn't mean a list of natural language words. Those fit in about 0.5MB (or perhaps as much as 2 to 3MB if you don't do any compression).

    But he surely can't load 200GB of data, at least, not on any normally priced machine, so it probably just means indexing these lists. And that could be done off-line. Oh hell, why do I bother.



  •  He could load that much if he rolled his own virtual memory...


  • @TGV said:

    But he surely can't load 200GB of data, at least, not on any normally priced machine

    My work desktop has 128GB, and I'm sure the swap file is at least that big, so he probably could load the whole thing. TRWTF is that no one on the Stack Overflow thread told him that that was an extremely stupid thing to do. To be fair, the highest-ranking answer did suggest mmap.



  • @PedanticCurmudgeon said:

    My work desktop has 128GB, and I'm sure the swap file is at least that big, so he probably could load the whole thing.

    Uh, why? Is this common where you work? And would you trust such a system to the likes of the questioner?



  • @TGV said:

    Weird. I don't know what the guy means by "wordlist" files, but it certainly doesn't mean a list of natural language words. Those fit in about 0.5MB (or perhaps as much as 2 to 3MB if you don't do any compression).
     

    I used to work for an SEM company and each individual account had upwards of 4,000,000 words and phrases being included for ad networks. When you take into account that a single concept could have as many as 50 iterations: "mini golf", "mini-golf", "put-put", "putt-put", "putt-putt", "put-putt", "put-put golf", "putt-put golf", you get the idea, plus translations of each to a number of foreign languages plus common misspellings for each word, I could easily see each account taking a good 20MB of uncompressed space just for those phrases. Even then you would need 10,000 accounts, each with that many words and phrases, in order to reach 200GB. So, I do find his need for 200GB of data for "wordlist" files to be dubious. One such file taking 20GB wouldn't be out of the question, though.



  • You missed the Real WTF:

    @Some Stack Overflow User said:

    My program needs to load many big wordlist files, so it always takes a long time to start up. it's so inefficient to recover quickly in crash.

    The only reason he cares about the inefficient loading is because his program crashes often enough to make it an issue!

    The only real answer is, "well, fix the fucking crashes, you idiot." C++ is a shitty language, but there's no good reason you can't shovel 200 GB of data into it without crashes. (The StackOverflow thread, with the exception of Martinho Fernandes, misses the fucking point as much as this thread does.)

    The other answers are:
    1) Use a database, and offload the problem to the RDBMS
    2) Memory-map the file, and offload the problem to the OS
    3) Basically, offload the problem to someone else who's probably smarter than you anyway
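For readers unfamiliar with option 2, here is a minimal Python sketch of memory-mapping a binary file so that only the pages actually touched get read from disk, instead of loading everything at startup. The record layout (one little-endian 64-bit integer per record) and the names `write_demo_file`, `read_record`, and `lookup` are assumptions made for illustration; the original question never describes the real file format.

```python
import mmap
import os
import struct
import tempfile

# Hypothetical record layout: one little-endian unsigned 64-bit integer.
RECORD = struct.Struct("<Q")


def write_demo_file(path, count):
    """Write `count` fixed-size records (the squares 0, 1, 4, ...) to `path`."""
    with open(path, "wb") as f:
        for i in range(count):
            f.write(RECORD.pack(i * i))


def read_record(mapped, index):
    """Decode record number `index` straight out of the mapped buffer."""
    return RECORD.unpack_from(mapped, index * RECORD.size)[0]


def lookup(path, index):
    """Map the file read-only and fetch a single record.

    Nothing is read eagerly here: the OS pages data in on first access,
    so startup cost does not grow with the size of the file.
    """
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
            return read_record(mapped, index)


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        demo_path = os.path.join(tmp, "demo.bin")
        write_demo_file(demo_path, 1000)
        print(lookup(demo_path, 7))
```

A long-running program would presumably keep the mapping open rather than re-mapping on every lookup; the per-call mapping here only keeps the sketch short.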



  • @blakeyrat said:

    The StackOverflow thread, with the exception of Martinho Fernandes, misses the fucking point as much as this thread does.

    I contend it is you who has missed the point, which is that no-one actually gives a rat's arse about a guy who's trying to load 200GB of data on program startup. The guys answering on SO are just hoping the asker will accept their answer so they can get points off his stupidity/laziness/ineptitude/obvious troll post.



  • @blakeyrat said:

    You missed the Real WTF:

    @Some Stack Overflow User said:

    My program needs to load many big wordlist files, so it always takes a long time to start up. it's so inefficient to recover quickly in crash.

    The only reason he cares about the inefficient loading is because his program crashes often enough to make it an issue!

    The only real answer is, "well, fix the fucking crashes, you idiot." C++ is a shitty language, but there's no good reason you can't shovel 200 GB of data into it without crashes. (The StackOverflow thread, with the exception of Martinho Fernandes, misses the fucking point as much as this thread does.)

    On that account, I can give him the benefit of the doubt (if only to focus on the 200GB WTF), and assume that he's actively developing it or something, and there are still lots of bugs to work out, requiring a restart. I got the impression that it wasn't the loading part that was crashing, but whatever he was doing later. Of course, if you have to wait to read 200GB from a disk, unless you're using some sort of SSD (and probably even then, unless it's really magical), one restart a week sounds to me like too many.



  • @boomzilla said:

    Of course, if you have to wait to read 200GB from a disk, unless you're using some sort of [b]SSDS[/b] (and probably even then, unless it's really magical), one restart a week sounds to me like too many.

    SSTFY



  • @Xyro said:

    @boomzilla said:
    Of course, if you have to wait to read 200GB from a disk, unless you're using some sort of SSDS (and probably even then, unless it's really magical), one restart a week sounds to me like too many.

    SSTFY

    Your honor, I will argue that SSDS must fall into the "really magical" classification.


  • @boomzilla said:

    @PedanticCurmudgeon said:
    My work desktop has 128GB, and I'm sure the swap file is at least that big, so he probably could load the whole thing.

    Uh, why? Is this common where you work? And would you trust such a system to the likes of the questioner?

    No, it's not common, and I wouldn't even trust my home laptop (refurbished, 2GB RAM) to the likes of the questioner.



  • @RHuckster said:

    I used to work for an SEM company and each individual account had upwards of 4,000,000 words and phrases being included for ad networks. When you take into account that a single concept could have as many as 50 iterations: "mini golf", "mini-golf", "put-put", "putt-put", "putt-putt", "put-putt", "put-put golf", "putt-put golf", you get the idea, plus translations of each to a number of foreign languages plus common misspellings for each word, I could easily see each account taking a good 20MB of uncompressed space just for those phrases. Even then you would need 10,000 accounts, each with that many words and phrases, in order to reach 200GB. So, I do find his need for 200GB of data for "wordlist" files to be dubious. One such file taking 20GB wouldn't be out of the question, though.

    My guess would be that the wordlists are in fact some sort of rainbow table. Getting 200GB of rainbow tables is (IIRC) not that hard.

    TRWTF is, however, when some of the more recent answers suggest that he should convert the wordlists into compilable code and possibly link them into shared libraries. :-(



  • OTOH, there's this guy:

    Editing large text files on linux ( 5 - 10gb)

    Basically, i need a file of specified format and large size(Around 10gb). To get this, i am copying the contents of my original file into the same file, multiple times, to increase its size. I dont care about the contents of the file as long as they have the required format. Initially, i tried to do this using gedit, which failed miserably after few 100mbs. I'm looking for an editor which will help me do this. Or, may be a suggestion on alternate ways

    WTF? Was today "Do stupid things with big files day," and no one told me?



  • Well... he might be needing a large file to test his program's ability to handle same. I wouldn't bet on it, though - anyone thoughtful enough to test such things ought to know how to generate a suitable file.



  • @boomzilla said:

    Of course, if you have to wait to read 200GB from a disk, unless you're using some sort of SSD (and probably even then, unless it's really magical), one restart a week sounds to me like too many.

    Normal SSDs can read around 200-300 MB/s, so reading 200 GB would take a good 15 minutes. Some specialized (connected to PCIe instead of SATA) and really expensive ones can reach speeds of close to 1000 MB/s I think. I'd probably go for a RAID of standard SSDs if I needed that kind of performance. With four of those you could reach the 1000 MB/s mark, so loading that data set would "only" take a bit over three minutes.
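The arithmetic above can be checked with a short Python sketch. It assumes decimal units (1 GB = 1000 MB) and perfectly sustained sequential reads, which real drives rarely deliver; `load_seconds` is a hypothetical helper name, and the throughput figures are simply the ones quoted in the post.

```python
def load_seconds(size_gb, throughput_mb_s):
    """Seconds needed to read `size_gb` at a sustained `throughput_mb_s`.

    Assumes 1 GB = 1000 MB and ignores seek time, filesystem overhead,
    and any parsing done after the bytes arrive.
    """
    if throughput_mb_s <= 0:
        raise ValueError("throughput must be positive")
    return size_gb * 1000 / throughput_mb_s


if __name__ == "__main__":
    scenarios = [
        ("single SSD, low end", 200),
        ("single SSD, high end", 300),
        ("PCIe drive or four-drive RAID", 1000),
    ]
    for label, rate in scenarios:
        minutes = load_seconds(200, rate) / 60
        print(f"{label}: {minutes:.1f} min")
```

Under these assumptions a single drive at 200 to 300 MB/s needs roughly 11 to 17 minutes for 200 GB, and 1000 MB/s brings that down to about 3.3 minutes, which matches the estimates in the post.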



  • @boomzilla said:

    OTOH, there's this guy:

    Editing large text files on linux ( 5 - 10gb)

    Basically, i need a file of specified format and large size(Around 10gb). To get this, i am copying the contents of my original file into the same file, multiple times, to increase its size. I dont care about the contents of the file as long as they have the required format. Initially, i tried to do this using gedit, which failed miserably after few 100mbs. I'm looking for an editor which will help me do this. Or, may be a suggestion on alternate ways

    WTF? Was today "Do stupid things with big files day," and no one told me?

     

    groan did you see the guy suggesting he use PHP to generate the 10GB file?

     



  • @Scarlet Manuka said:

    Well... he might be needing a large file to test his program's ability to handle same. I wouldn't bet on it, though - anyone thoughtful enough to test such things ought to know how to generate a suitable file.
     

     Out of curiosity: how would one generate such a file? I mean, I have for example a C nQueens-Implementation around that does generate a few hundred megs text, and could use append instead of write, then run it a few times. But surely just using C is as much a wtf as the original post?



  • @fire2k said:

    Out of curiosity: how would one generate such a file? I mean, I have for example a C nQueens-Implementation around that does generate a few hundred megs text, and could use append instead of write, then run it a few times.

    Assuming you have some suitable data, you can simply cat it (to append) over and over until it's as big as you need it to be.
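A rough Python equivalent of the repeated-cat approach, as a sketch: it appends whole copies of a seed file until the output reaches a target size, so each copy keeps whatever format the seed had. `grow_file` and its parameters are hypothetical names, and the seed is assumed to be small enough to hold in memory.

```python
import os
import tempfile


def grow_file(seed_path, out_path, target_bytes):
    """Write whole copies of the seed file to `out_path` until it holds
    at least `target_bytes`. Returns the number of bytes written.

    Only complete copies are written, so the result may overshoot the
    target by up to one seed length.
    """
    with open(seed_path, "rb") as src:
        seed = src.read()
    if not seed:
        raise ValueError("seed file is empty")

    written = 0
    with open(out_path, "wb") as out:
        while written < target_bytes:
            out.write(seed)
            written += len(seed)
    return written


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        seed_file = os.path.join(tmp, "seed.txt")
        big_file = os.path.join(tmp, "big.txt")
        with open(seed_file, "wb") as f:
            f.write(b"one record per line\n")
        total = grow_file(seed_file, big_file, 1_000_000)
        print(total, os.path.getsize(big_file))
```

For a 10 GB target the same loop applies unchanged; only `target_bytes` grows.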

    @fire2k said:

    But surely just using C is as much a wtf as the original post?

    Only if your nick rhymes with flakeyrat. I mean...using any particular language for an unspecified task could be a WTF.



  • @boomzilla said:

    @fire2k said:
    But surely just using C is as much a wtf as the original post?

    Only if your nick rhymes with flakeyrat. I mean...using any particular language for an unspecified task could be a WTF.

    I actually prefer C to C++. It's slightly easier to not fuck up using C.

    But, seriously, this is 2011. Even your JavaScript interpreter doesn't have any noticeable performance hit anymore. Write in a memory-managed language. Whatever the application.



  • @blakeyrat said:

    Write in a memory-managed language. Whatever the application.
     

    The device I'm working on right at this moment has 8192 bytes of instruction memory and 48K of data memory. It's also very critical to the overall performance of the system it's a part of. What memory-managed language would you recommend for my application?

     



  • @Wrongfellow said:

    @blakeyrat said:

    Write in a memory-managed language. Whatever the application.
    The device I'm working on right at this moment has 8192 bytes of instruction memory and 48K of data memory. It's also very critical to the overall performance of the system it's a part of. What memory-managed language would you recommend for my application?

    Oh for fuck's sake. Did you seriously not know what I meant? Are you really that fucking dense? Or did you post this as some sort of condescending brag? Some kind of "I'm so much better than other programmers, because I have strict limitations!" thing?

    Fuck.



  • @blakeyrat said:

    Did you seriously not know what I meant?
     

    Well, since you wrote "Whatever the application", I assumed that was what you meant.

    I'm sorry if my lack of telepathy disappoints you.

    @blakeyrat said:

    Are you really that fucking dense? Or did you post this as some sort of condescending brag? Some kind of "I'm so much better than other programmers, because I have strict limitations!" thing?

    Fuck.

     Now, now. It's only the Internet.

     



  • @blakeyrat said:

    Oh for fuck's sake. Did you seriously not know what I meant? Are you really that fucking dense? Or did you post this as some sort of condescending brag? Some kind of "I'm so much better than other programmers, because I have strict limitations!" thing?

    Dude, you totally bring this shit on yourself by saying crap like, "Whatever the application." Just admit it, you love to troll like this, otherwise you'd have figured out a better way to say what you actually mean, like, "You'd better have a good reason for not using a memory managed language."



  • @boomzilla said:

    @blakeyrat said:
    Oh for fuck's sake. Did you seriously not know what I meant? Are you really that fucking dense? Or did you post this as some sort of condescending brag? Some kind of "I'm so much better than other programmers, because I have strict limitations!" thing?

    Dude, you totally bring this shit on yourself by saying crap like, "Whatever the application." Just admit it, you love to troll like this, otherwise you'd have figured out a better way to say what you actually mean, like, "You'd better have a good reason for not using a memory managed language."

    No, I expect people to have and use COMMON SENSE. And not be pedantic dickweeds.

    Obviously in an environment where you can't run a JavaScript interpreter, don't write code that requires a JavaScript interpreter. Duh. Why do I have to actually type this.

    Now since I'm assuming Wrongfellow is not a fucking retard and has common sense (perhaps being too generous), that means he must have some ulterior motive for typing that little gem of a post. The only ulterior motive I could see in there is bragging, like I said in my post. Maybe I was wrong, and the motive was just being a pedantic dickweed for the sake of it. ("Aha! Blakeyrat didn't say exactly what he meant in a completely unambiguous fashion! I got him now!!") If there's some other ulterior motive, then please enlighten me.

    But I refuse to believe you are so dumb that you somehow thought my advice applied to applications that have to run in 8K of RAM. I refuse. You're not that dumb. And I'm not dumb enough to fall for it.

    I want to have a conversation with people who:
    1) actually think about what they type,
    2) don't just sit around in a basement looking for opportunities to make pedantic dickweed snipes at people, for the crime of not typing like an automaton

    Maybe that's not possible here.



  • @blakeyrat said:

    No, I expect people to have and use COMMON SENSE. And not be pedantic dickweeds.

    I think I found your problem...



  • @blakeyrat said:

    ...blah...blah...projecting poor communication skills and inability to learn from past onto others...blah...blah...

    FTFY



  • @boomzilla said:

    @blakeyrat said:
    ...blah...blah...projecting poor communication skills and inability to learn from past onto others...blah...blah...
    FTFY
    I don't know, I understood what he was saying and assumed you wouldn't be able to install .Net with 8k of memory...

    I think you should look up the word imply.

    Further to that, how clever do you think you would sound if this were an exchange with several people...  I would immediately think you're a pedantic dickweek with superiority complex.  But then again, I'm sure you'll point out just how wrong I am by highlighting a spelling mistake, or something else equally petty.



  • Yeah, I'm going to have to side with blakey on this one.



  • @C-Octothorpe said:

    @boomzilla said:
    @blakeyrat said:
    ...blah...blah...projecting poor communication skills and inability to learn from past onto others...blah...blah...

    FTFY

    I don't know, I understood what he was saying and assumed you wouldn't be able to install .Net with 8k of memory...

    I think you should look up the word imply.

    I'm pretty sure there's no good reason that I should. Are you denying that blakeyrat says stuff like this all the time, and then blows a gasket when someone takes what he wrote at face value? He seems surprised every single time it happens. And he seems to take it personally. Maybe he just has a really really low tolerance for being trolled. Or is looking forward to an early death by heart attack.

    Anyways, now you're implying that on systems beefy enough to install something like .Net, you should always use some memory managed language (by only mentioning resource constraints, not domain or other issues).



  • @blakeyrat said:

    he must have some ulterior motive for typing that little gem of a post
     

     Well, partly it's a minor peeve of mine that so much of today's development work assumes that everyone's happy to throw megabyte upon megabyte of memory at every problem they encounter.

    There was also a smidgen of genuine question in there - if anyone can think of a higher level language which would be a good match for a device like this then I'd like to know about it. I'm aware of Forth and Erlang but off the top of my head I can't think of anything else.

    But I'll admit that part of my motivation was the promise of some interesting frothing.

     



  • @boomzilla said:

    Are you denying that blakeyrat says stuff like this all the time, and then blows a gasket when someone takes what he wrote at face value?

    No.

    @boomzilla said:

    and then blows a gasket when someone takes what he wrote at face value?

    HE'S IN I.T.! WHO THE FUCK IN I.T. DOESN'T HAVE A SHORT TEMPER?!?!

    @boomzilla said:

    He seems surprised every single time it happens. And he seems to take it personally.

    I think it's more that he gets annoyed by people who have nothing better to do than to point out minor foibles.

    @boomzilla said:

    Anyways, now you're implying that on systems beefy enough to install something like .Net, you should always use some memory managed language (by only mentioning resource constraints, not domain or other issues).

    Yes, that's exactly what I meant, clearly...  You must be a *blast* to hang out with in real life.

    I think it's you who is headed to an early death via your tightwadedness.  Relax and just connect the dots yourself sometimes.



  • @boomzilla said:

    I'm pretty sure there's no good reason that I should. Are you denying that blakeyrat says stuff like this all the time, and then blows a gasket when someone takes what he wrote at face value?

    They don't take it at face value, they take it literally. Super-literally, like Data from Star Trek.

    @boomzilla said:

    He seems surprised every single time it happens.

    That's because I don't have the "pedantic gene", so I can't combat it by heading it off at the pass like, for example, Raymond Chen does on his blog.

    @boomzilla said:

    And he seems to take it personally.

    I get mad when people pretend to be stupid.

    @boomzilla said:

    Or is looking forward to an early death by heart attack.

    More likely to be a homicide conviction.

    @boomzilla said:

    Anyways, now you're implying that on systems beefy enough to install something like .Net, you should always use some memory managed language (by only mentioning resource constraints, not domain or other issues).

    I definitely think every piece of software should be written in a memory managed environment. I also think that if a piece of software needs to do a common operation, say a SHA1 hash, they should definitely use a library like .net to do it. That doesn't mean it needs to be .net, though.



  • @C-Octothorpe said:

    @boomzilla said:
    He seems surprised every single time it happens. And he seems to take it personally.

    I think it's more that he gets annoyed by people who have nothing better to do than to point out minor foibles.

    Which makes it really hilarious, because that's his bread and butter around here. Of course, one man's minor foibles is another's

    @C-Octothorpe said:

    @boomzilla said:
    and then blows a gasket when someone takes what he wrote at face value?

    HE'S IN I.T.! WHO THE FUCK IN I.T. DOESN'T HAVE A SHORT TEMPER?!?!

    Absolutely. The problem is that he continually sets himself up for this. Which implies that either he likes to blow his stack over this sort of issue, can't be bothered to learn from the past, or is just a poor enough communicator that he doesn't realize when something he says is going to lead to this situation. And, of course, feeding the trolls with a rant in reply is a fabulous way to discourage this sort of response.

    @C-Octothorpe said:

    I think it's you who is headed to an early death via your tightwadedness. Relax and just connect the dots yourself sometimes.

    I think you must be confusing me with someone else.

    @blakeyrat said:

    That's because I don't have the "pedantic gene", so I can't combat it by heading it off at the pass like, for example, Raymond Chen does on his blog.

    You probably do, otherwise why would you hang out here? Seriously, people who program computers pretty much have to be good at being pedantic to be successful. Of course, we're not all pedantic about the same things.



  • @blakeyrat said:

    They don't take it at face value, they take it literally. Super-literally,

    Oh yeah, hot shot? I challenge you to describe to me the difference between literally and "super-literally".

    @blakeyrat said:

    like Data from Star Trek.

    BUT DATA WAS FICTIONAL!!!

    @blakeyrat said:

    I get mad when people pretend to be stupid.

    Oh :( sorry, ignore this post...



  • @C-Octothorpe said:

    I don't know, I understood what he was saying and assumed you wouldn't be able to install .Net with 8k of memory...
    Yeah, looks like you need 16k (if I read the specs properly).



  • @ender said:

    @C-Octothorpe said:
    I don't know, I understood what he was saying and assumed you wouldn't be able to install .Net with 8k of memory...
    Yeah, looks like you need 16k (if I read the specs properly).
    That must be the compact framework then... ;P



  • @C-Octothorpe said:

    WHO THE FUCK IN I.T. DOESN'T HAVE A SHORT TEMPER?!?!


    I rarely post here.  How do you know this about me?  And why is it so amazing you write it in all caps?  Have you worked with me?  I guess that could explain it, because it seems everyone I work with does have a very short temper.  I'm not sure why, though.


    Still, I'm quite honored that I'm remembered here, despite my infrequent posting.  :)



  • @Who_the_Fuck said:

    I'm quite honored that I'm remembered here, despite my infrequent posting.
    Almost every day I hear someone say "Who The Fuck did this stupid shit".

     



  • @El_Heffe said:

    @Who_the_Fuck said:

    I'm quite honored that I'm remembered here, despite my infrequent posting.
    Almost every day I hear someone say "Who The Fuck did this stupid shit".

    Who The Fuck's on first?




  • @DaveK said:

    @El_Heffe said:

    @Who_the_Fuck said:

    I'm quite honored that I'm remembered here, despite my infrequent posting.
    Almost every day I hear someone say "Who The Fuck did this stupid shit".

    Who The Fuck's on first?
    Naturally



  • @C-Octothorpe said:

    @DaveK said:

    @El_Heffe said:

    @Who_the_Fuck said:

    I'm quite honored that I'm remembered here, despite my infrequent posting.
    Almost every day I hear someone say "Who The Fuck did this stupid shit".

    Who The Fuck's on first?
    Naturally

    So I throw the ball to naturally?


  • @Sutherlands said:

    @C-Octothorpe said:

    @DaveK said:

    @El_Heffe said:

    @Who_the_Fuck said:

    I'm quite honored that I'm remembered here, despite my infrequent posting.
    Almost every day I hear someone say "Who The Fuck did this stupid shit".

    Who The Fuck's on first?
    Naturally

    So I throw the ball to naturally?
    No, you hit Who The Fuck with the bat! Then steal his wallet...
