The Firefox fiasco



  • @jamesthedeveloper said:

    What the hell is wrong with you people? I'm currently running FF3.0 with 42 tabs in 2 windows (with TMP, Live HTTP Headers, Web Developer Toolbar, Download Them All and Read It Later). The Core 2 Duo CPU is idling and memory usage is 355MB of the 4GB(ish) RAM available on my 32-bit XP Pro...

     

    Amen, with 10 tabs open, and a CRAP load of plugins installed (belgarion should remember giving me shit for them) I am using 169,108 K of memory on a 5 year old laptop.



  • @dtech said:

    Even the OS cannot fully access the 4GB of RAM. To be fair, 3GB may not be the exact number, though.

    The cause is that a portion of the 32-bit virtual address space (usually 1GB on modern systems) gets reserved for memory-mapped I/O. AFAIK this is never less than 512MB on modern systems, and can be up to 1.5GB, but is usually 1GB. That is why even a 32-bit OS cannot use all 4GB of RAM.

    This is an artificial limitation, for client SKUs only. Server 32-bit SKUs (Windows 2003, 2008+) can use all 4GB and more, because the "cut" part is remapped by the chipset above 4GB physical.
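
    (As a quick back-of-the-envelope illustration of the split described above; the 1GB MMIO figure is a typical assumption, and the exact number varies by chipset and devices:)

        # rough 32-bit client address-space arithmetic, illustrative numbers only
        address_space = 4 * 2**30      # 2^32 bytes of addressable space
        mmio_reserved = 1 * 2**30      # typically 0.5-1.5GB for PCI/graphics apertures, firmware, etc.
        usable_ram = address_space - mmio_reserved
        print(usable_ram / 2**30, "GB of RAM visible to a 32-bit client OS")   # -> 3.0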


  • @amischiefr said:

    belgarion should remember giving me shit for them
    I don't really recall that.  Were we drinking at the time?  Had to be.



  •  @roothorick said:

    @tgape said:

    You (meaning nocturnal) might think it would be nice if there was a way an app could specifically request memory which could be ripped from it with no prior notice, or otherwise mark blocks of its memory as such, and an API for the kernel to let the app know it's been taken.

    I dunno. I think it could be done rather well.

    <long complex description>

    That would be too confusing and complex. Between that and the extra complexity it would add to threading, I don't think anyone would use it. I think the most straightforward and simple way would be a signal that a program could catch from the OS saying, "Please give up some memory", and then a system call to find out how much memory the OS would like back. Then the program can free memory if it wants, or not.
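
    (A minimal sketch of that signal-plus-syscall idea, in Python on a Unix-like system. SIGUSR1 stands in for the hypothetical "please give up some memory" signal, and ask_os_reclaim_target() is invented; no such system call actually exists.)

        import signal

        cache = {}   # the application's discardable data

        def ask_os_reclaim_target():
            # Hypothetical system call: "how many bytes would you like back?"
            return 64 * 2**20            # pretend the OS asked for 64 MB

        def on_memory_pressure(signum, frame):
            target = ask_os_reclaim_target()
            freed = 0
            # The program decides whether to free anything at all.
            while cache and freed < target:
                _, value = cache.popitem()
                freed += len(value)

        signal.signal(signal.SIGUSR1, on_memory_pressure)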



  • @alegr said:

    This is an artificial limitation, for client SKUs only.
    There's a good reason it's there, and not just for price points.[quote user="Mark Russinovich"][When Microsoft enabled Physical Address Extension on Windows XP machines...] What they found was that many of the systems would crash, hang, or become unbootable because some device drivers, commonly those for video and audio devices that are found typically on clients but not servers, were not programmed to expect physical addresses larger than 4GB. As a result, the drivers truncated such addresses, resulting in memory corruptions and corruption side effects.
    Server systems commonly have more generic devices with simpler and more stable drivers, and therefore hadn't generally surfaced these problems. The problematic client driver ecosystem led to the decision for client SKUs to ignore physical memory that resides above 4GB, even though they can theoretically address it.[/quote]



  • @belgariontheking said:

    @amischiefr said:

    belgarion should remember giving me shit for them
    I don't really recall that.  Were we drinking at the time?  Had to be.

     

    You're not sober now are you?  Please, say it isn't so...  I don't want to be the only one drunk at work again :(



  • @amischiefr said:

    @belgariontheking said:
    @amischiefr said:
    belgarion should remember giving me shit for them
    I don't really recall that.  Were we drinking at the time?  Had to be.
    You're not sober now are you?  Please, say it isn't so...  I don't want to be the only one drunk at work again :(
    Glad to see someone else is using my Drinking algorithm.



  • @tgape said:

    The OS is in a position to know how much memory it can use for low priority tasks without otherwise interfering with the system, and the OS can readily give up any such memory at a moment's notice.

    Yes it can, and Firefox cannot of course, but Firefox on my system was using < 400MB which is less than many other programs running on my system (IE, Outlook [wtf?], Visual Studio, Sql Management Studio) as well as using less than other browsers that people actually use.  I'm not that worried about Firefox using up memory unless it becomes truly exorbitant, and 400MB is merely 10% of my system memory - after all, I bought 4GB of memory so I could use 4GB of memory.

     @tgape said:

    Firefox is just asking for it to be swapped out.

    Not a problem here, I run ALL of my computers without a page file.  I always thought of the pagefile as a hack to get stuff to run back in the DOS days when 1MB of memory was $100.   Now that 4GB of memory can be had for $25, why use a pagefile, when you could just as easily buy another 4GB and not have one?  The only problem I ever have is that Photoshop complains but works anyway.

     @tgape said:

    You (meaning nocturnal) might think it would be nice if there was a way an app could specifically request memory which could be ripped from it with no prior notice,

    like ASP.NET's System.Web.Caching?  Yeah, that would be nice; in fact, with System.Web.Caching you can set cache priority and configure expiration and expiration conditions.  Furthermore, a majority of Windows applications are probably written in either .NET or Java, both of which support garbage collection.  Both VMs could (and maybe do) monitor system memory usage, and force a global System.GC() in the event of a low memory condition, to clean up memory and expire cached items.
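
    (Not the .NET System.Web.Caching API, just the same idea sketched in Python: cached items carry a priority and an expiry, and a low-memory check evicts low-priority items and then forces a collection. low_memory() is a placeholder for a real check.)

        import gc, time, heapq

        class PriorityCache:
            def __init__(self):
                self._items = {}                      # key -> (priority, expires_at, value)

            def add(self, key, value, priority=1, ttl=300):
                self._items[key] = (priority, time.time() + ttl, value)

            def get(self, key):
                entry = self._items.get(key)
                if entry and entry[1] > time.time():
                    return entry[2]
                self._items.pop(key, None)            # expired or missing
                return None

            def trim(self, keep_fraction=0.5):
                # drop expired items, then keep only the highest-priority remainder
                now = time.time()
                live = [(p, k) for k, (p, exp, _) in self._items.items() if exp > now]
                keep = heapq.nlargest(int(len(live) * keep_fraction), live)
                self._items = {k: self._items[k] for _, k in keep}

        def low_memory():
            return False                              # placeholder for a real free-RAM check

        cache = PriorityCache()
        if low_memory():
            cache.trim()
            gc.collect()                              # rough analogue of forcing a global GC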

     @tgape said:

    any application which uses gratuitous amounts of memory for long periods of time for its own performance gains

    On the other hand, Firefox uses less than other common browsers, and I don't consider 400MB to be too bad, since I have another 2000MB that Windows can't even figure out what to do with, and other applications (read: Outlook) manage to use more without doing as much as Firefox. Even Vista, the 'memory hog', can't manage to fill 4GB of RAM.  Additionally, when I open Firefox, it's because I'm using it, and I use it more than any other application on the system by a wide margin.  If I absolutely want to minimize the amount of allocated RAM (like because I'm going to play a game), I just close it.

     



  • @nocturnal said:

    Yes it can, and Firefox cannot of course, but Firefox on my system was using < 400MB which is less than many other programs running on my system (IE, Outlook [wtf?], Visual Studio, Sql Management Studio) as well as using less than other browsers that people actually use.  I'm not that worried about Firefox using up memory unless it becomes truly exorbitant, and 400MB is merely 10% of my system memory - after all, I bought 4GB of memory so I could use 4GB of memory.

    My problem is that 400MB makes FF itself slow to a crawl.  Since it's due to memory leaks, it gets worse the longer FF stays open.  Eventually it can use 500+MB with only one tab open, which is just absurd.

     

    @nocturnal said:

    Not a problem here, I run ALL of my computers without a page file.  I always thought of the pagefile as a hack to get stuff to run back in the DOS days when 1MB of memory was $100.   Now that 4GB of memory can be had for $25, why use a pagefile, when you could just as easily buy another 4GB and not have one?  The only problem I ever have is that Photoshop complains but works anyway.

    Not that wise.  Even with 8GB of memory, the OS will still find things to use it for (disk cache, for instance).  Swap files/partitions provide a place to store unused (but allocated) memory so that the physical RAM can be used to store something more useful.  Another very good reason to use swap is that out-of-control programs that try to allocate unlimited memory get slowed up, giving you some breathing room to handle it.  When the OS starts swapping like mad you know something is up and have a chance to fix things before it gets out-of-hand.  Otherwise you risk apps dying or the OS collapsing.



  •  @nocturnal said:

    Not a problem here, I run ALL of my computers without a page file.  I always thought of the pagefile as a hack to get stuff to run back in the DOS days when 1MB of memory was $100.   Now that 4GB of memory can be had for $25, why use a pagefile, when you could just as easily buy another 4GB and not have one?  The only problem I ever have is that Photoshop complains but works anyway.

    In my experience, XP tends to start acting up if you leave it running long enough without a page file, even if you have plenty of RAM free.  This is because Windows tries to page out things that haven't been used for X amount of time, even if there is plenty of free memory.  I haven't had enough experience with Vista or 7 to know how they act with no pagefile.  (Anecdotal evidence doesn't apply to everyone, YMMV, etc.)

    That said, it behaves just fine if your pagefile is tiny, so I've made it a habit to set my pagefile to 512MB (or 256MB on systems with tiny drives).  512MB of extra usage on a 1TB drive is cheaper than buying more RAM ;)

    Then again, my new system has 12GB RAM, so... it probably doesn't matter.



  • @Heron said:

    my new system has 12GB RAM
    What percentage of that do you generally use?  I have 6GB and have never run out, and have only gone over 4GB a handful of times since I got the machine in February.  Even that was while running a game without shutting anything else down beforehand (aka leaving firefox, chrome, utorrent, etc. running).

    The point of having tons of RAM is never to run out, which you will certainly never have a problem with, but 12 Gig?  I fear you may have purchased something like two times as much as you'll use within the next couple of years.



  • @belgariontheking said:

    The point of having tons of RAM is never to run out, which you will certainly never have a problem with, but 12 Gig?  I fear you may have purchased something like two times as much as you'll use within the next couple of years.

    I certainly hope you're right... I'd hate to have to upgrade a machine I spent $2k on after just a year or two.

    (I figured if I was going to spend so much I may as well get enough to last a long while.)



  •  Double-posting to say this, but I thought I'd add that I did manage to actually use more than 6GB of RAM.  I set up a 9GB ramdisk in Gentoo to compile OpenOffice (with 8 jobs in parallel, of course).  Rather than take four hours to compile (as was usual on my old core 2 duo with 2GB RAM with compilation happening on the hard drive), it took all of 45 minutes.

    Yeah yeah, I know what some of you are going to say - not worth compiling OOo, etc etc.  Not interested ;)



  • @Heron said:

     Double-posting to say this, but I thought I'd add that I did manage to actually use more than 6GB of RAM.  I set up a 9GB ramdisk in Gentoo to compile OpenOffice (with 8 jobs in parallel, of course).  Rather than take four hours to compile (as was usual on my old core 2 duo with 2GB RAM with compilation happening on the hard drive), it took all of 45 minutes.

    Yeah yeah, I know what some of you are going to say - not worth compiling OOo, etc etc.  Not interested ;)

    I didn't even know it was worth using OOo, let alone compiling it.



  • @belgariontheking said:

    The point of having tons of RAM is never to run out, which you will certainly never have a problem with, but 12 Gig?  I fear you may have purchased something like two times as much as you'll use within the next couple of years.

     

    Oh what was that famous quote about ram, something like: "nobody would ever need more than 64K of memory"

     



  • @morbiuswilters said:

    @Heron said:

     Double-posting to say this, but I thought I'd add that I did manage to actually use more than 6GB of RAM.  I set up a 9GB ramdisk in Gentoo to compile OpenOffice (with 8 jobs in parallel, of course).  Rather than take four hours to compile (as was usual on my old core 2 duo with 2GB RAM with compilation happening on the hard drive), it took all of 45 minutes.

    Yeah yeah, I know what some of you are going to say - not worth compiling OOo, etc etc.  Not interested ;)

    I didn't even know it was worth using OOo, let alone compiling it.

    We've been over this: OOo is for the poor, the cheap, and the /.ers. Every time I use OOo, I feel like I'm using some cheap Chinese knockoff, and I'm at least half right.


  • @amischiefr said:

    @belgariontheking said:

    The point of having tons of RAM is never to run out, which you will certainly never have a problem with, but 12 Gig?  I fear you may have purchased something like two times as much as you'll use within the next couple of years.

     

    Oh what was that famous quote about ram, something like: "nobody would ever need more than 64K of memory"

    I think it was Linus Torvalds, but he was referring to the PDP-11.



  • @bstorer said:

    We've been over this: OOo is for the poor, the cheap, and the /.ers. Every time I use OOo, I feel like I'm using some cheap Chinese knockoff, and I'm at least half right.

    I agree with you, if and only if you do anything more complicated than throwing numbers on a grid, maybe with a few sums, or writing simple documents with little formatting.  OpenOffice is perfectly functional for basic tasks.

    It's also useful for resurrecting ancient WordPerfect documents.  I guess Word can do that too, but why pay for Word when you can resurrect the document for free?

    That said, I think OOo would benefit greatly from a UI makeover; it's quite functional if you take the time to figure out *how* to use it.



  • @morbiuswilters said:

    I think it was Linus Torvalds, but he was referring to the PDP-11.

    Yeah, but the point is that 10 years from now we'll look at somebody and say: "You only have 12GB of RAM?  You poor soul, how do you even run Windows 2019!!"


  • @amischiefr said:

    @morbiuswilters said:

    I think it was Linus Torvalds, but he was referring to the PDP-11.

    Yeah, but the point is that 10 years from now we'll look at somebody and say: "You only have 12GB of RAM?  You poor soul, how do you even run Windows 2019!!"
    Well sure, but by then computers will be twice as fast, 10,000 times larger, and so expensive that only the five richest kings of Europe will own them.


  • @morbiuswilters said:

    @benryves said:

    @morbiuswilters said:
    ...and implement a one-process-per-page feature like Chrome has.
    Wasn't that an IE feature before Chrome?

    True, but when a page crashes in IE it takes down the whole browser.  Supposedly this does not happen in Chrome.

     

    It doesn't happen in IE8 either, and the public beta of IE8 came out slightly before the first public beta of Chrome. So, technically, people were exposed to the feature first in IE8, then in Chrome shortly afterwards. No idea which company started developing it first, though. It hardly matters, since all browsers will have it in another year.



  • @morbiuswilters said:

    Even with 8GB of memory, the OS will still find things to use it for (disk cache, for instance).

    So let's say I have two equivalent computers, one with 4GB RAM and a 4GB pagefile, and another with 8GB of RAM with no pagefile, both machines will be able to put 8GB into virtual memory, but on the one w/ 8GB all of it is physical.  Which do you think is going to perform better?

    The way I see it, the one with all physical RAM can read and process everything in RAM (let's say it's DDR3 1600MHz / PC3 12800) 1.33 times in one second.  The other computer would take over 40 seconds to do that, even if the pagefile is the first partition on a WD Raptor and every page was read sequentially in the most efficient way possible.  Not having a pagefile feels faster too.  I never wait for the system to read back pages from the hard drive after gaming or anything.  It's awesome to alt-tab out of a game and have Windows appear, completely usable, instantly. Disk performance should also increase, since actual disk IO will never be interrupted by pagefile IO, and I'd assume having no pagefile would greatly increase the life of the hard drive, since it would be used less.
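
    (Back-of-the-envelope numbers behind that comparison, using nominal PC3-12800 bandwidth and an optimistic ~100MB/s sequential HDD read; both figures are assumptions:)

        ram_bw  = 12.8e9                  # bytes/s, nominal PC3-12800
        disk_bw = 100e6                   # bytes/s, optimistic sequential HDD read (assumed)
        gb = 2**30

        all_in_ram = 8 * gb / ram_bw                          # ~0.7 s to stream 8GB once
        half_paged = 4 * gb / ram_bw + 4 * gb / disk_bw       # ~43 s when 4GB lives in the pagefile
        print(f"all in RAM: {all_in_ram:.1f} s per pass; 4GB paged: {half_paged:.0f} s per pass")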

    @morbiuswilters said:

    Another very good reason to use swap is that out-of-control programs that try to allocate unlimited memory get slowed up, giving you some breathing room to handle it.

    Hmm...I can't recall this happening...wouldn't it just instantly run out of memory and windows would kill the process?  Maybe that's why I don't remember it happening...windows just killed the process and I just thought it crashed?



  • @nocturnal said:

    So let's say I have two equivalent computers, one with 4GB RAM and a 4GB pagefile, and another with 8GB of RAM with no pagefile, both machines will be able to put 8GB into virtual memory, but on the one w/ 8GB all of it is physical.  Which do you think is going to perform better?

    That's not the point.

    The point is, if you have two equivalent computers, one with 4GB of RAM and one with 4GB of RAM plus a 4GB page file, which do you think is more resistant to out-of-memory errors?



  • @nocturnal said:

    Hmm...I can't recall this happening...wouldn't it just instantly run out of memory and windows would kill the process?  Maybe that's why I don't remember it happening...windows just killed the process and I just thought it crashed?

     

    What if it kills other processes or other processes can no longer get memory?



  • @nocturnal said:

    @tgape said:
    The OS is in a position to know how much memory it can use for low priority tasks without otherwise interfering with the system, and the OS can readily give up any such memory at a moment's notice.

    Yes it can, and Firefox cannot of course, but Firefox on my system was using < 400MB which is less than many other programs running on my system (IE, Outlook [wtf?], Visual Studio, Sql Management Studio) as well as using less than other browsers that people actually use.  I'm not that worried about Firefox using up memory unless it becomes truly exorbitant, and 400MB is merely 10% of my system memory - after all, I bought 4GB of memory so I could use 4GB of memory.

    It does tend to become truly exorbitant: on my home computer, Firefox is currently using 1.4GB to display two tabs: Gmail and Pandora.



  • @Heron said:

    @nocturnal said:

    So let's say I have two equivalent computers, one with 4GB RAM and a 4GB pagefile, and another with 8GB of RAM with no pagefile, both machines will be able to put 8GB into virtual memory, but on the one w/ 8GB all of it is physical.  Which do you think is going to perform better?

    That's not the point.

    The point is, if you have two equivalent computers, one with 4GB of RAM and one with 4GB of RAM plus a 4GB page file, which do you think is more resistant to out-of-memory errors?

    Precisely.  I'm not saying "skimp on RAM", I'm saying "buy the same amount of RAM but go ahead and dedicate 1GB to page/swap".  It doesn't hurt, it can sometimes help, and disk space is so cheap that 1GB hardly matters.



  • @bstorer said:

    Well sure, but by then computers will be twice as fast, 10,000 times larger, and so expensive that only the five richest kings of Europe will own them.

    Well, sure, if you run Windows...



  • @tster said:

    That would be too confusing and complex. Between that and the extra complexity it would add to threading, I don't think anyone would use it. I think the most straightforward and simple way would be a signal that a program could catch from the OS saying, "Please give up some memory", and then a system call to find out how much memory the OS would like back. Then the program can free memory if it wants, or not.

    In the old days, this made sense. Systems were hampered by all kinds of artificial limits and could detect when system pools were getting very close to those limits. See, for example, the long-obsolete 'WM_COMPACTING' notification.

    Interestingly, this is now an extremely difficult thing to do. For one thing, determining whether a system is low on memory is extremely difficult. Worse, many of the things most programs would do in response to such a notification would only aggravate the problem. For example, if you detect low memory by looking for excessive paging, and programs respond by discarding caches, the net effect may be that the discarded cache data simply needs to be paged back in.

    However, even though it is not a trivial problem, it does seem like it would be one worth solving or at least working on. Processes could register for differing categories of notifications and could let the system know what they are capable of doing. For example, a process can say "I can discard cached data that I am unlikely to need again shortly, let me know if you're paging excessively". Or "I am part of a cluster, and I can load shed to other systems if your CPU load is high or your memory load is high." Or even, "I'm an unimportant task that needs lots of memory, let me know if now's a bad time to run". (And 'nice' or scheduling priorities don't do this. In fact, low priority may make you use up the memory for longer or have more of your memory paged out between timeslices leading to more disk I/O.)
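
    (A sketch of what such a registration interface might look like, in Python; the names are invented and nothing like this exists as a standard OS API:)

        from enum import Enum, auto

        class Pressure(Enum):
            PAGING_HEAVY = auto()     # "you're paging excessively, consider discarding caches"
            CPU_HIGH     = auto()     # "CPU load is high, shed load to other systems if you can"
            MEMORY_HIGH  = auto()     # "now is a bad time to start memory-hungry work"

        _subscribers = {}             # Pressure -> list of callbacks

        def register(kind, callback):
            # a process declares what it is capable of doing about a given condition
            _subscribers.setdefault(kind, []).append(callback)

        def notify(kind):
            # called by the (imaginary) resource manager when the condition holds
            for cb in _subscribers.get(kind, []):
                cb()

        # e.g. a cache-heavy process:
        register(Pressure.PAGING_HEAVY, lambda: print("discarding cold cache entries"))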



  • @joelkatz said:

    However, even though it is not a trivial problem, it does seem like it would be one worth solving or at least working on. Processes could register for differing categories of notifications and could let the system know what they are capable of doing. For example, a process can say "I can discard cached data that I am unlikely to need again shortly, let me know if you're paging excessively". Or "I am part of a cluster, and I can load shed to other systems if your CPU load is high or your memory load is high." Or even, "I'm an unimportant task that needs lots of memory, let me know if now's a bad time to run". (And 'nice' or scheduling priorities don't do this. In fact, low priority may make you use up the memory for longer or have more of your memory paged out between timeslices leading to more disk I/O.)
    You make computers sound so .... British.  I'm afraid that under your direction, every process would be allowed out for tea in the afternoon and I could get nothing done.  This may be alright if the tea stops on its way back to pick up some porn.



  • You're forgetting the most important lesson that it took Microsoft a decade to learn: programmers are dickwads.

    If you let the programmer tell the OS what resources it needs, it'll always say "all of them". Don't picture your system being used by, say, Linus Torvalds, picture it being used by the douchebags who made RealPlayer in 1999.



  • @belgariontheking said:

    @joelkatz said:

    However, even though it is not a trivial problem, it does seem like it would be one worth solving or at least working on. Processes could register for differing categories of notifications and could let the system know what they are capable of doing. For example, a process can say "I can discard cached data that I am unlikely to need again shortly, let me know if you're paging excessively". Or "I am part of a cluster, and I can load shed to other systems if your CPU load is high or your memory load is high." Or even, "I'm an unimportant task that needs lots of memory, let me know if now's a bad time to run". (And 'nice' or scheduling priorities don't do this. In fact, low priority may make you use up the memory for longer or have more of your memory paged out between timeslices leading to more disk I/O.)
    You make computers sound so .... British.  I'm afraid that under your direction, every process would be allowed out for tea in the afternoon and I could get nothing done.  This may be alright if the tea stops on its way back to pick up some porn.

    Don't forget the bad teeth, mediocre socialized medicine, a privacy-free police state and mediocre food.

     

    "Oy, guv, I was jes' about to allocate meself some 'eap when I made a lil' stop at the ole' pub for some bangers and spott'd dick and a pint to wash it all down with!  And Jaysus help me if mah teeth didn' fall to pieces when I bit innit!"



  • @morbiuswilters said:

    "Oy, guv, I was jes' about to allocate meself some 'eap when I made a lil' stop at the ole' pub for some bangers and spott'd dick and a pint to wash it all down with!  And Jaysus help me if mah teeth didn' fall to pieces when I bit innit!"
    NEEDZ MOUAR SUPERFLUOUUS "U"!



  •  I love you guys. I try to be serious and you turn it into something laugh out loud funny.



  • "Please give up some memory or I'll kill kick your face in" is approximately the signal the iPhone OS sends to apps when it needs to free up some memory. I think you'd need an implied threat (program suspension, termination) in order to have such a feature provide any value to the end user.



  • @nocturnal said:

    Not a problem here, I run ALL of my computers without a page file.
     

    I tried that way back in 2001 or so: my Windows 98 machine had 1GB RAM and turning off swap made it run faster, especially when switching to an application that hadn't been used for a few minutes. But I found I couldn't run Quake 2! It specifically requested swap space and crashed when it couldn't get any. It only needed 16MB or something...




  • @nocturnal said:

    Furthermore, a majority of Windows applications are probably written in either .NET or Java, both of which support garbage collection.  Both VMs could (and maybe do) monitor system memory usage, and force a global System.GC() in the event of a low memory condition, to clean up memory and expire cached items.

    That's just exactly what I need: right when my system runs out of memory due to a bunch of poorly written apps, those poorly written apps then seize the processor and memory bandwidth to utilize *everything*, rather than just spiking the disk I/O.

    Note that my problem with GC isn't the time it spends releasing memory back to the OS.  It's the time it spends evaluating all of the objects that it is not going to release back to the OS - and, as system memory becomes progressively more full of real data, the amount of time these systems spend contemplating objects they aren't going to free only grows.

    Of course, I do need to admit, I'm out of touch with the current state of the art here - I wrote Java's GC off as bad business back before MS even had a Java fork, let alone called it one.  I also fully admit my memory constraints are of my own choosing - when I bought my last computer, I could've gotten an 8G behemoth with dual quad core procs and a 1200W power supply.  I instead chose to purchase an 800MHz model with only 256M and a 40W power supply.  It's a mite quieter.

    My box serves my needs just fine - but, of course, Firefox is one of those apps I don't actually run on it, because it uses too much memory.  I wouldn't be surprised to find that every other piece of software you (meaning, once again, nocturnal) run is also in that category.  (Yeah, I know a bunch of y'all run Linux, so there'll be a lot of software in common there.  mv, ls, and all that stuff.  But nocturnal doesn't strike me as that sort.)



  • @Heron said:

    @nocturnal said:

    So let's say I have two equivalent computers, one with 4GB RAM and a 4GB pagefile, and another with 8GB of RAM with no pagefile, both machines will be able to put 8GB into virtual memory, but on the one w/ 8GB all of it is physical.  Which do you think is going to perform better?

    That's not the point.

    The point is, if you have two equivalent computers, one with 4GB of RAM and one with 4GB of RAM plus a 4GB page file, which do you think is more resistant to out-of-memory errors?

    As my coworkers demonstrated a few years back, the real question is, which machine runs faster:

    1. A machine with 16G RAM and 6G swap on one drive
    2. A machine with 8G RAM and 36G swap striped across 18 drives
    3. A machine with 16G RAM and 6G swap, software mirrored onto two drives (Solaris 8)

    Now run a process on them which operates on an 18G in memory hash table...



  • @tgape said:

    @Heron said:

    @nocturnal said:

    So let's say I have two equivalent computers, one with 4GB RAM and a 4GB pagefile, and another with 8GB of RAM with no pagefile, both machines will be able to put 8GB into virtual memory, but on the one w/ 8GB all of it is physical.  Which do you think is going to perform better?

    That's not the point.

    The point is, if you have two equivalent computers, one with 4GB of RAM and one with 4GB of RAM plus a 4GB page file, which do you think is more resistant to out-of-memory errors?

    As my coworkers demonstrated a few years back, the real question is, which machine runs faster:

    1. A machine with 16G RAM and 6G swap on one drive
    2. A machine with 8G RAM and 36G swap striped across 18 drives
    3. A machine with 16G RAM and 6G swap, software mirrored onto two drives (Solaris 8)

    Now run a process on them which operates on an 18G in memory hash table...

     

    We would have to know quite a bit more info about the program.  For instance:

    1. Are most of the queries on the hash table similar (meaning, getting the same data)

    2. Are there large sections of the data which are very rarely used?

    3. Is the hash table that big because it has so many entries or because the data inside it is so big?

    4. Why are you using a hash table instead of a database?

    5. Do you care about start up time?



  • @tster said:

    @tgape said:

    As my coworkers demonstrated a few years back, the real question is, which machine runs faster:

    1. A machine with 16G RAM and 6G swap on one drive
    2. A machine with 8G RAM and 36G swap striped across 18 drives
    3. A machine with 16G RAM and 6G swap, software mirrored onto two drives (Solaris 8)

    Now run a process on them which operates on an 18G in memory hash table...

    We would have to know quite a bit more info about the program.  For instance:

    1. Are most of the queries on the hash table similar (meaning, getting the same data)

    There is only one type of data in the hash.  There is only one type of key used to index it.  The hash keys are 20 characters long, and match the regex /^[0-9A-Za-y]{16}[0-9]{4}$/.  Values are not changed after they are set.

    @tster said:

    2. Are there large sections of the data which are very rarely used?

    No.  Key/value pairs are purged from the hash after they are no longer useful.  Approximately 0.1% of the keys miss that identification point, but these records are purged at regular intervals.  (Input is separated into different files.  All keys are invalid four files after the first file in which they occur; the process purges any keys remaining after completing the fifth file.  Jobids include a file code, and untracked jobids encountered are checked to verify their file code is within the proper range; failure aborts processing.)
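
    (A sketch, in Python rather than the Perl described here, of that purge rule; treating the trailing four digits of the jobid as the file code is my own assumption from the regex above:)

        WINDOW = 4                                    # keys go invalid four files after first seen

        def purge_stale(table, first_seen, finished_file):
            # after finishing file N, drop anything first seen in file N - WINDOW or earlier
            for key in [k for k, f in first_seen.items() if finished_file - f >= WINDOW]:
                table.pop(key, None)
                del first_seen[key]

        def check_file_code(jobid, current_file):
            # untracked jobids must carry a file code inside the valid window, else abort
            file_code = int(jobid[-4:])               # assumption: trailing digits are the file code
            if not (current_file - WINDOW < file_code <= current_file):
                raise RuntimeError("jobid %r has an out-of-range file code %d" % (jobid, file_code))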

    @tster said:

    3. Is the hash table that big because it has so many entries or because the data inside it is so big?

    All hash keys and values are simple perl scalar strings. Key lengths are 20 characters, average value length is approximately 30 characters.  This seems pretty small to me.  There are usually more than 100k jobs in the hash table at any time.

    @tster said:

    4. Why are you using a hash table instead of a database?

    The code was easy, and using a database would guarantee all of the values would get written to disk.  As no value would remain for more than 2% of the life of the process, and relatively few would even last that long, the permanence of a database would've provided no benefit.  Using an in-memory hash meant that there was a chance the shorter-lived keys would never make it to disk.

    It would have been possible to write the data to a database initially, but the storage requirements for database versions of these log files for the period of time we keep the flat files around would be prohibitive.  (In the grand scheme of things, it's not that much - merely five times the amount which management was willing to pay for storage.)  There had been some testing with Splunk, which found it feasible to load the data into a database in real time, but the processing time to do so was several times the processing time to run this script on the same data.  The fact that we'd be able to get near instantaneous results from Splunk did not apparently convince management it was worth the investment.

    @tster said:

    5. Do you care about start up time?

    Only as much as it is part of the total run time.  If "startup" took 99% of the total execution time, but that total execution time was 10 minutes less, I'd consider it a win.

    Edit: also note: questions 2 and 3 are very apt. The keys are fairly evenly used and evenly distributed, so the entire hash table is actually in use, and each lookup therefore approaches a one in nine chance of causing a page fault on the 16G machines, and a better than 50 percent chance on the 8G machine. The rest of the questions, though, miss the point of my question; they would only be useful if you were trying to fix the process, rather than simply determine the faster machine.
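
    (The arithmetic behind those figures, treating lookups as uniform over the 18G working set and ignoring OS overhead:)

        working_set_gb = 18

        for ram_gb in (16, 8):
            resident_gb = min(ram_gb, working_set_gb)
            p_fault = (working_set_gb - resident_gb) / working_set_gb
            print("%2d GB RAM: ~%.0f%% chance a given lookup faults" % (ram_gb, p_fault * 100))
        # -> 16 GB: ~11% (about one in nine); 8 GB: ~56%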

