What's worse than a web host failing to keep your site up?



  • A web host that fails to keep its own site up:

    Warning: main(): open_basedir restriction in effect. File(../LIVE/etc/config.inc) is not within the allowed path(s): (/var/www/vhosts/onestop.net/httpdocs:/tmp) in /var/www/vhosts/onestop.net/httpdocs/index.php on line 3

    Warning: main(../LIVE/etc/config.inc): failed to open stream: Operation not permitted in /var/www/vhosts/onestop.net/httpdocs/index.php on line 3

    Warning: main(): open_basedir restriction in effect. File(../LIVE/etc/config.inc) is not within the allowed path(s): (/var/www/vhosts/onestop.net/httpdocs:/tmp) in /var/www/vhosts/onestop.net/httpdocs/index.php on line 3

    Warning: main(../LIVE/etc/config.inc): failed to open stream: Operation not permitted in /var/www/vhosts/onestop.net/httpdocs/index.php on line 3

     (etc etc)

    WTF?

    So, these guys aren't my actual web host, but I do have my domain registered through them. I'm trying to switch it to point at a new server. Two days ago, I found that I couldn't log in - not because it was rejecting my login, but because the login SERVER wouldn't respond.

    I don't think this was a problem for most customers - because OneStop uses two different login servers, one for recent customers, and one for customers who signed up before 2007. WTF? Obviously, I didn't know about this when I first registered the domain there in 2005...

    I should have been clued in to their degradation in service, I suppose, when I tried to renew last fall, and found that they weren't taking credit cards, due to "a problem" (as opposed to their not being able to take credit cards because of a solution, apparently). I had to paypal them eleven dollars. WTF? 

    At any rate, I got the above errors when I tried to have another go at logging in today (I tried to do a 'liveperson' chat before, but, oddly, it didn't work - WTF?).

    At this point I decide it's about time to give them a call. I call the 800 number and get... "This number is not in operation." WTF? So I try the normal number. Miraculously, it works. Double-miraculously, I get a rep in nothing flat.

    So, I talk to the rep:

    (rep takes information)
     
    Rep: "And, uh, sir, you say that you're having trouble with the web site, logging in?"

    Me: "Yeah. The whole thing is gone."

    Rep: "OK, sir, I'm going to check that right now..." (obviously expecting it to be fine)

    *pause*

    Rep: "Er, um, OK, er... ok, sir.. um, it does... er, it does appear that the web site is down... um... and apparently the... the system that would, er, notify us... is not active..."

    WTF? You heard it right, folks - the web host didn't even realize that their own web site wasn't working, and the system which would tell them that their system was not working, was not working.

    Oh, and the real WTF? I went to check my wife's web site, which is hosted with them. It's just fine.
     
     
     


  • Making their own site unusable is a WTF - not testing new settings / code is really bad - but this is not. It's just an issue that a user should report:

    @PeriSoft said:

    web host didn't even realize that their own web site wasn't working, and the system which would tell them that their system was not working, was not working.

    Well. If all our monitoring systems were down, and our front website was down... I would expect a customer to be the first to notify us about the problem. Maybe they have their internal systems working, so no one needs to look at the customers' version of their site. Did you expect some guy sitting in the office, refreshing their own site the whole day and yelling "IT'S DOWN" when it goes down?

    Unfortunately there's always a way for any system to go down in a way that you won't get a notification about. In that case the only thing that saves you is constant notification that it actually is working... Unless that monitoring system is giving you a false positive - then you can't do anything about it.
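
    The "constant notification that it actually is working" idea is usually called a heartbeat or dead man's switch. A minimal sketch in Python, assuming the monitoring system can call a function each time it finishes a check; the class name `HeartbeatWatchdog` and its parameters are made up for illustration and do not describe any system mentioned in this thread:

```python
import time


class HeartbeatWatchdog:
    """Dead man's switch: raises the alarm on silence, not on an error report."""

    def __init__(self, max_silence_seconds, clock=time.monotonic):
        # `clock` is injectable so the logic can be exercised without waiting.
        self.max_silence_seconds = max_silence_seconds
        self._clock = clock
        self._last_beat = clock()

    def beat(self):
        """Called by the monitoring system every time it completes a check."""
        self._last_beat = self._clock()

    def is_stale(self):
        """True when the monitor has been silent for longer than allowed."""
        return self._clock() - self._last_beat > self.max_silence_seconds
```

    Something independent of the monitor (another box, a cron job) would poll `is_stale()` and alert when it returns True. As noted above, this still cannot catch a monitor that keeps beating while reporting nonsense.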



  •  Actually I would expect that the monitoring system be monitored as well by a person whose job is to monitor things.  Something as simple as a ping every 15 or 30 seconds would suffice, with a little balloon on the sysadmin's desktop that pops up if the monitoring system doesn't respond.  You know, watching the watchers and all that jazz.

     

    Oh, and PeriSoft, I would suggest changing who you do business with ;)



  •  @Heron said:

    Something as simple as a ping every 15 or 30 seconds would suffice, with a little balloon on the sysadmin's desktop that pops up if the monitoring system doesn't respond.  You know, watching the watchers and all that jazz.

    Then what if the machine is alive, but the monitoring software got stuck? You still get the pings, but you don't get the monitor's reports. What if the route to the sysadmin's box goes down? What if... simple pings don't suffice.

    I've learned that one in many scenarios already... one day everything will fail and you'll get the problem report straight from a user. Unless you want to spend more on the monitoring system than the monitored system is worth :/



  • Wow, both of you fail elementary systems administration hard.  You don't need someone sitting and watching the damn screen, you set it up to page you when a machine goes down. 



  • @morbiuswilters said:

    Wow, both of you fail elementary systems administration hard.
     

    Good - educate us then -> what is your scenario in case your outbound internet routing (if you're using sms sending service), or gsm gateway (if you're using local one) fails?



  • I'm not getting paid to sit here on the wtf forums to describe in detail how a properly set up sysadmin monitoring system should work, so I just described something simple.  But pings are better than nothing; and obviously if an app is capable of popping up a notification on the sysadmin's desktop, it is capable of sending said sysadmin an email or a page or an SMS message or a kick to the groin, if that's what you want.

    Don't be so quick to insult people.



  • @Heron said:

    Oh, and PeriSoft, I would suggest changing who you do business with ;)

     

     

    It has occurred to me to... seek alternatives... :)

    It never really crossed my mind that a domain registrar could actually fail in their duties - sure, a web host can go down, or break your server... but a registrar? They're just reselling from someone else anyway. But OneStop has obviously shown us the path to failure - prevent users from altering their domain information! Genius!

    At any rate, whether or not their web site being down is a WTF, the fact that they were notified days ago that an entire login server is failing to respond at all, and did nothing about it, suggests a certain WTFiness.

    I'll leave the flaming regarding appropriate system monitoring to the experts, however. I'm no IT guy. :)

     

    (Oh, and a kick to the groin would be great, particularly if it could be remotely administered. I can see it now - a sysadmin is walking down the hall, when somewhere, a server fails - and the sysadmin shouts, buckles in two, and drops his coffee... fantastic!) 



  • @Heron said:

    [...]it is capable of sending said sysadmin an email or a page or an SMS message or a kick to the groin[...]

    See also http://www.bash.org/?4281:

    <[SA]HatfulOfHollow> i'm going to become rich and famous after i invent a device that allows you to stab people in the face over the internet



  • @Heron said:

    Don't be so quick to insult people.

    You're the one who said the monitoring system should be watched by somebody who gets a popup on their desktop.  15 to 30 second pings?  You obviously have no clue what you are talking about and I was merely pointing out your ignorance on this topic.  There are people of many different experience levels here and I don't want them to pick up false information. 



  • @viraptor said:

    Good - educate us then -> what is your scenario in case your outbound internet routing (if you're using sms sending service), or gsm gateway (if you're using local one) fails?

    Quick and dirty:

    On-site you have monitoring that has outbound telephone access so it can dial out if the Internet goes down.  Also good to have a server or two at a remote datacenter that monitors your gateways and starts paging on failure.  If we're talking a very large system here, you probably want to monitor BGP updates from your router at several points across the Intertubez, as well as service connectivity from other countries.  Failures shouldn't even be noticeable to your customers, anyway, I'm only talking about alerting sysadmins when redundant systems go down.  You don't need constant notification or voodoo, just some good engineering and planning.
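
    The "dial out if the Internet goes down" part boils down to trying alert channels in order until one delivers. A rough sketch in Python, under the assumption that each channel is a function that raises on failure; the two channel functions here are hypothetical stand-ins, not a real SMS or modem API:

```python
def alert_with_fallback(message, channels):
    """Try each (name, send) channel in order; return the name that delivered.

    Returns None if every channel failed, which the caller should treat
    as an emergency of its own.
    """
    for name, send in channels:
        try:
            send(message)
        except Exception:
            continue  # this path is down; fall through to the next one
        return name
    return None


def send_via_internet_sms(message):
    """Hypothetical channel: simulates an SMS service behind a dead uplink."""
    raise ConnectionError("outbound routing is down")


delivered_by_modem = []


def send_via_dialout_modem(message):
    """Hypothetical channel: stands in for a pager call over a phone line."""
    delivered_by_modem.append(message)
```

    Listing the Internet SMS channel first and the dial-out modem second means the phone line is only used when the cheaper path fails. The remote-datacenter monitors described above cover the case where the whole site, modem included, goes dark.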



  •  Alright morbiuswilters, next time rather than give the simplest, easiest method I can come up with to monitor whether a remote system is alive (which, by the way, is better than not monitoring it at all), I'll be sure to write a 30-page instruction manual to cover every contingency.

    Jerk.



  • @Heron said:

     Alright morbiuswilters, next time rather than give the simplest, easiest method I can come up with to monitor whether a remote system is alive (which, by the way, is better than not monitoring it at all), I'll be sure to write a 30-page instruction manual to cover every contingency.

    Actually, I'd prefer a simple idea so long as it was a good one.  It seems you can't tell the difference between quality and quantity, though, so I won't belabour the point.

     

    @Heron said:

    Jerk.

    How mature! 



  • @morbiuswilters said:

    It seems you can't tell the difference between quality and quantity, though, so I won't belabour the point.
     

    Very much agreed!

    @Heron said:

    Jerk.

    Now, now, don't get your panties all in a twist just because you were corrected on the internet...

    How do people who get this bothered on a forum make it through life??

     



  • @viraptor said:

    Good - educate us then -> what is your scenario in case your outbound internet routing (if you're using sms sending service), or gsm gateway (if you're using local one) fails?

     

    You hire a company to do it for you.  Companies like Webmetrics have servers all around the world running test scripts for their customers, and can give you performance metrics and let you know if there is an outage, even a localized one.
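
    Telling a localized outage from a global one only needs the per-location results compared against each other. A small sketch in Python, assuming each probe location reports a simple reachable / not reachable flag; the function name and location names are invented for illustration and say nothing about how Webmetrics actually works:

```python
def classify_outage(probe_results):
    """Classify an outage from a {location: reachable} mapping.

    Returns a (status, failed_locations) tuple.
    """
    if not probe_results:
        raise ValueError("need at least one probe result")
    failed = sorted(loc for loc, ok in probe_results.items() if not ok)
    if not failed:
        return ("up", [])
    if len(failed) == len(probe_results):
        return ("down everywhere", failed)
    return ("localized outage", failed)
```

    For example, `classify_outage({"london": False, "tokyo": True})` reports a localized outage affecting London only, which points at routing rather than at the server itself.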



  • People bitching about imaginary monitoring systems are missing the point. They already had a monitoring system and it also failed.



  • @Heron said:

     Actually I would expect that the monitoring system be monitored as well by a person whose job is to monitor things. 

    In my experience, all you have to do is hire some accountants, a CEO or some other completely irrelevant person onto your team. Since they spend most of their days surfing the web, and are often also interested in your product, they spend more time on your company website than any other person. This way you will have constant surveillance set up, and since they are humans they can come up with tricky solutions to circumvent temporary outages (reconnect cables, change wifi network, go to another office etc). If you hire non-techie people, they will also clickety-click a lot more than others, and are therefore also excellent beta testers. Unfortunately, this only works if you have an interesting product, which might not always be the case.



  • @Obfuscator said:

    In my experience, all you have to do is hire some accountants, a CEO or some other completely irrelevant person onto your team. Since they spend most of their days surfing the web, and are often also interested in your product, they spend more time on your company website than any other person. This way you will have constant surveillance set up, and since they are humans they can come up with tricky solutions to circumvent temporary outages (reconnect cables, change wifi network, go to another office etc). If you hire non-techie people, they will also clickety-click a lot more than others, and are therefore also excellent beta testers. Unfortunately, this only works if you have an interesting product, which might not always be the case.

    The "wandering nomad" style of beta testing can be useful, but it's no substitute for a structured regimen of test cases carried out by QA professionals.  Just as relying on an accountant to tell you when your site goes down is no substitute for having a good monitoring system in place that alerts your operations engineers when something goes wrong.



  • @Cap'n Steve said:

    People bitching about imaginary monitoring systems are missing the point. They already had a monitoring system and it also failed.

    "People bitching about O-ring integrity in cold weather are missing the point.  The Challenger shuttle already used O-rings and they failed."

    "People bitching about exploding gas tanks miss the point.  The Ford Pinto had safety features and they failed."

     

    lern2postmortem 



  • @morbiuswilters said:

    "People bitching about O-ring integrity in cold weather are missing the point.  The Challenger shuttle already used O-rings and they failed."

    "People bitching about exploding gas tanks miss the point.  The Ford Pinto had safety features and they failed."

    I have no idea what you're trying to say here. How do you know the system you described wasn't identical to the one they already had running? Should they just pile on notification systems until they're pretty sure they won't all fail at once?

    @morbiuswilters said:

    lern2postmortem 

    OMG, can't you spell? Go back to Slashdot you Lysis/dlihkten/$random_user_i_dont_like clone!



  • @Cap'n Steve said:

    kthxbai

    What the fuck are you bitches babbling about?



  • the web host didn't even realize that their own web site wasn't working, and the system which would tell them that their system was not working, was not working.

    Reminds me of the introduction to "Mostly Harmless"...

