
ServInt Major Fail!!!???


MatthewSchenker


Hello,

Is anyone else experiencing a problem with a ServInt account?  I cannot view or log into ANY of my sites hosted with them, nor can I log into the ServInt portal.

This is a disaster.  I literally launched a major international project TODAY, with a live presentation occurring this evening in Germany for the site.

Does anyone have any information on this?

Thanks,

Matthew


@Matthew: no one still seems to have a clear idea, so I guess we can only assume that something big (that they've got no control over) broke down. Network failure or something like that, perhaps?

Happens to the best of us, but admittedly they could've explained things in a bit more detail. I was trying to write something here when the downtime started. Kept tabs on it for the first hour or so in hopes it'd be fixed soon, but well..


Does anyone have any information on this?

The server has been down for almost 6 hours.  How in the world could a modern hosting service be in this situation?

I don't know about you guys, but this is the reason I never register a client's domain with the same company that is going to host the website. Register with a registrar and host with a hosting company.

Hoster A down? Point the domain to Hoster B and upload a copy. These days DNS propagation takes less than 6 hours.

Edit: instead of changing DNS settings, use URL forwarding. It works instantly! Visitors won't even see the change in host URL if it is configured properly, and it is only necessary while the primary host is down.

I do the same with my email accounts. I never register an email address with my ISP.

Works for me.

Would like to know though what works for you guys.


I know we're several hours further on, but my sites on ServInt are up.

There was a message when you log in on their site warning of critical kernel updates required, so I assume it was that?


Update: Matthew, if you log in on their site now, they have provided an update. Looks like, despite a tonne of resiliency and redundancy, one sneaky switch was to blame, and they seem genuinely surprised that it impacted other systems in the way that it did.

Not much comfort in terms of your downtime, but this sort of thing just doesn't usually happen at ServInt.


Greetings,

All told, including loss of FTP and email, about a 10-hour downtime.  That is beyond absurd in 2014.  And e-mails sent during the downtime are all still lost.

This morning, I face the impossible mess of explaining things to a client who went in front of an audience to launch a site that I spent around 30 hours developing.  All they showed was a blue screen.

I'd like to know what it is about ServInt's hardware that could cause this highly unusual situation.

Sorry if I am too upset about this, but it has a real impact on my business. And like a lot of people here, my business can't really afford hits like this.

Thanks,

Matthew

EDIT:

Just saw this incredible statement on the ServInt Portal:

"...No device outside of the ServInt network is supposed to be capable of impacting the core in this way..."

I might expect a statement like this from a college student learning how network hardware works, not from a company hosting very valuable, real-world assets.


Sorry to hear that, Matthew. Things look okay again at ServInt.

Anyway, this raises the question of what we tell our clients about who is ultimately responsible in such cases: the site builder or the host? I mean, it is not our fault if a host goes down, but do we talk about this before we start building the client's site?


Greetings,

I'd be curious if anyone here explains to clients ahead of time that this sort of thing might happen.  My guess is, it would make you look bad next to the competition (who isn't saying this).

Part of the cleanup effort is to use various channels to make it 100% crystal clear, and totally certain to my clients, that this was a failure of a service beyond my control and has nothing to do with my work or the CMS I use.

On a related note: does anyone know of an independent resource that provides information on server downtime, so I can compare various hosts?

Thanks,

Matthew


I'd be curious if anyone here explains to clients ahead of time that this sort of thing might happen.  My guess is, it would make you look bad next to the competition (who isn't saying this).

Not at present, but I may in future. ServInt's uptime guarantee is 99.9% (although this page suggests 100%: https://www.servint.net/sla.php ), so if this had been resolved a couple of hours earlier they would technically still have been within their 99.9% SLA. I fully expect them not to point to that in an incident as serious as this; I'm just pointing out that 99.9% still leaves you exposed over the course of a year without much recourse: http://royal.pingdom.com/royalfiles/pingdom_uptime_cheat_sheet.pdf

Having just mentioned these "guarantees", ServInt have a very good article on the subject explaining what web hosts really mean by them: http://blog.servint.net/2013/05/03/why-uptime-guarantees-are-ridiculous/ - it's more of an "...or (some of) your money back" deal.

Not that any of that helps you right now in your unfortunate situation.
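To put rough numbers on the uptime percentages mentioned above, here is a minimal Python sketch of the arithmetic. It assumes the guarantee is measured over a full 365-day year, which an actual SLA may define differently; the function name is just for illustration.

```python
# Downtime that an uptime guarantee still permits over a given period.
# Assumption: the guarantee is measured over a 365-day year.
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years


def allowed_downtime_hours(uptime_percent: float,
                           period_hours: float = HOURS_PER_YEAR) -> float:
    """Return the hours of downtime an uptime percentage still allows."""
    return period_hours * (1 - uptime_percent / 100)


if __name__ == "__main__":
    for sla in (99.0, 99.9, 99.99):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% uptime allows about {hours:.2f} hours of downtime per year")
```

On that assumption, 99.9% works out to roughly 8.76 hours a year, so a 10-hour outage would exceed it while an 8-hour one would not.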

On a related note: does anyone know of an independent resource that provides information on server downtime, so I can compare various hosts?

I don't think such a service is actually possible. You would need to know the IP address of every server in every datacentre and ping them all, which would rely on web hosts submitting the IP address of every server they roll out (including VMs I imagine) to generate such statistics.

Something I have done recently is sign up for the free service at https://uptimerobot.com/ which at least lets you keep tabs on all sorts of processes on your own server. I have it monitoring a couple of servers for website uptime on port 80, as well as a variety of email ports. It'll alert you the minute something is wrong (well, to within 5 minutes!) and there's a decent Android app as well: https://play.google.com/store/apps/details?id=hu.elevenoone.android.uptimerobot&hl=en_GB

You could of course pay for Pingdom to check your servers more frequently (at least I think it's more frequent - has more features anyway).
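For anyone curious what a port check like the ones described above boils down to, here is a minimal Python sketch. It is not how UptimeRobot or Pingdom actually work internally; it only tries to open a TCP connection to each port. The function names and the service-to-port mapping are assumptions for illustration.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 5.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within the timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and DNS lookup failures.
        return False


def check_services(host: str, ports: dict) -> dict:
    """Check each named port on a host and return a name -> up/down mapping."""
    return {name: port_is_open(host, port) for name, port in ports.items()}
```

Calling `check_services("example.com", {"http": 80, "smtp": 25, "imap": 143})` (with your own hostname in place of the placeholder) returns a dictionary of booleans. Run from a machine outside the host's network on a schedule, that is enough to trigger an alert, though an open port only shows the service is accepting connections, not that the site is rendering correctly.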

Would any of this have alerted you early enough to mobilise an alternative in time for the client? I think that's the question, and if there's the slimmest hope of the answer being "yes" then I'd get signed up to one of these services right away.

I think the only way you will ever guarantee uptime when you absolutely need it, though, is to have two servers in different datacentres, preferably really far apart, with a load balancer or something (I'm not too network savvy so it's probably something else) that would send traffic to the working server when the other server is down.


Matthew, sorry to hear you were so affected by this outage. It sounds like this particular outage was one that couldn't have been anticipated by anyone. From what I gather reading on other sites and on twitter, it sounds like a piece of network hardware that failed but provided no failure indicators. If that's the case, that would have made it particularly difficult to track down and left little room to put all that redundancy to work. Perhaps this particular type of outage is a once in a lifetime thing, but the reality is that outages occur everywhere and no webhost is immune to them. Not to mention outages can occur anywhere when it comes to networks, with the webhosts like ServInt probably being the most solid part of that chain. 

I was fairly lucky here in that I didn't really notice the outage other than someone emailed me about it when I was cooking dinner. But all seemed to be back online 30 minutes later and didn't go out again as far as I know. I've got most of my clients hosted at that Reston, VA data center, but the time the outage occurred was one of the least traffic times for the sites I work on, so I never heard from anyone about it. 

In 11+ years, I've only experienced one other major outage at ServInt, and that was several years ago. Someone apparently got sloppy with a back-hoe in a barnyard and cut off all lines of communication to McLean, VA. If I recall, that outage was quite a bit longer than this one, but it's been a while.

There is absolutely nothing you could have or should have done extra here. On the other hand, if your client is giving a presentation, they are probably the ones that should have a backup plan. Anyone experienced giving presentations knows that you have to keep everything you need with you. You can't ever count on something being accessible from the internet, though usually for other reasons (bad wireless signal, something broken at the conference center's internet, etc.) So when it comes to presentations, you can only count on what's on your computer. Having a local running copy of a site, or a presentation with screenshots are good plans. If they couldn't access the site, hopefully that's what they did. 

One thing to take comfort in is that if this particular outage had occurred at some other host, chances are they would still be down right now. My opinion is I don't think there's any value in looking elsewhere due to this particular incident. I already know ServInt has the best people in the business. This kind of stuff can happen to any of them, and ServInt now has some experience that the others don't. Outages are a fact of life in the business and nobody is immune, but ServInt's history is that they are less prone to outages than most, and better equipped to handle them when the inevitable strikes. 


Sorry you had to go through the situation you described.

I'd be curious if anyone here explains to clients ahead of time that this sort of thing might happen.  My guess is, it would make you look bad next to the competition (who isn't saying this).

I actually do and it's a part of me initially working with a client.  I believe it's about integrity, honesty and being a professional.  I also make sure that periodically I make them aware of the potential for outages/disruptions that I have no control over.

Your clients may not want to hear that there can be outages but I usually frame it based on what type of hosting they chose to buy.  By letting them know ahead of time what the pros and cons are of a hosting solution, when bad things ultimately do happen they were advised of the consequences.  It doesn't mitigate the consequences of the outage but they cannot fairly look at you as the cause.

You do get what you pay for, in networking, and everyone needs to be realistic about their hosting choices.  I make sure my clients know this.  High redundancy and fail-over services cost money.  Even if you pay for these things there is still no guarantee that issues won't happen.  That's the nature of an interconnected world.

Once again, I'm sorry you had to endure this type of problem.

Best Regards,

Charles


Greetings,

Ryan: I appreciate the fact that clients should have a back-up plan when doing live presentations.  However, in this case, it is a contest site, and the presentation was more of a launch, after which contestants were supposed to start using the site.  It was all planned and carefully orchestrated months ahead of time.

Charles: I think we're talking about different things here.  Of course, I would never lead a client to believe that outages don't occur.  It's one thing to cite in a contract that outages may happen, and quite another thing for the site to fail for 10 hours at a crucial juncture.  How many developers would be comforted by the fact that they had a clause in their contract saying this might happen?  Having it in your contract doesn't protect you from the reality of client impressions and business repercussions.

Regarding "integrity, honesty, and being a professional": I consider myself to be all three.  And just for the record, I am paying ServInt for one of its premier services.

I admit that in the end this may just be a case of phenomenally bad luck and bad timing.  The outage occurred right at the heart of all the activity for this project.

Thanks,

Matthew


I agree it was a stroke of bad luck and bad timing for you.  Please don't take anything I wrote as a reflection on/about you.  It's a bad situation and unfortunately happens from time to time.

Regarding "integrity, honesty, and being a professional":

Those are the qualities I believe you have.  I wouldn't have responded to your post if I thought otherwise.  I've gone through situations like yours many times before.  I agree with all that you wrote and believe you had every right to be pissed and upset.


Sorry for the bad luck, but in such a situation a backup site would've been the way to go. My personal website is on a really nice German host. The only downtime I know of was caused by a DDoS attack on one of the datacenters where they rent their servers. So even if everything works right on the host's side, you could get downtime just because of such attacks. There's never a 100% guarantee of uptime for a server.


I don't know about you guys, but this is the reason I never register a client's domain with the same company that is going to host the website. Register with a registrar and host with a hosting company.

Hoster A down? Point the domain to Hoster B and upload a copy. These days DNS propagation takes less than 6 hours.

I do the same with my email accounts. I never register an email address with my ISP.

Works for me.

Would like to know though what works for you guys.

For important clients I use DNS failover. If the monitoring service can't make contact with the UP script on the site, then failover switches DNS to another server (in my garage!). When the site comes back online, it switches back again. The switch normally takes about 60 seconds.

For less important sites I use Cloudflare.


For important clients I use DNS failover. If the monitoring service can't make contact with the UP script on the site, then failover switches DNS to another server (in my garage!). When the site comes back online, it switches back again. The switch normally takes about 60 seconds.

Many thanks for posting this. This was exactly what I was trying to put on the table, and it's the first time I've read about such a solution. With solutions like this we can talk confidently with our clients about this recurring ghost in the host, and we now even have something constructive to put in our contracts that doesn't look daft ("the switch normally takes about 60 seconds"). I also like having the freedom to offer this service according to the importance of the client's website. Perfect!

Can you tell us something more about your own temporary backup server, the DNS failover and that UP script? (Already searching with Google.)


DNS Made Easy is the service I use. It's just a couple of quid a month, but it's worth it for peace of mind. They provide a service that checks your site for a pre-designated page with a string in it. In my case the page is called siteup.php; from memory it has a one-line entry, echo 'siteup';. The monitoring service just checks for that page once a minute, and when it can't find it, for whatever reason, it simply changes the DNS entry to the new server. It can also fail over to a tertiary server should the second server fail.

It keeps monitoring, and when it finds siteup.php again it switches back, using short TTL settings.
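The check described above can be sketched roughly as follows in Python. This is not DNS Made Easy's implementation, just an illustration of the logic: fetch a status page, look for a marker string, and pick the primary or backup address accordingly. The function names and default values are assumptions, and the step that actually updates the DNS record is provider-specific, so it is left out.

```python
import http.client
import urllib.request


def site_is_up(url: str, marker: str = "siteup", timeout: float = 10.0) -> bool:
    """Fetch the status page and report whether it contains the marker string."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            body = response.read().decode("utf-8", errors="replace")
    except (OSError, ValueError, http.client.HTTPException):
        # Connection errors, timeouts, HTTP errors and malformed URLs
        # all count as "down" for failover purposes.
        return False
    return marker in body


def choose_target(primary_up: bool, primary_ip: str, backup_ip: str) -> str:
    """Return the address the DNS record should point at (hypothetical helper)."""
    return primary_ip if primary_up else backup_ip
```

A monitor would call `site_is_up("http://example.com/siteup.php")` (placeholder URL) about once a minute and hand the result of `choose_target` to whatever updates the DNS record. In practice you would probably want a few consecutive failures before switching, so a single dropped request doesn't trigger a failover.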

If I'm being super efficient I set up SQL replication. http://www.howtoforge.com/mysql_database_replication


Hi Matthew - that is really bad luck mate!

This sort of potential failure is something I have had to contend with my entire working life - not, admittedly, from servers, but because as a sound producer I have always had to cope with cutting-edge technology that is almost guaranteed to fail at the worst possible moment!

And if remote connections are involved (especially anything satellite shaped), then the potential risks are increased dramatically.

In those cases, I ALWAYS made sure clients knew the risks and always explained to them exactly what we had arranged as a backup plan. These were costed in. So, for instance, once when we were doing a live Q&A across the world between staff and their CEO, we actually did a short, recorded interview the day before with the chap as a backup. In that case, the line DID fail, and the client used a pre-prepared script that I had written for him to apologize to the staff and introduce the recorded interview. Although it was still embarrassing for the client, we at the production house looked great! :)

You cannot prevent every problem and despite always having a backup plan, I have still ended up with egg on my face several times. I even had a client blame me for a lightning strike which took the power out at the hotel at which he was holding his conference; why was I unable to prevent it???

The only thing that might be worth considering is a server/website on a stick, given to the client in advance with instructions, saying to them: "Murphy's law says that the only time the line will ever go down will be tomorrow night - if it does, this is what we do... It won't give the entire experience, but at least you will be able to carry on."

The chances are that this sort of thing will be a rarity, but at least the clients will always know that you care enough to have had a back up plan. Some people think that admitting to a client that things can go wrong makes them look weak somehow and that they should not worry the client with that sort of thing. In my experience, it actually makes you look professional, experienced and well prepared.

I do think the IT industry as a whole struggles more with this than other industries - partly because the technology is inherently unreliable (!) and partly because the industry is still very young and does not have the decades of experience and history of catastrophes that something like the broadcast industry has for instance.

EDIT: Having just read your note that this was a competition site, I think your only true backup would have been to have secondary hosting ready to go at an alternative location, with an already active CNAME address, so:

mycompetition.com

mycompetition.company.com (at alternative hosts)

Again, with a preprepared statement for the client to read to explain that the gods were being unkind and this is what they would do. And of course, cost it in, telling the client in advance that things CAN go wrong and this is an insurance policy - he can go without the additional cost, but at his own risk.


I even had a client blame me for a lightning strike which took the power out at the hotel at which he was holding his conference; why was I unable to prevent it???

There's no possible backup plan against idiocy...


No, it was with a Joomla site. I used rsync to mirror the assets etc. I assume the same would work for PW?

This did of course cause an issue when the original site came back online.

What I tended to do was hard-code a subtle difference into the backup site so the client could tell there was an issue and would know not to start making changes, or they'd be lost when it reverted back.


@davo: yeah, rsync probably would do it, but to keep sites in sync all the time would require this to be triggered after each change to assets etc.. or just to be executed regularly enough to achieve "good enough accuracy" :)

Disabling edit sounds like a good idea to prevent unnecessary mess. Sounds like a good fit for demo mode too.. and just to be really clever, you could even make it dynamic using something like $_SERVER['SERVER_NAME'], so it'll be automagically turned on for the duplicate :)
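The hostname-based switch suggested above could look something like this, sketched in Python rather than PHP (in PHP the hostname would come from $_SERVER['SERVER_NAME']). It assumes the duplicate is served under its own hostname; the hostname and function name here are made up for illustration.

```python
# Hypothetical hostnames under which the backup copy is served.
BACKUP_HOSTS = frozenset({"backup.example.com"})


def is_read_only(server_name: str, backup_hosts=BACKUP_HOSTS) -> bool:
    """Return True when the request is being served by the backup copy."""
    # Normalise case and drop any ":port" suffix before comparing.
    host = server_name.strip().lower().split(":", 1)[0]
    return host in backup_hosts
```

The site would check this once per request and disable editing (or switch on a demo mode) when it returns True. Note that if failover keeps the same hostname and only changes the DNS record, the hostname alone won't distinguish the two copies, and some other marker on the backup server would be needed.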

