
ServInt Major Fail!!!???


MatthewSchenker

Greetings,

I'm glad Reed posted that.  For me, I got some sleep and am better able to just chalk it up to ridiculously bad luck for all involved -- clients and ServInt.

Propagating information like this also helps those of us who need some support when clients want to know what happened.

Like all adversity in life, I'm hoping that, through this, we all learned something and will be wiser and stronger.

Thanks,

Matthew


Reed makes an interesting point about social networking - I don't care what anyone says, but Social Networks can be a really bad way of keeping people informed and companies are falling for this left, right and centre:

  • It can seem flippant and cursory
  • It can lack detail
  • It often relies on the client having to do the running to find information
  • You don't know whether you are dealing with someone serious or just a back-office PR lackey
  • Your important communique is surrounded by non-relevant information, advertising, animations and an inappropriate environment
  • They are incredibly impersonal, ironically

It strikes me that a good communication system should be exclusive to a company, and in the case of something like Servint, should be completely external to the system it is talking about (so it does not go down with the rest of the system).

The blog post that Reed has written should have been available during the incident and should have been emailed out to every contact email address they have. Clients should be encouraged to make sure that they have registered an emergency email address (or several) where notifications are sent. 

Emails should NEVER say "Keep up to date by following us on twitter," or "check our blog often," or anything else that means the client has to do the work - updates should be regular and emailed so the client has to do nothing.

Servint have a really good reputation, from what I see, for being very technically able. But any reputation is made up only partly by a company's history for being good - the rest (possibly the bigger part) is based on how they communicate with their customers; how much detail they give, how often they supply information and how direct and personal are those communications. 

With the advent of social networking, companies (most companies, probably), have abandoned good personal relations in favour of bulk, cheap, impersonal relations via facebook and twitter. Companies should use these systems, they are fun and chatty - but they are not personal. When the muck hits the fan, they need to deal directly with the customer, not via a third party.


@Joss: it's also a matter of knowing your customers. Being in the same channels and not forcing them to use the channels you've chosen.

In this regard I appreciate what ServInt were trying to do, i.e. using Twitter and Facebook to communicate with clients already on those media. Where they failed was that they sent out one quick message, neglected to keep people up to date as things evolved and didn't even reply to messages from frustrated clients as those kept piling up.

Twitter is a very fast-paced medium; if you stay silent for five minutes, people decide that you're gone (or just don't care) and start asking for status updates. Unless you answer them, they're going to be pissed :)

My opinion is, of course, hugely affected by the fact that 100% of the time I'm in some way connected to social media, so if a service provider wants to reach me, that's their best bet (and the number one medium I'd check if something was acting up would be Twitter, which is perfect for quick status updates). I'm not saying that using Twitter would eliminate the need to send important updates as emails, but it can provide very real extra value to a lot of customers.

(Also: if my service provider sent me an email each time there'd been some little slowdown or a minute of downtime, I'd be equally pissed at them. Each medium has its own merits and email isn't that great for fast-paced messaging. When you're having an issue on this scale it should've been used, though.)

On a slightly related note, I found that blog post somewhat lacking. Sure, they're saying that things went wrong (+1 for explaining what the issue was), but the rest was mostly just saying that "shit happens" and "we'll try to be better". I'd have expected them to name at least one concrete step they're taking to make sure that this doesn't happen again -- "we're going to learn from this" is abstract and in itself doesn't promise any improvement at all.

Communication is a difficult beast to pin down perfectly :)


Matt: Did they give you a solution to the problem you had with your site? I have a website with ServInt, and since that time the site has had problems. I'm in Colombia, and the website is reachable from some ISPs but not from others. Has something similar happened to you, or to anyone else? I have had problems with my website www.cpnaa.gov.co since June 29, and these people have failed to fix the problem for good. Can someone help me or tell me what to do? I look forward to your comments.


Good Morning,

To jcloaiza: Over the past few weeks, my ServInt sites have often been extremely sluggish.

I'd like to give them the benefit of the doubt, but this morning once again all my Servint sites (and e-mail) are down and I am receiving customer complaints left and right.

Is anyone else experiencing this today?

Thanks,

Matthew

EDIT: Just found out -- again, via Twitter -- that this was an outage to fix a kernel vulnerability. It was supposed to be <15 minutes but it is now 4 hours.


Matthew, I got an email about this planned outage last week (planned 15 mins between 5 and 7 am), as did all my clients. Are you sure ServInt has your correct email address on file? The outages are just to reboot the servers (the kernel patch only takes effect after a server reboot), and I read the 15 minutes as more like a maximum. If your server is down, it sounds like that's something different; you should submit an urgent ticket or call them. It's worth noting that they are fixing a vulnerability that affects most web hosts, not just ServInt, and it appears they are fixing it before everyone else. Please let us know what you find out from them as to why your server isn't online.


Greetings,

Sites are back online again.

All told, this outage was over 4 hours, and a lot of headaches -- again.  It's becoming difficult to explain to clients that "it's not my fault."  Every time, their outages only affect "a small number" of clients, but I seem to keep on ending up in that "small number."  We're looking at about 14 hours of downtime within the past month.

I'm really trying to be generous here with ServInt, but it's difficult under the circumstances.

Will try to move forward.

Thanks,
Matthew


Matthew, why were you offline for 4+ hours? This morning's outage was an emergency update planned for all servers, not just a small number. Is the outage you are talking about different from the one that was planned? I'm just curious if your outage is related to the emergency update or if it was something different entirely? In either case, it sounds like your host node has some bad karma or something. Maybe it's worth asking to be switched to a different node in the data center. Just make sure you are communicating directly with them through the portal (or by phone), and not Twitter or other social networks. They are extremely helpful and knowledgeable, but you have to use their secure channels to communicate because they have to maintain client confidentiality. 

Another thing to consider is that any kind of maintenance that would require taking down a server is something they usually do overnight or early AM, since that's the least busy time (though this kind of maintenance is rare). But in your case, your clients are in Europe (I think?). Whether for scheduling or ping times, maybe it would be beneficial to be in their Amsterdam data center rather than the Reston, VA data center?


Good Afternoon,

Ryan: yes, all of my sites were offline for four hours, from 5:00 AM to 9:00 AM EDT.  I've got some communication started with Servint via their portal, to hopefully figure out what's been going on lately.

I also tested the sites using the following services:

http://downforeveryoneorjustme.com/
http://www.websitedown.info/
http://wheresitup.com/

All of these services showed all my tested sites down, both specific sites and the IP addresses they use.
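For anyone who wants to run a similar check from their own machine, here is a minimal Python sketch using only the standard library. The function name `check_site` and its `opener` parameter are illustrative choices, not anything the services above expose. Keep in mind that a check from a single location only tells you what that location sees, which is why the third-party services are still worth consulting.

```python
import urllib.error
import urllib.request


def check_site(url, timeout=10, opener=urllib.request.urlopen):
    """Return (is_up, detail) for a single URL.

    `opener` defaults to urllib's urlopen; it is a parameter only so the
    function can be exercised without touching the network.
    """
    try:
        with opener(url, timeout=timeout) as response:
            status = response.status
            return (200 <= status < 400, f"HTTP {status}")
    except urllib.error.HTTPError as exc:
        # The server answered, but with an error status (e.g. 500 or 503).
        return (False, f"HTTP {exc.code}")
    except (urllib.error.URLError, OSError) as exc:
        # DNS failure, refused connection, timeout, and similar.
        return (False, f"unreachable: {exc}")
```

Calling `check_site("http://example.com/")` for each domain, and again for the bare IP address it resolves to, roughly mirrors the test described above: if both the name and the IP fail, the problem is unlikely to be DNS alone.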

If it were just the outages, it would be bad enough, but in between the outages my sites have been really sluggish.

Joss: Yes, here in the Northeast USA, our place names seem to play an elusive game with the UK: New England, New Britain, New London, Northampton, Cambridge.  We even have a Thames River.

Thanks,

Matthew


  • 2 weeks later...

Greetings,

Today, my phone started ringing from clients complaining about their sites being down -- AGAIN.

With disbelief, I logged into the ServInt portal and saw this message:

"The host machine disk array has suffered corruption. We've begun restoring the VPS's from our backup system to other host machines. Updates will be provided periodically on the progress."

I post this because I know here on ProcessWire ServInt is discussed as a good hosting choice. This is three major failures in about a month, totaling (so far) about 30 hours of downtime.
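To put those figures in perspective, the downtime can be converted into an availability percentage. This is only a rough sketch: it assumes "about a month" means 30 days, and the 99.9% comparison target is an illustrative figure, not something ServInt is quoted as promising in this thread.

```python
def availability_percent(downtime_hours, period_days=30):
    """Share of the period during which the service was up, as a percentage."""
    total_hours = period_days * 24
    return 100.0 * (total_hours - downtime_hours) / total_hours


def allowed_downtime_minutes(target_percent, period_days=30):
    """Downtime budget, in minutes, that a given availability target permits."""
    total_minutes = period_days * 24 * 60
    return total_minutes * (100.0 - target_percent) / 100.0


# About 14 hours of downtime, the figure reported after the second outage.
print(f"{availability_percent(14):.2f}%")   # 98.06%
# About 30 hours of downtime, the figure reported after the third outage.
print(f"{availability_percent(30):.2f}%")   # 95.83%
# Assumed comparison target: 99.9% over the same 30 days.
print(f"{allowed_downtime_minutes(99.9):.1f} minutes")  # 43.2 minutes
```

Under those assumptions, 30 hours in a month is roughly 40 times the downtime that a 99.9% target would allow.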

I have managed to soothe my clients twice before, but today it has crossed a line and become impossible to sugar-coat the issue. Servint is not a reliable host.

Thanks,

Matthew

EDIT: This latest failure has now been going for over 9 hours.


Greetings,

After another embarrassing outage, my sites are up again.

Over the past month, I have spent the equivalent of one full work week dealing with ServInt outages and the related customer communications and explanations. I have not seen that many hours of downtime in 12 years combined with all kinds of hosts.

I'd like to take the opportunity to hear from others here what kinds of systems or redundancies you put in place to handle outages like this.

Thanks,

Matthew


Matthew, sorry to hear you are continuing to have outage issues. I always feel obligated to reply here because I stake my name and reputation on ServInt by recommending them here on the site, and that recommendation comes with a lot of long term experience with ServInt and numerous other hosts. I feel badly that you've not had a good experience. The only outages I've seen were 10 years ago, the latest Reston one, and that's it (though both went largely unnoticed by my clients). Beyond this site, I've got pretty much all of my clients at ServInt, in both Reston and LA data centers, all with their own dedicated servers and/or VPS accounts. I have experience with more than a dozen servers there over the last decade. I have also dealt with numerous other hosts over that time period. My experience has led me to trust ServInt very strongly relative to others, and I consider them the best in the business. 

I'm stating this not to contradict what you are saying or take anything away from the unfortunate experience you've had–it definitely gives me pause. Rather, I'm stating it to say that I think your experience may be unusual and unique. At least I hope that's the case. If it turns out to be a broader trend, we'll act on it, but I don't see evidence of that. Sometimes when things get started on a negative path, it's hard to get off of it (like the law of attraction), and you've got to throw a wrench into it by making a change. I would encourage you to call them and communicate your experience and ask them to make things right. Since your clients that use this server are in Europe, a switch to the Amsterdam data center seems to make sense. But just change something, whether as little as asking them to change the host node, or as major as switching to an entirely different data center (or even web host).

I hope you will communicate your experience directly to them, because I don't think they will ever see this thread, which is unfair both to you and them. The site http://www.webhostingtalk.com/ is probably a better place for this discussion because I know they do sometimes read that and may have the opportunity to respond to your experience and make things right (something that can't be done here). Your best bet is just to call them though. 


Greetings,

Kongondo: I find it hard to believe it's coincidence. All hosts have trouble, but the total hours of ServInt's downtime is hard to get beyond. I'm trying to give ServInt another chance, because it's such a pain to switch everything to another server. But I wish I had never left my previous host, which had zero downtime in the 5 years I was with them.

Ryan: In my opinion, it's perhaps time to pause before endorsing ServInt. I'm not doubting their good history, but something is clearly going wrong with them. I wonder whether they have grown too fast? For example, this latest outage was a drive corruption. OK, that happens, but the fact that it caused 10+ hours of downtime just for them to restore from backups indicates a deeper resource problem. I will ask them to switch me to the same resource as ProcessWire. That way, either you share my pain, or I share your luck!

I'd definitely like to hear what kinds of redundancy plans other people use to counteract situations like this.

Thanks,

Matt

PS: On the idea of switching to one of ServInt's European servers, I have learned that their Amsterdam center has experienced significant outages lately as well.


"Ryan: In my opinion, it's perhaps time to pause before endorsing ServInt."

If there is any evidence of it being a broader trend that extends beyond your experience, we'll look at it. But as far as I can tell it is an isolated issue. I'm just sorry you are the one to experience it. :(

"I wonder whether they have grown too fast?"

The growth has been slow and steady over a very long time. I'm not aware of any major new changes in growth there, other than acquiring the Amsterdam data center a couple years ago. 

"OK, that happens, but the fact that it caused 10+ hours of downtime just for them to restore from backups indicates a deeper resource problem."

I agree, that does seem like a long time. Without knowing the details of exactly what the problem was or how far it spread, I don't think I could analyze how long it should take to fix though. If you want to know why, definitely ask them about it. They are pretty straight up with these kinds of questions. 

"I will ask them to switch me to the same resource as ProcessWire. That way, either you share my pain, or I share your luck!"

We are on a dedicated server in the Reston data center. If they are able to share a dedicated server, it would be news to me. :) I like the way you are thinking though. But if I were you I'd make a more major change so that you can get off the bad luck train. Try out the LA or Amsterdam data center at least. Reston data center has been great to us, but not so good to you, so change up the location or even the host, otherwise you or your clients' minds will still be on outages and thus attracting them. I think your affected clients would also appreciate your effort in making those changes. But before you decide anything, just call them, tell them what you've run into, that you need a big change, and ask them what they suggest. They will also be the best ones to advise on your redundancy questions. I've found them to be very knowledgeable on this stuff. 


Thanks Matthew for sharing this wake-up call.
Even Servint does not match with shiny coding shoes.

Six things here:
1.
I am sure ServInt has it somewhere in its small print that they cannot be held responsible
for the financial, business and reputational damage to their clients (read: you) if ServInt services fail.

2.
We need to make sure we have the same clause somewhere in the small print of our own agreements with clients.

3.
I re-edited the following post. Instead of waiting for DNS propagation, use URL forwarding!
It works instantly and visitors won't even see the new host URL if it is configured properly.
https://processwire.com/talk/topic/6792-servint-major-fail/#entry66411

4.
Remember this post? I edited it away because it was
not considered an option, and was even thought a dangerous thing to do.
But in this case it can be seen in a new light. (See also point 5.)
This was my post before I edited it away:

Why not subscribe to a 100/50 Mb fibre-optic internet connection
and use a fanless, zero-maintenance computer as a server, and
then have a home-managed service :) with remote access.


5. And then Davo came up with his post:
https://processwire.com/talk/topic/6792-servint-major-fail/#entry66466

6.
On Monday morning I am going to talk to someone about how to set up
points 4 and 5 as a fallback for when a host fails.

We can simply use the internet connection we already have.

 


Btw, if you are on WHM/cPanel, moving accounts (to another WHM/cPanel setup) is really easy. Pretty much a 1-click process. So if whatever change you make involves moving to a server where they can't migrate it for you, just use the migration tool built into WHM, which has worked great in my experience. I used to host some clients on my own VPS and when they grew and moved to their own dedicated server, this migration tool made life easy. 


Greetings,

"If there is any evidence of it being a broader trend that extends beyond your experience, we'll look at it. But as far as I can tell it is an isolated issue. I'm just sorry you are the one to experience it. :("

Just some quick evidence from the past month or so of just how big this issue is becoming:

https://www.facebook.com/ServInt/posts/10152593666901177

https://www.facebook.com/ServInt/posts/10152593860231177

https://twitter.com/servint/status/486609891172560896

http://www.thewhir.com/web-hosting-news/servint-ceo-promises-improve-customer-communication-wake-reston-va-data-center-outage

I don't know...

Thanks,

Matt


I can vouch for moving accounts under WHM - nowadays it will even repoint the DNS records on the source server for you as well so theoretically the sites immediately begin to be served from the new server (as long as the source server remains up during the time your domain name registrar's DNS changes are taking place of course).

I've got some sites hosted at ServInt Amsterdam and haven't had any trouble though there do appear to be a couple of short periods over the last month or so with no traffic (presumably reboots due to these Kernel updates or similar). Looking at the timings they've been during the early morning for the most part so that would explain why I haven't noticed and nor has anyone from the sites hosted there.

Not that it's much comfort, but I would say that every host has its issues. I had a VPS with LiquidWeb, and later used their StormOnDemand service, and had no issues for about 4 years, but there were some random server crashes towards the end that neither of us could quite pin down. Since I had more UK/European clients being hosted on that server I decided to move to ServInt's Amsterdam server and those issues that we couldn't pinpoint vanished - I guess that points to a config issue somewhere on the server. LiquidWeb is another pretty big player - I'm sure if we'd kept at it we could have resolved that issue, but since I wanted to move some sites closer to home it made sense to move hosts.

Since using ServInt, I've moved all my UK clients completely over to UK servers with Future Hosting so they're at least in the right country now, and left the non-region-specific sites with Servint in the Amsterdam datacentre (it's pretty much split now based on personal projects being on ServInt and clients on Future Hosting, though the sites I'm classing as personal projects have hundreds of thousands of visitors each month so they're busier than client sites by far).

I think the theme for my switching around over the years, and my negative experiences over a decade ago with shared hosts, is that US hosting companies seem to offer better value for money and have great service teams (all three of the companies mentioned will usually turn a ticket round in under 15 minutes) and I never had any massive issues with any of them. Good UK hosts are hard to find it seems unless you want to pay massively more for the same service.

Not sure where I'm going with this to be honest - just name-dropping I guess - but I've not had any real issues with the larger companies like ServInt or LiquidWeb. It's still fairly early days with Future Hosting, but they're impressing me as much as the other two so far.
