All accounts down now for over 6 hours!

General Discussion about VFEmail

Moderators: Havokmon, linkchaser

Post by lakrsrool » Sat Sep 03, 2016 3:06 am

I'm not sure where we get support anymore. :?

I have 7 accounts, and all of them have been down for over 6 hours now. :(

I see nothing on Twitter or FB other than "Be sure your email client is using - port 587 STARTTLS for SMTP.", which I did yesterday (9/1/16) when those messages were posted, and I see nothing here in the forum about the current situation.

Please keep us up to date. I do see an "offline/maintenance" status for ports 143/993 on the status page, but it's not clear that this would cause all accounts to be down, and if it does, we need more clarity on that. It's nice to have the status page, so thank you for that, but the question remains: is it the "offline/maintenance" status, or perhaps the red signal light, that tells us our accounts are down? Please clarify. It would also be appreciated to be notified somehow, especially when downtime lasts this long, preferably here in the forum, or failing that on FB and/or Twitter.

Bottom line: could we please have more transparency about issues of this kind?

Thanks in advance. :D
Good judgment comes from experience and a lot of that comes from bad judgment. - Will Rogers

Full system update

Post by Havokmon » Mon Sep 05, 2016 2:10 pm

To everyone: I apologize for the lack of updates; there honestly hasn't been much to update. What's been posted publicly (the server is having hardware issues, and we're moving to a new one) has not changed. The only thing that has changed is the frequency of crashes, which unfortunately has increased, and that increase has kept pushing back the switch.

I was able to mask the hardware issue by adjusting the memory allocated to read caching, but that didn't really fix the problem. There's an issue with a drive or controller that is causing driver timeouts, which result in OS crashes.

I've been trying to avoid long downtimes during the migration, and things have just gone very poorly. The data replication has been frustratingly slow. Typically a migration goes like this:
1. Snapshot, replicate.
2. Snapshot, replicate.
3. Shut down delivery.
4. Snapshot, replicate.
5. IP change.
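The five steps above appear to follow the usual incremental-replication pattern: the early passes copy the bulk of the data while mail keeps flowing, so only a small delta is left to copy once delivery is stopped. Here is a minimal sketch in Python, assuming a maildir-like store modeled as a dict of message file name to size in bytes. `snapshot`, `replicate` and `migrate` are hypothetical helpers for illustration, not VFEmail's actual tooling.

```python
from typing import Dict, List, Tuple

# Hypothetical model (assumption): message file name -> size in bytes.
Store = Dict[str, int]


def snapshot(live: Store) -> Store:
    """Take a frozen point-in-time copy of the live store."""
    return dict(live)


def replicate(prev: Store, curr: Store, target: Store) -> int:
    """Apply the changes between two snapshots to `target`; return bytes sent."""
    sent = 0
    for name, size in curr.items():
        if prev.get(name) != size:  # new or changed since the last pass
            target[name] = size
            sent += size
    for name in prev.keys() - curr.keys():  # deleted since the last pass
        target.pop(name, None)
    return sent


def migrate(live: Store, arrivals: List[Store]) -> Tuple[Store, List[int]]:
    """Run the migration; `arrivals` holds the mail delivered after each early pass."""
    target: Store = {}
    sent_per_pass: List[int] = []
    prev: Store = {}

    # Steps 1-2: snapshot and replicate while delivery is still running.
    for batch in arrivals:
        curr = snapshot(live)
        sent_per_pass.append(replicate(prev, curr, target))
        prev = curr
        live.update(batch)  # new mail lands between passes

    # Step 3: delivery is shut down, so `live` no longer changes.
    # Step 4: the final snapshot/replicate only ships the remaining delta.
    curr = snapshot(live)
    sent_per_pass.append(replicate(prev, curr, target))

    # Step 5: the IP change would happen here, pointing clients at `target`.
    return target, sent_per_pass
```

For example, starting from `{"msg1": 4000, "msg2": 6000}` with arrivals `[{"msg3": 300}, {"msg4": 100}]`, the three passes ship 10000, 300 and 100 bytes, so the final pass (the only one run with delivery stopped) is the smallest. The plan only keeps downtime short if each snapshot and replicate step runs at normal speed, which is exactly what broke down here.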

Unfortunately, even the snapshot step has been taking up to 30 minutes. That's just wrong; it's normally instantaneous. The final replication of 2Gb of data took 5 hours today.
So it was slow going. Throw in random crashes during that 5-hour window, and hopefully you can understand the delay.
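To put the 5-hour figure in perspective, it helps to work out the average transfer rate it implies. "2Gb" could mean gigabytes or gigabits, so this minimal Python sketch computes both readings as labeled assumptions, using decimal units; `throughput_mbit_per_s` is a hypothetical helper.

```python
def throughput_mbit_per_s(amount_bits: float, seconds: float) -> float:
    """Average transfer rate in megabits per second (1 Mbit = 10**6 bits)."""
    return amount_bits / seconds / 1e6


FIVE_HOURS = 5 * 60 * 60  # 18,000 seconds

# Reading 1 (assumption): "2Gb" means 2 gigabytes = 2 * 10**9 bytes = 16 * 10**9 bits.
as_gigabytes = throughput_mbit_per_s(2e9 * 8, FIVE_HOURS)
# Reading 2 (assumption): "2Gb" means 2 gigabits = 2 * 10**9 bits.
as_gigabits = throughput_mbit_per_s(2e9, FIVE_HOURS)

print(f"2 GB in 5 h: about {as_gigabytes:.2f} Mbit/s")
print(f"2 Gbit in 5 h: about {as_gigabits:.2f} Mbit/s")
```

Under either reading the average rate is below 1 Mbit/s (about 0.89 or 0.11 Mbit/s). For comparison, a link sustaining 100 Mbit/s would move 2 GB (16 Gbit) in about 160 seconds.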

In any case, we're on the new server now. The old server will be rebuilt with new drives, new controllers, more memory AND a fresh OS install. The new OS will allow us to run a third-party application that enables hot active/active replication between servers, so we can IMMEDIATELY switch to a backup server at the first sign of trouble.

I appreciate everyone's patience and support.

Rick Admin
