Database down. Service completely interrupted for over 90 minutes

Discussion in 'Databases' started by Krawma, Apr 12, 2018.

  1. No response from support for the entire time.

    Error message:

    com.microsoft.sqlserver.jdbc.SQLServerException: The TCP/IP connection to the host sxx.everleap.com, port 1433 has failed. Error: "The driver received an unexpected pre-login response. Verify the connection properties and check that an instance of SQL Server is running on the host and accepting TCP/IP connections at the port. This driver can be used only with SQL Server 2000 or later.". ClientConnectionId: xxxxxxxxxxxxxxxxxxxxxxxxxxx


    The xx placeholders have been added for security reasons.
     
  2. travis (Everleap staff)

    Our monthly scheduled maintenance was run yesterday, which would have caused some connectivity issues. The scheduled maintenance occurs on the second Wednesday of every month and allows us to install any security patches and updates that Microsoft publishes to keep the servers secure, stable, and healthy.

    Please refer to this forum thread concerning our scheduled maintenance.

    https://forum.everleap.com/threads/monthly-maintenance-window.65/
     
  3. We are more than aware of your scheduled maintenance, which happens on the second Wednesday of every month.
    1. Why do you not inform people that there is a good chance that the site will be completely unavailable for more than 2 hours? Please don't answer that it is all Microsoft's fault.
    2. Why do you schedule it in the middle of the working week?
    3. Why do you schedule it in the middle of a working day?
    In answer to the last two questions, please do not say that it is because this is a shared host and your other clients are in different time zones. Most other hosts group sites from the same or similar time zones on the same servers precisely for this reason.

    Do you not have other ways to mitigate this problem? Is there anything that you can do to make sure this "scheduled maintenance" does not have such terrible consequences every time?
     
  4. Takeshi (Everleap staff)

    I'm sorry to hear that your database was down for a while after the updates. We apologize for the downtime you experienced.

    To answer your questions:

    1. We do inform people about the monthly maintenance schedule. Each update is different, but normally an update needs one reboot and the server is fine afterwards. Sometimes it can take a couple of reboots.

    We typically test the updates on our test servers first prior to applying them onto production servers.

    Web servers are removed from the cluster, updated, and returned to the cluster, so customers should not see issues. The shared SQL servers are not clustered, so there is a database outage that usually lasts for the duration of the reboot.

    2. Microsoft releases the updates typically on the second Tuesday of the month - they call it "Patch Tuesday". They can include critical security patches so we want to get them in as soon as we can. That is why we schedule for the second Wednesday of the month for consistency and predictability for our customers.

    3. We want to make sure we have people physically present that can immediately take action when these updates are installed - just in case of emergencies like what happened to your server. We don't want to do updates when staff are not physically present and on-call, as that would delay resolving issues that may arise.

    The Microsoft updates do not cause issues every time. On rare occasions there have been issues, and with this particular update, as far as I understand, the problem occurred only on your one SQL server. The cause was a diagnostic tool that had been temporarily installed on the SQL server for additional monitoring, because our DBA was researching some abnormal behavior on that server. The updates did not play nicely with the diagnostic tool. It took a little time to figure out what was causing the issue, but it was resolved. We learn from our experience, and our patching procedures have been updated so that this doesn't happen again.
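Point 2 of the reply above ties the maintenance window to Patch Tuesday, the second Tuesday of the month. As a rough illustration, the Python sketch below computes both dates for a given month. The function names are hypothetical, and the sketch assumes "second Wednesday" means the literal second Wednesday of the calendar month. For April 2018 it gives Tuesday 10 April and Wednesday 11 April, which lines up with the outage reported in this thread.

```python
import calendar
import datetime


def nth_weekday(year: int, month: int, weekday: int, n: int) -> datetime.date:
    """Return the n-th (1-based) occurrence of `weekday` (0 = Monday) in a month."""
    first = datetime.date(year, month, 1)
    # Days from the 1st of the month to the first occurrence of `weekday`.
    offset = (weekday - first.weekday()) % 7
    return first + datetime.timedelta(days=offset + 7 * (n - 1))


def patch_tuesday(year: int, month: int) -> datetime.date:
    # Microsoft's "Patch Tuesday": the second Tuesday of the month.
    return nth_weekday(year, month, calendar.TUESDAY, 2)


def maintenance_day(year: int, month: int) -> datetime.date:
    # Hypothetical helper: the literal second Wednesday of the month.
    return nth_weekday(year, month, calendar.WEDNESDAY, 2)


if __name__ == "__main__":
    print(patch_tuesday(2018, 4))    # 2018-04-10
    print(maintenance_day(2018, 4))  # 2018-04-11
```

One calendar quirk: in a month that starts on a Wednesday, such as August 2018, the literal second Wednesday (8 August) falls before Patch Tuesday (14 August). The thread does not say how the schedule handles that case.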
     
  5. Thank you very much for the comprehensive reply. There are several things I would like to say in reply to your answer:
    1. It took your support just shy of 3 hours to answer our support query, even though you specifically run the monthly maintenance at a time when you have "people physically present". That reply simply stated, twice (!), that you were running your monthly maintenance, and that was it. There was no further follow-up or explanation. A proper explanation has taken a full 7 days, and that, I suggest, is only because the question was posed on a public forum. Most people would consider that to be very questionable quality of service.

    2. "We typically test the updates on our test servers first prior to applying them onto production servers." - If you want to offer a proper service, we suggest reviewing your testing regime, because it is obviously not working very well.

    3. It has already been made clear to Everleap that notification of the monthly maintenance is understood. The way Everleap repeats this ad infinitum suggests that the reply to this continuous assertion never gets through to Everleap, and this is why it continues to cause its customers major problems. Your mantra in reply to problems caused by the monthly maintenance always has variations of the following elements:
      1. We can't tell you what's going to happen because of Microsoft
      2. We can't change things because of Microsoft
      3. It's the monthly maintenance window and you can "expect some downtime"

        This is an irresponsible approach to offering a service.
    Sincerely disappointed.
     
  6. Takeshi (Everleap staff)

    I'm sorry to hear you are disappointed even after hosting with us for several years. To address your reply:

    1. Our system admin team handles the server updates - not our support team. The only issue that the sys admin team ran into was the one SQL server. It took a little while to figure out what was going on because all the other web and SQL servers updated fine.

    I checked and there were no emails from the sys admins to the support team regarding the SQL server because it was related to the monthly maintenance. So, the support team did not have exact details on the issue with the SQL server. I'll let our CTO, sys admins and support manager know of your issue and they can conduct a post-mortem to determine if they can improve their process and communications.

    As for the time to answer your support query, if I am looking at the right ticket, our response time was about 1 hour, in the late evening here, and the second response was within minutes. What I do see is that you replied to your own open ticket twice before we responded. As with most helpdesk ticketing systems, ours works on a first-in, first-out rule, so if you respond to your own open ticket, you push your ticket to the back of the queue. This will delay getting a response. We advise giving as much detail as possible in the first email.

    2. This month's update was tested properly and installed fine on all the servers except the one SQL server. As it turns out, that was due to a special monitoring application that was installed for research. That monitoring app and the updates didn't play well with each other. This is the first time that our team has encountered this. In light of this experience, our sys admin team will be reviewing their procedures regarding maintenance and monitoring. As always, we learn from every experience and try to improve.

    3. I am glad that the monthly maintenance window is clearly understood. But the impression that there are major issues every month for all our customers due to the updates is not accurate. We would not have any customers if our monthly updates were causing major issues every month.

    Because there are monthly Microsoft updates, you can expect some short monthly reboots. We are being a responsible host by maintaining our servers and communicating our maintenance schedules with our customers and letting them know what to expect. We do our best to maintain our servers with the lowest impact to our customers. The Everleap cloud hosting system is a clustered system designed to help eliminate/minimize the issues associated with updates for our web servers. This is a huge improvement over traditional shared, VPS, and dedicated server maintenance.

    The shared SQL and MySQL servers are not clustered, so there will be monthly reboots. Again, in the vast majority of cases, customers do not encounter issues after a reboot.

    If you want to eliminate/minimize issues with the SQL server updates, we can set up private high-availability clustered SQL servers just for your sites. This would help keep your database working during updates. However, it will be quite expensive. If you want a quote for such a private SQL setup, please let us know.
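Point 1 of the reply above says that replying to your own open ticket pushes it to the back of the queue. A minimal Python sketch of that behavior follows, assuming (hypothetically) that the queue is ordered by each ticket's most recent activity. The class and method names are illustrative only and do not describe Everleap's actual helpdesk software.

```python
from collections import OrderedDict


class TicketQueue:
    """Hypothetical helpdesk queue served oldest-activity-first.

    An assumed model for illustration, not Everleap's real ticketing system.
    """

    def __init__(self) -> None:
        # Key order is the service order: the first key is answered next.
        self._tickets = OrderedDict()

    def submit(self, ticket_id: str, message: str) -> None:
        # New tickets join the back of the queue.
        self._tickets[ticket_id] = [message]

    def customer_reply(self, ticket_id: str, message: str) -> None:
        # A follow-up counts as new activity, so under this model the ticket
        # moves behind every ticket that has been waiting longer.
        self._tickets[ticket_id].append(message)
        self._tickets.move_to_end(ticket_id)

    def position(self, ticket_id: str) -> int:
        """1-based position of the ticket in the service order."""
        return list(self._tickets).index(ticket_id) + 1

    def next_ticket(self) -> str:
        # Remove and return the ticket at the front of the queue.
        ticket_id, _messages = self._tickets.popitem(last=False)
        return ticket_id


if __name__ == "__main__":
    q = TicketQueue()
    q.submit("A", "Database down")
    q.submit("B", "Other customer")
    q.submit("C", "Another customer")
    print(q.position("A"))  # 1
    q.customer_reply("A", "Extra detail")
    print(q.position("A"))  # 3
```

With this ordering, a follow-up that adds useful detail costs the ticket its place, which is the trade-off disputed in the next post. Ordering by original submission time instead would let customers add information without losing their position.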
     
  7. I'm sorry to hear that your support and system services are not communicating properly. Let me try to help:
    1. Everleap needs to understand that internal communication problems are not an acceptable excuse: they are not something your clients can take responsibility for, nor is it possible for them to do so.
    2. "if I am looking at the right ticket, our response time is about 1 hour" - I am sorry to say that this is not correct. Please see Ticket: 3BB-2261B273-02DD. The first message from our staff was sent on April 11 at 11:00 PM, and the first reply from Everleap, by Martin O., came on April 12 at 1:57 AM.
    3. The explanation that the ticket is passed back to the end of the line if we add more information to it is, again, indicative of:
      1. A lower quality of service
      2. A lack of responsibility on Everleap's part. Everyone can see that adding information to a support ticket is not only sensible but very often vital. Saying, in effect, that this is simply how your ticket system works and that it is our problem rather than yours is an attitude that is difficult to understand and a puzzling approach to customer service.
    4. "This month's update was tested properly and installed fine on all the servers except the one SQL server." - again, Everleap needs to revisit its testing regime because it is obviously failing.
    5. You say:
      1. "you can expect some short monthly reboots" - repeated ad infinitum, as I have already pointed out
      2. "maintaining our servers and communicating our maintenance schedules" - just doing this over and over again is not enough. Maybe Everleap needs to revisit this too.
      3. "Everleap cloud hosting system is a clustered system designed to help eliminate/minimize the issues associated with updates for our web servers" - I suggest that the issues are not all that minimal. I cannot remember us suffering this so often when we hosted with DiscountASP.
      4. "However, it will be quite expensive. If you want a quote for such a private SQL setup," - we are not happy with your service as it is at present, and your solution is to pay you more money? That is unlikely; your negotiation approach needs adjusting. It does not inspire me with any confidence.
    Hope this helps
     
