Seems like Proton is having a bad evening

plz1@lemmy.world · 3 days ago

Seems like Proton is having a bad evening

Dave@lemmy.nz · 3 days ago

Their status page has an update on what happened.

Service instability due to network incident (Resolved): Due to an undocumented change in an operating system update shipped by one of our network equipment vendors, network devices in our Frankfurt datacenter experienced an unexpected partial failure.

This incident impacted primarily Proton Mail, with approximately 50% of users who were routed to the impacted datacenter experiencing intermittent downtime for approximately 1 hour. Due to redundant systems, no data or emails were lost, but some email delivery may have been delayed.

Incident report: Because the failure was partial, it was not sufficient to trigger a failover. Due to the unique circumstances surrounding this failure, a significant amount of confusion led to a longer than usual delay before the infrastructure engineers on shift made the call to failover to an alternative site.

That restored services, with approximately 30 minutes of lingering low-level instability while load was rebalanced. Investigation that took place in parallel uncovered the undocumented operating system change in the network device update that was rolled out earlier this month. Impacted network devices were updated, and the Frankfurt datacenter brought back into production with no user impact. Proton routinely conducts testing before rolling out software patches to our network equipment and rolls them out gradually.

Unfortunately, this problematic undocumented change was not discovered because it only created issues under specific load conditions (indeed, the new software had been running for weeks without issues).

We apologize for the longer than usual incident response time. In the coming days, we will be analyzing our response to this incident to reduce future reaction times.

Dr. Wesker@lemmy.sdf.org · edit-2 3 days ago

Same. Which is whatever, I’m more annoyed they haven’t updated their status page.

Mechanize@feddit.it · 3 days ago

Yeah, incredibly frustrating.
The only acknowledgement is from a volunteer mod on reddit that said an hour ago that “the team is aware and the status page will be updated shortly”.

The fact I had to dig around to find that is really not a pleasing experience.

DarkThoughts@fedia.io · 3 days ago

Why does a status page need to be updated manually?

x00z@lemmy.world · 3 days ago

Servers could still be up and responding to pings, yet backend databases could be down.

Or it could be a caching problem with the status service.

It’s a bad way of handling your status page, but it happens.
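A rough sketch of the point above: a status page that only pings the server reports "up" even when the database behind it is down, so the overall status has to combine several checks. The function name `overall_status`, the check names, and the three status labels are all made up for illustration; this is not how Proton's status page works.

```python
from typing import Callable, Dict


def overall_status(checks: Dict[str, Callable[[], bool]]) -> str:
    """Combine per-component checks into one status label.

    Each check is a zero-argument callable returning True when healthy.
    A check that raises is counted as failed.
    """
    results = {}
    for name, check in checks.items():
        try:
            results[name] = bool(check())
        except Exception:
            results[name] = False

    if all(results.values()):
        return "operational"
    if any(results.values()):
        return "degraded"  # e.g. ping answers but the database does not
    return "outage"
```

With a ping check that passes and a database check that fails, this returns "degraded" instead of the "operational" a ping-only page would show.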

kautau@lemmy.world · 3 days ago

It’s also a business decision. Many times companies will massage their verbiage and have a plan in place before they even change the status to “investigating”, simply to appease the customers they have SLAs with. It’s stupid, but that’s often the reason.

Scolding7300@lemmy.world · 3 days ago

There’s also an insurmountable number of potential issues to cover; it’s not worth the automation.

x00z@lemmy.world · 2 days ago

It depends on the services, but in the end it’s pretty easy to send handshake packets to see if a service on a server is still running.

nmap is a great example.
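For anyone who wants the idea without nmap, here is a minimal sketch using only Python's standard library. It does a full TCP connect rather than the raw SYN probe nmap can do, and the helper name `port_is_open` is made up for this example. It only tells you that something is listening on the port, not that the service behind it is healthy.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP handshake to host:port completes in time."""
    try:
        # create_connection performs the full three-way handshake
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers refused connections, timeouts, and DNS failures
        return False
```

Something like `port_is_open("mail.example.com", 443)` (a placeholder hostname) would be the basic building block for an automated check.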

Scolding7300@lemmy.world · 2 days ago

I meant on the logic side of things

TheTechnician27@lemmy.world · edit-2 3 days ago

Maybe somehow the problem was triggered in a way that the status page didn’t automatically detect it (for example, mine still works)? I’m really grasping at straws with that one. If it isn’t automatic, it categorically needs to be; if it is automatic but missed what’s apparently a major outage, then it needs to be fixed.

plz1@lemmy.world · 3 days ago

Yeah, I’m used to company status pages being the last to know.

Justas🇱🇹@sh.itjust.works · 2 days ago

Me when I test something in production

Valmond@lemmy.world · 3 days ago

FYI my Proton Mail works.

tamal3@lemmy.world · 3 days ago

Mine too

h54@programming.dev · 3 days ago

It must be regional. It’s been fine for me all day.

st3ph3n@midwest.social · 3 days ago

The iPhone app kept working for me, but the Proton Mail website was inaccessible for about 2 hours.

Hanrahan@slrpnk.net · 2 days ago

It had signed me out of the Proton Mail app on Android. First time that’s ever occurred, not sure it’s related though?

HorikBrun@kbin.earth · 3 days ago

No problems with mine all day.