
Upgrade OTRS to a more recent stable release
Closed, Resolved · Public

Description

We're currently sitting at v3.2.14. The latest stable release is 3.3.9 (a 4.0 beta 3 is also available). Creating this now as a reference and a link to other existing bugs that may be resolved in a version released since our last upgrade.


Version: wmf-deployment
Severity: enhancement

Event Timeline


Just an FYI: 4.0.2 came out today, along with the announcement that versions prior to 3.3 (including 3.2.14, which is what we're using at the moment) will no longer be supported.

Mdann52 raised the priority of this task from Low to High. Aug 12 2015, 2:51 PM
Mdann52 subscribed.

Any progress on this yet? Considering we are using an unsupported version now, I'm upgrading priority.

Steinsplitter renamed this task from "Upgrade OTRS to latest stable release (4.0 or later)" to "Upgrade OTRS to latest stable release (5.0 or later)". Oct 21 2015, 1:46 PM
Aklapper renamed this task from "Upgrade OTRS to latest stable release (5.0 or later)" to "Upgrade OTRS to a more recent stable release". Oct 22 2015, 11:20 AM

Constantly changing the expectations on a task (in this case: the target version) is not helpful for discussion or planning. Hence I'm reverting the task summary changes.

So, to add some (though not much) context: we have a 4.0.13 installation that is currently being evaluated by OTRS volunteers. It resides at https://otrs-test.wikimedia.org. It is not fully functional; specifically, it has a snapshot of a point-in-time database and will, on purpose, not receive new emails while the evaluation is ongoing. This installation will take over from the current one after it has been ACK'ed by the OTRS volunteers, thereby fulfilling this task. Apart from the version change and new features, OTRS volunteers should not see anything different or have to change the way they work.

In the meantime we are investigating upgrading this installation to OTRS 5.0.1. It was released just two days ago, and there are a number of prerequisites (https://git.wikimedia.org/tree/operations/software/otrs) that need to be updated for this to work. Whether this upgrade to 5.0.1 will happen before the migration to the new installation is still uncertain, but the current plan is indeed for it to happen.

Imho, for volunteers like us the key benefit of OTRS version 5 is the mobile-ready user interface. Being able to process some tickets while commuting would really increase our productivity.

Therefore: +1 for a direct upgrade to OTRS 5.

The mobile app is actually available in earlier versions. We do not use this feature for various reasons.

I've upgraded the test installation today to OTRS version 5.0.1. One thing that has not been upgraded to version 5 is the QuickClose functionality, which is provided by a Znuny plugin. Their current estimate is that this plugin will be ready in about a month. The issue is tracked at https://github.com/znuny/Znuny4OTRS-QuickClose/issues/6. The rest of the Wikimedia-specific customizations have been updated.

OTRS 5 support for QuickClose is available now :-)

Heh, you're fast ;-). I am reinstalling the plugin right now.

And seems to be working just fine. So that loose end has been tied up.

@akosiaris There's a report from @Rjd0060 that the test instance is slow. Is that likely just a result of the "hardware" it's on/resources assigned to it vs the actual production machine?

And more specifically, I (and others) report slow logins and delays when loading certain queues.

TL;DR

It is to be expected; it has nothing to do with the software or the VM, but with the "test" nature of that install. These problems will not be present in the production installation.

Long version:
Yeah, that's to be expected. Let me explain why.

First though, to answer the "hardware" question.

The VM is idling, as can be seen at:

https://ganglia.wikimedia.org/latest/?c=Miscellaneous eqiad&h=mendelevium.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2

The hardware host it resides on currently (and since it was created) is idling even more, so no contention there.

https://ganglia.wikimedia.org/latest/?c=Ganeti Virt cluster eqiad&h=ganeti1003.eqiad.wmnet&m=cpu_report&r=hour&s=descending&hc=4&mc=2

Increasing the resources given to the VM is certainly possible, but it would not really help at this specific point in time. That part will be revisited immediately once a VM becomes the production serving machine.

Now to answer why queries (logins, showing queues and other things as well) are slow.

There was a conscious choice when deploying that test instance to use, as the database backend, a MySQL slave in the second datacenter (codfw). The disadvantage of that choice is that it inserts > 34 ms of latency into every single query. I would never make that choice for a production service. However, the plan is to evaluate the software, features, migration process, etc., and NOT the performance of a fully fledged service; the setup was also quick and easy, and database operations (creating a new database, migrating, running tests, changes, queries, etc.) posed exactly zero (0) threat to anything else (that last one was actually the driving factor). All of that made the choice an easy one.

It will NOT be like that in production: the box will be querying a local database, and none of these problems will be present.
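To see why a 34 ms round trip hurts so much, here is a back-of-the-envelope sketch of how per-query latency compounds when a page view issues many sequential queries. The 34 ms figure is the one quoted above; the per-page query counts are hypothetical examples, not measurements from this installation.

```python
# Back-of-the-envelope estimate of cross-datacenter query overhead.
# RTT_MS is the extra latency quoted above for every single query;
# the per-page query counts below are hypothetical, not measured.
RTT_MS = 34

for queries_per_page in (10, 100, 300):
    overhead_s = queries_per_page * RTT_MS / 1000
    print(f"{queries_per_page:>3} sequential queries -> +{overhead_s:.1f} s per page load")
```

Even a moderately query-heavy view can thus gain several seconds of pure network latency, which matches the slow logins and queue views reported above, and disappears entirely once the database is local.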

What does make me happy, though, is that if we have arrived at the point where we are looking at the performance of the installation, it means there are no other real issues, and we should start planning a date for the actual upgrade/migration, right? :-)

Understood so far, but this means we cannot test functions that use expensive database queries when those run into timeouts; besides, it's not much fun to do any testing on slow systems.

So nobody said we're beyond issues yet, because nobody could test much.

In T74109#1795219, @Krd wrote:

Understood so far, but this means we cannot test functions that use expensive database queries when those run into timeouts; besides, it's not much fun to do any testing on slow systems.

So nobody said we're beyond issues yet, because nobody could test much.

Ah, you're killing me :-( or at least my happy mood.

Can I have an example of something that runs into a timeout?

Some of the admin interface views did; sadly, it is not reproducible at the moment.

Per @akosiaris, the OTRS upgrade to version 5.0.x has been scheduled for 28 January 2016 and is expected to last 6-8 hours, beginning at 0800 UTC. Agents have been notified.

Rescheduled for February 3 at 0800 UTC.

Asking a dumb question to make sure the answer is as obvious as I hope: OTRS will (probably) be down on February 3 between 0800 UTC and somewhere between 1400 and 1600 UTC (if everything goes according to plan)?

Yes, that's correct.

OTRS is being localised on Transifex.
Am I right that it will be upgraded to 5.0, including the translations present there, on February 3rd?

In T74109#1956329, @Ata wrote:

OTRS is being localised on Transifex.
Am I right that it will be upgraded to 5.0, including the translations present there, on February 3rd?

Unfortunately, no. OTRS will be upgraded to version 5.0.6, which is the latest version out. The localizations in 5.0.6 are definitely not the ones that will be present on https://www.transifex.com/otrs/ on Feb 3rd. I think, however, that you can get an idea of what they will be like at https://github.com/OTRS/otrs/tree/rel-5_0_6/i18n/otrs. It seems the last sync from https://www.transifex.com/otrs/ into the OTRS software was on Jan 13; the commit was https://github.com/OTRS/otrs/commit/29b250b6c4057288aff280f95e945a1b0400221d

Note, however, that we are not bound to version 5.0.6, and we expect to upgrade to newer versions.

As @akosiaris stated, the most up-to-date versions of the translations will (for the most part) not be moved over. However, I think that if we find interface messages displaying an incorrect translation (such as the current MOTD, which apparently translates to "Soup of the Day" instead of "Message"), we can fix that locally. Alexandros can correct me if I'm wrong.

We can fix/patch things locally if needed, but IMHO that should only be a temporary stopgap measure. Operationally it is far better for us to stick to the software as released by OTRS than to have to patch it on every upgrade: it means we get to do upgrades faster, with less downtime and at a smaller cost to us. Our current installation, for example, is locally patched, and it took us days to find the patches, sort through them to figure out what is really needed and what is not, and port them to proper .sopm packages that do not overwrite the standard OTRS files. I would rather never go through that again.

And yes, I know this sounds ironic, given that we haven't upgraded the production installation in years and are only doing so now, but I am an optimist about this these days.

The gist of this whole reply is that we should always try to push our changes upstream instead of just patching locally.

Is there, or would there be, a way to update just the translations? Just like MediaWiki gets its translation updates from TWN. It would be rather a pain to have to report here what to change and then go to Transifex to change it for the rest of the world. I like @akosiaris's comment, but if regular updates of everything are not possible, I'd really love to have the usual TWN-style workflow of just fixing/finishing translations whenever I want, without needing to poke someone for every string I change.

To be honest, I don't see how this is suddenly a problem. How often have we had reports of minor translation mistakes? Off the top of my head, I'd guess maybe three or four times in the last couple of years. Given that, I can really understand the reluctance to maintain custom patches to fix a few translations, let alone implement a custom translation backend synchronizing with Transifex or another translation platform, which would require extra maintenance effort.

I believe the best strategy is to have someone set up an account on Transifex and fix these mistakes as they are brought to their attention through the OTRS wiki or Phabricator. (The fact that this is possible at all is already a huge improvement; I remember that just a few years ago they didn't even have a public interface for translation work, and mistakes were reported through the OTRS bug tracker or their mailing list.) I also don't share Base's fear that it will take a long time until we see improvements on Transifex reflected in our installation. OTRS delivers translation updates not only as part of major releases, but also through its minor patch-level releases, which generally seem to be fairly easy to perform and have in fact been performed several times since we upgraded to the 3.2 framework.

Is there, or would there be, a way to update just the translations?

One that is maintainable, makes sense in the long run, and does not involve using OTRS's designated translation service (transifex.com)? Almost certainly not.

Just like MediaWiki gets its translation updates from TWN.

Arguably, that process is very similar to what OTRS does for its translations: updating them from transifex.com periodically and shipping them in the next release.

It would be rather a pain to have to report here what to change and then go to Transifex to change it for the rest of the world. I like @akosiaris's comment, but if regular updates of everything are not possible, I'd really love to have the usual TWN-style workflow of just fixing/finishing translations whenever I want, without needing to poke someone for every string I change.

I think there is a misunderstanding here. There is no reason to "report here what to change" or "poke someone". The correct way is to go straight to transifex.com and fix the problem for everyone; the Wikimedia installation then gets the fix along with everyone else.

[snip]

I believe the best strategy is to have someone set up an account on Transifex and fix these mistakes as they are brought to their attention through OTRS Wiki or Phabricator.

I fully agree.

[snip]

OTRS delivers translation updates not only as part of major releases, but also through its minor patch-level releases, which generally seem to be fairly easy to perform and have in fact been performed several times since we upgraded to the 3.2 framework.

Indeed. These days, minor patch-level releases seem to happen on a monthly cycle, and they indeed deliver translation updates as well.

Change 267871 had a related patch set uploaded (by Alexandros Kosiaris):
ticket.wikimedia.org: Lower TTL down to 5M

https://gerrit.wikimedia.org/r/267871

Change 267888 had a related patch set uploaded (by Alexandros Kosiaris):
otrs: Route OTRS email to mendelevium

https://gerrit.wikimedia.org/r/267888

Change 268055 had a related patch set uploaded (by Alexandros Kosiaris):
Remove mendelevium's OTRS config override

https://gerrit.wikimedia.org/r/268055

Change 268055 merged by Alexandros Kosiaris:
Remove mendelevium's OTRS config override

https://gerrit.wikimedia.org/r/268055

Change 268056 had a related patch set uploaded (by Alexandros Kosiaris):
Update mendelevium's OTRS config

https://gerrit.wikimedia.org/r/268056

Change 268056 merged by Alexandros Kosiaris:
Update mendelevium's OTRS config

https://gerrit.wikimedia.org/r/268056

Change 267871 merged by Alexandros Kosiaris:
ticket.wikimedia.org: Lower TTL down to 5M

https://gerrit.wikimedia.org/r/267871

Change 267888 merged by Alexandros Kosiaris:
otrs: Route OTRS email to mendelevium

https://gerrit.wikimedia.org/r/267888

@akosiaris, please see T125756: some outgoing notification mails fail, and I found

[Error][Kernel::System::Email::Sendmail::Send][Line:85]: Can't send message: Cannot allocate memory!

arg

akosiaris claimed this task.

I am closing this as resolved. We are on OTRS 5.0.6, and there is no way we are going back to 3.2.x. Issues still exist, but they are being tracked independently of this task.

We still have some layout issues:

  • When an HTML email is displayed, the iframe containing it is initially set to a fixed pixel width (I guess via JS), which stays the same when the window is resized. This leads either to constant horizontal scrolling or to really short line lengths until the next reload. I would suggest setting the iframe width to 100% so that it resizes dynamically with the surrounding window:

.ArticleMailContent iframe {
    width: 100% !important;
}

  • The text in the top bar overlaps when using one of the "slim" skins. This may be fixed by:

#Header {
    overflow: hidden;
}

@MartinK I would recommend a separate ticket for that, since this ticket was specifically about the upgrade and has been closed. Would you mind pasting that comment into a new task? If you think it was specifically caused by the upgrade, maybe use "Create Subtask" above.