Page MenuHomePhabricator

Store check user data action text in structured format
Closed, ResolvedPublic

Description

For log actions, CheckUser currently stores the action text in cuc_actiontext field in its raw form. Since the action text is generated from the LogFormatter and then inserted, it means CheckUser entries cannot be properly localized and we also cannot format it as needed when displaying to the user. To avoid this, the parameters should be stored in a new field like rc_params/log_params. As part of this the log ID or other identifier for the log entry in the logging table (or other table) should be stored, so that CheckUser can link to the log amongst other needs.

Related Objects

Event Timeline

There are a very large number of changes, so older changes are hidden. Show Older Changes

Instead of copying the message and params can we store the log_id?

That might work for things which use the normal logging table but won't work for others which doesn't make use of it like AbuseFilter, GlobalRename (T131207) and possibly others. (AbuseFilter's insertion of CU entries is also kind of hackish).

In the long term, I think we should be allowing CheckUser formatting to be independent of LogFormatter so that log entries which are optimized for CU workflow can be rolled out when needed instead of the normal entries although they would default to the regular LogFormatter, of course.

OTOH, I think CU will benefit from not having to invent its own log formatting system. And now that I think of it, I don't really see any reason AbuseFilter needs a separate log system. I'll file a bug for that.

I am with Legoktm on this one.

If another extension generates logs, and CU wants to use those logs, CU should use that extension's (or the MW's) log formatter.

If someone writes an extension that generates logs and there is no way for CU to consume it, then CU should not use its logs, period. CU should not be responsible for making sense out of someone else's logs.

CU can, and should, use MW's own logs through MW's own log formatter.

I can also see the issues with having a separate formatting system but are we sure that data that is going to be inserted in CU will always have a corresponding log table entry? How should cases such as GlobalRename (T131207) be handled?

If someone writes an extension that generates logs and there is no way for CU to consume it, then CU should not use its logs, period. CU should not be responsible for making sense out of someone else's logs.

I am not sure I completely agree with this. If an extension does (not need to|cannot) use the core logging system because it doesn't have the proper features or it has other purposes, but still needs to include data in CU, why should CU make it difficult for them to make use of it? Shouldn't CU be making it easier for them? I'm not saying that CU should be responsible for processing their logs but what I meant is that CU should provide proper interfaces so that extension developers may make use of it so that CU can display it to the user.

I don't know whether there are actual extensions which does have such logging systems but we should probably have a solution for this in case it might cause issues for us in the future.

Only thing I know of that would need special logging systems is T68450#2530241 which is a proposal and not (yet) an extension.

@Glaisher a milder version of my harsh comment above would be this:

CU should only store the log text if it is from an extension that does not use MW's logging system. In all other cases, CU should store the log_id only, and the log's text should be generated on the fly using the log formatter.

That way, we do support those extensions like the on @JEumerus mentioned, but will treat them as exceptions, not the rule.

@Huji The point of this task is to get rid of storing log text in the CU table.. If the log text is stored (instead of type, params etc.), then we would have the issues we are currently having. See the subtasks. Or maybe you meant something else?

What I am trying to say is even the structured text should only be stored if we don't have a straightforward way to generate the log. If we do, then we should use the logid instead. For most cases, we do have a way (most logs use MW's log formatter).

@MusikAnimal Any chance that Community-Tech could have a look at this task as well? Thanks!

Huji added a parent task: Restricted Task.Nov 21 2021, 3:51 AM
Dreamy_Jazz triaged this task as High priority.EditedJun 26 2022, 11:30 PM
Dreamy_Jazz subscribed.

This is depended on by many other tasks and has had at least two WMF CUs say this is a task they'd like seen to be taken on in a recent discussion on checkuser-l. I'm looking into this.

This will also help reduce the truncation issues mentioned on checkuser-l. Some AbuseFilter changes exceed the current limits on text and therefore are truncated irreversibly.

Because some events to not exist in the logging table the checkuser table will need to store the params separately for at least some events.