
Add ability to search by user agent from CheckUser interface
Open, Medium, Public

Assigned To
None
Authored By
kaldari
Sep 27 2016, 11:30 PM
Referenced Files
F4672128: CheckUser user agents - get edits.png
Oct 29 2016, 1:28 AM
F4672119: CheckUser user agents - box_2.png
Oct 29 2016, 1:22 AM
F4569912: CheckUser user agents - get IP.jpg
Oct 6 2016, 10:14 PM
F4569910: CheckUser user agents - get edits.jpg
Oct 6 2016, 10:14 PM
F4569917: CheckUser user agents - get users.jpg
Oct 6 2016, 10:14 PM
F4557231: CheckUser user agents - get users.jpg
Oct 4 2016, 9:58 PM
F4557225: CheckUser user agents - box.jpg
Oct 4 2016, 9:58 PM
F4557228: CheckUser user agents - get IP.jpg
Oct 4 2016, 9:58 PM
Tokens
"Like" token, awarded by Hymeros.
"Like" token, awarded by TheresNoTime.

Description

The CheckUser interface currently surfaces user agents when listing users or edits, but there's no way to search by user agent. It would be useful to be able to click on a listed user agent and see all users or edits associated with that user agent/IP combination.

This may be an expensive query, so we may have to introduce a new database column with hashed user agents that could be used as an index. This would preclude wildcard or prefix searches (for example, from a text field), but in almost all cases the user will want to search for a specific user agent anyway, so I think that's a decent trade-off.

We will need to add one or more indexes to the table for this.
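As a rough illustration of the hashed-column idea, here is a minimal sketch using an in-memory SQLite database. The table name, column names, index name and the choice of SHA-1 are all assumptions made for the example; they are not taken from the CheckUser schema.

```python
import hashlib
import sqlite3


def ua_hash(user_agent: str) -> str:
    """Return a fixed-length hex digest for a user agent string."""
    return hashlib.sha1(user_agent.encode("utf-8")).hexdigest()


# Hypothetical table and column names, for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE cu_changes_demo ("
    " actor TEXT, ip TEXT, agent TEXT, agent_hash TEXT)"
)
# Composite index so an (IP, hashed user agent) lookup can avoid a full scan.
conn.execute(
    "CREATE INDEX demo_ip_agent_hash ON cu_changes_demo (ip, agent_hash)"
)

CHROME_UA = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.27604.111 Safari/537.36"
)
rows = [
    ("Alice", "192.0.2.10", CHROME_UA),
    ("Bob", "192.0.2.10", "WikiTiki 1.0"),
    ("Carol", "192.0.2.77", CHROME_UA),
]
conn.executemany(
    "INSERT INTO cu_changes_demo VALUES (?, ?, ?, ?)",
    [(actor, ip, agent, ua_hash(agent)) for actor, ip, agent in rows],
)


def users_for(ip: str, user_agent: str) -> list:
    """Exact-match lookup on the IP and the hashed user agent."""
    cur = conn.execute(
        "SELECT actor FROM cu_changes_demo"
        " WHERE ip = ? AND agent_hash = ? ORDER BY actor",
        (ip, ua_hash(user_agent)),
    )
    return [row[0] for row in cur]


print(users_for("192.0.2.10", CHROME_UA))
print(users_for("192.0.2.10", "WikiTiki"))
```

The second call shows the trade-off: a prefix such as "WikiTiki" hashes to a different value than "WikiTiki 1.0", so only the full string matches.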

The user agent search will only be performed as a combined search for user agent and IP address. (If you just searched for user agent on its own, a common user agent would return too many results to be meaningful.)

There will be a second text field under the existing IP/username field that only appears when you click on a user agent link or a user agent is specified in the query string.

Description of the workflow, Get users:

  • User pastes IP address into "IP/username" field, chooses Get users
  • The results list shows usernames, IP addresses and user agents. All three are links. User clicks on one of the user agent links.
  • On refresh, there is an "IP/username" field with the IP address filled in, and a second "user agent" field with the user agent filled in.
  • User can now search again for Get edits or Get users, and the results will only show this IP and user agent combination.
  • User can also blank the "user agent" field, if they don't want to use it anymore.
  • Alternatively, user can change the IP to a different IP or a CIDR range.
  • Note: The "IP/username" field allows for ranges. We should do the same for the "user agent" field, if possible.
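The click-through step in the workflow above could be sketched roughly as follows: the user agent link carries both values in the query string, and the form reads them back to prefill the two fields. The page path and the parameter names (`target`, `useragent`, `mode`) are hypothetical and chosen only for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

# Hypothetical page path and parameter names; the real form may differ.
BASE = "/wiki/Special:CheckUser"


def user_agent_link(ip_or_range: str, user_agent: str, mode: str = "users") -> str:
    """Build the link target for a user agent shown in the results list."""
    query = urlencode(
        {"target": ip_or_range, "useragent": user_agent, "mode": mode}
    )
    return f"{BASE}?{query}"


def prefill_fields(url: str) -> dict:
    """Work out what the form should show after the page reloads."""
    params = parse_qs(urlsplit(url).query)
    return {
        "ip_or_username": params.get("target", [""])[0],
        "user_agent": params.get("useragent", [""])[0],
        # The second field only appears when a user agent is in the query string.
        "show_user_agent_field": "useragent" in params,
    }


link = user_agent_link("192.0.2.0/24", "Mozilla/5.0 (X11; Linux x86_64)")
print(link)
print(prefill_fields(link))
```

Because `parse_qs` drops blank values by default, blanking the user agent leaves the second field hidden on the next load, which matches the "user can also blank the field" step.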

Description of the workflow, Get edits:

  • User pastes IP address or username into "IP/username" field, chooses Get edits
  • Results list has log entries, with IP and user agent listed under each entry. IP and user agent are both links. User clicks on user agent link.
  • On refresh, there is an "IP/username" field with the IP address filled in, and a second "user agent" field with the user agent filled in.

Wireframes:

CheckUser user agents - box_2.png (539×905 px, 41 KB)

CheckUser user agents - get edits.png (461×1 px, 81 KB)

Event Timeline

@DannyH I don't think the wireframes you added are practical: there is no way to know for sure that a user with username "Mozilla/5.0 ..." doesn't exist. Or a user might edit with the user-agent "DannyH".

Perhaps a better option would be to have a dropdown which has three options (IP, user-agent, username) followed by a textbox in which you provide the data.

On a separate note, I think we should also allow searching in the UA using wildcards. I can think of the efficiency counter-arguments, but an "exact match only" search is not going to be that useful.

Hi @Huji,

Well, the user-agent is a long string of data, generated automatically. A full example is:

Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.27604.111 Safari/537.36

So you'd search for this by clicking on a user-agent string that you get from doing a query on an IP address or user name. You wouldn't just type it in; it would be too easy to make mistakes.

So you can't have a user-agent called "DannyH" or anything like that, and if we have a user with a name like "(that whole string of browser names and digits)", then I would like an explanation from that user for why they have such a ridiculous username. :)

I agree that we still have to figure out how useful the exact match will be, compared to wildcards. We need to test out searching for user-agents, and see if we get enough results, or too many.

> Well, the user-agent is a long string of data, generated automatically. A full example is:
>
> Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/51.0.27604.111 Safari/537.36

A user-agent is "usually" like that. It does not have to be. To my knowledge, there is no RFC that enforces a specific format for user-agent strings, and even if there is, there is no guarantee that a user would not modify their user-agent to something non-standard. We want the CU tool to know if it is searching for "DannyH" in the user field or the UA field. This cannot be decided just by looking at the input string.

> So you can't have a user-agent called "DannyH" or anything like that

Yes you can. It is not illegal. In fact I have seen users with really ridiculous user-agent strings.

> > So you can't have a user-agent called "DannyH" or anything like that
>
> Yes you can. It is not illegal. In fact I have seen users with really ridiculous user-agent strings.

You could spoof it, but a browser will normally send a standard-ish user agent by default, because otherwise it may be served unwanted content. For example, "DannyH" might result in a blank page saying "your browser is not supported". I would not be surprised if the more prolific LTAs are spoofing user agents into something ridiculous like this, though. If that is the case, then being able to search for it would certainly be helpful, given how unlikely it is that anyone else has a similar UA. Also, many browsers update automatically, so you might want to do a wildcard search that omits the browser version to allow for some variation.
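To make the "omit the browser version" idea concrete, here is a small sketch that turns a user agent seen in a previous result into a pattern that ignores every version number. It is only one possible approach, and it is deliberately loose: it wildcards all digit runs, including the OS version, not just the browser version. The function name is made up for the example.

```python
import re

# A run of digits separated by dots or underscores, e.g. 51.0.2704 or 10_11_5.
VERSION = r"\d+(?:[._]\d+)*"


def version_insensitive(user_agent: str) -> re.Pattern:
    """Build a regex matching the same user agent with any version numbers."""
    pieces = re.split(f"({VERSION})", user_agent)
    # With one capturing group, re.split alternates literal text (even
    # positions) and the matched version numbers (odd positions).
    pattern = "".join(
        VERSION if index % 2 else re.escape(piece)
        for index, piece in enumerate(pieces)
    )
    return re.compile(pattern)


SEEN = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_5) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/51.0.27604.111 Safari/537.36"
)
# Same browser after a hypothetical automatic update.
UPDATED = (
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_11_6) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/52.0.2743.116 Safari/537.36"
)

matcher = version_insensitive(SEEN)
print(bool(matcher.fullmatch(UPDATED)))
print(bool(matcher.fullmatch("WikiTiki 1.0")))
```

A pattern like this cannot be answered from a hash of the full string, so it would need either a scan over candidate rows or a stored, version-stripped form of the user agent.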

I think Huji is right. There's really no way to reliably differentiate between a UA string and a username. In fact a lot of my bots have very simple user agent strings like "WikiTiki 1.0". The suggestion of a drop-down select list sounds like a good idea.
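A minimal sketch of why the drop-down helps: the search carries an explicit type, so the tool never has to guess from the input string which field "DannyH" belongs to. The enum values and column names below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class TargetType(Enum):
    """The three drop-down options suggested above."""
    IP = "ip"
    USERNAME = "username"
    USER_AGENT = "useragent"


@dataclass(frozen=True)
class Search:
    target_type: TargetType
    value: str


# Hypothetical column names, for illustration only.
COLUMNS = {
    TargetType.IP: "ip",
    TargetType.USERNAME: "actor",
    TargetType.USER_AGENT: "agent",
}


def column_for(search: Search) -> str:
    """Pick the column to query from the selected type, not from the value."""
    return COLUMNS[search.target_type]


# The same string is a valid username and a valid user agent.
print(column_for(Search(TargetType.USERNAME, "DannyH")))
print(column_for(Search(TargetType.USER_AGENT, "DannyH")))
```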

Okay, that can work. Thanks for helping to educate me on user-agents. :)

Thanks @DannyH. To briefly respond to your second-to-last comment above, using a non-standard UA generally doesn't result in any denial of service.

I've updated the wireframes, with a drop-down for IP address, Username and User agent.

DannyH raised the priority of this task from Low to Medium. (Oct 11 2016, 9:25 PM)
DannyH added a subscriber: Jalexander.

I've revamped the spec and wireframes, following our meeting with @Jalexander. Now you can only search by user agent and IP together.

A few thoughts:

  • If we're only expecting users to provide the user agent by clicking it from a previous search, perhaps there is no input box at all? This could simplify the process.
  • Given the complexity, this system feels prone to displaying error messages. We'll need to list out the messages that could occur and make sure they use clear language.
  • Is there a loading indicator? Should we add one?

> A few thoughts:
>
>   • If we're only expecting users to provide the user agent by clicking it from a previous search, perhaps there is no input box at all? This could simplify the process.

But why? I think this is a useful feature. We should allow as many easy ways to search UAs as we can.

Will it be possible to search for IPs using a given UA? For example, a UA like "DannyH" will usually be uncommon, so searching for it can be useful.

> Will it be possible to search for IPs using a given UA? For example, a UA like "DannyH" will usually be uncommon, so searching for it can be useful.

Yes, that is one of the use cases for this tool. It will work similarly to searching by IP address in the CheckUser tool.

One note on the technical side of things: you can use a hash index. In that case you would lose the ability to do regex searches and pattern matching, but it's pretty fast. This is a classic use case for a hash index lookup; I've done this before for other cases.

Please see my related comment in T147894#4962824, in which I explain why, at least for now, it is best to restrict the functionality to searching either by IP or by UA (so if both are provided, we would return an error and ask the user to remove one). It is possible (though I am not sure how likely) that allowing both the IP and the UA to be specified would require a massive index on the database tables, which would be unjustifiable given the few use cases for a joint search. I would rather defer that to a later time, in the interest of having the UA search in production in the near term.
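The "either IP or UA, but not both" rule could be checked up front with something like the sketch below. The function name and the wording of the error messages are placeholders, not proposed interface text.

```python
from typing import Optional


def validate_search(ip: Optional[str], user_agent: Optional[str]) -> Optional[str]:
    """Return an error message, or None if the search is acceptable."""
    has_ip = bool(ip and ip.strip())
    has_ua = bool(user_agent and user_agent.strip())
    if has_ip and has_ua:
        # Joint search is deferred, so ask the user to remove one value.
        return "Please search by IP address or by user agent, not both."
    if not has_ip and not has_ua:
        return "Please provide an IP address or a user agent."
    return None


print(validate_search("192.0.2.1", None))
print(validate_search("192.0.2.1", "WikiTiki 1.0"))
```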

A way to circumvent the large index is to turn this into something like an abusefilter for checkusers only. Getting alerted when someone in a range uses a recognizable UA is a gazillion times better than finding a sock after 50 edits and waiting for CUs to check and confirm while the sock has already moved on to the next account.
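As a very rough sketch of what such an alert rule might check (this is not AbuseFilter syntax, and the `WatchRule` type is invented for the example), each incoming action is tested against a range and a recognizable UA, so no index over past data is needed:

```python
import ipaddress
from dataclasses import dataclass


@dataclass(frozen=True)
class WatchRule:
    """Hypothetical rule: alert on a known UA appearing within an IP range."""
    network: str
    user_agent: str


def matches(rule: WatchRule, ip: str, user_agent: str) -> bool:
    """Check one incoming action against the rule."""
    in_range = ipaddress.ip_address(ip) in ipaddress.ip_network(rule.network)
    return in_range and user_agent == rule.user_agent


rule = WatchRule("192.0.2.0/24", "WikiTiki 1.0")
print(matches(rule, "192.0.2.55", "WikiTiki 1.0"))
print(matches(rule, "198.51.100.1", "WikiTiki 1.0"))
```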

@Beetstra what you said reminded me of T50623: Add user-agent variable for abuse filtering and I think the limitation there is that AbuseFilter currently does not have a way to restrict certain aspects of a filter (or ability to view or edit certain filters and their associated logs) to a specific group like CUs.

But, while that is a useful feature, it is still useful to also be able to run CU using UA. There is a Venn diagram here, with the two circles not fully overlapped.

The only way I can see is a separate AbuseFilter variety that is
enabled only for checkusers. It was, however, confessed to me
that AbuseFilter itself needs a serious upgrade, so I can imagine that
a CU clone of it is not going to happen soon.

I do note that AbuseFilter does require a special right to see filters and
results, I don’t know how difficult it is to make another right that allows
even deeper hidden filters.