The EditAttemptStep schema (previously the Edit schema) has been around since 2013 and has provided data for important projects like the various rollouts of the visual editor.
However, over that time it has also accumulated many bugs, oddities, and unanswered strategic questions. This task tracks the resolution of those issues. It will be complete when (eons hence) the schema's stakeholders agree on its purpose and scope, the schema has been modified to implement that purpose and scope, and all the necessary implementations conform to that schema.
Use cases
- Edit completion rate
- Time to loaded and time to interactive
- Overall edit duration
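
To make the first use case concrete, here is a minimal sketch of how edit completion rate might be computed from logged events. It assumes each event is a flat dict with an `editingSessionId` and an `action` key, and it defines the rate as the share of sessions with an `init` event that also have a `saveSuccess` event. Both the flat layout and that definition are assumptions for illustration, not something the schema or its stakeholders have agreed on.

```python
from collections import defaultdict


def edit_completion_rate(events):
    """Share of sessions with an init event that also logged saveSuccess.

    Returns None when no session has an init event.
    """
    # Collect the set of actions seen in each session.
    actions_by_session = defaultdict(set)
    for event in events:
        actions_by_session[event["editingSessionId"]].add(event["action"])

    started = [a for a in actions_by_session.values() if "init" in a]
    if not started:
        return None
    completed = [a for a in started if "saveSuccess" in a]
    return len(completed) / len(started)


# Hypothetical sample data: session "a" saves, session "b" aborts.
events = [
    {"editingSessionId": "a", "action": "init"},
    {"editingSessionId": "a", "action": "ready"},
    {"editingSessionId": "a", "action": "saveSuccess"},
    {"editingSessionId": "b", "action": "init"},
    {"editingSessionId": "b", "action": "abort"},
]
print(edit_completion_rate(events))  # 0.5
```

Note that this calculation is only as good as the session grouping it relies on, which is why the session identification problems below matter.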
Scope
Interfaces currently logging to the schema:
- visual editor (phone and desktop)
- 2010 wikitext (phone and desktop)
- 2017 wikitext
- ContentTranslation (but without a separate value of editor_interface to distinguish it from desktop VE)
Should the app editors start using this schema? What about the Flow editor? What about the Wikidata description tool on Android?
- Having a common schema increases the probability that you can get comparable data across all these interfaces (because it forces teams to collaborate), but it doesn't ensure it.
- We should only incur the collaboration overhead if the benefits of more comparability are worth it—there's not much point in comparing, say, edit completion rate of Wikidata description editing with general page editing, because their contexts are so very different.
Session identification
- The schema defines editingSessionId as "a string of 32 alphanumeric characters, unique to the current page view session; used for grouping events".
- mw.user provides a number of different methods for generating session IDs.
  - MobileFrontend uses sessionId().
  - The visual editor uses generateRandomSessionId().
  - The 2010 wikitext editor uses MWCryptRand::generateHex(32).
- Our current implementation of editing sessions is tightly coupled to a page view. However, this doesn't map very well to what we think of as a single edit session: on desktop, switching between the visual editor and the wikitext editor while retaining changes causes a new page view, while on MobileFrontend, aborting an edit using the back button and then reopening the editor (which doesn't preserve your changes) all happens in one page view.
- We don't use the core EventLogging code for client-side session token generation and sampling.
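
As an illustration of what a single, shared definition of the ID format could look like, the sketch below generates and validates a 32-character ID. It is written in Python purely for illustration and is not the code any of the editors use. It also assumes the ID is lowercase hexadecimal, which is narrower than the "32 alphanumeric characters" the schema describes. The function names are hypothetical.

```python
import re
import secrets

# Assumption: IDs are exactly 32 lowercase hexadecimal characters.
SESSION_ID_PATTERN = re.compile(r"[0-9a-f]{32}")


def generate_session_id():
    """Return a random ID of 32 hex characters (16 random bytes)."""
    return secrets.token_hex(16)


def is_valid_session_id(value):
    """Check that a value matches the assumed 32-hex-character format."""
    return isinstance(value, str) and SESSION_ID_PATTERN.fullmatch(value) is not None


session_id = generate_session_id()
print(len(session_id), is_valid_session_id(session_id))
```

A validation check like this could flag implementations whose generators produce IDs of a different length or character set, which would otherwise go unnoticed until analysis.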
Timings
- There's no reason we should have a separate timing field for each event type when we can have a single one whose meaning varies by event type (T207803#4790039)
- init_timing is currently not logged, but the information described in the schema ("timing information about action=init – time in milliseconds since the page was loaded") does not seem useful.
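
The single-timing-field idea can be sketched as a small transformation. The example assumes a flat event dict in which each action has its own `<action>_timing` field. Only `init_timing` is named above, so the other field names (such as `ready_timing`) and the unified `timing` key are hypothetical.

```python
def unify_timing(event):
    """Collapse per-action timing fields into one 'timing' field.

    The meaning of 'timing' then depends on the event's action.
    """
    legacy_field = f"{event['action']}_timing"
    # Drop every per-action timing field from the copy.
    unified = {k: v for k, v in event.items() if not k.endswith("_timing")}
    # Keep only the value that belongs to this event's action, if any.
    unified["timing"] = event.get(legacy_field)
    return unified


print(unify_timing({"action": "ready", "ready_timing": 350}))
# {'action': 'ready', 'timing': 350}
```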
Other issues
- The new ability to switch back and forth between the visual editor and wikitext invalidates some key assumptions (for example, we probably want to update action.init.mechanism)
- How should we account for "micro-editing experiences" like Flow? Should they be included in this schema at all?
- Even with T124676 resolved, the table is still quite large. Consider whether to drop mostly unused fields like page.title or normalize the schema (T123958)
- Do our action.saveFailure.type values cover all the options?
  - For example, T197499 deals with a save failure because the wiki is in read-only mode, which isn't covered.
- The switch* values of abort_type are probably unnecessary now because we started logging switches as VisualEditorFeatureUse events (T221191#5290393), and in any case it doesn't seem right to consider switches as aborts because logically they are just one intermediate step in a single edit attempt.
- We have started discarding ready and loaded events that occur after a switch (T220697), but it's not clear if we're doing that everywhere
- We need a standard way to deal with multi-interface sessions in analysis—which interface, if any, do we attribute them to?
- The fact that the 2010 wikitext editor logs saveSuccess and init events on the server side, unlike every other event in the schema, creates significant inconsistencies (T214132)
- Should we log the user name rather than the user ID? On one hand, the user ID is immutable; on the other hand, the user name is the main global user identifier and easier for humans to use.
Data tidiness
- We should separate this into two tables: EditAttempt (containing data that applies to all steps in an attempt, such as platform, user agent, and user name; this won't include editor_interface because that can differ within a single edit attempt because of switching) and EditAttemptStep (not containing that attempt-wide data).
- We should probably merge VisualEditorFeatureUse into EditAttemptStep with a featureUse action. The observational unit is the same, and it's much easier to subset data from one table than to union data from two tables.
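
A rough sketch of the proposed two-table split, assuming flat event dicts keyed by `editingSessionId`. The attempt-level field names (`platform`, `user_agent`, `user_name`) are stand-ins for the attributes listed above, and the function name is hypothetical. The sketch keeps `editor_interface` on the step records because it can change within an attempt.

```python
# Assumption: these fields are constant across all steps of one attempt.
ATTEMPT_FIELDS = ("platform", "user_agent", "user_name")


def split_events(events):
    """Split flat events into attempt-level rows and step-level rows."""
    attempts = {}
    steps = []
    for event in events:
        session_id = event["editingSessionId"]
        # The first event seen for a session supplies the attempt-wide data.
        if session_id not in attempts:
            attempt = {"editingSessionId": session_id}
            for field in ATTEMPT_FIELDS:
                attempt[field] = event.get(field)
            attempts[session_id] = attempt
        # Step rows keep everything except the attempt-wide fields.
        steps.append({k: v for k, v in event.items() if k not in ATTEMPT_FIELDS})
    return list(attempts.values()), steps


# Hypothetical sample: one attempt that switches editor mid-session.
sample = [
    {"editingSessionId": "a", "action": "init", "platform": "desktop",
     "editor_interface": "visualeditor"},
    {"editingSessionId": "a", "action": "ready", "platform": "desktop",
     "editor_interface": "wikitext"},
]
attempt_rows, step_rows = split_events(sample)
print(len(attempt_rows), len(step_rows))  # 1 2
```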
See also
- @Halfak's 2016 proposal for splitting this into five separate schemas:
  - EditingSession (one per page edit session)
  - EditingStage (one per editing stage)
  - EditingAbort (one per aborted edit)
  - EditingSaveFailure (one per save failure)
  - PageContentSaveComplete (note that this schema already exists)