Page MenuHomePhabricator

Reassure ourselves about triggers & replication
Closed, ResolvedPublic4 Estimated Story Points

Description

I assume we use 'row-based replication'
https://dev.mysql.com/doc/refman/5.7/en/replication-features-triggers.html

Jeff_Green::: eileen: something occured to me over the weekend: triggers exist only on the master right?
eileen:: Jeff_Green: hmm - maybe they get replicated. That would be bad
Jeff_Green
10:30 eileen: i was thinking about what happens if the master explodes
awight
10:30 Jeff_Green: wrt triggers, yeah I think that's the interaction that causes all of our SUPER problems, cos triggers and functions on the replica would not be deterministic
Jeff_Green
10:30 we need to be sure we have a cutover protocol that allows us to promote a slave without losing triggers
awight
10:31 ooh good catch
eileen
10:31 hmm - as long as the triggers don't fire during replication
Jeff_Green
10:31 it woke me up actually
awight
10:31 ha, a demonic vision
Jeff_Green
10:31 just about to fall asleep panic on Saturday night
eileen
10:31 :-)
10:32 yeah we can't really test that on staging!
Jeff_Green
10:32 i might be able to figure out a way to test it
eileen
10:32 am just reading this https://dev.mysql.com/doc/refman/5.7/en/replication-features-triggers.html

Event Timeline

Eileenmcnaughton created this task.

Assigning to you in the hope you can do the test you suggested...

10:32 am just reading this
https://dev.mysql.com/doc/refman/5.7/en/replication-features-triggers.html

"With statement-based replication, triggers executed on the master also
execute on the slave. With row-based replication, triggers executed on the
master do not execute on the slave."

We're currently using 'mixed' replication, so this is something we need to
look at very carefully.

More discussion at http://bots.wmflabs.org/~wm-bot/logs/#wikimedia-fundraising/20160414.txt from 14:53 to 16:00 UTC.

  • We're considering using a feature built into CiviCRM core, which is enabled by an optional feature flag. It adds triggers on all write operations to the civicrm database, and simply copies the new record over to a type 2 history table in a new logging database.
  • There are many CiviCRM instances successfully using the log triggers, but we might be the only instance considering using both logging and replication.
  • We're currently using mixed binary logging, but are in agreement that pure row logging should be fine for us, assuming it's performant.
  • Collectively, fr-tech have a poor understanding of how triggers will behave under mixed or row replication.
  • We can allow spurious logging records of things which were actually rolled back in the civicrm db, so two-phase commit is not necessary or desirable.
  • The failover scenario is poorly understood. We think we will have to create triggers on the slave as part of the manual steps in promoting to master.

@jcrespo
We're hoping to get your eyes on the problem described in the previous comment. You can probably skip reading the IRC history, it's summarized in the bullet points. What we need to understand are the best practices around using log triggers in a replicated environment.

Here's an example trigger, to make things concrete:

CREATE TRIGGER civicrm_case_after_update
after update ON civicrm_case
FOR EACH ROW BEGIN
    IF (
        (
            IFNULL(OLD.id,'') <> IFNULL(NEW.id,'')
            OR IFNULL(OLD.case_type_id,'') <> IFNULL(NEW.case_type_id,'')
            OR IFNULL(OLD.subject,'') <> IFNULL(NEW.subject,'')
            OR IFNULL(OLD.start_date,'') <> IFNULL(NEW.start_date,'')
            OR IFNULL(OLD.end_date,'') <> IFNULL(NEW.end_date,'')
            OR IFNULL(OLD.details,'') <> IFNULL(NEW.details,'')
            OR IFNULL(OLD.status_id,'') <> IFNULL(NEW.status_id,'')
            OR IFNULL(OLD.is_deleted,'') <> IFNULL(NEW.is_deleted,'')
        ) AND (
            @civicrm_disable_logging IS NULL OR @civicrm_disable_logging = 0
        )
    ) THEN
        INSERT INTO `log_civicrm`.log_civicrm_case
            (id, case_type_id, subject, start_date, end_date, details, status_id, is_deleted, log_conn_id, log_user_id, log_action)
            VALUES (NEW.id, NEW.case_type_id, NEW.subject, NEW.start_date, NEW.end_date, NEW.details, NEW.status_id, NEW.is_deleted, CONNECTION_ID(), @civicrm_user_id, 'update');
    END IF;
END

Without much context:

  • Triggers created on the master are automatically created on the slave (but it is always good to check it, in case of filtering, etc.)
  • Row-based replication is desired over statement/mixed. There is no rule over that- there is indeed performance considerations, and configuration that can be tweaked (e.g. it can increase bandwidth needed, and so latency, but in some cases that bandwidth can be reduced; it also depends more on having primary keys on all queries). For context, we are trying to move to ROW-based replication on core production: T109179, but it is not that "easy" (some tables have no PK and we depend on STATEMENT for filtering to labs, which is painful).
  • Statement based replication makes triggers execute on the slaves (to hopefully get eventually to the same state on the master), row based replication replicates the master's triggers effects (rows changes) and ignores triggers execution on the slave. Mixed is statement based replication that failovers to row when running unsafe statements. It will behave in one way or another depending on the actual format used for each event.
  • Triggers, like events, are tricky- I would recommend using cron/application code where possible; even binary logging inspection. Triggers are not immediately clear to debug unlike a user space process. This is not an absolute thing, and not the most optimized option, but in my experience more visibility helps with maintenance. (What if you delete a column and the trigger starts to fail?). Please consider this issues before committing (pun intended) to a trigger. Triggers are the "nice" solution from development point of view, but are a frequent case of headaches (what if you need to reimport that table- you cannot disable triggers in MySQL!).
  • You should create the triggers on the master as usual and let those creation replicate to the slave. If you do not want a table on the slave, use slave replication filters to avoid those (although that can be a problem in the case of a failover)

I do not understand this:

"We can allow spurious logging records of things which were actually rolled back in the civicrm db, so two-phase commit is not necessary or desirable."

2-phase commit is an InnoDB thing. Triggers and replication is sql-layer. What do you mean? If the transaction is committed, it goes to the binary log, if not, it is not. In reality things are not that clear-cut, due to different binlog sync configuration, semisync (which I would recommend you to use), but in theory, transaction log <> binary log.

Depending on the needs of the logging, if it is not for data retention but for auditing, please consider an audit plugin: https://github.com/mcafee/mysql-audit/wiki/Configuration

@awight and I discussed this and felt that we should do a 'trial run' of adding the trigger to just one table on live first - choosing civicrm_setting as a table that can be restored from backup (or from staging) happily if anything wierd happened.

CREATE TABLE log_civicrm.`log_civicrm_setting` (
  `id` int(10) unsigned DEFAULT NULL,
  `group_name` varchar(64) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'group name for setting element, useful in caching setting elements',
  `name` varchar(255) COLLATE utf8_unicode_ci DEFAULT NULL COMMENT 'Unique name for setting',
  `value` text COLLATE utf8_unicode_ci COMMENT 'data associated with this group / name combo',
  `domain_id` int(10) unsigned DEFAULT NULL COMMENT 'Which Domain is this menu item for',
  `contact_id` int(10) unsigned DEFAULT NULL COMMENT 'FK to Contact ID if the setting is localized to a contact',
  `is_domain` tinyint(4) DEFAULT NULL COMMENT 'Is this setting a contact specific or site wide setting?',
  `component_id` int(10) unsigned DEFAULT NULL COMMENT 'Component that this menu item belongs to',
  `created_date` datetime DEFAULT NULL COMMENT 'When was the setting created',
  `created_id` int(10) unsigned DEFAULT NULL COMMENT 'FK to civicrm_contact, who created this setting',
  `log_date` timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,
  `log_conn_id` varchar(17) COLLATE utf8_unicode_ci DEFAULT NULL,
  `log_user_id` int(11) DEFAULT NULL,
  `log_action` enum('Initialization','Insert','Update','Delete') COLLATE utf8_unicode_ci DEFAULT NULL,
  KEY `index_id` (`id`),
  KEY `index_log_conn_id` (`log_conn_id`),
  KEY `index_log_date` (`log_date`),
  KEY `index_contact_id` (`contact_id`),
  KEY `index_created_id` (`created_id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8 COLLATE=utf8_unicode_ci ROW_FORMAT=COMPRESSED KEY_BLOCK_SIZE=4 ;

INSERT INTO `log_civicrm`.log_civicrm_setting (id, group_name, name, value, domain_id, contact_id, is_domain, component_id, created_date, created_id, log_conn_id, log_user_id, log_action) 
SELECT id, group_name, name, value, domain_id, contact_id, is_domain, component_id, created_date, created_id, 
COALESCE(@uniqueID, LEFT(CONCAT('c_', unix_timestamp()/3600, CONNECTION_ID()), 17)), @civicrm_user_id, 'Initialization' 
FROM civicrm_setting;

Trigger sql - needs higher permissions

DELIMITER //

DROP TRIGGER IF EXISTS civicrm_setting_after_insert//

CREATE TRIGGER civicrm_setting_after_insert after insert ON civicrm_setting FOR EACH ROW BEGIN  IF ( @civicrm_disable_logging IS NULL OR @civicrm_disable_logging = 0 ) THEN INSERT INTO `log_civicrm`.log_civicrm_setting (id, group_name, name, value, domain_id, contact_id, is_domain, component_id, created_date, created_id, log_conn_id, log_user_id, log_action) VALUES ( NEW.id, NEW.group_name, NEW.name, NEW.value, NEW.domain_id, NEW.contact_id, NEW.is_domain, NEW.component_id, NEW.created_date, NEW.created_id, COALESCE(@uniqueID, LEFT(CONCAT('c_', unix_timestamp()/3600, CONNECTION_ID()), 17)), @civicrm_user_id, 'insert');END IF; END//

DROP TRIGGER IF EXISTS civicrm_setting_after_update//

CREATE TRIGGER civicrm_setting_after_update after update ON civicrm_setting FOR EACH ROW BEGIN  IF ( (IFNULL(OLD.id,'') <> IFNULL(NEW.id,'') OR IFNULL(OLD.group_name,'') <> IFNULL(NEW.group_name,'') OR IFNULL(OLD.name,'') <> IFNULL(NEW.name,'') OR IFNULL(OLD.value,'') <> IFNULL(NEW.value,'') OR IFNULL(OLD.domain_id,'') <> IFNULL(NEW.domain_id,'') OR IFNULL(OLD.contact_id,'') <> IFNULL(NEW.contact_id,'') OR IFNULL(OLD.is_domain,'') <> IFNULL(NEW.is_domain,'') OR IFNULL(OLD.component_id,'') <> IFNULL(NEW.component_id,'') OR IFNULL(OLD.created_date,'') <> IFNULL(NEW.created_date,'') OR IFNULL(OLD.created_id,'') <> IFNULL(NEW.created_id,'')) AND ( @civicrm_disable_logging IS NULL OR @civicrm_disable_logging = 0 ) ) THEN INSERT INTO `log_civicrm`.log_civicrm_setting (id, group_name, name, value, domain_id, contact_id, is_domain, component_id, created_date, created_id, log_conn_id, log_user_id, log_action) VALUES (NEW.id, NEW.group_name, NEW.name, NEW.value, NEW.domain_id, NEW.contact_id, NEW.is_domain, NEW.component_id, NEW.created_date, NEW.created_id, COALESCE(@uniqueID, LEFT(CONCAT('c_', unix_timestamp()/3600, CONNECTION_ID()), 17)), @civicrm_user_id, 'update');END IF; END//



DROP TRIGGER IF EXISTS civicrm_setting_after_delete//


CREATE TRIGGER civicrm_setting_after_delete after delete ON civicrm_setting FOR EACH ROW BEGIN  IF ( @civicrm_disable_logging IS NULL OR @civicrm_disable_logging = 0 ) THEN INSERT INTO `log_civicrm`.log_civicrm_setting (id, group_name, name, value, domain_id, contact_id, is_domain, component_id, created_date, created_id, log_conn_id, log_user_id, log_action) VALUES ( OLD.id, OLD.group_name, OLD.name, OLD.value, OLD.domain_id, OLD.contact_id, OLD.is_domain, OLD.component_id, OLD.created_date, OLD.created_id, COALESCE(@uniqueID, LEFT(CONCAT('c_', unix_timestamp()/3600, CONNECTION_ID()), 17)), @civicrm_user_id, 'delete');END IF; END//

DELIMITER ;

@jcrespo
Thanks for the thorough response! We'll review our options, I think you've given us enough information to answer the immediate questions.

NB in answer to

What if you delete a column and the trigger starts to fail?)

It fails hard in that instance - fatal errors at the php level - so hard to miss & relatively easy to diagnose

OTOH if you add a column & forget to add it into the trigger then the data is simply silently not captured. We need to think about this with custom fields

Closing as we did a trial deploy of the civicrm_settings log table to achieve this. I'll also remove from internet Exploring sprint as was done under Hermit Crab Husbandry

@DStrine I was going to remove this from IE sprint - but do I need to move to Done in HCH sprint to get your report info right?