On the utility of filing bugs

During my five years working at Mozilla, I’ve been known to ask people to file bugs when they encountered an issue. Most of the time, the answer was that they didn’t have time to do so and that it was useless. I think it is actually very valuable: you learn how to file actionable bugs, you gain deeper knowledge of a specification, and you sometimes even find a workaround for the problem.

A recent example

Three weeks ago, at work, we launched a new design for the website header. We got some reports that the logo was missing in Firefox on some pages. After investigation, we discovered that Firefox (and also Edge) had a different behaviour with SVG’s <use xlink:href> on pages with a <base> element. We fixed it right away by using an absolute URL for our logo. But we also filed bugs against Gecko and Edge. As part of filing those bugs, I found the change in the SVG specification clarifying how it should be handled. Microsoft fixed the issue in less than two weeks. Mozilla fixed it in less than three weeks.

In October this year¹, all browsers should behave the same way with regard to that issue. And a four-year-old workaround will become obsolete. We will be able to remove the code we had to introduce. Less code, yeah!

I hope this will convince you that filing bugs has an impact. You can learn more about how to file actionable bugs. If you’d like an easier avenue for filing bugs when browsers are incompatible, the WebCompat project is a nice place to start.


  1. Firefox 55 should be released on August 8 and the next Edge should be released in September (maybe even earlier, I’m not clear on Edge’s release schedule) 

Updating my code search tools

Finding information in a codebase is a very common task for software engineers. The quicker you can do it, the better your chances of keeping your momentum. With proper tools, you can find things 10 times faster¹.

In 2011, I presented ack in a lightning talk at Paris Web. That was my tool of choice for a while. It was very convenient because it didn’t report files stored in VCS folders. A bit later, I switched to ag, which does the same thing and also tries to skip files listed in .gitignore. Recently, I’ve often run into the limitations of its .gitignore support. So I’m switching to two tools.

ripgrep

A recent player in the field. The introductory blog post got me very interested, with its detailed explanation of the code and the benchmarks. It has much better .gitignore support. And it’s pretty cool to be using a tool written in Rust.

I’d love it if it supported searching through compressed files out of the box. In the meantime, this command will do the trick, although it can’t take advantage of parallelism: find . -name "*.gz" -exec gzcat "{}" + | rg "whatever"
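If you need some parallelism back, a rough workaround is to fan the archives out with xargs (a sketch only: the pattern and job count are placeholders, and since rg cannot know the file name when reading from stdin, sed prefixes each match with it):

# Decompress several archives at once, prefixing matches with the archive name.
find . -name "*.gz" -print0 \
  | xargs -0 -P 4 -I {} sh -c 'gzcat "{}" | rg "whatever" | sed "s|^|{}: |"'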

git grep

This will only work in git repositories, of course. Because it is a git tool, it only inspects files tracked in the repository. To make it more convenient, I’ve tweaked it to produce the same output as ripgrep.

  • --heading prints the filename once before all matches in that file
  • --break puts an empty line before a new file

Here’s an extract of my global .gitconfig:

[alias]
rg = "grep --heading --break -i"
[grep]
lineNumber = true
[color "grep"]
filename = "magenta"
linenumber = "green"
match = "red bold"
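With that alias in place, a search looks like this (the pattern is just an example):

git rg "TODO"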

While looking into the documentation to tweak the output, I noticed some interesting options. I don’t use them often, but they are nice:

  • -p: This will try to display the function that the matches are part of. Not very reliable but gives a bit more context sometimes.
  • -O: This will open the matching files in the specified pager. If you set the pager to your code editor, it will open all matching files in your editor (see the example below).
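For example (vim is just an illustration, any editor or pager works):

git grep -p "parse_config"                         # show the enclosing function of each match
git grep --open-files-in-pager=vim "parse_config"  # open every matching file in vim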

  1. Yes, 10 times 

★ Trust and transparency

In a universe in a state of thermal equilibrium, no event could occur any longer, because there is no gradient left. In a circle in a state of informational equilibrium, there is no information at all. In a group in a state of human equilibrium, of human homogeneity, there is entropy. But entropy is precisely the equilibrium of death. If we accept the generalisations made by others, we must be careful: the perfect adaptation of each to the others within a group actually means the disappearance of that group’s life in favour of its mechanisation. Unity achieved in a political movement means the disappearance of life within a given system. […]

Now, I can say that the orientation towards a unitary conception of the nation under the organising power of the State, like the orientation towards the generalised adaptation of man to his environment, are tendencies that increase entropy and diminish life. In this movement, the political illusion has a very precise role to play: to present a simulacrum, a pretence of information, of living current, to fix passions on false realities while the adaptive mechanisms do their work, and to avoid clashes and refusals at the level of the reality of the new society.

The only way to keep the State within its frame and its function, to give the private life / political life question back some reality, to dispel the political illusion, is to develop and multiply tensions. This is true for the individual as well as for the body politic. I believe that only processes of tension and conflict shape the person. Not only at the highest level but also at the collective level.

L’illusion politique (The Political Illusion), Jacques Ellul.

The State’s slow responsiveness is sometimes necessary to give these tensions time to settle in. Wanting to level things out too early (cache) is a missed opportunity for citizens to take ownership. It means offering an escalator where a few steps and some encouragement would have been enough. Not to mention the cost of setting it up and maintaining it.

Mastodon is a distributed network founded on trust between the people of a given micro-culture, and then, at another scale, between these micro-cultures through the connections between instances. There is the opacity of human relationships in these various relations of trust. Some are made public, others are not. Some have collective governance, others do not. Some are legal in certain countries, not in others. This complexity is specific to each community and cannot be resolved at the scale of a country or an administration. At least not democratically.

When trust is broken, transparency remains. For the past few days I have been digging into the technologies and concepts around Secure ScuttleButt (SSB for those in the know). Unlike Mastodon, it is not built on web technologies (cache), and Robin wrote an excellent introductory article (cache) that I will try to summarise in three key points:

  1. Each device is an identity (more specifically, it holds a private key) that emits an incremental feed of signed messages.
  2. The messages in the feed are exchanged peer to peer; I only store those of my friends and of their friends, to follow the conversations and make the network more resilient.
  3. Storage is encrypted and its representation (the user interface) is left to the implementer’s interpretation. Graphical clients currently exist, but the API is the reference.

Suffice to say this is light years away from a decentralised Twitter :-). Here, each node holds its own data and becomes the centre of its own network, which may or may not be connected to the Internet. Almost incidentally, these technologies give you strong certification of the identity/key being used. I think it is too clever to break through just yet, and a few more iterations will be needed before a collective awareness of these stakes emerges, but the direction is technically exciting.

On the downside, there are of course the performance issues: peer-to-peer inevitably consumes more resources on every node, although storing only the feeds of friends of friends limits the spikes I had experienced with other networks of the same kind. Encryption also inevitably brings key-management problems; for now one key is generated per device, and as far as I know there is no implementation yet for smartphones (smart-but-not-too-smart, otherwise we could do mesh networking without paying a dime to manufacturers and carriers…).

Here we find again the duality of human trust (Mastodon) vs. algorithmic transparency (SSB); watching where personal, political and economic interests position themselves on that slider is fascinating in more ways than one. More than ever, the networks we join say a lot about our political affinities and our conception of the world.

Between the Pachysphère and the Scuttleverse, I cannot decide. On one side, the French-speaking community rediscovering the advantages and drawbacks of being a small committee; on the other, people experimenting with concepts dear to my heart. After all, the two are not antithetical, but my attention is limited :-).

It's OK to Break Things

We all break things and it's OK.

It's OK to break things because you make a mistake. It's OK to break things because you experiment. It's OK to break things because you need to move fast and can't test everything.

Breaking things is not something you should worry about. What's been broken can be fixed. Most of the time.

Breaking things is OK. Leaving what you've broken unfixed is not.

Japanese Kintsugi is the philosophy that treats breakage and repair as part of the history of an object, rather than something to disguise. Kintsugi doesn't consider what's been broken as lost, but as an opportunity to become something else because of all the work that's been done to fix it.

As an ops person, I break lots of things. It's part of the job. I also fix lots of things. That's part of the job too. Breaking (not too often) and fixing are both aspects of getting things done. I've learned much more by fixing things after I broke them than I would have if I had never broken anything.

Most people will try to avoid breaking things at all costs. They won't move. They won't experiment. They'll avoid the risk of breaking things because they don't think about fixing what they break. Being cautious is good; standing still is not. Try, experiment, break and fix. Don't be afraid. It's a mindset to acquire.

Experience: how I Learned to Stop Worrying and Love Fixing what I've Broken.

Some people break things but don't fix them. They leave others to fix what they have broken. They don't care about other people's time, and they don't learn. They'll break the same thing again and again. Fixing things teaches you how not to break them next time.

Breaking things is OK. Leaving what you’ve broken unfixed is a waste.

Photo: Shattered, by Bart.



How we Upgraded a 22TB MySQL Cluster from 5.6 to 5.7 (in 9 months)

Yesterday, the Synthesio Coffee Team finished upgrading a 22TB MySQL cluster from Percona 5.6 to Percona 5.7. We had already upgraded most of our clusters and knew this one would take time, but we didn't expect it to take 9 full months. Here is what we learned about migrating giant database clusters without downtime.

The initial setup

Our database cluster is a classic high availability 3 + 1 node topology running behind HAProxy. It runs on Debian Jessie without systemd, with a 4.9.1 kernel (4.4.36 at the beginning). The servers have a 20-core dual Xeon E5-2660 v3, 256GB of RAM and 36 * 4TB hard drives set up as RAID 10. The throughput is around 100 million writes / day, inserts and updates mixed.

Cluster design

The cluster design itself has nothing special:

  • 2 servers are configured as master / master, but writes are performed on the main master only.
  • Reads are performed on the master and both slaves via HAProxy, which is configured to remove a slave when its replication lags.
  • A spare slave runs offsite with MASTER_DELAY set to 1 hour (configured as shown below) in case little Bobby Tables plays with our servers.
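Setting up the delayed spare is a one-liner; a minimal sketch, assuming replication is already configured on it:

mysql -e "STOP SLAVE; CHANGE MASTER TO MASTER_DELAY = 3600; START SLAVE;"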

Step 1: in the hell of mysql_upgrade

We upgraded the spare host to MySQL 5.7 using Percona Debian packages.

Upgrade of the spare slave

Upgrading from MySQL 5.6 to MySQL 5.7 requires upgrading every table that has TIME, DATETIME or TIMESTAMP columns in order to add support for fractional seconds precision. mysql_upgrade handles that part as well as the system tables upgrade.
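The invocation itself is the standard one (credentials are placeholders):

mysql_upgrade -u root -p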

Upgrading tables with temporal columns means running an ALTER TABLE … FORCE on every table that requires the upgrade. That meant copying 22TB of data into temporary tables, then loading the data back. On spinning disks.

After 5 months, we were 20% done.

We killed mysql_upgrade and wrote a script to run the ALTER on 2 tables in parallel.
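Such a script can boil down to something like this (a minimal sketch, not our actual script; the schema name is a placeholder):

# List the tables of the schema and rebuild 2 of them at a time.
mysql -N -e "SELECT CONCAT(table_schema, '.', table_name) FROM information_schema.tables WHERE table_schema = 'ourdb' AND table_type = 'BASE TABLE';" \
  | xargs -P 2 -I {} mysql -e "ALTER TABLE {} FORCE;"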

2 months later, the upgrade was 50% done and the replication lag was around 9 million seconds. A massive replication lag was not part of the plan, and it introduced a major unexpected delay in the process.

We decided to upgrade our hardware a bit.

We installed a new host with 12 * 3.8TB SSD disks in RAID0 (don't do this at home), rsynced the data from the spare host and resumed the process. 8 days later, the upgrade was over. It took 3 more weeks to catch up with the replication.

Step 2: adding 2 new MySQL 5.7 slaves

Before doing this, make sure your cluster has GTID enabled. GTID saved us a lot of time and headaches as we reconfigured the replication a couple of times. Not sure about friendship, but MASTER_AUTO_POSITION=1 is magic.

We added 2 new slaves running MySQL 5.7. There’s a bug in Percona’s postinst.sh script when installing a fresh server with the MySQL data outside /var/lib/mysql: the database path is hardcoded in the shell script, which causes the install to hang forever. The bug can be bypassed by installing percona-server-server-5.6 first, then installing percona-server-server-5.7.
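In practice, the bypass looks like this:

apt-get install -y percona-server-server-5.6
apt-get install -y percona-server-server-5.7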

Adding 2 new slaves

Once done, we synced the data from the MySQL 5.7 spare host to the new slaves using innobackupex.

On the receiver:

mysql -e "SET GLOBAL expire_logs_days=7;"
nc -l -p 9999 | xbstream -x

On the sender:

innobackupex --stream=xbstream --parallel=8 ./ | nc slave[3,4] 9999

30 hours later:

innobackupex --use-memory=200G --apply-log .

On Slave 3:

CHANGE MASTER TO
MASTER_HOST="master",
MASTER_USER="user",
MASTER_PASSWORD="password",
MASTER_AUTO_POSITION=1;

On slave 4:

CHANGE MASTER TO
MASTER_HOST="slave 3",
MASTER_USER="user",
MASTER_PASSWORD="password",
MASTER_AUTO_POSITION=1;

Step 3: catching up with the replication (again)

Once again, we were late on the replication. I already wrote about fixing a lagging MySQL replication. Please read the pros and cons carefully before you apply the following configuration:

STOP SLAVE;
SET GLOBAL sync_binlog=0;
SET GLOBAL innodb_flush_log_at_trx_commit=2;
SET GLOBAL innodb_flush_log_at_timeout=1800;
SET GLOBAL slave_parallel_workers=40;
START SLAVE;

Catching up with the replication on both hosts took 2 to 3 weeks, mostly because we had many more writes than we usually do.

Step 4: finishing the job

The migration was almost done. There were a few things left to do.

We reconfigured HAProxy to switch the writes to slave 3, which de facto became the new master, and the reads to slave 4. Then we restarted all the writing processes on the platform to kill the remaining connections.

After 5 minutes, slave 3 had caught up with everything from the master, and we stopped the replication. We saved the last transaction ID from the master in case we had to roll back.
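Since the cluster runs with GTID, recording that position can be as simple as this sketch (host name and output file are placeholders):

mysql -h master -e "SHOW MASTER STATUS\G" > last-master-position.txt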

Then, we reconfigured slave 3 replication to make it a slave of slave 4, so we would run in a master / master configuration again.

We upgraded the master to MySQL 5.7, ran innobackupex again and made it a slave of slave 3. The replication took a few days to catch up; then, yesterday, we added the master back into the cluster.

After one week, we trashed slave 1 and slave 2, which were of no use anymore.

Getting things done

Rollback, did someone say "rollback"?

My old Chinese master had a proverb for every migration:

A good migration goes well but a great migration expects a rollback.

If something went wrong, we had a plan B.

When switching, we kept the last transaction run on the master, and the master was still connected to slave 1 and slave 2. In case of a problem, we would have stopped everything, exported the missing transactions (see the sketch below), loaded the data and switched back to the original master + slave 1 + slave 2. Thankfully, this is not something we had to do. Next time, I'll tell you how we migrated a 6TB Galera cluster from MySQL 5.5 to MySQL 5.7. But later.
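One way to export and replay those transactions would have been mysqlbinlog; a rough sketch, with the binlog file and position as placeholders:

mysqlbinlog --start-position=<saved position> mysql-bin.000042 | mysql -h master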

Photo Astrid Westvang.



I Refused my Dream Job because of the Company's Culture

A couple of months ago, I was contacted by a top notch recruiting firm for what I still believe would have been my dream job. I wasn't looking for anything new, but this time things were different.

The job description seemed to have been written for me. A large corporation was looking for a doer with significant startup experience and mindset to take over their infrastructure. They wanted someone who could bring their old school IT and outdated methodologies into the 21st century, worldwide. I would have had a free hand, an almost unlimited budget and a large team to achieve my mission. Needless to say, there were enough challenges to keep me out of my comfort zone for years. And I would have doubled my salary.

I started the usual interview process by meeting the recruiter. We spent 2 hours talking about me, the position and the company. The more I spoke, the more convinced she became that I was the person for the job. The more she explained about the company and the position, the more doubts I had. There was little to no technical challenge, and I was not sure I would fit in a large company, but there were enough opportunities that I decided to go ahead anyway.

Over the following weeks, I met 3 execs from the company. Every meeting was challenging. I met smart people who had mastered the arcana of the corporate world. They all had a different view of the job I was applying for. That's when I saw the first red flags. I was applying for a key position in the company, and there seemed to be a subtle political game around me, with everyone trying to shape the job and my opinion according to their own corporate interests.

I ended the process by meeting the group's number 2. She was a smart woman full of energy, and I was thrilled to meet her. I had spent the past 2 nights on call fixing broken stuff and felt I was running at 10%. It was the worst job interview I can remember, but I learnt everything I needed to know.

As the person hiring me, she told me what the role was about. And she told me about the company culture. She told me how I would have to deal with powerful unions every time I wanted to move something. She told me how it had taken her months of negotiation to set up a 7 AM to 9 PM help desk. She told me how political the company was. She told me that "move fast" was not something I should think about, let alone "break things". She told me the truth about the company's culture, the realpolitik behind the job description.

I left her office and went to the courtyard to smoke a cigarette and think about everything she had told me. I realized I would exhaust myself trying to fit into the company's culture. Heartbroken, I called the recruiter to tell her I was turning down my dream job because it was not my dream company.

Photo Stefanos Papachristou.



★ Micro-cultures and governance

A new system of governance or collaboration that does not follow a competitive hierarchical model will need to employ stigmergy in most of its action based systems. It is neither reasonable nor desirable for individual thought and action to be subjugated to group consensus in matters which do not affect the group, and it is frankly impossible to accomplish complex tasks if every decision must be presented for approval; that is the biggest weakness of the hierarchical model.

Stigmergy (cache)

The recent hype within the French community around Mastodon is fascinating and raises some questions about decentralisation. At first, everybody jumps on the main instances for the sake of simplicity, and that’s normal. Consider it as freemium: you have 15 days (followers?) to evaluate the product and invest some time to choose and/or install the instance that fits you. The next step is to stick to the traditional model and join an instance you trust (like being employed by a company), or to be tech-savvy/confident enough to install and maintain your own instance (like going freelance). And then come cooperative-like instances, the ones that require caring about values and ethics.

Why is that important? Each Mastodon instance creates a micro-culture, and the links between these micro-cultures are made by humans. One line of “code” added, and thousands of new connections become possible. Another one is removed, and a lot of relationships are broken without users even noticing. Having that responsibility should be daunting for each instance administrator. That’s why I decided to co-create a cooperative after years of freelancing: it was both too easy and too difficult to deal with moral questions alone. Efficiency versus exploration. How do we decide collectively which instances we would like to be linked to? How much time can we dedicate to that task?

There are already initiatives to finance and share the responsibility of instances and that’s great. I’m more inclined to host an instance at scopyleft.fr than at larlet.fr for the same reasons. And invite peers to join the governance, discussing and sharing together before doing and working together.

The more I think about all this, the more I realise that federating identities into a unique one is an old model we may want to avoid with real decentralisation. Even with strong self-integrity, you do not (re)act the same way in every group you belong to; multiple identities may allow you to create multiple circles of trust (hello G+), either inter-connected or not. How these relations will evolve at scale, based on governance and self-interest, is still to be observed. Oh, and acted. Wait, tooted.

Note: disconnected instances will probably hurt Slack too, maybe even more than Twitter actually.

✍ Letter to Éric

Éric,

I understand your reservations (cache) about Mastodon; I had the same ones fairly quickly, but I keep digging, asking myself questions about identity federation and user experience. That is the beauty of a nascent network, with everyone feeling their way: propose a complicated solution and discover that a simpler one exists. The community is iterating through a rite of passage that is necessary to reach the limits of decentralisation. It has to be done together, without skipping steps, at the risk of losing some people along the way and creating a network even more elitist than it already is.

I am delighted that the most popular instance cannot handle the load; above all, let’s not optimise it, and let others take over so that we end up with a genuinely decentralised network. I don’t think you got your domain name and your trusted email provider right away; the same goes for your Mastodon handle. It will change over time, some data will be lost, but you will probably have backed up your graph of relationships while waiting for something better (the CSV export is a list of the various handles).

What a chance to be able to run an experiment at this scale, one that involves decentralisation and freedom. There is no strategy, no big players, nothing of the sort, just a joyful mess that brings fun. And that feels good.

David

A Technical Guide on Open Sourcing your Code without Pain

At Synthesio, we have recently started to release part of our Ansible deployment stack on Github. It’s the culmination of a 2-year-long project. We wanted to do it, but didn’t, for many good and bad reasons.

The code was not good enough to be released. That’s the excuse I hear the most from companies that are reluctant to open source their code.

We can’t release that code, it’s crappy and people will think our engineering team sucks.

That’s the wrong way of thinking. Don’t wait for your code to be perfect or you won’t release anything. Push something that might be useful for someone. If people use it, they will contribute and improve your code.

We didn’t have the time to do it. To tell the truth, open sourcing our code was not a priority. We had to deliver fast, fix many things, so doing simple stuff like writing documentation or pushing on Github came second.

The code had Synthesio-specific stuff we couldn’t push. That might be the only good reason to have kept our code closed for so long. We had to make our code less Synthesio-specific by moving things from the core to the configuration. It didn’t take long, and it made our code more readable and reusable as our infrastructure grows. The process is still ongoing and we’ll keep pushing stuff as we clean it up.

If you want to open source part of your code, here’s a way to do it without causing a mess or going crazy.

Split your code source into modules

The first part of the job is splitting your existing code into modules you can release. In this example, we’re releasing our Mesos deployment Ansible role, which used to be in our Ansible stack core.

To do this, we’ll rely on Git submodules. Many people hate submodules, but in our case that’s the best way to go. We get a separate Git repository on our internal Gitlab that we can mirror on Github in the blink of an eye whenever we update it.

First, create 2 new git repositories:

  • One on your internal git infrastructure, which we’ll call ansible-mesos-internal
  • One on Github, which we’ll call infra-ansible-mesos because that’s what we called it.

We want to keep the code on our internal Git infrastructure in case Github shuts down or is unavailable someday.

Now, we can actually split the code into modules. Since we don’t want to lose the git history, we’ll use some git tricks to keep only that code and its revisions.

Clone your local repository into a new one:

git clone ./ansible ./ansible-release
cd ./ansible-release

Make sure the place is clean before you start working:

git checkout master
git reset --hard
for remote in $(git remote); do git remote rm $remote; done
for branch in $(git branch -a | grep -v master); do git branch -D $branch; done

It’s now time to do the real thing:

git filter-branch --tag-name-filter cat --prune-empty --subdirectory-filter roles/mesos -- --all

You’re now left alone with your mesos role. Good. Some cleaning might be needed before we push to our new repositories.

git for-each-ref --format="%(refname)" refs/original/ | xargs -n 1 git update-ref -d
git reflog expire --expire=now --all
git gc --aggressive --prune=now

Remove references to your specific / secret code

Our Mesos role used to contain some Synthesio internal things we don't want to release, like machine hostnames or usernames / passwords. To get rid of them, we had to clean the Git history so people won't find them while browsing Github.

If you've just deleted that file, it's easy:

git filter-branch -f --index-filter 'git update-index --remove defaults/main.yml' <sha1 of introduction>..HEAD

However, we didn't delete the file, we replaced the data, so we need some more git tricks to get rid of the history up to the replacement.

First, you need to find the commit SHA1 you replaced your sensitive data with.

git log defaults/main.yml

Now, create a branch ending at that commit:

git checkout -b secrets <sha1 of the commit>

git checkout master

git filter-branch -f --index-filter 'git update-index --remove defaults/main.yml' <sha1 of introduction>..secrets

You're done! Your file is still alive but all the sensitive history has been deleted.

Pushing and using

Add a LICENSE and a README file; you're now ready to push your code:

git remote add origin ansible-mesos-internal
git push -u origin master
git remote add github infra-ansible-mesos
git push -u github master

Now, make use of the newly separated module in your main project. Since we don't want the old mesos history in the main project either, we'll delete it here as well.

cd ../ansible
git checkout -b feature/split-mesos
git filter-branch -f --index-filter 'git rm -r --cached --ignore-unmatch roles/mesos' <sha1 of introduction>..HEAD
git submodule add ansible-mesos-internal roles/mesos
git add .gitmodules
git commit -m "Splitting mesos from the main project"
git push origin feature/split-mesos

Get your commit reviewed and tell your pals to update (the usual submodule dance, sketched below). TADA! Your company is now a proud open source contributor!
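That dance is nothing more than standard git commands:

git pull
git submodule update --init --recursive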



How to Fix a Lagging MySQL Replication

A few weeks ago, we added a new slave to a 22TB MySQL server. By the time we had transferred the data and run innobackupex --apply-log, the slave was already way behind the master. Things got worse during the weekend as the server performed a RAID check, which slowed down the replication even more. With about 100 million writes a day on that cluster, we started the week with a good 500,000 seconds of lag.

Replication lag is a frequent issue with loaded MySQL clusters. It can become critical when the lag gets too large: missing data when the slaves are used for reading, temporary data loss when losing the master… In our case, it blocked the cluster's migration to GTID until the replication fully caught up.

Many people on the Web have had the same problem, but no one provided a comprehensive answer, so I had to dig into the MySQL documentation and internals to understand how to fix it.

Following the replication catching up

First, the setup:

  • Dual Xeon E5-2660 v3, 20 cores / 40 threads, 256GB RAM
  • 24 7200 RPM hard disks of 4TB each, RAID 10.
  • Percona Server 5.7.17-11-1 on Debian Jessie
  • 100 million writes / day (~1150 queries / second)
  • No reads, because of the lag
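Following the catch-up can be as simple as polling the slave status, along these lines (the interval is arbitrary):

watch -n 60 'mysql -e "SHOW SLAVE STATUS\G" | grep Seconds_Behind_Master'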

Multi-threaded replication

MySQL introduced multi-threaded replication with version 5.6, and MTR has been improved since in MySQL 5.7. It still needs to be used with caution when you're not using GTID, or you might get into trouble.

First, we enabled parallel replication using all available cores on the server:

STOP SLAVE;
SET GLOBAL slave_parallel_workers=40;
START SLAVE;

You don't need to stop / start the slave to change slave_parallel_workers, but according to the documentation MySQL won't use the new value until the next START SLAVE.

Parallel replication was useless at first, as the host has only one database and the default parallel replication type splits the work per database. We switched slave_parallel_type to LOGICAL_CLOCK, and the result was tremendous.

Transactions that are part of the same binary log group commit on a master are applied in parallel on a slave. There are no cross-database constraints, and data does not need to be partitioned into multiple databases.

STOP SLAVE;
SET GLOBAL slave_parallel_type = LOGICAL_CLOCK;
START SLAVE;

Please, flush the logs before leaving

Before we found the LOGICAL_CLOCK trick, we tuned the flushing a bit.

First, we make sure that MySQL never synchronizes the binary log to disk itself. Instead, we let the operating system do it from time to time. Keep in mind that with sync_binlog=0 you can lose the latest binary log events if the server crashes; a non-zero value makes MySQL sync the binary log after that many commit groups.

SET GLOBAL sync_binlog=0;

Now comes the best part.

SET GLOBAL innodb_flush_log_at_trx_commit=2;
SET GLOBAL innodb_flush_log_at_timeout=1800;

For ACID compliance, MySQL writes the contents of the InnoDB log buffer out to the log file at each transaction commit, and the log file is then flushed to disk. Setting innodb_flush_log_at_trx_commit to 2 keeps the write at each commit but performs the flush only about once per second (depending on the system load). This means that, in case of an operating system crash, InnoDB may lose up to one second of committed transactions.

innodb_flush_log_at_trx_commit=2 works in tandem with innodb_flush_log_at_timeout. With this setting, MySQL writes and flushes the log every 1800 seconds at most. This avoids hurting the performance of binary log group commit, but you might lose up to 30 minutes of transactions in the event of a crash.

Conclusions

MySQL's default settings are not meant for a heavy workload. They aim at ensuring correct replication while preserving ACID guarantees. After studying how our database cluster is used, we decided that full ACID compliance was less of a priority, and we caught up with our lagging replication.

Remember: if there's a problem, there's a solution. And if there's no solution, then there's no problem. So:

  • Read the manual. The solution is often hidden there.
  • Read the source when the documentation is not enough.
  • Connect the dots (like innodb_flush_log_at_trx_commit + innodb_flush_log_at_timeout)
  • Make sure you understand what you do
  • Always have Baloo to proofread your article and tell you when you misunderstood parts of the doc and their consequences 😜.

Photo: 白士 李

