Pile-up

Here is a problem that any sysadmin will run into sooner or later if a MySQL server lurks among the machines they administer. It can stem from a configuration issue (a connection pool that is too small), but more often than not the cause is in the application. This article covers troubleshooting it.

Target audience: system administrators.

By Jacques Roussel, Cloud Consultant @Objectif Libre.

MySQL: Too many connections!

The symptoms

The following log line shows up all too often:

ERROR 1040 (HY000): Too many connections

More often than not, this is caused by a process holding a lock on a table, preventing all the other processes from running their queries.
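Before blaming locks, it is worth comparing the configured limit with the current connection count. A quick check from any client session (a sketch; 151 is the usual MySQL/MariaDB default, not a value from this demo):

```sql
-- Configured connection limit (defaults to 151 on MySQL/MariaDB)
SHOW VARIABLES LIKE 'max_connections';
-- Connections currently open
SHOW GLOBAL STATUS LIKE 'Threads_connected';
```

If Threads_connected sits at the limit while most sessions are idle or waiting, a lock is a likely suspect.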

Environment

To simulate the problem, all we need is a MariaDB container.

└─ $ docker run --name mariadb -e MYSQL_ROOT_PASSWORD=root -d mariadb
└─ $ docker ps
CONTAINER ID IMAGE COMMAND CREATED STATUS PORTS NAMES
b291c30bde66 mariadb "docker-entrypoint.s…" 21 seconds ago Up 19 seconds 3306/tcp mariadb

Once the container is up, we can get started.

First terminal

└─ $ docker exec -it mariadb bash
root@b291c30bde66:/# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.3.12-MariaDB-1:10.3.12+maria~bionic mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show processlist;
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
| 1 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 |
| 2 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 3 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
| 8 | root | localhost | NULL | Query | 0 | Init | show processlist | 0.000 |
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
6 rows in set (0.000 sec)

MariaDB [(none)]> create database demo; use demo; create table demo (col1 VARCHAR(100));
Query OK, 1 row affected (0.001 sec)

Database changed
Query OK, 0 rows affected (0.035 sec)

Here we see a typical show processlist with few queries running. The last commands create our demo database, with a table containing a single column.

Second terminal

We start a loop that writes to our table at regular intervals:

└─ $ docker exec -it mariadb bash
root@b291c30bde66:/# while true; do mysql -h 127.0.0.1 -u root -proot -e "use demo; insert into demo (col1) values ('$(date)');" & sleep 2; done

At this point, running use demo; select * from demo in our first terminal shows the rows piling up:

MariaDB [demo]> select * from demo;
+------------------------------+
| col1 |
+------------------------------+
| Wed Feb 13 16:54:03 UTC 2019 |
| Wed Feb 13 16:54:45 UTC 2019 |
| Wed Feb 13 16:54:47 UTC 2019 |
| Wed Feb 13 16:54:49 UTC 2019 |
| Wed Feb 13 16:54:51 UTC 2019 |
| Wed Feb 13 16:54:53 UTC 2019 |
| Wed Feb 13 16:54:55 UTC 2019 |
| Wed Feb 13 16:54:57 UTC 2019 |
| Wed Feb 13 16:54:59 UTC 2019 |
| Wed Feb 13 16:55:01 UTC 2019 |
| Wed Feb 13 16:55:03 UTC 2019 |
| Wed Feb 13 16:55:05 UTC 2019 |
| Wed Feb 13 16:55:07 UTC 2019 |
| Wed Feb 13 16:55:09 UTC 2019 |
| Wed Feb 13 16:55:11 UTC 2019 |
| Wed Feb 13 16:55:13 UTC 2019 |
| Wed Feb 13 16:55:15 UTC 2019 |
| Wed Feb 13 16:55:17 UTC 2019 |
| Wed Feb 13 16:55:19 UTC 2019 |
| Wed Feb 13 16:55:21 UTC 2019 |
| Wed Feb 13 16:55:23 UTC 2019 |
| Wed Feb 13 16:55:25 UTC 2019 |
| Wed Feb 13 16:55:27 UTC 2019 |
| Wed Feb 13 16:55:29 UTC 2019 |
| Wed Feb 13 16:55:31 UTC 2019 |
| Wed Feb 13 16:55:33 UTC 2019 |
| Wed Feb 13 16:55:35 UTC 2019 |
| Wed Feb 13 16:55:37 UTC 2019 |
| Wed Feb 13 16:55:39 UTC 2019 |
| Wed Feb 13 16:55:41 UTC 2019 |
| Wed Feb 13 16:55:43 UTC 2019 |
| Wed Feb 13 16:55:45 UTC 2019 |
| Wed Feb 13 16:55:47 UTC 2019 |
| Wed Feb 13 16:55:49 UTC 2019 |
| Wed Feb 13 16:55:51 UTC 2019 |
| Wed Feb 13 16:55:53 UTC 2019 |
| Wed Feb 13 16:55:55 UTC 2019 |
| Wed Feb 13 16:55:57 UTC 2019 |
| Wed Feb 13 16:55:59 UTC 2019 |
| Wed Feb 13 16:56:01 UTC 2019 |
| Wed Feb 13 16:56:03 UTC 2019 |
| Wed Feb 13 16:56:05 UTC 2019 |
| Wed Feb 13 16:56:07 UTC 2019 |
+------------------------------+
43 rows in set (0.001 sec)

A show processlist still looks normal, since the writes are going through.

Third terminal

Now let's put a lock on the table:

└─ $ docker exec -it mariadb bash
root@b291c30bde66:/# mysql -u root -proot -h 127.0.0.1 -e "use demo; LOCK TABLES demo READ; SELECT sleep(3600)"

From now on, a show processlist in our first terminal starts to show the disaster unfolding:

MariaDB [demo]> show processlist;
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
| 1 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 |
| 2 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 3 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
| 8 | root | localhost | demo | Query | 0 | Init | show processlist | 0.000 |
| 269 | root | 127.0.0.1:47534 | demo | Query | 12 | User sleep | SELECT sleep(3600) | 0.000 |
| 270 | root | 127.0.0.1:47536 | demo | Query | 12 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:38 UTC 2019') | 0.000 |
| 271 | root | 127.0.0.1:47540 | demo | Query | 10 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:40 UTC 2019') | 0.000 |
| 272 | root | 127.0.0.1:47542 | demo | Query | 8 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:42 UTC 2019') | 0.000 |
| 273 | root | 127.0.0.1:47544 | demo | Query | 6 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:44 UTC 2019') | 0.000 |
| 274 | root | 127.0.0.1:47546 | demo | Query | 4 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:46 UTC 2019') | 0.000 |
| 275 | root | 127.0.0.1:47548 | demo | Query | 2 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:48 UTC 2019') | 0.000 |
| 276 | root | 127.0.0.1:47552 | demo | Query | 0 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:50 UTC 2019') | 0.000 |
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
14 rows in set (0.000 sec)

Connections are piling up, and terminal 2 will soon show a Too many connections error.
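Rather than counting rows by eye, the pile-up can be measured directly from information_schema (a sketch, assuming a MariaDB/MySQL server where the PROCESSLIST table is available):

```sql
-- Number of sessions stuck behind the metadata lock
SELECT COUNT(*) AS stuck
FROM information_schema.PROCESSLIST
WHERE STATE = 'Waiting for table metadata lock';
```

Watching this count grow over a few seconds confirms the queue is not draining.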

Troubleshooting

Troubleshooting this kind of problem is fairly simple.

The first thing to know: it is usually the oldest query (in our first terminal) that is causing the problem. So let's look:

MariaDB [demo]> show processlist;
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
| 1 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 |
| 2 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 3 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
| 8 | root | localhost | demo | Query | 0 | Init | show processlist | 0.000 |
| 269 | root | 127.0.0.1:47534 | demo | Query | 196 | User sleep | SELECT sleep(3600) | 0.000 | <----- oldest query (our sleep holding the lock)
| 270 | root | 127.0.0.1:47536 | demo | Query | 195 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:38 UTC 2019') | 0.000 | 
| 271 | root | 127.0.0.1:47540 | demo | Query | 193 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:40 UTC 2019') | 0.000 |
| 272 | root | 127.0.0.1:47542 | demo | Query | 191 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:42 UTC 2019') | 0.000 |
| 273 | root | 127.0.0.1:47544 | demo | Query | 189 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:44 UTC 2019') | 0.000 |
| 274 | root | 127.0.0.1:47546 | demo | Query | 187 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:46 UTC 2019') | 0.000 |
| 275 | root | 127.0.0.1:47548 | demo | Query | 185 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:48 UTC 2019') | 0.000 |
| 276 | root | 127.0.0.1:47552 | demo | Query | 183 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:50 UTC 2019') | 0.000 |
| 277 | root | 127.0.0.1:47554 | demo | Query | 181 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:52 UTC 2019') | 0.000 |
| 278 | root | 127.0.0.1:47556 | demo | Query | 179 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:54 UTC 2019') | 0.000 |
| 279 | root | 127.0.0.1:47560 | demo | Query | 177 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:56 UTC 2019') | 0.000 |
| 280 | root | 127.0.0.1:47562 | demo | Query | 175 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:58 UTC 2019') | 0.000 |
| 281 | root | 127.0.0.1:47566 | demo | Query | 173 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:00 UTC 2019') | 0.000 |
| 282 | root | 127.0.0.1:47568 | demo | Query | 171 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:02 UTC 2019') | 0.000 |
| 283 | root | 127.0.0.1:47596 | demo | Query | 169 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:04 UTC 2019') | 0.000 |
| 284 | root | 127.0.0.1:47600 | demo | Query | 167 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:06 UTC 2019') | 0.000 |
| 285 | root | 127.0.0.1:47604 | demo | Query | 165 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:08 UTC 2019') | 0.000 |
| 286 | root | 127.0.0.1:47608 | demo | Query | 163 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:10 UTC 2019') | 0.000 |
| 287 | root | 127.0.0.1:47610 | demo | Query | 161 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:12 UTC 2019') | 0.000 |
| 288 | root | 127.0.0.1:47612 | demo | Query | 159 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:14 UTC 2019') | 0.000 |
| 289 | root | 127.0.0.1:47616 | demo | Query | 157 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:16 UTC 2019') | 0.000 |

Once we have identified the query that seems to be the culprit, we need to find the process behind it. The Host column gives us an IP address and a port; with those two, we can track down the offending process.
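Instead of scanning the processlist output by eye, the same "oldest first" view can be queried from information_schema (a sketch, assuming MariaDB/MySQL):

```sql
-- Longest-running client queries first; the top entry is usually the blocker
SELECT ID, HOST, TIME, STATE, INFO
FROM information_schema.PROCESSLIST
WHERE COMMAND = 'Query'
ORDER BY TIME DESC
LIMIT 5;
```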

Fourth terminal

We run:

└─ $ docker exec -it mariadb bash
root@b291c30bde66:/# lsof -Pn|grep 47534
mysql 1026 root 3u IPv4 655966 0t0 TCP 127.0.0.1:47534->127.0.0.1:3306 (ESTABLISHED)

We now have the PID of our process. All that's left is to see who launched it:

root@b291c30bde66:/# ps faux|grep -C 5 1026
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1601 0.0 0.0 18508 3492 pts/3 Ss 17:09 0:00 bash
root 1736 0.0 0.0 34544 2948 pts/3 R+ 17:10 0:00 \_ ps faux
root 1737 0.0 0.0 11464 1092 pts/3 S+ 17:10 0:00 \_ grep --color=auto -C 5 1026
root 193 0.0 0.0 18508 3472 pts/2 Ss 16:45 0:00 bash
root 1026 0.0 0.0 40724 8248 pts/2 S+ 17:03 0:00 \_ mysql -u root -px xx -h 127.0.0.1 -e use demo; LOCK TABLES demo READ; SELECT sleep(3600) <-------------------- and there's our culprit
root 184 0.0 0.0 18612 3544 pts/1 Ss 16:45 0:00 bash
root 1027 0.0 0.0 40724 8204 pts/1 S 17:03 0:00 \_ mysql -h 127.0.0.1 -u root -px xx -e use demo; insert into demo (col1) values ('Wed Feb 13 17:03:38 UTC 2019');
root 1030 0.0 0.0 40724 8228 pts/1 S 17:03 0:00 \_ mysql -h 127.0.0.1 -u root -px xx -e use demo; insert into demo (col1) values ('Wed Feb 13 17:03:40 UTC 2019');
root 1033 0.0 0.0 40724 8072 pts/1 S 17:03 0:00 \_ mysql -h 127.0.0.1 -u root -px xx -e use demo; insert into demo (col1) values ('Wed Feb 13 17:03:42 UTC 2019');
root 1036 0.0 0.0 40724 7348 pts/1 S 17:03 0:00 \_ mysql -h 127.0.0.1 -u root -px xx -e use demo; insert into demo (col1) values ('Wed Feb 13 17:03:44 UTC 2019');

After making sure that interrupting the blocking process poses no risk to the data (in our case, we can see that it is not an update, for example), we can kill the process:

root@b291c30bde66:/# kill -9 1026
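When there is no shell access to the host running the client, the connection can also be terminated server-side with KILL, using the Id column from show processlist (269 in our example):

```sql
-- Ends connection 269 (the SELECT sleep(3600) session holding the lock)
KILL 269;
```

When the session dies, the locks it held with LOCK TABLES are released and the queued inserts can proceed.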

Our database quickly returns to normal (terminal 1):

MariaDB [demo]> show processlist;
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                    | Info             | Progress |
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
|  1 | system user |           | NULL | Daemon  | NULL | InnoDB purge coordinator | NULL             |    0.000 |
|  2 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|  3 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|  4 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|  5 | system user |           | NULL | Daemon  | NULL | InnoDB shutdown handler  | NULL             |    0.000 |
|  8 | root        | localhost | demo | Query   |    0 | Init                     | show processlist |    0.000 |
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
6 rows in set (0.000 sec)

Conclusion

This kind of troubleshooting is not without risk for the data: the problematic query is often updating data, and killing it abruptly can leave the database in an inconsistent state. Still, the method shown here will help you quickly pinpoint which processes are behind the incidents on your MySQL server.