Connection pile-up

This is an issue most sysadmins have already met if they are used to dealing with MySQL servers. It can come from a misconfiguration (a connection pool that is too small), but it often stems from a problem in the application. In this article, we will focus on the latter case.

Target audience: system administrators.

By Jacques Roussel, Cloud Consultant @Objectif Libre.

Symptoms

We have all seen this log line too many times:

ERROR 1040 (HY000): Too many connections

Often, it is the result of a process locking a table and preventing other processes from executing their queries.

Environment

In order to simulate the problem, we just need a MariaDB container.

└─ $ docker run --name mariadb -e MYSQL_ROOT_PASSWORD=root -d mariadb
└─ $ docker ps
CONTAINER ID   IMAGE     COMMAND                  CREATED          STATUS          PORTS      NAMES
b291c30bde66   mariadb   "docker-entrypoint.s…"   21 seconds ago   Up 19 seconds   3306/tcp   mariadb

When the container is ready, we can start.

First terminal

└─ $ docker exec -it mariadb bash
root@b291c30bde66:/# mysql -u root -p
Enter password:
Welcome to the MariaDB monitor. Commands end with ; or \g.
Your MariaDB connection id is 8
Server version: 10.3.12-MariaDB-1:10.3.12+maria~bionic mariadb.org binary distribution

Copyright (c) 2000, 2018, Oracle, MariaDB Corporation Ab and others.

Type 'help;' or '\h' for help. Type '\c' to clear the current input statement.

MariaDB [(none)]> show processlist;
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                    | Info             | Progress |
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
|  1 | system user |           | NULL | Daemon  | NULL | InnoDB purge coordinator | NULL             |    0.000 |
|  2 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|  3 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|  4 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|  5 | system user |           | NULL | Daemon  | NULL | InnoDB shutdown handler  | NULL             |    0.000 |
|  8 | root        | localhost | NULL | Query   |    0 | Init                     | show processlist |    0.000 |
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
6 rows in set (0.000 sec)

MariaDB [(none)]> create database demo; use demo; create table demo (col1 VARCHAR(100));
Query OK, 1 row affected (0.001 sec)

Database changed
Query OK, 0 rows affected (0.035 sec)

Here we see a classic show processlist output. The last command creates a database and a table with a single column.

Second terminal

We start a process that regularly writes a row into the table:

└─ $ docker exec -it mariadb bash
root@b291c30bde66:/# while true; do mysql -h 127.0.0.1 -u root -proot -e "use demo; insert into demo (col1) values ('$(date)');" & sleep 2; done

Now, if we run use demo; select * from demo in our first terminal, we can see the number of records growing:

MariaDB [demo]> select * from demo;
+------------------------------+
| col1 |
+------------------------------+
| Wed Feb 13 16:54:03 UTC 2019 |
| Wed Feb 13 16:54:45 UTC 2019 |
| Wed Feb 13 16:54:47 UTC 2019 |
| Wed Feb 13 16:54:49 UTC 2019 |
| Wed Feb 13 16:54:51 UTC 2019 |
| Wed Feb 13 16:54:53 UTC 2019 |
| Wed Feb 13 16:54:55 UTC 2019 |
| Wed Feb 13 16:54:57 UTC 2019 |
| Wed Feb 13 16:54:59 UTC 2019 |
| Wed Feb 13 16:55:01 UTC 2019 |
| Wed Feb 13 16:55:03 UTC 2019 |
| Wed Feb 13 16:55:05 UTC 2019 |
| Wed Feb 13 16:55:07 UTC 2019 |
| Wed Feb 13 16:55:09 UTC 2019 |
| Wed Feb 13 16:55:11 UTC 2019 |
| Wed Feb 13 16:55:13 UTC 2019 |
| Wed Feb 13 16:55:15 UTC 2019 |
| Wed Feb 13 16:55:17 UTC 2019 |
| Wed Feb 13 16:55:19 UTC 2019 |
| Wed Feb 13 16:55:21 UTC 2019 |
| Wed Feb 13 16:55:23 UTC 2019 |
| Wed Feb 13 16:55:25 UTC 2019 |
| Wed Feb 13 16:55:27 UTC 2019 |
| Wed Feb 13 16:55:29 UTC 2019 |
| Wed Feb 13 16:55:31 UTC 2019 |
| Wed Feb 13 16:55:33 UTC 2019 |
| Wed Feb 13 16:55:35 UTC 2019 |
| Wed Feb 13 16:55:37 UTC 2019 |
| Wed Feb 13 16:55:39 UTC 2019 |
| Wed Feb 13 16:55:41 UTC 2019 |
| Wed Feb 13 16:55:43 UTC 2019 |
| Wed Feb 13 16:55:45 UTC 2019 |
| Wed Feb 13 16:55:47 UTC 2019 |
| Wed Feb 13 16:55:49 UTC 2019 |
| Wed Feb 13 16:55:51 UTC 2019 |
| Wed Feb 13 16:55:53 UTC 2019 |
| Wed Feb 13 16:55:55 UTC 2019 |
| Wed Feb 13 16:55:57 UTC 2019 |
| Wed Feb 13 16:55:59 UTC 2019 |
| Wed Feb 13 16:56:01 UTC 2019 |
| Wed Feb 13 16:56:03 UTC 2019 |
| Wed Feb 13 16:56:05 UTC 2019 |
| Wed Feb 13 16:56:07 UTC 2019 |
+------------------------------+
43 rows in set (0.001 sec)

Meanwhile, a show processlist output looks perfectly normal while the writes are going on.
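
If we want to keep an eye on how close we are getting to the connection limit while this runs, we can compare the number of open connections with the configured maximum. Both are standard MariaDB/MySQL variables: the first statement below shows how many connections are currently open, the second the configured limit (output omitted here).

MariaDB [demo]> SHOW STATUS LIKE 'Threads_connected';
MariaDB [demo]> SHOW VARIABLES LIKE 'max_connections';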

Third terminal

We lock a table:

└─ $ docker exec -it mariadb bash
root@b291c30bde66:/# mysql -u root -proot -h 127.0.0.1 -e "use demo; LOCK TABLES demo READ; SELECT sleep(3600)"

Now, if we run show processlist in our first terminal, we can see the disaster building up:

MariaDB [demo]> show processlist;
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
| 1 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 |
| 2 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 3 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
| 8 | root | localhost | demo | Query | 0 | Init | show processlist | 0.000 |
| 269 | root | 127.0.0.1:47534 | demo | Query | 12 | User sleep | SELECT sleep(3600) | 0.000 |
| 270 | root | 127.0.0.1:47536 | demo | Query | 12 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:38 UTC 2019') | 0.000 |
| 271 | root | 127.0.0.1:47540 | demo | Query | 10 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:40 UTC 2019') | 0.000 |
| 272 | root | 127.0.0.1:47542 | demo | Query | 8 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:42 UTC 2019') | 0.000 |
| 273 | root | 127.0.0.1:47544 | demo | Query | 6 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:44 UTC 2019') | 0.000 |
| 274 | root | 127.0.0.1:47546 | demo | Query | 4 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:46 UTC 2019') | 0.000 |
| 275 | root | 127.0.0.1:47548 | demo | Query | 2 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:48 UTC 2019') | 0.000 |
| 276 | root | 127.0.0.1:47552 | demo | Query | 0 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:50 UTC 2019') | 0.000 |
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
14 rows in set (0.000 sec)

The connections are stacking up, and we will soon see Too many connections errors show up in our second terminal.
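
To measure how fast the queue is growing, we can also count the sessions stuck on the metadata lock through the information_schema.PROCESSLIST table (available in both MariaDB and MySQL). A possible query, run from the first terminal:

MariaDB [demo]> SELECT COUNT(*) FROM information_schema.PROCESSLIST
    -> WHERE STATE = 'Waiting for table metadata lock';

With our insert loop firing every two seconds, the count should grow by roughly one at each iteration.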

Troubleshooting

Troubleshooting this kind of issue is quite simple.

The first thing to know is that the oldest query is often the one actually holding the lock on the table. To find it, we run a good old show processlist in our first terminal (a query-based shortcut is shown right after the output):

MariaDB [demo]> show processlist;
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
| Id | User | Host | db | Command | Time | State | Info | Progress |
+-----+-------------+-----------------+------+---------+------+---------------------------------+-----------------------------------------------------------------+----------+
| 1 | system user | | NULL | Daemon | NULL | InnoDB purge coordinator | NULL | 0.000 |
| 2 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 3 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 4 | system user | | NULL | Daemon | NULL | InnoDB purge worker | NULL | 0.000 |
| 5 | system user | | NULL | Daemon | NULL | InnoDB shutdown handler | NULL | 0.000 |
| 8 | root | localhost | demo | Query | 0 | Init | show processlist | 0.000 |
| 269 | root | 127.0.0.1:47534 | demo | Query | 196 | User sleep | SELECT sleep(3600) | 0.000 | <----- the oldest query (our sleep holding the lock)
| 270 | root | 127.0.0.1:47536 | demo | Query | 195 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:38 UTC 2019') | 0.000 | 
| 271 | root | 127.0.0.1:47540 | demo | Query | 193 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:40 UTC 2019') | 0.000 |
| 272 | root | 127.0.0.1:47542 | demo | Query | 191 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:42 UTC 2019') | 0.000 |
| 273 | root | 127.0.0.1:47544 | demo | Query | 189 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:44 UTC 2019') | 0.000 |
| 274 | root | 127.0.0.1:47546 | demo | Query | 187 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:46 UTC 2019') | 0.000 |
| 275 | root | 127.0.0.1:47548 | demo | Query | 185 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:48 UTC 2019') | 0.000 |
| 276 | root | 127.0.0.1:47552 | demo | Query | 183 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:50 UTC 2019') | 0.000 |
| 277 | root | 127.0.0.1:47554 | demo | Query | 181 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:52 UTC 2019') | 0.000 |
| 278 | root | 127.0.0.1:47556 | demo | Query | 179 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:54 UTC 2019') | 0.000 |
| 279 | root | 127.0.0.1:47560 | demo | Query | 177 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:56 UTC 2019') | 0.000 |
| 280 | root | 127.0.0.1:47562 | demo | Query | 175 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:03:58 UTC 2019') | 0.000 |
| 281 | root | 127.0.0.1:47566 | demo | Query | 173 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:00 UTC 2019') | 0.000 |
| 282 | root | 127.0.0.1:47568 | demo | Query | 171 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:02 UTC 2019') | 0.000 |
| 283 | root | 127.0.0.1:47596 | demo | Query | 169 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:04 UTC 2019') | 0.000 |
| 284 | root | 127.0.0.1:47600 | demo | Query | 167 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:06 UTC 2019') | 0.000 |
| 285 | root | 127.0.0.1:47604 | demo | Query | 165 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:08 UTC 2019') | 0.000 |
| 286 | root | 127.0.0.1:47608 | demo | Query | 163 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:10 UTC 2019') | 0.000 |
| 287 | root | 127.0.0.1:47610 | demo | Query | 161 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:12 UTC 2019') | 0.000 |
| 288 | root | 127.0.0.1:47612 | demo | Query | 159 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:14 UTC 2019') | 0.000 |
| 289 | root | 127.0.0.1:47616 | demo | Query | 157 | Waiting for table metadata lock | insert into demo (col1) values ('Wed Feb 13 17:04:16 UTC 2019') | 0.000 |
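
Rather than scanning the output by eye, the oldest queries can also be pulled directly from information_schema.PROCESSLIST. This is only a convenience; sorting the non-daemon threads by age puts the likely culprit at the top:

MariaDB [demo]> SELECT ID, USER, HOST, TIME, STATE, INFO
    -> FROM information_schema.PROCESSLIST
    -> WHERE COMMAND NOT IN ('Daemon', 'Sleep')
    -> ORDER BY TIME DESC LIMIT 5;

In our case the first row returned should be the SELECT sleep(3600) connection (Id 269), the one holding the lock.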

Once the query has been identified, we need to find the process executing it. The Host column gives us an IP address and a port, and with this information we can track down the offending process.

Fourth terminal

We do:

└─ $ docker exec -it mariadb bash
root@b291c30bde66:/# lsof -Pn|grep 47534
mysql 1026 root 3u IPv4 655966 0t0 TCP 127.0.0.1:47534->127.0.0.1:3306 (ESTABLISHED)
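
If lsof is not installed in the image, ss from the iproute2 package (assuming it is available) gives the same information:

root@b291c30bde66:/# ss -tpn | grep 47534

Either way, the goal is to map the client port from the Host column back to a PID.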

We now have the process ID; we just need to find the program behind it:

root@b291c30bde66:/# ps faux|grep -C 5 1026
USER PID %CPU %MEM VSZ RSS TTY STAT START TIME COMMAND
root 1601 0.0 0.0 18508 3492 pts/3 Ss 17:09 0:00 bash
root 1736 0.0 0.0 34544 2948 pts/3 R+ 17:10 0:00 \_ ps faux
root 1737 0.0 0.0 11464 1092 pts/3 S+ 17:10 0:00 \_ grep --color=auto -C 5 1026
root 193 0.0 0.0 18508 3472 pts/2 Ss 16:45 0:00 bash
root 1026 0.0 0.0 40724 8248 pts/2 S+ 17:03 0:00 \_ mysql -u root -px xx -h 127.0.0.1 -e use demo; LOCK TABLES demo READ; SELECT sleep(3600) <-------------------- Here it is
root 184 0.0 0.0 18612 3544 pts/1 Ss 16:45 0:00 bash
root 1027 0.0 0.0 40724 8204 pts/1 S 17:03 0:00 \_ mysql -h 127.0.0.1 -u root -px xx -e use demo; insert into demo (col1) values ('Wed Feb 13 17:03:38 UTC 2019');
root 1030 0.0 0.0 40724 8228 pts/1 S 17:03 0:00 \_ mysql -h 127.0.0.1 -u root -px xx -e use demo; insert into demo (col1) values ('Wed Feb 13 17:03:40 UTC 2019');
root 1033 0.0 0.0 40724 8072 pts/1 S 17:03 0:00 \_ mysql -h 127.0.0.1 -u root -px xx -e use demo; insert into demo (col1) values ('Wed Feb 13 17:03:42 UTC 2019');
root 1036 0.0 0.0 40724 7348 pts/1 S 17:03 0:00 \_ mysql -h 127.0.0.1 -u root -px xx -e use demo; insert into demo (col1) values ('Wed Feb 13 17:03:44 UTC 2019');

Before killing the process, we check that the query does not threaten the integrity of the database. In our example, it is not an update, so we can kill it without taking any risk:

root@b291c30bde66:/# kill -9 1026

Our database recovers quickly (terminal 1):

MariaDB [demo]> show processlist;
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
| Id | User        | Host      | db   | Command | Time | State                    | Info             | Progress |
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
|  1 | system user |           | NULL | Daemon  | NULL | InnoDB purge coordinator | NULL             |    0.000 |
|  2 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|  3 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|  4 | system user |           | NULL | Daemon  | NULL | InnoDB purge worker      | NULL             |    0.000 |
|  5 | system user |           | NULL | Daemon  | NULL | InnoDB shutdown handler  | NULL             |    0.000 |
|  8 | root        | localhost | demo | Query   |    0 | Init                     | show processlist |    0.000 |
+----+-------------+-----------+------+---------+------+--------------------------+------------------+----------+
6 rows in set (0.000 sec)
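
Note that, instead of killing the client process at the system level, we could also have terminated the blocking connection from within MariaDB with the KILL statement, using the Id shown by show processlist (269 in our example):

MariaDB [demo]> KILL 269;

The same caution applies: make sure you understand what the connection is doing before terminating it.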

Conclusion

This kind of troubleshooting is not without risk: the query locking the table may well be updating a huge amount of data, and killing it could leave the database in an inconsistent state. However, this method can help you quickly find which process is locking your MySQL database.