Tuning MariaDB configuration for high throughput data warehousing

I have been suffering under the abysmal performance of MariaDB this week, as I have been trying to run a large extract-transform-load (ETL) process into my newest data warehouse. It’s times like this that I really miss SQL Server. But to be fair, it isn’t that SQL server is amazing and MariaDB is terrible – it is simply that I don’t know how to configure MariaDB for high performance loads. So this week, I rolled up my sleeves and started to dig into the dizzying world of a MariaDB DBA. Below there is a brief description of things to consider, followed by an example configuration file.

Configuration of a database server can be broken into a number of different components:

  1. Memory.

    SQL Server’s default is mildly outrageous – it simply uses as much memory as it can get its hands on. Obviously, this can lead to disastrous results for the OS or any other application on the server. A pragmatic DBA will set the max memory setting to ensure that SQL Server doesn’t crash the whole system

    MariaDB takes the opposite approach: it severely limits the memory in the buffer pool to a puny 128MB. There is loads of great information about configuring a more appropriate level. MariaDB’s documentation is a good place to start:
    https://mariadb.com/kb/en/mariadb/xtradbinnodb-buffer-pool/#innodb_buffer_pool_size
    https://mariadb.com/kb/en/mariadb/xtradbinnodb-server-system-variables/#innodb_buffer_pool_size

    Along with the buffer pool size, it makes sense to configure the number of buffer pool instance to reduce memory contention. Check out the documentation: https://mariadb.com/kb/en/xtradbinnodb-server-system-variables/#innodb_buffer_pool_instances

  2. I/O: disks, database data files and database log files

    Disks: regardless of your database preferences fast, dedicated drives are the way to go. Ideally, there would be separate drives for your database system files, database data files and database log files. Spreading these across separate disks helps spread I/O and reduces the risk of fatal corruption.

    Database data files: as mentioned above, think about putting these on their own drive both for security and to ensure that there is dedicated I/O where it is most needed – on your data and your indexes! Unfortunately, I don’t have this option at the moment – everything is sitting on one big SAN. But I can ensure that MariaDB is using a separate data file for each database table using the –innodb_file_per_table setting.

    Database log files: this is something I need to dig into much more. Does MariaDB maintain a separate log file for each schema, or just one master log file?  For now, I have tweaked the –innodb_log_file_size=1024MB and –innodb_log_files_in_group=8 parameters. Reading the documentation, I think this means I get 8 x 1 GB log files (although there is a warning that the combined size can be no more than 4 GB). So I need to review this configuration and see what acutally gets set.

  3. Schema design

    As well as configuring the database instance, you want your database design to me top-notch. Fast disks and loads of memory aren’t much help is your queries are doing full table scans over huge tables…

Current configuration setting:

# my.cnf

[mysqld]
innodb_buffer_pool_size=20480M
innodb_buffer_pool_instances=8
innodb_log_file_size=1024M
innodb_log_files_in_group=8
join_buffer_size=2048m
Advertisements

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: