preloader


Data transfer from MySQL to Redshift using Airflow

Data transfer from MySQL to Redshift using Airflow Image

 Is your MySQL server nowadays getting slow? Or are you trying to find another database? Whatever the use case, it is a smart decision to transfer data from MySQL to Redshift for analytics. This blog outlines the detailed steps you need to follow to migrate data from MySQL to Redshift. You'll also get a quick summary of MySQL and Amazon Redshift. Let’s get started.

MYSQL:

MySQL is a relational database management system (RDBMS) developed by Oracle based on structured query language (SQL).

MySQL benefits

MySQL is fast, reliable, scalable, and easy to use. It was originally developed to handle large databases quickly and has been used in highly demanding production environments for many years.

Although MySQL is under constant development, it offers a rich and useful set of functions. MySQL’s connectivity, speed, and security make it highly suited for accessing databases on the internet.

Amazon Redshift

Amazon Redshift is based on PostgreSQL.

Redshift is not freely offered, and it is also used along with Amazon S3 storage. But the querying is faster, and the queries are similar to those in PostgreSQL. For any amount of data, querying can be done within minutes, and this characteristic makes users select Redshift.

What is Amazon S3?

Amazon Redshift is based on PostgreSQL.

Redshift is not freely offered, and it is also used along with Amazon S3 storage. But the querying is faster, and the queries are similar to those in PostgreSQL. For any amount of data, querying can be done within minutes, and this characteristic makes users select Redshift.

MySQL to Redshift:

In airflow, we can’t directly move a large amount of data from MySQL to Redshift. Instead,

Use S3 as a mediator for both databases.

Get data from MySQL tables using a select query in airflow 

Load the data as a CSV file into S3.

Then insert that CSV file into the Redshift tables.

SQL to Amazon S3

Use SqlToS3Operator to copy data from a SQL server to an Amazon Simple Storage Service (S3) as a file.

MySQL to Amazon S3

The below sends the response of a MySQL query into a file in S3.

To get more information about this operator visit: SqlToS3Operator

Example usage:

sql_to_s3_task = SqlToS3Operator(

    task_id="sql_to_s3_task",

    sql_conn_id="mysql_default",

    aws_conn_id = ‘aws_default’,

    query=SQL_QUERY,

    s3_bucket=S3_BUCKET,

    s3_key=S3_KEY,

    replace=True,

)

In this example, we are storing data from a specific SQL query into a file in S3.

/* Sample code*/

query: the SQL query to be executed.

replace: whether or not to replace the file in S3 if it previously existed

sql_conn_id: reference to a specific database.

aws_conn_id: reference to a specific S3 connection

s3_bucket: bucket where the data will be stored.

s3_key: desired key for the file. It includes the name of the file.

Amazon S3 to Amazon Redshift

The S3ToRedshiftOperator transfers the data from an Amazon Simple Storage Service (S3) file into an Amazon Redshift table.

Amazon S3 to Amazon Redshift transfer operator

This operator loads data from an Amazon S3 file into an existing Amazon Redshift table.

To get more information about this operator visit: S3ToRedshiftOperator

Example usage:

transfer_s3_to_redshift = S3ToRedshiftOperator(

    task_id='transfer_s3_to_redshift',

    redshift_conn_id=conn_id_name,

    aws_conn_id = ‘aws_default’,

    s3_bucket=bucket_name,

    s3_key=S3_KEY_2,

    schema='PUBLIC',

    table=REDSHIFT_TABLE,

    copy_options=['csv'],

)

In this example,
We Execute a COPY command to load files from s3 to Redshift

/* Sample code*/

redshift_conn_id: reference to a specific redshift database

aws_conn_id: reference to a specific S3 connection

schema: reference to a specific schema in the redshift database

table: reference to a specific table in the redshift database

s3_bucket: reference to a specific S3 bucket

s3_key: reference to a specific S3 key

By using S3 as a mediator, we can transfer data from one database to another very easily.

Share

Top Stories