My experience connecting R with MySQL for using large files

For writing this post I borrowed heavily from:

  • Several Stackoverflow including http://stackoverflow.com/questions/4785933/adding-rmysql-package-to-r-fails
  • Some basic manuals on MySQL available online

R is an impressive tool for analyzing data, but it comes a little bit short when we want to use a large databases. This is because the data is loaded in the RAM , which poses major hardware related constraints. In contrast, I’ve seen Stata work with big databases in a fantastic way for regressions and general statistical analysis. In general, and using not so intricate functions, we should expect a good behavior of R with databases that occupy  maximum 60%-70% of your total RAM. 

Now, there are several packages that allow to operate large databases using different methods that will override the given RAM limitations, these include the package ff (http://ff.r-forge.r-project.org/bit&ff2.1-2_WU_Vienna2010.pdf), and more robust and specialized software, like Revolution Analytics (http://www.revolutionanalytics.com/).

However,  one of the solutions that seems more elegant in certain applications is to use MySQL to manage the large database and take just enough of it through queries that R does to MySQL. This allows not only to use the huge capabilities of a native relational database manager as SQL , but reserves the use of RAM for the data processing tasks in R. The possibilities have been explored in several packages, but we concentrate our attention in RMySQL.

The RMySQL package that generates the connection of R with MySQL on Windows, cannot be easily downloaded from a CRAN repository ( if you’re on Mac or Linux there is no problem ). Now, after searching several forums for a simple solution to the problem , I came to one that fits the computer I work with, whose configuration is as follows :

  • Operating System: Windows 7 64-bit (remember that a 32-bit OS has inherently limiting file sizes so it would not allow you to work with files of more than 4GB)
  • Database Manager: MySQL 5.5 64 – bit
  • R 2.15.1 ” Roasted Marshmallows ” 64-bit

We then proceed to make the required changes to the system (remember that the routes on your system may vary because of the language / Drive letter) :

1.Create paths in the system variables:

What is it? The System Variables are values in the Windows operating system,  to which programs refer in order to execute a certain duty. For example, in our case, adding the following routes tells any program that asks Windows about “Path” the location of certain folders.
How to proceed? Click Start> This PC ( Right click)> Properties > Advanced System Settings > Advanced tab > Environment Variables .
Here we will make sure that the system variable named Path possesses the following routes :
C:\Program Files\MySQL\MySQL Server 5.5\bin , C:\Program Files\R\R- 2.15.1\bin\x64

Otherwise add them up to the ones already in the value.

2. Copy the MySQL libraries to a place known for RMySQL

What is it? A library contains relevant information to which other programs refer to when they are dealing with a task. For example, say you are a program that operates an oven, and a user tells you to bake a cake. Then you would go to the “cake” library and get functions related to the temperature, variables of time to cook and other related information in order to bake the cake.

In this sense, adding these libraries to the path we just added, would simply allow R to get the “cookbook” in which it can find the information needed to connect with MySQL.

How to proceed?

  • Copy libmysql.dll , from C:\Program Files\MySQL\MySQL Server 5.5\lib to C:\Program Files\MySQL\MySQL Server 5.5\bin.
  • Copy libmysql.dll Y libmysql.lib from  C:\Program Files\MySQL\MySQL Server 5.5\lib  to  C:\Program Files\MySQL\MySQL Server 5.5\lib\opt. The latter folder must be created if it’s not there.

3 . Generate file environment R

What is it? We should inform R where we left the libraries.

How to proceed?

We create a text file called ” Renviron.site ” in C:\Program Files\R\R- 2.15.1\etc , containing the following:

MYSQL_HOME = C :/Program Files/MySQL/MySQL Server 5.5

4. Install RMySQL 0.8 and DBI

Finally we install the needed packages in R:

That’s all ! You can now use the package RMySQL .

A simple query from R:


library ( " RMySQL " )
driver = dbdriver ( "MySQL ") # Call the driver of MySQL
with = dbConnect ( driver , host = "localhost", dbname = " MyDatabase " , user = "root", pass = " pass") # Create the connection to the DB in MySQL
query = dbGetQuery ( with statement = "SELECT * FROM WHERE mytable cusno = 2" )

For more information on the package RMySQL and how to use it : http://cran.r-project.org/web/packages/RMySQL/RMySQL.pdf

You may want to take a look at this websites for more information:

Leave a comment

Filed under Uncategorized

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s