Efficient In-Database Analytics through Embedding MySQL into R

Title: Efficient In-Database Analytics through Embedding MySQL into R
Author: Boggaram Gopinath, Chandra Mohan
Advisors: Dr. Steffen Heber, Committee Member
Dr. Nagiza F. Samatova, Committee Chair
Dr. Kemafor Anyanwu, Committee Member
Abstract: High-performance analytics of data at extreme scales is a challenge well recognized by both the scientific and business communities. The goal of this Master’s thesis is to explore effective and efficient ways of performing statistical analysis of data stored in large-scale relational databases (DB). The underlying hypothesis is that in-database analytics offers a plausible solution to this challenge by coupling analytical and database capabilities. Such a coupling may let analytical workflows be executed without moving the data out of the database, and therefore avoid transferring the data over the network. In-database analytics may thus reduce overall latency, ensure better data governance and security, and scale analytical solutions to larger data sets with more efficient resource utilization. In-database analytics can be realized through two complementary approaches: (a) analytics-in-DB places analytical workflows inside a DB server, and (b) DB-in-analytics embeds the DB server into the memory space of analytical routines. The former has been driven primarily by the database community through mechanisms such as user-defined functions, stored procedures, and compiled code. The latter is an emerging approach dominated by open-source, robust, and scalable solutions in their infancy. The focus of this Master’s thesis is on developing an open-source and efficient in-database analytics solution by embedding a MySQL server into the R statistical data analysis environment. To the best of our knowledge, this is the first study that integrates the analytical capabilities of R with a MySQL database management system in an embedded manner. Specifically, three novel ways of realizing embedded DB-in-analytics are proposed and systematically evaluated.
In contrast to existing wrapper-based approaches that provide wrapper APIs to MySQL functions, the proposed embedded solutions improve the time efficiency of R’s access to the MySQL DB by 650% to 1900%.
Date: 2009-02-08
Degree: MS
Discipline: Computer Science
URI: http://www.lib.ncsu.edu/resolver/1840.16/2781


Files in this item

File Size Format
etd.pdf 1.746 MB PDF
