Efficient In-Database Analytics through Embedding MySQL into R
Date
2009-02-08
Abstract
High-performance analytics of data at extreme scales is a challenge well recognized
by both the scientific and business communities. The goal of this Master’s thesis is to explore
effective and efficient ways of performing statistical analysis of data stored in large-scale
relational databases (DBs). The underlying hypothesis is that in-database analytics
offers a plausible solution to this challenge by coupling analytical and database capabilities.
Such a coupling allows analytical workflows to be executed without moving the data out of
the database, and therefore without transferring the data over the network.
As a result, in-database analytics may reduce overall latency, ensure better
data governance and security, and scale analytical solutions to larger data sets with more
efficient resource utilization.
In-database analytics can be realized through two complementary approaches:
(a) analytics-in-DB, which places analytical workflows inside a DB server, and (b) DB-in-analytics,
which embeds the DB server into the memory space of analytical routines. The former has been
driven primarily by the database community through mechanisms such as user-defined
functions, stored procedures, and compiled code. The latter is an emerging approach
whose open-source, robust, and scalable solutions are still in their infancy.
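To make the distinction between the two approaches concrete, the sketch below uses Python's standard-library `sqlite3` module. SQLite stands in for MySQL here only because it is an in-process engine available everywhere; this is not the R/MySQL implementation developed in the thesis. Approach (a) registers a user-defined aggregate (sample variance) that the engine evaluates row by row, so only a single scalar leaves the DB. Approach (b) has the analytical routine pull the rows itself from an engine living in the same process, so no network transfer is involved. The table `obs` and the names `make_db`, `SampleVariance`, `analytics_in_db`, and `db_in_analytics` are hypothetical.

```python
import sqlite3
import statistics


def make_db(values):
    """Create an in-memory SQLite database holding one numeric column.

    Assumption: SQLite is only a stand-in embedded engine; it runs inside
    the calling Python process, just as an embedded DB server would.
    """
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE obs (x REAL)")
    conn.executemany("INSERT INTO obs (x) VALUES (?)", [(v,) for v in values])
    return conn


class SampleVariance:
    """User-defined aggregate evaluated by the DB engine (Welford's update)."""

    def __init__(self):
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0

    def step(self, x):
        # Called by SQLite once per row.
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)

    def finalize(self):
        # Called once at the end; returns the sample variance, or NULL if n < 2.
        return self.m2 / (self.n - 1) if self.n > 1 else None


def analytics_in_db(conn):
    """(a) analytics-in-DB: the computation runs inside the engine as a UDF,
    and only one scalar is returned to the caller."""
    conn.create_aggregate("udf_var", 1, SampleVariance)
    return conn.execute("SELECT udf_var(x) FROM obs").fetchone()[0]


def db_in_analytics(conn):
    """(b) DB-in-analytics: the analytical routine reads the rows directly
    from an engine that lives in its own memory space (no network hop)."""
    xs = [row[0] for row in conn.execute("SELECT x FROM obs")]
    return statistics.variance(xs)


if __name__ == "__main__":
    conn = make_db([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])
    # Both values are close to 32/7 (about 4.571).
    print(analytics_in_db(conn), db_in_analytics(conn))
```

Both paths produce the same statistic. What differs is where the computation runs: inside the engine in (a), inside the analytical routine in (b). In a client-server deployment that difference determines how much data must cross the network.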
The focus of this Master’s thesis is on developing an open-source and efficient in-database
analytics solution by embedding a MySQL server into the R environment for statistical data analysis.
To the best of our knowledge, this is the first study that integrates the analytical
capabilities of R with a MySQL database management system in an embedded manner.
Specifically, three novel ways of realizing embedded DB-in-analytics are proposed and systematically
evaluated. Compared with existing wrapper-based approaches, which expose MySQL functions
through wrapper APIs, the proposed embedded solutions improve the time efficiency of
R’s access to the MySQL DB by 650% to 1900%.
Keywords
MySQL, R, BridgeR, In-database Analytics technology
Degree
MS
Discipline
Computer Science