Efficient In-Database Analytics through Embedding MySQL into R

Date

2009-02-08

Abstract

High-performance analytics of data at extreme scales is a challenge well recognized by both the scientific and business communities. The goal of this Master’s thesis is to explore effective and efficient ways of performing statistical analysis of data stored in large-scale relational databases (DB). The underlying hypothesis is that in-database analytics offers a plausible solution to this challenge by coupling analytical and database capabilities. Such a coupling may let analytical workflows be executed without moving the data out of the database, and therefore avoid transferring the data over the network. As a result, in-database analytics may reduce overall latency, assure better data governance and security, and scale analytical solutions to larger data sets with more efficient resource utilization. In-database analytics can be realized through two complementary approaches: (a) analytics-in-DB places analytical workflows inside a DB server, and (b) DB-in-analytics embeds the DB server into the memory space of analytical routines. The former has been driven primarily by the database community through mechanisms such as user-defined functions, stored procedures, and compiled code. The latter is an emerging approach dominated by open source, robust, and scalable solutions that are still in their infancy. The focus of this Master’s thesis is on developing an open source and efficient in-database analytics solution by embedding a MySQL server into the R statistical data analysis environment. To the best of our knowledge, this is the first study that integrates the analytical capabilities of R with a MySQL database management system in an embedded manner. Specifically, three novel ways for embedded DB-in-analytics are proposed and systematically evaluated.
In contrast to existing wrapper-based approaches that provide wrapper APIs to MySQL functions, the proposed embedded solutions improve the time efficiency of R’s access to the MySQL DB by 650% to 1900%.
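
The DB-in-analytics idea described in the abstract, where the database engine lives in the memory space of the analytical process, can be illustrated with a minimal sketch. This is an analogy only: it assumes Python with its standard-library SQLite engine as a stand-in, and it does not reproduce the thesis's MySQL-in-R solutions or their measurements. The table name `measurements` and its sample rows are hypothetical.

```python
import sqlite3
import statistics

# Assumption: SQLite stands in for the embedded database engine; the thesis
# itself embeds a MySQL server into R, which this sketch does not reproduce.
conn = sqlite3.connect(":memory:")  # the engine runs inside this process
conn.execute("CREATE TABLE measurements (grp TEXT, value REAL)")  # hypothetical table
rows = [("a", 1.0), ("a", 3.0), ("b", 10.0), ("b", 20.0), ("b", 30.0)]
conn.executemany("INSERT INTO measurements VALUES (?, ?)", rows)

# Aggregation pushed into the engine: only one row per group comes back.
in_db_means = dict(
    conn.execute("SELECT grp, AVG(value) FROM measurements GROUP BY grp")
)

# Contrast: fetch every row, then aggregate in the analytical environment.
fetched = {}
for grp, value in conn.execute("SELECT grp, value FROM measurements"):
    fetched.setdefault(grp, []).append(value)
client_means = {grp: statistics.fmean(vals) for grp, vals in fetched.items()}

conn.close()
```

Both paths compute the same per-group means; the difference is how many rows leave the engine, which is the cost that in-database analytics aims to reduce.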

Keywords

MySQL, R, BridgeR, In-database Analytics technology

Degree

MS

Discipline

Computer Science
