Optimization and Horizontal Aggregation in SQL for Streamlined Data Mining

Main Article Content

V. Anitha
Neeraj Sharma
Dr Vijay Kumar Burugari
Mrs Prathibha Ganapuram

Abstract

Compiling a dataset for analysis represents a pivotal yet frequently time-consuming stage in data mining endeavors. This involves executing intricate SQL queries that include table joins and column aggregations. However, conventional SQL aggregations have limitations, typically generating one column per aggregated group. Consequently, manually constructing datasets with the required horizontal layout becomes a significant endeavor. Addressing this challenge, we present a collection of straightforward and effective methods designed to automate the generation of SQL code. These methods streamline the return of aggregated columns in a horizontal tabular format, presenting a set of numbers rather than a singular number per row. Referred to as "horizontal aggregations,” these functions shape datasets in a de-normalized, horizontal layout, such as point-dimension, observation variable, or instance-feature—aligning with the preferred standard format of most data mining algorithms. Our proposed evaluation methods for horizontal aggregations encompass CASE, leveraging the programming CASE construct; SPJ, grounded in standard relational algebra operators (SPJ queries); and PIVOT, making use of the PIVOT operator found in specific database management systems.

Article Details

Section
Articles