Authors:
(1) Limeng Zhang, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia;
(2) M. Ali Babar, Centre for Research on Engineering Software Technologies (CREST), The University of Adelaide, Australia.
1.1 Configuration Parameter Tuning Challenges and 1.2 Contributions
3 Overview of Tuning Framework
4 Workload Characterization and 4.1 Query-level Characterization
4.2 Runtime-based Characterization
5 Feature Pruning and 5.1 Workload-level Pruning
5.2 Configuration-level Pruning
7 Configuration Recommendation and 7.1 Bayesian Optimization
10 Discussion and Conclusion, and References
In addition to characterizing a workload by its queries, a workload can also be characterized by its runtime behavior. Modern DBMSs expose extensive information about how a workload executes [8]. For example, MySQL's InnoDB engine [24] reports statistics such as the number of pages read and written, query cache utilization, and locking overhead. OtterTune [8] characterizes a workload with such numeric metrics, which together reflect different aspects of its runtime behavior. Researchers can also define new performance indicators tailored to specific requirements. For instance, RelM [19] profiles the memory allocation of a BDAF application under different configuration parameter settings. It pairs these profiles with customized performance metrics that capture memory-management decisions at multiple levels: the resource-management level, the container level, the application level, and inside the Java Virtual Machine (JVM).
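As a rough illustration of metric-based characterization, the sketch below turns two snapshots of cumulative server counters into a runtime feature vector. The snapshots are taken at the start and end of an observation window, and the code converts them into per-second rates and adds one hand-defined indicator. The counter names are real MySQL status variables, but they were picked here only for illustration. The selection, the function name `runtime_features`, and the use of a buffer-pool hit ratio as a custom indicator are assumptions, not OtterTune's or RelM's actual metric set. In practice the snapshots could come from `SHOW GLOBAL STATUS`; here they are plain dictionaries with hypothetical values.

```python
from typing import Dict, List

# Illustrative choice of MySQL cumulative status counters (an assumption,
# not the metric set used by any particular tuning system).
COUNTERS: List[str] = [
    "Innodb_pages_read",
    "Innodb_pages_written",
    "Innodb_row_lock_waits",
    "Innodb_buffer_pool_read_requests",
    "Innodb_buffer_pool_reads",
]


def runtime_features(before: Dict[str, float], after: Dict[str, float],
                     seconds: float) -> Dict[str, float]:
    """Turn two snapshots of cumulative counters into runtime features."""
    if seconds <= 0:
        raise ValueError("observation window must be positive")
    # Cumulative counters only grow, so the workload's activity during the
    # window is the delta; dividing by the window length gives a rate that
    # is comparable across windows of different duration.
    features = {name + "_per_sec": (after[name] - before[name]) / seconds
                for name in COUNTERS}
    # Hypothetical custom indicator: share of buffer-pool page requests that
    # were served from memory rather than read from disk.
    requests = (after["Innodb_buffer_pool_read_requests"]
                - before["Innodb_buffer_pool_read_requests"])
    misses = after["Innodb_buffer_pool_reads"] - before["Innodb_buffer_pool_reads"]
    features["buffer_pool_hit_ratio"] = (
        1.0 - misses / requests if requests > 0 else 1.0)
    return features


# Usage with hypothetical snapshot values over a 60-second window.
before = {"Innodb_pages_read": 1000, "Innodb_pages_written": 500,
          "Innodb_row_lock_waits": 10,
          "Innodb_buffer_pool_read_requests": 20000,
          "Innodb_buffer_pool_reads": 400}
after = {"Innodb_pages_read": 4000, "Innodb_pages_written": 1100,
         "Innodb_row_lock_waits": 30,
         "Innodb_buffer_pool_read_requests": 80000,
         "Innodb_buffer_pool_reads": 1600}

if __name__ == "__main__":
    for name, value in runtime_features(before, after, 60.0).items():
        print(f"{name}: {value:.3f}")
```

Normalizing deltas to rates is one simple design choice. It keeps features from different observation windows on the same scale before they are compared or fed to a model.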
Beyond query-level and runtime features, other factors associated with workload execution can also be considered, such as features of the execution environment and of the input data. When generating configurations for Spark applications, LITE [21] combines code-semantics features and scheduler features with input-data features (e.g., the number of columns, rows, iterations, and partitions). It also incorporates cluster-environment factors related to nodes, memory, CPU frequency, and network bandwidth.
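To make this combination concrete, the following sketch concatenates four feature groups into a single fixed-order vector that a learned model could consume. The group names, field names, and units are illustrative assumptions. The sketch also assumes that code semantics and scheduler behavior arrive as precomputed numeric vectors. This is not LITE's actual encoding.

```python
from dataclasses import dataclass, astuple
from typing import List, Sequence


@dataclass
class DataFeatures:
    # Hypothetical input-data descriptors.
    columns: int
    rows: int
    iterations: int
    partitions: int


@dataclass
class ClusterFeatures:
    # Hypothetical cluster-environment descriptors (units are assumptions).
    nodes: int
    memory_gb: float
    cpu_ghz: float
    bandwidth_gbps: float


def assemble_features(code_embedding: Sequence[float],
                      scheduler_features: Sequence[float],
                      data: DataFeatures,
                      cluster: ClusterFeatures) -> List[float]:
    """Concatenate heterogeneous feature groups into one model input vector.

    The order of groups and fields is fixed, so each position in the
    returned list always refers to the same feature.
    """
    return ([float(x) for x in code_embedding]
            + [float(x) for x in scheduler_features]
            + [float(x) for x in astuple(data)]
            + [float(x) for x in astuple(cluster)])


if __name__ == "__main__":
    # Hypothetical values for a single Spark application run.
    vec = assemble_features(
        code_embedding=[0.12, -0.40, 0.33],
        scheduler_features=[2.0, 16.0],
        data=DataFeatures(columns=20, rows=1_000_000, iterations=10, partitions=200),
        cluster=ClusterFeatures(nodes=4, memory_gb=64.0, cpu_ghz=2.4, bandwidth_gbps=10.0),
    )
    print(len(vec), vec)
```

Because the groups differ widely in magnitude (row counts versus CPU frequency, for example), such a vector would usually be normalized before it is used to train a model.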