How do you test a relational database's schema?

12 June 2014

Updated: 23 June 2017

flickr photo by danielmsteinberg shared under a Creative Commons (BY-NC-ND) license

Database applications are very common and there has been much attention to testing them and the individual database management systems with which they interact. Yet, there has been very little work devoted to testing arguably the most important artifact involving an application supported by a relational database — the underlying schema!

The development of a database schema is a process open to flaws like any stage of application development. Examples of potential flaws in database schemas include incomplete primary keys, incorrect foreign keys, and omissions of NOT NULL declarations. The schema's cornerstone nature to a database application means that defects need to be found early in order to prevent knock-on effects to other parts of an application and the spiralling bug-fixing costs that may be incurred.

In this important area of testing for relational database schemas, there are many challenges that researchers and developers need to address. Some recent advances in this field have focused on automatically generating test data to exercise the constraints in the database schema and assessing the effectiveness of the generated data through the use of mutation analysis.

The paper (Kapfhammer, McMinn, & Wright, 2013) introduces a search-based technique for generating database table data with the intention of exercising the integrity constraints placed on table columns. The testing tool described in this paper, SchemaAnalyst, can automatically generate data that will both satisfy and negate the constraints in the database schema.

Of course, the process of automatically generating test data raises the question "well, how good is this data?" The paper (Wright, Kapfhammer, & McMinn, 2013) introduces efficient techniques for answering this question. The presented mutation analysis methods insert simple faults into the database's schema and check to see whether or not the test cases can find them. Intuitively, the tests are not very good if they can not find these faults!

In recent empirical studies designed to evaluate the effectiveness of SchemaAnalyst we compared it to a popular tool for generating table data, DBMonster. With competitive or faster data generation times, our method outperforms DBMonster in terms of both the schema constraint coverage and mutation adequacy scores.

Interested in learning more about this topic? Since this blog post was first written, my colleagues and students and I have published several additional papers about the testing of relational database schemas, with the most noteworthy one being (McMinn, Wright, & Kapfhammer, 2015). If you are interested in using SchemaAnalyst to test your own database schema, then please download and use the tool, which is now available from the GitHub site schemaanalyst/schemaanalyst.

Enjoy this post? If so, please read, SEED Interview with Rebecca Dilla, my most recent article.



Like my work? Support it!