High quality open source applications have academic articles published about their development, features, and usage. The following excerpt is from one discussing database reverse engineering using OSCAR as the case. A very technical analysis, it shows a level of transparency that is almost never found in COTS (commercial off the shelf) software. The following summarizes the history, technical architecture and DB Schema of OSCAR. The rest of the paper goes into details about their reverse engineering of OSCARs evolution.
Understanding database schema evolution: A case study Anthony Cleve et all – University of Namur, Belgium, University of Victoria, Canada
2. Context: The OSCAR system OSCAR (Open Source Clinical Application Resource) is full-featured Electronic Medical Record (EMR) software system for primary care clinics. It has been under development since 2001 and is widely used in hundreds of clinics across Canada. As an open source project, OSCAR has a broad and active community of users and developers. The Department of Family Practice at McMaster University, which has managed OSCAR development efforts from inception to 2012, has recently transferred oversight of ongoing development to a newly formed not-for-profit company called OSCAR-EMR. This move was motivated by a new regulatory requirement to undergo ISO certification (ISO 13485 Medical devices – Quality management systems).
OSCAR architecture. OSCAR has a Web application architecture following the classical 3-tier paradigm. It employs a Javabased technology stack, making use of Java Server Pages (JSP), Enterprise Java Beans (J2EE) and several frameworks such as Spring, Struts and Hibernate. The source code comprises approximately two million lines of code with a rough distribution of 600 kLOC for the application logic, 1200 kLOC for the presentation layer and 100 kLOC for the persistence layer. OSCAR uses MySQL as the relational database engine and a combination of different ways to access it, including Hibernate object relational middleware, Java Persistence Architecture (JPA) and dynamic SQL (via JDBC). The reason for this combination of technologies is the constant and ongoing evolution history of the product, which originated from JDBC, via Hibernate to JPA.
Oscar database. The OSCAR database schema has over 440 tables and many thousands of attributes. At the time of conducting our study, the database schema of the OSCAR distribution did not contain any information on relationships between tables (foreign keys) and no documentation was available about the schema. We later learned that the missing relationships were due to the evolution history of OSCAR, which has been using the older MyISAM database engine provided by MySQL that does not support foreign keys. A port to the newer InnoDB engine is underway, which will eventually allow foreign keys to be defined explicitly.
OSCAR software repositories. The OSCAR community utilizes a range of software repositories and tools, including a feature request and bug tracking system (provided by Sourceforge), a source code submission and review system (Gerrit Code Review), a git-based configuration management system, a community Wiki (based on Plone) and three active mailing lists (one for developers and two for users of different levels of technical expertise).
The need to understand the database schema. The OSCAR database has grown organically over many years and knowledge about its internal structure is distributed among pockets of developers who have been contributing to specific functions of the system (e.g., prescription writer, representation of lab results etc.). Our need to understand the OSCAR database schema originated from our involvement in a project with the goal to develop software for a primary care research network (PCRN). The purpose of the PCRN is to integrate health information kept in primary care EMR software in order to make them accessible to medical research and data mining. An important step in developing the PCRN software is to create “export conduits” for transferring health data from the EMR into a research database for subsequent query processing. Due to its popularity (second largest market share in British Columbia) and openness, OSCAR has been chosen as one of the first EMR products to interface with the emerging PCRN.
While designing early versions of the PCRN data migration adapter for OSCAR, we found that we were running into questions pertaining to the database schema. Of course, as could be expected for any heavily evolved, real-world system, some of them had to do with the fact that the database schema lacked documentation. Moreover, the schema did not contain any declared relationships (foreign keys). Other questions were of a more semantic and puzzling nature. For example, when attempting to design the function to export data on patient immunization records, we found two seemingly unrelated schema structures covering the same semantic issue. One schema structure revolved around tables entitled “immunizations” and “configimmunization” while the other schema structure revolved around tables entitled “preventions” and “preventionsext”. During our project we found that taking into consideration the evolution history of the database schema was helpful in answering questions like these. (We found out that the “prevention” structured superseded the “immunization” structure but has still been retained in order to deal with legacy data.) This motivated us to investigate more formally OSCAR’s evolution history and develop methods and tools to help with this investigation.