In many scientific investigations and engineering maintenance efforts, projects experience architectural erosion and information drift, which makes maintenance tasks challenging. This effort investigates ways to recover the structure of a project’s information archives by balancing the cohesion of the current architecture against the architecture imposed by the underlying information space.
One example of architectural drift is what software engineers face over the long development and maintenance lifetime of a code base. Over time, software project architectures diverge from their original design and stop matching their written documentation because of the combined effect of architectural erosion and drift. Architectural recovery techniques can either prevent erosion and drift, by recommending restructuring opportunities to developers, or treat these conditions, by recovering the architectures of software projects whose architectural decomposition has been lost. We focus on designing algorithms that seamlessly combine program structure with the natural-language context of the code. Our algorithms rely on a suite of coordinated clustering, classification, and natural language processing techniques that efficiently extract lexical and structural information from the code base to produce a coherent decomposition of a code repository.
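To make the idea of balancing structural and lexical information concrete, the following is a minimal sketch under stated assumptions, not the algorithms described above. It assumes each file is represented by a list of identifier tokens and that dependencies are given as an undirected edge list; the weight `alpha`, the Jaccard-based structural similarity, the cosine-based lexical similarity, the average-linkage clustering, and all function names are hypothetical choices made only for illustration.

```python
import math
from collections import Counter
from itertools import combinations


def lexical_similarity(a, b):
    """Cosine similarity between the bag-of-words term counts of two token lists."""
    ca, cb = Counter(a), Counter(b)
    dot = sum(ca[t] * cb[t] for t in ca)
    norm = math.sqrt(sum(v * v for v in ca.values())) * math.sqrt(sum(v * v for v in cb.values()))
    return dot / norm if norm else 0.0


def neighbors(edges):
    """Build undirected dependency neighborhoods from an edge list."""
    nb = {}
    for u, v in edges:
        nb.setdefault(u, set()).add(v)
        nb.setdefault(v, set()).add(u)
    return nb


def structural_similarity(x, y, nb):
    """Jaccard overlap of two files' dependency neighborhoods (each file counts as its own neighbor)."""
    nx = nb.get(x, set()) | {x}
    ny = nb.get(y, set()) | {y}
    return len(nx & ny) / len(nx | ny)


def combined_similarity(files, edges, alpha=0.5):
    """Hypothetical blend: alpha weights structure, (1 - alpha) weights vocabulary."""
    nb = neighbors(edges)
    sim = {}
    for x, y in combinations(sorted(files), 2):
        s = structural_similarity(x, y, nb)
        lex = lexical_similarity(files[x], files[y])
        sim[(x, y)] = alpha * s + (1 - alpha) * lex
    return sim


def cluster(files, sim, k):
    """Average-linkage agglomerative clustering until k clusters remain."""
    clusters = [frozenset([f]) for f in sorted(files)]

    def pair_sim(a, b):
        # Mean pairwise similarity; keys in `sim` are alphabetically sorted pairs.
        return sum(sim[tuple(sorted((x, y)))] for x in a for y in b) / (len(a) * len(b))

    while len(clusters) > k:
        i, j = max(combinations(range(len(clusters)), 2),
                   key=lambda p: pair_sim(clusters[p[0]], clusters[p[1]]))
        merged = clusters[i] | clusters[j]
        clusters = [c for idx, c in enumerate(clusters) if idx not in (i, j)] + [merged]
    return clusters


# Toy repository: identifier tokens per file and import-style dependency edges.
files = {
    "parser.py": ["parse", "token", "ast"],
    "lexer.py": ["token", "lex", "scan"],
    "db.py": ["query", "sql", "connection"],
    "orm.py": ["query", "model", "sql"],
}
edges = [("parser.py", "lexer.py"), ("orm.py", "db.py")]

sim = combined_similarity(files, edges, alpha=0.5)
clusters = cluster(files, sim, k=2)
print(sorted(sorted(c) for c in clusters))  # [['db.py', 'orm.py'], ['lexer.py', 'parser.py']]
```

Setting `alpha` closer to 1 favors the existing dependency structure, while values closer to 0 favor the vocabulary shared across files; a real recovery technique would likely tune or learn this balance rather than fix it by hand.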