GENESIS: Documentation
Publication Review
Current reviews of modeling papers are based entirely on reading text. In contrast,
the review process we propose starts with a first phase of automated
model validation. This validation process is exercised by the authors
themselves and gives them feedback on possible concerns regarding
their model. Authors can then either correct the model, or specifically
provide arguments as to why it is worthy of publication anyway.
In effect, this system will iteratively develop a set of accepted
basic standards for model publication, while providing an opportunity
to argue for the extension or replacement of those standards. Once this
level of automated review is passed, the model is made available
to reviewers, who can evaluate the author's arguments with respect to
core features of the model, and then consider the additional capabilities
of the model as demonstrated by the figures and their annotation. It is
important to note that the modular nature of the core infrastructure
for this project will allow us to take advantage of other approaches to
model evaluation [3]. The proposed process is illustrated in the following
figure:
- Verify model: This step in the Publication-Workflow ensures that
the lineage of the model employed for simulations starts from a
previously published source model. Model verification is performed on
the morphology, cable discretization, equations for different membrane
and synaptic channel types and their kinetics, model parameter values,
and dynamic response to specific current injection and voltage clamp
protocols.
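A minimal sketch of what such an automated verification pass might look like; the model representation and the three checks are hypothetical stand-ins for the Model-Container's actual structures:

```python
# Illustrative verification pass over a toy model representation.
# The dictionary layout and check names are assumptions, not the
# actual Model-Container API.

def verify_model(model):
    """Run structural checks; return a list of (check, passed) results."""
    checks = {
        # every compartment must have positive length and diameter
        "morphology": all(c["length"] > 0 and c["diameter"] > 0
                          for c in model["compartments"]),
        # every channel must reference a known kinetic scheme
        "channel_kinetics": all(ch["kinetics"] in model["kinetic_schemes"]
                                for ch in model["channels"]),
        # all parameters must carry explicit values
        "parameters": all(v is not None for v in model["parameters"].values()),
    }
    return [(name, passed) for name, passed in checks.items()]

model = {
    "compartments": [{"length": 20e-6, "diameter": 2e-6}],
    "channels": [{"kinetics": "HH_Na"}],
    "kinetic_schemes": {"HH_Na": {}},
    "parameters": {"gmax_Na": 1200.0},
}
print(verify_model(model))
```

In a full system each failed check would be reported back to the author, as described above, rather than simply printed.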
- Validate model: In the absence of a pre-existing model, model
validation is completed by the generation of tables and figures reporting
model verification. These data are acquired from the Model-Container.
If the model is based on a pre-existing published model, validation
is completed by a comparison of the two models. Any differences
between the two models are identified, characterized, quantified, and
reported. This step can be performed by a user prior to submission to
a publication database.
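The comparison step above can be sketched as a parameter-level diff; the parameter names and flat model encoding are illustrative, not the actual repository format:

```python
# Illustrative validation-by-comparison: differences between a submitted
# model and its published ancestor are identified and quantified.

def compare_models(published, submitted):
    """Report added, removed, and changed parameters between two models."""
    report = {"added": [], "removed": [], "changed": []}
    for name in submitted:
        if name not in published:
            report["added"].append(name)
        elif submitted[name] != published[name]:
            report["changed"].append((name, published[name], submitted[name]))
    report["removed"] = [n for n in published if n not in submitted]
    return report

published = {"gmax_Na": 1200.0, "gmax_K": 360.0}
submitted = {"gmax_Na": 1500.0, "gmax_K": 360.0, "gmax_CaT": 5.0}
print(compare_models(published, submitted))
```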
- Parameter Space: Parameter space analysis is important,
complex, and still very much an area of open research, especially
for complex models [2, 5]. However, there are mechanisms available
to examine the sensitivity of core modeling results to parameter
variations [1]. We anticipate that a model publication system is likely
to accelerate the development of this aspect of model analysis and
validation.
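As a minimal illustration of the kind of sensitivity analysis cited in [1], the following one-at-a-time perturbation sketch uses a toy surrogate in place of a real simulated output:

```python
# One-at-a-time parameter sensitivity: perturb each parameter by a small
# fraction and record the normalized change in a scalar model output.
# firing_rate() is a toy surrogate, not a real neuron model.

def firing_rate(params):
    return params["g_na"] / (1.0 + params["g_k"])

def sensitivity(params, output, delta=0.01):
    """Return the normalized sensitivity of `output` to each parameter."""
    base = output(params)
    result = {}
    for name, value in params.items():
        perturbed = dict(params, **{name: value * (1 + delta)})
        result[name] = (output(perturbed) - base) / (base * delta)
    return result

print(sensitivity({"g_na": 120.0, "g_k": 36.0}, firing_rate))
```

For this surrogate the output scales linearly with `g_na`, so its normalized sensitivity is 1, while increasing `g_k` decreases the output.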
Implementation Details
Space does not allow a full description of the implementation details.
However, there are a few key components to the system that are now briefly
described:
- Database Format: Each database supports
system-specific, project-specific, and user-specific views. The system
view abstracts system-specific configurations such as the filesystem
layout. The project view is a container for project specific data such as
models and source code. The user view gives the user the flexibility to
run simulations without initiating a new research project. The existing
GENESIS repositories implement these views based on system process
environment settings.
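A minimal sketch of how the three views might be resolved from process environment settings; the variable names (`GENESIS_SYSTEM_DB`, `GENESIS_PROJECT_DB`) are assumptions, not the actual GENESIS configuration:

```python
# Hypothetical resolution of system/project/user views from environment
# settings, in the spirit of the existing GENESIS repositories.
import os

def resolve_views(env):
    """Map environment settings to the three database view roots."""
    home = env.get("HOME", "/tmp")
    return {
        # system view: machine-wide configuration and filesystem layout
        "system": env.get("GENESIS_SYSTEM_DB", "/usr/local/genesis/db"),
        # project view: container for models and source code
        "project": env.get("GENESIS_PROJECT_DB",
                           os.path.join(home, "projects", "default")),
        # user view: scratch space for simulations outside any project
        "user": os.path.join(home, ".genesis"),
    }

views = resolve_views({"HOME": "/home/alice",
                       "GENESIS_PROJECT_DB": "/data/purkinje-2024"})
print(views)
```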
The database structure has three major features: (i) it complements
the steps of the User Workflow, (ii) it provides a database of text,
figures, and annotations, and (iii) it defines relationships among the
objects in the database.
Based on the existing Documentation System, each database described
below will initially be implemented as a YAML-tagged flat file system.
This approach, also in use by popular databases in neuroscience
(e.g. www.neuromorpho.org), allows convenient storage of the highly
heterogeneous data typical of neuroscience models in specially
formatted text-based files. Other advantages include easy manipulation,
archiving, and synchronization between databases using the file
distribution and replication technology developed for the
Project Browser.
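To make the flat-file idea concrete, here is a hypothetical YAML-tagged record and a naive reader; the tag and field names are illustrative, and a real implementation would use a full YAML parser:

```python
# A toy YAML-tagged flat-file record: the first line carries the tag,
# the remainder is ordinary YAML. Field names are hypothetical.

record = """\
#!model
name: purkinje_cell
source: doi:10.0000/example
parameters:
  gmax_Na: 1200.0
"""

def read_tag_and_fields(text):
    """Return the record tag and its top-level 'key: value' fields."""
    lines = text.splitlines()
    tag = lines[0].lstrip("#!")
    fields = {}
    for line in lines[1:]:
        if ":" in line and not line.startswith(" "):
            key, _, value = line.partition(":")
            fields[key.strip()] = value.strip()
    return tag, fields

tag, fields = read_tag_and_fields(record)
print(tag, fields["name"])
```

Because each record is plain text, the files diff, archive, and replicate cleanly with ordinary file-distribution tools, which is the property the paragraph above relies on.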
The databases that will be developed to support the Publication-Workflow
complement the databases of the User-Workflow:
- Publication Narrative Component Repository: The
database of the narrative components of a publication
serves two purposes. First, defining collections of fragments
of simulation configurations allows traditional-style publication
figures to be reproduced. Second, inserting free-style text
paragraphs and combining them with the figures allows for
in-depth discussion sections. Personal notes, reviewer
comments, and computer-generated annotations can also be inserted.
- Publication
Object Relationships Repository: The relationships between
all the objects of a model publication are defined in this database.
As an example, the main menu of a publication is defined as a
narrative component, and is linked to its contents via a set of
records in this database. Not only does this allow customization
of the main menu to the contents of a model publication, but it
also provides for the definition of secondary reader-flows through
one or more publications. The implementation will be based
on the YAML tags that were developed for the Documentation
System. For a typical model publication we envisage that at
least the following flows will be defined: 'history', outlining the
history of the model; 'present', giving an overview of related
models; and 'tutorial', walking the reader through the functions
implemented by the published model. In addition to these narrative
flows, we will also define flows that are specific to verification,
validation, and review. As an example, every single-cell model
should come with an automatically generated report verifying that
model parameters are within their physiological range. The
publication system described here thus moves beyond the predefined,
linear flow that constrains paper publication.
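The relationship records and reader flows described above can be sketched as follows; the record encoding and component names are hypothetical:

```python
# Toy relationship records linking narrative components. Each record says:
# starting from component `from`, flow `flow` continues at component `to`.

relationships = [
    {"from": "main_menu",     "flow": "history",  "to": "model_origins"},
    {"from": "model_origins", "flow": "history",  "to": "parameter_changes"},
    {"from": "main_menu",     "flow": "tutorial", "to": "running_the_model"},
]

def follow_flow(records, start, flow):
    """Collect the sequence of components reached along one reader flow."""
    path, node = [start], start
    while True:
        nxt = next((r["to"] for r in records
                    if r["from"] == node and r["flow"] == flow), None)
        if nxt is None:
            return path
        path.append(nxt)
        node = nxt

print(follow_flow(relationships, "main_menu", "history"))
```

Because the links live in records rather than in the components themselves, several flows can traverse the same components in different orders, which is what enables the secondary reader flows mentioned above.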
- Database Tools: We plan to develop additional interfaces
for the sole purpose of examining, exploring, querying, and
comparing the models in the database, including their micro-
and macro-evolution. For example, we will develop tools that
visualize the channel distribution over a morphology and compare
the different implementations of a specific channel type. As
an extension of the automated benchmark test suite, other
software under development includes robots that validate the
static structure of the models in the database, including parameter
range testers, morphological constraint testers, and robots that
check for shared features in a model. This framework runs
server-side and needs further extension before it becomes
directly useful to researchers. We also plan to develop robots
that detect similarities between models in the database and
propose actions to the user.
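A parameter-range tester robot of the kind mentioned above might look like this; the ranges and parameter names are illustrative placeholders, not curated physiological values:

```python
# Toy parameter-range robot: flag any model parameter that falls outside
# its registered range. Bounds below are hypothetical placeholders.

RANGES = {
    "gmax_Na": (0.0, 3000.0),   # S/m^2, placeholder bounds
    "cm": (0.005, 0.03),        # F/m^2, placeholder bounds
}

def range_report(parameters):
    """Return (name, value, (lo, hi)) for each out-of-range parameter."""
    violations = []
    for name, value in parameters.items():
        lo, hi = RANGES.get(name, (float("-inf"), float("inf")))
        if not lo <= value <= hi:
            violations.append((name, value, (lo, hi)))
    return violations

print(range_report({"gmax_Na": 5000.0, "cm": 0.01}))
```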
- Database Distribution and Replication Strategies: The
Project-Browser distributes the data of a single research project
over multiple computers. After data is created on any one
computer, the Project Browser distribution policy replicates
parts of the project database to other computers as required,
using a secure shell as the transport mechanism. The same data
distribution mechanism also facilitates collaborations between
labs that trust one another. Research projects typically proceed
by incrementally gathering result data from simulations, so we
plan to evaluate other policies that provide explicit support for
incremental distribution of data, such as the rsync protocol.
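An incremental distribution policy in the spirit of rsync can be sketched with checksums: only files whose digest differs on the replica are transferred. The file contents here are stand-ins:

```python
# Checksum-driven incremental sync decision: compare local file digests
# against the replica's known digests and transfer only the differences.
import hashlib

def digest(data):
    return hashlib.sha256(data).hexdigest()

def files_to_transfer(local, remote_digests):
    """Return the names of files whose content differs on the replica."""
    return [name for name, data in local.items()
            if remote_digests.get(name) != digest(data)]

local = {"model.yml": b"gmax_Na: 1200.0\n", "notes.txt": b"run 3 ok\n"}
remote = {"model.yml": digest(b"gmax_Na: 1200.0\n")}  # notes.txt not yet replicated
print(files_to_transfer(local, remote))
```

The real rsync protocol additionally computes rolling checksums over blocks within a file, so only changed regions travel over the wire; the per-file decision above is the coarsest version of the same idea.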
Electronic publication submission replicates the research project database on
the publisher's hardware infrastructure. Both automated and manual review
processes generate new data that is stored in the database. These data can
then be compiled into a review report that becomes an integral part of the
electronic publication.
Accepted electronic publications can be replicated on a reader's local
computer using the same distribution policies. By default, the most important
components of a research project will be replicated, but options will be
available to replicate all data in the research project, including meta-data
that was produced during the review.
- Integrity, Access Control and DRM: Digital rights management (DRM,
also sometimes called Information Rights Management) is a generic term for
access control to a database to protect confidential content. DRM will be
implemented using cryptographic key pairs and checksums. Each account in
the publication system will generate a public/private key pair. The private
key is stored on the user's computer and serves to prove the user's
identity, while the matching public key can be duplicated, stored on many
computers, and used to verify that identity. Key pairs can also belong to
'machine' accounts, for example for robot-based automated model
verification. To protect the system from unauthorized access and to preserve
data integrity cryptographic checksums on the content of the files in
the database are compared to authenticate both queries and data.
Optionally, these checksums can then be signed using an account’s private
key.
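A minimal sketch of checksum-based integrity with signing follows. Real account keys would be asymmetric pairs as described above; HMAC with a shared secret stands in here only to keep the sketch within the Python standard library:

```python
# Integrity via SHA-256 checksums, plus a signature over the checksum.
# NOTE: HMAC is a symmetric stand-in for the asymmetric (public/private
# key) signing the text describes.
import hashlib
import hmac

def checksum(content):
    return hashlib.sha256(content).hexdigest()

def sign(private_key, digest_hex):
    return hmac.new(private_key, digest_hex.encode(), hashlib.sha256).hexdigest()

def verify(private_key, content, signature):
    """Recompute the checksum and compare signatures in constant time."""
    expected = sign(private_key, checksum(content))
    return hmac.compare_digest(expected, signature)

key = b"account-private-key"            # stand-in for a stored account key
record = b"reviewer comment: parameters within range"
sig = sign(key, checksum(record))
print(verify(key, record, sig))         # unmodified record verifies
print(verify(key, record + b"!", sig))  # tampered record fails
```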
Computing and comparing the cryptographic checksums after pulling data
from an external database verifies the origin and integrity of the pulled
annotations, including reviewer comments and computer validation
test results. This gives a researcher confidence in the provenance
of a model that is considered as the starting point for a research
project. The result will be an infrastructure for maintaining a
distributed database of computational neuroscience models, their
annotations, and narrative flows.
The DRM system designed for the publication system will accommodate the
distributed nature of a research project database and the different roles a
user can have when reading and modifying an electronic publication. The
design is similar to that of well-known service-driven websites such as
Facebook (www.facebook.com) and Blogger (www.blogger.com).
Adoption of the Publication System
It is possible that investigators might raise objections to adopting the publication
system outlined in this proposal, particularly with regard to licensing and
copyright. There are two recognized obstacles to truly reproducible research (a
fundamental purpose of both the scientific method and any comprehensive
electronic publication system): first, the lack of reward for producing
reproducible work, and second, the legal obstacles to the full sharing of
methods, code, data, figures, and publications [4]. The first obstacle is
overcome through the lineage tracking and attribution provided by the
proposed publication system. To resolve the second, we will evaluate a
number of alternative licensing and copyright procedures, one of which is
the Reproducible Research Standard (RRS). This provides a legal basis
to encourage replicable scientific investigation through attribution, the
facilitation of greater collaboration, and the promotion of the engagement of
the larger community in scientific learning and discovery [4]. In brief, the
RRS is a compilation of other licenses that treats the individual compendium
elements of a scientific publication separately. It can be used to rescind the
aspects of copyright that prevent scientists from sharing important research
information, and it avoids the license proliferation and compatibility
problems that arise in computationally based research when components are
covered by disparate licenses such as the GPL and BSD.
References
[1] P Achard and E De Schutter. Complex parameter landscape for a
complex neuron model. PLoS Computational Biology, 2:e94, 2006.
[2] P Baldi, M Vanier, and JM Bower. On the use of Bayesian methods
for evaluating compartmental neural models. Journal of Computational
Neuroscience, 5:285–314, 1998.
[3] RC Cannon and G D’Alessandro. The ion channel inverse problem:
Neuroinformatics meets biophysics. PLoS Computational Biology, 2:e91,
2006.
[4] V Stodden. The legal framework for reproducible scientific research:
Licensing and copyright. Computing in Science and Engineering,
11:35–40, 2009.
[5] MC Vanier and JM Bower. A comparative survey of automated
parameter-search methods for compartmental neural models. Journal of
Computational Neuroscience, 7(2):149–171, 1999.