At an early stage, we determined that the GENESIS database should be usable by neuroscientists who do not have the GENESIS simulator installed. Although simulation models are the central organizing principle for the database, much of the information in the database can be useful without the need to simulate any of the models in the database. The tutorials described above are generated by the simulator, but in most cases can be explored without the simulator. As we have said, we expect these tutorials to be the main point of entry to the database for early searching in a domain. At the same time, it is important that rich interfaces be provided between the simulator and the database, so that a model established in the simulator can be easily entered into the database, and so that queries to the database about the behavior of models can be satisfied by running the models in the simulator rather than storing every possible behavior characteristic of each model (an impossible task). In addition to these two requirements, we identified three fundamentally different types of information that will be entered in the database. First and foremost are the GENESIS models, which organize all the information in the database. Second are the experimental data on which the models are based and the simulated data the models produce. Third are classical textual information sources: texts, citations, annotated images, etc.
To meet these needs, we settled on a design using a standard HTML (Hypertext Markup Language) browser interface (e.g., Netscape) as a front end to three database systems, subserving the three types of information listed above, which may be integrated into one database eventually. Although HTML browsers do not provide sufficient interface tools to run an entire database, they suit our project for two reasons. First, the tutorials can be easily organized as a set of hypertext linked pages, with hooks to database entries at appropriate points. Second, these browsers represent an enormous investment in graphical user interface design and implementation for all the common platforms (X/Unix, MS-Windows, Macintosh), which we certainly do not want to duplicate. The HTML format does provide the capability to invoke external programs which can in turn invoke their own graphical user interface (GUI) components, and we expect that this will be a major pathway for accessing the database.
The second critical design decision was to use object oriented database technology. Object technology is highly suited to neuroscience data because of its extensive heterogeneity and compositionality, and for this reason many projects in the Human Brain Program are using object technology. However, the GENESIS database has an even more compelling reason to use object oriented techniques. The GENESIS models are the structured information objects at the core of the database, forming the entries which organize all other information in the database. GENESIS models are designed and written as object oriented programs, using standard and user-supplied library objects communicating via stereotypical message types. Thus it is natural to store a GENESIS model as a collection of related objects, for which an object oriented database system is the clear choice. Moreover, since the models and the simulator itself are organized as objects, we can add to each object actions which will write the necessary code to enter the object into the database. Thus much of the work of entering and structuring models in the database can be automated. We plan to use a commercial object oriented database (UniSQL) to implement the database which accesses the GENESIS models and the experimental data. The third subsystem, for textual material, is described below.
A second important reason for mirroring the object structure of a GENESIS model in an object oriented database is that it will allow sophisticated queries to be run not only over each model as an entity but also over the component parts of the model. This is crucial in facilitating a user's understanding of a model found in the database. We expect each class of GENESIS object to define a database schema, and each instance of that class in a model to be entered as an instance of its progenitor's schema.
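The class-to-schema correspondence described above can be illustrated with a minimal Python sketch. It is not GENESIS or UniSQL code: the `Compartment` class, its fields, and the helper functions `schema_for`, `instance_record`, and `query` are hypothetical names introduced only to show how one schema per object class, and one database entry per instance, would allow queries over the component parts of a model.

```python
from dataclasses import dataclass, fields


@dataclass
class Compartment:
    """Hypothetical stand-in for a simulator object class (not the GENESIS API)."""
    name: str
    Rm: float  # membrane resistance (ohms)
    Cm: float  # membrane capacitance (farads)
    Em: float  # resting potential (volts)


def schema_for(cls):
    """Derive a schema (class name plus typed attributes) from an object class."""
    return {
        "class": cls.__name__,
        "attributes": {f.name: f.type.__name__ for f in fields(cls)},
    }


def instance_record(obj):
    """Turn one model element into a database entry tagged with its class's schema."""
    return {
        "schema": type(obj).__name__,
        "values": {f.name: getattr(obj, f.name) for f in fields(obj)},
    }


def query(records, schema, predicate):
    """Select the component entries of a given schema that satisfy a predicate."""
    return [r for r in records if r["schema"] == schema and predicate(r["values"])]


# A toy two-compartment model, entered element by element.
model = [
    Compartment("soma", 1e9, 1e-11, -0.065),
    Compartment("dend", 5e9, 2e-12, -0.065),
]
records = [instance_record(c) for c in model]

# A query over the parts of the model rather than the model as a whole.
high_resistance = query(records, "Compartment", lambda v: v["Rm"] > 2e9)
print([r["values"]["name"] for r in high_resistance])
```

In a real object oriented database the schema would be a database class and the query would be expressed in the database's own query language; the dictionaries here only stand in for that machinery.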
There are two primary interfaces between the simulator and the database. The first allows the database to invoke the simulator, to set up a model derived from a database entry, and then either to hand over control to the simulator's user interface (if the user wishes to manually run the model), or to run the model under the control of the database in order to derive some piece of information about the model's behavior. This information could have been explicitly requested by the user, or it could be implicitly required to satisfy a query. This interface will involve creating some database classes devoted to running the simulator, and some GENESIS library objects devoted to reporting information back to the database.
The second interface is that provided by the simulator to facilitate entry of a model into the database. This will be provided by one or more GENESIS library objects which export user commands for dumping a model into the database. Each standard GENESIS library object involved in a model will be augmented with one or more actions to generate the corresponding database schema and to instantiate the schema in the database for each element derived from that object in the model.
For the experimental and simulated data we will provide schemas that can store and manipulate data of the different types. For example, an electrical recording schema will provide methods for computing statistics of spiking behavior, and these methods can be invoked by the user in queries.
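As an illustration of a data schema that carries its own analysis methods, the following Python sketch defines a recording class with simple spike statistics. The class name `ElectricalRecording`, its fields, and the threshold-crossing spike detector are assumptions made for the example; the design described here does not specify how spikes are to be detected or which statistics are to be offered.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ElectricalRecording:
    """Hypothetical schema for a voltage trace sampled at a fixed interval."""
    samples: List[float]     # membrane potential (volts)
    dt: float                # sampling interval (seconds)
    threshold: float = 0.0   # assumed spike-detection threshold (volts)

    def spike_times(self) -> List[float]:
        """Times of upward threshold crossings, taken here as spike times."""
        times = []
        for i in range(1, len(self.samples)):
            if self.samples[i - 1] < self.threshold <= self.samples[i]:
                times.append(i * self.dt)
        return times

    def mean_rate(self) -> float:
        """Mean firing rate in spikes per second over the whole trace."""
        duration = len(self.samples) * self.dt
        return len(self.spike_times()) / duration if duration > 0 else 0.0

    def interspike_intervals(self) -> List[float]:
        """Intervals between successive detected spikes (seconds)."""
        t = self.spike_times()
        return [b - a for a, b in zip(t, t[1:])]


# A toy trace with three brief depolarizations above threshold.
trace = ElectricalRecording(
    samples=[-0.07, 0.02, -0.07, -0.07, 0.02, -0.07, -0.07, 0.02, -0.07, -0.07],
    dt=0.5,
)
print(trace.spike_times(), trace.mean_rate(), trace.interspike_intervals())
```

A query such as "recordings with a mean rate above some value" would then reduce to invoking `mean_rate` on each stored instance.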
The document retrieval subsystem has different requirements from the model and data subsystems described above, and will be implemented in a different manner. However, this retrieval system will be deployed using the same user interface described above. Here, we emphasize the word "document" in order to draw attention to the distinction between document and data retrieval. A document is a textual representation of an object, such as an article, book, image, audio, video, etc. For the prototype, the database will cover the literature that is used by the researchers who use GENESIS, i.e., computational neurobiology and the relevant experimental studies.
This subsystem will be a state-of-the-art full-text information retrieval (IR) system using advanced IR techniques, and will be a client/server system that uses SGML (Standard Generalized Markup Language) as the underlying database format in the server search engine. (See, for example, van Herwijnen, 1990.) Any valid SGML Document Type Definition (DTD) could be used as the database description for a file of bibliographic records or full-text documents. The retrieval engine will make use of statistical retrieval techniques based on the probabilistic retrieval model (Robertson et al., 1995), ranked output, and relevance feedback. A number of additional approaches would be utilized, such as Boolean searching, document clustering, citation linking of documents for browsing and searching, and hypermedia linking of documents and other objects outside the database (Fidel & Efthimiadis, 1995).
Users would enter queries as "free-text" (that is, normal English prose) statements of their interest or need for topical searches. No formal "query language" or Boolean logic would be imposed on the user. The graphical client interface would also include features such as the ability to accumulate bibliographies of citations seen, an extensive help system, command history and redo, multiple display formats for retrieved records, and the ability to export and use the text of retrieved documents in other applications within the GENESIS system or for word-processing. The use of probabilistic IR techniques to match the user's initial query with a set of documents in the database would result in the retrieval of documents in decreasing ranked order of probable relevance to the user's query. This aids the user in subject focusing and topic/treatment discrimination. The system would also provide for direct probabilistic or Boolean searches of any indexed data elements.
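To make the idea of ranked output from a free-text query concrete, the following Python sketch scores a small document collection against a query and returns the documents in decreasing order of score. The weighting used is a BM25-style function, one well-known member of the probabilistic retrieval family, chosen here as an assumption; the design above does not fix a particular weighting function. The function names `tokenize` and `rank`, the parameter values `k1` and `b`, and the sample titles are all illustrative.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase, whitespace-split tokenizer with basic punctuation stripping."""
    words = (w.strip(".,;:()").lower() for w in text.split())
    return [w for w in words if w]


def rank(query, docs, k1=1.2, b=0.75):
    """Return indices of matching documents, best first (BM25-style scoring)."""
    if not docs:
        return []
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    avg_len = sum(len(t) for t in tokenized) / n
    # Document frequency: number of documents containing each term.
    df = Counter(term for t in tokenized for term in set(t))
    query_terms = set(tokenize(query))

    scored = []
    for idx, terms in enumerate(tokenized):
        tf = Counter(terms)
        score = 0.0
        for term in query_terms:
            if term not in tf:
                continue
            idf = math.log(1 + (n - df[term] + 0.5) / (df[term] + 0.5))
            norm = tf[term] + k1 * (1 - b + b * len(terms) / avg_len)
            score += idf * tf[term] * (k1 + 1) / norm
        scored.append((score, idx))

    # Decreasing order of score; documents sharing no query term are dropped.
    return [idx for score, idx in sorted(scored, key=lambda s: -s[0]) if score > 0]


titles = [
    "Purkinje cell dendritic spines model",
    "Cable theory for dendritic trees",
    "Olfactory bulb network simulation",
]
print(rank("Purkinje cell dendritic model", titles))
```

The user supplies only prose; no Boolean operators are needed, and documents sharing more (and rarer) terms with the query appear first. Relevance feedback would extend this by re-weighting terms from documents the user marks as relevant, which is not shown here.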