METADATA MANAGEMENT AND SEMANTICS IN MICROARRAY REPOSITORIES
Kocabaş F1,2,*, Can T3, Baykal N1
*Corresponding Author: Fahri Kocabaş, NATO HQ C3S, Blvd Leopold III B, 1110 Brussels, Belgium; Tel.: +32-2-707-5533; Fax: +32-2-707-5834; E-mail: FK:fahri@ii.metu.edu.tr; f.kocabas@hq.nato.int
page: 49

RESULTS AND DISCUSSION

The volume of microarray data is rising. The challenge is whether we can give this information space meaning, as well as structure and syntax, so that it can be processed by automated means. The summary parts of the records in microarray repositories and the related publications are neither synchronized nor appropriately structured; they are in free-text format. The statements are usually incomplete and ambiguous, and thus not easily comparable with those of similar studies. As an information management principle, results should be visible, understandable, and usable throughout their life cycles. Once we structure (MAdmc) and encode (SemNet) the contextual data, operations such as discovery and exchange become feasible, and hidden, previously unavailable facts may also be extracted from the structured and encoded data sets. The structured entry paradigm can also be enforced, in addition to annotation via an ontology within a SemNet. At present, a search on MAdmr (MAdmc and SemNets) for domain-specific information will be more efficient than a search on GEO. This is analogous to sorting data before searching it: the data are linked once their resources, properties and relationships have been identified. MAdmf brings about an overhead, but future benefits will justify this start-up cost. Describing data in a structured manner is better done in a database, but the microarray information space includes several microarray repositories, experimenter web sites, publications, and specialized databases. In practice, they cannot all be stored in one database or easily federated. If all parties had agreed to use the MAGE-OM object model and the MAGE-ML exchange platform, there would have been no format, exchange or integration issues. This is unlikely, however, and there will always be different implementations that cause exchange and interoperability problems. Note that metadata cards and semantic nets can also be used in a MAGE-OM/MAGE-ML based repository. 
The microarray domain consists of semi-structured data that can best be managed with SemWeb technology. SemWeb emphasizes the use of metadata standards and connected data to support data-centric operations. The proposed framework, MAdmf, follows the SemWeb paradigm. The microarray community should adopt such a data-centric approach because its operations are data intensive. Data management is the vehicle for data-centric initiatives, and an IT system is only as strong as its data management. In future-proof applications, the data layer is built separately from the business logic layer. MAdmf belongs to the data layer. It promotes data standardization in microarray repositories, and any modelling or application development effort can then build on it. In this study, we examined the MINiML file and introduced an extended format for a metadata card. We created domain-specific SemNets and proposed posting them to an ebXML-based metadata registry, which provides a shared information space. 
Thus, in the proposed framework: 1) the producer can add structured data and the consumer can get the conveyed meaning (what has been received is limited to what has been understood); 2) because more automation is possible, the backlog in curation work is reduced (from submitted records to GEO Series, from GEO Series to GEO Datasets, or from GEO Datasets to ArrayExpress records); 3) ambiguity and redundancy are reduced through a standard format and additional semantics; 4) a data-centric approach is adopted, and the quality and expressiveness of the data are promoted by keeping the data layer separate from the business logic; 5) consumers reach data that would otherwise be unavailable (new entries in the descriptive information and the semantic layer); 6) the concept of life cycle management (lifetime modification and living data sets) is introduced; 7) visibility, understandability and usability are enforced; 8) users can use W3C and public-domain tools to extract data; 9) controlled vocabularies (Countries, Date/Time Group, Names) are used not only to annotate but also to encode the metadata and data; 10) the produced metadata card and its associated SemNet(s) are extendable, integrable, queryable and exchangeable; and 11) microarray records and subsequent entries (publications, specialized databases) can be synchronized. The extension of the MINiML file has three aspects. First, the content of the summary and experimenter parts is detailed. Second, the format is materialized through data and syntax encoding schemes, and the organization and structure are improved by introducing layers and additional metadata elements and attributes. Third, the process is extended with new concepts such as life cycle management, metadata registry use, and structured entry. In this manner, the MINiML file has been transformed into a metadata card, and its semantics are extended with SemNets. Both can then be used in any similar data center. 
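To make the idea of structured entry with encoded fields concrete, the following is a minimal sketch in Python, not the MAdmc format itself. The field names, the small country-code table, the accession placeholder and the choice of ISO 8601 dates are all assumptions made for illustration; the study's own encoding schemes (for example its Date/Time Group format) are not reproduced here.

```python
from dataclasses import dataclass
from datetime import date

# Assumption: a tiny illustrative subset of a country vocabulary
# (ISO 3166-1 alpha-2 codes); not the vocabulary used in the study.
COUNTRY_CODES = {"TR": "Turkey", "BE": "Belgium", "US": "United States"}


@dataclass(frozen=True)
class Experimenter:
    name: str
    organization: str
    country: str  # expected to be a key of COUNTRY_CODES


@dataclass(frozen=True)
class MetadataCard:
    accession: str   # placeholder identifier, hypothetical
    experimenter: Experimenter
    submitted: str   # expected to be an ISO 8601 calendar date
    summary: dict    # structured summary instead of free text


def validate(card):
    """Return a list of problems; an empty list means the card passes."""
    problems = []
    if card.experimenter.country not in COUNTRY_CODES:
        problems.append(f"unknown country code: {card.experimenter.country}")
    try:
        date.fromisoformat(card.submitted)
    except ValueError:
        problems.append(f"not an ISO 8601 date: {card.submitted}")
    # Hypothetical required summary fields.
    for key in ("disease", "finding"):
        if not card.summary.get(key):
            problems.append(f"missing summary field: {key}")
    return problems


good = MetadataCard(
    "EXAMPLE-1",
    Experimenter("A. Researcher", "Example University", "TR"),
    "2011-03-15",
    {"disease": "breast cancer", "finding": "placeholder finding"},
)
bad = MetadataCard(
    "EXAMPLE-2",
    Experimenter("B. Researcher", "Example Institute", "Turkey"),
    "March 2011",
    {"disease": "breast cancer"},
)
```

Calling `validate(good)` returns an empty list, while `validate(bad)` reports the free-text country, the free-text date and the missing finding. The point of the sketch is only that encoded fields can be checked and compared by a program, which free-text summaries do not allow.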
The proposed framework provides a foundation on which the people, experiment, and result data are linked. Thus, for example, a meta-analyst can get a consolidated summary of the result part of all breast cancer data sets by using a SPARQL query. The originator, the curator, the developers and other experimenters may all benefit from this framework. We give the specification and present the key products in a case study that serves as a proof of concept. MAGE-ML and MINiML may seem to be alternative structures, but in reality they are not. MINiML is an intermediary data structure, whereas MAGE-ML is a platform on which applications can be developed. The creation of MAdmc and SemNet comprises two different and complementary contributions that support MINiML on its way to becoming a format and exchange standard. They do not replace any existing work. However, if adopted, they can be a focus for discovery, integration and exchange. SemNets can also be created for other parts of a microarray record, in addition to the experimenter and summary data. Note also that this study can easily be adapted to other microarray repositories or high-throughput repositories. In recent years, the number of records at GEO has increased by up to 3% per month. There is a backlog of up to 20% in Series records, for varying reasons. There is also a serious backlog of 80% in the Dataset transformation (GSE to GDS) tasks performed by GEO curators. This is likely to increase because the amount of data and its complexity are on the rise (Table 5). An RDF-enabled database that provides both reasoning and ontology modeling capabilities may consume metadata cards and SemNets. Another consumer could be a semantic platform that connects the heterogeneous data contained in microarray repositories and related publications. One can combine people, location, organization, and date information with experimentation results across the microarray information space to formulate complex inquiries over SemNets and metadata cards. 
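The kind of inquiry described above can be sketched with a few subject-predicate-object triples and a pattern match that joins on the data set identifier, which is what a SPARQL basic graph pattern does. This is a standard-library Python illustration only, not the SemNet vocabulary of the study: the identifiers, the predicate names (`studies`, `reportsFinding`, `submittedBy`, `country`) and the findings are hypothetical placeholders.

```python
# Hypothetical triples (subject, predicate, object); all names and values
# are placeholders, not data from GEO or from the study.
TRIPLES = [
    ("ds:1", "studies", "breast cancer"),
    ("ds:1", "reportsFinding", "finding A (placeholder)"),
    ("ds:1", "submittedBy", "person:1"),
    ("person:1", "country", "TR"),
    ("ds:2", "studies", "breast cancer"),
    ("ds:2", "reportsFinding", "finding B (placeholder)"),
    ("ds:3", "studies", "lung cancer"),
    ("ds:3", "reportsFinding", "finding C (placeholder)"),
]


def match(pattern, triples=TRIPLES):
    """Yield every triple that fits the pattern; None is a wildcard."""
    for triple in triples:
        if all(p is None or p == v for p, v in zip(pattern, triple)):
            yield triple


def findings_for(disease):
    """Collect findings per data set for one disease.

    Roughly what this SPARQL query would express over the same
    (hypothetical) vocabulary:
        SELECT ?ds ?finding WHERE {
            ?ds :studies "breast cancer" .
            ?ds :reportsFinding ?finding .
        }
    """
    result = {}
    for ds, _, _ in match((None, "studies", disease)):
        result[ds] = [obj for _, _, obj in match((ds, "reportsFinding", None))]
    return result


summary = findings_for("breast cancer")
```

Here `summary` holds the findings of `ds:1` and `ds:2` and omits the lung cancer data set. In a real deployment the triples would come from the SemNets and the query would be run by a SPARQL engine; the sketch only shows why linked, encoded summaries make such a consolidated view a single query.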
Moreover, this mode of operation on data can facilitate the development of knowledge-interoperable systems with a separate data layer. Equally, rule-based systems can make use of the summary portion of a microarray record once it is structured and encoded. Standardization studies like this one, which promote machine understandability and semantic interoperability, are required. This study not only brings the metadata card and semantic net concepts into a format standard approach but also highlights the importance of life cycle management, data management and structured entry. Such a study will be beneficial especially for producers, curators, future experimenters and system developers, whether they employ manual or automated means. The experimental data, encoded formats, and program can be requested from the corresponding author.




 

 


 Copyright © Balkan Journal of Medical Genetics 2006