
More information

Project name Genome Information Integration Project (GIIP)
Area Genome Informatics
Purpose (1) Construction of an "all human gene catalogue"; (2) development of annotation techniques based on H-InvDB; (3) integration of disease, gene expression, and PPI data
Introduction The GIIP was a model project of METI that ran for three years, from fiscal 2005 to 2007. The representative institute of the commission was the Japan Biological Informatics Consortium (JBIC), and six other research institutes also took part. The aim of the project was to construct an annotated human gene database of the highest quality in the world, based on the H-Invitational Database (H-InvDB).
Keyword genome | gene | annotation | integrated | database
Started fiscal year 2005-2007
Project head Takashi Gojobori
Institute of the head Center for Information Biology and DNA Data Bank of Japan (CIB-DDBJ), National Institute of Genetics (NIG)|Biomedicinal Information Research Center (BIRC), National Institute of Advanced Industrial Science and Technology (AIST)
Budget (million yen) 1641
Representative Institute of the commission Japan Biological Informatics Consortium (JBiC)|Biomedicinal Information Research Center (BIRC), National Institute of Advanced Industrial Science and Technology (AIST)
Companies Japan Biological Informatics Consortium | National Institute of Genetics | National Institute of Advanced Industrial Science and Technology | National Cancer Center Research Institute | Hokkaido University | The University of Tokyo | Tokyo Medical and Dental University | Keio University | BITS Co., Ltd | Hitachi, Co., Ltd | Hitachi Software Engineering Co., Ltd | DYNACOM Co., Ltd | C's Lab Co., Ltd | Maze,Inc. | Fujitsu Limited | NEC Soft, Ltd
Published papers (PubMed IDs)
18089548 | Yamasaki C, et al. The H-Invitational Database (H-InvDB), a comprehensive annotation resource for human genes and transcripts. Nucleic Acids Research 36, Database issue D793-D799, 2008
17982176 | Matsuya A, et al. Evola: Ortholog database of all human genes in H-InvDB with manual curation of phylogenetic trees. Nucleic Acids Research 36, Database issue D787-D792, 2008
17130147 | Takeda J, et al. H-DBAS: Alternative splicing database of completely sequenced and manually annotated full-length cDNAs based on H-invitational. Nucleic Acids Research 35, Database issue D104-D109, 2007
Patent (Japan, overseas) JP-2006-323830 | JP-2008-097189
Archives
The report on "Genome Information Integration Project (GIIP)" 2005-2007. Japanese.

Product (Database, Tool)

Evola

Summary Evola (Evolutionary annotation database) is a database providing ortholog information for H-InvDB human genes. Evola contains orthologs between human genes and the genes of 13 species (chimpanzee, macaque, mouse, rat, dog, horse, cow, opossum, chicken, zebrafish, medaka, Tetraodon, and fugu). It implements viewers for sequence alignments and phylogenetic trees, transcript variants (Locus maps), and natural selection (dN/dS view). A duplicate gene family viewer is also available.
Data type Comparative genomics

G-compass

Summary G-compass was designed as a tool for the study of comparative genomics. It provides data on evolutionarily conserved genomic regions and orthologous genes between human and 12 vertebrates (chimpanzee, rhesus monkey, mouse, rat, dog, cow, horse, opossum, chicken, zebrafish, medaka and Tetraodon). Information on ultraconserved elements (UCE) and copy number variable regions (CNV) is provided. Sliding window analysis and dot plot analysis are also implemented.
Data type Comparative genomics

H-ANGEL

Summary H-ANGEL is a resource which provides information on human gene expression. H-ANGEL displays expression patterns of transcriptional products generated by the H-Invitational project in practical tissue categories based on tissue-specific expression data from several experimental platforms. H-ANGEL also displays information for the expression of different genes at their corresponding physical positions in the human genome. This information is linked to the corresponding transcript or locus annotation data stored in H-InvDB.
Data type Gene expression

H-DBAS

Summary H-DBAS is a database of human alternative splicing (AS) based on H-InvDB. H-DBAS offers human AS variants identified from H-Inv full-length cDNAs and the published human mRNA dataset. It includes the results of analyses such as AS patterns, AS events affecting protein function, and comparison of AS with mouse. Users can find AS variants with various search keys on the Advanced Search page, and the results are displayed graphically in the Java-based AS Viewer.
Data type Annotation

H-Exp

Summary H-Exp is a database of human tissue-specific expression profile data, integrated and coordinated with H-InvDB. The system enables (1) fast searching, sorting, and viewing by gene cluster or isoform, (2) comparison of expression patterns between gene clusters or isoforms, and (3) display of detailed information on gene expression and related data.
Data type Gene expression

H-InvDB

Summary H-Invitational Database (H-InvDB) is an integrated database of human genes and transcripts. By extensive analyses of all human transcripts, we provide curated annotations of human genes and transcripts that include gene structures, alternative splicing isoforms, protein functions, etc.
Data type RNA human full-length cDNA, mRNA

HEAT

Summary H-InvDB Enrichment Analysis Tool (HEAT) is a data-mining tool for automatically identifying features specific to a given human gene set. HEAT searches for H-InvDB annotations that are significantly enriched in a user-defined gene set compared with the entire set of H-InvDB representative transcripts. This technique is called Gene Set Enrichment Analysis (GSEA) and is widely used in analyzing the results of microarray experiments. HEAT uses Fisher's exact test for its statistical tests.
Data type Annotation
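
The enrichment test named in the HEAT summary can be illustrated with a minimal sketch. It assumes a one-sided Fisher's exact test on a 2x2 table (annotated or not, inside or outside the gene set), computed as the upper tail of the hypergeometric distribution. The function name and the example counts are hypothetical. Whether HEAT uses a one-sided or two-sided test, and how it handles multiple testing, is not stated in this record.

```python
from math import comb


def fisher_enrichment_p(k, n, K, N):
    """One-sided Fisher's exact p-value for over-representation.

    N: number of background transcripts
    K: background transcripts carrying the annotation
    n: size of the user-defined gene set
    k: gene-set members carrying the annotation
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    total = comb(N, n)
    # math.comb returns 0 when the lower index exceeds the upper one,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Hypothetical counts, for illustration only (not taken from H-InvDB).
p = fisher_enrichment_p(k=12, n=100, K=400, N=20000)
print(f"p-value for enrichment: {p:.3g}")
```

A small p-value suggests that the annotation occurs in the gene set more often than expected by chance. When many annotations are tested at once, a multiple-testing correction would normally be applied to the resulting p-values.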

LEGENDA

Summary Legenda is a system for finding articles in which any pair of gene names, diseases, and substrates co-occur in the MEDLINE abstract. Co-occurrences of terms of the same type (e.g. two genes) can also be searched. Legenda has its own gene name dictionary.
Data type Journal , Gene, Disease, Substrate
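
The co-occurrence search described for Legenda can be sketched as follows. This is only an illustration of the idea, assuming case-insensitive whole-term matching against abstract text. The function names, article identifiers, and abstracts are made up, and Legenda's own dictionary-based name recognition is not reproduced here.

```python
import re


def mentions(term, text):
    """Return True if `term` occurs in `text` as a whole term (case-insensitive)."""
    pattern = r"(?<!\w)" + re.escape(term) + r"(?!\w)"
    return re.search(pattern, text, flags=re.IGNORECASE) is not None


def cooccurring_articles(term_a, term_b, abstracts):
    """Return IDs of abstracts that mention both terms.

    abstracts: dict mapping an article ID to its abstract text.
    """
    return [
        article_id
        for article_id, text in abstracts.items()
        if mentions(term_a, text) and mentions(term_b, text)
    ]


# Toy abstracts with hypothetical IDs (not real MEDLINE records).
toy_abstracts = {
    "A1": "TP53 mutations are frequent in breast cancer.",
    "A2": "TP53 regulates apoptosis.",
    "A3": "Breast cancer screening guidelines were revised.",
}
print(cooccurring_articles("TP53", "breast cancer", toy_abstracts))
```

A production system would need synonym handling, which is presumably the role of the gene name dictionary mentioned in the summary.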

MDV

Summary Motif Distribution Viewer (MDV) is a web tool for visualizing the distribution of various motifs around transcription start sites (TSS) on a user-defined set of promoter sequences. The tool can be used on the original site or downloaded for local use. (Cited from the original site.)
Data type DNA-motif

PPI view

Summary The PPI view displays human protein-protein interaction (PPI) information from H-InvDB. PPI data were collected from five major public PPI databases (BIND, DIP, MINT, HPRD, IntAct) and integrated into a non-redundant PPI dataset. The result is 32,198 human PPIs among 9,268 proteins (as of H-InvDB version 5.0). The PPI view displays proteins that interact with the user's query proteins (or gene products), and provides links to the H-InvDB locus view and cDNA view, which lead to the gene locations and the detailed functional annotations of these interacting proteins, respectively. PPI view: http://www.h-invitational.jp/hinv/ppi/ PPI view sample: http://h-invitational.jp/hinv/ppi/ppi_view.cgi?hip=HIP000084307
Data type Protein-Proteome
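
One way to picture the integration into a non-redundant dataset is the sketch below. It assumes that protein identifiers have already been mapped to a common ID space and that an interaction is an unordered pair, so A-B and B-A count once. The function name, identifiers, and records are hypothetical, and this record does not describe the procedure actually used for H-InvDB.

```python
def merge_ppi(sources):
    """Merge PPI lists from several databases into a non-redundant set.

    sources: dict mapping a database name to an iterable of
             (protein_a, protein_b) pairs.
    Returns a dict mapping each unordered pair (as a sorted tuple)
    to the set of databases that reported it.
    """
    merged = {}
    for database, pairs in sources.items():
        for protein_a, protein_b in pairs:
            # Sorting makes (A, B) and (B, A) the same key.
            key = tuple(sorted((protein_a, protein_b)))
            merged.setdefault(key, set()).add(database)
    return merged


# Toy input with made-up identifiers (not real database records).
toy_sources = {
    "BIND": [("P1", "P2"), ("P2", "P3")],
    "DIP": [("P2", "P1")],
}
merged = merge_ppi(toy_sources)
proteins = {protein for pair in merged for protein in pair}
print(len(merged), "interactions among", len(proteins), "proteins")
```

Keeping the set of supporting databases for each pair preserves provenance, which can help when judging how well supported an interaction is.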

TACT

Summary Transcriptome Auto-annotation Conducting Tool (TACT) is a web-based automated prediction tool of functional annotation that was developed by integrating ORF prediction, similarity search (BLASTX and FASTY) and motif prediction (InterProScan). TACT was produced in collaboration with the H-Invitational project, and has contributed to the development of the H-Invitational Database (H-InvDB).
Data type DNA-sequence

VarySysDB

Summary This is a system to search, display, and download our research results on human polymorphism based on publicly available data and annotations of transcripts presented by H-InvDB. It provides information about single nucleotide polymorphisms (SNPs), deletion-insertion polymorphisms (DIPs), short tandem repeats (STRs), single amino acid repeats (SARs), structural variation (or copy number variations: CNVs), and their relations to the genome, transcripts, and functional domains.
Data type

Related projects