






JContextExplorer

User Manual





Phillip Seitzer

Facciotti Lab

UC Davis

March 10, 2014

Version 4.0



TABLE OF CONTENTS


Chapter 1: Getting Started 5

What is JContextExplorer? 6

Why should I use JContextExplorer? 7

Chapter 2: Launching JContextExplorer 8

Where can I find JContextExplorer? 9

What do I need to do before I can launch JContextExplorer? 9

The Launch 10

Chapter 3: Using JContextExplorer 12

Window Layout 13

Main Frame 14

Genome Set Search Area 15

OR Statement 16

AND Statement 16

IF AND ONLY IF Clause 16

Continuous range of clusters 17

Search Options Area 18

Context Tree Options 20



Internal Frame Management Area 22

Search Results Frame 23

Export Options: Search Results Frame Menu Options 24

Context Tree Frame 26

Context Tree Menu Options 27

Phylogenetic Tree Frame 28

Additional Node Selection Options 29

Search Results Analysis Area 30

Context Viewer Multiple Genome Browser 32



Genomes Menu 38

New Genome Set 40

Import Genome Set from .gs file 41

Genome Sets 42

Manage Genome Sets 43

Current Genome Set 45

Import Genomes into Current Genome Set 47

From GenBank or .GFF Files 48

Directly from NCBI Databases 50

Import Settings 52

Feature Type Settings 53

GenBank File Options 55

NCBI Database Query Settings 56



Browse NCBI available genomes by organism name 58

Launch NCBI microbial taxonomy browser 59

Retrieve Popular Genome Set 60

Available Sets 61



Load Menu 62

Genome Sequence File(s) 63

Homology Clusters 64

Gene IDs 67

Context Set 68

Available Context Set Types 71

Context Set Filter Types 75

Dissimilarity Measure 76

Amalgamation Types 79

Dissimilarity Factors 80

Included Dissimilarity Types 91



Phylogenetic Tree 95

Sequence Motifs 98

Associating Sequence Motifs with Genomic Features 102



Export Menu 104

Genome Set as .gs file 105

Genomes as Extended GFF files 106

Genomes as GenBank files from NCBI 107

Process Menu 108

Load Query Set 109

Load Data Grouping 111

Data Grouping Correlation 112

Adjusted Fowlkes-Mallows Method 115

Problems associated with non-identical datasets and repeated elements 118

Tree Similarity Scan 120

Comparing Against a Phylogenetic Tree 122



Context Forest 124

Process Output Window 126

Scan Results Panel 127

Context Forest Panel 130

Help Menu 131

Chapter 4: Additional Resources 132

Video tutorials 133

Author Contact Information 134
Chapter 1:

Getting Started

What is JContextExplorer?

JContextExplorer is a tool to facilitate cross-species genomic context comparisons within a set of previously determined annotated genomes (and protein homology clusters). JContextExplorer uses variable-group agglomerative hierarchical clustering to create “context trees”, where each leaf represents a single gene neighborhood.

JContextExplorer offers several ways to (1) define genomic groupings (i.e., create the genomic segments to be compared and clustered), (2) perform pairwise comparisons (compare each genomic segment with each other genomic segment to be clustered), and (3) assemble these comparisons into a tree (link the individual dissimilarities between genomic segments using standard clustering approaches). JContextExplorer allows for fast searching of a set of annotated genomes, offers several flexible visualization tools, and allows for direct comparisons with previously computed phylogenetic trees and additional data.
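The three steps can be illustrated with a minimal Python sketch. This is not JContextExplorer's own code: the Jaccard dissimilarity and the simple single-linkage merging used here are stand-ins chosen for brevity (the tool itself uses variable-group agglomerative hierarchical clustering and its own dissimilarity measures), and the grouping names and cluster IDs are hypothetical.

```python
from itertools import combinations

def jaccard_dissimilarity(a, b):
    """Step 2 (assumed measure): 1 - |shared cluster IDs| / |all cluster IDs|."""
    a, b = set(a), set(b)
    union = a | b
    return 1.0 - len(a & b) / len(union) if union else 0.0

def pairwise(groupings):
    """Compare every genomic grouping with every other grouping."""
    return {frozenset((x, y)): jaccard_dissimilarity(groupings[x], groupings[y])
            for x, y in combinations(groupings, 2)}

def single_linkage(groupings):
    """Step 3 (simplified): repeatedly merge the two closest clusters.

    Returns the merges as (members_a, members_b, dissimilarity), closest first.
    """
    d = pairwise(groupings)
    clusters = [frozenset([name]) for name in groupings]
    merges = []

    def link(pair):
        a, b = pair
        return min(d[frozenset((x, y))] for x in a for y in b)

    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=link)
        merges.append((sorted(a), sorted(b), link((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a | b]
    return merges

# Step 1: genomic groupings, here given as lists of homology cluster IDs
# (hypothetical data).
groupings = {
    "OrgA-1": [451, 452, 453, 454],
    "OrgB-1": [451, 452, 453],
    "OrgC-1": [12, 13],
}
for merge in single_linkage(groupings):
    print(merge)
```

Each merge corresponds to one internal node of a tree whose leaves are the individual gene neighborhoods.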

As evident in the name, JContextExplorer is designed to facilitate exploration. Each of the three major steps in context tree creation (genomic grouping definition, pairwise comparisons, tree creation) may be re-computed quickly and easily with alternative parameters. The graphical interface is designed for point-and-click investigation, and provides fast and easy export of major results (context trees, multi-genome context renderings, etc). We strongly suggest using the automated features (tree computation) in concert with the manual interrogation features (multi-genome browser) in your investigations.

Why Should I Use JContextExplorer?

There are many reasons to use JContextExplorer. Perhaps you would like to


  1. Resolve ambiguities in annotated features, and/or assign putative functions to un-annotated and under-annotated genes.

  2. Compare changes in gene regulatory network structure (as in the case of operons in microbial species).

  3. Discover and interpret potential horizontal gene transfer events.

  4. Within a set of duplicated homologous genes across species, determine which copies are ancestral and which represent more recent expansions.

  5. Peruse annotated features nearby to a gene or genes of interest.

  6. Compare (and count) textual annotations within a set of homology clusters.

  7. Effectively merge one or more context sets into superclusters.

This is only a short list of suggested uses. Any comparative genomic analysis that could benefit from alternative methods of organization and visualization of multiple genomes (or sections of multiple genomes) stands to benefit from JContextExplorer. For a few video demonstrations of JContextExplorer in action, please see Chapter 4: Additional Resources.

Chapter 2:

Launching JContextExplorer

Where can I find JContextExplorer?

JContextExplorer can be found on the Facciotti Lab software website:

http://www.bme.ucdavis.edu/facciotti/resources_data/software/

On this website, JContextExplorer is available both (1) as a Java WebStart and (2) as a downloadable .JAR file. Simply click the Orange Launch button on the page. Supplementary documentation, instructions, and links to video tutorials may also be found on this page. JContextExplorer is distributed as an executable JAR. However, it is also possible to build the tool from source. All source code is available on GitHub:

https://github.com/PMSeitzer/JContextExplorer

What do I need to do before I can launch JContextExplorer?

JContextExplorer runs on the Java Virtual Machine (JVM) version 1.6 or higher. If you do not have the Java runtime environment installed, please install the latest version of Java before attempting to launch JContextExplorer.

The Java WebStart version runs with a maximum heap size of 1024 MB. Please make sure your system can accommodate this memory allocation. If you are using the WebStart version, simply click the orange WebStart launch button to launch JContextExplorer.

If you have downloaded the .JAR file directly, you may either (A) double-click on the icon or (B) launch JContextExplorer from the command line with the following command:

java -jar /JContextExplorer.jar

You may want to launch java with a larger max heap size to avoid memory-related problems. In that case, type the following command:

java -Xmx1024M -jar /JContextExplorer.jar

or

java -Xmx2148M -jar /JContextExplorer.jar

The Launch

The Main Frame of JContextExplorer appears upon launch:



If you are working on a Windows machine, you will notice a menu bar appearing directly above the frame. If you are working on a Macintosh machine, the menu bar will appear at the top of the screen. The menu bar on a Macintosh machine looks like this:



The 5 menus, Genomes, Load, Export, Process, and Help, are unique to JContextExplorer, while the Apple symbol and JContextExplorer menu are auto-generated by Macintosh. On a Windows machine, the Apple symbol and JContextExplorer menu will not appear.

What now?

To start, you’ll need to create or load a Genome Set, which is simply a set of annotated genomes. Once this data has been loaded, you can search this Genome Set for particular genes, either based on textual annotation (when the Annotation Search radio button is selected) or homology cluster ID number (when the Cluster Number radio button is selected). Searches of your Genome Set should be carried out from the search bar in the Genome Set Search Area, in the upper left-hand corner of the main frame.

Beyond searching the database for instances of a single gene, you’ll also want to search for gene groupings – that is, instead of retrieving just one gene, you may want to retrieve a set of genes. To do this, you’ll need to define a Context Set. After retrieving gene groupings, you may want to quantitatively compare these groupings to each other – in other words, you’ll want to build Context Trees. When building such a context tree, you’ll want to define an appropriate dissimilarity measure and clustering algorithm for your context tree. After you’ve generated context trees, you’ll want to browse your contexts using the Multi-genome browser.

You may want to load additional information, customize the dissimilarity metrics, or generate many context trees at once and compare these context trees to each other. You may also want to interact with NCBI’s databases and add, remove, and manage genomes or genome sets in your JContextExplorer session. In other words, there are many things you might want to do, and many things that are possible.



The best way to become familiar with JContextExplorer’s features is to watch the introductory video tutorials, which are described in more detail on page 133. These tutorials will not highlight all of JContextExplorer’s features, but will provide a good starting point. As you watch, complete the steps on your own, pausing the video as needed. Then, once you’ve mastered the basics, you may return to this manual and read more about whichever features you’d like to learn in more detail.

Chapter 3:

Using JContextExplorer



Window Layout

JContextExplorer is organized as a series of major and minor windows laid out in a semi-hierarchical manner:



From an initial starting frame, a main window is launched. Within this window, you can do several things (conduct searches, modify search output, load phylogenetic trees, etc), which will often entail launching subordinate “child” windows. JContextExplorer is designed for frequent coordination between the main and child windows.



Main Frame

Conceptually, the Main Frame may be divided into 4 regions:



A Genome Set Search Area (Blue, upper left), Search Options Area (Green, lower left), Internal Frame Management Area (Orange, upper right), and a Search Results Analysis Area (Red, lower right).

Each of these areas is explained in more depth in the following sections.

Genome Set Search Area

The Genome Set Search Area is the place to conduct searches of a loaded Genome Set and define, switch, and manage various Context Sets. It is located in the upper left-hand corner of the main frame, and looks like this:

If the Annotation Search radio button is selected, then text strings will be searched against gene annotations. Searches are case-insensitive and will return partial matches. For example, a search of “gluco” will return hits for genes such as “glucose”, “glucokinase”, “glucose regulator”, and “glucocorticoid”, and “glucose” and “GLUCOSE” are both exact matches to “Glucose”.

If the Cluster Number radio button is selected, then integer values will be searched against assigned gene cluster numbers. In this case, only exact matches will be returned. For example, a search of “43” will return all genes with cluster number 43.
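The difference between the two search modes can be sketched in Python as follows. This is only an illustration of the matching rules described above, not JContextExplorer's implementation; the Gene record, its field names, and the sample genes are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Gene:
    gene_id: str        # hypothetical identifier
    cluster_id: int     # homology cluster number
    annotation: str     # textual annotation

def annotation_search(genes, query):
    """Case-insensitive search that also returns partial matches."""
    needle = query.strip().lower()
    return [g for g in genes if needle in g.annotation.lower()]

def cluster_search(genes, query):
    """Exact match on the integer cluster number only."""
    wanted = int(query.strip())
    return [g for g in genes if g.cluster_id == wanted]

genes = [
    Gene("g1", 43, "Glucokinase"),
    Gene("g2", 430, "glucose regulator"),
    Gene("g3", 7, "hexokinase"),
]

print([g.gene_id for g in annotation_search(genes, "GLUCO")])  # g1 and g2
print([g.gene_id for g in cluster_search(genes, "43")])        # g1 only, not cluster 430
```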

“OR” statement: separate queries with a semicolon (;).

For example, an annotation search of

hexokinase; glucose

will return all genes with annotations that contain either the text string “hexokinase” or “glucose”. An annotation search of

hexokinase; glucose; glycerol; nitrogen

will return all genes with annotations that contain at least one of the text strings “hexokinase”, “glucose”, “glycerol”, or “nitrogen”.

This works as well for cluster IDs: a cluster ID search of

1; 65; 534

will return all genes with cluster ID 1, 65, or 534.
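As a rough Python sketch of the “OR” behavior for annotation searches (an illustration under the rules stated above, with hypothetical function names and sample annotations, not the tool's own parser):

```python
def split_or_terms(query):
    """Split a query on semicolons, dropping surrounding whitespace."""
    return [term.strip() for term in query.split(";") if term.strip()]

def or_annotation_search(annotations, query):
    """Keep annotations that contain at least one of the OR-ed terms."""
    terms = [term.lower() for term in split_or_terms(query)]
    return [a for a in annotations if any(t in a.lower() for t in terms)]

annotations = [
    "Hexokinase",
    "glucose transporter",
    "nitrogen regulator",
    "ribosomal protein L2",
]
print(or_annotation_search(annotations, "hexokinase; glucose"))
```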

“AND” statement: separate queries with a double dollar sign ($$).

The “AND” statement operates following the application of the context set. Suppose that a search for “452” with an “operon” context set yields a set of genes that have clusters 451, 452, 453, and 454 in some organisms and a set of genes that have clusters 451, 452, and 453 in others. Searching for 452 $$ 454 will return only those operons that contain both 452 and 454.



“IF AND ONLY IF” clause: start the query with &&only

Consider the previously described scenario. If you’d like to retrieve all operons that contain 451, 452, and 453 but not 454, you can search as follows:

&&only 451 $$ 452 $$ 453

The “&&only” requires that only exact matches be returned, and the “451 $$ 452 $$ 453” specifies that the genomic groupings must contain 451, 452, and 453.
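The “AND” and “IF AND ONLY IF” rules can be sketched in Python by treating each genomic grouping as a set of cluster IDs. This is a minimal illustration, not the tool's own code: the function names and grouping names are hypothetical, and reading “exact match” as set equality is an assumption.

```python
def parse_and_query(query):
    """Return (only_flag, required_cluster_ids) for a $$-separated query."""
    text = query.strip()
    only = text.startswith("&&only")
    if only:
        text = text[len("&&only"):]
    required = {int(term) for term in text.split("$$") if term.strip()}
    return only, required

def filter_groupings(groupings, query):
    """Apply the query after the context set has produced the groupings."""
    only, required = parse_and_query(query)
    hits = {}
    for name, clusters in groupings.items():
        present = set(clusters)
        # &&only: the grouping must contain exactly the required clusters
        # (assumed interpretation); otherwise it must contain at least all of them.
        matched = (present == required) if only else (required <= present)
        if matched:
            hits[name] = clusters
    return hits

groupings = {
    "OrgA-1": [451, 452, 453, 454],
    "OrgB-1": [451, 452, 453],
}
print(list(filter_groupings(groupings, "452 $$ 454")))                # OrgA-1
print(list(filter_groupings(groupings, "&&only 451 $$ 452 $$ 453")))  # OrgB-1
```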



To specify a continuous range of clusters, use a dash.

For example, a cluster ID search of

46-48

will return all genes with cluster ID 46, 47, or 48. Note that this is identical to the query



46; 47; 48
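The equivalence of the two queries can be checked with a short Python sketch that expands a cluster ID query into a list of IDs (a hypothetical helper written for illustration, not part of JContextExplorer):

```python
def expand_cluster_query(query):
    """Expand semicolon-separated cluster IDs and dash ranges into a list."""
    ids = []
    for term in query.split(";"):
        term = term.strip()
        if not term:
            continue
        if "-" in term:
            low, high = (int(part) for part in term.split("-", 1))
            ids.extend(range(low, high + 1))  # ranges are inclusive
        else:
            ids.append(int(term))
    return ids

print(expand_cluster_query("46-48"))       # [46, 47, 48]
print(expand_cluster_query("46; 47; 48"))  # [46, 47, 48]
```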

The Cancel button may be used to cancel either (1) a popular genome set being imported or (2) a search query / context tree rendering.

After a search has completed, a message will appear in the console listing the total number of matches. The progress bar below the search bar will display the approximate progress of the search.

Under the Select Context Set banner, it is possible to select the currently active context set from the drop down menu. When a search is performed, gene groupings are returned according to whichever context set is selected. The Add/Remove button allows you to Add or Remove a context set, as you see fit. This is explained in more detail on page 68, Context Set.

Finally, the large Update button will become enabled when one or more Search Results Frames are available. When a Context Tree is drawn, you may wish to display the resulting tree with a different font or different style. These settings can be changed in the Context Tree sub-panel (explained in more detail on page 18, Search Options Area). Changes will take effect with a push of the Update button.

Search Options Area

The Search Options Area provides options for Search Results, Context Tree drawing, and loading one or more phylogenetic trees or Sequence Motifs. It is located in the lower left-hand corner of the main frame, and looks like this:

This area contains 4 tabbed panes: Options (shown), Tree (explained in next section), Phylogeny (explained in the Phylogenetic Tree section, on page 95), and Motifs (explained in the Sequence Motifs section, on page 98).

When you have entered a search in the search bar in the Genome Set Search Area (upper left-hand corner of main frame), a new internal frame will appear in the Internal Frame Management Area, showing up to 3 results panes. The Options pane allows you to specify which results to include: Search Results can be displayed, a Context Tree can be rendered, and a Phylogenetic Tree can be drawn within the frame. Any one or all 3 options may be selected. Pushing the Select All button will select all options; pushing the Deselect All button will deselect all options.

If no boxes are checked, the Print Search Results box will become checked, and search results will be displayed. If the phylogenetic tree box is checked, and no phylogenetic tree is loaded, then this option will become unchecked, and this pane will not be drawn.

Context Tree Options

Selecting the Tree tab in the Search Options Area displays the following panel:



Under the Tree Computation banner, you may select an appropriate Dissimilarity Metric and Clustering Algorithm, from the appropriate drop-down menu. The Add/Remove button below the Dissimilarity Metric field allows you to create a customized Dissimilarity Metric (for more information, please see Dissimilarity Measure, page 76). The Precision Field allows you to specify the number of decimal places to use in the clustering step.

Under the Tree Display banner, there is an option to Show Bands or not, and an option to specify a color for these bands. Bands mark groups of nodes that share the same range of dissimilarities. For example, if the dissimilarity between nodes A and B is 0, and the dissimilarity between B and C is 0, then one would expect the dissimilarity between nodes A and C to be 0 as well. However, depending on the algorithm, the dissimilarity between A and C might not come out to be 0. Suppose, for example, that the computed dissimilarity between A and C is 0.1. In that case, A, B, and C will all be grouped together, with a dissimilarity band between 0 and 0.1. If Show Bands is selected, you will see the range of dissimilarities; if it is not, you will see the smallest value, so nodes A, B, and C will all have a dissimilarity of 0.

Banding is a result of variable-group agglomerative hierarchical clustering, in which the order of comparisons does not matter. For a more detailed discussion of bands in variable-group agglomerative hierarchical clustering, please see

Gomez, S., Fernandez, A., Montiel, J., & Torres, D. (n.d.). Solving Non-Uniqueness in Agglomerative Hierarchical Clustering Using Multidendrograms. Journal of Classification, 65, 43-65. doi:10.1007/s00357-008-
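The A, B, C example from the Show Bands description can be reproduced with a few lines of Python. This sketch only computes the band for one already-formed group; it does not implement the multidendrogram algorithm, and the function name and data layout are hypothetical.

```python
from itertools import combinations

def dissimilarity_band(nodes, dissimilarity):
    """Return (smallest, largest) pairwise dissimilarity within a group."""
    values = [dissimilarity[frozenset(pair)] for pair in combinations(nodes, 2)]
    return min(values), max(values)

# Pairwise dissimilarities from the example: A-B = 0, B-C = 0, A-C = 0.1
dissimilarity = {
    frozenset("AB"): 0.0,
    frozenset("BC"): 0.0,
    frozenset("AC"): 0.1,
}
low, high = dissimilarity_band(["A", "B", "C"], dissimilarity)

show_bands = True
# With bands shown, the whole range is displayed; otherwise only the smallest value.
print((low, high) if show_bands else low)
```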

Under the Nodes banner, you may specify the size of the nodes, and optionally show the labels, change the font, and change the color.

Under the Axis banner, you may change various properties of the axis.

When editing an existing Context Tree, make the changes in this panel, and click the Update button, located above this panel at the bottom of the Genome Set Search Area. If nothing new needs to be computed, then the update should be very fast.

Internal Frame Management Area

The Internal Frame Management Area is the portion of the Main Frame where Search Results frames, Context Trees, and rendered Phylogenetic Trees appear in their own internal windows. These windows may be dragged around, minimized, maximized, and closed.

Internal frames appear in the upper right-hand corner of the main frame.

Pictured is a sample frame containing a Search Result frame, a Context Tree, and a Phylogenetic Tree:



Search Results Frame

Pictured is the Search Results Frame from above:



In the frame above, 3 Genomic Groupings are selected – Haloarcula_amylolytica-1, Haloarcula_argenintensis-1, and Haloarcula_californiae-1. Each genomic grouping is named according to the source organism, followed by a serial number, showing the instance of a genomic grouping stemming from that organism. All of the selected genomic groupings have a serial number of “1”, indicating that each is the first genomic grouping (arbitrarily numbered) stemming from that organism.

Expanding each Genomic Grouping folder shows the genes included in that Genomic Grouping. The Gene ID, Cluster ID, and Annotation information is included for each gene in that genomic grouping.

Pushing the Expand All button expands all Genomic Grouping folders (showing all genes in all genomic groupings), while pushing the Collapse All button collapses all Genomic Grouping folders (hiding all genes in all genomic groupings).



Genomic Groupings may be selected by clicking on folders, or holding down SHIFT and selecting a range of folders, or holding down COMMAND or CTRL and selecting/de-selecting one or more folders.

Export Options: Search Results Frame Menu Options

Right-clicking on the search results will cause a pop-up menu to appear where clicked (shown above). This pop-up menu offers several options to export data associated with the selected entries of the search results frame.

The first half of the menu exports sequences of individual genes or of entire genomic groupings, based on a set of pre-loaded .fasta genome sequence files, each named the same as the organism it is associated with (please see Genome Sequence File(s), page 63, for more information). “Export Genes (DNA Sequences)” will export the DNA of individual selected genes, or of all component genes (if an entire genomic grouping folder is selected). “Export Protein Sequences” works the same way, except the DNA is translated into protein sequence. For these options, genes that are oriented in reverse complement on the genome are output from the standpoint of the start of the gene. The last option, “Export Genomic Grouping Segments (DNA)”, will output the entire stretch of DNA represented in a single genomic grouping, without regard to the genes contained (from the earliest start site in the genomic grouping to the latest stop site in the genomic grouping). In this case, DNA is always exported according to the forward strand, intergenic DNA is included in the export, and no correction is made for genes on the reverse complement.
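The difference between exporting individual genes and exporting a genomic grouping segment can be sketched in Python. This is an illustration only: the function names and the toy genome are hypothetical, and the 1-based inclusive coordinates are an assumption borrowed from GenBank and GFF conventions.

```python
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")

def reverse_complement(seq):
    """Complement each base, then reverse the sequence."""
    return seq.translate(COMPLEMENT)[::-1]

def export_gene_dna(genome, start, stop, strand):
    """Gene export: reverse-strand genes are read from the start of the gene."""
    dna = genome[start - 1:stop]  # assumed 1-based, inclusive coordinates
    return reverse_complement(dna) if strand == "-" else dna

def export_grouping_segment(genome, genes):
    """Segment export: earliest start to latest stop, always forward strand."""
    first = min(start for start, stop, strand in genes)
    last = max(stop for start, stop, strand in genes)
    return genome[first - 1:last]  # intergenic DNA is included

genome = "ATGAAACCCGGGTTTCAT"           # toy sequence
genes = [(4, 9, "+"), (13, 18, "-")]    # (start, stop, strand)

for start, stop, strand in genes:
    print(export_gene_dna(genome, start, stop, strand))
print(export_grouping_segment(genome, genes))
```

In this toy example the reverse-strand gene is written out as its reverse complement, while the segment export returns the forward-strand DNA spanning both genes and the gap between them.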

The second half of the menu exports selected data entries in the table to a plain text file. The “Short” form (first option) exports exactly the information displayed in the table: gene ID – cluster ID – annotation. The “Long” form (second option) exports additional information, as follows: organism – contig – start – stop – strand – annotation – cluster ID – gene ID.
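The field order of the two forms can be summarized in a short Python sketch. The field order follows the description above; the tab delimiter, the dictionary keys, and the sample record are assumptions made for illustration and may not match the files JContextExplorer actually writes.

```python
SHORT_FIELDS = ("gene_id", "cluster_id", "annotation")
LONG_FIELDS = ("organism", "contig", "start", "stop", "strand",
               "annotation", "cluster_id", "gene_id")

def format_row(record, long_form=False, sep="\t"):
    """Format one gene record in the Short or Long field order."""
    fields = LONG_FIELDS if long_form else SHORT_FIELDS
    return sep.join(str(record[name]) for name in fields)

# Hypothetical record
record = {
    "organism": "Haloarcula_amylolytica",
    "contig": "contig_1",
    "start": 100,
    "stop": 400,
    "strand": "+",
    "annotation": "hexokinase",
    "cluster_id": 452,
    "gene_id": "gene_0001",
}
print(format_row(record))
print(format_row(record, long_form=True))
```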



Context Tree Frame

In the frame above, the same 3 Genomic Groupings are selected as shown in the Search Results Frame – Haloarcula_amylolytica-1, Haloarcula_argenintensis-1, and Haloarcula_californiae-1.

