All images in this article were created with Circos (v0.49) and the tableviewer utility tool.
To obtain a manpage for any of the scripts, use the -man flag.
> bin/make-table -man
> bin/parse-table -man
> bin/make-conf -man
This is the second part of a series of articles that describe how Circos can be used to visualize tabular data. The first part presented the visual paradigm behind creating images of tables with Circos. If you're not familiar with this approach, I strongly suggest that you at least glance over the first few images of that writeup to get an idea of how to interpret the visualizations.
In this article, I will cover the technical details of using the tableviewer set of scripts to parse your tabular data and turn them into files that Circos can use.
The tableviewer set of scripts is distributed as part of the circos-tools package and is composed of three scripts: make-table, parse-table and make-conf.
You CREATE your data file (or supply it), then PARSE it into an intermediate form, then FORMAT it to generate Circos input and finally VISUALIZE by running Circos.
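As a rough sketch of the full chain (file names here are generic; the exact commands are covered in detail below):

# CREATE - make a random table (or supply your own table.txt)
> bin/make-table -rows 5 -brief > table.txt
# PARSE - analyze the table and produce an intermediate file
> cat table.txt | bin/parse-table > tmp.txt
# FORMAT - generate Circos data and configuration files
> cat tmp.txt | bin/make-conf -dir data
# VISUALIZE - draw the figure
> circos -conf etc/circos.conf -outputfile table.png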
Figure Table visualizations are created using parse-table and make-conf. You can supply your own table data (table.txt in the flow chart), or generate a random data set with make-table.
If you have your own data, you do not need make-table. On the other hand, if you would like to explore different forms of the visualizations with tables of different size and content, you can use make-table to create synthetic data sets.
If you have your own data and are not interested in how to generate random tables, you can skip this section and go directly to the section that describes parse-table.
The make-table script generates tables with random data, suitable for input to parse-table (if -brief is used - see below). The minimum information you need to pass to the script is the number of rows (using -rows). If you do not specify the number of columns (using -cols), the number of rows and columns will be the same.
> bin/make-table -rows 3
mean  lbl A   B   C
mean  A   200 200 200
mean  B   50  50  50
mean  C   100 100 100
sd    lbl A   B   C
sd    A   100 100 100
sd    B   25  25  25
sd    C   50  50  50
table lbl A   B   C
table A   257 296 211
table B   61  58  38
table C   17  145 25
Here the output is segregated into three sections. The first section (each line prefixed by mean) gives the average of the distribution from which the cell value is sampled. The second section (prefixed by sd) reports the standard deviation. The third section (prefixed by table) is the actual table, and the cell values here are sampled from a normal distribution with the combination of mean and standard deviation reported in the corresponding sections above.
For example, the cell (B,C)=38 is sampled from the normal distribution with mean=50 and standard deviation=25.
You will normally not need the details of the distribution when creating data files. To generate output that is directly compatible with parse-table, use -brief.
> bin/make-table -rows 3 -brief
lbl A   B   C
A   80  387 112
B   1   30  61
C   96  146 29
Notice the data values have changed in this example. This is because the data are generated randomly each time. If you want the data values to remain constant between executions, provide a fixed value for the random seed using -seed.
> bin/make-table -rows 3 -brief -seed 123
lbl A   B   C
A   262 209 168
B   28  86  45
C   58  95  69
You'll notice that in the examples above the rows (A,B,C) were named the same as the columns. When a row shares the same name with a column, both are represented by the same segment in the visualization. Thus, the number of shared labels affects the format of the image. If you would like to simulate rows and columns with different labels, use -unique_labels.
> bin/make-table -rows 3 -brief -seed 123 -unique_labels
lbl D   E   F
A   262 209 168
B   28  86  45
C   58  95  69
In the first example, the mean and standard deviation values were different for some rows and columns. These values are defined by rules within the configuration file etc/make-table.conf. A rule set is a named rule block
# this is the rule set to use
rule_set = some_rule_name

# and here is its definition
<rules some_rule_name>
...
</rules>
which defines a given rule set. You can have any number of <rules> blocks (all must have unique names) and then pick the one you want with rule_set (also available as the -rule_set command-line option).
> bin/make-table -rows 3 -rule_set default
> bin/make-table -rows 3 -rule_set constant
Within the rules block, you can define any number of individual rules which apply a mean and standard deviation value to any combination of rows and columns. The rows and columns are selected using two regular expressions which are followed by the mean and standard deviation. For example,
<rules some_rule_name>
rule = . . 100 25
</rules>
Will filter rows and columns using regular expression '.' (i.e. any character). Thus each cell in the table will be assigned a (mean,sd) pair of (100,25).
By next adding another rule, such as "A . 50 10", cells in row A can be adjusted (regular expression for the row is 'A' and for the column '.').
<rules some_rule_name>
rule = . . 100 25
rule = A . 50 10
</rules>
Finally, a single cell can be adjusted by specifying a regular expression that uniquely selects the cell.
<rules some_rule_name>
rule = . . 100 25
rule = A . 50 10
rule = A D 200 50
</rules>
To see this rule set in action,
> bin/make-table -rows 3 -seed 123 -unique_labels -rule some_rule_name
mean  lbl D   E   F
mean  A   200 50  50
mean  B   100 100 100
mean  C   100 100 100
sd    lbl D   E   F
sd    A   50  10  10
sd    B   25  25  25
sd    C   25  25  25
table lbl D   E   F
table A   231 50  46
table B   49  78  136
table C   95  79  97
You can see the effect of the rule entries in the rule set on the mean and sd lines in the full table report. Adjusting the distributions from which cell values are sampled is very helpful to explore how data patterns manifest themselves in the visualization. For example, how would the visualization change if all the values in a given row (and/or column) are doubled?
In the examples above, the rules specified absolute values for both mean and standard deviation values. You can adjust the cell values using relative notation (rVALUE) for any rule, as long as the cell already has an absolute value associated with it. The relative value is used as a multiplier. For example,
<rules some_rule_name>
rule = . . 100 25
rule = A . r2 r0.2
</rules>
Will apply (mean,sd)=(100,25) to all the cells (first rule) and then set the mean of cells in row A to be twice their value (e.g. 100 -> 200) and the standard deviation to be 0.2 times their value (e.g. 25 -> 5). The relative syntax makes it possible to grow or attenuate values in rows, columns and individual cells relative to other parts of the table, and define the baseline values only once.
The tabular visualization requires that cell values in the table be non-negative. Given that ribbons are used in the visualization to represent cell values, and that their thickness is proportional to the value in the cell, negative cell values do not have a corresponding visual form. If your data set contains negative values that you'd like to include in the image, you can remap negative values onto a unique range and then use rules in the Circos configuration file to apply distinct formatting to ribbons in this range.
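As one hypothetical approach (using the cell remapping facility described later in this article), you could shift negative values into a reserved positive band and then target that band with Circos rules; the band boundary used here (10000) is arbitrary:

# illustration only - map negative cell values into a reserved band above 10000
# so that downstream rules can format the corresponding ribbons differently
use_cell_remap     = yes
cell_remap_formula = X < 0 ? 10000 + abs(X) : X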
The output of make-table can contain negative values, however - it will be up to you to manage these downstream. If you set a large standard deviation, relative to the mean, it's likely that some of your sampled values will be negative.
For example, if (mean,sd)=(100,100) for every cell, such as defined in the rule set named with_negatives,
> bin/make-table -rows 3 -seed 234 -unique_labels -rule with_negatives
mean  lbl D   E   F
mean  A   100 100 100
mean  B   100 100 100
mean  C   100 100 100
sd    lbl D   E   F
sd    A   100 100 100
sd    B   100 100 100
sd    C   100 100 100
table lbl D    E    F
table A   123  87   -33
table B   82   -72  181
table C   -182 -53  62
This output was created with the following settings
positive_only       = no
non_negative_only   = no
negative_is_missing = no
zero_is_missing     = no
In other words, make-table was not asked to iterate sampling until positive values were selected for each cell, and negative values were not considered to be "missing data". If you want any negative values to be encoded as missing data, use -negative_is_missing.
> bin/make-table -rows 3 -seed 234 -unique_labels -brief -rule with_negatives -negative_is_missing
lbl D   E   F
A   123 87  -
B   82  -   181
C   -   -   62
The missing data field is defined by the value of missing_data. Alternatively, if you don't want negative values at all, use -non_negative_only. Here, make-table will sample each cell's distribution until it finds a non-negative value (>=0). Be careful not to choose mean and standard deviation values that heavily favour negative values (e.g. mean=-100 sd=10) - you may never find a non-negative value and the make-table script will sample the distribution forever.
> bin/make-table -rows 3 -seed 234 -unique_labels -brief -rule with_negatives -non_negative_only
lbl D   E   F
A   123 87  82
B   181 62  102
C   41  244 78
The difference between -non_negative_only and -positive_only is that the former allows 0 and the latter does not. If you want zeros to be considered missing data, set zero_is_missing=yes or use -zero_is_missing.
The purpose of modeling missing data is to explore how the table visualization deals with empty cells. There are settings in the parse-table script that control how missing values are handled.
The core logic of the tabular visualization method is implemented in the parse-table script. This is the script that reads in a table, analyzes relationships between row and column labels and produces an intermediate file which reports statistics (e.g. row, column, label) and features of individual ribbons. Although the output of this script isn't meant to be parsed by a human, its format is sufficiently clear that you can, with only a little effort, figure out what is being reported.
The input to parse-table is expected to be a plain-text file that stores the tabular data. The format of the data is flexible, but it is strongly recommended that each row have the same number of fields.
There are four main parameters that control how input is parsed
Let's look at some example input.
Each example below shows the input table, the parameters used to parse it, and the parsed result.

Input
-,A,B,C
A,0,1,2
B,3,4,5
C,6,7,8

Parameters
# values are comma-separated, so use the , as delimiter
field_delim = ,

Parsed
A B C
A 0 1 2
B 3 4 5
C 6 7 8

Input
-,A,B,C
A,0,,2
B,3,4,5
C,6,7,8

Parameters
field_delim = ,
# adjacent delimiters should not be collapsed
field_delim_collapse = no
# when adjacent delimiters exist and are not collapsed, a blank field will
# result. To interpret this as missing data, set blank_means_missing
blank_means_missing = yes

Parsed
A B C
A 0 - 2
B 3 4 5
C 6 7 8

Input
-,A,B,C
A,0,X,2
B,3,4,5
C,6,7,8

Parameters
field_delim = ,
# You can use any string to explicitly indicate that the cell's data value is
# missing (e.g. -). This is different from using a zero value, because missing
# values do not count towards any statistics.
missing_cell_value = X

Parsed
A B C
A 0 X 2
B 3 4 5
C 6 7 8

Input
- A B C
A 0 1 2
B 3 4 5
C 6 7 8

Parameters
# Use \s as delimiter to indicate either tab or space.
# Use ' ' to specifically indicate that a space is used.
# The distinction between tabs and spaces is usually not important.
field_delim = \s
# If your input uses whitespace delimiters liberally for formatting, make sure
# that adjacent delimiters are collapsed. Keep in mind that when tab-separated
# data is generated, adjacent tabs usually indicate missing data.
field_delim_collapse = yes

Parsed
A B C
A 0 1 2
B 3 4 5
C 6 7 8

Input
- A B C
A "0" 1,000 (2)
B "3" 4,000 (5)
C "6" 7,000 (8)

Parameters
field_delim = \s
# If your values are quoted, contain thousands-separators, or have other
# cruft, use -remove_cell_rx to define a regular expression of characters
# that should be removed from each field.
remove_cell_rx = ",()

Parsed
A B C
A 0 1000 2
B 3 4000 5
C 6 7000 8
If you would like to see how parse-table parsed your table, use -show_parse. This will report the parsed version and immediately exit.
> cat samples/parse-example-1.txt | bin/parse-table -field_delim , -no-field_delim_collapse -show_parsed
data A B C
A 0 1 2
B 3 4 5
C 6 7 8
At this point, in order to illustrate how parse-table's configuration can be adjusted to customize the image, I need to briefly go through the process of using make-conf, the next script in the series, to create an image.
The following will generate an image of a 3 x 3 table
# first, create a 3x3 table (use a random seed so that this step is reproducible)
> bin/make-table -row 3 -seed 123 -brief > samples/table-basic.txt

# let's see the table
> cat samples/table-basic.txt
lbl A   B   C
A   262 209 168
B   28  86  45
C   58  95  69

# now parse the table
> cat samples/table-basic.txt | bin/parse-table > tmp.txt

# now create configuration and data files
> cat tmp.txt | bin/make-conf -dir data

# let's see what was created
> ls data/
-rw-r--r-- 1 martink users 246 Jun  1 15:12 all.txt
-rw-r--r-- 1 martink users 726 Jun  1 15:12 cells.txt
-rw-r--r-- 1 martink users 246 Jun  1 15:12 col.txt
-rw-r--r-- 1 martink users  52 Jun  1 15:12 colors.conf
-rw-r--r-- 1 martink users 577 Jun  1 15:12 colors_percentile.conf
-rw-r--r-- 1 martink users  69 Jun  1 15:12 karyotype.txt
-rw-r--r-- 1 martink users 242 Jun  1 15:12 row.txt

# now draw the image (circos.conf is already defined to use the data files from data/)
> circos -conf etc/circos.conf -outputfile table-basic.png
Figure Visualization of a 3x3 table from samples/table-basic.txt
In subsequent examples, I will be adjusting both the input data (e.g. samples/table-02.txt) and configuration files (e.g. samples/parse-table-02.conf). Input and configuration files designed to be used together will have the same suffix (e.g. -02). In cases where the same table file is used repeatedly with different configuration files, the configuration files are further suffixed with a, b, c. For example, table-01.txt can be used with parse-table-01a.conf, parse-table-01b.conf, and so on.
The process of parsing a table and creating the Circos data and configuration files can be chained
# chain calls to parse-table and make-conf for table-01.txt
> cat samples/table-01.txt | bin/parse-table -conf samples/parse-table-01.conf | bin/make-conf -dir data
> circos -conf etc/circos.conf -outputfile table-01.png
The makeimage script automates this process. Once you know the number of the table file to create (see samples/table-NN.txt for different tables), run
> makeimage NN
# e.g. NN=02
> makeimage 02
to parse the table, create the data and run Circos. This script assumes that the Circos binary is at ../../bin/circos relative to the tableviewer directory.
Finally, I need to draw attention to the two distinct types of configuration files that I have mentioned here. First, there are the configuration files that control parse-table (these are all named parse-table*.conf). Second, there are configuration files that control Circos itself (these are in etc/*.conf). The former control the structure of the visualization (order/color of segments and ribbons, data remapping and normalization, etc) whereas the latter control the display of the visualization (image size, thickness of segments, tick marks, etc). The Circos configuration files will be the same for each example. Feel free to adjust these (etc/circos.conf, etc/ticks.conf, etc/ideogram.conf) to suit your needs.
One of the basic ways in which the table visualization can be adjusted is by changing the order of segments and ribbons. By default, the segments are ordered alphabetically by label and the order of ribbons is based on cell value, with ribbons for larger-valued cells appearing before those of smaller-valued cells.
Figure Visualization of a 3x3 table from samples/table-basic.txt
Segment order is controlled with the segment_order parameter. This parameter can be defined as one or more comma-delimited values that control the order of the segments, with the values taken from this set
row_major         row segments first, then column (useful with a secondary sort order within the row/col group)
col_major         col segments first, then row (useful with a secondary sort order within the row/col group)
ascii             asciibetic order
row_size          total of rows for the segment - useful if the segment has both row and column contributions
col_size          total of columns for the segment - useful if the segment has both row and column contributions
row_to_col_ratio  ratio of total of rows to columns for the segment
col_to_row_ratio  ratio of total of columns to rows for the segment
size_asc          size, in ascending order
size_desc         size, in descending order
with values *_ratio and *_size requiring that rows and columns share the same label. Below are some examples of visualization of table-01.txt with different segment order.
Figure Segment order is controlled with the segment_order parameter.
For example, if segment_order=col_major,size_desc then column segments are shown as a group first, and within this group segments are ordered by the column total in descending order (segment associated with column with the largest total is first). Row segments are shown after the column segments, and within this group are ordered in decreasing size.
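In the parse-table configuration file this example corresponds to a single line:

segment_order = col_major,size_desc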
The segment order fixes the large-scale structure of the visualization. The fine structure is determined by how the ribbons that correspond to cell values are ordered within each segment. Ribbon order is determined by the following parameters
placement_order      - determines the order of row and column ribbons, as groups, within a segment that has both row and column ribbons
ribbon_bundle_order  - order of ribbons within a segment (or within a group, if placement_order is used)
reverse_rows         - all row ribbons are drawn in reverse order
reverse_columns      - all column ribbons are drawn in reverse order
The placement_order parameter is useful only if you have rows and columns that share a label (these labels give rise to segments that have both row and column ribbons). We'll skip this option for now, since the rows and columns in the present table (table-01.txt) are all uniquely named.
The ribbon_bundle_order parameter is the primary parameter for controlling ribbon order. Values for this parameter can be size_asc, size_desc, ascii or native.
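For example:

# one of size_asc, size_desc, ascii or native
ribbon_bundle_order = size_desc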
Figure Ribbon order is controlled with the ribbon_bundle_order parameter.
The size_asc and size_desc values correspond to a ribbon order that is defined within each segment based on the cell value (ribbon thickness). For example, when ribbon_bundle_order=size_asc is used, small ribbons are placed first. When either size_asc or size_desc is used, the ribbon order does not depend on the order of segments - the order within one segment is independent of the order within another and is based only on the cell value.
When ribbon_bundle_order is set to ascii or native, however, ribbon order will depend on segment order. When set to 'ascii', ribbons are placed on a segment in order of the label of the destination segment. For example, in the above figure ribbons starting at segment A are ordered A-D, A-E, A-F, where -D, -E, -F are the destination segments. Similarly, those starting on B are ordered B-D, B-E, B-F. When 'native' is used, the order is based on the actual position of the destination segments and not their labels.
The purpose of 'native' is to disambiguate the figure by reducing the number of ribbons that cross. For most data sets, there will be ribbons that cross within the figure. Since this number can often be reduced by choosing a different order, it makes sense to do so - the result is a visually simpler figure.
The last parameter that controls how ribbons are placed is the ribbon_layer_order parameter. The value of this parameter defines the order in which ribbons are layered. Judicious use of this parameter, together with ribbon transparency, is helpful in showing contribution to the figure from both small and large cell values.
Figure Ribbon layering is controlled with the ribbon_layer_order parameter.
Segments are assigned a color from a range of HSV colors defined in the parse-table configuration file. Segments will be assigned colors from this range, with the interpolation guided by the number of segments or their size. If you select interpolate_type=count, then for N segments the N colors will be sampled uniformly from the HSV range.

To increase the color difference between large segments, you can use interpolate_type=size to sample colors in the HSV range based on the size of the segments. The colors for each segment in this scheme are determined as follows. First, consider the circle to represent the HSV range and stretch/shift the scale so that the half-way points of the first and last segments fall on the start and end of the range, respectively. Then, the color of each segment is the color associated with the position of its mid-way point.

The range of values for the HSV components is H=0..360, S=0..1 and V=0..1. You can use hue values larger than 360, and the effect will be a hue determined by mod(HUE,360). For example, if you have a large number of segments and would like the segment colors to appear more or less random, use a very large value of h1 (e.g. h1=30,000).
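The exact parameter names that define the HSV range should be checked against the comments in parse-table.conf; the sketch below assumes they are h0/s0/v0 and h1/s1/v1, by analogy with the h1 parameter mentioned above:

# assumed names for the start (h0,s0,v0) and end (h1,s1,v1) of the HSV range
h0 = 0
s0 = 0.9
v0 = 0.9
h1 = 300
s1 = 0.9
v1 = 0.9
# interpolate by segment count; use interpolate_type = size to weight by segment size
interpolate_type = count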
To look slightly ahead, ribbons can inherit their colors from their
segments. In the examples below, each ribbon is colored by its row
segment. I will discuss later how ribbon color is adjusted.
Figure Colors of segments are interpolated within an HSV range using a count or size scheme. If segments are approximately equally sized, these two schemes produce very similar colors.
The order in which segments are displayed and the order in which the color interpolation is done are independent. Order for color is determined by segment_color_order, and may be different from segment_order.
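For example, the second panel in the figure below corresponds to laying segments out alphabetically while assigning colors by decreasing segment size:

segment_order       = ascii
segment_color_order = size_desc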
Figure In both cases the position of segments is determined by order of their labels (segment_order = segment_color_order = ascii). In the first panel, segment color is similarly ordered. In the second panel, segment colors are assigned by decreasing segment size (segment_color_order = size_desc).
One of the ways in which ribbons can be colored is through inheriting their color from the segment to which they belong. Coloring ribbons by their row (or column) segment is as easy as changing the value of the color_source parameter in the <linkcolor> block. This is shown below.
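A sketch of the block, assuming the two possible values are named row and col (check the sample parse-table configuration files for the exact spelling):

<linkcolor>
# color each ribbon by its row segment; use col to color by column segment
color_source = row
</linkcolor>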
Figure Ribbons can take on the color of their segments. Depending on the value of color_source in the <linkcolor> block, the row or column segments can be used to color ribbons.
Coloring ribbons based on their segments' color is helpful because
it gives a breakdown of the row (or column) segment at the ribbon's
other end. For example, in the first panel of the figure above, you
can see that for 3/5 column segments (F, I, J) the largest
contribution was from the red segment (A, giving rise to red ribbons).
Instead of using the segments' colors, you can color ribbons by
mapping their corresponding cell values onto a color scheme. The
mapping can be done using the cell values themselves or their
percentile.
In the figure below, ribbon colors are initially determined by their row segments. Colors (as well as transparency and stroke) are modified based on cell values. The cutoff filters are defined within <value VALUE> blocks, which apply to any cell for which the value is <= VALUE.
Figure In this example, ribbon color is initialized from row segments. Ribbon characteristics are subsequently remapped by modifying color, transparency and stroke thickness based on the values of the cells.
If you are interested in the distribution of values, consider using the <percentile PERCENTILE> block, rather than <value>. Using this approach you can color ribbons based on how they fall within the distribution of values, rather than by absolute value.
Figure In this example, ribbon color is initialized from row segments. Ribbon characteristics are subsequently remapped by modifying color, transparency and stroke thickness based on the percentile of the cell values.
If you want to initialize ribbon characteristics before the remapping is applied, use the <linkparam> block, shown under Controlling Ribbon Color in the configuration listings at the end of this article.
If you are setting the color in this block, make sure to leave color_source undefined (comment out the definition of the parameter), otherwise the segment color will override the color defined in a <linkparam> block.
Figure Ribbon color is initialized from <linkparam> block and subsequently remapped.
When used, <value> or <percentile> blocks are
internally ordered in increasing size, and for each ribbon the ordered
blocks are tested to find the first one for which ribbon_value <=
block_value. Once this block is found (if any), any parameters in the
block are applied to the ribbon and no further blocks are
tested. Thus, in the example above the empty block <value 150>
acts to keep ribbons with values <=150 unaltered (they retain format
characteristics set by the <linkparam> block).
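A hypothetical sketch of such a sequence of blocks (the color name is illustrative; transparency and stroke parameters can be adjusted in the same way):

<value 150>
# empty block - ribbons with cell values <= 150 keep their current format
</value>

<value 300>
# ribbons with cell values in (150,300] are recolored
color = lgrey
</value>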
One last way in which ribbon color and stroke can be altered is
through the use of cell_qN_color and cell_qN_nostroke
parameters. These act on ribbons based on the quartile of their values
(q1 for first quartile, q2 for second, etc). Thus, in addition to
remapping colors based on values or percentiles, you can ultimately
override the color of the ribbons based on quartiles.
The level of transparency of a color can be adjusted in the
<linkcolor> block, individual value/percentile blocks or the
<linkparam> blocks. The range of transparency values (1..N) is
determined by the auto_alpha_steps parameter in the circos.conf
file. Circos uses this parameter (e.g. auto_alpha_steps = 5) to define
a range of colors (e.g. red_a1, red_a2, ... red_a5) each with a
different degree of transparency (red_a5 most transparent, red_a1
least transparent). When defining transparency in the parse-table.conf
file, make sure that you stay within this range.
You can hide ribbons (make them invisible, but reserve their segment positions) or remove them entirely (and shrink their segments accordingly). To do this, define any of cell_min_value, cell_min_percentile, cell_max_value or cell_max_percentile to select the cells affected. Then, to determine how cells that fall outside this range are handled, set cutoff_cell_handling to either hide or remove, as shown below.
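For example, using the values from the configuration listing near the end of this article:

# defines the smallest cell value to show, by value
cell_min_value       = 50
# hide the ribbons and keep segments as they are (use remove to shrink segments)
cutoff_cell_handling = hide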
Figure Ribbons associated with small (or large) values can be hidden or removed.
Suppressing the display of ribbons is useful in removing
uninteresting data from the display without having to adjust the input
file. If removing the ribbons altogether is too drastic, consider
using the color rules defined above to selectively increase
transparency or alter color (e.g. light grey) of ribbons that are not
of interest.
There are several ways in which you can control the relationship
between table cell value and ribbon width. By default, the figure
scale and ribbon width are linearly proportional to cell values. Thus,
a column with a total of 100 will give rise to a column segment that
is twice as large as a segment that corresponds to a column with a
total of 50. Likewise, a ribbon for a cell value of 100 will be twice
as wide as a ribbon for a cell of 50.
Using use_cell_remap and cell_remap_formula, you can apply any
function to the cell value to transform it to a new value. The remap
function is defined in the cell_remap_formula. When the function is
parsed, all instances of X are replaced with the cell value and then
the string is evaluated as Perl code. For example,
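# remap each cell value X to its square root
use_cell_remap     = yes
cell_remap_formula = sqrt(X)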
This pair of parameters will remap the cell values by their square root.
You can write as complex a Perl expression as you like, as long as
it results in a numerical value when eval'ed. Keep in mind that Circos
works with an integer scale, so ribbons for small or fractional cell
values will not be distinguishable (e.g. both 1.2 and 1.6 are truncated to
1). If your data is composed of small values, or your remap function
produces small values (e.g. log), you can add a constant multiplier to
the function to increase the dynamic range of the data
(e.g. cell_remap_formula = 100*log(X)). Alternatively, you can use the
data_mult parameter to apply a constant multiplier to cell values
(very useful if your input data is small).
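For example, to combine a log transform with a constant multiplier, as suggested above:

use_cell_remap     = yes
cell_remap_formula = 100*log(X)
# alternatively, multiply the raw cell values by a constant
# data_mult = 100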
Figure Input table values can be remapped with any Perl-compatible expression. Such transformation can be done within parse-table and is equivalent to transforming the input data upstream of this script.
The data remap facility is very general. You may, however, be interested in only a specific type of remapping and not wish to generate your own transformation.
By using the parameters use_scaling, scaling_type and scale_factor, you can scale the data to attenuate either large or small values. To attenuate large values, and thereby increase the visibility of smaller ribbons, use scaling_type=atten_large. Similarly, use scaling_type=atten_small to attenuate small values and decrease their visibility. In both schemes, increasing scale_factor magnifies the effect of the transformation.
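For example, to attenuate large values:

use_scaling  = yes
scaling_type = atten_large
# by increasing the scale factor, the effect is magnified
scale_factor = 1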
Figure Predefined transformations to attenuate large or small ribbons are available through the use_scaling and scaling_type parameters. For example, if your data has a lot of small ribbons that are not interesting (but you would like to keep them in the figure), consider using scaling_type=atten_small to reduce their size.
The previous two sections described how individual cell values can
be transformed to affect the visualization. By applying
transformations like log(X), for example, you can reduce the dynamic
range in the data and effectively depict a table with a large spread of values.
Independently, you can normalize the data on a row or column
basis. For example, one very useful normalization is to transform all
the segments to be the same size. Doing so will draw attention to
relationships in the table based on their relative values.
Normalization can be done in two ways. First, the cell values themselves can be altered (e.g. so that each row adds up to the same total; any tick mark values on the figure will reflect this change). Alternatively, the segments can be visually scaled (tick mark values will show the original values, but won't be uniformly spaced across segments).
To normalize segments to be the same size, use the settings shown below. By providing a normalization function that is a constant, all segment values will be scaled to that constant (the values of their cells will be remapped, since the scheme is set to "value").
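# a constant normalization function makes every segment the same size
use_segment_normalization      = yes
segment_normalization_function = 1000
segment_normalization_scheme   = value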
Figure Segments can be normalized using a variety of schemes. Here, segments are adjusted to be of the same length.
A variety of normalization schemes are available - please see the parse-table.conf file comments immediately before segment_normalization_function.
So far, all the sample data had rows and columns with different
labels. In other words, there were no rows that had the same label as
a column. In many cases, however, data with shared labels are what you
have.
For example, you may have a list of countries with the flow of
travelers between them. Canada may be in a row (as a departure
country) and in a column (as a destination country). In this case,
row=Canada col=France would be the number of people travelling from
Canada to France, whereas row=France col=Canada would be the number of
people traveling in the other direction.
For a detailed description of this approach, see visualizing
ratios in the first of the Visualizing Tabular Data articles. For
an example, see the ratio
layout for dating color preference.
Figure In a ratio layout, the two cells (row A, col B) and (row B, col A) are encoded by a single ribbon whose end at segment A represents the value at (A,B) and at segment B the value at (B,A).
The ratio layout is only feasible if you have at least one shared label between rows and columns. To toggle this mode, use the ribbon_variable parameter, as shown below.
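ribbon_variable                = yes
ribbon_variable_intra_collapse = yes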
The ribbon_variable_intra_collapse flag, when set, collapses the
ribbons for transpositive cells (e.g. (A,A)) so that they do not
occupy twice the space of their value (i.e. the start and end of the
ribbon are superimposed and the ribbon becomes more like a peak).
The output of parse-table is an intermediate file that stores
table, row and column statistics, and information about the position
of the ribbons used to represent cell values. It is the role of
make-conf to take this file and generate data and configuration files
that Circos can use.
Remember that Circos, by itself, cannot analyze and process
data. Its use is to draw the data and it needs help (here, with
parse-table) to be able to make sense of tables.
To use make-conf, simply provide (with -dir) the output directory where the data files should be written.
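# create the Circos data and configuration files from the parse-table output
> cat tmp.txt | bin/make-conf -dir data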
Take a look in the data/ directory - you'll find files that
describe the size of each segment (in karyotype.txt), define the
positions for each ribbon (in cells.txt), as well as other files. In
general, you will not need to make modifications to these files.
With this input data, Circos still needs to be told how large the
image should be, at what radius to place the segments, how thick the
segments should be, the geometry of the links and a lot of other
parameters that make up the figure. These parameters, however, are
independent of the tabular nature of the data and are therefore
controlled independently.
In the tableviewer/etc directory you'll find circos.conf,
ideogram.conf and ticks.conf. These files control the look of the
final image. They have been created to generate the kinds of
images you see here. Feel free to adjust the parameters (e.g. add tick
marks, decrease segment spacing, etc) to suit your needs.
For example, the link track is defined by the <link cellvalues> block shown at the end of this article. The effect of the rule in that block is to adjust the ribbon start position at its row segment to be closer to the segment, thereby distinguishing the roles of row and column segments.
The configuration snippets referenced in the sections above are collected below for reference.

Controlling Ribbon Color
<linkparam>
color = red
stroke_color = black
stroke_thickness = 1p
</linkparam>
# no matter what colors were set or remapped, ribbons for
# quartiles 1-3 will be grey and without a stroke
cell_q1_color = vvlgrey
cell_q2_color = vlgrey
cell_q3_color = lgrey
#cell_q4_color = red
cell_q1_nostroke = yes
cell_q2_nostroke = yes
cell_q3_nostroke = yes
#cell_q4_nostroke = yes
using transparency
hiding and removing ribbons
# defines smallest cell value to show, by value
cell_min_value = 50
# defines smallest cell value to show, by percentile
#cell_min_percentile = 10
# defines largest cell value to show, by value
#cell_max_value = 100
# defines largest cell value to show, by percentile
#cell_max_percentile = 100
# hide ribbons, and keep segments as they are
cutoff_cell_handling = hide
# remove ribbons, and shrink segments accordingly
# cutoff_cell_handling = remove
remapping cell values
cell value  formula       parsed          result
10          sqrt(X)       sqrt(10)        3.16
10          log(X)        log(10)         2.30  (log is the natural logarithm)
10          exp(X)/X**2   exp(10)/10**2   220.3 (exp(X) is e**X)
10          X<5?5:X       10<5?5:10       10    (Perl's ?: operator TEST?IF_TRUE:IF_FALSE)
10          X>0?log(X):0  10>0?log(10):0  2.30  (evaluate log(X) only if X isn't zero)
use_cell_remap = yes
# for each cell, the value X is remapped using the formula below
cell_remap_formula = sqrt(X)
scaling cell values
use_scaling = yes
scaling_type = atten_large
# by increasing the scale factor, the effect is magnified
scale_factor = 1
normalizing segments
use_segment_normalization = yes
segment_normalization_function = 1000
segment_normalization_scheme = value
ratio layout - drawing segments for rows and columns with the same label
ribbon_variable = yes
ribbon_variable_intra_collapse = yes
creating Circos configuration and data with make-conf
# first parse
cat samples/table-01.txt | bin/parse-table -conf samples/parse-table-01.conf > tmp.1
# now create data/conf
cat tmp.1 | bin/make-conf -dir data
# or chain together
cat samples/table-01.txt | bin/parse-table -conf samples/parse-table-01.conf | bin/make-conf -dir data
<link cellvalues>
ribbon = yes
flat = yes
file = data/cells.txt
bezier_radius = 0.0r
radius = 0.999r-15p
thickness = 1
color = grey
stroke_color = black
stroke_thickness = 1
<rules>
<rule>
importance = 95
condition = 1
radius1 = 0.999r+2p
flow = continue
</rule>
</rules>
</link>