methods

All images in this article were created with Circos (v0.49) and the tableviewer utility tool.

To obtain a manpage for any of the scripts, use the -man flag.

> bin/make-table -man
> bin/parse-table -man
> bin/make-conf -man

introduction

This is the second part of a series of articles that describe how Circos can be used to visualize tabular data. The first part presented the visual paradigm behind creating images of tables with Circos. If you're not familiar with this approach, I strongly suggest that you at least glance over the first few images of that writeup to get an idea of how to interpret the visualizations.

In this article, I will cover the technical details of using the tableviewer set of scripts to parse your tabular data and turn them into files that Circos can use.

tableviewer script set

The tableviewer set of scripts is distributed as part of the circos-tools package and is composed of three scripts

  • make-table - creates a table with random data, useful for exploring and debugging
  • parse-table - parses a table file (such as one created with make-table, or your own data) into an intermediate format useable by make-conf
  • make-conf - creates Circos configuration and data files used to generate a visualization of the table (uses the intermediate output of parse-table)

You CREATE your data file (or supply it), then PARSE it into an intermediate form, then FORMAT it to generate Circos input and finally VISUALIZE by running Circos.

Figure Table visualizations are created using parse-table and make-conf. You can supply your own table data (table.txt in the flow chart), or generate a random data set with make-table.

If you have your own data, you do not need make-table. On the other hand, if you would like to explore different forms of the visualizations with tables of different size and content, you can use make-table to create synthetic data sets.

CREATING DATA WITH make-table

If you have your own data and are not interested in how to generate random tables, you can skip this section and go directly to the section that describes parse-table.

The make-table script generates tables with random data, suitable for input to parse-table (if -brief is used - see below). The minimum information you need to pass to the script is the number of rows (using -rows). If you do not specify the number of columns (using -cols), the number of rows and columns will be the same.

> bin/make-table -rows 3
mean  lbl    A    B    C
mean    A  200  200  200
mean    B   50   50   50
mean    C  100  100  100
sd  lbl    A    B    C
sd    A  100  100  100
sd    B   25   25   25
sd    C   50   50   50
table  lbl    A    B    C
table    A  257  296  211
table    B   61   58   38
table    C   17  145   25

Here the output is segregated into three sections. The first section (each line prefixed by mean) gives the average of the distribution from which the cell value is sampled. The second section (prefixed by sd) reports the standard deviation. The third section (prefixed by table) is the actual table, and the cell values here are sampled from a normal distribution with the combination of mean and standard deviation reported in the corresponding sections above.

For example, the cell (B,C)=38 is sampled from the normal distribution with mean=50 and standard deviation=25.
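
As a rough illustration of the sampling described above, the following Python sketch draws each cell from a normal distribution defined by per-cell mean and standard deviation values. This is not the make-table implementation: the function name sample_table and the rounding to integers are assumptions made for the example, and Python's random number generator will not reproduce the values printed by make-table for a given seed.

```python
import random

def sample_table(mean, sd, seed=None):
    """Draw one integer value per cell from a normal distribution.

    mean and sd are nested dicts: {row_label: {col_label: value}}.
    """
    rng = random.Random(seed)  # a fixed seed makes the table reproducible
    table = {}
    for row in mean:
        table[row] = {}
        for col in mean[row]:
            table[row][col] = round(rng.gauss(mean[row][col], sd[row][col]))
    return table

# Same (mean, sd) layout as the example: one pair per row.
labels = ["A", "B", "C"]
row_mean = {"A": 200, "B": 50, "C": 100}
row_sd = {"A": 100, "B": 25, "C": 50}
mean = {r: {c: row_mean[r] for c in labels} for r in labels}
sd = {r: {c: row_sd[r] for c in labels} for r in labels}

table = sample_table(mean, sd, seed=123)
for r in labels:
    print(r, [table[r][c] for c in labels])
```

Calling sample_table twice with the same seed returns the same table, which mirrors the role of -seed.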

-brief : suppressing distribution details

You will normally not need the details of the distribution when creating data files. To generate output that is directly compatible with parse-table, use -brief.

> bin/make-table -rows 3 -brief
 lbl    A    B    C
   A   80  387  112
   B    1   30   61
   C   96  146   29

Notice the data values have changed in this example. This is because the data are generated randomly each time. If you want the data values to remain constant between executions, provide a fixed value for the random seed using -seed.

> bin/make-table -rows 3 -brief -seed 123
 lbl    A    B    C
   A  262  209  168
   B   28   86   45
   C   58   95   69

-unique_labels : creating uniquely labeled rows and columns

You'll notice that in the examples above the rows (A,B,C) were named the same as the columns. When a row shares the same name with a column, both are represented by the same segment in the visualization. Thus, the number of shared labels affects the format of the image. If you would like to simulate rows and columns with different labels, use -unique_labels.

> bin/make-table -rows 3 -brief -seed 123 -unique_labels
 lbl    D    E    F
   A  262  209  168
   B   28   86   45
   C   58   95   69

adjusting distribution parameters

In the first example, the mean and standard deviation values were different for some rows and columns. These values are defined by rules within the configuration file etc/make-table.conf. A rule set is a named rule block

# this is the rule set to use
rule_set = some_rule_name
# and here is its definition
<rules some_rule_name>
...
</rules>

which defines a given rule set. You can have any number of rules blocks (all must have unique names) and then pick the one you want to use using rule_set (available as -rule_set)

> bin/make-table -rows 3 -rule_set default
> bin/make-table -rows 3 -rule_set constant

Within the rules block, you can define any number of individual rules which apply a mean and standard deviation value to any combination of rows and columns. The rows and columns are selected using two regular expressions which are followed by the mean and standard deviation. For example,

<rules some_rule_name>
rule = . . 100 25
</rules>

This rule selects rows and columns using the regular expression '.' (i.e. any character). Thus each cell in the table will be assigned a (mean,sd) pair of (100,25).

By next adding another rule, such as "A . 50 10", cells in row A can be adjusted (regular expression for the row is 'A' and for the column '.').

<rules some_rule_name>
rule = . . 100 25
rule = A . 50 10
</rules>

Finally, a single cell can be adjusted by specifying a regular expression that uniquely selects the cell.

<rules some_rule_name>
rule = . . 100 25
rule = A . 50 10
rule = A D 200 50
</rules>

To see this rule set in action,

> bin/make-table -rows 3 -seed 123 -unique_labels -rule some_rule_name
mean  lbl    D    E    F
mean    A  200   50   50
mean    B  100  100  100
mean    C  100  100  100
sd  lbl    D    E    F
sd    A   50   10   10
sd    B   25   25   25
sd    C   25   25   25
table  lbl    D    E    F
table    A  231   50   46
table    B   49   78  136
table    C   95   79   97

You can see the effect of the rule entries in the rule set on the mean and sd lines in the full table report. Adjusting the distributions from which cell values are sampled is very helpful to explore how data patterns manifest themselves in the visualization. For example, how would the visualization change if all the values in a given row (and/or column) are doubled?
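
The way the three rules combine can be sketched in a few lines of Python. The sketch assumes that rules are applied in the order listed, that later rules override earlier ones, and that the regular expressions are matched anywhere in the label (unanchored). These assumptions are consistent with the mean and sd lines in the report above, but the function apply_rules is hypothetical and not part of make-table.

```python
import re

def apply_rules(rows, cols, rules):
    """Assign a (mean, sd) pair to each cell.

    rules is a list of (row_regex, col_regex, mean, sd) tuples, applied in
    order, so a later rule replaces the values set by an earlier one.
    """
    params = {}
    for row_rx, col_rx, mean, sd in rules:
        for r in rows:
            for c in cols:
                if re.search(row_rx, r) and re.search(col_rx, c):
                    params[(r, c)] = (mean, sd)
    return params

rules = [
    (".", ".", 100, 25),   # every cell
    ("A", ".", 50, 10),    # all cells in row A
    ("A", "D", 200, 50),   # the single cell (A,D)
]
params = apply_rules(["A", "B", "C"], ["D", "E", "F"], rules)
for cell in sorted(params):
    print(cell, params[cell])
```

With these rules, cell (A,D) ends up with (200,50), the other cells in row A with (50,10), and all remaining cells with (100,25).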

relative adjustments to distribution parameters

In the examples above, the rules specified absolute values for both mean and standard deviation values. You can adjust the cell values using relative notation (rVALUE) for any rule, as long as the cell already has an absolute value associated with it. The relative value is used as a multiplier. For example,

<rules some_rule_name>
rule = . . 100 25
rule = A . r2 r0.2
</rules>

These rules apply (mean,sd)=(100,25) to all the cells (first rule) and then set the mean of cells in row A to be twice their value (e.g. 100 -> 200) and the standard deviation to be 0.2 times their value (e.g. 25 -> 5). The relative syntax makes it possible to grow or attenuate values in rows, columns and individual cells relative to other parts of the table, and define the baseline values only once.
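
A minimal Python sketch of the relative notation is shown here. It assumes that a value written as rVALUE multiplies whatever absolute value the cell already has, and that any other value is absolute. The helper name adjust is made up for this example.

```python
def adjust(current, spec):
    """Return the new parameter value.

    A spec such as 'r2' multiplies the current value by 2; any other
    spec is treated as an absolute value.
    """
    spec = str(spec)
    if spec.startswith("r"):
        if current is None:
            raise ValueError("a relative value needs an existing absolute value")
        return current * float(spec[1:])
    return float(spec)

# Apply the two rules in order to a cell in row A.
mean, sd = None, None
for mean_spec, sd_spec in [("100", "25"), ("r2", "r0.2")]:
    mean = adjust(mean, mean_spec)
    sd = adjust(sd, sd_spec)

print(mean, sd)
```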

handling negative and missing values

The tabular visualization requires that cell values in the table be non-negative. Given that ribbons are used in the visualization to represent cell values, and that their thickness is proportional to the value in the cell, negative cell values do not have a corresponding visual form. If your data set contains negative values that you'd like to include in the image, you can remap negative values onto a unique range and then use rules in the Circos configuration file to apply distinct formatting to ribbons in this range.

The output of make-table can contain negative values, however - it will be up to you to manage these downstream. If you set a large standard deviation, relative to the mean, it's likely that some of your sampled values will be negative.

For example, if (mean,sd)=(100,100) for every cell, such as defined in the rule set named with_negatives,

> bin/make-table -rows 3 -seed 234 -unique_labels -rule with_negatives 
mean  lbl    D    E    F
mean    A  100  100  100
mean    B  100  100  100
mean    C  100  100  100
sd  lbl    D    E    F
sd    A  100  100  100
sd    B  100  100  100
sd    C  100  100  100
table  lbl    D    E    F
table    A  123   87  -33
table    B   82  -72  181
table    C -182  -53   62

This output was created with the following settings

positive_only       = no
non_negative_only   = no
negative_is_missing = no
zero_is_missing     = no

In other words, make-table was not asked to iterate sampling until positive values were selected for each cell, and negative values were not considered to be "missing data". If you want any negative values to be encoded as missing data, use -negative_is_missing.

> bin/make-table -rows 3 -seed 234 -unique_labels -brief -rule with_negatives -negative_is_missing
 lbl    D    E    F
   A  123   87    -
   B   82    -  181
   C    -    -   62

The missing data field is defined by the value of missing_data. Alternatively, if you don't want negative values at all, use -non_negative_only. Here, make-table will sample each cell's distribution until it finds a non-negative value (>=0). Be careful in choosing mean and standard deviation values that heavily favour negative values (e.g. mean=-100 sd=10) - you may never find a non-negative value and the make-table script will sample the distribution forever.

> bin/make-table -rows 3 -seed 234 -unique_labels -brief -rule with_negatives -non_negative_only
 lbl    D    E    F
   A  123   87   82
   B  181   62  102
   C   41  244   78

The difference between -non_negative_only and -positive_only is that the former allows 0 and the latter does not. If you want zeros to be considered missing data, set zero_is_missing=yes or use -zero_is_missing.
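
The interplay of these settings can be sketched as follows. This is an illustration only, not the make-table code. The function name sample_cell and the order in which the options are tested are assumptions, and the max_tries guard is an addition for the example (as noted above, make-table itself will keep sampling).

```python
import random

def sample_cell(rng, mean, sd, non_negative_only=False, positive_only=False,
                negative_is_missing=False, zero_is_missing=False,
                missing_data="-", max_tries=10000):
    """Sample one cell value, resampling or flagging missing data as requested."""
    for _ in range(max_tries):
        value = round(rng.gauss(mean, sd))
        if positive_only and value <= 0:
            continue              # resample: zero and negatives rejected
        if non_negative_only and value < 0:
            continue              # resample: only negatives rejected
        if negative_is_missing and value < 0:
            return missing_data   # keep the cell, but mark it as missing
        if zero_is_missing and value == 0:
            return missing_data
        return value
    raise RuntimeError("no acceptable value found; check mean and sd")

rng = random.Random(234)
print([sample_cell(rng, 100, 100, negative_is_missing=True) for _ in range(9)])
print([sample_cell(rng, 100, 100, non_negative_only=True) for _ in range(9)])
```

With a distribution such as mean=-100 and sd=10, the non_negative_only branch would exhaust max_tries and raise an error instead of looping indefinitely.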

The purpose of modeling missing data is to explore how the table visualization deals with empty cells. There are settings in the parse-table script that control how missing values are handled.

PARSING A TABLE WITH parse-table

The core logic of the tabular visualization method is implemented in the parse-table script. This is the script that reads in a table, analyzes relationships between row and column labels and produces an intermediate file which reports statistics (e.g. row, column, label) and features of individual ribbons. Although the output of this script isn't meant to be parsed by a human, its format is sufficiently clear that you can, with only a little effort, figure out what is being reported.

controlling how the table is parsed

The input to parse-table is expected to be a plain-text file that stores the tabular data. The format of the data is flexible, but it is strongly recommended that each row have the same number of fields.

There are four main parameters that control how input is parsed

  • field_delim - a regular expression used to match a delimiter between adjacent entries in a row (if you have tab-delimited data, use \t)
  • field_delim_collapse - multiple adjacent delimiters are treated as one delimiter (i.e. missing values are not interpolated)
  • strip_leading_space - any whitespace at the beginning of a line is removed (highly recommended)
  • remove_cell_rx - any characters listed in this string will be removed from any cell values (useful for quotes and thousands separators)

Let's look at some example input.

Each example below gives the input, the parameters used to parse it, and the parsed result.

input:
-,A,B,C
A,0,1,2
B,3,4,5
C,6,7,8
parameters:
# values are comma-separated, so use the , as delimiter
field_delim = ,
parsed:
  A B C
A 0 1 2
B 3 4 5
C 6 7 8

input:
-,A,B,C
A,0,,2
B,3,4,5
C,6,7,8
parameters:
field_delim = ,
# adjacent delimiters should not be collapsed
field_delim_collapse = no
# when adjacent delimiters exist and are not collapsed, a blank
# field will result. To interpret this as missing data, set blank_means_missing
blank_means_missing = yes
parsed:
  A B C
A 0 - 2
B 3 4 5
C 6 7 8

input:
-,A,B,C
A,0,X,2
B,3,4,5
C,6,7,8
parameters:
field_delim = ,
# You can use any string to explicitly indicate that the cell's data value is missing (e.g. -). This is different
# than using a zero value, because missing values do not count towards any statistics.
missing_cell_value = X
parsed:
  A B C
A 0 X 2
B 3 4 5
C 6 7 8

input:
-    A   B  C
A    0   1  2
B    3   4  5
C    6   7  8
parameters:
# Use \s as delimiter to indicate either tab or space.
# Use ' ' to specifically indicate that a space is used.
# The distinction between tabs and a space is usually not important.
field_delim = \s
# If your input uses whitespace delimiters liberally for formatting, make sure that
# adjacent delimiters are collapsed. Keep in mind that when tab-separated data
# is generated, adjacent tabs usually indicate missing data.
field_delim_collapse = yes
parsed:
  A B C
A 0 1 2
B 3 4 5
C 6 7 8

input:
- A B C
A "0" 1,000 (2)
B "3" 4,000 (5)
C "6" 7,000 (8)
parameters:
field_delim = \s
# If your values are quoted, contain thousands-separators, or have other
# cruft, use -remove_cell_rx to define a regular expression of characters
# that should be removed from each field.
remove_cell_rx = ",()
parsed:
  A    B C
A 0 1000 2
B 3 4000 5
C 6 7000 8
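
To make the effect of these parameters concrete, here is a small Python sketch that mimics them. It is only an approximation of the behaviour described above, not the parse-table code. The keyword arguments reuse the parameter names for readability, and the way blank fields are replaced with a missing-data marker is an assumption.

```python
import re

def parse_table(text, field_delim=r"\s", field_delim_collapse=True,
                strip_leading_space=True, remove_cell_rx=None,
                blank_means_missing=False, missing_cell_value="-"):
    """Split each line of text into fields, returning a list of rows."""
    # Collapsing treats a run of adjacent delimiters as a single delimiter.
    delim = f"(?:{field_delim})+" if field_delim_collapse else field_delim
    rows = []
    for line in text.splitlines():
        if strip_leading_space:
            line = line.lstrip()
        if not line:
            continue
        fields = re.split(delim, line)
        if remove_cell_rx:
            # Remove every character listed in remove_cell_rx from each field.
            strip_rx = f"[{re.escape(remove_cell_rx)}]"
            fields = [re.sub(strip_rx, "", f) for f in fields]
        if blank_means_missing:
            fields = [missing_cell_value if f == "" else f for f in fields]
        rows.append(fields)
    return rows

# Blank field between two commas, delimiters not collapsed.
print(parse_table("-,A,B,C\nA,0,,2", field_delim=",",
                  field_delim_collapse=False, blank_means_missing=True))
# Quotes, thousands separators and parentheses removed from cells.
print(parse_table('- A B C\nA "0" 1,000 (2)', remove_cell_rx='",()'))
```

Note that with field_delim_collapse enabled, the blank field in the first call would disappear and the row would come out one field short, which is why collapsing is turned off for comma-separated data with missing values.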

If you would like to see how parse-table parsed your table, use -show_parsed. This will report the parsed version and immediately exit.

> cat samples/parse-example-1.txt | bin/parse-table -field_delim , -no-field_delim_collapse -show_parsed
data    A    B    C
   A    0    1    2
   B    3    4    5
   C    6    7    8

Generating a basic image

At this point, in order to illustrate how parse-table's configuration can be adjusted to customize the image, I need to briefly go through the process of using make-conf, the next script in the series, to create an image.

The following will generate an image of a 3 x 3 table

# first, create a 3x3 table (use a random seed so that this step is reproducible)
> bin/make-table -row 3 -seed 123 -brief > samples/table-basic.txt
# let's see the table
> cat samples/table-basic.txt
 lbl    A    B    C
   A  262  209  168
   B   28   86   45
   C   58   95   69
# now parse the table
> cat samples/table-basic.txt | bin/parse-table > tmp.txt
# now create configuration and data files
> cat tmp.txt | bin/make-conf -dir data
# let's see what was created
> ls data/
-rw-r--r--  1 martink users 246 Jun  1 15:12 all.txt
-rw-r--r--  1 martink users 726 Jun  1 15:12 cells.txt
-rw-r--r--  1 martink users 246 Jun  1 15:12 col.txt
-rw-r--r--  1 martink users  52 Jun  1 15:12 colors.conf
-rw-r--r--  1 martink users 577 Jun  1 15:12 colors_percentile.conf
-rw-r--r--  1 martink users  69 Jun  1 15:12 karyotype.txt
-rw-r--r--  1 martink users 242 Jun  1 15:12 row.txt
# now draw the image (circos.conf is already defined to use the data files from data/)
> circos -conf etc/circos.conf -outputfile table-basic.png

Figure Visualization of a 3x3 table from samples/table-basic.txt

In subsequent examples, I will be adjusting both the input data (e.g. samples/table-02.txt) and configuration files (e.g. samples/parse-table-02.conf). The input and configuration files designed to be used together will have the same suffix (e.g. -02). In cases where the same table file is used repeatedly with different configuration files, the configuration files are further suffixed with a, b, c. For example, table-01.txt can be used with parse-table-01a.conf, parse-table-01b.conf, and so on.

The process of parsing a table and creating the Circos data and configuration files can be chained

# chain calls to parse-table and make-conf for table-01.txt
> cat samples/table-01.txt | bin/parse-table -conf samples/parse-table-01.conf | bin/make-conf -dir data
> circos -conf etc/circos.conf -outputfile table-01.png

The makeimage script automates this process. Once you know the number of the table file to create (see samples/table-NN.txt for different tables), run

> makeimage NN
# e.g. NN=02
> makeimage 02

to parse the table, create the data and run Circos. This script assumes that the Circos binary is at ../../bin/circos relative to the tableviewer directory.

Finally, I need to draw attention to the two distinct types of configuration files that I have mentioned here. First, there are the configuration files that control parse-table (these are all named parse-table*.conf). Second, there are configuration files that control Circos itself (these are in etc/*.conf). The former control the structure of the visualization (order/color of segments and ribbons, data remapping and normalization, etc) whereas the latter control the display of the visualization (image size, thickness of segments, tick marks, etc). The Circos configuration files will be the same for each example. Feel free to adjust these (etc/circos.conf, etc/ticks.conf, etc/ideogram.conf) to suit your needs.

Segment and ribbon order

One of the basic ways in which the table visualization can be adjusted is by changing the order of segments and ribbons. By default, the segments are ordered based on alphabetic label order and the order of ribbons is based on cell value, with ribbons for larger-valued cells appearing before those of smaller-valued cells.

Figure Visualization of a 3x3 table from samples/table-basic.txt

Segment order is controlled with the segment_order parameter. This parameter can be defined as one or more comma-delimited values that control the order of the segments, with the values taken from this set

row_major          row segments first, then column (useful with a secondary sort order within row/col group)
col_major          col segments first, then row (useful with a secondary sort order within row/col group)
ascii              asciibetic order
row_size           total of rows for the segment - useful if the segment has both row and column contributions
col_size           total of columns for the segment - useful if the segment has both row and column contributions
row_to_col_ratio   ratio of total of rows to columns for the segment
col_to_row_ratio   ratio of total of columns to rows for the segment
size_asc           size, in ascending order
size_desc          size, in descending order

with values *_ratio and *_size requiring that rows and columns share the same label. Below are some examples of visualization of table-01.txt with different segment order.

Figure Segment order is controlled with the segment_order parameter.

For example, if segment_order=col_major,size_desc then column segments are shown as a group first, and within this group segments are ordered by the column total in descending order (segment associated with column with the largest total is first). Row segments are shown after the column segments, and within this group are ordered in decreasing size.
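
A compound order such as col_major,size_desc behaves like a multi-key sort. The Python sketch below illustrates this for a table whose rows and columns are uniquely labeled. It supports only a subset of the values listed above, and the function segment_order is a hypothetical helper, not the parse-table implementation.

```python
def segment_order(table, keys):
    """Order segment labels by a list of sort keys.

    table is {row_label: {col_label: value}} with unique row/column labels.
    """
    size, kind = {}, {}
    for r, cells in table.items():
        size[r] = sum(cells.values())        # row total
        kind[r] = "row"
        for c, v in cells.items():
            size[c] = size.get(c, 0) + v     # column total
            kind[c] = "col"

    def sort_key(label):
        parts = []
        for k in keys:
            if k == "row_major":
                parts.append(0 if kind[label] == "row" else 1)
            elif k == "col_major":
                parts.append(0 if kind[label] == "col" else 1)
            elif k == "size_asc":
                parts.append(size[label])
            elif k == "size_desc":
                parts.append(-size[label])
            elif k == "ascii":
                parts.append(label)
            else:
                raise ValueError(f"unsupported key: {k}")
        return tuple(parts)

    return sorted(size, key=sort_key)

table = {
    "A": {"D": 231, "E": 50, "F": 46},
    "B": {"D": 49, "E": 78, "F": 136},
    "C": {"D": 95, "E": 79, "F": 97},
}
print(segment_order(table, ["col_major", "size_desc"]))
```

For this table the column totals are D=375, E=207, F=279 and the row totals are A=327, B=263, C=271, so the resulting order is D, F, E followed by A, C, B.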

The segment order fixes the large-scale structure of the visualization. The fine structure is determined by how the ribbons that correspond to cell values are ordered within each segment. Ribbon order is determined by the following parameters

placement_order     - determines the order of row and column ribbons, as groups, within a segment that has both row and column ribbons
ribbon_bundle_order - order of ribbons within a segment (or within group, if placement_order is used)
reverse_rows        - all row ribbons are drawn in reverse order
reverse_columns     - all column ribbons are drawn in reverse order

The placement_order parameter is useful only if you have rows and columns that share a label (these labels give rise to segments that have both row and column ribbons). We'll skip this option for now, since the rows and columns in the present table (table-01.txt) are all uniquely named.

The ribbon_bundle_order parameter is the primary parameter for controlling ribbon order. Values for this parameter can be size_asc, size_desc, ascii or native.

Figure Ribbon order is controlled with the ribbon_bundle_order parameter.

The size_asc and size_desc values correspond to a ribbon order that is defined within each segment based on the cell value (ribbon thickness). For example, when ribbon_bundle_order=size_asc is used, small ribbons are placed first. When either size_asc or size_desc is used, the ribbon order does not depend on the order of segments - the order within one segment is independent of the order within another and based only on the cell value.

When ribbon_bundle_order is set to ascii or native, however, ribbon order will depend on segment order. When set to 'ascii', ribbons are placed on a segment in order of the label of the destination segment. For example, in the above figure ribbons starting at segment A are ordered A-D, A-E, A-F where -D, -E, -F are the destination segments. Similarly, those starting on B are ordered B-D, B-E, B-F. When 'native' is used, the order is based on the actual position of the destination segments and not their labels.

The purpose of 'native' is to attempt to disambiguate the figure by reducing the number of ribbons that cross. For most data sets, there will be ribbons that cross within the figure. However, given that this number can be reduced by using a different segment order, it makes sense to do so because it results in a visually simpler figure.
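
The four ribbon_bundle_order values can be compared with a short Python sketch. The treatment of 'native' here is a simplification: it sorts ribbons by the index of their destination segment in the segment order, which is my reading of "actual position" and may not match how parse-table resolves positions around the circle. The function ribbon_order is hypothetical.

```python
def ribbon_order(ribbons, mode, segment_order=None):
    """Order the ribbons of one segment.

    ribbons is a list of (destination_label, cell_value) pairs.
    """
    if mode == "size_asc":
        return sorted(ribbons, key=lambda r: r[1])
    if mode == "size_desc":
        return sorted(ribbons, key=lambda r: -r[1])
    if mode == "ascii":
        return sorted(ribbons, key=lambda r: r[0])
    if mode == "native":
        # Position of each destination segment in the drawn segment order.
        position = {label: i for i, label in enumerate(segment_order)}
        return sorted(ribbons, key=lambda r: position[r[0]])
    raise ValueError(f"unsupported mode: {mode}")

ribbons_A = [("D", 231), ("E", 50), ("F", 46)]
segments = ["D", "F", "E", "A", "C", "B"]

for mode in ("size_asc", "size_desc", "ascii", "native"):
    print(mode, ribbon_order(ribbons_A, mode, segments))
```

In this example 'ascii' gives D, E, F regardless of where the segments are drawn, whereas 'native' follows the segment order and gives D, F, E.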

The last parameter that controls how ribbons are placed is the ribbon_layer_order parameter. The value of this parameter defines the order in which ribbons are layered. Judicious use of this parameter, together with ribbon transparency, is helpful in showing contribution to the figure from both small and large cell values.

Figure Ribbon layering is controlled with the ribbon_layer_order parameter.

Controlling Segment Color

Segments are assigned a color from the range of colors defined in the block. The interpolation within this range is done in HSV (hue, saturation, value) - you define the initial and final HSV colors (h0, s0, v0) and (h1, s1, v1), respectively. Segments can also be assigned RGB color values in the input data file. This approach is covered in a subsequent section.

Segments will be assigned colors from this range of HSV values with the interpolation being guided by the number of segments or their size. If you select interpolate_type=count, then if you have N segments, the N colors will be sampled from the HSV space uniformly.

To increase color difference between large segments, you can use interpolate_type=size to sample colors in the HSV range based on the size of segments. The color of each segment in this scheme is determined as follows. First, consider the circle to represent the HSV range and stretch/shift the scale so that the half-way points of the first and last segments fall on the start and end of the range, respectively. Then, the color of each segment is the color associated with the position of its mid-way point.

The range of values for HSV components is H=0..360, S=0..1 and V=0..1. You can use hue values larger than 360, and the effect will be a hue determined by mod(HUE,360). For example, if you have a large number of segments and would like to make the segment color appear random (more or less), use a very large value of h1 (e.g. h1=30,000).
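
The two interpolation schemes can be sketched in Python with the standard colorsys module. The function names are made up for this example, and I am assuming that the count scheme includes both end points of the range. The size scheme follows the mid-point description given above.

```python
import colorsys

def fractions_by_count(n):
    """Uniformly spaced positions in 0..1, one per segment."""
    return [i / (n - 1) if n > 1 else 0.0 for i in range(n)]

def fractions_by_size(sizes):
    """Positions of segment mid-points, rescaled so first=0 and last=1."""
    mids, start = [], 0
    for s in sizes:
        mids.append(start + s / 2)
        start += s
    span = mids[-1] - mids[0]
    return [(m - mids[0]) / span if span else 0.0 for m in mids]

def interpolate_hsv(start, end, fractions):
    """Return RGB tuples (0..255) for positions within an HSV range."""
    (h0, s0, v0), (h1, s1, v1) = start, end
    colors = []
    for f in fractions:
        h = (h0 + f * (h1 - h0)) % 360       # hues above 360 wrap around
        s = s0 + f * (s1 - s0)
        v = v0 + f * (v1 - v0)
        r, g, b = colorsys.hsv_to_rgb(h / 360, s, v)
        colors.append((round(r * 255), round(g * 255), round(b * 255)))
    return colors

# Three segments, hue running from red (0) to blue (240).
print(interpolate_hsv((0, 1, 1), (240, 1, 1), fractions_by_count(3)))
# One large segment pulls the colors of the small ones closer together.
print(interpolate_hsv((0, 1, 1), (240, 1, 1), fractions_by_size([2, 2, 8])))
```

When all segments have the same size, fractions_by_size and fractions_by_count return the same positions, so the two schemes produce the same colors.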

To look slightly ahead, ribbons can inherit their colors from their segments. In the examples below, each ribbon is colored by its row segment. I will discuss later how ribbon color is adjusted.

Figure Colors of segments are interpolated within an HSV range using a count or size scheme. If segments are approximately equally sized, these two schemes produce very similar colors.

The order in which segments are displayed and the order in which the color interpolation is done are independent. Order for color is determined by segment_color_order, and may be different from segment_order.

Figure In both cases the position of segments is determined by order of their labels (segment_order = segment_color_order = ascii). In the first panel, segment color is similarly ordered. In the second panel, segment colors are assigned by decreasing segment size (segment_color_order = size_desc).

Controlling Ribbon Color

One of the ways in which ribbons can be colored is through inheriting their color from the segment to which they belong. Coloring ribbons by their row (or column) segment is as easy as changing the value of the color_source parameter in the <linkcolor> block. This is shown below.

Figure Ribbons can take on the color of their segments. Depending on the value of color_source in the <linkcolor> block, the row or column segments can be used to color ribbons.

Coloring ribbons based on their segments' color is helpful because it gives a breakdown of the row (or column) segment at the ribbon's other end. For example, in the first panel of the figure above, you can see that for 3/5 column segments (F, I, J) the largest contribution was from the red segment (A, giving rise to red ribbons).

Instead of using the segments' colors, you can color ribbons by mapping their corresponding cell values onto a color scheme. The mapping can be done using the cell values themselves or their percentile.

In the figure below, ribbon colors are initially determined by their row segments. Colors (as well as transparency and stroke) are modified based on cell values. The cutoff filters are defined within <value VALUE> blocks, which apply to any cell for which the value is <= VALUE.

Figure In this example, ribbon color is initialized from row segments. Ribbon characteristics are subsequently remapped by modifying color, transparency and stroke thickness based on the values of the cells.

If you are interested in the distribution of values, consider using the <percentile PERCENTILE> block, rather than <value>. Using this approach you can color ribbons based on how they fall within the distribution of values, rather than by absolute value.

Figure In this example, ribbon color is initialized from row segments. Ribbon characteristics are subsequently remapped by modifying color, transparency and stroke thickness based on the percentile of the cell values.

If you want to initialize ribbon characteristics before the remapping is applied, use the <linkparam> block, as follows.

<linkparam>
color            = red
stroke_color     = black
stroke_thickness = 1p
</linkparam>

If you are setting the color in this block, make sure to leave color_source undefined (comment out the definition of the parameter), otherwise the segment color will override the color defined in a <linkparam> block.

Figure Ribbon color is initialized from <linkparam> block and subsequently remapped.

When used, <value> or <percentile> blocks are internally ordered in increasing size, and for each ribbon the ordered blocks are tested to find the first one for which ribbon_value <= block_value. Once this block is found (if any), any parameters in the block are applied to the ribbon and no further blocks are tested. Thus, in the example above the empty block <value 150> acts to keep ribbons with values <=150 unaltered (they retain format characteristics set by the <linkparam> block).
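
The block-matching logic just described amounts to a first-match lookup over sorted cutoffs. The Python sketch below uses hypothetical cutoffs and parameters (50, 150 and 250 with made-up colors) purely to illustrate it. The function names are not part of parse-table.

```python
def match_block(value, blocks):
    """Return the parameters of the first block with value <= cutoff.

    blocks maps a cutoff to a dict of parameters; blocks are tested in
    increasing cutoff order and None is returned if no block applies.
    """
    for cutoff in sorted(blocks):
        if value <= cutoff:
            return blocks[cutoff]
    return None

def format_ribbon(value, base, blocks):
    """Start from the initial parameters, then apply the matching block."""
    params = dict(base)
    matched = match_block(value, blocks)
    if matched:
        params.update(matched)
    return params

base = {"color": "red", "stroke_color": "black", "stroke_thickness": "1p"}
blocks = {
    50: {"color": "grey"},    # hypothetical: small values turn grey
    150: {},                  # empty block: values <= 150 keep base format
    250: {"color": "blue"},   # hypothetical: larger values turn blue
}

for value in (30, 100, 200, 300):
    print(value, format_ribbon(value, base, blocks)["color"])
```

A value of 100 falls through the first block, matches the empty 150 block and therefore stays red, while a value of 300 matches no block at all and also keeps its initial format.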

One last way in which ribbon color and stroke can be altered is through the use of cell_qN_color and cell_qN_nostroke parameters. These act on ribbons based on the quartile of their values (q1 for first quartile, q2 for second, etc). Thus, in addition to remapping colors based on values or percentiles, you can ultimately override the color of the ribbons based on quartiles.

# no matter what colors were set or remapped, ribbons for
# quartiles 1-3 will be grey and without a stroke
cell_q1_color    = vvlgrey
cell_q2_color    = vlgrey
cell_q3_color    = lgrey
#cell_q4_color    = red
cell_q1_nostroke = yes
cell_q2_nostroke = yes
cell_q3_nostroke = yes
#cell_q4_nostroke = yes

using transparency

The level of transparency of a color can be adjusted in the <linkcolor> block, individual value/percentile blocks or the <linkparam> blocks. The range of transparency values (1..N) is determined by the auto_alpha_steps parameter in the circos.conf file. Circos uses this parameter (e.g. auto_alpha_steps = 5) to define a range of colors (e.g. red_a1, red_a2, ... red_a5) each with a different degree of transparency (red_a5 most transparent, red_a1 least transparent). When defining transparency in the parse-table.conf file, make sure that you stay within this range.

hiding and removing ribbons

You can hide ribbons (make them invisible, but reserve their segment positions) or remove them entirely (and shrink their segments accordingly). To do this, define any of these

# defines smallest cell value to show, by value
cell_min_value       = 50
# defines smallest cell value to show, by percentile
#cell_min_percentile = 10
# defines largest cell value to show, by value
#cell_max_value      = 100
# defines largest cell value to show, by percentile
#cell_max_percentile = 100

Then, to determine how cells that fall outside this range are handled, define cutoff_cell_handling

# hide ribbons, and keep segments as they are
cutoff_cell_handling = hide
# remove ribbons, and shrink segments accordingly
# cutoff_cell_handling = remove

Figure Ribbons associated with small (or large) values can be hidden or removed.

Suppressing the display of ribbons is useful in removing uninteresting data from the display without having to adjust the input file. If removing the ribbons altogether is too drastic, consider using the color rules defined above to selectively increase transparency or alter color (e.g. light grey) of ribbons that are not of interest.
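
The difference between hiding and removing can be illustrated with a short Python sketch. It assumes that the cutoffs are inclusive (a cell equal to cell_min_value is still shown) and only tracks row totals for brevity. The function apply_cutoff is hypothetical.

```python
def apply_cutoff(cells, cell_min_value=None, cell_max_value=None,
                 cutoff_cell_handling="hide"):
    """Return (shown_cells, row_size) for cells given as {(row, col): value}."""
    def in_range(v):
        if cell_min_value is not None and v < cell_min_value:
            return False
        if cell_max_value is not None and v > cell_max_value:
            return False
        return True

    shown = {k: v for k, v in cells.items() if in_range(v)}
    # Hidden ribbons still count toward segment size; removed ones do not.
    sized = cells if cutoff_cell_handling == "hide" else shown
    row_size = {}
    for (row, _col), v in sized.items():
        row_size[row] = row_size.get(row, 0) + v
    return shown, row_size

cells = {("A", "D"): 231, ("A", "E"): 50, ("A", "F"): 46}
print(apply_cutoff(cells, cell_min_value=50, cutoff_cell_handling="hide"))
print(apply_cutoff(cells, cell_min_value=50, cutoff_cell_handling="remove"))
```

In both calls the ribbon for (A,F) is not drawn, but segment A keeps its size of 327 when the ribbon is hidden and shrinks to 281 when it is removed.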

remapping cell values

There are several ways in which you can control the relationship between table cell value and ribbon width. By default, the figure scale and ribbon width are linearly proportional to cell values. Thus, a column with a total of 100 will give rise to a column segment that is twice as large as a segment that corresponds to a column with a total of 50. Likewise, a ribbon for a cell value of 100 will be twice as wide as a ribbon for a cell of 50.

Using use_cell_remap and cell_remap_formula, you can apply any function to the cell value to transform it to a new value. The remap function is defined in the cell_remap_formula. When the function is parsed, all instances of X are replaced with the cell value and then the string is evaluated as Perl code. For example,

cell value    formula        parsed           result
10            sqrt(X)        sqrt(10)         3.16
10            log(X)         log(10)          2.30   (log is the natural logarithm)
10            exp(X)/X**2    exp(10)/10**2    220.3  (exp(X) is e^X)
10            X<5?5:X        10<5?5:10        10     (Perl's ?: operator TEST?IF_TRUE:IF_FALSE)
10            X>0?log(X):0   10>0?log(10):0   2.30   (evaluate log(X) only if X is positive)

This pair of parameters will remap the cell values by their square root.

use_cell_remap     = yes
# for each 
cell_remap_formula = sqrt(X)

You can write as complex a Perl expression as you like, as long as it results in a numerical value when eval'ed. Keep in mind that Circos works with an integer scale, so ribbons for small or fractional cell values will not be distinguishable (e.g. 1.2 and 1.6 are both truncated to 1). If your data is composed of small values, or your remap function produces small values (e.g. log), you can add a constant multiplier to the function to increase the dynamic range of the data (e.g. cell_remap_formula = 100*log(X)). Alternatively, you can use the data_mult parameter to apply a constant multiplier to cell values (very useful if your input data is small).
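
The effect of a remap formula followed by integer truncation can be explored outside of parse-table with a Python sketch like the one below. Note the differences from the real thing: parse-table substitutes the value for X textually and evaluates Perl, whereas this sketch binds X as a variable and evaluates a Python expression (so Perl's ?: operator has to be written as a Python conditional). The function remap and the set of allowed names are assumptions for the example.

```python
import math

# Only these names are visible to the formula.
SAFE_NAMES = {"sqrt": math.sqrt, "log": math.log, "exp": math.exp}

def remap(value, formula):
    """Evaluate a Python expression in X and truncate the result to an integer."""
    result = eval(formula, {"__builtins__": {}}, {**SAFE_NAMES, "X": value})
    return int(result)

print(remap(10, "sqrt(X)"))             # 3.16 truncates to 3
print(remap(10, "log(X)"))              # 2.30 truncates to 2
print(remap(10, "100*log(X)"))          # multiplier preserves more of the range
print(remap(10, "X if X >= 5 else 5"))  # Python form of X<5?5:X
```

The second and third calls show why a multiplier helps: log(10) alone collapses to 2, while 100*log(10) becomes 230 and keeps two more significant digits.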

Figure Input table values can be remapped with any Perl-compatible expression. Such a transformation can be done within parse-table and is equivalent to transforming the input data upstream of this script.

scaling cell values

The data remap facility is very general. You may, however, be interested in only a specific type of remapping and not wish to generate your own transformation.

By using the parameters use_scaling, scaling_type and scale_factor, you can scale the data to attenuate either large or small values. To attenuate large values, and thereby increase the visibility of smaller ribbons, use

use_scaling  = yes
scaling_type = atten_large
# by increasing the scale factor, the effect is magnified
scale_factor = 1

Similarly, use atten_small to attenuate small values to decrease their visibility. The transformations used for these two schemes are

Figure Predefined transformations to attenuate large or small ribbons are available through the use_scaling and scaling_type parameters. For example, if your data has a lot of small ribbons that are not interesting (but you would like to keep them in the figure), consider using scaling_type=atten_small to reduce their size.

normalizing segments

The previous two sections described how individual cell values can be transformed to affect the visualization. By applying transformations like log(X), for example, you can reduce the dynamic range in the data and effectively depict a table with a large spread of values.

Independently, you can normalize the data on a row or column basis. For example, one very useful normalization is to transform all the segments to be the same size. Doing so will draw attention to relationships in the table based on their relative values.

Normalization can be done in two ways. In the first, cell values are altered (e.g. so that each row adds up to the same total), and any tick mark values on the figure will reflect this change. In the second, the segments are visually scaled: tick mark values will show the original values, but won't be uniformly spaced across segments.

To normalize segments to be the same size

use_segment_normalization      = yes
segment_normalization_function = 1000
segment_normalization_scheme   = value

When the normalization function is a constant, all segments are scaled to that same total. Because the scheme is set to "value", the scaling is done by remapping the values of their cells.
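As a rough illustration of value-based normalization, the Python sketch below rescales each row so that its cells add up to the same total. It covers rows only and is not how parse-table is implemented; the normalize_rows function and the sample table are hypothetical.

```python
def normalize_rows(table, total=1000):
    """Rescale every row so that its cell values sum to `total`."""
    normalized = {}
    for row, cells in table.items():
        row_sum = sum(cells.values())
        if row_sum == 0:
            # An empty row cannot be rescaled; keep it unchanged.
            normalized[row] = dict(cells)
            continue
        normalized[row] = {col: v * total / row_sum for col, v in cells.items()}
    return normalized

# Made-up table: row B is five times larger than row A.
table = {"A": {"X": 10, "Y": 30}, "B": {"X": 100, "Y": 100}}
print(normalize_rows(table))
```

After normalization both rows total 1000, so the figure emphasizes how each row is divided among its columns rather than how large the row is.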

Figure Segments can be normalized using a variety of schemes. Here, segments are adjusted to be of the same length.

A variety of normalization schemes are available - please see the parse-table.conf file comments immediately before segment_normalization_function.

ratio layout - drawing segments for rows and columns with the same label

So far, all the sample data had rows and columns with different labels. In other words, there were no rows that had the same label as a column. In many cases, however, data with shared labels are what you have.

For example, you may have a list of countries with the flow of travelers between them. Canada may be in a row (as a departure country) and in a column (as a destination country). In this case, row=Canada col=France would be the number of people traveling from Canada to France, whereas row=France col=Canada would be the number of people traveling in the other direction.

For a detailed description of this approach, see visualizing ratios in the first of the Visualizing Tabular Data articles. For an example, see the ratio layout for dating color preference.

Figure In a ratio layout, the two cells (row A, col B) and (row B, col A) are encoded by a single ribbon whose end at segment A represents the value at (A,B) and at segment B the value at (B,A).
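The pairing of cells described in the caption can be sketched in a few lines of Python. This is only an illustration of the idea, with a hypothetical ratio_ribbons function and made-up traveler counts; labels that appear only as a row or only as a column are ignored here.

```python
def ratio_ribbons(table):
    """Pair cells (A,B) and (B,A) into one ribbon per pair of shared labels."""
    rows = set(table)
    cols = {col for cells in table.values() for col in cells}
    shared = sorted(rows & cols)
    ribbons = []
    for i, a in enumerate(shared):
        for b in shared[i:]:
            end_at_a = table[a].get(b, 0)  # value of cell (row a, col b)
            end_at_b = table[b].get(a, 0)  # value of cell (row b, col a)
            ribbons.append((a, b, end_at_a, end_at_b))
    return ribbons

# Made-up counts: rows are departure countries, columns are destinations.
flows = {
    "Canada": {"Canada": 5, "France": 120},
    "France": {"Canada": 80, "France": 0},
}
for ribbon in ratio_ribbons(flows):
    print(ribbon)
```

Each tuple holds the two segment labels followed by the ribbon's value at each end. For a diagonal cell such as (Canada, Canada), both ends carry the same value.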

The ratio layout is only feasible if you have at least one shared label between rows and columns. To toggle this mode, use

ribbon_variable = yes
ribbon_variable_intra_collapse = yes

The ribbon_variable_intra_collapse flag, when set, collapses the ribbons for transpositive cells (e.g. (A,A)) so that they do not occupy twice the space of their value (i.e. the start and end of the ribbon are superimposed and the ribbon becomes more like a peak).

CREATING CIRCOS CONFIGURATION AND DATA with make-conf

The output of parse-table is an intermediate file that stores table, row and column statistics, and information about the position of the ribbons used to represent cell values. It is the role of make-conf to take this file and generate data and configuration files that Circos can use.

Remember that Circos, by itself, cannot analyze and process data. Its role is to draw the data, and it needs help (here, from parse-table) to be able to make sense of tables.

To use make-conf, simply provide the output directory where the data files should be written

# first parse
cat samples/table-01.txt | bin/parse-table -conf samples/parse-table-01.conf > tmp.1
# now create data/conf
cat tmp.1 | bin/make-conf -dir data

# or chain together
cat samples/table-01.txt | bin/parse-table -conf samples/parse-table-01.conf | bin/make-conf -dir data

Take a look in the data/ directory - you'll find files that describe the size of each segment (in karyotype.txt), define the positions for each ribbon (in cells.txt), as well as other files. In general, you will not need to make modifications to these files.

With this input data, Circos still needs to be told how large the image should be, at what radius to place the segments, how thick the segments should be, the geometry of the links and a lot of other parameters that make up the figure. These parameters, however, are independent of the tabular nature of the data and are therefore controlled independently.

In the tableviewer/etc directory you'll find circos.conf, ideogram.conf and ticks.conf. These files control the look of the final image and have been set up to generate the kinds of images you see here. Feel free to adjust the parameters (e.g. add tick marks, decrease segment spacing, etc) to suit your needs.

For example, the link track is defined thus

<link cellvalues>
ribbon        = yes
flat          = yes
file          = data/cells.txt
bezier_radius = 0.0r
radius        = 0.999r-15p
thickness     = 1
color         = grey
stroke_color     = black
stroke_thickness = 1
<rules>

<rule>
importance = 95
condition  = 1
radius1    = 0.999r+2p
flow       = continue
</rule>
</rules>
</link>

The effect of the rule is to move the start of each ribbon closer to its row segment, thereby distinguishing the roles of row and column segments.