You are reading the documentation for version 1.6 of OpenStructure. You may also want to read the documentation for:
1.1
1.2
1.3
1.4
1.5
1.7
1.7.1
1.8
1.9
1.10
1.11
2.0
2.1
2.2
devel

Adding/Removing/Reordering data  
AddRow() 
add a row to the table 
AddCol() 
add a column to the table 
RemoveCol() 
remove a column from the table 
RenameCol() 
rename a column 
Extend() 
append a table to the end of another table 
Merge() 
merge two tables together 
Sort() 
sort table by column 
Filter() 
filter table by values 
Select() 
select subtable based on query 
Zip() 
extract multiple columns at once 
SearchColNames() 
search for matching column names 
Input/Output  
Save() 
save a table to a file 
Load() 
load a table from a file 
ToString() 
convert a table to a string for printing 
Simple Math  
Min() 
compute the minimum of a column 
Max() 
compute the maximum of a column 
Sum() 
compute the sum of a column 
Mean() 
compute the mean of a column 
RowMean() 
compute the mean for each row 
Median() 
compute the median of a column 
StdDev() 
compute the standard deviation of a column 
Count() 
compute the number of items in a column 
More Sophisticated Math  
Correl() 
compute Pearson’s correlation coefficient 
SpearmanCorrel() 
compute Spearman’s rank correlation coefficient 
ComputeMCC() 
compute Matthew’s correlation coefficient 
ComputeROC() 
compute receiver operating characteristics (ROC) 
ComputeEnrichment() 
compute enrichment 
GetOptimalPrefactors() 
compute optimal coefficients for linear combination of columns 
Plot  
Plot() 
Plot data in 1, 2 or 3 dimensions 
PlotHistogram() 
Plot data as histogram 
PlotROC() 
Plot receiver operating characteristics (ROC) 
PlotEnrichment() 
Plot enrichment 
PlotHexbin() 
Hexagonal density plot 
PlotBar() 
Bar plot 
Table
(col_names=[], col_types=None, **kwargs)¶The table class provides convenient access to data in tabular form. An empty table can be easily constructed as follows
tab = Table()
If you want to add columns directly when creating the table, column names and column types can be specified as follows
tab = Table(['nameX','nameY','nameZ'], 'sfb')
this will create three columns called nameX, nameY and nameZ of type string, float and bool, respectively. There will be no data in the table and thus, the table will not contain any rows.
The following column types are supported:
name  abbrev 

string  s 
float  f 
int  i 
bool  b 
If you want to add data to the table in addition, use the following:
tab=Table(['nameX','nameY','nameZ'],
'sfb',
nameX = ['a','b','c'],
nameY = [0.1, 1.2, 3.414],
nameZ = [True, False, False])
if values for one column is left out, they will be filled with NA, but if values are specified, all values must be specified (i.e. same number of values per column)
AddCol
(col_name, col_type, data=None)¶Add a column to the right of the table.
Parameters: 


Example:
tab = Table(['x'], 'f', x=range(5))
tab.AddCol('even', 'bool', itertools.cycle([True, False]))
print tab
'''
will produce the table
==== ====
x even
==== ====
0 True
1 False
2 True
3 False
4 True
==== ====
'''
If data is a constant instead of an iterable object, it’s value will be written into each row:
tab = Table(['x'], 'f', x=range(5))
tab.AddCol('num', 'i', 1)
print tab
'''
will produce the table
==== ====
x num
==== ====
0 1
1 1
2 1
3 1
4 1
==== ====
'''
As a special case, if there are no previous rows, and data is not None, rows are added for every item in data.
AddRow
(data, overwrite=None)¶Add a row to the table.
data may either be a dictionary or a listlike object:
 If data is a dictionary, the keys in the dictionary must match the column names. Columns not found in the dict will be initialized to None. If the dict contains listlike objects, multiple rows will be added, if the number of items in all listlike objects is the same, otherwise a
ValueError
is raised. If data is a listlike object, the row is initialized from the values in data. The number of items in data must match the number of columns in the table. A
ValuerError
is raised otherwise. The values are added in the order specified in the list, thus, the order of the data must match the columns.
If overwrite is not None and set to an existing column name, the specified column in the table is searched for the first occurrence of a value matching the value of the column with the same name in the dictionary. If a matching value is found, the row is overwritten with the dictionary. If no matching row is found, a new row is appended to the table.
Parameters: 


Raises: 

Raises: 

Example: add multiple data rows to a subset of columns using a dictionary
# create table with three float columns
tab = Table(['x','y','z'], 'fff')
# add rows from dict
data = {'x': [1.2, 1.6], 'z': [1.6, 5.3]}
tab.AddRow(data)
print tab
'''
will produce the table
==== ==== ====
x y z
==== ==== ====
1.20 NA 1.60
1.60 NA 5.30
==== ==== ====
'''
# overwrite the row with x=1.2 and add row with x=1.9
data = {'x': [1.2, 1.9], 'z': [7.9, 3.5]}
tab.AddRow(data, overwrite='x')
print tab
'''
will produce the table
==== ==== ====
x y z
==== ==== ====
1.20 NA 7.90
1.60 NA 5.30
1.90 NA 3.50
==== ==== ====
'''
ComputeEnrichment
(score_col, class_col, score_dir='', class_dir='', class_cutoff=2.0)¶Computes the enrichment of column score_col classified according to class_col.
For this it is necessary, that the datapoints are classified into positive and negative points. This can be done in two ways:
 by using one ‘bool’ type column (class_col) which contains True for positives and False for negatives
 by specifying a classification column (class_col), a cutoff value (class_cutoff) and the classification columns direction (class_dir). This will generate the classification on the fly
 if
class_dir==''
: values in the classification column that are less than or equal to class_cutoff will be counted as positives if
class_dir=='+'
: values in the classification column that are larger than or equal to class_cutoff will be counted as positives
During the calculation, the table will be sorted according to score_dir, where a ‘‘ values means smallest values first and therefore, the smaller the value, the better.
Warning:  If either the value of class_col or score_col is None, the data in this row is ignored. 

ComputeEnrichmentAUC
(score_col, class_col, score_dir='', class_dir='', class_cutoff=2.0)¶Computes the area under the curve of the enrichment using the trapezoidal rule.
For more information about parameters of the enrichment, see
ComputeEnrichment()
.
Warning:  The function depends on numpy 

ComputeLogROCAUC
(score_col, class_col, score_dir='', class_dir='', class_cutoff=2.0)¶Computes the area under the curve of the log receiver operating characteristics (logROC) where the xaxis is semilogarithmic using the trapezoidal rule.
The logROC is computed with a lambda of 0.001 according to Rapid ContextDependent Ligand Desolvation in Molecular Docking Mysinger M. and Shoichet B., Journal of Chemical Information and Modeling 2010 50 (9), 15611573
For more information about parameters of the ROC, see
ComputeROC()
.
Warning:  The function depends on numpy 

ComputeMCC
(score_col, class_col, score_dir='', class_dir='', score_cutoff=2.0, class_cutoff=2.0)¶Compute Matthews correlation coefficient (MCC) for one column (score_col) with the points classified into true positives, false positives, true negatives and false negatives according to a specified classification column (class_col).
The datapoints in score_col and class_col are classified into positive and negative points. This can be done in two ways:
 by using ‘bool’ columns which contains True for positives and False for negatives
 by using ‘float’ or ‘int’ columns and specifying a cutoff value and the columns direction. This will generate the classification on the fly
 if
class_dir
/score_dir==''
: values in the classification column that are less than or equal to class_cutoff/score_cutoff will be counted as positives if
class_dir
/score_dir=='+'
: values in the classification column that are larger than or equal to class_cutoff/score_cutoff will be counted as positives
The two possibilities can be used together, i.e. ‘bool’ type for one column and ‘float’/’int’ type and cutoff/direction for the other column.
ComputeROC
(score_col, class_col, score_dir='', class_dir='', class_cutoff=2.0)¶Computes the receiver operating characteristics (ROC) of column score_col classified according to class_col.
For this it is necessary, that the datapoints are classified into positive and negative points. This can be done in two ways:
 by using one ‘bool’ column (class_col) which contains True for positives and False for negatives
 by using a nonbool column (class_col), a cutoff value (class_cutoff) and the classification columns direction (class_dir). This will generate the classification on the fly
 if
class_dir==''
: values in the classification column that are less than or equal to class_cutoff will be counted as positives if
class_dir=='+'
: values in the classification column that are larger than or equal to class_cutoff will be counted as positives
During the calculation, the table will be sorted according to score_dir, where a ‘‘ values means smallest values first and therefore, the smaller the value, the better.
If class_col does not contain any positives (i.e. value is True (if column is of type bool) or evaluated to True (if column is of type int or float (depending on class_dir and class_cutoff))) the ROC is not defined and the function will return None.
Warning:  If either the value of class_col or score_col is None, the data in this row is ignored. 

ComputeROCAUC
(score_col, class_col, score_dir='', class_dir='', class_cutoff=2.0)¶Computes the area under the curve of the receiver operating characteristics using the trapezoidal rule.
For more information about parameters of the ROC, see
ComputeROC()
.
Warning:  The function depends on numpy 

Correl
(col1, col2)¶Calculate the Pearson correlation coefficient between col1 and col2, only taking rows into account where both of the values are not equal to None. If there are not enough data points to calculate a correlation coefficient, None is returned.
Parameters: 


Count
(col, ignore_nan=True)¶Count the number of cells in column that are not equal to ‘’None’‘.
Parameters: 


Extend
(tab, overwrite=None)¶Append each row of tab to the current table. The data is appended based on the column names, thus the order of the table columns is not relevant, only the header names.
If there is a column in tab that is not present in the current table, it is added to the current table and filled with None for all the rows present in the current table.
If the type of any column in tab is not the same as in the current table a TypeError is raised.
If overwrite is not None and set to an existing column name, the specified column in the table is searched for the first occurrence of a value matching the value of the column with the same name in the dictionary. If a matching value is found, the row is overwritten with the dictionary. If no matching row is found, a new row is appended to the table.
Filter
(*args, **kwargs)¶Returns a filtered table only containing rows matching all the predicates in kwargs and args For example,
tab.Filter(town='Basel')
will return all the rows where the value of the column “town” is equal to “Basel”. Several predicates may be combined, i.e.
tab.Filter(town='Basel', male=True)
will return the rows with “town” equal to “Basel” and “male” equal to true. args are unary callables returning true if the row should be included in the result and false if not.
GaussianSmooth
(col, std=1.0, na_value=0.0, padding='reflect', c=0.0)¶In place Gaussian smooth of a column in the table with a given standard deviation. All nan are set to nan_value before smoothing.
Parameters: 


Warning:  The function depends on scipy 
GetColIndex
(col)¶Returns the column index for the column with the given name.
Raises:  ValueError if no column with the name is found. 

GetColNames
()¶Returns a list containing all column names.
GetName
()¶Get name of table
GetNumpyMatrix
(*args)¶Returns a numpy matrix containing the selected columns from the table as columns in the matrix.
Only columns of type int or float are supported. NA values in the table will be converted to None values.
Parameters:  *args – column names to include in numpy matrix 

Warning:  The function depends on numpy 
GetOptimalPrefactors
(ref_col, *args, **kwargs)¶This returns the optimal prefactor values (i.e. a, b, c, ...) for the following equation
(1)
where u, v, w and z are vectors. In matrix notation
(2)
where A contains the data from the table (u,v,w,...), p are the prefactors to optimize (a,b,c,...) and z is the vector containing the result of equation (1).
The parameter ref_col equals to z in both equations, and *args are columns u, v and w (or A in (2)). All columns must be specified by their names.
Example:
tab.GetOptimalPrefactors('colC', 'colA', 'colB')
The function returns a list of containing the prefactors a, b, c, ... in the correct order (i.e. same as columns were specified in *args).
Weighting: If the kwarg weights=”columX” is specified, the equations are weighted by the values in that column. Each row is multiplied by the weight in that row, which leads to (3):
(3)
Weights must be float or int and can have any value. A value of 0 ignores this equation, a value of 1 means the same as no weight. If all weights are the same for each row, the same result will be obtained as with no weights.
Example:
tab.GetOptimalPrefactors('colC', 'colA', 'colB', weights='colD')
GetUnique
(col, ignore_nan=True)¶Extract a list of all unique values from one column.
Parameters: 


HasCol
(col)¶Checks if the column with a given name is present in the table.
IsEmpty
(col_name=None, ignore_nan=True)¶Checks if a table is empty.
If no column name is specified, the whole table is checked for being empty, whereas if a column name is specified, only this column is checked.
By default, all NAN (or None) values are ignored, and thus, a table containing only NAN values is considered as empty. By specifying the option ignore_nan=False, NAN values are counted as ‘normal’ values.
Load
(stream_or_filename, format='auto', sep=', ')¶Load table from stream or file with given name.
By default, the file format is set to auto, which tries to guess the file format from the file extension. The following file extensions are recognized:
extension  recognized format 

.csv  comma separated values 
.pickle  pickled byte stream 
<all others>  ostspecific format 
Thus, format must be specified for reading file with different filename extensions.
The following file formats are understood:
ost
This is an ostspecific, but still human readable file format. The file (stream) must start with header line of the form
col_name1[type1] <col_name2[type2]>...
The types given in brackets must be one of the data types the
Table
class understands. Each following line in the file then must
contains exactly the same number of data items as listed in the header. The
data items are automatically converted to the column format. Lines starting
with a ‘#’ and empty lines are ignored.
pickle
Deserializes the table from a pickled byte stream.
csv
Reads the table from comma separated values stream. Since there is no explicit type information in the csv file, the column types are guessed, using the following simple rules:
Returns:  A new Table instance 

Max
(col)¶Returns the maximum value in col. If several rows have the highest value, only the first one is returned. ‘’None’’ values are ignored.
Parameters:  col (str ) – column name 

MaxIdx
(col)¶Returns the row index of the cell with the maximal value in col. If several rows have the highest value, only the first one is returned. ‘’None’’ values are ignored.
Parameters:  col (str ) – column name 

MaxRow
(col)¶Returns the row containing the cell with the maximal value in col. If several rows have the highest value, only the first one is returned. ‘’None’’ values are ignored.
Parameters:  col (str ) – column name 

Returns:  row with maximal col value or None if the table is empty 
Mean
(col)¶Returns the mean of the given column. Cells with ‘’None’’ are ignored. Returns None, if the column doesn’t contain any elements. Col must be of numeric (‘float’, ‘int’) or boolean column type.
If column type is bool, the function returns the ratio of number of ‘Trues’ by total number of elements.
Parameters:  col (str ) – column name 

Raises:  TypeError if column type is string 
Median
(col)¶Returns the median of the given column. Cells with ‘’None’’ are ignored. Returns ‘’None’‘, if the column doesn’t contain any elements. Col must be of numeric column type (‘float’, ‘int’) or boolean column type.
Parameters:  col (str ) – column name 

Raises:  TypeError if column type is string 
Min
(col)¶Returns the minimal value in col. If several rows have the lowest value, only the first one is returned. ‘’None’’ values are ignored.
Parameters:  col (str ) – column name 

MinIdx
(col)¶Returns the row index of the cell with the minimal value in col. If several rows have the lowest value, only the first one is returned. ‘’None’’ values are ignored.
Parameters:  col (str ) – column name 

MinRow
(col)¶Returns the row containing the cell with the minimal value in col. If several rows have the lowest value, only the first one is returned. ‘’None’’ values are ignored.
Parameters:  col (str ) – column name 

Returns:  row with minimal col value or None if the table is empty 
PairedTTest
(col_a, col_b)¶Twosided test for the nullhypothesis that two related samples have the same average (expected values).
Parameters: 


Returns:  Pvalue between 0 and 1 that the two columns have the same average. The smaller the value, the less related the two columns are. 
Percentiles
(col, nths)¶Returns the percentiles of column col given in nths.
The percentiles are calculated as
values[min(len(values), int(round(len(values)*p/100+0.5)1))]
where values are the sorted values of col not equal to ‘’None’’ :param: nths: list of percentiles to be calculated. Each percentile is a number
between 0 and 100.
Raises:  TypeError if column type is string 

Returns:  List of percentiles in the same order as given in nths 
Plot
(x, y=None, z=None, style='.', x_title=None, y_title=None, z_title=None, x_range=None, y_range=None, z_range=None, color=None, plot_if=None, legend=None, num_z_levels=10, z_contour=True, z_interpol='nn', diag_line=False, labels=None, max_num_labels=None, title=None, clear=True, save=False, **kwargs)¶Function to plot values from your table in 1, 2 or 3 dimensions using Matplotlib
Parameters: 


Returns:  the 
Examples: simple plotting functions
tab = Table(['a','b','c','d'],'iffi', a=range(5,0,1),
b=[x/2.0 for x in range(1,6)],
c=[math.cos(x) for x in range(0,5)],
d=range(3,8))
# one dimensional plot of column 'd' vs. index
plt = tab.Plot('d')
plt.show()
# two dimensional plot of 'a' vs. 'c'
plt = tab.Plot('a', y='c', style='o')
plt.show()
# three dimensional plot of 'a' vs. 'c' with values 'b'
plt = tab.Plot('a', y='c', z='b')
# manually save plot to file
plt.savefig("plot.png")
PlotBar
(cols=None, rows=None, xlabels=None, set_xlabels=True, xlabels_rotation='horizontal', y_title=None, title=None, colors=None, width=0.8, bottom=0, legend=False, legend_names=None, show=False, save=False)¶Create a barplot of the data in cols. Every column will be represented at one position. If there are several rows, each column will be grouped together.
Parameters: 


Title:  Title of the plot. No title appears if set to None 
PlotEnrichment
(score_col, class_col, score_dir='', class_dir='', class_cutoff=2.0, style='', title=None, x_title=None, y_title=None, clear=True, save=None)¶Plot an enrichment curve using matplotlib of column score_col classified according to class_col.
For more information about parameters of the enrichment, see
ComputeEnrichment()
, and for plotting see Plot()
.
Warning:  The function depends on matplotlib 

PlotHexbin
(x, y, title=None, x_title=None, y_title=None, x_range=None, y_range=None, binning='log', colormap='jet', show_scalebar=False, scalebar_label=None, clear=True, save=False, show=False)¶Create a heatplot of the data in col x vs the data in col y using matplotlib
Parameters: 


PlotHistogram
(col, x_range=None, num_bins=10, normed=False, histtype='stepfilled', align='mid', x_title=None, y_title=None, title=None, clear=True, save=False, color=None, y_range=None)¶Create a histogram of the data in col for the range x_range, split into num_bins bins and plot it using Matplotlib.
Parameters: 


Examples: simple plotting functions
tab = Table(['a'],'f', a=[math.cos(x*0.01) for x in range(100)])
# one dimensional plot of column 'd' vs. index
plt = tab.PlotHistogram('a')
plt.show()
PlotLogROC
(score_col, class_col, score_dir='', class_dir='', class_cutoff=2.0, style='', title=None, x_title=None, y_title=None, clear=True, save=None)¶Plot an logROC curve where the xaxis is semilogarithmic using matplotlib
For more information about parameters of the ROC, see
ComputeROC()
, and for plotting see Plot()
.
Warning:  The function depends on matplotlib 

PlotROC
(score_col, class_col, score_dir='', class_dir='', class_cutoff=2.0, style='', title=None, x_title=None, y_title=None, clear=True, save=None)¶Plot an ROC curve using matplotlib.
For more information about parameters of the ROC, see
ComputeROC()
, and for plotting see Plot()
.
Warning:  The function depends on matplotlib 

RemoveCol
(col)¶Remove column with the given name from the table.
Parameters:  col (str ) – name of column to remove 

RenameCol
(old_name, new_name)¶Rename column old_name to new_name.
Parameters: 


Raises: 

RowMean
(mean_col_name, cols)¶Adds a new column of type ‘float’ with a specified name (mean_col_name), containing the mean of all specified columns for each row.
Cols are specified by their names and must be of numeric column type (‘float’, ‘int’) or boolean column type. Cells with None are ignored. Adds ‘’None’’ if the row doesn’t contain any values.
Parameters: 


Raises: 

== Example ==
Staring with the following table:
x  y  u 

1  10  100 
2  15  None 
3  20  400 
the code here adds a column with the name ‘mean’ to yield the table below:
x  y  u  mean 

1  10  100  50.5 
2  15  None  2 
3  20  400  201.5 
SUPPORTED_TYPES
= ('int', 'float', 'bool', 'string')¶Save
(stream_or_filename, format='ost', sep=', ')¶Save the table to stream or filename. The following three file formats
are supported (for more information on file formats, see Load()
):
ost  ostspecific format (human readable) 
csv  comma separated values (human readable) 
pickle  pickled byte stream (binary) 
html  HTML table 
context  ConTeXt table 
Parameters: 


Raises: 

SearchColNames
(regex)¶Returns a list of column names matching the regex.
Parameters:  regex (str ) – regex pattern 

Returns:  list of column names (str ) 
Select
(query)¶Returns a new table object containing all rows matching a logical query expression.
query is a string containing the logical expression, that will be evaluated for every row.
Operands have to be the name of a column or an expression that can be parsed to float, int, bool or string. Valid operators are: and, or, !=, !, <=, >=, ==, =, <, >, +, , *, /
subtab = tab.Select('col_a>0.5 and (col_b=5 or col_c=5)')
The selection query should be self explaining. Allowed parenthesis are: (), [], {}, whereas parenthesis mismatches get recognized. Expressions like ‘3<=col_a>=col_b’ throw an error, due to problems in figuring out the evaluation order.
There are two special expressions:
#selects rows, where 1.0<=col_a<=1.5
subtab = tab.Select('col_a=1.0:1.5')
#selects rows, where col_a=1 or col_a=2 or col_a=3
subtab = tab.Select('col_a=1,2,3')
Only consistent types can be compared. If col_a is of type string and col_b is of type int, following expression would throw an error: ‘col_a<col_b’
SetName
(name)¶Set name of the table
Parameters:  name (str ) – name 

Sort
(by, order='+')¶Performs an inplace sort of the table, based on column by.
Parameters: 


SpearmanCorrel
(col1, col2)¶Calculate the Spearman correlation coefficient between col1 and col2, only taking rows into account where both of the values are not equal to None. If there are not enough data points to calculate a correlation coefficient, None is returned.
Warning:  The function depends on the following module: scipy.stats.mstats 

Parameters: 

Stats
(col)¶StdDev
(col)¶Returns the standard deviation of the given column. Cells with ‘’None’’ are ignored. Returns ‘’None’‘, if the column doesn’t contain any elements. Col must be of numeric column type (‘float’, ‘int’) or boolean column type.
Parameters:  col (str ) – column name 

Raises:  TypeError if column type is string 
Sum
(col)¶Returns the sum of the given column. Cells with ‘’None’’ are ignored. Returns 0.0, if the column doesn’t contain any elements. Col must be of numeric column type (‘float’, ‘int’) or boolean column type.
Parameters:  col (str ) – column name 

Raises:  TypeError if column type is string 
ToString
(float_format='%.3f', int_format='%d', rows=None)¶Convert the table into a string representation.
The output format can be modified for int and float type columns by specifying a formatting string for the parameters float_format and int_format.
The option rows specify the range of rows to be printed. The parameter
must be a type that supports indexing (e.g. a list
) containing the
start and end row index, e.g. [start_row_idx, end_row_idx].
Parameters: 


Zip
(*args)¶Allows to conveniently iterate over a selection of columns, e.g.
tab = Table.Load('...')
for col1, col2 in tab.Zip('col1', 'col2'):
print col1, col2
is a shortcut for
tab = Table.Load('...')
for col1, col2 in zip(tab['col1'], tab['col2']):
print col1, col2
Merge
(table1, table2, by, only_matching=False)¶Returns a new table containing the data from both tables. The rows are combined based on the common values in the column(s) by. The option ‘by’ can be a list of column names. When this is the case, merging is based on multiple columns. For example, the two tables below
x  y 

1  10 
2  15 
3  20 
x  u 

1  100 
3  200 
4  400 
when merged by column x, produce the following output:
x  y  u 

1  10  100 
2  15  None 
3  20  200 
4  None  400 
Enter search terms or a module, class or function name.
table
 Working with tabular data