Unofficial Julia interface to the RNAstructure program suite for RNA structure prediction and analysis. Please cite the appropriate publications listed on the RNAstructure website if you use this library.
Enter the package mode from the Julia REPL by pressing ] and then install with
add RNAstructure
using RNAstructure
Sequences passed to RNAstructure use the following convention:
- uppercase character: normal nucleotide, U equivalent to T
- lowercase character: nucleotide cannot form basepairs
- X or N character: unknown base or base that cannot interact with others (cannot pair or stack)
See the RNAstructure manual section for sequences for more details.
Some programs make exceptions to these rules. Check the manual pages of the RNAstructure programs for details on any differences.
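The convention above can be illustrated with a small pure-Julia helper. This is only a sketch: `describe_base` and `describe_sequence` are hypothetical names that are not part of RNAstructure.jl, and they encode just the three rules listed here, not any program-specific exceptions.

```julia
# Hypothetical helper (not part of RNAstructure.jl): classify one
# sequence character according to the convention described above.
function describe_base(c::Char)
    if c in ('X', 'N')
        return :unknown_or_noninteracting   # cannot pair or stack
    elseif islowercase(c)
        return :cannot_basepair             # lowercase: excluded from pairing
    elseif isuppercase(c)
        return :normal                      # regular nucleotide (U equivalent to T)
    else
        return :unrecognised
    end
end

# Classify every position of a sequence.
describe_sequence(seq::AbstractString) = [describe_base(c) for c in seq]

describe_sequence("GGaNCC")
```

Such a check can be handy before passing a mixed-case sequence to a folding function, to confirm which positions will be excluded from pairing.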
The environment variable RNASTRUCTURE_JL_DATAPATH can be set to
override the directory where energy parameters are read from. For the
cyclefold_* functions the environment variable is called
RNASTRUCTURE_JL_CYCLEFOLD_DATAPATH.
In the original RNAstructure program these environment variables are
called DATAPATH and CYCLEFOLD_DATAPATH. RNAstructure.jl (this
package) sets these environment variables automatically to the
corresponding installation directory of the RNAstructure_jll binary
package. The names of the env vars were changed to avoid clashes with
possible settings you might already have in your shell startup files
from a pre-existing manual RNAstructure installation, which could be a
different version and have different parameters. In this way, you can
be sure that this package uses the correct parameters, while still
allowing you to override them if necessary.
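As a minimal sketch, the override could be set from within Julia as shown below. The directory is a hypothetical placeholder, and the assumption that the variable is picked up when the package is loaded (so it is set before `using RNAstructure`) is not confirmed here.

```julia
# Hypothetical directory containing RNAstructure energy parameters.
custom_datapath = joinpath(homedir(), "RNAstructure", "data_tables")

# Assumption: the override is picked up when the package is loaded, so
# it is set before `using RNAstructure`.
ENV["RNASTRUCTURE_JL_DATAPATH"] = custom_datapath

# using RNAstructure   # load the package afterwards

# Check what is currently set.
println(get(ENV, "RNASTRUCTURE_JL_DATAPATH", "(not set)"))
```

Setting the variable in the shell before starting Julia should have the same effect.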
The mfe function calculates the minimum free energy and the
corresponding minimum free energy structure of an RNA
sequence. Internally, this function calls the Fold program from
RNAstructure.
Additional information on the Fold program and possible command-line
options that can be passed via args can be found at the
RNAstructure Fold documentation.
# returns mfe and structure
mfe("GGGAAACCC") # -> (-1.2 kcal mol^-1, "(((...)))")
# set temperature to 300 K
mfe("GGGAAACCC"; args=`-T 300`) # -> (-1.9 kcal mol^-1, "(((...)))")
# show possible options for args
mfe(""; args=`-h`)
Generate suboptimal structures for a nucleic acid
sequence. Internally, this function calls the Fold program from
RNAstructure.
Additional information on the Fold program and possible command-line
options that can be passed via args can be found at the
RNAstructure Fold documentation.
subopt("GGGAAACCC")
subopt("GGGGAAACCCC"; args=`-w 0 -p 100`)
# show possible options for args
subopt(""; args=`-h`)
Generate all suboptimal structures in an energy range for a nucleic
acid sequence using the AllSub program from RNAstructure.
Additional information on the AllSub program and possible
command-line options that can be passed via args can be found at
the RNAstructure AllSub documentation.
subopt_all("GGGAAACCC")
# maximum absolute energy difference of 10 kcal/mol to the MFE, up to
# 500 percent relative difference to MFE
subopt_all("GGGGAAACCCC"; args=`-a 10 -p 500`)
# set temperature to 300 K
subopt_all("GGGGAAACCCC"; args=`-T 300`)
# show possible options for args
subopt_all(""; args=`-h`)
The partfn function calculates the partition function and returns
the ensemble free energy for a nucleotide sequence.
Additional information on the EnsembleEnergy program and possible
command-line options that can be passed via args can be found at
the RNAstructure EnsembleEnergy documentation.
partfn("GGGAAACCC")
partfn("GGGAAACCC"; args=`--DNA`)
# show possible options for args
partfn(""; args=`-h`)
The prob_of_structure function calculates the probability of a
secondary structure for a given nucleotide sequence.
The supported args are those common to energy and partfn.
prob_of_structure("GGGAAACCC", "(((...)))")
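For intuition, in a Boltzmann ensemble the probability of a structure with free energy E is exp(-E/RT)/Q, which can be written as exp(-(E - G)/RT) with the ensemble free energy G = -RT log(Q). The sketch below evaluates that expression in plain Julia. Whether prob_of_structure computes the probability in exactly this way is an assumption, `boltzmann_probability` is a hypothetical helper, the default temperature of 310.15 K is assumed, and the input values are made up.

```julia
# Gas constant in kcal / (mol K).
const R_KCAL = 1.98720425864083e-3

# Hypothetical helper: Boltzmann probability of a structure with free
# energy `E`, given the ensemble free energy `G_ens` (both in kcal/mol)
# and the temperature `T` in kelvin.
boltzmann_probability(E, G_ens; T=310.15) = exp(-(E - G_ens) / (R_KCAL * T))

# Made-up example values, not output of RNAstructure.
p = boltzmann_probability(-1.2, -1.5)
println(p)
```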
The mea function predicts the maximum expected accuracy structure
(and possibly suboptimals) for a nucleotide sequence.
Additional information on the partition program and possible
command-line options that can be passed via args_partition can be
found at the RNAstructure partition documentation.
Additional information on the MaxExpect program and possible
command-line options that can be passed via args_maxexpect can be
found at the RNAstructure MaxExpect documentation.
mea("GGGAAACCC")
mea("GGGAAACCC"; args_partition=`-T 300`, args_maxexpect=`-s 10 -w 0`)
# show possible options for args_partition, args_maxexpect
mea(""; args_partition=`-h`)
The energy function calls the efn2 program and parses its
output. It calculates the folding free energy and experimental
uncertainty of a sequence and one or more secondary structures.
Additional information on the efn2 program and possible command-line
options that can be passed via args can be found at the
RNAstructure efn2 documentation.
# returns energy and experimental uncertainty
energy("GGGAAACCC",
       "(((...)))")
# pseudoknot
energy("GGGAAAAGGGAAAACCCAAAACCC",
       "(((....[[[....)))....]]]")
# set temperature to 300 K
energy("GGGAAAAGGGAAAACCCAAAACCC",
       "(((....[[[....)))....]]]";
       args=`-T 300`)
# multiple structures, returns array of results
energy("GGGAAACCC",
       ["(((...)))",
        "((.....))"])
# show possible options for args
energy("", ""; args=`-h`)
The bpp function calls the partition and ProbabilityPlot
programs from RNAstructure to calculate the basepair probabilities for
an RNA sequence.
bpp("GGGAAACCC") # -> 9x9 Matrix
# show possible options for args
bpp(""; args=`-h`)
Sample secondary structures from the Boltzmann ensemble of secondary structures.
Additional information on the stochastic program and possible
command-line options that can be passed via args can be found at
the RNAstructure stochastic documentation.
# returns a 1000-element Vector{String}
sample_structures("GGGAAACCC")
# show possible options for args
sample_structures(""; args=`-h`)
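Since the samples come back as a vector of dot-bracket strings, structure frequencies can be estimated by simple counting. The sketch below uses a small made-up vector in place of real output, so it runs without the package.

```julia
# Toy sample standing in for the Vector{String} of sampled structures.
samples = ["(((...)))", "(((...)))", ".........", "((.....))", "(((...)))"]

# Count how often each structure occurs.
counts = Dict{String,Int}()
for s in samples
    counts[s] = get(counts, s, 0) + 1
end

# Convert counts to relative frequencies, most frequent first.
freqs = sort([(s, n / length(samples)) for (s, n) in counts];
             by = last, rev = true)

for (s, f) in freqs
    println(s, "  ", f)
end
```

With enough samples these frequencies approximate the Boltzmann probabilities of the individual structures.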
The cyclefold_* functions call the CycleFold program from
RNAstructure, which uses the nucleotide cyclic motif model by
(Parisien & Major, 2008). This model allows for non-canonical and
canonical basepairs.
NOTE: use the energy with caution. I think the energy unit is kJ/mol, but I am not sure.
Additional information on the CycleFold program and possible
command-line options that can be passed via args can be found at the
RNAstructure CycleFold documentation.
cyclefold_mea("GGGAAACCC") # -> [9, 8, 7, 6, 0, 4, 3, 2, 1]
cyclefold_mfe("GGGAAACCC") # -> (-7.8305 kJ mol^-1, [9, 8, 7, 6, 0, 4, 3, 2, 1])
cyclefold_bpp("GGGAAACCC") # -> 9×9 Matrix{Float64}
# show possible options for args
cyclefold_mea(""; args=`-h`)
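The example outputs above look like pair tables, where entry i holds the position paired with i and 0 marks an unpaired position. That reading is an assumption based on the example output. Under it, a pair table can be turned into dot-bracket notation with a short helper. `pairtable_to_dbn` is a hypothetical name, not part of RNAstructure.jl, and the sketch only handles pseudoknot-free structures.

```julia
# Hypothetical helper: convert a pair table (entry i is the partner of
# position i, 0 means unpaired) to dot-bracket notation.
# Only valid for pseudoknot-free structures.
function pairtable_to_dbn(pt::AbstractVector{<:Integer})
    return join(j == 0 ? '.' : (j > i ? '(' : ')')
                for (i, j) in enumerate(pt))
end

pairtable_to_dbn([9, 8, 7, 6, 0, 4, 3, 2, 1])  # -> "((((.))))"
```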
The design function calls the design program from RNAstructure.
Additional information on the design program and possible
command-line options that can be passed via args can be found at
the RNAstructure design documentation.
target = "(((...)))"
# returns designed sequence and random seed used for design
design(target)
# set the random number seed used by the design process
seed = 42
design(target; args=`-s $seed`)
# show possible options for args
design(""; args=`-h`)
The ensemble_defect function calls the EDcalculator program from
RNAstructure. It calculates the ensemble defect and normalised
ensemble defect of a sequence and one or more secondary structures.
Additional information on the EDcalculator program and possible
command-line options that can be passed via args can be found at
the RNAstructure EDcalculator documentation.
seq = "GGGAAACCC"
dbn = "(((...)))"
dbns = [dbn, "((.....))"]
ensemble_defect(seq, dbn)
ensemble_defect(seq, dbns)
ensemble_defect("AAACCCTTT", "(((...)))"; args=`-a dna`)
# show possible options for args
ensemble_defect("", ""; args=`-h`)
The remove_pknots function returns the pseudoknot-free
substructure with the maximum possible number of basepairs.
remove_pknots("(((...[[[[...)))...]]]]") # -> "......((((.........))))"
This function uses the dot2ct program from RNAstructure to convert a
secondary structure in dot-bracket notation and optionally a sequence
to the ct (connectivity table) format.
# if no sequence is given, it will be all "N" in the resulting ct
# format output
dbn2ct("(((...)))")
# pseudoknots work as well
dbn2ct("(((...[[[...)))...]]]")
dbn2ct("(((...[[[...{{{...<<<...)))...]]]...}}}...>>>")
dbn2ct("(((...)))"; seq="GGGAAACCC")
dbn2ct(["(((...)))", "........."]; title="A sequence", seq="GGGAAACCC")
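The core information a connectivity table stores is, for each position, the index of its pairing partner (0 if unpaired). The pure-Julia sketch below derives that table from dot-bracket notation, with one stack per bracket type so that pseudoknots are handled. `dbn_to_pairtable` is a hypothetical helper for illustration and is unrelated to how dot2ct is implemented.

```julia
# Hypothetical helper: parse dot-bracket notation into a pair table.
# Each bracket type gets its own stack, so pseudoknots written with
# different bracket types are handled.
function dbn_to_pairtable(dbn::AbstractString)
    closing = Dict(')' => '(', ']' => '[', '}' => '{', '>' => '<')
    stacks = Dict(o => Int[] for o in values(closing))
    pt = zeros(Int, length(dbn))
    for (i, c) in enumerate(dbn)
        if haskey(stacks, c)
            push!(stacks[c], i)            # opening bracket: remember position
        elseif haskey(closing, c)
            stack = stacks[closing[c]]
            isempty(stack) && error("unbalanced '$c' at position $i")
            j = pop!(stack)                # matching opening position
            pt[i] = j
            pt[j] = i
        elseif c != '.'
            error("unexpected character '$c' at position $i")
        end
    end
    all(isempty, values(stacks)) || error("unbalanced opening bracket")
    return pt
end

dbn_to_pairtable("(((...)))")  # -> [9, 8, 7, 0, 0, 0, 3, 2, 1]
```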
This function uses the ct2dot program from RNAstructure to convert a
secondary structure and sequence in ct (connectivity table) format to
dot-bracket notation.
ct = dbn2ct("(((...)))"; title="TITLE", seq="GGGAAACCC")
print(ct)
ct2dbn(ct) # -> (title = "TITLE", seq = "GGGAAACCC", dbn = "(((...)))")
ct2 = dbn2ct("........."; title="TITLE2", seq="NNNAAANNN")
ct_twostruct = ct * ct2
print(ct_twostruct)
ct2dbn(ct_twostruct, 2) # -> (title = "TITLE2", seq = "NNNAAANNN", dbn = ".........")
This function uses the draw program from RNAstructure to plot a
secondary structure in dot-bracket notation to SVG format. This
should show an image when used in Jupyter and Pluto notebooks.
Additional information on the draw program and possible command-line
options that can be passed via args can be found at the
RNAstructure draw documentation.
plot("(((...)))", "GGGAAACCC")
plot("(((...)))", "GGGAAACCC"; args=`--circle`)
plot("(((...)))", "GGGAAACCC"; args=`--flat`)
plot("(((...)))", "GGGAAACCC"; args=`--uncircled`)
These functions set up input files automatically and read output files,
but don't parse the results. They typically return the exit status of
the RNAstructure program, the contents of the output file, and
stdout/stderr output. Additional command-line arguments can be passed
to the programs with the keyword argument args.
The AllSub program calculates all suboptimal structures within a
certain energy range.
See the RNAstructure AllSub documentation for
more details and for command-line arguments that can be passed via
args.
RNAstructure.run_AllSub("GGGAAACCC")
RNAstructure.run_AllSub("GGGAAACCC"; args=`-a 10 -p 500`)
The ct2dot program converts secondary structures in connectivity
table (ct) format to dot-bracket notation.
See the RNAstructure ct2dot documentation for
more details and for command-line arguments that can be passed via
args.
ct = dbn2ct("(((...)))")
RNAstructure.run_ct2dot(ct)
ct2 = ct * dbn2ct(".........")
RNAstructure.run_ct2dot(ct2, 2)
The dot2ct program converts secondary structures in dot-bracket
notation to connectivity table (ct) format.
See the RNAstructure dot2ct documentation for
more details and for command-line arguments that can be passed via
args.
RNAstructure.run_dot2ct("(((...)))")
RNAstructure.run_dot2ct("(((...)))"; seq="GGGAAACCC")
RNAstructure.run_dot2ct(["(((...)))", "........."];
                        title="A sequence", seq="GGGAAACCC")
The draw program draws secondary structure diagrams.
See the RNAstructure draw documentation for more
details and for command-line arguments that can be passed via
args.
RNAstructure.run_draw("(((...)))", "GGGAAACCC"; args=`--svg`)
The EDcalculator program calculates the ensemble defect of a
sequence and one or more secondary structures.
See the RNAstructure EDcalculator documentation
for more details and for command-line arguments that can be passed via
args.
RNAstructure.run_EDcalculator("GGGAAACCC", "(((...)))")
RNAstructure.run_EDcalculator("GGGAAACCC", ["(((...)))", "((.....))"])
RNAstructure.run_EDcalculator("GGGAAACCC", "(((...)))"; args=`-a dna`)
The efn2 program calculates the folding free energy of a sequence
and one or more secondary structures.
See the RNAstructure efn2 documentation for more
details and for command-line arguments that can be passed via
args.
RNAstructure.run_efn2("GGGAAACCC", "(((...)))")
RNAstructure.run_efn2("GGGAAACCC", ["(((...)))", "((.....))"])
RNAstructure.run_efn2("GGGAAACCC", "(((...)))"; args=`-T 300`)
The EnsembleEnergy program calculates the ensemble energy of
structures for an RNA sequence, given by the formula -RT log(Q).
See the RNAstructure EnsembleEnergy documentation
for more details and for command-line arguments that can be passed via
args.
RNAstructure.run_EnsembleEnergy("GGGAAACCC")
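To make the formula -RT log(Q) concrete, the sketch below evaluates it for a toy ensemble in plain Julia, taking Q as the sum of Boltzmann factors exp(-E/RT) over all structures. `ensemble_energy` is a hypothetical helper, the energies are made up, and the default temperature of 310.15 K is an assumption. The real program sums over all possible structures of the sequence rather than a short list.

```julia
# Gas constant in kcal / (mol K).
const R_GAS = 1.98720425864083e-3

# Hypothetical helper: ensemble free energy -RT log(Q) from a list of
# structure free energies (kcal/mol) at temperature T (kelvin).
function ensemble_energy(energies; T=310.15)
    RT = R_GAS * T
    Q = sum(exp(-E / RT) for E in energies)   # partition function
    return -RT * log(Q)
end

# Made-up energies for three structures, not output of RNAstructure.
ensemble_energy([0.0, -1.2, -0.4])
```

The result is always at or below the lowest individual energy, because every additional structure adds a positive term to Q.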
The Fold program calculates minimum free energy (mfe) and suboptimal
structures.
See the RNAstructure Fold documentation for more
details and for command-line arguments that can be passed via
args.
RNAstructure.run_Fold("GGGAAACCC")
RNAstructure.run_Fold("GGGAAACCC"; args=`-mfe`)
The MaxExpect program predicts the maximum expected accuracy (MEA)
structure for an RNA sequence.
See the RNAstructure MaxExpect documentation for
more details and for command-line arguments that can be passed via
args.
pf_savefile = "out.pfs"
RNAstructure.run_partition!(pf_savefile, "GGGAAACCC")
RNAstructure.run_MaxExpect(pf_savefile)
The partition program calculates the partition function and basepair
probabilities for an RNA sequence and saves this information in a
partition save file, which can then be used by other programs.
See the RNAstructure partition documentation for
more details and for command-line arguments that can be passed via
args.
# write the partition function save file to "save.pfs", overwriting
# any data if the file already exists
RNAstructure.run_partition!("save.pfs", "GGGAAACCC")
The ProbabilityPlot program extracts basepair probabilities from a
partition function save file generated with partition and can output
them as a text file or as a dot plot.
See the RNAstructure ProbabilityPlot documentation
for more details and for command-line arguments that can be passed via
args.
pf_savefile = "save.pfs"
RNAstructure.run_partition!(pf_savefile, "GGGAAACCC")
RNAstructure.run_ProbabilityPlot(pf_savefile)
The RemovePseudoknots program removes pseudoknots from an RNA
secondary structure, returning either the structure with the most base
pairs or the structure with lowest folding free energy.
See the RNAstructure RemovePseudoknots documentation
for more details and for command-line arguments that can be passed via
args.
# maximise basepairs in returned structure
dbn = "((...[[[[...))..]].]]"
RNAstructure.run_RemovePseudoknots("N"^length(dbn), dbn; args=`-m`)
# return pseudoknot-free structure with lowest folding free energy at
# a temperature of 300 K for a given sequence
seq = "GGAAAAUGCAAACCAAGCAAU"
RNAstructure.run_RemovePseudoknots(seq, dbn; args=`-T 300`)
The stochastic program samples from the Boltzmann ensemble of
secondary structures.
See the RNAstructure stochastic documentation
for more details and for command-line arguments that can be passed via
args.
RNAstructure.run_stochastic("GGGAAACCC")