Software and Libraries


TensorFlow | TensorFlow is an open-source software library for high-performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. | tensorflow
# Example for ConvNets
from tensorflow import keras
from tensorflow.keras import layers

model = keras.Sequential()
model.add(layers.Conv1D(64, 9, activation='relu'))
model.add(layers.Conv1D(32, 4, activation='relu'))
# Flatten the convolutional output so the dense layers give one prediction per sample
# (added on the assumption of one binary label per input sequence)
model.add(layers.Flatten())
model.add(layers.Dense(100, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', metrics=['accuracy'])
# x and y represent the input and labels, respectively
model.fit(x, y)
pybedtools | pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python | bedtools
#Example for finding overlapping features using bedtools intersect
from pybedtools import BedTool
bedtool_a = BedTool(input_bedfile_a)
bedtool_b = BedTool(input_bedfile_b)
a_with_b = bedtool_a.intersect(bedtool_b)
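To make the result of an intersect concrete, the overlap logic can be sketched in plain Python. This is an illustrative stand-in, not how pybedtools or BEDTools is implemented: the `Interval` alias and the `intersect` function are hypothetical names, and intervals are assumed to be BED-style (0-based start, half-open end). It mimics the default behaviour of reporting the overlapping portion of each A feature for every B feature it overlaps.

```python
from typing import List, Tuple

# Hypothetical representation: (chrom, start, end), BED-style 0-based half-open
Interval = Tuple[str, int, int]


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the overlapping portion of each A interval with each B interval."""
    hits = []
    for chrom_a, start_a, end_a in a:
        for chrom_b, start_b, end_b in b:
            if chrom_a != chrom_b:
                continue  # intervals on different chromosomes never overlap
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # half-open intervals that merely touch do not overlap
                hits.append((chrom_a, start, end))
    return hits


features_a = [("chr1", 100, 200), ("chr1", 500, 600)]
features_b = [("chr1", 150, 250), ("chr2", 100, 200)]
print(intersect(features_a, features_b))  # [('chr1', 150, 200)]
```

The nested loop is quadratic and only meant for illustration; for real data the pybedtools call above is the better choice.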
bedtools | bedtools: flexible tools for genome arithmetic and DNA sequence analysis. | bedtools; option bedtools_version 2.23.0 | bedtools <subcommand> [options]
Find overlapping intervals: bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>
Profile the nucleotide content of intervals in a FASTA file: bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf>
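If these two subcommands need to be called from a Python script, one possible approach is to build the argument list and hand it to `subprocess`. The sketch below is only a thin wrapper under stated assumptions: the helper names `intersect_cmd`, `nuc_cmd`, and `run` are hypothetical, the file names are placeholders, and `bedtools` is assumed to be on the `PATH`. Note that `bedtools nuc` takes the intervals through `-bed` in addition to the FASTA file given with `-fi`.

```python
import shutil
import subprocess
from typing import List


def intersect_cmd(a_path: str, b_path: str) -> List[str]:
    """Argument list for: bedtools intersect -a <a_path> -b <b_path>"""
    return ["bedtools", "intersect", "-a", a_path, "-b", b_path]


def nuc_cmd(fasta_path: str, bed_path: str) -> List[str]:
    """Argument list for: bedtools nuc -fi <fasta_path> -bed <bed_path>"""
    return ["bedtools", "nuc", "-fi", fasta_path, "-bed", bed_path]


def run(cmd: List[str]) -> str:
    """Run a command and return its standard output as text."""
    if shutil.which(cmd[0]) is None:
        raise FileNotFoundError(f"{cmd[0]} not found on PATH")
    return subprocess.run(cmd, check=True, capture_output=True, text=True).stdout


# Placeholder file names; nothing is executed here, the command is only printed.
print(" ".join(intersect_cmd("a.bed", "b.bed")))
```

Passing a list instead of a shell string avoids quoting problems with file names that contain spaces.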
lastz | lastz: Local Alignment Search Tool, blastZ-like | lastz | lastz target [query] [options]
matlab | Numerical computing environment and programming language | matlab | matlab
meme | Tool for discovering motifs in a group of related DNA or protein sequences. | meme | meme <dataset> [optional arguments], where <dataset> is a file containing sequences in FASTA format
fimo | FIMO scans a sequence database for individual matches to each of the motifs you provide. | meme | fimo --no-qvalue --oc ./outputDir/ --bgfile motif-file /panfs/ /panfs/
rstudio | User interface for R | rstudio; option rstudio_version 0.98.493 | rstudio


fimoScript | Script to get the transcription factor binding sites (TFBS) of a FASTA sequence | /panfs/ OnefastaSequence motifFile threshold nonRedundant
threshold options: NONE, 1, 5; nonRedundantPWMs options: 50, 75, 90 (default is ByName)
e.g. /panfs/ /panfs/ /panfs/ 5
The output is written to fimoConverted.txt
pan_df | Checks available space on panfs. | /opt/panfs/bin/pan_df -H /panfs/
LiftOver | Converts genome coordinates and genome annotation files between assemblies | /panfs/.../devdcode/common/softwareTools/liftOver chainFile bedFile outputFile
tfbsFrag | Extracts a list of FIMO TFBS for a region in the human (or mouse) genome | /panfs/.../devdcode/common/bin/tfbsFrag | tfbsFrag [options] region genome

for example:
tfbsFrag chr1:11120081-11120094 hg19
tfbsFrag chr1:11120081-11120094 hg19 -r (coordinates relative to the input region)
tfbsFrag chr1:11120081-11120094 hg19 -t (tfSearch output format)
tfbsFrag chr1:11120081-11120094 hg19 -s 1 (stringency 1 per 10kb)
tfbsFrag chr1:11120081-11120094 hg19 -n 1 (non redundant PWMs)

available genomes: hg19, mm10, and mm9
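
When regions are generated by a script, it can help to validate the `chrom:start-stop` string and the genome name before calling tfbsFrag. The following is a minimal sketch, not part of tfbsFrag: `Region`, `parse_region`, and `SUPPORTED_GENOMES` are hypothetical names, and the genome list is copied from the line above. Whether tfbsFrag treats coordinates as 0-based or 1-based is not stated here, so the values are passed through unchanged.

```python
import re
from typing import NamedTuple

# Genomes listed as available for tfbsFrag in this document
SUPPORTED_GENOMES = {"hg19", "mm10", "mm9"}

_REGION_RE = re.compile(r"^([^:\s]+):(\d+)-(\d+)$")


class Region(NamedTuple):
    chrom: str
    start: int
    stop: int


def parse_region(text: str) -> Region:
    """Parse a 'chrom:start-stop' string such as 'chr1:11120081-11120094'."""
    match = _REGION_RE.match(text.strip())
    if match is None:
        raise ValueError(f"not a chrom:start-stop region: {text!r}")
    chrom, start, stop = match.group(1), int(match.group(2)), int(match.group(3))
    if start > stop:
        raise ValueError(f"start is greater than stop in {text!r}")
    return Region(chrom, start, stop)


region = parse_region("chr1:11120081-11120094")
genome = "hg19"
if genome not in SUPPORTED_GENOMES:
    raise ValueError(f"unsupported genome: {genome}")
print(region.chrom, region.start, region.stop)
```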
fetch_controls | Fetches controls (controlled for GC, repeat content, and length) for a list of regions in a genome | /panfs/.../devdcode/common/bin/fetch_controls | fetch_controls [options] db[hg19] element_list[chrom:start-stop]

for example:
fetch_controls hg19 signal_list.txt
fetch_controls hg19 signal_list.txt -n=X (X control sequences per signal sequence)
fetch_controls hg19 signal_list.txt -e (avoid overlaps with refGene+knownGene coding exons)
fetch_controls hg19 signal_list.txt -nm (no repeat masking)
fetch_controls hg19 signal_list.txt -ng (no GC matching)
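
If the signal regions start out as a BED file, the element list can be produced with a few lines of Python. This sketch rests on assumptions that the text above does not confirm: that `signal_list.txt` holds one `chrom:start-stop` entry per line, and that no coordinate shift is needed between BED and the expected format. The function name `bed_to_element_list` is hypothetical.

```python
from typing import Iterable, List


def bed_to_element_list(bed_lines: Iterable[str]) -> List[str]:
    """Convert BED records to 'chrom:start-stop' strings (coordinates unchanged)."""
    elements = []
    for line in bed_lines:
        line = line.strip()
        # Skip blank lines, comments, and UCSC track/browser header lines
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, stop = line.split()[:3]
        elements.append(f"{chrom}:{int(start)}-{int(stop)}")
    return elements


example_bed = ["chr1\t100\t200\tpeak1", "# comment", "chr2\t5\t10"]
print("\n".join(bed_to_element_list(example_bed)))
```

Writing the returned strings to a file, one per line, would then give a candidate input for the commands above, subject to the assumptions already noted.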
annotateElements.pl | Annotates genic features (intergenic, UTR, etc.) for a set of elements | /panfs/.../devdcode/common/bin/ element_list[chrom:start-stop] db[hg19]


MEME motifs database | TF motifs database | /panfs/
CTCF-CTCF interactions | from ChIA-PET | /panfs/
PWM IDs (with info content) to TF names | PWMs to TF names | /panfs/