Software and Libraries

Facilities

Name | Description | Facilities | Usage | Manual/Tutorial
TensorFlow | TensorFlow is an open source software library for high performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), and from desktops to clusters of servers to mobile and edge devices. | tensorflow
# Example for ConvNets
from tensorflow import keras
from tensorflow.keras import layers
model = keras.Sequential()
model.add(layers.Conv1D(64, 9, activation='relu'))
model.add(layers.BatchNormalization())
model.add(layers.Conv1D(32, 4, activation='relu'))
model.add(layers.Flatten())
model.add(layers.Dense(100, activation='relu'))
model.add(layers.Dense(1, activation='sigmoid'))
model.compile(loss='binary_crossentropy', metrics=['accuracy'])
# x and y represent the inputs and labels, respectively.
# Conv1D expects x with shape (samples, steps, channels).
model.fit(x, y)
https://www.tensorflow.org/
pybedtools | pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python. | bedtools
#Example for finding overlapping features using bedtools intersect
from pybedtools import BedTool
bedtool_a = BedTool(input_bedfile_a)
bedtool_b = BedTool(input_bedfile_b)
a_with_b = bedtool_a.intersect(bedtool_b)
https://daler.github.io/pybedtools/
bedtools | bedtools: flexible tools for genome arithmetic and DNA sequence analysis. | bedtools; option bedtools_version 2.23.0 | bedtools <subcommand> [options]
Find overlapping intervals: bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>
Profile the nucleotide content of intervals in a FASTA file: bedtools nuc [OPTIONS] -fi <fasta>
http://bedtools.readthedocs.org/en/latest/index.html
lastz | lastz: Local Alignment Search Tool, blastZ-like | lastz | lastz target [query] [options] | http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html
matlab | Numerical computing environment and programming language | matlab | matlab | http://www.mathworks.com/help/pdf_doc/matlab/getstart.pdf
meme | Tool for discovering motifs in a group of related DNA or protein sequences. | meme | meme <dataset> [optional arguments], where <dataset> is a file containing sequences in FASTA format | http://meme-suite.org/doc/meme.html?man_type=web
fimo | FIMO scans a sequence database for individual matches to each of the motifs you provide. | meme | fimo --no-qvalue --oc ./outputDir/ --bgfile motif-file /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/all.meme /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/random10kbHuman.fa | http://meme-suite.org/doc/fimo.html?man_type=web
rstudio | User interface for R | rstudio; option rstudio_version 0.98.493 | rstudio | http://dss.princeton.edu/training/RStudio101.pdf
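The ConvNet in the TensorFlow usage example stacks two Conv1D layers without setting padding or strides. Assuming the Keras defaults (padding='valid', strides=1), each layer shortens the sequence by kernel_size - 1, which in turn fixes how many features Flatten passes to the first Dense layer. The following is a minimal sketch of that arithmetic in plain Python; the helper names and the input length of 200 are illustrative assumptions and are not part of the original example.

```python
def conv1d_output_length(length, kernel_size, stride=1):
    """Output steps of a 1D convolution with 'valid' padding (assumed default)."""
    if length < kernel_size:
        raise ValueError("input is shorter than the kernel")
    return (length - kernel_size) // stride + 1


def flattened_features(input_length, conv_layers):
    """Features after Flatten for a stack of (filters, kernel_size) Conv1D layers."""
    if not conv_layers:
        raise ValueError("at least one convolutional layer is required")
    steps = input_length
    filters = None
    for filters, kernel_size in conv_layers:
        steps = conv1d_output_length(steps, kernel_size)
    # Flatten multiplies the remaining steps by the filters of the last layer.
    return steps * filters


# Layers from the example: Conv1D(64, 9) followed by Conv1D(32, 4).
# BatchNormalization does not change the shape, so it is omitted here.
layers_spec = [(64, 9), (32, 4)]
n_features = flattened_features(200, layers_spec)  # 200 is a hypothetical input length
print(n_features)
```

Inputs shorter than the combined kernel footprint (here 12 steps) raise an error in this sketch, which mirrors the fact that such inputs cannot pass through both layers with 'valid' padding.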

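As a rough illustration of what the intersect examples in the table above report, the sketch below computes pairwise overlaps between two lists of BED-style intervals in plain Python. It assumes 0-based, half-open coordinates (the BED convention) and uses a simple quadratic scan. It is not how bedtools or pybedtools implement the operation, and the function name and sample intervals are made up for illustration.

```python
def intersect(features_a, features_b):
    """Return the shared portion of every A feature with every B feature.

    Each feature is a (chrom, start, end) tuple with 0-based, half-open
    coordinates, so intervals that merely touch do not overlap.
    """
    overlaps = []
    for chrom_a, start_a, end_a in features_a:
        for chrom_b, start_b, end_b in features_b:
            if chrom_a != chrom_b:
                continue
            start = max(start_a, start_b)
            end = min(end_a, end_b)
            if start < end:  # at least one shared base
                overlaps.append((chrom_a, start, end))
    return overlaps


# Hypothetical intervals, for illustration only.
a = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 50, 80)]
b = [("chr1", 150, 350), ("chr2", 80, 120)]
print(intersect(a, b))
```

For real data, bedtools intersect or BedTool.intersect is the better choice, since a nested loop like this scales poorly with the number of features.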


Programs

Name | Description | Location | Usage
fimoScript | Script to get the TFBS of a FASTA sequence | /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/fimoScript.sh | fimoScript.sh OnefastaSequence motifFile threshold nonRedundant
threshold options: NONE, 1, 5; nonRedundant (non-redundant PWMs) options: 50, 75, 90 (default: ByName)
e.g. /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/fimoScript.sh /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/random500.fa /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/all.meme 5
output will be in fimoConverted.txt
pan_df | Checks available space on panfs. | /opt/panfs/bin/pan_df -H /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode
LiftOver | Converts genome coordinates and genome annotation files between assemblies | /panfs/.../devdcode/common/softwareTools/liftOver chainFile bedFile outputFile
tfbsFrag | Extracts a list of FIMO TFBS for a region in the human (or mouse) genome | /panfs/.../devdcode/common/bin/tfbsFrag | tfbsFrag [options] region genome

for example:
tfbsFrag chr1:11120081-11120094 hg19
tfbsFrag chr1:11120081-11120094 hg19 -r (coordinates relative to the input region)
tfbsFrag chr1:11120081-11120094 hg19 -t (tfSearch output format)
tfbsFrag chr1:11120081-11120094 hg19 -s 1 (stringency 1 per 10kb)
tfbsFrag chr1:11120081-11120094 hg19 -n 1 (non redundant PWMs)

available genomes: hg19, mm10, and mm9
fetch_controls | Fetches controls (matched for GC content, repeat content, and length) for a list of regions in a genome | /panfs/.../devdcode/common/bin/fetch_controls | fetch_controls [options] db[hg19] element_list[chrom:start-stop]

for example:
fetch_controls hg19 signal_list.txt
fetch_controls hg19 signal_list.txt -n=X (X control sequences per signal sequence)
fetch_controls hg19 signal_list.txt -e (avoid overlaps w/ refGene+knownGene coding exons)
fetch_controls hg19 signal_list.txt -nm (no repeat masking)
fetch_controls hg19 signal_list.txt -ng (no GC matching)
annotateElements.pl | Annotates genic features (intergenic, UTR, etc.) for a set of elements | /panfs/.../devdcode/common/bin/annotateElements.pl | annotateElements.pl element_list[chrom:start-stop] db[hg19]
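When tfbsFrag is run over many regions, it can help to validate each region string and assemble the argument list in a script first. The sketch below parses a chrom:start-stop region and builds a command using only the options shown in the tfbsFrag examples above (-r, -t, -s, -n). The function names are hypothetical, placing options after the positional arguments follows those examples, and whether the options can be combined is an assumption. The command is only printed here, not executed.

```python
import re
import shlex

REGION_RE = re.compile(r"^([\w.]+):(\d+)-(\d+)$")
GENOMES = {"hg19", "mm10", "mm9"}  # genomes listed as available for tfbsFrag


def parse_region(region):
    """Split 'chrom:start-stop' into (chrom, start, stop) with integer coordinates."""
    match = REGION_RE.match(region)
    if match is None:
        raise ValueError(f"not a chrom:start-stop region: {region!r}")
    chrom, start, stop = match.group(1), int(match.group(2)), int(match.group(3))
    if start > stop:
        raise ValueError(f"start is greater than stop in {region!r}")
    return chrom, start, stop


def build_tfbsfrag_command(region, genome, relative=False, tfsearch=False,
                           stringency=None, non_redundant=None):
    """Assemble a tfbsFrag argument list (hypothetical helper, not part of tfbsFrag)."""
    parse_region(region)  # raises ValueError on malformed input
    if genome not in GENOMES:
        raise ValueError(f"unsupported genome: {genome!r}")
    cmd = ["tfbsFrag", region, genome]
    if relative:
        cmd.append("-r")  # coordinates relative to the input region
    if tfsearch:
        cmd.append("-t")  # tfSearch output format
    if stringency is not None:
        cmd += ["-s", str(stringency)]  # stringency per 10kb
    if non_redundant is not None:
        cmd += ["-n", str(non_redundant)]  # non-redundant PWMs
    return cmd


cmd = build_tfbsfrag_command("chr1:11120081-11120094", "hg19", relative=True, stringency=1)
print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run on a machine where tfbsFrag is on the PATH.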



Libraries

Name | Description | Location
MEME motifs database | TF motifs database | /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/all.meme
CTCF-CTCF interactions | From ChIA-PET | /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/CTCF-CTCF_interactions_Handoko2011_NatGen.xls
PWM IDs (with info content) to TF names | PWMs to TF names | /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/TFBSinfocontent.txt
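To check which motifs a MEME-format file such as all.meme contains, the motif identifiers can be read from its MOTIF lines. The sketch below assumes the file follows the MEME minimal motif format, in which each motif starts with a line of the form "MOTIF identifier [alternate name]". The function name is hypothetical, and the sample text is made up for illustration; it does not come from all.meme.

```python
def list_motifs(meme_text):
    """Return (identifier, alternate_name) pairs for each MOTIF line in the text."""
    motifs = []
    for line in meme_text.splitlines():
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "MOTIF":
            identifier = fields[1]
            alternate_name = fields[2] if len(fields) > 2 else None
            motifs.append((identifier, alternate_name))
    return motifs


# Made-up sample in MEME minimal format, for illustration only.
sample = """MEME version 4

ALPHABET= ACGT

MOTIF M0001 CTCF
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
0.25 0.25 0.25 0.25
0.10 0.40 0.40 0.10

MOTIF M0002
letter-probability matrix: alength= 4 w= 1 nsites= 20 E= 0
0.70 0.10 0.10 0.10
"""
print(list_motifs(sample))
```

To inspect the real database, read the file contents with open(path).read() and pass the text to the same function.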