Software and Libraries
Facilities
Name | Description | Facilities | Usage | Manual/Tutorial |
---|---|---|---|---|
TensorFlow | TensorFlow is an open-source software library for high-performance numerical computation. Its flexible architecture allows easy deployment of computation across a variety of platforms (CPUs, GPUs, TPUs), from desktops to clusters of servers to mobile and edge devices. | tensorflow | Example for ConvNets (x and y are the inputs and labels, respectively): from tensorflow import keras; from tensorflow.keras import layers; model = keras.Sequential(); model.add(layers.Conv1D(64, 9, activation='relu')); model.add(layers.BatchNormalization()); model.add(layers.Conv1D(32, 4, activation='relu')); model.add(layers.Flatten()); model.add(layers.Dense(100, activation='relu')); model.add(layers.Dense(1, activation='sigmoid')); model.compile(loss='binary_crossentropy', metrics=['accuracy']); model.fit(x, y) | https://www.tensorflow.org/ |
pybedtools | pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python. | bedtools | Example for finding overlapping features using bedtools intersect: from pybedtools import BedTool; bedtool_a = BedTool(input_bedfile_a); bedtool_b = BedTool(input_bedfile_b); a_with_b = bedtool_a.intersect(bedtool_b) | https://daler.github.io/pybedtools/ |
bedtools | Flexible tools for genome arithmetic and DNA sequence analysis. | bedtools; option bedtools_version 2.23.0 | General form: bedtools <subcommand> [options]; find overlapping intervals: bedtools intersect [OPTIONS] -a <bed/gff/vcf/bam> -b <bed/gff/vcf/bam>; profile the nucleotide content of intervals in a FASTA file: bedtools nuc [OPTIONS] -fi <fasta> -bed <bed/gff/vcf> | http://bedtools.readthedocs.org/en/latest/index.html |
lastz | LASTZ: local alignment search tool, BLASTZ-like. | lastz | lastz target [query] [options] | http://www.bx.psu.edu/miller_lab/dist/README.lastz-1.02.00/README.lastz-1.02.00a.html |
matlab | Numerical computing environment and programming language | matlab | matlab | http://www.mathworks.com/help/pdf_doc/matlab/getstart.pdf |
meme | Tool for discovering motifs in a group of related DNA or protein sequences. | meme | meme <dataset> [optional arguments], where <dataset> is a file containing sequences in FASTA format | http://meme-suite.org/doc/meme.html?man_type=web |
fimo | FIMO scans a sequence database for individual matches to each of the motifs you provide. | meme | fimo --no-qvalue --oc ./outputDir/ --bgfile motif-file /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/all.meme /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/random10kbHuman.fa | http://meme-suite.org/doc/fimo.html?man_type=web |
rstudio | User interface for R | rstudio; option rstudio_version 0.98.493 | rstudio | http://dss.princeton.edu/training/RStudio101.pdf |
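
The bedtools and pybedtools rows above both rely on `intersect`. As a rough illustration of what its default output contains (the overlapping portion of each A feature with each B feature), here is a minimal pure-Python sketch. The `Interval` type, the `intersect` function, and the coordinates are made up for this example and are not part of bedtools or pybedtools; the real tools use far more efficient algorithms than this pairwise loop.

```python
from collections import namedtuple

# BED intervals are 0-based and half-open: [start, end).
Interval = namedtuple("Interval", ["chrom", "start", "end"])


def intersect(features_a, features_b):
    """Return the overlapping portion of each A feature with each B feature."""
    overlaps = []
    for a in features_a:
        for b in features_b:
            if a.chrom != b.chrom:
                continue
            start = max(a.start, b.start)
            end = min(a.end, b.end)
            # A positive-length overlap exists only when start < end.
            if start < end:
                overlaps.append(Interval(a.chrom, start, end))
    return overlaps


# Made-up coordinates, for illustration only.
features_a = [Interval("chr1", 100, 200), Interval("chr1", 300, 400)]
features_b = [Interval("chr1", 150, 350), Interval("chr2", 100, 200)]

result = intersect(features_a, features_b)
print(result)
```

For real BED files, `BedTool(...).intersect(...)` from the pybedtools row is the intended route; this sketch only shows the coordinate logic.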
Programs
Name | Description | Location | Usage |
---|---|---|---|
fimoScript | Script to get the TFBSs of a FASTA sequence | /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/fimoScript.sh | fimoScript.sh OnefastaSequence motifFile threshold nonRedundant; threshold options: NONE, 1, 5; nonRedundantPWMs options: 50, 75, 90 (default is ByName); e.g. /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/fimoScript.sh /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/random500.fa /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/all.meme 5; output is written to fimoConverted.txt |
pan_df | Checks available space on panfs. | /opt/panfs/bin/pan_df | /opt/panfs/bin/pan_df -H /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode |
LiftOver | Converts genome coordinates and genome annotation files between assemblies | /panfs/.../devdcode/common/softwareTools/ | liftOver bedFile chainFile outputFile unmappedFile (argument order of the UCSC liftOver binary) |
tfbsFrag | Extracts a list of FIMO TFBSs for a region in the human (or mouse) genome | /panfs/.../devdcode/common/bin/tfbsFrag | tfbsFrag [options] region genome; for example: tfbsFrag chr1:11120081-11120094 hg19; tfbsFrag chr1:11120081-11120094 hg19 -r (coordinates relative to the input region); tfbsFrag chr1:11120081-11120094 hg19 -t (tfSearch output format); tfbsFrag chr1:11120081-11120094 hg19 -s 1 (stringency 1 per 10 kb); tfbsFrag chr1:11120081-11120094 hg19 -n 1 (non-redundant PWMs); available genomes: hg19, mm10, and mm9 |
fetch_controls | Fetches controls (matched for GC content, repeat content, and length) for a list of regions in a genome | /panfs/.../devdcode/common/bin/fetch_controls | fetch_controls [options] db[hg19] element_list[chrom:start-stop]; for example: fetch_controls hg19 signal_list.txt; fetch_controls hg19 signal_list.txt -n=X (X control sequences per signal sequence); fetch_controls hg19 signal_list.txt -e (avoid overlaps with refGene+knownGene coding exons); fetch_controls hg19 signal_list.txt -nm (no repeat masking); fetch_controls hg19 signal_list.txt -ng (no GC matching) |
annotateElements.pl | Annotates genic features (intergenic, UTR, etc.) for a set of elements | /panfs/.../devdcode/common/bin/annotateElements.pl | annotateElements.pl element_list[chrom:start-stop] db[hg19] |
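
Several programs above take a region written as chrom:start-stop. The sketch below shows one way to validate such a region and assemble a tfbsFrag argument list before running it, for instance with `subprocess.run`. The helper names `parse_region` and `build_tfbsfrag_command` are hypothetical, and the region pattern assumes chromosome names start with "chr"; only the flags `-r` and `-s` and the genome list are taken from the table.

```python
import re

# Genomes listed as available for tfbsFrag in the table above.
AVAILABLE_GENOMES = {"hg19", "mm10", "mm9"}
# Assumption: chromosome names start with "chr" (e.g. chr1, chrX).
REGION_PATTERN = re.compile(r"^(chr[\w.]+):(\d+)-(\d+)$")


def parse_region(region):
    """Split 'chrom:start-stop' into (chrom, start, stop), validating it."""
    match = REGION_PATTERN.match(region)
    if match is None:
        raise ValueError(f"not a chrom:start-stop region: {region!r}")
    chrom, start, stop = match.group(1), int(match.group(2)), int(match.group(3))
    if start > stop:
        raise ValueError(f"start is after stop in {region!r}")
    return chrom, start, stop


def build_tfbsfrag_command(region, genome, relative=False, stringency=None):
    """Build the argument list for a tfbsFrag call without running it."""
    parse_region(region)  # raises ValueError on a malformed region
    if genome not in AVAILABLE_GENOMES:
        raise ValueError(f"unsupported genome: {genome!r}")
    command = ["tfbsFrag", region, genome]
    if relative:
        command.append("-r")  # coordinates relative to the input region
    if stringency is not None:
        command += ["-s", str(stringency)]  # stringency per 10 kb
    return command


command = build_tfbsfrag_command("chr1:11120081-11120094", "hg19", stringency=1)
print(" ".join(command))
```

Building the command as a list rather than a single string avoids shell quoting problems when it is later passed to `subprocess.run`.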
Libraries
Name | Description | Location |
---|---|---|
MEME motifs database | TF motifs database | /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/tfFimo/all.meme |
CTCF-CTCF interactions | CTCF-CTCF interactions from ChIA-PET | /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/CTCF-CTCF_interactions_Handoko2011_NatGen.xls |
PWM IDs (with information content) to TF names | Maps PWM IDs to TF names | /panfs/pan1.be-md.ncbi.nlm.nih.gov/devdcode/common/TFBSinfocontent.txt |
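
To see which motifs a MEME-format database such as all.meme contains, the `MOTIF` lines can be listed. The sketch below assumes the file follows the MEME minimal motif format, in which each motif begins with a line of the form `MOTIF <identifier> [alternate name]`. The function name `list_motifs` and the sample motifs are made up for illustration; the sample is not taken from all.meme.

```python
def list_motifs(meme_lines):
    """Return (identifier, alternate_name) pairs from MEME-format lines."""
    motifs = []
    for line in meme_lines:
        fields = line.split()
        if len(fields) >= 2 and fields[0] == "MOTIF":
            # The alternate name is optional in the MEME motif format.
            alternate_name = fields[2] if len(fields) > 2 else None
            motifs.append((fields[1], alternate_name))
    return motifs


# Made-up sample in MEME minimal motif format, for illustration only.
SAMPLE_MEME_TEXT = """\
MEME version 4

ALPHABET= ACGT

MOTIF M0001 exampleTF_A
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
 0.70 0.10 0.10 0.10
 0.10 0.10 0.70 0.10

MOTIF M0002
letter-probability matrix: alength= 4 w= 2 nsites= 20 E= 0
 0.25 0.25 0.25 0.25
 0.10 0.70 0.10 0.10
"""

motifs = list_motifs(SAMPLE_MEME_TEXT.splitlines())
print(motifs)
```

On a real file, an open file handle can be passed instead of the sample lines, e.g. `with open(path) as handle: list_motifs(handle)`.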