Bioinformatics Computational Infrastructure

The MidSouth Bioinformatics Center (MBC) at UALR provides bioinformatics consulting, training, technical assistance, and access to our computational infrastructure to aid faculty, students, and researchers in the region. The MBC has assembled a variety of computing platforms that allow bioinformaticians to store and analyze life science data. The Bioinformatics Core is strongly committed to the use and promotion of open-source software. Most of the software supported by the MBC is either open-source or developed in-house. A brief list of supported tools, packages, and applications follows.

• Programming languages – R, Perl, Python, C/C++

• Tools for next-generation sequencing data – Tuxedo suite (TopHat2, Bowtie2, Cufflinks, cummeRbund), Trim Galore, Trinity, FastQC, Genome Analysis Toolkit (GATK), Bismark, FASTX-Toolkit

• Miscellaneous packages – EMBOSS, ViennaRNA package, SAMtools, tools from NCBI (ncbi-blast, sratoolkit), tools from Novocraft, HMMER

• Machine learning – C5.0, Cubist

With support from the Arkansas INBRE, the MBC has assembled its computational infrastructure. The core of this infrastructure is a new large storage fileserver. We use a system integration approach to make the data available to the rest of the infrastructure through a private network switch. The infrastructure includes a small high-performance computing (HPC) cluster running Rocks OS: a distributed-memory multiprocessor system that uses message passing (MPI) and can run independent jobs simultaneously. This cluster is used as a teaching tool. For example, it can be used to compare and contrast shared-memory systems with distributed-memory systems. Students have the opportunity to write and test code on the MBC’s cluster and to become familiar with submitting jobs through a scheduler, which helps them apply their skills on larger, more powerful institutional clusters such as the UALR HPC systems.
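To illustrate the shared-memory versus distributed-memory comparison mentioned above, here is a minimal Python sketch. It is not MBC course material, and the function names are hypothetical. As an assumption for illustration, local processes connected by pipes stand in for MPI ranks; on the cluster itself the same pattern would normally be written with MPI. The sketch also assumes a POSIX system (such as the cluster's Linux nodes), because it uses the "fork" start method.

```python
import multiprocessing as mp
import threading


def shared_memory_sum(data, n_workers=4):
    """Shared-memory style: threads live in one address space and
    write their partial sums directly into a common list."""
    partial = [0] * n_workers

    def work(i):
        # Each thread reads the shared input and writes to its own slot.
        partial[i] = sum(data[i::n_workers])

    threads = [threading.Thread(target=work, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return sum(partial)


def _rank(conn, chunk):
    """Hypothetical worker standing in for an MPI rank: it owns a private
    chunk and reports its result only by sending a message."""
    conn.send(sum(chunk))
    conn.close()


def message_passing_sum(data, n_workers=4):
    """Distributed-memory style: each process has private memory, so
    results must be communicated explicitly (here over pipes)."""
    ctx = mp.get_context("fork")  # assumption: POSIX-only start method
    pipes, procs = [], []
    for i in range(n_workers):
        parent_end, child_end = ctx.Pipe()
        p = ctx.Process(target=_rank, args=(child_end, data[i::n_workers]))
        p.start()
        pipes.append(parent_end)
        procs.append(p)
    total = sum(conn.recv() for conn in pipes)  # gather the partial sums
    for p in procs:
        p.join()
    return total


if __name__ == "__main__":
    numbers = list(range(1000))
    print(shared_memory_sum(numbers), message_passing_sum(numbers))
```

Both functions compute the same total. The difference is in how the partial results reach the coordinator: through a shared list in the first case, and through explicit messages in the second.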

The target audience for our systems includes all researchers who are working with “omics” data. The MBC can store and process large data sets (e.g., next-generation sequencing data in *.bam, *.sam, and *.fastq files, as well as other formats). The MBC has set up a new fileserver with 22TB of storage and 128GB of RAM. One of its roles is to serve files to both the MBC’s shared-memory systems and the cluster. This new server, along with a machine that has 64GB of RAM, is also used for RAM-intensive jobs. For CPU-intensive processing, the MBC has several multi-CPU systems, including the new server with 16 CPUs and two other servers with 16 and 8 CPUs, respectively.