\documentclass{article} \usepackage{url,Sweave} %\VignetteIndexEntry{The microseq package vignette} \title{The \texttt{microseq} package vignette} \author{Lars Snipen and Kristian Hovde Liland} \date{} \begin{document} %\SweaveOpts{concordance=TRUE} \maketitle \section{Using \texttt{dplyr} and \texttt{stringr}} An idea behind this package is to keep sequence data in the generic data structures in R instead of creating new and complex data types. This makes it possible to use the power of standard data manipulation tools that R-users are familiar with. Both FASTA and FASTQ files are read into tables, and sequences are stored as texts. This makes it straightforward to use all the tools available in packages like \texttt{dplyr} and \texttt{stringr}, for data wrangling and string manipulations. Both input and output FASTA or FASTQ files may be gzipped, no need for decompression. Functions for findings ORFs or genes (\texttt{findOrfs, findrRNA, findGenes}) return results as GFF-formatted tables, i.e. a standard \texttt{tibble} with either texts or numbers in the columns. Many bioinformatic software tools produce results as tables, if you let them. Reading, wrangling and plotting data in tables is what R does best! \section{External software} Some functions in this package calls upons external software that must be available on the system. Some of these are 'installed' by simply downloading a binary executable that you put somewhere proper on your computer. To make such programs visible to R, you typically need to update your \texttt{PATH} environment variable, to specify where these executables are located. Try it out, and use google for help! \subsection{Software \texttt{muscle}} The functions \emph{msalign()} and \emph{muscle()} uses the free software \texttt{muscle} (https://www.drive5.com/muscle/). From the website you download (and unzip) an executable. NB! Change its name to \texttt{muscle}, no more and no less (i.e. no version numbers etc). In the R console the command <>= system("muscle -h") @ should produce some sensible output. \subsection{Software \texttt{barrnap}} The functions \emph{findrRNA()} uses the free software \texttt{barrnap} (https://github.com/tseemann/barrnap). The GitHub site explains how to install. In the R console the command <>= system("barrnap -h") @ should produce some sensible output. \subsection{Software \texttt{prodigal}} The functions \emph{findGenes()} uses the free software \texttt{prodigal} (https://github.com/hyattpd/Prodigal). The GitHub site explains how to install. In the R console the command <>= system("prodigal -h") @ should produce some sensible output. \end{document}