Read Me File from the Package
This is the file readme.txt from the dna_composition.tgz tarball.
DNA Composition Software
(CC by-sa) 2007
R. Mark Adams, Ph.D.
INTRODUCTION
------------------------------------------------------------
In this folder you will find a set of tools in the Python programming
language for translating DNA sequences into music. This software is
under heavy development, and consequently is very rough. It is usable,
though, and produces interesting (if limited) music from pretty much
any DNA sequence it is given. It works best with sequences of a few
hundred to a few thousand base pairs: try cDNA sequences from GenBank
(see http://www.ncbi.nlm.nih.gov/entrez/) or similar-length non-coding
regions for interesting results.
The basic classes are in the Melody.py and Chords.py files, and both
are also pretty rough. The Chords in particular are limited to a narrow
subset of possible chords, but I am working on it! I based the MIDI
side of the work on the extraordinary PythonMIDI library by mxm (see
http://www.mxm.dk/products/public/pythonmidi for details).
From the documentation in aaclass_composition.py:
# Composition based on amino acid classes from translated DNA sequence
# data. It is based on the concept that amino acids fall into a well-
# defined set of chemical classes (see Adams, Das, Smith, (1996)
# Protein Science for more details.)
# Classes:
# (From the pima (Smith and Smith, 1992) man page)
#
#
# Original Amino Acid Class Hierarchy Alphabet (Class1 alphabet):
# Amino Acid Classes Match score
# -2
# _______________ X __________________ 0
# / / \ \
# _ f _ / ______r _______ \ 1
# / / \ / / / \ \ \
# / c \ e / m p \ _ j __ 2
# / / \ \ / \ / / \ / \ \ / \ \
# / a b d \ / l k o n i h \ 3
# / / \ / \ /|\ \ / / \ / \ / \ /\ / \ / \ \
# C I V L M F W Y H N D E Q K R S T A G P 5
# For both alphabets, gaps are denoted by "g"s.
# For the purposes of this composition, I will use three classes:
#
# (C,I,V,L,M,F,W,Y) - nonpolar
# (H,N,D,E,Q,K,R,S,T) - polar
# (A,G,P) - small
#
# Each class will be associated with chord transition rules, based on the
# chord transition maps developed by Steve Mugglin (see:
# http://chordmaps.com/part3.htm). When translated into a Python dictionary:
#
# {'ii':('iii','V','ii'),'iii':('vi','IV','iii'),'IV':('ii','V','I','IV'),\
# 'V':('I','V'),'vi':('IV','ii','I'),'I':('ii','V','iii','vi','IV','V','I')}
#
# This is easily translated into the 'C' key, which I am using for these
# experiments (thanks, emacs replace-regexp!):
#
# {'Dm':('Em','G','Dm'),'Em':('Am','F','Em'),'F':('Dm','G','C','F'),\
# 'G':('C','G'),'Am':('F','Dm','C'),'C':('Dm','G','Em','Am','F','G','C')}
#
# So: The algorithm is:
#
# (I) Each base gets a middle 'C' eighth-note hit <done>
# (II) For each attempted codon translation:
# (a) For a codon not in the open reading frame, no additional notes
# (b) For a codon in a reading frame:
# (0) If it is a stop codon, make the current chord 'I' ('C')
# (1) If it is not in the current biochemical class:
# Make an allowed chord the current chord, based on the
# codon triplet: sum the values of its three bases (given
# below), modulo the number of allowed transitions
# Make the current melody note the base of the current chord
# (2) If it is the same biochemical class:
# Stay in the current chord and make that the current chord
# (III) For the bases in the codon triplet:
# (a) For each of the three bases:
# (1) If it is an 'A': make the current melody note a note higher
# in the current chord ......A=+1
# (2) If it is a 'T': make the current note a note lower in the
# current chord ......T=-1
# (3) If it is a 'G': make the current note the same in the current
# chord ......G=0
# (4) If it is a 'C': add a rest; add no new note this codon
# ......C=0
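The chord rules in steps (II) and (III) above can be sketched in a few
lines of Python. This is an illustration only, not the code from
aaclass_composition.py. The names CODON_TABLE, AA_CLASS, next_chord,
melody_steps and chord_walk are made up for this example. It assumes the
standard genetic code, treats the whole sequence as a single reading frame
starting at the first base, assumes a stop codon leaves the current
biochemical class unchanged, and is written for Python 3.

```python
# Hypothetical sketch of steps (II) and (III); all names are illustrative.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"

# Standard genetic code: codon -> one-letter amino acid ('*' marks a stop).
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}

# The three classes used for the composition.
AA_CLASS = {}
for members, name in (("CIVLMFWY", "nonpolar"),
                      ("HNDEQKRST", "polar"),
                      ("AGP", "small")):
    for aa in members:
        AA_CLASS[aa] = name

# Chord transition map in the key of C, copied from the documentation above.
TRANSITIONS = {
    'Dm': ('Em', 'G', 'Dm'), 'Em': ('Am', 'F', 'Em'),
    'F': ('Dm', 'G', 'C', 'F'), 'G': ('C', 'G'),
    'Am': ('F', 'Dm', 'C'),
    'C': ('Dm', 'G', 'Em', 'Am', 'F', 'G', 'C'),
}

# Per-base values: A=+1, T=-1, G=0, C=0.
BASE_VALUE = {'A': 1, 'T': -1, 'G': 0, 'C': 0}


def next_chord(chord, aa_class, codon):
    """Return (chord, aa_class) after one in-frame codon."""
    amino = CODON_TABLE[codon]
    if amino == '*':
        # Stop codon: back to 'I' ('C'). Keeping the class is an assumption.
        return 'C', aa_class
    new_class = AA_CLASS[amino]
    if new_class == aa_class:
        return chord, aa_class  # same class: stay in the current chord
    allowed = TRANSITIONS[chord]
    index = sum(BASE_VALUE[b] for b in codon) % len(allowed)
    return allowed[index], new_class


def melody_steps(codon):
    """Per-base melody moves: +1, -1, 0, or None for a rest ('C')."""
    return [None if b == 'C' else BASE_VALUE[b] for b in codon]


def chord_walk(sequence, start_chord='C'):
    """Chord chosen for each complete codon, reading from the first base."""
    sequence = sequence.upper()
    chord, aa_class, chords = start_chord, None, []
    for i in range(0, len(sequence) - 2, 3):
        chord, aa_class = next_chord(chord, aa_class, sequence[i:i + 3])
        chords.append(chord)
    return chords
```

Under these assumptions, chord_walk("ATGGCCTAA") gives Dm, Em, C: ATG
(M, nonpolar) sums to 0 and picks the first transition out of C, GCC
(A, small) sums to 0 and picks the first transition out of Dm, and TAA
is a stop codon.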
INSTALLATION
------------------------------------------------------------
You will need a working installation of Python (you can get it from
http://www.python.org). I have tested it on version 2.3 on Linux, Mac
OS X and Windows XP with no problems.
Expand the contents of the dna_composition.tgz tarball somewhere
convenient.
You will also need to grab PythonMIDI from mxm's site:
http://www.mxm.dk/products/public/pythonmidi/download/midi.0.1.1.tar.gz
You will need at minimum MidiOutFile.py, constants.py, MidiOutStream.py,
DataTypeConverters.py, and RawOutstreamFile.py from that tarball.
Either expand it and put it somewhere in your PYTHONPATH or just put
the files you need in the DNA_Composion directory (from dna_composition.tgz)
and you should be all set.
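One way to confirm that the five PythonMIDI files can be found is a small
check like the one below. It is only a sketch: the function name
missing_modules is made up for this example, and it uses
importlib.util.find_spec, which needs Python 3.4 or later (newer than the
version tested above), so run it with the same interpreter and from the
same directory you plan to use for the composition script.

```python
import importlib.util

# Module names taken from the list of required PythonMIDI files.
REQUIRED = ("MidiOutFile", "constants", "MidiOutStream",
            "DataTypeConverters", "RawOutstreamFile")


def missing_modules(names=REQUIRED):
    """Return the names that cannot be located on the current import path."""
    return [n for n in names if importlib.util.find_spec(n) is None]


if __name__ == "__main__":
    missing = missing_modules()
    print("missing:", ", ".join(missing) if missing else "none")
```

The check only locates the modules; it does not import them. Note that
"constants" is a very generic name, so an unrelated module with that name
elsewhere on the path would also satisfy it.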
RUNNING
------------------------------------------------------------
Running the software is easy; just type:
python aaclass_composition.py <input_file>
Where <input_file> is a file containing a DNA sequence.
The sequence data can be in pretty much any format, as the software
ignores anything that is not in [ATGCatcg]. This can lead to problems
if the data you have contains names or other annotation data with
those characters in them. I have included a couple of example files,
test.dna (non-coding data from ancient viral DNA integrated into the
human genome) and cDNA_test.dna (the coding region for alcohol
dehydrogenase) for you to try out. Note that I just cut the DNA
sequences out of their respective GenBank entries at
http://www.ncbi.nlm.nih.gov/entrez .
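To see why stray annotation text matters, here is a minimal sketch of the
kind of filtering described above. The function name extract_bases is
made up for this example, and folding the result to upper case is an
assumption; the actual script may handle case differently.

```python
import re

# Anything that is not one of the four bases (either case) is dropped.
NON_BASE = re.compile(r"[^ATGCatgc]")


def extract_bases(text):
    """Return only the A/T/G/C characters of text, as upper case."""
    return NON_BASE.sub("", text).upper()
```

Line numbers and spaces disappear as intended, so extract_bases("1 atggcc taa")
returns "ATGGCCTAA". A header word is not so lucky: extract_bases("LOCUS test")
returns "CTT", three bases that were never part of the sequence. Cutting the
input down to the bare sequence first avoids this.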
You will see a whole bunch of information zip by on the screen. If
you are on a slow computer, you can comment out the "print" calls in
the aaclass_composition.py file. Someday I will put the
appropriate "debug" flags in, etc. Right after I finish the Chord
object... :-)
Three files will result:
<input_file>_dna_base_track.mid
<input_file>_dna_melody_track.mid
<input_file>_protein_chord.mid
They can be played directly with your MIDI instrument or combined via
a sequencer for even more fun.
BUGS, THOUGHTS and TODOs
------------------------------------------------------------
The software should run as advertised if properly set up. I have not
tested it as thoroughly as I should, and so I am sure that there are
all kinds of input that will give it fits, or at least give
unpredictable results. YMMV.
As mentioned, the Chord object needs lots of work. I am still porting
big sections of the original C++ code to Python, and so it is only
marginally functional: most of the interesting chord progressions
cannot be added at this point. I also want to add more structure and
robustness so that it can be used more interactively or in other
contexts. The music that comes out from the sequences is neither as
varied nor as useful as an "auralization" as I would like, so this is
merely the beginning.
I didn't want to hold onto the code forever, though, so out it goes.
As mentioned above, it is Creative Commons Attribution-ShareAlike,
so please go ahead and do anything you want with it. If you are
kind you will drop me a note or a pointer to the cool stuff you have
built with it. If you find bugs, or have features you think would be
useful, please pass them along! If you want to fix them, so much the
better! I will incorporate the changes into the codebase happily.
Happy DNAing- hope you have as much fun as I did exploring the genome.
-Mark
rmadams@epotential.com
Read the source code for more details.

