class: center, middle

# Data structures in Python

---
# Outline

* Arrays / Lists
* Sets
* Dictionaries
* Objects

---
# Walking through arrays

```python
>>> l = [ 'a', 'b', 'c' ]
>>> print l
['a', 'b', 'c']
>>> print ",".join(l)
a,b,c
>>> l.append(100)
>>> print l
['a', 'b', 'c', 100]
>>> print ",".join(l)
TypeError: sequence item 3: expected string, int found
>>> print ";".join(str(x) for x in l)
a;b;c;100
```

---
# Some built-in functions

* [range()](https://docs.python.org/2/library/functions.html#range) - _range(start, stop[, step])_

```python
>>> range(5,10,1)
[5, 6, 7, 8, 9]
>>> range(5,-1,-1)
[5, 4, 3, 2, 1, 0]
```

* [sorted()](https://docs.python.org/2/library/functions.html#sorted) - Return a new sorted list from the items in iterable.

```python
>>> l = ['aaa','Bat', 'zoo', 'hog']
# case sensitive: uppercase letters sort before lowercase
>>> sorted(l)
['Bat', 'aaa', 'hog', 'zoo']
# sort case insensitive (key=str.lower gives the same result)
>>> sorted(l,cmp= lambda x,y: cmp(x.lower(), y.lower()))
['aaa', 'Bat', 'hog', 'zoo']
# can use reverse option to flip order
>>> sorted(l,cmp= lambda x,y: cmp(x.lower(), y.lower()),reverse=True)
['zoo', 'hog', 'Bat', 'aaa']
```

---
# More built-in functions

* [map()](https://docs.python.org/2/library/functions.html#map) - apply a function to every item of a list and return the results as a new list

```python
>>> l = [ 'a', 100, 12/3.3 ]
>>> print l
['a', 100, 3.6363636363636367]
>>> print ";".join(map(str,l))
a;100;3.63636363636
>>> l = ['A','Bear','AAAGGH']
>>> lens = map(len,l)
>>> print lens
[1, 4, 6]
```

* [reversed()](https://docs.python.org/2/library/functions.html#reversed) - iterate over an array/string in reverse order

```python
>>> l = ['zzz','yyy']
>>> for n in reversed(l):
...     print n
...
yyy
zzz
```

---
# Remove an item from list

* [del](https://docs.python.org/2/tutorial/datastructures.html#del) - remove an item or a slice from a list by index (del is a statement, not a function)

```python
>>> l = ['a','b','c','d']
>>> del l[2]
>>> print l
['a', 'b', 'd']
>>> del l[1:3]
>>> print l
['a']
```

---
# More array functions

See more details here https://docs.python.org/2/tutorial/datastructures.html

* list.append(x) - 
Add an item to the end of the list.
* list.extend(L) - Extend the list by appending all the items in the given list.
* list.insert(i, x) - Insert an item at a given position. The first argument is the index of the element before which to insert, so a.insert(0, x) inserts at the front of the list, and a.insert(len(a), x) is equivalent to a.append(x).
* list.remove(x) - Remove the first item from the list whose value is x. It is an error if there is no such item.
* list.pop([i]) - Remove the item at the given position in the list, and return it. If no index is specified, a.pop() removes and returns the last item in the list.
* list.index(x) - Return the index in the list of the first item whose value is x. It is an error if there is no such item.
* list.count(x) - Return the number of times x appears in the list.
* list.sort(cmp=None, key=None, reverse=False) - Sort the items of the list in place.
* list.reverse() - Reverse the order of the items in the list, in place.

---
# Iterate on Strings/Arrays in the same way

```python
lst = [ 'BRCA1','SOD1','PTEN']
for gene in sorted(lst):
    print "gene is",gene

DNA='AAAACCGTAG'
for let in DNA:
    print let

for let in reversed(DNA):
    print let
```

```text
gene is BRCA1
gene is PTEN
gene is SOD1
A
A
A
...
G
A
T
...
```

---
# Dictionaries

(from https://docs.python.org/2/library/stdtypes.html#typesmapping)

Another useful data type built into Python is the dictionary (see [Mapping Types — dict](https://docs.python.org/2/library/stdtypes.html#typesmapping)). Dictionaries are sometimes found in other languages as “associative memories” or “associative arrays”. Unlike sequences (_arrays_), which are indexed by a range of numbers, dictionaries are indexed by __keys__, which can be any immutable type; strings and numbers can always be keys. Tuples can be used as keys if they contain only strings, numbers, or tuples; if a tuple contains any mutable object either directly or indirectly, it cannot be used as a key.
You can’t use lists as keys, since lists can be modified in place using index assignments, slice assignments, or methods like append() and extend().

---
# Dictionaries

Initialize a dictionary. Dictionaries store key and value pairs.

```python
>>> things = {} # an empty dictionary
>>> listofstuff = [] # an empty array
>>> print things
{}
>>> things = {'diane': 10, 'jack': 13}
>>> things
{'diane': 10, 'jack': 13}
>>> things['diane']
10
>>> things['billy'] = 15
# if you have a list of pairs of things
>>> morethings = dict([('sape', 4139), ('guido', 4127), ('jack', 4098)])
>>> morethings['guido']
4127
```

---
# Iterate through a dictionary

Using the for loop and the items() function

```python
for key,value in morethings.items():
    print "key is", key,"value is",value
```

```text
key is sape value is 4139
key is jack value is 4098
key is guido value is 4127
```

---
# Read Fasta code part 1

```python
import itertools
import gzip
import sys
import re

# based on post here
# https://drj11.wordpress.com/2010/02/22/python-getting-fasta-with-itertools-groupby/

# define what a header looks like in FASTA format
def isheader(line):
    return line[0] == '>'
```

---
# Read Fasta code part 2

```python
# this function reads in fasta file and returns pairs of data
# where the first item is the ID and the second is the sequence
# it isn't that efficient as it reads it all into memory
# but this is good enough for our project
def aspairs(f):
    seq_id = ''
    sequence = ''
    for header,group in itertools.groupby(f, isheader):
        if header:
            # a header line: keep the first word after '>' as the ID
            line = group.next()
            seq_id = line[1:].split()[0]
        else:
            # the sequence lines that follow: join them into one string
            sequence = ''.join(line.strip() for line in group)
            yield seq_id, sequence
```

---
# Read Fasta example code Part 3

```python
# here is my program
# get the filename from the cmdline
filename = sys.argv[1]
# open files if compressed with gzip
if re.match(r'(\S+)\.gz$',filename):
    with gzip.open(filename,"rb") as f:
        seqs = dict(aspairs(f))
else:
    print "filename ",filename,"doesn't match .gz"
    with open(filename,"r") as f:
        seqs = dict(aspairs(f))

# iterate through the sequences
n=0
for k,v in seqs.iteritems():
    print "id is ",k,"seq is",v
    n += 1

print n,"sequences"
```

---
# Class problem

Write a script to translate DNA into Protein

* Use a dictionary to look up each codon and convert it to an amino acid
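
---
# Class problem: a possible starting point

A minimal sketch of the dictionary lookup, not a full solution. The names `CODON_TABLE` and `translate` are hypothetical, the table is deliberately partial (extend it to all 64 codons of the standard genetic code), and printing `X` for a codon missing from the table is an assumption. `print(...)` with a single argument is used so the snippet runs under Python 2 or 3.

```python
# Partial codon table (standard genetic code); '*' marks a stop codon.
CODON_TABLE = {
    'ATG': 'M', 'TTT': 'F', 'TTC': 'F', 'AAA': 'K', 'AAG': 'K',
    'GGT': 'G', 'GGC': 'G', 'TGG': 'W',
    'TAA': '*', 'TAG': '*', 'TGA': '*',
}

def translate(dna):
    """Translate a DNA string codon by codon (hypothetical helper)."""
    dna = dna.upper()
    protein = []
    # step through the sequence three letters at a time,
    # ignoring an incomplete codon at the end
    for i in range(0, len(dna) - len(dna) % 3, 3):
        codon = dna[i:i+3]
        # assumption: codons missing from the table become 'X'
        protein.append(CODON_TABLE.get(codon, 'X'))
    return ''.join(protein)

print(translate('ATGTTTAAAGGTTGGTAA'))   # prints MFKGW*
```

Using `dict.get()` with a default avoids a `KeyError` while the table is still incomplete; once all codons are filled in, a plain `CODON_TABLE[codon]` lookup would also work.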