# Day 3 Materials

## References

Day 3 solutions can be found here

## Part I: Functions

1. Write a function to compute the GC content of a DNA sequence. The function should accept a single argument, the DNA sequence, and return the GC percentage. Test your function with the nucleotide sequence "AGCTATAGCATAGC".

2. Write a function that calculates the percentage of a given nucleotide from a DNA sequence. The function should accept two arguments: the nucleotide of interest and the DNA sequence. It should return the nucleotide percentage. Test your function with the nucleotide sequence "AGCTATAGCATAGC".

3. Write a function that calculates the percentage each nucleotide in a given DNA sequence. of a given nucleotide from a DNA sequence. The function should accept a single argument, the DNA sequence, and return a dictionary containing key:value pairs of nucleotide:percentage. You can assume that the provided sequence contains only A, C, G, T. Test your function with the nucleotide sequence "AGCTATAGCATAGC".

4. Write a function to guess whether a provided sequence is DNA or protein. For this task, assume that any sequence comprised of $\geq50$% A, C, G, T is a DNA sequence. Test your function with the following two sequences:
• "AGCTATGCATACGAGCATAGC"
• "AGIILLCPKLKKQWTATWCAGCATADSARCVLMKGC"
5. Modify the previous function to ignore all ambiguities in calculations. Use this list of ambiguous characters for this task: ambig = ["B", "J", "N", "O", "X", "Z"]. Test your function with the following sequence:"APAPPPKKLRATNNYPOPPBXXXXXNTYGCTATLMQASDFTDTCATAGC"

## Part II: File Input/Output

Files used in these exercises can be downloaded from the course website. Be sure to write your scripts in the same directory as these files!

1. Open the file file1.txt in read-mode, and print its contents to screen. Use the .read() method, which saves the contents of the file to a single string. Perform this task twice: once using open and close, and once using with control-flow.

2. Open the file file1.txt in read-mode, and save all lines in this file to a list using the .readlines() method. Write a new file called upper_file1.txt which contains the same contents of file1.txt but in upper-case. Try to do this task using a single for-loop.

3. Open the newly created file upper_file1.txt in read-mode. Loop over the file lines without using .read() or .readlines(), and print out lines as you loop.

4. Modify the previous for-loop to only print out lines in upper_file1.txt which contain at least (i.e. $>=$) 5 letter E’s.

5. You should notice 20 files named file1.txt, file2.txt, ..., file20.txt. Write a for-loop to open each of these files (Hint: use the range() function to loop over file names). For each file, print each line that contains more than 25 characters.

6. Write another for-loop over the same 20 files. For each file, create a second file named fileX_odd.txt (where X=1-20) which contains only the odd-numbered lines from the original file. For this, use a counter in the for-loop that goes over file lines (this will count the line numbers), but be careful: Remember that python indexing starts from 0, but the first line is technically line #1!

7. Convert our zoo-keeper dictionaries into a comma-separated file with the header animal,vore,food, and rows should contain corresponding information, i.e. lion,carnivore,meat. Perform this task with a single for-loop.

 category = {"lion": "carnivore",
"gazelle": "herbivore",
"anteater": "insectivore",
"alligator": "homovore",
"hedgehog": "insectivore",
"cow": "herbivore",
"tiger": "carnivore",
"orangutan": "frugivore"}

feed = {"carnivore": "meat",
"herbivore": "grass",
"frugivore": "mangos",
"homovore": "visitors",
"insectivore": "termites"}

8. Create a second zoo-keeper file by converting the CSV into a tab-separated file. Perform this task by reading in the CSV, replacing commas with tabs, where tabs can be created as the string "\t". For example, the following snippet will replace all commas with tabs in a string called mystring:

 mystring2 = mystring.replace(",", "\t")