Program Design
DNA is a sequence of molecules called nucleotides, arranged into a particular shape (a double helix). Each nucleotide of DNA contains one of four different bases: adenine (A), cytosine (C), guanine (G), or thymine (T). Scientists often represent DNA strands with a string of letters like this: CTAGATAGTAGACAGATTAAGATGAT
Some portions of this sequence are the same, or at least very similar, across almost all humans, but other portions have higher genetic diversity and thus vary more across the population. One place where DNA tends to have high genetic diversity is in Short Tandem Repeats (STRs). An STR is a short sequence of DNA bases that tends to be repeated back-to-back numerous times at specific locations in DNA. For example, the STR AGAT appears three times in the sequence CTAGATAGTAGACAGATTAAGATGAT.
Please write the code in C.
Technical requirements:
1. Name your program STR.c
2. Prompt the user to enter a file name for the DNA sequence and an STR.
3. Assume the input file name is no more than 100 characters.
4. Read the DNA sequence from the file. Each file stores its DNA sequence as a single line of characters.
5. Assume the length of a DNA sequence is no more than 20000 characters.
6. The program should include the following function:
int count_repeats(char *sequence, char *STR);
The function expects sequence to point to a string containing the DNA sequence and STR to point to a string containing the STR. The function returns the number of repeats of STR in the sequence. String library functions are allowed.
7. In the main function, call the count_repeats function and display how many times the STR is repeated in the sequence.
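
One possible way to approach requirements 4 and 6 is sketched below. It is a minimal sketch, not a definitive solution. It assumes that "repeats" means non-overlapping occurrences of the STR anywhere in the sequence, which matches the AGAT example above, and that an empty STR counts as zero repeats. The helper read_sequence and its parameters are hypothetical names introduced only for illustration; only count_repeats is required by the assignment.

```c
#include <stdio.h>
#include <string.h>

/* Counts non-overlapping occurrences of STR in sequence.
 * Assumption: "repeats" means total occurrences, scanned left to right,
 * and an empty STR yields 0. */
int count_repeats(char *sequence, char *STR)
{
    size_t str_len = strlen(STR);
    int count = 0;
    char *p = sequence;

    if (str_len == 0)
        return 0;               /* avoid an endless loop on an empty pattern */

    /* strstr returns a pointer to the next match, or NULL when none is left */
    while ((p = strstr(p, STR)) != NULL) {
        count++;
        p += str_len;           /* skip past this match so matches do not overlap */
    }
    return count;
}

/* Hypothetical helper (not required by the assignment): reads the first
 * line of file_name into sequence, which can hold size characters
 * including the terminator. Returns 1 on success, 0 on failure. */
int read_sequence(const char *file_name, char *sequence, int size)
{
    FILE *fp = fopen(file_name, "r");

    if (fp == NULL)
        return 0;
    if (fgets(sequence, size, fp) == NULL) {
        fclose(fp);
        return 0;
    }
    fclose(fp);

    /* strip a trailing newline or carriage return, if present */
    sequence[strcspn(sequence, "\r\n")] = '\0';
    return 1;
}
```

In main, a buffer of 20001 characters would hold the longest allowed sequence plus the terminating null character, and a buffer of 101 characters would hold the file name. After reading both inputs, main would pass the sequence and the STR to count_repeats and print the returned count with printf.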

