December 2005 Archives

2005-12-31 01:27:21

Note on project of Systems biology

  1. Download the sequence data. At least essenstial genes data and DNA sequenecs.
  2. Write a counting program for counting codon.
  1. Download the sequence data. At least essenstial genes data and DNA sequenecs. CDS -- Coding sequence

    The data in the "Codon Usage Database" should be complete. I downloaded gbbct.* into my harddisk. According tp the description of CODON_LABEL, README, SPSUM_LABEL, it means this file contains all bacterial sequence entries.

    File gbbct.codon lists the codon usage in each gene registered in the selected GenBank Flat Files. So we don't need to count again! (we could add some non-compiled data in it! such as EST)

    File gbbct.spsum lists the sum of numbers of codon usage in each species. It might be useful for comparing!

    Because it saves a lots of time to compute the sequence data which all used CDS(?), we should emphasis the part of analysis.

    Based on the webpage of The E. coli Index, there are five completed E. coli genome sequences.

    	Escherichia coli CFT073 (UPEC)
    	Escherichia coli K-12 MG1655
    	Escherichia coli K-12 W3110
    	Escherichia coli O157:H7 EDL933
    	Escherichia coli O157:H7 RIMD 0509952 (Sakai)
    	

  2. write a counting program for counting codon.

    Frist, write a program to count all usage of essential genes and compare the differences between total usage and essential genes usage. I have processed the datas of gbbct.codon and gbbct.spsum into Escherichia.codon and Escherichia.spsum, respectively. Note that the number of each line in before two files are produced by grep automatically. We should ignore it when process the data in the future.

    Genes Statistical Table from PEC,

    	Classification 	Whole 	Minimal
    	essential 	250	250
    	nonessential 	3,260	1,960
    	unknown 	900	900
    	Total 		4,410	3,110 
    	
    Here, there are 250 essential genes in whole genome, and the number of esstential genes I extracted from PECData.dat is also 250. But the number of essential genes on DEG's webpage is 235. What is the difference?

    Now, I basicly finish the essential part of counting program. It can counting the essential genes of E. coli.

References:
  1. countcodon: Translation exceptions for start and stop codons

Posted by Abner | Permanent Link | Categories: Notes

2005-12-24 21:23:28

Nonapproximability

Yesterday was the second time meeting of our study group.

The topic of each meeting is ordered by chapters of Vijay V. Vazirani's book. The frosty meeting was about the concepts of approximation and computational complexity.

Some terminologies like "instances", "problems" were very important; \comment{ I feel that the theory of approximation algorithms is meaningful on understanding more about computation complexity rather than designing approximation solvers. }

The ratio $\delta$ is defined as function mapping positive integers to positive rational numbers. If an algorithms could produce a feasible solution $s$ for each instance $I$ and satisfy following inequality after evaluating such solution by objective function $f(I,s)$ (i.e., cost, weight...etc), then we say it an factor $\delta$ approximation algorithm for such problem.

$f(I,s)\leq \delta(sizeof(I))*OPT(I)$

And for randomization approximation algorithm, the inequality will be: $Pr[f(I,s)\leq \delta(sizeof(I))*OPT(I)]\geq 1/2$

On the second meeting, the constant $\delta$ algorithm appeared and we contacted the concept about nonapproximability.

Theorem 3.6 For any polynomial time computable function $\alpha(n)$, TSP can not be approximated within factor $\alpha(n)$, unless P=NP.



The proof of above theorem is really interesting, but I don't understand it....

I can't understand the role of time-complexity function. Does the words "polynomial time computable" necessary? That's could we restate the theorem as following:

Theorem 3.6* For any function $\alpha(n)$, TSP can not be approximated within factor $\alpha(n)$.

Update: In fact, No! The "polynomial time computable function" take the main role in the proof.

The nonapproximability is based on the transformation form optimization problem to decision problem, i.e.,

$f(I,s)$ ? $\alpha(n)\times n$ == Does H.C. exists?

If we have algorithm $\mathscr{A}$ in $P$, and polynomial time computable factor function, then we can compute the solution of TSP and compare the solution and $\alpha(n)*n$ (it will in P).

Thus, it can be taken as the decision problem of H.C.. But H.C. is in NPC, therefore it is contradiction!

Although, I still think the transformation strange! The author make a transformation and assume the existence of approximation algorithm $\mathscr{A}$, and the transformation is specially designed for proving this theorem.

But, the transformation make any algorithms have only two kinds of output, i.e., $n$ and larger than $n\alpha(n)$.

And for the case $f(I,s)=n$, such approximation algorithm just look like optimization algorithm; for the case $f(I,s)>n\alpha(n)$, the requirement of such approximation algorithm is only lager then the impossible lower bound, so it is no matter that such approximation algorithm within $\alpja(n)$.

My questions: Although the logic is correct! Does the usage of the constructive method in logic reasoning (contradiction method) valid? any restriction ?

Upadte: The theorem 3.6 is cited from

@article{321975,
 author = {Sartaj Sahni and Teofilo Gonzalez},
 title = {P-Complete Approximation Problems},
 journal = {J. ACM},
 volume = {23},
 number = {3},
 year = {1976},
 issn = {0004-5411},
 pages = {555--565},
 doi = {http://doi.acm.org/10.1145/321958.321975},
 publisher = {ACM Press},
 address = {New York, NY, USA},
 }


Posted by Abner | Permanent Link | Categories: Notes

2005-12-29 13:46:30

Quote form Andrew S. Tanenbaum

Andrew S. Tanenbaum's Home page: http://www.cs.vu.nl/~ast/

Something funny of Andrew S. Tanenbaum's Personal:

Q: Did you experience culture shock going from M.I.T. to Berkeley?

A: Oh my goodness. Yes. It was so strange to be in an environment with people having I.Q.'s below 150 and where it wasn't necessary to study 12, 13, 14 hours a day, seven days a week just to keep up.

My techer of calculus told me that his classmate need write a 3D engine as a homework in two weeks in Duke Unvi. CS grad. school. It's horrible.

Q: What are you most proud of?

A: Having been elected Fellow of the ACM and a Fellow of the IEEE. There are only a handful of people in the world who are Fellows of both.

Q: What is your philosophy of life?

A: I can't make up my mind. It is either KISS: Keep It Simple, Stupid or Get it right the first time.


Posted by abner | Permanent Link | Categories: Quotes

2005-12-29 05:02:16

Technical Reports

During the procedure of survey, I found some papers classified as technical reports. I ask others what it means?

Most answers are technical reports are research results not good enough to accept by journals. But technical reports appear in references of many journal papers. And many mathematical organizations collect its technical reports. It must has some differences between pre-prints and technical reports.

Technical reports describe the progress or results of scientific or technical research and development. They are usually produced in response to a specific request or research need, and serve as a report of accountability to the funding organization.

Technical reports usually fall into two categories:

* Government sponsored research reports
* Privately funded research reports

Both categories may include national or international reports by university departments, institutes, private industry, or government agencies and laboratories.

Ref.: SULAIR: How Do I Find?: Technical Reports




Update 2006-04-15: I find another explanation of technical report at MIT Libraries' website

  • Are written to convey new developments or final results of scientific and technical research.
  • Are usually funded by government departments or corporate bodies.
  • Deliver technical information to the funding organization.
  • Provide a forum for peer information exchange.
  • Are not easy to find.

And I find that "technical report writing" can explain it more lucid. Nancy Halligan's website divided it into six classes, Reporting Research Findings, Simple Technical Information Report, Technical Specifications, Technical Evaluation Reports, Technical Recommendation Reports, Technical Manuals and Instructions.

  • Some Advice on Writing a Technical Report, Alan T. Sherman.

    The Technical Report (TR) is a common written form through which computer scientist communicate their findings. Each TR should have a focused topic that is developed logically along some clearly identified perspective.

    A TR should explain what you did, why you did it, what you discovered, and what is significant of your findings. The report should identify clearly what is novel about your work, and how it relates to prior knowledge.

    However, if one got good enough results to write a TR, why he doesn't just submit it to conferences or journals?
  • Hints on Technical Report Writing, John Ringwood
  • PREPARATION OF TECHNICAL REPORTS, Dr. Ronald Francis.


Update 2006-10-16:

Prof. Lance Fortnow wrote a post ``What Happened to Departmental Tech Reports?'' on his blog. In the comments of this article, some one gave a link about evaluation of tech reports.

Posted by abner | Permanent Link | Categories: Notes

2005-12-29 03:20:21

2006 coming!

Today is 12/29. Year 2006 is coming.

According to the situation in this semester, I decide to take only two courses, Compiler Design and Randomized Algorithms. Th e first one is the requirement of graduation.

Why do I take only two courses?

This is for more time to learn something I am interested in. I am interested in some subjects such as combinatorics, graph th eory, algebra, topology, parallel algorithm, theory of computation, etc. But I might loss passions soon after I understand th e basic approaches and concepts of such subjects.

Since I still have one project of NSC on next semester, and I hope to audit some courses such as Optimization Method, Informa tion Theory, and Quantum Information and Computation.(Unfortunately, Information theory is on the same time with Randomized A lgorithms) I think I don't need to take too much courses.

In fact, all my collage years, I almost took 25 credits each semesters.

But, it's shame, I get c for most courses. After considering my personality, I make this decision. That, taking much courses and pretending to study hard, is easy for me, but it doesn't means that I introspect. To consider problem seriously and trying to find a good solution, and then make change. I think that it is the real introspection. Although it seems not so positive.

Posted by abner | Permanent Link

2005-12-29 01:08:13

New look of arXiv.org


Posted by abner | Permanent Link

2005-12-28 23:39:20

Parallel processing

Today, I attended the seminar of Dept. CS in NTHU; the speaker was my advisor Prof. Tang. His topic was not so important. I am interested in the sub-topic he mentioned in this speech, i.e., the speedup of parallel processing.

For really huge problem, the speed of parallel processing is not so admissible, so the heuristic approach and approximation algorithm might be the better choices. But in some huge problem, there are some middle-size problems which need optimal solution.

Prof. Tang said that, in some cases, with the cooperators, prof. Yeh-Ching Chung etc, they processed in parallel by heuristic algorithm and got speedup better than the liner.

The comment from prof. Tang was that the overlap of information and computation might make the problem reduce faster than the computation, and it might be a direction that could in further research.

I know less on parallel algorithm, but prof. Tang was professional in parallel algorithms (now, he focus on the computational biology). Therefore, prof. Tang's comment should be significant. But it's strange. Doesn't it a intuitive approach?

Posted by abner | Permanent Link

2005-12-26 19:54:53

Note on linear programming

I read some papers on graph theory.

It is usually that the local optimal solution is not in the global solution, so that we can't use greedy or dynamic programming approach. The approach of linear programming is useful in this condition because it is global-view optimization.

But I know less in this field. As I know, the basic concepts in LP are two properties:
  1. Duality: The standard form of linear programming problem is defined on maximal problem. But it can be transformed to minimal problem.
  2. Convex: The optimal solution must be on the corner point(extreme point).
Form the geometric concept, the Simplex algorithm calculates those extreme points (feasible solutions) on convex polyhedron which is constructed by linear constraints.



Update 2006-09-18:

Some notes I wrote for sub-topics of LP:
  1. 2006-02-09 20:20:49 Simplex Method
  2. 2006-09-05 22:24:01 Integrality and the Unimodular Property
  3. 2006-09-18 03:36:26 Primal-Dual Method

Posted by abner | Permanent Link | Categories: Notes

2005-12-25 03:39:13

Bug of nanoblogger?

I use utf-8 as charset, and I set $LANG as zh_tw.big5. It's ok on the html but rss feed. The chinese on xml can't not display correctly. It's my fault (miss something?) or bug of nanoblogger?

Posted by abner | Permanent Link

¤é 12¤ë 25 01:42:09 CST 2005

Welcome to NanoBlogger 3.3!

The basic syntax is: nb [-b blog_dir] [options]

How to ...
  • create new weblog (directory) = nb -b [blog_dir] -a
  • create new entry = nb -a
  • create new category = nb -c new -a
  • create new entry with category = nb -c [cat_id] -a
  • list entries = nb -l [all|DATE|max]
  • list categories = nb -l cat
  • list entries by category = nb -c [cat_id] -l [all|DATE|max]
  • edit entry = nb -e [entry_id]
  • move entry to category = nb -c [cat_id] -m [entry_id]
  • delete entry = nb -d [entry_id]
  • delete category = nb -c [cat_id] -d cat
  • delete entry from category = nb -c [cat_id] -d [entry_id]
  • draft entry = nb -E [draft_file]
  • import draft as entry = nb -f [draft_file] -a
  • force update of weblog files = nb -u [all|DATE|main]

Thank you for trying NanoBlogger. Please direct comments and suggestions to the mailing list or submit a bug report to the project page on sourceforge.net.


Posted by n1xt3r | Permanent Link | Categories: Nanoblogger

2005-12-11 23:51:24

Recent funny subjects

Those started on the Dr. Fortnow's blog. I think Dr. Fortnow must be a welcome and enthusiastic person. He is so enthusiastic that people are influenced
  1. Computational Complexity: Surprising Results

    As I observed, most students felt the Computational Complexity is a less exciting part of TCS, but Dr. Fortnow's blog always make me feel exciting to know more about this field.

  2. Computational Complexity: How I Became a Theorist

    There are many stories of visitors on the comments of Dr. Fortnow's post. I will not give my story here, because I am not a computer scientist yet..... :)

    And I might not be a computer scientist forever.


Posted by AbnerCYH | Permanent Link