December 2005 Archives
2005-12-31 01:27:21
Note on project of Systems biology
-
Download the sequence data. At least essenstial genes data and DNA sequenecs.
Write a counting program for counting codon.
Download the sequence data. At least essenstial genes data and DNA sequenecs. CDS -- Coding sequence
The data in the "Codon Usage Database" should be complete. I downloaded gbbct.* into my harddisk. According tp the description of CODON_LABEL, README, SPSUM_LABEL, it means this file contains all bacterial sequence entries.
File gbbct.codon lists the codon usage in each gene registered in the selected GenBank Flat Files. So we don't need to count again! (we could add some non-compiled data in it! such as EST)
File gbbct.spsum lists the sum of numbers of codon usage in each species. It might be useful for comparing!
Because it saves a lots of time to compute the sequence data which all used CDS(?), we should emphasis the part of analysis.
Based on the webpage of The E. coli Index, there are five completed E. coli genome sequences.Escherichia coli CFT073 (UPEC) Escherichia coli K-12 MG1655 Escherichia coli K-12 W3110 Escherichia coli O157:H7 EDL933 Escherichia coli O157:H7 RIMD 0509952 (Sakai)
write a counting program for counting codon.
Frist, write a program to count all usage of essential genes and compare the differences between total usage and essential genes usage. I have processed the datas of gbbct.codon and gbbct.spsum into Escherichia.codon and Escherichia.spsum, respectively. Note that the number of each line in before two files are produced by grep automatically. We should ignore it when process the data in the future.
Genes Statistical Table from PEC,Classification Whole Minimal essential 250 250 nonessential 3,260 1,960 unknown 900 900 Total 4,410 3,110
Here, there are 250 essential genes in whole genome, and the number of esstential genes I extracted from PECData.dat is also 250. But the number of essential genes on DEG's webpage is 235. What is the difference?
Now, I basicly finish the essential part of counting program. It can counting the essential genes of E. coli.
2005-12-24 21:23:28
Nonapproximability
The topic of each meeting is ordered by chapters of Vijay V. Vazirani's book. The frosty meeting was about the concepts of approximation and computational complexity.
Some terminologies like "instances", "problems" were very important; \comment{ I feel that the theory of approximation algorithms is meaningful on understanding more about computation complexity rather than designing approximation solvers. }
The ratio $\delta$ is defined as function mapping positive integers to positive rational numbers. If an algorithms could produce a feasible solution $s$ for each instance $I$ and satisfy following inequality after evaluating such solution by objective function $f(I,s)$ (i.e., cost, weight...etc), then we say it an factor $\delta$ approximation algorithm for such problem.
$f(I,s)\leq \delta(sizeof(I))*OPT(I)$
And for randomization approximation algorithm, the inequality will be: $Pr[f(I,s)\leq \delta(sizeof(I))*OPT(I)]\geq 1/2$
On the second meeting, the constant $\delta$ algorithm appeared and we contacted the concept about nonapproximability.
Theorem 3.6 For any polynomial time computable function $\alpha(n)$, TSP can not be approximated within factor $\alpha(n)$, unless P=NP.
The proof of above theorem is really interesting, but I don't understand it....
I can't understand the role of time-complexity function. Does the words "polynomial time computable" necessary? That's could we restate the theorem as following:
Theorem 3.6* For any function $\alpha(n)$, TSP can not be approximated within factor $\alpha(n)$.
Update: In fact, No! The "polynomial time computable function" take the main role in the proof.
The nonapproximability is based on the transformation form optimization problem to decision problem, i.e.,
$f(I,s)$ ? $\alpha(n)\times n$ == Does H.C. exists?
If we have algorithm $\mathscr{A}$ in $P$, and polynomial time computable factor function, then we can compute the solution of TSP and compare the solution and $\alpha(n)*n$ (it will in P).
Thus, it can be taken as the decision problem of H.C.. But H.C. is in NPC, therefore it is contradiction!
Although, I still think the transformation strange! The author make a transformation and assume the existence of approximation algorithm $\mathscr{A}$, and the transformation is specially designed for proving this theorem.
But, the transformation make any algorithms have only two kinds of output, i.e., $n$ and larger than $n\alpha(n)$.
And for the case $f(I,s)=n$, such approximation algorithm just look like optimization algorithm; for the case $f(I,s)>n\alpha(n)$, the requirement of such approximation algorithm is only lager then the impossible lower bound, so it is no matter that such approximation algorithm within $\alpja(n)$.
My questions: Although the logic is correct! Does the usage of the constructive method in logic reasoning (contradiction method) valid? any restriction ?
Upadte: The theorem 3.6 is cited from
@article{321975,
author = {Sartaj Sahni and Teofilo Gonzalez},
title = {P-Complete Approximation Problems},
journal = {J. ACM},
volume = {23},
number = {3},
year = {1976},
issn = {0004-5411},
pages = {555--565},
doi = {http://doi.acm.org/10.1145/321958.321975},
publisher = {ACM Press},
address = {New York, NY, USA},
}
2005-12-29 13:46:30
Quote form Andrew S. Tanenbaum
Something funny of Andrew S. Tanenbaum's Personal:
Q: Did you experience culture shock going from M.I.T. to Berkeley?
A: Oh my goodness. Yes. It was so strange to be in an environment with people having I.Q.'s below 150 and where it wasn't necessary to study 12, 13, 14 hours a day, seven days a week just to keep up.
Q: What are you most proud of?
A: Having been elected Fellow of the ACM and a Fellow of the IEEE.
There are only a handful of people in the world who are Fellows
of both.
Q: What is your philosophy of life?
A: I can't make up my mind. It is either
KISS: Keep It Simple, Stupid
or
Get it right the first time.
2005-12-29 05:02:16
Technical Reports
Most answers are technical reports are research results not good enough to accept by journals. But technical reports appear in references of many journal papers. And many mathematical organizations collect its technical reports. It must has some differences between pre-prints and technical reports.
Technical reports describe the progress or results of scientific or technical research and development. They are usually produced in response to a specific request or research need, and serve as a report of accountability to the funding organization.
Technical reports usually fall into two categories:
* Government sponsored research reports
* Privately funded research reports
Both categories may include national or international reports by university departments, institutes, private industry, or government agencies and laboratories.
-
On-line CS Techreports
Networked Computer Science Technical Reference Library
Defense Technical Information Center (DTIC)'s Scientific and Technical Information Network (STINET)
Socrates
The Virtual Technical Reports Center
Princeton Computer Science :: Technical Reports
Update 2006-04-15: I find another explanation of technical report at MIT Libraries' website
-
Are written to convey new developments or final results of scientific and technical research.
Are usually funded by government departments or corporate bodies.
Deliver technical information to the funding organization.
Provide a forum for peer information exchange.
Are not easy to find.
-
Some Advice on Writing a Technical Report, Alan T. Sherman.
The Technical Report (TR) is a common written form through which computer scientist communicate their findings. Each TR should have a focused topic that is developed logically along some clearly identified perspective.
A TR should explain what you did, why you did it, what you discovered, and what is significant of your findings. The report should identify clearly what is novel about your work, and how it relates to prior knowledge.
However, if one got good enough results to write a TR, why he doesn't just submit it to conferences or journals? Hints on Technical Report Writing, John Ringwood PREPARATION OF TECHNICAL REPORTS, Dr. Ronald Francis.Update 2006-10-16:
Prof. Lance Fortnow wrote a post ``What Happened to Departmental Tech Reports?'' on his blog. In the comments of this article, some one gave a link about evaluation of tech reports.
2005-12-29 03:20:21
2006 coming!
According to the situation in this semester, I decide to take only two courses, Compiler Design and Randomized Algorithms. Th e first one is the requirement of graduation.
Why do I take only two courses?
This is for more time to learn something I am interested in. I am interested in some subjects such as combinatorics, graph th eory, algebra, topology, parallel algorithm, theory of computation, etc. But I might loss passions soon after I understand th e basic approaches and concepts of such subjects.
Since I still have one project of NSC on next semester, and I hope to audit some courses such as Optimization Method, Informa tion Theory, and Quantum Information and Computation.(Unfortunately, Information theory is on the same time with Randomized A lgorithms) I think I don't need to take too much courses.
In fact, all my collage years, I almost took 25 credits each semesters.
But, it's shame, I get c for most courses. After considering my personality, I make this decision. That, taking much courses and pretending to study hard, is easy for me, but it doesn't means that I introspect. To consider problem seriously and trying to find a good solution, and then make change. I think that it is the real introspection. Although it seems not so positive.
2005-12-28 23:39:20
Parallel processing
For really huge problem, the speed of parallel processing is not so admissible, so the heuristic approach and approximation algorithm might be the better choices. But in some huge problem, there are some middle-size problems which need optimal solution.
Prof. Tang said that, in some cases, with the cooperators, prof. Yeh-Ching Chung etc, they processed in parallel by heuristic algorithm and got speedup better than the liner.
The comment from prof. Tang was that the overlap of information and computation might make the problem reduce faster than the computation, and it might be a direction that could in further research.
I know less on parallel algorithm, but prof. Tang was professional in parallel algorithms (now, he focus on the computational biology). Therefore, prof. Tang's comment should be significant. But it's strange. Doesn't it a intuitive approach?
2005-12-26 19:54:53
Note on linear programming
It is usually that the local optimal solution is not in the global solution, so that we can't use greedy or dynamic programming approach. The approach of linear programming is useful in this condition because it is global-view optimization.
But I know less in this field.
-
Linear Programming FAQ There is an references list in this FAQ.
A Short Course in Linear Programming
Mathematical Programming Glossary, by Harvey J. Greenberg
Linear Programming -- From MathWorld
The Classical Simplex Method
Linear programming - Wikipedia, the free encyclopedia
Linear Programming, by Vasek Chvatal (most recommended textbook)
Linear and combinatorial programming, Katta G. Murty.
An Introduction to Linear Programming and the Simplex Algorithm, by Spyros Reveliotis
2006-10-21 Update: Luca Trevisan wrote a post in his blog - ``The limitations of linear and semidefinite programming.'' Good job!
-
Duality: The standard form of linear programming problem is defined on maximal problem. But it can be transformed to minimal problem.
Convex: The optimal solution must be on the corner point(extreme point).
Update 2006-09-18:
Some notes I wrote for sub-topics of LP:
-
2006-02-09 20:20:49 Simplex Method
2006-09-05 22:24:01 Integrality and the Unimodular Property
2006-09-18 03:36:26 Primal-Dual Method
2005-12-25 03:39:13
Bug of nanoblogger?
¤é 12¤ë 25 01:42:09 CST 2005
Welcome to NanoBlogger 3.3!
The basic syntax is: nb [-b blog_dir] [options]
- create new weblog (directory) =
nb -b [blog_dir] -a - create new entry =
nb -a - create new category =
nb -c new -a - create new entry with category =
nb -c [cat_id] -a - list entries =
nb -l [all|DATE|max] - list categories =
nb -l cat - list entries by category =
nb -c [cat_id] -l [all|DATE|max] - edit entry =
nb -e [entry_id] - move entry to category =
nb -c [cat_id] -m [entry_id] - delete entry =
nb -d [entry_id] - delete category =
nb -c [cat_id] -d cat - delete entry from category =
nb -c [cat_id] -d [entry_id] - draft entry =
nb -E [draft_file] - import draft as entry =
nb -f [draft_file] -a - force update of weblog files =
nb -u [all|DATE|main]
Thank you for trying NanoBlogger. Please direct comments and suggestions to the mailing list or submit a bug report to the project page on sourceforge.net.
2005-12-11 23:51:24
Recent funny subjects
-
Computational
Complexity: Surprising Results
As I observed, most students felt the Computational Complexity is a less exciting part of TCS, but Dr. Fortnow's blog always make me feel exciting to know more about this field.
Computational Complexity: How I Became a Theorist-
The
Geomblog: ahhhh: or why I became a computer scientist.
Learning
Computation: How I became a theory dilettante
There are many stories of visitors on the comments of Dr. Fortnow's
post. I will not give my story here, because I am not a computer scientist
yet..... :)
And I might not be a computer scientist forever.