October 2006 Archives

2006-10-29 08:22:06

Graph Alignments and Motifs

Notes : Complex Networks, Computational Biology Problems on Graph Representations.

In the field analysis of biology networks, the main challenge is to read off biological functions from the topology of the network. Topological motifs, i.e., patterns occurring repeatedly at different positions in the network, have recently been identified as basic modules of molecular information processing.

``Are network motifs the spandrels of cellular complexity?'' Ricard V. Sole and Sergi Valverde.
Biological Networks: The Tinkerer as an Engineer, U. Alon (26 September 2003) Science 301 (5641), 1866.
Biological networks are abstract representations of biological systems, which capture many of their essential characteristics. In the network, molecules are represented by nodes, and their interactions are represented by edges (or arrows). The cell can be viewed as an overlay of at least three types of networks, which describes protein-protein, protein-DNA, and protein-metabolite interactions.

A module in a network is a set of nodes that have strong interactions and a common function. A module has defined input nodes and output nodes that control the interactions with the rest of the network. A module also has internal nodes that do not significantly interact with nodes outside the module.
Those paragraphs are about modularity in biological networks. They said the modular better than nonmodular, because it is readily reconfigured to adapt to new conditions. Nontheless this fashion faded in CS. The modularity should be the concept in the generation of structural PL. Now OO, an idea barrowed from biology, is the mainstream.....^_^;;
The second common feature of engineering and biological networks is robustness to component tolerances.

The third feature common to engineering and biological networks is the use of recurring circuit elements. .....(skip)....Metabolic networks use regulatory circuits such as feedback inhibition in many different pathways (24). The transcriptional network of E. coli has been shown to display a small set of recurring circuit elements termed "network motifs"
Milo, R.; Shen-Orr, S.; Itzkovitz, S.; Kashtan, N.; Chklovskii, D. & Alon, U. Network Motifs: Simple Building Blocks of Complex Networks, Science, 2002, 298, 824-827. (I found this one as report in Science. I thought it is a roadmap to understand biological network motifs, since it got a lots of citations.)

The paper ``Biological Networks: The Tinkerer as an Engineer'' trid to popularize the idea of network motif, however it didn't say how to find it. In this paper, they demonstrated how to find the striking appearance of motifs in networks representing a broad range of natural phenomena.
- Shen-Orr, S.S.; Milo, R.; Mangan, S. & Alon, U. Network motifs in the transcriptional regulation network of Escherichia coli Nat Genet, 2002, 31, 64-68 [Note]
- Guelzim, N.; Bottani, S.; Bourgine, P. & Kepes, F. Topological and causal structure of the yeast transcriptional regulatory network Nat Genet, 2002, 31, 60-63
- Maslov, S. & Sneppen, K. Specificity and Stability in Topology of Protein Networks, Science, 2002, 296, 910-913 [Note]
Christensen, C. & Albert, R. Using graph concepts to understand the organization of complex systems, 2006. to appear in the International Journal of Bifurcation and Chaos.

1.6 From local organization to large-scale clustering: network motifs
Network motifs�Xpatterns of connection that recur statistically more frequently than they would in a degree-preserving randomized graph [Milo, Itzkovitz, Kashtan, Chklovskii and Alon, 2002; Shen-Orr, Milo, Mangan and Alon, 2002]-- demonstrate that in most real networks, structural and functional organization exists at a level of complexity of only a handful of nodes, and, moreover, that networks with similar overall function appear to be built from the same basic motifs.
Albert, R. Scale-free networks in cell biology, J Cell Sci, 2005, 118, 4947-4957

I were interested in this paper because it mentioned the real cellular networks, e.g., regulation netowrks. Nonetheless it just tried to show that cellular networks like protein interaction, signal transduction pathways, and transcriptional regulations, could be modeled by scale-free networks.
[Unread] Berg, J. & Lassig, M. Local graph alignment and motif search in biological networks, PNAS, 2004, 101, 14689-14694
[Unread] Prill, R.J.; Iglesias, P.A. & Levchenko, A. Dynamic Properties of Network Motifs Contribute to Biological Network Organization PLoS Biology, 2005, 3, e343

The course Analysis of Biological Networks, Fall 2005-6, Roded Sharan, seems a good start point!

Prof. Uri Alon's website has many interesting things

By the way, the arXiv.org search results of the key word ``network motifs'' are 47 entries.

Graph(or Network) Alignment

Flannick, Jason, Novak, Antal, Srinivasan, Balaji S., McAdams, Harley H., Batzoglou, Serafim, Graemlin: General and robust alignment of multiple large interaction networks, Genome Res. 2006 16: 1169-1181
Jingchun Chen , and Bo Yuan, Detecting functional modules in the yeast protein�Vprotein interaction network, Bioinformatics Advance Access published on September 15, 2006.

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Notes

2006-10-29 04:46:22

Computational and Systems Biology

I had taken the course Systems biology and sat in the course Computational Biology at first year of graduate school, but I did not find some topics in which I am interested.

Lectures

Bioinformatics Graduate Seminar, BF821, Fall 2004
Algorithms for Molecular Biology: Course Archive, Ron Shamir
Computational Genomics Research Seminar, Chaim Linhart
MIT 7.90J / 6.874J Computational Functional Genomics, Spring 2005
MIT 6.096 Algorithms for Computational Biology, Spring 2005
NSYSU Bioinformatics, Chang-Biau Yang
NCNU, Computational Biology,

I prefer to put the network realted fields like gene networks in the entry of complex networks

I had interested in the complex networks about Systems biology, but it needs many knowledges of biology.

That is the reason I just wrote a liitle on this field. After back from workshop on BioAlgorithmics, I was interested in Motif discovery, especially Gibbs Motif Sampler and Homology Search especially PatternHunter.

I had collected few links in this article, and PhyloGibbs seems interesting. The Homology Search interested me because of Ming Li's works.

Recently, I am interested in Physical Mapping, more precisely ``consecutive one property (C1P),'' I noted here. I am interested in Graph Alignments and Motifs, too. The note is here.

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Notes

2006-10-27 02:48:06

Combinatorial Design

I had taken prof. Tayuan Huang's coures Design Theory at 2005, however I don't really like this field. That's why I don't note about it.

This issue of Discrete Mathematics has a paper ``Designs on the web''

In 2001, the United Kingdom Engineering and Physical Sciences Research Council awarded three of the authors of this paper a grant for "A web-based resource for design theory". As the project developed, we have had to face a number of problems, ranging from fundamental questions such as "What is a design?", through research topics such as "How should the concept of partial balance be extended to designs which do not have constant block size?", to more practical problems concerning what format we should use to store designs. This paper gives a brief description of the project to date, concentrating on theoretical questions about designs and their classification.

In prof. Huang's coures, I learned balanced designs include projective and affine planes, Steiner systems, and difference sets, non-balanced designs include partial geometries, and partially balanced designs, and so on.

This field is more close to the experiments design and coding theory. Those are not my interests, but it might be useful in my researches because of its statistical characters of construction.

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Notes

2006-10-26 22:52:19

Special Topics in Algorithms I: Lovasz Local Lemma

This seminar is about Probabilistic Methods. Instructor is Prof. Tang. Organizer is Yun-Sheng Chung. The main material is The Probabilistic Method, 2nd Ed, by Noga Alon and Joel Spencer, 2000.

I need to present the Lovasz Local Lemma(LLL) at 10-26. However, I will not write the notes of LLL, but some questions of it.

Ref. : Lov{\'a}sz local lemma - Wikipedia, the free encyclopedia

The (general) Lovasz Local Lemma: let $A_1,A_2,\cdots,A_n$ be events in same probability space. Let D=(V,E) be a dependency graph for $\{A_1,A_2,\cdots,A_n\}$ , suppose there are real numbers $x_1,x_2,\cdots,x_n, 0\leq x_i<1$ and $Pr[A_i]\leq x_i\Pi_{(i,j)\in E}(1-x_j), \mbox{then } Pr[\wedge_{i=1}^n \bar{A_i}]\geq \Pi_{i=1}^n(1-x_i)>0$

The symmetric Lovasz Local Lemma: Let $A_1,A_2,\cdots,A_n$ be events in a probability space such that each $A_i$ is mutually independent of all but most $d$ other events. If $Pr[A_i]\leq p, \forall i, \mbox{and } ep(d+1)\leq 1, \mbox{then } Pr[\wedge_{j=1}^n \bar{A_j}]>0$

The representation of LLL is the dependency graph. (Let A_1,A_2,...,A_n be events in same probability space. Let D=(V,E) be a directed graph for {A_1,A_2,...,A_n}, and the event A_i is mutually independent of all the events $\{A_j:(i,j)\not\in E\}.$

Although the defintion said the dependency graph is a directed graph, the LLL didn't show the usage of directed edge.

The mutully independent: any collection of events -- possibly more than just two of them -- are mutually independent if and only if for any finite subset $A_1,\cdots, A_n$ of the collection we have $\Pr(A_1 \cap \cdots \cap A_n)=\Pr(A_1)\,\cdots\,\Pr(A_n).$

It is intuitive to use undirected edge to represnt two related events, more precisely some events will affect other events, but inverse condition will not hold, e.g., the event that one win the lottery will not affect the probability of the event that he is struck by lightning, but the inverse case will make one unable to win the lottery...:) Therefore we could use directed edge to present those cases.

Ref.: Wikipedia: Pairwise independence

In probability theory, a pairwise independent collection of random variables is a set of random variables any two of which are independent. Any collection of mutually independent random variables is pairwise independent, but some pairwise independent collections are not independent.

However the definition of dependency graph is not this case. The directed edge (i,j) means that event i is not mutually independent to event j. If (i,j) exists, then ?

If graph G({a1,a2},{(a1,a2)}) exists, then it means the event a1 and event a2 are not mutually independent, i.e., Pr(a1 and a2)=/=Pr(a1)*Pr(a2), but the event a2 and event a1 are mutually independent, i.e., Pr(a1 and a2)=Pr(a1)*Pr(a2), since edge (a2,a1) does not exist!

Actually some materials describes the dependency digraph with a little diffences, and the original paper of Lovas and Erdos is too old to obtain, therefore the I couldn't find the orignal defintion!

This is my first question and the second one is:

What does the ``symmetric'' mean? I don't know. Alon and Spencer's textbook doesn't mention it, but some references have a little diffence in the lemmas, i.e., they described the ``symmetric version'' of Lovasz Local Lemma by ``independent'' not ``mutually independent''.

Actually, in Lovasz Local Lemma's proof, it didn't use the mutually independent property.(I didn't find it) In the definition of symmetric Lovasz Local Lemma, its proof use the degree (in_degree+out_degree) to prove it. It might be the meaning of ``symmetric''.

In remaining two question is about the algorithmic aspect of Lovasz Local Lemma which proposed by Beck at 1991.

The thrid question is the defintion of 4-tree. The definition of 4-tree T is a graph recreated from G, so that the vertices of T are the vertices of G whose distance to each other vertex is at least 4, and the edge (i,j) exists if the distance of i and j are exactly 4. Therefore T is connected.

By this defintion, there are some cases disconnective, e.g., a,b,c are three vertices with distance(a,b)=4, and distance(a,c)=distance(b,c)=5, therefore this graph is not connected.

According to the context of proof of Beck's work, the 4-tree T should means that T(V,E) is a graph recreated by G, and the neighbors of vertex i \in V are with exactly pairwise distance 4 in G. Thus T is connected, and the vertices of T are the vertices of G whose distance to each other vertex is at least 4.

The fourth question is the setence ``there are at most 4^j trees (up to isomorphism) of j vertices.''

If it means that isomorphism only counts once, and every tree must has j vertices (i.e., excluding the subtrees with less than j vertices). there are at most 4^j trees of j vertices.

According to a webpage, A rooted tree is ordered if the edges below each node are ordered in some way, perhaps from left to right on the page. In any case, the number of ordered rooted trees on n nodes is the (n-1)-th Catalan number. And by Wikipedia, C_n < 4^n!

However, if we count the subtrees with less than j vertices, by Cayley's_formula, it states that if n is a positive integer, the number of trees on n labeled vertices is n^{n-2}.

Ref.: P. Erdos and L. Lov'asz, Problems and results on 3-chromatic hypergraphs and some related questions, in "Infinite and finite sets" (A. Hajnal, R. Rado, and V. S'os, Eds), Coll. Math. Soc. J. Bolyai, Vol. 11, Budapest, 1973, 609--627. (In this paper, Lovasz proposed the Lemma)

Finally, I put here a compressed file containing some lecture notes or slides about LLL on the Internet.

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Seminar

2006-10-25 22:21:41

P2P Lending

Muhammad Yunus was awarded the Nobel Peace Prize 2006. "for their efforts to create economic and social development from below." He founded the Grameen Bank to make loans to poor people. Its idea is community and socially based lending.

Furthermore, Zopa and Prosper are companies providing an online money exchange, allowing people who have money to lend to provide it directly to people wanting to borrow it without the need to go through banks, a process sometimes referred to as peer-to-peer lending.

I am interested in the common part of Grameen Bank and Zopa. Ho John Lee said in his article about Prosper:

In a slightly different context, it would be interesting to see something similar which matched borrower groups in relatively poor and developing areas with lenders in relatively wealthy areas. Grameen Bank has done amazing things with microfinance in Bangaladesh, in the sense of helping the borrowing communities building new businesses and opportunities for themselves and making a positive economic return for the bank.

According to the article of Yi-Feng Tzeng, ROI of Proper and Zopa are about 7%!

I would like to know more about them and the theoretical fundations.

Ref.: BrandShift Podcast: interview with the founders of Zopa.

Ref.: Comment on Microfinance-BECKER

Ref.: Microfinance and Third World Poverty and Development--Posner

Ref.: Muhammad Yunus (p3)
Uploaded by stewartslams - JS, DS, Muhammad, Yunus - Dailymotion Share Your Videos

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Notes

2006-10-18 21:33:57

NTHU ECON5091 Macroeconomic Analysis I

This coures is in Fall 2006. The instructor is Prof. Elaine Chyi.

Traditional and Modern Macro.

Neoclassical economics
As expressed by E. Roy Weintraub, neoclassical economics rests on three assumptions, although certain branches of neoclassical theory may have different approaches:

1. People have rational preferences among outcomes that can be identified and associated with a value.
2. Individuals maximize utility and firms maximize profits.
3. People act independently on the basis of full and relevant information.
New classical economics
it builds its analysis on an entirely neoclassical framework. Specifically, new classical macroeconomics (NCM) emphasises the importance of rigorous microfoundations, in which the macroeconomic model is built up from the actions of individual agents, whose behaviour is modelled by microeconomics.
Keynesian economics
New Keynesian economics
It strives to provide microeconomic foundations to Keynesian economics by showing how demand management by the government or its central bank can improve efficiency under imperfect markets. The main assumption of New Keynesian economics that distinguishes it from the new classical economics is that wages and prices do not adjust instantly to allow the economy to attain full employment
Monetarism
Monetarism is a set of views concerning the determination of national income and monetary economics. It focuses on the supply and demand for money as the primary means by which economic activity is regulated. Monetary theory focuses on money supply and on inflation as an effect of the supply of money being larger than the demand for money.

Monetarism is an economic theory which focuses on the macroeconomic effects of the supply of money and central banking. Formulated by Milton Friedman, it argues that excessive expansion of the money supply is inherently inflationary, and that monetary authorities should focus solely on maintaining price stability.

Traditional and Modern-Lucas

Solow growth From of Wikipedia

Exogenous growth model, also known as the Neo-classical model or Solow growth model is a term used to sum up the contributions of various authors to a model of long-run economic growth within the framework of neoclassical economics.

Development of the model

The Neo-classical model was an extension to the Harrod-Domar model including a new term, productivity growth. The most important contribution was probably the work by Robert Solow; in 1956 he and T.W. Swan developed a relatively simple growth model, which fit the data on US economic growth with some success[1]. Solow received the 1987 Nobel Prize in Economics for his work on the model.

Solow growth

Production Function Y(t)=F(K(t),A(t)*L(t)), where Y:output, K:capital, L:labor, and A:effective (knowledge).

Note that many production function has a property called ``constant returns to scale.'' i.e., cY=F(cK,cL) And the property ``diminishing marginal product.'' Remind that the terms about investment in macroeconomics is for economy not for person.

Hicks-Neutral

Hicks-neutral is an attribute of an effectiveness variable in a production function. The attribute is that it does not affect labor differently from the way it affects capital.

Harrod-Neutral?

is one of the ways in which an effectiveness variable could be included in a production function in a Solow model. If effectiveness A is multiplied by labor L but not by capital K, then we say the effectiveness variable is labor-augmenting.

Solow Model is the extension to the Harrod-Domar model.

* Warranted growth �V the rate of output growth at which firms believe they have the correct amount of capital and therefore do not increase or decrease investment, given expectations of future demand.

* Natural rate of growth �V The rate at which the labour force expands, a larger labour force generally means a larger aggregate output.

* Actual growth �V The actual aggregate output change.

Assumptions of Solow Model

the economy is large enough to exhusate the speical gain.
Inputs other than K, A, L are not important or ignorable
Marginal Product Capital is decreasing., i.e., capital is subject to diminishing returns.

According to the assumptions, the economy converges to a constant "steady-state" rate of growth. It means the growth will stop after arriving steady-state. Moreover it means the Solow Model lacks the autonomous engine of growth.

Limitations of this model include its failure to take account of entrepreneurship (which may be catalyst behind economic growth), strength of institutions. In addition it does not explain how or why technological progress occurs. This failing has led to the development of endogenous growth theory, which endogenizes technological progress and/or knowledge accumulation.

For improving the drawbacks of Solow Model, two strands of this literature :

Growth can continue indefinitely through capital accumulation. i.e., AK model

AK model Assume the marginal product is constant.(Rebelo, 1991) (The original assumption of neoclassical is marginal product of capital diminishing. Therefore there exist steady state of Solow Model and the captial will stop accumulating.)

Why is it named AK model? The production function is in the form . i.e., no steady state and Endogenous Growth.

Why could we introduce the constant marginal product per capital?
- Extend the definition of capital. i.e., human capitial.
- Externalities! Invention is harder than minimation. Firms could learn from others.
Note: Why do we merge the technological progress fact with the effective parameter of Solow Model? It is because the technology and knowledge will depreciate or accumulate!
Technological progress can be explained by economic forces and is endogenous.

Eventually prof. Chyi introduced some empirical study of those model such as absolute and conditional converges, however it seems difficult and not accpted by all economists.

Endog. growth

Ramsey Model I

Ref. Wikipedia: Ramsey growth model

The core assumption: People is myopic, i.e., people will underestimate the needs of furture, so that they will save less than optimal value [Pigou]. It is a time preference of assets.

In Solow model, there is no an endogenous saving rate. Frank Ramsey proposed the Ramsey growth model, and its approach is the basis of morden dynamic macroeconomics. For optimimzation of intertemporal allocation of resources, there are two usual appraoches, optimal control and dynamic programming. The former one is good for analysis and the later one is good for simulation.

Prof. Chyi gave a conceptual introduction to Hamiltonian method of Ramsey model which extends the traditional Lagrange's method, however it is hard to understand to me because I didn't learn about it.

Ref.: R. Barro, et al. Economic Growth, 2e, A.3 page 604.

Ref.: Pontryagin's minimum principle - Wikipedia, the free encyclopedia

And about Shadow price:

Loosely, the shadow price is the change in the objective value of the optimal solution of an optimization problem obtained by relaxing the constraint by one unit. The Shadow Price is the value of the Lagrange Multiplier at the optimal solution, so more precisely it is the infinitesimal change in the objective function arising from an infinitesimal change in the constraint

The value of the shadow price can provide decision makers powerful insight into problems. For instance if you have a constraint that limits the amount of labor available to 40 hours per week, the shadow price will tell you how much you would be willing to pay for an additional hour of labor. If your shadow price is $10 for the labor constraint, for instance, you should pay no more than $10 an hour for additional labor.

Ramsey Model II

Ramsey-Applications

The Diamond Model I

The Diamond Model II

Real Business Cycles

Lucas model

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Notes

2006-10-18 03:01:16

2006 Computational Systems Biology Lecture Series

2006 Computational Systems Biology Lecture Series
Sponsors: Department of Computer Science and College of Life Science, National Tsing Hua University
Talk Series: Systems Biology and Computational Biology

Posted by Abner Chih Yi Huang | Permanent Link

2006-10-18 00:58:52

Principle of Maximum Entropy

I had said that I am interesting in Gibbs Sampling, that is a special case of the Metropolis-Hastings algorithm, and thus an example of a Markov chain Monte Carlo algorithm, on Motif discovey of DNA or protein sequences, however I can't understand why it do that.

I had read Cosma Shalizi's ``Maximum Entropy Methods,'' but it interests me because of a post on Chinese Google Blog. I found two other chinese articles about it, here and here. (I backup them bu furl) But they are not enough for me.

Principle of maximum entropy - Wikipedia, the free encyclopedia
Maximum Entropy Modeling
Statistical basis of the principle of maximum entropy production. a peper of Dewar, R. C.
MIT 6.050J / 2.110J Information and Entropy Spring 2003, Chapter 9. Principle of Maximum Entropy: Simple Form. Chapter 10. Principle of Maximum Entropy.
Some webpages recommanded the two books, Data Analysis: A Bayesian Tutorial by D. S. Sivia, and Probability Theory: The Logic of Science, by E. T. Jaynes, G. Larry Bretthorst.

Although Shalizi's article didn't interest me, it also make me impressive because of a sentence.

But I don't believe that this is due to some general fact about inductive inference or incomplete information, which is the view propagated by the late, great E. T. Jaynes.

2006-11-16 Update: Here is the series article of Chinese Google Blog. The author also provides his Ph.D. Thesis.

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Notes

2006-10-14 04:54:12

Works

Homeworks of OPT Theory
Work of TA.
Homeworks of Advanced Computer Architecture
Lovasz Local Lemma
Graph Connectivity

Posted by Abner Chih Yi Huang | Permanent Link

2006-10-11 20:08:47

NTHU CS5371 Theory of Computation Fall 2006

Ref. :NTHU CS5371 Theory of Computation Fall 2006, Wing-Kai Hon.

This course is for graduate students, however its content is more like the undergraduate course of theory of computation in many universities.

Regular Language

The power of DFA = NFA = regular expression, therefore the languages which they recongized is called regular language. However the ``power'' means the set of languages which they can recongize is identical, it doesn't mean the cost of time is the same. In fact, DFAs which transformed form some NFAs will have exponential size of states. Moreover automata theory does not care the difference between ``accepted'' and ``recognized'', since the computational steps is as many as the length of input string, therefore ``not halting'' is impossible.
Pumping Lemma: If A is a regular language, then there is a number p (called the pumping length) such that:

if s is a string in A of length at least p, then s can be divided into three pieces, s=xyz, satisfying the following three conditions: (1) $\forall k\geq0, xy^kz\in A$ (2) $|y|>0$ , and (3) $|xy|\leq p$ .

Here the procedure of partitioning s is a kind of algorithms, although it looks like requiring the exact one xyz partition. That's we could design several distinct xyz for corresponding cases. If we could partition the strings of a language into k sets, then we could devise k xyz's and k pumping lengths to satisfy the three condition of pumping lemma. And the largest one is the pumping length of such language.

Moreover there exists a minimal pumping length of a regular language. (However there is no systematic approcah to find it.)
I didn't understand the meaning of ''pump'' of pumping lemma. I think it might mean ``pump something (pattern) out of string.''
Prove languages not regular: 1. use pumping lemma, show that condition 1 doesn't hold by discussing all cases or condition 3 doesn't hold by contradiction; 2. use the closed property of regular lanuage; 3. Myhill-Nerode Theorem. The cirtical part of using first method is to choose the s.
Let x and y be strings. We say that x and y are distinguishable by L if some string z exists whereby exactly one of the strings xz and yz is a member of L; otherwise, for every string z, we have xz in L whenever yz in L. Let X be a set of strings. We say that X is pairwise distinguishable by L if every two distinct strings in X are distinguishable by L. the index of L to be the maximum number of elements in any set of strings that is pairwise distinguishable by L.

Myhill-Nerode Theorem: L is regular if and only if it has finite index.

I thought the proof of Myhill-Nerode Theorem told us more than the theorem itself.
Actually there are many techniques to design finite automata, e.g., the product of DFAs. This method is useful when we wanted the conditions satisfied concurrently, e.g., intersection operation of regular language.

Context-Free Language

Like the decription of regular expression, the power of Context-Free Grammer(CFG) = Nondeterministic Pushdown Automata(NPDA). A context-free grammer is ambiguous if there are different paese trees of it. The decision problem of ambiguity is undeciable. Some context-free languages are inherent ambiguous, i.e., there is no unambiguous CFG for those context-free languages. And the power of Deterministic Pushdown Automata(DPDA) = unambiguous CFGs, thus they are subset of NPDA.
It is difficult to me to design a CFG. I don't know how to design for certain problems. It seems totally tricky and clueless. In CS, An coomon approach is to model a problem by formal language. This approach applied on many different fields like computational biology and algorithmic graph theory, but I am not used to this fashion. I almost never consider problems in the programming language model. I don't like it actually. I prefer the automata model than the progeamming language, e.g., for solving problems in chapter of regular language, I choose to use NFA, DFA, or algebraic property of it. It is hard to me to consider problems in regular expression.

The problem: Let C={xy | x and y are in {0,1}^*, |x|=|y|, x=/=y.} Show that C is a context-free language.

I spent a lot of time on designing PDA of C. The first idea is to sieve the case x=y. However the language {ww | w in {0,1}^* } is not context free. This idea is impossible. Worse I couldn't escape the dead-end street. I tried to design a PDA to check the mismatch position, and divide the C into infinte classes by infinte positions of checking. The given answer does not use PDA to match something. They use CFG to generate C. Of course, we could translate the any CFG to PDA, but CFG might be the intuitive approach.

For example, another problem is ``Show L={w | w in {a,b}^*, and in each w, the 2 fold of number of a does not equal to the 3 fold of number of b.} is CFL.''

It might be more natrual to solve it by PDA, since we could match and check the numbers of a's and b's by PDA, and CFG couldn't do this, i.e., PDA could check and exculde some cases, but CFG needs to generate the required language exactly.

For above problem, consider the set of pairs (a,b)={(3,2),(6,4),(9,6),....}. If we divided the L into L1 and L2, for cases ``any number of a and no even number of b'' and ``no number of a is 3,6,9,12... and any number of b''. This problem might be easier.

The design method of CFG seems only to investigate required langugae and divide it into sub-languages which can consist the required langugae by closed operations. For two main kinds of problems, counting the numbers of terminals and comparing, the first kind like lanuage C could be considered by subsets of strings which contains different number of terimals, e.g., 3a>2b or 3a+1=2b; the second kind like lanuage L could be divided by position since CFG is unable to pass the information of positions. Moreover, ......

The possible approach might divide it

Turing Machine

The definition and introduction refer Turing machine From Wikipedia, the free encyclopedia.
Turing machine equivalents From Wikipedia, the free encyclopedia. This article introduce some computational models whose power of description are equivalent to Turing Machine, e.g., Multi-tape Turing machines, NTM.
Definition: M recognizes language A if A={w | M accepts w}.
Definition: Call a language Turing-recognizable if some Turing machine recognizes it.
Definition: Call a language Turing-decidable if some Turing machine decides it.
By above three definition, we know that Turing machine M will ACCEPT a string w if w\in L(M), and Turing machine M will reject a string w if w not in L(M). However there is the third case that M enter a loop if w is not in L(M), and L(M) is Turing-recognizable.
The descriptive power of Turing-recognizable Language equals The descriptive power of enumerator which is a TM with a printer.
Acceptance problem: determine a string a member of a language { | C accepts x } or not. (It is usual method to describe the membership problem.)
Every CFL is deciable: for a CFG C, it is decidable that a string w in L(C) or not by a TM.
Recongizer of Regular Language: Actually it is from the problem: ``Show that single-tape TMs that cannot write on the portion of the tape containing the input string recognize only regular languages.''

We know that Read-only TM = DFA. And we denote rTM as the special TM defined in the problem.

I solve this problem by prove rTM equals to DFA (However prove this by Myhill Theorem about finite index is faster.), but I didn't consider the case ``loop'', and my TA who solve this by finite index can't overcome this sub-problem, too. Thus a question is ``Does recongizer of regular language equal to DFA?''

By definition, M recongize L that implies L={w | M accepts w}. If M will not return ACCEPT on string of non-regular language, and return ACCEPT on string of regular language, then we siad M recongize regular language. That's because we are talking power of description.

In the other hand, for DFAs, it is meaningless to ask whether a string in certain regular language, but the result is different to recongizer of regular language. Hence this recongizer does not exist, or we just ignore this kind of recongizer?

This is an open problem in my mind.

Since we could embed a loop into a Turing machine, it might not really necessary to prove the inexistence of loop. One prove some problems Turing-decidable by giving a decider. The same approach should work here! Any Turing machine T accept w in certain regular language could be inserted a loop into accept configurations.

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Notes

2006-10-03 12:41:52

Testing Connectivity of Graphs

Ref. :Testing Connectivity of Graphs, Hsuan-Chang Chen, Master thesis, Dept. of CS, NTU.

Chen devised new algortihms based on ``Property testing in bounded degree graphs,'' O. Goldreich and D. Ron. Proc. 28th Symposium on Theory of Computing, pp. 406--415, 1997, and Bender, M.A. & Ron, D. ``Testing Acyclicity of Directed Graphs in Sublinear Time,'' ICALP '00: Proceedings of the 27th International Colloquium on Automata, Languages and Programming, Springer-Verlag, 2000, 809-820. The journal version is ``Testing properties of directed graphs: acyclicity and connectivity,'' Random Structures and Algorithms, 2002, 20, 184-205.

This paper is not diffcult, but it present an approach to show the usage of different distance metric, i.e., epsilon-far and epsilon-close. The author defined a kind of distance metric on connectivity, they connected the epsilon-far and $\epsilon{n \choose 2}$ connected components.

However I don't know the correctness of its time complexity, I can't deduce the same result from its inequality.

Update 2006-10-04
Eventually, we deduce the results of this thesis, but it use many approximations and a fact that $\epsilon=O(\frac{1}{n}) \Rightarrow n\epsilon=O(1)$

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Papers, Seminar

2006-10-01 06:20:24

Stanford University

My classmate at CYCU, Yao-Liang Chung who is a ph.d. student of NTU now, are accepted by Stanford EE. Congratulations! He is always the no. 1 in my graduate class. Sometimes somebody sucked for it, but I think this situation will be improved at Stanford EE. :) It is really a challenge to him. Local genius might not so brilliant at big city. :)

Sometime I imaged that I will be brilliant. (ohmmm, I know it is almost impossible to 25 years old guy.) However, I feel terrible every time the problem I spent a lot of time on is solved by a smart guy in five minutes.

I think I lack the creativity. Recent case is I tried to solve a interesting puzzle, but I assumed the best solution is a equal size partitions not a sequence of partitions.

Another case is from chapter 1 of N. Alon's textbook ``the probabilistic method,'' i.e., if Y is a random variable, and $E[Y]\geq 2m^{1-\epsilon/2}$ .

Since $Y\leq m$ ,

$\begin{eqnarray*}E[Y]&\leq& m\times\mbox{Pr}[Y\geq m^{1-\epsilon/2}]+m^{1-\epsilon/2}\times\mbox{Pr}[Y<m^{1-\epsilon/2}]\\ &\leq& m \times\mbox{Pr}[Y\geq m^{1-\epsilon/2}] + m^{1-\epsilon/2} \end{eqnarray*}$

Therefore, we have $\mbox{Pr}[Y\geq m^{1-\epsilon/2}]\geq m^{-\epsilon/2}$

I am a smart guy, but I am not smart enough. I want to be smarter and smarter.

Posted by Abner Chih Yi Huang | Permanent Link

2006-10-01 00:44:10

Bibliographic Projects and Bibliographic Standards

I updated JabRef to version 2.1, and read its manual. I found that there is a function ``Journal abbreviations,'' but I don't know how to use it. After googling, I found the Journal abbreviations ISO Standard!

I knew there are that each journal has its own standard of abbreviation, but it is really unexpected that there exist a ISO Standard for it...:)

Posted by Abner Chih Yi Huang | Permanent Link