September 2006 Archives

2006-09-30 09:06:06

Ten Simple Rules for Getting Published

Ref. : Bourne PE (2005) Ten Simple Rules for Getting Published. PLoS Comput Biol 1(5): e57

  1. Read many papers, and learn from both the good and the bad work of others.
  2. The more objective you can be about your work, the better that work will ultimately become.
  3. Good editors and reviewers will be objective about your work.
  4. If you do not write well in the English language, take lessons early; it will be invaluable later.
  5. Learn to live with rejection.
  6. The ingredients of good science are obviousˇXnovelty of research topic, comprehensive coverage of the relevant literature, good data, good analysis including strong statistical support, and a thought-provoking discussion. The ingredients of good science reporting are obviousˇXgood organization, the appropriate use of tables and figures, the right length, writing to the intended audienceˇXdo not ignore the obvious.
  7. Start writing the paper the day you have the idea of what questions to pursue.
  8. Become a reviewer early in your career.
  9. Decide early on where to try to publish your paper.
  10. Quality is everything.

Posted by Abner Chih Yi Huang | Permanent Link

2006-09-30 08:51:14

Ten Simple Rules for Reviewers

Ref. : Bourne, P.E. & Korngreen, A. Ten Simple Rules for Reviewers, PLoS Computational Biology, 2006, 2, e110

This issue of PLoS Computational Biology has two interesting articles to me. One of them is ``Ten Simple Rules for Reviewers.''
  1. Do Not Accept a Review Assignment unless You Can Accomplish the Task in the Requested TimeframeˇXLearn to Say No. However it seems not easy to say no for chinese professor, I think the chinese culture made people be used to say yes for others, especially the powerful people's requests
  2. Avoid Conflict of Interest.
  3. Write Reviews You Would Be Satisfied with as an Author
  4. As a Reviewer You Are Part of the Authoring Process
  5. Be Sure to Enjoy and to Learn from the Reviewing Process. It is best case, but if this request was from your adviser then you might not have such many choices. ^_^;;;
  6. Develop a Method of Reviewing That Works for You. I am convieced it is helpful, however I don't know how to do.
  7. Spend Your Precious Time on Papers Worthy of a Good Review
  8. Maintain the Anonymity of the Review Process if the Journal Requires It
  9. Write Clearly, Succinctly, and in a Neutral Tone, but Be Decisive.
  10. Make Use of the ``Comments to Editors''


References
  • Bourne PE (2005) Ten simple rules for getting published. PLoS Comput Biol 1(5): e57. DOI: 10.1371/journal.pcbi.0010057.
  • Bourne PE, Chalupa LM (2006) Ten simple rules for getting grants. PLoS Comput Biol 2(2): e12. DOI: 10.1371/journal.pcbi.0020012.

Posted by Abner Chih Yi Huang | Permanent Link

2006-09-26 09:56:27

Communications of the ACM, Volume 49, Number 9, 2006

Two articles in this issue of CACM(Communications of the ACM, Volume 49, Number 9, 2006) are interesting!

A Wiki for discussing and promoting best practices in research

The result of our work combined effort with many other ACM and IEEE leaders as a Wiki (wiki.acm.org/healthcc) for presenting and discussing ideas and responses to this growing challenge.

Wow! The professors seems interested in Web 2.0!

The business of software: Software: hard data, Phillip G. Armour

Productivity is Getting Lower

One of the most interesting things QSM found was that productivity of projects, after rising steadily and quite predictably for the 15 years from 1982 through 1997, has declined markedly since then. The primary productivity factor they tracked (a factor called "Productivity Index" or PI) is now around the same level as it was back in the late 1980s. It might be enlightening to postulate what reasons there could be for this decline.

I would like to invite reader comments on and interpretations of these results and if they match other observations and measurements. It would interesting if we, as a profession, were to think how we could measure the reasons for these findings, and understand what these critical trends actually mean to our business.

This article give us many ``everybody know'' knowledges like ``small development team is better,'' however the most surprising thing is ``Productivity is Getting Lower.'' Well! Does it mean the Soft-Engineering useless?

:)

Posted by Abner Chih Yi Huang | Permanent Link

2006-09-25 20:27:20

LaTeX on Webpage

For building new website of Lab, I need a new system. MediaWiki is a free software wiki package originally written for Wikipedia. It support the LaTeX code in their webpage.

I also want to get the LaTeX support in my blog, however MediaWiki is to heavy for personal usage. MimeTeX, licensed under the GPL, could easily embed LaTeX math in html pages without installing LaTeX, i.e.,



Note: For my server, it need to change ``../cgi-bin/mimetex.cgi?x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}'' to ``../../cgi-bin/mimetex.cgi?x=\frac{-b\pm\sqrt{b^2-4ac}}{2a}''

Note: Since the pages in Archives have different path, therefore the path ``../cgi-bin/mimetex.cgi?'' should be changed to ``http://algorithm.cs.nthu.edu.tw/cgi-bin/mimetex.cgi?''

LatexRender seems more powerful, but it need to install more softwares on server.

Ref.: An article on Hongbo's Blog.

Posted by Abner Chih Yi Huang | Permanent Link

2006-09-24 22:32:09

NCTU EE5065 Optimzation Theory Note 1

The lecturer is prof. Shin-Yeu Lin. This course focusd on Unconstrained Optimization and Constrained Optimization. The main material is lecture notes provided by lecturer and the references are Edwin K.P.Chong and Stanislaw H.Zak's ``An Introduction to Optimization,'' and David G. Luenberger's ``Linear and Nonlinear Programming.''

This course is engineering style. Prof. Lin started from the chpater six of Zak's book, and he didn't explain some mathematical bases of optimization theory.

Some References:

Concepts of Geometry and Analysis:
  • Hyperplanes: I have a little difficulty to image the n-D space for n>3. An hyperplane is ; if a=0 then it is called hyperspace. An (n-1)-hyperplane cut a space into two helf-spaces.
  • Usually we denote C^1 for continuously differentiable and C^2 for twice continuously differentiable.
  • We usually called y=ax+b a linear function, but it is not correct. y=ax+b is an affine function, and the linear function is y=ax.
  • Let ;
    1. If f is differentiable at x, then all partials exist at x, and
    2. If all partials exist and are continuous att x, then Df(x) exists,and
    3. f is C^1 on S iff. all partial derivatives of f exist and are continuous on S.
    4. A symmetric matrix Q is said to be positive definite if the quadratic form x^TQx is positive definite.If Q is positive definite, we write Q > 0. Similarly, we define a symmetric matrix Q to be positive semidefinite (Q >= 0), negative definite (Q < 0), and negative semidefinite (Q <= 0), if the corresponding quadratic forms have the respective properties. The symmetric matrix Q is indefinite if it is neither positive semidefinite nor negative semidefinite.


  • Let . Let x be any point in S, and let h in R^n. The directional derivative of f at x in the direction h is defined as

    when this limit exists, and is denoted Df(x;h).

  • There is a little difference between Optimization Theory and Analysis of Algorithms, i.e, small o notation o(). In Optimization Theory, ,

    e.g.,

    In Analysis of Algorithms, .

    e.g.,

    Ref. Paul Walton Purdom, Jr., Cynthia A. Brown's The analysis of algorithms. And Edwin K.P.Chong and Stanislaw H.Zak's ``An Introduction to Optimization.''

    The Big-O notation is different, too. The idea is similar, but in Optimization Theory they care the behavior of function as x approximating 0. According to Wikipedia,

    There are two formally close, but noticeably different usages of this notation: infinite asymptotics and infinitesimal asymptotics. This distinction is only in application and not in principle, howeverˇXthe formal definition for the "big O" is the same for both cases, only with different limits for the function argument.

    The Infinite asymptotics is like the Big-O notation in Analysis of Algorithms, the Infinitesimal asymptotics is like the Big-O notation in Optimization Theory.

  • There are a lot of Matrix Calculus in this fields, however, I don't understand gow to do this. e.g.,

    I collected some links about it, but they are references and lack details.

We say that an iterative algorithm is globally convergent if for any arbitrary starting point the algorithm is guaranteed to generate a sequence of points converging to a point that satisfies the FONC for a minimizer. When the algorithm is not globally convergent, it may still generate a sequence that converges to a point satisfying the FONC, provided the initial point is sufficiently close to the point. In this case, we say that the algorithm is locally convergent.



Methods of Unconstrainted Optimization:
  • 1-D: Most of problems are not in this case, however we could reduce objective function to 1-D search sometimes.
    • Golden section: reduce the uncertainity interval with the Golden Section ratio 0.382. The idea is to save the calculation of section points, i.e., take one of k-th section points as the one of (k+1)-th section points.
    • Fibonacci search: reduce the uncertainity interval with different ratios. According the proof, the ratio will be a function of Fibonacci series. The idea is to accelerate the reduction of uncertainity interval.
    • Newton's method: Need the first and second derivatives. It use the minimum of Tylor approximation f(x)=f(x')+f'(x)(x-x')+f''(x)((x-x')^2)/2. It only needs one starting point.
    • Secant method: It needs two starting points. It used the positions and derivates of such two points to compute the slope of first-order derivate for approximating the second order derivative in Newton's method.
  • Gradient method: Evaluate the gradient of f(x) at point x', and use the feasible direction with the maximum descreasing to approach the minimum.
    • Steepest descent algorithm: Use the first and second derivatives to find the Steepest descent feasible direction d and maximum feasible step size a for f(x')=f(x-ad).
    • Fixed step size gradient algorithm: Like Steepest descent algorithm but adapt the fixed step size. It is much simpler than steepest descent algorithm in practical, however the algorithm will converge iff. the step size
    However Zak's book only disscuss the convergence of Steepest descent algorithm for quradratic function. The convegence condition of steepest descent algorithm contains the matrix Q which is not Hessian matrix of objective function. Q is the matrix in the quadratic form . Actually the steepest descent algorithm does not require the existence of second derivative.

    Ref.: David G. Luenberger's ``Linear and Nonlinear Programming,'' pp. 218, 219 for an alternative approach to the convergence ratio analysis. Ref.: Gradient descent - Wikipedia, the free encyclopedia

  • Newton's method (sometimes called the Newton-Raphson method): Gradient method might not efficient enough for solving problems. An extension of Newton's method will provide faster convergence ratio if the starting point is close to the x^* enough. Thus it also need to some convergent analysis.

    It construct a quadratic approximation to the objective function that matches the first and second derivative values at that point. We then minimize the approximate (quadratic) function instead of the original objective function.

    The procedure is that firstly solve ; secondly set . Actually in the case quadratic form the convergence of Newton's method is naturally.



    Therefore, for the quadratic case, the order of convergence of Newton's algorithm is infinity for any initial point. However, generally

    is a point such that is invertible. Then, for all x^0 sufficiently close to x^*, Newton's method is well defined for all k, and converges to x^* with order of convergence at least 2.

    Ref. :Newton's method in optimization - Wikipedia, the free encyclopedia

    Modified Newton's method: The formula to compute the next point of Newton's method is , where \alpha_k is always 1 for all k. However, in the modified version, it use line search to find \alpha_k that minimize the f like steepest descent method.

    Levenberg-Marquardt modification to Newton's algorithm: (LMA)

    Above algorithm needs that Hessian martix of f is positive definite to guarantee the descent property. If the Hessian matrix is not positive definite, then the search direction may not point in a descent direction. A simple technique to ensure that the search direction is a descent direction is to introduce the so-called Levenberg-Marquardt modification to Newton's algorithm:



    The idea is to make the 2ed derivative positive definite.

    If , then this algorithm behaves like pure Newton's algorithm; if , then this algorithm behaves like gradient algorithms. In practice, we may start with a small value of , and then slowly increase it until we find that the iteration is descent.

    Ref. : Levenberg-Marquardt algorithm - Wikipedia, the free encyclopedia.

    Ref.: Gauss-Newton algorithm: (GNA) - Wikipedia, the free encyclopedia.

    Ref.: Jacobian Matrix- Wikipedia, the free encyclopedia

  • Conjugate gradient method From Wikipedia, the free encyclopedia.

    Origianlly the conjugate gradient method is an algorithm for solving particular systems of linear equations, namely those whose matrix is symmetric and positive definite, i.e., . The conjugate gradient method can also be used to solve unconstrained optimization problems. Hence it is really about the linear algebra (linear systems), and naturally for quadratic function. A set $S$ of nonzero vectors with respect to A is conjugate if .

    The conjugate vectors are linear independent, and the size of S equals n or the number of eigenvalues of A. (It also guanratee to arrive the optima in n steps.) In conjugate gradient method, for quadratic function, we have where $p^i$ is the i-th conjugate direction. Since , we know that x^{k+1} belongs to the affine space x^0 + span[{p^i}].

    If we trnasformed the objective function as where D=span[{p^i}], then there is a $\alpha$ minimizing it, i.e, minimize f over the affine space x^0 + span[{p^i}]. (Note that f is a quadratic function and has a unique minimizer. It is the reason the proof only prove FONC and SONC and conclude minimization.)

    A systematic procedure for finding Q-conjugate vectors can be devised using the idea underlying the Gram-Schmidt process of transforming a given basis of R^n into an orthonormal basis of R^n.

    The conjugate gradient algorithm without prespecified conjugate directions: It computed the i-th Q-conjugate direction in each iteration. The precedure is like the original one, but it computed the (i+1)-th conjugate direction in the i-th iteration, and those {P^i} are also Q-conjugate. Roughly speaking, it is based two properties.

    (1) we have where $p^i$ is the i-th conjugate direction. And (2)

    However, it is for a positive definite quadratic function of n variables. For the quadratic function that Hessian is unknown, there are three well-known modifications to eliminate Q in computing . (we could use linear search to find \alpha directly.)
    1. Hestenes-Stiefel formula: Subsitute the Q in
    2. Polak-Ribiere formula: Since , Hestenes-Stiefel formula actually computed
    3. Fletcher-Reeves formula: Moreover, we know that , we have
    In general, the choice of which formula for 0k to use depends on the objective function.

    The algorithm can be extended to general nonlinear functions by interpreting f(x) as a second-order Taylor series approximation of the objective function.

    For nonquadratic problems, the algorithm will not usually converge in n steps, and as the algorithm progresses, the "Q-conjugacy" of the direction vectors will tend to deteriorate. Thus, a common practice is to reinitialize the direction vector to the negative gradient after every few iterations (e.g., n or n + 1), and continue until the algorithm satisfies the stopping criterion.



  • Quasi-Newton Methods: Newton's Method has two common dracbacks, one is the convergence depending on the choosing of initial point, the other is that computing inverse matrix of Hessian costs too much computations.

    The first Quasi-Newton Algorithm was proposed by W.C. Davidon, a physicist working at Argonne National Laboratory. Like the story of QuickSort, Davidon got the trouble of applying optimization algorithms on his research, thus he decided to improve such algorithm. Eventually he developed the first quasi-Newton algorithm in the mid 1950s. However, such seminal idea is not accpeted for publications, it remained as a technical report until 1991.

    The most popular quasi-Newton algorithms are BFGS method and DFP algorithm. The rough decirption of quasi-Newton method as following,



    Quasi-Newton methods are also conjugate direction methods when it applied on guadratic objective function with symmetric Hessian, hence a quasi-Newton algorithm solves a quadratic of n variables in at most n steps.

    (Refer to Theorem 11.1 of Zak's textbook. Notice that textbook doesn't give the proof of case k=0. Accodring to lecture notes of Prof. Lin, his proof relied on the formula of \alpha_k of conjugate gradient algorithm. However that the Quasi-Newton method is equivalent to conjugate gradient algorithm is what we want to prove, I don't think that putting it into the proof is correct.)

    The Newton' method use the inverse Hessian to compute the next point. Thus if we want to approaximate inverse Hessian, then we want it to behave more and more like inverse Hessian for each iteration, i.e, for Hessian Q and a given k, since , we impose the requirement that the approximation as Thus eventually we got the inverse Hessian.



    Because inverse Hessian also holds above equation, Therefore, if is nonsingular, we have



    Thus one should prove the , if he wants to say some algorithms Quasi-Newton.

    For the quadratic function with Hessian Q=Q^T, the followings are Quasi-Newton.
    1. Rank one correction formula: start from symmetric positive definite matrix H_0. However, for nonquadratic objective function, H^i may not be positive definite and thus d^{i+1} may not be a descent direction.
    2. DFP algorithm (variable metric algorithm): start from symmetric positive definite H_0. H_{k+1} inherits positive definiteness from H_k.
    3. BFGS algorithm: The principal idea of the method is to construct an approximate Hessian matrix B_k not inverse Hessian H_k. In particular, the BFGS update for B_k corresponds to the DFP update for H_k. Formulas related in this way are said to be dual or complementary. the BFGS algorithm enjoys all the properties of quasi-Newton methods, including the conjugate directions property. Moreover, the BFGS algorithm also inherits the positive definiteness property of the DFP algorithm.


    For nonquadratic problems, quasi-Newton algorithms will not usually converge in n steps. As in the case of the conjugate gradient methods, here too some modifications may be necessary to deal with nonquadratic problems. For example, we may reinitialize the direction vector to the negative gradient after every few iterations (e.g., n or n + 1), and continue until the algorithm satisfies the stopping criterion.

  • Constrained Optimization: See the other notes.

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Notes

2006-09-23 20:06:25

NTHU EE6455 Advanced Computer Architecture

This course is taught by Prof. Yarsun Hsu, and textbook is Hennessy & Patterson, "Computer Architecture, A Quantitative Approach", Morgan Kaufmann, Third Edition.

Actually it has nothing to do with my research, I take this course just because I had buy the textbook, and my background is EE.

Courses Lecture 1: Fundamentals of Computer Design

I am most interested in the Performance Evaluation. The authors of our textbook thought the best metric of performance is execution time, however there are various definitions for execution time, e.g., CPU time, response time, and many types of testing materials, i.e., real applications, modified application, kernels, toy benchmarks, synthetic benchmark. And there are different benchmarks for different needs, i.e., SPEC, TPC-H, and EEMBC for desktop, server, and embedded system respectively.

Actually, it is really difficult to analyse and compare the performance of different computer systems. There are too many chances to improve the performance by modifying the testing procedures, e.g., using the more powerful compiler. This condition also appeared in the experiments of algorithms.

I don't exactly know meaning of the ``summary performance.'' The arithmetic mean is intuitive, but harmonic means ,geometric means....

There are two approaches to summary the performance of individual testing program: average and normalization. The first one is intuitive for calculating the arithmetic mean of (weighted) execution times; the second one normalizes the execution times of machines to a reference, i.e., a specific machine, and calculate their means. The geometric mean might be more suitable for this case; it could eliminate the influence of reference choosing, since GM(A)/GM(B)=GM(A/B), nonetheless there are two main drawbacks, the first is that it can't predict the execution time, i.e., for some workloads, there are different execution times for the machines with same geometric mean measurement; the second one is that the same ratio improvement will get the same improvement of GM, therefore people will prefer to spend time on easier improved parts.

Hence the best solution still analyze the weights of individual testing programs of the real workload.

Lecture 2: Instruction Set Principles

Lecture 3: Pipelining

The instructor taught this, that is based on appendix A of textbook, because the follwoing chapters will heavily depend on pipelining.

Some mnemonics
  • ``D'' is prefix or suffix for the extended 64-bits instructions of MIPS64, e.g., LD for load, SD for store.
  • ``I'' is suffix for immediate version of instructions, e.g., ADDI for add.
  • ``U'' is suffix for unsigned arithmetic instructions which do not generate overflow exceptions, e.g., DADDU for extended 64-bits add, however sometimes the notation will be modified slightly if cooperated with immediate version like DADDUI.
Clock skew is a phenomenon in synchronous circuits in which the clock signal (sent from the clock circuit) arrives at different components at different times.

If clock cycle is as small as the sum of the clock skew and latch overhead, no further pipelining is useful, since there is no time left in the cycle for useful work.

(Pentium 4 vs. Pentium 3 met this problem.)

The speedup equation of pipelining in this chapter is a little bit complicated. I prefer to undersatnd the meaning and deduce the needed equation immediately, but it might not good approach for exam.

In this book, the authors use ``exception'' for all other terms interrupt, fault, and exception.

15

Consider the branch and exceptions on pipelined machince. If exception occurs at the branch delay slot, it might got some problems such as "forget to do branch" after resuming, thus if exceptions occur, it might store the PCs of delay slots and the target PC of branch.

The ``precise exception'' is better and harder. However it is hard to do this in more complicated cases like float points for linger pipeline.

Exception order is not as the pipeline order



An instruction is called commited if it quaranteed to complete. In MIPS integer pipeline, all instructions are committed when they reach the end of the MEM stage or beginning of WB stage, and no instruction updates the state before that stage.
  • Structual Hazard: Since the difference of lengths of pipeline stages of instructions, it might occur that the memory access request at the same time.
  • WAW Hazard: The later instruction complete the WRITE before the former instruction completing the WRITE. (For the same target.)
  • WAR Hazard: Impossible since the register reads always in ID.
  • RAW Hazard: The later instruction complete the WRITE before the former instruction completing the READ.


16

17

For superpipelining, like MIPS R4000, designer divided the memory access into serval stages. Divide IF into two stages, and MEM into three stages (Combined with caches). In such case, the ``load'' and ``branch'' delay increased. By forwarding DS to EX, it still costed the 2 cycles for stalling, and `branch'' delay costed 3 cycles since the branch condition is determined at EX.

The MIPS architecture has a 1 cycle ``delay slot'', thus the MIPS R4000 uses a predicted-not-taken strategy for remaining 2 cycle of branch delay. Furthermore it needs more forwardings of R4000, i.e., EX/DF, DF/DS, DS/TC, and TC/WB.

MIPS design a new instruction ``Blez1'' (branch likely) that make delay slot be turned into no-op when branch is not taken. This is useful since compiler can't always find the instruction to insert into delay slot.

MIPS R4000 has 4 kinds of stall: Load stall, Branch stall, FP result stall (RAW for FP), FP structual stall. Thus the total pipeline CPI = pipeline CPI + Load stall + Branch stall + FP result stall + FP structual stall.

xx

xxx

18

Lecture 5: Dynamic Scheduling -- Scoreboard: Scoreboarding is a technique for allowing instructions to execute out of order when there are sufficient resources and no data dependencies; it is named after the CDC 6600 scoreboard, which developed this capability. However, the scoreboard is limited in that it does not handle WAR and WAW hazards very well. (handle it by stalls)
  1. Issue
  2. Read operands
  3. Execution
  4. Write result


Lecture 6: Exploiting Instruction Level Parallelism: Instruction-level parallelism (ILP) is a measure of how many of the operations in a computer program can be performed simultaneously.

Dependency
  1. True dependency: Data Hazard (RAW)
  2. Name dependency: It is not real true data dependency. It could be solved by renaming.
    1. Anti-dependency: WAR
    2. Output dependency: WAW
  3. Control Dependency: Like if then else statement. See following figure. 19


Tomasulo Approach is another scheme to allow execution to proceed in the presence of hazards developed by the IBM 360/91 floating-point unit. This scheme combines key elements of the scoreboarding scheme with the introduction of register renaming.
  1. Issue
  2. Execution
  3. Write result
Common Data Bus (CDB): data bus (64 bits) + functional unit source address (4 bits)

The algorithm allows data forwarding wherever possible. (refer to The Scheduling Algorithm, however it seems not be mentioned by textbook.)

110

Two main adavantages of Tomasulo Approach:
  1. register renaming
  2. distributed control (by reservation stations)
112

Lecture 7: Dynamically Exploiting ILP (I)




Lecture 8: Dynamically Exploiting ILP (II)

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Notes

2006-09-23 19:32:33

The Art of Uniformed Decisions

Ref.: E. Fischer, The art of uninformed decisions: A primer to property testing, The Computational Complexity Column of The Bulletin of the European Association for Theoretical Computer Science 75 (2001), 97-126.

Like the online algorithms, ``property testing'' has its own framework of analysis, i.e., $\epsilon$-close or $\epsilon$-far. I don't think what the $\epsilon$ means is very important, because the metric function is quite different from various cases. Terms like testable are really matters.

Section 5 show that why we need to study Szemer\'edi's Regularity Lemma. However in the references, there are too much mathematical stuff. The first step might be [AFKS 2000] that the author said that it prove a variant of Regularity Lemma.

Section 8 discuss the non-testable property, it is interesting that they use the Yao's method to prove non-testability, and connect it to the section 5, i.e., `$\forall\exists$' type first order graph property.

Posted by AbnerCYH | Permanent Link | Categories: Papers

2006-09-23 19:19:04

Update to RC5

As title:)

Posted by Abner Chih Yi Huang | Permanent Link | Categories: Nanoblogger

2006-09-23 05:44:20

LEAST Seminar 20060922

Title: Faster algorithms for computing longest common increasing subsequences

Authors: Gerth Stolting Brodal, Kanela Kaligosi, Irit Katriel, and Martin Kutz

Source: 17th Annual Symposium on Combinatorial Pattern Matching (CPM06)

Speaker: Chieh Chin Chiu Time: 9.22 (Friday) 19:00-21:00 Place: Room 550, Department of Computer Science, NTHU

I am interesting the meeting in last night. I found a journal version paper on the webpage of one of authors.

This paper discuss the problem LCIS is to find the longest common increasing subsequences of two sequences A and B with length m, n respectively. The first algorithm is $O(mn)$ based on dynamic programming, and its space complexity could be reduced to $O(m)$ by divide-and-conquer approach, where $m>n$.

This paper also adopt the dynamic programming approach, but they use a special data structure, Van Emde Boas tree, to reduce the time complexity.

LCIS

The authors defined a new data structure Bounded Heap with three operations, it could base on McCreight's Priority Search Tree, which costs $O(log n)$ each operation, however they chose the Van Emde Boas tree, because it supports fewer operations but costs only $O(log log n)$ each operation. And its operations are enough. Besides, it surprised us that this is invented in 1977. Prof. R. C. T. Lee commented on it: It is early age of CS. R. E. Tarjan's minimum spanning trees algorithm is in 1976....:)

The part I am most interesting in is the technique reducing the time complexity of dynamic programming. The algorithm has $n$-iterations, they compute length $i$ CIS in $i$-iteration by results of $i-1$-iteration.

According to the page of wikipedia or Erik Demaine's lecture note, the Van Emde Boas tree included the delete operation. Authors seem not exactly correct, or I misunderstand their expressions.

Van Emde Boas tree
The original paper is unavaliable. For applying this data structure to other problem, we should know the abstract problem -- fixed-universe successor problem, i.e., Maintain a dynamic subset S of size n from the universe of size u, and one want to find the smallest element in S that is > x for given x, i.e, find successor of x in S.

The idea of Van Emde Boas tree is to combine bit vector and binary search tree. It is defined recursively. Prof. Erik Demaine's lecture notes of Advanced Data Structures is very good reference, but I would like express it in the global view.

Clipboard01

Actually it is heap, i.e., complete binary tree. The leaves is the bits to present the appearence of element in S, hence there are u leaves. Thus there are (log u) levels if we adopted the binary representation. Each internal node is 1 if one of its children is 1 or both are 1.

If we use the first half bits, , as an index (another Van Emde Boas tree with leaves), then we could traverse the tree to find successor. One traverse a (log u) height tree would cost O(log u), but we could jump half height by index, i.e., doing binary search on the heap, therefore it costs .

The details of operation definitions refer to Prof.Demaine's lecture notes. Furthermore it could reduce space complexity by using hash fuction.

Priority Search Tree
McCreight's Priority search tree is combination of priority queue and binary search tree.

Ref. :CSIE Geometric Computing and Visualization, Instructor : D. T. Lee

However, I can't see the difference between Priority search tree and binary search tree with geometic meaning in D. T. Lee's slides.

Posted by AbnerCYH | Permanent Link | Categories: Seminar

2006-09-22 03:46:53

Two papers captured my eyes

Recently, I am reading E. Fischer's ``The art of uninformed decisions: A primer to property testing.'' In section 5, prof. Fischer wrote an idea that express the ``graph property'' in first order logic, and I saw a paper this morning. I think it might useful.

Ref. : Oleg Pikhurko, Helmut Veith and Oleg Verbitsky, The first order definability of graphs: Upper bounds for quantifier depth, Discrete Applied Mathematics, Volume 154, Issue 17, , 15 November 2006, Pages 2511-2529.

Another paper is Uriel Feige, Eran Ofek's ``Random 3CNF formulas elude the Lovasz theta function.''

I don't know when I first time hear Lovasz theta function, but I like it. It is about the Shannon capacity of the graph G.

Shannon capacity of the graph

Ref.: Lovasz, L., "On the Shannon capacity of a graph," Information Theory, IEEE Transactions on , vol.25, no.1pp. 1- 7, Jan 1979

Posted by AbnerCYH | Permanent Link

2006-09-22 02:25:28

Parallel Architecture

I reviewed two conference papers of ICS2006 few days ago. Those are about the Interconnection Networks. Although it is also an application of graph theory, I am not familiar with it. The considerations and goals are not like the special graph algotihmics., however it might be an interesting extension for my research.

Prof. Biing-Feng Wang teach ``Parallel Algorithm Design'' this semester.

PART I 1. Introduction: Classification of parallel computers .Performance of parallel algorithms 2. Shared-memory computers, basic techniques, and Brent Theorem

PART II 3. Tree Machine 4. Linear-processor array 5. Mesh-connected computers 6. Hypercubes 7. Perfect shuffles 8. Mesh-connected computers with multiple broadcasting 9. Processor arrays with reconfigurable bus systems

PART III 10. Systolic architectures 11. Wavefront array processors 12. Distributed algorithms 13. Fault-tolerance, randomized algorithms, P-completeness

References: [1] J. JaJa, An introduction to parallel algorithms, Addison Wesley, 1992. [2] S. G. Akl, The design and analysis of parallel algorithms, Prentice-Hall, 1989. [3] Selected papers from journals and proceedings.

However I didn't choose this course, actually I were not very interested in the parallel algorithms, but I am interested in the applications of graph theory of Parallel Architecture, e.g., Fault-tolerance, after that I reviewed those two papers. (I had written a note about Parallel Algorithms)

Some interconnection networks, e.g., arrangement graphs and hypercube-like graphs, interest me very much, they have pretty interesting properties.

Courses

Khaled Day and Anand Tripathi, Arrangement graphs: a class of generalized star graphs, Information Processing Letters, Volume 42, Issue 5, , 3 July 1992, Pages 235-241.

the arrangement graph, as a generalization of the star graph topology and prove many of its properties such as: hierarchical structure, vertex and edge symmetry, simple and optimal routing, and many fault tolerance properties. The arrangement graph presents more flexibility than the star graph in terms of choosing the major design parameters: degree, diameter, and number of nodes.


Posted by AbnerCYH | Permanent Link | Categories: Notes

2006-09-18 15:25:09

Nonlinear Time Series Analysis

I wrote a note about data analysis yesterday, and I decided to write another note about Nonlinear Time Series Analysis, since I am interested in complex (non-linear) topics.

However, I found that there are less Internet resources explaining the idea of Nonlinear Time Series Analysis; there are many website talk about Time Series Analysis, but less about Nonlinear parts.

The Chaos Research Group

The Chaos Research Group studies deterministic chaos and nonlinear dynamics in engineering systems and is located within the College of Engineering at the University of Tennessee. This group closely collaborates with the researchers at the Oak Ridge National Laboratory.

They study some physical and mechanical phenomena. I can't understand most of them, but there are three topics specially interesting me, i.e. Internal combustion engines, Chaos control algorithms, Time-series analysis. Here I focus on Nonlinear time-series analysis

The process of analyzing time series using mathematical and numerical data transformations or even appropriate graphical displays constitutes a field of science known as time-series analysis. Conventional signal-processing techniques include Fourier transforms, autocorrelation functions and autoregressive data modeling, but these methods generally rely on linear relations and often have been found insensitive in describing the nonlinear structure in chaotic time series.

Chaotic time-series analysis (CTSA), or nonlinear time-series analysis (NTSA), refers to a class of data-analysis techniques employed to provide a richer description of chaotic time series. The CRG has been involved continuously in developing and applying new CTSA algorithms to time series, and current research topics include stationarity tests, dynamical state classification and periodic orbit detection.



Ben Goertzel's paper, Delay Vectors as Perceptual Chunks: Understanding Nonlinear Time Series Analysis as Self-Organizing Cognition, explain this by another viewpoint, but for me it is harder to understand.

I would like to ask: how do we decide to adopt Time Series Analysis or Nonlinear Time Series Analysis? Although most phenomena of real world are nonlinear, actually we could use many techniques to approximate them. Give some assumptions, ignore some variables, we could get pretty good results sometimes. What is the criteria of evaluating approaches?

Other Sites

Posted by AbnerCYH | Permanent Link | Categories: Notes

2006-09-18 07:46:09

Data Analysis

Actually I don't like some fields about statistics like Data Analysis. I know it is useful, but I didn't like it. Some reasons might be the elementary statistics courses always gave many formula without explaining the sources, therefore students need to remember them.

However, I have some interests to learn it after watching Numb3rs....:)

My first question is the difference between Data Analysis and Data mining. According to Data analysis - Wikipedia, the free encyclopedia

Respect to Data mining, data analysis is usually more narrowly intended as not aiming to the discovery of unforeseen patterns hidden in the data, but to the verification or disproval of an existing model, or to the extraction of parameters necessary to adapt a theoretical model to (experimental) reality.

After taking a look into Wikipedia, I was surprised that so much sub-fields of statistics. However, I thought I should start from the terms taht are familiar to computer scientist.

Data mining - Wikipedia, the free encyclopedia

Data mining (DM), also called Knowledge-Discovery in Databases (KDD) or Knowledge-Discovery and Data Mining, is the process of automatically searching large volumes of data for patterns such as association rules. It is a fairly recent topic in computer science but applies many older computational techniques from statistics, information retrieval, machine learning and pattern recognition.

Machine learning - Wikipedia, the free encyclopedia

At a general level, there are two types of learning: inductive, and deductive. Inductive machine learning methods create computer programs by extracting rules and patterns out of massive data sets. It should be noted that although pattern identification is important to Machine Learning, without rule extraction a process falls more accurately in the field of data mining.



Other fields about Data Analysis
  • Regression analysis
  • Analysis of variance
  • analysis of covariance:

    an old-fashioned name for a linear regression model with one continuous explanatory variable and one or more factors. The name exists for historical reasons, but there is no particular reason to distinguish the method from the general purpose linear model.

  • Time series analysis
  • Principal components analysis

    In statistics, principal components analysis (PCA) is a technique for simplifying a dataset, by reducing multidimensional datasets to lower dimensions for analysis.

    Technically speaking, PCA is a linear transformation that transforms the data to a new coordinate system such that the greatest variance by any projection of the data comes to lie on the first coordinate (called the first principal component), the second greatest variance on the second coordinate, and so on.

  • Andreas Galka's links collection on Nonlinear Time Series Analysis. It doesn't contain the explaination of Nonlinear Time Series Analysis. TISEAN is free a software project for the analysis of time series with methods based on the theory of nonlinear deterministic dynamical systems, or chaos theory, if you prefer.
  • Neural Modeling and Data Analysis It is not Neural Network.
  • Computational Mechanics


Other Resources
  • Interactive Statistical Calculation Pages listed here comprise a powerful, conveniently-accessible, multi-platform statistical software package.
  • StatLib is a system for distributing statistical software, datasets, and information by electronic mail, FTP and WWW. Therefore it might be a good start point for learning data analysis.

Posted by AbnerCYH | Permanent Link | Categories: Notes

2006-09-18 05:41:30

Neuroeconomics

There are many weird new terms about science and society, such as Neuroeconomics, Ecophysics, etc. The term Ecophysics seems no good reputations, but Neuroeconomics seems better. At least The New Yorker had a positive report ``MIND GAMES -- What neuroeconomics tells us about money and the brain, by JOHN CASSIDY.

Ref.: Neuroeconomics - Wikipedia, the free encyclopedia

Ref.: Neuroeconomics Blog - Center for the Study of Neuroeconomics at George Mason University

The Neuroeconomics Blog try to give it a definition:

Neuroeconomics is an interdisciplinary research program with the goal of building a biological model of decision making in economic environments. Neuroeconomists ask, how does the embodied brain enable the mind (or groups of minds) to make economic decisions? By combining techniques from cognitive neuroscience and experimental economics we can now watch neural activity in real time, observe how this activity depends on the economic environment, and test hypotheses about how the emergent mind makes economic decisions. Neuroeconomics allows us to better understand both the wide range of heterogeneity in human behavior, and the role of institutions as ordered extensions of our minds.

There is a note that seems talking about the same thing. Cosma's Notebook: Social Neuroscience

Ref.: Neuroeconomics at Caltech - The Camerer Lab

Posted by AbnerCYH | Permanent Link | Categories: Notes

2006-09-18 03:36:26

Primal-Dual Scheme

Ref. : Goemans, M. & Williamson, D., The Primal-Dual Method for Approximation Algorithms and its Application to Network Design Problems, chapter 4, Approximation Algorithms, PWS Publishing Company, 1997.(This chpater also contained the model of network design, therefore it might be a good reference when one want to design approximation akgorithm for some graph or network problems.)

Originally Primal and Dual method is another method solving linear program. In 1955, Harold Kuhn invent the first version primal-dual Hungarian method for solving assignment problems. This was revised by James Munkres in 1957, and has been known since as the Hungarian algorithm, the Munkres assignment algorithm, or the Kuhn-Munkres algorithm.After that, Primal and Dual method inspired many researchers in combinatorial optimization.

The main feature of the primal-dual method is that it allows a weighted optimization problem to be reduce to a purely combinatorial, unweighted problem. Most of the fundamental algorithms in combinatorial optimization either use the method or can be understand in terms of it, including the Dijkstra's shortest path algorithm [Dij59], ford and Fulkerson's network flow alforithm[FF56], Edmond's non-bipartite matching algorithm[Edm65], and of course Kuhn's assignment algorithm.

Actually we could take the dual program of assignment problem as min. weight perfect matching problem on bipartite graph.

The idea of LP-Dual $\mathit{D}$ is that we could estimate the solution of LP-Primal $\mathit{P}$, i.e., one could try to match the linear combination of constraints to the objective function, therefore we would get the lower bound $\sum^m_j y_jb_j$ by the linear combination of constraints. Furthermore we could extend this idea.

We have the following theorem \THM{{\bf Weak Duality} For any primal-dual solution pair $(x,y)$, $\VEC{c}^T\VEC{x}\geq\VEC{y}^T\VEC{b}$.}

It is obviously that solution of $\mathit{D}$ is increasing and solution of $\mathit{P}$ is decreasing; when they met, we have \THM{{\bf Strong Duality} The optimal primal solution will equal to dual solution, i.e., $\VEC{c}^T\VEC{x}=\VEC{y}^T\VEC{b}$. }

\DEF{We define $\VEC{c}^T\VEC{x}-\VEC{y}^T\VEC{b}$ the {\bf duality gap}.}

\DEF{We define $\sup_I\frac{OPT_{IP}(I)}{OPT_{LP}(I)}$ the {\bf integral gap}.}

Primal-Dual Schema om Approximation Algorithms
  1. Formulate Dual-LP,
  2. Construct the feasible solution of Dual-LP,
  3. Make the feasible solution tight combinatorially,
  4. Keep the corresponding solution of Primal-LP feasible.\newline


References
  • The Primal-Dual Schema for Approximation Algorithms: Where Does It Stand, and Where Can It Go?, Vijay Vazirani, Georgia Institute of Technology (USA), Algorithms Seminar, December 11, 2000.
  • Introduction to Linear Optimization, by Dimitris Bertsimas and John N. Tsitsiklis.

    Chapter 7: Provides a comprehensive review of the principal results and methods for the different variants of the network flow problem. It contains representatives from all major types of algorithms: primal descent (the simplex method), dual ascent (the primal-dual method), and approximate dual ascent (the auction algorithm). The focus is on the major algorithmic ideas, rather than on the refinements that can lead to better complexity estimates.

  • Vazirani, V.V., ``Primal-dual schema based approximation algorithms,'' Theoretical aspects of computer science: advanced lectures, Springer-Verlag New York, Inc., 2002, pp. 198-207.
  • Williamson, D.P., The primal-dual method for approximation algorithms, Mathematical Programming, 2002, 91, 447-478.

Posted by AbnerCYH | Permanent Link | Categories: Notes

2006-09-18 03:03:18

Worse or Better

I had said that I was considering many things. Something of them are done.

I got the TA position of ``Computer Systems & Applications,'' but I didn't pass the entrance exam of ``academic writing.'' However I need to report the Lovasz Local Lemma on``Special Topics in Algorithms.''

I need to read a survey about property testing and get an interview with my adviser, and build a server for ``Graph algorithms meeting.''

Nevertheless the worst thing might be the project that raising the standard of graduation. Master program student might publish a conference paper for graduating.

And there are something still interesting me, i.e., fractional graph theory (fractional dominating), topological method, LP primal and dual method.

Posted by AbnerCYH | Permanent Link

2006-09-18 00:24:23

Fate and Plan

Pigeonhearted people like me prefer to collect information, analyzes the risk, and make a plan, rather than involve in directly, however the plan always mismatch the fate.

I remember that I would like to be a business man at junior high school, physicist at senior high school, EE engineer at college, and I changed my mind to apply master program of computer science.

At that year I prepared the entrance exam of graduate school, I found that algorithmics and combinatorics interested me more than operating systems and computer architecture on which I originally wanted to research.

However, I decided to join the Algorithm and Biocomputing Laboratory. The first reason was that Prof. Tang's professional field is algorithmics, and I thought that the algorithmics is the most important training which I need. One year passed, I think I have a little bit precipitation.

I always want to learn many things but I don't really spend enough time on those.

Now I am considering to plan my life. At least I am trying to decide to pursue a Ph.D. or not, but I also know that it is not that useful.

Posted by AbnerCYH | Permanent Link

2006-09-17 17:47:49

Numb3rs

Ref.: CBS.com: Numb3rs

I currently enjoyed in those American television show about FBI, such as CSI, Criminal Mind. And I like NUMB3RS mostly.



The ep. 2 of season 1 introduced the NP vs. P problem....:)

The mathematic an and physicist are cool.

I had dreamed to be a scientist solving people's problems by math and scientific analysis just like complex scientist....

Ref.: NUMB3RS From Wikipedia, the free encyclopedia

Posted by AbnerCYH | Permanent Link

2006-09-16 18:17:56

Zero Knowledge and the Chromatic Number

Ref. : Uriel Feige and Joe Kilian, ``Zero Knowledge and the Chromatic Number,'' Journal of Computer and System Sciences, Volume 57, Issue 2, , October 1998, Pages 187-199.Note that the journal version contains many small typos. The better choice is to download the e-print from the authors' homepages.

We present a new technique, inspired by zero-knowledge proof systems, for proving lower bounds on approximating the chromatic number of a graph. To illustrate this technique we present simple reductions from max-3-coloring and max-3-sat, showing that it is hard to approximate the chromatic number within[Omega](N[delta]) for some[delta]>0. We then apply our technique in conjunction with the probabilistically checkable proofs of Hastad and show that it is hard to approximate the chromatic number to within[Omega](N1-[var epsilon]) for any[var epsilon]>0, assuming NP[nsube]ZPP. Here, ZPP denotes the class of languages decidable by a random expected polynomial-time algorithm that makes no errors. Our result matches (up to low order terms) the known gap for approximating the size of the largest independent set. Previous O(N[delta]) gaps for approximating the chromatic number (such as those by Lund and Yannakakis, and by Furer) did not match the gap for independent set nor extend beyond[Omega](N1/2-[var epsilon]).

I am interested in this paper because I want to know how to connect the zero-knowledge proof systems to chromatic number problem. It seems without any relation between such two topics intuitively.

Actually this paper is not the newest result on this problem, to best of my knowledge, the paper ``Linear degree extractors and the inapproximability of max clique and chromatic number'' by David Zuckerman on STOC2006 should be the newest result. But this paper appeal my attention first by his interesting title.

Free Bits

In the section 2, the authors summarized the previous works on this problem. They mentioned the idea ``free bit'' that are proposed by M. Bellare and M. Sudan's paper ``Improved Non-Approximability Results.''

It might useful to refer prof. Sudan's lecture note, but it is still hard to understand such concept. Luca Trevisan's paper ``Inapproximability of Combinatorial Optimization Problems'' explained this concept better, i.e.,

We say that a (r(n), q(n))-restricted verifier V for a language L has ``free bit complexity'' at most f(n) if, for every input x and every choice of the random input, there are at most $2^{f(n)}$ possible answers to the q(n) queries made by the verifier that would make it accept.

This implies that the important parameter to optimize is not the number of queries but rather the number of accepting computations.

According to M. Bellare, O. Goldreich, and M. Sudan. ``Free bits, PCP's and nonapproximability - towards tight results.'' SIAM Journal on Computing, 27(3):804--915, 1998.

The free-bit complexity, roughly speaking, is the logarithm of number of possible accepting configurations of V on coins R and input x. (For example, a verifier which makes three queries and accepts, iff parity of the answers is odd, has four accepting configurations and thus free-bit complexity 2.)

I.e., {001,100,010,111} four accepting configurations. However the amortized free-bit complexity is still not easy to understand.

the amortized free-bit complexity is the free-bit complexity (of a proof system with perfect completeness) divided by the logarithm of the gap (i.e., the number of free bits needed per factor of 2 increases in the gap).

where the gap g is defined as $g={completeness probability}/{soundness probability}$.

The ``amortized query complexity'' is also an important parameter of the PCP verifier, although it is not mentioned here. There is a paragraph about it in Alex Samorodnitsky and Luca Trevisan's paper ``A PCP characterization of NP with optimal amortized query complexity'' ``Furthermore, one can show that, in a PCP characterization of NP, a verifier having error s must have query complexity at least log(l/s) (unless P.=NP) [5; 20]. So the best we can hope for (in terms of trade-off between error probability and number of queries) is to construct verifiers having error s and query complexity $\overline{q} \log(1/s)$, where $q>1$ is some constant that we would like to be as small as possible. The parameter $\overline{q}$ (the ratio between number of queries and logarithm of inverse error probability) is called the amortized query complexity of the verifier.''

Back to this paper, although it did not explain amortized free-bit complexity clearly, the author told us that amortized free-bit $f$ is the most relevant parameter for proving hardness of approximating $\alpha(G)$. I had confused the studies of PCPs. Since one had proved the PCP theorem, why do researchers still involve in the studies of some parameters such as query complexity? What the benefit is if we could decrease it significantly.

However, in the other hand, for improving the results of non-approximability, the different parameters might mean the explaining abilities, i.e., the study of different parameters of PCPs might be a kind of characterization of different intractable problems.

\redW{FGLSS Reduction}, proposed by Uriel Feige and Shafi Goldwasser and Laszlo Lovasz and Shmuel Safra and Mario Szegedy, ``Interactive proofs and the hardness of approximating cliques.''



Fractional Chromatic Number \redW{Fractional Chromatic number $X_f(G)$}, proposed by Lovasz ``On the ratio of optimal integral and fractional covers.''. \redW{k-tuple Chromatic Number}: Original definition from L. Lovasz, ``On the ratio of optimal integral and fractional covers''

\DEF{The {\bf k-tuple chromatic number} $X_k$ of $G$ is the least number $l$ such that it is possible to associate a $k$-subset $C(x)$ of $\{1,\ldots,l\}$ with every point $x$ of $G$ in such a way that $C(x)\cap C(y)=\varnothing$ for $(x,y)\in E(G)$. $X(G)=X_1(G)$ is the ordinary chromatic number.}

$X_k(G)$ could also be interpreted as the minimum number of independent sets which cover every point at least $k$ times.

The other definition proposed by the authors are in the inverse perspective. (However, it is not so lucid as the original one.)

\DEF{We define {\bf fractional chromatic number} $X_f(G)$ as the smallest real $k$ such that for some distribution $f$ on $G$'s independent sets, choosing $I\leftarrow f$ covers any $v\in V(G)$ with probability at least $1/k$.}

Ref.: Wikipedia: fractional chromatic number



Randomized Graph Products

The authors mentioned that there are two techniques. The first one is proposed in P. Berman and G. Schnitger, ``On the complexity of approximating the independent set problem'' and A. Blum, ``Algorithms for Approximate Graph Coloring.'' Another one is proposed in U. Feige, Randomized graph products, chromatic numbers, and the Lovasz -function.''

Randomized PCPs

It is convenient to view an RPCP as follows: A prover has an initial witness $W$ to the NP-statement and a source of randomness $R_P$. Based on $W$ and $R_P$ the prover generates a witness $w$ for the PCP verifier to check. If $x\in L$ then $w$ will have perfect completeness regardless of $R_P$. If $x\not\in L$, there is no legal witness $w$, and the verifier accepts with probability at most $e$.



The new parameter $\rho$ confused me again, however I think we could take the RPCP as a PCP system and Prover of it has the random bits, i.e., randomized prover.

Zero-Knowledge Proofs

I am studying Zero-Knowledge Proofs. Salil Vadhan wrote a good slide\footnote{http://www.cpi.caltech.edu/quantum-security/slides/vadhan.ppt} for this topic. The idea of zero knowledge is not hard to understand, but its formal definition isn't.

\DEF{$(P,V)$ is {\bf zero knowledge} if $\forall$ poly-time $V^{*}\exists$ poly-time $S$ s.t. when $x\in L, S(x)$ is computationally indistinguishable from $(P,V)(x)$.}

According to Oded Goldreich's tutorial, it means that there is a simulator $S$, that is a polynomial-time algorithm that generates false transcripts (without the prover) which are identical to the genuine, and an interactive proof has the zero knowledge property if a simulator exists for the proof. It means that \redW{The Verifier learns nothing in the interaction.}

I totally can't understand the statement. Why do they use the idea "simulator" to describe the idea zero knowledge? Usually, people use the term "simulator" to present a procedure that could work as simulated things, however the simulator must know something which Verifier don't know to simulate the whole work, since the Verifier doesn't interact with Prover.

I must get something wrong! but I don't know where it is! Maybe I need to read the original paper, Shafi Goldwasser, Silvio Micali, and Charles Rackoff. The knowledge complexity of interactive proof-systems. Proceedings of 17th Symposium on the Theory of Computation, Providence, Rhode Island. 1985. [Draft]



Zero-Knowledge Property

I had said that I am interested in the various complexities, so that I really want to know what it means although it might not useful to my research.

The original idea came from the paper, Shafi Goldwasser, Silvio Micali, and Charles Rackoff. ``The knowledge complexity of interactive proof-systems.'' Proceedings of 17th Symposium on the Theory of Computation, Providence, Rhode Island. 1985. In this paper the authors defined a measurement, knowledge complexity, of the amount of knowledge about the proof transferred from the prover to the verifier. and convinced it is an important parameter of proof systems.

It is great! It is not easy to formalize an informal and fuzzy idea, but they did it!

A ZK proof system is interactive proof system $+$ zero-knowledge property. The later one is the hardest part. A slides\footnote{http://www.cs.biu.ac.il/~porately/2001/algorithm/zk1.ppt} gave proofs for G3C, GI, respectively. Actually, Boaz Barak and Silvio Micali gave their lecture note and another on this topic respectively.

I thought Graph-3-Coloring is a earier one to understand, but it seems not so easy to me.

\DEF{$(P,V)$ is {\bf zero knowledge} if $\forall$ poly-time $V^{*}\exists$ poly-time $S$ s.t. when $x\in L, S(x)$ is computationally indistinguishable from $(P,V)(x)$.}

I had said that I totally can't understand the zero knowledge property, however if change the aspect of view, the probabilistic polynomial time simulator means simulate the ``interactive procedure'', i.e., That the interactive procedure could be simulated in probabilistic polynomial time means there didn't exist something intractable in the interactive part of original Interactive Proof System.

In the example of Graph-3-Coloring, the simulator will repeat forever until the verifier choose the pre-determined edge, but there exist only $m$ edges, i.e., the probability of the pre-determined edge doesn't be choosing for $m$ time run is $\frac{1}{m}^m$. Although in this example the verifier seems a fool who can't awake to the multiple time repeat.

The second question is that if it is computationally indistinguishable?

Piotr Berman and Georg Schnitger, ``On the complexity of approximating the independent set problem'', Information and Computation, Volume 96, Issue 1, , January 1992, Pages 77-94.

Update 2006-10-05
Prof. Madhu Sudan gave a course ``Approximability of Optimization Problems'' at MIT. The Lecture 23 at 1999-12-1 was ``Tailormade PCPs: Zero-knowledge and the chromatic number,'' that discussed the custom-made PCP and use this paper as example. He explained the ZK in here.

Of course, here we don't care whether the verifier learns anything or not, all we want is for the FGLSS graph G to have chromatic number X(G) equal to the number of valid probabilitistically checkable proofs of a given theorwm; and if the theorem does not have a valid proof, then the chromatic number of the FGLSS graph cannot be small.

This paper is really hard to understand. I try to understand it by converting it back to normal PCP (opposite to Randomized PCP), but prof. Sudan followed the authors' direction.

Posted by AbnerCYH | Permanent Link | Categories: Papers

2006-09-08 16:00:36

Manuel Blum and His Students

From ephermata: Good advices..., I saw the list of Blum's students. I knew the Blum family is outstanding in computer science. Manuel Blum received the Turing Award in 1995 "In recognition of his contributions to the foundations of computational complexity theory and its application to cryptography and program checking". His wife, Lenore Blum, is an American mathematician and computer scientist. They and their son, Avrim Blum, are professors of Computer Science at Carnegie Mellon.

But I don't know Manuel Blum is also an outstanding teacher!

Blum_students

(Click the picture for larger size.)

OH! I think Manuel Blum is a Mr. CS indeed.

P.S. I drawn this graph by Ajax/Graphviz, and the data are from The Mathematics Genealogy Project.

Posted by AbnerCYH | Permanent Link | Categories: Scholars

2006-09-07 23:23:54

The New Semester

The new semester will start on 9/8, and I am the first reporter in the order of presenting on the LEAST seminar. I need to report in two seminars in a day, so that I am busy with writing slides.

I am considering several things. My senior, a Ph.D. student, will teach the course ``Computer Systems & Applications,'' and I would like to apply the position of TA, however, I project to attend four courses, computer architecture, Optimization Theory, theory of computation, and academic writing. Besides I need to write my thesis and to attend ``Special Topics in Algorithms'' and ``Graph Algorithms Meeting.'' It is really heavy!

The worse thing is that my adviser seems interested in property testing recently, therefore it would become my topic of thesis, but I know less about it and I don't think that I have time to learn it....:(

Posted by AbnerCYH | Permanent Link

2006-09-07 15:01:16

Theory of Computation

As my tradition of writing, I will first list the links of introduction which is about such topic. The first one should be Wikipedia:: Computational complexity theory, and the second should be website of Lance Fortnow who is professor of CS dept. of University of Chicago. He wrote many good articles about computational complexity, e.g., A short history of computational complexity, Favorite Ten Complexity Theorems, Favorite Computational Complexity Books, and his Blog ``Computational Complexity.''

Computational Complexity:

Computability Theory:

Real Computation:
  • Ker-I Ko, Professor of Computer Science, State University of New York at Stony Brook. Not much info on his homepage. Some survey papers.
  • Vasco Brattka. There are some informations on this field and tutorial. The following figure is from his tutorial.

    Computability_and_Complexity_in_Analysis
  • Lenore Blum, professor of CMU.


Probabilistically Checkable Proofs Sysyems: I thought the term ``non-deterministic,'' that Michael O. Rabin proposed is the one of most fuzzy term in computer science. Prof. Rabin is really a genius, but it is hard to understand. The ``Proof System'' should be a better framework that could make us understand those Turing Machines better. Oded Goldreich wrote surveys and maintains a webpage named ``Probabilistic Proof Systems,'' I think it is a good start point to understand Probabilistic Proof Systems. On that framework, we could take P as a polynomial time prover, NP as a polynomial time verifier, and PCP as randomized polynomial time verifier, i.e., P construct solution, and NP and PCP verify the proof from Oracle and instance for membership problem.

Bibliographies and Links Collection:

Posted by AbnerCYH | Permanent Link | Categories: Notes

2006-09-06 04:04:39

Optimization Theory

Ref. : Weisstein, Eric W. "Optimization Theory." From MathWorld--A Wolfram Web Resource.

A branch of mathematics which encompasses many diverse areas of minimization and optimization. Optimization theory is the more modern term for operations research. Optimization theory includes the calculus of variations, control theory, convex optimization theory, decision theory, game theory, linear programming, Markov chains, network analysis, optimization theory, queuing systems, etc.

Ref. :Optimization (mathematics) - Wikipedia, the free encyclopedia
Ref. :Operations research - Wikipedia, the free encyclopedia

Since I decided to attend the course Optimization Theory of NCTU, I would write a note for collecting some resources.

Lecture Notes: Slides:

People
  • Ted Ralphs, an associate professor of dept. of Industrial and Systems Engineering, Lehigh University. He maintains many nice stuff on his homepage, including lecture notes and resources of research.
  • MIT, Dimitri Bertsekas. Author of many books and teach many courses.
  • Katta G. Murty, Professor, industrial and Operations engineering, The University of Michigan, Ann Arbor.
  • Monique Laurent, CWI
  • Alexander Schrijver(Lex Schrijver), CWI
  • Yinyu Ye, Stanford. He taught many courses like Linear and Conic Optimization, Linear and Nonlinear Optimization, etc.


Others

2006-11-16 Update: In the course, Macroeconomic Analysis I, I heard the term ``Dynamic Optimization''.

I knew the Optimization Theory contains a lot of sub-fields like combinatorial optimization, Stochastic programming, however I didn't know this.

Ref. Dynamic optimization problems:

Dynamic optimization problems involve dynamic variables whose values change in time. Such problems exist in the areas of optimal control, parameter estimation for dynamic models, reactor network synthesis where the dynamic models arise from the differential modeling of the tubular reactors (plug flow reactors), and for dynamic simulation and optimization.

Some links put here There are two main approaches on this field, i.e., Optimal control theory and Dynamic programming.

Posted by AbnerCYH | Permanent Link | Categories: Notes

2006-09-05 22:24:01

Integrality and the Unimodular Property

Earlier notes: 2005-12-26 19:54:53, 2006-02-09 20:20:49.

Integrality and the Unimodular Property

We knew that for some network problems, there are combinatorial algorithms and linear programming algorithms. It means the solutions of those problem are integers, i.e., integrality of network problems, and generally the integer linear programming is in NP-Complete. Thus I want to know the criteria of integrality and tractability.

Ref. : Chapter 4 NETWORK FLOWS, section 12 Integrality of Flows and the Unimodular Property, Lawler, Eugene L. Combinatorial Optimization. Holt, Rinehart, and Winston, 1976.

Ref. : Chapter 8. Integer linear programming and totally unimodular matrices, A Course in Combinatorial Optimization, Alexander Schrijver, February 1, 2006.

Theorem 2.2. Let $P=\{x \| Ax\leq b\}$ be a polyhedron in $R^n$ and let $z\in P$. Then $z$ is a vertex of $P$, if and only if $rank(A_z) = n$.}

where $A_z$ is the submatrix of $A$ consisting of those rows $a_i$ of $A$ for which $a_i\times z= b_i$. $A_z$ is the submatrix of A consisting of those rows $a_i$ of $A$ for which $a_i\times z=b_i$.

A matrix A is called totally unimodular if each square submatrix of A has determinant equal to 0, +1, or -1. In particular, each entry of a totally unimodular matrix is 0, +1, or -1.

Theorem 8.1. Let A be a totally unimodular $m\times n$ matrix and let $b\in Z^m$. Then each vertex of the polyhedron
(9) $P:=\{x \| Ax\leq b\}$
is an integer vector.

Theorem 8.2. Let A be an integer $m\times n$ matrix of rank $m$. Then A is unimodular, if and only if for each integer vector $b$ the polyhedron
(18) $P=\{x \| x>0; Ax = b\}$
is integer.

Theorem 12.1 (Integral Circulation Theorem) If all lower bounds and capacities are integers and there exists a finite optimal circulation, then there exists an integral optimal circulation (whether or not arc costs are integers).

Corollary 12.3 Let $C'$ be the convex polytope defined by the inequality constraints $A'x\leq b, x \geq 0$, where $A'$ is an integer matrix. The following three conditions are equivalent:
(12.2) $A'$ is totally unimodular.
(12.3) The extreme points o f$C'$ are all integral for any integer vector $b$.
(12.4) Every nonsingular submatrix of$A'$ has an integer inverse.

Unfortunately, there do not seem to be any easily tested necessary and sufficient conditions for total unimodularity. Perhaps the most elegant such conditions are due to Camion, which we state without proof. A matrix is said to be Eulerian if the sum of the elements in each row and in each column is even.

Theorem 12.4 (Cumion) A (0, +l, -1) matrix is totally unimodular if and only if the sum of the elements in each Eulerian square submatrix is a multiple of four.

Theorem 12.5 .4 (0, + 1, -1) matrix A is totally unimodular if both of the following conditions are satisfied:
(12.5) Each column contains at most two nonzero elements.
(12.6) The rows of A can be partitioned into two sets A, and $A_2$, such that two nonzero entries in a column are in the same set of rows if they have different signs and in different sets of rows if they have the same sign.


Posted by AbnerCYH | Permanent Link | Categories: Notes

2006-09-05 05:57:13

Move some old stuff to here

The antecedent of this blog is ``BarrosH's Blog in English'' hosted on Blogsome, but it is disused. I moved here since February 17, 2006 8:52 pm. I posted a notice on it.

I moved this blog to my Labˇ¦s server several months ago. Now I think the new one works well and satisfy my needs so that I announce it.

Its blog system is nanoblogger

Now I would like delete that blog, so that I move some old stuff to here. The posts before ``2005-12-25 - Welcome to NanoBlogger 3.3!'' are all the old stuff of ``BarrosH's Blog in English''.

Posted by AbnerCYH | Permanent Link

2006-09-03 02:49:42

Property Testing

I had written an entry for gestatating of my thesis proposal. Actually I had sent a mail about this to my adviser. I said that I am more familiar on chordal graph and I would like to take the approximation issue of chordal graph as my topic of (master) thesis, however it is unexpectedly that my adviser asked me whether I had interest on property testing.

Thus it is the reason I create this note....:)

We consider the question of determining whether a given object has a predetermined property or is ``far'' from any object having the property. Specifically, objects are modeled by functions, and distance between functions is measured as the fraction of the domain on which the functions differ.

From Combinatorial Property Testing, Oded Goldreich

At first, I would like to wirte down some URLs of articles not so formal.

I used the term property testing in google for searching Taiwaness researchers, there are only two guys, one is the student of prof. Lyuu, the other one is Chuang-Chieh Lin. Chuang-Chieh Lin also mantained a list of papers on property testing.

People:
  • Ronitt Rubinfeld, Professor of Electrical Engineering and Computer Science at MIT,
  • Prof. Dana Ron, Department of Electrical Engineering - Systems, Tel-Aviv University
  • Eldar Fischer, Computer Science Department, Technion - Israel Institute of Technology


Some other Bibliographies [1], [2]

Some paper I read:
  1. Kumar, R., Rubinfeld, R., ``Sublinear time algorithms'', Samir Khuller's Algorithms column in SIGACT News 2003.

    It introduced the sublinear algorithms. It introduced the traditional sublinear algorithms like k-median and linear programming.

  2. The Art of Uniformed Decisions.

    After reading this paper, I am interested in non-testable. I think Property Testing could be taken as the approximation of the optimization problems which are transformed(or re-formulated) from decision problems like Zuckerman's ``NP-complete problems have a version that's hard to approximate.'' Structure in Complexity Theory Conference, 1993., Proceedings of the Eighth Annual, 1993, 305-312.

    Therefore the distance parameter (epsilon-far, -close) is the approximation ratio of problem in this view.

  3. Testing Connectivity of Graphs.
  4. Property Testing :A Tutorial, Dana Ron, Appeared: Handbook on Randomization, volume II, Edited by: S. Rajasekaran, P. M. Pardalos, J. H. Reif, and J. D. P. Rolim, 39 pages.

    Half pages of this paper are about testing graph properties. E. Fischer's survey is more general but harder to read. (1st order and rho clique)

  5. [Unread]Property testing in bounded degree graphs, O. Goldreich and D. Ron. Proc. 28th Symposium on Theory of Computing, pp. 406--415, 1997.

    I wanted to read this paper, because a paper I had read cite it.. However this paper lacked too much details, I can't understand it. Though I read the journal version, Algorithmica, Vol. 32 (2), pages 302-343, 2002. (Ohmm...42 pages) I might read the part about connectiveity testing.
  6. [Unread] WANG, Y. Approximate Algorithms on Checking Proofs with Probabilistic Methods, master thesis, Dept. CS, NTU, 2000.
  7. [Unread] Funda Ergun , Ravi Kumar and Ronitt Rubinfeld, Fast approximate probabilistically checkable proofs, Information and Computation, Volume 189, Issue 2, , 15 March 2004, Pages 135-159.
  8. [Unread] Shapira, A. A combinatorial characterization of the testable graph properties: it's all about regularity STOC '06: Proceedings of the thirty-eighth annual ACM symposium on Theory of computing, ACM Press, 2006, 251-260

Posted by AbnerCYH | Permanent Link | Categories: Notes

2006-09-02 19:58:51

Bad Habit

I like reading papers, but I have a bad habit that I will want to learn something appeared in the paper, therefore I can't focus on a topic.

Currently I am interested in the topics why some LP problem will have integer in P such as network flow problems.

Posted by AbnerCYH | Permanent Link