January 2006 Archives

2006-01-23 15:13:36

Note on Statistics

Ref. Statistical inference - Wikipedia, the free encyclopedia

Statistical inference is inference about a population(????) from a random sample drawn from it.

It has two main parts:
  • Estimation
  • Hypothesis testing


Ref. Statistics Glossary - Basic Definitions

In statistics, the term parameter is a value used to represent a certain population characteristic. And the term statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population. A population with fixed value of parameters, and each sample of population has its own value of any statistic. We use statistic to estimate parameter. e.g., We can use the means of samples to get information about the mean of the population from which samples were drawn.



point estimation:

In statistics, point estimation involves the use of sample data to calculate a single value (known as a statistic) which is to serve as a "best guess" for an unknown (fixed or random) population parameter.

The Interval estimation is like point estimation but estimate the interval of possible parameter of population by using statistic.

Ref. Statistical hypothesis testing - Wikipedia, the free encyclopedia

After the data is available, the test statistic is calculated and we determine whether it is inside the critical region.

If the test statistic is inside the critical region, then our conclusion is one of the following:

  1. The hypothesis is incorrect, therefore reject the null hypothesis. (Therefore the critical region is sometimes called the rejection region, while its complement is the acceptance region.)
  2. An event of probability less than or equal to alpha(comment: alpha means critical value of critical region) has occurred.
The researcher has to choose between these logical alternatives. In the example we would say: the observed response to treatment is statistically significant. If the test statistic is outside the critical region, the only conclusion is that There is not enough evidence to reject the hypothesis.

On this basis, statistical research progresses by eliminating error, not by finding the truth.As I know, statistics doesn't implies causation(?]?G??Y). i.e., it can't prove nothing!!! It might be why statisticians test the Null hypothesis.

Ref. Null hypothesis - Wikipedia, the free encyclopedia

The formulation, testing, and rejection of null hypotheses is methodologically consistent with the falsificationist model of scientific discovery formulated by Karl Popper and widely believed to apply to most kinds of empirical research. However, concerns regarding the high power of statistical tests to detect differences in large samples have led to suggestions for re-defining the null hypothesis, for example as a hypothesis that an effect falls within a range considered negligible. This is an attempt to address the confusion among non-statisticians between significant and substantial, since large enough samples are likely to be able to indicate differences however minor.

Ref. Statistical significance - Wikipedia, the free encyclopedia

in traditional frequentist statistical hypothesis testing, the significance level of a test is the maximum probability of accidentally rejecting a true null hypothesis (a decision known as a Type I error)(False positive, also called a Type I error). The significance of a result is also called its p-value; the smaller the p-value, the more significant the result is said to be.

For example, one may choose a significance level of, say, 5%, and calculate a critical value of a statistic (such as the mean) so that the probability of it exceeding that value, given the truth of the null hypothesis, would be 5%. If the actual, calculated statistic value exceeds the critical value, then it is significant "at the 5% level".

I am confused on the statistical significance.....And what's the statistical significance of correlation?

Posted by abner | Permanent Link | Categories: Notes

2006-01-23 06:37:35

How much damage can be caused by a peer reviewer having a bad day?

Ref. Logicomp: How much damage can be caused by a peer reviewer having a bad day?
Ref. We Are Sorry to Inform You, by Simone Santini, University of California, San Diego

I don't know.

But I thought really good papers will be published eventually. The worst results might be published on not good journals. Every year, a lots of researchers join and huge literature are published. The parallelism in research will be more and more.

Peer-review might be meaningless in the top research institutes, but, in some countries, some people want to get money form organizations of government like NSF by cheating on his research results. If the research level of this country was not so good, then they would settle for just passable results. It would make some loopholes.

Posted by abner | Permanent Link

2006-01-23 05:03:14

Debut of New Lab Website

I am the maintainer of our lab's new website.

Algorithm and Biocompuing Laboratory

I write it on TiddlyWiki 2.0.3.

Posted by abner | Permanent Link

2006-01-21 18:14:05

Algorithmic Algebraic Model Checking I: Challenges from Systems Biology.

C.Piazza, M.Antoniotti, V.Mysore, A.Policriti, F.Winkler, B.Mishra.. Algorithmic Algebraic Model Checking I: Challenges from Systems Biology. In: Proceedings Computer Aided Verification (CAV 2005), K.Etessami and S.K.Rajamani (ed.), LNCS 3576, pp. 5-19. 2005. Springer, ISSN 0302-9743, ISBN 3-540-27231-3.

I browsed this paper today. In fact, I read it word by word at beginning, but there are too many background which I don't have in this paper. Therefore, I just read the introduction, conclusion, and preliminaries. However, I don't know whether I understand such part.

In biochemistry, there are many methods to model biological systems, e.g., system of differential equations. But millions of differential equations are hard to solve to gain closed form. The simulation via numerical method does not dig out too much insights of biological system, so that the authors asked "Can we understand biology by 'simulating the biologist, and not biology."

The detail of this paper is too complex to me. Even its ideas are out of range of what I know. They introduced the "hybrid automata" which can model the discrete program with continuously changing. After modeling biological systems by differential algebraic equations, they use symbolic computation to determine the transitions between the states.

I never thought about that! It is interesting!

Ref.

R. Alur, C. Courcoubetis, N. Halbwachs, T. A. Henzinger, P. -H. Ho, X. Nicollin, A. Olivero, J. Sifakis and S. Yovine, The algorithmic analysis of hybrid systems, Theoretical Computer Science, Volume 138, Issue 1, 6 February 1995, Pages 3-34.

R. L. Grossman and R. G. Larson, An algebraic approach to hybrid systems, Theoretical Computer Science, Volume 138, Issue 1, 6 February 1995, Pages 101-112.


Posted by abner | Permanent Link | Categories: Papers

2006-01-17 15:38:39

Candidate Papers for Lab Meeting

I will present on the Lab meeting at 2006.2.28. The paper which I elect might be the topic of my M.S. thesis. Those can be classified into three classes, i.e., approximation and graph theory. language and graph theory, complexity science.
  • "Massive Quasi-Clique Detection," by James Abello and Mauricio G. C. Resende and Sandra Sudarsky. And "On mining cross-graph quasi-cliques," by Jian Pei and Daxin Jiang and Aidong Zhang.
  • "Handle-rewriting hypergraph grammars," by Bruno Courcelle and Joost Engelfriet and Grzegorz Rozenberg. And "Node rewriting in graphs and hypergraphs: a categorical framework," by Michel Bauderon and H\&\#233;l\&\#232;ne Jacquet.
  • "Dynamics, Computation, and the "Edge of Chaos": A Re-Examination," by Melanie Mitchell and James P. Crutchfield and Peter T. Hraber. And "Pattern Discovery and Computational Mechanics," by Cosma Rohilla Shalizi, James P. Crutchfield. And "Methods and Techniques of Complex Systems Science: An Overview," by Cosma Rohilla Shalizi.
Update at 2006/01/21:
C.Piazza, M.Antoniotti, V.Mysore, A.Policriti, F.Winkler, B.Mishra.. Algorithmic Algebraic Model Checking I: Challenges from Systems Biology. In: Proceedings Computer Aided Verification (CAV 2005), K.Etessami and S.K.Rajamani (ed.), LNCS 3576, pp. 5-19. 2005. Springer, ISSN 0302-9743, ISBN 3-540-27231-3.

It seems a interesting paper.

Posted by abner | Permanent Link

2006-01-17 03:13:48

CS6135 VLSI Physical Design Automation, Fall 2005

I submitted the report of term project at yesterday's morning. There are some records of the useful links. I copy them form my bookmarks to here. It might be useful some day.

Global Optimization For Term Project

Posted by abner | Permanent Link

2006-01-07 21:51:28

Look for Topic

I am looking for topic of my thesis.

About the topics, there are many different viewpoints. Some professors preferred hard and important problems; they thought that a little results on important problems better than a lots of results on minor problems. On the other hand, some professors preferred the problems which could got results in short-term.

For a researcher, their publications are important for competing on the job market, so that pursuit of more publications are understandable. The same problems present in the other way to the topic of thesis.

I am a master program student, and I don't have any pressure of publication. Therefore, I want to pursue something not mainstream. It might be not smart, but I don't think it is stupid. However, I don't know how to tell my adviser.

Posted by abner | Permanent Link

2006-01-07 17:44:04

Operations Research

Operations Research (OR), for historical aspect, is a field which is emphasis on math, but in Taiwan, it seems a field of bussiness college.

However, very much good researchers form OR made good and important contribution of various fields. For example, professor Wen-Lian Hsu is a IEEE fellow, and he got Ph.D. at Cornell University, Operations Research.


From wikipedia:

Operations research, operational research, or simply OR, is the use of mathematical models, statistics and algorithms to aid in decision-making. It is most often used to analyze complex real-world systems, typically with the goal of improving or optimizing performance. It is one form of applied mathematics.

What are the core subjects of OR? For some subjects, it looks the same as CS.

Posted by abner | Permanent Link

2006-01-07 03:41:20

Note on CS6135 project

Processing Fixed cell:
In sample case of ISPD'05, there are 5xx fixed cells. It is more than the max variables of Ipopt under AMPL interface.

At present, I have two ideas to deal with it. The first one is that we ignore those fixed cells and do placing; after multilevel clustering and placing, we divide the region into m-subregion, and replace in this subregion with some fixed cells in it. On each level of clustering and placing, we ignore the wirelengths between clusters and fixed cells.

The second one is that, we take the positron of fixed cells as the constant in NLP, it might be easy in AMPL, but might be difficult in compiled version (i.e., in C interface). The only thing we could change is the value of input data..... And when the number of fixed cells are too many, the computation might be more tough.

Cooperating with hMETIS:
hMETIS is one of two engines in my program; it is for multi-level clustering, but its format assume that all names of nodes and net are numbers. Further, bookshelf formats describe net by pins of cells, so that it is possible that a node presents twice in a net. Those problem should be fixed.

Today, I finished the part of I/O. The remaining jobs are to link it to hmetis and to write the AMPL code generator. Note: There is one thing needed to solve; the program will occur segment fault when it call the function of hMETIS. I guessed that the reason might like MLPart, i.e., it can't deal with nets with degree 1, but, after trying some cases, I thought that it work well with those nets.

By the way, I really want to rewrite my codes if I have more time. In my code, there are many stupid design and five-level nest structure.....

Update:I made a little mistake, I fixed it, now my program works. But, hMETIS needs lots of memory, it requests about 1GB for the 300-way partition. However, stand-alone hMETIS works well as 2000-way partition.....why?

Update:I must be too tired. It is AM 05:14. About above question, the answer is the test cases are different. The case for 300-way partition is about 40MB..... I decrease the value of k to 100, it can work this time.

¤»  1¤ë  7 05:12:20 CST 2006
¤»  1¤ë  7 05:17:12 CST 2006


It spent almost 5 minutes......

Posted by abner | Permanent Link | Categories: Notes

2006-01-06 20:42:28

XXX Geometry

This semester I took a course, combinatorial design; this course introduced a kind of geometry named finite geometry. I am insterested in those works which studied about the rigid property with set representitive of a system. Not like algebra, they doesn't define operations, they discuss the set system with some axioms, e.g., parallel axiom.

I know there are a lot of fields about "geometry", computational geometry, affine geometry, algebraic geometry, Analytic geometry, Complex geometry,Differential geometry, Euclidean geometry, Non-Euclidean geometry, Plane geometry, Projective geometry, and Riemannian geometry. However, it seems more than that I imaged.

Ref. Types, methodologies, and terminologies of geometry

    * Absolute geometry
    * Affine geometry
    * Algebraic geometry
    * Analytic geometry
    * Birational geometry
    * Complex geometry
    * Combinatorial geometry
    * Computational geometry
    * Conformal geometry
    * Constructive solid geometry
    * Contact geometry
    * Convex geometry
    * Descriptive geometry
    * Differential geometry
    * Digital geometry
    * Discrete geometry
    * Distance geometry
    * Elliptic geometry
    * Euclidean geometry
    * Finite geometry
    * Geometry of numbers
    * How Archimedes used infinitesimals
    * Hyperbolic geometry
    * Information geometry
    * Integral geometry
    * Inversive geometry
    * Non-Euclidean geometry
    * Numerical geometry
    * Parabolic geometry
    * Plane geometry
    * Projective geometry
    * Riemannian geometry
    * Spherical geometry
    * Symplectic geometry
    * Synthetic geometry
    * Transformation geometry

Posted by abner | Permanent Link

2006-01-04 17:33:49

Workshop on the Evolution of Complexity

Via Three-Toed Sloth

The evolution of complexity is a central theme in Biology. Yet it is not without any ambiguity. Complexity has been used to refer to different things. For instance, complexification has been interpreted as a process of diversification between evolving units or as a scaling process that is related to the idea of transitions between different levels of complexity. Other meanings of complexity have been introduced, both inside and outside the realm of Biology. What concerns most researchers is to get insight into the mechanisms that produce their notion of complexity.

In fact, I am not insterested too much in it, but I am insterested in the concept of "complexity" on other fields.

Some atricles mentioned by crshalizi

Posted by abner | Permanent Link

2006-01-04 04:12:27

Entropy, Complexity and Computation

  1. Communication complexity
  2. Complexity theory
  3. Complexity
  4. Systems theory
  5. Complex system
  6. Entropy by Tianyan-Li in chinese.
  7. Measuring the "Complexity" of a time series
  8. Resource Letter ITP:1: Information Theory in Physics
  9. Dynamics, Computation, and the "Edge of Chaos": A Re-Examination
  10. Complexity, Computation and the Physics of Information
  11. Computational Mechanics
  12. The Internet as a Complex System: Scaling, Complexity and Control
  13. Computation in Physical and Biological Systems
  14. In computer science, proof complexity is a measure of efficiency of automated theorem proving methods that is based on the size of the proofs they produce. The methods for proving contradiction in propositional logic are the most analyzed. The two main issues considered in proof complexity are whether a proof method can produce a polynomial proof of every inconsistent formula, and whether the proofs produced by one method are always of size similar to those produced by another method.(Update)

Posted by abner | Permanent Link | Categories: Notes

2006-01-03 23:45:40

Note on CS6135 Project

I spent a whole day to deal with multi-level clustering. At first, I tried to use MLPart as my cluster engine, but MLPart can't process net with degree 1. So I wrote some code to remove the number of degree 1 nets from .nets file. (Note : Don't forget to subtract the number of nets and pins form total number on the declaration of .nets file.

But, the second problem was that MLPart doesn't support the k-way partition in this version, and there were some error message of the old version, UCLA_MLPart, so that I decided change to the hMETIS. Fortunately, hMETIS supports k-way partition and provides library. I need to write the format converter. However, format of hMETIS is very simple. I doesn't need to spend too much time on overhead.

I should design a suitable data structure for writing a multi-level partitioner!

On the other hand, there were some trouble of NLP solver. I wanted to use lancelot A as the NLP solver at beginning. But I found that SIF is too complex, and lancelot A couldn't run on my PC well. It might be my fault, but I don't know the reason.

Therefore, I decided to change to OOL, Open Optimization Library. But it request programmer to calculate the Hessian matrix before coding, I thought that it is hard for complex problem. (In fact, it provides the tools in "section 9.1 Numerical Gradient and Hessian" of its manual, but I don't know the exactly functions.)

I restart to find the next suitable NLP solver, LOQO seems a good choice, but the free student version restricted only 300 variables. Finally, I found Ipopt.

The simplest solution of using Ipopt should be the AMPL interface. Weirdly, I can't find out the option to use the function of AMPL. Does I miss or misunderstand anything?

However, it seems able to solve NLP problems without calculating Hessian matrices by hand.

Now, the plan is
  • For the huge numbers of cells in *.pl and nets in *.nets, I should design a data structure to handle the I/O problem of memory and disk.
  • Deal with the FIXED CELL
  • Write I/O part at first.
  • ...........
Note: The Deadline 2006/01/16 10:00 AM

Update : I misunderstood the content of Ipopt's manual about AMPL. I thought AMPL is a description language to describe the problems and the only thing which I should do is to write down the problem by AMPL.

In fact, Ipopt doesn't contain the interpreter of AMPL, so that I downloaded student version of AMPL form http://netlib.bell-labs.com/netlib/ampl/(download student.tar), and move both ampl(stand-alone interpreter of AMPL) and ipopt into ~/bin .

After testing some examples form website of LOQO, Ipopt seems to work well with AMPL.(Note: add "option solver ipopt;" into *.mod file) But I don't know that whether the student version ampl restricted under 300 variables, too.

Reference: [Coin-ipopt] Is there any free AMPL parser can work with IPOPT ? Using the AMPL interface

Update : According to AMPL FAQ, "How big a problem can AMPL solve?"

Certain low-cost versions of AMPL have built-in limits on problem size. The student edition bundled with the AMPL book is limited to 300 variables and a total of 300 objectives and constraints, for example, and several packages having higher limits are offered by AMPL vendors. The professional edition of AMPL has no intrinsic limit on problem size, however; it is limited only by the resources available to the computer that is running AMPL.


Posted by abner | Permanent Link | Categories: Notes