January 2006 Archives
2006-01-23 15:13:36
Note on Statistics
Statistical inference is inference about a population(????) from a random sample drawn from it.
It has two main parts:- Estimation
- Hypothesis testing
Ref. Statistics Glossary - Basic Definitions
In statistics, the term parameter is a value used to represent a certain population characteristic. And the term statistic is a quantity that is calculated from a sample of data. It is used to give information about unknown values in the corresponding population. A population with fixed value of parameters, and each sample of population has its own value of any statistic. We use statistic to estimate parameter. e.g., We can use the means of samples to get information about the mean of the population from which samples were drawn.
point estimation:
In statistics, point estimation involves the use of sample data to calculate a single value (known as a statistic) which is to serve as a "best guess" for an unknown (fixed or random) population parameter.
The Interval estimation is like point estimation but estimate the interval of possible parameter of population by using statistic.Ref. Statistical hypothesis testing - Wikipedia, the free encyclopedia
After the data is available, the test statistic is calculated and we determine whether it is inside the critical region.
If the test statistic is inside the critical region, then our conclusion is one of the following:
- The hypothesis is incorrect, therefore reject the null hypothesis. (Therefore the critical region is sometimes called the rejection region, while its complement is the acceptance region.)
- An event of probability less than or equal to alpha(comment: alpha means critical value of critical region) has occurred.
On this basis, statistical research progresses by eliminating error, not by finding the truth.As I know, statistics doesn't implies causation(?]?G??Y). i.e., it can't prove nothing!!! It might be why statisticians test the Null hypothesis.
Ref. Null hypothesis - Wikipedia, the free encyclopedia
The formulation, testing, and rejection of null hypotheses is methodologically consistent with the falsificationist model of scientific discovery formulated by Karl Popper and widely believed to apply to most kinds of empirical research. However, concerns regarding the high power of statistical tests to detect differences in large samples have led to suggestions for re-defining the null hypothesis, for example as a hypothesis that an effect falls within a range considered negligible. This is an attempt to address the confusion among non-statisticians between significant and substantial, since large enough samples are likely to be able to indicate differences however minor.
Ref. Statistical significance - Wikipedia, the free encyclopediain traditional frequentist statistical hypothesis testing, the significance level of a test is the maximum probability of accidentally rejecting a true null hypothesis (a decision known as a Type I error)(False positive, also called a Type I error). The significance of a result is also called its p-value; the smaller the p-value, the more significant the result is said to be.
For example, one may choose a significance level of, say, 5%, and calculate a critical value of a statistic (such as the mean) so that the probability of it exceeding that value, given the truth of the null hypothesis, would be 5%. If the actual, calculated statistic value exceeds the critical value, then it is significant "at the 5% level".
2006-01-23 06:37:35
How much damage can be caused by a peer reviewer having a bad day?
Ref. We Are Sorry to Inform You, by Simone Santini, University of California, San Diego
I don't know.
But I thought really good papers will be published eventually. The worst results might be published on not good journals. Every year, a lots of researchers join and huge literature are published. The parallelism in research will be more and more.
Peer-review might be meaningless in the top research institutes, but, in some countries, some people want to get money form organizations of government like NSF by cheating on his research results. If the research level of this country was not so good, then they would settle for just passable results. It would make some loopholes.
2006-01-23 05:03:14
Debut of New Lab Website
Algorithm and Biocompuing Laboratory
I write it on TiddlyWiki 2.0.3.
2006-01-21 18:14:05
Algorithmic Algebraic Model Checking I: Challenges from Systems Biology.
C.Piazza, M.Antoniotti, V.Mysore, A.Policriti, F.Winkler, B.Mishra.. Algorithmic Algebraic Model Checking I: Challenges from Systems Biology. In: Proceedings Computer Aided Verification (CAV 2005), K.Etessami and S.K.Rajamani (ed.), LNCS 3576, pp. 5-19. 2005. Springer, ISSN 0302-9743, ISBN 3-540-27231-3.
I browsed this paper today. In fact, I read it word by word at beginning, but there are too many background which I don't have in this paper. Therefore, I just read the introduction, conclusion, and preliminaries. However, I don't know whether I understand such part.In biochemistry, there are many methods to model biological systems, e.g., system of differential equations. But millions of differential equations are hard to solve to gain closed form. The simulation via numerical method does not dig out too much insights of biological system, so that the authors asked "Can we understand biology by 'simulating the biologist, and not biology."
The detail of this paper is too complex to me. Even its ideas are out of range of what I know. They introduced the "hybrid automata" which can model the discrete program with continuously changing. After modeling biological systems by differential algebraic equations, they use symbolic computation to determine the transitions between the states.
I never thought about that! It is interesting!
Ref.
2006-01-17 15:38:39
Candidate Papers for Lab Meeting
-
"Massive Quasi-Clique Detection," by James Abello and Mauricio G. C. Resende and Sandra Sudarsky. And "On mining cross-graph quasi-cliques," by Jian Pei and Daxin Jiang and Aidong Zhang.
"Handle-rewriting hypergraph grammars," by Bruno Courcelle and Joost Engelfriet and Grzegorz Rozenberg. And "Node rewriting in graphs and hypergraphs: a categorical framework," by Michel Bauderon and H\&\#233;l\&\#232;ne Jacquet.
"Dynamics, Computation, and the "Edge of Chaos": A Re-Examination," by Melanie Mitchell and James P. Crutchfield and Peter T. Hraber. And "Pattern Discovery and Computational Mechanics," by Cosma Rohilla Shalizi, James P. Crutchfield. And "Methods and Techniques of Complex Systems Science: An Overview," by Cosma Rohilla Shalizi.
C.Piazza, M.Antoniotti, V.Mysore, A.Policriti, F.Winkler, B.Mishra.. Algorithmic Algebraic Model Checking I: Challenges from Systems Biology. In: Proceedings Computer Aided Verification (CAV 2005), K.Etessami and S.K.Rajamani (ed.), LNCS 3576, pp. 5-19. 2005. Springer, ISSN 0302-9743, ISBN 3-540-27231-3.
It seems a interesting paper.
2006-01-17 03:27:19
ISA5101 Systems Biology 2005
-
Codon Usage Database
G~tRNA~db: The Genomic tRNA Database
Amino Acids
The E. coli Index (part of the WWW Virtual Library)
Expressed sequence tag
Functional Genomics
DEG Database of Essential Genes
Wobble base pair - Wikipedia, the free encyclopedia
The Profiling of Escherichia coli chromosome (PEC) database
The Genetic Codes
Regulatory noncoding RNAs database
Genes & Gene Expression - BioChemWeb.org
Junk DNA - Wikipedia, the free encyclopedia
Summary of Escherichia coli, Strain EDL933, serotype O157:H7, version 9.6, by Jonathan Wagg, SRI
E. coli Genome Project
coliBASE, an online database for E.coli, Shigella and Salmonella comparative genomics
Gene prediction and start codon selection in GC-rich genomes, Michaela Falb1, Friedhelm Pfeiffer2, Dieter Oesterhelt
Essential Genes in E. coli
Genetic code - Wikipedia, the free encyclopedia
Messenger RNA - Wikipedia, the free encyclopedia
Gene expression - Wikipedia, the free encyclopedia
2006-01-17 03:13:48
CS6135 VLSI Physical Design Automation, Fall 2005
Global Optimization
-
Global Optimization (Slides), by Torn, Aimo A.
Global optimization - Wikipedia, the free encyclopedia
Global Optimization
A Survey of Global Optimization Methods by Perry Gray, William Hart, Laura Painton, Cindy Phillips, Mike Trahan, John Wagner
-
Session 10: 2005 ISPD Placement Contest
CG+
Conjugate Gradient Method -- From MathWorld
Unconstrained Optimization
Unconstrained Optimization Packages
Nonlinear Programming Frequently Asked Questions
LANCELOT
hMETIS
Open Optimization Library (OOL)
Introduction to IPOPT:A tutorial for downloading, installing, and using IPOPT.
GSL - GNU Scientific Library
Hooking Your Solver to AMPL by David M. Gay
Ipopt (Interior Point OPTimizer, pronounced I-P-Opt)
galahad@camelot.uk
The SIF Reference Document, by Andrew R. Conn, Nicholas I. M. Gould and Philippe L. Toint
Bookshelf : Hypergraph Partitioning Slot
feng shui 5.0: Placement Engine
MLPart: high-performance min-cut bisector
Bookshelf: Executable Placement Utilities
Prof. Klaus Schittkowski : Software
Benchmarks for Optimization Software, by Hans Mittelmann (mittelmann at asu.edu)
NEOS Guide
Systems Optimization Laboratory (SOL)
SAL- Numerical Analysis - Optimization
HQP: a solver for sparse nonlinear optimization
Decision Tree for Optimization Software
Numerical Computing Resources on the Internet
2006-01-07 21:51:28
Look for Topic
About the topics, there are many different viewpoints. Some professors preferred hard and important problems; they thought that a little results on important problems better than a lots of results on minor problems. On the other hand, some professors preferred the problems which could got results in short-term.
For a researcher, their publications are important for competing on the job market, so that pursuit of more publications are understandable. The same problems present in the other way to the topic of thesis.
I am a master program student, and I don't have any pressure of publication. Therefore, I want to pursue something not mainstream. It might be not smart, but I don't think it is stupid. However, I don't know how to tell my adviser.
2006-01-07 17:44:04
Operations Research
However, very much good researchers form OR made good and important contribution of various fields. For example, professor Wen-Lian Hsu is a IEEE fellow, and he got Ph.D. at Cornell University, Operations Research.
From wikipedia:
Operations research, operational research, or simply OR, is the use of mathematical models, statistics and algorithms to aid in decision-making. It is most often used to analyze complex real-world systems, typically with the goal of improving or optimizing performance. It is one form of applied mathematics.
What are the core subjects of OR? For some subjects, it looks the same as CS.2006-01-07 03:41:20
Note on CS6135 project
In sample case of ISPD'05, there are 5xx fixed cells. It is more than the max variables of Ipopt under AMPL interface.
At present, I have two ideas to deal with it. The first one is that we ignore those fixed cells and do placing; after multilevel clustering and placing, we divide the region into m-subregion, and replace in this subregion with some fixed cells in it. On each level of clustering and placing, we ignore the wirelengths between clusters and fixed cells.
The second one is that, we take the positron of fixed cells as the constant in NLP, it might be easy in AMPL, but might be difficult in compiled version (i.e., in C interface). The only thing we could change is the value of input data..... And when the number of fixed cells are too many, the computation might be more tough.
Cooperating with hMETIS:
hMETIS is one of two engines in my program; it is for multi-level clustering, but its format assume that all names of nodes and net are numbers. Further, bookshelf formats describe net by pins of cells, so that it is possible that a node presents twice in a net. Those problem should be fixed.
Today, I finished the part of I/O. The remaining jobs are to link it to hmetis and to write the AMPL code generator. Note: There is one thing needed to solve; the program will occur segment fault when it call the function of hMETIS. I guessed that the reason might like MLPart, i.e., it can't deal with nets with degree 1, but, after trying some cases, I thought that it work well with those nets.
By the way, I really want to rewrite my codes if I have more time. In my code, there are many stupid design and five-level nest structure.....
Update:I made a little mistake, I fixed it, now my program works. But, hMETIS needs lots of memory, it requests about 1GB for the 300-way partition. However, stand-alone hMETIS works well as 2000-way partition.....why?
Update:I must be too tired. It is AM 05:14. About above question, the answer is the test cases are different. The case for 300-way partition is about 40MB..... I decrease the value of k to 100, it can work this time.
¤» 1¤ë 7 05:12:20 CST 2006 ¤» 1¤ë 7 05:17:12 CST 2006
It spent almost 5 minutes......
2006-01-06 20:42:28
XXX Geometry
I know there are a lot of fields about "geometry", computational geometry, affine geometry, algebraic geometry, Analytic geometry, Complex geometry,Differential geometry, Euclidean geometry, Non-Euclidean geometry, Plane geometry, Projective geometry, and Riemannian geometry. However, it seems more than that I imaged.
Ref. Types, methodologies, and terminologies of geometry
* Absolute geometry
* Affine geometry
* Algebraic geometry
* Analytic geometry
* Birational geometry
* Complex geometry
* Combinatorial geometry
* Computational geometry
* Conformal geometry
* Constructive solid geometry
* Contact geometry
* Convex geometry
* Descriptive geometry
* Differential geometry
* Digital geometry
* Discrete geometry
* Distance geometry
* Elliptic geometry
* Euclidean geometry
* Finite geometry
* Geometry of numbers
* How Archimedes used infinitesimals
* Hyperbolic geometry
* Information geometry
* Integral geometry
* Inversive geometry
* Non-Euclidean geometry
* Numerical geometry
* Parabolic geometry
* Plane geometry
* Projective geometry
* Riemannian geometry
* Spherical geometry
* Symplectic geometry
* Synthetic geometry
* Transformation geometry
2006-01-04 17:33:49
Workshop on the Evolution of Complexity
The evolution of complexity is a central theme in Biology. Yet it is not without any ambiguity. Complexity has been used to refer to different things. For instance, complexification has been interpreted as a process of diversification between evolving units or as a scaling process that is related to the idea of transitions between different levels of complexity. Other meanings of complexity have been introduced, both inside and outside the realm of Biology. What concerns most researchers is to get insight into the mechanisms that produce their notion of complexity.
In fact, I am not insterested too much in it, but I am insterested in the concept of "complexity" on other fields.Some atricles mentioned by crshalizi
2006-01-04 04:12:27
Entropy, Complexity and Computation
-
Communication complexity
Complexity theory
Complexity
Systems theory
Complex system
Entropy by Tianyan-Li in chinese.
Measuring the "Complexity" of a time series
Resource Letter ITP:1: Information Theory in Physics
Dynamics, Computation, and the "Edge of Chaos": A Re-Examination
Complexity, Computation and the Physics of Information
Computational Mechanics
The Internet as a Complex System: Scaling, Complexity and Control
Computation in Physical and Biological Systems
In computer science, proof complexity is a measure of efficiency of automated theorem proving methods that is based on the size of the proofs they produce. The methods for proving contradiction in propositional logic are the most analyzed. The two main issues considered in proof complexity are whether a proof method can produce a polynomial proof of every inconsistent formula, and whether the proofs produced by one method are always of size similar to those produced by another method.(Update)
2006-01-03 23:45:40
Note on CS6135 Project
But, the second problem was that MLPart doesn't support the k-way partition in this version, and there were some error message of the old version, UCLA_MLPart, so that I decided change to the hMETIS. Fortunately, hMETIS supports k-way partition and provides library. I need to write the format converter. However, format of hMETIS is very simple. I doesn't need to spend too much time on overhead.
I should design a suitable data structure for writing a multi-level partitioner!
On the other hand, there were some trouble of NLP solver. I wanted to use lancelot A as the NLP solver at beginning. But I found that SIF is too complex, and lancelot A couldn't run on my PC well. It might be my fault, but I don't know the reason.
Therefore, I decided to change to OOL, Open Optimization Library. But it request programmer to calculate the Hessian matrix before coding, I thought that it is hard for complex problem. (In fact, it provides the tools in "section 9.1 Numerical Gradient and Hessian" of its manual, but I don't know the exactly functions.)
I restart to find the next suitable NLP solver, LOQO seems a good choice, but the free student version restricted only 300 variables. Finally, I found Ipopt.
The simplest solution of using Ipopt should be the AMPL interface. Weirdly, I can't find out the option to use the function of AMPL. Does I miss or misunderstand anything?
However, it seems able to solve NLP problems without calculating Hessian matrices by hand.
Now, the plan is
-
For the huge numbers of cells in *.pl and nets in *.nets, I should design a data structure to handle the I/O problem of memory and disk.
Deal with the FIXED CELL
Write I/O part at first.
...........
Update : I misunderstood the content of Ipopt's manual about AMPL. I thought AMPL is a description language to describe the problems and the only thing which I should do is to write down the problem by AMPL.
In fact, Ipopt doesn't contain the interpreter of AMPL, so that I downloaded student version of AMPL form http://netlib.bell-labs.com/netlib/ampl/(download student.tar), and move both ampl(stand-alone interpreter of AMPL) and ipopt into ~/bin .
After testing some examples form website of LOQO, Ipopt seems to work well with AMPL.(Note: add "option solver ipopt;" into *.mod file) But I don't know that whether the student version ampl restricted under 300 variables, too.
Reference: [Coin-ipopt] Is there any free AMPL parser can work with IPOPT ? Using the AMPL interface
Update : According to AMPL FAQ, "How big a problem can AMPL solve?"
Certain low-cost versions of AMPL have built-in limits on problem size. The student edition bundled with the AMPL book is limited to 300 variables and a total of 300 objectives and constraints, for example, and several packages having higher limits are offered by AMPL vendors. The professional edition of AMPL has no intrinsic limit on problem size, however; it is limited only by the resources available to the computer that is running AMPL.