Table of Contents
List of Figures
Score-Slicing
.Mini-Max
.Shift
AnTonCom, Budapest, 2011 This electronic book was prepared in the framework of project Eastern Hungarian Informatics Books Repository no. TÁMOP-4.1.2-08/1/A-2009-0046 This electronic book appeared with the support of European Union and with the co-financing of European Social Fund
Nemzeti Fejlesztési Ügynökség http://ujszechenyiterv.gov.hu/ 06 40 638-638
Editor: Antal Iványi Authors of Volume 3: Béla Vizvári (Chapter 24), Antal Iványi and Shariefuddin Pirzada (Chapter 25), Mira-Cristiana Anisiu and Zoltán Kása (Chapter 26), Ferenc Szidarovszky and László Domoszlai (Chapter 27), László Szirmay-Kalos and László Szécsi (Chapter 28), Antal Iványi (Chapter 29), Shariefuddin Pirzada, Antal Iványi and Muhammad Ali Khan(Chapter 30) Validators of Volume 3: György Kovács (Chapter 24), Zoltán Kása (Chapter 25), Antal Iványi (Chapter 26), Sándor Molnár (Chapter 27), György Antal (Chapter 28), Zoltán Kása (Chapter 29), Zoltán Kása (Chapter 30), Anna Iványi (Bibliography) ©2011 AnTonCom Infokommunikációs Kft. Homepage: http://www.antoncom.hu/ |
The third volume contains seven new chapters.
Chapter 24 (The Branch and Bound Method) was written by Béla Vizvári (Eastern Mediterrean University), Chapter 25 (Comparison Based Ranking) by Antal Iványi (Eötvös Loránd University) and Shariefuddin Pirzada (University of Kashmir), Chapter 26 (Complexity of Words) by Zoltán Kása (Sapientia Hungarian University of Transylvania) and Mira-Cristiana Anisiu (Tiberiu Popovici Institute of Numerical Mathematics), Chapter 27 (Conflict Situations) by Ferenc Szidarovszky (University of Arizona) and by László Domoszlai (Eötvös Loránd University), Chapter 28 (General Purpose Computing on Graphics Processing Units) by László Szirmay-Kalos and László Szécsi (both Budapest University of Technology and Economics), Chapter 29 (Perfect Arrays) by Antal Iványi (Eötvös Loránd University), and Chapter 30 by Shariefuddin Pirzada (University of Kashmir), Antal Iványi (Eötvös Loránd University) and Muhammad Ali Khan (King Fahd University).
The LaTeX style file was written by Viktor Belényesi, Zoltán Csörnyei, László Domoszlai and Antal Iványi. The figures was drawn or corrected by Kornél Locher and László Domoszlai. Anna Iványi transformed the bibliography into hypertext. The DOCBOOK version was made by Marton 2001. Kft.
Using the data of the colofon page you can contact with any of the creators of the book. We welcome ideas for new exercises and problems, and also critical remarks or bug reports.
The publication of the printed book (Volumes 1 and 2) was supported by Department of Mathematics of Hungarian Academy of Science. This electronic book (Volumes 1, 2, and 3) was prepared in the framework of project Eastern Hungarian Informatics Books Repository no. TÁMOP-4.1.2-08/1/A-2009-0046. This electronic book appeared with the support of European Union and with the co-financing of European Social Fund.
Budapest, September 2011
Antal Iványi (tony@compalg.inf.elte.hu)
Table of Contents
It has serious practical consequences if it is known that a combinatorial problem is NP-complete. Then one can conclude according to the present state of science that no simple combinatorial algorithm can be applied and only an enumerative-type method can solve the problem in question. Enumerative methods are investigating many cases only in a non-explicit, i.e. implicit, way. It means that huge majority of the cases are dropped based on consequences obtained from the analysis of the particular numerical problem. The three most important enumerative methods are (i) implicit enumeration, (ii) dynamic programming, and (iii) branch and bound method. This chapter is devoted to the latter one. Implicit enumeration and dynamic programming can be applied within the family of optimization problems mainly if all variables have discrete nature. Branch and bound method can easily handle problems having both discrete and continuous variables. Further on the techniques of implicit enumeration can be incorporated easily in the branch and bound frame. Branch and bound method can be applied even in some cases of nonlinear programming.
The Branch and Bound (abbreviated further on as B&B) method is just a frame of a large family of methods. Its substeps can be carried out in different ways depending on the particular problem, the available software tools and the skill of the designer of the algorithm.
Boldface letters denote vectors and matrices; calligraphic letters are used for sets. Components of vectors are denoted by the same but non-boldface letter. Capital letters are used for matrices and the same but lower case letters denote their elements. The columns of a matrix are denoted by the same boldface but lower case letters.
Some formulae with their numbers are repeated several times in this chapter. The reason is that always a complete description of optimization problems is provided. Thus the fact that the number of a formula is repeated means that the formula is identical to the previous one.
In this section the branch and bound method is shown on a numerical example. The problem is a sample of the binary knapsack problem which is one of the easiest problems of integer programming but it is still NP-complete. The calculations are carried out in a brute force way to illustrate all features of B&B. More intelligent calculations, i.e. using implicit enumeration techniques will be discussed only at the end of the section.
There are many different knapsack problems. The first and classical one is the binary knapsack problem. It has the following story. A tourist is planning a tour in the mountains. He has a lot of objects which may be useful during the tour. For example ice pick and can opener can be among the objects. We suppose that the following conditions are satisfied.
Each object has a positive value and a positive weight. (E.g. a balloon filled with helium has a negative weight. See Exercises 24.1-1 and 24.1-2) The value is the degree of contribution of the object to the success of the tour.
The objects are independent from each other. (E.g. can and can opener are not independent as any of them without the other one has limited value.)
The knapsack of the tourist is strong and large enough to contain all possible objects.
The strength of the tourist makes possible to bring only a limited total weight.
But within this weight limit the tourist want to achieve the maximal total value.
The following notations are used to the mathematical formulation of the problem:
For each object a so-called binary or zero-one decision variable, say , is introduced:
Notice that
is the weight of the object in the knapsack.
Similarly is the value of the object on the tour. The total weight in the knapsack is
which may not exceed the weight limit. Hence the mathematical form of the problem is
The difficulty of the problem is caused by the integrality requirement. If constraint (24.3) is substituted by the relaxed constraint, i.e. by
then the Problem (24.1), (24.2), and (24.4) is a linear programming problem. (24.4) means that not only a complete object can be in the knapsack but any part of it. Moreover it is not necessary to apply the simplex method or any other LP algorithm to solve it as its optimal solution is described by
Theorem 24.1 Suppose that the numbers are all positive and moreover the index order satisfies the inequality
Then there is an index and an optimal solution such that
Notice that there is only at most one non-integer component in . This property will be used at the numerical calculations.
From the point of view of B&B the relation of the Problems (24.1), (24.2), and (24.3) and (24.1), (24.2), and (24.4) is very important. Any feasible solution of the first one is also feasible in the second one. But the opposite statement is not true. In other words the set of feasible solutions of the first problem is a proper subset of the feasible solutions of the second one. This fact has two important consequences:
The optimal value of the Problem (24.1), (24.2), and (24.4) is an upper bound of the optimal value of the Problem (24.1), (24.2), and (24.3).
If the optimal solution of the Problem (24.1), (24.2), and (24.4) is feasible in the Problem (24.1), (24.2), and (24.3) then it is the optimal solution of the latter problem as well.
These properties are used in the course of the branch and bound method intensively.
The basic technique of the B&B method is that it divides the set of feasible solutions into smaller sets and tries to fathom them. The division is called branching as new branches are created in the enumeration tree. A subset is fathomed if it can be determined exactly if it contains an optimal solution.
To show the logic of B&B the problem
will be solved. The course of the solution is summarized on Figure 24.1.
Notice that condition (24.5) is satisfied as
The set of the feasible solutions of (24.6) is denoted by , i.e.
The continuous relaxation of (24.6) is
The set of the feasible solutions of (24.7) is denoted by , i.e.
Thus the difference between (24.6) and (24.7) is that the value of the variables must be either 0 or 1 in (24.6) and on the other hand they can take any value from the closed interval in the case of (24.7).
Because Problem (24.6) is difficult, (24.7) is solved instead. The optimal solution according to Theorem 24.1 is
As the value of is non-integer, the optimal value 67.54 is just an upper bound of the optimal value of (24.6) and further analysis is needed. The value 67.54 can be rounded down to 67 because of the integrality of the coefficients in the objective function.
The key idea is that the sets of feasible solutions of both problems are divided into two parts according the two possible values of . The variable is chosen as its value is non-integer. The importance of the choice is discussed below.
Let
and
Obviously
Hence the problem
is a relaxation of the problem
Problem (24.8) can be solved by Theorem 24.1, too, but it must be taken into consideration that the value of is 0. Thus its optimal solution is
The optimal value is 65.26 which gives the upper bound 65 for the optimal value of Problem (24.9). The other subsets of the feasible solutions are immediately investigated. The optimal solution of the problem
is
giving the value 67.28. Hence 67 is an upper bound of the problem
As the upper bound of (24.11) is higher than the upper bound of (24.9), i.e. this branch is more promising, first it is fathomed further on. It is cut again into two branches according to the two values of as it is the non-integer variable in the optimal solution of (24.10). Let
The sets and are containing the feasible solution of the original problems such that is fixed to 1 and is fixed to 0. In the sets and both variables are fixed to 1. The optimal solution of the first relaxed problem, i.e.
is
As it is integer it is also the optimal solution of the problem
The optimal objective function value is 65. The branch of the sets and is completely fathomed, i.e. it is not possible to find a better solution in it.
The other new branch is when both and are fixed to 1. If the objective function is optimized on then the optimal solution is
Applying the same technique again two branches are defined by the sets
The optimal solution of the branch of is
The optimal value is 63.32. It is strictly less than the objective function value of the feasible solution found in the branch of . Therefore it cannot contain an optimal solution. Thus its further exploration can be omitted although the best feasible solution of the branch is still not known. The branch of is infeasible as objects 1, 2, and 3 are overusing the knapsack. Traditionally this fact is denoted by using as optimal objective function value.
At this moment there is only one branch which is still unfathomed. It is the branch of . The upper bound here is 65 which is equal to the objective function value of the found feasible solution. One can immediately conclude that this feasible solution is optimal. If there is no need for alternative optimal solutions then the exploration of this last branch can be abandoned and the method is finished. If alternative optimal solutions are required then the exploration must be continued. The non-integer variable in the optimal solution of the branch is . The subbranches referred later as the 7th and 8th branches, defined by the equations and , give the upper bounds 56 and 61, respectively. Thus they do not contain any optimal solution and the method is finished.
The calculation is revisited to emphasize the general underlying logic of the method. The same properties are used in the next section when the general frame of B&B is discussed.
Problem (24.6) is a difficult one. Therefore the very similar but much easier Problem (24.7) has been solved instead of (24.6). A priori it was not possible to exclude the case that the optimal solution of (24.7) is the optimal solution of (24.6) as well. Finally it turned out that the optimal solution of (24.7) does not satisfy all constraints of (24.6) thus it is not optimal there. But the calculation was not useless, because an upper bound of the optimal value of (24.6) has been obtained. These properties are reflected in the definition of relaxation in the next section.
As the relaxation did not solved Problem (24.6) therefore it was divided into Subproblems (24.9) and (24.11). Both subproblems have their own optimal solution and the better one is the optimal solution of (24.6). They are still too difficult to be solved directly, therefore relaxations were generated to both of them. These problems are (24.8) and (24.10). The nature of (24.8) and (24.10) from mathematical point of view is the same as of (24.7).
Notice that the union of the sets of the feasible solutions of (24.8) and (24.10) is a proper subset of the relaxation (24.7), i.e.
Moreover the two subsets have no common element, i.e.
It is true for all other cases, as well. The reason is that the branching, i.e. the determination of the Subproblems (24.9) and (24.11) was made in a way that the optimal solution of the relaxation, i.e. the optimal solution of (24.7), was cut off.
The branching policy also has consequences on the upper bounds. Let be the optimal value of the problem where the objective function is unchanged and the set of feasible solutions is . Using this notation the optimal objective function values of the original and the relaxed problems are in the relation
If a subset is divided into and then
Notice that in the current Problem (24.12) is always satisfied with strict inequality
(The values and were mentioned only.) If the upper bounds of a certain quantity are compared then one can conclude that the smaller the better as it is closer to the value to be estimated. An equation similar to (24.12) is true for the non-relaxed problems, i.e. if then
but because of the difficulty of the solution of the problems, practically it is not possible to use (24.13) for getting further information.
A subproblem is fathomed and no further investigation of it is needed if either
its integer (non-relaxed) optimal solution is obtained, like in the case of , or
it is proven to be infeasible as in the case of , or
its upper bound is not greater than the value of the best known feasible solution (cases of and ).
If the first or third of these conditions are satisfied then all feasible solutions of the subproblem are enumerated in an implicit way.
The subproblems which are generated in the same iteration, are represented by two branches on the enumeration tree. They are siblings and have the same parent. Figure 24.1 visualize the course of the calculations using the parent–child relation.
The enumeration tree is modified by constructive steps when new branches are formed and also by reduction steps when some branches can be deleted as one of the three above-mentioned criteria are satisfied. The method stops when no subset remained which has to be still fathomed.
As it was mentioned in the introduction of the chapter, B&B and implicit enumeration can co-operate easily. Implicit enumeration uses so-called tests and obtains consequences on the values of the variables. For example if is fixed to 1 then the knapsack inequality immediately implies that must be 0, otherwise the capacity of the tourist is overused. It is true for the whole branch 2.
On the other hand if the objective function value must be at least 65, which is the value of the found feasible solution then it possible to conclude in branch 1 that the fifth object must be in the knapsack, i.e. must be 1, as the total value of the remaining objects 1, 2, and 4 is only 56.
Why such consequences accelerate the algorithm? In the example there are 5 binary variables, thus the number of possible cases is . Both branches 1 and 2 have 16 cases. If it is possible to determine the value of a variable, then the number of cases is halved. In the above example it means that only 8 cases remain to be investigated in both branches. This example is a small one. But in the case of larger problems the acceleration process is much more significant. E.g. if in a branch there are 21 free, i.e. non-fixed, variables but it is possible to determine the value of one of them then the investigation of 1 048 576 cases is saved. The application of the tests needs some extra calculation, of course. Thus a good trade-off must be found.
The use of information provided by other tools is further discussed in Section 24.5.
Exercises
24.1-1 What is the suggestion of the optimal solution of a Knapsack Problem in connection of an object having (a) negative weight and positive value, (b) positive weight and negative value?
24.1-2 Show that an object of a knapsack problem having negative weight and negative value can be substituted by an object having positive weight and positive value such that the two knapsack problems are equivalent. (Hint. Use complementary variable.)
24.1-3 Solve Problem (24.6) with a branching strategy such that an integer valued variable is used for branching provided that such a variable exists.
The aim of this section is to give a general description of the B&B method. Particular realizations of the general frame are discussed in later sections.
B&B is based on the notion of relaxation. It has not been defined yet. As there are several types of relaxations the first subsection is devoted to this notion. The general frame is discussed in the second subsection.
Relaxation is discussed in two steps. There are several techniques to define relaxation to a particular problem. There is no rule for choosing among them. It depends on the design of the algorithm which type serves the algorithm well. The different types are discussed in the first part titled “Relaxations of a particular problem”. In the course of the solution of Problem (24.6) subproblems were generated which were still knapsack problems. They had their own relaxations which were not totally independent from the relaxations of each other and the main problem. The expected common properties and structure is analyzed in the second step under the title “Relaxation of a problem class”.
The description of Problem (24.6) consists of three parts: (1) the objective function, (2) the algebraic constraints, and (3) the requirement that the variables must be binary. This structure is typical for optimization problems. In a general formulation an optimization problem can be given as
The underlying logic of generating relaxation (24.7) is that constraint (24.16) has been substituted by a looser one. In the particular case it was allowed that the variables can take any value between 0 and 1. In general (24.16) is replaced by a requirement that the variables must belong to a set, say , which is larger than , i.e. the relation must hold. More formally the relaxation of Problem (24.14)-(24.16) is the problem
(24.14)
(24.15)
This type of relaxation can be applied if a large amount of difficulty can be eliminated by changing the nature of the variables.
There is a similar technique such that (24.16) the inequalities (24.15) are relaxed instead of the constraints. A natural way of this type of relaxation is the following. Assume that there are inequalities in (24.15). Let be fixed numbers. Then any satisfying (24.15) also satisfies the inequality
Then the relaxation is the optimization of the (24.14) objective function under the conditions (24.18) and (24.16). The name of the inequality (24.18) is surrogate constraint.
The problem
is a general zero-one optimization problem. If then the relaxation obtained in this way is Problem (24.6). Both problems belong to NP-complete classes. However the knapsack problem is significantly easier from practical point of view than the general problem, thus the relaxation may have sense. Notice that in this particular problem the optimal solution of the knapsack problem, i.e. (1,0,1,1,0), satisfies the constraints of (24.19), thus it is also the optimal solution of the latter problem.
Surrogate constraint is not the only option in relaxing the algebraic constraints. A region defined by nonlinear boundary surfaces can be approximated by tangent planes. For example if the feasible region is the unit circuit which is described by the inequality
can be approximated by the square
If the optimal solution on the enlarged region is e.g. the point (1,1) which is not in the original feasible region then a cut must be found which cuts it from the relaxed region but it does not cut any part of the original feasible region. It is done e.g. by the inequality
A new relaxed problem is defined by the introduction of the cut. The method is similar to one of the method relaxing of the objective function discussed below.
In other cases the difficulty of the problem is caused by the objective function. If it is possible to use an easier objective function, say , but to obtain an upper bound the condition
must hold. Then the relaxation is
(24.15)
(24.16)
This type of relaxation is typical if B&B is applied in (continuous) nonlinear optimization. An important subclass of the nonlinear optimization problems is the so-called convex programming problem. It is again a relatively easy subclass. Therefore it is reasonable to generate a relaxation of this type if it is possible. A Problem (24.14)-(24.16) is a convex programming problem, if is a convex set, the functions are convex and the objective function is concave. Thus the relaxation can be a convex programming problem if only the last condition is violated. Then it is enough to find a concave function such that (24.20) is satisfied.
For example the single variable function is not concave in the interval .
Footnote. A continuous function is concave if its second derivative is negative. which is positive in the open interval .
Thus if it is the objective function in an optimization problem it might be necessary that it is substituted by a concave function such that . It is easy to see that satisfies the requirements.
Let be the optimal solution of the relaxed problem (24.21), (24.15), and (24.16). It solves the original problem if the optimal solution has the same objective function value in the original and relaxed problems, i.e. .
Another reason why this type of relaxation is applied that in certain cases the objective function is not known in a closed form, however it can be determined in any given point. It might happen even in the case if the objective function is concave. Assume that the value of is known in the points . If concave then it is smooth, i.e. its gradient exists. The gradient determines a tangent plane which is above the function. The equation of the tangent plane in point is
Footnote. The gradient is considered being a row vector.
Hence in all points of the domain of the function we have that
Obviously the function is an approximation of function .
The idea if the method is illustrated on the following numerical example. Assume that an “unknown” concave function is to be maximized on the [0,5] closed interval. The method can start from any point of the interval which is in the feasible region. Let 0 be the starting point. According to the assumptions although the closed formula of the function is not known, it is possible to determine the values of function and its derivative. Now the values and are obtained. The general formula of the tangent line in the point is
Hence the equation of the first tangent line is giving the first optimization problem as
As is a monotone increasing function, the optimal solution is . Then the values and are provided by the method calculating the function. The equation of the second tangent line is . Thus the second optimization problem is
As the second tangent line is a monotone decreasing function, the optimal solution is in the intersection point of the two tangent lines giving . Then the values and are calculated and the equation of the tangent line is . The next optimization problem is
The optimal solution is . It is the intersection point of the first and third tangent lines. Now both new intersection points are in the interval [0,5]. In general some intersection points can be infeasible. The method goes in the same way further on. The approximated “unknow” function is .
Another relaxation called Lagrange relaxation. In that method both the objective function and the constraints are modified. The underlying idea is the following. The variables must satisfy two different types of constraints, i.e. they must satisfy both (24.15) and (24.16). The reason that the constraints are written in two parts is that the nature of the two sets of constraints is different. The difficulty of the problem caused by the requirement of both constraints. It is significantly easier to satisfy only one type of constraints. So what about to eliminate one of them?
Assume again that the number of inequalities in (24.15) is . Let be fixed numbers. The Lagrange relaxation of the problem (24.14)–(24.16) is
(24.16)
Notice that the objective function (24.22) penalizes the violation of the constraints, e.g. trying to use too much resources, and rewards the saving of resources. The first set of constraints disappeared from the problem. In most of the cases the Lagrange relaxation is a much easier one than the original problem. In what follows Problem (24.14)–(24.16) is also denoted by and the Lagrange relaxation is referred as . The notation reflects the fact that the Lagrange relaxation problem depends on the choice of 's. The numbers 's are called Lagrange multipliers.
It is not obvious that is really a relaxation of . This relation is established by
Theorem 24.2 Assume that both and have optimal solutions. Then for any nonnegative the inequality
holds.
Proof. The statement is that the optimal value of is an upper bound of the optimal value of . Let be the optimal solution of . It is obviously feasible in both problems. Hence for all the inequalities , hold. Thus which implies that
Here the right-hand side is the objective function value of a feasible solution of , i.e.
There is another connection between and which is also important from the point of view of the notion of relaxation.
Theorem 24.3 Let be the optimal solution of the Lagrange relaxation. If
and
then is an optimal solution of .
Proof. (24.23) means that is a feasible solution of . For any feasible solution of it follows from the optimality of that
i.e. is at least as good as .
The importance of the conditions (24.23) and (24.24) is that they give an optimality criterion, i.e. if a point generated by the Lagrange multipliers satisfies them then it is optimal in the original problem. The meaning of (24.23) is that the optimal solution of the Lagrange problem is feasible in the original one and the meaning of (24.24) is that the objective function values of are equal in the two problems, just as in the case of the previous relaxation. It also indicates that the optimal solutions of the two problems are coincident in certain cases.
There is a practical necessary condition for being a useful relaxation which is that the relaxed problem is easier to solve than the original problem. The Lagrange relaxation has this property. It can be shown on Problem (24.19). Let , . Then the objective function (24.22) is the following
The only constraint is that all variables are binary. It implies that if a coefficient is positive in the objective function then the variable must be 1 in the optimal solution of the Lagrange problem, and if the coefficient is negative then the variable must be 0. As the coefficient of is zero, there are two optimal solutions: (1,0,1,1,0) and (1,1,1,1,0). The first one satisfies the optimality condition thus it is an optimal solution. The second one is infeasible.
They have three common properties.
All feasible solutions are also feasible in the relaxed problem.
The optimal value of the relaxed problem is an upper bound of the optimal value of the original problem.
There are cases when the optimal solution of the relaxed problem is also optimal in the original one.
The last property cannot be claimed for all particular case as then the relaxed problem is only an equivalent form of the original one and needs very likely approximately the same computational effort, i.e. it does not help too much. Hence the first two properties are claimed in the definition of the relaxation of a particular problem.
Definition 24.4 Let be two functions mapping from the -dimensional Euclidean space into the real numbers. Further on let be two subsets of the -dimensional Euclidean space. The problem
is a relaxation of the problem
if
(i) and
(ii) it is known priori, i.e. without solving the problems that (24.25) (24.26).
No exact definition of the notion of problem class will be given. There are many problem classes in optimization. A few examples are the knapsack problem, the more general zero-one optimization, the traveling salesperson problem, linear programming, convex programming, etc. In what follows problem class means only an infinite set of problems.
One key step in the solution of (24.6) was that the problem was divided into subproblems and even the subproblems were divided into further subproblems, and so on.
The division must be carried out in a way such that the subproblems belong to the same problem class. By fixing the value of a variable the knapsack problem just becomes another knapsack problem of lesser dimension. The same is true for almost all optimization problems, i.e. a restriction on the value of a single variable (introducing either a lower bound, or upper bound, or an exact value) creates a new problem in the same class. But restricting a single variable is not the only possible way to divide a problem into subproblems. Sometimes special constraints on a set of variables may have sense. For example it is easy to see from the first constraint of (24.19) that at most two out of the variables , , and can be 1. Thus it is possible to divide it into two subproblems by introducing the new constraint which is either , or . The resulted problems are still in the class of binary optimization. The same does not work in the case of the knapsack problem as it must have only one constraint, i.e. if a second inequality is added to the problem then the new problem is out of the class of the knapsack problems.
The division of the problem into subproblems means that the set of feasible solutions is divided into subsets not excluding the case that one or more of the subsets turn out to be empty set. and gave such an example.
Another important feature is summarized in formula (24.12). It says that the upper bound of the optimal value obtained from the undivided problem is at most as accurate as the upper bound obtained from the divided problems.
Finally, the further investigation of the subset could be abandoned as was not giving a higher upper bound as the objective function value of the optimal solution on which lies at the same time in , too, i.e. the subproblem defined on the set was solved.
The definition of the relaxation of a problem class reflects the fact that relaxation and defining subproblems (branching) are not completely independent. In the definition it is assumed that the branching method is a priori given.
Definition 24.5 Let and be two problem classes. Class is a relaxation of class if there is a map with the following properties.
maps the problems of into the problems of .
If a problem (P) is mapped into (Q) then (Q) is a relaxation of (P) in the sense of Definition 24.4.
If (P) is divided into (), ,() and these problems are mapped into (), ,(), then the inequality
holds.
There are infinite many pairs (P), (Q) such that an optimal solution of (Q) is also optimal in (P).
As the Reader has already certainly observed B&B divides the problem into subproblems and tries to fathom each subproblem by the help of a relaxation. A subproblem is fathomed in one of the following cases:
The optimal solution of the relaxed subproblem satisfies the constraints of the unrelaxed subproblem and its relaxed and non-relaxed objective function values are equal.
The infeasibility of the relaxed subproblem implies that the unrelaxed subproblem is infeasible as well.
The upper bound provided by the relaxed subproblem is less (in the case if alternative optimal solution are sought) or less or equal (if no alternative optimal solution is requested) than the objective function value of the best known feasible solution.
The algorithm can stop if all subsets (branches) are fathomed. If nonlinear programming problems are solved by B&B then the finiteness of the algorithm cannot be always guaranteed.
In a typical iteration the algorithm executes the following steps.
It selects a leaf of the branching tree, i.e. a subproblem not divided yet into further subproblems.
The subproblem is divided into further subproblems (branches) and their relaxations are defined.
Each new relaxed subproblem is solved and checked if it belongs to one of the above-mentioned cases. If so then it is fathomed and no further investigation is needed. If not then it must be stored for further branching.
If a new feasible solution is found which is better than the so far best one, then even stored branches having an upper bound less than the value of the new best feasible solution can be deleted without further investigation.
In what follows it is supposed that the relaxation satisfies definition 24.5.
The original problem to be solved is
(24.14)
(24.15)
(24.16)
Thus the set of the feasible solutions is
The relaxed problem satisfying the requirements of definition 24.5 is
where and for all points of the domain of the objective functions and for all points of the domain of the constraint functions . Thus the set of the feasible solutions of the relaxation is
Let be a previously defined subset. Suppose that it is divided into the subsets , i.e.
Let and be the feasible sets of the relaxed subproblems. To satisfy the requirement (24.27) of definition 24.5 it is assumed that
The subproblems are identified by their sets of feasible solutions. The unfathomed subproblems are stored in a list. The algorithm selects a subproblem from the list for further branching. In the formal description of the general frame of B&B the following notations are used.
Note that . The frame of the algorithms can be found below. It simply describes the basic ideas of the method and does not contain any tool of acceleration.
Branch-and-Bound
1 2 3 4WHILE
5DO
determination of 6 7 determination of 8 determination of branching 9FOR
TO
DO
10 11 calculation of 12IF
13THEN
IF
14THEN
15ELSE
16 17FOR
TO
DO
18IF
19THEN
20RETURN
The operations in rows 5, 7, 8, and 11 depend on the particular problem class and on the skills of the designer of the algorithm. The relaxed subproblem is solved in row 11. A detailed example is discussed in the next section. The handling of the list needs also careful consideration. Section 24.4 is devoted to this topic.
The loop in rows 17 and 18 can be executed in an implicit way. If the selected subproblem in row 5 has a low upper bound, i.e. then the subproblem is fathomed and a new subproblem is selected.
However the most important issue is the number of required operations including the finiteness of the algorithm. The method is not necessarily finite. Especially nonlinear programming has infinite versions of it. Infinite loop may occur even in the case if the number of the feasible solutions is finite. The problem can be caused by an incautious branching procedure. A branch can belong to an empty set. Assume that that the branching procedure generates subsets from such that one of the subsets is equal to and the other ones are empty sets. Thus there is an index i such that
If the same situation is repeated at the branching of then an infinite loop is possible.
Assume that a zero-one optimization problem of variables is solved by B&B and the branching is made always according to the two values of a free variable. Generally it is not known that how large is the number of the feasible solutions. There are at most feasible solutions as it is the number of the zero-one vectors. After the first branching there are at most feasible solutions in the two first level leaves, each. This number is halved with each branching, i.e. in a branch on level there are at most feasible solutions. It implies that on level there is at most feasible solution. As a matter of fact on that level there is exactly 1 zero-one vector and it is possible to decide whether or not it is feasible. Hence after generating all branches on level the problem can be solved. This idea is generalized in the following finiteness theorem. While formulating the statement the previous notations are used.
(i) The set is finite.
(ii) There is a finite set such that the following conditions are satisfied. If a subset is generated in the course of the branch and bound method then there is a subset of such that . Furthermore if the branching procedure creates the cover then has a partitioning such that
and moreover
(iii) If a set belonging to set has only a single element then the relaxed subproblem solves the unrelaxed subproblem as well.
Then the Branch-and-Bound
procedure stops after finite many steps. If then there is no feasible solution. Otherwise is equal to the optimal objective function value.
Proof. Assume that the procedure Branch-and-Bound
executes infinite many steps. As the set is finite it follows that there is at least one subset of say such that it defines infinite many branches implying that the situation described in (24.29) occurs infinite many times. Hence there is an infinite sequence of indices, say , such that is created at the branching of and . On the other hand the parallel sequence of the sets must satisfy the inequalities
It is impossible because the s are finite sets.
The finiteness of implies that optimal solution exists if and only if is nonempty, i.e. the problem cannot be unbounded and if feasible solution exist then the supremum of the objective function is its maximum. The initial value of is . It can be changed only in row 14 of the algorithm and if it is changed then it equals to the objective function value of a feasible solution. Thus if there is no feasible solution then it remains . Hence if the second half of the statement is not true, then at the end of the algorithm equal the objective function value of a non-optimal feasible solution or it remains .
Let be the maximal index such that still contains the optimal solution. Then
Hence it is not possible that the branch containing the optimal solution has been deleted from the list in the loop of rows 17 and 18, as . It is also sure that the subproblem
has not been solved, otherwise the equation should hold. Then only one option remained that was selected for branching once in the course of the algorithm. The optimal solution must be contained in one of its subsets, say which contradicts the assumption that has the highest index among the branches containing the optimal solution.
Remark. Notice that the binary problems mentioned above with 's of type
where is the set of fixed variables and is a fixed value, satisfy the conditions of the theorem.
If an optimization problem contains only bounded integer variables then the sets s are the sets the integer vectors in certain boxes. In the case of some scheduling problems where the optimal order of tasks is to be determined even the relaxations have combinatorial nature because they consist of permutations. Then is also possible. In both of the cases Condition (iii) of the theorem is fulfilled in a natural way.
Exercises
24.2-1 Decide if the Knapsack Problem can be a relaxation of the Linear Binary Optimization Problem in the sense of Definition 24.5. Explain your solution regardless that your answer is YES or NO.
Many decisions have both continuous and discrete nature. For example in the production of electric power the discrete decision is to switch on or not an equipment. The equipment can produce electric energy in a relatively wide range. Thus if the first decision is to switch on then a second decision must be made on the level of the produced energy. It is a continuous decision. The proper mathematical model of such problems must contain both discrete and continuous variables.
This section is devoted to the mixed integer linear programming problem with bounded integer variables. It is assumed that there are variables and a subset of them, say must be integer. The model has linear constraints in equation form and each integer variable has an explicit integer upper bound. It is also supposed that all variables must be nonnegative. More formally the mathematical problem is as follows.
where and are -dimensional vectors, is an matrix, is an -dimensional vector and finally all is a positive integer.
In the mathematical analysis of the problem below the the explicit upper bound constraints (24.33) will not be used. The Reader may think that they are formally included into the other algebraic constraints (24.32).
There are technical reasons that the algebraic constraints in (24.32) are claimed in the form of equations. Linear programming relaxation is used in the method. The linear programming problem is solved by the simplex method which needs this form. But generally speaking equations and inequalities can be transformed into one another in an equivalent way. Even in the numerical example discussed below inequality form is used.
First a numerical example is analyzed. The course of the method is discussed from geometric point of view. Thus some technical details remain concealed. Next simplex method and related topics are discussed. All technical details can be described only in the possession of them. Finally some strategic points of the algorithm are analyzed.
The problem to be solved is
To obtain a relaxation the integrality constraints are omitted from the problem. Thus a linear programming problem of two variables is obtained.
Figure 24.2. The geometry of linear programming relaxation of Problem (24.36) including the feasible region (triangle ), the optimal solution (), and the optimal level of the objective function represented by the line .
Figure 24.3. The geometry of the course of the solution. The co-ordinates of the points are: O=(0,0), A=(0,3), B=(2.5,1.5), C=(2,1.8), D=(2,1.2), E=(,2), F=(0,2), G=(,1), H=(0,1), I=(1,2.4), and J=(1,2). The feasible regions of the relaxation are as follows. Branch 1: , Branch 2: , Branch 3: empty set, Branch 4: , Branch 5: , Branch 6: , Branch 7: empty set (not on the figure). Point J is the optimal solution.
Figure 24.4. The course of the solution of Problem (24.36). The upper numbers in the circuits are explained in subsection 24.3.2. They are the corrections of the previous bounds obtained from the first pivoting step of the simplex method. The lower numbers are the (continuous) upper bounds obtained in the branch.
The branching is made according to a non-integer variable. Both and have fractional values. To keep the number of branches as low as possible, only two new branches are created in a step.
The numbering of the branches is as follows. The original set of feasible solutions is No. 1. When the two new branches are generated then the branch belonging to the smaller values of the branching variable has the smaller number. The numbers are positive integers started at 1 and not skipping any integer. Branches having no feasible solution are numbered, too.
The optimal solution of the relaxation is , and the optimal value is as it can be seen from figure 24.2. The optimal solution is the intersection point the lines determined by the equations
and
If the branching is based on variable then they are defined by the inequalities
Notice that the maximal value of is 2.5. In the next subsection the problem is revisited. Then this fact will be observed from the simplex tableaux. Variable would create the branches
None of them is empty. Thus it is more advantageous the branch according to . Geometrically it means that the set of the feasible solutions in the relaxed problem is cut by the line . Thus the new set becomes the quadrangle on Figure 24.3. The optimal solution on that set is . It is point C on the figure.
Now branching is possible according only to variable . Branches 4 and 5 are generated by the cuts and , respectively. The feasible regions of the relaxed problems are of Branch 4, and of Branch 5. The method continues with the investigation of Branch 5. The reason will be given in the next subsection when the quickly calculable upper bounds are discussed. On the other hand it is obvious that the set is more promising than if the Reader takes into account the position of the contour, i.e. the level line, of the objective function on Figure 24.3. The algebraic details discussed in the next subsection serve to realize the decisions in higher dimensions what is possible to see in 2-dimension.
Branches 6 and 7 are defined by the inequalities and , respectively. The latter one is empty again. The feasible region of Branch 6 is . The optimal solution in this quadrangle is the Point I. Notice that there are only three integer points in which are (0,3), (0,2), and (1,2). Thus the optimal integer solution of this branch is (1,2). There is a technique which can help to leap over the continuous optimum. In this case it reaches directly point J, i.e. the optimal integer solution of the branch as it will be seen in the next section, too. Right now assume that the integer optimal solution with objective function value 4 is uncovered.
At this stage of the algorithm the only unfathomed branch is Branch 4 with feasible region . Obviously the optimal solution is point G=(,1). Its objective function value is . Thus it cannot contain a better feasible solution than the known (1,2). Hence the algorithm is finished.
The first ever general method solving linear programming problems were discovered by George Dantzig and called simplex method. There are plenty of versions of the simplex method. The main tool of the algorithm is the so-called dual simplex method. Although simplex method is discussed in a previous volume, the basic knowledge is summarized here.
Any kind of simplex method is a so-called pivoting algorithm. An important property of the pivoting algorithms is that they generate equivalent forms of the equation system and – in the case of linear programming – the objective function. Practically it means that the algorithm works with equations. As many variables as many linearly independent equations exist are expressed with other variables and further consequences are drawn from the current equivalent form of the equations.
If there are inequalities in the problem then they are reformulated by introducing nonnegative slack variables. E.g. in case of LP-relaxation of Problem (24.36) the equivalent form of the problem is
Notice that all variables appear in all equations including the objective function, but it is allowed that some coefficients are zeros. The current version (24.37) can be considered as a form where the variables and are expressed by and and the expression is substituted into the objective function. If then and , thus the solution is feasible. Notice that the value of the objective function is 0 and if it is possible to increase the value of any of and and still getting a feasible solution then a better feasible solution is obtained. It is true, because the method uses equivalent forms of the objective function. The method obtains better feasible solution by pivoting. Let and be the two expressed variables. Skipping some pivot steps the equivalent form of (24.37) is
That form has two important properties. First if then and , thus the solution is feasible, similarly to the previous case. Generally this property is called primal feasibility. Secondly, the coefficients of the non-expressed variables are negative in the objective function. It is called dual feasibility. It implies that if any of the non-expressed variables is positive in a feasible solution then that is worse than the current one. It is true again, because the current form of the objective function is equivalent to the original one. Thus the current value of the objective function which is , is optimal.
In what follows the sign of maximization and the nonnegativity of the variables will be omitted from the problem but they are kept in mind.
In the general case it may be assumed without loss of generality that all equations are independent. Simplex method uses the form of the problem
where is an matrix, and are -dimensional vectors, and is an -dimensional vector. According to the assumption that all equations are independent, has linearly independent columns. They form a basis of the -dimensional linear space. They also form an invertible submatrix. It is denoted by . The inverse of is . Matrix is partitioned into the basic and non-basic parts: and the vectors and are partitioned accordingly. Hence
The expression of the basic variables is identical with the multiplication of the equation by from left
where is the unit matrix. Sometimes the equation is used in the form
The objective function can be transformed into the equivalent form
Notice that the coefficients of the basic variables are zero. If the non-basic variables are zero valued then the value of the basic variables is given by the equation
Hence the objective function value of the basic solution is
Definition 24.7 A vector is a solution of Problem (24.39)-(24.41) if it satisfies the equation (24.40). It is a feasible solution or equivalently a primal feasible solution if it satisfies both (24.40) and (24.41). A solution is a basic solution if the columns of matrix belonging to the non-zero components of are linearly independent. A basic solution is a basic feasible or equivalently a basic primal feasible solution if it is feasible. Finally a basic solution is basic dual feasible solution if
The primal feasibility of a basic feasible solution is equivalent to
Let be the column vectors of matrix . Further on let and be the set of indices of the basic and non-basic variables, respectively. Then componentwise reformulation of (24.44) is
The meaning of the dual feasibility is this. The current value of the objective function given in (24.43) is an upper bound of the optimal value as all coefficients in the equivalent form of the objective function is nonpositive. Thus if any feasible, i.e. nonnegative, solution is substituted in it then value can be at most the constant term, i.e. the current value.
Definition 24.8 A basic solution is OPTIMAL if it is both primal and dual feasible.
It is known from the theory of linear programming that among the optimal solutions there is always at least one basic solution. To prove this statement is beyond the scope of the chapter.
In Problem (24.37)
If the basic variables are and then
Hence
The last equation is true as is the unit matrix. Finally
One can conclude that this basic solution is both primal and dual feasible.
There are two types of simplex methods. Primal simplex method starts from a primal feasible basic solution. Executing pivoting steps it goes through primal feasible basic solutions and finally even the dual feasibility achieved. The objective function values are monotone increasing in the primal simplex method.
The dual simplex method starts from a dual feasible basic solution it goes through dual feasible basic solutions until even primal feasibility is achieved in the last iteration. The objective function values are monotone decreasing in the dual simplex method. We discuss it in details as it is the main algorithmic tool of the method.
Each simplex method uses its own simplex tableau. Each tableau contains the transformed equivalent forms of the equations and the objective function. In the case of the dual simplex tableau the elements of it are derived from the form of the equations
where the second equation indicates that the minus sign is associated to non-basic variables. The dual simplex tableau contains the expression of all variables by the negative non-basic variables including the objective function variable and the non-basic variables. For the latter ones the trivial
equation is included. For example the dual simplex tableau for (24.37) is provided that the basic variables are and (see (24.38))
Generally speaking the potentially non-zero coefficients of the objective function are in the first row, the constant terms are in the first column and all other coefficients are in the inside of the tableau. The order of the rows is never changed. On the other hand a variable which left the basis immediately has a column instead of that variable which entered the basis.
The elements of the dual simplex tableau are denoted by where refers to the constant term of the equation of variable and otherwise and is the coefficient of the non-basic variable in the expression of the variable . As is the objective function variable is the coefficient of in the equivalent form (24.42) of the objective function. The dual simplex tableau can be seen on Figure 24.5.
Notice that dual feasibility means that there are nonnegative elements in the first row of the tableau with the potential exception of its first element, i.e. with the potential exception of the objective function value.
Without giving the proof of its correctness the pivoting procedure is this. The aim of the pivoting is to eliminate the primal infeasibility, i.e. the negative values of the variables, with the potential exception of the objective function value, i.e. the elimination of the negative terms from the first column. Instead of that basic variable a non-basic one will be expressed from the equation such that the negative constant term becomes zero and the value of the new basic variable, i.e. the value of , becomes positive. It is easy to see that this requirement can be satisfied only if the new expressed variable, say , has a negative coefficient in the equation, i.e. . After the change of the basis the row of must become a negative unit vector as became a non-basic variable, thus its expression is
The transformation of the tableau consists of the transformations of the columns such that the form (24.45) of the row of is generated. The position of the (-1) in the row is the crossing of the row of and the column belonging to before pivoting. This column becomes the column of . There is another requirement claiming that the dual feasibility must hold on. Let be the column of the non-basic variable including as the column of the constant terms. Then the formulae of the column transformation are the followings where is either zero or the index of a non-basic variable different from :
and
To maintain dual feasibility means that after the change of the basis the relation must hold for all non-basic indices, i.e. for all . It follows from (24.46) that must be chosen such that
In the course of the branch method in the optimization of the relaxed subproblems dual simplex method can save a lot of computation. On the other hand what is used in the description of the method, is only the effect of one pivoting on the value of the objective function. According to (24.46) the new value is
Notice that and . Hence the objective function value decreases by the nonnegative value
The formula (24.48) will be used if a new constraint is introduced at branching and it cuts the previous optimal solution. As the new constraint has nothing to do with the objective function, it will not destroy dual feasibility, but, of course, the optimal solution of the relaxed problem of the branch becomes primal infeasible.
For example the inequality is added to the relaxation (24.37) defining a new branch then it is used in the equation form
where is nonnegative continuous variable. According to the simplex tableau
Hence
(24.49) is added to the problem in the form (24.50). Then the dual simplex tableau is
Only has a negative value, thus the first pivoting must be done in its row. Rule (24.47) selects for entering the basis. Then after the first pivot step the value of the objective function decreases by
If the optimal solution of the relaxed problem is not reached after the first pivoting then further decrease is possible. But decrease of 0.7 is sure compared to the previous upper bound.
Another important property of the cuts is that if it has no negative coefficient in the form how it is added to the simplex tableau then there is no negative pivot element, i.e. the relaxed problem is infeasible, i.e. the branch is empty. For example the cut leading to an empty branch is added to the problem in the form
where is also a nonnegative variable. Substituting the value of again the equation is transformed to
Hence the simplex tableau
is obtained. There is a negative value at the crossing point of the first column and the row of . But it is not possible to choose a pivot element in that row, as there is no negative element of the row. It means that feasibility can not be achieved, i.e. that branch is infeasible and thus it is completely fathomed.
The branching is always based on a variable which should be integer but in the current optimal solution of the linear programming relaxation it has fractional value. If it has fractional value then its value is non-zero thus it is basic variable. Assume that its index is . Remember that , , and are the index sets of the integer, basic, and non-basic variables, respectively. Hence . According to the last simplex tableau is expressed by the non-basic variables as follows:
As has fractional value
The branch created by the inequality
is called lower branch and the inequality
creates the upper branch.
Let and be the set of indices of the nonbasic variables according to the signs of the coefficients in (24.51), i.e.
First the lower branch is analyzed. It follows from (24.51) that the inequality (24.52) is equivalent to
Thus
is a nonnegative variable and row (24.53) can be added to the dual simplex tableau. It will contain the only negative element in the first column that is the optimization in the lower branch starts by pivoting in this row. (24.53) can be reformulated according to the signs of the coefficients as
The pivot element must be negative, thus it is one of 's with . Hence the first decrease (24.48) of the objective function is
In the upper branch the inequality (24.52) is equivalent to
Again the nonnegative slack variable should be introduced. Then the row which can be added to the simplex tableau is
Thus the pivot element must belong to one of the indices giving the value
Let be the upper bound on the original branch obtained by linear programming. Then the quantities and define the upper bounds of the objective functions and on the lower and upper subbranches, respectively. They are not substituting complete optimization in the subbranches. On the other hand they are easily computable and can give some orientation to the selection of the next branch for further investigation (see below).
The quantities and can be improved, i.e. increased. The claim that the variable defined in (24.54) is nonnegative implies that
In a similar way the nonnegativity of variable in (24.56) implies that
If (24.59) is multiplied by the positive number
then it gets the form
The inequalities (24.58) and (24.60) can be unified in the form:
Notice that (24.61) not the sum of the two inequalities. The same negative number stands on the left-hand side of both inequalities and is greater or equal than the right-hand side. Then both right-hand sides must have negative value. Hence the left-hand side is greater than their sum.
The same technique is applied to the variable instead of with
where 's are integer values to be determined later. can be expressed by the non-basic variables as
Obviously is an integer variable as well and its current value if the non-basic variables are fixed to zero is equal to the current value of . Thus it is possible to define the new branches according to the values of . Then the inequality of type (24.61) which is defined by , has the form
The appropriate quantities and are as follows:
where
and
further
where
and
The values of the integers must be selected in a way that the absolute values of the coefficients are as small as possible, because the inequality cuts the greatest possible part of the polyhedral set of the continuous relaxation in this way. (See Exercise 24.3-1.) To do so the absolute value of must be small. Depending on its sign it can be either , or , where is the fractional part of , i.e. .
Assume that . If then the term
is present among the terms of the minimum of the lower branch. If then it obviously is at least as great as the term
which appears in , i.e. in the right-hand side of (24.55). If then there is a term
is in the right-hand side of (24.57) . is a common multiplier in the terms (24.62) and (24.63), therefore it can be disregarded when the terms are compared. Under the assumption that it will be shown that
As is supposed to be negative the statement is equivalent to
Hence the inequality
must hold. It can be reduced to
It is true as and
If then according to (24.57) and (24.61) the term
is present among the terms of the minimum of the upper branch. In a similar way it can be shown that if then it is always at least as great as the term
which is present in the original formula (24.57).
Thus the rule of the choice of the integers 's is
The B&B frame doesn't have any restriction in the selection of the unfathomed node for the next branching in row 7 of Branch-and-Bound
. First two extreme strategies are discussed with pros and cons. The same considerations have to be taken in almost all applications of B&B. The third method is a compromise between the two extreme ones. Finally methods more specific to the integer programming are discussed.
LIFO means “Last-In-First-Out”, i.e. one of the branches generated in the last iteration is selected. A general advantage of the rule is that the size of the enumeration tree and the size of the list remains as small as possible. In the case of the integer programming problem the creation of the branches is based on the integer values of the variables. Thus the number of the branches is at most if the branching variable is . In the LIFO strategy the number of leaves is strictly less then the number of the created branches on each level with the exemption of the deepest level. Hence at any moment the enumeration tree may not have more than
leaves.
The drawback of the strategy is that the flexibility of the enumeration is lost. The flexibility is one of the the main advantage of B&B in solving pure integer problems.
If the algorithm skips from one branch to another branch far away from the first one then it is necessary to reconstruct the second branch including not only the branching restrictions on the variables but any other information which is necessary to the bounding procedure. In the particular algorithm the procedure determining the bound is linear programming, more precisely a simplex method. If a new restriction as a linear constraint is added to the problem which cuts off the previous optimal solution, then the simplex tableau looses the primal feasibility but the dual feasibility still holds. Thus a dual simplex method can immediately start without carrying out a first phase. (The purpose of the first phase which itself is a complete optimization, is to find a primal or dual feasible basic solution depending for primal or dual simplex method, respectively.) If the B&B method skips to another branch then to get the new bound by the simplex method will require the execution of the first phase, too.
A further consideration concerns to the construction of feasible solutions. Generally speaking if good feasible solutions are known in the early phase of the algorithm then the whole procedure can be accelerated. In the current algorithm branching has a “constructive nature”. It means that the value of the branching variable becomes more restricted therefore it either becomes integer in the further optimal solutions in the subbranches of the branch, or it will be restricted further on. Thus it can be expected that sooner or later a complete integer solution is constructed which might be feasible or infeasible. On the other hand if the algorithm skips frequently in the phase when no feasible solution is known then it is very likely that any construction will be finished only later, i.e. the acceleration will not take place, because of the lack of feasible solution.
If a LIFO type step is to be done and the branching variable is then the lower branch should be chosen in step 7 of the algorithm, if
The other extreme strategy is that the branch having the maximal bound is selected in each iteration. The idea is simple and clear: it is the most promising branch therefore it worth to explore it.
Unfortunately the idea is not completely true. The bounds of the higher level branches are not accurate enough. This phenomenon has been discussed during the analysis of the numerical example in the subsection 24.1.3 in relation (24.12). Thus a somewhat smaller upper bound in a lower branch can indicate a more promising branch.
The maximal bound strategy can lead to a very wide enumeration tree which may cause memory problems. Moreover the construction of feasible solutions will be slow and therefore the relatively few solutions will be enumerated implicitly, i.e. the number of steps will be high, i.e. the method will be slow.
If the optimal solution of the relaxed problem is non-integer then it can have several fractional components. All of them must be changed to be integer to obtain the optimal integer programming solution of the branch. The change of the value of each currently fractional variable as a certain cost. The cost of the individual changes are estimated and summed up. The cost means the loss in the value of the objective function. An adjusted value of the bound of the branch is obtained if the sum of the estimated individual costs is subtracted from the current bound. It is important to emphasize that the adjusted value is not an upper or lower bound of the optimal value of integer programming solution of the branch but it is only a realistic estimation.
There are two ways to obtain the estimation. The first one uses the crude values of the fractionality. Let and be the fractional part of variable in the current branch and in the relaxed problem of the original problem, respectively. Further on let , , and be the optimal value of the relaxed problem in the current branch, in the original problem, and the value of the best feasible integer solution found so far. Generally speaking the measure of the fractionality of a real number is that how far is to the closest integer, i.e.
Hence the estimate is
(24.65) takes into account the average inaccuracy of the bounds.
The fast bounds defined in (24.55) and (24.57) can be used also for the same purpose. They concern to the correction of the fractionality of a single variable in the current branch. Hence the estimate
is a natural choice.
The constraints defining the branches are integer valued lower and upper bounds on the branching variables. Thus one can expect that these new constraints force the majority of the branching variables to be integer. It means that the integrality of the optimal solution of the relaxed problem improves with the depth of the branch. Thus it is possible to connect the last two rules on the following way. The current bound is abandoned and the algorithm selects the best bound is the improvement based on estimates is above a certain threshold.
In selecting the branching variable again both the fractional part of the non-integer variables and the fast bounds have critical role. A further factor can be the information obtained from the user.
The most significant change can be expected from that variable which is farthest from integers as the cuts defining the two new branches cut the most. As the measure of fractionality is the rule suggest to choose the branching variable as
Upper bounds are
in the lower and upper branches of branch if the branching variable is .
Here are five possible selection criteria:
Which one can be offered for a B&B algorithm?
Notice that
is a correct upper bound of branch as it has been mentioned earlier. Thus (24.66) selects according to the most inaccurate upper bound. It is obviously not good. (24.68) makes just the opposite it selects the variable giving the most accurate bound. On the other hand
is the upper bound in the worse one of the two subbranches. The interest of the algorithm is that it will be fathomed without explicit investigation, i.e. the bound of this subbranch will be less than the objective function value of an integer feasible solution. Thus it is good if (24.71) is as small as possible. Hence (24.69) is a good strategy and (24.67) is not. Finally, (24.70) tries to separate the good and low quality feasible solutions. The conclusion is that (24.69) and (24.70) are the two best ones and (24.68) is still applicable, but (24.66) and (24.67) must be avoided.
Assume that the numerical problem (24.31)-(24.35) is the model of an industrial problem. Then the final user is the manager and/or expert who must apply the decisions coded into the optimal solution. The expert may know that which factors (decisions) are the most critical ones from the point of view of the managerial problem and the industrial system. The variables belonging to these factors may have a special importance. Therefore it has sense if the user may define a priority order of variables. Then the first non-integer variable of the order can be selected as branching variable.
The solution of the problem
(24.36)
has been analyzed from geometric point of view in subsection 24.3.1. Now the above-mentioned methods will be applied and the same course of solution will be obtained.
After introducing the slack variables and the (primal) simplex method gives the equivalent form (24.38) of the equations and the objective function:
(24.38)
Hence it is clear that the solution and . (24.38) gives the following optimal dual simplex tableaux:
The first two branches were defined by the inequalities and . The second one is an empty branch. The algebraic evidence of this fact is that there is no negative element in the row of , thus it is not possible to find a pivot element for the dual simplex method after introducing the cut. Now it will be shown in a detailed way. Let be the appropriate slack variable, i.e. the cut introduced in the form
The new variable must be expressed by the non-basic variables, i.e. by and :
Hence
When this row is added to the dual simplex tableaux, it is the only row having a negative constant term, but there is no negative coefficient of any non-basic variable proving that the problem is infeasible. Notice that the sign of a coefficient is an immediate consequence of the sign of the coefficient in the row of , i.e. it is not necessary to carry out the calculation of the row of and it is possible to conclude immediately that the branch is empty.
The fractional part equals . Hence the fast bound (24.55) of the lower branch defined by is
It means that the fast upper bound in the branch is 13/2-7/10=5.8. The bound can be rounded down to 5 as the objective function is integer valued.
Let be the slack variable of the cut , i.e. . Hence
If it is added to the simplex tableaux then the pivot element is . After the first pivot step the tableaux becomes optimal. It is
Notice that the optimal value is 5.8, i.e. exactly the same what was provided by the fast bound. The reason is that the fast bound gives the value of the objective function after the first pivot step. In the current case the first pivot step immediately produced the optimal solution of the relaxed problem.
is the only variable having non-integer value in simplex tableaux. Thus the branching must be done according to . The two new cuts defining the branches are and . There are both positive and negative coefficients in the row of , thus both the lower and upper branches exist. Moreover
Thus the continuous upper bound is higher on the upper branch, therefore it is selected first for further branching.
The constraint
are added to the problem. By using the current simplex tableaux the equation
is obtained. It becomes the last row of the simplex tableaux. In the first pivoting step enters the basis and leaves it. The first tableaux is immediately optimal and it is
Here both and are integer variables having non-integer values. Thus branching is possible according to both of them. Notice that the upper branch is empty in the case of , while the lower branch of is empty as well. is selected for branching as it is the variable of the original problem. Now
On the other hand the bound can be improved in accordance with (24.64) as , i.e. the coefficient of may be instead of . It means that the inequality
is claimed instead of
It is transferred to the form
Hence
The improved fast bound is obtained from
It means that the objective function can not be greater than 4. After the first pivoting the simplex tableau becomes
giving the feasible solution and with objective function value 4.
There is only one unfathomed branch which is to be generated from tableaux (24.72) by the constraint . Let be the slack variable. Then the equation
gives the cut
to be added to the tableaux. After two pivoting steps the optimal solution is
Although the optimal solution is not integer, the branch is fathomed as the upper bound is under 5, i.e. the branch can not contain a feasible solution better than the current best known integer solution. Thus the method is finished.
Exercises
24.3-1 Show that the rule of the choice of the integers (24.64) is not necessarily optimal from the point of view of the object function. (Hint. Assume that variable enters into the basis in the first pivoting. Compare the changes in the objective function value if its coefficient is and , respectively.)
One critical point of B&B is the storing of the enumeration tree. When a branch is fathomed then even some of its ancestors can become completely fathomed provided that the current branch was the last unfathomed subbranch of the ancestors. The ancestors are stored also otherwise it is not possible to restore the successor. As B&B uses the enumeration tree on a flexible way, it can be necessary to store a large amount of information on branches. It can causes memory problems. On the other hand it would be too expensive from the point of view of calculations to check the ancestors every time if a branch becomes fathomed. This section gives some ideas how to make a trade-off.
The first thing is to decide is that which data are describing a branch. There are two options. The first one is that all necessary informations are stored for each branch. It includes all the branching defining constraints. In that case the same constraint is stored many times, because a branch on a higher level may have many subbranches. As matter of fact the number of branches is very high in the case of large scale problems, thus the memory required by this solution is very high.
The other option is that only those informations are stored which are necessary to the complete reconstruction of the branch. These ones are
the parent branch, i.e. the branch from which it was generated directly,
the bound of the objective function on the branch,
the index of the branching variable,
the branch defining constraint of the branching variable.
For technical reasons three other attributes are used as well:
a Boolean variable showing if the branch has been decomposed into subbranches,
another Boolean variable showing if any unfathomed subbranch of the branch exists,
and a pointer to the next element in the list of branches.
Thus a branch can be described by a record as follows:
RECORD
BranchBEGIN
Parent : Branch; Bound :INTEGER
; Variable :INTEGER
; Value :INTEGER
; Decomposition :BOOLEAN
; Descendant :BOOLEAN
; suc : BranchEND
;
The value of the Parent attribute is none if and only if the branch is the initial branch, i.e. the complete problem. It is the root of the B&B tree. The reconstruction of the constraints defining the particular branch is the simplest if it is supposed that the branches are defined by the fixing of a free variable. Assume that Node is a variable of type Branch. At the beginning its value is the branch to be reconstructed. Then the algorithm of the reconstruction is as follows.
Branch-Reconstruction
1WHILE
NodeNONE
2DO
[Node.Variable] Node.Value; 3 4 Node Node.Parent; 5RETURN
Node
The value of a previously fixed variable is set to the appropriate value in row 2. Further operations are possible (row 3). Node becomes its own parent branch in row 4. If it is none then the root is passed and all fixings are done.
Sometimes it is necessary to execute some operations on all elements of the list . The suc attribute of the branches point to the next element of the list. The last element has no next element, therefore the value of suc is none in this case. The procedure of changing all elements is somewhat similar to the Branch Reconstruction
procedure. The head of the list is Tree, i.e. the first element of the list is Tree.suc.
B&B-List
1 Node Tree.suc 2WHILE
NodeNONE
3 4 Node Node.suc 5RETURN
Node
The loop runs until there is no next element. The necessary operations are executed in row 3. The variable Node becomes the next element of the list in row 4. To insert a new branch into the list is easy. Assume that it is NewNode of type Branch and it is to be inserted after Node which is in the list. Then the necessary two commands are:
NewNode.suc Node.suc
Node.suc NewNode
If the branches are not stored as objects but they are described in long arrays then the use of attribute suc is superflous and instead of the procedure B&B List
a for
loop can be applied.
The greatest technical problem of B&B from computer science point of view is memory management. Because branches are created in enormous large quantity the fathomed branches must be deleted from the list time to time and the memory occupied by them must be freed. It is a kind of garbage collection. It can be done in three main steps. In the first one value false is assigned to the attribute Descendant of all elements of the list. In the second main step an attribute Descendant is changed to true if and only if the branch has unfathomed descendant(s). In the third step the unnecessary branches are deleted. It is assumed that there is a procedure Out which gets the branch to be deleted as a parameter and deletes it and frees the part of the memory.
Garbage-Collection
1 Node Tree.suc 2WHILE
NodeNONE
3 Node.DescendantFALSE
4 Node Node.suc 5 Node Tree.suc 6WHILE
NodeNONE
7DO/TTBF TTBFIF NOT
Node.DecompositionAND
Node.Bound > 8THEN
Pont Node.Parent 9WHILE
PontNONE DO
10 Pont.DescendantTRUE
11 Pont Pont.Parent 12 Node Node.suc 13 Node Tree.suc 14WHILE
NodeNONE DO
15 Pont Node.suc 16IF/TTBF (TTBFNOT
Node.DescendantAND/TTBF NODE.DECOMPOSITION) TTBFOR
Node.Bound 17THEN
Out(Node) 18 Node Pont 19RETURN
The method can be sped up by using information provided by further algorithmic tools.
The aim of the application of heuristics methods is to obtain feasible solutions. From theoretical point of view to decide if any feasible solution exists is NP-complete as well. On the other hand heuristics can produce feasible solutions in the case of the majority of the numerical problems. The methods to be applied depend on the nature of the problem in question, i.e. pure binary, bounded integer, mixed integer problems may require different methods. For example for pure integer problems local search and Lagrange multipliers can work well. Lagrange multipliers also provide upper bound (in the case of maximization) of the optimal value.
If a feasible solution is known then it is immediately possible to disregard branches based on their bounds. See row 12 of algorithm Branch and Bound
. There the branches having not good enough bounds are automatically eliminated. In the case of pure binary problem an explicit objective function constraint can give a lot of consequences as well.
Preprocessing means to obtain information on variables and constraints based on algebraic constraints and integrality.
For example if the two constraints of problem (24.36) are summed up then the inequality
is obtained implying that .
Let
be one of the constraints of problem (24.14)-(24.16). Many tests can be based on the following two easy observations:
If the maximal value of the left-hand side of (24.73) of is not greater than the right-hand side of (24.73) then the constraint is redundant.
If the minimal value of the left-hand side of (24.73) if is greater than the right-hand side of (24.73) then it is not possible to satisfy the constraint, i.e. the problem (24.14)-(24.16) has no feasible solution.
If under some further restriction the second observation is true then the restriction in question can be excluded. A typical example is that certain variables are supposed to have maximal/minimal possible value. In this way it is possible to fix a variable or decrease its range.
Lagrange relaxation can be used to fix some variables, too. Assume that the optimal value of Problem (24.22) and (24.16) is under the further condition that must take the value . If is the objective function value of a known feasible solution and then can not take value . Further methods are assuming that the LP relaxation of the problem is solved and based on optimal dual prices try to fix the values of variables.
Branch and Cut (B&C) in the simplest case is a B&B method such that the a certain kind of information is collected and used during the whole course of the algorithm. The theoretical background is based on the notion of integer hull
be a polyhedral set where is an matrix, and are and dimensional vectors. All elements of and are rationals. The convex hull of the integer points of is called the integer hull of , i.e. it is the set
The integer hull of the polyhedral set of problem (24.36) is the convex hull of the points (0,0), (0,3), (1,2), and (1,1) as it can be seen on Figure 24.2. Thus the description of the integer hull as a polyhedral set is the inequality system:
Under the conditions of the definition the integer hull is a polyhedral set, too. It is a non-trivial statement and in the case of irrational coefficients it can be not true. If the integer hull is known, i.e. a set of linear inequalities defining exactly the integer hull polyhedral set is known, then the integer programming problem can be reduced to a linear programming problem. Thus problem (24.36) is equivalent to the problem
As the linear programming problem easier to solve than the integer programming problem, one may think that it worth to carry out this reduction. It is not completely true. First of all the number of the linear constraint can be extremely high. Thus generating all constraints of the integer hull can be more difficult than the solution of the original problem. Further on the constraints determining the shape of the integer hull on the side opposite to the optimal solution are not contributing to the finding of the optimal solution. For example the optimal solution of (24.74) will not change if the first constraint is deleted and it is allowed both and may take negative values.
On the other hand the first general integer programming method is the cutting plane method of Gomory. Its main tool is the cut which is based on the observation that possible to determine linear inequalities such that they cut the non-integer optimal solution of the current LP relaxation, but they do not cut any integer feasible solution. A systematic generation of cuts leads to a finite algorithm which finds an optimal solution and proves its optimality if optimal solution exist, otherwise it proves the non-existence of the optimal solution. From geometrical point of view the result of the introducing of the cuts is that the shape of the polyhedral set of the last LP relaxation is very similar to the integer hull in the neighborhood of the optimal solution.
There is the generalization of Gomory's cut called Chvátal (or Chvátal-Gomory) cut. If the two inequalities of (24.36) are summed such that both have weight then the constraint
is obtained. As must be integer the inequality
follows immediately. It is not an algebraic consequence of the original constraints. To obtain it the information of the integrality of the variables had to be used. But the method can be continued. If (24.75) has weight and the second constraint of (24.36) has weight then
is obtained implying
If the last inequality has weight and the first inequality of (24.36) has weight then the result is
implying
Finally the integer hull is obtained. In general the idea is as follows. Assume that a polyhedral set is defined by the linear inequality system
Let be a vector such that is an integer vector and is a noninteger value. Then
is a valid cut, i.e. all integer points of the polyhedral set satisfy it. As a matter of fact it can be proven that a systematic application of the method creates a complete description of the integer hull after finite many steps.
The example shows that Gomory and Chvátal cuts can help to solve a problem. On the other hand they can be incorporated in a B&B frame easily. But in the very general case it is hopeless to generate all effective cuts of this type.
The situation is significantly different in the case of many combinatorial problems. There are many theoretical results known on the type of the facet defining constraints of special polyhedral sets. Here only one example is discussed. It is the Traveling Salesperson Problem (TSP). A salesman must visit some cities and at the end of the tour he must return to his home city. The problem is to find a tour with minimal possible length. TSP has many applications including cases when the “cities” are products or other objects and the “distance” among them doesn't satisfy the properties of the geometric distances, i.e. symmetry and triangle inequality may be violated.
The first exact mathematical formulation of the problem is the so-called Dantzig-Fulkerson-Johnson (DFJ) model. DFJ is still the basis of the numerical solutions. Assume that the number of cities is . Let the distance of the route from city to city . DFJ uses the variables such that
The objective function is the minimization on the total travel length:
The set of the constraints consists of three parts. The meaning of the first part is that the salesman must travel from each city to another city exactly once:
The second part is very similar. It claims that the salesman must arrive to each city from somewhere else again exactly once:
Constraints (24.78) and (24.79) are the constraints of an assignment problem. Taking into account that the variables must be binary Problem (24.77)-(24.79) is really an assignment problem. They don't exclude solutions consisting of several smaller tours. For example if and and then all other variables must be zero. The solution consists of two smaller tours. The first one visits only cities 1, 2, and 3, the second one goes through the cities 4, 5, and 6. The small tours are called subtours in the language of the theory of TSP.
Thus further constraints are needed which excludes the subtours. They are called subtour elimination constraints. There are two kinds of logic how the subtours can be excluded. The first one claims that in any subset of the cities which has at least two elements but not the complete set of the cities the number of travels must be less than the number of elements of the set. The logic can be formalized as follows:
The other logic claims that the salesman must leave all such sets. Let . Then the subtour elimination constraints are the inequalities
The numbers of the two types of constraints are equal and exponential. Although the constraints (24.78)–(24.80) or (24.78), (24.79), and (24.81) are satisfied by only binary vectors being characteristic vectors of complete tours but the polyhedral set of the LP relaxation is strictly larger than the integer hull.
On the other hand it is clear that it is not possible to claim all of the subtour elimination constraints in the real practice. What can be done? It is possible to claim only the violated once. The difficulty is that the optimal solution of the LP relaxation is a fractional vector in most of the cases and that subtour elimination constraint must be found which is violated by the fractional solution provided that such constraint exists as the subtour elimination constraints are necessary to the description of the integer hull but further constraints are needed, too. Thus it is possible that there is no violated subtour elimination constraint but the optimal solution of the LP relaxation is still fractional.
To find a violated subtour elimination constraint is equivalent to the finding of the absolute minimal cut in the graph which has only the edges having positive weights in the optimal solution of the relaxed problem. If the value of the absolute minimal cut is less than 1 in the directed case or less than 2 in the non-directed case then such a violated constraint exists. The reason can be explained based on the second logic of the constraints. If the condition is satisfied then the current solution doesn't leaves at least one of the two sets of the cut in enough number. There are many effective methods to find the absolute minimal cut.
A general frame of the numerical solution of the TSP is the following. In a B&B frame the calculation of the lower bound is repeated until a new violated subtour elimination constraint is obtained, that is the new inequality is added to the relaxed problem and the LP optimization is carried out again. If all subtour elimination constraints are satisfied and the optimal solution of the relaxed problem is still non-integer then branching is made according to a fractional valued variable.
The frame is rather general. The violated constraint cuts the previous optimal solution and reoptimization is needed. Gomory cuts do the same for the general integer programming problem. In the case of other combinatorial problems special cuts may work if the description of the integer hull is known.
Thus the general idea of B&C is that a cut is generated until it can be found and the improvement in the lower bound is great enough. Otherwise branching is made by a non-integer variable. If the cut generation is made only at the root of the enumeration tree then the name of the method is Cut and Branch (C&B). If a cut is generated in a branch then it is locally valid in that branch and in its successors. The cuts generated at the root are valid globally, i.e. in all branches. In some cases, e.e. in binary optimization, it is possible to modify it such that it is valid in the original problem, too.
For practical reasons the type of the generated cut can be restricted. It is the case in TSP as the subtour elimination constraints can be found relatively easily.
The Branch and Price method is the dual of B&C in a certain sense. If a problem has very large number of variables then it is often possible not to work explicitely with all of them but generate only those which may enter the basis of the LP relaxation. This is column generation and is based on the current values of the dual variables called shadow prices. Similarly to B&C the type of the generated columns is restricted. If it is not possible to find a new column then branching is made.
Problems |
24-1
Continuous Knapsack Problem
Prove Theorem 24.1. (Hint. Let be a feasible solution such that there are two indices, say and , such that and , and . Show that the solution can be improved.)
24-2
TSP's relaxation
Decide if the Assignment Problem can be a relaxation of the Traveling Salesperson Problem in the sense of definition 24.5. Explain your solution regardless that your answer is YES or NO.
24-3
Infeasibility test
Based on the the second observation of Subsection 24.5.2 develop a test for the infeasibility of a linear constraint of binary variables.
24-4
Mandatory fixing
Based on the previous problem develop a test for the mandatory fixing of binary variables satisfying a linear constraint.
Chapter Notes |
The principle of B&B first appears in [158]. It solves problems with bounded integer variables. The fast bounds were introduced in [22] and [272]. A good summary of the bounds is [80]. To the best knowledge of the author of this chapter the improvement of the fast bounds appeared first in [281].
B&B can be used as an approximation scheme, too. In that case a branch can be deleted even in the case if its bound is not greater than the objective function value of the current best solution plus an allowed error. [122] showed that there are classes such that the approximate method requires more computation than to solve the problem optimally. B&B is very suitable for parallel processing. This issue is discussed in [43].
Based on the theoretical results of [167] a very effective version of B&C method was developed for pure binary optimization problem by [252] and independently [14]. Especially Egon Balas and his co-authors could achieve a significant progress. Their method of lifting cuts means that a locally generated cut can be made globally valid by solving a larger LP problem and modify the cut according to its optimal solution.
The first integer programming method to solve an IP problem with general, i.e. non-bounded, integer variables is Ralph Gomory's cutting plane method [93]. In a certain sense it is still the only general method. Strong cuts of integer programming problems are discussed in [15]. The surrogate constraint (24.18) has been introduced by [92]. The strength of the inequality depends on the choice of the multipliers . A rule of thumb is that the optimal dual variables of the continuous problem give a strong inequality provided that the original problem is linear.
The DFJ model of TSP appeared in [60]. It was not only an excellent theoretical result, but is also an enormous computational effort as the capacities and speed of that time computers were far above the recent ones. One important cut of the TSP polyhedral set is the so-called comb inequality. The number of edges of a complete tour is restricted in a special subgraph. The subgraph consists of a subset of cities called hand and odd number of further subsets of cities intersecting the hand. They are called teeth and their number must be at least three. Numerical problems of TSP are exhaustively discussed in [240].
A good summary of Branch and Price is [20].
The European Union and the European Social Fund under the grant agreement no. TÁMOP 4.2.1/B-09/1/KMR-2010-0003.
Table of Contents
In practice often appears the problem, how to rank different objects. Researchers of these problems frequently mention different applications, e.g. in biology Landau [159], in chemistry Hakimi [100], in networks Kim, Toroczkai, Miklós, Erdős, and Székely [147], Newman and Barabási [193], in comparison based decision making Bozóki, Fülöp, Kéri, Poesz, Rónyai and [34], [35], [145], in sports Iványi, Lucz, Pirzada, Sótér and Zhou [127], [128], [131], [129], [130], [168], [217], [231], [260].
A popular method is the comparison of two—and sometimes more—objects in all possible manner and distribution some amount of points among the compared objects.
In this chapter we introduce a general model for such ranking and study some connected problems.
Let , be positive integers, , and vectors of nonnegative integers with and .
An -supertournament is an sized matrix , whose columns correspond to the players of the tournament (they represent the rankable objects) and the rows correspond to the comparisons of the objects. The permitted elements of belong to the set , where means, that the player is not a participants of the match corresponding to the -th line, while means, that received points in the match corresponding to the -th line, and .
The sum (dots are taken in the count as zeros) of the elements of the -th column of is denoted by and is called the score of the th player :
The sequence is called the score vector of the tournament. The increasingly ordered sequence of the scores is called the score sequence of the tournament and is denoted by .
Using the terminology of the sports a supertournament can combine the matches of different sports. For example in Hungary there are popular chess-bridge, chess-tennis and tennis-bridge tournaments.
A sport is characterized by the set of the permitted results. For example in tennis the set of permitted results is , for chess is the set , for football is the set and in the Hungarian card game last trick is . There are different possible rules for an individual bridge tournament, e.g. .
The number of participants in a match of a given sport is denoted by , the minimal number of the distributed points in a match is denoted by , and the maximal number of points is denoted by .
If a supertournament consists of only the matches of one sport, then we use and instead of vectors , and and omit the parameter . When the number of the players is not important, then the parameter is also omitted.
If the points can be divided into arbitrary integer partitions, then the given sport is called complete, otherwise it is called incomplete. According to this definitions chess is a complete (2,2)-sport, while football is an incomplete (2,3)-sport.
Since a set containing elements has -element subsets, an -tournament consists of matches. If all matches are played, then the tournament is finished, otherwise it is partial.
In this chapter we deal only with finished tournaments and mostly with complete tournaments (exception is only the section on football).
Figure 25.1 contains the results of a full and complete chess+last trick+bridge supertournament. In this example , , and . In this example the score vector of the given supertournament is , and its score sequence is .
In this chapter we investigate the problems connected with the existence and construction of different types of supertournaments having prescribed score sequences.
At first we give an introduction to -tournaments (Section 25.2), then summarize the results on (1,1)-tournaments (Section 25.3), then for -tournaments (Section 25.4) and for general -tournaments (Section 25.5).
In Section 25.6 we deal with imbalance sequences, and in Section 25.7 with supertournaments. In Section 25.8 we investigate special incomplete tournaments (football tournaments) and finally in Section 25.9 we consider examples of the reconstruction of football tournaments.
Exercises
25.1-1 Describe known and possible multitournaments.
25.1-2 Estimate the number of given types of multitournaments.
Let and be nonnegative integers and let be the set of such generalized tournaments, in which every pair of distinct players is connected by at least , and at most arcs. The elements of are called -tournaments. The vector of the outdegrees of is called the score vector of . If the elements of are in nondecreasing order, then is called the score sequence of .
An arbitrary vector of nonnegative integers is called multigraphic vector, (or degree vector) if there exists a loopless multigraph whose degree vector is , and is called dimultigraphic vector (or score vector) iff there exists a loopless directed multigraph whose outdegree vector is .
A nondecreasingly ordered multigraphic vector is called multigraphic sequence, and a nondecreasingly ordered dimultigraphic vector is called dimultigraphic sequence (or score sequence).
In there exists a simple graph, resp. a simple digraph with degree/out-degree sequence , then is called simply graphic, resp. digraphic.
The number of arcs of going from player to player is denoted by , and the matrix is called the point matrix or tournament of the .
In the last sixty years many efforts have been devoted to the study of both types of vectors, resp. sequences. E.g. in the papers [27], [71], [86], [96], [100], [103], [101], [109], [130], [131], [129], [139], [204], [250], [253], [260], [262], [276], [285] the multigraphic sequences, while in the papers [1], [12], [16], [27], [41], [78], [90], [91], [95], [98], [103], [111], [148], [149], [159], [171], [183], [184], [185], [189], [190], [197], [198], [199], [211], [236], [239], [243], [279], [284], [289] the dimultigraphic sequences have been discussed.
Even in the last two years many authors investigated the conditions when is multigraphical (e.g. [21], [31], [40], [54], [82], [83], [87], [84], [116], [121], [136], [147], [150], [151], [174], [178], [213], [216], [241], [274], [280], [282], [286], [287], [292]) or dimultigraphical (e.g. [23], [105], [127], [144], [152], [166], [188], [206], [228], [227], [229], [230], [245], [247], [270], [294]).
It is worth to mention another interesting direction of the study of different kinds of tournament, the score sets [217].
In this chapter we deal first of all with directed graphs and usually follow the terminology used by K. B. Reid [237], [239]. If in the given context and are fixed or non important, then we speak simply on tournaments instead of generalized or -tournaments.
The first question is: how one can characterize the set of the score sequences of the -tournaments? Or, with other words, for which sequences of nonnegative integers does exist an -tournament whose outdegree sequence is . The answer is given in Section 25.5.
If is an -tournament with point matrix , then let and be defined as follows: , , and . Let denote the set of all tournaments having as outdegree sequence, and let and be defined as follows: , , and . In the sequel we use the short notations , and .
Hulett, Will, and Woeginger [121], [288], Kapoor, Polimeni, and Wall [138], and Tripathi et al. [277], [274] investigated the construction problem of a minimal size graph having a prescribed degree set [233], [291]. In a similar way we follow a mini-max approach formulating the following questions: given a sequence of nonnegative integers,
How to compute and how to construct a tournament characterized by ? In Subsection 25.5.3 a formula to compute , and in 25.5.4 an algorithm to construct a corresponding tournament is presented.
How to compute and ? In Subsection 25.5.4 we characterize and , and in Subsection 25.5.5 an algorithm to compute and is described, while in Subsection 25.5.8 we compute and in linear time.
How to construct a tournament characterized by and ? In Subsection 25.5.10 an algorithm to construct a corresponding tournament is presented and analyzed.
We describe the proposed algorithms in words, by examples and by the pseudocode used in [57].
The simplest supertournament is the classical tournament, in our notation the -tournament.
Now, we give the characterization of score sequences of tournaments which is due to Landau [159]. This result has attracted quite a bit of attention as nearly a dozen different proofs appear in the literature. Early proofs tested the readers patience with special choices of subscripts, but eventually such gymnastics were replaced by more elegant arguments. Many of the existing proofs are discussed in a survey written by K. Brooks Reid [236]. The proof we give here is due to Thomassen [270]. Further, two new proofs can be found in the paper due to Griggs and Reid [95].
Theorem 25.1 (Landau [159]) A sequence of nonnegative integers is the score vector of a -tournament if and only if for each subset
with equality, when .
This theorem, called Landau theorem is a nice necessary and sufficient condition, but its direct application can require the test of exponential number of subsets.
If instead of the nonordered vector we consider a nondecreasingly ordered sequence , then due to the monotonity the inequalities (25.2), called Landau inequalities, we get the following consequence.
Corollary 25.2 (Landau [159]) A nondecreasing sequence of nonnegative integers is the score sequence of some -tournament, if and only if
for , with equality for .
Proof. Necessity If a nondecreasing sequence of nonnegative integers is the score sequence of an -tournament , then the sum of the first scores in the sequence counts exactly once each arc in the subtournament induced by plus each arc from to . Therefore the sum is at least , the number of arcs in . Also, since the sum of the scores of the vertices counts each arc of the tournament exactly once, the sum of the scores is the total number of arcs, that is, .
Sufficiency (Thomassen [270]) Let be the smallest integer for which there is a nondecreasing sequence of nonnegative integers satisfying Landau's conditions (25.3), but for which there is no -tournament with score sequence . Among all such , pick one for which is as lexicografically small as possible.
First consider the case where for some ,
By the minimality of , the sequence is the score sequence of some tournament . Further,
for each , with the equality when . Therefore, by the minimality of , the sequence is the score sequence of some tournament . By forming the disjoint union of and and adding all arcs from to , we obtain a tournament with score sequence .
Now, consider the case where each inequality in (25.3) is strict when (in particular ). Then the sequence satisfies (25.3) and by the minimality of , is the score sequence of some tournament . Let and be the vertices with scores and respectively. Since the score of is larger than that of , then according to Lemma 25.5 has a path from to of length . By reversing the arcs of , we obtain a tournament with score sequence , a contradiction.
Landau's theorem is the tournament analog of the Erdős-Gallai theorem for graphical sequences [71]. A tournament analog of the Havel-Hakimi theorem [102], [109] for graphical sequences is the following result.
Theorem 25.3 (Reid, Beineke [238]) A nondecreasing sequence of nonnegative integers, , is the score sequence of an -tournament if and only if the new sequence
arranged in nondecreasing order, is the score sequence of some -tournament.
Proof. See [238].
For the -tournament Moon [184] proved the following extension of Landau's theorem.
Theorem 25.4 (Moon [184], Kemnitz, Dulff [144]) A nondecreasing sequence of nonnegative integers is the score sequence of an -tournament if and only if
for , with equality for .
Later Kemnitz and Dulff [144] reproved this theorem.
The proof of Kemnitz and Dulff is based on the following lemma, which is an extension of a lemma due to Thomassen [270].
Lemma 25.5 (Thomassen [270]) Let be a vertex of maximum score in an -tournament . If is a vertex of different from , then there is a directed path from to of length at most 2.
Proof. ([144]) Let be all vertices of such that . If then for the length of path . Otherwise if there exists a vertex , such that then . If for all then there are arcs which implies , a contradiction to the assumption that has maximum score.
Proof of Theorem 25.4. The necessity of condition (25.7) is obvious since there are arcs among any vertices and there are arcs among all vertices.
To prove the sufficiency of (25.7) we assume that the sequence is a counterexample to the theorem with minimum and smallest with that choice of . Suppose first that there exists an integer , such that
Because the minimality of , the sequence is the score sequence of some -tournament .
Consider the sequence defined by , . Because of by assumption it follows that
which implies . Since is nondecreasing also is a nondecreasing sequence of nonnegative integers.
For each with it holds that
with equality for since by assumption
Therefore the sequence fulfils condition (25.8), by the minimality of , is the score sequence of some -tournament . By forming the disjoint union of and we obtain a -tournament with score sequence in contradiction to the assumption that is counterexample.
Now we consider the case when the inequality in condition (25.8) is strict for each , . This implies in particular .
The sequence is a nondecreasing sequence of nonnegative integers which fulfils condition (25.8). Because of the minimality of , is the score sequence of some -tournament . Let denote a vertex of with score and a vertex of with score . Since has maximum score in there is a directed path from to of length at most 2 according to Lemma 25.5. By reversing the arcs of the path we obtain an -tournament with score sequence . This contradiction completes the proof.
In this section we show that for arbitrary prescribed sequence of nondecreasingly ordered nonnegative integers there exists an -tournament
Since the numbers of points are not limited, it is easy to construct a -tournament for any .
Lemma 25.6 If , then for any vector of nonnegative integers there exists a loopless directed multigraph with outdegree vector so, that .
Proof. Let and for , and let the remaining values be equal to zero.
Using weighted graphs it would be easy to extend the definition of the -tournaments to allow arbitrary real values of , , and . The following algorithm, Naive-Construct
works without changes also for input consisting of real numbers.
We remark that Ore in 1956 [197], [198], [199] gave the necessary and sufficient conditions of the existence of a tournament with prescribed indegree and outdegree vectors. Further Ford and Fulkerson [78][Theorem11.1] published in 1962 necessary and sufficient conditions of the existence of a tournament having prescribed lower and upper bounds for the indegree and outdegree of the vertices. Their results also can serve as basis of the existence of a tournament having arbitrary outdegree sequence.
Sorting of the elements of is not necessary.
Input. : the number of players ;
: arbitrary sequence of nonnegative integer numbers.
Output. : the point matrix of the reconstructed tournament.
Working variables. : cycle variables.
Naive-Construct
1FOR
TO
2FOR
TO
3 4 5FOR
TO
6 7RETURN
The running time of this algorithm is in worst case (in best case too). Since the point matrix has elements, this algorithm is asymptotically optimal.
This is also an easy question. From now on we suppose that is a nondecreasing sequence of nonnegative integers, that is . Let .
Since is a finite set for any finite score vector , exists.
Lemma 25.7 (Iványi [127]) If , then for any sequence there exists a -tournament such that
and is the smallest upper bound for , and is the smallest possible upper bound for .
Proof. If all players gather their points in a uniform as possible manner, that is
then we get , that is the bound is valid. Since player has to gather points, the pigeonhole principle [24], [25], [64] implies , that is the bound is not improvable. implies . The score sequence shows, that the upper bound is not improvable.
Corollary 25.8 (Iványi [128]) If , then for any sequence holds .
Proof. According to Lemma 25.7 is the smallest upper bound for .
The following algorithm constructs a -tournament having for any .
Input. : the number of players ;
: arbitrary sequence of nonnegative integer numbers.
Output. : the point matrix of the tournament.
Working variables. : cycle variables;
: the number of the “larger part” in the uniform distribution of the points.
Pigeonhole-Construct
1FOR
TO
2 3 4FOR
TO
5 6 7FOR
TO
8 9 10RETURN
The running time of Pigeonhole-Construct
is in worst case (in best case too). Since the point matrix has elements, this algorithm is asymptotically optimal.
Let be the sum of the first elements of . be the binomial coefficient . Then the players together can have points only if . Since the score of player is , the pigeonhole principle implies .
These observations result the following lower bound for :
If every player gathers his points in a uniform as possible manner then
These observations imply a useful characterization of .
Lemma 25.9 (Iványi [127]) If , then for arbitrary sequence there exists a -tournament having as its outdegree sequence and the following bounds for and :
Proof. (25.15) follows from (25.13) and (25.14), (25.16) follows from the definition of .
It is worth to remark, that if is integer and the scores are identical, then the lower and upper bounds in (25.15) coincide and so Lemma 25.9 gives the exact value of .
In connection with this lemma we consider three examples. If , then and , that is is twice larger than . In the other extremal case, when and , then , , so is times larger, than .
If , then Lemma 25.9 gives the bounds . Elementary calculations show that Figure 25.2 contains the solution with minimal , where .
In 2009 we proved the following assertion.
Theorem 25.10 (Iványi [127]) For a nondecreasing sequence of nonnegative integers is the score sequence of some -tournament if and only if
where
The theorem was proved by Moon [184], and later by Kemnitz and Dolff [144] for -tournaments is the special case of Theorem 25.10. Theorem 3.1.4 of [133] is the special case . The theorem of Landau [159] is the special case of Theorem 25.10.
The following algorithm Interval-Test
decides whether a given is a score sequence of an -tournament or not. This algorithm is based on Theorem 25.10 and returns if is a score sequence, and returns otherwise.
Input. : minimal number of points divided after each match;
: maximal number of points divided after each match.
Output. : logical variable ( shows that is an -tournament).
Local working variable. : cycle variable;
: the sequence of the values of the loss function.
Global working variables. : the number of players ;
: a nondecreasing sequence of nonnegative integers;
: the sequence of the binomial coefficients;
: the sequence of the sums of the smallest scores.
Interval-Test
1FOR
TO
2 3IF
4 5RETURN
6IF
7 8RETURN
9RETURN
In worst case Interval-Test
runs in time even in the general case (in the best case the running time of Interval-Test
is ). It is worth to mention, that the often referenced Havel–Hakimi algorithm [100], [109] even in the special case decides in time whether a sequence is digraphical or not.
The following algorithm is based on the bounds of and given by Lemma 25.9 and the logarithmic search algorithm described by D. E. Knuth [152][page 410].
Input. No special input (global working variables serve as input).
Output. : (the minimal );
: (the maximal ).
Local working variables. : cycle variable;
: lower bound of the interval of the possible values of ;
: upper bound of the interval of the possible values of .
Global working variables. : the number of players ;
: a nondecreasing sequence of nonnegative integers;
: the sequence of the binomial coefficients;
: the sequence of the sums of the smallest scores;
: logical variable (its value is , when the investigated is a score sequence).
MinF-MaxG
1 Initialization 2FOR
TO
3 4 5 6 7 Computation of 8Interval-Test
9IF
10 11GO TO
21 12 13Interval-Test
14IF
15GO TO
17 16 17IF
18 19GO TO
37 20GO TO
14 21 Computation of 22 23Interval-Test
24IF
25 26GO TO
37 27 28Interval-Test
29IF
30 31GO TO
33 32 33IF
34 35GO TO
37 36GO TO
27 37RETURN
MinF-MaxG
determines and .
Lemma 25.11 (Iványi [128]) Algorithm MinG-MaxG
computes the values and for arbitrary sequence in time.
Proof. According to Lemma 25.9 is an element of the interval and is an element of the interval . Using Theorem B of [152][page 412] we get that calls of Interval-Test
is sufficient, so the run time of Interval-Test
implies the required running time of MinF-MaxG
.
Analyzing Theorem 25.10 and the work of algorithm MinF-MaxG
one can observe that the maximal value of and the minimal value of can be computed independently by the following Linear-MinF-MaxG
.
Input. No special input (global working variables serve as input).
Output. : (the minimal ).
: (the maximal ).
Local working variables. : cycle variable.
Global working variables. : the number of players ;
: a nondecreasing sequence of nonnegative integers;
: the sequence of the binomial coefficients;
: the sequence of the sums of the smallest scores.
Linear-MinF-MaxG
1 Initialization 2FOR
TO
3 4 5 6 7FOR
TO
Computation of 8 9IF
10 11FOR
TO
Computation of 12 13 14IF
15 16RETURN
Lemma 25.12 Algorithm Linear-MinG-MaxG
computes the values and for arbitrary sequence in time.
Proof. Lines 01, 05, 06, and 16 require only constant time, lines 02–06, 07–10, and 11–15 require time, so the total running time is .
The following reconstruction algorithm Score-Slicing2
is based on balancing between additional points (they are similar to “excess”, introduced by Brauer et al. [36]) and missing points introduced in [127]. The greediness of the algorithm Havel–Hakimi [100], [109] also characterizes this algorithm.
This algorithm is an extended version of the algorithm Score-Slicing
proposed in [127].
The work of the slicing program is managed by the following program Mini-Max
.
Input. : the number of players ;
: a nondecreasing sequence of integers satisfying (25.17).
Output. : the point matrix of the reconstructed tournament.
Local working variables. : cycle variables.
Global working variables. : provisional score sequence;
: the partial sums of the provisional scores;
: matrix of the provisional points.
Mini-Max
1MinF-MaxG
Initialization 2 3 4FOR
TO
5FOR
TO
6 7FOR
TO
8 9 10IF
Score slicing for players 11FOR
DOWNTO
12Score-Slicing2
13IF
Score slicing for 2 players 14 15 16RETURN
The key part of the reconstruction is the following algorithm Score-Slicing2
[127].
During the reconstruction process we have to take into account the following bounds:
Input. : the number of the investigated players ;
: prefix of the provisional score sequence ;
: matrix of provisional points;
Output. : number of missing points
: prefix of the provisional score sequence.
Local working variables. the number of the additional points;
: missing points: the difference of the number of actual points and the number of maximal possible points of ;
: difference of the maximal decreasable score and the following largest score;
: number of sliced points per player;
: frequency of the number of maximal values among the scores ;
: cycle variables;
: maximal amount of sliceable points;
: the sums of the provisional scores;
: the maximal index with and .
Global working variables: : the number of players ;
: the sequence of the binomial coefficients;
: minimal number of points distributed after each match;
: maximal number of points distributed after each match.
Score-Slicing2
1 2FOR
TO
Initialization 3 4 5 6WHILE
AND
There are missing and additional points 7 8WHILE
9 10 11WHILE
12 13 14 15FOR
DOWNTO
16 17 18 19 20 21FOR
DOWNTO
22 23WHILE
No missing points 24 25 26 27 28 29RETURN
Let us consider an example. Figure 25.3 shows the point table of a -tournament . We remark that the termin point table is used as a synonym of the termin point matrix.
The score sequence of is . In [127] the algorithm Score-Slicing
resulted the point table reprezented in Figure 25.4.
The algorithm Mini-Max
starts with the computation of . MinF-MaxG
called in line 01 begins with initialization, including provisional setting of the elements of so, that , if , and otherwise. Then MinF-MaxG
sets the lower bound of in line 07 and tests it in line 10 Interval-Test
. The test shows that is large enough so Mini-Max
sets in line 12 and jumps to line 23 and begins to compute . Interval-Test
called in line 25 shows that is too large, therefore MinF-MaxG
continues with the test of in line 30. The result is positive, therefore comes the test of , then the test of . Now in line 35, so is fixed, and the control returns to line 02 of Mini-Max
.
Lines 02–09 contain initialization, and Mini-Max
begins the reconstruction of a -tournament in line 10. The basic idea is that Mini-Max
successively determines the won and lost points of , , and by repeated calls of Score-Slicing2
in line 12, and finally it computes directly the result of the match between and .
At first Mini-Max
computes the results of calling calling Score-Slicing2
with parameter . The number of additional points of the first five players is according to line 03, the number of missing points of is according to line 04. Then Score-Slicing2
determines the number of maximal numbers among the provisional scores ( according to lines 09–14) and computes the difference between and ( according to line 12). In line 13 we get, that points are sliceable, and gets these points in the match with in line 16, so the number of missing points of decreases to (line 19) and the number of additional point decreases to . Therefore the computation continues in lines 22–27 and and will be decreased by 1 resulting and as the seventh line and seventh column of Figure 25.5 show. The returned score sequence is .
In the second place Mini-Max
calls Score-Slicing2
with parameter , and get and . At first gets point, then and get both 4 points, reducing to and to . The computation continues in line 22 and results the further decrease of , , , and by 1, resulting , , , and as the sixth row of Figure 25.5 shows.
In the third place Mini-Max
calls Score-Slicing2
with parameter , and get and . At first gets points, then further 1 point, and and also both get 1 point, resulting , , , , and , further and . The computation continues in lines 22–27 and results a decrease of by 1 point resulting , , and , as the fifth row and fifth column of Figure 25.5 show. The returned score sequence is .
In the fourth place Mini-Max
calls Score-Slicing2
with parameter , and gets and . At first and get points, resulting , and , and , and . Then MINI-MAX
sets in lines 23–26 and . The returned point vector is .
Finally Mini-Max
sets and in lines 14–15 and returns the point matrix represented in Figure 25.5.
The comparison of Figures 25.4 and 25.5 shows a large difference between the simple reconstruction of Score-Slicing2
and the minimax reconstruction of Mini-Max
: while in the first case the maximal value of is and the minimal value is , in the second case the maximum equals to and the minimum equals to , that is the result is more balanced (the given does not allow to build a perfectly balanced -tournament).
The main result of this paper is the following assertion.
Theorem 25.13 (Iványi [128]) If is a positive integer and is a nondecreasing sequence of nonnegative integers, then there exist positive integers and , and a -tournament with point matrix such that
for any -tournament, and algorithm Linear-MinF-MaxG
computes and in time, and algorithm Mini-Max
generates a suitable in time.
Proof. The correctness of the algorithms Score-Slicing2
, MinF-MaxG
implies the correctness of Mini-Max
.
Lines 1–46 of Mini-Max
require uses of MinG-MaxF
, and one search needs steps for the testing, so the computation of and can be executed in times.
The reconstruction part (lines 47–55) uses algorithm Score-Slicing2
, which runs in time [127]. Mini-Max
calls Score-Slicing2
times with , so finishes the proof.
The interesting property of and is that they can be determined independently (and so there exists a tournament having both extremal features) is called linking property. One of the earliest occurrences appeared in a paper of Mendelsohn and Dulmage [175]. It was formulated by Ford and Fulkerson [78][page 49] in a theorem on the existence of integral matrices for which the row-sums and the column-sums lie between specified bounds. The concept was investigated in detail in the book written by Mirsky [179]. A. Frank used this property in the analysis of different problems of combinatorial optimization [81], [85].
A -tournament is a digraph in which multiarcs multiarcs are permitted, and which has no loops [97].
At first we consider the special case , then the -tournaments.
A -tournament is a directed graph (shortly digraph) without loops and without multiple arcs, is also called simple digraph [97]. The imbalance of a vertex in a digraph is (or simply ) , where and are respectively the outdegree and indegree of . The imbalance sequence of a simple digraph is formed by listing the vertex imbalances in nonincreasing order. A sequence of integers with is feasible if the sum of its elements is zero, and satisfies , for .
The following result provides a necessary and sufficient condition for a sequence of integers to be the imbalance sequence of a simple digraph.
Theorem 25.14 (Mubayi, Will, West [187]) A sequence is realizable as an imbalance sequence of a -tournament if and only if it is feasible.
The above result is equivalent to saying that a sequence of integers with is an imbalance sequence of a -tournament if and only if
for with equality when .
On arranging the imbalance sequence in nondecreasing order, we have the following observation.
Corollary 25.15 A sequence of integers with is an imbalance sequence of a -tournament if and only if
for , with equality when .
Various results for imbalances in different tournaments can be found in [127], [128], [223], [224].
A -tournament is a digraph in which multiarcs are permitted, and which has no loops [97]. If then a -tournament is an orientation of a simple multigraph and contains at most edges between the elements of any pair of distinct vertices. Let be a -tournament with vertex set , and let and respectively denote the outdegree and indegree of vertex . Define (or simply ) as imbalance of . Clearly, . The imbalance sequence of is formed by listing the vertex imbalances in nondecreasing order.
We remark that -digraphs are special cases of -digraphs containing at least and at most edges between the elements of any pair of vertices. Degree sequences of -tournaments have been studied by Mubayi, West, Will [187] and Pirzada, Naikoo and Shah [223].
Let and be distinct vertices in . If there are arcs directed from to and arcs directed from to , then we denote this by , where .
A double in is an induced directed subgraph with two vertices , and having the form , where , and , and is the number of arcs directed from to , and is the number of arcs directed from to . A triple in is an induced subgraph with tree vertices , , and having the form , where , and , , , and the meaning of , , , , , is similar to the meaning in the definition of doubles. An oriented triple in is an induced subdigraph with three vertices. An oriented triple is said to be transitive if it is of the form , or , or , or , or , otherwise it is intransitive. An -graph is said to be transitive if all its oriented triples are transitive. In particular, a triple in an -graph is transitive if every oriented triple of is transitive.
The following observation can be easily established and is analogue to Theorem 2.2 of Avery [12].
Lemma 25.16 (Avery 1991 [12]) If and are two -tournaments with same imbalance sequence, then can be transformed to by successively transforming (i) appropriate oriented triples in one of the following ways, either (a) by changing the intransitive oriented triple to a transitive oriented triple , which has the same imbalance sequence or vice versa, or (b) by changing the intransitive oriented triple to a transitive oriented triple , which has the same imbalance sequence or vice versa; or (ii) by changing a double to a double , which has the same imbalance sequence or vice versa.
The above observations lead to the following result.
Theorem 25.17 (Pirzada, Naikoo, Samee, Iványi 2010 [224]) Among all -tournaments with given imbalance sequence, those with the fewest arcs are transitive.
Proof. Let be an imbalance sequence and let be a realization of that is not transitive. Then contains an intransitive oriented triple. If it is of the form , it can be transformed by operation i(a) of Lemma 25.16 to a transitive oriented triple with the same imbalance sequence and three arcs fewer. If contains an intransitive oriented triple of the form , it can be transformed by operation i(b) of Lemma 25.16 to a transitive oriented triple same imbalance sequence but one arc fewer. In case contains both types of intransitive oriented triples, they can be transformed to transitive ones with certainly lesser arcs. If in there is a double , by operation (ii) of Lemma 25.16, it can be transformed to , with same imbalance sequence but two arcs fewer.
The next result gives necessary and sufficient conditions for a sequence of integers to be the imbalance sequence of some -tournament.
Theorem 25.18 (Pirzada, Naiko, Samee, Iványi [224]) A nondecreasing sequence of integers is an imbalance sequence of a -tournament if and only if
with equality when .
Proof. Necessity. A subtournament induced by vertices has a sum of imbalances at least .
Sufficiency. Assume that is a nonincreasing sequence of integers satisfying conditions (25.27) but is not the imbalance sequence of any -tournament. Let this sequence be chosen in such a way that is the smallest possible and is the least with that choice of . We consider the following two cases.
Case (i). Suppose equality in (25.27) holds for some , so that
for .
By minimality of , is the imbalance sequence of some -tournament with vertex set, say . Let . Consider
for , with equality when . Therefore, by the minimality for , the sequence forms the imbalance sequence of some -tournament with vertex set, say . Construct a new -tournament with vertex set as follows.
Let with, and the arc set containing those arcs which are in and . Then we obtain the -tournament with the imbalance sequence , which is a contradiction.
Case (ii). Suppose that the strict inequality holds in (25.27) for some , so that
for . Let , so that satisfies the conditions (25.27). Thus by the minimality of , the sequences is the imbalances sequence of some -tournament with vertex set, say . Let and . Since , there exists a vertex such that , or , or , or , and if these are changed to , or , or , or respectively, the result is a -tournament with imbalances sequence , which is again a contradiction. This proves the result.
Arranging the imbalance sequence in nonincreasing order, we have the following observation.
Corollary 25.19 (Pirzada, Naiko, Samee, Iványi [224]) A nondecreasing sequence of integers is an imbalance sequence of a -tournament if and only if
for , with equality when .
The converse of a -tournament is a -graph , obtained by reversing orientations of all arcs of . If with is the imbalance sequence of a -tournament , then is the imbalance sequence of .
The next result gives lower and upper bounds for the imbalance of a vertex in a -tournament .
Theorem 25.20 If is an imbalance sequence of a -tournament , then for each
Proof. Assume to the contrary that , so that for ,
That is,
Adding these inequalities, we get
which contradicts Theorem 25.18.
Therefore, .
The second inequality is dual to the first. In the converse -tournament with imbalance sequence we have, by the first inequality
Since , therefore
Hence, .
Now we obtain the following inequalities for imbalances in -tournament.
Theorem 25.21 (Pirzada, Naikoo, Samee, Iványi 2010 [224]) If is an imbalance sequence of a -tournament with , then
for , with equality when .
Proof. By Theorem 25.18, we have for , with equality when
implying
from where
and so we get the required
or
In Section 25.1 we defined the -supertournaments.
Now at first we present some results on the special case , that is on the hypertournaments.
Hypergraphs are generalizations of graphs [27], [28]. While edges of a graph are pairs of vertices of the graph, edges of a hypergraph are subsets of the vertex set, consisting of at least two vertices. An edge consisting of vertices is called a -edge. A -hypergraph is a hypergraph all of whose edges are -edges. A -hypertournament is a complete -hypergraph with each -edge endowed with an orientation, that is, a linear arrangement of the vertices contained in the hyperedge. Instead of scores of vertices in a tournament, Zhou et al. [294] considered scores and losing scores of vertices in a -hypertournament, and derived a result analogous to Landau's theorem [159]. The score or of a vertex is the number of arcs containing and in which is not the last element, and the losing score or of a vertex is the number of arcs containing and in which is the last element. The score sequence (losing score sequence) is formed by listing the scores (losing scores) in non-decreasing order.
The following characterizations of score sequences and losing score sequences in -hypertournaments can be found in G. Zhou et al. [295].
Theorem 25.22 (Zhou, Yang, Zhang [295]) Given two non-negative integers and with , a nondecreasing sequence of nonnegative integers is a losing score sequence of some -hypertournament if and only if for each ,
with equality when .
Proof. See [295].
Theorem 25.23 (Zhou, Yang, Zhang [295]) Given two positive integers and with , a nondecreasing sequence of nonnegative integers is a score sequence of some -hypertournament if and only if for each ,
with equality when .
Proof. See [295].
Some more results on -hypertournaments can be found in [37], [154], [214], [216], [294]. The analogous results of Theorem 25.22 and Theorem 25.23 for [ ]-bipartite hypertournaments can be found in [213] and for -tripartite hypertournaments can be found in [225].
Throughout this subsection takes values from 1 to and takes values from 1 to , unless otherwise is stated.
A -partite hypertournament is a generalization of -partite graphs (and -partite tournaments). Given non-negative integers and , with for each , an - -partite hypertournament (or briefly -partite hypertournament) of order consists of vertex sets with for each , together with an arc set , a set of tuples of vertices, with exactly vertices from , called arcs such that any subset of , contains exactly one of the -tuples whose entries belong to .
Let , with for each , , be an arc in and let , we let denote to be the new arc obtained from by interchanging and in . An arc containing vertices from for each , is called an ()-arc.
For a given vertex for each , and , the score (or simply ) is the number of -arcs containing and in which is not the last element. The losing score (or simply ) is the number of -arcs containing and in which is the last element. By arranging the losing scores of each vertex set separately in non-decreasing order, we get lists called losing score lists of and these are denoted by for each , . Similarly, by arranging the score lists of each vertex set separately in non-decreasing order, we get lists called score lists of which are denoted as for each . The following two theorems are the main results of this subsection.
Theorem 25.24 (Pirzada, Zhou, Iványi [231][Theorem 3]) Given nonnegative integers and nonnegative integers with for each , the nondecreasing lists of nonnegative integers are the losing score lists of a -partite hypertournament if and only if for each with ,
with equality when for each .
Theorem 25.25 (Pirzada, Zhou, Iványi [231][Theorem 4]) Given nonnegative integers and nonnegative integers with for each , the non-decreasing lists of non-negative integers are the score lists of a -partite hypertournament if and only if for each , with
with equality, when for each .
We note that in a -partite hypertournament , there are exactly arcs and in each arc only one vertex is at the last entry. Therefore,
In order to prove the above two theorems, we need the following Lemmas.
Lemma 25.26 (Pirzada, Zhou, Iványi [231][Lemma 5]) If is a -partite hypertournament of order with score lists for each , then
Proof. We have for each . If is the losing score of , then
The number of arcs containing for each , , and is
Thus,
Lemma 25.27 (Pirzada, Zhou, Iványi [231][Lemma 6]) If are losing score lists of a -partite hypertournament , then there exists some with
so that , () and , (), are losing score lists of some -partite hypertournament, is the largest integer such that .
Proof. Let be losing score lists of a -partite hypertournament with vertex sets so that for each (, ).
Let be the smallest integer such that
and be the largest integer such that
Now, let
, and , , .
Clearly, and are both in non-decreasing order.
Since , there is at least one -arc containing both and with as the last element in , let . Clearly, , and for each , are the losing score lists of .
The next observation follows from Lemma 25.26, and the proof can be easily established.
Lemma 25.28 (Pirzada, Zhou, Iványi [231][Lemma 7]) Let , be nondecreasing sequences of nonnegative integers satisfying (25.43). If , then there exist and (), such that , and , (), satisfy (25.43).
Proof of Theorem 25.24. Necessity. Let , be the losing score lists of a -partite hypertournament . For any with , let be the sets of vertices such that for each , . Let be the -partite hypertournament formed by for each .
Then,
Sufficiency. We induct on , keeping fixed. For , the result is obviously true. So, let , and similarly . Now,
We consider the following two cases.
Case 1. .
Then,
By induction hypothesis , are losing score lists of a -partite hypertournament of order . Construct a -partite hypertournament of order as follows. In , let for each , . Adding a new vertex to , for each -tuple containing , arrange on the last entry. Denote to be the set of all these -tuples. Let . Clearly, for each , are the losing score lists of .
Case 2. .
Applying Lemma 25.26 repeatedly on and keeping each , fixed until we get a new non-decreasing list in which now . By Case 1, , are the losing score lists of a -partite hypertournament. Now, apply Lemma 25.26 on , repeatedly until we obtain the initial non-decreasing lists for each . Then by Lemma 25.27, for each are the losing score lists of a -partite hypertournament.
Proof of Theorem 25.25. Let be the score lists of a -partite hypertournament , where with , for each , .
Clearly,
, .
Let , .
Then are the losing score lists of . Conversely, if for each are the losing score lists of , then for each , are the score lists of . Thus, it is enough to show that conditions (25.43) and (25.44) are equivalent provided
for each .
First assume (25.44) holds. Then,
with equality when for each .
Thus (1) holds.
Now, when (25.43) holds, using a similar argument as above, we can show that (25.44) holds. This completes the proof.
The majority of the results on hypertournaments can be extended to supertournaments.
The simplest case when all individual tournaments have own input sequence , where . Then we can apply the necessary and sufficient conditions and algorithms of the previous sections.
If all tournaments have a join input sequence , then all the previous necessary conditions remain valid.
The football tournaments are special incomplete -tournaments, where the set of the permitted results is .
In this section we describe eight properties of football sequences. These properties serve as necessary conditions for a given sequence to be a football sequence.
Definition 25.29 A football tournament is a directed graph (on vertices) in which the elements of every pair of vertices are connected either with 3 arcs directed in identical manner or with 2 arcs directed in different manner. A nondecreasingly ordered sequence of any is called football sequence
The -th vertex will be called -th team and will be denoted by . For the computations we represent a tournament with , what is an sized matrix, in which means the number of points received by in the match against . The elements , that is the elements in the main diagonal of are equal to zero. Let's underline, that the permitted elements are 0, 1 or 3, so .
The vector of the outdegrees of a tournament is called score vector. Usually we suppose that the score vector is nondecreasingly sorted. The sorted score vector is called score sequence and is denoted by . The number of football sequences for teams is denoted by . The values of are known for [155].
In this subsection at first we describe six algorithms which require time in worst case, then more complicate algorithms follow.
In this subsection we introduce relatively simple algorithms Boundness-Test, Monotonity-Test, Intervallum-Test, Loss-Test, Draw-Loss-test, Victory-Test, Strong-Test
, and Sport-Test
.
Testing of boundness
Since every team plays matches and receives at least and at most points in each match, therefore in a football sequence it holds for .
Definition 25.30 A sequence of integers will be called -bounded (shortly: bounded), if and only if
Lemma 25.31 (Lucz, Iványi, Sótér [168]) Every football sequence is a bounded sequence.
Proof. The lemma is a direct consequence of Definition 25.29.
The following algorithm executes the corresponding test. Sorting of the elements of is not necessary. We allow negative numbers in the input since later testing algorithm Decomposition
can produce such input for Boundness-Test
.
Input. : the number of teams ;
: arbitrary sequence of integer numbers.
Output. : a logical variable. Its value is , if the input vector is bounded, and otherwise.
Working variable. : cycle variable.
Boundness-Test
1FOR
TO
2IF
or 3 4RETURN
5 6RETURN
In worst case Boundness-Test
runs time, in expected case runs in time. More precisely the algorithm executes comparisons in worst case and asymptotically in average 2 comparisons in best case.
Testing of monotonity
Monotonity is also a natural property of football sequences.
Definition 25.32 A bounded sequence of nonnegative integers will be called -monotone (shortly: monotone), if and only if
Lemma 25.33 (Lucz, Iványi, Sótér [168]) Every football sequence is a monotone sequence.
Proof. This lemma also is a direct consequence of Definition 25.29.
The following algorithm executes the corresponding test. Sorting of the elements of is not necessary.
Input. : the number of players ;
: a bounded sequence of length .
Output. : a logical variable. Its value is , if the input vector is monotone, and otherwise.
Working variable. : cycle variable.
Monotonity-Test
1FOR
TO
2IF
3 4RETURN
5 6RETURN
In worst case Monotonity-Test
runs time, in expected case runs in time. More precisely the algorithm executes comparisons in worst case.
The following lemma gives the numbers of bounded and monotone sequences. Let denote the set of -bounded, and the set of -monotone sequences, the size of and the size of .
and
Proof. (25.56) is implied by the fact that an -bounded sequence contains elements and these elements have different possible values.
To show (25.57) let be a monotone sequence and let , where . Then . The mapping is a bijection and so equals to the number of ways of choosing numbers from , resulting (25.57).
Testing of the intervallum property
The following definition exploits the basic idea of Landau's theorem [159].
Definition 25.35 A monotone nonincreasing sequence is called intervallum type (shortly: intervallum), if and only if
Lemma 25.36 Every football sequence is intervallum sequence.
Proof. The left inequality follows from the fact, that the teams play matches and they get together at least two points in each matches.
The right inequality follows from the monotonity of and from the fact, that the teams play matches and get at most 3 points in each match.
The following algorithm Intervallum-Test
tests whether a monotone input is intervallum type.
Input. : the number of teams ;
: a bounded sequence of length .
Output. : a logical variable. Its value is , if the input vector is intervallum type, and otherwise.
Working variables. : cycle variable;
: binomial coefficients;
: initial value for the sum of the input data;
: the sum of the smallest input data.
We consider and as global variables, and therefore they are used later without new calculations. The number of -intervallum sequences will be denoted by .
Intervallum-Test
1 2FOR
TO
3 4 5IF
or 6 7RETURN
8 9RETURN
In worst case Intervallum-Test
runs time. More precisely the algorithm executes comparisons, additions, extractions, multiplications and 2 assignments in worst case. The number of the -intervallum sequences will be denoted by .
Testing of the loss property
The following test is based on Theorem 3 of [127][page 86]. The basis idea behind the theorem is the observation that if the sum of the smallest scores is less than , then the teams have lost at least points in the matches among others. Let and .
Definition 25.37 An intervallum satisfying sequence is called loss satisfying, iff
Lemma 25.38 (Lucz, Iványi, Sótér [168]) A football sequence is loss satisfying.
Proof. See the proof of Theorem 3 in [127].
The following algorithm Loss-Test
exploits Lemma 25.38.
Input. : the number of teams ;
: a bounded sequence of length .
Output. : a logical variable. Its value is , if the input vector is Landau type, and otherwise.
Working variables. : cycle variable;
: vector of the loss coefficient;
: sums of the input values, global variables;
: binomial coefficients, global variables.
Loss-Test
1 2FOR
TO
3 4IF
5 6RETURN
7 8RETURN
In worst case Loss-Test
runs in time, in best case in time. We remark that is in the following a global variable. The number of loss satisfying sequences will be denoted by .
Testing of the draw-loss property
In the previous subsection Loss-Test
exploited the fact, that small scores signalize draws, allowing the improvement of the upper bound of the sum of the scores.
Let us consider the loss sequence . made a draw, therefore one point is lost and so must hold implying that the sequence is not a football sequence. This example is exploited in the following definition and lemma. Let
Definition 25.39 A loss satisfying sequence is called draw loss satisfying, if and only if
Lemma 25.40 (Lucz, Iványi, Sótér [168]) A football sequence is draw loss satisfying.
Proof. The assertion follows from the fact that small scores and remainders (mod 3) of the scores both signalize lost points and so decrease the upper bound .
The following algorithm Draw-Loss-Test
exploits Lemma 25.38.
Input. : the number of teams ;
: a loss satisfying sequence of length .
Output. : a logical variable. Its value is , if the input vector is Landau type, and otherwise.
Working variables. : cycle variable;
: global variables;
: modified loss coefficients.
Draw-Loss-Test
1 2FOR
TO
3 4IF
5 6RETURN
7 8RETURN
In worst case Draw-Loss-Test
runs in time, in best case in time.
We remark that is in the following a global variable.
The number of draw loss satisfying sequences will be denoted by .
Testing of the victory property
In any football tournament matches end with victory and end with draw.
Definition 25.41 A loss satisfying (shortly: loss) sequence is called victory satisfying, iff
Lemma 25.42 (Lucz, Iványi, Sótér [168]) A football sequence is victory satisfying.
Proof. Team could win at most times. The left side of (25.62) is an upper bound for the number of possible wins, therefore it has to be greater or equal then the exact number of wins in the tournament.
The following algorithm Victory-Test
exploits Lemma 25.42.
Input. : the number of teams ;
: a loss sequence of length .
Output. : a logical variable. Its value is , if the input vector is Landau type, and otherwise.
Working variables. : cycle variable;
: where is an upper estimation of the number of possible wins of .
: global variables.
Victory-Test
1 2FOR
TO
3 4IF
5 6RETURN
7 8RETURN
Victory-Test
runs in time in all cases. The number of the victory satisfying sequences is denoted by .
Victory-Test
is successful e.g. for the input sequence , but until now we could not find such draw loss sequence, which is not victory sequence. The opposite assertion is also true. Maybe that the sets of victory and draw loss sequences are equivalent?
Testing of the strong draw-loss property
In paragraph “Testing of the draw-loss property” we estimated the loss caused by the draws in a simple way: supposed that every draw implies half point of loss. Especially for short sequences is useful a more precise estimation.
Let us consider the sequence . The sum of the remainders (mod 3) is 2 + 1 = 3, but we have to convert to draws at least three “packs” (3 points), if we wish to pair the necessary draws, and so at least six points are lost, permitting at most .
Exploiting this observation we can sharp a bit Lemma 25.40. There are the following useful cases:
one small remainder (1 pont) implies the loss of points;
one large remainder (2 points) implies the loss of points;
one small and one large remainder imply the loss of points;
two large remainders imply the loss of points;
one small and two large remainders imply the loss of = 4 points.
According to this remarks let resp. denote the multiplicity of the equality (mod 3) resp. (mod 3).
Definition 25.43 A victory satisfying sequence is called strong, iff
Lemma 25.44 (Lucz, Iványi, Sótér [168]) Every football sequence is strong.
Proof. The assertion follows from the fact that any point matrix of a football tournament order the draws into pairs.
The following algorithm Strong-Test
exploits Lemma 25.44.
Input. : the number of teams ;
: a loss satisfying sequence of length
Output. : a logical variable. Its value is , if the input vector is Landau type, and otherwise.
Working variables. : cycle variable;
: modified loss coefficients, global variables; : sum of the elements of the sequence , global variable;
: strongly modified loss coefficients.
Strong-Test
1 2FOR
TO
3IF
(mod 3) 4 5IF
(mod 3) 6 7 8IF
and 9 10IF
and 11 12IF
and 13 14IF
and 15 16IF
and 17 18IF
19 20RETURN
21 22RETURN
Strong-Test
runs in all cases in time.
We remark that is in the following a global variable.
The number of strong sequences will be denoted by .
Testing of the sport property
One of the typical form to represent a football tournament is its point matrix as it was shown in Figure 25.3.
Definition 25.45 A victory satisfying sequence is called sport sequence if it can be transformed into a sport matrix.
Lemma 25.46 (Lucz, Iványi, Sótér [168]) Every football sequence is a sport sequence.
Proof. This assertion is a consequence of the definition of the football sequences.
If a loss sequence can be realized as a sport matrix, then the following algorithm Sport-Test
constructs one of the sport matrices belonging to .
If the team has points, then it has at least draws, wins and losses. These results are called obligatory wins, draws, resp. losses. Sport-test
starts its work with the computation of and . Then it tries to distribute the remaining draws.
Input. : the number of players ;
: a victory satisfying sequence of length .
Output. : a logical variable. Its value is , if the input sequence is sport sequence, and otherwise;
Working variables. : cycle variable;
: columns of the sport matrix;
: sum of the numbers of obligatory wins, draws, resp. losses;
: global variables;
: the sum of the elements of the input sequence;
: the exact number of wins, draws, resp. losses.
Sport-Test
1 2FOR
TO
3 4 5 6 7 8 9 10IF
or 11 12RETURN
13 14 15FOR
TO
16WHILE
or or 17 18 19 20 21 22 23 24IF
25 26RETURN
27IF
or or 28 29RETURN
30 31RETURN
Sport-Test
runs in time in all cases. The number of the sport sequences is denoted by .
Concrete examples
Let us consider short input sequences illustrating the power of the linear testing algorithms.
If , then according to Lemma 25.34 we have and . The monotone sequences are , , , , , , , , , . Among the monotone sequences there are 4 interval sequences: , , , and , so . Loss-Test
does not help, therefore . Victory-Test
excludes , so . Finally Sport-Test
can not construct a sport matrix for and so it concludes . After further unsuccessful tests Football
reconstructs and , proving .
If , then according to Lemma 25.34 we have and . Among the 84 monotone sequence there are 27 interval sequences, and these sequences at the same time have also the loss property, so . These sequences are the following: , , , , , , , , , , , , , , , , , , , , , , , , , and . From these sequences only , , , , , , are paired sport sequences, so . The following tests are unsuccessful, but Football
reconstructs the remained seven sequences, therefore .
If , then according to Lemma 25.34 we have and . The number of paired sport sequences is . We now that , so our linear algorithms evaluate the input sequences correctly up to .
If , then , but our approximating algorithms end with , that is the quality of 5 sequences remains open.
Earlier we used a greedy approach to check whether the necessary number of draws is allocatable.
Definition 25.47 A sequence is called potential -draw sequence. The number of potential -draw sequences is denoted by .
Lemma 25.48 (Iványi, Lucz, Sótér [130]) If , then .
Proof. The proof is similar to the proof of Lemma 25.34.
Let us suppose we get a potential draw sequence. In this subsection we describe the testing algorithms Quick-Havel-Hakimi
and Linear-Erdős-Gallai
.
Quick Havel-Hakimi algorithm
Algorithm Quick-Havel-Hakimi-Test
is based on the following classical theorem [100], [109], [166].
Theorem 25.49 (Havel [109], Hakimi [100]) If , then a nonincreasing sequence of positive integers is the outdegree sequence of a simple graph if and only if is the outdegree sequence of some simple graph .
If is for example a complete simple graph, then it contains edges and the direct application of Havel-Hakimi theorem requires time. We make an attempt to decide in linear time the pairability of a sequence of positive integers.
The first simple observation is the necessity of the condition for all . We have not to test this property since all our draw allocation algorithms guarantee its fulfilment. Another interesting condition is
Lemma 25.50 (Iványi, Lucz, Sótér [130]) If a nonincreasing sequence of positive integers is the outdegree sequence of a simple graph , then
and
Proof. The draw request of the teams must be covered by inner and outer draws. The first sum on the right side gives the exact number of usable outer draws, while the sum of the right side gives the exact number of the reachable inner draws. The minimum on the left side represent an upper bound of the possible inner draws.
If we substitute this upper bound with the precise value, then our formula becomes a sufficient condition, but the computation of this value by Havel-Hakimi theorem is dangerous for the linearity of the method.
Let's take a few example. If , then we have only one potential draw-sequence, which is accepted by Havel-Hakimi algorithm and satisfies (25.64) and (25.65).
If , then there are potential draw sequence: (2,2,2), (2,2,1), (2,1,1) and (1,1,1). From these sequences Havel-Hakimi algorithm and the conditions of Lemma 25.48 both accept only (2,2,2) and (1,1,1).
If , then there are potential draw sequences. Havel-Hakimi algorithm and the conditions of Lemma 25.48 both accept the following 7: (3,3,3,3), (3,3,2,2), (3,2,2,1), (3,1,1,1), (2,2,2), (2,2,1,1), and (1,1,1,1).
If , then there are potential draw sequences. The methods are here also equivalent.
From one side we try to find an example for different decisions or try to find an exact proof of the equivalence of these algorithms.
Linear Erdős-Gallai algorithm
For given nondecreasing sequence of nonnegative integers the first elements of the sequence is called the head of the sequence and last elements are called the tail belonging to the th element of the sequence. The sum of the elements of the head is denoted by , while the sum of the element of the tail is denoted by . The sum is denoted by and is called the capacity of the tail belonging to . If is even, then is called even, otherwise the sequence is called odd sequence.
Another classical theorem on the testing of the potential draw sequences whether they are graphical is the theorem proved by Erdős and Gallai in 1960 [71].
Theorem 25.51 (Erdős, Gallai, [71]) If , the -regular sequence is graphical if and only if
and
Proof. See [52], [71], [253], [275].
Recently we could improve this theorem [130]. The algorithm Erdős-Gallai-Linear
exploits, that is monoton. It determines the capacities in constant time. The base of the quick computation is thesequence containing pointers. For given sequence let , where points to the element of having the maximal index among such elements of which are greater or equal with .
Theorem 25.52 (Iványi, Lucz, Sótér [130]) If , the -regular sequence is graphical if and only if
and if , then
further if , then
Proof. (25.68) is the same as (25.66).
During the testing of the elements of by Erdős-Gallai-Linear
there are two cases:
if , then the contribution of the tail of equals to , since the contribution of the element is only .
if , then the contribution of the tail of consists of two parts: equal to , while for .
Therefore in the case we have
and in the case
The following program is based on Theorem 25.52. It decides on arbitrary -regular sequence whether it is graphicakl or not.
Input. : number of vertices ;
: -regular sequence.
Output. : logical variable, whose value is , if the input is graphical, and it is , if the input is not graphical.
Work variables. and : cycle variables;
: is the sum of the first elements of the tested ;
: is the maximum of the indices of such elements of , which are not smaller than ; : help variable to compute of the other elenments of the sequence ;
: help variable to compute the elements of the sequence .
Linear-Erdős-Gallai
1 Line 01: initialization 2FOR
TO
Lines 02–03: computation of the elements of 3 4IF
odd Lines 04–06: test of the parity 5 6RETURN
7 Line 07: initialization of 8FOR
DOWNTO
Lines 08–09: setting of some pointers 9 10FOR
TO
Lines 10–14: calculation of the pointers 11IF
12FOR
DOWNTO
13 14 15FOR
DOWNTO
Lines 15–16: setting of some pointers 16 17FOR
TO
Lines 17–23: test of 18IF
and 19 20RETURN
21IF
and 22 23RETURN
24 Lines 24–25: the program ends with value 25RETURN
Theorem 25.53 (Iványi, Lucz [129], Iványi, Lucz, Sótér [130]) Algorithm Linear-Erdős-Gallai
decides in time, whether an -regular sequence is graphical or not.
Proof. Line 1 requires time, lines 2–3 time, lines 4–6 time, line 07 time, line 08–09 time, lines 10–16 time, lines 17–23 time and lines 24–25 time, therefore the total time requirement of the algorithm is .
Until now the number of good sequences was known only for , see Online Encyclopedia of Integer Sequences [256]. Using the new and quick algorithms we determined for too [129], [130], [131]. Figure 25.6 contains the number of good sequences for , and also for .
Figure 25.6. Number of binomial, head halfing and good sequences, further the ratio of the numbers of good sequences for neighbouring values of .
Testing of the pairing sport property at cautious allocation of the draws
Sport-Test
investigated, whether the scores allow to include draws into the sport matrix.
Let us consider the sport sequence . In a unique way we get the sport matrix
Here has no partners to make two draws, therefore is not a football sequence. Using the Havel-Hakimi algorithm [100], [109], [166] we can try to pair the draws of any sport matrix. If we received the sport matrix in a unique way, and Havel-Hakimi algorithms can not pair the draws, then the investigated sequence is not a football sequence.
Now consider the football sequence , which is the result of a tournament of 7 weak, 14 medium and 7 strong teams. the weak player play draws among themselves and loss against the medium and strong teams. The medium teams form a transitive subtournament and loss against the strong teams. The strong teams play draws among themselves. We perturbate this simple structure: one of the weak teams wins against the best medium team instead of to lost the match. There are 42 draws in the tournament, therefore the sum of the multiplicities of the sport matrix has to be 84. A uniform distribution results for all determining the sport matrix in a unique way.
Let us consider the matches in the subtournament of . This subtournament consists of 21 matches, from which at most can end with draw, therefore at least matches have a winner, resulting at least inner points. But the seven teams have only points signalizing that that the given sport matrix is not a football matrix.
In this case the concept of inner draws offers a solution. Since and , the teams made at least 9 draws among themselves. “Cautious” distribution results a draw sequence , which can be paired easily. Then we can observe that , while , so the teams have to made at least 18 draws. Cautious distribution results a draw sequence . Havel-Hakimi algorithm finishes the pairing with the draw sequence (2,2), so 2 draws remain unpaired. If we assign a further draw pack to this subtournament, then the uniform distribution results the draw sequence consisting of 13 draw packs instead of 12. Since is an odd number, this draw sequence is unpairable—the subtournament needs at least one outer draw.
The reconstruction begins with the study of the inner draws. Let us consider the following sequence of length 28: . This is the score sequence of a tournament, consisting of seven weak, 14 medium and 7 strong teams. The weak teams play only draws among themselves, the medium teams win against the weak teams and form a transitive subtournament among themselves, the strong teams win against the weak and medium teams and play only draws among themselves. Here a good management of obligatory draws is necessary for the successful reconstruction.
In general the testing of the realizabilty of the draw sequence of a sport matrix is equivalent with the problem to decide on a given sequence of nonnegative integers whether there exists a simple nondirected graph whose degree sequence is .
Let us consider the following example: . This is the score sequence of a tournament of 4 “week”, 8 “medium” and 4 “strong” teams. The week teams and also the strong teams play only draws among themselves. The medium teams win against the weak ones and the strong teams win against the medium ones. wins again , wins against , wins against , and wins against , and the remaining matches among weak and strong teams end with draw.
In this case the 16 teams play 120 matches, therefore the sum of the scores has to be between 240 and 360. In the given case the sum is 336, therefore the point matrix has to contain 96 wins and 24 draws. So at uniform distribution of draws every team gets exactly one draw pack.
How to reconstruct this sequence? At a uniform distribution of the draw packs we have to guarantee the draws among the weak teams. The original results imply nonuniform distribution of the draws but it seems not an easy task to find a quick and successful method for a nonuniform distribution.
Exercises
25.9-1 How many monoton sequence is possible in a football tournament with 5 teams?
25.9-2 How many sportmatrix is possible in a football tournament with 5 teams?
PROBLEMS |
25-1
FOOTBALL SCORE SEQUENCES
Design a program generating all football sequences of length 6. Implement your program.
25-2
DRAW SEQUENCES
Design a program which generates all draw sequences of a tournament with teams.
CHAPTER NOTES |
A nondecreasing sequence of nonnegative integers is a score sequence of a -tournament, if and only if the sum of the elements of equals to and the sum of the first elements of is at least [159].
is a score sequence of a -tournament, iff the sum of the elements of equals to , and the sum of the first elements of is at least [144], [183].
is a score sequence of an -tournament, if and only if (25.17) holds [127].
In all 3 cases the decision whether is digraphical requires only linear time.
In this paper the results of [127] are extended proving that for any there exists an optimal minimax realization , that is a tournament having as its outdegree sequence and maximal and minimal in the set of all realization of . There are further papers on imbalances in different graphs [140], [187], [214], [215], [212], [246].
Many efforts was made to enumerate the different types of degree and score sequences and connected with them sequences, e.g. by Ascher [11], Barnes and Savage [18], [19], Hirschhorn and Sellers [112], Iványi, Lucz and Sótér [131], [130], Metropolis [176], Rodseth, Sellers and Tverberg [241], Simion [254], Sloane and Plouffe [255], [256], [259], [258], [257].
You can find many references in [127], [128], [130].
Acknowledgement. The author thanks András Frank (Eötvös Loránd University) for valuable advices concerning the application of flow theory, further Péter L. Erdős (Alfréd Rényi Institute of Mathematics of HAS) and Zoltán Kása (Sapientia Hungarian University of Transylvania) for the consultation.
The research of the first author was supported by the European Union and the European Social Fund under the grant agreement no. TÁMOP 4.2.1/B-09/1/KMR-2010-0003.
Table of Contents
The complexity of words is a continuously growing field of the combinatorics of words. Hundreds of papers are devoted to different kind of complexities. We try to present in this chapter far from being exhaustive the basic notions and results for finite and infinite words.
First of all we summarize the simple (classical) complexity measures, giving formulas and algorithms for several cases. After this, generalized complexities are treated, with different type of algorithms. We finish this chapter by presenting the palindrome complexity.
Finally, references from a rich bibliography are given.
In this section simple (classical) complexities, as measures of the diversity of the subwords in finite and infinite words, are discussed. First, we present some useful notions related to the finite and infinite words with examples. Word graphs, which play an important role in understanding and obtaining the complexity, are presented in detail with pertinent examples. After this, the subword complexity (as number of subwords), with related notions, is expansively presented.
Let be a finite, nonempty set, called alphabet. Its elements are called letters or symbols. A string , formed by (not necessary different) elements of , is a word. The length of the word is , and is denoted by . The word without any element is the empty word, denoted by (sometimes ). The set of all finite words over is denoted by . We will use the following notations too:
that is is the set of all finite and nonempty words over , whilst is the set of all words of length over . Obviously . The sets and are infinite denumerable sets.
We define in the binary operation called concatenation (shortly catenation). If and , then
This binary operation is associative, but not commutative. Its neutral element is because . The set with this neutral element is a monoid. We introduce recursively the power of a word:
, if .
A word is primitive if it is no power of any word, so is primitive if
For example, is a primitive word, whilst is not.
The word is periodic if there is a value , such that
and is the period of . The least such is the least period of .
The word is periodic with the least period .
Let us denote by the greatest common divisor of the naturals and . The following result is obvious.
Theorem 26.1 If is periodic, and and are periods, then is a period too, provided .
The reversal (or mirror image) of the word is . Obviously . If , then is a palindrome.
The word is a subword (or factor) of if there exist the words and such that . If , then is a proper subword of . If , then is a prefix of , and if , then is a suffix of . The set of all subwords of length of is denoted by . is the set of nonempty subwords of , so
For example, if , then
,
.
The words and are equal, if
and
, for .
Theorem 26.2 (Fine–Wilf) If and are words of length , respective , and if there are the natural numbers and , such that and have a common prefix of length , then and are powers of the same word.
The value in the theorem is tight. This can be illustrated by the following example. Here the words and have a common prefix of length , but and are not powers of the same word.
, , ,
, , .
By the theorem a common prefix of length 7 would ensure that and are powers of the same word. We can see that and have a common prefix of length 6 (), but and are not powers of the same word, so the length of the common prefix given by the theorem is tight.
Beside the finite words we consider infinite (more precisely infinite at right) words too:
The set of infinite words over the alphabet is denoted by . If we will study together finite and infinite words the following notation will be useful:
The notions as subwords, prefixes, suffixes can be defined similarly for infinite words too. The word is a subword of if there are the words , , such that . If , then is a prefix of , whilst is a suffix of . Here also represents the set of all subwords of length of .
Examples of infinite words over a binary alphabet:
1) The power word is defined as:
It can be seen that
,
The Champernowne word is obtained by writing in binary representation the natural numbers :
It can be seen that
,
3) The finite Fibonacci words can be defined recursively as:
,
, if .
From this definition we obtain:
,
,
,
,
,
,
.
The infinite Fibonacci word can be defined as the limit of the sequence of finite Fibonacci words:
The subwords of this word are:
,
.
The name of Fibonacci words stems from the Fibonacci numbers, because the length of finite Fibonacci words is related to the Fibonacci numbers: , i.e. the length of the th finite Fibonacci word is equal to the th Fibonacci number.
The infinite Fibonacci word has a lot of interesting properties. For example, from the definition, we can see that it cannot contain the subword 11. The number of 1's in a word will be denoted by . An infinite word is balanced, if for arbitrary subwords and of the same length, we have , i.e.
Theorem 26.3 The infinite Fibonacci word is balanced.
If word is concatenated by itself infinitely, then the result is denoted by .
The infinite word is periodic, if there is a finite word , such that . This is a generalization of the finite case periodicity. The infinite word is ultimately periodic, if there are the words and , such that .
The infinite Fibonacci word can be generated by a (homo)morphism too. Let us define this morphism:
Based on this definition, the function can be defined on letters only. A morphism can be extended for infinite words too:
The finite Fibonacci word can be generated by the following morphism:
In this case we have the following theorem.
Proof. The proof is by induction. Obviously . Let us presume that for all . Because
,
by the induction hypothesis
.
From this we obtain:
The infinite Fibonacci word is the fixed point of the morphism .
Let be a set of words of length over , and . We define a digraph, whose vertices are from , and whose arcs from . There is an arc from the vertex to the vertex if
that is the last letters in the first word are identical to the first letters in the second word. This arc is labelled by (or ).
If and , where , then the graph is called De Bruijn graph, denoted by .
Figures 26.1 and 26.2 illustrate De Bruijn graphs and .
To a walk in the De Bruijn graph we attach the label , which is obtained by maximum overlap of the vertices of the walk.
Footnote. In a graph a walk is a sequence of neighbouring edges (or arcs with the same orientation). If the edges or arcs of the walk are all different the walk is called trail, and when all vertices are different, the walk is a path.
In Figure26.1 in the graph the label attached to the walk (which is a path) is . The word attached to a Hamiltonian path (which contains all vertices of the graph) in the graph is an -type De Bruijn word. For example, words and are -type De Bruijn word. An -type De Bruijn word contains all words of length .
A connected digraph is Eulerian if the in-degree of each vertex is equal to its out-degree.
Footnote. A digraph (oriented graph) is connected if between every pair of vertices there is an oriented path at least in a direction.
Footnote. A digraph is Eulerian if it contains a closed oriented trail with all arcs of the graph.
Footnote. In-degree (out-degree) of a vertex is the number of arcs which enter (leave) this vertex.
Theorem 26.7 The De Bruijn graph is Eulerian.
Proof. a) The graph is connected because between all pair of vertices and there is an oriented path. For vertex there are leaving arcs, which enter vertices whose first letters are , and the last letters in this words are all different. Therefore, there is the path , .
b) There are incoming arcs to vertex from vertices , where ( is the alphabet of the graph, i.e. ). The arcs leaving vertex enter vertices , where . Therefore, the graph is Eulerian, because the in-degree and out-degree of each vertex are equal.
From this the following theorem is simply obtained.
Theorem 26.8 An oriented Eulerian trail of the graph (which contains all arcs of graph) is a Hamiltonian path in the graph , preserving the order.
For example, in the sequence 000, 001, 010, 101, 011, 111, 110, 100 of arcs is an Eulerian trail. At the same time these words are vertices of a Hamiltonian path in .
Generating De Bruijn words is a common task with respectable number of algorithms. We present here the well-known Martin algorithm. Let be an alphabet. Our goal is to generate an -type De Bruijn word over the alphabet .
We begin the algorithm with the word , and add at its right end the letter with the greatest possible subscript, such that the suffix of length of the obtained word does not duplicate a previously occurring subword of length . Repeat this until such a prolongation is impossible. When we cannot continue, a De Bruijn word is obtained, with the length . In the following detailed algorithm, is the -letters alphabet, and represents the result, an -type De Bruijn word.
Martin(
)
1FOR
TO
2DO
3 4REPEAT
5TRUE
6 7WHILE
8DO
IF
Not a subword. 9THEN
10 11FALSE
12EXIT WHILE
13ELSE
14UNTIL
15RETURN
.
Because this algorithm generates all letters of a De Bruijn word of length , and and are independent, its time complexity is . The more precise characterization of the running time depends on the implementation of line 8. The rEPEAT
statement is executed times. The wHILE
statement is executed at most times for each step of the rEPEAT
. The test can be made in the worst case in steps. So, the total number of steps is not greater than , resulting a worst case bound . If we use Knuth-Morris-Pratt string mathching algorithm, then the worst case running time is .
In chapter 29 a more efficient implementation of the idea of Martin is presented.
Based on this algorithm the following theorem can be stated.
Theorem 26.9 An -type De Bruijn word is the shortest possible among all words containing all words of length over an alphabet with letters.
To generate all ()-type De Bruijn words the following recursive algorithm is given. Here is also an alphabet with letters, and represents an -type De Bruijn word. The algorithm is called for each position with .
All-De-Bruijn(
)
1FOR
TO
2DO
3IF
Not a subword. 4THEN
All-De-Bruijn(
)
5ELSE
IF
6THEN
print A De Bruijn word. 7EXIT FOR
The call of the procedure:
FOR
TO
DO
ALL-DE-BRUIJN
.
This algorithm naturally is exponential.
In following, related to the De Bruijn graphs, the so-called De Bruijn trees will play an important role.
A De Bruijn tree with the root is a -ary tree defined recursively as follows:
i. The word of length over the alphabet is the root of .
ii. If is a leaf in the tree , then each word of the form , , will be a descendent of , if in the path from root to the vertex does not appears.
iii. The rule ii is applied as many as it can.
In Figure 26.3 the De Bruijn tree is given.
If the word is infinite, and , , then the corresponding word graph is called Rauzy graph (or subword graph). Figure 26.4 illustrates the Rauzy graphs of the infinite Fibonacci word for and . As we have seen, the infinite Fibonacci word is
and , , .
In the case of the power word , where , , the corresponding Rauzy graphs are given in Figure 26.5.
As we can see in Fig, 26.4 and 26.5 there are subwords of length which can be continued only in a single way (by adding a letter), and there are subwords which can be continued in two different ways (by adding two different letters). These latter subwords are called special subwords. A subword is a right special subword, if there are at least two different letters , such that . Similarly, is left special subword, if there are at least two different letters , such that . A subword is bispecial, if at the same time is right and left special. For example, the special subwords in Figures 26.4 and 26.5) are:
left special subwords: 010, 0100 (Figure 26.4),
110, 000, 111, 1110, 0001, 1111, 0011 (Figure 26.5),
right special subwords: 010, 0010 (Figure 26.4),
011, 000, 111, 0111, 1111, 0011 (Figure 26.5)
bispecial subwords: 010 (Figure 26.4),
000, 111, 1111, 0011 (Figure 26.5).
The complexity of words measures the diversity of the subwords of a word. In this regard the word has smaller complexity then the word .
We define the following complexities for a word.
1) The subword complexity or simply the complexity of a word assigns to each the number of different subwords of length . For a word the number of different subwords of length is denoted by .
If the word is finite, then , if .
2) The maximal complexity is considered only for finite words.
If is an infinite word, then is the lower maximal complexity, respectively the upper maximal complexity.
3) The global maximal complexity is defined on the set :
4) The total complexity for a finite word is the number of all different nonempty subwords
For an infinite word is the lower total complexity, and is the upper total complexity:
Footnote. Sometimes the empty subword is considered too. In this case the value of total complexity is increased by 1.
5) A decomposition is called a factorization of . If each (with the possible exception of ) is the shortest prefix of which does not occur before in , then this factorization is called the Lempel-Ziv factorization. The number of subwords in such a factorization is the Lempel-Ziv factorization complexity of . For example for the word the Lempel-Ziv factorization is: . So, the Lempel-Ziv factorization complexity of is .
6) If in a factorization each is the longest possible palindrome, then the factorization is called a palindromic factorization, and the number of subwords in this is the palindromic factorization complexity. For , so the palindromic factorization complexity of is .
7) The window complexity is defined for infinite words only. For the window complexity is
As we have seen
, if .
For example, in the case of :
In Theorem 26.4 was stated that for the infinite Fibonacci word:
In the case of the power word the complexity is:
This can be proved if we determine the difference , which is equal to the number of words of length which can be continued in two different ways to obtain words of length . In two different ways can be extended only the words of the form (it can be followed by 1 always, and by 0 when ) and (it can be followed by 0 always, and by 1 when ). Considering separately the cases when is odd and even, we can see that:
and from this we get
In the case of the Champernowne word
the complexity is .
Theorem 26.10 If for the infinite word there exists an such that , then is ultimately periodic.
Proof. , otherwise the word is trivial (contains just equal letters). Therefore there is a , such that . But
It follows that each subword has only one extension to obtain . So, if , then . Because is a finite set, and is infinite, there are and (), for which , but in this case is true too. Then from we obtain the following equality results: , therefore is true for all values. Therefore is ultimately periodic.
A word is Sturmian, if for all .
Sturmian words are the least complexity infinite and non periodic words. The infinite Fibonacci word is Sturmian. Because , the Sturmian words are two-letters words.
From the Theorem 26.10 it follows that each infinite and not ultimately periodic word has complexity at least , i.e.
The equality holds for Sturmian words.
Infinite words can be characterized using the lower and upper total complexity too.
Theorem 26.11 If an infinite word is not ultimately periodic and , then
For the Sturmian words equality holds.
Let us denote by the fractional part of , and by its integer part. Obviously . The composition of a function by itself times will be denoted by . So ( times). Sturmian words can be characterized in the following way too:
Theorem 26.12 A word is Sturmian if and only if there exists an irrational number and a real number , such that for
or
In the case of the infinite Fibonacci number, these numbers are: .
Sturmian words can be generated by the orbit of a billiard ball inside a square too. A billiard ball is launched under an irrational angle from a boundary point of the square. If we consider an endless move of the ball with reflection on boundaries and without friction, an infinite trajectory will result. We put an 0 in the word if the ball reaches a horizontal boundary, and 1 when it reaches a vertical one. In such a way we generate an infinite word. This can be generalized using an -letter alphabet and an -dimensional hypercube. In this case the complexity is
If , then , and if , then .
For a finite word
is the maximal complexity. In Figure 26.6 the values of the complexity function for several words are given for all possible length. From this, we can see for example that , etc.
For the complexity of finite words the following interesting result is true.
Theorem 26.13 If is a finite word, is its complexity, then there are the natural numbers and with such that
, for ,
, for ,
, for .
From the Figure 26.6, for example, if , then , ,
, then , ,
, then , .
The global maximal complexity is
that is the greatest (maximal) complexity in the set of all words of length on a given alphabet. The following problems arise:
what is length of the subwords for which the global maximal complexity is equal to the maximal complexity?
how many such words exist?
Example 26.1 For the alphabet the Figure 26.7 and 26.8 contain the complexity of all 3-length and 4-length words.
In this case of the 3-length words (Figure 26.7) the global maximal complexity is 2, and this value is obtained for 1-length and 2-length subwords. There are 6 such words.
For 4-length words (Figure 26.8) the global maximal complexity is 3, and this value is obtained for 2-length words. The number of such words is 8.
To solve the above two problems, the following notations will be used:
In the table of Figure 26.9 values of , , are given for length up to 20 over on a binary alphabet.
We shall use the following result to prove some theorems on maximal complexity.
Lemma 26.14 For each , the shortest word containing all the words of length over an alphabet with letters has letters (hence in this word each of the words of length appears only once).
Proof. Let us consider at first the case , . From Lemma 26.14 we obtain the existence of a word of length which contains all the words of length , hence . It is obvious that for and for . Any other word of length will have the maximal complexity less than or equal to , hence we have .
For we consider now the values of of the form with , hence . If from the word of length considered above we delete the last letters, we obtain a word of length with . This word will have and this value will be its maximal complexity. Indeed, it is obvious that for ; for it follows that , hence . Because it is not possible for a word of length , with to have the maximal complexity greater than , it follows that .
Theorem 26.16 If and then ; if then .
Proof. In the first part of the proof of Theorem 26.15, we proved for , , the existence of a word of length for which . This means that . For the word , as well as for any other word of length , we have , , because of the special construction of , which contains all the words of length in the most compact way. It follows that .
As in the second part of the proof of Theorem 26.15, we consider with and the word for which . We have again . For , it is obvious that the complexity function of , or of any other word of length , is strictly less than . We examine now the possibility of finding a word with for which for . We have , hence the equality holds only for and , that is for .
We show that for we have indeed . If we start with the word of length generated by the Martin's algorithm (or with another De Bruijn word) and add to this any letter from , we obtain obviously a word of length , which contains all the words of length and words of length , hence .
Having in mind the Martin
algorithm (or other more efficient algorithms), words with maximal complexity can be easily constructed for each and for both situations in Theorem 26.16.
Theorem 26.17 If and then is equal to the number of different paths of length in the de Bruijn graph .
Proof. From Theorems 26.15 and 26.16 it follows that the number of the words of length with global maximal complexity is given by the number of words with . It means that these words contain subwords of length , all of them distinct. To enumerate all of them we start successively with each word of letters (hence with each vertex in ) and we add at each step, in turn, one of the symbols from which does not duplicate a word of length which has already appeared. Of course, not all of the trials will finish in a word of length , but those which do this, are precisely paths in starting with each vertex in turn and having the length . Hence to each word of length with we can associate a path and only one of length starting from the vertex given by the first letters of the initial word; conversely, any path of length will provide a word of length which contains distinct subwords of length .
can be expressed also as the number of vertices at level in the set of De Bruijn trees.
Proof. In the De Bruijn graph there are different Hamiltonian cycles. With each vertex of a Hamiltonian cycle a De Bruijn word begins (containing all -length subwords), which has maximal complexity, so , which proves the theorem.
A generalization for an alphabet with letters:
The total complexity is the number of different nonempty subwords of a given word:
The total complexity of a trivial word of length (of the form , ) is equal to . The total complexity of a rainbow word (with pairwise different letters) of length is equal to .
The problem of existence of words with a given total complexity are studied in the following theorems.
Theorem 26.20 If is a natural number different from 1, 2 and 4, then there exists a nontrivial word of total complexity equal to .
Proof. To prove this theorem we give the total complexity of the following -length words:
These can be proved immediately from the definition of the total complexity.
1. If is odd then we can write for a given . It follows that , and the word has total complexity .
2. If is even, then .
2.1. If , then gives , and from this results. The word has total complexity .
2.2. If then gives , and from this results. The word has total complexity .
In the proof we have used more than two letters in a word only in the case of the numbers of the form (case 2.2 above). The new question is, if there exist always nontrivial words formed only of two letters with a given total complexity. The answer is yes anew. We must prove this only for the numbers of the form . If and , we use the followings:
If , then gives , and the word has total complexity .
If , then gives , and the word has total complexity . For only 14, 26 and 30 are feasible. The word has total complexity 14, has 26, and 30. Easily it can be proved, using a tree, that for 6, 10, 18 and 22 such words does not exist. Then the following theorem is true.
Theorem 26.21 If is a natural number different from 1, 2, 4, 6, 10, 18 and 22, then there exists a nontrivial word formed only of two letters, with the given total complexity .
The existence of a word with a given length and total complexity is not always assured, as we will prove in what follows.
In relation with the second problem a new one arises: How many words of length and complexity there exist? For small this problem can be studied exhaustively. Let be of letters, and let us consider all words of length over . By a computer program we have got Figure 26.10, which contains the frequency of words with given length and total complexity.
Let and let denote the frequency of the words of length over having a complexity . Then we have the following easy to prove results:
, if or ,
,
,
,
As regards the distribution of the frequency , the following are true:
If , then .
If , then .
The question is, if there exists a value from which up to no more 0 frequency exist. The answer is positive. Let us denote by the least number between and for which
for all with .
The number exists for any (in the worst case it may be equal to ):
As we have seen in the previous section, a contiguous part of a word (obtained by erasing a prefix or/and a suffix) is a subword or factor. If we eliminate arbitrary letters from a word, what is obtained is a scattered subword, sometimes called subsequence. Special scattered subwords, in which the consecutive letters are at distance at least and at most in the original word, are called -subwords. More formally we give the following definition.
Let , , be positive integers, and let be a word over the alphabet . The word , where
,
, for ,
,
is a -subword of length of .
For example the -subwords of are: , , , , , , , , , , , , , , , , , , , , , , .
The number of different -subwords of a word is called -complexity and is denoted by .
For example, if , then .
Words with pairwise different letters are called rainbow words. The -complexity of a rainbow word of length does not depends on what letters it contains, and is denoted by .
To compute the -complexity of a rainbow word of length , let us consider the word (if , then ) and the corresponding digraph , with
,
.
For see Figure 26.11.
The adjacency matrix of the graph is defined by:
Because the graph has no directed cycles, the entry in row and column in (where , with ) will represent the number of -length directed paths from to . If is the identity matrix (with entries equal to 1 only on the first diagonal, and 0 otherwise), let us define the matrix :
The -complexity of a rainbow word is then
The matrix can be better computed using a variant of the well-known Warshall algorithm:
Warshall(
)
1 2FOR
TO
3DO
FOR
TO
4DO
FOR
TO
5DO
6RETURN
From we obtain easily . The time complexity of this algorithms is .
For example let us consider the graph in Figure 26.11. The corresponding adjacency matrix is:
After applying the Warshall algorithm:
and then , the sum of entries in .
The Warshall algorithm combined with the Latin square method can be used to obtain all nontrivial (with length at least 2) -subwords of a given rainbow word of length . Let us consider a matrix with the elements which are set of words. Initially this matrix is defined as
If and are sets of words, will be formed by the set of concatenation of each word from with each word from :
If is a word, let us denote by the word obtained from by erasing the first character: . Let us denote by the set in which we erase from each element the first character. In this case is a matrix with entries .
Starting with the matrix defined as before, the algorithm to obtain all nontrivial -subwords is the following:
Warshall-Latin(
)
1 2FOR
TO
3DO
FOR
TO
4DO
FOR
TO
5DO
IF
and 6THEN
7RETURN
The set of nontrivial -subwords is . The time complexity is also .
For , , , the initial matrix is:
and
Counting the one-letter subwords too, we obtain .
In this case instead of we will use . For a rainbow word, we will denote the number of -subwords which finish at the position . For
For simplicity, let us denote by . The -complexity of a rainbow word can be obtained by the formula
Because of (26.1) we can write in the case of
Denoting
we get
and the sequence is one of Fibonacci-type. For any we have and from this results. Therefore the numbers are defined by the following recurrence equations:
for ,
for .
These numbers can be generated by the following generating function:
The -complexity can be expressed with these numbers by the following formula:
and
or
If then
where is the generating function of the Fibonacci numbers (with , ). Then, from this formula we have
and
Figure 26.13 contains the values of for and .
The following theorem gives the value of in the case :
The main step in the proof is based on the formula
The value of can be also obtained by computing the number of sequences of length of 's and 's, with no more than adjacent zeros. In such a sequence one 1 represents the presence, one 0 does the absence of a letter of the word in a given -subword. Let denote the number of -length sequences of zeros and ones, in which the first and last position is 1, and the number of adjacent zeros is at most . Then it can be proved easily that
, for ,
,
, for all ,
because any such sequence of length () can be continued in order to obtain a similar sequence of length in only one way (by adding a sequence of the form on the right). For the following formula also can be derived:
If we add one 1 or 0 at an internal position (e.g at the position) of each sequences, then we obtain sequences of length , but from these, sequences will have adjacent zeros.
The generating function corresponding to is
By adding zeros on the left and/or on the right to these sequences, we can obtain the number , as the number of all these sequences. Thus
( zeros can be added in ways to these sequences: on the left and on the right, on the left and on the right, and so on).
From the above formula, the generating function corresponding to the complexities can be obtained as a product of the two generating functions and , thus:
In the sequel instead of we will use . In this case the distance between two letters picked up to be neighbours in a subword is at least .
Let us denote by the number of -subwords which begin at the position in a rainbow word of length . Using our previous example (abcdef), we can see that , , , , , and .
The following formula immediately results:
for , and ,
For simplicity, will be denoted by .
The -complexity of rainbow words can be computed by the formula:
This can be expressed also as
because of the formula
In the case the complexity can be computed easily: .
From (26.2) we get the following algorithm for the computation of . The numbers for a given and are obtained in the array . Initially all these elements are equal to . The call for the given and and the desired is:
Input
FOR
TO
DO
B
Array is a global one. Output
The recursive algorithm is the following:
B(
)
1 2FOR
TO
3DO
IF
4THEN
B
5 6 7RETURN
This algorithm is a linear one.
If the call is , the elements will be obtained in the following order: , , , , , , and .
Lemma 26.24 , where is the th Fibonacci number.
Proof. Let us consider a rainbow word and let us count all its -subwords which begin with . If we change for in each -subword which begin with , we obtain -subwords too. If we add in front of each -subword which begin with , we obtain -subwords too. Thus
So is a Fibonacci number, and because , we obtain that .
Theorem 26.25 , where is the th Fibonacci number.
Proof. From equation (26.4) and Lemma 26.24:
If we use the notation , because of the formula
a generalized middle sequence will be obtained:
Let us call this sequence -middle sequence. Because of the equality , the -middle sequence can be considered as a generalization of the Fibonacci sequence.
Then next linear algorithm computes , by using an array to store the necessary previous elements:
Middle(
)
1 2FOR
TO
3DO
4FOR
TO
5DO
6 print 7RETURN
Using the generating function , the following closed formula can be obtained:
This can be used to compute the sum , which is the coefficient of in the expansion of the function
So . Therefore
Theorem 26.26 , where and is the th elements of -middle sequence.
Proof. The proof is similar to that in Theorem 26.25 taking into account the equation (26.7).
Proof. Let us consider the generating function . Then, taking into account the equation (26.6) we obtain . The general term in this expansion is equal to
and the coefficient of is equal to
The coefficient of is
By Theorem 26.26 , and an easy computation yields
The palindrome complexity function of a finite or infinite word attaches to each the number of palindrome subwords of length in , denoted by .
The total palindrome complexity of a finite word is equal to the number of all nonempty palindrome subwords of , i.e.:
This is similar to the total complexity of words.
Theorem 26.28 The total palindrome complexity of any finite word satisfies .
Proof. We proceed by induction on the length of the word . For we have .
We consider and suppose that the assertion holds for all words of length . Let be a word of length and its prefix of length . By the induction hypothesis it is true that .
If for each , the only palindrome in which is not in is , hence .
If there is an index , such that , then if and only if has suffixes which are palindromes. Let us suppose that there are at least two such suffixes and , , which are palindromes. It follows that
hence . The last palindrome appears in (because of ) and has been already counted in . It follows that .
This result shows that the total number of palindromes in a word cannot be larger than the length of that word. We examine now if there are words which are `poor' in palindromes. In the next lemma we construct finite words of arbitrary length , which contain precisely 8 palindromes.
Let us denote by the fractional power of the word of length , which is the prefix of length of .
Proof. In there are the following palindromes: 0, 1, 00, 11, 010, 101, 0110, 1001. Because 010 and 101 are situated in between 0 on the left and 1 on the right, these cannot be continued to obtain any palindromes. The same is true for 1001 and 0110, which are situated between 1 on the left and 0 on the right, excepting the cases when 1001 is a suffix. So, there are no other palindromes in .
Remark 26.30 If is a circular permutation of and then too. Because we can interchange with , for any there will be at least words of length with total complexity equal to .
We shall give now, beside the upper delimitation from Theorem 26.28, lower bounds for the number of palindromes contained in finite binary words. (In the trivial case of a -letter alphabet it is obvious that, for any word , .)
Theorem 26.31 If is a finite word of length on a -letter alphabet, then
Proof. Up to 8 the computation can be made by a computer program. For , Lemma 26.29 gives words for which . The maximum value is obtained for words of the form .
Remark 26.32 For all the short binary words (up to ), the palindrome complexity takes the maximum possible value given in Theorem 26.28; from the words with , only four (out of ) have , namely , and their complemented words.
In the following lemmas we construct binary words which have a given total palindrome complexity greater than or equal to .
Lemma 26.33 If for and , then .
Proof. In the prefix of length of there are always palindromes (). The other palindromes different from these are , , , , and (for ), respectively (for ). In each case .
Lemma 26.34 If for and , then .
Proof. Since , the prefix of is at least , which includes the palindromes , , , , , and , hence . The palindromes and are situated between and , while and are between and (excepting the cases when they are suffixes), no matter how large is . It follows that contains no other palindromes, hence .
Remark 26.35 If , then the word is equal to , with defined in Lemma 26.29.
We can determine now precisely the image of the restriction of the palindrome complexity function to , .
Theorem 26.36 Let be a binary alphabet. Then
Proof. Having in mind the result in Theorem 26.31, we have to prove only that for each and so that , there exists always a binary word of length for which the total palindrome complexity is . Let and be given so that . We denote and .
If , we take (from Lemma 26.33); if , (from Lemma 26.34). It follows that and .
Example 26.2 Let us consider and . Then , . Because , we use , whose total palindrome complexity is .
We give similar results for the case of alphabets with letters.
Theorem 26.37 If is a finite word of length over a -letter () alphabet, then
Proof. For it can be checked directly. Let us consider now and a word of length at least . If this is a trivial word (containing only one letter times), its total palindrome complexity is . If in the word there appear exactly two letters and , it will have as palindromes those two letters and at least one of , , or , hence again . If the word contains a third letter, then obviously . So, the total complexity cannot be less then .
Theorem 26.38 Let be a -letter () alphabet. Then for
Proof. It remains to prove that for each and so that , there exists always a word of length , for which the total palindrome complexity is . Such a word is , which has palindromes in its prefix of length , and other two palindromes and in what follows.
The number of palindromes in the infinite Sturmian words is given by the following theorem.
Let us recall the power word as being
.
Theorem 26.40 The palindrome complexity of the power word is
where
Proof. There exist the following cases:
Case . Palindrome subwords are:
for ,
for , so .
Case . Palindrome subwords are:
for ,
for , so .
Case . Palindrome subwords are:
for , for , so .
The palindrome subwords of the power word have the following properties:
Every palindrome subword which contains both 0's and 1's occurs only once in the power word.
If we use the notations and then there are the unique decompositions:
The Champernowne word is defined as the concatenation of consecutive binary written natural numbers:
.
Theorem 26.41 The palindrome complexity of the Champernowne word is
where
Proof. Any palindrome of length can be continued as and to obtain palindromes of length . This theorem results from the following: , and for we have
,
.
The following algorithm generates all palindromes up to a given length of a Sturmian word beginning with the letter , and generated by the morphism .
The idea is the following. If is the least value for which and are both of odd length (such a always exists), we consider conjugates of these words, which are palindromes (such conjugates always exists), and we define the following morphism:
Footnote. If then is a conjugate of .
,
,
where conj() produces a conjugate of , which is a palindrome.
The sequences and generate all odd length palindromes, and the sequence all even length palindromes.
If is a word, then represents the word which is obtained from by erasing its first and last letter. More generally, is obtained from by erasing its first and last letters.
Sturmian-Palindromes(
)
1IF
is even 2THEN
3 let be the least value for which and are both of odd length 4 let define the morphism: and 5 6WHILE
7DO
8 9 10 11WHILE
12DO
13 14 15REPEAT
UNTIL
and 19 20WHILE
22DO
22 23 24REPEAT
UNTIL
Because any substitution requires no more than steps, where is a constant, the algorithm is a linear one.
In the case of the Fibonacci word the morphism is defined by
, ,
and because
, , , ,
, , , ,
both being odd numbers, will be equal to 3. The word is not a palindrome, and for the morphism we will use the adequate conjugate , which is a palindrome. In this case the morphism is defined by
,
.
For example, if , the following are obtained:
, and then ,
, and ,
, and
.
PROBLEMS |
26-1
Generating function 1
Let denote the number of sequences of length of zeros and ones, in which the first and last position is 1, and the number of adjacent zeros is at most . Prove that the generating function corresponding to is
Hint. See Subsection 26.2.1.)
26-2
Generating function 2
Prove that the generating function of , the number of all -subwords of a rainbow word of length , is
(Hint. (See Subsection 26.2.1.)
26-3
Window complexity
Compute the window complexity of the infinite Fibonacci word.
26-4
Circuits in De Bruijn graphs
Prove that in the De Bruijn graph there exist circuits (directed cycles) of any length from 1 to .
CHAPTER NOTES |
The basic notions and results on combinatorics of words are given in Lothaire's [162], [163], [164] and Fogg's books [77]. Neither Lothaire nor Fogg is a single author, they are pseudonyms of groups of authors. A chapter on combinatorics of words written by Choffrut and Karhumäki [51] appeared in a handbook on formal languages.
The different complexities are defined as follows: total complexity in Iványi [124], maximal and total maximal complexity in Anisiu, Blázsik, Kása [6], -complexity in Iványi [124] (called -complexity) and used also in Kása [141], -complexity (called super- -complexity) in Kása [143], scattered complexity in Kása [142], factorization complexity in Ilie [123] and window complexity in Cassaigne, Kaboré, Tapsoba [46].
The power word, lower/upper maximal/total complexities are defined in Ferenczi, Kása [74]. In this paper a characterization of Sturmian words by upper maximal and upper total complexities (Theorem 26.11) is also given. The maximal complexity of finite words is studied in Anisiu, Blázsik, Kása [6]. The total complexity of finite words is described in Kása [141], where the results of the Theorem 26.22 is conjectured too, and proved later by Levé and Séébold [160].
Different generalized complexity measures of words are defined and studied by Iványi [124] and Kása [141], [143], [142].
The results on palindrome complexity are described in M.-C. Anisiu, V. Anisiu, Kása [5] for finite words, and in Droubay, Pirillo [65] for infinite words. The algorithm for palindrome generation in Sturmian words is from this paper too. Applications of complexities in social sciences are given in Elzinga [69], [68], and in biology in Troyanskaya et al. [278]. It is worth to consult other papers too, such as [9], [45], [62], [73], [251] (on complexity problems) and [2], [13], [29], [33], [38], [59], [273] (on palindromes).
The European Union and the European Social Fund under the grant agreement no. TÁMOP 4.2.1/B-09/1/KMR-2010-0003.
Table of Contents
In all areas of everyday life there are situations when conflicting aspects have to be taken into account simultaneously. A problem becomes even more difficult when several decision makers' or interest groups' mutual agreement is needed to find a solution.
Conflict situations are divided in three categories in terms of mathematics:
One decision maker has to make a decision by taking several conflicting aspects into account
Several decision makers have to find a common solution, when every decision maker takes only one criterion into account
Several decision makers look for a common solution, but every decision maker takes several criteria into account
In the first case the problem is a multi-objective optimization problem, where the objective functions make up the various aspects. The second case is a typical problem of the game theory, when the decision makers are the players and the criteria mean the payoff functions. The third case appears as Pareto games in the literature, when the different players only strive to find Pareto optimal solutions instead of optimal ones.
In this chapter we will discuss the basics of this very important and complex topic.
Suppose, that one decision maker wants to find the best decision alternative on the basis of several, usually conflicting criteria. The criteria usually represent decision objectives. At first these are usually defined verbally, such as clean air, cheap maintenance, etc. Before the mathematical model is given, firstly, these objectives have to be described by quantifiable indices. It often occurs that one criterion is described by more than one indices, such as the quality of the air, since the simultaneous presence of many types of pollution have an effect on it. In mathematics it is usually assumed that the bigger value of the certain indices (they will be called objective functions) means favorable value, hence we want to maximize all of the objective functions simultaneously. If we want to minimize one of the objective functions, we can safely multiply its value by , and maximize the resulting new objective function. If in the case of one of the objective functions, the goal is to attain some kind of optimal value, we can maximize the deviation from it by multiplying it by .
If denotes the set of possible decision alternatives, and denotes the th objective function , the problem can be described mathematically as follows:
supposing that .
In the case of a single objective function we look for an optimal solution. Optimal solutions satisfy the following requirements:
(i) An optimal solution is always better than any non-optimal solution.
(ii) There is no such possible solution that provides better objective functions than an optimal solution.
(iii) If more than one optimal solution exist simultaneously, they are equivalent in the meaning that they have the same objective functions.
These properties come from the simple fact that the consequential space,
is a subset of the real number line, which is totally ordered. In the case of multiple objective functions, the
consequential space is a subset of the -dimensional Euclidean space, which is only partially ordered. Another complication results from the fact that a decision alternative that could maximize all of the objective functions simultaneously doesn't usually exist.
Let's denote
the maximum of the th objective function, then the
point is called ideal point. If , then there exits an decision for which . In such special cases satisfies the previously defined (i)-(iii) conditions. However, if , the situation is much more complicated. In that case we look for Pareto optimal solutions instead of optimal ones.
Definition 27.1 An alternative is said to be Pareto optimal, if there is no such that for all , with at least one strict inequality.
It is not necessary that a multi-purpose optimization problem has Pareto optimal solution, as the case of the
set shows it. Since is open set, for arbitrary and for a small enough positive and .
Theorem 27.2 If bounded, closed in a finite dimensional Euclidean space and all of the objective functions are continuous, there is Pareto optimal solution.
The following two examples present a discrete and a continuous problem.
Example 27.1 Assume that during the planning of a sewage plant one out of two options must be chosen. The expenditure of the first option is two billion Ft, and its daily capacity is 1500 . The second option is more expensive, three billion Ft with 2000 daily capacity. In this case , , . The following table summarizes the data:
Both options are Pareto optimal, since and .The consequential space consists of two points: and .
Example 27.2 The optimal combination of three technology variants is used in a sewage station. The first variant removes 3,2,1 from one kind of pollution, and 1,3,2 quantity from the another kind of pollution. Let , and denote the percentage composition of the three technology variants.
The restrictive conditions:
the quantity of the removed pollution:
Since the third term is constant, we get the following two objective-function optimum problem:
provided that
A consequential space can be determined as follows. From the
equations
and from the restrictive conditions the following conditions arises for the and objective functions:
Figures 27.2 and 27.3 display the and sets.
On the basis of the image of the set, it is clear that the points of the straight section joining to are Pareto optimal. Point isn't better than any possible point of , because in the first objective function it results the worst possible planes. The points of the section are not equivalent to each other, either, going down from the point towards point , the first objective function is increasing, but the second one is continually decreasing. Thus the (ii) and (iii) properties of the optimal solution doesn't remain valid in the case of multi-objection.
As we saw in the previous example, the different Pareto optimal solutions result in different objective function values, so it is primary importance to decide which one should be selected in a particular case. This question is answered by the methodology of the multi-objective programming. Most methods' basis is to substitute some real-valued “value-function” for the objective functions, that is the preference generated by the objective functions is replaced by a single real-valued function. In this chapter the most frequently used methods of multi-objective programming are discussed.
A natural method is the following. We assign one utility function to every objective function. Let denote the utility function of the th objective function. The construction of the function can be done by the usual manner of the theory of utility functions, for example the decision maker can subjectively define the values for the given values, then a continuous function can be interpolated on the resulting point. In the case of additive independent utility function additive, whereas in the case of independent of usefulness utility function additive or multiplicative aggregate utility function can be obtained. That is, the form of the aggregate utility function is either
or
In such cases the multi-objective optimization problem can be rewrite to one objective-function form:
provided that , and thus means the “value-function”.
Example 27.3 Consider again the decision making problem of the previous example. The range of the first objective function is , while the range of the second one is . Assuming linear utility functions
In addition, suppose that the decision maker gave the
values. Assuming linear utility functions
and in accordance with the given values
By the third equation , and by the second one we obtain , so that
Thus we solve the following one objective-function problem:
provided that
Apparently, the optimal solution is: , , that is the first technology must be chosen.
Assume that the number of objective functions is and the decision maker gives vectors: and the related aggregated utility function values. Then the coefficients can be given by the solution of the
equation system. We always suppose that , so that we have at least as many equations as the number of unknown quantities. If the equation system is contradictory, we determine the best fitting solution by the method of least squares. Suppose that
The formal algorithm is as follows:
Utility-Function-Method(
)
1FOR
TO
2DO
FOR
TO
3DO
4 the vector of solutions 5RETURN
Using this method the value-function is chosen as the linear combination of the original object functions, that is we solve the
problem. If we measure the certain objective functions in different dimensions, the aggregate utility function can't be interpreted, since we add up terms in different units. In this case we generally normalize the objective functions. Let and the minimum and maximum of the objective function on the set . Then the normalized th objective function is given by the
formula, and in the (27.8) problem is replaced by :
It can be shown, that if all of the weights are positive, the optimal solutions of (27.9) are Pareto optimal with regard to the original problem.
Example 27.4 Consider again the case of Example 27.2. From Figure 27.3, we can see that , , , and . Thus the normalized objective functions are:
and
Assume that the objective functions are equally important, so we choose equivalent weights: , in this way the aggregate objective function is:
It is easy to see that the optimal solution on set :
that is, only the second technology variant can be chosen.
Suppose that . The formal algorithm is as follows:
Weighting-Method(
)
1FOR
TO
2DO
3 4 5RETURN
If we normalize the objective functions, the certain normalized objective functions most favorable value is 1 and the most unfavourable is 0. So that is the ideal point and is the worst yield vector.
In the case of distance-dependent methods we either want to get nearest to the vector or get farthest from the point , so that we solve either the
or the
problem, where denotes some distance function in .
In practical applications the following distance functions are used most frequently:
The distance functions the commonly known Minkowski distance for . The geometric distance doesn't satisfy the usual requirements of distance functions however, it is frequently used in practice. As we will see it later, Nash's classical conflict resolution algorithm uses the geometric distance as well. It is easy to prove that the methods using the distance are equivalent of the weighting method. Notice firstly that
where the first term is constant, while the second term is the objective function of the weighting method. Similarly,
which is the objective function of the weighting method.
The method is illustrated in Figures 27.4 and 27.5.
Example 27.5 Consider again the problem of the previous example. The normalized consequences are shown by Figure 27.6. The two coordinates are:
Choosing the and the distances, the nearest point of to the ideal point is
Hence
that is the optimal decision is:
Therefore only the first two technology must be chosen in 20% and 80% proportion.
Let's choose again equivalent weights () and the distance, but look for the farthest point of from the ideal worst point. We can see from Figure 27.5, that the solution is
so
Thus the optimal decision is: and
The formal algorithm is as follows:
Distance-Dependent-Method(
)
1FOR
TO
2DO
3 4 5 or 6RETURN
Assume that we have a point in set , on which we'd like to improve. denotes the present position, on which the decision maker wants to improve, or at design level we can choose the worst point for the starting one. Furthermore, we assume that the decision maker gives an improvement direction vector, which is denoted by . After that, the task is to reach the farthest possible point in set starting from along the direction vector. Thus, mathematically we solve the
optimum task, and the related decision is given by the solution of the
equation under the optimal value. The method is illustrated in Figure 27.7.
Example 27.6 Let's consider again the problem of Example 27.2, and assume that , which contains the worst possible objective function values in its components. If we want to improve the objective functions equally, we have to choose . The graphical solution is illustrated in Figure 27.8, that
so the appropriate values of the decision variables are the following:
A very rarely used variant of the method is when we diminishes the object function values systematically starting from an unreachable ideal point until a possible solution is given. If denotes this ideal point, the (27.18) optimum task is modified as follows:
and the appropriate decision is given by the solution of the
equation.
Example 27.7 To return to the previous example, consider again that and , that is we want to diminish the object functions equally. Figure 27.9 shows the graphical solution of the problem, in which we can see that the given solution is the same as the solution of the previous example.
Applying the method is to solve the (27.18) or the (27.20) optimum tasks, and the optimal decision is given by the solution of the (27.19) or the (27.21) equations.
Exercises
27.1-1 Determine the consequence space for the following exercise:
provided that
27.1-2 Consider the utility functions of the decision maker: és . Furthermore, assume that the decision maker gave the values. Determine the form of the aggregate utility function.
27.1-3 Solve Exercise 27.1-1 using the weighting-method without normalizing the objective functions. Choose the weights.
27.1-4 Repeat the previous exercise, but do normalize the objective functions.
27.1-5 Solve Exercise 27.1-1with normalized objective functions, weights and minimizing the
(i) distance
(ii) distance
(iii) distance.
27.1-6 Repeat the previous exercise, but maximize the distance from the vector instead of minimizing it.
27.1-7 Solve Exercise 27.1-1 using the direction-dependent method, choosing and .
27.1-8 Repeat the previous exercise, but this time choose and .
In this chapter we assume that decision makers interested in the selection of a mutual decision alternative. Let denote the objective function of the th decision maker, which is also called payoff function in the game theory literature. Depending on the decision makers relationship to each other we can speak about cooperative and non-cooperative games. In the first case the decision makers care about only their own benefits, while in the second case they strive for an agreement when every one of them are better off than in the non-cooperative case. In this chapter we will discuss the non-cooperative case, while the cooperative case will be topic of the next chapter.
Let's denote for and , the set of the decision alternatives into which the th decision maker can move over without the others' support. Evidently .
Definition 27.3 An alternative is equilibrium if for all and ,
This definition can also be formulated that is stable in the sense that none of the decision makers can change the decision alternative from alone to change any objective function value for the better. In the case of non-cooperative games, the equilibrium are the solutions of the game.
For any and decision maker, the set
is called the set of the best answers of the th decision maker to alternative . It is clear that the elements of are those alternatives which the th decision maker can move over from , and which ensure the best objective functions out of all the possible alternatives. According to inequality (27.22) it is also clear that is an equilibrium if and only if for all , , that is is mutual fixed point of the point-to-set maps. Thus, the existence of equilibrium can be traced to the existence of mutual fixed point of point-to-set maps, so the problem can be solved by the usual methods.
It is a very common case when the collective decision is made up by the personal decisions of the certain decision makers. Let denote the set of the th decision maker's alternatives, let be the concrete alternatives, and let be the objective function of the th decision maker. That is the collective decision is . In this case
and the (27.22) definition of equilibrium is modified as follows:
In the game theory literature the equilibrium is also called Nash-equilibrium.
The existence of an equilibrium is not guaranteed in general. To illustrate this let's consider the case, when both decision makers can choose between to alternatives: and . The objective function values are shown in Figure 27.10, where the the first number in the parentheses shows the first, the second number shows the second decision maker's objective function value. If equilibrium exists, it might not be unique, what can be proved by the case of constant objective functions, when every decision alternative is an equilibrium.
If the sets are finite, the equilibrium can be found easily by the method of reckoning, when we check for all of the decision vectors whether the component can be changed for the better of the objective function. If the answer is yes, is not equilibrium. If none of the components can be changed in such manner, is equilibrium. For the formal algorithm, let's assume that .
Equilibrium-Search
1FOR
TO
2DO
FOR
TO
3 4DO
FOR
TO
5DO
6FOR
TO
7DO
FOR
TO
8DO
IF
9THEN
and go to 10 10IF
11THEN
is equilibrium
The existence of equilibrium is guaranteed by the following theorem.
Theorem 27.4 Assume that for all
(i) is convex, bounded and closed in a final dimensional Euclidean space;
(ii) is continuous on the set ;
(iii) for any fixed , is concave in .
Then there is at least one equilibrium.
Determination of the equilibrium is usually based on the observation that for all , is the solution of the
optimum task. Writing the necessary conditions of the optimal solution (for example the Kuhn-Tucker conditions) we can get an equality-inequality system which solutions include the equilibrium. To illustrate this method let's assume that
where is a finite dimensional vector and is a vector-valued function. In this way (27.25) can be rewritten as follows:
In this case the Kuhn-Tucker necessary conditions are:
where denotes the gradient at , and is a vector which has the same length as . If we formulate the (27.27) conditions for , we get an equality-inequality system which can be solved by computer methods. It is easy to see that (27.27) can also be rewritten to an nonlinear optimization task:
If the optimal objective function value is positive, the (27.27) system doesn't have a solution, and if the optimal objective function value is zero, any optimal solution is also a solution of the (27.27) system, so the equilibrium are among the optimal solutions. We know about the sufficiency of the Kuhn-Tucker conditions that if is concave in with all , the Kuhn-Tucker conditions are also sufficient, thus every solution of (27.27) gives an equilibrium.
The formal algorithm is as follows:
Kuhn–Tucker-Equilibrium
1FOR
TO
2DO
3 4 the solution of the (27.28) optimum task 5IF
6THEN
RETURN
"there is no equilibrium" 7ELSE
RETURN
()
Example 27.8 Assume that production plant produce some water purification device sold into households. Let denote the quantity produced by the th production plant, let be the cost function of it, and let be the sale price, which depends on the total quantity to be put on the market. Furthermore, be is the capacity of the th production plant. Thus, the possible decision set is the closed interval, which can be defined by the
conditions, so
The objective function of the th production plant is the profit of that:
Since is two-dimensional, is a two-element vector as well, and the (27.28) optimum task has the following form:
Let's introduce the new variables, and for the sake of notational convenience be , then taking account of the last condition, we get the following problem:
Let's notice that in case of optimum , so the last term of the objective function can be neglected.
Consider the special case of , , , . The (27.32) problem is now simplified as follows:
Using a simple computer program we can get the optimal solution:
which is the equilibrium as well.
Exercises
27.2-1 Let , , , . Formulate the (27.27) conditions and solve them as well.
27.2-2 Formulate and solve the optimum problem (27.28) for the previous exercise.
27.2-3 Let again . , , . Repeat Exercise 27.2-1.
27.2-4 Repeat Exercise 27.2-2 for the problem given in the previous exercise.
Similarly to the previous chapter let denote again the decision set of the th decision maker and let be the concrete decision alternatives. Furthermore, let denote the objective function of the th decision maker. Let be some subset of the decision makers, which is usually called coalition in the game theory. For arbitrary , let's introduce the
function, which is also called the characteristic function defined on all of the subsets of the set , if we add the and
special cases to definition (27.34).
Consider again that all of the sets are finite for . Be a coalition. The value of is given by the following algorithm, where denotes the number of elements of , denotes the elements and the elements which are not in .
Characteristic-Function(
)
1 , where a very big positive number 2FOR
TO
3 4DO
FOR
TO
5DO
FOR
TO
6 7DO
FOR
TO
8DO
, where a very big positive number 9 10IF
11THEN
12IF
13THEN
14RETURN
Example 27.9 Let's return to the problem discussed in the previous example, and assume that , , és for . Since the cost functions are identical, the objective functions are identical as well:
In the following we determine the characteristic function. At first be , then
Since the function strictly decreases in the variables, the minimal value of it is given at , so
what is easy to see by plain differentiation. Similarly for
Similarly to the previous case the minimal value is given at , so
where we introduced the new variable. In the case of
where this time we introduced the variable.
Definition (27.34) can be interpreted in a way that the characteristic function value gives the guaranteed aggregate objective function value of the coalition regardless of the behavior of the others. The central issue of the theory and practice of the cooperative games is how should the certain decision makers share in the maximal aggregate profit attainable together. An division is usually called imputation, if
for and
The inequality system (27.35)–(27.36) is usually satisfied by infinite number of imputations, so we have to specify additional conditions in order to select one special element of the imputation set. We can run into a similar situation while discussing the multi-objective programming, when we looks for a special Pareto optimal solution using the concrete methods.
Example 27.10 In the previous case a vector is imputation if
The most popular solving approach is the Shapley value, which can be defined as follows:
where denotes the number of elements of the coalition.
Let's assume again that the decision makers are fully cooperating, that is they formulate the coalition , and the certain decision makers join to the coalition in random order. The difference indicates the contribution to the coalition by the th decision maker, while expression (27.37) indicates the average contribution of the same decision maker. It can be shown that is an imputation.
The Shapley value can be computed by following algorithm:
Shapley-Value
1FOR
2DO
Characteristic-Function(
)
3FOR
TO
4DO
use (27.37) for calculating
Example 27.11 In the previous example we calculated the value of the characteristic function. Because of the symmetry, must be true for the case of Shapley value. Since , . We get the same value by formula (27.37) too. Let's consider the value. If , , so the appropriate terms of the sum are zero-valued. The non-zero terms are given for coalitions and , so
An alternative solution approach requires the stability of the solution. It is said that the vector majorizes the vector in coalition , if
that is the coalition has an in interest to switch from payoff vector to payoff vector , or is instabil for coalition . The Neumann–Morgenstern solution is a set of imputations for which
(i) There is no , that majorizes in some coalition (inner stability)
(ii) If , there is , that majorizes -t in at least one coalition (outer stability).
The main difficulty of this conception is that there is no general existence theorem for the existence of a non-empty set, and there is no general method for constructing the set .
Exercises
27.3-1 Let , , . Determine the characteristic function.
27.3-2 Formulate the (27.35), (27.36) condition system for the game of the previous exercise.
27.3-3 Determine the Shapley values for the game of Exercise 27.3-1.
In the previous chapter we assumed that the objective functions are given by numerical values. These numerical values also mean preferences, since the th decision maker prefers alternative to , if . In this chapter we will discuss such methods which don't require the knowledge of the objective functions, but the preferences of the certain decision makers.
Let denote again the number of decision makers, and the set of decision alternatives. If the th decision maker prefers alternative to , this is denoted by , if prefers alternative to or thinks to be equal, it is denoted by . Assume that
(i) For all , or (or both)
(ii) For and , .
Condition (i) requires that the partial order be a total order, while condition (ii) requires to be transitive.
Definition 27.5 A group decision-making function combines arbitrary individual partial orders into one partial order, which is also called the collective preference structure of the group.
We illustrate the definition of group decision-making function by some simple example.
Example 27.12 Be arbitrary, and for all
Let given positive constant, and
The group decision-making function means:
The majority rule is a special case of it when .
Example 27.13 An decision maker is called dictator, if his or her opinion prevails in group decision-making:
This kind of group decision-making is also called dictatorship.
Example 27.14 In the case of Borda measure we assume that is a finite set and the preferences of the decision makers is expressed by a measure for all . For example , if is the best, , if is the second best alternative for the th decision maker, and so on, , if is the worst alternative. Then
A group decision-making function is called Pareto or Pareto function, if for all and , necessarily. That is, if all the decision makers prefer to , it must be the same way in the collective preference of the group. A group decision-making function is said to satisfy the condition of pairwise independence, if any two and preference structure satisfy the followings. Let such that for arbitrary , if and only if , and if and only if . Then if and only if , and if and only if in the collective preference of the group.
Example 27.15 It is easy to see that the Borda measure is Pareto, but it doesn't satisfy the condition of pairwise independence. The first statement is evident, while the second one can be illustrated by a simple example. Be , . Let's assume that
and
Then , thus . However , so . As we can see the certain decision makers preference order between and is the same in both case, but the collective preference of the group is different.
Let denote the set of the -element full and transitive partial orders on an at least three-element set, and be the collective preference of the group which is Pareto and satisfies the condition of pairwise independence. Then is necessarily dictatorial. This result originated with Arrow shows that there is no such group decision-making function which could satisfy these two basic and natural requirements.
Example 27.16 The method of paired comparison is as follows. Be arbitrary, and let's denote the number of decision makers, to which . After that, the collective preference of the group is the following:
that is if and only if more than one decision makers prefer the alternative to . Let's assume again that consists of three elements, and the individual preferences for
Thus, in the collective preference , because and . Similarly , because and , and , since and . Therefore which is inconsistent with the requirements of transitivity.
The methods discussed so far didn't take account of the important circumstance that the decision makers aren't necessarily in the same position, that is they can have different importance. This importance can be characterized by weights. In this generalized case we have to modify the group decision-making methods as required. Let's assume that is finite set, denote the number of alternatives. We denote the preferences of the decision makers by the numbers ranging from 1 to , where 1 is assigned to the most favorable, while is assigned to most unfavorable alternative. It's imaginable that the two alternatives are equally important, then we use fractions. For example, if we can't distinguish between the priority of the 2nd and 3rd alternatives, then we assign 2.5 to each of them. Usually the average value of the indistinguishable alternatives is assigned to each of them. In this way, the problem of the group decision can be given by a table which rows correspond to the decision makers and columns correspond to the decision alternatives. Every row of the table is a permutation of the numbers, at most some element of it is replaced by some average value if they are equally-preferred. Figure 27.11 shows the given table in which the last column contains the weights of the decision makers.
In this general case the majority rule can be defined as follows. For all of the alternatives determine first the aggregate weight of the decision makers to which the alternative is the best possibility, then select that alternative for the best collective one for which this sum is the biggest. If our goal is not only to select the best, but to rank all of the alternatives, then we have to choose descending order in this sum to rank the alternatives, where the biggest sum selects the best, and the smallest sum selects the worst alternative. Mathematically, be
and
for . The th alternative is considered the best by the group, if
The formal algorithm is as follows:
Majority-Rule(
)
1 2FOR
TO
3DO
FOR
TO
4DO
IF
5THEN
6IF
7THEN
8 9RETURN
Applying the Borda measure, let
and alternative is the result of the group decision if
The Borda measure can be described by the following algorithm:
Borda-Measure-Method(
)
1 2FOR
TO
3DO
FOR
TO
4DO
5IF
6THEN
7 8RETURN
Applying the method of paired comparison, let with any
which gives the weight of the decision makers who prefer the alternative to . In the collective decision
In many cases the collective partial order given this way doesn't result in a clearly best alternative. In such cases further analysis (for example using some other method) need on the
non-dominated alternative set.
By this algorithm we construct a matrix consists of the elements, where if and only if the alternative is better in all then alternative . In the case of draw .
Paired-Comparison(
)
1FOR
TO
2DO
FOR
TO
3DO
4FOR
TO
5DO
IF
6THEN
7IF
8THEN
9IF
10THEN
11IF
12THEN
13 14RETURN
Example 27.17 Four proposal were received by the Environmental Authority for the cleaning of a chemically contaminated site. A committee consists of 6 people has to choose the best proposal and thereafter the authority can conclude the contract for realizing the proposal. Figure 27.12 shows the relative weight of the committee members and the personal preferences.
Majority rule
Using the majority rule
so the first alternative is the best.
Using the Borda measure
In this case the first alternative is the best as well, but this method shows equally good the second and third alternatives. Notice, that in the case of the previous method the second alternative was better than the third one.
In the case of the method of paired comparison
Thus and . These references are showed by Figure 27.13. The first alternative is better than any others, so this is the obvious choice.
In the above example all three methods gave the same result. However, in several practical cases one can get different results and the decision makers have to choose on the basis of other criteria.
Exercises
27.4-1 Let's consider the following group decision-making table:
Apply the majority rule.
27.4-2 Apply the Borda measure to the previous exercise.
27.4-3 Apply the method of paired comparison to Exercise 27.4-1.
27.4-4 Let's consider now the following group decision-making table:
Repeat Exercise 27.4-1 for this exercise.
27.4-5 Apply the Borda measure to the previous exercise.
27.4-6 Apply the method of paired comparison to Exercise 27.4-4.
Let denote again the number of decision makers but suppose now that the decision makers have more than one objective functions separately. There are several possibility to handle such problems:
(A) In the application of multi-objective programming, let denote the weight of the th decision maker, and let be the weights of this decision maker's objective functions. Here denote the number of the th decision maker's objective functions. Thus we can get an optimization problem with the objective function, where all of the decision makers' all the objective functions mean the objective function of the problem, and the weights of the certain objective functions are the sequences. We can use any of the methods from Chapter 27.1 to solve this problem.
(B) We can get another family of methods in the following way. Determine an utility function for every decision maker (as described in Chapter 27.1.1), which compresses the decision maker's preferences into one function. In the application of this method every decision maker has only one (new) objective function, so any methods and solution concepts can be used from the previous chapters.
(C) A third method can be given, if we determine only the partial order of the certain decision makers defined on an alternative set by some method instead of the construction of utility functions. After that we can use any method of Chapter 27.4 directly.
Example 27.18 Modify the previous chapter as follows. Let's suppose again that we choose from four alternatives, but assume now that the committee consists of three people and every member of it has two objective functions. The first objective function is the technical standards of the proposed solution on a subjective scale, while the second one are the odds of the exact implementation. The latter one is judged subjectively by the decision makers individually by the preceding works of the supplier. The data is shown in Figure 27.16, where we assume that the first objective function is judged on a subjective scale from 0 to 100, so the normalized objective function values are given dividing by 100. Using the weighting method we get the following aggregate utility function values for the separate decision makers:
1. Decision maker
2. Decision maker
3. Decision maker
The preferences thus are the following:
For example, in the application of Borda measure
are given, so the group-order of the four alternatives
Exercises
27.5-1 Let's consider the following table:
Let's consider that the objective functions are already normalized. Use method (A) to solve the exercise.
27.5-2 Use method (B) for the previous exercise, where the certain decision makers' utility functions are given by the weighting method, and the group decision making is given by the Borda measure.
27.5-3 Solve Exercise 27.5-2 using the method of paired comparison instead of Borda measure.
For the sake of simplicity, let's consider that , that is we'd like to solve the conflict between two decision makers. Assume that the consequential space is convex, bounded and closed in , and there is given a point which gives the objective function values of the decision makers in cases where they are unable to agree. We assume that there is such that . The conflict is characterized by the pair. The solution obviously has to depend on both and , so it is some function of them: .
For the case of the different solution concepts we demand that the solution function satisfies some requirements which treated as axioms. These axioms require the correctness of the solution, the certain axioms characterize this correctness in different ways.
In the case of the classical Nash solution we assume the following:
(i) (possibility)
(ii) (rationality)
(iii) is Pareto solution in (Pareto optimality)
(iv) If and , necessarily (independence of irrelevant alternatives)
(v) Be such linear transformation that is positive for and . Then (invariant to affine transformations)
(vi) If and are symmetrical, that is and , then the components of be equals (symmetry).
Condition (i) demands the possibility of the solution. Condition (ii) requires that none of the rational decision makers agree on a solution which is worse than the one could be achieved without consensus. On the basis of condition (iii) there is no better solution than the friendly solution. According to requirement (iv), if after the consensus some alternatives lost their possibility, but the solution is still possible, the solution remains the same for the reduced consequential space. If the dimension of any of the objective functions changes, the solution can't change. This is required by (v), and the last condition means that if two decision makers are in the absolutely same situation defining the conflict, we have to treat them in the same way in the case of solution. The following essential result originates from Nash:
Theorem 27.6 The (i)-(vi) conditions are satisfied by exactly one solution function, and can be given by as the
optimum problem unique solution.
Example 27.19 Let's consider again the consequential space showed in Figure 27.3 before, and suppose that , that is it comprises the worst values in its components. Then Exercise (27.44) is the following:
It's easy to see that the optimal solution is .
Notice that problem (27.44) is a distance dependent method, where we maximize the geometric distance from the point. The algorithm is the solution of the (27.44) optimum problem.
Condition (vi) requires that the two decision makers must be treated equally. However in many practical cases this is not an actual requirement if one of them is in stronger position than the other.
Theorem 27.7 Requirements (i)-(v) are satisfied by infinite number of functions, but every solution function comprises such , that the solution is given by as the
optimum problem unique solution.
Notice that in the case of , problem (27.45) reduces to problem (27.44). The algorithm is the solution of the (27.45) optimum problem.
Many author criticized Nash's original axioms, and beside the modification of the axiom system, more and more new solution concepts and methods were introduced. Without expose the actual axioms, we will show the methods judged to be of the utmost importance by the literature.
In the case of the Kalai–Smorodinsky solution we determine firstly the ideal point, which coordinates are:
then we will accept the last mutual point of the half-line joining to the ideal point and as solution. Figure 27.18. shows the method. Notice that this is an direction dependent method, where the half-line shows the direction of growing and is the chosen start point.
The algorithm is the solution of the following optimum problem.
provided that
Example 27.20 In the case of the previous example and . We can see in Figure 27.19, that the last point of the half-line joining to in is the intersection point of the half-line and the section joining to .
The equation of the half-line is
while the equation of the joining section is
so the intersect point: .
In the case of the equal-loss method we assume, that starting from the ideal point the two decision makers reduce the objective function values equally until they find a possible solution. This concept is equivalent to the solution of the
optimum problem. Let denote the minimal value, then the point is the solution of the conflict. The algorithm is the solution of the (27.46) optimum problem.
Example 27.21 In the case of the previous example , so starting from this point going by the line, the first possible solution is the point again.
In the case of the method of monotonous area the solution is given by as follows. The linear section joining to divides the set into two parts, if is a Pareto optimal solution. In the application of this concept we require the two areas being equal. Figure 27.20 shows the concept. The two areas are given by as follows:
and
where we suppose that defines the graph of the Pareto optimal solution. Thus we get a simple equation to determine the unknown value of .
The algorithm is the solution of the following nonlinear, univariate equation:
Any commonly known (bisection, secant, Newton's method) method can be used to solve the problem.
Exercises
27.6-1 Consider that . Be . Use the (27.44) optimum problem.
27.6-2 Assume that the two decision makers are not equally important in the previous exercise. . Solve the (27.45) optimum problem.
27.6-3 Use the Kalai–Smorodinsky solution for Exercise 27.6-1.
27.6-4 Use the equal-loss method for Exercise 27.6-1.
27.6-5 Use the method of monotonous area for Exercise 27.6-1.
PROBLEMS |
Prove that the solution of problem (27.9) is Pareto optimal for any positive values.
27-2
Distant dependent methods
Prove that the distance dependent methods always give Pareto optimal solution for . Is it also true for ?
27-3
Direction dependent methods
Find a simple example for which the direction dependent methods give non Pareto optimal solution.
27-4
More than one equilibrium
Suppose in addition to the conditions of 27.4. that all of the functions are strictly concave in . Give an example for which there are more than one equilibrium.
Prove that the Shapley values result imputation and satisfy the (27.35)–(27.36) conditions.
27-6
Group decision making table
Solve such a group decision making table where the method of paired comparison doesn't satisfy the requirement of transitivity. That is there are such alternatives for which , , but .
27-7
Application of Borda measure
Construct such an example, where the application of Borda measure equally qualifies all of the alternatives.
27-8
Kalai–Smorodinsky solution
Prove that using the Kalai–Smorodinsky solution for non convex , the solution is not necessarily Pareto optimal.
Show that for non convex , neither the equal-loss method nor the method of monotonous area can guarantee Pareto optimal solution.
CHAPTER NOTES |
Readers interested in multi-objective programming can find addition details and methods related to the topic in the [263] book. There are more details about the method of equilibrium and the solution concepts of the cooperative games in the [79] monograph. The [268] monograph comprises additional methods and formulas from the methodology of group decision making. Additional details to Theorem 27.6 originates from Hash can be found in [191]. One can read more details about the weakening of the conditions of this theorem in [107]. Details about the Kalai–Smorodinsky solution, the equal-loss method and the method of monotonous area can found respectively in [137], [53] and [3]. Note finally that the [271] summary paper discuss the axiomatic introduction and properties of these and other newer methods.
The results discussed in this chapter can be found in the book of Molnár Sándor and Szidarovszky Ferenc [182] in details.
The European Union and the European Social Fund have provided financial support to the project under the grant agreement no. TÁMOP 4.2.1/B-09/1/KMR-2010-0003.
Table of Contents
GPGPU stands for General-Purpose computation on Graphics Processing Units, also known as GPU Computing. Graphics Processing Units (GPU) are highly parallel, multithreaded, manycore processors capable of very high computation and data throughput. Once specially designed for computer graphics and programmable only through graphics APIs, today's GPUs can be considered as general-purpose parallel processors with the support for accessible programming interfaces and industry-standard languages such as C.
Developers who port their applications to GPUs often achieve speedups of orders of magnitude vs. optimized CPU implementations. This difference comes from the high floating point performance and peak memory bandwidth of GPUs. This is because the GPU is specialized for compute-intensive, highly parallel computation—exactly what graphics rendering is about—and therefore designed such that more transistors are devoted to data processing rather than data caching and flow control. From the developer's point of view this means that hardware latencies are not hidden, they must be managed explicitly, and writing an efficient GPU program is not possible without the knowledge of the architecture.
Another reason of discussing GPGPU computing as a specific field of computer science is that although a GPU can be regarded as a parallel system, its architecture is not a clean implementation of parallel computing models (see Chapter 15 of this book titled Parallel Computations). Instead, it has the features of many different models, like pipelines, vector or array processors, Single-Instruction Multiple-Data (SIMD) machines, stream-processors, multi-processors connected via shared memory, hard-wired algorithms, etc. So, when we develop solutions for this special architecture, the ideas applicable for many different architectures should be combined in creative ways.
GPUs are available as graphics cards, which must be mounted into computer systems, and a runtime software package must be available to drive the computations. A graphics card has programmable processing units, various types of memory and cache, and fixed-function units for special graphics tasks. The hardware operation must be controlled by a program running on the host computer's CPU through Application Programming Interfaces (API). This includes uploading programs to GPU units and feeding them with data. Programs might be written and compiled from various programming languages, some originally designed for graphics (like Cg [195] or HLSL [177]) and some born by the extension of generic programming languages (like CUDA C). The programming environment also defines a programming model or virtual parallel architecture that reflects how programmable and fixed-function units are interconnected. Interestingly, different programming models present significantly different virtual parallel architectures (Figure 28.1). Graphics APIs provide us with the view that the GPU is a pipeline or a stream-processor since this is natural for most of the graphics applications. CUDA [196] or OpenCL [146], on the other hand, gives the illusion that the GPU is a collection of multiprocessors where every multiprocessor is a wide SIMD processor composed of scalar units, capable of executing the same operation on different data. The number of multiprocessors in a single GPU can range nowadays up to a few hundreds and a single multiprocessor typically contains 8 or 16 scalar units sharing the instruction decoder.
Figure 28.1. GPU programming models for shader APIs and for CUDA. We depict here a Shader Model 4 compatible GPU. The programmable stages of the shader API model are red, the fixed-function stages are green.
The total number of scalar processors is the product of the number of multiprocessors and the number of SIMD scalar processors per multiprocessor, which can be well over a thousand. This huge number of processors can execute the same program on different data. A single execution of the program is called the thread. A multiprocessor executes a thread block. All processors have some fast local memory, which is only accessible to threads executed on the same processor, i.e. to a thread block. There is also global device memory to which data can be uploaded or downloaded from by the host program. This memory can be accessed from multiprocessors through different caching and synchronization strategies. Compared to the CPU, this means less transistors for caching, less cache performance in general, but more control for the programmer to make use of the memory architecture in an efficient way.
The above architecture favours the parallel execution of short, coherent computations on compact pieces of data. Thus, the main challenge of porting algorithms to the GPU is that of parallelization and decomposition to independent computational steps. GPU programs, which perform such a step when executed by the processing units, are often called kernels or shaders, the former alludes to the parallel data processing aspect and the latter is a legacy of the fundamental graphics task: the simulation of light reflection at object surfaces, better known as shading.
GPU programming languages and control APIs have grown pretty similar to each other in both capabilities and syntax, but they can still be divided into graphics and GPGPU solutions. The two approaches can be associated with two different programmer attitudes. While GPGPU frameworks try to add some constructs to programming languages to prepare regular code for parallel execution, graphics APIs extend previously very limited parallel shader programs into flexible computational tools. This second mindset may seem obsolete or only relevant in specific graphics-related scenarios, but in essence it is not about graphics at all: it is about the implicit knowledge of how parallel algorithms work, inherent to the incremental image synthesis pipeline. Therefore, we first discuss this pipeline and how the GPU device is seen by a graphics programmer. This will not only make the purpose and operation of device components clear, but also provides a valid and tried approach to general purpose GPU programming, and what GPU programs should ideally look like. Then we introduce the GPGPU approach, which abandons most of the graphics terminology and neglects task-specific hardware elements in favour of a higher abstraction level.
The graphics pipeline model provides an abstraction over the GPU hardware where we view it as a device which performs incremental image synthesis [264] (see Chapter 22 of this book, titled Computer Graphics of this book). Incremental image synthesis aims to render a virtual world defined by a numerical model by transforming it into linear primitives (points, lines, triangles), and rasterizing these primitives to pixels of a discrete image. The process is composed of several algorithmic steps, which are grouped in pipeline stages. Some of these stages are realized by dedicated hardware components while others are implemented through programs run by GPUs. Without going into details, let us recap the image synthesis process (Figure 28.2):
The virtual world is a collection of model instances. The models are approximated using triangle meshes. This is called tessellation.
In order to perform shading, the objects have to be transformed into the coordinate system where the camera and lights are specified. This is either the world space or the camera space.
Triangle vertices are projected on-screen according to the camera settings. Where a vertex should appear on the screen is found by applying the camera transformation, the perspective transformation, and finally the viewport transformation. In camera space the camera is in the origin and looks at the direction. Rays originating at the camera focus, called the eye position, and passing through points on the window that represent the pixels of our display form a perspective bundle. The role of perspective transformation is to convert this perspective bundle into a parallel bundle of rays, thus to replace perspective projection by a parallel projection. After perspective transformation, the vertices are in normalized device space where the visible volume is an axis aligned cube defined by inequalities , , . Parts of the geometric primitives that are outside of this volume are removed by clipping. Normalized device space is further transformed to screen space, where the target image resolution and position are taken into account. Points of normalized device space coordinates are mapped to the lower left corner of the viewport rectangle on the screen. Points of are projected to the upper right corner. Meanwhile, the range of is converted to .
In screen space every projected triangle is rasterized to a set of pixels. When an internal pixel is filled, its properties, including the coordinate, also called the depth value, and shading data are computed via incremental linear interpolation from the vertex data. For every pixel, a shading color is computed from the interpolated data. The shading color of a pixel inside the projection of the triangle might be the color interpolated from the vertex colors. Alternatively, we can map images called textures onto the meshes. Texture images are 2D arrays of color records. An element of the texture image is called the texel. How the texture should be mapped onto triangle surfaces is specified by texture coordinates assigned to every vertex.
Pixel colors are finally written to the frame buffer that is displayed on the computer screen. Besides the frame buffer, we maintain a depth buffer (also called z-buffer or depth stencil texture), containing screen space depth, which is the coordinate of the point whose color value is in the frame buffer. Whenever a triangle is rasterized to a pixel, the color and the depth are overwritten only if the new depth value is less than the depth stored in the depth buffer, meaning the new triangle fragment is closer to the viewer. As a result, we get a rendering of triangles correctly occluding each other in 3D. This process is commonly called the depth buffer algorithm. The depth buffer algorithm is also an example of a more general operation, which computes the pixel data as some function of the new data and the data already stored at the same location. This general operation is called merging.
The GPU architecture as presented by the graphics API is the direct implementation of the image synthesis pipeline (left part of Figure 28.1). This pipeline is configured by the CPU via graphics API calls, and its operation is initiated by the draw call. A sequence of draw calls during which the configuration of the pipeline does not change (but the inputs do) is called a pass. A single draw call operates on a sequence of vertices, the attributes of which are stored in a vertex buffer.
Vertices are expected to be specified in modeling space with homogeneous coordinates. A point of Cartesian coordinates can be defined by the quadruple of homogeneous coordinates using an arbitrary, non-zero scalar (for more details see Chapter 21 Computer Graphics of this book). This representation owns its name to the fact that if the elements of the quadruple are multiplied by the same scalar, then the represented point will not change. From homogeneous quadruple the Cartesian coordinates of the same point can be obtained by homogeneous division, that is as . Homogeneous coordinates have several advantages over Cartesian coordinates. When homogeneous coordinates are used, even parallel lines have an intersection (an ideal point,) thus the singularity of the Euclidean geometry caused by parallel lines is eliminated. Homogeneous linear transformations include perspective projection as well, which has an important role in rendering, but cannot be expressed as a linear function of Cartesian coordinates. Most importantly, the widest class of transformations that preserve lines and planes are those which modify homogeneous coordinates linearly.
Having set the vertex buffer, vertices defined by their coordinates and attributes like texture coordinates or color begin their journey down the graphics pipeline, visiting processing stages implemented by programmable shader processors or fixed-function hardware elements. We consider these stages one-by-one.
If the vertices do not directly define the final triangle mesh, but they are control points of a parametric surface or define just a coarse version of the mesh, the first step is the development of the final mesh, which is called tessellation. As the programmability of this stage is limited and its GPGPU potential is small, we do not discuss this stage further but assume that the vertex buffer contains the fine mesh that needs no tessellation.
Objects must be transformed to normalized device space for clipping, which is typically executed by a homogeneous linear transformation. Additionally, GPUs may also take the responsibility of illumination computation at the vertices of the triangle mesh. These operations are executed in the vertex shader. From a more general point of view, the vertex shader gets a single vertex at a time, modifies its attributes, including position, color, and texture coordinates, and outputs the modified vertex. Vertices are processed independently and in parallel.
The geometry shader stage receives vertex records along with primitive information. It may just pass them on as in the fixed-function pipeline, or spawn new vertices. Optionally, these may be written to an output buffer, which can be used as an input vertex buffer in a consecutive pass. A typical application of the geometry shader is procedural modeling, when a complex model is built from a single point or a triangle [170].
While vertex shaders have evolved from small, specialized units to general stream processors, they have kept the one record of output for every record of input scheme. The geometry shader, on the other hand, works on vertex shader output records (processed vertices), and outputs a varying (but limited) number of similar records.
The clipping hardware keeps only those parts of the primitives that are inside an axis aligned cube of corners and in normalized device space. In homogeneous coordinates, a point should meet the following requirements to be inside:
This formulation complies to the OpenGL [192] convention. It is valid e.g. in the Cg language when compiling for an OpenGL vertex shader profile. The last pair of inequalities can also be defined as , as Direct3D assumes. This is the case for Cg Direct3D profiles and in the HLSL standard. The difference is hidden by compilers which map vertex shader output to what is expected by the clipping hardware.
Clipping is executed by a fixed-function hardware of the GPU, so its operation can neither be programmed nor modified. However, if we wish our primitives to continue their path in further stages of the pipeline, the conditions of the clipping must be satisfied. In GPGPU, the clipping hardware is considered as a stream filter. If it turns out that a data element processed by vertex and geometry shader programs needs to be discarded, vertices should be set to move the primitive out of the clipping volume. Then the clipping hardware will delete this element from the pipeline.
After clipping the pipeline executes homogeneous division, that is, it converts homogeneous coordinates to Cartesian ones by dividing the first three homogeneous coordinates by the fourth (). The points are then transformed to screen space where the first two Cartesian coordinates select the pixel in which this point is visible.
The heart of the pipeline is the non-programmable rasterization stage. This is capable of converting linear primitives (triangles, line segments, points) into discrete fragments corresponding to digital image pixels. More simply put, it draws triangles if the screen coordinates of the vertices are given. Pipeline stages before the rasterizer have to compute these vertex coordinates, stages after it have to process the fragments to find pixel colors.
Even though the base functionality of all stages can be motivated by rasterization, GPGPU applications do not necessarily make use of drawing triangles. Still, the rasterizer can be seen to work as a stream expander, launching an array of fragment computations for all primitive computations, only the triangles have to be set up cleverly.
Rasterization works in screen space where the coordinates of the vertices are equal to those integer pixel coordinates where the vertices are projected. The vertices may have additional properties, such as a coordinate in screen space, texture coordinates and color values. When a triangle is rasterized, all those pixels are identified which fall into the interior of the projection of the triangle. The properties of the individual pixels are obtained from the vertex properties using linear interpolation.
The fragment properties interpolated from vertex properties are used to find the fragment color and possibly a modified depth value. The classical operation for this includes fetching the texture memory addressed by the interpolated texture coordinates and modulating the result with the interpolated color.
Generally, fragment shader programs get the interpolated properties of the fragment and output the color and optionally modify the depth of the fragment. Like the vertex shader, the fragment shader is also one-record-in, one-record-out type processor. The fragment shader is associated with the target pixel, so it cannot write its output anywhere else.
When final fragment colors are computed, they may not directly be written to the image memory, but the output merger stage is responsible for the composition. First, the depth test against the depth buffer is performed. Note that if the fragment shader does not modify the value, depth testing might be moved before the execution of the fragment shader. This early -culling might improve performance by not processing irrelevant fragments.
Finally, the output merger blends the new fragment color with the existing pixel color, and outputs the result. This feature could implement blending needed for transparent surface rendering (Figure 28.3).
In GPGPU, blending is mainly useful if we need to find the sum, minimum or maximum of results from consecutive computations without a need of reconfiguring the pipeline between them.
Figure 28.3. Blending unit that computes the new pixel color of the frame buffer as a function of its old color (destination) and the new fragment color (source).
In general purpose programming, we are used to concepts like input data, temporary data, output data, and functions that convert input data to temporary and finally to output data according to their parameters. If we wish to use the GPU as presented by a graphics API, our programming concepts should be mapped onto the concepts of incremental image synthesis, including geometric primitives, vertex/primitive/fragment processing, rasterization, texturing, merging, and final image. There are many different possibilities to establish this correspondence, and their comparative advantages also depend on the actual algorithm. Here we consider a few general approaches that have proven to be successful in high performance computing applications. First, we discuss how our general purpose programming concepts can be related to GPU features.
GPUs render images, i.e. two-dimensional arrays of pixels. The render target can be the frame buffer that is displayed or an output texture (in the latter case, the pixel is often referred to as a texel). In GPGPU the output is usually a texture since texels can be stored in floating point format unlike the final frame buffer values that are unsigned bytes. Furthermore, textures can be used later on as inputs of subsequent computation passes, i.e. the two-dimensional output texture can be interpreted as one or two-dimensional input texture in the next rendering pass, or as a single layer of a three-dimensional texture. In older GPUs, a pixel was capable of storing at most five floating point values since a color is typically identified by red, green, blue, and opacity values, and hidden surface elimination needed a single distance value, which is the coordinate of the point in screen coordinates. Later, with the emergence of multiple render targets, a pixel could be associated with several, e.g. four textures, which means that the maximum size of an output record could grow to 17 floats. In current, most advanced Shader Model 5.0 GPUs even this limitation has been lifted, so a single pixel can store a list of varying number of values.
Which pixel is targeted by the rendering process is determined by the geometric elements. Each primitive is transformed to screen space and its projection is rasterized which means that those pixels are targeted that are inside the projection. If more than one element is sent down the pipeline, their projections may overlap, so the pixel value is calculated multiple times. The merging unit combines these partial results, it may keep only one, e.g. the fragment having minimal screen space coordinate if depth testing is enabled, or it may add up partial results using blending.
An important property of the render target is that it can be read directly by none of the shader processors, and only the fragment shader processor can indirectly write into it via the possible merging operation. Different fragment shaders are assigned to different parts of the render target, so no synchronization problem may occur.
In image synthesis the inputs are the geometry stream and the textures used to color the geometry. As a triangle mesh geometry has usually no direct meaning in a GPGPU application, we use the geometry stream only as a control mechanism to distribute the computational load among the shader processors. The real GPGPU input will be the data stored in textures. The texture is a one-, two- or three-dimensional array of color data elements, which can store one, two, three or four scalars. In the most general case, the color has red, green, blue and opacity channels. These color values can be stored in different formats including, for example, unsigned bytes or 32 bit floats. From the point of view of GPGPU, 32 bit floats are the most appropriate.
A one-dimensional float texture is similar to the linear CPU memory where the usual data structures like arrays, lists, trees etc. can be encoded. However, the equivalence of the CPU memory and the GPU texture fails in two important aspects. In one, the texture is poorer, in the other, it is better than the linear CPU memory.
An apparent limitation is that a texture is parallel read-only for all programmable shaders with the exception of the render target that cannot be read by the shaders and is accessible only for the merger unit. Read-modify-write cycles, which are common in the CPU memory, are not available in shader programs. GPU designers had a good reason not to allow read-modify-write cycles and to classify textures as parallel read-only and exclusive write-only. In this way, the writes do not have to be cached and during reads caches get never invalidated.
On the other hand, the texture memory has much more addressing modes than a linear memory, and more importantly, they are also equipped with built-in texture filters. With the filters, a texture is not only an array of elements, but also a finite element representation of a one-, two-, or three-dimensional spatial function (refer to Section 28.7 to learn more of the relation between finite element representations and textures).
For one-dimensional textures, we can use linear filtering, which means that if the texture coordinate points to a location in between two texels of coordinates and , then the hardware automatically computes a linear interpolation of the two texel values. Let these texels be and . The filtered value returned for is then
Two-dimensional textures are filtered with bi-linear filtering taking the four texels closest to the interpolated texture coordinate pair . Let these be , , , and . The filtered value returned for is then
where and .
For three-dimensional textures, tri-linear filtering is implemented.
As the primitives flow through the pipeline, shader processors and fixed-function elements process them, determining the final values in each pixel. The programs of shader processors are not changed in a single rendering pass, so we can say that each pixel is computed by the very same program. The difference of pixel colors is due to data dependencies. So, in conclusion a GPU can be regarded as a hardware that computes an array of records.
In the GPU, primitives are processed by a series of processors that are either programmable or execute fixed algorithms while output pixels are produced. It means that GPUs can also be seen as stream processors. Vertices defining primitives enter a single virtual stream and are first processed by the vertex shader. With stream processing terminology, the vertex shader is a mapping since it applies a function to the vertex data and always outputs one modified vertex for each input vertex. So, the data frequency is the same at the output as it was at the input. The geometry shader may change the topology and inputting a single primitive, it may output different primitives having different number of vertices. The data frequency may decrease, when the stream operation is called reduction, or may increase, when it is called expansion. The clipping unit may keep or remove primitives, or may even change them if they are partially inside of the clipping volume. If we ignore partially kept primitives, the clipping can be considered as a stream filter. By setting the coordinates of the vertices in the vertex shader to be outside of the clipping volume, we can filter this primitive out of the further processing steps. Rasterization converts a primitive to possibly many fragments, so it is an expansion. The fragment shader is also a mapping similarly to the vertex shader. Finally, merging may act as a selection, for example, based on the coordinate or even as an accumulation if blending is turned on.
Shader processors get their stream data via dedicated registers, which are filled by the shader of the preceding step. These are called varying input. On the other hand, parameters can also be passed from the CPU. These parameters are called uniform input since they are identical for all elements of the stream and cannot be changed in a pass.
If the computation of the elements is done independently and without sharing temporary results, the parallel machines are called vector processors or array processors. As in the GPU hardware the fragment shader is associated with the elements of the output data, we use the fragment shader to evaluate output elements. Of course, the evaluation in a given processor must also be aware which element is being computed, which is the fundamental source of data dependency (it would not make sense to compute the very same data many times on a parallel machine). In the fragment shader, the index of the data element is in fact the pair of the pixel coordinates. This is available in screen space as a pair of two integers specifying the row and the column where the pixel is located.
In the simplest, but practically the most important case, we wish to have a result in all pixels in a single rendering pass. So we have to select a geometric primitive that is mapped to all pixels in screen space and a single pixel is mapped only once. Such a geometric primitive is the virtual display itself, thus we should render a rectangle or a quadrilateral that represents the window of our virtual camera. In screen space, this is the viewport rectangle, in clipping space, this is a square on the plane and having corners in homogeneous coordinates , , , . This rectangle is also called the full screen quad and is processed by the hardware as two triangles (Figure 28.4).
Suppose that we want to compute an output array of dimension from an input array of possibly different dimension and a global parameter with function :
To set up the GPU for this computation, we assign output array to the output texture that is the current render target. Texture size is chosen according to the output size, and the viewport is set to cover the entire render target. A two-dimensional array of horizontal resolution and vertical resolution is capable of storing elements. If , then it is up to us how horizontal and vertical resolutions are found. However, GPUs may impose restrictions, e.g. they cannot be larger than or, if we wish to use them as input textures in the next pass or compute binary reductions, the resolutions are preferred to be powers of two. If power of two dimensions are advantageous but the dimension of the array is different, we can extend the array by additional void elements.
According to vector processing principles, different output values are computed independently without sharing temporary results. As in the GPU hardware the fragment shader is associated with the elements of the output data and can run independently of other elements, we use the fragment shader to evaluate function . To find its parameters, we need to know , i.e. which element is currently computed, and should have an access to input array . The simplest way is to store the input array as an input texture (or multiple input textures if that is more convenient) since the fragment shader can access textures.
The only responsibility of the CPU is to set the uniform parameters, specify the viewport and send a full screen quad down the pipeline. Uniform parameters select the input texture and define global parameter . Assuming the OpenGL API, the corresponding CPU program in C would look like the following:
StartVectorOperation( ) { Set uniform parameters p and arrayX identifying the input texture glViewport(0, 0, H, V); // Set horizontal and vertical resolutions, H and V glBegin(GL_QUADS); // The next four vertices define a quad glVertex4f(-1,-1, 0, 1); // Vertices assuming normalized device space glVertex4f(-1, 1, 0, 1); glVertex4f( 1, 1, 0, 1); glVertex4f( 1,-1, 0, 1); glEnd( ); } }
Note that this program defines the rectangle directly in normalized device space using homogeneous coordinates passed as input parameters of the glVertex4f functions. So in the pipeline we should make sure that the vertices are not transformed.
For the shader program, the varying inputs are available in dedicated registers and outputs must also be written to dedicated registers. All of these registers are of type float4, that is, they can hold 4 float values. The role of the register is explained by its name. For example, the current value of the vertex position can be fetched from the POSITION register. Similar registers can store the texture coordinates or the color associated with this vertex.
The vertex shader gets the position of the vertex and is responsible for transforming it to the normalized device space. As we directly defined the vertices in normalized device space, the vertex shader simply copies the content of its input POSITION register to its output POSITION register (the input and output classification is given by the in and out keywords in front of the variable names assigned to registers):
void VertexShader( in float4 inputPos : POSITION, out float4 outputPos : POSITION ) { outputPos = inputPos; }
The geometry shader should keep the rectangle as it is without changing the vertex coordinates. As this is the default operation for the geometry shader, we do not specify any program for it. The rectangle leaving the geometry shader goes to the clipping stage, which keeps it since we defined our rectangle to be inside the clipping region. Then, Cartesian coordinates are obtained from the homogeneous ones by dividing the first three coordinates by the fourth one. As we set all fourth homogeneous coordinates to 1, the first three coordinates are not altered. After homogeneous division, the fixed-function stage transforms the vertices of the rectangle to the vertices of a screen space rectangle having the coordinates equal to the corners of the viewport and the coordinate to 0.5. Finally, this rectangle is rasterized in screen space, so all pixels of the viewport are identified as a target, and the fragment shader is invoked for each of them.
The fragment shader is our real computing unit. It gets the input array and global parameter as uniform parameters and can also find out which pixel is being computed by reading the WPOS register:
float FragmentShaderF( in float2 index : WPOS, // target pixel coordinates uniform samplerRECT arrayX, // input array uniform float p // global parameter p ) : COLOR // output is interpreted as a pixel color { float yi = F(index, arrayX, p); // F is the function to be evaluated return yi; }
In this program two input parameters were declared as uniform inputs by the uniform keyword, a float parameter p and the texture identification arrayX. The type of the texture is samplerRECT that determines the addressing modes how a texel can be selected. In this addressing mode, texel centers are on a two-dimensional integer grid. Note that here we used a different syntax to express what the output of the shader is. Instead of declaring a register as out, the output is given as a return value and the function itself, and is assigned to the output COLOR register.
To show concrete examples, we first implement the level 1 functionality of the Basic Linear Algebra Subprograms (BLAS) library (http://www.netlib.org/blas/) that evaluates vector functions of the following general form:
where and are vectors and is a scalar parameter. This operation is called SAXPY in the BLAS library. Now our fragment shader inputs two textures, vector and the original version of vector . One fragment shader processor computes a single element of the output vector:
float FragmentShaderSAXPY( in float2 index : WPOS, // target pixel coordinates uniform samplerRECT arrayX, // input array x uniform samplerRECT arrayY, // original version of y uniform float p // global parameter p ) : COLOR // output is interpreted as a pixel color { float yoldi = texRECT(arrayY, index); // yoldi = arrayY[index] float xi = texRECT(arrayX, index); // xi = arrayX[index] float yi = p * xi + yoldi; return yi; }
Note that instead of indexing an array of CPU style programming, here we fetch the element from a texture that represents the array by the texRECT Cg function. The first parameter of the texRECT function is the identification number of a two-dimensional texture, which is passed from the CPU as a uniform parameter, and the second is the texture address pointing to the texel to be selected.
Here we can observe how we can handle the limitation that a shader can only read textures but is not allowed to write into it. In the SAXPY operation, vector is an input and simultaneously also the output of the operation. To resolve this, we assign two textures to vector . One is the original vector in texture arrayY, and the other one is the render target. While we read the original value, the new version is produced without reading back from the render target, which would not be possible.
Another important example is the discrete convolution of two textures, an image and a filter kernel, which is the basic operation in many image processing algorithms:
where is the filtered value at pixel , is the original image, and is the filter kernel, which spans over pixels.
Now the fragment shader is responsible for the evaluation of a single output pixel according to the input image given as texture Image and the filter kernel stored in a smaller texture Weight. The half size of the filter kernel is passed as a uniform variable:
float3 FragmentShaderConvolution( in float2 index : WPOS, // target pixel coordinates uniform samplerRECT Image, // input image uniform samplerRECT Weight, // filter kernel uniform float M // size of the filter kernel ) : COLOR // a pixel of the filtered image { float3 filtered = float3(0, 0, 0); for(int i = -M; i <= M; i++) for(int j = -M; j <= M; j++) { float2 kernelIndex = float2(i, j); float2 sourceIndex = index + kernelIndex; filtered += texRECT(Image, sourceIndex) * texRECT(Weight, kernelIndex); } } return filtered; }
Note that this example was a linear, i.e. convolution filter, but non-linear filters (e.g. median filtering) could be implemented similarly. In this program we applied arithmetic operators (*, +=, =) for float2 and float3 type variables storing two and three floats, respectively. The Cg compiler and the GPU will execute these instructions independently on the float elements.
Note also that we did not care what happens at the edges of the image, the texture is always fetched with the sum of the target address and the shift of the filter kernel. A CPU implementation ignoring image boundaries would obviously be wrong, since we would over-index the source array. However, the texture fetching hardware implementing, for example, the texRECT function automatically solves this problem. When the texture is initialized, we can specify what should happen if the texture coordinate is out of its domain. Depending on the selected option, we get the closest texel back, or a default value, or the address is interpreted in a periodic way.
Exercises
28.3-1 Following the vector processing concept, write a pixel shader which, when a full screen quad is rendered, quantizes the colors of an input texture to a few levels in all three channels, achieving a cell shading effect.
28.3-2 Following the gathering data processing scheme, write a pixel shader which, when a full screen quad is rendered, performs median filtering on an input grayscale image, achieving dot noise reduction. The shader should fetch nine texel values from a neighborhood of , outputting the fifth largest.
28.3-3 Implement an anisotropic, edge preserving low-pass image filter with the gathering data processing scheme. In order to preserve edges, compute the Euclidean distance of the original pixel color and the color of a neighboring pixel, and include the neighbor in the averaging operation only when the distance is below a threshold.
28.3-4 Write a parallel Mandelbrot set rendering program by assuming that pixel corresponds to complex number and deciding whether or not the iteration diverges when started from . The divergence may be checked by iterating times and examining that is large enough. Divergent points are depicted with white, non-divergent points with black.
Imagining the GPU as a vector processor is a simple but efficient application of the GPU hardware in general parallel processing. If the algorithm is suitable for vector processing, then this approach is straightforward. However, some algorithms are not good candidates for vector processing, but can still be efficiently executed by the GPU. In this section, we review the basic approaches that can extend the vector processing framework to make the GPU applicable for a wider range of algorithms.
Vector processors are usually SIMD machines, which means that they execute not only the same program for different vector elements but always the very same machine instruction at a time. It means that vector operations cannot contain data dependent conditionals or loop lengths depending on the actual data. There is only one control sequence in a SIMD parallel program.
Of course, writing programs without if conditionals and using only constants as loop cycle numbers are severe limitations, which significantly affects the program structure and the ease of development. Early GPU shaders were also SIMD type processors and placed the burden of eliminating all conditionals from the program on the shoulder of the programmer. Current GPUs and their compilers solve this problem automatically, thus, on the programming level, we can use conditionals and variable length loops as if the shaders were MIMD computers. On execution level, additional control logic makes it possible that execution paths of scalar units diverge: in this case it is still a single instruction which is executed at a time, possibly with some scalar units being idle. Operations of different control paths are serialized so that all of them are completed. The overhead of serialization makes performance strongly dependent on the coherence of execution paths, but many transistors of control logic can be spared for more processing units.
The trick of executing all branches of conditionals with possibly disabled writes is called predication. Suppose that our program has an if statement like
if (condition(i)) { F( ); } else { G( ); }
Depending on the data, on some processors the may be true, while it is false on other processors, thus our vector machine would like to execute function of the first branch in some processors while it should evaluate function of the second branch in other processors. As in SIMD there can be only one control path, the parallel system should execute both paths and disable writes when the processor is not in a valid path. This method converts the original program to the following conditional free algorithm:
enableWrite = condition(i); F( ); enableWrite = !enableWrite; G( );
This version does not have conditional instructions so it can be executed by the SIMD machine. However, the computation time will be the the sum of computation times of the two functions.
This performance bottleneck can be attacked by decomposing the computation into multiple passes and by the exploitation of the early -culling feature. The early -cull compares the value of the fragment with the content of the depth buffer, and if it is smaller than the stored value, the fragment shader is not called for this fragment but the fragment processor is assigned to another data element. The early z-cull is enabled automatically if we execute fragment programs that do not modify the fragment's coordinate (this is the case in all examples discussed so far).
To exploit this feature, we decompose the computation into three passes. In the first pass, only the condition is evaluated and the depth buffer is initialized with the values. Recall that if the value is not modified, our full screen quad is on the plane in normalized device space, so it will be on the plane in screen space. Thus, to allow a discrimination according to the condition, we can set values in the range if the condition is true and in if it is false.
The fragment shader of the first pass computes just the condition values and stores them in the depth buffer:
float FragmentShaderCondition( in float2 index : WPOS, // target pixel coordinates uniform samplerRECT Input, // input vector ) : DEPTH // the output goes to the depth buffer { bool condition = ComputeCondition( texRECT(Input, index) ); return (condition) ? 0.8 : 0.2; // 0.8 is greater than 0.5; 0.2 is smaller than 0.5 }
Then we execute two passes for the evaluation of functions and , respectively. In the first pass, the fragment shader computes and the depth comparison is set to pass those fragments where their coordinate is less than the depth value stored in the depth buffer. In this pass, only those fragments are evaluated where the depth buffer has 0.8 value, i.e. where the previous condition was true. Then, in the second pass, the fragment shader is set to compute while the depth buffer is turned to keep those fragments where the fragment's depth is greater than the stored value.
In Subsection 28.7.1 we exploit early -culling to implement a variable length loop in fragment processors.
The vector processing principle assumes that the output is an array where elements are obtained independently. The array should be large enough to keep every shader processor busy. Clearly, if the array has just one or a few elements, then only one or a few shader processors may work at a time, so we loose the advantages of parallel processing.
In many algorithms, the final result is not a large array, but is a single value computed from the array. Thus, the algorithm should reduce the dimension of the output. Doing the reduction in a single step by producing a single texel would not benefit from the parallel architecture. Thus, reduction should also be executed in parallel, in multiple steps. This is possible if the operation needed to compute the result from the array is associative, which is the case for the most common operations, like sum, average, maximum, or minimum.
Suppose that the array is encoded by a two-dimensional texture. At a single phase, we downsample the texture by halving its linear resolution, i.e. replacing four neighboring texels by a single texel. The fragment shaders will compute the operation on four texels. If the original array has resolution, then reduction steps are needed to obtain a single output value. In the following example, we compute the sum of the elements of the input array (Figure 28.5). The CPU program renders a full screen quad in each iteration having divided the render target resolution by two:
Reduction( ) { Set uniform parameter arrayX to identify the input texture for(N /= 2 ; N >= 1; N /= 2) { // log_2 N iterations glViewport(0, 0, N, N); // Set render target dimensions to hold NxN elements glBegin(GL_QUADS); // Render a full screen quad glVertex4f(-1,-1, 0, 1); glVertex4f(-1, 1, 0, 1); glVertex4f( 1, 1, 0, 1); glVertex4f( 1,-1, 0, 1); glEnd( ); Copy render target to input texture arrayX } }
The fragment shader computes a single reduced texel from four texels as a summation in each iteration step:
float FragmentShaderSum( ) ( in float2 index : WPOS, // target pixel coordinates uniform samplerRECT arrayX, // input array x ) : COLOR // output is interpreted as a pixel color { float sum = texRECT(arrayX, 2 * index); sum += texRECT(arrayX, 2 * index + float2(1, 0)); sum += texRECT(arrayX, 2 * index + float2(1, 1)); sum += texRECT(arrayX, 2 * index + float2(0, 1)); return sum; }
Note that if we exploited the bi-linear filtering feature of the texture memory, then we could save three texture fetch operations and obtain the average in a single step.
In vector processing a processor is assigned to each output value, i.e. every processor should be aware which output element it is computing and it is not allowed to deroute its result to somewhere else. Such a static assignment is appropriate for gathering type computations. The general structure of gathering is that we may rely on a dynamically selected set of input elements but the variable where the output is stored is known a-priory:
index = ComputeIndex( ); // index of the input data y = F(x[index]);
Opposed to gathering, algorithms may have scattering characteristics, i.e. a given input value may end up in a variable that is selected dynamically. A simple scatter operation is:
index = ComputeIndex( ); // index of the output data y[index] = F(x);
Vector processing frameworks and our fragment shader implementation are unable to implement scatter since the fragment shader can only write to the pixel it has been assigned to.
If we wish to solve a problem having scattering type algorithm on the GPU, we have two options. First, we can restructure the algorithm to be of gathering type. Converting scattering type parallel algorithms to gathering type ones requires a change of our viewpoint how we look at the problem and its solution. For example, when integral equations or transport problems are considered, this corresponds to the solution of the adjoint problem [265]. Secondly, we can move the index calculation up to the pipeline and use the rasterizer to establish the dynamic correspondence between the index and the render target (Figure 28.6).
Let us consider a famous scattering type algorithm, histogram generation. Suppose we scan an input array of dimension , evaluate function for the elements, and calculate output array of dimension that stores the number of function values that are in bins equally subdividing range .
A scalar implementation of histogram generation would be:
Histogram( x ) { for(int i = 0; i < M; i++) { index = (int)((F(x[i]) - Fmin)/(Fmax - Fmin) * N); // bin index = max(index, 0); index = min(index, N-1); y[index] = y[index] + 1; } }
We can see that the above function writes to the output array at random locations, meaning it cannot be implemented in a fragment shader which is only allowed to write the render target at its dedicated index. The problem of scattering will be solved by computing the index in the vertex shader but delegating the responsibility of incrementing to the rest of the pipeline. The indices are mapped to output pixels by the rasterization hardware. The problem of read-modify-write cycles might be solved by starting a new pass after each increment operation and copying the current render target as an input texture of the next rendering pass. However, this solution would have very poor performance and would not utilize the parallel hardware at all. A much better solution uses the arithmetic capabilities of the merging unit. The fragment shader generates just the increment (i.e. value 1) where the histogram needs to be updated and gives this value to the merging unit. The merging unit, in turn, adds the increment to the content of the render target.
The CPU program generates a point primitive for each input data element. Additionally, it sets the render target to match the output array and also enables the merging unit to execute add operations:
ScanInputVector( ) { Set uniform parameters Fmin, Fmax, N glDisable(GL_DEPTH_TEST); // Turn depth buffering off glBlendFunc(GL_ONE, GL_ONE); // Blending operation: dest = source * 1 + dest * 1; glEnable(GL_BLEND); // Enable blending glViewport(0, 0, N, 1); // Set render target dimensions to hold N elements glBegin(GL_POINTS); // Assign a point primitive to each input elements for(int i = 0; i < M; i++) { glVertex1f( x[i] ); // an input element as a point primitive } glEnd( ); }
The vertex positions in this level are not important since it turns out later where this point will be mapped. So we use the first coordinate of the vertex to pass the current input element .
The vertex shader gets the position of the vertex currently storing the input element, and finds the location of this point in normalized device space. First, function is evaluated and the bin index is obtained, then we convert this index to the range since in normalized device space these will correspond to the extremes of the viewport:
void VertexShaderHistogram( in float inputPos : POSITION, out float4 outputPos : POSITION, uniform float Fmin, uniform float Fmax, uniform float N ) { float xi = inputPos; int index = (int)((F(xi) - Fmin)/(Fmax - Fmin) * N); // bin index = max(index, 0); index = min(index, N-1); float nindex = 2.0 * index / N - 1.0; // normalized device space outputPos = float4(nindex, 0, 0, 1); // set output coordinates }
The above example is not optimized. Note that the index calculation and the normalization could be merged together and we do not even need the size of the output array to execute this operation.
The fragment shader will be invoked for the pixel on which the point primitive is mapped. It simply outputs an increment value of 1:
float FragmentShaderIncr( ) : COLOR // output is interpreted as a pixel color { return 1; // increment that is added to the render target by merging }
Figure 28.7. Caustics rendering is a practical use of histogram generation. The illumination intensity of the target will be proportional to the number of photons it receives (images courtesy of Dávid Balambér).
Parallel processors running independently offer a linear speed up over equivalent scalar processor implementations. However, scalar processors may benefit from recognizing similar parts in the computation of different output values, so they can increase their performance utilizing reuse. As parallel processors may not reuse data generated by other processors, their comparative advantages become less attractive.
GPUs are parallel systems of significant streaming capabilities, so if data that can be reused are generated early, we can get the advantages of both independent parallel processing and the reuse features of scalar computing.
Our main stream expander is the rasterization. Thus anything happens before rasterization can be considered as a global computation for all those pixels that are filled with the rasterized version of the primitive. Alternatively, the result of a pass can be considered as an input texture in the next pass, so results obtained in the previous pass can be reused by all threads in the next pass.
Exercises
28.4-1 Implement a parallel regula falsi equation solver for that searches for roots in for many different and parameters. The and parameters are stored in a texture and the pixel shader is responsible for iteratively solving the equation for a particular parameter pair. Terminate the iteration when the error is below a given threshold. Take advantage of the early -culling hardware to prevent further refinement of the terminated iterations. Analyze the performance gain.
28.4-2 Based on the reduction scheme, write a program which applies simple linear tone mapping to a high dynamic range image stored in a floating-point texture. The scaling factor should be chosen to map the maximum texel value to the value of one. Find this maximum using iterative reduction of the texture.
28.4-3 Based on the concept of scatter, implement a caustics renderer program (Figure 28.7). The scene includes a point light source, a glass sphere, and a diffuse square that is visualized on the screen. Photons with random directions are generated by the CPU and passed to the GPU as point primitives. The vertex shader traces the photon through possible reflections or refractions and decides where the photon will eventually hit the diffuse square. The point primitive is directed to that pixel and the photon powers are added by additive alpha blending.
28.4-4 Based on the concept of scatter, given an array of GSM transmitter tower coordinates, compute cell phone signal strength on a 2D grid. Assume signal strength diminishes linearly with the distance to the nearest transmitter. Use the rasterizer to render circular features onto a 2D render target, and set up blending to pick the maximum.
The Compute Unified Device Architecture (CUDA) and the OpenCL interfaces provide the programmer with a programming model that is significantly different from the graphics pipeline model (right of Figure 28.1). It presents the GPU as a collection of multiprocessors where each multiprocessor contains several SIMD scalar processors. Scalar processors have their own registers and can communicate inside a multiprocessor via a fast shared memory. Scalar processors can read cached textures having built-in filtering and can read or write the slow global memory. If we wish, even read-modify-write operations can also be used. Parts of the global memory can be declared as a texture, but from that point it becomes read-only.
Unlike in the graphics API model, the write to the global memory is not exclusive and atomic add operations are available to support semaphores and data consistency. The fixed-function elements like clipping, rasterization, and merging are not visible in this programming model.
Comparing the GPGPU programming model to the graphics API model, we notice that it is cleaner and simpler. In the GPGPU programming model, parallel processors are on the same level and can access the global memory in an unrestricted way, while in the graphics API model, processors and fixed-function hardware form streams and write is possible only at the end of the stream. When we program through the GPGPU model, we face less restrictions than in the graphics pipeline model. However, care should be practiced since the graphics pipeline model forbids exactly those features that are not recommended to use in high performance applications.
The art of programming the GPGPU model is an efficient decomposition of the original algorithm to parallel threads that can run with minimum amount of data communication and synchronization, but always keep most of the processors busy. In the following sections we analyze a fundamental operation, the matrix-vector multiplication, and discuss how these requirements can be met.
Computational problems are based on mathematical models and their numerical solution. The numerical solution methods practically always rely on some kind of linearization, resulting in algorithms that require us to solve linear systems of equations and perform matrix-vector multiplication as a core of the iterative solution. Thus, matrix-vector multiplication is a basic operation that can be, if implemented efficiently on the parallel architecture, the most general building block in any numerical algorithm. We define the basic problem to be the computation of the result vector from input matrix , vectors and , as
We call this the MV problem. Let be the dimensions of matrix . As every input vector element may contribute to each of the output vector elements, a scalar CPU implementation would contain a double loop, one loop scans the input elements while the other the output elements. If we parallelize the algorithm by assigning output elements to parallel threads, then we obtain a gathering type algorithm where a thread gathers the contributions of all input elements and aggregates them to the thread's single output value. On the other hand, if we assigned parallel threads to input elements, then a thread would compute the contribution of this input element to all output elements, which would be a scatter operation. In case of gathering, threads share only input data but their output is exclusive so no synchronization is needed. In case of scattering, multiple threads may add their contribution to the same output element, so atomic adds are needed, which may result in performance degradation.
An implementation of the matrix-vector multiplication on a scalar processor looks like the following:
void ScalarMV(int N, int M, float* y, const float* A, const float* x, const float* b) { for(int i=0; i<N; i++) { float yi = b[i]; for(int j=0; j<M; j++) yi += A[i * M + j] * x[j]; y[i] = yi; } }
The first step of porting this algorithm to a parallel machine is to determine what a single thread would do from this program. From the options of gathering and scattering, we should prefer gathering since that automatically eliminates the problems of non-exclusive write operations. In a gathering type solution, a thread computes a single element of vector and thus we need to start threads. A GPU can launch a practically unlimited number of threads that are grouped in thread blocks. Threads of a block are assigned to the same multiprocessor. So the next design decision is how the threads are distributed in blocks. A multiprocessor typically executes 32 threads in parallel, so the number of threads in a block should be some multiple of 32. When the threads are halted because of a slow memory access, a hardware scheduler tries to continue the processing of other threads, so it is wise to assign more than 32 threads to a multiprocessor to always have threads that are ready to run. However, increasing the number of threads in a single block may also mean that at the end we have just a few blocks, i.e. our program will run just on a few multiprocessors. Considering these, we assign 256 threads to a single block and hope that exceeds the number of multiprocessors and thus we fully utilize the parallel hardware.
There is a slight problem if is not a multiple of . We should assign the last elements of the vector to some processors as well, so the thread block number should be the ceiling of . As a result of this, we shall have threads that are not associated with vector elements. It is not a problem if the extra threads can detect it and cause no harm, e.g. they do not over-index the output array.
Similarly to the discussed vector processing model, a thread should be aware which output element it is computing. The CUDA library provides implicit input parameters that encode this information: blockIdx is the index of the thread block, blockDim is the number of threads in a block, and threadIdx is the index of the thread inside the block.
The program of the CUDA kernel computing a single element of the output vector is now a part of a conventional CPU program:
__global__ void cudaSimpleMV(int N, int M, float* y, float* A, float* x, float* b) { // Determine element to process from thread and block indices int i = blockIdx.x * blockDim.x + threadIdx.x; if(i < N) { // if the index is out of the range of the output array, skip. float yi = b[i]; for(int j=0; j<M; j++) yi += A[i * M + j] * x[j]; y[i] = yi; } }
The global keyword tells the compiler that this function will run not on the CPU but on the GPU and it may be invoked from the CPU as well. The parameters are passed according to the normal C syntax. The only special feature is the use of the implicit parameters to compute the identification number of this thread, which is the index of the output array.
The kernels are started from a CPU program that sets the parameters and also defines the number of thread blocks and the number of threads inside a block.
__host__ void run_cudaSimpleMV() { int threadsPerBlock = 256; // number of threads per block int blockNum = (N + threadsPerBlock - 1)/threadsPerBlock; // number of blocks cudaSimpleMV<<<blockNum, threadsPerBlock>>>(N, M, y, A, x, b); }
The compiler will realize that this function runs on the CPU by reading the host keyword. The parallel threads are started like a normal C function call with the exception of the <blockNum, threadsPerBlock> tag, which defines how many threads should be started and how they are distributed among the multiprocessors.
So far, we assigned matrix rows to parallel threads and computed scalar product serially inside threads. If the number of matrix rows is less than the number of parallel scalar processors, this amount of parallelization is not enough to supply all processing units with work to do, and the execution of individual threads will be lengthy. Reformulating the scalar product computation is a well known, but tougher parallelization problem, as the additions cannot be executed independently, and we require a single scalar to be written for every row of the matrix. However, parts of the summation can be executed independently, and then the results added. This is a classic example of reduction. It is required that the threads whose results are to be added both finish execution and write their results to where they are accessible for the thread that needs to add them. Thus, we use thread synchronization and shared memory available only for the threads of the same block.
Let us assume first—unrealistically—that we can have threads processing a row and the shared memory can hold floating point values. Let be the vector of length residing in shared memory. Then, every thread can compute one element as . Finally, elements of must be reduced by summation. Let us further assume that . The reduction can be carried out in steps, terminating half of the threads, while each surviving thread adds the value in computed by a terminated one to its own. The final remaining thread outputs the value to the global memory.
#define M THE_NUMBER_OF_MATRIX_COLUMNS __global__ void cudaReduceMV(int N, float* y, float* A, float* x, float* b) { int i = blockIdx.x; int j = threadIdx.x; __shared__ float Q[M]; // in the shader memory inside a multiprocessor Q[j] = A[i * M + j] * x[j]; // a parallel part of matrix-vector multiplication for(int stride = M / 2; stride > 0; stride >>= 1) // reduction { __syncthreads(); // wait until all other threads of the block arrive this point if(j + stride < M) Q[j] += Q[j + stride]; } if(j == 0) // reduced to a single element y[i] = Q[0] + b[i]; } __host__ void run_cudaReduceMV() { cudaReduceMV<<< N, M >>>(N, y, A, x, b); }
For practical matrix dimensions (), neither the number of possible threads of a single multiprocessor nor the size of the shared memory is enough to process all elements in parallel. In our next example, we use a single block of threads with limited size to process a large matrix. First, we break the output vector into segments of size . Elements within such a segment are evaluated in parallel, then the threads proceed to the next segment. Second, for every scalar product computation, we break the vectors and into segments of length . We maintain a shared vector of length for every row being processed in parallel. We can compute the elementwise product of the and segments in parallel, and add it to . As rows are being processed by threads each, the block will consist of threads. From one thread's perspective this means it has to loop over with a stride of , and for every such element in , loop over and with a stride of . Also for every element in , the contents of must be summed by reduction as before. The complete kernel which works with large matrices would then be:
__global__ void cudaLargeMV(int N, int M, float* y, float* A, float* x, float* b) { __shared__ float Q[T * Z]; // stored in the shared memory inside a multiprocessor int t = threadIdx.x / Z; int z = threadIdx.x % Z; for(int i = t; i < N; i += T) { Q[t * Z + z] = 0; for(int j = z; j < M; j += Z) Q[t * Z + z] += A[i * M + j] * x[j]; for(int stride = Z / 2; stride > 0; stride >>= 1) { __syncthreads(); if(z + stride < Z) Q[t * Z + z] += Q[t * Z + z + stride]; } if(z == 0) y[i] = Q[t * Z + 0] + b[i]; } } __host__ void run_cudaLargeMV() { cudaReduceMV<<< 1, T*Z >>>(N, M, y, A, x, b); }
This can easily be extended to make use of multiple thread blocks by restricting the outer loop to only a fraction of the matrix rows based on the blockIdx parameter.
The above algorithm uses shared memory straightforwardly and allows us to align memory access of threads through a proper choice of block sizes. However, every element of vector must be read once for the computation of every row. We can improve on this if we read values of into the shared memory and have threads in one block operate on multiple rows of the matrix. This, however, means we can use less shared memory per line to parallelize summation. The analysis of this trade-off is beyond the scope of this chapter, but a block size of has been proposed in [88]. With such a strategy it is also beneficial to access matrix as a texture, as data access will exhibit 2D locality, supported by texture caching hardware.
Even though matrix-vector multiplication is a general mathematical formulation for a wide range of computational problems, the arising matrices are often large, but sparse. In case of sparse matrices, the previously introduced matrix-vector multiplication algorithms will not be efficient as they explicitly compute multiplication with zero elements. Sparse matrix representations and MV algorithms are discussed in [26].
Exercises
28.6-1 Implement matrix-vector multiplication for large matrices in CUDA. Compare results to a CPU implementation.
28.6-2 Implement an inverse iteration type Julia set renderer. The Julia set is the attractor of the iteration where and are complex numbers. Inverse iteration starts from a fixed point of the iteration formula, and iterates the inverse mapping, by randomly selecting either or from the two possibilities. Threads must use pseudo-random generators that are initialized with different seeds. Note that CUDA has no built-in random number generator, so implement one in the program.
Problems emerging in physics or engineering are usually described mathematically as a set of partial differential or integral equations. As physical systems expand in space and time, derivatives or integrals should be evaluated both in temporal and spatial domains.
Figure 28.8. Finite element representations of functions. The texture filtering of the GPU directly supports finite element representations using regularly placed samples in one-, two-, and three-dimensions and interpolating with piece-wise constant and piece-wise linear basis functions.
When we have to represent a value over space and time, we should use functions having the spatial position and the time as their variables. The representation of general functions would require infinite amount of data, so in numerical methods we only approximate them with finite number of values. Intuitively, these values can be imagined as the function values at discrete points and time instances. The theory behind this is the finite element method. If we need to represent function with finite data, we approximate the function in the following finite series form (Figure 28.8):
where are pre-defined basis functions and are the coefficients that describe .
A particularly simple finite element representation is the piece-wise linear scheme that finds possibly regularly placed sample points in the domain, evaluates the function at these points to obtain the coefficients and linearly interpolates between and .
When the system is dynamic, solution will be time dependent, so a new finite element representation is needed for every time instance. We have basically two options for this. We can set sample points in a static way and allow only coefficients to change in time. This approach is called Eulerian. On the other hand, we can also allow the sample points to move with the evaluation of the system, making also sample points time dependent. This is the Lagrangian approach, where sample locations are also called particles.
Intuitive examples of Eulerian and Lagrangian discretization schemes are how temperature and other attributes are measured in meteorology. In ground stations, these data are measured at fixed locations. However, meteorological balloons can also provide the same data, but from varying positions that follow the flow of the air.
In this section we discuss a case study for GPU-based scientific computation. The selected problem is computational fluid dynamics. Many phenomena that can be seen in nature like smoke, cloud formation, fire, and explosion show fluid-like behavior. Understandably, there is a need for good and fast fluid solvers both in engineering and in computer animation.
The mathematical model of the fluid motion is given by the Navier-Stokes equation. First we introduce this partial differential equation, then discuss how GPU-based Eulerian and Langrangian solvers can be developed for it.
A fluid with constant density and temperature can be described by its velocity and pressure fields. The velocity and the pressure vary both in space and time:
Let us focus on a fluid element of unit volume that is at point at time . At an earlier time instance , this fluid element was in and, according to the fundamental law of dynamics, its velocity changed according to an acceleration that is equal to total force divided by mass of this unit volume fluid element:
Mass of a unit volume fluid element is called the fluid density. Moving the velocity terms to the left side and dividing the equation by , we can express the substantial derivative of the velocity:
The total force can stem from different sources. It may be due to the pressure differences:
where is the gradient of the pressure field. The minus sign indicates that the pressure accelerates the fluid element towards the low pressure regions. Here we used the nabla operator, which has the following form in a Cartesian coordinate system:
Due to friction, the fluid motion is damped. This damping depends on the viscosity of the fluid. Highly viscous fluids like syrup stick together, while low-viscosity fluids flow freely. The total damping force is expressed as a diffusion term since the viscosity force is proportional to the Laplacian of the velocity field:
Finally, an external force field may also act on our fluid element causing acceleration. In the gravity field of the Earth, assuming that the vertical direction is axis , this external acceleration is where [ ].
Adding the forces together, we can obtain the Navier-Stokes equation for the velocity of our fluid element:
In fact, this equation is the adaptation of the fundamental law of dynamics for fluids. If there is no external force field, the momentum of the dynamic system must be preserved. This is why this equation is also called momentum conservation equation.
Closed physics systems preserve not only the momentum but also the mass, so this aspect should also be built into our fluid model. Simply put, the mass conservation means that what flows into a volume must also flow out, so the divergence of the mass flow is zero. If the fluid is incompressible, then the fluid density is constant, thus the mass flow is proportional to the velocity field. For incompressible fluids, the mass conservation means that the velocity field is divergence free:
The Eulerian approach tracks the evolution of the velocity and pressure fields on fixed, uniform grid points. The grid allows a simple approximation of spatial derivatives by finite differences. If the grid points are in distances , , and along the three coordinate axes and the values of scalar field and vector field at grid point are and , respectively, then the gradient, the divergence and the Laplacian operators can be approximated as:
The Navier-Stokes equation and the requirement that the velocity is divergence free define four scalar equations (the conservation of momentum is a vector equation) with four scalar unknowns (, , , ). The numerical solver computes the current fields advancing the time in discrete steps of length :
The velocity field is updated in several steps, each considering a single term on the right side of this equation. Let us consider these steps one-by-one.
To initialize the new velocity field at point , we fetch the previous field at position since the fluid element arriving at point was there in the previous time step [261]. This step computes advection, i.e. the phenomenon that the fluid carries its own velocity field:
To damp the velocity field, we could update it proportionally to a diffusion term:
However, this type of forward Euler integrator is numerically unstable. The reason of instability is that forward methods predict the future based on the present values, and as time passes, each simulation step adds some error, which may accumulate and exceed any limit.
Unlike forward integrators, a backward method can guarantee stability. A backward looking approach is stable since while predicting the future, it simultaneously corrects the past. Thus, the total error converges to a finite value and remains bounded. Here a backward method means that the Laplacian is obtained from the future, yet unknown velocity field, and not from the current velocity field:
At this step of the computation, the advected field is available at the grid points, the unknowns are the diffused velocity for each of the grid points. Using (28.5) to compute the Laplacian of the coordinates of unknown vector field at grid point , we observe that it will be a linear function of the velocities in the grid point and its neighbors. Thus, (28.6) is a sparse linear system of equations:
where vector is the vector of the known velocities obtained by advection, is the vector of unknown velocities of the grid points, and matrix-vector multiplication represents the discrete form of .
Such systems are primary candidates for Jacobi iteration (see Chapter 12 of this book, titled Scientific Computation). Initially we fill vector with zero and evaluate the right side of (28.7) iteratively, moving the result of the previous step to vector of the right side. Thus, we traced back the problem to a sequence of sparse vector-matrix multiplications. Note that matrix needs not be stored. When velocity field is needed at a grid point, the neighbors are looked up and the simple formula of (28.5) gives us the result.
Updating a value in a grid point according to its previous value and the values of its neighbors are called image filtering. Thus, a single step of the Jacobi iteration is equivalent to an image filtering operation, which is discussed in Section 28.3.2.
So far, we calculated an updated velocity field without considering the unknown pressure field. In the projection step, we compute the unknown pressure field and update the velocity field with it:
The pressure field is obtained from the requirement that the final velocity field must be divergence free. Let us apply the divergence operator to both sides of this equation. After this, the left side becomes zero since we aim at a divergence free vector field for which :
Assuming a regular grid where vector field is available, searching the unknown pressure at grid positions, and evaluating the divergence and the Laplacian with finite differences of equations (28.4) and (28.5), respectively, we again end up with a sparse linear system for the discrete pressure values and consequently for the difference between the final velocity field and . This system is also solved with Jacobi iteration. Similarly to the diffusion step, the Jacobi iteration of the projection is also a simple image filtering operation.
The discretized velocity and pressure fields can be conveniently stored in three-dimensional textures, where discrete variables are defined at the centers of elemental cubes, called voxels of a grid [106]. At each time step, the content of these data sets should be refreshed (Figure 28.9).
The representation of the fields in textures has an important advantage when the advection is computed. The advected field at voxel center is obtained by copying the field value at position . Note that the computed position is not necessarily a voxel center, but it can be between the grid points. According to the finite element concept, this value can be generated from the finite element representation of the data. If we assume piece-wise linear basis functions, then the texture filtering hardware automatically solves this problem for us at no additional computation cost.
Figure 28.10. Computation of the simulation steps by updating three-dimensional textures. Advection utilizes the texture filtering hardware. The linear equations of the viscosity damping and projection are solved by Jacobi iteration, where a texel (i.e. voxel) is updated with the weighted sum of its neighbors, making a single Jacobi iteration step equivalent to an image filtering operation.
The disadvantage of storing vector and scalar fields in three-dimensional textures is that the GPU can only read these textures no matter whether we take the graphics API or the GPGPU approach. The updated field must be written to the render target in case of the graphics API approach, and to the global memory if we use a GPGPU interface. Then, for the next simulation step, the last render target or global memory should be declared as an input texture.
In order to avoid write collisions, we follow a gathering approach and assign threads to each of the grid points storing output values. If GPUs fetch global data via textures, then the new value written by a thread becomes visible when the pass or the thread run is over, and the output is declared as an input texture for the next run. Thus, the computation of the time step should be decomposed to elemental update steps when the new output value of another grid point is needed. It means that we have and advection pass, a sequence of Jacobi iteration passes of the diffusion step, an external force calculation pass, and another sequence of Jacobi iteration passes of the projection step. With a GPGPU framework, a thread may directly read the data produced by another thread, but then synchronization is needed to make sure that the read value is already valid, so not the old but the new value is fetched. In such cases, synchronization points have the same role and passes or decomposed kernels.
In case of graphics APIs, there is one additional limitation. The render target can only be two-dimensional, thus either we flatten the layers of the three-dimensional voxel array into a large two-dimensional texture, or update just a single layer at a time. Flattened three-dimensional textures are shown by Figure 28.11. Once the textures are set up, one simulation step of the volume can be done by the rendering of a quad covering the flattened grid.
The graphics API approach has not only drawbacks but also an advantage over the GPGPU method, when the linear systems are solved with Jacobi iteration. The graphics API method runs the fragment shader for each grid point to update the solution in the texel associated with the grid point. However, if the neighbor elements of a particular grid point are negligible, we need less iteration steps than in a grid point where the neighbor elements are significant. In a quasi-SIMD machine like the GPU, iterating less in some of the processors is usually a bad idea. However, the exploitation of the early -culling hardware helps to sidestep this problem and boosts the performance [267]. The coordinate in the depth value is set proportionally to the maximum element in the neighborhood and to the iteration count. This way, as the iteration proceeds, the GPU processes less and less number of fragments, and can concentrate on important regions. According to our measurements, this optimization reduces the total simulation time by about 40%.
When we wish to visualize the flow, we can also assume that the flow carries a scalar display variable with itself. The display variable is analogous with some paint or confetti poured into the flow. The display variable is stored in a float voxel array.
Using the advection formula for display variable , its field can also be updated in parallel with the simulation of time step :
At a time, the color and opacity of a point can be obtained from the display variable using a user controlled transfer function.
We can use a 3D texture slicing rendering method to display the resulting display variable field, which means that we place semi-transparent polygons perpendicular to the view plane and blend them together in back to front order (Figure 28.12). The color and the opacity of the 3D texture is the function of the 3D display variable field.
In the Lagrangian approach, the space is discretized by identifying particles, i.e. following just finite number of fluid elements. Let us denote the position and the velocity of the th discrete fluid element by and , respectively. We assume that all particles represent fluid elements of the same mass , but as the density varies in space and will be the attribute of the particle, every particle is associated with a different volume of the fluid. The momentum conservation equation has the following form in this case:
If particles do not get lost during the simulation, the mass is automatically conserved. However, temporarily this mass may concentrate in smaller parts of the volume, so the simulated fluid is not incompressible. In Lagrangian simulation, we usually assume compressible gas.
From the knowledge of the system at discrete points, attributes are obtained at an arbitrary point via interpolation. Suppose we know an attribute at the particle locations, i.e. we have . Attribute is interpolated at location by a weighted sum of contributions from the particles:
where is the volume represented by the particle in point , and is a smoothing kernel, also called radial basis function, that depends on distance between the particle location and the point of interest. From a different point of view, the smoothing kernel expresses how quickly the impact of a particle diminishes farther away. The smoothing kernel is normalized if smoothing preserves the total amount of the attribute value, which is the case if the kernel has unit integral over the whole volumetric domain. An example for the possible kernels is the spiky kernel of maximum radius :
For normalized kernels, the particle density at point is approximated as:
As each particle has the same mass , the volume represented by particle is
According to the ideal gas law, the pressure is inversely proportional to the volume on constant temperature, thus at particle the pressure is
where constant depends on the temperature.
The pressure at an arbitrary point is
The acceleration due to pressure differences requires the computation of the gradient of the pressure field. As spatial variable shows up only in the smoothing kernel, the gradient can be computed by using the gradient of the smoothing kernel:
Thus, our first guess for the pressure force at particle is:
However, there is a problem here. Our approximation scheme could not guarantee to satisfy the physical rules including symmetry of forces and consequently the conservation of momentum. We should make sure that the force on particle due to particle is always equal to the force on particle due to particle . The symmetric relation can be ensured by modifying the pressure force in the following way:
The viscosity term contains the Laplacian of the vector field, which can be computed by using the Laplacian of the smoothing kernel:
Similarly to the pressure force, a symmetrized version is used instead that makes the forces symmetric:
External forces can be directly applied to particles. Particle-object collisions are solved by reflecting the velocity component that is perpendicular to the surface.
Having computed all forces, and approximating the time derivatives of (28.8) by finite differences, we may obtain the positions and velocities of each of the particles in the following way:
Note that this is also a forward Euler integration scheme, which has stability problems. Instead of this, we should use a stable version, for example, the Verlet integration [67].
The Lagrangian approach tracks a finite number of particles where the forces acting on them depend on the locations and actual properties of other particles. Thus, to update a system of particles, interactions should be examined. Such tasks are generally referred to as the N-body problem.
In a GPGPU framework, the particle attributes can be stored in the global memory as a one-dimensional array or can be fetched via one-dimensional textures. In graphics API frameworks, particle attributes can only be represented by textures. The advantage of reading the data via textures is only the better caching since now we cannot utilize the texture filtering hardware. A gathering type method would assign a thread to each of the controlled particles, and a thread would compute the effect of other particles on its own particle. As the smoothing kernel has finite support, only those particles can interact with the considered one, which are not farther than the maximum radius of the smoothing filter. It is worth identifying these particles only once, storing them in a two-dimensional texture of in the global memory, and using this information in all subsequent kernels.
Figure 28.13. Data structures stored in arrays or textures. One-dimensional float3 arrays store the particles' position and velocity. A one-dimensional float2 texture stores the computed density and pressure. Finally, a two-dimensional texture identifies nearby particles for each particle.
A GPGPU approach would need three one-dimensional arrays representing the particle position, velocity, density and pressure, and a two-dimensional array for the neighboring particles (Figure 28.13). In a graphics API approach, these are one- or two-dimensional textures. We can run a kernel or a fragment shader for each of the particles. In a GPGPU solution it poses no problem for the kernel to output a complete column of the neighborhood array, but in the fragment shaders of older GPUs the maximum size of a single fragment is limited. To solve this, we may limit the number of considered neighbor particles to the number that can be outputted with the available multiple render target option.
Figure 28.14. A time step of the Lagrangian solver. The considered particle is the red one, and its neighbors are yellow.
The processing of a single particle should be decomposed to passes or kernel runs when we would like to use the already updated properties of other particles (Figure 28.14). The first pass is the identification of the neighbors for each particles, i.e. those other particles that are closer than the support of the smoothing kernel. The output of this step is a two-dimensional array where columns are selected by the index of the considered particle and the elements in this column store the index and the distance of those particles that are close by.
The second pass calculates the density and the pressure from the number and the distance of the nearby particles. Having finished this pass, the pressure of every particle will be available for all threads. The third pass computes the forces from the pressure and the velocity of nearby particles. Finally, each particle gets its updated velocity and is moved to its new position.
Figure 28.15. Animations obtained with a Lagrangian solver rendering particles with spheres (upper image) and generating the isosurface (lower image) [114].
Having obtained the particle positions, the system can be visualized by different methods. For example, we can render a point or a small sphere for each particle (upper image of Figure 28.15). Alternatively, we can splat particles onto the screen, resulting in a rendering style similar to that of the Eulerian solver (Figure 28.12). Finally, we can also find the surface of the fluid and compute reflections and refractions here using the laws of geometric optics (lower image of Figure 28.15). The surface of fluid is the isosurface of the density field, which is the solution of the following implicit equation:
This equation can be solved for points visible in the virtual camera by ray marching. We trace a ray from the eye position through the pixel and make small steps on it. At every sample position we check whether the interpolated density has exceeded the specified isovalue . The first step when this happens is the intersection of the ray and the isosurface. The rays are continued from here into the reflection and refraction directions. The computation of these directions also requires the normal vector of the isosurface, which can be calculated as the gradient of the density field.
Exercises
28.7-1 Implement a game-of-life in CUDA. On a two-dimensional grid of cells, every cell is either populated of unpopulated. In every step, all cell states are re-evaluated. For populated cells:
Each cell with one or no neighbors dies, as if by loneliness.
Each cell with four or more neighbors dies, as if by overpopulation.
Each cell with two or three neighbors survives.
For unpopulated cells:
Each cell with three neighbors becomes populated.
Store cell states in arrays accessible as textures. Always compute the next iteration state into a different output array. Start with a random grid and display results using the graphics API.
28.7-2 Implement a wave equation solver. The wave equation is a partial differential equation:
where is the wave height above point in time , and is the speed of the wave.
CHAPTER NOTES |
The fixed transformation and multi-texturing hardware of GPUs became programmable vertex and fragment shaders about a decade ago. The high floating point processing performance of GPUs has quickly created the need to use them not only for incremental rendering but for other algorithms as well. The first GPGPU algorithms were also graphics related, e.g. ray tracing or the simulation of natural phenomena. An excellent review about the early years of GPGPU computing can be found in [200]. Computer graphics researchers have been very enthusiastic to work with the new hardware since its general purpose features allowed them to implement algorithms that are conceptually different from the incremental rendering, including the physically plausible light transport, called global illumination [264], physics simulation of rigid body motion with accurate collision detection, fluid dynamics etc., which made realistic simulation and rendering possible in real-time systems and games. The GPU Gems book series [75], [194], [205] and the ShaderX (currently GPU Pro [70]) series provide a huge collection of such methods.
Since the emergence of GPGPU platforms like CUDA and OpenCL, GPU solutions have showed up in all fields of high performance computing. Online warehouses of papers and programs are the gpgpu.org homepage and the NVIDIA homepage [195], [196], which demonstrate the wide acceptance of this approach in many fields. Without aiming at completeness, successful GPU applications have targeted high performance computing tasks including simulation of all kinds of physics phenomena, differential equations, tomographic reconstruction, computer vision, database searches and compression, linear algebra, signal processing, molecular dynamics and docking, financial informatics, virus detection, finite element methods, Monte Carlo methods, simulation of computing machines (CNN, neural networks, quantum computers), pattern matching, DNA sequence alignment, cryptography, digital holography, quantum chemistry, etc.
To get a scalable system that is not limited by the memory of a single GPU card, we can build GPU clusters. A single PC can be equipped with four GPUs and the number of interconnected PCs is unlimited [293]. However, in such systems the communication will be the bottleneck since current communication channels cannot compete with the computing power of GPUs.
The European Union and the European Social Fund under the grant agreement no. TÁMOP 4.2.1/B-09/1/KMR-2010-0003.
Table of Contents
Sequences of elements of given sets of symbols have a great importance in different branches of natural sciences. For example in biology the 4-letter set containing the nucleotides (adenine, cytosine, guanine and thymine) and the 20-letter set containing the amino-acids (alanine, cysteine, asparagin-acid, glutamine-acid, phenyl, glycine, histidine, isoleucine, lysine, leucine, methionine, asparagine, proline, glutamine, arginine, serine, threonine, valine, triptophan, tyrosine) play leading role [124]. Mathematics uses the 10-letter set , while computer science prefer the binary set .
Arrays with elements from given sets of symbols have various applications, e.g. Dénes and Keedwell [63] reported on application in frequency allocation of multibeam satellites, Harvit [108] in designing mask configuration for spectrometers, Mitchell and Paterson [181] in cryptography, Ma [169] in picture coding and processing, Arazi [10] in position recovery of cars.
Complexity is a basic characteristic of arrays of symbols, since affects the cost of storage and reproduction, and the quantity of information stored in the arrays. The usual complexity measures are based on the time (or memory) needed for generating or recognizing them.
Cyclic sequences in which every possible sequence of a fixed length occurs exactly once have been studied for more than a hundred years [76]. This mathematical problem was extended to two-dimensional arrays by Reed and Stewart [232] who gave an example of sized perfect map. Fan et al. in 1985 [72] proved the existence of binary perfect maps for larger sizes.
In the first seven sections of this chapter we deal with the existence and construction of perfect arrays. The last three sections superperfect sequences, the subword complexity and its extension, the