1 A Classification Problem in the Credit Card Industry A Comparison of Three Solutions

发布于:2021-08-02 10:38:36

A Classi?cation Problem in the Credit Card Industry: A Comparison of Three Solutions
Christopher H. Jolly
GE Corporate Research and Development, Schenectady, NY 12301, USA

Mukkai S. Krishnamoorthy
Department of Computer Science, Rensselaer Polytechnic Institute, Troy, NY 12180, USA Abstract Classifying accounts is a critical part of ef?ciently servicing credit cards. A problem we call the Rule Coverage Problem (RCP) arrises when using rules for specifying a classi?cation strategy, and we develop and compare three approaches for solving this problem.

1 Introduction and RCP
GE Capital Services (GECS) is the world’s largest supplier and manager of private label credit cards, serving 60 million accounts for 300 retailers. With such a large number of accounts it is critical to perform all servicing functions effectively and ef?ciently since small improvements on a per-account basis have a signi?cant impact on pro?ts. Two major servicing functions are collections and marketing. Collecting from delinquent accounts consists of creating and implementing a collection strategy that allocates resources (e.g., collector phone calls and computer-generated letters) to delinquent accounts. For example, “good” customers should be separated from “bad” customers because we collect ef?ciently from them in different ways. An effective collection strategy maximizes payments less expenses over the long term while maximizing customer good will. A collection strategy is created either by intuition and experience or more recently by optimization programs such as PAYMENT [Makuch90]. Marketing active accounts consists of creating and implementing a marketing strategy that augment billing statements with promotional messages and inserts. For example, credit card insurance only needs to be marketed to those card holders who do not already have it, and a new store opening only needs to be marketed to those living (or working or shopping) within a reasonable driving distance. An effective marketing strategy increases credit card usage, lowers advertising costs, and increases (credit card and retailer) brand loyalty.

Large, Dynamic Account Base

Classi?ed Account Base

Classi?cation Needs Rules

Intelligent System
Figure 1: Classi?cation Problem

Thus, GECS needs to be able to classify accounts into homogeneous groups as shown in Figure 1.

1

Objects Rules Classes
Figure 2: Using Rules For Classi?cation An ef?cient method for specifying a classi?cation strategy is with rules as shown in Figure 2. Each rule tests attributes of an object in the rule’s antecedent and concludes a class in its consequent (rules have the form “if antecedent then consequent”). A collection strategy typically consists of hundreds of rules, and a marketing strategy typically consists of tens of rules. if x < 50 then class AA if y < 10 then class BB Figure 3: Ambiguous for x = 40 and y = 4 Rules are ambiguous when more than one applies. For example, if x = 40 and y = 4 for the rules in Figure 3, the rules could assign either class AA or BB since both rules apply. To resolve the ambiguity, we assign priorities to the rules. The rules are written (from top to bottom) from highest priority to lowest priority. Therefore, in this example the rules would assign class AA.

1.1

Rule Coverage

Higher priority rules can cover (i.e., render useless) lower priority rules. In Figure 4, for example, whenever the antecedent of the second rule is true, the antecedent of the first rule is true. Since the first rule has higher priority, the second rule never will assign class BB to any object. if x < 500 then class AA if x < 400 then class BB Figure 4: Simple Example of Rule Coverage Under the assumption that a person is specifying classi?cation rules, then a rule coverage is a logical error similar to an unreachable statement in a programming language (see Figure 5). An intelligent rule editor should point out each rule covered and the rules covering it. if 1 = 1 then goto label AA x = 400 AA: continue Figure 5: Example of an Unreachable Program Statement Informally, the Rule Coverage Problem (RCP) is determining if any rules are covered by higher-priority rules. It can be shown that RCP is Co-NP-complete (see [Garey79] for an introduction to NP-complete problems).

1.2

Solving RCP

We look for potential algorithms for solving RCP by reducing RCP to a better known problems and examining the algorithms developed for them. The reduction, however, should not be too complicated; otherwise, the algorithms that work well for the better known problem may not work well for RCP after the reduction. In this work we reduce

2

instances of RCP to instances of Co-SAT, which can be solved by many methods [Hooker88]. We solved them using two methods: integer programming and resolution. Since RCP is Co-NP-complete and some practical classi?cation strategies have over 1000 rules, for our third approach we de?ne a class of subproblems of RCP and develop ef?cient algorithms for some of them. All three approaches were tested on sample cases, and the results are shown in Section 4.

2 Reducing RCP to a Co-SAT
It can be shown that RCP reduces to the unsatis?ability problem, Co-SAT, as shown in Figure 6. We reduce RCP to the Single Rule Coverage Problem (SRCP), SRCP to what we de?ne as a theorem proving problem over multiplevalued variables (MVTP), MVTP to what we de?ne as Multi-SAT, and Multi-SAT to SAT. Only the problem de?nitions are presented here; the reductions and proofs of correctness have been omitted to save space. RCP SRCP MVTP Co-Multi-SAT Co-SAT Figure 6: Reductions to a Known Problem

2.1

The Rule Coverage Problem (RCP)

In this section we give the formal de?nition of the Rule Coverage Problem (RCP). We start with some de?nitions. Throughout, we assume n rules and m variables. Let V = { v 1, v 2, …, v m } be a set of variables. Let T = { t v , t v , …, t v } be a set of test sets. Each t v is a set of simple 1 2 m i boolean tests on variable v i . Simple boolean tests are not formally defined here, but some examples are x < 500 , x = 7 , and 3 < x < 20 . if x < 75 and y < 5 and z = 0 if x < 50 and z = 1 if x > 35 if y < 2 if y > 6 and z = 1 then class AA then class BB then class CC then class DD then class EE

Figure 7: Example Classi?cation Rules For example, for the rules in Figure 7, the variables V = { x, y, z } , and the set of test sets T = { t x, t y, t z } , where test set t x = { x < 50, x < 75, x > 35 } , test set t y = { y < 2, y < 5, y > 6 } , and test set t z = { z = 0, z = 1 } . Let R = { r 1, r 2, …, r n } be a sequence (order is important) of rules. A rule typically has two parts: an antecedent, the

3

condition under which a rule is true, and a consequent, the actions of a rule. Since only the antecedent is relevant here, only the antecedent is shown. A rule is defined here as a conjunction of simple boolean tests. In addition, we often write rules in a more compact form as is done with boolean expressions. We put each test inside parentheses and have implied and’s between tests. Figure 8 shows the rules in Figure 7 in shorthand notation. R1: R2: R3: R4: R5:
( x < 75 ) ( y < 5 ) ( z = 0 ) ( x < 50 ) ( z = 1 ) ( x > 35 ) ( y < 2) ( y > 6) ( z = 1)

Figure 8: Example Rules The Rule Coverage Problem RCP(V,T,R) is given variables V, tests T, and rules R, is there a rule r i that is not satisfiable whenever rules r j for j < i are false (or, equivalently, is there a rule r i such that whenever r i is true, rule r j , for some j < i , is true.) For example, consider the rules shown in Figure 8. Variable z takes on the values 0 and 1 only. In this case rule R4 is covered. When R4 is true, either x ≤ 35 or x > 35 . If x > 35 , then R3 is true. If x ≤ 35 , then either z = 1 or z = 0 . If z = 1 , then R2 is true. If z = 0 , then R1 is true ( y < 5 since R4 is true). Thus, rule R4 is covered.

2.2

The Single Rule Coverage Problem (SRCP)

A natural subproblem of RCP is the Single Rule Coverage Problem (SRCP). The Single Rule Coverage Problem SRCP(V,T,R) is given variable V, tests T, and rules R, is the last rule, r n , not satis?able whenever rules r j , for j < R , are false (or, equivalently, whenever r n is true, is there a rule r j , for some j < R , that is true). In ?rst order logic SRCP(V,T,R) is

? ? ? ?A ? ? ? ? j < n, r j ( A ) ? ? r n ( A ) ? ? A ( r n ( A ) ? ? j < n, r j ( A ) ) ,

or, equivalently,

where A represents the variable assignments, and r i ( A ) is the boolean value of the rule under assignment A.

2.3

Multiple-valued variables

Whereas boolean variables take on the values true and false (or 1 and 0), we write multiple-valued variables to indicate variables that take on integer values from 0 to some ?nite number. We also refer to such variables as multi-valued variables or multi-variables for short. We also de?ne some notation to simplify the use of multi-valued variables. For boolean literals, a variable with a bar over it (such as x ) is true if the variable has the value 0, and a variable without a bar over it (such as x ) is true if the variable has the value 1. For multi-valued literals, we list as subscripts the values that make the literal true. For example, if x is a multi-valued variable, the literal x o is true when x has the value 0; x 1 is true when x has the value 1; and x 0, 1 is true when x has the value 0 or 1. When not ambiguous, we omit the subscripted commas and write x 0, 1 as x 01 . We often write multi-literal to mean a literal of a multi-variable. If a multi-literal has all possible subscripts, then we say it reduces to true, since the literal is always satis?ed.

4

2.4

Multi-Variable Theorem Proving Problem (MVTP)

A theorem proving problem using propositional logic is determining if P ? Q is a tautology, where P and Q are expressions over boolean variables. We define the Multi-Variable Theorem Proving Problem (MVTP) as a theorem proving problem over multi-valued variables. Formally, MVTP(V,L,P,Q) is given a set of multi-variables V = { v 1, v 2, …, v m } and lengths L = { l v , l v , …, l v } , where v i can take on integer values between 0 and l v – 1 , and 1 2 m i two expressions, P and Q, over V, is P ? Q a tautology. For example, consider V = { w, x, y, z } , P = x234 ( x 04 + y 0 ) w 0 , and Q = x 34 + y 01 + z 0 , where l w = 2 , l x = 5 , l y = 3 , and l z = 2 . P ? Q is
x 234 ( x 04 + y 0 ) w 0 ? x 34 + y 01 + z 0 ,

which is a tautology that can be proved using algebra or a truth table on all possible values for the variables.

2.5

Multi-SAT

Multi-SAT, de?ned here, is similar to SAT except that the variables are multi-valued variables, not boolean variables. Formally, Multi-SAT(V,L,C) is given a set of multi-variables V = { v 1, v 2, …, v m } and lengths L = { l v , l v , …, l v } , 1 2 m where v i can take on integer values between 0 and l v – 1 , and given a set of clauses C over the multi-literals from the i variables in V, is the conjunction of the clauses in C satisfiable. For example, consider V = { w, x, y, z } and C = x 234, x 04 + y 0, w 0, x 012, y 2, z 1 , where l w = 2 , l x = 5 , l y = 3 , and l z = 2 . The clauses in C are not all satisfiable, which can be shown by algebra or a truth table.

3 RCP and incomplete subproblems
Since RCP is Co-NP-complete, we may want to de?ne and solve a tractable subproblem. One natural class of subproblems is N-rule coverage. An algorithm performs N-rule coverage if it detects all rules that are covered by N or fewer rules, and it may or may not detect rules that are covered by more than N rules. An example of 1-rule coverage is shown in Figure 4, and an example of 3-rule coverage is shown in Figure 8.

3.1

Strategy for detecting N-rule coverage Rule 2 Rule 1 Rule 4

Rule 3
Figure 9: Graphical Representation of Three-Rule Coverage Rule coverage can be viewed graphically as shown in Figure 9, where rules 1, 2, and 3 cover rule 4. One way of deciding RCP is to determine if each rule is occupying space not covered by higher priority rules. If for the moment we restrict the variables in RCP to having two values, then we can view RCP as a boolean sum. Consider, for example, the rules in Figure 10. We can represent ( x < 50 ) by x , ( x ≥ 50 ) by x, ( y < 2 ) by y , and so on

5

(z only takes on the values 0 and 1), and we obtain the boolean expressions shown in Figure 11. We can record the R1: R2: R3: R4:
( x < 50 ) ( y < 2 ) ( z = 0 ) ( x < 50 ) ( z = 1 ) ( x ≥ 50 ) ( y < 2)

Figure 10: Rules for Boolean Sum R1: xyz R2: xz R3: x R4: y Figure 11: Expressions for Mathematical Sum space occupied by keeping track of the sum. Beginning with the highest priority rule, the space occupied by R1 is xyz . We sequentially add the next highest priority rule and check if the total space occupied by the sum increases. Adding R2, the sum, xyz + xz , simplifies to xy + xz , which occupies more space than xyz because the former includes xyz . Adding R3, the sum, xy + xz + x , simplifies to y + z + x , which occupies more space than the preceding sum because it includes xyz . Adding R4, the sum, y + z + x + x , simplifies to y + z + x . Since this sum is identical to the previous sum, no unoccupied space was added to the sum, and thus rule R4 is covered. One ?aw in this procedure is that minimal boolean sums are not unique, and thus determining if adding a term to a sum increases the total occupied space is a dif?cult problem (as dif?cult as SRCP). To overcome this ?aw we can represent the sum in a canonical form such as all prime implicants, which can be computed by the consensus procedure of Quine [Schneeweiss89]. Another ?aw of this summing procedure is that we restricted it to boolean variables. We overcome this ?aw by extending Quine’s consensus procedure for use with multi-variables.

3.2

Multi-variable consensus procedure

The consensus procedure of Quine can be extended for use with multi-variables to form a multi-variable consensus procedure. We also add a continuation rule, which like the simpli?cation rule speeds up the procedure when implemented on a computer. The multi-variable consensus procedure is useful in solving RCP because we can compute all prime implicants in a running “sum”, and if a rule is absorbed by any multi-variable prime implicant, it is covered. Before presenting the multi-variable consensus rules, we introduce some notation. The superscripted minus on a variable or a term T ( T ) indicates that the range of each multi-variable in T is a (non-strict) non-empty subset of the range of the corresponding variable in T. For example, if T = x 12 y 04 , then T might be x 12 y 04 , x 1 y 04 , x 2 y 0 , etc. Simi+ + larly the superscripted plus on a variable or a term T ( T ) indicates that the range of each multi-variable in T is a (non-strict) superset of the range of the corresponding variable in T.

3.2.1 The multi-variable absorption rule
The multi-variable absorption rule is

T + T U = T,
where T and U are multi-variable terms. (The absorption rule is T + TU = T .) For example, w 024 x 12 + w 04 x 12 y 23 z 01 becomes w 024 x 12 .

-

3.2.2 The multi-variable continuation rule
The multi-variable continuation rule is

6

xr T + xs T = xr ∪ s T ,
where x is a multi-variable and T is a multi-variable term. For example, x 024 y 12 + x 14 y 12 becomes x 0124 y 12 . Multi-variable x r ∪ s is replaced by true if r ∪ s includes all ranges. The multi-variable continuation rule is not applied if r ∪ s = r or r ∪ s = s . Note that the multi-variable continuation rule is derived from the multi-variable simpli?cation rule when T = T and U is empty, followed by the multi-variable absorption rule.

3.2.3 The multi-variable simpli?cation rule
The multi-variable simpli?cation rule is

xr T + xs T U = xr T + xr ∪ s T U ,
where x is a multi-variable and T and U are multi-variable terms. (The simpli?cation rule is x + x T = x + T .) For ? ? ? example, x 024 y 12 + x 14 y 2 z 01 becomes x 024 y 12 + x 0124 y 2 z 01 . Multi-variable x r ∪ s is replaced by true if r ∪ s includes all ranges. The multi-variable simpli?cation rule is not applied if r ∪ s = s . (Note that Quine’s simpli?cation rule could be written more generally as x T + x TU = x T + TU , which is derived from one application of the consensus rule, fol? ? ? lowed by one application of the absorption rule.)

-

-

3.2.4 The multi-variable consensus rule
The multi-variable consensus rule is to add a multi-variable consensus of two multi-variable terms. The multi-variable consensus of x r T and x s U is

Q( x r T , x s U ;x) = x r ∪ s TU .
(The consensus of xT and xU is Q( xT , xU ;x) = TU .) For example, x 024 y 12 w 1 + x 14 y 2 z 01 becomes x 024 y 12 w 1 + x 14 y 2 z 01 + x 0124 y 2 z 01 w 1 . Multi-variable x r ∪ s is replaced by true if r ∪ s includes all ranges. The consensus rule is not applied if r ∪ s = r , r ∪ s = s , or TU = φ .

3.3

Rule coverage
-

The multi-variable consensus rules can be used to create 1-rule, 2-rule, and 3-rule coverage algorithms using the following additional rule, which we call the partial absorption rule.

xr T + xs T U = xr T + xs – r T U ,
where x s – r T U is omitted if s ? r . Note that the partial absorption rule is a more general form of the multi-variable absorption rule. When s ? r , x r T absorbs x s T U . Only the part of x s T U that is absorbed is removed. Theorem 1: When a rule, R, is covered under 2-rule coverage, either (1) applying partial absorption on R, once for each of the two rules, absorbs R, or (2) R is only absorbed by a result of applying a multi-variable rule on the two covering rules where a variable is reduced to true.
-

We illustrate the two cases with simple examples. Assume terms T 1 and T 2 cover T a . If T 1 = x 024 y 12 w 1 and T 2 = x 14 y 2 z 01 , then T a = x 01 y 2 z 0 w 1 is absorbed after applying the partial absorption rule twice, ?rst with T 1 to reduce T a to x 1 y 2 z 0 w 1 , and then with T 2 , which absorbs it. If T 1 = x 024 y 12 w 1 and T 2 = x 134 y 2 z 01 , where x 01234 = true , then T a = y 2 z 0 w 1 is absorbed by the consensus of T 1 and T 2 , which is y 2 z 01 . Theorem 2: When a rule, R, is covered under 3-rule coverage, either (1) applying partial absorption on R twice, once for each of the three rules and then again on the three rules, absorbs R, or (2) R is only absorbed by a result of applying the multi-variable rules on the three covering rules where a variable is reduced to true.

We illustrate with an example. Assume terms T 1 , T 2 , and T 3 cover T a . If T 1 = x 0 y 1 , T 2 = x 1 y 1 , and T 3 = x 01 y 0 , then T a = x 01 y 01 is absorbed after applying the partial absorption rule in two passes. On the ?rst pass T a is not altered by T 1 or T 2 but is by T 3 to produce x 01 y 1 , which is partially absorbed on the second pass by T 1 and ?nally absorbed by

7

T2 .

Here is an algorithm for 1-rule coverage for RCP. Let P be the set of incomplete prime implicants in a running sum. (In a complete algorithm, P would consist only of prime implicants, but in an incomplete algorithm the terms in P may not all be prime implicants, and we refer to these terms as incomplete prime implicants.) The first step is to convert the n rules (e.g., ( x < 5 ) ( y < 100 ) ) into n multi-variable terms (e.g., x 01 y 012 ), and we label the terms T i , for 1 ≤ i ≤ n . We put the first term (corresponding to the first rule) in P. The following steps are repeated for the subsequent terms, where the next term is labeled T i . If T i is absorbed by a term in P, then the rule corresponding to T i is covered. Otherwise, add T i to P. Since at each step, P contains all previous terms, this algorithm performs 1-rule coverage. The algorithm for using partial absorption for 2-rule coverage is the same as the algorithm for 1-rule coverage except for two differences. First, the partial absorption rule is used instead of the absorption rule. This ensures 2-rule coverage for all coverages that do not reduce a variable to true. Second, only the results of applying a multi-variable rule that reduce a variable to true are added to P (versus all results of applying a multi-variable rule). This ensures 2-rule coverage for all coverages that reduce a variable to true. The algorithm for using partial absorption for 3-rule coverage is the same as the algorithm for 2-rule coverage except for two differences. First, the partial absorption rule is applied for two passes on the terms in P (if the term is not partially absorbed on the ?rst pass, then the second pass is unnecessary). This ensures 3-rule coverage for all coverages that do not reduce a variable to true. Second, the results of applying a multi-variable rule that reduce a variable to true, in addition to being added to P, are used with the multi-variable rules and the other terms in P (that correspond to the original rules) and those results that reduce a variable to true are also added to P. This ensures 3-rule coverage when one or two variables are reduced to true. The running time of the 3-rule coverage algorithm is O ( n 4 m 3 ) for n terms of m variables. Note that the size of P is n plus the number of results having one or two variables reduced to true. The number of results having one variable reduced to true is O ( n 2 m ) , and if all of these combine with the n original terms to reduce two variables to true, then P grows to O ( n 3 m 2 ) . If, however, the number of terms added to P is O ( n ) , which is likely for our applications, then the running time is O ( n 2 m ) .

4 Experimental results
In this section we present the results of testing the three methods for solving RCP. We developed two groups of test cases: ? actual classi?cation rules used at GE Capital, and ? random Multi-SAT instances (satis?ability problems where each variable can take on more than two values). In the following graphs “CPLEX” represents the approach where instances of RCP are reduced to instances of CoSAT, which are solved using integer programming in the CPLEX package; “OTTER” represents the approach where instances of RCP are reduced to instances of Co-SAT, which are solved using resolution in the OTTER package; and “RULE” represents the approach where instances of RCP are solved with the multi-variable consensus procedure. The execution times in the graphs include only the time when searching for coverage (i.e., no overhead processing time from reading/interpreting the problem, etc.). We have fewer examples from the OTTER approach because its overhead processing uses days to weeks of CPU time.

8

4.1

Actual rules used at GE Capital

This group of test cases consists of the actual classi?cation rule sets used at GE Capital. However, we only have 10 test cases.

Figure 12: Rule Coverages for CPLEX, OTTER, and Multi-variable Consensus Rule

Figure 12 shows the rule coverages detected by the three approaches, and Figure 13 shows the execution times. The CPLEX and OTTER solutions, being complete, found all of the rule coverages, but they are much slower than the multi-variable consensus procedure. The multi-variable consensus procedure missed only 2 of the 468 covered rules, and it is much faster.

Figure 13: Execution Times for CPLEX, OTTER, and Multi-variable Consensus Rule

4.2

Random Multi-SAT problems

This group of test cases were created to be similar to actual classi?cation rules in order to provide additional insight on the execution times of the three methods. However, no randomly generated instance of Multi-SAT had any covered rules. This indicates that the instances of RCP that occur in practice may not be the harder instances, if the hard instances have a probability of being satis?ed equal to 0.5, as in the case of many hard instances of the satis?ability problem. Figure 14 shows the execution times for the CPLEX and multi-variable consensus procedure solutions. The multivariable consensus procedure is much faster for these test cases. Due to the extremely long running times of the OTTER solution, the results for this approach are not available yet.

9

Figure 14: Execution Times for Instances of Random Multi-SAT

5 Bibliography
[Makuch90] W. Makuch. Optimizing the Collection of Delinquent Consumer Credit. Ph.D. Thesis, Rensselaer Polytechnic Institute, May, 1990. W. Schneeweiss. Boolean Functions with Engineering Applications and Computer Programs. Springer-Verlag, New York, N.Y., 1989. J. Hooker. A Quantitative Approach to Logical Inference. Decision Support Systems, 4:4569, 1988. M. Garey and D. Johnson. Computers and Intractability: A Guide to the Theory of NPCompleteness. W. H. Freeman and Company, New York, N.Y., 1979.

[Schneeweiss89]

[Hooker88]

[Garey79]

10


相关推荐

最新更新

猜你喜欢