Viagra Blumen - We Always Have The Best Offers In Our Online

Transcription

Viagra Blumen - We Always Have The Best Offers In Our Online
Market Analysis Using a Combination of
Bayesian Networks and Description Logics
Phillip M. Yelland
© 1999 Sun Microsystems, Inc. All rights reserved. The SML Technical Report Series is published by Sun Microsystems Laboratories, of Sun
Microsystems, Inc. Printed in U.S.A.
Unlimited copying without fee is permitted provided that the copies are not made nor distributed for direct commercial advantage, and credit to the
source is given. Otherwise, no part of this work covered by copyright hereon may be reproduced in any form or by any means graphic, electronic,
or mechanical, including photocopying, recording, taping, or storage in an information retrieval system, without the prior written permission of the
copyright owner.
TRADEMARKS
Sun, Sun Microsystems, and the Sun logo are trademarks or registered trademarks of Sun Microsystems, Inc. in the U.S. and other countries. All
SPARC trademarks are used under license and are trademarks or registered trademarks of SPARC International, Inc. in the U.S. and other countries. Products bearing SPARC trademarks are based upon an architecture developed by Sun Microsystems, Inc.
For information regarding the SML Technical Report Series, contact Jeanie Treichel, Editor-in-Chief <jeanie.treichel@eng.sun.com>.
Market Analysis Using a Combination of
Bayesian Networks and Description Logics
Phillip M. Yelland,
Sun Microsystems Laboratories
SMLI TR-99-78
August 1999
Abstract:
The work described in this paper was inspired by a problem increasingly vexatious to many businesses
confronting the ever-diminishing life cycles of modern products—viz., that of predicting characteristics
(such as overall demand, segmentation, etc.) of markets facing new product introductions. A framework is
proposed that allows the market parameters of new products to be derived by analogy with those of old
ones. To do so, the framework combines the capabilities of Bayesian networks [14] and description logics
[16].
The paper commences with an exposition of the problem that motivates the work. There follow brief
descriptions of Bayesian networks and description logics in their unalloyed state, and a discussion of
issues surrounding their combination. The combined system is presented, along with an account of its formal details and an inference procedure for entailment. A sample application of the framework is given. The
paper concludes by comparing the proposed framework with related existing systems and by suggesting
possible courses for future development.
M/S MTV29-01
901 San Antonio Road
Palo Alto, CA 94303-4900
email address:
phillip.yelland@sun.com
home page URL:
http://www.sunlabs.com/~yelland
1 Introduction
The introduction of new products has been an enduring cause of difficulty for many companies.
To design, produce, market, and distribute a new product most effectively, a company is required
to predict characteristics of its market, such as the most propitious market segments, the ideal
positioning of the product with respect to those segments, the demand in those segments over
time, and so on. The problems arising from such requirements are particularly acute in modern
high technology markets, where very short product life cycles are the norm.
Unfortunately, by virtue of its novelty, a new product is usually a poor candidate for the sort of
time-series forecasting methods applied to established products [17]. On the other hand, in many
instances, the market parameters of a new product can be obtained by analogy. By distinguishing
the features of a new product embodied (albeit in slightly different form) in established products,
it is often possible to make some assessment of likely markets for the new one.1
The work described in this paper attempts to place such analogy-based forecasting methods on a
formal footing. To do so, it combines two popular approaches to knowledge representation for
expert systems and decision support: Bayesian networks (BN’s) and description logics (DL’s). In
this context, BN’s and DL’s offer complementary capabilities. On the one hand, Bayesian networks allow the uncertainties implicit in the forecasting process to be expressed in a concise and
intuitive manner. Description logics, on the other hand, furnish an effective means of describing
and organizing the characteristics of products. Conversely, BN’s are lacking as regards the representation of complex entities (like products), and DL’s have very limited capabilities for modeling uncertainty. By combining both formalisms, the framework described here seeks to bring the
strengths of both to bear, while ameliorating their weaknesses.
1.1 Overview of the Paper
The paper begins with a fuller description of the problem that motivates the work. Brief descriptions are given of Bayesian networks and description logics separately, and issues surrounding
their combination are explored. The combined system is presented, along with an account of its
1
An informal description of such procedures is also given in [17].
—1—
formal characteristics and an inference procedure for entailment in the system. An example shows
how the combined system may be applied to a sample problem of the kind envisaged at the outset. Related existing work is outlined, and compared with that described here. The paper concludes by suggesting possible courses for future development.
2 Background
2.1 Motivation
As was indicated in the introduction, a manifest feature of modern high technology markets is the
rapidity with which products are introduced, mature and approach obsolescence. For many
classes of product, life cycles of a mere six months’ duration are not unusual. Needless to say,
breakneck product turnover can impose great burdens on a company involved in such a market,
given the necessity of preparations for the design, production, marketing and distribution of new
products.
Under different circumstances, reasonably accurate forecasts of demand might be produced by
extrapolating prior demands using time-series analyses like those described in [17]. Unfortunately, with so many products commanding only evanescent life spans, such analyses are rarely
applicable.
A major vendor of technical workstation and server computers, Sun Microsystems, Inc., must
contend with the torrid pace of change in high technology markets and the problems it causes. In
response to these exigencies of new product introductions, Sun recently instituted a project entitled KAPTUR (Knowledge Acquisition for Planning Product Transitions through Understanding
Risk). The rubric of the KAPTUR project is to formulate forecasts for new products based not on
time-series analysis, but by drawing analogies with those products’ forebears. To achieve this,
attributes of existing products are identified, and the effects of those attributes on market parameters for the products are ascertained.
The intent of the work described in this paper is to supplement the KAPTUR process with a formal framework for expressing the relationships between product attributes and markets and for
—2—
combining those relationships to arrive at forecasts for new products. The next section introduces
the constituents of this formal framework.
2.2 Bayesian Networks
Bayesian networks are graphic representations of relationships between sets of random variables,
where for the purposes of this discussion, a random variable may be thought of as a variable that
adopts values from a given set according to some probability distribution. In more precise terms,
a Bayesian network (or BN) is a directed acyclic graph (DAG) with nodes representing random
variables and arcs portraying direct correlation between variables. Each node in a BN is also associated with a conditional probability distribution, which gives the probabilities associated with
values of the variable represented by the node for different configurations of its parents’ variables. A common practice—followed here—is to restrict attention to BN’s containing discrete
random variables only (i.e., variables ranging over finite sets), with conditional probability distributions specified using tables.
A BN encapsulates a set of independence assertions between the variables it contains, in the sense
that each variable in the network is independent of its non-descendants given its parents. To state
this formally, a little notation is necessary. The probability that random variable x takes on the
value c is denoted p(x = c). Similarly, p(x) represents the probability distribution comprising all
possible values of x. If y is another random variable, the corresponding joint and conditional
probabilities are denoted p( x = c, y = c ′ ), p( x , y ), p( x = c | y = c ′) and p( x | y ) . Now let x1, x2, …,
xn enumerate the nodes in a BN, such that each node appears after its parents. For each i, let πi be
the set of parents of node xi in the graph. Then according to the network, variable xi is conditionally independent of the variables {x1, x2, …, xn} given values for its parents πi, which is to say that
!, x
p( xi | x1 , x2 ,
i −1 )
= p( xi |π i ).
Thus a BN represents an encoding of a joint probability distribution for the variables it contains.
For by the chain rule:
!, x ) = ∏ p( x | x , x ,!, x
n
p( x1 , x 2 ,
n
i
i =1
—3—
1
2
i −1 )
And we also have that:
p( xi | x1 , x2 ,
!, x
i −1 )
= p( xi | π i ),
for i = 1,
!, n
Whence:
!
n
p( x1 , x 2 , , x n ) = ∏ p( xi | π i )
i =1
Observe that the factors p(xi | πi) are precisely the conditional probability tables attached to each
node in the network.
Having computed the joint probability encoded by a BN, calculation of the probability distributions associated with sets of nodes is straightforward, at least in concept. To illustrate, consider
; !, c @. Then
random variables x1 and x2, such that the entire range of x2 comprises the values c1 ,
n
the probability distribution of x1 may be derived from the joint probability distribution p( x1 , x2 )
by marginalizing out the variable x2:
p( x1 ) =
n
∑ p( x1 , x2 = c)
< ! A
c ∈ c1 , ,cn
More generally, marginalization can be applied to eliminate selections of variables from the full
joint probability distribution, allowing the computation of arbitrary joint probability distributions.
For BN’s of any significant size, computing probabilities directly by multiplication and marginalization is impractical; it requires the enumeration of sets of values whose sizes are potentially
exponential in the number of variables in the network. Fortunately, in the past decade or so, researchers have devised a number of algorithms for computing probabilities in Bayesian networks
that offer acceptable performance in many cases by exploiting the encoded independence assertions delineated above. Introductions to such algorithms may be found in [7], [22] or [14]. The
framework presented in this paper does not rely on the particular characteristics of any of these
algorithms, and any of them may be selected to perform the required calculations.
2.3 Description Logics
Description logics (also known as terminological description systems) grew out of attempts to
formalize the semantic networks and frame systems used in many early knowledge representation
efforts [4]. The past twenty years have seen the development of an impressive array of description
—4—
logics, ranging in expressive power from those equivalent to first-order predicate logic to far
more restricted systems like those described below [16].
All description logics comprise a means for describing sets of objects in a domain (such sets are
termed concepts) and relations (known as roles) between the objects in those sets. By way of an
example, consider the simple description logic )/ introduced in [3]. The syntax of )/ is as follows:
A − Atomic concept name
R − Role name
C :: = A |
−
−
C1 | | C2 |
−
∀R . C |
−
∃R
−
Atomic concept
Conjunction
Universal role quantification
Existential role quantification
Informally, atomic concepts are sets of objects that are assumed to be given. Also assumed given
are roles, relating objects in the domain. Three operations are provided for defining complex concepts:
•
Conjunction: The concept C1 |−| C2 comprises those objects that are members of both C1 and
C2.
•
Universal role quantification: All objects in the concept ∀R . C are related by role R only to
objects in C. The objects related to a given object by role R are known as the R-fillers of that
object; thus the concept ∀R . C may be described as the set of object with R-fillers all in C.
•
Existential role quantification: All objects in ∃R have at least one R-filler.
These informal specifications can be formalized by providing a model-theoretic semantics for
concept expressions. To wit, a model, D, c, r ,[[_]] , for )/ specifies a domain of objects D, for
each primitive concept P a set c( P ) ⊆ D , for each role R a relation r ( R) ⊆ D × D , and an interpretation function, [[ _ ]]: C →℘( D) , mapping concept expressions to subsets of the domain, satisfying:
[[ P ]] = c( P)
]] = [[ C1 ]] ∩ [[ C2 ]]
[[ C1 |−| C2
<
[[ ∃R ]] = <o ∈ D | ∃o′ ∈ D. o, o′ ∈ r( R)A
A
[[ ∀R . C ]] = o ∈ D | ∀o ′ ∈ D. o, o ′ ∈ r ( R) ⇒ o′ ∈[[ C2 ]]
—5—
3 Preliminary Considerations
Recall that the notational framework described in this paper was devised with a view to forecasting demand for proposed new product introductions. In principle, addressing such a problem with
the two formalisms described above is straightforward enough: Descriptions of features and
products are furnished using a description logic, and Bayesian networks specify the relationships
between product characteristics and market demand.
Combining the two formalisms requires some means of adapting BN’s (that normally express
relationships between random variables) to the specification of relationships between sets (the
denotations of DL expressions). To illustrate how this might be done, consider a node in a net-
; !, v @ .
work specifying the probability distribution of the random variable, x, with values v1 ,
n
Now in formal terms (see [8], for example), x defines a function fx from a sample space (the set of
; !, v @. By extension, it also defines a collection of sets
events under consideration) to the set v1 ,
f x−1 (v1 ),
!, f
−1
x ( vn )
n
— the pre-images of each value. Note that since each point in the sample
space is associated with precisely one value of x, this collection of pre-images partitions the
space—all are disjoint, and their union equals the space itself. An equivalent formulation of the
node in the network, therefore, is as an association of the pre-images f x−1 (v1 ),
!, f
−1
x ( vn )
with a
set of probabilities (the original probability distribution function of x). These pre-image sets may
be specified directly using expressions in a description logic, rather than indirectly using values
of a random variable. Accordingly, each event in the sample space would be interpreted as a random selection from the domain of objects, and the probability distribution defined by the network
would define the probabilities that such a random selection resulted in an object in the sets defined by the DL expressions.
Assigning sets of DL expressions to the nodes of a BN, however, is fraught with difficulties.
Consider, for example, ascribing the expression C1 |−| C2 to a node as part of its label set. In lieu of
any further information, this would only specify the probability of the combined expression, and
would fail to determine precisely the probability of either component. Such imprecision seems
contrary to the spirit of the Bayesian network paradigm, which seeks to produce a complete specification of probabilities in a problem. Such an expression would also interact with labels on other
—6—
nodes; for example, if C1 occurs in the label set of another node, the drafter of the network would
need to ensure that p(C1 |−| C2 , C1 ) = p(C1 , C2 ).
A possible remedy for such difficulties would involve restricting the labeling of nodes to some
class of “atomic” concept descriptions that do not feature conjunctions. Candidates from the language )/ would be primitive concepts, universal role quantification of primitive concepts (of the
!
form ∀R1 . ∀Rn . P , say), and existential role qualification ( ∃R ). Even with such restrictions,
however, difficulties abound. For example, if the expressions ∀R. C1 and ∀R. C2 appear in the label set of any node, then the network must assert (directly or indirectly) that p(∃R) = 1; otherwise
it follows that there are objects with no role fillers for R, implying that the sets ∀R. C1 and
∀R. C2 are not disjoint—a requirement if they are to label the same node.
A proliferation of such implicit consistency requirements can make the design of labeled networks inordinately onerous. As these examples illustrate, the interactions between different parts
of such a network can be extremely subtle, and apparently minor adjustments in one node can
have ramifications throughout the network. It would be desirable, therefore, to contrive a framework within which a complete probability distribution can be defined for the concepts of interest,
while minimizing the possibility for inconsistencies like those detailed above. The next section
lays out one such framework.
4 TDL: A Tiny Description Logic
The first step in constructing the combined framework is to adapt the description logic )/. The
resulting notation actually resembles the logics of feature structures [18] employed in linguistics,
in as far as it conflates universal and existential role quantification in a single construct. It also
invokes some of the flavor of type systems that incorporate record types and intersection [20].
—7—
4.1 Syntax
Formally, the syntax of this “Tiny Description Logic” is as follows:
P − Primitive concept name
R − Role name
C :: = P |
−
Primitive concept
−
|
|
C1 C2 |
Conjunction
−
R: C |
−
Role quantification
−
Some notation: If S is a set2 with members C1, C2, …, Cn, the expression | | S denotes the concept
C1 |−| C2 |−|
! |−| C . (That the latter is unambiguous follows from the commutativity and associativn
ity of conjunction in the semantics below.) Correspondingly, the expression C ∈C ′ is true iff C´
can be rearranged (again using commutativity and associativity) into the form C1 |−|
; !, C @.
where C ∈ C1 ,
! |−| C ,
n
n
The atomic expressions, A, of TDL are characterized by the following (recursive) definition:
A :: = P | R : A
In the sequel, it will be convenient to assume that the set of atomic expressions is finite.3 While
this restriction could be removed in principle, the technicalities required to deal with infinite sets
and probability measures over them would complicate the presentation.
4.2 Semantics
The semantics of TDL resembles that of )/, except that role quantification implies the existence
of at least one role filler. More explicitly, a model, D, c, r ,[[_]] , for TDL specifies a domain of
objects D, for each primitive concept P a set c( P ) ⊆ D , for each role R a relation r ( R) ⊆ D × D ,
and an interpretation function, [[ _ ]]: C →℘( D) , mapping concept expressions to subsets of the
domain, satisfying:
[[ P ]] = c( P)
[[ C1 |−| C2 ]] = [[ C1 ]] ∩ [[ C2 ]]
%K
&K
'
[[ R: C ]] = o ∈ D
∀o′ ∈ D. o, o′ ∈ r( R) ⇒ o′ ∈[[ C2 ]] and
∃o ′ ∈ D. o, o ′ ∈ r ( R)
2
(K
)K
*
Henceforth, all variables referring to sets appear in bold font.
This could be arranged by using only a finite number of primitive concepts and roles, and placing an upper bound on
the nesting of role quantification.
3
—8—
A useful property of atomic expressions in TDL is encapsulated in the following theorem:
Lemma 1.
Every expression in TDL is equivalent to a conjunction of atomic expressions.
Proof :
Define a function ν on expressions:
ν ( P) = P
ν (C1 |−| C2 ) = ν (C1 ) |−| ν (C2 )
−
ν ( R: C) = | | R: A A ∈ν (C)
<
A
It is routine to show that for all expressions C, ν (C ) is a conjunction of atomic expressions, and
that the denotations of both expressions are equal.
Q.E.D.
5 Labeled Bayesian Networks
The second element of the combined modeling framework provides a means of expressing probabilistic relationships between TDL concepts using Bayesian networks. The syntactic structure of
these networks is specified below, followed by a guide to their interpretation.
5.1 Syntax
Definition 1. A labeled Bayesian network (LBN) n, π , L , p comprises:
1. A DAG with nodes 1, …, n, such that each node i has parents
; ! @ ;! @
π i = π i1 , , π im ⊆ 1, , i − 1 .
2. An indexed collection of disjoint label sets L1 ,
!, L , such that each L ⊆ A , where A is
n
i
the set of atomic TDL expressions. For convenience’s sake, let C ∈L abbreviate
; !, n@. C ∈ L .
∃i ∈ 1,
i
3. For each node i, L ∈ Li and ( L1 ,
p( L | L1 ,
!, L ) .
!, L ) ∈ L × ! × L
m
π i1
m
—9—
π im ,
a conditional probability
!, L ∈ L , j ∈;1,!, n@, i ∈;1,!, j@, define:
p( L ,!, L ) = ∏ p( L | L ,!, L )
p( L ,!, L , L ,!, L ) = ∑ p( L ,!, L , L, L ,!, L )
For all L1 ∈ L1 ,
n
n
n
1
i −1
1
n
i +1
n
i =1
L ∈Li
i
1
π i1
π im
i −1
i +1
n
The nodes of a labeled Bayesian network represent collections of disjoint sets of objects denoted
by atomic expressions in TDL. The conditional probabilities in such a network assert the probability that an object selected at random will be a member of a given set, given that it is a member
of certain other sets.
That LBN’s express correlations between atomic concepts only is less of a restriction than it
might appear, given that, as shown in Lemma 1, every concept in TDL can be represented as a
conjunction of atomic concepts. The definition below specifies how such a conjunction can be
construed as a “joint event”—intuitively, the event that a randomly selected object is a member of
all the sets in the conjunction. Full details of the treatment of concepts in general are given in section 6.
Definition 2. A non-empty set S of atomic concepts is consistent wrt. an LBN n, π , L , p if for
; !, n@, | S ∩ L | ≤ 1. If S is consistent wrt. n,π , L, p , and S ⊆ L , let ( L ,!, L )
all i ∈ 1,
1
i
m
enumerate the members of S in the order of the label sets of which they are members. Then
−
p( | | S) = p( L1 ,
!, L ) .
m
5.2 Semantics
The models for TDL presented in section 4.2 provide a basis for interpreting the meaning of labeled Bayesian networks, in as far as they represent the atomic expressions that occur in the label
sets of such networks. The notion of a model needs to be extended, however, if it is to capture the
probabilistic attributes of LBN’s. The definition below encapsulates such an extension in the form
of a probability measure, which for the purposes of this paper can be thought of simply as a function that assigns a probability to sets of objects (see [8] for more technical details). Intuitively, the
value assigned a set by the probability measure of such a model represents the probability with
which a randomly-picked object is a member of that set.
— 10 —
Definition 3. A probabilistic TDL (PTDL) model D, c, r ,[[_]], µ is a TDL model
D, c, r ,[[_]] , together with a probability measure µ mapping subsets4 of D to the interval [0,1],
such that the denotation [[ C ]] of every concept expression C in TDL is in the domain of µ.
The relationship between a network and its models derives from the probabilities they ascribe to
sets of atomic concepts (as was hinted above, such probabilities form the basis for the assessment
of concepts in general). Specifically, for a probabilistic TDL model to satisfy a labeled Bayesian
network, it must:
•
Assign a probability to a conjunction of the atomic concepts in a network that agrees with the
probability assigned to that conjunction by the network itself.
•
Ensure that an object’s membership of concepts in the network is independent of its membership of concepts outside of the network. This means that the probability that an object is a
member of a conjunction with components inside and outside the network can be computed
simply by multiplying the probability that it’s a member of all the components inside the
network by the probability of membership of those components outside.
Definition 4. A PTDL model D, c, r ,[[_]], µ satisfies an LBN n, π , L , p (written
D, c, r ,[[_]], µ |= n, π , L , p ) iff the following hold:
−
−
1. For all non-empty, consistent S ⊆ L , µ ([[ | | S ]]) = p( | | S)
−
2. For all consistent S ⊆ L such that S ∩ L is non-empty, µ ([[ | | ( S \ L) ]]) ≠ 0 and
−
−
−
µ ([[ | | S ]]) = p( | | ( S ∩ L)) × µ ([[ | | ( S \ L) ]])
5.3 Consistency
Thus far, a framework has been proposed for labeling Bayesian networks with description logic
expressions. It was pointed out in section 3, however, that it is all too easy in general to produce
inconsistent specifications in this manner. While the definition of TDL and the restriction to
atomic concepts reduces the opportunities for producing inconsistencies, they are not eliminated.
The following lays down a condition sufficient to rule out inconsistency entirely.
4
Strictly speaking, the domain of µ should constitute a σ-algebra — again, see [8].
— 11 —
K
Definition 5. A labeled Bayesian network n, π , L , p is consistent iff p( L) = 0 for all
K
!× L such that
K
L = !, ( R : ! : R : P), !, ( R : ! : R : P ′ ), ! , for P ≠ P ′ and
P , P ′ ∈L , for some i ∈;1,!, n@ .
L ∈ L1 ×
1.
2.
n
1
1
m
m
i
Thus, if an LBN meets the above requirement, inference can proceed from it ensured of wellgroundedness. Formally:
Lemma 2.
All consistent LBN’s have at least one non-trivial model.
Proof:
Consider the construction of a model D, c, r ,[[_]], µ for an LBN n, π , L , p .
To construct D, first assume an enumeration B1 ,
>
C
!, B
m
of those atomic concepts not in L.5 For
each concept Bi, form a set N i = Bi , Bi , where the expression Bi is guaranteed distinct from
any atomic expression. Then D is simply the product L1 × L2 ×
= !, A ) ∈ D
Notation: Let a ( A) = ( A1 ,
k
!× L
n
× N1 ×
!× N
m.
; !, k @. A = A B , for atomic expression A.
∃i ∈ 1,
i
For each primitive concept P, put c( P ) = a ( P ).
For each role R, put r ( R) =
%& o, o′
'
()
*
∃A. o ∈a ( R: A), and
.
∀B. o ∈ a ( R: B ) ⇒ o ′ ∈a ( B )
Define the function [[_]] by induction using the equations given in section 4.2.
Associate arbitrarily a value pBi in the open interval (0,1) with each atomic concept Bi in the enumeration B1 ,
!, B , and let p
m
Bi
= 1 − pBi .
Define µ as follows:
•
µ ( ∅) = 0
•
For all o = ( A1 ,
•
For disjoint S1 , S2 ∈ D, let µ ( S1 ∪ S2 ) = µ ( S1 ) + µ ( S2 )
5
!, A , N ,!, N
n
1
m ) ∈ D,
;@
let µ ( o ) = p( A1 ,
Recall that it is assumed that the number of atomic concepts is finite.
— 12 —
!, A ) × p ×!× p
n
N1
Nm
.
It is fairly straightforward to verify that if the network n, π , L , p is consistent, the above postulates are non-contradictory and define a PTDL model that satisfies n, π , L , p .
Q.E.D.
6 Making Inferences
Recall the original intent stated in section 3: The determination of correlation between product
features and likely demand. In the context of the framework described in the previous sections,
“correlation” between TDL concepts can be construed as a conditional probability—the probability that a randomly-chosen object is a member of one concept given that it is a member of another. Given such an understanding, known correlation between product features and demand can
be specified using a labeled Bayesian network. Such a network circumscribes a class of models—
those PTDL models that satisfy it. Correlative relationships not appearing directly in the network
are derived as conditional probabilities that hold in all the network’s models.
The conditional probability of the (general) concept C1 given (general) C2 is denoted λ (C1 | C2 ) to
keep it distinct from the conditional probabilities (denoted by the symbol p) involving atomic
concepts asserted in the network. With respect to a given PTDL model
D, c, r ,[[_]], µ ,
λ (C1 |C2 ) is defined as the ratio of the joint probability of C1 and C2 to the probability of C2. In
symbols:
D, c, r ,[[_]], µ |= λ (C1 |C2 ) = λ
iff µ ([[ C1 ]] ∩ [[ C2 ]]) = λ × µ ([[ C2 ]])
A network entails a conditional probability only if that probability holds in all models of the network:
n, π , L, p |− λ (C1 | C2 ) = λ
⇒
0. 0 |= n,π , L, p ⇒ 0 |= λ(C |C ) = λ
∀
1
2
Combining the two definitions above gives:
n, π , L , p |− λ (C1 |C2 ) = λ ⇒ ∀ D, c, r ,[[ _]], µ |= n, π , L , p . µ ([[ C1 ]] ∩ [[ C2 ]]) = λ × µ ([[ C2 ]])
There is a particular class of conditional probability assertion for which a simple inference algorithm can be readily formulated. Assume we are given a consistent LBN % = n, π , L, p and two
−
−
TDL concepts C1 and C2. Further, assume that | | S1 = ν (C1 ) and | | S2 = ν (C2 ), where S1 ⊆ L, and
— 13 —
both S1 and S2 are consistent wrt. %. Define S2′ = S2 ∩ L. Then, for any model D, c, r ,[[_]], µ
of %, provided µ ([[ C2 ]]) ≠ 0:
−
−
µ ([[ C1 ]] ∩ [[ C2 ]]) µ ([[ | | S1 ]] ∩ [[ | | S2 ]])
=
−
µ ([[ C2 ]])
µ ([[ | | S2 ]])
=
=
=
−
µ ([[ | | ( S1∪ S2 )]])
−
µ ([[ | | S2 ]])
−
−
p( | | (( S1∪ S2 ) ∩ L)) × µ ([[ | | (( S1∪ S2 ) \ L)]])
−
−
p( | | ( S2 ∩ L )) × µ ([[ | | ( S2 \ L) ]])
−
−
p( | | ( S1∪ ( S2 ∩ L))) × µ ([[ | | ( S2 \ L)]])
−
−
p( | | ( S2 ∩ L)) × µ ([[ | | ( S2 \ L) ]])
since S1 ⊆ L
−
p( | | ( S1∪ S2′ ))
=
−
p( | | S2′ )
−
−
Finally, note that by the definition of a PTDL, µ ([[ C2 ]]) ≠ 0 ⇔ p( | | S2 ) ≠ 0 ⇔ p( | | S2′ ) ≠ 0.
−
So for any LBN % = n, π , L, p and any two TDL concepts C1 and C2, such that | | S1 = ν (C1 ) and
|−| S2 = ν (C2 ), establish that all of the following are true:
1. % is consistent,
2. S1 and S2 are consistent wrt. %,
3.
S1 ⊆ L,
4.
−
p( | | S2′ ) ≠ 0, where S2′ = S2 ∩ L,
Then the following inference is valid:
% |− λ (C1 | C2 ) =
−
p( | | ( S1∪ S2′ ))
−
p( | | S2′ )
Note that in computing the probabilities on the right hand side of the above equation, any of the
algorithms for inference in Bayesian networks mentioned in section 2.2 may be employed.
Such inferences may be set in the context of the forecasting problem posed at the outset: Provided
an LBN that isolates all the significant relationships between product characteristics and market
parameters in the form of conditional probabilities involving atomic concepts, formulate descriptions of a new product (C2) and a hypothetical market for that product (C1) as TDL expressions
involving (some of) those atomic concepts. Then the conditional probability λ (C1 | C2 ) provides
— 14 —
an indication that the new product will indeed face the market hypothesized. A concrete demonstration of this procedure is given in the next section.
7 Worked Example
This section demonstrates the application of the framework described above to an elementary
sample problem representative of (but far simpler than) those that arise in the context of Sun’s
KAPTUR program. The problem relates to the marketing of a new line of server computer products, each offering a particular combination of price, performance and storage capability. The
intent is to determine target market segments for different members of the product line-up.
The products themselves are described by concept expressions in TDL, composed from the primitive concepts and roles described informally in Figure 1. Figure 2 lists expressions that formally
characterize the defined concepts in the previous figure.
Figure 3 illustrates these concepts, and the hierarchical relationships between them (a concept fits
above another in the hierarchy if the latter is a specialization of it).
In Figure 4, a labeled Bayesian network (in the sense of Definition 1) is displayed that relates certain properties of products to their ideal market segments.6 For example, the network suggests
that an expensive product with high storage capacity is most likely a product for large businesses;
conversely, an expensive product with low storage capacity is probably a dud.
Given the information above, the inference procedure in the previous section may be employed to
estimate various aspects of the market for the new server products. For instance, the suitability of
the Silver model for sale as an EnterpriseProduct is given by the expression:
λ (EnterpriseProduct | Silver )
6
The LBN also specifies unconditional probabilities for storage capacities and prices.
— 15 —
Primitive Concepts
StorageUnit
Server
High, Medium, Low
LargeBusiness,
SmallBusiness
Dud
— The set of all storage products
— The set of all server products
— Quantity ranges
— Market segments
— A product that’s not likely to sell anywhere
Roles
Capacity
Speed
Tech
Price
Storage
MktSegment
— Capacity of a storage unit
— Speed of a storage unit
— Quality of innovation in a server product
— Price of a server product
— The type of storage unit in a server
— Ideal market segment for a server product
Defined Concepts
S10
S100
Gemstone
Metal
Opal
Sapphire
Silver
Gold
EnterpriseProduct
DepartmentalProduct
— Low-end storage unit
— High-end storage unit
— A high-end server line (not necessarily expensive)
— A low-end server line (cheap)
— A server in the Gemstone line with an S10 storage unit
— An expensive server in the Gemstone line with an S100 storage unit
— A server in the Metal line with an S10 storage unit
— A server in the Metal line with an S100 storage unit
— A product best suited for sale into large businesses
— A product best suited for sale into medium business
Figure 1: Informal descriptions
S10 = StorageUnit |−| Capacity:Low |−| Speed: Medium
S100 = StorageUnit |−| Capacity:High |−| Speed: High
Gemstone = Server |−| Tech: High
Metal = Server |−| Tech: Low |−| Price: Low
Opal = Gemstone |−| Storage: S10
Sapphire = Gemstone |−| Storage: S100 |−| Price: High
Silver = Metal |−| Storage: S10
Gold = Metal |−| Storage: S100
EnterpriseProduct
= MktSegment: LargeBusiness
DepartmentalProduct = MktSegment: MediumBusiness
Figure 2: Formal concept definitions
— 16 —
Server
Metal
Tech: Low
Price: Low
Gemstone
Tech: High
Gold
Storage: S100
Silver
Storage:S10
Storage
Unit
Sapphire
Storage: S100
Price: High
Opal
Storage: S10
S100
Capacity: High
Speed: High
Figure 3: Product taxonomies
— 17 —
S10
Capacity: Low
Speed: Medium
Storage:Capacity:High
Storage:Capacity:Low
0.4
0.6
Storage:Capacity:High
Storage:Capacity:High
Storage:Capacity:Low
Storage:Capacity:Low
Price:High
Price:Low
Price:High
Price:Low
Price:High
Price:Low
MktSegment:
MktSegment:
LargeBusiness SmallBusiness
0.7
0.1
0.8
0.2
0.2
0.1
0.2
0.4
0.6
0.4
Dud
0.2
0
0.7
0.4
Figure 4: Labeled Bayesian network
To compute the value of this expression, first express each concept in terms of its atomic constituents, as show in Lemma 1:
ν (EnterpriseProduct) = MktSegment:LargeBusiness
ν (Gold) = Server |−| Tech:Low |−| Price:Low
|−|
Storage:StorageUnit |−| Storage:Capacity:High |−| Storage:Speed:High
This gives:
;
@
S = ;Server , Tech:Low , Price:Low , Storage:StorageUnit , Storage:Capacity:High , Storage:Speed:High@
S ′ = ;Price:Low , Storage:Capacity:High@
S1 = MktSegment:LargeBusiness
2
2
It is not difficult to verify that all the preconditions set out in section 6 are satisfied in this case.
Hence by the inference rule:
:
?
−
p( | | MktSegment:LargeBusiness, Price:Low , Storage:Capacity:High )
−
p( | | Price:Low, Storage:Capacity:High )
0.8 × 0.4 × 0.4
7
=
0.4 × 0.4
= 0.8 7
λ (EnterpriseProduct | Silver ) =
:
7
?
It so happens in this case that the appropriate probability can be retrieved directly from the graph; this is not true in
general, but see section 9.
— 18 —
Other estimates of interest can be computed in similar fashion; for example:
λ (EnterpriseProduct | Opal) = 0.2
λ (EnterpriseProduct | Sapphire) = 0.7
λ (DepartmentalProduct | Sapphire) = 01
.
8 Related Work
Naturally, the work described here is by no means the first to combine elements of formal logic
and probability. Some of the later research of Carnap [6], for example, seeks to construe inferential reasoning in just this manner. More recently, Bacchus [2] and Halpern [9] have sought to
fashion modal logics encompassing probabilistic reasoning. While such general frameworks are
useful in elucidating judgements regarding probabilities, they give rise to inference procedures
too complex to mechanize effectively [1].
A number of researchers have turned to description logics as more tractable basis for probabilistic
inference. The work of five such researchers is most germane to that recounted here, and is outlined in the following sections.
8.1 Heinsohn
In [11], Heinsohn highlights a probabilistic extension to description logics developed in full in his
1993 Ph.D. thesis [12]. Heinsohn’s system extends a simple description logic (a minor generalization of the language )/ defined in section 2.3) with p-conditionings. A p-conditioning involving
concepts C1 and C2 is an assertion the form:
p ,p ]
C1 [
→ C2 ,
l
where 0 ≤ pl ≤ pu ≤ 1.
u
Such a p-conditioning imposes a constraint on the extensions of C1 and C2 and their cardinalities
in valid models such that:
[[ C1 ]] ∩ [[ C2 ]]
[[ C1 ]]
∈ pl , pu
(Clearly, Heinsohn’s framework requires that the domain of all objects is finite.) Heinsohn provides inference rules that allow new p-conditionings to be derived in the presence of concept
definitions.
— 19 —
8.2 Jaeger
Jaeger’s work, summarized in [13], is closely allied with Heinsohn’s, though Jaeger places rather
greater emphasis on the derivation of beliefs about individual objects in the presence of statistical
assertions. (The work detailed in this paper focuses exclusively on statistical assertions; for an
introduction to the wider context, see [9].) In Jaeger’s framework, general probabilistic assertions
of the form p(C | D) , for (possibly non-atomic) concept expressions C and D, impose linear
equalities on probability measures defined over the Lindenbaum algebra of concept expressions.8
Jaeger shows how such equalities determine a convex polytope (in the space of discrete probability distributions), which may be used to derive bounds on the conditional probabilities of other
concept expressions.
Both Heinsohn and Jaeger’s approach to probabilistic inference (viz., the derivation of probability
intervals from partial descriptions of conditional probabilities) resemble the rule-based approaches of some early expert systems [19]. While this sort of framework is very general, and
makes it easy to establish the basics of a knowledge base, it suffers in certain regards in comparison with the Bayesian network approach espoused here. For example (as is shown in [7]), it can
be very difficult to establish a priori that a set of conditional probability assertions of this nature
is actually consistent; in contrast, any labeled Bayesian network that satisfies the fairly simple
constraints in Definition 5 is consistent by construction. Furthermore, the efficiency of inferences
based on Bayesian networks can often be carried out much more efficiently than in the more general Heinsohn/Jaeger framework, as Koller, Levy and Pfeffer [15] point out.
8.3 Koller, Levy and Pfeffer
Introduced in [15], P-CLASSIC is likely the system most closely allied to that described here, in
that it features elements of Bayesian networks and description logics. However, in contrast to this
work, which uses a very simple description logic to extend the capabilities of Bayesian networks,
the intent of P-CLASSIC is to bolster an existing description logic (the CLASSIC system described
8
The Lindenbaum algebra is formed by taking the quotient of the set of all concept expressions modulo logical equivalence.
— 20 —
in [5]) with the facilities of Bayesian networks. These divergent motivations give rise to some
important differences in technical character.
The central contribution of P-CLASSIC is the probabilistic class, or p-class. A p-class resembles a
labeled Bayesian network (defined in section 5), in that it comprises a Bayesian network whose
nodes are associated with concept expressions. Again, the intent is to assert a degree of correlation (or “overlap,” as it is termed in [15]) between certain concepts by specifying a joint probability distribution involving those concepts.
Primitive concepts may label the nodes of a p-class, as with LBN’s. In contrast to LBN’s, however (where they may appear on a node in the company of arbitrary atomic expressions), a primitive concept may appear in a p-class on a node whose only other label is the negation of that concept.9
The most profound difference between the approach taken in this paper and that of P-CLASSIC
occurs in the treatment of role quantification. Recall that in an LBN, it is possible to specify explicitly dependencies between arbitrary atomic concept expressions, including concepts involving
role quantification, such as R: P. P-CLASSIC handles such dependencies much more indirectly.
In P-CLASSIC, a p-class ρ must contain for each role R, a node PC ( R ) whose labels are p-classes.
For any configuration of its parents, this node determines a probability Prρ , R ( ρ ′ ) for each pclass ρ ′. 10 P-class ρ ′ in turn defines a probability Prρ ′ (C ) for concept C. Let Φ denote the set
of all p-classes in the system. Then according to the original p-class ρ, the probability that an object has n R-fillers in concept C is
cept ∀R: C is calculated as
∑ρ
∑ ∑ρ
bR
n =1
n
Prρ , R ( ρ ′) .Prρ ′ (C) . The probability assigned the con′∈Φ
n
Prρ , R ( ρ ′) .Prρ ′ (C) , where bR is an arbitrary finite upper
′∈Φ
bound that P-CLASSIC insists is imposed on the number of R-fillers of any object.
This approach has some intuitive appeal, but it makes it very difficult to express relationships
involving concepts involving role quantification—particularly since (at least as described in [15])
P-CLASSIC allows no node PC ( R ) to be the parent of any other node. In fact, the semantics of P-
9
Koller et al., extend CLASSIC with negations of primitive concepts.
Strictly speaking, this probability depends on the number of R-fillers; to simplify the explanation here, however, this
dependency is ignored.
10
— 21 —
CLASSIC require that the properties of an object and its fillers are independent given the filler’s pclass, making it impossible to express the sort of conditional independence assertions entailed by
Figure 4.
While there is no doubt that the mechanisms encapsulated in P-CLASSIC are useful adjuncts to the
capabilities of the underlying CLASSIC description logic, the limitations in its approach regarding
role quantification render it ill-suited to the particular type of application the framework described in this paper was designed to address.
9 Conclusion
Though the framework proposed in this paper goes some way towards meeting the goals set out
for it, there are doubtless innumerable ways in which it might be extended and improved.
It is incontestable, for example, that the description logic TDL provides only the barest essentials
required of such a notation. It would be interesting to explore the possibility of incorporating features found in more expressive logics, such as the description logic CLASSIC, for example, that
underlies P-CLASSIC. Whether the introduction of such features in the framework are compatible
with the fully determinate probabilities and simple consistency conditions offered by the current
configuration is an open question, however. It may be that a more complex, interval-based approach to probabilities like that of Heinsohn and Jaeger is required.
From a more pragmatic point of view, for a practitioner using the framework to provide a complete specification of the probabilities in an LBN such as that in section 7 can constitute a major
challenge—such information is normally only extracted through hours of rather tedious interviewing of the chief participants in product design and marketing. Certain short-cuts can be made
to ease the process: For example, in making inferences of the form λ (C1 | C2 ), it is not difficult to
show that the probability tables ascribed the root nodes in an LBN (that is, those nodes with no
parents) are immaterial, and need not be estimated.
Nonetheless, for reasonably large networks, fully estimating all of the required probabilities may
still be burdensome. One interesting alternative would be to attempt to learn (or refine by learn-
— 22 —
ing) the probabilities or indeed the structure itself of the appropriate LBN by examination of historical data; see [10] for more details.
The framework, as it currently stands, requires that a single LBN capture at once all product features that are relevant to the determination of market parameters. Implicitly, any product features
not appearing in the network are considered immaterial. Truly innovative products (that sport features not considered in the original formulation) and unanticipated competitive developments require that the network be updated. Thus some means of structuring the networks—so as to allow
such updates to occur in an incremental and well-defined fashion—would be desirable; the multiply-sectioned Bayesian networks of [21] might provide a basis for such an extension.
10 Acknowledgements
I’d like to thank the members of the Advanced Planning Tools Group of the Supply Planning and
Management Department at Sun Microsystems—in particular, Jim Mullin and Eunice Lee—for
their kind cooperation and patient tuition concerning the intent and conduct of Sun’s KAPTUR
process. I’m also greatly obliged to my manager at Sun Microsystems Laboratories, Mario Wolzcko, and to other members of the Sun Labs Application Technologies Center.
11 References
[1]
Abadi, M., Halpern, J., Decidability and expressiveness for first-order logics of probability.
Information and Computation 112:1, 1994, pp. 1-36.
[2]
Bacchus, F., Representing and Reasoning with Probabilistic Knowledge. MIT Press, Cambridge, MA, 1990.
[3]
Brachman, R. J., Levesque, H. J., The tractability of subsumption in frame-based description languages. Proceedings of the 4th National Conference on Artificial Intelligence
(AAAI-84), 1984.
[4]
Brachman, R. J., Schmolze, J. G., An overview of the KL-ONE knowledge representation
system. Cognitive Science, Vol. 9, No. 2, 1985.
— 23 —
[5]
Brachman, R., Borgida, A., McGuinness, D., Patel-Schneider, P., Resnick, L., Living with
CLASSIC: When and how to use a KL-ONE-like language. In Sowa, J., (ed.), Principles of
Semantics Networks, Morgan Kaufmann, 1991.
[6]
Carnap, R., Logical Foundations of Probability. University of Chicago Press, Chicago IL, 2
edition, 1962.
[7]
Castillo, E., Gutiérrez, J., Hadi, A., Expert Systems and Probabilistic Network Models.
Monographs in Computer Science, Springer-Verlag, New York, NY, 1997.
[8]
Halmos, P., Measure Theory. Graduate Texts in Mathematics, 18, Springer-Verlag, New
York, NY, 1974.
[9]
Halpern, J., An analysis of first-order logics of probability, Artificial Intelligence 46, 1990,
pp. 311-350.
[10] Heckerman, D., Geiger, D., Chickering, D. M., Learning Bayesian networks: The combination of knowledge and statistical data. In Proceedings of the 10th Conference on Uncertainty in Artificial Intelligence, R. Lopez de Mantaras and D. Poole (editors), Morgan
Kaufmann Publishers, San Francisco, CA, 1994.
[11] Heinsohn, J., A Hybrid Approach for Modeling Uncertainty in Terminological Logics,
Deutsche Forschungszentrum für Künstliche Intelligenz Research Report-1991-24, 1991.
[12] Heinsohn, J., ALCP—Ein hybrider Ansatz zur Modellierung von Unsicherheit in terminologischen Logiken. Ph.D. thesis, Universität des Saarlandes, Saarbrücken, 1993.
[13] Jaeger, M., A logic for default reasoning about probabilities, in Proceedings of the Tenth
Conference on Uncertainty in Artificial Intelligence, R. Lopez de Mantaras and D. Poole
(editors), Morgan Kaufmann Publishers, San Francisco, CA, 1994.
[14] Jensen, F. V., An Introduction to Bayesian Networks. Springer, New York, 1996.
[15] Koller, D., Levy, A., Pfeffer, A., P-CLASSIC: A tractable probabilistic description logic.
Proceedings of 14th National Conference on Artificial Intelligence (AAAI-97), pp. 360-397,
1997.
— 24 —
[16] MacGregor, R., The evolving technology of classification-based knowledge representation
systems. in Sowa, J. F., (ed.) Principles of Semantic Networks: Explorations in the Representation of Knowledge, Morgan Kaufmann, San Mateo, 1991.
[17] Makridakis, S., Wheelwright, S., Hyndman, R., Forecasting: Methods and Applications
(3rd ed.). John Wiley & Sons, New York, NY, 1998.
[18] Moss, L. S., Completeness theorems for logics of feature structures, in Y.N. Moschovakis
(ed.), Logic From Computer Science, MSRI Publications Vol. 21, Springer-Verlag, 1991,
pp. 387-401.
[19] Neapolitan, R., E., Probabilistic Reasoning in Expert Systems: Theory and Algorithms.
Wiley-Interscience, New York, NY, 1990.
[20] Pierce B. C., Programming with Intersection Types and Bounded Polymorphism. Technical
Report CMU-CS-91-205, Carnegie-Mellon University, 1991.
[21] Xiang, Y., Poole, D., Beddoes, M., Multiply sectioned Bayesian networks and junction forests for large knowledge-based systems. Computational Intelligence, Vol.9, No.2, 1993.
[22] Zhang, N. L., Poole, D., Exploiting causal independence in Bayesian network inference.
Journal of Artificial Intelligence Research, 5, 1996.
— 25 —
About the Author
Phillip M. Yelland is a staff engineer at Sun Microsystems Laboratories. Since receiving his
M.A. and Ph.D. in Computer Science from the University of Cambridge in England, he has
worked in a wide variety of settings, ranging from software development management in a
number of high-tech startups to industrial research for major corporations. He has made a
number of contributions to academic and industry publications. Throughout, his activities
have reflected a preoccupation with the application of theoretical knowledge to the practical
conduct of business. In addition to his current research, which encompasses formal methods
for the analysis of object-oriented programming systems and the use of statistical techniques
in supply-chain management, he is also completing an M.B.A. at the University of California
at Berkeley.
— 26 —

Similar documents