Poster

Transcription

Poster

Algorithm 1: Concept annotation algorithm.
Algorithm 1: Concept annotation algorithm.
Input: A wiki W = (A, E)
0
Output: Annotated wiki W = (A, E0 )
Boosting Cross-‐lingual Knowledge Linking via
Concept Annotation catego
categories
as
as
Output: Annotated wiki W = (A, E )
2.2 Concept Annotation
Definition 2 Semantic Relatedness. Given a concept c, let
for
each
article
aa22 A
do
for
each
article
A
do
N
be
the
existing
annotations
in
the
same
article
with
c,
the
c
Important concepts in a wiki article are usually annotated by
catego
Extract
a1:
set
of
concepts
Caa inina;
a;
Algorithm
Concept
annotation
algorithm.
Her
Extract
a
set
of
concepts
C
Here
Semantic
Relatedness
between
a
candidate
destination
article
1
editors with hyperlinks to their corresponding articles in the
1
2
2 asby usb
Get
the
existing
annotations
N
in
a;
wiki
Get
the
existing
annotations
N
in
a;
a
a
and
the
context
annotations
of
c
is
computed
as:
wiki
a
same wiki database. These inner links guide readers to arti0
for
each
c
2
C
do
for
each
c
2
C
do
tained
tained
from
i
a
i
a
cles that provide related information about the subject of curOutput:
Annotated
wiki
W
=
(A,
E
)
1 X
Findthe
theset
set of
of candidate
candidate articles
ci ;ci ;
Find
articlesTcTi cof
of
Feature
1 was often used to estimate
2
SR(a, c) =
r(a, b)
(2)
Fea
i
rent article. The linkage structure
⇤
⇤
|Nc |
Get
b
arg max
maxb2T
⇥⇥SR(b,
NaN
);a );
i )i )
Get
b
==arg
LP(b|c
(b|c
SR(b,
Given
t
b2Nc
similarity between articles in traditional knowledge linking
b2Tccii LP
Giv
for each
article
a
2
A
do
⇤
⇤
N
=
N
[
{b
};
egories
of
[
]
a= Naa[ {b };
approaches, e.g., Wang et al., 2012 . However, there might
N
egorie
a
Extract a set of concepts Ca in a;
Her
end
putes
simi
be missing inner links in the wiki articles, especially when the
end
log(max(|Ia |, |Ib |)) log(|Ia \ Ib |)
putes
Get
the
existing
annotations
N
in
a;
wiki
b
a
categori
for
each
b
2
N
do
r(a, b) = 1
(3)
articles. L
Algorithm
1:bConcept
articles are newly created or have fewer editors. In this cirj
a annotation algorithm.
for
each
2
N
do
article
for
each
j ci 2 C
a a do
log(|W
|)
log(min(|I
|,
|I
|))
tained
a
b
as
if
<
a,
b
>
2
/
E
then
to
category
cumstance, link-based features cannot accurately assess the
j
Input: ifAFind
wiki
W
=2
(A,
E)
<
a,
b
>
/
E
then
the
set
of
candidate
articles
Tci of ci ;
to
cate
j
Cross-‐lingual knowledge linking is to automa)cally discover cross-‐lingual Fea
E
=
E
[
{<
a,
b
>}
0
structure-based similarity between articles. Therefore, our
categories
Concept A
nnotator
j
where Ia and Ib are the sets of inlinks of article a and article
⇤
Output:
Annotated
wiki
W
=
(A,
E(b|c
) i ) ⇥ SR(b, Na );
E
=
E
[
{<
a,
b
>}
Get
b
=
arg
max
LP
catego
b2T
j
Giv
c
end
basic
idea
is
to
first
uses
a
concept
annotator
to
enrich
the
i
links (CLs) between wikis, which can largely enrich b,the cross-‐lingual respectively;
and W is the set of all articles in the input
⇤
end
N
=
N
[
{b
};
egorie
inner links in wikis before finding new CLs.
a
a
end
wiki.
categories
of
two
articles,
the
category
similarity
is
computed
knowledge and facilitate sharing knowledge across different languages. The Algorithm
1:toConcept
annotation algorithm.
for
each
article
a
2
A
do
end
end
Recently, several approaches have
been proposed
annoputes
end
categories
of
two
articles,
the
category
similarity
is
computed
Algorithm
Concept
annotation
algorithm.
The Semantic Relatedness metric
has been
in several Extract
Input Wused
ikis
as
a
set
of
concepts
C
in
a;
Concept Here
CL P
redictor
a
[Mihalcea
seed CLs and the inner link structures are two important factors for finding tating documents
with 1:
concepts
in Wikipedia
and
for
each
b
2
N
do
article
end
return
N
j
a
d
Input: A wiki W = (A,
E)approaches for linking
as
other
documents
to
Wikipedia.
The
Get
the
existing
annotations
Na in a;is computed
wiki
by
Csomai,
2007;
Milne
and
Witten,
2008;
Kulkarni
et
al.,
2009;
categories
of
two
articles,
the
category
similarity
0
Input:
A
wiki
W
=
(A,
E)
if
<
a,
b
>
2
/
E
then
Extrac)on
to
cate
j
return
N
Algorithm
1:
Concept
annotation
algorithm.
new CLs. AShen
challenging p
roblem i
s h
ow t
o fi
nd l
arge s
cale C
Ls b
etween w
ikis d
inner
link
structures
in
the
wiki
are
used
to
compute
the
Se0
Output:
Annotated
wiki
W
=
(A,
E
)
for each cEi 2
C
do
tained
fr
et al., 2012]. These approaches aim to identify impora[
2
·
|
(C(a))
\
C(b)|
=
E
{<
a,
b
>}
catego
1!2
as
j
Output:
Annotated
wiki
Wwhen =
Eto )the
Relatedness.
One thing worth mentioning is that
(e.g. English and Baidu Baike) insufficient seed 2 · when
| 1!2
(C(a))
\
C(b)|
Find
the
set
of
candidate articles Tci of ci ;
fend
(a,
b)(=
(8)Featu
tant Wikipedia concepts
in the
given
documents
link
them
Input:
A and
wiki
W(A,
=there (A,
E)are mantic
51:
Feature
Outlink
similarity
1 be not ac⇤
fWiki b)(=
(8)
there are0 insufficient inner links, this metric
|
(C(a))|
+
|C(b)|
5 (a, might
1!2
Get
b
=
arg
max
LP
(b|c
)
⇥
SR(b,
N
);
corresponding
articles
in
Wikipedia.
The
concept
annotator
b2T
i
a
a
Given
c
Training
Outlinks
of
an
article
correspond
to
a
set
of
other
articles
CLs and inner links?
i
|
(C(a))|
+
|C(b)|
for
each
article
a
2
A
do
where
c
2
Output:
Annotated
wiki
W
=
(A,
E
)
end
1!2
curate
enough
to
reflect
the
relatedness
between
articles.
To
Feature
1:
Outlink
similarity
i
2
·
|
(C(a))
\
C(b)|
⇤
1!2
in our approach
provides
the similar
as existing apfor each
article
a 2 Afunction
doExtract
N
=
N
[
{b
};
egories
that
it
links
to.
The
outlink
similarity from
computes
thewiki
similaria
a
f
(a,
b)(=
(8)
a
set
of
concepts
C
in
a;
this
end,
our
approach
first
utilizes
the
known
CLs
to
merge
Here
(·)
maps
categories
one
to
another
5
end
a
Outlinks
ofAnnota)on
an
article
correspond
to
a
set
of
other
articles
1!2
instead
discovering
where
ENproaches. However,
Extract
aZH
setof of
conceptslinks
Cafrom
in a;an ex- the link structures
Regressio
Here
(·)
maps
categories
from
one
wiki
to
another
end
|
(C(a))|
+
|C(b)|
ties
between
articles
by
comparing
elements
in
their
outlinks.
putes
sim
1!2
1!2
of
two
wikis,
which
results
in
a
Integrated
return
N
Get
the
existing
annotations
N
in
a;
that
it
links
to.
outlink
similarity
computes
the
similarid The
wiki
by
using
the
CLs
between
categories,
which
can
be
obfor
each
article
a
2
A
do
ternal document
to
a
wiki,
concept
annotator
focuses
on
ena
cross-‐lingual l
ink categories
of
two
articles,
the
category
similarity
is
computed
Given
two
articles
a
2
W
and
b
2
W
,
let
O(a)
and
O(b)
for
each
b
2
N
do
The
CL pr
articles.
Get
the
existing
annotations
N
in
a;
1 can be ob2
j
a
wiki
by
using
the
CLs
between
categories,
which
a
Algorithm
1:
Concept
annotation
algorithm.
Concept
Network.
The
Semantic
Relatedness
is
computed
Regre
China
riching the inner links in a wiki.
Therefore,
there
will
be
ties
between
articles
by
comparing
elements
in
their
outlinks.
Extract
a
set
of
concepts
C
in
a;
for
each
c
2
C
do
Here
(·)
maps
categories
from
one
wiki
to
another
tained
from
Wikipedia.
中国
a
i
a
be
the
outlinks
of
a
and
b
respectively.
The
outlink
similarity
1!2
if < a, bj >2
/ E then
oftodifferen
asbased on the links in the
catego
Integrated
Concept
Network,
which
for
each
c
2
C
do
tained
from
Wikipedia.
i
a
Wiki 2
some
existing
before
annotating.
In our
Input:Tsinghua A wiki
W =links
(A,in
E)the articles
+categories,
Given
two
articles
a
2
W
and
b
2
W
,
let
O(a)
and
O(b)
The
C
1
2
Find
the
set
of
candidate
articles
T
of
c
;
Get
the
existing
annotations
N
in
a;
is
computed
as
wiki
by
using
the
CLs
between
which
can
be
obold
!
to
d
E
=
E
[
{<
a,
b
>}
c
i
a
categori
Feature
6
Category
similarity
0
+
j
i
is
formally
defined
as
follows.
Incremental 0
Predic)ng
Find
the
set
of
candidate
articles
T
of
c
;
categories
of
two
articles,
the
category
similarity
is
computed
ci algorithm.
i
approach,
the
inputs
of
concept
annotator
are
two
wikis
and
Feature
6
Category
similarity
University
Feature
1:
Outlink
similarity
Inner lOutput:
ink ⇤
Algorithm
1:
Concept
annotation
be
the
outlinks
of
a
and
b
respectively.
The
outlink
similarity this of
diff
Annotated wiki清华大学
W =⇤ (A,Inner E )link
purpo
end
Get
b
=
arg
max
LP
(b|c
)
⇥
SR(b,
N
);
for
each
c
2
C
do
2
·
|
(C(a))
\
C(b)|
tained
from
Wikipedia.
b2T
i
a
i
a
1!2
Given
two
articles
a
and
b,
let
C(a)
and
C(b)
be
the
cata set of CLs between
them.
The
concept
Get
b =
arg
maxannotation
(b|ci ) ⇥ SR(b,
Ncia3);fIntegrated
Outlinks
of
an
article
correspond
to
a
set
of
other
articles
Linking
Definition
Concept
Network.
Given
two
wikis
b2Tci LPprocess
2
·
|
(O(a))
\
O(b)|
Given
two
articles
a
and
b,
let
C(a)
and
C(b)
be
the
catas
1!2
b)(=
(8) as
where
is computed
old
!0
5 (a,of
+
⇤
end
+
Find
the
set
of
candidate
articles
T
c
;
f
(a,
b)
=
(4)
c
i
1
Feature
6
Category
similarity
consists
of
two
basic
steps:
(1)
concept
extraction,
and
(2)
⇤ W =
i W2 = (A
|2 ,1!2
(C(a))|
|C(b)|
N
=
N
[
{b
Input:
A
wiki
(A,
E)
+
W};
E2 ),
where
A+
Aegories
sets of
a
and
b
respectively.
Category
similarity
comBeijing
it
links
to.
The
outlink
similarity
computes
the
similaria
a
1 = (A1 , E1 ),
1 band
2 are that
N
=
N
[
{b
};
|
(O(a))|
+
|O(b)|
北京
egories
of
a
and
respectively.
Category
similarity
coma
a
⇤
1!2
this pu
for each
article a 2 A do
0
end
annotation.
Getend
b = arg maxb2TciofLP
(b|c
)
⇥
SR(b,
N
);
articles, iE1 and E2 are sets
and Wputes
, respecRegre
a of links in W1Given
2two
s(a,
b)
articles
a
and
b,
let
C(a)
and
C(b)
be
the
catties
between
articles
by
comparing
elements
in
their
outlinks.
2
·
|
(O(a))
\
O(b)|
similarities
between
categories
by
using
CLs
between
1!2
Output:
Annotated
wiki
W
=
(A,
E
)
end
Extract
a
set
of
concepts
C
in
a;
putes
similarities
between
categories
by
using
CLs(C(a))
HereLet1!2
(·)
maps
categories
from
one
wiki
to
another
k
where
(·)
is
a· =
function
tobetween
maps
articles
in W1 to their(4)
2
|
\
C(b)|
return
N
1!2
Concept Extraction. Inathis step,Npossible
concepts
⇤ that
d
f
(a,
b)
+
1!2
tively.
CL
=
{(a
,
b
)|a
2
B
,
b
2
B
}
be
the
set
of
1
i
i
i
1
i
2
i=1 of a Given
two
articles
a
2
W
and
b
2
W
,
let
O(a)
and
O(b)
=
N
[
{b
};
The C
egories
and
b
respectively.
Category
similarity
com1
2
a
a
for
each
b
2
N
do
Forbidden articles.
Let
E(c
)
and
E(c
)
be
the
set
of
articles
belonging
f
(a,
b)(=
(8)
j
a
|
(O(a))|
+
|O(b)|
corresponding
articles
in
W
if
there
are
CLs
between
them.
Getmight
the existing
annotations
N
in
a;
1
2
for
each
b
2
N
do
wiki
by
using
the
CLs
between
categories,
which
can
be
ob5be the set of articles
1!2
articles.
Let
E(c
)
and
E(c
)
2belonging
refer to other articlesjin the
wiki are identified in
a same
a
cross-lingual
links
between
W
and
W
,
where
B
✓
A
and
1
2
1
2
1
1
City
be Feature
thebetween
outlinks
ofcategories
a1!2
and
b respectively.
The
outlink
similarity
+ (C(a))|
of s(a
diff
故宫
|
+
|C(b)|
end
putes
similarities
by
using
CLs
between
2:
Outlink
similarity
if
<
a,
b
>
2
/
E
then
to
category
c
and
c
respectively,
the
similarity
between
two
for
each
c
2
C
do
an
input
article.
The
concepts
are
extracted
by
matching
all
tained
Wikipedia.
j
B2 ✓ from
A2 . The
Integrated
Concept Network
andwhere
Wis2 computed
is 1!2
iffor
< a,each
bj 郭金龙
>article
2
/ E then
i
a
1(·)similarity
Jinlong Guo
to category
c1 andofcW
the
between
twoarticles in W
Cross-‐lingual L
inks
is a 2function
to maps
2 1respectively,
a
2
A
do
For
eac
1 to their
+
as
old
!
0
+
Given
two
articles
a
2
W
and
b
2
W
,
let
O
(a)
and
for
each
b
2
N
do
the
n-grams
thecandidate
inputEarticle
with
the
elements
in
a
conarticles.
Let
E(c
)
and
E(c
)
be
the
set
of
articles
belonging
1
2
Find
the setinof
articles
T
of
c
;
j
a
ICN
(W
,
W
)
=
(V,
L),
where
V
=
A
[
A
;
L
is
a
set
of
Feature
1:
Outlink
similarity
E
=
E
[
{<
a,
b
>}
c
i
1
2
Feature
similarity
categories
is
computed
as
1
2
ia, bj >}
j1 6 2Category
=
E
[
{<
corresponding
articles
in
W
if
there
are
CLs
between
them.
mizes
the
categories
is
computed
as
2
+
Extract
a
set
of
concepts
C
in
a;
Here
(·)
maps
categories
from
one
wiki
to
another
this
p
⇤
a
1!2
O
(b)
be
the
outlinks
in
the
abstracts
and
infoboxes
of
a
and
trolled
vocabulary.
The
vocabulary
contains
the
titles
and
the
a
links
which
are
established
as
follows:
Get b = arg max
LP
(b|c
)
⇥
SR(b,
N
);
+ correspond
if
<
a,
b
>
2
/
E
then
Outlinks
of
an
article
to
a
set
of
other
articles
to
category
c
and
c
respectively,
the
similarity
between
two
b2T
i
a
j
Given
two
articles
a
and
b,
let
C(a)
and
C(b)
be
the
catc
end
1
2
where
c
is
predicte
2
·
|
(O(a))
\
O(b)|
Feature
2: Outlink
similarity
i
+ 1!2
0
0
0
i
b
respectively.
The
outlink
similarity
is
computed
as
Get
annotations
N
in
a;
anchor texts of all⇤ end
the articles
inthe
theexisting
input
wiki.
For
an
inwiki
by
using
the
CLs
between
categories,
which
can
be
obFor
f
(a,
b)
=
(4)
a
+
(v,
v ) 2ofEa
(v, vb )respectively.
2 E2 ! (v, v Category
) 2categories
L
that
itncomlinks
to.
computes
theOsimilari+
n X
m
1The outlink similarity
1_
E
=
E
[
{<
a,
b
>}
m
Na = Na [end
{b };
CL predic
is
computed
as
egories
and
similarity
j
X
Given
two
articles
a
2
W
and
b
2
W
,
let
(a)
and
X
X
end
1
2
0
0
0
0
|
(O(a))|
+
|O(b)|
1
put article a, the results offor
concept
extraction
contain
a set
1!2
1
Regress
mizes
a
b
each
c
2
C
do
ties
between
articles
by
comparing
elements
in
their
outlinks.
a
b
tained
from
Wikipedia.
+
(v,
v
)
2
E
^
(v,
b)
2
CL
^
(v
,
b
)
2
CL
!
(b,
b
)
2
L
+
+
propriately
i
a
1
endof concepts C = {ci }k and the sets
end
putes
similarities
between
categories
by
using
CLs
between
s(a
f
(a,
b)
=
(c
,
c
)
(9)
O
(b)
be
the
outlinks
in
the
abstracts
and
infoboxes
of
a
and
2
·
|
(O
(a))
\
O
(b)|
f
(a,
b)
=
(c
,
c
)
(9)
1!2
6
of
candidate
articles
6
0
0
0
0
i
j
i
j
end
i=1
end
where
(·)
is
a
function
to
maps
articles
in
W
to
their
is
pre
+
Given
two
articles
a
2
W
and
b
2
W
,
let
O(a)
and
O(b)
f
(a,
b)
=
(5)
1!2
1
The
CL
challengin
n
m
+
nm
2
1
2
nm
Find
the
set
of
candidate
articles
T
of
c
;
(v,
v
)
2
E
^
(v,
a)
2
CL
^
(a
,
v
)
2
CL
!
(a,
a
)
2
L
2
for{T
each
b
2
N
do
c
i
+
+
articles.
Let
E(c
)
and
E(c
)
be
the
set
of
articles
belonging
Feature
6
Category
similarity
X
X
i 1
b respectively.
The
outlink
similarity
is|O
computed
as them.
2
might link to.
end
|
(O
(a))|
+
(b)|
c1 , Tc2j, ..., Tcak } that each concept
i=1
j=1
i=1
j=1
1
1!2
corresponding
articles
in
W
if
there
are
CLs
between
a bThe outlink similarity
return
N
N
be
the
outlinks
of
a
and
b
respectively.
⇤
CL
pr
2
of
differ
Here,
w
d
d
ifAnnotation.
< a, breturn
>
2
/
E
then
to
category
c
and
c
respectively,
the
similarity
between
two
f
(a,
b)
=
(c
,
c
)
(9)
Get
b
=
arg
max
LP
(b|c
)
⇥
SR(b,
N
);
In the annotation
step,
links
from
the
conj
In the iIntegrated
Concept
Network, the link
structures
of
1
2
+ let C(a)
6articles
b2Tci
aLanguage-‐independent i+ and
j
Given
two
aInlink
and
b,
C(b)
be
the
catFeatures Feature
3:
similarity
end
Feature
2:
Outlink
similarity
is
computed
as
+
weights
of
old
!
to
propri
nm
0
2
·
|
(O
(a))
\
O
(b)|
cepts E
to =
theEdestination
are
identified.
Here
we
use
[ Cnocept
{< a, barticles
>}
For
⇤
categories
is
computed
as
two
wikis
are
merged
by
using
CLs.
The
Semantic
Related1!2
2|
(E(c
))
\
E(c
)|
Ar)cles
j
+W
2|
(E(c
))
\
E(c
)|
+
i=1
j=1
1!2
1
2
1!2
1
2
Inlinks
of
an
article
correspond
to
a
set
of
other
articles
N
=
N
[
{b
};
N-‐grams two
matching
Given
two
articles
a
2
W
and
b
2
,
let
O
(a)comand
egories
of
a
and
b
respectively.
Category
similarity
return
N
of
CLs
{(a
Extract C
oncept-‐Ar)cle f
(a,
b)
=
(5)
a
a
this
purp
1
2 is computed
challe
categories
of
two
articles,
the
category
similarity
d
2
(c
,
c
)
=
(10)
(c
,
c
)
=
(10)
metrics,
the
Link
Probability
and
the
Semantic
RelatedAlgorithm
1:
Concept
annotation
algorithm.
1
2
+ (a))|
+
ness computed based on Integrated Concept Network is more
end
1let I(a)
2
mizes
+
|
(O
+
|O
(b)|
linking
to
it,
and
I(b)
denote
inlinks
of
two
articles,
2
·
|
(O(a))
\
O(b)|
1!2
1!2
c
a
,a
,…,a
|
(E(c
))|
+
|E(c
)|
O
(b)
be
the
outlinks
in
the
abstracts
and
infoboxes
of
a
and
vocabulary
B
=
{(b
)
|
(E(c
))|
+
|E(c
)|
i
1!2
1
2
as
end
1
11
12
1p
n
m
1!2
1
2
Her
putes
similarities
between
categories
by
using
CLs
between
ness,
to
decide
the
correct
links
for
each
concept.
The
Link
reliable
than
on
a
single
wiki.
In
order
to
balance
the
Link
f12|
(a, b)
=
(4)
Feature 1: Outlink similarity
is
pre
XX
Feature 1:Input:
Outlink
similarity
(E(c
))
\
E(c
)|
end
+
the
inlink
similarity
is
computed
as
1!2
1
2
A wiki
W = (A, E)
1
ensure:
b
respectively.
The
outlink
similarity
is
computed
as
Feature
3:
Inlink
similarity
aapproach
b
|
(O(a))|
+
|O(b)|
1!2
Probability
measures
the
possibilities
that
an
article
is
the
(c
,
c
)
=
(10)
weigh
0
Probability
and
the
Semantic
Relatedness,
our
uses
c
a
,a
,…,a
f
(a,
b)
=
(c
,
c
)
(9)
1
2
a
b
for
each
b
2
N
do
CL
prb
articles.
Let
E(c
)
and
E(c
)
be
the
set
of
articles
belonging
6
Outlinks
of
an
article
correspond
to
a
set
of
other
articles
a
b
2
21
22
2q
j
a
i
j
1
2
Outlinks
of
an
article
correspond
to
a
set
of
other
articles
end destination of a given concept, which is approximated
where
c
2
C(a),
c
2
C(b),
n
=
|C(a)|,
and
m
=
|C(b)|.
s(a,
Output:
Annotated
wiki
W
=
(A,
E
)
|
(E(c
))|
+
|E(c
)|
where
c
2
C(a),
c
2
C(b),
n
=
|C(a)|,
and
m
=
|C(b)|.
Inlinks
of (·)
an is
article
correspond
to
a
of
other
articles
nm ii=1
Outlink s
jimilarity
2
·
|
(C(a))
\
C(b)|
1!2
1|1!2
2set
based
an algorithm that greedily selects
the
articles
that
maximize
i
j
of
CL
where
a
function
to
maps
articles
in
W
to
their
2
·
(I(a))
\
I(b)|
Feature
1:
Outlink
similarity
1!2
1
+
+
1!2
propri
j=1
that
it
links
to.
The
outlink
similarity
computes
the
similariif
<
a,
b
>
2
/
E
then
f
(a,
b)(=
(8)
c3 annotations
a31,a
to1category
c
and
c
respectively,
the
two
2 ·I(b)
| 1!2
(Osimilarity
(a))
\ O ofbetween
(b)|
returnon
Ndthe already known
52let
j The
that
links
to.Semantic
The outlink
computes
theAlgorithm
similari1
f
(a,
b)
=
(6)
32
3h
in,…,a
aitwiki.
3
linking
to
it,
I(a)
and
denote
inlinks
two
articles,
the similarity
product of the
two metrics.
outlines
the
alcorresponding
articles
in
W1!2
if (I(a))|
there are
CLs
between them.(5)
| CLs
+
|C(b)|
B
=
{
b)
=
2 (C(a))|
a
b f2 (a,
1!2
challe
|
+
|I(b)|
8a
2
Outlinks
of
an
article
correspond
to
a
set
of
other
articles
i
Regression-based
Model
for
Predicting
new
+
+
ties
between
articles
by
comparing
elements
in
their
outlinks.
where
c
2
C(a),
c
2
C(b),
n
=
|C(a)|,
and
m
=
|C(b)|.
for
article
a 2 Aindo
Ebetween
=the
[articles
{<each
a,
bby
Regression-based
Model
for
Predicting
new
CLs
Relatedness measures
between
des+computed
categories
is
computed
as
|
(O
(a))|
+
|O
(b)|
comparing
elements
in2|
their
outlinks.
gorithm
the
concept3annotator.
i
j
j >}
the
inlink
similarity
is
as
…relatednessties
…Ecandidate
1!2
(E(c
))
\
E(c
)|
Feature
2:
Outlink
similarity
1!2
1
2
ensure
Experiments
3.2
Cross-lingual
Links
Prediction
+
Her
that
it
links
to.
The
outlink
similarity
computes
the
similariExtract
a
set
of
concepts
C
in
a;
Feature
4:
Inlink
similarity
For
e
(c
,
c
)
=
(10)
Here
(·)
maps
categories
from
one
wiki
to
another
tinationGiven
article two
and the
surrounding
context
of
the
concept.
a 2, let
articles aend
2 W1 and
barticles
2 W2 ,alet
O(a)
and
O(b)
+
1
1!2
The
CL
predictor
in
our
approach
computes
the
weighted
sum
which
also
Given
two
2
W
and
b
2
W
O(a)
and
O(b)
Given
two
articles
a
2
W
and
b
2
W
,
let
O
(a)
and
The
CL
predictor
in
our
approach
computes
the
weighted
sum
Feature
similarity
1 Cross-lingual
2 Link
1
2
+ 3: Inlink
+
|
(E(c
))|
+
|E(c
)|
c
a
,a
,…,a
weigh
1!2
1
2
Regression-based
Model
for
Predicting
new
CLs
Let
I
(a)
and
I
(b)
denote
two
articles’
inlinks
that
are
2.3
Prediction
m
m1
m2
mk
Get
the
existing
annotations
N
in
a;
mizes
th
Formally,
these
two
metrics
are
defined
as
follows.
wiki
by
using
the
CLs
between
categories,
which
can
be
obties
between
articles
by
comparing
elements
in
their
outlinks.
Feature
1: Outlink
similarity
a
+
2
·
|
(I(a))
\
I(b)|
The
data
of
English
Wikipedia
and
Chinese
Wikipedia
has
n
m
be the outlinks of
a
and
b
respectively.
The
outlink
similarity
1!2
of
different
similarities
between
articles,
and
applies
a
threshO
(b)
be
the
outlinks
in
the
abstracts
and
infoboxes
of
a
and
In order
to aevaluate
the
CL
prediction
method
Inlinks
of
an b)
article
correspond
to
set
ofinlink
other
+ articles
X
X
be the outlinks offora each
and cbi respectively.
The
outlink
similarity
of
different
similarities
between
articles,
and
applies
a
threshend
of
CL
f
(a,
=
(6)
from
other
articles’
abstracts
and
infoboxes,
the
sim1
3
is
predic
2
C
do
a
b
tained
from
Wikipedia.
a
+
Outlinks
of
an
article
correspond
to
a
set
of
other
articles
been
used
to
evaluate
our
proposed
approach.
We
downAs
shown
in
Figure
2,
the
CL
predictor
takes
two
wikis
and
a
a
b
two articles
W1 andwhere
b2W
, let
O(a)
and
O(b)
The
CL
predictor
in
our
approach
computes
the
weighted
sum
Definition
1 Link Probability.
a conceptac 2
identified
22
c
C(a),
c
2
C(b),
n
=
|C(a)|,
and
m
=
|C(b)|.
selected
3,000
articles
having
CLs
to English
is computed
asGiven Given
|
(I(a))|
+
|I(b)|
8a
b
respectively.
The
outlink
similarity
is
computed
as
old
!
to
decide
whether
an
article
pair
should
have
a
CL.
For
linking
to
it,
let
I(a)
and
I(b)
denote
inlinks
of
two
articles,
1!2
i
j
0
f
(a,
b)
=
(c
,
c
)
(9)
+
B
=
{
is
computed
as
ilarity
is
computed
as
6
old
!
to
decide
whether
an
article
pair
should
have
a
CL.
For
Find
the
set
of
candidate
articles
T
of
c
;
i
j
CL
pred
0
c
i
load
the
English
Wikipedia
and
Chinese
Wikipedia
dumps
set
of
seed
CLs
between
them
as
inputs,
then
it
predicts
a
set
Feature
6
Category
similarity
end
i
that it links
to.article,
The outlink
computes
the
similariin an
the Linksimilarity
Probability
from thisof
concept
c btorespectively.
an- ⇤
Chinese
Wikipedia.
The existing CLs betw
be the outlinks
a and
The outlink this
similarity
of
different
similarities
between
and
applies
a
threshnm
+ isarticles,
the
inlink
similarity
computed
as
purpose,
a
scoring
function
is
defined:
Inlink s
imilarity
ensure
Feature
4:
Inlink
similarity
Therefo
Get
b
=
arg
max
LP
(b|c
)
⇥
SR(b,
N
);
+
+
propriat
of
new
CLs.
This
subsection
first
introduces
the
definitions
of
i=1
j=1
from
Wikipedia’s
download
site.
Both
of
the
two
dumps
this
purpose,
a
scoring
function
is
defined:
b2T
i
a
Regression-based
Model
for
Predicting
new
CLs
Given
two
articles
a
and
b,
let
C(a)
and
C(b)
be
the
catc
otherarticles
article by
a Definition
iscomparing
defined
by
the
approximated
conditional
ties between
elements
in1!2
their(O(a))
outlinks.
+were
+ as(b)|
22· ·| |the
(O
(a))
\
O
which
cles
used
the
ground
truth
for
the
ev
i
return
N
1!2
2
·
|
\
O(b)|
2
Semantic
Relatedness.
Given
a
concept
c,
let
is
computed
as
d
categories
of
two
articles,
category
similarity
is
computed
+
+
(I
(a))
\
I
(b)|
old
!
to
decide
whether
an
article
pair
should
have
a
CL.
For
1!2 two articles’
⇤
0 thethe
Algorithm
1:
Concept
annotation
algorithm.
+
Annota)on p
redic)on
f
(a,
b)
=
(5)
2
·
|
(O(a))
\
O(b)|
Let
I
(a)
and
I
(b)
denote
inlinks
that
are
different
similarity
features,
and
then
describes
learning
are
archived
in
August
2012,
English
Wikipedia
contains
challeng
2
1!2
where
the
N
=
N
[
{b
};
probability
of
a
given
c:
f
(a,
b)
=
(4)
f
(a,
b)
=
(7)
egories
of
a
and
b
respectively.
Category
similarity
coma
a
seed
CLs
were
extracted
from
the
rest
of
23
+
+
4
1
Given
two
articles
a
2
W
and
b
2
W
,
let
O(a)
and
O(b)
2
·
|
(I(a))
\
I(b)|
The
CL
predictor
in
our
approach
computes
the
weighted
sum
N
be
the
existing
annotations
in
the
same
article
with
c,
the
1
2
1!2
+
+
|
(O
(a))|
+
|O
(b)|
c
Definition
2
Semantic
Relatedness.
Given
a
concept
c,
let
as
+
f
(a,
b)
=
(4)
1!2
cle are usually annotated by
|
(I
(a))|
+
|I
(b)|
this
purpose,
a
scoring
function
is
defined:
1 end
|
(O(a))|
+
|O(b)|
1!2
method
for
predicting
new
CLs.
4
million
articles,
and
the
Chinese
Wikipedia
contains
499
f
(a,
b)
=
(6)
from
other
articles’
abstracts
and
infoboxes,
the
inlink
simput
y
is
a
Here,
1!2
2|
(E(c
))
\
E(c
)|
3
i
CLs
between
English
Wikipedia
and
Chinese
putes
similarities
between
categories
by
using
CLs
between
Input:
A
wiki
W
=
(A,
E)
1!2
1
2
|
(O(a))|
+
|O(b)|
be
the
outlinks
of
a
and
b
respectively.
The
outlink
similarity
Semantic
Relatedness
between
a
candidate
destination
article
different
between
articles,
and
applies
a(cthresh~
Ncinbethe
the existing annotations
in the same
the of(O(a))
1!2
| 1!2 (I(a))| + |I(b)|
8
2 · |c, 1!2
\ similarities
O(b)| thousand
count(a,
c) article with
orresponding
Feature
3:
Inlink
similarity
ly
annotated byarticles
b)
=
!
+
!
~
·
f
0s(a, articles,
,
c
)
=
(10)
and
there
are
already
239,309
cross-lingual
0
a,b
Feature
5
Category
similarity
model
on
ilarity
is
computed
as
1
2
~
weights
thethe
experiments,
we aimed
to investigate how
for
each
bold
2W
N1ato
do
articles.
Let
E(c
)
and
E(c
)
be
set
of
articles
belonging
LP
(a|c)
=
(1)
jFeature
s(a,
b)
=
!
+
!
~
·
f
f
(a,
b)
=
(4)
1
2
a
and
the
context
annotations
of
c
is
computed
as:
Definition
where
(·)
is
a
function
to
maps
articles
in
to
their
Output:
Annotated
wiki
W
=
(A,
E
)
is
computed
as
(11)
Semantic
Relatedness
between
a
candidate
destination
article
1
0
a,b
Link P
robability
!
decide
whether
an
article
pair
should
have
a
CL.
For
1!2
guide
readers to arti|
(E(c
))|
+
|E(c
)|
0to maps articles
2
·
|
(C(a))
\
C(b)|
+
gr links
articles
in the
Inlinks
of
an
article
correspond
to
a
set
of
other
articles
1!2
count(c)
links
between
English
Wikipedia
and
Chinese
Wikipedia.
Us1!2
1
2
Categories
are
tags
attached
to
articles,
which
represent
the and
tures.
The
where
(·)
is
a
function
in
W
to
their
Feature
4:
Inlink
similarity
(11)
of
CLs
{
|
(O(a))|
+
|O(b)|
tion
method
performed
before
after
the
c
1!2
1
if
<
a,
b
>
2
/
E
then
Feature
1:
Outlink
similarity
The
1!2
to
category
c
and
c
respectively,
the
similarity
between
two
=
!
+
!
⇥
f
(a,
b)
+
...
+
!
⇥
f
(a,
b)
j
f
(a,
b)(=
(8)
1
2
which
a
and
the
context
annotations
of
c
is
computed
as:
0
1
1
6
6
Six
features
are
defined
to
assess
the
similarities
between
ar5 of Concept
corresponding articles in W2 if 1there
them.
this purpose,
a scoring
function
is we
defined:
+
+ofC(b)
Xare CLs between
ion
abouttothe
subject of curreaders
arti+
+and
~
ing
these
data,
first
evaluate
the
effectiveness
linking
to
it,
let
I(a)
I(b)
denote
inlinks
two
articles,
=
!
+
!
⇥
f
(a,
b)
+
...
+
!
⇥
f
(a,
b)
topics
of
the
articles’
subjects.
Let
C(a)
and
denote
F1-measur
2
·
|
(I
(a))
\
I
(b)|
|
(C(a))|
+
|C(b)|
s(a,
b)
=
!
+
!
~
·
f
0
1
1
6
6
Let
I
(a)
and
I
(b)
denote
two
articles’
inlinks
that
are
tion,
and
how
the
CL
prediction
method
perfo
B
=
{(b
1!2
1!2
0
a,b
corresponding
articles
in
W
if
there
are
CLs
between
them.
E
=
E
[
{<
a,
b
>}
categories
is
computed
as
a
b
j
2
+
2
·
|
(O(a))
\
O(b)|
where
count(a,
c)
denotes
the
number
of
times
that
c
links
to
ticles
by
using
different
kinds
of
information,
all
of
these
SR(a,
c)(·)
=article
r(a, b)article
(2)
Outlinks
of
an
correspond
to
a
set
of
other
articles
where
1!2
for
each
a
2
A
do
e subject
was often
used to estimate
f
(a,
b)
=
(7)
where
is
a
function
to
maps
articles
in
W
to
their
X
of cur(11)
+
Feature
2:
Outlink
similarity
Annotation
and
CL
prediction
methods
respectively,
and
then
where
c
2
C(a),
c
2
C(b),
n
=
|C(a)|,
and
m
=
|C(b)|.
4
1!2
1
the
inlink
similarity
is
computed
as
1
⇤ abstracts
+ features are based on theFor
ferent
number
of(b)|
seeding
cross-lingual
links.
+ (a))|
+
f1 (a,
b) = denotes
(4)end
iW from
jarticles’
|N
other
and
infoboxes,
the
inlink
simensure:
c |2:
|
(I
+
|I
a
and
count(c)
the
number
of
times
that
c
appears
in
link
structures
and
therefore
are
each
article
a
in
,
the
article
b
in
W
that
maxiFeature
Outlink
similarity
1!2
SR(a,
c)
=
r(a,
b)
(2)
=as
+
!
⇥
f
(a,
b)
+
⇥
f
(a,
1!
2... + !6from
+
b2Nin
Category s
imilarity
Feature
5
Category
similarity
athere
set O
ofcomputes
concepts
Cthe
a;
put
y
aditional
knowledge Algorithm
linkingGiven1:
sed to estimate
⇤b)
c Extract
Here
(·)
maps
categories
one
wiki
to
another
0
1
1
6
|that
(O(a))|
+
|O(b)|
the
proposed
approach
a
whole.
a inevaluate
i
it
links
to.
The
outlink
similarity
similari1!2
corresponding
articles
W
if
are
CLs
between
them.
Concept
Annotation
Algorithm
1!2
two
articles
a
2
W
and
b
2
W
,
let
(a)
and
n
m
did
three
groups
of
experiments,
each
group
2
For
each
article
a
in
W
,
the
article
b
in
W
that
maxi|N
|
1
2
ilarity
⇤ is computed as 1 1X⇤X
2
c
+ score function s(a,
the whole
wiki.
~
language-independent.
end
]. However,
b2N
mizes
the
b
)
and
satisfies
s(a,
b
)
>
0
s(a,
b)
=
!
+
!
~
·
f
wledge
linkingSeman)c 012
there
might
c
Given
two
articles
a
2
W
and
b
2
W
,
let
O
(a)
and
+
Get
the
existing
annotations
Na2outlinks.
in
Feature
5Model
Category
similarity
model
2the
· | 1!2
(I(a))
\
I(b)|
0 Categories
a,ba;are tags attach
+
wiki
bywhich
using
the
CLs
between
categories,
can
be
oba
b which
to mizes
articles,
represent
1of
⇤
⇤
Regression-based
for
Predicting
new
CLs
number
of
seeding
cross-lingual
links.
In The
eac
Feature
2:
Outlink
similarity
ties
between
articles
by
comparing
elements
in
their
O
(b)
be
the
outlinks
in
the
abstracts
and
infoboxes
a
and
f
(a,
b)
=
(c
,
c
)
(9)
where
(·)
is
a
function
to
maps
articles
in
W
to
their
(11)
⇤
the
score
function
s(a,
b
)
and
satisfies
s(a,
b
)
>
0
6
1!2
1
f
(a,
b)
=
(6)
Input:
A
wiki
W
=
(A,
E)
i
j
3Wtags
+
end
ver,
thereespecially
might
articles,
when
the
For
each
article
a
in
,
the
article
b
W
that
maxi+in
+ compared
is
predicted
as
the
corresponding
article
of
a.
The
idea
of
for
each
c
2
C
do
Categories
are
attached
to
articles,
which
represent
the
tures.
nm
tained
from
Wikipedia.
1
2
+
3.1
Concept
Annotation
O
(b)
be
the
outlinks
in
the
abstracts
and
infoboxes
of
a
and
i
a
log(max(|I
|,
|I
|))
log(|I
\
I
|)
topics
of
the
articles’
subjects.
Let
C(a)
and
C(b)
denote
+
periments,
we
also
the
performance
|
(I(a))|
+
|I(b)|
8ai
Relatedness
=
!
+
!
⇥
f
(a,
b)
+
...
+
!
⇥
f
(a,
b)
0
a
b
a
b
2
·
|
(I
(a))
\
I
(b)|
1!2
0
1
1
6
6
1!2
Given
two
articles
a
2
W
and
b
2
W
,
let
O
(a)
and
corresponding
articles
in
W
if
there
are
CLs
between
them.
i=1
j=1
b
respectively.
The
outlink
similarity
is
computed
as
2
1
2
Given
two
articles
a
2
W
and
b
2
W
,
let
O(a)
and
O(b)
⇤
⇤C(b)
The
CL
predictor
in
our
approach
computes
the
weighted
sum
is
predicted
as
the
corresponding
article
of
a.
The
idea
of
when
the In this
r(a,
b) = 1wiki
(3)
return
N
where
ecially
fewer
editors.
cir- +Annotated
1
2
+
+
f
(a,
b)
=
(7)
d
Output:
W
=
(A,
E
)
Find
the
set
of
candidate
articles
T
of
c
;
log(max(|I
|,
|I
|))
log(|I
\
I
|)
4
topics
of
the
articles’
subjects.
Let
C(a)
and
denote
F1-me
mizes
the
score
function
s(a,
b
)
and
satisfies
s(a,
b
)
>
0
c
i
with
the
SVM
classification
model,
which
was
CL
prediction
is
simple
and
straightforward,
but
how
to
apFeature
6
Category
similarity
a b log(|W
b
a
b
i
+
+
categories
of
two
entities,
the
category
similarity
is
computed
+
+
respectively.
The
outlink
similarity
is
computed
as
|)
log(min(|I
|,
|I
|))
a
b
Feature
2:
Outlink
similarity
|and
(I (a))|
+ |I
(b)|buta how
Feature method,
4: is
Inlink
similarity
For
thea)evaluation
of
the
Annotation
we ranr(a,
(3)b⇤ = arg
O outlinks
(b) be the
outlinks
in
theGet
abstracts
and
infoboxes
of
and
ors.
In accurately
this cir- assess
1!2
⇤Concept
annot
theb) = 1 be the
put1 sh
yali
of
a
and
b
respectively.
The
outlink
similarity
CL
prediction
simple
straightforward,
to
apwhich
of
different
similarities
between
articles,
and
applies
threshmax
LP
(b|c
⇥
SR(b,
N
);
ous
knowledge
linking
approaches.
Table
For
each
article
a
in
W
,
the
article
b
in
W
that
maxi2|
(E(c
))
\
E(c
)|
b2T
i
a
Regression M
odel f
or P
redicting N
ew C
Ls log(|W
|)
log(min(|I
|,
|I
|))
+
+
1
2
is
predicted
as
the
corresponding
article
of
a.
The
idea
of
+
+
+
Given
two
articles
a
and
b,
let
C(a)
and
C(b)
be
the
catc
1!2
1
2
propriately
set
the
weights
of
different
similarity
features
is
a
a
b(a))
as
+ O (b)|
i
domly
selected
1000
articles
from
Chinese
Wikipedia.
There
Let
I
(a)
and
I
(b)
denote
two
articles’
inlinks that(10)
are
Given
two
articles
a
2
W
and
b
2
W
,
let
O
(a)
and
ately
assess
the
2
·
|
(O
\
en articles.
Therefore,
our
1
2
(c
,
c
)
=
1!2
b
respectively.
The
outlink
similarity
is
computed
as
⇤
⇤
Feature
5
Category
similarity
mode
S
R(a,c) i
s c
omputed b
ased o
n t
he l
inks i
n t
he I
ntegrated C
oncept N
etwork
⇤
where
I
and
I
are
the
sets
of
inlinks
of
article
a
and
article
1
2
+
imental
results.
a
b
+
+
propriately
set
the
weights
of
different
similarity
features
is
a
for
each
article
a
2
A
do
mizes
the
score
function
s(a,
b
)
and
satisfies
s(a,
b
)
>
0
is
computed
as
N
=
N
[
{b
};
+
old
!
to
decide
whether
an
article
pair
should
have
a
CL.
For
egories
of
aarticles’
and
b respectively.
Category
similarity
comf
(a,
b)
=
(5)
a2 · | 1!2
a
|
(E(c
))|
+
|E(c
)|
CL
prediction
is
simple
and
straightforward,
but
how
to
ap2
·
|
(C(a))
\
C(b)|
0
challenging
problem.
(O
(a))
\
O
(b)|
2
are
12,723
manually
created
inner
links
in
these
selected
arti1!2
1
2
Therefore,
ourto
1!2
from
other
abstracts
and
infoboxes,
the
inlink
simO+ (b)
beenrich
the
outlinks
the
abstracts
and
infoboxes
of
a
and
cept
annotator
the
where
Ia andin
I
are
the
sets
of
inlinks
of
article
a
and
article
+
+
Feature
1: Outlink
similarity
For each ar)cle a in W
, the ar)cle b* in W
that maximizes the following b, brespectively;
and| W1!2
is the
set
of all
articles
in predicted
the input
Categories
are
tags
attached
to
articles,
which
represent
the
tures.
(O
(a))|
+
|O
(b)|
1
2
f
(a,
b)(=
(8)
Extract
a
set
of
concepts
C
in
a;
Get
the
existing
In
each
group
of
the
experiment,
our
re
f
(a,
b)
=
(5)
is
as
the
corresponding
article
of
a.
The
idea
of
5
end
challenging
problem.
a
2
putes
similarities
between
categories
by
using
CLs
between
+
cles.
70%
of
the
existing
inner
links
in
the
articles
were
ranor
to
enrich
the
this
purpose,
a
scoring
function
is
defined:
+
+
ilarity
is
computed
as
propriately
set
the
weights
of
different
similarity
features
is
a
+
+
Here,
we
propose
a
regression-based
model
to
learn
the
b,
respectively;
and
W
is
the
set
of
all
articles
in
the
input
g new CLs.
a|C(b)|
b
b respectively. The outlink
|
(C(a))|
+
of| (O
an1!2
article
correspond
to a(b)|
set
of other
articles
wiki. similarity is computed asOutlinks
topics
of
the
articles’
subjects.
Let
C(a)
and
C(b)
denote
F1-me
2
·
|
(a))
\
O
(b)|
(O
(a))|
+
|O
1!2
where
c
2
C(a),
c
2
C(b),
n
=
|C(a)|,
and
m
=
|C(b)|.
1!2
learning
model
(RM)
always
performed
bet
score func)on s(a,b*) and sa)sfies s(a,b*)>0 is predicted as the i
j
CL
prediction
is
simple
and
straightforward,
but
how
to
apannotations
N
in
a;
for
each
c
2
C
do
for
each
b
2
N
do
domly
chosen
as
the
ground
truth
for
the
evaluation,
which
articles.
Let
E(c
)
and
E(c
)
be
the
set
of
articles
belonging
2
·
|
(O(a))
\
O(b)|
a
i
a
j
a
Feature
3:
Inlink
similarity
Here,
we
propose
a
regression-based
model
to
learn
the
1
2
wiki.
1!2
have been proposed to
annof2 (a, b) that
= it links
(5)
There
challenging
problem.
weights
of
features
based
on
a
set
of
known
CLs.
Given
a
set
to.
The
outlink
similarity
computes
the
similari+
+
+
+
model
in
terms
of
F1-measure;
it
shows
that
f
(a,
b)
=
(4)
The
Semantic
Relatedness
metric
has
been
used
in
several
+
+
Find
the
set
of
candidate
articles
T
of
c
;
propriately
set
the
weights
of
different
similarity
features
is
a
were
removed
from
the
articles
before
the
articles
are
fed
|
(O
(a))|
+
|O
(b)|
oposed
to
anno1
2
·
|
(I
(a))
\
I
(b)|
if
<
a,
b
>
2
/
E
then
Feature
3:
Inlink
similarity
c
i
to
category
c
and
c
respectively,
the
similarity
between
two
Here
(·)
maps
categories
from
one
wiki
to
another
wiki
corresponding a
r)cle o
f a
.
[
1!2
1!2
j by comparing
i set of other
n
nset of known
in Wikipedia Annota)on Mihalcea
andSemantic
1
2
1!2
2 · | Relatedness
(O
(a))
\
O
(b)|
Inlinks
of
an
article
correspond
to
a
articles
Regression-based
Model
for
Predicting
new
CLs
weights
of
features
based
on
a
CLs.
Given
a
set
1!2
ties
between
articles
elements
in
their
outlinks.
where
th
of
CLs
{(a
,
b
)}
as
training
data,
let
A
=
{(a
)}
and
Here,
we
propose
a
regression-based
model
to
learn
the
The
metric
has
been
used
in
several
|
(O(a))|
+
|O(b)|
is
more
suitable
for
the
knowledge
linking
pro
f
(a,
b)
=
(7)
⇤
i
i
i
4
1!2
other
approaches
for
linking
documents
to
Wikipedia.
The
i=1
i=1
[2008;
to the
the other
annotation
algorithm.
After
the annotation,
we
colf2 (a,
b) = Get b = argmax
(5)
Mihalcea
and et al.,
+ (a))| + |I + (b)|
challenging
problem.
LP
(b|c
)
⇥
SR(b,
c
);
E
=
E
[
{<
a,
b
>}
categories
is
computed
as
n
n
Kulkarni
2009;
b2T
i
i
by
using
CLs
between
categories,
which
can
be
obtained
j
|
(I
Inlinks
of
an
article
correspond
to
a
set
of
articles
c
+
+
~
n
1!2 traditional
linking
to
it,
let
I(a)
and
I(b)
denote
inlinks
of
two
articles,
Feature
3:
Inlink
similarity
i (b)|
Given
two
articles
a
2
W
and
b
2
W
,
let
O(a)
and
O(b)
put yi is
The
CL
predictor
in
our
approach
computes
the
weighted
sum
other
approaches
for
linking
documents
to
Wikipedia.
The
of
CLs
{(a
,
b
)}
as
training
data,
let
A
=
{(a
)}
classification
model
SVM.and
|
(O
(a))|
+
|O
1
2
s(a,
b)
=
!
+
!
~
·
f
B
=
{(b
)}
,
our
model
tries
to
optimal
weights
to
ifind
i the
i i=1
1!2
weights
features
based
on
a
set
of
known
CLs.
Given
a
set
inner
link
structures
in
the
wiki
are
used
to
compute
the
Se0
a,b
Criteria
i=1
lected
the
new
discovered
inner
links,
and
computed
the
Prerni
et
al.,
2009;
i
⇤
i=1
end
Here,
we
propose
a
regression-based
model
to
learn
the
ches aim to identifyinner
imporwhere
(·)
is
a
function
to
maps
articles
in
W
to
their
from
Wikipedia.
N
=
N
[
{b
};
(11)
n5 Categorybetween
linking
to
it,
let
I(a)
and
I(b)
denote
inlinks
of
two
articles,
a
a
link
structures
in
the
wiki
are
used
to
compute
the
Se1!2
1
be
the
outlinks
of
a
and
b
respectively.
The
outlink
similarity
n
n
Feature
similarity
model
o
of
different
similarities
articles,
and
applies
a
threshthe
inlink
similarity
is
computed
as
Inlinks
of
an
article
correspond
to
a
set
of
other
articles
Before
concept
annotation
being
perfor
B
=
{(b
)}
,
our
model
tries
to
find
the
optimal
weights
to
mantic
Relatedness.
One
thing
worth
mentioning
is
that
when
cision,
Recall
and
F1-measure
of
the
new
discovered
inner
identify
imporn
m
ensure:
of
CLs
{(a
,
b
)}
as
training
data,
let
A
=
{(a
)}
and
i
Feature
3:
Inlink
similarity
i=1
i=
i!0i=1
if6i=1
ments and link them
to the
+
X
X
weights
of
features
based
on
a
set
of
known
CLs.
Given
a
set
+
!
⇥
f
(a,
b)
+
...
+
!
⇥
(a,
b)
end
1
mantic
Relatedness.
One
thing
worth
mentioning
is
that
when
1
1
6
end
Feature
6
Category
similarity
is
computed
as
Categories
are
tags
attached
to
articles,
which
represent
the
tures.
Th
old
!
to
decide
whether
an
article
pair
should
have
a
CL.
For
corresponding
articles
in
W
if
there
are
CLs
between
them.
the
inlink
similarity
is
computed
as
a approach
b
n 0n
measure
of
our
does
not
always
incre
there
are
insufficient
inner
links,
this
metric
might
be
not
aclinks
based
on
the
ground
truth
links.
2
nk
them
to
the
linking
to
it,
let
I(a)
and
I(b)
denote
inlinks
of
two
articles,
n
ensure:
Inlinks
of
an
article
correspond
to
a
set
of
other
articles
f6 (a,
b) =
(ci , cweights
)
(9)
dia. The concept annotator
B
=
{(b
)}
,
our
model
tries
to
find
the
optimal
to
i
of
CLs
{(a
,
b
)}
as
training
data,
let
A
=
{(a
)}
and
j
i=1
there
are
insufficient
inner
links,
this
metric
might
be
not
aci
i
i
I
ntegrated C
oncept N
etwork
end
+
ZH
i=1
i=1
topics
of the
articles’
subjects.
Let
C(a)
and C(b)
denote
F1-meas
EN
for
each
b
2
N
do
this
purpose,
a
scoring
function
is
defined:
2
·
|
(I(a))
\
I(b)|
seed
CLs
are
used.
However,
the
F1-measure
Given
two
articles
a
and
b,
let
C(a)
and
C(b)
be
the
catcurate
enough
to
reflect
the
relatedness
between
articles.
To
ncept
annotator
nm
j
a
1!2
One
advantage
of
our
concept
annotation
method
is
to
Feature
2:
Outlink
similarity
the
inlink
similarity
is
computed
as
n
ilar function
as to
existing
linking
it,curate
letapI(a)
andtoI(b)
denote
inlinks of
two return
articles,
A regression model is used to learn the weights of features based on a set of i=1 j=1 ⇤
ensure:
enough
reflect
the
relatedness
between
articles.
To
0
0
2
·
|
(O(a))
\
O(b)|
B
=
{(b
)}
,
our
model
tries
to
find
the
optimal
weights
to
f
(a,
b)
=
(6)
+
1!2
i
N
3
For
each
article
a
in
W
,
the
article
W
i=1
ously as bthe in
number
ofthat
seed maxiCLs increases aft
d CLs
as existinglinks
ap- from an ex- ifthis
end,
our
approach
first
utilizes
the
known
to
merge
<
a,
b
>
2
/
E
then
1
2
+
leverage
the
cross-lingual
information
to
enhance
the
conegories
of
a
and
b
respectively.
Category
similarity
com2
·
|
(I(a))
\
I(b)|
j
1!2
f
(a,
b)
=
(4)
中国
scovering
|
(I(a))|
+
|I(b)|
8a
2
A,
8b
2
(B
{b
}),
s(a
,
b
)
s(a
,
b
)
>
0
1 ensure:
the
inlink
similarity
is
computed
as
Given
two
articles
a
2
W
and
b
2
W
,
let
O
(a)
and
China
1!2
i
i
i
i
i
this
end,
our
approach
first
utilizes
the
known
CLs
to
merge
1
2
⇤
⇤
known C
Ls. T
he o
p)mal w
eights s
hould s
a)sfies:
中国
0
0
f
(a,
b)
=
(6)
notation;
there
is
a
24.5%
increment
of the F1|
(O(a))|
+
|O(b)|
nks
from
an
exthe
link
structures
of
two
wikis,
which
results
in
a
Integrated
3
cept
annotation.
Therefore,
we
did
experiments
with
differ1!2
E
=
E
[
{<
a,
b
>}
China
mizes
the
score
function
s(a,
b
)
and
satisfies
s(a,
b
)
>
0
2|
(E(c
))
\
E(c
)|
putes
similarities
between
categories
by
using
LCs
between
清华大学
j
ept annotator focusestheonlink
en-structures
+
1!2
1 ,b )
2s(a , b ) > 0
2
·
|
(I(a))
\
I(b)|
~
1!2
of
two
wikis,
which
results
in
a
Integrated
|
(I(a))|
+
|I(b)|
8a
2
A,
8b
2
(B
{b
}),
s(a
+
Tsinghua s(a,
b)
=
!
+
!
~
·
f
O
(b)
be
the
outlinks
in
the
abstracts
and
infoboxes
of
a
and
1!2
i
i
i
i
i increases
the
number
of
seed
CLs
from
0.05
m
Tsinghua 0 1 , c2 )Figure
a,b
focuses
on
en(c
=
(10)
Concept
Network.
The
Semantic
Relatedness
is
computed
Feature
4:
Inlink
similarity
ent
number
of
CLs
used
in
the
concept
annotation.
0
0
f
(a,
b)
=
(6)
end2 · | The
3etworks
where
to mapswhich
articles
in
W
their
i. Therefore, there
will be Network.
(11)
Integrate n
(I(a))
\
I(b)|
articles.
Let
E(c
)
and
E(c
)
be
the
set
of
articles
belonging
University
1!2 (·) is a function
1 to
is
predicted
as
the
corresponding
article
of
a.
The
idea
of
also
means
Concept
Semantic
Relatedness
is
computed
1!2
1
2
University
|
(E(c
))|
+
|E(c
)|
清华大学
+
Table
1:
Results of re
C
1!2
1
2
+
+
million.
This
is
might
because
the
equivalent
, there will be
|
(I(a))|
+
|I(b)|
8a
2
A,
8b
2
(B
{b
}),
s(a
,
b
)
s(a
,
b
)
>
0
+
based
on
the
links
in
the
Integrated
Concept
Network,
which
3
shows
the
experimental
results
of
concept
annotation.
For
Feature
1:
Outlink
similarity
0
0
1!2
i
i
i
i
i
f
(a,
b)
=
(6)
=
!
+
!
⇥
f
(a,
b)
+
...
+
!
⇥
f
(a,
b)
b
respectively.
The
outlink
similarity
is
computed
as
Let
Ilinks
(a)
and
I (b)Feature
denote
two
articles’
inlinksinthat
are
0 between
1
1
6
6
4:
Inlink
similarity
s before annotating.based
In3 our
corresponding
articles
W
if
there
are
CLs
between
them.
on
the
in
the
Integrated
Concept
Network,
which
2
end
to
category
c
and
c
respectively,
the
similarity
two
Before
An
1 i }), of
2 experiments,
| 1!2 (I(a))|
+as|I(b)|
8a
2
A,
8b
2
(B
{b
s(a
,
b
)
s(a
,
b
)
>
0
CL
prediction
is
simple
and
straightforward,
but
how
to
apwhich
also
means
CLs
cannot
effectively
be
propagated
to
oth
otating. In our
i
i
i
i
is
formally
defined
follows.
three
groups
we
used
0,
0.1million
and
0.2
a
b
+
+
Outlinks
ofthe
andenote
articlesimilarity
correspond
to ainlinks
set of other
articles
+
+ infoboxes,
+2:
annotator areBeijing
two wikis
and other
where c~i 2 C(a), cj 2 C(b), when
n=
|C(a)|,
and
m
= links
|C(b)|.
from
articles’
abstracts
and
inlink
simFeature
Outlink
is
formally
defined
as
follows.
Feature
4:
Inlink
similarity
Let
I
(a)
and
I
(b)
two
articles’
that
are
北京
⇤
#Seed
CLs
Model
Precision
Reca
categories
is
computed
as
~
there
are
few
inner
in
wikis.
There
end
two wikisFeature
and
北京
+
million
CLs,
respectively.
As
shown
in
Figure
3,
our
apFor
each
article
a
in
W
,
the
article
b
in
W
that
maxiwhich
also
means
+
+
!
~
·
(
f
f
)
>
0
propriately
set
the
weights
of
different
similarity
features
is
a
1
2
+ similarithat
it links
to.
The
outlink
similarity
computes
the
ai ,bi
ai ,b‘
Inlink
similarity
concept annotation4:
process
+
+Network.
+
Definition
3
Integrated
Concept
Given
two
wikis
Given
two
articles
a
2
W
and
b
2
W
,
let
O
(a)
and
Beijing
2
·
|
(O
(a))
\
O
(b)|
ilarity
is
computed
as
1
2
1!2
⇤
⇤
which
also
means
0.05
Mil.
CLsiss(a,
SVM
92.1%
35.0%
Let
Ifrom
(a)
and IGiven
(b)
denote
two
articles’
inlinks
that
are
the
inner
links
important
for
knowledge
li
other
articles’
abstracts
and
infoboxes,
the
inlink
simotation process + Definition
3
Integrated
Concept
Network.
two
wikis
proach
can
obtain
82.1%
precision
and
70.5%
recall
when
no
return
N
+
mizes
the
score
function
s(a,
b
)
and
satisfies
b
)
>
0
d
+
Regression-based
Model
for
Predicting
new
CLs
~
~
n X
m problem.
f
(a,
b)
=
(5)
ties
between
articles
byinsets
comparing
elements
in theirof
outlinks.
conceptand
extraction,
(2)I (b)
Let(2)I (a)and
and
denote
two
articles’
inlinks
that
are
challenging
W
=
(A
,
E
),
W
=
(A
,
E
),
where
A
and
A
are
!
~
·
(
f
f
)
>
0
2
ni ,b
O
(b)
be
the
outlinks
the
abstracts
and
infoboxes
a
and
+
1
1
1
2
2
2
1
2
X
a
,b
a
RM
93.3%of CL
36.0%
effectively
improve
the
performance
pre
i
i
action,
‘
+
+
Therefore, a training data set is generated to feed the linear regression W
=
(A
,
E
),
W
=
(A
,
E
),
where
A
and
A
are
sets
1
Therefore,
we
generate
a
new
dataset
D
=
{(x
,
y
)}
,
CLs
are
used.
The
precision
and
recall
increase
as
the
numfrom
other
articles’
abstracts
and
infoboxes,
the
inlink
similarity
is
computed
as
1
1
1
2
2
2
1
2
i
i
is
predicted
as
the
corresponding
article
of
a.
The
idea
of
+
|
(O
(a))|
+
|O
(b)|
a
b
i=1
Forbidden City
+
~
~
+ 1!2
+The
Given
two
articles
a
2
W
and
b
2
W
,
let
O(a)
and
O(b)
from other
articles’
abstracts
and
infoboxes,
the
inlink
simof
articles,
E
and
E
are
sets
of
links
in
W
and
W
,
respecThe
CL
predictor
in
our
approach
computes
the weighted
sum
1
2
b
respectively.
outlink
similarity
is
computed
as
1
2
1
2
f
(a,
b)
=
(c
,
c
)
(9)
!
~
·
(
f
f
)
>
0
0.10
Mil.
CLs
SVM
79.7%
35.0%
·
|
(I
(a))
\
I
(b)|
6
a
,b
a
,b
Here,
we
propose
a
regression-based
model
to
learn
the
~
~
i
j
of articles,故宫
E1 and E2 ilarity
are sets is
of2links
in
W
and
W
,
respecber
of
used
CLs
increases.
When
0.2
million
CLs
are
used,
i
i
i
‘
1!2
n
1 as
2
~
~
!
~
·
(
f
f
)
>
0
CL
prediction
is
simple
and
straightforward,
but
how
to
apk
computed
algorithm t
o l
earn t
he w
eights:
ai ,bioutlink
ai ,bsimilarity
possible
concepts
that as tively.f郭金龙
nm
‘
Jinlong G2uo
where
the
input
vector
x
=
(
f
f
),
the
target
outTherefore,
we
generate
a
new
dataset
D
=
{(x
,
y
)}
,
Forbidden City
Let
CL
=
{(a
,
b
)|a
2
B
,
b
2
B
}
be
the
set
of
be
the
outlinks
of
a
and
b
respectively.
The
(a,
b)
=
(7)
i
a
,b
a
,b
k
i
i
ilarity
computed
of
different
similarities
between
articles,
and
applies
a
threshi
i
i
1
i
i
i
i
j6
=
i
4
i=1
RM
84.6%
37.4%
i=1
estep,
concepts
thatis
Feature
3:
Inlink
similarity
tively.
Let
CL
=
{(a
,
b
)|a
2
B
,
b
2
B
}
be
the
set
of
i=1
j=1
+
+
the
precision
increased
by
3.5%
and
the
recall
was
increased
+
+
i
i
i
1
i
2
i=1
n
weights
of
features
based
on
a
set
of
known
CLs.
Given
a
set
+
+
propriately
set
the
weights
of
different
similarity
features
is
a
Jinlong Guo
3.3
Incrementally
Knowledge
Linking
|
(I
(a))|
+
|I
(b)|
2
·
|
(I
(a))
\
I
(b)|
郭金龙
earesame
wiki
are
identified
in
Feature
Definition
n
1!2
故宫
1!2
cross-lingual
links
between
W
and
W
,
where
B
✓
A
and
2
·
|
(O
(a))
\
O
(b)|
Therefore,
we
generate
a
new
dataset
D
=
{(x
,
y
)}
,
put
y
is
always
set
to
1.
Then
we
train
a
linear
regression
is
computed
as
~
~
1
2
1
1
1!2
old
!the
toi=1
decide
whether
an
article
pair
ayn
CL.
For
i have
iSVM
identified in
0.15
Mil.should
CLs
80.9%
35.9%
i new
Therefore,
we
generate
a
dataset
D
=
{(x
,
y
,
i=1
0)}
cross-lingual links Inlinks
between+Wof
W+article
B
✓
A
and
by
3.2%.
According
the
results,
a
large
part
of
missing
inner
where
input
vector
x
=
(
f
f
),
the
target
outi
i
where
+
+
1 and
2 , where
1
1
=
1
Cross-li
n
f
(a,
b)
=
(7)
i
a
,b
a
,b
an
correspond
to
a
set
of
other
articles
i
f
(a,
b)
=
(5)
i i
i j6=i
4 Network
challenging
problem.
2 W(I
e extracted
allFeature
2
·
|
(a))
\
I
(b)|
of
CLs
{(a
,
b
)}
as
training
data,
let
A
=
{(a
)}
and
B
✓
A
.
The
Integrated
Concept
of
and
W
is
+
+
+
+
1!2
2
·
|
(I
(a))
\
I
(b)|
Six
features
are
defined
to
assess
the
similarities
between
ar2
2
1
2
i
i
i
by
matching by
all matching
~
~
RM
93.5%
38.2%
1!2
this
purpose,
a
scoring
function
is
defined:
5
Category
similarity
model
on
the
dataset
D
to
get
the
weights
of
different
feai=1
i=1
|
(O
(a))|
+
|O
(b)|
B
✓
A
.
The
Integrated
Concept
Network
of
W
and
W
is
|
(I
(a))|
+
|I
(b)|
~
~
links
can
be
discovered
with
an
acceptable
precision
by
our
2
2
1
2
1!2
where
the
input
vector
x
=
(
f
f
),
the
target
out1!2
2|
(E(c
))
\
E(c
)|
f
(a,
b)
=
(7)
where
the
input
vector
x
=
(
f
f
),
the
target
output
y
is
always
set
to
1.
Then
we
train
a
linear
regression
i
a
,b
a
,b
1!2
1
2
At
last,
we
evaluated
our
approach
as
a
who
important
resour
f
(a,
b)
=
(7)
i i
i j6=i
i
a
,b
a
,b
4
i
with
the
elements
in
a
conn
Here,
we
propose
a
regression-based
model
to
learn
the
i
i
i
j6
=
i
4
linking
to
it,
let
I(a)
and
I(b)
denote
inlinks
of
two
articles,
ICN
(W
,
W
)
=
(V,
L),
where
V
=
A
[
A
;
L
is
a
set
of
2
·
|
(O(a))
\
O(b)|
+
+
1L),
2where
2 of(a))|
+ (a))|
+of
ments in a con- ticles
1!2
0.20
Mil.
CLs
SVM
84.7%
37.3%
(c
,
c
)
=
(10)
by
kinds
all
these
B
=
{(b
)}
,
our
model
tries
to
find
the
optimal
weights
to
ICN (W
W2| ) 1!2
= different
(V,
V|I=
A1information,
[
Aarticles,
; L |is 1!2
a1 set
of
concept
annotation
method;
if
there
are
available
CLs,
they
(I
+
|I
(b)|
1
2
1 , using
2 Feature
(I
+
(b)|
Categories
are
tags
attached
to
which
represent
the
tures.
The
threshold
!
is
set
to
the
value
that
maximizes
the
i
i=1
3:f1Inlink
proposed
approach
aims
toadiscover
CLs with
y(4)
is0model
always
set
to2dataset
1.
Then
we
train
aknown
linear
regression
(a,
=
putb)ysimilarity
train
ai linear
regression
ry titles
contains
and
the
i is always set to 1. Then we put
Feature
5
Category
similarity
on
the
D
to
get
the
weights
of
different
feaweights
of
features
based
on
a
set
of
CLs.
Given
set
different
languag
|
(E(c
))|
+
|E(c
)|
links
which
are
established
as
follows:
1!2
1
he
andthe
thetitlesfeatures
RM
94.5% 37.9%
linkstopics
which
are
established
as
follows:
the
inlink
similarity
is
computed
as
can
be
used
to
improve
the
quality
of
the
discovered
inner
|
(O(a))|
+
|O(b)|
are
based
on
the
link
structures
and
therefore
are
0
0
0
1!2
ensure:
n
n
of
the
subjects.
Let
C(a)
and
C(b)on
denote
on
the training
CLs.
seeding
links,
we
did
not
use
all the availabl
Inlinks
of attached
anmodel
article
correspond
toF1-measure
a set
of
other
articles
0 articles’
0 Category
Feature
5 Category
similarity
the
dataset
D
to
get
the
weights
of
different
feaniki.
the For
inputan
wiki.
For
an0 inFeature
5
similarity
model
on
the
dataset
D
to
get
the
weights
of
different
feaof
CLs
{(a
,
b
)}
as
training
data,
let
A
=
{(a
)}
and
~
i
i
i
inCategories
are
tags
to
articles,
which
represent
the
tures.
The
threshold
!
is
set
to
the
value
that
maximizes
the
(v,
v
)
2
E
_
(v,
v
)
2
E
!
(v,
v
)
2
L
articles
in
Wiki
i=1
i=1 the proposed 1
2
s(a,
b)
=
!
+
!
~
·
f
0
links.
(v,
v
)
2
E
_
(v,
v
)
2
E
!
(v,
v
)
2
L
0
a,b
a
b
The datasets of English Wikipedia (4 million ar)cles) and Chinese Wikipedia (499 thousand ar)cles) that are archived in August 2012 has been used to evaluate 1
2
language-independent.
links;
instead,
we
randomly
select
50
thousand
n m = |C(b)|.
0
0
0
0 function
linking
to
it,
let
I(a)
and
I(b)
denote
inlinks
of
two
articles,
where
c
2
C(a),
c
2
C(b),
n
=
|C(a)|
and
where
(·)
is
a
to
maps
articles
in
W
to
their
ept
extraction
contain
a
set
(11)
Categories
are
tags
attached
to
articles,
which
represent
the
tures.
The
threshold
!
is
set
to
the
value
that
maximizes
the
0
0
0
0
1!2
1
B
=
{(b
)}
,
our
model
tries
to
find
the
optimal
weights
to
i
j
0
n contain a set
Categories
are
tags
attached
to
articles,
which
represent
the
tures.
The
threshold
!
is
set
to
the
value
that
maximizes
the
i i=1
0
(v,
v
)
2
E
^
(v,
b)
2
CL
^
(v
,
b
)
2
CL
!
(b,
b
)
2
L
topics
of
the
articles’
subjects.
Let
C(a)
and
C(b)
denote
F1-measure
on
the
training
CLs.
links
to
certain
l
1
(v,
v
)
2
E
^
(v,
b)
2
CL
^
(v
,
b
)
2
CL
!
(b,
b
)
2
L
2
·
|
(I(a))
\
I(b)|
seed
links.
We
also
randomly
removed
70%
1
=
!
+
!
⇥
f
(a,
b)
+
...
+
!
⇥
f
(a,
b)
Feature
1:
Outlink
similarity
approach. T
here a
re 2
39,309 c
ross-‐lingual l
inks b
etween t
wo w
ikis.
1!2
0
1
1
6
6
the
inlink
similarity
is
computed
as
he
sets topics
of
candidate
0
0
0
0
corresponding
articles
in W2on
if the
there
aredenote
CLs
between
them.
of thearticles
articles’
subjects.
Let
C(a)
denote
F1-measure
training
CLs.
ndidate
articles
ensure:
0
0
0 and C(b)
0
topics
of
the
articles’
subjects.
Let
C(a)
and
C(b)
F1-measure
on
the
160,000"
0 training CLs.
0 (for
f
(a,
b)
=
(6)
100%)
(v,
v
)
2
E
^
(v,
a)
2
CL
^
(a
,
v
)
2
CL
!
(a,
a
)
2
L
the
Chinese
Wikipedia
both
articles
havin
lem,
several
kn
3
2
(v,
v
)
2
E
^
(v,
a)
2
CL
^
(a
,
v
)
2
CL
!
(a,
a
)
2
L
+
Regression-based
Learning
Model
for
Predicting
CLs
2
ept
might
link
to.
Outlinks
of
an
article
is
a
set
of
other
articles
that
it
links
to.
Feature
2:
Outlink+ similarity
nk to.
⇤s(a
|
(I(a))|
|I(b)|
8a
2
A,
8b
2
(B
{b
}),
s(a
,
b
)
,
b
)
>
0
90%)
1!2
i
i
i
i
i
140,000"
links
and
having
no
cross-lingual
links). We fed
For
each
article
a
in
W
,
the
article
b
in
W
that
maxi R
esults o
f c
ross-‐lingual l
ink p
rediction
2
·
|
(I(a))
\
I(b)|
Table
1:
Results
of
cross-lingual
link
prediction
(%).
1
2 been proposed
+
1!2
to
nfrom
step,the
links
from The
theInconIn
the
Integrated
Concept
Network,
the
link
structures
of
conoutlink
similarity
computes
the
similarities
between
artithe Integrated Concept Network, the linkGiven
structures
twofof3articles
a
2
W
and
b
2
W
,
let
O
(a)
and
The
CL
predictor
in
our
approach
computes
the
weighted
sum
80%)
1
2
0
0
(a, b)
=
(6)
⇤ two wikis to our approach,
⇤
and
let0 it run itera
+
mizes
the
score
function
s(a,
b
)
and
satisfies
s(a,
b
)
>
120,000"
Before
Annotation
After
Annotation
+
are
identified.
we wikis
use
two
wikis
are
merged
by
using
CLs.
The
Semantic
Related70%)
|
(I(a))|
+
|I(b)|
8a
2
A,
8b
2
(B
{b
}),
s(a
,
b
)
s(a
,
b
)
>
0
. Here
we useHerecles
Feature
4:
Inlink
similarity
tween
wikis.
W
two
are
merged
by
using
CLs.
The
Semantic
Related1!2
i and apply a threshi
i i
i
by comparing elements in their outlinks.
Given
two arti-in the
O (b) be
the outlinks
abstracts
andsimilarities
infoboxes of
a
and
of
different
between
articles,
mentally
find of
newa.CLs.
Figure
4ofshows the n
which
also
means
is
predicted
as
the
corresponding
article
The
idea
#Seed
CLs RelatedModel
Precision
Reccall
F1-measure
Precision
Recall
F1-measure
60%)
y and the
Semantic
100,000"
+
+
+
mantic
Relatedness
computed
based
on
Integrated
Concept
Network
is
more
ness a
computed
based
on
Integrated
Concept
Network
is more
+
wikis
are
sparse
b respectively.
The
outlink
similarity
is computed
as an article pair should have a CL. For discovered CLs in each run.
cles
2
W
and
b
2
W
,
let
O(a)
and
O(b)
be
the
outlinks
Let
I
(a)
and
I
(b)
denote
two
articles’
inlinks
that
are
Feature
4:
Inlink
similarity
old
!
to
decide
whether
1
2
Blue
bars
corres
0
50%)
0.05
Mil. CLs
SVM
92.1
35.0
50.7
78.5
37.2
50.5
which
also
means
for
The
Link
CL
prediction
is
simple
and
straightforward,
but
how
to
apcept.each
Theconcept.
Link
reliable
than
on
a
single
wiki.
In
order
to
balance
the
Link
reliable than on a single wiki. In order to balance
the
80,000"
+ Link
+
+
Let
I
(a)
and
I
(b)
denote
two
articles’
inlinks
that
are
of
a
and
b
respectively.
The
outlink
similarity
is
computed
as
results
without
concept
annotation,
and
red
ba
40%)
lingual
links
are
this
purpose,
a
scoring
function
is
defined:
from
other
articles’
abstracts
and
infoboxes,
the
inlink
simRM
93.3
36.0
52.0
92.4
38.6
54.5
thatis an
is
the
nilities
article
thearticle
+
+
Probability
and the
Semantic Relatedness,
propriately!
set· (the
weightsf~
of different
similarity features is a
Probability and
the Semantic
Relatedness,
our approachour
usesapproach2uses
~
+
60,000"
·
|
(O
(a))
\
O
(b)|
30%)
1!2
~
f
)
>
0
sults
when the concept annotation
was perform
from
other
articles’
abstracts
and
infoboxes,
the
inlink
sima
,b
a
,b
all
the
possible
cr
i
i
i
‘
Mil. CLsan based
SVM
79.7
35.0
48.6
86.9
50.4
63.8
which
is0.10
approximated
~
~
oximated
based
f
(a,
b)
=
(5)
an
algorithm
that
greedily
selects
the
articles
that
maximize
algorithm that
greedily
the (O(a))
articles
that
maximize
ilarity
is 2selects
computed
as ilarity
challenging
problem.
2
!
~
·
(
f
f
)
>
0
20%)
· | 1!2
\ O(b)|
ai ,bi
ai ,b‘
+ (a))| + |O + (b)|
40,000"
~
is
computed
as
|
(O
According to the results,
new
CLs can be
1!2
s(a,
~10%)· fa,b
RM of
84.6
37.4
65.3b) = !0 + !
nsThe
in aSemantic
wiki. The Semantic
f1the
(a,
b) metrics.
=of the
(4)
the product
two
Algorithm
outlines the96.6
the
product
two
metrics.151.9
Algorithm
1 aloutlines49.3
the althis
problem,
w
n
Here,
we
propose
a
regression-based
model
to
learn
the
n
(11)
Therefore,
we
generate
a
new
dataset
D
=
{(x
,
y
)}
,
|
(O(a))|
+
|O(b)|
20,000"
i
i
each
run
no
matter
the
concept
annotation
wa
Therefore,
we
generate
a
new
dataset
D
=
{(x
,
y
)}
,
i=1
1!2
es candidate
desbetween
candidate
desi
i
0.15the
Mil.
CLsgorithm
SVM
80.9
35.9
49.7
88.1
57.3
69.5
0%)
in the
conceptinannotator.
gorithm
the concept annotator.2Feature
+
+
i=1
3:
Inlink
similarity
+
+
concept
annotati
=
!0(b)|
+ !1 ⇥ fPrecision)
...
+ !6of⇥
f6 (a, b)based~ on~ anot.
features
set ~
of
known
CLs.
Given
a
set
Recall)
F1/measure)
·
|
(I
(a))
\
I
(b)|
1 (a, b) + weights
2
·
|
(I
(a))
\
I
1!2
1!2
However,
there
were
more
CLs
discove
~
of
concept.
0"
ingthe
context
of the concept.
RM
93.5
38.2
54.2
93.7
56.2
70.2
where
the
input
vector
x
=
(
f
f
),
the
target
outn
n
input
vector
xi=1
fannotation
),was
the
outInlinks
of
an b)
article
to a set+ (7)
ofNo)CLs))
other where
articles
f4 (a,
=W1 correspond
(7) the of
82.1%)
70.5%)
75.9%)
i(f
,b
1"j6j6=
2"{(atarget
3"
4"
f
(a,
b)
=
i =as
ai ,baicept
aia,b
i ,bi data,
ilet
=A
ii =
CLs
{(a
,
b
)}
training
)}
and
where
(·)
is
a
function
that
maps
a
sets
of
articles
in
4
edge
linking.
O
+
i
i
i
1!2
done.
In
the
fourth
run,
m
i=1
+
+
2.3
Cross-lingual
Link
Prediction
lows.
|
(I
(a))|
+
|I
(b)|
2.3
Cross-lingual
Link
Prediction
efined0.20
as follows.
Mil. CLs
SVM
84.7
37.3
51.8
88.8
68.1
77.1
1!2
⇤
0.1)Mil.)CLs)
83.6%)
71.4%)
77.0%)
No"Annota,on"
34,876"
57,128"
76,932"
67,440"
|
(I
(a))|
+
|I
(b)|
put
y
is
always
set
to
1.
Then
we
train
a
linear
regression
nW
i {(b
linking
to
it,
let
I(a)
and
I(b)
denote
inlinks
of
two
articles,
1!2
For
each
article
a
in
W
,
the
article
b
in
that
maxiput
y
is
always
set
to
1.
Then
we
train
a
linear
regression
B
=
)}
,
our
model
tries
to
find
the
optimal
weights
to
1
2
to
their
corresponding
articles
in
W
if
there
are
CLs
between
i
discovered
than
in
the
third
run
when
c
i
wiki
article
with
2
i=1
Annota,on"
0.2)Mil.)CLs)
45,002"
78,334"
132,210"
152,223"using
85.6%)
73.7%)
79.2%)
RM
94.5
37.9
54.1
95.9
67.2
79.0
As
shown
in
Figure
2,
the
CL
predictor
takes
two
wikis
and
a
⇤
⇤
Feature
5
Category
similarity
model
on
the
dataset
D
to
get
the
weights
of
different
feaAs
shown
in
Figure
2,
the
CL
predictor
takes
two
wikis
and
a
ept c identified
thesimilarity
inlink similarity is computed
as the score functionmodel
Given
a concept c identified
mizes
s(a,
b
)
and
satisfies
s(a,
b
)
>
0
is
ensure:
tion;
but less CLs
were cles,
found feain
the
fourth
r
them.
Feature
5
Category
on
the
dataset
D
to
get
the
weights
of
different
so
that
new
set
of
seed
CLs
between
them
as
inputs,
then
it
predicts
a
set
seed CLs
between
them as inputs,
then itare
predicts
a set
oncept
c rtoegression anCategories
tags attached
to articles, which represent
the
tures.
The threshold !0 is set to
the
value
that
maximizes
the
Our odel is cof
ompared SVM classifica)on model.
y from
this
concept
cmto
an-(RM) set
+with R
esults o
f c
oncept a
nnotation
R
esults o
f i
ncremental l
inking
third
run
without
concept
annotation.
In sum
predicted
as
the
corresponding
article
of
a.
The
idea
of
CL
Feature
2:
Outlink
similarity
of
new
CLs.
This
subsection
first
introduces
the
definitions
of
Categories
are
tags
attached
to
articles,
which
represent
the
tures.
The
threshold
!
is
set
to
the
value
that
maximizes
the
cross-lingua
of new CLs. This subsection firsttopics
introduces
thearticles’
definitions
FigureCLs.
4: Results of Incrementally new
Knowledge
Linking
ted conditional
Results of Concept
Annotation
0 training
of the
subjects.
Let
C(a)
andFigure
C(b)3:denote
F1-measure
on the
2of· |
(I(a))
\ I(b)|
Zhichun Wang , Juanzi Li , Jie Tang
Beijing Normal University, Tsinghua University Mo)va)on Proposed Framework
?
Annota)ng Concepts Within Each Wiki Predic)ng New Cross-‐lingual Links Between Wikis
wiki ar)cle
Boosting
Number'of'discovered'CLs
Experiments

Poster

Transcription

Similar documents

Manual Capristo Lambda Simulator Control Suite (basic version)

Song PredicZon System

Defense Presentation

Modern Chinese Literature, Area Studies, and

Welcome New Students!

September 2014 - Advocates International

Spring 2007 - Chican@ and Latin@ Studies

slides - Polo Club of Data Science

The Development and Evaluation of a New Community

The World of Music: SDP layout of high