Co-authorship Database Development Using Google Scholar Data
COLLECTING AND MAPPING FACULTY COLLABORATION DATA
Mingzhu Zhu, Dr. Brook Wu, Dr. Nancy Steffen-Fluhr, Dr. S. Roxanne Hiltz, Dr. Katia Passerini, Dr. Anatoliy Gruzd, Regina Collins

Data collection from Scopus
- Scopus author search: first name, last name
- Affiliation filtering: common names
- Pub list fetcher: extract title, year, source title…
- Pub detail page fetcher: extract bibliographic data, mapping authors to their affiliation information

Data collection from Google Scholar
A list of NJIT faculty names was used to run Google Scholar author searches, and 63,937 raw search hits were collected. For the co-authorship database, only publications with at least two NJIT authors qualify for inclusion.

Fig-2. Display of publications in Google Scholar and ACM

Merge three co-authorship databases
- Inputs: co-authorship DB from Scopus, co-authorship DB from Google Scholar, co-authorship DB manually created 2 years ago
- Steps: duplicate removal, publication merge, author-publication mapping
- Output: co-authorship database

Co-authorship data
- Between NJIT faculty members
- Between NJIT faculty and graduate students
- Between NJIT faculty and external authors
Study period: publications from 2000-2010

Co-authorship database development using Scopus
An overview of co-authorship database development using Scopus is illustrated in Fig-1. A list of NJIT faculty names was used to run Scopus author searches. Fortunately, no NJIT faculty members have common names. We use affiliation information to filter out authors from other organizations who share a name with an NJIT faculty member. The co-authorship database was developed by extracting the bibliographic data for each faculty member.

Fig-1. Overview of co-authorship database development from Scopus
[Flowchart labels: NJIT Faculty Name List; for each faculty name, do author search (Scopus); author list; pub list; pub detail; bibliographic data extractor; affiliation filtering; duplicate removal; author affiliation mapping; raw data process; publication database; coauthor database]

Co-authorship database development using Google Scholar
An overview of co-authorship database development using Google Scholar is illustrated in Fig-3.

Fig-3. Overview of co-authorship database development
[Flowchart labels: NJIT Faculty Name List; (1) for each faculty name, do author search (Google Scholar); (2) co-authored publication extraction; duplicate removal; candidate coauthored publication {Title, URLs}; (3) page fetcher: downloads pubs from different DLs (Sciencedirect, portal.acm.org, ……); publication database {Title, URLs}; (4) affiliation verification; coauthored publication database {Title, authors}]

Co-authored publication extraction: If a paper is co-authored by two NJIT faculty members, it should appear in both of their publication lists. For instance, suppose there are two authors A and B with publication lists $P_a$ and $P_b$:
Publication list for author A: $P_a = \{p^{(a)}_1, p^{(a)}_2, p^{(a)}_3, \ldots, p^{(a)}_n\}$
Publication list for author B: $P_b = \{p^{(b)}_1, p^{(b)}_2, p^{(b)}_3, \ldots, p^{(b)}_m\}$
If a publication $p$ is co-authored by A and B, then $p \in P_a$ and $p \in P_b$.
2,043 co-authored publication candidates were identified. After removing duplicates, 1,914 publications remained.

Co-authorship network visualization
Fig-4. Co-authorship network created using the keyword “Jian”
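
The co-authored publication extraction rule (a publication p is a candidate when p ∈ Pa and p ∈ Pb) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the poster's actual implementation: the function names, the sample data, and the title normalization used to match records are all hypothetical, and the poster does not say how titles were compared or how duplicates were detected.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase a title and collapse punctuation/whitespace runs to single spaces.

    Assumption: two records refer to the same publication when their
    normalized titles are equal. The poster does not specify its matching rule.
    """
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def shared_publications(p_a, p_b):
    """Return the normalized titles found in both lists (Pa intersect Pb)."""
    return {normalize_title(t) for t in p_a} & {normalize_title(t) for t in p_b}


def coauthored_candidates(pub_lists):
    """Map each normalized title to the sorted authors whose lists contain it.

    Only titles appearing in at least two authors' lists are kept, mirroring
    the rule that a publication needs at least two NJIT authors. Keying on the
    normalized title also merges duplicate records of the same paper.
    """
    authors_by_title = {}
    for author, titles in pub_lists.items():
        for title in titles:
            key = normalize_title(title)
            authors_by_title.setdefault(key, set()).add(author)
    return {
        title: sorted(authors)
        for title, authors in authors_by_title.items()
        if len(authors) >= 2
    }


if __name__ == "__main__":
    # Hypothetical publication lists, one per faculty name.
    sample = {
        "A": ["Network Analysis of Co-authorship", "Solo Paper"],
        "B": ["Network analysis of co-authorship.", "Another Paper"],
        "C": ["Another paper"],
    }
    for title, authors in coauthored_candidates(sample).items():
        print(title, authors)
```

Grouping all lists by title in one pass avoids comparing every pair of authors separately; `shared_publications` is kept only to show the two-author case exactly as written in the text.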