Yizhou Yan
Phone: +(86) 138 4080 9132
Email: yizhouyan9132@outlook.com
Personal Info
Name: Yizhou Yan
Address: Room 514, Dormitory Building 6, School of Software, Dalian University of Technology (DUT), Road No.8,
Development Zone, Dalian 116621, China
Date of Birth: March 2, 1991
Phone: +(86) 138 4080 9132
Education
2013.09-present Master in Software Engineering, School of Software, Dalian University of Technology, Dalian, China.
- Rank: 2/51;
- GPA: 3.75/4.00, 87.46/100;
- Supervisor: Dr. Yu LIU (Professor/Assistant Dean)
2009.09-2013.06 Bachelor in Software Engineering, School of Software, Dalian University of Technology, Dalian, China.
- Rank: 15/289;
- GPA: 3.82/4.00, 89.30/100; Major Course GPA: 3.95/4.00, 92/100;
- Thesis: Gene Enrichment Analysis Based on Non-negative Matrix Factorization (Supervised by Dr. Yu LIU)
Research Interests
My research focuses mainly on data analysis and mining. I have been working on the management and processing of biomolecular
data collected on a genome-wide scale (Computational Biology and Bioinformatics). I am familiar with various NMF (Non-negative
Matrix Factorization) algorithms and have applied them successfully in many experiments. I have also collected large sets
of scholarly/scientific data, including datasets for calculating the h-sequence. I am currently working on community detection for
large-scale data, with applications to microarray data and scholarly data.
Publications
1. Zhewen SHI, Yu LIU, Yizhou YAN, Xiaowei ZHAO. A Hierarchical Community Detection Method in Complex Networks.
Journal of Computational Information Systems, vol.9, no.24, pp. 9715-9724, 2013.
2. Yu LIU, Zhen HUANG, Jing FANG, Yizhou YAN. An Article Level Metric in the Context of Research Community. WWW’14
Companion, Seoul, Korea, April 7-11, 2014.
3. Yu LIU, Yizhou YAN, Zhewen SHI, Aedin C Culhane (in preparation for submission). GeneSigCatcher: Automated retrieval of
most relevant PubMed Central articles for GeneSigDB.
4. Yu LIU, Yizhou YAN (in preparation for submission). ScholarSeq: A Benchmark dataset for calculating a sequence of impact
measures at individual level.
Awards and Certifications
Excellent Postgraduate Award (Top 5%)
First Class Scholarship for Postgraduates (Top 15%)
Third Prize in NPMCM (National Postgraduate MCM)
Learning Merit Scholarship (Top 15%, twice; Top 5%, once), Individual Scholarship (Top 10%, once)
Honorable Mention in ICM, USA
Third Prize in CUMCM (China Undergraduate MCM)
Technical Skills
C, C++, C#, Java, Matlab, R
MySQL, SQL Server, Oracle
LaTeX, Microsoft Office, EndNote
Research Experiences
2014.07-present Community detection in large networks.
- Description: This project devises algorithms for community detection in large networks. We will propose
  a new method applicable to large-scale networks derived from genome data and from scholarly data.
- My responsibilities: To learn various community detection algorithms; to design the method; and to apply the
  method to genome data.
- Skills Acquired: Methods of community detection
2014.03-present A benchmark dataset for calculating a sequence of academic impact measures at individual level.
- Description: We have constructed a benchmark dataset that can be used for various dynamic, time-sequence-based
  academic impact assessments (e.g. h-sequence). A corresponding system is under development, which
  will provide management of sequence data for scholars in Computer Science. This system will soon be publicly
  accessible as a website. A paper reporting this work is in preparation for submission.
- My responsibilities: To crawl data from several websites and provide a dataset for calculating time-based impact
  measures such as h-sequence; to implement major h-sequence methods on the dataset; to provide background
  data for the online system.
- Skills Acquired: Data crawling from websites; Data processing; h-index/h-sequence calculation
2013.09-2014.03 Name Disambiguation.
- Description: We proposed a multi-level clustering algorithm combining the coauthor network and latent relations
  among venues to solve the name disambiguation problem.
- My responsibilities: To exploit NMF to identify the relationships between venues.
- Skills Acquired: NMF on large-scale datasets; Hadoop; Name Disambiguation
2013.03-2013.09 Gene Set Enrichment Analysis.
- Description: In collaboration with Dr. Aedin C Culhane at Harvard School of Public Health, we incorporated the
  notion of degree of membership from fuzzy mathematics into the traditional NMF-based bi-clustering method and
  proposed a novel process that classifies genes and phenotypes while simultaneously finding associations between
  them. Sparseness is also calculated to avoid noise. A paper on this work is in preparation for submission.
- My responsibilities: To accomplish the whole project under the supervision of Dr. Aedin C Culhane and Prof. Yu LIU.
- Skills Acquired: Details of GSEA; NMF algorithms
2012.08-2013.03 Automated retrieval of relevant articles for GeneSigDB.
- Description: In cooperation with Dr. Aedin C Culhane at Harvard School of Public Health, we used data mining
  methods to develop a new strategy for identifying the subset of publications most relevant to GeneSigDB. This
  approach is expected to improve the efficiency of the manual biocuration pipeline for GeneSigDB. The process comprises
  the optimization of PMC search keywords using Latent Semantic Analysis and the Vector Space Model, the extraction
  of tables from PDF files, and the classification of results. Biocurators found the pipeline useful and manually
  confirmed that 90% of predicted gene-list-positive articles contained gene signatures. A paper reporting this work is in
  preparation for submission. The results are accessible at http://www.linkscholar.net/genelistfinder/.
- My responsibilities: To accomplish the whole project under the supervision of Dr. Aedin C Culhane and Prof. Yu LIU.
- Skills Acquired: Bioinformatics; Methods of data mining; Matlab and R; Paper reading and writing
2010.09-2011.08 Design of EDUGUI for Embedded Systems.
- Description: EDUGUI is a lightweight GUI framework that not only provides a complete desktop environment for
  users, but also offers developers a rich set of convenient APIs. It runs quickly and resource-efficiently on many
  platforms, including x86 Linux, x86_64 Linux and ARM Linux.
- My responsibilities: To fix bugs in this system; to improve the user interfaces; to implement several useful example
- Skills Acquired: Handling large projects; Linux; SVN
Languages and Test Scores
- TOEFL (2014/08): 107 (Listening: 29; Writing: 29; Reading: 26; Speaking: 23)
- GRE (2013/09): 314+3.5 (Verbal: 150; Quantitative: 164; Writing: 3.5) (Will update on 11/16/2014)
Japanese: JLPT N1: Passed (highest level of the Japanese-Language Proficiency Test)
Chinese: Native
Teaching Experience
Teaching Assistant for Computer Networking (by Dr. Feng XIA)
Teaching Assistant for Introduction to Algorithms (by Dr. Lei WANG)
Teaching Assistant for Database System (by Dr. Yu LIU)
Extracurricular Activities
Leader of Embedded Group in Center of Innovation and Practice
Vice Leader of Embedded Group in Center of Innovation and Practice