A Master’s Degree Student’s Story: Learning to analyze complex data and hitting the gym

January 11, 2023
by Fran Fried

Zichun Xu’s journey to the Yale School of Public Health – and its Biostatistics Data Science Pathway – began half a globe away, in Shanghai, with a brief detour to Canada.

“I spent four incredible years at Fudan University in Shanghai as an undergraduate student before coming to Yale,” Xu said. “My interests were eclectic back then; I took courses ranging from biology to mathematics and even Greek history. I have also worked on many interesting projects, including the effects of a chemical compound on cancer, a genome-wide association study, and clinical trial design.” He spent his fall 2018 semester at the University of Alberta in Edmonton, “where I met new friends and took many exciting trips.”

Xu, MS ’23, eventually zeroed in on pursuing biostatistics as his graduate field after working on several data analysis projects as an undergraduate, including analyzing UK Biobank data to find significant genetic variants for obesity and investigating biomarkers for a new type 2 diabetes treatment.

Xu said he chose YSPH to pursue his biostatistics studies because “it is a renowned institute in the field of public health.” The school, he said, provides abundant resources for students.

The Department of Biostatistics offers rigorous training by world-class faculty in an engaging academic environment. The department also has close connections with Yale School of Medicine and Yale University’s Department of Statistics & Data Science, which spark interdisciplinary research.

One aspect Xu enjoys about YSPH is its academic freedom. He cited the ability to take a variety of courses, such as Statistical Inference with Assistant Professor Zhou Fan, whose teaching style he called “inspirational”; and Machine Learning in Biomedical Data with Leying Guan, assistant professor of biostatistics.

“At Yale, you can choose from a wide range of elective courses and focus on a specific area that interests you,” Xu said. “For example, something that I have always wanted to learn about is the statistical foundation of data science, and there are a variety of relevant courses offered at Yale both by the biostatistics department and the statistics and data science department.”

When he’s not studying or working on projects, Xu enjoys working out, taking part in activities around Yale’s historic campus, and venturing into the vibrant city of New Haven.

In his downtime, he often can be found at the Payne Whitney Gym, where he plays basketball two or three times a week. “The court is also an important place for me to make new friends at Yale,” he said. He’s also a PC gamer, usually playing with his friends in China. And his favorite restaurant in New Haven is Frank Pepe’s, the world-famous apizza restaurant on Wooster Street, although he also frequents Chinese restaurants in Boston, Mass., which is about a three-hour drive from New Haven.

A Positive Role Model

Xu’s academic advisor and research supervisor is Hongyu Zhao, the Ira V. Hiscock Professor of Biostatistics; professor of genetics; and professor of statistics and data science.

Professor Zhao “has given me many valuable suggestions academically and for my career,” Xu said. “With his rigorous attitude and critical thinking toward science, he has been a role model for me when it comes to research.”

“Through constant exploring and interrogating,” Xu said, “I have substantially improved my ability to think critically and logically, and through many iterations of failure and success, I have developed the stamina to stick with a difficult problem and enjoy not only the moment that it is solved but the very process of solving it.”

Xu’s commitment to his field was recognized in November 2022, when he received the Dr. Colin White Memorial Scholarship, named in honor of a former chair of the Department of Epidemiology and Public Health at Yale. The scholarship is presented to a master’s degree student who has shown exemplary performance in their studies.

Rigorous Training and Engaging Studies

Biostatistics is a diverse field with many different areas of study. Xu has chosen to focus on genomic data analysis, in which he has some experience.

“Genomic data is a perfect example of what I describe as ‘big data,’” he explained. “The data are usually high-dimensional, with tens of thousands of variables and non-Gaussian, meaning that they are very different from the data we encounter daily. All these features bring significant challenges to data analysis, and that is where I think biostatisticians can shine.”

There’s also an X factor in data analysis, which Xu compared to a medical doctor diagnosing disease.

“We look at the data and poke around with different tools from our statistical toolbox to get a sense of what is going on,” Xu said. “Then we make the diagnostics by identifying problems within the current analysis pipeline or existing methods when they do not perform as expected. Finally, we prescribe a solution tailored to the specific conditions of the data. Sometimes this solution is just a tiny tweak of the existing pipeline. Other times it might be a completely new invention.”

Xu said the emergence of many new data-generating technologies has made the field more exciting than ever.

“Genomic data science is a burgeoning field with many open questions left to be addressed,” Xu said. “Especially for statisticians, the complexity and high dimensionality of genomic data are fertile ground for the invention of novel statistical methods, and I want to take advantage of the resources that YSPH offers to contribute to this field.”

He added: “In genomics, we can now categorize the central dogma from different perspectives. We have genetics data, gene expression data, chromatin accessibility data, and protein-omics data, and each type of data presents unique characteristics and challenges. And it is up to us, the biostatisticians, to develop practical tools to extract useful information from them. Ultimately, we may be able to demystify the genetic regulation dynamics and untangle the mechanism behind complex diseases with these new data.”

Submitted by Sabrina Lacerda Naia dos Santos on January 11, 2023