
Data Sciences Research Seminar

 

The seminar aims to bring together Penn State researchers in data science-related fields and to serve as a platform for networking, sharing expertise, and fostering multidisciplinary and translational research collaborations.

The seminar series is co-sponsored by the Data Sciences program, the Center for Big Data Analytics and Discovery Informatics, the Institute for Cyberscience, the Clinical and Translational Sciences Institute, and the Biomedical Data Sciences (BD2K) Predoctoral Training Program at Pennsylvania State University.

Seminar Schedule

| Date | Presenter(s) | Topic |
|---|---|---|
| October 2 | Sharon Huang, Penn State College of Information Sciences and Technology | Generative Adversarial Networks for Image Synthesis and Segmentation |
| October 9 | Naomi Altman, Dept. of Statistics & Bioinformatics and Genomics Program, PSU | Principal Component Analysis: Dimension reduction and Pattern Detection |
| October 16 | Luke Huan, Big Data Research Lab, Baidu | TBA |
| October 23 | Zihan Zhou, College of Information Sciences and Technology, PSU | Learning to Discover Structure for 3D Computer Vision |
| October 30 | Daniel Susser, College of Information Sciences and Technology, PSU | Online Manipulation |
| November 27 | Shomir Wilson, College of Information Sciences and Technology, PSU | Human Language Technologies for Understanding Online Privacy |
| December 4 | Yasser El-Manzalawy, College of Information Sciences and Technology, PSU | From Bioinformatics to Translational and Health Informatics: Whitening the Black Box |

Generative Adversarial Networks for Image Synthesis and Segmentation

October 2, 2018
Speaker: Sharon Huang, College of Information Sciences and Technology
Location: E202 Westgate Bldg.



Abstract: A frequent challenge in applying supervised learning techniques to biomedical image classification and segmentation is the lack of training data. Getting expert annotations can be expensive and time-consuming. One potential solution is to expand biomedical datasets via synthesis by generative models such as Generative Adversarial Networks (GANs). In this talk, I will present two GAN frameworks, StackGAN and AttnGAN, which can generate high-resolution realistic images conditioned on natural language descriptions of a scene. I will also introduce SegAN, an adversarial neural network architecture that can be trained to perform automated image segmentation. The talk will conclude with a discussion of future research directions.


Principal Component Analysis: Dimension reduction and Pattern Detection

October 9, 2018
Speaker: Naomi Altman, Dept. of Statistics & Bioinformatics and Genomics Program
Location: E202 Westgate Bldg.



Abstract: Principal Component Analysis (PCA) is an essential tool for data scientists, used both as an analysis tool in its own right and as a component of more complex analysis pipelines. In this talk, I will try to show why PCA has such a central role. En route, I will discuss issues such as centering, the multivariate distribution of the data, kernel methods, and principal curves.
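As an illustration of the centering issue mentioned in the abstract, here is a minimal sketch (not taken from the talk) of PCA computed with a singular value decomposition in NumPy. The function name `pca`, the synthetic data, and all parameter values are assumptions chosen for the example. The data vary mostly along the direction (1, 1) but have a mean far from the origin, so skipping the centering step makes the first component point toward the mean rather than along the direction of greatest variance.

```python
import numpy as np


def pca(X, n_components, center=True):
    """PCA via SVD. Returns (components, explained_variance, scores).

    Hypothetical helper written for this illustration only.
    """
    X = np.asarray(X, dtype=float)
    if center:
        # Subtract the column means so components describe variance,
        # not the location of the data cloud.
        X = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, ordered by singular value.
    _, S, Vt = np.linalg.svd(X, full_matrices=False)
    components = Vt[:n_components]
    explained_variance = S[:n_components] ** 2 / (X.shape[0] - 1)
    scores = X @ components.T
    return components, explained_variance, scores


# Synthetic data (assumed for the example): large spread along (1, 1),
# small spread along (1, -1), and a mean of (10, -10).
rng = np.random.default_rng(0)
direction = np.array([1.0, 1.0]) / np.sqrt(2)
orth = np.array([1.0, -1.0]) / np.sqrt(2)
t = rng.normal(size=500)
noise = rng.normal(scale=0.1, size=500)
X = np.outer(t, direction) + np.outer(noise, orth) + np.array([10.0, -10.0])

centered_pc, _, _ = pca(X, 1, center=True)
uncentered_pc, _, _ = pca(X, 1, center=False)

# The sign of a principal direction is arbitrary, so compare absolute cosines.
print("centered PC1 alignment with (1, 1):  ", abs(centered_pc[0] @ direction))
print("uncentered PC1 alignment with (1, -1):", abs(uncentered_pc[0] @ orth))
```

With centering, the first component aligns with the direction of greatest variance; without it, the first component is dominated by the offset of the data from the origin.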


Learning to Discover Structure for 3D Computer Vision

October 23, 2018
Speaker: Zihan Zhou, College of Information Sciences and Technology
Location: E202 Westgate Bldg.



Abstract: Although significant progress has been made in imaging devices and 3D reconstruction techniques over the past few decades, obtaining accurate 3D models and the associated camera positions from images remains a challenging problem in computer vision. Moreover, when directly utilized in real-world applications, existing methods often lead to unsatisfactory performance in terms of reliability, applicability, and scalability.

In this talk, I argue that many of the difficulties faced by current techniques and applications can be alleviated by harnessing various forms of global spatial relationships in the scene, such as parallel lines, planar surfaces, and repetitive patterns. Specifically, I describe novel deep learning methods to discover such geometric structures from images, and demonstrate how to utilize them for robust and efficient 3D vision. As an ongoing research effort, I will also discuss new opportunities and challenges in exploiting global structures in large-scale visual data for emerging commercial and consumer applications, such as architectural design, VR/AR, and autonomous driving.


Online Manipulation

October 30, 2018
Speaker: Daniel Susser, College of Information Sciences and Technology
Location: E202 Westgate Bldg.



Abstract: Privacy and surveillance scholars routinely argue that advertisers and employers manipulate us. Using information they collect about our preferences, interests, incomes, and so on, they induce us to buy their products, vote for their candidates, and work when and for however long it suits them. Yet what that means, exactly—what it means to manipulate someone—and how we might systematically identify cases of online manipulation has yet to be carefully explored. In this project, we develop a definition of manipulation, explore the ways information technology can be used to facilitate manipulative practices, and describe the harms that flow from engaging in such practices. While our discussion is philosophical, our aim is not merely conceptual: by clarifying the nature of manipulative online practices, we hope to guide technical and policy efforts to combat them.


Human Language Technologies for Understanding Online Privacy

November 27, 2018
Speaker: Shomir Wilson, College of Information Sciences and Technology
Location: E202 Westgate Bldg.



Abstract: Research has shown that internet users care about their privacy, but they do not have the time or legal expertise to understand the privacy policies of all the websites they visit or all the mobile apps they use. Fixing this gap in online notice and choice is the goal of the Usable Privacy Policy Project, an NSF-funded project to extract salient details from privacy policies and present them to internet users in ways that respond to their needs. I will present results from our work that show crowdworkers can answer questions about privacy policies with high accuracy and automated methods can identify important details in policy texts, such as statements about data collection and users' privacy options. I will then present some brief vignettes from my research on user privacy in online social networks, as well as some spinoff work on identifying section titles and prose text in documents from the web.


From Bioinformatics to Translational and Health Informatics: Whitening the Black Box

December 4, 2018
Speaker: Yasser El-Manzalawy, College of Information Sciences and Technology
Location: E202 Westgate Bldg.



Abstract: Machine learning has been extensively used in developing predictive models for a variety of bioinformatics tools. In many of these applications, to achieve the highest possible predictive performance, black box models (e.g., SVMs and Neural Networks) have been preferred over white box models (e.g., Decision Trees and Rule-based models). First, I argue that sacrificing interpretability for the sake of performance is a reasonable decision for many bioinformatics applications. However, I will demonstrate that careless use of black box models can lead to misinterpretations of the results. Second, I argue that interpretability of predictive models is essential for adopting these models in translational and healthcare settings. Unfortunately, using white box models is very challenging due to the high dimensionality, sparsity, and heterogeneity of the data. Alternatively, one can use a black box model augmented with interpretable predictions. Using a simple notion of interpretable predictions, I will present examples from my recent research on developing interpretable models for biomarker discovery from multi-omics and metagenomics data. I will conclude the talk with a discussion of the promise and challenges of adapting these approaches for integrative and predictive analyses of multi-omics, environmental, imaging, and phenotype data extracted from EHR systems.