Past events

CS&E Colloquium: Human-Centered AI through Scalable Visual Data Analytics

The computer science colloquium takes place on Mondays and Fridays from 11:15 a.m. - 12:15 p.m.

This week's speaker, Minsuk Kahng (Oregon State University), will be giving a talk titled "Human-Centered AI through Scalable Visual Data Analytics".

Abstract

While artificial intelligence (AI) is widely used today, people often use AI without understanding how it works. This can be especially detrimental when AI is used in high-stakes domains or produces erroneous outputs. How can we help people interpret complex AI systems that use large datasets?
 
In this talk, I will present my human-centered approach to AI interpretability. I create novel visual analytics systems that are scalable, interactive, and easy to use. With such tools, AI systems that perform well can be trusted, and those that do not can be improved. Specifically, I will discuss three threads of my work: (1) Scalable Interpretation: developing interactive visualization systems for exploring industry-scale deep neural networks (e.g., ActiVis, deployed by Facebook); (2) Insight Discovery in Workflow: designing analytic workflows for practitioners to make AI less biased and more accurate (e.g., fairness auditing, model debugging); and (3) Interactive AI Education: building interactive tools that broaden access to deep learning education (e.g., GAN Lab, open-sourced with Google). I conclude with a vision to incorporate humans' domain knowledge into AI and to democratize the use of AI.

Biography

Minsuk Kahng is an Assistant Professor of Computer Science at Oregon State University. His research focuses on building visual analytics tools that help people interpret and interact with machine learning systems and large datasets. His work synergistically combines methods from data visualization, explainable AI, human-computer interaction, and databases. Kahng's research has led to technologies deployed by Facebook (e.g., ActiVis, ML Cube) and open-sourced tools (e.g., GAN Lab, used by more than 200,000 people in over 190 countries). His research has been supported by NSF, DARPA, Google, and Facebook, and recognized by prestigious awards, including the Google PhD Fellowship and the NSF Graduate Research Fellowship. Before joining Oregon State, Kahng earned his Ph.D. from Georgia Tech, where he received a Dissertation Award, and received his Master's and Bachelor's degrees from Seoul National University in South Korea. Website: https://minsuk.com
 

CS&E Colloquium: The Ins and Outs of Explanations in NLP

This week's speaker, Sam Carton (University of Chicago), will be giving a talk titled "The Ins and Outs of Explanations in NLP".

Abstract

In natural language processing (NLP), as in other areas of machine learning, the rise of large neural networks has led to increased interest in model explainability as a means to approach safety and ethics problems when applying such models to human-impactful decision tasks. In this talk I consider two perspectives on explanations in NLP: 1) as additional context by which humans can verify model predictions for improved human-model collaboration; and 2) as a mechanism for exerting finer-grained control over model behavior: making model predictions more robust, more aligned with human reasoning, and even more accurate. I argue that ultimately these two perspectives form a virtuous circle of information flow from model to human and back, and that it is important to consider both in designing new explanation techniques and evaluations. I discuss my work on both perspectives before concluding with an agenda for future work in this area.

Biography

Sam Carton is a postdoctoral fellow working on explainable natural language processing with Chenhao Tan, initially at the University of Colorado Boulder and presently at the University of Chicago Department of Computer Science. He completed his PhD at the University of Michigan School of Information, working with Paul Resnick and Qiaozhu Mei. Sam publishes across a range of conferences from human-computer interaction to natural language processing. His work has been supported by grants from various sources including the NSF, Amazon and Salesforce. 

CS&E Colloquium: Deep Neural Networks Explainability: Algorithms and Applications

This week's speaker, Mengnan Du (Texas A&M University), will be giving a talk titled "Deep Neural Networks Explainability: Algorithms and Applications".

Abstract

Deep neural networks (DNNs) have achieved extremely high prediction accuracy in a wide range of fields such as computer vision, natural language processing, and recommender systems. Despite this superior performance, DNN models are often regarded as black boxes and criticized for their lack of interpretability, since they cannot provide meaningful explanations of how a certain prediction is made. Without explanations to enhance the transparency of DNN models, it is difficult to build trust and credibility among end users. In this talk, I will present our efforts to tackle the black-box problem and make powerful DNN models more interpretable and trustworthy. First, I will introduce post-hoc interpretation approaches for predictions made by two standard DNN architectures: Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs). Second, I will introduce the use of explainability as a debugging tool to improve the generalization ability and fairness of DNN models.
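To make the idea of post-hoc interpretation concrete, here is a generic, model-agnostic sketch (not one of the speaker's methods): score each input feature by occluding it and measuring how much the prediction changes. The linear "model" below is only a stand-in for a black-box network.

```python
import numpy as np

def occlusion_importance(predict, x, baseline=0.0):
    """Post-hoc, model-agnostic interpretation: score each input
    feature by how much the prediction drops when that feature is
    replaced with a baseline value (a simple occlusion test)."""
    base_score = predict(x)
    importances = np.empty(x.size)
    for i in range(x.size):
        occluded = x.copy()
        occluded[i] = baseline  # "occlude" one feature
        importances[i] = base_score - predict(occluded)
    return importances

# Stand-in for a black-box DNN: a fixed linear scorer.
weights = np.array([2.0, -1.0, 0.0, 0.5])
predict = lambda v: float(weights @ v)

x = np.ones(4)
print(occlusion_importance(predict, x))
```

In this linear toy, each feature's score recovers exactly its contribution to the prediction; for a real DNN the same probe gives only a local, approximate picture, which is why richer interpretation tools are needed.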

Biography

Mengnan Du is currently a Ph.D. student in Computer Science at Texas A&M University, under the supervision of Dr. Xia Ben Hu. His research covers the broad area of trustworthy machine learning, with a particular interest in explainable, fair, and robust DNNs. He has published around 40 papers in prestigious venues such as NeurIPS, AAAI, KDD, WWW, NAACL, ICLR, CACM, and TPAMI, and has received over 1,200 citations with an h-index of 11. Three of his papers were selected as Best Paper (Candidate) at WWW 2019, ICDM 2019, and INFORMS 2019, respectively. His paper on explainable AI was also highlighted on the cover of the January 2020 issue of Communications of the ACM. He served as the Registration Chair of WSDM'22, and serves on the program committees of conferences including NeurIPS, ICML, ICLR, AAAI, ACL, EMNLP, and NAACL.

CS&E Colloquium: Data Preparation: The Biggest Roadblock in Data Science

This week's speaker, El Kindi Rezig (MIT), will be giving a talk titled "Data Preparation: The Biggest Roadblock in Data Science".

Abstract

When building machine learning (ML) models, data scientists face a significant hurdle: data preparation. ML models are only as good as the data we train them on. Unfortunately, data preparation is tedious and laborious because it often requires human judgment on how to proceed. In fact, data scientists spend at least 80% of their time locating the datasets they want to analyze, integrating them together, and cleaning the result.

In this talk, I will present my key contributions in data preparation for data science, which address the following problems: (1) data discovery: how to discover data of interest from a large collection of heterogeneous tables (e.g., data lakes); (2) error detection: how to find errors in the input and intermediate data in complex data workflows; and (3) data repairing: how to repair data errors with minimal human intervention. The developed systems are specifically designed to support data science development, which poses particular requirements such as interactivity and modularity. The talk will feature demonstrations of data preparation systems as well as discussions of the algorithms and techniques we developed to enable data preparation at scale.
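As a toy illustration of the error-detection problem (not the speaker's actual system), one simple signal is a robust z-score: flag cells that lie implausibly far from a column's median. Real data-preparation systems combine many such signals with human feedback.

```python
import numpy as np

def flag_errors(column, z_thresh=3.5):
    """Toy error detector: flag cells whose robust z-score, computed
    from the median and the median absolute deviation (MAD), exceeds
    a threshold. Illustrative only; a sketch of one detection signal."""
    col = np.asarray(column, dtype=float)
    med = np.median(col)
    mad = np.median(np.abs(col - med))
    if mad == 0:
        # Degenerate column: no spread, nothing to flag.
        return np.zeros(col.shape, dtype=bool)
    z = 0.6745 * (col - med) / mad  # 0.6745 makes MAD comparable to std dev
    return np.abs(z) > z_thresh

ages = [34, 29, 41, 37, -1, 33, 999]  # -1 and 999 look like data errors
print(flag_errors(ages))
```

On this column the median is 34 and the MAD is 5, so only the -1 and 999 cells exceed the threshold; deciding how to repair them is exactly where human judgment, and hence interactive tooling, comes in.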

Biography

El Kindi Rezig is a postdoctoral associate at the Computer Science and Artificial Intelligence Laboratory (CSAIL) of MIT where he works under the supervision of Michael Stonebraker. He earned his Ph.D. in Computer Science from Purdue University under the supervision of Walid Aref and Mourad Ouzzani. His research interests revolve around data management in general and data preparation for data science in particular. He has developed systems in collaboration with several organizations including Intel, Massachusetts General Hospital, and the U.S. Air Force.

Application deadline for integrated program

The application deadline for the computer science integrated program (Bachelor's/Master's) is March 15.

This is exclusively available to students officially admitted to the College of Science & Engineering Bachelor of Science in Computer Science, the Bachelor of Computer Engineering, the College of Liberal Arts Bachelor of Arts in Computer Science, and the College of Liberal Arts Second Major in Computer Science. The program allows students with strong academic records to take additional credits (up to 16) at undergraduate tuition rates during their last few semesters, which can be applied toward the Computer Science M.S. program.

Applicants must have at least 75 credits completed at the time of their application. Read more about the program eligibility requirements.

Applications must be submitted online. Before applying, students should review the application procedures.

Students will be notified of the outcome of their application via email by June 1 for a fall start. In some cases, an admission decision will be put on hold until semester grades are finalized. Students will be notified if their application is on hold.

CS&E Colloquium: Translating AI to Impact: Uncertainty and Human-agent Interactions in Multi-agent Systems for Public Health and Conservation

This week's speaker, Elizabeth Bondi (Harvard University), will be giving a talk titled "Translating AI to Impact: Uncertainty and Human-agent Interactions in Multi-agent Systems for Public Health and Conservation".

Abstract

AI is now being applied widely in society, including to support decision-making in important, resource-constrained efforts in conservation and public health. Such real-world use cases introduce new challenges, like noisy, limited data and human-in-the-loop decision-making. I show that ignoring these challenges can lead to suboptimal results in AI for social impact systems. For example, previous research has modeled illegal wildlife poaching using a defender-adversary security game with signaling to better allocate scarce conservation resources. However, this work has not considered detection uncertainty arising from noisy, limited data. In contrast, my work addresses uncertainty beginning in the data analysis stage, through to the higher-level reasoning stage of defender-adversary security games with signaling. I introduce novel techniques, such as additional randomized signaling in the security game, to handle uncertainty appropriately, thereby reducing losses to the defender. I show similar reasoning is important in public health, where we would like to predict disease prevalence with few ground truth samples in order to better inform policy, such as optimizing resource allocation. In addition to modeling such real-world efforts holistically, we must also work with all stakeholders in this research, including by making our field more inclusive through efforts like my nonprofit, Try AI.
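To give a flavor of why randomized signaling helps the defender, here is a heavily simplified single-target toy with made-up payoffs (the talk's security-game models are far richer): the defender may bluff a warning when the target is uncovered, and the bluff deters attacks as long as it stays credible.

```python
def defender_loss(c, q):
    """Expected defender loss on one target with toy 0/1 payoffs.
    c: probability the target is actually covered by a defender resource.
    q: probability of bluffing a warning when the target is NOT covered.
    A warning deters the attacker only while it stays credible, i.e.
    while P(uncovered and bluffing) <= P(covered): (1 - c) * q <= c."""
    assert (1 - c) * q <= c + 1e-12, "bluffing too often: warning loses credibility"
    # The attacker attacks only when no warning is shown, and the
    # attack succeeds only when the target is in fact uncovered.
    return (1 - c) * (1 - q)

c = 0.3
print(round(defender_loss(c, q=0.0), 3))          # no signaling at all
print(round(defender_loss(c, q=c / (1 - c)), 3))  # bluff as often as credibility allows
```

With coverage probability 0.3, bluffing up to the credibility limit cuts the expected loss from 0.7 to 0.4, which is the intuition behind signaling in these games; handling detection uncertainty, as in the speaker's work, requires adjusting how often the defender can safely bluff.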

Biography

Elizabeth Bondi is a PhD candidate in Computer Science at Harvard University advised by Prof. Milind Tambe. Her research interests include multi-agent systems, remote sensing, computer vision, and deep learning, especially applied to conservation and public health. Among her awards are Best Paper Runner-Up at AAAI 2021, the Best Application Demo Award at AAMAS 2019, the Best Paper Award at SPIE DCS 2016, and an Honorable Mention for the NSF Graduate Research Fellowship Program in 2017.

CS&E Colloquium: SAUL: Towards Effective Data Science

This week's speaker, Lei Cao (MIT), will be giving a talk titled "SAUL: Towards Effective Data Science".

Abstract

An effective data system should satisfy the SAUL properties: it should be scalable, automatic, and easy to keep humans in the loop. It should automatically address low-level performance bottlenecks to scale to big data. It should be tuning-free, or at least easy for users to tune. And it should keep humans in the loop so that users can easily customize the system to meet their domain-specific requirements. The goal of my research is to build data systems satisfying SAUL. My talk will cover our most recent work targeting the automatic dimension of SAUL, including RITA, which automates the preprocessing of time series data, and AutoAD, which automates the tuning process of anomaly detection.

Time series analytics is of great importance to many real-world applications. However, traditional time series analytics techniques rely heavily on humans to preprocess the data and extract features, making them hard to use and unscalable. To solve this problem, we propose RITA, which, inspired by pre-training models in natural language processing, uses the correlations among the values in a time series to automatically produce high-quality feature embeddings. A novel attention mechanism scales RITA to highly complex, massive-scale time series data. Anomaly detection is critical in many scientific and engineering fields, ranging from defending against network intrusions to detecting seizures in EEG medical data. However, although previous research has offered a plethora of unsupervised anomaly detection algorithms, effective anomaly detection remains challenging for data scientists due to the manual process of determining which of these many algorithms is best suited to their particular domain. Automating this process is particularly challenging in the unsupervised setting, where no labels are available for cross-validation. AutoAD solves this problem with a fundamentally new strategy that unifies the merits of unsupervised anomaly detection and supervised classification.
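As a minimal sketch of what one of the "many algorithms" in the unsupervised anomaly-detection toolbox looks like (a classic k-nearest-neighbor distance detector, not AutoAD itself), each point is scored by how far it is from its k-th nearest neighbor:

```python
import numpy as np

def knn_anomaly_scores(X, k=2):
    """Toy unsupervised anomaly detector: score each point by the
    distance to its k-th nearest neighbor. Points far from all
    others get high scores. Illustrative only, not AutoAD's method."""
    X = np.asarray(X, dtype=float)
    # Pairwise Euclidean distance matrix via broadcasting.
    d = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)
    # Sort each row; column 0 is each point's zero distance to itself.
    d.sort(axis=1)
    return d[:, k]

points = [[0, 0], [0, 1], [1, 0], [1, 1], [10, 10]]
scores = knn_anomaly_scores(points, k=2)
print(scores.argmax())  # prints 4: the outlying point [10, 10]
```

Choosing k, the distance metric, or this detector over dozens of alternatives is exactly the label-free model-selection problem the abstract describes AutoAD as automating.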

Biography

Dr. Lei Cao is a Research Scientist at MIT CSAIL, working with Prof. Samuel Madden and Prof. Michael Stonebraker in the Data Systems group. Before that, he worked at the IBM T.J. Watson Research Center as a Research Staff Member in the AI, Blockchain, and Quantum Solutions group. His recent research focuses on developing systems and algorithms that help data scientists effectively make sense of data.

Spring break

Spring break 2022 is from March 7 through March 11.

View the full academic schedule on One Stop.
 

Application deadline for the M.S. program

The application deadline for the computer science M.S. program is March 1.

Applications must be submitted online. Before applying, students should review the application procedures.

Application deadline for the M.C.S. program

The application deadline for the computer science M.C.S. program is March 1.

Applications must be submitted online. Before applying, students should review the application procedures.