Dragomir Radev Joins SEAS Faculty With Collaborations Already In The Works

Departments: Computer Science

Let’s say you download a recently published paper from an academic journal. Even if you’re familiar with the subject, there’s a good chance you may need some supplemental material to get a good handle on the study. How do you know which previous studies will provide the necessary background?

Dragomir Radev, professor of computer science, is working on a program that will help. Radev, who joined the Yale SEAS faculty Jan. 1, was previously on the faculty at the University of Michigan, after earning his PhD at Columbia and working on question answering at IBM.

The project, known as “multi-document summarization,” would tell users precisely what they need to read up on before they can tackle certain papers. 

“The system will figure out which papers you have to read before you can fully understand this paper - what are the stepping stones, the topics that you’re missing?” For example, he said, perhaps you’re reading a paper on natural language processing. “Do you need to know about Bayes’ Theorem or about Chinese Word Segmentation? Maybe yes, maybe no. There may be thousands of topics that are relevant to natural language processing, but for this paper, you might need to know only six of them.”

Natural Language Processing (NLP) is Radev’s main area of research. It is an area of study where computer science, linguistics, and artificial intelligence intersect, and it is increasingly prominent in our lives. It plays a role in everything from how Apple’s Siri assists us to how text is translated from one language to another. One indication of the level of interest in the subject is that 135 students have signed up for Radev’s NLP course this semester. Large class sizes are nothing new for Radev. He has taught an NLP MOOC (massive open online course) on Coursera with more than 10,000 students.

Radev is also the founder of the North American Computational Linguistics Olympiad (NACLO, www.nacloweb.org), a computational linguistics competition for high school students in the United States and Canada.

“This is a big competition that we’ve been running for 11 years now,” Radev said, adding that about 2,000 students participate each year. “It’s for people interested in linguistics, computational linguistics, natural language processing, and machine learning - that sort of thing.” The best students in the USA participate at the international level and have won many medals there. Among them are current Yale undergraduates Tom McCoy (’17), Aidan Kaplan (’17), and James Wedgwood (’20). Unlike many other high school events related to computer science, almost 50% of the participants in NACLO are female. Radev is the editor of two published volumes of NACLO problems.

His other areas of expertise include text summarization, dialogue systems, and deep neural learning – all of which have a wide range of applications. Even before he arrived at Yale, Radev was in contact with several faculty members from other fields about striking up collaborations, including those from the medical school, the humanities programs and the political science department. With the daunting amount of textual data that researchers in these fields encounter, Radev’s expertise should prove a valuable resource. 

“There’s a general awareness now that natural language processing and these other tools can be helpful to those other fields,” he said. “Ten years ago, many people from other fields didn’t even know that you could do this sort of work.” 

Radev has published close to 200 papers in NLP and related areas. He’s currently continuing several collaborations outside of Yale. One project is with IBM and the University of Michigan on building a conversational dialogue system for student advising. Another is with Robert Mankoff, editor of the New Yorker magazine’s cartoons. As part of an ongoing project, they’re looking at how well computers can recognize humor and even generate their own original jokes.

“It may or may not work, but it will be very interesting to be able to see if a computer can understand New Yorker cartoons and get the jokes in them,” he said.