Thursday Extra: "Computational linguistics: crawling the Web for non-English data"

Computer Science 2013-09-16

Summary:

On Thursday, September 19, Kim Spasaro 2014 will discuss the construction of a digital collection of written text in a specific language. She writes:

This summer I interned with Carnegie Mellon's Language Technologies Institute. While there, I was part of a project working to enable machine translation for Bantu languages. More specifically, I was responsible for building a corpus of Kinyarwanda phrases to be used for machine learning. In this talk, I will discuss how I used the Apache Nutch web crawler to launch a large-scale web crawl in search of Kinyarwanda data.

Refreshments will be served at 4:15 p.m. in the Computer Science Commons (Noyce 3817). The talk, “Computational linguistics: crawling the Web for non-English data,” will follow at 4:30 p.m. in Noyce 3821. Everyone is welcome to attend!

Link:

http://www.cs.grinnell.edu/drupal6/node/649

From feeds:

Gudgeon and gist » Computer Science

Tags:

thursday extra computational linguistics corpus linguistics web crawling

Authors:

stone

Date tagged:

09/16/2013, 16:10

Date published:

09/16/2013, 13:55