Wednesday, January 14, 2015

Two workshops on web-crawling at UCLA

The International Institute and the UCLA Program on International Migration are proud to present:

Introduction to Hyphe: A new webcrawler for analyzing controversies

Monday, January 26, from 3 to 5 p.m.
and
Wednesday, January 28, from 3 to 5 p.m.

Both workshops will be held in the Laboratory for Digital Cultural Heritage, in the Young Research Library Research Commons

Presented by Mathieu Jacomy, MédiaLab, Sciences Po, Paris

Work on controversies, whether related to immigration, the environment, or police behavior, can be greatly facilitated by crawling the websites maintained by the actors involved in a controversy and analyzing their online connections.

These workshops introduce non-technical users to a new web crawler, Hyphe. It is designed so that researchers can control the building of a web corpus (by filtering and qualifying the websites to include in the corpus) while still having powerful tools capable of handling the huge amount of data available on the web.

Built on modern and robust technologies such as Lucene, MongoDB, Scrapy, Twisted, Thrift, Domino.js, Sigma.js and Bootstrap, Hyphe can manage multiple corpora within each instance, work around common crawling issues (redirections, cookies, JavaScript-only pages, and the like), handle multi-website entities from the web interface, tag the results, and so on.

Hyphe is easy to use. Workshop participants will simply need a laptop equipped with a conventional web browser (Chrome, Firefox, etc.) and access to the internet. Depending on time and interest, the Wednesday workshop will also provide an overview of Gephi.


Mathieu Jacomy is a research engineer at médialab in Sciences Po Paris. Web mapping and visual network analysis are his main fields of expertise. He has created several tools dedicated to digital methods in the social sciences, including the free network visualization platform Gephi. At the ICT-Migration program of the Fondation Maison des Sciences de l'Homme (directed by Dana Diminescu), he developed the technical parts of the e-Diasporas Atlas project. Now at Sciences Po, he is in charge of the Dime Web instrument, supporting researchers in using digital methods.

Bastian M., Heymann S., Jacomy M. (2009). Gephi: an open source software for exploring and manipulating networks. International AAAI Conference on Weblogs and Social Media.

Thanks to support from: the International Institute; UCLA Interdisciplinary and Cross-campus Affairs; the UCLA School of Law; the UCLA Graduate School of Education and Information Studies; the Irene Flecknoe Ross Lecture Series in the Department of Sociology. The Irene Flecknoe Ross Lecture Series is made possible by a gift from Ray Ross in memory of his wife.