Seminar: Managing Information on the Web


Tova Milo, Spring 2011


Seminar Information

The seminar focuses on managing, analyzing, sharing, and integrating data and applications across multiple sources, whether on the Internet or within enterprises. This topic has received much attention in the database, AI, Web, IR, and verification communities. We shall read recent papers in this area, focusing on several specific issues, and then explore possible future directions. A tentative list of topics and papers follows.




         Data Exchange, Extraction and Integration

1.       Evaluating Entity Resolution Results, David Menestrina, Steven Whang, Hector Garcia-Molina, VLDB 2010 (Barak Cohen 2/3)

2.       Automatic Rule Refinement for Information Extraction, Bin Liu, Laura Chiticariu, Vivian Chu, H. Jagadish, Frederick Reiss, VLDB 2010 (Rinat Pichker 9/3)

3.       Exploiting Content Redundancy for Web Information Extraction, Pankaj Gulhane, Rajeev Rastogi, Srinivasan Sengamedu, Ashwin Tengli, VLDB 2010 (Evgeny Budilovsky 16/3)

4.       Entity Resolution with Evolving Rules, Steven Whang, Hector Garcia-Molina, VLDB 2010 (Eran Kravitz 30/3)

5.       MapMerge: Correlating Independent Schema Mappings, Bogdan Alexe, Mauricio Hernandez, Lucian Popa, Wang-Chiew Tan, VLDB 2010



         Web, Recommendations and Social Networks

1.       Active Knowledge: Dynamically Enriching RDF Knowledge Bases by Web Services, Nicoleta Preda, Fabian Suchanek, Gjergji Kasneci, Thomas Neumann, Wenjun Yuan, Gerhard Weikum, SIGMOD 2010 (Amit Somech 6/4)

2.       Human-Assisted Graph Search: It's Okay to Ask Questions, A. Parameswaran, A. Das Sarma, H. Garcia-Molina, N. Polyzotis, J. Widom, to appear in VLDB 2011 (Bar Avidan 27/4)

3.       Load-Balanced Query Dissemination in Democratic Communities, Emiran Curtmola, Alin Deutsch, K.K. Ramakrishnan, Divesh Srivastava, SIGMOD 2010 (Ofir Weisse 4/5)


         Provenance

1.       TRAMP: Understanding the Behavior of Schema Mappings through Provenance, Boris Glavic, Gustavo Alonso, Renée Miller, Laura Haas, VLDB 2010 (Yaron Margalit 11/5)

2.       Querying Data Provenance, Grigoris Karvounarakis, Zachary Ives, Val Tannen, SIGMOD 2010 (Hila Cohen 18/5)

3.       Efficient Querying and Maintenance of Network Provenance at Internet-Scale, Wenchao Zhou, Micah Sherr, Tao Tao, Xiaozhou Li, Boon Thau Loo, Yun Mao, SIGMOD 2010


         Cloud Computing

1.       Hadoop++: Making a Yellow Elephant Run Like a Cheetah (Without It Even Noticing), Jens Dittrich, Jorge Quiane, Alekh Jindal, Yagiz Kargin, Vinay Setty, Jörg Schad, VLDB 2010 (Alexandra Shpindovsky 1/6)

2.       MRShare: Sharing Across Multiple Queries in MapReduce, Tomasz Nykiel, Michalis Potamias, Chaitanya Mishra, George Kollios, Nick Koudas, VLDB 2010 (Itay Maoz 9/6)


         Probabilistic Data

1.       Querying Probabilistic Information Extraction, Daisy Zhe Wang, Michael Franklin, Minos Garofalakis, Joseph Hellerstein, VLDB 2010

2.       Lineage Processing over Correlated Probabilistic Databases, Bhargav Kanagal (University of Maryland), Amol Deshpande (University of Maryland), SIGMOD 2010

3.       Evaluation of Probabilistic Threshold Queries in MCDB, Luis Perez (Rice University), Subi Arumugam (University of Florida), Christopher Jermaine (Rice University), SIGMOD 2010

4.       MCDB-R: Risk Analysis in the Database