Information Retrieval

Assignment 2

Plagiarism

You are reminded that this work is for credit towards the composite mark in CE306/CE706 and that the work you submit must therefore be your own. Any material you make use of, whether from textbooks, the Web or any other source, must be acknowledged as a comment in the program, and the extent of the reference indicated.

The context of your task

To properly evaluate a system, your test information needs must be germane (relevant) to the documents in the test document collection, and appropriate for the predicted usage of the system. Given information needs and documents, you need to collect relevance assessments. This is a time-consuming and expensive process involving human beings (in this case, you). For tiny collections, exhaustive relevance judgments for each query and document pair can be obtained. For large modern collections, it is usual for relevance to be assessed only for a subset of the documents for each query. The standard approach is pooling, where relevance is assessed over a subset of the collection that is formed from the top k documents returned by a number of different IR systems (usually the ones to be evaluated).
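The pooling idea described above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: the function name `build_pool`, the system labels and the document IDs are all hypothetical, and each system's ranking is assumed to be available as a list of document IDs ordered from best to worst.

```python
from typing import Dict, List


def build_pool(runs: Dict[str, List[str]], k: int = 10) -> List[str]:
    """Return the union of the top-k document IDs of every system.

    Documents are kept in first-seen order and duplicates are dropped,
    so a document returned by several systems is judged only once.
    """
    pool: List[str] = []
    seen = set()
    for ranking in runs.values():
        for doc_id in ranking[:k]:
            if doc_id not in seen:
                seen.add(doc_id)
                pool.append(doc_id)
    return pool


# Hypothetical rankings for a single query (IDs are made up).
runs = {
    "system_1": ["d1", "d2", "d3"],
    "system_2": ["d3", "d4", "d1"],
}
pool = build_pool(runs, k=2)
```

With two systems and k = 10, the pool for one query holds between 10 and 20 documents, depending on how much the two rankings overlap.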

The Document Collection (dataset): for this assignment you will use the dataset that you used in the first assignment (Wikipedia Movie Plots for CE306, COVID-19 Open Research Dataset for CE706).

Your task

This task comes in stages. Marks are given for each stage. The stages are as follows:

  • Building a Test Collection (10%) Imagine you would like to explore which search engine settings are most suitable for the collection you are indexing, to make searching as effective and efficient as possible. To start, you should devise a small test collection that contains a number of queries, together with their expected results.
    • Identify three information needs covered by the collection and then compose a sample query for each.
  • IR systems (20%) You are going to compare 2 IR systems. The IR system you built in the first assignment is your system 1. For your system 2, you can vary different parameters. You could, for example, change the pre-processing pipeline by comparing a system that uses stemming with one that does not. However, this will require you to re-index the collection. Alternatively, you might want to try different retrieval models, such as Boolean versus TF.IDF.
  • Pooling (10%) You will construct your pool by putting together the top 10 retrieval results from your 2 IR systems (your original from assignment 1 and the newly created one). You need to do this for each of your three queries. In the next step, you will judge every document in this pool.
    • N.B. Documents outside the pool are automatically considered to be irrelevant (Sparck Jones and van Rijsbergen, 1975)
  • Assessing relevance (20%) You will provide the binary relevance judgements. A document is either relevant or non-relevant (not relevant) for an information need.
    • For each information need (query), you need to assess whether each document in the pool is relevant or not, that is, whether it satisfies the information need.
  • Evaluation (30%) Once you have a test collection you can explore the effect of each IR system on the evaluation results. To do that you need to identify a suitable metric. Use P@5 and R@5 as the metrics of choice for this assignment.
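For the evaluation stage, P@5 is the fraction of the top 5 retrieved documents that are relevant, and R@5 is the fraction of all documents judged relevant for the query that appear in the top 5. The sketch below shows one way to compute both. It rests on stated assumptions: the function names and document IDs are hypothetical, P@k is divided by k even when a system returns fewer than k documents, and R@k is reported as 0.0 when no document was judged relevant.

```python
from typing import List, Set


def precision_at_k(ranking: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of the top-k retrieved documents that are relevant."""
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / k


def recall_at_k(ranking: List[str], relevant: Set[str], k: int = 5) -> float:
    """Fraction of all relevant documents that appear in the top k."""
    if not relevant:
        return 0.0  # convention assumed here: no relevant documents at all
    hits = sum(1 for doc_id in ranking[:k] if doc_id in relevant)
    return hits / len(relevant)


# Hypothetical ranking and relevance judgements for one query.
ranking = ["d7", "d2", "d9", "d4", "d1", "d3"]
relevant = {"d2", "d3", "d4", "d8"}

p5 = precision_at_k(ranking, relevant)  # 2 relevant in the top 5 -> 2/5
r5 = recall_at_k(ranking, relevant)     # 2 of 4 relevant found   -> 2/4
```

Because documents outside the pool count as non-relevant, the `relevant` set for a query contains only pooled documents that were judged relevant.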

Tasks in summary: Using the dataset from assignment 1, decide on 3 pieces of information you want to learn from the dataset. Use your original IR system from assignment 1 and a modified version to retrieve the answers from the dataset. You will then create a pool and assess the relevance of the documents in the pool for each of the queries. Finally, you will compare both systems in terms of P@5 and R@5.

You will have noticed that the percentages above only add up to 90%. This is because one of the important aspects of the project is that your work should be well documented. The remaining 10% of your mark will come from this. The report should contain:

  • Design and design decisions/justifications of your overall architecture
  • The actual ground truth data that make up your test collection (i.e. queries with their matching documents)
  • Evaluation results
  • Discussion of your solution focusing on the comparison of both systems.

The report does not need to be long as long as it addresses all the above points.

Software

The backend search engine to be used is Elasticsearch. Apart from that you are free to write additional code in any language of your choice and employ any open-source tool that you find suitable.
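As an illustration of how the top 10 results for a query might be requested from Elasticsearch, the sketch below builds the JSON body of a search request using the standard `size` parameter and a `match` query. It only constructs the body and does not contact a server. The helper name `top_k_request` and the field name `Plot` are assumptions made for this example; use whatever field your own index mapping defines.

```python
from typing import Any, Dict


def top_k_request(query_text: str, field: str = "Plot", k: int = 10) -> Dict[str, Any]:
    """Build a search request body that asks for the k best matches.

    `field` is a placeholder; replace it with a text field from your index.
    """
    return {
        "size": k,  # number of hits to return
        "query": {"match": {field: query_text}},
    }


# Hypothetical query text for one information need.
body = top_k_request("detective investigates a murder on a train")
```

The same body can be sent to both systems (for example, two indices built with different analysis settings), so that the only difference between the two result lists is the setting being compared.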

Submission

You should submit:

  • Report (use the template below)

The report should be submitted as a single PDF file via the electronic submission system. Please check the details of the submission deadline with the CSEE School Office.

The guidelines about late assignments are explained in the students’ handbook.

CE306 or CE706 – Information Retrieval 2021

Assignment 2

Student ID

Test collection (Task 1)

Include here the selected information needs and how they will be represented as a query.

Information need | Query
  
  
  

IR systems (Task 2)

Include here the details of your two IR systems and the difference between them.

Pool method (Task 3)

For each system, retrieve the top 10 documents. For each query, you will therefore have a maximum of 20 documents in the pool.

Query | # different documents | IDs of the documents retrieved by System 1 | IDs of the documents retrieved by System 2

Relevance assessments (Task 4)

To be consistent across all the queries, you need to define criteria for judging whether a document is relevant to an information need. The same criteria should be used for all the queries. Note that merely containing the same words as the query is not a valid criterion.

Relevance criteria:

Query | IDs of relevant documents

Evaluation (Task 5)

Include here the details of how you did this step, including any issues that you had and how you dealt with them. You may include screenshots to clarify.

Query | System 1 P@5 | System 1 R@5 | System 2 P@5 | System 2 R@5
Q1 | | | |
Q2 | | | |
Q3 | | | |

Discussion: Include the discussion of your solution, focusing on the comparison of both systems.
