Doctoral Dissertations

Neural Approaches for Language-Agnostic Search and Recommendation

Hamed Rezanejad Asl Bonab, University of Massachusetts Amherst

Author ORCID Identifier

https://orcid.org/0000-0003-2811-706X

Access Type

Open Access Dissertation

Document Type

dissertation

Degree Name

Doctor of Philosophy (PhD)

Degree Program

Computer Science

Year Degree Awarded

2022

Month Degree Awarded

September

First Advisor

James Allan

Second Advisor

W. Bruce Croft

Third Advisor

Ramesh Sitaraman

Fourth Advisor

Evangelos Kanoulas

Subject Categories

Artificial Intelligence and Robotics | Databases and Information Systems

Abstract

There are significant efforts toward developing better neural approaches for information retrieval problems. However, the vast majority of these studies are conducted using English-only data. Trends and statistics of non-English content and users on the Internet show exponential growth and indicate that novel information retrieval systems need to be language-agnostic: they need to bridge the language barrier between users and content, leverage data from high-resource settings for lower-resourced settings, and extend easily to new languages and local markets. To this end, we focus on search and recommendation as two vital components of information systems. We explore some of the complex cross-lingual issues to help develop an understanding of the challenges that someone designing a neural Cross-Lingual Information Retrieval (CLIR) system will need to address. We first introduce a contrastive analysis framework, named Resource Scarcity Simulation (RSS), for simulating low-resource settings using higher-resourced ones. For this, we start with a true low-resource language and systematically down-sample a high-resource language's data so that it becomes an artificial low-resource language that is statistically similar to the true low-resource one. Given that obtaining extra resources in low-resource settings is extremely expensive, one could use our simulation framework to study different possible solutions in the artificially created low-resource setting and extend the findings to the real low-resource problem. We focus on parallel translation corpora and aim to better understand the factors impacting the performance of CLIR systems. We then focus on neural CLIR approaches by bridging the language gap. We show that these models perform sub-optimally because typical Cross-Lingual Embeddings (CLE) "translate" query terms into related terms (i.e., terms that appear in a similar context) rather than synonyms in the target language.
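The down-sampling idea behind RSS can be illustrated with a minimal sketch. It is not the dissertation's procedure: the abstract does not say which statistics are matched, so the choice here (number of sentence pairs and a coarse histogram of source-sentence lengths) is an assumption, and the names `length_bucket` and `downsample_to_match` are hypothetical.

```python
import random
from collections import Counter, defaultdict


def length_bucket(sentence, width=5):
    """Map a sentence to a coarse length bucket (an assumed statistic)."""
    return len(sentence.split()) // width


def downsample_to_match(high_pairs, low_pairs, seed=0, width=5):
    """Sample sentence pairs from a high-resource parallel corpus so that
    the sample's size and source-length histogram approximate those of a
    true low-resource corpus.

    Both corpora are lists of (source_sentence, target_sentence) tuples.
    """
    rng = random.Random(seed)

    # Target histogram: how many low-resource pairs fall in each bucket.
    target = Counter(length_bucket(src, width) for src, _ in low_pairs)

    # Group the high-resource pairs by the same bucketing.
    by_bucket = defaultdict(list)
    for pair in high_pairs:
        by_bucket[length_bucket(pair[0], width)].append(pair)

    sample = []
    for bucket, count in sorted(target.items()):
        pool = by_bucket.get(bucket, [])
        # Take as many pairs as the target asks for, or the whole pool
        # if the high-resource corpus has too few in this bucket.
        sample.extend(rng.sample(pool, min(count, len(pool))))
    return sample
```

A CLIR pipeline could then be trained on the returned sample in place of the full high-resource corpus, and its behavior compared against the same pipeline trained on the true low-resource data.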
We introduce Smart Shuffling CLE, which focuses on distinguishing synonyms from related terms during the training of the embedding, using a dictionary to guide the re-ordering of tokens in a pair of mutually translated sentences. We further show that our CLE method significantly boosts the performance of an off-the-shelf neural re-ranking model as well as a simple word-by-word query translation CLIR system. We follow up on this work by injecting the dictionary knowledge into the self-attention part of a pre-trained BERT-based ranking model and show a significant improvement in retrieval performance. Finally, we go beyond CLIR and study language-agnostic search and recommendation in the e-commerce domain. Due to a lack of experimental data in this area, we first collect and release XMarket, a large dataset covering 18 local e-commerce markets in 11 different languages. We focus on the market adaptation problem and, using XMarket, first study the problem of recommending relevant products to users in relatively resource-scarce markets by leveraging data from similar, resource-richer auxiliary markets. Then, we extend our findings toward a universal language-agnostic recommendation system by utilizing multilingual content from multiple markets. Lastly, we construct a product search benchmark using our XMarket dataset and study language-agnostic product search performance across markets for single- and cross-market training scenarios. Our experiments suggest that training universal language-agnostic retrieval systems is challenging and that training a model with data from multiple markets does not always help the overall performance. Our proposed language-agnostic universal recommendation model, named FOREC-XCB, demonstrates robust effectiveness by leveraging data from multiple markets and languages and improves the performance for each target market when compared to strong baselines.
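The dictionary-guided re-ordering mentioned for Smart Shuffling CLE can be pictured with a small sketch. This is an illustration under assumptions rather than the dissertation's algorithm: the function name `smart_shuffle`, the rule of placing a translation directly after its source token, and appending unmatched target tokens at the end are all hypothetical choices.

```python
def smart_shuffle(src_tokens, tgt_tokens, dictionary):
    """Merge a pair of mutually translated sentences into one
    mixed-language token sequence.

    `dictionary` maps a source token to a set of target-language
    translations. A target token that the dictionary lists as a
    translation is placed right after its source token, so that
    translation pairs share a context window.
    """
    remaining = list(tgt_tokens)
    merged = []
    for tok in src_tokens:
        merged.append(tok)
        translations = dictionary.get(tok, set())
        # First not-yet-used target token that translates `tok`, if any.
        match = next((t for t in remaining if t in translations), None)
        if match is not None:
            merged.append(match)
            remaining.remove(match)
    # Assumption: unmatched target tokens keep their order at the end.
    merged.extend(remaining)
    return merged


pair = smart_shuffle(
    ["the", "black", "cat"],
    ["le", "chat", "noir"],
    {"black": {"noir"}, "cat": {"chat"}},
)
print(pair)  # ['the', 'black', 'noir', 'cat', 'chat', 'le']
```

Sequences built this way could be fed to a context-window embedding trainer, where a term and its dictionary translation co-occur more tightly than a term and a merely related word.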

DOI

https://doi.org/10.7275/30977302

Recommended Citation

Rezanejad Asl Bonab, Hamed, "Neural Approaches for Language-Agnostic Search and Recommendation" (2022). Doctoral Dissertations. 2716.
https://doi.org/10.7275/30977302 https://scholarworks.umass.edu/dissertations_2/2716

Creative Commons License

This work is licensed under a Creative Commons Attribution 4.0 License.

Included in

Artificial Intelligence and Robotics Commons, Databases and Information Systems Commons
