The objective of this project is to train a Natural Language Processing (NLP) model to generate text guided by sparse rewards produced within a Deep Reinforcement Learning (DRL) framework. In particular, a Transformer-based Natural Language Generation (NLG) model (e.g., GPT-2) will be used to generate text. At the end of each sentence, another Transformer-based model fine-tuned on a specific task (e.g., RoBERTa on Sentiment Analysis) will evaluate whether the goal has been accomplished (e.g., whether the NLG model has produced a positive comment). In this pipeline, the reward or penalty assigned by the latter model will be backpropagated to the weights of the NLG model through a DRL algorithm, such as Proximal Policy Optimization (PPO). This approach is particularly useful for augmenting textual data in tasks with few annotated examples, or for goal-oriented chatbots that must accomplish an objective, such as booking a restaurant.
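The pipeline above can be sketched with a toy, self-contained example: a tiny softmax policy stands in for the NLG model, a keyword lookup stands in for the fine-tuned sentiment classifier, and a PPO-style clipped policy-gradient update propagates the sparse, sentence-level reward back into the policy's weights. All names, vocabularies, and hyperparameters here are illustrative assumptions, not part of the project itself.

```python
import math
import random

random.seed(0)

# Toy stand-ins: VOCAB and POSITIVE replace GPT-2 and the RoBERTa
# sentiment classifier; both are illustrative assumptions.
VOCAB = ["great", "awful", "nice", "terrible", "good", "bad"]
POSITIVE = {"great", "nice", "good"}

# A one-step "policy": a single softmax distribution over the vocabulary.
logits = [0.0] * len(VOCAB)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def sample_sentence(length=5):
    """Sample a 'sentence' of token indices from the current policy."""
    probs = softmax(logits)
    idxs = random.choices(range(len(VOCAB)), weights=probs, k=length)
    return idxs, probs

def sentence_reward(idxs):
    """Sparse reward given only at sentence end: fraction of positive
    words, standing in for a sentiment classifier's score."""
    return sum(VOCAB[i] in POSITIVE for i in idxs) / len(idxs)

def ppo_update(idxs, old_probs, advantage, epochs=4, lr=0.3, eps=0.2):
    """PPO-style clipped policy-gradient update applied to the logits."""
    for _ in range(epochs):
        probs = softmax(logits)
        grad = [0.0] * len(VOCAB)
        for i in idxs:
            ratio = probs[i] / old_probs[i]
            # Clipped surrogate objective: the gradient vanishes once the
            # probability ratio leaves the trust region [1-eps, 1+eps].
            if (advantage > 0 and ratio > 1 + eps) or \
               (advantage < 0 and ratio < 1 - eps):
                continue
            coef = advantage * ratio  # d(ratio*A)/d(log pi) = A * ratio
            # d(log pi_i)/d(logit_j) = 1{i==j} - pi_j for a softmax policy.
            for j in range(len(VOCAB)):
                grad[j] += coef * ((1.0 if j == i else 0.0) - probs[j])
        for j in range(len(VOCAB)):
            logits[j] += lr * grad[j]

for step in range(300):
    idxs, old_probs = sample_sentence()
    reward = sentence_reward(idxs)
    # Baseline 0.5 = expected reward of the initial uniform policy.
    ppo_update(idxs, old_probs, advantage=reward - 0.5)

probs = softmax(logits)
positive_mass = sum(p for w, p in zip(VOCAB, probs) if w in POSITIVE)
print(f"probability mass on positive words: {positive_mass:.2f}")
```

In the actual project, the softmax policy would be replaced by GPT-2's autoregressive decoder, the keyword reward by a RoBERTa sentiment head, and this hand-rolled update by a full PPO implementation over per-token log-probabilities.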

Project Details:

  • Funding Program: SPECIAL ACCOUNT FOR RESEARCH FUNDS-UNIVERSITY OF WEST ATTICA
  • Project ID:
  • Start Date: February 1, 2021 – End Date: January 31, 2022 
  • Funding: € 6.555
