
LABR: Large Scale Arabic Book Reviews

Description

This is a set of over 63,000 Arabic book reviews. It is the largest sentiment analysis dataset in Arabic to date. The dataset was downloaded from www.goodreads.com during March 2013. The package contains the cleaned-up reviews, together with a Python utility class that provides an easy interface for loading the standard training and test sets. More information is available in the references below and in the README file.

Download

  • LABR v2.0 [11.6 MB] or browse and download the code and data from GitHub.
    • Includes standard splits of the data into training, validation, and testing, as well as scripts to reproduce the basic experiments described in [2].
    • Contains splits into three sentiment polarities (positive, negative, and neutral) instead of just the two classes in version 1.

  • LABR v1.0 [8.5 MB] or browse and download the code and data from GitHub.
    • Includes standard splits of the data into training, validation, and testing, as well as scripts to reproduce the basic experiments described in [1].
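
As a rough illustration of the two-class versus three-class setup mentioned above, the sketch below maps star ratings to sentiment labels and counts them. It is not the utility class shipped in the package. The column order (rating, review id, user id, book id, review text), the rating thresholds (4-5 positive, 1-2 negative, 3 neutral), and the function names are all assumptions made for this example; check the README for the actual file format and class interface.

```python
import csv
import io
from collections import Counter


def rating_to_polarity(rating, three_way=True):
    """Map a 1-5 star rating to a sentiment label (assumed thresholds)."""
    if rating >= 4:
        return "positive"
    if rating <= 2:
        return "negative"
    # Rating 3: neutral in the three-class setting, dropped in the two-class one.
    return "neutral" if three_way else None


def read_reviews(handle):
    """Yield (rating, text) pairs from a tab-separated stream.

    Assumed column order: rating, review id, user id, book id, review text.
    """
    reader = csv.reader(handle, delimiter="\t", quoting=csv.QUOTE_NONE)
    for row in reader:
        if len(row) < 5:
            continue  # skip malformed lines
        yield int(row[0]), row[4]


def polarity_counts(handle, three_way=True):
    """Count reviews per sentiment label."""
    counts = Counter()
    for rating, _text in read_reviews(handle):
        label = rating_to_polarity(rating, three_way)
        if label is not None:
            counts[label] += 1
    return counts


# Tiny made-up sample standing in for the real reviews file.
SAMPLE = (
    "5\t1\t10\t100\tكتاب رائع\n"
    "3\t2\t11\t100\tكتاب عادي\n"
    "1\t3\t12\t101\tكتاب سيء\n"
)

if __name__ == "__main__":
    print(polarity_counts(io.StringIO(SAMPLE)))
    print(polarity_counts(io.StringIO(SAMPLE), three_way=False))
```

To try this on the real data, replace `io.StringIO(SAMPLE)` with an open file handle (UTF-8 encoded) for the reviews file from the package, after confirming its layout against the README.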

Acknowledgements

  • This work was done jointly with Mahmoud Nabil and Amir Atiya.
  • Work on LABR v2.0 and the experiments described in [2] were performed by Mahmoud Nabil.

References

  1. Mohamed Aly and Amir Atiya. LABR: Large Scale Arabic Book Reviews Dataset. Annual Meeting of the Association for Computational Linguistics (ACL), Sofia, Bulgaria, August 2013. [pdf]
  2. Mahmoud Nabil, Mohamed Aly, and Amir Atiya. LABR 2.0: Large Scale Arabic Sentiment Analysis Benchmark. arXiv e-print (arXiv:1411.6718), 2014. [pdf]

Attachments

  • labr-1.0.zip (8584k), Mohamed Aly, Aug 3, 2013, 9:14 AM
  • labr-2.0.zip (11673k), Mohamed Aly, Mar 10, 2015, 11:09 PM