myCLASS

myCLASS is an automatic document classifier.

History

The first version of this project dates from 2003 (doc2003). Since then, it has been regularly improved to handle increasingly large training corpora (30 million documents) and classifications (70,000 classes). It has been used mainly for patent classification by WIPO (see springer2011). myCLASS has also competed in the CLEF conferences (clefguyot2010) (clefpiroi2010). In recent years, work has focused on performance and on the quality of training. Machine translation opens new perspectives, since it makes it possible to maintain a single classifier in a pivot language (see (wipo2017) and (wipo2018)). The classifier has also been used on news corpora from Reuters and Dow Jones. Our experiments and comparisons indicate that neural networks are efficient and sufficient for text classification when the texts do not contain “hidden” information, that is, information carried by the style of the text (irony, cynicism, emphasis, …). When such hidden information is present, “Deep Learning” is often more appropriate (see (ai2015)).

Project

This project aims to:

  • Facilitate the installation of the classification tools.
  • Create a classification environment for training and testing.
  • Introduce students to classification technology.
  • Provide a ready-to-use solution in the field of patent classification.

Who can use these tools?

  • Students who wish to extend their knowledge of classification tools
  • Teachers who wish to experiment with this kind of classifier
  • Independent professionals who want to explore the possibilities of these tools
  • Organizations that wish to explore and implement automatic classification in their processes

How to contribute to this project?

  • By installing and using these tools
  • By adding information to this wiki to facilitate the use of these tools
  • By translating these pages into other languages
  • By proposing new features or new tools for the basic installation
  • By adding new resources (test corpora) on GitHub under your account

How to start?