Capturing programming content in online discussions

Mahdy Khayyamian, Jihie Kim

Research output: Chapter in Book/Report/Conference proceeding › Conference contribution › peer-review

Abstract

In this paper, we introduce a new problem: automatically capturing programming content in online discussions. We expect that solving this problem will help improve the visual presentation of programming forum content, the qualitative analysis of forum contributions, and the preprocessing and normalization of forum text. We map this problem to a sequence learning problem and solve it with Conditional Random Fields (CRF). We compare this approach with a word-feature-based baseline and a non-sequence classification method (Naïve Bayes). The CRF method produces the best results, with an F1-score of 86.9%. Moreover, we demonstrate that the CRF classifier maintains good accuracy across domains: a model learned from a C++ forum performs almost as well on Java and Python forums. To demonstrate how the captured information can be used, we provide an example of user profiling based on programming content. In particular, we correlate the percentage of programming content in students' answers with their course performance.
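The abstract frames detection as sequence labeling with a CRF but does not state the labeling unit, the features, or the learned weights. The following Python sketch is therefore illustrative only. It assumes line-level labels (the hypothetical tags TEXT and CODE), a few hypothetical surface features, and hand-set rather than trained weights. It shows only the decoding step of a linear-chain CRF: a Viterbi search over summed emission and transition scores. It also computes an F1-score on the CODE label and the fraction of lines labeled CODE, the kind of per-answer percentage the abstract uses for user profiling.

```python
import re

# Hypothetical label set: the abstract does not state the paper's tag inventory.
LABELS = ("TEXT", "CODE")

# Hypothetical per-line features (not the paper's feature set).
KEYWORDS = re.compile(r"\b(int|void|return|include|class|def|for|while)\b")
SYMBOLS = set("{}();=<>[]*&#")


def line_features(line: str) -> dict[str, float]:
    """Map one forum line to a small dictionary of numeric features."""
    stripped = line.strip()
    n = max(len(stripped), 1)
    return {
        "bias": 1.0,
        "symbol_ratio": sum(ch in SYMBOLS for ch in stripped) / n,
        "ends_code_like": float(stripped.endswith((";", "{", "}"))),
        "has_keyword": float(bool(KEYWORDS.search(stripped))),
        "indented": float(line.startswith(("    ", "\t"))),
    }


# Hand-set illustrative weights; a real CRF would learn these from labeled posts.
EMIT = {
    "TEXT": {"bias": 1.0, "symbol_ratio": -4.0, "ends_code_like": -1.5,
             "has_keyword": -1.0, "indented": -1.0},
    "CODE": {"bias": -1.0, "symbol_ratio": 6.0, "ends_code_like": 2.0,
             "has_keyword": 1.5, "indented": 1.5},
}
# Transition scores reward staying in the same state (contiguous code blocks).
TRANS = {("TEXT", "TEXT"): 0.5, ("TEXT", "CODE"): -0.5,
         ("CODE", "CODE"): 1.0, ("CODE", "TEXT"): -0.5}


def emission(label: str, feats: dict[str, float]) -> float:
    """Linear score w_label . f(x_t) for one line."""
    return sum(EMIT[label].get(name, 0.0) * value for name, value in feats.items())


def viterbi(lines: list[str]) -> list[str]:
    """Highest-scoring label sequence under the linear-chain model."""
    if not lines:
        return []
    feats = [line_features(line) for line in lines]
    best = [{y: emission(y, feats[0]) for y in LABELS}]
    back: list[dict[str, str]] = [{}]
    for t in range(1, len(lines)):
        best.append({})
        back.append({})
        for y in LABELS:
            # Best predecessor label for y at position t.
            prev, score = max(
                ((p, best[t - 1][p] + TRANS[(p, y)]) for p in LABELS),
                key=lambda item: item[1],
            )
            best[t][y] = score + emission(y, feats[t])
            back[t][y] = prev
    # Trace back from the best final label.
    y = max(LABELS, key=lambda label: best[-1][label])
    path = [y]
    for t in range(len(lines) - 1, 0, -1):
        y = back[t][y]
        path.append(y)
    return path[::-1]


def f1(gold: list[str], pred: list[str], positive: str = "CODE") -> float:
    """F1-score for the positive label over aligned gold/predicted sequences."""
    tp = sum(g == positive and p == positive for g, p in zip(gold, pred))
    fp = sum(g != positive and p == positive for g, p in zip(gold, pred))
    fn = sum(g == positive and p != positive for g, p in zip(gold, pred))
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return 2 * precision * recall / (precision + recall) if precision + recall else 0.0


def code_fraction(labels: list[str]) -> float:
    """Share of lines labeled CODE, e.g. for profiling one student's answers."""
    return labels.count("CODE") / len(labels) if labels else 0.0


if __name__ == "__main__":
    post = [
        "I get a compile error with this loop:",
        "for (int i = 0; i < n; i++) {",
        "    sum += a[i];",
        "}",
        "Any idea what is wrong?",
    ]
    labels = viterbi(post)
    print(labels)  # ['TEXT', 'CODE', 'CODE', 'CODE', 'TEXT']
    print(code_fraction(labels))  # 0.6
```

Dropping the TRANS terms turns this into an independent per-line classifier, the non-sequence setting that Naïve Bayes represents in the paper's comparison. The transition scores are what let a sequence model favor contiguous runs of code lines. Learning the weights (for example, by maximizing conditional log-likelihood on labeled posts) is omitted here.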

Original language: English
Title of host publication: Proceedings of the 7th International Conference on Knowledge Capture
Subtitle of host publication: "Knowledge Capture in the Age of Massive Web Data", K-CAP 2013
State: Published - 2013
Event: 7th International Conference on Knowledge Capture: "Knowledge Capture in the Age of Massive Web Data", K-CAP 2013 - Banff, AB, Canada
Duration: 23 Jun 2013 to 26 Jun 2013

Publication series

Name: Proceedings of the 7th International Conference on Knowledge Capture: "Knowledge Capture in the Age of Massive Web Data", K-CAP 2013

Conference

Conference: 7th International Conference on Knowledge Capture: "Knowledge Capture in the Age of Massive Web Data", K-CAP 2013
Country/Territory: Canada
City: Banff, AB
Period: 23/06/13 to 26/06/13

Keywords

  • Identifying programming content
  • Online forum analysis
  • Sequence learning
  • Text classification
