TY - GEN
T1 - Capturing programming content in online discussions
AU - Khayyamian, Mahdy
AU - Kim, Jihie
PY - 2013
Y1 - 2013
N2 - In this paper, we introduce a new problem: automatically capturing programming content in online discussions. We expect that solving this problem will help enhance the visual presentation of programming forum content, the qualitative analysis of forum contributions, and forum text preprocessing and normalization. We map this problem to a sequence learning problem and use Conditional Random Fields (CRF) to solve it. We compare its performance with a word-feature-based baseline and a non-sequence classification method (Naïve Bayes). The best results are produced by the CRF method, with an F1 score of 86.9%. Moreover, we demonstrate that the CRF classifier maintains good accuracy across different domains: a model learned from a C++ forum performs almost as well on forums for other programming languages (Java and Python). As a demonstration of how the captured information can be used, we provide an example of user profiling with programming content. In particular, we correlate the percentage of programming content in a student's answers with the student's course performance.
AB - In this paper, we introduce a new problem: automatically capturing programming content in online discussions. We expect that solving this problem will help enhance the visual presentation of programming forum content, the qualitative analysis of forum contributions, and forum text preprocessing and normalization. We map this problem to a sequence learning problem and use Conditional Random Fields (CRF) to solve it. We compare its performance with a word-feature-based baseline and a non-sequence classification method (Naïve Bayes). The best results are produced by the CRF method, with an F1 score of 86.9%. Moreover, we demonstrate that the CRF classifier maintains good accuracy across different domains: a model learned from a C++ forum performs almost as well on forums for other programming languages (Java and Python). As a demonstration of how the captured information can be used, we provide an example of user profiling with programming content. In particular, we correlate the percentage of programming content in a student's answers with the student's course performance.
KW - Identifying programming content
KW - Online forum analysis
KW - Sequence learning
KW - Text classification
UR - http://www.scopus.com/inward/record.url?scp=84883127349&partnerID=8YFLogxK
U2 - 10.1145/2479832.2479843
DO - 10.1145/2479832.2479843
M3 - Conference contribution
AN - SCOPUS:84883127349
SN - 9781450321020
T3 - Proceedings of the 7th International Conference on Knowledge Capture: "Knowledge Capture in the Age of Massive Web Data", K-CAP 2013
BT - Proceedings of the 7th International Conference on Knowledge Capture
T2 - 7th International Conference on Knowledge Capture: "Knowledge Capture in the Age of Massive Web Data", K-CAP 2013
Y2 - 23 June 2013 through 26 June 2013
ER -