A data type inference method based on long short-term memory by improved feature for weakness analysis in binary code

Research output: Contribution to journal › Article › peer-review

7 Scopus citations

Abstract

As software is now used in many areas, software security has become a crucial issue. Third-party libraries, which play a major role in software development, make it difficult to analyze and test software security. Identifying the major weaknesses in software requires knowing the variables it uses and the data type of each variable. However, because third-party libraries are generally distributed as binary code, the variables, their data types, and the program syntax and semantic information present in the source code have been removed. Reconstructing the variables and their data types from binary code is therefore the most important step in weakness analysis. Traditionally, this reconstruction is based on pattern matching, which is limited in its ability to infer data types. We propose a method that uses deep learning to infer the data types of variables identified by pattern matching in binary code, and we analyze its performance. The method improves feature generation to resolve the inconsistencies in the features generated in previous studies. As a result, prediction accuracy for float and double improves by 7.2% on average compared with the previous study, and overall accuracy increases by 5.1%.
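
The abstract does not describe the feature format, so the following is only an illustrative sketch of what "consistent" feature generation could look like, not the authors' method. It assumes x86 disassembly in Intel syntax and uses hypothetical names (`normalize_instruction`, `build_feature`, and the `VAR` and `IMM` markers): concrete stack offsets and constants are replaced with generic markers, so that the same access pattern yields the same token sequence wherever the variable happens to live on the stack. Such a sequence is the kind of input a sequence model like a long short-term memory network could then classify.

```python
import re
from typing import Iterable, List

# Hypothetical normalization rules; the abstract does not specify them.
_STACK_REF = re.compile(r"\[(?:rbp|ebp|rsp|esp)[+-]0x[0-9a-f]+\]")
_IMMEDIATE = re.compile(r"\b0x[0-9a-f]+\b|\b\d+\b")


def normalize_instruction(instr: str) -> List[str]:
    """Turn one disassembled instruction into location-independent tokens."""
    text = instr.lower().strip()
    text = _STACK_REF.sub("VAR", text)   # concrete stack slot -> generic marker
    text = _IMMEDIATE.sub("IMM", text)   # concrete constant -> generic marker
    return [tok for tok in re.split(r"[\s,]+", text) if tok]


def build_feature(instructions: Iterable[str]) -> List[str]:
    """Concatenate the tokens of every instruction that touches one variable."""
    tokens: List[str] = []
    for instr in instructions:
        tokens.extend(normalize_instruction(instr))
    return tokens


# Two accesses that differ only in stack offset map to identical features.
a = build_feature(["movss xmm0, DWORD PTR [rbp-0x14]"])
b = build_feature(["movss xmm0, DWORD PTR [rbp-0x24]"])
```

In this sketch the operand-size hint (`dword ptr`) and the instruction mnemonic (`movss`) are kept as tokens, on the assumption that they carry the signal that distinguishes, for example, float from int.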

Original language: English
Pages (from-to): 1044-1052
Number of pages: 9
Journal: Future Generation Computer Systems
Volume: 100
State: Published - Nov 2019

Keywords

  • Binary code
  • Data type inference
  • Long short-term memory
  • Reconstruction data information
  • Software weakness
