LLM4Decompile: Decompiling Binary Code with Large Language Models

8 Mar 2024 · Hanzhuo Tan, Qi Luo, Jing Li, Yuqun Zhang

Decompilation aims to convert binary code back into high-level source code, but traditional tools such as Ghidra often produce output that is difficult to read and execute. Motivated by advances in Large Language Models (LLMs), we propose LLM4Decompile, the first and largest open-source #LLM series (1.3B to 33B parameters) trained to decompile binary code. We optimize the LLM training process and introduce the LLM4Decompile-End models, which decompile binaries directly. The resulting models significantly outperform GPT-4o and Ghidra on the HumanEval and ExeBench benchmarks, improving the re-executability rate by over 100%. Additionally, we improve the standard refinement approach to fine-tune the LLM4Decompile-Ref models, which effectively refine Ghidra's decompiled output and achieve a further 16.2% improvement over LLM4Decompile-End. LLM4Decompile demonstrates the potential of LLMs to revolutionize binary decompilation, delivering substantial gains in readability and executability while complementing conventional tools.

Paper: https://arxiv.org/pdf/2403.05286v3.pdf

Code: https://github.com/albertan017/LLM4Decompile
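
For context, here is a minimal sketch of the End-model workflow described above: compile a small C function, disassemble it with objdump, and ask an LLM4Decompile-End checkpoint to reconstruct the source through the standard Hugging Face transformers API. The checkpoint name and prompt template are assumptions drawn from the project README, not verbatim from the paper; check the repo for the exact format of each release.

# Hedged sketch: decompile one function with an LLM4Decompile-End checkpoint.
# Model id and prompt template are assumptions; see the repo README for the
# exact format expected by each released model.
import subprocess
from transformers import AutoTokenizer, AutoModelForCausalLM

# 1) Compile a toy C function and disassemble the object file.
with open("func.c", "w") as f:
    f.write("int add(int a, int b) { return a + b; }\n")
subprocess.run(["gcc", "-c", "-O0", "func.c", "-o", "func.o"], check=True)
asm = subprocess.run(["objdump", "-d", "func.o"],
                     capture_output=True, text=True, check=True).stdout
# (The repo additionally strips addresses/labels from the disassembly;
# omitted here for brevity.)

# 2) Prompt the model with the assembly and generate C source.
model_id = "LLM4Binary/llm4decompile-1.3b-v1.5"  # assumed checkpoint name
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = ("# This is the assembly code:\n" + asm +
          "\n# What is the source code?\n")  # template assumed from README
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=512)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:],
                       skip_special_tokens=True))

The Ref models follow the same pattern, except the input is Ghidra's decompiled pseudo-code (exported, for example, via Ghidra's headless analyzer) rather than raw assembly.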



@Machine_learn
