Black Hat 2020: xGitGuard uses AI to detect inadvertently exposed data on GitHub

GitHub is widely praised as a platform where developers can share open source code and the tools they build.

However, developers sometimes inadvertently leave sensitive information, such as API tokens and user credentials, in their code when they post it on GitHub.

Mistakes of this kind can expose an organization’s internal secrets and tokens to harvesting and potential misuse.

Security researchers at Comcast have developed a tool that detects organizations’ secrets and user credentials in cases where they inadvertently spill onto GitHub. The tool, called xGitGuard, is designed to be both scalable and rapid.

The tool was demonstrated during an Arsenal session at the Black Hat 2020 virtual conference on Thursday (August 6).

“xGitGuard takes advantage of a new text processing algorithm that can find secrets within files with a high level of accuracy,” according to Comcast. “This can significantly help operations to take proper actions in [a] timely manner.”

Bahman Rashidi, senior cybersecurity researcher and system architect at Comcast, told The Daily Swig that machine learning techniques make the tool lean and fast at picking out stray secrets in code repositories.

“xGitGuard is an AI-based tool designed to detect if and when private information is improperly posted on GitHub (such as passwords or credentials),” Rashidi explained.

“Comcast technologists developed the tool to address a challenge common in software development environments where large numbers of developers are using GitHub on a daily basis.”

“The AI technology powering the tool works fast, even at enterprise scale,” he added.

The tool takes advantage of a new algorithm that can find “private information located within files and the context around them with a high level of accuracy”, according to Comcast.
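Comcast has not published the details of that algorithm here, so the following is only a minimal Python sketch of the general idea of pairing a candidate secret with the text around it. The regular expression, the `Candidate` type, and the `find_candidates` function are hypothetical names chosen for illustration, not part of xGitGuard.

```python
import re
from dataclasses import dataclass

# Hypothetical pattern: a name containing key/token/secret/password,
# assigned a quoted value, e.g.  api_key = "..."  or  db_password: '...'
CANDIDATE = re.compile(
    r"""(?P<name>\w*(?:key|token|secret|passw(?:or)?d)\w*)\s*[:=]\s*["'](?P<value>[^"']+)["']""",
    re.IGNORECASE,
)


@dataclass
class Candidate:
    line_no: int        # 1-based line number of the match
    name: str           # variable or key name that matched
    value: str          # the quoted value (the possible secret)
    context: list[str]  # nearby lines, kept for a later classifier


def find_candidates(text: str, window: int = 2) -> list[Candidate]:
    """Return every regex match together with `window` lines on each side."""
    lines = text.splitlines()
    found = []
    for i, line in enumerate(lines):
        for m in CANDIDATE.finditer(line):
            lo, hi = max(0, i - window), min(len(lines), i + window + 1)
            found.append(Candidate(i + 1, m.group("name"), m.group("value"), lines[lo:hi]))
    return found


sample = 'host = "db.internal"\napi_key = "AKxyz123"\ntimeout = 30\n'
for c in find_candidates(sample):
    print(c.line_no, c.name, c.value, c.context)
```

Because the name pattern is deliberately loose (it would also match a word such as `monkey`), a step like this yields many candidates; the surrounding lines are what a downstream classifier could use to separate real credentials from noise.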

Spilling secrets
Inadvertently exposing secrets through GitHub and similar code-hosting platforms is a known problem in infosec. Various tools to detect such leaks automatically have been developed before, but Comcast says xGitGuard overcomes their limitations.

Comcast says xGitGuard produces fewer false positives than earlier tools because of its highly efficient and accurate text processing algorithms.
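xGitGuard's own filtering is described as AI-based. The sketch below swaps in a much simpler, well-known heuristic, Shannon entropy, purely to show how a filter can discard obvious placeholders before a candidate is reported. The placeholder list and the `min_len` and `min_entropy` thresholds are hypothetical values, not Comcast's settings.

```python
import math
from collections import Counter

# Hypothetical placeholder values often seen in sample or template code.
PLACEHOLDERS = {"changeme", "password", "your_api_key", "xxxxxxxx", "todo"}


def shannon_entropy(s: str) -> float:
    """Bits per character of s, estimated from s's own character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum(c / n * math.log2(c / n) for c in Counter(s).values())


def looks_like_real_secret(value: str, min_len: int = 12, min_entropy: float = 3.0) -> bool:
    """Heuristic filter: reject short values, known placeholders, and low-entropy strings.

    min_len and min_entropy are illustrative thresholds, not tuned values.
    """
    v = value.strip()
    if len(v) < min_len or v.lower() in PLACEHOLDERS:
        return False
    return shannon_entropy(v) >= min_entropy


print(looks_like_real_secret("your_api_key"))      # False: known placeholder
print(looks_like_real_secret("q8Zr2LmX0vTa9PkW"))  # True: 16 distinct characters, 4.0 bits each
```

A fixed entropy threshold is cheap but blunt: it misses low-entropy real passwords and flags random-looking test fixtures, which is the kind of gap a learned, context-aware model is meant to close.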

Rashidi explained: “The key to xGitGuard’s effectiveness is the way queries are generated.

“GitHub contains more than 30 million repositories and with API limitations, it is challenging to find secrets in a timely manner. xGitGuard introduces a new search approach that can utilize the existing GitHub search API and review search results at scale.”
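The article does not describe how xGitGuard generates its queries. As an assumption-laden illustration only, one way to work within search API limits is to combine organization-specific terms with secret-related keywords and pace the resulting requests. The sketch below only builds request URLs for GitHub's REST code search endpoint (`/search/code`); authentication, response handling, and the right `per_minute` value for a given account's rate limit are left to the caller, and `build_queries` and `paced` are hypothetical helper names.

```python
import itertools
import time
from urllib.parse import urlencode

# GitHub REST code-search endpoint; authentication headers are not handled here.
SEARCH_URL = "https://api.github.com/search/code"


def build_queries(org_terms, secret_terms, qualifiers=""):
    """Yield one code-search URL per (organization term, secret keyword) pair.

    org_terms and secret_terms are hypothetical inputs, e.g. internal domain
    names and words such as "password"; qualifiers could be "language:python".
    """
    for org, secret in itertools.product(org_terms, secret_terms):
        q = f'"{org}" {secret} {qualifiers}'.strip()
        yield f"{SEARCH_URL}?{urlencode({'q': q, 'per_page': 100})}"


def paced(items, per_minute):
    """Yield items no faster than per_minute, sleeping between them."""
    interval = 60.0 / per_minute
    for i, item in enumerate(items):
        if i:
            time.sleep(interval)
        yield item


for url in paced(build_queries(["example.com"], ["password", "api_key"]), per_minute=600):
    print(url)
```

Enumerating combinations this way trades one broad search for many narrow ones, each returning a more manageable result set; whether xGitGuard's approach resembles this is not stated in the article.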
