GitHub Exfiltration Indicators – The Purpose
Before we get to the GitHub exfiltration indicators: GitHub is a well-known platform for sharing open source projects and making developers’ work easier. Microsoft completed its acquisition of GitHub in October 2018.
“GitHub.com” can be used as an exfiltration destination for organizational data. There is a MITRE ATT&CK entry for exactly this type of technique: MITRE ATT&CK – T1567.001 – Exfiltration Over Web Service: Exfiltration to Code Repository. Adversaries can use the service deliberately, and employees of the organization can exfiltrate data unintentionally. In either case, your organization can use several GitHub exfiltration indicators to block the option, to monitor it, or to threat hunt the logs for past events and assess the “damage”. The indicators include Git CLI (command line tool) usage to upload files, GitHub API usage, and HTTP requests that upload, edit, and create files through any web browser. Finally, your organization might have an internal Git Enterprise server, in which case there may be no need for the external “GitHub.com” service at all.
Something to Note before the indicators:
When you navigate to a repository and see a 404 error, the repository does not necessarily not exist. It could be that the repository is private and you do not have access to it.
Examples Description
Example convention names throughout the article:
user1 – example of user name on github: https://github.com/user1
repository1 – example of repository name of the user on github: https://github.com/user1/repository1
branch1 – example of branch name in the repository (in most cases, it will be “main” or “master”, but we will use branch1): https://github.com/user1/repository1/blob/branch1
file1 – example of file name inside a branch: https://github.com/user1/repository1/blob/branch1/file1
In addition, we provide Splunk query examples, meaning you will need to use field names from your specific environment.
“*” symbol is used as a wildcard in the query examples.
In Splunk there are cases that do not require wildcards, since “-“, ” “, “.” act as separators (depending on environment settings). Searches return results faster without wildcards.
GitHub Spotted Domains
github.com
github.io
atom.io
github.community
githubstatus.com
githubassets.com
githubusercontent.com
githubapp.com
*git-scm.com
Each of the above can have several subdomains. Descriptions:
github.com – Appears to be the only domain used for sync / push / uploads.
github.io – A user can create a custom web page for their repositories, for example: https://user1.github.io/repository1.
atom.io – The domain belongs to GitHub. The tool itself is a text editor that is meant to work with GitHub.
github.community – GitHub forum.
githubstatus.com – GitHub dashboard with health status of “GitHub.com” platform, includes any infrastructure incidents.
The next three are self-explanatory and used for various GitHub assets, like user avatars, JS, CSS, and Media files:
githubassets.com – GitHub asset files.
githubusercontent.com – User content files (avatars, uploaded files).
githubapp.com
collector.githubapp.com – githubapp subdomain, used as telemetry service to collect data from user’s browser when one is navigating through certain pages.
*git-scm.com – GitHub is not the owner of this domain. Scott Chacon originally wrote the site content. It is currently hosted in a GitHub repository and maintained by the community. Several “docs.github.com” pages refer to this domain.
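Since each of the domains above can have several subdomains, matching should be done on the domain suffix and not on a plain substring. A minimal Python sketch of such a check follows. The function name is hypothetical and the domain list is simply the one from this article, so extend it for your environment.

```python
# Domains taken from the list in this article; adjust for your environment.
GITHUB_DOMAINS = (
    "github.com", "github.io", "atom.io", "github.community",
    "githubstatus.com", "githubassets.com", "githubusercontent.com",
    "githubapp.com", "git-scm.com",
)


def is_github_related(hostname: str) -> bool:
    """Return True if hostname is a listed domain or one of its subdomains."""
    host = hostname.strip().lower().rstrip(".")
    # Require an exact match or a "." before the domain, so that
    # "notgithub.com" or "github.com.evil.example" do not match.
    return any(host == d or host.endswith("." + d) for d in GITHUB_DOMAINS)


print(is_github_related("collector.githubapp.com"))
print(is_github_related("github.com.evil.example"))
```

A plain "contains github.com" test would also match look-alike hosts, which is why the sketch checks the suffix with a leading dot.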
GitHub Desktop Indicators
All the sources state that “GitHub Desktop” depends on the Git CLI (the command line interface that is covered next). This means that it will not work without it, and “GitHub Desktop” usage will produce exactly the same GitHub exfiltration indicators as the Git CLI. So, there is no need to hunt / block the Desktop application separately.
GitHub Exfiltration Indicators – Git CLI (Command Line)
Git command line switches and flags can show up in different places in the command line. Meaning that there can be a command like:
git SomeFlag SomeOtherCommand init SomeOtherFlag
So, when building a query, do not search for a specific command like “git init” (you will miss results like the one above). Search for the two words separately with wildcards:
*git* AND *init*
Another thing to note: all the queries should run against the “Command Line” field of your event logs.
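The difference between searching for the exact phrase and searching for the two words separately can be sketched in Python. This is only an illustration of the wildcard logic, not a replacement for your SIEM query, and the helper name is hypothetical.

```python
def contains_all(command_line: str, *terms: str) -> bool:
    """Case-insensitive equivalent of *term1* AND *term2* on a single field."""
    lowered = command_line.lower()
    return all(term.lower() in lowered for term in terms)


cmd = "git SomeFlag SomeOtherCommand init SomeOtherFlag"

# The exact phrase misses the command, because flags sit between the words.
print("git init" in cmd)

# Searching for both words separately finds it.
print(contains_all(cmd, "git", "init"))
```

Keep in mind that substring matching is broad: any command line that happens to contain both strings will match, which is why the later sections add exclusions.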
git init
The command initializes / creates an empty repository or reinitializes an existing one:
git init
By itself, git init creates a local Git repository out of the current folder. You will probably see a “cd” command in the logs before it (you can try to add the “mkdir” command to the mix if you want to make sure). Example:
cd C:\repos\repository1
git init
Also, “git init” can be given the target directory directly:
git init C:\repos\repository1
Hunting logic example for “git init”:
((*cd* AND *git* AND *init*) OR (*git* AND *init*)) NOT *YourGitEnterpriseDomain.com*
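The first clause describes the typical case of “cd” followed by “git init” in one command line. Strictly speaking it is already included in the second clause, since anything that contains “cd”, “git” and “init” also contains “git” and “init”. Either way, the “cd” command will not show up in the results on its own without “git init”, and “git init” used with a specific directory will not be skipped. In addition, we do not want any results that contain Git CLI usage together with the domain of your Git Enterprise server (if there is one).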
Splunk query example:
earliest=-7d
((cd git init) OR (git init))
NOT "YourGitEnterpriseDomain.com" CommandLineField!=""
| fields TimeField CommandLineField HostNameField HostIPField UserNameField ParentProcessField
| stats count(CommandLineField) values by HostNameField
| sort -count(CommandLineField)
The above query searches for all the entries that contain (“cd” and “git” and “init”) or (“git” and “init”). All the results that contain your Enterprise Git domain are excluded, and the command line field must not be empty (since we are searching only in the command line field). Only the selected “fields” are passed to the stats command. The result includes all the original values per Host Name and the number of command lines that contained the terms. Finally, the “sort” command sorts the results from the highest count to the lowest, which makes it easier to see the most active host.
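If you export events and want to reproduce only the aggregation part of the query outside Splunk, the “stats ... by host” and “sort” steps can be approximated as below. This is a sketch under assumptions: the event dictionaries and their keys (“host”, “command_line”) are hypothetical and stand in for your own field names, and the term filtering itself is not repeated here.

```python
from collections import defaultdict


def count_by_host(events):
    """Group command lines by host and sort hosts by count, highest first.

    Returns a list of (host, count, unique_command_lines) tuples.
    """
    grouped = defaultdict(list)
    for event in events:
        cmd = event.get("command_line", "")
        if cmd:  # mirrors CommandLineField!=""
            grouped[event["host"]].append(cmd)
    rows = [(host, len(cmds), sorted(set(cmds))) for host, cmds in grouped.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Hypothetical sample events, already filtered for "git" and "init".
sample = [
    {"host": "pc-01", "command_line": "git init"},
    {"host": "pc-02", "command_line": "git init C:\\repos\\repository1"},
    {"host": "pc-02", "command_line": "git init"},
    {"host": "pc-03", "command_line": ""},
]

for host, count, commands in count_by_host(sample):
    print(host, count, commands)
```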
git remote
Git’s git remote command “manages ‘remotes’ of the branches that you track”. The important subcommands are “add” and “set-url”.
1. Command line that adds remote destination before “git push”:
git remote add
Since the command sets the destination, it will be used together with the URL of a “github.com” repository. It also includes the name of the entry that is added, so that the entry can be referenced later.
Example of adding repository with HTTPS:
git remote add TheRemoteEntryName https://github.com/user1/repository1.git
Example of adding repository with SSH:
git remote add TheRemoteEntryName git@github.com:user1/repository1.git
2. Command line for editing the existing remote destination of a local entry:
git remote set-url
HTTPS Example:
git remote set-url TheRemoteEntryName https://github.com/user1/repository1.git
SSH Example:
git remote set-url TheRemoteEntryName git@github.com:user1/repository1.git
3. There are Git remote helpers that can also fall under the “git remote” hunt category. They have no direct relation to the “git remote” command from (1) and (2). The helpers are “curl” based, since Git hands these transports over to separate helper programs. Git runs them when it needs to reach a remote repository (for example during “fetch” or “push”), and they are used in conjunction with the remote repository URL. The Git remote helpers include:
git-remote-http
git-remote-https
git-remote-ftp
git-remote-ftps
The “git-remote-TRANSPORT” commands are executed according to the configured URL that the “push” / “fetch” operation needs to reach.
HTTP example:
git-remote-http TheRemoteEntryName http://github.com/user1/repository1.git
HTTPS example:
git-remote-https TheRemoteEntryName https://github.com/user1/repository1.git
FTP example:
git-remote-ftp TheRemoteEntryName ftp://github.com/user1/repository1.git
* Hunting logic for all the “git remote” executions:
(*git* AND *remote* AND *github.com*) NOT *YourGitEnterpriseDomain.com*
The above logic ensures that only entries with “github.com” are returned along with “git remote” (all the other instances of this command are of no use here) and that there are no results with your Git Enterprise domain.
Splunk query:
earliest=-7d
"github.com"
git remote
NOT "YourGitEnterpriseDomain.com" CommandLineField!=""
| fields TimeField CommandLineField HostNameField HostIPField UserNameField ParentProcessField
| stats count(CommandLineField) values by HostNameField
| sort -count(CommandLineField)
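Once “git remote” command lines are found, it is often useful to pull the destination user and repository out of them. The following Python sketch handles the HTTPS form and the SSH form (git@github.com:user1/repository1.git) shown above. The regular expression and function name are our own illustration, and URLs with embedded credentials or ports are not covered.

```python
import re

# Matches https://github.com/user/repo(.git) and git@github.com:user/repo(.git)
GITHUB_REMOTE = re.compile(
    r"(?:https?://|git@)github\.com[/:]"
    r"(?P<user>[^/\s]+)/(?P<repo>[^/\s]+?)(?:\.git)?(?=\s|$)",
    re.IGNORECASE,
)


def extract_github_remote(command_line: str):
    """Return (user, repository) for a "git remote" style command line, else None."""
    lowered = command_line.lower()
    if "git" not in lowered or "remote" not in lowered:
        return None
    match = GITHUB_REMOTE.search(command_line)
    return (match.group("user"), match.group("repo")) if match else None


print(extract_github_remote(
    "git remote add TheRemoteEntryName https://github.com/user1/repository1.git"
))
```

The extracted pair can then be compared against a list of repositories your organization approves of, if you keep one.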
git push
1. Git’s git push is the actual sync / upload command that sends files from the local repository to the remote one:
git push
This is the command that uploads the data to remote repository in conjunction with the origin entry (additional switches can appear):
git push TheRemoteEntryName
2. In addition to the above, you can find the git http-push command in the logs:
git http-push https://github.com/user1/repository1.git
The command sends missing objects to remote repository over HTTP (also, curl powered).
3. There are Git hooks, which are custom scripts that can be used with the “git push” command.
"The pre-push hook runs during git push, after the remote refs have been updated but before any objects have been transferred".
These scripts are usually located under the “hooks” subfolder of the “.git” folder, so searching for “*git* AND *push*” will already cover them. If it does not, do not bother: a hook is a custom script that runs as part of the “git push” command, so once you have found the “git push” itself it does not matter whether a custom hook ran or not. You are most likely to notice hooks during a single host investigation.
Path example for “pre-push” script:
.git\hooks\pre-push
* Combining all the “git push” relations to this query:
(*git* AND *push*) NOT *YourGitEnterpriseDomain.com*
Splunk query:
earliest=-7d
git push
NOT "YourGitEnterpriseDomain.com" CommandLineField!=""
| fields TimeField CommandLineField HostNameField HostIPField UserNameField ParentProcessField
| stats count(CommandLineField) values by HostNameField
| sort -count(CommandLineField)
git-receive-pack
1. The Git CLI git-receive-pack command is executed as part of the “git push” process: “git send-pack” on your side invokes “git-receive-pack” on the remote side. If you see “git-receive-pack” in the logs, it means that the “GitHub.com” repository (or any other specified destination) is about to receive the information / files that you are uploading from your end. Basically, “git send-pack” packs and sends the information / files, and “git-receive-pack” receives them at “github.com” (or any other specified destination). This is an important command, since we will see it in the proxy section later on.
Examples:
git-receive-pack TheRemoteEntryName
git-receive-pack 'user1/repository1.git'
2. There can also be an “ssh” command executed under “git push”, as in the following example:
ssh git@github.com git-receive-pack 'user1/repository1.git'
* So, using only the command with wildcards should cover you. Query:
*git-receive-pack* NOT *YourGitEnterpriseDomain.com*
You can combine all the Git CLI commands in one query to optimize results.
Combining all Git CLI commands and their sub-executions to One Query
Running the queries one at a time will not show you the full picture. You will need to combine them and filter the results. While combining all the queries, we found several other indicators that relate to Git installation on several platforms and need to be filtered out.
The query:
1. ((*git* AND *init*) OR (*cd* AND *git* AND *init*) OR (*git* AND *remote* AND *github.com*) OR (*git* AND *push*) OR *git-receive-pack*)
2. NOT *YourGitEnterpriseDomain.com*
3. <CommandLineField Should not be empty>
4. NOT (*git-init.xml* OR *git-init-db.xml* OR *push.xml* OR *git-receive-pack.xml*)
5. NOT (*git-init.html* OR *git-init-db.html* OR *push.html* OR *git-receive-pack.html*)
6. NOT (*git-init.1* OR *git-init-db.1* OR *push.1* OR *git-receive-pack.1*)
7. NOT (*github.com/Homebrew/* OR *github.com/repos/Homebrew/* OR *github.com/git/* OR */usr/libexec/gcc/* OR */usr/lib/gcc/* OR *push_* OR *code-push*)
8. NOT (*DSHA1DC_INIT_SAFE_HASH_DEFAULT* OR *$bindir*)
9. <The results at this point should be divided to chunks by Computer Name>
10. <Each chunk should be filtered by:
CommandLineField=*github.com* AND ((CommandLineField=*git* AND CommandLineField=*push*) OR CommandLineField=*git-receive-pack*)
>
Query explanations by line number:
1. The first line is a combination of all the separate queries from above.
2. Excludes your Git Enterprise domain if there is any.
3. Each query language is different, but here we make sure that CommandLineField is not empty. If it is empty, the specific result should be filtered out.
4–6. These steps exclude all the file names that relate to the commands in the main query and have the extensions “*.1”, “*.html”, “*.xml”, which in turn relate to the Git CLI installation. You do not need these results in your event search.
7. “Homebrew” repositories relate to Git CLI installation on MacOS from Homebrew application. “Git” repository is used to download the source code on MacOS and Linux hosts while installing Git CLI. “/usr/libexec/gcc/” and “/usr/lib/gcc/” folders were spotted for the gcc executable that compiles the Git CLI source code that was downloaded from git repository during installation. You can add here all the “gcc” folder paths in your environment in order to omit these results. “push_” and “code-push” were included in some of the files names in parsed bash scripts in conjunction with “git” command, so these were filtered out from the search. Probably your environment will not have these.
8. “DSHA1DC_INIT_SAFE_HASH_DEFAULT” and “$bindir” relate to bash scripts parsing of Git CLI installation. These are necessary, since the scripts include the command lines of the main query and it was the easiest method to filter them out.
9. In Splunk there is an option to combine all the events that relate to one Computer Name into a single row. This is handy for the 10th step, where we filter these results.
10. A Computer Name is removed from the results unless its events contain “github.com” and an instance of “git push” or “git-receive-pack” (which are the main indicators of file upload through the CLI).
Splunk query example (the TimeField is cut to show only hours and minutes, since adding seconds will show too many results):
earliest=-7d
((git init) OR (cd git init) OR (git remote "github.com") OR (git push) OR "git-receive-pack")
NOT "YourGitEnterpriseDomain.com"
CommandLineField!=""
NOT ("git-init.xml" OR "git-init-db.xml" OR "push.xml" OR "git-receive-pack.xml")
NOT ("git-init.html" OR "git-init-db.html" OR "push.html" OR "git-receive-pack.html")
NOT ("git-init.1" OR "git-init-db.1" OR "push.1" OR "git-receive-pack.1")
NOT ("github.com/Homebrew/" OR "github.com/repos/Homebrew/" OR "github.com/git/" OR "/usr/libexec/gcc/" OR "/usr/lib/gcc/" OR "push_*" OR "code-push")
NOT ("DSHA1DC_INIT_SAFE_HASH_DEFAULT" OR "$bindir")
| eval TimeField=strftime(_time,"%Y-%m-%d %H:%M")
| fields TimeField CommandLineField HostNameField HostIPField UserNameField ParentProcessField
| stats count(CommandLineField) values by HostNameField
| search "values(CommandLineField)"="*github.com*" (("values(CommandLineField)"="*git*" "values(CommandLineField)"="*push*") OR "values(CommandLineField)"="*git-receive-pack*")
| sort -count(CommandLineField)
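For readers who prefer to see the ten steps as ordinary code, here is a minimal Python sketch of the same logic applied to exported (host, command line) pairs. It is an approximation under assumptions: the function names and sample events are hypothetical, matching is plain case-insensitive substring matching, and step 10 is evaluated across all command lines of a host, which is how the multivalue search in the Splunk example behaves.

```python
from collections import defaultdict

# Steps 4-8: strings that relate to Git installation, lowercased.
NOISE = (
    "git-init.xml", "git-init-db.xml", "push.xml", "git-receive-pack.xml",
    "git-init.html", "git-init-db.html", "push.html", "git-receive-pack.html",
    "git-init.1", "git-init-db.1", "push.1", "git-receive-pack.1",
    "github.com/homebrew/", "github.com/repos/homebrew/", "github.com/git/",
    "/usr/libexec/gcc/", "/usr/lib/gcc/", "push_", "code-push",
    "dsha1dc_init_safe_hash_default", "$bindir",
)


def has(text, *terms):
    """True if every term appears somewhere in text."""
    return all(term in text for term in terms)


def is_candidate(cmd, enterprise_domain):
    c = cmd.lower()
    if not c.strip() or enterprise_domain.lower() in c:  # steps 2-3
        return False
    if any(noise in c for noise in NOISE):  # steps 4-8
        return False
    return (has(c, "git", "init") or has(c, "git", "remote", "github.com")
            or has(c, "git", "push") or "git-receive-pack" in c)  # step 1


def suspicious_hosts(events, enterprise_domain="YourGitEnterpriseDomain.com"):
    by_host = defaultdict(list)  # step 9: one chunk per computer name
    for host, cmd in events:
        if is_candidate(cmd, enterprise_domain):
            by_host[host].append(cmd)
    result = {}
    for host, cmds in by_host.items():
        joined = "\n".join(cmds).lower()
        # Step 10: github.com plus a push indicator somewhere in the chunk.
        if "github.com" in joined and (has(joined, "git", "push")
                                       or "git-receive-pack" in joined):
            result[host] = cmds
    return result


events = [
    ("pc-01", "git init"),
    ("pc-01", "git remote add TheRemoteEntryName https://github.com/user1/repository1.git"),
    ("pc-01", "git push TheRemoteEntryName"),
    ("pc-02", "git init"),
    ("pc-03", "man /usr/share/man/man1/git-init.1"),
]
print(sorted(suspicious_hosts(events)))
```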
After executing several queries in your environment, you will understand if you need to exclude or include more strings.
Another problem with this query: the results are scattered over the whole week. It could be that “git push” was executed a week ago, while a “git remote” command added the new “github.com” repository only a day ago. This means that the git push was not for the repository that was added later. To catch cases like this, you will need to show the time of each event and then check the order of the events. The longer your time range is, the more problems of this kind you will potentially have.
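One way to deal with the ordering problem is to keep only the pushes that happened after a “github.com” remote was configured on the same host. The sketch below assumes exported events as (timestamp, host, command line) tuples. The names and sample data are hypothetical, and the check cannot tell which local repository a push belongs to, so treat the output as a lead and not as proof.

```python
from datetime import datetime


def pushes_after_github_remote(events):
    """Return pushes that follow a github.com "git remote" command on the same host."""
    first_remote = {}  # host -> time of the first github.com remote command
    hits = []
    for ts, host, cmd in sorted(events):  # process in chronological order
        c = cmd.lower()
        if "git" in c and "remote" in c and "github.com" in c:
            first_remote.setdefault(host, ts)
        elif "git" in c and "push" in c and host in first_remote:
            hits.append((host, ts, cmd))
    return hits


events = [
    # pc-01: the push happened BEFORE the github.com remote was added.
    (datetime(2024, 1, 1, 9, 0), "pc-01", "git push TheRemoteEntryName"),
    (datetime(2024, 1, 7, 9, 0), "pc-01",
     "git remote add TheRemoteEntryName https://github.com/user1/repository1.git"),
    # pc-02: the remote was added first, then the push.
    (datetime(2024, 1, 2, 10, 0), "pc-02",
     "git remote add TheRemoteEntryName https://github.com/user1/repository1.git"),
    (datetime(2024, 1, 2, 10, 5), "pc-02", "git push TheRemoteEntryName"),
]

for host, ts, cmd in pushes_after_github_remote(events):
    print(host, ts.isoformat(), cmd)
```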
Git CLI Indicators in the Proxy Logs
We saw that “git push” executes another command under it – “git-receive-pack”. This command sends HTTP requests to “GitHub.com”, which can be found in the logs. Example:
Method: POST URL: https://github.com/user1/repository1.git/git-receive-pack
Method: GET URL: https://github.com/user1/repository1.git/info/refs?service=git-receive-pack
The Git CLI executable sends both GET and POST requests. Of course, only the POST uploads the data to GitHub.
Both of the requests will have “git” User Agents. Example for Git UA:
git/2.29.3
Based on the above, your query logic should be:
UserAgent=*git* AND URL=*github.com* AND URL=*git-receive-pack* AND RequestMethod=POST
The query returns all the requests whose User Agent contains the “git” string, that are POST requests to “github.com”, and whose URL contains the “git-receive-pack” string. We can omit the Git User Agent condition, since normally only Git clients send requests to “git-receive-pack” URLs, and a User Agent can be changed on the client side anyway:
URL=*github.com* AND URL=*git-receive-pack* AND RequestMethod=POST
Blocking the requests that match the above logic will make the “git push” command of the Git CLI useless, and users will not be able to upload files through the Git CLI to “GitHub.com”.
Splunk query:
earliest=-30d
HTTPmethodField=POST
UrlField="*github.com*"
"*git-receive-pack*"
| eval TimeField=strftime(_time,"%Y-%m-%d %H:%M")
| fields TimeField ActionField ByteOutField HTTPmethodField RefererField UserAgentField StatusCodeField SourceIPField SourceComputerNameField UrlField UserNameField
| stats values(UrlField) values by UserNameField
| sort values(TimeField)
The above Splunk query makes sure that the HTTP Request Method is POST, the URL contains “github.com”, and all the events contain the “git-receive-pack” string. All the results are grouped by User Name and sorted by time. You can also add 2xx Status Codes to the mix if you want to see only allowed traffic:
StatusCodeField=2*
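The proxy logic for Git CLI uploads can also be written as a small predicate, for example to post-process exported proxy records. This Python sketch is slightly stricter than the wildcard query: it parses the URL, checks the host name, and requires the path to end with “/git-receive-pack”. The function name is hypothetical.

```python
from urllib.parse import urlsplit


def is_git_cli_upload(method: str, url: str) -> bool:
    """True for a POST to a github.com URL whose path ends with /git-receive-pack."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    on_github = host == "github.com" or host.endswith(".github.com")
    return (method.upper() == "POST"
            and on_github
            and parts.path.endswith("/git-receive-pack"))


print(is_git_cli_upload(
    "POST", "https://github.com/user1/repository1.git/git-receive-pack"))
print(is_git_cli_upload(
    "GET", "https://github.com/user1/repository1.git/info/refs?service=git-receive-pack"))
```

The GET request to “info/refs” is deliberately not flagged, since it only prepares the push and does not carry the data.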
GitHub Exfiltration Indicators – HTTP Requests from Web Browser
Following are GitHub exfiltration indicators that we saw in the proxy logs after trying to “upload”, “delete”, “create”, “edit” files on GitHub through the web browser.
The Upload UI URL, before the file was selected:
Method: GET URL: https://github.com/user1/repository1/upload/branch1 Status Code: 200
After the file was selected, from the start of the upload until it finished:
Method: POST URL: https://github-production-upload-manifest-file-#AAAA#.s3.amazonaws.com/ Status Code: 204
*** #AAAA# are 6 characters consisting of numbers and letters
Method: POST URL: https://github.com/upload/manifests Status Code: 201
Method: POST URL: https://github.com/upload/policies/upload-manifest-files Status Code: 201
Method: PUT URL: https://github.com/upload/upload-manifest-files/######### Status Code: 200
*** The last "#########" are 9 digits
*** The next 2 relate to pressing [Commit] button on the Upload page
Method: GET URL: https://github.com/user1/repository1/commit/<CommitID>/rollup?direction=sw Status Code: 200
Method: GET URL: https://github.com/user1/repository1/tree-commit/<SameCommitIDAsInPreviousPacket> Status Code: 200
Deleting a file:
Method: POST URL: https://github.com/user1/repository1/delete/branch1/file1.txt Status Code: 200
*** The next 3 relate to pressing [Commit] button on the Delete page
Method: POST URL: https://github.com/user1/repository1/blob/branch1/file1 Status Code: 302
Method: GET URL: https://github.com/user1/repository1/commit/<CommitID>/rollup?direction=sw Status Code: 200
Method: GET URL: https://github.com/user1/repository1/tree-commit/<SameCommitIDAsInPreviousPacket> Status Code: 200
Creating new file:
Method: POST URL: https://github.com/user1/repository1/new/master Status Code: 200
*** The next 3 relate to pressing [Commit] button on the Create page
Method: GET URL: https://github.com/user1/repository1/commit/<CommitID>/rollup?direction=sw Status Code: 200
Method: GET URL: https://github.com/user1/repository1/tree-commit/<SameCommitIDAsInPreviousPacket> Status Code: 200
Method: POST URL: https://github.com/user1/repository1/create/branch1 Status Code: 302
Editing a file on GitHub:
Method: POST URL: https://github.com/user1/repository1/edit/branch1/file1 Status Code: 200
*** The next 3 relate to pressing [Commit] button on the Edit page
Method: GET URL: https://github.com/user1/repository1/blob/branch1/file1 Status Code: 200
Method: GET URL: https://github.com/user1/repository1/commit/<NewCommitID>/rollup?direction=e Status Code: 200
*** "direction=e" meaning "edit"
Method: POST URL: https://github.com/user1/repository1/tree-save/branch1/file1 Status Code: 302
You can also check for “comment” HTTP requests on GitHub if you like, but they pose a lower risk.
Something to note about commits: if you query for the commit URL alone, you cannot be a hundred percent sure that a commit really took place, since pressing “cancel” produces a commit URL too. The difference is whether the Commit ID is the new one or the previous one. For example, if a user started to edit a document, changed their mind, and pressed “cancel”, you will still see a commit URL, but with the previous Commit ID in it.
All the interesting parts of the URL structure under Web Browser usage:
*github.com* – In conjunction with the next strings.
*/upload/* – Appears when the file Upload UI is opened in the browser, and in several other URLs that are generated after the user presses [Commit] (different directory structure).
*/new/* – New file creation UI on GitHub.
*/create/* – Relates to the “*new*” file creation, appears after the user clicks [Commit].
*/edit/* – File editing UI.
*tree-save* – Relates to Edit, but appears after the user clicks [Commit].
You can also add the Amazon domain:
*s3.amazonaws.com* – In conjunction with:
*github-production-upload-manifest-file*
Query logic:
(*github.com* AND (*/upload/* OR */new/* OR */create/* OR */edit/* OR *tree-save*)) OR (*s3.amazonaws.com* AND *github-production-upload-manifest-file*)
You can block the above indicators in the proxy to stop users in your organization from exfiltrating data to “GitHub.com” through the web browser.
Splunk query:
earliest=-30d
(("github.com" (url IN ("*/create/*" "*/edit/*" "*tree-save*" "*/new/*" "*/upload/*"))) OR ("s3.amazonaws.com" github upload))
NOT ("collector.githubapp.com")
| eval TimeField=strftime(_time,"%Y-%m-%d %H:%M")
| fields TimeField ActionField ByteOutField HTTPmethodField RefererField UserAgentField StatusCodeField SourceIPField SourceComputerNameField UrlField UserNameField
| stats values(UrlField) values by UserNameField
| sort values(TimeField)
The above Splunk query includes all the interesting “GitHub.com” URLs, including the amazonaws ones, and excludes the telemetry domain, since it duplicates the user’s browsing activity. You can also add 2xx Status Codes to the mix if you want to see only allowed traffic:
StatusCodeField=2*
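The browser indicators can be turned into a small classifier that labels each proxy URL with the action it most likely belongs to. The Python sketch below is our own illustration of the query logic: the labels and the function name are hypothetical, and, just like the wildcards, it will also match a user or repository that happens to be named “new”, “edit”, and so on.

```python
import re
from urllib.parse import urlsplit

# Path segments observed for write actions in the web UI.
WEB_WRITE_PATH = re.compile(r"/(upload|new|create|edit|tree-save)/")


def classify_browser_request(url: str):
    """Return a label for write-related GitHub web URLs, or None."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if host == "github.com":
        # Append "/" so that a path ending in the segment still matches.
        match = WEB_WRITE_PATH.search(parts.path + "/")
        return match.group(1) if match else None
    if (host.endswith(".s3.amazonaws.com")
            and host.startswith("github-production-upload-manifest-file")):
        return "s3-upload"
    return None


for url in (
    "https://github.com/user1/repository1/upload/branch1",
    "https://github.com/user1/repository1/tree-save/branch1/file1",
    "https://github.com/user1/repository1/blob/branch1/file1",
):
    print(url, "->", classify_browser_request(url))
```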
GitHub Exfiltration Indicators – GitHub API File System Commands
Another method of uploading data to “GitHub.com” is using their API. You can read about GitHub API Content usage in their documentation (including CURL examples).
1. To upload / update / create a file in repository you will need to use “PUT” HTTP Method with URL that will contain “repos” and “contents” in it:
https://api.github.com/repos/user1/repository1/contents/file1.txt
Checking the CURL command example from the API Documentation:
curl \
-X PUT \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/repos/user1/repository1/contents/file1.txt \
-d '{"message":"message","content":"content"}'
After checking some examples on the internet, including curl alternatives, we saw that each program has “PUT” in its command line along with the destination URL of “api.github.com” with “repos” in it. Aside from that, there can be other types of data upload through the API that use the POST method in conjunction with the URL of “api.github.com” and “repos”.
2. A good example is user repository creation with the POST method to:
https://api.github.com/user/repos
CURL command example from GitHub API Documentation:
curl \
-X POST \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/user/repos \
-d '{"name":"name"}'
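3. There is also an option to update / upload an asset to a “release” with a POST HTTP request. Note that GitHub’s API reference describes release asset uploads as going to the “upload_url” that is returned for the release, which is on “uploads.github.com”, so you may want to cover that host as well. The URL as it appears in the documentation example: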
https://api.github.com/repos/user1/repository1/releases/<ReleaseID>/assets
CURL command example from GitHub API Documentation:
curl \
-X POST \
-H "Accept: application/vnd.github.v3+json" \
https://api.github.com/repos/user1/repository1/releases/42/assets
* This way we can confirm the query logic for the file system commands:
*api.github.com* AND *repos* AND (PUT OR POST)
Now without “repos” string if you want to cover all the API usage:
*api.github.com* AND (PUT OR POST)
The above are the strings that you should search for in the Command Line field or in parsed scripts in your logs.
Splunk query:
earliest=-7d
"api.github.com" repos (POST OR PUT)
| eval TimeField=strftime(_time,"%Y-%m-%d %H:%M")
| fields TimeField CommandLineField HostNameField HostIPField UserNameField ParentProcessField
| stats count(CommandLineField) values by HostNameField
| sort -count(CommandLineField)
Without “repos” string:
earliest=-7d
"api.github.com" (POST OR PUT)
| eval TimeField=strftime(_time,"%Y-%m-%d %H:%M")
| fields TimeField CommandLineField HostNameField HostIPField UserNameField ParentProcessField
| stats count(CommandLineField) values by HostNameField
| sort -count(CommandLineField)
GitHub Exfiltration Indicators – GitHub API through the Proxy
When the GitHub API is used on the file system, as commands or from scripts, the process creates web requests.
Query logic for the web requests:
URL=*api.github.com* AND URL=*repos* AND (RequestMethod=PUT OR RequestMethod=POST)
Splunk query:
earliest=-30d
"api.github.com" UrlField=*repos*
(HTTPmethodField IN (POST PUT))
| eval TimeField=strftime(_time,"%Y-%m-%d %H:%M")
| fields TimeField ActionField ByteOutField HTTPmethodField RefererField UserAgentField StatusCodeField SourceIPField SourceComputerNameField UrlField UserNameField
| stats values(UrlField) values by UserNameField
| sort values(TimeField)
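The same API logic can be expressed as a Python predicate for exported proxy records. This is a sketch with a hypothetical function name. The “require_repos” switch mirrors the two query variants from the file system section (with and without the “repos” string).

```python
from urllib.parse import urlsplit


def is_api_write(method: str, url: str, require_repos: bool = True) -> bool:
    """True for PUT/POST requests to api.github.com, optionally only for "repos" URLs."""
    parts = urlsplit(url)
    if (parts.hostname or "").lower() != "api.github.com":
        return False
    if method.upper() not in {"PUT", "POST"}:
        return False
    return (not require_repos) or "repos" in parts.path


print(is_api_write(
    "PUT", "https://api.github.com/repos/user1/repository1/contents/file1.txt"))
print(is_api_write("POST", "https://api.github.com/gists"))
print(is_api_write("POST", "https://api.github.com/gists", require_repos=False))
```

Dropping the “repos” requirement widens the coverage to other write endpoints at the cost of more results to review.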
Blocking GitHub Exfiltration in your Organization
You can block all the specific URLs through the wildcards that appear in the article, or only some of them. Alternatively, you can use the easiest solution and block all the PUT and POST requests to “github.com” on your proxy, leaving only the login page. This leaves the users only the option to log in to their accounts, but not to exfiltrate any data through the CLI, the API or the web browser.
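The “block everything except login” approach can be sketched as a decision function. This is an illustration only: the login paths (“/login”, “/session”) are our assumption about what the sign-in flow needs and must be verified in your own proxy logs, and treating subdomains such as “api.github.com” as part of the block is also an assumption made here so that API uploads are covered.

```python
from urllib.parse import urlsplit

# ASSUMPTION: paths needed for signing in; verify against your own traffic.
LOGIN_PATHS = {"/login", "/session"}


def should_block(method: str, url: str) -> bool:
    """Block PUT/POST to github.com and its subdomains, except the login paths."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    if not (host == "github.com" or host.endswith(".github.com")):
        return False
    if method.upper() not in {"PUT", "POST"}:
        return False
    is_login = host == "github.com" and parts.path.rstrip("/") in LOGIN_PATHS
    return not is_login


print(should_block("POST", "https://github.com/session"))
print(should_block(
    "POST", "https://github.com/user1/repository1.git/git-receive-pack"))
```

If users in your organization sign in with two-factor authentication or single sign-on, additional paths will most likely have to be allowed.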
GitHub Exfiltration Indicators – Searching Through GitHub Code Section
When you navigate to “GitHub.com” there is a search bar at the top of the page. Search there for something that is private to your organization, like an internal domain that has no external internet access. It will search through all of GitHub and show you the first available results for the “Repositories” section. You can review the results to see if there is anything private relating to your organization, but the most important part is to check the “Code” section. The “Code” section shows the actual matches of your search term inside the code files of GitHub users. To do the Code search you will need to register a free account with “GitHub.com”.