EzDev.org

bfg-repo-cleaner

Removes large or troublesome blobs like git-filter-branch does, but faster, and is written in Scala. BFG Repo-Cleaner by rtyley is a simpler, faster alternative to git-filter-branch for deleting big files and removing passwords from Git history.


Git BFG to retroactively enable LFS - protected commits issue

I have large files and was attempting to use the new Git LFS system.

I posted this question - Git lfs - "this exceeds GitHub's file size limit of 100.00 MB"

Edward Thomson correctly identified my issue: you cannot use LFS retroactively. He suggested I use BFG's LFS support.

This worked to a degree. The vast majority of my files were changed. However, there were protected commits that were not altered.

Some of these protected commits contained files over 100.00 MB and thus caused a remote: error from GitHub.

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit c7cd871b (protected by 'HEAD') - contains 165 dirty files :
    - Directions_api/Applications/LTDS/Cycling/Leisure/l__cyc.csv (147.3 KB)
    - Directions_api/Applications/LTDS/Cycling/Work/w_cyc.csv (434.0 KB)
    - ...

WARNING: The dirty content above may be removed from other commits, but as
the *protected* commits still use it, it will STILL exist in your repository.

If you *really* want this content gone, make a manual commit that removes it,
and then run the BFG on a fresh copy of your repo.

First of all, can someone explain why these commits are protected, and how they differ from the commits BFG changed successfully?

Secondly, how can I unprotect them and let BFG edit them, so that I can use LFS correctly and finally push successfully to GitHub?
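BFG's documentation says that by default it does not modify the contents of the latest commit on HEAD, which is why commit c7cd871b above is reported as "protected by 'HEAD'" while its ancestors were rewritten. The output itself suggests one fix (commit the removal manually, then re-run BFG on a fresh copy); BFG also has a --no-blob-protection flag that lifts this protection. A minimal Python sketch that only assembles the command line; the jar name, the '*.csv' glob and the repository path are placeholders, not values from the question:

```python
import shlex


def bfg_lfs_command(bfg_jar, patterns, repo_dir, protect_head=True):
    """Build (but do not run) a BFG command that moves matching files to Git LFS.

    bfg_jar, patterns and repo_dir are illustrative placeholders.
    """
    if not patterns:
        raise ValueError("need at least one file-name glob")
    # BFG takes one glob; several patterns are joined into a brace group.
    glob = patterns[0] if len(patterns) == 1 else "{" + ",".join(patterns) + "}"
    cmd = ["java", "-jar", bfg_jar, "--convert-to-git-lfs", glob]
    if not protect_head:
        # Without this flag BFG leaves the tree of the HEAD commit untouched.
        cmd.append("--no-blob-protection")
    cmd.append(repo_dir)
    return cmd


if __name__ == "__main__":
    cmd = bfg_lfs_command("bfg.jar", ["*.csv"], "my-repo.git", protect_head=False)
    print(shlex.join(cmd))
```

Lifting the protection means HEAD itself is rewritten too, so it is worth trying this on a disposable mirror clone first.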


Source: (StackOverflow)

Permanently removing binary files from GitLab repos

We have a GitLab-hosted repo at work that contains some large binary files that we'd like to remove. I know of tools such as BFG Repo-Cleaner which will remove a file from a Git repository.

We often refer to specific commit IDs in GitLab. Would running BFG Repo-Cleaner mess these up?

If so, is there a better way to clean a repo that wouldn't mess these up?
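As far as I know, BFG gives a new id to every commit whose tree or ancestry changes, so old ids quoted in GitLab will not resolve in the cleaned repository. What it does leave behind is a "Former-commit-id:" line in each rewritten commit message (visible in the "Why is BFG changing my latest commit?" output on this page), which can be used to translate old ids. A minimal sketch, assuming that footer is present (as I understand it, BFG's --private option omits it) and that the log is exported with the format string shown in the comment:

```python
import re
import subprocess

# The footer BFG appends to rewritten commit messages.
FORMER_ID = re.compile(r"^Former-commit-id: ([0-9a-f]{40})$", re.MULTILINE)


def parse_log(raw):
    """Split `git log --all --format=%H%x00%B%x01` output into (sha, message) pairs.

    The NUL and 0x01 separators are an arbitrary choice for this sketch.
    """
    records = []
    for chunk in raw.split("\x01"):
        chunk = chunk.strip("\n")
        if chunk:
            sha, _, message = chunk.partition("\x00")
            records.append((sha, message))
    return records


def old_to_new_ids(records):
    """Map each old commit id found in a footer to the commit that now carries it."""
    mapping = {}
    for new_sha, message in records:
        match = FORMER_ID.search(message)
        if match:
            mapping[match.group(1)] = new_sha
    return mapping


def read_log(repo_dir):
    """Run git log in the cleaned repository with the format parse_log expects."""
    return subprocess.run(
        ["git", "-C", repo_dir, "log", "--all", "--format=%H%x00%B%x01"],
        capture_output=True, text=True, check=True,
    ).stdout
```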


Source: (StackOverflow)

Git/Bitbucket - Remove file from entire history/commits

Following the BFG directions, I removed a private file that should never have been in the repo's commit history.

https://rtyley.github.io/bfg-repo-cleaner/

I ran

$ bfg --delete-files .private  my-repo.git

and pushed the changes. However, this forced me to merge the master branch, and afterwards the file still shows up: the code is still in the .private file and all the commits are still in the history.

How can I remove .private from the entire repo's commit history etc?
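One way to check whether .private is really gone is to list every object reachable from any ref and look for the file name. If it still shows up after the push, a likely culprit is that an uncleaned copy of the history (for example the master branch that was merged) brought the old commits back. A minimal sketch that parses the output of git rev-list --objects --all; git prints only one path per object, so an empty result is encouraging rather than conclusive:

```python
import subprocess
from pathlib import PurePosixPath


def paths_named(rev_list_output, filename):
    """Paths in `git rev-list --objects --all` output whose last component is filename."""
    found = set()
    for line in rev_list_output.splitlines():
        # Lines look like "<sha>" (commits) or "<sha> <path>" (trees and blobs).
        _sha, _, path = line.partition(" ")
        if path and PurePosixPath(path).name == filename:
            found.add(path)
    return found


def rev_list_objects(repo_dir):
    """Every object reachable from any ref, with the path git associates with it."""
    return subprocess.run(
        ["git", "-C", repo_dir, "rev-list", "--objects", "--all"],
        capture_output=True, text=True, check=True,
    ).stdout
```

For example, paths_named(rev_list_objects("my-repo.git"), ".private") should return an empty set once the cleanup has fully propagated.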


Source: (StackOverflow)

Possible to undo permanent deletion of file?

A colleague of mine attempted to permanently remove a file (Diff.java) from the history of our GitHub repo.

He had good reasons for wanting to do this, but something seems to have gone wrong: we appear to have lost quite a few files, which have been replaced by equivalent files with the suffix .REMOVED.git-id. For example, ivy-2.2.0.jar -> ivy-2.2.0.jar.REMOVED.git-id.

I have managed to repair the main development branch as I happened to have a copy locally. However there are many historical branches for development lines and tags for releases that now seem to be broken in the way described above.

I understand that he ran a process similar to:

$ git clone --mirror git://example.com/some-big-repo.git
$ java -jar bfg-1.12.3.jar --strip-biggest-blobs 500 some-big-repo.git
$ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push

$ cd ..
$ java -jar bfg-1.12.3.jar --delete-files Diff.java some-big-repo.git
$ cd some-big-repo.git
$ git push

I am guessing that the process was destructive, and there is no way to recover unless we happen to have a clean mirror somewhere before this happened. Can anyone confirm or offer some advice?
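The .REMOVED.git-id files are placeholders BFG leaves where it stripped a blob; as far as I can tell each one holds the 40-character id of the removed blob. If any clone made before the cleanup still exists (a developer's working copy is enough), those ids can be looked up there with git cat-file. A minimal sketch under those assumptions; old_clone is a hypothetical path to such an uncleaned clone, and the sketch only repairs the checked-out tree, not historical tags or branches:

```python
import os
import re
import subprocess

SUFFIX = ".REMOVED.git-id"
SHA1 = re.compile(r"^[0-9a-f]{40}$")


def placeholder_target(filename, content):
    """Return (original_name, blob_id) for a BFG placeholder, else None.

    Assumption: a placeholder contains only the removed blob's 40-hex id.
    """
    if not filename.endswith(SUFFIX):
        return None
    blob_id = content.strip()
    if not SHA1.match(blob_id):
        return None
    return filename[: -len(SUFFIX)], blob_id


def restore_from(work_tree, old_clone):
    """Write each stripped file back next to its placeholder, using the old clone."""
    for dirpath, dirnames, filenames in os.walk(work_tree):
        if ".git" in dirnames:
            dirnames.remove(".git")  # skip the repository's own metadata
        for name in filenames:
            if not name.endswith(SUFFIX):
                continue
            path = os.path.join(dirpath, name)
            with open(path, encoding="ascii", errors="replace") as fh:
                target = placeholder_target(name, fh.read())
            if target is None:
                continue
            original_name, blob_id = target
            # Raises CalledProcessError if the old clone no longer has the object.
            blob = subprocess.run(
                ["git", "-C", old_clone, "cat-file", "blob", blob_id],
                capture_output=True, check=True,
            ).stdout
            with open(os.path.join(dirpath, original_name), "wb") as out:
                out.write(blob)
```

The placeholders are left in place; removing them and committing the restored files is a separate, deliberate step.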


Source: (StackOverflow)

Any way around a fresh clone when using BFG Repo-Cleaner?

When using BFG Repo-Cleaner, is there any way to avoid having everyone do a fresh clone? With a large team and multiple branches it is difficult to organize this. I am willing to run BFG multiple times should something be reintroduced, as long as I don't have to have everyone re-clone the repo.

I'm thinking of removing the files (i.e. private keys) from history, adding them to the .gitignore file, running git push, and having the team rebase their branches.

Hoping Roberto Tyley sees this and can offer some advice.

Cheers!
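The rebase step in that plan would, I believe, have to be git rebase --onto, because a teammate's branch forks from a commit that no longer exists in the cleaned history. Given a mapping from old to new commit ids (for instance collected from the "Former-commit-id:" footers BFG writes), the command for one branch can be assembled as below. This is a sketch with assumptions: the teammate has already fetched the rewritten history, old_base is the fork point in their clone (e.g. from git merge-base), and any commit on the branch that re-adds the removed files would reintroduce them, since .gitignore does not affect files that are already tracked.

```python
def rebase_onto_command(branch, old_base, id_map):
    """Build `git rebase --onto <new-base> <old-base> <branch>`.

    id_map maps pre-cleaning commit ids to their rewritten equivalents;
    how it is collected is an assumption left to the caller.
    """
    new_base = id_map.get(old_base)
    if new_base is None:
        raise ValueError(f"no rewritten commit recorded for {old_base}")
    # Replays the commits in old_base..branch on top of the rewritten copy of old_base.
    return ["git", "rebase", "--onto", new_base, old_base, branch]


# Hypothetical ids, for illustration only:
# rebase_onto_command("feature", "OLD_SHA", {"OLD_SHA": "NEW_SHA"})
```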


Source: (StackOverflow)

BFG remove multiple folders

I found that BFG is much faster than the original git-filter-branch.

We have multiple SVN repos to migrate into an even larger number of Git repositories, which implies some repository folder merges and splits. During the process I need to remove a set of root folders from the whole history.

I tried BFG's --delete-folders option and it works fine for a single folder, but I could not find a way to delete multiple folders. Is it even possible, or should I call BFG in a loop, once for each folder I need to remove?

Thanks for any help.
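BFG's option values are globs (its help text quotes examples such as '*-tmp', and its documentation uses id_{dsa,rsa} with --delete-files), so if I read it correctly a brace group should let a single run cover several folders instead of a loop. A sketch that builds that argument; folder names containing commas or braces would need escaping that this does not attempt, and the result is worth checking on a throwaway mirror first:

```python
def delete_folders_glob(folders):
    """Combine bare folder names into one glob for --delete-folders."""
    names = sorted(set(folders))
    if not names:
        raise ValueError("need at least one folder name")
    for name in names:
        if "/" in name:
            # BFG matches folder names, not paths within the repository.
            raise ValueError(f"{name!r} looks like a path, not a folder name")
    return names[0] if len(names) == 1 else "{" + ",".join(names) + "}"


def bfg_delete_folders_command(bfg_jar, folders, repo_dir):
    """Argument list for one BFG run; bfg_jar and repo_dir are placeholders."""
    return ["java", "-jar", bfg_jar, "--delete-folders", delete_folders_glob(folders), repo_dir]
```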


Source: (StackOverflow)

How to shrink Git repo and better manage large files

I've used the BFG repo cleaner in the past with great success. I've also recently started tinkering with Git LFS, although I'm still learning.

I had previously .gitignored many large files, and when I started tracking them with LFS, the .git/ folder became huge (mostly due to .git/lfs/objects).

  1. By tracking files with Git LFS, am I going to massively increase my repo size right off the bat? That's been my experience so far, but it seems unnecessary (perhaps).

I see that BFG now supports Git LFS. Its documentation says I can use it to reduce the size of my repo and track the files with LFS.

  2. Is there a way to specify a file size threshold for BFG to clean and then track with LFS, or does it simply perform these actions on all matching files (based on name)?

I've been experimenting to try and find the answers to these questions, but progress has been slow.
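As I understand it, --convert-to-git-lfs selects files by name (a glob), not by size. One workaround is to find out which file names are large anywhere in history and pass those names on. A minimal sketch that reads the output of git rev-list --objects --all piped into git cat-file --batch-check='%(objecttype) %(objectname) %(objectsize) %(rest)'; the threshold is up to you, and any small file that happens to share a selected name would be converted too:

```python
from pathlib import PurePosixPath


def large_file_names(batch_check_output, min_bytes):
    """Names of blobs of at least min_bytes, from git cat-file --batch-check output."""
    names = set()
    for line in batch_check_output.splitlines():
        # Expected format: "<type> <sha> <size> <path>"
        parts = line.split(" ", 3)
        if len(parts) < 4 or parts[0] != "blob" or not parts[3]:
            continue
        if int(parts[2]) >= min_bytes:
            names.add(PurePosixPath(parts[3]).name)
    return sorted(names)


def lfs_glob(names):
    """One glob for --convert-to-git-lfs covering all the given names."""
    if not names:
        raise ValueError("no files above the threshold")
    return names[0] if len(names) == 1 else "{" + ",".join(names) + "}"
```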


Source: (StackOverflow)

How to batch-replace *all* instances (content, filenames and commit messages) of *Foo* to *Bar* in a repo in a single, simple step?

Suppose I have a giant repo for an as-of-yet unpublished software product called "Hammerstein", written by the famous German software company "Apfel" of which I am an employee.

One day, "Apfel" spins out the Hammerstein division and sells it to the even more famous company "Oráculo", which renames "Hammerstein" to "Reineta" as a matter of national pride and decides to open source it.

Agreements mandate that all references to "Hammerstein" and "Apfel" in the repository be replaced by "Reineta" and "Oráculo" respectively.

All filenames, all commit messages, everything must be replaced.

So, for example:

  1. src/core/ApfelCore/main.cpp must become src/core/OraculoCore/main.cpp.

  2. The commit message that says "Add support for Apfel Groupware Server" must become "Add support for Oraculo Groupware Server"

  3. The strings ApfelServerInstance* local_apfel, #define HAMMERSTEIN and Url("http://apfel.de") must become OraculoServerInstance* local_oraculo, #define REINETA, etc.

This applies to files that are not in HEAD anymore as well.

What is the simplest and most pain-free method to achieve it with minimal manual intervention (so that it can be applied in batch to a potentially large number of repositories/assets)?

  1. BFG can replace the strings, but it seems to only have a --delete-files option, not a --rename-file, and even then it does not take patterns as an argument
  2. This approach seems to work only for HEAD and not for the whole history; I have had no luck using it with --tree-filter
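For the content part, BFG's --replace-text option reads a file of expressions, one per line, where old==>new replaces old with new (a line without ==> replaces the match with ***REMOVED***). As far as I know it only rewrites file contents, so file names and commit messages still need another tool, and HEAD stays protected unless --no-blob-protection is given. A sketch that generates such a file; adding lower- and upper-case variants is a heuristic of this example, not something BFG does:

```python
def replacement_lines(mapping):
    """Build lines for a BFG --replace-text file from {old: new} pairs."""
    lines, seen = [], set()
    for old, new in mapping.items():
        if "==>" in old or "\n" in old:
            raise ValueError(f"cannot express {old!r} as a literal expression")
        # Heuristic case variants so local_apfel and APFEL_X are caught as well as Apfel.
        for o, n in ((old, new), (old.lower(), new.lower()), (old.upper(), new.upper())):
            if o not in seen:
                seen.add(o)
                lines.append(f"{o}==>{n}")
    return lines


def write_replacements(path, mapping):
    """Write the expressions file that --replace-text expects (one per line)."""
    with open(path, "w", encoding="utf-8") as fh:
        fh.write("\n".join(replacement_lines(mapping)) + "\n")


# write_replacements("replacements.txt", {"Apfel": "Oraculo", "Hammerstein": "Reineta"})
# then: java -jar bfg.jar --replace-text replacements.txt --no-blob-protection repo.git
```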

Source: (StackOverflow)

Slim down a git repository with bfg

Faced with a coder's haphazard addition of binary files, how can I slim down a Git repository, removing not only the problematic files but also their history?

I tried using BFG, but because it works on a mirrored bare repository I had difficulty putting the whole workflow together and had to gather answers from different places on the web.
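The workflow from BFG's instructions, as I understand it, is: mirror-clone, run BFG, expire the reflog and garbage-collect, then push. A Python sketch that lists those steps as argument lists and only executes them when asked; the URL, jar path and 100M threshold are placeholders. Because BFG protects the latest commit by default, the binaries should first be deleted from HEAD in a normal commit.

```python
import re
import subprocess


def bfg_cleanup_steps(remote_url, bfg_jar, size="100M"):
    """The mirror-clone / BFG / gc / push sequence as argument lists."""
    # Name the mirror directory after the last URL component, with a .git suffix.
    name = re.split(r"[/:]", remote_url.rstrip("/"))[-1]
    repo_dir = name if name.endswith(".git") else name + ".git"
    return [
        ["git", "clone", "--mirror", remote_url, repo_dir],
        ["java", "-jar", bfg_jar, "--strip-blobs-bigger-than", size, repo_dir],
        ["git", "-C", repo_dir, "reflog", "expire", "--expire=now", "--all"],
        ["git", "-C", repo_dir, "gc", "--prune=now", "--aggressive"],
        ["git", "-C", repo_dir, "push"],  # a mirror clone pushes every ref
    ]


def run_steps(steps, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for argv in steps:
        print(" ".join(argv))
        if not dry_run:
            subprocess.run(argv, check=True)
```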


Source: (StackOverflow)

Need to clone repo after using bfg repo-cleaner or pull on existing?

Reading the instructions for BFG Repo-Cleaner, the workflow seems to be:

  1. clone the repo using the --mirror option
  2. strip the repo from unwanted items using bfg
  3. use git gc to physically remove the items
  4. do a push of the cleaned repo

However, it is unclear to me whether I then need to delete my own working copy and do a fresh clone, or whether I can just pull to get the cleaned repo/history. At the moment I am the only one who uses the repo.
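My understanding is that a plain git pull in the old working copy would merge the old history back into the cleaned one (which may be what happened in the .private question above), so the working copy has to be moved onto the rewritten branch instead. A sketch of one way to do that without re-cloning, assuming a single branch, nothing unpushed worth keeping, and a remote named origin. git reset --hard discards local changes, hence the guard; other local branches, tags or stashes that still point at old commits will keep those objects alive.

```python
import subprocess


def has_local_changes(repo_dir):
    """True if `git status --porcelain` reports anything (staged, unstaged or untracked)."""
    out = subprocess.run(
        ["git", "-C", repo_dir, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    ).stdout
    return bool(out.strip())


def refresh_steps(repo_dir, branch="master", remote="origin"):
    """Commands that point the local branch at the cleaned remote branch."""
    git = ["git", "-C", repo_dir]
    return [
        git + ["fetch", remote],                          # fetch the rewritten history
        git + ["checkout", branch],
        git + ["reset", "--hard", f"{remote}/{branch}"],  # drop the old commits locally
        git + ["reflog", "expire", "--expire=now", "--all"],
        git + ["gc", "--prune=now"],                      # delete now-unreferenced objects
    ]


def refresh(repo_dir, branch="master", remote="origin"):
    if has_local_changes(repo_dir):
        raise RuntimeError("commit or stash your changes first")
    for argv in refresh_steps(repo_dir, branch, remote):
        subprocess.run(argv, check=True)
```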


Source: (StackOverflow)

Why is BFG changing my latest commit?

git filter-branch was taking a long time. Happily, I found BFG repo-cleaner.

But it is unexpectedly changing the contents of my last commit.

$ git clone --mirror example.com:/repo.git
$ cd repo.git
$ git log HEAD^!
commit 5f737d28756d4854d25899632abffe7cca2c7423
Author: Paul Draper <paul@example.com>
Date:   Sat Jan 24 19:31:47 2015 -0700

    Fix /contact and /folderEntries/listFoldersSimple
$ git diff --stat HEAD^!
 cake/app/controllers/folder_entries_controller.php |     1 +

And now I clean.

$ java -jar ~/bfg-1.12.0.jar -b 1M
...
In total, 161797 object ids were changed. Full details are logged here:
...
$ git log HEAD^!
commit 3ff700cebe32497423435b416ea11169b7fcbf90
Author: Paul Draper <paul@example.com>
Date:   Sat Jan 24 19:31:47 2015 -0700

    Fix /contact and /folderEntries/listFoldersSimple


    Former-commit-id: 5f737d28756d4854d25899632abffe7cca2c7423
$ git diff --stat HEAD^!
 cake/app/controllers/folder_entries_controller.php |     1 +
 .../lucidchart-tools/caja/ant-jars/guava-r09.jar   |   Bin 0 -> 1141964 bytes
 .../caja/ant-jars/guava-r09.jar.REMOVED.git-id     |     1 -
 cake/app/lucidchart-tools/caja/ant-jars/js.jar     |   Bin 0 -> 1122370 bytes
 .../caja/ant-jars/js.jar.REMOVED.git-id            |     1 -
 .../lucidchart-tools/caja/ant-jars/pluginc-src.jar |   Bin 0 -> 5172676 bytes
 .../caja/ant-jars/pluginc-src.jar.REMOVED.git-id   |     1 -
 .../app/lucidchart-tools/caja/ant-jars/pluginc.jar |   Bin 0 -> 2959487 bytes
 .../caja/ant-jars/pluginc.jar.REMOVED.git-id       |     1 -
 .../lucidchart-tools/caja/ant-jars/xercesImpl.jar  |   Bin 0 -> 1229125 bytes
 .../caja/ant-jars/xercesImpl.jar.REMOVED.git-id    |     1 -
 cake/app/lucidchart-tools/jsdoc/rhino/js.jar       |   Bin 0 -> 1111429 bytes
 .../jsdoc/rhino/js.jar.REMOVED.git-id              |     1 -
 cake/app/lucidchart-tools/selenium/chromedriver    |   Bin 0 -> 5778064 bytes
 .../selenium/chromedriver.REMOVED.git-id           |     1 -
 .../selenium/selenium-server-standalone-2.37.0.jar |   Bin 0 -> 34730734 bytes
 ...ium-server-standalone-2.37.0.jar.REMOVED.git-id |     1 -
 .../selenium-server-standalone-2.42.2-mod.jar      |   Bin 0 -> 34873583 bytes
 ...server-standalone-2.42.2-mod.jar.REMOVED.git-id |     1 -
 .../selenium/selenium-server-standalone-2.42.2.jar |   Bin 0 -> 34823352 bytes
 ...ium-server-standalone-2.42.2.jar.REMOVED.git-id |     1 -
 .../lucidchart-tools/test-runner-1.0-SNAPSHOT.jar  |   Bin 0 -> 9732125 bytes
 .../test-runner-1.0-SNAPSHOT.jar.REMOVED.git-id    |     1 -
 .../CommandLine/Scaffolders/DefaultScaffolder.phar |   Bin 0 -> 4404199 bytes
 .../DefaultScaffolder.phar.REMOVED.git-id          |     1 -
 .../WebPICmdLine/Microsoft.Web.Deployment.dll      |   Bin 0 -> 1201991 bytes
 .../Microsoft.Web.Deployment.dll.REMOVED.git-id    |     1 -
 cake/app/vendors/aws.phar                          |   Bin 0 -> 6784935 bytes
 cake/app/vendors/aws.phar.REMOVED.git-id           |     1 -
 .../tcpdf/fonts/dejavu-fonts-ttf-2.33/status.txt   |  6657 +++++
 .../status.txt.REMOVED.git-id                      |     1 -
 cake/app/vendors/tcpdf/tcpdf.php                   | 28808 +++++++++++++++++++
 cake/app/vendors/tcpdf/tcpdf.php.REMOVED.git-id    |     1 -
 .../img/onboarding-chart/04_shape manager.gif      |   Bin 0 -> 1413721 bytes
 .../04_shape manager.gif.REMOVED.git-id            |     1 -
 cake/app/webroot/img/onboarding-chart/05_share.gif |   Bin 0 -> 1341876 bytes
 .../onboarding-chart/05_share.gif.REMOVED.git-id   |     1 -
 .../js/closure/usage/rhino/javadoc/index-all.html  | 12027 ++++++++
 .../rhino/javadoc/index-all.html.REMOVED.git-id    |     1 -
 cake/app/webroot/js/closure/usage/rhino/js-14.jar  |   Bin 0 -> 1471932 bytes
 .../closure/usage/rhino/js-14.jar.REMOVED.git-id   |     1 -
 cake/app/webroot/js/closure/usage/rhino/js.jar     |   Bin 0 -> 1134765 bytes
 .../js/closure/usage/rhino/js.jar.REMOVED.git-id   |     1 -
 .../js/closure/usage/rhino/testsrc/tests.tar.gz    |   Bin 0 -> 1778543 bytes
 .../rhino/testsrc/tests.tar.gz.REMOVED.git-id      |     1 -
 cake/app/webroot/js/mathquill/font/Symbola.svg     |  5102 ++++
 .../js/mathquill/font/Symbola.svg.REMOVED.git-id   |     1 -
 .../webroot/js/templates/SoyToJsSrcCompiler.jar    |   Bin 0 -> 2154164 bytes
 .../SoyToJsSrcCompiler.jar.REMOVED.git-id          |     1 -
 cake/app/webroot/persona-pages/img/gif-v3.gif      |   Bin 0 -> 1570363 bytes
 .../persona-pages/img/gif-v3.gif.REMOVED.git-id    |     1 -
 .../webroot/persona-pages/img/interactive-gif.gif  |   Bin 0 -> 1434134 bytes
 .../img/interactive-gif.gif.REMOVED.git-id         |     1 -
 cake/build/closure/compiler.jar                    |   Bin 0 -> 6007184 bytes
 cake/build/closure/compiler.jar.REMOVED.git-id     |     1 -
 .../lucidchart-mobile-sliders-landscape-4.png      |   Bin 0 -> 1718536 bytes
 ...t-mobile-sliders-landscape-4.png.REMOVED.git-id |     1 -
 .../lucidchart-mobile-sliders-portrait-4.png       |   Bin 0 -> 1614308 bytes
 ...rt-mobile-sliders-portrait-4.png.REMOVED.git-id |     1 -
 .../Versions/A/OCHamcrestIOS                       |   Bin 0 -> 3671740 bytes
 .../Versions/A/OCHamcrestIOS.REMOVED.git-id        |     1 -
 .../OCMockitoIOS.framework/Versions/A/OCMockitoIOS |   Bin 0 -> 1299132 bytes
 .../Versions/A/OCMockitoIOS.REMOVED.git-id         |     1 -
 .../Versions/A/CrashReporter                       |   Bin 0 -> 1432156 bytes
 .../Versions/A/CrashReporter.REMOVED.git-id        |     1 -
 chart-ios/libFlurry_6.0.0.a                        |   Bin 0 -> 3819300 bytes
 chart-ios/libFlurry_6.0.0.a.REMOVED.git-id         |     1 -
 67 files changed, 52595 insertions(+), 33 deletions(-)

All of these extra files are ones that I want removed.

Why are all these files being changed in my latest commit?
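This pattern matches BFG's documented default of protecting HEAD: the ancestors were cleaned (each large file replaced by a .REMOVED.git-id placeholder), while the latest commit kept its original tree. Diffing the cleaned parent against the untouched HEAD therefore shows every large file being added back and its placeholder deleted. A sketch that lists those files from git diff --no-renames --name-status HEAD^ HEAD, which is easier to parse than the truncated --stat paths:

```python
import subprocess

SUFFIX = ".REMOVED.git-id"


def readded_by_head(name_status_output):
    """Paths that HEAD adds while deleting their BFG placeholder."""
    added, deleted = set(), set()
    for line in name_status_output.splitlines():
        status, _, path = line.partition("\t")
        if status == "A":
            added.add(path)
        elif status == "D":
            deleted.add(path)
    return sorted(p for p in added if p + SUFFIX in deleted)


def head_name_status(repo_dir):
    """Name-status diff between HEAD's parent and HEAD, without rename detection."""
    return subprocess.run(
        ["git", "-C", repo_dir, "diff", "--no-renames", "--name-status", "HEAD^", "HEAD"],
        capture_output=True, text=True, check=True,
    ).stdout
```

Deleting those files in a normal commit before running BFG, or passing --no-blob-protection, would avoid the effect.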


Source: (StackOverflow)

Bitbucket warns that my git repo is too large but I cannot find large files

Bitbucket warns that my Git repository is over 1 GB. In fact, the Repository details page says it is 1.7 GB. That's crazy. I must have included large data files in version control. My local repository is actually 10 GB, which means that I have at least been using .gitignore successfully, to some extent, to exclude big files from version control.

Next, I followed the tutorial here https://confluence.atlassian.com/display/BITBUCKET/Reduce+repository+size and tried to delete unused large data. Running git count-objects -v in the top-level folder of my repo returned the following:

count: 5149
size: 1339824
in-pack: 11352
packs: 2
size-pack: 183607
prune-packable: 0
garbage: 0
size-garbage: 0

The size-pack 183607 KB is much smaller than 1.7 GB. I was a bit perplexed.

Next I downloaded the BFG Repo-Cleaner https://rtyley.github.io/bfg-repo-cleaner and ran java -jar bfg-1.12.3.jar --strip-blobs-bigger-than 100M in the top-level directory to remove files bigger than 100 MB from all commits except the latest. However, BFG returned the following message:

Warning : no large blobs matching criteria found in packfiles 
- does the repo need to be packed?

Repeating the same for 50M resulted in the same.

Does this mean that all the files larger than 50 MB are in the latest commit? In Source code browser in Bitbucket, I looked at folders that contain large data files but those files are not included (successfully ignored).

Could anyone explain briefly what is the source of confusion about the repository size and existence of large files in the repo?
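git count-objects -v reports size and size-pack in KiB, and size counts loose (unpacked) objects only. In the output above, the 5149 loose objects therefore take about 1308 MiB (roughly 1.28 GiB), against about 179 MiB in packs. Since BFG's warning mentions packfiles and asks whether the repo needs to be packed, one plausible reading is that much of the local data is in loose objects that BFG does not scan until they are packed (for example by git gc); this does not by itself explain Bitbucket's 1.7 GB figure. A small sketch of the unit conversion:

```python
def parse_count_objects(output):
    """Parse `git count-objects -v` output (without -H) into a dict of ints."""
    stats = {}
    for line in output.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            stats[key.strip()] = int(value)
    return stats


def kib_to_mib(kib):
    return kib / 1024


if __name__ == "__main__":
    sample = "count: 5149\nsize: 1339824\nin-pack: 11352\npacks: 2\nsize-pack: 183607"
    s = parse_count_objects(sample)
    print(f"loose: {kib_to_mib(s['size']):.0f} MiB, packed: {kib_to_mib(s['size-pack']):.1f} MiB")
    # loose: 1308 MiB, packed: 179.3 MiB
```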


Source: (StackOverflow)

How to verify that BFG Repo-Cleaner has correctly removed a large file from a git repository?

I have used the BFG Repo-Cleaner to remove a large file from a git repository:

java -jar ../bfg-1.11.8.jar --delete-folders escrow application.git
cd application.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
cd ..
mkdir clone
cd clone
git clone file:///home/damian/temp/TCLIPG-4370/test/application.git

I used the script at http://stubbisms.wordpress.com/2009/07/10/git-script-to-show-largest-pack-objects-and-trim-your-waist-line/ to check my repository before and after running BFG Repo-Cleaner. It shows that the escrow directory was removed, and the cleaned repository is also smaller than the original.

Everything looks OK, but how can I verify that all my commits are the same? Would I have to write a script with git for-each-ref that compares the commits behind refs with the same name in the two repositories, to verify that BFG has worked correctly?

Any suggestions would be greatly appreciated.
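Comparing commit ids directly will not work, because the rewritten commits get new ids. A rough alternative, assuming BFG keeps every commit and leaves the author, author date and subject line alone (its footer goes into the message body), is to compare those fields across the two repositories. A sketch that takes the output of git log --all --format=%an%x09%at%x09%s run in each repository:

```python
from collections import Counter


def commit_fingerprints(log_output):
    """Count (author, author timestamp, subject) triples.

    Expects `git log --all --format=%an%x09%at%x09%s` output.
    """
    return Counter(
        tuple(line.split("\t", 2)) for line in log_output.splitlines() if line
    )


def compare_histories(before, after):
    """Fingerprints present in only one of the two logs (empty Counters if none).

    Assumes BFG kept every commit; a real difference shows up as "missing" or "extra".
    """
    b, a = commit_fingerprints(before), commit_fingerprints(after)
    return {"missing": b - a, "extra": a - b}
```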


Source: (StackOverflow)

How to update/shrink the size of my github repo after running BFG Repo Cleaner

I have cleaned my repo with BFG Repo Cleaner using the following procedure:

$ git clone --mirror git://example.com/some-big-repo.git
$ java -jar bfg.jar --strip-biggest-blobs 500 some-big-repo.git
$ cd some-big-repo.git
$ git reflog expire --expire=now --all
$ git gc --prune=now --aggressive
$ git push

I can see that my local repo has shrunk by 1 GB. Great. The problem I'm having now, and haven't been able to find any info on, is that I would also like to shrink the GitHub repo. How can I achieve this?

git push didn't work, and I also tried git push origin --force --all, which gave me this error message: error: --all and --mirror are incompatible
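The error suggests the repository still has the mirror setting from git clone --mirror. In that case remote.origin.mirror is true, a plain git push already behaves like git push --mirror (force-updating every ref), and --all cannot be combined with it. A small sketch that picks the push command from that setting; as far as I know, the size GitHub displays is recalculated on GitHub's side after its own garbage collection, which a push does not trigger directly.

```python
import subprocess


def push_commands(mirror_setting):
    """Choose push commands from `git config --get remote.origin.mirror` output."""
    if mirror_setting.strip().lower() == "true":
        # Plain `git push` already acts as `git push --mirror` here;
        # adding --all is what triggers "--all and --mirror are incompatible".
        return [["git", "push"]]
    return [
        ["git", "push", "--force", "--all", "origin"],
        ["git", "push", "--force", "--tags", "origin"],
    ]


def mirror_setting(repo_dir):
    """Read the setting; `git config --get` exits with status 1 when it is unset."""
    result = subprocess.run(
        ["git", "-C", repo_dir, "config", "--get", "remote.origin.mirror"],
        capture_output=True, text=True,
    )
    return result.stdout
```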


Source: (StackOverflow)

How to delete one folder / directory using BFG repo cleaner?

How do I delete only one directory using BFG?

The help says:

delete folders with the specified names (eg '.svn', '*-tmp' - matches on folder name, not path within repo)

Which seems to mean that --delete-folders "config" will match all folders named config, anywhere in the repository.
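A quick way to preview what a name-based match would hit is to test each directory component of every file path against the pattern. This simulates the behaviour described in the help text; it is not BFG's code, and Python's fnmatch does not understand {a,b} groups. The file list could come from git ls-files:

```python
from fnmatch import fnmatchcase
from pathlib import PurePosixPath


def affected_by_delete_folders(file_paths, pattern):
    """File paths that sit inside some folder whose name (not path) matches pattern."""
    hits = []
    for path in file_paths:
        folders = PurePosixPath(path).parts[:-1]  # directory components only
        if any(fnmatchcase(folder, pattern) for folder in folders):
            hits.append(path)
    return hits
```

For the question's case, affected_by_delete_folders(paths, "config") returns every file under any folder called config, at whatever depth, which is why a unique folder name matters.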


Source: (StackOverflow)