EzDev.org

bfg-repo-cleaner

Removes large or troublesome blobs like git-filter-branch does, but faster. And written in Scala.

BFG Repo-Cleaner by rtyley: a simpler, faster alternative to git-filter-branch for deleting big files and removing passwords from git history.


Git BFG to retroactively enable LFS - protected commits issue

I have large files and was attempting to use the new Git LFS system.

I posted this question - Git lfs - "this exceeds GitHub's file size limit of 100.00 MB"

Edward Thomson correctly identified my issue: you cannot use LFS retroactively. He suggested I use BFG's LFS support.

This worked to a degree. The vast majority of my files were changed. However, there were protected commits that were not altered.

Some of these protected commits contain files over 100.00 MB, which caused a remote: error from GitHub.

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit c7cd871b (protected by 'HEAD') - contains 165 dirty files :
    - Directions_api/Applications/LTDS/Cycling/Leisure/l__cyc.csv (147.3 KB)
    - Directions_api/Applications/LTDS/Cycling/Work/w_cyc.csv (434.0 KB)
    - ...

WARNING: The dirty content above may be removed from other commits, but as
the *protected* commits still use it, it will STILL exist in your repository.

If you *really* want this content gone, make a manual commit that removes it,
and then run the BFG on a fresh copy of your repo.

First of all - can someone explain why these commits are protected and different from those that BFG successfully changed?

Secondly - how can I unprotect these and allow BFG to edit them, thus allowing me to use LFS correctly and finally push successfully to GitHub?


Source: (StackOverflow)

Permanently removing binary files from GitLab repos

We have a GitLab-hosted repo at work that contains some large binary files that we'd like to remove. I know of tools such as BFG Repo-Cleaner which will remove a file from a Git repository.

We often refer to specific commit IDs in GitLab. Would running BFG Repo-Cleaner mess these up?

If so, is there a better way to clean a repo that wouldn't mess these up?


Source: (StackOverflow)

Git/Bitbucket - Remove file from entire history/commits

Following the directions for BFG I proceeded to remove a private file that should not be on the repo/commit history.

https://rtyley.github.io/bfg-repo-cleaner/

I ran

$ bfg --delete-files .private  my-repo.git

and pushed the changes. However, the push made me merge the master branch, and the file still shows up: the code is still in the .private file and all the old commits are still in the history.

How can I remove .private from the entire repo's commit history etc?


Source: (StackOverflow)

Possible to undo permanent deletion of file?

A colleague of mine attempted to permanently remove a file (Diff.java) from the history of our GitHub repo.

He had good reasons for wanting to do this. However, something seems to have gone wrong: we have lost quite a few files, which have been replaced by equivalent files with the suffix .REMOVED.git-id. For example ivy-2.2.0.jar -> ivy-2.2.0.jar.REMOVED.git-id.

I have managed to repair the main development branch as I happened to have a copy locally. However there are many historical branches for development lines and tags for releases that now seem to be broken in the way described above.

I understand that he ran a process similar to:

$ git clone --mirror git://example.com/some-big-repo.git
$ java -jar bfg-1.12.3.jar --strip-biggest-blobs 500 some-big-repo.git
$ cd some-big-repo.git
$ git reflog expire --expire=now --all && git gc --prune=now --aggressive
$ git push

$ cd ..
$ java -jar bfg-1.12.3.jar --delete-files Diff.java some-big-repo.git
$ cd some-big-repo.git
$ git push

I am guessing that the process was destructive, and there is no way to recover unless we happen to have a clean mirror somewhere before this happened. Can anyone confirm or offer some advice?


Source: (StackOverflow)

Any way around a fresh clone when using BFG Repo-Cleaner?

When using BFG Repo-Cleaner is there any way around not having everyone do a fresh clone? With a large team and multiple branches it is difficult to organize this. I am willing to run bfg multiple times should something be reintroduced as long as I don't have to have everyone re-clone the repo.

I'm thinking of removing the files (e.g. private keys) from history, adding them to the .gitignore file, running git push, and having the team rebase their branches.

Hoping Roberto Tyley sees this and can offer some advice.

Cheers!


Source: (StackOverflow)

BFG remove multiple folders

I found that BFG is much faster than the original git-filter-branch.

We have multiple SVN repos to move to an even larger number of git repositories, which implies some repository folder merges and splits. During the process I need to remove a set of root folders, and I'd like to remove them from the whole history.

I tried BFG's --delete-folders option and it works fine for a single folder, but I did not find a way to delete multiple folders. Is it even possible? Or shall I loop and call BFG as many times as I have folders to remove?
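A minimal sketch of the single-call route, assuming that --delete-folders accepts the same brace-alternation globs that BFG's help text shows for --delete-files (e.g. '*.{txt,log}'); that assumption is worth checking against your BFG version. folder_glob and the folder names in the comments are hypothetical, and the helper only builds the glob string; it does not run BFG.

```shell
# Hypothetical helper: join folder names into one glob for --delete-folders.
# One name is printed as-is; several names become {name1,name2,...}.
# Names containing commas or braces are not handled.
folder_glob() {
    if [ "$#" -eq 1 ]; then
        printf '%s\n' "$1"
        return 0
    fi
    glob=""
    for name in "$@"; do
        if [ -z "$glob" ]; then
            glob=$name
        else
            glob="$glob,$name"
        fi
    done
    printf '{%s}\n' "$glob"
}

# Intended use (not executed here, needs Java and bfg.jar):
#   java -jar bfg.jar --delete-folders "$(folder_glob docs-old vendor-tmp)" my-repo.git
```

BFG matches on folder name, not on path within the repo, so folders with the same name at any depth would be affected.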

Thanks for any help.


Source: (StackOverflow)

How to shrink Git repo and better manage large files

I've used the BFG repo cleaner in the past with great success. I've also recently started tinkering with Git LFS, although I'm still learning.

I had previously .gitignored many large files, and when I started tracking them with LFS, the .git/ folder became huge (due to .git/lfs/objects, mostly).

  1. By tracking files with Git LFS, am I going to massively increase my repo size right off the bat? That's been my experience so far, but it seems unnecessary (perhaps).

I see that BFG now supports Git LFS. It says I can use that command to reduce the size of my repo, and track with LFS.

  2. Is there a way to specify the size of the file to clean with BFG then track with LFS, or does it simply perform these actions for all files matched (based on name)?

I've been experimenting to try and find the answers to these questions, but progress has been slow.


Source: (StackOverflow)

How to batch-replace *all* instances (content, filenames and commit messages) of *Foo* to *Bar* in a repo in a single, simple step?

Suppose I have a giant repo for an as-of-yet unpublished software product called "Hammerstein", written by the famous German software company "Apfel" of which I am an employee.

One day, "Apfel" spins out the Hammerstein division and sells it to the even more famous company "Oráculo" which renames "Hammerstein" to "Reineta" as a matter of national pride and decides to open source it.

Agreements mandate that all references to "Apfel" and "Hammerstein" be replaced by "Oráculo" and "Reineta", respectively, in the repository.

All filenames, all commit messages, everything must be replaced.

So, for example:

  1. src/core/ApfelCore/main.cpp must become src/core/OraculoCore/main.cpp.

  2. The commit message that says "Add support for Apfel Groupware Server" must become "Add support for Oraculo Groupware Server"

  3. The strings ApfelServerInstance* local_apfel, #define HAMMERSTEIN and Url("http://apfel.de") must become OraculoServerInstance* local_oraculo, #define REINETA, etc.

This applies to files that are not in HEAD anymore as well.

What is the simplest and most pain-free method to achieve it with minimal manual intervention (so that it can be applied in batch to a potentially large number of repositories/assets)?

  1. BFG can replace the strings, but it seems to only have a --delete-files option, not a --rename-files one, and even then it does not take patterns as an argument
  2. This approach seems to work only for HEAD and not for the whole history; I have had no luck using it with --tree-filter

Source: (StackOverflow)

Git Merge Duplication after Ineffective BFG Use

I have somehow deeply borked my entire repository (used only by me) and could use some assistance in sorting it out.

Here is what I did. I realized that in my commit history, there were some files containing credentials that I did not want just lying around. So, I decided to be legit and try to use the BFG Repo-Cleaner to fix these issues. I threw all the credentials in .gitignores, and moved on to trying to scrub them out of the history. As per the documentation instructions, I executed these commands:

git clone --mirror myrepo.git
java -jar bfg.jar --delete-files stuffthatshouldbedeleted.txt  myrepo.git

At this point, BFG told me that x number of files had been found and removed. Sweet.

cd myrepo.git
git reflog expire --expire=now --all
git gc --prune=now --aggressive
git push

According to the terminal logs, it updated the repo. So far so good, right? I pop into my github account, and after a few clicks, find the credentials still there, file and all, in my history. I go back and try the same set of commands, but using this line instead of the file remover:

java -jar bfg.jar --replace-text passwords.txt  myrepo.git

where passwords.txt is a file containing string instances of all the credentials I would like gone. Again, BFG logs indicate that there are several instances that it has fixed. I push up, check, and the credentials are still there, sitting in Github. I notice that the SHA-1 keys for all of my commits have been altered, so presumably BFG did something, just not the thing I want it to do.

At this point, I give up and try to get back to work, figure I'll sort it out later. I do some work, try to push up, get a weird merge conflict (you are 50 ahead and 50 behind on commits). What? I try to pull and merge, and suddenly, every single commit in my git history is duplicated in name, and some of them are just blank. I check my Github network graph, and it looks like there is a second branch starting from my initial commit that exactly mirrors all of my commits that has been zippered in with my last commit (I have never branched, just been linearly chugging along).
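Because BFG rewrites commit IDs, an old local branch and the rewritten remote branch can each hold commits the other lacks, which would explain an equal ahead/behind count. A minimal sketch for checking this before pulling, using only standard git commands; compare_histories and the ref names in the comments are hypothetical.

```shell
# Print "<commits only in local> <commits only in remote>", then report
# whether the two refs share any common ancestor.
compare_histories() {
    repo_dir=$1      # path to the working clone
    local_ref=$2     # e.g. master
    remote_ref=$3    # e.g. origin/master (run git fetch first)
    git -C "$repo_dir" rev-list --left-right --count "$local_ref...$remote_ref" |
        tr '\t' ' '
    # git merge-base exits non-zero when the histories are unrelated.
    if git -C "$repo_dir" merge-base "$local_ref" "$remote_ref" >/dev/null 2>&1; then
        echo "shared ancestor: yes"
    else
        echo "shared ancestor: no"
    fi
}
```

If both counts are non-zero right after a history rewrite, merging the two refs keeps both copies of the history. Resetting the local branch to the rewritten remote avoids that, at the cost of any unpushed local work.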

I can't revert to a previous commit, because they are all chronologically duplicated. My credentials are still in there, with twice as many instances now, and my history is doubled and very confusing to try to understand. When I try to run BFG from the beginning now, cloning and mirroring the repo anew, it tells me that there are no credentials in it, despite the fact that I can see them in Github. I could really use some help in understanding what happened, and how, if at all, I can get back to a state of things again.

I am considering just deleting the entire repo and starting anew. I really don't want to do that.

tl;dr: Tried using BFG, somehow duplicated half-baked versions of all commits in my repo, can't untangle, and to add insult to injury, BFG did nothing and claims it's done its job.


Source: (StackOverflow)

BitBucket repo larger after using BFG repo cleaner

My BitBucket repo had a whole bunch of large files in it that weren't needed. I removed them and then wanted to clear them out of the history to shrink the repo, which had gotten too big.

I ran BFG repo cleaner which reported 1755 files found and processed - all the ones I was expecting.

Ran the final git gc as instructed here: https://rtyley.github.io/bfg-repo-cleaner/

All fine - the .git folder shrunk to 17% of its original size. Pushed it back up and the repo size as reported by BitBucket actually got larger!
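A small sketch for recording the local figure so it can be compared before and after the cleanup, assuming the git gc step has already packed the repo. packed_kib is a hypothetical helper; it reads the size-pack field of git count-objects -v, which git reports in KiB. It says nothing about how the hosting service arrives at its own number, which is computed server-side.

```shell
# Print the total size of the repo's pack files in KiB.
packed_kib() {
    repo_dir=$1   # path to a bare or non-bare repository
    git -C "$repo_dir" count-objects -v |
        awk '$1 == "size-pack:" { print $2 }'
}

# Intended use: run once before BFG and once after the final git gc,
# then compare the two numbers.
#   packed_kib my-repo.git
```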

Not sure what went wrong as all seemed to behave correctly up to that point.

Any advice gratefully received as I really don't want to have to recreate the repo to bring down the size.

Thanks


Source: (StackOverflow)

Inspect git repo after using BFG repo-cleaner

Very basic git question:

I uploaded some compromising information to Github and am using bfg to clean the repo. I followed the documentation and performed the following actions:

$ git clone --mirror git://example.com/some-big-repo.git
$ bfg --replace-text passwords.txt  my-repo.git

I received the following output:

Found 233 objects to protect
Found 9 commit-pointing refs : HEAD, refs/heads/experimental, refs/heads/master, ...

Protected commits
-----------------

These are your protected commits, and so their contents will NOT be altered:

 * commit 497fc1c8 (protected by 'HEAD')

Cleaning
--------

Found 80 commits
Cleaning commits:       100% (80/80)
Cleaning commits completed in 301 ms.

BFG aborting: No refs to update - no dirty commits found??

I'd like to see if the private information was cleared from my repo but I'm not sure how to check the files in the mirrored repo. Any ideas?
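One way to check, sketched with standard git only: a mirrored (bare) repo has no working tree, but git log can still search every ref. find_in_history is a hypothetical helper, and the search string in the comment is a placeholder. It relies on git log -S, which lists commits whose changes add or remove occurrences of a literal string.

```shell
# List every commit, on any ref, that adds or removes the given literal string.
# Works in a bare/mirror clone because it never touches a working tree.
find_in_history() {
    repo_dir=$1   # e.g. my-repo.git
    needle=$2     # the exact text that should no longer appear
    git -C "$repo_dir" log --all --oneline -S"$needle"
}

# Intended use:
#   find_in_history my-repo.git 'the-leaked-password'
```

An empty result suggests the string no longer appears in any reachable commit. Note that BFG reports leaving the protected HEAD commit unaltered, so a hit on that commit would still be expected if the file has not been fixed there.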


Source: (StackOverflow)

Remove unused large files from Git within a range

My repo is forked from an open sourced project, so I don't want to modify the commits before the ForkPoint tag. I've tried the BFG Repo Cleaner but it doesn't allow me to specify a range.

I want to

  1. Go through the history in ForkPoint..HEAD^
  2. Rewrite the commits to delete all files larger than 10M

The question "How to remove unused objects from a git repository?" says it should be something like this:

BADFILES=$(find . -type f -size +10M -exec echo -n "'{}' " \;)
git filter-branch --index-filter \
"git rm -rf --cached --ignore-unmatch $BADFILES" ForkPoint..HEAD^

but wouldn't BADFILES only contain the files that exist in HEAD?

For instance, if I've mistakenly committed a HUGE_FILE then later made another commit that removes that file, the BADFILES search wouldn't find the HUGE_FILE since find doesn't see it in the current working tree.
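A sketch of one way around that limitation: ask git for the objects in the range instead of asking find for files in the working tree, so a file that was added and later deleted is still listed. large_blobs_in_range is a hypothetical helper; the range and threshold in the comments are taken from the question. It uses git rev-list --objects and git cat-file --batch-check, and the object list for a range is approximate at the boundary, so treat the output as candidates to review.

```shell
# Print "<size-in-bytes> <path>" for every blob in a revision range that is
# larger than min_bytes, largest first. Run from inside the repository.
large_blobs_in_range() {
    range=$1       # e.g. ForkPoint..HEAD^
    min_bytes=$2   # e.g. 10485760 for 10 MiB
    git rev-list --objects "$range" |
        git cat-file --batch-check='%(objecttype) %(objectsize) %(rest)' |
        # Keep blobs only; substr drops the leading "blob " (5 characters).
        awk -v min="$min_bytes" '$1 == "blob" && $2 + 0 > min + 0 { print substr($0, 6) }' |
        sort -rn
}

# Intended use:
#   large_blobs_in_range 'ForkPoint..HEAD^' 10485760
```

The paths it prints could then feed the git rm --cached --ignore-unmatch call in the index filter, in place of the find-based BADFILES list.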


Edit1: Now I'm considering using BFG on a clone, then moving my fork onto the original ForkPoint. Would this be the right command, given fatRepo and slimRepo?

mkdir merger ; cd merger ; git init
git remote add fat  ../fatRepo
git remote add slim ../slimRepo
git fetch --all
git checkout fat/ForkPoint
git cherry-pick slim/ForkPoint..slim/branchHead

Edit2: Cherry-picking didn't work because cherry-picking can't handle merges in slimRepo. Can I somehow crush down the history of slimRepo, and simply merge onto fatRepo/ForkPoint?

git <turn into a single commit> slim/rootNode..slim/ForkPoint
git checkout fat/ForkPoint
git merge slim/branchHead

Source: (StackOverflow)

Git Repository Only Gets Bigger After Using BFG

We are currently in the process of migrating our SVN repo to Git (hosted at Bitbucket). I used SubGit to import all our branches/history into a bare repo I have locally on my (Windows) PC.

The repo is quite big (7.42 GB after the import). This is because it also contains information about SVN, such as revision numbers, to allow a two-way sync between Git and SVN (I'm only interested in a one-way SVN-to-Git migration).

I created a local clone of the imported bare repo and pushed all the branches to Bitbucket. After a couple of hours (!) the repo was fully uploaded. Bitbucket now gave me warnings about the repo size. I checked the size and it was 1.1 GB. That's not as big as the imported bare repo, but still too big to have a fast repository.

After playing around with BFG I managed to remove some large DLL/SQL export files using these commands on the bare repo (I only use the clone for pushing without all the svn-related refs):

java -jar bfg.jar --delete-files '{''specialized 2015''','''specialized,''insert-pcreeks''}.sql' --no-blob-protection

java -jar bfg.jar --delete-files 'Incara.*.dll' --no-blob-protection Incara.git

git reflog expire --expire=now --all && git gc --prune=now --aggressive

This took a while, and afterwards the git_find_big.sh script no longer showed these large SQL files. But after pushing things back to Bitbucket (as a new repo, not as a force push) it only got bigger (1.8 GB).

Can you provide a possible explanation for this behavior?

I don't know if it matters but we used a non standard branch/tag model in svn. This resulted in branches like: /refs/heads/archive/some/path/to/branch. These branches seemed to work just fine and removing them also did not affect the size.

Besides these problems, I noticed I had some XML files showing up in the git_find_big.sh output:

size,pack,SHA,location
12180,1011,56731c772febd7db11de5a66674fe6a1a9ec00a7 repository/frontend.xml
12074,1002,0cefaee608c06621adfa4a9120ed7ef651076c33 repository/frontend.xml
12073,1002,a1c36cf49ec736a7fc069dcc834b784ada4b6a06 repository/frontend.xml
12073,1002,1ba5bd92817347739d3fba375fc42641016a5c1d repository/frontend.xml
12073,1002,e9182762bfc5849bc6645fdd6358265c3930779f repository/frontend.xml
12073,1002,dff5733d67cb0306534ac41a4c55b3bbaa436a2e repository/frontend.xml
12072,1002,8ee628f645ce53d970c3cf9fdae8d2697224e64c repository/frontend.xml
12072,1002,1266dee72b33f7a05ca67488c485ea8afc323615 repository/frontend.xml

These files contain the frontend logic of the web platform we are using and are indeed quite big. But they should be treated as text, right? So I don't get why they show up as separate objects in the above output. Am I right that this should not be happening?

The SVN import also resulted in some empty commits (for example when SVN creates or moves a branch it needs a new commit). I guess these can only be removed using filter-branch?

Sorry, I have a lot of questions! Could someone help me with this?

Thanks,

Piet


Source: (StackOverflow)

Slim down a git repository with bfg

A coder has been adding binary files anarchically. How can I slim down the git repository, removing not only the problematic files but also their history in the tree?

I tried using bfg, but because it works on a mirrored bare repository I had difficulty piecing together the whole workflow, and needed to gather answers from different places on the web.


Source: (StackOverflow)

Need to clone repo after using bfg repo-cleaner or pull on existing?

Reading the instructions for bfg repo-cleaner, the workflow seems to be:

  1. clone the repo using the --mirror option
  2. strip unwanted items from the repo using bfg
  3. use git gc to physically remove the items
  4. do a push of the cleaned repo

However, then it is unclear to me whether you need to remove your own copy of the working directory and do a fresh clone, or whether you can just do a pull to get the clean repo/history? At the moment I am the only one who uses the repo.
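For a single-user repo, one commonly suggested alternative to a fresh clone is to fetch the rewritten history and hard-reset the local branch onto it, rather than pulling (a pull would merge the old and new histories together). A minimal sketch under that assumption; resync_to_remote is a hypothetical helper, and it discards local commits and uncommitted changes on the current branch.

```shell
# Make an existing clone's current branch match the rewritten remote branch.
# WARNING: throws away unpushed commits and uncommitted changes.
resync_to_remote() {
    clone_dir=$1          # path to the existing working clone
    remote=${2:-origin}   # remote that received the cleaned history
    branch=$(git -C "$clone_dir" rev-parse --abbrev-ref HEAD) || return 1
    # The default fetch refspec allows forced (non-fast-forward) updates.
    git -C "$clone_dir" fetch --quiet "$remote" || return 1
    git -C "$clone_dir" reset --quiet --hard "$remote/$branch"
}

# Intended use:
#   resync_to_remote ~/work/my-repo
```

The old objects stay in the local clone's object store until its reflog entries expire and git gc prunes them, so a fresh clone remains the simplest way to be sure nothing from the old history lingers.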


Source: (StackOverflow)