This is a memo on how to measure the number of lines in a source code repository.
Introduction
While reading source code, I sometimes wonder how many lines the entire repository has.
Here's a memo on how to do it.
Note: This article was translated from my original post.
How to Count Lines of Source Code
Count Lines with a Command
You can quickly count the total number of lines in a git repository with the following command:
    git ls-files | xargs wc -l
Here, git ls-files lists all files tracked in the git repository, and xargs wc -l counts the lines in each file and prints a total.
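One caveat with this pipeline: xargs splits its input on whitespace, so paths containing spaces are misread, and on very large repositories xargs may run wc more than once, which prints more than one "total" line. Below is a minimal sketch of a more defensive variant. It assumes your xargs supports -0 (the GNU and BSD versions do), and total_lines is a hypothetical helper name, not part of git.

```shell
# Sum the line counts of every path read from stdin.
# Paths must be NUL-delimited so that spaces in names are preserved.
total_lines() {
  # cat concatenates all files; wc -l then counts newlines once,
  # so only a single grand total is printed.
  xargs -0 cat -- | wc -l | awk '{ print $1 }'
}

# Usage inside a git repository:
#   git ls-files -z | total_lines
```

This prints only the grand total, not the per-file breakdown. Like the original pipeline, it counts every tracked file, including binary ones.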
If you want to count only specific files, you can use grep to filter them, like this:
    git ls-files | grep '\.go$' | xargs wc -l
In this example, only files with the .go extension are counted. The trailing $ anchors the match to the end of the path, so names such as main.go.bak are excluded.
You can select any set of files by changing the regular expression passed to grep.
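Anchoring matters when filtering by extension: an unanchored pattern such as '\.go' also matches names like notes.go.bak. The sketch below illustrates the difference. filter_ext is a hypothetical helper name, and it assumes the extension contains no regular expression metacharacters.

```shell
# Keep only the paths from stdin that end in the given extension.
# $1 is the extension without the leading dot, e.g. "go" or "py".
filter_ext() {
  grep -E "\\.$1\$"
}

# Usage inside a git repository:
#   git ls-files | filter_ext go | xargs wc -l
```

Given the paths main.go, go.mod, notes.go.bak and cmd/tool.go, only main.go and cmd/tool.go pass the filter.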
As a concrete example, let's count the lines of .py files in the requests library.
    # git clone https://github.com/psf/requests.git
    # cd requests
    git ls-files | grep '\.py$' | xargs wc -l
       86 docs/_themes/flask_theme_support.py
      386 docs/conf.py
      132 setup.py
      180 src/requests/__init__.py
       14 src/requests/__version__.py
       50 src/requests/_internal_utils.py
      540 src/requests/adapters.py
      157 src/requests/api.py
      314 src/requests/auth.py
       17 src/requests/certs.py
       79 src/requests/compat.py
      561 src/requests/cookies.py
      151 src/requests/exceptions.py
      134 src/requests/help.py
       33 src/requests/hooks.py
     1032 src/requests/models.py
       30 src/requests/packages.py
      831 src/requests/sessions.py
      128 src/requests/status_codes.py
       99 src/requests/structures.py
     1094 src/requests/utils.py
       14 tests/__init__.py
       23 tests/compat.py
       58 tests/conftest.py
        8 tests/test_adapters.py
       27 tests/test_help.py
       22 tests/test_hooks.py
      428 tests/test_lowlevel.py
       13 tests/test_packages.py
     2839 tests/test_requests.py
       78 tests/test_structures.py
      165 tests/test_testserver.py
      926 tests/test_utils.py
        0 tests/testserver/__init__.py
      134 tests/testserver/server.py
       17 tests/utils.py
    10800 total
The .py files total 10,800 lines. Because wc -l counts every line, this figure includes blank lines and comments.
Count Lines with Tools
There are many tools available to count lines of source code, and even a quick search reveals several popular ones:
- GitHub - AlDanial/cloc: cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
- GitHub - XAMPPRocky/tokei: Count your code, quickly.
- GitHub - boyter/scc: Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go
We'll try the well-known cloc.
First, install cloc using your preferred package manager:
    npm install -g cloc              # https://www.npmjs.com/package/cloc
    sudo apt install cloc            # Debian, Ubuntu
    sudo yum install cloc            # Red Hat, Fedora
    sudo dnf install cloc            # Fedora 22 or later
    sudo pacman -S cloc              # Arch
    sudo emerge -av dev-util/cloc    # Gentoo https://packages.gentoo.org/packages/dev-util/cloc
    sudo apk add cloc                # Alpine Linux
    doas pkg_add cloc                # OpenBSD
    sudo pkg install cloc            # FreeBSD
    sudo port install cloc           # macOS with MacPorts
    brew install cloc                # macOS with Homebrew
    winget install AlDanial.Cloc     # Windows with winget
    choco install cloc               # Windows with Chocolatey
    scoop install cloc               # Windows with Scoop
Now, let's count the requests library using cloc:
    # git clone https://github.com/psf/requests.git
    # cd requests
    cloc .
          93 text files.
          81 unique files.
          23 files ignored.

    github.com/AlDanial/cloc v 2.02  T=0.28 s (285.3 files/s, 60468.6 lines/s)
    -------------------------------------------------------------------------------
    Language                     files          blank        comment           code
    -------------------------------------------------------------------------------
    Python                          35           1997           1994           6809
    reStructuredText                16            858            243           1931
    Markdown                         9            589              6           1626
    DOS Batch                        1             34              2            227
    make                             2             34              7            202
    YAML                             9             33             36            177
    CSS                              1             32              2            143
    HTML                             3             30              3            114
    INI                              1              3              0             15
    Text                             2              0              0             10
    TOML                             1              1              0              9
    SVG                              1              0              0              1
    -------------------------------------------------------------------------------
    SUM:                            81           3611           2293          11264
    -------------------------------------------------------------------------------
You can see the number of files, blank lines, comment lines, and code lines per language.
For Python code:
1997 (blank) + 1994 (comment) + 6809 (code) = 10800 (total)
This matches the 10,800 total from the wc -l command earlier, which counted blank and comment lines along with code.
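The same check can be scripted. Below is a rough sketch that assumes cloc's --csv output uses the column order files,language,blank,comment,code (verify this against your cloc version). lang_total is a hypothetical helper name.

```shell
# Print blank + comment + code for one language, reading cloc CSV from stdin.
# $1 is the language name exactly as cloc prints it, e.g. "Python".
lang_total() {
  awk -F, -v lang="$1" '$2 == lang { print $3 + $4 + $5 }'
}

# Usage inside a repository (requires cloc):
#   cloc --csv --quiet . | lang_total Python
```

For the Python row shown above (1997 blank, 1994 comment, 6809 code), this prints 10800.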
cloc can also count the code at a specific git commit or report the difference between two versions. For details, see the official cloc repository.
Conclusion
That's a quick memo on how to count the number of source code lines in a git repository.
Hope it helps someone!
References
- Can you get the number of lines of code from a GitHub repository? - Stack Overflow
- bash - Count number of lines in a git repository - Stack Overflow
- GitHub - AlDanial/cloc: cloc counts blank lines, comment lines, and physical lines of source code in many programming languages.
- GitHub - XAMPPRocky/tokei: Count your code, quickly.
- GitHub - boyter/scc: Sloc, Cloc and Code: scc is a very fast accurate code counter with complexity calculations and COCOMO estimates written in pure Go