mirror of
https://github.com/affaan-m/everything-claude-code.git
synced 2026-05-14 02:10:07 +08:00
167 lines
4.8 KiB
Markdown
---
name: gget
description: gget CLI and Python workflow for quick genomic database queries, sequence lookup, BLAST-style searches, enrichment checks, and reproducible bioinformatics evidence logs.
origin: community
---

# gget

Use this skill when a task needs quick bioinformatics lookups across genomic
reference databases with the `gget` CLI or Python package.

## When to Use

- Finding Ensembl IDs, gene metadata, transcript details, or sequences.
- Running quick BLAST or BLAT lookups without building a full local pipeline.
- Fetching reference genome links and annotations from Ensembl.
- Querying protein structure, pathway, cancer, expression, or disease-association
  modules through a single interface.
- Creating a reproducible first-pass evidence log before moving to heavier
  tools such as Biopython, Snakemake, Nextflow, BLAST+, or database-specific
  clients.

Use a dedicated workflow instead of `gget` when the task requires regulated
clinical interpretation, high-throughput production pipelines, or fine-grained
control over database versions and local indexes.

## Installation

Use a clean Python environment.

```bash
python -m venv .venv
. .venv/bin/activate
python -m pip install --upgrade pip
python -m pip install --upgrade gget
gget --help
```

If `uv` is available:

```bash
uv venv
. .venv/bin/activate
uv pip install gget
```

Before relying on an older environment, upgrade `gget` and re-check the module
docs. The upstream databases queried by `gget` change over time.

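To confirm which `gget` version an existing environment actually has before deciding whether to upgrade, a minimal sketch using only the standard library could look like this. The helper name `installed_version` is hypothetical and is not part of `gget`.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str = "gget") -> Optional[str]:
    """Return the installed version string, or None if the package is absent.

    Hypothetical helper for illustration; it only reads package metadata
    and does not import or run gget itself.
    """
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


version = installed_version()
print(f"gget version: {version or 'not installed'}")
```

The printed value can go straight into the evidence log, so the recorded version is the one that actually ran the query.
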
## Basic Patterns

CLI shape:

```bash
gget <module> [arguments] [options]
```

Python shape:

```python
import gget

result = gget.search(["BRCA1"], species="human")
print(result)
```

Common workflow:

1. Identify the species, assembly, gene ID type, and database needed.
2. Check the current module documentation for arguments.
3. Run a small query first.
4. Save output with an explicit filename and date.
5. Record module name, version, arguments, and database assumptions.

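Steps 4 and 5 of the workflow can be sketched as two small helpers: one that builds a dated output filename, and one that assembles the command line so the exact invocation can be recorded. Both function names and the filename layout are assumptions made for illustration. The `-o` flag is the output option used in this skill's own CLI examples; check the module docs before relying on it for other modules.

```python
import shlex
from datetime import date
from typing import List, Optional, Sequence


def dated_filename(module: str, label: str, ext: str,
                   day: Optional[date] = None) -> str:
    """Build an explicit output name such as 2026-05-11_search_brca1.json."""
    day = day or date.today()
    return f"{day.isoformat()}_{module}_{label}.{ext}"


def build_command(module: str, args: Sequence[str], out_path: str) -> List[str]:
    """Assemble the gget argv list; nothing is executed here."""
    return ["gget", module, *args, "-o", out_path]


out = dated_filename("search", "brca1", "json", date(2026, 5, 11))
cmd = build_command("search", ["-s", "human", "brca1"], out)

# shlex.join gives a copy-pasteable string for the evidence log.
print(shlex.join(cmd))
```

Keeping the command as a list means the same value can be passed to `subprocess.run` and written to the log, so the record cannot drift from what was run.
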
## Common Modules

Use current upstream docs for exact arguments. These modules are common first
choices:

- `gget search`: find Ensembl IDs from search terms.
- `gget info`: retrieve metadata for Ensembl, UniProt, or related IDs.
- `gget seq`: fetch nucleotide or amino-acid sequences.
- `gget ref`: retrieve reference genome download links.
- `gget blast`: run a quick BLAST query.
- `gget blat`: locate a sequence against supported genome assemblies.
- `gget muscle`: run multiple sequence alignment.
- `gget diamond`: run local sequence alignment against reference sequences.
- `gget alphafold` and `gget pdb`: inspect protein-structure references.
- `gget enrichr`, `gget opentargets`, `gget archs4`, `gget bgee`, `gget cbio`,
  and `gget cosmic`: explore enrichment, target, expression, cancer, and disease
  association data.

Do not assume every module supports every Python version or dependency set.
Some optional scientific dependencies have narrower version support than the
core package.

## Quick Examples

Find genes:

```bash
gget search -s human brca1 dna repair -o brca1-search.json
```

Fetch gene metadata:

```bash
gget info ENSG00000012048 -o brca1-info.json
```

Fetch a sequence:

```bash
gget seq ENSG00000012048 -o brca1-seq.fa
```

Run a small BLAST query:

```bash
gget blast "MEEPQSDPSVEPPLSQETFSDLWKLLPEN" -l 10 -o blast-results.json
```

Python example:

```python
import gget

genes = gget.search(["BRCA1", "DNA repair"], species="human")
info = gget.info(["ENSG00000012048"])
sequence = gget.seq("ENSG00000012048")
```

## Reproducibility Log

For scientific outputs, include enough metadata to replay the query.

```markdown
| Date | gget version | Module | Query | Species/assembly | Output | Notes |
| --- | --- | --- | --- | --- | --- | --- |
| 2026-05-11 | output of `gget --version` | search | `BRCA1 DNA repair` | human | `brca1-search.json` | Docs checked before run |
```

Also record:

- Python version and environment manager.
- Any optional dependency installed through `gget setup`.
- Database-specific identifiers returned by the query.
- Whether output is JSON, CSV, FASTA, or a DataFrame export.
- Any failures that were resolved by upgrading `gget`.

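One way to keep log rows consistent is to generate them from a small record type. The sketch below is an assumption about how such a helper might look: `LogEntry` and its field names are hypothetical, and the version string `X.Y.Z` is a placeholder to be replaced with the real output of `gget --version`.

```python
import platform
from dataclasses import dataclass


@dataclass
class LogEntry:
    """One row of the reproducibility table (hypothetical helper)."""
    date: str
    gget_version: str
    module: str
    query: str
    species: str
    output: str
    notes: str = ""

    def to_markdown_row(self) -> str:
        # Query and output file are wrapped in backticks, as in the table above.
        cells = [self.date, self.gget_version, self.module,
                 f"`{self.query}`", self.species, f"`{self.output}`", self.notes]
        # Escape literal pipes so a cell cannot break the table layout.
        cells = [cell.replace("|", "\\|") for cell in cells]
        return "| " + " | ".join(cells) + " |"


entry = LogEntry(
    date="2026-05-11",
    gget_version="X.Y.Z",  # placeholder, not a real version number
    module="search",
    query="BRCA1 DNA repair",
    species="human",
    output="brca1-search.json",
    notes=f"Python {platform.python_version()}",
)
print(entry.to_markdown_row())
```

Here the interpreter version goes into the notes column, which covers the first item of the list above without adding a column to the table.
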
## Review Checklist

- Did you upgrade or verify the installed `gget` version?
- Did you check the current upstream module docs before using arguments?
- Is the species or assembly explicit?
- Are identifiers preserved exactly, including Ensembl/UniProt prefixes?
- Is the result labeled as database output rather than clinical interpretation?
- Is the query reproducible from the saved command or Python snippet?
- Are optional dependencies installed in an isolated environment?

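The identifier check can be partly automated. The sketch below tests whether a string has the general shape of an Ensembl stable ID, which catches common damage such as lowercasing, stray whitespace, or a dropped `ENS` prefix. The pattern is an assumption based on the usual layout (`ENS`, optional species letters, a feature-type code, 11 digits, optional `.version`); it is not an official validator, and it does not cover UniProt accessions or confirm that an ID exists.

```python
import re

# Assumed shape: ENS + optional species letters + feature code + 11 digits
# + optional ".version". Feature codes: E, FM, G, GT, P, R, T.
ENSEMBL_ID = re.compile(r"ENS[A-Z]*(?:E|FM|G|GT|P|R|T)\d{11}(?:\.\d+)?")


def looks_like_ensembl_id(value: str) -> bool:
    """Return True only if the whole string matches the assumed ID shape."""
    return ENSEMBL_ID.fullmatch(value) is not None


candidates = ["ENSG00000012048", "ensg00000012048", " ENSG00000012048", "G00000012048"]
for candidate in candidates:
    print(repr(candidate), looks_like_ensembl_id(candidate))
```

Only the first candidate passes. The others show the kinds of silent edits (case changes, padding, truncation) that the checklist item is meant to catch.
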
## References

- [gget documentation](https://pachterlab.github.io/gget/)
- [gget updates](https://pachterlab.github.io/gget/en/updates.html)
- [gget GitHub repository](https://github.com/pachterlab/gget)
- [gget Bioinformatics paper](https://doi.org/10.1093/bioinformatics/btac836)