Analyzing “Alien Mummy” DNA: A Data Scientist’s Journey into Ancient Genomics
Recent Claims and the Scientific Pursuit
In recent weeks, the discovery and presentation of purported “alien mummies” in Peru have sparked both intrigue and skepticism within the scientific community and the public alike. While many researchers have focused on anatomical analyses—examining finger bones, hips, and other skeletal features—there remains a pressing need to scrutinize the genetic material from these samples. Such genetic investigations can offer critical insights into the origins and nature of these enigmatic remains.
My Approach and Expertise
As a data scientist with a background in molecular biology and bioinformatics, I am examining the genetic data associated with these mummies. My professional work is mainly data analysis and modeling, but I am familiar enough with biological datasets to navigate genomic workflows. Inspired by Avi Loeb’s philosophy that “we don’t need to wait for official confirmation; we can conduct our own investigations,” I set out to analyze the available genetic sequences and evaluate their authenticity and biological origin.
Data Collection and Initial Steps
A Reddit user recently shared links to three genetic sequences claimed to originate from these ancient samples. Seeing an opportunity to apply my skills, I started an independent analysis pipeline. Bioinformatics on sequencing data is resource-intensive: individual steps can run for hours, and their outputs need careful manual inspection. Over the past week, I’ve been building and running pipelines locally, with the aim of extracting meaningful insights from the raw data.
Pipeline Overview
The analysis process encompasses several stages:
- Data Preparation: Formatting and quality control of raw sequencing reads.
- Alignment: Mapping reads to the human reference genome (hg38) using tools like Bowtie2.
- Extraction of Non-Human Reads: Isolating sequences that do not match human DNA, possibly indicating contamination or other organisms.
- Taxonomic Classification: Using Kraken2 to assign these non-human reads to known organisms.
- De Novo Assembly: Constructing longer contigs from unaligned reads with tools like MEGAHIT, facilitating the identification of potential microbial or other genetic material.
- Further Analysis: Applying BLAST searches, k-mer frequency analysis, and binning techniques to classify assembled sequences and assess their biological relevance.
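To make the stages above concrete, here is a minimal sketch of how the alignment, classification, and assembly steps might be chained. It assumes paired-end FASTQ input, a prebuilt Bowtie2 index for hg38, and a local Kraken2 database. The function name, file names, and thread count are hypothetical placeholders, not the exact commands used in this analysis. The function only builds the command lines; it does not run them.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(reads_1, reads_2, bowtie2_index, kraken2_db,
                            out_dir, threads=8):
    """Build (but do not run) commands for three of the pipeline stages.

    Every path and name here is an illustrative assumption.
    """
    out = Path(out_dir)

    # Bowtie2 replaces '%' in the --un-conc path with 1 or 2 (one file per mate).
    unaligned_pattern = out / "nonhuman_R%.fastq"
    unaligned_1 = out / "nonhuman_R1.fastq"
    unaligned_2 = out / "nonhuman_R2.fastq"

    # Alignment: map read pairs to the human reference. --un-conc writes pairs
    # that failed to align concordantly, which is a loose "non-human" filter
    # (it also keeps discordantly aligned pairs).
    align = [
        "bowtie2", "-p", str(threads),
        "-x", str(bowtie2_index),
        "-1", str(reads_1), "-2", str(reads_2),
        "--un-conc", str(unaligned_pattern),
        "-S", str(out / "human_aligned.sam"),
    ]

    # Taxonomic classification of the unaligned pairs with Kraken2.
    classify = [
        "kraken2", "--db", str(kraken2_db), "--threads", str(threads),
        "--paired",
        "--report", str(out / "kraken2_report.txt"),
        "--output", str(out / "kraken2_output.txt"),
        str(unaligned_1), str(unaligned_2),
    ]

    # De novo assembly of the same reads with MEGAHIT. MEGAHIT expects its
    # output directory not to exist yet.
    assemble = [
        "megahit", "-t", str(threads),
        "-1", str(unaligned_1), "-2", str(unaligned_2),
        "-o", str(out / "megahit_assembly"),
    ]
    return [align, classify, assemble]


if __name__ == "__main__":
    commands = build_pipeline_commands(
        "sample_1.fastq", "sample_2.fastq", "hg38_index", "kraken2_db", "results"
    )
    for argv in commands:
        print(shlex.join(argv))
```

Printing the commands before executing them makes it easy to review each step by hand. Each list could then be passed to `subprocess.run(argv, check=True)` once the paths point at real files.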
Collaborative Efforts and Resources
Currently, I’ve partnered with two other bioinformatics experts from the community—both holders of PhDs with Next-Generation Sequencing (