Exploring Ancient DNA from Peru: A Data Scientist’s Analysis of the “Alien Mummy” Claims
Recent discussion of Peru’s mysterious ancient mummies has focused largely on anatomy: finger bones, hips, and overall skeletal structure. Those observations are intriguing and have drawn useful commentary from anthropologists and archaeologists. The genetic makeup of the remains, however, is an equally important line of evidence and deserves its own rigorous scientific examination.
As a data scientist with a grounding in molecular biology and bioinformatics, I set out to investigate the DNA sequences purportedly derived from these enigmatic mummies. Inspired by researchers like Avi Loeb, who urges scientists to examine evidence for themselves rather than wait for official pronouncements, I decided to analyze the available genetic data firsthand.
The Situation: Claims and Data
Recently, a Reddit community shared links to three sequencing datasets associated with the Peru mummies. These datasets, generated by next-generation sequencing (NGS), are said to contain genetic material from the ancient remains. My goal was to determine what the sequences actually contain: human DNA, traces of other organisms, or signs of contamination.
Understanding the Challenge
Genomic data analysis is complex and resource-intensive. A sequencing analysis pipeline involves several steps (quality control, alignment, assembly, classification, and annotation), and each of them can be slow. Over the weekend, some individual steps took up to 13 hours to finish. Even so, I am committed to working transparently and sharing the results with the broader community.
The Workflow
- Data Preparation: I started with raw sequencing reads, performing quality filtering and trimming to prepare the data for analysis.
- Alignment to Human Genome: Using tools like Bowtie2, I aligned the reads to the human reference genome (hg38). This step helps separate human DNA from potential contaminants or non-human content.
- Extraction of Non-Human Reads: Reads that did not align to human DNA were isolated for further analysis, as these could originate from bacteria, environmental contaminants, or other organisms.
- Taxonomic Classification: I employed Kraken2, a fast and accurate classifier, to assign taxonomy to the non-human reads. This step helps identify the biological sources of the genetic material.
- De Novo Assembly: For unaligned reads, I used the MEGAHIT assembler to construct longer contigs, attempting to reconstruct possible genomes or genomic fragments.
- Further
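To make the Data Preparation step concrete, here is a minimal Python sketch of 3' quality trimming followed by length filtering. It assumes Phred+33 quality encoding, simple four-line FASTQ records, a quality cutoff of 20, and a minimum read length of 30; those thresholds and the helper names (`parse_fastq`, `trim_3prime`, `quality_filter`) are illustrative assumptions, not the settings used in this analysis. A real run would use a dedicated trimmer, which also removes adapter sequences, something this sketch does not attempt.

```python
from typing import Iterator, NamedTuple


class FastqRecord(NamedTuple):
    name: str
    seq: str
    qual: str


def parse_fastq(text: str) -> Iterator[FastqRecord]:
    """Yield records from FASTQ text with exactly four lines per record."""
    lines = [line for line in text.splitlines() if line]
    for i in range(0, len(lines) - 3, 4):
        header, seq, plus, qual = lines[i:i + 4]
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ record #{i // 4 + 1}")
        yield FastqRecord(header[1:], seq, qual)


def trim_3prime(rec: FastqRecord, min_q: int = 20) -> FastqRecord:
    """Cut bases from the 3' end while their Phred+33 quality is below min_q."""
    end = len(rec.qual)
    while end > 0 and ord(rec.qual[end - 1]) - 33 < min_q:
        end -= 1
    return FastqRecord(rec.name, rec.seq[:end], rec.qual[:end])


def quality_filter(text: str, min_q: int = 20, min_len: int = 30) -> list[FastqRecord]:
    """Trim every read, then drop reads that end up shorter than min_len."""
    kept = []
    for rec in parse_fastq(text):
        trimmed = trim_3prime(rec, min_q)
        if len(trimmed.seq) >= min_len:
            kept.append(trimmed)
    return kept
```

Trimming from the 3' end only is a deliberate simplification; sliding-window trimming is a common alternative.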
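The alignment to hg38 could be driven from Python along these lines. This is a sketch, not the exact command used here: the index prefix, file names, thread count, and the `--very-sensitive` preset are assumptions, and only single-end reads are handled. Bowtie2's `--un` option writes the reads that fail to align to a separate FASTQ file, which is one convenient way to collect the non-human reads in the same pass.

```python
import shlex


def bowtie2_command(index_prefix: str, reads_fastq: str, sam_out: str,
                    unaligned_out: str, threads: int = 8) -> list[str]:
    """Build a single-end Bowtie2 invocation that also saves unaligned reads."""
    return [
        "bowtie2",
        "-p", str(threads),         # number of alignment threads
        "--very-sensitive",         # slower preset that trades speed for sensitivity
        "-x", index_prefix,         # basename of the hg38 .bt2 index files
        "-U", reads_fastq,          # unpaired (single-end) input reads
        "--un", unaligned_out,      # write reads that fail to align here
        "-S", sam_out,              # alignments in SAM format
    ]


if __name__ == "__main__":
    # Placeholder paths; nothing is executed here, the command is only printed.
    cmd = bowtie2_command("hg38", "trimmed.fastq", "hg38.sam", "nonhuman.fastq")
    print(shlex.join(cmd))
    # With bowtie2 installed and the index built: subprocess.run(cmd, check=True)
```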
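Non-human reads can also be pulled out of an existing alignment by their SAM flags. Bit 0x4 marks a read as unmapped, while bits 0x100 and 0x800 mark secondary and supplementary alignments that would otherwise count the same read twice. The sketch below assumes plain-text SAM input and uses hypothetical helper names; it splits primary records into mapped and unmapped and reports the fraction that aligned to the reference. On the command line, `samtools view -f 4` applies the same unmapped filter.

```python
# SAM FLAG bits used below
UNMAPPED = 0x4
SECONDARY = 0x100
SUPPLEMENTARY = 0x800


def split_by_mapping(sam_text: str) -> tuple[list[str], list[str]]:
    """Return (mapped, unmapped) read names, counting only primary records."""
    mapped: list[str] = []
    unmapped: list[str] = []
    for line in sam_text.splitlines():
        if not line or line.startswith("@"):      # skip blank and header lines
            continue
        fields = line.split("\t")
        qname, flag = fields[0], int(fields[1])
        if flag & (SECONDARY | SUPPLEMENTARY):    # extra alignments of a read
            continue
        if flag & UNMAPPED:
            unmapped.append(qname)
        else:
            mapped.append(qname)
    return mapped, unmapped


def mapped_fraction(sam_text: str) -> float:
    """Fraction of primary records that aligned to the reference (0.0 if none)."""
    mapped, unmapped = split_by_mapping(sam_text)
    total = len(mapped) + len(unmapped)
    return len(mapped) / total if total else 0.0
```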
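Kraken2 can write a summary report through its `--report` option alongside its per-read output. A minimal parser for that report might look like the following. It assumes the standard tab-separated layout (percentage, clade read count, directly assigned read count, rank code, NCBI taxonomy ID, indented scientific name), reads the last three columns from the right so that optional minimizer columns do not shift them, and ranks taxa by clade read count. `parse_kreport`, `top_taxa`, and `unclassified_percent` are hypothetical names.

```python
from typing import NamedTuple


class TaxonRow(NamedTuple):
    percent: float       # % of reads in the clade rooted at this taxon
    clade_reads: int     # reads assigned to this taxon or its descendants
    direct_reads: int    # reads assigned to this taxon itself
    rank: str            # rank code such as U, R, D, P, G, S
    taxid: int           # NCBI taxonomy ID
    name: str            # scientific name with indentation removed


def parse_kreport(text: str) -> list[TaxonRow]:
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) < 6:            # not a report row
            continue
        rows.append(TaxonRow(
            percent=float(parts[0]),
            clade_reads=int(parts[1]),
            direct_reads=int(parts[2]),
            rank=parts[-3].strip(),
            taxid=int(parts[-2]),
            name=parts[-1].strip(),   # leading spaces encode tree depth
        ))
    return rows


def top_taxa(rows: list[TaxonRow], rank: str = "G", n: int = 10) -> list[TaxonRow]:
    """Taxa at exactly the given rank code, largest clade read count first."""
    at_rank = [r for r in rows if r.rank == rank]
    return sorted(at_rank, key=lambda r: r.clade_reads, reverse=True)[:n]


def unclassified_percent(rows: list[TaxonRow]) -> float:
    """Percentage of reads Kraken2 could not classify (0.0 if no U row)."""
    return next((r.percent for r in rows if r.rank == "U"), 0.0)
```

Summarizing at genus level is an arbitrary default here; passing `"S"` lists species instead.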
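MEGAHIT writes its contigs to `final.contigs.fa` inside the output directory (for example after `megahit -r nonhuman.fastq -o megahit_out`, where the file names are placeholders). A quick check on whether the assembly reconstructed anything substantial is to summarize contig lengths, including N50: the largest length L such that contigs of at least L bases together hold half of the assembled sequence. The helpers below are a hypothetical sketch of that summary, not part of MEGAHIT itself.

```python
def fasta_lengths(text: str) -> list[int]:
    """Sequence lengths from FASTA text, allowing line-wrapped sequences."""
    lengths: list[int] = []
    current = None
    for line in text.splitlines():
        line = line.strip()
        if line.startswith(">"):          # a new record starts
            if current is not None:
                lengths.append(current)
            current = 0
        elif line and current is not None:
            current += len(line)
    if current is not None:
        lengths.append(current)
    return lengths


def n50(lengths: list[int]) -> int:
    """Largest L such that contigs of length >= L hold at least half the bases."""
    total = sum(lengths)
    running = 0
    for length in sorted(lengths, reverse=True):
        running += length
        if 2 * running >= total:
            return length
    return 0


def assembly_summary(text: str) -> dict[str, int]:
    lengths = fasta_lengths(text)
    return {
        "contigs": len(lengths),
        "total_bp": sum(lengths),
        "longest": max(lengths, default=0),
        "n50": n50(lengths),
    }
```

For example, `assembly_summary(open(path).read())` on a `final.contigs.fa` file returns the contig count, total assembled length, longest contig, and N50.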
