Question Details

No question body available.

Tags

python excel web-scraping beautifulsoup html-parsing

Answers (2)

January 1, 2026 Score: 0 Rep: 71,727 Quality: Medium Completeness: 80%

The solution below traverses the page by anchoring the search on the h2 headings that contain a span.mw-headline (which excludes the main introductory text), and then checks whether the following sibling element holds the memory names and descriptions:

import requests, bs4
from bs4 import BeautifulSoup as soup
import csv

def get_data():
    d = requests.get('https://deadbydaylight.fandom.com/wiki/Tome_1_-_Awakening').text
    page = soup(d, 'html.parser')
    # get all body content, skipping bare text nodes
    vals = [i for i in page.select_one('.mw-content-ltr.mw-parser-output').children
            if not isinstance(i, bs4.NavigableString)]
    results = []
    while vals:
        node = vals.pop(0)
        if node.name == 'h2' and node.select_one('span.mw-headline'):
            # lookahead to see if the subsequent sibling stores both tabs and content
            if vals and getattr(node2 := vals.pop(0), 'attrs', {}).get('class', []) == ['tabber', 'wds-tabber']:
                results.append([node.get_text(strip=True), ''])
                tabs = [i.get_text(strip=True) for i in node2.select('ul.wds-tabs > li.wds-tabs__tab')]
                text = [i.get_text(strip=True) for i in node2.select('div.wds-tab__content i')]
                results.extend([*map(list, zip(tabs, text))])
    return results

# newline='' keeps the csv module from writing blank rows on Windows
with open('memory_titles.csv', 'w', newline='', encoding='utf-8') as f:
    write = csv.writer(f)
    write.writerows(get_data())

You can see the contents of memory_titles.csv here.
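
To try the heading-plus-lookahead idea without hitting the network, here is a minimal sketch that runs the same kind of selectors against a small inline snippet. The sample HTML, the function name rows_from_html, and the exact class names (wds-tabs__tab, wds-tab__content) are assumptions made for illustration, and the real page markup may differ. It also pairs each element with its next sibling via zip rather than popping from a list, and tests for the wds-tabber class by membership instead of exact equality, so it is a variant of the approach above rather than a copy of it.

```python
from bs4 import BeautifulSoup
from bs4.element import Tag

# Hypothetical markup loosely modelled on a Fandom tabber section.
SAMPLE_HTML = """
<div class="mw-parser-output">
  <p>Intro text that should be skipped.</p>
  <h2><span class="mw-headline">First Heading</span></h2>
  <div class="tabber wds-tabber">
    <ul class="wds-tabs">
      <li class="wds-tabs__tab">Tab A</li>
      <li class="wds-tabs__tab">Tab B</li>
    </ul>
    <div class="wds-tab__content"><i>Text A</i></div>
    <div class="wds-tab__content"><i>Text B</i></div>
  </div>
  <h2><span class="mw-headline">Heading Without Tabs</span></h2>
  <p>Plain paragraph.</p>
</div>
"""

def rows_from_html(html):
    """Return [heading, ''] rows followed by [tab, text] rows for each tabbed section."""
    page = BeautifulSoup(html, "html.parser")
    container = page.select_one(".mw-parser-output")
    # Keep only element nodes so whitespace between tags is ignored.
    tags = [child for child in container.children if isinstance(child, Tag)]
    rows = []
    # Pair every element with the element that directly follows it.
    for node, following in zip(tags, tags[1:]):
        if node.name != "h2" or node.select_one("span.mw-headline") is None:
            continue
        if "wds-tabber" not in following.get("class", []):
            continue  # heading is not followed by a tab container
        rows.append([node.get_text(strip=True), ""])
        tabs = [li.get_text(strip=True) for li in following.select("ul.wds-tabs > li.wds-tabs__tab")]
        texts = [i.get_text(strip=True) for i in following.select("div.wds-tab__content i")]
        rows.extend([tab, text] for tab, text in zip(tabs, texts))
    return rows

if __name__ == "__main__":
    for row in rows_from_html(SAMPLE_HTML):
        print(row)
```

Because the pairing does not consume the lookahead element, a heading that directly follows another heading is still examined on its own turn.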

November 29, 2025 Score: 0 Rep: 1 Quality: Low Completeness: 40%

I would recommend using the MediaWiki API, which Fandom supports, for this task. Wikitext may be easier to parse than HTML for the use case you describe. I can't really tell what operation you're trying to perform, but whatever it is, it's probably easier to do on the wikitext markup than on the rendered HTML.

https://deadbydaylight.fandom.com/api.php?action=parse&page=Tome_1_-_Awakening&prop=wikitext&formatversion=2

I got the API endpoint above from method 3 on this page of the MediaWiki API documentation: https://www.mediawiki.org/wiki/API:Get_the_contents_of_a_page
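
For what it's worth, here is a rough sketch of calling that endpoint from Python using only the standard library. The helper names (build_parse_url, extract_wikitext, fetch_wikitext), the User-Agent string, and the page title passed at the bottom are my own assumptions, not anything taken from the wiki or its documentation. The format=json parameter is added so the response comes back as JSON. With formatversion=2 the wikitext should arrive as a plain string under parse.wikitext.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request, urlopen

API_URL = "https://deadbydaylight.fandom.com/api.php"

def build_parse_url(page, api_url=API_URL):
    """Build an action=parse URL that asks for the page's wikitext as JSON."""
    params = {
        "action": "parse",
        "page": page,
        "prop": "wikitext",
        "format": "json",
        "formatversion": "2",
    }
    return f"{api_url}?{urlencode(params)}"

def extract_wikitext(payload):
    """Pull the wikitext string out of a decoded API response."""
    if "error" in payload:
        # The API reports problems (for example a missing page) in an "error" object.
        raise ValueError(payload["error"].get("info", "unknown API error"))
    return payload["parse"]["wikitext"]

def fetch_wikitext(page):
    """Download and return the wikitext of one page (performs a network request)."""
    request = Request(
        build_parse_url(page),
        headers={"User-Agent": "wikitext-example/0.1"},  # hypothetical client name
    )
    with urlopen(request, timeout=30) as response:
        return extract_wikitext(json.load(response))

if __name__ == "__main__":
    # Assumed page title; adjust it if the wiki names the page differently.
    print(fetch_wikitext("Tome_1_-_Awakening")[:500])
```

Keeping URL construction and response handling in separate functions means both can be checked without touching the network.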