Decentralized Web Archiving and Replay

Web archiving is the practice of temporal versioning and preservation of representations of resources on the web. Archival replay is the playback of archived historical representations of web resources while preserving their essence and fidelity. Decentralized content-addressable file systems such as the InterPlanetary File System (IPFS) and decentralized naming systems such as the InterPlanetary Name System (IPNS) offer the opportunity to preserve all historical versions of the files stored in them. However, current implementations of IPFS and IPNS are not history-aware and lack native support for versioning.

Our initial work on InterPlanetary Wayback (IPWB) successfully explored the possibilities of web archiving in the early days of IPFS. However, it relied on a local index to operate, which kept its operation centralized even though the archival data itself was decentralized. We address this shortcoming by proposing a potential solution based on IPFS object media types, immutable linked lists, and namespaces that works within the existing IPFS primitives to make web archiving truly decentralized.
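To make the idea of an immutable linked list of archival versions concrete, here is a minimal Python sketch. It illustrates the general technique only and is not IPWB's actual design. Each capture is stored under its content hash, each version node records the capture's hash, a 14-digit timestamp, and the hash of the previous node, and a single mutable pointer plays the role of an IPNS name. The in-memory `BLOCKS` and `NAMES` dictionaries, the `put`, `add_version`, and `history` helpers, the `"example.com"` key, and the use of SHA-256 hex digests in place of real IPFS CIDs are all hypothetical simplifications.

```python
import hashlib
import json
from typing import Optional

# Hypothetical in-memory stand-in for a content-addressed block store such as IPFS.
# Keys are SHA-256 hex digests of the stored bytes (a simplified stand-in for real CIDs).
BLOCKS: dict[str, bytes] = {}


def put(data: bytes) -> str:
    """Store bytes under their content hash and return that hash."""
    digest = hashlib.sha256(data).hexdigest()
    BLOCKS[digest] = data
    return digest


def add_version(content_hash: str, timestamp: str, prev: Optional[str]) -> str:
    """Create an immutable version node pointing to a capture and its predecessor."""
    node = {"content": content_hash, "datetime": timestamp, "prev": prev}
    # sort_keys makes the serialization deterministic, so identical nodes hash identically.
    return put(json.dumps(node, sort_keys=True).encode())


def history(head: Optional[str]) -> list[dict]:
    """Walk the chain from the newest node back to the oldest."""
    versions = []
    while head is not None:
        node = json.loads(BLOCKS[head])
        versions.append(node)
        head = node["prev"]
    return versions


# Hypothetical mutable pointer standing in for an IPNS name: it only ever holds
# the latest head, yet every earlier version stays reachable through "prev" links.
NAMES: dict[str, str] = {}

head = None
for ts, body in [("20190101000000", b"<html>v1</html>"),
                 ("20200101000000", b"<html>v2</html>")]:
    head = add_version(put(body), ts, head)
    NAMES["example.com"] = head

# Prints the 2020 capture first, then the 2019 capture.
for node in history(NAMES["example.com"]):
    print(node["datetime"], BLOCKS[node["content"]].decode())
```

Because each node embeds its predecessor's hash, updating the pointer never discards history: a name that references only the newest node still leads back to every earlier capture, and no separate local index is needed to find them.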

Current web archiving is run primarily by organizations and institutions that control what gets archived and replayed. Emerging web archiving tools allow individuals to archive parts of the web, but their collections often go unused because they lack visibility on the web. We propose a system that democratizes web archiving, allowing everyone from large organizations to individuals to participate in archiving the web and to make their collections discoverable through the decentralized peer-to-peer network.
