As part of our entry to the 2009 Billion Triple Challenge (BTC), we have been using two pieces of great infrastructure: Amazon Web Services and the quad store 4store. Today, we are making an Amazon Machine Image (AMI) for 4store publicly available. Additionally, we are making available an Elastic Block Store (EBS) snapshot of the BTC dataset for 4store. Developers can thus easily get started using 4store with a billion triples on Amazon’s cloud.
- We assume you have used Amazon EC2 before.
- The 4store AMI and the associated EBS snapshot are currently only available in the EU-West Amazon region.
- The ID of the AMI is: ami-62547f16
- The ID of the BTC snapshot is: snap-1a8c6073
- The 4store AMI is based on Debian Squeeze 64-bit. We used the AMI (ami-745b7000) provided by alestic.com as the starting point.
Using the 4store AMI:
- The AMI is 64-bit, so you need to start it on a 64-bit EC2 instance.
- Check out 4store.org for documentation about using 4store.
- If you’re going to use 4store without the BTC dataset, you need to create the directory /mnt/4store once the instance has started.
Using the 4store AMI with the BTC dataset:
- Start the AMI as above.
- Make sure that the Security Group you use allows for HTTP traffic on the port range 4000-4060 as we start a 4store instance for roughly every 20 million triples.
- Create an EBS volume from the BTC snapshot and attach it to your EC2 instance.
- Mount the volume at /mnt/4store
- In the root home directory (~/), you’ll find a shell script called btc.sh, which starts 4store for the BTC dataset. Run “btc.sh start”. This launches all the 4store backends and HTTP servers. Startup takes a while, roughly 30 minutes to an hour.
- Once this is complete, you’ll be able to access the billion triples over the 50-some SPARQL endpoints that have been started on ports 4000-4057.
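Because the data is spread over many endpoints, a client has to send each query to every port and combine the results itself. The following is a minimal Python sketch of that idea, not part of the AMI. It assumes that each 4store HTTP server exposes its endpoint at the path /sparql/ and returns SPARQL JSON results when asked via the Accept header; the helper names (endpoint_urls, run_query, count_matches) and the host name are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Port range quoted in the post for the running endpoints.
FIRST_PORT = 4000
LAST_PORT = 4057


def endpoint_urls(host, first_port=FIRST_PORT, last_port=LAST_PORT):
    """Return one endpoint URL per port (the /sparql/ path is an assumption)."""
    return [
        f"http://{host}:{port}/sparql/"
        for port in range(first_port, last_port + 1)
    ]


def run_query(endpoint, query, timeout=60):
    """Send a SPARQL query via HTTP GET and return the parsed JSON results."""
    url = endpoint + "?" + urllib.parse.urlencode({"query": query})
    request = urllib.request.Request(
        url,
        # Assumes the server honours this Accept header for JSON results.
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)


def count_matches(host, query):
    """Run the same query on every endpoint and sum the number of result rows."""
    total = 0
    for endpoint in endpoint_urls(host):
        results = run_query(endpoint, query)
        total += len(results["results"]["bindings"])
    return total
```

For example, count_matches("EC2_HOST", "SELECT ?s WHERE { ?s ?p ?o } LIMIT 10") would query each endpoint in turn, where EC2_HOST is a placeholder for the public DNS name of your instance. Querying sequentially keeps the sketch simple; with this many endpoints, issuing the requests in parallel would likely be faster.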
Paul Groth, Christophe Guéret, Stefan Schlobach
Knowledge Representation and Reasoning Group
Department of Artificial Intelligence
Vrije Universiteit Amsterdam
UPDATE: The EBS volume is no longer available due to cost. The AMI snapshot is still there. The BTC dataset can be recreated with the scripts available in the AMI.