How cryptography and peer-to-peer networks contribute value to society

By: Omar Metwally, M.D.

3/13/2022

Objective:

To illustrate the utility of cryptography and peer-to-peer networking in protecting the authenticity, integrity, and availability of information.

https://en.wikipedia.org/wiki/Snowflake#/media/File:Snowflake_macro_photography_1.jpg

1. Information is the useful synthesis of data.

Our email inboxes, phones, and hard drives are constantly filling up with data; however, collecting, organizing, and archiving the useful nuggets of information in an ocean of junk requires time, money, and energy. The number of useful emails in my inboxes is a small fraction of the total number of emails, which are mostly spam. I don’t pay for extra storage out of principle. Why fund a company whose spam filters are more likely to block important emails than spam? Why perpetuate the problem?

Similarly with the high-resolution photos which take up so much memory on my phone and hard disk: most of these photographs do not deserve the 2+ MB of memory they occupy on my phone and PC. I’ll commonly snap a photo of a beautiful landscape, a critter I encounter on a walk, or something I need to remember for a short period of time (for example, where I parked). Backing up every photo and video on my phone seems wasteful considering that, like my email inbox, only a small proportion are media that I actually want to preserve. The alternative, however, would be to manually go through each of my inboxes and every photo I take on my phone and make a conscious decision whether to keep or delete a file. This latter strategy often proves far too time-intensive to pursue on a consistent basis.

2. Data that exists in only one location is as good as gone.

I once asked a colleague how he backs up his digital information. “I’ve never needed to back up my data,” he answered. This is a fallacy. Every possible failure of a digital system will eventually and inevitably occur. Hard disks fail all the time. People accidentally delete and lose files. Important bits of information drown in oceans of spam and junk, to the extent that locating them becomes practically impossible. Networked systems get hacked. People lose or upgrade their phones and change platforms, only to realize years later that they never backed up their old Android or iPhone which is now resting in a landfill.

Preserving information in a way that facilitates future retrieval requires:

– a consistent schema for organizing files and directories

– multiple physical (e.g. HDDs and SSDs) and cloud-based storage systems

– a consistent version control schema

– consistency in backing up information to each of these media

In other words, if you really cherish your data, you need to be organized, anticipate what can (and inevitably will) go wrong, and back up consistently. If it’s important information, chances are you’ll also want to encrypt your disks in a way that prevents unauthorized parties from accessing the data, without accidentally losing access to your own data.

3. Cryptography is arguably one of the most useful and powerful technologies in modern-day computing.

Modern cryptography is the basis for digital tools that protect the authenticity and integrity of information. While information ends up in the wrong hands all the time, encryption ensures that only the intended recipient can “unlock” the information. To lay people, “encryption” may conjure messaging apps designed to protect one’s privacy. However, another compelling use case of cryptography, which may be unknown to lay computer users, is to mathematically demonstrate the authenticity of digital information. Algorithms such as SHA256 [https://csrc.nist.gov/glossary/term/SHA_256] generate a fixed-length string of numbers and letters that is, for all practical purposes, unique to its input and can serve as a “fingerprint” for a file’s authenticity. Altering even a single character in a document changes this cryptographic fingerprint.

Just as no two individuals have the same fingerprint, non-identical files yield different cryptographic hashes (finding two different files with the same SHA256 hash is considered computationally infeasible). For instance, an attorney who needs to ensure the authenticity of a collection of evidence can use a cryptographic hashing algorithm such as SHA256 to demonstrate that the data have not been altered since the hash was recorded. However, it’s important to note that these hashing algorithms do not preserve the actual data to which they refer. It is still up to the attorney to back up the evidence in a secure and redundant manner. Furthermore, the attorney must ensure that each backup is identical. Although a small discrepancy may or may not be consequential in court (for instance, accidentally adding a space, period, or comma may or may not alter the interpreted meaning of a document), the cryptographic hash will be altered, negating the utility of the hashing algorithm.
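As a minimal sketch of the fingerprinting idea described above, the following Python snippet uses the standard library’s hashlib module to hash two versions of a short text that differ only by a trailing period. The sample sentences and the helper name fingerprint are illustrative assumptions, not part of any particular tool.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the given bytes as a hex string."""
    return hashlib.sha256(data).hexdigest()


# Two hypothetical versions of the same sentence; the second adds one period.
original = "The parties agree to the terms".encode("utf-8")
altered = "The parties agree to the terms.".encode("utf-8")

print(fingerprint(original))
print(fingerprint(altered))

# Identical input always yields the identical digest; a one-character
# change yields a different one.
print(fingerprint(original) == fingerprint(original))  # True
print(fingerprint(original) == fingerprint(altered))   # False
```

The same comparison is what makes it possible to check that two backups are byte-for-byte identical: if their digests differ, the copies differ.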

4. Distributing and decentralizing information is a key value proposition of blockchain networks

Encryption and hashing preceded cryptocurrencies. Hash functions, which are defined by the National Institute of Standards and Technology, are generally free to use and are accessible via command line on any computer. Arguably the biggest value proposition of blockchain networks, on a technical level, is their capacity to add verifiable and tamper-proof timestamps to cryptographic hashes, by propagating a verifiable and identical chronological database across numerous peers around the world. Being able to reliably exchange information with thousands of computers across the world, spanning many different geographic areas, yields redundancy that would be implausible to replicate by entrusting any one party to create thousands of backups, spread them around the world, ensure that they can be accessed reliably, and also ensure the integrity of the original information. In reality, governments restrict access to online content all the time. People in affected locations can use tools such as VPNs to try and circumvent these limitations, but as long as a critical number of nodes is online, the information will not be lost, even if it is inaccessible from a certain geographic region due to inability to run a p2p client.
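The idea of a tamper-evident chronological record can be illustrated with a toy hash chain. This is only a sketch under simplifying assumptions: a single in-memory list stands in for the replicated database, there is no consensus or networking, and the names entry_hash, make_entry, and verify_chain are hypothetical. Each entry commits to the previous entry’s hash, so editing an earlier entry (for example, backdating its timestamp) invalidates the chain.

```python
import hashlib
import json


def entry_hash(body: dict) -> str:
    """Hash an entry's fields in a canonical (sorted-key) JSON encoding."""
    encoded = json.dumps(body, sort_keys=True).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def make_entry(prev_hash: str, file_digest: str, timestamp: int) -> dict:
    """Create an entry that commits to the previous entry's hash."""
    body = {"prev": prev_hash, "digest": file_digest, "time": timestamp}
    return {**body, "hash": entry_hash(body)}


def verify_chain(chain: list) -> bool:
    """Check every entry's own hash and its link to the previous entry."""
    prev = "0" * 64  # placeholder "previous hash" for the first entry
    for entry in chain:
        body = {key: entry[key] for key in ("prev", "digest", "time")}
        if entry["prev"] != prev or entry["hash"] != entry_hash(body):
            return False
        prev = entry["hash"]
    return True


# Build a three-entry chain from hypothetical file contents and
# arbitrary Unix timestamps.
chain = []
prev = "0" * 64
for i, text in enumerate([b"draft 1", b"draft 2", b"draft 3"]):
    digest = hashlib.sha256(text).hexdigest()
    entry = make_entry(prev, digest, 1647129600 + i)
    chain.append(entry)
    prev = entry["hash"]

print(verify_chain(chain))   # True
chain[0]["time"] -= 86400    # backdate the first entry by one day
print(verify_chain(chain))   # False
```

In a real blockchain network, the protection comes from many independent peers holding and checking the same chain, which this single-process sketch does not attempt to model.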

Cryptocurrencies create financial incentives for people to volunteer hard disk space, bandwidth, their time, skills, computing resources, and energy to contribute to a peer-to-peer network. Rather than relying on one party to ensure the integrity, authenticity, and availability of data (which is typically hosted in a relatively small number of geographic locations), blockchains spread that responsibility across many peers: they are essentially distributed databases (also known as “distributed ledgers” when used in the context of exchanging digital value).

5. Ensuring information availability is another value proposition of blockchain networks

I have been experimenting with IPFS (“InterPlanetary Filesystem” [https://ipfs.io/]), a peer-to-peer file-sharing network, since 2017. Each byte stored directly on a blockchain network is relatively expensive. While all blockchains are peer-to-peer networks, not all peer-to-peer networks are blockchains. IPFS, an example of a peer-to-peer network that is not a blockchain, allows users to easily upload directories and files to the network, where they are relayed from node to node. IPFS itself is free to use; that is, there is no subscription fee to cover hosting costs because volunteers around the world share in hosting the data. However, this utopian dream of “share everything, preserve everything” ignores the reality of the cost of hosting data. Bandwidth, disk space, processing power, and electricity cost money. Data hosted on IPFS can be “pinned” using a 3rd-party service, but this crosses the line of decentralization and places trust in a 3rd-party service to ensure the persistence of these data. Furthermore, it’s unclear to me why a 3rd-party service would volunteer their resources freely without charging a hosting fee.

Filecoin is a cryptocurrency developed by the creators of IPFS (Protocol Labs) which aims to solve this missing economic incentive. The Filecoin protocol aims to incentivize miners (people with a lot of computing power and storage capacity) to host others’ data by rewarding them with the Filecoin cryptocurrency in exchange for running software that can mathematically prove that the hosted data (1) exist on their hard drive(s), and (2) can be retrieved by the party that is paying Filecoin in exchange for their data to be hosted.

I downloaded the Filecoin client (“Lotus”) and spent several days running IPFS and Lotus in parallel in order to see if hosting a 113 MB file on Filecoin was a better alternative to using traditional cloud servers, and also to learn about the economics of the Filecoin ecosystem. I provide here my impressions of this limited experience without a recommendation for or against any cryptocurrency.

It took me a few hours to sync the Filecoin mainnet to completion. I had to download a snapshot of the chain in order to sync, and I could not locate a SHA256 checksum of the snapshot used to sync. I was unable to sync by connecting to peers directly. Using snapshots hosted on a centralized server which are not associated with published checksums is never best practice because there’s otherwise no way to ensure the authenticity or integrity of what one thinks they are downloading.
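When a publisher does provide a SHA256 checksum, verifying a download against it takes only a few lines. The sketch below is a generic illustration using Python’s standard library, not part of the Lotus tooling; the function names and the temporary demo file are assumptions made for the example.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large snapshots need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_checksum(path: str, published_hex: str) -> bool:
    """Compare a file's SHA-256 digest with a publisher-supplied value."""
    return sha256_of_file(path) == published_hex.strip().lower()


# Demo with a small temporary file standing in for a downloaded snapshot.
contents = b"hypothetical snapshot contents"
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(contents)
    demo_path = tmp.name

published = hashlib.sha256(contents).hexdigest()  # what a publisher would post
good = matches_published_checksum(demo_path, published)
bad = matches_published_checksum(demo_path, "0" * 64)
os.remove(demo_path)

print(good)  # True
print(bad)   # False
```

The check is only as trustworthy as the channel through which the checksum itself is published, which is why an unpublished checksum leaves nothing to compare against.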

The Slack channels used by the Filecoin community are active, and I received timely answers to my questions from knowledgeable contributors. Once the Filecoin chain was synced, I proceeded to upload a 113 MB file using its IPFS hash (that is, the file was already uploaded to IPFS, and I used the IPFS hash to point to the data). The process of uploading data generally entails (1) identifying storage providers (miners) who are willing and able to host one’s data; (2) uploading the data to the storage providers; and (3) paying a transaction fee to upload the data. These transactions are referred to as “deals” and can range from 180 to 540 days in duration. Miners can specify parameters such as the minimum and maximum file size they are willing to host, duration of hosting, and their cost per gigabyte per time period (in the case of Filecoin, per 30-second epoch). Retrieving data involves a separate set of processes, but I haven’t yet made it that far.

In Filecoin, miners host others’ data, which may or may not be encrypted. This is a potential legal gray area because miners generally don’t know what they’re hosting, and miners are often located in jurisdictions separate from the party seeking hosting services. Deals can be arranged on a Slack channel or third-party reputation marketplaces, but rarely does one know whom exactly they’re dealing with. What happens if a party is uploading content that is illegal in their jurisdiction? Or perhaps legal in their jurisdiction but forbidden in the miner’s jurisdiction?

The process of trying to host data on Filecoin is far more complex than using traditional cloud servers. The average person is unlikely to succeed without a strong commitment to the steep learning curve involved in using these command-line tools. Some of the complexities can theoretically be simplified using third-party services, but this can potentially negate the advantages of using an incentivized p2p network in the first place.

The Filecoin protocol incentivizes miners to contribute their computing resources (and time) to host others’ data by rewarding them for reliable hosting and by financially punishing failures through penalties deducted from the collateral they have to put up. Due to the relatively early stage of development of these tools, Filecoin documentation recommends making multiple deals with up to 10 different miners to ensure the availability of one’s data, in case one or more miners do not make good on their deal.

On my first attempt to upload a 113 MB file, the “deal” failed for unclear reasons, despite my attempts to troubleshoot the Lotus client’s behavior with the help of technical support volunteers. My starting balance was one Filecoin (1 FIL). Here are some numbers central to the (failed) transaction:

Initial wallet balance: 1 FIL

Cost of hosting 113 MB file with a particular miner for 180 days: 0.01296 FIL ($0.225504, at an exchange rate of $17.4 per FIL on March 12th, 2022).

Wallet balance after the escrow funds were returned to my wallet (i.e. after the deal failed):

0.993353443699298176 FIL

Difference between initial and final wallet balance = amount of “gas” burned (network transaction fees):

0.006646556300701767 FIL

Therefore, 51.285% of the original proposed cost of hosting the file (0.01296 FIL) was burned in the form of gas. In other words, 0.006646556300701767 FIL / 0.01296 FIL = 0.5128515664121734

While the amount of burned gas may seem trivial, it accounts for a majority of the cost of the failed deal (51.285%)! If the goal is to establish 10 deals with 10 different miners, then the cost of gas associated with failed deals can quickly add up.
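The arithmetic above can be reproduced exactly with Python’s decimal module, which avoids binary floating-point rounding. The figures are the ones reported in this post; the final loop is a purely hypothetical scenario that assumes the same amount of gas is burned on every failed attempt, which may not hold in practice.

```python
from decimal import Decimal

proposed_cost_fil = Decimal("0.01296")            # quoted price of the 180-day deal
gas_burned_fil = Decimal("0.006646556300701767")  # gas burned by the failed deal
usd_per_fil = Decimal("17.4")                     # exchange rate on March 12th, 2022

gas_share = gas_burned_fil / proposed_cost_fil
print(f"Gas as a share of the proposed cost: {gas_share:.3%}")
print(f"Proposed cost in USD: {proposed_cost_fil * usd_per_fil}")
print(f"Gas burned in USD: {gas_burned_fil * usd_per_fil:.4f}")

# Hypothetical: total gas if the same amount were burned on each failed attempt.
for failed_attempts in (1, 5, 10):
    total_gas = gas_burned_fil * failed_attempts
    print(f"{failed_attempts} failed deal(s): {total_gas} FIL")
```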

6. Mathematical proof of data availability may or may not be necessary

There are certainly cases in which it’s necessary to prove mathematically not just the integrity and authenticity of data (for example, using hashing functions such as SHA256), but also the availability of the data. Filecoin aims to mathematically prove both the existence and availability of data hosted on a peer to peer network while incentivizing miners to uphold deals with parties who need data hosted. However, there are also many instances where a SHA256 checksum uploaded to a blockchain with an immutable timestamp is more than sufficient. In this latter case, the responsibility of organizing, archiving, and maintaining identical copies of these data falls upon the party willing to pay for the weight of this proof. As mentioned above, there are instances where entrusting miners to store and deliver content may be undesirable for legal reasons, privacy, or simply the need to trust that at least one miner with whom one conducts a deal will uphold their end of the deal.

In conclusion, cryptography and peer-to-peer networking are powerful technologies that can help protect the integrity of information and ensure its persistence. Various blockchain networks use financial incentives in different ways to provide a variety of value propositions to network participants. Clearly understanding one’s goals as they relate to information preservation and exchange, and clearly understanding each network’s value proposition, is key to making good investments of one’s time and resources.

On the economics of knowledge creation and sharing

Omar Metwally, MD 
omar.metwally@gmail.com
University of California, San Francisco

DDASH - Ethereum Operating System 
for Knowledge Creation and Sharing
==========================================
    Github repository
    -----------------
    Project website
    -----------------
    First Draft

Abstract

This work bridges the technical concepts underlying distributed computing and blockchain technologies with their profound socioeconomic and sociopolitical implications, particularly on academic research and the healthcare industry. Several examples from academia, industry, and healthcare are explored throughout this paper. The limiting factor in contemporary life sciences research is often funding: for example, to purchase expensive laboratory equipment and materials, to hire skilled researchers and technicians, and to acquire and disseminate data through established academic channels. In the case of the U.S. healthcare system, hospitals generate massive amounts of data, only a small minority of which is utilized to inform current and future medical practice. Similarly, corporations too expend large amounts of money to collect, secure and transmit data from one centralized source to another. In all three scenarios, data moves under the traditional paradigm of centralization, in which data is hosted and curated by individuals and organizations and of benefit to only a small subset of people.

1. Introduction

In its current siloed state, data is a liability rather than an asset. The value of data depends on its quantity and quality. Organizations, including corporations, government, and academia, have few incentives to share data outside the context of selling it. For instance, advertisers use data procured from individuals’ browsing history and social media use (via internet service providers, social media and search engines) to create detailed profiles of individuals’ online behavior and spending habits and to sell products more effectively to unknowing consumers. While this paradigm fits naturally into a capitalistic society, these economics of data collection and transfer do not facilitate the generation or sharing of knowledge in the academic setting.

A typical university-based research group depends upon external funding to support its research activities. These funds often originate from governmental bodies, philanthropic organizations, or corporations and are difficult to secure [1]. Only a small minority of tenure track scientists ever become principal investigators, and a lab that is productive today can become defunct tomorrow if its principal investigator is unable to secure funding to purchase laboratory equipment and supplies, such as microscope parts and reagents, and to compensate technicians and trainees [2]. Principal investigators spend a majority of their time writing grant applications rather than participating directly in the process of knowledge generation [3].

It is often said that publications are the currency of academia. The maxim “publish or perish” applies to most research groups, whose work culminates in peer-reviewed publications with publication fees commonly amounting to several thousand dollars [4]. Moreover, these peer-reviewed publications are heavily biased toward so-called “positive results,” in which mathematical correlations between variables are described [5]. The vast majority of data produced by scientific researchers do not reject the null hypothesis; in a best case scenario, they are deemed “negative results” and are discarded; in a worst case scenario, they cannot be replicated or verified, or are outright fraudulent [6]. The result is the modern-day academic machinery. This severely flawed system, a victim of many conflicting economic forces, results in a tremendously inefficient workflow in which most grant money is wasted in the form of negative, and therefore unpublishable, results. Principal investigators spend a majority of their time trying to secure funding. The ultimate winner is the $10 billion business of academic publishing [6]. In this reality, data with the potential to produce vast knowledge becomes a wasted opportunity to build exponentially on communities’ resources. Individuals’ roles are minimized by the centralization of resources in the hands of a privileged few.

2. Background

While the term “blockchain” has been touted to near-hysteria in popular media in the context of initial coin offerings and get-rich-quick schemes, an understanding of this data structure’s logic reveals the tremendous and fascinating socioeconomic implications of storing data on blockchain. In its most simplified form, a blockchain is a ledger [7]. The reason for blockchain’s natural association with financial derivatives lies in its ability to mathematically prove the authenticity of data and demonstrate proof of stake and proof of work [8].

The starting point for these use cases is the typical consumer, who is separate from (and often completely unaware of) the data collected about him or her. For instance, a customer’s online behavior is collected and used to up-sell the customer as much as algorithmically possible [9]. Customers have nothing to gain (and a few thousand dollars each year in extra spending to lose) from such data, which companies can sell to data brokers and merchants [10]. Analogously, the majority of taxpayers have no access to (and oftentimes no way to directly benefit from) publications arising from publicly funded research that end up the property of academic journals [11, 12].

2.1 Case Study: Proof of Stake

Consider a research lab living from grant to grant, sifting through negative results to find crumbs of publishable positive results. If its lab notebooks were stored in the form of a blockchain, every experiment conducted, every machine learning model and dataset, and every clinical trial would generate data that lives on the blockchain as a cryptographic asset. Also referred to as “coins” and “tokens,” these cryptographic assets have inherent value because they are perfect receipts of the existence and transfer of data [13]. Never before in history has such a perfect ledger existed [14, 15]. On the blockchain, a relatively worthless set of negative results generated by a research lab becomes, when combined with negative results from thousands of other research groups, a trove of extremely valuable scientific data which can be traced to its owners whenever and however it is utilized. This large collection of negative results can become the source of unexpected positive results.

Moreover, these blockchain-hosted data take on a new life as a financial derivative [16, 17]. These cryptographic assets, perfect receipts of the creation and movement of knowledge, can be traded by third parties analogously to the way a company’s common stock is bought and sold on private and public marketplaces, albeit without the same regulations and on a different scale [13]. These tokens enable individuals and groups, small and large alike, to be compensated for their services in ways that are impractical or impossible in traditional economies [18]. Rather than relying on the slow and inefficient process of securing funding through grants, research labs can codify contracts on the blockchain that allow third parties to bid for services and products rendered and for the metadata (what kind of knowledge research labs generate through their scholarly activities), and that allow third parties to become stakeholders in a research group’s success by directly benefitting from these research activities. For instance, if I believe that a particular group is contributing to science and society in a positive way, I can economically support this group by donating computing power and electrical energy to support the integrity of their lab notebook-turned-ledger, or by trading fiat for tokens representing proof of stake in their scholarly activities. What are today opportunities exclusive to accredited investors and institutions will become abundant opportunities for individuals to influence how perceived value circulates through society.

2.2 Case Study: Proof of Work

Consider the United States healthcare system, which still excludes millions of Americans from access to healthcare and financially ruins even more [19, 20]. Insurance companies are able to impose high premiums simply because they can. This is the logic of a capitalistic society, and insurance companies alone enjoy the benefits of owning valuable health data to their fullest extent, at the expense of those whose health data was collected [20, 21]. Imagine, on a smaller scale, a radiology group that puts a copy of every imaging study they do on a blockchain, along with a timestamp, a description of which type of study was done, and why it was performed. In doing so, data that would have otherwise been discarded can be engaged with by third parties while directly benefiting the radiology group as well. For instance, grassroots-based health insurance co-ops could emerge from these sources of data, which are otherwise available only to insurance companies, to the benefit of health consumers, who can undergo imaging studies and receive other healthcare services at a fraction of current costs. Information about which studies are performed (where, by whom, and why) and the results of those studies can be used to lower healthcare costs while improving health outcomes, rather than to raise healthcare costs and increase profits.

One question that naturally arises, especially in the context of current centralized data paradigms, is: why would healthcare providers be incentivized to make public valuable data that is routinely used by corporations and insurance companies to maximize profits? One powerful force driving healthcare costs upward is the process through which health providers bill patients via insurance companies. Whether ordering relatively common drugs or expensive therapeutics or procedures, healthcare systems rely on administrators whose role is to submit authorization requests to insurance companies for approval to prescribe therapeutics on their patients’ behalf [22]. When a service is rendered in the hospital or in a clinic, a healthcare team is reimbursed a fraction of the amount they bill for, creating a cat-and-mouse game in which providers continuously bill as high as possible for services rendered with the expectation that they will only receive a fraction of what they bill for, and in which insurance companies place limitations on which drugs and services they will pay for and how much of the cost they will cover [23]. Blockchain would provide an end to this cat-and-mouse game and create a race to the bottom for healthcare costs, through price transparency and elimination of bloated administrative layers that handle authorization requests and billing, while creating a race to the top for healthcare outcomes, as this ledger of health services and outcomes would be publicly accessible on a blockchain. Simultaneously, healthcare providers can immediately receive payment for services rendered, and although individual payments may be smaller, overall profits would increase because payments would arrive immediately and there would be no need for entire departments of administrators whose entire role is to maximally inflate bills sent to insurance companies (and patients, insured and uninsured) and to see these bills through collection.

2.3 Informing current and future medical practice

We may well already have all the knowledge we need to cure many illnesses currently considered incurable [24]. We may well have all the data we need to create intelligent machines that can interpret CT scans, diagnose disease, and synthesize drugs to cure any condition. The reason this knowledge hasn’t culminated in more rapid advancement in healthcare and science is that information is fragmented into pieces, siloed, and ultimately rendered worthless data. Blockchain allows transparent access to data. It would be naive to imply that a data structure will cure society of all its ailments. However, blockchain allows data to culminate into extremely valuable information, once at the disposal of a powerful few, now to the benefit of all who become stakeholders by contributing to, interacting with, and propagating data.

3. The need for a ledger of scholarly assets

The need for this project, a protocol for the hosting and sharing of data on a distributed network (“Distributed Data Sharing Hyperledger,” or DDASH), arises from the observations illustrated by the above examples, as well as the observation that numerous research groups at UCSF and other academic institutions are working in parallel in their endeavors to create knowledge with little synergistic interaction [25]. How would research group A at UCSF Medical Center know that research group B at the University of Michigan is working to answer the same scientific questions, for instance? Without a transparent glimpse into which resources an organization owns and how they are being used and shared, both research groups miss opportunities for synergistic collaboration, within and among organizations.

Those acquainted with the politics of contemporary academia will be quick to raise several criticisms. Working within the current reality of Google, the most comprehensive collection of information known to humanity as of September 2017, why can’t research groups A and B simply host their digital assets (data and knowledge gleaned from these data) on websites or public databases? And if groups A and B are competing to be the first to publish in academic journals and competing to drink from the same pools of grant funding, why would any research group benefit by sharing the results of experiments that were costly to run before they can reap the benefits of publication and intellectual property [26]? The answer lies in blockchain’s ability to capture proof of work and proof of stake in a network’s digital assets. On a conventional website or public database, there is nothing to stop a competing research group from taking these data and benefiting at their competitors’ expense. Hosting data in the form of knowledge on a blockchain elegantly solves this problem through mathematical proof of data ownership, transfer, and authenticity [27].

3.1 Distributed Data Sharing Hyperledger (DDASH)

DDASH (link to open source Github repository) is a ledger of scholarly data and knowledge produced by life science, informatics, and clinical researchers at UCSF and other academic institutions. The need for this project arises from the negative impact of data siloing, competition, and counterproductive financial incentives in the academic world on the creation and sharing of knowledge. Concretely, researchers can host data — datasets, experimental results, and machine learning models, among other examples of scholarly knowledge — on the distributed InterPlanetary Filesystem (IPFS) network and record the location of these assets on an Ethereum-based blockchain, along with a description of the asset, when it was created, and who has privileges to access the data.

3.2 Network Architecture

We believe that the IPFS protocol’s combination of security and speed is well suited for this application. IPFS uses content-based addressing, in which a hashing function determines a file’s network address based on the file’s contents [28]. Storing data in the form of a directed acyclic graph (in this case, a Merkle DAG) results in structures that can be efficiently traversed and queried. IPFS is a peer-to-peer network in which data is continuously circulating through network participants’ machines which are running the client software. Data are rendered permanent by virtue of content-based addressing and persistent by virtue of the network’s peer-to-peer architecture, and data are rapidly accessible without the bottlenecks that Internet Protocol imposes.
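Content-based addressing can be sketched with a toy in-memory store. This is not how IPFS is implemented: IPFS splits files into blocks linked in a Merkle DAG and encodes addresses as multihash-based content identifiers, whereas this example simply uses a raw SHA-256 hex digest as the address. The class and variable names are hypothetical.

```python
import hashlib


class ContentStore:
    """Toy content-addressed store: the key is derived from the bytes stored."""

    def __init__(self) -> None:
        self._blocks = {}

    def put(self, data: bytes) -> str:
        """Store data and return its address (here, a SHA-256 hex digest)."""
        address = hashlib.sha256(data).hexdigest()
        self._blocks[address] = data
        return address

    def get(self, address: str) -> bytes:
        """Fetch data and re-hash it to confirm it matches its address."""
        data = self._blocks[address]
        if hashlib.sha256(data).hexdigest() != address:
            raise ValueError("stored bytes do not match their address")
        return data


store = ContentStore()
addr_a = store.put(b"experimental results, run 1")
addr_b = store.put(b"experimental results, run 1")  # same bytes, same address
addr_c = store.put(b"experimental results, run 2")  # different bytes

print(addr_a == addr_b)  # True
print(addr_a == addr_c)  # False
print(store.get(addr_a))
```

Because the address is derived from the content, anyone who retrieves the bytes from any peer can re-hash them and confirm they received what they asked for.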

3.3 Blockchain as a ledger

The blockchain functions as a decentralized ledger of digital resources and the movement of these resources throughout the network. As the DDASH protocol is formalized, more robust mechanisms for associating IPFS hashes with the owner of the resource and the permissions granted by the owner are necessary. Currently the DDASH protocol accounts for the following elements:

  • IPFS content-addressed hash, which defines the location of an asset on the IPFS network
  • The owner’s public key fingerprint
  • The public key fingerprints of users authorized to access the resource, or a designation as “public”
  • Timestamp
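The four elements listed above could be modeled roughly as follows. This is a hypothetical sketch for illustration, not the data structure used in the DDASH repository; the class name, field names, and all sample values are assumptions.

```python
from dataclasses import dataclass
from typing import FrozenSet


@dataclass(frozen=True)
class LedgerRecord:
    """One hypothetical ledger entry describing an asset hosted on IPFS."""

    ipfs_hash: str                            # content-addressed hash of the asset
    owner_fingerprint: str                    # owner's public key fingerprint
    timestamp: int                            # Unix time the record was created
    authorized: FrozenSet[str] = frozenset()  # fingerprints granted access
    is_public: bool = False                   # the "public" designation

    def can_access(self, fingerprint: str) -> bool:
        """Return True if the holder of this fingerprint may read the asset."""
        return (
            self.is_public
            or fingerprint == self.owner_fingerprint
            or fingerprint in self.authorized
        )


# Placeholder values for illustration only; none are real hashes or keys.
record = LedgerRecord(
    ipfs_hash="QmPlaceholderHashForIllustrationOnly",
    owner_fingerprint="OWNER-FINGERPRINT",
    timestamp=1505000000,
    authorized=frozenset({"COLLABORATOR-FINGERPRINT"}),
)

print(record.can_access("OWNER-FINGERPRINT"))         # True
print(record.can_access("COLLABORATOR-FINGERPRINT"))  # True
print(record.can_access("STRANGER-FINGERPRINT"))      # False
```

Note that a record like this only describes who is permitted to access an asset; actually enforcing that permission depends on encrypting the asset itself.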

In its current form, DDASH interfaces between the IPFS network and the Ethereum blockchain. One can conceive an alternative version of the DDASH protocol that seamlessly integrates a ledger-based indexing and permission management system, using for example IPFS’s native public and private keys and a native IPFS ledger. Keeping the networking architecture separate from the blockchain has tangible advantages, however, including the versatility of allowing users to create digital assets using any permutation of blockchains, private and public.

3.4 Security

DDASH allows users to manage access to privileged resources using public-key encryption. Public-key encryption allows users to identify themselves on the network using a verifiable public key, which can be used to encrypt resources such that they can only be decrypted using a corresponding private key accessible exclusively to the intended recipient. Future versions of the DDASH protocol may feature ways to host resources on private clusters and manage access to these clusters on the blockchain. In doing so, resources are secured by limiting the movement of certain data to a subset of the swarm (network peers), and through a second layer of encryption. This not only allows data to move much more quickly through a network, it also greatly enhances security compared to the antiquated paradigm of data hosted on centralized, and therefore inherently vulnerable, servers. Common sources of wasted IT budgets and wasted productivity, such as forgotten, cracked and stolen passwords, or easily-intercepted HTTP network traffic, are obviated by virtue of the DDASH protocol. What stands between the theoretical underpinnings of this protocol and its implementation in academic centers and healthcare systems is not a question of the feasibility of this technology, but rather, whether legislation governing health information and computing will keep up with emerging trends in computing. Catastrophic breaches of sensitive consumer information, such as the Equifax data breach, have become regular occurrences and urgent reminders of the shortcomings of our antiquated Internet Protocol and undeserved trust in institutions that centralize large amounts of highly sensitive data at individuals’ expense [29].

3.5. DDASH Repository

DDASH is hosted as an open source repository at https://github.com/osmode/ddash.

We intend for this nascent project to illustrate the concepts and the larger vision outlined here while serving as a starting point for a formalized protocol for hosting and interacting with distributed digital assets. We made this a public repository early in the conception of this project in order to allow the codebase to benefit from the technical expertise and creativity of the open source community, and to allow the project to benefit from the rapid and exciting evolution in computing paradigms driven by the blockchain and distributed computing communities.

4. Using DDASH

DDASH currently runs on the blackswan private Ethereum network at 104.236.141.200. It benefits from the open source work produced by the IPFS, Ethereum, OpenPGP, web3.py, and py-ipfs communities.

The Go Ethereum client and the web3.py and py-ipfs Python packages are all prerequisites. The instructions here are for machines running Ubuntu 16.04. An Ethereum node must be connected to the blackswan private network and be able to lock/unlock accounts to send transactions.

4.1 Directory Structure

Start by creating these directories:

mkdir /home/omarmetwally/blackswan
mkdir /home/omarmetwally/blackswan/gnupg
mkdir /home/omarmetwally/blackswan/data

4.2 Genesis Block

To connect to the blackswan network, you’ll need to use the same genesis block defined in genesis.json (see the GitHub repository). Move this file to /home/omarmetwally/blackswan/ and set your genesis block. You only need to do this once, and you need to install the Go Ethereum client (geth) and the Ethereum developer tools first:

 

geth --datadir=/home/omarmetwally/blackswan/data init /home/omarmetwally/blackswan/genesis.json

bootnode --genkey=boot.key

bootnode --nodekey=boot.key 

4.3 Go Ethereum client and IPFS daemons

In order to use the web3.py and ipfs wrappers, you’ll need to run geth and ipfs daemons in the background, respectively:

geth --verbosity 1 --datadir /home/omarmetwally/blackswan/data --networkid 4828 --port 30303 --rpcapi="db,eth,net,web3,personal,web3" --rpc 104.236.141.200 --rpcport 8545  console 

Be very careful when enabling RPC while your accounts are unlocked. This can lead to Ethereum wallet attacks, hence the recommendation to keep your development environment completely separate from any real Ether you might own.

The above command starts the go Ethereum client on your local machine and attempts to connect to the blackswan server at 104.236.141.200. Remember to set your genesis block according to the above directions. Trying to join this network with a different genesis block (such as the default genesis block) will not work.

Then open a new terminal window or tab and start the IPFS daemon:

ipfs daemon
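Before moving on, it can help to confirm that both daemons answer on their HTTP interfaces. The following is a minimal sketch using only the Python standard library. It assumes geth's JSON-RPC endpoint is reachable at 127.0.0.1:8545 and the IPFS HTTP API at its default 127.0.0.1:5001; both addresses and all function names are assumptions for illustration, not part of DDASH.

```python
import http.client
import json
import urllib.error
import urllib.request

# Assumed local endpoints; adjust to match how your daemons were started.
GETH_RPC_URL = "http://127.0.0.1:8545"
IPFS_VERSION_URL = "http://127.0.0.1:5001/api/v0/version"

_NETWORK_ERRORS = (urllib.error.URLError, OSError, ValueError,
                   http.client.HTTPException)


def build_rpc_payload(method, params=None, request_id=1):
    """Build a JSON-RPC 2.0 request body as a plain dict."""
    return {"jsonrpc": "2.0", "method": method,
            "params": params or [], "id": request_id}


def geth_is_up(url=GETH_RPC_URL, timeout=3):
    """Return True if the node answers a web3_clientVersion request."""
    body = json.dumps(build_rpc_payload("web3_clientVersion")).encode()
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "result" in json.load(response)
    except _NETWORK_ERRORS:
        return False


def ipfs_is_up(url=IPFS_VERSION_URL, timeout=3):
    """Return True if the IPFS daemon reports its version."""
    request = urllib.request.Request(url, data=b"", method="POST")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return "Version" in json.load(response)
    except _NETWORK_ERRORS:
        return False
```

Calling geth_is_up() and ipfs_is_up() from a Python shell should each return True once the daemons are running; a False result usually means the daemon is not started or is listening on a different address.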

4.4 DDASH command line interface

Once your Ethereum and IPFS nodes are running, your account is unlocked, and you can interact with both clients, start the DDASH command line interface (CLI):

python main.py

                    DDASH
    ::: Distributed Data Sharing Hyperledger :::
    https://github.com/osmode/ddash

    Welcome to the DDASH Command Line Interface.

[1]   ddash> sanity check
      IPFS and geth appear to be running.
[2]   ddash> set directory /home/omarmetwally/blackswan/gnupg
[3]   ddash> new key
[4]   ddash> show keys
[5]   ddash> use key 0
[6]   ddash> show accounts
[7]   ddash> use account 0
[8]   ddash> set recipient your_recipient's_pubkey_id 
[9]   ddash> set file /path/to/clinical/trial/data.csv
[10]  ddash> encrypt
[11]  ddash> upload
[12]  ddash> checkout QmUahy9JKE6Q5LSHArePowQ91fsXNR2yKafTYtC9xQqhwP

The above commands:
1. check if IPFS daemon and Go Ethereum client are running
2. specify working directory (need to have read/write permission)
3. generate a new PGP keypair
4. list all PGP keypairs on your machine
5. use the first (index 0) keypair as your identity
6. list Ethereum accounts
7. specify index of Ethereum account to use for transactions
8. specify an intended recipient's public key
9. specify the file to share
10. encrypt the file from step 9 using the public key from step 8
11. upload the encrypted file to the IPFS network and create a transaction containing the hash, the user id of the person who uploaded the file, and the recipient's public key id (or "public" indicating that it's not encrypted)
12. check blockchain using IPFS hash as handle
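The transaction described in the list above carries three pieces of information: the IPFS hash, the uploader's user id, and either the recipient's public key id or the marker "public". A minimal sketch of such a record follows. The class name, field names, and helper functions are hypothetical and are not taken from the DDASH codebase.

```python
from dataclasses import dataclass
from typing import Optional

PUBLIC_MARKER = "public"  # marker for files that were not encrypted


@dataclass(frozen=True)
class SharedFileRecord:
    """Hypothetical shape of the data recorded for each uploaded file."""
    ipfs_hash: str          # content address returned by IPFS
    uploader_id: str        # user id of the person who uploaded the file
    recipient_key_id: str   # recipient's public key id, or "public"


def make_record(ipfs_hash: str, uploader_id: str,
                recipient_key_id: Optional[str] = None) -> SharedFileRecord:
    """Build a record, defaulting to "public" when no recipient is given."""
    return SharedFileRecord(ipfs_hash, uploader_id,
                            recipient_key_id or PUBLIC_MARKER)


def is_encrypted(record: SharedFileRecord) -> bool:
    """A record is encrypted when it names a specific recipient key."""
    return record.recipient_key_id != PUBLIC_MARKER
```

Because the IPFS hash serves as the handle, a lookup such as the checkout command only needs that one value to find the matching record.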

 

4.5 Mining on the blackswan Ethereum network

Mining difficulty is currently relatively low (1e6) on the blackswan network. Mine Ether by running:

geth --verbosity 4 --datadir /Users/omarmetwally/Desktop/blackswan/data --networkid 4828 --port 30303 --rpc 104.236.141.200 --rpcport 8545 --mine console

 

5. Acknowledgements

I’m grateful to my mentor, Dr. David Avrin (UCSF), for his belief in this vision and for his unwavering support. My colleagues, Dr. Michael Wang and Dr. Steven Chan, provided formative feedback during the conception of these ideas. Steven Truong (UC Berkeley) inspired me with his technical creativity. Visionaries such as Vitalik Buterin and Juan Benet, and many brilliant minds contributing to the open source communities they inspired, conceived the technical underpinnings which are allowing these concepts to grow into powerful tools which I believe will transform and modernize academic research.

6. References

  1. Grover A et al. “The Economics of Academic Medical Centers.” N Engl J Med 2014; 370:2360-2362. June 19, 2014. DOI: 10.1056/NEJMp1403609.
  2. Bohannon J. “Want to be a PI?” Science. June 2 2014. http://www.sciencemag.org/careers/2014/06/want-be-pi-what-are-odds. Accessed: 11 September 2017.
  3. Kaplan K. “A roll of the dice.” Nature 479, 433-435 (2011). doi:10.1038/nj7373-433.
  4. Van Noorden R. “Open access: the true cost of science publishing.” Nature. 27 March 2013. Accessed: 11 September 2017. Available: https://www.nature.com/news/open-access-the-true-cost-of-science-publishing-1.12676.
  5. World Health Organization. “WHO Statement on public disclosure of clinical trial results.” Published: 9 April 2015. Accessed: 11 September 2017. Available: http://www.who.int/ictrp/results/reporting/en/.
  6. Ioannidis JP. “Why most published research findings are false.” PLoS Medicine. Published: 30 August 2005. Accessed: 11 September 2017. Available: https://doi.org/10.1371/journal.pmed.0020124.
  7. Narayanan A et al. Bitcoin and cryptocurrency technologies. Princeton University Press, 19 July 2016.
  8. Narayanan A and Clark J. “Bitcoin’s academic pedigree.” ACM Vol 15:14, 29 August 2017. doi: 10.1145/3134434.3136559
  9. Keyes D. “Amazon looks to gain a machine learning advantage.” Business Insider. Published: 8 September 2017. Accessed: 11 September 2017. Available: http://www.businessinsider.com/amazon-looks-to-gain-a-machine-learning-advantage-2017-9.
  10. Federal Trade Commission. “Data Brokers: A Call for Transparency and Accountability.” Published: May 2014. Accessed: 11 September 2017. Available: https://www.ftc.gov/system/files/documents/reports/data-brokers-call-transparency-accountability-report-federal-trade-commission-may-2014/140527databrokerreport.pdf.
  11. Kimbrough, Julie L., and Laura N. Gasaway. “Publication of government-funded research, open access, and the public interest.” Vand. J. Ent. & Tech. L. 18 (2015): 267.
  12. California State Department of Public Health. “California Taxpayer access to publicly funded research act (Assembly Bill No. 609).” Published: 29 September 2014. Accessed: 11 September 2017. Available: https://leginfo.legislature.ca.gov/faces/billNavClient.xhtml?bill_id=201320140AB609.
  13. Nakamoto, S. “Bitcoin: A peer-to-peer electronic cash system.” Satoshi Nakamoto Institute. Published: 31 October 2008. Accessed: 11 September 2017. Available: http://nakamotoinstitute.org/bitcoin/#selection-2203.14-2203.21 .
  14. Aspnes J et al. “Exposing computationally-challenged Byzantine imposters.” Yale University Department of Computer Science. Published: 26 July 2005. Accessed: 11 September 2017. Available: http://cs.yale.edu/publications/techreports/tr1332.pdf.
  15. Boyle TF. “GLT and GLR: component architecture for general ledgers.”  https://linas.org/mirrors/www.gldialtone.com/2001.07.14/GLT-GLR.htm .
  16. Wood G. “Ethereum: a secure decentralized transaction ledger.” http://gavwood.com/paper.pdf.
  17. Buterin V. “Notes on scalable blockchain protocols.” Ethereum Foundation. Published: 31 May 2015. https://github.com/vbuterin/scalability_paper/blob/master/scalability.pdf.
  18. Y Combinator. June 30th 2017. “IPFS, CoinList, and the Filecoin ICO with Juan Benet .” https://blog.ycombinator.com/ipfs-coinlist-and-the-filecoin-ico-with-juan-benet-and-dalton-caldwell/.
  19. Centers for Disease Control and Prevention. “Health Insurance Coverage.” https://www.cdc.gov/nchs/fastats/health-insurance.htm.
  20. Metwally O. “Building smart contract-based health insurance.” Published: 30 June 2014. https://omarmetwally.wordpress.com/2014/06/30/building-smart-contract-based-health-insurance/.
  21. Angrisano C et al. (McKinsey Global Institute). “Accounting for the cost of health care in the United States.” Jan 2017. http://www.mckinsey.com/industries/healthcare-systems-and-services/our-insights/accounting-for-the-cost-of-health-care-in-the-united-states.
  22. California Department of Health and Human Services. “Treatment Authorization Request.” http://www.dhcs.ca.gov/provgovpart/Pages/TAR.aspx.
  23. Jiwani A et al. “Billing and insurance-related administrative costs in the United States’ health care: synthesis of micro-costing evidence.”  BMC Health Serv Res. 2014 Nov 13;14:556. doi: 10.1186/s12913-014-0556-7.
  24. Hamermesh RG and Giusti K. “One obstacle to curing cancer: patient data isn’t shared.” Harvard Business Review, 28 Nov 2016. https://hbr.org/2016/11/one-obstacle-to-curing-cancer-patient-data-isnt-shared.
  25. Distributed Data Sharing Hyperledger. https://github.com/osmode/ddash.
  26. Fecher B, Friesike S and Hebing M. “What drives academic data sharing?” PLoS One Published: February 25, 2015. Available: https://doi.org/10.1371/journal.pone.0118053.
  27. Ethereum Foundation. “A next-generation smart contract and decentralized application platform.” https://github.com/ethereum/wiki/wiki/White-Paper .
  28. Benet J. “IPFS – content addressed, versioned, p2p file system.” arXiv:1407.3561 [cs.NI].
  29. Gressin S (Federal Trade Commission). “The Equifax Data Breach: What to Do.” Published: 8 September 2017. Available: https://www.consumer.ftc.gov/blog/2017/09/equifax-data-breach-what-do

Democratizing healthcare through decentralized consensus

The concept of cryptocurrency, and more broadly, of decentralized consensus, represents a shift away from the old-world paradigm of centralized authority. My parents’ generation (and their parents’ generation) grew up accustomed to placing their trust in infallible governments, fail-safe banks, and reputable degree-granting academic institutions to which they paid decades’ worth of savings so that their children would have a better chance in society. Although decentralized consensus is silently changing the economic underpinnings of our society, I regard cryptocurrency and decentralized consensus as safeguards of the democratic ideals espoused by our Constitution. The reality is that cryptocurrency is here to stay. Paradigm shifts are a constant in human history, and I believe that the emergence of decentralized consensus will mark one of the most momentous of them.

My friends and I went to hear Andreas Antonopoulos, a cryptography and cryptocurrency guru, answer Bitcoin questions yesterday. If I were to summarize the 2-hour meetup in one sentence, it would be the following: the details of how cryptocurrencies are traded are still maturing, but the concept of decentralized consensus is here to stay. Decentralized consensus holds the promise of democracy 2.0, something that’s remained a Utopian dream except in the tiny country of Switzerland. Decentralized consensus holds the promise of a better world where governments and organizations don’t steal from politically weak, defenseless individuals. As Antonopoulos points out, we’re fortunate enough to have a benevolent government in the United States, but the majority of the world is not so fortunate. Decentralized consensus holds the promise of empowering people to exercise the power of their vote to truly make healthcare a human right. Before I expound on this latter point, I want to outline some technical underpinnings for the uninitiated, so bear with me.

Satoshi Nakamoto’s most remarkable achievement with Bitcoin is the cryptocurrency’s success in solving the problem of a decentralized public ledger. In the case of the US Dollar or any other currency backed by a governmental body or bank, there exists a central authority that acts as the ledger. Bitcoin’s brilliance lies in the fact that the ledger is public, encompassing potentially everyone and anyone. The blockchain ledger is the communal ledger that lends cryptocurrencies their value. It’s characterized by the following 2 criteria [4]:

  • Blocks are very difficult to discover (Difficulty Factor * 2^32 hashes)
  • Blocks are easy to validate
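The asymmetry between the two criteria above can be sketched in a few lines. This is a simplified stand-in rather than Bitcoin's actual block format: it hashes an arbitrary byte string with a single SHA-256 pass and measures difficulty as a number of leading zero bits, and all names are illustrative assumptions.

```python
import hashlib


def block_hash(header: bytes, nonce: int) -> int:
    """Hash the header together with a nonce and return it as an integer."""
    digest = hashlib.sha256(header + nonce.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big")


def target_for(difficulty_bits: int) -> int:
    """A hash is acceptable only if it is below this target."""
    return 1 << (256 - difficulty_bits)


def mine(header: bytes, difficulty_bits: int) -> int:
    """Discovery is hard: try nonces one by one until a hash is low enough.

    On average this takes about 2**difficulty_bits attempts.
    """
    target = target_for(difficulty_bits)
    nonce = 0
    while block_hash(header, nonce) >= target:
        nonce += 1
    return nonce


def is_valid(header: bytes, nonce: int, difficulty_bits: int) -> bool:
    """Validation is easy: a single hash and one comparison."""
    return block_hash(header, nonce) < target_for(difficulty_bits)


if __name__ == "__main__":
    header = b"example block"
    nonce = mine(header, 12)          # a few thousand attempts on average
    print("nonce found:", nonce)
    print("valid:", is_valid(header, nonce, 12))
```

Each additional difficulty bit doubles the expected work for the miner, while the cost of checking a proposed block stays at one hash.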

A Bitcoin comes into existence when a “miner” uses her/his machine (and therefore computing resources, disk space, and electrical energy) to generate new blocks that record cryptocurrency transactions. The block chain with the most cumulative computational work is accepted by consensus as the valid chain. In other words, physical energy (electricity) is converted into Bitcoins. Keep that in mind if you ever find yourself wondering whether or not cryptocurrency is “a thing.” The reward for mining Bitcoins diminishes with time: it halves every 210,000 blocks (roughly every four years), so the total supply approaches a horizontal asymptote of ~21 million BTC, which it is expected to reach around 2140.
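The ~21 million BTC limit follows from the reward schedule. A minimal sketch, assuming the Bitcoin protocol's published parameters, which are not stated in this post: an initial reward of 50 BTC per block, a halving every 210,000 blocks, and amounts counted in whole satoshis (1 BTC = 100,000,000 satoshis).

```python
SATOSHIS_PER_BTC = 100_000_000
INITIAL_REWARD = 50 * SATOSHIS_PER_BTC   # block reward in the first era
BLOCKS_PER_HALVING = 210_000             # blocks between reward halvings


def total_supply_satoshis() -> int:
    """Sum the rewards of every era until the reward rounds down to zero."""
    total = 0
    reward = INITIAL_REWARD
    while reward > 0:
        total += reward * BLOCKS_PER_HALVING
        reward //= 2                     # integer halving, fractions dropped
    return total


def reward_eras() -> int:
    """Count how many eras pay a non-zero reward."""
    eras = 0
    reward = INITIAL_REWARD
    while reward > 0:
        eras += 1
        reward //= 2
    return eras


if __name__ == "__main__":
    supply = total_supply_satoshis()
    print("eras with a reward:", reward_eras())
    print("total supply in BTC:", supply / SATOSHIS_PER_BTC)
```

Because each era pays half as much as the one before it, the sum behaves like a geometric series and stays just under 21 million BTC.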

This setup has a few interesting results with regard to game theory. While mathematicians reading this will quickly pick up on the fact that wielding >50% of mining power holds the theoretical potential to manipulate the currency, game theorists should also note that this system strongly incentivizes cooperation and veracity [2] (I won’t get into the details here, but I’ll refer you to a suggested reading list at the end of the post).

The Bitcoin protocol is not Turing-complete. Enter Ethereum, a Turing-complete protocol for scripting contracts in the blockchain. Ethereum is big. If you’re not a believer yet in Vitalik Buterin and his work, I encourage you to check out the whitepaper for an interesting read. Ethereum uses a Python-like scripting language (Serpent) to convert contracts into cryptographic building blocks. For the first time in history, parties entering into agreements are not at the mercy of inherently biased third parties. Ethereum marks an era in which algorithms — not banks, governments, or individuals — hold the power to validate and execute contracts.

One interesting result of this decentralization is the so-called Decentralized Autonomous Organization (DAO), in which each member is represented as a cryptographic public key [1]. A contract that exists as lines of code in a Turing-complete language means that we can go beyond simple two-party agreements, like this prenuptial agreement written in Ethereum, to a corporate-like structure that automates redistribution of internal capital among participants in exchange for services provided, assets, or computational power. Transactions can contain information like votes, changes in the contract (such as amendments), or adding/removing members [1]. Most importantly, this is all automated without reliance on an escrow or central authority.

The U.S. healthcare crisis has demonstrated how lawmakers, insurance companies, and healthcare systems are struggling to figure out a way to fairly distribute access to healthcare. The U.S. healthcare system was hurt by an incentive system that rewards procedures rather than quality of care and health outcomes. Recent changes in CMS reimbursement are starting to change this, prompting the emergence of Accountable Care Organizations that receive payment in exchange for providing healthcare to a fixed population, rather than on a fee-for-service basis. The healthcare system failed for the same reason the financial industry lost its credibility in the 2008 financial crisis: third parties succeeded in manipulating an easily manipulable system in their favor. People were robbed blind.

I’ll give a simple example of what I’ll call Decentralized Autonomous Health Insurance. Let’s say individuals A through J enter an agreement with physicians X and Y, in which X and Y agree to provide healthcare to individuals A-J. Let’s say in this simplified example that X and Y are reimbursed not for the services they perform but according to A-J’s health outcomes (in ancient China, physicians were paid when their patients were healthy, not when they were sick). Let’s also say that X & Y have a practice that accepts cryptocurrency as payment. Then, A-J and X & Y can pen a virtual contract with the following stipulations:

  1. A-J pay 20 Bitcoins per year to receive care from X & Y’s practice.
  2. The cost to X & Y of providing healthcare to A-J is deducted from the pool of Bitcoins in (1)
  3. X & Y will receive a minimum reimbursement of 10 Bitcoins per patient per year.
  4. If the cost of providing healthcare is less than 10 Bitcoins per person per year, the surplus is shared evenly between providers (X & Y) and patients A – J. This incentivizes patients A – J to take care of their health so they get a bonus at the end of the year, and it incentivizes X & Y to adhere to primary/preventative medicine best practices (including taking time to counsel patients).
  5. A-J can vote annually on which providers they want to provide them with healthcare.
  6. A-J can vote annually on important decisions that affect the distribution of healthcare services.
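The financial stipulations (1) through (4) can be sketched as a year-end settlement. The list leaves some details open, so this is one possible reading under stated assumptions: each patient pays the 20 Bitcoin premium, half of it funds the guaranteed 10 Bitcoins per patient and the other half forms a care budget of 10 Bitcoins per patient, and any unspent budget is split half to the providers and half to the patients. The function and parameter names are hypothetical.

```python
from decimal import Decimal


def settle_year(num_patients, care_cost,
                premium=Decimal(20), guaranteed=Decimal(10)):
    """Return (provider_total, refund_per_patient) for one contract year.

    Assumed reading of the stipulations: premium minus the guaranteed
    reimbursement is the per-patient care budget, and unspent budget is
    shared evenly between the providers and the patients. Cost overruns
    are not addressed by the stipulations, so the surplus is simply zero.
    """
    care_cost = Decimal(care_cost)
    care_budget = (premium - guaranteed) * num_patients
    surplus = max(care_budget - care_cost, Decimal(0))

    provider_bonus = surplus / 2
    refund_per_patient = (surplus / 2) / num_patients
    provider_total = guaranteed * num_patients + provider_bonus
    return provider_total, refund_per_patient


if __name__ == "__main__":
    # Ten patients (A-J) whose care cost 60 Bitcoins in total this year.
    providers, refund = settle_year(num_patients=10, care_cost=60)
    print("providers receive:", providers)
    print("each patient is refunded:", refund)
```

In the example, the pool of 200 Bitcoins is fully accounted for: 60 for care, 100 guaranteed to the providers, and the 40 left over split between the providers' bonus and the patients' refunds.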

We might even imagine a scenario in which each patient’s medical record is encoded and distributed in a decentralized manner such that it exists as undecipherable bytes among millions of computers around the world, rather than behind the walls of a single healthcare system. For example, a chip could keep track of our health habits and automatically append these data to our blockchain-based medical records. These data (such as smoking and exercise habits) could then be integrated into the communal contract, so that sedentary smokers have to pay more Bitcoins per year than active non-smokers in order to receive care from X & Y. In this model, individuals’ health (not access to healthcare!) is the internal capital. Everyone is both a payer and consumer of healthcare, and everyone has the power to vote on the bounds and conditions of care provided. This type of Ethereum-based Decentralized Autonomous Health Insurance would have no administrative overhead, no bureaucracy, and no board of directors to decide who is healthy enough to be insured.

I’m less interested in the exact economics of the hypothetical example above than in the broader concept of decentralized consensus and the self-fulfilling social contract. It’s time to decentralize health insurance the same way cryptocurrency is decentralizing currency.

Cryptocurrency and Ethereum are a new social and technological frontier that hasn’t really reached the mainstream yet. These young protocols still have to pass several important tests (such as reliable security mechanisms) and prove their scalability before they become widely used, but I’m optimistic. The future will be one shaped by knowledge, and less so by historical inertia. Decentralized Autonomous Organizations hold the promise of just distribution of scarce resources, including the most vital one of all: access to healthcare.

References:

  1. Ethereum Whitepaper
  2. Vitalik Buterin’s blog
  3. Bitcoin: Open source P2P money
  4. Brian Warner’s technical introduction to Bitcoin

 

[First published on my Quora blog on May 7th 2014]