import * as React from 'react'
  /* @jsx mdx */
import { mdx } from '@mdx-js/react';
/* @jsxRuntime classic */

/* @jsx mdx */

import DefaultLayout from "C:/Alvearie/alvearie.github.io/node_modules/gatsby-theme-carbon/src/templates/Default.js";
export const _frontmatter = {};
const layoutProps = {
  _frontmatter
};
const MDXLayout = DefaultLayout;
export default function MDXContent({
  components,
  ...props
}) {
  return <MDXLayout {...layoutProps} {...props} components={components} mdxType="MDXLayout">


    <p>{`By Denis Ricard | Published June 2, 2021`}</p>
    <p>{`Once any information is digitized and secured using methods such as encryption, firewalls and authorization mechanisms, the next logical step is to
mine the data to gain insight. However, before any personal data can be used to support secondary purposes, applicable privacy laws that govern this
personal data, such as HIPAA, GDPR, CCPA, PIPEDA, or others, must be understood and adhered to in addition to security requirements.`}</p>
    <p>{`Primary Use of data occurs when the data is used within the limits of the applicable privacy law (as a permitted use) and of what the data subject consented to.
In Primary Use cases, it’s best practice to pseudonymize direct identifiers (DI), which consist of fields that uniquely identify an individual on their
own, unless the use case specifically requires the data subject to be identifiable. It is important to note that encrypting identifiers may not be enough
for the data to be treated as non-personal, because encrypted fields are still considered to be private information under some privacy legal frameworks.`}</p>
    <p>{`When source data are to be used for purposes other than those for which they were originally collected, they must first be de-identified or anonymized to a
point where the risk of re-identifying data subjects in the data is very low. Different privacy laws define different criteria and technologies for how
data must be processed to be sufficiently de-identified or anonymized. De-identified or anonymized data typically falls outside the scope of the
corresponding privacy legal frameworks and can be more readily shared, combined and mined, as it is no longer considered to be personal information.`}</p>
    <p>{`To generate de-identified and anonymized data, it is not enough to protect the Direct Identifiers (DI) that may exist in the dataset. Quasi Identifiers
(QI), which consist of fields that can re-identify an individual when combined with each other, must also be protected. Each field added to a
quasi-identifier progressively reduces the population that can be matched to an individual record and, therefore, increases
the probability of a successful re-identification attack. QI fields can be protected through different means, such as data value
generalization or perturbation, which prevent the combined QI fields in a record from being associated with a unique individual or only a few individuals.`}</p>
    <span {...{
      "className": "gatsby-resp-image-wrapper",
      "style": {
        "position": "relative",
        "display": "block",
        "marginLeft": "auto",
        "marginRight": "auto",
        "maxWidth": "1152px"
      }
    }}>{`
      `}<span parentName="span" {...{
        "className": "gatsby-resp-image-background-image",
        "style": {
          "paddingBottom": "53.81944444444444%",
          "position": "relative",
          "bottom": "0",
          "left": "0",
          "backgroundImage": "url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAALCAYAAAB/Ca1DAAAACXBIWXMAAA7DAAAOwwHHb6hkAAACn0lEQVQoz32TXUiTcRTGB63EpWL5hUk3XXTRVQYRlRB9QBB22U1RdBGCXXjRRZQhfUpKWH5rpitZZkYqTNQykbWsnFpTmzP1ndl0H7p3Tudk7vPX62J31gMP5/wvznPO+fMcGYSx2e04RCcb8Hq9+Hy+CAOBAKFQiM0QDoc3pWzaImKbmyXocRHyeRkdG0M3NMTg8DDf9XpEUYwIeDyeCN1ud6TpvyA789rBuTYnKsMKBusSM4KAyzLH+soyHpcLvzSp6HTyfUQvNRrkq26AaZMQKbbZbRiMBow/Jxg3juNd9yI7rLSz5f4M8Y+n2F03RnaHgbvfTHywOjCvreGVVnZYbTjMFlYXnawuOAmtrkcEF2atCIYpTMZpfk2aICBNmFVvJvGGloySUVKq9MSW9bG1rI2Y2mbSm5s51NXC6a5GLmqaKPjRSelML422ftpFHa1CH52TGvqELwxbR5l0C8iOSoLx1z+SWtBDarmOlKJWkm6VsbOwjMTnz1GoKpE/zEP+KAf5k0vEqC4Tp80hfvAS8e+y2aE+wa7u42QMZLFv+iSyIw0WEm5+Ju1BP2nVIyQXt5N07ylJRZUkKxtIaakjsSKfuIo8tiuvolDnEvv+MoqBC8S8PcW2xoPEqDOR9+wl7VumJFg/T8K1HtLv9JJeIwnWaImt7kBe3cS2hgbiXlaRUVnA/qZCjqmLOdtdzHlNCbnGWm4YnlI08YKa+TfUWl6htLZIf6i0osjXobjdj6J0mIy6T5xq/cQ17SAqo5Eh2zyz4iKuNQ/+YGDDtv+F7MAzO3vKzVxR/0aln0NwLBP0+/9ZEJLMGwgFCYSDf2M0lxiUKNMIy5gdK/jdSwQ8K3hX3bgk/y1KhhaXliQ6I2+nlG/EKKMXFL2QKP4APovGKL4r/TEAAAAASUVORK5CYII=')",
          "backgroundSize": "cover",
          "display": "block"
        }
      }}></span>{`
  `}<img parentName="span" {...{
        "className": "gatsby-resp-image-image",
        "alt": "Screen Shot",
        "title": "Screen Shot",
        "src": "/static/7fd192685438bfb1a9d1b52610cadd26/3cbba/qi_examples.png",
        "srcSet": ["/static/7fd192685438bfb1a9d1b52610cadd26/7fc1e/qi_examples.png 288w", "/static/7fd192685438bfb1a9d1b52610cadd26/a5df1/qi_examples.png 576w", "/static/7fd192685438bfb1a9d1b52610cadd26/3cbba/qi_examples.png 1152w", "/static/7fd192685438bfb1a9d1b52610cadd26/70f3b/qi_examples.png 1301w"],
        "sizes": "(max-width: 1152px) 100vw, 1152px",
        "style": {
          "width": "100%",
          "height": "100%",
          "margin": "0",
          "verticalAlign": "middle",
          "position": "absolute",
          "top": "0",
          "left": "0"
        },
        "loading": "lazy"
      }}></img>{`
    `}</span>
    <h3>{`Practical examples of re-identification protection`}</h3>
    <p>{`Encrypted personal information (PI) is still considered PI under some privacy legal frameworks, and encrypting field values makes it harder to use
the field for anything other than an equality predicate. As a result, even in privacy frameworks that allow encryption, this approach is not a good
choice when dealing with QI fields that must be used in varied and complex computations. QI fields are typically valuable in data analyses because they
involve demographics, geographic locations, event dates, etc., that are associated with individuals, so their privacy protection should be performed
in a utility-preserving way. At the other end of the spectrum, redacting field values removes any utility from those fields and can break applications
that expect field data to be in a certain format.`}</p>
    <span {...{
      "className": "gatsby-resp-image-wrapper",
      "style": {
        "position": "relative",
        "display": "block",
        "marginLeft": "auto",
        "marginRight": "auto",
        "maxWidth": "650px"
      }
    }}>{`
      `}<span parentName="span" {...{
        "className": "gatsby-resp-image-background-image",
        "style": {
          "paddingBottom": "88.19444444444444%",
          "position": "relative",
          "bottom": "0",
          "left": "0",
          "backgroundImage": "url('data:image/png;base64,iVBORw0KGgoAAAANSUhEUgAAABQAAAASCAYAAABb0P4QAAAACXBIWXMAAA7DAAAOwwHHb6hkAAACSElEQVQ4y5WUy28SURTG+Rtd6KZdVbtxg7qgjYkWN41v000xPoMWXYgpMYYi2iZCEykUqpgagdICE4Zh5DEDM8O8+LwzdYDpILQnOXOSe8/93e/MPfe6QIzjOHQ6PEat3+873BqfZC7jQ9M11OuMhXIknQXqMiZrDIuDMgu2rYD+I5HYgyBpZKEOva+OgPWpUFNhu91Go9kytamaDrGngWlK+FUpI1//gVw9jabA2BSPKh8LNHw0We0LyJRS+Li3gY19P97vrSJT+ToWOgWom1FSRKTKnxDc9cO3tYLH8UXc+3yZwF9NhJpAnufRarUcZewcRfEmfRsPvlzH0gc37m5egjc8g+jvwNhOGBwKTdO2kvV/KmnuECtRD24GZ+AJzMOzRoDrs1h4e4H8jpiZY+UOgKIoIpvNkj7s2Ha1EkOpF7jqP2dC3c/msBC4iIW183gS86Irc46qTGChULABR6OsSXieWMKtyCwpeR43Qm7c37yGlwkvvlNxRzuNPWVr0lLZ6DJYjS1iOTKHR1tX8DByB+8yPnw7DEPRFHvJ/2ubk9CWyJLW8eF1chnB9FOs74aQrcYhyJ3TA09CVU3Gz2oCO6UwtnPb2KdyZO4UCp0+hB73qAC+18QRw0HXj++/A2i8OGc1XtDAdVVbo5vARqNBXps62U2HLMsO1zTNTFYUZTCmqgpavIgq24VODkZWlKFCQRBQKpXAMAxqtZrNjY3y+TySySQoihrkGLFCVZE/KKFGLkaxWBzelGkmSeRJY1mzgklmsP4ChbdPQZJYKsAAAAAASUVORK5CYII=')",
          "backgroundSize": "cover",
          "display": "block"
        }
      }}></span>{`
  `}<img parentName="span" {...{
        "className": "gatsby-resp-image-image",
        "alt": "Screen Shot",
        "title": "Screen Shot",
        "src": "/static/19cf5a824d902e4d0927339f7bc05a2a/90640/privacy_vs_utility.png",
        "srcSet": ["/static/19cf5a824d902e4d0927339f7bc05a2a/7fc1e/privacy_vs_utility.png 288w", "/static/19cf5a824d902e4d0927339f7bc05a2a/a5df1/privacy_vs_utility.png 576w", "/static/19cf5a824d902e4d0927339f7bc05a2a/90640/privacy_vs_utility.png 650w"],
        "sizes": "(max-width: 650px) 100vw, 650px",
        "style": {
          "width": "100%",
          "height": "100%",
          "margin": "0",
          "verticalAlign": "middle",
          "position": "absolute",
          "top": "0",
          "left": "0"
        },
        "loading": "lazy"
      }}></img>{`
    `}</span>
    <p>{`Our goal is to reduce the re-identification risk below an acceptable threshold while preserving the data attributes that are of interest to the
data scientists and algorithms that use the data. Let’s take three specific examples to illustrate the point.`}</p>
    <h3>{`1 - ZIP Code`}</h3>
    <p>{`HIPAA Safe Harbor has very specific requirements for handling US ZIP codes. It allows the first three digits of a ZIP code to be preserved, but only if all
ZIP codes sharing these three digits have an aggregate population larger than 20,000 people; otherwise, the three digits must be replaced with 000. As of
the 2010 Census, 17 three-digit ZIP code prefixes are restricted in this way. With the Alvearie Data De-Identification server, the Safe Harbor requirement could be
satisfied with the following utility-preserving options:`}</p>
    <table>
      <thead parentName="table">
        <tr parentName="thead">
          <th parentName="tr" {...{
            "align": null
          }}>{`Option`}</th>
          <th parentName="tr" {...{
            "align": null
          }}>{`Value`}</th>
        </tr>
      </thead>
      <tbody parentName="table">
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskPrefixLength`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`3 (default)`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskPrefixRequireMinPopulation`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`true`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskPrefixMinPopulation`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`20000 (default)`}</td>
        </tr>
      </tbody>
    </table>
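    <p>{`As a sketch, the options above could be written as a single rule in the masking configuration. The rule name MaskZipCodeSafeHarbor is a placeholder, the
value types are assumed from the table, and the layout assumes the same rule structure as the full configuration shown later in this post. The two default values are
spelled out only to make the Safe Harbor intent explicit.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-json"
      }}>{`{
  "name": "MaskZipCodeSafeHarbor",
  "maskingProviders": [
    {
      "type": "ZIPCODE",
      "maskPrefixLength": 3,
      "maskPrefixRequireMinPopulation": true,
      "maskPrefixMinPopulation": 20000
    }
  ]
}
`}</code></pre>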
    <p>{`When using HIPAA Expert Determination, other options could be considered at the discretion of the expert. For example, a ZIP code could be generalized to its
first three digits unless a minimum population requirement of 15,000 is not met, in which case it is generalized to two digits. In this scenario, the
following options would be used:`}</p>
    <table>
      <thead parentName="table">
        <tr parentName="thead">
          <th parentName="tr" {...{
            "align": null
          }}>{`Option`}</th>
          <th parentName="tr" {...{
            "align": null
          }}>{`Value`}</th>
        </tr>
      </thead>
      <tbody parentName="table">
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskPrefixLength`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`3`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskTruncateIfNotMinPopulation`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`true`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskPrefixMinPopulation`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`15000`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskTruncateLengthIfNotMinPopulation`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`2`}</td>
        </tr>
      </tbody>
    </table>
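    <p>{`Expressed as a rule, this Expert Determination scenario might look like the following sketch. The rule name MaskZipCodeExpert is a placeholder, and the
value types are assumed from the table above.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-json"
      }}>{`{
  "name": "MaskZipCodeExpert",
  "maskingProviders": [
    {
      "type": "ZIPCODE",
      "maskPrefixLength": 3,
      "maskTruncateIfNotMinPopulation": true,
      "maskPrefixMinPopulation": 15000,
      "maskTruncateLengthIfNotMinPopulation": 2
    }
  ]
}
`}</code></pre>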
    <p>{`The Alvearie `}<a parentName="p" {...{
        "href": "https://github.com/Alvearie/de-identification"
      }}>{`Data De-Identification`}</a>{` server allows the application of more than one data protection method to the same data element. This
applies when multiple values are specified in the maskingProviders parameter of a rule. The data protection methods are applied in the
sequence in which they are listed in the data de-identification configuration file. The following example derives the state in which an individual
resides, for selected US states. Starting with a US postal (ZIP) code, it truncates five-digit codes to three digits
and then maps the result to ME, CT, or Other. This is achieved by chaining the ZIPCODE provider with the GENERALIZE provider.`}</p>
    <p><strong parentName="p">{`ZIPCODE:`}</strong>{` `}<br />{`
Use default options`}</p>
    <p><strong parentName="p">{`GENERALIZE:`}</strong></p>
    <table>
      <thead parentName="table">
        <tr parentName="thead">
          <th parentName="tr" {...{
            "align": null
          }}>{`Source Value`}</th>
          <th parentName="tr" {...{
            "align": null
          }}>{`Target Value`}</th>
        </tr>
      </thead>
      <tbody parentName="table">
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`040,041,042,043,044,045, 046,047,048,049`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`ME`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`060,061,062,063,064,065,066,067,068,069`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`CT`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`*`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`Other`}</td>
        </tr>
      </tbody>
    </table>
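    <p>{`The chaining itself can be sketched as a rule whose maskingProviders array lists both providers in the order in which they should run. The rule name
MaskZipCodeToState is a placeholder. The GENERALIZE entry is deliberately shown without its mapping options, because their names are not covered in this
post; the source-to-target mapping from the table above must be added, as described in the Masking Configuration Overview, before this rule is usable.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-json"
      }}>{`{
  "name": "MaskZipCodeToState",
  "maskingProviders": [
    {
      "type": "ZIPCODE"
    },
    {
      "type": "GENERALIZE"
    }
  ]
}
`}</code></pre>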
    <h3>{`2 - Birth date and Death date`}</h3>
    <p>{`Many generalizations, perturbations, obfuscations and boundaries can be imposed on dates, timestamps and durations.
Let’s consider a case where the birth date (/Patient/birthDate) of an individual should be masked for individuals who died at five years of age or
less. Here are the possible DATEDEPENDENCY masking provider utility preservation options that would be used with the Alvearie `}<a parentName="p" {...{
        "href": "https://github.com/Alvearie/de-identification"
      }}>{`Data De-Identification`}</a>{`
server to support such a case.`}</p>
    <table>
      <thead parentName="table">
        <tr parentName="thead">
          <th parentName="tr" {...{
            "align": null
          }}>{`Option`}</th>
          <th parentName="tr" {...{
            "align": null
          }}>{`Value`}</th>
        </tr>
      </thead>
      <tbody parentName="table">
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`dateYearDeleteNDaysValue`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`1825 (365*5)`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`datetimeYearDeleteNIntervalCompareDate`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`/Patient/deceased`}</td>
        </tr>
      </tbody>
    </table>
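    <p>{`A minimal sketch of a rule carrying these two options is shown below. The rule name MaskBirthDateIfDiedYoung is a placeholder, and the value types
(a number of days and a path string) are assumed from the table rather than confirmed against the server documentation.`}</p>
    <pre><code parentName="pre" {...{
        "className": "language-json"
      }}>{`{
  "name": "MaskBirthDateIfDiedYoung",
  "maskingProviders": [
    {
      "type": "DATEDEPENDENCY",
      "dateYearDeleteNDaysValue": 1825,
      "datetimeYearDeleteNIntervalCompareDate": "/Patient/deceased"
    }
  ]
}
`}</code></pre>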
    <p>{`Here is a common example following the HIPAA Safe Harbor method where dates are generalized to year and ages over 89 must be categorized as
90 years or older. With the Alvearie Data De-Identification server, supporting such a case simply requires these options to be used with the
DATETIME masking provider on a birthdate field. Note that an option also exists to generalize to month and year instead of year only.`}</p>
    <table>
      <thead parentName="table">
        <tr parentName="thead">
          <th parentName="tr" {...{
            "align": null
          }}>{`Option`}</th>
          <th parentName="tr" {...{
            "align": null
          }}>{`Value`}</th>
        </tr>
      </thead>
      <tbody parentName="table">
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`generalizeYearMaskAgeOver90`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`true`}</td>
        </tr>
      </tbody>
    </table>
    <h3>{`3 - Name`}</h3>
    <p>{`Name is a direct identifier and must be masked carefully. To maintain data utility, names can be replaced with fake names of the
same gender, with a certain degree of accuracy. Depending on the privacy framework we are working under, it can also be possible to select the replacement
name pseudo-randomly, so that a name that repeats in the original dataset is replaced consistently. The pseudo-randomness
feature is not allowed under HIPAA Safe Harbor because the entire original value is hashed and used to select the replacement.
With the Alvearie `}<a parentName="p" {...{
        "href": "https://github.com/Alvearie/de-identification"
      }}>{`Data De-Identification`}</a>{` server, the NAME masking provider would preserve gender for recognized given names with the
following options. Note that, for localization purposes, files with new names can be added to the project and selected through property files.`}</p>
    <table>
      <thead parentName="table">
        <tr parentName="thead">
          <th parentName="tr" {...{
            "align": null
          }}>{`Option`}</th>
          <th parentName="tr" {...{
            "align": null
          }}>{`Value`}</th>
        </tr>
      </thead>
      <tbody parentName="table">
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskingAllowUnisex`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`false (default)`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskPseudorandom`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`false (default)`}</td>
        </tr>
        <tr parentName="tbody">
          <td parentName="tr" {...{
            "align": null
          }}>{`maskGenderPreserve`}</td>
          <td parentName="tr" {...{
            "align": null
          }}>{`true`}</td>
        </tr>
      </tbody>
    </table>
    <p>{`With all the possible permutations, varying data sets, evolving schemas, and new regions or countries to support, each with different privacy rules,
hardcoding these rules within an application is not a viable long-term solution. Exposing coding interfaces to cover this amount of change
can also be error-prone and problematic in healthcare-certified environments, where code changes must go through a costly verification and validation process.`}</p>
    <h3>{`Introducing the Alvearie Data De-Identification Server`}</h3>
    <p>{`The Alvearie `}<a parentName="p" {...{
        "href": "https://github.com/Alvearie/de-identification"
      }}>{`Data De-Identification`}</a>{` server allows applications to strike the perfect balance between privacy and utility by providing a suite of
utility-preserving masking providers. The rich configuration schema allows applications to support privacy frameworks from different regions and
adapt to various dataset schemas without changing the code itself. Since the application is built with healthcare in mind, it contains common rules
that experts tend to select. The generic masking providers also help to handle rare and unexpected data from domains other than healthcare.`}</p>
    <h3>{`Creating a masking configuration from what we learned earlier`}</h3>
    <p>{`We can use some of the masking configuration ideas from the previous section to create a configuration file that the Alvearie Data De-Identification
server can use. The configuration has two main sections. The “rules” section defines masking rules using one of the supported masking providers
along with the desired options. In the “json” section that follows, input fields are assigned to one of the previously defined masking rules.
The “json” section also describes the schema of the input data as being FHIR records, and lists the supported resource type, which is
“Patient” in this case. This resource type partitioning allows us to potentially define different masking rules for the same field as it appears
in different resource types.`}</p>
    <p>{`The masking configuration has a very rich set of features to allow solutions to strike the perfect balance between privacy and utility.
Refer to the `}<a parentName="p" {...{
        "href": "https://github.com/Alvearie/de-identification/blob/master/docs/masking-config-overview.md"
      }}>{`Masking Configuration Overview`}</a>{` for a more detailed look at masking configuration features such as
JSON arrays, conditionals and internationalization support. `}</p>
    <pre><code parentName="pre" {...{
        "className": "language-json"
      }}>{`{
  "rules": [
    {
      "name": "MaskZipCode",
      "maskingProviders": [
        {
          "type": "ZIPCODE",
          "maskPrefixRequireMinPopulation": true
        }
      ]
    },
    {
      "name": "MaskBirthDay",
      "maskingProviders": [
        {
          "type": "DATETIME",
          "generalizeYearMaskAgeOver90": true
        }
      ]
    },
    {
      "name": "MaskFirstName",
      "maskingProviders": [
        {
          "type": "NAME",
          "maskGenderPreserve": true
        }
      ]
    },
    {
      "name": "MaskFamilyName",
      "maskingProviders": [
        {
          "type": "NAME",
          "maskPseudorandom": true
        }
      ]
    },
    {
      "name": "PHONE",
      "maskingProviders": [
        {
          "type": "PHONE"
        }
      ]
    }
  ],
  "json": {
    "schemaType": "FHIR",
    "messageTypeKey": "resourceType",
    "messageTypes": [
      "Patient"
    ],
    "maskingRules": [
      {
        "jsonPath": "/fhir/Patient/name/given",
        "rule": "MaskFirstName"
      },
      {
        "jsonPath": "/fhir/Patient/name/family",
        "rule": "MaskFamilyName"
      },
      {
        "jsonPath": "/fhir/Patient/telecom/value",
        "rule": "PHONE"
      },
      {
        "jsonPath": "/fhir/Patient/birthDate",
        "rule": "MaskBirthDay"
      },
      {
        "jsonPath": "/fhir/Patient/address/postalCode",
        "rule": "MaskZipCode"
      }
    ]
  }
}
`}</code></pre>
    <h3>{`Using the Alvearie Data De-Identification Server`}</h3>
    <p>{`The fastest way to get started with the Alvearie Data De-Identification server is by using helm charts to deploy the server to a
Kubernetes cluster of your choice. The latest instructions are available on `}<a parentName="p" {...{
        "href": "https://github.com/Alvearie/de-identification/blob/master/de-identification-app/chart/README.md"
      }}>{`GitHub`}</a>{`.`}</p>
    <h4>{`Prerequisites`}</h4>
    <ul>
      <li parentName="ul">{`Kubernetes cluster 1.10+`}</li>
      <li parentName="ul">{`Helm 3.0.0+`}</li>
      <li parentName="ul">{`jq 1.6+`}</li>
    </ul>
    <h4>{`Check out the code and install the chart`}</h4>
    <pre><code parentName="pre" {...{}}>{`git clone https://github.com/Alvearie/de-identification.git
cd de-identification/de-identification-app/chart
helm install deid .
`}</code></pre>
    <h4>{`Test the newly created service with this original message:`}</h4>
    <pre><code parentName="pre" {...{
        "className": "language-json"
      }}>{`{
  "resourceType": "Patient",
  "id": "example",
  "address": {
    "postalCode": "10001"
  },
  "name": [
    {
      "use": "official",
      "family": "Leroy",
      "given": [
        "Peter",
        "James"
      ]
    },
    {
      "use": "usual",
      "given": [
        "Anna"
      ]
    }
  ],
  "telecom": [
    {
      "system": "phone",
      "value": "+1-3471234567",
      "use": "work",
      "rank": 1
    }
  ],
  "birthDate": "1974-12-25"
}
`}</code></pre>
    <pre><code parentName="pre" {...{}}>{`kubectl port-forward service/deid 8888:8080&

curl -X POST --header 'Content-Type: application/json' --header 'Accept: application/json' -d '{ 
"config":"{\\"rules\\":[{\\"name\\":\\"MaskZipCode\\",\\"maskingProviders\\": [{\\"type\\":\\"ZIPCODE\\",\\"maskPrefixRequireMinPopulation\\":true}]},{\\"name\\":\\"MaskBirthDay\\",\\"maskingProviders\\":[{\\"type\\":\\"DATETIME\\",\\"generalizeYearMaskAgeOver90\\":true}]},{\\"name\\":\\"MaskFirstName\\",\\"maskingProviders\\":[{\\"type\\":\\"NAME\\",\\"maskGenderPreserve\\":true}]},{\\"name\\":\\"MaskFamilyName\\",\\"maskingProviders\\":[{\\"type\\":\\"NAME\\",\\"maskPseudorandom\\":true}]},{\\"name\\":\\"PHONE\\",\\"maskingProviders\\":[{\\"type\\":\\"PHONE\\"}]}],\\"json\\":{\\"schemaType\\":\\"FHIR\\",\\"messageTypeKey\\":\\"resourceType\\",\\"messageTypes\\":[\\"Patient\\"],\\"maskingRules\\":[{\\"jsonPath\\":\\"/fhir/Patient/name/given\\",\\"rule\\":\\"MaskFirstName\\"},{\\"jsonPath\\":\\"/fhir/Patient/name/family\\",\\"rule\\":\\"MaskFamilyName\\"},{\\"jsonPath\\":\\"/fhir/Patient/address/postalCode\\",\\"rule\\":\\"MaskZipCode\\"},{\\"jsonPath\\":\\"/fhir/Patient/telecom/value\\",\\"rule\\":\\"PHONE\\"},{\\"jsonPath\\":\\"/fhir/Patient/birthDate\\",\\"rule\\":\\"MaskBirthDay\\"}]}}" , "data": ["{\\"resourceType\\":\\"Patient\\",\\"id\\":\\"example\\",\\"address\\":{\\"postalCode\\":\\"10001\\"},\\"name\\":[{\\"use\\":\\"official\\",\\"family\\":\\"Leroy\\",\\"given\\":[\\"Peter\\",\\"James\\"]},{\\"use\\":\\"usual\\",\\"given\\":[\\"Anna\\"]}],\\"telecom\\":[{\\"system\\":\\"phone\\",\\"value\\":\\"+1-3471234567\\",\\"use\\":\\"work\\",\\"rank\\":1}],\\"birthDate\\":\\"1974-12-25\\"}"], "schemaType": "FHIR" }' \\
'http://localhost:8888/api/v1/deidentification' | jq "."
`}</code></pre>
    <h4>{`The output will look like this:`}</h4>
    <pre><code parentName="pre" {...{
        "className": "language-json"
      }}>{`{
  "data": [
    {
      "resourceType": "Patient",
      "id": "example",
      "address": {
        "postalCode": "100"
      },
      "name": [
        {
          "use": "official",
          "family": "Kole",
          "given": [
            "Jess",
            "Jimmie"
          ]
        },
        {
          "use": "usual",
          "given": [
            "Gillian"
          ]
        }
      ],
      "telecom": [
        {
          "system": "phone",
          "value": "+1-3475353644",
          "use": "work",
          "rank": 1
        }
      ],
      "birthDate": "1974"
    }
  ],
  "audit": null
}
`}</code></pre>
    <h4>{`Observations:`}</h4>
    <ul>
      <li parentName="ul">{`The postal code was reduced to three digits.`}</li>
      <li parentName="ul">{`The first names have their gender preserved where it could be identified.`}</li>
      <li parentName="ul">{`The family name is replaced with a consistent pseudo-random value.`}</li>
      <li parentName="ul">{`The phone number has the country and area code preserved but other digits randomized so applications expecting the format to be valid will still work.`}</li>
      <li parentName="ul">{`The birth date has been generalized to the year alone.`}</li>
    </ul>
    <h3>{`Typical usage pattern`}</h3>
    <p>{`Typically, the Alvearie De-Identification server is used after the raw input data has been consolidated so that input fields have consistent and
meaningful values. The output data, which has the desired trade-off between privacy and usability, is then ready to mine. For cases where the schema is highly
variable and new fields with PI may suddenly appear, the “defaultNoRuleResolution” configuration parameter can be set to “true” so that any input
field without a specific rule association is nullified in the output. In that case, create a rule that uses the MAINTAIN masking provider and assign it to any
field that must pass through untouched.`}</p>
    <h3>{`Conclusion`}</h3>
    <p>{`Now that you know the basics of data de-identification, you can expand on the configuration above to de-identify more resource types.
You can also connect more components of the Alvearie project together as described in the `}<a parentName="p" {...{
        "href": "https://alvearie.io/blog/clinical-records-ingestion-pattern"
      }}>{`Clinical Data Ingestion Pattern`}</a>{` blog.`}</p>

    </MDXLayout>;
}
;
MDXContent.isMDXComponent = true;
      