
    From ScienceDaily@1:317/3 to All on Tue Nov 16 21:30:40 2021
    Big data privacy for machine learning just got 100 times cheaper
    Hashing method slashes cost of implementing differential privacy

    Date:
    November 16, 2021
    Source:
    Rice University
    Summary:
    Computer scientists have discovered an inexpensive way for tech
    companies to implement a rigorous form of personal data privacy
    when using or sharing large databases for machine learning.



    FULL STORY ==========================================================================
    Rice University computer scientists have discovered an inexpensive way
    for tech companies to implement a rigorous form of personal data privacy
    when using or sharing large databases for machine learning.


    "There are many cases where machine learning could benefit society if
    data privacy could be ensured," said Anshumali Shrivastava, an associate
    professor of computer science at Rice. "There's huge potential for
    improving medical treatments or finding patterns of discrimination,
    for example, if we could train machine learning systems to search for
    patterns in large databases of medical or financial records. Today,
    that's essentially impossible because data privacy methods do not scale."

    Shrivastava and Rice graduate student Ben Coleman hope to change that
    with a new method they'll present this week at CCS 2021, the Association
    for Computing Machinery's annual flagship conference on computer and
    communications security.

    Using a technique called locality sensitive hashing, Shrivastava and
    Coleman found they could create a small summary of an enormous database
    of sensitive records. Their method, dubbed RACE, draws its name from
    these summaries, or "repeated array of count estimators" sketches.
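
    To make the idea concrete, here is a toy, from-scratch version of such
    a sketch in Python. It is an illustration of the general technique,
    not the authors' implementation: several independent rows of counters,
    each indexed by a locality sensitive hash (here, signed random
    projections), and every record increments exactly one counter per row.
    All class, parameter, and function names are illustrative.

        import numpy as np

        class RACESketch:
            """Toy "repeated array of count estimators" (RACE) sketch."""

            def __init__(self, dim, num_rows=50, bits_per_hash=4, seed=0):
                rng = np.random.default_rng(seed)
                self.num_rows = num_rows
                self.width = 2 ** bits_per_hash  # counters per row
                # One set of random hyperplanes per row defines that
                # row's locality sensitive hash function.
                self.planes = rng.standard_normal(
                    (num_rows, bits_per_hash, dim))
                self.counts = np.zeros((num_rows, self.width),
                                       dtype=np.int64)

            def _hash(self, x):
                # Signed random projections: the sign pattern of the
                # projections gives each row an integer bucket index.
                bits = (self.planes @ x) > 0   # (num_rows, bits_per_hash)
                return bits.astype(int) @ (1 << np.arange(bits.shape[1]))

            def add(self, x):
                # Each record touches one counter per row, so the sketch
                # stays small no matter how many records it summarizes.
                self.counts[np.arange(self.num_rows), self._hash(x)] += 1

            def kernel_sum(self, q):
                # Averaging, over rows, the counter a query lands in
                # estimates sum_i k(q, x_i), where the kernel k is the
                # hash collision probability (an angular kernel here).
                return self.counts[np.arange(self.num_rows),
                                   self._hash(q)].mean()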

    Coleman said RACE sketches are both safe to make publicly available and
    useful for algorithms that use kernel sums, one of the basic building
    blocks of machine learning, and for machine-learning programs that
    perform common tasks like classification, ranking and regression
    analysis. He said RACE could allow companies to both reap the benefits
    of large-scale, distributed machine learning and uphold a rigorous form
    of data privacy called differential privacy.
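
    Continuing the toy sketch above, a kernel sum query is a single
    counter lookup; the exact answer it approximates would require a pass
    over the entire dataset per query. The sizes and seed below are
    arbitrary illustrative choices.

        rng = np.random.default_rng(1)
        data = rng.standard_normal((10_000, 16))

        sketch = RACESketch(dim=16, seed=1)
        for x in data:
            sketch.add(x)

        # Estimated kernel sum around a query point; the estimate is
        # larger where the (hidden) data is denser.
        print(sketch.kernel_sum(data[0]))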

    Differential privacy, which is used by more than one tech giant, is based
    on the idea of adding random noise to obscure individual information.
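
    The textbook way to realize that idea for a single numeric query is
    the Laplace mechanism. A minimal sketch follows; this is the standard
    technique, not something specific to RACE, and the names are
    illustrative.

        import numpy as np

        def laplace_mechanism(true_answer, sensitivity, epsilon, rng=None):
            # Release a query answer with epsilon-differential privacy by
            # adding Laplace noise scaled to sensitivity / epsilon.
            # `sensitivity` bounds how much one person's record can change
            # the true answer; smaller epsilon means more noise and
            # stronger privacy.
            rng = rng or np.random.default_rng()
            return true_answer + rng.laplace(scale=sensitivity / epsilon)

        # Example: a counting query (sensitivity 1) released at epsilon = 0.5.
        print(laplace_mechanism(true_answer=1_000, sensitivity=1,
                                epsilon=0.5))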

    "There are elegant and powerful techniques to meet differential privacy standards today, but none of them scale," Coleman said. "The computational overhead and the memory requirements grow exponentially as data becomes
    more dimensional." Data is increasingly high-dimensional, meaning
    it contains both many observations and many individual features about
    each observation.

    RACE sketching scales for high-dimensional data, he said. The sketches
    are small, and the computation and memory needed to construct them are
    easy to distribute across machines.
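
    That distribution-friendliness follows from the sketch being a linear
    data structure: sketches of disjoint data shards built with identical
    hash functions merge by adding counters. A toy continuation of the
    earlier RACESketch example (assuming the class and `data` defined
    above):

        # Two workers sketch disjoint halves of the data with the same
        # seed, so both use identical hash functions.
        a = RACESketch(dim=16, seed=1)
        b = RACESketch(dim=16, seed=1)
        for x in data[:5_000]:
            a.add(x)
        for x in data[5_000:]:
            b.add(x)

        # A coordinator merges them by elementwise addition of counters;
        # the merged sketch answers queries as if built from all the data.
        merged = RACESketch(dim=16, seed=1)
        merged.counts = a.counts + b.counts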

    "Engineers today must either sacrifice their budget or the privacy of
    their users if they wish to use kernel sums," Shrivastava said. "RACE
    changes the economics of releasing high-dimensional information with differential privacy.

    It's simple, fast and 100 times less expensive to run than existing
    methods." This is the latest innovation from Shrivasta and his
    students, who have developed numerous algorithmic strategies to make
    machine learning and data science faster and more scalable. They and
    their collaborators have: found a more efficient way for social media
    companies to keep misinformation from spreading online, discovered how
    to train large-scale deep learning systems up to 10 times faster for
    "extreme classification" problems, found a way to more accurately and efficiently estimate the number of identified victims killed in the Syrian civil war, showed it's possible to train deep neural networks as much
    as 15 times faster on general purpose CPUs (central processing units)
    than GPUs (graphics processing units), and slashed the amount of time
    required for searching large metagenomic databases.

    The research was supported by the Office of Naval Research's Basic
    Research Challenge program, the National Science Foundation, the Air
    Force Office of Scientific Research and Adobe Inc.

    ==========================================================================
    Story Source: Materials provided by Rice University. Original written
    by Jade Boyd. Note: Content may be edited for style and length.


    ==========================================================================
    Journal Reference:
    1. Benjamin Coleman, Anshumali Shrivastava. A One-Pass Private Sketch
    for Most Machine Learning Tasks. Submitted to arXiv, 2021.
    ==========================================================================

    Link to news story: https://www.sciencedaily.com/releases/2021/11/211116103110.htm

    --- up 5 days, 2 hours, 55 minutes
    * Origin: -=> Castle Rock BBS <=- Now Husky HPT Powered! (1:317/3)