Astera Institute Berkeley, CA

Senior sys admin / site reliability engineer

1 week ago

About Astera

Jed McCaleb founded the Astera Institute as a non-profit dedicated to developing high-leverage technologies that can lead to massive returns for humanity.

About Obelisk

Obelisk is the Artificial General Intelligence (AGI) lab at Astera. Obelisk’s mission is to produce AGI in a safe, socially beneficial way. We are focusing on different problems and different approaches than some other AGI efforts. In particular, we are focused on the following problems:

How does an agent continuously adapt to a changing environment and incorporate new information?

In a complicated stochastic environment with sparse rewards, how does an agent associate rewards with the correct set of actions that led to those rewards?

How does higher level planning arise?

What we're looking for

We’re looking for a system administrator / site reliability engineer (SRE) who will be in charge of the low-level systems that we use to do our machine learning research. We use a large number of GPUs to run experiments of various sizes. We need someone to make that infrastructure performant, reliable, efficient, and secure.

We’re currently using the following technologies, but as our first and only SRE, you would be free to change most of this:

Bare-metal servers running Ubuntu, configured via Ansible.
Some of our servers are on-prem, some are rented from a specialty provider of GPU servers.
Clusters running Kubernetes, deployed via Ansible (Kubespray).
We run various services including self-hosted GitHub runners.
Our machine learning training uses Ray for multi-node jobs.
Tailscale for VPN / secure access.
Google Workspace for SSO.

Your Responsibilities:

Network administration: make it fast, easy, and secure for us to connect to our clusters.
Kubernetes cluster management: make sure our clusters and all the workloads we run on them are reliable and easy to use.
Information security: make sure everything we do is secure.

Basic Qualifications:

5 years of relevant experience in domains such as Linux server administration, networking, information security, or Kubernetes administration.

Preferred Qualifications:

Experience running a bare-metal Kubernetes cluster
Deep knowledge of networking (TCP/IP, NAT, firewalls, VLANs)
Familiarity with Tailscale

Location

You will be required to be in the office in Berkeley, California at least once per week because we have our own hardware on premises. Beyond that, most work can be done remotely, but you must be available during normal Pacific business hours.

Why work here?

Plenty of funding and computers.
Trying to advance the state of the art in AI, which requires facing fascinating technical problems.
Small focus. Other places are doing research into lots of problems simultaneously (e.g., DeepMind), or are doing research and building products (e.g., Anthropic). We are completely focused on a small set of problems.
Small. This has benefits and disadvantages, but a huge advantage is less communication overhead and bureaucracy. This makes work faster and more fun.
No outside funding means there’s no pressure to chase trends or make products.

Compensation Range: $150K - $300K

Seniority level: Mid-Senior level
Employment type: Full-time
Job function: Information Technology
Industries: Research Services
