First impressions: GPU + GCP Batch


Category: Programming
Source: dev.to

Weihao and I have been working on programmatic benchmarks for DeepCell on Google Batch.

We tried Vertex AI custom training jobs but ran into an issue with service accounts. It appears that the training job ran on the expected(?) service account, but in an unexpected project. We didn't track down how to give that project's user access to BigQuery. We also figured that we may want to run the container a little closer to the metal (not a VM though).

Enter Google Batch… I've used Batch-like products but never with a GPU. Initial work often looks like a lot of red failures 🥲

Screenshot of the Batch jobs list. Mostly failures, some successes.

First impressions:

1: BigQuery rate limit

I forgot BigQuery has a fairly low rate limit on table updates (5 ops per 10 seconds). So a batch of 10 jobs finishing too close together would overwhelm the table update. The quick fix was to add retry logic.
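Retry logic of this kind can be sketched as follows. This is not the code from the benchmark, just a minimal illustration of exponential backoff with jitter. `RateLimitError` and `retry_with_backoff` are hypothetical names; in practice you would catch whatever exception your BigQuery client raises when the rate limit is hit.

```python
import random
import time


class RateLimitError(Exception):
    """Hypothetical stand-in for the error raised when the rate limit is hit."""


def retry_with_backoff(fn, max_attempts=5, base_delay=1.0, max_delay=30.0,
                       sleep=time.sleep):
    """Call fn(), retrying on RateLimitError with exponential backoff and jitter.

    The delay ceiling doubles after each failed attempt (capped at max_delay).
    The actual wait is a random value below that ceiling, so that jobs
    finishing at the same moment do not all retry at the same moment.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except RateLimitError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the error to the caller
            ceiling = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, ceiling))
```

Wrapping the table update would then look like `retry_with_backoff(lambda: update_table(row))`, where `update_table` is whatever function writes the benchmark result (also a hypothetical name here). The jitter matters more than the exact delays: without it, ten jobs that collide once tend to collide again on the retry.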

2: GPU scarcity

We've had bad luck getting GPUs. The zone reports exhausted resource pools on the regular:

Screenshot of an error message showing that the GCE resource pool is exhausted for the zone.

We ran into a surprising quota issue as well, running out of persistent disk SSDs – even though we weren't using any…

Screenshot of error message showing inadequate SSD quota: limit 500, usage 480, wanted 30.

The quota page showed the usage going up and down (again, we never observed any disks in the GCE console):

Screenshot of the quota visualization showing

You can (kinda) see it trying different availability zones within region us-central1 here:

Screenshot of the monitoring metric for allocated quota, showing several availability zones summing up to an overall regional usage.

We tried increasing the quota to 1 TB (from 500 GB). No luck so far: no resources…!

The quota usage goes up in increments of 30 GB, one for each zone resource exhaustion error. I'm guessing it's a Batch implementation detail to spin up the disks in anticipation of having a VM ready.
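The numbers in the quota error (limit 500, usage 480, wanted 30) are consistent with that guess, as a little arithmetic shows. The function names below are made up for illustration, and the model (one 30 GB reservation held per failed attempt) is an assumption based on the observed behavior, not documented Batch behavior.

```python
def reservations_that_fit(limit_gb, disk_gb):
    """How many disk_gb-sized reservations fit under a quota of limit_gb."""
    return limit_gb // disk_gb


def request_fits(limit_gb, usage_gb, wanted_gb):
    """True if a new request for wanted_gb stays within the quota."""
    return usage_gb + wanted_gb <= limit_gb


# Values taken from the error message: limit 500, usage 480, wanted 30.
held = reservations_that_fit(500, 30)
print(held, held * 30)             # 16 reservations, 480 GB held
print(request_fits(500, 480, 30))  # False: 480 + 30 = 510 > 500
```

So 16 held reservations of 30 GB account exactly for the 480 GB of usage, and the 17th request is the one that trips the limit. Under the same assumption, raising the limit only increases the number of failed attempts it takes to reach the ceiling.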

Fortunately, there is no billing charge for these disks. It's nice that it only bills when a job actually runs, although it's odd that the disks still use up the quota.

I've heard several reports of using GPU on Batch, but it's clear that the incantations are arcane indeed. If you know how to reliably get GPUs or have worked through these errors, please let me know!

...


