I like programming and anime.

I manage the bot /u/mahoro@lemmy.ml

  • 1 Post
  • 13 Comments
Joined 1 year ago
Cake day: June 12th, 2023

  • I also like the POSIX “seconds since 1970” standard, but I feel it should only be used in RAM when performing operations (time differences, timers, etc.). It irks me when it’s used for serialising to text/JSON/XML/CSV.

    I’ve seen bugs where programmers tried to represent a date as epoch time in seconds or milliseconds in JSON. So something like a “pay date” would be represented by a timestamp, and would get off-by-one-day errors because whatever time library the programmer was using would apply a time zone conversion to the timestamp and then truncate it to a date.

    If the programmer had used ISO 8601-style formatting, I don’t think they would have included the time part, and the bug could have been avoided.

    Use dates when you need dates and timestamps when you need timestamps!
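
    A minimal sketch of that failure mode (the pay-date value and the UTC-5 zone below are made up for illustration):

    ```python
    from datetime import date, datetime, timedelta, timezone

    # Hypothetical "pay date" of 2023-06-12, serialised as epoch seconds
    # at midnight UTC (a common but fragile convention).
    pay_date_ts = int(datetime(2023, 6, 12, tzinfo=timezone.utc).timestamp())

    # A consumer in a zone west of UTC converts the timestamp to local time
    # and then truncates it to a date -- off by one day.
    local_tz = timezone(timedelta(hours=-5))
    print(datetime.fromtimestamp(pay_date_ts, tz=local_tz).date())  # 2023-06-11

    # Serialising the date as ISO 8601 avoids this: there is no time part
    # to convert or truncate.
    print(date.fromisoformat("2023-06-12"))  # 2023-06-12
    ```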


  • Do you use it? When?

    Parquet is mainly used for big-data batch processing. It’s a column-oriented file format optimized for large aggregation queries. It’s not human-readable, so you need a library like Apache Arrow to read or write it.

    I would use Parquet in the following circumstances (or some combination of them):

    • The data is very large
    • I’m integrating this into an analytical query engine (Presto, etc.)
    • I’m transporting data that needs to land in an analytical data warehouse (Snowflake, BigQuery, etc.)
    • The data will be consumed by data scientists, machine learning engineers, or other data engineers

    Since the data is stored by column, a query like select sum(sales) from revenue is much cheaper and faster if the underlying data is in Parquet rather than CSV.

    The big advantage of CSV is that it’s more portable. CSV as a data file format has been around forever, so it is used in a lot of places where Parquet can’t be.
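
    As a rough sketch of the column-pruning point (the pyarrow calls are real, but the file names and the tiny table below are just for demonstration):

    ```python
    import pyarrow as pa
    import pyarrow.csv as pacsv
    import pyarrow.parquet as pq

    # Hypothetical revenue table; the column names are illustrative only.
    table = pa.table({
        "region": ["NA", "EU", "NA", "APAC"],
        "sales": [1200.0, 950.5, 430.0, 2100.75],
    })

    # Write the same data as Parquet and as CSV.
    pq.write_table(table, "revenue.parquet")
    pacsv.write_csv(table, "revenue.csv")

    # A query like sum(sales) against Parquet only has to read the "sales"
    # column; the other columns are never touched on disk.
    sales = pq.read_table("revenue.parquet", columns=["sales"])
    print(sum(sales["sales"].to_pylist()))

    # A CSV reader has to scan every row and every column to do the same.
    ```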

  • Story time:

    There was a long data pipeline that produced wrong results. The wrong results were subtle but reproducible. Each run was about an hour long in dev, and there was no intermediate data set: it took some input, ran for an hour, and produced an output.

    The code was inherited and was a bit of a mess. Instead of digging through the code, I went back to a commit from about 6 months earlier, when we knew there was no bug, and re-ran the pipeline from there. There had been 100+ commits since that time.

    Mind you, the bug could’ve been anywhere in the codebase as far as I was concerned.

    It took about a day of git bisect to narrow it down… to nothing. I found that running the code from that first commit, the one from 6 months ago, also produced incorrect data. Oops. That was weird, though, because the code had run correctly back then.

    A few days of debugging later, I eventually found the culprit: a dependency package had been bumped a couple of weeks earlier. Some sort of esoteric parser in it had a bug but didn’t fail loudly; after the bump it quietly parsed some of the data incorrectly. Going back one version fixed the bug.

    So yeah, git bisect killed about a day of my time.
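
    For what it’s worth, this kind of bisect can be automated with git bisect run when each run is scriptable. The sketch below is hypothetical (the run_pipeline.sh wrapper, fixture paths, and checksum are all made up); in the story above the pipeline was re-run by hand.

    ```python
    #!/usr/bin/env python3
    # Hypothetical check script for `git bisect run`.
    #
    # Assumed workflow:
    #   git bisect start
    #   git bisect bad HEAD
    #   git bisect good <commit-from-6-months-ago>
    #   git bisect run ./check_pipeline.py
    #
    # git bisect treats exit code 0 as "good", 125 as "skip this commit",
    # and any other code in 1-127 as "bad".
    import hashlib
    import subprocess
    import sys

    KNOWN_GOOD_MD5 = "0123456789abcdef0123456789abcdef"  # placeholder value

    def main() -> int:
        # Run the (hypothetical) pipeline against a small, fixed input.
        result = subprocess.run(
            ["./run_pipeline.sh", "--input", "fixtures/sample.csv",
             "--output", "out/result.csv"]
        )
        if result.returncode != 0:
            return 125  # can't run at this commit: tell bisect to skip it

        # Compare the output against a checksum from a known-good run.
        with open("out/result.csv", "rb") as f:
            digest = hashlib.md5(f.read()).hexdigest()
        return 0 if digest == KNOWN_GOOD_MD5 else 1

    if __name__ == "__main__":
        sys.exit(main())
    ```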