SQLite in Python
-
@HardwareGeek said in SQLite in Python:
it's now chewing on a lot of data.
File 987 of 3235. Um, how many days is this going to take?
-
@HardwareGeek said in SQLite in Python:
@HardwareGeek said in SQLite in Python:
it's now chewing on a lot of data.
File 987 of 3235. Um, how many days is this going to take?
That depends, do you need to
update XCodeinstall Windows updatesreboot in the middle?
-
@izzion said in SQLite in Python:
do you need to update XCode
Oh, hell no!
install Windows updates
Not voluntarily.
-
@izzion said in SQLite in Python:
install Windows updates
There are updates available. If you do not want to install them now, I will install them tonight when nobody is using this computer.
-
The .db file is up to 100 MB. It's almost half done.
-
File 1618 of 3235. Officially past the half-way point, at least in terms of files, not necessarily half the data.
-
@Bulb said in SQLite in Python:
not sure where the C3 part came from
Wikipedia says: "consistent with 3 properties"
classes that were not designed with multiple inheritance in mind can break if you multiply inherit them.
A result of the OOP craze was people carelessly inheriting from anything that was never meant to serve as a base class.
In other news, monkey patching is fragile, too.
-
@topspin said in SQLite in Python:
In other news, monkey patching is fragile, too.
If you do weird things, stuff breaks. News at 11.
-
File 2245 of 3235. Less than 1000 to go.
-
C'est fini.
5155m23.217s — 3 days 13 hours 55 minutes
.db file is 200 MB.
Typical query with 3-way join is ~5 seconds, which is quite usable for my purposes.
Success!
Just one more question:
Each row (name) in the database has a column (count) for the number of times that name was found in the source data. I would like to retrieve random rows (easy,ORDER BY RANDOM()
) with the probability of selecting a row (after meeting other filtering criteria) proportional to its count value. How?
-
@HardwareGeek Generate cumulative sums for an ordering (any will do)
as a GENERATED STORED columnin your Python code and write it back (as GENERATED columns can't reference other rows), index on that, and then do a query to find the row which brackets a suitable random integer? You might be able to hoist the computation into SQLite via a suitable query.